Leveraging Elasticsearch to Improve Data Discoverability in Science Gateways
Time: Tuesday, July 30, 1:30pm - 2pm
Description: The Texas Advanced Computing Center (TACC) maintains an expanding portfolio of science gateway projects aimed at connecting researchers in diverse domains with TACC’s high-performance computing (HPC) resources. Recent additions include gateways for the NeuroNex Technology Hub, Planet Texas 2050, and the UT Research Cyberinfrastructure. These projects share a common goal: helping users curate, process, and share their data. By providing a web interface for data curation and for submitting jobs to HPC systems, we give users access to advanced computing capabilities without requiring specialist knowledge of HPC architectures. The gateways also offer features, such as interactive applications, that are unavailable to users with only command-line access.
As the volume of data shared through these gateways grows, with some projects generating tens of millions of files, making that data quickly and efficiently discoverable from the user interface has become both a key architectural challenge and a frequent request from our users. Poor discoverability has often hindered a project's science goals, especially in collaborative environments. We describe our efforts to enhance data discoverability at TACC using Elasticsearch, a search engine and document store that can analyze and search file metadata.
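The abstract does not describe the index schema or queries TACC uses, so the following is only a minimal sketch of the general idea of indexing file metadata in Elasticsearch, under stated assumptions. The field names (`system`, `path`, `name`, `format`, `length`, `lastModified`), the index name `files`, the system identifier, and the SHA-256 document ID scheme are all hypothetical choices made for illustration. The sketch only builds request bodies as plain Python data and does not contact a cluster. It shows a mapping that separates exact-match `keyword` fields from full-text `text` fields, a newline-delimited JSON body for the `_bulk` endpoint (useful when crawling very large numbers of files), and a `bool` query that combines a scored `match` on file names with non-scoring `filter` clauses.

```python
import hashlib
import json
import os
import tempfile
from datetime import datetime, timezone

# Hypothetical index mapping: exact-match fields are "keyword"; the file name is
# also analyzed as "text" so users can search on words within it.
FILE_INDEX_MAPPING = {
    "mappings": {
        "properties": {
            "system": {"type": "keyword"},
            "path": {"type": "keyword"},
            "name": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
            "format": {"type": "keyword"},
            "length": {"type": "long"},
            "lastModified": {"type": "date"},
        }
    }
}


def file_metadata_doc(path: str, system: str) -> dict:
    """Collect basic file-system metadata for one path as a search document."""
    st = os.stat(path)
    return {
        "system": system,  # hypothetical storage-system identifier
        "path": os.path.abspath(path),
        "name": os.path.basename(path),
        "format": "folder" if os.path.isdir(path) else "file",
        "length": st.st_size,
        "lastModified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
    }


def doc_id(doc: dict) -> str:
    """Derive a stable ID so re-crawling a file overwrites its old document."""
    return hashlib.sha256(f"{doc['system']}:{doc['path']}".encode()).hexdigest()


def bulk_body(docs: list, index: str) -> str:
    """Serialize documents as newline-delimited JSON for the _bulk endpoint."""
    lines = []
    for doc in docs:
        # Each document is preceded by an action line naming the index and ID.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id(doc)}}))
        lines.append(json.dumps(doc))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def search_query(text: str, system: str, path_prefix: str, size: int = 25) -> dict:
    """Full-text match on file names, restricted to one system and directory tree."""
    return {
        "size": size,
        "query": {
            "bool": {
                "must": [{"match": {"name": {"query": text, "operator": "and"}}}],
                # Filter clauses do not affect scoring and can be cached.
                "filter": [
                    {"term": {"system": system}},
                    {"prefix": {"path": path_prefix}},
                ],
            }
        },
        "sort": [{"lastModified": {"order": "desc"}}],
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "brain-scan-001.tif")
        with open(sample, "wb") as f:
            f.write(b"\x00" * 1024)
        doc = file_metadata_doc(sample, system="example.storage")
        print(bulk_body([doc], index="files"), end="")
        print(json.dumps(search_query("scan", "example.storage", tmp), indent=2))
```

In a real deployment these bodies would be sent to the cluster with an HTTP or Elasticsearch client (create the index with the mapping, POST the NDJSON to `_bulk`, then POST the query to the index's `_search` endpoint). Keeping `system` and `path` as `keyword` fields lets the filters restrict results to a user's storage system and directory cheaply, while the analyzed `name` field supports free-text search.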