Impact Measurement Strategies for Science Gateways: Present and Future
Time: Tuesday, July 30, 11am - 12:30pm
Description: For the purpose of this panel, a Science Gateway is taken to mean an online platform that connects at least three entities: i) expensive instruments, supercomputers, or databases / datasets, ii) authors of scientific codes that may consume resources or streams of data emanating from the first category and who may wish to make their codes available for others to use, and iii) vast audiences interested in using the products of the first two categories. By employing distributed computing and data infrastructures, Science Gateways facilitate the creation of significant quantities of research products in the form of new tangible and sometimes intangible digital artifacts. Such products may ultimately take the form of publications, but often they are new standalone scientific products such as datasets and scientific software. Gateways also transform the process of research, for example by changing the direction of collaborations, and they yield by-products such as failed experiments, advances in the “science of cyberinfrastructure” that supports research, and databases of simulation results that may be used for a variety of purposes. They may also impact education and industry collaboration via new educational approaches. Many of these intermediate research products may never see the light of publication, and yet they contribute significantly to the enterprise of science worldwide. If and when they do reach publication, the gateways that supported them are often silent partners in the end results, which makes measuring impact with conventional methods difficult and likely inaccurate. Compounding the problem, the publication process runs much slower than the speed of thought enabled by science gateways, sometimes to the point that the traditionally measured impact of a gateway effort can be assessed only near the end of, or after the expiration of, its funding.
While the scientific community largely acknowledges the value of such platforms, which often employ distributed computing and data infrastructures, two problems remain unsolved: measuring impact at a pace that matches the scientific productivity enabled by science gateways, and gaining community acceptance of such measures as a valid standard to justify funding and continued existence.
This panel will engage the creators and operators of significant science gateways, large centers that support science gateways, and individuals with expertise in impact measurement to discuss what is currently being done and the possibility of initiating an effort toward an accepted means of approaching this impact problem. The gateway operator panelists can highlight the value of assets contained in the gateway, extensive analysis of user data for alternative impact, cost trade-offs of gateway development, and the value of resources that would otherwise be unavailable to researchers (see panelist list below for examples of these impact measures). Panelists with impact measurement expertise will present lessons from computational social science in analyzing the impact of gateways and future directions for such work (see panelist list).
Note to reviewers: Two possible panelists in addition to those below are checking into conflicts in time with the PEARC meeting.
Proposed panel duration: 90 minutes
Dan Stanzione: Dr. Dan Stanzione, Associate Vice President for Research at The University of Texas at Austin since 2018 and Executive Director of the Texas Advanced Computing Center (TACC) since 2014, is a nationally recognized leader in high performance computing. He is the principal investigator (PI) for a National Science Foundation (NSF) grant to acquire and deploy Frontera, which will be the fastest supercomputer at any U.S. university. Stanzione is also the PI of TACC's Stampede2 and Wrangler systems, supercomputers for high performance computing and for data-focused applications, respectively. For six years he was co-PI of CyVerse, a large-scale NSF life sciences cyberinfrastructure. Stanzione was also a co-PI for TACC's Ranger and Lonestar supercomputers, large-scale NSF systems previously deployed at UT Austin. Stanzione received his bachelor's degree in electrical engineering and his master's degree and doctorate in computer engineering from Clemson University.
Sharon Tettegah: Dr. Tettegah is Director and Professor, Center for Black Studies Research, at the University of California, Santa Barbara. She is the principal investigator for a National Science Foundation grant, Coordinating Curricula and User Preferences to Increase the Participation of Women and Students of Color in Engineering. Investigators will engage in data mining of syllabi, course content, public spaces, and instructor materials to aggregate information about curricula presented in different representational forms, such as equations, images, narratives, simulations, and videos, through the CUE4CHANGE science gateway, which is based on HUBzero.
Stephen K. Burley: Dr. Burley is the director of the RCSB Protein Data Bank (RCSB PDB), which manages the Protein Data Bank (PDB) archive jointly with Worldwide Protein Data Bank partners in Europe (PDB in Europe; PDBe), Asia (PDB Japan; PDBj), and the BioMagResBank (BMRB). The PDB archive, established in 1971, is the single global data resource for 3D experimental structures of large biological molecules (proteins, DNA, and RNA). Since 2000, contributions to the PDB have come from more than 30,000 structural biologists worldwide. Each year, millions of data consumers from every sovereign nation recognized by the United Nations access the PDB at no charge. Approximately 2 million structure data files are downloaded every day from the PDB archive, without limitations on usage. The estimated replacement value of PDB data is ~$15 billion. The RCSB PDB is supported jointly by the NSF, NIH, and DOE. An independent economic impact study estimated a 15,000-fold return on federal investment, excluding the impact of the PDB data on the biopharmaceutical industry. A recently published study showed that ~5,900 PDB structures contributed to 184 of 210 US FDA New Drug Approvals for the years 2010-2016.
Gerhard Klimeck: Dr. Klimeck is the director of nanoHUB.org, a science gateway established in 1998 that serves 16,000 simulation users annually and an additional 1.4 million unique consumers of other nanoHUB resources annually. nanoHUB gathers and analyzes significant amounts of user telemetry data with novel clustering and visualization methods to demonstrate use in more than 200 classroom settings annually, and to date has impacted more than 35,000 students. nanoHUB has been cited more than 2,000 times in the scientific literature; if it were an individual, it would have an h-index of 82.
Emre Brookes: Dr. Brookes is the project lead for GenApp, a framework for rapidly generating complete applications. Applications can be generated in a variety of target languages, including stand-alone GUI and web-based targets. GenApp is being used by multiple projects to produce science gateways. SASSIE-web, serving the small-angle scattering community, currently has over 600 registered users who ran over 20,000 jobs in 2018 and has generated over 40 publications. The goal of GenApp is to minimize the effort scientist-developers spend building and maintaining applications so that they can maximize their “science” productivity. A key metric for GenApp is the amount of effort scientists can save in deploying gateways and applications, to the point that a gateway has been deployed in under an hour.
Mark Miller: Dr. Miller is the PI of the CIPRES science gateway, created in December 2009 from the CIPRES Portal, a highly used gateway for phylogenetic research. CIPRES supports nearly 20 parallel codes and provides access to XSEDE resources. Since its creation, CIPRES has served over 35,000 unique users running 1.5 million jobs that have consumed more than 115 million core hours. Currently, CIPRES supports over 25,000 monthly job submissions from over 2,000 users. It has been used by over 90 instructors in educational settings and has supported over 5,200 publications.
Sabine Brunswicker: Dr. Brunswicker is a computational social scientist and Director of the Research Center for Open Digital Innovation at Purdue University. Her work is particularly focused on open digital innovation, describing new ways of using information technologies to organize the collective design and use of innovative digital goods on digital platforms such as science gateways. In her work, she designs and examines systems and technologies that support open digital innovation with respect to their technological and behavioral impact. She uses techniques of computational social science (agent-based modeling, network analysis, experiments) to predict individual as well as collective outcomes in open digital innovation.