Overview
In this project, we investigated factors associated with the reuse of publicly accessible research data, that is, data made freely available through a journal, repository, or other website. Funding agencies such as NSF mandate that data be made available to the public, but doing so takes time and resources. To help data sharers understand the impact of this effort, and to assess whether data users can actually reuse what is shared, we studied datasets on popular data repositories such as KNB, Figshare, and Dryad, using R’s web scraping capabilities to gather information on heavily reused datasets. From thousands of datasets across six repositories, we gathered metrics including downloads, citations, views, usability scores, metadata information, and dataset size. We used these metrics to understand reuse, which we measured by both the number of downloads and the number of citations. We also analyzed equity of access using information that some repositories, such as ICPSR and NSF PAR, track on the makeup of their data users and data sharers. If you’d like to learn more, please come to our virtual poster session!
Teaser Video
Zoom Link
Project Website
Fellow
Emily Kurtz
University of Minnesota, Political Science (PhD) and Statistics (MS)
Interns
Aditi Mahabal
University of Virginia, College of Arts and Sciences
Akilesh S Ramakrishna
University of Virginia, College of Arts and Sciences & Batten School of Leadership and Public Policy
Mentors
Alyssa Mikytuck
Postdoctoral Research Associate, Biocomplexity Institute, University of Virginia
Gizem Korkmaz
Research Associate Professor, Biocomplexity Institute, University of Virginia
Sarah Nusser
Professor Emeritus, Biocomplexity Institute, University of Virginia
Stakeholder
Martin Halbert
Science Advisor for Public Access, National Science Foundation