In this project, we investigated factors associated with the reuse of publicly accessible research data, that is, data made freely available through a journal, repository, or other website. Funding agencies, such as NSF, mandate that data be made available to the public. However, doing so takes time and resources. To help data sharers understand the impact of this effort, and to learn whether data users are able to reuse what is shared, we studied datasets on popular data repositories, such as KNB, Figshare, and Dryad, and used R’s web scraping capabilities to gather information on heavily reused datasets. We gathered metrics such as downloads, citations, views, usability scores, metadata information, and dataset size for thousands of datasets from six repositories. We used these metrics to understand reuse, which we measured by both the number of downloads and the number of citations. We also analyzed equity of access using information that some repositories, such as ICPSR and NSF PAR, track on the makeup of their data users and data sharers. If you’d like to learn more, please come to our virtual poster session!
University of Minnesota, Political Science (PhD) and Statistics (MS)
University of Virginia, College of Arts and Sciences
University of Virginia, College of Arts and Sciences & Batten School of Leadership and Public Policy
Postdoctoral Research Associate, Biocomplexity Institute, University of Virginia
Research Associate Professor, Biocomplexity Institute, University of Virginia
Professor Emeritus, Biocomplexity Institute, University of Virginia
Science Advisor for Public Access, National Science Foundation