Overview

In this project, we investigated factors associated with the reuse of publicly accessible research data, that is, data made freely available through a journal, repository, or other website. Funding agencies such as NSF mandate that data be made available to the public, but doing so takes time and resources. To help data sharers understand the impact of this effort, and to learn whether data users are able to reuse what is shared, we studied datasets on popular data repositories such as KNB, Figshare, and Dryad, using R’s web scraping capabilities to gather information on heavily reused datasets. We gathered metrics including downloads, citations, views, usability scores, metadata information, and dataset size for thousands of datasets across six chosen repositories. We used these metrics to understand reuse, which we measured using both the number of downloads and the number of citations. We also analyzed equity of access using information that some repositories, such as ICPSR and NSF PAR, track on the makeup of their data users and data sharers. If you’d like to learn more, please come to our virtual poster session!

Teaser Video

Zoom Link

Project Website

Fellow

Emily Kurtz

University of Minnesota, Political Science (PhD) and Statistics (MS)

Interns

Aditi Mahabal 

University of Virginia, College of Arts and Sciences

Akilesh S Ramakrishna  

University of Virginia, College of Arts and Sciences & Batten School of Leadership and Public Policy

Mentors 

Alyssa Mikytuck

Postdoctoral Research Associate, Biocomplexity Institute, University of Virginia

Gizem Korkmaz

Research Associate Professor, Biocomplexity Institute, University of Virginia

Sarah Nusser

Professor Emeritus, Biocomplexity Institute, University of Virginia

Stakeholder

Martin Halbert

Science Advisor for Public Access, National Science Foundation