Overview:
The goal of this project is to identify emerging research topics over time using topic models and visualization techniques. The data are a corpus of research and development (R&D) abstracts publicly available from Federal RePORTER. We built on prior work by adding the 2019 data to our dataset and applying two topic modeling techniques, Latent Dirichlet Allocation (LDA) and Nonnegative Matrix Factorization (NMF). Using the topic model results, we employed an emerging topic strategy to determine which topics are gaining or waning in popularity over time. We also created a dashboard where users can interact with the topic model results and create their own topic models for specific areas of interest, such as pandemics.
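The emerging topic strategy is not detailed on this page. As one possible illustration only, the sketch below assumes that each topic has an average weight per year (for example, the mean document-topic weight from an LDA or NMF model) and labels a topic as gaining or waning by the sign of a least-squares trend line. The function names, the tolerance, and the example numbers are all hypothetical and are not taken from the project.

```python
from statistics import mean


def trend_slope(years, weights):
    """Least-squares slope of yearly topic weights regressed on year."""
    year_mean = mean(years)
    weight_mean = mean(weights)
    numerator = sum(
        (y - year_mean) * (w - weight_mean) for y, w in zip(years, weights)
    )
    denominator = sum((y - year_mean) ** 2 for y in years)
    return numerator / denominator


def classify_topics(years, topic_weights, tol=1e-3):
    """Label each topic by the direction of its trend.

    tol is an assumed threshold: slopes within +/- tol count as flat.
    """
    labels = {}
    for topic, weights in topic_weights.items():
        slope = trend_slope(years, weights)
        if slope > tol:
            labels[topic] = "gaining"
        elif slope < -tol:
            labels[topic] = "waning"
        else:
            labels[topic] = "flat"
    return labels


# Hypothetical yearly average topic weights (not real project results).
years = [2016, 2017, 2018, 2019]
topic_weights = {
    "pandemics": [0.02, 0.03, 0.05, 0.08],
    "legacy_topic": [0.10, 0.08, 0.07, 0.05],
    "steady_topic": [0.04, 0.04, 0.04, 0.04],
}

labels = classify_topics(years, topic_weights)
for topic, label in labels.items():
    print(topic, label)
```

A linear trend is only one way to summarize change over time; the project's own strategy may differ, and a real analysis would also want to account for how many abstracts each year contains.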
Research Project Webpage:
Click here for more details about the project including findings, data, and methods.
Fellows:
Lara Haase
Carnegie Mellon University, MS in Public Policy and Management – Data Analytics
Interns:
Martha Czernuszenko
The University of Texas at Austin, Information Systems & Canfield Business Honors Program
Liz Miller
William & Mary, International Relations
Sean Pietrowicz
University of Notre Dame, Applied and Computational Mathematics and Statistics
Mentors:
Kathryn Linehan
Research Scientist (Project Lead), Biocomplexity Institute, University of Virginia
Eric Oh
Research Assistant Professor, Biocomplexity Institute, University of Virginia
Stephanie Shipp
Deputy Division Director and Research Professor, Biocomplexity Institute, University of Virginia
Joel Thurston
Senior Scientist, Biocomplexity Institute, University of Virginia
Stakeholders:
National Center for Science and Engineering Statistics, Research & Development Statistics Program:
- John Jankowski, Program Director
- Audrey Kindlon, Survey Statistician
- Chris Pece, Senior Analyst
- Ronda Britt, Senior Analyst
- Gary Anderson, Senior Science Resources Analyst