Cary Institute hosts data library interns

Three interns worked with Cary scientists to dig into data on zoonotic diseases.

Photo by Markus Spiske

This summer, Cary Institute of Ecosystem Studies had the honor of hosting three data librarian interns from the National Center for Data Services (NCDS). 

Part of the Network of the National Library of Medicine, NCDS offers internships that introduce students from historically excluded racial and ethnic groups to data librarianship and give them the practical experience and skills needed to compete for data librarian positions.

Three sites were chosen to host this year’s cohort of interns, and Cary Institute was unique among them. 

“We were the only site where the library students got to learn what it’s like to work directly with scientists,” explained Amy Schuler, director of information services and library at Cary and adviser to the NCDS interns. 

Over eight weeks, the interns, Corey Black (University of Wisconsin-Madison), Jennifer Ye Moon-Chung (University of Pittsburgh), and Katya Mueller (Syracuse University), worked with Cary scientist mentors Barbara Han and Adrian Castellanos on three data-heavy projects related to zoonotic disease. The students also received training in tools such as R, Python, SQL, and Git, and learned what it’s like to be a data librarian.

Cary Institute’s 2023 National Center for Data Services interns. From left to right: Corey Black, Jennifer Ye Moon-Chung, and Katya Mueller.

With more data being generated than ever, and a growing emphasis on making data publicly available, more and more library science programs are offering data science specializations, Schuler explained. “The idea is that data is a collection — a research output like any other type of information — that librarians are often asked to manage, organize, describe, and curate.” 

Accordingly, the students’ projects gave them plenty of practical experience in exploring, cleaning, and visualizing data — with the additional challenge of getting acquainted with a variety of scientific terms that they’d never heard before. 

Katya Mueller dug into data related to alphaviruses, an understudied group of mosquito-borne pathogens that infect the brain, causing diseases such as chikungunya and Eastern equine encephalitis. After merging several data sets, Katya explored relationships among the data using a variety of visualizations, including a map of the geographic distribution of diseases caused by alphaviruses. 

Jenn Ye Moon-Chung explored ZooScores — a way of ranking pathogens to estimate their ability to spill over from animals to humans. Creating visualizations helped Jenn to better understand the data, uncover interesting trends, and develop new questions. 

Corey Black’s project focused on flaviviruses — viruses transmitted through ticks and mosquitoes that can cause severe diseases, including Zika, dengue, and yellow fever. Corey’s project explored whether generalist viruses (those that can infect many different types of mammals) have more potential to cause pandemics. 

All of the students’ projects led to interesting insights, said Han. “They discovered some patterns that we didn't know before, and — as is common in science — revealed lots more questions worth following up on.” 

The interns presented their work to their peers and the other NCDS mentors on August 11. They have also been accepted to present at the Midwest Data Librarian Symposium in October, where their roundtable will focus on the experiences of early professional librarians, managing expectations, barriers to entry, mitigating imposter syndrome, and the importance of mentors. 

One of the students has already created a data-cleaning app based on her project.

“It’s satisfying to know they may have gotten something that would prove useful to their career in the future,” said Castellanos.  
