In the Americas, primate species likely to harbor Zika – and potentially transmit the virus – are common, abundant, and often live near people. So reports a study published in Epidemics. Findings are based on an innovative model developed by a collaborative team of researchers from Cary Institute of Ecosystem Studies and IBM Research through its Science for Social Good initiative.
Lead author Barbara Han, a disease ecologist at Cary Institute, explains: “When modeling disease systems, data gaps can undermine our ability to predict where people are at risk. Globally, only two primate species have been confirmed positive for Zika virus. We were interested in how a marriage of two modeling techniques could help us overcome limited data on primate biology and ecology – with the goal of identifying surveillance priorities.”
The recent Zika epidemic in the Americas was one of the largest outbreaks in modern times, infecting over half a million people. Like other mosquito-borne flaviviruses, Zika circulates in the wild. In regions where mosquitoes feed on both primates and people, primates can serve as reservoirs from which infection spills over into human populations.
By analyzing data on flaviviruses and the traits of primate species known to carry them, and comparing these traits to those of 364 primate species that occur globally, the model identified known flavivirus carriers with 82% accuracy and assigned risk scores to additional primate species likely to carry Zika virus. The end product includes an interactive map that takes primate geographic ranges into account to identify hotspots where people are most at risk of Zika spillover.
Primate species in the Americas with Zika risk scores over 90% included the tufted capuchin (Cebus apella), the Venezuelan red howler (Alouatta seniculus), and the white-faced capuchin (Cebus capucinus) – species adapted to living among people in developed areas. Also on the list: white-fronted capuchins (Cebus albifrons), commonly kept as pets and captured for live trade, and squirrel monkeys (Saimiri boliviensis), which are hunted for bushmeat in parts of their range.
“These species are geographically widespread, with abundant populations that live near human population centers. They are notorious crop raiders. They’re kept as pets. People display them in cities as tourist attractions and hunt them for bushmeat. In terms of disease spillover risk, this is a highly alarming result,” says coauthor Subho Majumdar.
Adding to the concern: the mosquito species most likely to spread Zika are commonly found near humans, and are able to thrive in natural and altered landscapes.
To overcome data gaps, the team combined two statistical tools – multiple imputation and Bayesian multi-label machine learning – to assign each primate species a risk score indicating its potential for Zika positivity.
Traits of six mosquito-borne diseases were assessed: yellow fever, dengue fever, Japanese encephalitis, St. Louis encephalitis, Zika virus, and West Nile virus. Three of these had known primate reservoirs.
Biological and ecological traits of the 18 primate species that have tested positive for any mosquito-borne flavivirus were compared to the traits of 364 primate species that occur globally. Thirty-three features were assessed, including metabolic rate, gestation period, litter size, and behavior. Features were weighted for importance in predicting Zika positivity.
Han explains: “Like all pathogens, Zika virus has unique requirements for what it needs in an animal host. To determine which species could harbor Zika, we need to know what these traits are, which species have these traits, and which of these species can transmit the pathogen to humans. This is a lot of information, much of which is unknown.”
To overcome data limitations, a statistical method called Multiple Imputation by Chained Equations (MICE) was used. MICE sets computer algorithms to the task of searching through datasets of organism traits to draw connections between organisms with similar or related traits. When the algorithm encounters a missing data entry, it uses these connections to infer the missing information and fill the ‘blanks’ in the dataset.
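The idea behind chained-equations imputation can be illustrated with a minimal Python sketch. This is not the team's code: the two-column trait table, the values in it, and the function names (`fit_line`, `impute_once`, `multiple_imputation`) are hypothetical, and the sketch uses a simple straight-line regression between two traits, whereas a real analysis would draw on all 33 features at once.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares slope, intercept, and residual spread for y ~ x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return slope, intercept, statistics.pstdev(resid)


def impute_once(rows, rng, n_iter=10):
    """Fill the None gaps of a two-column table by chained regressions."""
    missing = [[r[j] is None for j in range(2)] for r in rows]
    # Start by filling every gap with the mean of the observed values.
    means = [statistics.fmean(r[j] for r in rows if r[j] is not None)
             for j in range(2)]
    filled = [[means[j] if r[j] is None else r[j] for j in range(2)]
              for r in rows]
    for _ in range(n_iter):
        for target in range(2):
            other = 1 - target
            # Fit the target trait on the other trait, using only rows
            # where the target was actually observed.
            obs = [i for i in range(len(rows)) if not missing[i][target]]
            slope, intercept, sd = fit_line(
                [filled[i][other] for i in obs],
                [filled[i][target] for i in obs],
            )
            # Replace each gap with a prediction plus random noise, so
            # that uncertainty about the missing value is preserved.
            for i in range(len(rows)):
                if missing[i][target]:
                    pred = intercept + slope * filled[i][other]
                    filled[i][target] = pred + rng.gauss(0, sd)
    return filled


def multiple_imputation(rows, m=5, seed=0):
    """Return m completed copies of the table, each with different draws."""
    rng = random.Random(seed)
    return [impute_once(rows, rng) for _ in range(m)]


# Hypothetical toy table: two traits per species, with gaps marked None.
toy_traits = [(1.0, 2.1), (2.0, None), (3.0, 6.2), (None, 8.0), (5.0, 9.9)]
completed = multiple_imputation(toy_traits, m=3)
```

Each of the completed tables keeps the observed values untouched and differs only in the filled-in entries. Running the downstream model on every copy and pooling the results is what makes the imputation "multiple".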
Machine learning was applied to this ‘filled in’ dataset to predict primate species most likely to carry Zika virus. The model produced a risk score for each species by combining flavivirus infection history and biological traits to predict the likelihood of Zika positivity.
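As a rough illustration of how a model can turn traits into a risk score, the Python sketch below trains a plain logistic regression on a toy table. This is a deliberately simpler stand-in, not the Bayesian multi-label method used in the study. The features, labels, and function names are hypothetical, and treating untested species as negatives is an assumption made only to keep the example small.

```python
import math


def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(features, labels, lr=0.1, epochs=2000):
    """Fit logistic-regression weights by batch gradient descent."""
    n_feat = len(features[0])
    weights = [0.0] * n_feat
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * n_feat
        grad_b = 0.0
        for x, y in zip(features, labels):
            pred = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = pred - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def risk_score(x, weights, bias):
    """Predicted probability (0 to 1) that a species is a carrier."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical traits per species: [standardized body mass, lives near people].
toy_features = [[0.5, 1], [1.0, 1], [0.8, 1], [-0.5, 0], [-1.0, 0], [-0.2, 0]]
# Hypothetical labels: 1 = known flavivirus carrier, 0 = assumed non-carrier.
toy_labels = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic(toy_features, toy_labels)
score_for_untested_species = risk_score([0.9, 1], weights, bias)
```

The score is a probability between 0 and 1, so it can be ranked across species or compared against a cutoff such as the 90% threshold mentioned above.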
This method could help improve forecasting models for other disease systems, beyond Zika. Senior author Kush Varshney from IBM Research explains, “Data gaps are a reality, especially in infectious diseases that originate from wild animal hosts. Models like the one we developed can overcome some of these gaps and help pinpoint species of concern to fine-tune surveillance, forecast spillover events, and help guide efforts by the public health community.”
Varshney adds, “Conducting machine learning on small-sized, incomplete, and noisy datasets to support critical decision making is a challenge shared across many industries and sectors. We will surely use the experience gained from this project in many different application areas.”
Han concludes, “This research was made possible by innovations provided by the broader scientific community. We relied on primate and pathogen data collected by hundreds of field researchers, and the base machine learning and imputation methods that we adapted in this research already existed. Partners at IBM Research took on a lion’s share of the math and coding. It was an incredibly successful interdisciplinary collaboration – the kind we need more of if we want to find new solutions to complex problems.”
Han, Barbara; Majumdar, Subhabrata; Calmon, Flavio P.; Glicksberg, Benjamin S.; Horesh, Raya; Kumar, Abhishek; Perer, Adam; von Marschall, Elisa B.; Wei, Dennis; Mojsilović, Aleksandra; Varshney, Kush R. (2019). Confronting data sparsity to identify potential sources of Zika virus spillover infection among primates. Epidemics. doi:10.1016/j.epidem.2019.01.005.
Barbara Han – Cary Institute of Ecosystem Studies, Subhabrata Majumdar – IBM Research and University of Florida Informatics Institute (now at AT&T Labs), Flavio Calmon – IBM Research (now at Harvard University), Benjamin Glicksberg – IBM Research (now at University of California, San Francisco), Raya Horesh – IBM Research, Abhishek Kumar – IBM Research (now at Google Brain), Adam Perer – IBM Research (now at Carnegie Mellon University), Elisa von Marschall – The Weather Company, Dennis Wei – IBM Research, Aleksandra Mojsilović – IBM Research, Kush Varshney – IBM Research.
Cary Institute of Ecosystem Studies is an independent nonprofit center for environmental research. Since 1983, our scientists have been investigating the complex interactions that govern the natural world and the impacts of climate change on these systems. Our findings lead to more effective management and policy actions and increased environmental literacy. Staff are global experts in the ecology of: cities, disease, forests, and freshwater.
IBM’s Science for Social Good initiative partners IBM Research scientists and engineers with academic fellows and subject matter experts from a diverse range of non-governmental organizations (NGOs) to tackle emerging societal challenges using science and technology. Inspired by other successful philanthropic IBM initiatives that prioritize the giving of time and talent to governments and non-profits, Science for Social Good is built on the premise that applied science and technology can solve the world’s toughest problems, accelerating the pace of solutions through the scientific method.