
Tracking infectious disease


Imagine a world where a computer program could pinpoint the next infectious disease outbreak, guiding response efforts and saving lives. Cary Institute disease ecologist Dr. Barbara Han is bringing this vision closer to reality.

Most emerging infectious diseases are spread to humans by animals, with more than a billion people suffering annually. Prevention hinges on knowing which animals carry disease, where they come in contact with people, and how this interface is shaped by urbanization, poverty, agriculture, political unrest, and climate instability. To date, most infectious diseases are dealt with reactively, with the medical community scrambling to contain outbreaks.

In the age of big data, Han is harnessing computing power to develop targeted disease surveillance tools. Thanks to the efforts of field biologists, data repositories house robust information on the ecology, behavior, physiology, and distribution of the world’s wildlife. Using machine learning, a form of artificial intelligence, Han is mining these data to identify characteristics common among disease ‘reservoirs.’ These animals harbor pathogens that make other species, including humans, sick. 

Rodents have long been maligned for their role in spreading disease, from hantavirus to the plague. Fittingly, they were the first group that Han investigated. With University of Georgia colleagues, she developed an algorithm-based sorting model that assessed 2,227 rodent species for more than 50 characteristics. Not only did it predict known rodent reservoirs with 90% accuracy, it also flagged 159 new potential reservoirs and 58 new hyper-reservoirs, animals that can carry multiple pathogens.
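To make the idea of sorting species by their traits concrete, here is a minimal sketch in Python. It is not the model Han's team built: the article does not describe the algorithm in detail, so this example swaps in a simple nearest-neighbour score purely for illustration. The species names, trait names, and trait values are all made up, and `reservoir_score` and `flag_candidates` are hypothetical helper names. Unmeasured traits are marked `None` and skipped, which hints at how such a model can still work with incomplete data.

```python
import math

# Made-up records for illustration only: (name, traits scaled 0-1, known reservoir?)
# None marks a trait that has not been measured for that species.
SPECIES = [
    ("species_a", {"litters_per_year": 0.9, "age_at_maturity": 0.1, "range_size": 0.8}, True),
    ("species_b", {"litters_per_year": 0.8, "age_at_maturity": 0.2, "range_size": None}, True),
    ("species_c", {"litters_per_year": 0.2, "age_at_maturity": 0.9, "range_size": 0.1}, False),
    ("species_d", {"litters_per_year": 0.1, "age_at_maturity": 0.8, "range_size": 0.2}, False),
    ("species_e", {"litters_per_year": 0.85, "age_at_maturity": None, "range_size": 0.9}, False),
]


def distance(a, b):
    """Root-mean-square gap over the traits that both species have measured."""
    shared = [t for t in a if a[t] is not None and b.get(t) is not None]
    if not shared:
        return math.inf  # nothing to compare, so treat as maximally dissimilar
    return math.sqrt(sum((a[t] - b[t]) ** 2 for t in shared) / len(shared))


def reservoir_score(target, records, k=2):
    """Fraction of the k most similar other species that are known reservoirs."""
    others = [r for r in records if r[0] != target[0]]
    nearest = sorted(others, key=lambda r: distance(target[1], r[1]))[:k]
    return sum(1 for r in nearest if r[2]) / k


def flag_candidates(records, k=2, threshold=0.5):
    """Species not known to be reservoirs whose trait profile resembles known ones."""
    return [r[0] for r in records
            if not r[2] and reservoir_score(r, records, k) > threshold]


print(flag_candidates(SPECIES))
```

In this toy data, `species_e` is not a known reservoir but breeds often and ranges widely, much like the two known reservoirs, so it is the one the sketch flags for a closer look.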

The riskiest rodents live fast, die young, and have large geographic ranges in areas with low biodiversity. Past work has shown that animals that mature quickly and reproduce early and often tend to invest less in immune response. Fast-lived rodent species may therefore be more tolerant of pathogens, and hence better reservoirs. And since they thrive in diverse habitats, including those fragmented by development, they are more likely to come into contact with people.

Results, published in the Proceedings of the National Academy of Sciences, provide a watch list of high-risk rodents. They also highlight areas vulnerable to rodent-borne disease outbreaks, including North America, South America’s Atlantic coast, Europe, Russia, and parts of Central and East Asia. Two potential reservoir species flagged by the model were confirmed before the paper went to press. In the U.S., red-backed voles carry the tapeworms that cause echinococcosis, while in Asia Minor, Gunther’s voles harbor the leishmaniasis protozoan.

Machine learning is an exciting tool because it can model risk using incomplete datasets. While scientists have catalogued 1.9 million of Earth’s animal species, only a fraction have been carefully monitored, with efforts skewed toward wealthy nations. The disease-sleuthing model Han is refining will help bolster surveillance in less affluent regions. Ultimately, turning predictions into prevention will require collaboration with experts on the ground.

Han and colleagues are currently using machine learning to explore potential reservoirs of Ebola and other filoviruses that infect humans and other great apes.

