Armed with a rapid response research grant from the National Science Foundation, University of Oregon biologist Stilianos Louca is mining public databases for genomic and associated data being filed on the coronavirus that causes COVID-19.
Unlike Dr. John Snow, the father of epidemiology, who took an on-the-ground approach in the mid-1850s to find the source of a cholera outbreak in London, Louca is working on computers. The hope is to build a robust phylogenetic tree with predictive power to help guide medical decisions and public policy about the disease.
Matthew W. Pennell, an evolutionary biologist at the University of British Columbia, is the project’s co-principal investigator.
“Our goal is to develop more accurate statistical methods for estimating epidemiological parameters of infectious diseases from phylogenetic data, such as transmission rates and the basic reproduction ratio, and apply these methods to improve our understanding and predictions for COVID-19,” said Louca, an assistant professor in the Department of Biology and the Institute of Ecology and Evolution.
Sequenced viral genomes are submitted in real time by researchers around the world to two primary, open-access databases: GenBank of the National Center for Biotechnology Information and the GISAID Initiative, originally known as the Global Initiative on Sharing All Influenza Data.
Genomes are typically submitted along with other useful data, including cities, countries and sampling dates, which are valuable for modeling the spread of the epidemic, Louca said.
Generally, phylogenetic trees constructed from viral genomes sampled from patients contain information about the historical pattern of transmission and dispersal of infectious diseases. Mathematical models of evolution allow researchers to infer critical epidemiological parameters, such as transmission rates, from information encoded in phylogenetic trees.
Such models to date, however, have potentially serious flaws, Louca and Pennell argued in a paper published in the April 23 print issue of the journal Nature. Many commonly used mathematical models of disease evolution, they concluded, may yield highly inaccurate parameter estimates and may severely underestimate associated uncertainties.
In their new one-year NSF project, they aim to clarify which epidemiological insights can be reliably inferred from phylogenetic trees and to develop new approaches to characterize COVID-19 transmission. Additionally, Louca said, they hope to use the phylogenetic data to determine which environmental, biological and policy factors affect the spread of COVID-19.
—By Jim Barlow, University Communications