The COVID-19 pandemic underscores the need to develop new approaches to vaccine development and drug discovery. Traditional approaches take years, time we don’t have when responding to an emerging, highly contagious and devastating disease. However, machine-learning can greatly expedite the drug discovery process.
Last month, MIT Jameel Clinic hosted the AI Cures Drug Discovery Conference to showcase research by scientists using machine-learning to augment and accelerate drug discovery. Ten speakers presented their research, and many of them used computer science to closely model biology and chemistry, says Regina Barzilay, the faculty lead of Jameel Clinic. Most researchers postponed other work to turn quickly to COVID-19 at the start of the pandemic, while still relying on a foundation of existing research.
An example of applying concepts from other areas appeared in the first talk. Bonnie Berger, the Simons Professor of Mathematics at MIT, gave a presentation on how understanding viral evolution can aid in vaccine development and drug design. She likens viral escape, in which viruses evolve to evade recognition by antibodies, to changes in an English sentence. If “the boy pats the dog” changes to “the boy patx the dog,” the sentence is rendered meaningless, much as a virus can mutate but lose its ability to replicate. In contrast, if the sentence shifts to “the boy eats the dog,” it changes its meaning but preserves grammaticality, the equivalent of a viral protein preserving its function while escaping recognition by antibodies.
Viral escape is the reason we don’t have a comprehensive vaccine for the flu, which causes hundreds of thousands of deaths worldwide each year. Early evidence suggests the SARS-CoV-2 spike protein also exhibits viral escape, Berger says. “Ideally, understanding escape and predicting escape will inform antiviral and vaccine development.”
In response, Berger developed a novel approach to study viral escape in SARS-CoV-2, as well as influenza and HIV, using a model that’s often employed in natural language processing, the branch of machine-learning that deals with deciphering human language. The model simultaneously learns the grammar and semantics of a protein sequence. Intuitively, parts of a protein that exhibit high viral escape retain their grammar, or function, but undergo a large semantic change that lets them elude antibodies.
Berger and her group compared predictions from the model to analysis of viral escape done in the lab. The two agreed, a promising sign. The model points to a specific subunit on the SARS-CoV-2 spike protein, an area that helps the virus fuse to the cell membrane, as a promising target for vaccines and therapeutics.
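The grammar-and-semantics intuition can be sketched in a few lines of Python. This is a toy illustration, not the model from the talk: the function names, the rank-sum scoring rule, the `beta` weight, and every number below are assumptions made up for the example, and the "mutations" are the English words from the analogy rather than real protein changes. In a real setting, both scores would come from a language model trained on protein sequences.

```python
def rank(values):
    """Return ranks (1 = smallest value) for a list of numbers."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def escape_priority(mutations, beta=1.0):
    """Order candidates from most to least escape-like.

    `mutations` maps a name to (grammaticality, semantic_change).
    A candidate scores highly only if it is both plausible (high
    grammaticality) and different in meaning (high semantic change).
    The rank-sum rule and `beta` are illustrative assumptions.
    """
    names = list(mutations)
    gram_rank = rank([mutations[n][0] for n in names])
    sem_rank = rank([mutations[n][1] for n in names])
    scores = {n: s + beta * g for n, g, s in zip(names, gram_rank, sem_rank)}
    return sorted(names, key=lambda n: scores[n], reverse=True)


# Made-up scores: (grammaticality, semantic_change) for each altered word.
toy_mutations = {
    "patx": (0.01, 0.9),  # big change, but no longer a valid word
    "patz": (0.02, 0.3),  # invalid and only a modest change
    "eats": (0.60, 0.8),  # valid word with a very different meaning
    "pets": (0.70, 0.1),  # valid word, nearly the same meaning
}

ranking = escape_priority(toy_mutations)
print(ranking)
```

Ranks are used instead of raw scores so that the two quantities, which live on different scales, can be added without one swamping the other.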
Throughout the day, several talks highlighted the iterative and collaborative nature of research across institutions. Jian Tang, an assistant professor at Mila-Quebec AI Institute at HEC Montréal, presented his recent research on a fundamental problem in drug discovery. Tang, who describes himself as “classically a machine-learning person, but recently fascinated with the biological world,” set to work on representing 3D structures of molecules. Often, molecules are represented using strings of special characters or 2D molecular graphs. But molecular 3D structures, called conformations, are a more “natural and intrinsic” way to represent molecules, which can help determine both physical and biological properties in modeling.
Traditionally, 3D structure can be determined experimentally using crystallography, or using computational methods like molecular dynamics. Both approaches can be prohibitively expensive. Working from a large set of training data, Tang developed a deep generative model to predict the conformations of molecules. In experiments to evaluate the strength of the model, Tang used a vast dataset of 33 million molecular conformers generated by Rafael Gomez-Bombarelli, a professor of Materials Science and Engineering at MIT.
After Tang, Rafael Gomez-Bombarelli detailed his own research in the next talk, expanding the conversation on molecular 3D structures. Gomez-Bombarelli focuses on converting 2D graphs into 3D conformers using physical simulations. Every molecule is actually vibrating and can be represented as a distribution of geometric forms, depending on energy in the system.
“The most stable [forms] might not be the one that’s actually biologically reactive,” Gomez-Bombarelli says.
“We’re increasingly tasking our graph model to learn what the role of geometry may have been. It may have been very large, maybe small.”
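One standard way to make "a distribution of geometric forms, depending on energy" concrete is Boltzmann weighting from statistical mechanics, in which lower-energy conformers are more populated but higher-energy ones still occur. The sketch below assumes that framing; the article does not say it is the exact procedure used in this research, and the function name and example energies are hypothetical.

```python
import math

# Gas constant in kcal/(mol*K), so energies are given per mole.
R_KCAL_PER_MOL_K = 0.0019872041


def boltzmann_weights(energies_kcal_per_mol, temperature_k=298.15):
    """Return the relative population of each conformer at a temperature."""
    rt = R_KCAL_PER_MOL_K * temperature_k
    lowest = min(energies_kcal_per_mol)
    # Subtracting the lowest energy keeps the exponentials well-scaled
    # and does not change the normalized result.
    factors = [math.exp(-(e - lowest) / rt) for e in energies_kcal_per_mol]
    total = sum(factors)
    return [f / total for f in factors]


# Made-up relative energies for three conformers of one molecule.
weights = boltzmann_weights([0.0, 0.5, 2.0])
for index, weight in enumerate(weights):
    print(f"conformer {index}: {weight:.3f}")
```

Under this weighting the most stable conformer dominates, yet the others keep a nonzero share, which is consistent with the point that the most stable form need not be the biologically relevant one.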
David Gifford, a Professor of Electrical Engineering and Computer Science and a Professor of Biological Engineering at MIT, was the final researcher to speak. His recent work explores a new paradigm for designing and evaluating vaccines. The majority of potential SARS-CoV-2 vaccines are “evaluated on their ability to produce neutralizing antibodies,” Gifford says, which bind to the virus spike protein and block entry to the cells. “It’s now known that the neutralizing antibodies in convalescent patients are not durable,” he says. “Between thirteen and forty percent of patients lose these neutralizing antibodies.”
In contrast, Gifford has focused on cellular immunity, which occurs when an infected cell expresses a specific set of proteins, which act like “little billboards” on the cell surface to confer immunity.
Every person has their own set of alleles for these proteins, which determine what bits of the virus will be displayed and actually used to activate our immune system, Gifford says. When a vaccine delivers immunogens to the body, it’s important to understand how viral proteins are displayed, and whether they induce immunological memory.
To explore this nuance, Gifford and his group assayed blood from recovered COVID-19 patients to find which SARS-CoV-2 peptides provoke an immune response. Using that data, which came from a small number of people, they were able to model which SARS-CoV-2 proteins are immunogenic more broadly. Next, they used the model to score how a specific set of alleles responds to virus infection, Gifford explained. They could also use the model to estimate how many people would be covered by a given vaccine, based on the frequency of genetic variation in the population.
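A coverage estimate of this kind can be illustrated with a deliberately simplified calculation. The sketch below assumes a single gene, two independently inherited copies per person (Hardy-Weinberg proportions), and that a person counts as covered if at least one of their alleles displays a vaccine peptide. The allele names and frequencies are placeholders, and the group's actual model is certainly richer than this.

```python
def population_coverage(allele_freq, covered_alleles):
    """Fraction of people carrying at least one covered allele.

    `allele_freq` maps allele name -> population frequency (summing to 1).
    `covered_alleles` is the set of alleles that display a vaccine peptide.
    """
    covered = sum(f for allele, f in allele_freq.items() if allele in covered_alleles)
    # A person is missed only if both of their copies are uncovered alleles.
    return 1.0 - (1.0 - covered) ** 2


# Placeholder frequencies for three hypothetical alleles.
allele_freq = {"allele_1": 0.5, "allele_2": 0.3, "allele_3": 0.2}

coverage = population_coverage(allele_freq, {"allele_1"})
print(f"estimated coverage: {coverage:.2%}")
```

Because each person carries two copies, covering alleles that make up half of the gene pool reaches well over half of the people.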
Their final step involved a machine-learning model using combinatorial optimization to predict the best sequences of peptides for vaccine design, which would optimize coverage across the population. The model outperformed the baseline of 29 other peptide vaccine designs.
“Combinatorial optimization always wins,” Gifford concluded. “You can always make a much better vaccine than has been designed before.” With collaborators at MIT and Harvard, the group has plans to experimentally validate certain vaccine designs.
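To give a feel for what choosing peptides to maximize coverage involves, here is a minimal sketch that uses a greedy heuristic for the classic maximum-coverage problem. Greedy selection is a stand-in chosen for brevity, not the optimization method used by Gifford's group, and the peptide names, allele names, and frequencies are all hypothetical.

```python
def greedy_peptide_selection(peptide_alleles, allele_freq, budget):
    """Pick up to `budget` peptides, each time adding the most new coverage.

    `peptide_alleles` maps peptide -> set of alleles that display it.
    `allele_freq` maps allele -> population frequency.
    Returns the chosen peptides and the set of alleles they cover.
    """
    chosen, covered = [], set()
    for _ in range(budget):
        best, best_gain = None, 0.0
        for peptide, alleles in peptide_alleles.items():
            if peptide in chosen:
                continue
            # Only alleles not yet covered count toward the gain.
            gain = sum(allele_freq[a] for a in alleles - covered)
            if gain > best_gain:
                best, best_gain = peptide, gain
        if best is None:  # nothing left that adds coverage
            break
        chosen.append(best)
        covered |= peptide_alleles[best]
    return chosen, covered


# Made-up data: which alleles display which candidate peptide.
peptide_alleles = {
    "pep_A": {"a1", "a2"},
    "pep_B": {"a2", "a3"},
    "pep_C": {"a4"},
}
allele_freq = {"a1": 0.4, "a2": 0.3, "a3": 0.2, "a4": 0.1}

chosen, covered = greedy_peptide_selection(peptide_alleles, allele_freq, budget=2)
print(chosen, sorted(covered))
```

A greedy pass is fast but can miss the best combination, which is one reason a full combinatorial search over peptide sets can do better than simpler designs.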
“COVID-19 appeared early January. In that interim time, there has been enormous progress with machine-learning and prediction,” says molecular biologist and Nobel Prize winner Phil Sharp, who moderated the conference panel on the future of pharmacology, as shaped by the pandemic.
“We not only need to think about today and the pandemic, and how we use the tools that we have today, both antivirals and antibiotics, but also, what do we want to be building in the future,” says Anne Fischer, a program manager at DARPA.
Najat Khan, Chief Data Science Officer for Janssen R&D, the pharmaceutical company of Johnson & Johnson, spoke of “very deliberately applying data science across the entire pipeline, either to make medicine in a more effective or efficient way.”
“As a former chemist, there’s a huge amount of value if we can make more intelligent decisions early on, versus all the bespoke assays that are done today,” she says.
Khan says the company has collaborated with Dimitris Bertsimas, a professor in the Sloan School of Management and Jameel Clinic Faculty Lead, to predict COVID-19 hotspots in the United States for a vaccine program. Using a complex machine-learning model, they predicted three and a half months ago that vaccine trials would be needed in certain areas of the Midwest, where COVID-19 cases are now soaring.
“When the shutdowns happened, everybody decided to use their wares,” says panelist Noubar Afeyan, CEO of Flagship Pioneering and co-founder of Moderna. “There’s an unprecedented amount of data, and publicly available data that’s been spewing out in the last six months.”
With more data available, machine-learning efforts in drug discovery can go even further in a short amount of time.
Afeyan says, “I’d hate to see that genie put back in the bottle.”