
Drug development takes years, but machine learning can help accelerate that process. Early in the COVID-19 pandemic, Prof. Rafael Gomez-Bombarelli (MIT Department of Materials Science and Engineering) began working on machine-learning tools to help identify molecules with therapeutic effects. Gomez-Bombarelli will present his work, “Enhancing end-to-end molecular property prediction with geometrical and conformational representations,” at the AI Cures Drug Discovery Conference on October 30th.
In computational tasks such as predicting the properties of molecules, molecules are often represented as two-dimensional graphs, with annotations that characterize how atoms are arranged relative to one another. Gomez-Bombarelli and collaborator Simon Axelrod from Harvard took a different route. They generated three-dimensional structures for more than 400,000 molecules, which had been previously tested for therapeutic effects against SARS. They then used machine-learning models to predict which molecules could be active against the coronavirus. At the same time, they studied whether augmenting 2D graphs with molecules’ 3D structures, known as conformers, could improve predictions.
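To make the distinction between a 2D graph and a conformer concrete, here is a minimal Python sketch. It is not the researchers' code: the four-atom chain, the coordinates, and the helper names `bond_lengths` and `end_to_end_distance` are all hypothetical, chosen only to show that one graph can correspond to several 3D shapes.

```python
import math

# 2D graph view: atoms plus bonds (connectivity only, no geometry).
# Hypothetical four-carbon chain, used purely for illustration.
atoms = ["C", "C", "C", "C"]
bonds = [(0, 1), (1, 2), (2, 3)]

# 3D view: each conformer assigns x, y, z coordinates to every atom.
# These coordinates are made up; they are not simulation output.
conformers = {
    "extended": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (4.5, 0.0, 0.0)],
    "folded": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (0.0, 1.5, 0.0)],
}


def bond_lengths(coords, bonds):
    """Distance between each pair of bonded atoms."""
    return [math.dist(coords[i], coords[j]) for i, j in bonds]


def end_to_end_distance(coords):
    """Distance between the first and last atom of the chain."""
    return math.dist(coords[0], coords[-1])


for name, coords in conformers.items():
    print(name, bond_lengths(coords, bonds), end_to_end_distance(coords))
```

Both conformers share the same atoms, bonds, and bond lengths, so a model that sees only the graph cannot tell them apart. The distance between the two end atoms differs, and that kind of geometric information is what a 3D representation adds.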
While SARS-CoV-2 is a novel virus, its sequence and structure are very similar to those of SARS-CoV. “We looked at SARS-CoV data to see what’s the best job you can do with that, then tried to transfer that knowledge to COVID-19,” Gomez-Bombarelli says.
They used datasets from both AI Cures and the Broad Repurposing Hub. To generate 3D structures, the group applied physical simulations, following the laws of quantum mechanics, to the data in the 2D graphs. “Physics extrapolates, it’s robust, it doesn’t need training data,” Gomez-Bombarelli says. “If it’s a molecule that nobody has ever seen, then quantum mechanics still applies.” Ultimately, physical simulation can tease out information that is implicit in the graphs but hard to infer without it.
While the work focuses on COVID-19, Gomez-Bombarelli’s research goal, in part, is to assess the role that 3D geometry can play in machine learning. In general, with additional 3D information, some models used in drug repurposing may make better predictions than they would with 2D graphs alone. For other machine-learning tasks, the researchers found that adding 3D information may not improve results.
But in the specific case of transferring knowledge from SARS to COVID-19, using 3D information is more effective than only using graphs, Gomez-Bombarelli says. “It’s clear to us that there’s something in the graph that machine-learning is not able to extract. The step of getting the 3D and then doing the machine-learning is better.”
More recently, Gomez-Bombarelli has been pursuing a second project, with collaborator Bradley Pentelute (MIT Department of Chemistry), to use machine-learning models to design peptides, which are molecules made of chains of amino acids. Certain peptides can penetrate human cells to help deliver an attached drug. But for a peptide with even 40 amino acids, the number of possible sequences is astronomically large, making the task of designing peptides very complex.
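A quick calculation shows the scale of that search space. The Python sketch below assumes each position in the chain can hold any of the 20 standard amino acids. That alphabet size is an assumption made for illustration, and the function name `count_sequences` is hypothetical. Lab-synthesized peptides can also use non-standard residues, which would make the number even larger.

```python
# Assumption: 20 standard amino acids are available at every position.
NUM_STANDARD_AMINO_ACIDS = 20


def count_sequences(length, alphabet_size=NUM_STANDARD_AMINO_ACIDS):
    """Number of distinct linear sequences of the given length."""
    return alphabet_size ** length


total = count_sequences(40)
print(f"{total:.2e}")   # scientific notation
print(len(str(total)))  # number of decimal digits
```

Under that assumption, a 40-residue peptide has 20^40 possible sequences, about 1.1 × 10^52, a 53-digit number. Testing every candidate in a lab is out of the question, which is why models that propose promising sequences are useful.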
To tackle this challenge, Gomez-Bombarelli developed models that suggest peptides that may not exist in nature, but could be synthesized in a lab and used to disrupt SARS-CoV-2 or treat other diseases. The models not only predict the properties of peptides, but also predict how feasible it would be to synthesize them in a lab.
So far, peptides predicted by the models have been tested in mice and were able to successfully deliver a desired molecule to cells. It’s an encouraging result: at least at the stage of mouse and cell assays, according to Gomez-Bombarelli, this class of peptides may be the most effective yet.