The use of AI to streamline drug discovery is exploding. Researchers are deploying machine learning models to help identify molecules among billions of options that might have the properties needed to develop new drugs.
But with so many variables to consider, from the cost of materials to the risk of something going wrong, assessing the cost of synthesizing the best candidates is no easy task, even if scientists use AI.
The numerous challenges associated with identifying the best and most cost-effective molecules to test are not only one of the reasons it takes so long to develop new drugs, but are also a major driver of high prescription drug prices.
To help scientists make cost-aware choices, MIT researchers developed an algorithmic framework that automatically identifies optimal molecular candidates, minimizing synthesis cost while maximizing the likelihood that the candidates have the desired properties. The algorithm also identifies the materials and experimental steps needed to synthesize these molecules.
Their quantitative framework, known as Synthesis Planning and Rewards-based Route Optimization Workflow (SPARROW), considers the cost of synthesizing a batch of molecules at once, since multiple candidates can often be derived from some of the same chemical compounds.
Additionally, this integrated approach captures key information about molecular design, property prediction, and synthetic planning from online repositories and widely used AI tools.
In addition to helping pharmaceutical companies discover new drugs more efficiently, SPARROW can be used in applications such as the invention of new pesticides or the discovery of special materials for organic electronics.
“Compound selection is now very much an art, sometimes a very successful one. But because there are all these other models and predictive tools that give us information about how molecules work and can be synthesized, we can and should use that information to make decisions,” says Connor Coley, the Class of 1957 Career Development Assistant Professor in the MIT departments of Chemical Engineering and Electrical Engineering and Computer Science, and senior author of a paper on SPARROW.
Coley is joined on the paper by lead author Jenna Fromer SM ’24. The research appears today in Nature Computational Science.
Complex cost considerations
In a sense, whether a scientist should synthesize and test a particular molecule ultimately comes down to the question of the cost of synthesis and the value of the experiment. However, determining cost or value is a difficult problem in itself.
For example, an experiment may require expensive materials or have a high risk of failure. On the value side, one might consider how useful it would be to know the properties of this molecule, or whether predictions of those properties carry a high level of uncertainty.
At the same time, pharmaceutical companies are increasingly using batch synthesis to increase efficiency. Instead of testing molecules one at a time, they use combinations of chemical building blocks to test multiple candidates at once. However, this means that chemical reactions must all require the same experimental conditions. This makes estimating cost and value more difficult.
SPARROW addresses these issues by considering the intermediate compounds shared across molecular syntheses and incorporating that information into its cost-versus-value function.
“When you think about the optimization game of designing a batch of molecules, the cost of adding a new structure depends on the molecules you’ve already chosen,” Coley said.
The framework also takes into account the cost of starting materials, the number of reactions involved in each synthetic route, and the likelihood that those reactions will succeed on the first attempt.
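To make the batch-cost idea concrete, here is a minimal Python sketch under stated assumptions. It is not SPARROW's actual cost model: the molecule names, reaction steps, starting materials, and prices are all hypothetical, and each shared step or material is simply counted once per batch.

```python
# Toy data: every name and number below is hypothetical.
STEP_COST = {"rxn_A": 5.0, "rxn_B": 3.0, "rxn_C": 4.0}
MATERIAL_COST = {"sm_1": 10.0, "sm_2": 2.0, "sm_3": 6.0}

# Each candidate maps to the reaction steps and starting materials its route needs.
ROUTES = {
    "mol_1": {"steps": {"rxn_A", "rxn_B"}, "materials": {"sm_1", "sm_2"}},
    "mol_2": {"steps": {"rxn_A", "rxn_C"}, "materials": {"sm_1", "sm_3"}},
}


def batch_cost(molecules):
    """Cost of a batch, counting each shared step and material only once."""
    steps = set()
    materials = set()
    for mol in molecules:
        steps |= ROUTES[mol]["steps"]
        materials |= ROUTES[mol]["materials"]
    return (sum(STEP_COST[s] for s in steps)
            + sum(MATERIAL_COST[m] for m in materials))


def marginal_cost(batch, new_molecule):
    """Extra cost of adding one molecule to an already chosen batch."""
    return batch_cost(set(batch) | {new_molecule}) - batch_cost(batch)


print(marginal_cost(set(), "mol_2"))      # cost of mol_2 on its own
print(marginal_cost({"mol_1"}, "mol_2"))  # cheaper: rxn_A and sm_1 are shared
```

In this toy setup, mol_2 costs less to add once mol_1 is already in the batch, which is the dependence on previously chosen molecules that Coley describes.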
To use SPARROW, a scientist provides a set of molecular compounds they are considering testing and a definition of the properties they hope to find.
From there, SPARROW gathers information about the molecules and their synthetic routes, then weighs the value of each molecule against the cost of synthesizing a batch of candidates. It automatically selects the best subset of candidates that meet the user's criteria and finds the most cost-effective synthetic routes for those compounds.
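The selection step can be illustrated with a small Python sketch. This is an assumption-laden toy, not the published method: it swaps in a brute-force search over all subsets, and the rewards, step costs, success probabilities, and the `cost_weight` parameter are all hypothetical.

```python
from itertools import combinations
from math import prod

# Hypothetical toy inputs.
REWARD = {"mol_1": 30.0, "mol_2": 20.0, "mol_3": 8.0}
STEP = {  # step name -> (cost, probability of succeeding on the first try)
    "rxn_A": (5.0, 0.9),
    "rxn_B": (3.0, 0.8),
    "rxn_C": (4.0, 0.5),
    "rxn_D": (12.0, 0.9),
}
ROUTE = {
    "mol_1": ["rxn_A", "rxn_B"],
    "mol_2": ["rxn_A", "rxn_C"],
    "mol_3": ["rxn_D"],
}


def expected_reward(mol):
    """Reward discounted by the chance that every step in the route succeeds."""
    return REWARD[mol] * prod(STEP[s][1] for s in ROUTE[mol])


def batch_cost(batch):
    """Total step cost, with steps shared between molecules counted once."""
    steps = {s for mol in batch for s in ROUTE[mol]}
    return sum(STEP[s][0] for s in steps)


def score(batch, cost_weight=1.0):
    """Expected reward of the batch minus its weighted synthesis cost."""
    return sum(expected_reward(m) for m in batch) - cost_weight * batch_cost(batch)


def best_batch(cost_weight=1.0):
    """Brute-force search over every subset of candidates (toy scale only)."""
    mols = sorted(ROUTE)
    subsets = (frozenset(c)
               for r in range(len(mols) + 1)
               for c in combinations(mols, r))
    return max(subsets, key=lambda b: score(b, cost_weight))


print(sorted(best_batch()))
```

With these made-up numbers, mol_2 does not pay for itself alone but is selected alongside mol_1 because the two share a reaction step. Brute force only works for a handful of candidates; handling hundreds, as the article reports, would need a proper optimization solver.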
“We do all of this optimization in one step, so we can capture all of our competing objectives simultaneously,” says Fromer.
A versatile framework
SPARROW is unique in that it can incorporate molecular structures that were hand-designed by humans, those that exist in virtual catalogs, and never-before-seen molecules invented by generative AI models.
“We have many different sources of ideas. Part of the beauty of SPARROW is that it puts all of these ideas on a level playing field,” adds Coley.
The researchers evaluated SPARROW by applying it to three case studies. The case studies, based on real-world problems faced by chemists, were designed to test SPARROW’s ability to find cost-effective synthetic plans while working with a wide range of input molecules.
They found that SPARROW effectively captures the marginal cost of batch synthesis and identifies common experimental steps and intermediate chemicals. It can also be scaled up to handle hundreds of potential molecular candidates.
“In the machine-learning-for-chemistry community, there are so many models that work well for retrosynthesis or molecular property prediction, for example, but how do you actually use them? Our framework aims to derive value from this previous work. By creating SPARROW, we hope to guide other researchers in thinking about compound downselection using their own cost and utility functions,” says Fromer.
In the future, the researchers hope to incorporate additional complexities into SPARROW. For example, they want to allow the algorithm to take into account that the value of testing one compound may not always be constant. They also want to include more elements of parallel chemistry in its cost-versus-value function.
“Fromer and Coley’s work better aligns algorithmic decision-making with the practical realities of chemical synthesis. With existing computational design algorithms, the task of determining how best to synthesize a set of designs is left to the medicinal chemist, resulting in less optimal choices and extra work for the medicinal chemist,” says Patrick Riley, senior vice president of artificial intelligence at Relay Therapeutics, who was not involved in this study. “This paper shows a principled path that includes consideration of combinatorial synthesis, which we hope will lead to higher quality and more accepted algorithm designs.”
“Identifying compounds to synthesize in a way that carefully balances time, cost, and potential for progress toward goals while providing useful new information is one of the most challenging tasks for drug discovery teams. Fromer and Coley’s SPARROW approach does this in an effective and automated manner, providing human medicinal chemistry teams with a useful tool and taking an important step toward a fully autonomous approach to drug discovery,” adds John Chodera, a computational chemist at Memorial Sloan Kettering Cancer Center, who was not involved in this work.
This research was supported in part by the DARPA Accelerated Molecular Discovery Program, the Office of Naval Research, and the National Science Foundation.