Industry News Details


How is Machine Learning helping to develop TB drugs?

 Many biologists use machine learning (ML) as a computational tool to analyze a massive amount of data, helping them to recognise potential new drugs. MIT researchers have now integrated a new feature into these types of machine learning algorithms, enhancing their prediction-making ability.

Using this new tool allows computer models to account for uncertainty in the data they are testing, MIT researchers detected several promising components that target a protein required by the bacteria that cause tuberculosis (TB).

Although computer scientists previously used this technique, they have not taken off in biology. “It could also prove useful in protein design and many other fields of biology,” says the Simons Professor of Mathematics and head of the Computation and Biology group in MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) Bonnie Berger.

“This method is part of a known subfield of machine learning, but people have not brought it to biology,” states Berger. “It is a paradigm shift and is absolutely how biological exploration should be done.”

Assistant Professor of biological engineering at MIT and a member of the Ragon Institute of MGH, MIT, and Harvard, Bryan Bryson and Berger are the senior authors of the study that appears today in Cell Systems. Brian Hie, an MIT graduate student, is the paper’s lead author.

Better Predictions

ML is a type of computer modeling in which an algorithm learns to predict based on data that it has already seen. In the past few years, biologists have begun machine learning to scour vast databases of potential drug compounds to find molecules that interact with specific targets.

The only limitation of this technique is that the algorithms perform well when the data they’re examining is similar to their training. Algorithms are not significantly superior at evaluating molecules that are very different from those they have already seen.

The researchers applied a method called the Gaussian process to assign uncertainty values to the data that the algorithms are trained on to overcome the obstacle. When the models are analyzing the training data that way, they also consider how reliable those predictions are.

For instance, if the data go into the model, it predicts how strongly a particular molecule binds to a target protein and the uncertainty of those predictions. The model can use that information to predict protein-target interactions that it hasn’t seen before. This model also forecasts the certainty of its predictions. While analyzing new data, the model’s predictions may have lower certainty for molecules different from the training data. This information can help researchers to decide which molecules to analyse experimentally. View More