Science: Machine Learning and Molecular Keys of Biology

Bruno Correia is a computational biologist who used to hold one rule as paramount in his lab: machine learning was not allowed. He did not consider it real science. Nevertheless, he is now using it to detect potential interactions between proteins 40,000 times faster than conventional methods can. Proteins are the complex, folded molecules that carry out most biological processes. In February 2020, the journal Nature Methods featured his system on its cover. Of his early reluctance to embrace machine learning, he now says that he was wrong, and that he is glad he was.

Geometric machine learning changed his mind. This is an emerging subfield of artificial intelligence that can learn the patterns of curved surfaces.

Proteins interact by fitting together. Their irregular shapes join like three-dimensional pieces of a complicated puzzle, and researchers have spent decades trying to figure out how they do this. The well-known protein folding problem has challenged scientists since the mid-20th century. It asks how a protein's final 3D shape is linked to its constituent amino acids, and decoding that link is a step toward understanding how proteins interact. In 1999, IBM started to develop its line of Blue Gene supercomputers to tackle the folding problem. Twenty years later, DeepMind applied state-of-the-art machine learning algorithms to it.

Machine Learning

Correia’s program is called MaSIF (Molecular Surface Interaction Fingerprinting). It avoids the inherent complexity of a protein’s 3D shape by ignoring the molecule’s internal structure. Instead, the system scans the protein’s 2D surface for what the researchers call interaction fingerprints: features, learned by a neural network, that indicate that another protein can bind there. Mohammed AlQuraishi, a protein researcher at Harvard Medical School who also uses machine learning, explained the idea: when any two molecules come together, what they are essentially presenting to one another is a surface, so the surface is all you need to analyze. He called the method very, very innovative.

A surface-focused framework for predicting protein interactions, such as MaSIF, can help accelerate so-called de novo protein design, which tries to synthesize useful proteins from scratch instead of relying on those that occur naturally. Michael Bronstein, a geometric machine learning expert at Imperial College London who helped to develop the system, said that such an approach could also be used for basic biology. As an example, he asked how cancer affects the properties of a protein. If mutations are the result of the disease, he said, they can destroy something in the protein that makes it work differently. According to Bronstein, MaSIF can answer these fundamental questions.

To understand how deep learning creates protein fingerprints, Bronstein suggests looking at digital cameras from the early 2000s. Those models had face detection algorithms that did quite a simple job. Bronstein explained that you just need to detect that there is a face, with a mouth, a nose, and eyes. This works regardless of whether the face has thick lips or thin lips, a short nose or a long nose.