DeepMind and EMBL’s European Bioinformatics Institute (EMBL-EBI) have made AI-powered predictions of the three-dimensional structures of nearly all cataloged proteins known to science. The catalog is freely and openly available to the scientific community via the AlphaFold Protein Structure Database.
The two organizations hope the expanded database will continue to increase our understanding of biology, helping countless more scientists in their work as they strive to tackle global challenges.
This milestone expands the database roughly 200-fold, from nearly 1 million protein structures to over 200 million, so that it now covers almost every organism on Earth that has had its genome sequenced. The expanded database holds predicted structures for a wide range of species, among them plants, bacteria, animals, and other organisms. This opens up new avenues of research across the life sciences that will have an impact on global challenges, including sustainability, food insecurity, and neglected diseases.
A predicted structure will now be available for practically all protein sequences in the UniProt protein database. The release also supports bioinformatics and computational work by potentially allowing scientists to spot patterns and trends across the database.
“AlphaFold now offers a 3D view of the protein universe,” said Edith Heard, Director General of EMBL. “The popularity and growth of the AlphaFold Database is testament to the success of the collaboration between DeepMind and EMBL. It shows us a glimpse of the power of multidisciplinary science.”
“We’ve been amazed by the rate at which AlphaFold has already become an essential tool for hundreds of thousands of scientists in labs and universities across the world,” said Demis Hassabis, Founder and CEO of DeepMind. “From fighting disease to tackling plastic pollution, AlphaFold has already enabled incredible impact on some of our biggest global challenges. Our hope is that this expanded database will aid countless more scientists in their important work and open up completely new avenues of scientific discovery.”
DeepMind and EMBL-EBI launched the AlphaFold Database in July 2021. At that time, it contained more than 350,000 protein structure predictions, including the entire human proteome. Subsequent updates added UniProtKB/Swiss-Prot and 27 new proteomes, 17 of which represent neglected tropical diseases that continue to devastate the lives of more than 1 billion people globally.
More than 1,000 scientific papers have cited the database, and in just over one year more than 500,000 researchers from over 190 countries have accessed it to view over two million structures.
The team has also seen researchers build on AlphaFold to create and adapt tools such as Foldseek and Dali, which allow users to search for entries similar to a given protein. Others have adopted the core machine learning ideas behind AlphaFold, either as the backbone of a slate of new algorithms in this space or in applications such as RNA structure prediction and new models for designing proteins.
AlphaFold has also shown impact in areas such as improving our ability to fight plastic pollution, gaining insight into Parkinson’s disease, increasing the health of honey bees, understanding how ice forms, tackling neglected diseases such as Chagas disease and leishmaniasis, and exploring human evolution.
“We released AlphaFold in the hopes that other teams could learn from and build on the advances we made, and it has been exciting to see that happen so quickly. Many other AI research organizations have now entered the field and are building on AlphaFold’s advances to create further breakthroughs. This is truly a new era in structural biology, and AI-based methods are going to drive incredible progress,” said John Jumper, Research Scientist and AlphaFold Lead at DeepMind.
“AlphaFold has sent ripples through the molecular biology community. In the past year alone, there have been over a thousand scientific articles on a broad range of research topics which use AlphaFold structures; I have never seen anything like it,” said Sameer Velankar, Team Leader at EMBL-EBI’s Protein Data Bank in Europe. “And this is just the impact of one million predictions; imagine the impact of having over 200 million protein structure predictions openly accessible in the AlphaFold Database.”
DeepMind and EMBL-EBI will continue to refresh the database periodically, with the aim of improving features and functionality in response to user feedback. Access to structures will continue to be fully open, under a CC-BY 4.0 license, and bulk downloads will be made available via Google Cloud Public Datasets.