Nat Methods. 2023 Feb;20(2):205-213.
doi: 10.1038/s41592-022-01685-y. Epub 2022 Nov 24.

AlphaFill: enriching AlphaFold models with ligands and cofactors


Maarten L Hekkelman et al. Nat Methods. 2023 Feb.

Abstract

Artificial intelligence-based protein structure prediction approaches have had a transformative effect on biomolecular sciences. The predicted protein models in the AlphaFold protein structure database, however, all lack coordinates for small molecules, including those essential for molecular structure or function: hemoglobin lacks bound heme, zinc-finger motifs lack the zinc ions essential for their structural integrity, and metalloproteases lack the metal ions needed for catalysis. Ligands important for biological function are absent too; no ADP or ATP is bound to any of the ATPases or kinases. Here we present AlphaFill, an algorithm that uses sequence and structure similarity to 'transplant' such 'missing' small molecules and ions from experimentally determined structures to predicted protein models. The algorithm was successfully validated against experimental structures. A total of 12,029,789 transplants were performed on 995,411 AlphaFold models and are available, together with associated validation metrics, in the alphafill.eu databank, a resource to help scientists make new hypotheses and design targeted experiments.
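The abstract says AlphaFill uses sequence and structure similarity to transplant compounds; the exact procedure is described in the paper's main text, not here. The snippet below is only a minimal sketch of the general structure-similarity step, under stated assumptions: residues of an experimental homolog have already been aligned to the AlphaFold model, their matched C-alpha coordinates are superposed by least squares (the Kabsch algorithm), and the resulting rotation and translation are applied to the ligand atoms. The function names `kabsch` and `transplant` are hypothetical and not part of AlphaFill.

```python
import numpy as np


def kabsch(P: np.ndarray, Q: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Least-squares rotation R and translation t so that P @ R.T + t ~ Q.

    P: (N, 3) matched atoms from the experimental (template) structure.
    Q: (N, 3) the corresponding atoms in the predicted model.
    """
    p_mean = P.mean(axis=0)
    q_mean = Q.mean(axis=0)
    H = (P - p_mean).T @ (Q - q_mean)          # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation, not a reflection
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - p_mean @ R.T
    return R, t


def transplant(ligand_xyz: np.ndarray, template_ca: np.ndarray,
               model_ca: np.ndarray) -> np.ndarray:
    """Move ligand coordinates from the template frame into the model frame."""
    R, t = kabsch(template_ca, model_ca)
    return ligand_xyz @ R.T + t
```

Restricting `template_ca` and `model_ca` to residues around the ligand would turn this global fit into a local one, which is often more appropriate when the protein has several domains that may be oriented differently in the template and the model.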


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Validation of the AlphaFill algorithm.
a, Distribution of the LEV score of all transplants obtained at 100% sequence identity (the validation set, n = 28,619 independent observations). For clarity, 408 transplants (1%) with an LEV score >2.5 are not shown. b, The local r.m.s.d. correlates with the LEV score in the validation set, with a Pearson correlation coefficient of 0.51 (n = 8,039; mono-atomic transplants were excluded, see main text). c, Distribution of the local r.m.s.d. of all transplants in the AlphaFill models, shown as boxplots in 10% identity ranges. Boxes are based on 3,594,940; 3,866,810; 2,079,705; 1,005,953; 495,357; 369,307; 268,904 and 252,681 transplants, respectively, and extend from the first to the third quartile, with the median as the middle line. Whiskers extend to 1.5 times the interquartile range. For clarity, 332,771; 333,325; 181,126; 79,594; 42,273; 34,634; 29,368 and 24,263 outliers, respectively, are not shown. Maximum values are 107.4, 82.1, 40.6, 37.1, 61.5, 44.4, 35.6 and 35.5 Å. d, Distribution of the TCS for all transplants in the AlphaFill models (n = 6,859,380). Mono-atomic transplants (5,170,409 compounds) are left out (see main text). e, The TCS correlates with the LEV score in the validation set, with a Pearson correlation coefficient of 0.51 (n = 8,039; mono-atomic transplants were excluded, see main text). f, Comparison of the TCS before and after energy minimization for four subsets of the validation set (each with n = 50), illustrating that refinement improves the TCS in all four subsets, from those with low to those with the highest initial TCS.
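The boxplot convention used in panel c (boxes from the first to the third quartile, whiskers at 1.5 times the interquartile range, points beyond them counted as outliers and omitted) can be reproduced roughly as follows. This is a minimal sketch: the quartile interpolation method (NumPy's default linear interpolation) is an assumption, so outlier counts on real data could differ slightly from the published ones, and `boxplot_summary` is a hypothetical helper.

```python
import numpy as np


def boxplot_summary(values, whisker=1.5):
    """Tukey-style boxplot statistics: quartiles, whisker limits, outliers."""
    x = np.asarray(values, dtype=float)
    # NumPy's default (linear) interpolation between order statistics
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    low_limit = q1 - whisker * iqr
    high_limit = q3 + whisker * iqr
    # Points beyond the whisker limits are the outliers left out of the plot
    outliers = x[(x < low_limit) | (x > high_limit)]
    return {
        "q1": float(q1),
        "median": float(median),
        "q3": float(q3),
        "whisker_limits": (float(low_limit), float(high_limit)),
        "n_outliers": int(outliers.size),
        "max": float(x.max()),
    }
```

For example, `boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])` gives quartiles 3.25 and 7.75, whisker limits (-3.5, 14.5) and a single outlier (100), which is also the maximum.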
Fig. 2. Screenshot of the AlphaFill entry page for cellular retinoic acid-binding protein 2 (AF-P29373).
The Mol* viewer on the left can be controlled from the table of transplanted compounds on the right. Clicking a compound in the table zooms in on its binding site. Compounds can be hidden or shown individually using the tick boxes. Transplants at 70% or higher sequence identity are displayed; the identity cutoff can be changed using the selector above the table. In this example, retinal (RET) inherited from PDB-REDO entry 4i9s (ref. ) is shown and flagged with a yellow box as medium confidence owing to its high TCS. With all other transplanted compounds hidden from view, the ‘optimize’ option becomes available for the selected transplant. After optimization (Supplementary Fig. 2), the TCS is reduced to 0.29 Å, which is considered high confidence. A sodium ion from PDB-REDO entry 2frs (ref. ) is flagged for its high local r.m.s.d.
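The identity selector above the table amounts to a filter on sequence identity. The sketch below is a minimal illustration under stated assumptions: each transplant record is assumed to carry its compound, source entry, identity and TCS, and the `Transplant` class and `visible_transplants` function are hypothetical, not part of AlphaFill's code or web API. Only the 70% default comes from the caption.

```python
from dataclasses import dataclass


@dataclass
class Transplant:
    compound: str    # PDB chemical component ID, e.g. "RET"
    pdb_entry: str   # source PDB-REDO entry, e.g. "4i9s"
    identity: float  # sequence identity to the AlphaFold model, 0.0-1.0
    tcs: float       # transplant clash score (Å)


def visible_transplants(transplants, identity_cutoff=0.70):
    """Transplants at or above the identity cutoff, highest identity first."""
    kept = [t for t in transplants if t.identity >= identity_cutoff]
    return sorted(kept, key=lambda t: t.identity, reverse=True)
```

Lowering `identity_cutoff` reveals transplants inherited from more distant homologs; raising it restricts the view to the closest ones.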
Fig. 3. Human myoglobin structures in AlphaFold and AlphaFill.
a, Ribbon diagram of the AlphaFold model of human myoglobin. b, The heme-shaped cavity in the AlphaFold model, in which the histidine side chains (gray cylinders colored by atom type) are positioned to facilitate heme binding. c, The heme-shaped cavity in the AlphaFill model, where the binding site is ‘filled’ with the transplanted heme group and the CO and O2 ligands; ligands are shown in stick mode colored by atom type (heme), with the heme iron as a gray sphere.
Fig. 4. Examples of transplanted zinc ions (purple spheres).
All proteins are presented as ribbon diagrams (each protein in a different color, for clarity); side chains coordinating the zinc ions are shown as cylinders colored by atom type for noncarbon atoms. a, A catalytic (top) and a structural (bottom) zinc ion in the STAM-binding protein. b, Two structural zinc ions in human BMI-1. c, A zinc ion transferred into a structural zinc-binding site in zinc-finger protein 91 (top) and a wrongly placed zinc ion in the same protein (bottom). d, The bimetallic zinc-binding site in ENPP1-7 as found in PDB-REDO models (PDB identifiers for ENPP1-7: 6weu, ref. ; 5mhp, ref. ; 6c01, ref. ; 4lqy, ref. ; 5veo, ref. ; 5egh, ref. and 5tcd, ref. , respectively), compared with the same binding site in the human ENPP1-7 AlphaFold models as available in AlphaFill, containing the two zinc ions. For clarity, only the backbone of ENPP1 is shown, as a green ribbon diagram; side chains are colored green, blue, red, pink, orange, purple and gold for ENPP1-7, respectively.
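AlphaFill itself flags questionable transplants through the local r.m.s.d. and the TCS. Panel c nevertheless shows that a transplanted zinc can end up correctly or wrongly placed, and one simple, separate way to sanity-check a zinc site is to count side-chain atoms close enough to coordinate it. The sketch below is an illustrative heuristic, not the paper's method: the set of coordinating residue/atom names and the 2.6 Å cutoff are assumptions (typical zinc-ligand bond lengths are roughly 2.0 to 2.3 Å), and `coordinating_atoms` is a hypothetical helper.

```python
import math

# Assumed cutoff: leaves some slack above typical Zn-S/N/O bond lengths.
# This is NOT an AlphaFill parameter.
COORDINATION_CUTOFF = 2.6  # Å

# Assumed set of side-chain atoms that commonly coordinate zinc
LIGAND_ATOMS = {
    ("CYS", "SG"), ("HIS", "ND1"), ("HIS", "NE2"),
    ("ASP", "OD1"), ("ASP", "OD2"), ("GLU", "OE1"), ("GLU", "OE2"),
}


def coordinating_atoms(zn_xyz, atoms, cutoff=COORDINATION_CUTOFF):
    """Return (res_name, res_num, atom_name) of side-chain atoms near the zinc.

    atoms: iterable of (res_name, res_num, atom_name, (x, y, z)) tuples.
    """
    hits = []
    for res_name, res_num, atom_name, xyz in atoms:
        if (res_name, atom_name) in LIGAND_ATOMS and math.dist(zn_xyz, xyz) <= cutoff:
            hits.append((res_name, res_num, atom_name))
    return hits
```

A structural site built from four cysteines or histidines would be expected to return about four hits, whereas a misplaced ion such as the one in panel c (bottom) would typically return few or none.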
Fig. 5. AlphaFill helps to understand the activation state of the Abl kinase AlphaFold model.
a, AlphaFill model of the ABL1 kinase with ADP and magnesium ions. The state of the kinase is not known a priori. b, AlphaFill model of the ABL1 kinase with bound ATP (mapped from AGS). c, ADP-binding site of the human ABL1 kinase in PDB-REDO entry 2g2i (ref. ), which represents an active kinase state. d, ABL1 kinase bound to AGS in PDB-REDO entry 2g2f (ref. ), which represents an ‘intermediate’ kinase state. In all panels, the kinase is presented as a gray ribbon diagram, ligands as blue cylinders colored by atom type for noncarbon atoms, and magnesium ions as blush pink spheres.
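Panel b relies on mapping a ligand analogue found in the experimental structure (AGS, ATP-γ-S) onto the natural compound it stands in for (ATP). How AlphaFill stores such mappings is not described here; the snippet below is a minimal, hypothetical lookup table illustrating the idea. Only the AGS to ATP pair comes from the caption; the ANP (AMP-PNP) entry and the `canonical_compound` function are illustrative assumptions.

```python
# Hypothetical analogue table: PDB chemical component ID of an analogue seen in
# an experimental structure -> the compound it represents.
# Only AGS -> ATP is taken from the Fig. 5 caption; ANP -> ATP is an assumption.
ANALOGUE_MAP = {
    "AGS": "ATP",  # ATP-gamma-S
    "ANP": "ATP",  # AMP-PNP, a non-hydrolysable ATP analogue
}


def canonical_compound(comp_id: str) -> str:
    """Return the compound an analogue represents, or the ID itself."""
    key = comp_id.strip().upper()
    return ANALOGUE_MAP.get(key, key)
```

A complete mapping would also have to relate individual atoms (for instance, the γ-phosphate sulfur of AGS to the corresponding oxygen of ATP), which this sketch leaves out.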

References

    1. Jumper J, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596:583–589.
    2. Baek M, et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science. 2021;373:871–876.
    3. Bairoch A, Apweiler R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 2000;28:45–48.
    4. Tunyasuvunakool K, et al. Highly accurate protein structure prediction for the human proteome. Nature. 2021;596:590–596.
    5. Perrakis A, Sixma TK. AI revolutions in biology. EMBO Rep. 2021;22:e54046.
