1 | INTRODUCTION
Computing the three-dimensional structure of protein molecules from their amino acid sequence first emerged as an aspiration in the 1960s (1), and since then many different approaches have been tried. CASP (Critical Assessment of Structure Prediction) was introduced in 1994 with the aim of accelerating progress by rigorously assessing the performance of methods through a community-wide experiment. Every two years, members of the experimental community are asked to provide information about soon-to-be-released structures, and the amino acid sequence information is passed on to the computational community, with the challenge of calculating the corresponding three-dimensional atomic structures. The similarity between computed and experimental structures is then examined by independent assessors with the support of the UC Davis Center for CASP (https://predictioncenter.org), the outcome is discussed at an international meeting, and the findings are published in a special journal issue. This paper summarizes the results of the 15th experiment, held in 2022, and the authors are the experiment organizers. Other papers in this issue of PROTEINS are by the assessors and research groups with leading performances in various aspects of the experiment. Full details of the experiment, including targets and results, are available at https://predictioncenter.org. CASP is complemented by CAMEO (2), a continuous evaluation of computed structure accuracy that utilizes PDB weekly releases as targets.
CASP has seen massive progress in model accuracy over the course of the experiments, initially through homology modeling methods. But until recently, methods had very limited effectiveness for structures where homology modeling was not applicable, and accuracy very rarely approached that of experimental methods. In CASP13 (2018), that began to change dramatically through the effective application of convolutional neural networks, an approach that resulted in models with correct folds for the majority of targets (3). That was followed by the introduction of attention-based networks and other algorithmic advances in CASP14 (2020), resulting in accuracy judged to be competitive with experiment for about two-thirds of the single protein targets (4). Although multiple participating research groups made progress in 2020, by far the most accurate results were obtained with AlphaFold2 (AF2) from the company DeepMind.
CASP15 (2022) builds on these earlier results. Although agreement with experiment for single protein structures had largely converged by 2020, key questions remained. These include whether observed limitations, particularly for shallow sequence alignments, would be overcome, how different protocols built around AF2 would perform, and whether other new methods would match or exceed AF2 performance. Of great interest was whether and to what extent deep learning methods would prove effective in addressing other problems in computational structural biology. The scope of CASP was expanded to allow fuller investigation of that. One of the areas of most interest for the application of deep learning is that of protein complexes. For the last five rounds, CASP has included that category, in collaboration with CAPRI (5). For the first time, this CASP also includes categories on calculating RNA structure from sequence (in collaboration with RNA Puzzles (6)) and on calculating the structure of protein-small ligand complexes, which is particularly relevant to drug design. These are both areas where papers have suggested deep learning may make a major difference (7, 8). CASP also continued its long-standing category on methods to estimate the accuracy of models, with new emphasis on the estimated accuracy of protein complexes, a critical factor for structure usefulness. This CASP also includes a new category for modeling ensembles of macromolecular conformations (9). The original framing of the protein folding problem was done 50 years ago (1), when there were only experimental structures for a few small, simple, highly ordered proteins. We now appreciate that proteins and RNA may adopt different conformations, both under the same conditions and in response to changes such as ligand binding or mutations, so that speaking of ‘the conformation’ can become meaningless. Thus, computational methods should be able to reproduce multiple observed structural states.
Results for all these categories are summarized in this paper; other papers in this PROTEINS issue describe detailed assessment results (9-14) and comments on the outcome by some of those contributing targets (15, 16). The issue also contains papers by selected participating groups. Some categories no longer seen as relevant were dropped: contact prediction, which is now an integral part of deep learning methods, and refinement of initial models. The latter category, although partly successful in earlier CASPs, in 2020 could not produce models of comparable accuracy to those obtained with AF2. The ‘data assisted’ and ‘function analysis’ categories, although still relevant, were also not included in this CASP.
As set out below, this was another very exciting CASP round, with areas of major progress. Major limitations remain in some areas, but in all of them there are clear prospects for further progress.