1 | INTRODUCTION
Computing the three-dimensional structure of protein molecules from
their amino acid sequence first emerged as an aspiration in the 1960s
(1), and since then many different approaches have been tried. CASP
(Critical Assessment of Structure Prediction) was introduced in 1994
with the aim of accelerating progress by rigorously assessing the
performance of methods through a community-wide experiment. Every two
years, members of the experimental community are asked to provide
information about soon-to-be-released structures, and the amino acid
sequence information is passed on to the computational community, with
the challenge of calculating the corresponding three-dimensional atomic
structures. The similarity between computed and experimental structures
is then examined by independent assessors with the support of the UC
Davis Center for CASP (https://predictioncenter.org), the outcome
discussed at an international meeting, and findings published in a
special journal issue. This paper summarizes the results of the
15th experiment, held in 2022, and the authors are the
experiment organizers. Other papers in this issue of PROTEINS are by the
assessors and research groups with leading performances in various
aspects of the experiment. Full details of the experiment, including
targets and results, are available at https://predictioncenter.org. CASP is
complemented by CAMEO (2), a continuous evaluation of computed structure
accuracy that utilizes PDB weekly releases as targets.
CASP has seen massive progress in model accuracy over the course of the
experiments, initially through homology modeling methods. But until
recently, there was very limited effectiveness for structures where
homology was not applicable, and accuracy very rarely approached that of
experimental methods. In CASP13 (2018), that began to change
dramatically through the effective application of convolutional neural
networks. That approach resulted in models with correct folds for the
majority of targets (3). That was followed by the introduction of
attention-based networks and other algorithmic advances in CASP14
(2020), resulting in accuracy judged to be competitive with experiment
for about two-thirds of the single protein targets (4). Although
multiple participating research groups made progress in 2020, by far the
most accurate results were obtained with AlphaFold2 (AF2) from the
company DeepMind.
CASP15 (2022) builds on these earlier results. Although agreement with
experiment for single protein structures had largely converged by 2020,
key questions remained. These include whether observed limitations,
particularly for shallow sequence alignments, would be overcome, how
different protocols built around AF2 would perform, and whether other
new methods would match or exceed AF2 performance. Of great interest was
whether and to what extent deep learning methods would prove effective
in addressing other problems in computational structural biology. The
scope of CASP was expanded to allow fuller investigation of that. One of the
areas of most interest for the application of deep learning is that of
protein complexes. For the last five rounds, CASP has included that
category, in collaboration with CAPRI (5). For the first time, this CASP
also includes categories on calculating RNA structure from sequence (in
collaboration with RNA Puzzles (6)) and on calculating the structure of
protein-small ligand complexes, particularly relevant to drug design.
These are both areas where papers have suggested deep learning may make
a major difference (7, 8). CASP also continued its long-standing
category on methods to estimate the accuracy of models with new emphasis
on the estimated accuracy of protein complexes, a critical factor for
structure usefulness. This CASP also includes a new category for
modeling ensembles of macromolecular conformations (9). The original
framing of the protein folding problem was done 50 years ago (1), when
there were only experimental structures for a few small, simple, highly
ordered proteins. We now appreciate that proteins and RNA may adopt
different conformations, both under the same conditions and in response
to changes such as ligand binding or mutations, so that speaking of ‘the
conformation’ can become meaningless. Thus, computational methods should
be able to reproduce multiple observed structural states. Results for
all these categories are summarized in this paper; other papers in this
Proteins issue describe detailed assessment results (9-14) and comments
on the outcome by some of those contributing targets (15, 16). The issue
also contains papers by selected participating groups. Some categories
no longer seen as relevant were dropped: contact prediction, which is now
an integral part of deep learning methods, and refinement of initial
models. The latter category, although partly successful in earlier
CASPs, in 2020 could not produce models of comparable accuracy to those
obtained with AF2. The ‘data assisted’ and ‘function analysis’
categories, although still relevant, were also not included in this
CASP.
As set out below, this was another very exciting CASP round, with areas
of major progress. There are remaining major limitations in some areas,
but in all there are clear prospects for further progress.