3 | DISCUSSION
This CASP saw consolidation of the major progress in computing the
structure of single proteins achieved in the previous round and
extension of deep learning methods to other challenges in structural
biology with impressive success for protein assemblies.
For single protein structures, the most effective methods obtained only
slightly higher levels of accuracy than those of CASP14. It’s not
surprising that there is no major advance here, since for many targets
agreement with experiment was likely already within experimental
uncertainty, leaving little room for improvement. But there are some
limitations to the methods. Shallow sequence alignments sometimes result
in poor quality structures. Large proteins (more than about 1000 amino
acids) may also have models of domains and domain interfaces that are of
slightly lower accuracy than usually achieved.
All the most successful single protein methods used variations on
AlphaFold2, sometimes embedding all or part of that software into their
existing pipelines. The next most successful method was RoseTTAFold (24),
though the difference in performance was substantial. RoseTTAFold2 is now
available (25) and benchmarking shows improved performance. Several
Large Language Models (LLMs) were included in this CASP, but performance
lagged very significantly, including for shallow alignment targets where
these methods were expected to be superior. Poor performance of these
methods in this CASP round should not be seen as a final result, and the
LLM methods may evolve to be more powerful.
Notably, running the standard version of AlphaFold2 with default
parameters either through the ColabFold server (22) or locally installed
often resulted in less accurate models. The primary reason for that
appears to be that for some targets more extensive sampling than the
default is needed. Successful groups used different approaches to
increase sampling as well as tuning of multiple sequence alignments. As
yet there are no benchmark studies that help a user choose between these
approaches so as to optimize accuracy for the least increase in
computing costs. Users are advised to carefully read the protocol
descriptions in the relevant papers in this Proteins special issue and
to keep an eye on the literature for useful studies. The key message is
that the standard protocol should not be expected to give the best
results in every case. If the structure obtained that way has acceptable
predicted accuracy, usually no additional steps are necessary. However,
if it falls short and additional computing resources are available,
further sampling is recommended.
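The decision rule in the preceding paragraph can be sketched in code. This is a minimal illustration only, not a protocol used by any CASP group: the `predict` callable, the confidence scale (0-100, in the style of a mean pLDDT), the cutoff of 80, and the seed budget are all hypothetical placeholders that a user would need to replace with their own choices.

```python
from typing import Callable, Tuple

# Hypothetical interface: given a random seed, return
# (model_id, predicted_confidence on an assumed 0-100 scale).
Predictor = Callable[[int], Tuple[str, float]]


def sample_until_acceptable(
    predict: Predictor,
    threshold: float = 80.0,  # assumed cutoff, not a value from the text
    default_seeds: int = 5,   # assumed size of the "standard" run
    max_seeds: int = 40,      # assumed compute budget
) -> Tuple[str, float, int]:
    """Return (best_model_id, best_confidence, seeds_used)."""
    best_id, best_conf = "", float("-inf")
    seeds_used = 0
    batch = default_seeds
    while seeds_used < max_seeds:
        stop = min(seeds_used + batch, max_seeds)
        for seed in range(seeds_used, stop):
            model_id, conf = predict(seed)
            if conf > best_conf:
                best_id, best_conf = model_id, conf
        seeds_used = stop
        # Stop as soon as the best model so far looks acceptable.
        if best_conf >= threshold:
            break
        # Otherwise double the total amount of sampling on the next pass.
        batch = seeds_used
    return best_id, best_conf, seeds_used


if __name__ == "__main__":
    # Toy stand-in for a real predictor: confidence grows with the seed.
    def toy(seed: int) -> Tuple[str, float]:
        return f"m{seed}", 60.0 + seed

    print(sample_until_acceptable(toy))
```

Doubling the sampling on each pass is one simple schedule among many; the text above notes that no benchmark yet identifies which strategy gives the best accuracy for the least extra computing cost.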
The major advance in this CASP was in the application of deep learning
methods to protein assemblies. For many targets, but not all, agreement
with experiment may be approaching experimental uncertainty limits, but
it should be emphasized that we do not yet have adequate calibration of
what that is. There were few transient complexes included in the target
set, and most of those were antibody-antigen or nanobody-antigen
complexes. Three non-homology complexes of this type were accurately
modeled and three were not. This is an encouraging result compared to
earlier benchmarking (29), but the methods still have a way to go. Other
apparent limitations
are for some interfaces in large targets, although the role of
experimental uncertainty in those results is unclear.
As with single proteins, the successful methods for protein complexes
had AlphaFold2 at their core but used extensive sampling beyond the
defaults. Several of the most successful groups were also successful for
single targets, and the methods overlap. As with single protein targets,
further benchmarking is needed to guide user protocol selection.
CASP has placed a long-running emphasis on not only the production of
accurate protein structures but also on the provision of reliable
estimates of co-ordinate uncertainty, with estimates provided both by
participants submitting the structures (self-estimates) and by other
groups who specialize in this area (third-party estimates). The increase
in accuracy of protein complexes in this CASP
brought new importance to accuracy estimates for these structures too.
AlphaFold2-based tertiary structure self-accuracy estimates have high
reliability, consistent with the 2020 results (40). For protein
assemblies, third-party accuracy estimates will usually allow selection
of a high-quality model, though not necessarily the best. Residue
accuracy self-estimates correlate strongly with experiment, though they
are somewhat less reliable for interface residues than elsewhere.
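The distinction between selecting "a high-quality model" and selecting "the best" can be made concrete as a selection loss: the gap in observed accuracy between the model ranked first by the estimate and the model that is actually best. The sketch below is illustrative only. The function name, the score dictionaries, and the 0-1 score scale are assumptions made for the example, not the measures used in the CASP assessment.

```python
from typing import Dict, Tuple


def selection_loss(
    estimated: Dict[str, float],  # model_id -> estimated accuracy (assumed 0-1)
    observed: Dict[str, float],   # model_id -> accuracy against experiment
) -> Tuple[str, float]:
    """Pick the model with the highest estimate; report how far it is
    from the truly best model in observed accuracy."""
    chosen = max(estimated, key=lambda model_id: estimated[model_id])
    best_observed = max(observed.values())
    return chosen, best_observed - observed[chosen]


if __name__ == "__main__":
    # Made-up scores for three hypothetical models of one target.
    est = {"a": 0.70, "b": 0.90, "c": 0.50}
    obs = {"a": 0.85, "b": 0.80, "c": 0.40}
    chosen, loss = selection_loss(est, obs)
    print(chosen, round(loss, 2))
```

A loss of zero means the estimate picked the best model; a small positive loss corresponds to the situation described above, where the selected model is of high quality but not the best available.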
CASP introduced three new categories this round, based on increased
interest in the potential for deep learning methods to lead to advances
there (17). The first is for modeling multiple structures in ensembles,
both protein and RNA. Difficulties in obtaining suitable targets limited
the significance of the results, but nevertheless, those that were
obtained are generally encouraging. As with single proteins and protein
complexes, successful methods were based on AlphaFold2 (AF2) and incorporated a variety
of enhanced sampling techniques. Although far from perfect, the results
show that ensemble modeling is possible with current deep learning
approaches. From the CASP perspective, the main difficulty going
forwards is in obtaining suitable ensemble targets. We urge those who
will have suitable experimental structures to get in touch.
Assessment of protein-ligand complexes is another new category in this
CASP. Ideal targets here are series of compounds binding to the same
protein. In spite of target limitations, a clear finding is that
classical ligand docking methods were still superior to the new deep
learning methods. These results are now a year old, and improved methods
are no doubt under development.
The final new modeling category is RNA structure (13). Here, in spite of
promising new deep learning methods having been published (7) and a
number of CASP15 groups exploring the approach, classical approaches
proved superior. Overall fold, Watson-Crick helical regions, and
their packing were often accurately modeled, but there were difficulties
with modeling the more irregular regions of the targets. The effects of
flexibility also complicate the assessment. The greater flexibility of
RNA structures suggests that current experimental procedures may not
always be adequate for assessment of computational methods. Conversely,
computational methods must produce ensembles of structures to adequately
represent experimental reality.
This round of CASP saw one very major advance (greatly improved accuracy
in modeling protein complexes) and greater clarity on the application of
AF2 to single proteins, as well as interesting and provocative starts to
modeling of macromolecular ensembles, protein-ligand complexes, and RNA
structure. We look forward to further major gains in the next CASP, in
2024.