3 | DISCUSSION
This CASP saw consolidation of the major progress in computing single protein structures achieved in the previous round, and extension of deep learning methods to other challenges in structural biology, with impressive success for protein assemblies.
For single protein structures, the most effective methods obtained only slightly higher levels of accuracy than those of CASP14. It is not surprising that there is no major advance here, since for many targets agreement with experiment was likely already within experimental uncertainty, leaving little room for improvement. But the methods do have some limitations. Shallow sequence alignments sometimes result in poor-quality structures. For large proteins (more than about 1000 amino acids), models of domains and domain interfaces may also be of slightly lower accuracy than is usually achieved.
All the most successful single protein methods used variations on AlphaFold2, sometimes embedding all or part of that software into their existing pipelines. The next most successful method was RoseTTAFold (24), though the difference in performance was substantial. RoseTTAFold2 is now available (25) and benchmarking shows improved performance. Several Large Language Models (LLMs) were included in this CASP, but their performance lagged very significantly, including for shallow alignment targets where these methods were expected to be superior. The poor performance of these methods in this CASP round should not be seen as a final result, and the LLM methods may evolve to be more powerful.
Notably, running the standard version of AlphaFold2 with default parameters, either through the ColabFold server (22) or locally installed, often resulted in less accurate models. The primary reason appears to be that some targets need more extensive sampling than the default. Successful groups used different approaches to increase sampling, as well as tuning of multiple sequence alignments. As yet there are no benchmark studies that help a user choose between these approaches so as to optimize accuracy for the least increase in computing cost. Users are advised to carefully read the protocol descriptions in the relevant papers in this Proteins special issue and to keep an eye on the literature for useful studies. The key message is not to expect the best results from the standard protocol. If the structure obtained that way has acceptable predicted accuracy, usually no additional steps are necessary. However, if it falls short, and additional computing resources are available, further sampling is recommended.
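The recommendation above (accept the default result if its predicted accuracy is acceptable, otherwise sample more) can be sketched as a simple escalation loop. This is a minimal illustration only, not a protocol used by any CASP group: the `predictor` callable, the 0-to-1 predicted accuracy scale, the `acceptable` threshold, and the sampling schedule are all hypothetical placeholders that a user would replace with their own structure prediction pipeline and settings.

```python
from typing import Callable, List, NamedTuple, Sequence, Tuple

# Hypothetical predictor interface (an assumption for illustration):
# takes a sequence and a number of samples, and returns a list of
# (model_id, predicted_accuracy) pairs, with accuracy on a 0..1 scale.
Predictor = Callable[[str, int], List[Tuple[str, float]]]


class SamplingResult(NamedTuple):
    model_id: str
    predicted_accuracy: float
    samples_used: int      # size of the last sampling round that was run
    acceptable: bool       # whether the threshold was reached


def predict_with_escalating_sampling(
    sequence: str,
    predictor: Predictor,
    acceptable: float = 0.8,                        # assumed threshold
    sample_schedule: Sequence[int] = (5, 25, 100),  # assumed schedule
) -> SamplingResult:
    """Run the cheapest sampling level first; escalate only if needed."""
    if not sample_schedule:
        raise ValueError("sample_schedule must not be empty")

    best_id, best_score = None, float("-inf")
    used = 0
    for n_samples in sample_schedule:
        used = n_samples
        # Keep the best-scoring model seen across all rounds so far.
        for model_id, score in predictor(sequence, n_samples):
            if score > best_score:
                best_id, best_score = model_id, score
        # Stop as soon as the predicted accuracy is good enough.
        if best_id is not None and best_score >= acceptable:
            return SamplingResult(best_id, best_score, used, True)

    if best_id is None:
        raise ValueError("predictor returned no models")
    # Budget exhausted: return the best model found, flagged as short.
    return SamplingResult(best_id, best_score, used, False)
```

The loop stops at the first sampling level that reaches the threshold, so extra computing cost is only incurred for targets where the default result falls short.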
The major advance in this CASP was in the application of deep learning methods to protein assemblies. For many targets, but not all, agreement with experiment may be approaching experimental uncertainty limits, but it should be emphasized that we do not yet have adequate calibration of what those limits are. There were few transient complexes in the target set, and most of those were antibody-antigen or nanobody-antigen complexes. Three non-homology complexes of this type were accurately modeled and three were not. This is an encouraging result compared to earlier benchmarking (29), but the methods still have a way to go. Other apparent limitations concern some interfaces in large targets, although the role of experimental uncertainty in those results is unclear.
As with single proteins, the successful methods for protein complexes had AlphaFold2 at their core but used extensive sampling beyond the defaults. Several of the most successful groups were also successful for single targets, and the methods overlap. As with single protein targets, further benchmarking is needed to guide user protocol selection.
CASP has placed a long-running emphasis not only on the production of accurate protein structures but also on the provision of reliable estimates of coordinate uncertainty, with estimates provided both by participants submitting the structures (self-estimates) and by other groups who specialize in this area (third-party estimates). The increase in accuracy of protein complexes in this CASP brought new importance to accuracy estimates for these structures too. AlphaFold2-based tertiary structure self-accuracy estimates have high reliability, consistent with the 2020 results (40). For protein assemblies, third-party accuracy estimates will usually allow selection of a high-quality model, though not necessarily the best. Residue accuracy self-estimates correlate strongly with experiment, though they are somewhat less reliable for interface residues than elsewhere.
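One way to examine the last point on one's own models is to correlate per-residue self-estimates with observed per-residue accuracy, separately for interface and non-interface residues. The sketch below is an illustration under stated assumptions and is not the CASP assessment procedure: it assumes the three inputs are already aligned per residue, that higher values mean higher accuracy in both the estimated and observed lists, and that interface residues have been flagged by some external criterion. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with >= 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def interface_vs_other_correlation(
    estimated: Sequence[float],    # per-residue self-estimates (assumed scale)
    observed: Sequence[float],     # per-residue accuracy versus experiment
    is_interface: Sequence[bool],  # True where a residue is at an interface
) -> Tuple[float, float]:
    """Return (r_interface, r_other) for the two residue classes."""
    if not (len(estimated) == len(observed) == len(is_interface)):
        raise ValueError("all inputs must have the same length")
    iface_est: List[float] = []
    iface_obs: List[float] = []
    other_est: List[float] = []
    other_obs: List[float] = []
    for est, obs, flag in zip(estimated, observed, is_interface):
        if flag:
            iface_est.append(est)
            iface_obs.append(obs)
        else:
            other_est.append(est)
            other_obs.append(obs)
    return pearson(iface_est, iface_obs), pearson(other_est, other_obs)
```

A markedly lower correlation for the interface class than for the remaining residues would suggest treating interface self-estimates for that model with more caution.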
CASP introduced three new categories this round, based on increased interest in the potential for deep learning methods to lead to advances there (17). The first is for modeling multiple structures in ensembles, both protein and RNA. Difficulties in obtaining suitable targets limited the significance of the results, but those that were obtained are nevertheless generally encouraging. As with single proteins and protein complexes, successful methods were based on AlphaFold2 (AF2) and incorporated a variety of enhanced sampling techniques. Although far from perfect, the results show that ensemble modeling is possible with current deep learning approaches. From the CASP perspective, the main difficulty going forward is in obtaining suitable ensemble targets. We urge those who will have suitable experimental structures to get in touch.
Assessment of protein-ligand complexes is another new category in this CASP. Ideal targets here are series of compounds binding to the same protein. In spite of target limitations, a clear finding is that classical ligand docking methods were still superior to the new deep learning methods. These results are now a year old, and improved methods are no doubt under development.
The final new modeling category is RNA structure (13). Here, in spite of promising new deep learning methods having been published (7) and a number of CASP15 groups exploring the approach, classical approaches proved superior. The overall fold, Watson-Crick helical regions, and their packing are often accurately modeled, but there were difficulties with modeling the more irregular regions of the targets. The effects of flexibility also complicate the assessment. The greater flexibility of RNA structures suggests that current experimental procedures may not always be adequate for assessment of computational methods. Conversely, computational methods must produce ensembles of structures to adequately represent experimental reality.
This round of CASP saw one very major advance (greatly improved accuracy in modeling protein complexes) and greater clarity on the application of AF2 to single proteins, as well as interesting and provocative starts to modeling of macromolecular ensembles, protein-ligand complexes, and RNA structure. We look forward to further major gains in the next CASP, in 2024.