All reanalysed datasets, including the input, output, and intermediate output of the Nextflow workflow, have been made available via PRIDE as datasets PXD042651 and PXD042650, respectively.
Our TopDownApp is free and open to use, and can be run on a local computer or an institutional server; installation is as simple as downloading and running the containerised app. However, unlike other computing environments for TD data processing such as TDPortal or TopPIC Gateway [22,23], the workflows themselves are not reconfigurable via the GUI. Instead, high-throughput analysis can be conducted on a wide variety of compute infrastructures by running the underlying Nextflow workflows. Changing the sequence or the type of tools in a workflow requires editing the Nextflow code. As a beneficial side effect, users can add a new tool to the workflow with a few lines of configuration change, provided the software can be called from the command line and has compatible input and output. Another distinction is the direct integration of results visualisation as a main component of the TopDownApp UI.
In the future, the TopDownApp could be enhanced with additional deconvolution and identification software modules, such as MSPathFinderT [24], to further increase the software options for data analysis. Moreover, the inclusion of a dedicated label-free quantification module, such as FLASHQuant, would be ideal to enable quantitative analysis. To ensure robust and reproducible results, adopting additional control strategies at different levels, including proteins, protein isoforms, and proteoforms, would be highly beneficial. Currently, parameter configuration through the browser is not fully implemented due to the wide variety of supported parameters. Adoption of other PSI standard data formats, such as the ProForma 2.0 notation [25] for proteoform presentation, would improve data interoperability. Additionally, we expect that the availability of an open data analysis workflow will enable the reuse and reanalysis of TD proteomics datasets in the public domain. This would open up many possibilities, for example the integration of the results into popular bioinformatics resources such as PRIDE, UniProtKB [26] and the Human Proteoform Atlas [27], making TD and proteoform data more FAIR. Furthermore, in our view, the availability of open-source analysis platforms will be essential to the success of the envisioned Human Proteoform Project [28].
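To make the ProForma 2.0 notation mentioned above more concrete, the following minimal Python sketch renders a sequence with site-specific modifications as a ProForma-style string. It is an illustration only and not part of TopDownApp: the function name `to_proforma` and its parameters are hypothetical, and only the simplest notation is covered (a bracketed modification name or mass shift after a residue, plus an optional N-terminal modification).

```python
from typing import Dict, Optional


def to_proforma(sequence: str,
                mods: Dict[int, str],
                n_term: Optional[str] = None) -> str:
    """Render a proteoform as a simple ProForma-style string.

    sequence: one-letter amino acid sequence.
    mods:     1-based residue position -> modification name or mass shift,
              e.g. {7: "Phospho"} or {2: "+15.995"}.
    n_term:   optional N-terminal modification name (hypothetical parameter).
    """
    # Reject positions that fall outside the sequence.
    for pos in mods:
        if not 1 <= pos <= len(sequence):
            raise ValueError(f"position {pos} is outside the sequence")

    parts = []
    if n_term:
        # N-terminal modifications precede the sequence, joined by a hyphen.
        parts.append(f"[{n_term}]-")
    for i, residue in enumerate(sequence, start=1):
        parts.append(residue)
        if i in mods:
            # A modification is written in brackets directly after its residue.
            parts.append(f"[{mods[i]}]")
    return "".join(parts)


if __name__ == "__main__":
    print(to_proforma("EMEVEESPEK", {2: "Oxidation", 7: "Phospho"}))
    # EM[Oxidation]EVEES[Phospho]PEK
```

A single-string representation of this kind is what would allow identification results to be exchanged between tools and resources without tool-specific parsing of modification tables.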
We expect that the availability of TopDownApp as an open and shared development platform will aid the reproducibility of TD data analysis, providing a basis for new, automated, and easy-to-share TD data analysis workflows. Further, we envision improved accessibility of deconvolution and identification data. Having demonstrated one method of visualising TD identification results, we hope it will serve as a basis for making more analysis results (visually) accessible to current practitioners. In the medium to long term, we expect that this and analogous efforts can improve the integration of proteoform data into bioinformatics resources, so that all biomedical researchers can benefit from the adoption of open science practices in the TD proteomics field.