All reanalysed datasets, including the input, output, and intermediate
output of the Nextflow workflow, have been made available via PRIDE as
datasets PXD042651 and PXD042650.
Our TopDownApp is free and open to use, and can be run on a local
computer or an institutional server setup, with installation as simple as
downloading and running the containerised app. However, unlike other
computing environments for TD data processing like TDPortal or TopPIC
Gateway
[22,23],
the workflows themselves are not reconfigurable via the GUI. Instead,
high-throughput analysis can be conducted on a wide variety of compute
infrastructures by running the underlying Nextflow workflows directly.
Changing the sequence or the type of tools in the workflow needs to be
coded in Nextflow. As a beneficial side effect, users can add a new tool
to the workflow with a few lines of configuration change, provided the
software can be called from the command line and has compatible
in-/output. Another distinction is the direct integration of results
visualisation as a main component of the UI of TopDownApp.
In the future, the TopDownApp could be enhanced with additional
deconvolution and identification software modules, such as MSPathFinderT
[24],
to further increase the software options for data analysis. Moreover,
the inclusion of a dedicated label-free quantification module, such as
FLASHQuant, would be ideal to enable quantitative analysis. To ensure
robust and reproducible results, adopting additional control strategies
at different levels, including proteins, protein isoforms, and
proteoforms, would be highly beneficial. Currently, parameter
configuration through the browser is not fully implemented due to the
wide variety of supported parameters. Adoption of other PSI standard
data formats, such as the ProForma 2.0 notation
[25]
for proteoform presentation, would improve data interoperability.
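
To illustrate what a ProForma 2.0 string looks like, the following minimal Python sketch writes a proteoform as a sequence with bracketed modification names. It is not part of TopDownApp: the function name `to_proforma` and its arguments are hypothetical, and only a small subset of the notation (residue, N-terminal, and C-terminal modifications given by name or mass shift) is covered.

```python
from typing import Dict, Optional


def to_proforma(
    sequence: str,
    mods: Dict[int, str],
    n_term: Optional[str] = None,
    c_term: Optional[str] = None,
) -> str:
    """Render a sequence in ProForma 2.0-style notation.

    `mods` maps 1-based residue positions to a modification label,
    e.g. a name ("Phospho") or a signed mass shift ("+79.966").
    Hypothetical helper for illustration only.
    """
    for pos in mods:
        if not 1 <= pos <= len(sequence):
            raise ValueError(f"position {pos} is outside the sequence")

    parts = []
    if n_term:
        # N-terminal modifications precede the sequence, joined by a hyphen.
        parts.append(f"[{n_term}]-")
    for i, residue in enumerate(sequence, start=1):
        parts.append(residue)
        if i in mods:
            # Residue modifications follow the residue they modify.
            parts.append(f"[{mods[i]}]")
    if c_term:
        # C-terminal modifications follow the sequence, joined by a hyphen.
        parts.append(f"-[{c_term}]")
    return "".join(parts)


print(to_proforma("EMEVEESPEK", {2: "Oxidation", 7: "Phospho"}))
# prints EM[Oxidation]EVEES[Phospho]PEK
```

A single-line string of this kind can be stored alongside identification results and exchanged between tools without a tool-specific modification table.
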
Additionally, we expect that the availability of an open data analysis
workflow will enable the reuse and reanalysis of TD proteomics datasets
in the public domain. This would open up many possibilities, for example
the integration of the results into popular bioinformatics resources such as
PRIDE, UniProtKB
[26]
and the Human Proteoform Atlas
[27],
making TD and proteoform data more FAIR. Furthermore, in our view, the
availability of open-source analysis platforms will be essential to the
success of the envisioned Human Proteoform Project
[28].
We expect that the availability of TopDownApp as an open and shared
development platform will aid the reproducibility of TD data analysis,
providing a basis for new, automated, and easy-to-share TD data analysis
workflows. Further, we envision improved accessibility of
deconvolution and identification data. Having demonstrated one method of
visualising TD identification results, we hope it will serve as a
basis for making more analysis results (visually) accessible to
current practitioners. In the medium to long term, we expect that this and
analogous efforts can improve the integration of proteoform data in
bioinformatics resources so that all biomedical researchers can benefit
from the adoption of open science practices in the TD proteomics field.