TopDownApp is open source and available under the Apache 2.0 licence,
which permits free use and further software development. The source code
can be found at https://github.com/mwalzer/TopDownApp.
The application is designed to work on local machines as well as in
remote settings. In addition, the framework can be used to develop
scaled-up and/or automated use cases, since the different parts are
functional on their own or in combination. For example, the workflow can
be used to automate the analysis, in particular on research compute
infrastructures, which frequently support the use of Singularity
containers [17].
All tools used in the workflow have to be containerised and can be
specified via a customisable configuration file. Note, however, that the
workflow needs to reflect potential tool command and parameter changes
first, and that data hand-over compatibility needs to be ensured
(preferably via the use of mzML and mzTab).
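As an illustration of the kind of check this implies, the following minimal sketch validates a tool configuration before a run. The step names, dictionary keys and container image references are hypothetical placeholders and do not reflect the actual configuration schema of TopDownApp; the sketch only assumes that each step names a containerised tool and hands its data over in mzML or mzTab.

```python
# Hypothetical configuration: step names, keys and image references are
# illustrative placeholders, not TopDownApp's actual configuration schema.
workflow_config = {
    "deconvolution": {
        "tool": "FLASHDeconv",
        "container": "example.org/flashdeconv:latest",
        "output_format": "mzML",
    },
    "identification": {
        "tool": "TopPIC",
        "container": "example.org/toppic:latest",
        "output_format": "mzTab",
    },
}

# Hand-over formats assumed for this sketch.
ALLOWED_FORMATS = {"mzML", "mzTab"}


def validate_config(config):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    for step, settings in config.items():
        # Every tool is expected to be containerised.
        if not settings.get("container"):
            problems.append(f"{step}: no container image specified")
        # Data hand-over between steps should use a supported format.
        if settings.get("output_format") not in ALLOWED_FORMATS:
            problems.append(f"{step}: unsupported hand-over format")
    return problems


problems = validate_config(workflow_config)
```

A check of this kind can catch a missing container or an incompatible hand-over format early, but it cannot detect changed tool commands or parameters, which still need to be reflected in the workflow itself.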
In addition to the containerisation of the tools, the processes of
running the workflow and the data visualisation are containerised as
well. As a result, the minimal setup for a user is the container system
(Singularity/Docker) and a combined container (see Supplementary
Material). The application can be started via a single command and then
used through a web browser (see Supplementary Material and the code
repository at https://github.com/mwalzer/TopDownApp for more details).
Here, a local MS raw file and a protein sequence
database (in FASTA format) can be selected as input, and the protein
modification parameters can also be set. Currently, the user interface
allows the selection of a number of standard protein modifications and
combinations thereof. From there, the analysis workflow can be started,
and the results can be inspected once the data analysis process is
finished. The successfully deconvolved spectra can be selected from a
table for visualisation against their original form (see
Supplementary Figure S2). Within each spectrum, each peak can be
selected for deconvolution visualisation. Identified spectra are listed
in a table, from which one can also select the corresponding spectra to
be visualised. New visualisations can be developed using a Python
notebook and added to the user interface (see Supplementary Material).
The identification results are also available to download in the form of
a development version of the mzTab standard format for TD proteomics
data (see Supplementary Material).
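Because mzTab is a tab-separated text format in which each line starts with a section prefix (for example MTD for metadata, PSH for the PSM header and PSM for the rows), downloaded results can be read with very little code. The following is a minimal sketch under the assumption that the TD development version keeps this line-prefix layout; the example content, sequences and accessions are made-up placeholders, and real files carry many more columns.

```python
import csv
import io

# Hypothetical, minimal mzTab-like content with placeholder values.
EXAMPLE_MZTAB = (
    "MTD\tmzTab-version\t1.0.0\n"
    "PSH\tsequence\tPSM_ID\taccession\tcharge\n"
    "PSM\tPEPTIDEA\t1\tPROT_A\t12\n"
    "PSM\tPEPTIDEB\t2\tPROT_B\t9\n"
)


def read_psm_rows(handle):
    """Collect PSM lines as dictionaries keyed by the PSH header columns."""
    header = None
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields:
            continue  # skip empty lines
        if fields[0] == "PSH":
            header = fields[1:]
        elif fields[0] == "PSM" and header is not None:
            rows.append(dict(zip(header, fields[1:])))
    return rows


psms = read_psm_rows(io.StringIO(EXAMPLE_MZTAB))
```

All values are returned as strings; type conversion (for example of the charge column) is left to the caller in this sketch.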
The mzTab format was chosen because it has proven to be effective for
representing identification data in other rapidly developing
specialisations of MS technologies, including metabolomics and
lipidomics [8].
Furthermore, the format adaptations necessary for TD data are minor, and
we expect a straightforward process to establish a community-agreed
extension of mzTab. Likewise, the representation of TD peak data in mzML
is already supported by the data standard. However, for complete
compatibility, the deconvolution data needs to be represented and
formalised in a format extension, for which we developed a functional
proposal here. We used a TD spectrum reporting convention that attaches
deconvolution information, such as charge and isotope target, as
userParam entries in the spectrum schema representation, together with
an implicit reinterpretation of the m/z array values as masses. This is
compatible with the current release of Pyteomics [16], the widely used
Python library for consuming mzML (and other PSI data standards), and
with many other mzML-capable software libraries. For now, only
FLASHDeconv supports the deconvolution-specific userParam section in
mzML output, and thus visualisation of deconvolution is exclusive to
FLASHDeconv. For both the mzML and mzTab extensions, input from the
community will be needed before they can become accepted data standards
for representing TD data. For details on the format specification
extensions, see the
Supplementary Material.
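To make the reporting convention more concrete, the following minimal sketch collects the userParam entries attached to each spectrum element of an mzML document, using only the Python standard library. The embedded fragment is heavily trimmed and not a valid mzML file, and the userParam names and values in it are hypothetical placeholders; the names actually written by FLASHDeconv are given in the Supplementary Material.

```python
import xml.etree.ElementTree as ET

# Namespace of the PSI mzML schema.
NS = {"mz": "http://psi.hupo.org/ms/mzml"}

# Hypothetical, heavily trimmed fragment; userParam names are illustrative.
EXAMPLE_MZML = """<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <run id="run1">
    <spectrumList count="1">
      <spectrum id="scan=1" index="0" defaultArrayLength="0">
        <userParam name="example charge list" value="10,11,12"/>
        <userParam name="example isotope target" value="0"/>
      </spectrum>
    </spectrumList>
  </run>
</mzML>"""


def spectrum_user_params(xml_text):
    """Map each spectrum id to a dict of its direct userParam name/value pairs."""
    root = ET.fromstring(xml_text)
    result = {}
    for spectrum in root.iterfind(".//mz:spectrum", NS):
        result[spectrum.get("id")] = {
            param.get("name"): param.get("value")
            for param in spectrum.findall("mz:userParam", NS)
        }
    return result


params = spectrum_user_params(EXAMPLE_MZML)
```

Since userParam is a regular element of the mzML schema, software that is unaware of the convention can still read such files and simply ignores the additional entries.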
We demonstrate the utility of TopDownApp with two datasets, the first to
showcase examples of visualisation, and the second to provide a
re-assessment of a previously published human dataset. The first is an
E. coli lysate, measured using a Thermo Scientific Orbitrap
Eclipse (MassIVE dataset accession MSV000087484) [18].
There, we used an E. coli (strain: K12 MG1655 i) proteome
database from UniProtKB/SwissProt (canonical; release 2023_01), and
selected one modification (Oxidation [unimod:35]). The workflow
configuration was FLASHDeconv for deconvolution and TopPIC for
identification. We used the UI to visualise the precursor
deconvolutions. Figure 2 shows how TopDownApp can be used to examine the
signal quality of the precursors corresponding to the identified
proteoforms, which is crucial information for the quality control of
identification [19] [20].
Figure 2 (left and right panels) shows visualisation examples of high-
and low-quality precursor signals, respectively. For each precursor mass,
TopDownApp shows all its differently charged isotope packets in the
input raw spectrum (blue colour coded peaks) as well as noisy peaks
around them (red peaks). Users can easily distinguish signal and noise
components by colour and do not need to search separate m/z value
regions to observe peaks with different charges. In this way,
TopDownApp organises the precursor information from different places in
the spectrum for easy appraisal.
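The reason these peaks are spread across the spectrum follows from the relation between a neutral mass M and its m/z at charge z in positive mode, m/z = (M + z * m_p) / z, where m_p is the proton mass. The following minimal sketch computes the expected positions for a range of charge states; the function names and the example mass are illustrative assumptions and are not taken from the TopDownApp code.

```python
# Proton mass in Da.
PROTON_MASS = 1.007276466621


def expected_mz(neutral_mass, charge):
    """Expected m/z of a neutral mass carrying `charge` protons."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def charge_state_positions(neutral_mass, charges):
    """Map each charge state to the expected m/z of the given mass."""
    return {z: expected_mz(neutral_mass, z) for z in charges}


# Example: a hypothetical 10 kDa precursor observed at charges 8 to 12.
positions = charge_state_positions(10000.0, range(8, 13))
for z, mz in positions.items():
    print(f"charge {z:2d}: m/z {mz:.4f}")
```

For this example mass, the charge states 8 to 12 span several hundred m/z units, which illustrates why a combined view of all charge states of one precursor is convenient.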