TopDownApp is open source and freely available under an Apache 2.0 licence, which permits free use and further software development. The source code can be found at https://github.com/mwalzer/TopDownApp. The application is designed to work on local machines as well as in remote settings. The framework can also be used to develop scaled-up and/or automated use cases, since its different parts are functional on their own or in combination. For example, the workflow can be used to automate the analysis, in particular on research compute infrastructures, which frequently support the use of Singularity containers [17]. All tools used in the workflow have to be containerised and can be specified via a customisable configuration file. Note, however, that when a tool is exchanged or updated, the workflow must first be adapted to reflect any changes in tool commands and parameters, and that data hand-over compatibility needs to be ensured (preferably via the use of mzML and mzTab).
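As an illustration of how a configuration file can map workflow steps to containerised tools, the following minimal Python sketch builds a `singularity exec` command line from a configuration entry. The configuration keys, image file names, tool names, and flags are hypothetical placeholders chosen for this example; they do not reproduce the actual TopDownApp configuration format, which is documented in the code repository.

```python
import shlex

# Hypothetical configuration: the keys, image names, tool names, and flags are
# placeholders for illustration and are NOT taken from the TopDownApp repository.
CONFIG = {
    "deconvolution": {
        "image": "deconvolution_tool.sif",
        "command": ["deconv_tool", "--in", "{input}", "--out", "{output}"],
    },
    "identification": {
        "image": "identification_tool.sif",
        "command": ["ident_tool", "--db", "{database}", "--spectra", "{input}"],
    },
}


def build_step_command(config, step, **paths):
    """Return the argument list that runs one workflow step in its container."""
    entry = config[step]
    # Fill the path placeholders of the tool command for this particular run.
    filled = [part.format(**paths) for part in entry["command"]]
    # 'singularity exec <image> <command...>' runs the command inside the image.
    return ["singularity", "exec", entry["image"], *filled]


if __name__ == "__main__":
    cmd = build_step_command(
        CONFIG, "deconvolution", input="run.mzML", output="deconvolved.mzML"
    )
    # Print a shell-quoted version of the command for inspection.
    print(shlex.join(cmd))
```

In such a layout, exchanging a tool would amount to editing one configuration entry, provided that the replacement tool reads and writes the same hand-over formats.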
In addition to the tools, the processes of running the workflow and visualising the data are containerised as well. As a result, the minimal setup for a user is the container system (Singularity/Docker) and a combined container (see Supplementary Material). The application can be started via a single command and then used through a web browser (see Supplementary Material and the code repository at https://github.com/mwalzer/TopDownApp for more details). Here, a local MS raw file and a protein sequence database (in FASTA format) can be selected as input, and the protein modification parameters can also be set. Currently, the user interface allows the selection of a number of standard protein modifications and combinations thereof. From there, the analysis workflow can be started, and the results can be inspected once the data analysis process is finished. The successfully deconvolved spectra can be selected from a table for visualisation against their original form (see Supplementary Figure S2). Within each spectrum, each peak can be selected for deconvolution visualisation. Identified spectra are listed in a table, from which the corresponding spectra can also be selected for visualisation. New visualisations can be developed using a Python notebook and added to the user interface (see Supplementary Material). The identification results are also available for download in a development version of the mzTab standard format for TD proteomics data (see Supplementary Material).
The mzTab format was chosen because it has proven to be effective for representing identification data in other quickly developing specialisations of MS technologies, including metabolomics and lipidomics [8]. Furthermore, the format adaptations necessary for TD data are minor, and we expect a straightforward process to establish a community-agreed extension of mzTab. Likewise, the representation of TD peak data in mzML is already supported by the data standard. However, for complete compatibility, the deconvolution data needs to be represented and formalised in a format extension, for which we developed a functional proposal here. We used a TD spectrum reporting convention that attaches deconvolution information, such as charge and isotope target, in the userParam section of the spectrum schema representation, together with an implicit reinterpretation of the m/z array values as masses. This is compatible with the current release of Pyteomics [16], the widely used Python library for consuming mzML (and other PSI data standards), and with many other mzML-capable software libraries. For now, only FLASHDeconv supports the deconvolution-specific userParam section in mzML output, and thus visualisation of deconvolution is exclusive to FLASHDeconv. For both the mzML and the mzTab extensions, input from the community will be needed before they can become accepted data standards for representing TD data. For details on the format specification extensions, see the Supplementary Material.
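To make the userParam convention described above more concrete, the following minimal Python sketch collects the userParam entries attached to each spectrum of an mzML document using only the standard library. The embedded XML fragment is heavily trimmed (it is not schema-valid mzML), and the userParam names and values in it are placeholders assumed for this example; the names actually written by FLASHDeconv are given in the Supplementary Material.

```python
import xml.etree.ElementTree as ET

# Namespace of the PSI mzML schema.
MZML_NS = {"mz": "http://psi.hupo.org/ms/mzml"}

# Trimmed, illustrative fragment. The userParam names and values are
# placeholders (assumptions), not the names defined in the format proposal.
EXAMPLE_MZML = """<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <run id="example_run">
    <spectrumList count="1">
      <spectrum id="scan=1" index="0" defaultArrayLength="0">
        <userParam name="example peak charges" value="10,11,12" type="xsd:string"/>
        <userParam name="example isotope targets" value="0,1,0" type="xsd:string"/>
      </spectrum>
    </spectrumList>
  </run>
</mzML>"""


def spectrum_user_params(mzml_text):
    """Map each spectrum id to a dict of its direct userParam name/value pairs."""
    root = ET.fromstring(mzml_text)
    result = {}
    for spectrum in root.iterfind(".//mz:spectrum", MZML_NS):
        params = {}
        # Only userParam elements that are direct children of the spectrum
        # element are collected; nested ones (e.g. in scans) are ignored.
        for param in spectrum.findall("mz:userParam", MZML_NS):
            params[param.get("name")] = param.get("value")
        result[spectrum.get("id")] = params
    return result


if __name__ == "__main__":
    for spectrum_id, params in spectrum_user_params(EXAMPLE_MZML).items():
        print(spectrum_id, params)
```

Because userParam is a regular element of the mzML schema, readers that are unaware of the convention can still parse such files; they simply do not interpret the additional deconvolution annotations.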
We demonstrate the utility of TopDownApp with two datasets, the first to showcase examples of visualisation, and the second to provide a re-assessment of a previously published human dataset. The first is an E. coli lysate, measured using a Thermo Scientific Orbitrap Eclipse (MassIVE dataset accession MSV000087484) [18]. There, we used an E. coli (strain: K12 MG1655 i) proteome database from UniProtKB/SwissProt (canonical; release 2023_01), and selected one modification (Oxidation [unimod:35]). The workflow configuration was FLASHDeconv for deconvolution and TopPIC for identification. We used the UI to visualise the precursor deconvolutions. Figure 2 shows how TopDownApp can be used to examine the signal quality of the precursors corresponding to the identified proteoforms, which is crucial information for the quality control of identification [19] [20]. The left and right panels of Figure 2 show visualisation examples of high and low quality precursor signals, respectively. For each precursor mass, TopDownApp shows all its differently charged isotope packets in the input raw spectrum (blue peaks) as well as the noisy peaks around them (red peaks). Users can easily distinguish signal and noise components by colour and do not need to search separate m/z value regions to observe peaks with different charges. In this way, TopDownApp organises the precursor information from different places in the spectrum for easy appraisal.
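To illustrate why the signals of one precursor mass are spread across separate m/z regions of the raw spectrum, the following minimal Python sketch computes the expected m/z positions of a neutral mass at several charge states. It assumes positive-mode ions charged by protonation and approximates the isotope spacing by the 13C/12C mass difference; the example mass and charge range are arbitrary, and the sketch is not the procedure used by FLASHDeconv.

```python
PROTON_MASS = 1.007276   # Da, mass of a proton
C13_C12_DIFF = 1.003355  # Da, mass difference between 13C and 12C


def charge_state_mz(neutral_mass, charge, isotope_index=0):
    """Expected m/z of a protonated ion (assumption: positive mode, [M+zH]z+).

    neutral_mass is the monoisotopic neutral mass in Da; isotope_index selects
    the n-th isotope peak, approximated by adding n times the 13C/12C difference.
    """
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    mass = neutral_mass + isotope_index * C13_C12_DIFF
    return (mass + charge * PROTON_MASS) / charge


def charge_ladder(neutral_mass, charges):
    """Map each charge state to the m/z of the monoisotopic peak."""
    return {z: charge_state_mz(neutral_mass, z) for z in charges}


if __name__ == "__main__":
    # Arbitrary example: a 10 kDa proteoform observed at charges 8 to 12.
    for z, mz in charge_ladder(10000.0, range(8, 13)).items():
        spacing = charge_state_mz(10000.0, z, 1) - mz
        print(f"z={z:2d}  m/z={mz:10.4f}  isotope spacing={spacing:.4f}")
```

The charge states of a single mass lie hundreds of m/z units apart, while the isotope peaks within one charge state are separated by roughly 1/z, which is why a view organised by precursor mass spares the user from locating each region manually.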