DIA-NN software is a powerful tool for processing data-independent acquisition (DIA) proteomics datasets. It uses neural networks and other algorithms to identify and quantify peptides and proteins from DIA data and is specifically optimized for fast chromatography. Here, we share processing settings that we have found to work well for ZenoTOF 7600 system data.
Processing data using a spectral library built with OneOmics suite
Spectral libraries can be generated in OneOmics suite using the ProteinPilot app. The group file created by the ProteinPilot app can be converted to a DIA-NN software-compatible file by performing a short processing run in the Extractor app, which extracts the library information and writes a *.txt file into your Results folder. It is important to use the Exclude Modifications option in the Extractor app at this step, because DIA-NN software does not yet recognize all the modifications that the ProteinPilot app can identify using the Thorough search option. Recommended filters are 1% global protein and 1% global peptide. If you have multiple datasets that you want to combine into a single library, you can do so with the Extractor app, which will align the retention times and merge the libraries at the protein level.
Once the ion library has been generated in Extractor, download the ion library from the Data Store. You can find the downloaded file, full_ion_library.txt, in the same folder that you selected your results to be written to. Now, you can process your SWATH DIA data in DIA-NN software using the library you just built with OneOmics suite. Download the latest version of DIA-NN software from GitHub.
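Before loading the library into DIA-NN software, it can help to check which column names the exported file actually contains, since these names matter when setting the library header options later. The following is a minimal sketch, assuming the exported *.txt file is tab-delimited with a single header row (the text above does not state the delimiter, so verify this against your file). The function name read_library_columns is hypothetical.

```python
import csv
from pathlib import Path


def read_library_columns(path, delimiter="\t"):
    """Return the column names found on the first line of a delimited text file.

    Assumption: the ion library has one header row and is tab-delimited.
    Pass a different delimiter if your export differs.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        # next() with a default avoids an exception on an empty file
        return next(reader, [])


if __name__ == "__main__":
    # File name taken from the text; prepend the folder you wrote results to.
    library_path = Path("full_ion_library.txt")
    if library_path.exists():
        for index, name in enumerate(read_library_columns(library_path), start=1):
            print(index, name)
```

Printing the columns with their positions makes it easier to compare the file against the column order that DIA-NN software expects.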
The settings shown below are a recommended starting point for using DIA-NN software to process data collected on SCIEX QTOF instruments.
- Select your library using the Spectral Library button
- Select your data files and define your output file names
- Change the settings under Algorithm, as shown below
- When using the ProteinPilot app library, where grouping is already done, set Protein inference to Off
- When using a library generated from a FASTA file, set Protein inference to Genes
- Change the settings under Output, as shown below
- Enter the key command line instructions shown in Additional options
- Press Run to start data processing
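For scripted or batch processing, the same choices can be passed to the DIA-NN command line tool instead of the graphical interface. The sketch below only assembles an argument list; it does not reproduce the Algorithm and Output settings, which are not spelled out in the text. It assumes the DIA-NN command line options --lib, --f, --out, --no-prot-inf and --pg-level behave as described in the DIA-NN documentation (with --pg-level 2 selecting genes); please confirm them for your installed version. The executable path, file names and the function name build_diann_command are hypothetical.

```python
def build_diann_command(exe, library, raw_files, out_report, library_source):
    """Assemble a DIA-NN command line as a list of arguments.

    library_source is a hypothetical label used only by this sketch:
      "proteinpilot" -> grouping already done, protein inference off
      "fasta"        -> library generated from a FASTA file, infer by genes
    """
    if not raw_files:
        raise ValueError("at least one data file is required")

    command = [exe, "--lib", library, "--out", out_report]
    for raw_file in raw_files:
        # DIA-NN takes one --f option per input file
        command += ["--f", raw_file]

    if library_source == "proteinpilot":
        command.append("--no-prot-inf")   # assumed equivalent of Protein inference: Off
    elif library_source == "fasta":
        command += ["--pg-level", "2"]    # assumed equivalent of Protein inference: Genes
    else:
        raise ValueError(f"unknown library source: {library_source!r}")
    return command


if __name__ == "__main__":
    # Hypothetical paths, shown for illustration only.
    example = build_diann_command(
        "DiaNN.exe",
        "full_ion_library.txt",
        ["sample_01.wiff", "sample_02.wiff"],
        "report.tsv",
        "proteinpilot",
    )
    print(" ".join(example))
```

The resulting list can be handed to subprocess.run once the paths point at real files. Keeping the arguments as a list avoids quoting problems with file names that contain spaces.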
Many applications will benefit from the use of the “--library-headers…”, “--report-lib-info” and “--relaxed-prot-inf” flags. Other commands can be added to your processing workflow by entering command line instructions in the Additional options window.
- If using a library generated through the OneOmics suite, you will need to use the “--library-headers…” flag to convert the library headers to a DIA-NN software-compatible format
- If you want to review the fragment ions used for quantification in the outputs, you can use the “--report-lib-info” flag. This flag is needed if you plan to review your results from DIA-NN software using OneOmics suite.
- If processing data with a library that was generated in silico from a FASTA file, it is recommended to use the protein inference flag, “--relaxed-prot-inf”. You do not need to use this flag when processing with a library that was generated with the ProteinPilot app pipeline. Note that you can also review data from in silico-generated library searches if you include the “--report-lib-info” flag.
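The rules in the list above can be sketched as a small helper that builds the text for the Additional options window. This is only an illustration of the decision logic. The function name, the library_source labels and the review_in_oneomics parameter are hypothetical. The column names for “--library-headers…” are not given in the text, so the caller must supply them.

```python
def additional_options(library_source, review_in_oneomics=False, library_headers=None):
    """Return the flags to paste into the Additional options window.

    library_source (hypothetical labels):
      "oneomics" -> library built through the OneOmics suite
      "fasta"    -> library generated in silico from a FASTA file
    library_headers: the value for --library-headers, supplied by the caller;
      it is deliberately not guessed here.
    """
    flags = []
    if library_source == "oneomics":
        if not library_headers:
            raise ValueError("a OneOmics library needs a --library-headers value")
        flags.append(f"--library-headers {library_headers}")
    elif library_source == "fasta":
        flags.append("--relaxed-prot-inf")
    else:
        raise ValueError(f"unknown library source: {library_source!r}")

    if review_in_oneomics:
        # Needed to review fragment-level results in OneOmics suite
        flags.append("--report-lib-info")
    return " ".join(flags)


if __name__ == "__main__":
    print(additional_options("fasta", review_in_oneomics=True))
```

Raising an error when the header value is missing is a deliberate choice in this sketch: silently omitting the flag would let a OneOmics library load with unrecognized column names.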
The DIA-NN software documentation on command line tools provides the following descriptions of the flags recommended above.
- --relaxed-prot-inf instructs DIA-NN to use a very heuristical protein inference algorithm (similar to the one used by FragPipe and many other software tools), wherein DIA-NN aims to make sure that no protein is present simultaneously in multiple protein groups. This mode (i) is recommended for method optimisation & benchmarks, (ii) might be convenient for gene set enrichment analysis and related kinds of downstream processing. However, the default protein inference strategy of DIA-NN is more reliable for differential expression analyses (this is one of the advantages of DIA-NN).
- Note that in DIA-NN software 1.8.1, there is now a check box for Heuristic protein inference that has similar functionality to this flag.
- --report-lib-info adds extra library information on the precursor and its fragments to the main output report
- --library-headers specifies column names in the spectral library to be used, in the order described in Spectral library formats. Use ‘*’ (without quotes) instead of the column name if a particular column is irrelevant, or if DIA-NN already recognizes its name
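To illustrate the ‘*’ placeholder rule, the sketch below joins a list of column names into a single value for the library headers flag, substituting ‘*’ wherever no name is supplied. It assumes the names are separated by commas, which should be checked against the DIA-NN documentation. The required column order is defined in the Spectral library formats section of that documentation and is not reproduced here. The function name and the example column names are hypothetical.

```python
def library_headers_value(column_names):
    """Join column names into one value, using '*' for skipped columns.

    column_names must already be in the order DIA-NN expects.
    Use None (or an empty string) for a column that is irrelevant or that
    DIA-NN already recognizes by name.
    Assumption: names are comma-separated, so a name may not contain a comma.
    """
    parts = []
    for name in column_names:
        if name is None or name == "":
            parts.append("*")
        elif "," in name:
            raise ValueError(f"column name contains a comma: {name!r}")
        else:
            parts.append(name)
    return ",".join(parts)


if __name__ == "__main__":
    # Hypothetical column names, for illustration only.
    print(library_headers_value(["ColA", None, "ColC"]))
```

Keeping the positions explicit with None makes it harder to shift a name into the wrong slot when only a few columns need renaming.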