Marc Strous Manuel Kleiner Xiaoli Dong

Calis-p: CALgary approach to ISotopes in proteomics

The previous version of Calis-p was introduced in our 2018 PNAS paper, "Metaproteomics method to determine carbon sources and assimilation pathways of species in microbial communities". It was designed to estimate the natural isotope abundances of individual species in microbial communities from proteomics data, an approach termed "protein-based stable isotope fingerprinting" or [Direct Protein-SIF]. In addition to isotope fingerprinting, the current version (2.0) can also estimate isotope abundances in labeling experiments, for use in protein-based stable isotope probing (Protein-SIP) (Preprint).

Calis-p runs on Mac, Linux, and Unix. It requires two inputs: the output of a protein identification search engine in the form of scored peptide-spectrum match (PSM) tables, and the raw MS data in mzML format. From these two inputs, Calis-p estimates 13C/12C ratios and δ13C values for all peptides that pass the filters, as well as a summary of average values and standard errors for each species in the dataset.
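
For orientation, the relationship between a 13C/12C ratio and a δ13C value can be sketched as follows. The sketch uses the standard delta notation, δ13C = (R_sample / R_standard - 1) × 1000 ‰. The reference ratio R_VPDB is a commonly cited value for the VPDB standard and is an assumption here, since this page does not state which reference value Calis-p uses. The function names are illustrative and are not part of Calis-p.

```python
# Assumed 13C/12C ratio of the VPDB reference standard (a commonly cited
# value); the reference value used internally by Calis-p may differ.
R_VPDB = 0.0111802


def delta13c_permil(ratio, r_standard=R_VPDB):
    """Convert a 13C/12C ratio to a delta-13C value in per mil."""
    return (ratio / r_standard - 1.0) * 1000.0


def ratio_from_delta(delta_permil, r_standard=R_VPDB):
    """Convert a delta-13C value in per mil back to a 13C/12C ratio."""
    return (delta_permil / 1000.0 + 1.0) * r_standard


# A sample with the same ratio as the standard has a delta of zero;
# a sample depleted in 13C has a negative delta.
print(delta13c_permil(R_VPDB))
print(ratio_from_delta(-25.0))
```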

This software is no longer maintained. Please use the newer Python version of Calisp instead.


Installation

For noise filtering, Calis-p depends on another program, mcl. You can find more information on mcl here: http://micans.org/mcl/. From that page, navigate to "Licence and Software" for installation instructions and a download link. After you have successfully installed mcl, you can extract the downloaded zipped archive of Calis-p:

>unzip calis-p-2.0.zip

The jar file and [README] are in the "Calisp-2.0" folder.
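
Before a first run, it can help to confirm that the prerequisites are in place. The following is a minimal sketch, assuming that java and mcl are expected on the PATH (this page does not say how Calis-p locates mcl) and that the jar sits at the path used in the examples on this page. The function check_prerequisites is hypothetical and is not part of Calis-p.

```python
import shutil
from pathlib import Path


def check_prerequisites(jar_path, tools=("java", "mcl")):
    """Return a list of problems found; an empty list means everything was found."""
    problems = []
    for tool in tools:
        # shutil.which returns None when the executable is not on the PATH.
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")
    if not Path(jar_path).is_file():
        problems.append(f"jar file not found: {jar_path}")
    return problems


# Assumption: the jar location matches the unzipped archive layout.
for problem in check_prerequisites("path/to/Calisp-2.0/Calisp-2.0.jar"):
    print(problem)
```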


How Calis-p works

  • Calis-p starts reading data from the input files that contain the peptides and spectra. A detailed explanation of how it deals with files and folders is provided here: [Input files].
  • How Calis-p handles peptides with sulfur, modifications and assigns peptides to different species is explained here: [Peptide processing]
  • How Calis-p extracts a peptide's MS1 spectra is explained here: [Parsing Spectra]
  • How Calis-p eliminates noisy spectra is explained here: [Eliminating noisy spectra]
  • How Calis-p estimates a peptide's isotopic composition is explained here: [Estimating isotopic composition]
  • How Calis-p calculates center statistics for species and proteins is explained here: [Center statistics]

Quick Start

Preparation of input files for Calis-p:

Calis-p requires at least two input files to compute isotope compositions for organisms and proteins. First, an mzML file containing the mass spectrometry data is needed. Second, a file containing the peptide-spectrum match (PSM) data produced by a peptide identification algorithm, such as SEQUEST HT in Proteome Discoverer, is needed. The PSM data can be provided either as a tab-delimited table with specific columns or in the open mzIdentML (.mzid) format, which many proteomic search engines can export.

Click here for instructions on how to prepare the [mzML files].
Click here for instructions on how to prepare the [PSM files].
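
As a quick sanity check before running Calis-p, the contents of an input folder can be listed by file type. This is a minimal sketch that only groups files by extension; the suffix list is an assumption (tab-delimited PSM tables may use other extensions), and inventory_inputs is a hypothetical helper, not part of Calis-p. It makes no claim about how Calis-p itself matches PSM files to mzML files.

```python
import tempfile
from pathlib import Path


def inventory_inputs(folder, suffixes=(".mzml", ".mzid")):
    """Group file names in `folder` by lower-cased suffix; the rest go under 'other'."""
    found = {suffix: [] for suffix in suffixes}
    found["other"] = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            key = path.suffix.lower()
            found[key if key in found else "other"].append(path.name)
    return found


# Demonstration on a temporary folder with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("run1.mzML", "run1.mzid", "notes.txt"):
        (Path(tmp) / name).touch()
    demo_inventory = inventory_inputs(tmp)

print(demo_inventory)
```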

Command line usage and options

Common options are provided in the [README] file. For instructions, you can also type:

>java -jar path/to/Calisp-2.0/Calisp-2.0.jar -h

To compute 13C/12C and δ13C values of species, proteins and peptides in files within folder "my_peptide_folder" and spectra in files within folder "my_spectrum_folder", and save results in "my_output_folder":

>java -Xms10g -jar path/to/Calisp-2.0/Calisp-2.0.jar -threads 10 -peptideFile my_peptide_folder -spectrumFile my_spectrum_folder -outputFile my_output_folder

In this example, the JVM starts with a 10 GB heap (-Xms10g sets the initial heap size; the maximum is set separately with -Xmx) and Calis-p uses 10 threads, which should be sufficient to process around 10 mzML files in 10 minutes.
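
When the same run has to be repeated for several datasets, the command above can be assembled in a script. The following is a minimal sketch that uses only the options shown on this page; build_calisp_command and its parameter names are hypothetical. It only builds and prints the command. To actually launch it, pass the list to subprocess.run(command, check=True).

```python
import shlex


def build_calisp_command(jar, peptide_dir, spectrum_dir, output_dir,
                         threads=10, heap="10g"):
    """Assemble the java command line shown above as an argument list."""
    return [
        "java", f"-Xms{heap}",
        "-jar", str(jar),
        "-threads", str(threads),
        "-peptideFile", str(peptide_dir),
        "-spectrumFile", str(spectrum_dir),
        "-outputFile", str(output_dir),
    ]


command = build_calisp_command(
    "path/to/Calisp-2.0/Calisp-2.0.jar",
    "my_peptide_folder",
    "my_spectrum_folder",
    "my_output_folder",
)
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(command))
```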

Calis-p output files

Calisp writes a set of reports that contain the estimates of all its models (for details on the models, see [Estimating isotopic composition]). By default, the reports are created in the folder "calisp-output". You can instruct it to use a different folder with the parameter --output. If the output folder already exists, the previous results are overwritten. Calisp creates the following tab-delimited files, which you can open in a spreadsheet program:

File | Screenshot of example file | Description
calisp-settings.csv | Settings | A list of all the user parameters and files used for the computation.
filtering.metrics.csv | filtering.metrics | Summary of the number of isotope patterns that were kept or rejected during quality filtering.
default.delta.csv | default.delta | The estimates of the default model for organisms and files. These values are used for the [Direct Protein-SIF] method to obtain natural abundance 13C values.
default.proteins.csv | default.protein | The estimates of the default model for individual proteins. These estimates are usually not accurate enough to determine SIF values for individual proteins, so in most cases you can ignore them.
default.amino.csv | default.amino | Estimate of the per-amino acid 13C content using the default model for organisms and files.
neutron_abundance.delta.csv | neutron.delta | The estimates of the neutron abundance model for organisms and files. This is the standard model to be used for Protein-SIP.
neutron_abundance.proteins.csv | neutron.proteins | The estimates of the neutron abundance model for individual proteins. This is the standard model to be used for Protein-SIP.
neutron_abundance.amino.csv | neutron.amino | Estimate of the per-amino acid label content using the neutron abundance model for organisms and files. This is the standard model to be used for Protein-SIP.
clumpy_label.delta.csv | clumpy.delta | The estimates of the clumpy label model for organisms and files. This model can be used for Protein-SIP under special circumstances; for details, see below and [Estimating isotopic composition].
clumpy_label.proteins.csv | clumpy.protein | The estimates of the clumpy label model for individual proteins.
clumpy_label.amino.csv | clumpy.amino | Estimate of the per-amino acid label content using the clumpy label model for organisms and files.
peptides.csv | peptides first half of columns / peptides 2nd half of columns | For each peptide, all information is provided, including the aggregated intensity, the normalized spectrum and the estimates of all three models. Use this file if you would like to analyze your data yourself, for example using R.
peptide-spectra.X.csv (one file for each mzML file X) | peptide-spectra | For each mzML file, every individual peptide spectrum that was extracted. These files are most likely only useful for debugging Calisp.
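
Because the reports are tab-delimited despite the .csv extension, they need to be read with a tab separator. The following is a minimal sketch of how a report such as peptides.csv might be summarized per group, computing a plain mean and standard error. The column names "bin" and "delta" in the demonstration are made up (check the header of your own file), and the statistics computed by Calis-p itself may differ (see [Center statistics]).

```python
import csv
import io
import math
import statistics
from collections import defaultdict


def summarize(handle, group_col, value_col):
    """Return {group: (n, mean, standard error)} for a tab-delimited table."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        try:
            groups[row[group_col]].append(float(row[value_col]))
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or non-numeric values
    summary = {}
    for name, values in groups.items():
        n = len(values)
        mean = statistics.fmean(values)
        # The standard error is undefined for a single value.
        sem = statistics.stdev(values) / math.sqrt(n) if n > 1 else float("nan")
        summary[name] = (n, mean, sem)
    return summary


# Demonstration with made-up data and hypothetical column names.
demo = "bin\tdelta\nA\t-25.0\nA\t-27.0\nB\t-10.0\n"
demo_summary = summarize(io.StringIO(demo), "bin", "delta")
print(demo_summary)
```

For a real report, open the file with open("my_output_folder/peptides.csv", newline="") and pass the handle together with the column names found in its header.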
