The previous version of Calis-p was introduced in our 2018 PNAS paper: "Metaproteomics method to determine carbon sources and assimilation pathways of species in microbial communities". It was designed to estimate natural isotope abundances of individual species in microbial communities using proteomics data, termed "protein-based stable isotope fingerprinting" or [Direct Protein-SIF]. In addition to isotope fingerprinting, the current version (2.0) can also estimate isotope abundances of labeling experiments, for use in protein-based stable isotope probing (Protein-SIP) experiments (Preprint).
Calis-p can run on Mac, Linux, and Unix. It takes the output of a protein identification search engine, in the form of scored peptide-spectrum match (PSM) tables, as input. The raw MS data in mzML format is required as a second input. From these two inputs, Calis-p estimates 13C/12C ratios and δ13C values for all peptides that pass the filters, and reports a summary of average values and standard errors for each species in the dataset.
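The relationship between a 13C/12C ratio and a δ13C value can be sketched in a few lines of Python. This is only an illustration of the standard delta notation, not Calis-p's own code. The reference ratio for the VPDB standard used here is an assumption (published values differ slightly), the peptide ratios are hypothetical, and the unweighted mean is a simplification, since Calis-p may weight peptides differently when it summarizes a species.

```python
import math
import statistics

# Assumption: 13C/12C ratio of the VPDB reference standard. Published values
# differ slightly, so check which one your Calis-p version uses.
VPDB_RATIO = 0.0111802


def delta13c(ratio, reference=VPDB_RATIO):
    """Convert a 13C/12C ratio to a delta-13C value in per mil."""
    return (ratio / reference - 1.0) * 1000.0


def mean_and_standard_error(values):
    """Return (mean, standard error of the mean) for two or more values."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Hypothetical peptide ratios for one species (illustration only).
ratios = [0.01090, 0.01092, 0.01088]
deltas = [delta13c(r) for r in ratios]
mean, sem = mean_and_standard_error(deltas)
print(f"mean d13C = {mean:.1f} per mil, SE = {sem:.1f}")
```

A ratio equal to the reference gives a δ13C of 0, and ratios below the reference give negative values.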
This software is no longer maintained. Please use the newer Python version of Calisp instead.
For noise filtering, Calis-p depends on another program, "mcl". You can find more information on "mcl" here: http://micans.org/mcl/. From that page, navigate to "Licence and Software" for installation instructions and a download link. After you have successfully installed "mcl", extract the downloaded Calis-p zip archive:
>unzip calis-p-2.0.zip
The jar file and [README] are in the "Calisp-2.0" folder.
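Before the first run, it can help to confirm that the required programs are installed. The following is a minimal sketch, assuming that both executables are named "java" and "mcl" and are expected on the PATH. How Calis-p itself locates "mcl" is not described here, so treat this as a convenience check only.

```python
import shutil


def missing_tools(tools=("java", "mcl")):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [name for name in tools if shutil.which(name) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("java and mcl are both available.")
```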
Calis-p requires at least two input files, each providing different data, to compute isotope compositions for organisms and proteins. First, an mzML file containing the mass spectrometry data is needed. Second, a file containing the peptide-spectrum match (PSM) data produced by a peptide identification algorithm, such as SEQUEST HT in Proteome Discoverer, is needed. The PSM data can be provided either as a tab-delimited table with specific columns or in the open mzIdentML (.mzid) format, which many proteomic search engines can export.
Click here for instructions on how to prepare the [mzML files].
Click here for instructions on how to prepare the [PSM files].
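Once both kinds of input are prepared, a quick inventory of the two folders can catch an empty or misplaced folder before a long run. The sketch below is hypothetical helper code, not part of Calis-p. It assumes that spectra carry the .mzML extension and that every other regular file in the peptide folder is a PSM table or .mzid file.

```python
from pathlib import Path


def list_inputs(spectrum_folder, peptide_folder):
    """Return (mzML files, PSM files) found directly inside the two folders."""
    spectra = sorted(
        p for p in Path(spectrum_folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".mzml"
    )
    # Assumption: every regular file in the peptide folder that is not an
    # mzML file is a PSM table or a .mzid file.
    psms = sorted(
        p for p in Path(peptide_folder).iterdir()
        if p.is_file() and p.suffix.lower() != ".mzml"
    )
    return spectra, psms
```

For example, `spectra, psms = list_inputs("my_spectrum_folder", "my_peptide_folder")` returns two sorted lists of paths whose lengths you can compare with what you expect.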
Common options are provided in the [README] file. For instructions, you can also type:
>java -jar path/to/Calisp-2.0/Calisp-2.0.jar -h
To compute 13C/12C and δ13C values of species, proteins and peptides in files within folder "my_peptide_folder" and spectra in files within folder "my_spectrum_folder", and save results in "my_output_folder":
>java -Xms10g -jar path/to/Calisp-2.0/Calisp-2.0.jar -threads 10 -peptideFile my_peptide_folder -spectrumFile my_spectrum_folder -outputFile my_output_folder
In this example, Calis-p runs with 10 threads and a 10 GB Java heap (-Xms10g sets the initial heap size), which will be sufficient to process around 10 mzML files in 10 minutes.
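If you launch Calis-p from a script, the same command can be assembled as an argument list. This sketch only mirrors the options that appear in the example command (-Xms, -threads, -peptideFile, -spectrumFile, -outputFile). The function name and its defaults are hypothetical, and the call that would actually start Java is left commented out.

```python
import subprocess


def build_command(jar, peptide_folder, spectrum_folder, output_folder,
                  threads=10, heap_gb=10):
    """Assemble the Calis-p 2.0 example command as an argument list."""
    return [
        "java", f"-Xms{heap_gb}g",
        "-jar", str(jar),
        "-threads", str(threads),
        "-peptideFile", str(peptide_folder),
        "-spectrumFile", str(spectrum_folder),
        "-outputFile", str(output_folder),
    ]


cmd = build_command("path/to/Calisp-2.0/Calisp-2.0.jar",
                    "my_peptide_folder", "my_spectrum_folder",
                    "my_output_folder")
print(" ".join(cmd))
# To actually run it (requires Java and the jar at that path):
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when folder names contain spaces.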
Calisp writes a set of reports containing the estimates from each of its models (see [Estimating isotopic composition] for details on the models). By default, the reports are created in the folder "calisp-output". You can instruct it to use a different folder with the parameter --output. If the output folder already exists, the previous results in it are overwritten. Calisp creates the following tab-delimited files, which you can open in a spreadsheet program:
File | Screenshot of example file | Description |
---|---|---|
calisp-settings.csv | Settings | A list of all the user-parameters and files used for the computation. |
filtering.metrics.csv | filtering.metrics | Summary of the number of isotope patterns that were kept or rejected during quality filtering. |
default.delta.csv | default.delta | The estimates of the default model for organisms and files. These values are used for the [Direct Protein-SIF] method to obtain natural abundance 13C values. |
default.proteins.csv | default.protein | The estimates of the default model for individual proteins. Usually the estimates are not accurate enough to determine SIF values for individual proteins and thus you mostly want to ignore these values. |
default.amino.csv | default.amino | Estimate of the per-amino acid 13C content using the default model for organisms and files. |
neutron_abundance.delta.csv | neutron.delta | The estimates of the neutron abundance model for organisms and files. This is the standard model to be used for Protein-SIP. |
neutron_abundance.proteins.csv | neutron.proteins | The estimates of the neutron abundance model for individual proteins. This is the standard model to be used for Protein-SIP. |
neutron_abundance.amino.csv | neutron.amino | Estimate of the per-amino acid label content using the neutron abundance model for organisms and files. This is the standard model to be used for Protein-SIP. |
clumpy_label.delta.csv | clumpy.delta | The estimates of the clumpy label model for organisms and files. This model can be used for Protein-SIP under special circumstances; for details, see below and [Estimating isotopic composition]. |
clumpy_label.proteins.csv | clumpy.protein | The estimates of the clumpy label model for individual proteins. |
clumpy_label.amino.csv | clumpy.amino | Estimate of the per-amino acid label content using the clumpy label model for organisms and files. |
peptides.csv | peptides first half of columns / peptides 2nd half of columns | For each peptide, all information is provided including the aggregated intensity, the normalized spectrum and the estimates of all three models. Use this file if you would like to analyze your data, for example using R. |
peptide-spectra.X.csv (one file for each mzML file X) | peptide-spectra | For each mzML file, every individual peptide spectrum that was extracted. These files are probably only useful for debugging Calisp. |
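Because the reports are tab-delimited despite the .csv extension, they need an explicit delimiter when read in a script. The following is a minimal sketch using only the Python standard library. The helper names are hypothetical, and no column names are assumed: inspect the header of your own file to see which columns are available.

```python
import csv


def read_report(path):
    """Read a tab-delimited Calisp report into a list of dicts, one per row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def column_as_floats(rows, column):
    """Collect the numeric values of one column, skipping blank,
    missing, or non-numeric cells."""
    values = []
    for row in rows:
        try:
            values.append(float(row[column]))
        except (KeyError, TypeError, ValueError):
            continue
    return values
```

For example, `rows = read_report("calisp-output/peptides.csv")` followed by `print(list(rows[0].keys()))` lists the column names of the first row, which you can then pass to `column_as_floats`.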
Wiki: Center statistics
Wiki: Direct Protein-SIF
Wiki: Eliminating noisy spectra
Wiki: Estimating isotopic composition
Wiki: Input files
Wiki: PSM files
Wiki: Parsing Spectra
Wiki: Peptide processing
Wiki: README
Wiki: mzML files