Evaluation scripts for BiPACE/CeMAPP-DTW, June 2012
Author: nils.hoffmann@cebitec.uni-bielefeld.de
Version: 1.1.1
Last updated: Oct. 16th, 2012
Changes:
1.1.1:
-Added missing gradle.properties
1.1:
-Updated build.gradle to download the correct maltcms version zip
from sourceforge.net automatically, if not present.
-Updated build.gradle to download the correct groovy version zip
from codehaus.org automatically, if not present.
-Removed (most) dependencies on maltcms.de artifactory.
-Set default configurations to work on stand-alone computers.
-Added LeishmaniaEvalShort.sh and LeishmaniaGraphicsShort.sh
for a fast (computes only a few combinations) start.
-Updated this README with updated information.
-Added metabolights database link.
-Added maltcms link.
1.0:
-Initial version with evaluation scripts and data.
###########################################################################
0. Citation:
###########################################################################
If you use any of the material provided in this distribution, we ask you
to cite the following publication:
Hoffmann et al., "Combining peak- and chromatogram-based retention time
alignment algorithms for multiple chromatography-mass spectrometry
datasets", BMC Bioinformatics, 2012, 13:214, doi:10.1186/1471-2105-13-214
More specific information on the supplied datasets and their original
publications may be found in Section 4 of this document.
###########################################################################
1. License:
###########################################################################
The source code of this distribution is licensed under the GNU Lesser
General Public License version 3. Details may be found in the LICENSE file.
Please see Section 4 for details on the datasets contained in this
distribution.
###########################################################################
2. Requirements:
###########################################################################
-a Unix-compatible operating system (Linux or Mac OS X).
-a recent Java SDK (JDK), version 6, a.k.a. 1.6 (version 7, a.k.a. 1.7,
should work, but has not been tested).
-a recent version of gradle (www.gradle.org), version 1.0.+; please
follow their installation instructions for your system.
-a recent installation of GNU-R (www.r-project.org) > 2.14 with ggplot2.
To install ggplot2 from R's command line: install.packages("ggplot2")
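Before building, it can help to confirm that the tools listed above are on
your PATH. The following is a minimal sketch, not part of the distribution;
the helper name 'have' is hypothetical, and the check only tests for presence,
not for the required versions.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is available on the PATH.
# Prints "found: <name>" and returns 0, or prints "missing: <name>" and returns 1.
have() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Tools named in the requirements list.
for tool in java gradle R; do
    have "$tool" || true
done
```

For every tool reported as found, 'java -version', 'gradle --version' and
'R --version' print the installed version, which you can then compare with
the requirements by hand.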
###########################################################################
If you experience problems building and/or running the project, please
contact the author for assistance.
###########################################################################
3. Running the evaluation:
###########################################################################
Please note that the evaluation will place a HUGE workload on your
computer for a long time. Total runtime depends largely on the
performance of your system. We have run both evaluations on a dedicated
cluster system with 50 nodes for about one week in total. These numbers may
vary greatly! You have been warned ;-)!
The default settings for the evaluations concerning parallel execution
have been adapted to allow immediate execution on stand-alone computer
hardware.
At your command prompt, change to the 'scripts' directory:
>cd scripts
SMALL SAMPLE EVALUATION:
There is a small example parameterization available for quick results. Run
>./LeishmaniaEvalShort.sh
and
>./LeishmaniaGraphicsShort.sh
to generate graphics in the same style as in the publication.
Settings for this small example are in
src/main/scripts/cfg/LeishmaniaShort.groovy
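The two steps of the small example can also be chained so that the graphics
step only starts if the evaluation step succeeded. This is a sketch under the
assumption that the graphics script relies on the output of the evaluation
script; the function name 'run_in_order' is hypothetical and not part of the
distribution.

```shell
#!/bin/sh
# Hypothetical helper: run the given scripts in order and stop at the
# first one that is missing, not executable, or exits with an error.
run_in_order() {
    for script in "$@"; do
        if [ ! -x "$script" ]; then
            echo "not found or not executable: $script" >&2
            return 1
        fi
        echo "running $script"
        "$script" || return 1
    done
}

# From within the 'scripts' directory:
# run_in_order ./LeishmaniaEvalShort.sh ./LeishmaniaGraphicsShort.sh
```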
LEISHMANIA EVALUATION:
To run the full Leishmania evaluation (aka 'Robinson'), run
>./LeishmaniaEval.sh
To generate the publication graphics, run
>./LeishmaniaGraphics.sh
Settings for this evaluation are in
src/main/scripts/cfg/Leishmania.groovy
WHEAT EVALUATION:
In order to run the full Wheat dataset evaluation (aka 'Hohenheim'), run
>./WheatEval.sh
To generate the publication graphics, run
>./WheatGraphics.sh
Settings for this evaluation are in
src/main/scripts/cfg/Wheat.groovy
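Because the full evaluations run for a long time, you may want to keep their
console output in a log file. The following sketch shows one way to do this;
the function name 'run_logged' and the log file name are hypothetical, and
nothing here is required by the evaluation scripts themselves.

```shell
#!/bin/sh
# Hypothetical helper: run a command, append its stdout and stderr to a
# log file together with start and end time stamps, and return the
# command's exit status.
run_logged() {
    logfile=$1
    shift
    echo "started: $(date)" >> "$logfile"
    status=0
    "$@" >> "$logfile" 2>&1 || status=$?
    echo "finished with status $status: $(date)" >> "$logfile"
    return "$status"
}

# From within the 'scripts' directory:
# run_logged leishmania.log ./LeishmaniaEval.sh
```

While the evaluation is running, 'tail -f leishmania.log' in a second
terminal follows the output.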
###########################################################################
3.1 Customization:
###########################################################################
To change the number of parallel processes, change the
'maxThreads' property in
src/main/scripts/cfg/Wheat.groovy
or
src/main/scripts/cfg/Leishmania.groovy
to the number of cpus available on your system. Please also set the
value
'useQSub'
to 'false' to turn off the use of grid submission.
Finally, the number of cpus to use for each Maltcms instance, e.g.
cross.Factory.maxthreads = 4
can be set in cfg/cemappDtw.properties and cfg/cemappDtwRt.properties.
It should be set to one if each job is to use a single cpu only.
Whatever value you choose should be matched by
cpusPerJob = 4
in either src/main/scripts/cfg/Wheat.groovy or
src/main/scripts/cfg/Leishmania.groovy.
Individual parameterizations run for a relatively short time. Each instance will require
at most 2 GBytes (Leishmania) of main memory with one parallel task per instance.
The Wheat dataset may consume more, since it runs four alignments in parallel; the exact
amount depends on the remaining parameter settings. However, 24 GBytes
were never exceeded in our experiments.
The grid-enabled versions (WheatGrid.sh and LeishmaniaGrid.sh) will most definitely
require custom configuration.
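To review the parallel-execution settings named in this section before
editing them, you can list the lines that mention them. This is a minimal
sketch; the function name 'show_parallel_settings' is hypothetical, the paths
assume that you are in the 'scripts' directory, and the exact syntax of the
matching lines depends on the configuration files.

```shell
#!/bin/sh
# Hypothetical helper: print, with line numbers, every line of a
# configuration file that mentions one of the parallelism settings.
show_parallel_settings() {
    grep -n -E 'maxThreads|useQSub|cpusPerJob' "$1"
}

# Assumes the current directory is 'scripts'; missing files are skipped.
for cfg in src/main/scripts/cfg/Leishmania.groovy src/main/scripts/cfg/Wheat.groovy; do
    if [ -f "$cfg" ]; then
        echo "== $cfg"
        show_parallel_settings "$cfg" || echo "(no matching lines)"
    fi
done
```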
###########################################################################
In order to save memory, you can set
cross.datastructures.fragments.FileFragment.useCachedList = true
in cfg/evaluationDefaults.properties. This will limit the number of mass spectra
kept in memory.
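To check the current value of this property before changing it, a small
lookup such as the following can be used. It is a sketch only: the function
name 'get_property' is hypothetical, it handles plain "key = value" lines
(not the ':' separator or line continuations that Java properties files also
allow), and the location of the cfg directory relative to your working
directory is an assumption you may need to adjust.

```shell
#!/bin/sh
# Hypothetical helper: print the value of a key from a "key = value"
# style properties file. Prints nothing if the key is absent.
# $1 = properties file, $2 = key
get_property() {
    awk -F= -v key="$2" '
        /^[[:space:]]*[#!]/ { next }                  # skip comment lines
        { k = $1; gsub(/^[ \t]+|[ \t]+$/, "", k) }    # trim the key
        k == key {
            v = substr($0, index($0, "=") + 1)        # text after first "="
            gsub(/^[ \t]+|[ \t]+$/, "", v)            # trim the value
            print v
        }
    ' "$1"
}

# Example call (adjust the path to where cfg/ lives on your system):
# get_property cfg/evaluationDefaults.properties \
#     cross.datastructures.fragments.FileFragment.useCachedList
```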
More details about configuration options are given in the configuration files
scripts/src/main/scripts/cfg/Leishmania.groovy and
scripts/src/main/scripts/cfg/Wheat.groovy
and in the evaluation script at
scripts/src/main/scripts/Evaluation.groovy
###########################################################################
4. Notes and References:
###########################################################################
-Evaluation scripts, source code, and assets
Hoffmann et al., "Combining peak- and chromatogram-based retention time alignment algorithms
for multiple chromatography-mass spectrometry datasets", BMC Bioinformatics, 2012,
13:214, doi:10.1186/1471-2105-13-214
-Maltcms, the framework behind BiPACE and CeMAPP-DTW, is available at
http://maltcms.sf.net
-Leishmania samples (directory Robinson2007)
The samples, ground truth files, and peak lists were downloaded from the
supplementary materials page at http://www.biomedcentral.com/1471-2105/8/419
accompanying the publication of
Robinson et al., "A dynamic programming approach for the alignment of signal
peaks in multiple gas chromatography-mass spectrometry experiments", BMC Bioinformatics,
2007, 8:419, doi:10.1186/1471-2105-8-419
The peak lists provided from the same resource were processed to fit the input format
of Maltcms.
-Wheat samples (directory Hohenheim/)
These samples, ground truth files, and peak lists are provided for use in
scientific evaluations. When using these samples, please cite
Hoegy P, Keck M, Niehaus K, Franzaring J, Fangmeier A: Effects of atmospheric CO2 enrichment on
biomass, yield and low molecular weight metabolites in wheat grain. Journal of Cereal Science 2010,
52(2):215-220
and
Hoegy P, Wieser H, Koehler P, Schwadorf K, Breuer J, Franzaring J, Muntifering R, Fangmeier A:
Effects of elevated CO2 on grain yield and quality of wheat: results from a 3-year free-air
CO2 enrichment experiment. Plant Biology 2009, 11:60-69.
The wheat samples are also available from the MetaboLights database at
http://www.ebi.ac.uk/metabolights/MTBLS21
###########################################################################