--*** MaryGold README ***--
DESCRIPTION
MaryGold is an open source software package for the analysis
of contig graphs generated from Next-Generation sequencing data.
By decomposing contig graphs into bi- and triconnected components
MaryGold generates potential source-sink pairs of bubbles
that represent sequence variation.
MaryGold was designed for variation detection in metagenomics samples.
Both variation within and between metagenomics datasets can be explored.
MaryGold can also be used for sequence variation detection between
two single genomes by co-assembling them.
If you use MaryGold, then please cite:
J.F. Nijkamp, M. Pop, M.J.T. Reinders and D. de Ridder
Exploring variation aware contig graphs for (comparative) metagenomics using MaryGold.
submitted
Contact information: http://bioinformatics.tudelft.nl
INSTALLATION
Binaries are provided on MaryGold's SourceForge page. These binaries
were compiled on, and have only been tested on, 64-bit Linux machines. The
AMOS and OGDF libraries have been statically linked; Boost has been
dynamically linked (required for 'printCounts').
Some postprocessing parts of MaryGold require Python, such as generation
of the input files for Circos and finding linear paths in the compressed
contig graph.
If you wish to use the software on another platform you will probably
have to compile MaryGold yourself.
Requirements for compilation of the C++ code:
- The AMOS library
- The Open Graph Drawing Library
- The Boost library
Instructions:
./bootstrap
./configure
make
make install
If the required libraries are not in the standard path, then provide their locations to configure:
./configure \
--with-AMOS-include-path=../amos/include/AMOS \
--with-AMOS-lib-path=../amos/lib/AMOS/ \
--with-BOOST-include-path=/usr/include/boost \
--with-BOOST-lib-path=/usr/lib64/ \
--with-OGDF-include-path=../OGDF/ \
--with-OGDF-lib-path=/data../OGDF/_release/
PYTHON requirements
For the Python parts of MaryGold, the modules numpy, scipy, editdist and biopython are required.
USAGE
A. Finding multi-allelic sites with MaryGold
MaryGold consists of three main steps:
1. Converting the AMOS graph information (CTE and CTG bank accounts) to GML:
bnk2gml -b my.bnk > graph.gml
2. Finding separation pairs by decomposing the graph into bi- and triconnected components:
getSeppairs -i graph.gml > seppairs.txt
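To illustrate what a separation pair is, here is a minimal Python sketch. It is
not the algorithm used by getSeppairs: it simply tries every pair of vertices
and checks whether removing both disconnects the graph, which is only practical
for tiny graphs. The toy graph and the function names are hypothetical, and the
input is assumed to be biconnected.

```python
from itertools import combinations


def is_connected(adj, removed):
    """Check whether the graph stays connected after deleting `removed`."""
    nodes = [n for n in adj if n not in removed]
    if not nodes:
        return True
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        node = stack.pop()
        for neighbour in adj[node]:
            if neighbour not in removed and neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return len(seen) == len(nodes)


def separation_pairs(adj):
    """Brute force: every vertex pair whose removal disconnects the graph."""
    return [pair for pair in combinations(sorted(adj), 2)
            if not is_connected(adj, set(pair))]


# Toy bubble (hypothetical): three alternative contigs a, b and c
# between a source contig s and a sink contig t.
toy_graph = {
    "s": ["a", "b", "c"],
    "a": ["s", "t"],
    "b": ["s", "t"],
    "c": ["s", "t"],
    "t": ["a", "b", "c"],
}

pairs = separation_pairs(toy_graph)
print(pairs)  # [('s', 't')]
```

In this toy graph the only separation pair is (s, t), which is exactly the
source-sink pair of the bubble.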
3. Running the bubble search algorithm using the separation pairs as seeds:
buildMotifs [-troks] -b my.bnk -q seppairs.txt
This example uses three E. coli samples whose read names end with the
strain name. For example, read names in the FASTA file could be:
>r77136_1_O157
>r15478_1_HS
>r126947_1_K12
B. Generating some informative files
1. The read depths per contig per sample
printCounts calculates the read depths and read counts for each sample using regular expressions:
printCounts -x ".*O157;.*HS;.*K12" -b ../proba.bnk/
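The idea behind the semicolon-separated patterns can be sketched in Python as
follows. This is an illustration only, not the printCounts implementation: it
assumes that each pattern must match the whole read name and that a read is
assigned to the first pattern that matches. The function name and the fourth
read name are hypothetical.

```python
import re


def count_reads_per_sample(read_names, pattern_spec):
    """Count reads per sample; one regular expression per sample, ';'-separated."""
    patterns = [re.compile(p) for p in pattern_spec.split(";")]
    counts = {p.pattern: 0 for p in patterns}
    for name in read_names:
        for p in patterns:
            # Assumption: the pattern has to match the complete read name.
            if p.fullmatch(name):
                counts[p.pattern] += 1
                break
    return counts


reads = ["r77136_1_O157", "r15478_1_HS", "r126947_1_K12", "r5_2_HS"]
counts = count_reads_per_sample(reads, ".*O157;.*HS;.*K12")
print(counts)  # {'.*O157': 1, '.*HS': 2, '.*K12': 1}
```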
2. Set a threshold on the read depth to indicate whether the contig has enough reads
to belong to the sample. The threshold can either be set per sample, or one
threshold for all samples.
readDepth2member -d readdepths.txt -t '1.2;1.5;2' > membership.txt
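The thresholding step can be sketched in Python as shown below. This is not the
readDepth2member code: the function names, the example depths and the choice of
"greater than or equal" as the comparison are assumptions, and the actual
layout of readdepths.txt is not reproduced here.

```python
def parse_thresholds(spec, n_samples):
    """Turn '1.2;1.5;2' (per sample) or '1.5' (one for all) into a list."""
    values = [float(v) for v in spec.split(";")]
    if len(values) == 1:
        values = values * n_samples
    if len(values) != n_samples:
        raise ValueError("expected 1 or %d thresholds" % n_samples)
    return values


def membership(depths, thresholds):
    """A contig belongs to a sample if its read depth reaches the threshold."""
    return [depth >= t for depth, t in zip(depths, thresholds)]


# Hypothetical read depths per contig for the samples O157, HS and K12.
contig_depths = {
    "contig1": [3.0, 0.4, 2.5],
    "contig2": [1.0, 1.6, 0.0],
}

thresholds = parse_thresholds("1.2;1.5;2", n_samples=3)
members = {name: membership(depths, thresholds)
           for name, depths in contig_depths.items()}
print(members)
```

With these example values, contig1 is assigned to O157 and K12, and contig2
only to HS.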
3. Generate ID map
iid2eid -b ../proba.bnk/ > iid2eid.txt
C. Generating linear sequences
This will generate linearscaf.fasta and linearscaf.txt
python motiftigger.py -f motifs.txt \
-d readdepths.txt \
-m membership.txt \
-i iid2eid.txt \
-g compressed.gml \
-b my.bnk/ \
-o linearscaf.txt
D. CIRCOS: Generating the circos figure with multi-allelic sites
1. Generating the Circos source files
python toCircos.py -b my.bnk -r readdepths.txt -z membership.txt -m motifs.txt -c compressed.gml -i iid2eid.txt
This will generate a number of files, which are required for Circos:
bands.conf
distfile.txt (Average edit distances between paths in bubble)
hist.stacked.0.txt (Inferred read depths for paths through the bubble)
hist.stacked.1.txt
hist.stacked.2.txt
ideogram.conf
ideogram.label.conf
ideogram.position.conf
karyotype.txt
marygold.conf
ticks.conf
The number of hist.stacked.*.txt files depends on the number of samples.
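As an illustration of what an average edit distance between the paths in a
bubble means (see distfile.txt above), here is a minimal pure-Python sketch. It
is not the code used by toCircos.py, which relies on the editdist module; the
function names and the three example sequences are hypothetical, and averaging
over all unordered pairs of paths is an assumption.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def average_pairwise_distance(sequences):
    """Mean edit distance over all unordered pairs of sequences."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        return 0.0
    return sum(edit_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical sequences of three alternative paths through one bubble.
paths = ["ACGT", "ACCT", "AGT"]
average = average_pairwise_distance(paths)
print(round(average, 3))  # 1.333
```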
2. Now you are ready to generate the Circos figure:
circos -conf marygold.conf