Diversity Maps 0.1.0 (October 26, 2011)
Copyright (C) Fedor Konovalov; licensed under the GNU GPL
REQUIRES:
Perl 5.6.x or higher;
Perl module: PostScript::Simple;
Perl modules: PDL, PDL::Stats::GLM, PDL::NiceSlice;
Perl module: Graphics::ColorUtils;
Perl module: Getopt::Long (included in most distributions).
If you don't have these modules, install them from CPAN (#cpan <module_name>) or use a Standalone version of Diversity Maps.
USAGE:
> perl DiversityMaps-0.1.0.pl [options] [input_file output_file]
or
> ./DiversityMaps-0.1.0.pl [options] [input_file output_file]
If no input or output file is specified, the program reads the file 'infile' from the current directory and writes a PostScript file 'outfile-[NAME].ps', where [NAME] depends on the mode.
EXAMPLES:
./DiversityMaps-0.1.0.pl -r1 Morex -cmax 26 -fs 8 Barley_OPA.txt BOPA_DMaps.ps
Will read genotypes from file 'Barley_OPA.txt' and generate diversity maps per-chromosome (26 accessions per page), based on similarity to accession 'Morex' (color: red) and to the one most distant from it (color: blue), chosen automatically. Yellow color will indicate a difference from both references.
Accession names will be printed in an 8-point font. Output will be saved in 'BOPA_DMaps.ps'.
./DiversityMaps-0.1.0.pl -sp -s -p
Will read genotypes from file 'infile' and generate spectral diversity maps in passport mode (per-accession, all chromosomes on one page). Initial colors for spectral haplotyping will be chosen automatically. Output will be saved in 'outfile-spectral.ps'. Accessions will be ordered by similarity.
GENERAL OPTIONS:
-s Sort accessions by similarity (default: off)
-p Passport mode (default: off)
-cmax <integer> Maximum number of chromosomes per page (in per-chromosome mode, one per accession)
-cw <value> Chromosome width (default: 1.4, or 18 with -p)
-ch <value> Chromosome maximum height, mm (default: 200)
-cs <value> Space between chromosomes, mm (default: 0.22, or 7 with -p)
-lw <value> Line width, mm (default: 0.02)
-fs <value> Font size, points (default: 5, or 15 with -p)
SPECTRAL MODE OPTIONS:
-sp
Spectral visualization mode. Each chromosome region is colored according to its similarity to all the other accessions, taking into account the overall distribution of diversity among them. This is an all-vs-all comparison: it does not rely on reference genotypes or any other manual 'tinkering' with the data, which can bias the analysis.
By default, the initial accession colors (painted onto all the other accessions' chromosomes, in matching regions) are chosen automatically based on the integral genetic distances.
This is the best method for observing global diversity patterns, especially in large collections. In auto mode, color shifts may be necessary to improve visual appearance (see below).
-crand
In Spectral mode, use random initial colors, equally distant from grey. This is helpful to identify chromosome regions with unique or rare haplotypes: the less greyish the color is, the more unique the haplotype is at this point.
-cfile <filename>
In Spectral mode, load the initial colors from a file. This lets you assign your own colors to arbitrarily chosen groups, to see how distinct these groups really are and which regions are similar or variable between and within them.
-init
In Spectral mode, paint entire chromosomes with their initial colors. This lets you inspect the initial coloring in Auto and Random spectral modes.
NOTE: This method does NOT take into account the genotypes at each locus; use it only for optimizing a coloring scheme for Auto spectral mode.
-hap <integer> (default: 7)
Minimum haplotype length. In Spectral mode, this is the minimum number of matching successive markers required to 'paint' the target region using the query accession color. The right choice depends on the species' reproduction mode, level of diversity and density of markers.
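The '-hap' threshold can be illustrated with a short Python sketch. It is not taken from the Diversity Maps source (which is Perl); the function name matching_regions is hypothetical, and two points are assumptions: that a missing state ('-') breaks a run, and that the caller passes the markers of one chromosome at a time.

```python
def matching_regions(target, query, min_len=7):
    """Return (start, end) index pairs (end exclusive) of runs in which
    target and query carry the same state for >= min_len successive markers.

    Assumption: a missing state ('-') in either accession ends the run.
    """
    if len(target) != len(query):
        raise ValueError("both accessions must have the same number of markers")
    regions = []
    run_start = None
    for i, (t, q) in enumerate(zip(target, query)):
        if t == q and t != "-":
            if run_start is None:
                run_start = i          # a new matching run begins here
        else:
            if run_start is not None and i - run_start >= min_len:
                regions.append((run_start, i))
            run_start = None           # mismatch or missing data ends the run
    # close a run that reaches the end of the chromosome
    if run_start is not None and len(target) - run_start >= min_len:
        regions.append((run_start, len(target)))
    return regions


target = list("AABBABAB")
query = list("AABBBBAB")
short_runs = matching_regions(target, query, min_len=3)
long_runs = matching_regions(target, query, min_len=4)
```

With a lower threshold, shorter shared stretches get painted; with a higher one, only long shared haplotypes remain.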
-pca <integer> (default: 123)
In Spectral auto mode, specify which principal components to use for initial coloring as RGB values. You can also change the order of components here, to swap the corresponding color channels.
NOTE: this value should be three digits long, e.g. 123 or 412.
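As an illustration of how a '-pca' value reads, the Python sketch below splits the three digits into one principal component per color channel. The function name parse_pca_option is hypothetical, and rejecting the digit 0 assumes that components are numbered from 1; neither is confirmed by the program itself.

```python
def parse_pca_option(value):
    """Split a three-digit -pca value into principal components for R, G, B.

    Assumption: components are numbered from 1, so the digit 0 is rejected.
    """
    text = str(value)
    if len(text) != 3 or any(ch not in "123456789" for ch in text):
        raise ValueError("expected three digits, e.g. 123 or 412")
    red, green, blue = (int(ch) for ch in text)
    return {"R": red, "G": green, "B": blue}


default_mapping = parse_pca_option(123)   # PC1 -> red, PC2 -> green, PC3 -> blue
swapped_mapping = parse_pca_option(412)   # PC4 -> red, PC1 -> green, PC2 -> blue
```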
-hue <integer> (default: 0)
In Spectral auto mode, remap initial colors by shifting their hue.
NOTE: this value should be positive, e.g. 83 or 205. For a negative hue shift, specify values above 360.
-sat <value> (default: 1)
In Spectral auto mode, this is a saturation modifier. Values < 1 reduce saturation; values > 1 increase it. Values between 0.5 and 2 are generally useful.
-rgamma <value> (default: 1)
-ggamma <value> (default: 1)
-bgamma <value> (default: 1)
In Spectral auto mode, these options change the gamma of the individual RGB channels. Values > 1 lighten the channel; values between 0 and 1 darken it. Values between 0.5 and 2 are generally useful.
NOTE: set a negative value to invert the channel!
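The effect of the gamma options can be sketched in Python as follows. This is an assumption about the formula, not the program's actual code: it uses the conventional correction out = in^(1/gamma), which matches the behaviour described above (values > 1 lighten, values between 0 and 1 darken), and it treats a negative value as 'apply the absolute gamma, then invert'. The function name apply_gamma is hypothetical.

```python
def apply_gamma(value, gamma=1.0):
    """Apply a gamma correction to one 0..255 channel value.

    Assumptions: out = in ** (1 / |gamma|) on a 0..1 scale, and a
    negative gamma inverts the corrected channel.
    """
    if gamma == 0:
        raise ValueError("gamma must be non-zero")
    if not 0 <= value <= 255:
        raise ValueError("channel value must be in the range 0..255")
    level = value / 255.0
    corrected = level ** (1.0 / abs(gamma))
    if gamma < 0:
        corrected = 1.0 - corrected   # inverted channel
    return round(corrected * 255)


unchanged = apply_gamma(128, 1)
lighter = apply_gamma(128, 2)
darker = apply_gamma(128, 0.5)
inverted_black = apply_gamma(0, -1)
```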
REFERENCE MODE OPTIONS:
(if '-sp' is not specified, the program switches to this 'classic' mode)
-r1 <name> First reference accession (default: auto-select)
-r2 <name> Second reference accession (default: auto-select)
-rs <name> Use a single reference accession (default: off)
-i Ignore non-informative states (default: on)
-acolor <R,G,B> Reference 1 fill color (default: 255,127,127)
-bcolor <R,G,B> Reference 2 fill color (default: 127,127,255)
-xcolor <R,G,B> Non-ref fill color (default: 240,240,127)
-ncolor <R,G,B> Noninformative fill color (default: 220,220,220)
-acolorh <R,G,B> Reference 1 line color (default: 255,63,63)
-bcolorh <R,G,B> Reference 2 line color (default: 63,63,255)
-xcolorh <R,G,B> Non-ref line color (default: 255,255,63)
-ncolorh <R,G,B> Noninformative line color (default: 235,235,235)
INPUT FILE FORMAT:
Chromosome chr1 chr1 chr1 chr2 chr2 ...
Position 0.1 1.9 12.6 2 8.3 ...
Accession1 A B B - B ...
Accession2 B B - A B ...
...
1st line should contain the word 'Chromosome', followed by arbitrary chromosome names or numbers, separated by spaces or tabs;
2nd line should contain the word 'Position', followed by marker/SNP positions in cM, bp or other units;
3rd line onwards should contain an accession name and single-letter marker states: 'A', 'B' or '-' (missing data).
Accession names should not contain spaces; the table should be fully populated (add '-' for missing datapoints).
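Before running the program, the input table can be checked against the rules above. The Python sketch below is not part of Diversity Maps; the function name parse_genotypes is hypothetical, and it assumes that the keywords and marker states are case-sensitive and that positions are numeric.

```python
def parse_genotypes(text):
    """Parse a genotype table; return (chromosomes, positions, accessions)."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if len(rows) < 3:
        raise ValueError("need 'Chromosome', 'Position' and at least one accession line")
    if rows[0][0] != "Chromosome":
        raise ValueError("line 1 must start with 'Chromosome'")
    if rows[1][0] != "Position":
        raise ValueError("line 2 must start with 'Position'")
    chromosomes = rows[0][1:]
    positions = [float(p) for p in rows[1][1:]]   # cM, bp or other units
    if len(positions) != len(chromosomes):
        raise ValueError("'Position' line does not match the number of markers")
    accessions = {}
    for row in rows[2:]:
        name, states = row[0], row[1:]
        # the table must be fully populated: one state per marker
        if len(states) != len(chromosomes):
            raise ValueError(f"{name}: expected {len(chromosomes)} states, got {len(states)}")
        unexpected = sorted(set(states) - {"A", "B", "-"})
        if unexpected:
            raise ValueError(f"{name}: unexpected marker states {unexpected}")
        accessions[name] = states
    return chromosomes, positions, accessions


example = """\
Chromosome chr1 chr1 chr1 chr2 chr2
Position 0.1 1.9 12.6 2 8.3
Accession1 A B B - B
Accession2 B B - A B
"""
chromosomes, positions, accessions = parse_genotypes(example)
```

Because fields are split on whitespace, an accession name containing a space would shift every state by one column and be reported as a row of the wrong length.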
FILE FORMAT FOR INITIAL COLORS (optional in Spectral mode):
Accession1 192 88 165
Accession2 28 64 102
Accession3 29 58 154
...
This file is simple: four columns, with the accession name first, followed by the color values for Red, Green and Blue. Color values should be integers in the range 0..255. Use spaces or tabs as separators.
Accession names should be the same as in your genotype file; the table should be fully populated.
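A color file can be checked in the same way. The Python sketch below is not part of Diversity Maps; the function name parse_color_file is hypothetical, and it reads 'fully populated' as 'every accession in the genotype file has a color'.

```python
def parse_color_file(text, accession_names):
    """Parse an initial-colors table into {accession: (R, G, B)}."""
    colors = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split()                      # spaces or tabs
        if len(fields) != 4:
            raise ValueError(f"expected 4 columns, got {len(fields)}: {line!r}")
        name = fields[0]
        rgb = tuple(int(v) for v in fields[1:])    # non-integers raise ValueError
        if any(not 0 <= v <= 255 for v in rgb):
            raise ValueError(f"{name}: color values must be in the range 0..255")
        colors[name] = rgb
    # assumption: every accession in the genotype file needs a color
    missing = [n for n in accession_names if n not in colors]
    if missing:
        raise ValueError(f"no color given for: {', '.join(missing)}")
    return colors


color_text = """\
Accession1 192 88 165
Accession2 28 64 102
Accession3 29 58 154
"""
colors = parse_color_file(color_text, ["Accession1", "Accession2", "Accession3"])
```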
***