Menu

Modules

Pasi Rastas

Overview of the modules

Lep-MAP2 (LM2) consists of modules Filtering, SeparateChromosomes, JoinSingles, OrderMarkers and ParentCall. Each of these modules can run by
java -cp path moduleName [parameters]
, where path is the directory where the class files of LM2 are located. Each module requires some parameters, simply running "java -cp path moduleName" lists these parameters and a short description of each parameter.

Additional modules have been added to Lep-MAP2:

  • Tranpose provides fast transpose function to convert data into linkage format
  • ParentCallGrandparents calls markers taking into account the genotype information on grandparents (however, grandparents are not being taken into account in any other way)
  • AchiasmaticMeiosis converts data with achiasmatic meiosis (no recombination in one sex) into single parent informative data. This module provides similar functionality that was in Lep-MAP1.

The input of LM2 consists of genotypes of one or several full-sib families (parents and their offspring), given in pre-makeped LINKAGE (Lathrop et al., 1984) format. The format gives the pedigree information on columns 1-6 and genotypes starting on column 7 onward (marker 1 is on column 7). The columns in linkage files are separated by tabs, but genotypes are given as two space separated numbers (like "1 2" or "220 150"). A missing genotype is coded as "0 0". The Columns 1-6 are family name, individual name, father, mother, sex, and phenotype. The phenotype is not used by LM2 (can be set to all 0).

Only full-sib type pedigree structure is supported, but data from several types of crosses can be given as full-sibs without noticeable loss of information.

The output files are always written to standard output. These can be directed to file by using >file at the end of each command. Some progress messages are also printed to standard error, these can directed to a file by adding 2>file2.

Filtering

The Filtering module handles filtering of the data, i.e. filtering markers based on, e.g. high segregation distortion (dataTolerance) and excess number of missing genotypes (missingLimit). This module outputs the filtered data in linkage format to be used with other LM2 modules and for further analysis (e.g. QTL). By default, genotypes are outputted in bi-allelic format with alleles 1 and 2. By providing option keepAlleles=1 the original alleles are kept (If there are a lot of markers with 3-4 alleles (e.g. microsatellites), keeping the alleles is probably wise as markers with more alleles have more information about the haplotypes).

Example :
java -cp bin/ Filtering data=data1.linkage dataTolerance=0.001 >data1_f.linkage

SeparateChromosomes

The SeparateChromosomes module assigns markers into linkage groups (LGs) by computing all pair-wise LOD scores between markers and joins markers that with LOD score higher than a user given parameter lodLimit. The output is a map file where line i (not commented by #) corresponds to marker i and gives its LG name (number).

Examples :
java -cp bin/ SeparateChromosomes data=data1_f.linkage lodLimit=5 >map1.txt

java -cp bin/ SeparateChromosomes data=data1-10.linkage lodLimit=8 >map1-10.txt

JoinSingles

The JoinSingles module assigns singular markers to existing LGs by computing LOD scores between each single marker and possible multiple markers from the existing LGs. It needs a map file as first input parameter and then it outputs a new map file with possible more markers assigned to LGs.

Example :
java -cp bin/ JoinSingles map1.txt data=data1_f.linkage lodLimit=4 >map1_js.txt

OrderMarkers

OrderMarkers orders the markers within each LG by maximizing the likelihood of the data given the order. Either a map file (map=map_file) or output of previous run of OrderMarkers (evaluateOrder=order_file) have to be provided (only the first column, the marker name is read from the latter file).

The output is markers (numbered as 1..N) with paternal and maternal position. Also the error estimate is provided for each marker and a duplicate flag if this marker was removed being identical to some other marker (duplicate removal can be disabled, if desired, by setting removeDuplicates=0).

Examples :
java -cp bin/ OrderMarkers map=map1_js.txt data=data1_f.linkage >order1.txt

java -cp bin/ OrderMarkers evaluateOrder=order1.txt data=data1_f.linkage >order1.2.txt

Compute map distances from a given order
java -cp bin/ OrderMarkers evaluateOrder=order1.2.txt data=data1_f.linkage improveOrder=0 >order1.3.txt

Compute sex averaged map distances from a given order
java -cp bin/ OrderMarkers evaluateOrder=order1.2.txt data=data1_f.linkage improveOrder=0 sexAveraged=1 >order1.SA.txt

(sometimes it is easier to separate each family to a separate file, multiple datafiles can be given as an input)
java -cp bin/ OrderMarkers map=map1.txt data=data1.linkage data2.linkage data3.linkage >order1-3.txt

achiasmatic meiosis (no recombination in male):
java -cp bin/ OrderMarkers map=map.txt data=data.linkage initRecombination=0 0.05

or if there are problems with 0 recombination:

java -cp bin/ OrderMarkers map=map.txt data=data.linkage initRecombination=0.000000001 0.05 learnRecombinationParameters=0 1

achiasmatic meiosis (no recombination in female):
java -cp bin/ OrderMarkers map=map.txt data=data.linkage initRecombination=0.05 0

or

java -cp bin/ OrderMarkers map=map.txt data=data.linkage initRecombination=0.05 0.000000001 learnRecombinationParameters=1 0

Note the achiasmatic meiosis fix in some cases (avoiding 0 recombination rate). It is also possible to use AchiamaticMeiosis module.

sex averaged maps
java -cp bin/ OrderMarkers map=map.txt data=data.linkage sexAveraged=1

large datasets (order each chromosome separately, filterWindow and polishWindow)
java -cp bin/ OrderMarkers map=mapBig.txt data=dataBig.linkage filterWindow=10 polishWindow=100 chromosome=1

large datasets on multiple families, parallel computation on X cores
java -cp bin/ OrderMarkers map=mapBig.txt data=dataBig.linkage filterWindow=10 polishWindow=100 numThreads=X chromosome=1

Hints for OrderMarkers:

To be sure on the marker order, it is wise to repeat marker ordering a few times. Order with best likelihood is typically the most correct.

The errors in marker orders can be detected by making a pair-wise LOD plot of the resulting markers. (The pair-wise data can be generated by LM2, it can be visualized in, e.g. R or gnuplot)

Because of genotyping errors, the map distances can become too long if the number of markers per chromosome is much higher than the number of individuals. Often setting parameter minError (to a value between 0.01-0.4) shortens the map considerably. Evaluating map orders with minErrors can be sufficient (I have not evaluated how minError affects marker ordering). Even large minError will work if there are multiple copies of identical markers (that reduce effective error parameter). It is also a good idea to try evaluating the maps without removing duplicates but setting minError. This will take into account how many times each identical marker occurs in the data (as if the duplicates are removed, then the marker occurs only once).

Large families can be split into smaller ones to obtain speedup from parallel computation (numThreads), even with single family data. This is because LM2 does not split families, it only distributes the computation on each family to different threads. Thus, maximum speedup can be obtained if all families are about equal size. Moreover, LM2 distributes the families in the order they are given in the input file. Sometimes by changing the order of families one can obtain better speedup using numThreads.

It can be wise to remove markers with high error estimate or markers at the map ends that cause long gaps.

BE SURE that your individuals match the pedigree. For example, use plink --genome (with genotype data without pedigree information and before ParentCall), to verify that your individuals are full-sibs.

If you notice gaps (or too long maps) in the output maps of OrderMarkers, try this:
If the gap is within a long stretch of single informative markers, setting initRecombination to something smaller (than default 0.05) like 0.001 typically fixes this (Note that the new version of LM2 (25th of April, 2016) will initialize recombination rates to 0 for non-informative sex, and probably solves this problem).

If the map is split into several parts (LG = X.1, LG = X.2), Setting maxDistance, e.g. to 0.4 sets the maximum recombination rate and does not allow that long gaps that cause LGs to split. Then it might also be easier to identify markers that cause problems.

ParentCall

The module ParentCall is used to call missing parental genotypes and markers on sex chromosomes (XY or ZW system, assumes known offspring sex). The input of ParentCall consists of genotype posteriors for each 10 possible SNP genotypes AA, AC, AG, AT, CC, CG, CT, GG, GT and TT. Other kind of variants (like indels) can be given as input by specifying them as SNPs (e.g. AA=homozygote indel, AT=heterozygote indel, TT=no indel). Typically these posteriors can be obtained from sequencing data or from SNP assays.

The first 6 lines presents the pedigree. Columns 1-2 give marker names (scaffold and pos) for genotypes, and can be any value for pedigree part. Thus, each line has n+2 tab separated columns if there are n individuals.

The output is also in the same posterior format. Scripts are provided to convert linkage files to posteriors and the posteriors back to linkage. These scripts make it possible to use ParentCall also for data only in genotype format.

Example (imputing missing parental genotypes from a linkage file) :

awk -f linkage2post.awk data1_missingParent.linkage|java -cp bin/ Transpose >data1.post

java -cp bin/ ParentCall data=data1.post >data1.call

awk -f simpleConvert.awk data1.call|cut -f 3-|java -cp bin/ Transpose >data1_imputed.linkage

(and use data1_imputed.linkage)

Sketch how to convert a vcf file to linkage

(cat pedigree_part.txt; awk -f vcf2posterior.awk data.vcf)|java -cp bin/ ParentCall data=- removeNonInformative=1|awk -f simpleConvert.awk|cut -f 3-|java -cp bin/ Transpose >genotypes.linkage


Related

Wiki: Home