Dear Pasi and Lep-map users,
We would like your opinions on good practice for constructing a genetic map with Lep-MAP. How should we treat markers with a high missing rate, or with significant segregation distortion? And how should we choose values for the several arguments of Lep-MAP modules such as OrderMarkers (a very important one)?
########### background -- our workflow
(0) mapping populations: three doubled haploid families (from haploid tissues of three segregating populations), with population sizes of 51, 42 and 61;
Genotypes: we used RNA-seq to genotype the haploid individuals
(1) first we do data filtering (all three populations in one run, to obtain a consensus map)
command we used:
java -cp /home/xfwei/bin/lep-map2/bin/ Filtering data=all_160 dataTolerance=0.05 missingLimit=50 missingLimitIndividual=580080 MAFLimit=0.05 > all_160_50.linkage
(2) then we do SeparateChromosomes
command we used:
java -cp /home/xfwei/bin/lep-map2/bin/ SeparateChromosomes data=all_160_50.linkage sizeLimit=100 lodLimit=20 > all_160-100-20.linkage.map
what we got:
Number of LGs = 12, markers in LGs = 50459, singles = 1399742
(3) then we do OrderMarkers
command we used for one linkage group:
java -cp /home/xfwei/bin/lep-map2/bin/ OrderMarkers data=all_160.linkage chromosome=1 map=all_160-100-20.linkage.map improveOrder=1 minError=0 useKosambi=1 numThreads=1 initError=0.01 missingClusteringLimit=0 hammingClusteringLimit=0 numMergeIterations=6 filterWindow=10 polishWindow=100 > all_160-100-20.linkage.map.LG1
(4) imputation
Missing data have been reported to bias both the estimated genetic distances and the marker order, so we plan to include this step.
(5) redo OrderMarkers based on imputed data
(6) order evaluation with LOD matrix
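Since step (3) has to be run once per linkage group, the per-group command lines can be generated rather than typed by hand. Below is a minimal Python sketch that only builds and prints the 12 commands and does not run Lep-MAP. Every flag and path is copied from the LG1 command in step (3); the .LGn output naming for the other groups is an assumption that simply extends the .LG1 pattern.

```python
# Sketch: build one OrderMarkers command line per linkage group.
# All flags and paths are copied from the LG1 command in the post;
# nothing is executed here, the commands are only printed.
CLASSPATH = "/home/xfwei/bin/lep-map2/bin/"
DATA = "all_160.linkage"
MAP = "all_160-100-20.linkage.map"
NUM_LGS = 12  # "Number of LGs = 12" reported by SeparateChromosomes

OPTIONS = (
    "improveOrder=1 minError=0 useKosambi=1 numThreads=1 initError=0.01 "
    "missingClusteringLimit=0 hammingClusteringLimit=0 numMergeIterations=6 "
    "filterWindow=10 polishWindow=100"
)


def order_markers_command(lg):
    """Return the shell command that orders linkage group `lg`."""
    # Output name is assumed to follow the .LG1 pattern used in the post.
    return (
        f"java -cp {CLASSPATH} OrderMarkers data={DATA} chromosome={lg} "
        f"map={MAP} {OPTIONS} > {MAP}.LG{lg}"
    )


commands = [order_markers_command(lg) for lg in range(1, NUM_LGS + 1)]

if __name__ == "__main__":
    for command in commands:
        print(command)
```

The printed lines can be pasted into a shell script, which also makes it easy to rerun all groups with one changed argument when comparing settings.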
########### our questions
(1) Do we really need to filter out markers showing segregation distortion? Could we filter them out to construct the map, and then add them back afterwards? What are the potential negative effects of generating a map that includes distorted markers?
BTW, markers showing segregation distortion are often important, as they may be of biological interest.
(2) It is clear to us that markers with missing genotypes will affect the calculation of genetic distance. Is imputation needed? Do you have any suggestions on how to implement it? Maskov (Ward et al. 2013 BMC Genomics) has been used several times for this job, but we have not been able to find it.
(3) We have to run "OrderMarkers" several times to get a map that passes the evaluation with the LOD matrix (illustrated at http://www.atgc.org/XLinkage/). Could you please give us more guidance on tuning the arguments of "OrderMarkers", such as "minError=", "initError=", "missingClusteringLimit=", "hammingClusteringLimit=", "numMergeIterations=", "filterWindow=", "polishWindow="? We do not know where to start with these arguments.
Thanks in advance. Looking forward to hearing from you.
Best wishes.
Dear Jian-Feng Mao,
Thank you for your comprehensive question.
I will try to answer all your concerns.
For 1) about segregation distortion. You do not need to filter out distorted markers. However, doing so often removes a lot of noise (for example, noise caused by repeats) and makes it possible to separate markers into chromosomes more robustly. You can try separating chromosomes without filtering as well, but if chromosomes end up collapsing together, you need to do some filtering. I also noticed that you used more stringent filtering (dataTolerance=0.05) than the default (dataTolerance 0.01). If you do end up filtering, you can try to add less stringently filtered markers to the linkage groups using JoinSingles.
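To give a feeling for what a distortion threshold means for a haploid family of this size, here is a minimal Python sketch of a 1:1 chi-square test on allele counts. It is only an illustration: the assumption that the threshold is compared against a chi-square p-value with one degree of freedom is mine, and the exact test used inside the Filtering module may differ. The function name and the example counts are hypothetical.

```python
import math


def distortion_p_value(count_a, count_b):
    """P-value of a 1 d.f. chi-square test against a 1:1 segregation ratio.

    count_a and count_b are the numbers of haploid individuals carrying
    each of the two alleles; missing genotypes are simply not counted.
    """
    total = count_a + count_b
    if total == 0:
        return 1.0  # no data, no evidence of distortion
    expected = total / 2.0
    chi2 = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    # Survival function of a chi-square with 1 d.f.: erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


if __name__ == "__main__":
    # Hypothetical markers in a family of 51 individuals.
    for a, b in [(26, 25), (30, 21), (36, 15), (40, 11)]:
        print(a, b, distortion_p_value(a, b))
```

With only about 50 individuals per family, a marker has to be quite far from 1:1 before it falls under a threshold such as 0.01, which is one reason a stricter threshold removes noticeably more markers.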
For 2) about imputation. I am not sure whether you need to do imputation. If the data can be imputed, OrderMarkers can probably infer that information as well. Moreover, you can also output the most likely data from OrderMarkers if you need imputed data for some other analysis.
For 3) about parameter tuning. The value of minError can be used to reduce the map lengths if there are many more markers than individuals; filterWindow and polishWindow make map construction faster. initRecombination can be set to smaller value(s) if there are very many markers (it should be about the expected recombination rate between adjacent markers).
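As a rough way to get an "expected recombination rate between adjacent markers", one can divide an assumed map length by the number of markers in the linkage group and convert that spacing to a recombination fraction with the Kosambi function (the commands in this thread use useKosambi=1). The sketch below does this in Python. The 100 cM map length is a purely hypothetical placeholder, and the marker count is just the reported 50459 markers spread evenly over 12 linkage groups; substitute your own estimates.

```python
import math


def kosambi_recombination(distance_cm):
    """Recombination fraction for a Kosambi map distance given in cM."""
    distance_morgans = distance_cm / 100.0
    return 0.5 * math.tanh(2.0 * distance_morgans)


def expected_adjacent_recombination(map_length_cm, n_markers):
    """Average recombination fraction between neighbouring markers,
    assuming markers are evenly spaced along the linkage group."""
    if n_markers < 2:
        raise ValueError("need at least two markers")
    spacing_cm = map_length_cm / (n_markers - 1)
    return kosambi_recombination(spacing_cm)


# Hypothetical inputs: 100 cM per linkage group, 50459 markers over 12 LGs.
ASSUMED_MAP_LENGTH_CM = 100.0
MARKERS_PER_LG = 50459 // 12

rate = expected_adjacent_recombination(ASSUMED_MAP_LENGTH_CM, MARKERS_PER_LG)

if __name__ == "__main__":
    print(MARKERS_PER_LG, rate)
```

At such marker densities the spacing is tiny, so the result is essentially the spacing in Morgans; the point is only to get the right order of magnitude for the starting value.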
Cheers,
Pasi