Thank you for the software and all your comments on this forum, they have been very helpful for us on our linkage mapping project. Overall, our mapping has been going really well, however we are having some trouble fine-tuning order markers, and observe regions where the genetic position does not converge, so we see 2 -4 lines in some regions when comparing to the physical maps. We have tried a variety of things to remove these (detailed below), and whilst they improve on some LGs, they don't always improve every LG for both sex informative maps. I've attached some example plots, any advice you may have on helping resolve these double lines would be greatly appreciated!
Details of workflow:
Using F1 family with 94 offspring with 60,159 markers
For separate chromosomes, we used informative mask =1 and informative mask =2 to build separate maps for male and female informative markers
Join Singles to all with informative mask 13 and 23 to join the double heterozygous markers, resulting in 35,303 markers in the female map and 35,117 in the male map
In Order Markers we ran it 10 times and chose the one with the best likelihood
Trouble shooting so far:
added hyperphase and increased the number of merge (20) and polish (5) iterations
varied scale (2N/M, 3M/N, scale = 0.035)
tried not using join singles to all, going from separate chromosomes to OrderMarkers with informative mask 1 and 2 run separately, to reduce the number of markers.
We are also planning to thin our markers out further and test if fewer markers helps resolve
Attached below an example LG showing the male and female maps plotted separately against the physical position with our initial order markers run, where we didn't include any additional parameters, and the same LG but with scale = 2M/N and scale = 0.035. I have noticed that it often seems like the male map has more lines than females, but responds more to changes in the scale parameters, whilst the female maps start of better, but change very little with different parameters.
I was also a little confused about the implementation of the scale parameter as in the paper it says to use scale =0.01 where you have 100x more markers than individuals, but in the forum and on the documentation I often see scale = 2M/N, if you have time could you please help me understand the different implementations.
The maps look about as good as they can, being build "de novo" without genome coordinates. My gut feeling is that the double lines are caused because some markers have missing/erroneous genotypes. This uncertainty of map positions can be obtained from the map intervals (parameter calculateIntervals=file), I did discuss this in detail in my Lep-Anchor paper.
Basically I have suggested three ways to fix this:
1) evaluate the map in the physical order (evaluateOrder=phys_order.txt, improveOrder=0)
2) use FitStepFunction from Lep-Rec (included in Anchor)
3) use parameters usePhysical from OrderMarkers2.
The scale can be bit confusing. N is number of markers and M individuals. The original idea was that if map length is 100cM (and markers equally spaced), then M=N would require no scaling (M/N=1). With more markers scaling M/N would decrease (more markers than individuals). 2M/N would mean in the same framework that we assume a map length of 200cM. (To complicate the matter, now there is the proximityScale parameter to cope with genotyping errors manifesting many close by markers. This paramater is generally recommended for sequencing based data, e.g. proximityScale=50)
Cheers,
Pasi
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Thank you very much for your reply. I will attempt to expand on your three suggested options, to ensure that we understand them correctly, and for future reference for other users:
Three ways to fix this: 1) evaluate the map in the physical order (evaluateOrder=phys_order.txt, improveOrder=0)
The rationale for this option is to constrain the order (1st, 2nd, 3rd, etc) of markers to their physical position on the assembly during the OrderMarkers2 step, to ensure a monotonically increasing marey map, which can then be used to estimate recombination rate.
The phys_order.txt file in this instance is a file with no header and four columns (correct me if i'm wrong), the columns being: SNPNUMBER (from the ParentCall output), Contig (the CHR column in the parentCall output), Position (the Mb position in the ParentCall output), and LG (from SeparateChromosome or JoinSingles2All output). One file per chromosome is required. So the first few lines would be something like:
32000CHR189000132050CHR1975001etc
The improveOrder=0 argument tells LepMap to retain this physical order. Thus OrderMarkers only estimates the recombination between markers and their genetic position in cM, without changing their order (1st, 2nd, etc). The output should in theory be a monotonically increasing marey map.
2) use FitStepFunction from Lep-Rec (included in Anchor)
This function, as described in LepRec, adjusts the linkage map so that it is monotonically increasing/decreasing when compared to the physical assembly.
To run this, run FitStepFunction with the OrderMarkers2 output (ordermarkersOutput_LGX.txt) for an LG and the snps.txt file. The snps.txt file contains two columns, with CHR and POS. Then run commands:
java -cp bin FitStepFunction map=ordermarkersOutput_LGX.txt snps=snps.txt > ordermarkersOutput_LGX_monotonicallyIncreasing.txt
NOTE: you can add an intervals file from OrderMarkers to FitStepFunction call.
3) use parameters usePhysical from OrderMarkers2.
We have not tried this yet, but to our understanding this option uses the known physical position from the assembly (found in CHR and POS columns of the ParentCall output) to guide the linkage map construction, thus making the genetic map more congruent with the markers physical position on the assembly and thus more likely to be monotonically increasing. Although, as seen for instance in the barn-owl paper, these maps with usePhysical are not necessarily increasing monotonically, and thus might still require the use of FitStepFunction.
N) estimating recombination
The monotonically increasing linkage maps can then be used as input for EstimateRecombination function from LepRec.
java -cp bin EstimateRecombination map=ordermarkersOutput_LGX_monotonicallyIncreasing.txt
Notes on assembly quality and LepAnchor
All three of these options as described here rely on a good high quality assembly. If the assembly has misassemblies and other structural issues, then an intermediate step correcting the assembly with LepAnchor is needed before one can obtain a monotonically increasing map. See LepAnchor paper and the LepAnchor discussion forum
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Dear Pasi,
Thank you for the software and all your comments on this forum, they have been very helpful for us on our linkage mapping project. Overall, our mapping has been going really well, however we are having some trouble fine-tuning order markers, and observe regions where the genetic position does not converge, so we see 2 -4 lines in some regions when comparing to the physical maps. We have tried a variety of things to remove these (detailed below), and whilst they improve on some LGs, they don't always improve every LG for both sex informative maps. I've attached some example plots, any advice you may have on helping resolve these double lines would be greatly appreciated!
Details of workflow:
Trouble shooting so far:
Attached below an example LG showing the male and female maps plotted separately against the physical position with our initial order markers run, where we didn't include any additional parameters, and the same LG but with scale = 2M/N and scale = 0.035. I have noticed that it often seems like the male map has more lines than females, but responds more to changes in the scale parameters, whilst the female maps start of better, but change very little with different parameters.
I was also a little confused about the implementation of the scale parameter as in the paper it says to use scale =0.01 where you have 100x more markers than individuals, but in the forum and on the documentation I often see scale = 2M/N, if you have time could you please help me understand the different implementations.
Many thanks,
Sam
Dear Sam,
Thank you for your question.
The maps look about as good as they can, being build "de novo" without genome coordinates. My gut feeling is that the double lines are caused because some markers have missing/erroneous genotypes. This uncertainty of map positions can be obtained from the map intervals (parameter calculateIntervals=file), I did discuss this in detail in my Lep-Anchor paper.
Basically I have suggested three ways to fix this:
1) evaluate the map in the physical order (evaluateOrder=phys_order.txt, improveOrder=0)
2) use FitStepFunction from Lep-Rec (included in Anchor)
3) use parameters usePhysical from OrderMarkers2.
The scale can be bit confusing. N is number of markers and M individuals. The original idea was that if map length is 100cM (and markers equally spaced), then M=N would require no scaling (M/N=1). With more markers scaling M/N would decrease (more markers than individuals). 2M/N would mean in the same framework that we assume a map length of 200cM. (To complicate the matter, now there is the proximityScale parameter to cope with genotyping errors manifesting many close by markers. This paramater is generally recommended for sequencing based data, e.g. proximityScale=50)
Cheers,
Pasi
Hi Pasi,
Thank you very much for your suggestions and explanation, we will be sure to read and try lep-anchor.
Cheers,
Sam
Hello Pasi,
Thank you very much for your reply. I will attempt to expand on your three suggested options, to ensure that we understand them correctly, and for future reference for other users:
Three ways to fix this:
1) evaluate the map in the physical order (evaluateOrder=phys_order.txt, improveOrder=0)
The rationale for this option is to constrain the order (1st, 2nd, 3rd, etc) of markers to their physical position on the assembly during the OrderMarkers2 step, to ensure a monotonically increasing marey map, which can then be used to estimate recombination rate.
The phys_order.txt file in this instance is a file with no header and four columns (correct me if i'm wrong), the columns being: SNPNUMBER (from the ParentCall output), Contig (the CHR column in the parentCall output), Position (the Mb position in the ParentCall output), and LG (from SeparateChromosome or JoinSingles2All output). One file per chromosome is required. So the first few lines would be something like:
The
improveOrder=0argument tells LepMap to retain this physical order. Thus OrderMarkers only estimates the recombination between markers and their genetic position in cM, without changing their order (1st, 2nd, etc). The output should in theory be a monotonically increasing marey map.2) use FitStepFunction from Lep-Rec (included in Anchor)
This function, as described in LepRec, adjusts the linkage map so that it is monotonically increasing/decreasing when compared to the physical assembly.
To run this, run FitStepFunction with the OrderMarkers2 output (ordermarkersOutput_LGX.txt) for an LG and the snps.txt file. The snps.txt file contains two columns, with CHR and POS. Then run commands:
NOTE: you can add an intervals file from OrderMarkers to FitStepFunction call.
3) use parameters usePhysical from OrderMarkers2.
We have not tried this yet, but to our understanding this option uses the known physical position from the assembly (found in CHR and POS columns of the ParentCall output) to guide the linkage map construction, thus making the genetic map more congruent with the markers physical position on the assembly and thus more likely to be monotonically increasing. Although, as seen for instance in the barn-owl paper, these maps with usePhysical are not necessarily increasing monotonically, and thus might still require the use of FitStepFunction.
N) estimating recombination
The monotonically increasing linkage maps can then be used as input for
EstimateRecombinationfunction from LepRec.Notes on assembly quality and LepAnchor
All three of these options as described here rely on a good high quality assembly. If the assembly has misassemblies and other structural issues, then an intermediate step correcting the assembly with LepAnchor is needed before one can obtain a monotonically increasing map. See LepAnchor paper and the LepAnchor discussion forum