Menu

OrderMarkers2 scale parameter

2023-03-21
2023-04-02
  • Hui Zhen Tan

    Hui Zhen Tan - 2023-03-21

    Dear Pasi,

    I have applied Lep-MAP3 on my data and have some questions about accounting for an excess of SNPs, relative to number of individuals, in OrderMarkers2.

    We performed 10 OrderMarkers2 runs for each chromosome using the default values, selected the run with the best likelihood, and validated the map via various visualisations such as LMPlot and MareyMap. While our OrderMarkers2 workflow worked well for most of our chromosomes, we noticed that larger chromosomes (>2k SNPs) could face two issues. Firstly, the maps of these larger chromosomes could be split into 2 chunks with discrepancy between map position and physical positions (Fig 1 – light purple points). Through calculation of pairwise linkage disequilibrium and OrderMarker2 runs of a subset of SNPs spanning the breakpoint (e.g. SNPs from 40Mbp-80Mbp physical position), we determined that our result with the 2 chunks is likely an artefact. Secondly, the maps of these larger chromosomes are much shorter than (e.g. half the length of) that from CRI-MAP (Fig 1 – light purple points vs orange points).

    We are aware of the ‘scale’ parameter which has been discussed in this very helpful forum and in the Lep-MAP3 paper. From what I understand, increasing the ‘scale’ parameter (e.g. ‘scale=3M/N’) helps to account for an excess of SNPs relative to the number of individuals, and adding a second number (e.g. ‘scale=3M/N 3’) helps modulate that effect for the ends of the linkage map. After we applied ‘scale=3M/N 3’, the resulting map looks much better (Fig 2 – light purple points). For context, we had 36 full-sib families with 144 unique individuals.

    How do we know if we are applying a sufficient ‘scale’ to account for the excess number of markers relative to the number of individuals? I also find it challenging to understand how the scale parameter works and how it affects the results despite the available information about it. Could I have your help in explaining how the ‘scale=NUM1 NUM2’ parameter works?

    Thank you very much for your help.

    Best,
    Hui Zhen

     
  • Pasi Rastas

    Pasi Rastas - 2023-03-21

    Dear Hui Zhen,

    Thank you for very good question.

    Indeed, here the default scale seems to be the problem.

    The scale parameter removes map inflation due genotyping errors. The default value M/N means that there should be about same amount of markers and individuals (about 100cM map). If the map is much longer, then some markers might get compressed together and you can see such artefacts. If the map is about half of the desired length, then setting scale=2M/N should work ok. The scale only works when < 1, you can also change the recombination and/or interference parameters correspondingly (taking square root of the default values effectively doubles scale, you can only change some parameters as well, e.g. interference only).

    The second value in scale changes how first and last markers are handled, effectively with higher scaling. Without this, the first and last recombinations could be missed.

    Cheers,
    Pasi

     
  • Pasi Rastas

    Pasi Rastas - 2023-03-21

    To add,

    I have encountered this once, then I solved the problem by increasing interference parameters (interference1=0.05 and interference2=0.05) and by scaling heterozygote likelihoods. In my data, the problem was selfing data where each marker was in the same phase and lower likelihoods on homozygotes caused some bias in the maps. Attached is the script to try out this hetScaling, I don't know for sure if it applies here.

    zcat data.filt.gz|awk -vscale=0.5 -f scaleHet.awk|gzip >data_scaled.filt.gz
    

    Cheers,
    Pasi

     
  • Hui Zhen Tan

    Hui Zhen Tan - 2023-03-24

    Dear Pasi,

    Thank you very much for your helpful replies.

    I attach here my results from applying 4 different scale parameters (Fig 1 – light purple points). With ‘scale=2M/N’ or ‘scale=2M/N 2’, we still see discrepancies between map position and physical position. I would also add a note that I applied 24 merge + 8 polish iterations in these runs, and that we found increasing the iterations to be important in improving the map. I found ‘scale=3M/N 3’ to be the lowest value of scale that achieves a good map, and am not keen to increase scale to match the higher map length by CRI-MAP since I do not think that map lengths are comparable across datasets and methods. It is however good to see similar landscapes of recombination along the chromosome between the two methods. Would that be a good justification for choosing this value of scale?

    Two more clarifications about your replies:

    The default value M/N means that there should be about same amount of markers and individuals (about 100cM map).” Am I right in understanding that the default parameter of ‘scale=M/N’ would best be applied to maps that are about 100cM in length and with similar number of markers and individuals?

    The second value in scale changes how first and last markers are handled, effectively with higher scaling. Without this, the first and last recombinations could be missed.” Does NUM2 = the factor to multiply scale by for the first and last markers e.g. scale=3M/N 3 means scale 3x3M/N = 9M/N for the first and last markers?

    Best,
    Hui Zhen

     
  • Pasi Rastas

    Pasi Rastas - 2023-03-29

    Dear Hui Zhen,

    The Lep-MAP3's scaling assumes that markers and crossovers are more or less evenly distributed. I think you argue the use of higher scale by the fact the marker/crossover distribution is locally higher than the default handles?

    The second parameter is quite difficult to explain. I will try to explain it with an example:If the scale=0.25 2, then 7 first and last markers are scaled...
    (The example was incorrect, correct example below)

    Often the default scale works for maps with length 40-200cM. If the genotype likelihoods are low, map is much longer or markers/crossovers unevenly distributed, you might have to change the scale parameter(s).

    Cheers,
    Pasi

     

    Last edit: Pasi Rastas 2023-04-02
  • Hui Zhen Tan

    Hui Zhen Tan - 2023-03-31

    Dear Pasi,

    Thank you for your explanations. The example you gave for ‘scale=NUM1 NUM2’ was very helpful in illustrating how scaling changes along the ends of the chromosome.

    As you pointed out, the maps we needed to apply ‘scale’ to are the longer ones, which was around 150-250cM for us. Regarding marker/crossover distribution in our dataset, we think that they are quite well distributed. We decided to apply scale because we find that chromosomes with much higher number of markers than individuals show disagreements between physical and map distances of markers, and we applied scale to correct that as suggested in the Lep-MAP3 publication.

    I hope this answers your question.

    Best,
    Hui Zhen

     
  • Pasi Rastas

    Pasi Rastas - 2023-04-02

    Dear Hui Zhen,

    Just want to change my example a bit. The markers are not scaled as I explained for the ends but the end recombination parameters are scaled.

    For the example of scale=0.25 2, all markers are scaled by 0.25. Scaling means that genotype likelihood is raised to the power (here L => L^0.25). Then to avoid missing a few first and last crossovers/recombinations, the recombination paramaters are scaled for the ends. The first and last 7 possible crossovers/recombinations are scaled by 0.125, 0.25, 0.375, 0.5, 0.625, 0.75 and 0.875 (0.25*i*/2, where i is 1..7) starting from the map end.

    Cheers,
    Pasi

     
  • Hui Zhen Tan

    Hui Zhen Tan - 2023-04-02

    Dear Pasi,

    Thank you for writing back to clarify your example. This has been a very helpful discussion.

    Best,
    Hui Zhen

     

Log in to post a comment.