
Evidence of map inflation

2024-03-12
2024-04-26
  • Tianzhu Xiong

    Tianzhu Xiong - 2024-03-12

    Dear Pasi,

    I have recently been pondering some linkage maps produced from an F2 cross in Lepidoptera that includes only parents and offspring. The markers are from RADseq: about 130-200 markers per chromosome, with about 175 offspring.

    I used markers informative only in the paternal parent to infer male recombination, and some chromosomes have maps much longer than 50 cM. However, when I inspect the haplotypes, the recombination probability between markers towards the two ends of the chromosome is well above 0.5: sometimes 0.6, and even 0.7!

    Is it sufficient to conclude that there is map inflation going on and I have inferred too many recombination breakpoints?

    Thanks,
    TZ

     
  • Pasi Rastas

    Pasi Rastas - 2024-03-13

    Dear TZ,

    Thank you for your question.

    My maps for Lepidoptera have been 50-80 cM.

    If there are poor markers, they tend to end up at the map ends. Maybe this is the case here; removing markers from the ends might solve the problem. The new default mapping function is Morgan, so a recombination rate of 0.7 produces a 70 cM gap. With other mapping functions, this would break the linkage group.
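    To make the mapping-function point concrete, here is a minimal Python sketch of the three textbook mapping functions (Morgan, Haldane, Kosambi). It is not Lep-MAP3's code and the function names are hypothetical; it only shows that Morgan maps r = 0.7 to 70 cM, whereas Haldane and Kosambi are undefined for r >= 0.5, which is consistent with such an interval breaking the linkage group under those functions.

```python
import math

# Textbook mapping functions: recombination fraction r -> map distance in cM.


def morgan_cM(r: float) -> float:
    """Morgan: d = r (in Morgans). Defined for any r, so r = 0.7 gives 70 cM."""
    return 100.0 * r


def haldane_cM(r: float) -> float:
    """Haldane: d = -1/2 * ln(1 - 2r). Undefined for r >= 0.5."""
    if r >= 0.5:
        raise ValueError("Haldane distance is undefined for r >= 0.5")
    return 100.0 * -0.5 * math.log(1.0 - 2.0 * r)


def kosambi_cM(r: float) -> float:
    """Kosambi: d = 1/4 * ln((1 + 2r) / (1 - 2r)). Undefined for r >= 0.5."""
    if r >= 0.5:
        raise ValueError("Kosambi distance is undefined for r >= 0.5")
    return 100.0 * 0.25 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


for r in (0.01, 0.1, 0.3, 0.49, 0.7):
    cells = [f"r={r:.2f}", f"Morgan={morgan_cM(r):.1f} cM"]
    for name, fn in (("Haldane", haldane_cM), ("Kosambi", kosambi_cM)):
        try:
            cells.append(f"{name}={fn(r):.1f} cM")
        except ValueError:
            cells.append(f"{name}=undefined")
    print("  ".join(cells))
```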

    Also check that the data are correct, for example that IBD values between parents and offspring are about 0.5 or higher.
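    Lep-MAP3's IBD check is the tool referred to here. As a simpler, swapped-in sanity check (not IBD, and not part of Lep-MAP3), one can count Mendelian inconsistencies between parents and offspring. The sketch below assumes genotypes are already coded as alt-allele counts 0/1/2 with None for missing calls; the helper names and toy data are hypothetical.

```python
from itertools import product
from typing import Optional, Sequence

# Assumed coding: alt-allele counts 0 = hom ref, 1 = het, 2 = hom alt; None = missing.
Genotype = Optional[int]


def transmissible(g: int) -> set:
    """Alleles (0 = ref, 1 = alt) that a parent with genotype g can pass to an offspring."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[g]


def mendel_consistent(father: int, mother: int, child: int) -> bool:
    """True if the child genotype can be formed from one allele of each parent."""
    return any(a + b == child for a, b in product(transmissible(father), transmissible(mother)))


def inconsistency_rate(father: Sequence[Genotype], mother: Sequence[Genotype],
                       offspring: Sequence[Sequence[Genotype]]) -> float:
    """Fraction of non-missing offspring calls that violate Mendelian inheritance."""
    checked = bad = 0
    for child in offspring:
        for f, m, c in zip(father, mother, child):
            if f is None or m is None or c is None:
                continue  # skip any marker with a missing call in the trio
            checked += 1
            if not mendel_consistent(f, m, c):
                bad += 1
    return bad / checked if checked else float("nan")


father = [0, 1, 2, 1]
mother = [0, 0, 2, 1]
offspring = [[0, 0, 2, 2], [1, 1, 2, 0], [0, 1, 1, None]]
rate = inconsistency_rate(father, mother, offspring)
print(f"Mendelian inconsistency rate: {rate:.3f}")
```

    A rate well above the expected genotyping error would point to sample mix-ups or poor calls before any map is built.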

    If your markers have physical coordinates, the Marey map will typically tell whether the end markers are noise or not. Physical coordinates can be utilised in OrderMarkers2 as well. The new flag proximityScale (e.g. =100) will improve maps for RADseq. Also, the usePhysical parameter will take into account the physical location of markers.

    Cheers,
    Pasi

     
  • Tianzhu Xiong

    Tianzhu Xiong - 2024-03-13

    Hi Pasi,

    Thanks for the insight about this problem. Recombination probability in meiosis is capped at 0.5 between any pair of markers (i.e., markers separated by an even number of recombination breakpoints show no recombination between them), so that's why I think a recombination probability of 0.7 is abnormally high. I will check the IBD values between offspring and between parents and offspring.
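    The 0.5 cap has a standard textbook form, Mather's formula, which holds under the assumption of no chromatid interference. It also ties map length to the mean number of chiasmata N between the two loci:

```latex
% Mather's formula, assuming no chromatid interference.
% N = number of chiasmata (crossovers on the four-chromatid bivalent) between two loci.
r = \frac{1}{2}\bigl(1 - \Pr(N = 0)\bigr) \le \frac{1}{2},
\qquad
d = \frac{1}{2}\,\mathbb{E}[N] \text{ Morgans} = 50\,\mathbb{E}[N] \text{ cM}.
```

    For example, with exactly one chiasma between the chromosome ends, Pr(N = 0) = 0, so r = 1/2 and d = 50 cM; a 75 cM map corresponds to E[N] = 1.5 and, under this assumption, still keeps r <= 1/2.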

    Was the new flag proximityScale added only recently? May I ask what the general guidance is for setting its value?

    Thanks again for the suggestion!

    Best,
    Tianzhu

     
  • Pasi Rastas

    Pasi Rastas - 2024-03-13

    Dear Tianzhu,

    Could you post the map ends from Lep-MAP3, and maybe the commands you have used? I have not seen recombination fractions over 0.5. Are you using grandparents for phasing? Without grandparents, I think the Lep-MAP3 phasing algorithm should flip 0.7 in this case to 0.3.

    proximityScale should be about the length of the RAD site. A value of 100 (in bp) should be OK for most cases.
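    If it helps to see how many RAD sites the SNPs actually come from, a rough check is to chain SNPs whose consecutive spacing is below ~100 bp. This is a hypothetical helper, not a description of how proximityScale works internally (that is not specified here); it only estimates how many effectively separate loci a set of SNP positions represents.

```python
def cluster_snps(positions, max_gap_bp=100):
    """Group SNP positions (bp, one chromosome) into runs whose consecutive gaps are < max_gap_bp.

    Hypothetical helper: a rough proxy for 'SNPs from the same RAD site'. Because it chains
    consecutive SNPs, a cluster can span more than max_gap_bp in total.
    """
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] < max_gap_bp:
            clusters[-1].append(pos)  # close to the previous SNP: same putative RAD site
        else:
            clusters.append([pos])    # gap too large: start a new putative RAD site
    return clusters


snp_positions = [1000, 1020, 1085, 5000, 5050, 9000]
sites = cluster_snps(snp_positions, max_gap_bp=100)
print(f"{len(snp_positions)} SNPs fall into {len(sites)} putative RAD sites: {sites}")
```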

    Cheers,
    Pasi

     
  • Tianzhu Xiong

    Tianzhu Xiong - 2024-03-13

    Hi Pasi,

    For instance, in the attached map for chromosome 2, I plotted paternal haplotypes on the left-hand side (black/white corresponds to 0/1 in the inferred phase; the phase is NOT the grandparental phase, and grandparents were not used for the linkage analysis). On the right-hand side, I plotted three quantities:

    the p-curve (blue): the probability of recombination for a given marker with the left-most marker

    the q-curve (yellow): the probability of recombination for a given marker with the right-most marker

    the Marey map (red): the cumulative mapping distance

    The probability of recombination is simply calculated as the frequency of two markers being in opposite phase among the inferred haplotypes.

    There seem to be far too few haplotypes without breakpoints, and a large chunk of the chromosome on the right seems to recombine with the left end with a probability of over 0.6.
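    The p- and q-curves described above can be computed directly from phased haplotypes. A minimal sketch, assuming each offspring's paternal haplotype is a list of 0/1 phases (None for missing) along the chromosome; the function names are hypothetical:

```python
from typing import List, Optional

Haplotype = List[Optional[int]]  # inferred paternal phase (0/1) per marker, None = missing


def recombination_with(haplotypes: List[Haplotype], ref: int) -> List[float]:
    """For each marker j, the fraction of offspring whose phase at j differs from the phase at
    marker `ref`, i.e. the empirical recombination fraction between marker j and marker ref."""
    n_markers = len(haplotypes[0])
    curve = []
    for j in range(n_markers):
        pairs = [(h[ref], h[j]) for h in haplotypes if h[ref] is not None and h[j] is not None]
        opposite = sum(1 for a, b in pairs if a != b)
        curve.append(opposite / len(pairs) if pairs else float("nan"))
    return curve


def p_q_curves(haplotypes: List[Haplotype]):
    """p-curve: recombination with the left-most marker; q-curve: with the right-most marker."""
    return recombination_with(haplotypes, 0), recombination_with(haplotypes, len(haplotypes[0]) - 1)


haps = [
    [0, 0, 0, 0],  # no breakpoint
    [0, 0, 1, 1],  # one breakpoint
    [1, 1, 1, 0],  # one breakpoint
    [0, 1, 1, 0],  # two breakpoints
]
p, q = p_q_curves(haps)
print("p-curve:", p)
print("q-curve:", q)
```

    Because each comparison is within one haplotype, the result does not change if a haplotype's 0/1 labels are flipped, so it does not depend on the arbitrary phase orientation of each family. The last value of the p-curve (equal to the first value of the q-curve) is the fraction of haplotypes with an odd number of breakpoints; it exceeds 0.5 only when odd-breakpoint haplotypes outnumber even ones, including those with none.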

    Tianzhu

     

    Last edit: Tianzhu Xiong 2024-03-13
  • Pasi Rastas

    Pasi Rastas - 2024-03-13

    Dear Tianzhu,

    Aha. We are talking about different things here. I thought there was a 0.7 fraction of crossovers between two adjacent markers (rf in the mapping functions).

    The map is about 75 cM. It is within my expected range of 50-80 cM.
    If you think this is too long, check whether there are individuals that recombine too many times across all chromosomes. I think you can get this information from Lep-MAP3's error stream output. You can also force Lep-MAP3 to produce shorter maps by changing the scale parameter.
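    Checking for individuals that recombine too often can also be done from phased output. A sketch, assuming the phased haplotypes have already been parsed into 0/1 lists per individual and chromosome (parsing Lep-MAP3's output is not shown, and the 2x median cutoff is an arbitrary, hypothetical choice):

```python
from statistics import median
from typing import Dict, List, Optional, Tuple

Haplotype = List[Optional[int]]  # phase (0/1) per marker along one chromosome, None = missing


def count_breakpoints(hap: Haplotype) -> int:
    """Number of phase switches along one haplotype, ignoring missing calls."""
    calls = [x for x in hap if x is not None]
    return sum(1 for a, b in zip(calls, calls[1:]) if a != b)


def flag_high_recombiners(per_individual: Dict[str, List[Haplotype]],
                          factor: float = 2.0) -> Tuple[Dict[str, int], List[str]]:
    """Total breakpoints per individual over all chromosomes, plus the IDs whose total
    exceeds `factor` times the median total (an arbitrary, hypothetical cutoff)."""
    totals = {ind: sum(count_breakpoints(h) for h in haps) for ind, haps in per_individual.items()}
    cutoff = factor * median(totals.values())
    return totals, [ind for ind, t in totals.items() if t > cutoff]


data = {
    "A": [[0, 0, 1, 1], [1, 1, 1]],
    "B": [[0, 1, 0, 1], [0, 1, 0]],
    "C": [[0, 0, None, 1], [0, 0, 1]],
}
totals, flagged = flag_high_recombiners(data)
print(totals, flagged)
```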

    Could 0.6 still be due to chance (binomially distributed with p=0.5 and n=175)? Could it be chromatid interference or selection? Is there segregation distortion?
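    For the "by chance" question, an exact binomial test against p = 0.5 with n = 175 gives a rough answer. This sketch treats one marker pair in isolation; scanning many marker pairs and chromosomes is a multiple-testing problem, and neighbouring markers are strongly correlated, so the p-values are only indicative.

```python
from math import comb


def binom_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial test of k successes in n trials against p = 0.5.
    Under p = 0.5 the distribution is symmetric, so the p-value is twice the smaller tail."""
    lower = sum(comb(n, i) for i in range(0, k + 1)) / 2**n
    upper = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    return min(1.0, 2 * min(lower, upper))


n = 175
for k in (96, 105, 122):  # roughly 0.55, 0.60 and 0.70 of 175 offspring
    print(f"{k}/{n} = {k / n:.3f} recombinant: two-sided p = {binom_two_sided_p(k, n):.2g}")
```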

    Cheers,
    Pasi

     

    Last edit: Pasi Rastas 2024-03-13
  • Tianzhu Xiong

    Tianzhu Xiong - 2024-03-13

    Hi Pasi,

    Thanks for the quick response. 75 cM is fine in my view too, but only when the extra 25 cM is produced by, for instance, double crossovers, which would still keep the recombination probability capped at 0.5. It seems to happen on almost all of the (longer) chromosomes in the data, so I was a bit suspicious about what could be going on.

    It's extremely interesting that you mentioned chromatid interference:

    Chromatid interference

    If two COs preferentially involve non-overlapping sets of chromatids, this would push the recombination probability between chromosome ends above 0.5 and generate more single breakpoints.
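    The effect described here can be checked by enumerating chromatid choices for a bivalent with exactly two crossovers between the chromosome ends. A minimal sketch; the weights for two-, three- and four-strand doubles are hypothetical parameters. With no chromatid interference (equal weights, giving the 1:2:1 ratio) the end-to-end recombination fraction is exactly 1/2, and favouring four-strand doubles (non-overlapping chromatid pairs) pushes it above 1/2:

```python
from fractions import Fraction
from itertools import product

# Chromatids 1 and 2 are sisters (one homolog); 3 and 4 are sisters (the other homolog).
# Each crossover exchanges one non-sister pair: one chromatid from each homolog.
PAIRS = list(product((1, 2), (3, 4)))


def end_to_end_rf(w_two, w_three, w_four) -> Fraction:
    """Expected fraction of chromatids recombinant between the chromosome ends, given exactly
    two crossovers. The second crossover's pair is weighted relative to the first as a
    2-strand (same pair), 3-strand (one shared chromatid) or 4-strand (disjoint) double."""
    weight_by_shared = {2: Fraction(w_two), 1: Fraction(w_three), 0: Fraction(w_four)}
    total = Fraction(0)
    norm = Fraction(0)
    for first, second in product(PAIRS, PAIRS):
        w = weight_by_shared[len(set(first) & set(second))]
        # A chromatid is recombinant between the ends if it took part in an odd number of crossovers.
        odd = sum(((c in first) + (c in second)) % 2 for c in (1, 2, 3, 4))
        total += w * Fraction(odd, 4)
        norm += w
    return total / norm


for label, weights in [("no chromatid interference", (1, 1, 1)),
                       ("4-strand doubles favoured", (1, 1, 3)),
                       ("only 4-strand doubles", (0, 0, 1))]:
    print(f"{label}: r(ends) = {end_to_end_rf(*weights)}")
```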

    I guess the question right now is whether the current signal is real or just due to misspecified Lep-MAP3 parameters.

    Cheers,
    Tianzhu

     
  • Pasi Rastas

    Pasi Rastas - 2024-03-21

    Dear Tianzhu,

    I think this is about the quality of your data and the number of markers. I would aim for a higher number of markers: you often get about 1000 markers per chromosome with RADseq, so 200 markers is still quite few. Moreover, I have no idea how you have analysed and processed your data, and these are critical steps in map construction.

    Cheers,
    Pasi

     
  • Tianzhu Xiong

    Tianzhu Xiong - 2024-04-24

    Hi Pasi,

    Thanks for your help in the previous posts. I have gone back to do more research on this dataset. As you said, 200-300 markers per chromosome is indeed still few. This cross involves ~180 progeny from 6 families (parents and offspring only, no grandparents).

    A previous researcher working on the same called data used the following command for OrderMarkers2; let's call it Version A:

    java OrderMarkers2 map=map.txt data=- numThreads=2 recombination2=0 chromosome=1 informativeMask=1 minError=0.1 improveOrder=0 evaluateOrder=physical_order.txt usePhysical=1 0.1

    My command for the same called data was initially the following (Version B):

    java -cp OrderMarkers2 evaluateOrder=physical_order.txt data=- recombination2=0 recombination1=0.001 interference1=0.001 outputPhasedData=1 useMorgan=1 improveOrder=0 numThreads=2 informativeMask=13

    But Version B gives me far too many recombinations (sometimes as many as 9 breakpoints in one individual), even though automatic scaling is supposed to take care of this. After comparing the two commands, it seems that minError=0.1 is the key parameter in Version A that avoids this problem: when I added minError=0.1 to Version B, it generated nearly identical results to Version A.

    However, I am confused about two things:

    1) Why does a large value of minError suppress excessive recombination? I think minError=0.1 is extremely high since the default is minError=0.000001.

    2) Even though a large value of minError seems to fix the map, there is no prior basis for deciding how large it should be. For instance, if I set minError=0.3, there is even less recombination per chromosome, to the point that the map distance is usually 30-40 cM, while minError=0.1 gives 70 cM on larger chromosomes. And again, in these cases the recombination probabilities between chromosome ends are atypical: with minError=0.3 the chromosome ends recombine with frequency << 0.5, and with minError=0.1 they recombine with frequency >> 0.5. So it seems that changing minError merely changes the number of haplotypes that display single recombination breakpoints, rather than recovering the ~0.5 end-to-end recombination frequency usually seen in butterflies.

    Maybe I should just go back to raw sequencing data and redo everything, but any insight on the questions above would be really helpful! Thank you!

    Best,
    Tianzhu

     
  • Pasi Rastas

    Pasi Rastas - 2024-04-26

    Dear Tianzhu,

    The parameter minError controls how much you trust your data. It caps the genotype quality at this value; 0.1 corresponds to a phred score of 10. If this is RADseq data, a very useful parameter is proximityScale: a value of 50 or 100 might work for most data, scaling down the effect of SNPs that are too close together (from the same RAD site).
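    One way to build intuition for why a large minError suppresses recombination is a toy comparison for a single discordant marker between concordant neighbours: explain it as two flanking crossovers, or as one genotype error. This is a deliberately simplified model, not Lep-MAP3's likelihood; it assumes, following the description above, that minError acts as a floor on the per-genotype error probability, and the value r = 0.004 (roughly 75 cM over ~200 markers) is illustrative:

```python
import math


def phred(error_prob: float) -> float:
    """Phred-scaled quality: Q = -10 * log10(error probability)."""
    return -10.0 * math.log10(error_prob)


def prefer_error(r: float, call_error: float, min_error: float) -> bool:
    """Toy model for one discordant marker between concordant neighbours.

    Double crossover: two flanking crossovers (r each), genotype taken as correct (1 - eps).
    Genotype error:   no crossovers (1 - r each), genotype taken as wrong (eps).
    eps is the call's error probability floored at min_error (an assumption of this sketch).
    """
    eps = max(call_error, min_error)
    ll_double_co = 2 * math.log(r) + math.log(1 - eps)
    ll_error = 2 * math.log(1 - r) + math.log(eps)
    return ll_error > ll_double_co


def break_even_error(r: float) -> float:
    """Error probability at which both explanations are equally likely: eps/(1-eps) = (r/(1-r))**2."""
    x = (r / (1 - r)) ** 2
    return x / (1 + x)


r = 0.004          # illustrative adjacent-marker recombination fraction
call_error = 1e-7  # a very confident genotype call
print(f"minError=0.1 is phred {phred(0.1):.0f}")
print(f"break-even error for r={r}: {break_even_error(r):.2e}")
for min_error in (1e-6, 0.1, 0.3):
    choice = "genotype error" if prefer_error(r, call_error, min_error) else "double crossover"
    print(f"minError={min_error}: isolated discordant call explained as {choice}")
```

    Under these assumptions the break-even error is about (r/(1 - r))^2, roughly 1.6e-5 for r = 0.004, so the default minError of 0.000001 lets confident isolated calls become double crossovers, while 0.1 makes the error explanation far cheaper. A larger floor such as 0.3 may also make short runs of discordant markers cheaper to explain as errors, which would be consistent with the shorter maps reported above.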

    I would be more interested in how you generated the input genotype data. Maybe the problem is there?

    Cheers,
    Pasi

     
