Menu

SeparateChromosomes2

2020-01-27
2020-03-09
  • Lidia de los Ríos Pérez

    Hi Pasi,
    I am new at working with linkage maps and therefore I would like to know if what I have done so far makes sense.
    I am working with 6 families; two of them have the same father and the number of progeny for each of them is very different. Looks like this
    Fam Male Female Progeny
    N2 SL387 SL389 29
    N3 SL390 SL392 98
    N4 SL393 SL394 3
    N5 SL395 SL397 224
    N6 SL400 SL399 15
    N7 SL400 SL402 6

    I have a total of 1,356,797 SNPs obtained by following the GATK pipeline. The 1,356,797 SNPs are all bi-allelic and with no missing data.

    The species I am working on has 24 chromosomes.

    The commands that I have used are:

    java -cp bin/ ParentCall2 data=Pedigree_LM.txt vcfFile=SNPS_LepMap3.vcf halfSibs=1 removeNonInformative=1 > data_PC.call
    No grandparents present in family N2
    No grandparents present in family N3
    No grandparents present in family N4
    No grandparents present in family N5
    No grandparents present in family N6
    No grandparents present in family N7
    Number of individuals = 387
    Number of families = 6
    parent SL400 occurs 2 times in the pedigree
    Number of called markers = 1308436 (1308436 informative)
    Number of called Z/X markers = 0

    java -cp bin/ Filtering2 data=data_PC.call dataTolerance=0.01 > data_F.call
    chi^2 limits are 6.634765625, 9.2099609375, 11.3447265625
    No grandparents present in family N2
    No grandparents present in family N3
    No grandparents present in family N4
    No grandparents present in family N5
    No grandparents present in family N6
    No grandparents present in family N7
    Number of individuals = 387
    Number of families = 6
    Number of markers = 1308436

    Since I didn´t know which lodLimit to use, I ran SeparateChromosomes2 with different lodLimits (lodLimit= 3, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65 and 70) to compare.

    java -cp bin/ SeparateChromosomes2 data=data_F.call lodLimit=(3-70) numThreads=16 > SC_map(3-70).txt
    I attached the file with the comparison of all SeparateChromosomes2 with the different LodLimits

    Questions:
    According to the results, is it correct to believe that lodLimit=50 would be the best option to use?
    If so, LGs 1-24 will be the main groups and the rest will be considered the tail, right?

    Thanks a lor for your support.

    Best regards,
    Lidia

     

    Last edit: Lidia de los Ríos Pérez 2020-01-27
  • Pasi Rastas

    Pasi Rastas - 2020-01-27

    Dear Lidia,

    Thank you for your question. The results look ok.

    LOD limits 40-60 seem reasonable. You could even take LOD=35 and run SeparateChromosomes2 on it again with map=map35.txt and lodLimit=40 (to split the largest group into two, you could try LOD between 35 and 40 for this).

    Sometimes I have checked whether the markers in the small groups (tail) are always next to some larger groups and added them to this larger group. This way I have noticed that some small groups are obviously noise (repeats) as they occur next to many larger groups.

    Cheers,
    Pasi

     
  • Lidia de los Ríos Pérez

    Dear Pasi,

    Thank you very much for your quick answer.

    I have a few more questions. For the next step (JoinSingles2All) how can I choose the LodLimit to use? Should it start with a LodLimit larger than the one I used for SeparateChoromosomes2?

    For example, if I use the results from SeparateChromosomes2 with lodLim=50, should I use for JoinSingles2All a lodLimit=51 ?

    Thanks in advance!

    Cheers,
    Lidia

     
  • Pasi Rastas

    Pasi Rastas - 2020-01-27

    Dear Lidia,

    You can use a smaller LOD limit with JoinSingles2All. For example, use 50 for SC2 and then 30 for JS2 (and maybe lodDifference=5 or 10).

    Cheers,
    Pasi

     
  • Lidia de los Ríos Pérez

    Dear Pasi,

    Thank you very much for your quick answer and suggestions. I have done the JoinSingles2All step as you suggested, one with lodDifference=5 and one with lodDifference=10.

    java -cp bin/ JoinSingles2All map=SC_map50.txt data=data_F.call lodLimit=30 lodDifference=5 iterate=1 numThreads=16> JS_map50_LL30_LD5.txt

    java -cp bin/ JoinSingles2All map=SC_map50.txt data=data_F.call lodLimit=30 lodDifference=10 iterate=1 numThreads=16> JS_map50_LL30_LD10.txt

    Questions:
    Is it correct to enable the iteration option?

    In SC2 I had a total of 582,263 markers in group 0 and after JoinSingles2All I was able to reduced this number. I noticed that the number of markers between both runs (JoinSingles2All with lodDifference=5 or 10) are the same in most groups. However, the groups 27, 28 and 80 increased considerably compared to the results from SC2.
    Do you have any idea of how to intepret this or how I can deal with this?

    Finally, how can I choose which one is better? Is it correct to believe that lodDifference=10 is better because 1) the difference in group 0 between both runs is of only 160 markers, 2) the overall number of markers in the 24 groups of interest is larger and 3) the number of markers in the tail is smaller?

    I attach a file with the results from SeparateChromosomes2 and both JoinSingles2All.

    Thanks again for your help.

    Lidia

     
    • Pasi Rastas

      Pasi Rastas - 2020-01-28

      Dear Lidia,

      Just a quick reply. Maybe it makes more sense to remove groups 25..N before JC2. This way the markers in the smaller groups could be added as well. You can remove these small groups with simple scripting.

      The differences what you see can be already seen in the results of SC2 for varying LOD.

      Cheers,
      Pasi

       
  • Lidia de los Ríos Pérez

    Dear Pasi,

    Just to make sure I understood correctly. From the file obtained in SC2 (SC_map50.txt) I should change all the groups 25..N to 0, so that they are counted as singles and they are taken into consideration in the JoinSingles2All step. Is this correct?

    Thanks!

    Cheers,
    Lidia

     
  • Pasi Rastas

    Pasi Rastas - 2020-01-29

    Dear Lidia,

    Yes, this is correct.

    Cheers,
    Pasi

     
  • Lidia de los Ríos Pérez

    Hi Pasi,

    First of all thank you very much for all your help. I have done as suggested and it looks alright. However, for the OrderMarkers2 step, I am getting the following warning for most of the chromosomes:

    Warning: Inconsistencies in phasing, please try using randomPhase=1 and/or hyperPhaser=1 in OrderMarkers2
    Enabling hyperPhaser...

    My command line is:
    java -cp bin/ OrderMarkers2 map=JS_map50_LL30_LD10.txt data=data_F.call chromosome=1 numThreads=16> OM_R1_LG1.txt 2> OM_R1_recombinations_LG1.txt

    • Could you please tell how to interpret this warning and how to solve the issue? I don´t know what randomPhase or hyperPhaser will do and which one would be the best option to use.
    • Additionally, should I include the parameter scale in my command line? And if so, how do I decide which value to use?

    I attach the file with the warning.

    Thanks in advance,

    Cheers,
    Lidia

     

    Last edit: Lidia de los Ríos Pérez 2020-02-19
  • Lidia de los Ríos Pérez

    Hi Pasi,

    Sorry to bother you again. Do you by chance have any suggestion concerning my previous message? I would like to make sure I do things the best possible way.

    Thanks in advance!
    Cheers,
    Lidia

     
  • Pasi Rastas

    Pasi Rastas - 2020-03-09

    Dear Lidia,

    For some data, the data phasing converges to a local optimum. If this happens so that Lep-MAP3 finds two different solutions for the same marker order, this warning is output.

    You can provide parameters "hyperPhaser=1" and/or increase phasing iterations by "phasingIterations=3" to correct for this. Or you can evaluate the output order again with "improveOrder=0 evaluateOrder=file randomPhase=1" and see if the likelihood improves. In most cases, the likelihood difference is small and the map is usable without any correction. However, I have had some very difficult data (one chromosome of one species) where multiple runs were required.

    Cheers,
    Pasi

     
  • Lidia de los Ríos Pérez

    Hi Pasi,

    Thank you very much for your answer. Just to make sure I understood correctly...

    So far, I did 10 replicates of OrderMarkers2 for each LG. In order to not have to repeat this step including "hyperPhaser=1" and/or "phasingIterations=3", I could:

    First, choose the replicate with the highest likelihood for each LG. Then, rerun OrderMarkers2 for the chosen replicate with java -cp bin/ OrderMarkers2 evaluateOrder=OM_R${i}_LG${j}.txt data=data_F.call improveOrder=0 randomPhase=1 numThreads=16 > EO_R${i}_LG${j}.txt

    Is this correct? Or should I rerun the OrderMarkers2 for all the replicates with "improveOrder=0 evaluateOrder=file randomPhase=1" and then choose the replicate with the largest likelihood for each LG.

    Thanks a lot!!

    Cheers,
    Lidia

     

    Last edit: Lidia de los Ríos Pérez 2020-03-09
  • Pasi Rastas

    Pasi Rastas - 2020-03-09

    Dear Lidia,

    If you run OrderMarkers2 something like this:

    java OrderMarkers2 ... >order1.map 2>order1.err

    The final likelihood is in order1.map, however, the likelhood used to choose the solution is printed in order1.err (e.g. *score = NUM). This might not be the same, sometimes the likelihood is even better in the map than in err.

    If you run LM3 for 10 times and choose the solution with highest likelihood (in the maps), it is probably good enough... If you are worried, you can re-evaluate all maps to have another 10 replicates (with randomPhase/hyperPhaser/numPhasingIterations).

    Using the grandparental phase (if you have grandparental data, grandparentPhase=1) also removes this problem but reduces the number of markers.

    Cheers,
    Pasi

     

Log in to post a comment.

MongoDB Logo MongoDB