Hi Pasi,
I am new at working with linkage maps and therefore I would like to know if what I have done so far makes sense.
I am working with 6 families; two of them have the same father and the number of progeny for each of them is very different. Looks like this
Fam Male Female Progeny
N2 SL387 SL389 29
N3 SL390 SL392 98
N4 SL393 SL394 3
N5 SL395 SL397 224
N6 SL400 SL399 15
N7 SL400 SL402 6
I have a total of 1,356,797 SNPs obtained by following the GATK pipeline. The 1,356,797 SNPs are all bi-allelic and with no missing data.
The species I am working on has 24 chromosomes.
The commands that I have used are:
java -cp bin/ ParentCall2 data=Pedigree_LM.txt vcfFile=SNPS_LepMap3.vcf halfSibs=1 removeNonInformative=1 > data_PC.call
No grandparents present in family N2
No grandparents present in family N3
No grandparents present in family N4
No grandparents present in family N5
No grandparents present in family N6
No grandparents present in family N7
Number of individuals = 387
Number of families = 6
parent SL400 occurs 2 times in the pedigree
Number of called markers = 1308436 (1308436 informative)
Number of called Z/X markers = 0
java -cp bin/ Filtering2 data=data_PC.call dataTolerance=0.01 > data_F.call
chi^2 limits are 6.634765625, 9.2099609375, 11.3447265625
No grandparents present in family N2
No grandparents present in family N3
No grandparents present in family N4
No grandparents present in family N5
No grandparents present in family N6
No grandparents present in family N7
Number of individuals = 387
Number of families = 6
Number of markers = 1308436
Since I didn´t know which lodLimit to use, I ran SeparateChromosomes2 with different lodLimits (lodLimit= 3, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65 and 70) to compare.
java -cp bin/ SeparateChromosomes2 data=data_F.call lodLimit=(3-70) numThreads=16 > SC_map(3-70).txt
I attached the file with the comparison of all SeparateChromosomes2 with the different LodLimits
Questions:
According to the results, is it correct to believe that lodLimit=50 would be the best option to use?
If so, LGs 1-24 will be the main groups and the rest will be considered the tail, right?
LOD limits 40-60 seem reasonable. You could even take LOD=35 and run SeparateChromosomes2 on it again with map=map35.txt and lodLimit=40 (to split the largest group into two, you could try LOD between 35 and 40 for this).
Sometimes I have checked whether the markers in the small groups (tail) are always next to some larger groups and added them to this larger group. This way I have noticed that some small groups are obviously noise (repeats) as they occur next to many larger groups.
Cheers,
Pasi
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I have a few more questions. For the next step (JoinSingles2All) how can I choose the LodLimit to use? Should it start with a LodLimit larger than the one I used for SeparateChoromosomes2?
For example, if I use the results from SeparateChromosomes2 with lodLim=50, should I use for JoinSingles2All a lodLimit=51 ?
Thanks in advance!
Cheers,
Lidia
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Thank you very much for your quick answer and suggestions. I have done the JoinSingles2All step as you suggested, one with lodDifference=5 and one with lodDifference=10.
Questions:
Is it correct to enable the iteration option?
In SC2 I had a total of 582,263 markers in group 0 and after JoinSingles2All I was able to reduced this number. I noticed that the number of markers between both runs (JoinSingles2All with lodDifference=5 or 10) are the same in most groups. However, the groups 27, 28 and 80 increased considerably compared to the results from SC2.
Do you have any idea of how to intepret this or how I can deal with this?
Finally, how can I choose which one is better? Is it correct to believe that lodDifference=10 is better because 1) the difference in group 0 between both runs is of only 160 markers, 2) the overall number of markers in the 24 groups of interest is larger and 3) the number of markers in the tail is smaller?
I attach a file with the results from SeparateChromosomes2 and both JoinSingles2All.
Just a quick reply. Maybe it makes more sense to remove groups 25..N before JC2. This way the markers in the smaller groups could be added as well. You can remove these small groups with simple scripting.
The differences what you see can be already seen in the results of SC2 for varying LOD.
Cheers,
Pasi
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Just to make sure I understood correctly. From the file obtained in SC2 (SC_map50.txt) I should change all the groups 25..N to 0, so that they are counted as singles and they are taken into consideration in the JoinSingles2All step. Is this correct?
Thanks!
Cheers,
Lidia
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
First of all thank you very much for all your help. I have done as suggested and it looks alright. However, for the OrderMarkers2 step, I am getting the following warning for most of the chromosomes:
Warning: Inconsistencies in phasing, please try using randomPhase=1 and/or hyperPhaser=1 in OrderMarkers2
Enabling hyperPhaser...
My command line is:
java -cp bin/ OrderMarkers2 map=JS_map50_LL30_LD10.txt data=data_F.call chromosome=1 numThreads=16> OM_R1_LG1.txt 2> OM_R1_recombinations_LG1.txt
Could you please tell how to interpret this warning and how to solve the issue? I don´t know what randomPhase or hyperPhaser will do and which one would be the best option to use.
Additionally, should I include the parameter scale in my command line? And if so, how do I decide which value to use?
Sorry to bother you again. Do you by chance have any suggestion concerning my previous message? I would like to make sure I do things the best possible way.
Thanks in advance!
Cheers,
Lidia
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
For some data, the data phasing converges to a local optimum. If this happens so that Lep-MAP3 finds two different solutions for the same marker order, this warning is output.
You can provide parameters "hyperPhaser=1" and/or increase phasing iterations by "phasingIterations=3" to correct for this. Or you can evaluate the output order again with "improveOrder=0 evaluateOrder=file randomPhase=1" and see if the likelihood improves. In most cases, the likelihood difference is small and the map is usable without any correction. However, I have had some very difficult data (one chromosome of one species) where multiple runs were required.
Cheers,
Pasi
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Thank you very much for your answer. Just to make sure I understood correctly...
So far, I did 10 replicates of OrderMarkers2 for each LG. In order to not have to repeat this step including "hyperPhaser=1" and/or "phasingIterations=3", I could:
First, choose the replicate with the highest likelihood for each LG. Then, rerun OrderMarkers2 for the chosen replicate with java -cp bin/ OrderMarkers2 evaluateOrder=OM_R${i}_LG${j}.txt data=data_F.call improveOrder=0 randomPhase=1 numThreads=16 > EO_R${i}_LG${j}.txt
Is this correct? Or should I rerun the OrderMarkers2 for all the replicates with "improveOrder=0 evaluateOrder=file randomPhase=1" and then choose the replicate with the largest likelihood for each LG.
Thanks a lot!!
Cheers,
Lidia
Last edit: Lidia de los Ríos Pérez 2020-03-09
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
The final likelihood is in order1.map, however, the likelhood used to choose the solution is printed in order1.err (e.g. *score = NUM). This might not be the same, sometimes the likelihood is even better in the map than in err.
If you run LM3 for 10 times and choose the solution with highest likelihood (in the maps), it is probably good enough... If you are worried, you can re-evaluate all maps to have another 10 replicates (with randomPhase/hyperPhaser/numPhasingIterations).
Using the grandparental phase (if you have grandparental data, grandparentPhase=1) also removes this problem but reduces the number of markers.
Cheers,
Pasi
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Hi Pasi,
I am new at working with linkage maps and therefore I would like to know if what I have done so far makes sense.
I am working with 6 families; two of them have the same father and the number of progeny for each of them is very different. Looks like this
Fam Male Female Progeny
N2 SL387 SL389 29
N3 SL390 SL392 98
N4 SL393 SL394 3
N5 SL395 SL397 224
N6 SL400 SL399 15
N7 SL400 SL402 6
I have a total of 1,356,797 SNPs obtained by following the GATK pipeline. The 1,356,797 SNPs are all bi-allelic and with no missing data.
The species I am working on has 24 chromosomes.
The commands that I have used are:
Since I didn´t know which lodLimit to use, I ran SeparateChromosomes2 with different lodLimits (lodLimit= 3, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65 and 70) to compare.
Questions:
According to the results, is it correct to believe that lodLimit=50 would be the best option to use?
If so, LGs 1-24 will be the main groups and the rest will be considered the tail, right?
Thanks a lor for your support.
Best regards,
Lidia
Last edit: Lidia de los Ríos Pérez 2020-01-27
Dear Lidia,
Thank you for your question. The results look ok.
LOD limits 40-60 seem reasonable. You could even take LOD=35 and run SeparateChromosomes2 on it again with map=map35.txt and lodLimit=40 (to split the largest group into two, you could try LOD between 35 and 40 for this).
Sometimes I have checked whether the markers in the small groups (tail) are always next to some larger groups and added them to this larger group. This way I have noticed that some small groups are obviously noise (repeats) as they occur next to many larger groups.
Cheers,
Pasi
Dear Pasi,
Thank you very much for your quick answer.
I have a few more questions. For the next step (JoinSingles2All) how can I choose the LodLimit to use? Should it start with a LodLimit larger than the one I used for SeparateChoromosomes2?
For example, if I use the results from SeparateChromosomes2 with lodLim=50, should I use for JoinSingles2All a lodLimit=51 ?
Thanks in advance!
Cheers,
Lidia
Dear Lidia,
You can use a smaller LOD limit with JoinSingles2All. For example, use 50 for SC2 and then 30 for JS2 (and maybe lodDifference=5 or 10).
Cheers,
Pasi
Dear Pasi,
Thank you very much for your quick answer and suggestions. I have done the JoinSingles2All step as you suggested, one with lodDifference=5 and one with lodDifference=10.
Questions:
Is it correct to enable the iteration option?
In SC2 I had a total of 582,263 markers in group 0 and after JoinSingles2All I was able to reduced this number. I noticed that the number of markers between both runs (JoinSingles2All with lodDifference=5 or 10) are the same in most groups. However, the groups 27, 28 and 80 increased considerably compared to the results from SC2.
Do you have any idea of how to intepret this or how I can deal with this?
Finally, how can I choose which one is better? Is it correct to believe that lodDifference=10 is better because 1) the difference in group 0 between both runs is of only 160 markers, 2) the overall number of markers in the 24 groups of interest is larger and 3) the number of markers in the tail is smaller?
I attach a file with the results from SeparateChromosomes2 and both JoinSingles2All.
Thanks again for your help.
Lidia
Dear Lidia,
Just a quick reply. Maybe it makes more sense to remove groups 25..N before JC2. This way the markers in the smaller groups could be added as well. You can remove these small groups with simple scripting.
The differences what you see can be already seen in the results of SC2 for varying LOD.
Cheers,
Pasi
Dear Pasi,
Just to make sure I understood correctly. From the file obtained in SC2 (SC_map50.txt) I should change all the groups 25..N to 0, so that they are counted as singles and they are taken into consideration in the JoinSingles2All step. Is this correct?
Thanks!
Cheers,
Lidia
Dear Lidia,
Yes, this is correct.
Cheers,
Pasi
Hi Pasi,
First of all thank you very much for all your help. I have done as suggested and it looks alright. However, for the OrderMarkers2 step, I am getting the following warning for most of the chromosomes:
Warning: Inconsistencies in phasing, please try using randomPhase=1 and/or hyperPhaser=1 in OrderMarkers2
Enabling hyperPhaser...
My command line is:
java -cp bin/ OrderMarkers2 map=JS_map50_LL30_LD10.txt data=data_F.call chromosome=1 numThreads=16> OM_R1_LG1.txt 2> OM_R1_recombinations_LG1.txt
I attach the file with the warning.
Thanks in advance,
Cheers,
Lidia
Last edit: Lidia de los Ríos Pérez 2020-02-19
Hi Pasi,
Sorry to bother you again. Do you by chance have any suggestion concerning my previous message? I would like to make sure I do things the best possible way.
Thanks in advance!
Cheers,
Lidia
Dear Lidia,
For some data, the data phasing converges to a local optimum. If this happens so that Lep-MAP3 finds two different solutions for the same marker order, this warning is output.
You can provide parameters "hyperPhaser=1" and/or increase phasing iterations by "phasingIterations=3" to correct for this. Or you can evaluate the output order again with "improveOrder=0 evaluateOrder=file randomPhase=1" and see if the likelihood improves. In most cases, the likelihood difference is small and the map is usable without any correction. However, I have had some very difficult data (one chromosome of one species) where multiple runs were required.
Cheers,
Pasi
Hi Pasi,
Thank you very much for your answer. Just to make sure I understood correctly...
So far, I did 10 replicates of OrderMarkers2 for each LG. In order to not have to repeat this step including "hyperPhaser=1" and/or "phasingIterations=3", I could:
First, choose the replicate with the highest likelihood for each LG. Then, rerun OrderMarkers2 for the chosen replicate with java -cp bin/ OrderMarkers2 evaluateOrder=OM_R${i}_LG${j}.txt data=data_F.call improveOrder=0 randomPhase=1 numThreads=16 > EO_R${i}_LG${j}.txt
Is this correct? Or should I rerun the OrderMarkers2 for all the replicates with "improveOrder=0 evaluateOrder=file randomPhase=1" and then choose the replicate with the largest likelihood for each LG.
Thanks a lot!!
Cheers,
Lidia
Last edit: Lidia de los Ríos Pérez 2020-03-09
Dear Lidia,
If you run OrderMarkers2 something like this:
java OrderMarkers2 ... >order1.map 2>order1.err
The final likelihood is in order1.map, however, the likelhood used to choose the solution is printed in order1.err (e.g. *score = NUM). This might not be the same, sometimes the likelihood is even better in the map than in err.
If you run LM3 for 10 times and choose the solution with highest likelihood (in the maps), it is probably good enough... If you are worried, you can re-evaluate all maps to have another 10 replicates (with randomPhase/hyperPhaser/numPhasingIterations).
Using the grandparental phase (if you have grandparental data, grandparentPhase=1) also removes this problem but reduces the number of markers.
Cheers,
Pasi