I have a selfing population where I have the genotypes of their grandparents but unfortunately not the parent. On wiki it says "Data for the single parent is not really needed, but the grandparents (say two individuals from different lines crossed to form the parent) can be used. ". How does it work? Do I put the grandparents as parents in the pedigree? Thank you.
Best,
Huan
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
You have to add two dummy parents anyway. Just make a F2 pedigree with "dummy1" and "dummy2" as parents and your selfing population as their offspring. You can add the grandparents for dummy1 and dummy2. I have attached a small example pedigree. Please read the wiki carefully how to put the phasing parameters in OrderMarkers2.
Thank you very much for your prompt reply! It is really rare nowadays to have such helpful and reachable developers!
I didn't know it was OK to have individuals in the pedigree but missing in the vcf. Just to make sure I understand you correctly, I made a dummy vcf with the .ped you suggested and run ParentCall2 and this is what I got:
Warning: Different number of grandparents (4 and 2) in family F
Found 2 grandparents in family F
Number of individuals = 6
Number of families = 1
Warning: Individual D1 not contained in the data, set to all missing
Warning: Individual D2 not contained in the data, set to all missing
Number of called markers = 1494 (1494 informative)
Number of called Z/X markers = 0
Please kindly let me know if I understand this correctly.
Best,
Huan
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I was following your suggestions in this post "https://sourceforge.net/p/lep-map3/discussion/general/thread/48bc87a0e9/" to split my largest group (group1). I used the map I got using lodLimit=25 and set the lodLimit to be 30:
Out of my >100k markers, >99.9% of the markers had the same assignment. However the one with a map provided was about 4 times faster. If this is the case, should I always start with some map? How big can the gap be? Say can I use a map constructed with lodLimit=25 and use lodLimit=35 (map=LOD25.txt lodLimit=35)?
I'm asking because I have WGS data and >1M markers (after removing markers that are more than 99% correlated). Performance can really be an issue (already using 64 cores). It seems like when lodLimit increase, the computing time also increases. Therefore I was hoping to always use a map file to speed up the process if it won't affect the accuracy.
Looking forward to your advice.
Best,
Huan
Best,
Huan
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
With the map parameter, SeparateChromosomes2 computes LOD scores only for one linkage group of the map (default lg=1). So there are much fewer markers and therefore it is faster.
You can also yield good speedups with samplePairs parameter. For million markers I typically use samplePairs=0.1 to achieve 10x speedup. (I have used 0.01 as well to get results even faster). When this parameter is set, SC2 will sample pair-wise comparisons with this rate.
Cheers,
Pasi
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
The SeparateChromosomes2 yield very nice results, matching the number of chromosomes I have nicely.
I am currently at the JoinSingles2All stage where I am trying to put the singletons back to the chromosomes. I have been playing with lodLimit and lodDifference. From what I understand, the lower they are, the less singletons will remain. The results agrees with this assumption, except that when I tried lodDifference = 0 (the default setting I believe), there were more singletons left than lodDifference = 1 (with everything else being the same). Any idea what could be going on?
Best,
Huan
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Yes, this is expected and possible behaviour. If the LOD score difference between best and second best group is larger than the lodDifference and higher exceed lodLimit, marker is put into the higher LOD group. Without lodDifference, if both LODs are higher than lodLimit, marker is not put to the map.
Cheers,
Pasi
edited: lodLimit=>LodDifference
Last edit: Pasi Rastas 2025-05-26
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Thank you very much Pasi for the clarification. I am really glad that I asked. Now I understand the results better!
I am current at the OrderMarkers2 step and here are some of my observations/speculations that I hope to verify with you.
Supposedly I ran java -cp bin/ OrderMarkers2 map=map5_js.txt data=data_f.call chromosome=1 >order1.txtthe chromosome=1 option is referring to the assignment in map5_js.txt, not the first column in data_f.call, is it?
I have multiple families in my pedigree file; one with F1s and several with F2s. For F1s, I have the parents (F0s) and no grandparents. For the F2s, I do have grandparents (F0s). I used the grandparentPhase=1 in OrderMarkers2, and in the output I can see how many recombination events for all the F2s, but nothing was said about the F1s. Is it because "No grandparents present in family F1"? Please let me know if I am not clear.
Thank you!
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Yes, chromosome parameter refers to the linkage group numbers in your map5_js.txt file.
Currently, there is no way to have both grandparent and normally phased markers, so you should use selfingPhase=1 with your data (your data was selfing, right?).
Cheers,
Pasi
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
In my current dataset, I have several "families" and yes some of them are selfing data, but there are also crossings between F1s. Please see a demo.ped file attached:
F1: Grandparents and several F1s including P1 and P2.
F2_1: crossing between P1 and P2 where P1 is male (pollen) and P2 is female.
F2_2: still crossing between P1 and P2 where P2 is male (pollen) and P1 is female. Here I added duplicated P1 and P2 in the vcf (P1.dup and P2.dup) so their sex code does not conflict with P1 and P2 in F2_1.
F2_3: selfing of P1 (P1 X P1.dup)
F2_4: selfing of P2 (P2.dup X P2).
I chose to use grandparentsPhase=1 over selfingPhase=1 for the whole dataset because in wiki it reads "With data on these grandparents use grandparentPhase=1 in OrderMarkers2, without grandparental data use selfingPhase=1.", and I do have grandparents data for F2_3 and F2_4.
Since one can only use grandparentsPhase=1 or selfingPhase=1, should I built two maps separately, one with F1, F2_1 and F2_2 (grandparentsPhase=1), one with F2_3 and F2_4 (selfingPhase=1)?
The grandparentsPhase=1 works both with selfing and non-selfing data (when you have grandparents). It will remove parts of the data without grandparents.
If you have such mixed types of families, you probably have to make separate maps for different families. (Lep-MAP3 could made to work with these different phasing types, but currently you cannot input the variable phasing information for different families. )
Also note that male and female map positions make sense only on non-selfing crosses (F1 and F2). For selfing crosses, there Lep-MAP3 will output male and female positions but crossovers are (more or less) arbitrary set between dummy parents.
Cheers,
Pasi
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
OK now I understand why my F1 data wasn't used since there are no grandparents for them. However they are also not selfing data. So I guess the only way to include them is to not use any phasing information and let the program try all four possibilities, in a separate map?
Noted "that male and female map positions make sense only on non-selfing crosses (F1 and F2).". Since my species is actually hermaphrodite and I do not expect any achiasmatic meiosis, I am planning to use "sexAveraged=1". However I don't quite understand "For selfing crosses, the Lep-MAP3 will output male and female positions but crossovers are (more or less) arbitrary set between dummy parents.". Is this why that I see excessive recombination events in selfing ones? Should I just keep to the non-selfing crosses when using grandparentsPhase = 1? Please let me know if I am not making sense.
Best,
Huan
Last edit: hfan 2025-07-01
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Dear Pasi,
I have a selfing population where I have the genotypes of their grandparents but unfortunately not the parent. On wiki it says "Data for the single parent is not really needed, but the grandparents (say two individuals from different lines crossed to form the parent) can be used. ". How does it work? Do I put the grandparents as parents in the pedigree? Thank you.
Best,
Huan
Dear Huan,
Thank you for your question.
You have to add two dummy parents anyway. Just make a F2 pedigree with "dummy1" and "dummy2" as parents and your selfing population as their offspring. You can add the grandparents for dummy1 and dummy2. I have attached a small example pedigree. Please read the wiki carefully how to put the phasing parameters in OrderMarkers2.
Cheers,
Pasi
Hi Pasi,
Thank you very much for your prompt reply! It is really rare nowadays to have such helpful and reachable developers!
I didn't know it was OK to have individuals in the pedigree but missing in the vcf. Just to make sure I understand you correctly, I made a dummy vcf with the .ped you suggested and run
ParentCall2
and this is what I got:Warning: Different number of grandparents (4 and 2) in family F
Found 2 grandparents in family F
Number of individuals = 6
Number of families = 1
Warning: Individual D1 not contained in the data, set to all missing
Warning: Individual D2 not contained in the data, set to all missing
Number of called markers = 1494 (1494 informative)
Number of called Z/X markers = 0
Please kindly let me know if I understand this correctly.
Best,
Huan
Dear Huan,
Seems to work correctly. Happy mapping!
Cheers,
Pasi
Thank you Pasi.
One more question, in the output of SeparateChromosomes2, does 0 means singletons (not assigned to any LGs)?
Best,
Huan
Dear Huan,
Yes, 0s are the singletons.
Cheers,
Pasi
Hi Pasi,
I was following your suggestions in this post "https://sourceforge.net/p/lep-map3/discussion/general/thread/48bc87a0e9/" to split my largest group (group1). I used the map I got using lodLimit=25 and set the lodLimit to be 30:
SeparateChromosomes2 numThreads=10 data=p.call lodLimit=30 sizeLimit=100 map=LOD25.map.txt > LOD30.map25.txt
then I compared its result to just using lodLimit=30 (without any map):
SeparateChromosomes2 numThreads=10 data=p.call lodLimit=30 sizeLimit=100 > LOD30.txt
Out of my >100k markers, >99.9% of the markers had the same assignment. However the one with a map provided was about 4 times faster. If this is the case, should I always start with some map? How big can the gap be? Say can I use a map constructed with lodLimit=25 and use lodLimit=35 (
map=LOD25.txt lodLimit=35
)?I'm asking because I have WGS data and >1M markers (after removing markers that are more than 99% correlated). Performance can really be an issue (already using 64 cores). It seems like when lodLimit increase, the computing time also increases. Therefore I was hoping to always use a map file to speed up the process if it won't affect the accuracy.
Looking forward to your advice.
Best,
Huan
Best,
Huan
Dear Huan,
With the map parameter, SeparateChromosomes2 computes LOD scores only for one linkage group of the map (default lg=1). So there are much fewer markers and therefore it is faster.
You can also yield good speedups with samplePairs parameter. For million markers I typically use samplePairs=0.1 to achieve 10x speedup. (I have used 0.01 as well to get results even faster). When this parameter is set, SC2 will sample pair-wise comparisons with this rate.
Cheers,
Pasi
Thank you Pasi for the response. I should have read the options more carefully!
I will explore the samplePairs as well. Thanks for the advice.
Best,
Huan
Hi Pasi, sorry more questions.
The SeparateChromosomes2 yield very nice results, matching the number of chromosomes I have nicely.
I am currently at the JoinSingles2All stage where I am trying to put the singletons back to the chromosomes. I have been playing with lodLimit and lodDifference. From what I understand, the lower they are, the less singletons will remain. The results agrees with this assumption, except that when I tried lodDifference = 0 (the default setting I believe), there were more singletons left than lodDifference = 1 (with everything else being the same). Any idea what could be going on?
Best,
Huan
Dear Huan,
Yes, this is expected and possible behaviour. If the LOD score difference between best and second best group is larger than the lodDifference and higher exceed lodLimit, marker is put into the higher LOD group. Without lodDifference, if both LODs are higher than lodLimit, marker is not put to the map.
Cheers,
Pasi
edited: lodLimit=>LodDifference
Last edit: Pasi Rastas 2025-05-26
Thank you very much Pasi for the clarification. I am really glad that I asked. Now I understand the results better!
I am current at the
OrderMarkers2
step and here are some of my observations/speculations that I hope to verify with you.java -cp bin/ OrderMarkers2 map=map5_js.txt data=data_f.call chromosome=1 >order1.txt
thechromosome=1
option is referring to the assignment inmap5_js.txt
, not the first column indata_f.call
, is it?grandparentPhase=1
inOrderMarkers2
, and in the output I can see how many recombination events for all the F2s, but nothing was said about the F1s. Is it because "No grandparents present in family F1"? Please let me know if I am not clear.Thank you!
Dear Huan,
Cheers,
Pasi
Hi Pasi,
In my current dataset, I have several "families" and yes some of them are selfing data, but there are also crossings between F1s. Please see a demo.ped file attached:
I chose to use grandparentsPhase=1 over selfingPhase=1 for the whole dataset because in wiki it reads "With data on these grandparents use grandparentPhase=1 in OrderMarkers2, without grandparental data use selfingPhase=1.", and I do have grandparents data for F2_3 and F2_4.
Since one can only use grandparentsPhase=1 or selfingPhase=1, should I built two maps separately, one with F1, F2_1 and F2_2 (grandparentsPhase=1), one with F2_3 and F2_4 (selfingPhase=1)?
Please kindly advise.
Best,
Huan
Last edit: hfan 2025-05-28
Dear Huan,
The grandparentsPhase=1 works both with selfing and non-selfing data (when you have grandparents). It will remove parts of the data without grandparents.
If you have such mixed types of families, you probably have to make separate maps for different families. (Lep-MAP3 could made to work with these different phasing types, but currently you cannot input the variable phasing information for different families. )
Also note that male and female map positions make sense only on non-selfing crosses (F1 and F2). For selfing crosses, there Lep-MAP3 will output male and female positions but crossovers are (more or less) arbitrary set between dummy parents.
Cheers,
Pasi
Hi Pasi,
Thank you very much for the reply.
OK now I understand why my F1 data wasn't used since there are no grandparents for them. However they are also not selfing data. So I guess the only way to include them is to not use any phasing information and let the program try all four possibilities, in a separate map?
Noted "that male and female map positions make sense only on non-selfing crosses (F1 and F2).". Since my species is actually hermaphrodite and I do not expect any achiasmatic meiosis, I am planning to use "sexAveraged=1". However I don't quite understand "For selfing crosses, the Lep-MAP3 will output male and female positions but crossovers are (more or less) arbitrary set between dummy parents.". Is this why that I see excessive recombination events in selfing ones? Should I just keep to the non-selfing crosses when using grandparentsPhase = 1? Please let me know if I am not making sense.
Best,
Huan
Last edit: hfan 2025-07-01