Hello,
It would be really great if someone in the linkage map community could help me with tools for converting data from .vcf format to .linkage format. I tried using mega2, but it crashed midway. Then I tried the vcf2posterior.awk script followed by the simpleConvert.awk script provided here. Although this produced a .linkage file, some of the genotype values were expressed in scientific notation ("7.94328e-06"), which the Filtering module could not process. Even if I remove these values, it produces an error (Error: Sex of the parent(s) is not specified). I have no idea how to incorporate the pedigree values into the .linkage file, and it would be really awesome if someone here could guide me.
Most Sincerely
Saurav Baral
Dear Saurav Baral,
Thank you for your email.
The simplest way might be to use the ParentCall2 module in Lep-MAP3. Provide the pedigree in data and the vcf in vcfFile. Compressed vcfs are not yet supported, but they can be used simply like this:
"zcat data.vcf.gz | java ParentCall2 ... data=pedigree.txt vcfFile=- ...".
The pedigree is given as the transpose (of that in the linkage file), and its first and second columns are CHR and POS. ParentCall2 outputs data in a format that can easily be converted to linkage (simpleconvert, transpose, cut) or used in LM3.
Cheers,
Pasi
Dear Pasi,
I tried running the command as you suggested, but I am getting a Java error:
java.lang.NullPointerException
at DataParser.getParentsAndGrandParents(DataParser.java:154)
at DataParser.getParentsAndGrandParents(DataParser.java:108)
at DataParser.getNextLine(DataParser.java:615)
at DataParser.getNextLine(DataParser.java:540)
at ParentCall2.callParents(ParentCall2.java:104)
at ParentCall2.main(ParentCall2.java:711)
Error 504
Error: Unable to load input file or errors in the file
Is there an error in my input files? The vcf I am using was produced with the TASSEL GBS v2 pipeline, and the pedigree data is in the following format:
# family id father mother sex(1M,2F) phenotype(need one column at least, it can be 0)
1 1 0 0 1 2
1 2 0 0 2 1
1 1:250020986 1 2 2 1
1 10:250020986 1 2 2 1
1 11:250020986 1 2 2 1
1 12:250020986 1 2 2 1
1 13:250020986 1 2 2 1
1 14:250020986 1 2 2 1
1 15:250020986 1 2 2 1
1 16:250020986 1 2 2 1
1 17:250020986 1 2 2 1
1 18:250020986 1 2 2 1
1 19:250020986 1 2 2 1
1 2:250020986 1 2 2 1
1 20:250020986 1 2 2 1
1 21:250020986 1 2 2 1
1 22:250020986 1 2 2 1
1 23:250020986 1 2 2 1
1 24:250020986 1 2 2 1
Please let me know where I am going wrong.
Ok,
The pedigree must be transposed; my old-school fix is this (ped.txt is the pedigree you had):
sed -e 's/ /\t/g' ped.txt |./transpose_tab|awk '{print "CHR\tPOS\t" $0}' >ped_t.txt
Then use ped_t.txt for ParentCall2.
Cheers,
Pasi
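If the ./transpose_tab helper used in the command above is not at hand, the same transposition can be sketched with awk alone. This is only a sketch under a few assumptions: the function name transpose_ped is hypothetical (it is not part of Lep-MAP3), lines starting with # are treated as comments and skipped, and fields may be separated by any run of spaces or tabs.

```shell
# Transpose a whitespace-separated pedigree and prefix every output row
# with CHR and POS, using only awk.
# transpose_ped is a hypothetical helper name, not part of Lep-MAP3.
transpose_ped() {
  awk '
    { sub(/\r$/, "") }                 # drop a Windows carriage return, if any
    /^#/ || NF == 0 { next }           # assumption: skip comment and blank lines
    {
      rows++
      if (NF > cols) cols = NF
      for (i = 1; i <= NF; i++) cell[rows, i] = $i
    }
    END {
      # one output row per input column
      for (i = 1; i <= cols; i++) {
        line = "CHR\tPOS"
        for (r = 1; r <= rows; r++) line = line "\t" cell[r, i]
        print line
      }
    }
  ' "$@"
}
```

It would be used as "transpose_ped ped.txt > ped_t.txt". Because awk's default field splitting collapses repeated spaces, doubled spaces in the input do not produce empty columns here.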
Dear Pasi,
I transposed the pedigree file and ran the analysis again. This time I am getting a new error:
No grandparents present in family 1
Number of individuals = 96
Number of families = 1
./.:0,0:0 2
Error 518
Error: vcf file does not contain such field
I tried searching Google for this error but was unable to find anything. I am really sorry for coming to you with so many errors. This is my first time trying to prepare a linkage map, so please forgive my ignorance.
Most sincerely
Saurav Baral
Dear Pasi,
Thank you for your invaluable help. I think the issue was with my vcf file. I filtered the vcf file to retain only biallelic sites, and the program worked wonderfully. I am not exactly sure whether that is what solved my problem, but nonetheless it works now. I am really grateful for your help in this matter.
Most sincerely
Saurav Baral
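For anyone who wants to try the same filtering step, here is a minimal awk sketch that keeps only biallelic records. The thread does not say which tool Saurav used, and it does not confirm that multiallelic sites were the actual cause of Error 518, so treat this as an illustration only. The function name keep_biallelic is hypothetical; it assumes an uncompressed, tab-separated VCF with ALT in the fifth column.

```shell
# Keep VCF header lines plus records whose ALT column (field 5) lists
# exactly one allele. keep_biallelic is a hypothetical helper name.
keep_biallelic() {
  awk -F'\t' '
    /^#/ { print; next }                 # pass header lines through unchanged
    $5 != "." && $5 !~ /,/ { print }     # one ALT allele: no "." and no comma
  ' "$@"
}
```

It would be used as "keep_biallelic data.vcf > data_biallelic.vcf", and the result given to ParentCall2 in vcfFile.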
Dear Pasi and Saurav,
I am trying to use LepMap-3 to generate linkage maps. To use the ParentCall2 module, I first tried to transpose the pedigree file (.txt), but it shows the following error:
sed -e 's//\t/g' ped_lep_map_352.txt awk '{print "CHR\tPOS\t" $0}' >ped_transposed.txt
sed: -e expression #1, char 0: no previous regular expression
My ped file has the following format:
GB 588160:MERGE 0 0 2 1
GB 588271:MERGE 0 0 1 1
GB 160_271_001:C7TMYANXX:6:250514961 588271:MERGE 588160:MERGE 0 1
GB 160_271_002:C7TMYANXX:6:250514962 588271:MERGE 588160:MERGE 0 1
Let me know if I am wrong! Thank you.
regards,
Gaurab
Dear Gaurab,
Please use the command exactly as given above. I think you are missing the space in the sed expression and the transpose step.
Cheers,
Pasi
Dear Pasi,
Thank you so much. Yes, I realized that I forgot to give that space. I transposed the pedigree file but when I ran ParentCall2, I got this error:
~/Desktop/GBS/LepMap$ java -cp ./bin ParentCall2 data=pedigree_transposed.txt vcfFile=Beagle_imputed_phased.vcf removeNonInformative=1 >outputcall.call
No grandparents present in family GB
Number of individuals = 352
Number of families = 1
Error 503
Error: Wrong number of columns in the input file
Note: this is my F1 population with unknown grandparents.
As I mentioned earlier, my pedigree file has the following format:
GB 588160:MERGE 0 0 2 1
GB 588271:MERGE 0 0 1 1
GB 160_271_001:C7TMYANXX:6:250514961 588271:MERGE 588160:MERGE 0 1
GB 160_271_002:C7TMYANXX:6:250514962 588271:MERGE 588160:MERGE 0 1
FYI, I have attached a screenshot of the bottom of the transposed pedigree file. I have doubts about the long lines of zeros (0) there. Please let me know where I am going wrong.
Sincerely,
Gaurab Bhattarai
Dear Gaurab,
The pedigree does not seem correct. I ran the command on your example in the first post and got a correct file. Does your file have more than one space between values? Maybe you can try this then. Or just send the pedigree (ped_lep_map_352.txt) to me and I will figure this out.
Cheers,
Pasi
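One quick way to check for doubled spaces before transposing is to count fields the way the sed command sees them, with every single space or tab acting as a separator. The sketch below makes some assumptions: the function name check_ped_columns is hypothetical, the expected count of 6 fields per line is taken from the pedigree examples in this thread, and lines starting with # are skipped as comments.

```shell
# Report pedigree lines whose field count is not 6 when every single space
# or tab is a separator (so a doubled space shows up as an extra, empty field).
# check_ped_columns is a hypothetical helper name.
check_ped_columns() {
  awk -F'[ \t]' '
    { sub(/\r$/, "") }                   # ignore a Windows carriage return
    /^#/ || NF == 0 { next }             # assumption: skip comment and blank lines
    NF != 6 { printf "line %d: %d fields\n", NR, NF; bad = 1 }
    END { exit bad + 0 }                 # non-zero exit status if any line is off
  ' "$@"
}
```

It would be used as "check_ped_columns ped_lep_map_352.txt"; no output and a zero exit status mean every line has six fields.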
Dear Pasi,
Sorry for the late reply. I solved the ped problem by simply removing that long single column and pasting in the data from the sex column (then I replaced the sex values 1 and 2 of both parents with '0' for the phenotype value; bolded here to indicate).
588160:MERGE 588160:MERGE 588160:MERGE 588160:MERGE 588160:MERGE 588160:MERGE 588160:MERGE 588160:MERGE 588160:MERGE 588160:MERGE 588160:MERGE 588160:MERGE 588160:MERGE
CHR POS 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
CHR POS 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
But I think this is not a good way of dealing with the actual problem. I have attached my untransposed ped file herewith. (First, I entered the pedigree information in an Excel sheet and copied it into Notepad (.txt).) Let me know where I went wrong.
I have some other questions regarding map construction.
Q1. Do I need to provide a separate column for the phenotype in the ped file if my aim is to create a linkage map? If yes, does it matter whether I use 0 or 1 as the phenotypic value for all individuals?
Q2. My mapping population is an F1 from highly cross-pollinated parents. Which parameters (if more than one) do I have to consider if I want to create two separate maps for the parents (male and female) and an integrated map? (My focus is on InformativeMask=0, 1, 2 or 3, and on how often I need to give these parameters in the different steps of map construction.)
Q3. I filtered the data using VCFtools (for MAF, HWE, missing genotypes, depth and biallelic SNPs) before using LepMap3. I was not able to filter out indels, though. Can these indels affect map construction?
Q4. I want to filter markers based on their segregation pattern in a full-sib family between parents and offspring (AAxAa, aaxAa and AaxAa, which I think are the only informative ones in a double pseudo-testcross) before feeding them into LepMap3. Can I do this in LepMap3, or can it phase the genotypes automatically? Are there any specific tools for dealing with a double pseudo-testcross population?
This is my first attempt to prepare a linkage map, so please forgive my ignorance.
sincerely,
Gaurab Bhattarai
Dear Gaurab,
Q1. Phenotypes are not currently used for anything.
Q2. You get the most reliable single-parent maps by providing informativeMask=1 and informativeMask=2 separately. I have also made maps with informativeMask=13 and informativeMask=23. Most likely it suffices to set this parameter for OrderMarkers2 only.
Q3. Indels should work as SNPs.
Q4. If I understand the double pseudo test-cross correctly, it should work as F2 in LM3.
Cheers,
Pasi