Vcf to .linkage

2017-01-12
2018-05-07
  • Saurav Baral

    Saurav Baral - 2017-01-12

    Hello,

    It would be really great if someone in the linkage map community could help me with tools for converting data from .vcf format to .linkage format. I tried using mega2, but it crashed midway. Then I tried the vcf2posterior.awk script followed by the simpleConvert.awk script provided here. Although this produced a .linkage file, some of the genotype values were written in scientific notation (e.g. "7.94328e-06"), which the Filtering module could not process. Even if I remove these values, it produces an error (Error: Sex of the parent(s) is not specified). I have no idea how to incorporate the pedigree values into the .linkage file, and it would be really awesome if someone here could guide me.

    Most Sincerely
    Saurav Baral

     
  • Pasi Rastas

    Pasi Rastas - 2017-01-13

    Dear Saurav Baral,

    Thank you for your email.

    The simplest way might be to use the ParentCall2 module in Lep-MAP3. Provide the pedigree with the data parameter and the vcf with the vcfFile parameter. Compressed vcfs are not yet supported directly, but they can simply be piped in like this:
    "zcat data.vcf.gz | java ParentCall2 ... data=pedigree.txt vcfFile=- ...".

    The pedigree is given as the transpose of that in a linkage file, and the first and second columns are CHR and POS. ParentCall2 outputs data in a format that can be easily converted to linkage (simpleconvert, transpose, cut) or used in LM3.

    Cheers,
    Pasi

     
  • Saurav Baral

    Saurav Baral - 2017-01-13

    Dear Pasi,

    I tried running the command as you suggested. I am getting a java error:

    java.lang.NullPointerException
    at DataParser.getParentsAndGrandParents(DataParser.java:154)
    at DataParser.getParentsAndGrandParents(DataParser.java:108)
    at DataParser.getNextLine(DataParser.java:615)
    at DataParser.getNextLine(DataParser.java:540)
    at ParentCall2.callParents(ParentCall2.java:104)
    at ParentCall2.main(ParentCall2.java:711)
    Error 504
    Error: Unable to load input file or errors in the file

    Is there an error in my input files? The vcf I am using was produced with the TASSEL GBS v2 pipeline, and the pedigree data is in the following format:

    # family id father mother sex(1M,2F) phenotype(need one column at least, it can be 0)
    1 1 0 0 1 2
    1 2 0 0 2 1
    1 1:250020986 1 2 2 1
    1 10:250020986 1 2 2 1
    1 11:250020986 1 2 2 1
    1 12:250020986 1 2 2 1
    1 13:250020986 1 2 2 1
    1 14:250020986 1 2 2 1
    1 15:250020986 1 2 2 1
    1 16:250020986 1 2 2 1
    1 17:250020986 1 2 2 1
    1 18:250020986 1 2 2 1
    1 19:250020986 1 2 2 1
    1 2:250020986 1 2 2 1
    1 20:250020986 1 2 2 1
    1 21:250020986 1 2 2 1
    1 22:250020986 1 2 2 1
    1 23:250020986 1 2 2 1
    1 24:250020986 1 2 2 1

    Please let me know where I am going wrong.

     
  • Pasi Rastas

    Pasi Rastas - 2017-01-13

    Ok,

    The pedigree must be transposed. My old-school fix is this (ped.txt is the pedigree you had):

    sed -e 's/ /\t/g' ped.txt |./transpose_tab|awk '{print "CHR\tPOS\t" $0}' >ped_t.txt

    Then use ped_t.txt for ParentCall2.

    Cheers,
    Pasi
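
For readers who do not have the `transpose_tab` helper at hand, the same three steps (split on whitespace, transpose, prepend the CHR and POS columns) can be sketched in a single awk program. This is only an illustration under stated assumptions: `transpose_ped` is a hypothetical name, not part of Lep-MAP3, and skipping `#` comment lines and blank lines is an added convenience that the original command does not perform.

```shell
# Hypothetical helper (not part of Lep-MAP3): transpose a whitespace-delimited
# pedigree file and prepend "CHR" and "POS" to every output row.
transpose_ped() {
    awk '
    /^#/ || NF == 0 { next }            # assumption: skip comment and blank lines
    {
        n++                             # count of data rows kept so far
        if (NF > ncols) ncols = NF      # remember the widest row
        # default field splitting collapses runs of spaces and tabs
        for (i = 1; i <= NF; i++) cell[n, i] = $i
    }
    END {
        # each input column becomes one tab-separated output row
        for (i = 1; i <= ncols; i++) {
            line = "CHR\tPOS"
            for (j = 1; j <= n; j++) line = line "\t" cell[j, i]
            print line
        }
    }' "$1"
}
```

After defining the function, `transpose_ped ped.txt > ped_t.txt` should give a file that can be passed as `data=ped_t.txt`. Because awk's default splitting collapses repeated spaces, this variant does not produce empty columns when values are separated by more than one space.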

     
  • Saurav Baral

    Saurav Baral - 2017-01-13

    Dear Pasi,

    I transposed the pedigree file and ran the analysis again. This time I am getting a new error:

    No grandparents present in family 1
    Number of individuals = 96
    Number of families = 1
    ./.:0,0:0 2
    Error 518
    Error: vcf file does not contain such field

    I tried searching Google for this error but was unable to find anything. I am really sorry for coming to you with so many errors. This is my first time trying to prepare a linkage map, so please forgive my ignorance.

    Most sincerely

    Saurav Baral

     
  • Saurav Baral

    Saurav Baral - 2017-01-14

    Dear Pasi,
    Thank you for your invaluable help. I think the issue was with my vcf file. I filtered the vcf file to retain only biallelic sites and the program worked wonderfully. I am not exactly sure whether that is what solved my problem, but nonetheless it works now. I am really grateful for your help in this matter.

    Most sincerely

    Saurav Baral
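
The post does not say which tool was used for the biallelic filtering, so the following is only a minimal sketch of one way to do it with awk. It relies on the VCF convention that the fifth column (ALT) lists alternate alleles separated by commas, so a comma there means more than one alternate allele. `keep_biallelic` is a hypothetical name, and the input is assumed to be an uncompressed, tab-delimited VCF on standard input.

```shell
# Hypothetical helper: keep VCF header lines and records with a single ALT allele.
keep_biallelic() {
    awk -F'\t' '
        /^#/      { print; next }   # pass all header lines through unchanged
        $5 !~ /,/ { print }         # no comma in ALT: at most one alternate allele
    '
}
```

A typical call would be `keep_biallelic < data.vcf > data.biallelic.vcf`, with the filtered file then given to ParentCall2 through `vcfFile=`. Dedicated VCF tools offer equivalent filters and are likely the better choice for large files.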

     
  • Gaurab Bhattarai

    Dear Pasi and Saurav,
    I am trying to use Lep-MAP3 to generate linkage maps. To prepare the input for the ParentCall2 module, I first tried to transpose the pedigree file (.txt), but I get the following error:

    sed -e 's//\t/g' ped_lep_map_352.txt awk '{print "CHR\tPOS\t" $0}' >ped_transposed.txt
    sed: -e expression #1, char 0: no previous regular expression

    I do have my ped file format as follows:
    GB 588160:MERGE 0 0 2 1
    GB 588271:MERGE 0 0 1 1
    GB 160_271_001:C7TMYANXX:6:250514961 588271:MERGE 588160:MERGE 0 1
    GB 160_271_002:C7TMYANXX:6:250514962 588271:MERGE 588160:MERGE 0 1

    Let me know if I am wrong! Thank you.

    regards,
    Gaurab

     
  • Pasi Rastas

    Pasi Rastas - 2018-04-18

    Dear Gaurab,

    Please use the command exactly as given here. I think you are missing the space in the sed expression and the transpose step.

    sed -e 's/ /\t/g' ped_lep_map_352.txt |./transpose_tab|awk '{print "CHR\tPOS\t" $0}' >ped_t.txt
    

    Cheers,
    Pasi

     
  • Gaurab Bhattarai

    Dear Pasi,

    Thank you so much. Yes, I realized that I forgot to give that space. I transposed the pedigree file but when I ran ParentCall2, I got this error:

    ~/Desktop/GBS/LepMap$ java -cp ./bin ParentCall2 data=pedigree_transposed.txt vcfFile=Beagle_imputed_phased.vcf removeNonInformative=1 >outputcall.call
    No grandparents present in family GB
    Number of individuals = 352
    Number of families = 1
    Error 503
    Error: Wrong number of columns in the input file

    Note: this is my F1 population with unknown grandparents.
    As I mentioned earlier, my pedigree file is in the following format:

    GB 588160:MERGE 0 0 2 1
    GB 588271:MERGE 0 0 1 1
    GB 160_271_001:C7TMYANXX:6:250514961 588271:MERGE 588160:MERGE 0 1
    GB 160_271_002:C7TMYANXX:6:250514962 588271:MERGE 588160:MERGE 0 1

    FYI, I have attached a screenshot of the bottom of the transposed pedigree file. I have doubts about the long lines of zeros (0) there. Please let me know where I am going wrong.

    Sincerely,
    Gaurab Bhattarai

     
  • Pasi Rastas

    Pasi Rastas - 2018-04-19

    Dear Gaurab,

    The pedigree does not seem correct. I ran the command on your example from the first post and got a correct file. Does your file have more than one space between values? If so, you can try the command below. Or just send the pedigree (ped_lep_map_352.txt) to me and I will figure this out.

    sed -e 's/ [ ]*/\t/g' ped_lep_map_352.txt |./transpose_tab|awk '{print "CHR\tPOS\t" $0}' >ped_t.txt
    

    Cheers,
    Pasi
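
Before transposing, it can help to see which pedigree lines would end up with the wrong number of columns. The sketch below splits each line on every single space or tab, which mirrors what the one-space `sed` substitution produces; it assumes that the downstream transpose step treats every tab as a column break. `check_ped_columns` is a hypothetical name, and the default of six columns is taken from the pedigree examples in this thread.

```shell
# Hypothetical helper: report pedigree lines whose column count differs from
# the expected one when every single space or tab is treated as a separator.
check_ped_columns() {
    # $1: pedigree file, $2: expected number of columns (default 6)
    awk -F'[ \t]' -v want="${2:-6}" '
        NF != want {
            printf "line %d: %d columns (expected %d)\n", NR, NF, want
            bad = 1
        }
        END { exit bad + 0 }        # non-zero exit status if any line was off
    ' "$1"
}
```

Running `check_ped_columns ped_lep_map_352.txt` prints nothing and returns success when every line has six columns. A line containing a doubled space is reported with seven columns, which is the situation the `s/ [ ]*/\t/g` expression is meant to handle.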

     
  • Gaurab Bhattarai

    Dear Pasi,

    Sorry for the late reply. I solved the ped problem by simply removing that long single column and pasting in the data from the sex column; I then replaced the sex values 1 and 2 of both parents with '0' for the phenotype value (bolded here to indicate).
    588160:MERGE 588160:MERGE 588160:MERGE 588160:MERGE 588160:MERGE 588160:MERGE 588160:MERGE 588160:MERGE 588160:MERGE 588160:MERGE 588160:MERGE 588160:MERGE 588160:MERGE
    CHR POS 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    CHR POS 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

    But I think this is not a good way of dealing with the actual problem. I have attached my untransposed ped file herewith. (First, I entered the pedigree information in an Excel sheet and copied it into Notepad (.txt).) Let me know what I was doing wrong.

    I have some other questions regarding map construction.

    Q1. Do I need to provide a separate column for phenotype in the ped file if my aim is only to create a linkage map? If yes, does it matter whether I use 0 or 1 as the phenotypic value for all individuals?

    Q2. My mapping population is an F1 from highly cross-pollinated parents. Which parameters (if more than one) do I have to consider if I want to create two separate maps for the parents (male and female) and an integrated map? My focus is on informativeMask=0, 1, 2 or 3, and on how often I need to give these parameters in the different steps of map construction.

    Q3. I filtered the data using VCFtools (for MAF, HWE, missing genotypes, depth and biallelic SNPs) before using Lep-MAP3. I was not able to filter out indels, though. Can these indels affect map construction?

    Q4. I want to filter markers based on their segregation pattern between parents and offspring in the full-sib family (AAxAa, aaxAa and AaxAa, which I think are the only informative ones in a double pseudo-testcross) before feeding them into Lep-MAP3. Can I do this in Lep-MAP3, or can it phase the genotypes automatically? Are there any specific tools for dealing with a double pseudo-testcross population?

    This is my first attempt to prepare a linkage map, so please forgive my ignorance.

    sincerely,

    Gaurab Bhattarai

     
  • Pasi Rastas

    Pasi Rastas - 2018-05-07

    Dear Gaurab,

    Q1. Phenotypes are not currently used for anything.

    Q2. You get the most reliable single-parent maps by providing informativeMask=1 and informativeMask=2 separately. I have also made maps with informativeMask=13 and informativeMask=23. Most likely it suffices to set this parameter in OrderMarkers2 only.

    Q3. Indels should work as SNPs.

    Q4. If I understand the double pseudo test-cross correctly, it should work as F2 in LM3.

    Cheers,
    Pasi

     
