
mapping haploid progeny

RWhetten
2019-05-22
2019-05-24
  • RWhetten

    RWhetten - 2019-05-22

    Hi Pasi,
    I have a set of 184 BAM files with data from haploid individuals segregating from a single diploid mother, and have produced a VCF file of biallelic SNPs by filtering the output of a variant-calling pipeline. I have a few questions about different aspects of the LepMap3 analysis.
    1. I started an analysis with the pileupParser2.awk and pileup2posterior.awk scripts to compare that output with results from the VCF file, but that job has been running for over three weeks - is there any way to speed that up?
    2. Using SeparateChromosomes2 on the data from the VCF file, I found that a LOD limit of 17 gave 12 linkage groups (the expected number), using distortionLod=1 because this is a single family. I used recombination1=0 in the OrderMarkers2 step because there is no information about male recombination in this dataset - is this an appropriate way to get accurate estimates of genetic distances in this population type?
    3. Plots produced with LMPlot and xdot show some intervals with markers that form Y or T shapes with the main axis of the linkage group, or that have red edges. Is the best way forward to drop those markers and re-run OrderMarkers2 with all individuals, or would it be better to remove individuals identified as having excessive recombination events and reanalyze with all markers? I recognize that removing both some individuals and some markers will be desirable in the long run; the question is whether there is an optimal way to clean up the dataset.
    Thanks,
    Ross

     
  • Pasi Rastas

    Pasi Rastas - 2019-05-23

    Dear Ross,

    Thank you for your questions.

    1. Yes, you can run each scaffold/contig separately. First, index all the BAM files, then pass the parameter "-r C" to "samtools mpileup" for each contig C. Afterwards you can join the individual files created for each contig. I can put some info on the wiki about how to do this.

    2. How did you code the pedigree? I think you could code it as an F1 cross in which one parent is your mother and the other, "dummy" (say male) parent is always homozygous. Then you could either call all your markers as sex-linked with ParentCall2 (all offspring male and XY, XLimit=0), or convert your data so that one of the two homozygous offspring calls is always converted to a heterozygote (AA=>AA, BB=>AB; mother AB, father AA). I think I have a script for such a conversion.

    3. Red edges are the ones to worry about. Typically these are fixed by re-running OrderMarkers2 or by removing some individuals with poor-quality data. However, I cannot say for sure until I know how you have coded your data.
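
The conversion mentioned in point 2 (AA=>AA, BB=>AB, with a dummy mother AB and father AA) could look roughly like the sketch below. This is not Pasi's script. The function names are hypothetical, the reference allele is arbitrarily taken as "A", and treating heterozygous offspring calls as missing is an assumption based on the offspring being haploid.

```python
# Sketch only: recode haploid offspring calls as a pseudo-diploid F1 cross.
# Assumptions (not from the thread): VCF-style GT strings, REF allele plays
# the role of "A", and heterozygous offspring calls are set to missing.

MOTHER_GT = "0/1"  # dummy mother coded AB
FATHER_GT = "0/0"  # dummy father coded AA

def recode_haploid_gt(field: str) -> str:
    """Map one offspring genotype field onto the AA=>AA, BB=>AB coding."""
    gt = field.split(":")[0]                 # keep only the GT subfield
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles or len(set(alleles)) != 1:
        return "./."                         # missing or heterozygous call
    # A haploid "0" or homozygous "0/0" stays AA; "1" or "1/1" becomes AB.
    return {"0": "0/0", "1": "0/1"}.get(alleles[0], "./.")

def recode_site(offspring_fields):
    """Return (mother, father, recoded offspring) for one biallelic site."""
    return MOTHER_GT, FATHER_GT, [recode_haploid_gt(f) for f in offspring_fields]

if __name__ == "__main__":
    print(recode_site(["0/0", "1/1", "0/1", "1", "./."]))
```

Which homozygote is kept as AA is arbitrary, as long as the same choice is applied to every offspring at a site.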

    Cheers,
    Pasi

     

    Last edit: Pasi Rastas 2019-05-23
  • Pasi Rastas

    Pasi Rastas - 2019-05-23

    To add,

    probably this also works by duplicating the mother as both the male and the female parent (without any modification to the data). Then selfingPhase=1 in OrderMarkers2 might work better, and both map positions should be equal.

    Cheers,
    Pasi

     
  • RWhetten

    RWhetten - 2019-05-23

    Hi Pasi,
    Thanks for the quick reply. I'll try setting up a GNU parallel job to convert the BAM files to genotype likelihoods by scaffold and see how that works.
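
A per-scaffold job list of this kind might be generated as in the sketch below. Only the "-r" option and the two awk script names come from this thread. The function name, the output file naming, and the "-b sorted_bams.txt" argument in the usage line are placeholders; whatever other options the existing mpileup command uses should be passed through unchanged.

```python
# Sketch only: build one shell command per contig so the jobs can be run
# independently (for example by GNU parallel).
import shlex

def per_contig_commands(
    contigs,
    mpileup_args,
    downstream="awk -f pileupParser2.awk | awk -f pileup2posterior.awk",
    out_suffix=".post.gz",
):
    """Return one pipeline command string per contig.

    mpileup_args: the arguments your existing mpileup call already uses
    (BAM inputs, quality filters, ...); they are copied verbatim.
    """
    args = " ".join(shlex.quote(a) for a in mpileup_args)
    commands = []
    for contig in contigs:
        out_path = shlex.quote(contig + out_suffix)  # hypothetical naming
        commands.append(
            f"samtools mpileup -r {shlex.quote(contig)} {args}"
            f" | {downstream} | gzip > {out_path}"
        )
    return commands

if __name__ == "__main__":
    # "-b sorted_bams.txt" is a placeholder for the real input arguments.
    for cmd in per_contig_commands(["scaffold_1", "scaffold_2"],
                                   ["-b", "sorted_bams.txt"]):
        print(cmd)
```

Writing the printed lines to a file and feeding that file to GNU parallel on standard input is one way to run them concurrently. When joining the per-contig outputs afterwards, any header line repeated in each part would presumably need to be kept only once.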
    Regarding coding - I don't have data for the diploid mother, so the pedigree (based on the example in the wiki) simply contains "male" and "female" in columns 3 & 4 of line 2, all samples listed as descended from "male" in line 3 and "female" in line 4, 0 for sample gender in line 5 and 0 for all in line 6. I'll try running OrderMarkers2 with the selfingPhase=1 option and see what happens.
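
For readers trying to picture that pedigree, the layout described above might be generated as in this sketch. The function name and the family name "fam1" are hypothetical. The leading CHR/POS columns, the family name on line 1, and the parent sex codes 1 and 2 are assumptions about the wiki example rather than details stated in this post, so check them against the wiki before use.

```python
# Sketch only: build the six tab-separated pedigree lines described above.
# Assumptions: columns 1-2 hold CHR and POS, line 1 is the family name,
# and the dummy parents get sex codes 1 (male) and 2 (female).

def build_pedigree_lines(samples, family="fam1", father="male", mother="female"):
    n = len(samples)
    rows = [
        [family] * (n + 2),                 # line 1: family name
        [father, mother] + list(samples),   # line 2: individual names
        ["0", "0"] + [father] * n,          # line 3: father of each individual
        ["0", "0"] + [mother] * n,          # line 4: mother of each individual
        ["1", "2"] + ["0"] * n,             # line 5: sex (0 for the samples)
        ["0"] * (n + 2),                    # line 6: phenotype (all 0)
    ]
    return ["\t".join(["CHR", "POS"] + row) for row in rows]

if __name__ == "__main__":
    print("\n".join(build_pedigree_lines(["hap001", "hap002", "hap003"])))
```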
    Because the offspring are haploid, heterozygous calls are a sign of trouble - everything should be called homozygous because it is actually hemizygous. The idea of mapping as though they are sex-linked seems promising - can LepMap3 create multiple linkage groups of sex-linked markers at the SeparateChromosomes2 step?
    Regards,
    Ross

     
  • Pasi Rastas

    Pasi Rastas - 2019-05-24

    Dear Ross,

    I see. Indeed, the parent is not really needed. It should work this way as well. If you have enough sequencing coverage, it does not matter how you use the data, and it should work just fine the way you have analysed it. Just add selfingPhase=1, because with such data you cannot construct separate male and female maps.

    However, as the data is haploid, you could call the variants more accurately by taking haploidy into account (no heterozygotes).
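
One way to picture "taking haploidy into account" is to drop the heterozygous state and renormalise over the two homozygous states. The sketch below does this for Phred-scaled genotype likelihoods such as the VCF PL field. The function name is hypothetical, and both the flat prior over the two alleles and the use of diploid homozygous likelihoods as stand-ins for haploid allele likelihoods are assumptions, not something LepMap3 is stated to do.

```python
# Sketch only: turn diploid Phred-scaled likelihoods (hom-ref, het, hom-alt)
# into posteriors over the two possible haploid states.
# Assumption: flat prior on the two alleles; the het likelihood is ignored.

def haploid_posteriors(pl_hom_ref: float, pl_het: float, pl_hom_alt: float):
    """Return (P(ref allele), P(alt allele)) for a haploid sample."""
    # Subtract the smaller PL first so the exponentials stay well scaled.
    base = min(pl_hom_ref, pl_hom_alt)
    like_ref = 10 ** (-(pl_hom_ref - base) / 10)
    like_alt = 10 ** (-(pl_hom_alt - base) / 10)
    total = like_ref + like_alt
    return like_ref / total, like_alt / total

if __name__ == "__main__":
    # A call that a diploid model would report as heterozygous (PL 20,0,25).
    print(haploid_posteriors(20, 0, 25))
```

Under these assumptions, a site that a diploid caller reports as heterozygous is resolved to whichever homozygous state the reads favour, with a posterior reflecting how close the two were.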

    And to add, you should not use recombination1=0; just use the defaults.

    Cheers,
    Pasi

     

    Last edit: Pasi Rastas 2019-05-24
