Menu

#12 Scalpel calling homozygous variants as Het

1.0
open
None
1
2019-01-07
2019-01-04
No

Hi Giuseppe,
I am running scalpel-discovery in the single mode on a cell line mixture and scalpel is calling few of the homozygous variants (clearly seen in IGV as homozygous) at 50% VAF or less (as Hets). The command I am running is:
scalpel-discovery --single --ref human_g1k_v37_.fasta --numprocs 12 --maxregcov 10000000 --bam $bam --bed $bed --dir

"8 48805816 . A AG . PASS AVGCOV=1233.0;MINCOV=1233;ALTCOV=1269;ZYG=het;COVRATIO=0.49;CHI2=0.52;FISHERPHREDSCORE=0;INH=na;BESTSTATE=na;COVSTATE=na GT:AD:DP 0/1:1269,1233:2502"

I tried playing with some parameter settings such as window size and kmer size and dropping window size to 150 or dropping kmer size to 10 recovers it, but its not always the case and I am not sure if this can have some other adverse effects.
Likewise, if I provide a smaller region in the bed (8:48805810-48805819), it calls it correctly as homozygous variant, but if you provide a larger interval like (8:48805438-48806208), it mis-calls it as a Het.

I understand that Scalpel performs de novo assembly and larger the region, the more reads its gonna have for that window and this can cause such issues. But I wanted to check with you if there is a way we can avoid this mis-labelling from happening or a way we can correct it. Please let me know.

Thank you.
Ashini

Discussion

  • Giuseppe Narzisi

    Hi Ashini,

    thank you for reporting this issue. Indeed the window size can impact the genotype.

    I would suggest to perform one more round of genotyping at the end by rerunning scalpel on the final list of variants. I would fix the window to 400 bp for all variants and center it at the variant location to make sure that all the variants are processed somehow consistently. As you already suggested, using a centered window helps reducing these problems, although it will likely not fix all of them. Hope this is useful.

     
  • Ashini Bolia

    Ashini Bolia - 2019-01-07

    Thanks for your reply.

    When you said to re-run Scalpel on the final list of variants, did you mean running scalpel-export?
    I tried running the export command but for the same results.
    "8 48805816 . A AG . PASS AVGCOV=1233.0;MINCOV=1233;ALTCOV=1269;ZYG=het;COVRATIO=0.49;CHI2=0.52;FISHERPHREDSCORE=0;INH=na;BESTSTATE=na;COVSTATE=na GT:AD:DP 0/1:1269,1233:2502"
    scalpel-export --single --db variants.db.dir --bed $bed --ref human_g1k_v37_.fasta

    Yes, you are right. Fixing window size might rescue this variant but we don't know what will be the impact on other variants. At this point it looks like, specially for these germline variants one needs to manually inspect variants to determine their zygosity.

     
  • Giuseppe Narzisi

    What I meant was running "scalpel-discovery" to re-genotype the final list of variants, where the BED input file contains the list of the windows centered at each variant location.

     
  • Ashini Bolia

    Ashini Bolia - 2019-01-07

    Oh okay. I think I got you. What you meant by re-genotyping is basically running Scalpel again with a new bed file provided that now we have information on all variants, so we can create a new bedfile centered around those variants. I tried this option and it does recover variants for a few cases that were mis-called as Het, even though were pure homozygous variants based on the window size.
    However, for all the other cases where True variants are supposed to be at 95% (we can determine this VAF based on dilution of cell lines. Example: A homozygous variant present in Major Cell line at 95% but is a heterozygous variant in minor cell line that was mixed at 5%) VAF are all called at 50% or less, even after adjusting the window sizes. I have never ever seen it call the variants that are not purely homozygous above 50%.
    Thanks,
    Ashini

     

    Last edit: Ashini Bolia 2019-01-07

Log in to post a comment.

Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.