Hi Giuseppe,
I am running scalpel-discovery in the single mode on a cell line mixture and scalpel is calling few of the homozygous variants (clearly seen in IGV as homozygous) at 50% VAF or less (as Hets). The command I am running is:
scalpel-discovery --single --ref human_g1k_v37_.fasta --numprocs 12 --maxregcov 10000000 --bam $bam --bed $bed --dir
"8 48805816 . A AG . PASS AVGCOV=1233.0;MINCOV=1233;ALTCOV=1269;ZYG=het;COVRATIO=0.49;CHI2=0.52;FISHERPHREDSCORE=0;INH=na;BESTSTATE=na;COVSTATE=na GT:AD:DP 0/1:1269,1233:2502"
I tried playing with some parameter settings such as window size and kmer size and dropping window size to 150 or dropping kmer size to 10 recovers it, but its not always the case and I am not sure if this can have some other adverse effects.
Likewise, if I provide a smaller region in the bed (8:48805810-48805819), it calls it correctly as homozygous variant, but if you provide a larger interval like (8:48805438-48806208), it mis-calls it as a Het.
I understand that Scalpel performs de novo assembly and larger the region, the more reads its gonna have for that window and this can cause such issues. But I wanted to check with you if there is a way we can avoid this mis-labelling from happening or a way we can correct it. Please let me know.
Thank you.
Ashini
Hi Ashini,
thank you for reporting this issue. Indeed the window size can impact the genotype.
I would suggest to perform one more round of genotyping at the end by rerunning scalpel on the final list of variants. I would fix the window to 400 bp for all variants and center it at the variant location to make sure that all the variants are processed somehow consistently. As you already suggested, using a centered window helps reducing these problems, although it will likely not fix all of them. Hope this is useful.
Thanks for your reply.
When you said to re-run Scalpel on the final list of variants, did you mean running scalpel-export?
I tried running the export command but for the same results.
"8 48805816 . A AG . PASS AVGCOV=1233.0;MINCOV=1233;ALTCOV=1269;ZYG=het;COVRATIO=0.49;CHI2=0.52;FISHERPHREDSCORE=0;INH=na;BESTSTATE=na;COVSTATE=na GT:AD:DP 0/1:1269,1233:2502"
scalpel-export --single --db variants.db.dir --bed $bed --ref human_g1k_v37_.fasta
Yes, you are right. Fixing window size might rescue this variant but we don't know what will be the impact on other variants. At this point it looks like, specially for these germline variants one needs to manually inspect variants to determine their zygosity.
What I meant was running "scalpel-discovery" to re-genotype the final list of variants, where the BED input file contains the list of the windows centered at each variant location.
Oh okay. I think I got you. What you meant by re-genotyping is basically running Scalpel again with a new bed file provided that now we have information on all variants, so we can create a new bedfile centered around those variants. I tried this option and it does recover variants for a few cases that were mis-called as Het, even though were pure homozygous variants based on the window size.
However, for all the other cases where True variants are supposed to be at 95% (we can determine this VAF based on dilution of cell lines. Example: A homozygous variant present in Major Cell line at 95% but is a heterozygous variant in minor cell line that was mixed at 5%) VAF are all called at 50% or less, even after adjusting the window sizes. I have never ever seen it call the variants that are not purely homozygous above 50%.
Thanks,
Ashini
Last edit: Ashini Bolia 2019-01-07