Thank you for your question. To me, it looks like you have your commands backwards: You run samtools mpileup FIRST, and pipe the output of that into VarScan. Also, keep in mind that you don't need to run two separate mpileup commands to generate the two-sample mpileup VarScan needs. You can give Samtools mpileup two BAM files. Also, a minor point, always provide an argument to VarScan option flags (e.g. use --mpileup 1). Here's my recommended restructuring of your command: samtools mpileup -B -q...
Hello, and thanks for asking. The short answer is no. The VarScan readcounts command has been around forever, and it does facilitate some specialized analyses. However, it does not replace the bam-readcount functionality required for false positive filtering (though this is high on the priority list for the next release).
Brian, If you extract Somatic calls only, you'll get only variants that are found in the tumor. You can use the processSomatic command to facilitate this extraction.
Brian, the ref column is the reference base. The alts correspond to alternate allele numbers 1 and 2, so you're correct that alt 1 is the alternate allele in the normal and alt 2 is the alternate allele in the tumor IN THIS CASE. The numbers representing alts are sample-agnostic, so it would also be possible for the normal to be 1/2. If there's not a comma-separated list, then the alt allele is the alternate seen, not necessarily in the tumor, but for this position. You have to look at the genotype...
1. BED-like var file for bamreadcount For SNVs it's straightforward as you say: chromosome, start, stop, and start equals stop. Indels are more challenging because of how they can be represented at different positions. I usually create a BED-like file for indels in which I print one line for every position from (start - indel_sizse - 1) to (stop + indel_size + 1). Bam-readcounts will return counts for all positions, and VarScan will use the best match to obtain metrics for filtering. 2. Running FPfilter...
Hello, thanks for your question. The input for VarScan fpfilter is NOT from VarScan readcounts. It instead requires a separate program, bam-readcount. This is a dependency we hope to fix with a future release, but if you use bam-readcount to generate the readcounts, you should be good to go.
Thank you for providing the examples; now I see what you were trying to describe. An ambiguity code in the reference is an unusual edge case, and a possibility that I had not considered (or coded for) in VarScan. It should be possible to add this compatibility for the next release.
Jose, That's an odd one! Would you mind providing a few lines of the relevant VCF output, and the mpileup for those positions?
Dear Yuyu Liu, Thank you for your question. I should first emphasize that using this parameter may not be necessary, as VarScan is designed around the notion that tumors are usually not pure. The tumor-purity option essentially adjusts the minimum variant allele frequency (--min-var-freq) downward, such that a lower VAF threshold is used when calling the consensus base for the tumor. The actual code looks something like this: tumorMinVarFreq = (minVarFreq * tumorPurity); Thus, if your normal --min-var-freq...
Brandon, thanks for the question. The differences that you see all come down to read counting. The denominator for the VAF calculation is the total number of reads counted for any allele at that position, regardless of which allele is being counted. If >2 alleles are observed, it's very possible that the denominator accounts for base calls that you may not be aware of in the VCF, i.e. base calls that support neither the reference nor the main ALT allele. Consensus calling mode typically only reports...
Thank you for your question. I'm not sure I have enough information to answer entirely. Are you working with WES or WGS data? By de novo mutations, you mean ones that are present in the child but not the parents, I presume. There should be 1-2 in the exome and perhaps 60-80 genome-wide. Are you using VarScan mpileup2cns on the trio, and then looking for variants that appear heterozygous only in the proband? That would be my recommendation, for starters. I would then apply VarScan FPfilter with the...
Dear VarScan users, We are writing an NIH grant application for funding to harden, improve, and expand VarScan functionality. If you have feature requests or suggestions for improving our tool, please share them here or send them to me directly. Also, if you or someone at your institution uses VarScan and would be willing to sign a letter of support (LOS) for our application, please drop me a line at dankoboldt at gmail. I will send you our specific aims and a template letter that you can customize....
Sara, Thanks again for bringing this to my attention. It turns out the somatic P-value threshold was not being applied in the FP filter. This has been fixed for the next release (v2.4.4).
Changhan, thanks for clarifying!
Manal, you have to run it on each pair. I'm not sure what you're asking about the read depth; this is one of the output fields provided by VarScan, but only reported where VarScan calls a variant.
Zhou, Your ref-arm-sizes file is wrong: it should have one line for each chromosome arm.
Harshmi, I'm not sure what to make of the columsn, which don't match normal VarScan output formats. If you're still having trouble with this issue, would you mind posting your mpileup for the relevant line, as well as the VarScan version and output line?
Guillaume, thank you for your question and I apologize for the delay in replying. In your example, the read counts are somewhat confusing because it's an insertion. Based on the VarScan output, there are 237 reads supporting the reference, 1692 supporting the insertion, and 86 reads supporting other alleles. Getting a look at your mpileup is the only way to be sure; would you mind sharing it?
Zhou, the ref-arm-sizes file is one that you create, designating the coordinates of the "p" and "q" arms of each chromosome. It should be tab-delimited with four columns: chromosome, start, stop, p/q. For example: chr1 1 125000000 p chr1 125000001 249250621 q If you need to find out the coordinates of p/q arms, the "cytoBand.txt.gz" file in the UCSC Genome Database provides this information.
Dear Sara, That certainly is the intent of the parameter, but it's possible the logic wasn't implemented. I'll take a look at the code and get back to you here.
Sorry for my lateness in answering this, but the VarScan native output uses IUPAC Ambiguity codes. If you want VCF-style output, use the parameter --output-vcf 1.
Cristina, I sent you a more detailed explanation by message, but in general you should NOT do this. When calling de novo mutations, it's essential to know that you have sufficient coverage in both parents. If you combine them into a single BAM, all of the coverage could be from one parent.
Raheleh, thanks for your question. I'm not sure I have enough information to provide detailed guidance, but most studies with paired tumor-normal data are most interested in somatic alterations. To that end, you should work with the FPfilter-passed somatic mutation calls from each tumor (one for SNVS, one for indels). You might combine the SNVs and indels into a single file for convenience. I can't think of a good reason you'd want to combine somatic calls from multiple patients into a single file....
Hello, and thanks for posting. That error was corrected in a later release of VarScan. Please update to the newest version on GitHub
Leonor, Thank you for the question, and I apologize profusely for my delay in getting back to you. Yes, VarScan calls the most-observed variant allele in each sample. You might get better results calling samples individually. The little-known 'readcounts' command of VarScan will take a single-sample mpileup as input and generate read counts for each observed allele (including reference).
Jon, Thanks. The problem is that you're using an old version of VarScan (v2.3.9). The current version is v2.4.3. This particular bug was fixed in v2.4.0. Get the newest release on GitHub.
You're correct that mis-typed parameters are not always caught by VarScan, though user-provided parameters settings displayed when you run most commands. I'm not sure I'd call this a bug, so much as a missing feature, but I'll look into it for the next release.
I'll look into this -- would you mind posting (or sending me) the mpileup line(s) for that position?
Hello, thank you for posting. Would you mind sharing the mpileup line(s) from the malformed VCF variant?
Aidan, VarScan does not create semicolon-delimited output, with the possible exception being the INFO field when VCF output is specified. If you have true semicolon delimiters, something is very wrong! It would help enormously if you shared the first few lines of your input/output files as well as the commands being used.
Holly, You definiitely want to use bam-readcount for the fp-fiter purposes. VarScan's readcount feature operates differently and isn't designed for use with the filter (honestly I didn't think anyone used that command).
Yes! Glad you figured it out.
For anyone still encountering array index out of bound exceptions, I recommend using...
Use the --vcf-sample-list parameter, giving it a file of sample names in order one...
Thanks for replying, Robert. I would add that in general, you want to run processSomatic...
Zejun, I'm afraid that's outside the scope of the trio command. Instead, I'd run...
Most likely, this has to do with how the bam-readcounts file is generated for the...
I think this was caused by the command syntax issue I pointed out on the other t...
Would you like to share the first 10 lines of your pileup file? In particular, if...
You need to provide the output file basename as the third argument. The --output-vcf...
In general, you can ignore this warning. However, I strongly recommend running VarScan...
I think this is just a matter of a change to the column ordering in DNAcopy output....
Alex, could I ask you to send me the actual mpileup lines and the output you received?...
Alex, thanks for posting. It looks like a clean deletion in IGV, but seeing the mpileup...
I have not heard of this error before. If you wouldn't mind, please share the command...
Vai, That's an often-reported issue caused by zero-depth lines in the mpileup. It...
That's an odd result. Would you mind posting/sharing the first 20 lines of your variants...
This issue should be addressed in the latest VarScan release, v2.4.2.
They're not supposed to match: The DP field is the total depth for that position...
Hello Xiaoxiao, You can ignore the "Not resetting" warnings, as these are status...
Not at all, Raman. It means you need to include chrM in your ref-arm-sizes file....
If you post some examples (VarScan output), I'll take a look.
We typiclly use -q 1 or -q 10.... this is minimum mapping quality, so a difficult...
That's curious... it might be an error with the call-counting code. Will look into...
Always trust the calls within the files over the counters. If you want, set the "--validation...
Corentin, You can use VarScan 2 copynumber on exome or WGS data. However, it does...
It's true that VarScan has been tuned using primarily Illumina data, and I haven't...
Yaseen and Bella, Thank you both for asking. The correct way is to license it through...
Thanks for the message -- these things are possible (and recommended, if you read...
I believe I've put in a fix that should handle this exception more gracefully, but...
Ankita, that looks like a mal-formed mpileup line (as I said by e-mail), but I'm...
Thank you all for reporting this. It seems to be caused by incomplete mpileup lines,...
That's an odd behavior for certain. Best guess? The "--output-vcf 1" parameter is...
"Consensus" really only refers to the second column in your example, which is typically...
Colin, Correct: --min-avg-qual is a threshold for individual bases, and determines...
Colin, thanks for the message. I'm afraid the name of that parameter is misleading....
Hello, That's a typo in the paper, for item 1. Variants present only in the tumor...
Yes, it should be species-agnostic. Most likely it's the arm sizes file -- you need...
Floris, I just released new files (VarScan v2.3.9) with patches that should resolve...
Floris, Thank you for the test files, which helped me identify the problem. I will...
For bam-readcount, you need to give it an input list of positions in the format:...
Thank you for asking! In your case, I would: 1.) Run processSomatic and isolate the...
Yes, one needs to generate the readcounts with bam-readcount. I do intend to put...
Thanks, Chris! I will try these out and see about implementing the speedup in the...
elmesmari, No, that's VarScan native output format. You need to use --output-vcf...
Ania, VarScan does not populate the QUAL column of the VCF file; instead, it provides...
Flavia, Thank you again for reporting this; it helped me track down an issue with...
Hello all, I've just released a new version of VarScan (v2.3.7) that includes an...
VarScan v2.3.7 is now available, and addresses this issue: https://sourceforge.n...
Yes, thank you for commenting. This IS an issue due to the SAMtools 0-depth formatting...
Thank you for asking. That parameter is poorly named; more accurately, it would be...
Darren, Thank you for posting your question, and for attaching all of the files that...
Correction: in reviewing the current code to investigate the missing columns issue,...
The first error is probably a chromosome in your segments file that's not in the...
Typically we run in parallel with a perl script that sub-sets the BAM file by chromosome...
Phillipp, I can't imagine a scenario where VarScan would output an incomplete VCF...
This looks like the zero-depth mpileup line issue, which is a known bug that should...
Sophie, I hope you've been able to make more progress with this issue. I'm afraid...
To the best of my knowledge, VarScan is outputting valid VCF. Your incident might...
SDP is the Samtools depth, which reflects what you'll probably see in the mpileup....
It's a little surprising that SAMtools bogs down with fewer than 76 samples; that...
Jane, Have you tried running the "readcounts" subcommand of VarScan? It should tell...
Can you provide some details of the command you were running? How many samples were...
Thanks for your message, and I'm sorry to hear about your troubles! Would you send...
This is a known VCF indel formatting issue that was corrected in v2.3.6.
Hello, You need to put your input file before the parameter settings, i.e. java -jar...
Hello, It is puzzling that a subset of files were missing VCF headers. Would you...
Thanks to both of you for bringing this up -- another user e-mailed to report the...
Andrea, Thanks for posting this... it looks like VarScan's calculated VAF was lower...
Vincent, DP4 is a SAMtools field that contains forward ref, reverse ref, forward...