VarScan Activity

Variant detection in next-generation sequencing data

Brought to you by: dkoboldt

Activity for VarScan

2 years ago
Shihua Dong posted a comment on discussion Help

I am using VarScan v2.4.6, I am failed to output vcf format file when call variant by pileup2snp, my code is: samtools mpileup -f $reference $input.bam | varscan pileup2snp --output-vcf 1 > $output.vcf The result is just the same as without --output-vcf 1, it's do not in vcf format. Could someone help to figure out it, thanks.
2 years ago
Ester posted a comment on discussion Help

Hi I'm testing the mpileup2somatic command on a single patient (at the moment) after having done samtools mpileup normal bam and tumor bam from exomes data. For this step , these are the parameters I used: mpileup2somatic --min-coverage 8 --min-var-freq 0.01 --strand-filter 0 (we are interested in investigating very low VAF). I obtained the paired vcf and then as suggested, I intended to use the fpfilter but something in the previous step with processSomatic goes wrong. I ran this command: java -jar...
2 years ago
Ester modified a comment on discussion Help
2 years ago
Ester posted a comment on discussion Help

Hi! I'm trying to use fpfilter on output from varscan mpileup2somatic command: java -jar VarScan.v2.4.6.jar fpfilter somatic.vcf bamreadcount.regions --min-var-freq 0.01 --min-var-count 5 --filtered-file fail.vcf --output-file fpfilter.somatic. vcf but it seems to me that the varcount filter does not work, in fact there are variants reported in the output file with less than 5 alternative reads, examples: NORMAL TUMOR 0/0:9:9:9:0:0%:.:1E0 0/1:32:32:30:2:6.25%:2:6.0488E-1 0/0:55:55:55:0:0%:.:1E0 0/1:201:201:198:3:1.49%:2:4.8246E-1...
2 years ago
Ester posted a comment on discussion Help

Hi! I'm trying to use fpfilter on output from varscan mpileup2somatic command: java -jar VarScan.v2.4.6.jar fpfilter somatic.vcf bamreadcount.regions --min-var-freq 0.01 --min-var-count 5 --filtered-file fail.vcf --output-file fpfilter.somatic. vcf but it seems to me that the varcount filter does not work, in fact there are variants reported in the output file with less than 5 alternative reads, examples: NORMAL TUMOR 0/0:9:9:9:0:0%:.:1E0 0/1:32:32:30:2:6.25%:2:6.0488E-1 0/0:55:55:55:0:0%:.:1E0 0/1:201:201:198:3:1.49%:2:4.8246E-1...
4 years ago
Natalie Saini posted a comment on discussion Help

Hello, I am using a compute node with 512GB memory allowed. My mpileup files are ~300GB each (normal and tumor). Is it possible to run VarScan somatic caller such that we don't store all data on the node and instead write it to a tmp file or tmp dir? I am not being able to complete VarScan somatic calling currently. Thanks!
4 years ago
Jess Buddle posted a comment on discussion Help

Hello, i am trying to compare 2 SNP tables generated through varScan using java -jar VarScan.jar compare ${folder}/X.varScan.snp ${folder}/Y.varScan.snp unique1 ${folder}/X_unique.varScan.snp But i keep getting the warning "Warning: Unable to parse chrom/position from chrom position" I have tried everything i can think of to fix it but nothing works. It tells me how many positions are unique or shared but doesnt write this into an output file (output generated is empty) Thanks!
4 years ago
Dan Koboldt posted a comment on discussion Help

Thank you for your question. To me, it looks like you have your commands backwards: You run samtools mpileup FIRST, and pipe the output of that into VarScan. Also, keep in mind that you don't need to run two separate mpileup commands to generate the two-sample mpileup VarScan needs. You can give Samtools mpileup two BAM files. Also, a minor point, always provide an argument to VarScan option flags (e.g. use --mpileup 1). Here's my recommended restructuring of your command: samtools mpileup -B -q...
4 years ago
Melissa Chua posted a comment on discussion Help

I want to run samtools and pipe the mpileup output into a VarScan call using the following conditions: 30 mapping score 30 phred quality score output-snp format min coverage = 50 min tumor coverage = 3 p-value = 0.01 min variant frequency = 4% Process substitution: java -jar VarScan.v2.3.9.jar somatic <(samtools mpileup -B -q 30 -Q 30 normal_sor t.bam) <(samtools mpileup -B -q 30 -Q 30 tumor_sort.bam) output -mpileup --strand-filter 1 --output-snp --min-coverage 50 --min-coverage-tumor 3 --p-value...
4 years ago
rk2802 posted a comment on discussion Help

Hello .. I am trying to analyze exome copy number variation by GISTIC2, which requires ref arm size file and have downloaded the cytoband.txt.gx file for hg38 from UCSC. After removing of all unnecessary lines such as having alt or Chr mt or ChrUN etc I am trying to summarize using R and fail to do so. Kindly help me in creating my ref-arm-size file. Thank You.
4 years ago
Michael Yourshaw posted a comment on discussion Help

The docs recommend piping samtools mpileup into mpileup2cns. However, it is possible for a empty or nearly empty bam file to produce no mpileup. In this case mpileup2cns will eventually time out, or hang indefinitely. It seems to be safe to use an intermediate file samtools mpileup >mpileup.tsv followed by java -jar varscan.jar mpileup2cns mpileup.tsv
4 years ago
Dan Koboldt posted a comment on discussion Help

Hello, and thanks for asking. The short answer is no. The VarScan readcounts command has been around forever, and it does facilitate some specialized analyses. However, it does not replace the bam-readcount functionality required for false positive filtering (though this is high on the priority list for the next release).
4 years ago
Xiangyu Zhang posted a comment on discussion Help

I have the same confusion, did you get the answer ?
4 years ago
Hubert DENISE posted a comment on discussion Help

Hi, I'm using VarScan to variant call metagenomics samples. However I'm not sure how to interpret the ambiguous calls in my context. What does it mean when the ouput is: A->R with 20% alt frequency for example? My understanding is that, in this case, 20% of the reads have the ambiguous call but is VarScan saying that 50% of those are ref (A->A) and 50% are alt (A->G)? or that VarScan cannot 'decide' if these 20% of the reads have A or G (i.e. could be up to 100% alt)? I could not find the answer...
5 years ago
Vanessa Souza posted a comment on discussion Help

Hi, Dan, I'm new to bioinformatics. Could you explain in detail how to prepare BED-like var file for bamreadcount. As suggested for SNV that I use print $1,$2,$2, but for indels I didn’t understand exactly how to proceed. I believe this is the cause of my problem.
5 years ago
vic posted a comment on discussion Help

Hi, Vanessa, I guess you need to be more specific. I've included below a general rule for raising question in GATK forum. If you could bring up more detail, Dan, myself or other users might be able to help you better. Thanks. Vic Writing a New Post Couldn’t find a solution from the steps above? Make a new post! Read these instructions to determine the category of your post and what information we need so that you can get your question answered: 1. GATK Errors An error message from a GATK Tool that...
5 years ago
Vanessa Souza modified a comment on discussion Help

perl script_FPfilter sample.varScan.snp.filter.bed varScan.variants.readcounts –output-basename varScan.variants.filter I get the following output: 27841 variants 27841 failed to get readcounts for variant allele 0 had read position < 0.1 0 had strandedness < 0.01 0 had var_count < 4 0 had var_freq < 0.05 0 had mismatch qualsum difference > 50 0 had mapping quality difference > 30 0 had read length difference > 25 0 had var_distance_to_3' < 0.2 0 passed the strand filter Apparently, the problem is...
5 years ago
Vanessa Souza posted a comment on discussion Help

I did it, but my problem remains.
5 years ago
Vanessa Souza posted a comment on discussion Help

I did it, but my problem remains.
5 years ago
Vanessa Souza posted a comment on discussion Help

I did it, but my problem remains.
5 years ago
vic posted a comment on discussion Help

Hi, Vanessa, Yes, I did. Please see above Dan's message: "Hello, thanks for your question. The input for VarScan fpfilter is NOT from VarScan readcounts. It instead requires a separate program, bam-readcount. This is a dependency we hope to fix with a future release, but if you use bam-readcount to generate the readcounts, you should be good to go." Basically, you need to generate a Varscan format somatic calling instead of VCF format of it; You then need to download another software called "bam-readcount"...
5 years ago
Vanessa Souza posted a comment on discussion Help

Do you able to solve the problem of failure in the result because no read count was returned? I'm having the same problem.
5 years ago
Vanessa Souza posted a comment on discussion Help

I'm new to variant calling and I'm having trouble running VarScan2 v2.4.4 (Support Protocol 1) which can be found in the following online document: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4278659 / The authors advise using the fpfilter.pl accessory script (https://sourceforge.net/projects/varscan/files/scripts/). When I run the following command: perl script_FPfilter sample.varScan.snp.filter.bed varScan.variants.readcounts –output-basename varScan.variants.filter I get the following output:...
5 years ago
Brian Wiley posted a comment on discussion Help

Hi, In my example below, this is a short subset of SS=1 for SS=Germline. In general how is this SS determined? More specifically why would variants with FREQ very different from 50% (heterozygous) and 100% (homozygous) be classified as Germline, and even more so with these FREQ and with low SPV be given classification SS=1 for Germline. chr1 29902 . AAGCCAGACGC A . PASS DP=260;SS=1;SSC=30;GPV=1;SPV=0.00082866;AC=1;AN=2 GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:225:214:10:4.46%:145,69,4,6 chr1 185125 . A G ....
5 years ago
KELLY L BOLTON posted a comment on discussion Help

Hello, under the documentation for somaticFilter it states: "This command filters somatic mutation calls to remove clusters of false positives and SNV calls near indels." What does removing clusters of false positives mean? How are false positives and clusters of these defined? Thanks Kelly
5 years ago
Brian Wiley modified a comment on discussion Help

Great, is it possible to specify and output directory for these files? Also after test running I am confused. All the variants in my indel file that had a comma in the alt column are gone from the 3 output files. Does proccessSomatic not include any of these where the normal and tumor have different indels at a position? EDIT: sorry some do occur in the output files but not many. Only about 3.8% of all the indels that have two alt (comma separated) made it to the three output files for my first test....
5 years ago
Brian Wiley modified a comment on discussion Help

Great, is it possible to specify and output directory for these files? Also after test running I am confused. All the variants in my indel file that had a comma in the alt column are gone from the 3 output files. Does proccessSomatic not include any of these where the normal and tumor have different indels at a position?
5 years ago
Brian Wiley posted a comment on discussion Help

Great, is it possible to specify and output directory for these files?
5 years ago
Dan Koboldt posted a comment on discussion Help

Brian, If you extract Somatic calls only, you'll get only variants that are found in the tumor. You can use the processSomatic command to facilitate this extraction.
5 years ago
Brian Wiley modified a comment on discussion Help

Thanks Dan, but then how would one extract just the tumor from the vcf, i.e. I don't want any information for the normal because I used an unmatched normal for all my tumors since they are really HIV patients and not tumors. So I want to just obtain a VCF for the tumor and remove the normal entirely. I was thinking I would first do an awk on the alt column and remove the first alt only if comma separated and then with bcftools just ^ the NORMAL to remove it. Is this was someone would do? Also I just...
5 years ago
Brian Wiley posted a comment on discussion Help

Thanks but then how would one extract just the tumor from the vcf, i.e. I don't want any information for the normal because I used an unmatched normal for all my tumors since they are really HIV patients and not tumors. So I want to just obtain a VCF for the tumor and remove the normal entirely. I was thinking I would first do an awk on the alt column and remove the first alt only if comma separated and then with bcftools just ^ the NORMAL to remove it. Is this was someone would do? Also I just started...
5 years ago
Dan Koboldt posted a comment on discussion Help

Brian, the ref column is the reference base. The alts correspond to alternate allele numbers 1 and 2, so you're correct that alt 1 is the alternate allele in the normal and alt 2 is the alternate allele in the tumor IN THIS CASE. The numbers representing alts are sample-agnostic, so it would also be possible for the normal to be 1/2. If there's not a comma-separated list, then the alt allele is the alternate seen, not necessarily in the tumor, but for this position. You have to look at the genotype...
5 years ago
Brian Wiley posted a comment on discussion Help

Sorry also to clarify if there is not comma separated list then the variant is for the tumor?
5 years ago
Brian Wiley posted a comment on discussion Help

Hi, I want to confirm the output from Varscan somatic. If I have matched normal and tumor and both normal and tumor have variants, I assume based off running same thing with table output, the first alt is for the normal and the second alt is for the tumor. So for the first one the normal is CT>C and the tumor is CT>CTT. Example chr17 5529758 . CT C,CTT . PASS DP=20;SS=5;SSC=0;GPV=1E0;SPV=7.0898E-1 GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:10:8:2:20%:8,0,2,0 0/2:.:10:8:2:20%:8,0,2,0 chr17 6659795 . TTC T,TTCTCTCTCTC...
5 years ago
vic posted a comment on discussion Help

Hi, I came across a strange problem and it seems that --output-vcf 1 option disabled --strand-filter: So, if I do samtools mpileup -B -f hg19.fa -q 1 normal tumor | java -jar VarScan.v2.4.4.jar somatic -mpileup var.vcf --min-var-freq 0.08 --p-value 0.10 --strand-filter 1 --output-vcf 1, 0 were removed by the strand filter; But if I remove --output-vcf 1 from the above code, 582878 were removed by the strand filter. Also, is there a way to convert Varscan format into vcf 4.1? I need vcf 4.1 for Gemini....
5 years ago
vic posted a comment on discussion Help

Hi, Dan, 1. BED-like var file for bam-readcount Yes. It works. Then, I ran a test: combined somatic output snp head -20 and indel head -20; awk a BED-like file; all 38 variants have bam-readcounts; but fpfilter says only 30 had a bam-readcount result and 8 failed because no readcounts were returned. I checked failed variants without readcount, but there're readcounts in readcount file, e.g failed "chr1 10439 A -C 33 26 44.07% /-C 14 5 26.32% /-C Germline 1.1034090888899382E-11 0.9525820540573707...
5 years ago
Dan Koboldt posted a comment on discussion Help

1. BED-like var file for bamreadcount For SNVs it's straightforward as you say: chromosome, start, stop, and start equals stop. Indels are more challenging because of how they can be represented at different positions. I usually create a BED-like file for indels in which I print one line for every position from (start - indel_sizse - 1) to (stop + indel_size + 1). Bam-readcounts will return counts for all positions, and VarScan will use the best match to obtain metrics for filtering. 2. Running FPfilter...
5 years ago
vic posted a comment on discussion Help

Hi, Dan, it worked for my snp.Somatic.hc. I have four further questions: 1) To generate a bed-like var file for bam-readcount: if it's a snp, say, chr1 100 A C..., I do print $1,$2,$2 and get chr1 100 100 for var; however, for an indel: chr1 100 A +C..., do I still print $1,$2,$2 because it's an insertion; then, only for a deletion: chr1 100 A -C..., I print $1,($2+1),($2+1). Why isn't ($2-1)? What if it's a long deletion: chr1 100 A -CCC..., do I print $1,($2+1),($2+1) or $1,($2+1),($2+4)? 2) I...
5 years ago
vic posted a comment on discussion Help

Hi, Dan, thanks so much for prompt response. I'll try it out and let you know.
5 years ago
Dan Koboldt posted a comment on discussion Help

Hello, thanks for your question. The input for VarScan fpfilter is NOT from VarScan readcounts. It instead requires a separate program, bam-readcount. This is a dependency we hope to fix with a future release, but if you use bam-readcount to generate the readcounts, you should be good to go.
5 years ago
vic posted a comment on discussion Help

Hi, Does anyone know why I get no readcounts when do fpfilter? 10438 variants in input file 0 had a bam-readcount result 0 had reads1>=2 0 passed filters 10438 failed filters 10438 failed because no readcounts were returned I followed STD Varscan steps: 1) samtools mpileup -B -f ref.fa -q 1 normal.bam tumor.bam | varscan somatic; 2) varscan processSomatic; 3) samtools mpileup -B -f ref.fa tumor.bam | varscan readcounts --variants-file; 4) varscan fpfilter var.file readcount.file. var.file looks like:...
6 years ago
liweiguo1 posted a comment on discussion Help

This was mentioned within the manual, the command (mpileup2cns) reports all positions that met the miniumum coverage (reference, SNP, or indel), However, when I used the command below and expect to output all of the position(11578) include monomorphic position which met the miniumum coverage,but in the output vcf Only variants will be reported(7268). I wonder what change am I to make to get all of the 11578 position genotype that met the miniumum coverage , Thank you very much. java -jar VarScan.jar...
6 years ago
Ye Tian modified a comment on discussion Help

Update: problem soved. the parameter --mpileup 1 has to be added into the command. This was mentioned within the manual but not the protocol article. This is really basic. I was following the instructions from the protocol Using VarScan 2 for Germline Variant Calling and Somatic Mutation Detection I have a pair of normal tumor samples to analyze so I followed the instruction to mpileup them together with samtools. A combined, mpileup'ed file was generated. However, when I called the function VarScan...
6 years ago
Ye Tian posted a comment on discussion Help

This is really basic. I was following the instructions from the protocol Using VarScan 2 for Germline Variant Calling and Somatic Mutation Detection I have a pair of normal tumor samples to analyze so I followed the instruction to mpileup them together with samtools. A combined, mpileup'ed file was generated. However, when I called the function VarScan somatic, I put in one input, followed by the name of my designated output filename. I keep getting error notice that I finally figured out it's expecting...
6 years ago
Jose Fernandez modified a comment on discussion Help

I´ve just stumbled upon a variant whose REF and ALT fields are identical (REF=A and ALT=A). Is this supported in the VCF standard?
6 years ago
Jose Fernandez posted a comment on discussion Help

I just stumbled upon a variant whose REF and ALT fields are identical (REF=A and ALT=A). Is this supported in the VCF standard?
6 years ago
Yongzhe Ren posted a comment on discussion Help

There is a variant result from varscan fpfilter using default parameter like below : 17 7579362 . AACC A . MinMMQSdiff DP=473;SOMATIC;SS=2;SSC=255;GPV=1E0;SPV=6.8578E-82 GT:GQ:DP:RD:AD:FREQ:DP4 0/0:.:89:89:0:0%:68,21,0,0 1/1:.:384:14:370:96.35%:5,9,244,126 the used bam-readcount result is 17 7579363 A 405 =:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 A:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 C:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 G:1:60.00:32.00:0.00:1:0:0.01:0.01:32.00:1:0.01:135.00:0.01...
6 years ago
Jose Fernandez modified a comment on discussion Help

No problem! It should be an easy fix for the next release. For the time being I am using awk to replace the ambiguous codes with N's in the REF column of the VCF files. It seems to do the trick.
6 years ago
Jose Fernandez modified a comment on discussion Help

No problem! It should be an easy fix for the next release. For the time being I am using awk to replace the ambiguous codes with N's in the VCF files. It seems to do the trick.
6 years ago
Jose Fernandez posted a comment on discussion Help

No problem! It should be an easy fix for the next release. For the time being I am using awk to replace the ambiguous codes for N's. in the VCF files. It seems to do the trick.
6 years ago
Dan Koboldt posted a comment on discussion Help

Thank you for providing the examples; now I see what you were trying to describe. An ambiguity code in the reference is an unusual edge case, and a possibility that I had not considered (or coded for) in VarScan. It should be possible to add this compatibility for the next release.
6 years ago
Jose Fernandez modified a comment on discussion Help

Hi Dan, thanks for your answer! I invoked samtools mpileup like this: samtools mpileup -C 50 -B -q 1 -Q 15 -f genome.fasta sample.bam > sample.pileup VarScan's output (2 selected entries) chr2 233142970 . Y C . PASS ADP=2;WT=0;HET=0;HOM=1;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR 1/1:0:2:2:0:2:100%:9.8E-1:0:25:0:0:2:0 chr2 233143729 . Y T . PASS ADP=2;WT=0;HET=0;HOM=1;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR 1/1:0:2:2:0:2:100%:9.8E-1:0:36:0:0:2:0 Samtools mpileup's...
6 years ago
Jose Fernandez modified a comment on discussion Help

I am running Varscan 2.4.4 (mpileup2cns) using mpileups computed with samtools 1.9. It is RNA-seq data that has been processed following the GATK4 best-practices for calling variants on RNA-seq datasets. The execution completes fine but the VCF output is malformed as it contains IUPAC codes in the REF column. The reference genome contains IUPAC codes but none of the reads do, would it be possible to not report variants called on these?
6 years ago
Jose Fernandez posted a comment on discussion Help

Hi Dan, thanks for your answer! I invoked samtools mpileup like this: samtools mpileup -C 50 -B -q 1 -Q 15 -f genome.fasta sample.bam > sample.pileup VarScan's output (2 selected entries) chr2 233142970 . Y C . PASS ADP=2;WT=0;HET=0;HOM=1;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR 1/1:0:2:2:0:2:100%:9.8E-1:0:25:0:0:2:0 chr2 233143729 . Y T . PASS ADP=2;WT=0;HET=0;HOM=1;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR 1/1:0:2:2:0:2:100%:9.8E-1:0:36:0:0:2:0 Samtools mpileup's...
6 years ago
Dan Koboldt posted a comment on discussion Help

Jose, That's an odd one! Would you mind providing a few lines of the relevant VCF output, and the mpileup for those positions?
6 years ago
Jose Fernandez modified a comment on discussion Help

I am running Varscan 2.4.4 (mpileup2cns) using pile-ups computed with samtools 1.9. It is RNA-seq data that has been processed following the GATK4 best-practices for calling variants on RNA-seq datasets. The execution completes fine but VCF output is malformed as it contains IUPAC codes in the REF column. The reference genome contains IUPAC codes but none of the reads do, would it be possible to not report variants called on these?
6 years ago
Jose Fernandez modified a comment on discussion Help

I am running Varscan 2.4.4 (mpileup2cns) using pile-ups computed with samtools 1.9. It is RNA-seq data that has been processed following the GATK4 best-practices for calling variants on RNA-seq datasets. The execution completes fine but VCF output is malformed as it contains IUPAC codes in the REF column. The reference genome contains IUPAC codes, would it be possible to not report variants called on these?
6 years ago
Jose Fernandez modified a comment on discussion Help

I am running Varscan 2.4.4 (mpileup2cns) using pile-ups computed with samtools 1.9. It is RNA-seq data that has been processed following the GATK4 best-practices for calling variants on RNA-seq datasets. The execution completes fine but VCF output is malformed as it contains "Y" characters in the REF column. Is this expected? Needless to say that none of the reads contain "Y" characters.
6 years ago
Jose Fernandez posted a comment on discussion Help

I am running Varscan 2.4.4 (mpileup2cns) using pile-ups computed with samtools 1.9. It is RNA-seq data that has been processed following the GATK4 best-practices for calling variants on RNA-seq datasets. The execution completes fine but VCF output is malformed as it contains "Y" characters in the REF column. Is this expected? Needless to say that none of the reads contain "Y" characters.
6 years ago
Yuyu Liu posted a comment on discussion Help

Thank you very much for the answer! This explains why I got slightly different results for indels but identical results for LOH.
6 years ago
Dan Koboldt posted a comment on discussion Help

Dear Yuyu Liu, Thank you for your question. I should first emphasize that using this parameter may not be necessary, as VarScan is designed around the notion that tumors are usually not pure. The tumor-purity option essentially adjusts the minimum variant allele frequency (--min-var-freq) downward, such that a lower VAF threshold is used when calling the consensus base for the tumor. The actual code looks something like this: tumorMinVarFreq = (minVarFreq * tumorPurity); Thus, if your normal --min-var-freq...
6 years ago
Yuyu Liu posted a comment on discussion Help

I am running Varscan for LOH analysis. I estimated that my samples have a tumor purity of 0.4. I ran the program with tumor purity of 1.0 and 0.4. The results didn't change at all. Is this normal? How does the tumor purity option change the calculations?
6 years ago
Jonathan posted a comment on discussion Help

Hi, I am wondering if Varscan2 is compatible with Nanopore data. If yes, are there any tips or reminders for user like myself? Have a nice day. Cheers, Jonathan
6 years ago
Dan Koboldt posted a comment on discussion Help

Brandon, thanks for the question. The differences that you see all come down to read counting. The denominator for the VAF calculation is the total number of reads counted for any allele at that position, regardless of which allele is being counted. If >2 alleles are observed, it's very possible that the denominator accounts for base calls that you may not be aware of in the VCF, i.e. base calls that support neither the reference nor the main ALT allele. Consensus calling mode typically only reports...
6 years ago
Brandon modified a comment on discussion Help

Hello, I use the following command for poolseq data: VarScan.v2.4.3.jar mpileup2cns --min-coverage 8 --p-value 0.05 --min-var-freq 0.000625 --strand-filter 1 --min-freq-for-hom 0.80 --min-avg-qual 20 --output-vcf 1 > out.vcf I then use GATK VariantsToTable to write the file to .txt. I then use the TYPE column to distinguish either SNP or INDEL. I put each TYPE into separate files. I then further filter the SNP data for biallelic loci. I've noticed that for the TYPE=SNP data, I have inconsistencies...
6 years ago
Brandon modified a comment on discussion Help

Hello, I use the following command for poolseq data: VarScan.v2.4.3.jar mpileup2cns --min-coverage 8 --p-value 0.05 --min-var-freq 0.000625 --strand-filter 1 --min-freq-for-hom 0.80 --min-avg-qual 20 --output-vcf 1 > out.vcf I then use GATK VariantsToTable to write the file to .txt. I then use the TYPE column to distinguish either SNP or INDEL. I put each TYPE into separate files. I've noticed that for the TYPE=SNP data, I have inconsistencies for FREQ, DP, AD, and RD. Sometimes AD + RD != DP (but...
6 years ago
Brandon posted a comment on discussion Help

Hello, I use the following command: VarScan.v2.4.3.jar mpileup2cns --min-coverage 8 --p-value 0.05 --min-var-freq 0.000625 --strand-filter 1 --min-freq-for-hom 0.80 --min-avg-qual 20 --output-vcf 1 > /scratch/lindb/JP_pangenome/JP_pooled/varscan/JP_pooled_varscan_bedfile_0000.vcf I then use GATK VariantsToTable to write the file to .txt. I then use the TYPE column to distinguish either SNP or INDEL. I put each TYPE into separate files. I've noticed that for the TYPE=SNP data, I have inconsistencies...
6 years ago
Dan Koboldt posted a comment on discussion Help

Thank you for your question. I'm not sure I have enough information to answer entirely. Are you working with WES or WGS data? By de novo mutations, you mean ones that are present in the child but not the parents, I presume. There should be 1-2 in the exome and perhaps 60-80 genome-wide. Are you using VarScan mpileup2cns on the trio, and then looking for variants that appear heterozygous only in the proband? That would be my recommendation, for starters. I would then apply VarScan FPfilter with the...
6 years ago
JJJ posted a comment on discussion Help

Hello, I am currently using VarScan v2.4.4 to call SNP variants. There are two things which are really confusing. The first is How the P-value is calculated. what I know so far is that it is based on the supporting reads for ALT and REF. The confusing part is Fisher exact test needs at least 2X2 table. So, what are the other two data? How can I verify the P-value I got from VarScan by using the information from samtools by my own. The second is How to deal with 1/2 GT. I found that there is a situation...
6 years ago
Lait modified a comment on discussion Help

Hi we perform WES on trios, (mother, father and child with cancer). we are very much interested in catching de novo mutations in the children. We have tried two sets of values for fpfilter: set 1: Fewer than 3 variant-supporting reads Variant allele frequency below 0.05 Relative average read position < 0.15 Average distance to effective 3' end < 0.15 Average mismatch quality sum for variant reads > 100 Average mapping quality of variant reads < 30 Average base quality of variant reads < 30 Strand...
6 years ago
Lait posted a comment on discussion Help

Hi we perform WES on trios, (mother, father and child with cancer). we are very much interested in catching de novo mutations in the children. We have tried two sets of values for fpfilter: set 1: Fewer than 3 variant-supporting reads Variant allele frequency below 0.05 Relative average read position < 0.15 Average distance to effective 3' end < 0.15 Average mismatch quality sum for variant reads > 100 Average mapping quality of variant reads < 30 Average base quality of variant reads < 30 Strand...
6 years ago
Dan Koboldt posted a comment on discussion Open Discussion

Dear VarScan users, We are writing an NIH grant application for funding to harden, improve, and expand VarScan functionality. If you have feature requests or suggestions for improving our tool, please share them here or send them to me directly. Also, if you or someone at your institution uses VarScan and would be willing to sign a letter of support (LOS) for our application, please drop me a line at dankoboldt at gmail. I will send you our specific aims and a template letter that you can customize....
6 years ago
Gianfilippo Coppola posted a comment on discussion Help

Hi, how can I limit the analysis using a target design BED file ? Thanks Gianfilippo
6 years ago
Gianfilippo Coppola posted a comment on discussion Help

Hi, is the varscan "readcount" command replacing the bam-readcount utility ? Thanks Gianfilippo
6 years ago
ZhouYifan posted a comment on discussion Help

I have some files,when I run the perl script different file will have different result. For example: Use of uninitialized value $input in <handle> at mergeSegments.pl line 127. readline() on unopened filehandle at mergeSegments.pl line 127. Can't use an undefined value as a symbol reference at mergeSegments.pl line 215. This is some files run the script appear this error and no anything output. Meanwhile a tsv file is generated when the error information in the previous reply. Please tell me the...
6 years ago
ZhouYifan posted a comment on discussion Help

Thank you . I have create my ref-arm-size file. But when I run the perl script still have some error.example: Use of uninitialized value $input in <handle> at mergeSegments.pl line 127. readline() on unopened filehandle at mergeSegments.pl line 127. Can't use an undefined value as a symbol reference at mergeSegments.pl line 215. I exam my ref-arm-size file , it have all chr. I don't know the reason of this question. Hope you can help me !</handle>
7 years ago
Titos posted a comment on discussion Help

Hello colleagues I am working on haploid genomes and I would like to know with which parameter I could launch "varscan mpileup2snp" to consider haploid variants. Thank you in advance
7 years ago
Sagi posted a comment on discussion Open Discussion

Hello, I'm using Varscan in a project that we are working on, and it is crucial for me to know how Varscan2 pileup2cns decides on the most frequent non-reference allele to report. I understand that it calls a consensus variant for each sample based on the greatest number of supporting reads but what happens when two different alleles share the same number of supporting reads. Thank you.
7 years ago
SaraM modified a comment on discussion Help

Hi, I am using Varscan2 (version 2.4.3) to call somatic variants and I would have a question regarding the fpfilter. I thought that through the parameter "--max-somatic-p" I could apply a threshold to the SPV field values of my variants but it seems to me that it is not really responsive, even if I use very low values for the parameter "--max-somatic-p-depth". I had a look at the fpfilter code (I am an absolute Java beginner) and it seems to me that "--max-somatic-p" is not really used for processing...
7 years ago
cristina posted a comment on discussion Help

Hello, thanks in advance for the help If I use samtools mpileup -B -q 1 -f hg38.fa recal_reads08.bam recal_reads06.bam recal_reads02.bam > trio.mpileup dad mum child And I want to call the snp variants from mpileup file java -jar VarScan.v2.3.9.jar mpileup2snp trio.mpileup --output-vcf > varscansnp.vcf in the results of vcf file I get CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample1 Sample2 Sample3 Sample1 correspond to recal_reads08.bam? Because I have the feeling is not in the same order as...
7 years ago
Dan Koboldt posted a comment on discussion Help

Sara, Thanks again for bringing this to my attention. It turns out the somatic P-value threshold was not being applied in the FP filter. This has been fixed for the next release (v2.4.4).
7 years ago
Dan Koboldt posted a comment on discussion Help

Changhan, thanks for clarifying!
7 years ago
Dan Koboldt posted a comment on discussion Help

Manal, you have to run it on each pair. I'm not sure what you're asking about the read depth; this is one of the output fields provided by VarScan, but only reported where VarScan calls a variant.
7 years ago
Dan Koboldt posted a comment on discussion Help

Zhou, Your ref-arm-sizes file is wrong: it should have one line for each chromosome arm.
7 years ago
Dan Koboldt posted a comment on discussion Open Discussion

Harshmi, I'm not sure what to make of the columsn, which don't match normal VarScan output formats. If you're still having trouble with this issue, would you mind posting your mpileup for the relevant line, as well as the VarScan version and output line?
7 years ago
Dan Koboldt posted a comment on discussion Help

Guillaume, thank you for your question and I apologize for the delay in replying. In your example, the read counts are somewhat confusing because it's an insertion. Based on the VarScan output, there are 237 reads supporting the reference, 1692 supporting the insertion, and 86 reads supporting other alleles. Getting a look at your mpileup is the only way to be sure; would you mind sharing it?
7 years ago
Dan Koboldt posted a comment on discussion Help

Zhou, the ref-arm-sizes file is one that you create, designating the coordinates of the "p" and "q" arms of each chromosome. It should be tab-delimited with four columns: chromosome, start, stop, p/q. For example: chr1 1 125000000 p chr1 125000001 249250621 q If you need to find out the coordinates of p/q arms, the "cytoBand.txt.gz" file in the UCSC Genome Database provides this information.
7 years ago
ZhouYifan posted a comment on discussion Help

Can you tell me how the chr-arm file is obtained? Thank you very much!
7 years ago
ZhouYifan posted a comment on discussion Help

Can you tell me how the chr-arm file is obtained? Thank you very much!
7 years ago
Guillaume Barreau posted a comment on discussion Help

Hello, I have used varscan with java -jar VarScan.v2.3.9.jar mpileup2cns RS.pileup --vcf-sample-list sample.list --min-reads2 3 --min-coverage 15 --p-value 0.9 --strand-filter 0 --variants --output-vcf > RS.vcf The vcf file is attached to this message. When I look at the results, I see that there is a field called AD (allele depth), a field called DP and a field called FREQ. In most cases, FREQ = AD/DP. However in some cases, this doesn't seem to be the case. Furthermore in those cases, I can't see...
7 years ago
ZhouYifan posted a comment on discussion Help

How do you adjust the parameters of VARSCAN2 SOMATIC according to your own data?please help me！ Thank you
7 years ago
Manal Alattas posted a comment on discussion Help

Hi I have 5 normal files and 14 meningioma files Can I use VarScan to read all files at once or I have to run it on each pair (normal -meningioma) also , is it possible to detect the read deapth ? I need your help please thank you
7 years ago
Changhan Yoon posted a comment on discussion Help

I missed filtering steps which I made. It's my mistake.
7 years ago
Changhan Yoon posted a comment on discussion Help

Hi, I am confused. I am using VarScan2 (2.3.9) to identify and profile somatic mutations from cfDNAs and matched normal blood samples. When I used 0.005 for –-min-var-freq, I could not find some variants with higher VAF. However, when I used 0.05 for --min-var-freq, I found previously unfound variants with VAF over 0.05. For example, A variant (chr2 29606628 . C G . PASS DP=1775;SOMATIC;SS=2;SSC=255;GPV=1E0;SPV=1.136E-98; GT:GQ:DP:RD:AD:FREQ:DP4 0/0:.:447:444:0:0%:240,204,0,0 0/1:.:1328:700:601:46.2%:349,351,313,288)...
7 years ago
Changhan Yoon posted a comment on discussion Help

Hi, I am confused. I am using VarScan2 (2.3.9) to identify and profile somatic mutations from cfDNAs and matched normal blood samples. When I used 0.005 for –-min-var-freq, I could not find some variants with higher VAF. However, when I used 0.05 for --min-var-freq, I found previously unfound variants with VAF over 0.05. For example, A variant (chr2 29606628 . C G . PASS DP=1775;SOMATIC;SS=2;SSC=255;GPV=1E0;SPV=1.136E-98; GT:GQ:DP:RD:AD:FREQ:DP4 0/0:.:447:444:0:0%:240,204,0,0 0/1:.:1328:700:601:46.2%:349,351,313,288)...
7 years ago
Dan Koboldt posted a comment on discussion Help

Dear Sara, That certainly is the intent of the parameter, but it's possible the logic wasn't implemented. I'll take a look at the code and get back to you here.
7 years ago
SaraM posted a comment on discussion Help

Hi, I am using Varscan2 (version 2.4.3) to call somatic variants and I would have a question regarding the fpfilter. I thought that through the parameter "--max-somatic-p" I could apply a threshold to the values of my SPV field of my variants but it seems to me that it is not really responsive, even if I use very low values for the parameter "--max-somatic-p-depth". I had a look at the fpfilter code (I am an absolute Java beginner) and it seems to me that "--max-somatic-p" is not really used for...
7 years ago
Carlos Garrido-Allepuz posted a comment on discussion Help

On another topic, whe we use samtools 1.9 (last version) for the mpileup for the commands mpileup2snp and mpileup2indel, and then we use those archives for the filter command, the command stops due to parsing errors in the snp.vcf archive (pointing to the FREQ fiel of the Format Column). But when we use the same archives with samtools 1.8, this problem do not appear.
7 years ago
Carlos Garrido-Allepuz posted a comment on discussion Help

Good Morning we are having some issues with the Filter command. It seems that the option --min-strand2 it is not working correctly. We are launching the command with the option by default (1, that is, as we understand, that the variant has to be in at least one of the strands to pass the filter), but it is filtering out some variants by the strand2 filter. If we use other numbers for this filter, the same number of variants (and the same variants) are being filtered out. Is this a bug, or we are...
7 years ago
Raheleh modified a comment on discussion Help

Dear Dan, many thanks for your reply. How can I merge my vcf files? I tried to use CombineVariants to combine my 6 vcf files, but I get this error : Unable to access jarfile/ GenomeAnalysisTK.jar In fact there is no file namely GenomeAnalysisTK.jar. Do you know how can I fix this problem? Another problem is about the format of vcf file, it's not the same as accepted format for ANNOVAR. How can I convert them to annovar format? About the vcf file could you please tell me why the third colum (ID) is...

1 >