Activity for Dan Koboldt

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Thank you for your question. To me, it looks like you have your commands backwards: You run samtools mpileup FIRST, and pipe the output of that into VarScan. Also, keep in mind that you don't need to run two separate mpileup commands to generate the two-sample mpileup VarScan needs. You can give Samtools mpileup two BAM files. Also, a minor point, always provide an argument to VarScan option flags (e.g. use --mpileup 1). Here's my recommended restructuring of your command: samtools mpileup -B -q...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Hello, and thanks for asking. The short answer is no. The VarScan readcounts command has been around forever, and it does facilitate some specialized analyses. However, it does not replace the bam-readcount functionality required for false positive filtering (though this is high on the priority list for the next release).

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Brian, If you extract Somatic calls only, you'll get only variants that are found in the tumor. You can use the processSomatic command to facilitate this extraction.

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Brian, the ref column is the reference base. The alts correspond to alternate allele numbers 1 and 2, so you're correct that alt 1 is the alternate allele in the normal and alt 2 is the alternate allele in the tumor IN THIS CASE. The numbers representing alts are sample-agnostic, so it would also be possible for the normal to be 1/2. If there's not a comma-separated list, then the alt allele is the alternate seen, not necessarily in the tumor, but for this position. You have to look at the genotype...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    1. BED-like var file for bamreadcount For SNVs it's straightforward as you say: chromosome, start, stop, and start equals stop. Indels are more challenging because of how they can be represented at different positions. I usually create a BED-like file for indels in which I print one line for every position from (start - indel_sizse - 1) to (stop + indel_size + 1). Bam-readcounts will return counts for all positions, and VarScan will use the best match to obtain metrics for filtering. 2. Running FPfilter...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Hello, thanks for your question. The input for VarScan fpfilter is NOT from VarScan readcounts. It instead requires a separate program, bam-readcount. This is a dependency we hope to fix with a future release, but if you use bam-readcount to generate the readcounts, you should be good to go.

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Thank you for providing the examples; now I see what you were trying to describe. An ambiguity code in the reference is an unusual edge case, and a possibility that I had not considered (or coded for) in VarScan. It should be possible to add this compatibility for the next release.

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Jose, That's an odd one! Would you mind providing a few lines of the relevant VCF output, and the mpileup for those positions?

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Dear Yuyu Liu, Thank you for your question. I should first emphasize that using this parameter may not be necessary, as VarScan is designed around the notion that tumors are usually not pure. The tumor-purity option essentially adjusts the minimum variant allele frequency (--min-var-freq) downward, such that a lower VAF threshold is used when calling the consensus base for the tumor. The actual code looks something like this: tumorMinVarFreq = (minVarFreq * tumorPurity); Thus, if your normal --min-var-freq...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Brandon, thanks for the question. The differences that you see all come down to read counting. The denominator for the VAF calculation is the total number of reads counted for any allele at that position, regardless of which allele is being counted. If >2 alleles are observed, it's very possible that the denominator accounts for base calls that you may not be aware of in the VCF, i.e. base calls that support neither the reference nor the main ALT allele. Consensus calling mode typically only reports...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Thank you for your question. I'm not sure I have enough information to answer entirely. Are you working with WES or WGS data? By de novo mutations, you mean ones that are present in the child but not the parents, I presume. There should be 1-2 in the exome and perhaps 60-80 genome-wide. Are you using VarScan mpileup2cns on the trio, and then looking for variants that appear heterozygous only in the proband? That would be my recommendation, for starters. I would then apply VarScan FPfilter with the...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Open Discussion

    Dear VarScan users, We are writing an NIH grant application for funding to harden, improve, and expand VarScan functionality. If you have feature requests or suggestions for improving our tool, please share them here or send them to me directly. Also, if you or someone at your institution uses VarScan and would be willing to sign a letter of support (LOS) for our application, please drop me a line at dankoboldt at gmail. I will send you our specific aims and a template letter that you can customize....

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Sara, Thanks again for bringing this to my attention. It turns out the somatic P-value threshold was not being applied in the FP filter. This has been fixed for the next release (v2.4.4).

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Changhan, thanks for clarifying!

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Manal, you have to run it on each pair. I'm not sure what you're asking about the read depth; this is one of the output fields provided by VarScan, but only reported where VarScan calls a variant.

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Zhou, Your ref-arm-sizes file is wrong: it should have one line for each chromosome arm.

  • Dan Koboldt Dan Koboldt posted a comment on discussion Open Discussion

    Harshmi, I'm not sure what to make of the columsn, which don't match normal VarScan output formats. If you're still having trouble with this issue, would you mind posting your mpileup for the relevant line, as well as the VarScan version and output line?

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Guillaume, thank you for your question and I apologize for the delay in replying. In your example, the read counts are somewhat confusing because it's an insertion. Based on the VarScan output, there are 237 reads supporting the reference, 1692 supporting the insertion, and 86 reads supporting other alleles. Getting a look at your mpileup is the only way to be sure; would you mind sharing it?

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Zhou, the ref-arm-sizes file is one that you create, designating the coordinates of the "p" and "q" arms of each chromosome. It should be tab-delimited with four columns: chromosome, start, stop, p/q. For example: chr1 1 125000000 p chr1 125000001 249250621 q If you need to find out the coordinates of p/q arms, the "cytoBand.txt.gz" file in the UCSC Genome Database provides this information.

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Dear Sara, That certainly is the intent of the parameter, but it's possible the logic wasn't implemented. I'll take a look at the code and get back to you here.

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Sorry for my lateness in answering this, but the VarScan native output uses IUPAC Ambiguity codes. If you want VCF-style output, use the parameter --output-vcf 1.

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Cristina, I sent you a more detailed explanation by message, but in general you should NOT do this. When calling de novo mutations, it's essential to know that you have sufficient coverage in both parents. If you combine them into a single BAM, all of the coverage could be from one parent.

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Raheleh, thanks for your question. I'm not sure I have enough information to provide detailed guidance, but most studies with paired tumor-normal data are most interested in somatic alterations. To that end, you should work with the FPfilter-passed somatic mutation calls from each tumor (one for SNVS, one for indels). You might combine the SNVs and indels into a single file for convenience. I can't think of a good reason you'd want to combine somatic calls from multiple patients into a single file....

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Hello, and thanks for posting. That error was corrected in a later release of VarScan. Please update to the newest version on GitHub

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Leonor, Thank you for the question, and I apologize profusely for my delay in getting back to you. Yes, VarScan calls the most-observed variant allele in each sample. You might get better results calling samples individually. The little-known 'readcounts' command of VarScan will take a single-sample mpileup as input and generate read counts for each observed allele (including reference).

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Jon, Thanks. The problem is that you're using an old version of VarScan (v2.3.9). The current version is v2.4.3. This particular bug was fixed in v2.4.0. Get the newest release on GitHub.

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    You're correct that mis-typed parameters are not always caught by VarScan, though user-provided parameters settings displayed when you run most commands. I'm not sure I'd call this a bug, so much as a missing feature, but I'll look into it for the next release.

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    I'll look into this -- would you mind posting (or sending me) the mpileup line(s) for that position?

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Hello, thank you for posting. Would you mind sharing the mpileup line(s) from the malformed VCF variant?

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Aidan, VarScan does not create semicolon-delimited output, with the possible exception being the INFO field when VCF output is specified. If you have true semicolon delimiters, something is very wrong! It would help enormously if you shared the first few lines of your input/output files as well as the commands being used.

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Holly, You definiitely want to use bam-readcount for the fp-fiter purposes. VarScan's readcount feature operates differently and isn't designed for use with the filter (honestly I didn't think anyone used that command).

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Yes! Glad you figured it out.

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    For anyone still encountering array index out of bound exceptions, I recommend using...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Use the --vcf-sample-list parameter, giving it a file of sample names in order one...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Thanks for replying, Robert. I would add that in general, you want to run processSomatic...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Zejun, I'm afraid that's outside the scope of the trio command. Instead, I'd run...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Most likely, this has to do with how the bam-readcounts file is generated for the...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    I think this was caused by the command syntax issue I pointed out on the other t...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Would you like to share the first 10 lines of your pileup file? In particular, if...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    You need to provide the output file basename as the third argument. The --output-vcf...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    In general, you can ignore this warning. However, I strongly recommend running VarScan...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    I think this is just a matter of a change to the column ordering in DNAcopy output....

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Alex, could I ask you to send me the actual mpileup lines and the output you received?...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Alex, thanks for posting. It looks like a clean deletion in IGV, but seeing the mpileup...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    I have not heard of this error before. If you wouldn't mind, please share the command...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Vai, That's an often-reported issue caused by zero-depth lines in the mpileup. It...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    That's an odd result. Would you mind posting/sharing the first 20 lines of your variants...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    This issue should be addressed in the latest VarScan release, v2.4.2.

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    They're not supposed to match: The DP field is the total depth for that position...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Hello Xiaoxiao, You can ignore the "Not resetting" warnings, as these are status...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Not at all, Raman. It means you need to include chrM in your ref-arm-sizes file....

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    If you post some examples (VarScan output), I'll take a look.

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    We typiclly use -q 1 or -q 10.... this is minimum mapping quality, so a difficult...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    That's curious... it might be an error with the call-counting code. Will look into...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Always trust the calls within the files over the counters. If you want, set the "--validation...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Corentin, You can use VarScan 2 copynumber on exome or WGS data. However, it does...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    It's true that VarScan has been tuned using primarily Illumina data, and I haven't...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Yaseen and Bella, Thank you both for asking. The correct way is to license it through...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Thanks for the message -- these things are possible (and recommended, if you read...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    I believe I've put in a fix that should handle this exception more gracefully, but...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Ankita, that looks like a mal-formed mpileup line (as I said by e-mail), but I'm...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Thank you all for reporting this. It seems to be caused by incomplete mpileup lines,...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    That's an odd behavior for certain. Best guess? The "--output-vcf 1" parameter is...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    "Consensus" really only refers to the second column in your example, which is typically...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Colin, Correct: --min-avg-qual is a threshold for individual bases, and determines...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Colin, thanks for the message. I'm afraid the name of that parameter is misleading....

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Hello, That's a typo in the paper, for item 1. Variants present only in the tumor...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Yes, it should be species-agnostic. Most likely it's the arm sizes file -- you need...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Floris, I just released new files (VarScan v2.3.9) with patches that should resolve...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Floris, Thank you for the test files, which helped me identify the problem. I will...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    For bam-readcount, you need to give it an input list of positions in the format:...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Thank you for asking! In your case, I would: 1.) Run processSomatic and isolate the...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Yes, one needs to generate the readcounts with bam-readcount. I do intend to put...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Thanks, Chris! I will try these out and see about implementing the speedup in the...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    elmesmari, No, that's VarScan native output format. You need to use --output-vcf...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Ania, VarScan does not populate the QUAL column of the VCF file; instead, it provides...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Flavia, Thank you again for reporting this; it helped me track down an issue with...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Hello all, I've just released a new version of VarScan (v2.3.7) that includes an...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    VarScan v2.3.7 is now available, and addresses this issue: https://sourceforge.n...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Yes, thank you for commenting. This IS an issue due to the SAMtools 0-depth formatting...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Thank you for asking. That parameter is poorly named; more accurately, it would be...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Darren, Thank you for posting your question, and for attaching all of the files that...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Correction: in reviewing the current code to investigate the missing columns issue,...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    The first error is probably a chromosome in your segments file that's not in the...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Typically we run in parallel with a perl script that sub-sets the BAM file by chromosome...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Phillipp, I can't imagine a scenario where VarScan would output an incomplete VCF...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    This looks like the zero-depth mpileup line issue, which is a known bug that should...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Sophie, I hope you've been able to make more progress with this issue. I'm afraid...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    To the best of my knowledge, VarScan is outputting valid VCF. Your incident might...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    SDP is the Samtools depth, which reflects what you'll probably see in the mpileup....

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    It's a little surprising that SAMtools bogs down with fewer than 76 samples; that...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Jane, Have you tried running the "readcounts" subcommand of VarScan? It should tell...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Can you provide some details of the command you were running? How many samples were...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Thanks for your message, and I'm sorry to hear about your troubles! Would you send...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    This is a known VCF indel formatting issue that was corrected in v2.3.6.

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Hello, You need to put your input file before the parameter settings, i.e. java -jar...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Hello, It is puzzling that a subset of files were missing VCF headers. Would you...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Thanks to both of you for bringing this up -- another user e-mailed to report the...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Andrea, Thanks for posting this... it looks like VarScan's calculated VAF was lower...

  • Dan Koboldt Dan Koboldt posted a comment on discussion Help

    Vincent, DP4 is a SAMtools field that contains forward ref, reverse ref, forward...

1 >