|
From: Petr D. <pd...@sa...> - 2014-09-23 13:31:24
|
There is a slowly growing documentation on the new website, please check the variant calling workflow http://www.htslib.org/workflow/ You need to use `bcftools call`, not `bcftools view`. petr On Tue, 2014-09-23 at 09:25 -0400, jbr950 wrote: > Good catch - I was trying to use an old version of bcftools with the > new version of samtools.. > > > So I changed that to point to the new version of bcftools, and now: > > > samtools-bcftools-htslib-1.0_x64-linux/bin/samtools mpileup -BQ0 > -d10000000 -l $variantBed -uf ${ref} $bamFile > test.bcf > > > > So far so good.. > > > And then: > cat test.bcf | samtools-bcftools-htslib-1.0_x64-linux/bin/bcftools > view -cg - > Error: Could not parse --min-ac g > > > > On the other hand, calling: . > cat test.bcf | samtools-bcftools-htslib-1.0_x64-linux/bin/bcftools > view- > > > > does create a nice output, so I'm inclined to let it go with the > defaults. I don't see equivalent arguments for the -c and -g flags and > I'm not particularly attached to the use of Bayesian inference, > although it would be good to understand what differences I can expect > between the old method (using -c) and the new (sans -c)? > > > Thanks for your time and assistance! > > > Jonathan > > > > > > On Tue, Sep 23, 2014 at 2:40 AM, Petr Danecek <pd...@sa...> > wrote: > This does not look like output from samtools 1.0 - it outputs > VCFv4.2 > > petr > > On Mon, 2014-09-22 at 14:53 -0400, jbr950 wrote: > > Hi Petr, > > Thanks for your reply. I grabbed the > > samtools-bcftools-htslib-1.0_x64-linux binary and tried > again. > > > > > > When I'd used Samtools 0.1.18, the issue I emailed about was > that the > > number of lines in output varied by .bam file, and I didn't > understand > > why lines were being ommitted, and why not in a common > manner. > > > > > > Using version 1.0 and the same command (I checked on the > older > > samtools and it is producing output), I get no output at > all. Mpileup > > writes a header and then stops. I have copied and pasted > the output. > > > > > > > > My .bed is 3 columns: chr start stop > > with no header. > > > > > > Any advice appreciated, thanks! > > > > > > Jonathan > > > > > > > > > > > > > > [mpileup] 1 samples in 1 input files > > (mpileup) Max depth is above 1M. Potential memory hog! > > [bcf_sync] incorrect number of fields (0 != 5) at 0:0 > > [afs] 0:0.000 > > ##fileformat=VCFv4.1 > > ##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read > depth"> > > ##INFO=<ID=DP4,Number=4,Type=Integer,Description="# > high-quality > > ref-forward bases, ref-reverse, alt-forward and alt-reverse > bases"> > > > ##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square > > mapping quality of covering reads"> > > ##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred > probability of > > all samples being the same"> > > > ##INFO=<ID=AF1,Number=1,Type=Float,Description="Max-likelihood > > estimate of the first ALT allele frequency (assuming HWE)"> > > > ##INFO=<ID=AC1,Number=1,Type=Float,Description="Max-likelihood > > estimate of the first ALT allele count (no HWE assumption)"> > > ##INFO=<ID=G3,Number=3,Type=Float,Description="ML estimate > of genotype > > frequencies"> > > ##INFO=<ID=HWE,Number=1,Type=Float,Description="Chi^2 based > HWE test > > P-value based on G3"> > > ##INFO=<ID=CLR,Number=1,Type=Integer,Description="Log ratio > of > > genotype likelihoods with and without the constraint"> > > ##INFO=<ID=UGT,Number=1,Type=String,Description="The most > probable > > unconstrained genotype configuration in the trio"> > > ##INFO=<ID=CGT,Number=1,Type=String,Description="The most > probable > > constrained genotype configuration in the trio"> > > ##INFO=<ID=PV4,Number=4,Type=Float,Description="P-values for > strand > > bias, baseQ bias, mapQ bias and tail distance bias"> > > ##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates > that the > > variant is an INDEL."> > > ##INFO=<ID=PC2,Number=2,Type=Integer,Description="Phred > probability of > > the nonRef allele frequency in group1 samples being larger > (,smaller) > > than in group2."> > > ##INFO=<ID=PCHI2,Number=1,Type=Float,Description="Posterior > weighted > > chi^2 P-value for testing the association between group1 and > group2 > > samples."> > > ##INFO=<ID=QCHI2,Number=1,Type=Integer,Description="Phred > scaled > > PCHI2."> > > ##INFO=<ID=PR,Number=1,Type=Integer,Description="# > permutations > > yielding a smaller PCHI2."> > > ##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant > Distance > > Bias"> > > ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype"> > > ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype > Quality"> > > ##FORMAT=<ID=GL,Number=3,Type=Float,Description="Likelihoods > for > > RR,RA,AA genotypes (R=ref,A=alt)"> > > ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="# > high-quality > > bases"> > > > ##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand > > bias P-value"> > > ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of > > Phred-scaled genotype likelihoods"> > > #CHROM POS ID REF ALT QUAL FILTER INFO > FORMAT > > > > > > > > On Thu, Sep 4, 2014 at 5:20 AM, Petr Danecek > <pd...@sa...> wrote: > > Hi Jonathan, > > > > these are good questions. Could you please try with > the latest > > release, > > I am happy to answer any remaining issues not solved > by the > > upgrade. > > > > Cheers, > > Petr > > > > On Sun, 2014-08-17 at 01:34 -0400, Jo R wrote: > > > Hello, > > > I'm attemping to get nucleotide frequency > information at > > a number > > > of positions across a number of samples, and am > having > > difficulty > > > interpreting some output. Any insights would be > > appreciated. > > > > > > > > > I'm running the following command: > > > > > > > > > samtools mpileup -BQ0 -d10000000 -l > VariantBed.bed -uf > > $refFile $bam > > > | bcftools view -bcg - | bcftools view - > > > > ${sampleName}_validation.vcf > > > > > > > > > > > > I notice that this command creates an output file > with > > > an unpredictable number of rows. Running the > command using > > the same > > > bed file on a set of different .bam files creates > a set of > > output vcf > > > files with a wide distribution in numbers of rows. > > > > > > > > > I presumed that the difference in row numbers > means that > > some > > > positions drop out on some .bam files because > those samples > > lacked > > > coverage where other samples had coverage. > > > > > > > > > If that's the case, though, I don't know what to > make of > > lines like > > > the following one: > > > > > > > > > 1 2160881 . G . 28.2 . > > > DP=0;VDB=0.0003;;AC1=2;FQ=-30 PL 0 > > > > > > > > > > > > here, it looks like DP=0, but this position still > got > > reported in the > > > vcf output. I also don't see AC1 in the legend > for INFO > > tags in the > > > samtools specification page, so I don't know what > to make of > > a value > > > of 2. > > > > > > > > > So, I am confused. Positions with a positive > value of DP > > and DP4 make > > > sense to me. But why are some positions > completely ommitted > > from the > > > vcf output, and other positions reporting a DP=0? > > > > > > > > > Thanks for any advice. > > > > > > > > > Best regards, > > > Jonathan > > > > > > > > > > > > > > > > > > > > ------------------------------------------------------------------------------ > > > _______________________________________________ > > > Samtools-help mailing list > > > Sam...@li... > > > > https://lists.sourceforge.net/lists/listinfo/samtools-help > > > > > > > > > > -- > > The Wellcome Trust Sanger Institute is operated by > Genome > > Research > > Limited, a charity registered in England with > number 1021457 > > and a > > company registered in England with number 2742969, > whose > > registered > > office is 215 Euston Road, London, NW1 2BE. > > > > > > > > > -- > The Wellcome Trust Sanger Institute is operated by Genome > Research > Limited, a charity registered in England with number 1021457 > and a > company registered in England with number 2742969, whose > registered > office is 215 Euston Road, London, NW1 2BE. > > > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. |