From: jbr950 <jb...@gm...> - 2014-09-23 13:26:10
|
Good catch - I was trying to use an old version of bcftools with the new version of samtools.. So I changed that to point to the new version of bcftools, and now: samtools-bcftools-htslib-1.0_x64-linux/bin/samtools mpileup -BQ0 -d10000000 -l $variantBed -uf ${ref} $bamFile > test.bcf So far so good.. And then: cat test.bcf | samtools-bcftools-htslib-1.0_x64-linux/bin/bcftools view -cg - Error: Could not parse --min-ac g On the other hand, calling: . cat test.bcf | samtools-bcftools-htslib-1.0_x64-linux/bin/bcftools view- does create a nice output, so I'm inclined to let it go with the defaults. I don't see equivalent arguments for the -c and -g flags and I'm not particularly attached to the use of Bayesian inference, although it would be good to understand what differences I can expect between the old method (using -c) and the new (sans -c)? Thanks for your time and assistance! Jonathan On Tue, Sep 23, 2014 at 2:40 AM, Petr Danecek <pd...@sa...> wrote: > This does not look like output from samtools 1.0 - it outputs VCFv4.2 > > petr > > On Mon, 2014-09-22 at 14:53 -0400, jbr950 wrote: > > Hi Petr, > > Thanks for your reply. I grabbed the > > samtools-bcftools-htslib-1.0_x64-linux binary and tried again. > > > > > > When I'd used Samtools 0.1.18, the issue I emailed about was that the > > number of lines in output varied by .bam file, and I didn't understand > > why lines were being ommitted, and why not in a common manner. > > > > > > Using version 1.0 and the same command (I checked on the older > > samtools and it is producing output), I get no output at all. Mpileup > > writes a header and then stops. I have copied and pasted the output. > > > > > > > > My .bed is 3 columns: chr start stop > > with no header. > > > > > > Any advice appreciated, thanks! > > > > > > Jonathan > > > > > > > > > > > > > > [mpileup] 1 samples in 1 input files > > (mpileup) Max depth is above 1M. Potential memory hog! > > [bcf_sync] incorrect number of fields (0 != 5) at 0:0 > > [afs] 0:0.000 > > ##fileformat=VCFv4.1 > > ##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth"> > > ##INFO=<ID=DP4,Number=4,Type=Integer,Description="# high-quality > > ref-forward bases, ref-reverse, alt-forward and alt-reverse bases"> > > ##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square > > mapping quality of covering reads"> > > ##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred probability of > > all samples being the same"> > > ##INFO=<ID=AF1,Number=1,Type=Float,Description="Max-likelihood > > estimate of the first ALT allele frequency (assuming HWE)"> > > ##INFO=<ID=AC1,Number=1,Type=Float,Description="Max-likelihood > > estimate of the first ALT allele count (no HWE assumption)"> > > ##INFO=<ID=G3,Number=3,Type=Float,Description="ML estimate of genotype > > frequencies"> > > ##INFO=<ID=HWE,Number=1,Type=Float,Description="Chi^2 based HWE test > > P-value based on G3"> > > ##INFO=<ID=CLR,Number=1,Type=Integer,Description="Log ratio of > > genotype likelihoods with and without the constraint"> > > ##INFO=<ID=UGT,Number=1,Type=String,Description="The most probable > > unconstrained genotype configuration in the trio"> > > ##INFO=<ID=CGT,Number=1,Type=String,Description="The most probable > > constrained genotype configuration in the trio"> > > ##INFO=<ID=PV4,Number=4,Type=Float,Description="P-values for strand > > bias, baseQ bias, mapQ bias and tail distance bias"> > > ##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the > > variant is an INDEL."> > > ##INFO=<ID=PC2,Number=2,Type=Integer,Description="Phred probability of > > the nonRef allele frequency in group1 samples being larger (,smaller) > > than in group2."> > > ##INFO=<ID=PCHI2,Number=1,Type=Float,Description="Posterior weighted > > chi^2 P-value for testing the association between group1 and group2 > > samples."> > > ##INFO=<ID=QCHI2,Number=1,Type=Integer,Description="Phred scaled > > PCHI2."> > > ##INFO=<ID=PR,Number=1,Type=Integer,Description="# permutations > > yielding a smaller PCHI2."> > > ##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance > > Bias"> > > ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype"> > > ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality"> > > ##FORMAT=<ID=GL,Number=3,Type=Float,Description="Likelihoods for > > RR,RA,AA genotypes (R=ref,A=alt)"> > > ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="# high-quality > > bases"> > > ##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand > > bias P-value"> > > ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of > > Phred-scaled genotype likelihoods"> > > #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT > > > > > > > > On Thu, Sep 4, 2014 at 5:20 AM, Petr Danecek <pd...@sa...> wrote: > > Hi Jonathan, > > > > these are good questions. Could you please try with the latest > > release, > > I am happy to answer any remaining issues not solved by the > > upgrade. > > > > Cheers, > > Petr > > > > On Sun, 2014-08-17 at 01:34 -0400, Jo R wrote: > > > Hello, > > > I'm attemping to get nucleotide frequency information at > > a number > > > of positions across a number of samples, and am having > > difficulty > > > interpreting some output. Any insights would be > > appreciated. > > > > > > > > > I'm running the following command: > > > > > > > > > samtools mpileup -BQ0 -d10000000 -l VariantBed.bed -uf > > $refFile $bam > > > | bcftools view -bcg - | bcftools view - > > > > ${sampleName}_validation.vcf > > > > > > > > > > > > I notice that this command creates an output file with > > > an unpredictable number of rows. Running the command using > > the same > > > bed file on a set of different .bam files creates a set of > > output vcf > > > files with a wide distribution in numbers of rows. > > > > > > > > > I presumed that the difference in row numbers means that > > some > > > positions drop out on some .bam files because those samples > > lacked > > > coverage where other samples had coverage. > > > > > > > > > If that's the case, though, I don't know what to make of > > lines like > > > the following one: > > > > > > > > > 1 2160881 . G . 28.2 . > > > DP=0;VDB=0.0003;;AC1=2;FQ=-30 PL 0 > > > > > > > > > > > > here, it looks like DP=0, but this position still got > > reported in the > > > vcf output. I also don't see AC1 in the legend for INFO > > tags in the > > > samtools specification page, so I don't know what to make of > > a value > > > of 2. > > > > > > > > > So, I am confused. Positions with a positive value of DP > > and DP4 make > > > sense to me. But why are some positions completely ommitted > > from the > > > vcf output, and other positions reporting a DP=0? > > > > > > > > > Thanks for any advice. > > > > > > > > > Best regards, > > > Jonathan > > > > > > > > > > > > > > > > > > > > ------------------------------------------------------------------------------ > > > _______________________________________________ > > > Samtools-help mailing list > > > Sam...@li... > > > https://lists.sourceforge.net/lists/listinfo/samtools-help > > > > > > > > > > -- > > The Wellcome Trust Sanger Institute is operated by Genome > > Research > > Limited, a charity registered in England with number 1021457 > > and a > > company registered in England with number 2742969, whose > > registered > > office is 215 Euston Road, London, NW1 2BE. > > > > > > > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. > |