Re: [Samtools-help] mpileup output confusion

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

There is a slowly growing documentation on the new website, please check
the variant calling workflow
http://www.htslib.org/workflow/

You need to use `bcftools call`, not `bcftools view`.

petr

On Tue, 2014-09-23 at 09:25 -0400, jbr950 wrote: 
> Good catch - I was trying to use an old version of bcftools with the
> new version of samtools..
> 
> 
> So I changed that to point to the new version of bcftools, and now:
> 
> 
> samtools-bcftools-htslib-1.0_x64-linux/bin/samtools  mpileup  -BQ0
> -d10000000 -l $variantBed -uf ${ref} $bamFile > test.bcf
> 
> 
> 
> So far so good..
> 
> 
> And then:
> cat test.bcf | samtools-bcftools-htslib-1.0_x64-linux/bin/bcftools
> view -cg - 
> Error: Could not parse --min-ac g
> 
> 
> 
> On the other hand, calling: .  
> cat test.bcf | samtools-bcftools-htslib-1.0_x64-linux/bin/bcftools
> view-
> 
> 
> 
> does create a nice output, so I'm inclined to let it go with the
> defaults. I don't see equivalent arguments for the -c and -g flags and
> I'm not particularly attached to the use of Bayesian inference,
> although it would be good to understand what differences I can expect
> between the old method (using -c) and the new (sans -c)?
> 
> 
> Thanks for your time and assistance!
> 
> 
> Jonathan
> 
> 
> 
> 
> 
> On Tue, Sep 23, 2014 at 2:40 AM, Petr Danecek <pd...@sa...>
> wrote:
>         This does not look like output from samtools 1.0 - it outputs
>         VCFv4.2
>         
>         petr 
>         
>         On Mon, 2014-09-22 at 14:53 -0400, jbr950 wrote:
>         > Hi Petr,
>         >     Thanks for your reply.  I grabbed the
>         > samtools-bcftools-htslib-1.0_x64-linux binary and tried
>         again.
>         >
>         >
>         > When I'd used Samtools 0.1.18, the issue I emailed about was
>         that the
>         > number of lines in output varied by .bam file, and I didn't
>         understand
>         > why lines were being ommitted, and why not in a common
>         manner.
>         >
>         >
>         > Using version 1.0 and the same command (I checked on the
>         older
>         > samtools and it is producing output), I get no output at
>         all.  Mpileup
>         > writes a header and then stops.  I have copied and pasted
>         the output.
>         >
>         >
>         >
>         > My .bed is 3 columns: chr   start   stop
>         > with no header.
>         >
>         >
>         > Any advice appreciated, thanks!
>         >
>         >
>         > Jonathan
>         >
>         >
>         >
>         >
>         >
>         >
>         > [mpileup] 1 samples in 1 input files
>         > (mpileup) Max depth is above 1M. Potential memory hog!
>         > [bcf_sync] incorrect number of fields (0 != 5) at 0:0
>         > [afs] 0:0.000
>         > ##fileformat=VCFv4.1
>         > ##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read
>         depth">
>         > ##INFO=<ID=DP4,Number=4,Type=Integer,Description="#
>         high-quality
>         > ref-forward bases, ref-reverse, alt-forward and alt-reverse
>         bases">
>         >
>         ##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square
>         > mapping quality of covering reads">
>         > ##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred
>         probability of
>         > all samples being the same">
>         >
>         ##INFO=<ID=AF1,Number=1,Type=Float,Description="Max-likelihood
>         > estimate of the first ALT allele frequency (assuming HWE)">
>         >
>         ##INFO=<ID=AC1,Number=1,Type=Float,Description="Max-likelihood
>         > estimate of the first ALT allele count (no HWE assumption)">
>         > ##INFO=<ID=G3,Number=3,Type=Float,Description="ML estimate
>         of genotype
>         > frequencies">
>         > ##INFO=<ID=HWE,Number=1,Type=Float,Description="Chi^2 based
>         HWE test
>         > P-value based on G3">
>         > ##INFO=<ID=CLR,Number=1,Type=Integer,Description="Log ratio
>         of
>         > genotype likelihoods with and without the constraint">
>         > ##INFO=<ID=UGT,Number=1,Type=String,Description="The most
>         probable
>         > unconstrained genotype configuration in the trio">
>         > ##INFO=<ID=CGT,Number=1,Type=String,Description="The most
>         probable
>         > constrained genotype configuration in the trio">
>         > ##INFO=<ID=PV4,Number=4,Type=Float,Description="P-values for
>         strand
>         > bias, baseQ bias, mapQ bias and tail distance bias">
>         > ##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates
>         that the
>         > variant is an INDEL.">
>         > ##INFO=<ID=PC2,Number=2,Type=Integer,Description="Phred
>         probability of
>         > the nonRef allele frequency in group1 samples being larger
>         (,smaller)
>         > than in group2.">
>         > ##INFO=<ID=PCHI2,Number=1,Type=Float,Description="Posterior
>         weighted
>         > chi^2 P-value for testing the association between group1 and
>         group2
>         > samples.">
>         > ##INFO=<ID=QCHI2,Number=1,Type=Integer,Description="Phred
>         scaled
>         > PCHI2.">
>         > ##INFO=<ID=PR,Number=1,Type=Integer,Description="#
>         permutations
>         > yielding a smaller PCHI2.">
>         > ##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant
>         Distance
>         > Bias">
>         > ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
>         > ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype
>         Quality">
>         > ##FORMAT=<ID=GL,Number=3,Type=Float,Description="Likelihoods
>         for
>         > RR,RA,AA genotypes (R=ref,A=alt)">
>         > ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="#
>         high-quality
>         > bases">
>         >
>         ##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand
>         > bias P-value">
>         > ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of
>         > Phred-scaled genotype likelihoods">
>         > #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
>         FORMAT
>         >
>         >
>         >
>         > On Thu, Sep 4, 2014 at 5:20 AM, Petr Danecek
>         <pd...@sa...> wrote:
>         >         Hi Jonathan,
>         >
>         >         these are good questions. Could you please try with
>         the latest
>         >         release,
>         >         I am happy to answer any remaining issues not solved
>         by the
>         >         upgrade.
>         >
>         >         Cheers,
>         >         Petr
>         >
>         >         On Sun, 2014-08-17 at 01:34 -0400, Jo R wrote:
>         >         > Hello,
>         >         >     I'm attemping to get nucleotide frequency
>         information at
>         >         a number
>         >         > of positions across a number of samples, and am
>         having
>         >         difficulty
>         >         > interpreting some output.  Any insights would be
>         >         appreciated.
>         >         >
>         >         >
>         >         > I'm running the following command:
>         >         >
>         >         >
>         >         > samtools mpileup  -BQ0 -d10000000 -l
>         VariantBed.bed -uf
>         >         $refFile $bam
>         >         > | bcftools view -bcg - | bcftools view - >
>         >         > ${sampleName}_validation.vcf
>         >         >
>         >         >
>         >         >
>         >         > I notice that this command creates an output file
>         with
>         >         > an unpredictable number of rows.  Running the
>         command using
>         >         the same
>         >         > bed file on a set of different .bam files creates
>         a set of
>         >         output vcf
>         >         > files with a wide distribution in numbers of rows.
>         >         >
>         >         >
>         >         > I presumed that the difference in row numbers
>         means that
>         >         some
>         >         > positions drop out on some .bam files because
>         those samples
>         >         lacked
>         >         > coverage where other samples had coverage.
>         >         >
>         >         >
>         >         > If that's the case, though, I don't know what to
>         make of
>         >         lines like
>         >         > the following one:
>         >         >
>         >         >
>         >         > 1       2160881 .       G       .       28.2    .
>         >         > DP=0;VDB=0.0003;;AC1=2;FQ=-30   PL      0
>         >         >
>         >         >
>         >         >
>         >         > here, it looks like DP=0, but this position still
>         got
>         >         reported in the
>         >         > vcf output.  I also don't see AC1 in the legend
>         for INFO
>         >         tags in the
>         >         > samtools specification page, so I don't know what
>         to make of
>         >         a value
>         >         > of 2.
>         >         >
>         >         >
>         >         > So, I am confused.  Positions with a positive
>         value of DP
>         >         and DP4 make
>         >         > sense to me.  But why are some positions
>         completely ommitted
>         >         from the
>         >         > vcf output, and other positions reporting a DP=0?
>         >         >
>         >         >
>         >         > Thanks for any advice.
>         >         >
>         >         >
>         >         > Best regards,
>         >         > Jonathan
>         >         >
>         >         >
>         >         >
>         >         >
>         >
>         >         >
>         >
>          ------------------------------------------------------------------------------
>         >         > _______________________________________________
>         >         > Samtools-help mailing list
>         >         > Sam...@li...
>         >         >
>         https://lists.sourceforge.net/lists/listinfo/samtools-help
>         >
>         >
>         >
>         >
>         >         --
>         >          The Wellcome Trust Sanger Institute is operated by
>         Genome
>         >         Research
>         >          Limited, a charity registered in England with
>         number 1021457
>         >         and a
>         >          company registered in England with number 2742969,
>         whose
>         >         registered
>         >          office is 215 Euston Road, London, NW1 2BE.
>         >
>         >
>         
>         
>         
>         
>         --
>          The Wellcome Trust Sanger Institute is operated by Genome
>         Research
>          Limited, a charity registered in England with number 1021457
>         and a
>          company registered in England with number 2742969, whose
>         registered
>          office is 215 Euston Road, London, NW1 2BE.
>         
> 
> 

-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE.