|
From: Petr D. <pd...@sa...> - 2014-09-23 06:44:39
|
This does not look like output from samtools 1.0 - it outputs VCFv4.2
petr
On Mon, 2014-09-22 at 14:53 -0400, jbr950 wrote:
> Hi Petr,
> Thanks for your reply. I grabbed the
> samtools-bcftools-htslib-1.0_x64-linux binary and tried again.
>
>
> When I'd used Samtools 0.1.18, the issue I emailed about was that the
> number of lines in output varied by .bam file, and I didn't understand
> why lines were being ommitted, and why not in a common manner.
>
>
> Using version 1.0 and the same command (I checked on the older
> samtools and it is producing output), I get no output at all. Mpileup
> writes a header and then stops. I have copied and pasted the output.
>
>
>
> My .bed is 3 columns: chr start stop
> with no header.
>
>
> Any advice appreciated, thanks!
>
>
> Jonathan
>
>
>
>
>
>
> [mpileup] 1 samples in 1 input files
> (mpileup) Max depth is above 1M. Potential memory hog!
> [bcf_sync] incorrect number of fields (0 != 5) at 0:0
> [afs] 0:0.000
> ##fileformat=VCFv4.1
> ##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
> ##INFO=<ID=DP4,Number=4,Type=Integer,Description="# high-quality
> ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
> ##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square
> mapping quality of covering reads">
> ##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred probability of
> all samples being the same">
> ##INFO=<ID=AF1,Number=1,Type=Float,Description="Max-likelihood
> estimate of the first ALT allele frequency (assuming HWE)">
> ##INFO=<ID=AC1,Number=1,Type=Float,Description="Max-likelihood
> estimate of the first ALT allele count (no HWE assumption)">
> ##INFO=<ID=G3,Number=3,Type=Float,Description="ML estimate of genotype
> frequencies">
> ##INFO=<ID=HWE,Number=1,Type=Float,Description="Chi^2 based HWE test
> P-value based on G3">
> ##INFO=<ID=CLR,Number=1,Type=Integer,Description="Log ratio of
> genotype likelihoods with and without the constraint">
> ##INFO=<ID=UGT,Number=1,Type=String,Description="The most probable
> unconstrained genotype configuration in the trio">
> ##INFO=<ID=CGT,Number=1,Type=String,Description="The most probable
> constrained genotype configuration in the trio">
> ##INFO=<ID=PV4,Number=4,Type=Float,Description="P-values for strand
> bias, baseQ bias, mapQ bias and tail distance bias">
> ##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the
> variant is an INDEL.">
> ##INFO=<ID=PC2,Number=2,Type=Integer,Description="Phred probability of
> the nonRef allele frequency in group1 samples being larger (,smaller)
> than in group2.">
> ##INFO=<ID=PCHI2,Number=1,Type=Float,Description="Posterior weighted
> chi^2 P-value for testing the association between group1 and group2
> samples.">
> ##INFO=<ID=QCHI2,Number=1,Type=Integer,Description="Phred scaled
> PCHI2.">
> ##INFO=<ID=PR,Number=1,Type=Integer,Description="# permutations
> yielding a smaller PCHI2.">
> ##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance
> Bias">
> ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
> ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
> ##FORMAT=<ID=GL,Number=3,Type=Float,Description="Likelihoods for
> RR,RA,AA genotypes (R=ref,A=alt)">
> ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="# high-quality
> bases">
> ##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand
> bias P-value">
> ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of
> Phred-scaled genotype likelihoods">
> #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
>
>
>
> On Thu, Sep 4, 2014 at 5:20 AM, Petr Danecek <pd...@sa...> wrote:
> Hi Jonathan,
>
> these are good questions. Could you please try with the latest
> release,
> I am happy to answer any remaining issues not solved by the
> upgrade.
>
> Cheers,
> Petr
>
> On Sun, 2014-08-17 at 01:34 -0400, Jo R wrote:
> > Hello,
> > I'm attemping to get nucleotide frequency information at
> a number
> > of positions across a number of samples, and am having
> difficulty
> > interpreting some output. Any insights would be
> appreciated.
> >
> >
> > I'm running the following command:
> >
> >
> > samtools mpileup -BQ0 -d10000000 -l VariantBed.bed -uf
> $refFile $bam
> > | bcftools view -bcg - | bcftools view - >
> > ${sampleName}_validation.vcf
> >
> >
> >
> > I notice that this command creates an output file with
> > an unpredictable number of rows. Running the command using
> the same
> > bed file on a set of different .bam files creates a set of
> output vcf
> > files with a wide distribution in numbers of rows.
> >
> >
> > I presumed that the difference in row numbers means that
> some
> > positions drop out on some .bam files because those samples
> lacked
> > coverage where other samples had coverage.
> >
> >
> > If that's the case, though, I don't know what to make of
> lines like
> > the following one:
> >
> >
> > 1 2160881 . G . 28.2 .
> > DP=0;VDB=0.0003;;AC1=2;FQ=-30 PL 0
> >
> >
> >
> > here, it looks like DP=0, but this position still got
> reported in the
> > vcf output. I also don't see AC1 in the legend for INFO
> tags in the
> > samtools specification page, so I don't know what to make of
> a value
> > of 2.
> >
> >
> > So, I am confused. Positions with a positive value of DP
> and DP4 make
> > sense to me. But why are some positions completely ommitted
> from the
> > vcf output, and other positions reporting a DP=0?
> >
> >
> > Thanks for any advice.
> >
> >
> > Best regards,
> > Jonathan
> >
> >
> >
> >
>
> >
> ------------------------------------------------------------------------------
> > _______________________________________________
> > Samtools-help mailing list
> > Sam...@li...
> > https://lists.sourceforge.net/lists/listinfo/samtools-help
>
>
>
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome
> Research
> Limited, a charity registered in England with number 1021457
> and a
> company registered in England with number 2742969, whose
> registered
> office is 215 Euston Road, London, NW1 2BE.
>
>
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
|