|
From: Andre M. <av...@gm...> - 2012-09-23 22:31:02
|
Hi everyone,
I've just started trying to use vcftools on a genome I've been working,
but encounter a problem whose origin is not immediately clear to me.
I've been trying to get some basic statistics such as average coverage
(i.e. depth) and heterozigosity using the vcftools binary, but even
though the program seems to run fine I only get the word "nan" where I
expected to have results. Here's an example regarding average coverage.
This was obtained with the command "vcftools --vcf Kraken_RT.vcf --out
Kraken_RT --depth", and the following is the content of the .idepth file:
INDV N_SITES MEAN_DEPTH
Kraken_RT_bamhi.sorted.bam 0 nan
Kraken_RT_eag1.sorted.bam 0 nan
Kraken_RT_ecori.sorted.bam 0 nan
The resulting .log file also appears to give me no useful information:
VCFtools - v0.1.9.0
(C) Adam Auton 2009
Parameters as interpreted:
--vcf Kraken_RT.vcf
--depth
--out Kraken_RT
Reading Index file.
File contains 104138656 entries and 3 individuals.
Applying Required Filters.
After filtering, kept 3 out of 3 Individuals
After filtering, kept 104138656 out of a possible 104138656 Sites
Outputting Mean Depth by Individual
Run Time = 773.00 seconds
The first few lines of my vcf file are at the end of this message. It's
a vcf v4.1 file, which is what I thought was the cause of it initially,
but then thought that there are no differences in the way the different
versions of vcf store information regarding depth (I might have gotten
this part wrong though). Any help would be very much appreciated, as
there doesn't seem to be many alternatives to vcf tools to get all the
information I need from my genome.
Thanks!
##fileformat=VCFv4.1
##samtoolsVersion=0.1.18 (r982:295)
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
##INFO=<ID=DP4,Number=4,Type=Integer,Description="# high-quality
ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square
mapping quality of covering reads">
##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred probability of all
samples being the same">
##INFO=<ID=AF1,Number=1,Type=Float,Description="Max-likelihood estimate
of the first ALT allele frequency (assuming HWE)">
##INFO=<ID=AC1,Number=1,Type=Float,Description="Max-likelihood estimate
of the first ALT allele count (no HWE assumption)">
##INFO=<ID=G3,Number=3,Type=Float,Description="ML estimate of genotype
frequencies">
##INFO=<ID=HWE,Number=1,Type=Float,Description="Chi^2 based HWE test
P-value based on G3">
##INFO=<ID=CLR,Number=1,Type=Integer,Description="Log ratio of genotype
likelihoods with and without the constraint">
##INFO=<ID=UGT,Number=1,Type=String,Description="The most probable
unconstrained genotype configuration in the trio">
##INFO=<ID=CGT,Number=1,Type=String,Description="The most probable
constrained genotype configuration in the trio">
##INFO=<ID=PV4,Number=4,Type=Float,Description="P-values for strand
bias, baseQ bias, mapQ bias and tail distance bias">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the
variant is an INDEL.">
##INFO=<ID=PC2,Number=2,Type=Integer,Description="Phred probability of
the nonRef allele frequency in group1 samples being larger (,smaller)
than in group2.">
##INFO=<ID=PCHI2,Number=1,Type=Float,Description="Posterior weighted
chi^2 P-value for testing the association between group1 and group2
samples.">
##INFO=<ID=QCHI2,Number=1,Type=Integer,Description="Phred scaled PCHI2.">
##INFO=<ID=PR,Number=1,Type=Integer,Description="# permutations yielding
a smaller PCHI2.">
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GL,Number=3,Type=Float,Description="Likelihoods for
RR,RA,AA genotypes (R=ref,A=alt)">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="# high-quality bases">
##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand
bias P-value">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled
genotype likelihoods">
#CHROM POS ID REF ALT QUAL
FILTER INFO FORMAT
Kraken_RT_bamhi.sorted.bam Kraken_RT_eag1.sorted.bam
Kraken_RT_ecori.sorted.bam
1 3568 . T . 28.9 .
DP=2;AF1=0;AC1=0;DP4=0,2,0,0;MQ=0;FQ=-27.7 PL
0 0 0
1 3569 . A . 27.8 .
DP=2;AF1=0;AC1=0;DP4=0,1,0,0;MQ=0;FQ=-27 PL
0 0 0
1 3570 . T . 27.8 .
DP=2;AF1=0;AC1=0;DP4=0,1,0,0;MQ=0;FQ=-27 PL
0 0 0
1 3571 . A . 28.9 .
DP=2;AF1=0;AC1=0;DP4=0,2,0,0;MQ=0;FQ=-27.7 PL
0 0 0
1 3572 . T . 28.9 .
DP=2;AF1=0;AC1=0;DP4=0,2,0,0;MQ=0;FQ=-27.7 PL
0 0 0
1 3573 . A . 28.9 .
DP=2;AF1=0;AC1=0;DP4=0,2,0,0;MQ=0;FQ=-27.7 PL
0 0 0
1 3574 . T . 28.9 .
DP=2;AF1=0;AC1=0;DP4=0,2,0,0;MQ=0;FQ=-27.7 PL
0 0 0
1 3575 . G . 28.9 .
DP=2;AF1=0;AC1=0;DP4=0,2,0,0;MQ=0;FQ=-27.7 PL
0 0 0
1 3576 . T . 28.9 .
DP=2;AF1=0;AC1=0;DP4=0,2,0,0;MQ=0;FQ=-27.7 PL
0 0 0
1 3577 . G . 28.9 .
DP=2;AF1=0;AC1=0;DP4=0,2,0,0;MQ=0;FQ=-27.7 PL
0 0 0
1 3578 . T . 28.9 .
DP=2;AF1=0;AC1=0;DP4=0,2,0,0;MQ=0;FQ=-27.7 PL
0 0 0
1 3579 . T . 28.9 .
DP=2;AF1=0;AC1=0;DP4=0,2,0,0;MQ=0;FQ=-27.7 PL
0 0 0
1 3580 . A . 28.9 .
DP=2;AF1=0;AC1=0;DP4=0,2,0,0;MQ=0;FQ=-27.7 PL
0 0 0
1 3581 . G . 28.9 .
DP=2;AF1=0;AC1=0;DP4=0,2,0,0;MQ=0;FQ=-27.7 PL
0 0 0
1 3582 . C . 28.9 .
DP=2;AF1=0;AC1=0;DP4=0,2,0,0;MQ=0;FQ=-27.7 PL
0 0 0
1 3583 . A . 28.9 .
DP=2;AF1=0;AC1=0;DP4=0,2,0,0;MQ=0;FQ=-27.7 PL
0 0 0
1 3584 . T . 28.9 .
DP=2;AF1=0;AC1=0;DP4=0,2,0,0;MQ=0;FQ=-27.7 PL
0 0 0
1 3585 . A . 28.9 .
DP=2;AF1=0;AC1=0;DP4=0,2,0,0;MQ=0;FQ=-27.7 PL
0 0 0
1 3586 . C . 28.9 .
DP=2;AF1=0;AC1=0;DP4=0,2,0,0;MQ=0;FQ=-27.7 PL
0 0 0
1 3587 . G . 28.9 .
DP=2;AF1=0;AC1=0;DP4=0,2,0,0;MQ=0;FQ=-27.7 PL
0 0 0
1 3588 . G . 28.9 .
DP=2;AF1=0;AC1=0;DP4=0,2,0,0;MQ=0;FQ=-27.7 PL
0 0 0
1 3589 . T . 28.9 .
DP=2;AF1=0;AC1=0;DP4=0,2,0,0;MQ=0;FQ=-27.7 PL
0 0 0
1 3590 . A . 28.9 .
DP=2;AF1=0;AC1=0;DP4=0,2,0,0;MQ=0;FQ=-27.7 PL
0 0 0
1 3591 . T . 28.9 .
DP=2;AF1=0;AC1=0;DP4=0,2,0,0;MQ=0;FQ=-27.7 PL
0 0 0
|