From: David J. <dr...@sa...> - 2012-07-16 09:39:14
Thank you for the clarification and the help Adam!

Dave

On 13 Jul 2012, at 12:52, Adam Auton wrote:

> This is expected behavior. You need to add --recode-INFO-all to retain the INFO field.
>
> Adam
> Sent from my iPhone
>
> On Jul 13, 2012, at 6:13 AM, David Jones <dr...@sa...> wrote:
>
>> Morning all,
>>
>> I just want to check before I submit an official bug report that this is actually a bug not an expected behaviour and I've missed something:
>>
>> I have a vcf file with 2 samples, where the INFO, ID, and FILTER fields are populated with two samples as shown below.
>>
>> <snip>
>>
>> ##FILTER=<ID=UM,Description="description">
>> #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOUR
>> 1 16257 ID_1 G C . UM;MQ;HSD DP=67;GP=7.0e-03;MP=9.9e-01;SG=CG/GG;SP=7.0e-03;TG=GG/CG;TP=9.9e-01 GT:AA:CA:GA:TA:PM 0|0:0:1:28:0:3.4e-02 0|1:0:7:31:0:1.8e-01
>> 1 20136 ID_2 T C . UM;MQ;HSD DP=92;GP=1.6e-01;MP=8.4e-01;SG=CT/TT;SP=1.6e-01;TG=TT/CT;TP=8.4e-01 GT:AA:CA:GA:TA:PM 0|0:0:1:0:27:3.6e-02 0|1:0:8:0:56:1.2e-01
>> 1 57999 ID_3 G T . UM;MN DP=25;GP=4.0e-03;MP=9.9e-01;SG=GG/TT;SP=7.3e-02;TG=GG/GT;TP=9.1e-01 GT:AA:CA:GA:TA:PM 0|0:0:0:15:0:0.0e+00 0|1:0:0:7:3:3.0e-01
>> 1 61219 ID_4 T C . UM;MQ DP=18;GP=1.7e-01;MP=8.3e-01;SG=TT/CC;SP=3.2e-01;TG=TT/CT;TP=5.1e-01 GT:AA:CA:GA:TA:PM 0|0:0:0:0:9:0.0e+00 0|1:0:4:0:5:4.4e-01
>> 1 62578 ID_5 G A . UM;MN;MQ DP=56;GP=2.4e-03;MP=1.0e+00;SG=GG/AAG;SP=3.1e-01;TG=GG/AGG;TP=6.8e-01 GT:AA:CA:GA:TA:PM 0|0:2:0:35:0:5.4e-02 0|1:6:0:13:0:3.2e-01
>> 1 73841 ID_6 C T . UM DP=28;GP=2.6e-04;MP=8.1e-01;SG=CC/CTT;SP=3.6e-01;TG=CC/CCT;TP=4.3e-01 GT:AA:CA:GA:TA:PM 0|0:0:19:0:0:0.0e+00 0|1:0:6:0:3:3.3e-01
>> 1 84020 ID_7 A G . SR;RP DP=27;GP=3.2e-04;MP=1.0e+00;SG=AA/GGG;SP=2.2e-01;TG=AA/AGG;TP=6.1e-01 GT:AA:CA:GA:TA:PM 0|0:19:0:0:0:0.0e+00 0|1:4:0:4:0:5.0e-01
>> 1 84022 ID_8 G A . SR;RP DP=29;GP=8.5e-05;MP=1.0e+00;SG=GG/AAA;SP=2.2e-01;TG=GG/AAG;TP=6.1e-01 GT:AA:CA:GA:TA:PM 0|0:0:0:21:0:0.0e+00 0|1:4:0:4:0:5.0e-01
>> 1 84024 ID_9 A G . SR;UM;RP DP=30;GP=4.3e-05;MP=1.0e+00;SG=AA/GGG;SP=2.2e-01;TG=AA/AGG;TP=6.1e-01 GT:AA:CA:GA:TA:PM 0|0:22:0:0:0:0.0e+00 0|1:4:0:4:0:5.0e-01
>> 1 84026 ID_10 G A . SR;UM;MN;RP DP=32;GP=1.8e-02;MP=9.8e-01;SG=GG/AAA;SP=3.2e-01;TG=GG/AAG;TP=5.8e-01 GT:AA:CA:GA:TA:PM 0|0:1:0:22:0:4.3e-02 0|1:5:0:4:0:5.6e-01
>>
>> </snip>
>>
>> when I run the command:
>>
>> vcftools --vcf input.vcf --out testOut --recode --remove-filtered-all
>>
>> I seem to get, as expected, a vcf file named testOut.recode.vcf with
>> only those positions that have PASS in the filter field. HOWEVER the
>> data seems to have lost the INFO field and it has been replaced with a
>> '.' (see below).
>>
>> <snip>
>>
>> ##FILTER=<ID=UM,Description="description">
>> #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOUR
>> 1 129151 ID_14 T G . PASS . GT:AA:CA:GA:TA:PM 0|0:0:0:0:13:0.0e+00 0|1:0:0:3:9:2.5e-01
>> 1 1666175 ID_119 C T . PASS . GT:AA:CA:GA:TA:PM 0|0:0:15:0:0:0.0e+00 0|1:0:20:0:4:1.7e-01
>> 1 1857336 ID_124 G A . PASS . GT:AA:CA:GA:TA:PM 0|0:0:0:20:0:0.0e+00 0|1:6:0:15:0:2.9e-01
>> 1 2329409 ID_130 A G . PASS . GT:AA:CA:GA:TA:PM 0|0:41:0:0:0:0.0e+00 0|1:40:0:16:0:2.9e-01
>> 1 2391122 ID_131 G T . PASS . GT:AA:CA:GA:TA:PM 0|0:0:0:10:0:0.0e+00 0|1:0:0:12:3:2.0e-01
>> 1 2620133 ID_204 C G . PASS . GT:AA:CA:GA:TA:PM 0|0:0:26:0:0:0.0e+00 0|1:0:20:6:0:2.3e-01
>> 1 2628561 ID_244 C T . PASS . GT:AA:CA:GA:TA:PM 0|0:0:116:0:7:5.7e-02 0|1:0:146:0:15:9.3e-02
>> 1 2628576 ID_245 G T . PASS . GT:AA:CA:GA:TA:PM 0|0:3:1:77:4:4.7e-02 0|1:4:1:104:9:7.6e-02
>> 1 3093904 ID_264 T A . PASS . GT:AA:CA:GA:TA:PM 0|0:0:0:0:32:0.0e+00 0|1:16:0:0:29:3.6e-01
>> 1 3802432 ID_271 T G . PASS . GT:AA:CA:GA:TA:PM 0|0:0:0:0:22:0.0e+00 0|1:0:0:8:26:2.4e-01
>>
>> </snip>
>>
>> Is this an expected behaviour or a bug?
>>
>> Dave
>>
>> --
>> The Wellcome Trust Sanger Institute is operated by Genome Research
>> Limited, a charity registered in England with number 1021457 and a
>> company registered in England with number 2742969, whose registered
>> office is 215 Euston Road, London, NW1 2BE.
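
For anyone finding this thread later, Adam's suggestion applied to the command from the original question might look like the sketch below. The flags are the ones named in the thread; the function names `recode_keep_info` and `count_empty_info` are hypothetical helpers introduced only for illustration, and the check assumes the recoded file is tab-delimited with INFO in the 8th column.

```shell
#!/bin/sh
# Recode keeping only unfiltered (PASS) sites, and carry every INFO key
# through to the output by adding --recode-INFO-all.
# $1 = input VCF, $2 = output prefix (vcftools writes <prefix>.recode.vcf)
recode_keep_info() {
    vcftools --vcf "$1" --out "$2" --recode --recode-INFO-all --remove-filtered-all
}

# Hypothetical helper: count data records whose INFO column (field 8)
# is just "." so a dropped INFO field is easy to spot.
count_empty_info() {
    awk -F'\t' '!/^#/ && $8 == "." { n++ } END { print n + 0 }' "$1"
}
```

Running `recode_keep_info input.vcf testOut` followed by `count_empty_info testOut.recode.vcf` gives a quick sanity check: a count equal to the number of records would suggest INFO was dropped again, while a low or zero count suggests it was retained.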