From: David J. <dr...@sa...> - 2012-07-13 10:13:27
Morning all,

Before I submit an official bug report, I just want to check that this is actually a bug and not expected behaviour that I've missed.

I have a VCF file with two samples (NORMAL and TUMOUR) in which the ID, FILTER and INFO fields are populated, as shown below.

<snip>
##FILTER=<ID=UM,Description="description">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOUR
1 16257 ID_1 G C . UM;MQ;HSD DP=67;GP=7.0e-03;MP=9.9e-01;SG=CG/GG;SP=7.0e-03;TG=GG/CG;TP=9.9e-01 GT:AA:CA:GA:TA:PM 0|0:0:1:28:0:3.4e-02 0|1:0:7:31:0:1.8e-01
1 20136 ID_2 T C . UM;MQ;HSD DP=92;GP=1.6e-01;MP=8.4e-01;SG=CT/TT;SP=1.6e-01;TG=TT/CT;TP=8.4e-01 GT:AA:CA:GA:TA:PM 0|0:0:1:0:27:3.6e-02 0|1:0:8:0:56:1.2e-01
1 57999 ID_3 G T . UM;MN DP=25;GP=4.0e-03;MP=9.9e-01;SG=GG/TT;SP=7.3e-02;TG=GG/GT;TP=9.1e-01 GT:AA:CA:GA:TA:PM 0|0:0:0:15:0:0.0e+00 0|1:0:0:7:3:3.0e-01
1 61219 ID_4 T C . UM;MQ DP=18;GP=1.7e-01;MP=8.3e-01;SG=TT/CC;SP=3.2e-01;TG=TT/CT;TP=5.1e-01 GT:AA:CA:GA:TA:PM 0|0:0:0:0:9:0.0e+00 0|1:0:4:0:5:4.4e-01
1 62578 ID_5 G A . UM;MN;MQ DP=56;GP=2.4e-03;MP=1.0e+00;SG=GG/AAG;SP=3.1e-01;TG=GG/AGG;TP=6.8e-01 GT:AA:CA:GA:TA:PM 0|0:2:0:35:0:5.4e-02 0|1:6:0:13:0:3.2e-01
1 73841 ID_6 C T . UM DP=28;GP=2.6e-04;MP=8.1e-01;SG=CC/CTT;SP=3.6e-01;TG=CC/CCT;TP=4.3e-01 GT:AA:CA:GA:TA:PM 0|0:0:19:0:0:0.0e+00 0|1:0:6:0:3:3.3e-01
1 84020 ID_7 A G . SR;RP DP=27;GP=3.2e-04;MP=1.0e+00;SG=AA/GGG;SP=2.2e-01;TG=AA/AGG;TP=6.1e-01 GT:AA:CA:GA:TA:PM 0|0:19:0:0:0:0.0e+00 0|1:4:0:4:0:5.0e-01
1 84022 ID_8 G A . SR;RP DP=29;GP=8.5e-05;MP=1.0e+00;SG=GG/AAA;SP=2.2e-01;TG=GG/AAG;TP=6.1e-01 GT:AA:CA:GA:TA:PM 0|0:0:0:21:0:0.0e+00 0|1:4:0:4:0:5.0e-01
1 84024 ID_9 A G . SR;UM;RP DP=30;GP=4.3e-05;MP=1.0e+00;SG=AA/GGG;SP=2.2e-01;TG=AA/AGG;TP=6.1e-01 GT:AA:CA:GA:TA:PM 0|0:22:0:0:0:0.0e+00 0|1:4:0:4:0:5.0e-01
1 84026 ID_10 G A . SR;UM;MN;RP DP=32;GP=1.8e-02;MP=9.8e-01;SG=GG/AAA;SP=3.2e-01;TG=GG/AAG;TP=5.8e-01 GT:AA:CA:GA:TA:PM 0|0:1:0:22:0:4.3e-02 0|1:5:0:4:0:5.6e-01
</snip>

When I run the command:

vcftools --vcf input.vcf --out testOut --recode --remove-filtered-all

I get, as expected, a VCF file named testOut.recode.vcf containing only those positions that have PASS in the FILTER field. HOWEVER, the records have lost their INFO field, which has been replaced with a '.' (see below).

<snip>
##FILTER=<ID=UM,Description="description">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOUR
1 129151 ID_14 T G . PASS . GT:AA:CA:GA:TA:PM 0|0:0:0:0:13:0.0e+00 0|1:0:0:3:9:2.5e-01
1 1666175 ID_119 C T . PASS . GT:AA:CA:GA:TA:PM 0|0:0:15:0:0:0.0e+00 0|1:0:20:0:4:1.7e-01
1 1857336 ID_124 G A . PASS . GT:AA:CA:GA:TA:PM 0|0:0:0:20:0:0.0e+00 0|1:6:0:15:0:2.9e-01
1 2329409 ID_130 A G . PASS . GT:AA:CA:GA:TA:PM 0|0:41:0:0:0:0.0e+00 0|1:40:0:16:0:2.9e-01
1 2391122 ID_131 G T . PASS . GT:AA:CA:GA:TA:PM 0|0:0:0:10:0:0.0e+00 0|1:0:0:12:3:2.0e-01
1 2620133 ID_204 C G . PASS . GT:AA:CA:GA:TA:PM 0|0:0:26:0:0:0.0e+00 0|1:0:20:6:0:2.3e-01
1 2628561 ID_244 C T . PASS . GT:AA:CA:GA:TA:PM 0|0:0:116:0:7:5.7e-02 0|1:0:146:0:15:9.3e-02
1 2628576 ID_245 G T . PASS . GT:AA:CA:GA:TA:PM 0|0:3:1:77:4:4.7e-02 0|1:4:1:104:9:7.6e-02
1 3093904 ID_264 T A . PASS . GT:AA:CA:GA:TA:PM 0|0:0:0:0:32:0.0e+00 0|1:16:0:0:29:3.6e-01
1 3802432 ID_271 T G . PASS . GT:AA:CA:GA:TA:PM 0|0:0:0:0:22:0.0e+00 0|1:0:0:8:26:2.4e-01
</snip>

Is this expected behaviour or a bug?

Dave

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
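
One way to confirm the symptom described in the post across a whole file, rather than by eye, is to count how many records carry '.' in the INFO column before and after recoding. The sketch below is only an illustration, not part of vcftools: the helper names summarize_info and as_vcf_line are hypothetical, and it assumes the real files are tab-separated (the records in the post appear space-separated only because of how the message was displayed), so the sample records are re-joined with tabs first.

```python
def summarize_info(lines):
    """Return (records, records_with_empty_info) for an iterable of VCF lines.

    Hypothetical helper: assumes tab-separated records with INFO as the
    8th column, and skips '#' lines and anything with fewer than 8 columns.
    """
    total = 0
    empty = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # blank line, '##' meta-information line or '#CHROM' header
        fields = line.split("\t")
        if len(fields) < 8:
            continue  # not a complete record
        total += 1
        if fields[7] == ".":  # INFO column
            empty += 1
    return total, empty


def as_vcf_line(space_separated):
    """Re-join a record copied from the post with tabs (assumption: no field contains spaces)."""
    return "\t".join(space_separated.split())


# One record from the input file and two from testOut.recode.vcf, as quoted in the post.
before = [
    '##FILTER=<ID=UM,Description="description">',
    as_vcf_line("#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOUR"),
    as_vcf_line("1 16257 ID_1 G C . UM;MQ;HSD DP=67;GP=7.0e-03;MP=9.9e-01;SG=CG/GG;"
                "SP=7.0e-03;TG=GG/CG;TP=9.9e-01 GT:AA:CA:GA:TA:PM "
                "0|0:0:1:28:0:3.4e-02 0|1:0:7:31:0:1.8e-01"),
]
after = [
    as_vcf_line("1 129151 ID_14 T G . PASS . GT:AA:CA:GA:TA:PM "
                "0|0:0:0:0:13:0.0e+00 0|1:0:0:3:9:2.5e-01"),
    as_vcf_line("1 1666175 ID_119 C T . PASS . GT:AA:CA:GA:TA:PM "
                "0|0:0:15:0:0:0.0e+00 0|1:0:20:0:4:1.7e-01"),
]

print(summarize_info(before))  # (1, 0)
print(summarize_info(after))   # (2, 2)
```

On real files the same function can be given an open file handle, for example `with open("testOut.recode.vcf") as fh: print(summarize_info(fh))`. If every record in the recoded file has an empty INFO column while the matching input records do not, the INFO data was dropped during recoding rather than being absent from the input.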