From: Petr D. <pd...@sa...> - 2014-11-25 16:44:07
Hi Marea,

that does not look right, I can't think of a reason why it should behave
this way. Can you please switch to 'bcftools merge' instead? The old
vcf-merge is slow and had issues which were addressed in bcftools.
The missing genotypes can be filled with 'bcftools +missing2ref'.

Petr

On Fri, 2014-11-21 at 19:28 +0000, Marea Cobb wrote:
> Hi,
>
> I am using vcf-merge to combine four different vcf files using the command
>
> vcf-merge ES041_gatk.snpEff.vcf.gz ES042_gatk.snpEff.vcf.gz ES043_gatk.snpEff.vcf.gz ES044_gatk.snpEff.vcf.gz
>
> It seems to be working well until it comes to multi-allelic data. The
> libraries with the second allele are not returning all PL values. As the
> example below shows, only the final PL value is being copied into the
> merge file. Can you please explain to me why this is occurring? In other
> examples, this is affecting the indicated GT of the libraries in the
> merge file.
>
> Original ES041 File:
> #CHROM POS REF ALT QUAL FORMAT ES041
> Ld33_v01s1 8045 GGT G 2231.73 GT:AD:DP:GQ:PL 1/1:0,63:75:99:2269,188,0
>
> Original ES043 File:
> #CHROM POS REF ALT QUAL FORMAT ES043
> Ld33_v01s1 8045 GGT G 3339.73 GT:AD:DP:GQ:PL 1/1:0,99:116:99:3377,293,0
>
> Original ES044 File:
> #CHROM POS REF ALT QUAL FORMAT ES044
> Ld33_v01s1 8045 GGT G 752.7 GT:AD:DP:GQ:PL 1/1:0,23:29:64:790,64,0
>
> Merge File:
> #CHROM POS REF ALT QUAL FORMAT ES041 ES042 ES043 ES044
> Ld33_v01s1 8045 GGTG G,GG 1665.23 GT:GQ:DP:PL:AD 2/2:99:75:.,.,.,.,.,0:0,63 1/1:99:88:3059,210,0,.,.,.:0,65 2/2:99:116:.,.,.,.,.,0:0,99 2/2:64:29:.,.,.,.,.,0:0,23
>
> A second question I have is why alleles that are not present in one
> library are represented as '.' in the merge files. I have found that when
> using these merge files in vcf-contrast to find novel alleles, it does not
> pick up these missing alleles, resulting in lower numbers of novel alleles
> than anticipated. Would representing them as WT be worthwhile? An example
> is below from the same merge command.
>
> #CHROM POS REF ALT QUAL FORMAT ES041 ES042 ES043 ES044
> Ld33_v01s1 65 G A 541.77 GT:GQ:DP:PL:AD . 0/1:99:311:474,0,293:299,44 0/1:99:316:352,0,522:298,45 0/1:24:159:884,0,24:141,63
>
> Any help would be appreciated,
>
> Marea
>
> _______________________________________________
> VCFtools-spec mailing list
> VCF...@li...
> https://lists.sourceforge.net/lists/listinfo/vcftools-spec
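
For reference, the switch suggested in the reply might look roughly like the sketch below. The input file names are taken from the original post. The output names merged.vcf.gz and merged.filled.vcf.gz, the `run` helper and the `RUN` variable are made up for illustration. The sketch assumes the inputs are bgzip-compressed and that the installed bcftools ships the missing2ref plugin, so check the options against your own version before relying on it. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch only. Input names come from the original post; the output names
# and the run/RUN dry-run helper are hypothetical.
INPUTS="ES041_gatk.snpEff.vcf.gz ES042_gatk.snpEff.vcf.gz ES043_gatk.snpEff.vcf.gz ES044_gatk.snpEff.vcf.gz"

# Print each command by default; set RUN=1 in the environment to execute it.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# bcftools merge expects indexed inputs, so build a tabix index for each file.
for f in $INPUTS; do
    run bcftools index -t "$f"
done

# Merge the four single-sample files into one compressed multi-sample VCF.
run bcftools merge -O z -o merged.vcf.gz $INPUTS

# Fill missing genotypes with the reference allele, as suggested in the reply.
run bcftools +missing2ref -O z -o merged.filled.vcf.gz merged.vcf.gz
```

Running the script as is gives a dry run that lists the commands; running it with RUN=1 executes them. Filling missing genotypes with the reference allele assumes that a missing call really means "no variant" rather than "no coverage", which is worth checking before comparing samples.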