From: Thomas J. <jue...@gm...> - 2022-08-04 07:44:21
Hi Rob,

Thanks for looking into it. Unfortunately, isec keeps only the first record. Digging through GitHub, this appears to be a known limitation:
https://github.com/samtools/bcftools/issues/665#issuecomment-323372893

Best,
Thomas

On Wed, 3 Aug 2022 at 17:50, Robert Davies <rm...@sa...> wrote:
>
> On Tue, 2 Aug 2022, Thomas Juettemann wrote:
>
> > I came across a "transcript-based" VCF file, meaning a variant can be
> > present multiple times but belonging to a different transcript. See
> > "File 1" below as an example. I am finding myself in the unfortunate
> > situation of having to intersect ("File 2") and retain all records
> > with the same position and REF/ALT ("Desired output").
> > Long shot: Is that possible?
>
> Does "bcftools isec" (https://www.htslib.org/doc/bcftools.html#isec) do
> what you want? The "Extract and write records from A shared by both A and
> B using exact allele match" example in the manual page sounds like it
> might:
>
>   bcftools isec -p dir -n=2 -w1 A.vcf.gz B.vcf.gz
>
> If not, you can't find anything else, and you only want to do a few of
> them, it might be possible to break out pysam and write something. If you
> want to do lots, then a C program would probably be the way forward - it
> doesn't look like it would be too difficult.
>
> Rob Davies rm...@sa...
> The Sanger Institute http://www.sanger.ac.uk/
> Hinxton, Cambs., Tel. +44 (1223) 834244
> CB10 1SA, U.K. Fax. +44 (1223) 494919
>
>
> --
> The Wellcome Sanger Institute is operated by Genome Research
> Limited, a charity registered in England with number 1021457 and a
> company registered in England with number 2742969, whose registered
> office is 215 Euston Road, London, NW1 2BE.
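
For a handful of files, the "write something" fallback mentioned in the quoted reply could look roughly like the sketch below. It is not pysam and not part of bcftools: it uses only the Python standard library and treats VCF records as tab-separated text. The function names (`open_vcf`, `site_key`, `site_keys`, `intersect_keep_all`) are made up for illustration. It assumes that "same position and REF/ALT" means an exact string match on CHROM, POS, REF and the whole ALT column, so multi-allelic records are not split or normalised.

```python
import gzip


def open_vcf(path):
    """Open a plain-text or gzip/bgzip-compressed VCF for reading as text."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def site_key(line):
    """Return (CHROM, POS, REF, ALT) for one VCF data line (ID is ignored)."""
    chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
    return chrom, pos, ref, alt


def site_keys(lines):
    """Collect the site key of every data line, skipping headers and blanks."""
    return {
        site_key(line)
        for line in lines
        if line.strip() and not line.startswith("#")
    }


def intersect_keep_all(a_lines, b_lines):
    """Yield A's header plus EVERY record of A whose site also occurs in B.

    Unlike the isec behaviour described above, duplicate sites in A
    (for example one line per transcript) are all retained.
    """
    wanted = site_keys(b_lines)
    for line in a_lines:
        if line.startswith("#"):
            yield line  # pass A's header through unchanged
        elif line.strip() and site_key(line) in wanted:
            yield line
```

A possible way to use it on files, with `A.vcf.gz` and `B.vcf.gz` standing in for the two inputs:

```python
with open_vcf("A.vcf.gz") as a, open_vcf("B.vcf.gz") as b, open("out.vcf", "w") as out:
    out.writelines(intersect_keep_all(a, b))
```

All of B's keys are held in memory, which is fine for a few moderately sized files; for many or very large inputs the C route suggested in the quoted reply is probably the better fit.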