From: Robert D. <rm...@sa...> - 2022-08-03 16:51:06
On Tue, 2 Aug 2022, Thomas Juettemann wrote:

> I came across a "transcript-based" VCF file, meaning a variant can be
> present multiple times but belonging to a different transcript. See
> "FIle 1" below as an example. I am finding myself in the unfortunate
> situation of having to intersect ("File 2") and retain all records
> with the same position and REF/ALT ("Desired output").
> Long shot: Is that possible?

Does "bcftools isec" (https://www.htslib.org/doc/bcftools.html#isec) do
what you want? The "Extract and write records from A shared by both A
and B using exact allele match" example in the manual page sounds like
it might:

    bcftools isec -p dir -n=2 -w1 A.vcf.gz B.vcf.gz

If not, you can't find anything else, and you only want to do a few of
them, it might be possible to break out pysam and write something. If
you want to do lots, then a C program would probably be the way
forward - it doesn't look like it would be too difficult.

Rob Davies              rm...@sa...
The Sanger Institute    http://www.sanger.ac.uk/
Hinxton, Cambs.,        Tel. +44 (1223) 834244
CB10 1SA, U.K.          Fax. +44 (1223) 494919

--
The Wellcome Sanger Institute is operated by Genome Research Limited, a
charity registered in England with number 1021457 and a company
registered in England with number 2742969, whose registered office is
215 Euston Road, London, NW1 2BE.
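
For illustration only, here is a rough sketch of the "write something" fallback mentioned above. It is not the pysam approach itself: it uses plain Python on uncompressed, tab-separated VCF text so that it has no dependencies. The function names (variant_key, variant_keys, retain_shared) are made up for this example, and it assumes that "same position and REF/ALT" means an exact string match on the CHROM, POS, REF and ALT columns, with a multi-allelic ALT compared as a whole.

```python
def variant_key(line):
    """Return (CHROM, POS, REF, ALT) for one tab-separated VCF data line."""
    chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
    return (chrom, int(pos), ref, alt)


def variant_keys(vcf_lines):
    """Collect the keys of all data lines, skipping headers and blank lines."""
    return {
        variant_key(line)
        for line in vcf_lines
        if line.strip() and not line.startswith("#")
    }


def retain_shared(a_lines, b_keys):
    """Yield A's header lines plus every A record whose key is in b_keys.

    Records are never de-duplicated, so a variant listed once per
    transcript in A is kept once per transcript in the output.
    """
    for line in a_lines:
        if line.startswith("#"):
            yield line  # pass header lines through unchanged
        elif line.strip() and variant_key(line) in b_keys:
            yield line
```

Both functions accept any iterable of lines, so open text file handles can be passed directly: build the key set from File 2 with variant_keys(), then write out whatever retain_shared() yields for File 1. Compressed input would need to be decompressed first.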