From: Jonathan H. <jw...@sa...> - 2012-03-16 17:05:30
Hi All,

First, I apologise if this has already been brought up on the list - I had a quick look through the archives and couldn't see any related topics.

I am currently attempting to convert some of our Pindel variant call data into VCF. Pindel generates a number of different call types, including 'Deletions', 'Insertions' and 'Complex'. In this case, complex calls manifest themselves as a deletion AND an insertion at the same location - or, to look at it another way, a region of reference sequence that is replaced by a novel chunk of sequence.

While deletions and insertions are well covered in the VCF documentation, I am unsure how to proceed in the case of such complex events.

My initial idea was to include the deleted region under REF and the inserted sequence under ALT:

    #CHROM  POS      ID  REF       ALT     QUAL  ......
    6       1396302  .   TCACGCTG  TAACAC  .     .  ......

My alternative would be to represent the event in two components and link them together using the ID field:

    #CHROM  POS      ID  REF       ALT     QUAL  ......
    6       1396302  1   TCACGCTG  T       .     .  ......
    6       1396302  1   T         TAACAC  .     .  ......

The third would be to use the Structural Variation notation to represent a breakpoint with inserted sequence and eroded bases, but this seems very heavyweight.

I would prefer the first option over the other two, as there is less repetition and I believe it is more human-readable. However, as I am unsure what the accepted convention is, I thought I would throw it out into the open and see how other people are representing variants like this.

Many thanks in advance,

Jon

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
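For illustration only, here is a minimal Python sketch of how the first two representations described above could be built from one complex call. It assumes the call has already been parsed into an anchor base, the deleted sequence and the inserted sequence, and that POS is the position of the anchor base. The function names and arguments are hypothetical and are not part of Pindel or of any VCF library; the columns after QUAL are simply filled with "." placeholders.

```python
def single_record(chrom, pos, anchor, deleted, inserted, var_id="."):
    """Option 1: one record, REF = anchor + deleted, ALT = anchor + inserted."""
    ref = anchor + deleted
    alt = anchor + inserted
    # Columns: CHROM POS ID REF ALT QUAL FILTER INFO (last three left as ".")
    return "\t".join([chrom, str(pos), var_id, ref, alt, ".", ".", "."])


def linked_records(chrom, pos, anchor, deleted, inserted, var_id):
    """Option 2: a deletion and an insertion sharing the same ID value."""
    deletion = [chrom, str(pos), var_id, anchor + deleted, anchor, ".", ".", "."]
    insertion = [chrom, str(pos), var_id, anchor, anchor + inserted, ".", ".", "."]
    return ["\t".join(deletion), "\t".join(insertion)]


if __name__ == "__main__":
    # Values taken from the example event in the message above.
    print(single_record("6", 1396302, "T", "CACGCTG", "AACAC"))
    for line in linked_records("6", 1396302, "T", "CACGCTG", "AACAC", "1"):
        print(line)
```

The sketch only shows how the REF and ALT strings relate in each layout; it does not settle which layout is the accepted convention.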