|
From: jkb <jk...@sa...> - 2015-08-12 21:13:13
|
On 2015-08-12 17:20, David Roazen wrote:
> htsjdk (https://github.com/samtools/htsjdk/ [1]), and software built on top of it such as the GATK, currently only support BAI indices with CRAM files. It also appears likely that when CRAI support is added to htsjdk, it will be implemented internally as an on-the-fly CRAI -> BAI conversion.
>
> So, there is a bit of a need right now for samtools index -b <cram_file> to honor the "-b" option and create a BAI index on a CRAM file.

Or a need for GATK to start supporting CRAI.

I'm not yet convinced that abusing BAI to index a format whose block size isn't 32k (or is it 64k?) is the best way to go. Admittedly, though, I haven't investigated how it works, as I initially just assumed it would be impossible: CRAM isn't a bgzf stream, so I didn't explore further. I think it does something like treating containers as reads instead, but I'm not sure what that means in practice for index efficiency.

Links:
------
[1] https://github.com/samtools/htsjdk/

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
|