From: James B. <jk...@sa...> - 2014-11-10 17:26:22
On Mon, Nov 10, 2014 at 03:55:44PM +0000, Vadim Zalunin wrote:
> Index file size for a CRAM file should be smaller than for the
> corresponding BAM file because each container/slice is indexed as a
> single contiguous read, leaving finer positioning to the iterators.
> This is equivalent to indexing every 10k (by default) reads as one
> long read. I wonder if reduced resolution could be used for BAM files
> as well. This should not affect readers, if I'm not mistaken.
We discussed this at our weekly meeting today and concluded that
skipping some bins when indexing would have a comparable effect and is
possibly not that hard to do, but we are unsure whether it is necessary.
CSI already has a variety of options controlling granularity, so
perhaps just exposing these to the user is sufficient. Plus a CSI index
is already a lot smaller than BAI anyway due to compression.
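
To make the granularity point concrete, here is a minimal sketch of how
the CSI binning parameters change index resolution. It assumes the
reg2bin calculation from the CSI specification, parameterised by
min_shift and depth; the Python function names and the defaults of 14
and 5 (which reproduce the BAI binning scheme) are chosen here purely
for illustration and are not taken from any particular implementation.

```python
def reg2bin(beg, end, min_shift=14, depth=5):
    """Return the bin number for the zero-based half-open region [beg, end).

    Transcribed from the reg2bin routine in the CSI specification.
    min_shift sets the size of the smallest bin (2**min_shift bases) and
    depth sets the number of levels below the root bin.
    """
    end -= 1
    shift = min_shift
    # Offset of the first bin on the deepest (finest) level.
    offset = ((1 << (depth * 3)) - 1) // 7
    level = depth
    while level > 0:
        if beg >> shift == end >> shift:
            return offset + (beg >> shift)
        # Move up one level: bins are 8x larger, offsets shrink accordingly.
        level -= 1
        shift += 3
        offset -= 1 << (level * 3)
    return 0  # region only fits in the root bin


def max_bins(depth):
    """Number of distinct bins available for a given depth."""
    return ((1 << (3 * (depth + 1))) - 1) // 7
```

With min_shift=14 and depth=5 the scheme offers max_bins(5) = 37449
bins. Raising min_shift to 17 and lowering depth to 4 covers the same
2^29 coordinate range (min_shift + 3*depth is 29 in both cases) with
only max_bins(4) = 4681 bins, which is one way a coarser index could be
requested without changing what readers have to do.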
James
--
James Bonfield (jk...@sa...) | Hora aderat briligi. Nunc et Slythia Tova
| Plurima gyrabant gymbolitare vabo;
A Staden Package developer: | Et Borogovorum mimzebant undique formae,
https://sf.net/projects/staden/ | Momiferique omnes exgrabure Rathi.
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.