From: Dan V. <da...@gm...> - 2014-12-01 05:09:51
Shouldn't that be handled by HTTP? I.e., if IGV sets "Accept-Encoding: gzip"
in the request for the bai file, then the remote HTTP server is free to
respond with "Content-Encoding: gzip" and compress the response.

On Sun, Nov 30, 2014 at 10:01 PM Jim Robinson <jro...@br...> wrote:

> Hi, I just pushed a simple change to IGV that should help with the
> original problem. IGV will now accept a remote gzipped index file,
> with naming convention http://..../foo.bam.bai.gz. The index should be
> gzipped, not "bgzipped". IGV will check for presence of the .gz file
> first, so that both gzipped and non-gzipped bai files can coexist.
>
> This only works with remote resources (http & https), not local files.
>
> Jim
>
> > Petr is right that keeping the virtual offsets in CSI is
> > overcomplicated. We should not go there.
> >
> > In general, there is a tension between the index resolution
> > (proportional to the index size) and the amount of data read per
> > query. When we have lower resolution, we have to linearly read more
> > data before reaching the data slot we want to retrieve. IIRC, James
> > showed that we need to read more data with the CRAM index. I doubt
> > there is a universally better solution.
> >
> > Heng
> >
> > PS: I acknowledge that the BAM index frequently brings troubles
> > especially for remote access. The BigBED integrated index is more
> > friendly and advanced in this aspect. Jim et al had remote access in
> > mind when they designed BigBED. The BAM remote access is an
> > afterthought suggested by Lincoln.
> >
> > On Nov 10, 2014, at 10:55, Vadim Zalunin <va...@eb...> wrote:
> >
> >> On 10/11/2014 15:34, Petr Danecek wrote:
> >>> Hello Dan,
> >>>
> >>> sorry for the delay in responding.
> >>>
> >>> The compression of CSI makes the indexing of the index more
> >>> difficult. We'd need to keep mappings to compressed blocks and to
> >>> the uncompressed offsets within the blocks. This needs to be
> >>> relative to the index header, because its compressed size is
> >>> unknown at the time of writing the body.
> >>>
> >>> It is doable, but it increases the complexity and we are not sure
> >>> if it is worth it: For high coverage data the CSI(v1) index is
> >>> about 6.5MB, which is comparable to the size of a 100kbp BAM data
> >>> chunk, and is a negligible fraction of the whole BAM file. I don't
> >>> know what is the typical access pattern in genome viewers; how does
> >>> the size of the transferred data compare to the size of the index?
> >>>
> >>> Also we expect CRAM to start replacing BAM soon, so this probably
> >>> should become less of a problem in near future.
> >> Index file size for a CRAM file should be smaller than for a
> >> corresponding BAM file because each container/slice is used as a
> >> contiguous read, leaving the rest to the iterators. This is
> >> equivalent to indexing every 10k (by default) reads as one long
> >> read. I wonder if reduced resolution could be used for BAM files as
> >> well. This should not affect readers if I'm not mistaken.
> >>
> >> Vadim
> >>> Best wishes,
> >>> Petr
> >>>
> >>> On Tue, 2014-11-04 at 13:50 -0500, Dan Vanderkam wrote:
> >>>> A 4.2M file is an improvement, but still quite large to pull in
> >>>> while loading a visualization on a web page.
> >>>>
> >>>> re: Heng's comment about CSI, it would be great if CSI included a
> >>>> list of virtual offsets for each chromosome at the start of the
> >>>> file. This would work best if the length of the index index (in
> >>>> bytes) were encoded in the header before it. This would support
> >>>> the following access pattern:
> >>>>
> >>>> 1. HTTP request for first 8 bytes (to get index index length)
> >>>> 2. HTTP request for the full index index
> >>>>
> >>>> or, as an optimization
> >>>>
> >>>> 1. HTTP request for the first, say, 64k (to hopefully grab the
> >>>>    full index index)
> >>>> 2. HTTP request for the rest of the index index (if it's longer
> >>>>    than 64k)
> >>>>
> >>>> Prefixing structured fields with their length in bytes is quite
> >>>> common in binary formats, e.g. in the Google protocol buffer wire
> >>>> format.
> >>>>
> >>>> - Dan
> >>>>
> >>>> On Wed, Oct 29, 2014 at 9:43 AM, John Marshall <jm...@sa...>
> >>>> wrote:
> >>>>     On 24 Oct 2014, at 19:54, Dan Vanderkam <da...@gm...>
> >>>>     wrote:
> >>>>> My group's BAI files have gotten quite large (10+MB) and are
> >>>>> proving to be a bottleneck when loading interactive
> >>>>> visualizations like IGV or BioDalliance. Downloading a 10MB
> >>>>> file takes many seconds, during which time the visualization
> >>>>> can't display anything.
> >>>>     [...]
> >>>>> - Does CSI (instead of BAI) help with this?
> >>>>     As Heng just mentioned in passing, CSI is compressed while
> >>>>     BAI is not. For example, for a 42G BAM file I just compared,
> >>>>     the CSI index is half the size of the BAM index (4.2M v.
> >>>>     8.4M).
> >>>>
> >>>>     John
> >>>>
> >>>>     --
> >>>>     The Wellcome Trust Sanger Institute is operated by Genome
> >>>>     Research Limited, a charity registered in England with number
> >>>>     1021457 and a company registered in England with number
> >>>>     2742969, whose registered office is 215 Euston Road, London,
> >>>>     NW1 2BE.
> >>>>
> >>>> ------------------------------------------------------------------------------
> >>>> _______________________________________________
> >>>> Samtools-devel mailing list
> >>>> Sam...@li...
> >>>> https://lists.sourceforge.net/lists/listinfo/samtools-devel
> >>
> >> --
> >> Vadim Zalunin
> >> European Bioinformatics Institute (EMBL-EBI)
> >> European Molecular Biology Laboratory
> >> Wellcome Trust Genome Campus
> >> Hinxton
> >> Cambridge CB10 1SD
> >> United Kingdom
> >> Tel: + 44 (0) 1223 494 614
> >> Fax: + 44 (0) 1223 494 468
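
For reference, the length-prefixed access pattern proposed earlier in the thread (one speculative read of the first 64k, then at most one more request) could look roughly like the Python sketch below. This is only an illustration under stated assumptions: the layout (an 8-byte little-endian length at offset 0, followed directly by the index index) is hypothetical and is not part of CSI or BAI, and the names `read_index_index`, `fetch_range` and `make_fake_remote` are made up for the example. `fetch_range(start, end)` stands in for an HTTP Range request and is simulated in memory here.

```python
import struct

PREFIX_BYTES = 8          # hypothetical: uint64 little-endian length prefix
FIRST_CHUNK = 64 * 1024   # speculative first read suggested in the thread


def read_index_index(fetch_range, first_chunk=FIRST_CHUNK):
    """Return the index-index bytes using at most two range requests.

    fetch_range(start, end) must return bytes [start, end) of the remote
    file, truncated at end of file, much like an HTTP Range request.
    """
    head = fetch_range(0, first_chunk)
    if len(head) < PREFIX_BYTES:
        raise ValueError("file too short to hold the length prefix")
    # The prefix says how many bytes of index index follow it.
    (length,) = struct.unpack_from("<Q", head, 0)
    end = PREFIX_BYTES + length
    body = head[PREFIX_BYTES:end]
    if len(body) < length:
        # The index index is longer than the first chunk: one more request.
        body += fetch_range(len(head), end)
    return body


def make_fake_remote(payload):
    """In-memory stand-in for a remote file that also records requests."""
    calls = []

    def fetch_range(start, end):
        calls.append((start, end))
        return payload[start:end]

    return fetch_range, calls
```

With a real server, `fetch_range` would issue a GET with a `Range: bytes=start-(end-1)` header; the point of the length prefix is that the client never has to guess where the index index stops.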