From: Mitch S. <mit...@be...> - 2009-09-29 20:39:05
On 09/29/2009 08:08 AM, Brenton Graveley wrote:
> I think this would be one option, though it would be best to retain
> the ability to turn individual tracks on or off. For simplicity sake,
> there are times when viewing 12 embryonic tracks gets overwhelming and
> having just 1 or 2 is sufficient to compare embryos to adults, for
> instance.

Random thought: for this particular use case, would it be better to have a view that aggregates all the embryonic timepoints (e.g., showing the mean, or the min/mean/max)? We could implement a one-button toggle between the all-timepoints summary and the per-timepoint view. Thanks for providing a specific use case here.

> Disk space is starting to become an issue - I recently started making
> the wiggle track images at half the default height in order to save on
> some disk space (I haven't calculated or compared the size difference,
> but assumed there would be). For the fly genome, at 150 MB, it isn't
> so bad, but tracks for the human genome are much larger and it is much
> more of an issue.

In my testing on Linux (with the ext3 filesystem), filesystem overhead was about half of the total disk usage (on Linux, you can measure this by comparing the output of "du" and "du --apparent-size"). In my tests, a file smaller than 4 KB still took up a whole 4 KB block (the default block size in ext3), and I'd guess that many of your images are close to or below that threshold. So I think aggregating the images into fewer, larger files could really help (especially given the PNG compression). In my testing so far, the images (total for all zoom levels) have been taking about 5.5 bytes per base (~2.5 bytes for the image data, the rest filesystem overhead). That compares with GBrowse's storage method of 1 byte per data point; it should be possible (maybe with a bit of tinkering) to plug GBrowse's wiggle image generation into JBrowse.
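A rough sketch of the aggregated timepoint view, assuming each timepoint's track is available as an equal-length list of values per position. The function name `summarize_timepoints` and that input layout are hypothetical, not part of JBrowse; the one-button toggle would then just choose between drawing this summary and drawing each track separately.

```python
from statistics import mean
from typing import List, Sequence, Tuple

def summarize_timepoints(
    tracks: Sequence[Sequence[float]],
) -> List[Tuple[float, float, float]]:
    """Collapse equal-length per-timepoint tracks into one (min, mean, max)
    triple per position. Hypothetical helper, not a JBrowse API."""
    if not tracks:
        raise ValueError("need at least one track")
    length = len(tracks[0])
    if any(len(track) != length for track in tracks):
        raise ValueError("all tracks must cover the same positions")
    # zip(*tracks) yields, for each position, that position's value in every timepoint
    return [(min(vals), mean(vals), max(vals)) for vals in zip(*tracks)]


# Example: three made-up timepoints over four positions
embryo = [
    [1.0, 4.0, 0.0, 2.0],
    [3.0, 4.0, 0.0, 6.0],
    [2.0, 1.0, 3.0, 4.0],
]
for position, (lo, avg, hi) in enumerate(summarize_timepoints(embryo)):
    print(position, lo, avg, hi)
```

Drawing min and max as a shaded band with the mean as a line would keep a 12-timepoint series down to a single track's worth of screen space.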
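The "du" versus "du --apparent-size" comparison can also be approximated in Python. This is a sketch assuming a Unix-like system where `os.stat` exposes `st_blocks` in 512-byte units (as on Linux); `allocated_size` and `measure_tree` are illustrative helpers that count regular files only and, unlike du, don't de-duplicate hard links. For example, a 1,500-byte PNG still occupies a 4,096-byte block, so about 63% of that file's footprint is overhead.

```python
import os
from typing import Tuple

BLOCK_SIZE = 4096  # ext3's default block size; your filesystem may differ

def allocated_size(apparent_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Space a file occupies when storage is handed out in whole blocks."""
    return -(-apparent_bytes // block_size) * block_size  # ceiling division

def measure_tree(root: str) -> Tuple[int, int]:
    """Return (apparent, on_disk) byte totals for the files under root,
    roughly what `du --apparent-size` and `du` report (directories are
    skipped and hard links are not de-duplicated, unlike du)."""
    apparent = on_disk = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            apparent += st.st_size
            blocks = getattr(st, "st_blocks", None)
            # st_blocks counts 512-byte units on Linux; fall back to an estimate elsewhere
            on_disk += blocks * 512 if blocks is not None else allocated_size(st.st_size)
    return apparent, on_disk

if __name__ == "__main__":
    apparent, on_disk = measure_tree(".")
    print(f"apparent: {apparent} bytes, on disk: {on_disk} bytes")
    if on_disk:
        print(f"overhead: {100 * (on_disk - apparent) / on_disk:.1f}% of disk usage")
```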
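One way the images could be aggregated, sketched under the assumption that a single concatenated data file plus an offset/length index would be acceptable. This layout and the helper names (`pack_tiles`, `read_tile`, the tile paths) are hypothetical, not an existing JBrowse format. Each tile keeps its own PNG-compressed bytes, but small tiles no longer each round up to a full filesystem block.

```python
import io
import json
from typing import BinaryIO, Dict, Tuple

Index = Dict[str, Tuple[int, int]]  # tile name -> (byte offset, byte length)

def pack_tiles(tiles: Dict[str, bytes], out: BinaryIO) -> Index:
    """Write each tile's bytes back to back into one stream and record
    where each one starts and how long it is."""
    index: Index = {}
    for name, blob in tiles.items():
        index[name] = (out.tell(), len(blob))
        out.write(blob)
    return index

def read_tile(name: str, data: BinaryIO, index: Index) -> bytes:
    """Fetch one tile back out of the packed stream."""
    offset, length = index[name]
    data.seek(offset)
    return data.read(length)

def index_to_json(index: Index) -> str:
    return json.dumps(index)

def index_from_json(text: str) -> Index:
    # JSON has no tuples, so convert the [offset, length] lists back
    return {name: (pos[0], pos[1]) for name, pos in json.loads(text).items()}

if __name__ == "__main__":
    # Placeholder bytes stand in for real PNG tiles; the names are made up
    tiles = {"chr2L/zoom1/0.png": b"tile-zero", "chr2L/zoom1/1.png": b"tile-one"}
    buf = io.BytesIO()
    index = pack_tiles(tiles, buf)
    assert read_tile("chr2L/zoom1/1.png", buf, index) == b"tile-one"
    assert index_from_json(index_to_json(index)) == index
    print(index)
```

With real files you would pass `open(path, "wb")` to `pack_tiles` and `open(path, "rb")` to `read_tile`. Serving tiles out of one packed file would also need the browser to fetch byte ranges (e.g., with HTTP Range requests) or a small server-side lookup, which is a separate design question.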
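For comparison, the general idea behind spending one byte per data point is to scale each value into 256 levels between a known minimum and maximum. This is a generic quantization sketch, not GBrowse's code or file format; `quantize`, `dequantize`, and the `lo`/`hi` range parameters are assumptions for illustration.

```python
from typing import List, Sequence

def quantize(values: Sequence[float], lo: float, hi: float) -> bytes:
    """Store each value as one byte by mapping [lo, hi] onto 0..255.
    Values outside the range are clamped to its ends."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    scale = 255 / (hi - lo)
    out = bytearray()
    for v in values:
        clamped = min(max(v, lo), hi)
        out.append(round((clamped - lo) * scale))
    return bytes(out)

def dequantize(data: bytes, lo: float, hi: float) -> List[float]:
    """Recover approximate values; for inputs inside [lo, hi] the error is
    at most half a step, i.e. (hi - lo) / 510."""
    step = (hi - lo) / 255
    return [lo + b * step for b in data]
```

For instance, a track whose values run from 0 to 10 would be stored with a resolution of 10/255, or about 0.039 per level. The catch is that images have to be drawn from these bytes when someone browses the data rather than ahead of time.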
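A back-of-the-envelope comparison of the two storage approaches, treating the image cost as per base (the ~5.5 bytes/base figure measured above) and the one-byte approach as per data point. The genome length and data density are inputs you would supply; the function name, defaults, and the 100-megabase example are illustrative, and the model ignores any header or index overhead of either format.

```python
from typing import Tuple

def compare_storage(
    genome_bases: int,
    points_per_base: float,
    image_bytes_per_base: float = 5.5,
    bytes_per_point: float = 1.0,
) -> Tuple[float, float]:
    """Estimate (image_storage, point_storage) in bytes.

    image_bytes_per_base: total across all zoom levels (~5.5 including
    filesystem overhead, ~2.5 for the image data alone in the tests above).
    points_per_base: data density, 1.0 for one value per base.
    """
    image_storage = genome_bases * image_bytes_per_base
    point_storage = genome_bases * points_per_base * bytes_per_point
    return image_storage, point_storage

if __name__ == "__main__":
    # Illustrative 100-megabase genome; substitute your own length and density
    for density in (1.0, 0.1):
        images, points = compare_storage(100_000_000, density)
        print(f"density {density}: images ~{images / 1e6:.0f} MB, "
              f"one byte/point ~{points / 1e6:.0f} MB")
```

Passing `image_bytes_per_base=2.5` approximates the images with the filesystem overhead removed, e.g. after packing them into larger files. Under this model, the sparser the data, the more the per-point approach saves, which is why the data density matters for the choice.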
The tradeoff is that installing GBrowse would add complexity, and (I think) it would take significantly more CPU when people browse the data. JBrowse usually chooses speed over space whenever there's a computation/storage tradeoff, and I think that's the right choice in general, especially if you want to serve lots of users. But maybe that choice isn't appropriate in your case. Do you usually have one data point per base across the entire genome, or is the data less dense?

I thought you were making shorter images to save on screen space, which is something I was also concerned about when I looked at your JBrowse installation. I did think that your JBrowse installation was pretty awesome, though.

Mitch