Re: [Gmod-ajax] JBrowse track groupings


On 09/29/2009 08:08 AM, Brenton Graveley wrote:
> I think this would be one option, though it would be best to retain 
> the ability to turn individual tracks on or off.  For simplicity's sake, 
> there are times when viewing 12 embryonic tracks gets overwhelming and 
> having just 1 or 2 is sufficient to compare embryos to adults, for 
> instance.

Random thought: for this particular use case, would it be better to have 
a view that aggregates all the embryonic timepoints (e.g., showing the 
mean, or the min/mean/max)?  We could implement a one-button toggle 
between the all-timepoints summary and per-timepoint view.
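
A rough sketch of what that summary could compute, in Python.  This is 
only an illustration: it assumes each timepoint is available as an 
equal-length list of per-position scores, and the function name 
summarize_timepoints is made up here, not anything in JBrowse.

```python
from statistics import mean


def summarize_timepoints(tracks):
    """Collapse several per-timepoint tracks into one summary track.

    tracks: list of equal-length lists of scores, one list per timepoint
    (an assumed in-memory layout, not JBrowse's actual storage format).
    Returns one (min, mean, max) tuple per position.
    """
    if not tracks:
        return []
    length = len(tracks[0])
    if any(len(track) != length for track in tracks):
        raise ValueError("all timepoint tracks must cover the same positions")
    # zip(*tracks) yields, for each position, the scores across timepoints.
    return [(min(col), mean(col), max(col)) for col in zip(*tracks)]
```

The per-timepoint tracks stay untouched, so a toggle would only need to 
switch between drawing the originals and drawing this summary.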

Thanks for providing a specific use case here.

> Disk space is starting to become an issue - I recently started making 
> the wiggle track images at half the default height in order to save on 
> some disk space (I haven't calculated or compared the size difference, 
> but assumed there would be).  For the fly genome, at 150 MB, it isn't 
> so bad, but tracks for the human genome are much larger and it is much 
> more of an issue.

In my testing on Linux (with the ext3 filesystem), filesystem overhead 
was about half of the total disk usage (on Linux, you can measure this 
by comparing the output of "du" and "du --apparent-size").  In my tests, 
a file smaller than 4k still took up a whole 4k block (4k being the 
default block size in ext3).  I'd guess that many of your images are 
close to or below that threshold, so I think aggregating the images 
could really help (especially given the PNG compression).
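
To make the block-size effect concrete, here is a small Python sketch.  
It assumes every file is simply rounded up to a whole number of 4k 
blocks and ignores inode and directory overhead, so it is a simplified 
model rather than what "du" actually reports; the function names are 
made up for illustration.

```python
def allocated_size(apparent_size, block_size=4096):
    """Bytes occupied on disk if a file is rounded up to whole blocks."""
    # Integer ceiling division: number of blocks needed, times block size.
    return -(-apparent_size // block_size) * block_size


def overhead_fraction(sizes, block_size=4096):
    """Fraction of allocated space that is padding, for a list of file sizes."""
    allocated = sum(allocated_size(size, block_size) for size in sizes)
    if allocated == 0:
        return 0.0
    return (allocated - sum(sizes)) / allocated
```

Under this model, a directory full of 2k images wastes half its 
allocated space, while the same bytes packed into one large file waste 
at most one partial block.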

In my testing so far, the images (total for all zoom levels) have been 
taking about 5 and a half bytes per base (~2.5 bytes for the image data, 
the rest filesystem overhead).  That compares with GBrowse's 1 byte per 
data point storage method; it should be possible (maybe with a bit of 
tinkering) to plug GBrowse's wiggle image generation into JBrowse.  The 
tradeoff is that installing GBrowse would add complexity, and it 
would take (I think) significantly more CPU when people browse the 
data.  JBrowse usually chooses speed over space usage whenever there's a 
computation/storage tradeoff, and I think that's the right choice in 
general, especially if you want to serve lots of users.  But in your 
case maybe that choice isn't appropriate.
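
For a back-of-the-envelope comparison, the numbers above can be plugged 
into a short Python sketch.  The per-base figures come from my tests 
quoted in this message; the assumption of one data point per base and 
the function name estimate_storage are illustrative only.

```python
# Figures from the tests described above (bytes per base, all zoom levels).
IMAGE_TOTAL_BYTES_PER_BASE = 5.5   # image data plus filesystem overhead
IMAGE_DATA_BYTES_PER_BASE = 2.5    # image data alone
GBROWSE_BYTES_PER_POINT = 1.0      # GBrowse's storage per data point


def estimate_storage(n_bases, points_per_base=1.0):
    """Estimate storage in bytes for one wiggle track.

    points_per_base is an assumption; sparser data would shrink only the
    GBrowse-style estimate in this simple model.
    """
    images_total = n_bases * IMAGE_TOTAL_BYTES_PER_BASE
    images_data = n_bases * IMAGE_DATA_BYTES_PER_BASE
    return {
        "images_total": images_total,
        "images_data": images_data,
        "filesystem_overhead": images_total - images_data,
        "gbrowse_style": n_bases * points_per_base * GBROWSE_BYTES_PER_POINT,
    }
```

For example, estimate_storage(1_000_000) gives 5.5 million bytes for the 
images against 1 million bytes for the GBrowse-style storage, with 
3 million of the image bytes being filesystem overhead.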

Do you usually have one data point per base across the entire genome, or 
is it less dense?

I thought you were making shorter images to save on screen space, which 
is something that I was also concerned about when I looked at your 
JBrowse installation.  I did think that your JBrowse installation was 
pretty awesome, though.

Mitch