From: Mitch S. <mit...@be...> - 2007-03-23 01:27:16
|
I've put up Dmel 3R here:
http://genome.biowiki.org/gbrowse/dmel51/prototype_gbrowse.html

It rendered in less than 8 hours on one CPU, and the tiles and HTML take up around 13 gigabytes on disk. Memory usage maxed out at 342 MB. The 3R chromosome arm is the largest Dmel one at 28 Mb, or about 20 times the size of chromosome 4. So to do human we just have to scale another order of magnitude; it's CPU bound and we've got a fair amount of CPU to throw at it, but if memory usage grows linearly with the amount of sequence then we may start running into problems. We could do some more work on rendering on demand, and that would help, but you've still got to have all the features in memory at the same time to do layout, so we may not be able to reduce max memory usage very much that way. Maybe we can partition the layout job if there are empty regions of the chromosome.

One problem with the current approach is that the HTML files get quite large for the lower (more zoomed-out) zoom levels. For the Genes track the entire_landmark HTML file is 1.6 megabytes; for CDS it's 7.7 megabytes. The HTML for entire_landmark for all tracks totals 24 megabytes, which is how much you would have to download to actually view the entire_landmark zoom level. Maybe we should have some kind of limit above which we turn off the HTML. Also, if we had a way of lazily loading feature information then the HTML could be a lot smaller.

With a large number of features the browser starts to slow down; for me this starts to be noticeable around the 1Mbp zoom level on Windows. On other platforms Firefox starts to bog down at higher (less dense) zoom levels. Again, if we turn off the HTML above some feature density threshold this should be less of a problem.

Mitch
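To make the "scale another order of magnitude" estimate concrete, here is a rough back-of-the-envelope sketch in Python. It assumes every cost grows linearly with sequence length, which is exactly the open question in the message above, and it uses a hypothetical 280 Mb target (ten times 3R). The 8 hour figure is an upper bound from the message, and feature density in another genome may differ, so treat the output as a ballpark only.

```python
# Measurements reported for Dmel 3R in the message above.
DMEL_3R = {"length_mb": 28, "cpu_hours": 8, "disk_gb": 13, "peak_mem_mb": 342}


def extrapolate(measured, target_length_mb):
    """Scale each cost linearly with sequence length (an assumption)."""
    factor = target_length_mb / measured["length_mb"]
    return {
        "scale_factor": factor,
        "cpu_hours": measured["cpu_hours"] * factor,
        "disk_gb": measured["disk_gb"] * factor,
        "peak_mem_mb": measured["peak_mem_mb"] * factor,
    }


# "Another order of magnitude": a hypothetical 280 Mb chromosome.
estimate = extrapolate(DMEL_3R, 280)
print(estimate)
```

Under that linear assumption a single chromosome of this size would need on the order of 80 CPU hours, 130 GB of disk, and about 3.4 GB of peak memory, which is why the memory growth rate matters more than the CPU time.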
From: Ian H. <ih...@be...> - 2007-03-23 03:50:59
|
Mitch,

> I've put up Dmel 3R here:
> http://genome.biowiki.org/gbrowse/dmel51/prototype_gbrowse.html

cool -- looks good (a bit slow to load but maybe others are looking at it right now...? could also be the ginormous html)

> It rendered in less than 8 hours on one CPU, and the tiles and HTML take
> up around 13 gigabytes on disk. Memory usage maxed out at 342 MB. The
> 3R chromosome arm is the largest Dmel one at 28 Mb, or about 20 times
> the size of chromosome 4. So to do human we just have to scale another
> order of magnitude; it's CPU bound and we've got a fair amount of CPU to
> throw at it, but if memory usage grows linearly with the amount of
> sequence then we may start running into problems.

as a web service we can easily afford a few 4GB-RAM machines or even 64GB. disk space, similarly, is cheap - for us. as an app that can be downloaded and installed by anyone (i.e. like the existing GBrowse), you're right: this narrows the user base a bit. still i'm not too worried about it yet.

> We could do some more
> work on rendering on demand, and that would help, but you've still got
> to have all the features in memory at the same time to do layout, so we
> may not be able to reduce max memory usage very much that way. Maybe we
> can partition the layout job if there are empty regions of the chromosome.

again.. not worried yet. maybe we can start thinking idly about bounded-space layout algorithms.

> One problem with the current approach is that the HTML files get quite
> large for the lower (more zoomed-out) zoom levels. For the Genes track
> the entire_landmark HTML file is 1.6 megabytes; for CDS it's 7.7
> megabytes. The HTML for entire_landmark for all tracks totals 24
> megabytes, which is how much you would have to download to actually view
> the entire_landmark zoom level. Maybe we should have some kind of limit
> above which we turn off the HTML. Also, if we had a way of lazily
> loading feature information then the HTML could be a lot smaller.

mmm, this is a bit more scary. it does make lazy feature-info-loading look appealing.

> With a large number of features the browser starts to slow down; for me
> this starts to be noticeable around the 1Mbp zoom level on Windows. On
> other platforms firefox starts to bog down at higher (less dense) zoom
> levels. Again, if we turn off the html above some feature density
> threshold this should be less of a problem.

i suspect that other people (probably some on this list, or the other GMOD lists) have thought quite hard about this kind of issue (collapsing features, density thresholds, etc) and we'd benefit from consulting with them.

ian
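One way to read the "partition the layout job at empty regions" idea is sketched here in Python. This is an illustration, not code from generate-tiles.pl: the function name, the `(start, end)` tuple representation, and the `min_gap` parameter are all assumptions. The input is assumed to arrive already sorted by start coordinate, so only one cluster has to be held in memory at a time.

```python
def clusters_by_gap(sorted_features, min_gap):
    """Yield lists of (start, end) features separated by empty regions.

    sorted_features must be ordered by start. Features in different
    clusters are at least min_gap apart, so each cluster can be laid out
    independently and discarded before the next one is read.
    """
    current, current_end = [], None
    for start, end in sorted_features:
        # A gap of at least min_gap after everything seen so far closes
        # the current cluster.
        if current and start - current_end >= min_gap:
            yield current
            current, current_end = [], None
        current.append((start, end))
        current_end = end if current_end is None else max(current_end, end)
    if current:
        yield current


features = [(1, 10), (5, 20), (100, 120), (110, 115), (500, 510)]
for cluster in clusters_by_gap(features, min_gap=50):
    print(cluster)
```

If labels are drawn next to features, `min_gap` would need to be at least as wide as the longest label at that zoom level, since labels from one cluster could otherwise collide with features in the next. Peak memory is then bounded by the largest cluster rather than the whole chromosome, which only helps if the chromosome actually has gaps that wide.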
From: Mitch S. <mit...@be...> - 2007-03-23 23:37:00
|
Ian Holmes wrote:
> i suspect that other people (probably some on this list, or the other
> GMOD lists) have thought quite hard about this kind of issue
> (collapsing features, density thresholds, etc) and we'd benefit from
> consulting with them.

Well, the GBrowse code that we're using certainly has facilities for dealing with high feature densities; right now the generate-tiles.pl code for turning off labels looks like this:

    # if the average number of features per tile is
    # less than $feature_thresh, we print labels
    if (($#features / $num_tiles) < $feature_thresh) {
        $track->configure(-bump => 1, -label => 1, -description => 1);
    } else {
        $track->configure(-bump => 1, -label => 0, -description => 0);
    }

where $feature_thresh is set using the new -f command line option (default: 50, which is what I used to render 3R). We could certainly add a second threshold above which we turn off bumping; I'm not a big fan of the non-bumped view since I don't think it's as useful as a density plot, but I'd certainly be interested in hearing arguments for it. Glyph also has a bump_limit option that I should try out at some point.

Right now this is how FlyBase deals with dense views:
====
/Detailed view is limited to 1 Mbp. Click in the overview to select a region 20 kbp wide./
====
so we're not covering terribly heavily trodden ground by exploring different alternatives.

This ties in with what Andrew said:
> Yeah, we definitely need some sort of heuristic where we compact
> features above some threshold. Suzi's suggestion to turn them into a
> feature density plot or some such was a good idea, but would require a
> bit of work... maybe for now we can just drop the smaller features
> out, or turn them off completely (e.g. how GBrowse stops displaying
> features above the 1MB zoom level) or do something smarter than that,
> where we have a "max track height" above which feature display would
> shut off.

I'm not too concerned with the amount of graphical space that the features take up (especially since it's easy to turn tracks on and off), although I guess that as we try to do larger chromosomes the entire_landmark zoom level will start to get quite tall. The main problem in my view is the number of imagemap regions that get created for dense tracks. Even if we load feature info lazily there's a minimum bound on the size of the HTML required to define those imagemap regions. And the browser starts to have problems when we create large numbers of them.

If we turn off the HTML above some threshold then those problems go away. IMO the question is whether or not the graphical view is useful even when you can't click on things, and I think the answer is "yes", but it would be confusing for the user to be able to click on features at some zoom levels but not others.

Mitch
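The label threshold quoted in this message, plus the proposed second threshold for turning off the HTML, can be sketched as a single decision function. This is a Python illustration rather than the Perl in generate-tiles.pl. `label_thresh` mirrors the -f default of 50 from the message; `html_thresh` and its value of 500 are hypothetical and do not correspond to any existing option.

```python
def track_options(num_features, num_tiles, label_thresh=50, html_thresh=500):
    """Pick rendering options from the average number of features per tile.

    label_thresh mirrors the -f default of 50 quoted in the thread;
    html_thresh is a hypothetical second cutoff, not an existing option.
    """
    density = num_features / num_tiles
    return {
        "bump": True,
        "label": density < label_thresh,
        "description": density < label_thresh,
        # Above the second cutoff, emit the image only, with no imagemap.
        "imagemap": density < html_thresh,
    }


for count in (100, 1000, 10000):
    print(count, track_options(count, num_tiles=10))
```

One small difference from the quoted Perl: `$#features` is the index of the last array element, which is one less than the number of features, while this sketch divides the feature count itself. At these densities the difference is negligible.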
From: Andrew U. <and...@gm...> - 2007-03-25 01:29:23
|
On 3/23/07, Mitch Skinner <mit...@be...> wrote:
[snip]
> I'm not too concerned with the amount of graphical space that the
> features take up (especially since it's easy to turn tracks on and off),
> although I guess that as we try to do larger chromosomes the
> entire_landmark zoom level will start to get quite tall.

Hmm... I think that, above some point, a single track will get so tall (>1000px or so) as to render it worthless on any monitor. I just don't see anyone finding something so feature-dense useful, but I might be wrong. I think the Dmel demo suffers from this severely at low zoom levels.

Maybe we should allow users to specify max track height in pixels when they do track upload? When in doubt, just let them set their own parameter and pass it to generate_tiles/Bio::Graphics.

> The main
> problem in my view is the number of imagemap regions that get created
> for dense tracks. Even if we load feature info lazily there's a minimum
> bound on the size of the html required to define those imagemap
> regions. And the browser starts to have problems when we create large
> numbers of them.
>
> If we turn off the HTML above some threshold then those problems go
> away. IMO the question is whether or not the graphical view is useful
> even when you can't click on things, and I think the answer is "yes",
> but it would be confusing for the user to be able to click on features
> at some zoom levels but not others.

Yeah, it would definitely still be useful. When you say "confusing," do you mean confusing to implement or confusing to the user? The user can just be notified with a little warning at the top of the page or on the track... kind of like GBrowse warns you currently.
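A "max track height" cutoff could work roughly as in the following Python sketch. It uses a simple greedy bumping scheme (first free row wins) as a stand-in; it is not the Bio::Graphics layout algorithm, and the function name, `row_height`, and `max_height` parameters are assumptions made for illustration.

```python
def bump_layout(features, row_height=10, max_height=1000):
    """Greedy bumping: put each (start, end) feature in the first free row.

    Returns (rows, truncated). rows[i] lists the features in row i;
    truncated is True if features were dropped because another row would
    make the track taller than max_height pixels.
    """
    max_rows = max_height // row_height
    rows = []       # features placed in each row
    row_ends = []   # rightmost end coordinate used in each row
    truncated = False
    for start, end in sorted(features):
        for i, row_end in enumerate(row_ends):
            if start > row_end:
                rows[i].append((start, end))
                row_ends[i] = end
                break
        else:
            # No existing row has room: open a new one if the cap allows.
            if len(rows) < max_rows:
                rows.append([(start, end)])
                row_ends.append(end)
            else:
                truncated = True
    return rows, truncated


rows, truncated = bump_layout([(1, 10), (5, 15), (12, 20), (8, 30)],
                              row_height=10, max_height=20)
print(rows, truncated)
```

The `truncated` flag is the hook for the kind of warning discussed in this message: the renderer can tell the user that the track was cut off at the configured height instead of silently dropping features.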
From: Mitch S. <mi...@ar...> - 2007-03-26 21:36:47
|
Andrew Uzilov wrote:
> Yeah, it would definitely still be useful. When you say "confusing,"
> do you mean confusing to implement or confusing to the user? The user
> can be just notified with a little warning at the top of the page or
> on the track... kind of like GBrowse warns you currently.

I meant confusing for the user; we could have a warning but I think it could be easy to miss (or annoying, if we made it really in-your-face). One of the things I like about the feature density plot is that it's obvious that there's nothing to click on.

Mitch
From: Mitch S. <mit...@be...> - 2007-03-27 22:21:04
|
I wrote:
> One of the things I like about the feature density plot is that it's
> obvious that there's nothing to click on.

As an experiment I implemented this:
http://genome.biowiki.org/gbrowse/dmel51/prototype_gbrowse.html

On the conference call yesterday Lincoln said that there was some feature density functionality already in GBrowse, and I see some code for it in Bio::DB::GFF, but AFAICS it's not implemented for Bio::DB::SeqFeature, so I did my own version. The stuff in Bio::DB::GFF is per-datasource (there's some SQL to count the number of features in each histogram bin) and that's more efficient than retrieving all the features and sorting them into bins, but since we have to get all of the features anyway a generic implementation doesn't cost us much. Point being, I don't think I'm duplicating effort by doing things this way.

Also (at Lincoln's suggestion) I made it so it doesn't generate imagemap regions for 1-pixel wide features. I'm a little concerned that this might be confusing, since you can mouseover some features but not others. Then again, I don't think it's too likely that people will try to mouseover 1-pixel targets; they will probably zoom in some more first. I'd be interested to hear opinions on this--try looking at the exon track at 500kbp. I should try to measure how much this saves on HTML size.

Unfortunately, the scale is currently getting rendered off-tile, and I think the only way to change this is to add some padding on the side, which will throw off a lot of other position calculations (on both the client and server side). The Right Thing to do in this case is have special client-side support for always showing the scale at the edge of the view, so trying to add padding is not only problematic but it doesn't really solve the problem either. Point being, when judging the usefulness of the feature density histogram, imagine that there's a scale on the side.

Mitch
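The two ideas in this message, binning already-retrieved features into a density histogram and skipping imagemap regions for 1-pixel-wide features, can be sketched in Python as follows. This is an illustration and not the actual implementation: the function names are made up, features are plain `(start, end)` tuples, and each feature is counted once by its start position, which is only one of several reasonable binning rules.

```python
def density_histogram(features, region_start, region_end, num_bins):
    """Count features per bin, assigning each feature by its start position."""
    span = region_end - region_start
    counts = [0] * num_bins
    for start, _end in features:
        if region_start <= start < region_end:
            counts[(start - region_start) * num_bins // span] += 1
    return counts


def clickable_features(features, bp_per_pixel):
    """Keep only features drawn wider than one pixel at this zoom level."""
    return [(s, e) for s, e in features if (e - s + 1) / bp_per_pixel > 1]


features = [(0, 99), (50, 60), (150, 400), (999, 1000)]
print(density_histogram(features, 0, 1000, num_bins=10))
print(clickable_features(features, bp_per_pixel=100))
```

Comparing `len(clickable_features(...))` with `len(features)` at each zoom level would give a first estimate of how many imagemap regions the 1-pixel rule removes, which is the HTML-size measurement mentioned in the message.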
From: Ian H. <ih...@be...> - 2007-03-28 00:09:41
|
Mitch Skinner wrote:
> As an experiment I implemented this:
> http://genome.biowiki.org/gbrowse/dmel51/prototype_gbrowse.html

I like it..

> Unfortunately, the scale is currently getting rendered off-tile, and I
> think the only way to change this is to add some padding on the side,
> which will throw off a lot of other position calculations (on both the
> client and server side). The Right Thing to do in this case is have
> special client-side support for always showing the scale at the edge of
> the view,

At some point we should allow the black/grey stripes on the left-hand side to be replaced with arbitrary images, I guess.

Ian
From: Andrew U. <and...@gm...> - 2007-03-23 03:54:31
|
Good work improving the rendering time! It's great to see we can do a large chromosome in a good amount of time.

Yeah, we definitely need some sort of heuristic where we compact features above some threshold. Suzi's suggestion to turn them into a feature density plot or some such was a good idea, but would require a bit of work... maybe for now we can just drop the smaller features out, or turn them off completely (e.g. how GBrowse stops displaying features above the 1MB zoom level) or do something smarter than that, where we have a "max track height" above which feature display would shut off.

Andrew
From: Chris M. <cj...@fr...> - 2007-03-23 18:29:30
|
great to see a real dmel Chromosome up!

so how hard would the lazy-feature-info be?

(still getting the oldPrimaryDivNum error but I expect you're aware of that one)...