From: Mitch S. <li...@ar...> - 2007-02-01 18:21:54
On Thu, 2007-02-01 at 18:56 +1000, Aaron Darling wrote:
> I really liked the demo at http://genome.biowiki.org/
> Would it be possible for my friends at Univ. of Wisconsin to set up their
> own instance and load mammalian genome data into it?

The code is in the gmod CVS on sourceforge, in gmod/Generic-Genome-Browser/ajax. Beyond that, Andrew wrote up how to install the pre-rendering stuff here:

http://biowiki.org/twiki/bin/view/GBrowse/InstallTileRendering

The main obstacle at this point is that rendering a human or mouse genome would take a long time. I've been working on speeding it up, but I haven't checked that work into CVS yet, because we still haven't decided whether we need the graphics primitive database, which I think may be the wrong approach. The graphics primitive database stores all of the drawing commands (line, rectangle, text, etc.) so they can be replayed later in chunks. I think (but haven't yet shown) that we can do it all by keeping the graphics primitives in memory instead, which is more than an order of magnitude faster. I've appended the full discussion below.

Maybe Aaron can help us parallelize the rendering (MPI-tilerendering?) :-)

I (and Andrew?) will be exploring the options and taking some measurements in the near future. Rendering some larger chromosomes is pretty high on my to-do list, and once that's done we'll have a better idea of how to move forward. I do think that handling large chromosomes is important functionality to demo, right up there with search and community annotation (Ian, Andrew: agree or disagree?).

Mitch

Reasons for the primitive DB:

1. It breaks the genome up by pixel coordinates into smaller chunks for rendering, which takes less RAM per chunk and can be parallelized. On the other hand, we may be able to break up the genome accurately without precomputing and storing all of the primitives. The main concern here is that it's sometimes difficult to predict the pixel span of a feature from its genomic span (especially with text labels). One option is to do the pixel-level layout of the entire track in each rendering job, but only store/render the primitives that overlap the part of the track currently being rendered. Having each chunk recompute the full layout is redundant work, but I'm pretty sure that recomputing the layout is faster than fetching all of the primitives from the database. BTW, this is one of the reasons to explore client-side labels.

2. If a small part of a track changes, we may be able to save work by re-rendering only a small area. As far as I know, we're not yet storing which feature each primitive belongs to, so it's difficult to invalidate the right primitives. Also, adding a new feature to a track can potentially cause a large portion of the track to change (if the track has to be laid out ("bumped") again). And at this point I'm not sure how much we need to support individual feature creation/editing.

3. Doing only the layout/database-fill work up front, and then rendering tiles from the database on demand, reduces the amount of time people have to wait between uploading their annotations and seeing them in the browser. On the other hand, we may be able to do a full in-memory pre-rendering in less time than it takes to fill the database.

Am I missing anything here?
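The core tradeoff above (replaying stored drawing primitives per chunk vs. keeping them in memory after one full-track layout pass) can be sketched roughly like this. This is a hypothetical illustration, not the actual GBrowse/ajax code; all names (Primitive, primitives_for_tile, TILE_WIDTH) are made up for the sketch:

```python
# Sketch: keep graphics primitives in memory after a single full-track
# layout pass, then replay only those overlapping each tile's pixel span.
from dataclasses import dataclass, field

TILE_WIDTH = 1000  # assumed pixels per tile


@dataclass
class Primitive:
    kind: str        # "line", "rect", or "text"
    x_start: int     # leftmost pixel covered by this primitive
    x_end: int       # rightmost pixel covered by this primitive
    args: dict = field(default_factory=dict)  # remaining drawing arguments


def primitives_for_tile(primitives, tile_index, tile_width=TILE_WIDTH):
    """Return the primitives whose pixel span overlaps one tile.

    Layout happens once for the whole track; each rendering job then
    filters in memory instead of querying a primitive database.
    """
    left = tile_index * tile_width
    right = left + tile_width
    return [p for p in primitives if p.x_end >= left and p.x_start < right]


# One full-track layout produces a primitive list; tiles replay subsets.
track = [
    Primitive("rect", 50, 300, {"y": 10}),
    Primitive("text", 1200, 1260, {"label": "geneA"}),
    Primitive("line", 900, 1100, {"y": 20}),
]
print([p.kind for p in primitives_for_tile(track, 0)])  # -> ['rect', 'line']
print([p.kind for p in primitives_for_tile(track, 1)])  # -> ['text', 'line']
```

Note that the line primitive spanning pixels 900-1100 shows up in both tiles, which is exactly the "replay in chunks" behavior the primitive DB was meant to provide, just without the round trip to a database.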