From: Mitch S. <li...@ar...> - 2007-02-01 18:21:54
On Thu, 2007-02-01 at 18:56 +1000, Aaron Darling wrote:
> I really liked the demo at http://genome.biowiki.org/
> Would it be possible for my friends at Univ. of Wisconsin to set up their
> own instance and load mammalian genome data into it?

The code is in the gmod CVS on sourceforge, in gmod/Generic-Genome-Browser/ajax. Beyond that, Andrew wrote up how to install the pre-rendering stuff here:

http://biowiki.org/twiki/bin/view/GBrowse/InstallTileRendering

The main obstacle at this point is that rendering a human or mouse genome would take a long time. I've been working on speeding it up, but I haven't checked that work into CVS yet, because we still haven't decided whether we need the graphics primitive database, which I think may be the wrong approach. The graphics primitive database stores all of the drawing commands (line, rectangle, text, etc.) so they can be replayed later in chunks. I think (but haven't yet shown) that we can do it all by keeping the graphics primitives in memory instead, which is more than an order of magnitude faster. I've appended the full discussion below.

Maybe Aaron can help us parallelize the rendering (MPI-tilerendering?) :-)

I (and Andrew?) will be exploring the options and taking some measurements in the near future. Rendering some larger chromosomes is pretty high on my to-do list, and once that's done we'll have a better idea of how to move forward. I do think that handling large chromosomes is important functionality to demo, right up there with search and community annotation (Ian, Andrew: agree or disagree?).

Mitch

Reasons for the primitive DB:

1. It breaks the genome up by pixel coordinates into smaller chunks for rendering, which takes less RAM per chunk and can be parallelized. On the other hand, we may be able to break up the genome accurately without precomputing and storing all of the primitives. The main concern here is that it's sometimes difficult to predict the pixel span of a feature from its genomic span (especially with text labels). One option is to do the pixel-level layout of the entire track in each rendering job, but only store/render the primitives that overlap the part of the track currently being rendered. Having each chunk recompute the full layout is redundant work, but I'm pretty sure that recomputing the layout is faster than fetching all of the primitives from the database. BTW, this is one of the reasons to explore client-side labels.

2. If a small part of a track changes, we may be able to save work by re-rendering only a small area. As far as I know, we're not yet storing which feature each primitive belongs to, so it's difficult to invalidate the right primitives. Also, adding a new feature to a track can potentially cause a large portion of the track to change (if the track has to be laid out ("bumped") again). And at this point I'm not sure how much we need to support individual feature creation/editing.

3. Doing only the layout/database-fill work up front, and then rendering tiles from the database on demand, reduces the amount of time people have to wait between uploading their annotations and seeing them in the browser. On the other hand, we may be able to do a full in-memory pre-rendering in less time than it takes to fill the database.

Am I missing anything here?
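The core tradeoff above (replaying stored drawing primitives per chunk vs. keeping them in memory after one full-track layout pass) can be sketched roughly like this. This is a hypothetical illustration, not the actual GBrowse/ajax code; all names (Primitive, primitives_for_tile, TILE_WIDTH) are made up for the sketch:

```python
# Sketch: keep graphics primitives in memory after a single full-track
# layout pass, then replay only those overlapping each tile's pixel span.
from dataclasses import dataclass, field

TILE_WIDTH = 1000  # assumed pixels per tile


@dataclass
class Primitive:
    kind: str        # "line", "rect", or "text"
    x_start: int     # leftmost pixel covered by this primitive
    x_end: int       # rightmost pixel covered by this primitive
    args: dict = field(default_factory=dict)  # remaining drawing arguments


def primitives_for_tile(primitives, tile_index, tile_width=TILE_WIDTH):
    """Return the primitives whose pixel span overlaps one tile.

    Layout happens once for the whole track; each rendering job then
    filters in memory instead of querying a primitive database.
    """
    left = tile_index * tile_width
    right = left + tile_width
    return [p for p in primitives if p.x_end >= left and p.x_start < right]


# One full-track layout produces a primitive list; tiles replay subsets.
track = [
    Primitive("rect", 50, 300, {"y": 10}),
    Primitive("text", 1200, 1260, {"label": "geneA"}),
    Primitive("line", 900, 1100, {"y": 20}),
]
print([p.kind for p in primitives_for_tile(track, 0)])  # -> ['rect', 'line']
print([p.kind for p in primitives_for_tile(track, 1)])  # -> ['text', 'line']
```

Note that the line primitive spanning pixels 900-1100 shows up in both tiles, which is exactly the "replay in chunks" behavior the primitive DB was meant to provide, just without the round trip to a database.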