From: Mitch S. <li...@ar...> - 2006-11-26 15:38:17
Attachments:
bti-memstorage.patch
ti-prim-api.patch
Hello,

Reading the wiki, it looks like in-memory was the original approach, but it was too slow because it was rendering all the primitives for each tile. As I understand it, storing the primitives in a database buys you the ability to query for just those primitives that overlap the current tile.

After reading Andrew's message that said database access was the bottleneck, I wanted to try another take on the in-memory approach. Instead of storing all the primitives in one big array, I'm keeping an array of arrays of primitives, with one array of primitives per rendering tile. This version of GDRecordPrimitives adds its primitive to the primitive array of each rendering tile that overlaps the primitive. There's also a separate global primitive array.

On the yeast_chr1 that comes with GBrowse, rendering track 3 (named genes) at zoom level 1 with the in-memory patch takes about a third of the time that the DB version does (see below). I haven't gotten Chado going on my machine yet, so I'd be interested in seeing comparisons with/without the patches from anyone who wants to run one on their own data (particularly with larger or more complex tracks).

I've attached two patches (against CVS HEAD). The first one (ti-prim-api.patch) changes the primitive storage API in TiledImage a little so that I could cleanly override those functions in BatchTiledImage; basically, it moves some work between callers and callees so that the in-memory version doesn't have to do the serialization work.

The second patch (bti-memstorage.patch, which depends on the first) changes BatchTiledImage to override the primitive storage methods of TiledImage. I put this in BatchTiledImage because BatchTiledImage is the class that knows about the rendering tile dimensions, which I wanted to use. I thought this was the minimally invasive way to do it; I wanted to make it easy to see what I was trying to do by reading the patch. If there's consensus that this is the way to go, then more reorganization would probably be a good idea.

Regards,
Mitch

This is on a 2.2 GHz Athlon 64:

[mitch@firebolt server]$ patch < ti-prim-api.patch
patching file TiledImage.pm
[mitch@firebolt server]$ mkdir testdb
[mitch@firebolt server]$ time ./generate-tiles.pl -c ~/apache/conf/gbrowse.conf/ -o testdb/ -s yeast_chr1 -m 0 -v 1 --no-xml --print-tile-nums --render-gridlines -l I -r t3z1r1-2303 &> withdb.out

real    4m10.798s
user    2m16.425s
sys     0m6.968s
[mitch@firebolt server]$ patch < bti-memstorage.patch
patching file BatchTiledImage.pm
[mitch@firebolt server]$ mkdir testmem
[mitch@firebolt server]$ time ./generate-tiles.pl -c ~/apache/conf/gbrowse.conf/ -o testmem/ -s yeast_chr1 -m 0 -v 1 --no-xml --print-tile-nums --render-gridlines -l I -r t3z1r1-2303 &> withmem.out

real    1m23.400s
user    1m18.229s
sys     0m2.572s
[mitch@firebolt server]$ diff -r testdb testmem
[mitch@firebolt server]$
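For anyone who wants the gist without reading the patches: the per-tile binning amounts to something like the sketch below. GDRecordPrimitives is the method named above, but the field names, the renderTileWidth value, and the assumption that getBoundingBox returns (x-min, y-min, x-max, y-max) are guesses for illustration, not the patch's actual code.

    # Sketch of per-tile primitive binning. Each rendering tile keeps its
    # own array of primitives; a primitive is pushed onto every tile whose
    # x-range it overlaps, plus onto one global list.
    sub GDRecordPrimitives {
        my ($self, $command, @args) = @_;
        my $prim = [ $command, @args ];

        # Bounding box of the primitive in panel pixel coordinates
        # (return shape assumed for the sketch).
        my ($xmin, $ymin, $xmax, $ymax) =
            $self->getBoundingBox($command, @args);

        my $width = $self->{renderTileWidth};    # assumed field name
        for my $tile (int($xmin / $width) .. int($xmax / $width)) {
            push @{ $self->{tilePrimitives}[$tile] }, $prim;
        }

        # The separate global primitive array mentioned above.
        push @{ $self->{globalPrimitives} }, $prim;
    }

Rendering a tile then only replays that tile's own array instead of scanning every primitive on the track, which is the same overlap query the database was providing.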
From: Ian H. <ih...@be...> - 2006-11-26 20:12:07
Brilliant, thanks Mitch! This is very promising. It might also be interesting to investigate (i) how many features the in-memory approach can handle (e.g. on Drosophila or human chromosomes) and (ii) the performance compared to the render-on-demand approach (which fills the database with primitives, but does not render any tiles until someone views that particular tile).

Ian

On Nov 26, 2006, at 7:38 AM, Mitch Skinner wrote:
> [...]
From: Mitchell S. <mi...@ar...> - 2006-12-04 16:23:05
Well, I've been spending some quality time with D. melanogaster chromosome 4 and Perl's DProf, and I've learned a few things:

* The TiledImage::AUTOLOAD routine was taking up a significant proportion of the time, especially on primitive-intensive tracks like the yeast_chr1 translation tracks at full zoom. I'm working on reimplementing the accessors using closures or Class::Struct, and reimplementing the GD primitive interception with closures like this (in TiledImage):

    foreach my $sub (keys %intercept) {
        no strict "refs";
        *$sub = sub {
            my ($self, @args) = @_;
            my @originalArgs = @args;
            # get bounding box
            my @bb = $self->getBoundingBox($sub, @args);
            ...

* On some tracks, GD::filledRectangle was taking the lion's share of the time (96% on one run that I measured, for a track using the "transcript" glyph). I looked at the GD C code, and gdImageFilledRectangle calls the retail gdImageSetPixel function (which is non-trivial) in a nested loop. Since AFAIK we're not doing any antialiasing or other compositing, it seems like this could be improved a lot. For the immediate future, the advice is to avoid rectangle-based glyphs.

* For the less primitive-intensive tracks (like gene or mRNA) at high zoom levels, the gridlines are big memory users (for the in-memory primitive storage approach). Rendering Dmel chromosome 4 mRNA at zoom 1 without gridlines reduced peak memory usage (RSS as measured by ps) from 653 megabytes to 169 megabytes. If we can find a way to avoid rendering the gridlines (like using PNG transparency, as mentioned on the wiki), or a way to avoid storing the gridline primitives, then I don't think memory usage will be a problem for these kinds of tracks.

* For the primitive-intensive tracks like the translation and dna tracks, I'm hoping that rendering smaller tile ranges will make the jobs fit in RAM. I'm not sure if we can avoid generating all primitives when we're not rendering the whole track, but in any case we should be able to avoid storing all those primitives.

If we can make all those improvements (not a huge if IMO; call it a medium-small if) then I think pre-rendering is the way to go. Unless there are reasons other than speed and memory usage to do render-on-demand? I've been assuming that rendering on demand would be a nontrivial amount of work, not to mention that it's yet another CGI sitting on top of a database (and I'm not sure how well that would scale), but admittedly I haven't thought all the way through it.

If we don't need to render on demand, then AFAICT we don't need the primitive database (?), and then I'll feel comfortable producing a patch that takes out the database bits. It's not like I have some kind of vendetta against the database bits; it's just that making database/memory a run-time option is that much extra work.

Mitch
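For contrast with AUTOLOAD dispatch, closure-generated accessors might look roughly like the following; the field list is invented for illustration, and the real TiledImage has its own.

    # Build one named accessor per field at load time, so each call is
    # ordinary method dispatch rather than an AUTOLOAD name parse.
    foreach my $field (qw(im width height)) {
        no strict 'refs';
        *$field = sub {
            my $self = shift;
            $self->{$field} = shift if @_;    # acts as a setter when given a value
            return $self->{$field};
        };
    }

AUTOLOAD pays for method-name matching on every single call, and the accessors get hit at least once per primitive here, so generating the subs once at load time moves that cost out of the hot path.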
From: Mitch S. <li...@ar...> - 2006-12-13 10:20:32
Attachments:
filledrect-memset.patch
prim-mem.patch
I've implemented some of the things I talked about in my last message; there are two patches attached, one for GD and one for gmod-ajax. With both of them, my hardware can render Dmel chr. 4 mRNA (with the transcript glyph) at zoom 1 in about five and a half minutes. More inline:

On Mon, 2006-12-04 at 08:22 -0800, Mitchell Skinner wrote:
> * the TiledImage::AUTOLOAD routine was taking up a significant
> proportion of the time, especially on primitive-intensive tracks
> like the yeast_chr1 translation tracks at full zoom. I'm working on
> reimplementing the accessors using closures or Class::Struct, and
> reimplementing the GD primitive interception with closures

This is done.

> * on some tracks, GD::filledRectangle was taking the lion's share of
> the time (96% on one run that I measured for a track that was using
> the "transcript" glyph). I looked at the gd c code, and
> gdImageFilledRectangle calls the retail gdImageSetPixel function
> (which is non-trivial) in a nested loop.

The attached GD patch adds a special case to gdImageFilledRectangle that uses memset on non-truecolor images where the rectangle is being filled with a regular color. If no one here sees any problems with it, I'll send it on upstream. The optimal case for the memset version is very wide, short rectangles (say, tens of thousands of pixels across), which is exactly what we draw at full zoom. On rectangles like that, this patch makes gdImageFilledRectangle up to 40x faster on my system.

> * for the less primitive-intensive tracks (like gene or mRNA) at
> high zoom levels, the gridlines are big memory users (for the
> in-memory primitive storage approach). Rendering Dmel chrom 4 mRNA
> at zoom 1 without gridlines reduced peak memory usage (RSS as
> measured by ps) from 653 megabytes to 169 megabytes. If we can find
> a way to avoid rendering the gridlines (like using PNG transparency
> as mentioned on the wiki) or if we can find a way to avoid storing
> the gridline primitives then I don't think memory usage will be a
> problem for these kinds of tracks.

I thought about taking the gridline code from TiledImagePanel and copying it into TiledImage, but that would mean TiledImage would have to know about a lot more than it does now; plus, blurring the line between TiledImagePanel and the rest of the pre-rendering code works against the goal of merging TiledImagePanel back into BioPerl proper. Instead, I took an approach that's a bit less efficient but IMHO cleaner layering-wise: representing all of the gridline information without explicitly storing all of it. In other words, the fact that the gridline line primitives are so regular means that we can store the gridline information a little more compactly. I wrote a class that compactly stores sequences of integers where the numbers mostly increase by the same interval or are mostly the same, and I used it to store the arguments for the vertical line primitives. See CompactList.pm and TiledImage::line in the patch. This reduces memory usage on Dmel chr. 4 mRNA at zoom 1 from 653 megabytes to 225 megabytes, and I think it will scale fairly well to larger chromosomes. The runtime cost is several percent; I think it's a good tradeoff.

> * for the primitive-intensive tracks like the translation and dna
> tracks, I'm hoping that rendering smaller tile ranges will make the
> jobs fit in RAM. I'm not sure if we can avoid generating all
> primitives when we're not rendering the whole track, but in any case
> we should be able to avoid storing all those primitives.

I haven't done anything here yet, but I believe we can do something similar to the gridlines to store those primitives more compactly.

I think this all suggests that pre-rendering large chromosomes will work; when I get a chance I'll load up the other Drosophila chromosomes and try them out.

In the attached patch I've gone ahead and taken out the code related to the primitive database. If people are cool with the patch, then I'll go ahead and commit; if not, I'd appreciate any feedback.

Mitch
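For the curious, the run-length idea behind CompactList can be sketched in a few lines. This is an illustration of the technique described above (storing the sequence as (start, delta, count) runs); the name and interface are invented for the sketch and are not the actual CompactList.pm.

    package CompactListSketch;    # hypothetical; not the real CompactList.pm
    use strict;
    use warnings;

    # Store a sequence of integers as (start, delta, count) runs.
    # Sequences that mostly repeat (delta 0) or mostly step by a fixed
    # interval -- like the x coordinates of vertical gridlines --
    # collapse into a handful of runs.
    sub new { return bless { runs => [] }, shift }

    sub append {
        my ($self, $n) = @_;
        my $runs = $self->{runs};
        if (@$runs) {
            my $r = $runs->[-1];
            if ($r->{count} == 1) {
                # the second element fixes the run's step size
                $r->{delta} = $n - $r->{start};
                $r->{count} = 2;
                return;
            }
            if ($n == $r->{start} + $r->{delta} * $r->{count}) {
                $r->{count}++;    # value continues the current run
                return;
            }
        }
        push @$runs, { start => $n, delta => 0, count => 1 };
    }

    sub get {
        my ($self, $i) = @_;
        for my $r (@{ $self->{runs} }) {
            return $r->{start} + $r->{delta} * $i if $i < $r->{count};
            $i -= $r->{count};
        }
        return undef;    # past the end of the sequence
    }

    1;

A long run of evenly spaced gridline x coordinates then costs one run record instead of one array slot per line, which is the kind of saving behind the 653 MB to 225 MB reduction reported above.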