Thread: [Gmod-ajax] using the panel to do tiling

Over the last week or so I've been experimenting with a different way of 
doing the rendering.  Performance-wise, it takes significantly less time 
and space.  Correctness-wise, I haven't found any problems but checking 
it is a bit difficult.  Code-elegance wise, it's worse, but I think that 
with some Panel api changes it could be cleaned up a lot by moving most 
of the code into a Panel subclass (it currently fiddles in odd ways with 
some of the Panel's state).

For yeast_chr1, it takes 3 and a half minutes to render all tracks + all 
zooms and uses 93MB of RAM.  This is about 1/4 the space and less than 
1/4 of the time taken by current CVS HEAD with in-memory primitive 
storage. For Drosophila chr. 4 (all tracks + zooms), it takes 31 and a 
half minutes and uses 200 MB of RAM, which is less than 1/6 the space 
and about 40% of the time taken by CVS HEAD.

I've put a set of tiles generated with this code here:
http://genome.biowiki.org/gbrowse/dmel-noti/prototype_gbrowse.html
http://genome.biowiki.org/gbrowse/yeast_chr1-noti/prototype_gbrowse.html
and I'd appreciate any reports of incorrectly rendered tiles there.

The rest of this email is a description of why I took this approach and 
how it's done.  If you just want to render big chromosomes without 
reading all the details, then I'll be committing this code soon, either 
to HEAD or on a branch.

There were two things that pointed me in this direction: the gridline 
thing, and the empty tile thing.

The gridline thing was when I tried to avoid storing gridline primitives 
by just drawing the first tile's gridlines on every tile, without going 
through TiledImage.  At first I thought I had to use the first tile's 
gridlines because otherwise the gridlines would have been drawn 
off-tile (because without going through TiledImage I didn't have 
TiledImage's primitive position translation functionality).  The problem 
was that the first tile's gridlines were _different_ from the rest, 
because of the Panel's edge behavior at the first gridline.

This all could have been solved (by copying and adjusting the gridline 
code, if nothing else), but there's a similar issue with "global 
feature" tracks like DNA and translation, and the ruler.  For those, and 
for the gridlines, I wanted to be able to generate just one rendering 
tile's worth of primitives at a time, and not have to store all of the 
primitives for the entire chromosome, which take up a lot of space on 
these primitive-intensive tracks.

My solution was to create a Panel for each rendering tile, and use that 
to draw the gridlines and global features.  One problem with this 
approach is what happens when some primitive runs off the end of a 
rendering tile (which is one of the main reasons TiledImage exists in 
the first place).  For the DNA and translation tracks, there are no 
labels or other primitives that extend in unpredictable ways, so it's 
not a big problem there.  For the ruler (where the labels do have some 
extra width), my solution was to have the per-tile Panel extend a short 
distance (currently 100px) beyond the rendering tile on both sides.  
This way, any primitive which extends less than 100px off-tile does get 
rendered correctly on both sides of the tile boundary.  The distance is 
set by the "$global_padding" variable in my experimental version of 
generate-tiles.pl.
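
To make the padding arithmetic concrete, here is a minimal sketch of how the padded per-tile bounds come out. The helper name padded_tile_bounds and the numbers (1000px rendering tiles, 100px padding) are made up for illustration; only the arithmetic follows the appended code.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of generate-tiles.pl): pixel bounds of the
# padded per-tile Panel, in chromosome-wide pixel coordinates.
sub padded_tile_bounds {
    my ($tile_index, $tile_width, $padding) = @_;

    # The first tile has nothing to its left, so it gets no left padding.
    my $left_pad = ($tile_index == 0) ? 0 : $padding;

    my $left  = $tile_index * $tile_width - $left_pad;
    my $right = ($tile_index + 1) * $tile_width + $padding - 1;

    # $left_pad is also the x offset of the real tile inside the padded image.
    return ($left, $right, $left_pad);
}

my ($left, $right, $offset) = padded_tile_bounds(2, 1000, 100);
print "panel spans pixels $left..$right; tile starts $offset px into it\n";
# prints "panel spans pixels 1900..3099; tile starts 100 px into it"
```

The returned offset is what has to be skipped when the padded image is later cut into small tiles.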

The empty tile thing had to do with the fact that GD::Image::png was taking 
up a lot of time in the profiles I was generating.  I figured I could 
use the glyph boxes used by the imagemap code to figure out which tiles 
were blank, and avoid generating those (by hardlinking the file name to 
a previous blank tile).  This worked, and it gave a speedup on CVS HEAD 
of up to 12% on some tracks, but it got me thinking: if I knew the pixel 
span of each glyph in advance (which is what those boxes provide) then 
for each rendering tile I could use that information to only render the 
glyphs that overlap the tile.  I also spent some time reading the panel 
code, and I realized that I could get the TiledImage primitive position 
translation functionality almost for free by giving the Panel a negative 
pad_left value.  And only rendering a tile's worth of glyphs at a time 
did something similar to TiledImage's "only render the current tile's 
worth of primitives" functionality, without having to store any primitives.
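
As a rough sketch of the bucketing step, assuming each glyph box is reduced to a name plus a left and right pixel position: the helper bucket_by_tile and the sample data are hypothetical, and padding is ignored to keep it short.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Hypothetical helper: group glyph boxes by the rendering tiles they overlap.
# Each box is [name, left_px, right_px] in chromosome-wide pixel coordinates.
sub bucket_by_tile {
    my ($boxes, $tile_width) = @_;
    my @per_tile;
    for my $box (@$boxes) {
        my (undef, $left, $right) = @$box;
        my $first = floor($left / $tile_width);
        my $last  = floor($right / $tile_width);
        $first = 0 if $first < 0;
        # A box that straddles a tile boundary lands in every tile it touches.
        push @{ $per_tile[$_] }, $box for $first .. $last;
    }
    return \@per_tile;
}

my $per_tile = bucket_by_tile(
    [ [ 'geneA', 100, 400 ], [ 'geneB', 900, 1100 ], [ 'geneC', 2500, 2600 ] ],
    1000,
);
for my $tile (0 .. $#$per_tile) {
    my @names = map { $_->[0] } @{ $per_tile->[$tile] || [] };
    print "tile $tile: @names\n";
}
# prints:
# tile 0: geneA geneB
# tile 1: geneB
# tile 2: geneC
```

Tiles that no box overlaps stay undefined in the array, which is what makes a cheap "is this tile blank?" check possible. Drawing a tile's glyphs with an x shift of minus the tile's left edge then puts them at tile-local positions, which is the effect the negative pad_left has.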

For non-global features I'm still using a chromosome-wide Panel to do 
the bumping, so the layout is still right.

So this approach doesn't use TiledImage (or BatchTiledImage, or 
DBPrimStorage, or MemoryPrimStorage) at all.  Which is a fairly radical 
change IMO but it's the only way I see to really scale to large 
chromosomes.  I do like the fact that TiledImage is a nice clean 
abstraction, but there's no way to store a human chr. 1 worth of 
primitives in memory, and even if you had an infinitely fast disk 
storage method for primitives the (de)serialization overhead would still 
kill you, as far as I can tell.  Actually, now that I think about it, I 
remember Data::Dumper (serialization) taking a nontrivial amount of time 
in the database primitive storage profiling I did last year, but I'm not 
sure about eval (deserialization).

As for whether we should ditch TiledImage, I think there are two 
remaining questions: rendering on demand and correctness.  If this 
approach can do both of those things, then I think it's the way we 
should go.

I believe this can be applied to the rendering-on-demand scenario if we 
use mod_perl and do the layout step on startup.  This would take a fair 
amount of RAM but it's only necessary for tracks that haven't been fully 
rendered yet.  One plus of that approach is that handling single new 
features gets easier.  Storing the layout in a database is theoretically 
possible but saving and restoring that information seems pretty 
complicated, unless we just serialize the entire panel.

So far, I've been testing my changes by doing diffs of the tiles; I'm 
pretty sure I've only committed changes that generate tiles that are 
bit-exactly the same as before.  The tiles that I've generated with this 
approach aren't the same bit-for-bit, but they do look the same (with 
one exception: right ends of genes are now getting rendered correctly).  
I think the difference is in the palette, so the tiles could still be 
correct even if they're different.  So I'm not yet fully convinced that 
it's rendering exactly correctly, but it does look right to me.
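
One way to test the palette theory would be to compare tiles by resolved colour instead of by bytes. The sketch below uses a made-up in-memory stand-in for a palette image (a palette list plus a list of palette indices) so that it runs without GD; same_colours is a hypothetical name. With real tiles, the same loop could presumably be written with GD's getPixel and rgb methods.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for a palette image:
# { palette => [ [r,g,b], ... ], pixels => [ palette indices, row-major ] }
sub same_colours {
    my ($img_a, $img_b) = @_;
    return 0 unless @{ $img_a->{pixels} } == @{ $img_b->{pixels} };
    for my $i (0 .. $#{ $img_a->{pixels} }) {
        # Resolve each pixel through its own image's palette before comparing.
        my $rgb_a = join ',', @{ $img_a->{palette}[ $img_a->{pixels}[$i] ] };
        my $rgb_b = join ',', @{ $img_b->{palette}[ $img_b->{pixels}[$i] ] };
        return 0 if $rgb_a ne $rgb_b;
    }
    return 1;
}

# Same picture, but the palette entries are stored in a different order.
my $old = { palette => [ [255,255,255], [0,0,0] ], pixels => [0, 1, 1, 0] };
my $new = { palette => [ [0,0,0], [255,255,255] ], pixels => [1, 0, 0, 1] };
print same_colours($old, $new) ? "same picture\n" : "different picture\n";
# prints "same picture"
```

Two tiles that pass this check but fail a byte-for-byte diff would differ only in how the palette is laid out.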

If you're curious, I've appended the code for the meat of the tile 
rendering below.  The things to pay attention to here are how the 
per-tile panel is set up, any if statement that checks $is_global, and 
the $small_tile_gd->copy call near the end.  @per_tile_glyphs is an 
array of arrays; for each rendering tile, it has an array of the glyphs 
that overlap that tile.

Comments?

Mitch

    for (my $x = $first_large_tile; $x <= $last_large_tile; $x++) {
        my $large_tile_gd;
        my $pixel_offset = (0 == $x) ? 0 : $global_padding;

        # we want to skip rendering whole tile if it's blank, but only if
        # there's a blank tile to which to hardlink that's already rendered
        if (defined($per_tile_glyphs[$x]) || (!defined($blankTile))) {

            # rendering tile bounds in pixel coordinates
            my $rtile_left = ($x * $rendering_tilewidth) - $pixel_offset;
            my $rtile_right = (($x + 1) * $rendering_tilewidth)
                              + $global_padding - 1;
            # rendering tile bounds in bp coordinates
            my $first_base = ($rtile_left / $big_panel->scale) + 1;
            my $last_base = int(($rtile_right / $big_panel->scale) + 1);

            #print "pixel_offset: $pixel_offset first_base: $first_base last_base: $last_base " . tv_interval($start_time) . "\n";

            # set up the per-rendering-tile panel, with the right
            # bp coordinates and pixel width
            my %tpanel_args = %$panel_args;
            $tpanel_args{-start} = $first_base;
            $tpanel_args{-end} = $last_base;
            $tpanel_args{-stop} = $last_base;
            $tpanel_args{-width} = $rtile_right - $rtile_left + 1;
            my $tile_panel = Bio::Graphics::Panel->new(%tpanel_args);

            if ($is_global) {
                # for global features we can just render everything
                # using the per-tile panel
                my @segments = $CONFIG->name2segments($landmark_name . ":"
                                                      . $first_base . ".."
                                                      . $last_base,
                                                      $db, undef, 1);
                my $small_segment = $segments[0];
                $tile_panel->add_track($small_segment, @$track_settings);
                $large_tile_gd = $tile_panel->gd();
            } else {
                # add generic track to the tile panel, so that the
                # gridlines have the right height
                $tile_panel->add_track(-glyph => 'generic',
                                       @$track_settings,
                                       -height => $image_height);
                $large_tile_gd = $tile_panel->gd();
                #print "got tile panel gd " . tv_interval($start_time) . "\n";

                if (defined $per_tile_glyphs[$x]) {
                    # some glyphs call set_pen on the big_panel;
                    # we want that to go to the right GD object
                    $big_panel->{gd} = $large_tile_gd;

                    #move rendering onto the tile
                    $big_panel->pad_left(-$rtile_left);

                    # draw the glyphs for the current rendering tile
                    foreach my $glyph (@{$per_tile_glyphs[$x]}) {
                        # sometimes the glyph positions itself
                        # using the panel's pad_left, sometimes
                        # it just uses the x-coordinate it gets
                        # in the draw method.  We want them both
                        # to be -$rtile_left.
                        $glyph->draw($large_tile_gd, -$rtile_left, 0);
                    }
                }
            }
            $tile_panel->finished;
            $tile_panel = undef;
        }

        # now to break up the large tile into small tiles and write them
        # to PNG on disk...
      SMALLTILE:
        for (my $y = 0; $y < $small_per_large; $y++) {
            my $small_tile_num = $x * $small_per_large + $y;
            if ( ($small_tile_num >= $first_tile)
                 && ($small_tile_num <= $last_tile) ) { # do we print it?
                my $outfile = "${tile_prefix}${small_tile_num}.png";

                if (!$is_global) {
                    writeHTML($tile_prefix, $x, $y, $small_tile_num,
                              $tilewidth_pixels, $image_height,
                              $track_num, $html_current_outdir,
                              $per_tile_glyphs[$x]);

                    if (!defined($nonempty_smalltiles[$x]{$y})) {
                        if (defined($blankTile)) {
                            #print "linking $outfile to $blankTile\n";
                            # "or" (not "||") so that die applies to link's
                            # return value rather than to $outfile
                            link($blankTile, $outfile)
                                or die "could not link blank tile: $!\n";
                            next SMALLTILE;
                        } else {
                            $blankTile = $outfile;
                        }
                    }
                }
                open (TILE, ">${outfile}")
                    or die "ERROR: could not open ${outfile}!\n";

                my $small_tile_gd = GD::Image->new($tilewidth_pixels,
                                                   $image_height,
                                                   0);
                $small_tile_gd->copy($large_tile_gd,
                                     0, 0,
                                     $y * $tilewidth_pixels + $pixel_offset, 0,
                                     $tilewidth_pixels, $image_height);

                print TILE $small_tile_gd->png
                    or die "ERROR: could not write to ${outfile}!\n";

                warn "done printing ${outfile}\n" if $verbose >= 2;
            }
        }
    }
