O(n) rendering engine for Tables would be beneficial.
Through some very informal testing, it appears that the rendering engine for Tables runs in O(n**2) time. While this is not a problem for tables of up to a few thousand rows, it becomes prohibitive as the table gets larger.
Some time samples follow for my report generator running on a 1 GHz processor with 256 MB of RAM. Note that this is for the rendering only; the times I recorded for the data preparation appear to be O(n). Also, the RAM available in the machine was sufficient to perform these runs without significant virtual memory usage. OS is Windows NT 4.0, SP 6.
Rows Rendering time (sec)
1000 3.72
2000 11.75
3000 25.74
4000 46.64
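(As a rough check on that: going from 2000 to 4000 rows - a doubling - multiplies the time by 46.64 / 11.75, which is almost exactly 4, just as O(n**2) predicts; the 1000-to-2000 step is a little under 4, presumably because of a fixed or linear component in the total.)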
Ken Brubaker
Intercon Security
ken_brubaker@interconsecurity.com
Logged In: NO
Indeed. Table generation can be very slow even on my dual 2.4 GHz Xeon with 2 GB of RAM. OS is Red Hat 8, but with Python 2.3 and RL 1.19.
I've found that making sure to build _rl_accel helps, and using Psyco can help some more, but it's still far from fast. The main issue is that it makes it much harder to use ReportLab for interactive jobs, such as CGI PDF generation or as a plug-in print system for an app.
Logged In: YES
user_id=639504
Going by your numbers, I assume you're already passing a
rowHeights= parameter to the Table constructor. Without that,
even a 2000-row run on my laptop (PIII-800) takes a bit over a
minute. With rowHeights passed explicitly, execution takes five
seconds.
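For anyone else hitting this, a minimal sketch of what I mean (the data and the 14-point row height are just placeholders; the point is the explicit rowHeights= argument):

from reportlab.platypus import SimpleDocTemplate, Table

# Dummy data standing in for the real report rows.
data = [[str(i), 'some text', 'more text'] for i in xrange(2000)]

# Passing rowHeights explicitly avoids the per-row height calculation
# that otherwise dominates the run time.
table = Table(data, rowHeights=[14] * len(data))

doc = SimpleDocTemplate('report.pdf')
doc.build([table])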
A little profiling revealed that the _calc_height method of the
Table class was the #1 CPU time eater by far when no
rowHeights parameter was passed.
When I do pass rowHeights, I'm seeing much the same performance behaviour as you.
Some more profiling shows that the functions accounting for most of the time when a rowHeights param is passed are the bodies of:
Table.__init__ (50% on a 4000-row sample)
and
CellStyle1.__init__ (another 10%)
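For reference, a profiling run along these lines (standard-library profile module; the data and file names are just placeholders) is enough to see where the time goes:

import profile, pstats
from reportlab.platypus import SimpleDocTemplate, Table

# Throwaway data just to exercise the table machinery.
data = [[str(i), 'text', 'text'] for i in xrange(4000)]

def build():
    doc = SimpleDocTemplate('profiled.pdf')
    doc.build([Table(data, rowHeights=[14] * len(data))])

profile.run('build()', 'table.prof')
pstats.Stats('table.prof').sort_stats('cumulative').print_stats(20)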
BTW, my first comment on reading tables.py: "Where are the
docstrings?!?!?".
Anyway, if I add some print statements and run over a 100,000-row table, part of what is happening becomes clear:
STARTING <instance object at 0x4035794c>
DONE: 100000 passes
STARTING <instance object at 0x403ac0ec>
DONE: 56 passes
STARTING <instance object at 0x403ac1ac>
DONE: 99944 passes
STARTING <instance object at 0x403ac46c>
DONE: 56 passes
STARTING <instance object at 0x403ac30c>
DONE: 99888 passes
STARTING <instance object at 0x403ac4cc>
DONE: 56 passes
STARTING <instance object at 0x403ac88c>
DONE: 99832 passes
STARTING <instance object at 0x403ac5ac>
DONE: 56 passes
STARTING <instance object at 0x403ace4c>
DONE: 99776 passes
(etc.) Each time Table.split is called, the table splits off another roughly page-sized chunk, returns that, and creates a whole new table instance to hold the rest. That table then has Table.split called on it in turn. The Table.split function looks like it's written to be called sort of recursively like this - effective on a small table, less nice on a large one.
I haven't calculated it out, but intuitively this behaviour would
seem to be consistent with the O(N**2) completion time you
mentioned. Each size doubling would mean twice the number
of split calls (assuming fixed row heights), with the chunks of
data starting off at twice the size. Ouch.
This doesn't appear to be the only problem, but it's certainly not helping.
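To make the shape of that concrete, here's a toy illustration - not ReportLab code - of the peel-one-page-off-and-rebuild-the-rest pattern (in the real code the "rebuild" is a full Table re-construction over the remaining rows, which is far more expensive than a list slice):

def split_repeatedly(rows, rows_per_page):
    # Peel one page off the front and rebuild the remainder each pass.
    # With N rows the remainder is rebuilt every time, so the total work
    # is roughly N + (N - p) + (N - 2p) + ..., which comes to about
    # N**2 / (2 * p) operations for page size p - quadratic in N.
    pages = []
    while rows:
        pages.append(rows[:rows_per_page])
        rows = rows[rows_per_page:]   # copies the whole remainder
    return pages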
I'm looking into this more now, but I'm relatively new to Python and ReportLab, so no results guaranteed. The one-character variables, near-total absence of comments, and lack of docstrings aren't helping me one bit - *sigh*.
Logged In: YES
user_id=639504
OK. I've done a gruesomely ugly special-case hack as a test. It currently doesn't work with repeatRows, completely breaks styling, and probably fails on a document with anything but a table in it. The goal was to see if Table.split was really the problem - breaking everything be damned, it's not like I'd ever use this test code (or show it to anybody - *ICK!*).
Basically, I rewrote Table.split to loop over the table data, create a new Table instance for each page-sized block, and then return the lot in one go.
It does seem to be VASTLY faster on larger tables, but I can't
be sure until I fix repeatRows and styles. Even then, the
handling of varying target drawing areas will still be utterly
broken, making it completely useless.
It did, however, do the job - isolating the speed problem down
to the split method.
If I contrast (a) the original tables.py, (b) my mangled tables.py, and (c) the original with the same code (for styles and repeatRows) removed as I removed from the mangled one, I see run times like these:
4000 rows:
(a) 20s
(b) 5s
(c) 19s
Note also that this is now without repeatRows (I was using
repeatRows=1 when seeing times vaguely approximating
yours).
Of course, while the PDF output from my hacked up version
_looks_ OK, it could be horribly broken in some way, too. This
is just an early test.
Hmm... the results at 8000 rows are even more promising (min:sec):
(a) 1:20
(b) 0:10
(c) 1:19
The core changes I've made to do this are (a rough sketch follows below):
(a) scan self._rowHeights to see if all values are equal.
(b) If the row heights are all the same, work out how many rows fit on a page ("n") and hence how many tables we'll need to hold all the results.
(c) Loop over xrange(num_tables), break out an "n"-sized block of rows from "data" on each pass, create a new Table instance with that data, and add it to a list of tables to return.
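A rough sketch of those steps (styles, repeatRows and column widths are ignored here, which is exactly what's broken about the hack; in the real thing the data and heights come from self._cellvalues / self._rowHeights):

from reportlab.platypus import Table

def fast_split(data, row_heights, avail_height):
    # (a) only usable when every row height is the same
    if min(row_heights) != max(row_heights):
        return None                      # caller falls back to the old path
    row_h = row_heights[0]
    # (b) rows per page, hence how many tables we'll end up with
    n = max(1, int(avail_height / row_h))
    # (c) carve the data into n-row blocks, one new Table per block
    tables = []
    for start in xrange(0, len(data), n):
        block = data[start:start + n]
        tables.append(Table(block, rowHeights=[row_h] * len(block)))
    return tables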
This avoids the recursive table re-creation. HOWEVER, I suspect the current system does what it does for a reason. My best guess is that it handles varying frame sizes that way. After all, if you have 3 pages - one with room for 25 rows, one with room for 50, and one with room for another 50 - you don't want to split into 5 tables, only 3. *arrggh*.
There has to be a better way of doing this - either handling the
case where all row heights are equal _and_ all frame sizes
are equal separately, or reducing the penalty for the repeated
Table class re-creation.
Ideas? Maybe subclass Table to create one that can't handle frames of varying size etc., but is vastly faster? It could draw the first page normally, then for the second and subsequent pages use the method above instead - that could work (a rough sketch follows below).
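Very roughly, something like this ("FastTable" is a made-up name, and _rowHeights / _cellvalues / _colWidths are Table internals, so treat it as pseudocode rather than a patch):

from reportlab.platypus import Table

class FastTable(Table):
    def split(self, availWidth, availHeight):
        h = self._rowHeights
        if not h or min(h) != max(h):
            # Varying row heights: keep the original, general behaviour.
            return Table.split(self, availWidth, availHeight)
        n = max(1, int(availHeight / h[0]))   # rows per page
        data = self._cellvalues
        if n >= len(data):
            return [self]
        # Uniform heights: hand back every page-sized piece in one go
        # instead of re-splitting the remainder recursively.
        pieces = []
        for start in xrange(0, len(data), n):
            block = data[start:start + n]
            pieces.append(Table(block, colWidths=self._colWidths,
                                rowHeights=[h[0]] * len(block)))
        return pieces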
My 20,000-row test just completed as I was about to send this. Run time: 24 seconds on a PIII/800 laptop. The split method is definitely the problem, as this would've taken many minutes with the original code. In fact, a 100,000-row run just finished in 2 min 11 sec - all 1755 pages and 2.68 MB of it :-)
(Sorry if I'm a bit incoherent here. It's almost 1:00AM and I've
been awake for _far_ too long).
I'll try to assemble this into a useful post and ask on the list if I
don't see anything here in the next few days. I'm going to be
rather busy and won't be able to do more on it for a little while.
Logged In: YES
user_id=639504
I did a few quick calculations on my test numbers, just out of curiosity. The results are interesting:
4000 rows : 800 rows/sec
8000 rows : 800 rows/sec
20000 rows : 833 rows/sec
100000 rows: 763 rows/sec
In other words, it really does look like the repeated table re-splitting and re-creation is the killer.
I also find it amusing that I get almost exactly one row
generated per million CPU cycles ;-)