[Plone-developers] Plone 4.1 - btree query improvements


Hi.

If you used experimental.catalogqueryplan in Plone 3 or 4.0 you got
two distinct features from it. First a queryplan implementation for
the catalog, but also some improvements to btree operations like
intersection and difference.
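
To give a rough idea of the kind of btree-level shortcut meant here:
when one operand is much smaller than the other, an intersection can
probe the large set once per key of the small one instead of walking
both. The following is only a sketch, with plain Python sets standing
in for the real BTrees structures. The function names and the `ratio`
cutoff are made up for illustration and are not the experimental.btree
API.

```python
def size_aware_intersection(a, b, ratio=10):
    """Intersect two sets, probing the larger one when sizes differ a lot.

    `ratio` is a hypothetical cutoff, not a value taken from any library.
    """
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    if len(small) * ratio <= len(large):
        # Few keys on one side: one membership test per small-set key.
        return {key for key in small if key in large}
    # Similar sizes: fall back to the generic set intersection.
    return small & large


def size_aware_difference(a, b):
    """Return the keys of `a` that are not in `b`."""
    if len(b) < len(a):
        # Fewer keys to remove than to keep: copy `a`, then discard.
        result = set(a)
        result.difference_update(b)
        return result
    # `a` is the smaller side: keep only its keys that are missing from `b`.
    return {key for key in a if key not in b}
```

Both helpers return the same results as the generic operations; only
the amount of work depends on the relative sizes of the operands.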

The btree optimizations are now available as a standalone library for
Plone 4.1 in http://pypi.python.org/pypi/experimental.btree. The
queryplan part has been merged upstream and is now part of ZCatalog
itself. It isn't a direct port of the code, though: the
unimr.catalogqueryplan work was integrated into it as well.

I'd encourage you to use the experimental.btree library and see how
much it helps you. It probably only has any noticeable impact if you
have at least 10k content objects.

I've gotten some preliminary performance data from norden.org, which
has been running on Plone 4.1 for a bit more than a week now. For one
week of that we didn't use the btree optimizations, as I hadn't
extracted them yet and thought their impact would be minor - which
turned out to be very wrong.

The average response time across all requests looks something like
this (averaged over about 150k requests / day):

- 500ms for Plone 4 and experimental.catalogqueryplan
- 800ms for Plone 4.1 without btree optimizations (but with the
  built-in queryplan in ZCatalog)

I don't have a reliable number for 4.1 + btree optimizations yet, as
it has only been running over a weekend, after server restarts that
skew the results. It does look much better again, though.

If the site ran on default Plone 4.0 without any queryplan support,
the average page rendering time would be more like 2-3 seconds - which
basically means unworkable.

If you have larger sites and can monitor response times, I'd value any
input on what types of pages or catalog queries are still slow. I
believe there are still some major improvements to be made, but it's
hard to know which data sets are generally realistic and to write
performance tests for those.

We have some basic performance tests in experimental.btree. Those
show up to a 1000x improvement for some artificial data sets, but a 2x
slowdown for others. Clearly we get an overall improvement out of the
optimizations, but there might be ways to prevent the negative impact
if we understand what data sets actually occur.
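
If you want to try different data shapes yourself, a micro-benchmark
along these lines might be a starting point. It is a sketch only: it
uses plain Python sets and timeit rather than the actual test suite in
experimental.btree, and the data shapes and function names are
invented for illustration.

```python
import timeit


def make_data_sets():
    """Return hypothetical (name, small, large) shapes to compare."""
    large = set(range(100000))
    return [
        ("tiny vs large", set(range(0, 100000, 10000)), large),
        ("similar sizes", set(range(50000, 150000)), large),
        ("disjoint", set(range(200000, 200100)), large),
    ]


def probe_intersection(small, large):
    """Intersect by testing each key of `small` against `large`."""
    return {key for key in small if key in large}


def time_intersections(repeat=3):
    """Map each shape name to (generic_seconds, probe_seconds)."""
    results = {}
    for name, small, large in make_data_sets():
        # Best of `repeat` single runs for the generic set operation.
        generic = min(timeit.repeat(lambda: small & large,
                                    number=1, repeat=repeat))
        # Same measurement for the key-by-key probing variant.
        probe = min(timeit.repeat(lambda: probe_intersection(small, large),
                                  number=1, repeat=repeat))
        results[name] = (generic, probe)
    return results


if __name__ == "__main__":
    for name, (generic, probe) in time_intersections().items():
        print("%-15s generic=%.6fs probe=%.6fs" % (name, generic, probe))
```

Swapping in shapes taken from a real catalog (index sizes, typical
result set sizes) is the interesting part; the numbers for artificial
shapes like the ones above say little on their own.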

I'll keep monitoring the performance and try to identify patterns.
Since we changed a whole lot of internal data structures in indexes
and have quite different queryplan code, it's possible that some of
the optimizations we did so far don't apply anymore or that there's
room for others.

Optimizingly yours :)
Hanno