From: Hanno S. <ha...@ha...> - 2011-07-31 20:15:06
Hi.

If you used experimental.catalogqueryplan in Plone 3 or 4.0, you got two distinct features from it: a queryplan implementation for the catalog, and some improvements to btree operations like intersection and difference.

The btree optimizations are now available as a standalone library for Plone 4.1 at http://pypi.python.org/pypi/experimental.btree. The queryplan part has been merged upstream and is now part of ZCatalog itself. It is not a direct port of the code, though; it also integrates the unimr.catalogqueryplan work.

I'd encourage you to use the experimental.btree library and see how much it helps you. It probably only has a noticeable impact if you have at least 10k content objects.

I've gotten some preliminary performance data from norden.org, which has been running on Plone 4.1 for a bit more than a week now. For one week of that we didn't use the btree optimizations, as I hadn't extracted them yet and thought their impact to be minor, which turned out to be very wrong.

The average response time across all requests looks something like this (averaged over about 150k requests / day):

- 500ms for Plone 4 and experimental.catalogqueryplan
- 800ms for Plone 4.1 without btree optimizations (but with the built-in queryplan in ZCatalog)

I don't have a reliable number for 4.1 + btree optimizations yet, as it has only been running over a weekend and after server restarts, both of which skew the results. It does look much better again. If the site ran on default Plone 4.0 without any queryplan support, the average page rendering time would be more like 2-3 seconds, which is basically unworkable.

If you have larger sites and can monitor response times, I'd value any input on what types of pages or catalog queries are still slow. I believe there are still some major improvements to be made, but it's hard to know what realistic data sets generally look like and to write performance tests for them.

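
To make the point about intersection a bit more concrete, here is a minimal sketch of why the relative size of two result sets can matter. This is not the experimental.btree code: plain sorted Python lists stand in for the BTrees sets an index would hold, and the function names and the `ratio` cutoff are made up for illustration.

```python
from bisect import bisect_left


def merge_intersection(a, b):
    """Intersect two sorted lists of unique ints by walking both in step.

    Cost grows with len(a) + len(b), even if one side is tiny.
    """
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result


def lookup_intersection(small, large):
    """Intersect by binary-searching each element of `small` in `large`.

    Cost grows with len(small) * log(len(large)).
    """
    result = []
    for value in small:
        pos = bisect_left(large, value)
        if pos < len(large) and large[pos] == value:
            result.append(value)
    return result


def intersection(a, b, ratio=10):
    """Pick a strategy from the relative sizes.

    `ratio` is an arbitrary, hypothetical cutoff, not a tuned value.
    """
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    if len(small) * ratio < len(large):
        return lookup_intersection(small, large)
    return merge_intersection(a, b)


# Artificial data: 100k "document ids" in one index, 20 of them in another.
all_docs = list(range(100000))
few_docs = list(range(0, 100000, 5000))
matches = intersection(few_docs, all_docs)
```

With very lopsided inputs like these, the lookup variant touches far fewer elements than the merge walk. With inputs of similar size the per-element lookups can cost more than a single pass, which is one plausible way an optimization of this kind could help some data sets and hurt others.
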
We have some basic performance tests in experimental.btree. They show up to a 1000x improvement for some artificial data sets, but a 2x slowdown for others. Clearly we get an overall improvement out of the optimizations, but there might be ways to prevent the negative impact if we understand what data sets actually occur.

I'll keep monitoring the performance and try to identify patterns. Since we changed a whole lot of internal data structures in the indexes and have quite different queryplan code, it's possible that some of the optimizations we did so far don't apply anymore, or that there's room for others.

Optimizingly yours :)
Hanno