[Apc-aa-coders] MySQL optimizations

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

Hi there

We have a particular ActionApps-driven site on its own virtual server,
because it has some serious CPU, memory and disk demands.

The main slice has 40 000 items, but the problem is the number of slow
queries, most of which are RSS feeds, jsview "top 10" lists, tickers
and so on.  A typical query from such a view looks like:

SELECT DISTINCT item.id as itemid , item.publish_date as s0 FROM item
LEFT JOIN content as c0 ON (c0.item_id=item.id AND
(c0.field_id='place...........' OR c0.field_id is NULL)) LEFT JOIN
content as c1 ON (c1.item_id=item.id AND
(c1.field_id='category........' OR c1.field_id is NULL)) WHERE
item.slice_id = @myslice AND  (  ( item.status_code=1 AND
(item.publish_date <= '1194184800')  AND (item.expiry_date >
'1194184800')  ) )  AND ( (c0.text LIKE "%Heal%") ) AND ( (c1.text
LIKE "Somestring%") ) ORDER BY item.publish_date DESC;

# Query_time: 11  Lock_time: 0  Rows_sent: 2364  Rows_examined: 107941

2364 rows are returned to PHP, but in fact the list length specified
in the view is only 6.  By using the index optimisation described at
http://dev.mysql.com/doc/refman/5.0/en/limit-optimization.html
much less of the database should need to be scanned. What would be
needed would be to drop the DISTINCT, and a LIMIT 6 for this type of
view.  This optimization works correctly under MySQL 5.0 given the
appropriate indexing, although really it should work even if DISTINCT
is still present (if MySQL could de-duplicate as results were
returned).

So, firstly, is DISTINCT really necessary in the query at
/include/searchlib.php3 line 1291 in QueryZIDs()?

Just dropping 'DISTINCT' in phpMyAdmin, I get the same dataset
returned in a fraction of the time, under a second.  So these views
should be much faster if if an appropriate index of the item table is
created if it's not already there, e.g. (slice_id, status_code,
last_edit), and we pass the view's "listlen" and "list_from"
properties to LIMIT and OFFSET options respectively.

With DISTINCT on the query above, I get
2375 rows in set (14.67 sec)

Without DISTINCT and with LIMIT 6, I get
6 rows in set (0.01 sec)
That is, the same ordering, but just the top 6 results.

I also wonder whether this affects scrollers?  Would it be possible to
use "SELECT SQL_CALC_FOUND_ROWS" when a scroller needs to determine
FOUND_ROWS(), but use LIMIT and OFFSET instead when looking at
particular sections?  The type of view could be passed to
QueryZIDs() as well as the listlen.  More simply the listlen parameter
would only be passed if it were a view like that above which didn't
include a scroller.  If scrollers are to take advantage of the same
optimisation, perhaps there could be a different scroller alias which
only links to 'next page'.

The site performance is bad enough that I'm tempted to leap in and
hack some code, but what might go wrong?  Is there a schematic of the
AA code anywhere?

Thanks

Cedric

[Apc-aa-coders] MySQL optimizations

ActionApps - easy and automated publishing and sharing for websites

[Apc-aa-coders] MySQL optimizations