From: Honza M. <hon...@ec...> - 2007-11-04 19:38:56
|
Hi Cedric, It would be really nice if you check the database queries in AA and sugest the optimalizations. We tried to make the queries as fast as possible, but it is always better to check it. Back to your tests. a) DISTINCT I tried to replicate the test you described, but I didn't get so significant results as you did. The problem is I think, that MySQL caches the queries, so if you perform the same query twice, then the second query could be thousands times faster than the first one. If you use similar queries, where the only difference is in DISTINCT or LIMIT, then the second query most probably use the MySQL cache, so the comparison is a bit hard. Can you please test your case taking the MySQL cache into account (so change not only DISTINCT, but also the condition "Somestring%" ...). Personaly I do not believe the DISTINCT could generate big query slowdown, since it just removes duplicate results, which is very easy operation (comparing to searching the database). Also it is not so easy to remove the DISTINCT in general, since many queries in AA leads to duplicate IDs returned (mainly if you try to search or order the results using the field which could contain multivalues). Duplication of IDs in unwanted in most cases. b) LIMIT The same as above is valid - My test results wasn't so significant - the speedup wasn't big if I take MySQL cahce into account. Can you please test it again bypassing the MySQL cache? The AA code do not use LIMIT in this case. We use different approach: 1) get ALL item IDs which matches the filtering conditions (using QueryZids() function). We grab only IDs (not the content of items), so the result set is not so big, so the LIMIT shouldn't help much, I think. 2) check, which ids we want (looking at scroller setting, page, random parameter, ....) and for selected IDs we ask for all the item data (using GetItemContent() function) 3) Then we display the selected items using AA_Stringexpand functions. This approach have some advatages (comaring to LIMIT solution) - you can display random item, you know the number of all matching items (good for scroller), ... This approach we use on many places in the code, so the change to LIMIT solution is not so easy (and I'm not sure, if the LIMIT solution is better at this moment). What looks bad in your query is the '(c0.text LIKE "%Heal%")' part. This part uses substring search, which is always slow, since the database is not able to use database indexes so it have to check all fields for that substring. This is slow. If you can, do not use such queries. Using "=" operator or "RLIKE" operator in AA conds[] is much , much better, than the "LIKE" operator. Honza Dne ne 4. listopadu 2007 Cedric Knight, GreenNet napsal(a): > Hi there > > We have a particular ActionApps-driven site on its own virtual server, > because it has some serious CPU, memory and disk demands. > > The main slice has 40 000 items, but the problem is the number of slow > queries, most of which are RSS feeds, jsview "top 10" lists, tickers > and so on. A typical query from such a view looks like: > > SELECT DISTINCT item.id as itemid , item.publish_date as s0 FROM item > LEFT JOIN content as c0 ON (c0.item_id=item.id AND > (c0.field_id='place...........' OR c0.field_id is NULL)) LEFT JOIN > content as c1 ON (c1.item_id=item.id AND > (c1.field_id='category........' OR c1.field_id is NULL)) WHERE > item.slice_id = @myslice AND ( ( item.status_code=1 AND > (item.publish_date <= '1194184800') AND (item.expiry_date > > '1194184800') ) ) AND ( (c0.text LIKE "%Heal%") ) AND ( (c1.text > LIKE "Somestring%") ) ORDER BY item.publish_date DESC; > > # Query_time: 11 Lock_time: 0 Rows_sent: 2364 Rows_examined: 107941 > > 2364 rows are returned to PHP, but in fact the list length specified > in the view is only 6. By using the index optimisation described at > http://dev.mysql.com/doc/refman/5.0/en/limit-optimization.html > much less of the database should need to be scanned. What would be > needed would be to drop the DISTINCT, and a LIMIT 6 for this type of > view. This optimization works correctly under MySQL 5.0 given the > appropriate indexing, although really it should work even if DISTINCT > is still present (if MySQL could de-duplicate as results were > returned). > > So, firstly, is DISTINCT really necessary in the query at > /include/searchlib.php3 line 1291 in QueryZIDs()? > > Just dropping 'DISTINCT' in phpMyAdmin, I get the same dataset > returned in a fraction of the time, under a second. So these views > should be much faster if if an appropriate index of the item table is > created if it's not already there, e.g. (slice_id, status_code, > last_edit), and we pass the view's "listlen" and "list_from" > properties to LIMIT and OFFSET options respectively. > > With DISTINCT on the query above, I get > 2375 rows in set (14.67 sec) > > Without DISTINCT and with LIMIT 6, I get > 6 rows in set (0.01 sec) > That is, the same ordering, but just the top 6 results. > > I also wonder whether this affects scrollers? Would it be possible to > use "SELECT SQL_CALC_FOUND_ROWS" when a scroller needs to determine > FOUND_ROWS(), but use LIMIT and OFFSET instead when looking at > particular sections? The type of view could be passed to > QueryZIDs() as well as the listlen. More simply the listlen parameter > would only be passed if it were a view like that above which didn't > include a scroller. If scrollers are to take advantage of the same > optimisation, perhaps there could be a different scroller alias which > only links to 'next page'. > > The site performance is bad enough that I'm tempted to leap in and > hack some code, but what might go wrong? Is there a schematic of the > AA code anywhere? > > Thanks > > Cedric > > > ------------------------------------------------------------------------- > This SF.net email is sponsored by: Splunk Inc. > Still grepping through log files to find problems? Stop. > Now Search log events and configuration files using AJAX and a browser. > Download your FREE copy of Splunk now >> http://get.splunk.com/ > _______________________________________________ > Apc-aa-coders mailing list > Apc...@li... > https://lists.sourceforge.net/lists/listinfo/apc-aa-coders |