Hey -- the different charsets, if I remember correctly: one table/index was in latin1 and the other was in binary, and there was a join that wasn't using an index according to EXPLAIN, even though a suitable index existed.  Because the two columns couldn't be compared directly, MySQL had to convert 8 million+ rows from one charset to the other, and the query blew up.  It's really more of an initial-performance issue than a scalability issue at your level, but try watching which queries take the longest in MySQL, run EXPLAIN on them, and see whether they use indexes.  If they don't, that's cause for concern.
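
A rough sketch of that workflow, for illustration only. The join below uses MediaWiki's stock page and pagelinks tables as a stand-in (not the tables from my case) and assumes no table prefix is configured:

```sql
-- Sketch only: page/pagelinks are used purely as an illustrative join.

-- 1. See what is running right now and how long it has been running.
SHOW FULL PROCESSLIST;

-- 2. Ask MySQL how it would execute a suspect join.
EXPLAIN
SELECT p.page_id
FROM pagelinks AS pl
JOIN page AS p
  ON p.page_namespace = pl.pl_namespace
 AND p.page_title     = pl.pl_title
WHERE pl.pl_from = 1;

-- In the EXPLAIN output, type = ALL together with key = NULL on a large
-- table means a full table scan: no index is being used for that table.
```

If possible_keys lists an index but key stays NULL, comparing the charsets/collations of the two joined columns is a reasonable next step.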

refreshLinks2 is just a job type (like htmlCacheUpdate or SMWUpdateJob) in MW, used when you edit templates that affect large numbers of pages.  The idea is that the page shouldn't hang when you hit 'save page' while hundreds of thousands of refreshLinks jobs get inserted into the job table.  These jobs also take a long time to run, depending on the template, the complexity of the change, etc., so if the job rate is > 0 and a user accesses a page, MW tries to run a job before returning anything to the user, hence the page timeout.

select distinct(job_cmd) from job;

Run that and you can see the other job types, depending on what is in your queue at any given time.  But since you have the job queue turned off (i.e. a job isn't run every time a user accesses a page), I don't believe that is what's causing the slow page loads.
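
A small variation on that query, as a sketch, also shows how many jobs of each type are waiting. It assumes the job table has no prefix; with a table prefix configured, the name would change accordingly:

```sql
-- Pending jobs per type, largest backlog first.
SELECT job_cmd, COUNT(*) AS pending
FROM job
GROUP BY job_cmd
ORDER BY pending DESC;
```

Running it a few times while the maintenance scripts are looping gives a rough sense of whether the queue is shrinking or growing.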

Definitely look into the MW profiling capability, and also into how to add profiling to functions etc., as some extensions might not have it by default.  It's fairly simple to add and lets you time how long everything is taking.


On Mon, Mar 1, 2010 at 5:07 PM, don undeen <donundeen@yahoo.com> wrote:
all great advice, thanks!
yeah, my jobqueue does get ridiculous, because I'm using externalData calls, and properties that automatically create pages, which have their own externalData calls, etc etc. All managed basically through the refreshLinks and runJobs maintenance scripts. So I've got the job rate set to 0, and I've got scheduled jobs running to take care of that. Right now I'm running refreshLinks and runJobs in multiple perpetual loops, hoovering up more external data. Then occasionally I stop those processes, to see how the performance is doing.
So I guess running those maintenance scripts is going to cause some stress that affects the performance of page loads, and general mysql access, no? Once my dataset is a little more stable, I'll reduce the maintenance script frequency, and be able to do some more profiling.

refreshLinks2.php : what's that about? I don't see that in my codebase; maybe it's a version thing?

I didn't realize MW had its own profiling framework. I'll have to dig into that for sure. I had been using xdebug and wincachegrind, to some effect.

also, could you explain a bit more about "one of the tables in my database was created with the latin1 charset, while the rest were in binary, which made the use of indexes useless."

I've got tables with collations: binary (InnoDB), latin1_swedish_ci (MyISAM), and a couple in utf8_bin (MyISAM). How does this make the indexes not work?

Sorry if any of these questions have been covered elsewhere. Feel free to tell me to google it, or toss a link my way, if you want.

thanks again!

Don Undeen

From: Thomas Fellows <thomas.fellows@gmail.com>
To: don undeen <donundeen@yahoo.com>
Cc: smw list <semediawiki-devel@lists.sourceforge.net>
Sent: Mon, March 1, 2010 4:48:26 PM
Subject: Re: [SMW-devel] optimizing SMW

Hey -

Something you can try: what got me out of a big SMW timeout problem turned out to be a plain MW issue, and MW's profiling turned up the answer.


In my case, the Job Queue was what was killing my performance - templates that affected 100,000+ pages were trying to get run off the queue (refreshLinks2), and that just ended in a timeout for the user.  Turning the job rate to 0 solved it. (You have to set up a cron job to run the queue overnight.)

In your case, use the profiler to check how long everything is taking; it will be easier to pinpoint the time hog that way, though I'm sure others might have better suggestions.  As far as MySQL query optimisation goes, there are lots of good articles on the "explain" syntax out there.  Another problem I encountered (though assuredly rare) was that one of the tables in my database was created with the latin1 charset, while the rest were in binary, which made the use of indexes useless.  The explain command turned that one up for me.
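
One way to spot an odd-one-out table like that is to list charsets and collations from information_schema. This is only a sketch, and 'wikidb' is a placeholder for the wiki's actual database name:

```sql
-- Sketch only: replace 'wikidb' with the real database name.

-- Storage engine and default collation of every table.
SELECT TABLE_NAME, ENGINE, TABLE_COLLATION
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'wikidb'
ORDER BY TABLE_COLLATION, TABLE_NAME;

-- Character set and collation per column
-- (columns without a character set show NULL).
SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'wikidb'
ORDER BY TABLE_NAME, ORDINAL_POSITION;
```

Columns that get joined against each other should ideally report the same character set and collation.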

Hope it was at least a little bit helpful, and good luck


On Mon, Mar 1, 2010 at 4:26 PM, don undeen <donundeen@yahoo.com> wrote:
hi all,
I've got a semantic mediawiki installation with about 100,000 pages and growing,  236k rows in smw_ids, and 646k rows in pagelinks

running on Windows Server 2008,
MediaWiki 1.13.5
PHP 5.3.1
MySQL 5.1.41
SMW 1.4.2
SMWHalo 1.4.5

I'm getting to the point where page loads are starting to be pretty slow sometimes, and occasionally time out.

Granted, I'm using lots of external data calls, and those calls cause new pages to be created in the background, and those new pages beget more new pages, etc etc. So there's a spidering growth going on as well. I'm doing plenty of caching of my service calls, using memcached.

obviously it's a sort of complicated setup, and I'm noticing that even normal queries of the db (using phpmyadmin) are taking quite a while.

I don't have a lot of experience with db optimization; I'm wondering if there's anything that you guys do to your wiki to make it run better, any defaults I can change, indexes to create, etc (I did add an index on a temp table being created in code, and that helped in one area, so I know things like that can be done).

Also, are there any good tools you use for profiling either the PHP or the MySQL?

I've used xdebug and wincachegrind for php profiling; has anyone tried MonYog for mysql profiling? And other tools you like that I can use in a windows env?
Or maybe just some general pointers on what to look for when trying to improve performance?

I know this is vague; maybe there's a good thread/link out there already for this topic? I haven't seen it.

thanks for all your help and hard work!

don undeen
Metropolitan Museum of Art
