I agree that it's great to hear from Wikia, and also great to know that Wikia is willing to put in some development time and effort to help with SMW. A few thoughts:

- Wikia has already contributed somewhat to improving performance - I've been talking for a while to Tim Quievryn (who was at the Boston SMWCon last year), and his feedback helped lead to the faster handling of red links in Semantic Forms that was added in version 2.0.8.

- Semantic Drilldown might actually be contributing to DB writes - it creates a temporary database table on every hit to Special:BrowseData. (I don't know whether temporary tables count toward your write figures.)

- This might not be the right place to discuss the specifics of the "SMW light" initiative, but I personally believe the best approach is to do the triple-store integration [1], so that SMW can store its data directly in an RDF triple store, rather than trying to improve or limit SMW's queries. In theory it would speed up queries, but more importantly, even if it didn't, it would essentially eliminate SMW's impact on the wiki's database. That's just my personal opinion, though - I'm not involved in either of those projects.

[1] http://semantic-mediawiki.org/wiki/SPARQL_and_RDF_stores_for_SMW


2011/2/23 Markus Krötzsch <markus@semantic-mediawiki.org>
[Making this into a new thread]

Hi Krzysztof,

I was already wondering when I would hear from Wikia ...

As you have noticed, running SMW and its extensions on large sites (large
in terms of content or in terms of users) has special requirements.
Typically, we suggest using more conservative query settings so that
long and difficult queries do not occur. Similarly, some SMW extensions
were not developed with large sites in mind and can be problematic in
their own right. But your users obviously want to keep the features
they already have, so we need to find better ways of addressing your
problem.

But first we need to separate concerns a little bit. You mention the
following distinct problems:

(1) Too many DB writes (about 60% of all writes)

(2) Too many slow queries (about 90% from SMW)

Moreover, your problem is not caused by SMW alone but by a number of
SMW-related extensions. So fixing it will probably mean addressing
several separate issues, possibly in several extensions.

Let us first see how big the impact of the extensions you mention could
be. Semantic Forms mainly leads to some additional reads (apparently no
problem for you); the total number could possibly be reduced. It may
also have some effect on query activity if certain autocompletion
features are used. But otherwise I think it is unlikely to be the root
of the problem. Semantic Drilldown might be more of a problem regarding
complex queries. But it uses its own SQL queries, so it should be
possible to find out how much of (2) comes from this extension. Semantic
Drilldown should not contribute to (1).

Are there any other extensions that use SMW on your site?

Regarding SMW, I have some concrete ideas on what could be done about (1)
and (2), but they need more careful consideration first. I would be
grateful if you could help to track down the cause of the problem, but I
am afraid that the changes in SMW core will still need to be made, or at
least reviewed carefully, by me -- which makes me something of a
bottleneck for the SMW part of your problem. I need to think about the
required work a little further before I can promise anything.



On 22/02/2011 22:38, Krzysztof Krzyżaniak wrote:
 > I think this would be the right place to jump in.
 > Hello, my name is Krzysztof Krzyżaniak a.k.a. eloy, and I work for Wikia
 > Inc. as backend team leader. We are probably (correct me if I am wrong)
 > one of the biggest users of the Semantic MediaWiki suite. We currently
 > have it enabled on about 100 wikis, for example familypedia.wikia.com,
 > yugioh.wikia.com and www.wowwiki.com (but also on wikis where you
 > probably wouldn't expect any interest in SMW, like glee.wikia.com or
 > madmen.wikia.com). We would like to expand the use of SMW on Wikia
 > (for example, lyrics would love it), but currently we cannot afford it
 > for performance reasons. For example, our first cluster contains
 > about 30,000 wikis, mostly the biggest ones. About 60% of database
 > writes come from SMW extensions (SemanticMediaWiki,
 > SemanticDrilldown, SemanticForms), and about 90% of the queries in the
 > slow logs come from SMW.
 > I am here to find a way of scaling SMW on our wikis. But I also think
 > that this will benefit every SMW user, because we want to help
 > improve SMW.
 > What you can expect:
 > - "real world" cases, actually a lot of them :)
 > - bugs :) (filed in Bugzilla, of course)
 > - bug fixes and patches (either as diffs or as direct svn commits if you
 > prefer that)
 > - questions
 > We can offer engineering hours and testbeds.
 > To start, I have a question about the roadmap item "SMW light": how
 > complete is it? What's missing? When do you expect it to be ready? How
 > can we help?
 >     eloy


WikiWorks · MediaWiki Consulting · http://wikiworks.com