Thread: [SMW-devel] [SMW] Query performance and in-memory storage of SMW-related queries


Hi,

Starting with SMW 1.7 and MW 1.18, we began converting our legacy
document system into an SMW/MW-based system. That has left us with
more than 700.000 triplets stored in SMW, but it has also slowed
our response time on SMW-related queries.

Somewhere around 200.000 triplets (which does not mean that number is a
threshold) we noticed a growing impact on query performance, and now
we feel the pinch every time we execute a query. We are not
talking about in-template query performance as seen in the
Wikia/Familypedia example (we abandoned such practices some time ago).
Nowadays we encourage users to execute all complex queries either via
Special:Ask or via an input form that runs a RunQuery, and yes, we are
using APC to improve caching and response time in general.

We looked at external solutions: 4Store is not supported on Windows,
Virtuoso has no real documentation on how to make it work with SMW
(at least we couldn't find any), and Jena seems to require SMW+. That
leaves us with the native SMW store, and we would like to keep it that
way, as every piece of external software means an additional point of
failure and extra maintenance effort.

== Architectural questions ==

#1 Could there be an indexing problem with one of the primary
SMW table key indexes?

#2 Does SMW natively support MySQL's internal
query-cache-type/query-cache-size options to improve query performance?
We made sure MySQL is using the query-cache-type/query-cache-size options,
but somehow this doesn't show any effect for SMW-related queries.
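
One way to check whether the query cache is serving anything at all is to compare the server's status counters. The following is only a rough sketch: the counter names (Qcache_hits, Qcache_inserts, Qcache_not_cached, Com_select) are the ones MySQL reports via SHOW GLOBAL STATUS, while the function name and the sample numbers are made up for illustration and are not measurements from our system.

```python
def query_cache_report(status):
    """Summarise MySQL query cache counters.

    `status` maps counter names from SHOW GLOBAL STATUS to integers.
    """
    hits = int(status.get("Qcache_hits", 0))
    # Com_select counts SELECTs that were really executed (cache misses);
    # statements answered from the cache only increment Qcache_hits.
    selects = int(status.get("Com_select", 0))
    total = hits + selects
    hit_ratio = hits / total if total else 0.0
    return {
        "hit_ratio": hit_ratio,
        "inserts": int(status.get("Qcache_inserts", 0)),
        "not_cached": int(status.get("Qcache_not_cached", 0)),
    }


# Hypothetical sample values, for illustration only.
sample = {
    "Qcache_hits": 50,
    "Com_select": 950,
    "Qcache_inserts": 100,
    "Qcache_not_cached": 850,
}
report = query_cache_report(sample)
print(report)
```

A hit ratio near zero combined with a high Qcache_not_cached value would suggest that the statements are not cacheable in the first place, which MySQL documents for, among others, statements that use temporary tables or non-deterministic functions. Whether that applies to the statements SMW generates is something we have not verified.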

#3 Would a different approach to handling query data, namely storing
query data in a temporary in-memory table, bring advantages compared to
the current approach of accessing the SMW disk tables every time a query
is executed? Would an in-memory concept for queried data (SMW data is
mirrored into a temporary in-memory table for READ purposes only for
the current MySQL session, and every time MySQL is restarted the
temporary in-memory tables have to be rebuilt) improve query and
access performance for SMW-related triplets? I guess (I don't know) that
neither MyISAM nor InnoDB would make a difference, since the bottleneck
seems to be the disk access needed to execute queries on the triplets
stored in SMW-related tables.
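
To make the idea in #3 concrete, here is a minimal sketch of the mirroring concept. It deliberately uses SQLite from the Python standard library as a stand-in, not MySQL and not the SMW schema: the `triples` table, its columns and the sample rows are hypothetical. In MySQL the closest counterpart would presumably be a table using the MEMORY storage engine, which loses its contents on a server restart and does not support BLOB or TEXT columns.

```python
import os
import sqlite3
import tempfile


def mirror_to_memory(disk_path):
    """Copy an on-disk SQLite database into an in-memory database."""
    disk = sqlite3.connect(disk_path)
    mem = sqlite3.connect(":memory:")
    disk.backup(mem)  # copies all tables and indexes into memory
    disk.close()
    return mem


# Build a tiny on-disk table of hypothetical triples.
db_path = os.path.join(tempfile.mkdtemp(), "triples.db")
disk = sqlite3.connect(db_path)
disk.execute("CREATE TABLE triples (subject TEXT, property TEXT, value TEXT)")
disk.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("File:A.pdf", "Keyword", "budget"),
        ("File:B.pdf", "Keyword", "audit"),
        ("File:A.pdf", "Format", "pdf"),
    ],
)
disk.commit()
disk.close()

# The mirror has to be rebuilt whenever the process (or server) restarts.
mem = mirror_to_memory(db_path)
rows = mem.execute(
    "SELECT subject FROM triples WHERE property = ? ORDER BY subject",
    ("Keyword",),
).fetchall()
total = mem.execute("SELECT COUNT(*) FROM triples").fetchone()[0]
print(rows, total)
```

The sketch only shows the read path. Writes would still have to go to the disk tables, and the mirror would have to be refreshed afterwards, which is exactly the trade-off the question is about.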

Of course there is always a way to improve performance by using better
hardware (RAID, SSD to improve I/O performance), but this is a
last-resort approach which we would like to avoid for the moment.

System:
MediaWiki 1.18.0, PHP 5.3.8 (apache2handler), MySQL 5.5.16,
APC version 3.1.6-dev

PS: Our increased use of triplets comes from an automatic indexing
process for content and document transfer. It exchanges information
with Sphinx Search and identifies the 30 most used words in a
document, which are written back to the wiki and stored as semantic
triplets on the related NS_IMAGE object.

Cheers,

mwjames
