top shows:

20637 mysql     15   0 24244  17M 14224 R    98.4  3.4  12:16 mysqld
21161 csan      14   0  1064 1064   804 R     0.9  0.2   0:00 top
Still running...


<div style="background:#eff2c0;font-weight:bold;">Invulling van de volledige vraag</div>
<ask <default="&lt;div style=&#34;background:#f1d5c0;color:red;font-weight:bold;&#34;&gt;Geen match&lt;/div&gt;">[[Categorie:Medewerker]] [[Categorie:Kenniskaart]] [[Medewerker:=*]] [[Business unit:=*]] [[Niveau:B/CICT:=>1]]  [[Niveau:Databases:Oracle Spatial:=>1]]  [[Niveau:Java:=>1]]  [[Niveau:Semantic Web:=>0]]  [[Niveau:UML:=>1]]  [[Niveau:DSDM:=>1]]  [[Niveau:rol:Architect:=>1]]  [[Niveau:vakgebied:SO:=>1]] </ask>

Show full processlist shows:

| 7099 | semwebwiki | localhost | semwebwiki | Query   | 84   | statistics | SELECT /* SMW_InlineQuery */ DISTINCT page_title,page_namespace,R0.value_xsd AS R0o,R1.value_xsd AS R1o  FROM `semwebpage`, `semwebcategorylinks` AS c0, `semwebcategorylinks` AS c1, `semwebsmw_attributes` AS R0, `semwebsmw_attributes` AS R1, `semwebsmw_attributes` AS c2, `semwebsmw_attributes` AS c3, `semwebsmw_attributes` AS c4, `semwebsmw_attributes` AS c5, `semwebsmw_attributes` AS c6, `semwebsmw_attributes` AS c7, `semwebsmw_attributes` AS c8, `semwebsmw_attributes` AS c9  WHERE ( page_id = c0.cl_from AND (c0.cl_to='Medewerker')  AND  page_id = c1.cl_from AND (c1.cl_to='Kenniskaart')  AND R0.attribute_title='Medewerker' AND R0.subject_id = page_id  AND R1.attribute_title='Business_unit' AND R1.subject_id = page_id  AND  page_id = c2.subject_id AND c2.attribute_title='Niveau:B/CICT' AND c2.value_num>1  AND  page_id = c3.subject_id AND c3.attribute_title='Niveau:Databases:Oracle_Spatial' AND c3.value_num>1  AND  page_id = c4.subject_id AND c4.attribute_title='Niveau:Java' AND c4.value_num>1  AND  page_id = c5.subject_id AND c5.attribute_title='Niveau:Semantic_Web' AND c5.value_num>0  AND  page_id = c6.subject_id AND c6.attribute_title='Niveau:UML' AND c6.value_num>1  AND  page_id = c7.subject_id AND c7.attribute_title='Niveau:DSDM' AND c7.value_num>1  AND  page_id = c8.subject_id AND c8.attribute_title='Niveau:rol:Architect' AND c8.value_num>1  AND  page_id = c9.subject_id AND c9.attribute_title='Niveau:vakgebied:SO' AND c9.value_num>1 )  ORDER BY page_title  LIMIT 100 |

We've currently got 240 relations and 86 attributes and about 180 articles.

I hope this helps?


Markus Krötzsch schreef:
On Monday 19 June 2006 18:13, Hans Oesterholt wrote:

When I ask a little bit more then a couple of links and attributes, e.g.:

<ask>[[Category:X]] [[Category:Y] [[Level:A:=>1]] [[Level:A:=*]]
[[Level:B:=>1]] [[Level:B=*]] [[Level:C:=>1]] [[Level:C:=*]]
[[Level:D:=>1]] [[Level:D:=*]] [[Level:E:=>1]] [[Level:E:=*]]
[[Level:F:=>1]] [[Level:F:=*]] [[Level:G:=>1]] [[Level:G:=*]]
[[Level:H:=>1]] [[Level:H:=*]] [[Level:I:=>1]] [[Level:I:=*]]
[[dataT::*]] [[dateW:=*]]</ask>

I tried this one on ontoworld and got no results (and this pretty quick). Is 
there an example that causes problems on ontoworld too? About how many 
entries for each of the realtions/attributes does your DB have?

This generates a big Join Select on MySQL. Essentially, it locks up
the complete mediawiki site. MySQL, taking forever and taking
100% CPU for maybe half an hour.

Well, *this* should not happen. What should happen, however, is that overly 
complex queries time out and do not yield a result (which cannot really be 
guaranteed at the moment). PHP usually times out after a couple of seconds 
and no PHP-induced computation should lock a site for 30mins. Not sure what 
could be the reason for this behaviour.

What we can do in general is to restrict the syntactic complexity of a query, 
but we cannot really make sure that it can be executed in decent time (I 
guess no one can). But I am open towards further ideas on how to improve the 
efficiency of such queries. At the moment, SMW offers many flags that allow 
administrators to restrict the query complexity, but they might not suffice 

If I ask all questions seperately,

<ask>[[Category:X]] [[Category:Y] [[Level:A:=>1]] [[Level:A:=*]]
[[dataT::*]] [[dateW:=*]]</ask>
<ask>[[Category:X]] [[Category:Y] [[Level:B:=>1]] [[Level:B:=*]]
[[dataT::*]] [[dateW:=*]]</ask>
<ask>[[Category:X]] [[Category:Y] [[Level:C:=>1]] [[Level:C:=*]]
[[dataT::*]] [[dateW:=*]]</ask>
<ask>[[Category:X]] [[Category:Y] [[Level:D:=>1]] [[Level:D:=*]]
[[dataT::*]] [[dateW:=*]]</ask>
<ask>[[Category:X]] [[Category:Y] [[Level:E:=>1]] [[Level:E:=*]]
[[dataT::*]] [[dateW:=*]]</ask>

This works a lot faster. Would the join operation be absolutely
neccessary? Isn't it possible to join
the result of the individual queries in one?

Well, this problem is called "join-order-optimisation" and it is really a task 
of the database (in general). Without access to the internal datastructures 
of the DB, it will be very hard to efficiently determine in which way one 
should do the computation of a query in order to end up with a small result 
set very quickly. E.g. there might be only a single item with an attribute 
dateW:= but millions of items with the other attributes. Joining them in the 
wrong order gives you billions of intermediate results, all of which are 
filtered out again when building the final join -- if you ever get there.

I will have a look at our (still not cleaned up) query generation. Maybe we 
can simplify some things. Overall, I am not sure whether the ":=*"-query is 
feasible with its current semantics. Typically, you don't want this to act as 
a join (including the very vague condition that the given property is set), 
but as a mere additional property to be displayed. E.g. you want to say 
"Select all articles in the category City with Population above 100000. Then, 
for each of these, show me all entries of 'population', 'located in', and 
'has mayor'"). The latter is obviously different from building a join with 
all combinations of the properties ou want to see!

But in any case, even a simple query could produce too many results 
internally, and some server-side restriction of computational effort for 
inline queries is needed. Not sure how to implement this nicely :-/ One 
option would be to not answer the inline queries immediately, but to process 
them offline in a more controlled setting, and to insert the results when 
they become available. But it would be a pity not to have real-time results 
in such a system.

-- Markus