From: Tom H. <to...@hu...> - 2014-04-04 18:39:19
|
I need some help or advice. Maybe point me at instructions. Been working on a proof of concept for storing scientific data with SMW. Started small, but I am experiencing scale-up issues and I am only at 10% of what the total could be. Right now, there are 9,000 pages with data and 1,100 pages with #ask queries to display the data. At this point I'm concerned SMW will not be able to handle it in a suitable browser render time.

Part of the issue, I think, is a lack of standardization in how the data is named. Each lab has its own prefix, but they really have the same data based on a position, just a different name. Unfortunately, there could be tens of thousands of positions. Trying to explain it easily with cars and tire types...

Some cars have 4 tires made by Goodyear.
Some cars have 4 tires by Firestone.
Some cars have 4 tires by Brand X.
Some cars have 4 tires by ?? .
Some cars have the color blue.

Now I want to group those together. I need to ask for cars with the color blue, group the cars by tire brand, then sort alphabetically. The issue I face is grouping by tire brand. Most of the time that information is there, sometimes not. Unless I am missing something, there is no way to concatenate the values of one property dependent on another property of a page. You can only concatenate page names dependent on a property or category, as far as I know. No, creating a page for every tire manufacturer is out; there could be 70,000 pages of manufacturers. Same for categories: 70,000 categories with one page name each, or 2 or 3.

I ended up having to ask in three complicated queries, then I throw in a unique, then sort into an array. The time is not too bad, less than a second or right above a second of output for small sets of returned data. I came up with this, but the issue is parser time as the data set grows:

(1) arraydefine all cars blue, (2) arraydefine with an arraymap asking for the brand of tires each has which is the same, (3) arraydefine a list with an arraymap asking for cars blue, tire manufacturer X OR cars blue, !tire manufacturer. Then print the arraydefined, unique, sort. Gives me something like this:

CarName1/CarName4, CarName2, CarName3/CarName5/CarName6, CarName7, CarName8/CarName9.

Where I run into trouble is, imagine 100 different car names with different tires made by 80 manufacturers. When I run an ask query as above, unique then sort, I am hitting 30 secs to parse the data and output it to a page. That's way too long.

I have tried to store like items as an #ask query in a property tied to the page name. So CarName3/CarName5/CarName6 would all have the same data, asking for the tire manufacturer property. All three would store the same data and unique would drop all but one. This method is a case study in why you should not store properties with #ask queries. Yes, it is much faster to render the data, 100 in a second or two. The problem is CarName10 is added and creates its property: CarName3/CarName5/CarName6/CarName10. Unfortunately, 3, 5, and 6 don't know about CarName10 unless you refresh the data for each one of the other 3. So I could hard-store the data, {{{manufacturer|}}}, but I would still have to go back manually and update the other 3 every time a new car is added.

Is there a way to solve this without being in a perpetual refresh of data for storing property values with an #ask? Am I missing something?

Thanks
Tom |
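In wiki syntax, the three-step pipeline Tom describes would look roughly like the following. This is only a sketch: it assumes the Arrays extension (#arraydefine, #arrayunique, #arraysort, #arrayprint) plus #arraymap and #show, and "Color" and "Tire brand" are hypothetical property names standing in for the real ones.

```wikitext
<!-- (1) all blue cars, as a comma-separated array -->
{{#arraydefine:bluecars|{{#ask: [[Category:Car]] [[Color::blue]] |link=none |limit=500}} }}
<!-- (2) map each car to its tire brand; every #show is a separate query -->
{{#arraydefine:brands|{{#arraymap:{{#arrayprint:bluecars}}|,|@@|{{#show:@@|?Tire brand|link=none}} }} }}
<!-- (3) de-duplicate and sort, then print the brand groups -->
{{#arrayunique:brands}}{{#arraysort:brands|asc}}
{{#arrayprint:brands|, }}
```

Because step (2) issues one #show lookup per car, parse time grows with the number of cars returned, which matches the scaling behavior described above.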
From: Hans O. <han...@gm...> - 2014-04-07 11:01:42
|
Hi Tom,

I have read this several times and I believe I still fail to understand the question. Could you help me to do so? Here is what I have understood so far, using your car/tire example. There are many pages on cars, and you want to deal with the subset of those that have the color blue. That would imply an #ask for color=blue. Then you would like to get a list that groups them by tire brand, with these groups in alphabetical order. To me it would seem that something like

 ...
 |sort=tire-brand, car
 |order=asc
 ...

would result in a list that has all blue cars, beginning with the group of the "AAA" (i.e. alphabetically first) brand, and within that group all pages ("car" being the mainlabel) in alphabetical order, with those without a tire brand at the top of the list. Then of course, rather than the long default table format, you might use a customized format via format=template. But that would be rather straightforward, so what am I missing?

The second general piece of advice would be to check whether you might solve your issue by using cached concepts (https://semantic-mediawiki.org/wiki/Help:Concept_caching). Concepts contain results of queries that are constantly kept up to date ("preprocessed") and thus are already there when the actual #ask query runs. This might help limit time consumption during the actual ask query. So in general you might create a concept that asks for one property and outputs the contents of another. I understand that this might involve a lot of concepts, e.g. one for each of the 70,000 tire manufacturers, so one would think about automation using the API (a script that keeps track of the tire manufacturer list and generates a standardized concept for new entries).

Best,
Hans |
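Spelled out as a full query, Hans's suggestion might look like this (a sketch only; "Color" and "Tire brand" are placeholder property names):

```wikitext
{{#ask: [[Category:Car]] [[Color::blue]]
 |?Tire brand
 |mainlabel=Car
 |sort=Tire brand
 |order=asc
 |format=table
}}
```

The sort= parameter orders rows by the printout property, so cars sharing a brand end up adjacent even though the result is a flat list rather than true groups.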
From: Tom H. <to...@hu...> - 2014-04-07 14:54:44
|
Hi Hans,

Thanks so much for your time. I think the main issue is the one thing which can't be done easily in SMW: grouping pages by a property when the property is not of type Page and is used on more than one page.

 ...
 |sort=tire-brand, car
 |order=asc
 ...

would seem logical, but since tire-brand isn't a Page-type property and is used on more than one page, there is no way to group the cars (the pages) by tire-brand. I need to ask individually: what cars (pages) use tire-brand X and are the color blue. If I use a table output, I can see:

 Tire Brand | Car
 Foobar     | Foo
 Foobar1    | Foo1
 Foobar     | Foo2
 Foobar2    | Foo3

Using the JS sort on the Tire Brand column, they will realign when clicked the first time to:

 Tire Brand | Car
 Foobar     | Foo
 Foobar     | Foo2
 Foobar1    | Foo1
 Foobar2    | Foo3

But will not give:

 Tire Brand | Car
 -----------------
 Foobar     | Foo
            | Foo2
 -----------------
 Foobar1    | Foo1
 -----------------
 Foobar2    | Foo3

I tried concepts from the low end. In my example it would be color: all cars with the color blue. On the true application side, currently there would be 1,100 colors, i.e. 1,100 concept pages for the colors. With concepts this cut the time down to about 16-18 secs from 30 secs. This of course is tied to the number of cars to return: 50 cars to group, 1 to 2 seconds; 300 cars, 12 seconds. To complicate matters, the car colors could change as cars are added, e.g. blue, dark-blue, light-blue, sky-blue to a new set: blue, dark-blue, dark-sky blue, light-blue, light-sky blue, sky-blue.

Tom

------ Original Message ------
From: "Hans Oleander" <han...@gm...>
To: "Tom Hutchison" <to...@hu...>
Cc: "SMW List" <sem...@li...>
Sent: 4/7/2014 7:01:35 AM
Subject: Re: [Semediawiki-user] Scaling up SMW data |
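For reference, the concept approach discussed above amounts to a page in the Concept: namespace (say Concept:Blue cars) containing something like this (a sketch; "Color" is a placeholder property name):

```wikitext
{{#concept: [[Category:Car]] [[Color::blue]]
 |All cars with the color blue
}}
```

An #ask elsewhere can then select [[Concept:Blue cars]] instead of repeating the color condition, and with concept caching enabled on the server the membership list is precomputed rather than evaluated at parse time.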
From: Patrick L. <pat...@st...> - 2014-04-10 02:33:12
|
On 04/05/2014 02:39 AM, Tom Hutchison wrote:
> At this point I'm concerned SMW will not be able to handle it in a
> suitable browser render time.

Some anecdotal evidence - we're using SMW as a project management tool. It works quite well for adding new structures (oh, this customer has three addresses we need to keep track of?). The templates, at some point, become brutally complicated, especially if you want to do counting or arithmetic. (Project management, so there are things like the cost of different parts of the project nicely added up so we can generate an invoice with almost no effort.)

Performance is quite horrible; the average for a "simple" project is ~1.5 seconds server-side. Apparently a 4GHz CPU isn't "fast enough" ;)

On more interesting queries like "show me all projects where the project manager is Benny" the performance craters linearly in the number of projects selected, and there's an evil horrible aarrgh where SMW *quietly* truncates results. So for the PMs with a few dozen to a few hundred projects to show ... well ... the worst case I've seen was about 90 seconds render time. (This query, btw, was generating about forty thousand SQL queries, which is amusingly horrible. The newer storage backend (sqlstore3 or what it's called) reduced that to 'only' ~17,000, so there's no way out of that performance trap.)

The truncation happens because SMW assumes that there will never be more than X results, where X is by default something silly like 100. If you increase it, the results look good until you hit some internally hardcoded limits. Let's just say I was very NOT AMUSED when I figured out that it will never return more than 10,000 results at once, so now I have scripts that pull in batches of 1,000 with increasing offset until the result is empty. Hnuurgh. (How else am I supposed to generate statistics over all projects and do other maintenance?)

At this scale concurrency becomes a severe problem: if there are multiple edits within a short enough timeframe (say, a second), the MW data is stable, but the semantic data becomes ... uhm ... random. For example, some projects have a property but are not shown in a query for that property unless you either save them again or trigger a metadata refresh in some other way. So in addition to truncation there are now race conditions, so I force a full rebuild of the semantic data every night. Which has its own set of problems, but this way projects disappear for at most a day.

And updating doesn't work because of subtle bugs in the template rendering between versions. I've filed bugs for those I could track down well enough, but there's still some stuff breaking horribly, so I can't even upgrade. I see no way to improve this situation, thus I've written a custom exporter and am now slowly building a custom app to replace SMW - and so far my prototype never goes above 120 msec render time.

My recommendation thus would be: use SMW for prototyping; once you have a proper data model, create a custom application. Don't even try scaling up above maybe 10k items all in all. The more properties you use, the more racey and slow it'll become, so the best feature is also your worst enemy.

That being said, (S)MW isn't all bad; it's just not built with either performance or correctness in mind. So if you don't need exact results but just something similar enough, it's good ;)

(Reminds me: in some situations, if a query is malformed you just get some random crap returned. No proper error message, just "here's a bag of things, maybe something in it is accidentally correct" ... That's almost funny!)

Have a nice day,

Patrick |
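The offset-batching workaround Patrick describes boils down to repeating the same query with an increasing offset until nothing comes back. In wiki syntax the two knobs are |limit= and |offset= (the category and property names here are placeholders):

```wikitext
{{#ask: [[Category:Project]]
 |?Has project manager
 |limit=1000
 |offset=0
}}
<!-- next batch: the same query with |offset=1000, then |offset=2000, ...
     stop when a batch comes back empty -->
```

In practice such batching is usually driven by an external script against the API rather than written out by hand, since the number of batches isn't known in advance.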
From: Nischay N. <nis...@gm...> - 2014-04-10 05:33:53
|
On Thu, Apr 10, 2014 at 7:49 AM, Patrick Lauer <pat...@st...> wrote:
> (This query, btw, was generating about forty thousand SQL queries, which
> is amusingly horrible. The newer storage backend (sqlstore3 or what it's
> called) reduced that to 'only' ~17000, so there's no way out of that
> performance trap)

Could you submit your SQL query logs, so that the developers could have a look at them?

> I see no way to improve this situation, thus I've written a custom
> exporter and now slowly build a custom app to replace SMW - and so far
> my prototype never goes above 120msec render time.

Out of curiosity, did you attempt to fix SMW before building your own app? Also, did you consider switching to an RDF store? Is your app open source?

--
Cheers,
Nischay Nahata |
From: Jeroen De D. <jer...@gm...> - 2014-04-10 12:21:34
|
Hey,

> It'd also be interesting to profile the code. The structure of
> mediawiki is pretty modular, so you'd have thought it not impossible
> to take critical sections and code them in something faster than php
> or python.

Profiling is good. However, neither the MediaWiki code nor the SMW code is anywhere near modular, MediaWiki likely being the bigger offender. They both follow the big ball of mud pattern and have a high degree of inseparability. Extracting or replacing any "component" (not that there are nicely bounded components) is going to be anything but easy.

Cheers

--
Jeroen De Dauw - http://www.bn2vs.com
Software craftsmanship advocate
Evil software architect at Wikimedia Germany
~=[,,_,,]:3 |
From: Patrick L. <pat...@st...> - 2014-04-10 05:43:25
|
On 04/10/2014 01:33 PM, Nischay Nahata wrote:
> Could you submit your SQL query logs, so that the developers could have a
> look at it?

Yes, I did open at least one bug.

> Out of curiosity, did you attempt to fix SMW before building your own app?
> Also did you consider switching to a RDF store?
> Is your app open source?

I did try to figure out where SMW goes wrong, and gave up. I can't even properly *upgrade* to the newest version because the SMW template parsing diverges in ever so subtle ways (e.g. in one location "-" is now no longer valid, and there are a few other breakages I can't even figure out).

RDF stores sound nice, but most of them are dysfunctional - I did get 4store up and running, but then SMW doesn't like talking to it and basic queries fail. Very amusing. And I lack the optimism to hope that it works "better" than the native storage backend.

My app, so far, is an internal prototype. It's too custom to make sense for other people, so there's no reason to open-source it. (Currently mostly flask + sqlalchemy, plus some shell scripts for actions like "create a project folder on the file server".)

Amusing thing: about one in 4,000 requests to the wiki for extracting data fails with "not logged in", but doesn't give a proper HTTP error code - so now I grep for "DOCTYPE" in the exported data to find any broken pages, then manually fetch those too. Why does it fail? No one knows, but at least things are not too boring :)

Have fun,

Patrick |
From: Peter B. <pet...@kc...> - 2014-04-10 06:21:27
|
>> So for the PMs with a few dozen to a few hundred projects to show ...
>> well ... worst case I've seen was about 90 seconds render time.
>> (This query, btw, was generating about forty thousand SQL queries, which
>> is amusingly horrible. The newer storage backend (sqlstore3 or what it's
>> called) reduced that to 'only' ~17000, so there's no way out of that
>> performance trap)

I'd be interested to know what these queries were looking for; it seems a lot of work for a relatively small number of projects. I wonder if storing partial results in each project would make sense, or structuring the data differently.

It'd also be interesting to profile the code. The structure of mediawiki is pretty modular, so you'd have thought it not impossible to take critical sections and code them in something faster than php or python.

I'm not sure how well-threaded php is either - it might be a 4GHz CPU, but if it has four or eight cores, then having your time-critical processes well threaded could improve the overall time considerably.

>> Out of curiosity, did you attempt to fix SMW before building your own app?
>> Also did you consider switching to a RDF store?
>> Is your app open source?
> I did try to figure out where SMW goes wrong, and gave up.
>
> RDF stores sound nice, but most of them are dysfunctional - I did get
> 4store up and running, but then SMW doesn't like talking to it and basic
> queries fail. Very amusing. And I lack the optimism to hope that it
> works "better" than the native storage backend.

You can make your RDF store memory resident, which can make it considerably faster. |
From: Patrick L. <pat...@st...> - 2014-04-10 07:01:19
|
On 04/10/2014 02:20 PM, Peter Brooks wrote:
> I'd be interested to know what these queries were looking for, it
> seems a lot of work for a relatively small number of projects. I
> wonder if storing partial results in each project would make sense, or
> structuring the data differently.

Yes, it's quite suboptimal. Every single property lookup is a query of its own, so if you look at, say, "id | car model | name of driver", that's already three queries for display, plus multiple queries to figure out what items you should display. With ~15k items, and about 1k items to display, that's going to hurt :)

> It'd also be interesting to profile the code. The structure of
> mediawiki is pretty modular, so you'd have thought it not impossible
> to take critical sections and code them in something faster than php
> or python.

Or maybe do less work. Instead of rewriting in ${whatever}, you could profile it and see *why* it spends so much time there.

> I'm not sure how well-threaded php is - it might be a 4GHz CPU,
> but if it has four or eight cores then having your time-critical
> processes well threaded could improve the overall time considerably.

... it's single-threaded, but a multi-core CPU still benefits if there are multiple requests in parallel.

> You can make your RDF store memory resident, which can make it
> considerably faster.

Blargh :D The data set fits into memory already, so mysql is serving everything nicely as it is (~1200 qps for a single thread). (Amusing side-effect: moving mysql to its own server SLOWED DOWN things considerably because of the ethernet latency!) The only way to "go fast" in this case is just "do less". |
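One wiki-level way to "do less", related to Peter's idea of storing partial results in each project, is to compute expensive values once at save time and store them as a plain property with #set, so display queries just read a ready-made value instead of recomputing template arithmetic. A sketch, assuming the ParserFunctions extension; the property and parameter names are placeholders:

```wikitext
<!-- In the project template: sum the cost parts once when the page is
     saved, and store the result as the property "Invoice total" -->
{{#set: Invoice total = {{#expr: {{{parts cost|0}}} + {{{labor cost|0}}} }} }}
```

A report page can then print ?Invoice total directly. Unlike storing #ask results in properties (the trap Tom ran into earlier in this thread), the stored value here depends only on the page's own parameters, so it never goes stale when other pages change.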
From: Martin G. <mar...@ce...> - 2014-04-10 15:36:39
|
Dear List!

I'm looking for a workaround to use "values from" in SF with several sources, e.g. one category and one namespace (like a join), or values from two different categories. I tested several possibilities, but SF always uses the last "values from" definition in my form declaration for autocompletion.

Thanks for help!
Martin |
From: Ad S. v. S. <ad....@gm...> - 2014-04-10 17:17:16
|
Martin, did you have a look at concepts? It seems to me that you can do what you want with them. http://semantic-mediawiki.org/wiki/Help:Concepts.

Ad

Sent from my iPhone |
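Concretely, the concept route Ad points to would be a Concept: page that joins the two sources with OR, which Semantic Forms can then use for autocompletion via "values from concept=". A sketch; the concept name, the category, and the "Venue" namespace are all hypothetical stand-ins for the real ones:

```wikitext
<!-- On the page Concept:Locations -->
{{#concept: [[Category:City]] OR [[Venue:+]]
 |Pages usable as a location
}}

<!-- In the form definition, pointing the field at that concept -->
{{{field|location|input type=combobox|values from concept=Locations}}}
```

The concept acts as a single saved query over both sources, so the form only needs one "values from" definition.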
From: Martin G. <mar...@ce...> - 2014-04-14 09:19:44
|
Dear Ad!

Thank you very much! It works fine.

Bye,
Martin

__________________________________
From: Ad Strack van Schijndel <ad....@gm...>
To: Martin Gruner <mar...@ce...>
Cc: Semantic MediaWiki users <sem...@li...>
Date: 10.04.2014 19:17
Subject: Re: [Semediawiki-user] Semantic Forms: values from |