From: Tom H. <to...@hu...> - 2014-04-04 18:39:19
|
I need some help or advice. Maybe point me at instructions. Been working on a proof of concept for storing scientific data with SMW. Started small, but I am experiencing scale-up issues and I am only at 10% of what the total could be. Right now, there are 9,000 pages with data and 1,100 pages with #ask queries to display the data. At this point I'm concerned SMW will not be able to handle it in a suitable browser render time.

Part of the issue, I think, is a lack of standardization in how the data is named. Each lab has its own prefix, but they really have the same data based on a position, just a different name. Unfortunately, there could be tens of thousands of positions. Trying to explain it easily with cars and tire types...

Some cars have 4 tires made by Goodyear.
Some cars have 4 tires by Firestone.
Some cars have 4 tires by Brand X.
Some cars have 4 tires by ?? .
Some cars have the color blue.

Now I want to group those together. I need to ask for cars with the color blue, group the cars by tire brand, then sort alphabetically. The issue I face is grouping by tire brand. Most of the time that information is there, sometimes not. Unless I am missing something, there is no way to concatenate the values of one property dependent on another property of a page. You can only concatenate page names dependent on a property or category, as far as I know. No, creating a page for every tire manufacturer is out; there could be 70,000 pages of manufacturers. Same for categories: 70,000 categories with one page name each, or 2 or 3.

I ended up having to ask in three complicated queries, then I throw in a unique, then sort into an array. The time is not too bad, less than a second or right above a second of output for small sets of returned data. I came up with this, but the issue is parser time as the data set grows:

(1) arraydefine all cars blue, (2) arraydefine with an arraymap asking for the brand of tires each has which is the same, (3) arraydefine a list with an arraymap asking for cars blue, tire manufacturer X OR cars blue, !tire manufacturer. Then print the arraydefined, unique, sort. Gives me something like this:

CarName1/CarName4, CarName2, CarName3/CarName5/CarName6, CarName7, CarName8/CarName9.

Where I run into trouble is, imagine 100 different car names with different tires made by 80 manufacturers. When I run an ask query as above, unique then sort, I am hitting 30 secs to parse the data and output it to a page. That's way too long.

I have tried to store like items as an #ask query in a property tied to the page name. So CarName3/CarName5/CarName6 would all have the same data, asking for the tire manufacturer property. All three would store the same data and unique would drop all but one. This method is a case study in why you should not store properties with #ask queries. Yes, it is much faster to render the data, 100 in a second or two. The problem is CarName10 is added and creates its property: CarName3/CarName5/CarName6/CarName10. Unfortunately, 3, 5, and 6 don't know about CarName10 unless you refresh the data for each one of the other 3. So I could hard-store the data, {{{manufacturer|}}}, but I would still have to go back manually and update the other 3 every time a new car is added.

Is there a way to solve this without being in a perpetual refresh of data for storing property values with an #ask? Am I missing something?

Thanks
Tom |
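In wiki syntax, the three-step pipeline Tom describes would look roughly like the following. This is only a sketch: it assumes the Arrays extension (#arraydefine, #arrayunique, #arraysort, #arrayprint) plus #arraymap and #show, and "Color" and "Tire brand" are hypothetical property names standing in for the real ones.

```wikitext
<!-- (1) all blue cars, as a comma-separated array -->
{{#arraydefine:bluecars|{{#ask: [[Category:Car]] [[Color::blue]] |link=none |limit=500}} }}
<!-- (2) map each car to its tire brand; every #show is a separate query -->
{{#arraydefine:brands|{{#arraymap:{{#arrayprint:bluecars}}|,|@@|{{#show:@@|?Tire brand|link=none}} }} }}
<!-- (3) de-duplicate and sort, then print the brand groups -->
{{#arrayunique:brands}}{{#arraysort:brands|asc}}
{{#arrayprint:brands|, }}
```

Because step (2) issues one #show lookup per car, parse time grows with the number of cars returned, which matches the scaling behavior described above.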
From: Hans O. <han...@gm...> - 2014-04-07 11:01:42
|
Hi Tom,

I have read this several times and I believe I still fail to understand the question. Could you help me to do so? Here is what I have understood so far, using your car/tire example. There are many pages on cars, and you want to deal with the subset of those that have the color blue. That would imply an #ask for color=blue. Then you would like to get a list that groups them by tire brand, with these groups in alphabetical order. To me it would seem that something like

 ...
 |sort=tire-brand, car
 |order=asc
 ...

would result in a list that has all blue cars, beginning with the group of the "AAA" (i.e. alphabetically first) brand, and within that group all pages ("car" being the mainlabel) in alphabetical order, with those without a tire brand at the top of the list. Then of course, rather than the long default table format, you might use a customized format via format=template. But that would be rather straightforward, so what am I missing?

The second general piece of advice would be to check whether you might solve your issue by using cached concepts (https://semantic-mediawiki.org/wiki/Help:Concept_caching). Concepts contain results of queries that are constantly kept up to date ("preprocessed") and thus are already there when the actual #ask query runs. This might help limit time consumption during the actual ask query. So in general you might create a concept that asks for one property and outputs the contents of another. I understand that this might involve a lot of concepts, e.g. one for each of the 70,000 tire manufacturers, so one would think about automation using the API (a script that keeps track of the tire manufacturer list and generates a standardized concept for new entries).

Best,
Hans |
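Spelled out as a full query, Hans's suggestion might look like this (a sketch only; "Color" and "Tire brand" are placeholder property names):

```wikitext
{{#ask: [[Category:Car]] [[Color::blue]]
 |?Tire brand
 |mainlabel=Car
 |sort=Tire brand
 |order=asc
 |format=table
}}
```

The sort= parameter orders rows by the printout property, so cars sharing a brand end up adjacent even though the result is a flat list rather than true groups.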
From: Tom H. <to...@hu...> - 2014-04-07 14:54:44
|
Hi Hans,

Thanks so much for your time. I think the main issue is the one thing which can't be done easily in SMW: grouping pages by a property when the property is not of type Page and is used on more than one page.

 ...
 |sort=tire-brand, car
 |order=asc
 ...

would seem logical, but since tire-brand isn't a Page-type property and is used on more than one page, there is no way to group the cars (the pages) by tire-brand. I need to ask individually: what cars (pages) use tire-brand X and are the color blue. If I use a table output, I can see:

 Tire Brand | Car
 Foobar     | Foo
 Foobar1    | Foo1
 Foobar     | Foo2
 Foobar2    | Foo3

Using the JS sort on the Tire Brand column, they will realign when clicked the first time to:

 Tire Brand | Car
 Foobar     | Foo
 Foobar     | Foo2
 Foobar1    | Foo1
 Foobar2    | Foo3

But will not give:

 Tire Brand | Car
 -----------------
 Foobar     | Foo
            | Foo2
 -----------------
 Foobar1    | Foo1
 -----------------
 Foobar2    | Foo3

I tried concepts from the low end. In my example it would be color: all cars with the color blue. On the true application side, currently there would be 1,100 colors, i.e. 1,100 concept pages for the colors. With concepts this cut the time down to about 16-18 secs from 30 secs. This of course is tied to the number of cars to return: 50 cars to group, 1 to 2 seconds; 300 cars, 12 seconds. To complicate matters, the car colors could change as cars are added, e.g. blue, dark-blue, light-blue, sky-blue to a new set: blue, dark-blue, dark-sky blue, light-blue, light-sky blue, sky-blue.

Tom

------ Original Message ------
From: "Hans Oleander" <han...@gm...>
To: "Tom Hutchison" <to...@hu...>
Cc: "SMW List" <sem...@li...>
Sent: 4/7/2014 7:01:35 AM
Subject: Re: [Semediawiki-user] Scaling up SMW data |
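For reference, the concept approach discussed above amounts to a page in the Concept: namespace (say Concept:Blue cars) containing something like this (a sketch; "Color" is a placeholder property name):

```wikitext
{{#concept: [[Category:Car]] [[Color::blue]]
 |All cars with the color blue
}}
```

An #ask elsewhere can then select [[Concept:Blue cars]] instead of repeating the color condition, and with concept caching enabled on the server the membership list is precomputed rather than evaluated at parse time.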
From: Patrick L. <pat...@st...> - 2014-04-10 02:33:12
|
On 04/05/2014 02:39 AM, Tom Hutchison wrote:
> At this point I'm concerned SMW will not be able to handle it in a
> suitable browser render time.

Some anecdotal evidence - we're using SMW as a project management tool. It works quite well for adding new structures (oh, this customer has three addresses we need to keep track of?). The templates, at some point, become brutally complicated, especially if you want to do counting or arithmetic. (Project management, so there are things like the cost of different parts of the project nicely added up so we can generate an invoice with almost no effort.)

Performance is quite horrible; the average for a "simple" project is ~1.5 seconds server-side. Apparently a 4GHz CPU isn't "fast enough" ;)

On more interesting queries like "show me all projects where the project manager is Benny" the performance craters linearly in the number of projects selected, and there's an evil horrible aarrgh where SMW *quietly* truncates results. So for the PMs with a few dozen to a few hundred projects to show ... well ... the worst case I've seen was about 90 seconds render time. (This query, btw, was generating about forty thousand SQL queries, which is amusingly horrible. The newer storage backend (sqlstore3 or what it's called) reduced that to 'only' ~17,000, so there's no way out of that performance trap.)

The truncation happens because SMW assumes that there will never be more than X results, where X is by default something silly like 100. If you increase it, the results look good until you hit some internally hardcoded limits. Let's just say I was very NOT AMUSED when I figured out that it will never return more than 10,000 results at once, so now I have scripts that pull in batches of 1,000 with increasing offset until the result is empty. Hnuurgh. (How else am I supposed to generate statistics over all projects and do other maintenance?)

At this scale concurrency becomes a severe problem: if there are multiple edits within a short enough timeframe (say, a second), the MW data is stable, but the semantic data becomes ... uhm ... random. For example, some projects have a property but are not shown in a query for that property unless you either save them again or trigger a metadata refresh in some other way. So in addition to truncation there are now race conditions, so I force a full rebuild of the semantic data every night. Which has its own set of problems, but this way projects disappear for at most a day.

And updating doesn't work because of subtle bugs in the template rendering between versions. I've filed bugs for those I could track down well enough, but there's still some stuff breaking horribly, so I can't even upgrade. I see no way to improve this situation, thus I've written a custom exporter and am now slowly building a custom app to replace SMW - and so far my prototype never goes above 120 msec render time.

My recommendation thus would be: use SMW for prototyping; once you have a proper data model, create a custom application. Don't even try scaling up above maybe 10k items all in all. The more properties you use, the more racey and slow it'll become, so the best feature is also your worst enemy.

That being said, (S)MW isn't all bad; it's just not built with either performance or correctness in mind. So if you don't need exact results but just something similar enough, it's good ;)

(Reminds me: in some situations, if a query is malformed you just get some random crap returned. No proper error message, just "here's a bag of things, maybe something in it is accidentally correct" ... That's almost funny!)

Have a nice day,

Patrick |
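The offset-batching workaround Patrick describes boils down to repeating the same query with an increasing offset until nothing comes back. In wiki syntax the two knobs are |limit= and |offset= (the category and property names here are placeholders):

```wikitext
{{#ask: [[Category:Project]]
 |?Has project manager
 |limit=1000
 |offset=0
}}
<!-- next batch: the same query with |offset=1000, then |offset=2000, ...
     stop when a batch comes back empty -->
```

In practice such batching is usually driven by an external script against the API rather than written out by hand, since the number of batches isn't known in advance.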
From: Nischay N. <nis...@gm...> - 2014-04-10 05:33:53
|
On Thu, Apr 10, 2014 at 7:49 AM, Patrick Lauer <pat...@st...> wrote:
> (This query, btw, was generating about forty thousand SQL queries, which
> is amusingly horrible. The newer storage backend (sqlstore3 or what it's
> called) reduced that to 'only' ~17000, so there's no way out of that
> performance trap)

Could you submit your SQL query logs, so that the developers could have a look at them?

> I see no way to improve this situation, thus I've written a custom
> exporter and now slowly build a custom app to replace SMW - and so far
> my prototype never goes above 120msec render time.

Out of curiosity, did you attempt to fix SMW before building your own app? Also, did you consider switching to an RDF store? Is your app open source?

--
Cheers,
Nischay Nahata |
From: Jeroen De D. <jer...@gm...> - 2014-04-10 12:21:34
|
Hey,

> It'd also be interesting to profile the code. The structure of
> mediawiki is pretty modular, so you'd have thought it not impossible
> to take critical sections and code them in something faster than php
> or python.

Profiling is good. However, neither the MediaWiki code nor the SMW code is anywhere near modular, MediaWiki likely being the bigger offender. They both follow the big ball of mud pattern and have a high degree of inseparability. Extracting or replacing any "component" (not that there are nicely bounded components) is going to be anything but easy.

Cheers

--
Jeroen De Dauw - http://www.bn2vs.com
Software craftsmanship advocate
Evil software architect at Wikimedia Germany
~=[,,_,,]:3 |
From: Patrick L. <pat...@st...> - 2014-04-10 05:43:25
|
On 04/10/2014 01:33 PM, Nischay Nahata wrote:
> Could you submit your SQL query logs, so that the developers could have a
> look at it?

Yes, I did open at least one bug.

> Out of curiosity, did you attempt to fix SMW before building your own app?
> Also did you consider switching to a RDF store?
> Is your app open source?

I did try to figure out where SMW goes wrong, and gave up. I can't even properly *upgrade* to the newest version because the SMW template parsing diverges in ever so subtle ways (e.g. in one location "-" is now no longer valid, and there are a few other breakages I can't even figure out).

RDF stores sound nice, but most of them are dysfunctional - I did get 4store up and running, but then SMW doesn't like talking to it and basic queries fail. Very amusing. And I lack the optimism to hope that it works "better" than the native storage backend.

My app, so far, is an internal prototype. It's too custom to make sense for other people, so there's no reason to open-source it. (Currently mostly flask + sqlalchemy, plus some shell scripts for actions like "create a project folder on the file server".)

Amusing thing: about one in 4,000 requests to the wiki for extracting data fails with "not logged in", but doesn't give a proper HTTP error code - so now I grep for "DOCTYPE" in the exported data to find any broken pages, then manually fetch those too. Why does it fail? No one knows, but at least things are not too boring :)

Have fun,

Patrick |
From: Peter B. <pet...@kc...> - 2014-04-10 06:21:27
|
>> So for the PMs with a few dozen to a few hundred projects to show ...
>> well ... worst case I've seen was about 90 seconds render time.
>> (This query, btw, was generating about forty thousand SQL queries, which
>> is amusingly horrible. The newer storage backend (sqlstore3 or what it's
>> called) reduced that to 'only' ~17000, so there's no way out of that
>> performance trap)

I'd be interested to know what these queries were looking for; it seems a lot of work for a relatively small number of projects. I wonder if storing partial results in each project would make sense, or structuring the data differently.

It'd also be interesting to profile the code. The structure of mediawiki is pretty modular, so you'd have thought it not impossible to take critical sections and code them in something faster than php or python.

I'm not sure how well-threaded php is either - it might be a 4GHz CPU, but if it has four or eight cores, then having your time-critical processes well threaded could improve the overall time considerably.

>> Out of curiosity, did you attempt to fix SMW before building your own app?
>> Also did you consider switching to a RDF store?
>> Is your app open source?
> I did try to figure out where SMW goes wrong, and gave up.
>
> RDF stores sound nice, but most of them are dysfunctional - I did get
> 4store up and running, but then SMW doesn't like talking to it and basic
> queries fail. Very amusing. And I lack the optimism to hope that it
> works "better" than the native storage backend.

You can make your RDF store memory resident, which can make it considerably faster. |
From: Patrick L. <pat...@st...> - 2014-04-10 07:01:19
|
On 04/10/2014 02:20 PM, Peter Brooks wrote:
> I'd be interested to know what these queries were looking for, it
> seems a lot of work for a relatively small number of projects. I
> wonder if storing partial results in each project would make sense, or
> structuring the data differently.

Yes, it's quite suboptimal. Every single property lookup is a query of its own, so if you look at, say, "id | car model | name of driver", that's already three queries for display, plus multiple queries to figure out what items you should display. With ~15k items, and about 1k items to display, that's going to hurt :)

> It'd also be interesting to profile the code. The structure of
> mediawiki is pretty modular, so you'd have thought it not impossible
> to take critical sections and code them in something faster than php
> or python.

Or maybe do less work. Instead of rewriting in ${whatever}, you could profile it and see *why* it spends so much time there.

> I'm not sure how well-threaded php is - it might be a 4GHz CPU,
> but if it has four or eight cores then having your time-critical
> processes well threaded could improve the overall time considerably.

... it's single-threaded, but a multi-core CPU still benefits if there are multiple requests in parallel.

> You can make your RDF store memory resident, which can make it
> considerably faster.

Blargh :D The data set fits into memory already, so mysql is serving everything nicely as it is (~1200 qps for a single thread). (Amusing side-effect: moving mysql to its own server SLOWED DOWN things considerably because of the ethernet latency!) The only way to "go fast" in this case is just "do less". |
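One wiki-level way to "do less", related to Peter's idea of storing partial results in each project, is to compute expensive values once at save time and store them as a plain property with #set, so display queries just read a ready-made value instead of recomputing template arithmetic. A sketch, assuming the ParserFunctions extension; the property and parameter names are placeholders:

```wikitext
<!-- In the project template: sum the cost parts once when the page is
     saved, and store the result as the property "Invoice total" -->
{{#set: Invoice total = {{#expr: {{{parts cost|0}}} + {{{labor cost|0}}} }} }}
```

A report page can then print ?Invoice total directly. Unlike storing #ask results in properties (the trap Tom ran into earlier in this thread), the stored value here depends only on the page's own parameters, so it never goes stale when other pages change.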
From: Martin G. <mar...@ce...> - 2014-04-10 15:36:39
|
Dear List!

I'm looking for a workaround to use "values from" in SF with several sources, e.g. one category and one namespace (like a join), or values from two different categories. I tested several possibilities, but SF always uses the last "values from" definition in my form declaration for autocompletion.

Thanks for help!
Martin |
From: Ad S. v. S. <ad....@gm...> - 2014-04-10 17:17:16
|
Martin, did you have a look at concepts? It seems to me that you can do what you want with them. http://semantic-mediawiki.org/wiki/Help:Concepts.

Ad

Sent from my iPhone |
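Concretely, the concept route Ad points to would be a Concept: page that joins the two sources with OR, which Semantic Forms can then use for autocompletion via "values from concept=". A sketch; the concept name, the category, and the "Venue" namespace are all hypothetical stand-ins for the real ones:

```wikitext
<!-- On the page Concept:Locations -->
{{#concept: [[Category:City]] OR [[Venue:+]]
 |Pages usable as a location
}}

<!-- In the form definition, pointing the field at that concept -->
{{{field|location|input type=combobox|values from concept=Locations}}}
```

The concept acts as a single saved query over both sources, so the form only needs one "values from" definition.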
From: Martin G. <mar...@ce...> - 2014-04-14 09:19:44
|
Dear Ad!

Thank you very much! It works fine.

Bye,
Martin

__________________________________
From: Ad Strack van Schijndel <ad....@gm...>
To: Martin Gruner <mar...@ce...>
Cc: Semantic MediaWiki users <sem...@li...>
Date: 10.04.2014 19:17
Subject: Re: [Semediawiki-user] Semantic Forms: values from |