From: Markus K. <ma...@se...> - 2012-11-22 09:23:14
|
Hi,

I would like to ask about this:

http://semantic-mediawiki.org/wiki/Help:Max_format

I am afraid to say that this idea seems to be fundamentally broken. The
above page seriously suggests finding the largest population number in
the wiki by querying for a list of *all cities with and without
population* and invoking PHP code that scans through this list to find
the maximum (this is what format=max does, AFAIK). The query to do this is:

    {{#ask: [[Category:City]]
    | ?Population
    | format=max
    }}

This is an extremely slow method of producing wrong results (the results
will be wrong as soon as there are enough pages in the wiki that the
page with the maximum value falls beyond the default query limit when
results are ordered alphabetically).

What one would do instead is ask for the one result that has the
largest value right away, like this:

    {{#ask: [[Category:City]]
    | ?Population
    | sort=population
    | order=DESC
    | limit=1
    | format=max
    }}

The max format is redundant in this case, since one could also just do

    {{#ask: [[Category:City]]
    | ?Population=
    | mainlabel=-
    | sort=population
    | order=DESC
    | limit=1
    }}

This has the big advantage that one can also apply further output
formatting to the resulting number, e.g., to get it in a plain format
without any beautification.

I only noticed these problems because there seem to be cases where PHP
runs out of time/memory due to users following the above query
anti-pattern [1]. My conclusion would be: let's drop max/min as soon as
possible and change the documentation to give the efficient query
pattern I gave above.

Markus

[1] https://bugzilla.wikimedia.org/show_bug.cgi?id=42347
|
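The failure mode described in the message above can be illustrated with a toy model. This is a minimal Python sketch, not SMW or SRF code: the city names, population figures, and the `DEFAULT_LIMIT` value are all invented, and the two functions are hypothetical stand-ins for "scan the first page of results for the maximum" and "sort by value and keep one result".

```python
# Toy model of the two query patterns; this is not SMW/SRF code.
# All city names and numbers are invented for illustration.
CITIES = {
    "Alphaville": 250_000,
    "Betatown": 330_000,
    "Gammaburg": 1_080_000,
    "Omegaville": None,      # a page in the category with no Population value
    "Zetapolis": 4_000_000,  # largest value, but sorts last by title
}
DEFAULT_LIMIT = 3  # hypothetical stand-in for the wiki's default query limit


def max_over_first_page(cities, limit=DEFAULT_LIMIT):
    """Pattern 1: take the first `limit` pages by title, then scan for the max."""
    first_page = sorted(cities.items())[:limit]
    values = [value for _, value in first_page if value is not None]
    return max(values) if values else None


def sorted_limit_one(cities):
    """Pattern 2: sort by value, descending, and keep a single result."""
    with_value = [(value, name) for name, value in cities.items() if value is not None]
    with_value.sort(reverse=True)
    return with_value[0][0] if with_value else None


local = max_over_first_page(CITIES)   # only sees the first three titles
overall = sorted_limit_one(CITIES)    # sees the largest value directly
```

With this data the first pattern never looks at "Zetapolis", so it reports the largest value among the first three titles only, while the second pattern returns the true maximum without scanning the rest.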
From: Yury K. <kat...@gm...> - 2012-11-22 11:23:00
|
The premises are clear:
(1) the current implementation of the max parser function is slow and
(2) there is a workaround for making max queries quicker.

The conclusion is not clear: "let's drop the max ASAP".

It's not that hard to replace the current implementation of MAX format
with the faster one and save the backward compatibility there.

-----
Yury Katkov, WikiVote

_______________________________________________
Semediawiki-devel mailing list
Sem...@li...
https://lists.sourceforge.net/lists/listinfo/semediawiki-devel
|
From: Markus K. <ma...@se...> - 2012-11-23 08:05:15
|
On 22/11/12 11:22, Yury Katkov wrote:
> The premises are clear:
> (1) the current implementation of the max parser function is slow and
> (2) there is a workaround for making max queries quicker.
>
> The conclusion is not clear: "let's drop the max ASAP".
>
> It's not that hard to replace the current implementation of MAX format
> with the faster one and save the backward compatibility there.

It would be hard to do this (cleanly) in an automated way, because it
requires changes to many other query parameters that take effect before
format (format is the last thing to play a role in query evaluation).
Also, as Jeroen illustrated, there are cases where the user really wants
to have a "local" maximum, and we would not be able to recognise these.
So let us just keep it as it is but improve the docs (maybe this is what
you meant).

Markus
|
From: Jeroen De D. <jer...@gm...> - 2012-11-22 14:44:31
|
Hey,

Markus, I agree with you that in most cases you're better off doing a
sorted query, and think this should be reflected in the documentation.
However, I would not be too quick to conclude that the max format is
completely useless. For one thing, it supports recursion when dealing
with containers. People might also be relying on the result being the
max or min of the result set and not of everything that matches the
query, i.e. querying the 10 last edited cities and getting the biggest
population of those.

So I suggest making sure the documentation is clear enough that people
don't misuse it, while keeping the printer. After all, it's in SRF and
can easily be disabled. And to my knowledge it has not been causing
problems serious enough to motivate removal.

Cheers

--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil.
--
|
From: Markus K. <ma...@se...> - 2012-11-23 08:02:47
|
On 22/11/12 14:44, Jeroen De Dauw wrote:
> Hey,
>
> Markus, I agree with you that in most cases you're better off doing a
> sorted query, and think this should be reflected in the documentation.
> I however would not be too quick to conclude the max format is
> completely useless. For one thing it supports recursion when dealing
> with containers.

Interesting. How does that work?

> People might also be relying on the result being the
> max or min of the result set and not everything that matches the
> query, i.e. querying the 10 last edited cities and getting the biggest
> population of those.

Nice example. This is also a good example to illustrate how max really
works (finding the max within the displayed results as opposed to
finding the max in the database).

> So I suggest making sure the documentation is clear enough so people
> don't misuse it while keeping the printer. After all, it's in SRF and
> can easily be disabled. And it has not been causing serious problems
> motivating removal to my knowledge.

Ok, agreed. We can achieve all of this with good documentation. Maybe it
is even better to have max as an anchor for this documentation -- people
who don't want max (but don't know it yet) will likely be looking there.

Markus
|
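The distinction drawn above, a maximum within the displayed results versus a maximum over everything in the database, can be sketched in a few lines. This is a hedged Python illustration only: the page titles, dates, and populations are invented, the helper names are hypothetical, and a window of 3 pages stands in for the "10 last edited cities" of the example.

```python
from datetime import date

# (title, last edited, population); every value here is invented.
PAGES = [
    ("Alphaville", date(2012, 11, 20), 250_000),
    ("Betatown", date(2012, 1, 5), 9_000_000),   # largest, but edited long ago
    ("Gammaburg", date(2012, 11, 21), 1_080_000),
    ("Deltaport", date(2012, 11, 22), 330_000),
]


def max_of_recently_edited(pages, n):
    """'Local' maximum: largest population among the n most recently edited pages."""
    recent = sorted(pages, key=lambda page: page[1], reverse=True)[:n]
    return max(population for _, _, population in recent)


def global_max(pages):
    """Maximum over every page, regardless of which results are displayed."""
    return max(population for _, _, population in pages)


local_result = max_of_recently_edited(PAGES, 3)
global_result = global_max(PAGES)
```

Here the two answers differ on purpose: "Betatown" holds the overall maximum but is not among the three most recently edited pages, so a sorted `limit=1` query on population could not reproduce the local result.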
From: Jeroen De D. <jer...@gm...> - 2012-11-23 10:16:21
|
Hey,

> Interesting. How does that work?

Use the source, Markus... :)

https://github.com/wikimedia/mediawiki-extensions-SemanticResultFormats/blob/master/formats/math/SRF_Math.php#L83

Cheers

--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil.
--
|
From: Markus K. <ma...@se...> - 2012-11-23 10:19:48
|
On 23/11/12 10:15, Jeroen De Dauw wrote:
> Hey,
>
>> Interesting. How does that work?
>
> Use the source, Markus... :)
>
> https://github.com/wikimedia/mediawiki-extensions-SemanticResultFormats/blob/master/formats/math/SRF_Math.php#L83

Not very helpful. I know I can dig into source code to find out how
everything works. If somebody already knows what this feature does, I am
still happy to read a (short) synopsis. Would also make sense to
document this somewhere.

Markus
|