From: S P. <in...@sk...> - 2008-07-01 03:20:32
|
If a page gives multiple values for a property, which one wins when you sort on that property?

In my local tests on SMW 1.1.2 I can't see any pattern to it. On http://sandbox.semantic-mediawiki.org/wiki/Test_sorting (version 1.2f-SVN) it seems to sort using the "first" value on the page for the property. That page sorts by Author, and many papers list their author annotations alphabetically. But this paper appears in the 'B's despite having some 'A' authors:

Using and Combining RDF Vocabularies for Expert Finding
    Boanerges Aleman-Meza
    Lyndon JB Nixon
    Axel Polleres <--- !!
    John G. Breslin
    Harold Boley
    Anna V. Zhdanova <--- !!
    Malgorzata Mochol
    Uldis Bojars

If I sort descending, again SMW seems to sort on the first value it finds for the property.

==> Is this a bug? You could define a sort order, e.g. if ascending, sort based on the lowest of the property's values on the page.

A while ago Markus told me the order of property values for a page retrieved from the database was indeterminate, so values might not be returned in page order, so there's no "first" property value on a page.

==> Should http://semantic-mediawiki.org/wiki/Help:Semantic_search#Sorting_results just say "If pages have multiple values for the property, their sort order is undefined."? Or I could just not say anything ;-)

Thanks for any insight!

--
=S Page
|
From: Jon L. <dat...@gm...> - 2008-07-01 10:01:11
|
Since the order in which a property's values appear on a given page is arbitrary, I would not want to use it to help resolve page sorting, even if it is possible to do so.

IMHO, the best approach is to sort the property's values on each page using the same approach that you intend for sorting the pages (e.g., if sorting pages in ascending order, sort the multiple values on a page in ascending order), then sort the pages based on the first value in this order, with subsequent values being used to break ties.

--
Jonathan "Dataweaver" Lang
|
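A minimal Python sketch of the ordering proposed above, for illustration only. It is not SMW code; the function name `order_pages` and the sample pages are made up. It relies on Python comparing lists element by element, so later values break ties automatically. How pages with no value at all should be placed is left aside here (an empty list simply compares as smallest).

```python
from typing import Dict, List


def order_pages(pages: Dict[str, List[str]], descending: bool = False) -> List[str]:
    """Order page titles by their property values, as proposed in the message above.

    Each page's values are first sorted in the requested direction; pages are
    then compared by those sorted lists, element by element.
    """
    def key(title: str) -> List[str]:
        return sorted(pages[title], reverse=descending)

    return sorted(pages, key=key, reverse=descending)


# Hypothetical sample data: page title -> values of the sorted property.
PAGES = {
    "Paper X": ["Breslin", "Aleman-Meza"],
    "Paper Y": ["Boley"],
    "Paper Z": ["Aleman-Meza", "Bojars"],
}

print(order_pages(PAGES))                   # ['Paper Z', 'Paper X', 'Paper Y']
print(order_pages(PAGES, descending=True))  # ['Paper X', 'Paper Y', 'Paper Z']
```

Papers X and Z share the lowest value, so the second value decides between them in the ascending case.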
From: Mov GP 0 <mo...@gm...> - 2008-07-01 10:55:23
|
Hello,

I think the problem with sorting this is that the rows are not atomic. Instead of having a table of the form

|-
| Property1 || Property2.1, Property2.2, Property2.3, Property2.4
|-

the output should rather be

|-
| Property1 || Property2.1
|-
| Property1 || Property2.2
|-
| Property1 || Property2.3
|-
| Property1 || Property2.4
|-

This would allow proper sorting. To not break anything, I suggest a new parameter, e.g. called "group":

{{#ask:
 [[Author::+]]
 |?Author
 |sort=Author
 |group=false
}}

or, more familiar from SQL, "groupby":

{{#ask:
 [[Author::+]]
 |?Author
 |sort=Author
 |groupby=Author
}}

This syntax could resolve the sorting problem.

ys, MovGP0

_______________________________________________
Semediawiki-devel mailing list
Sem...@li...
https://lists.sourceforge.net/lists/listinfo/semediawiki-devel
|
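To make the "one value per row" idea concrete, here is a small Python sketch. It only models the suggested output shape; the `group`/`groupby` parameters are a proposal in this thread, not existing SMW options, and `ungrouped_rows` and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple


def ungrouped_rows(pages: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Emit one (page, value) row per property value, sorted by value.

    Ties on the value are broken by page title. Pages with no value
    produce no row at all in this simple model.
    """
    rows = [(title, value) for title, values in pages.items() for value in values]
    return sorted(rows, key=lambda row: (row[1], row[0]))


# Hypothetical sample data: page title -> Author values.
BOOKS = {
    "Book A": ["Smith", "Adams"],
    "Book B": ["Jones"],
}

for title, author in ungrouped_rows(BOOKS):
    print(f"| {title} || {author}")
# | Book A || Adams
# | Book B || Jones
# | Book A || Smith
```

Because every row carries exactly one value, the sort key is unambiguous, at the price of listing "Book A" twice.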
From: Jon L. <dat...@gm...> - 2008-07-01 15:29:38
|
Mov GP 0 wrote:
> [...]
> This syntax could resolve the sorting problem.

This does not resolve the sorting problem, since you're still left with the question of how to handle sorting when grouping multiple values into a single entry. As well, it opens a new can of worms in bringing up the question of whether a given page should be reported once, or once for every value that it has in a given property. It's a fascinating question that deserves discussion; but it has consequences well beyond the issue of sorting.

For instance, take the following query:

{{#ask:
 [[Category:Book]]
 | sort=Author
}}

Note that this query does not display the author for each book found; indeed, it doesn't even guarantee that a given book will name the author(s). Note also that I did not include any sort of grouping or degrouping parameter. What sort of result should this query produce?

As written, I believe that it should list each book on the wiki exactly once. The question at hand is the order in which the books should be presented. There are actually two issues at hand here: what to do with multiple values of a property on a page, and what to do with the absence of values for a property on a page. I've already stated my proposal for resolving the first issue; for the second issue, pages without the sorted property should probably be listed after pages with it, unless you explicitly state otherwise.

--

Now, let's look at "grouping":

{{#ask:
 [[Category:Book]]
 | ?Author
 | duplicate=Author
}}

My proposal here is that "duplicate" causes the page to show up in the results once if it has zero or one Author, and once per Author if it has more than one, treating each entry as if it only had one Author. Note that I'm not sorting by Author in this query; you don't have to sort by a property in order to duplicate on it. Conversely, while I _am_ having it list the Author for each result, you don't have to do that, either. This is why I say that the subject should be addressed separately: it's largely orthogonal to the sorting problem, with the sole exception that for the case where you're willing to duplicate on the property that you're sorting on, the multiple values issue goes away.

--
Jonathan "Dataweaver" Lang
|
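The two rules in this message (a page with no Author still appears once, and pages lacking the sorted property go last) can be sketched in Python as follows. This is an illustration under assumptions: "duplicate" is only a proposed parameter, and the names `duplicate_on`, `sort_missing_last` and the sample data are invented for the example.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, Optional[str]]  # (page title, single value or None)


def duplicate_on(pages: Dict[str, List[str]]) -> List[Row]:
    """One row per value; a page with no value still yields exactly one row."""
    rows: List[Row] = []
    for title, values in pages.items():
        if values:
            rows.extend((title, value) for value in values)
        else:
            rows.append((title, None))
    return rows


def sort_missing_last(rows: List[Row]) -> List[Row]:
    """Sort rows by value, placing rows without a value after all others."""
    return sorted(rows, key=lambda row: (row[1] is None, row[1] or "", row[0]))


# Hypothetical sample data: page title -> Author values.
BOOKS = {
    "Book A": ["Smith", "Adams"],
    "Book B": [],
    "Book C": ["Jones"],
}

print(sort_missing_last(duplicate_on(BOOKS)))
# [('Book A', 'Adams'), ('Book C', 'Jones'), ('Book A', 'Smith'), ('Book B', None)]
```

The two functions are kept separate on purpose, mirroring the point that duplicating and sorting are largely orthogonal: either can be applied without the other.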
From: Markus K. <ma...@se...> - 2008-07-28 19:07:16
|
After returning from Wikimania in Alexandria, I can contribute my five piastres to close this issue:

As was rightly remarked, SMW does not define the sorting behaviour if a property has many values (or no value). This is documented. If you want to define one special value (the first or whatever) for this purpose, then you just need to make a property for that task and give it only one value per page. Using the new sortkeys can also help in some cases.

The JavaScript live sorting of tables acts on the text as displayed, hence depends on the order of displayed values. One could make that alphabetical if considered useful. In general, we would like to have some more parameters for printouts (e.g. to set a limit on how many values of a multi-valued property should be given). We currently lack some smart syntax for doing this.

In the early days of SMW, we also had one property value per column and line (similar to what Mov GP 0 suggested). But this has funny effects if there are many such columns, since you get all combinations of values -- just as a proper DB is supposed to do it. This was not very useful and hence has been dropped.

Another reason to keep sorting options low is performance. If you multiply rows for each value given in certain columns, then you get much larger result sets to handle (this was one reason early SMWs were slower on querying), and result caching as scheduled for the next release would probably be less effective or more difficult.

So, to make a long story short, we do not intend to extend the sorting options anytime soon. I also feel that the multi-property sorting is already quite complex; think of all those poor, not so technically minded users who must grok all that!

Cheers,

Markus

--
Markus Krötzsch
Semantic MediaWiki http://semantic-mediawiki.org
http://korrekt.org ma...@se...
|
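The "all combinations of values" effect mentioned above is easy to reproduce outside SMW. The following Python sketch is only an illustration of the arithmetic, not of SMW's query engine; the function name and the sample printouts are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple


def one_value_per_row(printouts: Dict[str, List[str]]) -> List[Tuple[str, ...]]:
    """Expand one page into rows holding a single value per column.

    With several multi-valued columns this is a Cartesian product, so the
    row count is the product of the value counts of all columns.
    """
    return list(product(*printouts.values()))


# Hypothetical printouts for a single page.
PAGE = {
    "Author": ["Adams", "Jones", "Smith"],
    "Keyword": ["RDF", "OWL", "SPARQL", "wiki"],
    "Year": ["2008"],
}

rows = one_value_per_row(PAGE)
print(len(rows))  # 12 rows for one page: 3 authors x 4 keywords x 1 year
print(rows[0])    # ('Adams', 'RDF', '2008')
```

A single page with three authors and four keywords already yields twelve rows, which is the result-set growth described in the message.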
From: Dan B. <dan...@gm...> - 2009-03-23 13:44:28
|
2008/7/28 Markus Krötzsch <ma...@se...>:
> [...]
> As was rightly remarked, SMW does not define the sorting behaviour if a
> property has many values (or no value). This is documented.
> [...]

Sorry to dig up this old thread, but I was searching back over the list to see if the behaviour that I ran into was documented.

I have several pages with several properties, some properties with multiple values. When I #ask for pages (default, tabular output), I get one row of data per page, with multiple values per page grouped into separate lines of one cell. However, when I sort on a property that can have more than one value, I see some pages turning up multiple times in the result. The page occurs once in the table for each unique value of the property, in the right sort position for that value.

Because that's a bit cryptic, here is an example:

Sorting on ID (or without sorting):

+----+-----------+-----------+
| ID | Property1 | Property2 |
+----+-----------+-----------+
| A  | P         | X         |
+----+-----------+-----------+
| B  | Q         | W         |
|    |           | Y         |
+----+-----------+-----------+
| C  | R         | Z         |
+----+-----------+-----------+
...

Sorting on Property2:

+----+-----------+-----------+
| ID | Property1 | Property2 |
+----+-----------+-----------+
| B  | Q         | W         |
|    |           | Y         |
+----+-----------+-----------+
| A  | P         | X         |
+----+-----------+-----------+
| B  | Q         | W         |
|    |           | Y         |
+----+-----------+-----------+
| C  | R         | Z         |
+----+-----------+-----------+
...

Is this now the agreed correct behaviour? It seems reasonable, but the above discussion was never resolved, so I thought I'd ask.

At first I found this behaviour confusing, but actually it fits what I need quite well. The only slightly annoying thing is that every value of the multi-valued property being sorted on is shown in each of the duplicated rows (in an arbitrary order). If you go the whole hog and duplicate the row, I'd rather see something like this:

Sorting on Property2:

+----+-----------+-----------+
| ID | Property1 | Property2 |
+----+-----------+-----------+
| B  | Q         | W         |
+----+-----------+-----------+
| A  | P         | X         |
+----+-----------+-----------+
| B  | Q         | Y         |
+----+-----------+-----------+
| C  | R         | Z         |
+----+-----------+-----------+
...

(It's just a bit neater.)

Thanks,
Dan.
|
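The difference between the observed and the preferred table can be modelled in a few lines of Python. This is a guess at the mechanism, namely that sorting pairs each page with each of its sort values (as a database join would) while the cell still prints every value; it is not taken from SMW's source. All names are hypothetical, and the data is the Property2 column from the example above.

```python
from typing import Dict, List, Tuple

# Property2 values per page ID, as in the tables above.
PAGES: Dict[str, List[str]] = {"A": ["X"], "B": ["W", "Y"], "C": ["Z"]}


def sorted_pairs(pages: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """All (sort value, page) pairs in ascending order of the sort value."""
    return sorted((value, page) for page, values in pages.items() for value in values)


def observed_rows(pages: Dict[str, List[str]]) -> List[Tuple[str, List[str]]]:
    """One row per sort value, but each row still shows all of the page's values."""
    return [(page, pages[page]) for _, page in sorted_pairs(pages)]


def preferred_rows(pages: Dict[str, List[str]]) -> List[Tuple[str, List[str]]]:
    """One row per sort value, showing only the value that placed the row."""
    return [(page, [value]) for value, page in sorted_pairs(pages)]


print(observed_rows(PAGES))
# [('B', ['W', 'Y']), ('A', ['X']), ('B', ['W', 'Y']), ('C', ['Z'])]
print(preferred_rows(PAGES))
# [('B', ['W']), ('A', ['X']), ('B', ['Y']), ('C', ['Z'])]
```

Both variants place the rows identically; they differ only in what the sorted column displays.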