We have had similar requests a couple of times already. However, I think that having only the distribution (which is actually a count per group) is often not sufficient: instead of the pure number, users often want the percentage compared to all values, e.g. 42.9% instead of 3 (out of 7) occurrences. Other aggregates naturally come to mind as well, but they are more complicated, since they aggregate actual values rather than just the number of "rows".

Nevertheless, the simple solution you propose should, IMHO, be realized in a way that requires no custom coding in query printers. If I am not mistaken, the jqPlot result formats expect a label and a numeric value for displaying the charts. If your Distributable code modified the query results to contain only the labels and the numbers, then they could also be rendered by all other result formats and no additional code would be needed. The aggregation would be a kind of post-processing of the query results before they are passed to the result printers, turning a one-column query with n rows and m distinct values into a two-column query with m rows.

Would this be something your code could support?
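
To make the idea concrete, here is a minimal sketch of such a post-processing step. It is written in Python rather than SMW's PHP, and the function name `aggregate_rows`, the `as_percentage` switch and the tuple-based row layout are assumptions made for illustration only, not existing SMW APIs.

```python
from collections import Counter

def aggregate_rows(rows, as_percentage=False):
    """Turn one-column rows into two-column (label, number) rows.

    Hypothetical helper: 'rows' is a list of 1-tuples, one per result row.
    """
    # Count how often each value occurs, keeping first-seen order.
    counts = Counter(row[0] for row in rows)
    total = sum(counts.values())
    aggregated = []
    for label, count in counts.items():
        # Either the raw count or the share of all rows, in percent.
        number = round(100 * count / total, 1) if as_percentage else count
        aggregated.append((label, number))
    return aggregated

# The example result set from the quoted mail: n = 7 rows, m = 4 distinct values.
rows = [("foo",), ("bar",), ("baz",), ("foo",), ("bar",), ("bar",), ("ohi",)]
print(aggregate_rows(rows))
print(aggregate_rows(rows, as_percentage=True))
```

Because the output is again a plain list of rows, any printer that can show two columns could render it without knowing that an aggregation took place.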


On 08.11.2011 15:08, Jeroen De Dauw wrote:
I have implemented general support for value distributions in result formats in SMW. This email explains this feature and is meant to gather feedback on it before SMW 1.7 is released.

== Goal ==

Allow visualizing how many times each value in a result occurs, i.e. allow for creating value distributions.

For example, this result set: foo bar baz foo bar bar ohi
Will be turned into
* bar (3)
* foo (2)
* baz (1)
* ohi (1)

This can then be displayed in chart formats, with the value as label and the occurrence count as value. Although the most obvious use for this is in charts, it can really be used with any format.
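
The counting step for the example above can be sketched in a few lines of Python. This only illustrates the concept and is not the SMW code; the function name `value_distribution` is made up for the example.

```python
from collections import Counter

def value_distribution(values):
    """Return (value, count) pairs, most frequent first."""
    # most_common() orders by count, descending; values with equal
    # counts stay in the order in which they were first seen.
    return Counter(values).most_common()

results = ["foo", "bar", "baz", "foo", "bar", "bar", "ohi"]
for value, count in value_distribution(results):
    print(f"* {value} ({count})")
```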

== Current implementation: how to use it ==

Each format needs to add support for this functionality before you'll be able to use it to visualize value distributions. Right now only jqplotbar and jqplotpie make use of it. All formats that support this functionality accept 3 additional parameters:

* distribution (on/off) - whether a value distribution should be calculated and shown instead of the regular results.
* distributionsort (asc/desc/none) - the sort order of the values, by occurrence count.
* distributionlimit (positive whole number) - the maximum number of values to visualize.
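
How the sort and limit parameters might interact can be sketched as follows. This is an assumption-laden Python illustration, not the SMW implementation: the function name is hypothetical, and applying the sort before the limit is assumed here because it matches the "10 countries with most matching cities" behaviour described in this mail.

```python
def apply_distribution_options(counts, sort="none", limit=None):
    """Apply distributionsort / distributionlimit style options.

    'counts' maps each value to its occurrence count (hypothetical layout).
    """
    items = list(counts.items())
    if sort == "asc":
        items.sort(key=lambda item: item[1])
    elif sort == "desc":
        # The sort is stable, so equal counts keep their original order.
        items.sort(key=lambda item: item[1], reverse=True)
    elif sort != "none":
        raise ValueError("sort must be 'asc', 'desc' or 'none'")
    if limit is not None:
        if limit < 1:
            raise ValueError("limit must be a positive whole number")
        # Assumption: the limit is applied after sorting.
        items = items[:limit]
    return items

counts = {"foo": 2, "bar": 3, "baz": 1, "ohi": 1}
print(apply_distribution_options(counts, sort="desc", limit=2))
```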

This example will get the countries the matching cities are located in, count the occurrences of each, and display this as a pie chart. Note the use of the mainlabel parameter. If this is not done, the cities themselves will also be put into the value distribution.

{{#ask: [[Category:Locations]] [[Has location type::City]]
| ?Located in
| format=jqplotpie
| distribution=on
| mainlabel=-
| limit=500
}}

This example will do the same query, but will only show the 10 countries with the most matching cities, in descending order.

{{#ask: [[Category:Locations]] [[Has location type::City]]
| ?Located in
| format=jqplotpie
| distribution=on
| distributionsort=desc
| distributionlimit=10
| mainlabel=-
| limit=500
}}

You can see these examples and 2 others working on the mapping documentation wiki, making use of the example semantic data there: http://mapping.referata.com/wiki/Value_distribution_examples

== Implementation details (technical) ==

After looking into several options I decided to implement this as a result printer class deriving from SMWResultPrinter. This requires changes to each format that wants to support this behaviour, but makes those changes relatively easy. This approach seems like a good balance between making this functionality available as easily as possible and staying sane.

This class is called SMWDistributablePrinter and can be found here: http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/SemanticMediaWiki/includes/queryprinters/SMW_QP_Distributable.php?view=markup

Example jqplotpie implementation: http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/SemanticResultFormats/jqPlot/SRF_jqPlotPie.php?view=markup
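
The general shape of this architecture (a shared base class that does the counting, with each format opting in by subclassing it) can be illustrated with the following Python sketch. It is not a translation of SMWDistributablePrinter; all class and method names below are hypothetical, and the real PHP classes linked above are the authoritative reference.

```python
from abc import ABC, abstractmethod
from collections import Counter

class DistributablePrinter(ABC):
    """Hypothetical base class: handles the distribution, delegates output."""

    def render(self, values, params):
        # Shared logic lives here, so each format only implements the hooks.
        if params.get("distribution") == "on":
            return self.format_distribution(Counter(values).most_common())
        return self.format_values(values)

    @abstractmethod
    def format_distribution(self, pairs):
        """Render (label, count) pairs."""

    @abstractmethod
    def format_values(self, values):
        """Render the regular, non-aggregated values."""

class TextListPrinter(DistributablePrinter):
    """Hypothetical format that opts in by subclassing the base class."""

    def format_distribution(self, pairs):
        return "\n".join(f"* {label} ({count})" for label, count in pairs)

    def format_values(self, values):
        return "\n".join(f"* {value}" for value in values)

printer = TextListPrinter()
print(printer.render(["foo", "bar", "bar"], {"distribution": "on"}))
```

The trade-off is the one described above: every format has to be changed once to derive from the shared class, but after that it gets the counting logic for free.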

== Request for comments ==

Feedback is welcome. The main question for users is what names the parameters should use. Right now they all start with "distribution", but there might be a better (and shorter) name. From developers I'd like to know if you agree with this architecture.


