Thread: [SMW-devel] Reminder of Architecture Overview update

[SMW-devel] Reminder of Architecture Overview update

From: Samuel L. <sam...@ri...> - 2011-09-25 04:18:27

Sorry for repeating this, but I wanted to remind everyone of the need to update 
the rest of the Architecture Overview article [1]. I guess that even 
just updating the text that is there (there are two sections not updated 
to 1.6) would go a long way?

The problem now is that even parts supposed to be updated for 1.6 
changes still contain pointers to the supposedly ditched SMWDataValue 
for example (see: [2]), which makes it somewhat confusing.

I'd love to have a shortlist of the foundational classes I need to know 
to represent triple data with SMW classes ... Should I basically be fine 
with SMWDataItems (elements) and SMWSemanticData (aggregates of facts 
per subject)? ... or is there some other foundational class I should add 
to the shortlist?

(Should not forget to say that the 1.6 changes look very nice! :)
... if we can just get the know-how to use it all ;) )

Cheers,
// Samuel

[1]: http://www.semantic-mediawiki.org/wiki/Architecture_guide
[2]: http://www.semantic-mediawiki.org/wiki/Architecture_guide#SMWSemanticData_and_other_ways_to_represent_facts



-- 
Samuel Lampa
---------------------------------------
  Bioinformatician @ Uppsala University
    Blog: http://saml.rilspace.org
---------------------------------------

Re: [SMW-devel] Reminder of Architecture Overview update

From: Yury K. <kat...@gm...> - 2011-09-25 08:55:43

+1 for continuing the Architecture Guide.

On Sun, Sep 25, 2011 at 8:18 AM, Samuel Lampa <sam...@ri...> wrote:

> Sorry for repeating this, but wanted to remind about the need to update
> the rest of the Architecture Overview article [1]. I guess that even
> just updating the text that is there (there are two sections not updated
> to 1.6) would go a long way?
>
> The problem now is that even parts supposed to be updated for 1.6
> changes still contain pointers to the supposedly ditched SMWDataValue
> for example (see: [2]), which makes it somewhat confusing.
>
> I'd love to have a shortlist of the foundational classes I need to know
> to represent triple data with SMW classes ... Should I basically be fine
> with SMWDataItems (elements) and SMWSemanticData (aggregates of facts
> per subject)? ... or is there some other foundational class I should add
> to the shortlist?
>
> (Should not forget to say that the 1.6 changes looks very nice! :)
> ... if we can just get the know how to use it all ;) )
>
> Cheers,
> // Samuel
>
> [1]: http://www.semantic-mediawiki.org/wiki/Architecture_guide
> [2]:
>
> http://www.semantic-mediawiki.org/wiki/Architecture_guide#SMWSemanticData_and_other_ways_to_represent_facts
>
>
>
>



-- 
Yury V. Katkov
WikiVote! llc

Re: [SMW-devel] Reminder of Architecture Overview update

From: Markus K. <ma...@se...> - 2011-09-25 09:05:58

On 25/09/11 05:18, Samuel Lampa wrote:
> Sorry for repeating this, but wanted to remind about the need to update
> the rest of the Architecture Overview article [1]. I guess that even
> just updating the text that is there (there are two sections not updated
> to 1.6) would go a long way?
>
> The problem now is that even parts supposed to be updated for 1.6
> changes still contain pointers to the supposedly ditched SMWDataValue
> for example (see: [2]), which makes it somewhat confusing.

Indeed, I will see what I can do.

>
> I'd love to have a shortlist of the foundational classes I need to know
> to represent triple data with SMW classes ... Should I basically be fine
> with SMWDataItems (elements) and SMWSemanticData (aggregates of facts
> per subject)? ... or is there some other foundational class I should add
> to the shortlist?

For representing input data, that's all. Query outputs are represented 
in SMWQueryResult (basically an iterator for a 3D-array) but the data 
returned there is also based on DIs.
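
To make the division of labour concrete, here is a minimal sketch in Python (not SMW's PHP API) of atomic data items plus a per-subject container. The names `DataItem` and `SemanticData`, and their methods, are hypothetical stand-ins chosen for illustration; they only mirror the roles described above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataItem:
    """Hypothetical stand-in for an SMWDataItem: an immutable atomic value."""
    kind: str      # functional role, e.g. "page", "property", "number"
    value: object  # the plain value itself


class SemanticData:
    """Hypothetical stand-in for SMWSemanticData: all facts about one subject."""

    def __init__(self, subject: DataItem):
        self.subject = subject
        self._facts = defaultdict(list)  # property item -> list of value items

    def add(self, prop: DataItem, value: DataItem) -> None:
        self._facts[prop].append(value)

    def values(self, prop: DataItem) -> list:
        return list(self._facts.get(prop, []))

    def triples(self):
        """Yield (subject, property, value) for every stored fact."""
        for prop, vals in self._facts.items():
            for val in vals:
                yield (self.subject, prop, val)


berlin = DataItem("page", "Berlin")
population = DataItem("property", "Population")
data = SemanticData(berlin)
data.add(population, DataItem("number", 3500000))
```

Because the items are frozen and hashable, they can be used as dictionary keys and compared by value, which is what lets the container stay a thin aggregate.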

RDF data is represented by a smaller set of classes under SMWExpElement. 
These classes represent triples for the purpose of serialisation (they 
abstract RDF before fixing a concrete syntax such as RDF/XML or Turtle).
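
As a rough illustration of "abstracting RDF before fixing a concrete syntax", the Python sketch below keeps terms and triples as plain objects and only chooses a syntax in the final step. `ExpResource`, `ExpLiteral` and `to_ntriples` are hypothetical names, not SMW classes, and N-Triples is used here simply because it is the smallest syntax to write out.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple, Union


@dataclass(frozen=True)
class ExpResource:
    """Hypothetical RDF resource term, identified by a URI."""
    uri: str


@dataclass(frozen=True)
class ExpLiteral:
    """Hypothetical RDF literal with an optional datatype URI."""
    lexical: str
    datatype: str = ""


Term = Union[ExpResource, ExpLiteral]
Triple = Tuple[ExpResource, ExpResource, Term]


def _escape(text: str) -> str:
    # Escape the characters that N-Triples string literals require.
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def term_to_ntriples(term: Term) -> str:
    if isinstance(term, ExpResource):
        return f"<{term.uri}>"
    literal = f'"{_escape(term.lexical)}"'
    return f"{literal}^^<{term.datatype}>" if term.datatype else literal


def to_ntriples(triples: Iterable[Triple]) -> str:
    """Serialise syntax-neutral triples; another function could emit RDF/XML."""
    return "\n".join(
        f"{term_to_ntriples(s)} {term_to_ntriples(p)} {term_to_ntriples(o)} ."
        for s, p, o in triples
    )
```

A second serialiser (for RDF/XML or Turtle) would take the same triple objects as input, which is the point of keeping the abstraction separate.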

Greetings (from the bus back to Oxford),

Markus

Re: [SMW-devel] Reminder of Architecture Overview update

From: Samuel L. <sam...@ri...> - 2011-09-25 22:04:04

On 09/25/2011 11:05 AM, Markus Krötzsch wrote:
>
>>
>> I'd love to have a shortlist of the foundational classes I need to know
>> to represent triple data with SMW classes ... Should I basically be fine
>> with SMWDataItems (elements) and SMWSemanticData (aggregates of facts
>> per subject)? ... or is there some other foundational class I should add
>> to the shortlist?
>
> For representing input data, that's all. Query outputs are represented
> in SMWQueryResult (basically an iterator for a 3D-array) but the data
> returned there is also based on DIs.
>
> RDF data is represented by a smaller set of classes under SMWExpElement.
> These classes represent triples for the purpose of serialisation (they
> abstract RDF before fixing a concrete syntax such as RDF/XML or Turtle).

Ok, many thanks for the hints! Will have a closer look at that ...

Cheers,
// Samuel

Re: [SMW-devel] Reminder of Architecture Overview update

From: Samuel L. <sam...@ri...> - 2011-10-31 17:53:08

On 09/25/2011 11:05 AM, Markus Krötzsch wrote:
> RDF data is represented by a smaller set of classes under SMWExpElement.
> These classes represent triples for the purpose of serialisation (they
> abstract RDF before fixing a concrete syntax such as RDF/XML or Turtle).

Got two questions:

=== Q1: Any SMWExpElements / SMWData converter? ===

Are there converters from/to the SMWExpElement related classes and the 
SMWData/SMWDataItems combo already?

I ask since so far I have been thinking about using the 
SMWData/SMWDataItem combo as native format in RDFIO, but it strikes me 
that the SMWExpElement related classes much more closely match the 
data structure that you get from ARC2's RDF parsers.

Thus, I was thinking that if I can do only the conversion from ARC2 data 
structures to SMWExpElements classes, and then there are already some 
converters to SMWData/SMWDataItem, I wouldn't need to reinvent that 
wheel...?

=== Q2: Status of SMWData/SMWDataItem as API? ===

Also I wondered what status the SMWData/SMWDataItem classes are supposed 
to have, as a general API? ... Are they the supposed API, or is SMW 
going towards preferring to talk SPARQL with all extensions ... or even 
SMWExpElements?

I ask this since it does not seem clear that I will really *need* to use 
the SMWData/SMWDataItem combo as a representation, if I do the wiki page 
updates either with the Wiki Object Model extension or my own writer class.

I would still prefer to use it, if it is pushed as a preferred API for 
this kind of thing, but I wondered whether that is so for the 
foreseeable future?

// Samuel


Re: [SMW-devel] Reminder of Architecture Overview update

From: Samuel L. <sam...@ri...> - 2011-10-31 18:13:13

On 10/31/2011 06:52 PM, Samuel Lampa wrote:
> === Q2: Status of SMWData/SMWDataItem as API? ===
>
> Also I wondered what status the SMWData/SMWDataItem classes are supposed
> to have, as a general API? ... Are they the supposed API, or is SMW
> going towards preferring to talk SPARQL with all extensions ... or even
> SMWExpElements?
>
> I ask this since it does not seem clear that I will really *need* to use
> the SMWData/SMWDataItem combo as a representation, if I do the wiki page
> updates either with the Wiki Object Model extension or an own writer class.
>
> I would still prefer to use it, if it is pushed as a preferred API for
> these kind of things, but I wondered whether that is so for the
> foreseeable future?

What makes me wonder is that we're basically talking about 
two slightly different (though very much overlapping) representations: 
RDF (as represented by SMWExpElement rel. classes), and Semantic 
MediaWiki facts (as repr. by SMWData/SMWDataItem).

My problem, in the context of RDFIO, is that it seems I actually need 
both of these to capture the information from both worlds ... since:

a. I need to store the URIs, which only SMWExpElement classes do
b. I need to store the wiki page titles that I choose to use (as part of 
RDFIO's algorithm), which only the SMWData/SMWDataItem combo does.

... thus it seems there are at least two options:

1. RDFIO creates its own more general data container, which wraps both 
the SMWData/SMWDataItem one, and the RDF one (possibly both the 
SMWExpElement one, and ARC2's data structures), with built-in converters 
between all of these,

2. SMWData/SMWDataItem classes are updated to contain the "Original 
URI", and then this format will be the only one needed, in addition to 
possibly the ARC2 format, just for making use of its parsers.

Number one is the one I've been pondering so far ... I just wanted to 
point out this now and ask whether there would be any interest in 
storing also the original URI directly in the SMWData/SMWDataItem 
classes ... (which would not need to be required, for data that has no 
counterpart in the outside world, though ... or maybe can just be 
prefilled with the URIResolver URI:s ... this maybe on-the-fly, in a 
getter method)?

... it seems that would make the SMWData/SMWDI combo more general, and 
of course would make RDFIO add a lot less overhead :")

(I know we discussed this on SMWCon already, but these things weren't 
really that clear to me then, about the partial but not complete 
overlap between RDF and SMW data representations ... so wanted to point 
it out ... )

// Samuel


Re: [SMW-devel] Reminder of Architecture Overview update

From: Samuel L. <sam...@ri...> - 2011-10-31 18:20:44

On 10/31/2011 07:13 PM, Samuel Lampa wrote:
> (which would not need to be required, for data that has no
> counterpart in the outside world,

I mean, these "Original URI" fields would not be required to be filled. 
(Sorry for the possible confusion)

// Samuel


Re: [SMW-devel] Reminder of Architecture Overview update

From: Markus K. <ma...@se...> - 2011-10-31 18:43:37

On 31/10/11 17:52, Samuel Lampa wrote:
> On 09/25/2011 11:05 AM, Markus Krötzsch wrote:
>> RDF data is represented by a smaller set of classes under SMWExpElement.
>> These classes represent triples for the purpose of serialisation (they
>> abstract RDF before fixing a concrete syntax such as RDF/XML or Turtle).
>
> Got two questions:
>
> === Q1: Any SMWExpElements / SMWData converter? ===
>
> Are there converters from/to the SMWExpElement related classes and the
> SMWData/SMWDataItems combo already?
>
> I ask since so far I have been thinking about using the
> SMWData/SMWDataItem combo as native format in RDFIO, but it strikes me
> that the SMWExpElement related classes much more closely matches the
> data structure that you get from ARC2's RDF parsers.
>
> Thus, I was thinking that if I can do only the conversion from ARC2 data
> structures to SMWExpElements classes, and then there is already some
> converters to SMWData/SMWDataItem, I wouldn't need to reinvent that
> wheel...?

Yes, the SMWExpElement classes are meant as an abstraction of RDF terms 
and triples. They are used (1) as a pre-serialisation format for 
producing RDF (in any syntax) and (2) as a post-parsing format for 
interpreting SPARQL results.

* Due to (1), there is a complete implementation for the conversion

SMWDataItem/SMWSemanticData => SMWExpElement

This is done in the class SMWExporter (various methods, should be easy 
to find).

* Due to (2), there is an incomplete conversion

SMWExpElement => SMWDataItem

It is incomplete since we only need to interpret URIs as wiki pages when 
reading SPARQL results. Other types of RDF terms are not relevant in the 
SPARQL results we interpret. This conversion is implemented in 
SMWExporter::findDataItemForExpElement(). This method could be extended 
to create SMWDataItems for other types of input on a best-effort basis.
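
The asymmetry between the two directions can be sketched as follows, in Python rather than SMW's PHP. Everything here is an assumption made for illustration: the base URI `WIKI_BASE`, the tuple encoding of terms, and the function names. It does not reproduce what SMWExporter actually does internally.

```python
from typing import Optional, Tuple
from urllib.parse import quote, unquote

WIKI_BASE = "http://example.org/wiki/"  # assumed base URI, illustration only


def page_to_uri(title: str) -> str:
    """Export direction: every page title maps to a URI, so this is total."""
    return WIKI_BASE + quote(title.replace(" ", "_"))


def uri_to_page(uri: str) -> Optional[str]:
    """Import direction: only URIs under the wiki base map back to a page."""
    if not uri.startswith(WIKI_BASE):
        return None
    return unquote(uri[len(WIKI_BASE):]).replace("_", " ")


def find_data_item(term: Tuple[str, str]) -> Optional[Tuple[str, str]]:
    """Partial reverse conversion for a term given as ("uri" | "literal", value).

    Only wiki page URIs are interpreted; anything else yields None. A
    best-effort extension would add branches for literals and foreign URIs.
    """
    kind, value = term
    if kind == "uri":
        title = uri_to_page(value)
        return ("page", title) if title is not None else None
    return None
```

Extending `find_data_item` "on a best-effort basis" would mean replacing the final `return None` with type guesses for literals, which is where information loss starts to matter.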

Since SPARQL results are plain lists (no graphs), there is no method yet 
for turning sets of triples into (necessarily many) SMWSemanticData 
objects. This could be added to SMWExporter as well, if needed.
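
The missing step, turning a flat set of triples into one container per subject, is essentially a grouping operation. A minimal Python sketch, with plain strings standing in for terms and nested dicts standing in for SMWSemanticData objects (both are simplifications, and `group_by_subject` is a hypothetical name):

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, property, value) as plain strings


def group_by_subject(triples: Iterable[Triple]) -> Dict[str, Dict[str, List[str]]]:
    """Collect triples into one property -> values mapping per subject."""
    grouped = defaultdict(lambda: defaultdict(list))
    for subject, prop, value in triples:
        grouped[subject][prop].append(value)
    # Convert to plain dicts so callers do not create keys by accident.
    return {subject: dict(props) for subject, props in grouped.items()}
```

A graph with N distinct subjects yields N containers, which is why the result is "necessarily many" objects rather than one.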

The extension of this code would make sense in SMW. One could also 
imagine that this is later used for importing SPARQL results into SMW 
data for general forms of SPARQL queries. Note, however, that a main 
design goal for such an extension would be to round-trip the data that 
SMW exports as well as possible.

>
> === Q2: Status of SMWData/SMWDataItem as API? ===
>
> Also I wondered what status the SMWData/SMWDataItem classes are supposed
> to have, as a general API? ... Are they the supposed API, or is SMW
> going towards preferring to talk SPARQL with all extensions ... or even
> SMWExpElements?
>
> I ask this since it does not seem clear that I will really *need* to use
> the SMWData/SMWDataItem combo as a representation, if I do the wiki page
> updates either with the Wiki Object Model extension or an own writer class.
>
> I would still prefer to use it, if it is pushed as a preferred API for
> these kind of things, but I wondered whether that is so for the
> foreseeable future?

SMWDataItems are supposed to be the main atomic data representation API 
in SMW. SMWSemanticData is the main annotation (property assignment) API 
in SMW. Both are assumed to stay in this position for the foreseeable 
future.

SMWExpElement is based on the RDF data model and is therefore not 
suitable for representing SMW data where we have special elements like 
wiki pages, properties or geographic coordinates that are not 
represented explicitly in RDF. We need an API that distinguishes data 
items by their functional role in SMW (e.g., wiki page vs. property vs. 
URI) where this distinction does not exist in RDF.

For these reasons, data items and semantic data containers are the main 
API for passing around data in SMW.

Markus

Re: [SMW-devel] Reminder of Architecture Overview update

From: Markus K. <ma...@se...> - 2011-10-31 18:55:14

On 31/10/11 18:13, Samuel Lampa wrote:
> On 10/31/2011 06:52 PM, Samuel Lampa wrote:
>> === Q2: Status of SMWData/SMWDataItem as API? ===
>>
>> Also I wondered what status the SMWData/SMWDataItem classes are supposed
>> to have, as a general API? ... Are they the supposed API, or is SMW
>> going towards preferring to talk SPARQL with all extensions ... or even
>> SMWExpElements?
>>
>> I ask this since it does not seem clear that I will really *need* to use
>> the SMWData/SMWDataItem combo as a representation, if I do the wiki page
>> updates either with the Wiki Object Model extension or an own writer
>> class.
>>
>> I would still prefer to use it, if it is pushed as a preferred API for
>> these kind of things, but I wondered whether that is so for the
>> foreseeable future?
>
>
> The thing that makes me wonder, is since we're basically talking about
> two slightly different (though very much overlapping) representations:
> RDF (as represented by SMWExpElement rel. classes), and Semantic
> MediaWiki facts (as repr. by SMWData/SMWDataItem).
>
> My problem, in the context of RDFIO, is that it seems I actually need
> both of these to capture the information from both worlds ... since:
>
> a. I need to store the URI:s, which only SMWExpElement classes do
> b. I need to store the wiki page titles that I choose to use (as part of
> RDFIO:s algorithm), which only the SMWData/SMWData combo does.
>
> ... thus it seems there's at least two options:
>
> 1. RDFIO creates an own more general data container, which wraps both
> the SMWData/SMWDataItem one, and the RDF one (possibly both the
> SMWExpElement one, and ARC2:s data structures), with in-built converters
> between all of these,
>
> 2. SMWData/SMWDataItem classes are updated to contain the "Original
> URI", and then this format will be the only needed one, in addition to
> possibly the ARC2 format, just for making use of it's parsers.
>
>
> Number one is the one I've been pondering so far ... I just wanted to
> point out this now and ask whether there would be any interest in
> storing also the original URI directly in the SMWData/SMWDataItem
> classes ... (which would not need to be required, for data that has no
> counterpart in the outside world, though ... or maybe can just be
> prefilled with the URIResolver URI:s ... this maybe on-the-fly, in a
> getter method)?
>
> ... it seems that would make the SMWData/SMWDI combo more general, and
> of course would make RDFIO add a lot less overhead :")
>
> (I know we discussed this on SMWCon already, but these things weren't
> really that clear to me then, about the partly but not completely
> overlap between RDF and SMW data representations ... so wanted to point
> it out ... )

I suggest going for (1) if you need the full information in one object. 
You should think of SMW data items as small and simple "values", similar 
to an integer or a char in a programming language. They should be used 
like constants of datatypes. They should only be used for storing data, 
not for converting data or for augmenting it. They are pure data and 
know nothing about HTML, wikitext or RDF. [Exception: the SMWDIContainer 
type is a placeholder for compound data; it is not really considered as 
an atomic value in SMW but just used for transporting compound data in 
the API]

With this view in mind, making an object that holds a URI and a dataitem 
does not seem a bad idea (like making an object that holds an integer 
and a string).

Alternatively, you could of course represent URIs in an SMW data item as 
well and relate them to a wiki page with a property, stored together in an 
SMWSemanticData.
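
A wrapper of the kind suggested for option (1) could look roughly like this. The sketch is in Python for brevity; `PageItem` and `ImportedResource` are hypothetical names, and the DBpedia URI is only sample data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PageItem:
    """Hypothetical stand-in for a wiki page data item: pure data, no RDF knowledge."""
    title: str


@dataclass(frozen=True)
class ImportedResource:
    """Hypothetical RDFIO-side wrapper pairing a data item with its source URI."""
    item: PageItem
    original_uri: Optional[str] = None  # may stay empty for wiki-only data

    def has_external_uri(self) -> bool:
        return self.original_uri is not None


imported = ImportedResource(PageItem("Berlin"), "http://dbpedia.org/resource/Berlin")
local_only = ImportedResource(PageItem("Sandbox"))
```

The data item itself stays untouched; the extra URI lives only in the wrapper, so the item can still be handed to any code that expects a plain value.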

Markus

Re: [SMW-devel] Reminder of Architecture Overview update

From: Samuel L. <sam...@ri...> - 2011-10-31 19:17:37

On 10/31/2011 07:55 PM, Markus Krötzsch wrote:
> On 31/10/11 18:13, Samuel Lampa wrote:
>> On 10/31/2011 06:52 PM, Samuel Lampa wrote:
>>> === Q2: Status of SMWData/SMWDataItem as API? ===
>>>
>>> Also I wondered what status the SMWData/SMWDataItem classes are supposed
>>> to have, as a general API? ... Are they the supposed API, or is SMW
>>> going towards preferring to talk SPARQL with all extensions ... or even
>>> SMWExpElements?
>>>
>>> I ask this since it does not seem clear that I will really *need* to use
>>> the SMWData/SMWDataItem combo as a representation, if I do the wiki page
>>> updates either with the Wiki Object Model extension or an own writer
>>> class.
>>>
>>> I would still prefer to use it, if it is pushed as a preferred API for
>>> these kind of things, but I wondered whether that is so for the
>>> foreseeable future?
>>
>>
>> The thing that makes me wonder, is since we're basically talking about
>> two slightly different (though very much overlapping) representations:
>> RDF (as represented by SMWExpElement rel. classes), and Semantic
>> MediaWiki facts (as repr. by SMWData/SMWDataItem).
>>
>> My problem, in the context of RDFIO, is that it seems I actually need
>> both of these to capture the information from both worlds ... since:
>>
>> a. I need to store the URI:s, which only SMWExpElement classes do
>> b. I need to store the wiki page titles that I choose to use (as part of
>> RDFIO:s algorithm), which only the SMWData/SMWData combo does.
>>
>> ... thus it seems there's at least two options:
>>
>> 1. RDFIO creates an own more general data container, which wraps both
>> the SMWData/SMWDataItem one, and the RDF one (possibly both the
>> SMWExpElement one, and ARC2:s data structures), with in-built converters
>> between all of these,
>>
>> 2. SMWData/SMWDataItem classes are updated to contain the "Original
>> URI", and then this format will be the only needed one, in addition to
>> possibly the ARC2 format, just for making use of it's parsers.
>>
>>
>> Number one is the one I've been pondering so far ... I just wanted to
>> point out this now and ask whether there would be any interest in
>> storing also the original URI directly in the SMWData/SMWDataItem
>> classes ... (which would not need to be required, for data that has no
>> counterpart in the outside world, though ... or maybe can just be
>> prefilled with the URIResolver URI:s ... this maybe on-the-fly, in a
>> getter method)?
>>
>> ... it seems that would make the SMWData/SMWDI combo more general, and
>> of course would make RDFIO add a lot less overhead :")
>>
>> (I know we discussed this on SMWCon already, but these things weren't
>> really that clear to me then, about the partly but not completely
>> overlap between RDF and SMW data representations ... so wanted to point
>> it out ... )
>
> I suggest to go for (1) if you need the full information in one object.
> You should think of SMW data items as small and simple "values", similar
> to an integer or a char in a programming language. They should be used
> like constants of datatypes. They should only be used for storing data,
> not for converting data or for augmenting it. They are pure data and
> know nothing about HTML, wikitext or RDF. [Exception: the SMWDIContainer
> type is a placeholder for compound data; it is not really considered as
> an atomic value in SMW but just used for transporting compound data in
> the API]
>
> With this view in mind, making an object that holds a URI and a dataitem
> does not seem a bad idea (like making an object that holds an integer
> and a string).
>
> Alternatively, you could of course represent URIs in an SMW data item as
> well and relate them to wiki page with a property, stored together in an
> SMWSemanticData.


Ok, many thanks for the feedback!

The suggestions sound reasonable - keeping in line with the modelling 
approach already taken.

The only little caution I'd like to make is that the decision to keep 
data objects atomic makes them follow the Anemic Model antipattern [1] a 
bit. But that is of course a question about the model design approach 
overall, and not this specific case only - that is, whether one wants to 
follow Domain Driven Design patterns [2] or not.

... so for the moment I'm happy to follow the existing model design 
approach :)

// Samuel


[1] http://martinfowler.com/bliki/AnemicDomainModel.html
[2] http://en.wikipedia.org/wiki/Domain-driven_design




Re: [SMW-devel] Reminder of Architecture Overview update

From: Samuel L. <sam...@ri...> - 2011-10-31 19:20:56

On 10/31/2011 07:43 PM, Markus Krötzsch wrote:
> The extension of this code would make sense in SMW. One could also
> imagine that this is later used for importing SPARQL results into SMW
> data for general forms of SPARQL queries.

I'm not sure I followed this part:
"importing SPARQL results into SMW data for general forms of SPARQL 
queries."

... though it sounds interesting. Could you please elaborate?

Overall, though, I think supporting full round-tripping of SMW<->RDF data 
structures is indeed interesting, and would enable a whole bunch of new 
use cases ...

Just got to think of one ... combined with a general and robust 
SMWSemanticData importer (into wiki pages), it would make it possible to 
turn facts that are only implicit in the wiki into explicit ones by means 
of SPARQL CONSTRUCT queries, and to persist these newly explicit facts in 
the wiki ... which is one of the things typically done by reasoners 
these days ...

// Samuel


Re: [SMW-devel] Reminder of Architecture Overview update

From: Markus K. <ma...@se...> - 2011-11-01 15:26:17

On 31/10/11 19:20, Samuel Lampa wrote:
> On 10/31/2011 07:43 PM, Markus Krötzsch wrote:
>> The extension of this code would make sense in SMW. One could also
>> imagine that this is later used for importing SPARQL results into SMW
>> data for general forms of SPARQL queries.
>
> I'm not sure I followed this part:
> "importing SPARQL results into SMW data for general forms of SPARQL
> queries."
>
> ... though it sounds interesting. Could you please elaborate?

The queries we use for connecting SMW to SPARQL stores are of a special 
form since they only cover the page-selection part of #ask. So they 
always select a single variable and we always expect results to bind to 
URIs of wiki pages only.
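
For illustration, reading such a restricted result could look like the Python sketch below. It assumes the store answers in the standard SPARQL JSON results format and that wiki pages live under a base URI `WIKI_BASE`; both are assumptions made for this example, as is the function name, and none of it is taken from SMW's code.

```python
import json
from typing import List

WIKI_BASE = "http://example.org/wiki/"  # assumed base URI, illustration only


def pages_from_select(result_json: str) -> List[str]:
    """Extract page names from a single-variable SELECT result."""
    doc = json.loads(result_json)
    variables = doc["head"]["vars"]
    if len(variables) != 1:
        raise ValueError("expected exactly one selected variable")
    var = variables[0]
    pages = []
    for binding in doc["results"]["bindings"]:
        term = binding.get(var)
        # Anything that is not a wiki page URI is ignored in this sketch.
        if term and term["type"] == "uri" and term["value"].startswith(WIKI_BASE):
            pages.append(term["value"][len(WIKI_BASE):])
    return pages
```

A general SPARQL import would have to handle several variables per row and literal bindings as well, which this reader deliberately skips.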

>
> Overall, though, I think, supporting full roundtrip of SMW<->RDF data
> structures, is indeed interesting, and would enable a whole bunch of new
> use cases ...

Fully reliable round-tripping won't be possible when considering single 
entities (as one has to in SPARQL results since they may not, in general, 
provide enough context). For example, a URI of a wiki page (Type:URL) 
and the wiki page itself (Type:Page) could not be distinguished in RDF. 
One would need to know the wiki type of all imported data to make this 
work reliably.
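
The ambiguity is easy to reproduce in a toy model. In the Python sketch below (hypothetical functions and an assumed base URI, not SMW code), a Type:Page value and a Type:URL value export to the same RDF term, so the import can only recover the right one when it is told the wiki type.

```python
from typing import Tuple

WIKI_BASE = "http://example.org/wiki/"  # assumed base URI, illustration only


def export_item(kind: str, value: str) -> str:
    """Map a typed wiki value to the URI it would have in RDF."""
    if kind == "page":
        return WIKI_BASE + value.replace(" ", "_")
    if kind == "url":
        return value
    raise ValueError(f"unsupported kind: {kind}")


def import_uri(uri: str, wiki_type: str) -> Tuple[str, str]:
    """Map a URI back to a typed value, using the wiki type as extra context."""
    if wiki_type == "page" and uri.startswith(WIKI_BASE):
        return ("page", uri[len(WIKI_BASE):].replace("_", " "))
    return ("url", uri)


as_page = export_item("page", "Berlin")
as_url = export_item("url", WIKI_BASE + "Berlin")
```

Here `as_page` and `as_url` are the same string, so nothing in the RDF term alone says which wiki value it came from.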

>
> Just got to think about one ... that combined with a general and robust
> SMWSemanticData importer (into wiki pages), it would be enable to make
> explicit facts that are only implicit in the wiki, by the means of
> SPARQL CONSTRUCT queries, and persisting these new explicitized facts in
> the wiki ... that is, one thing of which is typically done by reasoners
> these days ...

Possible, but I am not sure that the detour through SPARQL would be 
helpful there. One could also persist results of #ask queries in the 
same way. The main problem in both cases is not the initial computation 
but view maintenance/update.

Markus

Re: [SMW-devel] Reminder of Architecture Overview update

From: Samuel L. <sam...@ri...> - 2011-11-01 15:40:34

On 11/01/2011 04:26 PM, Markus Krötzsch wrote:
>> Overall, though, I think, supporting full roundtrip of SMW<->RDF data
>> structures, is indeed interesting, and would enable a whole bunch of new
>> use cases ...
>
> Fully reliable round-tripping won't be possible when considering single
> entities (as one has to in SPARQL results since they may not, in general,
> provide enough context). For example, a URI of a wiki page (Type:URL)
> and the wiki page itself (Type:Page) could not be distinguished in RDF.
> One would need to know the wiki type of all imported data to make this
> work reliably.

Good point!

// Samuel



Re: [SMW-devel] Reminder of Architecture Overview update

From: Markus K. <ma...@se...> - 2011-11-01 15:41:43

On 31/10/11 19:17, Samuel Lampa wrote:
> On 10/31/2011 07:55 PM, Markus Krötzsch wrote:
>> On 31/10/11 18:13, Samuel Lampa wrote:
>>> On 10/31/2011 06:52 PM, Samuel Lampa wrote:
>>>> === Q2: Status of SMWData/SMWDataItem as API? ===
>>>>
>>>> Also I wondered what status the SMWData/SMWDataItem classes are
>>>> supposed
>>>> to have, as a general API? ... Are they the supposed API, or is SMW
>>>> going towards preferring to talk SPARQL with all extensions ... or even
>>>> SMWExpElements?
>>>>
>>>> I ask this since it does not seem clear that I will really *need* to use
>>>> the SMWData/SMWDataItem combo as a representation, if I do the wiki
>>>> page
>>>> updates either with the Wiki Object Model extension or an own writer
>>>> class.
>>>>
>>>> I would still prefer to use it, if it is pushed as a preferred API for
>>>> these kind of things, but I wondered whether that is so for the
>>>> foreseeable future?
>>>
>>>
>>> The thing that makes me wonder, is since we're basically talking about
>>> two slightly different (though very much overlapping) representations:
>>> RDF (as represented by SMWExpElement rel. classes), and Semantic
>>> MediaWiki facts (as repr. by SMWData/SMWDataItem).
>>>
>>> My problem, in the context of RDFIO, is that it seems I actually need
>>> both of these to capture the information from both worlds ... since:
>>>
>>> a. I need to store the URI:s, which only SMWExpElement classes do
>>> b. I need to store the wiki page titles that I choose to use (as part of
>>> RDFIO:s algorithm), which only the SMWData/SMWData combo does.
>>>
>>> ... thus it seems there's at least two options:
>>>
>>> 1. RDFIO creates an own more general data container, which wraps both
>>> the SMWData/SMWDataItem one, and the RDF one (possibly both the
>>> SMWExpElement one, and ARC2:s data structures), with in-built converters
>>> between all of these,
>>>
>>> 2. SMWData/SMWDataItem classes are updated to contain the "Original
>>> URI", and then this format will be the only needed one, in addition to
>>> possibly the ARC2 format, just for making use of it's parsers.
>>>
>>>
>>> Number one is the one I've been pondering so far ... I just wanted to
>>> point out this now and ask whether there would be any interest in
>>> storing also the original URI directly in the SMWData/SMWDataItem
>>> classes ... (which would not need to be required, for data that has no
>>> counterpart in the outside world, though ... or maybe can just be
>>> prefilled with the URIResolver URI:s ... this maybe on-the-fly, in a
>>> getter method)?
>>>
>>> ... it seems that would make the SMWData/SMWDI combo more general, and
>>> of course would make RDFIO add a lot less overhead :")
>>>
>>> (I know we discussed this on SMWCon already, but these things weren't
>>> really that clear to me then, about the partly but not completely
>>> overlap between RDF and SMW data representations ... so wanted to point
>>> it out ... )
>>
>> I suggest to go for (1) if you need the full information in one object.
>> You should think of SMW data items as small and simple "values", similar
>> to an integer or a char in a programming language. They should be used
>> like constants of datatypes. They should only be used for storing data,
>> not for converting data or for augmenting it. They are pure data and
>> know nothing about HTML, wikitext or RDF. [Exception: the SMWDIContainer
>> type is a placeholder for compound data; it is not really considered as
>> an atomic value in SMW but just used for transporting compound data in
>> the API]
>>
>> With this view in mind, making an object that holds a URI and a dataitem
>> does not seem a bad idea (like making an object that holds an integer
>> and a string).
>>
>> Alternatively, you could of course represent URIs in an SMW data item as
>> well and relate them to wiki page with a property, stored together in an
>> SMWSemanticData.
>
>
> Ok, many thanks for the feedback!
>
> The suggestions sound reasonable - keeping in line with the modelling
> approach already taken.
>
> The only little caution I'd like to make is that the decision to keep
> data objects atomic makes them follow the Anemic Model antipattern [1] a
> bit. But that is of course a question about model design approach
> overall, and not this specific case only - that is, whether one wants to
> follow Domain Driven Design patterns [2] or not.

Reading [1], I think there is a misunderstanding in the way you seem to 
apply this text to SMW (probably due to my ill-chosen examples of 
property and wiki page out of all dataitems). The text states that 
domain specific behaviour of domain objects should be implemented in the 
classes that represent the objects. This is what we do. Our domain 
objects are strings, numbers, geographic coordinates. This is the very 
data that we want to manage in SMW, it just happens to be rather atomic, 
simple and (application) domain independent. Note that we do not 
artificially try to abstract or simplify the objects to get this 
representation -- these simple concepts are really the kinds of things 
that SMW users deal with.

Yet we include all related code into the objects whenever such code is 
needed. For example, you can have a look at SMWDITime to see a lot of 
calendar/date specific code. We could also have similar methods for 
strings (e.g., substring computation) and for numbers (e.g., for 
rounding) but this was not necessary so far. Our data items do not 
include parsing/rendering functions that are specific to syntactic 
formats like HTML, wikitext, JSON, RDF, SQL, ... which I think is good 
(and established) design (you don't mix all parsing/serialisation code 
into one class).
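
As a rough illustration of this split, here is a minimal sketch in Python rather than SMW's PHP. All names in it (TimeItem, HtmlRenderer, RdfSerializer) are hypothetical and are not SMW's actual classes; it only shows calendar-specific behaviour living on an immutable item while format-specific code stays outside it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeItem:
    """Hypothetical stand-in for a data item: immutable, pure data."""
    year: int
    month: int
    day: int

    def is_leap_year(self) -> bool:
        # Domain-specific (calendar) behaviour belongs on the item itself.
        return self.year % 4 == 0 and (self.year % 100 != 0 or self.year % 400 == 0)


class HtmlRenderer:
    """Format-specific code lives outside the item and reads only its public fields."""

    def render(self, item: TimeItem) -> str:
        return f"<time>{item.year:04d}-{item.month:02d}-{item.day:02d}</time>"


class RdfSerializer:
    """A second, unrelated format; it shares nothing with HtmlRenderer but the input."""

    def serialize(self, item: TimeItem) -> str:
        return f'"{item.year:04d}-{item.month:02d}-{item.day:02d}"^^xsd:date'
```

Neither output class needs access to anything but the item's public fields, and the item knows nothing about HTML or RDF.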

The big fallacy of [1] is to suggest that "object code" must always be 
much larger than "application/service code". If taken too seriously, this 
could lead to a design that tries to merge all functionality into a few 
objects, thus contradicting the fundamental programming paradigm of 
separation of concerns. For example, SMW used to have HTML rendering and 
RDF serialisation methods for data in a single class, in spite of the 
fact that these functions are not at all related but merely work on the 
same input data.

This earlier design of SMW has also undermined another important idea of 
OO design: the definition of clear interfaces with limited visibility. 
The code for parsing, rendering, representation and serialisation used 
to have full access to all internal fields of the objects. Before the 
introduction of data items, it was quite unclear for some objects where 
the data is actually stored (there were multiple redundant/overlapping 
internal representations, sometimes optional, to reflect the internal 
state of the object; all code would directly read/write to any of the 
members).

A third main reason for keeping single objects small is that SMW is 
meant to be extendible. If each new storage backend or display format 
would rely on adding code to domain object classes, it would be very 
hard to extend the system.
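
The extensibility argument can be sketched as follows, again in Python and with purely hypothetical names (NumberItem, register_format, serialize) that do not correspond to SMW's real extension mechanism. The assumption is only that new formats are registered from outside, so the data item class never has to change.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class NumberItem:
    """Hypothetical data item: holds a value and nothing format-specific."""
    value: float


# Hypothetical registry: format name -> function turning an item into text.
_serializers: Dict[str, Callable[[NumberItem], str]] = {}


def register_format(name: str, serializer: Callable[[NumberItem], str]) -> None:
    """Add a new output format without touching NumberItem."""
    _serializers[name] = serializer


def serialize(item: NumberItem, format_name: str) -> str:
    """Look up the registered serializer for the requested format and apply it."""
    if format_name not in _serializers:
        raise ValueError(f"no serializer registered for {format_name!r}")
    return _serializers[format_name](item)


# Formats shipped with the hypothetical core.
register_format("html", lambda item: f'<span class="number">{item.value}</span>')
register_format("json", lambda item: str(item.value))

# An extension adds a format later, without editing any existing class.
register_format("csv", lambda item: f"{item.value};")
```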

Overall, I still think that SMW follows most of the guidelines of 
Domain-Driven Design but for a domain (data management) that is very 
different from what the author of [1] had in mind. Another special 
observation about SMW is that most of our "business logic" is related to 
parsing and serialisation -- tasks that should normally be separated 
from the data that they work on. But maybe one has to take a step back 
and ask what the "domain layer" and "application layer" in SMW really 
are to compare it to the DDD idea. :-)

Best regards,

Markus

>
> ... so for the moment I'm happy to follow the existing model design
> approach :)
>
> // Samuel
>
>
> [1] http://martinfowler.com/bliki/AnemicDomainModel.html
> [2] http://en.wikipedia.org/wiki/Domain-driven_design
>
>
>

Re: [SMW-devel] Reminder of Architecture Overview update

From: Samuel L. <sam...@ri...> - 2011-11-01 16:56:40

On 11/01/2011 04:41 PM, Markus Krötzsch wrote:
> On 31/10/11 19:17, Samuel Lampa wrote:
>> The only little caution I'd like to make is that the decision to keep
>> data objects atomic makes them follow the Anemic Model antipattern [1] a
>> bit. But that is of course a question about model design approach
>> overall, and not this specific case only - that is, whether one wants to
>> follow Domain Driven Design patterns [2] or not.
>
> Reading [1], I think there is a misunderstanding in the way you seem to
> apply this text to SMW (probably due to my ill-chosen examples of
> property and wiki page out of all dataitems). The text states that
> domain specific behaviour of domain objects should be implemented in the
> classes that represent the objects. This is what we do. Our domain
> objects are strings, numbers, geographic coordinates. This is the very
> data that we want to manage in SMW, it just happens to be rather atomic,
> simple and (application) domain independent. Note that we do not
> artificially try to abstract or simplify the objects to get this
> representation -- these simple concepts are really the kinds of things
> that SMW users deal with.
>
> Yet we include all related code into the objects whenever such code is
> needed. For example, you can have a look at SMWDITime to see a lot of
> calendar/date specific code. We could also have similar methods for
> strings (e.g., substring computation) and for numbers (e.g., for
> rounding) but this was not necessary so far. Our data items do not
> include parsing/rendering functions that are specific to syntactic
> formats like HTML, wikitext, JSON, RDF, SQL, ... which I think is good
> (and established) design (you don't mix all parsing/serialisation code
> into one class).

Thanks for the clarification! I was obviously a bit quick to make an 
overall judgement, without studying other parts of the SMW code.

What you describe sounds fine. I indeed agree with separation of concerns.

Just as a sidenote, I know that one very successful approach is to have 
"domain controller classes" for the different "domains" (or more 
properly "subdomains") or concerns, which aggregate and coordinate all 
activities belonging to each concern, whether through rich domain 
objects, service objects, or any combination of those. Most 
importantly, all of this is concentrated and managed from the "domain 
controller object" (think "SMWDisplayDomain", "SMWExportDomain" etc ...).

... so, I'm indeed not a proponent of gathering every possibly related 
functionality in normal domain objects. :)

> The big fallacy of [1] is to suggest that "object code" must always be
> much larger than "application/service code". If taken too seriously, this
> could lead to a design that tries to merge all functionality into a few
> objects, thus contradicting the fundamental programming paradigm of
> separation of concerns. For example, SMW used to have HTML rendering and
> RDF serialisation methods for data in a single class, in spite of the
> fact that these functions are not at all related but merely work on the
> same input data.
>
> This earlier design of SMW has also undermined another important idea of
> OO design: the definition of clear interfaces with limited visibility.
> The code for parsing, rendering, representation and serialisation used
> to have full access to all internal fields of the objects. Before the
> introduction of data items, it was quite unclear for some objects where
> the data is actually stored (there were multiple redundant/overlapping
> internal representations, sometimes optional, to reflect the internal
> state of the object; all code would directly read/write to any of the
> members).

True.

Next sidenote: the "domain controller object" approach I'm familiar 
with also makes it easy to add "facade" objects, specific to the 
different concerns, which manage visibility and the like for all 
functionality related to that particular domain.

> A third main reason for keeping single objects small is that SMW is
> meant to be extendible. If each new storage backend or display format
> would rely on adding code to domain object classes, it would be very
> hard to extend the system.

Good point too.

> Overall, I still think that SMW follows most of the guidelines of
> Domain-Driven Design but for a domain (data management) that is very
> different from what the author of [1] had in mind.

Indeed.

> Another special
> observation about SMW is that most of our "business logic" is related to
> parsing and serialisation -- tasks that should normally be separated
> from the data that they work on.

An interesting observation ... got to think a bit on that :)

> But maybe one has to take a step back
> and ask what the "domain layer" and "application layer" in SMW really
> are to compare it to the DDD idea. :-)

Yeah, an interesting thing to ponder as well! :)

(Sidenote no. 3 is that the "domain controller object" approach simply 
has the application layer in the domain controller objects ... thus 
tying together application and domain layer very closely, even though a 
clear separation is still maintained (application logic in the domain 
controllers, domain logic in the other domain objects).

... and then the corresponding facade objects provide the 
presentation/API level logic, to which UI code and other extensions can 
talk.)
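
To make the three sidenotes a bit more concrete, here is a minimal sketch of the layering in Python. Every name in it (Fact, ExportDomain, ExportFacade) is hypothetical and made up for illustration; none of it exists in SMW, and it is only one possible reading of the "domain controller" idea.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Fact:
    """Domain object: pure data (any domain logic would live here)."""
    subject: str
    prop: str
    value: str


class ExportDomain:
    """Hypothetical 'domain controller': application logic for the export concern."""

    def __init__(self) -> None:
        self._facts: List[Fact] = []

    def add_fact(self, fact: Fact) -> None:
        self._facts.append(fact)

    def to_lines(self) -> List[str]:
        # Coordinates the work; format details could be delegated to service objects.
        return [f"{f.subject} {f.prop} {f.value} ." for f in self._facts]


class ExportFacade:
    """Facade: the narrow presentation/API surface that UI code or extensions talk to."""

    def __init__(self, domain: ExportDomain) -> None:
        self._domain = domain

    def export_text(self) -> str:
        return "\n".join(self._domain.to_lines())
```

UI code would only ever hold an ExportFacade, so the controller and the domain objects behind it can change without affecting callers.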

All in all: Thanks for an interesting and clarifying elaboration of the 
design choices! To the best of my judgement it sounds well 
thought-through and reasonable.

My sidenotes are just spontaneous reflections that I couldn't resist 
adding, since I find that the "domain controller" approach really fills 
a gap in the land of DDD, on how to actually implement the separation of 
concerns, as well as the application layer, in practice.

Cheers
// samuel