From: Samuel L. <sam...@ri...> - 2011-09-25 04:18:27
|
Sorry for repeating this, but I wanted to remind everyone about the need to update the rest of the Architecture Overview article [1]. I guess that even just updating the text that is already there (two sections have not been updated to 1.6) would go a long way?

The problem now is that even the parts that are supposed to be updated for the 1.6 changes still contain pointers to, for example, the supposedly ditched SMWDataValue (see [2]), which makes it somewhat confusing.

I'd love to have a shortlist of the foundational classes I need to know to represent triple data with SMW classes ... Should I basically be fine with SMWDataItems (elements) and SMWSemanticData (aggregates of facts per subject)? ... or is there some other foundational class I should add to the shortlist?

(I should not forget to say that the 1.6 changes look very nice! :) ... if we can just get the know-how to use it all ;) )

Cheers,
// Samuel

[1]: http://www.semantic-mediawiki.org/wiki/Architecture_guide
[2]: http://www.semantic-mediawiki.org/wiki/Architecture_guide#SMWSemanticData_and_other_ways_to_represent_facts

--
Samuel Lampa
---------------------------------------
Bioinformatician @ Uppsala University
Blog: http://saml.rilspace.org
---------------------------------------
|
From: Yury K. <kat...@gm...> - 2011-09-25 08:55:43
|
+1 for continuing the Architecture Guide.

--
Yury V. Katkov
WikiVote! llc
|
From: Markus K. <ma...@se...> - 2011-09-25 09:05:58
|
On 25/09/11 05:18, Samuel Lampa wrote:
> Sorry for repeating this, but wanted to remind about the need to update
> the rest of the Architecture Overview article [1]. [...]
>
> The problem now is that even parts supposed to be updated for 1.6
> changes still contain pointers to the supposedly ditched SMWDataValue
> for example (see: [2]), which makes it somewhat confusing.

Indeed, I will see what I can do.

> I'd love to have a shortlist of the foundational classes I need to know
> to represent triple data with SMW classes ... Should I basically be fine
> with SMWDataItems (elements) and SMWSemanticData (aggregates of facts
> per subject)? ... or is there some other foundational class I should add
> to the shortlist?

For representing input data, that's all. Query outputs are represented in SMWQueryResult (basically an iterator for a 3D-array) but the data returned there is also based on DIs.

RDF data is represented by a smaller set of classes under SMWExpElement. These classes represent triples for the purpose of serialisation (they abstract RDF before fixing a concrete syntax such as RDF/XML or Turtle).

Greetings (from the bus back to Oxford),

Markus
|
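The division of labour described above (atomic data items, plus a per-subject container of facts) can be illustrated with a minimal Python sketch. This is not SMW's PHP API: `DataItem`, `SemanticData`, `add` and `triples` are hypothetical names chosen for illustration, and the page and property names are made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataItem:
    """Hypothetical stand-in for an SMW data item: an immutable typed value."""
    kind: str      # e.g. "page", "number", "string"
    value: object


class SemanticData:
    """Hypothetical stand-in for SMWSemanticData: all facts about one subject."""

    def __init__(self, subject: DataItem):
        self.subject = subject
        self._facts = defaultdict(list)  # property name -> list of DataItem

    def add(self, prop: str, item: DataItem) -> None:
        # Store each (property, value) pair at most once.
        if item not in self._facts[prop]:
            self._facts[prop].append(item)

    def triples(self):
        # Flatten the container into (subject, property, value) triples.
        for prop, items in self._facts.items():
            for item in items:
                yield (self.subject, prop, item)


berlin = SemanticData(DataItem("page", "Berlin"))
berlin.add("Population", DataItem("number", 3500000))
berlin.add("Located in", DataItem("page", "Germany"))
berlin.add("Located in", DataItem("page", "Germany"))  # duplicate, ignored
```

Iterating `berlin.triples()` yields one triple per stored fact, which is roughly the "triple data" view asked about in the question.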
From: Samuel L. <sam...@ri...> - 2011-09-25 22:04:04
|
On 09/25/2011 11:05 AM, Markus Krötzsch wrote:
> For representing input data, that's all. Query outputs are represented
> in SMWQueryResult (basically an iterator for a 3D-array) but the data
> returned there is also based on DIs.
>
> RDF data is represented by a smaller set of classes under SMWExpElement.
> [...]

Ok, many thanks for the hints! Will have a closer look at that ...

Cheers,
// Samuel
|
From: Samuel L. <sam...@ri...> - 2011-10-31 17:53:08
|
On 09/25/2011 11:05 AM, Markus Krötzsch wrote:
> RDF data is represented by a smaller set of classes under SMWExpElement.
> These classes represent triples for the purpose of serialisation (they
> abstract RDF before fixing a concrete syntax such as RDF/XML or Turtle).

Got two questions:

=== Q1: Any SMWExpElements / SMWData converter? ===

Are there converters from/to the SMWExpElement related classes and the SMWData/SMWDataItems combo already?

I ask since so far I have been thinking about using the SMWData/SMWDataItem combo as the native format in RDFIO, but it strikes me that the SMWExpElement related classes match the data structure that you get from ARC2's RDF parsers much more closely.

Thus, I was thinking that if I only do the conversion from ARC2 data structures to SMWExpElement classes, and there are already converters from those to SMWData/SMWDataItem, I wouldn't need to reinvent that wheel...?

=== Q2: Status of SMWData/SMWDataItem as API? ===

I also wondered what status the SMWData/SMWDataItem classes are supposed to have as a general API ... Are they the intended API, or is SMW moving towards preferring to talk SPARQL with all extensions ... or even SMWExpElements?

I ask this since it is not clear that I will really *need* to use the SMWData/SMWDataItem combo as a representation, if I do the wiki page updates either with the Wiki Object Model extension or with my own writer class.

I would still prefer to use it, if it is pushed as the preferred API for this kind of thing, but I wondered whether that will be so for the foreseeable future?

// Samuel
|
From: Samuel L. <sam...@ri...> - 2011-10-31 18:13:13
|
On 10/31/2011 06:52 PM, Samuel Lampa wrote:
> === Q2: Status of SMWData/SMWDataItem as API? ===
> [...]

The thing that makes me wonder is that we're basically talking about two slightly different (though very much overlapping) representations: RDF (as represented by the SMWExpElement related classes), and Semantic MediaWiki facts (as represented by SMWData/SMWDataItem).

My problem, in the context of RDFIO, is that it seems I actually need both of these to capture the information from both worlds, since:

a. I need to store the URIs, which only the SMWExpElement classes do
b. I need to store the wiki page titles that I choose to use (as part of RDFIO's algorithm), which only the SMWData/SMWDataItem combo does.

... thus it seems there are at least two options:

1. RDFIO creates its own more general data container, which wraps both the SMWData/SMWDataItem one and the RDF one (possibly both the SMWExpElement one and ARC2's data structures), with built-in converters between all of these,

2. The SMWData/SMWDataItem classes are updated to contain the "Original URI", and then this format will be the only one needed, in addition to possibly the ARC2 format, just for making use of its parsers.

Number one is the one I've been pondering so far ... I just wanted to point this out now and ask whether there would be any interest in also storing the original URI directly in the SMWData/SMWDataItem classes ... (which would not need to be required, for data that has no counterpart in the outside world, though ... or maybe it can just be prefilled with the URIResolver URIs ... maybe on the fly, in a getter method)?

... it seems that would make the SMWData/SMWDI combo more general, and of course would make RDFIO add a lot less overhead :")

(I know we discussed this at SMWCon already, but these things weren't really that clear to me then, in particular the partial but not complete overlap between the RDF and SMW data representations ... so I wanted to point it out ... )

// Samuel
|
From: Samuel L. <sam...@ri...> - 2011-10-31 18:20:44
|
On 10/31/2011 07:13 PM, Samuel Lampa wrote:
> (which would not need to be required, for data that has no
> counterpart in the outside world,

I mean, these "Original URI" fields would not be required to be filled in. (Sorry for the possible confusion.)

// Samuel
|
From: Markus K. <ma...@se...> - 2011-10-31 18:43:37
|
On 31/10/11 17:52, Samuel Lampa wrote:
> === Q1: Any SMWExpElements / SMWData converter? ===
>
> Are there converters from/to the SMWExpElement related classes and the
> SMWData/SMWDataItems combo already?
> [...]

Yes, the SMWExpElement classes are meant as an abstraction of RDF terms and triples. They are used (1) as a pre-serialisation format for producing RDF (in any syntax) and (2) as a post-parsing format for interpreting SPARQL results.

* Due to (1), there is a complete implementation for the conversion
  SMWDataItem/SMWSemanticData => SMWExpElement
  This is done in the class SMWExporter (various methods, should be easy to find).

* Due to (2), there is an incomplete conversion
  SMWExpElement => SMWDataItem
  It is incomplete since we only need to interpret URIs as wiki pages when reading SPARQL results. Other types of RDF terms are not relevant in the SPARQL results we interpret. This conversion is implemented in SMWExporter::findDataItemForExpElement(). This method could be extended to create SMWDataItems for other types of input on a best-effort basis.

Since SPARQL results are plain lists (no graphs), there is no method yet for turning sets of triples into (necessarily many) SMWSemanticData objects. This could be added to SMWExporter as well, if needed.

Extending this code would make sense in SMW. One could also imagine that this is later used for importing SPARQL results into SMW data for general forms of SPARQL queries. Note, however, that a main design goal for such an extension would be to round-trip the data that SMW exports as well as possible.

> === Q2: Status of SMWData/SMWDataItem as API? ===
>
> Also I wondered what status the SMWData/SMWDataItem classes are supposed
> to have, as a general API? ... Are they the supposed API, or is SMW
> going towards preferring to talk SPARQL with all extensions ... or even
> SMWExpElements?
> [...]

SMWDataItems are supposed to be the main atomic data representation API in SMW. SMWSemanticData is the main annotation (property assignment) API in SMW. Both are assumed to stay in this position for the foreseeable future.

SMWExpElement is based on the RDF data model and is therefore not suitable for representing SMW data, where we have special elements like wiki pages, properties or geographic coordinates that are not represented explicitly in RDF. We need an API that distinguishes data items by their functional role in SMW (e.g., wiki page vs. property vs. URI), whereas this distinction does not exist in RDF. For these reasons, data items and semantic data containers are the main API for passing around data in SMW.

Markus
|
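The asymmetry between the two conversion directions can be sketched as follows. This is an illustrative Python model only, not the SMWExporter code: `DataItem`, `ExpElement`, `to_exp_element`, `find_data_item` and the base URI `http://example.org/wiki/` are all assumptions made for the example. The forward direction handles every kind of item, while the reverse direction only recognises URIs of wiki pages and gives up on everything else.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote, unquote

WIKI_BASE = "http://example.org/wiki/"  # assumed base URI for wiki pages


@dataclass(frozen=True)
class DataItem:
    kind: str   # "page", "uri", "string", ...
    value: str


@dataclass(frozen=True)
class ExpElement:
    is_uri: bool  # True for a resource, False for a literal
    text: str


def to_exp_element(item: DataItem) -> ExpElement:
    """Forward direction: every data item has some RDF-like form."""
    if item.kind == "page":
        return ExpElement(True, WIKI_BASE + quote(item.value.replace(" ", "_")))
    if item.kind == "uri":
        return ExpElement(True, item.value)
    return ExpElement(False, item.value)  # anything else becomes a literal


def find_data_item(element: ExpElement) -> Optional[DataItem]:
    """Reverse direction, deliberately partial: only wiki page URIs are handled."""
    if element.is_uri and element.text.startswith(WIKI_BASE):
        title = unquote(element.text[len(WIKI_BASE):]).replace("_", " ")
        return DataItem("page", title)
    return None  # literals and foreign URIs are not interpreted


page = DataItem("page", "Semantic MediaWiki")
exported = to_exp_element(page)
```

A best-effort extension of the reverse direction would add further branches to `find_data_item`, at the cost of having to guess the intended wiki type for each kind of RDF term.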
From: Markus K. <ma...@se...> - 2011-10-31 18:55:14
|
On 31/10/11 18:13, Samuel Lampa wrote:
> My problem, in the context of RDFIO, is that it seems I actually need
> both of these to capture the information from both worlds ... since:
>
> a. I need to store the URI:s, which only SMWExpElement classes do
> b. I need to store the wiki page titles that I choose to use (as part of
> RDFIO:s algorithm), which only the SMWData/SMWDataItem combo does.
>
> ... thus it seems there's at least two options:
>
> 1. RDFIO creates an own more general data container, which wraps both
> the SMWData/SMWDataItem one, and the RDF one [...]
>
> 2. SMWData/SMWDataItem classes are updated to contain the "Original
> URI" [...]

I suggest going for (1) if you need the full information in one object. You should think of SMW data items as small and simple "values", similar to an integer or a char in a programming language. They should be used like constants of datatypes. They should only be used for storing data, not for converting data or for augmenting it. They are pure data and know nothing about HTML, wikitext or RDF. [Exception: the SMWDIContainer type is a placeholder for compound data; it is not really considered an atomic value in SMW but is just used for transporting compound data in the API]

With this view in mind, making an object that holds a URI and a data item does not seem a bad idea (like making an object that holds an integer and a string).

Alternatively, you could of course represent URIs in an SMW data item as well and relate them to the wiki page with a property, stored together in an SMWSemanticData.

Markus
|
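Option (1), an object that simply holds a URI next to a data item, might look like the following Python sketch. It is a hypothetical illustration rather than RDFIO or SMW code: `ImportedResource`, `original_uri` and `uri_or` are invented names, and the URIs are example values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataItem:
    """Stand-in for an SMW data item: pure data, no knowledge of RDF."""
    kind: str
    value: str


@dataclass(frozen=True)
class ImportedResource:
    """Hypothetical wrapper: pairs a data item with the URI it came from."""
    item: DataItem
    original_uri: Optional[str] = None  # None if there is no outside counterpart

    def uri_or(self, fallback: str) -> str:
        # Use the original URI when known, otherwise a caller-supplied one
        # (for instance a URI generated by the wiki's own resolver).
        return self.original_uri if self.original_uri is not None else fallback


imported = ImportedResource(DataItem("page", "Berlin"),
                            "http://dbpedia.org/resource/Berlin")
local_only = ImportedResource(DataItem("page", "Sandbox"))
```

The data item itself stays untouched, so the wrapper can be dropped again whenever only the plain SMW representation is needed.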
From: Samuel L. <sam...@ri...> - 2011-10-31 19:17:37
|
On 10/31/2011 07:55 PM, Markus Krötzsch wrote:
> I suggest to go for (1) if you need the full information in one object.
> You should think of SMW data items as small and simple "values", similar
> to an integer or a char in a programming language. [...]
>
> With this view in mind, making an object that holds a URI and a dataitem
> does not seem a bad idea (like making an object that holds an integer
> and a string).
>
> Alternatively, you could of course represent URIs in an SMW data item as
> well and relate them to wiki page with a property, stored together in an
> SMWSemanticData.

Ok, many thanks for the feedback!

The suggestions sound reasonable, and keep in line with the modelling approach already taken.

The only small caution I'd like to raise is that the decision to keep data objects atomic makes them follow the Anemic Domain Model antipattern [1] a bit. But that is of course a question about the overall model design approach, and not about this specific case only - that is, whether one wants to follow Domain-Driven Design patterns [2] or not.

... so for the moment I'm happy to follow the existing model design approach :)

// Samuel

[1] http://martinfowler.com/bliki/AnemicDomainModel.html
[2] http://en.wikipedia.org/wiki/Domain-driven_design
|
From: Samuel L. <sam...@ri...> - 2011-10-31 19:20:56
|
On 10/31/2011 07:43 PM, Markus Krötzsch wrote:
> The extension of this code would make sense in SMW. One could also
> imagine that this is later used for importing SPARQL results into SMW
> data for general forms of SPARQL queries.

I'm not sure I followed this part: "importing SPARQL results into SMW data for general forms of SPARQL queries." ... though it sounds interesting. Could you please elaborate?

Overall, though, I think supporting a full round trip between SMW and RDF data structures is indeed interesting, and would enable a whole bunch of new use cases ...

I just thought of one: combined with a general and robust SMWSemanticData importer (into wiki pages), it would make it possible to turn facts that are only implicit in the wiki into explicit ones by means of SPARQL CONSTRUCT queries, and to persist these newly explicit facts in the wiki ... that is, one of the things typically done by reasoners these days ...

// Samuel
|
From: Markus K. <ma...@se...> - 2011-11-01 15:26:17
|
On 31/10/11 19:20, Samuel Lampa wrote:
> I'm not sure I followed this part:
> "importing SPARQL results into SMW data for general forms of SPARQL
> queries."
>
> ... though it sounds interesting. Could you please elaborate?

The queries we use for connecting SMW to SPARQL stores are of a special form since they only cover the page-selection part of #ask. So they always select a single variable and we always expect results to bind to URIs of wiki pages only.

> Overall, though, I think, supporting full roundtrip of SMW<->RDF data
> structures, is indeed interesting, and would enable a whole bunch of new
> use cases ...

Fully reliable round-tripping won't be possible when considering single entities (as one has to in SPARQL results, since they may not, in general, provide enough context). For example, a URI of a wiki page (Type:URL) and the wiki page itself (Type:Page) could not be distinguished in RDF. One would need to know the wiki type of all imported data to make this work reliably.

> Just got to think about one ... that combined with a general and robust
> SMWSemanticData importer (into wiki pages), it would be enable to make
> explicit facts that are only implicit in the wiki, by the means of
> SPARQL CONSTRUCT queries, and persisting these new explicitized facts in
> the wiki ... that is, one thing of which is typically done by reasoners
> these days ...

Possible, but I am not sure that the detour through SPARQL would be helpful there. One could also persist results of #ask queries in the same way. The main problem in both cases is not the initial computation but view maintenance/update.

Markus
|
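The round-tripping problem mentioned above (a Type:Page value and a Type:URL value that points at the same page collapse into one URI) can be reproduced in a small Python sketch. The names `DataItem`, `export_uri`, `import_with_type` and the base URI are assumptions for illustration and do not correspond to SMW functions.

```python
from dataclasses import dataclass

WIKI_BASE = "http://example.org/wiki/"  # assumed base URI for wiki pages


@dataclass(frozen=True)
class DataItem:
    kind: str   # "page" (Type:Page) or "uri" (Type:URL)
    value: str


def export_uri(item: DataItem) -> str:
    """Export either kind of item as a plain URI, as RDF would see it."""
    if item.kind == "page":
        return WIKI_BASE + item.value.replace(" ", "_")
    return item.value


def import_with_type(uri: str, declared_kind: str) -> DataItem:
    """Re-import a URI; the wiki type must be supplied from outside."""
    if declared_kind == "page":
        return DataItem("page", uri[len(WIKI_BASE):].replace("_", " "))
    return DataItem("uri", uri)


page = DataItem("page", "Berlin")                  # the wiki page itself
url_value = DataItem("uri", WIKI_BASE + "Berlin")  # a URL that points at it

# Two distinct data items, one and the same exported URI.
same_export = export_uri(page) == export_uri(url_value)
```

Here `same_export` is true although `page` and `url_value` differ, so an importer that sees only the URI cannot tell which one was meant; passing the declared wiki type, as `import_with_type` does, is what restores the distinction.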
From: Samuel L. <sam...@ri...> - 2011-11-01 15:40:34
|
On 11/01/2011 04:26 PM, Markus Krötzsch wrote:
> Fully reliable round-tripping won't be possible when considering single
> entities (as one has to in SPARQL results since they may not, in general,
> provide enough context). For example, a URI of a wiki page (Type:URL)
> and the wiki page itself (Type:Page) could not be distinguished in RDF.
> One would need to know the wiki type of all imported data to make this
> work reliably.

Good point!

// Samuel
|
From: Markus K. <ma...@se...> - 2011-11-01 15:41:43
|
On 31/10/11 19:17, Samuel Lampa wrote: > On 10/31/2011 07:55 PM, Markus Krötzsch wrote: >> On 31/10/11 18:13, Samuel Lampa wrote: >>> On 10/31/2011 06:52 PM, Samuel Lampa wrote: >>>> === Q2: Status of SMWData/SMWDataItem as API? === >>>> >>>> Also I wondered what status the SMWData/SMWDataItem classes are >>>> supposed >>>> to have, as a general API? ... Are they the supposed API, or is SMW >>>> going towards preferring to talk SPARQL with all extensions ... or even >>>> SMWExpElements? >>>> >>>> I ask this since it does not seem clear that I will really*need* to use >>>> the SMWData/SMWDataItem combo as a representation, if I do the wiki >>>> page >>>> updates either with the Wiki Object Model extension or an own writer >>>> class. >>>> >>>> I would still prefer to use it, if it is pushed as a preferred API for >>>> these kind of things, but I wondered whether that is so for the >>>> foreseeable future? >>> >>> >>> The thing that makes me wonder, is since we're basically talking about >>> two slightly different (though very much overlapping) representations: >>> RDF (as represented by SMWExpElement rel. classes), and Semantic >>> MediaWiki facts (as repr. by SMWData/SMWDataItem). >>> >>> My problem, in the context of RDFIO, is that it seems I actually need >>> both of these to capture the information from both worlds ... since: >>> >>> a. I need to store the URI:s, which only SMWExpElement classes do >>> b. I need to store the wiki page titles that I choose to use (as part of >>> RDFIO:s algorithm), which only the SMWData/SMWData combo does. >>> >>> ... thus it seems there's at least two options: >>> >>> 1. RDFIO creates an own more general data container, which wraps both >>> the SMWData/SMWDataItem one, and the RDF one (possibly both the >>> SMWExpElement one, and ARC2:s data structures), with in-built converters >>> between all of these, >>> >>> 2. 
SMWData/SMWDataItem classes are updated to contain the "Original >>> URI", and then this format will be the only needed one, in addition to >>> possibly the ARC2 format, just for making use of it's parsers. >>> >>> >>> Number one is the one I've been pondering so far ... I just wanted to >>> point out this now and ask whether there would be any interest in >>> storing also the original URI directly in the SMWData/SMWDataItem >>> classes ... (which would not need to be required, for data that has no >>> counterpart in the outside world, though ... or maybe can just be >>> prefilled with the URIResolver URI:s ... this maybe on-the-fly, in a >>> getter method)? >>> >>> ... it seems that would make the SMWData/SMWDI combo more general, and >>> of course would make RDFIO add a lot less overhead :") >>> >>> (I know we discussed this on SMWCon already, but these things weren't >>> really that clear to me then, about the partly but not completely >>> overlap between RDF and SMW data representations ... so wanted to point >>> it out ... ) >> >> I suggest to go for (1) if you need the full information in one object. >> You should think of SMW data items as small and simple "values", similar >> to an integer or a char in a programming language. They should be used >> like constants of datatypes. They should only be used for storing data, >> not for converting data or for augmenting it. They are pure data and >> know nothing about HTML, wikitext or RDF. [Exception: the SMWDIContainer >> type is a placeholder for compound data; it is not really considered as >> an atomic value in SMW but just used for transporting compound data in >> the API] >> >> With this view in mind, making an object that holds a URI and a dataitem >> does not seem a bad idea (like making an object that holds an integer >> and a string). 
>> >> Alternatively, you could of course represent URIs in an SMW data item as >> well and relate them to wiki page with a property, stored together in an >> SMWSemanticData. > > > Ok, many thanks for the feedback! > > The suggestions sounds reasonable - keeping in line with the modelling > approach already taken. > > The only little caution I'd like to make, is that the decision keeping > data objects atomic makes them follow the Anemic Model antipattern [1] a > bit. But that is of course a question about model design approach > overall, and not this specific case only - that is, whether one wants to > follow Domain Driven Design patterns [2] or not. Reading [1], I think there is a misunderstanding in the way you seem to apply this text to SMW (probably due to my ill-chosen examples of property and wiki page out of all dataitems). The text states that domain specific behaviour of domain objects should be implemented in the classes that represent the objects. This is what we do. Our domain objects are strings, numbers, geographic coordinates. This is the very data that we want to manage in SMW, it just happens to be rather atomic, simple and (application) domain independent. Note that we do not artificially try to abstract or simplify the objects to get this representation -- these simple concepts are really the kinds of things that SMW users deal with. Yet we include all related code into the objects whenever such code is needed. For example, you can have a look at SMWDITime to see a lot of calendar/date specific code. We could also have similar methods for strings (e.g., substring computation) and for numbers (e.g., for rounding) but this was not necessary so far. Our data items do not include parsing/rendering functions that are specific to syntactic formats like HTML, wikitext, JSON, RDF, SQL, ... which I think is good (and established) design (you don't mix all parsing/serialisation code into one class). 
The big fallacy of [1] is to suggest that "object code" must always be much larger than "application/service code". If taken too seriously, this could lead to a design that tries to merge all functionality into a few objects, thus contradicting the fundamental programming paradigm of separation of concerns. For example, SMW used to have HTML rendering and RDF serialisation methods for data in a single class, in spite of the fact that these functions are not at all related but merely work on the same input data.

This earlier design of SMW has also undermined another important idea of OO design: the definition of clear interfaces with limited visibility. The code for parsing, rendering, representation and serialisation used to have full access to all internal fields of the objects. Before the introduction of data items, it was quite unclear for some objects where the data is actually stored (there were multiple redundant/overlapping internal representations, sometimes optional, to reflect the internal state of the object; all code would directly read/write to any of the members).

A third main reason for keeping single objects small is that SMW is meant to be extendible. If each new storage backend or display format relied on adding code to domain object classes, it would be very hard to extend the system.

Overall, I still think that SMW follows most of the guidelines of Domain-Driven Design but for a domain (data management) that is very different from what the author of [1] had in mind. Another special observation about SMW is that most of our "business logic" is related to parsing and serialisation -- tasks that should normally be separated from the data that they work on. But maybe one has to take a step back and ask what the "domain layer" and "application layer" in SMW really are to compare it to the DDD idea. :-)

Best regards,

Markus

>
> ... so for the moment I'm happy to follow the existing model design approach :)
>
> // Samuel
>
>
> [1] http://martinfowler.com/bliki/AnemicDomainModel.html
> [2] http://en.wikipedia.org/wiki/Domain-driven_design
>
|
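As a rough illustration of the separation-of-concerns and extensibility arguments in the message above, here is a small Python sketch. It is not SMW code: StringItem, the serializer functions and the SERIALIZERS registry are all assumed names, and the Turtle escaping is deliberately simplified (only backslashes and double quotes are handled). The point is that a new output format is added by registering a function, without touching the data class.

```python
import json
from dataclasses import dataclass
from html import escape
from typing import Callable, Dict


@dataclass(frozen=True)
class StringItem:
    """Hypothetical data item: holds a value, knows nothing about formats."""
    value: str


def to_html(item: StringItem) -> str:
    # HTML-specific code lives outside the data class.
    return "<span>" + escape(item.value) + "</span>"


def to_turtle_literal(item: StringItem) -> str:
    # Simplified escaping (assumption): only backslash and double quote.
    escaped = item.value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


# Hypothetical registry mapping a format name to its serializer.
SERIALIZERS: Dict[str, Callable[[StringItem], str]] = {
    "html": to_html,
    "turtle": to_turtle_literal,
}


def serialize(item: StringItem, fmt: str) -> str:
    """Look up the serializer for fmt and apply it to the item."""
    return SERIALIZERS[fmt](item)


# Extending the system: a new format is registered, StringItem is unchanged.
SERIALIZERS["json"] = lambda item: json.dumps(item.value)
```

The serializers only use the public `value` field, so the data class keeps a narrow interface, which is the visibility point made in the message.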
From: Samuel L. <sam...@ri...> - 2011-11-01 16:56:40
|
On 11/01/2011 04:41 PM, Markus Krötzsch wrote:
> On 31/10/11 19:17, Samuel Lampa wrote:
>> The only little caution I'd like to make, is that the decision keeping data objects atomic makes them follow the Anemic Model antipattern [1] a bit. But that is of course a question about model design approach overall, and not this specific case only - that is, whether one wants to follow Domain Driven Design patterns [2] or not.
>
> Reading [1], I think there is a misunderstanding in the way you seem to apply this text to SMW (probably due to my ill-chosen examples of property and wiki page out of all dataitems). The text states that domain specific behaviour of domain objects should be implemented in the classes that represent the objects. This is what we do. Our domain objects are strings, numbers, geographic coordinates. This is the very data that we want to manage in SMW, it just happens to be rather atomic, simple and (application) domain independent. Note that we do not artificially try to abstract or simplify the objects to get this representation -- these simple concepts are really the kinds of things that SMW users deal with.
>
> Yet we include all related code into the objects whenever such code is needed. For example, you can have a look at SMWDITime to see a lot of calendar/date specific code. We could also have similar methods for strings (e.g., substring computation) and for numbers (e.g., for rounding) but this was not necessary so far. Our data items do not include parsing/rendering functions that are specific to syntactic formats like HTML, wikitext, JSON, RDF, SQL, ... which I think is good (and established) design (you don't mix all parsing/serialisation code into one class).

Thanks for the clarification! I was obviously a bit quick to make an overall judgement, without studying other parts of the SMW code. What you describe sounds fine. I indeed agree with separation of concerns.
Just as a sidenote, I know that one very successful approach is to have "domain controller classes" for the different "domains" (or more properly "subdomains") or concerns, which aggregate and coordinate all activities belonging to each concern, whether that be through rich domain objects, service objects, or any combination of those. Most importantly, all of this is concentrated in and managed from the "domain controller object" (think "SMWDisplayDomain", "SMWExportDomain" etc ...).

... so, I'm indeed not a proponent of gathering every possibly related functionality in normal domain objects. :)

> The big fallacy of [1] is to suggest that "object code" must always be much larger than "application/service code". If taken too seriously, this could lead to a design that tries to merge all functionality into a few objects, thus contradicting the fundamental programming paradigm of separation of concerns. For example, SMW used to have HTML rendering and RDF serialisation methods for data in a single class, in spite of the fact that these functions are not at all related but merely work on the same input data.
>
> This earlier design of SMW has also undermined another important idea of OO design: the definition of clear interfaces with limited visibility. The code for parsing, rendering, representation and serialisation used to have full access to all internal fields of the objects. Before the introduction of data items, it was quite unclear for some objects where the data is actually stored (there were multiple redundant/overlapping internal representations, sometimes optional, to reflect the internal state of the object; all code would directly read/write to any of the members).

True. As a next sidenote, the "domain controller object" approach I'm familiar with also makes it easy to add "facade" objects, specific to the different concerns, which manage visibility and the like for all functionality related to that particular domain.
> A third main reason for keeping single objects small is that SMW is meant to be extendible. If each new storage backend or display format relied on adding code to domain object classes, it would be very hard to extend the system.

Good point too.

> Overall, I still think that SMW follows most of the guidelines of Domain-Driven Design but for a domain (data management) that is very different from what the author of [1] had in mind.

Indeed.

> Another special observation about SMW is that most of our "business logic" is related to parsing and serialisation -- tasks that should normally be separated from the data that they work on.

An interesting observation ... got to think a bit on that :)

> But maybe one has to take a step back and ask what the "domain layer" and "application layer" in SMW really are to compare it to the DDD idea. :-)

Yeah, an interesting thing to ponder as well! :)

(Sidenote no. 3 is that the "domain controller object" approach simply has the application layer in the domain controller objects ... thus tying together application and domain layer very closely, even though a clear separation is still maintained (application logic in the domain controllers, domain logic in the other domain objects). ... and then the corresponding facade objects provide the presentation/API level logic, to which UI code and other extensions can talk.)

All in all: Thanks for an interesting and clarifying elaboration of the design choices! To the best of my judgement it sounds well thought-through and reasonable. My sidenotes are just my spontaneous reflections that I couldn't resist adding, since I find that the "domain controller" approach really fills a gap in the land of DDD, on how to actually implement the separation of concerns, as well as the application layer, in practice.

Cheers
// samuel
|
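The "domain controller" and "facade" layering described in the sidenotes above could look roughly like the following Python sketch. This is only one possible reading of the idea, not SMW code: DisplayDomain and DisplayFacade are hypothetical names (loosely echoing the "SMWDisplayDomain" example), and the truncation rule is invented for illustration. Domain logic stays in the item, application logic for the display concern sits in the controller, and the facade is the narrow surface that UI code would talk to.

```python
from dataclasses import dataclass
from html import escape
from typing import List


@dataclass(frozen=True)
class StringItem:
    """Hypothetical domain object with a small piece of domain logic."""
    value: str

    def shortened(self, limit: int) -> str:
        # Domain logic: truncate long values (rule invented for this sketch).
        if len(self.value) <= limit:
            return self.value
        return self.value[:limit] + "..."


class DisplayDomain:
    """Hypothetical 'domain controller' for the display concern."""

    def __init__(self, limit: int = 20) -> None:
        self._limit = limit

    def render_list(self, items: List[StringItem]) -> str:
        # Application logic: coordinates the domain objects for one concern.
        rows = ["<li>" + escape(i.shortened(self._limit)) + "</li>" for i in items]
        return "<ul>" + "".join(rows) + "</ul>"


class DisplayFacade:
    """Hypothetical facade: hides the controller and items from UI code."""

    def __init__(self, controller: DisplayDomain) -> None:
        self._controller = controller

    def show(self, values: List[str]) -> str:
        return self._controller.render_list([StringItem(v) for v in values])


# Usage: callers only see plain strings in and markup out.
facade = DisplayFacade(DisplayDomain(limit=3))
markup = facade.show(["abcdef", "x<y"])
```

An export concern would get its own controller and facade in the same way, so the two concerns never share rendering or serialisation code.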