[Pointrel-discuss] NEPOMUK lessons learned
From: Paul D. F. <pdf...@ku...> - 2011-02-27 12:59:02
Just poking around NEPOMUK as I think about CouchDB and about using existing systems somehow. I am also thinking about this issue of context: how do people agree on where to go to get their resources and transactions when they have a common shared space, but the URI/URN resources might not be in that space? (I'm not sure anyone has a general solution other than the web of URLs as locations.)

From NEPOMUK's "Using the Personal Semantic Desktop" (20.10.2008):
http://dev.nepomuk.semanticdesktop.org/raw-attachment/wiki/NepomukTutorial/d2-3.pdf

"""
4.2 RDF Repository and PIMO Service

The delivered RDF Repository has the conventional features of an RDF store. Main functionalities are adding and deleting triples, and searching for facts. The implementation goes beyond the state of the art by providing a scalable implementation of a limited NRL inference engine. The details of this are described on the page about inference [14]. The approach to inference is satisfactory: it provides a limited set, but is faster than the default inference engine of the underlying store, especially in the case of deleting triples. Similar is our approach to full-text indexing: we extended (together with the open-source community around Aduna-Software.com) the Sesame RDF store with fulltext-indexing capabilities. Again, this approach is fairly scalable (it slows down with increased database size). There is a wikipage documenting fulltext indexing [15], we can say it provides satisfactory fulltext search.

A problematic area is the communication with the database. Relational databases have an optimised scheme to communicate. Well-known commands exist to manipulate data (creation of rows, deletion, and updates) and a clear transaction scheme allows to commit and roll-back multiple operations. On RDF, the manipulation of stores is usually done on the level of adding and removing individual triples.
To abstract from this, many applications add a "business layer" API on top of relational databases offering object-level operations. Typical operations would be to add, remove, and update items, or to invoke more complex commands. In NEPOMUK we have these commands on the level of the PIMO service. Developers can use either the RDF Repository directly or the "business layer" provided by PIMO service.

At the beginning of the project, we discussed the possibility of having a single interface that offers access to the database, on the level of a "business layer" and prohibit direct manipulation of triples. On hindsight, we have seen that a business layer would have reduced the programming effort and clarified how exactly the store has to be used. This would mean to remove access to the repository on the level of individual triples and instead only allow operations on the level of resources (and classes, properties, ontologies). Manipulations would then also work in bigger chunks, making each call verifiable (the state before and after the change have to keep the RDF data valid, this is not possible when allowing single-triple operations).

The advantages of a "high-granular" API would be:

* Being able to verify operations on a higher level.
* Optimising database operations: when operations have to be on "one resource" level, indexing and inference can be optimised.
* Developers do not have to know about the semantics on the level of individual triples but can concentrate on higher-level tasks.
* Signals about changes are more coarse-granular, allowing better listening architectures.

The last point is related to the problem of signalling and messaging between services. At the moment, NEPOMUK only offers a method-invocation based service-to-service communication. Therefore the RDF Repository only supports very basic ways of listening to changes. A service which reacts to changes in data has to register a listener on the level of triples (a triple was changed).
Having a higher-level API would allow listeners to get informed on the level of resources (a resource has changed).

...

Summing up the issues already addressed above in the lessons learned, the most important aspect would be to restrict access to the RDF Repository to methods on a higher-level business-layer API. Instead of allowing single-triple operations (add a triple, remove a triple), operations should always be on the level of resources (add this resource, remove this resource) and provide more convenience to the developer by providing a clear object-oriented API.
"""

NEPOMUK has a lot of interesting service ideas that would be useful on top of the Pointrel System -- and that issue with changes is interesting.

As for the aspect of "high-granular", that is what is accomplished by Pointrel20090201 transactions (and also by transactions in earlier versions, which were not stored as individual files but were integrated into the triple store database and could still be rolled back).

Anyway, with CouchDB (which I am looking at now) focused on documents (resources), with NEPOMUK saying this, with HTTP talking about this, and with the version of Pointrel from two years ago having moved in this direction, it seems that this idea of resources that I have been exploring, both for their own sake and as transactions, is a promising area. The idea of resources as transactions is different, though, from the seeming intent of CouchDB, HTTP, and NEPOMUK as they are commonly used. But the Pointrel20090201 system has two layers in that sense: resources as files, and resources as transactions of triples if they are in a special file format (.pointrel). The resources as files could also support other usages -- like resources in RDF used as transactions.

--Paul Fernhout
http://www.pdfernhout.net/
====
The biggest challenge of the 21st century is the irony of technologies of abundance in the hands of those thinking in terms of scarcity.
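
P.S. To make the triple-level versus resource-level distinction from the quoted section concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: `ResourceStore`, its method names, the `required_predicates` check, and the `urn:example:` identifiers are all hypothetical, and none of this is the NEPOMUK PIMO service API or the Pointrel API. It also simplifies by allowing one value per predicate. The point is just that when callers can only add or remove whole resources, each call can be verified before the store changes, and listeners hear "a resource has changed" rather than "a triple was changed".

```python
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
Listener = Callable[[str, str], None]  # called with (resource URI, event kind)


class ResourceStore:
    """Hypothetical in-memory triple store that only exposes resource-level calls."""

    def __init__(self, required_predicates: Set[str]) -> None:
        self._triples: Set[Triple] = set()
        # Assumed validity rule: every resource must carry these predicates.
        self._required = set(required_predicates)
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def get_resource(self, uri: str) -> Dict[str, str]:
        # Simplification: one object value per predicate.
        return {p: o for (s, p, o) in self._triples if s == uri}

    def put_resource(self, uri: str, properties: Dict[str, str]) -> None:
        # Verify the whole resource before touching the store, so the data
        # is valid both before and after the call.
        missing = self._required.difference(properties)
        if missing:
            raise ValueError(f"{uri} is missing {sorted(missing)}")
        kind = "updated" if self.get_resource(uri) else "added"
        kept = {t for t in self._triples if t[0] != uri}
        self._triples = kept | {(uri, p, o) for p, o in properties.items()}
        self._notify(uri, kind)

    def remove_resource(self, uri: str) -> None:
        if not self.get_resource(uri):
            return  # nothing stored under this URI, so nothing to signal
        self._triples = {t for t in self._triples if t[0] != uri}
        self._notify(uri, "removed")

    def _notify(self, uri: str, kind: str) -> None:
        # One coarse-grained signal per resource, not one per triple.
        for listener in self._listeners:
            listener(uri, kind)


events: List[Tuple[str, str]] = []
store = ResourceStore(required_predicates={"rdf:type"})
store.subscribe(lambda uri, kind: events.append((uri, kind)))

store.put_resource("urn:example:note1", {"rdf:type": "ex:Note", "ex:title": "Hello"})
store.put_resource("urn:example:note1", {"rdf:type": "ex:Note", "ex:title": "Hello again"})
store.remove_resource("urn:example:note1")
print(events)
```

A rejected `put_resource` call raises before anything is written, which is roughly the all-or-nothing behaviour I mean by treating a resource as a transaction.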