[Pointrel-discuss] NEPOMUK lessons learned

Just poking around NEPOMUK as I think about CouchDB and about reusing 
existing systems somehow. I'm also thinking about the issue of context: 
how do people agree on where to go to get their resources and 
transactions when they have a common shared space but the URI/URN 
resources might not be in that space? (I'm not sure anyone has a general 
solution other than the web of URLs as locations.)

 From NEPOMUK's: "Using the Personal Semantic Desktop" 20.10.2008
http://dev.nepomuk.semanticdesktop.org/raw-attachment/wiki/NepomukTutorial/d2-3.pdf
"""
4.2 RDF Repository and PIMO Service
   The delivered RDF Repository has the conventional features of an RDF 
store. Main functionalities are adding and deleting triples, and 
searching for facts. The implementation goes beyond the state of the art 
by providing a scalable implementation of a limited NRL inference 
engine. The details of this are described on the page about inference [14]. 
The approach to inference is satisfactory: it provides a limited set, 
but is faster than the default inference engine of the underlying store, 
especially in the case of deleting triples. Similar is our approach to 
full-text indexing: we extended (together with the open-source community 
around Aduna-Software.com) the Sesame RDF store with fulltext-indexing 
capabilities. Again, this approach is fairly scalable (it slows down 
with increased database size). There is a wikipage documenting fulltext 
indexing [15], we can say it provides satisfactory fulltext search.
   A problematic area is the communication with the database. Relational 
databases have an optimised scheme to communicate. Well-known commands 
exist to manipulate data (creation of rows, deletion, and updates) and a 
clear transaction scheme allows to commit and roll-back multiple 
operations. On RDF, the manipulation of stores is usually done on the 
level of adding and removing individual triples. To abstract from this, 
many applications add a "business layer" API on top of relational 
databases offering object-level operations.  Typical operations would be 
to add, remove, and update items, or to invoke more complex commands. In 
NEPOMUK we have these commands on the level of the PIMO service. 
Developers can use either the RDF Repository directly or the "business 
layer" provided by PIMO service. At the beginning of the project, we 
discussed the possibility of having a single interface that
offers access to the database, on the level of a "business layer" and 
prohibit direct manipulation of triples. On hindsight, we have seen that 
a business layer would have reduced the programming effort and clarified 
how exactly the store has to be used. This would mean to remove access 
to the repository on the level of individual triples and instead only 
allow operations on the level of resources (and classes, properties, 
ontologies). Manipulations would then also work in bigger chunks, making 
each call verifiable (the state before and after the change have to keep 
the RDF data valid, this is not possible when allowing single-triple 
operations). The advantages of a "high-granular" API would be:
* Being able to verify operations on a higher level.
* Optimising database operations: when operations have to be on "one
resource" level, indexing and inference can be optimised.
* Developers do not have to know about the semantics on the level of
individual triples but can concentrate on higher-level tasks.
* Signals about changes are more coarse-granular, allowing better 
listening architectures.
   The last point is related to the problem of signalling and messaging 
between services. At the moment, NEPOMUK only offers a method-invocation 
based service-to-service communication. Therefore the RDF Repository 
only supports very basic ways of listening to changes. A service which 
reacts to changes in data has to register a listener on the level of 
triples (a triple was changed). Having a higher-level API would allow 
listeners to get informed on the level of resources (a resource has 
changed). ...
   Summing up the issues already addressed above in the lessons learned, 
the most important aspect would be to restrict access to the RDF 
Repository to methods on a higher-level business-layer API. Instead of 
allowing single-triple operations (add a triple, remove a triple), 
operations should always be on the level of resources (add this 
resource, remove this resource) and provide more convenience to the 
developer by providing a clear object-oriented API.
"""
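
To make the "resource-level API" point concrete, here is a minimal sketch 
in Python of what such a layer might look like. Everything in it is made 
up for illustration (the ResourceStore class, its method names, the 
"required predicates" check, and the single value per property); it is 
not NEPOMUK's PIMO service or any Pointrel API. It just shows whole 
resources being verified before the store changes, and listeners hearing 
"a resource has changed" rather than "a triple was changed".

```python
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
Listener = Callable[[str, str], None]  # called with (resource URI, event kind)


class ResourceStore:
    """Hypothetical facade: callers never add or remove single triples."""

    def __init__(self, required_predicates: Set[str]) -> None:
        self._triples: Set[Triple] = set()
        # Assumed validity rule: every resource must carry these predicates.
        self._required = set(required_predicates)
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def get_resource(self, uri: str) -> Dict[str, str]:
        return {p: o for (s, p, o) in self._triples if s == uri}

    def put_resource(self, uri: str, properties: Dict[str, str]) -> None:
        # Verify the whole resource before touching the store.
        missing = self._required - set(properties)
        if missing:
            raise ValueError(f"{uri} is missing {sorted(missing)}")
        old = {t for t in self._triples if t[0] == uri}
        new = {(uri, p, o) for p, o in properties.items()}
        # Swap all triples of the resource in one step.
        self._triples = (self._triples - old) | new
        self._notify(uri, "changed")

    def remove_resource(self, uri: str) -> None:
        old = {t for t in self._triples if t[0] == uri}
        if old:
            self._triples -= old
            self._notify(uri, "removed")

    def _notify(self, uri: str, kind: str) -> None:
        # One coarse-grained signal per resource, not one per triple.
        for listener in self._listeners:
            listener(uri, kind)
```

A rejected put_resource() call leaves the store untouched and sends no 
signal, which is the "verifiable call" property the excerpt asks for.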

NEPOMUK has a lot of interesting service ideas that would be useful on 
top of the Pointrel System -- and that issue with changes is 
interesting. As for the "high-granular" aspect, that is what 
Pointrel20090201 transactions accomplish (as did transactions in earlier 
versions, which were not stored as individual files but were integrated 
into the triple store database and could still be rolled back).
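
As a rough illustration of transactions stored as individual files, here 
is a small Python sketch. It is only an assumption-laden toy: the JSON 
layout, the file naming, and the function names are invented for this 
example and are not the actual .pointrel format or Pointrel20090201 
code. Each transaction is one file holding a batch of triples, the 
current state is whatever you get by reading all the files, and rolling 
a transaction back means removing its file.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def write_transaction(directory: Path, name: str, triples: List[Triple]) -> Path:
    """Store one batch of triples as a single file (hypothetical JSON layout)."""
    path = directory / f"{name}.json"
    path.write_text(json.dumps(triples))
    return path


def load_triples(directory: Path) -> Set[Triple]:
    """Rebuild the current state by reading every transaction file."""
    triples: Set[Triple] = set()
    for path in sorted(directory.glob("*.json")):
        for s, p, o in json.loads(path.read_text()):
            triples.add((s, p, o))
    return triples


def roll_back(path: Path) -> None:
    """Undo a whole transaction by removing its file."""
    path.unlink()


def demo() -> Tuple[Set[Triple], Set[Triple]]:
    with tempfile.TemporaryDirectory() as tmp:
        directory = Path(tmp)
        write_transaction(directory, "0001", [("urn:note1", "title", "Hello")])
        second = write_transaction(
            directory,
            "0002",
            [("urn:note1", "author", "Paul"), ("urn:note2", "title", "Other")],
        )
        before = load_triples(directory)
        roll_back(second)
        after = load_triples(directory)
    return before, after
```

In this toy version the triples of a transaction come and go together, 
so nothing ever sees half of a change.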

Anyway, with CouchDB (which I'm looking at now) focused on documents 
(resources), with NEPOMUK saying this, with HTTP talking about this, and 
with the version of Pointrel from two years ago having moved in this 
direction, it seems this idea of resources, both for their own sake and 
as transactions, that I have been exploring is a promising area. The 
idea of resources as transactions is different, though, from the seeming 
intent of CouchDB, HTTP, and NEPOMUK as they are commonly used. But the 
Pointrel20090201 system has two layers in that sense: resources as 
files, and resources as transactions of triples if they are in a special 
file format (.pointrel). The resources as files could also support other 
usages -- like resources in RDF used as transactions.

--Paul Fernhout
http://www.pdfernhout.net/
====
The biggest challenge of the 21st century is the irony of technologies 
of abundance in the hands of those thinking in terms of scarcity.