It was Keith R. Bennett who said at the right time 25.10.2007 16:34 the following words:
> Chris -
>
> Thank you for that information.  That is *exactly* the kind of
> feedback I need and appreciate.  I am totally new to RDF, and value
> the wisdom of the experience of yourself and others in avoiding
> pitfalls and maximizing my benefit.
>
> This getPlainTextContent() is a common theme for me in my work; I'm
> trying to generalize the output of the parsing of many different kinds
> of documents, so that my user can say 'get me the { plainText, title,
> subject, etc. } from all these documents, no matter what their type'.
>
> Is this practical with Aperture's RDF output?
In practice, you may want to do inferencing at some point.
A simple kind of inferencing is to apply a property mapping before you return the results to the user.
As the ontologies are static and known, you can implement this by iterating through all statements and replacing each specific predicate with its super-property.

I would go for this solution:
* load all ontologies into an RDF2Go model (they are easily available from the classes, such as NIE.getOntologyAsStream (or something like this))
* write a list of predicates you want to have "streamlined", for example "plaintext", "title"
* iterate through the model, list all subproperties of these predicates, and build a mapping from the sub-properties (the specific ones you don't want) to the generic super-properties.
Then you should have something like
HashMap<URI, URI> subpropertyToSuperpropertyILike ...
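The three steps could look roughly like the following sketch. It is not code from Aperture or RDF2Go. The class `SuperPropertyMapBuilder` and its `Triple` record are hypothetical, and plain strings stand in for RDF2Go's URI and Statement types. The ontologies are assumed to be already loaded as a list of statements. Only the rdfs:subPropertyOf URI is standard. Records require Java 16+.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SuperPropertyMapBuilder {

    // Hypothetical stand-in for an RDF statement; plain strings replace the
    // RDF2Go URI/Statement types of the model holding the loaded ontologies.
    public record Triple(String subject, String predicate, String object) {}

    public static final String SUB_PROPERTY_OF =
            "http://www.w3.org/2000/01/rdf-schema#subPropertyOf";

    /**
     * Maps every direct or indirect subproperty of one of the wanted generic
     * properties (e.g. plainTextContent, title) to that generic property.
     */
    public static Map<String, String> build(List<Triple> ontology, Set<String> wanted) {
        // Index the rdfs:subPropertyOf statements: super-property -> direct subproperties.
        Map<String, List<String>> directSubs = new HashMap<>();
        for (Triple t : ontology) {
            if (SUB_PROPERTY_OF.equals(t.predicate())) {
                directSubs.computeIfAbsent(t.object(), k -> new ArrayList<>()).add(t.subject());
            }
        }
        Map<String, String> subToSuper = new HashMap<>();
        for (String generic : wanted) {
            // Work list of properties still to visit below this generic property.
            ArrayDeque<String> todo = new ArrayDeque<>(directSubs.getOrDefault(generic, List.of()));
            while (!todo.isEmpty()) {
                String sub = todo.poll();
                // Leave the wanted generic properties themselves unmapped.
                if (wanted.contains(sub)) continue;
                // putIfAbsent returns null only on the first visit, which also stops cycles.
                if (subToSuper.putIfAbsent(sub, generic) == null) {
                    todo.addAll(directSubs.getOrDefault(sub, List.of()));
                }
            }
        }
        return subToSuper;
    }
}
```

Take the NIE hierarchy from Chris's mail with the wanted set {nie:plainTextContent}, where prefixed names abbreviate the full URIs. In that case both nmo:plainTextMessageContent and nid3:unsynchronizedTextContent map to nie:plainTextContent.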

Before returning the DataObjects to the user,
iterate through the RDF2Go model once and replace the statements whose predicates have an entry in this HashMap.
Altogether, the HashMap will probably contain about 50-100 entries, so it won't hurt much;
the iterating and replacing can be a little time-consuming, but hopefully not very.
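Here is a minimal sketch of this rewriting pass under the same assumptions: a hypothetical `PredicateRewriter` class, with strings instead of RDF2Go types. It builds a new statement set rather than editing a live model. Against a real RDF2Go model you would remove each old statement and add the rewritten one.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PredicateRewriter {

    // Hypothetical stand-in for an RDF statement (plain strings instead of
    // RDF2Go's Statement/URI types).
    public record Triple(String subject, String predicate, String object) {}

    /**
     * Returns the statements with every predicate that has an entry in
     * subToSuper replaced by its generic super-property. The set drops
     * duplicates the rewrite may produce, as an RDF model would.
     */
    public static Set<Triple> rewrite(List<Triple> statements, Map<String, String> subToSuper) {
        Set<Triple> result = new LinkedHashSet<>();
        for (Triple t : statements) {
            // One HashMap lookup per statement.
            String generic = subToSuper.get(t.predicate());
            result.add(generic == null ? t : new Triple(t.subject(), generic, t.object()));
        }
        return result;
    }
}
```

Each statement costs one map lookup, so the pass is linear in the number of statements, whatever the number of entries in the map.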

> If I make sure to use
> an RDF2Go Model implementation that supports inferencing, is it
> reasonable to think that I will get the expected results?
Yes, but you pay the performance cost of inferencing.
If you only need the properties to be correct, use the simple solution instead.
> And does
> the implementation provided with Aperture support inferencing?
  
With Aperture: perhaps, but you would probably need some extra fiddling.
Some Sesame Sails shipped with Aperture support inferencing; MemoryStoreRDFSInferencer.class may be the thing you need.

You need the ontologies to make the inferencing work
(the inference engine needs two inputs: the ontologies defining which property is a subproperty of another, and the extracted data to work on).
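The sketch below shows what such an engine does with those two inputs, using naive forward chaining of the two RDFS rules involved. Rule rdfs5 makes subPropertyOf transitive. Rule rdfs7 says that a statement using a subproperty also holds for the super-property. The `SubPropertyInferencer` class is a hypothetical toy that again uses plain strings for URIs. It is not the algorithm of MemoryStoreRDFSInferencer or any other Sesame Sail.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SubPropertyInferencer {

    // Hypothetical stand-in for an RDF statement (strings instead of URIs).
    public record Triple(String subject, String predicate, String object) {}

    public static final String SUB_PROPERTY_OF =
            "http://www.w3.org/2000/01/rdf-schema#subPropertyOf";

    /**
     * Naive forward chaining over the ontology plus the extracted data:
     * applies rdfs5 and rdfs7 until no new statement is derived.
     */
    public static Set<Triple> infer(Set<Triple> ontology, Set<Triple> data) {
        Set<Triple> closure = new HashSet<>(ontology);
        closure.addAll(data);
        boolean changed = true;
        while (changed) {
            List<Triple> subProps = new ArrayList<>();
            for (Triple t : closure) {
                if (SUB_PROPERTY_OF.equals(t.predicate())) subProps.add(t);
            }
            List<Triple> derived = new ArrayList<>();
            for (Triple sp : subProps) {
                // rdfs7: (s p o) and (p subPropertyOf q) => (s q o)
                for (Triple t : closure) {
                    if (t.predicate().equals(sp.subject())) {
                        derived.add(new Triple(t.subject(), sp.object(), t.object()));
                    }
                }
                // rdfs5: (p subPropertyOf q) and (q subPropertyOf r) => (p subPropertyOf r)
                for (Triple sp2 : subProps) {
                    if (sp.object().equals(sp2.subject())) {
                        derived.add(new Triple(sp.subject(), SUB_PROPERTY_OF, sp2.object()));
                    }
                }
            }
            // Stop once a round adds nothing new (the set of terms is finite).
            changed = closure.addAll(derived);
        }
        return closure;
    }
}
```

This differs from the HashMap approach in one respect. Inference keeps the specific statement and adds the generic one next to it, so the model grows instead of being rewritten.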

Both solutions are OK, using inference or hacking a HashMap that maps the properties;
the HashMap solution is probably more controllable in situations where performance is an issue (which is mostly the case).

best
Leo
> Thanks,
> Keith

Quoting Christiaan Fluit <christiaan.fluit@aduna-software.com>:

  
When looking at the ExtractorUtils code, I saw a getPlainTextContent
method. One thing that worries me is that what you get here depends on
the underlying RDF2Go Model implementation, specifically whether it
supports inferencing or not, and that this will not be clear to a lot of
folks (especially those without an RDF background).

The NIE ontology defines a plainTextContent property with two
subproperties: NMO:plainTextMessageContent and
NID3:unsynchronizedTextContent. The latter are meant to be used under
more specific circumstances, namely for mails and MP3 files (or files
with ID3 tags) respectively.

When the Model supports inferencing and you ask for plainTextContent,
you will also get the latter two, whereas non-inferencing Models only
return exact matches of what you ask for. I think we should make this
clear somewhere. I don't think the documentation captures this at the
moment.

The same principle applies to other properties, e.g. NIE.subject with
its subproperty NMO.messageSubject. For people dumping Aperture's output
straight into an RDF database this won't really be a problem (as you
take care of inferencing issues elsewhere), but it is when you directly
interpret Aperture's output.
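The following toy sketch makes the difference concrete. It assumes plain string triples and a hypothetical `PropertyLookup` class. `exact()` behaves like a non-inferencing Model. `withSubproperties()` reproduces what an inferencing Model would return by expanding the requested property to its subproperties at query time. That expansion is a stand-in chosen for illustration, not how an inferencing RDF2Go Model is implemented.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PropertyLookup {

    // Hypothetical stand-in for an RDF statement (strings instead of URIs).
    public record Triple(String subject, String predicate, String object) {}

    /** Non-inferencing behaviour: only exact predicate matches. */
    public static List<String> exact(List<Triple> data, String subject, String predicate) {
        List<String> values = new ArrayList<>();
        for (Triple t : data) {
            if (t.subject().equals(subject) && t.predicate().equals(predicate)) values.add(t.object());
        }
        return values;
    }

    /**
     * Matches the property itself or any of its (transitive) subproperties.
     * directSubs maps a property to its direct subproperties from the ontology.
     */
    public static List<String> withSubproperties(List<Triple> data, String subject,
            String predicate, Map<String, List<String>> directSubs) {
        Set<String> accepted = new HashSet<>();
        ArrayDeque<String> todo = new ArrayDeque<>();
        todo.add(predicate);
        while (!todo.isEmpty()) {
            String p = todo.poll();
            // Set.add returns false for already-visited properties.
            if (accepted.add(p)) todo.addAll(directSubs.getOrDefault(p, List.of()));
        }
        List<String> values = new ArrayList<>();
        for (Triple t : data) {
            if (t.subject().equals(subject) && accepted.contains(t.predicate())) values.add(t.object());
        }
        return values;
    }
}
```

Consider a mail that only carries nmo:plainTextMessageContent. Asking `exact()` for nie:plainTextContent on it returns nothing, while `withSubproperties()` returns the mail text.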

Regards,

Chris
    



_______________________________________________
Aperture-devel mailing list
Aperture-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/aperture-devel
  


-- 
____________________________________________________
DI Leo Sauermann       http://www.dfki.de/~sauermann 

Deutsches Forschungszentrum fuer 
Kuenstliche Intelligenz DFKI GmbH
Trippstadter Strasse 122
P.O. Box 2080           Fon:   +49 631 20575-116
D-67663 Kaiserslautern  Fax:   +49 631 20575-102
Germany                 Mail:  leo.sauermann@dfki.de

Geschaeftsfuehrung:
Prof.Dr.Dr.h.c.mult. Wolfgang Wahlster (Vorsitzender)
Dr. Walter Olthoff
Vorsitzender des Aufsichtsrats:
Prof. Dr. h.c. Hans A. Aukes
Amtsgericht Kaiserslautern, HRB 2313
____________________________________________________