From: Antoni M. <Ant...@df...> - 2006-10-27 17:10:24
So, more or less, the first stage of the migration seems to be nearing completion. Here is a short summary of what has been done, what remains to be done and what issues arose in the process.

-----------------------------------------------------------------------------
Translation issues:

RDFContainer changed the interfaces to their RDF2Go equivalents:
* getModel returns an RDFModel, not a generic Object
* getValueFactory returns a ValueFactory

Creation of values and statements

There are three ways to create values (URIs, Literals, BlankNodes and Statements):
* using the model interface directly
* using a value factory
* using ModelUtil static methods (which are exactly the same as those in the ValueFactory, but accept a model as their first argument).

The rest is more of a mechanical process:
* changing imports
* changing new URIImpl into URIImpl.createURIWithoutChecking
* changing new LiteralImpl into ModelUtil.createLiteral, or rdfContainer.getValueFactory().createLiteral

-----------------------------------------------------------------------------
Implementation independence issues:

There are three goals I tried to pursue.
1. Core aperture should depend only on rdf2go.jar (that is, no class from the core aperture should use any concrete model implementation, since that would introduce a dependency on that particular implementation).
2. The tests may depend on ModelImplSesame (but not on org.openrdf ... classes).
3. Of the concrete Node implementations, only URIImpl can be used directly in aperture code. This cannot be avoided, since doing without it causes chicken-and-egg type problems, e.g. with constructing rdfContainers. All other values are to be created with the aid of a model instance (either directly, through a value factory or through ModelUtil).

This is broken in:

The AppleAddressBookCrawler. It uses a temporary model, created by invoking a createSimpleModel() method that uses ModelImplSesame. Originally it was in the RepositoryUtil class.
I moved it to the AppleAddressBookCrawler to make the dependency issue more visible.

The RDF2GoRDFContainer itself. If the user doesn't provide a model, it creates a default ModelImplSesame model.

I did some searching around the code. The only place within the core architecture classes where those constructors are used is the RDF2GoRDFContainerFactory. The factory itself is used in other parts of the code but no class creates it. If removing this dependency is a concern, I would suggest the following solution.

1. Have RDF2GoRDFContainer accept a Model from outside. Don't provide any default implementation.
2. Ask Benjamin to create a ModelImplSesameFactory. (I'm actually surprised it isn't there.)
3. Create a constructor for RDF2GoRDFContainerFactory that accepts an instance of the ModelFactory interface. (This is possible since, as I said, no aperture class creates instances of RDFContainerFactory, and the DEFAULT_FACTORY static field is never used in aperture.)
4. Use the ModelFactory in newInstance and getRDFContainer.
5. Remove the DEFAULT_FACTORY field.

This might break applications that use aperture. It's hard for me to estimate the cost of such a change (every RDFContainerFactory creation would need an instance of ModelFactory, and DEFAULT_FACTORY couldn't be used).

-----------------------------------------------------------------------------
URI Validation

Leo is right that a malformed URI is usually an indication of a bug on our side that should be found and removed. Nevertheless, if a schema for some input file format states that some elements from the input should be represented as URIs in the RDF output, we should be prepared for situations where an input file contains arbitrary strings that are to be interpreted as URIs.

The solution of ignoring DataObjects with faulty URIs is simple and clear, but it cannot be implemented with java.net.URI, since a simple string without spaces is accepted.
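
To illustrate the java.net.URI behaviour mentioned above, here is a minimal sketch using only the JDK. The class name UriAcceptanceDemo, the helper methods and the sample strings are made up for this example and are not part of aperture or RDF2Go. It shows that a plain token without spaces parses as a (relative) URI, while a string containing a space is rejected. The isAbsolute() check is included only as an observation; it is not the checkURI method discussed in this mail, and whether it would be strict enough for RDF purposes is not settled here.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriAcceptanceDemo {

    /** Returns true if java.net.URI parses the string without throwing. */
    static boolean parses(String candidate) {
        try {
            new URI(candidate);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /** Returns true if the string parses and carries a scheme (hypothetical stricter check). */
    static boolean parsesAsAbsolute(String candidate) {
        try {
            return new URI(candidate).isAbsolute();
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Sample inputs are invented for illustration only.
        String[] samples = { "JohnSmith", "John Smith", "http://example.org/JohnSmith" };
        for (String s : samples) {
            System.out.println("\"" + s + "\" parses=" + parses(s)
                    + " absolute=" + parsesAsAbsolute(s));
        }
    }
}
```

The point of the sketch is only that "parses without exception" is too weak a test for input that is supposed to become a URI in the RDF output.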
I do insist that we need a general way to validate URIs from the input. A checkURI method in the Model interface would be sufficient. It has been proposed on the RDF2Go devel list independently by me and by Mr. Richard Cyganiak. In reply to this discussion Max Volkel changed a single comment in the Model interface, from

/** @return a new URI from the given String */

to

/** The model must create URIs it would accept itself. @return a new URI from the given String */

This is clearly NOT a solution, because in the current implementation the validation is employed only when the log level is high enough. We can't build an application that will be robust only if the debug log level is enabled...

-----------------------------------------------------------------------------
Other issues:

...outlook.OutlookResource
Contains references to the ICAL ontology. May I switch it to the ICALTZD ontology?

Untestable classes...
...outlook.TestOutlookCrawlAll
...outlook.TestOutlookCrawler
...addressbook.AppleAddressbookCrawlerTest
I couldn't test these since I don't have Outlook or Apple Addressbook...

-----------------------------------------------------------------------------
Obsolete classes - had to be rewritten...

RepositoryAccessData - replaced by ModelAccessData
Since rdf2go doesn't support contexts directly, it is up to the user of ModelAccessData to provide a model implementation that uses an appropriate context (if necessary).
SesameRDFContainer - replaced by RDF2GoRDFContainer
SesameRDFContainerFactory - replaced by RDF2GoRDFContainerFactory
RepositoryUtil - methods from this class have been included in ModelUtil

I have also rewritten the ConfigurationUtil.get/set domain boundaries and the VocabularyWriter, so they don't use SeRQL queries anymore.

-----------------------------------------------------------------------------
RDF2Go Bugs...

Something's wrong with the reading part... I have sent an email to the RDF2Go devel mailing list.
This exception occurs in the VocabularyWriter and in ThunderbirdCrawlerTest.

The ModelImplSesame constructor was wrong...

There are deadlocks...

There seems to be no implementation of ModelSet in the Sesame2 driver. Or maybe I don't understand how to use it... I couldn't find any documentation for ModelFactory.getModelSet(Properties p). What properties could go there?

Antoni Mylka
ant...@df...