From: Simon C. <si...@sa...> - 2005-01-31 13:47:38
Hi Chris,

> It sounds as if you are converting obo format files into the go-rdf file
> format (incidentally, I didn't know the W3 RDF primer had a whole section
> on the GO RDF format!) in a native RDF database (redland).

I was quite surprised to see the section on GO RDF in the Primer too. Before reading it I didn't know GO RDF existed. :) I'm going directly from OBO to RDF triples (with Redland as the store). Currently my RDF triples use a custom namespace because I didn't know about the GO RDF namespace when I was writing the scripts. A conversion should be relatively trivial.

> Are you using a RDF query language like RDQL, or an RDF API, or both?
> Do you find that certain queries are easier to express using an RDF
> query language than the equivalent SQL queries? How do they compare in
> terms of performance? Do you have any example queries?

I'm using RDQL for querying (Redland also supports SPARQL). My current queries are quite simple (mostly looking up items mapped to individual terms). Obviously the same functionality could be provided by SQL, but RDF fits closely with the way I think about ontologies. I'd guess that RDQL queries will look much simpler than the comparable SQL queries, since even very simple RDQL probably translates into SQL with JOINs. More complex RDQL would mean more JOINs.

If you need to select items based on triple structure (e.g. find all X where X is_a GO:0003673) then RDQL is probably as fast as a direct SQL query. If you need to match items based on the values of literals (e.g. find all X where GO:NAME starts with 'Foo') then RDQL will be substantially slower. This is because the Redland storage engines don't (yet?) support these kinds of look-ups, so they have to be done by the Rasqal query engine after the triples are retrieved. It's possible that other RDF storage/query backends (Jena?) handle this better.
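To make the JOIN point concrete, here is a minimal sketch in Python using an in-memory SQLite table. The single `triples` table, its column names, and the helper functions are assumptions made up for illustration; this is not Redland's actual MySQL schema, and the rows are toy data. An RDQL query with two triple patterns that share `?x`, along the lines of `SELECT ?x, ?name WHERE (?x, ex:is_a, ex:GO_0003673), (?x, ex:name, ?name) USING ex FOR <http://example.org/go#>` (the `ex` namespace is hypothetical), would roughly correspond to one self-join per extra pattern:

```python
import sqlite3


def build_store():
    """Create a toy triple store: one row per (subject, predicate, object)."""
    # Hypothetical layout for illustration only, not Redland's real schema.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)"
    )
    conn.executemany(
        "INSERT INTO triples VALUES (?, ?, ?)",
        [
            ("GO:0008150", "is_a", "GO:0003673"),
            ("GO:0008150", "name", "biological_process"),
            ("GO:0003674", "is_a", "GO:0003673"),
            ("GO:0003674", "name", "molecular_function"),
            ("GO:0003673", "name", "Gene_Ontology"),
        ],
    )
    return conn


def children_with_names(conn, parent):
    """Structural query: find X (and its name) where X is_a <parent>.

    Two triple patterns sharing the same subject become one self-join.
    """
    rows = conn.execute(
        """
        SELECT t1.subject, t2.object
        FROM triples AS t1
        JOIN triples AS t2 ON t2.subject = t1.subject
        WHERE t1.predicate = 'is_a' AND t1.object = ?
          AND t2.predicate = 'name'
        ORDER BY t1.subject
        """,
        (parent,),
    )
    return rows.fetchall()


def names_starting_with(conn, prefix):
    """Literal query: find X whose name starts with <prefix>.

    Here the database does the prefix match itself via LIKE. SQLite's LIKE
    is case-insensitive for ASCII, and % or _ in the prefix act as wildcards.
    """
    rows = conn.execute(
        """
        SELECT subject, object
        FROM triples
        WHERE predicate = 'name' AND object LIKE ?
        ORDER BY subject
        """,
        (prefix + "%",),
    )
    return rows.fetchall()


if __name__ == "__main__":
    store = build_store()
    print(children_with_names(store, "GO:0003673"))
    print(names_starting_with(store, "bio"))
```

In this sketch the literal match is pushed down into the database. The slowdown described above comes from the opposite situation, where the store cannot do that and the query engine has to filter the literals after retrieving the triples.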

I'm up to around 10 million triples at the moment and my query times are still acceptable (a few seconds on an old Pentium 3 with 128 MB of RAM; I suspect most of this is because size_of(data) >> size_of(RAM)). I'd like to be able to start doing some inference-based querying but I'm still looking into that.

> You indicate you're also using MySQL - is this the GO MySQL schema, or a
> generic RDF schema?

One of the many possible Redland triple stores is MySQL, so Redland handles the storing of (generic) RDF in the MySQL database. Redland's SQL table structure is pretty straightforward.

> By the way, we will shortly be offering both go-rdf and owl (which is
> layered on top of rdf) as download options for all of OBO. The scripts and
> XSLTs for generating these are available as part of the go-dev
> distribution - www.godatabase.org/dev - if you fancy generating either of
> these formats yourself. The advantage of both of these formats is that
> either can be loaded into and queried from a generic RDF database using
> generic tools.

Cool!

Schiavo Simon