[dotNetRDF-Support] Load performance

I'm wondering how to best optimize the load performance of dotNetRDF. 

I have an RDF/XML (test) file containing the equivalent of about 316,000 
N-Triples statements pertaining to about 21,000 resources.  The RDF is 
highly regular, and limited to a subset of full RDF/XML, having the 
following structure:

<rdf:Description rdf:about="....">
   <rdf:type rdf:resource="...." />
   <dc:subject> ... </dc:subject>
   ... {small but variable number of other properties }

</rdf:Description>
...

My program reads the RDF and does some LINQ-based queries against it. 

It takes about 11 seconds to load the RDF/XML into a Graph; given the 
loaded graph, the LINQ queries take about 24 milliseconds.

As a comparison point, though, I tried to see what the performance would 
be if I used the .NET XElement.Load function, exploiting the fact that my 
XML has a very regular structure (no lists/blank nodes/nested resources).  
In that case, it takes about 400 milliseconds to load the XML into a 
simple homebrew triple class structure (no indices); my equivalent LINQ 
queries then take about 200 milliseconds.  Of course, that approach only 
works for a particular RDF/XML structure, and it will start to break down 
as I do more and more queries.
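
A loader of this kind can be sketched roughly as follows.  This is an 
illustrative sketch under the assumptions stated above (flat 
rdf:Description elements, no lists, blank nodes or nested resources), not 
the actual code behind the timings.  The names SimpleTriple and 
FlatRdfXmlLoader are hypothetical and are not part of dotNetRDF.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical minimal triple type (no indices); not a dotNetRDF class.
public sealed class SimpleTriple
{
    public string Subject { get; set; }
    public string Predicate { get; set; }
    public string Object { get; set; }        // resource URI or literal text
    public bool ObjectIsLiteral { get; set; }
}

public static class FlatRdfXmlLoader
{
    static readonly XNamespace Rdf =
        "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Assumes every child of the root is a flat rdf:Description carrying
    // rdf:about, and every property element is either an rdf:resource
    // reference or plain literal text. Anything else is not handled.
    public static List<SimpleTriple> Load(XElement root)
    {
        var triples = new List<SimpleTriple>();
        foreach (XElement description in root.Elements(Rdf + "Description"))
        {
            string subject = (string)description.Attribute(Rdf + "about");
            foreach (XElement property in description.Elements())
            {
                // Null when the property has no rdf:resource attribute.
                string resource = (string)property.Attribute(Rdf + "resource");
                triples.Add(new SimpleTriple
                {
                    Subject = subject,
                    // Predicate URI = namespace URI + local name.
                    Predicate = property.Name.NamespaceName
                                + property.Name.LocalName,
                    Object = resource ?? property.Value,
                    ObjectIsLiteral = resource == null
                });
            }
        }
        return triples;
    }
}
```

The result of XElement.Load on the file would be passed to 
FlatRdfXmlLoader.Load, and queries are then ordinary LINQ over the list, 
for example triples.Where(t => t.Predicate == somePredicateUri).  With no 
index, every such query is a linear scan over all triples, which fits the 
slower query time noted above.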

Clearly, if I were to do lots of query operations, the cost of the initial 
load by the dotNetRDF library would be better amortized, and I would start 
to reap the benefits of having the Graph structure.  However, given that I 
only want to do a relatively small number of queries, I'm wondering if 
there's anything I can do to improve the load performance to close the 
gap.  I tried a few things: using NonIndexedGraph (which cut the load time 
to around 8.6 seconds but raised my query time to 1.1 seconds) and using 
Turtle with NonIndexedGraph (which cut the load time to 5.6 seconds).  I'd 
still like to get within a small multiple of the XElement.Load time if 
possible, while preserving the benefit of indexing.
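
As a rough estimate of where the amortization pays off, assume the load 
and query costs stay constant and treat the timings above as the cost of 
one run of the query set (11 s load + 24 ms per run for the indexed Graph, 
0.4 s load + 200 ms per run for the XElement approach).  The break-even 
number of runs k then satisfies:

   11 + 0.024 k = 0.4 + 0.2 k
   k = (11 - 0.4) / (0.2 - 0.024) = 10.6 / 0.176 ≈ 60

So on these figures the indexed Graph only comes out ahead after roughly 
60 runs of the query set.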

-Mark