From: <mar...@ko...> - 2013-02-12 18:12:52
I'm wondering how best to optimize the load performance of dotNetRDF. I have an RDF/XML (test) file containing the equivalent of about 316,000 N-Triples statements pertaining to about 21,000 resources. The RDF is highly regular and limited to a subset of full RDF/XML, with the following structure:

  <rdf:Description rdf:about="....">
    <rdf:type rdf:resource="....">
    <dc:subject> ... </dc:subject>
    ... {small but variable number of other properties}
  </rdf:Description>
  ...

My program reads the RDF and runs some LINQ-based queries against it. It takes about 11 seconds to load the RDF/XML into a Graph; once the graph is loaded, the LINQ queries take about 24 milliseconds.

As a comparison point, I tried using the .NET XElement.Load function instead, exploiting the fact that my XML has a very regular structure (no lists, blank nodes, or nested resources). In that case, it takes about 400 milliseconds to load the XML into a simple homebrew triple class structure (no indices); my equivalent LINQ queries then take about 200 milliseconds. Of course, that approach only works for one particular RDF/XML structure, and it will start to break down as I add more and more queries.

Clearly, if I were running lots of queries, the cost of the initial load by the dotNetRDF library would be better amortized, and I would start to reap the benefits of having the Graph structure. However, since I only want to run a relatively small number of queries, I'm wondering whether there's anything I can do to improve the load performance and close the gap.

I tried a few things:

- Using NonIndexedGraph, which reduced the load time to around 8.6 seconds while bringing my query time up to 1.1 seconds.
- Using Turtle with NonIndexedGraph, which reduced the load time to 5.6 seconds.

I'd like to get within a small multiplier of the XElement.Load time if possible, while preserving the benefit of indexing.

-Mark
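
For readers who want to picture the XElement.Load comparison described above, here is a minimal sketch of that kind of flat loader. It is not the code from the original test: the Triple class, the FlatRdfLoader name and its methods are hypothetical, and it assumes the file contains only top-level rdf:Description elements whose children are either rdf:resource references or plain literals.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical stand-in for a "homebrew triple class": plain strings, no indices.
public sealed class Triple
{
    public Triple(string subject, string predicate, string obj)
    {
        Subject = subject;
        Predicate = predicate;
        Object = obj;
    }

    public string Subject { get; }
    public string Predicate { get; }
    public string Object { get; }
}

// Hypothetical loader for the restricted RDF/XML subset only
// (no blank nodes, no lists, no nested resources, no datatypes or language tags).
public static class FlatRdfLoader
{
    private static readonly XNamespace Rdf =
        "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    public static List<Triple> Load(XElement root)
    {
        var triples = new List<Triple>();
        foreach (XElement description in root.Elements(Rdf + "Description"))
        {
            // The explicit cast yields null when the attribute is missing.
            string subject = (string)description.Attribute(Rdf + "about");
            if (subject == null) continue; // outside the assumed subset, skip it

            foreach (XElement property in description.Elements())
            {
                // Predicate IRI = namespace URI + local name of the child element.
                string predicate =
                    property.Name.NamespaceName + property.Name.LocalName;
                // Object = rdf:resource IRI if present, otherwise the literal text.
                string obj = (string)property.Attribute(Rdf + "resource")
                             ?? property.Value;
                triples.Add(new Triple(subject, predicate, obj));
            }
        }
        return triples;
    }

    public static List<Triple> LoadFile(string path)
    {
        return Load(XElement.Load(path));
    }
}
```

A query against this structure is a linear scan over the list, for example triples.Where(t => t.Predicate == "http://purl.org/dc/elements/1.1/subject") if the dc prefix is bound to the Dublin Core elements namespace. That is in line with such a load being fast while each query costs more than a lookup against an indexed Graph.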