Re: [dotNetRDF-Support] Sparql Optimisation

Hi Redouane

First off, it is hard to offer accurate advice without a complete example, i.e.
sample data and code as well as your SPARQL query.

I'm assuming your data is in RDF/XML format when you say it is XML?

RDF/XML is not our fastest parser, so loading it into an in-memory graph can
take some time. Without knowing what you count as a bigger file, it is hard
to say whether 5 minutes is unreasonable or not.

As for your query, there does not look to be anything particularly wrong with
it. I would guess that you have far more rdf:type cim:VoltageLevel
statements in your data than you do cim:VoltageLevel.Substation statements;
however, the SPARQL engine should be doing an indexed join, so this shouldn't
cause any performance issues.
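To illustrate what an indexed join means here, below is a rough sketch in Python. It is not how dotNetRDF is implemented. The `TripleIndex` class, the `select_subjects` helper, the shortened names and the generated data are all hypothetical. Each triple pattern is answered by an index lookup keyed on (predicate, object), and the join on ?s starts from the pattern with the fewest matches. A large number of rdf:type cim:VoltageLevel statements therefore does not make the query proportionally more expensive.

```python
from collections import defaultdict


class TripleIndex:
    """Hypothetical toy store: maps (predicate, object) to the set of subjects.
    Illustration only; this is not dotNetRDF's internal data structure."""

    def __init__(self):
        self._po = defaultdict(set)

    def add(self, s, p, o):
        self._po[(p, o)].add(s)

    def subjects(self, p, o):
        # Hash lookup on (predicate, object); where the triple appeared
        # in the source file plays no role.
        return self._po.get((p, o), set())


def select_subjects(index, patterns):
    """Evaluate { ?s p1 o1 . ?s p2 o2 . ... } by joining on ?s.
    The pattern with the fewest matches is used first, and every other
    pattern only filters the subjects that survived so far."""
    if not patterns:
        return set()
    matches = sorted((index.subjects(p, o) for p, o in patterns), key=len)
    result = set(matches[0])
    for other in matches[1:]:
        result = {s for s in result if s in other}
        if not result:
            break
    return result


# Usage: 10,000 voltage levels, ten per substation (made-up data).
idx = TripleIndex()
for i in range(10_000):
    idx.add(f"vl{i}", "rdf:type", "cim:VoltageLevel")
    idx.add(f"vl{i}", "cim:VoltageLevel.Substation", f"sub{i // 10}")

query = [("rdf:type", "cim:VoltageLevel"), ("cim:VoltageLevel.Substation", "sub42")]
print(sorted(select_subjects(idx, query)))  # the ten subjects vl420 to vl429
```

Here the cim:VoltageLevel.Substation pattern matches only ten subjects, so only those ten are checked against the rdf:type pattern, however many rdf:type statements exist. A real SPARQL engine will differ in the details.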

The position of the triples in the file should be entirely irrelevant to
query performance. Our in-memory graphs are fully indexed, so all lookups
should be on the order of O(log n). The performance variance must therefore
be caused either by the shape of your data (are queries later in the loop
more likely to return more results?) or by something in your code.

In general, it is worth noting that SPARQL query evaluation uses try/catch
heavily, so running under the Visual Studio debugger will often be
substantially slower than running standalone. I would suggest running your
program outside Visual Studio to see how fast it runs there, as we've seen
dramatic performance differences in the past as a result of this.

However, without seeing a more complete example, there is not really any other
advice I can offer you.

Hope this helps,

Rob

From:  Redouane Bali <red...@is...>
Date:  Wednesday, 4 June 2014 14:08
To:  Rob Vesse <rv...@do...>
Subject:  Sparql Optimisation [DotNetRDF]

> Hi,
> 
> I'm a young intern at Hydro-Québec Montreal and I have been working with
> DotNetRDF for a few months.
> My aim is to parse an XML file containing circuit (and equipment) data and
> rebuild a full project in a program called EMTP.
> 
> When I tested my code on small XML files, it was fine (about 1 sec of
> execution), but yesterday I tried it on a bigger file and saw that it needed
> about 5 minutes to build my model and retrieve all the information I need.
> 
> I run a few thousand queries and I would like advice on optimizing my code.
> 
> For example, I saw that a simple query collecting a subject (with a known
> type, predicate and object):
> "SELECT ?s  WHERE {?s rdf:type cim:VoltageLevel; cim:VoltageLevel.Substation
> <" + this.node.Uri + "> }";
> 
> took about 100ms at the beginning of my algorithm, then about 300ms at the
> end, probably because the triple is increasingly far into the file.
> 
> How could I fix it?
> 
> Thank you!
> 
> 
> PS: sorry for my bad English, I'm French. And thank you for your library, I'm
> very proud to work with it.
> 
> 
> -- 
> Redouane Bali
> 06.62.58.06.64