From: Rob V. <rv...@do...> - 2014-06-04 15:32:53
Hi Redouane

First off, it is hard to offer accurate advice without a complete example, i.e. sample data and code as well as your SPARQL query.

I'm assuming your data is in RDF/XML format when you say it is XML? RDF/XML is not our fastest parser, so loading it into an in-memory graph can take some time. Without knowing what you count as a bigger file, it is hard to say whether 5 minutes is unreasonable or not.

As for your query, there does not look to be anything particularly wrong with it. I would guess that you have far more rdf:type cim:VoltageLevel statements in your data than you do cim:VoltageLevel.Substation statements. However, the SPARQL engine should be doing an indexed join, so this shouldn't cause any performance issues.

The position of the triples in the file should be entirely irrelevant to query performance. Our in-memory graphs are fully indexed, so all lookups should be on the order of O(log n). The performance variance must therefore be caused either by the shape of your data (are the queries later in the loop more likely to produce more results?) or by something in your code.

In general, it is worth noting that our SPARQL engine uses try/catch heavily, so running under the Visual Studio debugger will often yield substantially slower performance than running standalone. I would suggest running your program outside Visual Studio to see how fast it runs there, as we've seen dramatic performance differences in the past as a result of this.

Beyond that, without seeing a more complete example there is not really any other advice I can offer you.

Hope this helps,

Rob

From: Redouane Bali <red...@is...>
Date: Wednesday, 4 June 2014 14:08
To: Rob Vesse <rv...@do...>
Subject: Sparql Optimisation [DotNetRDF]

> Hi,
>
> I'm a young intern at Hydro-Québec Montreal and I have been working with
> DotNetRDF for a few months.
> My aim is to parse an XML file containing circuit (and equipment) data and
> rebuild a full project in a software package called EMTP.
>
> When I tested my code on small XML files, it was OK (about 1 s of
> execution), but yesterday I tried it on a bigger file and saw that it needed
> about 5 minutes to build my model and recover all the information I need.
>
> I make a few thousand queries and I would like advice on optimizing my code.
>
> For example, I saw that a simple query collecting a subject (with a known
> type, predicate and object):
>
> "SELECT ?s WHERE {?s rdf:type cim:VoltageLevel; cim:VoltageLevel.Substation <" + this.node.Uri + "> }";
>
> took about 100 ms at the beginning of my algorithm, then about 300 ms at the
> end, probably because the triple is increasingly far into the file.
>
> How could I fix it?
>
> Thank you!
>
>
> PS: sorry for my bad English, I'm French. And thank you for your library, I'm
> very proud to work with it.
>
>
> --
> Redouane Bali
> 06.62.58.06.64
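To illustrate the indexed join described in the reply above, here is a minimal Python sketch. It is not dotNetRDF's implementation (dotNetRDF is a .NET library with its own index structures), and Python dictionaries give average O(1) lookups rather than the O(log n) mentioned for dotNetRDF's indexes. The class `IndexedGraph`, the function `voltage_levels_of` and the `ex:` identifiers are hypothetical. What it shows is the general idea: each triple pattern with a bound predicate and object is answered from an index, so the cost of a lookup does not depend on where the matching triple appeared in the input file.

```python
from collections import defaultdict


class IndexedGraph:
    """Hypothetical in-memory triple store, NOT dotNetRDF's implementation."""

    def __init__(self):
        # (predicate, object) -> set of subjects having that predicate/object pair
        self.po_index = defaultdict(set)

    def add(self, s, p, o):
        # Index each triple as it is loaded; file position is not recorded.
        self.po_index[(p, o)].add(s)

    def subjects(self, p, o):
        # Direct index lookup: no scan over the triples in file order.
        return self.po_index.get((p, o), set())


def voltage_levels_of(graph, substation):
    # Rough equivalent of:
    #   SELECT ?s WHERE { ?s rdf:type cim:VoltageLevel ;
    #                        cim:VoltageLevel.Substation <substation> }
    typed = graph.subjects("rdf:type", "cim:VoltageLevel")
    linked = graph.subjects("cim:VoltageLevel.Substation", substation)
    # Join on ?s: iterate over the smaller set and probe the larger one.
    small, large = (typed, linked) if len(typed) <= len(linked) else (linked, typed)
    return {s for s in small if s in large}


# Usage with made-up data: 1000 voltage levels spread over 10 substations.
g = IndexedGraph()
for i in range(1000):
    g.add(f"ex:VL{i}", "rdf:type", "cim:VoltageLevel")
    g.add(f"ex:VL{i}", "cim:VoltageLevel.Substation", f"ex:Sub{i % 10}")
# Linked to a substation but not typed as a voltage level, so it is not returned.
g.add("ex:Other", "cim:VoltageLevel.Substation", "ex:Sub3")

print(len(voltage_levels_of(g, "ex:Sub3")))  # 100
```

Iterating the smaller set and probing the larger one is why having many more rdf:type cim:VoltageLevel statements than matching cim:VoltageLevel.Substation statements should not, by itself, slow the query down.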