Re: [dotNetRDF-Develop] Spaqrl Query Performance

Rob,

thank you for your quick support! You were right that it was running 
with the debugger attached. I figured it wouldn't matter that much, 
since it was a Release build anyway, but I had no idea! Times are cut by 
35% when running without a debugger attached; I'm now below 4 ms per 
query most of the time.
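
For anyone repeating the measurement, here is a minimal timing sketch. It
uses only System.Diagnostics; the class and method names (QueryTiming,
MeasureAverageMs, WarnIfDebuggerAttached) are hypothetical and not part
of dotNetRDF, and the warm-up and iteration counts are arbitrary
assumptions.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not part of dotNetRDF.
public static class QueryTiming
{
    // Runs the action untimed a few times (JIT warm-up), then returns the
    // mean wall-clock time per call in milliseconds.
    public static double MeasureAverageMs(Action action, int warmup = 5, int iterations = 100)
    {
        if (action == null) throw new ArgumentNullException("action");
        if (iterations <= 0) throw new ArgumentOutOfRangeException("iterations");

        for (int i = 0; i < warmup; i++) action();

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) action();
        sw.Stop();

        return sw.Elapsed.TotalMilliseconds / iterations;
    }

    // Prints a warning when a debugger is attached, since timings taken
    // that way are inflated even for a Release build.
    public static void WarnIfDebuggerAttached()
    {
        if (Debugger.IsAttached)
            Console.WriteLine("Warning: debugger attached; timings will be inflated.");
    }
}
```

With the code quoted below, the call would look roughly like
QueryTiming.MeasureAverageMs(() => processor.ProcessQuery(sparqlQuery)).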

Also, I wasn't aware that Property Paths were that expensive. Replacing 
most of the * signs with + signs cut it down to 3 ms per query. 
Unfortunately, my use case doesn't allow this replacement in general, so 
I can't make use of this specific optimization.

I dug a little deeper in my own code and found out that some queries are 
quite repetitive, so I introduced an abstraction layer and a "query 
cache" as an optional wrapper, which allows me to prevent the execution 
of most queries and instead return cached results. This basically cuts 
down all of my times to zero after the first few algorithm cycles. I 
guess I just looked in the wrong place for optimizations :)
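
A minimal sketch of what such a cache wrapper could look like. This is an
illustration under assumptions, not the actual implementation: the class
name QueryCache and its members are hypothetical, the cache key is
assumed to be the query text after parameter substitution, and the
cached results are assumed to stay valid until Clear() is called.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper, not part of dotNetRDF: memoizes query results
// keyed by the fully substituted query text.
public sealed class QueryCache<TResult>
{
    private readonly Func<string, TResult> execute;
    private readonly Dictionary<string, TResult> cache =
        new Dictionary<string, TResult>();

    public QueryCache(Func<string, TResult> execute)
    {
        if (execute == null) throw new ArgumentNullException("execute");
        this.execute = execute;
    }

    // Returns the cached result if this exact query text was seen before;
    // otherwise executes the query once and stores the result.
    public TResult Run(string queryText)
    {
        TResult result;
        if (!cache.TryGetValue(queryText, out result))
        {
            result = execute(queryText);
            cache[queryText] = result;
        }
        return result;
    }

    // Must be called whenever the underlying data changes (assumption:
    // the caller knows when that happens).
    public void Clear()
    {
        cache.Clear();
    }
}
```

The delegate passed to the constructor would wrap the parse and
ProcessQuery calls quoted below. A plain Dictionary is not thread-safe,
so this only fits single-threaded use as written.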

Thanks again for your support, and for maintaining DotNetRDF. It is a 
great library. I'm using it for a research project and I wouldn't have 
gotten anywhere near as far without it.

Regards,
Adam

On 03.03.2015 at 13:52, Rob Vesse wrote:
> Adam
>
> Comments inline:
>
> From: Fedja Adam <ad...@ad...>
> Reply-To: dotNetRDF Developer Discussion and Feature Request 
> <dot...@li...>
> Date: Tuesday, 3 March 2015 10:45
> To: <dot...@li...>
> Subject: [dotNetRDF-Develop] Spaqrl Query Performance
>
>     Hello dotNetRDF Team,
>
>     I'm quite new to your library and RDF as well, and I have run into
>     some performance issues I don't seem to be able to solve by
>     myself. Not being sure whether this is a problem on my side or
>     simply an algorithmic or implementation issue, I'm writing you for
>     some feedback and / or help.
>
>     The setup is reasonably simple: I'm using an InMemoryDataset
>     consisting of (potentially multiple, but in this case a single)
>     Graph(s). The dataset is very small (73 lines of Turtle)
>
>
> Providing a sample dataset (with data redacted/obfuscated as 
> necessary) would be helpful if you'd like us to investigate further, 
> should our other comments not help
>
>     and the queries I'm performing shouldn't be too complex either.
>     This is the code I'm using for querying:
>
>         SparqlParameterizedString queryString = new SparqlParameterizedString();
>         queryString.CommandText = query;
>         // [Adding namespaces here]
>         // [Setting parameters here]
>         SparqlQuery sparqlQuery = this.parser.ParseFromString(queryString);
>         SparqlResultSet resultSet = this.processor.ProcessQuery(sparqlQuery) as SparqlResultSet;
>
>     I'm using a SparqlQueryParser and a LeviathanQueryProcessor.
>     Everything happens locally on my machine, no web stuff involved.
>     The problem is that *a single ProcessQuery call takes about
>     5 - 7 ms*,
>
>
> Is this running under the debugger?
>
> Under the debugger the observed performance can be orders of magnitude 
> worse. Please make sure you take any timings with a release build with 
> no debugger attached
>
>     which is too much for my purposes. I need to perform a lot of
>     differently parameterized queries in a row.
>
>     *Is there any way I could improve performance?* Calling "Optimize"
>     on the query before executing doesn't seem to have an effect. A
>     representative example query is this one:
>
>
> The parser automatically calls Optimize (unless you've disabled 
> optimisation) when it finishes parsing a query so calling it again 
> will be a no-op
>
>
>         SELECT ?obj
>         WHERE
>         {
>           ?obj Knowledge:IsA* ?actor .
>
> Do you actually need to use property paths here (the * syntax)?
>
> Property paths are expensive to evaluate, especially arbitrary-length 
> paths like * (zero or more).  Note that using * will potentially bind 
> all triples with that predicate in the data (depending on the order in 
> which the engine evaluates the matches), so if you do need property 
> paths, using + (at least one step) is typically better, though it won't 
> be as fast as avoiding property paths altogether.
>
> If the nodes of interest are directly connected to each other by a 
> single instance of the Knowledge:IsA predicate then omit the */+.   If 
> they will be connected within a limited number of hops, consider using 
> the {n,m} syntax instead, as that can be evaluated more efficiently.
>
>
>           ?actor Knowledge:HasAttribute Knowledge:Actor .
>           ?obj Knowledge:IsA* ?prey .
>
> Same comment as above applies to the use of property paths here
>
>
>           @MainActor Knowledge:PredatorOf ?prey .
>           MINUS
>           {
>             ?obj Knowledge:HasAttribute Knowledge:Abstract .
>           }
>           FILTER (!sameTerm(?obj, @MainActor))
>         }
>
>     If you spot any wild problems in the query itself, let me know.
>
>
> The use of property paths is the only obvious concern without knowing 
> more about your data
>
>
>
>     I've also looked at some parts of the RDF Querying code and it
>     seems like there is some kind of Algebra evaluation using classes
>     - is there maybe a way to "Compile" them similar to C# Expression
>     trees in order to improve performance?
>
>
> Well a query is "compiled" in a sense to an algebra (which is the 
> formal representation of the query) but our engine does not do any 
> kind of caching of the algebra as a query plan as a traditional RDBMS 
> might do.  If you really wanted to you could modify the algebra once 
> you have it to substitute your parameters in that way.  However that 
> is not for the faint of heart nor would it necessarily yield any 
> performance improvements since property paths are likely the biggest 
> single factor in the execution time.
>
> Rob
>
>
>
>     Regards,
>     Adam
>     ------------------------------------------------------------------------------
>     Dive into the World of Parallel Programming The Go Parallel
>     Website, sponsored by Intel and developed in partnership with
>     Slashdot Media, is your hub for all things parallel software
>     development, from weekly thought leadership blogs to news, videos,
>     case studies, tutorials and more. Take a look and join the
>     conversation now.
>     http://goparallel.sourceforge.net/
>     _______________________________________________
>     dotNetRDF-develop mailing list
>     dot...@li...
>     https://lists.sourceforge.net/lists/listinfo/dotnetrdf-develop 