Blazegraph (powered by bigdata) / Discussion / Help: How to embed an ASK query in a SELECT one?

Cyril - 2015-08-17

Dear everyone,

I'm trying to design a query that embeds an ASK query in a SELECT query. More precisely, I want to select results from my graph and keep only those for which a condition holds, all in a single query.

As an example, imagine I have many books, each with one author and one editor. I want to select the books by a given author that are linked to client#1 through a property path of arbitrary length.

With my data, running the query directly like this takes a long time:

SELECT DISTINCT ?id_book WHERE {?id_book prefix:hasAuthor :author#1. ?id_book prefix:linkedToEditor*/prefix:hasClient :client#1}

To reduce the computation time (I save more than a dozen minutes), I use a script that runs the queries one after another. The script first selects the books whose author is author n°1:

SELECT ?id_book WHERE {?id_book prefix:hasAuthor :author#1}

Then, for each result from 1 to n (id_book#1, id_book#2, ..., id_book#n), it asks whether the book is linked to client n°1:

ASK {id_book#i prefix:linkedToEditor*/prefix:hasClient :client#1}

The SELECT query followed by the ASK queries is far faster than the first SELECT query, and gives the same results. I don't want to explore every possible match of ?id_book prefix:linkedToEditor*/prefix:hasClient :client#1; I just want to keep the results for which the link exists.
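
The two-step script described above can be sketched roughly as follows. This is a minimal Python illustration, not the actual script: the `http://example.org/` namespace, the function names, and the `run_ask` callable (standing in for whatever sends a query string to the SPARQL endpoint and returns its boolean answer) are all assumptions.

```python
from typing import Callable, Iterable, List

# Hypothetical namespace; the thread never states the real IRIs.
PREFIXES = "PREFIX : <http://example.org/>\n"


def build_select(author: str) -> str:
    """Step 1: the books written by one author."""
    return PREFIXES + f"SELECT ?id_book WHERE {{ ?id_book :hasAuthor <{author}> }}"


def build_ask(book: str, client: str) -> str:
    """Step 2: one ASK per book, with the book IRI substituted as a constant."""
    return PREFIXES + f"ASK {{ <{book}> :linkedToEditor*/:hasClient <{client}> }}"


def filter_books(
    books: Iterable[str], client: str, run_ask: Callable[[str], bool]
) -> List[str]:
    """Keep the books for which the ASK query answers true.

    `run_ask` is a placeholder (assumption): it should send the query
    string to the endpoint and return the boolean result.
    """
    return [book for book in books if run_ask(build_ask(book, client))]
```

In this sketch each ASK query names exactly one book, so the variable `?id_book` never appears in the property path pattern.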

I would like to do all this in a single query. I tried FILTER EXISTS and an embedded SELECT subquery, but the query times are as long as for the first query above.

SELECT ?id_book WHERE {?id_book prefix:hasAuthor :author#1. FILTER EXISTS {?id_book prefix:linkedToEditor*/prefix:hasClient :client#1} } ORDER BY ?id_book

or

SELECT DISTINCT ?id_book WHERE {?id_book prefix:hasAuthor :author#1. {SELECT ?id_book WHERE {?id_book prefix:linkedToEditor*/prefix:hasClient :client#1.} } }

I also tried WITH and INCLUDE, with the same result:

SELECT DISTINCT ?id_book WITH { SELECT ?id_book WHERE {?id_book prefix:hasAuthor :author#1} } AS %firstGraph WHERE { ?id_book prefix:linkedToEditor*/prefix:hasClient :client#1 INCLUDE %firstGraph }

The queries with FILTER EXISTS and INCLUDE compute all solutions before returning any results. Is there an optimized query for my purpose?

Thanks in advance.

Last edit: Cyril 2015-08-18
- Bryan Thompson - 2015-08-18
  
  Michael wrote:
  
  One trick for a FILTER EXISTS containing a single triple pattern is to
  rewrite it into an OPTIONAL join -- the benefit is obtained through
  switching to a pipelined plan (this is only true if there is a LIMIT
  involved in the query). (We plan to introduce non-blocking hash joins to
  improve performance in queries that use limits.)
  
  This case is quite different in various aspects: it's a FILTER NOT EXISTS
  and the inner expression is quite complex with an arbitrary length property
  path. If the ASK queries in the decomposed version are really fast, then
  there should be an efficient plan (not saying that Blazegraph is able to
  find it, though).
  
  In the end it boils down to looking at the ALP node, and how it makes use
  of incoming bindings. If
  
  ASK {id_book#i prefix:linkedToEditor*/prefix:hasClient :client#1}
  
  is really running much more efficiently, what about the same version as a
  SELECT query with LIMIT 1? If the latter is efficient as well, the problem
  might be a blowup in the sense that there are multiple paths, and using
  LIMITed SELECT queries might help.
  
  We'd need some sample data to look into this in more detail.
  
  Thanks,
  Bryan
  
  - Cyril - 2015-08-18
    
    (Sorry about the subquery in my first post: I mixed up the outer query and the subquery. I have edited my post.)
    
    Here is my data:
    
    @prefix : <http://people.example/> .
    
    :book1 :hasAuthor :author2.
    :book2 :hasAuthor :author1, :author2.
    :book1000 :hasAuthor :author1.
    
    :book1 :hasClient :client1.
    :book1 :hasClient :client2.
    :book1 :hasClient :client3.
    :book1 :hasClient :client4.
    :book1 :hasClient :client5.
    :book1 :hasClient :client6.
    
    :book2 :hasClient :client1.
    :book2 :hasClient :client3.
    :book2 :hasClient :client2.
    :book2 :hasClient :client4.
    :book2 :hasClient :client6.
    :book2 :hasClient :client7.
    
    :book3 :hasClient :client1.
    :book4 :hasClient :client1.
    :book5 :hasClient :client1.
    
    :book1000 :hasClient :client7.
    
    I tried this:
    
    PREFIX : <http://people.example/>
    SELECT DISTINCT ?id_book
    WHERE {?id_book :hasAuthor :author1.
    Filter exists { ?id_book :linkedToEditor*/:hasClient :client2. hint:SubQuery hint:filterExists "SubQueryLimitOne".}
    }
    
    The problem is that this doesn't save any time.
    
    Last edit: Cyril 2015-08-19
    

Cyril - 2015-08-19

The SELECT LIMIT 1 query is as fast as the ASK query.
I'll send you my real data and the procedures I tested.

For the moment, feeding the SELECT results into ASK queries is the fastest approach.
I saw BLZG-1049 and BLZG-1048 in your JIRA; I tested the query hint hint:SubQuery hint:filterExists "SubQueryLimitOne", but the query time isn't good enough compared to my current method. Since I haven't mastered all the subtleties of SPARQL subqueries and Blazegraph hints, I think I may have missed something.

Thanks for your time.

Last edit: Cyril 2015-08-19


Michael - 2015-08-24

Dear Cyril,

just tried your example, you are right: LIMIT 1 is different from ASK here. The problem is that the outer variable is not known in the scope of the subquery, and using LIMIT 1 will just (randomly) return one binding, which is not guaranteed to match the one from outside (this is why you end up with zero results).
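
The scoping effect described above can be mimicked with a small Python model. This is only a sketch under simplifying assumptions, not how Blazegraph evaluates the query: the triples are an abridged copy of the sample data posted earlier, the data contains no `:linkedToEditor` triples (so the path reduces to `:hasClient`), and the function names are made up for illustration.

```python
# Abridged sample data from the thread, in the order it was posted.
HAS_AUTHOR = [("book1", "author2"), ("book2", "author1"),
              ("book2", "author2"), ("book1000", "author1")]
HAS_CLIENT = [("book1", "client1"), ("book1", "client2"),
              ("book2", "client1"), ("book2", "client3"),
              ("book2", "client2"), ("book1000", "client7")]


def outer_books(author):
    """Outer pattern: ?id_book :hasAuthor <author>."""
    return [book for book, a in HAS_AUTHOR if a == author]


def inner_limit_one(client):
    """Inner pattern evaluated on its own and cut off after one solution.

    The outer ?id_book is not visible here, so the single binding that
    comes back is arbitrary (in this model: the first match in the data).
    """
    for book, c in HAS_CLIENT:
        if c == client:
            return [book]
    return []


def join_with_limited_subquery(author, client):
    """Join the outer bindings with the one binding kept by LIMIT 1."""
    inner = set(inner_limit_one(client))
    return [book for book in outer_books(author) if book in inner]


def correlated_exists(author, client):
    """What the ASK loop does: test each outer binding separately."""
    return [book for book in outer_books(author) if (book, client) in HAS_CLIENT]


print(join_with_limited_subquery("author1", "client2"))  # []
print(correlated_exists("author1", "client2"))           # ['book2']
```

In this model the limited subquery happens to keep `book1`, which author1 did not write, so the join comes back empty even though `book2` qualifies.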

It would be great if you could send some real data (should be enough to cause slow runtimes) plus the query alternatives that you've tried so far.

Best,
Michael


Michael - 2015-08-26

Cyril,

thanks for sharing the data. I investigated the query and I currently do not see any means to get this running efficiently in one query. Actually, there are two main problems here:

In some of the variants I tried, we get blocking query plans that require the inner subquery to be fully evaluated: hash joins in the query evaluation plan prevent partial results from being pipelined through, so evaluation blocks. Implementing a non-blocking hash join operator is on our backlog and will help in such situations.

The ASK query that you run is not blocking. The key difference when using an ASK query (compared to, e.g., FILTER NOT EXISTS) is that in the ASK query the subject is a constant URI that is substituted in externally (by your script). In that case, the property path implementation chooses to start its evaluation with the subject. If the subject is a variable, it starts with the object position (even though the subject is "implicitly" bound in incoming bindings). It's hard for the engine to decide what's best; for the data at hand, starting with the subject(s) is clearly the better choice. We would need a query hint to control this behavior. I've created a ticket for this: https://jira.blazegraph.com/browse/BLZG-1449
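
To make the cost difference between the two starting points concrete, here is a small Python sketch. It is an illustration under stated assumptions, not Blazegraph's property path operator: the toy graph, the node names, and both search functions are hypothetical, and the "work" is simply the number of nodes each search touches.

```python
from collections import defaultdict, deque


def build_index(triples):
    """Index triples by (subject, predicate) and by (object, predicate)."""
    fwd, bwd = defaultdict(list), defaultdict(list)
    for s, p, o in triples:
        fwd[(s, p)].append(o)
        bwd[(o, p)].append(s)
    return fwd, bwd


def ask_from_subject(fwd, book, client):
    """Start at one bound book, follow linkedToEditor*, stop at the first hit.

    Returns (answer, number of nodes visited).
    """
    seen, queue = {book}, deque([book])
    while queue:
        node = queue.popleft()
        if client in fwd[(node, "hasClient")]:
            return True, len(seen)
        for nxt in fwd[(node, "linkedToEditor")]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False, len(seen)


def subjects_from_object(bwd, client):
    """Start at the client and walk backwards: every node that reaches it."""
    start = bwd[(client, "hasClient")]
    seen, queue = set(start), deque(start)
    while queue:
        node = queue.popleft()
        for prev in bwd[(node, "linkedToEditor")]:
            if prev not in seen:
                seen.add(prev)
                queue.append(prev)
    return seen


# Toy graph (assumption): one book by author1, plus 1000 unrelated books
# that all share the same client.
triples = [("mine", "hasAuthor", "author1"),
           ("mine", "linkedToEditor", "ed"),
           ("ed", "hasClient", "client1")]
triples += [(f"other{i}", "hasClient", "client1") for i in range(1000)]
fwd, bwd = build_index(triples)

print(ask_from_subject(fwd, "mine", "client1"))     # (True, 2)
print(len(subjects_from_object(bwd, "client1")))    # 1002
```

On this toy graph the subject-first search touches two nodes, while the object-first search collects every node that can reach the client before the author restriction is applied.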

One more question from our side: is the data (and the queries) that you shared public, or do you plan to publish it at some point in the future? We're looking for interesting data sets and queries that we could use to benchmark and optimize our property path implementation. If the data is open and you have more property path queries, we'd be very interested in using them internally for benchmarking purposes.

Best,
Michael
- Cyril - 2015-08-27
  
  Thanks for your reply and explanations.
  
  Unfortunately, the data are neither open nor public, and some are under copyright.
  I intend to build a similar dataset from open/public data in several months. I'll send it to you.
  
  Best regards,
  Cyril
  