How to embed an ASK query in a SELECT query?

Help
Cyril
2015-08-17
2015-08-27
  • Cyril

    Cyril - 2015-08-17

    Dear everyone,

    I'm currently trying to design a query that embeds an ASK query in a SELECT query. More precisely, I want to select results from my graph and keep only those for which a condition holds, all in one query.

    For my example, imagine I have many books, each with one author and one editor. I want to select the books by a given author that are linked to client#1 through a property path of arbitrary length.

    With my data, running the query directly like this takes a long time:

    SELECT DISTINCT ?id_book
    WHERE {?id_book prefix:hasAuthor :author#1.
        ?id_book prefix:linkedToEditor*/prefix:hasClient :client#1}
    

    To reduce the computation time (I save more than a dozen minutes), I use a script that runs these queries one after another. The script first selects the books whose author is author n°1:

    SELECT ?id_book
    WHERE {?id_book prefix:hasAuthor :author#1}
    

    Then, for each result from 1 to n (id_book#1, id_book#2, ..., id_book#n), I ask whether it is linked to client n°1:

    ASK {id_book#i prefix:linkedToEditor*/prefix:hasClient :client#1}
    

    Running the SELECT query followed by the ASK queries is far faster than the first SELECT query, for the same results. I don't want to explore all the possibilities of ?id_book prefix:linkedToEditor*/prefix:hasClient :client#1; I just want to keep the results where the link exists.
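
The two-step script described above could be sketched roughly as follows. This is only an illustration: the `ex:` prefix IRI, the function name `books_linked_to_client`, and the `run_select` / `run_ask` callables (which stand in for whatever client sends queries to the endpoint) are assumptions, not part of the original script.

```python
from typing import Callable, List

# Query templates. The prefix IRI and the <...> IRIs are placeholders
# (assumptions); the thread abbreviates them as prefix:..., :author#1, :client#1.
SELECT_BOOKS = """
PREFIX ex: <http://example.org/>
SELECT ?id_book WHERE { ?id_book ex:hasAuthor <%(author)s> }
"""

ASK_LINKED = """
PREFIX ex: <http://example.org/>
ASK { <%(book)s> ex:linkedToEditor*/ex:hasClient <%(client)s> }
"""


def books_linked_to_client(
    author: str,
    client: str,
    run_select: Callable[[str], List[str]],
    run_ask: Callable[[str], bool],
) -> List[str]:
    """Step 1: SELECT the author's books. Step 2: one ASK per book.

    In every ASK query the subject is a constant IRI, never a variable.
    """
    books = run_select(SELECT_BOOKS % {"author": author})
    return [
        book
        for book in books
        if run_ask(ASK_LINKED % {"book": book, "client": client})
    ]
```

The point of the sketch is that the property path is only ever evaluated with its subject already substituted in, once per candidate book.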

    I would like to run these queries as a single query. I tried FILTER EXISTS and an embedded SELECT query, but the query times are as long as for the first query above.

    SELECT ?id_book
    WHERE {?id_book prefix:hasAuthor :author#1.
        FILTER EXISTS {?id_book prefix:linkedToEditor*/prefix:hasClient :client#1}
    }
    ORDER BY ?id_book
    

    or

    SELECT DISTINCT ?id_book
    WHERE {?id_book prefix:hasAuthor :author#1.
    {SELECT ?id_book
        WHERE {?id_book prefix:linkedToEditor*/prefix:hasClient :client#1.}
     }
    }
    

    I also tried WITH and INCLUDE, with the same symptoms:

    SELECT DISTINCT ?id_book
    
    WITH {
    SELECT ?id_book
    WHERE {?id_book prefix:hasAuthor :author#1} 
       } AS %firstGraph
    
    WHERE {
    ?id_book prefix:linkedToEditor*/prefix:hasClient :client#1
    INCLUDE %firstGraph
    }
    

    The queries with FILTER EXISTS and INCLUDE compute all solutions before returning any results. Is there an optimized query for my purpose?

    Thanks in advance.

     

    Last edit: Cyril 2015-08-18
    • Bryan Thompson

      Bryan Thompson - 2015-08-18

      Michael wrote:

      One trick for a FILTER EXISTS containing a single triple pattern is to
      rewrite it into an OPTIONAL join -- the benefit is obtained through
      switching to a pipelined plan (this is only true if there is a LIMIT
      involved in the query). (We plan to introduce non-blocking hash joins to
      improve performance in queries that use limits.)

      This case is quite different in various aspects: it's a FILTER NOT EXISTS
      and the inner expression is quite complex with an arbitrary length property
      path. If the ASK queries in the decomposed version are really fast, then
      there should be an efficient plan (not saying that Blazegraph is able to
      find it, though).

      In the end it boils down to looking at the ALP node, and how it makes use
      of incoming bindings. If

      ASK {id_book#i prefix:linkedToEditor*/prefix:hasClient :client#1}

      is really running much more efficiently, what about the same version as a
      SELECT query with LIMIT 1? If the latter is efficient as well, the problem
      might be a blowup in the sense that there are multiple paths, and using
      LIMITed SELECT queries might help.

      We'd need some sample data to look into this in more detail.

      Thanks,
      Bryan

       
      • Cyril

        Cyril - 2015-08-18

        (Sorry about the subquery in my first post: I mixed up the outer query and the subquery. I have EDITED my post.)

        Here is my data:

        @prefix : <http://people.example/> .

        :book1 :hasAuthor :author2.
        :book2 :hasAuthor :author1, :author2.
        :book1000 :hasAuthor :author1.

        :book1 :hasClient :client1.
        :book1 :hasClient :client2.
        :book1 :hasClient :client3.
        :book1 :hasClient :client4.
        :book1 :hasClient :client5.
        :book1 :hasClient :client6.

        :book2 :hasClient :client1.
        :book2 :hasClient :client3.
        :book2 :hasClient :client2.
        :book2 :hasClient :client4.
        :book2 :hasClient :client6.
        :book2 :hasClient :client7.

        :book3 :hasClient :client1.
        :book4 :hasClient :client1.
        :book5 :hasClient :client1.

        :book1000 :hasClient :client7.

        I tried this:

        PREFIX : <http://people.example/>
        SELECT DISTINCT ?id_book
        WHERE {?id_book :hasAuthor :author1.
            FILTER EXISTS { ?id_book :linkedToEditor*/:hasClient :client2. hint:SubQuery hint:filterExists "SubQueryLimitOne".}
        }

        The problem is that this doesn't save any time.

         

        Last edit: Cyril 2015-08-19
  • Cyril

    Cyril - 2015-08-19

    The SELECT LIMIT 1 query is as fast as the ASK query.
    I'll send you my data (the real ones) and the procedures I tested.

    For the moment, substituting the SELECT results into ASK queries is the fastest approach.
    I saw BLZG-1049 and BLZG-1048 in your JIRA; I tested the query hint hint:SubQuery hint:filterExists "SubQueryLimitOne", but the query time isn't good enough compared to my current method. Since I haven't mastered all the subtleties of SPARQL subqueries and Blazegraph hints, I think I may have missed something.

    Thanks for your time.

     

    Last edit: Cyril 2015-08-19
  • Michael

    Michael - 2015-08-24

    Dear Cyril,

    just tried your example, you are right: LIMIT 1 is different from ASK here. The problem is that the outer variable is not known in the scope of the subquery, and using LIMIT 1 will just (randomly) return one binding, which is not guaranteed to match the one from outside (this is why you end up with zero results).
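
The scoping issue described above can be mimicked with a toy bottom-up evaluation. This is only a sketch under simplifying assumptions: the two lists are abbreviated bindings taken from the sample data posted earlier in the thread, the function names are made up for illustration, and the list order stands in for whichever solution the engine happens to produce first.

```python
# Abbreviated bindings from the sample data in the thread.
author1_books = ["book2", "book1000"]  # ?id_book :hasAuthor :author1
client2_books = ["book1", "book2"]     # ?id_book :hasClient :client2


def join_with_limited_subquery(outer, inner, limit=1):
    """Bottom-up evaluation: the subquery runs first, without outer bindings."""
    limited = inner[:limit]  # LIMIT is applied before the join
    return [book for book in outer if book in limited]


def ask_per_binding(outer, inner):
    """What the external script does: one existence test per outer binding."""
    return [book for book in outer if book in inner]


print(join_with_limited_subquery(author1_books, client2_books))  # []
print(ask_per_binding(author1_books, client2_books))             # ['book2']
```

The limited subquery keeps only `book1`, which is not one of author1's books, so the join is empty even though `book2` satisfies both patterns.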

    It would be great if you could send some real data (should be enough to cause slow runtimes) plus the query alternatives that you've tried so far.

    Best,
    Michael

     
  • Michael

    Michael - 2015-08-26

    Cyril,

    thanks for sharing the data. I investigated the query and I currently do not see any means to get this running efficiently in one query. Actually, there are two main problems here:

    • In some of the variants I tried, we get blocking query plans that require the inner subquery to be fully evaluated: partial results are not pipelined through, so evaluation blocks. This is caused by hash joins in the query evaluation plan. Implementing a non-blocking hash join operator is on our backlog and will help in such situations.
    • The ASK query that you run is not blocking. The key difference of using an ASK query (compared to, e.g., FILTER NOT EXISTS) is that in the ASK query the subject is a constant URI that is substituted in (by your script) externally. In that case, the property path implementation chooses to start out its evaluation with the subject. If the subject is a variable, it starts out with the object position (even though the subject is "implicitly" bound in incoming bindings). It's hard for the engine to decide what's best; for the data at hand, starting out with the subject(s) is clearly the better choice. We would need a query hint to control this behavior. I've created a ticket for this: https://jira.blazegraph.com/browse/BLZG-1449
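
To make the second point concrete, here is a small sketch that compares the two starting points on the sample data posted earlier in the thread. It is not Blazegraph's implementation: the in-memory indexes, the function names, and the "nodes visited" count are assumptions chosen only to illustrate why the direction matters. Note that the sample data contains no `:linkedToEditor` triples, so every path has length zero here.

```python
from collections import deque

# Triples from the sample data posted earlier in the thread (IRIs abbreviated).
TRIPLES = (
    [("book1", "hasAuthor", "author2"),
     ("book2", "hasAuthor", "author1"),
     ("book2", "hasAuthor", "author2"),
     ("book1000", "hasAuthor", "author1"),
     ("book1000", "hasClient", "client7")]
    + [("book1", "hasClient", f"client{i}") for i in range(1, 7)]
    + [("book2", "hasClient", f"client{i}") for i in (1, 2, 3, 4, 6, 7)]
    + [(f"book{i}", "hasClient", "client1") for i in (3, 4, 5)]
)

# FWD: (subject, predicate) -> objects; BWD: (object, predicate) -> subjects
FWD, BWD = {}, {}
for s, p, o in TRIPLES:
    FWD.setdefault((s, p), set()).add(o)
    BWD.setdefault((o, p), set()).add(s)


def closure(start, pred, index):
    """Nodes reachable from `start` by zero or more `pred` steps in `index`."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in index.get((queue.popleft(), pred), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def from_subjects(books, client):
    """Start at each bound book, walk forward. Returns (matches, nodes visited)."""
    matches, visited = set(), 0
    for book in books:
        reach = closure(book, "linkedToEditor", FWD)
        visited += len(reach)
        if any(client in FWD.get((n, "hasClient"), ()) for n in reach):
            matches.add(book)
    return matches, visited


def from_object(books, client):
    """Start at the client, walk backward, then join with the bound books."""
    reach = set()
    for holder in BWD.get((client, "hasClient"), ()):
        reach |= closure(holder, "linkedToEditor", BWD)
    return set(books) & reach, len(reach)


author1_books = BWD[("author1", "hasAuthor")]
print(from_subjects(author1_books, "client1"))  # ({'book2'}, 2)
print(from_object(author1_books, "client1"))    # ({'book2'}, 5)
```

Both directions return the same answer, but starting from the object touches every book that has the client, whether or not author1 wrote it. On a large graph with long `linkedToEditor` chains that difference can dominate the runtime.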

    One more question from our side: is the data (and the queries) that you shared public, or do you plan to publish it at some point in the future? We're looking for interesting data sets and queries that we could use to benchmark and optimize our property path implementation. If the data is open and you have more property path queries, we'd be very interested in using it internally for benchmarking purposes.

    Best,
    Michael

     
    • Cyril

      Cyril - 2015-08-27

      Thanks for your reply and explanations.

      Unfortunately, the data aren't open or public, and some are under copyright.
      I intend to create a similar data set from open/public data in the next several months. I'll send it to you.

      Best regards,
      Cyril

       
