
Ticket #515 (closed defect: fixed)

Opened 14 months ago

Last modified 14 months ago

Query with two "FILTER NOT EXISTS" expressions returns no results

Reported by: gjdev
Owned by: mrpersonick
Priority: major
Milestone: Query
Component: Bigdata RDF Database
Version: BIGDATA_RELEASE_1_1_0
Keywords:
Cc: thompsonbry

Description

This can be reproduced by creating a test case from the following test data, query, and expected results. The query returns no results, although it should return os:2:

filter-not-exists.rq:

SELECT ?ar
WHERE {
    ?ar a <os:class/AnalysisResults>.
    FILTER NOT EXISTS {
        ?ar <os:prop/analysis/refEntity> <os:elem/loc/Artis>.
    }.
    FILTER NOT EXISTS {
        ?ar <os:prop/analysis/refEntity> <os:elem/loc/Kriterion>.
    }.
}

filter-not-exists.ttl:

<os:0> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <os:class/AnalysisResults> .
<os:0> <os:prop/analysis/refEntity> <os:elem/loc/Artis> .
<os:0> <os:prop/analysis/refEntity> <os:elem/loc/Kriterion> .
<os:1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <os:class/AnalysisResults> .
<os:1> <os:prop/analysis/refEntity> <os:elem/loc/Artis> .
<os:2> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <os:class/AnalysisResults> .

filter-not-exists.srx:

<?xml version="1.0"?>
<sparql
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:xs="http://www.w3.org/2001/XMLSchema#"
    xmlns="http://www.w3.org/2005/sparql-results#" >
  <head>
    <variable name="ar"/>
  </head>
  <results>
    <result>
      <binding name="ar">
        <uri>os:2</uri>
      </binding>
    </result>
  </results>
</sparql>
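
As a cross-check of the expected result, the query can be evaluated by hand over the six test triples. The following is a minimal Python sketch, not Bigdata code; the names TRIPLES and expected_results are made up for this illustration.

```python
# Test data from filter-not-exists.ttl as (subject, predicate, object) triples.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
REF_ENTITY = "os:prop/analysis/refEntity"
ANALYSIS_RESULTS = "os:class/AnalysisResults"

TRIPLES = {
    ("os:0", RDF_TYPE, ANALYSIS_RESULTS),
    ("os:0", REF_ENTITY, "os:elem/loc/Artis"),
    ("os:0", REF_ENTITY, "os:elem/loc/Kriterion"),
    ("os:1", RDF_TYPE, ANALYSIS_RESULTS),
    ("os:1", REF_ENTITY, "os:elem/loc/Artis"),
    ("os:2", RDF_TYPE, ANALYSIS_RESULTS),
}


def expected_results(triples):
    """Hand-evaluate filter-not-exists.rq (hypothetical helper).

    Keep every ?ar typed as AnalysisResults for which neither
    NOT EXISTS pattern has a match.
    """
    candidates = {s for (s, p, o) in triples
                  if p == RDF_TYPE and o == ANALYSIS_RESULTS}
    return sorted(
        ar for ar in candidates
        if (ar, REF_ENTITY, "os:elem/loc/Artis") not in triples
        and (ar, REF_ENTITY, "os:elem/loc/Kriterion") not in triples
    )


print(expected_results(TRIPLES))  # ['os:2']
```

os:0 is excluded by both filters, os:1 by the first one, so only os:2 remains, which matches filter-not-exists.srx.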

Change History

Changed 14 months ago by gjdev

This is the query plan for the above query:

queryPlan
com.bigdata.bop.solutions.ProjectionOp[23](DropOp[22])[ com.bigdata.bop.BOp.bopId=23, com.bigdata.bop.BOp.evaluationContext=CONTROLLER, com.bigdata.bop.PipelineOp.sharedState=true, com.bigdata.bop.join.JoinAnnotations.select=[ar], com.bigdata.bop.engine.QueryEngine.queryId=960709f9-beec-4b97-b5b7-277305765804]
  com.bigdata.bop.solutions.DropOp[22](ConditionalRoutingOp[19])[ com.bigdata.bop.BOp.bopId=22, com.bigdata.bop.solutions.DropOp.dropVars=[-exists-1, -exists-2]]
    com.bigdata.bop.bset.ConditionalRoutingOp[19](ChunkedMaterializationOp[21])[ com.bigdata.bop.BOp.bopId=19, com.bigdata.bop.bset.ConditionalRoutingOp.condition=com.bigdata.rdf.internal.constraints.SPARQLConstraint(com.bigdata.rdf.internal.constraints.NotBOp(com.bigdata.rdf.internal.constraints.EBVBOp(-exists-2)))]
      com.bigdata.bop.rdf.join.ChunkedMaterializationOp[21](ConditionalRoutingOp[20])[ com.bigdata.bop.rdf.join.ChunkedMaterializationOp.vars=[-exists-2], com.bigdata.bop.IPredicate.relationName=[kb.lex], com.bigdata.bop.IPredicate.timestamp=0, com.bigdata.bop.PipelineOp.sharedState=true, com.bigdata.bop.BOp.bopId=21]
        com.bigdata.bop.bset.ConditionalRoutingOp[20](ConditionalRoutingOp[16])[ com.bigdata.bop.BOp.bopId=20, com.bigdata.bop.bset.ConditionalRoutingOp.condition=com.bigdata.rdf.internal.constraints.SPARQLConstraint(com.bigdata.rdf.internal.constraints.NeedsMaterializationBOp(com.bigdata.rdf.internal.constraints.NotBOp(com.bigdata.rdf.internal.constraints.EBVBOp(-exists-2)))), com.bigdata.bop.PipelineOp.altSinkRef=19]
          com.bigdata.bop.bset.ConditionalRoutingOp[16](ChunkedMaterializationOp[18])[ com.bigdata.bop.BOp.bopId=16, com.bigdata.bop.bset.ConditionalRoutingOp.condition=com.bigdata.rdf.internal.constraints.SPARQLConstraint(com.bigdata.rdf.internal.constraints.NotBOp(com.bigdata.rdf.internal.constraints.EBVBOp(-exists-1)))]
            com.bigdata.bop.rdf.join.ChunkedMaterializationOp[18](ConditionalRoutingOp[17])[ com.bigdata.bop.rdf.join.ChunkedMaterializationOp.vars=[-exists-1], com.bigdata.bop.IPredicate.relationName=[kb.lex], com.bigdata.bop.IPredicate.timestamp=0, com.bigdata.bop.PipelineOp.sharedState=true, com.bigdata.bop.BOp.bopId=18]
              com.bigdata.bop.bset.ConditionalRoutingOp[17](JVMSolutionSetHashJoinOp[15])[ com.bigdata.bop.BOp.bopId=17, com.bigdata.bop.bset.ConditionalRoutingOp.condition=com.bigdata.rdf.internal.constraints.SPARQLConstraint(com.bigdata.rdf.internal.constraints.NeedsMaterializationBOp(com.bigdata.rdf.internal.constraints.NotBOp(com.bigdata.rdf.internal.constraints.EBVBOp(-exists-1)))), com.bigdata.bop.PipelineOp.altSinkRef=16]
                com.bigdata.bop.join.JVMSolutionSetHashJoinOp[15](PipelineJoin[14])[ com.bigdata.bop.BOp.bopId=15, com.bigdata.bop.BOp.evaluationContext=CONTROLLER, com.bigdata.bop.PipelineOp.maxParallel=1, com.bigdata.bop.PipelineOp.sharedState=true, class com.bigdata.bop.join.HTreeSolutionSetHashJoinOp.release=true, com.bigdata.bop.PipelineOp.lastPass=true, namedSetRef=NamedSolutionSetRef{queryId=960709f9-beec-4b97-b5b7-277305765804,namedSet=--set-10,joinVars=[ar]}]
                  com.bigdata.bop.join.PipelineJoin[14](JVMHashIndexOp[11])[ com.bigdata.bop.BOp.bopId=14, com.bigdata.bop.join.JoinAnnotations.constraints=null, com.bigdata.bop.BOp.evaluationContext=ANY, com.bigdata.bop.join.AccessPathJoinAnnotations.predicate=com.bigdata.rdf.spo.SPOPredicate[12](ar=null, TermId(8U)[os:prop/analysis/refEntity], TermId(7U)[os:elem/loc/Kriterion], --anon-13=null)[ com.bigdata.bop.IPredicate.relationName=[kb.spo], com.bigdata.bop.IPredicate.timestamp=0, com.bigdata.bop.BOp.bopId=12, com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.estimatedCardinality=1, com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.originalIndex=POCS, com.bigdata.bop.IPredicate.flags=[KEYS,VALS,READONLY,PARALLEL], com.bigdata.bop.IPredicate.accessPathFilter=cutthecrap.utils.striterators.NOPFilter@178e13f{annotations=null,filterChain=[com.bigdata.bop.rdf.filter.StripContextFilter(), com.bigdata.bop.ap.filter.DistinctFilter()]}]]
                    com.bigdata.bop.join.JVMHashIndexOp[11](JVMSolutionSetHashJoinOp[9])[ com.bigdata.bop.BOp.bopId=11, com.bigdata.bop.BOp.evaluationContext=CONTROLLER, com.bigdata.bop.PipelineOp.maxParallel=1, com.bigdata.bop.PipelineOp.lastPass=true, com.bigdata.bop.PipelineOp.sharedState=true, com.bigdata.bop.join.JoinAnnotations.joinType=Exists, com.bigdata.bop.join.HashJoinAnnotations.joinVars=[ar], com.bigdata.bop.join.JoinAnnotations.constraints=null, com.bigdata.bop.join.JoinAnnotations.select=[ar, -exists-2], com.bigdata.bop.join.HashJoinAnnotations.askVar=-exists-2, namedSetRef=NamedSolutionSetRef{queryId=960709f9-beec-4b97-b5b7-277305765804,namedSet=--set-10,joinVars=[ar]}]
                      com.bigdata.bop.join.JVMSolutionSetHashJoinOp[9](PipelineJoin[8])[ com.bigdata.bop.BOp.bopId=9, com.bigdata.bop.BOp.evaluationContext=CONTROLLER, com.bigdata.bop.PipelineOp.maxParallel=1, com.bigdata.bop.PipelineOp.sharedState=true, class com.bigdata.bop.join.HTreeSolutionSetHashJoinOp.release=true, com.bigdata.bop.PipelineOp.lastPass=true, namedSetRef=NamedSolutionSetRef{queryId=960709f9-beec-4b97-b5b7-277305765804,namedSet=--set-4,joinVars=[ar]}]
                        com.bigdata.bop.join.PipelineJoin[8](JVMHashIndexOp[5])[ com.bigdata.bop.BOp.bopId=8, com.bigdata.bop.join.JoinAnnotations.constraints=null, com.bigdata.bop.BOp.evaluationContext=ANY, com.bigdata.bop.join.AccessPathJoinAnnotations.predicate=com.bigdata.rdf.spo.SPOPredicate[6](ar=null, TermId(8U)[os:prop/analysis/refEntity], TermId(6U)[os:elem/loc/Artis], --anon-7=null)[ com.bigdata.bop.IPredicate.relationName=[kb.spo], com.bigdata.bop.IPredicate.timestamp=0, com.bigdata.bop.BOp.bopId=6, com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.estimatedCardinality=2, com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.originalIndex=POCS, com.bigdata.bop.IPredicate.flags=[KEYS,VALS,READONLY,PARALLEL], com.bigdata.bop.IPredicate.accessPathFilter=cutthecrap.utils.striterators.NOPFilter@10c0fa7{annotations=null,filterChain=[com.bigdata.bop.rdf.filter.StripContextFilter(), com.bigdata.bop.ap.filter.DistinctFilter()]}]]
                          com.bigdata.bop.join.JVMHashIndexOp[5](PipelineJoin[3])[ com.bigdata.bop.BOp.bopId=5, com.bigdata.bop.BOp.evaluationContext=CONTROLLER, com.bigdata.bop.PipelineOp.maxParallel=1, com.bigdata.bop.PipelineOp.lastPass=true, com.bigdata.bop.PipelineOp.sharedState=true, com.bigdata.bop.join.JoinAnnotations.joinType=Exists, com.bigdata.bop.join.HashJoinAnnotations.joinVars=[ar], com.bigdata.bop.join.JoinAnnotations.constraints=null, com.bigdata.bop.join.JoinAnnotations.select=[ar, -exists-1], com.bigdata.bop.join.HashJoinAnnotations.askVar=-exists-1, namedSetRef=NamedSolutionSetRef{queryId=960709f9-beec-4b97-b5b7-277305765804,namedSet=--set-4,joinVars=[ar]}]
                            com.bigdata.bop.join.PipelineJoin[3]()[ com.bigdata.bop.BOp.bopId=3, com.bigdata.bop.join.JoinAnnotations.constraints=null, com.bigdata.bop.BOp.evaluationContext=ANY, com.bigdata.bop.join.AccessPathJoinAnnotations.predicate=com.bigdata.rdf.spo.SPOPredicate[1](ar=null, Vocab(14)[http://www.w3.org/1999/02/22-rdf-syntax-ns#type], TermId(5U)[os:class/AnalysisResults], --anon-2=null)[ com.bigdata.bop.IPredicate.relationName=[kb.spo], com.bigdata.bop.IPredicate.timestamp=0, com.bigdata.bop.BOp.bopId=1, com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.estimatedCardinality=3, com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.originalIndex=POCS, com.bigdata.bop.IPredicate.flags=[KEYS,VALS,READONLY,PARALLEL], com.bigdata.bop.IPredicate.accessPathFilter=cutthecrap.utils.striterators.NOPFilter@d7a9c9{annotations=null,filterChain=[com.bigdata.bop.rdf.filter.StripContextFilter(), com.bigdata.bop.ap.filter.DistinctFilter()]}]]

The problem seems to be that the solutions of the filters are joined, but the boolean -exists-1 variable of the first filter is not passed on. ConditionalRoutingOp[17] drops all results, because it gets a SparqlTypeError when it evaluates the unbound -exists-1 variable. Either -exists-1 should be passed on from the subtasks, or the join should be done after the ConditionalRoutingOp.
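
The effect can be modelled in a few lines of Python. This is only a sketch of the behaviour described above, assuming that a filter which raises a type error rejects the solution; the class and function names are stand-ins and not Bigdata APIs.

```python
class SparqlTypeError(Exception):
    """Stand-in for the error raised when a filter operand is unbound."""


def not_ebv(solution, ask_var):
    # Models NotBOp(EBVBOp(ask_var)): an unbound variable is a type error.
    if ask_var not in solution:
        raise SparqlTypeError(ask_var)
    return not solution[ask_var]


def conditional_routing(solutions, ask_var):
    """Keep the solutions that pass the filter; an error drops the solution."""
    kept = []
    for solution in solutions:
        try:
            if not_ebv(solution, ask_var):
                kept.append(solution)
        except SparqlTypeError:
            pass  # filter error: the solution is rejected
    return kept


# Solutions as they would reach the filter on -exists-1 once that
# variable has been pruned: only ar and -exists-2 are bound.
arriving = [
    {"ar": "os:0", "-exists-2": True},
    {"ar": "os:1", "-exists-2": False},
    {"ar": "os:2", "-exists-2": False},
]
print(conditional_routing(arriving, "-exists-1"))  # []

# With -exists-1 passed on, os:2 survives the first filter.
with_flag = [{**s, "-exists-1": s["ar"] != "os:2"} for s in arriving]
print([s["ar"] for s in conditional_routing(with_flag, "-exists-1")])  # ['os:2']
```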

When I rewrite the query like this, the correct results are returned:

SELECT ?ar
WHERE {
    ?ar a <os:class/AnalysisResults>.
    {
        FILTER NOT EXISTS {
            ?ar <os:prop/analysis/refEntity> <os:elem/loc/Artis>.
        }
    }.
    FILTER NOT EXISTS {
        ?ar <os:prop/analysis/refEntity> <os:elem/loc/Kriterion>.
    }.
}

Changed 14 months ago by thompsonbry

Ah. That helps. It must be pruning the variable out. I see

com.bigdata.bop.join.JoinAnnotations.select=[ar, -exists-1]

on one of the JVMHashIndexOps and the corresponding annotation for -exists-2 on the other. It looks like it is dropping out variables which were not part of the original query when it adds that annotation.

Changed 14 months ago by thompsonbry

  • status changed from new to closed
  • resolution set to fixed

The code was specifying the set of variables to "select" out of the hash join as the projection of the subquery which models the EXISTS group. This was causing any variables NOT projected into that subquery to be pruned. I checked the code for standard subquery and subgroup hash join evaluation, and those code paths were not specifying the SELECT annotation to the hash join. I modified the code path for the exists filter subgroup hash join so that it no longer specifies that annotation, and the unit test now passes.

Committed revision r6136
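
The difference between the two code paths can be sketched as follows. This is a simplified Python model, not the Bigdata implementation: exists_hash_join and its select parameter are hypothetical names, and the select argument only imitates the role of the JoinAnnotations.select annotation seen in the query plan.

```python
def exists_hash_join(solutions, matching_subjects, ask_var, select=None):
    """Bind ask_var to whether ?ar has a match, then optionally project.

    select=None keeps every variable already bound; a set of names
    prunes everything else from the joined solution.
    """
    out = []
    for solution in solutions:
        joined = {**solution, ask_var: solution["ar"] in matching_subjects}
        if select is not None:
            joined = {k: v for k, v in joined.items() if k in select}
        out.append(joined)
    return out


ARTIS = {"os:0", "os:1"}   # subjects with refEntity Artis
KRITERION = {"os:0"}       # subjects with refEntity Kriterion

start = [{"ar": "os:0"}, {"ar": "os:1"}, {"ar": "os:2"}]
first = exists_hash_join(start, ARTIS, "-exists-1")

# Before the fix: the second join selects only [ar, -exists-2].
before_fix = exists_hash_join(first, KRITERION, "-exists-2",
                              select={"ar", "-exists-2"})
# After the fix: no select annotation, so -exists-1 is carried along.
after_fix = exists_hash_join(first, KRITERION, "-exists-2")

print(sorted(before_fix[0]))  # ['-exists-2', 'ar']
print(sorted(after_fix[0]))   # ['-exists-1', '-exists-2', 'ar']
```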

Changed 14 months ago by gjdev

  • status changed from closed to reopened
  • resolution fixed deleted

When I rewrite the query as:

SELECT DISTINCT ?ar
WHERE {
    {
        ?ar a <os:class/AnalysisResults>.
        FILTER NOT EXISTS {
            ?ar <os:prop/analysis/refEntity> <os:elem/loc/Artis>.
        }
    } FILTER NOT EXISTS {
        ?ar <os:prop/analysis/refEntity> <os:elem/loc/Kriterion>.
    }
}

the testcase fails again.

Changed 14 months ago by gjdev

It looks like the optimized AST constructed from the query is wrong. The query that causes the failure is shown below. (By the way, in Sesame this results in exactly the same query plan as the original query reported when opening this ticket.)

SELECT ?ar
WHERE {
    {
        ?ar a <os:class/AnalysisResults>.
        FILTER NOT EXISTS {
            ?ar <os:prop/analysis/refEntity> <os:elem/loc/Artis>.
        }
    } FILTER NOT EXISTS {
        ?ar <os:prop/analysis/refEntity> <os:elem/loc/Kriterion>.
    }
}

Bigdata rewrites this to the following optimized AST:

QueryType: SELECT
SELECT ( VarNode(ar) AS VarNode(ar) )
  JoinGroupNode {
    QueryType: ASK
    SELECT VarNode(ar) VarNode(-exists-2)[anonymous]
      JoinGroupNode {
        StatementPatternNode(VarNode(ar), ConstantNode(TermId(8U)[os:prop/analysis/refEntity]), ConstantNode(TermId(7U)[os:elem/loc/Kriterion]), DEFAULT_CONTEXTS)
          com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.estimatedCardinality=1
          com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.originalIndex=POCS
      }
    @askVar=-exists-2
    JoinGroupNode {
      StatementPatternNode(VarNode(ar), ConstantNode(Vocab(14)[http://www.w3.org/1999/02/22-rdf-syntax-ns#type]), ConstantNode(TermId(5U)[os:class/AnalysisResults]), DEFAULT_CONTEXTS)
        com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.estimatedCardinality=3
        com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.originalIndex=POCS
      QueryType: ASK
      SELECT VarNode(ar) VarNode(-exists-1)[anonymous]
        JoinGroupNode {
          StatementPatternNode(VarNode(ar), ConstantNode(TermId(8U)[os:prop/analysis/refEntity]), ConstantNode(TermId(6U)[os:elem/loc/Artis]), DEFAULT_CONTEXTS)
            com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.estimatedCardinality=2
            com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.originalIndex=POCS
        }
      @askVar=-exists-1
      FILTER( com.bigdata.rdf.sparql.ast.NotExistsNode(VarNode(-exists-1))[ com.bigdata.rdf.sparql.ast.FunctionNode.functionURI=http://www.bigdata.com/sparql-1.1-undefined-functionsnot-exists, graphPattern=
JoinGroupNode {
  StatementPatternNode(VarNode(ar), ConstantNode(TermId(8U)[os:prop/analysis/refEntity]), ConstantNode(TermId(6U)[os:elem/loc/Artis]), DEFAULT_CONTEXTS)
    com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.estimatedCardinality=2
    com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.originalIndex=POCS
}, valueExpr=com.bigdata.rdf.internal.constraints.NotBOp(com.bigdata.rdf.internal.constraints.EBVBOp(-exists-1))] )
    } JOIN ON (ar)
    FILTER( com.bigdata.rdf.sparql.ast.NotExistsNode(VarNode(-exists-2))[ com.bigdata.rdf.sparql.ast.FunctionNode.functionURI=http://www.bigdata.com/sparql-1.1-undefined-functionsnot-exists, graphPattern=
JoinGroupNode {
  StatementPatternNode(VarNode(ar), ConstantNode(TermId(8U)[os:prop/analysis/refEntity]), ConstantNode(TermId(7U)[os:elem/loc/Kriterion]), DEFAULT_CONTEXTS)
    com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.estimatedCardinality=1
    com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.originalIndex=POCS
}, valueExpr=com.bigdata.rdf.internal.constraints.NotBOp(com.bigdata.rdf.internal.constraints.EBVBOp(-exists-2))] )
  }

This looks incorrect to me. One of the ASK queries (created for the NOT EXISTS clause) is moved to the top of the join group. Unless I misinterpret the optimized AST it means that the query plan will start looking for the first ?ar binding that matches the pattern in that NOT EXISTS clause, and it never looks at other statements that match the "?ar a <os:class/AnalysisResults>" pattern. The one ?ar binding it so selects will always be dropped, because it obviously matches the NOT EXISTS pattern...
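
To illustrate this reading of the AST, here is a small Python sketch that compares the two evaluation orders over the test data. It is a simplification under the assumption stated above (the ASK subquery placed first acts like a required join); the function names are invented for the example.

```python
TYPE = ("rdf:type", "os:class/AnalysisResults")
ARTIS = ("os:prop/analysis/refEntity", "os:elem/loc/Artis")
KRITERION = ("os:prop/analysis/refEntity", "os:elem/loc/Kriterion")

TRIPLES = {
    ("os:0",) + TYPE, ("os:0",) + ARTIS, ("os:0",) + KRITERION,
    ("os:1",) + TYPE, ("os:1",) + ARTIS,
    ("os:2",) + TYPE,
}


def subjects(pattern):
    """Subjects ?ar matching the pattern (?ar, predicate, object)."""
    return {s for (s, p, o) in TRIPLES if (p, o) == pattern}


def ask_subquery_first():
    # Order read from the optimized AST: the ASK for Kriterion runs first,
    # so only its matches flow on, each flagged -exists-2 = true.
    sols = [{"ar": s, "-exists-2": True} for s in subjects(KRITERION)]
    sols = [s for s in sols if s["ar"] in subjects(TYPE)]
    sols = [s for s in sols if s["ar"] not in subjects(ARTIS)]
    return sorted(s["ar"] for s in sols if not s["-exists-2"])


def required_join_first():
    # Expected order: bind ?ar from the required pattern, then test the
    # NOT EXISTS patterns against each binding.
    sols = [{"ar": s} for s in subjects(TYPE)]
    sols = [s for s in sols if s["ar"] not in subjects(ARTIS)]
    sols = [{**s, "-exists-2": s["ar"] in subjects(KRITERION)} for s in sols]
    return sorted(s["ar"] for s in sols if not s["-exists-2"])


print(ask_subquery_first())   # []
print(required_join_first())  # ['os:2']
```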

For completeness, here is the flow of solutions:

SOLUTION:	QueryUUID	bop	bopId	partitionId	chunkSize
SOLUTION:	c902c6d0-2611-48f0-8c09-64ab527d5a4d	JVMHashIndexOp	2	-1	1	{  }
SOLUTION:	c902c6d0-2611-48f0-8c09-64ab527d5a4d	PipelineJoin	5	-1	1	{ ar=TermId(2U) }
SOLUTION:	c902c6d0-2611-48f0-8c09-64ab527d5a4d	JVMSolutionSetHashJoinOp	6	-1	1	{ -exists-2=XSDBoolean(true) }

With this query plan:

com.bigdata.bop.solutions.ProjectionOp[27](DropOp[26])[ com.bigdata.bop.BOp.bopId=27, com.bigdata.bop.BOp.evaluationContext=CONTROLLER, com.bigdata.bop.PipelineOp.sharedState=true, com.bigdata.bop.join.JoinAnnotations.select=[ar], com.bigdata.bop.engine.QueryEngine.queryId=c902c6d0-2611-48f0-8c09-64ab527d5a4d]
  com.bigdata.bop.solutions.DropOp[26](ConditionalRoutingOp[23])[ com.bigdata.bop.BOp.bopId=26, com.bigdata.bop.solutions.DropOp.dropVars=[-exists-2]]
    com.bigdata.bop.bset.ConditionalRoutingOp[23](ChunkedMaterializationOp[25])[ com.bigdata.bop.BOp.bopId=23, com.bigdata.bop.bset.ConditionalRoutingOp.condition=com.bigdata.rdf.internal.constraints.SPARQLConstraint(com.bigdata.rdf.internal.constraints.NotBOp(com.bigdata.rdf.internal.constraints.EBVBOp(-exists-2)))]
      com.bigdata.bop.rdf.join.ChunkedMaterializationOp[25](ConditionalRoutingOp[24])[ com.bigdata.bop.rdf.join.ChunkedMaterializationOp.vars=[-exists-2], com.bigdata.bop.IPredicate.relationName=[kb.lex], com.bigdata.bop.IPredicate.timestamp=0, com.bigdata.bop.PipelineOp.sharedState=true, com.bigdata.bop.BOp.bopId=25]
        com.bigdata.bop.bset.ConditionalRoutingOp[24](JVMSolutionSetHashJoinOp[22])[ com.bigdata.bop.BOp.bopId=24, com.bigdata.bop.bset.ConditionalRoutingOp.condition=com.bigdata.rdf.internal.constraints.SPARQLConstraint(com.bigdata.rdf.internal.constraints.NeedsMaterializationBOp(com.bigdata.rdf.internal.constraints.NotBOp(com.bigdata.rdf.internal.constraints.EBVBOp(-exists-2)))), com.bigdata.bop.PipelineOp.altSinkRef=23]
          com.bigdata.bop.join.JVMSolutionSetHashJoinOp[22](DropOp[21])[ com.bigdata.bop.BOp.bopId=22, com.bigdata.bop.BOp.evaluationContext=CONTROLLER, com.bigdata.bop.PipelineOp.maxParallel=1, com.bigdata.bop.PipelineOp.sharedState=true, com.bigdata.bop.join.JoinAnnotations.constraints=null, class com.bigdata.bop.join.HTreeSolutionSetHashJoinOp.release=false, com.bigdata.bop.PipelineOp.lastPass=false, namedSetRef=NamedSolutionSetRef{queryId=c902c6d0-2611-48f0-8c09-64ab527d5a4d,namedSet=--set-7,joinVars=[ar]}]
            com.bigdata.bop.solutions.DropOp[21](ConditionalRoutingOp[18])[ com.bigdata.bop.BOp.bopId=21, com.bigdata.bop.solutions.DropOp.dropVars=[-exists-1]]
              com.bigdata.bop.bset.ConditionalRoutingOp[18](ChunkedMaterializationOp[20])[ com.bigdata.bop.BOp.bopId=18, com.bigdata.bop.bset.ConditionalRoutingOp.condition=com.bigdata.rdf.internal.constraints.SPARQLConstraint(com.bigdata.rdf.internal.constraints.NotBOp(com.bigdata.rdf.internal.constraints.EBVBOp(-exists-1)))]
                com.bigdata.bop.rdf.join.ChunkedMaterializationOp[20](ConditionalRoutingOp[19])[ com.bigdata.bop.rdf.join.ChunkedMaterializationOp.vars=[-exists-1], com.bigdata.bop.IPredicate.relationName=[kb.lex], com.bigdata.bop.IPredicate.timestamp=0, com.bigdata.bop.PipelineOp.sharedState=true, com.bigdata.bop.BOp.bopId=20]
                  com.bigdata.bop.bset.ConditionalRoutingOp[19](JVMSolutionSetHashJoinOp[17])[ com.bigdata.bop.BOp.bopId=19, com.bigdata.bop.bset.ConditionalRoutingOp.condition=com.bigdata.rdf.internal.constraints.SPARQLConstraint(com.bigdata.rdf.internal.constraints.NeedsMaterializationBOp(com.bigdata.rdf.internal.constraints.NotBOp(com.bigdata.rdf.internal.constraints.EBVBOp(-exists-1)))), com.bigdata.bop.PipelineOp.altSinkRef=18]
                    com.bigdata.bop.join.JVMSolutionSetHashJoinOp[17](PipelineJoin[16])[ com.bigdata.bop.BOp.bopId=17, com.bigdata.bop.BOp.evaluationContext=CONTROLLER, com.bigdata.bop.PipelineOp.maxParallel=1, com.bigdata.bop.PipelineOp.sharedState=true, class com.bigdata.bop.join.HTreeSolutionSetHashJoinOp.release=true, com.bigdata.bop.PipelineOp.lastPass=true, namedSetRef=NamedSolutionSetRef{queryId=c902c6d0-2611-48f0-8c09-64ab527d5a4d,namedSet=--set-12,joinVars=[ar]}]
                      com.bigdata.bop.join.PipelineJoin[16](JVMHashIndexOp[13])[ com.bigdata.bop.BOp.bopId=16, com.bigdata.bop.join.JoinAnnotations.constraints=null, com.bigdata.bop.BOp.evaluationContext=ANY, com.bigdata.bop.join.AccessPathJoinAnnotations.predicate=com.bigdata.rdf.spo.SPOPredicate[14](ar=null, TermId(8U)[os:prop/analysis/refEntity], TermId(6U)[os:elem/loc/Artis], --anon-15=null)[ com.bigdata.bop.IPredicate.relationName=[kb.spo], com.bigdata.bop.IPredicate.timestamp=0, com.bigdata.bop.BOp.bopId=14, com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.estimatedCardinality=2, com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.originalIndex=POCS, com.bigdata.bop.IPredicate.flags=[KEYS,VALS,READONLY,PARALLEL], com.bigdata.bop.IPredicate.accessPathFilter=cutthecrap.utils.striterators.NOPFilter@d5c0f9{annotations=null,filterChain=[com.bigdata.bop.rdf.filter.StripContextFilter(), com.bigdata.bop.ap.filter.DistinctFilter()]}]]
                        com.bigdata.bop.join.JVMHashIndexOp[13](PipelineJoin[11])[ com.bigdata.bop.BOp.bopId=13, com.bigdata.bop.BOp.evaluationContext=CONTROLLER, com.bigdata.bop.PipelineOp.maxParallel=1, com.bigdata.bop.PipelineOp.lastPass=true, com.bigdata.bop.PipelineOp.sharedState=true, com.bigdata.bop.join.JoinAnnotations.joinType=Exists, com.bigdata.bop.join.HashJoinAnnotations.joinVars=[ar], com.bigdata.bop.join.JoinAnnotations.constraints=null, com.bigdata.bop.join.JoinAnnotations.select=null, com.bigdata.bop.join.HashJoinAnnotations.askVar=-exists-1, namedSetRef=NamedSolutionSetRef{queryId=c902c6d0-2611-48f0-8c09-64ab527d5a4d,namedSet=--set-12,joinVars=[ar]}]
                          com.bigdata.bop.join.PipelineJoin[11](JVMHashIndexOp[8])[ com.bigdata.bop.BOp.bopId=11, com.bigdata.bop.join.JoinAnnotations.constraints=null, com.bigdata.bop.BOp.evaluationContext=ANY, com.bigdata.bop.join.AccessPathJoinAnnotations.predicate=com.bigdata.rdf.spo.SPOPredicate[9](ar=null, Vocab(14)[http://www.w3.org/1999/02/22-rdf-syntax-ns#type], TermId(5U)[os:class/AnalysisResults], --anon-10=null)[ com.bigdata.bop.IPredicate.relationName=[kb.spo], com.bigdata.bop.IPredicate.timestamp=0, com.bigdata.bop.BOp.bopId=9, com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.estimatedCardinality=3, com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.originalIndex=POCS, com.bigdata.bop.IPredicate.flags=[KEYS,VALS,READONLY,PARALLEL], com.bigdata.bop.IPredicate.accessPathFilter=cutthecrap.utils.striterators.NOPFilter@1701bdc{annotations=null,filterChain=[com.bigdata.bop.rdf.filter.StripContextFilter(), com.bigdata.bop.ap.filter.DistinctFilter()]}]]
                            com.bigdata.bop.join.JVMHashIndexOp[8](JVMSolutionSetHashJoinOp[6])[ com.bigdata.bop.BOp.bopId=8, com.bigdata.bop.BOp.evaluationContext=CONTROLLER, com.bigdata.bop.PipelineOp.maxParallel=1, com.bigdata.bop.PipelineOp.lastPass=true, com.bigdata.bop.PipelineOp.sharedState=true, com.bigdata.bop.join.JoinAnnotations.joinType=Normal, com.bigdata.bop.join.HashJoinAnnotations.joinVars=[ar], com.bigdata.bop.join.JoinAnnotations.select=null, namedSetRef=NamedSolutionSetRef{queryId=c902c6d0-2611-48f0-8c09-64ab527d5a4d,namedSet=--set-7,joinVars=[ar]}]
                              com.bigdata.bop.join.JVMSolutionSetHashJoinOp[6](PipelineJoin[5])[ com.bigdata.bop.BOp.bopId=6, com.bigdata.bop.BOp.evaluationContext=CONTROLLER, com.bigdata.bop.PipelineOp.maxParallel=1, com.bigdata.bop.PipelineOp.sharedState=true, class com.bigdata.bop.join.HTreeSolutionSetHashJoinOp.release=true, com.bigdata.bop.PipelineOp.lastPass=true, namedSetRef=NamedSolutionSetRef{queryId=c902c6d0-2611-48f0-8c09-64ab527d5a4d,namedSet=--set-1,joinVars=[]}]
                                com.bigdata.bop.join.PipelineJoin[5](JVMHashIndexOp[2])[ com.bigdata.bop.BOp.bopId=5, com.bigdata.bop.join.JoinAnnotations.constraints=null, com.bigdata.bop.BOp.evaluationContext=ANY, com.bigdata.bop.join.AccessPathJoinAnnotations.predicate=com.bigdata.rdf.spo.SPOPredicate[3](ar=null, TermId(8U)[os:prop/analysis/refEntity], TermId(7U)[os:elem/loc/Kriterion], --anon-4=null)[ com.bigdata.bop.IPredicate.relationName=[kb.spo], com.bigdata.bop.IPredicate.timestamp=0, com.bigdata.bop.BOp.bopId=3, com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.estimatedCardinality=1, com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.originalIndex=POCS, com.bigdata.bop.IPredicate.flags=[KEYS,VALS,READONLY,PARALLEL], com.bigdata.bop.IPredicate.accessPathFilter=cutthecrap.utils.striterators.NOPFilter@1353249{annotations=null,filterChain=[com.bigdata.bop.rdf.filter.StripContextFilter(), com.bigdata.bop.ap.filter.DistinctFilter()]}]]
                                  com.bigdata.bop.join.JVMHashIndexOp[2]()[ com.bigdata.bop.BOp.bopId=2, com.bigdata.bop.BOp.evaluationContext=CONTROLLER, com.bigdata.bop.PipelineOp.maxParallel=1, com.bigdata.bop.PipelineOp.lastPass=true, com.bigdata.bop.PipelineOp.sharedState=true, com.bigdata.bop.join.JoinAnnotations.joinType=Exists, com.bigdata.bop.join.HashJoinAnnotations.joinVars=[], com.bigdata.bop.join.JoinAnnotations.constraints=null, com.bigdata.bop.join.JoinAnnotations.select=null, com.bigdata.bop.join.HashJoinAnnotations.askVar=-exists-2, namedSetRef=NamedSolutionSetRef{queryId=c902c6d0-2611-48f0-8c09-64ab527d5a4d,namedSet=--set-1,joinVars=[]}]

Changed 14 months ago by thompsonbry

> This looks incorrect to me. One of the ASK queries (created for the NOT EXISTS
> clause) is moved to the top of the join group. Unless I misinterpret the
> optimized AST it means that the query plan will start looking for the first ?ar
> binding that matches the pattern in that NOT EXISTS clause, and it never looks
> at other statements that match the "?ar a <os:class/AnalysisResults>" pattern.
> The one ?ar binding it so selects will always be dropped, because it obviously
> matches the NOT EXISTS pattern...

I think that this is exactly it. I suspect that the filter is being attached without proper regard to the requirements for binding the variables. It runs too soon and rejects all solutions.

Changed 14 months ago by thompsonbry

Adding the 2nd version of the FILTER NOT EXISTS query to CI. The test fails. Based on the analysis by Gerjon, it appears that the evaluation order is incorrect. I am asking Mike to take a look at this. It might be because we are not taking the materialization requirements for the filters into account for the ASK subquery. There is a TODO in the Javadoc related to this in AST2BOpUtility where it handles the EXISTS filter.

Committed revision r6217.

Changed 14 months ago by thompsonbry

  • owner set to mrpersonick
  • status changed from reopened to assigned

Changed 14 months ago by thompsonbry

  • status changed from assigned to closed
  • resolution set to fixed

The FILTER NOT EXISTS issue is fixed. The problem was the ordering of the children in the graph pattern group. The fix was to the ASTJoinOrderByTypeOptimizer. We are modeling the NOT EXISTS and EXISTS graph patterns as ASK subqueries. It was treating those subqueries as required joins and ordering them early in some cases.

I have modified the ASTJoinOrderByTypeOptimizer to order the ASK subqueries after the required joins.
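
The ordering rule can be sketched roughly as below. This Python model is not the ASTJoinOrderByTypeOptimizer code; the Child class, the kind labels, and the rank table are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Child:
    kind: str   # hypothetical labels: "required_join", "ask_subquery", "filter"
    label: str


# Lower rank runs earlier: ASK subqueries are placed after required joins.
RANK = {"required_join": 0, "ask_subquery": 1, "filter": 2}


def order_children(children):
    """Order the children of a join group by type.

    sorted() is stable, so children of the same kind keep their
    original relative order.
    """
    return sorted(children, key=lambda child: RANK[child.kind])


# Top-level group from the optimized AST in this ticket, in its reported order.
group = [
    Child("ask_subquery", "ASK -exists-2 (Kriterion)"),
    Child("required_join", "subgroup JOIN ON (ar)"),
    Child("filter", "FILTER NOT EXISTS -exists-2"),
]

for child in order_children(group):
    print(child.kind, "|", child.label)
```

With this ordering the subgroup that binds ?ar runs before the ASK subquery, which in turn runs before the filter that reads its result.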

While this change fixes this query, it is possible that we still could get bad join orderings when the variables used by the filter are only bound by OPTIONAL joins. It is also possible that we could run the ASK subquery for FILTER (NOT) EXISTS earlier if the filter variables are bound by required joins. This is really identical to the join filter attachment problem. The problem in the AST is that both the ASK subquery and the FILTER are present. It seems that the best solution would be to attach the ASK subquery to the FILTER and then to run it immediately before the FILTER, letting the existing filter attachment logic decide where to place the filter. We would also have to make sure that the FILTER was never attached to a JOIN since the ASK subquery would have to be run before the FILTER was evaluated.

Committed revision r6227.
