From: Michael S. <ms...@me...> - 2015-05-29 20:42:47
Stas,

thanks for reporting, I’ll open a ticket. The problem is that FILTER NOT EXISTS is translated using a blocking (hash join) operator, so the query plan cannot pipeline the first five results through early on. There are a couple of optimisation approaches we’ve recently discussed that could help rewrite this into a more efficient plan.

One thing you can do as a workaround is to use the “old-fashioned” SPARQL OPTIONAL + FILTER + !bound construct for expressing negation:

prefix wdt: <http://www.wikidata.org/prop/direct/>
prefix entity: <http://www.wikidata.org/entity/>
SELECT ?item WHERE {
  ?item wdt:P31 entity:Q5 .
  OPTIONAL { ?item wdt:P18 ?dummy0 }
  FILTER(!bound(?dummy0))
} limit 5

This gives you a fully pipelined plan and runs amazingly fast :).

Best,
Michael

> On 29 May 2015, at 21:27, Stas Malyshev <sma...@wi...> wrote:
>
> Hi!
>
> I am trying to run a query which does a lookup for non-existing links,
> specifically:
>
> prefix wdt: <http://www.wikidata.org/prop/direct/>
> prefix entity: <http://www.wikidata.org/entity/>
> SELECT ?item WHERE {
>   ?item wdt:P31 entity:Q5 .
>   FILTER NOT EXISTS { ?item wdt:P18 ?dummy0 }
> } limit 5
>
> This tries to find all items with a link wdt:P31->entity:Q5 that lack
> the wdt:P18 predicate. The first line has a lot of matches (about 2.7
> million) and a lot of them do have the wdt:P18 predicate. So the query is
> slow. My question is - can it be improved? The reverse query - i.e.
> items having both wdt:P31 and wdt:P18 - is very fast, since I assume it
> uses indexes. But the negative one looks like it doesn't. Can
> anything be done to improve it?
>
> --
> Stas Malyshev
> sma...@wi...
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Bigdata-developers mailing list
> Big...@li...
> https://lists.sourceforge.net/lists/listinfo/bigdata-developers
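
To illustrate the difference between the blocking and the pipelined plan described in the reply above, here is a minimal Python sketch. It is a toy model only, not how the query engine is actually implemented: the data, the names HUMANS and HAS_IMAGE, and both functions are hypothetical stand-ins for the wdt:P31 scan and the wdt:P18 lookup.

```python
from itertools import islice

# Hypothetical toy data: 1000 items match wdt:P31 entity:Q5,
# and every even-numbered one also has a wdt:P18 statement.
HUMANS = [f"item{i}" for i in range(1000)]
HAS_IMAGE = {f"item{i}" for i in range(0, 1000, 2)}  # stands in for an index lookup


def scan_humans(counter):
    """Lazily yield matching items, recording how many rows the scan produced."""
    for item in HUMANS:
        counter["scanned"] += 1
        yield item


def blocking_not_exists(limit):
    """Blocking plan: consume the whole left side before emitting any result."""
    counter = {"scanned": 0}
    left = list(scan_humans(counter))  # all input is materialised first
    result = [item for item in left if item not in HAS_IMAGE]
    return result[:limit], counter["scanned"]


def pipelined_not_bound(limit):
    """Pipelined plan: probe row by row and stop once `limit` results exist."""
    counter = {"scanned": 0}
    stream = (item for item in scan_humans(counter) if item not in HAS_IMAGE)
    return list(islice(stream, limit), ), counter["scanned"]


if __name__ == "__main__":
    print(blocking_not_exists(5))
    print(pipelined_not_bound(5))
```

Both functions return the same five items, but on this toy data the blocking version reads all 1000 rows before answering, while the pipelined version stops after 10. With millions of matches on the first pattern, that gap is what makes LIMIT 5 slow in one plan and fast in the other.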