From: Bryan T. <br...@sy...> - 2014-08-15 14:46:10
If you move to the 1.3.1 release, you have access to the vertex-centric API of the GASService [1]. It is designed to support a wide variety of graph traversal patterns and can be used transparently from SPARQL. See the examples on that wiki page.

Thanks,
Bryan

[1] http://wiki.bigdata.com/wiki/index.php/RDF_GAS_API

----
Bryan Thompson
Chief Scientist & Founder
SYSTAP, LLC
4501 Tower Road
Greensboro, NC 27410
br...@sy...
http://bigdata.com
http://mapgraph.io

CONFIDENTIALITY NOTICE: This email and its contents and attachments are for the sole use of the intended recipient(s) and are confidential or proprietary to SYSTAP. Any unauthorized review, use, disclosure, dissemination or copying of this email or its contents or attachments is prohibited. If you have received this communication in error, please notify the sender by reply email and permanently delete all copies of the email and its contents and attachments.

On Fri, Aug 15, 2014 at 10:17 AM, Antoni Myłka <ant...@qu...> wrote:
> Hi,
>
> I have a list of URIs. I want to get all triples whose subjects are the given URIs. To do that I use:
>
> CONSTRUCT { ?s ?p ?o . } WHERE { ?s ?p ?o. FILTER (?s in ($MY_LIST)) }
>
> I iterate over the result triples. For each triple, if the object is a URI, I add it to a set. Then I use that set to generate a similar query that gets the triples on the second hop from the input URIs, and then another query for the triples on the third hop. In my application code I remember all URIs to make sure I never request information about a resource more than once. I can also filter out predicates that I don't want to follow (like rdf:type).
>
> This seems to work with three queries. The problem is that those lists can get pretty large (thousands of URIs), and the SPARQL query that I push over the network gets large as well. My question is: how can I do this faster? How do I get the triples up to n hops from a given set of starting points?
>
> I've tried the following:
>
> CONSTRUCT { ?s ?p ?o . } WHERE {
>   { ?s ?p ?o .
>     FILTER (?s in ($MY_LIST)) }
>   UNION
>   { ?start ?p1 ?s . ?s ?p ?o. FILTER (?start in ($MY_LIST)) }
>   UNION
>   { ?start ?p1 ?o1 . ?o1 ?p2 ?s . ?s ?p ?o . FILTER (?start in ($MY_LIST)) }
> }
>
> CONSTRUCT { ?s ?p1 ?o1 . ?o1 ?p2 ?o2 . ?o2 ?p3 ?o3 . }
> WHERE {
>   ?s ?p1 ?o1 .
>   OPTIONAL {
>     ?o1 ?p2 ?o2 .
>     OPTIONAL {
>       ?o2 ?p3 ?o3 .
>     }
>   }
>   FILTER (?s in ($MY_LIST))
> }
>
> Both are MUCH too slow. Only the iterative version stays within seconds on my dataset. Simple unions and optionals like the ones above take many minutes; variants that filter out certain predicates, or that try to avoid returning the same triples more than once, seem even slower.
>
> What is the fastest way to do this in Bigdata?
>
> I use bigdata 1.2.3, deployed as a .war on Tomcat.
>
> Cheers,
>
> --
> Antoni Myłka
> Software Engineer
>
> Quantinum AG, Birkenweg 61, CH-3013 Bern - Fon +41 31 388 20 40
> http://www.quantinum.com - Bee for Business
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Bigdata-developers mailing list
> Big...@li...
> https://lists.sourceforge.net/lists/listinfo/bigdata-developers
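For readers on 1.3.1 or later, the GAS-based approach Bryan points to might look roughly like the query below. This is a minimal sketch under several assumptions and is not copied from the wiki. It is modeled on the breadth-first search (BFS) examples on the RDF_GAS_API page. The starting URIs under http://example.org/ are hypothetical placeholders for $MY_LIST. Repeating gas:in once per starting vertex is an assumption. The exact relationship between gas:maxIterations and hop depth should be checked against the wiki for the release in use. The idea is that the BFS runs inside the database, binding ?s to each visited vertex and ?depth to its BFS depth. The client then sends one query instead of one query per hop.

```sparql
PREFIX gas: <http://www.bigdata.com/rdf/gas#>

# Sketch only: intended to return the triples on hops 1, 2 and 3 from the
# (hypothetical) starting URIs, i.e. triples whose subject is at BFS depth 0..2.
CONSTRUCT { ?s ?p ?o . }
WHERE {
  SERVICE gas:service {
    gas:program gas:gasClass "com.bigdata.rdf.graph.analytics.BFS" .
    # Hypothetical starting vertices standing in for $MY_LIST;
    # repeating gas:in for several starts is an assumption to verify.
    gas:program gas:in <http://example.org/start1> .
    gas:program gas:in <http://example.org/start2> .
    gas:program gas:out ?s .          # bound to each visited vertex
    gas:program gas:out1 ?depth .     # bound to that vertex's BFS depth
    gas:program gas:maxIterations 3 . # optional limit on the BFS expansion
  }
  # Expand each visited vertex into its outgoing triples.
  ?s ?p ?o .
  # Keep subjects at depth 0, 1 and 2, i.e. hops 1 through 3.
  FILTER (?depth < 3)
}
```

Unlike the client-side loop in the question, this sketch does not stop the traversal from following predicates such as rdf:type. Any such restriction would need whatever traversal options the wiki documents for the release in use.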