Re: [Bigdata-developers] Getting triples up to 3 hops from a given set of resources

If you use the 1.3.1 release, you have access to the vertex-centric
API of the GASService [1]. This is designed to support a wide variety
of graph traversal patterns and can be used transparently from SPARQL.
See the examples on that wiki page.

Thanks,
Bryan

[1] http://wiki.bigdata.com/wiki/index.php/RDF_GAS_API
----
Bryan Thompson
Chief Scientist & Founder
SYSTAP, LLC
4501 Tower Road
Greensboro, NC 27410
br...@sy...
http://bigdata.com
http://mapgraph.io

CONFIDENTIALITY NOTICE:  This email and its contents and attachments
are for the sole use of the intended recipient(s) and are confidential
or proprietary to SYSTAP. Any unauthorized review, use, disclosure,
dissemination or copying of this email or its contents or attachments
is prohibited. If you have received this communication in error,
please notify the sender by reply email and permanently delete all
copies of the email and its contents and attachments.

On Fri, Aug 15, 2014 at 10:17 AM, Antoni Myłka
<ant...@qu...> wrote:
> Hi,
>
> I have a list of URIs. I want to get all triples whose subjects are the given URIs. To do that I use:
>
> CONSTRUCT { ?s ?p ?o . } WHERE { ?s ?p ?o. FILTER (?s in ($MY_LIST)) }
>
> I iterate over the result triples. For each triple, if the object is a URI I add it to a set. Then I use that set to generate a similar query to get triples on the second hop from the input URIs, and then another query to get triples on the third hop. In my application code I remember all URIs to make sure I never request information about a resource more than once. I can also filter out predicates that I don't want to follow (like rdf:type).
>
> This seems to work with three queries. The problem is that those lists can get pretty large (thousands of URIs) and the SPARQL query that I push over the network gets large as well. My question is: how can I do this faster? How can I get triples up to n hops from the given set of starting points?
>
> I've tried the following:
>
> CONSTRUCT { ?s ?p ?o . } WHERE {
>  { ?s ?p ?o . FILTER (?s in ($MY_LIST)) }
>  UNION
>  { ?start ?p1 ?s . ?s ?p ?o. FILTER (?start in ($MY_LIST)) }
>  UNION
>  { ?start ?p1 ?o1 . ?o1 ?p2 ?s . ?s ?p ?o . FILTER (?start in ($MY_LIST)) }
> }
>
> CONSTRUCT { ?s ?p1 ?o1 . ?o1 ?p2 ?o2 . ?o2 ?p3 ?o3 . }
> WHERE {
>     ?s ?p1 ?o1 .
>     OPTIONAL {
>         ?o1 ?p2 ?o2 .
>         OPTIONAL {
>              ?o2 ?p3 ?o3 .
>         }
>     }
>     FILTER (?s in ($MY_LIST))
> }
>
> Both are MUCH too slow. Only the iterative version stays within seconds on my dataset. The simple unions and optionals like the ones above go well beyond minutes. Variants that filter out certain predicates, or that try to avoid returning the same triples more than once, seem even slower.
>
> What is the fastest way to do that in Bigdata?
>
> I use bigdata 1.2.3, deployed as a .war on Tomcat.
>
> Cheers,
>
> --
> Antoni Myłka
> Software Engineer
>
> Quantinum AG, Birkenweg 61, CH-3013 Bern - Fon +41 31 388 20 40
> http://www.quantinum.com - Bee for Business
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Bigdata-developers mailing list
> Big...@li...
> https://lists.sourceforge.net/lists/listinfo/bigdata-developers
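
For reference, the iterative approach described in the quoted message can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the thread: `run_construct` is a hypothetical callable that sends a CONSTRUCT query to the endpoint and returns (s, p, o) tuples with every term in N-Triples syntax (URIs as `<...>`, literals as `"..."`), and the names `build_hop_query`, `expand_hops` and `batch_size` are made up here. It also assumes the endpoint accepts a SPARQL 1.1 VALUES clause in place of `FILTER (?s in (...))`, and it splits each hop into batches so that no single query grows too large. Whether this is faster on a given store is untested.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Triple = Tuple[str, str, str]  # terms in N-Triples syntax, e.g. "<http://example.org/a>"


def build_hop_query(subjects: Iterable[str], skip_predicates: Sequence[str] = ()) -> str:
    """Build one CONSTRUCT query that fetches all triples for a batch of subjects."""
    values = " ".join(subjects)
    query = f"CONSTRUCT {{ ?s ?p ?o . }} WHERE {{ VALUES ?s {{ {values} }} ?s ?p ?o ."
    if skip_predicates:
        # Predicates we do not want to return or follow (for example rdf:type).
        query += f" FILTER (?p NOT IN ({', '.join(skip_predicates)}))"
    return query + " }"


def expand_hops(
    start: Iterable[str],
    run_construct: Callable[[str], Iterable[Triple]],  # hypothetical endpoint client
    max_hops: int = 3,
    batch_size: int = 500,
    skip_predicates: Sequence[str] = (),
) -> List[Triple]:
    """Collect triples up to max_hops away from the start URIs, one hop per round."""
    seen = set(start)          # every URI ever requested, so none is asked for twice
    frontier = sorted(seen)    # URIs to use as subjects in the current hop
    triples: List[Triple] = []
    for _ in range(max_hops):
        next_frontier = set()
        for i in range(0, len(frontier), batch_size):
            batch = frontier[i:i + batch_size]
            for s, p, o in run_construct(build_hop_query(batch, skip_predicates)):
                triples.append((s, p, o))
                # Only URI objects are followed; literals start with a double quote.
                if o.startswith("<") and o not in seen:
                    seen.add(o)
                    next_frontier.add(o)
        if not next_frontier:
            break
        frontier = sorted(next_frontier)
    return triples
```

Passing the client in as a callable keeps the traversal logic independent of the HTTP library and makes it easy to try against an in-memory list of triples before pointing it at a real endpoint.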
