[Bigdata-developers] Getting triples up to 3 hops from a given set of resources


Hi,

I have a list of URIs. I want to get all triples whose subjects are the given URIs. To do that I use:

CONSTRUCT { ?s ?p ?o . } WHERE { ?s ?p ?o. FILTER (?s in ($MY_LIST)) }
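For illustration, substituting the URI list for $MY_LIST could look roughly like the Python sketch below. The function name build_hop_query and the character check are assumptions made for this example, not part of the original application code.

```python
# Characters that must not appear unescaped inside a SPARQL IRI reference.
_FORBIDDEN = set('<>"{}|^`\\')


def build_hop_query(uris):
    """Return the single-hop CONSTRUCT query with $MY_LIST filled in.

    `uris` is any iterable of absolute URI strings (hypothetical input).
    """
    uris = sorted(set(uris))  # de-duplicate; sorting keeps the query text stable
    if not uris:
        raise ValueError("need at least one URI")
    for uri in uris:
        if any(ch in _FORBIDDEN or ch.isspace() for ch in uri):
            raise ValueError("URI cannot be embedded in a query: %r" % uri)
    uri_list = ", ".join("<" + uri + ">" for uri in uris)
    return (
        "CONSTRUCT { ?s ?p ?o . } "
        "WHERE { ?s ?p ?o . FILTER (?s IN (" + uri_list + ")) }"
    )
```

The query text grows linearly with the number of URIs, which is where the size problem described further down comes from.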

I iterate over the result triples. For each triple, if the object is a URI, I add it to a set. Then I use that set to generate a similar query to get the triples on the second hop from the input URIs, and then another query to get the triples on the third hop. In my application code I remember all URIs to make sure I never request information about a resource more than once. I can also filter out predicates that I don't want to follow (like rdf:type).
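The loop described above can be sketched in Python roughly as follows. This is only an outline under stated assumptions: the URI marker class, the collect_hops name, and the fetch_triples callback (which stands in for "send one CONSTRUCT query for this set of subjects and parse the result") are all hypothetical and not taken from the actual application.

```python
class URI(str):
    """Marker type for URI terms; plain str objects stand for literals."""


RDF_TYPE = URI("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")


def collect_hops(start_uris, fetch_triples, max_hops=3,
                 skip_predicates=frozenset({RDF_TYPE})):
    """Collect triples up to `max_hops` hops away from `start_uris`.

    `fetch_triples(uris)` is an assumed callback that returns (s, p, o)
    tuples for all triples whose subject is in `uris` (one query per hop).
    """
    requested = set()            # every URI already sent as a subject
    frontier = set(start_uris)   # subjects to ask about in the next query
    result = []
    for _ in range(max_hops):
        frontier -= requested    # never request a resource twice
        if not frontier:
            break
        requested |= frontier
        next_frontier = set()
        for s, p, o in fetch_triples(frontier):
            result.append((s, p, o))
            # Follow only URI objects, and not via skipped predicates.
            if isinstance(o, URI) and p not in skip_predicates:
                if o not in requested:
                    next_frontier.add(o)
        frontier = next_frontier
    return result
```

Triples with a skipped predicate are still returned here; only their objects are left out of the next hop. Whether such triples should be kept at all is a choice the sketch makes arbitrarily.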

This seems to work with three queries. The problem is that those lists can get pretty large (thousands of URIs), so the SPARQL query that I push over the network gets large as well. My question is: how can I do this faster? How can I get all triples up to n hops from a given set of starting points?

I've tried the following two single-query variants, one using UNION and one using nested OPTIONALs:

CONSTRUCT { ?s ?p ?o . } WHERE {
 { ?s ?p ?o . FILTER (?s in ($MY_LIST)) }
 UNION
 { ?start ?p1 ?s . ?s ?p ?o. FILTER (?start in ($MY_LIST)) }
 UNION
 { ?start ?p1 ?o1 . ?o1 ?p2 ?s . ?s ?p ?o . FILTER (?start in ($MY_LIST)) }
}

CONSTRUCT { ?s ?p1 ?o1 . ?o1 ?p2 ?o2 . ?o2 ?p3 ?o3 . }
WHERE {
    ?s ?p1 ?o1 .
    OPTIONAL {
        ?o1 ?p2 ?o2 .
        OPTIONAL {
             ?o2 ?p3 ?o3 .
        }
    }
    FILTER (?s in ($MY_LIST))
}

Both are MUCH too slow. Only the iterative version stays within seconds on my dataset. Simple unions and optionals like the ones above run for minutes or longer. Variants that filter out certain predicates, or that try to avoid returning the same triples more than once, seem even slower.

What is the fastest way to do that in Bigdata?

I use Bigdata 1.2.3, deployed as a .war on Tomcat.

Cheers,

-- 
Antoni Myłka
Software Engineer

Quantinum AG, Birkenweg 61, CH-3013 Bern - Fon +41 31 388 20 40
http://www.quantinum.com - Bee for Business
