From: Jeremy J C. <jj...@sy...> - 2015-04-14 17:22:44
I found a CONSTRUCT and LOAD much more performant than a DELETE/INSERT, and was wondering why, and whether there is anything new (to me) about the Blazegraph architecture that I should understand.

=====

I had a graph for which I wished to rename almost all URIs.
The graph had about 3M triples.
I was working in AWS on

I constructed a temporary graph with a rename mapping and then tried the following update query:

    DELETE {
      GRAPH <%(abox)s> { ?oldS ?oldP ?oldO }
    }
    INSERT {
      GRAPH <%(abox)s> { ?newS ?newP ?newO }
    }
    WHERE {
      GRAPH <%(abox)s> { ?oldS ?oldP ?oldO }
      GRAPH <x-eg:temporary-graph> { ?oldS <x-eg:replaced-by> ?newS }
      GRAPH <x-eg:temporary-graph> { ?oldP <x-eg:replaced-by> ?newP }
      {
        GRAPH <x-eg:temporary-graph> { ?oldO <x-eg:replaced-by> ?newO }
      } UNION {
        GRAPH <%(abox)s> { ?oldS ?oldP ?oldO }
        FILTER ( isLiteral(?oldO) )
        BIND ( ?oldO AS ?newO )
      }
    }

where <%(abox)s> is a variable.

At the point where we perform this query we have exclusive access to the Blazegraph process.

It took over 4 hours. For approximately the first hour the query execution stats showed some change; for the last 3 hours they showed no change at all (the status page in the NSS display is not very useful with these update queries).

After 4 hours I got bored. Cancel did not work, so I killed Blazegraph and restarted.

I then rewrote the code as follows.

I wrote a CONSTRUCT query:

    CONSTRUCT {
      ?newS ?newP ?newO
    }
    WHERE {
      GRAPH <%(abox)s> { ?oldS ?oldP ?oldO }
      GRAPH <x-eg:temporary-graph> { ?oldS <x-eg:replaced-by> ?newS }
      GRAPH <x-eg:temporary-graph> { ?oldP <x-eg:replaced-by> ?newP }
      {
        GRAPH <x-eg:temporary-graph> { ?oldO <x-eg:replaced-by> ?newO }
      } UNION {
        GRAPH <%(abox)s> { ?oldS ?oldP ?oldO }
        FILTER ( isLiteral(?oldO) )
        BIND ( ?oldO AS ?newO )
      }
    }

This created a temporary file.

I replaced the DELETE part with

    DROP GRAPH <%(abox)s>

and the INSERT with

    LOAD <file://%(tmpfile)s> INTO GRAPH <%(abox)s>

====

The rewritten code took only a few minutes (less than 5 in total).

I was expecting some improvement, but not as much as I saw.

My understanding is that each of the three operations is atomic and isolated, but I lost the guarantee linking the three (which I did not need, since I had an exclusive lock at a higher level).

Was it the atomicity that cost so much?

Jeremy
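
For reference, the three-step rewrite described above could be templated roughly as follows. This is only a sketch: it assumes the `%(abox)s` and `%(tmpfile)s` placeholders are filled with Python's `%` formatting, and the function name, graph URI and file path are hypothetical. Running the CONSTRUCT and writing its result to the temporary file is left out, since the post does not say how that was done.

```python
# Sketch only: assumes Python %-style templating, as the %(abox)s
# placeholders in the post suggest. All names here are hypothetical.

CONSTRUCT_TEMPLATE = """
CONSTRUCT { ?newS ?newP ?newO }
WHERE {
  GRAPH <%(abox)s> { ?oldS ?oldP ?oldO }
  GRAPH <x-eg:temporary-graph> { ?oldS <x-eg:replaced-by> ?newS }
  GRAPH <x-eg:temporary-graph> { ?oldP <x-eg:replaced-by> ?newP }
  {
    GRAPH <x-eg:temporary-graph> { ?oldO <x-eg:replaced-by> ?newO }
  } UNION {
    GRAPH <%(abox)s> { ?oldS ?oldP ?oldO }
    FILTER ( isLiteral(?oldO) )
    BIND ( ?oldO AS ?newO )
  }
}
"""

DROP_TEMPLATE = "DROP GRAPH <%(abox)s>"
LOAD_TEMPLATE = "LOAD <file://%(tmpfile)s> INTO GRAPH <%(abox)s>"


def build_rename_steps(abox, tmpfile):
    """Return the three SPARQL strings in execution order.

    Step 1 is a query (its result is saved to tmpfile by the caller);
    steps 2 and 3 are separate updates, so nothing ties the three
    together atomically.
    """
    params = {"abox": abox, "tmpfile": tmpfile}
    return [
        CONSTRUCT_TEMPLATE % params,  # 1. build the renamed triples
        DROP_TEMPLATE % params,       # 2. drop the old graph
        LOAD_TEMPLATE % params,       # 3. bulk-load the renamed triples
    ]


if __name__ == "__main__":
    # Hypothetical graph URI and path, for illustration only.
    for step in build_rename_steps("x-eg:abox", "/tmp/renamed.nt"):
        print(step.strip())
        print("-----")
```

Keeping the three statements as separate strings mirrors the point made above: each runs as its own operation, and any guarantee across them has to come from a lock held at a higher level.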