My application creates large SPARQL 1.1 Update requests. I would expect a response along the lines of "345 triples modified", but bigdata returns an HTML page that echoes all the data back to me. This is an issue, since the SPARQL updates I generate currently reach 6 megabytes and may grow even bigger.
I've been looking at the code and got to the BigdataRDFContext.SparqlUpdateResponseWriter class and its updateEvent method. Near the end there is a block of code labelled "End of some UPDATE operation", which in turn contains:
In my case e.getUpdate() returns a DeleteInsertGraph, whose toString method generates a massive string with all the deleted and inserted triples. This means pushing 5 megabytes of text back to the query issuer.
Now my question is: why? I'd be inclined to call it a bug and simply remove that <pre> block from the HTML response, or at least surround it with a log.isDebugEnabled() check. I couldn't find any way to work around it that doesn't involve modifying the code. WDYT? What is the best way to proceed? If you agree that this is indeed an issue, I could try to submit a patch.
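A minimal sketch of the proposed guard, in Python for illustration only (the real code is the Java class BigdataRDFContext.SparqlUpdateResponseWriter; the function and parameter names here are hypothetical): the full operation text is echoed only when debug logging is enabled, and a terse summary is returned otherwise.

```python
# Illustrative sketch only. The actual servlet code is Java; the names
# `write_update_event` and `debug_enabled` are invented for this example.

def write_update_event(update_op: str, debug_enabled: bool) -> str:
    """Build the HTML fragment echoed back for one completed UPDATE operation."""
    if debug_enabled:
        # Only in debug mode do we inline the (possibly multi-megabyte)
        # string form of the operation.
        return "<pre>" + update_op + "</pre>"
    # Otherwise report a terse summary instead of the full operation text.
    return "<p>UPDATE operation completed.</p>"

# A stand-in for a multi-megabyte DeleteInsertGraph.toString() result:
big_op = 'INSERT DATA { <http://example.org/s> <http://example.org/p> "v" . } ' * 50_000
assert len(write_update_event(big_op, debug_enabled=False)) < 100
```

With the guard in place, the response size no longer scales with the size of the submitted update.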
The reason we echo back the operation is that there can be multiple operations in a single UPDATE request. Without that echoed-back information, you can't really tell which operation is running when you hit an exception, and (if you are loading a bunch of large files) you can't tell what progress it is making on which load (the development branch currently reports incremental progress for LOAD).
It sounds like we need to be careful about what information we echo back about the operation. That's a bug and we will fix it ASAP.
I do not really recommend sending a large amount of data through a SPARQL Update operation. This puts a burden on the SPARQL query processor, which has to parse the UPDATE, generate the parse tree, translate that into an operator model, and then execute it. If you are going to load a large data set, it is MUCH more efficient to use LOAD and point it at a file, or to use one of the REST API methods to send pure data.
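To illustrate the difference (a sketch; the IRIs and file path are made-up placeholders): an INSERT DATA update carries every triple inside the request text that the query processor must parse, while LOAD ships only a short statement pointing at the data.

```python
# Sketch: compare the size of the SPARQL Update text that the query
# processor must parse in each approach. All IRIs/paths are placeholders.

triples = "\n".join(
    f'<http://example.org/s/{i}> <http://example.org/p> "v{i}" .'
    for i in range(10_000)
)

# Approach 1: inline the data -- the whole payload goes through the
# SPARQL Update parser, parse tree, and operator model.
insert_data = "INSERT DATA {\n" + triples + "\n}"

# Approach 2: LOAD -- the update text stays tiny regardless of data size;
# the server reads the file's contents directly.
load = "LOAD <file:///var/data/dump.nt> INTO GRAPH <http://example.org/g>"

print(len(insert_data))  # grows linearly with the number of triples
print(len(load))         # constant, a few dozen bytes
```

The update text in the second form stays constant-size no matter how much data is loaded, which is the efficiency argument above.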
This advice is in the category of performance tuning. It should work anyway, but you are not doing things in the most efficient manner.
Ticket filed at https://sourceforge.net/apps/trac/bigdata/ticket/613 (SPARQL UPDATE response inlines large DELETE or INSERT triple graphs).
I've proposed a fix at and committed this change to the development branch. You can either check out the development branch (branches/BIGDATA_RELEASE_1_2_0) or make the change in your local code as described at . Let me know how it works.
Everything did work. Right now our largest query is 6 MB, and the AST results in a visible peak on the heap usage graph (about 500 MB), but nothing breaks, and the processing time is about 14 seconds on our test machine, which isn't a tragedy. It's just that I thought it could be made a little faster. Thanks for the fix.
We're using the UPDATE for two reasons:
1. The code that generates those updates sometimes has to delete old triples as well, so it's not always only INSERT.
2. We want to stay standards-compliant, so it keeps working with any SPARQL 1.1 endpoint (unless we stumble upon really strong reasons for vendor-specific optimizations).
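For reference, a single standards-compliant SPARQL 1.1 Update request can already carry both the deletes and the inserts, either as two operations separated by a semicolon or as one combined DELETE/INSERT WHERE operation. A sketch (all IRIs here are invented examples):

```python
# Sketch of a standards-compliant request that both deletes old triples
# and inserts new ones. All IRIs are invented placeholders.

delete_part = 'DELETE DATA { <http://example.org/s> <http://example.org/p> "old" . }'
insert_part = 'INSERT DATA { <http://example.org/s> <http://example.org/p> "new" . }'

# SPARQL 1.1 Update allows multiple operations in one request, separated
# by semicolons -- which is also why the server wants to report which
# operation an error or progress message refers to.
request = delete_part + " ;\n" + insert_part

# Equivalent single-operation form using DELETE ... INSERT ... WHERE:
combined = (
    'DELETE { ?s <http://example.org/p> "old" . }\n'
    'INSERT { ?s <http://example.org/p> "new" . }\n'
    'WHERE  { ?s <http://example.org/p> "old" . }'
)
```

Either form works against any conformant SPARQL 1.1 endpoint, so staying with UPDATE does not preclude doing the delete and insert in one round trip.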
That being said…
… does indeed look like an interesting avenue for further optimization if necessary. Thanks for the tip.
I'd gladly test the war from the current trunk, but it doesn't seem to work for me out of the box at the moment. I described the issue on
IMHO it's a bug. I'd be grateful for a comment.
Yes, that's an unintended consequence of the high availability support. I'm not sure what the right answer is here. HA support for the Journal requires zookeeper and the jini/river JARs. Those have not been standard dependencies for the standalone deployment, and they should not be dependencies when the NanoSparqlServer is deployed as an embedded component. I will probably refactor the code into a supporting class in order to avoid the run-time dependency when the StatusServlet is used but the Journal is not highly available.