This list is closed, nobody may subscribe to it.
Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec
-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|----
2010 |     | 19  | 8   | 25  | 16  | 77  | 131 | 76  | 30  | 7   | 3   |
2011 |     |     |     |     | 2   | 2   | 16  | 3   | 1   |     | 7   | 7
2012 | 10  | 1   | 8   | 6   | 1   | 3   | 1   |     | 1   |     | 8   | 2
2013 | 5   | 12  | 2   | 1   | 1   | 1   | 22  | 50  | 31  | 64  | 83  | 28
2014 | 31  | 18  | 27  | 39  | 45  | 15  | 6   | 27  | 6   | 67  | 70  | 1
2015 | 3   | 18  | 22  | 121 | 42  | 17  | 8   | 11  | 26  | 15  | 66  | 38
2016 | 14  | 59  | 28  | 44  | 21  | 12  | 9   | 11  | 4   | 2   | 1   |
2017 | 20  | 7   | 4   | 18  | 7   | 3   | 13  | 2   | 4   | 9   | 2   | 5
2018 |     |     |     | 2   |     |     |     |     |     |     |     |
2019 |     |     | 1   |     |     |     |     |     |     |     |     |
From: Bryan T. <br...@bl...> - 2016-09-20 16:52:17
There is a different parameter for the JVM to specify the maximum native memory allocation. Here is an example from the wiki. You will want to use values specific to your machine.

# Sample JVM options showing allocation of a 4GB managed object heap
# and allowing a 3GB native heap. Always use the -server mode JVM for
# Blazegraph.
-server -Xmx4G -XX:MaxDirectMemorySize=3000m

Thanks,
Bryan

On Mon, Sep 19, 2016 at 9:57 AM, Eric Scott <eri...@at...> wrote:
> Hi all -
>
> We've been running a copy of the Wikidata stand-alone for several months
> now with relatively few problems, but this weekend we abruptly started
> having memory issues while executing a query against the WD triple store.
> It seems to manifest first as a MemoryManagerClosedException.
>
> See below for a stack trace of one fairly typical error that gets logged.
>
> We've been running with the default RWStore parameters included for the
> Wikidata stand-alone. Upping the heap size from 8G to 12G did not help.
> This is running on a server that has 128GB of memory, with plenty free.
>
> If someone could provide me with guidance, I'd greatly appreciate it.
>
> Cheers,
>
> Eric Scott
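For readers wondering where these flags go in practice: a minimal sketch of a standalone launch that combines them with the executable-jar invocation shown elsewhere on this list (the jar name, heap size, and property file name are illustrative; size both limits for your own machine so heap plus direct memory stays well inside physical RAM):

java -server -Xmx4G -XX:MaxDirectMemorySize=3000m -Dbigdata.propertyFile=RWStore.properties -jar blazegraph.jar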
From: Eric S. <eri...@at...> - 2016-09-19 17:12:30
Hi all -

We've been running a copy of the Wikidata stand-alone for several months now with relatively few problems, but this weekend we abruptly started having memory issues while executing a query against the WD triple store. It seems to manifest first as a MemoryManagerClosedException.

See below for a stack trace of one fairly typical error that gets logged.

We've been running with the default RWStore parameters included for the Wikidata stand-alone. Upping the heap size from 8G to 12G did not help. This is running on a server that has 128GB of memory, with plenty free.

If someone could provide me with guidance, I'd greatly appreciate it.

Cheers,

Eric Scott

ERROR: Haltable.java:469: com.bigdata.bop.join.PipelineJoin$JoinTask{ joinOp=com.bigdata.bop.join.PipelineJoin[2]()[ BOp.bopId=2, JoinAnnotations.constraints=null, AST2BOpBase.simpleJoin=true, BOp.evaluationContext=ANY, AccessPathJoinAnnotations.predicate=com.bigdata.rdf.spo.SPOPredicate[1](s=null, p=null, o=null)[ IPredicate.relationName=[wdq.spo], IPredicate.timestamp=1474255803369, BOp.bopId=1, AST2BOpBase.estimatedCardinality=1009418509, AST2BOpBase.originalIndex=SPO, IPredicate.flags=[KEYS,VALS,READONLY,PARALLEL]]]} : isFirstCause=true : com.bigdata.rwstore.sector.MemoryManagerClosedException
com.bigdata.rwstore.sector.MemoryManagerClosedException
    at com.bigdata.rwstore.sector.MemoryManager.assertOpen(MemoryManager.java:110)
    at com.bigdata.rwstore.sector.MemoryManager.allocate(MemoryManager.java:671)
    at com.bigdata.rwstore.sector.AllocationContext.allocate(AllocationContext.java:195)
    at com.bigdata.rwstore.sector.AllocationContext.allocate(AllocationContext.java:169)
    at com.bigdata.rwstore.sector.AllocationContext.allocate(AllocationContext.java:159)
    at com.bigdata.rwstore.sector.AllocationContext.alloc(AllocationContext.java:359)
    at com.bigdata.rwstore.PSOutputStream.save(PSOutputStream.java:335)
    at com.bigdata.rwstore.PSOutputStream.getAddr(PSOutputStream.java:416)
    at com.bigdata.bop.solutions.SolutionSetStream.put(SolutionSetStream.java:297)
    at com.bigdata.bop.engine.LocalNativeChunkMessage.<init>(LocalNativeChunkMessage.java:213)
    at com.bigdata.bop.engine.LocalNativeChunkMessage.<init>(LocalNativeChunkMessage.java:147)
    at com.bigdata.bop.engine.StandaloneChunkHandler.handleChunk(StandaloneChunkHandler.java:90)
    at com.bigdata.bop.engine.ChunkedRunningQuery$HandleChunkBuffer.outputChunk(ChunkedRunningQuery.java:1699)
    at com.bigdata.bop.engine.ChunkedRunningQuery$HandleChunkBuffer.addReorderAllowed(ChunkedRunningQuery.java:1628)
    at com.bigdata.bop.engine.ChunkedRunningQuery$HandleChunkBuffer.add(ChunkedRunningQuery.java:1569)
    at com.bigdata.bop.engine.ChunkedRunningQuery$HandleChunkBuffer.add(ChunkedRunningQuery.java:1453)
    at com.bigdata.relation.accesspath.UnsyncLocalOutputBuffer.handleChunk(UnsyncLocalOutputBuffer.java:59)
    at com.bigdata.relation.accesspath.UnsyncLocalOutputBuffer.handleChunk(UnsyncLocalOutputBuffer.java:14)
    at com.bigdata.relation.accesspath.AbstractUnsynchronizedArrayBuffer.overflow(AbstractUnsynchronizedArrayBuffer.java:287)
    at com.bigdata.relation.accesspath.AbstractUnsynchronizedArrayBuffer.add2(AbstractUnsynchronizedArrayBuffer.java:215)
    at com.bigdata.relation.accesspath.AbstractUnsynchronizedArrayBuffer.add(AbstractUnsynchronizedArrayBuffer.java:173)
    at com.bigdata.bop.join.PipelineJoin$JoinTask$AccessPathTask.handleJoin2(PipelineJoin.java:1868)
    at com.bigdata.bop.join.PipelineJoin$JoinTask$AccessPathTask.call(PipelineJoin.java:1684)
    at com.bigdata.bop.join.PipelineJoin$JoinTask$BindingSetConsumerTask.runOneTask(PipelineJoin.java:1086)
    at com.bigdata.bop.join.PipelineJoin$JoinTask$BindingSetConsumerTask.call(PipelineJoin.java:995)
    at com.bigdata.bop.join.PipelineJoin$JoinTask.consumeSource(PipelineJoin.java:728)
    at com.bigdata.bop.join.PipelineJoin$JoinTask.call(PipelineJoin.java:623)
    at com.bigdata.bop.join.PipelineJoin$JoinTask.call(PipelineJoin.java:382)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at com.bigdata.concurrent.FutureTaskMon.run(FutureTaskMon.java:63)
    at com.bigdata.bop.engine.ChunkedRunningQuery$ChunkTask.call(ChunkedRunningQuery.java:1346)
    at com.bigdata.bop.engine.ChunkedRunningQuery$ChunkTaskWrapper.run(ChunkedRunningQuery.java:926)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at com.bigdata.concurrent.FutureTaskMon.run(FutureTaskMon.java:63)
    at com.bigdata.bop.engine.ChunkedRunningQuery$ChunkFutureTask.run(ChunkedRunningQuery.java:821)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
From: Aymeric B. <aym...@pe...> - 2016-09-13 15:01:47
Hello,

I have been trying to reach Blazegraph support for 3 weeks now (website, email, phone, Twitter) in vain, so maybe I will have some answers via this mailing list. Basically, here are my questions:

- What is the difference between the commercial, enterprise, and OEM licensing?
- How do I get the GPU acceleration feature?
- How does the 7-day developer support work? Is it 7 full days after the purchase, or can it be spread over the year?

Thanks,
Aymeric
From: Jasper K. <jas...@gm...> - 2016-09-05 05:46:01
Hello,

The question below is solved, but as it was all written down anyway, maybe it will be useful for others. The solution is:

-Djetty.start.timeout=60

As can be found here: https://wiki.blazegraph.com/wiki/index.php/NanoSparqlServer

I created a Blazegraph database through the GitHub code by running the following command:

/ssd/jasper/BLAZEGRAPH_RELEASE_2_1_1/scripts/dataLoader.sh -verbose -durableQueues -namespace MicroDB MicroDB.properties ~/GZ/*ttl.gz

The property file that I used was as follows:

com.bigdata.namespace.kb.spo.com.bigdata.btree.BTree.branchingFactor=1024
com.bigdata.relation.container=MicroDB
com.bigdata.journal.AbstractJournal.bufferMode=DiskRW
com.bigdata.journal.AbstractJournal.file=MicroDB.jnl
com.bigdata.journal.AbstractJournal.initialExtent=209715200
com.bigdata.rdf.store.AbstractTripleStore.vocabularyClass=com.bigdata.rdf.vocab.DefaultBigdataVocabulary
com.bigdata.rdf.store.AbstractTripleStore.textIndex=false
com.bigdata.btree.BTree.branchingFactor=128
com.bigdata.namespace.kb.lex.com.bigdata.btree.BTree.branchingFactor=400
com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.NoAxioms
com.bigdata.service.AbstractTransactionService.minReleaseAge=1
com.bigdata.rdf.sail.truthMaintenance=false
com.bigdata.journal.AbstractJournal.maximumExtent=209715200
com.bigdata.rdf.sail.namespace=MicroDB
com.bigdata.relation.class=com.bigdata.rdf.store.LocalTripleStore
com.bigdata.rdf.store.AbstractTripleStore.quads=false
com.bigdata.relation.namespace=MicroDB
com.bigdata.btree.writeRetentionQueue.capacity=4000
com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false

After loading, all 4961 files received the .good extension. I then tried to start the database of ±413 gigabytes using the following command:

/ssd/jasper/BLAZEGRAPH_RELEASE_2_1_1/scripts/startBlazegraph.sh 9999 kb /ssd/jasper/MicroDB/MicroDB.properties

This is with a slightly modified startBlazegraph.sh script:

BASE_DIR=`dirname $0`
PORT=$1
NAMESPACE=$2
PROPERTIES_FILE=$3
"$BASE_DIR"/prog.sh com.bigdata.rdf.sail.webapp.NanoSparqlServer $PORT $NAMESPACE $PROPERTIES_FILE

However, the end result is:

WARN : NanoSparqlServer.java:517: Starting NSS
WARN : ServiceProviderHook.java:171: Running.
Server did not start.
FATAL: NanoSparqlServer.java:538: Server did not start.
ERROR: Banner.java:160: Uncaught exception in thread
java.util.concurrent.TimeoutException
    at com.bigdata.rdf.sail.webapp.NanoSparqlServer.awaitServerStart(NanoSparqlServer.java:528)
    at com.bigdata.rdf.sail.webapp.NanoSparqlServer.main(NanoSparqlServer.java:482)

I thought it could be due to the GitHub version and thus tried the jar file from GitHub and the jar file from SourceForge:

java -server -Xmx8g -Dbigdata.propertyFile=MicroDB.properties -jar blazegraph-jar-2.1.1.jar
java -server -Xmx8g -Dbigdata.propertyFile=MicroDB.properties -jar blazegraph.jar

Unfortunately, with the same result:

WARN : NanoSparqlServer.java:517: Starting NSS
WARN : ServiceProviderHook.java:171: Running.
Server did not start.
FATAL: NanoSparqlServer.java:538: Server did not start.
ERROR: Banner.java:160: Uncaught exception in thread
java.util.concurrent.TimeoutException
    at com.bigdata.rdf.sail.webapp.NanoSparqlServer.awaitServerStart(NanoSparqlServer.java:528)
    at com.bigdata.rdf.sail.webapp.StandaloneNanoSparqlServer.main(StandaloneNanoSparqlServer.java:150)

Any ideas on how to start the database? Reloading it all again will take quite some time.

Thanks,
Jasper
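For anyone hitting the same startup timeout on a large journal: a minimal sketch of passing the Jetty start timeout on the same standalone-jar command used above (the 60-second value comes from the wiki page cited; very large journals may need an even higher value):

java -server -Xmx8g -Djetty.start.timeout=60 -Dbigdata.propertyFile=MicroDB.properties -jar blazegraph.jar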
From: Joakim S. <joa...@bl...> - 2016-08-30 17:48:18
Hi,

In the last release, is it possible to load bz2 files using http://localhost:9999/blazegraph/dataloader? It would save time not to have to uncompress them before loading.
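If bz2 turns out not to be supported by the data loader, one possible interim workaround is to recompress to gzip on the fly with standard shell tools, since gzipped Turtle loads fine elsewhere on this list (the file name below is illustrative):

bzcat data.ttl.bz2 | gzip > data.ttl.gz

This avoids keeping a full uncompressed copy on disk before loading.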
From: Bryan T. <br...@bl...> - 2016-08-26 02:53:10
Edgar,

The upcoming 2.1.4 / 2.2.0 releases both include the ability to place the intermediate solutions on either the native or the managed heap. The default patterns are either/or. However, it is possible to configure any pattern. For example, the pattern could use native memory if there are more than X solutions on the managed heap or GC time is more than some threshold. This is a useful tool for managing the heap/performance tradeoff, because much of the memory burden of running queries is the managed object heap. If you allow large heaps, then make sure you have the memory available for those heaps and that swapping does not occur (swappiness is zero, etc.).

With respect to the timings you cite below, performance of the analytic mode is strongly dependent on whether the query has significant memory demand or can benefit from increased parallelism (especially for distinct solution filters, which are only concurrent for the non-analytic mode). The analytic mode is really designed for queries with larger hash joins. The new ability to put the intermediate solutions in native memory addresses the memory burden from in-flight intermediate solutions. As indicated above, this decision about managed vs. native heap can be made dynamically by overriding the default policy, and it is orthogonal to the choice of analytic or non-analytic joins.

I also suggest that you look at count(*) or explain versions of queries when reporting timings. Often the query engine runs faster than the ability of the client to drain the solutions. Currently those solutions dwell on the managed object heap until they are drained by the client. We will address this aspect in a subsequent release. However, we have observed that 50% of the evaluation time for queries with modest output cardinality (10,000 rows) can be waiting on the client to drain the solutions. If there are concurrent high-output-cardinality queries, then the GC pressure arising from a slow client can slow down overall evaluation.

You could also look at increasing the operator-level parallelism. The main place where the analytic mode is slower is a distinct solutions filter. For a quads mode application, we use a distinct solutions filter implicitly for each default graph triple pattern in order to enforce the RDF merge semantics. However, some applications (including, I believe, yours) ensure that the same triple does not appear in more than one named graph. In such cases you can disable this distinct solutions filter on the quads mode default graph access paths and enjoy improved within-query parallelism as a result.

As a general guideline, you can hide latency under concurrency. If you are getting results which are not consistent with this, then the system is probably at some extreme. This could be limited within-query parallelism, swapping, exceeding the viable disk bandwidth, etc. I am not sure what the limiting factor is for your queries, but I would suspect any of: slow client draining results, the distinct solutions filter for the quads access path, swapping, etc.

Thanks,
Bryan

On Thursday, August 25, 2016, Edgar Rodriguez-Diaz <ed...@sy...> wrote:
> Hi,
>
> We've been experimenting with analytic mode using a dataset with ~12M quads.
> While running a particular query, a bit complex, it produces the following
> running times consistently:
>
> Blazegraph version: 2.0.1
>
> Using Java Heap
> Mem configurations tried: (Xms4g, Xmx8g)
>
> Concurrency Level | Approx avg. execution time
> ------------------|---------------------------
> 1                 | 3.5
> 3                 | 5.0
> 5                 | 5.5
> 9                 | 8
> 10                | 9
>
> Using Analytic Mode
> Mem configurations tried:
> (Xms2g, Xmx2g, -XX:MaxDirectMemorySize=6g),
> (Xms4g, Xmx4g, -XX:MaxDirectMemorySize=4g)
>
> Concurrency Level | Approx avg. execution time
> ------------------|---------------------------
> 1                 | 3.09
> 3                 | 5.95
> 5                 | 10.58
> 9                 | 19.24
> 10                | 22.87
>
> On levels of concurrency > 3, queries are highly penalized on performance,
> being at least 2x slower. I know that there may be an overhead on performance
> for BG doing its own memory management, but 2x slower queries on relatively
> low levels of concurrency seems like a bit too high.
>
> So, the questions are:
> Is the previous outcome something expected or an exception? If it's an
> exception, I could follow up with a bug report.
> What can be expected of the performance of concurrent queries while in
> analytic mode?
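For context, the analytic mode discussed here can also be toggled per query with a query hint rather than globally; a minimal sketch, assuming the standard Blazegraph query-hints vocabulary (check the QueryHints wiki page for the exact hint names in your release):

PREFIX hint: <http://www.bigdata.com/queryHints#>
SELECT (COUNT(*) AS ?n)
WHERE {
  # ask the engine to evaluate this query with the analytic (native memory) mode
  hint:Query hint:analytic "true" .
  ?s ?p ?o .
}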
From: Edgar Rodriguez-D. <ed...@sy...> - 2016-08-26 00:02:19
Hi,

We've been experimenting with analytic mode using a dataset with ~12M quads. While running a particular query, a bit complex, it produces the following running times consistently:

Blazegraph version: 2.0.1

Using Java Heap
Mem configurations tried: (Xms4g, Xmx8g)

Concurrency Level | Approx avg. execution time
------------------|---------------------------
1                 | 3.5
3                 | 5.0
5                 | 5.5
9                 | 8
10                | 9

Using Analytic Mode
Mem configurations tried:
(Xms2g, Xmx2g, -XX:MaxDirectMemorySize=6g),
(Xms4g, Xmx4g, -XX:MaxDirectMemorySize=4g)

Concurrency Level | Approx avg. execution time
------------------|---------------------------
1                 | 3.09
3                 | 5.95
5                 | 10.58
9                 | 19.24
10                | 22.87

On levels of concurrency > 3, queries are highly penalized on performance, being at least 2x slower. I know that there may be an overhead on performance for BG doing its own memory management, but 2x slower queries on relatively low levels of concurrency seems a bit too high.

So, the questions are:
Is the previous outcome something expected or an exception? If it's an exception, I could follow up with a bug report.
What can be expected of the performance of concurrent queries while in analytic mode?
From: Bryan T. <br...@bl...> - 2016-08-24 19:50:45
Great. Glad to hear it.

Thanks,
Bryan

On Wed, Aug 24, 2016 at 3:41 PM, Kevin Ford <kf...@ar...> wrote:
> Thanks, Bryan.
>
> Reducing the amount of max memory, thereby avoiding swapping, had a
> positive impact.
>
> Yours,
> Kevin
From: Kevin F. <kf...@ar...> - 2016-08-24 19:41:48
Thanks, Bryan.

Reducing the amount of max memory, thereby avoiding swapping, had a positive impact.

Yours,
Kevin

On 8/19/16 16:06, Bryan Thompson wrote:
> You might also give it less memory for a small data set. What is
> important is making sure that there is NO swapping. Java handles
> swapping very poorly because it needs to do memory scans during GC.
>
> Thanks,
> Bryan
From: Bryan T. <br...@bl...> - 2016-08-24 17:23:13
All, we are at a code freeze for a 2.2.0 RC. Brad will prepare the RC branch shortly. Once that is done, all continued development should be in master.

Thanks,
Bryan
From: Bryan T. <br...@bl...> - 2016-08-19 21:06:11
You might also give it less memory for a small data set. What is important is making sure that there is NO swapping. Java handles swapping very poorly because it needs to do memory scans during GC.

Thanks,
Bryan

On Fri, Aug 19, 2016 at 4:36 PM, Kevin Ford <kf...@ar...> wrote:
> Thanks, Bryan.
>
> We're looking at whether we can increase the memory on the machine
> (currently 8gb, but has other applications running on it) and see if
> that improves matters.
>
> Yours,
> Kevin
From: Kevin F. <kf...@ar...> - 2016-08-19 21:04:34
Thanks, Bryan.

We're looking at whether we can increase the memory on the machine (currently 8gb, but has other applications running on it) and see if that improves matters.

Yours,
Kevin

On 8/18/16 12:32, Bryan Thompson wrote:
> Kevin,
>
> I would not anticipate the kinds of times that you are reporting below
> for these update requests. Assuming that URI and URI/other are not all
> of the triples in your graph, these should be very fast updates.
>
> Can you confirm that the machine has at least those 4GB of RAM available
> beyond the other active processes?
>
> Note that Linux distributions will begin to swap out processes once 50%
> of the physical memory has been used unless you set swappiness to zero
> (a kernel parameter - see the Blazegraph wiki for more information on
> this topic).
>
> Bryan
From: Bryan T. <br...@bl...> - 2016-08-18 17:54:45
Kevin,

I would not anticipate the kinds of times that you are reporting below for these update requests. Assuming that URI and URI/other are not all of the triples in your graph, these should be very fast updates.

Can you confirm that the machine has at least those 4GB of RAM available beyond the other active processes?

Note that Linux distributions will begin to swap out processes once 50% of the physical memory has been used unless you set swappiness to zero (a kernel parameter - see the Blazegraph wiki for more information on this topic).

Bryan

On Wednesday, August 17, 2016, Kevin Ford <kf...@ar...> wrote:
> Dear All,
>
> I have a question about SPARQL DELETE/INSERT performance.
>
> We have a job that issues two DELETE queries followed by an INSERT
> query, per request, to keep our triplestore up to date. The pattern is
> as follows:
>
> DELETE { <URI> ?p ?o };
> DELETE { <URI/other> ?p ?o };
> INSERT {
>   <URI> <new> '1' .
>   <URI> <new> '2' .
>   <URI> <new> '3' .
>   <URI/other> <new> '1' .
> }
>
> We've noted two things:
>
> 1) If a store is empty, this is relatively fast.
> 2) If the store is configured for automatic inferencing, it is slower
>    than a store without that feature activated.
>
> Neither of those observations seems particularly surprising, but each of
> those two DELETE statements and the one INSERT takes more than 3 seconds
> against a store with only 4.6 million triples. See below for a sample
> of the output.
>
> Blazegraph is allocated 4g of memory.
>
> We're working within a framework that produces these series of queries
> so, before making any custom modifications, I was wondering if this
> performance is to be expected? If not, would it point to a configuration
> issue of some kind, or something else?
>
> Also, I've noted that issuing a DELETE HTTP request appears to handle
> the deletes faster, but is there a more optimal way to construct those
> three queries (the DELETE/INSERT WHERE pattern did not appear to improve
> matters based on a few isolated tests)?
>
> Cordially,
> Kevin
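A minimal sketch of checking and pinning the swappiness setting Bryan refers to (Linux; whether /etc/sysctl.conf or a sysctl.d drop-in is used for persistence depends on the distribution):

# show the current value
sysctl vm.swappiness
# set it to zero for the running kernel
sudo sysctl -w vm.swappiness=0
# persist the setting across reboots
echo "vm.swappiness=0" | sudo tee -a /etc/sysctl.conf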
From: Kevin F. <kf...@ar...> - 2016-08-17 19:20:56
Dear All,

I have a question about SPARQL DELETE/INSERT performance.

We have a job that issues two DELETE queries followed by an INSERT query, per request, to keep our triplestore up to date. The pattern is as follows:

DELETE { <URI> ?p ?o };
DELETE { <URI/other> ?p ?o };
INSERT {
  <URI> <new> '1' .
  <URI> <new> '2' .
  <URI> <new> '3' .
  <URI/other> <new> '1' .
}

We've noted two things:

1) If a store is empty, this is relatively fast.
2) If the store is configured for automatic inferencing, it is slower than a store without that feature activated.

Neither of those observations seems particularly surprising, but each of those two DELETE statements and the one INSERT takes more than 3 seconds against a store with only 4.6 million triples. See below for a sample of the output.

Blazegraph is allocated 4g of memory.

We're working within a framework that produces these series of queries so, before making any custom modifications, I was wondering if this performance is to be expected? If not, would it point to a configuration issue of some kind, or something else?

Also, I've noted that issuing a DELETE HTTP request appears to handle the deletes faster, but is there a more optimal way to construct those three queries (the DELETE/INSERT WHERE pattern did not appear to improve matters based on a few isolated tests)?

Cordially,
Kevin

$ curl -X POST https://laketsidx/blazegraph/namespace/lakeidx/sparql --data @bg-test-02.sparql -H "Content-type: application/x-www-form-urlencoded"
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"><html><head><meta http-equiv="Content-Type" content="text/html;charset=UTF-8"><title>blazegraph™ by SYSTAP</title></head><body
<p>totalElapsed=33ms, elapsed=33ms, connFlush=0ms, batchResolve=0, whereClause=23ms, deleteClause=9ms, insertClause=23ms</p>
<hr><p>totalElapsed=2579ms, elapsed=21ms, connFlush=2524ms, batchResolve=0, whereClause=21ms, deleteClause=0ms, insertClause=21ms</p>
<hr><p>totalElapsed=2582ms, elapsed=2ms, connFlush=0ms, batchResolve=0, whereClause=0ms, deleteClause=0ms, insertClause=0ms</p>
<hr><p>COMMIT: totalElapsed=3118ms, commitTime=1471453475719, mutationCount=62</p>
</html

$ curl -X POST https://laketsidx/blazegraph/namespace/lakeidx/sparql --data @bg-test-02.sparql -H "Content-type: application/x-www-form-urlencoded"
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"><html><head><meta http-equiv="Content-Type" content="text/html;charset=UTF-8"><title>blazegraph™ by SYSTAP</title></head><body
<p>totalElapsed=24ms, elapsed=24ms, connFlush=0ms, batchResolve=0, whereClause=12ms, deleteClause=11ms, insertClause=12ms</p>
<hr><p>totalElapsed=2966ms, elapsed=12ms, connFlush=2929ms, batchResolve=0, whereClause=12ms, deleteClause=0ms, insertClause=12ms</p>
<hr><p>totalElapsed=2969ms, elapsed=2ms, connFlush=0ms, batchResolve=0, whereClause=0ms, deleteClause=0ms, insertClause=0ms</p>
<hr><p>COMMIT: totalElapsed=3669ms, commitTime=1471453483251, mutationCount=62</p>
</html
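For what it's worth, the two DELETE operations above can be folded into a single pattern with a VALUES block, and the whole change sent as one update string; a sketch only, reusing the placeholder URIs from the message (Kevin's note that the DELETE/INSERT WHERE form did not help in his tests still applies, so treat this as a readability rather than a performance suggestion):

# delete everything about both resources in one operation,
# then insert the replacement triples
DELETE { ?s ?p ?o }
WHERE {
  VALUES ?s { <URI> <URI/other> }
  ?s ?p ?o
};
INSERT DATA {
  <URI> <new> '1' .
  <URI> <new> '2' .
  <URI> <new> '3' .
  <URI/other> <new> '1' .
}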
From: Stas M. <sma...@wi...> - 2016-08-04 00:03:59
Hi!

> A pointer to your implementation would be good, I will take a look.

Please see:

Main repo: https://github.com/wikimedia/wikidata-query-rdf

Coordinate parser: https://github.com/wikimedia/wikidata-query-rdf/blob/master/blazegraph/src/main/java/org/wikidata/query/rdf/blazegraph/inline/literal/WKTSerializer.java
It is probably more complex than you need because it supports globes and two coordinate orders.

Vocabulary: https://github.com/wikimedia/wikidata-query-rdf/blob/master/blazegraph/src/main/java/org/wikidata/query/rdf/blazegraph/WikibaseVocabulary.java

Services: https://github.com/wikimedia/wikidata-query-rdf/tree/master/blazegraph/src/main/java/org/wikidata/query/rdf/blazegraph/geo
This one works on top of the standard services, just to show how you can customize search.

Configs: https://github.com/wikimedia/wikidata-query-rdf/blob/master/dist/src/script/RWStore.properties

--
Stas Malyshev
sma...@wi...
From: Jem R. <jem...@ft...> - 2016-07-26 08:39:35
Hi Stas,

A pointer to your implementation would be good, I will take a look.

Cheers
Jem

On 25 July 2016 at 21:01, Stas Malyshev <sma...@wi...> wrote:
> Hi!
>
> > I was wondering if anyone has already worked on Custom Geospatial Data
> > types, vocabularies etc for Geonames in Blazegraph ?
>
> I have an implementation of custom geodata in Wikidata Query Service
> (https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Globe_coordinate)
> but it uses a single WKT literal instead of two literals. Making a single
> literal custom parser should not be hard - mostly implementing
> IGeoSpatialLiteralSerializer and adding a suitable config as
> com.bigdata.rdf.store.AbstractTripleStore.geoSpatialDatatypeConfig.
> I could point you to specific code if interested.
>
> I don't think there's currently a way to index using multiple predicates.
>
> --
> Stas Malyshev
> sma...@wi...

--
Jem Rayfield
Head of Solution Architecture, Technology
+44 (0)7709 332482
Number One Southwark Bridge, London SE1 9HL
From: Jem R. <jem...@ft...> - 2016-07-26 08:19:27
Hi Michael,

I will take a look at WKT and Stas's implementation.

Re: deleting the single-component literals, these are expected and are part of the geonames ontology, so I will leave them.

Thanks!
Jem

On 25 July 2016 at 19:52, Michael Schmidt <ms...@me...> wrote:
> Hi Jem,
>
> we are aware of this limitation and have been discussing this use case
> previously, but (as you guessed) indexing of such "distributed" coordinates
> is currently not implemented. So for now there's no way around the
> transformation (you may, however, want to consider deleting the
> single-component literals when creating the composed literals, if that's an
> option for you).
>
> As a side note: there are also standards-based binary coordinate formats
> into which you could transform the literals. For instance, Wikidata is using
> WKT literals, see the POINT datatype in
> https://en.wikipedia.org/wiki/Well-known_text, which could easily be used
> in combination with a custom geospatial LiteralSerializer.
>
> Best,
> Michael
>
> On 25 Jul 2016, at 17:14, Jem Rayfield <jem...@ft...> wrote:
>
> I guess I could invoke a SPARQL update as follows:
>
> INSERT {
>   ?s <http://jems/latlong> ?latlong
> }
> WHERE {
>   ?s <http://www.w3.org/2003/01/geo/wgs84_pos#lat> ?lat ;
>      <http://www.w3.org/2003/01/geo/wgs84_pos#long> ?long .
>   BIND(STRDT(STR(CONCAT(?lat, "#", ?long)), <http://jems/custom/latlong/literaltype>) AS ?latlong)
> }
>
> However, this will create many millions of essentially redundant statements?
>
> The question re: a multiple predicate index still stands.
>
> Cheers
> Jem
>
> On 25 July 2016 at 15:14, Jem Rayfield <jem...@ft...> wrote:
>> Hello,
>>
>> I would like to index geonames lat/long using Blazegraph's geospatial
>> index.
>>
>> Geonames lat/longs are provided in the following flavour of RDF:
>>
>> <http://sws.geonames.org/4667981/> <http://www.w3.org/2003/01/geo/wgs84_pos#lat> "35.79787" .
>> <http://sws.geonames.org/4667981/> <http://www.w3.org/2003/01/geo/wgs84_pos#long> "-83.44683" .
>>
>> I was wondering if anyone has already worked on custom geospatial data
>> types, vocabularies etc. for Geonames in Blazegraph?
>>
>> It appears that one would need to flatten the objects into a single
>> multidimensional literal for indexing?
>>
>> <http://sws.geonames.org/4667981/> somenamespace:latlong "35.79787#-83.44683"
>>
>> With the configuration and definition of a new literal type?
>>
>> I am wondering if it's possible to index on multiple known predicates
>> rather than multidimensional literals with extended data types?
>>
>> Cheers
>> Jem

--
Jem Rayfield
Head of Solution Architecture, Technology
+44 (0)7709 332482
Number One Southwark Bridge, London SE1 9HL
From: Stas M. <sma...@wi...> - 2016-07-25 20:24:42
|
Hi!

> I was wondering if anyone has already worked on Custom Geospatial Data
> types, vocabularies etc. for Geonames in Blazegraph?

I have an implementation of custom geodata in Wikidata Query Service
(https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Globe_coordinate)
but it uses a single WKT literal instead of two literals. Making a custom
parser for a single literal should not be hard - it is mostly a matter of
implementing IGeoSpatialLiteralSerializer and adding a suitable config as
com.bigdata.rdf.store.AbstractTripleStore.geoSpatialDatatypeConfig.
I could point you to specific code if interested.

I don't think there's currently a way to index using multiple predicates.

--
Stas Malyshev
sma...@wi...
|
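For anyone following Stas's pointer, a minimal sketch of what such a datatype registration might look like in the journal's .properties file is shown below. This is an illustration only: the datatype URI is the hypothetical one from this thread, the numeric property suffix, field keys and multipliers are written from memory of the Blazegraph geospatial documentation, geospatial support must also be enabled on the namespace, and wiring in a custom IGeoSpatialLiteralSerializer may need an additional config entry depending on the Blazegraph version - verify all of this against your release.

# Hypothetical config sketch (not taken from this thread): registers a custom
# "lat#long" geospatial datatype so literals typed with it get indexed.
com.bigdata.rdf.store.AbstractTripleStore.geoSpatialDatatypeConfig.0=\
  {"config": {\
    "uri": "http://jems/custom/latlong/literaltype",\
    "fields": [\
      {"valueType": "DOUBLE", "multiplier": "100000", "serviceMapping": "LATITUDE"},\
      {"valueType": "DOUBLE", "multiplier": "100000", "serviceMapping": "LONGITUDE"}\
    ]\
  }}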
From: Michael S. <ms...@me...> - 2016-07-25 19:22:12
|
Hi Jem,

we are aware of this limitation and have been discussing this use case
previously, but (as you guessed) indexing of such “distributed” coordinates
is currently not implemented. So for now there's no way around the
transformation (you may, however, want to consider deleting the
single-component literals when creating the composed literals, if that's an
option for you).

As a side note: there are also standards-based coordinate formats into which
you could transform the literals. For instance, Wikidata is using WKT
literals, see the POINT datatype in
https://en.wikipedia.org/wiki/Well-known_text, which could easily be used in
combination with a custom geospatial LiteralSerializer.

Best,
Michael

> On 25 Jul 2016, at 17:14, Jem Rayfield <jem...@ft...> wrote:
>
> I guess I could invoke a SPARQL update as follows:
>
> INSERT {
>   ?s <http://jems/latlong> ?latlong
> }
> WHERE {
>   ?s <http://www.w3.org/2003/01/geo/wgs84_pos#lat> ?lat ;
>      <http://www.w3.org/2003/01/geo/wgs84_pos#long> ?long .
>   BIND(STRDT(STR(CONCAT(?lat, "#", ?long)), <http://jems/custom/latlong/literaltype>) AS ?latlong)
> }
>
> However, this will create many millions of essentially redundant statements?
>
> The question re: a multiple-predicate index still stands.
>
> Cheers
> Jem
|
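Following Michael's WKT suggestion, a possible variant of Jem's update that composes a GeoSPARQL WKT literal instead of a custom "lat#long" literal might look like the sketch below. The predicate <http://jems/wkt> is a made-up placeholder, and whether the resulting literal is actually indexed depends on the geospatial datatype configuration of the Blazegraph instance; note that WKT puts longitude before latitude.

PREFIX wgs: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>

INSERT {
  # <http://jems/wkt> is a hypothetical predicate, used only for illustration
  ?s <http://jems/wkt> ?wkt
}
WHERE {
  ?s wgs:lat ?lat ;
     wgs:long ?long .
  # WKT point syntax is "POINT(x y)", i.e. longitude before latitude
  BIND(STRDT(CONCAT("POINT(", STR(?long), " ", STR(?lat), ")"), geo:wktLiteral) AS ?wkt)
}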
From: Jem R. <jem...@ft...> - 2016-07-25 15:14:07
|
I guess I could invoke a SPARQL update as follows:

INSERT {
  ?s <http://jems/latlong> ?latlong
}
WHERE {
  ?s <http://www.w3.org/2003/01/geo/wgs84_pos#lat> ?lat ;
     <http://www.w3.org/2003/01/geo/wgs84_pos#long> ?long .
  BIND(STRDT(STR(CONCAT(?lat, "#", ?long)), <http://jems/custom/latlong/literaltype>) AS ?latlong)
}

However, this will create many millions of essentially redundant statements?

The question re: a multiple-predicate index still stands.

Cheers
Jem

On 25 July 2016 at 15:14, Jem Rayfield <jem...@ft...> wrote:
> Hello,
>
> I would like to index geonames lat/long using Blazegraph's geospatial
> index.
>
> Geonames lat/longs are provided in the following flavour of RDF:
>
> <http://sws.geonames.org/4667981/> <http://www.w3.org/2003/01/geo/wgs84_pos#lat> "35.79787" .
> <http://sws.geonames.org/4667981/> <http://www.w3.org/2003/01/geo/wgs84_pos#long> "-83.44683" .
>
> I was wondering if anyone has already worked on Custom Geospatial Data
> types, vocabularies etc. for Geonames in Blazegraph?
>
> It appears that one would need to flatten the objects into a single
> multidimensional literal for indexing?
>
> <http://sws.geonames.org/4667981/> somenamespace:latlong "35.79787#-83.44683"
>
> With the configuration and definition of a new literal type?
>
> I am wondering if it's possible to index on multiple known predicates
> rather than multidimensional literals with extended data types?
>
> Cheers
> --
> Jem Rayfield
|
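If the redundancy were a concern, the same update could drop the single-component triples while composing the combined literal (the option Michael mentions in his reply, above in this archive; Jem later chose to keep them, since they are part of the geonames ontology). A sketch, reusing the hypothetical datatype URI from Jem's query:

# Sketch only: variant of the update above that also removes the
# single-component lat/long triples while composing the combined literal.
DELETE {
  ?s <http://www.w3.org/2003/01/geo/wgs84_pos#lat>  ?lat .
  ?s <http://www.w3.org/2003/01/geo/wgs84_pos#long> ?long .
}
INSERT {
  ?s <http://jems/latlong> ?latlong
}
WHERE {
  ?s <http://www.w3.org/2003/01/geo/wgs84_pos#lat> ?lat ;
     <http://www.w3.org/2003/01/geo/wgs84_pos#long> ?long .
  BIND(STRDT(CONCAT(?lat, "#", ?long), <http://jems/custom/latlong/literaltype>) AS ?latlong)
}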
From: Jem R. <jem...@ft...> - 2016-07-25 14:44:34
|
Hello,

I would like to index geonames lat/long using Blazegraph's geospatial index.

Geonames lat/longs are provided in the following flavour of RDF:

<http://sws.geonames.org/4667981/> <http://www.w3.org/2003/01/geo/wgs84_pos#lat> "35.79787" .
<http://sws.geonames.org/4667981/> <http://www.w3.org/2003/01/geo/wgs84_pos#long> "-83.44683" .

I was wondering if anyone has already worked on Custom Geospatial Data
types, vocabularies etc. for Geonames in Blazegraph?

It appears that one would need to flatten the objects into a single
multidimensional literal for indexing?

<http://sws.geonames.org/4667981/> somenamespace:latlong "35.79787#-83.44683"

With the configuration and definition of a new literal type?

I am wondering if it's possible to index on multiple known predicates
rather than multidimensional literals with extended data types?

Cheers
--
Jem Rayfield
Head of Solution Architecture, Technology
|
From: Alfredo S. <se...@gm...> - 2016-07-07 07:47:50
|
Hi Bryan,

thank you very much for the really clear explanation. We are doing some
research and testing around multiple backing stores well in advance, so that
we are ready if it ever becomes possible and can properly evaluate the
possible strategies, drawbacks, etc. So I will follow any updates on this.

At the moment the single journal file with multiple contexts works fine, and
I've also already tested the CompactJournalUtility, which works OK too. I'll
also test the ExportKB utility for export, as an alternative to a
SPARQL-based process - thanks for the suggestion.

Thank you,
Alfredo

PS: sorry for the wrong subject in the email; I did something wrong with the
mailing list addresses and didn't notice the wrong subject line.

2016-07-06 16:41 GMT+02:00 Bryan Thompson <br...@bl...>:
> There is a facility to take a snapshot of a journal file. This exists in
> the core platform and is automated in the enterprise platform, which also
> supports transaction logs, resync, etc.
>
> Each namespace is stored in the same journal file. So anything that
> operates at the journal level handles all namespaces.
>
> There is an ExportKB utility. This is not integrated into the REST API,
> but it could be. This would provide a means to dump a namespace.
>
> Having multiple backing stores (multi-RWStore) is not trivial. We have
> taken some steps to prepare for this, such as including a store file unique
> identifier in the HA replication messages, but actually doing this would be
> a significant undertaking. Lots of tests, more complex conditions around
> atomic commit and rollback, etc.
>
> Thanks,
> Bryan
|
From: Bryan T. <br...@bl...> - 2016-07-06 14:41:41
|
There is a facility to take a snapshot of a journal file. This exists in the
core platform and is automated in the enterprise platform, which also
supports transaction logs, resync, etc.

Each namespace is stored in the same journal file. So anything that operates
at the journal level handles all namespaces.

There is an ExportKB utility. This is not integrated into the REST API, but
it could be. This would provide a means to dump a namespace.

Having multiple backing stores (multi-RWStore) is not trivial. We have taken
some steps to prepare for this, such as including a store file unique
identifier in the HA replication messages, but actually doing this would be
a significant undertaking. Lots of tests, more complex conditions around
atomic commit and rollback, etc.

Thanks,
Bryan

On Wednesday, July 6, 2016, Alfredo Serafini <se...@gm...> wrote:
> Hi,
>
> I'm testing Blazegraph and exploring some possible configuration options
> for the journal file.
>
> For a project in which I'm involved, we will have an increasing amount of
> data, so we are searching in advance for a robust strategy to conduct
> backups.
>
> Ideally we'd like to test different strategies:
>
> - backup a single dataset / namespace: this seems to be possible by using
>   dumps over the specific endpoints exposed for every namespace, while
>   still using the same journal file.
>   On the other hand, I wonder if there could be a way to avoid the usage
>   of SERVICE statements when we have to query across different contexts
>   stored on the same instance (avoiding materializations and thus a lot of
>   duplicated data)?
>
> - backup the journal file itself: if possible we'd like to have it
>   physically split per dataset, but I didn't find any references to such a
>   feature.
>   Moreover: if this feature is not available, do you think it could be
>   possible to hack a bit the classes that currently handle the file,
>   creating an intermediate class which could handle multiple files at the
>   same time transparently? I imagine something like a "MultipleRWStore",
>   just to say.
>
> Sorry if the questions may seem weird: any suggestions / criticism is very
> welcome.
>
> Thank you in advance (and apologies in case this was not the right address
> to post requesting help)
>
> Alfredo
|
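Since ExportKB is not exposed through the REST API, a per-namespace dump can also be approximated with a plain SPARQL CONSTRUCT posted to that namespace's SPARQL endpoint (the SPARQL-based process mentioned elsewhere in this thread). The endpoint path below is illustrative, and CONSTRUCT only returns triples, so a quads-mode namespace would need its named graphs handled separately.

# Minimal sketch: dump all triples of one namespace.
# POST this query to e.g. http://localhost:9999/blazegraph/namespace/<name>/sparql
# (host/port/path are illustrative) with an RDF Accept header such as
# text/turtle, and write the response to a file.
CONSTRUCT { ?s ?p ?o }
WHERE     { ?s ?p ?o }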
From: Alfredo S. <se...@gm...> - 2016-07-06 10:48:04
|
Hi,

I'm testing Blazegraph and exploring some possible configuration options for
the journal file.

For a project in which I'm involved, we will have an increasing amount of
data, so we are searching in advance for a robust strategy to conduct
backups.

Ideally we'd like to test different strategies:

- backup a single dataset / namespace: this seems to be possible by using
  dumps over the specific endpoints exposed for every namespace, while still
  using the same journal file.
  On the other hand, I wonder if there could be a way to avoid the usage of
  SERVICE statements when we have to query across different contexts stored
  on the same instance (avoiding materializations and thus a lot of
  duplicated data)?

- backup the journal file itself: if possible we'd like to have it physically
  split per dataset, but I didn't find any references to such a feature.
  Moreover: if this feature is not available, do you think it could be
  possible to hack a bit the classes that currently handle the file, creating
  an intermediate class which could handle multiple files at the same time
  transparently? I imagine something like a "MultipleRWStore", just to say.

Sorry if the questions may seem weird: any suggestions / criticism is very
welcome.

Thank you in advance (and apologies in case this was not the right address to
post requesting help)

Alfredo
|
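For context, the SERVICE-based cross-namespace query that the first bullet would like to avoid looks roughly like the sketch below when both namespaces live on the same Blazegraph server. The namespace names ("kbA", "kbB"), the class IRI and the localhost endpoint are placeholders; the query would be posted to kbA's own endpoint.

# Hypothetical example: query namespace "kbA" while pulling labels from
# namespace "kbB" on the same server via SPARQL 1.1 federation (SERVICE).
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?s ?label
WHERE {
  ?s a <http://example.org/Thing> .                      # evaluated against kbA
  SERVICE <http://localhost:9999/blazegraph/namespace/kbB/sparql> {
    ?s rdfs:label ?label                                  # evaluated against kbB
  }
}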
From: Jim B. <ba...@gm...> - 2016-06-25 00:44:59
|
Thanks! I was actually just about to post back to the list that I figured out
how to do this using BigdataSailRepositoryConnection.addChangeLog(IChangeLog
log). I just use a simple IChangeLog that counts the changes for one update.
I can post this on the ticket.

Thanks,
Jim

> On Jun 24, 2016, at 2:33 PM, Bryan Thompson <br...@bl...> wrote:
>
> Jim,
>
> There is a ticket which would address this. It is BLZG-824
> <https://jira.blazegraph.com/browse/BLZG-824>. Could you offer some
> feedback on that ticket and we can try to bring it into a sprint soon.
>
> Thanks,
> Bryan
>
> On Mon, Jun 13, 2016 at 3:50 PM, Jim Balhoff <ba...@gm...> wrote:
> Hi,
>
> Is there a client API that would allow me to get the mutation count from a
> SPARQL update? I see that mutationCount is returned from the HTTP
> interface, but I was hoping to get the same information when submitting a
> SPARQL update via the Sesame API. I see that Update.execute() returns void.
>
> I am trying to run a sequence of SPARQL updates to do some “reasoning”, and
> stop when no further triples are being inserted. I suppose I could write
> these as rules, but for this job it just seemed a little simpler to use
> SPARQL update.
>
> Thanks,
> Jim
|
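A rough sketch of the kind of counting IChangeLog Jim describes is given below. It is not his actual code (he offered to attach that to BLZG-824), and the callback method names on com.bigdata.rdf.changesets.IChangeLog are written from memory, so they should be checked against the interface in your Blazegraph version; only the idea of counting change events for a single update comes from the thread. The instance would be registered with BigdataSailRepositoryConnection.addChangeLog(...) before executing the update and the count read back afterwards.

import com.bigdata.rdf.changesets.IChangeLog;
import com.bigdata.rdf.changesets.IChangeRecord;

/**
 * Minimal sketch of a change log that counts mutations for one update.
 * The lifecycle method names below are assumptions; verify them against
 * the IChangeLog interface shipped with your Blazegraph release.
 */
public class CountingChangeLog implements IChangeLog {

    private long mutationCount = 0;

    @Override
    public void changeEvent(final IChangeRecord record) {
        // One record per statement added/removed by the update.
        mutationCount++;
    }

    @Override
    public void transactionBegin() { /* no-op */ }

    @Override
    public void transactionPrepare() { /* no-op */ }

    @Override
    public void transactionCommited(final long commitTime) { /* no-op */ }

    @Override
    public void transactionAborted() { /* no-op */ }

    @Override
    public void close() { /* no-op */ }

    public long getMutationCount() {
        return mutationCount;
    }
}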