This list is closed, nobody may subscribe to it.
From: Lee K. <le...@sw...> - 2015-04-27 15:48:19
|
Hi,

We are trying to perform a bulk import into a new Blazegraph journal. The import process writes quads to an in-process BigdataSailRepository with the following configuration, based on the 'fastload' settings in the bigdata-sails samples directory:

com.bigdata.rdf.store.AbstractTripleStore.quadsMode=true
com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false
com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.NoAxioms
com.bigdata.rdf.sail.truthMaintenance=false
com.bigdata.rdf.store.AbstractTripleStore.justify=false
com.bigdata.journal.AbstractJournal.initialExtent=209715200
com.bigdata.journal.AbstractJournal.maximumExtent=209715200
com.bigdata.rdf.store.AbstractTripleStore.textIndex=false
com.bigdata.journal.AbstractJournal.bufferMode=DiskRW
com.bigdata.sail.isolatableIndices=true
com.bigdata.rdf.store.AbstractTripleStore.vocabularyClass=com.bigdata.rdf.vocab.NoVocabulary
com.bigdata.journal.AbstractJournal.file=bigdata_conf.jnl
com.bigdata.journal.AbstractJournal.writeCacheBufferCount=2000
com.bigdata.btree.writeRetentionQueue.capacity=8000

When run against a native Sesame repository, the import takes around 50 hours. When run against the Blazegraph repository, the import slows down significantly after 2-3 hours and begins logging warnings of the form:

[2015-04-27 07:34:30,238][WARN][com.bigdata.btree.AbstractBTree] wrote: name=kb.spo.OCSP, 8 records (#nodes=3, #leaves=5) in 5493ms : addrRoot=-244779124025982418
[2015-04-27 07:47:48,342][WARN][com.bigdata.btree.AbstractBTree] wrote: name=kb.spo.SOPC, 1 records (#nodes=1, #leaves=0) in 40841ms : addrRoot=-246059333517835846
[2015-04-27 07:47:48,858][WARN][com.bigdata.btree.AbstractBTree] wrote: name=kb.spo.SPOC, 7 records (#nodes=4, #leaves=3) in 42109ms : addrRoot=-246099989678259484
[2015-04-27 07:54:47,743][WARN][com.bigdata.btree.AbstractBTree] wrote: name=kb.spo.SOPC, 1 records (#nodes=1, #leaves=0) in 43231ms : addrRoot=-245678000551493109
[2015-04-27 07:54:52,251][WARN][com.bigdata.btree.AbstractBTree] wrote: name=kb.spo.SPOC, 1 records (#nodes=1, #leaves=0) in 44875ms : addrRoot=-245097441232158259
[2015-04-27 07:54:52,251][WARN][com.bigdata.btree.AbstractBTree] wrote: name=kb.spo.CSPO, 1 records (#nodes=1, #leaves=0) in 34808ms : addrRoot=-245097501361700476
[2015-04-27 07:54:52,251][WARN][com.bigdata.btree.AbstractBTree] wrote: name=kb.spo.POCS, 1 records (#nodes=1, #leaves=0) in 44875ms : addrRoot=-245097342447910551

Are there any settings we should change or add to the journal configuration to prevent this slowdown?

Thanks
|
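To see which indices account for the slow writes, the warnings quoted above can be tallied mechanically. The following is only a sketch: the class and method names are hypothetical, and the regular expression assumes every warning has exactly the shape of the lines shown in this message.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: pulls index name and write latency out of "wrote:" warnings. */
class SlowWriteParser {

    /** One parsed warning: which index was written and how long the write took. */
    static class SlowWrite {
        final String index;
        final long millis;

        SlowWrite(String index, long millis) {
            this.index = index;
            this.millis = millis;
        }
    }

    // ASSUMPTION: matches lines shaped like
    // "... wrote: name=kb.spo.OCSP, 8 records (#nodes=3, #leaves=5) in 5493ms : ..."
    private static final Pattern WROTE =
            Pattern.compile("wrote: name=([^,]+), \\d+ records .*? in (\\d+)ms");

    /** Returns the parsed warning, or null if the line has a different shape. */
    static SlowWrite parse(String line) {
        Matcher m = WROTE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new SlowWrite(m.group(1), Long.parseLong(m.group(2)));
    }

    /** Returns the slowest write among the given lines, or null if none match. */
    static SlowWrite slowest(List<String> lines) {
        SlowWrite worst = null;
        for (String line : lines) {
            SlowWrite w = parse(line);
            if (w != null && (worst == null || w.millis > worst.millis)) {
                worst = w;
            }
        }
        return worst;
    }
}
```

Feeding a log file's lines to `slowest` gives a quick indication of which index dominates the write latency, which may help when comparing runs with different journal settings.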
From: Andreas K. <ka...@bs...> - 2015-04-27 12:38:35
|
Hello Bryan & Martyn,

Sorry for the long delay. I have now run dumpJournal&dumpPages twice:

1. A dump while the SPARQL LOAD was running with groupCommit and smallSlotOptimization enabled (the run that cannot finish due to disk space).
2. A dump after the whole file was successfully loaded because I disabled groupCommit (I could also use groupCommit and disable smallSlots).

I will do what I can to help you test and track down the problem. For me here it is not too much trouble to work with the knowledge that I can only activate one of the two features at a time.

Best Regards
Andreas

P.S. I also followed your advice to increase com.bigdata.rdf.sail.bufferCapacity, as you can see from the settings of run No. 2:

triples:/tmp # curl -H "Accept: text/plain" http://localhost:8080/bigdata/namespace/gnd/properties
#Mon Apr 27 14:26:37 CEST 2015
com.bigdata.namespace.kb.spo.com.bigdata.btree.BTree.branchingFactor=700
com.bigdata.relation.container=gnd
com.bigdata.rwstore.RWStore.smallSlotType=1024
com.bigdata.journal.AbstractJournal.bufferMode=DiskRW
com.bigdata.journal.AbstractJournal.file=/var/lib/bigdata/bigdata.jnl
com.bigdata.journal.AbstractJournal.initialExtent=209715200
com.bigdata.rdf.store.AbstractTripleStore.vocabularyClass=de.bsb_muenchen.bigdata.vocab.B3KatVocabulary
com.bigdata.rdf.store.AbstractTripleStore.textIndex=true
com.bigdata.btree.BTree.branchingFactor=700
com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.NoAxioms
com.bigdata.rdf.sail.isolatableIndices=false
com.bigdata.service.AbstractTransactionService.minReleaseAge=1
com.bigdata.rdf.sail.bufferCapacity=200000
com.bigdata.rdf.sail.truthMaintenance=false
com.bigdata.rdf.sail.namespace=gnd
com.bigdata.relation.class=com.bigdata.rdf.store.LocalTripleStore
com.bigdata.rdf.store.AbstractTripleStore.quads=false
com.bigdata.journal.AbstractJournal.writeCacheBufferCount=2000
com.bigdata.search.FullTextIndex.fieldsEnabled=false
com.bigdata.relation.namespace=gnd
com.bigdata.btree.writeRetentionQueue.capacity=10000
com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false

>>> Bryan Thompson <br...@sy...> 24.04.15, 18:45 >>>

Martyn and I discussed this in some depth today. We've reopened the ticket to:

a. gain more understanding of the interaction of the small slot optimization and group commit.
b. verify correct reporting by the allocators in dumpJournal.
c. modify the small slots optimization allocator policy to make it less susceptible to mis-configuration.

In the data as loaded, the OSP index was 66% blob slots (greater than 8k). For the small slot optimization to be effective the O(C)SP index should target a page size of 64-256 bytes. (c) should minimize or remove the negative impact of the small slot optimization in such cases.

Thanks,
Bryan

----
Bryan Thompson
Chief Scientist & Founder
SYSTAP, LLC
4501 Tower Road
Greensboro, NC 27410
br...@sy...
http://blazegraph.com
http://blog.bigdata.com <http://bigdata.com>
http://mapgraph.io

Blazegraph™ <http://www.blazegraph.com/> is our ultra high-performance graph database that supports both RDF/SPARQL and Tinkerpop/Blueprints APIs. MapGraph™ <http://www.systap.com/mapgraph> is our disruptive new technology to use GPUs to accelerate data-parallel graph analytics.

CONFIDENTIALITY NOTICE: This email and its contents and attachments are for the sole use of the intended recipient(s) and are confidential or proprietary to SYSTAP. Any unauthorized review, use, disclosure, dissemination or copying of this email or its contents or attachments is prohibited. If you have received this communication in error, please notify the sender by reply email and permanently delete all copies of the email and its contents and attachments.

On Fri, Apr 24, 2015 at 8:35 AM, Martyn Cutcher <ma...@sy...> wrote:

> I don't see how the small slot optimisation can result in more waste with
> larger allocators.
>
> It is simply a mechanism to avoid rapid re-allocation of the small slotllocator dump, there are a lot of 64 byte allocators.
> Unlike the larger allocators (128 and greater) a large proportion of the 64
> byte slots will be used for long literal values (note that the mean
> allocation is only 27 bytes).
>
> Counterintuitively, there may well be a case for excluding the 64 byte
> allocators from the "small slot optimisation". So "small slot" NOT
> "smallest slot" ;-)
>
> - Martyn
>
> On 24/04/2015 00:18, Bryan Thompson wrote:
>
> I've updated the ticket. I've also copied my main conclusions inline below.
>
> I think that the issue here is the use of the small slot optimization
> without proper configuration of the indices in order to target small
> allocation slots for at least one of the indices. The small slot
> optimization changes the allocation policy in two ways.
>
> 1. It has a strong preference to use only empty 8k pages for small
> allocations (as configured, for allocations less than 1k). This allows us
> to coalesce writes by combining them onto the same page.
> 2. It has a preference to use allocation blocks that are relatively empty
> for small slots.
>
> As a consequence, the small slot optimization MAY recruit more allocators
> in order to have allocators for small slots that have good sparsity.
>
> The main goal of the small slot optimization is to optimize for indices
> that have very scattered IO patterns. The indices that exhibit this the
> most are the OSP and OCSP indices. In many cases even batched updates will
> modify no more than a single tuple per page on this index. However, in
> your configuration (and in mine when I enabled the small slot optimization
> without adjusting the branching factors), the O(C)SP indices were not
> created with a small branching factor, so the small slot allocation could
> not be put to any good effect. However it did have a negative effect -- by
> recruiting more allocators. If you want to use the small slot
> optimization, make sure that at least the O(C)SP index has a relatively
> small branching factor giving an effective slot size of 256 bytes or less
> on average.
>
> I suggest that you retest w/o the small slot optimization and with group
> commit still enabled.
>
> I've asked Martyn to look over the allocators from the small slot
> optimization run and think about whether we can make this policy a little
> more adaptive when the branching factors are not really tuned properly and
> too many allocators with too much wasted space are allocated as a result.
> Basically, how to avoid file bloat from misconfiguration.
>
> Thanks,
> Bryan
>
> On Thu, Apr 23, 2015 at 9:36 AM, Andreas Kahl <ka...@bs...> wrote:
>
> Ok, I can redo the test with smallSlots + groupCommit enabled, and run
> http://localhost:8080/bigdata/status?dumpJournal&dumpPages after some
> minutes.
> (I cannot run it on the fully loadedjust one of my many attempts to improve IO Performance on rotating disks.
>
> Best Regards
> Andreas
>
> Bryan Thompson <br...@sy...> 23.04.15, 15:31 >>>
>
> I just noticed that you have the full text index enabled as well. I have
> not been enabling that.
>
> I would like to see the output from this command on the fully loaded data
> sets.
> http://localhost:8080/bigdata/status?dumpJournal&dumpPages
>
> This will let us see if any specific index is taking up a very large number of
> pages. It will also tell us the distribution over the page sizes for each
> index.
>
> Bryan
>
> On Thu, Apr 23, 2015 at 8:54 AM, Andreas Kahl <ka...@bs...> wrote:
>
> Bryan,
>
> in the meantime, I could successfully load the file into an 18GB journal
> after disabling groupCommit (I simply commented out the line in
> RWStore.properties).
> I can try again with groupCommit enabled, but smallSlotOptimization
> disabled.
>
> Best Regards
> Andreas
>
> Bryan Thompson <br...@sy...> 23.04.2015 13:24 >>>
>
> Andreas,
>
> I was not able to replicate your result. Unfortunately I navigated away
> from the browser page in which I had submitted the request, so it loaded
> all the data but failed to commit. However, the resulting file is only
> 16GB.
>
> I will redo this run and verify that the journal after the commit has this
> same size on the disk.
>
> I was only assuming that this was related to group commit because of your
> original message. Perhaps I misinterpreted your message. This is simply
> about 1.5.1 (with group commit) vs 1.4.0.
>
> Perhaps the issue is related to the small slot optimization? Maybe in
> combination with group commit?
>
> > com.bigdata.rwstore.RWStore.smallSlotType=1024
>
> I could not replicate your properties exactly because you are using a
> non-standard vocabulary class. Therefore I simply deleted the default
> namespace (in quads mode) and recreated it with the defaults in triples
> mode. The small slot optimization and other parameters were not enabled in
> my run.
>
> Perhaps you could try to replicate my experience and I will enable the
> small slots optimization?
>
> Thanks,
> Bryan
>
> On Thu, Apr 23, 2015 at 1:51 AM, Andreas Kahl <ka...@bs...> wrote:
>
> Bryan & Martyn,
>
> Thank you very much for investigating the issue. I assume from the ticket
> that the error will vanish if I disable groupCommit. I will do so for the
> meantime.
>
> Although there is already extensive information in Bryan's ticket, please
> find attached my logs and DumpJournal outputs:
> - dumpJournal.html contains a dump from the 67GB journal after Blazegraph
>   ran into "No space left on device"
> - dumpJournalWithTraceEnabled.html is the same dump for a running query
>   when the journal was at about 14GB
> - queryStatus.html is just the status page showing my query
> - catalina.out.gz contains the trace outputs from starting Tomcat until I
>   killed the curl running the SPARQL Update by Ctrl-C
> - loadGnd.log.gz is Blazegraph's output when loading the data
>
> Best Regards
> Andreas
>
> Bryan Thompson <br...@sy...> 22.04.15, 20:56 >>>
>
> See http://trac.bigdata.com/ticket/1206. This is still in the
> investigation stage.
>
> Thanks,
> Bryan
>
> On Wed, Apr 22, 2015 at 5:37 AM, Andreas Kahl <ka...@bs...> wrote:
>
> Hello everyone,
>
> I just updated to the current revision (f4c63e5) of Blazegraph from
> Git and tried to load a dataset into the updated webapp. With Bigdata 1.4.0
> this resulted in a journal of ~18GB. Now the process was cancelled because
> the disk was full - the journal was beyond 50GB for the same file with the
> same settings.
> The only exception was that I activated GroupCommit.
>
> The dataset can be downloaded here:
> http://datendienst.dnb.de/cgi-bin/mabit.pl?cmd=fetch&userID=opendata&pass=opendata&mabheft=GND.rdf.gz
> Please find the settings used to load the file below.
>
> Do I have a misconfiguration, or is there a bug eating all disk space?
>
> Best regards
> Andreas
>
> Namespace-Properties:
> curl -H "Accept: text/plain" http://localhost:8080/bigdata/namespace/gnd/properties
> #Wed Apr 22 11:35:31 CEST 2015
> com.bigdata.namespace.kb.spo.com.bigdata.btree.BTree.branchingFactor=700
> com.bigdata.relation.container=gnd
> com.bigdata.rwstore.RWStore.smallSlotType=1024
> com.bigdata.journal.AbstractJournal.bufferMode=DiskRW
> com.bigdata.journal.AbstractJournal.file=/var/lib/bigdata/bigdata.jnl
> com.bigdata.rdf.store.AbstractTripleStore.vocabu.textIndex=true
> com.bigdata.btree.BTree.branchingFactor=7ionService.minReleaseAge=1
> com.bigdata.rdf.sail.bufferCapacity=2000
> com.bigdata.rdf.sail.truthMaintenance=false
> com.bigdata.rdf.sail.namespace=gnd
> com.bigdata.relation.class=com.bigdata.rdf.store.LocalTripleStore
> com.bigdata.rdf.store.AbstractTripleStore.quads=false
> com.bigdata.journal.AbstractJournal.writeCacheBufferCount=500
> com.bigdata.search.FullTextIndex.fieldsEnabled=false
> com.bigdata.relation.namespace=gndity=10000
> com.bigdata.rdf.sail.BigdataSail.bufferCapacity=2000
> com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false
>
> _______________________________________________
> Bigdata-developers mailing list
> Big...@li...
> https://lists.sourceforge.net/lists/listinfo/bigdata-developers
|
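Bryan's advice in this thread (use the small slot optimization only when at least the O(C)SP index has a small branching factor) could be expressed as namespace properties along the following lines. This is a sketch under stated assumptions, not a tested configuration: the per-index key is extrapolated from the relation-level `com.bigdata.namespace.kb.spo.com.bigdata.btree.BTree.branchingFactor` key shown in the dumps above, and the branching factor passed in is purely illustrative. The class and method names are hypothetical.

```java
import java.util.Properties;

/** Hypothetical helper that assembles small-slot related namespace properties. */
class SmallSlotConfigSketch {

    /**
     * Builds the branching-factor override key for one index of the SPO relation.
     * ASSUMPTION: the per-index key inserts the index name after "spo"; the thread
     * only shows the relation-level form of this key.
     */
    static String branchingFactorKey(String namespace, String index) {
        return "com.bigdata.namespace." + namespace + ".spo." + index
                + ".com.bigdata.btree.BTree.branchingFactor";
    }

    /**
     * Returns properties that enable the small slot optimization (value 1024, as in
     * the thread) together with a reduced branching factor for the OSP index.
     */
    static Properties smallSlotProperties(String namespace, int ospBranchingFactor) {
        Properties p = new Properties();
        p.setProperty("com.bigdata.rwstore.RWStore.smallSlotType", "1024");
        p.setProperty(branchingFactorKey(namespace, "OSP"),
                Integer.toString(ospBranchingFactor));
        return p;
    }
}
```

For example, `SmallSlotConfigSketch.smallSlotProperties("kb", 64)` yields two entries. Whether a given branching factor actually produces slots of 256 bytes or less depends on the data, so the dumpJournal&dumpPages output discussed above remains the way to verify it.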
From: Kaushik C. <kay...@gm...> - 2015-04-27 07:12:36
|
Hi,

Please have a look at my question on Stack Overflow, where I ask why a SPARQL query does not give me the intended results when I run the NanoSparqlServer:

http://stackoverflow.com/questions/29806316/owlrestriction-reasoning-not-working-in-blazegraph

Thanks in advance.
|
From: Bryan T. <br...@sy...> - 2015-04-24 21:37:09
|
I think that this might be a bug in the postOrderIteratorWithAnnotations() method, which lacks a unit test. I've created a new ticket for that issue. See #1210. Martyn is the striterator wizard so I've asked him to take a look at this.

I have incorporated a version of your test case for the wildcard rewrites which I believe to be correct into our master development branch.

The other way to fix this is by explicit recursion in the ASTWildcardProjectionOptimizer. That is actually how most of the rewrites are implemented, which is why we are not using the postOrderIteratorWithAnnotations() elsewhere and why this has gone undetected.

I am going to be on travel for several days. Hopefully this bug will resolve as soon as the postOrderIteratorWithAnnotations() issue is resolved.

Thanks,
Bryan
On Wed, Apr 22, 2015 at 5:41 AM, Lee Kitching <le...@sw...> wrote:

> Hi Bryan,
>
> Yes the AST in the test is supposed to be for the query
>
> select (count(*) as ?c) where {
>   select * where {
>     select * where { ?s ?p ?o }
>   } limit 21 offset 0
> }
>
> Thanks
>
> On Tue, Apr 21, 2015 at 7:53 PM, Bryan Thompson <br...@sy...> wrote:
>
>> Lee,
>>
>> I can replicate the problem with your query (as given above) against the
>> sparql end point.
>>
>> Can you state the SPARQL that you are trying to model with this unit
>> test? It appears to be not quite the same as your SPARQL query above. I
>> would like to make sure that it is being translated correctly into the
>> AST. I can then look at the expected AST and work backwards and see if I
>> believe that the test shows the problem.
>>
>> Thanks,
>> Bryan
>>
>> On Tue, Apr 21, 2015 at 11:07 AM, Lee Kitching <le...@sw...> wrote:
>>
>>> Hi Bryan,
>>>
>>> We allow users to enter their own SPARQL queries and wrap them to do
>>> things like pagination so unfortunately we cannot just re-write our queries
>>> to do the expansion manually.
>>> I applied the fix detailed in the ticket and it fixes the problem for the query
>>> I provided, however it fails to rewrite the following query:
>>>
>>> SELECT (COUNT(*) as ?c) {
>>>   SELECT * {
>>>     SELECT * WHERE { ?s ?p ?o }
>>>   } LIMIT 21 OFFSET 0
>>> }
>>>
>>> I attempted to debug the issue, and it seems to re-write the *
>>> projection in the inner-most subquery but not the subquery with the limit
>>> and offset. I created a test based on the
>>> existing tests:
>>>
>>> public void test_wildcardProjectionOptimizer03() {
>>>
>>>     /*
>>>      * Note: DO NOT share structures in this test!!!!
>>>      */
>>>     final IBindingSet[] bsets = new IBindingSet[] {};
>>>
>>>     // The source AST.
>>>     final QueryRoot given = new QueryRoot(QueryType.SELECT);
>>>     {
>>>         final SubqueryRoot selectQuery = new SubqueryRoot(QueryType.SELECT);
>>>         {
>>>             final JoinGroupNode whereClause1 = new JoinGroupNode();
>>>             final StatementPatternNode spoPattern = new StatementPatternNode(
>>>                     new VarNode("s"), new VarNode("p"), new VarNode("o"),
>>>                     null, Scope.DEFAULT_CONTEXTS);
>>>             whereClause1.addChild(spoPattern);
>>>
>>>             final ProjectionNode p = new ProjectionNode();
>>>             p.addProjectionVar(new VarNode("*"));
>>>             selectQuery.setProjection(p);
>>>             selectQuery.setWhereClause(whereClause1);
>>>         }
>>>
>>>         final SubqueryRoot sliceQuery = new SubqueryRoot(QueryType.SELECT);
>>>         {
>>>             final ProjectionNode p = new ProjectionNode();
>>>             p.addProjectionVar(new VarNode("*"));
>>>             sliceQuery.setProjection(p);
>>>
>>>             final JoinGroupNode whereClause = new JoinGroupNode();
>>>             whereClause.addChild(selectQuery);
>>>
>>>             sliceQuery.setSlice(new SliceNode(0, 21));
>>>         }
>>>
>>>         final FunctionNode countNode = new FunctionNode(
>>>                 FunctionRegistry.COUNT,
>>>                 Collections.EMPTY_MAP,
>>>                 new VarNode("*"));
>>>
>>>         final ProjectionNode countProjection = new ProjectionNode();
>>>         countProjection.addProjectionExpression(
>>>                 new AssignmentNode(new VarNode("c"), countNode));
>>>
>>>         JoinGroupNode countWhere = new JoinGroupNode();
>>>         countWhere.addChild(sliceQuery);
>>>
>>>         given.setProjection(countProjection);
>>>         given.setWhereClause(countWhere);
>>>     }
>>>
>>>     final QueryRoot expected = new QueryRoot(QueryType.SELECT);
>>>     {
>>>         final SubqueryRoot selectQuery = new SubqueryRoot(QueryType.SELECT);
>>>         {
>>>             final JoinGroupNode whereClause1 = new JoinGroupNode();
>>>             final StatementPatternNode spoPattern = new StatementPatternNode(
>>>                     new VarNode("s"), new VarNode("p"), new VarNode("o"),
>>>                     null, Scope.DEFAULT_CONTEXTS);
>>>             whereClause1.addChild(spoPattern);
>>>
>>>             final ProjectionNode p = new ProjectionNode();
>>>             p.addProjectionVar(new VarNode("s"));
>>>             p.addProjectionVar(new VarNode("p"));
>>>             p.addProjectionVar(new VarNode("o"));
>>>             selectQuery.setProjection(p);
>>>             selectQuery.setWhereClause(whereClause1);
>>>         }
>>>
>>>         final SubqueryRoot sliceQuery = new SubqueryRoot(QueryType.SELECT);
>>>         {
>>>             final ProjectionNode p = new ProjectionNode();
>>>             p.addProjectionVar(new VarNode("s"));
>>>             p.addProjectionVar(new VarNode("p"));
>>>             p.addProjectionVar(new VarNode("o"));
>>>
>>>             sliceQuery.setProjection(p);
>>>
>>>             final JoinGroupNode whereClause = new JoinGroupNode();
>>>             whereClause.addChild(selectQuery);
>>>
>>>             sliceQuery.setSlice(new SliceNode(0, 21));
>>>         }
>>>
>>>         final FunctionNode countNode = new FunctionNode(
>>>                 FunctionRegistry.COUNT,
>>>                 Collections.EMPTY_MAP,
>>>                 new VarNode("*"));
>>>
>>>         final ProjectionNode countProjection = new ProjectionNode();
>>>         countProjection.addProjectionExpression(
>>>                 new AssignmentNode(new VarNode("c"), countNode));
>>>
>>>         JoinGroupNode countWhere = new JoinGroupNode();
>>>         countWhere.addChild(sliceQuery);
>>>
>>>         expected.setProjection(countProjection);
>>>         expected.setWhereClause(countWhere);
>>>     }
>>>
>>>     final IASTOptimizer rewriter = new ASTWildcardProjectionOptimizer();
>>>
>>>     final IQueryNode actual = rewriter.optimize(null/* AST2BOpContext */,
>>>             given/* queryNode */, bsets);
>>>
>>>     assertSameAST(expected, actual);
>>>
>>> }
>>>
>>> however I am having some problems running the tests locally so I don't
>>> know if it accurately models the situation.
>>>
>>> Thanks
>>>
>>> On Mon, Apr 20, 2015 at 9:05 PM, Bryan Thompson <br...@sy...>
>>> wrote:
>>>
>>>> Lee,
>>>>
>>>> I've updated the ticket with the code changes and the test changes.
>>>> Please try this out and let me know if you have any problems.
>>>>
>>>> Thanks,
>>>> Bryan
>>>> >>>> On Mon, Apr 20, 2015 at 1:20 PM, Lee Kitching <le...@sw...> wrote: >>>> >>>>> Hi, >>>>> >>>>> We are currently evaluating using Blazegraph as our rdf database and >>>>> have run in the issue described at http://trac.bigdata.com/ticket/757. >>>>> The below query causes the AssertionError to be thrown: >>>>> >>>>> SELECT (COUNT(*) as ?c) { >>>>> SELECT ?uri ?graph where { >>>>> { >>>>> SELECT * WHERE { >>>>> GRAPH ?graph { >>>>> ?uri a <http://object> . >>>>> ?uri <http://purl.org/dc/terms/title> ?title . >>>>> } >>>>> MINUS { >>>>> ?uri a <http://other> >>>>> } >>>>> } >>>>> ORDER BY ?title >>>>> } >>>>> } >>>>> } >>>>> >>>>> Some debugging shows that the error is caused by the >>>>> ASTWildcardProjectionOptimizer failing to recurse into the subqueries to >>>>> rewrite the * projection. However this recursion is implemented in the >>>>> BOpUtility.postOrderIterator(BOp) method - this method uses the argIterator >>>>> to >>>>> find child operators and therefore only visits children for nodes with >>>>> an arity > 0. >>>>> >>>>> The root query node for the above query has an empty 'args' collection >>>>> and all the associated components of the top-level query are stored in the >>>>> annotations map. It looks like the iterator should search through the >>>>> annotations rather than the args for query nodes. >>>>> >>>>> As there are a lot of implementations of the BOp interface, it seems >>>>> that changing the postOrderIterator2(BOp) method is unlikely to be the >>>>> correct fix. It seems that either the AST query nodes should override the >>>>> arity() function to return the count of the annotations map, or the >>>>> ASTWildcardProjectionOptimizer should use its own iterator for the nodes of >>>>> the query. The latter option would be the least impactful change but I am >>>>> not familiar with the codebase to understand the correct fix. >>>>> >>>>> Any help in resolving the issue would be appreciated. 
>>>>>
>>>>> _______________________________________________
>>>>> Bigdata-developers mailing list
>>>>> Big...@li...
>>>>> https://lists.sourceforge.net/lists/listinfo/bigdata-developers
|
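The traversal problem described in the quoted report above (children reachable only through annotations are skipped by a walk over args) can be illustrated with a small, self-contained sketch. Everything here is hypothetical: Node, postOrderArgsOnly, postOrderWithAnnotations and sampleQuery are stand-ins invented for illustration and are not Blazegraph's BOp interface or BOpUtility methods.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AnnotationTraversalSketch {

    /** Hypothetical stand-in for an operator node; NOT Blazegraph's BOp. */
    public static final class Node {
        final String name;
        final List<Node> args = new ArrayList<>();
        final Map<String, Object> annotations = new LinkedHashMap<>();

        Node(String name) {
            this.name = name;
        }
    }

    /** Post-order walk over args only: nodes held in annotations are never reached. */
    public static void postOrderArgsOnly(Node n, List<String> out) {
        for (Node child : n.args) {
            postOrderArgsOnly(child, out);
        }
        out.add(n.name);
    }

    /** Post-order walk that also descends into annotation values that are nodes. */
    public static void postOrderWithAnnotations(Node n, List<String> out) {
        for (Node child : n.args) {
            postOrderWithAnnotations(child, out);
        }
        for (Object value : n.annotations.values()) {
            if (value instanceof Node) {
                postOrderWithAnnotations((Node) value, out);
            }
        }
        out.add(n.name);
    }

    /** A root with no args whose where clause (and nested subquery) hangs off an annotation. */
    public static Node sampleQuery() {
        Node subquery = new Node("subquery SELECT *");
        Node where = new Node("where");
        where.args.add(subquery);
        Node root = new Node("root query");
        root.annotations.put("whereClause", where);
        return root;
    }

    public static void main(String[] args) {
        List<String> argsOnly = new ArrayList<>();
        postOrderArgsOnly(sampleQuery(), argsOnly);
        List<String> withAnnotations = new ArrayList<>();
        postOrderWithAnnotations(sampleQuery(), withAnnotations);
        System.out.println("args only:        " + argsOnly);
        System.out.println("with annotations: " + withAnnotations);
    }
}
```

With this toy tree the args-only walk reaches just the root, so an optimizer built on it would never see the nested SELECT *. Whether the right fix in the real code base is a different iterator or a change to arity() is a question for the ticket, not something this sketch settles.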
From: Bryan T. <br...@sy...> - 2015-04-24 17:11:33
|
We have not been using it ourselves. No known issues.

The code base is still at Java 7 compatibility. We have been discussing when to move the code base to Java 8. Personally, I think that this will happen in the 2.0 release.

On Fri, Apr 24, 2015 at 12:44 PM, Jeremy J Carroll <jj...@sy...> wrote:
> Is Java 8 supported?
> Any issues?
>
> Jeremy
|
From: Jeremy J C. <jj...@sy...> - 2015-04-24 17:08:47
|
Is Java 8 supported? Any issues?

Jeremy
|
From: Bryan T. <br...@sy...> - 2015-04-24 16:44:43
|
Martyn and I discussed this in some depth today. We've reopened the ticket to:

a. gain more understanding of the interaction of the small slot optimization and group commit.
b. verify correct reporting by the allocators in dumpJournal.
c. modify the small slots optimization allocator policy to make it less susceptible to mis-configuration.

In the data as loaded, the OSP index was 66% blob slots (greater than 8k). For the small slot optimization to be effective the O(C)SP index should target a page size of 64-256 bytes. (c) should minimize or remove the negative impact of the small slot optimization in such cases.

Thanks,
Bryan

On Fri, Apr 24, 2015 at 8:35 AM, Martyn Cutcher <ma...@sy...> wrote:
> I don't see how the small slot optimisation can result in more waste with
> larger allocators.
>
> It is simply a mechanism to avoid rapid re-allocation of the small slot
> allocators to attempt to improve write elision on recycled slots.
>
> In the latest Allocator dump, there are a lot of 64 byte allocators.
> Unlike the larger allocators (128 and greater) a large proportion of the 64
> byte slots will be used for long literal values (note that the mean
> allocation is only 27 bytes).
>
> Counterintuitively, there may well be a case for excluding the 64 byte
> allocators from the "small slot optimisation". So "small slot" NOT
> "smallest slot" ;-)
>
> - Martyn
>
> On 24/04/2015 00:18, Bryan Thompson wrote:
>
> I've updated the ticket. I've also copied my main conclusions inline below.
>
> I think that the issue here is the use of the small slot optimization
> without proper configuration of the indices in order to target small
> allocation slots for at least one of the indices. The small slot
> optimization changes the allocation policy in two ways.
>
> 1. It has a strong preference to use only empty 8k pages for small
> allocations (as configured, for allocations less than 1k). This allows us
> to coalesce writes by combining them onto the same page.
> 2. It has a preference to use allocation blocks that are relatively empty
> for small slots.
>
> As a consequence, the small slot optimization MAY recruit more allocators
> in order to have allocators for small slots that have good sparsity.
>
> The main goal of the small slot optimization is to optimize for indices
> that have very scattered IO patterns. The indices that exhibit this the
> most are the OSP and OCSP indices. In many cases even batched updates will
> modify no more than a single tuple per page on this index. However, in
> your configuration (and in mine when I enabled the small slot optimization
> without adjusting the branching factors), the O(C)SP indices were not
> created with a small branching factor, so the small slot allocation could
> not be put to any good effect. However it did have a negative effect -- by
> recruiting more allocators. If you want to use the small slot
> optimization, make sure that at least the O(C)SP index has a relatively
> small branching factor giving an effective slot size of 256 bytes or less
> on average.
>
> I suggest that you retest w/o the small slot optimization and with group
> commit still enabled.
>
> I've asked Martyn to look over the allocators from the small slot
> optimization run and think about whether we can make this policy a little
> more adaptive when the branching factors are not really tuned properly and
> too many allocators with too much wasted space are allocated as a result.
> Basically, how to avoid file bloat from misconfiguration.
>
> Thanks,
> Bryan
>
> On Thu, Apr 23, 2015 at 9:36 AM, Andreas Kahl <ka...@bs...> wrote:
>
> Ok, I can redo the test with smallSlots + groupCommit enabled, and run
> http://localhost:8080/bigdata/status?dumpJournal&dumpPages after some
> minutes. (I cannot run it on the fully loaded dataset because my disk is
> not sufficient for the resulting Journal).
>
> By the way: Please find attached my custom Vocabulary classes. They are
> just one of my many attempts to improve IO Performance on rotating disks.
>
> Best Regards
> Andreas
>
> >>> Bryan Thompson <br...@sy...> 23.04.15 15.31 Uhr >>>
>
> I just noticed that you have the full text index enabled as well. I have
> not been enabling that.
>
> I would like to see the output from this command on the fully loaded data
> sets.
>
> http://localhost:8080/bigdata/status?dumpJournal&dumpPages
>
> This will let us see if any specific index is taking up a very large number of
> pages. It will also tell us the distribution over the page sizes for each
> index.
>
> Bryan
>
> On Thu, Apr 23, 2015 at 8:54 AM, Andreas Kahl <ka...@bs...> wrote:
>
> Bryan,
>
> in the meantime, I could successfully load the file into a 18GB journal
> after disabling groupCommit (I simply commented out the line in
> RWStore.properties).
> I can try again with groupCommit enabled, but smallSlotOptimization
> disabled.
>
> Best Regards
> Andreas
>
> >>> Bryan Thompson <br...@sy...> 23.04.2015 13:24 >>>
>
> Andreas,
>
> I was not able to replicate your result. Unfortunately I navigated away
> from the browser page in which I had submitted the request, so it loaded
> all the data but failed to commit. However, the resulting file is only
> 16GB.
>
> I will redo this run and verify that the journal after the commit has this
> same size on the disk.
>
> I was only assuming that this was related to group commit because of your
> original message. Perhaps I misinterpreted your message. This is simply
> about 1.5.1 (with group commit) vs 1.4.0.
>
> Perhaps the issue is related to the small slot optimization? Maybe in
> combination with group commit?
>
> *> com.bigdata.rwstore.RWStore.smallSlotType=1024*
>
> I could not replicate your properties exactly because you are using a
> non-standard vocabulary class. Therefore I simply deleted the default
> namespace (in quads mode) and recreated it with the defaults in triples
> mode. The small slot optimization and other parameters were not enabled in
> my run.
>
> Perhaps you could try to replicate my experience and I will enable the
> small slots optimization?
>
> Thanks,
> Bryan
>
> On Thu, Apr 23, 2015 at 1:51 AM, Andreas Kahl <ka...@bs...> wrote:
>
> Bryan & Martyn,
>
> Thank you very much for investigating the issue. I assume from the ticket
> that the error will vanish if I disable groupCommit. I will do so for the
> meantime.
>
> Although there is already extensive information in Bryan's ticket, please
> find attached my logs and DumpJournal outputs:
> - dumpJournal.html contains a dump from the 67GB journal after Blazegraph
> ran into "No space left on device"
> - dumpJournalWithTraceEnabled.html is the same dump for a running query
> when the journal was at about 14GB
> - queryStatus.html is just the status page showing my query
> - catalina.out.gz contains the trace outputs from starting Tomcat until I
> killed the curl running the SPARQL Update by Ctrl-C
> - loadGnd.log.gz is Blazegraphs output when loading the data
>
> Best Regards
> Andreas
>
> >>> Bryan Thompson <br...@sy...> 22.04.15 20.56 Uhr >>>
>
> See http://trac.bigdata.com/ticket/1206. This is still in the
> investigation stage.
>
> Thanks,
> Bryan
>
> On Wed, Apr 22, 2015 at 5:37 AM, Andreas Kahl <ka...@bs...> wrote:
>
> Hello everyone,
>
> I currently updated to the current Revision (f4c63e5) of Blazegraph from
> Git and tried to load a dataset into the updated Webapp. With Bigdata 1.4.0
> this resulted in a journal of ~18GB. Now the process was cancelled because
> the disk was full - the journal was beyond 50GB for the same file with the
> same settings.
> The only exception was that I activated GroupCommit.
>
> The dataset can be downloaded here:
> http://datendienst.dnb.de/cgi-bin/mabit.pl?cmd=fetch&userID=opendata&pass=opendata&mabheft=GND.rdf.gz
> Please find the settings used to load the file below.
>
> Do I have a misconfiguration, or is there a bug eating all disk memory?
>
> Best regards
> Andreas
>
> Namespace-Properties:
> curl -H "Accept: text/plain" http://localhost:8080/bigdata/namespace/gnd/properties
> #Wed Apr 22 11:35:31 CEST 2015
> com.bigdata.namespace.kb.spo.com.bigdata.btree.BTree.branchingFactor=700
> com.bigdata.relation.container=gnd
> com.bigdata.rwstore.RWStore.smallSlotType=1024
> com.bigdata.journal.AbstractJournal.bufferMode=DiskRW
> com.bigdata.journal.AbstractJournal.file=/var/lib/bigdata/bigdata.jnl
> com.bigdata.rdf.store.AbstractTripleStore.vocabu.textIndex=true
> com.bigdata.btree.BTree.branchingFactor=700
> com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.NoAxioms
> com.bigdata.rdf.sail.isolatableIndices=false
> com.bigdata.service.AbstractTransactionService.minReleaseAge=1
> com.bigdata.rdf.sail.bufferCapacity=2000
> com.bigdata.rdf.sail.truthMaintenance=false
> com.bigdata.rdf.sail.namespace=gnd
> com.bigdata.relation.class=com.bigdata.rdf.store.LocalTripleStore
> com.bigdata.rdf.store.AbstractTripleStore.quads=false
> com.bigdata.journal.AbstractJournal.writeCacheBufferCount=500
> com.bigdata.search.FullTextIndex.fieldsEnabled=false
> com.bigdata.relation.namespace=gndity=10000
> com.bigdata.rdf.sail.BigdataSail.bufferCapacity=2000
> com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false
|
From: Bryan T. <br...@sy...> - 2015-04-24 16:35:32
|
Rick,

I would recommend that you model your problem a bit differently. I will give some suggestions as to how you might do this, but first let me explain how we handle such moves and storage reclamation.

- Blazegraph does a COPY + DELETE for SPARQL UPDATE "MOVE". You might be able to hack this. I will outline how below.
- Blazegraph recycles storage. This is documented in some depth on the wiki, but the basic concept is that allocation slots are recycled once they no longer have data that is visible from a retained commit point.

Let me suggest some ways in which you might achieve your goals without a performance penalty. As I see it, you are basically trying to change the state associated with a named graph as you move it along in some workflow.

The first two options would require you to manage metadata (in yet another graph) mapping workflow state URIs onto fixed URIs associated with a named graph. When you change the workflow state, you are just changing the mapping between the external URI and the fixed URI naming the graph internally. Either of these approaches would give you constant time "renames".

1. Use named graphs. But, per above, do the rename outside of the quads store. You can either use a special named graph to hold this mapping or you can have yet another graph in the database that has this mapping. We even have support for "virtual graphs" that might let you do this out of the box. See http://wiki.blazegraph.com/wiki/index.php/VirtualGraphs

2. Use multiple triple stores. SPARQL "quads" (named graphs) provides the ability to transparently query across the named graphs, either extracting their identifiers (using the GRAPH keyword for a named graph access path) or collapsing duplicate statements onto distinct statements (for a default graph access path). If each of these named graphs is really just being used as its own triple store, then you can have many different triple stores in a single Blazegraph instance. Just put each one into its own namespace.

The 3rd approach is more in the spirit of hacking the rename.

3. Hack the rename. Ok, you effectively want to change the name of the graph. Internally each statement in a graph has an IV (Internal Value) in the 4th position of the statement tuple that is the graph identifier. If you need to modify those IVs, then you are going to touch a lot of data. That is not a constant time operation. The alternative is to hack the dictionary. You would *replace* the entry in the TERM2ID dictionary (mapping the URI onto an IV) with a different entry mapping the new URI onto the same IV. You would also update the reverse lookup (in ID2TERM). The old URI will "disappear". The new URI will be mapped to the data associated with the old URI. This would be a constant time operation. However, it WILL NOT work if the new URI is already defined, since it would then orphan any data associated with the IV for the new URI. If your URIs are always new when you do this "rename" then you could use this mechanism. We could not make this a general purpose rename. We could perhaps do this rename if we could prove that the new URI was not pre-existing through some clever code. Either we or you could implement this as a special operator for your application.

Let me know if you want to set up a telcon to discuss any of this.

Thanks,
Bryan

On Fri, Apr 24, 2015 at 7:51 AM, Rick Moynihan <ri...@sw...> wrote:
> Hi all,
>
> We've recently been evaluating quad-stores, and in particular are looking
> for better storage layers, and Blazegraph looks like a promising option.
>
> We have a linked data management system, which has several management
> workflows whereby:
>
> 1. large named graphs can be moved around (renamed via a SPARQL Update
> MOVE command).
>
> 2. large named graphs can be inserted, reviewed, deleted (repaired
> offline) and reinserted again before finally being approved.
>
> With this workflow there are two problems we have been finding with some
> of the other quad stores:
>
> The first is that renames are often implemented as a copy/delete; which
> results in a slow linear-time (or worse) operation. Ideally renaming
> graphs would be constant time.
>
> The second problem we have been encountering (which the first can
> compound) is that some stores don't free storage on deletions, and don't
> even have a mechanism for expunging deletions without taking the database
> offline.
>
> I'm curious as to what Blazegraph's behaviour is in these two
> circumstances, and whether or not the different journals have different
> behaviours.
>
> Many thanks,
>
> R.
|
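The dictionary-based rename outlined in option 3 of the reply above can be sketched with two in-memory maps. This is only an illustration under stated assumptions: the class DictionaryRenameSketch, its methods, the example URIs and the use of a long as the internal value are all hypothetical, and nothing here is Blazegraph's actual TERM2ID/ID2TERM index API. It shows why the rename touches a fixed number of dictionary entries, and why it has to be refused when the new URI is already defined.

```java
import java.util.HashMap;
import java.util.Map;

public class DictionaryRenameSketch {

    // Hypothetical in-memory stand-ins for the forward and reverse dictionaries.
    private final Map<String, Long> term2id = new HashMap<>();
    private final Map<Long, String> id2term = new HashMap<>();
    private long nextId = 1;

    /** Returns the existing id for a URI, or assigns a fresh one. */
    public long intern(String uri) {
        Long id = term2id.get(uri);
        if (id == null) {
            id = nextId++;
            term2id.put(uri, id);
            id2term.put(id, uri);
        }
        return id;
    }

    /** Forward lookup; null if the URI is not in the dictionary. */
    public Long lookup(String uri) {
        return term2id.get(uri);
    }

    /** Reverse lookup; null if the id is unknown. */
    public String resolve(long id) {
        return id2term.get(id);
    }

    /**
     * Re-points the id of oldUri at newUri. Only dictionary entries change,
     * no matter how many statements carry the id.
     */
    public void rename(String oldUri, String newUri) {
        // Refuse if newUri already has an id: its existing data would be orphaned.
        if (term2id.containsKey(newUri)) {
            throw new IllegalStateException("new URI already defined: " + newUri);
        }
        Long id = term2id.remove(oldUri);
        if (id == null) {
            throw new IllegalArgumentException("unknown URI: " + oldUri);
        }
        term2id.put(newUri, id);
        id2term.put(id, newUri); // overwrite the reverse entry
    }

    public static void main(String[] args) {
        DictionaryRenameSketch dict = new DictionaryRenameSketch();
        long id = dict.intern("http://example.org/graph/draft");
        dict.rename("http://example.org/graph/draft", "http://example.org/graph/live");
        System.out.println(id + " -> " + dict.resolve(id));
    }
}
```

Statements that reference the graph by its id are untouched by rename, which is the point of the approach. A real implementation would also have to make the two dictionary updates atomic with respect to concurrent readers and writers, which this sketch does not attempt.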
From: Martyn C. <ma...@sy...> - 2015-04-24 12:35:30
|
I don't see how the small slot optimisation can result in more waste with larger allocators.

It is simply a mechanism to avoid rapid re-allocation of the small slot allocators to attempt to improve write elision on recycled slots.

In the latest Allocator dump, there are a lot of 64 byte allocators. Unlike the larger allocators (128 and greater) a large proportion of the 64 byte slots will be used for long literal values (note that the mean allocation is only 27 bytes).

Counterintuitively, there may well be a case for excluding the 64 byte allocators from the "small slot optimisation". So "small slot" NOT "smallest slot" ;-)

- Martyn

On 24/04/2015 00:18, Bryan Thompson wrote:
> I've updated the ticket. I've also copied my main conclusions inline below.
>
> I think that the issue here is the use of the small slot optimization
> without proper configuration of the indices in order to target small
> allocation slots for at least one of the indices. The small slot
> optimization changes the allocation policy in two ways.
>
> 1. It has a strong preference to use only empty 8k pages for small
> allocations (as configured, for allocations less than 1k). This allows us
> to coalesce writes by combining them onto the same page.
> 2. It has a preference to use allocation blocks that are relatively empty
> for small slots.
>
> As a consequence, the small slot optimization MAY recruit more allocators
> in order to have allocators for small slots that have good sparsity.
>
> The main goal of the small slot optimization is to optimize for indices
> that have very scattered IO patterns. The indices that exhibit this the
> most are the OSP and OCSP indices. In many cases even batched updates will
> modify no more than a single tuple per page on this index.
However, in > your configuration (and in mine when I enabled the small slot optimization > without adjusting the branching factors), the O(C)SP indices were not > created with a small branching factor, so the small slot allocation could > not be put to any good effect. However it did have a negative effect -- by > recruiting more allocators. If you want to use the small slot > optimization, make sure that at least the O(C)SP index has a relatively > small branching factor giving an effective slot size of 256 bytes or less > on average. > > I suggest that you retest w/o the small slot optimization and with group > commit still enabled. > > I've asked Martyn to look over the allocators from the small slot > optimization run and think about whether we can make this policy a little > more adaptive when the branching factors are not really tuned properly and > too many allocators with too much wasted space are allocated as a result. > Basically, how to avoid file bloat from misconfiguration. > > Thanks, > Bryan > > ---- > Bryan Thompson > Chief Scientist & Founder > SYSTAP, LLC > 4501 Tower Road > Greensboro, NC 27410 > br...@sy... > http://blazegraph.com > http://blog.bigdata.com <http://bigdata.com> > http://mapgraph.io > > Blazegraph™ <http://www.blazegraph.com/> is our ultra high-performance > graph database that supports both RDF/SPARQL and Tinkerpop/Blueprints > APIs. MapGraph™ <http://www.systap.com/mapgraph> is our disruptive new > technology to use GPUs to accelerate data-parallel graph analytics. > > CONFIDENTIALITY NOTICE: This email and its contents and attachments are > for the sole use of the intended recipient(s) and are confidential or > proprietary to SYSTAP. Any unauthorized review, use, disclosure, > dissemination or copying of this email or its contents or attachments is > prohibited. 
If you have received this communication in error, please notify > the sender by reply email and permanently delete all copies of the email > and its contents and attachments. > > On Thu, Apr 23, 2015 at 9:36 AM, Andreas Kahl <ka...@bs...> wrote: > >> Ok, I can redo the test with smallSlots + groupCommit enabled, and run >> http://localhost:8080/bigdata/status?dumpJournal&dumpPages after some >> minutes. (I cannot run it on the fully loaded dataset because my disk is >> not sufficient for the resulting Journal). >> >> By the way: Please find attached my custom Vocabulary classes. They are >> just one of my many attempts to improve IO Perfomance on rotating disks. >> >> Best Regards >> Andreas >> >>>>> Bryan Thompson <br...@sy...> 23.04.15 15.31 Uhr >>> >> I just noticed that you have the full text index enabled as well. I have >> not be enabling that. >> >> I would like to see the output from this command on the fully loaded data >> sets. >> >> http://localhost:8080/bigdata/status?dumpJournal&dumpPages >> >> This will let us if any specific index is taking up a very large number of >> pages. It will also tell us the distribution over the page sizes for each >> index. >> >> Bryan >> >> ---- >> Bryan Thompson >> Chief Scientist & Founder >> SYSTAP, LLC >> 4501 Tower Road >> Greensboro, NC 27410 >> br...@sy... >> http://blazegraph.com >> http://blog.bigdata.com <http://bigdata.com> >> http://mapgraph.io >> >> Blazegraph™ <http://www.blazegraph.com/> is our ultra high-performance >> graph database that supports both RDF/SPARQL and Tinkerpop/Blueprints >> APIs. MapGraph™ <http://www.systap.com/mapgraph> is our disruptive new >> technology to use GPUs to accelerate data-parallel graph analytics. >> >> CONFIDENTIALITY NOTICE: This email and its contents and attachments are >> for the sole use of the intended recipient(s) and are confidential or >> proprietary to SYSTAP. 
Any unauthorized review, use, disclosure, >> dissemination or copying of this email or its contents or attachments is >> prohibited. If you have received this communication in error, please notify >> the sender by reply email and permanently delete all copies of the email >> and its contents and attachments. >> >> On Thu, Apr 23, 2015 at 8:54 AM, Andreas Kahl <ka...@bs...> >> wrote: >> >>> Bryan, >>> >>> in the meantime, I could successfully load the file into a 18GB journal >>> after disabling groupCommit (I simply commented out the line in >>> RWStore.properties). >>> I can try again with groupCommit enabled, but smallSlotOptimization >>> disabled. >>> >>> Best Regards >>> Andreas >>> >>>>>> Bryan Thompson <br...@sy...> 23.04.2015 13:24 >>> >>> Andreas, >>> >>> I was not able to replicate your result. Unfortunately I navigated away >>> from the browser page in which I had submitted the request, so it loaded >>> all the data but failed to commit. However, the resulting file is only >>> 16GB. >>> >>> I will redo this run and verify that the journal after the commit has >> this >>> same size on the disk. >>> >>> I was only assuming that this was related to group commit because of your >>> original message. Perhaps I misinterpreted your message. This is simply >>> about 1.5.1 (with group commit) vs 1.4.0. >>> >>> Perhaps the issue is related to the small slot optimization? Maybe in >>> combination with group commit? >>> >>> *> com.bigdata.rwstore.RWStore.smallSlotType=1024* >>> >>> I could not replicate your properties exactly because you are using a >>> non-standard vocabulary class. Therefore I simply deleted the default >>> namespace (in quads mode) and recreated it with the defaults in triples >>> mode. The small slot optimization and other parameters were not enabled >> in >>> my run. >>> >>> Perhaps you could try to replicate my experience and I will enable the >>> small slots optimization? 
>>>
>>> Thanks,
>>> Bryan
>>>
>>> On Thu, Apr 23, 2015 at 1:51 AM, Andreas Kahl <ka...@bs...> wrote:
>>>
>>>> Bryan & Martyn,
>>>>
>>>> Thank you very much for investigating the issue. I assume from the ticket
>>>> that the error will vanish if I disable groupCommit. I will do so for the
>>>> meantime.
>>>> >>>> Although there is already extensive information in Bryan's ticket, >> please >>>> find attached my logs and DumpJournal outputs: >>>> - dumpJournal.html contains a dump from the 67GB journal after >> Blazegraph >>>> ran into "No space left on device" >>>> - dumpJournalWithTraceEnabled.html is the same dump for a running query >>>> when the journal was at about 14GB >>>> - queryStatus.html is just the status page showing my query >>>> - catalina.out.gz contains the trace outputs from starting Tomcat >> until I >>>> killed the curl running the SPARQL Update by Ctrl-C >>>> - loadGnd.log.gz is Blazegraphs output when loading the data >>>> >>>> Best Regards >>>> Andreas >>>> >>>> >>>> >>>>>>> Bryan Thompson <br...@sy...> 22.04.15 20.56 Uhr >>> >>>> See http://trac.bigdata.com/ticket/1206. This is still in the >>>> investigation stage. >>>> >>>> Thanks, >>>> Bryan >>>> >>>> ---- >>>> Bryan Thompson >>>> Chief Scientist & Founder >>>> SYSTAP, LLC >>>> 4501 Tower Road >>>> Greensboro, NC 27410 >>>> br...@sy... >>>> http://blazegraph.com >>>> http://blog.bigdata.com <http://bigdata.com> >>>> http://mapgraph.io >>>> >>>> Blazegraph™ <http://www.blazegraph.com/> is our ultra high-performance >>>> graph database that supports both RDF/SPARQL and Tinkerpop/Blueprints >>>> APIs. MapGraph™ <http://www.systap.com/mapgraph> is our disruptive >> new >>>> technology to use GPUs to accelerate data-parallel graph analytics. >>>> >>>> CONFIDENTIALITY NOTICE: This email and its contents and attachments >> are >>>> for the sole use of the intended recipient(s) and are confidential or >>>> proprietary to SYSTAP. Any unauthorized review, use, disclosure, >>>> dissemination or copying of this email or its contents or attachments >> is >>>> prohibited. If you have received this communication in error, please >>> notify >>>> the sender by reply email and permanently delete all copies of the >> email >>>> and its contents and attachments. 
>>>>
>>>> On Wed, Apr 22, 2015 at 5:37 AM, Andreas Kahl <ka...@bs...> wrote:
>>>>
>>>>> Hello everyone,
>>>>>
>>>>> I currently updated to the current Revision (f4c63e5) of Blazegraph from
>>>>> Git and tried to load a dataset into the updated Webapp. With Bigdata 1.4.0
>>>>> this resulted in a journal of ~18GB. Now the process was cancelled because
>>>>> the disk was full - the journal was beyond 50GB for the same file with the
>>>>> same settings.
>>>>> The only exception was that I activated GroupCommit.
>>>>>
>>>>> The dataset can be downloaded here:
>>>>> http://datendienst.dnb.de/cgi-bin/mabit.pl?cmd=fetch&userID=opendata&pass=opendata&mabheft=GND.rdf.gz
>>>>> Please find the settings used to load the file below.
>>>>>
>>>>> Do I have a misconfiguration, or is there a bug eating all disk memory?
>>>>>
>>>>> Best regards
>>>>> Andreas
>>>>>
>>>>> Namespace-Properties:
>>>>> curl -H "Accept: text/plain" http://localhost:8080/bigdata/namespace/gnd/properties
>>>>> #Wed Apr 22 11:35:31 CEST 2015
>>>>> com.bigdata.namespace.kb.spo.com.bigdata.btree.BTree.branchingFactor=700
>>>>> com.bigdata.relation.container=gnd
>>>>> com.bigdata.rwstore.RWStore.smallSlotType=1024
>>>>> com.bigdata.journal.AbstractJournal.bufferMode=DiskRW
>>>>> com.bigdata.journal.AbstractJournal.file=/var/lib/bigdata/bigdata.jnl
>>>>> com.bigdata.rdf.store.AbstractTripleStore.vocabularyClass=de.bsb_muenchen.bigdata.vocab.B3KatVocabulary
>>>>> com.bigdata.journal.AbstractJournal.initialExtent=209715200
>>>>> com.bigdata.rdf.store.AbstractTripleStore.textIndex=true
>>>>> com.bigdata.btree.BTree.branchingFactor=700
>>>>> com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.NoAxioms
>>>>> com.bigdata.rdf.sail.isolatableIndices=false
>>>>> com.bigdata.service.AbstractTransactionService.minReleaseAge=1
>>>>> com.bigdata.rdf.sail.bufferCapacity=2000
>>>>> com.bigdata.rdf.sail.truthMaintenance=false
>>>>> com.bigdata.rdf.sail.namespace=gnd
>>>>> com.bigdata.relation.class=com.bigdata.rdf.store.LocalTripleStore
>>>>> com.bigdata.rdf.store.AbstractTripleStore.quads=false
>>>>> com.bigdata.journal.AbstractJournal.writeCacheBufferCount=500
>>>>> com.bigdata.search.FullTextIndex.fieldsEnabled=false
>>>>> com.bigdata.relation.namespace=gndity=10000
>>>>> com.bigdata.rdf.sail.BigdataSail.bufferCapacity=2000
>>>>> com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false
|
From: Rick M. <ri...@sw...> - 2015-04-24 12:23:30
|
Hi all,

We've recently been evaluating quad stores, looking in particular for a better storage layer, and Blazegraph looks like a promising option.

We have a linked data management system with several management workflows in which:

1. large named graphs can be moved around (renamed via a SPARQL Update MOVE command);
2. large named graphs can be inserted, reviewed, deleted (repaired offline) and reinserted before finally being approved.

With these workflows, we have run into two problems with some of the other quad stores.

The first is that renames are often implemented as a copy followed by a delete, which makes them a slow, linear-time (or worse) operation. Ideally, renaming a graph would take constant time.

The second problem (which the first can compound) is that some stores don't free storage on deletion, and don't even have a mechanism for expunging deleted data without taking the database offline.

I'm curious what Blazegraph's behaviour is in these two circumstances, and whether the different journals behave differently.

Many thanks,
R. |
From: Bryan T. <br...@sy...> - 2015-04-23 23:18:20
|
I've updated the ticket. I've also copied my main conclusions inline below.

I think that the issue here is the use of the small slot optimization without configuring the indices so that at least one of them actually targets the small allocation slots.

The small slot optimization changes the allocation policy in two ways.

1. It has a strong preference to use only empty 8k pages for small allocations (as configured, allocations of less than 1k). This allows us to coalesce writes by combining them onto the same page.

2. It has a preference to use allocation blocks that are relatively empty for small slots.

As a consequence, the small slot optimization MAY recruit more allocators in order to have allocators for small slots that have good sparsity.

The main goal of the small slot optimization is to optimize for indices that have very scattered IO patterns. The indices that exhibit this the most are the OSP and OCSP indices. In many cases even batched updates will modify no more than a single tuple per page on these indices.

However, in your configuration (and in mine when I enabled the small slot optimization without adjusting the branching factors), the O(C)SP indices were not created with a small branching factor, so the small slot allocation could not be put to any good effect. It did, however, have a negative effect: it recruited more allocators.

If you want to use the small slot optimization, make sure that at least the O(C)SP index has a relatively small branching factor, giving an effective slot size of 256 bytes or less on average.

I suggest that you retest without the small slot optimization and with group commit still enabled.

I've asked Martyn to look over the allocators from the small slot optimization run and think about whether we can make this policy a little more adaptive when the branching factors are not tuned properly and too many allocators with too much wasted space are allocated as a result. Basically, how to avoid file bloat from misconfiguration.

Thanks,
Bryan |
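The configuration advice above can be sketched as follows. This is only an illustration under stated assumptions, not a tested Blazegraph setup: the smallSlotType key and the namespace-level branching factor key appear in the properties quoted in this thread, but the per-index key form (with OSP inserted after "spo"), the namespace name "kb" and the branching factor value 64 are assumptions that should be checked against the Blazegraph documentation. The class name and its methods are hypothetical. The sketch also works through the slot arithmetic: at 256 bytes per slot, 32 allocations can share one 8k page.

```java
import java.util.Properties;

/** Illustrative only: builds the properties discussed above without calling Blazegraph. */
final class SmallSlotConfigSketch {

    /** Page size mentioned in the message above (8k pages). */
    static final int PAGE_SIZE_BYTES = 8 * 1024;

    /** Number of fixed-size slots of the given size that fit on one 8k page. */
    static int slotsPerPage(final int slotSizeBytes) {
        if (slotSizeBytes <= 0) {
            throw new IllegalArgumentException("slot size must be positive");
        }
        return PAGE_SIZE_BYTES / slotSizeBytes;
    }

    /** ASSUMPTION: per-index override key; verify the exact form for your version. */
    static String branchingFactorKey(final String namespace, final String index) {
        return "com.bigdata.namespace." + namespace + ".spo." + index
                + ".com.bigdata.btree.BTree.branchingFactor";
    }

    /** Enables the small slot optimization and gives the OSP index a small branching factor. */
    static Properties smallSlotProperties(final String namespace, final int smallBranchingFactor) {
        final Properties p = new Properties();
        // Key and value taken from the configuration quoted in this thread.
        p.setProperty("com.bigdata.rwstore.RWStore.smallSlotType", "1024");
        // Hypothetical value: pick one that yields slots of roughly 256 bytes or less.
        p.setProperty(branchingFactorKey(namespace, "OSP"),
                Integer.toString(smallBranchingFactor));
        return p;
    }

    public static void main(final String[] args) {
        final Properties p = smallSlotProperties("kb", 64);
        p.forEach((k, v) -> System.out.println(k + "=" + v));
        System.out.println("256-byte slots per 8k page: " + slotsPerPage(256));
    }
}
```

In quads mode the corresponding index would be OCSP rather than OSP. The right branching factor depends on the data, so the dumpJournal&dumpPages output mentioned earlier in the thread is probably the better guide than any fixed number.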
From: Bryan T. <br...@sy...> - 2015-04-23 20:44:26
|
The NPE is on the return statement of the method below. getSPORelation() is returning null.

    final public IAccessPath<ISPO> getAccessPath(final IV s, final IV p,
            final IV o, final IV c, final RangeBOp range) {

        return getSPORelation() // NPE here: getSPORelation() returned null
                .getAccessPath(s, p, o, c, range);

    }

The code for getSPORelation() is below. It uses what amounts to a double-checked locking pattern to avoid synchronization in the common case where the value is already set on the atomic reference. abort(), create(), destroy() and this method can all set its value, but this is the only method that will set it to a non-null value.

    final public SPORelation getSPORelation() {

        if (spoRelationRef.get() == null) {

            /*
             * Note: double-checked locking pattern (mostly non-blocking). Only
             * synchronized if not yet resolved. The AtomicReference is reused
             * as the monitor to serialize the resolution of the SPORelation in
             * order to have that operation not contend with any other part of
             * the API.
             */
            synchronized (this) {

                if (spoRelationRef.get() == null) {

                    spoRelationRef.set((SPORelation) getIndexManager()
                            .getResourceLocator().locate(
                                    getNamespace() + "."
                                            + SPORelation.NAME_SPO_RELATION,
                                    getTimestamp()));

                }

            }

        }

        return spoRelationRef.get();

    }

    private final AtomicReference<SPORelation> spoRelationRef = new AtomicReference<SPORelation>();

The most likely interpretation is that the operation has been cancelled and this represents the asynchronous case where the spoRelationRef value was cleared by abort().

However, you might want to turn on logging @ INFO on the DefaultResourceLocator class. This is the class that is being called by the locate() call above. This *can* return null, but it should only return null if the index does not exist. That should not be the case when it is running a query against an existing triple store.

This might be related to #468 (rare interrupt of rangeCount during query on cluster). That is, it is possible that an interrupt is coming through in a race with the rangeCount() call, and you are seeing this NPE because the abort() is executed before the rangeCount(). The thread calling rangeCount() might have been interrupted without having observed the interrupt yet (it has not hit a lock or IO, etc.).

Both of these potential explanations raise further questions:

a. Why is abort() being called? (A rollback() of the connection running the query, or canceling the query, could do this.)

b. Why is an interrupt being raised (if we believe that abort() was called due to query termination by interrupt)? Is this to cancel the query? Or is it spurious?

So the question is whether this is a data race that is triggered by an intentional cancellation of the query (which could also be due to an error during query processing), a data race triggered by a spurious interrupt (which would be unpleasant), or something else.

Yes, it is worth looking into further.

Thanks,
Bryan |
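The race discussed in this message can be modelled in isolation. The sketch below is hypothetical and is not Blazegraph code: LazyRef, getRacy, getStable and clear are invented names, with clear() standing in for abort(). It shows why re-reading the AtomicReference on return can yield null if another thread clears it between the null check and the return, and one possible way to avoid that by returning the locally resolved value. The thread does not establish that this is what caused the reported NPE.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical model of a lazily resolved reference that another thread may clear. */
final class LazyRef<T> {

    private final AtomicReference<T> ref = new AtomicReference<>();
    private final Supplier<T> locator;

    LazyRef(final Supplier<T> locator) {
        this.locator = locator;
    }

    /**
     * Same shape as the method quoted above: the final ref.get() is a second
     * read, so a concurrent clear() between the check and the return makes
     * this method return null.
     */
    T getRacy() {
        if (ref.get() == null) {
            synchronized (this) {
                if (ref.get() == null) {
                    ref.set(locator.get());
                }
            }
        }
        return ref.get();
    }

    /**
     * Variant that returns the value it actually observed or resolved. It is
     * null only if the locator itself returned null.
     */
    T getStable() {
        T value = ref.get();
        if (value == null) {
            synchronized (this) {
                value = ref.get();
                if (value == null) {
                    value = locator.get();
                    ref.set(value);
                }
            }
        }
        return value;
    }

    /** Stands in for abort(): discards the cached value. */
    void clear() {
        ref.set(null);
    }

    public static void main(final String[] args) {
        final LazyRef<String> r = new LazyRef<>(() -> "resolved");
        System.out.println(r.getStable());
        r.clear();
        System.out.println(r.getRacy());
    }
}
```

Note the trade-off: the stable variant avoids the NPE but may hand back a value that a concurrent clear() meant to discard. Whether that is acceptable depends on what abort() is supposed to guarantee, which is a design question for the real code rather than something this sketch settles.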
From: Stas M. <sma...@wi...> - 2015-04-23 20:15:20
|
Hi!

I've encountered an NPE running Blazegraph with our data update tool. The dump is here:
https://gist.github.com/smalyshev/6b8b318c8449bfb837e1

This seems to be random (the same query runs again with no issue). It happened under some load, but I have not been able to reproduce it since. I am still worried that it may hint at some bug. Any ideas on how to investigate it further, and is there a reason to worry?

Thanks,
--
Stas Malyshev
sma...@wi... |
From: Bryan T. <br...@sy...> - 2015-04-23 18:57:46
|
I will say that I am observing a lot of IO wait on that data set, even on an SSD (~10-20%). I am using just the out-of-the-box settings for a newly created kb. These are by no means optimal.

I would suggest a larger pool of write cache buffers in order to reduce the disk IO. The write cache buffers make it possible for index pages that are evicted and then modified again before they are actually written to the disk to skip the IO for the first modified version of the page. This can be quite a substantial saving for large data set loads.

Bryan

On Thu, Apr 23, 2015 at 9:30 AM, Andreas Kahl <ka...@bs...> wrote:

> Now I am 25mins into the new load with groupCommit enabled and
> com.bigdata.rwstore.RWStore.smallSlotType=1024 commented out.
> Currently 24,870,000 Triples are parsed and the journal is at 3.6GB. It
> looks like disabling smallSlotOptimization also resolves the problem
> (Otherwise I would have more than twice the space used at that time).
>
> So, I would conclude, it's the combination of groupCommit and
> smallSlotOptimization.
> > All tests were run on Blazegraph 1.5.1 from Git revision f4c63e5. > > Best Regards > Andreas > > > >>> "Andreas Kahl" <ka...@bs...> 23.04.2015 14:54 >>> > Bryan, > > in the meantime, I could successfully load the file into a 18GB journal > after disabling groupCommit (I simply commented out the line in > RWStore.properties). > I can try again with groupCommit enabled, but smallSlotOptimization > disabled. > > Best Regards > Andreas > > >>> Bryan Thompson <br...@sy...> 23.04.2015 13:24 >>> > Andreas, > > I was not able to replicate your result. Unfortunately I navigated away > from the browser page in which I had submitted the request, so it loaded > all the data but failed to commit. However, the resulting file is only > 16GB. > > I will redo this run and verify that the journal after the commit has this > same size on the disk. > > I was only assuming that this was related to group commit because of your > original message. Perhaps I misinterpreted your message. This is simply > about 1.5.1 (with group commit) vs 1.4.0. > > Perhaps the issue is related to the small slot optimization? Maybe in > combination with group commit? > > *> com.bigdata.rwstore.RWStore.smallSlotType=1024* > > I could not replicate your properties exactly because you are using a > non-standard vocabulary class. Therefore I simply deleted the default > namespace (in quads mode) and recreated it with the defaults in triples > mode. The small slot optimization and other parameters were not enabled in > my run. > > Perhaps you could try to replicate my experience and I will enable the > small slots optimization? > > Thanks, > Bryan > > ---- > Bryan Thompson > Chief Scientist & Founder > SYSTAP, LLC > 4501 Tower Road > Greensboro, NC 27410 > br...@sy... 
> http://blazegraph.com > http://blog.bigdata.com <http://bigdata.com> > http://mapgraph.io > > Blazegraph™ <http://www.blazegraph.com/> is our ultra high-performance > graph database that supports both RDF/SPARQL and Tinkerpop/Blueprints > APIs. MapGraph™ <http://www.systap.com/mapgraph> is our disruptive new > technology to use GPUs to accelerate data-parallel graph analytics. > > CONFIDENTIALITY NOTICE: This email and its contents and attachments are > for the sole use of the intended recipient(s) and are confidential or > proprietary to SYSTAP. Any unauthorized review, use, disclosure, > dissemination or copying of this email or its contents or attachments is > prohibited. If you have received this communication in error, please notify > the sender by reply email and permanently delete all copies of the email > and its contents and attachments. > > On Thu, Apr 23, 2015 at 1:51 AM, Andreas Kahl <ka...@bs...> > wrote: > > > Bryan & Martyn, > > > > Thank you very much for investigating the issue. I assume from the > ticket > > that the error will vanish if I disable groupCommit. I will do so for the > > meantime. > > > > Although there is already extensive information in Bryan's ticket, please > > find attached my logs and DumpJournal outputs: > > - dumpJournal.html contains a dump from the 67GB journal after Blazegraph > > ran into "No space left on device" > > - dumpJournalWithTraceEnabled.html is the same dump for a running query > > when the journal was at about 14GB > > - queryStatus.html is just the status page showing my query > > - catalina.out.gz contains the trace outputs from starting Tomcat until I > > killed the curl running the SPARQL Update by Ctrl-C > > - loadGnd.log.gz is Blazegraphs output when loading the data > > > > Best Regards > > Andreas > > > > > > > > >>> Bryan Thompson <br...@sy...> 22.04.15 20.56 Uhr >>> > > See http://trac.bigdata.com/ticket/1206. This is still in the > > investigation stage. 
> > > > Thanks, > > Bryan > > > > ---- > > Bryan Thompson > > Chief Scientist & Founder > > SYSTAP, LLC > > 4501 Tower Road > > Greensboro, NC 27410 > > br...@sy... > > http://blazegraph.com > > http://blog.bigdata.com <http://bigdata.com> > > http://mapgraph.io > > > > Blazegraph™ <http://www.blazegraph.com/> is our ultra high-performance > > graph database that supports both RDF/SPARQL and Tinkerpop/Blueprints > > APIs. MapGraph™ <http://www.systap.com/mapgraph> is our disruptive new > > technology to use GPUs to accelerate data-parallel graph analytics. > > > > CONFIDENTIALITY NOTICE: This email and its contents and attachments are > > for the sole use of the intended recipient(s) and are confidential or > > proprietary to SYSTAP. Any unauthorized review, use, disclosure, > > dissemination or copying of this email or its contents or attachments is > > prohibited. If you have received this communication in error, please > notify > > the sender by reply email and permanently delete all copies of the email > > and its contents and attachments. > > > > On Wed, Apr 22, 2015 at 5:37 AM, Andreas Kahl <ka...@bs...> > > wrote: > > > > > Hello everyone, > > > > > > I currently updated to the current Revision (f4c63e5) of Blazegraph > from > > > Git and tried to load a dataset into the updated Webapp. With Bigdata > > 1.4.0 > > > this resulted in a journal of ~18GB. Now the process was cancelled > > because > > > the disk was full - the journal was beyond 50GB for the same file with > > the > > > same settings. > > > The only exception was that I activated GroupCommit. > > > > > > The dataset can be downloaded here: > > > > > > http://datendienst.dnb.de/cgi-bin/mabit.pl?cmd=fetch&userID=opendata&pass=opendata&mabheft=GND.rdf.gz > > > . > > > Please find the settings used to load the file below. > > > > > > Do I have a misconfiguration, or is there a bug eating all disk memory? 
> > > > > > Best regards > > > Andreas > > > > > > Namespace-Properties: > > > curl -H "Accept: text/plain" > > > http://localhost:8080/bigdata/namespace/gnd/properties > > > #Wed Apr 22 11:35:31 CEST 2015 > > > > com.bigdata.namespace.kb.spo.com.bigdata.btree.BTree.branchingFactor=700 > > > com.bigdata.relation.container=gnd > > > com.bigdata.rwstore.RWStore.smallSlotType=1024 > > > com.bigdata.journal.AbstractJournal.bufferMode=DiskRW > > > com.bigdata.journal.AbstractJournal.file=/var/lib/bigdata/bigdata.jnl > > > > > > > > > com.bigdata.rdf.store.AbstractTripleStore.vocabularyClass=de.bsb_muenchen.bigdata.vocab.B3KatVocabulary > > > com.bigdata.journal.AbstractJournal.initialExtent=209715200 > > > com.bigdata.rdf.store.AbstractTripleStore.textIndex=true > > > com.bigdata.btree.BTree.branchingFactor=700 > > > > > > > > > com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.NoAxioms > > > com.bigdata.rdf.sail.isolatableIndices=false > > > com.bigdata.service.AbstractTransactionService.minReleaseAge=1 > > > com.bigdata.rdf.sail.bufferCapacity=2000 > > > com.bigdata.rdf.sail.truthMaintenance=false > > > com.bigdata.rdf.sail.namespace=gnd > > > com.bigdata.relation.class=com.bigdata.rdf.store.LocalTripleStore > > > com.bigdata.rdf.store.AbstractTripleStore.quads=false > > > com.bigdata.journal.AbstractJournal.writeCacheBufferCount=500 > > > com.bigdata.search.FullTextIndex.fieldsEnabled=false > > > com.bigdata.relation.namespace=gndity=10000 > > > com.bigdata.rdf.sail.BigdataSail.bufferCapacity=2000 > > > com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false > > > > > > > > > > > > ------------------------------------------------------------------------------ > > > BPM Camp - Free Virtual Workshop May 6th at 10am PDT/1PM EDT > > > Develop your own process in accordance with the BPMN 2 standard > > > Learn Process modeling best practices with Bonita BPM through live > > > exercises > > > 
http://www.bonitasoft.com/be-part-of-it/events/bpm-camp-virtual- > > > event?utm_ > > > source=Sourceforge_BPM_Camp_5_6_15&utm_medium=email&utm_campaign=VA_SF > > > _______________________________________________ > > > Bigdata-developers mailing list > > > Big...@li... > > > https://lists.sourceforge.net/lists/listinfo/bigdata-developers > > > > > > > > > > > > > ------------------------------------------------------------------------------ > BPM Camp - Free Virtual Workshop May 6th at 10am PDT/1PM EDT > Develop your own process in accordance with the BPMN 2 standard > Learn Process modeling best practices with Bonita BPM through live > exercises > http://www.bonitasoft.com/be-part-of-it/events/bpm-camp-virtual- > event?utm_ > source=Sourceforge_BPM_Camp_5_6_15&utm_medium=email&utm_campaign=VA_SF > _______________________________________________ > Bigdata-developers mailing list > Big...@li... > https://lists.sourceforge.net/lists/listinfo/bigdata-developers > |
From: Bryan T. <br...@sy...> - 2015-04-23 13:58:31
|
You should increase the buffer capacity to get better throughput. You specify two different names below. The actual name is the first (com.bigdata.rdf.sail.bufferCapacity). This specifies how many statements will be buffered in the BigdataSailConnection before the statements are incrementally evicted to the disk. For large loads, a value of 100000 or better is a good idea - as long as you do not encounter too much GC overhead.

> > > com.bigdata.rdf.sail.bufferCapacity=2000
> > > com.bigdata.rdf.sail.BigdataSail.bufferCapacity=2000

Thanks,
Bryan
|
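The buffer capacity advice in the message above can be sketched as a small check over a properties file. This is only an illustration: the helper names (parse_properties, tune_buffer_capacity) are hypothetical and not part of Blazegraph, and dropping the second key is an assumption based on the statement that the first name is the actual one.

```python
# Hypothetical helpers for illustration; these are not Blazegraph APIs.
CANONICAL_KEY = "com.bigdata.rdf.sail.bufferCapacity"
ALIAS_KEY = "com.bigdata.rdf.sail.BigdataSail.bufferCapacity"
SUGGESTED_MIN = 100000  # value suggested in the thread for large loads


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def tune_buffer_capacity(props, minimum=SUGGESTED_MIN):
    """Return a copy that keeps only the canonical key, raised to `minimum`."""
    tuned = dict(props)
    # Assumption: the second spelling is not the name that is actually read.
    tuned.pop(ALIAS_KEY, None)
    current = int(tuned.get(CANONICAL_KEY, 0))
    tuned[CANONICAL_KEY] = str(max(current, minimum))
    return tuned


example = """\
com.bigdata.rdf.sail.bufferCapacity=2000
com.bigdata.rdf.sail.BigdataSail.bufferCapacity=2000
com.bigdata.rdf.sail.truthMaintenance=false
"""

tuned = tune_buffer_capacity(parse_properties(example))
for key in sorted(tuned):
    print(f"{key}={tuned[key]}")
```

Whether 100000 is appropriate depends on the available heap; as the message notes, a larger buffer only helps while GC overhead stays acceptable.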
From: Andreas K. <ka...@bs...> - 2015-04-23 13:36:24
|
Ok, I can redo the test with smallSlots + groupCommit enabled, and run http://localhost:8080/bigdata/status?dumpJournal&dumpPages after some minutes. (I cannot run it on the fully loaded dataset because my disk is not sufficient for the resulting Journal).

By the way: Please find attached my custom Vocabulary classes. They are just one of my many attempts to improve IO Performance on rotating disks.

Best Regards
Andreas

>>> Bryan Thompson <br...@sy...> 23.04.15 15.31 Uhr >>>
I just noticed that you have the full text index enabled as well. I have not been enabling that.

I would like to see the output from this command on the fully loaded data sets.

http://localhost:8080/bigdata/status?dumpJournal&dumpPages

This will let us know if any specific index is taking up a very large number of pages. It will also tell us the distribution over the page sizes for each index.

Bryan
|
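The status request discussed in the message above could be scripted so that the dump is saved a few minutes into a load. This is a minimal sketch: the URL and its two query flags are taken from the thread, while the function names, the output file name and the timeout are assumptions chosen for illustration.

```python
from urllib.request import urlopen

# Base URL as used in the thread; adjust for other deployments.
DEFAULT_BASE = "http://localhost:8080/bigdata"


def dump_journal_url(base=DEFAULT_BASE):
    """Build the status URL with the dumpJournal and dumpPages flags."""
    return base.rstrip("/") + "/status?dumpJournal&dumpPages"


def save_dump(path, base=DEFAULT_BASE, timeout=600):
    """Fetch the dump page and write it to `path`; returns the byte count."""
    with urlopen(dump_journal_url(base), timeout=timeout) as response:
        body = response.read()
    with open(path, "wb") as out:
        out.write(body)
    return len(body)


print(dump_journal_url())
# save_dump("dumpJournal.html")  # run this only while the server is up
```

The saved page can then be attached to a ticket or compared between runs with and without the small slot optimization.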
From: Bryan T. <br...@sy...> - 2015-04-23 13:33:27
|
I am at the following with group commit + small slots but without the full text index.

totalElapsed=4502693ms, elapsed=4502570ms, parsed=43820000, tps=9732, done=false

-rw-r--r-- 1 root root 6.0G Apr 23 09:31 bigdata.jnl

There is clearly a lot of recycling going on. I am going to wait for it to finish to look into this further.

magic=e6b4c275 version=1 extent=209715200(200M), userExtent=209714512(199M), bytesAvailable=209714512(199M), nextOffset=0
rootBlock{ rootBlock=0, challisField=4, version=3, nextOffset=253403152405, localTime=1429791054868 [Thursday, April 23, 2015 8:10:54 AM EDT], firstCommitTime=1429789657461 [Thursday, April 23, 2015 7:47:37 AM EDT], lastCommitTime=1429791054859 [Thursday, April 23, 2015 8:10:54 AM EDT], commitCounter=4, commitRecordAddr={off=NATIVE:-106500,len=422}, commitRecordIndexAddr={off=NATIVE:-81940,len=220}, blockSequence=1, quorumToken=-1, metaBitsAddr=206535917615, metaStartAddr=3200, storeType=RW, uuid=8d9bce3f-db56-4a87-b3fd-c1a433e1d3d8, offsetBits=42, checksum=-1504696410, createTime=1429789657046 [Thursday, April 23, 2015 7:47:37 AM EDT], closeTime=0}
rootBlock{ rootBlock=1, challisField=3, version=3, nextOffset=231928315910, localTime=1429791050520 [Thursday, April 23, 2015 8:10:50 AM EDT], firstCommitTime=1429789657461 [Thursday, April 23, 2015 7:47:37 AM EDT], lastCommitTime=1429791050513 [Thursday, April 23, 2015 8:10:50 AM EDT], commitCounter=3, commitRecordAddr={off=NATIVE:-40968,len=422}, commitRecordIndexAddr={off=NATIVE:-81925,len=220}, blockSequence=1, quorumToken=-1, metaBitsAddr=206221344815, metaStartAddr=3200, storeType=RW, uuid=8d9bce3f-db56-4a87-b3fd-c1a433e1d3d8, offsetBits=42, checksum=-2109528144, createTime=1429789657046 [Thursday, April 23, 2015 7:47:37 AM EDT], closeTime=0}
The current root block is #0

-------------------------
RWStore Allocator Summary
-------------------------
AllocatorSize  AllocatorCount  SlotsAllocated  %SlotsAllocated  SlotsRecycled  SlotChurn  SlotsInUse  %SlotsInUse  MeanAllocation  SlotsReserved  %SlotsUnused  BytesReserved  BytesAppData  %SlotWaste  %AppData  %StoreFile  %TotalWaste  %FileWaste
           64            3390        25653334            51.17        8436924       1.49    17216410        89.79              27       24299520         29.15     1555169280     566105149       63.60     14.84       28.50        60.20       18.13
          128             178         1349254             2.69         106699       1.09     1242555         6.48              87        1270272          2.18      162594816     107803126       33.70      2.83        2.98         3.33        1.00
          192              19          229968             0.46         105038       1.84      124930         0.65             153         134144          6.87       25755648      18951542       26.42      0.50        0.47         0.41        0.12
          320               5          229984             0.46         203087       8.55       26897         0.14             253          35840         24.95       11468800       7145051       37.70      0.19        0.21         0.26        0.08
          512               2          296128             0.59         289066      41.93        7062         0.04             415           7424          4.88        3801088       3949092       -3.89      0.10        0.07        -0.01        0.00
          768               2          369042             0.74         365413     101.69        3629         0.02             639           7424         51.12        5701632       3754158       34.16      0.10        0.10         0.12        0.04
         1024               2          348064             0.69         345243     123.38        2821         0.01             895           7424         62.00        7602176       3907272       48.60      0.10        0.14         0.22        0.07
         2048               4         1307596             2.61        1280762      48.73       26834         0.14            1525          28672          6.41       58720256      41087168       30.03      1.08        1.08         1.07        0.32
         3072               2         1175674             2.34        1162053      86.31       13621         0.07            2558          14336          4.99       44040192      42018252        4.59      1.10        0.81         0.12        0.04
         4096              26         1758846             3.51        1581120       9.90      177726         0.93            3572         186368          4.64      763363328     621452046       18.59     16.30       13.99         8.64        2.60
         8192              48        17418250            34.74       17086197      52.46      332053         1.73            7274         344064          3.49     2818572288    2397567451       14.94     62.87       51.65        25.62        7.72

-------------------------
BLOBS
-------------------------
Bucket(K)  Allocations    Allocated  Deletes      Deleted  Current        Data   Mean   Churn
       16      7529975  87846235428  7432952  86784750383    97023  1061485045  11666   77.61
       32       890621  17213523980   885724  17117153133     4897    96370847  19327  181.87
       64        15272    560980091    15190    557968835       82     3011256  36732  186.24
      128            0            0        0            0        0           0      0    0.00
      256            0            0        0            0        0           0      0    0.00
      512            0            0        0            0        0           0      0    0.00
     1024            0            0        0            0        0           0      0    0.00
     2048            0            0        0            0        0           0      0    0.00
     4096            0            0        0            0        0           0      0    0.00
     8192            0            0        0            0        0           0      0    0.00
    16384            0            0        0            0        0           0      0    0.00
    32768            0            0        0            0        0           0      0    0.00
    65536            0            0        0            0        0           0      0    0.00
  2097151            0            0        0            0        0           0      0    0.00

----
Bryan Thompson
Chief Scientist & Founder
SYSTAP, LLC
4501 Tower Road
Greensboro, NC 27410
br...@sy...
http://blazegraph.com
http://blog.bigdata.com <http://bigdata.com>
http://mapgraph.io

On Thu, Apr 23, 2015 at 9:30 AM, Andreas Kahl <ka...@bs...> wrote:

> Now I am 25mins into the new load with groupCommit enabled and
> com.bigdata.rwstore.RWStore.smallSlotType=1024 commented out.
> Currently 24,870,000 Triples are parsed and the journal is at 3.6GB. It
> looks like disabling smallSlotOptimization also resolves the problem
> (Otherwise I would have more than twice the space used at that time).
>
> So, I would conclude, it's the combination of groupCommit and
> smallSlotOptimization.
>
> All tests were run on Blazegraph 1.5.1 from Git revision f4c63e5.
>
> Best Regards
> Andreas
>
> >>> "Andreas Kahl" <ka...@bs...> 23.04.2015 14:54 >>>
> Bryan,
>
> in the meantime, I could successfully load the file into a 18GB journal
> after disabling groupCommit (I simply commented out the line in
> RWStore.properties).
> I can try again with groupCommit enabled, but smallSlotOptimization
> disabled.
>
> Best Regards
> Andreas
>
> >>> Bryan Thompson <br...@sy...> 23.04.2015 13:24 >>>
> Andreas,
>
> I was not able to replicate your result.
Unfortunately I navigated away > from the browser page in which I had submitted the request, so it loaded > all the data but failed to commit. However, the resulting file is only > 16GB. > > I will redo this run and verify that the journal after the commit has this > same size on the disk. > > I was only assuming that this was related to group commit because of your > original message. Perhaps I misinterpreted your message. This is simply > about 1.5.1 (with group commit) vs 1.4.0. > > Perhaps the issue is related to the small slot optimization? Maybe in > combination with group commit? > > *> com.bigdata.rwstore.RWStore.smallSlotType=1024* > > I could not replicate your properties exactly because you are using a > non-standard vocabulary class. Therefore I simply deleted the default > namespace (in quads mode) and recreated it with the defaults in triples > mode. The small slot optimization and other parameters were not enabled in > my run. > > Perhaps you could try to replicate my experience and I will enable the > small slots optimization? > > Thanks, > Bryan > > ---- > Bryan Thompson > Chief Scientist & Founder > SYSTAP, LLC > 4501 Tower Road > Greensboro, NC 27410 > br...@sy... > http://blazegraph.com > http://blog.bigdata.com <http://bigdata.com> > http://mapgraph.io > > Blazegraph™ <http://www.blazegraph.com/> is our ultra high-performance > graph database that supports both RDF/SPARQL and Tinkerpop/Blueprints > APIs. MapGraph™ <http://www.systap.com/mapgraph> is our disruptive new > technology to use GPUs to accelerate data-parallel graph analytics. > > CONFIDENTIALITY NOTICE: This email and its contents and attachments are > for the sole use of the intended recipient(s) and are confidential or > proprietary to SYSTAP. Any unauthorized review, use, disclosure, > dissemination or copying of this email or its contents or attachments is > prohibited. 
If you have received this communication in error, please notify > the sender by reply email and permanently delete all copies of the email > and its contents and attachments. > > On Thu, Apr 23, 2015 at 1:51 AM, Andreas Kahl <ka...@bs...> > wrote: > > > Bryan & Martyn, > > > > Thank you very much for investigating the issue. I assume from the > ticket > > that the error will vanish if I disable groupCommit. I will do so for the > > meantime. > > > > Although there is already extensive information in Bryan's ticket, please > > find attached my logs and DumpJournal outputs: > > - dumpJournal.html contains a dump from the 67GB journal after Blazegraph > > ran into "No space left on device" > > - dumpJournalWithTraceEnabled.html is the same dump for a running query > > when the journal was at about 14GB > > - queryStatus.html is just the status page showing my query > > - catalina.out.gz contains the trace outputs from starting Tomcat until I > > killed the curl running the SPARQL Update by Ctrl-C > > - loadGnd.log.gz is Blazegraphs output when loading the data > > > > Best Regards > > Andreas > > > > > > > > >>> Bryan Thompson <br...@sy...> 22.04.15 20.56 Uhr >>> > > See http://trac.bigdata.com/ticket/1206. This is still in the > > investigation stage. > > > > Thanks, > > Bryan > > > > ---- > > Bryan Thompson > > Chief Scientist & Founder > > SYSTAP, LLC > > 4501 Tower Road > > Greensboro, NC 27410 > > br...@sy... > > http://blazegraph.com > > http://blog.bigdata.com <http://bigdata.com> > > http://mapgraph.io > > > > Blazegraph™ <http://www.blazegraph.com/> is our ultra high-performance > > graph database that supports both RDF/SPARQL and Tinkerpop/Blueprints > > APIs. MapGraph™ <http://www.systap.com/mapgraph> is our disruptive new > > technology to use GPUs to accelerate data-parallel graph analytics. 
> > > > CONFIDENTIALITY NOTICE: This email and its contents and attachments are > > for the sole use of the intended recipient(s) and are confidential or > > proprietary to SYSTAP. Any unauthorized review, use, disclosure, > > dissemination or copying of this email or its contents or attachments is > > prohibited. If you have received this communication in error, please > notify > > the sender by reply email and permanently delete all copies of the email > > and its contents and attachments. > > > > On Wed, Apr 22, 2015 at 5:37 AM, Andreas Kahl <ka...@bs...> > > wrote: > > > > > Hello everyone, > > > > > > I currently updated to the current Revision (f4c63e5) of Blazegraph > from > > > Git and tried to load a dataset into the updated Webapp. With Bigdata > > 1.4.0 > > > this resulted in a journal of ~18GB. Now the process was cancelled > > because > > > the disk was full - the journal was beyond 50GB for the same file with > > the > > > same settings. > > > The only exception was that I activated GroupCommit. > > > > > > The dataset can be downloaded here: > > > > > > http://datendienst.dnb.de/cgi-bin/mabit.pl?cmd=fetch&userID=opendata&pass=opendata&mabheft=GND.rdf.gz > > > . > > > Please find the settings used to load the file below. > > > > > > Do I have a misconfiguration, or is there a bug eating all disk memory? 
> > > > > > Best regards > > > Andreas > > > > > > Namespace-Properties: > > > curl -H "Accept: text/plain" > > > http://localhost:8080/bigdata/namespace/gnd/properties > > > #Wed Apr 22 11:35:31 CEST 2015 > > > > com.bigdata.namespace.kb.spo.com.bigdata.btree.BTree.branchingFactor=700 > > > com.bigdata.relation.container=gnd > > > com.bigdata.rwstore.RWStore.smallSlotType=1024 > > > com.bigdata.journal.AbstractJournal.bufferMode=DiskRW > > > com.bigdata.journal.AbstractJournal.file=/var/lib/bigdata/bigdata.jnl > > > > > > > > > com.bigdata.rdf.store.AbstractTripleStore.vocabularyClass=de.bsb_muenchen.bigdata.vocab.B3KatVocabulary > > > com.bigdata.journal.AbstractJournal.initialExtent=209715200 > > > com.bigdata.rdf.store.AbstractTripleStore.textIndex=true > > > com.bigdata.btree.BTree.branchingFactor=700 > > > > > > > > > com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.NoAxioms > > > com.bigdata.rdf.sail.isolatableIndices=false > > > com.bigdata.service.AbstractTransactionService.minReleaseAge=1 > > > com.bigdata.rdf.sail.bufferCapacity=2000 > > > com.bigdata.rdf.sail.truthMaintenance=false > > > com.bigdata.rdf.sail.namespace=gnd > > > com.bigdata.relation.class=com.bigdata.rdf.store.LocalTripleStore > > > com.bigdata.rdf.store.AbstractTripleStore.quads=false > > > com.bigdata.journal.AbstractJournal.writeCacheBufferCount=500 > > > com.bigdata.search.FullTextIndex.fieldsEnabled=false > > > com.bigdata.relation.namespace=gndity=10000 > > > com.bigdata.rdf.sail.BigdataSail.bufferCapacity=2000 > > > com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false > > > > > > > > > > > > ------------------------------------------------------------------------------ > > > BPM Camp - Free Virtual Workshop May 6th at 10am PDT/1PM EDT > > > Develop your own process in accordance with the BPMN 2 standard > > > Learn Process modeling best practices with Bonita BPM through live > > > exercises > > > 
http://www.bonitasoft.com/be-part-of-it/events/bpm-camp-virtual- > > > event?utm_ > > > source=Sourceforge_BPM_Camp_5_6_15&utm_medium=email&utm_campaign=VA_SF > > > _______________________________________________ > > > Bigdata-developers mailing list > > > Big...@li... > > > https://lists.sourceforge.net/lists/listinfo/bigdata-developers > > > > > > > > > > > > > ------------------------------------------------------------------------------ > BPM Camp - Free Virtual Workshop May 6th at 10am PDT/1PM EDT > Develop your own process in accordance with the BPMN 2 standard > Learn Process modeling best practices with Bonita BPM through live > exercises > http://www.bonitasoft.com/be-part-of-it/events/bpm-camp-virtual- > event?utm_ > source=Sourceforge_BPM_Camp_5_6_15&utm_medium=email&utm_campaign=VA_SF > _______________________________________________ > Bigdata-developers mailing list > Big...@li... > https://lists.sourceforge.net/lists/listinfo/bigdata-developers > |
From: Bryan T. <br...@sy...> - 2015-04-23 13:31:07
|
I just noticed that you have the full text index enabled as well. I have not been enabling that.

I would like to see the output from this command on the fully loaded data sets:

http://localhost:8080/bigdata/status?dumpJournal&dumpPages

This will let us see if any specific index is taking up a very large number of pages. It will also tell us the distribution over the page sizes for each index.

Bryan

----
Bryan Thompson
Chief Scientist & Founder
SYSTAP, LLC
4501 Tower Road
Greensboro, NC 27410
br...@sy...
http://blazegraph.com
http://blog.bigdata.com <http://bigdata.com>
http://mapgraph.io
|
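The status request Bryan asks for above can also be scripted, so that the dump is saved to a file and can be attached to the ticket. This is only a minimal Python sketch: the URL is the one quoted in the message, while the helper names `dump_journal_url` and `save_dump`, the default timeout, and the output file name are assumptions made for illustration.

```python
from urllib.request import urlopen


def dump_journal_url(base="http://localhost:8080/bigdata"):
    """Build the status URL quoted in the message above."""
    # dumpJournal and dumpPages are bare flags without values, so the
    # query string is written out by hand instead of using urlencode().
    return base.rstrip("/") + "/status?dumpJournal&dumpPages"


def save_dump(path, base="http://localhost:8080/bigdata", timeout=600):
    """Fetch the dump page and write the raw response bytes to `path`.

    Returns the number of bytes written. The generous timeout is an
    assumption: dumping pages of a large journal may take a while.
    """
    with urlopen(dump_journal_url(base), timeout=timeout) as resp:
        data = resp.read()
    with open(path, "wb") as out:
        out.write(data)
    return len(data)


# Example (needs a running server at the base URL):
# save_dump("dumpJournal.html")
```

Saving the raw response keeps the per-index page statistics intact for later comparison between runs.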
From: Andreas K. <ka...@bs...> - 2015-04-23 13:30:57
|
Now I am 25 minutes into the new load with groupCommit enabled and com.bigdata.rwstore.RWStore.smallSlotType=1024 commented out. So far 24,870,000 triples have been parsed and the journal is at 3.6GB. It looks like disabling smallSlotOptimization also resolves the problem (otherwise I would have used more than twice the space by this point).

So I would conclude that it is the combination of groupCommit and smallSlotOptimization.

All tests were run on Blazegraph 1.5.1 from Git revision f4c63e5.

Best Regards
Andreas
|
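The comparison Andreas makes above (journal size against triples parsed) can be turned into a number that is easy to compare between runs. A minimal Python sketch, assuming that "GB" means 10^9 bytes; the function name `bytes_per_triple` is hypothetical, and the figures are the ones from the message.

```python
def bytes_per_triple(journal_bytes, triples_parsed):
    """Average journal space consumed per parsed triple."""
    if triples_parsed <= 0:
        raise ValueError("triples_parsed must be positive")
    return journal_bytes / triples_parsed


GB = 10**9  # assumption: decimal gigabytes, the message does not say

# Figures from the message: 3.6GB journal after 24,870,000 parsed triples.
observed = bytes_per_triple(3.6 * GB, 24_870_000)
print(f"{observed:.0f} bytes per triple")
```

With these figures the run comes out at roughly 145 bytes per parsed triple. Going by the "more than twice the space" remark, the failing configuration would be above roughly 290 bytes per triple at the same point in the load.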
From: Andreas K. <ka...@bs...> - 2015-04-23 12:54:47
|
Bryan,

in the meantime, I was able to load the file successfully into an 18GB journal after disabling groupCommit (I simply commented out the line in RWStore.properties).
I can try again with groupCommit enabled but smallSlotOptimization disabled.

Best Regards
Andreas
|
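Since the test runs in this thread differ only in which property lines are commented out, it can help to diff two property files mechanically before comparing journal sizes. The following is a rough Python sketch, not a full Java properties parser: it assumes plain `key=value` lines and ignores continuation lines and escapes. The function names are hypothetical; the property keys in the usage example are taken from the namespace properties posted earlier in the thread.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Lines starting with '#' or '!' are comments in .properties files.
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def diff_properties(a, b):
    """Return {key: (value_in_a, value_in_b)} for keys whose values differ.

    A key that is missing on one side is reported with None for that side.
    """
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


run_a = parse_properties("""
com.bigdata.journal.AbstractJournal.bufferMode=DiskRW
com.bigdata.rwstore.RWStore.smallSlotType=1024
""")
run_b = parse_properties("""
com.bigdata.journal.AbstractJournal.bufferMode=DiskRW
# com.bigdata.rwstore.RWStore.smallSlotType=1024
""")
changed = diff_properties(run_a, run_b)
```

Here `changed` contains only the `smallSlotType` key, because the commented-out line is treated as absent in the second run.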
From: Bryan T. <br...@sy...> - 2015-04-23 11:24:33
|
Andreas,

I was not able to replicate your result. Unfortunately I navigated away from the browser page in which I had submitted the request, so it loaded all the data but failed to commit. However, the resulting file is only 16GB.

I will redo this run and verify that the journal after the commit has this same size on the disk.

I was only assuming that this was related to group commit because of your original message. Perhaps I misinterpreted your message. This is simply about 1.5.1 (with group commit) vs 1.4.0.

Perhaps the issue is related to the small slot optimization? Maybe in combination with group commit?

> com.bigdata.rwstore.RWStore.smallSlotType=1024

I could not replicate your properties exactly because you are using a non-standard vocabulary class. Therefore I simply deleted the default namespace (in quads mode) and recreated it with the defaults in triples mode. The small slot optimization and other parameters were not enabled in my run.

Perhaps you could try to replicate my experience and I will enable the small slots optimization?

Thanks,
Bryan

----
Bryan Thompson
Chief Scientist & Founder
SYSTAP, LLC
4501 Tower Road
Greensboro, NC 27410
br...@sy...
http://blazegraph.com
http://blog.bigdata.com <http://bigdata.com>
http://mapgraph.io

On Thu, Apr 23, 2015 at 1:51 AM, Andreas Kahl <ka...@bs...> wrote: > Bryan & Martyn, > > Thank you very much for investigating the issue. I assume from the ticket > that the error will vanish if I disable groupCommit. I will do so for the > meantime. > > Although there is already extensive information in Bryan's ticket, please > find attached my logs and DumpJournal outputs: > - dumpJournal.html contains a dump from the 67GB journal after Blazegraph > ran into "No space left on device" > - dumpJournalWithTraceEnabled.html is the same dump for a running query > when the journal was at about 14GB > - queryStatus.html is just the status page showing my query > - catalina.out.gz contains the trace outputs from starting Tomcat until I > killed the curl running the SPARQL Update by Ctrl-C > - loadGnd.log.gz is Blazegraphs output when loading the data > > Best Regards > Andreas > > > > >>> Bryan Thompson <br...@sy...> 22.04.15 20.56 Uhr >>> > See http://trac.bigdata.com/ticket/1206. This is still in the > investigation stage. > > Thanks, > Bryan > > ---- > Bryan Thompson > Chief Scientist & Founder > SYSTAP, LLC > 4501 Tower Road > Greensboro, NC 27410 > br...@sy... > http://blazegraph.com > http://blog.bigdata.com <http://bigdata.com> > http://mapgraph.io > > Blazegraph™ <http://www.blazegraph.com/> is our ultra high-performance > graph database that supports both RDF/SPARQL and Tinkerpop/Blueprints > APIs. MapGraph™ <http://www.systap.com/mapgraph> is our disruptive new > technology to use GPUs to accelerate data-parallel graph analytics. > > CONFIDENTIALITY NOTICE: This email and its contents and attachments are > for the sole use of the intended recipient(s) and are confidential or > proprietary to SYSTAP.
Any unauthorized review, use, disclosure, > dissemination or copying of this email or its contents or attachments is > prohibited. If you have received this communication in error, please notify > the sender by reply email and permanently delete all copies of the email > and its contents and attachments. > > On Wed, Apr 22, 2015 at 5:37 AM, Andreas Kahl <ka...@bs...> > wrote: > > > Hello everyone, > > > > I currently updated to the current Revision (f4c63e5) of Blazegraph from > > Git and tried to load a dataset into the updated Webapp. With Bigdata > 1.4.0 > > this resulted in a journal of ~18GB. Now the process was cancelled > because > > the disk was full - the journal was beyond 50GB for the same file with > the > > same settings. > > The only exception was that I activated GroupCommit. > > > > The dataset can be downloaded here: > > > http://datendienst.dnb.de/cgi-bin/mabit.pl?cmd=fetch&userID=opendata&pass=opendata&mabheft=GND.rdf.gz > > . > > Please find the settings used to load the file below. > > > > Do I have a misconfiguration, or is there a bug eating all disk memory? 
> > > > Best regards > > Andreas > > > > Namespace-Properties: > > curl -H "Accept: text/plain" > > http://localhost:8080/bigdata/namespace/gnd/properties > > #Wed Apr 22 11:35:31 CEST 2015 > > com.bigdata.namespace.kb.spo.com.bigdata.btree.BTree.branchingFactor=700 > > com.bigdata.relation.container=gnd > > com.bigdata.rwstore.RWStore.smallSlotType=1024 > > com.bigdata.journal.AbstractJournal.bufferMode=DiskRW > > com.bigdata.journal.AbstractJournal.file=/var/lib/bigdata/bigdata.jnl > > > > > com.bigdata.rdf.store.AbstractTripleStore.vocabularyClass=de.bsb_muenchen.bigdata.vocab.B3KatVocabulary > > com.bigdata.journal.AbstractJournal.initialExtent=209715200 > > com.bigdata.rdf.store.AbstractTripleStore.textIndex=true > > com.bigdata.btree.BTree.branchingFactor=700 > > > > > com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.NoAxioms > > com.bigdata.rdf.sail.isolatableIndices=false > > com.bigdata.service.AbstractTransactionService.minReleaseAge=1 > > com.bigdata.rdf.sail.bufferCapacity=2000 > > com.bigdata.rdf.sail.truthMaintenance=false > > com.bigdata.rdf.sail.namespace=gnd > > com.bigdata.relation.class=com.bigdata.rdf.store.LocalTripleStore > > com.bigdata.rdf.store.AbstractTripleStore.quads=false > > com.bigdata.journal.AbstractJournal.writeCacheBufferCount=500 > > com.bigdata.search.FullTextIndex.fieldsEnabled=false > > com.bigdata.relation.namespace=gndity=10000 > > com.bigdata.rdf.sail.BigdataSail.bufferCapacity=2000 > > com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false > > > > > > > ------------------------------------------------------------------------------ > > BPM Camp - Free Virtual Workshop May 6th at 10am PDT/1PM EDT > > Develop your own process in accordance with the BPMN 2 standard > > Learn Process modeling best practices with Bonita BPM through live > > exercises > > http://www.bonitasoft.com/be-part-of-it/events/bpm-camp-virtual- > > event?utm_ > > 
source=Sourceforge_BPM_Camp_5_6_15&utm_medium=email&utm_campaign=VA_SF > > _______________________________________________ > > Bigdata-developers mailing list > > Big...@li... > > https://lists.sourceforge.net/lists/listinfo/bigdata-developers > > > > > > |
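Since the two runs in this thread were made with property sets that could not be matched exactly, it may help to diff the namespace properties of both installations before comparing journal sizes. The following is only a sketch under stated assumptions: `PropertyDiff` is a hypothetical helper, not part of Blazegraph, and it assumes the text returned by the `/properties` request shown in the thread is in standard `java.util.Properties` format.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Objects;
import java.util.Properties;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical helper (not a Blazegraph class) for comparing two property dumps. */
final class PropertyDiff {

    /** Parses text in java.util.Properties format, e.g. a saved /properties response. */
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    /** Maps each key whose value differs (or is missing on one side) to "left -> right". */
    static Map<String, String> diff(Properties left, Properties right) {
        Set<String> keys = new TreeSet<>(left.stringPropertyNames());
        keys.addAll(right.stringPropertyNames());
        Map<String, String> out = new TreeMap<>();
        for (String key : keys) {
            String l = left.getProperty(key);   // null if absent on the left
            String r = right.getProperty(key);  // null if absent on the right
            if (!Objects.equals(l, r)) {
                out.put(key, l + " -> " + r);
            }
        }
        return out;
    }
}
```

Feeding it the saved property output of the 1.4.0 namespace and of the 1.5.1 namespace would list settings such as `com.bigdata.journal.Journal.groupCommit` or `com.bigdata.rwstore.RWStore.smallSlotType` only if they actually differ between the two runs.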
From: Andreas K. <ka...@bs...> - 2015-04-23 05:51:40
|
Bryan & Martyn,

Thank you very much for investigating the issue. I assume from the ticket that the error will vanish if I disable groupCommit. I will do so for the meantime.

Although there is already extensive information in Bryan's ticket, please find attached my logs and DumpJournal outputs:
- dumpJournal.html contains a dump from the 67GB journal after Blazegraph ran into "No space left on device"
- dumpJournalWithTraceEnabled.html is the same dump for a running query when the journal was at about 14GB
- queryStatus.html is just the status page showing my query
- catalina.out.gz contains the trace outputs from starting Tomcat until I killed the curl running the SPARQL Update by Ctrl-C
- loadGnd.log.gz is Blazegraph's output when loading the data

Best Regards
Andreas

>>> Bryan Thompson <br...@sy...> 22.04.15 20.56 Uhr >>>
See http://trac.bigdata.com/ticket/1206. This is still in the investigation stage.

Thanks,
Bryan
|
From: Bryan T. <br...@sy...> - 2015-04-22 21:50:02
|
Lee,

Acknowledged. I am swamped by other things right now. I will try to get back to this as soon as I can.

Thanks,
Bryan

On Wed, Apr 22, 2015 at 5:41 AM, Lee Kitching <le...@sw...> wrote:

> Hi Bryan,
>
> Yes, the AST in the test is supposed to be for the query
>
> select (count(*) as ?c) where {
>   select * where {
>     select * where { ?s ?p ?o }
>   } limit 21 offset 0
> }
>
> Thanks
>
> On Tue, Apr 21, 2015 at 7:53 PM, Bryan Thompson <br...@sy...> wrote:
>
>> Lee,
>>
>> I can replicate the problem with your query (as given above) against the SPARQL end point.
>>
>> Can you state the SPARQL that you are trying to model with this unit test? It does not appear to be the same query as your SPARQL query above. I would like to make sure that it is being translated correctly into the AST. I can then look at the expected AST and work backwards and see if I believe that the test shows the problem.
>>
>> Thanks,
>> Bryan
>>
>> On Tue, Apr 21, 2015 at 11:07 AM, Lee Kitching <le...@sw...> wrote:
>>
>>> Hi Bryan,
>>>
>>> We allow users to enter their own SPARQL queries and wrap them to do things like pagination, so unfortunately we cannot just re-write our queries to do the expansion manually.
>>> I applied the fix detailed in the ticket and it fixes the issue for the query I provided, however it fails to rewrite the following query:
>>>
>>> SELECT (COUNT(*) as ?c) {
>>>   SELECT * {
>>>     SELECT * WHERE { ?s ?p ?o }
>>>   } LIMIT 21 OFFSET 0
>>> }
>>>
>>> I attempted to debug the issue, and it seems to re-write the * projection in the inner-most subquery but not the subquery with the limit and offset. I created a test based on the existing tests:
>>>
>>> public void test_wildcardProjectionOptimizer03() {
>>>
>>>     /*
>>>      * Note: DO NOT share structures in this test!!!!
>>>      */
>>>     final IBindingSet[] bsets = new IBindingSet[] {};
>>>
>>>     // The source AST.
>>>     final QueryRoot given = new QueryRoot(QueryType.SELECT);
>>>     {
>>>         final SubqueryRoot selectQuery = new SubqueryRoot(QueryType.SELECT);
>>>         {
>>>             final JoinGroupNode whereClause1 = new JoinGroupNode();
>>>             final StatementPatternNode spoPattern = new StatementPatternNode(
>>>                     new VarNode("s"), new VarNode("p"), new VarNode("o"),
>>>                     null, Scope.DEFAULT_CONTEXTS);
>>>             whereClause1.addChild(spoPattern);
>>>
>>>             final ProjectionNode p = new ProjectionNode();
>>>             p.addProjectionVar(new VarNode("*"));
>>>             selectQuery.setProjection(p);
>>>             selectQuery.setWhereClause(whereClause1);
>>>         }
>>>
>>>         final SubqueryRoot sliceQuery = new SubqueryRoot(QueryType.SELECT);
>>>         {
>>>             final ProjectionNode p = new ProjectionNode();
>>>             p.addProjectionVar(new VarNode("*"));
>>>             sliceQuery.setProjection(p);
>>>
>>>             final JoinGroupNode whereClause = new JoinGroupNode();
>>>             whereClause.addChild(selectQuery);
>>>
>>>             sliceQuery.setSlice(new SliceNode(0, 21));
>>>         }
>>>
>>>         final FunctionNode countNode = new FunctionNode(
>>>                 FunctionRegistry.COUNT,
>>>                 Collections.EMPTY_MAP,
>>>                 new VarNode("*"));
>>>
>>>         final ProjectionNode countProjection = new ProjectionNode();
>>>         countProjection.addProjectionExpression(
>>>                 new AssignmentNode(new VarNode("c"), countNode));
>>>
>>>         JoinGroupNode countWhere = new JoinGroupNode();
>>>         countWhere.addChild(sliceQuery);
>>>
>>>         given.setProjection(countProjection);
>>>         given.setWhereClause(countWhere);
>>>     }
>>>
>>>     final QueryRoot expected = new QueryRoot(QueryType.SELECT);
>>>     {
>>>         final SubqueryRoot selectQuery = new SubqueryRoot(QueryType.SELECT);
>>>         {
>>>             final JoinGroupNode whereClause1 = new JoinGroupNode();
>>>             final StatementPatternNode spoPattern = new StatementPatternNode(
>>>                     new VarNode("s"), new VarNode("p"), new VarNode("o"),
>>>                     null, Scope.DEFAULT_CONTEXTS);
>>>             whereClause1.addChild(spoPattern);
>>>
>>>             final ProjectionNode p = new ProjectionNode();
>>>             p.addProjectionVar(new VarNode("s"));
>>>             p.addProjectionVar(new VarNode("p"));
>>>             p.addProjectionVar(new VarNode("o"));
>>>             selectQuery.setProjection(p);
>>>             selectQuery.setWhereClause(whereClause1);
>>>         }
>>>
>>>         final SubqueryRoot sliceQuery = new SubqueryRoot(QueryType.SELECT);
>>>         {
>>>             final ProjectionNode p = new ProjectionNode();
>>>             p.addProjectionVar(new VarNode("s"));
>>>             p.addProjectionVar(new VarNode("p"));
>>>             p.addProjectionVar(new VarNode("o"));
>>>
>>>             sliceQuery.setProjection(p);
>>>
>>>             final JoinGroupNode whereClause = new JoinGroupNode();
>>>             whereClause.addChild(selectQuery);
>>>
>>>             sliceQuery.setSlice(new SliceNode(0, 21));
>>>         }
>>>
>>>         final FunctionNode countNode = new FunctionNode(
>>>                 FunctionRegistry.COUNT,
>>>                 Collections.EMPTY_MAP,
>>>                 new VarNode("*"));
>>>
>>>         final ProjectionNode countProjection = new ProjectionNode();
>>>         countProjection.addProjectionExpression(
>>>                 new AssignmentNode(new VarNode("c"), countNode));
>>>
>>>         JoinGroupNode countWhere = new JoinGroupNode();
>>>         countWhere.addChild(sliceQuery);
>>>
>>>         expected.setProjection(countProjection);
>>>         expected.setWhereClause(countWhere);
>>>     }
>>>
>>>     final IASTOptimizer rewriter = new ASTWildcardProjectionOptimizer();
>>>
>>>     final IQueryNode actual = rewriter.optimize(null/* AST2BOpContext */,
>>>             given/* queryNode */, bsets);
>>>
>>>     assertSameAST(expected, actual);
>>>
>>> }
>>>
>>> however I am having some problems running the tests locally so I don't know if it accurately models the situation.
>>>
>>> Thanks
>>>
>>> On Mon, Apr 20, 2015 at 9:05 PM, Bryan Thompson <br...@sy...> wrote:
>>>
>>>> Lee,
>>>>
>>>> I've updated the ticket with the code changes and the test changes. Please try this out and let me know if you have any problems.
>>>>
>>>> Thanks,
>>>> Bryan
>>>>
>>>> On Mon, Apr 20, 2015 at 1:20 PM, Lee Kitching <le...@sw...> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We are currently evaluating using Blazegraph as our RDF database and have run into the issue described at http://trac.bigdata.com/ticket/757. The below query causes the AssertionError to be thrown:
>>>>>
>>>>> SELECT (COUNT(*) as ?c) {
>>>>>   SELECT ?uri ?graph where {
>>>>>     {
>>>>>       SELECT * WHERE {
>>>>>         GRAPH ?graph {
>>>>>           ?uri a <http://object> .
>>>>>           ?uri <http://purl.org/dc/terms/title> ?title .
>>>>>         }
>>>>>         MINUS {
>>>>>           ?uri a <http://other>
>>>>>         }
>>>>>       }
>>>>>       ORDER BY ?title
>>>>>     }
>>>>>   }
>>>>> }
>>>>>
>>>>> Some debugging shows that the error is caused by the ASTWildcardProjectionOptimizer failing to recurse into the subqueries to rewrite the * projection. However this recursion is implemented in the BOpUtility.postOrderIterator(BOp) method - this method uses the argIterator to find child operators and therefore only visits children for nodes with an arity > 0.
>>>>>
>>>>> The root query node for the above query has an empty 'args' collection and all the associated components of the top-level query are stored in the annotations map. It looks like the iterator should search through the annotations rather than the args for query nodes.
>>>>>
>>>>> As there are a lot of implementations of the BOp interface, it seems that changing the postOrderIterator2(BOp) method is unlikely to be the correct fix. It seems that either the AST query nodes should override the arity() function to return the count of the annotations map, or the ASTWildcardProjectionOptimizer should use its own iterator for the nodes of the query. The latter option would be the least impactful change but I am not familiar enough with the codebase to know the correct fix.
>>>>>
>>>>> Any help in resolving the issue would be appreciated.
|
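The traversal problem described in this thread can be illustrated with a small stand-alone sketch. The `Node` class below is hypothetical and is not Blazegraph's `BOp` interface; it only assumes that a node carries positional `args` plus an `annotations` map whose values may themselves be nodes. Under that assumption, a post-order walk that follows only `args` never reaches a subquery that is stored as an annotation, while a walk that also descends into annotation values does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for an operator node; NOT the real Blazegraph BOp API. */
final class Node {
    final String name;
    final List<Node> args = new ArrayList<>();
    final Map<String, Object> annotations = new LinkedHashMap<>();

    Node(String name) {
        this.name = name;
    }
}

final class PostOrder {

    /** Post-order walk that follows positional args only. */
    static List<String> argsOnly(Node root) {
        List<String> out = new ArrayList<>();
        walk(root, false, out);
        return out;
    }

    /** Post-order walk that also descends into annotation values that are nodes. */
    static List<String> argsAndAnnotations(Node root) {
        List<String> out = new ArrayList<>();
        walk(root, true, out);
        return out;
    }

    private static void walk(Node n, boolean followAnnotations, List<String> out) {
        for (Node child : n.args) {
            walk(child, followAnnotations, out);
        }
        if (followAnnotations) {
            for (Object value : n.annotations.values()) {
                if (value instanceof Node) {
                    walk((Node) value, followAnnotations, out);
                }
            }
        }
        out.add(n.name); // post-order: a node is emitted after its children
    }

    /** Builds a toy query: the root has no args, everything hangs off annotations. */
    static Node sampleQuery() {
        Node projection = new Node("projection");
        Node subquery = new Node("subquery");
        subquery.annotations.put("projection", projection);
        Node where = new Node("where");
        where.args.add(subquery);
        Node root = new Node("query");
        root.annotations.put("where", where);
        return root;
    }
}
```

For the toy tree from `sampleQuery()`, `argsOnly` visits just the root, whereas `argsAndAnnotations` also reaches the nested projection. The sketch assumes the annotation graph is acyclic; a real implementation would need to guard against shared or cyclic references.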
From: Bryan T. <br...@sy...> - 2015-04-22 18:56:18
|
See http://trac.bigdata.com/ticket/1206. This is still in the investigation stage.

Thanks,
Bryan

On Wed, Apr 22, 2015 at 5:37 AM, Andreas Kahl <ka...@bs...> wrote:

> Hello everyone,
>
> I currently updated to the current Revision (f4c63e5) of Blazegraph from Git and tried to load a dataset into the updated Webapp. With Bigdata 1.4.0 this resulted in a journal of ~18GB. Now the process was cancelled because the disk was full - the journal was beyond 50GB for the same file with the same settings.
> The only exception was that I activated GroupCommit.
>
> The dataset can be downloaded here:
> http://datendienst.dnb.de/cgi-bin/mabit.pl?cmd=fetch&userID=opendata&pass=opendata&mabheft=GND.rdf.gz
> Please find the settings used to load the file below.
>
> Do I have a misconfiguration, or is there a bug eating all disk memory?
>
> Best regards
> Andreas
>
> Namespace-Properties:
> curl -H "Accept: text/plain" http://localhost:8080/bigdata/namespace/gnd/properties
> #Wed Apr 22 11:35:31 CEST 2015
> com.bigdata.namespace.kb.spo.com.bigdata.btree.BTree.branchingFactor=700
> com.bigdata.relation.container=gnd
> com.bigdata.rwstore.RWStore.smallSlotType=1024
> com.bigdata.journal.AbstractJournal.bufferMode=DiskRW
> com.bigdata.journal.AbstractJournal.file=/var/lib/bigdata/bigdata.jnl
> com.bigdata.rdf.store.AbstractTripleStore.vocabularyClass=de.bsb_muenchen.bigdata.vocab.B3KatVocabulary
> com.bigdata.journal.AbstractJournal.initialExtent=209715200
> com.bigdata.rdf.store.AbstractTripleStore.textIndex=true
> com.bigdata.btree.BTree.branchingFactor=700
> com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.NoAxioms
> com.bigdata.rdf.sail.isolatableIndices=false
> com.bigdata.service.AbstractTransactionService.minReleaseAge=1
> com.bigdata.rdf.sail.bufferCapacity=2000
> com.bigdata.rdf.sail.truthMaintenance=false
> com.bigdata.rdf.sail.namespace=gnd
> com.bigdata.relation.class=com.bigdata.rdf.store.LocalTripleStore
> com.bigdata.rdf.store.AbstractTripleStore.quads=false
> com.bigdata.journal.AbstractJournal.writeCacheBufferCount=500
> com.bigdata.search.FullTextIndex.fieldsEnabled=false
> com.bigdata.relation.namespace=gnd
> com.bigdata.journal.Journal.groupCommit=true
> com.bigdata.btree.writeRetentionQueue.capacity=10000
> com.bigdata.rdf.sail.BigdataSail.bufferCapacity=2000
> com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false
|
From: Martyn C. <ma...@sy...> - 2015-04-22 14:55:24
|
Well TRACE on FixedAllocator will let you know when new Allocators are created, and also whenever addresses are recycled. In a well behaved system, the latter logging will flood the log, while if little or no recycling, then we'll see a higher proportion of new Allocator messages. It may be worth a short run (say 10 minutes, or waiting until journal has grown to 1G) to see what is written with this log4j property: log4j.logger.com.bigdata.rwstore.FixedAllocator=TRACE - Martyn On 22/04/2015 13:50, Bryan Thompson wrote: > I would wait on this. There will not (should not) be any intermediate > commits so what we need to do is log the allocators (and the shadow > allocators used during group commit for unisolated index operations). > > @Martyn: Can you suggest some logging that might capture what is happening > with the allocators during the load before Andreas retries this operation? > > Thanks, > Bryan > > ---- > Bryan Thompson > Chief Scientist & Founder > SYSTAP, LLC > 4501 Tower Road > Greensboro, NC 27410 > br...@sy... > http://blazegraph.com > http://blog.bigdata.com <http://bigdata.com> > http://mapgraph.io > > Blazegraph™ <http://www.blazegraph.com/> is our ultra high-performance > graph database that supports both RDF/SPARQL and Tinkerpop/Blueprints > APIs. MapGraph™ <http://www.systap.com/mapgraph> is our disruptive new > technology to use GPUs to accelerate data-parallel graph analytics. > > CONFIDENTIALITY NOTICE: This email and its contents and attachments are > for the sole use of the intended recipient(s) and are confidential or > proprietary to SYSTAP. Any unauthorized review, use, disclosure, > dissemination or copying of this email or its contents or attachments is > prohibited. If you have received this communication in error, please notify > the sender by reply email and permanently delete all copies of the email > and its contents and attachments. 
> > On Wed, Apr 22, 2015 at 8:32 AM, Andreas Kahl <ka...@bs...> wrote: > >> There were no other concurrent queries. Just the one SPARQL LOAD. >> I have deleted the file in the meantime (after a bit of cleaning I had >> ~60GB, so the disk was full at that size). >> If I can run DumpJournal without a commit, I can easily re-run the Load up >> to the java.io.IOException thrown by the full disk. >> >> Currently I have restarted the LOAD. I will wait until it breaks down >> (about 1h) and try to run DumpJournal on it. >> >> Andreas >> >>>>> Bryan Thompson <br...@sy...> 22.04.15 14.03 Uhr >>> >> Were you running any other operations concurrently against the database? >> Other updates or queries? >> >> In general, it is helpful to get the metadata about the allocators and root >> blocks. However, from what you have written, it sounds like you terminated >> the process when the disk space filled up. In this case there would only >> be the original root blocks and no commit points recorded on the journal. >> >> If you still have the file, can you run DumpJournal on it and send the >> output? The -pages option is not required in this case since we are only >> interested in the root blocks and allocators. >> >> Thanks, >> Bryan >> >> ---- >> Bryan Thompson >> Chief Scientist & Founder >> SYSTAP, LLC >> 4501 Tower Road >> Greensboro, NC 27410 >> br...@sy... >> http://blazegraph.com >> http://blog.bigdata.com <http://bigdata.com> >> http://mapgraph.io >> >> Blazegraph™ <http://www.blazegraph.com/> is our ultra high-performance >> graph database that supports both RDF/SPARQL and Tinkerpop/Blueprints >> APIs. MapGraph™ <http://www.systap.com/mapgraph> is our disruptive new >> technology to use GPUs to accelerate data-parallel graph analytics. >> >> CONFIDENTIALITY NOTICE: This email and its contents and attachments are >> for the sole use of the intended recipient(s) and are confidential or >> proprietary to SYSTAP. 
Any unauthorized review, use, disclosure, >> dissemination or copying of this email or its contents or attachments is >> prohibited. If you have received this communication in error, please notify >> the sender by reply email and permanently delete all copies of the email >> and its contents and attachments. >> >> On Wed, Apr 22, 2015 at 7:58 AM, Andreas Kahl <ka...@bs...> >> wrote: >> >>> That was a newly created journal. I simply stopped tomcat, deleted >>> bigdata.jnl and restarted. >>> >>> Andreas >>> >>>>>> Bryan Thompson <br...@sy...> 22.04.15 13.46 Uhr >>> >>> Was the data loaded into a new and empty journal or into a pre-existing >>> journal? If the latter, what size was the journal and what data were in >>> it? >>> >>> Thanks, >>> Bryan >>> >>> ---- >>> Bryan Thompson >>> Chief Scientist & Founder >>> SYSTAP, LLC >>> 4501 Tower Road >>> Greensboro, NC 27410 >>> br...@sy... >>> http://blazegraph.com >>> http://blog.bigdata.com <http://bigdata.com> >>> http://mapgraph.io >>> >>> Blazegraph™ <http://www.blazegraph.com/> is our ultra high-performance >>> graph database that supports both RDF/SPARQL and Tinkerpop/Blueprints >>> APIs. MapGraph™ <http://www.systap.com/mapgraph> is our disruptive new >>> technology to use GPUs to accelerate data-parallel graph analytics. >>> >>> CONFIDENTIALITY NOTICE: This email and its contents and attachments are >>> for the sole use of the intended recipient(s) and are confidential or >>> proprietary to SYSTAP. Any unauthorized review, use, disclosure, >>> dissemination or copying of this email or its contents or attachments is >>> prohibited. If you have received this communication in error, please >> notify >>> the sender by reply email and permanently delete all copies of the email >>> and its contents and attachments. 
>>> >>> On Wed, Apr 22, 2015 at 6:54 AM, Andreas Kahl <ka...@bs...> >>> wrote: >>> >>>> Bryan, >>>> >>>> yes, I used this command: >>>> curl -d"update=LOAD <file:///srv/feed-dateien/DNBLOD/GND.rdf.gz>;" >>>> -d"namespace=gnd" -d"monitor=true" >> http://localhost:8080/bigdata/sparql >>>> Best Regards >>>> Andreas >>>> >>>>>>> Bryan Thompson <br...@sy...> 22.04.15 12.51 Uhr >>> >>>> Andreas, >>>> >>>> What command did you use to load the data set? I.e., SPARQL update >>> "Load" >>>> or something else? >>>> >>>> Than Hello everyone, >>>>> I currently updated to the current Revision (f4c63e5) of Blazegraph >>> from >>>>> Git and tried to load a dataset into the updated Webapp. With Bigdata >>>> 1.4.0 >>>>> this resulted in a journal of ~18GB. Now the process was cancelled >>>> because >>>>> the disk was full - the journal was beyond 50GB for the same file >> with >>>> the >>>>> same settings. >>>>> The only exception was that I activated GroupCommit. >>>>> >>>>> The dataset can be downloaded here: >>>>> >> http://datendienst.dnb.de/cgi-bin/mabit.pl?cmd=fetch&userID=opendata&pass=opendata&mabheft=GND.rdf.gz >>>>> . >>>>> Please find the settings used to load the file below. >>>>> >>>>> Do I have a misconfiguration, or is there a bug eating all disk >> memory? 
>>>>> Best regards
>>>>> Andreas
>>>>>
>>>>> Namespace-Properties:
>>>>> curl -H "Accept: text/plain"
>>>>> http://localhost:8080/bigdata/namespace/gnd/properties
>>>>> #Wed Apr 22 11:35:31 CEST 2015
>>>>> com.bigdata.namespace.kb.spo.com.bigdata.btree.BTree.branchingFactor=700
>>>>> com.bigdata.relation.container=gnd
>>>>> com.bigdata.rwstore.RWStore.smallSlotType=1024
>>>>> com.bigdata.journal.AbstractJournal.bufferMode=DiskRW
>>>>> com.bigdata.journal.AbstractJournal.file=/var/lib/bigdata/bigdata.jnl
>>>>> com.bigdata.rdf.store.AbstractTripleStore.vocabularyClass=de.bsb_muenchen.bigdata.vocab.B3KatVocabulary
>>>>> com.bigdata.journal.AbstractJournal.initialExtent=209715200
>>>>> com.bigdata.rdf.store.AbstractTripleStore.textIndex=true
>>>>> com.bigdata.btree.BTree.branchingFactor=700
>>>>> com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.NoAxioms
>>>>> com.bigdata.rdf.sail.isolatableIndices=false
>>>>> com.bigdata.service.AbstractTransactionService.minReleaseAge=1
>>>>> com.bigdata.rdf.sail.bufferCapacity=2000
>>>>> com.bigdata.rdf.sail.truthMaintenance=false
>>>>> com.bigdata.rdf.sail.namespace=gnd
>>>>> com.bigdata.relation.class=com.bigdata.rdf.store.LocalTripleStore
>>>>> com.bigdata.rdf.store.AbstractTripleStore.quads=false
>>>>> com.bigdata.journal.AbstractJournal.writeCacheBufferCount=500
>>>>> com.bigdata.search.FullTextIndex.fieldsEnabled=false
>>>>> com.bigdata.relation.namespace=gnd
>>>>> com.bigdata.j.sail.BigdataSail.bufferCapacity=2000
>>>>> com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false
|
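For anyone who wants to issue the load request from a script instead of curl, here is a minimal sketch in Python. It only builds the form-encoded POST that the curl command quoted in this thread describes: the endpoint URL and the `update`, `namespace` and `monitor` fields are copied from that command. The function name `build_load_request` is hypothetical, and it is an assumption that the server treats a percent-encoded body the same way as curl's raw `-d` body.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request


def build_load_request(endpoint, file_url, namespace):
    """Build (but do not send) a form-encoded SPARQL UPDATE LOAD request.

    Hypothetical helper: the field names mirror the curl command quoted in
    this thread; nothing here is taken from the Blazegraph client API.
    """
    fields = {
        "update": f"LOAD <{file_url}>;",
        "namespace": namespace,
        "monitor": "true",
    }
    # urlencode percent-encodes the values, unlike curl -d, which sends them raw.
    body = urlencode(fields).encode("ascii")
    return Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = build_load_request(
    "http://localhost:8080/bigdata/sparql",
    "file:///srv/feed-dateien/DNBLOD/GND.rdf.gz",
    "gnd",
)

# Decode the body again to check which fields would be sent.
print(req.get_method(), req.full_url)
print(parse_qs(req.data.decode("ascii")))
```

Sending is left to the caller, for example with `urllib.request.urlopen(req)`, so that the sketch can be inspected without a running server.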