From: Martyn C. <ma...@sy...> - 2015-04-22 14:55:24
Well, TRACE on FixedAllocator will let you know when new Allocators are
created and also whenever addresses are recycled. In a well-behaved system
the latter logging will flood the log, whereas with little or no recycling
we will see a higher proportion of new-Allocator messages. It may be worth
a short run (say 10 minutes, or until the journal has grown to 1G) to see
what is written with this log4j property:

log4j.logger.com.bigdata.rwstore.FixedAllocator=TRACE

- Martyn
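A minimal log4j.properties sketch with that logger enabled might look like the following; only the FixedAllocator line comes from the suggestion above, while the root logger level, appender name and pattern layout are illustrative assumptions:

    # sketch of a log4j.properties for the sample run (appender and layout assumed)
    log4j.rootLogger=WARN, stdout
    log4j.appender.stdout=org.apache.log4j.ConsoleAppender
    log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
    log4j.appender.stdout.layout.ConversionPattern=%d %-5p [%t] %c - %m%n
    # allocator tracing as suggested above
    log4j.logger.com.bigdata.rwstore.FixedAllocator=TRACE

TRACE on this logger can be very verbose, so it is worth removing the setting again once the sample run is finished.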
On 22/04/2015 13:50, Bryan Thompson wrote:
> I would wait on this. There will not (should not) be any intermediate
> commits, so what we need to do is log the allocators (and the shadow
> allocators used during group commit for unisolated index operations).
>
> @Martyn: Can you suggest some logging that might capture what is happening
> with the allocators during the load before Andreas retries this operation?
>
> Thanks,
> Bryan
>
> ----
> Bryan Thompson
> Chief Scientist & Founder
> SYSTAP, LLC
> 4501 Tower Road
> Greensboro, NC 27410
> br...@sy...
> http://blazegraph.com
> http://blog.bigdata.com
> http://mapgraph.io
>
> On Wed, Apr 22, 2015 at 8:32 AM, Andreas Kahl <ka...@bs...> wrote:
>
>> There were no other concurrent queries. Just the one SPARQL LOAD.
>> I have deleted the file in the meantime (after a bit of cleaning I had
>> ~60GB free, so the disk was full at that size).
>> If I can run DumpJournal without a commit, I can easily re-run the LOAD
>> up to the java.io.IOException thrown by the full disk.
>>
>> Currently I have restarted the LOAD. I will wait until it breaks down
>> (about 1h) and then try to run DumpJournal on it.
>>
>> Andreas
>>
>>>>> Bryan Thompson <br...@sy...> 22.04.15 14.03 Uhr >>>
>> Were you running any other operations concurrently against the database?
>> Other updates or queries?
>>
>> In general, it is helpful to get the metadata about the allocators and
>> root blocks. However, from what you have written, it sounds like you
>> terminated the process when the disk space filled up. In this case there
>> would only be the original root blocks and no commit points recorded on
>> the journal.
>>
>> If you still have the file, can you run DumpJournal on it and send the
>> output? The -pages option is not required in this case, since we are only
>> interested in the root blocks and allocators.
>>
>> Thanks,
>> Bryan
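For the DumpJournal run, the invocation is typically a plain command line along these lines; the jar name and classpath are assumptions here, and the utility is assumed to live in the com.bigdata.journal package alongside the other journal classes shown in the configuration further down:

    # sketch: dump the root blocks and allocators of the journal (jar name assumed)
    java -cp bigdata.jar com.bigdata.journal.DumpJournal \
         /var/lib/bigdata/bigdata.jnl > dumpjournal.txt

Adding -pages would include page-level detail, which, as noted above, is not needed when only the root blocks and allocators are of interest.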
>> On Wed, Apr 22, 2015 at 7:58 AM, Andreas Kahl <ka...@bs...> wrote:
>>
>>> That was a newly created journal. I simply stopped Tomcat, deleted
>>> bigdata.jnl and restarted.
>>>
>>> Andreas
>>>
>>>>>> Bryan Thompson <br...@sy...> 22.04.15 13.46 Uhr >>>
>>> Was the data loaded into a new and empty journal, or into a pre-existing
>>> journal? If the latter, what size was the journal and what data were in
>>> it?
>>>
>>> Thanks,
>>> Bryan
>>>
>>> On Wed, Apr 22, 2015 at 6:54 AM, Andreas Kahl <ka...@bs...> wrote:
>>>
>>>> Bryan,
>>>>
>>>> yes, I used this command:
>>>> curl -d"update=LOAD <file:///srv/feed-dateien/DNBLOD/GND.rdf.gz>;"
>>>> -d"namespace=gnd" -d"monitor=true" http://localhost:8080/bigdata/sparql
>>>>
>>>> Best regards
>>>> Andreas
>>>>
>>>>>>> Bryan Thompson <br...@sy...> 22.04.15 12.51 Uhr >>>
>>>> Andreas,
>>>>
>>>> What command did you use to load the data set? I.e., SPARQL UPDATE
>>>> "LOAD" or something else?
>>>>
>>>> Thanks,
>>>> Bryan
>>>>
>>>>> Hello everyone,
>>>>>
>>>>> I recently updated to the current revision (f4c63e5) of Blazegraph from
>>>>> Git and tried to load a dataset into the updated webapp. With Bigdata
>>>>> 1.4.0 this resulted in a journal of ~18GB. Now the process was cancelled
>>>>> because the disk was full - the journal was beyond 50GB for the same
>>>>> file with the same settings.
>>>>> The only change was that I activated GroupCommit.
>>>>>
>>>>> The dataset can be downloaded here:
>>>>> http://datendienst.dnb.de/cgi-bin/mabit.pl?cmd=fetch&userID=opendata&pass=opendata&mabheft=GND.rdf.gz
>>>>>
>>>>> Please find the settings used to load the file below.
>>>>>
>>>>> Do I have a misconfiguration, or is there a bug eating all disk space?
>>>>>
>>>>> Best regards
>>>>> Andreas
>>>>>
>>>>> Namespace properties:
>>>>> curl -H "Accept: text/plain"
>>>>> http://localhost:8080/bigdata/namespace/gnd/properties
>>>>> #Wed Apr 22 11:35:31 CEST 2015
>>>>> com.bigdata.namespace.kb.spo.com.bigdata.btree.BTree.branchingFactor=700
>>>>> com.bigdata.relation.container=gnd
>>>>> com.bigdata.rwstore.RWStore.smallSlotType=1024
>>>>> com.bigdata.journal.AbstractJournal.bufferMode=DiskRW
>>>>> com.bigdata.journal.AbstractJournal.file=/var/lib/bigdata/bigdata.jnl
>>>>> com.bigdata.rdf.store.AbstractTripleStore.vocabularyClass=de.bsb_muenchen.bigdata.vocab.B3KatVocabulary
>>>>> com.bigdata.journal.AbstractJournal.initialExtent=209715200
>>>>> com.bigdata.rdf.store.AbstractTripleStore.textIndex=true
>>>>> com.bigdata.btree.BTree.branchingFactor=700
>>>>> com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.NoAxioms
>>>>> com.bigdata.rdf.sail.isolatableIndices=false
>>>>> com.bigdata.service.AbstractTransactionService.minReleaseAge=1
>>>>> com.bigdata.rdf.sail.bufferCapacity=2000
>>>>> com.bigdata.rdf.sail.truthMaintenance=false
>>>>> com.bigdata.rdf.sail.namespace=gnd
>>>>> com.bigdata.relation.class=com.bigdata.rdf.store.LocalTripleStore
>>>>> com.bigdata.rdf.store.AbstractTripleStore.quads=false
>>>>> com.bigdata.journal.AbstractJournal.writeCacheBufferCount=500
>>>>> com.bigdata.search.FullTextIndex.fieldsEnabled=false
>>>>> com.bigdata.relation.namespace=gnd
>>>>> com.bigdata.j.sail.BigdataSail.bufferCapacity=2000
>>>>> com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false
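If the journal is deleted again before a retry, the gnd namespace needs to be recreated with these settings first. A sketch of doing that through the multi-tenancy API, assuming the create endpoint accepts a Java properties file as text/plain (worth checking against the NanoSparqlServer documentation for this revision):

    # sketch: save the listing above (without the quote markers) as gnd.properties, then
    curl -X POST --data-binary @gnd.properties \
         -H 'Content-Type: text/plain' \
         http://localhost:8080/bigdata/namespace

    # verify with the same call that produced the listing above
    curl -H "Accept: text/plain" http://localhost:8080/bigdata/namespace/gnd/properties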