From: Martyn C. <ma...@sy...> - 2015-04-24 12:35:30
I don't see how the small slot optimisation can result in more waste with larger allocators. It is simply a mechanism to avoid rapid re-allocation of the small slot allocators, in an attempt to improve write elision on recycled slots.

In the latest Allocator dump, there are a lot of 64 byte allocators. Unlike the larger allocators (128 and greater), a large proportion of the 64 byte slots will be used for long literal values (note that the mean allocation is only 27 bytes). Counterintuitively, there may well be a case for excluding the 64 byte allocators from the "small slot optimisation". So "small slot" NOT "smallest slot" ;-)

- Martyn

On 24/04/2015 00:18, Bryan Thompson wrote:
> I've updated the ticket. I've also copied my main conclusions inline below.
>
> I think that the issue here is the use of the small slot optimization
> without proper configuration of the indices in order to target small
> allocation slots for at least one of the indices. The small slot
> optimization changes the allocation policy in two ways.
>
> 1. It has a strong preference to use only empty 8k pages for small
> allocations (as configured, for allocations less than 1k). This allows us
> to coalesce writes by combining them onto the same page.
> 2. It has a preference to use allocation blocks that are relatively empty
> for small slots.
>
> As a consequence, the small slot optimization MAY recruit more allocators
> in order to have allocators for small slots that have good sparsity.
>
> The main goal of the small slot optimization is to optimize for indices
> that have very scattered IO patterns. The indices that exhibit this the
> most are the OSP and OCSP indices. In many cases even batched updates will
> modify no more than a single tuple per page on these indices.
> However, in your configuration (and in mine when I enabled the small slot
> optimization without adjusting the branching factors), the O(C)SP indices
> were not created with a small branching factor, so the small slot
> allocation could not be put to any good effect. It did, however, have a
> negative effect: it recruited more allocators. If you want to use the
> small slot optimization, make sure that at least the O(C)SP index has a
> relatively small branching factor, giving an effective slot size of 256
> bytes or less on average.
>
> I suggest that you retest w/o the small slot optimization and with group
> commit still enabled.
>
> I've asked Martyn to look over the allocators from the small slot
> optimization run and think about whether we can make this policy a little
> more adaptive when the branching factors are not really tuned properly and
> too many allocators with too much wasted space are allocated as a result.
> Basically, how to avoid file bloat from misconfiguration.
>
> Thanks,
> Bryan
>
> ----
> Bryan Thompson
> Chief Scientist & Founder
> SYSTAP, LLC
> 4501 Tower Road
> Greensboro, NC 27410
> br...@sy...
> http://blazegraph.com
> http://blog.bigdata.com <http://bigdata.com>
> http://mapgraph.io
>
> Blazegraph™ <http://www.blazegraph.com/> is our ultra high-performance
> graph database that supports both RDF/SPARQL and Tinkerpop/Blueprints
> APIs. MapGraph™ <http://www.systap.com/mapgraph> is our disruptive new
> technology to use GPUs to accelerate data-parallel graph analytics.
>
> CONFIDENTIALITY NOTICE: This email and its contents and attachments are
> for the sole use of the intended recipient(s) and are confidential or
> proprietary to SYSTAP. Any unauthorized review, use, disclosure,
> dissemination or copying of this email or its contents or attachments is
> prohibited.
> If you have received this communication in error, please notify
> the sender by reply email and permanently delete all copies of the email
> and its contents and attachments.
>
> On Thu, Apr 23, 2015 at 9:36 AM, Andreas Kahl <ka...@bs...> wrote:
>
>> Ok, I can redo the test with smallSlots + groupCommit enabled, and run
>> http://localhost:8080/bigdata/status?dumpJournal&dumpPages after some
>> minutes. (I cannot run it on the fully loaded dataset because my disk is
>> not sufficient for the resulting Journal).
>>
>> By the way: Please find attached my custom Vocabulary classes. They are
>> just one of my many attempts to improve IO performance on rotating disks.
>>
>> Best Regards
>> Andreas
>>
>>>>> Bryan Thompson <br...@sy...> 23.04.15 15.31 Uhr >>>
>> I just noticed that you have the full text index enabled as well. I have
>> not been enabling that.
>>
>> I would like to see the output from this command on the fully loaded data
>> sets.
>>
>> http://localhost:8080/bigdata/status?dumpJournal&dumpPages
>>
>> This will let us know if any specific index is taking up a very large
>> number of pages. It will also tell us the distribution over the page sizes
>> for each index.
>>
>> Bryan
>>
>> On Thu, Apr 23, 2015 at 8:54 AM, Andreas Kahl <ka...@bs...>
>> wrote:
>>
>>> Bryan,
>>>
>>> in the meantime, I could successfully load the file into an 18GB journal
>>> after disabling groupCommit (I simply commented out the line in
>>> RWStore.properties).
>>> I can try again with groupCommit enabled, but smallSlotOptimization
>>> disabled.
>>>
>>> Best Regards
>>> Andreas
>>>
>>>>>> Bryan Thompson <br...@sy...> 23.04.2015 13:24 >>>
>>> Andreas,
>>>
>>> I was not able to replicate your result. Unfortunately I navigated away
>>> from the browser page in which I had submitted the request, so it loaded
>>> all the data but failed to commit. However, the resulting file is only
>>> 16GB.
>>>
>>> I will redo this run and verify that the journal after the commit has this
>>> same size on the disk.
>>>
>>> I was only assuming that this was related to group commit because of your
>>> original message. Perhaps I misinterpreted your message. This is simply
>>> about 1.5.1 (with group commit) vs 1.4.0.
>>>
>>> Perhaps the issue is related to the small slot optimization? Maybe in
>>> combination with group commit?
>>>
>>> *> com.bigdata.rwstore.RWStore.smallSlotType=1024*
>>>
>>> I could not replicate your properties exactly because you are using a
>>> non-standard vocabulary class. Therefore I simply deleted the default
>>> namespace (in quads mode) and recreated it with the defaults in triples
>>> mode. The small slot optimization and other parameters were not enabled in
>>> my run.
>>>
>>> Perhaps you could try to replicate my experience and I will enable the
>>> small slots optimization?
>>>
>>> Thanks,
>>> Bryan
>>>
>>> On Thu, Apr 23, 2015 at 1:51 AM, Andreas Kahl <ka...@bs...>
>>> wrote:
>>>
>>>> Bryan & Martyn,
>>>>
>>>> Thank you very much for investigating the issue. I assume from the ticket
>>>> that the error will vanish if I disable groupCommit. I will do so for the
>>>> meantime.
>>>>
>>>> Although there is already extensive information in Bryan's ticket, please
>>>> find attached my logs and DumpJournal outputs:
>>>> - dumpJournal.html contains a dump from the 67GB journal after Blazegraph
>>>> ran into "No space left on device"
>>>> - dumpJournalWithTraceEnabled.html is the same dump for a running query
>>>> when the journal was at about 14GB
>>>> - queryStatus.html is just the status page showing my query
>>>> - catalina.out.gz contains the trace outputs from starting Tomcat until I
>>>> killed the curl running the SPARQL Update by Ctrl-C
>>>> - loadGnd.log.gz is Blazegraph's output when loading the data
>>>>
>>>> Best Regards
>>>> Andreas
>>>>
>>>>>>> Bryan Thompson <br...@sy...> 22.04.15 20.56 Uhr >>>
>>>> See http://trac.bigdata.com/ticket/1206. This is still in the
>>>> investigation stage.
>>>>
>>>> Thanks,
>>>> Bryan
>>>>
>>>> On Wed, Apr 22, 2015 at 5:37 AM, Andreas Kahl <ka...@bs...>
>>>> wrote:
>>>>
>>>>> Hello everyone,
>>>>>
>>>>> I recently updated to the current revision (f4c63e5) of Blazegraph from
>>>>> Git and tried to load a dataset into the updated Webapp. With Bigdata
>>>>> 1.4.0 this resulted in a journal of ~18GB. Now the process was cancelled
>>>>> because the disk was full - the journal was beyond 50GB for the same file
>>>>> with the same settings.
>>>>> The only exception was that I activated GroupCommit.
>>>>>
>>>>> The dataset can be downloaded here:
>>>>> http://datendienst.dnb.de/cgi-bin/mabit.pl?cmd=fetch&userID=opendata&pass=opendata&mabheft=GND.rdf.gz
>>>>> Please find the settings used to load the file below.
>>>>>
>>>>> Do I have a misconfiguration, or is there a bug eating all disk space?
>>>>> Best regards
>>>>> Andreas
>>>>>
>>>>> Namespace-Properties:
>>>>> curl -H "Accept: text/plain"
>>>>> http://localhost:8080/bigdata/namespace/gnd/properties
>>>>> #Wed Apr 22 11:35:31 CEST 2015
>>>>> com.bigdata.namespace.kb.spo.com.bigdata.btree.BTree.branchingFactor=700
>>>>> com.bigdata.relation.container=gnd
>>>>> com.bigdata.rwstore.RWStore.smallSlotType=1024
>>>>> com.bigdata.journal.AbstractJournal.bufferMode=DiskRW
>>>>> com.bigdata.journal.AbstractJournal.file=/var/lib/bigdata/bigdata.jnl
>>>>> com.bigdata.rdf.store.AbstractTripleStore.vocabu.textIndex=true
>>>>> com.bigdata.btree.BTree.branchingFactor=700
>>>>> com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.NoAxioms
>>>>> com.bigdata.rdf.sail.isolatableIndices=false
>>>>> com.bigdata.service.AbstractTransactionService.minReleaseAge=1
>>>>> com.bigdata.rdf.sail.bufferCapacity=2000
>>>>> com.bigdata.rdf.sail.truthMaintenance=false
>>>>> com.bigdata.rdf.sail.namespace=gnd
>>>>> com.bigdata.relation.class=com.bigdata.rdf.store.LocalTripleStore
>>>>> com.bigdata.rdf.store.AbstractTripleStore.quads=false
>>>>> com.bigdata.journal.AbstractJournal.writeCacheBufferCount=500
>>>>> com.bigdata.search.FullTextIndex.fieldsEnabled=false
>>>>> com.bigdata.relation.namespace=gndity=10000
>>>>> com.bigdata.rdf.sail.BigdataSail.bufferCapacity=2000
>>>>> com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false
>>>>>
>>>>> _______________________________________________
>>>>> Bigdata-developers mailing list
>>>>> Big...@li...
>>>>> https://lists.sourceforge.net/lists/listinfo/bigdata-developers
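
For anyone wanting to put a number on the slot waste discussed in this thread, here is a rough sketch of the arithmetic. It reads the 27-byte mean allocation mentioned in the most recent message as applying to the 64-byte allocators, which is an interpretation rather than something the Allocator dump is shown to confirm. The class SlotWaste and its methods are hypothetical names made up for illustration; they are not part of Blazegraph or RWStore.

```java
// Hypothetical helper for illustration only; not a Blazegraph/RWStore API.
// Estimates internal fragmentation of a fixed-size slot allocator.
final class SlotWaste {

    /** Bytes left unused in one slot that holds an allocation of the given mean size. */
    static int wastedBytes(int slotSize, int meanAllocation) {
        if (meanAllocation < 0 || meanAllocation > slotSize) {
            throw new IllegalArgumentException("allocation must fit in the slot");
        }
        return slotSize - meanAllocation;
    }

    /** Fraction of each slot that is unused, in the range [0, 1]. */
    static double wastedFraction(int slotSize, int meanAllocation) {
        return (double) wastedBytes(slotSize, meanAllocation) / slotSize;
    }

    public static void main(String[] args) {
        int slotSize = 64;        // the 64 byte allocators from the thread
        int meanAllocation = 27;  // mean allocation quoted in the thread (assumed to be for these slots)
        System.out.println("wasted bytes per slot: " + wastedBytes(slotSize, meanAllocation));
        System.out.println("wasted fraction:       " + wastedFraction(slotSize, meanAllocation));
    }
}
```

Under that reading, each 64-byte slot leaves 37 bytes unused, a little under 58% of the slot. This only covers waste inside occupied slots; the extra allocators recruited by the small slot optimization add further unused space that this sketch does not model.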