bigdata-developers Mailing List for Blazegraph (powered by bigdata) (Page 23)

Fast, scalable, robust graph database platform

Brought to you by: beebs, hyandell, mrpersonick, thompsonbry

bigdata-developers — List for bigdata developers

This list is closed, nobody may subscribe to it.

2010	Jan	Feb (19)	Mar (8)	Apr (25)	May (16)	Jun (77)	Jul (131)	Aug (76)	Sep (30)	Oct (7)	Nov (3)	Dec
2011	Jan	Feb	Mar	Apr	May (2)	Jun (2)	Jul (16)	Aug (3)	Sep (1)	Oct	Nov (7)	Dec (7)
2012	Jan (10)	Feb (1)	Mar (8)	Apr (6)	May (1)	Jun (3)	Jul (1)	Aug	Sep (1)	Oct	Nov (8)	Dec (2)
2013	Jan (5)	Feb (12)	Mar (2)	Apr (1)	May (1)	Jun (1)	Jul (22)	Aug (50)	Sep (31)	Oct (64)	Nov (83)	Dec (28)
2014	Jan (31)	Feb (18)	Mar (27)	Apr (39)	May (45)	Jun (15)	Jul (6)	Aug (27)	Sep (6)	Oct (67)	Nov (70)	Dec (1)
2015	Jan (3)	Feb (18)	Mar (22)	Apr (121)	May (42)	Jun (17)	Jul (8)	Aug (11)	Sep (26)	Oct (15)	Nov (66)	Dec (38)
2016	Jan (14)	Feb (59)	Mar (28)	Apr (44)	May (21)	Jun (12)	Jul (9)	Aug (11)	Sep (4)	Oct (2)	Nov (1)	Dec
2017	Jan (20)	Feb (7)	Mar (4)	Apr (18)	May (7)	Jun (3)	Jul (13)	Aug (2)	Sep (4)	Oct (9)	Nov (2)	Dec (5)
2018	Jan	Feb	Mar	Apr (2)	May	Jun	Jul	Aug	Sep	Oct	Nov	Dec
2019	Jan	Feb	Mar (1)	Apr	May	Jun	Jul	Aug	Sep	Oct	Nov	Dec

[Bigdata-developers] Write performance degrades during import

From: Lee K. <le...@sw...> - 2015-04-27 15:48:19

Hi,

We are trying to perform a bulk import into a new blazegraph journal. The
import process writes quads to an in-process BigdataSailRepository with the
following configuration based on the 'fastload' settings in the
bigdata-sails samples directory:

com.bigdata.rdf.store.AbstractTripleStore.quadsMode=true

com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false

com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.NoAxioms

com.bigdata.rdf.sail.truthMaintenance=false

com.bigdata.rdf.store.AbstractTripleStore.justify=false

com.bigdata.journal.AbstractJournal.initialExtent=209715200

com.bigdata.journal.AbstractJournal.maximumExtent=209715200

com.bigdata.rdf.store.AbstractTripleStore.textIndex=false

com.bigdata.journal.AbstractJournal.bufferMode=DiskRW

com.bigdata.sail.isolatableIndices=true

com.bigdata.rdf.store.AbstractTripleStore.vocabularyClass=com.bigdata.rdf.vocab.NoVocabulary

com.bigdata.journal.AbstractJournal.file=bigdata_conf.jnl

com.bigdata.journal.AbstractJournal.writeCacheBufferCount=2000

com.bigdata.btree.writeRetentionQueue.capacity=8000
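For the in-process case, the same settings can be assembled in code before the Sail is created. This is a minimal sketch: the class and method names are hypothetical, and the `BigdataSail`/`BigdataSailRepository` calls appear only in a comment so that the block compiles without the Blazegraph jar. The second listing on this page spells the isolation key `com.bigdata.rdf.sail.isolatableIndices`, so it is worth checking which spelling your version actually reads.

```java
import java.util.Properties;

/** Hypothetical helper that assembles the 'fastload'-style settings listed above. */
class FastLoadProperties {

    static Properties build(String journalFile) {
        Properties p = new Properties();
        // Quads, no statement identifiers, no inference, no justifications.
        p.setProperty("com.bigdata.rdf.store.AbstractTripleStore.quadsMode", "true");
        p.setProperty("com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers", "false");
        p.setProperty("com.bigdata.rdf.store.AbstractTripleStore.axiomsClass",
                "com.bigdata.rdf.axioms.NoAxioms");
        p.setProperty("com.bigdata.rdf.sail.truthMaintenance", "false");
        p.setProperty("com.bigdata.rdf.store.AbstractTripleStore.justify", "false");
        p.setProperty("com.bigdata.rdf.store.AbstractTripleStore.textIndex", "false");
        p.setProperty("com.bigdata.rdf.store.AbstractTripleStore.vocabularyClass",
                "com.bigdata.rdf.vocab.NoVocabulary");
        // Journal: RWStore-backed file, 200MB extents, large write cache.
        p.setProperty("com.bigdata.journal.AbstractJournal.bufferMode", "DiskRW");
        p.setProperty("com.bigdata.journal.AbstractJournal.file", journalFile);
        p.setProperty("com.bigdata.journal.AbstractJournal.initialExtent", "209715200");
        p.setProperty("com.bigdata.journal.AbstractJournal.maximumExtent", "209715200");
        p.setProperty("com.bigdata.journal.AbstractJournal.writeCacheBufferCount", "2000");
        p.setProperty("com.bigdata.btree.writeRetentionQueue.capacity", "8000");
        // Key spelled as in the listing above; compare with com.bigdata.rdf.sail.isolatableIndices.
        p.setProperty("com.bigdata.sail.isolatableIndices", "true");
        return p;
    }

    public static void main(String[] args) {
        Properties props = build("bigdata_conf.jnl");
        // With the Blazegraph jar on the classpath these would be handed on, e.g.:
        //   BigdataSail sail = new BigdataSail(props);
        //   BigdataSailRepository repo = new BigdataSailRepository(sail);
        //   repo.initialize();
        props.list(System.out);
    }
}
```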


When run against a native sesame repository, the import takes around 50
hours. When run against the blazegraph repository the import slows down
significantly after 2-3 hours and begins logging warnings of the form:

[2015-04-27 07:34:30,238][WARN][com.bigdata.btree.AbstractBTree] wrote:
name=kb.spo.OCSP, 8 records (#nodes=3, #leaves=5) in 5493ms :
addrRoot=-244779124025982418
[2015-04-27 07:47:48,342][WARN][com.bigdata.btree.AbstractBTree] wrote:
name=kb.spo.SOPC, 1 records (#nodes=1, #leaves=0) in 40841ms :
addrRoot=-246059333517835846
[2015-04-27 07:47:48,858][WARN][com.bigdata.btree.AbstractBTree] wrote:
name=kb.spo.SPOC, 7 records (#nodes=4, #leaves=3) in 42109ms :
addrRoot=-246099989678259484
[2015-04-27 07:54:47,743][WARN][com.bigdata.btree.AbstractBTree] wrote:
name=kb.spo.SOPC, 1 records (#nodes=1, #leaves=0) in 43231ms :
addrRoot=-245678000551493109
[2015-04-27 07:54:52,251][WARN][com.bigdata.btree.AbstractBTree] wrote:
name=kb.spo.SPOC, 1 records (#nodes=1, #leaves=0) in 44875ms :
addrRoot=-245097441232158259
[2015-04-27 07:54:52,251][WARN][com.bigdata.btree.AbstractBTree] wrote:
name=kb.spo.CSPO, 1 records (#nodes=1, #leaves=0) in 34808ms :
addrRoot=-245097501361700476
[2015-04-27 07:54:52,251][WARN][com.bigdata.btree.AbstractBTree] wrote:
name=kb.spo.POCS, 1 records (#nodes=1, #leaves=0) in 44875ms :
addrRoot=-245097342447910551
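To see which indices account for the slow checkpoints, the warnings can be tallied per index. This is a minimal sketch, assuming each warning has first been joined back onto one line (the mail client wrapped them above). The class name is hypothetical, and the regular expression only covers the message format shown here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: sums the times in AbstractBTree "wrote:" warnings per index. */
class SlowWriteSummary {

    // Matches e.g. "wrote: name=kb.spo.SPOC, 7 records (#nodes=4, #leaves=3) in 42109ms"
    private static final Pattern WROTE =
            Pattern.compile("wrote: name=([^,]+), \\d+ records .*? in (\\d+)ms");

    /** Returns index name -> total milliseconds over all matching lines, sorted by name. */
    static Map<String, Long> totalMillisByIndex(List<String> lines) {
        Map<String, Long> totals = new TreeMap<>();
        for (String line : lines) {
            Matcher m = WROTE.matcher(line);
            if (m.find()) {
                totals.merge(m.group(1), Long.parseLong(m.group(2)), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "[2015-04-27 07:47:48,342][WARN][com.bigdata.btree.AbstractBTree] wrote: "
                        + "name=kb.spo.SOPC, 1 records (#nodes=1, #leaves=0) in 40841ms : "
                        + "addrRoot=-246059333517835846",
                "[2015-04-27 07:54:47,743][WARN][com.bigdata.btree.AbstractBTree] wrote: "
                        + "name=kb.spo.SOPC, 1 records (#nodes=1, #leaves=0) in 43231ms : "
                        + "addrRoot=-245678000551493109");
        System.out.println(totalMillisByIndex(lines));
    }
}
```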

Are there any settings we should change or add to the journal configuration
to prevent this slowdown?

Thanks

[Bigdata-developers] Antw: Re: Current Revision of Blazegraph: Journal consumes extremely much disk space

From: Andreas K. <ka...@bs...> - 2015-04-27 12:38:35

Attachments: dumpJournal.smallSlots.noGroupCommit.finishedLOAD.html dumpJournal.smallSlots.GroupCommit.unfinishedLOAD.html

Hello Bryan & Martin, 

Sorry for the long delay. I have now run dumpJournal&dumpPages twice: 
1. Dump while the SPARQL LOAD was running with groupCommit and smallSlotOptimization enabled (the one that cannot finish due to disk space)
2. Dump after the whole file was successfully loaded because I disabled groupCommit (I could also use groupCommit and disable smallSlots)

I will do what I can to help you test and track down the problem. For me it is not much trouble to work with the knowledge that I can only activate one of the two features at a time. 

Best Regards
Andreas

P.S. I also followed your advice to increase com.bigdata.rdf.sail.bufferCapacity as you can see from the settings of run No. 2: 
triples:/tmp # curl -H "Accept: text/plain" http://localhost:8080/bigdata/namespace/gnd/properties
#Mon Apr 27 14:26:37 CEST 2015
com.bigdata.namespace.kb.spo.com.bigdata.btree.BTree.branchingFactor=700
com.bigdata.relation.container=gnd
com.bigdata.rwstore.RWStore.smallSlotType=1024
com.bigdata.journal.AbstractJournal.bufferMode=DiskRW
com.bigdata.journal.AbstractJournal.file=/var/lib/bigdata/bigdata.jnl
com.bigdata.journal.AbstractJournal.initialExtent=209715200
com.bigdata.rdf.store.AbstractTripleStore.vocabularyClass=de.bsb_muenchen.bigdata.vocab.B3KatVocabulary
com.bigdata.rdf.store.AbstractTripleStore.textIndex=true
com.bigdata.btree.BTree.branchingFactor=700
com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.NoAxioms
com.bigdata.rdf.sail.isolatableIndices=false
com.bigdata.service.AbstractTransactionService.minReleaseAge=1
com.bigdata.rdf.sail.bufferCapacity=200000
com.bigdata.rdf.sail.truthMaintenance=false
com.bigdata.rdf.sail.namespace=gnd
com.bigdata.relation.class=com.bigdata.rdf.store.LocalTripleStore
com.bigdata.rdf.store.AbstractTripleStore.quads=false
com.bigdata.journal.AbstractJournal.writeCacheBufferCount=2000
com.bigdata.search.FullTextIndex.fieldsEnabled=false
com.bigdata.relation.namespace=gnd
com.bigdata.btree.writeRetentionQueue.capacity=10000
com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false
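Because the namespace properties endpoint returns a plain `key=value` listing, two runs can be compared mechanically rather than by eye. This is a minimal sketch using only `java.util.Properties`; the class name is hypothetical, and the sample values in `main` are copied from the two listings in this thread.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper that reports settings that differ between two property listings. */
class PropertiesDiff {

    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text)); // '#' lines such as the timestamp are ignored
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for an in-memory reader
        }
        return p;
    }

    /** Returns "key: old -> new" for each key that differs or exists on one side only. */
    static Set<String> diff(Properties before, Properties after) {
        Set<String> keys = new TreeSet<>(before.stringPropertyNames());
        keys.addAll(after.stringPropertyNames());
        Set<String> changes = new TreeSet<>();
        for (String k : keys) {
            String a = before.getProperty(k);
            String b = after.getProperty(k);
            if (a == null ? b != null : !a.equals(b)) {
                changes.add(k + ": " + a + " -> " + b);
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        Properties run1 = parse("com.bigdata.rdf.sail.bufferCapacity=2000\n"
                + "com.bigdata.journal.AbstractJournal.writeCacheBufferCount=500\n");
        Properties run2 = parse("#Mon Apr 27 14:26:37 CEST 2015\n"
                + "com.bigdata.rdf.sail.bufferCapacity=200000\n"
                + "com.bigdata.journal.AbstractJournal.writeCacheBufferCount=2000\n");
        diff(run1, run2).forEach(System.out::println);
    }
}
```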

>>> Bryan Thompson <br...@sy...> 24.04.15 18.45 Uhr >>>
Martyn and I discussed this in some depth today.  We've reopened the ticket
to:

a. gain more understanding of the interaction of the small slot
optimization and group commit.
b. verify correct reporting by the allocators in dumpJournal.
c. modify the small slots optimization allocator policy to make it less
susceptible to mis-configuration.

In the data as loaded, the OSP index was 66% blob slots (greater than 8k).
For the small slot optimization to be effective the O(C)SP index should
target a page size of 64-256 bytes.

(c) should minimize or remove the negative impact of the small slot
optimization in such cases.

Thanks,
Bryan



----
Bryan Thompson
Chief Scientist & Founder
SYSTAP, LLC
4501 Tower Road
Greensboro, NC 27410
br...@sy...
http://blazegraph.com
http://blog.bigdata.com <http://bigdata.com>
http://mapgraph.io

Blazegraph™ <http://www.blazegraph.com/> is our ultra high-performance
graph database that supports both RDF/SPARQL and Tinkerpop/Blueprints
APIs.  MapGraph™ <http://www.systap.com/mapgraph> is our disruptive new
technology to use GPUs to accelerate data-parallel graph analytics.

CONFIDENTIALITY NOTICE:  This email and its contents and attachments are
for the sole use of the intended recipient(s) and are confidential or
proprietary to SYSTAP. Any unauthorized review, use, disclosure,
dissemination or copying of this email or its contents or attachments is
prohibited. If you have received this communication in error, please notify
the sender by reply email and permanently delete all copies of the email
and its contents and attachments.

On Fri, Apr 24, 2015 at 8:35 AM, Martyn Cutcher <ma...@sy...> wrote:

>  I don't see how the small slot optimisation can result in more waste with
> larger allocators.
>
> It is simply a mechanism to avoid rapid re-allocation of the small slot [...] allocator dump, there are a lot of 64 byte allocators.
> Unlike the larger allocators (128 and greater) a large proportion of the 64
> byte slots will be used for long literal values (note that the mean
> allocation is only 27 bytes).
>
> Counterintuitively, there may well be a case for excluding the 64 byte
> allocators from the "small slot optimisation".  So "small slot" NOT
> "smallest slot" ;-)
>
> - Martyn
>
> On 24/04/2015 00:18, Bryan Thompson wrote:
>
> I've updated the ticket.  I've also copied my main conclusions inline below.
>
> I think that the issue here is the use of the small slot optimization
> without proper configuration of the indices in order to target small
> allocation slots for at least one of the indices.  The small slot
> optimization changes the allocation policy in two ways.
>
> 1. It has a strong preference to use only empty 8k pages for small
> allocations (as configured, for allocations less than 1k).  This allows us
> to coalesce writes by combining them onto the same page.
> 2. It has a preference to use allocation blocks that are relatively empty
> for small slots.
>
> As a consequence, the small slot optimization MAY recruit more allocators
> in order to have allocators for small slots that have good sparsity.
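The write-coalescing benefit in point 1 can be illustrated with a toy count of the pages a batch of small records dirties. This is not RWStore's allocator. It is a simplified model, assuming 8K pages, records of at most 1024 bytes (mirroring `smallSlotType=1024` in this thread), and that without the optimization each record lands on a different, already partly used page, as in the scattered O(C)SP case. The class name is hypothetical.

```java
import java.util.Arrays;

/** Toy model (not RWStore): 8K pages written for a batch of small records. */
class SmallSlotToyModel {

    static final int PAGE_BYTES = 8 * 1024;
    static final int SMALL_SLOT_BYTES = 1024; // mirrors smallSlotType=1024 in this thread

    /** Scattered case: each record lands on a different, already partly used page. */
    static int pagesTouchedScattered(int[] recordBytes) {
        return recordBytes.length;
    }

    /** Small-slot case: records are packed, in order, onto freshly recruited empty pages. */
    static int pagesTouchedCoalesced(int[] recordBytes) {
        int pages = 0;
        int freeOnCurrentPage = 0;
        for (int size : recordBytes) {
            if (size <= 0 || size > SMALL_SLOT_BYTES) {
                throw new IllegalArgumentException("not a small record: " + size);
            }
            if (size > freeOnCurrentPage) { // current page cannot hold it: start an empty one
                pages++;
                freeOnCurrentPage = PAGE_BYTES;
            }
            freeOnCurrentPage -= size;
        }
        return pages;
    }

    public static void main(String[] args) {
        int[] batch = new int[500];
        Arrays.fill(batch, 64); // 500 small (64-byte) records dirtied by one commit
        System.out.println(pagesTouchedScattered(batch) + " pages vs "
                + pagesTouchedCoalesced(batch) + " pages");
    }
}
```

In this toy it prints `500 pages vs 4 pages`. The price, as described here, is that insisting on empty pages can mean recruiting more allocators, which only pays off when the indices actually produce small records.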
>
> The main goal of the small slot optimization is to optimize for indices
> that have very scattered IO patterns.  The indices that exhibit this the
> most are the OSP and OCSP indices.  In many cases even batched updates will
> modify no more than a single tuple per page on this index.  However, in
> your configuration (and in mine when I enabled the small slot optimization
> without adjusting the branching factors), the O(C)SP indices were not
> created with a small branching factor, so the small slot allocation could
> not be put to any good effect. However it did have a negative effect -- by
> recruiting more allocators.  If you want to use the small slot
> optimization, make sure that at least the O(C)SP index has a relatively
> small branching factor giving an effective slot size of 256 bytes or less
> on average.
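As a rough rule of thumb (an assumption, not a formula taken from the Blazegraph code), a page holds about as many tuples as the branching factor, so its size is roughly the branching factor times the average bytes per tuple. A target slot size then gives an upper bound on the branching factor. The sketch below does that arithmetic and builds a per-index override key following the `com.bigdata.namespace.<index>.com.bigdata.btree.BTree.branchingFactor` pattern seen in the properties listing earlier on this page. The index name `gnd.spo.OSP`, the 4-byte figure, the minimum of 3, and the helper names are all hypothetical, so check real page sizes with dumpJournal/dumpPages before relying on the result.

```java
/** Hypothetical arithmetic for picking a small branching factor for the O(C)SP index. */
class BranchingFactorEstimate {

    // Assumed lower bound: a B+Tree node needs a small minimum fan-out.
    static final int MIN_BRANCHING_FACTOR = 3;

    /**
     * Largest branching factor whose estimated page size stays within the target,
     * under the rough assumption pageBytes ~= branchingFactor * avgBytesPerTuple.
     */
    static int maxBranchingFactor(int targetPageBytes, double avgBytesPerTuple) {
        if (targetPageBytes <= 0 || avgBytesPerTuple <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        int bf = (int) Math.floor(targetPageBytes / avgBytesPerTuple);
        return Math.max(MIN_BRANCHING_FACTOR, bf);
    }

    /** Per-index override key, following the com.bigdata.namespace.kb.spo... pattern. */
    static String overrideKey(String indexName) {
        return "com.bigdata.namespace." + indexName + ".com.bigdata.btree.BTree.branchingFactor";
    }

    public static void main(String[] args) {
        // Hypothetical inputs: 256-byte target, ~4 bytes per OSP tuple (measure these).
        int bf = maxBranchingFactor(256, 4.0);
        System.out.println(overrideKey("gnd.spo.OSP") + "=" + bf);
    }
}
```

With these inputs it prints `com.bigdata.namespace.gnd.spo.OSP.com.bigdata.btree.BTree.branchingFactor=64`.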
>
> I suggest that you retest w/o the small slot optimization and with group
> commit still enabled.
>
> I've asked Martyn to look over the allocators from the small slot
> optimization run and think about whether we can make this policy a little
> more adaptive when the branching factors are not really tuned properly and
> too many allocators with too much wasted space are allocated as a result.
> Basically, how to avoid file bloat from misconfiguration.
>
> Thanks,
> Bryan
>
> On Thu, Apr 23, 2015 at 9:36 AM, Andreas Kahl <ka...@bs...> wrote:
>
>
> Ok, I can redo the test with smallSlots + groupCommit enabled, and run
> http://localhost:8080/bigdata/status?dumpJournal&dumpPages after some
> minutes. (I cannot run it on the fully loaded [...] just one of my many attempts to improve IO performance on rotating disks.
>
> Best Regards
> Andreas
>
>
> Bryan Thompson <br...@sy...> 23.04.15 15.31 Uhr >>>
>
> I just noticed that you have the full text index enabled as well.  I have
> not been enabling that.
>
> I would like to see the output from this command on the fully loaded data
> sets.
> http://localhost:8080/bigdata/status?dumpJournal&dumpPages
>
> This will let us see if any specific index is taking up a very large number of
> pages.  It will also tell us the distribution over the page sizes for each
> index.
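The same status page can be fetched from code, for example to capture dumps periodically during a long load. This is a minimal sketch using `java.net.HttpURLConnection`; the host, port and `/bigdata` context path are taken from the URL above and may differ on other deployments, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper that saves the dumpJournal/dumpPages status report to a file. */
class DumpJournalFetcher {

    /** Builds the status URL with the dumpJournal and dumpPages flags used in this thread. */
    static String statusUrl(String baseUrl) {
        return baseUrl + "/status?dumpJournal&dumpPages";
    }

    /** Saves the HTML report to the target file and returns the HTTP status code. */
    static int save(String baseUrl, Path target) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(statusUrl(baseUrl)).openConnection();
        conn.setRequestMethod("GET");
        try {
            int code = conn.getResponseCode();
            if (code == HttpURLConnection.HTTP_OK) {
                try (InputStream in = conn.getInputStream()) {
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
            return code;
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        int code = save("http://localhost:8080/bigdata", Paths.get("dumpJournal.html"));
        System.out.println("HTTP " + code);
    }
}
```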
>
> Bryan
>
> On Thu, Apr 23, 2015 at 8:54 AM, Andreas Kahl <ka...@bs...> wrote:
>
>
>  Bryan,
>
> in the meantime, I could successfully load the file into an 18GB journal
> after disabling groupCommit (I simply commented out the line in
> RWStore.properties).
> I can try again with groupCommit enabled, but smallSlotOptimization
> disabled.
>
> Best Regards
> Andreas
>
>
> Bryan Thompson <br...@sy...> 23.04.2015 13:24 >>>
>
>   Andreas,
>
> I was not able to replicate your result.  Unfortunately I navigated away
> from the browser page in which I had submitted the request, so it loaded
> all the data but failed to commit.  However, the resulting file is only
> 16GB.
>
> I will redo this run and verify that the journal after the commit has
> this same size on the disk.
>
> I was only assuming that this was related to group commit because of your
> original message.  Perhaps I misinterpreted your message. This is simply
> about 1.5.1 (with group commit) vs 1.4.0.
>
> Perhaps the issue is related to the small slot optimization?  Maybe in
> combination with group commit?
>
> *> com.bigdata.rwstore.RWStore.smallSlotType=1024*
>
> I could not replicate your properties exactly because you are using a
> non-standard vocabulary class.  Therefore I simply deleted the default
> namespace (in quads mode) and recreated it with the defaults in triples
> mode.  The small slot optimization and other parameters were not enabled
>
>  in
>
>  my run.
>
> Perhaps you could try to replicate my experience and I will enable the
> small slots optimization?
>
> Thanks,
> Bryan
>
> On Thu, Apr 23, 2015 at 1:51 AM, Andreas Kahl <ka...@bs...> wrote:
>
>
> Bryan & Martyn,
>
> Thank you very much for investigating the issue. I assume from the ticket
> that the error will vanish if I disable groupCommit. I will do so for the
> meantime.
>
> Although there is already extensive information in Bryan's ticket, please
> find attached my logs and DumpJournal outputs:
> - dumpJournal.html contains a dump from the 67GB journal after Blazegraph
>   ran into "No space left on device"
> - dumpJournalWithTraceEnabled.html is the same dump for a running query
>   when the journal was at about 14GB
> - queryStatus.html is just the status page showing my query
> - catalina.out.gz contains the trace outputs from starting Tomcat until I
>   killed the curl running the SPARQL Update by Ctrl-C
> - loadGnd.log.gz is Blazegraph's output when loading the data
>
> Best Regards
> Andreas
>
>
>
>
> Bryan Thompson <br...@sy...> 22.04.15 20.56 Uhr >>>
>
>   See http://trac.bigdata.com/ticket/1206.  This is still in the
> investigation stage.
>
> Thanks,
> Bryan
>
> On Wed, Apr 22, 2015 at 5:37 AM, Andreas Kahl <ka...@bs...> wrote:
>
>
> Hello everyone,
>
> I recently updated to the current revision (f4c63e5) of Blazegraph from
> Git and tried to load a dataset into the updated webapp. With Bigdata
> 1.4.0 this resulted in a journal of ~18GB. This time the process was
> cancelled because the disk was full: the journal grew beyond 50GB for the
> same file with the same settings. The only difference was that I
> activated GroupCommit.
>
> The dataset can be downloaded here:
>
> http://datendienst.dnb.de/cgi-bin/mabit.pl?cmd=fetch&userID=opendata&pass=opendata&mabheft=GND.rdf.gz
>
> Please find the settings used to load the file below.
>
> Do I have a misconfiguration, or is there a bug eating all the disk
> space?
>
> Best regards
> Andreas
>
> Namespace-Properties:
> curl -H "Accept: text/plain" http://localhost:8080/bigdata/namespace/gnd/properties
> #Wed Apr 22 11:35:31 CEST 2015
> com.bigdata.namespace.kb.spo.com.bigdata.btree.BTree.branchingFactor=700
> com.bigdata.relation.container=gnd
> com.bigdata.rwstore.RWStore.smallSlotType=1024
> com.bigdata.journal.AbstractJournal.bufferMode=DiskRW
> com.bigdata.journal.AbstractJournal.file=/var/lib/bigdata/bigdata.jnl
> com.bigdata.rdf.store.AbstractTripleStore.vocabu[...].textIndex=true
> com.bigdata.btree.BTree.branchingFactor=7[...]ionService.minReleaseAge=1
> com.bigdata.rdf.sail.bufferCapacity=2000
> com.bigdata.rdf.sail.truthMaintenance=false
> com.bigdata.rdf.sail.namespace=gnd
> com.bigdata.relation.class=com.bigdata.rdf.store.LocalTripleStore
> com.bigdata.rdf.store.AbstractTripleStore.quads=false
> com.bigdata.journal.AbstractJournal.writeCacheBufferCount=500
> com.bigdata.search.FullTextIndex.fieldsEnabled=false
> com.bigdata.relation.namespace=gnd[...]ity=10000
> com.bigdata.rdf.sail.BigdataSail.bufferCapacity=2000
> com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false

[Bigdata-developers] owl:Restriction reasoning not working

From: Kaushik C. <kay...@gm...> - 2015-04-27 07:12:36

Hi,

Please have a look at my question on StackOverflow, where I ask why a
SPARQL query does not give me the intended results when I run it against
the NanoSparqlServer:

http://stackoverflow.com/questions/29806316/owlrestriction-reasoning-not-working-in-blazegraph

Thanks in advance.

Re: [Bigdata-developers] Subquery wildcard projections not rewritten

From: Bryan T. <br...@sy...> - 2015-04-24 21:37:09

I think that this might be a bug in the postOrderIteratorWithAnnotations()
method, which lacks a unit test.

I've created a new ticket for that issue. See #1210.  Martyn is the
striterator wizard so I've asked him to take a look at this.

I have incorporated a version of your test case for the wildcard rewrites
which I believe to be correct into our master development branch.

The other way to fix this is by explicit recursion in the
ASTWildcardProjectionOptimizer. That is actually how most of the rewrites
are implemented which is why we are not using the
postOrderIteratorWithAnnotations() elsewhere and why this has gone
undetected.
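The explicit-recursion approach can be illustrated on a toy AST. The sketch below is not Blazegraph's ASTWildcardProjectionOptimizer; it uses hypothetical `Select` and `Pattern` classes to show why the rewrite has to run bottom-up: the `*` in an outer subquery can only be expanded once the projection of the subquery it wraps is known. Collecting variables in first-seen order is an assumption about the desired output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy AST (hypothetical): a SELECT whose WHERE clause holds patterns and nested SELECTs. */
class ToyWildcardRewrite {

    /** A triple pattern; terms starting with '?' are variables. */
    static final class Pattern {
        final String[] terms;
        Pattern(String... terms) { this.terms = terms; }
    }

    static final class Select {
        List<String> projection;                      // ["*"] or explicit variable names
        final List<Object> where = new ArrayList<>(); // Pattern or Select children
        Select(List<String> projection) { this.projection = projection; }
    }

    /** Post-order rewrite: expand the children first, then this node's "*". */
    static Set<String> rewrite(Select node) {
        Set<String> inScope = new LinkedHashSet<>();
        for (Object child : node.where) {
            if (child instanceof Select) {
                // A subquery exposes only its (already rewritten) projection.
                inScope.addAll(rewrite((Select) child));
            } else {
                for (String t : ((Pattern) child).terms) {
                    if (t.startsWith("?")) inScope.add(t.substring(1));
                }
            }
        }
        if (node.projection.size() == 1 && node.projection.get(0).equals("*")) {
            node.projection = new ArrayList<>(inScope);
        }
        return new LinkedHashSet<>(node.projection);
    }

    public static void main(String[] args) {
        // SELECT (COUNT(*) AS ?c) { SELECT * { SELECT * WHERE { ?s ?p ?o } } LIMIT 21 OFFSET 0 }
        Select inner = new Select(Arrays.asList("*"));
        inner.where.add(new Pattern("?s", "?p", "?o"));
        Select slice = new Select(Arrays.asList("*")); // LIMIT/OFFSET omitted in the toy
        slice.where.add(inner);
        Select count = new Select(Arrays.asList("c"));  // stands for (COUNT(*) AS ?c)
        count.where.add(slice);
        rewrite(count);
        System.out.println(slice.projection + " " + inner.projection);
    }
}
```

Here both wildcard levels end up projecting `[s, p, o]`, while the outer `COUNT(*)` is left alone.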

I am going to be on travel for several days.  Hopefully this bug will
resolve as soon as the postOrderIteratorWithAnnotations() issue is resolved.

Thanks,
Bryan

On Wed, Apr 22, 2015 at 5:41 AM, Lee Kitching <le...@sw...> wrote:

> Hi Bryan,
>
> Yes, the AST in the test is supposed to be for the query
>
> select (count(*) as ?c) where {
>    select * where {
>         select * where { ?s ?p ?o }
>      } limit 21 offset 0
>  }
>
> Thanks
>
> On Tue, Apr 21, 2015 at 7:53 PM, Bryan Thompson <br...@sy...> wrote:
>
>> Lee,
>>
>> I can replicate the problem with your query (as given above) against the
>> sparql end point.
>>
>> Can you state the SPARQL that you are trying to model with this unit
>> test?  It does not appear to be quite the same as your SPARQL query above.  I
>> would like to make sure that it is being translated correctly into the
>> AST.  I can then look at the expected AST and work backwards and see if I
>> believe that the test shows the problem.
>>
>> Thanks,
>> Bryan
>>
>> On Tue, Apr 21, 2015 at 11:07 AM, Lee Kitching <le...@sw...> wrote:
>>
>>> Hi Bryan,
>>>
>>> We allow users to enter their own SPARQL queries and wrap them to do
>>> things like pagination so unfortunately we cannot just re-write our queries
>>> to do the expansion manually.
>>> I applied the fix detailed in the ticket and it fixes the problem for the query
>>> I provided, however it fails to rewrite the following query:
>>>
>>> SELECT (COUNT(*) as ?c) {
>>>   SELECT * {
>>>     SELECT * WHERE { ?s ?p ?o }
>>>   } LIMIT 21 OFFSET 0
>>> }
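The wrapping described above can be sketched as plain string composition. The helper below is hypothetical (it is not part of Blazegraph); it reproduces the shape of this query, a paged `SELECT *` around the user's query and a `COUNT(*)` around that. It assumes the user's text is a complete SELECT query with no prologue; PREFIX/BASE declarations would have to be moved to the front, which this sketch does not do.

```java
/** Hypothetical helper that wraps a user-supplied SELECT for paging and counting. */
class QueryWrapper {

    /** SELECT * { <userQuery> } LIMIT n OFFSET m */
    static String page(String userQuery, long limit, long offset) {
        if (limit < 0 || offset < 0) {
            throw new IllegalArgumentException("limit and offset must be non-negative");
        }
        return "SELECT * {\n" + indent(userQuery) + "\n} LIMIT " + limit + " OFFSET " + offset;
    }

    /** SELECT (COUNT(*) AS ?c) { <query> } */
    static String count(String query) {
        return "SELECT (COUNT(*) AS ?c) {\n" + indent(query) + "\n}";
    }

    /** Indents every line by two spaces so the nesting stays readable. */
    private static String indent(String text) {
        return "  " + text.replace("\n", "\n  ");
    }

    public static void main(String[] args) {
        String user = "SELECT * WHERE { ?s ?p ?o }";
        System.out.println(count(page(user, 21, 0)));
    }
}
```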
>>>
>>> I attempted to debug the issue, and it seems to re-write the *
>>> projection in the inner-most subquery but not the subquery with the limit
>>> and offset. I created a test based on the
>>> existing tests:
>>>
>>> public void test_wildcardProjectionOptimizer03() {
>>>
>>>         /*
>>>        * Note: DO NOT share structures in this test!!!!
>>>        */
>>>         final IBindingSet[] bsets = new IBindingSet[] {};
>>>
>>>         // The source AST.
>>>         final QueryRoot given = new QueryRoot(QueryType.SELECT);
>>>         {
>>>             final SubqueryRoot selectQuery = new SubqueryRoot(QueryType.SELECT);
>>>             {
>>>                 final JoinGroupNode whereClause1 = new JoinGroupNode();
>>>                 final StatementPatternNode spoPattern = new StatementPatternNode(
>>>                         new VarNode("s"), new VarNode("p"), new VarNode("o"),
>>>                         null, Scope.DEFAULT_CONTEXTS);
>>>                 whereClause1.addChild(spoPattern);
>>>
>>>                 final ProjectionNode p = new ProjectionNode();
>>>                 p.addProjectionVar(new VarNode("*"));
>>>                 selectQuery.setProjection(p);
>>>                 selectQuery.setWhereClause(whereClause1);
>>>             }
>>>
>>>             final SubqueryRoot sliceQuery = new SubqueryRoot(QueryType.SELECT);
>>>             {
>>>                 final ProjectionNode p = new ProjectionNode();
>>>                 p.addProjectionVar(new VarNode("*"));
>>>                 sliceQuery.setProjection(p);
>>>
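>>>                 final JoinGroupNode whereClause = new JoinGroupNode();
>>>                 whereClause.addChild(selectQuery);
>>>                 // Attach the group; otherwise selectQuery is never nested in sliceQuery.
>>>                 sliceQuery.setWhereClause(whereClause);
>>>
>>>                 sliceQuery.setSlice(new SliceNode(0, 21));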
>>>             }
>>>
>>>             final FunctionNode countNode = new FunctionNode(
>>>                     FunctionRegistry.COUNT,
>>>                     Collections.EMPTY_MAP,
>>>                     new VarNode("*"));
>>>
>>>             final ProjectionNode countProjection = new ProjectionNode();
>>>             countProjection.addProjectionExpression(
>>>                     new AssignmentNode(new VarNode("c"), countNode));
>>>
>>>             JoinGroupNode countWhere = new JoinGroupNode();
>>>             countWhere.addChild(sliceQuery);
>>>
>>>             given.setProjection(countProjection);
>>>             given.setWhereClause(countWhere);
>>>         }
>>>
>>>         final QueryRoot expected = new QueryRoot(QueryType.SELECT);
>>>         {
>>>             final SubqueryRoot selectQuery = new SubqueryRoot(QueryType.SELECT);
>>>             {
>>>                 final JoinGroupNode whereClause1 = new JoinGroupNode();
>>>                 final StatementPatternNode spoPattern = new StatementPatternNode(
>>>                         new VarNode("s"), new VarNode("p"), new VarNode("o"),
>>>                         null, Scope.DEFAULT_CONTEXTS);
>>>                 whereClause1.addChild(spoPattern);
>>>
>>>                 final ProjectionNode p = new ProjectionNode();
>>>                 p.addProjectionVar(new VarNode("s"));
>>>                 p.addProjectionVar(new VarNode("p"));
>>>                 p.addProjectionVar(new VarNode("o"));
>>>                 selectQuery.setProjection(p);
>>>                 selectQuery.setWhereClause(whereClause1);
>>>             }
>>>
>>>             final SubqueryRoot sliceQuery = new SubqueryRoot(QueryType.SELECT);
>>>             {
>>>                 final ProjectionNode p = new ProjectionNode();
>>>                 p.addProjectionVar(new VarNode("s"));
>>>                 p.addProjectionVar(new VarNode("p"));
>>>                 p.addProjectionVar(new VarNode("o"));
>>>
>>>                 sliceQuery.setProjection(p);
>>>
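>>>                 final JoinGroupNode whereClause = new JoinGroupNode();
>>>                 whereClause.addChild(selectQuery);
>>>                 // Attach the group; otherwise selectQuery is never nested in sliceQuery.
>>>                 sliceQuery.setWhereClause(whereClause);
>>>
>>>                 sliceQuery.setSlice(new SliceNode(0, 21));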
>>>             }
>>>
>>>             final FunctionNode countNode = new FunctionNode(
>>>                     FunctionRegistry.COUNT,
>>>                     Collections.EMPTY_MAP,
>>>                     new VarNode("*"));
>>>
>>>             final ProjectionNode countProjection = new ProjectionNode();
>>>             countProjection.addProjectionExpression(
>>>                     new AssignmentNode(new VarNode("c"), countNode));
>>>
>>>             JoinGroupNode countWhere = new JoinGroupNode();
>>>             countWhere.addChild(sliceQuery);
>>>
>>>             expected.setProjection(countProjection);
>>>             expected.setWhereClause(countWhere);
>>>         }
>>>
>>>         final IASTOptimizer rewriter = new ASTWildcardProjectionOptimizer();
>>>
>>>         final IQueryNode actual = rewriter.optimize(null /* AST2BOpContext */,
>>>                 given /* queryNode */, bsets);
>>>
>>>         assertSameAST(expected, actual);
>>>
>>>     }
>>>
>>> however I am having some problems running the tests locally so I don't
>>> know if it accurately models the situation.
>>>
>>> Thanks
>>>
>>>
>>>
>>> On Mon, Apr 20, 2015 at 9:05 PM, Bryan Thompson <br...@sy...>
>>> wrote:
>>>
>>>> Lee,
>>>>
>>>> I've updated the ticket with the code changes and the test changes.
>>>> Please try this out and let me know if you have any problems.
>>>>
>>>> Thanks,
>>>> Bryan
>>>>
>>>> ----
>>>> Bryan Thompson
>>>> Chief Scientist & Founder
>>>> SYSTAP, LLC
>>>> 4501 Tower Road
>>>> Greensboro, NC 27410
>>>> br...@sy...
>>>> http://blazegraph.com
>>>> http://blog.bigdata.com <http://bigdata.com>
>>>> http://mapgraph.io
>>>>
>>>> Blazegraph™ <http://www.blazegraph.com/> is our ultra high-performance
>>>> graph database that supports both RDF/SPARQL and Tinkerpop/Blueprints
>>>> APIs.  MapGraph™ <http://www.systap.com/mapgraph> is our disruptive
>>>> new technology to use GPUs to accelerate data-parallel graph analytics.
>>>>
>>>> CONFIDENTIALITY NOTICE:  This email and its contents and attachments
>>>> are for the sole use of the intended recipient(s) and are confidential or
>>>> proprietary to SYSTAP. Any unauthorized review, use, disclosure,
>>>> dissemination or copying of this email or its contents or attachments is
>>>> prohibited. If you have received this communication in error, please notify
>>>> the sender by reply email and permanently delete all copies of the email
>>>> and its contents and attachments.
>>>>
>>>> On Mon, Apr 20, 2015 at 1:20 PM, Lee Kitching <le...@sw...> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We are currently evaluating Blazegraph as our RDF database and
>>>>> have run into the issue described at http://trac.bigdata.com/ticket/757.
>>>>> The below query causes the AssertionError to be thrown:
>>>>>
>>>>> SELECT (COUNT(*) as ?c) {
>>>>>   SELECT ?uri ?graph where {
>>>>>           {
>>>>>             SELECT * WHERE {
>>>>>               GRAPH ?graph {
>>>>>                 ?uri a <http://object> .
>>>>>                 ?uri <http://purl.org/dc/terms/title> ?title .
>>>>>               }
>>>>>               MINUS {
>>>>>                 ?uri a <http://other>
>>>>>               }
>>>>>             }
>>>>>             ORDER BY ?title
>>>>>           }
>>>>>         }
>>>>> }
>>>>>
>>>>> Some debugging shows that the error is caused by the
>>>>> ASTWildcardProjectionOptimizer failing to recurse into the subqueries to
>>>>> rewrite the * projection. This recursion is implemented by the
>>>>> BOpUtility.postOrderIterator(BOp) method, which uses the argIterator to
>>>>> find child operators and therefore only visits the children of nodes
>>>>> with an arity > 0.
>>>>>
>>>>> The root query node for the above query has an empty 'args' collection,
>>>>> and all the components of the top-level query are stored in the
>>>>> annotations map. It looks like, for query nodes, the iterator should
>>>>> search through the annotations rather than the args.
>>>>>
>>>>> As there are many implementations of the BOp interface, changing the
>>>>> postOrderIterator2(BOp) method seems unlikely to be the correct fix.
>>>>> Either the AST query nodes should override arity() to return the size
>>>>> of the annotations map, or the ASTWildcardProjectionOptimizer should use
>>>>> its own iterator over the nodes of the query. The latter would be the
>>>>> least invasive change, but I am not familiar enough with the codebase to
>>>>> know which fix is correct.
>>>>>
>>>>> Any help in resolving the issue would be appreciated.
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Bigdata-developers mailing list
>>>>> Big...@li...
>>>>> https://lists.sourceforge.net/lists/listinfo/bigdata-developers
>>>>>
>>>>>
>>>>
>>>
>>
>
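
The traversal problem described in the report above can be illustrated with a small, self-contained Java sketch. It deliberately does not use the bigdata BOp/BOpUtility API: Node and Traversal are hypothetical stand-ins, and the sketch only assumes, as the report says, that a query node keeps its components in an annotations map while its args list is empty. A post-order walk over args alone never reaches the nested subquery; a walk that also descends into annotation values does. This is not the fix that was committed for the ticket.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified node type (NOT the bigdata BOp interface).
// Children may be stored positionally (args) or under named annotations.
final class Node {
    final String name;
    final List<Node> args = new ArrayList<>();
    final Map<String, Object> annotations = new LinkedHashMap<>();

    Node(String name) { this.name = name; }

    Node arg(Node child) { args.add(child); return this; }

    Node annotate(String key, Object value) { annotations.put(key, value); return this; }
}

final class Traversal {
    private Traversal() {}

    /** Post-order walk that only follows args (the behaviour described in the report). */
    static List<String> postOrderArgsOnly(Node root) {
        List<String> out = new ArrayList<>();
        visitArgs(root, out);
        return out;
    }

    private static void visitArgs(Node n, List<String> out) {
        for (Node c : n.args) visitArgs(c, out);
        out.add(n.name);
    }

    /** Post-order walk that also descends into annotation values that are nodes. */
    static List<String> postOrderWithAnnotations(Node root) {
        List<String> out = new ArrayList<>();
        visitAll(root, out);
        return out;
    }

    private static void visitAll(Node n, List<String> out) {
        for (Node c : n.args) visitAll(c, out);
        for (Object v : n.annotations.values()) {
            if (v instanceof Node) visitAll((Node) v, out); // skip non-node annotations
        }
        out.add(n.name);
    }
}
```

For a root whose only content is a "whereClause" annotation holding a group that contains a subquery, the args-only walk visits just the root, while the second walk visits the subquery, the group and then the root.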

Re: [Bigdata-developers] java 8 support

From: Bryan T. <br...@sy...> - 2015-04-24 17:11:33

We have not been using it ourselves.  No known issues.  The code base is
still at Java 7 compatibility.  We have been discussing when to move the
code base to Java 8.  Personally, I think that this will happen in the 2.0
release.

On Fri, Apr 24, 2015 at 12:44 PM, Jeremy J Carroll <jj...@sy...> wrote:

> Is java 8 supported?
> Any issues?
>
> Jeremy

[Bigdata-developers] java 8 support

From: Jeremy J C. <jj...@sy...> - 2015-04-24 17:08:47

Is java 8 supported?
Any issues?

Jeremy

Re: [Bigdata-developers] Current Revision of Blazegraph: Journal consumes extremely much disk space

From: Bryan T. <br...@sy...> - 2015-04-24 16:44:43

Martyn and I discussed this in some depth today.  We've reopened the ticket
to:

a. gain more understanding of the interaction of the small slot
optimization and group commit.
b. verify correct reporting by the allocators in dumpJournal.
c. modify the small slots optimization allocator policy to make it less
susceptible to mis-configuration.

In the data as loaded, the OSP index was 66% blob slots (greater than 8k).
For the small slot optimization to be effective the O(C)SP index should
target a page size of 64-256 bytes.

(c) should minimize or remove the negative impact of the small slot
optimization in such cases.
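
As a hedged illustration of the configuration advice in this thread (use the small slot optimization only together with a reduced branching factor on the O(C)SP index), the Java sketch below assembles such properties with java.util.Properties. The smallSlotType=1024 value is the one from the namespace properties quoted in this thread. The per-index override key pattern (com.bigdata.namespace.<namespace>.spo.<INDEX>.com.bigdata.btree.BTree.branchingFactor) is an assumption, modelled on the com.bigdata.namespace.kb.spo... line in those properties; check it against the Blazegraph documentation before relying on it. No particular branching factor is implied: the right value is whatever gives that index pages of roughly 64-256 bytes, which the dumpJournal&dumpPages output can confirm.

```java
import java.util.Properties;

// Sketch: properties for trying the small slot optimization together with a
// reduced branching factor on the O(C)SP index only.
final class SmallSlotConfig {
    private SmallSlotConfig() {}

    // Key and value taken from the namespace properties quoted in this thread.
    static final String SMALL_SLOT_TYPE = "com.bigdata.rwstore.RWStore.smallSlotType";

    // ASSUMPTION: per-index override key pattern, not confirmed by this thread.
    static String branchingFactorKey(String namespace, String index) {
        return "com.bigdata.namespace." + namespace + ".spo." + index
                + ".com.bigdata.btree.BTree.branchingFactor";
    }

    /** Builds the two entries; the branching factor is supplied by the caller. */
    static Properties build(String namespace, boolean quads, int ospBranchingFactor) {
        Properties p = new Properties();
        p.setProperty(SMALL_SLOT_TYPE, "1024");
        // OSP in triples mode corresponds to OCSP in quads mode ("O(C)SP").
        String index = quads ? "OCSP" : "OSP";
        p.setProperty(branchingFactorKey(namespace, index), Integer.toString(ospBranchingFactor));
        return p;
    }
}
```

The resulting entries would be merged into the namespace or RWStore properties alongside the existing settings.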

Thanks,
Bryan



On Fri, Apr 24, 2015 at 8:35 AM, Martyn Cutcher <ma...@sy...> wrote:

>  I don't see how the small slot optimisation can result in more waste with
> larger allocators.
>
> It is simply a mechanism to avoid rapid re-allocation of the small slot
> allocators to attempt to improve write elision on recycled slots.
>
> In the latest Allocator dump, there are a lot of 64 byte allocators.
> Unlike the larger allocators (128 and greater) a large proportion of the 64
> byte slots will be used for long literal values (note that the mean
> allocation is only 27 bytes).
>
> Counterintuitively, there may well be a case for excluding the 64 byte
> allocators from the "small slot optimisation".  So "small slot" NOT
> "smallest slot" ;-)
>
> - Martyn
>
> On 24/04/2015 00:18, Bryan Thompson wrote:
>
> I've updated the ticket.  I've also copied my main conclusions inline below.
>
> I think that the issue here is the use of the small slot optimization
> without proper configuration of the indices in order to target small
> allocation slots for at least one of the indices.  The small slot
> optimization changes the allocation policy in two ways.
>
> 1. It has a strong preference to use only empty 8k pages for small
> allocations (as configured, for allocations less than 1k).  This allows us
> to coalesce writes by combining them onto the same page.
> 2. It has a preference to use allocation blocks that are relatively empty
> for small slots.
>
> As a consequence, the small slot optimization MAY recruit more allocators
> in order to have allocators for small slots that have good sparsity.
>
> The main goal of the small slot optimization is to optimize for indices
> that have very scattered IO patterns.  The indices that exhibit this the
> most are the OSP and OCSP indices.  In many cases even batched updates will
> modify no more than a single tuple per page on this index.  However, in
> your configuration (and in mine when I enabled the small slot optimization
> without adjusting the branching factors), the O(C)SP indices were not
> created with a small branching factor, so the small slot allocation could
> not be put to any good effect. However it did have a negative effect -- by
> recruiting more allocators.  If you want to use the small slot
> optimization, make sure that at least the O(C)SP index has a relatively
> small branching factor giving an effective slot size of 256 bytes or less
> on average.
>
> I suggest that you retest w/o the small slot optimization and with group
> commit still enabled.
>
> I've asked Martyn to look over the allocators from the small slot
> optimization run and think about whether we can make this policy a little
> more adaptive when the branching factors are not really tuned properly and
> too many allocators with too much wasted space are allocated as a result.
> Basically, how to avoid file bloat from misconfiguration.
>
> Thanks,
> Bryan
>
> On Thu, Apr 23, 2015 at 9:36 AM, Andreas Kahl <ka...@bs...> wrote:
>
>
> Ok, I can redo the test with smallSlots + groupCommit enabled, and run
> http://localhost:8080/bigdata/status?dumpJournal&dumpPages after some
> minutes. (I cannot run it on the fully loaded dataset because my disk is
> not large enough for the resulting Journal.)
>
> By the way: Please find attached my custom Vocabulary classes. They are
> just one of my many attempts to improve IO performance on rotating disks.
>
> Best Regards
> Andreas
>
>
> Bryan Thompson <br...@sy...> 23.04.15 15.31 Uhr >>>
>
> I just noticed that you have the full text index enabled as well.  I have
> not been enabling that.
>
> I would like to see the output from this command on the fully loaded data
> sets.
> http://localhost:8080/bigdata/status?dumpJournal&dumpPages
>
> This will let us see if any specific index is taking up a very large number
> of pages.  It will also tell us the distribution over the page sizes for
> each index.
>
> Bryan
>
> On Thu, Apr 23, 2015 at 8:54 AM, Andreas Kahl <ka...@bs...> wrote:
>
>
>  Bryan,
>
> in the meantime, I could successfully load the file into an 18GB journal
> after disabling groupCommit (I simply commented out the line in
> RWStore.properties).
> I can try again with groupCommit enabled, but smallSlotOptimization
> disabled.
>
> Best Regards
> Andreas
>
>
> Bryan Thompson <br...@sy...> 23.04.2015 13:24 >>>
>
>   Andreas,
>
> I was not able to replicate your result.  Unfortunately I navigated away
> from the browser page in which I had submitted the request, so it loaded
> all the data but failed to commit.  However, the resulting file is only
> 16GB.
>
> I will redo this run and verify that the journal after the commit has
> this same size on the disk.
>
> I was only assuming that this was related to group commit because of your
> original message.  Perhaps I misinterpreted your message, and this is
> simply about 1.5.1 (with group commit) vs 1.4.0.
>
> Perhaps the issue is related to the small slot optimization?  Maybe in
> combination with group commit?
>
> *> com.bigdata.rwstore.RWStore.smallSlotType=1024*
>
> I could not replicate your properties exactly because you are using a
> non-standard vocabulary class.  Therefore I simply deleted the default
> namespace (in quads mode) and recreated it with the defaults in triples
> mode.  The small slot optimization and other parameters were not enabled
> in my run.
>
> Perhaps you could try to replicate my experience and I will enable the
> small slots optimization?
>
> Thanks,
> Bryan
>
> On Thu, Apr 23, 2015 at 1:51 AM, Andreas Kahl <ka...@bs...> wrote:
>
>
>  Bryan & Martyn,
>
> Thank you very much for investigating the issue. I assume from the ticket
> that the error will vanish if I disable groupCommit. I will do so in the
> meantime.
>
> Although there is already extensive information in Bryan's ticket, please
> find attached my logs and DumpJournal outputs:
> - dumpJournal.html contains a dump from the 67GB journal after Blazegraph
> ran into "No space left on device"
> - dumpJournalWithTraceEnabled.html is the same dump for a running query
> when the journal was at about 14GB
> - queryStatus.html is just the status page showing my query
> - catalina.out.gz contains the trace outputs from starting Tomcat until I
> killed the curl running the SPARQL Update by Ctrl-C
> - loadGnd.log.gz is Blazegraph's output when loading the data
>
> Best Regards
> Andreas
>
>
>
>
> Bryan Thompson <br...@sy...> 22.04.15 20.56 Uhr >>>
>
>   See http://trac.bigdata.com/ticket/1206.  This is still in the
> investigation stage.
>
> Thanks,
> Bryan
>
> On Wed, Apr 22, 2015 at 5:37 AM, Andreas Kahl <ka...@bs...> wrote:
>
>
>  Hello everyone,
>
> I just updated to the current revision (f4c63e5) of Blazegraph from Git
> and tried to load a dataset into the updated webapp. With Bigdata 1.4.0
> this resulted in a journal of ~18GB. Now the process was cancelled because
> the disk was full: the journal grew beyond 50GB for the same file with the
> same settings. The only difference was that I activated GroupCommit.
>
> The dataset can be downloaded here:
> http://datendienst.dnb.de/cgi-bin/mabit.pl?cmd=fetch&userID=opendata&pass=opendata&mabheft=GND.rdf.gz
>
> Please find the settings used to load the file below.
>
> Do I have a misconfiguration, or is there a bug eating all the disk
> space?
>
> Best regards
> Andreas
>
> Namespace-Properties:
> curl -H "Accept: text/plain" http://localhost:8080/bigdata/namespace/gnd/properties
> #Wed Apr 22 11:35:31 CEST 2015
> com.bigdata.namespace.kb.spo.com.bigdata.btree.BTree.branchingFactor=700
> com.bigdata.relation.container=gnd
> com.bigdata.rwstore.RWStore.smallSlotType=1024
> com.bigdata.journal.AbstractJournal.bufferMode=DiskRW
> com.bigdata.journal.AbstractJournal.file=/var/lib/bigdata/bigdata.jnl
> com.bigdata.rdf.store.AbstractTripleStore.vocabu.textIndex=true
> com.bigdata.btree.BTree.branchingFactor=700
> com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.NoAxioms
> com.bigdata.rdf.sail.isolatableIndices=false
> com.bigdata.service.AbstractTransactionService.minReleaseAge=1
> com.bigdata.rdf.sail.bufferCapacity=2000
> com.bigdata.rdf.sail.truthMaintenance=false
> com.bigdata.rdf.sail.namespace=gnd
> com.bigdata.relation.class=com.bigdata.rdf.store.LocalTripleStore
> com.bigdata.rdf.store.AbstractTripleStore.quads=false
> com.bigdata.journal.AbstractJournal.writeCacheBufferCount=500
> com.bigdata.search.FullTextIndex.fieldsEnabled=false
> com.bigdata.relation.namespace=gndity=10000
> com.bigdata.rdf.sail.BigdataSail.bufferCapacity=2000
> com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false

Re: [Bigdata-developers] Bigdata & Deletions

From: Bryan T. <br...@sy...> - 2015-04-24 16:35:32

Rick,

I would recommend that you model your problem a bit differently.  I will
give some suggestions as to how you might do this, but first let me explain
how we handle such moves and storage reclamation.

- Blazegraph does a COPY + DELETE for SPARQL UPDATE "MOVE".  You might be
able to hack this. I will outline how below.

- Blazegraph recycles storage.  This is documented in some depth on the
wiki, but the basic concept is that allocation slots are recycled once they
no longer have data that is visible from a retained commit point.
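
The recycling rule in the second point can be pictured with a toy Java model. It is not the RWStore implementation: DeferredFrees and its methods are hypothetical, and commit points are reduced to plain long timestamps. A slot released at commit point C still holds data visible from commit points before C, so in this model it only becomes reusable once the earliest retained commit point is at or after C.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy model of deferred slot recycling (hypothetical, not RWStore code).
final class DeferredFrees {
    private static final class Pending {
        final int slot;
        final long freedAt; // commit point at which the slot's data was superseded
        Pending(int slot, long freedAt) { this.slot = slot; this.freedAt = freedAt; }
    }

    private final List<Pending> pending = new ArrayList<>();

    /** Records that a slot's data stops being current at the given commit point. */
    void free(int slot, long commitTime) { pending.add(new Pending(slot, commitTime)); }

    /**
     * Returns the slots no retained commit point can still see, i.e. those freed
     * at or before the earliest retained commit point, and forgets them.
     */
    List<Integer> release(long earliestRetainedCommit) {
        List<Integer> recycled = new ArrayList<>();
        for (Iterator<Pending> it = pending.iterator(); it.hasNext();) {
            Pending p = it.next();
            if (p.freedAt <= earliestRetainedCommit) {
                recycled.add(p.slot);
                it.remove();
            }
        }
        return recycled;
    }
}
```

Under this model, how quickly space is reused depends on how long older commit points are retained; deletions do not require taking the store offline.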

Let me suggest some ways in which you might achieve your goals without a
performance penalty.  As I see it, you are basically trying to change the
state associated with a named graph as you move it along in some workflow.
The first two options would require you to manage metadata (in yet another
graph) mapping workflow state URIs onto fixed URIs associated with a named
graph.  When you change the workflow state, you are just changing the
mapping between the external URI and the fixed URI naming the graph
internally.  Either of these approaches would give you constant time
"renames".

1. Use named graphs.  But, per above, do the rename outside of the quads
store.  You can either use a special named graph to hold this mapping or you
can have yet another graph in the database that has this mapping.  We even
have support for "virtual graphs" that might let you do this out of the
box.  See http://wiki.blazegraph.com/wiki/index.php/VirtualGraphs

2. Using multiple triple stores.  SPARQL "quads" (named graphs) provides
the ability to transparently query across the named graphs either
extracting their identifiers (using the GRAPH keyword for a named graph
access path) or collapsing duplicate statements onto distinct statements
(for a default graph access path).  If each of these named graphs is really
just being used as its own triple store, then you can have many different
triple stores in a single blazegraph instance.  Just put each one into its
own namespace.
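
The indirection behind options 1 and 2 amounts to keeping a small mapping from workflow-facing URIs to fixed internal names. The Java sketch below models only that bookkeeping with an in-memory map; in practice the mapping would live in the database (for example in a dedicated metadata graph, as suggested in option 1), and GraphAliases, its methods and the urn:uuid naming scheme are hypothetical. A rename rewrites one mapping entry and none of the statements in the graph, which is why it is constant time.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Toy model of the mapping approach: external (workflow) URIs point at fixed
// internal graph URIs, so a rename never touches the graph's statements.
final class GraphAliases {
    private final Map<String, String> externalToInternal = new HashMap<>();

    /** Registers a new graph under an external name and returns its fixed internal URI. */
    String create(String externalUri) {
        if (externalToInternal.containsKey(externalUri)) {
            throw new IllegalArgumentException("already mapped: " + externalUri);
        }
        String internal = "urn:uuid:" + UUID.randomUUID(); // hypothetical naming scheme
        externalToInternal.put(externalUri, internal);
        return internal;
    }

    /** Constant-time rename: only the mapping entry moves. */
    void rename(String fromUri, String toUri) {
        if (externalToInternal.containsKey(toUri)) {
            throw new IllegalArgumentException("target already mapped: " + toUri);
        }
        String internal = externalToInternal.remove(fromUri);
        if (internal == null) {
            throw new IllegalArgumentException("unknown graph: " + fromUri);
        }
        externalToInternal.put(toUri, internal);
    }

    /** Returns the internal graph URI to use in GRAPH clauses, or null if unknown. */
    String resolve(String externalUri) { return externalToInternal.get(externalUri); }
}
```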

The 3rd approach is more in the spirit of hacking the rename.

3. Hacking the rename.  Ok, you effectively want to change the name of the
graph.  Internally each statement in a graph has an IV (Internal Value) in
the 4th position of the statement tuple that is the graph identifier.  If
you need to modify those IVs, then you are going to touch a lot of data.
This is not a constant time operation.  The alternative is to hack the dictionary.
You would *replace* the entry in the TERM2ID dictionary (mapping the URI
onto an IV) with a different entry mapping the new URI onto the same IV.
You would also update the reverse lookup (in ID2TERM).  The old URI will
"disappear".  The new URI will be mapped to the data associated with the
old URI.  This would be a constant time operation.  However, it WILL NOT
work if the new URI is already defined since it would then orphan any data
associated with the IV for the new URI.  If your URIs are always new when
you do this "rename" then you could use this mechanism.  We could not make
this a general purpose rename.  We could perhaps do this rename if we could
prove that the new URI was not pre-existing through some clever code.
Either we or you could implement this as a special operator for your
application.
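
The dictionary hack in option 3 can be modelled with two maps standing in for TERM2ID and ID2TERM. This is a toy sketch, not Blazegraph code: ToyLexicon and its methods are hypothetical, and the real lexicon indices have considerably more structure. It shows the two properties described above: the rename is constant time because statements keep referring to the same IV, and it is refused when the new URI already has an IV, since that IV's data would otherwise be orphaned.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of renaming via the dictionary (hypothetical, NOT Blazegraph code).
final class ToyLexicon {
    private final Map<String, Long> term2id = new HashMap<>(); // stands in for TERM2ID
    private final Map<Long, String> id2term = new HashMap<>(); // stands in for ID2TERM
    private long nextId = 1;

    /** Returns the existing IV for a term, or assigns a new one. */
    long addTerm(String term) {
        Long iv = term2id.get(term);
        if (iv != null) return iv;
        long id = nextId++;
        term2id.put(term, id);
        id2term.put(id, term);
        return id;
    }

    /** Re-points newUri at oldUri's IV; statements using that IV are untouched. */
    void rename(String oldUri, String newUri) {
        if (term2id.containsKey(newUri)) {
            throw new IllegalStateException("new URI already defined: " + newUri);
        }
        Long iv = term2id.remove(oldUri);
        if (iv == null) {
            throw new IllegalArgumentException("unknown URI: " + oldUri);
        }
        term2id.put(newUri, iv); // forward mapping: new URI -> same IV
        id2term.put(iv, newUri); // reverse lookup updated as well
    }

    Long lookup(String term) { return term2id.get(term); }

    String resolve(long iv) { return id2term.get(iv); }
}
```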

Let me know if you want to setup a telcon to discuss any of this.

Thanks,
Bryan

On Fri, Apr 24, 2015 at 7:51 AM, Rick Moynihan <ri...@sw...> wrote:

> Hi all,
>
> We've recently been evaluating quad-stores, and in particular are looking
> for better storage layers, and Blazegraph looks like a promising option.
>
> We have a linked data management system, which has several management
> workflows whereby:
>
> 1. large named graphs can be moved around (renamed via a SPARQL Update
> MOVE command).
>
> 2. large named graphs can be inserted, reviewed, deleted (repaired
> offline) and reinserted again before finally being approved.
>
> With this workflow there are two problems we have been finding with some
> of the other quad stores:
>
> The first is that renames are often implemented as a copy/delete; which
> results in a slow linear-time (or worse) operation.  Ideally renaming
> graphs would be constant time.
>
> The second problem we have been encountering (which the first can
> compound) is that some stores don't free storage on deletions, and don't
> even have a mechanism for expunging deletions without taking the database
> offline.
>
> I'm curious as to what Blazegraph's behaviour is in these two
> circumstances, and whether or not the different journals have different
> behaviours.
>
> Many thanks,
>
> R.

Re: [Bigdata-developers] Current Revision of Blazegraph: Journal consumes extremely much disk space

From: Martyn C. <ma...@sy...> - 2015-04-24 12:35:30

I don't see how the small slot optimisation can result in more waste 
with larger allocators.

It is simply a mechanism to avoid rapid re-allocation of the small slot 
allocators to attempt to improve write elision on recycled slots.

In the latest Allocator dump, there are a lot of 64 byte allocators.  
Unlike the larger allocators (128 and greater) a large proportion of the 
64 byte slots will be used for long literal values (note that the mean 
allocation is only 27 bytes).

Counterintuitively, there may well be a case for excluding the 64 byte 
allocators from the "small slot optimisation".  So "small slot" NOT 
"smallest slot" ;-)

- Martyn

On 24/04/2015 00:18, Bryan Thompson wrote:
> I've updated the ticket.  I've also copied my main conclusions inline below.
>
> I think that the issue here is the use of the small slot optimization
> without proper configuration of the indices in order to target small
> allocation slots for at least one of the indices.  The small slot
> optimization changes the allocation policy in two ways.
>
> 1. It has a strong preference to use only empty 8k pages for small
> allocations (as configured, for allocations less than 1k).  This allows us
> to coalesce writes by combining them onto the same page.
> 2. It has a preference to use allocation blocks that are relatively empty
> for small slots.
>
> As a consequence, the small slot optimization MAY recruit more allocators
> in order to have allocators for small slots that have good sparsity.
>
> The main goal of the small slot optimization is to optimize for indices
> that have very scattered IO patterns.  The indices that exhibit this the
> most are the OSP and OCSP indices.  In many cases even batched updates will
> modify no more than a single tuple per page on this index.  However, in
> your configuration (and in mine when I enabled the small slot optimization
> without adjusting the branching factors), the O(C)SP indices were not
> created with a small branching factor, so the small slot allocation could
> not be put to any good effect. However it did have a negative effect -- by
> recruiting more allocators.  If you want to use the small slot
> optimization, make sure that at least the O(C)SP index has a relatively
> small branching factor giving an effective slot size of 256 bytes or less
> on average.
>
> I suggest that you retest w/o the small slot optimization and with group
> commit still enabled.
>
> I've asked Martyn to look over the allocators from the small slot
> optimization run and think about whether we can make this policy a little
> more adaptive when the branching factors are not really tuned properly and
> too many allocators with too much wasted space are allocated as a result.
> Basically, how to avoid file bloat from misconfiguration.
>
> Thanks,
> Bryan
>
> On Thu, Apr 23, 2015 at 9:36 AM, Andreas Kahl <ka...@bs...> wrote:
>
>> Ok, I can redo the test with smallSlots + groupCommit enabled, and run
>> http://localhost:8080/bigdata/status?dumpJournal&dumpPages after some
>> minutes. (I cannot run it on the fully loaded dataset because my disk is
>> not sufficient for the resulting Journal).
>>
>> By the way: Please find attached my custom Vocabulary classes. They are
>> just one of my many attempts to improve IO performance on rotating disks.
>>
>> Best Regards
>> Andreas
>>
>>>>> Bryan Thompson <br...@sy...> 23.04.15 15.31 Uhr >>>
>> I just noticed that you have the full text index enabled as well.  I have
>> not been enabling that.
>>
>> I would like to see the output from this command on the fully loaded data
>> sets.
>>
>> http://localhost:8080/bigdata/status?dumpJournal&dumpPages
>>
>> This will let us see if any specific index is taking up a very large number of
>> pages.  It will also tell us the distribution over the page sizes for each
>> index.
>>
>> Bryan
>>
>> On Thu, Apr 23, 2015 at 8:54 AM, Andreas Kahl <ka...@bs...>
>> wrote:
>>
>>> Bryan,
>>>
>>> in the meantime, I could successfully load the file into an 18GB journal
>>> after disabling groupCommit (I simply commented out the line in
>>> RWStore.properties).
>>> I can try again with groupCommit enabled, but smallSlotOptimization
>>> disabled.
>>>
>>> Best Regards
>>> Andreas
>>>
>>>>>> Bryan Thompson <br...@sy...> 23.04.2015 13:24 >>>
>>> Andreas,
>>>
>>> I was not able to replicate your result.  Unfortunately I navigated away
>>> from the browser page in which I had submitted the request, so it loaded
>>> all the data but failed to commit.  However, the resulting file is only
>>> 16GB.
>>>
>>> I will redo this run and verify that the journal after the commit has
>>> this same size on the disk.
>>>
>>> I was only assuming that this was related to group commit because of your
>>> original message.  Perhaps I misinterpreted your message. This is simply
>>> about 1.5.1 (with group commit) vs 1.4.0.
>>>
>>> Perhaps the issue is related to the small slot optimization?  Maybe in
>>> combination with group commit?
>>>
>>> *> com.bigdata.rwstore.RWStore.smallSlotType=1024*
>>>
>>> I could not replicate your properties exactly because you are using a
>>> non-standard vocabulary class.  Therefore I simply deleted the default
>>> namespace (in quads mode) and recreated it with the defaults in triples
>>> mode.  The small slot optimization and other parameters were not enabled
>>> in my run.
>>>
>>> Perhaps you could try to replicate my experience and I will enable the
>>> small slots optimization?
>>>
>>> Thanks,
>>> Bryan
>>>
>>> On Thu, Apr 23, 2015 at 1:51 AM, Andreas Kahl <ka...@bs...>
>>> wrote:
>>>
>>>> Bryan & Martyn,
>>>>
>>>> Thank you very much for investigating the issue. I assume from the
>>>> ticket that the error will vanish if I disable groupCommit. I will do so
>>>> in the meantime.
>>>>
>>>> Although there is already extensive information in Bryan's ticket,
>>>> please find attached my logs and DumpJournal outputs:
>>>> - dumpJournal.html contains a dump from the 67GB journal after Blazegraph
>>>> ran into "No space left on device"
>>>> - dumpJournalWithTraceEnabled.html is the same dump for a running query
>>>> when the journal was at about 14GB
>>>> - queryStatus.html is just the status page showing my query
>>>> - catalina.out.gz contains the trace outputs from starting Tomcat until I
>>>> killed the curl running the SPARQL Update with Ctrl-C
>>>> - loadGnd.log.gz is Blazegraph's output when loading the data
>>>>
>>>> Best Regards
>>>> Andreas
>>>>
>>>>
>>>>
>>>>>>> Bryan Thompson <br...@sy...> 22.04.15 20.56 Uhr >>>
>>>> See http://trac.bigdata.com/ticket/1206.  This is still in the
>>>> investigation stage.
>>>>
>>>> Thanks,
>>>> Bryan
>>>>
>>>> On Wed, Apr 22, 2015 at 5:37 AM, Andreas Kahl <ka...@bs...>
>>>> wrote:
>>>>
>>>>> Hello everyone,
>>>>>
>>>>> I recently updated to the current revision (f4c63e5) of Blazegraph from
>>>>> Git and tried to load a dataset into the updated Webapp. With Bigdata
>>>>> 1.4.0 this resulted in a journal of ~18GB. Now the process was cancelled
>>>>> because the disk was full: the journal grew beyond 50GB for the same file
>>>>> with the same settings.
>>>>> The only difference was that I activated GroupCommit.
>>>>>
>>>>> The dataset can be downloaded here:
>>>>> http://datendienst.dnb.de/cgi-bin/mabit.pl?cmd=fetch&userID=opendata&pass=opendata&mabheft=GND.rdf.gz
>>>>> Please find the settings used to load the file below.
>>>>>
>>>>> Do I have a misconfiguration, or is there a bug eating all the disk
>>>>> space?
>>>>>
>>>>> Best regards
>>>>> Andreas
>>>>>
>>>>> Namespace-Properties:
>>>>> curl -H "Accept: text/plain"
>>>>> http://localhost:8080/bigdata/namespace/gnd/properties
>>>>> #Wed Apr 22 11:35:31 CEST 2015
>>>>> com.bigdata.namespace.kb.spo.com.bigdata.btree.BTree.branchingFactor=700
>>>>> com.bigdata.relation.container=gnd
>>>>> com.bigdata.rwstore.RWStore.smallSlotType=1024
>>>>> com.bigdata.journal.AbstractJournal.bufferMode=DiskRW
>>>>> com.bigdata.journal.AbstractJournal.file=/var/lib/bigdata/bigdata.jnl
>>>>> com.bigdata.rdf.store.AbstractTripleStore.vocabu.textIndex=true
>>>>> com.bigdata.btree.BTree.branchingFactor=700
>>>>> com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.NoAxioms
>>>>> com.bigdata.rdf.sail.isolatableIndices=false
>>>>> com.bigdata.service.AbstractTransactionService.minReleaseAge=1
>>>>> com.bigdata.rdf.sail.bufferCapacity=2000
>>>>> com.bigdata.rdf.sail.truthMaintenance=false
>>>>> com.bigdata.rdf.sail.namespace=gnd
>>>>> com.bigdata.relation.class=com.bigdata.rdf.store.LocalTripleStore
>>>>> com.bigdata.rdf.store.AbstractTripleStore.quads=false
>>>>> com.bigdata.journal.AbstractJournal.writeCacheBufferCount=500
>>>>> com.bigdata.search.FullTextIndex.fieldsEnabled=false
>>>>> com.bigdata.relation.namespace=gndity=10000
>>>>> com.bigdata.rdf.sail.BigdataSail.bufferCapacity=2000
>>>>> com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false
>>>>>
>>>>> _______________________________________________
>>>>> Bigdata-developers mailing list
>>>>> Big...@li...
>>>>> https://lists.sourceforge.net/lists/listinfo/bigdata-developers
>>>>>
>>>>>
>>>>
>>
>

[Bigdata-developers] Bigdata & Deletions

From: Rick M. <ri...@sw...> - 2015-04-24 12:23:30

Hi all,

We've recently been evaluating quad-stores, and in particular are looking
for better storage layers, and Blazegraph looks like a promising option.

We have a linked data management system with several management
workflows whereby:

1. large named graphs can be moved around (renamed via a SPARQL Update MOVE
command).

2. large named graphs can be inserted, reviewed, deleted (repaired offline)
and reinserted again before finally being approved.

With these workflows, there are two problems we have been running into with
some of the other quad stores:

The first is that renames are often implemented as a copy/delete; which
results in a slow linear-time (or worse) operation.  Ideally renaming
graphs would be constant time.
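For context on why renames tend to be linear-time: the SPARQL 1.1 Update specification defines MOVE as equivalent to a DROP / INSERT ... WHERE / DROP sequence, so a store that executes it literally copies every quad of the source graph. The sketch below only builds the two request strings side by side; the graph IRIs are hypothetical examples, and nothing here describes how Blazegraph itself executes MOVE.

```java
/** Builds a SPARQL 1.1 MOVE request and its spec-defined copy/delete equivalent. */
final class MoveGraphSketch {

    /** The single MOVE operation a client would submit. */
    static String move(String from, String to) {
        return "MOVE GRAPH <" + from + "> TO GRAPH <" + to + ">";
    }

    /**
     * The sequence MOVE is defined to be equivalent to. Executed literally,
     * the INSERT ... WHERE step touches every quad in the source graph, so the
     * cost grows with the graph size instead of staying constant.
     */
    static String moveAsCopyAndDelete(String from, String to) {
        return "DROP SILENT GRAPH <" + to + ">;\n"
                + "INSERT { GRAPH <" + to + "> { ?s ?p ?o } }\n"
                + "WHERE { GRAPH <" + from + "> { ?s ?p ?o } };\n"
                + "DROP GRAPH <" + from + ">";
    }

    public static void main(String[] args) {
        // Hypothetical graph names for a review workflow like the one above.
        String from = "http://example.org/graph/draft";
        String to = "http://example.org/graph/approved";
        System.out.println(move(from, to));
        System.out.println(moveAsCopyAndDelete(from, to));
    }
}
```

Whether a given store short-circuits MOVE into a constant-time relabeling is an implementation choice the specification leaves open.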

The second problem we have been encountering (which the first can compound)
is that some stores don't free storage on deletions, and don't even have a
mechanism for expunging deletions without taking the database offline.

I'm curious as to what Blazegraph's behaviour is in these two
circumstances, and whether or not the different journals have different
behaviours.

Many thanks,

R.

Re: [Bigdata-developers] Current Revision of Blazegraph: Journal consumes extremely much disk space

From: Bryan T. <br...@sy...> - 2015-04-23 23:18:20

I've updated the ticket.  I've also copied my main conclusions inline below.

I think that the issue here is the use of the small slot optimization
without proper configuration of the indices in order to target small
allocation slots for at least one of the indices.  The small slot
optimization changes the allocation policy in two ways.

1. It has a strong preference to use only empty 8k pages for small
allocations (as configured, for allocations less than 1k).  This allows us
to coalesce writes by combining them onto the same page.
2. It has a preference to use allocation blocks that are relatively empty
for small slots.

As a consequence, the small slot optimization MAY recruit more allocators
in order to have allocators for small slots that have good sparsity.
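A toy model may help show why preferring empty 8k pages lets small writes coalesce. It is not RWStore's allocator: it only counts how many distinct 8k pages a batch of small writes dirties when the writes are packed onto fresh pages, versus when each lands in a free slot on a different, already partly full page (that "scattered" placement is a hypothetical stand-in).

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model only: counts distinct 8k pages dirtied by a batch of small writes. */
final class SmallSlotCoalescingSketch {

    static final int PAGE_SIZE = 8 * 1024; // 8k pages, as described above

    /** Writes packed back-to-back onto empty pages: many slots share one page. */
    static int pagesDirtiedPacked(int writes, int slotSize) {
        int slotsPerPage = PAGE_SIZE / slotSize;
        return (writes + slotsPerPage - 1) / slotsPerPage; // ceiling division
    }

    /**
     * Writes placed in free slots scattered over existing pages. The stride is
     * a prime that does not divide existingPages, so consecutive writes land
     * on different pages (up to existingPages of them).
     */
    static int pagesDirtiedScattered(int writes, int existingPages) {
        final int stride = 7919;
        Set<Integer> dirty = new HashSet<>();
        for (int i = 0; i < writes; i++) {
            dirty.add((int) ((long) i * stride % existingPages));
        }
        return dirty.size();
    }

    public static void main(String[] args) {
        // 128 small records of 256 bytes each.
        System.out.println("packed:    " + pagesDirtiedPacked(128, 256));
        System.out.println("scattered: " + pagesDirtiedScattered(128, 10_000));
    }
}
```

With these made-up numbers the packed policy dirties 4 pages (32 slots of 256 bytes per 8k page) while the scattered one dirties 128.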

The main goal of the small slot optimization is to optimize for indices
that have very scattered IO patterns.  The indices that exhibit this the
most are the OSP and OCSP indices.  In many cases even batched updates will
modify no more than a single tuple per page on these indices.  However, in
your configuration (and in mine when I enabled the small slot optimization
without adjusting the branching factors), the O(C)SP indices were not
created with a small branching factor, so the small slot allocation could
not be put to any good effect. However it did have a negative effect -- by
recruiting more allocators.  If you want to use the small slot
optimization, make sure that at least the O(C)SP index has a relatively
small branching factor giving an effective slot size of 256 bytes or less
on average.
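As a hedged illustration of that advice, the sketch builds configuration keys in the same shape as the `com.bigdata.namespace.kb.spo.com.bigdata.btree.BTree.branchingFactor` property quoted in this thread. The per-index form with an `OSP` (or `OCSP`) segment is an assumption to check against the Blazegraph documentation, the branching factor value is a placeholder, and `averageSlotBytes` simply divides total bytes by page count for numbers you would read off a page dump.

```java
import java.util.Properties;

/** Sketch: build per-namespace B+Tree branching factor overrides. */
final class OspBranchingFactorSketch {

    static final String BF = "com.bigdata.btree.BTree.branchingFactor";

    /** Same shape as the SPO-relation property quoted in this thread. */
    static String spoRelationKey(String namespace) {
        return "com.bigdata.namespace." + namespace + ".spo." + BF;
    }

    /** ASSUMPTION: per-index override, e.g. "OSP" (triples) or "OCSP" (quads). */
    static String spoIndexKey(String namespace, String indexName) {
        return "com.bigdata.namespace." + namespace + ".spo." + indexName + "." + BF;
    }

    /** Average bytes per page, from totals read off a page dump (hypothetical). */
    static double averageSlotBytes(long totalBytes, long pages) {
        if (pages <= 0) {
            throw new IllegalArgumentException("pages must be positive");
        }
        return (double) totalBytes / pages;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("com.bigdata.rwstore.RWStore.smallSlotType", "1024");
        // Placeholder value: tune it so OSP pages average 256 bytes or less.
        props.setProperty(spoIndexKey("gnd", "OSP"), "64");
        props.list(System.out);

        double avg = averageSlotBytes(2_560_000L, 10_000L); // hypothetical totals
        System.out.println("avg bytes/page = " + avg + ", target met: " + (avg <= 256));
    }
}
```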

I suggest that you retest w/o the small slot optimization and with group
commit still enabled.

I've asked Martyn to look over the allocators from the small slot
optimization run and think about whether we can make this policy a little
more adaptive when the branching factors are not really tuned properly and
too many allocators with too much wasted space are allocated as a result.
Basically, how to avoid file bloat from misconfiguration.

Thanks,
Bryan

On Thu, Apr 23, 2015 at 9:36 AM, Andreas Kahl <ka...@bs...> wrote:

> Ok, I can redo the test with smallSlots + groupCommit enabled, and run
> http://localhost:8080/bigdata/status?dumpJournal&dumpPages after some
> minutes. (I cannot run it on the fully loaded dataset because my disk is
> not sufficient for the resulting Journal).
>
> By the way: Please find attached my custom Vocabulary classes. They are
> just one of my many attempts to improve IO performance on rotating disks.
>
> Best Regards
> Andreas
>
> >>> Bryan Thompson <br...@sy...> 23.04.15 15.31 Uhr >>>
> I just noticed that you have the full text index enabled as well.  I have
> not been enabling that.
>
> I would like to see the output from this command on the fully loaded data
> sets.
>
> http://localhost:8080/bigdata/status?dumpJournal&dumpPages
>
> This will let us see if any specific index is taking up a very large number of
> pages.  It will also tell us the distribution over the page sizes for each
> index.
>
> Bryan
>
> On Thu, Apr 23, 2015 at 8:54 AM, Andreas Kahl <ka...@bs...>
> wrote:
>
> > Bryan,
> >
> > in the meantime, I could successfully load the file into an 18GB journal
> > after disabling groupCommit (I simply commented out the line in
> > RWStore.properties).
> > I can try again with groupCommit enabled, but smallSlotOptimization
> > disabled.
> >
> > Best Regards
> > Andreas
> >
> > >>> Bryan Thompson <br...@sy...> 23.04.2015 13:24 >>>
> > Andreas,
> >
> > I was not able to replicate your result.  Unfortunately I navigated away
> > from the browser page in which I had submitted the request, so it loaded
> > all the data but failed to commit.  However, the resulting file is only
> > 16GB.
> >
> > I will redo this run and verify that the journal after the commit has
> > this same size on the disk.
> >
> > I was only assuming that this was related to group commit because of your
> > original message.  Perhaps I misinterpreted your message. This is simply
> > about 1.5.1 (with group commit) vs 1.4.0.
> >
> > Perhaps the issue is related to the small slot optimization?  Maybe in
> > combination with group commit?
> >
> > *> com.bigdata.rwstore.RWStore.smallSlotType=1024*
> >
> > I could not replicate your properties exactly because you are using a
> > non-standard vocabulary class.  Therefore I simply deleted the default
> > namespace (in quads mode) and recreated it with the defaults in triples
> > mode.  The small slot optimization and other parameters were not enabled
> > in my run.
> >
> > Perhaps you could try to replicate my experience and I will enable the
> > small slots optimization?
> >
> > Thanks,
> > Bryan
> >
> > On Thu, Apr 23, 2015 at 1:51 AM, Andreas Kahl <ka...@bs...>
> > wrote:
> >
> > > Bryan & Martyn,
> > >
> > > Thank you very much for investigating the issue. I assume from the
> > > ticket that the error will vanish if I disable groupCommit. I will do so
> > > in the meantime.
> > >
> > > Although there is already extensive information in Bryan's ticket,
> > > please find attached my logs and DumpJournal outputs:
> > > - dumpJournal.html contains a dump from the 67GB journal after Blazegraph
> > > ran into "No space left on device"
> > > - dumpJournalWithTraceEnabled.html is the same dump for a running query
> > > when the journal was at about 14GB
> > > - queryStatus.html is just the status page showing my query
> > > - catalina.out.gz contains the trace outputs from starting Tomcat until I
> > > killed the curl running the SPARQL Update with Ctrl-C
> > > - loadGnd.log.gz is Blazegraph's output when loading the data
> > >
> > > Best Regards
> > > Andreas
> > >
> > >
> > >
> > > >>> Bryan Thompson <br...@sy...> 22.04.15 20.56 Uhr >>>
> > > See http://trac.bigdata.com/ticket/1206.  This is still in the
> > > investigation stage.
> > >
> > > Thanks,
> > > Bryan
> > >
> > > On Wed, Apr 22, 2015 at 5:37 AM, Andreas Kahl <ka...@bs...>
> > > wrote:
> > >
> > > > Hello everyone,
> > > >
> > > > I recently updated to the current revision (f4c63e5) of Blazegraph from
> > > > Git and tried to load a dataset into the updated Webapp. With Bigdata
> > > > 1.4.0 this resulted in a journal of ~18GB. Now the process was cancelled
> > > > because the disk was full: the journal grew beyond 50GB for the same
> > > > file with the same settings.
> > > > The only difference was that I activated GroupCommit.
> > > >
> > > > The dataset can be downloaded here:
> > > > http://datendienst.dnb.de/cgi-bin/mabit.pl?cmd=fetch&userID=opendata&pass=opendata&mabheft=GND.rdf.gz
> > > > Please find the settings used to load the file below.
> > > >
> > > > Do I have a misconfiguration, or is there a bug eating all the disk
> > > > space?
> > > >
> > > > Best regards
> > > > Andreas
> > > >
> > > > Namespace-Properties:
> > > > curl -H "Accept: text/plain"
> > > > http://localhost:8080/bigdata/namespace/gnd/properties
> > > > #Wed Apr 22 11:35:31 CEST 2015
> > > > com.bigdata.namespace.kb.spo.com.bigdata.btree.BTree.branchingFactor=700
> > > > com.bigdata.relation.container=gnd
> > > > com.bigdata.rwstore.RWStore.smallSlotType=1024
> > > > com.bigdata.journal.AbstractJournal.bufferMode=DiskRW
> > > > com.bigdata.journal.AbstractJournal.file=/var/lib/bigdata/bigdata.jnl
> > > > com.bigdata.rdf.store.AbstractTripleStore.vocabu.textIndex=true
> > > > com.bigdata.btree.BTree.branchingFactor=700
> > > > com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.NoAxioms
> > > > com.bigdata.rdf.sail.isolatableIndices=false
> > > > com.bigdata.service.AbstractTransactionService.minReleaseAge=1
> > > > com.bigdata.rdf.sail.bufferCapacity=2000
> > > > com.bigdata.rdf.sail.truthMaintenance=false
> > > > com.bigdata.rdf.sail.namespace=gnd
> > > > com.bigdata.relation.class=com.bigdata.rdf.store.LocalTripleStore
> > > > com.bigdata.rdf.store.AbstractTripleStore.quads=false
> > > > com.bigdata.journal.AbstractJournal.writeCacheBufferCount=500
> > > > com.bigdata.search.FullTextIndex.fieldsEnabled=false
> > > > com.bigdata.relation.namespace=gndity=10000
> > > > com.bigdata.rdf.sail.BigdataSail.bufferCapacity=2000
> > > > com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false
> > > >

Re: [Bigdata-developers] NPE on BlazeGraph 1.5.1

From: Bryan T. <br...@sy...> - 2015-04-23 20:44:26

The NPE is on the marked statement below: getSPORelation() is returning null.

    final public IAccessPath<ISPO> getAccessPath(final IV s, final IV p,
            final IV o, final IV c, final RangeBOp range) {

        return getSPORelation() // <-- NPE: getSPORelation() returned null
                .getAccessPath(s, p, o, c, range);

    }

The code for this method is below.  It uses what amounts to a
double-checked locking pattern to avoid synchronization in the common case
where the value is already set on the atomic reference.  abort(), create(),
destroy() and this method can all set its value, but this is the only
method that will set it to a non-null value.

    final public SPORelation getSPORelation() {

        if (spoRelationRef.get() == null) {

            /*
             * Note: double-checked locking pattern (mostly non-blocking). Only
             * synchronized if not yet resolved. The AtomicReference is reused
             * as the monitor to serialize the resolution of the SPORelation in
             * order to have that operation not contend with any other part of
             * the API.
             */
            synchronized (this) {

                if (spoRelationRef.get() == null) {

                    spoRelationRef.set((SPORelation) getIndexManager()
                            .getResourceLocator().locate(
                                    getNamespace() + "."
                                            + SPORelation.NAME_SPO_RELATION,
                                    getTimestamp()));

                }

            }

        }

        return spoRelationRef.get();

    }

    private final AtomicReference<SPORelation> spoRelationRef =
            new AtomicReference<SPORelation>();

My most likely interpretation is that the operation has been cancelled and
that this is the asynchronous case where the spoRelationRef value was
cleared by abort().  However, you might want to turn on logging @ INFO for
the DefaultResourceLocator class.  This is the class that is being called
by the *locate()* call above.  It *can* return null, but it should only
return null if the index does not exist, which should not be the case when
running a query against an existing triple store.
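To make that window concrete, here is a self-contained stand-in for the pattern above. `LazyRef`, its `clear()` method (playing the role of abort()) and the `raceHook` are hypothetical names for illustration, not Blazegraph code. Because `get()` re-reads the reference after the check, a `clear()` that lands between the check and the final read makes it return null even though the value was already resolved.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical stand-in for the getSPORelation() pattern; not Blazegraph code. */
final class LazyRef<T> {

    private final AtomicReference<T> ref = new AtomicReference<>();
    private final Supplier<T> resolver;
    // Test hook run between the check and the final read; it stands in for an
    // abort() on another thread that happens to win the race.
    private volatile Runnable raceHook = () -> { };

    LazyRef(Supplier<T> resolver) {
        this.resolver = resolver;
    }

    void setRaceHook(Runnable hook) {
        this.raceHook = hook;
    }

    /** Same shape as getSPORelation(): check, lock, re-check, then re-read. */
    T get() {
        if (ref.get() == null) {
            synchronized (this) {
                if (ref.get() == null) {
                    ref.set(resolver.get());
                }
            }
        }
        raceHook.run();   // the window: nothing prevents a clear() here
        return ref.get(); // second read can observe the cleared value
    }

    /** Plays the role of abort(): drops the cached value. */
    void clear() {
        ref.set(null);
    }

    public static void main(String[] args) {
        LazyRef<String> rel = new LazyRef<>(() -> "spoRelation");
        System.out.println(rel.get()); // spoRelation
        rel.setRaceHook(rel::clear);   // simulate abort() winning the race
        System.out.println(rel.get()); // null
    }
}
```

Returning a value read once into a local would close this particular window, but whether handing back a relation that abort() has just discarded is acceptable is a separate question.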

This might be related to #468 (rare interrupt of rangeCount during query on
cluster).  That is, an interrupt may be arriving in a race with the
rangeCount() call: abort() is executed before rangeCount(), and the thread
calling rangeCount() has been interrupted but has not yet observed the
interrupt (it has not hit a lock or IO, etc.), so it carries on and hits
this NPE.
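The idea that a thread can have been interrupted without having observed it yet can be shown with plain JDK calls. This is a generic Java illustration, not the rangeCount() code path: setting the interrupt flag does not stop CPU-bound work, and the interrupt only surfaces as an InterruptedException once the thread reaches an interruptible call such as `Thread.sleep`.

```java
/**
 * Shows that Thread.interrupt() only sets a flag: CPU-bound code keeps running
 * until it checks the flag or reaches an interruptible call.
 */
final class InterruptObservationSketch {

    /** Returns true if the interrupt went unnoticed until the sleep() call. */
    static boolean observedOnlyAtBlockingCall() {
        Thread.currentThread().interrupt(); // e.g. a cancel arrives right now

        long work = 0;
        for (int i = 0; i < 1_000; i++) {
            work += i; // pure computation: no exception, the flag just sits there
        }
        boolean flagStillSet = Thread.currentThread().isInterrupted(); // does not clear

        try {
            Thread.sleep(10); // interruptible: throws at once because the flag is set
            return false;
        } catch (InterruptedException e) {
            // Throwing InterruptedException also clears the interrupt flag.
            return work == 499_500 && flagStillSet;
        }
    }

    public static void main(String[] args) {
        System.out.println(observedOnlyAtBlockingCall()); // true
    }
}
```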

Both of these potential explanations would raise the questions:

a. Why is abort() being called?  (A rollback() of the connection running the
query, or canceling the query, could do this.)
b. Why is an interrupt being raised (if we believe that abort() was called
due to query termination by interrupt)?  Is this to cancel the query?  Or
is it spurious?

So the question is whether this is a data race that is triggered by an
intentional cancellation of the query (which could also be due to an error
during query processing) or a data race triggered by a spurious interrupt
(which would be unpleasant) or something else?

Yes, it is worth looking into further.

Thanks,
Bryan

On Thu, Apr 23, 2015 at 3:50 PM, Stas Malyshev <sma...@wi...>
wrote:

> Hi!
>
> I've encountered an NPE running Blazegraph with our data
> update tool; the dump is here:
> https://gist.github.com/smalyshev/6b8b318c8449bfb837e1
>
> This seems to be random (the same query runs again with no issue) and
> happened under some load, but it has not been reproducible since. I
> am still worried it may hint at some bug. Any ideas on how to
> investigate it further, and whether there's a reason for worry?
>
> Thanks,
> --
> Stas Malyshev
> sma...@wi...
>

[Bigdata-developers] NPE on BlazeGraph 1.5.1

From: Stas M. <sma...@wi...> - 2015-04-23 20:15:20

Hi!

I've encountered an NPE running Blazegraph with our data
update tool; the dump is here:
https://gist.github.com/smalyshev/6b8b318c8449bfb837e1

This seems to be random (the same query runs again with no issue) and
happened under some load, but it has not been reproducible since. I
am still worried it may hint at some bug. Any ideas on how to
investigate it further, and whether there's a reason for worry?

Thanks,
-- 
Stas Malyshev
sma...@wi...

Re: [Bigdata-developers] Antw: Re: Re: Current Revision of Blazegraph: Journal consumes extremely much disk space

From: Bryan T. <br...@sy...> - 2015-04-23 18:57:46

I am observing a lot of IO wait on that data set, even on an SSD
(~10-20%). I am using just the out-of-the-box settings for a newly created
KB, and these are by no means optimal. I would suggest a larger pool of
write cache buffers in order to reduce the disk IO. With more write cache
buffers, an index page that is evicted and then modified again before it
is actually written to the disk can skip the IO for its earlier modified
version. This can be a quite substantial savings for large data set loads.
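To make that effect concrete, here is a toy model. It is not Blazegraph's write cache implementation; it only assumes copy-on-write pages (each eviction writes a new version at a fresh address and frees the previous one) and a bounded FIFO buffer. If a freed version is still buffered, it is dropped and never costs a disk write. The class name, method name and workload are hypothetical and synthetic; in Blazegraph the buffer pool size is set via com.bigdata.journal.AbstractJournal.writeCacheBufferCount (Andreas uses 500 in his properties below).

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/**
 * Toy model (hypothetical, not Blazegraph code): page versions are buffered in FIFO order,
 * and a buffered version that is freed before it reaches the disk costs no IO.
 */
final class WriteCacheModel {

    /**
     * Replays page evictions. Each eviction writes a new version of the page to a fresh
     * address and frees the page's previous address (copy-on-write).
     *
     * @param pageIds       page identifiers in eviction order
     * @param cacheCapacity number of page versions the cache holds before writing the oldest
     * @return number of page versions that reach the disk, including the final flush
     */
    static int diskWrites(int[] pageIds, int cacheCapacity) {
        if (cacheCapacity <= 0) {
            throw new IllegalArgumentException("cacheCapacity must be positive");
        }
        Set<Long> cache = new LinkedHashSet<>();      // buffered addresses, oldest first
        Map<Integer, Long> latest = new HashMap<>();  // page id -> address of its newest version
        long nextAddr = 0;
        int writes = 0;
        for (int page : pageIds) {
            Long previous = latest.get(page);
            if (previous != null) {
                // The previous version is freed; if it is still buffered, its IO is skipped.
                cache.remove(previous);
            }
            if (cache.size() == cacheCapacity) {
                Long oldest = cache.iterator().next(); // cache full: write out the oldest version
                cache.remove(oldest);
                writes++;
            }
            long addr = nextAddr++;
            latest.put(page, addr);
            cache.add(addr);
        }
        return writes + cache.size(); // commit flushes whatever is still buffered
    }

    public static void main(String[] args) {
        // Synthetic workload: 10 hot pages evicted round-robin, 100 times each.
        int[] workload = new int[1000];
        for (int i = 0; i < workload.length; i++) {
            workload[i] = i % 10;
        }
        System.out.println("capacity 4:  " + diskWrites(workload, 4));  // 1000
        System.out.println("capacity 16: " + diskWrites(workload, 16)); // 10
    }
}
```

The point is qualitative: once the buffer pool can hold a page's previous version until the page is rewritten, most intermediate versions never touch the disk. How much that saves in a real load depends on the index access pattern.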

Bryan

----
Bryan Thompson
Chief Scientist & Founder
SYSTAP, LLC
4501 Tower Road
Greensboro, NC 27410
br...@sy...
http://blazegraph.com
http://blog.bigdata.com <http://bigdata.com>
http://mapgraph.io

Blazegraph™ <http://www.blazegraph.com/> is our ultra high-performance
graph database that supports both RDF/SPARQL and Tinkerpop/Blueprints
APIs.  MapGraph™ <http://www.systap.com/mapgraph> is our disruptive new
technology to use GPUs to accelerate data-parallel graph analytics.

CONFIDENTIALITY NOTICE:  This email and its contents and attachments are
for the sole use of the intended recipient(s) and are confidential or
proprietary to SYSTAP. Any unauthorized review, use, disclosure,
dissemination or copying of this email or its contents or attachments is
prohibited. If you have received this communication in error, please notify
the sender by reply email and permanently delete all copies of the email
and its contents and attachments.

On Thu, Apr 23, 2015 at 9:30 AM, Andreas Kahl <ka...@bs...> wrote:

> Now I am 25 minutes into the new load with groupCommit enabled and
> com.bigdata.rwstore.RWStore.smallSlotType=1024 commented out.
> Currently 24,870,000 triples are parsed and the journal is at 3.6GB. It
> looks like disabling smallSlotOptimization also resolves the problem
> (otherwise more than twice that space would already be in use by now).
>
> So I conclude that it's the combination of groupCommit and
> smallSlotOptimization.
>
> All tests were run on Blazegraph 1.5.1 from Git revision f4c63e5.
>
> Best Regards
> Andreas
>
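Andreas's A/B test comes down to commenting out single lines in RWStore.properties, which is easy to script when repeating such loads. A minimal sketch, assuming a plain `key=value` properties file; the class and method names are hypothetical, and only the smallSlotType key comes from this thread. The group commit switch lives in the same file, but its key is not shown here, so it is left out.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper: comments out selected keys in properties-file text. */
final class PropertyToggle {

    /** Returns a copy of the lines with every active assignment of {@code key} prefixed by '#'. */
    static List<String> commentOut(List<String> lines, String key) {
        List<String> out = new ArrayList<>(lines.size());
        for (String line : lines) {
            String trimmed = line.trim();
            // Properties files accept '=', ':' or whitespace as the key/value separator.
            boolean assignsKey = trimmed.startsWith(key + "=")
                    || trimmed.startsWith(key + ":")
                    || trimmed.startsWith(key + " ");
            out.add(assignsKey ? "#" + line : line);
        }
        return out;
    }

    /** Parses the lines with java.util.Properties to show which settings remain in effect. */
    static Properties effective(List<String> lines) throws IOException {
        Properties p = new Properties();
        p.load(new StringReader(String.join("\n", lines)));
        return p;
    }

    public static void main(String[] args) throws IOException {
        String smallSlots = "com.bigdata.rwstore.RWStore.smallSlotType";
        List<String> lines = List.of(
                "com.bigdata.journal.AbstractJournal.bufferMode=DiskRW",
                smallSlots + "=1024");
        List<String> toggled = commentOut(lines, smallSlots);
        toggled.forEach(System.out::println);
        System.out.println(effective(toggled).containsKey(smallSlots)); // false
    }
}
```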
>
> >>> "Andreas Kahl" <ka...@bs...> 23.04.2015 14:54 >>>
> Bryan,
>
> in the meantime, I was able to load the file into an 18GB journal
> after disabling groupCommit (I simply commented out the line in
> RWStore.properties).
> I can try again with groupCommit enabled, but smallSlotOptimization
> disabled.
>
> Best Regards
> Andreas
>
> >>> Bryan Thompson <br...@sy...> 23.04.2015 13:24 >>>
> Andreas,
>
> I was not able to replicate your result.  Unfortunately I navigated away
> from the browser page in which I had submitted the request, so it loaded
> all the data but failed to commit.  However, the resulting file is only
> 16GB.
>
> I will redo this run and verify that the journal after the commit has this
> same size on the disk.
>
> I was only assuming that this was related to group commit because of your
> original message.  Perhaps I misinterpreted your message. This is simply
> about 1.5.1 (with group commit) vs 1.4.0.
>
> Perhaps the issue is related to the small slot optimization?  Maybe in
> combination with group commit?
>
> > com.bigdata.rwstore.RWStore.smallSlotType=1024
>
> I could not replicate your properties exactly because you are using a
> non-standard vocabulary class.  Therefore I simply deleted the default
> namespace (in quads mode) and recreated it with the defaults in triples
> mode.  The small slot optimization and other parameters were not enabled in
> my run.
>
> Perhaps you could try to replicate my experience and I will enable the
> small slots optimization?
>
> Thanks,
> Bryan
>
> On Thu, Apr 23, 2015 at 1:51 AM, Andreas Kahl <ka...@bs...>
> wrote:
>
> > Bryan & Martyn,
> >
> > Thank you very much for investigating the issue. I assume from the ticket
> > that the error will vanish if I disable groupCommit. I will do so in the
> > meantime.
> >
> > Although there is already extensive information in Bryan's ticket, please
> > find attached my logs and DumpJournal outputs:
> > - dumpJournal.html contains a dump from the 67GB journal after Blazegraph
> > ran into "No space left on device"
> > - dumpJournalWithTraceEnabled.html is the same dump for a running query
> > when the journal was at about 14GB
> > - queryStatus.html is just the status page showing my query
> > - catalina.out.gz contains the trace output from starting Tomcat until I
> > killed the curl running the SPARQL Update with Ctrl-C
> > - loadGnd.log.gz is Blazegraph's output when loading the data
> >
> > Best Regards
> > Andreas
> >
> >
> >
> > >>> Bryan Thompson <br...@sy...> 22.04.15 20.56 Uhr >>>
> > See http://trac.bigdata.com/ticket/1206.  This is still in the
> > investigation stage.
> >
> > Thanks,
> > Bryan
> >
> > On Wed, Apr 22, 2015 at 5:37 AM, Andreas Kahl <ka...@bs...>
> > wrote:
> >
> > > Hello everyone,
> > >
> > > I recently updated to the current revision (f4c63e5) of Blazegraph from
> > > Git and tried to load a dataset into the updated webapp. With Bigdata
> > > 1.4.0 this resulted in a journal of ~18GB. Now the process was cancelled
> > > because the disk was full: the journal grew beyond 50GB for the same
> > > file with the same settings.
> > > The only difference was that I activated GroupCommit.
> > >
> > > The dataset can be downloaded here:
> > > http://datendienst.dnb.de/cgi-bin/mabit.pl?cmd=fetch&userID=opendata&pass=opendata&mabheft=GND.rdf.gz
> > >
> > > Please find the settings used to load the file below.
> > >
> > > Do I have a misconfiguration, or is there a bug eating all the disk space?
> > >
> > > Best regards
> > > Andreas
> > >
> > > Namespace-Properties:
> > > curl -H "Accept: text/plain" http://localhost:8080/bigdata/namespace/gnd/properties
> > > #Wed Apr 22 11:35:31 CEST 2015
> > > com.bigdata.namespace.kb.spo.com.bigdata.btree.BTree.branchingFactor=700
> > > com.bigdata.relation.container=gnd
> > > com.bigdata.rwstore.RWStore.smallSlotType=1024
> > > com.bigdata.journal.AbstractJournal.bufferMode=DiskRW
> > > com.bigdata.journal.AbstractJournal.file=/var/lib/bigdata/bigdata.jnl
> > > com.bigdata.rdf.store.AbstractTripleStore.vocabularyClass=de.bsb_muenchen.bigdata.vocab.B3KatVocabulary
> > > com.bigdata.journal.AbstractJournal.initialExtent=209715200
> > > com.bigdata.rdf.store.AbstractTripleStore.textIndex=true
> > > com.bigdata.btree.BTree.branchingFactor=700
> > > com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.NoAxioms
> > > com.bigdata.rdf.sail.isolatableIndices=false
> > > com.bigdata.service.AbstractTransactionService.minReleaseAge=1
> > > com.bigdata.rdf.sail.bufferCapacity=2000
> > > com.bigdata.rdf.sail.truthMaintenance=false
> > > com.bigdata.rdf.sail.namespace=gnd
> > > com.bigdata.relation.class=com.bigdata.rdf.store.LocalTripleStore
> > > com.bigdata.rdf.store.AbstractTripleStore.quads=false
> > > com.bigdata.journal.AbstractJournal.writeCacheBufferCount=500
> > > com.bigdata.search.FullTextIndex.fieldsEnabled=false
> > > com.bigdata.relation.namespace=gndity=10000
> > > com.bigdata.rdf.sail.BigdataSail.bufferCapacity=2000
> > > com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false

Re: [Bigdata-developers] Current Revision of Blazegraph: Journal consumes extremely much disk space

From: Bryan T. <br...@sy...> - 2015-04-23 13:58:31

You should increase the buffer capacity to get better throughput. You
specify it under two different names below; the one actually used is the
first (com.bigdata.rdf.sail.bufferCapacity). It specifies how many
statements will be buffered in the BigdataSailConnection before the
statements are incrementally written to the disk. For large loads, a value
of 100000 or more is a good idea, as long as you do not encounter too much
GC overhead.

> > > com.bigdata.rdf.sail.bufferCapacity=2000
> > > com.bigdata.rdf.sail.BigdataSail.bufferCapacity=2000
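Applied to the two properties quoted above, the advice amounts to keeping one key and raising its value. A minimal sketch using plain java.util.Properties; the class and method names are hypothetical, and 100000 is simply the lower bound suggested here. The statement buffer lives on the heap, so GC overhead is the thing to watch when going higher.

```java
import java.util.Properties;

/** Hypothetical helper applying the bulk-load buffer advice from this thread. */
final class BulkLoadTuning {

    /** The property name identified in this thread as the one actually read. */
    static final String BUFFER_CAPACITY = "com.bigdata.rdf.sail.bufferCapacity";
    /** Second spelling found in the namespace properties above; not the real name. */
    static final String UNUSED_ALIAS = "com.bigdata.rdf.sail.BigdataSail.bufferCapacity";

    /** Returns a copy of {@code in} with the statement buffer raised and the misleading alias dropped. */
    static Properties withBufferCapacity(Properties in, int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive: " + capacity);
        }
        Properties out = new Properties();
        out.putAll(in);                 // leave the caller's Properties untouched
        out.remove(UNUSED_ALIAS);
        out.setProperty(BUFFER_CAPACITY, Integer.toString(capacity));
        return out;
    }

    public static void main(String[] args) {
        Properties current = new Properties();
        current.setProperty(BUFFER_CAPACITY, "2000");
        current.setProperty(UNUSED_ALIAS, "2000");
        Properties tuned = withBufferCapacity(current, 100_000);
        System.out.println(tuned.getProperty(BUFFER_CAPACITY)); // 100000
        System.out.println(tuned.containsKey(UNUSED_ALIAS));    // false
    }
}
```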

Thanks,
Bryan

On Thu, Apr 23, 2015 at 9:36 AM, Andreas Kahl <ka...@bs...> wrote:

> Ok, I can redo the test with smallSlots + groupCommit enabled, and run
> http://localhost:8080/bigdata/status?dumpJournal&dumpPages after some
> minutes. (I cannot run it on the fully loaded dataset because my disk is
> not sufficient for the resulting Journal).
>
> By the way, please find attached my custom Vocabulary classes. They are
> just one of my many attempts to improve IO performance on rotating disks.
>
> Best Regards
> Andreas

[Bigdata-developers] Antw: Re: Re: Re: Current Revision of Blazegraph: Journal consumes extremely much disk space

From: Andreas K. <ka...@bs...> - 2015-04-23 13:36:24

Attachments: bsbVocabularies.tar.gz

OK, I can redo the test with smallSlots + groupCommit enabled and run http://localhost:8080/bigdata/status?dumpJournal&dumpPages after a few minutes. (I cannot run it on the fully loaded dataset because my disk is not large enough for the resulting journal.)

By the way, please find attached my custom Vocabulary classes. They are just one of my many attempts to improve IO performance on rotating disks.

Best Regards
Andreas

>>> Bryan Thompson <br...@sy...> 23.04.15 15.31 Uhr >>>
I just noticed that you have the full text index enabled as well. I have
not been enabling that.

I would like to see the output from this command on the fully loaded data
sets.

http://localhost:8080/bigdata/status?dumpJournal&dumpPages

This will let us see whether any specific index is taking up a very large
number of pages. It will also tell us the distribution over the page sizes
for each index.

Bryan
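The same request can be issued from Java, for example to snapshot the report periodically during a load, as Andreas proposes. A minimal sketch using the JDK's java.net.http client (Java 11+); the class name and the optional base-URL argument are assumptions, and only the status URL itself comes from this thread. The whole report is buffered in memory, which suits a one-off look but may be large for a big journal.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical client that fetches the DumpJournal report discussed above. */
final class DumpJournalFetch {

    /** Builds the status URI with both flags used in this thread. */
    static URI dumpJournalUri(String serviceBase) {
        String base = serviceBase.endsWith("/")
                ? serviceBase.substring(0, serviceBase.length() - 1)
                : serviceBase;
        return URI.create(base + "/status?dumpJournal&dumpPages");
    }

    public static void main(String[] args) throws InterruptedException {
        URI uri = dumpJournalUri(args.length > 0 ? args[0] : "http://localhost:8080/bigdata");
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        try {
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("HTTP " + response.statusCode());
            System.out.println(response.body()); // the report (root blocks, allocator summary, page stats)
        } catch (IOException e) {
            System.err.println("Could not reach " + uri + ": " + e.getMessage());
        }
    }
}
```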


Re: [Bigdata-developers] Antw: Re: Re: Current Revision of Blazegraph: Journal consumes extremely much disk space

From: Bryan T. <br...@sy...> - 2015-04-23 13:33:27

I am at the following with group commit + small slots but without the full
text index.

totalElapsed=4502693ms, elapsed=4502570ms, parsed=43820000, tps=9732,
done=false

-rw-r--r--  1 root root 6.0G Apr 23 09:31 bigdata.jnl

There is clearly a lot of recycling going on. I am going to wait for the load
to finish before looking into this further.
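The progress line reports a load rate (tps). A minimal Java sketch for sanity-checking such lines: it splits the key=value fields and recomputes the rate. It assumes tps is computed as parsed * 1000 / elapsed with integer division; the thread does not state the formula, but that assumption reproduces the reported value. Class and method names are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ProgressLine {

    // Splits a "key=value, key=value, ..." status line into an ordered map.
    // Assumes no key or value contains a comma.
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String part : line.split(",")) {
            String[] kv = part.trim().split("=", 2);
            if (kv.length == 2) {
                fields.put(kv[0], kv[1]);
            }
        }
        return fields;
    }

    // Parses a value such as "4502570ms", dropping an optional "ms" suffix.
    static long millis(String value) {
        String digits = value.endsWith("ms") ? value.substring(0, value.length() - 2) : value;
        return Long.parseLong(digits);
    }

    // Assumption: tps = parsed * 1000 / <elapsed field>, using integer division.
    static long triplesPerSecond(Map<String, String> fields, String elapsedKey) {
        long parsed = Long.parseLong(fields.get("parsed"));
        return parsed * 1000 / millis(fields.get(elapsedKey));
    }

    public static void main(String[] args) {
        Map<String, String> f = parse(
            "totalElapsed=4502693ms, elapsed=4502570ms, parsed=43820000, tps=9732, done=false");
        System.out.println("reported tps:          " + f.get("tps"));
        System.out.println("parsed / elapsed:      " + triplesPerSecond(f, "elapsed"));
        System.out.println("parsed / totalElapsed: " + triplesPerSecond(f, "totalElapsed"));
    }
}
```

Run as-is, it prints 9732 for elapsed (matching the reported tps) and 9731 for totalElapsed, so the reported rate appears to be based on elapsed. At that rate, the 43,820,000 triples parsed so far took about 75 minutes.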

magic=e6b4c275
version=1
extent=209715200(200M), userExtent=209714512(199M),
bytesAvailable=209714512(199M), nextOffset=0
rootBlock{ rootBlock=0, challisField=4, version=3,
nextOffset=253403152405, localTime=1429791054868 [Thursday, April 23,
2015 8:10:54 AM EDT], firstCommitTime=1429789657461 [Thursday, April
23, 2015 7:47:37 AM EDT], lastCommitTime=1429791054859 [Thursday,
April 23, 2015 8:10:54 AM EDT], commitCounter=4,
commitRecordAddr={off=NATIVE:-106500,len=422},
commitRecordIndexAddr={off=NATIVE:-81940,len=220}, blockSequence=1,
quorumToken=-1, metaBitsAddr=206535917615, metaStartAddr=3200,
storeType=RW, uuid=8d9bce3f-db56-4a87-b3fd-c1a433e1d3d8,
offsetBits=42, checksum=-1504696410, createTime=1429789657046
[Thursday, April 23, 2015 7:47:37 AM EDT], closeTime=0}
rootBlock{ rootBlock=1, challisField=3, version=3,
nextOffset=231928315910, localTime=1429791050520 [Thursday, April 23,
2015 8:10:50 AM EDT], firstCommitTime=1429789657461 [Thursday, April
23, 2015 7:47:37 AM EDT], lastCommitTime=1429791050513 [Thursday,
April 23, 2015 8:10:50 AM EDT], commitCounter=3,
commitRecordAddr={off=NATIVE:-40968,len=422},
commitRecordIndexAddr={off=NATIVE:-81925,len=220}, blockSequence=1,
quorumToken=-1, metaBitsAddr=206221344815, metaStartAddr=3200,
storeType=RW, uuid=8d9bce3f-db56-4a87-b3fd-c1a433e1d3d8,
offsetBits=42, checksum=-2109528144, createTime=1429789657046
[Thursday, April 23, 2015 7:47:37 AM EDT], closeTime=0}
The current root block is #0
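The dump lists two root blocks: #0 with commitCounter=4 and the later lastCommitTime, #1 with commitCounter=3. The sketch below illustrates one plausible selection rule that is consistent with the tool reporting #0 as current, namely that the block with the larger commit counter wins. This is an assumption for illustration, not Blazegraph's actual root block selection code, which would also have to validate each block (for example via the checksum field shown above). The RootBlock class is hypothetical.

```java
public class RootBlockChoice {

    // Hypothetical, minimal view of a journal root block holding only the
    // fields from the DumpJournal output that this sketch needs.
    static final class RootBlock {
        final int index;          // 0 or 1
        final long commitCounter;
        final long lastCommitTime;

        RootBlock(int index, long commitCounter, long lastCommitTime) {
            this.index = index;
            this.commitCounter = commitCounter;
            this.lastCommitTime = lastCommitTime;
        }
    }

    // Assumption: of the two root blocks, the current one is the one with the
    // larger commit counter, i.e. the one written by the most recent commit.
    static RootBlock current(RootBlock rb0, RootBlock rb1) {
        return rb0.commitCounter > rb1.commitCounter ? rb0 : rb1;
    }

    public static void main(String[] args) {
        // Values copied from the dump above.
        RootBlock rb0 = new RootBlock(0, 4, 1429791054859L);
        RootBlock rb1 = new RootBlock(1, 3, 1429791050513L);
        RootBlock cur = current(rb0, rb1);
        System.out.println("The current root block is #" + cur.index
            + " (commitCounter=" + cur.commitCounter + ")");
    }
}
```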

-------------------------
RWStore Allocator Summary
-------------------------
AllocatorSize  AllocatorCount  SlotsAllocated  %SlotsAllocated  SlotsRecycled  SlotChurn  SlotsInUse  %SlotsInUse  MeanAllocation  SlotsReserved  %SlotsUnused  BytesReserved  BytesAppData  %SlotWaste  %AppData  %StoreFile  %TotalWaste  %FileWaste
           64            3390        25653334            51.17        8436924       1.49    17216410        89.79              27       24299520         29.15     1555169280     566105149       63.60     14.84       28.50        60.20       18.13
          128             178         1349254             2.69         106699       1.09     1242555         6.48              87        1270272          2.18      162594816     107803126       33.70      2.83        2.98         3.33        1.00
          192              19          229968             0.46         105038       1.84      124930         0.65             153         134144          6.87       25755648      18951542       26.42      0.50        0.47         0.41        0.12
          320               5          229984             0.46         203087       8.55       26897         0.14             253          35840         24.95       11468800       7145051       37.70      0.19        0.21         0.26        0.08
          512               2          296128             0.59         289066      41.93        7062         0.04             415           7424          4.88        3801088       3949092       -3.89      0.10        0.07        -0.01        0.00
          768               2          369042             0.74         365413     101.69        3629         0.02             639           7424         51.12        5701632       3754158       34.16      0.10        0.10         0.12        0.04
         1024               2          348064             0.69         345243     123.38        2821         0.01             895           7424         62.00        7602176       3907272       48.60      0.10        0.14         0.22        0.07
         2048               4         1307596             2.61        1280762      48.73       26834         0.14            1525          28672          6.41       58720256      41087168       30.03      1.08        1.08         1.07        0.32
         3072               2         1175674             2.34        1162053      86.31       13621         0.07            2558          14336          4.99       44040192      42018252        4.59      1.10        0.81         0.12        0.04
         4096              26         1758846             3.51        1581120       9.90      177726         0.93            3572         186368          4.64      763363328     621452046       18.59     16.30       13.99         8.64        2.60
         8192              48        17418250            34.74       17086197      52.46      332053         1.73            7274         344064          3.49     2818572288    2397567451       14.94     62.87       51.65        25.62        7.72
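Several of the summary columns can be derived from the others. The relationships used in this Java sketch were inferred by checking them against the rows above, not taken from the DumpJournal source, so treat them as assumptions: SlotsInUse = SlotsAllocated - SlotsRecycled, SlotChurn = SlotsAllocated / SlotsInUse, BytesReserved = SlotsReserved * AllocatorSize, %SlotsUnused = (SlotsReserved - SlotsInUse) / SlotsReserved, and %SlotWaste = (BytesReserved - BytesAppData) / BytesReserved.

```java
import java.util.Locale;

public class AllocatorRowCheck {

    // Raw columns from one row of the "RWStore Allocator Summary".
    static final class Row {
        final int allocatorSize;    // slot size in bytes
        final long slotsAllocated;
        final long slotsRecycled;
        final long slotsReserved;
        final long bytesAppData;

        Row(int allocatorSize, long slotsAllocated, long slotsRecycled,
            long slotsReserved, long bytesAppData) {
            this.allocatorSize = allocatorSize;
            this.slotsAllocated = slotsAllocated;
            this.slotsRecycled = slotsRecycled;
            this.slotsReserved = slotsReserved;
            this.bytesAppData = bytesAppData;
        }
    }

    // The formulas below are inferred from the dumped numbers (assumptions).
    static long slotsInUse(Row r) {
        return r.slotsAllocated - r.slotsRecycled;
    }

    static double slotChurn(Row r) {
        return (double) r.slotsAllocated / slotsInUse(r);
    }

    static long bytesReserved(Row r) {
        return r.slotsReserved * r.allocatorSize;
    }

    static double pctSlotsUnused(Row r) {
        return 100.0 * (r.slotsReserved - slotsInUse(r)) / r.slotsReserved;
    }

    static double pctSlotWaste(Row r) {
        long reserved = bytesReserved(r);
        return 100.0 * (reserved - r.bytesAppData) / reserved;
    }

    // Two decimals, independent of the JVM's default locale.
    static String two(double v) {
        return String.format(Locale.ROOT, "%.2f", v);
    }

    public static void main(String[] args) {
        Row[] rows = {
            new Row(64, 25653334L, 8436924L, 24299520L, 566105149L),
            new Row(512, 296128L, 289066L, 7424L, 3949092L),
            new Row(8192, 17418250L, 17086197L, 344064L, 2397567451L),
        };
        for (Row r : rows) {
            System.out.println(r.allocatorSize
                + ": SlotsInUse=" + slotsInUse(r)
                + ", SlotChurn=" + two(slotChurn(r))
                + ", %SlotsUnused=" + two(pctSlotsUnused(r))
                + ", BytesReserved=" + bytesReserved(r)
                + ", %SlotWaste=" + two(pctSlotWaste(r)));
        }
    }
}
```

Run as-is, the three rows reproduce the dumped values for those columns. The 512-byte row comes out with a negative %SlotWaste (-3.89) because its BytesAppData (3,949,092) exceeds its BytesReserved (3,801,088).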

-------------------------
BLOBS
-------------------------
Bucket(K)  Allocations    Allocated  Deletes      Deleted  Current        Data   Mean   Churn
       16      7529975  87846235428  7432952  86784750383    97023  1061485045  11666   77.61
       32       890621  17213523980   885724  17117153133     4897    96370847  19327  181.87
       64        15272    560980091    15190    557968835       82     3011256  36732  186.24
      128            0            0        0            0        0           0      0    0.00
      256            0            0        0            0        0           0      0    0.00
      512            0            0        0            0        0           0      0    0.00
     1024            0            0        0            0        0           0      0    0.00
     2048            0            0        0            0        0           0      0    0.00
     4096            0            0        0            0        0           0      0    0.00
     8192            0            0        0            0        0           0      0    0.00
    16384            0            0        0            0        0           0      0    0.00
    32768            0            0        0            0        0           0      0    0.00
    65536            0            0        0            0        0           0      0    0.00
  2097151            0            0        0            0        0           0      0    0.00
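The BLOBS columns follow the same pattern. In this Java sketch (again, relationships inferred from the rows above rather than from the DumpJournal source), Current = Allocations - Deletes, Data = Allocated - Deleted, Mean = Allocated / Allocations using integer division, and Churn = Allocations / Current, reported as 0.00 for empty buckets.

```java
import java.util.Locale;

public class BlobBucketCheck {

    // Blobs still live in the bucket.
    static long current(long allocations, long deletes) {
        return allocations - deletes;
    }

    // Bytes still live in the bucket.
    static long data(long allocated, long deleted) {
        return allocated - deleted;
    }

    // Mean blob size; integer division matches the dumped values.
    static long mean(long allocated, long allocations) {
        return allocations == 0 ? 0 : allocated / allocations;
    }

    // Empty buckets are shown as 0.00 in the dump, so avoid dividing by zero.
    static double churn(long allocations, long current) {
        return current == 0 ? 0.0 : (double) allocations / current;
    }

    static String two(double v) {
        return String.format(Locale.ROOT, "%.2f", v);
    }

    public static void main(String[] args) {
        // Bucket 16K from the dump: Allocations, Allocated, Deletes, Deleted.
        long allocations = 7529975L, allocated = 87846235428L;
        long deletes = 7432952L, deleted = 86784750383L;
        long cur = current(allocations, deletes);
        System.out.println("Current=" + cur
            + ", Data=" + data(allocated, deleted)
            + ", Mean=" + mean(allocated, allocations)
            + ", Churn=" + two(churn(allocations, cur)));
    }
}
```

For the 16K bucket it prints Current=97023, Data=1061485045, Mean=11666, Churn=77.61, matching the dump. Of roughly 87.8GB written to 16K blobs during the load, only about 1.06GB is still live, which is consistent with the heavy recycling noted above.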



----
Bryan Thompson
Chief Scientist & Founder
SYSTAP, LLC
4501 Tower Road
Greensboro, NC 27410
br...@sy...
http://blazegraph.com
http://blog.bigdata.com <http://bigdata.com>
http://mapgraph.io

Blazegraph™ <http://www.blazegraph.com/> is our ultra high-performance
graph database that supports both RDF/SPARQL and Tinkerpop/Blueprints
APIs.  MapGraph™ <http://www.systap.com/mapgraph> is our disruptive new
technology to use GPUs to accelerate data-parallel graph analytics.

CONFIDENTIALITY NOTICE:  This email and its contents and attachments are
for the sole use of the intended recipient(s) and are confidential or
proprietary to SYSTAP. Any unauthorized review, use, disclosure,
dissemination or copying of this email or its contents or attachments is
prohibited. If you have received this communication in error, please notify
the sender by reply email and permanently delete all copies of the email
and its contents and attachments.

On Thu, Apr 23, 2015 at 9:30 AM, Andreas Kahl <ka...@bs...> wrote:

> Now I am 25mins into the new load with groupCommit enabled and
> com.bigdata.rwstore.RWStore.smallSlotType=1024 commented out.
> Currently 24,870,000 Triples are parsed and the journal is at 3.6GB. It
> looks like disabling smallSlotOptimization also resolves the problem
> (Otherwise I would have more than twice the space used at that time).
>
> So, I would conclude, it's the combination of groupCommit and
> smallSlotOptimization.
>
> All tests were run on Blazegraph 1.5.1 from Git revision f4c63e5.
>
> Best Regards
> Andreas
>
>
> >>> "Andreas Kahl" <ka...@bs...> 23.04.2015 14:54 >>>
> Bryan,
>
> in the meantime, I could successfully load the file into an 18GB journal
> after disabling groupCommit (I simply commented out the line in
> RWStore.properties).
> I can try again with groupCommit enabled, but smallSlotOptimization
> disabled.
>
> Best Regards
> Andreas
>
> >>> Bryan Thompson <br...@sy...> 23.04.2015 13:24 >>>
> Andreas,
>
> I was not able to replicate your result.  Unfortunately I navigated away
> from the browser page in which I had submitted the request, so it loaded
> all the data but failed to commit.  However, the resulting file is only
> 16GB.
>
> I will redo this run and verify that the journal after the commit has this
> same size on the disk.
>
> I was only assuming that this was related to group commit because of your
> original message.  Perhaps I misinterpreted your message. This is simply
> about 1.5.1 (with group commit) vs 1.4.0.
>
> Perhaps the issue is related to the small slot optimization?  Maybe in
> combination with group commit?
>
> *> com.bigdata.rwstore.RWStore.smallSlotType=1024*
>
> I could not replicate your properties exactly because you are using a
> non-standard vocabulary class.  Therefore I simply deleted the default
> namespace (in quads mode) and recreated it with the defaults in triples
> mode.  The small slot optimization and other parameters were not enabled in
> my run.
>
> Perhaps you could try to replicate my experience and I will enable the
> small slots optimization?
>
> Thanks,
> Bryan
>
> On Thu, Apr 23, 2015 at 1:51 AM, Andreas Kahl <ka...@bs...>
> wrote:
>
> > Bryan & Martyn,
> >
> > Thank you very much for investigating the issue. I assume from the ticket
> > that the error will vanish if I disable groupCommit. I will do so for the
> > meantime.
> >
> > Although there is already extensive information in Bryan's ticket, please
> > find attached my logs and DumpJournal outputs:
> > - dumpJournal.html contains a dump from the 67GB journal after Blazegraph
> > ran into "No space left on device"
> > - dumpJournalWithTraceEnabled.html is the same dump for a running query
> > when the journal was at about 14GB
> > - queryStatus.html is just the status page showing my query
> > - catalina.out.gz contains the trace outputs from starting Tomcat until I
> > killed the curl running the SPARQL Update by Ctrl-C
> > - loadGnd.log.gz is Blazegraph's output when loading the data
> >
> > Best Regards
> > Andreas
> >
> >
> >
> > >>> Bryan Thompson <br...@sy...> 22.04.15 20.56 Uhr >>>
> > See http://trac.bigdata.com/ticket/1206.  This is still in the
> > investigation stage.
> >
> > Thanks,
> > Bryan
> >
> > On Wed, Apr 22, 2015 at 5:37 AM, Andreas Kahl <ka...@bs...>
> > wrote:
> >
> > > Hello everyone,
> > >
> > > I currently updated to the current Revision (f4c63e5) of Blazegraph from
> > > Git and tried to load a dataset into the updated Webapp. With Bigdata
> > > 1.4.0 this resulted in a journal of ~18GB. Now the process was cancelled
> > > because the disk was full - the journal was beyond 50GB for the same file
> > > with the same settings.
> > > The only exception was that I activated GroupCommit.
> > >
> > > The dataset can be downloaded here:
> > >
> >
> http://datendienst.dnb.de/cgi-bin/mabit.pl?cmd=fetch&userID=opendata&pass=opendata&mabheft=GND.rdf.gz
> > > .
> > > Please find the settings used to load the file below.
> > >
> > > Do I have a misconfiguration, or is there a bug eating all disk memory?
> > >
> > > Best regards
> > > Andreas
> > >
> > > Namespace-Properties:
> > > curl -H "Accept: text/plain"
> > > http://localhost:8080/bigdata/namespace/gnd/properties
> > > #Wed Apr 22 11:35:31 CEST 2015
> > >
> com.bigdata.namespace.kb.spo.com.bigdata.btree.BTree.branchingFactor=700
> > > com.bigdata.relation.container=gnd
> > > com.bigdata.rwstore.RWStore.smallSlotType=1024
> > > com.bigdata.journal.AbstractJournal.bufferMode=DiskRW
> > > com.bigdata.journal.AbstractJournal.file=/var/lib/bigdata/bigdata.jnl
> > >
> > >
> >
> com.bigdata.rdf.store.AbstractTripleStore.vocabularyClass=de.bsb_muenchen.bigdata.vocab.B3KatVocabulary
> > > com.bigdata.journal.AbstractJournal.initialExtent=209715200
> > > com.bigdata.rdf.store.AbstractTripleStore.textIndex=true
> > > com.bigdata.btree.BTree.branchingFactor=700
> > >
> > >
> >
> com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.NoAxioms
> > > com.bigdata.rdf.sail.isolatableIndices=false
> > > com.bigdata.service.AbstractTransactionService.minReleaseAge=1
> > > com.bigdata.rdf.sail.bufferCapacity=2000
> > > com.bigdata.rdf.sail.truthMaintenance=false
> > > com.bigdata.rdf.sail.namespace=gnd
> > > com.bigdata.relation.class=com.bigdata.rdf.store.LocalTripleStore
> > > com.bigdata.rdf.store.AbstractTripleStore.quads=false
> > > com.bigdata.journal.AbstractJournal.writeCacheBufferCount=500
> > > com.bigdata.search.FullTextIndex.fieldsEnabled=false
> > > com.bigdata.relation.namespace=gndity=10000
> > > com.bigdata.rdf.sail.BigdataSail.bufferCapacity=2000
> > > com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false
> > >
> > >
> > >
> > >
> > >
> >
> >
>
>
>
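The namespace properties quoted in this thread are in java.util.Properties text format (note the leading # timestamp comment). When comparing runs, it can help to check which of the settings under discussion a namespace actually has. A minimal Java sketch, assuming the body has already been fetched, for example with the curl -H "Accept: text/plain" http://localhost:8080/bigdata/namespace/gnd/properties command quoted above; the sample input reuses a few of the quoted lines, and the class name is illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class NamespaceProps {

    // Keys discussed in this thread in connection with journal growth.
    static final String SMALL_SLOT_TYPE = "com.bigdata.rwstore.RWStore.smallSlotType";
    static final String TEXT_INDEX = "com.bigdata.rdf.store.AbstractTripleStore.textIndex";

    // Parses a text/plain properties body; lines starting with '#' are comments.
    static Properties parse(String body) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(body));
        } catch (IOException e) {
            // A StringReader performs no real I/O, so this is not expected.
            throw new UncheckedIOException(e);
        }
        return p;
    }

    public static void main(String[] args) {
        String body = String.join("\n",
            "#Wed Apr 22 11:35:31 CEST 2015",
            "com.bigdata.rwstore.RWStore.smallSlotType=1024",
            "com.bigdata.rdf.store.AbstractTripleStore.textIndex=true",
            "com.bigdata.journal.AbstractJournal.bufferMode=DiskRW");
        Properties p = parse(body);
        System.out.println(SMALL_SLOT_TYPE + " = " + p.getProperty(SMALL_SLOT_TYPE, "(not set)"));
        System.out.println(TEXT_INDEX + " = " + p.getProperty(TEXT_INDEX, "(not set)"));
    }
}
```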

Re: [Bigdata-developers] Current Revision of Blazegraph: Journal consumes extremely much disk space

From: Bryan T. <br...@sy...> - 2015-04-23 13:31:07

I just noticed that you have the full text index enabled as well. I have
not been enabling that.

I would like to see the output from this command on the fully loaded data
sets.

http://localhost:8080/bigdata/status?dumpJournal&dumpPages

This will let us see whether any specific index is taking up a very large
number of pages. It will also tell us the distribution of page sizes for each
index.
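The same report can be fetched from a program. A minimal Java 11+ sketch using java.net.http.HttpClient, assuming the webapp runs at http://localhost:8080/bigdata as elsewhere in this thread; the output file name dumpJournal.html is an arbitrary choice that mirrors the name of Andreas's attachment. It does the same as curl -o dumpJournal.html 'http://localhost:8080/bigdata/status?dumpJournal&dumpPages'.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public class DumpJournalFetch {

    // Builds the status URL requested above for a given webapp base URL.
    static URI dumpJournalUri(String base) {
        return URI.create(base + "/status?dumpJournal&dumpPages");
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String base = args.length > 0 ? args[0] : "http://localhost:8080/bigdata";
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(dumpJournalUri(base)).GET().build();
        // Stream the HTML report straight to a file; it may be large for a big journal.
        HttpResponse<Path> response = client.send(
            request, HttpResponse.BodyHandlers.ofFile(Path.of("dumpJournal.html")));
        System.out.println("HTTP " + response.statusCode() + " -> " + response.body());
    }
}
```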

Bryan

On Thu, Apr 23, 2015 at 8:54 AM, Andreas Kahl <ka...@bs...> wrote:

> Bryan,
>
> in the meantime, I could successfully load the file into an 18GB journal
> after disabling groupCommit (I simply commented out the line in
> RWStore.properties).
> I can try again with groupCommit enabled, but smallSlotOptimization
> disabled.
>
> Best Regards
> Andreas
>
> >>> Bryan Thompson <br...@sy...> 23.04.2015 13:24 >>>
> Andreas,
>
> I was not able to replicate your result.  Unfortunately I navigated away
> from the browser page in which I had submitted the request, so it loaded
> all the data but failed to commit.  However, the resulting file is only
> 16GB.
>
> I will redo this run and verify that the journal after the commit has this
> same size on the disk.
>
> I was only assuming that this was related to group commit because of your
> original message.  Perhaps I misinterpreted your message. This is simply
> about 1.5.1 (with group commit) vs 1.4.0.
>
> Perhaps the issue is related to the small slot optimization?  Maybe in
> combination with group commit?
>
> *> com.bigdata.rwstore.RWStore.smallSlotType=1024*
>
> I could not replicate your properties exactly because you are using a
> non-standard vocabulary class.  Therefore I simply deleted the default
> namespace (in quads mode) and recreated it with the defaults in triples
> mode.  The small slot optimization and other parameters were not enabled in
> my run.
>
> Perhaps you could try to replicate my experience and I will enable the
> small slots optimization?
>
> Thanks,
> Bryan
>
> On Thu, Apr 23, 2015 at 1:51 AM, Andreas Kahl <ka...@bs...>
> wrote:
>
> > Bryan & Martyn,
> >
> > Thank you very much for investigating the issue. I assume from the ticket
> > that the error will vanish if I disable groupCommit. I will do so for the
> > meantime.
> >
> > Although there is already extensive information in Bryan's ticket, please
> > find attached my logs and DumpJournal outputs:
> > - dumpJournal.html contains a dump from the 67GB journal after Blazegraph
> > ran into "No space left on device"
> > - dumpJournalWithTraceEnabled.html is the same dump for a running query
> > when the journal was at about 14GB
> > - queryStatus.html is just the status page showing my query
> > - catalina.out.gz contains the trace outputs from starting Tomcat until I
> > killed the curl running the SPARQL Update by Ctrl-C
> > - loadGnd.log.gz is Blazegraph's output when loading the data
> >
> > Best Regards
> > Andreas
> >
> >
> >
> > >>> Bryan Thompson <br...@sy...> 22.04.15 20.56 Uhr >>>
> > See http://trac.bigdata.com/ticket/1206.  This is still in the
> > investigation stage.
> >
> > Thanks,
> > Bryan
> >
> > On Wed, Apr 22, 2015 at 5:37 AM, Andreas Kahl <ka...@bs...>
> > wrote:
> >
> > > Hello everyone,
> > >
> > > I currently updated to the current Revision (f4c63e5) of Blazegraph from
> > > Git and tried to load a dataset into the updated Webapp. With Bigdata
> > > 1.4.0 this resulted in a journal of ~18GB. Now the process was cancelled
> > > because the disk was full - the journal was beyond 50GB for the same file
> > > with the same settings.
> > > The only exception was that I activated GroupCommit.
> > >
> > > The dataset can be downloaded here:
> > >
> >
> http://datendienst.dnb.de/cgi-bin/mabit.pl?cmd=fetch&userID=opendata&pass=opendata&mabheft=GND.rdf.gz
> > > .
> > > Please find the settings used to load the file below.
> > >
> > > Do I have a misconfiguration, or is there a bug eating all disk memory?
> > >
> > > Best regards
> > > Andreas
> > >
> > > Namespace-Properties:
> > > curl -H "Accept: text/plain"
> > > http://localhost:8080/bigdata/namespace/gnd/properties
> > > #Wed Apr 22 11:35:31 CEST 2015
> > >
> com.bigdata.namespace.kb.spo.com.bigdata.btree.BTree.branchingFactor=700
> > > com.bigdata.relation.container=gnd
> > > com.bigdata.rwstore.RWStore.smallSlotType=1024
> > > com.bigdata.journal.AbstractJournal.bufferMode=DiskRW
> > > com.bigdata.journal.AbstractJournal.file=/var/lib/bigdata/bigdata.jnl
> > >
> > >
> >
> com.bigdata.rdf.store.AbstractTripleStore.vocabularyClass=de.bsb_muenchen.bigdata.vocab.B3KatVocabulary
> > > com.bigdata.journal.AbstractJournal.initialExtent=209715200
> > > com.bigdata.rdf.store.AbstractTripleStore.textIndex=true
> > > com.bigdata.btree.BTree.branchingFactor=700
> > >
> > >
> >
> com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.NoAxioms
> > > com.bigdata.rdf.sail.isolatableIndices=false
> > > com.bigdata.service.AbstractTransactionService.minReleaseAge=1
> > > com.bigdata.rdf.sail.bufferCapacity=2000
> > > com.bigdata.rdf.sail.truthMaintenance=false
> > > com.bigdata.rdf.sail.namespace=gnd
> > > com.bigdata.relation.class=com.bigdata.rdf.store.LocalTripleStore
> > > com.bigdata.rdf.store.AbstractTripleStore.quads=false
> > > com.bigdata.journal.AbstractJournal.writeCacheBufferCount=500
> > > com.bigdata.search.FullTextIndex.fieldsEnabled=false
> > > com.bigdata.relation.namespace=gndity=10000
> > > com.bigdata.rdf.sail.BigdataSail.bufferCapacity=2000
> > > com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false
> > >
> > >
> > >
> > >
> > >
> >
> >
>

[Bigdata-developers] Antw: Re: Re: Current Revision of Blazegraph: Journal consumes extremely much disk space

From: Andreas K. <ka...@bs...> - 2015-04-23 13:30:57

Now I am 25 minutes into the new load with groupCommit enabled and com.bigdata.rwstore.RWStore.smallSlotType=1024 commented out.
Currently 24,870,000 triples have been parsed and the journal is at 3.6GB. It looks like disabling smallSlotOptimization also resolves the problem (otherwise I would expect more than twice as much space to be in use by now).

So I would conclude that it is the combination of groupCommit and smallSlotOptimization that causes the problem.

All tests were run on Blazegraph 1.5.1 from Git revision f4c63e5. 

Best Regards
Andreas
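Each variant in these tests was produced by commenting out a single line of a properties file between loads (Andreas names RWStore.properties for the group commit line). A minimal Java sketch of that edit, useful when scripting such a test matrix; the class name and command-line usage are illustrative, and it only recognises key=value and key:value assignments.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class CommentOutKey {

    // Returns a copy of the properties-file lines in which every assignment of
    // `key` ("key=value" or "key:value") is prefixed with '#'. All other lines,
    // including lines that are already comments, are left untouched.
    static List<String> commentOut(List<String> lines, String key) {
        List<String> out = new ArrayList<>(lines.size());
        for (String line : lines) {
            String t = line.trim();
            boolean assigns = false;
            if (t.startsWith(key)) {
                String rest = t.substring(key.length()).trim();
                assigns = rest.startsWith("=") || rest.startsWith(":");
            }
            out.add(assigns ? "#" + line : line);
        }
        return out;
    }

    // Usage: java CommentOutKey RWStore.properties com.bigdata.rwstore.RWStore.smallSlotType
    // Prints the edited file to stdout instead of overwriting the original.
    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Path.of(args[0]));
        commentOut(lines, args[1]).forEach(System.out::println);
    }
}
```

Printing the result instead of overwriting the file keeps the original configuration intact for the next run.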


>>> "Andreas Kahl" <ka...@bs...> 23.04.2015 14:54 >>>
Bryan, 

in the meantime, I could successfully load the file into an 18GB journal after disabling groupCommit (I simply commented out the line in RWStore.properties). 
I can try again with groupCommit enabled, but smallSlotOptimization disabled. 

Best Regards
Andreas

>>> Bryan Thompson <br...@sy...> 23.04.2015 13:24 >>>
Andreas,

I was not able to replicate your result.  Unfortunately I navigated away
from the browser page in which I had submitted the request, so it loaded
all the data but failed to commit.  However, the resulting file is only
16GB.

I will redo this run and verify that the journal after the commit has this
same size on the disk.

I was only assuming that this was related to group commit because of your
original message.  Perhaps I misinterpreted your message. This is simply
about 1.5.1 (with group commit) vs 1.4.0.

Perhaps the issue is related to the small slot optimization?  Maybe in
combination with group commit?

*> com.bigdata.rwstore.RWStore.smallSlotType=1024*

I could not replicate your properties exactly because you are using a
non-standard vocabulary class.  Therefore I simply deleted the default
namespace (in quads mode) and recreated it with the defaults in triples
mode.  The small slot optimization and other parameters were not enabled in
my run.

Perhaps you could try to replicate my experience and I will enable the
small slots optimization?

Thanks,
Bryan

On Thu, Apr 23, 2015 at 1:51 AM, Andreas Kahl <ka...@bs...> wrote:

> Bryan & Martyn,
>
> Thank you very much for investigating the issue. I assume  from the ticket
> that the error will vanish if I disable groupCommit. I will do so for the
> meantime.
>
> Although there is already extensive information in Bryan's ticket, please
> find attached my logs and DumpJournal outputs:
> - dumpJournal.html contains a dump from the 67GB journal after Blazegraph
> ran into "No space left on device"
> - dumpJournalWithTraceEnabled.html is the same dump for a running query
> when the journal was at about 14GB
> - queryStatus.html is just the status page showing my query
> - catalina.out.gz contains the trace outputs from starting Tomcat until I
> killed the curl running the SPARQL Update by Ctrl-C
> - loadGnd.log.gz is Blazegraph's output when loading the data
>
> Best Regards
> Andreas
>
>
>
> >>> Bryan Thompson <br...@sy...> 22.04.15 20.56 Uhr >>>
> See http://trac.bigdata.com/ticket/1206.  This is still in the
> investigation stage.
>
> Thanks,
> Bryan
>
> On Wed, Apr 22, 2015 at 5:37 AM, Andreas Kahl <ka...@bs...>
> wrote:
>
> > Hello everyone,
> >
> > I currently updated to the current Revision (f4c63e5) of Blazegraph from
> > Git and tried to load a dataset into the updated Webapp. With Bigdata
> > 1.4.0 this resulted in a journal of ~18GB. Now the process was cancelled
> > because the disk was full - the journal was beyond 50GB for the same file
> > with the same settings.
> > The only exception was that I activated GroupCommit.
> >
> > The dataset can be downloaded here:
> >
> http://datendienst.dnb.de/cgi-bin/mabit.pl?cmd=fetch&userID=opendata&pass=opendata&mabheft=GND.rdf.gz 
> > .
> > Please find the settings used to load the file below.
> >
> > Do I have a misconfiguration, or is there a bug eating all disk memory?
> >
> > Best regards
> > Andreas
> >
> > Namespace-Properties:
> > curl -H "Accept: text/plain" http://localhost:8080/bigdata/namespace/gnd/properties
> > #Wed Apr 22 11:35:31 CEST 2015
> > com.bigdata.namespace.kb.spo.com.bigdata.btree.BTree.branchingFactor=700
> > com.bigdata.relation.container=gnd
> > com.bigdata.rwstore.RWStore.smallSlotType=1024
> > com.bigdata.journal.AbstractJournal.bufferMode=DiskRW
> > com.bigdata.journal.AbstractJournal.file=/var/lib/bigdata/bigdata.jnl
> > com.bigdata.rdf.store.AbstractTripleStore.vocabularyClass=de.bsb_muenchen.bigdata.vocab.B3KatVocabulary
> > com.bigdata.journal.AbstractJournal.initialExtent=209715200
> > com.bigdata.rdf.store.AbstractTripleStore.textIndex=true
> > com.bigdata.btree.BTree.branchingFactor=700
> > com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.NoAxioms
> > com.bigdata.rdf.sail.isolatableIndices=false
> > com.bigdata.service.AbstractTransactionService.minReleaseAge=1
> > com.bigdata.rdf.sail.bufferCapacity=2000
> > com.bigdata.rdf.sail.truthMaintenance=false
> > com.bigdata.rdf.sail.namespace=gnd
> > com.bigdata.relation.class=com.bigdata.rdf.store.LocalTripleStore
> > com.bigdata.rdf.store.AbstractTripleStore.quads=false
> > com.bigdata.journal.AbstractJournal.writeCacheBufferCount=500
> > com.bigdata.search.FullTextIndex.fieldsEnabled=false
> > com.bigdata.relation.namespace=gndity=10000
> > com.bigdata.rdf.sail.BigdataSail.bufferCapacity=2000
> > com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false
> >
> >
> >
> ------------------------------------------------------------------------------
> > BPM Camp - Free Virtual Workshop May 6th at 10am PDT/1PM EDT
> > Develop your own process in accordance with the BPMN 2 standard
> > Learn Process modeling best practices with Bonita BPM through live
> > exercises
> > http://www.bonitasoft.com/be-part-of-it/events/bpm-camp-virtual- 
> > event?utm_
> > source=Sourceforge_BPM_Camp_5_6_15&utm_medium=email&utm_campaign=VA_SF
> > _______________________________________________
> > Bigdata-developers mailing list
> > Big...@li... 
> > https://lists.sourceforge.net/lists/listinfo/bigdata-developers 
> >
> >
>
>

[Bigdata-developers] Re: Re: Re: Current Revision of Blazegraph: Journal consumes extremely much disk space

From: Andreas K. <ka...@bs...> - 2015-04-23 12:54:47

Bryan, 

In the meantime, I successfully loaded the file into an 18GB journal after disabling groupCommit (I simply commented out the line in RWStore.properties).
I can try again with groupCommit enabled but smallSlotOptimization disabled.
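Commenting out the line is enough, assuming RWStore.properties is read as a standard Java properties file: in that format a line starting with `#` is a comment, so the key is simply absent and the built-in default applies. The minimal sketch below shows that behaviour with `java.util.Properties`. The key name `com.bigdata.journal.Journal.groupCommit` and its default of `false` are assumptions to verify against your Blazegraph release.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class GroupCommitCheck {

    // ASSUMPTION: key name of the group commit switch; verify it for your release.
    static final String GROUP_COMMIT_KEY = "com.bigdata.journal.Journal.groupCommit";

    /** True only if the key is present and "true"; a missing key counts as false (assumed default). */
    static boolean isGroupCommitEnabled(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText)); // lines starting with '#' are comments
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory reader
        }
        return Boolean.parseBoolean(props.getProperty(GROUP_COMMIT_KEY, "false"));
    }

    public static void main(String[] args) {
        String enabled = GROUP_COMMIT_KEY + "=true\n";
        String commentedOut = "#" + GROUP_COMMIT_KEY + "=true\n";
        System.out.println(isGroupCommitEnabled(enabled));      // true
        System.out.println(isGroupCommitEnabled(commentedOut)); // false
    }
}
```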

Best Regards
Andreas

Re: [Bigdata-developers] Current Revision of Blazegraph: Journal consumes extremely much disk space

From: Bryan T. <br...@sy...> - 2015-04-23 11:24:33

Andreas,

I was not able to replicate your result.  Unfortunately I navigated away
from the browser page in which I had submitted the request, so it loaded
all the data but failed to commit.  However, the resulting file is only
16GB.

I will redo this run and verify that the journal after the commit has this
same size on the disk.

I was only assuming that this was related to group commit because of your
original message. Perhaps I misinterpreted your message. Is this simply
about 1.5.1 (with group commit) vs 1.4.0?

Perhaps the issue is related to the small slot optimization?  Maybe in
combination with group commit?

*> com.bigdata.rwstore.RWStore.smallSlotType=1024*

I could not replicate your properties exactly because you are using a
non-standard vocabulary class.  Therefore I simply deleted the default
namespace (in quads mode) and recreated it with the defaults in triples
mode.  The small slot optimization and other parameters were not enabled in
my run.

Perhaps you could try to replicate my run, and I will enable the small
slot optimization?
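The comparison suggested here amounts to a small 2×2 experiment: group commit on/off crossed with the small slot optimization on/off, loading the same file into a fresh journal for each combination. The minimal sketch below derives the four property sets from one base configuration using plain `java.util.Properties`. The `smallSlotType` key and the value 1024 come from the properties quoted in this thread; the group commit key name is an assumption. Omitting `smallSlotType` is taken to leave the optimization at the store default, as in Bryan's run with defaults.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class LoadExperimentMatrix {

    // ASSUMPTION: key name of the group commit switch; verify for your release.
    static final String GROUP_COMMIT = "com.bigdata.journal.Journal.groupCommit";
    // Key and value as they appear in the namespace properties in this thread.
    static final String SMALL_SLOT = "com.bigdata.rwstore.RWStore.smallSlotType";
    static final String SMALL_SLOT_VALUE = "1024";

    /** One independent copy of base per (groupCommit, smallSlots) combination. */
    static List<Properties> variants(Properties base) {
        List<Properties> out = new ArrayList<>();
        for (boolean groupCommit : new boolean[] {false, true}) {
            for (boolean smallSlots : new boolean[] {false, true}) {
                Properties p = new Properties();
                p.putAll(base); // copy shared settings; base itself is not modified
                p.setProperty(GROUP_COMMIT, Boolean.toString(groupCommit));
                if (smallSlots) {
                    p.setProperty(SMALL_SLOT, SMALL_SLOT_VALUE);
                } else {
                    p.remove(SMALL_SLOT); // leave the optimization at the store default
                }
                out.add(p);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Properties base = new Properties();
        base.setProperty("com.bigdata.journal.AbstractJournal.bufferMode", "DiskRW");
        base.setProperty(SMALL_SLOT, SMALL_SLOT_VALUE);
        for (Properties p : variants(base)) {
            System.out.println("groupCommit=" + p.getProperty(GROUP_COMMIT)
                    + " smallSlotType=" + p.getProperty(SMALL_SLOT, "(default)"));
        }
    }
}
```

Each variant could then be written out with `Properties.store(...)`, together with a distinct journal file path, so that the four resulting journal sizes are directly comparable.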

Thanks,
Bryan

[Bigdata-developers] Re: Re: Current Revision of Blazegraph: Journal consumes extremely much disk space

From: Andreas K. <ka...@bs...> - 2015-04-23 05:51:40

Attachments: dumpJournalWithTraceEnabled.html queryStatus.html catalina.out.gz loadGnd.log.gz dumpJournal.html

Bryan & Martyn, 

Thank you very much for investigating the issue. I assume from the ticket that the error will go away if I disable groupCommit. I will do so in the meantime.

Although there is already extensive information in Bryan's ticket, please find attached my logs and DumpJournal outputs: 
- dumpJournal.html contains a dump from the 67GB journal after Blazegraph ran into "No space left on device"
- dumpJournalWithTraceEnabled.html is the same kind of dump, taken during a running query when the journal was at about 14GB
- queryStatus.html is just the status page showing my query
- catalina.out.gz contains the trace output from starting Tomcat until I killed the curl running the SPARQL Update with Ctrl-C
- loadGnd.log.gz is Blazegraph's output when loading the data

Best Regards
Andreas



>>> Bryan Thompson <br...@sy...> 22.04.15 20:56 >>>
See http://trac.bigdata.com/ticket/1206.  This is still in the
investigation stage.

Thanks,
Bryan

On Wed, Apr 22, 2015 at 5:37 AM, Andreas Kahl <ka...@bs...> wrote:

> Hello everyone,
>
> I recently updated to the current revision (f4c63e5) of Blazegraph from
> Git and tried to load a dataset into the updated webapp. With Bigdata 1.4.0
> this resulted in a journal of ~18GB. Now the process was cancelled because
> the disk was full: the journal was beyond 50GB for the same file with the
> same settings.
> The only difference was that I activated GroupCommit.
>
> The dataset can be downloaded here:
> http://datendienst.dnb.de/cgi-bin/mabit.pl?cmd=fetch&userID=opendata&pass=opendata&mabheft=GND.rdf.gz
>
> Please find the settings used to load the file below.
>
> Do I have a misconfiguration, or is there a bug eating all the disk space?
>
> Best regards
> Andreas
>
> Namespace-Properties:
> curl -H "Accept: text/plain" http://localhost:8080/bigdata/namespace/gnd/properties
> #Wed Apr 22 11:35:31 CEST 2015
> com.bigdata.namespace.kb.spo.com.bigdata.btree.BTree.branchingFactor=700
> com.bigdata.relation.container=gnd
> com.bigdata.rwstore.RWStore.smallSlotType=1024
> com.bigdata.journal.AbstractJournal.bufferMode=DiskRW
> com.bigdata.journal.AbstractJournal.file=/var/lib/bigdata/bigdata.jnl
>
> com.bigdata.rdf.store.AbstractTripleStore.vocabularyClass=de.bsb_muenchen.bigdata.vocab.B3KatVocabulary
> com.bigdata.journal.AbstractJournal.initialExtent=209715200
> com.bigdata.rdf.store.AbstractTripleStore.textIndex=true
> com.bigdata.btree.BTree.branchingFactor=700
>
> com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.NoAxioms
> com.bigdata.rdf.sail.isolatableIndices=false
> com.bigdata.service.AbstractTransactionService.minReleaseAge=1
> com.bigdata.rdf.sail.bufferCapacity=2000
> com.bigdata.rdf.sail.truthMaintenance=false
> com.bigdata.rdf.sail.namespace=gnd
> com.bigdata.relation.class=com.bigdata.rdf.store.LocalTripleStore
> com.bigdata.rdf.store.AbstractTripleStore.quads=false
> com.bigdata.journal.AbstractJournal.writeCacheBufferCount=500
> com.bigdata.search.FullTextIndex.fieldsEnabled=false
> com.bigdata.relation.namespace=gndity=10000
> com.bigdata.rdf.sail.BigdataSail.bufferCapacity=2000
> com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false
>
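The `curl` call above can also be scripted, for example to compare the namespace configuration between two builds. The minimal sketch below uses the JDK 11+ `java.net.http.HttpClient`. It assumes the same local webapp URL as in this message and a `text/plain` response in standard `.properties` format, so the `#Wed Apr 22 ...` timestamp line is read as a comment. The class and method names are illustrative only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Properties;

public class NamespaceProperties {

    /** Parses a text/plain properties body; lines starting with '#' are comments. */
    static Properties parse(String body) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(body));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory reader
        }
        return props;
    }

    /** Rough equivalent of: curl -H "Accept: text/plain" <uri> */
    static Properties fetch(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri)
                .header("Accept", "text/plain")
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("HTTP " + response.statusCode() + " from " + uri);
        }
        return parse(response.body());
    }

    public static void main(String[] args) throws Exception {
        Properties p = fetch(URI.create("http://localhost:8080/bigdata/namespace/gnd/properties"));
        System.out.println("smallSlotType = " + p.getProperty("com.bigdata.rwstore.RWStore.smallSlotType"));
        System.out.println("quads = " + p.getProperty("com.bigdata.rdf.store.AbstractTripleStore.quads"));
    }
}
```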

Re: [Bigdata-developers] Subquery wildcard projections not rewritten

From: Bryan T. <br...@sy...> - 2015-04-22 21:50:02

Lee,

Acknowledged. I am swamped with other things right now. I will try to get
back to this as soon as I can.

Thanks,
Bryan

On Wed, Apr 22, 2015 at 5:41 AM, Lee Kitching <le...@sw...> wrote:

> Hi Bryan,
>
> Yes, the AST in the test is supposed to represent the query
>
> select (count(*) as ?c) where {
>    select * where {
>         select * where { ?s ?p ?o }
>      } limit 21 offset 0
>  }
>
> Thanks
>
> On Tue, Apr 21, 2015 at 7:53 PM, Bryan Thompson <br...@sy...> wrote:
>
>> Lee,
>>
>> I can replicate the problem with your query (as given above) against the
>> SPARQL endpoint.
>>
>> Can you state the SPARQL that you are trying to model with this unit
>> test? It does not appear to be quite the same as your SPARQL query above. I
>> would like to make sure that it is being translated correctly into the
>> AST. I can then look at the expected AST, work backwards, and see if I
>> believe that the test shows the problem.
>>
>> Thanks,
>> Bryan
>>
>> On Tue, Apr 21, 2015 at 11:07 AM, Lee Kitching <le...@sw...> wrote:
>>
>>> Hi Bryan,
>>>
>>> We allow users to enter their own SPARQL queries and wrap them to do
>>> things like pagination, so unfortunately we cannot just rewrite our queries
>>> to do the expansion manually.
>>> I applied the fix detailed in the ticket and it fixes the problem for the
>>> query I provided; however, it fails to rewrite the following query:
>>>
>>> SELECT (COUNT(*) as ?c) {
>>>   SELECT * {
>>>     SELECT * WHERE { ?s ?p ?o }
>>>   } LIMIT 21 OFFSET 0
>>> }
>>>
>>> I attempted to debug the issue, and it seems to rewrite the *
>>> projection in the innermost subquery but not in the subquery with the
>>> LIMIT and OFFSET. I created a test based on the existing tests:
>>>
>>>     public void test_wildcardProjectionOptimizer03() {
>>>
>>>         /*
>>>          * Note: DO NOT share structures in this test!!!!
>>>          */
>>>         final IBindingSet[] bsets = new IBindingSet[] {};
>>>
>>>         // The source AST:
>>>         // SELECT (COUNT(*) AS ?c) { SELECT * { SELECT * WHERE { ?s ?p ?o } } LIMIT 21 OFFSET 0 }
>>>         final QueryRoot given = new QueryRoot(QueryType.SELECT);
>>>         {
>>>             // Innermost subquery: SELECT * WHERE { ?s ?p ?o }
>>>             final SubqueryRoot selectQuery = new SubqueryRoot(QueryType.SELECT);
>>>             {
>>>                 final JoinGroupNode whereClause1 = new JoinGroupNode();
>>>                 final StatementPatternNode spoPattern = new StatementPatternNode(
>>>                         new VarNode("s"), new VarNode("p"), new VarNode("o"),
>>>                         null, Scope.DEFAULT_CONTEXTS);
>>>                 whereClause1.addChild(spoPattern);
>>>
>>>                 final ProjectionNode p = new ProjectionNode();
>>>                 p.addProjectionVar(new VarNode("*"));
>>>                 selectQuery.setProjection(p);
>>>                 selectQuery.setWhereClause(whereClause1);
>>>             }
>>>
>>>             // Middle subquery: SELECT * { <innermost> } LIMIT 21 OFFSET 0
>>>             final SubqueryRoot sliceQuery = new SubqueryRoot(QueryType.SELECT);
>>>             {
>>>                 final ProjectionNode p = new ProjectionNode();
>>>                 p.addProjectionVar(new VarNode("*"));
>>>                 sliceQuery.setProjection(p);
>>>
>>>                 final JoinGroupNode whereClause = new JoinGroupNode();
>>>                 whereClause.addChild(selectQuery);
>>>                 // Without this call the group (and with it the innermost
>>>                 // subquery) is never attached to the AST.
>>>                 sliceQuery.setWhereClause(whereClause);
>>>
>>>                 sliceQuery.setSlice(new SliceNode(0, 21));
>>>             }
>>>
>>>             // Top level: (COUNT(*) AS ?c) over the middle subquery.
>>>             final FunctionNode countNode = new FunctionNode(
>>>                     FunctionRegistry.COUNT,
>>>                     Collections.EMPTY_MAP,
>>>                     new VarNode("*"));
>>>
>>>             final ProjectionNode countProjection = new ProjectionNode();
>>>             countProjection.addProjectionExpression(
>>>                     new AssignmentNode(new VarNode("c"), countNode));
>>>
>>>             final JoinGroupNode countWhere = new JoinGroupNode();
>>>             countWhere.addChild(sliceQuery);
>>>
>>>             given.setProjection(countProjection);
>>>             given.setWhereClause(countWhere);
>>>         }
>>>
>>>         // The expected AST: both wildcard projections expanded to ?s ?p ?o.
>>>         final QueryRoot expected = new QueryRoot(QueryType.SELECT);
>>>         {
>>>             final SubqueryRoot selectQuery = new SubqueryRoot(QueryType.SELECT);
>>>             {
>>>                 final JoinGroupNode whereClause1 = new JoinGroupNode();
>>>                 final StatementPatternNode spoPattern = new StatementPatternNode(
>>>                         new VarNode("s"), new VarNode("p"), new VarNode("o"),
>>>                         null, Scope.DEFAULT_CONTEXTS);
>>>                 whereClause1.addChild(spoPattern);
>>>
>>>                 final ProjectionNode p = new ProjectionNode();
>>>                 p.addProjectionVar(new VarNode("s"));
>>>                 p.addProjectionVar(new VarNode("p"));
>>>                 p.addProjectionVar(new VarNode("o"));
>>>                 selectQuery.setProjection(p);
>>>                 selectQuery.setWhereClause(whereClause1);
>>>             }
>>>
>>>             final SubqueryRoot sliceQuery = new SubqueryRoot(QueryType.SELECT);
>>>             {
>>>                 final ProjectionNode p = new ProjectionNode();
>>>                 p.addProjectionVar(new VarNode("s"));
>>>                 p.addProjectionVar(new VarNode("p"));
>>>                 p.addProjectionVar(new VarNode("o"));
>>>                 sliceQuery.setProjection(p);
>>>
>>>                 final JoinGroupNode whereClause = new JoinGroupNode();
>>>                 whereClause.addChild(selectQuery);
>>>                 sliceQuery.setWhereClause(whereClause);
>>>
>>>                 sliceQuery.setSlice(new SliceNode(0, 21));
>>>             }
>>>
>>>             final FunctionNode countNode = new FunctionNode(
>>>                     FunctionRegistry.COUNT,
>>>                     Collections.EMPTY_MAP,
>>>                     new VarNode("*"));
>>>
>>>             final ProjectionNode countProjection = new ProjectionNode();
>>>             countProjection.addProjectionExpression(
>>>                     new AssignmentNode(new VarNode("c"), countNode));
>>>
>>>             final JoinGroupNode countWhere = new JoinGroupNode();
>>>             countWhere.addChild(sliceQuery);
>>>
>>>             expected.setProjection(countProjection);
>>>             expected.setWhereClause(countWhere);
>>>         }
>>>
>>>         final IASTOptimizer rewriter = new ASTWildcardProjectionOptimizer();
>>>
>>>         final IQueryNode actual = rewriter.optimize(null/* AST2BOpContext */,
>>>                 given/* queryNode */, bsets);
>>>
>>>         assertSameAST(expected, actual);
>>>
>>>     }
>>>
>>> However, I am having some problems running the tests locally, so I don't
>>> know if it accurately models the situation.
>>>
>>> Thanks
>>>
>>>
>>>
>>> On Mon, Apr 20, 2015 at 9:05 PM, Bryan Thompson <br...@sy...>
>>> wrote:
>>>
>>>> Lee,
>>>>
>>>> I've updated the ticket with the code changes and the test changes.
>>>> Please try this out and let me know if you have any problems.
>>>>
>>>> Thanks,
>>>> Bryan
>>>>
>>>> On Mon, Apr 20, 2015 at 1:20 PM, Lee Kitching <le...@sw...> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We are currently evaluating Blazegraph as our RDF database and have
>>>>> run into the issue described at http://trac.bigdata.com/ticket/757.
>>>>> The query below causes the AssertionError to be thrown:
>>>>>
>>>>> SELECT (COUNT(*) as ?c) {
>>>>>   SELECT ?uri ?graph where {
>>>>>           {
>>>>>             SELECT * WHERE {
>>>>>               GRAPH ?graph {
>>>>>                 ?uri a <http://object> .
>>>>>                 ?uri <http://purl.org/dc/terms/title> ?title .
>>>>>               }
>>>>>               MINUS {
>>>>>                 ?uri a <http://other>
>>>>>               }
>>>>>             }
>>>>>             ORDER BY ?title
>>>>>           }
>>>>>         }
>>>>> }
>>>>>
>>>>> Some debugging shows that the error is caused by the
>>>>> ASTWildcardProjectionOptimizer failing to recurse into the subqueries to
>>>>> rewrite the * projection. That recursion is delegated to the
>>>>> BOpUtility.postOrderIterator(BOp) method, but this method uses the
>>>>> argIterator to find child operators and therefore only visits the
>>>>> children of nodes with an arity > 0.
>>>>>
>>>>> The root query node for the above query has an empty 'args' collection,
>>>>> and all the associated components of the top-level query are stored in
>>>>> the annotations map. For query nodes, it looks like the iterator should
>>>>> search through the annotations rather than the args.
>>>>>
>>>>> As there are many implementations of the BOp interface, changing the
>>>>> postOrderIterator2(BOp) method seems unlikely to be the correct fix.
>>>>> Either the AST query nodes should override the arity() function to
>>>>> return the size of the annotations map, or the
>>>>> ASTWildcardProjectionOptimizer should use its own iterator over the nodes
>>>>> of the query. The latter would be the less invasive change, but I am not
>>>>> familiar enough with the codebase to know the correct fix.
>>>>>
>>>>> Any help in resolving the issue would be appreciated.
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------------------
>>>>> BPM Camp - Free Virtual Workshop May 6th at 10am PDT/1PM EDT
>>>>> Develop your own process in accordance with the BPMN 2 standard
>>>>> Learn Process modeling best practices with Bonita BPM through live
>>>>> exercises
>>>>> http://www.bonitasoft.com/be-part-of-it/events/bpm-camp-virtual-
>>>>> event?utm_
>>>>> source=Sourceforge_BPM_Camp_5_6_15&utm_medium=email&utm_campaign=VA_SF
>>>>> _______________________________________________
>>>>> Bigdata-developers mailing list
>>>>> Big...@li...
>>>>> https://lists.sourceforge.net/lists/listinfo/bigdata-developers
>>>>>
>>>>>
>>>>
>>>
>>
>
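Lee's diagnosis above (the traversal follows a node's args, while the query root keeps its components in annotations) can be illustrated with a minimal Java sketch. The `Node` class, its fields and both traversal methods are hypothetical stand-ins written for this illustration, not the bigdata `BOp`/`BOpUtility` API; the tree only mimics the shape of the failing query, with the `SELECT *` subquery reachable solely through annotations.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** HYPOTHETICAL model of an AST node; not the bigdata BOp API. */
public class AnnotationAwareTraversal {

    static final class Node {
        final String name;
        final List<Node> args = new ArrayList<>();                 // positional children
        final Map<String, Object> annotations = new LinkedHashMap<>(); // named children/values

        Node(String name) { this.name = name; }

        int arity() { return args.size(); }
    }

    /** Post-order walk over args only: children held in annotations are never reached. */
    static void postOrderArgsOnly(Node n, List<String> out) {
        for (int i = 0; i < n.arity(); i++) {
            postOrderArgsOnly(n.args.get(i), out);
        }
        out.add(n.name);
    }

    /** Post-order walk that also descends into annotation values that are nodes. */
    static void postOrderWithAnnotations(Node n, List<String> out) {
        for (Node child : n.args) {
            postOrderWithAnnotations(child, out);
        }
        for (Object value : n.annotations.values()) {
            if (value instanceof Node) {
                postOrderWithAnnotations((Node) value, out);
            }
        }
        out.add(n.name);
    }

    /** Shape of the failing query: root (arity 0) -> where -> subquery -> where -> SELECT *. */
    static Node buildExample() {
        Node root = new Node("root");
        Node where = new Node("where");
        Node sub = new Node("subquery");
        Node innerWhere = new Node("innerWhere");
        Node selectStar = new Node("selectStar");
        root.annotations.put("whereClause", where);
        where.args.add(sub);
        sub.annotations.put("whereClause", innerWhere);
        innerWhere.args.add(selectStar);
        return root;
    }

    public static void main(String[] args) {
        List<String> argsOnly = new ArrayList<>();
        List<String> withAnnotations = new ArrayList<>();
        postOrderArgsOnly(buildExample(), argsOnly);
        postOrderWithAnnotations(buildExample(), withAnnotations);
        System.out.println(argsOnly);        // [root]
        System.out.println(withAnnotations); // [selectStar, innerWhere, subquery, where, root]
    }
}
```

With the args-only walk the traversal stops at the root, so a rewrite that depends on it never sees the `SELECT *` subquery. Whether the real fix belongs in the AST nodes' `arity()` or in an optimizer-specific iterator is the open question in the thread; this sketch does not settle it.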

Re: [Bigdata-developers] Current Revision of Blazegraph: Journal consumes extremely much disk space

From: Bryan T. <br...@sy...> - 2015-04-22 18:56:18

See http://trac.bigdata.com/ticket/1206.  This is still in the
investigation stage.

Thanks,
Bryan

On Wed, Apr 22, 2015 at 5:37 AM, Andreas Kahl <ka...@bs...> wrote:

> Hello everyone,
>
> I have just updated to the current revision (f4c63e5) of Blazegraph from
> Git and tried to load a dataset into the updated webapp. With Bigdata 1.4.0
> this resulted in a journal of ~18GB. This time the process was cancelled
> because the disk was full: the journal had grown beyond 50GB for the same
> file with the same settings.
> The only difference was that I enabled group commit.
>
> The dataset can be downloaded here:
> http://datendienst.dnb.de/cgi-bin/mabit.pl?cmd=fetch&userID=opendata&pass=opendata&mabheft=GND.rdf.gz
> Please find the settings used to load the file below.
>
> Do I have a misconfiguration, or is there a bug eating up all the disk space?
>
> Best regards
> Andreas
>
> Namespace-Properties:
> curl -H "Accept: text/plain"
> http://localhost:8080/bigdata/namespace/gnd/properties
> #Wed Apr 22 11:35:31 CEST 2015
> com.bigdata.namespace.kb.spo.com.bigdata.btree.BTree.branchingFactor=700
> com.bigdata.relation.container=gnd
> com.bigdata.rwstore.RWStore.smallSlotType=1024
> com.bigdata.journal.AbstractJournal.bufferMode=DiskRW
> com.bigdata.journal.AbstractJournal.file=/var/lib/bigdata/bigdata.jnl
> com.bigdata.rdf.store.AbstractTripleStore.vocabularyClass=de.bsb_muenchen.bigdata.vocab.B3KatVocabulary
> com.bigdata.journal.AbstractJournal.initialExtent=209715200
> com.bigdata.rdf.store.AbstractTripleStore.textIndex=true
> com.bigdata.btree.BTree.branchingFactor=700
> com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.NoAxioms
> com.bigdata.rdf.sail.isolatableIndices=false
> com.bigdata.service.AbstractTransactionService.minReleaseAge=1
> com.bigdata.rdf.sail.bufferCapacity=2000
> com.bigdata.rdf.sail.truthMaintenance=false
> com.bigdata.rdf.sail.namespace=gnd
> com.bigdata.relation.class=com.bigdata.rdf.store.LocalTripleStore
> com.bigdata.rdf.store.AbstractTripleStore.quads=false
> com.bigdata.journal.AbstractJournal.writeCacheBufferCount=500
> com.bigdata.search.FullTextIndex.fieldsEnabled=false
> com.bigdata.relation.namespace=gnd
> com.bigdata.journal.Journal.groupCommit=true
> com.bigdata.btree.writeRetentionQueue.capacity=10000
> com.bigdata.rdf.sail.BigdataSail.bufferCapacity=2000
> com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false
>
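Andreas reports that the only setting changed since the 1.4.0 load was group commit. One way to check such a claim is to diff the two property listings, for example the `text/plain` output of the `.../namespace/gnd/properties` request shown above. The following is a minimal sketch using `java.util.Properties`; `BEFORE` is a hypothetical, abbreviated reconstruction of the 1.4.0 settings that assumes they matched apart from `groupCommit`, and `AFTER` is a short excerpt of the listing above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

/** Sketch: report which keys differ between two property listings. */
public class PropertiesDiff {

    // HYPOTHETICAL reconstruction of the 1.4.0 settings (assumed identical except group commit).
    static final String BEFORE = String.join("\n",
            "com.bigdata.journal.AbstractJournal.bufferMode=DiskRW",
            "com.bigdata.journal.AbstractJournal.initialExtent=209715200",
            "com.bigdata.rdf.sail.truthMaintenance=false");

    // Excerpt of the listing posted in the thread.
    static final String AFTER = String.join("\n",
            "#Wed Apr 22 11:35:31 CEST 2015",
            "com.bigdata.journal.AbstractJournal.bufferMode=DiskRW",
            "com.bigdata.journal.AbstractJournal.initialExtent=209715200",
            "com.bigdata.rdf.sail.truthMaintenance=false",
            "com.bigdata.journal.Journal.groupCommit=true");

    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text)); // lines starting with '#' are treated as comments
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory reader
        }
        return p;
    }

    /** Keys whose values differ, or that are present in only one of the two sets. */
    static Set<String> changedKeys(Properties a, Properties b) {
        Set<String> keys = new TreeSet<>(a.stringPropertyNames());
        keys.addAll(b.stringPropertyNames());
        Set<String> changed = new TreeSet<>();
        for (String k : keys) {
            String va = a.getProperty(k);
            String vb = b.getProperty(k);
            if (va == null ? vb != null : !va.equals(vb)) {
                changed.add(k);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Properties before = parse(BEFORE);
        Properties after = parse(AFTER);
        System.out.println(changedKeys(before, after)); // [com.bigdata.journal.Journal.groupCommit]

        // initialExtent is given in bytes: 209715200 / (1024 * 1024) = 200 MiB.
        long extent = Long.parseLong(after.getProperty(
                "com.bigdata.journal.AbstractJournal.initialExtent"));
        System.out.println(extent / (1024 * 1024) + " MiB"); // 200 MiB
    }
}
```

Run against the two full listings, any key printed besides `groupCommit` would point to a second difference between the loads.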

Re: [Bigdata-developers] Current Revision of Blazegraph: Journal consumes extremely much disk space

From: Martyn C. <ma...@sy...> - 2015-04-22 14:55:24

Well, TRACE logging on FixedAllocator will let you know when new Allocators
are created, and also whenever addresses are recycled.

In a well-behaved system, the recycling messages will flood the log; if
there is little or no recycling, we will instead see a higher proportion of
new-Allocator messages.

It may be worth a short run (say 10 minutes, or until the journal has
grown to 1GB) to see what is written with this log4j property:

log4j.logger.com.bigdata.rwstore.FixedAllocator=TRACE

- Martyn
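Martyn's heuristic above (in a healthy run, recycling messages dominate the FixedAllocator TRACE output; with little or no recycling, new-allocator messages make up a larger share) can be applied to a captured log with a small tally. This is a sketch only: the thread does not show the actual FixedAllocator message text, so the `NEW_ALLOCATOR` and `RECYCLE` markers are placeholders to be replaced with patterns taken from a real log, and the input is assumed to be already restricted to that logger's lines.

```java
import java.util.List;
import java.util.function.Predicate;

/**
 * Sketch: classify FixedAllocator TRACE lines as "new allocator" or
 * "address recycled" and report the share of new-allocator lines.
 * ASSUMPTION: the real message text is unknown; callers supply the predicates.
 */
public class AllocatorLogTally {

    static final class Tally {
        final long newAllocators;
        final long recycled;

        Tally(long newAllocators, long recycled) {
            this.newAllocators = newAllocators;
            this.recycled = recycled;
        }

        /** Fraction of classified lines reporting a new allocator (0 if nothing matched). */
        double newAllocatorShare() {
            long total = newAllocators + recycled;
            return total == 0 ? 0.0 : (double) newAllocators / total;
        }
    }

    /** Counts matching lines; lines matching neither predicate are ignored. */
    static Tally tally(List<String> lines,
                       Predicate<String> isNewAllocator,
                       Predicate<String> isRecycle) {
        long created = 0;
        long recycled = 0;
        for (String line : lines) {
            if (isNewAllocator.test(line)) {
                created++;
            } else if (isRecycle.test(line)) {
                recycled++;
            }
        }
        return new Tally(created, recycled);
    }

    public static void main(String[] args) {
        // Synthetic lines with PLACEHOLDER markers, standing in for a captured log.
        List<String> sample = List.of(
                "TRACE FixedAllocator NEW_ALLOCATOR",
                "TRACE FixedAllocator RECYCLE",
                "TRACE FixedAllocator RECYCLE",
                "TRACE FixedAllocator RECYCLE");
        Tally t = tally(sample,
                line -> line.contains("NEW_ALLOCATOR"),
                line -> line.contains("RECYCLE"));
        System.out.printf("new=%d recycled=%d newShare=%.2f%n",
                t.newAllocators, t.recycled, t.newAllocatorShare());
    }
}
```

For a real run, the log could be read with `java.nio.file.Files.readAllLines(...)` and the placeholder predicates replaced. A share that stays high throughout the load would be consistent with the "little or no recycling" case; the sketch deliberately defines no threshold.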

On 22/04/2015 13:50, Bryan Thompson wrote:
> I would wait on this.  There will not (should not) be any intermediate
> commits, so what we need to do is log the allocators (and the shadow
> allocators used during group commit for unisolated index operations).
>
> @Martyn: Can you suggest some logging that might capture what is happening
> with the allocators during the load before Andreas retries this operation?
>
> Thanks,
> Bryan
>
> On Wed, Apr 22, 2015 at 8:32 AM, Andreas Kahl <ka...@bs...> wrote:
>
>> There were no other concurrent queries, just the one SPARQL LOAD.
>> I have deleted the file in the meantime (after a bit of cleaning I had
>> ~60GB, so the disk was full at that size).
>> If I can run DumpJournal without a commit, I can easily re-run the Load up
>> to the java.io.IOException thrown by the full disk.
>>
>> I have now restarted the LOAD. I will wait until it fails (about 1h)
>> and then try to run DumpJournal on it.
>>
>> Andreas
>>
>>>>> Bryan Thompson <br...@sy...> 22.04.15 14:03 >>>
>> Were you running any other operations concurrently against the database?
>> Other updates or queries?
>>
>> In general, it is helpful to get the metadata about the allocators and root
>> blocks.  However, from what you have written, it sounds like you terminated
>> the process when the disk space filled up.  In this case there would only
>> be the original root blocks and no commit points recorded on the journal.
>>
>> If you still have the file, can you run DumpJournal on it and send the
>> output? The -pages option is not required in this case since we are only
>> interested in the root blocks and allocators.
>>
>> Thanks,
>> Bryan
>>
>> On Wed, Apr 22, 2015 at 7:58 AM, Andreas Kahl <ka...@bs...>
>> wrote:
>>
>>> That was a newly created journal. I simply stopped Tomcat, deleted
>>> bigdata.jnl and restarted it.
>>>
>>> Andreas
>>>
>>>>>> Bryan Thompson <br...@sy...> 22.04.15 13:46 >>>
>>> Was the data loaded into a new and empty journal or into a pre-existing
>>> journal?  If the latter, what size was the journal and what data were in
>>> it?
>>>
>>> Thanks,
>>> Bryan
>>>
>>> On Wed, Apr 22, 2015 at 6:54 AM, Andreas Kahl <ka...@bs...>
>>> wrote:
>>>
>>>> Bryan,
>>>>
>>>> yes, I used this command:
>>>> curl -d"update=LOAD <file:///srv/feed-dateien/DNBLOD/GND.rdf.gz>;" -d"namespace=gnd" -d"monitor=true" http://localhost:8080/bigdata/sparql
>>>>
>>>> Best Regards
>>>> Andreas
>>>>
>>>>>>> Bryan Thompson <br...@sy...> 22.04.15 12:51 >>>
>>>> Andreas,
>>>>
>>>> What command did you use to load the data set?  For example, SPARQL
>>>> UPDATE "LOAD" or something else?
>>>>
>>>> Thanks,
>>>>
>>>>
>>>
>>
>
>
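Andreas's `curl` command quoted above posts a SPARQL UPDATE `LOAD` to the `/bigdata/sparql` endpoint as a form. A minimal Java 11+ sketch of the same request with `java.net.http.HttpClient` follows; the form fields (`update`, `namespace`, `monitor`), the file URI and the endpoint URL are copied from his command, while the helper names are illustrative. Note that `curl -d` sends its data verbatim without URL-encoding, whereas this sketch encodes each field with `URLEncoder`.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Sketch: issue the same SPARQL UPDATE LOAD as the curl command, from Java 11+. */
public class SparqlLoad {

    /** Form fields copied from the curl command (insertion order preserved). */
    static Map<String, String> loadFields(String fileUri, String namespace) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("update", "LOAD <" + fileUri + ">;");
        fields.put("namespace", namespace);
        fields.put("monitor", "true");
        return fields;
    }

    /** Encodes the fields as application/x-www-form-urlencoded. */
    static String formBody(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String body = formBody(loadFields("file:///srv/feed-dateien/DNBLOD/GND.rdf.gz", "gnd"));
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:8080/bigdata/sparql"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        // Requires a running server at the URL above; blocks until the whole response is read.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

Keeping the form encoding in a separate helper makes the request body easy to inspect (or test) without a running server.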

27 messages have been excluded from this view by a project administrator.
