From: Joakim S. <joa...@bl...> - 2015-11-04 17:28:06
I’m curious, what do you refer to by "inlining and vocabulary optimization"? TBOX stuff?

> On Nov 4, 2015, at 5:27 AM, Brad Bebee <be...@sy...> wrote:
>
> Alex,
>
> Great -- as Bryan mentioned, we've been doing a lot of work on load performance. The right inlining and vocabulary optimizations can have a significant impact on load performance (40-50% increase in speed depending on your data). We'll have some blog posts related to the release.
>
> Thanks, --Brad
>
> On Wed, Nov 4, 2015 at 8:25 AM, Alex Muir <ale...@gm...> wrote:
> Okay thanks Brad and Bryan, will use the REST API and look into optimizations
>
> Regards
> Alex
> www.tilogeo.com
>
> On Wed, Nov 4, 2015 at 1:19 PM, Brad Bebee <be...@sy...> wrote:
> Alex,
>
> Adding to Bryan's comments, if you have a Blazegraph instance running remotely and you want to insert data into it, you can use the REST API to post URIs of the files to load. Assuming you can make those URIs resolvable on the remote server, it will resolve the URIs and load them.
>
> https://wiki.blazegraph.com/wiki/index.php/REST_API#INSERT_RDF_.28POST_with_URLs.29
>
> Thanks, --Brad
>
> On Wed, Nov 4, 2015 at 8:15 AM, Bryan Thompson <br...@sy...> wrote:
> Alex,
>
> If you are referring to the DataLoader, it is an embedded utility class. It is not designed to operate with a remote database instance.
>
> You can mimic many of the advantages of the DataLoader by increasing BigdataSail.Options.BUFFER_CAPACITY to 100,000.
>
> You should also follow the guidelines on the wiki for performance optimization if you are interested in bulk data load. See the section entitled Optimizations and benchmarking <https://wiki.blazegraph.com/wiki/index.php/NanoSparqlServer#p-Optimizations_and_benchmarking>. E.g., https://wiki.blazegraph.com/wiki/index.php/IOOptimization.
>
> Some of the more important optimizations for write throughput are:
>
> - Write cache service native buffer pool size.
> - Use of URI inlining techniques if you have URIs that have numeric or UUID patterns embedded into them.
> - Fast disk.
>
> We have a number of improvements in the development branch that improve load speed, including code to overlap the parser with the index writers. Those will be in the 2.0 release.
>
> Thanks,
> Bryan
>
> ----
> Bryan Thompson
> Chief Scientist & Founder
> SYSTAP, LLC
> 4501 Tower Road
> Greensboro, NC 27410
> br...@sy...
> http://blazegraph.com
> http://blog.blazegraph.com
>
> Blazegraph™ <http://www.blazegraph.com/> is our ultra high-performance graph database that supports both RDF/SPARQL and Tinkerpop/Blueprints APIs. Blazegraph is now available with GPU acceleration using our disruptive technology to accelerate data-parallel graph analytics and graph query.
>
> CONFIDENTIALITY NOTICE: This email and its contents and attachments are for the sole use of the intended recipient(s) and are confidential or proprietary to SYSTAP. Any unauthorized review, use, disclosure, dissemination or copying of this email or its contents or attachments is prohibited. If you have received this communication in error, please notify the sender by reply email and permanently delete all copies of the email and its contents and attachments.
>
> On Wed, Nov 4, 2015 at 7:48 AM, Alex Muir <ale...@gm...> wrote:
> Well I downloaded the blazegraph git examples and extracted the unique bigdata properties. I don't think any of them are related to specifying a remote server.
>
> Perhaps there is another way to specify to upload to a remote server with the bulk loader?
>
> com.bigdata.btree.BTree.branchingFactor
> com.bigdata.btree.keys.KeyBuilder.collator
> com.bigdata.btree.writeRetentionQueue.capacity
> com.bigdata.journal.AbstractJournal.bufferMode
> com.bigdata.journal.AbstractJournal.file
> com.bigdata.journal.AbstractJournal.initialExtent
> com.bigdata.journal.AbstractJournal.maximumExtent
> com.bigdata.journal.AbstractJournal.writeCacheBufferCount
> com.bigdata.namespace.BSBM_284826.lex.BLOBS.com.bigdata.btree.BTree.branchingFactor
> com.bigdata.namespace.BSBM_284826.lex.ID2TERM.com.bigdata.btree.BTree.branchingFactor
> com.bigdata.namespace.BSBM_284826.lex.TERM2ID.com.bigdata.btree.BTree.branchingFactor
> com.bigdata.namespace.BSBM_284826.spo.OSP.com.bigdata.btree.BTree.branchingFactor
> com.bigdata.namespace.BSBM_284826.spo.POS.com.bigdata.btree.BTree.branchingFactor
> com.bigdata.namespace.BSBM_284826.spo.SPO.com.bigdata.btree.BTree.branchingFactor
> com.bigdata.namespace.BSBM_566496.lex.BLOBS.com.bigdata.btree.BTree.branchingFactor
> com.bigdata.namespace.BSBM_566496.lex.ID2TERM.com.bigdata.btree.BTree.branchingFactor
> com.bigdata.namespace.BSBM_566496.lex.TERM2ID.com.bigdata.btree.BTree.branchingFactor
> com.bigdata.namespace.BSBM_566496.spo.OSP.com.bigdata.btree.BTree.branchingFactor
> com.bigdata.namespace.BSBM_566496.spo.POS.com.bigdata.btree.BTree.branchingFactor
> com.bigdata.namespace.BSBM_566496.spo.SPO.com.bigdata.btree.BTree.branchingFactor
> com.bigdata.namespace.chem2bio2rdf.lex.BLOBS.com.bigdata.btree.BTree.branchingFactor
> com.bigdata.namespace.chem2bio2rdf.lex.ID2TERM.com.bigdata.btree.BTree.branchingFactor
> com.bigdata.namespace.chem2bio2rdf.lex.TERM2ID.com.bigdata.btree.BTree.branchingFactor
> com.bigdata.namespace.chem2bio2rdf.spo.CSPO.com.bigdata.btree.BTree.branchingFactor
> com.bigdata.namespace.chem2bio2rdf.spo.OCSP.com.bigdata.btree.BTree.branchingFactor
> com.bigdata.namespace.chem2bio2rdf.spo.PCSO.com.bigdata.btree.BTree.branchingFactor
> com.bigdata.namespace.chem2bio2rdf.spo.POCS.com.bigdata.btree.BTree.branchingFactor
> com.bigdata.namespace.chem2bio2rdf.spo.SOPC.com.bigdata.btree.BTree.branchingFactor
> com.bigdata.namespace.chem2bio2rdf.spo.SPOC.com.bigdata.btree.BTree.branchingFactor
> com.bigdata.namespace.dbpedia.lex.BLOBS.com.bigdata.btree.BTree.branchingFactor
> com.bigdata.namespace.dbpedia.lex.ID2TERM.com.bigdata.btree.BTree.branchingFactor
> com.bigdata.namespace.dbpedia.lex.TERM2ID.com.bigdata.btree.BTree.branchingFactor
> com.bigdata.namespace.dbpedia.spo.OSP.com.bigdata.btree.BTree.branchingFactor
> com.bigdata.namespace.dbpedia.spo.POS.com.bigdata.btree.BTree.branchingFactor
> com.bigdata.namespace.dbpedia.spo.SPO.com.bigdata.btree.BTree.branchingFactor
> com.bigdata.namespace.kb.lex.BLOBS.com.bigdata.btree.BTree.branchingFactor
> com.bigdata.namespace.kb.lex.com.bigdata.btree.BTree.branchingFactor
> com.bigdata.namespace.kb.lex.ID2TERM.com.bigdata.btree.BTree.branchingFactor
> com.bigdata.namespace.kb.lex.TERM2ID.com.bigdata.btree.BTree.branchingFactor
> com.bigdata.namespace.kb.spo.com.bigdata.btree.BTree.branchingFactor
> com.bigdata.rdf.rio.RDFParserOptions.stopAtFirstError
> com.bigdata.rdf.sail.BigdataSail.bufferCapacity
> com.bigdata.rdf.sail.BigdataSail.truthMaintenance
> com.bigdata.rdf.sail.bufferCapacity
> com.bigdata.rdf.sail.newEvalStrategy
> com.bigdata.rdf.sail.queryTimeExpander
> com.bigdata.rdf.sail.truthMaintenance
> com.bigdata.rdf.store.AbstractTripleStore.axiomsClass
> com.bigdata.rdf.store.AbstractTripleStore.bloomFilter
> com.bigdata.rdf.store.AbstractTripleStore.extensionFactoryClass
> com.bigdata.rdf.store.AbstractTripleStore.justify
> com.bigdata.rdf.store.AbstractTripleStore.quads
> com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers
> com.bigdata.rdf.store.AbstractTripleStore.textIndex
> com.bigdata.rdf.store.AbstractTripleStore.vocabularyClass
> com.bigdata.resource.OverflowManager.overflowEnabled
> com.bigdata.service.AbstractTransactionService.minReleaseAge
> com.bigdata.service.EmbeddedFederation.dataDir
> com.bigdata.service.IBigdataClient.collectPlatformStatistics
>
> Regards
> Alex
> www.tilogeo.com
>
> On Wed, Nov 4, 2015 at 11:12 AM, Alex Muir <ale...@gm...> wrote:
> Hi,
>
> I'm interested to bulk upload onto a remote server
>
> https://wiki.blazegraph.com/wiki/index.php/Bulk_Data_Load
>
> I assume that I can specify a remote server in the properties file however I'm thus far unable to find more information on what goes in a property file from the website.
>
> Is there a page defining all the properties?
>
> Regards
> Alex
> www.tilogeo.com
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Bigdata-developers mailing list
> Big...@li...
> https://lists.sourceforge.net/lists/listinfo/bigdata-developers
>
> --
> _______________
> Brad Bebee
> CEO, Managing Partner
> SYSTAP, LLC
> e: be...@sy...
> m: 202.642.7961
> f: 571.367.5000
> w: www.blazegraph.com
>
> Blazegraph™ <http://www.blazegraph.com/> is our ultra high-performance graph database that supports both RDF/SPARQL and Tinkerpop/Blueprints APIs. Mapgraph™ <http://www.systap.com/mapgraph> is our disruptive new technology to use GPUs to accelerate data-parallel graph analytics.
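
For anyone finding this thread later: Brad's suggestion of posting file URIs to the REST API could look roughly like the sketch below. This is not taken from the Blazegraph docs verbatim. It assumes the "INSERT RDF (POST with URLs)" call linked in the thread accepts one or more form-encoded `uri` parameters, and the endpoint URL and file URIs are made-up placeholders. The helper names `build_load_request` and `load_remote` are hypothetical, so check the wiki page for your version before relying on any of it.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_load_request(endpoint, file_uris):
    """Build a POST asking the remote server to fetch and load each URI.

    Assumption: the server reads repeated form-encoded `uri` parameters,
    as described in the REST API wiki section linked in the thread.
    """
    body = urlencode([("uri", uri) for uri in file_uris]).encode("ascii")
    return Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def load_remote(endpoint, file_uris):
    """Send the request and return the HTTP status and response text."""
    with urlopen(build_load_request(endpoint, file_uris)) as response:
        return response.status, response.read().decode("utf-8")


# Example call (placeholder endpoint and URIs, not run here):
# load_remote("http://localhost:9999/bigdata/sparql",
#             ["http://example.org/data/part1.nt"])
```

The files have to be resolvable from the server's side, as Brad notes, since the server does the fetching and the client only sends the list of URIs.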