Hi,
I am inserting edges that connect two vertices. Edge inserts are significantly slower than vertex inserts: I was able to insert ~31 million vertices in about 40 hours, a rate of ~800k vertices per hour. Now that I'm inserting edges, the rate has dropped to ~22,700 edges per hour.
I do have inferencing turned on for my namespace, and I need it for ontology analysis later on:
com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.RdfsAxioms
com.bigdata.rdf.sail.truthMaintenance=true
How can I speed up my edge inserts?
Yan
Try setting the property:
com.bigdata.blueprints.BigdataGraph.laxEdges=true
This will turn off enforcement of unique edge ids, which requires a read for every insert.
Also make sure you are not calling BigdataGraph.addEdge() over the HTTP API, which results in one commit per edge. If you are going over the HTTP API, use the bulk load method instead:
BigdataGraphClient.loadGraphML
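For illustration, a minimal sketch of the difference Mike describes, assuming a BigdataGraphClient constructor that takes the SPARQL endpoint URL and a loadGraphML method that takes the location of a GraphML file; the endpoint, file path, and ids below are hypothetical, not taken from this thread:

    import com.bigdata.blueprints.BigdataGraphClient;

    public class BulkLoadSketch {
        public static void main(final String[] args) throws Exception {
            // Hypothetical endpoint; adjust to your Blazegraph instance.
            final String endpoint = "http://localhost:9999/blazegraph/sparql";
            final BigdataGraphClient graph = new BigdataGraphClient(endpoint);
            try {
                // Slow path: each addEdge() over the HTTP API is one round trip
                // and one commit per edge, e.g.
                //   graph.addEdge("e:1", outVertex, inVertex, "linkedTo");

                // Bulk path: ship an entire GraphML file in a single request so
                // the server can batch the writes into far fewer commits.
                graph.loadGraphML("/data/edges-batch-0001.graphml");
            } finally {
                graph.shutdown();
            }
        }
    }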
Thanks Mike. I'm using an in-memory graph to create an XML file in GraphML format, then calling BigdataGraphClient.loadGraphML on that XML.
I added com.bigdata.blueprints.BigdataGraph.laxEdges=true in RWStore.properties and restarted the server, and it seems even slower. I have multiple worker machines reading edge files from HDFS and inserting into one Blazegraph server.
A few questions:
1. When inserting an edge, it creates a triple for the in and out vertices. Does it do a lookup to see whether the vertices already exist? I have 30+ million vertices.
2. If I insert the edges into a blank database with no vertices, it will create a skeleton vertex triple. If I then insert the vertices later, will it know to look up the same vertex id?
3. I had inferencing turned on when I created my namespace, and I now have 30+ million vertices. Is it possible to turn inferencing off for the existing namespace, or do I have to create a new namespace?
4. If I create a new namespace, do I have to re-insert the vertices into the new namespace, or can the existing data be transferred?
5. If I turn inferencing off for the data insert, will I still be able to do ontology analysis later, once all the data is loaded?
Thanks
Yan
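A minimal sketch of the pipeline Yan describes (build a batch in an in-memory Blueprints graph, serialize it to GraphML, then bulk-load the file), assuming the standard Blueprints 2.x TinkerGraph and GraphMLWriter classes; the vertex/edge ids and file path are hypothetical:

    import java.io.FileOutputStream;
    import java.io.OutputStream;

    import com.tinkerpop.blueprints.Graph;
    import com.tinkerpop.blueprints.Vertex;
    import com.tinkerpop.blueprints.impls.tg.TinkerGraph;
    import com.tinkerpop.blueprints.util.io.graphml.GraphMLWriter;

    public class GraphMLBatchSketch {
        public static void main(final String[] args) throws Exception {
            // Build a batch of edges in memory first (TinkerGraph is purely in-memory).
            final Graph batch = new TinkerGraph();
            final Vertex v1 = batch.addVertex("v:1001");
            final Vertex v2 = batch.addVertex("v:1002");
            batch.addEdge("e:1001-1002", v1, v2, "linkedTo");

            // Serialize the batch to GraphML (hypothetical output path).
            final String file = "/tmp/edges-batch-0001.graphml";
            try (OutputStream out = new FileOutputStream(file)) {
                GraphMLWriter.outputGraph(batch, out);
            }
            batch.shutdown();

            // The resulting file is what would then be passed to
            // BigdataGraphClient.loadGraphML(file) as a single bulk request.
        }
    }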
Yan,
Unfortunately, for this change to take effect, you'll need to create a new namespace and restart the loading. In general, namespace configuration cannot be altered after the initial setup. One option might be to export your current namespace as RDR, create the new namespace with the updated configuration, import the RDR, and then resume the load.
Thanks, --Brad
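As a rough illustration of the "updated configuration" Brad mentions, a new namespace created with inferencing disabled would typically swap the two inference properties from the original post for something like the lines below; treat this as a sketch of the relevant options rather than a complete namespace configuration:

    com.bigdata.rdf.sail.truthMaintenance=false
    com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.NoAxioms
    com.bigdata.blueprints.BigdataGraph.laxEdges=true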