Hi,
I am inserting edges that connect two vertices. Edge inserts are significantly slower than vertex inserts: I was able to insert ~31 million vertices in about 40 hours, a rate of ~800k vertices per hour. Now that I'm inserting edges, the rate has dropped to ~22,700 edges per hour.
I do have inferencing turned on for my namespace, and I need it for ontology analysis later on:
com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.RdfsAxioms
com.bigdata.rdf.sail.truthMaintenance=true
How can I speed up my edge inserts?
Yan
Try setting the property:
com.bigdata.blueprints.BigdataGraph.laxEdges=true
This will turn off enforcement of unique edge ids, which requires a read for every insert.
Also make sure you are not calling BigdataGraph.addEdge() over the HTTP API, which results in one commit per edge. If you are going over the HTTP API, use the bulk load method instead:
BigdataGraphClient.loadGraphML
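For illustration, a minimal sketch of the difference Mike describes, assuming a BigdataGraphClient constructor that takes the SPARQL endpoint URL and a loadGraphML method that takes the location of a GraphML file; the endpoint, file path, and ids below are hypothetical, not taken from this thread:

    import com.bigdata.blueprints.BigdataGraphClient;

    public class BulkLoadSketch {
        public static void main(final String[] args) throws Exception {
            // Hypothetical endpoint; adjust to your Blazegraph instance.
            final String endpoint = "http://localhost:9999/blazegraph/sparql";
            final BigdataGraphClient graph = new BigdataGraphClient(endpoint);
            try {
                // Slow path: each addEdge() over the HTTP API is one round trip
                // and one commit per edge, e.g.
                //   graph.addEdge("e:1", outVertex, inVertex, "linkedTo");

                // Bulk path: ship an entire GraphML file in a single request so
                // the server can batch the writes into far fewer commits.
                graph.loadGraphML("/data/edges-batch-0001.graphml");
            } finally {
                graph.shutdown();
            }
        }
    }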
Thanks Mike. I'm using an in-memory graph to create an XML file in GraphML format, then calling BigdataGraphClient.loadGraphML on that XML.
I added com.bigdata.blueprints.BigdataGraph.laxEdges=true in RWStore.properties and restarted the server, and it seems even slower. I have multiple worker machines reading edge files from HDFS and inserting into one Blazegraph server.
A few questions:
1. When inserting an edge, it creates a triple for the in and out vertices. Does it do a lookup to see whether the vertices already exist? I have 30+ million vertices.
2. If I insert the edges into a blank database with no vertices, it will create a skeleton vertex triple. If I then insert the vertices later, will it know to look up the same vertex id?
3. I had inferencing turned on when I created my namespace, and I now have 30+ million vertices. Is it possible to turn inferencing off for the existing namespace, or do I have to create a new namespace?
4. If I create a new namespace, do I have to re-insert the vertices into the new namespace, or can the existing data be transferred?
5. If I turn inferencing off for the data insert, will I still be able to do ontology analysis later, once all the data is loaded?
Thanks
Yan
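A minimal sketch of the pipeline Yan describes (build a batch in an in-memory Blueprints graph, serialize it to GraphML, then bulk-load the file), assuming the standard Blueprints 2.x TinkerGraph and GraphMLWriter classes; the vertex/edge ids and file path are hypothetical:

    import java.io.FileOutputStream;
    import java.io.OutputStream;

    import com.tinkerpop.blueprints.Graph;
    import com.tinkerpop.blueprints.Vertex;
    import com.tinkerpop.blueprints.impls.tg.TinkerGraph;
    import com.tinkerpop.blueprints.util.io.graphml.GraphMLWriter;

    public class GraphMLBatchSketch {
        public static void main(final String[] args) throws Exception {
            // Build a batch of edges in memory first (TinkerGraph is purely in-memory).
            final Graph batch = new TinkerGraph();
            final Vertex v1 = batch.addVertex("v:1001");
            final Vertex v2 = batch.addVertex("v:1002");
            batch.addEdge("e:1001-1002", v1, v2, "linkedTo");

            // Serialize the batch to GraphML (hypothetical output path).
            final String file = "/tmp/edges-batch-0001.graphml";
            try (OutputStream out = new FileOutputStream(file)) {
                GraphMLWriter.outputGraph(batch, out);
            }
            batch.shutdown();

            // The resulting file is what would then be passed to
            // BigdataGraphClient.loadGraphML(file) as a single bulk request.
        }
    }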
Yan,
Unfortunately, for this change to take effect, you'll need to create a new namespace and restart the loading. In general, namespace configuration cannot be altered after the initial setup. One option might be to export your current namespace as RDR, create the new namespace with the updated configuration, import the RDR, and then resume the load.
Thanks, --Brad
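As a rough illustration of the "updated configuration" Brad mentions, a new namespace created with inferencing disabled would typically swap the two inference properties from the original post for something like the lines below; treat this as a sketch of the relevant options rather than a complete namespace configuration:

    com.bigdata.rdf.sail.truthMaintenance=false
    com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.NoAxioms
    com.bigdata.blueprints.BigdataGraph.laxEdges=true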