Edge Insert Performance Slow

Help
2015-10-25
2015-10-28
  • lantiantaiyang

    lantiantaiyang - 2015-10-25

    Hi

    I am inserting edges that connect two vertices, and the edge inserts are significantly slower than the vertex inserts were. I was able to insert ~31 million vertices in about 40 hours, a rate of ~800k vertices per hour.
    Now that I'm inserting edges, the rate has degraded to ~22,700 edges per hour.
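
    To put the two rates side by side, here is a small back-of-the-envelope sketch. The vertex and edge rates are the figures quoted above; the total edge count is not given in the thread, so `assumed_edge_count` is a purely hypothetical value used only to illustrate the projection.

```python
# Figures quoted in the post above.
vertices_inserted = 31_000_000
vertex_hours = 40
edges_per_hour = 22_700

# 31 million vertices over 40 hours is 775,000 per hour (the "~800k" above).
vertices_per_hour = vertices_inserted / vertex_hours

# How many times slower edge inserts are than vertex inserts.
slowdown = vertices_per_hour / edges_per_hour

# Hypothetical edge count (NOT from the thread), only to show the projection.
assumed_edge_count = 31_000_000
projected_hours = assumed_edge_count / edges_per_hour
projected_days = projected_hours / 24

print(f"vertex rate: {vertices_per_hour:,.0f} per hour")
print(f"edge inserts are roughly {slowdown:.0f}x slower")
print(f"{assumed_edge_count:,} edges would take roughly {projected_days:.0f} days")
```

    At the quoted rates the gap is well over an order of magnitude, which is why the per-insert overhead discussed in the replies matters.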

    I do have inferencing turned on for my namespace. I need inferencing turned on for ontology analysis later on.

    com.bigdata.rdf.store.AbstractTripleStore.axiomsClass com.bigdata.rdf.axioms.RdfsAxioms
    com.bigdata.rdf.sail.truthMaintenance true

    How can I speed up my edge insert?

    Yan

     
  • Mike Personick

    Mike Personick - 2015-10-27

    Try setting the property:

    com.bigdata.blueprints.BigdataGraph.laxEdges = true
    

    This will turn off enforcement of unique edge ids, which requires a read for every insert.

    Also make sure you are not calling BigdataGraph.addEdge() over the HTTP API, which results in one commit per edge. If you are going over the HTTP API, use the bulk load method instead:

    BigdataGraphClient.loadGraphML
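
    As a rough illustration of the batching idea, the sketch below packs many edges into a single GraphML document so that they can be loaded in one call rather than one request per edge. It uses only the Python standard library. The function name `edges_to_graphml`, the tuple layout, and the use of a `label` data key for the edge label are assumptions made for this example, not something confirmed in this thread. Whether the loader requires the endpoint `node` elements to be declared is also an assumption; they are emitted here to keep the document self-describing.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def edges_to_graphml(edges):
    """Build one GraphML document from a list of
    (edge_id, source_id, target_id, label) tuples (hypothetical layout)."""
    ET.register_namespace("", GRAPHML_NS)

    def tag(name):
        return f"{{{GRAPHML_NS}}}{name}"

    root = ET.Element(tag("graphml"))
    # Declare the edge attribute that carries the label (assumed key name).
    ET.SubElement(root, tag("key"), {
        "id": "label", "for": "edge",
        "attr.name": "label", "attr.type": "string",
    })
    graph = ET.SubElement(root, tag("graph"),
                          {"id": "G", "edgedefault": "directed"})

    # Emit each endpoint vertex once, before the edges that reference it.
    seen = set()
    for _, src, dst, _ in edges:
        for vertex_id in (src, dst):
            if vertex_id not in seen:
                seen.add(vertex_id)
                ET.SubElement(graph, tag("node"), {"id": vertex_id})

    # Emit every edge of the batch into the same document.
    for edge_id, src, dst, label in edges:
        edge = ET.SubElement(graph, tag("edge"),
                             {"id": edge_id, "source": src, "target": dst})
        data = ET.SubElement(edge, tag("data"), {"key": "label"})
        data.text = label

    return ET.tostring(root, encoding="unicode")


batch = [("e1", "v1", "v2", "knows"), ("e2", "v2", "v3", "knows")]
graphml_text = edges_to_graphml(batch)
```

    The resulting text would be written to a file and handed to the bulk loader named above, so that a whole batch of edges goes through a single load instead of one commit per edge. The batch size is a tuning choice that the thread does not specify.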
    
     
  • lantiantaiyang

    lantiantaiyang - 2015-10-27

    Thanks Mike. I'm using an in-memory graph to create an XML document in GraphML format, then calling BigdataGraphClient.loadGraphML on that XML.

    I added com.bigdata.blueprints.BigdataGraph.laxEdges = true to RWStore.properties and restarted the server, but it seems even slower. I have multiple worker machines reading edge files from HDFS and inserting into one Blazegraph server.

    A few questions:

    1. When inserting an edge, it creates a triple for the in and out vertices. Does it do a search to see if the vertices exist? I have 30+ million vertices.
    2. If I insert the edges into a blank database with no vertices, it will create a skeleton vertex triple. If I then insert the vertices later on, will it know to look up the same vertex id?
    3. I had inferencing turned on when I created my namespace, and I now have 30+ million vertices. Is it possible to turn inferencing off for the existing namespace, or do I have to create a new namespace?
    4. If I create a new namespace, do I have to re-insert the vertices into the new namespace, or can the existing data be transferred?
    5. If I turn inferencing off for the data insert, will I be able to do ontology analysis later on, once all the data is inserted?

    Thanks
    Yan

     
    • Brad Bebee

      Brad Bebee - 2015-10-28

      Yan,

      Unfortunately, for this change to take effect, you'll need to create a new
      namespace and restart the loading. In general, namespace configuration
      cannot be altered after the initial setup. One option might be to export
      your current namespace as RDR, create the new namespace with the updated
      configuration, import the RDR, and then resume the load.

      Thanks, --Brad

       
