Blazegraph DataLoader Performance

2016-10-24
2016-10-28
  • Alexey Emtsov

    Alexey Emtsov - 2016-10-24

    Hello!
    We are trying to load about 6 billion triples into the database, but during data loading we ran into a performance degradation problem. A new database was created, and
    at the beginning performance was quite good -

    ... - file:: 46964 stmts added in 0.519 secs, rate= 90489, commitLatency=0ms, {failSet=0,goodSet=1}; totals:: 1324166 stmts added in 21.805 secs, rate= 60727, commitLatency=0ms, {failSet=0,goodSet=31};
    ... - file:: 41112 stmts added in 0.454 secs, rate= 90555, commitLatency=0ms, {failSet=0,goodSet=1}; totals:: 1365278 stmts added in 22.259 secs, rate= 61335, commitLatency=0ms, {failSet=0,goodSet=32}
    ... - file:: 63538 stmts added in 0.892 secs, rate= 71230, commitLatency=0ms, {failSet=0,goodSet=1}; totals:: 1868122 stmts added in 28.763 secs, rate= 64946, commitLatency=0ms, {failSet=0,goodSet=48};
    

    But performance degraded steadily over time, and after loading only about 200 million statements the DataLoader was reporting the following -

    ... - file:: 66476 stmts added in 35.012 secs, rate= 1898, commitLatency=0ms, {failSet=0,goodSet=1}; totals:: 190802 stmts added in 108.809 secs, rate= 1753, commitLatency=0ms, {failSet=0,goodSet=3};
    ... - file:: 61795 stmts added in 28.199 secs, rate= 2191, commitLatency=0ms, {failSet=0,goodSet=1}; totals:: 252597 stmts added in 137.008 secs, rate= 1843, commitLatency=0ms, {failSet=0,goodSet=4}; 
    ... - file:: 62623 stmts added in 24.992 secs, rate= 2505, commitLatency=0ms, {failSet=0,goodSet=1}; totals:: 315220 stmts added in 162.0 secs, rate= 1945, commitLatency=0ms, {failSet=0,goodSet=5};
    

    We use a server running Red Hat Linux 6.6 with 128 GB of RAM and a couple of SSD disks. CPU speed and core count do not look like the bottleneck, because overall CPU load did not exceed 20% during processing.

    Here are the command line and properties file we use

    *** command line 
    java -server -Xmx64g -XX:+UseG1GC -XX:MaxDirectMemorySize=16000m -Dlog4j.configuration=log4j.properties -Djetty.port=9996 -Dbigdata.propertyFile=RWStore.properties -jar blazegraph.jar
    
    *** property file
    com.bigdata.journal.AbstractJournal.file=bigdata.jnl
    com.bigdata.journal.AbstractJournal.bufferMode=DiskRW
    com.bigdata.service.AbstractTransactionService.minReleaseAge=0
    com.bigdata.btree.writeRetentionQueue.capacity=50000
    com.bigdata.btree.BTree.branchingFactor=1024
    com.bigdata.rdf.sail.bufferCapacity=500000
    com.bigdata.rdf.sail.truthMaintenance=false
    com.bigdata.rdf.store.AbstractTripleStore.quads=true
    com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false
    com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.NoAxioms
    com.bigdata.rdf.store.AbstractTripleStore.textIndex=false
    com.bigdata.journal.AbstractJournal.writeCacheBufferCount=14000
    
    *** dataloader properties
    com.bigdata.rdf.store.DataLoader.flush=true
    com.bigdata.rdf.store.DataLoader.bufferCapacity=30000
    com.bigdata.rdf.store.DataLoader.queueCapacity=5
    com.bigdata.rdf.store.DataLoader.commit=Batch
    com.bigdata.rdf.store.DataLoader.verbose=1
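    For context, these DataLoader properties are picked up when the bulk loader utility is run directly against the journal. A typical invocation looks roughly like the following (the exact flags depend on the Blazegraph version, so treat this as a sketch rather than verified usage; `/path/to/data` is a placeholder for the directory of RDF files):

    java -cp blazegraph.jar com.bigdata.rdf.store.DataLoader \
        -verbose RWStore.properties /path/to/data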
    

    We also adjusted all branching factors based on the recommendations provided by the status?dumpPages&dumpJournal command

    Here is also some information about the data distribution across the RWStore and blobs

    -------------------------
    RWStore Allocator Summary
    -------------------------
    AllocatorSize      AllocatorCount   SlotsAllocated  %SlotsAllocated    SlotsRecycled        SlotChurn       SlotsInUse      %SlotsInUse   MeanAllocation    SlotsReserved     %SlotsUnused    BytesReserved     BytesAppData       %SlotWaste         %AppData       %StoreFile      %TotalWaste       %FileWaste 
    64                           5127         51341146            30.01         21811089             1.40         29530057            40.24               40         36746240            19.64       2351759360       1889923648            19.64             3.75             4.31            10.97              0.85 
    128                          5266         37829689            22.11           312997             1.00         37516692            51.13               69         37743616             0.60       4831182848       4802136576             0.60             9.53             8.85             0.69              0.05 
    192                             8           343335             0.20           310715             5.99            32620             0.04              160            57344            43.12         11010048          6263040            43.12             0.01             0.02             0.11              0.01 
    320                            12           681628             0.40           635454             7.92            46174             0.06              255            86016            46.32         27525120         14775680            46.32             0.03             0.05             0.30              0.02 
    512                            15           966104             0.56           904292             8.99            61812             0.08              416           107520            42.51         55050240         31647744            42.51             0.06             0.10             0.56              0.04 
    768                            24          1277702             0.75          1170704             7.43           106998             0.15              637           172032            37.80        132120576         82174464            37.80             0.16             0.24             1.19              0.09 
    1024                           35          1288267             0.75          1124298             5.17           163969             0.22              892           249088            34.17        255066112        167904256            34.17             0.33             0.47             2.07              0.16 
    2048                           44          4381529             2.56          4084802            13.89           296727             0.40             1513           315392             5.92        645922816        607696896             5.92             1.21             1.18             0.91              0.07 
    3072                           63          3868907             2.26          3443739             8.57           425168             0.58             2586           451584             5.85       1387266048       1306116096             5.85             2.59             2.54             1.93              0.15 
    4096                           45          5798404             3.39          5519159            18.06           279245             0.38             3670           321024            13.01       1314914304       1143787520            13.01             2.27             2.41             4.06              0.31 
    8192                          742         63325312            37.01         58403549            11.91          4921763             6.71             6963          5318656             7.46      43570429952      40319082496             7.46            80.04            79.83            77.22              5.96 
    
    -------------------------
    BLOBS
    -------------------------
    Bucket(K)   Allocations    Allocated      Deletes      Current         Mean
    16             18480662 209270698950     17489631       991031        11323
    32              4050241  74923579852      4024017        26224        18498
    64                    9       310663            9            0        34518
    128                   0            0            0            0            0
    256                   0            0            0            0            0
    512                   0            0            0            0            0
    1024                  0            0            0            0            0
    2048                985   1071538160          985            0      1087856
    4096                  0            0            0            0            0
    8192                  0            0            0            0            0
    16384                 0            0            0            0            0
    32768                 0            0            0            0            0
    65536                 0            0            0            0            0
    2097151               0            0            0            0            0
    

    Could you please share your thoughts on what we are doing wrong and how performance could be improved?

     

    Last edit: Alexey Emtsov 2016-10-24
    • Brad Bebee

      Brad Bebee - 2016-10-24

      Alexey,

      Assuming that you are using a relatively fast SSD disk (if not, this is our
      first recommendation), you can likely benefit from developing a custom
      vocabulary for your data to enable inlining.

      See an example at
      https://github.com/blazegraph/database/tree/master/vocabularies.

      Thanks, --Brad
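      For reference, a custom vocabulary and inline URI factory are wired into the journal through the triple store properties, along these lines (the class names below are placeholders for your own implementations; verify the property names against your Blazegraph version):

      # placeholder class names -- substitute your own implementations
      com.bigdata.rdf.store.AbstractTripleStore.vocabularyClass=com.example.MyVocabulary
      com.bigdata.rdf.store.AbstractTripleStore.inlineURIFactory=com.example.MyInlineURIFactory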

  • Alexey Emtsov

    Alexey Emtsov - 2016-10-26

    Brad,
    Thank you for response and suggestion!
    We created a custom vocabulary in which we defined the common types and identifiers we use.
    We also extended InlineURIFactory in an attempt to optimize inlining.
    We executed several different loading tests, but applying these changes in different combinations seems to have no effect on performance; we got quite similar results, with maybe a 5-10% improvement at most.

    Alexey

     
    • Brad Bebee

      Brad Bebee - 2016-10-26

      Alexey,

      Can you post dumpJournal with -pages and your journal properties to verify
      inlining is taking place?

      Thanks, Brad
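      For reference, the page-level dump can also be produced offline against the journal file with the DumpJournal utility, roughly as follows (a sketch; check the supported options against your Blazegraph version, and note bigdata.jnl matches the journal file configured above):

      java -cp blazegraph.jar com.bigdata.journal.DumpJournal -pages bigdata.jnl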

       
  • Alexey Emtsov

    Alexey Emtsov - 2016-10-28

    Here it is

    -------------------------
    RWStore Allocator Summary
    -------------------------
    AllocatorSize      AllocatorCount   SlotsAllocated  %SlotsAllocated    SlotsRecycled        SlotChurn       SlotsInUse      %SlotsInUse   MeanAllocation    SlotsReserved     %SlotsUnused    BytesReserved     BytesAppData       %SlotWaste         %AppData       %StoreFile      %TotalWaste       %FileWaste 
    64                           4452         38900620            24.95         13334417             1.22         25566203            38.43               45         31910912            19.88       2042298368       1636236992            19.88             3.74             4.34            12.08              0.86 
    128                          4960         35569516            22.81           108642             1.00         35460874            53.31               69         35553280             0.26       4550819840       4538991872             0.26            10.39             9.67             0.35              0.03 
    192                             5           141887             0.09           116339             3.96            25548             0.04              158            35840            28.72          6881280          4905216            28.72             0.01             0.01             0.06              0.00 
    320                             6           264922             0.17           241155             6.16            23767             0.04              254            43008            44.74         13762560          7605440            44.74             0.02             0.03             0.18              0.01 
    512                             9           357273             0.23           318708             5.54            38565             0.06              417            64512            40.22         33030144         19745280            40.22             0.05             0.07             0.40              0.03 
    768                            32           553307             0.35           401500             2.43           151807             0.23              616           228096            33.45        175177728        116587776            33.45             0.27             0.37             1.74              0.12 
    1024                            7           397688             0.26           367347             7.93            30341             0.05              894            50176            39.53         51380224         31069184            39.53             0.07             0.11             0.60              0.04 
    2048                           11          1345273             0.86          1278633            17.06            66640             0.10             1513            78848            15.48        161480704        136478720            15.48             0.31             0.34             0.74              0.05 
    3072                          108          3753397             2.41          3019400             4.88           733997             1.10             2687           769024             4.55       2362441728       2254838784             4.55             5.16             5.02             3.20              0.23 
    4096                           55         16139046            10.35         15830487            40.94           308559             0.46             3631           394240            21.73       1614807040       1263857664            21.73             2.89             3.43            10.44              0.75 
    8192                          614         58510539            37.52         54397394            13.29          4113145             6.18             6263          4401152             6.54      36054237184      33694883840             6.54            77.10            76.60            70.20              5.01 
    
    -------------------------
    BLOBS
    -------------------------
    Bucket(K)   Allocations    Allocated      Deletes      Current         Mean
    16             10093108 125168179125      9654905       438203        12401
    32              3590088  66461722547      3566071        24017        18512
    64                   11       383617           10            1        34874
    128                   0            0            0            0            0
    256                   0            0            0            0            0
    512                   0            0            0            0            0
    1024                  0            0            0            0            0
    2048                955   1038902480          955            0      1087856
    4096                  0            0            0            0            0
    8192                  0            0            0            0            0
    16384                 0            0            0            0            0
    32768                 0            0            0            0            0
    65536                 0            0            0            0            0
    2097151               0            0            0            0            0
    

    Also, when we specified the inlineURIFactory property, performance dropped. So it looks like the vocabulary and inlining are active...

     
    • Brad Bebee

      Brad Bebee - 2016-10-28

      Martyn: Do you have any thoughts on this dump journal?

      Alexey: Can you please confirm the drive type you are using?

       
