Hello!
We are trying to load about 6 billion triples into the database, but during data loading we ran into a performance degradation problem. A new database was created, and in the beginning performance was pretty good -
But over time performance became worse and worse, and after loading only about 200 million triples the dataloader showed the following results -
We use a server with Red Hat Linux 6.6 installed, 128 GB of RAM, and a couple of SSD disks. CPU speed and core count do not look like the issue, because overall CPU load did not exceed 20% during processing.
Here are the command line and properties file we use -
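For reference, a typical DataLoader bulk-load setup looks roughly like the following; the heap size, paths, namespace, and property values are illustrative rather than our exact configuration:

    java -Xmx16g -cp blazegraph.jar com.bigdata.rdf.store.DataLoader \
        -verbose -namespace kb fastload.properties /data/triples/

    # fastload.properties (illustrative values)
    com.bigdata.journal.AbstractJournal.bufferMode=DiskRW
    com.bigdata.journal.AbstractJournal.file=/ssd/blazegraph.jnl
    com.bigdata.rdf.store.AbstractTripleStore.quads=false
    com.bigdata.rdf.store.AbstractTripleStore.textIndex=false
    com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.NoAxioms
    com.bigdata.rdf.sail.truthMaintenance=false
    com.bigdata.btree.writeRetentionQueue.capacity=4000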
We also adjusted all branching factors based on the recommendations provided by the status?dumpPages&dumpJournal command.
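Overrides of this kind are set per index in the properties file; the pattern, shown here with an illustrative namespace kb and illustrative values, is:

    com.bigdata.namespace.kb.lex.TERM2ID.com.bigdata.btree.BTree.branchingFactor=800
    com.bigdata.namespace.kb.lex.ID2TERM.com.bigdata.btree.BTree.branchingFactor=800
    com.bigdata.namespace.kb.spo.SPO.com.bigdata.btree.BTree.branchingFactor=1024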
Here is also some information about the data distribution across the RWStore and blobs -
Could you please share your thoughts on what we are doing wrong and how performance can be improved?
Alexey,
Assuming that you are using a relatively fast SSD disk (if not, this is our first recommendation), you can likely benefit from developing a custom vocabulary for your data to enable inlining. See an example at https://github.com/blazegraph/database/tree/master/vocabularies.
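A minimal sketch of such a vocabulary, following the pattern in that repository, might look like this (the class name and the example.com namespaces are placeholders):

    import com.bigdata.rdf.vocab.BaseVocabularyDecl;
    import com.bigdata.rdf.vocab.core.BigdataCoreVocabulary_v20151210;

    // Declares the namespaces that occur frequently in the data so that
    // URIs in them can be inlined instead of going through the dictionary.
    public class MyVocabulary extends BigdataCoreVocabulary_v20151210 {

        public MyVocabulary() {
            super();
        }

        public MyVocabulary(final String namespace) {
            super(namespace);
        }

        @Override
        protected void addValues() {
            super.addValues();
            // Placeholder namespaces -- substitute the ones common in your data.
            addDecl(new BaseVocabularyDecl("http://example.com/ontology/"));
            addDecl(new BaseVocabularyDecl("http://example.com/id/"));
        }
    }

It is then activated in the journal properties with:

    com.bigdata.rdf.store.AbstractTripleStore.vocabularyClass=com.example.MyVocabulary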
Thanks, --Brad
Brad,
Thank you for the response and suggestion!
We created a custom vocabulary in which we defined the common types and identifiers we use.
We also extended InlineURIFactory in an attempt to optimize inlining.
We ran several different loading tests, but it looks like applying these changes in different combinations has no effect on performance; we got quite similar results, with maybe a 5-10% improvement at most.
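A sketch of the usual pattern for such a factory extension (the example.com namespace is a placeholder, assumes unsigned-integer localNames, and must also be declared in the custom vocabulary):

    import com.bigdata.rdf.internal.InlineURIFactory;
    import com.bigdata.rdf.internal.InlineUnsignedIntegerURIHandler;

    // Inlines URIs such as http://example.com/id/12345 as native integers
    // rather than storing each one in the dictionary indices.
    public class MyInlineURIFactory extends InlineURIFactory {

        public MyInlineURIFactory() {
            super();
            // Placeholder namespace -- must match a vocabulary declaration.
            addHandler(new InlineUnsignedIntegerURIHandler("http://example.com/id/"));
        }
    }

It is enabled in the journal properties with:

    com.bigdata.rdf.store.AbstractTripleStore.inlineURIFactory=com.example.MyInlineURIFactory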
Alexey
Alexey,
Can you post dumpJournal with -pages and your journal properties to verify inlining is taking place?
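For reference, the page-level dump can be produced with the DumpJournal utility, along these lines (journal path illustrative):

    java -cp blazegraph.jar com.bigdata.journal.DumpJournal -pages /ssd/blazegraph.jnl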
Thanks, Brad
Here it is.
Also, when we specified the inlineURIFactory, performance dropped. So it looks like the vocabulary and inlining are on...
Martyn: Do you have any thoughts on this dump journal?
Alexey: Can you please confirm the drive type you are using?