From: Bryan T. <br...@bl...> - 2016-06-01 00:01:06
SATA is a non-starter for Blazegraph unless it is SSD. The lack of write
reordering combined with high seek latency significantly limits performance.
This could be different for other engines. Blazegraph (the open source
platform) is fairly disk oriented. The GPU platform is focused on high
performance in fast memory.

Bryan

On May 31, 2016 7:00 PM, "Daniel Hernández" <da...@de...> wrote:

Edgar,

I confirm that the loading rate decreases as the database grows. I loaded
500M triples and it took 23h using the triple store back-end. Loading the
same amount of quads using the quad store back-end took 67h. The resulting
databases are 61GB and 120GB, respectively. My machine has 2x SATA disks in
RAID 1, 32GB of RAM, and 2x six-core Intel Xeon CPUs. I use the parameter
-Xmx6g when loading (for small files, I got better results with 6g than
with 5g or 8g).

I have seen that using an SSD improves the elapsed loading time by at least
3x. However, this could be true for every engine. Edgar, if you improve
your loading times without changing your machine, I would be grateful if
you told us how you did it. (By the way, I loaded the same files into
Virtuoso and it required approximately 4 hours for each file.)

Cheers,
Daniel

On 31/05/16 at 18:15, Bryan Thompson wrote:

Edgar,

There is no single configuration for maximum load throughput. Instead,
there are a variety of steps you can take to improve load performance: for
example, right-sizing the JVM, using fast disk, maximizing inlining, etc.
Beyond these steps and those detailed on the wiki, we look at the entire
system to identify and remove bottlenecks.

Thanks,
Bryan

On May 31, 2016 5:28 PM, "Edgar Rodriguez-Diaz" <ed...@sy...> wrote:

> A correction here on the data size: it's not 180G, it's 18G of a gzipped
> TriG file exported by Blazegraph; the number of triples is correct.
>
> On May 31, 2016, at 10:42 AM, Edgar Rodriguez-Diaz <ed...@sy...> wrote:
> >
> > Hi,
> >
> > I've been trying to use the DataLoader tool to bulk load a very large
> > file into Blazegraph (~180G with ~4 billion triples) with an empty
> > journal file, but I'm noticing a performance degradation in the rate of
> > triples/s loaded. It started at around 55K, and after 200M triples the
> > rate is around 32K and keeps going down consistently.
> > What is the configuration to get the best performance out of a bulk
> > load into Blazegraph?
> >
> > Thanks.
> >
> > - Edgar
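The knobs Bryan mentions above (right-sizing the JVM, maximizing inlining,
plus the steps on the wiki) mostly live in the journal properties file that
is passed to the DataLoader. A minimal sketch of a load-oriented
configuration follows; the property names are the ones documented for the
Blazegraph releases of this era, but verify them against your version, and
treat every value shown as illustrative rather than a tuned recommendation:

    # bulkload.properties -- illustrative values, verify against your release
    com.bigdata.journal.AbstractJournal.file=/data/blazegraph.jnl
    com.bigdata.journal.AbstractJournal.bufferMode=DiskRW
    # Triples-only store; set quads=true for the quad store back-end
    # (Daniel's numbers above show the quad store costs roughly 2-3x more).
    com.bigdata.rdf.store.AbstractTripleStore.quads=false
    # Disable the free-text index and inference during bulk load.
    com.bigdata.rdf.store.AbstractTripleStore.textIndex=false
    com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.NoAxioms
    com.bigdata.rdf.sail.truthMaintenance=false
    # "Maximizing inlining": keep short literals directly in the statement
    # indices instead of the dictionary indices.
    com.bigdata.rdf.store.AbstractTripleStore.inlineTextLiterals=true
    com.bigdata.rdf.store.AbstractTripleStore.maxInlineTextLength=256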
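And the DataLoader invocation itself, with the heap right-sized along the
lines Daniel describes. The class name is the standard
com.bigdata.rdf.store.DataLoader entry point; the -verbose and -namespace
options are taken from the wiki documentation, and blazegraph.jar, the "kb"
namespace, and the data path are placeholders for your own setup:

    java -Xmx6g -server \
        -cp blazegraph.jar \
        com.bigdata.rdf.store.DataLoader \
        -verbose \
        -namespace kb \
        bulkload.properties \
        /data/dumps/

Note that the 6g heap deliberately stays well below the machine's 32GB of
RAM: Blazegraph leans on the OS page cache for the journal, so an oversized
Java heap mainly adds GC pressure, which is one plausible reading of why
Daniel saw 6g outperform 8g.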