From: Daniel H. <da...@de...> - 2016-06-01 00:19:46
Thanks, Bryan, for the clarification. I will repeat my experiments with
different hardware configurations in the future.

Daniel

On 31/05/16 at 20:00, Bryan Thompson wrote:
>
> SATA is a non-starter for Blazegraph unless it is SSD. The lack of
> write reordering combined with high seek latency significantly limits
> performance. This could be different for other engines. Blazegraph
> (the open source platform) is pretty disk oriented. The GPU platform
> is focused on high performance in fast memory.
>
> Bryan
>
> On May 31, 2016 7:00 PM, "Daniel Hernández" <da...@de...
> <mailto:da...@de...>> wrote:
>
>     Edgar,
>
>     I confirm that the loading rate decreases as the database
>     grows.
>
>     I loaded 500M triples and it took 23h using the triple
>     store back-end. I loaded the same number of quads using the quad
>     store back-end and it took 67h. The resulting databases are 61GB
>     and 120GB, respectively. My machine has 2x SATA disks in RAID 1,
>     32GB of RAM, and 2x six-core Intel Xeon CPUs. I use the parameter
>     -Xmx6g when loading (for small files, I got better results with 6g
>     than with 5g and 8g).
>
>     I have seen that using an SSD reduces the elapsed loading time
>     by at least a factor of 3. However, this could be true for every
>     engine. Edgar, if you improve your loading times without changing
>     your machine, I would be grateful if you told us how you did it.
>
>     (By the way, I loaded the same files into Virtuoso and it required
>     approximately 4 hours for each file.)
>
>     Cheers,
>     Daniel
>
>     On 31/05/16 at 18:15, Bryan Thompson wrote:
>>
>> Edgar,
>>
>> There is no single configuration for maximum load throughput.
>> Instead, there are a variety of steps you can take to improve load
>> performance: for example, right-sizing the JVM, using fast disk,
>> maximizing inlining, etc. Beyond these steps and those detailed
>> on the wiki, we look at the entire system to identify and remove
>> bottlenecks.
>>
>> Thanks,
>> Bryan
>>
>> On May 31, 2016 5:28 PM, "Edgar Rodriguez-Diaz"
>> <ed...@sy... <mailto:ed...@sy...>> wrote:
>>
>>     A correction here on the data size: it’s not 180G, it’s 18G
>>     of a gzip TriG file exported by Blazegraph; the number of triples
>>     is correct.
>>
>>     > On May 31, 2016, at 10:42 AM, Edgar Rodriguez-Diaz
>>     > <ed...@sy... <mailto:ed...@sy...>> wrote:
>>     >
>>     > Hi,
>>     >
>>     > I’ve been trying to use the DataLoader tool for bulk
>>     > loading a very large file into Blazegraph (~180G with ~4
>>     > billion triples) with an empty journal file, but I’m
>>     > noticing a performance degradation in the rate of triples/s
>>     > loaded. It started at around 55K and after 200M triples the
>>     > rate is around 32K; the rate keeps going down consistently.
>>     > What is the configuration to get the best performance out
>>     > of the bulk load into Blazegraph?
>>     >
>>     > Thanks.
>>     >
>>     > - Edgar
>>
>>
>>     ------------------------------------------------------------------------------
>>     What NetFlow Analyzer can do for you? Monitors network
>>     bandwidth and traffic
>>     patterns at an interface-level. Reveals which users, apps,
>>     and protocols are
>>     consuming the most bandwidth. Provides multi-vendor support
>>     for NetFlow,
>>     J-Flow, sFlow and other flows. Make informed decisions using
>>     capacity
>>     planning reports.
>>     https://ad.doubleclick.net/ddm/clk/305295220;132659582;e
>>     _______________________________________________
>>     Bigdata-developers mailing list
>>     Big...@li...
>>     <mailto:Big...@li...>
>>     https://lists.sourceforge.net/lists/listinfo/bigdata-developers
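
For comparison with the 55K and 32K triples/s figures in Edgar's message, the totals Daniel reports (500M statements in 23h with the triple store back-end, 67h with the quad store back-end) can be converted to average rates. The following is only a rough sketch of that arithmetic: the function name is made up for illustration, and it gives the mean over the whole load, so it says nothing about how the rate changed as the database grew.

```python
# Average load throughput implied by the totals quoted in this thread.
# average_rate is a hypothetical helper name, not part of any Blazegraph tool.
SECONDS_PER_HOUR = 3600


def average_rate(statements: int, hours: float) -> float:
    """Return the mean number of statements loaded per second."""
    return statements / (hours * SECONDS_PER_HOUR)


# Figures taken from Daniel's message: 500M statements per run.
triples_rate = average_rate(500_000_000, 23)  # triple store back-end
quads_rate = average_rate(500_000_000, 67)    # quad store back-end

print(f"triples: {triples_rate:,.0f} per second")
print(f"quads:   {quads_rate:,.0f} per second")
```

Under these assumptions the averages come out at roughly 6K triples/s and 2K quads/s on the SATA RAID 1 machine described in the thread.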