From: Bryan T. <br...@sy...> - 2015-04-07 19:58:07
You can also try renting EC2 instances and benchmarking against them.

Yes, the analytic mode will help. However, we do not yet do ORDER BY on the
native heap. Just FYI. We will be delivering some changes for quads mode to
improve the use of the native heap; it is currently not used for default
graph access paths (where we impose the DISTINCT SPO constraint on each
access path).

You have been in a bad environment with slow disk, heavy queries, and
relatively little RAM. Even though the indices are clustered, they are not
in key order on the disk, so an index scan still induces random IO.
Improving your IOPS will make a huge difference. More RAM will help, but
just drop an SSD in there and you should get a big win.

Thanks,
Bryan

On Fri, Apr 3, 2015 at 4:59 PM, Jim Balhoff <ba...@ne...> wrote:
> Hi Bryan,
>
> Thanks for your reply. I suppose I would characterize my queries as heavy,
> since many of them individually take longer than I would like, but I am
> not running Blazegraph on a great server at the moment. We do not have
> that many concurrent clients. I do a lot of queries that have a large
> result set that needs to be distinct and sorted, for paging through. It
> sounds like I should experiment more with the analytic query mode. But my
> current old server does not have much extra memory available. Would that
> be a prerequisite for the analytic mode making a difference? The old
> server has 8 GB memory, 6 GB allocated to the JVM (perhaps that is too
> high), with a very slow disk.
>
> Best regards,
> Jim
>
> > On Apr 2, 2015, at 6:05 PM, Bryan Thompson <br...@sy...> wrote:
> >
> > Jim,
> >
> > The best way to size a machine is for a data set and workload. Always
> > buy SSD. The historical guidance was to use relatively small heaps
> > (4-8 GB) and let the OS buffer the disk. The concept was to minimize
> > the impact of GC pauses. However, some people are having good success
> > using large heaps (112 GB) and the G1 garbage collector.
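[A concrete sketch of the heap guidance above, for an 8 GB box like Jim's. The flag values, jar name, and script shape are illustrative assumptions, not tested recommendations from this thread.]

```shell
# Hypothetical Blazegraph launch on an 8 GB machine:
#   -Xmx4g                      modest heap, per the 4-8 GB guidance; the
#                               remaining RAM is left for the OS disk cache
#   -XX:+UseG1GC                the G1 collector mentioned above
#   -XX:MaxDirectMemorySize=2g  only needed if native buffers (analytic
#                               mode, write cache) hit their default limit
# The jar name below is an assumption; substitute your actual deployment.
java -server -Xmx4g -XX:+UseG1GC -XX:MaxDirectMemorySize=2g \
     -jar bigdata-bundled.jar
```

The trade-off sketched here is the one Bryan describes: a smaller heap shortens GC pauses and leaves memory for the OS to buffer the journal, at the cost of less JVM-managed cache.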
> > We run data sets of that size on platforms as small as Mac minis.
> >
> > For query performance, faster CPU cores are good and more cores are
> > good. This assumes that the IO system has high IOPS.
> >
> > Would you characterize your queries as lightweight or heavy? Is the
> > query workload highly concurrent (lots of clients)? Is the working set
> > required to answer those queries small, or a large part of your data?
> > These things affect the throughput you will observe for query. Query
> > plan optimization is also very important. If you have an expensive
> > query, make sure that it is doing what you intend. Often the query can
> > be improved. For our part, we are working to improve the query
> > optimizer. One client recently reported a 2x improvement in 1.5.1 vs
> > 1.2.x. We have a lot more optimizations in the pipeline.
> >
> > The analytic query mode is for larger intermediate solution sets. If
> > you run this kind of query, then turn it on. You can do this on a
> > query by query basis. The JVM ergonomics automatically allow a certain
> > amount of native memory allocation. You only need to explicitly specify
> > this if you are running into limits with those native buffers. The
> > other use of native buffers is for the write cache. This improves the
> > bulk load rate, but it does not look like that is your primary concern.
> >
> > Thanks,
> > Bryan
> >
> > PS: yes, the list is fine.
> >
> > On Thursday, April 2, 2015, Jim Balhoff <ba...@ne...> wrote:
> > Hi,
> >
> > I was wondering if you provided any guidance on hardware for different
> > sizes of databases. I have read through the performance articles on the
> > wiki, but am wondering if there are some more generalized guidelines
> > that could be stated. In my case, say I will have 150 million triples
> > and am going to purchase a new system: how much memory is recommended?
> > How much of that memory should I give to the JVM via "-Xmx" vs. letting
> > the OS use it for caching the db?
> > (I am also a little confused about whether I need to specifically
> > allocate some other amount to the JVM through MaxDirectMemorySize for
> > analytic queries.) I am only concerned with query speed, not writes.
> >
> > Maybe there are too many special cases, but I was hoping there are some
> > minimum guidelines that could be determined.
> >
> > Side question: is it okay to post questions like this here? I find
> > email lists to be a lot more convenient than the SourceForge forum, but
> > I can move it there if needed.
> >
> > Thank you,
> > Jim
> >
> > ____________________________________________
> > James P. Balhoff, Ph.D.
> > National Evolutionary Synthesis Center
> > 2024 West Main St., Suite A200
> > Durham, NC 27705
> > USA
> >
> > _______________________________________________
> > Bigdata-developers mailing list
> > Big...@li...
> > https://lists.sourceforge.net/lists/listinfo/bigdata-developers
> >
> > --
> > ----
> > Bryan Thompson
> > Chief Scientist & Founder
> > SYSTAP, LLC
> > 4501 Tower Road
> > Greensboro, NC 27410
> > br...@sy...
> > http://blazegraph.com
> > http://blog.bigdata.com
> > http://mapgraph.io
> > Blazegraph™ is our ultra high-performance graph database that supports
> > both RDF/SPARQL and Tinkerpop/Blueprints APIs. MapGraph™ is our
> > disruptive new technology to use GPUs to accelerate data-parallel
> > graph analytics.
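[Bryan notes above that the analytic mode can be turned on "on a query by query basis". A sketch of how that per-query toggle typically looks with Blazegraph's query-hint vocabulary; the endpoint URL is an assumption for a default local install, and the `analytic=true` URL parameter is offered as a reported alternative rather than something verified in this thread.]

```shell
# Sketch: enable the analytic query mode for one query via a Blazegraph
# query hint embedded in the SPARQL text itself.
QUERY='PREFIX hint: <http://www.bigdata.com/queryHints#>
SELECT DISTINCT ?s WHERE {
  hint:Query hint:analytic "true" .
  ?s ?p ?o .
}'

# Reportedly the same effect is available as a URL parameter on the
# SPARQL endpoint (endpoint path below is an assumed default install):
# curl --data-urlencode "query=${QUERY}" \
#      "http://localhost:9999/bigdata/sparql?analytic=true"

echo "${QUERY}"
```

Embedding the hint in the query keeps the toggle with the query itself, which matches the "query by query basis" advice above.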
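[Jim's workload above is paging through large result sets that must be distinct and sorted; that pattern is usually expressed with ORDER BY / LIMIT / OFFSET. A minimal sketch, with arbitrary illustrative page numbers:]

```shell
# Sketch of the DISTINCT + ORDER BY + LIMIT/OFFSET paging pattern.
# PAGE and SIZE are arbitrary illustration values (0-based pages).
PAGE=2
SIZE=100
OFFSET=$(( PAGE * SIZE ))
QUERY="SELECT DISTINCT ?s WHERE { ?s ?p ?o } ORDER BY ?s LIMIT ${SIZE} OFFSET ${OFFSET}"
echo "${QUERY}"
# Note: each page typically re-evaluates the sort over the full result
# set, which is why this workload is sensitive to the ORDER BY limitation
# Bryan mentions (ORDER BY is not yet done on the native heap).
```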