This list is closed, nobody may subscribe to it.
From: Antoni M. <ant...@ba...> - 2013-09-19 15:46:27
|
Hi, I think I found a bug. I downloaded the latest bigdata.war release from SourceForge, deployed it on the latest Tomcat release with the out-of-the-box configuration. Then I went to localhost:8080/bigdata and ran this INSERT query:

    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    PREFIX umbel: <http://umbel.org/umbel/>
    INSERT DATA {
      <http://example/book1> skos:narrower <http://example/chapter1> .
      <http://example/book1> umbel:isRelatedToClass <http://example/book> .
    }

And then I tried this SELECT:

    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    PREFIX umbel: <http://umbel.org/umbel/>
    SELECT ?p ?x WHERE {
      { ?concept skos:narrower ?x BIND (skos:narrower AS ?p) . }
      UNION
      { ?concept umbel:isRelatedToClass ?x BIND (umbel:isRelatedToClass AS ?p) . }
      FILTER (?concept IN (<http://example/book1>))
    }

I would expect this:

    ?p                      ?x
    skos:narrower           <http://example/chapter1>
    umbel:isRelatedToClass  <http://example/book>

But when I run this query many times, I always get either the first row or the second, but never both. This looks to me like a race condition somewhere in the code that handles BIND or UNION. Two questions:

1. Is this a bug? What should the behavior be?
2. I can rephrase this query to say { ?concept ?p ?x FILTER (?p IN (skos:narrower, umbel:isRelatedToClass)) . }, which works OK. Is the variant with BIND likely to perform better (when it works)? Could anyone confirm?

-- 
Antoni Myłka
Software Engineer
basis06 AG, Birkenweg 61, CH-3013 Bern - Fon +41 31 311 32 22
http://www.basis06.ch - source of smart business
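For reference, the result set that SPARQL semantics call for here can be worked out by hand. The sketch below is plain Python, not bigdata or any RDF library: the triple list and the `union_with_bind` helper are hypothetical stand-ins used only to spell out what each UNION branch should contribute.

```python
# Illustrative stand-ins for the two predicates in the query above.
SKOS_NARROWER = "http://www.w3.org/2004/02/skos/core#narrower"
UMBEL_RELATED = "http://umbel.org/umbel/isRelatedToClass"

# The two triples inserted by the INSERT DATA above, as (s, p, o) tuples.
triples = [
    ("http://example/book1", SKOS_NARROWER, "http://example/chapter1"),
    ("http://example/book1", UMBEL_RELATED, "http://example/book"),
]

def union_with_bind(triples):
    """Each UNION branch matches one predicate and BINDs ?p to it;
    the FILTER keeps only solutions where ?concept = <http://example/book1>.
    The result is the concatenation of both branches' solutions."""
    results = []
    for branch_pred in (SKOS_NARROWER, UMBEL_RELATED):
        for s, p, o in triples:
            if p == branch_pred and s == "http://example/book1":
                results.append((branch_pred, o))  # one (?p, ?x) row
    return results

rows = union_with_bind(triples)
# Per UNION semantics, BOTH rows must come back:
#   (skos:narrower,          <http://example/chapter1>)
#   (umbel:isRelatedToClass, <http://example/book>)
```

If a conforming engine returns only one of the two rows on any run, that is a bug, which is consistent with the race-condition suspicion above.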
|
From: Jeremy C. <jj...@gm...> - 2013-09-12 17:33:46
|
This is my current list of bigdata-related work …

To-do:
- 732: CBD options, one-line fix
- 736: MIN - produce test case and initial exploration
- 739: BIND and optional path: test case and initial exploration
- 740: performance NSPIN, revisit
- 725: FILTER EXISTS - not really sure on next steps …
- 737: Class Cast Exception, do I need to do anything here?

(review earlier e-mails of such lists and see if I have forgotten something :) )

However, I have burned my bigdata-related time budget (on 740) for this week, and probably next week too, and need to get back to other non-bigdata work items. When I do get back to bigdata I will pick up on 732 as an easy win, and 736 and 739 as easy to move forward.

Jeremy
|
From: Bryan T. <br...@sy...> - 2013-09-12 13:25:04
|
I need to move the following interfaces from com.bigdata.striterator into cutthecrap.utils.striterator.
- com.bigdata.striterator.ICloseable
- com.bigdata.striterator.ICloseableIterator
This is to support the compilation of the CTC striterator package as a distinct module. Right now it depends on the com.bigdata.striterator package. I need to break that dependency.
This will touch a large number of files. However, it should be straightforward to reconcile any conflicts that result. Just fix up the imports for those interfaces.
Thanks,
Bryan
|
|
From: Bryan T. <br...@sy...> - 2013-09-12 12:34:42
|
Jeremy,
The code is not specifically optimized for a single or dual core CPU. If you are trying to tune performance for that situation, then I would recommend looking at the following properties:
- NSPIN - I think that this is a red herring, but who knows. My thought is that you are adjusting the likelihood of a context switch when changing this value. I would suggest working with the parameters discussed below and obtaining the stack frames for slow producers and consumers in order to understand what parts of the query are the bottleneck in your use case.
- CHUNK_CAPACITY - This is the size of a vectored chunk of solutions. The default is 100. Query performance can be improved for some queries by increasing this value. However, if you have a highly concurrent workload, then a larger value will increase the heap pressure and the GC time and result in lower throughput. Try 1,000 or 10,000. The larger the value, the fewer times any given operator will execute; therefore this can affect context switching. Larger values will tend to cause each operator to execute once and will therefore tend to increase the latency to the first result, but may decrease the latency to the last result.
    IChunkedIterator:: // This will affect iterator patterns.
    int DEFAULT_CHUNK_SIZE = 100;
    BufferAnnotations:: // This will affect query operators.
    int DEFAULT_CHUNK_SIZE = 100;
Some other relevant configuration options are defined on PipelineOp.Annotations. I can answer questions about the other options as you become oriented to this part of the code.
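The chunk-capacity tradeoff described above can be made concrete with a small sketch. This is illustrative plain Python, not bigdata code; the solution counts are hypothetical, and `chunks` simply stands in for how a vectored operator consumes its input one chunk at a time.

```python
def chunks(solutions, chunk_capacity):
    """Vector a stream of solutions into chunks of at most chunk_capacity,
    the way a vectored query operator would consume them."""
    for i in range(0, len(solutions), chunk_capacity):
        yield solutions[i:i + chunk_capacity]

# Hypothetical workload of 10,000 intermediate solutions.
solutions = list(range(10_000))

# Each chunk corresponds to one operator invocation, so a larger capacity
# means fewer invocations (less context switching) at the cost of a longer
# wait before the first chunk is emitted.
invocations_default = sum(1 for _ in chunks(solutions, 100))     # default capacity
invocations_large = sum(1 for _ in chunks(solutions, 10_000))    # one big chunk
```

With the default capacity of 100, the operator runs 100 times over this workload; at 10,000 it runs once, which is the "execute once, later first result" behavior described above.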
I am reassigning https://sourceforge.net/apps/trac/bigdata/ticket/740 to you. Please see my comments there.
Thanks,
Bryan
On 9/11/13 11:05 PM, "Jeremy J Carroll" <jj...@sy...<mailto:jj...@sy...>> wrote:
Since the typical scenario is multiple queries, multiple operators, and multiple operation execution phases all running in parallel, there is generally work available to be done somewhere.
Yes - the improvement in the multi client scenario in the report is less than in the single client scenario, but still pretty impressive.
I am of course thinking about the Syapse system, where each deployment may have a relatively small number of users (e.g. 10 or 20), only one or two of whom may be active at any one time. So we may have a server with, say, a dual-core processor with hyper-threading, actually serving just one person.
A different usage scenario for Syapse is a batch job with one enormous query.
While this may differ from the typical bigdata user, I don't think it is totally abnormal.
|
|
From: Jeremy J C. <jj...@sy...> - 2013-09-12 03:27:18
|
> Since the typical scenario is multiple queries, multiple operators, and multiple operation execution phases all running in parallel, there is generally work available to be done somewhere.

Yes - the improvement in the multi-client scenario in the report is less than in the single-client scenario, but still pretty impressive.

I am of course thinking about the Syapse system, where each deployment may have a relatively small number of users (e.g. 10 or 20), only one or two of whom may be active at any one time. So we may have a server with, say, a dual-core processor with hyper-threading, actually serving just one person.

A different usage scenario for Syapse is a batch job with one enormous query.

While this may differ from the typical bigdata user, I don't think it is totally abnormal.
|
From: Bryan T. <br...@sy...> - 2013-09-12 00:57:38
|
It might be low, but if I recall, when it falls out of that spin it is really just falling into another loop.

I just took a peek at the code. The whole thing is wrapped by a while(true). It checks for an asynchronous close. If there is nothing, then it drops into a non-blocking poll() in the NSPIN loop. Then it will drop into a blocking poll() with a timeout. If all of that fails, it is going to wind up reentering from the top of the loop.

When you play with NSPIN, you are playing with how long the CPU will spin on that thread looking for something from the producer. When you ramp that value up, it spins longer. If that results in higher throughput, then this may be a tradeoff point where less context switching is occurring and the net yield is better throughput.

However, normally the producer is dropping chunks of something (solutions, IVs, Values) onto the BlockingBuffer. If it hits the poll() with the timeout, then my expectation is that it will wake up a bit later and find that there is some work to be done. Since the typical scenario is multiple queries, multiple operators, and multiple operation execution phases all running in parallel, there is generally work available to be done somewhere.
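The spin-then-poll consumer loop described above can be sketched as follows. This is an illustrative Python stand-in (using `queue.Queue` in place of the BlockingBuffer), not the Java code; the `closed` callback and chunk strings are hypothetical.

```python
import queue

NSPIN = 100  # number of non-blocking polls before falling back to a timed wait

def consume(buf, closed):
    """Sketch of the _hasNext() loop: check for an asynchronous close, spin
    with non-blocking polls, then fall back to a blocking poll with a timeout,
    and re-enter from the top of the loop if all of that comes up empty."""
    out = []
    while True:
        if closed() and buf.empty():        # asynchronous close, nothing left
            return out
        chunk = None
        for _ in range(NSPIN):              # spin phase: cheap non-blocking polls
            try:
                chunk = buf.get_nowait()
                break
            except queue.Empty:
                pass
        if chunk is None:
            try:
                chunk = buf.get(timeout=0.01)  # blocking poll with a timeout
            except queue.Empty:
                continue                       # re-enter from the top of the loop
        out.append(chunk)

# Hypothetical producer has already dropped two chunks and closed the buffer.
buf = queue.Queue()
for c in ("chunk-1", "chunk-2"):
    buf.put(c)
result = consume(buf, closed=lambda: True)
```

Raising NSPIN stretches the spin phase, trading CPU burn for fewer context switches before the timed wait, which is the tradeoff discussed above.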
Try getting those stack frames and also see what's going into / out of the buffer. There is a log timeout that you can mess with if you want to see when the producer is slow (the consumer is blocking). You can enable that with

    private static final boolean producerConsumerWarnings = false;

But, again, this is typically because of a bad join in the plan. For example, you might be spinning waiting for the final solutions while the join is doing too much work and the work is getting eliminated by a filter.

If you set the log @ INFO it should grab those stack frames. It will log them automatically in _hasNext() if the logTimeout is exceeded and the logger is at INFO or finer.
Bryan
On 9/11/13 8:35 PM, "Jeremy J Carroll" <jj...@sy...> wrote:
>I will try and work out how to get you something more concrete tomorrow.
>
>I thought your 100 looked somewhat low for a spin lock, since I
>remembered being surprised at how high
>java.util.concurrent.locks.AbstractQueuedLongSynchronizer.spinForTimeoutThreshold
>is (1000 ns, maybe 2000 spins) … then suck it and see pushed the number
>higher.
>
>Jeremy J Carroll
>Principal Architect
>Syapse, Inc.
>
>
>
>On Sep 11, 2013, at 5:00 PM, Bryan Thompson <br...@sy...> wrote:
>
>> Can you obfuscate the data and provide queries so we can reproduce this
>>workload? That would make it easier to have some understanding of the
>>problem. But that is not really a ticket we can work as such. The
>>problem needs to be reproducible. Alternatively, can you reproduce a
>>beneficial effect by mucking around with NSPIN on a known benchmark?
>>E.g., BSBM. Right now, I suspect the configuration and query plans.
>>
>> Anything slow with only 57000 quads is going to be a bad join resulting
>>in an imbalance in the consumers and producers and possibly spamming the
>>heap. Take out each query from the mix in a process of elimination to
>>identify the culprits or just look at each query plan by hand - NSS has
>>an explain page for doing this.
>>
>> I think that nspin is a red herring. Look at the time for each query.
>>Which ones are running slowly? Look at their query plans.
>>
>> There is an implementation of a runtime query optimizer that is not yet
>>integrated into the SPARQL layer. If you are feeling ambitious you can
>>code up the triple patterns and use that to see how it orders the joins
>>based on the estimated cardinality from sampling cut-off join paths.
>>JoinGraph is the entry point. This implements the ROX approach to chain
>>sampling with some minor variations.
>>
>> Bryan
>>
>> On Sep 11, 2013, at 7:44 PM, "Jeremy J Carroll" <jj...@sy...> wrote:
>>
>>> I have written up the performance issue as trac740
>>>
>>> I am coming from observing the code essentially as a black box; maybe
>>>someone who understands the code better might care to review my write
>>>up and the recommended response.
>>>
>>>
>>> Jeremy J Carroll
>>> Principal Architect
>>> Syapse, Inc.
>>>
>>>
>>>
>>>
>>>
>
|
|
From: Jeremy J C. <jj...@sy...> - 2013-09-12 00:36:00
|
I will try and work out how to get you something more concrete tomorrow.

I thought your 100 looked somewhat low for a spin lock, since I remembered being surprised at how high java.util.concurrent.locks.AbstractQueuedLongSynchronizer.spinForTimeoutThreshold is (1000 ns, maybe 2000 spins) … then suck it and see pushed the number higher.

Jeremy J Carroll
Principal Architect
Syapse, Inc.

On Sep 11, 2013, at 5:00 PM, Bryan Thompson <br...@sy...> wrote:

> Can you obfuscate the data and provide queries so we can reproduce this workload? That would make it easier to have some understanding of the problem. But that is not really a ticket we can work as such. The problem needs to be reproducible. Alternatively, can you reproduce a beneficial effect by mucking around with NSPIN on a known benchmark? E.g., BSBM. Right now, I suspect the configuration and query plans.
>
> Anything slow with only 57000 quads is going to be a bad join resulting in an imbalance in the consumers and producers and possibly spamming the heap. Take out each query from the mix in a process of elimination to identify the culprits, or just look at each query plan by hand - NSS has an explain page for doing this.
>
> I think that nspin is a red herring. Look at the time for each query. Which ones are running slowly? Look at their query plans.
>
> There is an implementation of a runtime query optimizer that is not yet integrated into the SPARQL layer. If you are feeling ambitious you can code up the triple patterns and use that to see how it orders the joins based on the estimated cardinality from sampling cut-off join paths. JoinGraph is the entry point. This implements the ROX approach to chain sampling with some minor variations.
>
> Bryan
>
> On Sep 11, 2013, at 7:44 PM, "Jeremy J Carroll" <jj...@sy...> wrote:
>
>> I have written up the performance issue as trac740
>>
>> I am coming from observing the code essentially as a black box; maybe someone who understands the code better might care to review my write-up and the recommended response.
>>
>> Jeremy J Carroll
>> Principal Architect
>> Syapse, Inc.
|
From: Bryan T. <br...@sy...> - 2013-09-12 00:18:51
|
There are some options that you can enable in that class to collect stack frames when the blocking buffer is allocated. That generally will tell you who the producer is. This might be a static field or perhaps is triggered automatically at a suitable log level - I am not in front of the code. The consumer is whoever is calling hasNext() on the iterator. There are a lot of possible use patterns:
- materializing RDF Values from IVs when projecting out the results of a query;
- asynchronous iterator patterns on access paths are used when the key-range scan has a high cardinality;
- operators now consume what amounts to a list of chunks and produce chunks that are then dropped into a Deque for the target downstream operator;
- there are a variety of asynchronous chunked iterator patterns (including value materialization) that use the blocking buffer.

If you enable those stack frame grabs, then you can figure out which buffer is spinning while waiting on the producer and who that producer is. A lot of this is also visible in the explain page for a query in terms of the number of solutions read from access paths, the number of solutions in and out of an operator, etc. You can basically see how the intermediate cardinality changes as the solutions flow through the operators. You can also see the number of times each operator is invoked and the total time used by each operator.

B

On Sep 11, 2013, at 8:03 PM, "Jeremy J Carroll" <jj...@sy...> wrote:

> I was being somewhat naive … and I tried a lot of values, and 100000 did seem to be better (e.g. 20%) than both 30000 and 300000, as well as a lot better than 100. I am unclear who the producer and consumers were, since I didn't try and understand the code to that point … maybe you could suggest further drilling. My tests definitely show that there is an unnecessary performance hole somewhere related to this!
>
> On Sep 11, 2013, at 4:45 PM, Bryan Thompson <br...@sy...> wrote:
>
>> That said, things that point at the blocking buffer class generally have a root cause in a slow producer or a slow consumer. I have never seen the blocking buffer itself at fault. Running the spin lock counter up and getting better performance is just playing games with the expected latency of arrival in the queue.
|
From: Jeremy J C. <jj...@sy...> - 2013-09-12 00:03:23
|
I was being somewhat naive … and I tried a lot of values, and 100000 did seem to be better (e.g. 20%) than both 30000 and 300000, as well as a lot better than 100. I am unclear who the producer and consumers were, since I didn't try and understand the code to that point … maybe you could suggest further drilling. My tests definitely show that there is an unnecessary performance hole somewhere related to this!

On Sep 11, 2013, at 4:45 PM, Bryan Thompson <br...@sy...> wrote:

> That said, things that point at the blocking buffer class generally have a root cause in a slow producer or a slow consumer. I have never seen the blocking buffer itself at fault. Running the spin lock counter up and getting better performance is just playing games with the expected latency of arrival in the queue.
|
From: Bryan T. <br...@sy...> - 2013-09-12 00:01:14
|
Can you obfuscate the data and provide queries so we can reproduce this workload? That would make it easier to have some understanding of the problem. But that is not really a ticket we can work as such. The problem needs to be reproducible. Alternatively, can you reproduce a beneficial effect by mucking around with NSPIN on a known benchmark? E.g., BSBM. Right now, I suspect the configuration and query plans.

Anything slow with only 57000 quads is going to be a bad join resulting in an imbalance in the consumers and producers and possibly spamming the heap. Take out each query from the mix in a process of elimination to identify the culprits, or just look at each query plan by hand - NSS has an explain page for doing this.

I think that nspin is a red herring. Look at the time for each query. Which ones are running slowly? Look at their query plans.

There is an implementation of a runtime query optimizer that is not yet integrated into the SPARQL layer. If you are feeling ambitious you can code up the triple patterns and use that to see how it orders the joins based on the estimated cardinality from sampling cut-off join paths. JoinGraph is the entry point. This implements the ROX approach to chain sampling with some minor variations.

Bryan

On Sep 11, 2013, at 7:44 PM, "Jeremy J Carroll" <jj...@sy...> wrote:

> I have written up the performance issue as trac740
>
> I am coming from observing the code essentially as a black box; maybe someone who understands the code better might care to review my write-up and the recommended response.
>
> Jeremy J Carroll
> Principal Architect
> Syapse, Inc.
|
From: Bryan T. <br...@sy...> - 2013-09-11 23:45:51
|
BlockingBuffer should be replaced by a Deque at some point, using a poison pill pattern for the producer to indicate that no more data is available. It originally supported asynchronous streaming iterators on a cluster. Now all such operations rely on chunked processing on the query engine. The class is used extensively. It has not been pulled out because of the extensive use and the lack of evidence that it is actually limiting performance.

That said, things that point at the blocking buffer class generally have a root cause in a slow producer or a slow consumer. I have never seen the blocking buffer itself at fault. Running the spin lock counter up and getting better performance is just playing games with the expected latency of arrival in the queue. Though I am surprised that spinning that long ever helps.

Bryan

On Sep 11, 2013, at 5:39 PM, "Jeremy J Carroll" <jj...@sy...> wrote:

> I am still working on it. The profiler led me into looking at com.bigdata.relation.accesspath.BlockingBuffer.NSPIN, which I duplicated into two variables, one for each use, and changed the one for com.bigdata.relation.accesspath.BlockingBuffer.BlockingIterator._hasNext(long) from 100 to 100000 with good effect on my machine. My tests speeded up from 0m14.018s to 0m4.607s, but … back on a different box, this had no effect. My machine is a Mac with SSD and quad core with HT; the different box was an AWS large, I think, with dual core.
>
> With both values set to 1000 (both in com.bigdata.relation.accesspath.BlockingBuffer.BlockingIterator._hasNext(long) and com.bigdata.relation.accesspath.BlockingBuffer.add(E, long, TimeUnit)) I fell into a total hole on the quad core with SSD machine. The first run of the tests took over 2m. On start up the tests are always a bit slower, maybe taking twice as long, but 2m was totally overboard!
>
> I have also spent some time looking at the call to randomUUID() for the query op … but don't seem to be able to make much progress on it. bigdata/src/java/com/bigdata/bop/engine/QueryEngine.java line 1039
>
> I note that BlockingBuffer is used for multiple purposes and wonder whether different numbers in different places might make sense … or maybe this should be configurable.
>
> Jeremy J Carroll
> Principal Architect
> Syapse, Inc.
>
> On Sep 11, 2013, at 1:22 PM, Bryan Thompson <br...@sy...> wrote:
>
>> Jeremy,
>>
>> Did you get any further with this?
>>
>> Thanks,
>> Bryan
>>
>> From: Bryan Thompson <br...@sy...>
>> Date: Tuesday, September 10, 2013 5:36 PM
>> To: Jeremy Carroll <jj...@sy...>
>> Cc: Big...@li...
>> Subject: Re: [Bigdata-developers] performance question
>>
>> I have never tried limiting the backend to a single core. Bigdata uses threads to schedule IOs (it does not yet have a dependency on the AIO features in Java 7). So it will always use multiple threads. It will execute queries with plenty of parallelism:
>> - Different queries run concurrently
>> - Each query can run multiple operators concurrently, depending on when intermediate solutions become available for the operators
>> - Each operator in a query can execute concurrently if there is enough data in the queue for that operator and the implementation of the operator supports parallelism.
>>
>> We run benchmarks with concurrent query and there are no known thread contention hot spots.
>>
>> What are you using to connect to bigdata? If you are trying to query the unisolated connection or executing updates, then those operations will be serialized. Any read-only view of the database is completely non-blocking (outside of contention when there is a need to load a page after a page miss). There is a thread pool for the NSS that determines the maximum number of concurrent queries that it will allow.
>>
>> You can look at the /status html page of the NSS and see the queries that are actively running – there is a hyperlink on the page for this.
>> Thanks,
>> Bryan
>>
>> From: Jeremy Carroll <jj...@sy...>
>> Date: Tuesday, September 10, 2013 5:24 PM
>> To: Bryan Thompson <br...@sy...>
>> Cc: Big...@li...
>> Subject: Re: [Bigdata-developers] performance question
>>
>> On Sep 10, 2013, at 1:08 PM, Bryan Thompson <br...@sy...> wrote:
>>
>>> I would look at the performance of each query individually and see if any of them is an obvious outlier with a bad query plan.
>>
>> the simplistic test environment prints a '.' after each query, and this comes at a fairly steady rate … so I don't think it is query related.
>>
>> Thanks for the various suggestions, I am working through them, and a couple of my own or my colleagues' ….
>> I will update here when I have a solution.
>>
>> The "parallelizing the load" is the difference between the following two shell commands; nosetests is a Python test harness, which runs all the tests in each file passed to it.
>>
>> # run the client tests (11 queries) 6 times over, for 66 queries, one after the other
>> time nosetests python/syapse/apps/search/tests/syql_test.py python/syapse/apps/search/tests/syql_test.py python/syapse/apps/search/tests/syql_test.py python/syapse/apps/search/tests/syql_test.py python/syapse/apps/search/tests/syql_test.py python/syapse/apps/search/tests/syql_test.py
>>
>> # run the client tests (11 queries) from 6 different parallel sub-shells, making 66 concurrent queries
>> for c in 1 2 3 4 5 6
>> do
>> time nosetests python/syapse/apps/search/tests/syql_test.py &
>> done
>>
>> Even with my OS limiting my h/w to one core, the parallel query has comparable performance compared with our previous solution, which does not parallelize well.
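The Deque-plus-poison-pill replacement mentioned at the top of this message can be sketched as follows. This is an illustrative Python stand-in (a thread-safe `queue.Queue` in place of the Java Deque), not the bigdata implementation; the item values and helper names are hypothetical.

```python
import queue
import threading

POISON = object()  # unique sentinel: the producer's "no more data" signal

def producer(q, items):
    """Drop each item on the queue, then the poison pill instead of
    relying on an asynchronous close flag."""
    for item in items:
        q.put(item)
    q.put(POISON)

def consume_all(q):
    """Drain the queue until the poison pill arrives. No spin loop or
    timed poll is needed: a plain blocking take suffices, because the
    sentinel unambiguously marks end-of-stream."""
    out = []
    while True:
        item = q.get()          # plain blocking take
        if item is POISON:
            return out
        out.append(item)

q = queue.Queue()
t = threading.Thread(target=producer, args=(q, ["a", "b", "c"]))
t.start()
result = consume_all(q)
t.join()
```

The design point is that end-of-stream becomes an ordinary queue element, so the consumer never has to guess (via spin counts or poll timeouts) whether the producer is merely slow or actually finished.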
|
From: Jeremy J C. <jj...@sy...> - 2013-09-11 23:44:11
|
I have written up the performance issue as trac740.

I am coming from observing the code essentially as a black box; maybe someone who understands the code better might care to review my write-up and the recommended response.

Jeremy J Carroll
Principal Architect
Syapse, Inc.
|
From: Jeremy J C. <jj...@sy...> - 2013-09-11 21:39:42
|
I am still working on it. The profiler led me into looking at com.bigdata.relation.accesspath.BlockingBuffer.NSPIN, which I duplicated into two variables, one for each use, and changed the one for com.bigdata.relation.accesspath.BlockingBuffer.BlockingIterator._hasNext(long) from 100 to 100000 with good effect on my machine. My tests speeded up from 0m14.018s to 0m4.607s, but … back on a different box, this had no effect. My machine is a Mac with SSD and quad core with HT; the different box was an AWS large, I think, with dual core.

With both values set to 1000 (both in com.bigdata.relation.accesspath.BlockingBuffer.BlockingIterator._hasNext(long) and com.bigdata.relation.accesspath.BlockingBuffer.add(E, long, TimeUnit)) I fell into a total hole on the quad core with SSD machine. The first run of the tests took over 2m. On start up the tests are always a bit slower, maybe taking twice as long, but 2m was totally overboard!

I have also spent some time looking at the call to randomUUID() for the query op … but don't seem to be able to make much progress on it. bigdata/src/java/com/bigdata/bop/engine/QueryEngine.java line 1039

I note that BlockingBuffer is used for multiple purposes and wonder whether different numbers in different places might make sense … or maybe this should be configurable.

Jeremy J Carroll
Principal Architect
Syapse, Inc.

On Sep 11, 2013, at 1:22 PM, Bryan Thompson <br...@sy...> wrote:

> Jeremy,
>
> Did you get any further with this?
>
> Thanks,
> Bryan
>
> From: Bryan Thompson <br...@sy...>
> Date: Tuesday, September 10, 2013 5:36 PM
> To: Jeremy Carroll <jj...@sy...>
> Cc: Big...@li...
> Subject: Re: [Bigdata-developers] performance question
>
> I have never tried limiting the backend to a single core. Bigdata uses threads to schedule IOs (it does not yet have a dependency on the AIO features in Java 7). So it will always use multiple threads. It will execute queries with plenty of parallelism:
> - Different queries run concurrently
> - Each query can run multiple operators concurrently, depending on when intermediate solutions become available for the operators
> - Each operator in a query can execute concurrently if there is enough data in the queue for that operator and the implementation of the operator supports parallelism.
>
> We run benchmarks with concurrent query and there are no known thread contention hot spots.
>
> What are you using to connect to bigdata? If you are trying to query the unisolated connection or executing updates, then those operations will be serialized. Any read-only view of the database is completely non-blocking (outside of contention when there is a need to load a page after a page miss). There is a thread pool for the NSS that determines the maximum number of concurrent queries that it will allow.
>
> You can look at the /status html page of the NSS and see the queries that are actively running – there is a hyperlink on the page for this.
>
> Thanks,
> Bryan
>
> From: Jeremy Carroll <jj...@sy...>
> Date: Tuesday, September 10, 2013 5:24 PM
> To: Bryan Thompson <br...@sy...>
> Cc: Big...@li...
> Subject: Re: [Bigdata-developers] performance question
>
> On Sep 10, 2013, at 1:08 PM, Bryan Thompson <br...@sy...> wrote:
>
>> I would look at the performance of each query individually and see if any of them is an obvious outlier with a bad query plan.
>
> the simplistic test environment prints a '.' after each query, and this comes at a fairly steady rate … so I don't think it is query related.
>
> Thanks for the various suggestions, I am working through them, and a couple of my own or my colleagues' ….
> I will update here when I have a solution.
>
> The "parallelizing the load" is the difference between the following two shell commands; nosetests is a Python test harness, which runs all the tests in each file passed to it.
>
> # run the client tests (11 queries) 6 times over, for 66 queries, one after the other
> time nosetests python/syapse/apps/search/tests/syql_test.py python/syapse/apps/search/tests/syql_test.py python/syapse/apps/search/tests/syql_test.py python/syapse/apps/search/tests/syql_test.py python/syapse/apps/search/tests/syql_test.py python/syapse/apps/search/tests/syql_test.py
>
> # run the client tests (11 queries) from 6 different parallel sub-shells, making 66 concurrent queries
> for c in 1 2 3 4 5 6
> do
> time nosetests python/syapse/apps/search/tests/syql_test.py &
> done
>
> Even with my OS limiting my h/w to one core, the parallel query has comparable performance compared with our previous solution, which does not parallelize well.
|
From: Bryan T. <br...@sy...> - 2013-09-11 20:23:29
|
Jeremy,

Did you get any further with this?

Thanks,
Bryan

From: Bryan Thompson <br...@sy...>
Date: Tuesday, September 10, 2013 5:36 PM
To: Jeremy Carroll <jj...@sy...>
Cc: Big...@li...
Subject: Re: [Bigdata-developers] performance question

I have never tried limiting the backend to a single core. Bigdata uses threads to schedule IOs (it does not yet have a dependency on the AIO features in Java 7). So it will always use multiple threads. It will execute queries with plenty of parallelism:
- Different queries run concurrently
- Each query can run multiple operators concurrently, depending on when intermediate solutions become available for the operators
- Each operator in a query can execute concurrently if there is enough data in the queue for that operator and the implementation of the operator supports parallelism.

We run benchmarks with concurrent query and there are no known thread contention hot spots.

What are you using to connect to bigdata? If you are trying to query the unisolated connection or executing updates, then those operations will be serialized. Any read-only view of the database is completely non-blocking (outside of contention when there is a need to load a page after a page miss). There is a thread pool for the NSS that determines the maximum number of concurrent queries that it will allow.

You can look at the /status html page of the NSS and see the queries that are actively running – there is a hyperlink on the page for this.
Thanks, Bryan From: Jeremy Carroll <jj...@sy...<mailto:jj...@sy...>> Date: Tuesday, September 10, 2013 5:24 PM To: Bryan Thompson <br...@sy...<mailto:br...@sy...>> Cc: "Big...@li...<mailto:Big...@li...>" <Big...@li...<mailto:Big...@li...>> Subject: Re: [Bigdata-developers] performance question On Sep 10, 2013, at 1:08 PM, Bryan Thompson <br...@sy...<mailto:br...@sy...>> wrote: I would look at the performance of each query individually and see if any of them is an obvious outlier with a bad query plan. the simplistic test environment prints a '.' after each query, and this come at a fairly steady rate … so I don't think it is query related. Thanks for the various suggestions, I am working through them, and a couple of my own or my colleagues …. I will update here when I have a solution. The "parallelizing the load" is the difference between the following two shell commands: nosetests is a python test harness, which runs all the tests in each file passed to it. # run the client tests (11 queries) 6 times over, for 66 queries, one after the other time nosetests python/syapse/apps/search/tests/syql_test.py python/syapse/apps/search/tests/syql_test.py python/syapse/apps/search/tests/syql_test.py python/syapse/apps/search/tests/syql_test.py python/syapse/apps/search/tests/syql_test.py python/syapse/apps/search/tests/syql_test.py and # run the client tests (11 queries) from 6 different parallel sub-shells, making 66 concurrent queries for c in 1 2 3 4 5 6 do time nosetests python/syapse/apps/search/tests/syql_test.py & done Even with my OS limiting my h/w to one core, the parallel query has comparable performance compared with our previous solution which does not parallelize well. |
|
From: Bryan T. <br...@sy...> - 2013-09-10 21:37:07
|
I have never tried limiting the backend to a single core. Bigdata uses threads to schedule IOs (it does not yet have a dependency on the AIO features in Java 7), so it will always use multiple threads. It will execute queries with plenty of parallelism:

* Different queries run concurrently.
* Each query can run multiple operators concurrently, depending on when intermediate solutions become available for the operators.
* Each operator in a query can execute concurrently if there is enough data in the queue for that operator and the implementation of the operator supports parallelism.

We run benchmarks with concurrent query and there are no known thread contention hot spots.

What are you using to connect to bigdata? If you are trying to query the unisolated connection or executing updates, then those operations will be serialized. Any read-only view of the database is completely non-blocking (outside of contention when there is a need to load a page after a page miss).

There is a thread pool for the NSS that determines the maximum number of concurrent queries that it will allow. You can look at the /status html page of the NSS and see the queries that are actively running – there is a hyperlink on the page for this.

Thanks,
Bryan |
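Bryan's point above, that read-only views run concurrently while unisolated/update operations serialize, can be illustrated with a small Python sketch. The `read_query` and `update` functions here are hypothetical stand-ins, not the bigdata API; the serialization is modeled with an ordinary lock.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins (not the bigdata API): reads share a snapshot and
# never block each other; writes go through a single lock, so they serialize.
write_lock = threading.Lock()

def read_query(i):
    # Read-only view: no lock taken, many can run at once.
    time.sleep(0.05)          # simulate query work
    return "read-%d" % i

def update(i):
    # Unisolated/update operation: one at a time.
    with write_lock:
        time.sleep(0.05)      # simulate mutation work
    return "write-%d" % i

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    reads = list(pool.map(read_query, range(8)))
read_time = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    writes = list(pool.map(update, range(8)))
write_time = time.perf_counter() - start

# 8 concurrent reads finish in roughly one sleep; 8 serialized writes
# take roughly eight sleeps, so write_time is much larger.
print(read_time, write_time)
```

This is why a workload that interleaves updates with queries can look serialized even though the query engine itself is highly parallel.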
|
From: Jeremy J C. <jj...@sy...> - 2013-09-10 21:24:41
|
On Sep 10, 2013, at 1:08 PM, Bryan Thompson <br...@sy...> wrote:

> I would look at the performance of each query individually and see if any
> of them is an obvious outlier with a bad query plan.

The simplistic test environment prints a '.' after each query, and these come at a fairly steady rate, so I don't think it is query related.

Thanks for the various suggestions; I am working through them, plus a couple of my own or my colleagues'. I will update here when I have a solution.

The "parallelizing the load" is the difference between the following two shell commands. (nosetests is a Python test harness, which runs all the tests in each file passed to it.)

# run the client tests (11 queries) 6 times over, for 66 queries, one after the other
time nosetests python/syapse/apps/search/tests/syql_test.py python/syapse/apps/search/tests/syql_test.py python/syapse/apps/search/tests/syql_test.py python/syapse/apps/search/tests/syql_test.py python/syapse/apps/search/tests/syql_test.py python/syapse/apps/search/tests/syql_test.py

and

# run the client tests (11 queries) from 6 different parallel sub-shells, making 66 concurrent queries
for c in 1 2 3 4 5 6
do
  time nosetests python/syapse/apps/search/tests/syql_test.py &
done

Even with my OS limiting my h/w to one core, the parallel query has comparable performance compared with our previous solution, which does not parallelize well. |
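The sequential-versus-parallel comparison above can be mimicked in pure Python. `run_suite` below is a hypothetical stand-in for one nosetests invocation, modeling the client as mostly waiting on the server (which is why parallel sub-shells help even on one core):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_suite(n_queries=11, per_query=0.01):
    # Stand-in for one nosetests run: the client mostly waits on the
    # server, so each "query" is modeled as a short sleep.
    for _ in range(n_queries):
        time.sleep(per_query)
    return n_queries

# Sequential: 6 runs back to back (like one long nosetests command line).
start = time.perf_counter()
sequential_total = sum(run_suite() for _ in range(6))
sequential_time = time.perf_counter() - start

# Parallel: 6 runs in concurrent threads (like six backgrounded sub-shells).
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=6) as pool:
    parallel_total = sum(pool.map(lambda _: run_suite(), range(6)))
parallel_time = time.perf_counter() - start

# Both do 66 "queries", but the parallel wall time is far lower because
# the waits overlap instead of accumulating.
print(sequential_time, parallel_time)
```

Because the waits dominate, the speedup comes from overlapping latency, not from extra CPU cores.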
|
From: Bryan T. <br...@sy...> - 2013-09-10 20:09:51
|
The time in individual threads is aggregated by yourkit. Thread.run() is an idle thread unless there are called methods from run(); i.e., it should be ignored.

You talk about parallelizing the load (near the end) and about a query workload (up front). What is your workload? I am confused about that.

I would look at the performance of each query individually and see if any of them is an obvious outlier with a bad query plan.

Make sure that the JVM has reasonable options, e.g., -server, and assign it about 1/2 of the RAM up to 4G on your machine. With JDK 7 and the G1 garbage collector you may be able to give it more RAM, but start in a safe zone.

Thanks,
Bryan |
|
From: Jeremy J C. <jj...@sy...> - 2013-09-10 19:06:01
|
I am doing a performance comparison between a bigdata-based solution and our previous solution, and I am getting *very* confused.

My question is: what time is being used by bigdata which is not being measured as either user or sys time when running bigdata?

The task is as follows:

I have 11 queries that can be answered by both systems, and from a user point of view are identical. I ask the suite of 11 queries 6 times over. In the bigdata set-up, I am using bigdata as a SPARQL endpoint, and the queries are passed over HTTP.

I am currently just doing this on my Mac (Mountain Lion, with SSD).

The wall time to run the queries is approx 30 seconds; however, the CPU time (both user and sys) recorded against the client and the server is a lot less, with about 1 second in the client and 5 seconds in the server. I am having difficulty finding where the time is going - over 20 seconds is simply missing.

By running bigdata in the debugger and adding System.nanoTime() calls before and after QueryServlet.doQuery(), I have convinced myself that the issue is server side, not client side, and also not networking related.

When running inside yourkit, with the settings set to wall-time, the time seems to be explained in the following cryptic line:

java.lang.Thread.run() 88804ms Time, 84928ms Own Time

i.e. the vast bulk of the run-time (approximately three times the experienced time of 30 seconds) is accounted for in the Thread.run() method doing who knows what (waiting for thread scheduling?).

I am getting very similar results with either of the following changes:
- use ramdisk rather than the SSD
- use only 1 cpu without hyper-threading, instead of the quad core with hyper-threading that my machine comes with

(i.e. the actual execution time is the same with or without extra cores!)

===

I am continuing with testing; my next tests will be:
- parallelize the load and see if the quad core machine does better
- try on a linux box in AWS

Any thoughts would be appreciated.

===

I am making extensive use of named graphs, with the select queries starting with approx 40 FROM NAMED and FROM clauses; otherwise I don't think there is anything particularly funky about my queries.

Jeremy J Carroll
Principal Architect
Syapse, Inc. |
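The "missing time" described above, wall time far exceeding user+sys CPU time, is the classic signature of threads waiting (on I/O, locks, or scheduling) rather than computing. A minimal, illustrative Python sketch of the distinction (not bigdata code):

```python
import time

wall_start = time.perf_counter()      # wall-clock time
cpu_start = time.process_time()       # user+sys CPU time of this process

time.sleep(0.2)                       # waiting: consumes wall time only
sum(i * i for i in range(200_000))    # computing: consumes both

wall = time.perf_counter() - wall_start
cpu = time.process_time() - cpu_start

# The sleep shows up in wall time but not CPU time, so wall >> cpu,
# exactly the pattern of a server that is blocked rather than busy.
print(wall, cpu)
```

When a profiler attributes the gap to Thread.run() "Own Time", that is usually the same thing: threads parked in a pool or blocked on a queue, not executing user code.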
|
From: Jeremy J C. <jj...@sy...> - 2013-09-04 20:47:59
|
I have added a fix for this issue to the new 1.3.0 branch, and reassigned it to Mike for verification.
As far as I could tell, the problem was that the code assumed that at least one end of each property path was an unbound variable.
I added code to test for the end being a constant or a pre-bound variable, where the value is the required one.
Jeremy J Carroll
Principal Architect
Syapse, Inc.
|
|
From: Jeremy J C. <jj...@sy...> - 2013-09-04 17:25:15
|
My initial commit on bug 734 dropped through the cracks, and is on the 1.2.0 branch - I have added a further commit this morning.
So far, all I have done is add tests that are not linked into the main test suite.
I have found a workaround that will work for me for now, so I will return to this later.
Please let me know if you would like me to consolidate the work done so far and add it to the new dev branch.
Thanks.
FYGI the bug concerns the following query:
SELECT ?A
WHERE {
  ?A rdf:type / rdfs:subClassOf *
       <os:ClassA> ;
     rdf:value ?B .
  ?B rdf:type / rdfs:subClassOf *
       <os:ClassB>
}
As in the title - the problem comes from having two paths.
Jeremy J Carroll
Principal Architect
Syapse, Inc.
|
|
From: Bryan T. <br...@sy...> - 2013-09-04 11:37:42
|
I've posted a ticket on trac [1]. Please suggest an approach that would allow us to reconcile the existing reporting of mutation results with the human-readable representation for SPARQL UPDATE.

Thanks,
Bryan

[1] https://sourceforge.net/apps/trac/bigdata/ticket/735 |
|
From: Eugen F <feu...@ya...> - 2013-09-04 11:12:03
|
It would be useful to have an XML response to the REST SPARQL UPDATE endpoint so the data can be parsed by the caller. This has been discussed here: https://sourceforge.net/projects/bigdata/forums/forum/676946/topic/8664308 |
|
From: Bryan T. <br...@sy...> - 2013-09-02 22:25:11
|
I have updated the links from the blog that point at CI for the development branch and at the HA test suite detailed results. See below; these are also linked from the developers section on the blog.

CI results (currently a tarball – I will look at restoring navigable results):
http://www.bigdata.com/hudson-release-1.3.0/lastSuccessful/archive/BIGDATA_RELEASE_1_3_0/ant-build/classes/test/test-results/report.tgz

HA test suite – detailed results:
http://www.bigdata.com/hudson-release-1.3.0/lastSuccessful/archive/BIGDATA_RELEASE_1_3_0/ant-build/classes/test/test-results/HAtest-report.tgz

Thanks,
Bryan |
|
From: Bryan T. <br...@sy...> - 2013-09-02 17:53:50
|
All changes through r7381 in the development branch (branches/BIGDATA_RELEASE_1_2_0) have been captured in the HA development branch (branches/READ_CACHE2). At this stage, HA is feature complete and we are (and have been) in QA for an HA release.

Owing to difficulties performing a reintegration merge back to the development branch (SF SVN does not support this due to the SVN version on SF), I have instead elected to create a new development branch for 1.3.0. The new development branch is:

branches/BIGDATA_RELEASE_1_3_0

Please switch over immediately. We will do a 1.3.0 release from this branch.

Thanks,
Bryan

PS: branches/BIGDATA_RELEASE_1_2_0 should only be used for bug fixes that would go into a 1.2.4 maintenance release. |
|
From: Bryan T. <br...@sy...> - 2013-08-31 10:57:39
|
All,

I plan to reconcile the BIGDATA_RELEASE_1_2_0 development branch with the HA development branches (READ_CACHE/READ_CACHE2) and then bring back the changes to the BIGDATA_RELEASE_1_2_0 development branch. I will probably get this done over the weekend.

Thanks,
Bryan |