This list is closed, nobody may subscribe to it.
From: Toby C. <tob...@gm...> - 2014-05-05 16:41:47

Thanks Jeremy, great to get feedback. I am inexperienced with bigdata itself, so I'm sure there are many improvements that can be made to the workbench - let me know what you are after and I will implement it. As Mike said, Trac is probably the easiest way to keep on top of things.

On 5 May 2014 09:17, Jeremy Carroll <jj...@gm...> wrote:

> [Jeremy's comments quoted in full - snipped; see his original message below.]
>
> _______________________________________________
> Bigdata-developers mailing list
> Big...@li...
> https://lists.sourceforge.net/lists/listinfo/bigdata-developers
From: Mike P. <mi...@sy...> - 2014-05-05 16:21:01

Thanks Jeremy. Feedback is absolutely welcome. Please feel free to create Trac tickets and assign them to Toby Craig, the mastermind behind the new UI. I've cc'd him on this reply.

From: Jeremy Carroll <jj...@gm...>
Date: Monday, May 5, 2014 10:17 AM
To: "Big...@li..." <Big...@li...>
Subject: [Bigdata-developers] comments on new Web UI

[Jeremy's comments quoted in full - snipped; see his original message below.]
From: Jeremy C. <jj...@gm...> - 2014-05-05 16:17:39

One problem with comments is that they tend to focus on the negative - overall this is a great piece of work and a huge improvement: the old Web UI, while very usable and functional, was always a little embarrassing when showing to colleagues because of the 20th century look and feel.

Here goes:

1) Empty results are not presented well. Try queries like:

   SELECT * {}

   SELECT * { {} UNION {} }

   which should be visually distinguishable.

2) Height of empty rows:

   SELECT * { BIND (1 as ?x) }

   SELECT * { { BIND (1 as ?x) } UNION {} }

   - the empty row should be the same height as the non-empty row.

3) The FOAF and DC buttons add <li>.

4) Personally I would use a SKOS button in that row.

5) Presentation of syntax errors should be improved. E.g. try the following query:

   prefix dc: <http://purl.org/dc/elements/1.1></li>
   SELECT ?x ?y { { BIND (1 as ?x) } }

   E.g. extract the title element out of the 400 response if it matches the pattern for a syntax error message, present just that title with appropriate (lack of) escaping, extract the line and column number, and present the query with some highlighting of the problem error.

6) Presentation of stack traces is still not ideal - this does not happen very much. E.g. modify, for example, com.bigdata.rdf.sparql.ast.optimizers.AbstractJoinGroupOptimizer.optimize(AST2BOpContext, IQueryNode, IBindingSet[]) by adding the following line as the first line of the method:

   if (true) throw new RuntimeException("banana");

   then run any query. The stack trace is not correctly formatted. A good behavior in my view would be to:

   a) extract the message of the first Java exception, i.e. in this case the word "banana"
   b) display a message like: Internal Error: banana
   c) have a button "Technical Details" which, when clicked, pops up a text box with <pre> </pre> text being the stack trace.

====

Which of these are more or less critical is not my call, and overall this is a very big improvement.

Jeremy
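Item 5's suggested extraction could be sketched roughly as follows (Python rather than the workbench's JavaScript, purely illustrative; the HTML body shape and the "line N, column M" message format are assumptions, not the actual NanoSparqlServer output):

```python
import re

# Sketch: pull a syntax-error summary plus line/column out of an HTML
# 400 response whose <title> carries the parser message.

def extract_syntax_error(html_400_body):
    title = re.search(r"<title>(.*?)</title>", html_400_body, re.S)
    if not title:
        return None
    msg = title.group(1).strip()
    pos = re.search(r"line (\d+), column (\d+)", msg)
    if not pos:
        return None  # not a syntax-error message we recognize
    return {"message": msg, "line": int(pos.group(1)), "column": int(pos.group(2))}
```

With the parsed line/column in hand, the workbench could highlight the offending position in the query editor instead of showing the raw 400 page.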
From: Jeremy J C. <jj...@sy...> - 2014-05-05 12:26:12

The changes had nothing to do with the graph issue, I don't think (not verified); 874 is not fixed (disappointingly - verified only at the optimized AST level), 759 is (surprisingly - and closed as worksforme).

Jeremy

On May 5, 2014, at 5:05 AM, Bryan Thompson <br...@sy...> wrote:

> Jeremy,
>
> Can you look over the tickets at trac.bigdata.com and see which ones can
> be closed? In particular, I am curious whether any of the other FILTER
> tickets are related to and/or fixed by your recent changes or if you are
> actively working on any of these issues and expect resolution before the
> 1.3.1 release.
>
> http://trac.bigdata.com/ticket/888 (GRAPH ignored by FILTER NOT EXISTS)
> http://trac.bigdata.com/ticket/874 (FILTER not applied when there is UNION in the same join group)
> http://trac.bigdata.com/ticket/792 (GRAPH ?g { FILTER NOT EXISTS { ?s ?p ?o } } not respecting ?g)
> http://trac.bigdata.com/ticket/759 (multiple filters interfere)
>
> Thanks,
> Bryan
>
> On 5/4/14 9:12 PM, "Jeremy J Carroll" <jj...@sy...> wrote:
>
>> A quick inspection revealed a new file that I forgot to add …
>> I will check in a couple of hours to see that that fixed things
>>
>> Jeremy
>>
>> On May 4, 2014, at 6:09 PM, Jeremy J Carroll <jj...@sy...> wrote:
>>
>>> Looks like my last commit broke something … checking now
>>>
>>> Jeremy J Carroll
>>> Principal Architect
>>> Syapse, Inc.
From: Bryan T. <br...@sy...> - 2014-05-05 12:11:51

UNION is implemented as a parallel flow of solutions over alternative sub-plans. It should be very fast.

The DISTINCT UNION pattern imposes a concurrent hash map to filter for the distinct projection, which is also very fast (a different data structure with less potential concurrency is used for the analytic mode distinct).

The FILTER (NOT) EXISTS patterns use a solution set hash join. This requires building a hash index, flowing the solutions from the hash index into the sub-plan for the FILTER, and then doing the appropriate hash join for the solutions flowing out of the sub-plan for the FILTER with those in the hash index.

I wonder whether there might be an optimization available for FILTER EXISTS and FILTER NOT EXISTS for some interesting special cases. For example, if we have an OPTIONAL with a single triple pattern, we can often optimize that into a pipelined join and then just mark the join semantics as OPTIONAL. If we could do something similar for simple tests for the existence of a triple pattern, it might be faster than the sub-plan with the solution set hash index build and solution set hash index join.

Bryan

On 5/2/14 2:13 PM, "Jeremy J Carroll" <jj...@sy...> wrote:

> [Jeremy's question of 2014-05-02 quoted in full - snipped; see his original message below.]
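Bryan's description of the solution set hash join can be sketched in miniature (Python, purely illustrative; the dict-based "hash index" and callable sub-plan are my simplifications, not bigdata's actual operators):

```python
# Minimal sketch of FILTER EXISTS via a solution-set hash join.
# Solutions are dicts of variable bindings; sub_plan(solution) returns
# the sub-plan's matches for that solution's bindings.

def filter_exists(solutions, sub_plan, join_vars):
    # 1) Build the hash index over the incoming solutions,
    #    keyed by the join variables.
    hash_index = {}
    for sol in solutions:
        key = tuple(sol[v] for v in join_vars)
        hash_index.setdefault(key, []).append(sol)

    # 2) Flow the indexed solutions into the sub-plan for the FILTER,
    #    recording which keys produce at least one sub-plan solution.
    matched = set()
    for key, sols in hash_index.items():
        if any(sub_plan(sol) for sol in sols):
            matched.add(key)

    # 3) Hash join: keep the solutions whose key matched
    #    (FILTER NOT EXISTS would keep the complement).
    return [s for key, sols in hash_index.items() if key in matched for s in sols]
```

For example, with triples {("a","p","b"), ("c","p","d")} and a sub-plan implementing EXISTS { ?x p ?y }, the join keeps the solution binding ?x to "a" and drops one binding ?x to "z". The single-triple-pattern special case Bryan mentions would skip steps 1-3 entirely and probe the pattern directly per solution, pipelined.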
From: Bryan T. <br...@sy...> - 2014-05-05 12:07:34

Jeremy,

Can you look over the tickets at trac.bigdata.com and see which ones can be closed? In particular, I am curious whether any of the other FILTER tickets are related to and/or fixed by your recent changes, or if you are actively working on any of these issues and expect resolution before the 1.3.1 release.

http://trac.bigdata.com/ticket/888 (GRAPH ignored by FILTER NOT EXISTS)
http://trac.bigdata.com/ticket/874 (FILTER not applied when there is UNION in the same join group)
http://trac.bigdata.com/ticket/792 (GRAPH ?g { FILTER NOT EXISTS { ?s ?p ?o } } not respecting ?g)
http://trac.bigdata.com/ticket/759 (multiple filters interfere)

Thanks,
Bryan

On 5/4/14 9:12 PM, "Jeremy J Carroll" <jj...@sy...> wrote:

> A quick inspection revealed a new file that I forgot to add …
> I will check in a couple of hours to see that that fixed things
>
> Jeremy
>
> On May 4, 2014, at 6:09 PM, Jeremy J Carroll <jj...@sy...> wrote:
>
>> Looks like my last commit broke something … checking now
>>
>> Jeremy J Carroll
>> Principal Architect
>> Syapse, Inc.
From: Jeremy J C. <jj...@sy...> - 2014-05-05 01:17:15

Looks like my last commit broke something … checking now

Jeremy J Carroll
Principal Architect
Syapse, Inc.
From: Jeremy J C. <jj...@sy...> - 2014-05-05 01:12:47

A quick inspection revealed a new file that I forgot to add … I will check in a couple of hours to see that that fixed things

Jeremy

On May 4, 2014, at 6:09 PM, Jeremy J Carroll <jj...@sy...> wrote:

> Looks like my last commit broke something … checking now
>
> Jeremy J Carroll
> Principal Architect
> Syapse, Inc.
From: Jeremy J C. <jj...@sy...> - 2014-05-04 21:57:24

Just a word of explanation. I am continuing to explore issues with FILTER EXISTS and found the debug print out inadequate because the details of the JoinGroup in the FILTER were not shown. This commit fixes that, but touched a surprisingly large number of files because it involved more general support for toString(int indent), even if I only add additional uses of the method in the fairly limited area of FILTER EXISTS and FILTER NOT EXISTS.

I also shortened some of the annotation names in the debug print out, e.g. AST2BOpBase.estimatedCardinality instead of com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.estimatedCardinality.

An issue which I could address if desired is that now, after the ASK subqueries are lifted out of the EXISTS or NOT EXISTS, the debug print out shows the JoinGroup twice - once as a subquery and once as an annotation on the Exists node. (An approach is either to add another annotation to control the print out of the graphPattern annotation, or to remove the graphPattern annotation once we have moved it elsewhere.)

Jeremy
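The toString(int indent) support Jeremy describes follows the usual recursive pretty-printer pattern, sketched here (Python; the Node class and labels are illustrative stand-ins, not bigdata's AST classes):

```python
# Sketch: each AST node renders itself at a given indent depth and
# delegates to its children at depth + 1, so nested JoinGroups
# (e.g. inside a FILTER EXISTS) print with visible nesting.

class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

    def to_string(self, indent=0):
        lines = ["  " * indent + self.label]
        for child in self.children:
            lines.append(child.to_string(indent + 1))
        return "\n".join(lines)
```

For example, Node("Filter(EXISTS)", [Node("JoinGroup", [Node("StatementPattern(?s ?p ?o)")])]) renders the statement pattern two levels deep, which is exactly the detail that was missing from the old flat print out.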
From: Jeremy J C. <jj...@sy...> - 2014-05-03 23:09:03

> (Hopefully I will have fixed the server side issue by the time you read the message :) …. )

My hope is unfulfilled - this proved harder than I imagined, although I did get two simple fixes in on the way. I have updated the trac item 904 with the result of my investigation, including a work-around which scratches my immediate itch.

Jeremy
From: Toby C. <tob...@gm...> - 2014-05-03 21:38:13

Just committed a fix for that, so it shows the whole error response. Error highlighting that shows the exact line/character the error is present at is under development; it should be ready soon.

On 3 May 2014 14:09, Bryan Thompson <br...@sy...> wrote:

> Excellent. Thanks, Bryan
>
> On May 3, 2014, at 3:39 PM, "Jeremy J Carroll" <jj...@sy...> wrote:
>
>> [Jeremy's report quoted in full - snipped; see his original message below.]
From: Bryan T. <br...@sy...> - 2014-05-03 21:10:03

Excellent. Thanks, Bryan

> On May 3, 2014, at 3:39 PM, "Jeremy J Carroll" <jj...@sy...> wrote:
>
> [Jeremy's report quoted in full - snipped; see his original message below.]
From: Jeremy J C. <jj...@sy...> - 2014-05-03 19:39:53

I have just been doing some bug fixing on the 1.3.0 branch, and see the new web UI for the first time. It looks much, much better.

One issue is that the error reporting for something like:

prefix owl: <http://www.w3.org/2002/07/owl#>
prefix syapse: <https://test-t-jjc.syapse.com/graph/syapse#>
SELECT *
{
  FILTER ( EXISTS {
      ?property syapse:hasLiteralProperty ?q .
      ?x ?property ?concept .
    } || EXISTS {
      ?r rdfs:subClassOf + syapse:Record .
    } || EXISTS {
      ?r owl:oneOf ?rr .
    })
}

which fails with a server error, is poor, in that I just see one line saying server error. I guess that may be desirable, but at least for developers, having a wizard mode option that allows you to see the whole stack trace as before is helpful.

(Hopefully I will have fixed the server side issue by the time you read the message :) …. )

Jeremy
From: Jeremy J C. <jj...@sy...> - 2014-05-02 18:43:04

I implement fine grained access control using additional conditions that must hold on most queries: the corresponding SPARQL patterns get inserted into (nearly) all queries at the right point(s). This is using a match that is currently expressed as a UNION, and I am using a SELECT DISTINCT, effectively the same as a FILTER EXISTS, to make sure that at least one of the conditions in the union holds.

I can see three different ways of expressing this … (I simplify the example)

1)

pattern binding ?foo

{ SELECT DISTINCT ?foo {
    { ?foo eg:p eg:a } UNION { ?foo eg:q eg:b }
} }

2)

pattern binding ?foo

FILTER EXISTS {
  { ?foo eg:p eg:a } UNION { ?foo eg:q eg:b }
}

3)

pattern binding ?foo

FILTER ( EXISTS { ?foo eg:p eg:a } || EXISTS { ?foo eg:q eg:b } )

where actually I have three alternatives in my union, and two of them are three-triple matches.

Is any one of these likely to be faster than the others … (I do this a lot). In particular, one implementation might evaluate in parallel, and abort as soon as one of the alternatives is true.

I guess also, if one of these constructs is better, should we have an optimizer that maps the other(s) into it.

[I have already convinced myself that:
a) for my data I wish this to be done after ?foo is bound rather than before
b) I need to use query hints or explicit management of named subqueries or similar to ensure that this happens
c) at some point in the future (a) and (b) may not be optimal for my application (e.g. when we have a lot of users and each user has relatively small amounts of data, and has privacy set fairly high: the FILTER implements our fine-grained access control), but we will cross that bridge when we get to it]

Jeremy
From: Bryan T. <br...@sy...> - 2014-05-01 20:41:31

I am targeting a release next week. All activity in the main branch should be to nail things down for that release.

Bryan

-------- Original message --------
From: Bryan Thompson
Date: 05/01/2014 4:28 PM (GMT-05:00)
To: Big...@li...
Subject: [Bigdata-developers] 1.3.0 CI results

See http://ci.bigdata.com:8080/job/bigdata-release-1.3.0/

The branch is open again. Let's try to clear up the remaining test failures for the 1.3.1 release. If you are doing anything major, create a new branch for that now and let me know about the branch and the activity so I can track it.

Bryan
From: Bryan T. <br...@sy...> - 2014-05-01 20:28:01

See http://ci.bigdata.com:8080/job/bigdata-release-1.3.0/

The branch is open again. Let's try to clear up the remaining test failures for the 1.3.1 release. If you are doing anything major, create a new branch for that now and let me know about the branch and the activity so I can track it.

Bryan
From: Bryan T. <br...@sy...> - 2014-05-01 16:35:41

I have committed the merge. Please check out a clean copy of this branch [1] and then verify the correct functioning of the features that you worked on in the RDR branch. There was a lot of churn in this branch, and the merge was quite difficult.

[1] https://svn.code.sf.net/p/bigdata/code/branches/BIGDATA_RELEASE_1_3_0

The first CI run will be available in a few hours. Please hold off on new commits against the main branch until we have verification, both locally (from your own tests) and from CI, that the main branch is in good working order. I ran a lot of tests locally, but there were a lot of tree conflicts and I want to make sure that everything is OK.

The RDR branch is CLOSED.

Thanks,
Bryan
From: Bryan T. <br...@sy...> - 2014-05-01 00:01:37

Try a heap dump and look at the memory referenced from the MemoryOrderBy operator during the sort. That will give you exact information, but only for the referenced query.

Bryan

On Apr 30, 2014, at 7:52 PM, "Jeremy Carroll" <jj...@gm...> wrote:

> Did my 32k per solution seem excessive to you? It may be that I am misestimating the lower end of the curve …. i.e. I take the heap size, subtract quite a bit, and then divide by some more modest number to get a reasonable limit … At this stage I just need some fairly conservative figures, which can be optimized later when we have a performance test suite.
>
> Jeremy
>
> On Apr 30, 2014, at 4:11 PM, Bryan Thompson <br...@sy...> wrote:
>
>> The memory demand depends on the number of variables in those solutions and even the materialized size of the rdf values in those solutions.
>>
>> Yes, a sub-select with a limit is a reasonable approach.
From: Jeremy C. <jj...@gm...> - 2014-04-30 23:52:57

Did my 32k per solution seem excessive to you? It may be that I am misestimating the lower end of the curve …. i.e. I take the heap size, subtract quite a bit, and then divide by some more modest number to get a reasonable limit … At this stage I just need some fairly conservative figures, which can be optimized later when we have a performance test suite.

Jeremy

On Apr 30, 2014, at 4:11 PM, Bryan Thompson <br...@sy...> wrote:

> The memory demand depends on the number of variables in those solutions and even the materialized size of the rdf values in those solutions.
>
> Yes, a sub-select with a limit is a reasonable approach.
From: Bryan T. <br...@sy...> - 2014-04-30 23:12:09

The sort is on solutions, not triples. The memory demand depends on the number of variables in those solutions and even the materialized size of the rdf values in those solutions.

Yes, a sub-select with a limit is a reasonable approach.

We do not have an external memory sort yet. The challenge has been the ordering semantics of sparql for its runtime typing of Literals. The existing index classes (HTree and BTree) require a total ordering for the key rather than a comparator. Combined with the sparql sorting rules, this makes it difficult to use them for an external memory sort. Probably the best path is a partitioned sort by literal type (they can be compared for similar types) with a merged iteration over those partitions.

Bryan

On Apr 30, 2014, at 7:02 PM, "Jeremy Carroll" <jj...@gm...> wrote:

> [Jeremy's sizing question quoted in full - snipped; see his original message below.]
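The "partitioned sort by literal type with a merged iteration over those partitions" that Bryan suggests can be sketched as follows (Python, illustrative only; the type-rank table is a stand-in for SPARQL's ordering of unlike terms, not bigdata's actual comparator):

```python
# Sketch: partition values by a "literal type" rank, sort within each
# partition (like-typed values are mutually comparable), then iterate
# the sorted partitions in a fixed rank order. In a real external-memory
# sort each partition would be a spilled run rather than an in-core list.

TYPE_RANK = {int: 0, float: 0, str: 1}  # assumed: numerics before strings

def partitioned_sort(values):
    partitions = {}
    for v in values:
        partitions.setdefault(TYPE_RANK[type(v)], []).append(v)
    for rank in sorted(partitions):  # merged iteration over the partitions
        yield from sorted(partitions[rank])
```

This sidesteps the need for one total ordering over all value types: each partition only ever compares values of compatible types.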
From: Jeremy C. <jj...@gm...> - 2014-04-30 23:02:26
|
Hi I believe I avoid GC overhead exceeded messages by limiting the amount of sorting intermediate result sets needed. What are realistic sizing guidelines? === The detail in my case is as follows: I generate SPARQL queries in response to the user specifying their intent on our advanced search UI We always give them results back in 'pages' of say 20 items, and they can then go through page by page. We allow the user to sort these results (in fact they are always sorted) by clicking on column headings. These clicks control the ORDER BY modifier, the paging is by the OFFSET and LIMIT modifiers. Obviously sometimes they ask queries that are not so well thought out and there are thousands of results (realistically other aspects of our design cap the number at an order of magnitude 1,000,000) I am doing scale testing, and if I have a large dataset (quarter of a billion triples - which is large for me). and a result set of 150,000 items, then an appropriate sized machine (30 GB core, journal on SSD, java of size 20GB) does fine with the results coming back in a few seconds (AWS c3.4xlarge) OTOH if I have a machine that is too small (3.75 GB of core, 2GB Java heap, journal on EBS provisioned IIOPs but without EBS optimization), then while 'easier' queries are fine, the same 150,000 result query (taking the first 20), takes two minutes, which is unacceptable, Clearly the right thing to do is to buy a bigger machine, in the cases where we have the larger data sizes. However, from time to time, we may find that we have under-provisioned, and so I am considering putting a LIMIT 50000 on an unsorted subquery, and then sorting only the first 50000 entries (this is on the small machine). I believe this will work in terms of avoiding the GC Overhead exceeded message. (Our team has a strong prejudice against Java OOMEs: we have a collective preference to treat OOMEs as fatal requiring a restart, and are somewhat suspect of bigdata's attempt to continue despite OOMEs). 
My question: the number 50000 for a 2 GB heap size is pretty arbitrary, but may work. What are reasonable policies for limiting these on-heap sorts by Java heap size? And if, say, we have a 30 GB machine, what is a good way to divide the memory up? I am thinking of a table maybe like:

  Memory size (GB)   Java heap (GB)   LIMIT
  1.7                0.96             21000
  3.75               2.6              56875
  7.5                5.6              122500
  30                 23.6             516250

which is allowing 32000 bytes per item being sorted in memory - which seems enormous! I observe that on disk a triple is about 200 bytes, and I remember that in Jena a triple in memory is about 2000 bytes; I guess a sort item here may be more than a single triple … any thoughts?

Jeremy
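The numbers in the table appear to follow two simple rules: Java heap ≈ 0.8 × RAM − 0.4 GB, and LIMIT ≈ 70% of the heap divided by 32000 bytes per sorted item. A sketch reproducing the table under those assumptions (the 0.7 sort-budget fraction is my reading of the numbers, not a measured constant):

```python
def suggested_limit(ram_gb, bytes_per_item=32000, sort_fraction=0.7):
    """Reproduce the sizing table above.

    Assumed rules (inferred from the table, not from measurement):
      - Java heap = 0.8 * RAM - 0.4 (in GB),
      - spend ~70% of the heap on the sort, at ~32 KB per sorted solution.
    Returns (heap_gb, limit).
    """
    heap_gb = 0.8 * ram_gb - 0.4
    limit = int(round(sort_fraction * heap_gb * 1e9 / bytes_per_item))
    return heap_gb, limit
```

For example, `suggested_limit(3.75)` reproduces the 2.6 GB heap / 56875 LIMIT row.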
From: Bryan T. <br...@sy...> - 2014-04-30 20:27:53
Just FYI, this merge is going to take a while. It will not be done today.

Bryan

> On Apr 30, 2014, at 2:45 PM, "Bryan Thompson" <br...@sy...> wrote:
>
> I am going to merge down the RDR branch into the main branch today. Please synchronize now or apply your outstanding changes to the main branch after this merge.
>
> The RDR branch will be closed after this merge.
>
> Bryan
From: Bryan T. <br...@sy...> - 2014-04-30 18:45:47
I am going to merge down the RDR branch into the main branch today. Please synchronize now, or apply your outstanding changes to the main branch after this merge.

The RDR branch will be closed after this merge.

Bryan
From: Bryan T. <br...@sy...> - 2014-04-29 19:20:02
I would like to merge the RDR branch back into the 1.3.0 development and maintenance branch in preparation for a 1.3.1 release. Can anyone who is NOT ready for that merge speak up? If I do not hear otherwise, I will get this done over the next few days.

Thanks,
Bryan
From: Bryan T. <br...@sy...> - 2014-04-29 18:17:18
Toby,

After a bit of a go-around with jetty and HTTP, I have decided to parameterize the request URL for the HA load balancer. It looks like this is the only practical way to have low-latency asynchronous HTTP proxying of the request. Anyway, the practical impact is that the workbench needs to use the appropriate URL:

  http://host:port/bigdata/LBS/leader/ - The request is proxied to the quorum leader (read/write).

or

  http://host:port/bigdata/LBS/read/ - The request is load-balanced over the services joined with the met quorum (read-only).

or

  http://host:port/bigdata/ - The request is handled by the local service.

For the non-HA modes, the HALoadBalancerServlet will simply strip off the /LBS/leader and …/LBS/read prefix and do an internal servlet forward. Thus, we can use the same URLs in the non-HA deployments. This internal servlet forward will not impose any overhead to speak of.

I am making these changes in the RDR branch. Once they are tested, you should update the workbench to use the …/LBS/(leader|read) URLs for SPARQL QUERY versus SPARQL UPDATE (or other non-idempotent REST API methods).

See http://wiki.bigdata.com/wiki/index.php/HALoadBalancer#Configuration for more information.
See http://trac.bigdata.com/ticket/624 (HALoadBalancer)

Thanks,
Bryan
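A client-side sketch of choosing among these endpoints (the helper function and its flags are illustrative, not part of the bigdata API; per the note above, the /LBS/ URLs also work against non-HA deployments once the internal servlet forward is in place):

```python
def endpoint(base, lbs=True, read_only=True):
    """Pick a request URL per the HA load balancer convention described above.

    With lbs=True, read-only requests (SPARQL QUERY) go to .../LBS/read/
    (balanced over the met quorum), while SPARQL UPDATE and other
    non-idempotent REST methods go to .../LBS/leader/ (the quorum leader).
    With lbs=False, the plain base URL targets the local service directly.
    """
    base = base.rstrip('/')
    if not lbs:
        return base + '/'
    return base + ('/LBS/read/' if read_only else '/LBS/leader/')
```

For example, a workbench might route queries via `endpoint(base)` and updates via `endpoint(base, read_only=False)`.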