From: Jeremy J C. <jj...@sy...> - 2014-10-24 23:34:39
Hi Bryan et al,

Concerning the underlying bug that triggered the more significant #1026, this is the best error report I can come up with. It is a long way short of a repro, although, given some hours, I can get the error reported on my system.

Jeremy

> On Oct 24, 2014, at 2:57 PM, TRAC Bigdata <no...@sy...> wrote:
>
> #1028: very rare NotMaterializedException: XSDBoolean(true)
> ---------------------------+-----------------------------------
>  Reporter:  jeremycarroll  |      Owner:  thompsonbry
>      Type:  defect         |     Status:  new
>  Priority:  major          |  Milestone:
> Component:  Query Engine   |    Version:  BIGDATA_RELEASE_1_3_1
>  Keywords:                 |
> ---------------------------+-----------------------------------
> When running a soak test with the characteristics below, I very rarely
> get a hard-to-understand error, under conditions I have failed to
> replicate other than in my own test harness (which tests my own code,
> not bigdata).
>
> My code interacts with bigdata only through the HTTP interface to the
> NSS. Enabling logging on bigdata makes the problem vanish, but I can
> reproduce the problem with logging enabled in my code. In particular,
> enabling the ASTEvalHelper log makes the problem disappear.
>
> Attached are 6 logs:
> - a log of the NSS (essentially its stdout), showing two stack traces;
>   the one at approx. 14:00:29 is the one for which I had the other
>   logging enabled
> - five logs from my code, one for each of the five different namespaces
>   in use during the period 21:00:28 to 21:00:29 (note the 7-hour time
>   zone difference)
> The update that failed is in sparqlu2iDMc.log.part.
>
> Note that the error concerns a boolean(true), but there are no such
> values in any of the logs. Some boolean(true) values were used in
> earlier, completed queries and updates, and I would expect some
> boolean(true) values to be in the triple store.
>
> The version of bigdata I was running is 1.3.2 plus five additional
> commits and patches as agreed with Systap, in particular a patch fixing
> #1026.
>
> The test itself has the following characteristics:
> - Every forty minutes there is a new round of tests.
> - There are five concurrent parts to the test, each of which is identical.
> - Each part creates a namespace, performs some operations (perhaps 15
>   minutes of work spread over a 35-minute period), and then deletes the
>   namespace.
> - Namespace names are reused, though not on every round: I have 15
>   namespace names, and at any time 5 are in use.
> - Each part logs all the queries and updates it sends in several
>   separate log files.
>
> Typically each query involves resources whose URIs start with
> http://localsyapsehost:NNNNN/, where the number NNNNN is assigned by
> Jenkins differently to each of the five parts, so it is easy to tell
> the queries for each namespace apart.
>
> I have not seen the problem in the first round of testing (i.e. the
> first 40 minutes), and I believe it only appears once namespace names
> are reused; on the other hand, reusing a namespace name does not
> guarantee the issue. It typically takes 3 or 4 hours to get a single
> fault. Staggering the parts by one minute also seems to make the problem
> go away. I have seen the error occur in SELECT queries as well as
> UPDATEs (this particular instance is an UPDATE). The error only ever
> seems to occur in two parts of my test suite; this is one of them.
>
> I have longer logs, but not complete logs of all the operations since
> the beginning of the journal file.
>
> --
> Ticket URL: <http://trac.bigdata.com/ticket/1028>
> Bigdata <trac.bigdata.com>
> Bigdata Triple Store
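To make the namespace reuse pattern in the ticket concrete, here is a minimal Python sketch of one possible rotation. The ticket does not say how the 15 names are handed out to the five parts, so the round-robin scheme, the `ns00` to `ns14` names, and the helpers `names_for_round` and `reuse_schedule` are all hypothetical. Only the pool size, the number of parts, and the 40-minute round length come from the report.

```python
from typing import List, Set

POOL_SIZE = 15      # namespace names available (from the report)
PARTS = 5           # concurrent, identical test parts (from the report)
ROUND_MINUTES = 40  # a new round starts every 40 minutes (from the report)


def names_for_round(r: int) -> List[str]:
    """Hypothetical round-robin: round r takes the next PARTS names from the pool."""
    return [f"ns{(PARTS * r + i) % POOL_SIZE:02d}" for i in range(PARTS)]


def reuse_schedule(rounds: int) -> List[List[str]]:
    """For each round, list the names an earlier round already created (and deleted)."""
    seen: Set[str] = set()
    schedule: List[List[str]] = []
    for r in range(rounds):
        names = names_for_round(r)
        schedule.append([n for n in names if n in seen])
        seen.update(names)
    return schedule


if __name__ == "__main__":
    for r, reused in enumerate(reuse_schedule(6)):
        start = r * ROUND_MINUTES
        print(f"round {r} (t={start} min): reused {reused or 'none'}")
```

Under this assumed scheme, names first recur in round 3 (two hours in). With any assignment scheme, round 0 cannot reuse a name, which is consistent with the fault never appearing in the first 40 minutes. Since each namespace lives at most 35 minutes and rounds start 40 minutes apart, a reused name always refers to a namespace that was deleted before it was recreated.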