From: SourceForge.net <no...@so...> - 2009-10-28 20:39:15
|
Bugs item #2888044, was opened at 2009-10-28 12:01 Message generated for change (Comment added) made by dgp You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=110894&aid=2888044&group_id=10894 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: 47. Bytecode Compiler Group: development: 8.6b1.1 Status: Open Resolution: None Priority: 9 Private: No Submitted By: Andreas Kupries (andreas_kupries) Assigned to: Andreas Kupries (andreas_kupries) Summary: TclCleanupLiteralTable segmentation fault Initial Comment: NEW _________________________________________________________________________________________ Running the tcllib testsuite I find that the tests of the ncgi package crash in the very same same place, with an identical stacktrace, when run by Tcl 8.6 head. Based on the contents of the structures I am very sure that this is the same problem as shown by Tcl's testsuite, described below. Mail 1 _______________________________________________ While evaluating a patch to avoid calling channel outputproc with 0-length writes I came across a crash in the Tcl 8.6 head testsuite, which I have now confirmed to happen in the unpatched core. Context: Platform is linux-ix86 (32bit). The core was built using --enable-symbols=all (and installed in some side directory) The testsuite was run using make test TESTFLAGS='-verbose bpste -singleproc 1' \ 2>&1 | tee LOG The -singleproc 1 is important. Note: On my platform the --enable-symbols=all means that loading and compiling the 2MB clock.test file took around 15 minutes, during which no output happens. The test is __not__ hung. Just be patient. The test suite crashes after httpdold has completed, see log excerpt: ---- httpold-5.1 start ++++ httpold-5.1 PASSED ---- httpold-5.2 start ++++ httpold-5.2 PASSED ---- httpold-5.3 start ++++ httpold-5.3 PASSED ---- httpold-6.1 start ++++ httpold-6.1 PASSED httpold.test: Total 31 Passed 31 Skipped 0 Failed 0 make: *** [test-tcl] Segmentation fault (core dumped) The stacktrace in the core shows that the crash happens in #0 0x400e4ecd in TclCleanupLiteralTable (interp=0x9ea73e8, tablePtr=0x9ea74b4) at /home/andreask/workbench/ZeroWrite/src/tcl86base/generic/tclLiteral.c:131 #1 0x400409cc in DeleteInterpProc (interp=0x9ea73e8) at /home/andreask/workbench/ZeroWrite/src/tcl86base/generic/tclBasic.c:1392 #2 0x4010313b in Tcl_EventuallyFree (clientData=0x9ea73e8, freeProc=0x40040829 <DeleteInterpProc>) at /home/andreask/workbench/ZeroWrite/src/tcl86base/generic/tclPreserve.c:299 #3 0x40040821 in Tcl_DeleteInterp (interp=0x9ea73e8) at /home/andreask/workbench/ZeroWrite/src/tcl86base/generic/tclBasic.c:1284 #4 0x400c7e2c in SlaveObjCmdDeleteProc (clientData=0x9ea73e8) at /home/andreask/workbench/ZeroWrite/src/tcl86base/generic/tclInterp.c:2581 #5 0x40042488 in Tcl_DeleteCommandFromToken (interp=0x8064d78, cmd=0x820f830) at /home/andreask/workbench/ZeroWrite/src/tcl86base/generic/tclBasic.c:2984 #6 0x400c5096 in Tcl_InterpObjCmd (clientData=0x0, interp=0x8064d78, objc=3, objv=0x80669a8) The bug is in TclCleanupLiteralTable itself I believe. According to the comments * The problem is that freeing a literal's internal representation can * delete other literals to which it refers, making nextPtr invalid. * So each time we free an internal rep, we start its bucket over * again. */ Looking at the code, however the "we start its bucket over again" does __not__ happen. There is the didOne flag, which is set, but then the inner loop is not aborted. It is actually run a second time for the same value of entryPtr, and due to objPtr->type now set to NULL it runs the skip branch of the if, and then continues on to traverse the bucket chain from the current point. No reset at all. In our crash it is (according to my investigations) the current entryPtr which actually gets recursively deleted ((*) = removed from the chain), and then the second iteration on this same entryPtr crashes trying to get the typePtr for an objPtr which is freed and located in a freed LiteralEntry structure (Everything is 0x61616161). A possible red-herring, the type of the objPtr in question was 'substcode'. My first attempt at fixing this by adding a 'break;' just after 'didOne = 1;', to actually reset the loop and restart it from the beginning caused 17 tests to fail. It is actually unclear if that happens because (a) now some literals are actually released, or, conversely, (b) if some literals are not released anymore. Or (c) if these tests would even fail without my patch, but are just prevented from being reached by the crash (they are all in interp.test, after http*.test). So, according to annotate the last changes in this area were by dgp (rev 1.20) and dkf (rev 1.25, 1.34) and I hope you can shed some more light on this. The whole area was surrounded by kbk (rev 1.19), so I added you as well to this mail. Ah, yes. This crash doesn't happen for 8.4 nor 8.5 branches. Next steps here ... See if it crashes for regular build as well (no enable-symbols, (x)), and if yes, try to bisect it. (*) freeIntRepProc is called, and TclReleaseLiteral is called recursively on the same objPtr, refCount goes zero, which also removes the LiteralEntry from the chain. (x) Hoping so because then clock.test should not take such effing long time. Mail 2 _____________________________________________________ > Next steps here ... See if it crashes for regular build as well (no enable-symbols, (x)), Yes. It does. > and if yes, try to bisect it. Bisecting tells us that the crash was introduced between 2009-09-11 and 2009-09-12. >From the Changelog. --- tcl-20090911/ChangeLog 2009-09-10 14:20:00.000000000 -0700 +++ tcl-20090912/ChangeLog 2009-09-11 13:13:27.000000000 -0700 @@ -1,3 +1,18 @@ +2009-09-11 Don Porter <dg...@us...> + + * generic/tclBasic.c: Completed the NR-enabling of [subst]. + * generic/tclCmdMZ.c: [Bug 2314561]. + * generic/tclCompCmds.c: + * generic/tclCompile.c: + * generic/tclInt.h: + * tests/coroutine.test: + * tests/parse.test: + +2009-09-11 Donal K. Fellows <dk...@us...> + + * tests/http.test: Added in cleaning up of http tokens for each test + to reduce amount of global-variable pollution. + This is either a problem with FreeSubstCodeInternalRep, or this function simply exposes the fact that a bug in TclCleanupLiteral was not as fixed as previously thought/believed. Looking at FSCIR it looks good, and I believe it is the latter. What happens, I guess, is: CleanupLiteral table frees the substcode intrep, which frees the Bytecode (TclCleanupByteCode), which refers back to the whole literal string, and TclReleaseLiteral's it, which removes it from the bucket chain, and now the next iteration works with the freed literal entry, bang. Except per the comments it should have restarted the loop from the beginning. ---------------------------------------------------------------------- >Comment By: Don Porter (dgp) Date: 2009-10-28 16:39 Message: ... for-old.test for.test history.test if-old.test if.test info.test It is the info.test leaving behind an array variable ::a that causes the breakage. ---------------------------------------------------------------------- Comment By: Don Porter (dgp) Date: 2009-10-28 16:09 Message: ... dict.test dstring.test eval.test exec.test execute.test expr-old.test ... ---------------------------------------------------------------------- Comment By: Don Porter (dgp) Date: 2009-10-28 13:56 Message: The interp.test failures are because earlier test files leave a variable ::a behind. Those test files guilty of such poor hygiene are: binary.test clock.test .... <more to come> ---------------------------------------------------------------------- Comment By: Don Porter (dgp) Date: 2009-10-28 13:00 Message: When committing the fix, it likely needs backporting as well. ---------------------------------------------------------------------- Comment By: Don Porter (dgp) Date: 2009-10-28 12:58 Message: adding the "break" fixes the reported bug. This permits the test run to continue, exposing other -singleproc 1 bugs in interp.test. Confirm that with a test: make test-tcl TESTFLAGS="-notfile h\*.test -singleproc 1" ---------------------------------------------------------------------- Comment By: Don Porter (dgp) Date: 2009-10-28 12:28 Message: only one of the files "http.test" or "http11.test" needs to be run in sequence with "httpold.test" to trigger the segfault. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=110894&aid=2888044&group_id=10894 |