From: SourceForge.net <no...@so...> - 2009-01-12 15:13:08
Bugs item #2496162, was opened at 2009-01-09 17:38
Message generated for change (Comment added) made by nobody
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=112997&aid=2496162&group_id=12997

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: 88. Themed Tk
Group: current: 8.6b1
Status: Closed
Resolution: Fixed
Priority: 7
Private: No
Submitted By: Don Porter (dgp)
Assigned to: Joe English (jenglish)
Summary: test suite segfault

Initial Comment:
Running the entire Tk test suite I frequently (as in more than once) get a segfault in the test notebook-6. It doesn't happen every time, and I've occasionally seen the segfault elsewhere by varying platform or time of day or phase of moon. I suspect the real trouble is some memory corruption somewhere. Sorry to be vague, but the data is vague. Useful info is that the trouble has just appeared, so check recent commits.

----------------------------------------------------------------------

Comment By: Joe English (jenglish)
Date: 2009-01-11 21:48

Message:
Increased literal sharing is a very plausible explanation for why #2492179 manifests now and not before. I think it explains this problem as well: in the presence of Tk_DeleteOptionTable(), it is possible under certain circumstances for GetOptionFromObj() to return a dangling pointer (specifically: if the main tablePtr has been destroyed and recreated and happens to have the same address as before, and the cached pointer points to an Option in a chained table).
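
The scenario in the paragraph above can be modelled with a small stand-alone sketch. It is an illustration under stated assumptions only: it does not use the real data structures in tkConfig.c. Table, Cache, table_create, table_delete, run_scenario, and the generation counter are all hypothetical names, and the two-slot pool with last-freed-first-reused behaviour stands in for the allocator happening to hand back the same address. The generation check is one conceivable guard, not the fix applied in this ticket.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an option table; not Tk's OptionTable. */
typedef struct Table {
    unsigned generation;     /* bumped every time the slot is reused */
    bool live;               /* false once the table has been deleted */
    struct Table *next;      /* chained table, or NULL */
    const char *optionName;  /* one option per table keeps the model small */
} Table;

/* What a cached lookup remembers (hypothetical layout). */
typedef struct Cache {
    Table *table;            /* main table the lookup was made against */
    unsigned tableGeneration;
    Table *owner;            /* table that actually owns the option */
    unsigned ownerGeneration;
} Cache;

typedef struct Result {
    bool addressCheckHits;   /* cache accepted on address alone */
    bool ownerStillLive;     /* is the cached owner table still alive? */
    bool checkedHits;        /* cache accepted with generation checks */
} Result;

/* Two-slot pool; the most recently freed slot is reused first. */
static Table pool[2];
static Table *freeStack[2] = { &pool[0], &pool[1] };
static int freeCount = 2;

static Table *table_create(const char *optionName, Table *next) {
    if (freeCount == 0) return NULL;
    Table *t = freeStack[--freeCount];
    t->live = true;
    t->generation++;
    t->optionName = optionName;
    t->next = next;
    return t;
}

static void table_delete(Table *t) {
    if (t == NULL || !t->live) return;
    table_delete(t->next);        /* release the chain first */
    t->live = false;
    t->next = NULL;
    t->optionName = NULL;
    freeStack[freeCount++] = t;   /* so the main table is reused next */
}

/* Walk the chain and return the table that owns the named option. */
static Table *find_owner(Table *t, const char *name) {
    for (; t != NULL; t = t->next) {
        if (t->optionName != NULL && strcmp(t->optionName, name) == 0) return t;
    }
    return NULL;
}

/* Validation by address only: the weak check the comment describes. */
static bool cache_hit_by_address(const Cache *c, const Table *table) {
    return c->table == table;
}

/* Stricter check: both tables must be the same incarnation as when cached. */
static bool cache_hit_checked(const Cache *c, const Table *table) {
    return c->table == table
        && table->generation == c->tableGeneration
        && c->owner->live
        && c->owner->generation == c->ownerGeneration;
}

Result run_scenario(void) {
    Table *chained = table_create("-text", NULL);
    Table *mainTable = table_create("-padding", chained);
    Table *owner = find_owner(mainTable, "-text");
    assert(owner == chained);     /* the option lives in the chained table */

    Cache cache = { mainTable, mainTable->generation, owner, owner->generation };

    table_delete(mainTable);      /* also deletes the chained table */
    Table *recreated = table_create("-padding", NULL); /* same address again */

    Result r = {
        cache_hit_by_address(&cache, recreated),
        cache.owner->live,
        cache_hit_checked(&cache, recreated)
    };
    table_delete(recreated);
    return r;
}
```

Calling run_scenario() from a main of your own shows the address-only check accepting the cache while the owning table is no longer live; in real code the cached Option would then point into freed memory.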
More literal sharing increases the likelihood of that eventually occurring. Might be worth fixing later, but for now "don't call Tk_DeleteOptionTable" is an appropriate fix. Closing.

> That that change has caused so many problems is deeply horrible.

It didn't cause any new problems, it just made existing ones more evident :-)

----------------------------------------------------------------------

Comment By: Donal K. Fellows (dkf)
Date: 2009-01-11 11:53

Message:
This was the change to [source], and it's basically meant that literals became a lot more shared. That that change has caused so many problems is deeply horrible. My apologies.

----------------------------------------------------------------------

Comment By: Joe English (jenglish)
Date: 2009-01-11 08:39

Message:
Just grepped, and it turns out *nobody* calls Tk_DeleteOptionTable() (contrary to the recommendation in SetOptions.3 that "[w]hen an option table is no longer needed Tk_DeleteOptionTable should be called to free all of its resources."). Removed this from ttkNotebook.c (and made a mental note to amend the manpage to "...Tk_DeleteOptionTable should *not* be called under any circumstances...").

100 passes of `make test-ttk` with no errors; calling this fixed. Although I still wonder what's going on with NRE that would trigger this.

----------------------------------------------------------------------

Comment By: Joe English (jenglish)
Date: 2009-01-11 07:18

Message:
More info: the notebook widget is the only thing in tk/generic/ttk/*.c that calls Tk_DeleteOptionTable(). (I don't remember why the notebook widget does so; the other widgets don't bother, since it just causes cache thrashing, and all Tk_OptionTables are automatically cleaned up at interp deletion time). This is probably why the sporadic errors only occur in tests/ttk/notebook.test. Removing these calls from ttkNotebook.c makes the problem go away.
However, *adding* a call to Tk_DeleteOptionTable in DestroyWidget (ttkWidget.c) -- which, although pointless, should be safe -- causes all sorts of sporadic failures (coredumps, panics "bad option type in GetObjectForOption", and random configuration-related test suite failures).

Reverted Tcl to just prior to the NRE-enabled [source] commit, relinked wish with the above change still in place, and ran the test suite 100 times in a row with no failures or crashes. I still suspect a bug in tkConfig.c, but am at a loss to explain why NRE would trigger it.

----------------------------------------------------------------------

Comment By: Joe English (jenglish)
Date: 2009-01-11 06:26

Message:
Confirmed: when linked against Tcl revisions prior to the 2009-01-05 commit NRE-enabling [source], this problem does not manifest. From that revision onward, it does (intermittently; but if I run the test suite in a loop for long enough, it eventually crashes). Same thing for config.test failures reported in #2492179.

----------------------------------------------------------------------

Comment By: Don Porter (dgp)
Date: 2009-01-09 19:40

Message:
caught it again, this time at line 2136 of tkConfig.c. The contents of *tablePtr in my mem-debug configuration indicate it's pointing at free()d memory.

----------------------------------------------------------------------

Comment By: Joe English (jenglish)
Date: 2009-01-09 19:33

Message:
Bug # 2492179 (reported 2009-01-07) is also Tk_SetOptions - related. That one is replicable, might be easier to track down.

----------------------------------------------------------------------

Comment By: Joe English (jenglish)
Date: 2009-01-09 19:11

Message:
Was able to replicate -- nondeterministically -- crash happens somewhere in tkConfig.c, called from ttkNotebook.c(ConfigureTab). (Sorry, no backtrace, core file munged, and once again failed to replicate).
Rebuilt with -DPURIFY, saw -- nondeterministically -- a different set of errors coming from notebook.test, all look like they're related to Tk_SetOptions; looks like a corrupted tabOptionTable. Cannot replicate now.

----------------------------------------------------------------------

Comment By: Don Porter (dgp)
Date: 2009-01-09 19:02

Message:
tablePtr had the value 0x6 at that crash. FWIW, not able to reproduce it.

----------------------------------------------------------------------

Comment By: Don Porter (dgp)
Date: 2009-01-09 18:53

Message:
caught this one in a debugger. Speculating it's the same issue:

---- config-1.9 start

Program received signal SIGSEGV, Segmentation fault.
0x0806bb2f in TkDebugConfig (interp=0x8ab0628, table=0x9038d28) at /home/dgp/cvs/tk/unix/../generic/tkConfig.c:2132
2132            Tcl_ListObjAppendElement(NULL, objPtr,
(gdb) bt
#0  0x0806bb2f in TkDebugConfig (interp=0x8ab0628, table=0x9038d28) at /home/dgp/cvs/tk/unix/../generic/tkConfig.c:2132
#1  0x0805c79c in TestobjconfigObjCmd (clientData=0x8ae7858, interp=0x8ab0628, objc=3, objv=0x8bd0f94) at /home/dgp/cvs/tk/unix/../generic/tkTest.c:944
#2  0x0816906e in NRRunObjProc (data=0x9038c9c, interp=0x8ab0628, result=0) at /home/dgp/cvs/tcl-only/generic/tclBasic.c:4302
#3  0x08168dc6 in TclNRRunCallbacks (interp=0x8ab0628, result=0, rootPtr=0x8ec5ba0, tebcCall=1) at /home/dgp/cvs/tcl-only/generic/tclBasic.c:4240
#4  0x081c9b56 in TclExecuteByteCode (interp=0x8ab0628, codePtr=0x8fd6f58) at /home/dgp/cvs/tcl-only/generic/tclExecute.c:2814
#5  0x0816910d in NRCallTEBC (data=0x8eb4fd4, interp=0x8ab0628, result=0) at /home/dgp/cvs/tcl-only/generic/tclBasic.c:4324
#6  0x08168dc6 in TclNRRunCallbacks (interp=0x8ab0628, result=0, rootPtr=0x8baba70, tebcCall=0) at /home/dgp/cvs/tcl-only/generic/tclBasic.c:4240
#7  0x081689ac in Tcl_EvalObjv (interp=0x8ab0628, objc=5, objv=0x8bd0988, flags=2097152) at /home/dgp/cvs/tcl-only/generic/tclBasic.c:4023
#8  0x0816a748 in TclEvalEx (interp=0x8ab0628, script=0x8cfd03d "if {[singleProcess]} {\n\t incr numTestFiles\n\t uplevel 1 [list ::source $file]\n\t} else {\n\t # Pass along our configuration to the child processes.\n\t # EXCEPT for the -outfile, because the par"..., numBytes=1458, flags=0, line=1) at /home/dgp/cvs/tcl-only/generic/tclBasic.c:5117

Further speculation is that it's related to the recent NRE-enabling of [source].

----------------------------------------------------------------------

Comment By: Don Porter (dgp)
Date: 2009-01-09 17:56

Message:
....and just now again in notebook-6.12

----------------------------------------------------------------------

Comment By: Don Porter (dgp)
Date: 2009-01-09 17:50

Message:
just saw the segfault in listbox-21.12.

----------------------------------------------------------------------