From: SourceForge.net <no...@so...> - 2009-12-26 02:20:34
Bugs item #1909647, was opened at 2008-03-07 15:47
Message generated for change (Comment added) made by sf-robot
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=110894&aid=1909647&group_id=10894

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: 69. Other
Group: obsolete: 8.5.4
>Status: Closed
Resolution: None
Priority: 4
Private: No
Submitted By: Carlos Tasada (ctasada)
Assigned to: miguel sofer (msofer)
Summary: TclStackFree: incorrect freePtr. Call out of sequence?

Initial Comment:
Hi guys,

We're using ActiveTcl 8.5.1, bytecoded with tbcload1.7. From time to time, the application crashes with a "TclStackFree: incorrect freePtr. Call out of sequence?", but so far this happens only on Linux. We're using threads and sockets actively, and the crash usually appears just after a "puts" to a file or after a socket is closed.

I don't even know how to start debugging the problem. I've been looking into the Tcl source code, and the first thing I see regarding the use of TclStackFree is that it's used in the Linux version of TclpCreateProcess, but not in the Windows one. Following this lead I ended up in Tcl_OpenCommandChannel, the one used to process the Tcl exec command.

Am I drunk, or could it be that some error or wrongly synched command in Tcl_OpenCommandChannel is producing this problem?

Any help is more than welcome.

Cheers,
Carlos

----------------------------------------------------------------------

>Comment By: SourceForge Robot (sf-robot)
Date: 2009-12-26 02:20

Message:
This Tracker item was closed automatically by the system. It was previously set to a Pending status, and the original submitter did not respond within 14 days (the time period specified by the administrator of this Tracker).

----------------------------------------------------------------------

Comment By: Carlos Tasada (ctasada)
Date: 2009-05-12 07:45

Message:
Do you mean in 8.5.7 or in 8.6?

Anyway, I haven't seen any crash since January using 8.5.4, but, as you know, I cannot replicate it easily.

On the other hand, I've seen that with Linux kernel 2.6.27 I get tons of crashes using threads, but I don't know whether it's related to this case.

I'll try to create a script to replicate it, but I cannot promise anything in a short time :(

Thanks.

----------------------------------------------------------------------

Comment By: miguel sofer (msofer)
Date: 2009-05-10 03:09

Message:
Still happening with newer releases?

----------------------------------------------------------------------

Comment By: Carlos Tasada (ctasada)
Date: 2009-01-19 11:46

Message:
Hi Miguel,

The bug appeared again on another machine. It seems to happen with some frequency there, so maybe I'll be able to debug it there.

Any tips on how to do it?

----------------------------------------------------------------------

Comment By: Carlos Tasada (ctasada)
Date: 2008-12-11 09:16

Message:
Hi Miguel,

Well, it took me almost 6 months to come back to you (I'm ashamed).

We upgraded to 8.5.4 two months ago, and it seemed that everything was fine, but last week I found the same problem again.

This time our customer is running a "CentOS release 5.2 (Final) (2.6.18-92.1.13.el5xen) (x86_64 64 bits)"

In a previous email you asked for 2 things:

a) be sure that the thread model is respected
I'm reasonably sure that everything is fine there. But we do, of course, some chitchatting between threads to send and get back results.

b) Stacktrace
How can I do that? Do I need some special Tcl build? (I'm using ActiveTcl 8.5.4)

I still have no good replication steps. It simply happens, so I'm not really sure how to attack it.

Best Regards,
Carlos.

----------------------------------------------------------------------

Comment By: Carlos Tasada (ctasada)
Date: 2008-07-30 07:34

Message:
Logged In: YES
user_id=340696
Originator: YES

Yes, I saw the patch and also thought that it could be related :)

Things have been crazy here the last few weeks; anyway, as soon as 8.5.4 is released I'll give it a try :)

Can we leave this bug in standby until then?

----------------------------------------------------------------------

Comment By: miguel sofer (msofer)
Date: 2008-07-29 23:53

Message:
Logged In: YES
user_id=148712
Originator: NO

One thing that comes to mind is #2030670. A fix was committed today to both HEAD and the 8.5 branch; that patch can be applied as is.

Hmmm ... I now notice that it is under 8.5.1. The very first thing to do is to test with a more recent version; this could well be something that's already fixed. However, if it is #2030670, no version containing the patch has yet been released.

----------------------------------------------------------------------

Comment By: Carlos Tasada (ctasada)
Date: 2008-07-21 07:40

Message:
Logged In: YES
user_id=340696
Originator: YES

Hi Miguel,

I'll do my best to send you more info this week.

Best Regards,
Carlos.

----------------------------------------------------------------------

Comment By: miguel sofer (msofer)
Date: 2008-07-21 04:49

Message:
Logged In: YES
user_id=148712
Originator: NO

Sorry this flew under my radar.

More info: ideally a smallish example to repro. But at least
(a) assurance that the threading model is respected: each interp is only called from the thread where it was created
(b) a stack trace

----------------------------------------------------------------------

Comment By: Carlos Tasada (ctasada)
Date: 2008-06-30 08:17

Message:
Logged In: YES
user_id=340696
Originator: YES

Hi Miguel,

What more info do you need?

My main problem is that I cannot replicate the problem in an easy way. But I found some extra info about that.
We have the same code running in 32 and 64 bits, and it seems that the problem happens more frequently in 64-bit.

Anyway, just tell me what more info you need and I'll do anything on my side to send it to you.

----------------------------------------------------------------------

Comment By: miguel sofer (msofer)
Date: 2008-06-28 13:43

Message:
Logged In: YES
user_id=148712
Originator: NO

Submitter went AWOL?

I'm still suspecting a violation of Tcl's threading model. In order to do anything about this I would need more info.

----------------------------------------------------------------------

Comment By: miguel sofer (msofer)
Date: 2008-03-11 13:30

Message:
Logged In: YES
user_id=148712
Originator: NO

On the timing issue: TclStackAlloc/TclStackFree is a very specialised internal allocator; it reserves memory in Tcl's internal stack. It is of the utmost importance for its proper functioning that frees occur in the precise reverse sequence of allocs (*), or else you get a panic.

What you are seeing is a violation of this timing principle. Had this happened in a single thread, the crash would be deterministic: an obvious bug with a slim chance of making it past the testing phase.

However, if different threads are allocing/freeing from the same interp's stack (shouldn't happen, as the stack is interp-specific), thread coordination issues could cause the observed behaviour. This is what I suspect. Not much of a clue beyond that yet.

(*) They should also be precisely coordinated with the bytecode engine's own usage of the stack. This is an internal allocator that is not meant to be used by extensions.

----------------------------------------------------------------------

Comment By: Carlos Tasada (ctasada)
Date: 2008-03-11 13:14

Message:
Logged In: YES
user_id=340696
Originator: YES

Hi Miguel,

What do you mean by "wrong timing"? Some "after"? Or maybe some async thread that's released too early?

Can the problem be related to some "exec" command, as I said before?
Regarding your questions:

1) I already thought about that. I'm making some changes in the source code and I have a client that volunteered to test it running from source. I hope I'll have something new in a couple of days.

2) I'm using threads from scripts.

3) That's the bad part. I'm trying to find how to simplify the problem as much as possible to have a nice test case. Anyway, it seems to be related to some other problem that's leaking memory (I'm looking into that right now). At the moment that I "puts" to a file, it seems that the used memory is too much and the application crashes.

Best Regards,
Carlos Tasada.

----------------------------------------------------------------------

Comment By: miguel sofer (msofer)
Date: 2008-03-11 12:48

Message:
Logged In: YES
user_id=148712
Originator: NO

This definitely looks like a threading problem: wrong timing within the same thread will cause TclStackFree to bomb reliably.

First let us try to reduce the problem a bit:

(1) tbcload should have nothing to do with this; can you confirm by using the script sources instead of the precompiled version? I do not know if you have them available or not; it might be 3rd party code

(2) You are using threads: script or C API? If C: are you respecting Tcl's threading model, "no calls to an interp from a different thread"?

(3) Can you reduce the problem to some smallish demo that we can study?

----------------------------------------------------------------------
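
The last-in, first-out rule described in the 2008-03-11 13:30 comment can be illustrated with a toy allocator. This is only a sketch under stated assumptions, not Tcl's implementation: `DemoStack`, `demo_stack_alloc`, and `demo_stack_free` are hypothetical names, the stack has a fixed size, alignment is ignored, and an out-of-sequence free prints a message and returns -1 where the real allocator panics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical toy stack allocator; NOT Tcl's actual code. */
#define DEMO_STACK_BYTES 1024
#define DEMO_MAX_MARKERS 32

typedef struct DemoStack {
    unsigned char bytes[DEMO_STACK_BYTES]; /* backing storage */
    size_t top;                            /* offset of the first free byte */
    void *markers[DEMO_MAX_MARKERS];       /* live allocations, oldest first */
    int depth;                             /* number of live allocations */
} DemoStack;

/* Reserve n bytes on top of the stack; returns NULL if there is no room.
 * No alignment handling: this is a simplification for the sketch. */
static void *demo_stack_alloc(DemoStack *s, size_t n)
{
    assert(s != NULL);
    if (s->depth == DEMO_MAX_MARKERS || n > DEMO_STACK_BYTES - s->top) {
        return NULL;
    }
    void *p = s->bytes + s->top;
    s->top += n;
    s->markers[s->depth++] = p;
    return p;
}

/* Release p, which must be the most recent live allocation.
 * Returns 0 on success and -1 if the free is out of sequence. */
static int demo_stack_free(DemoStack *s, void *p)
{
    assert(s != NULL);
    if (s->depth == 0 || s->markers[s->depth - 1] != p) {
        fprintf(stderr,
                "demo_stack_free: incorrect free pointer. Call out of sequence?\n");
        return -1;
    }
    s->depth--;
    s->top = (size_t)((unsigned char *)p - s->bytes);
    return 0;
}
```

With two live allocations `a` (older) and `b` (newer), freeing `a` first is rejected, while freeing `b` and then `a` succeeds. In a single thread such a mistake fails every time; if two threads interleave allocs and frees on the same stack without coordination, the order depends on scheduling, which would match the intermittent crashes reported here.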