#4003 memory corruption in thread pool ?

Milestone: obsolete: 8.5.2
Status: closed
Priority: 8
Updated: 2008-05-28
Created: 2008-05-16
Creator: rudenstam
Private: No

I have written a Tcl script that uses the packages listed below. After running for anywhere between a few hours and roughly a week, it crashes with an "alloc: invalid block" error. I compiled Tcl and the packages with debug symbols; the backtrace follows below.

In the main thread I am using:
tcl: 8.5.2
Thread: 2.6.5
tls: 1.6
udp: 1.0.9

In the thread pool threads I am using:
tcl: 8.5.2
Thread: 2.6.5
mysqltcl: 3.05

TpoolWorker() is trying to Tcl_Free() a (TpoolResult *rPtr) that
(a) when cast as a (char *) contains "replacedThisPrivateInfo 26 360969"
(b) is different from the clientData it received
(c) is not recognised as having been Tcl_Alloc'ed
This hints at mem corruption?

(gdb) bt 40
#0 0xb7f70410 in ?? ()
#1 0xad79c19c in ?? ()
#2 0x00000006 in ?? ()
#3 0x00003117 in ?? ()
#4 0xb7cfd811 in raise () from /lib/tls/i686/cmov/libc.so.6
#5 0xb7cfefb9 in abort () from /lib/tls/i686/cmov/libc.so.6
#6 0xb7f0d4d6 in Tcl_PanicVA (format=0xb7f60fc4 "alloc: invalid block: %p: %x %x", argList=0xad79c384 "\033\226í·¸Ãy­Y¢ò·\bø\212\t") at /root/tcl8.5.2/unix/../generic/tclPanic.c:103
#7 0xb7f0d506 in Tcl_Panic (format=0xb7f60fc4 "alloc: invalid block: %p: %x %x") at /root/tcl8.5.2/unix/../generic/tclPanic.c:132
#8 0xb7f2ab35 in Ptr2Block (ptr=0x98af808 "replacedThisPrivateInfo 26 360969") at /root/tcl8.5.2/unix/../generic/tclThreadAlloc.c:735
#9 0xb7f2a259 in TclpFree (ptr=0x98af808 "replacedThisPrivateInfo 26 360969") at /root/tcl8.5.2/unix/../generic/tclThreadAlloc.c:376
#10 0xb7e7499d in Tcl_Free (ptr=0x98af808 "replacedThisPrivateInfo 26 360969") at /root/tcl8.5.2/unix/../generic/tclCkalloc.c:1182
#11 0xb736e8cb in TpoolWorker (clientData=0xbfb7c31c) at ../generic/threadPoolCmd.c:1108
#12 0xb7ec13cb in NewThreadProc (clientData=0x8223d80) at /root/tcl8.5.2/unix/../generic/tclEvent.c:1386
#13 0xb7e30240 in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#14 0xb7da049e in clone () from /lib/tls/i686/cmov/libc.so.6

The "replacedThisPrivateInfo" stands in for private information that I've stripped out; the real value is a single 41-character word.

The string is read from a tls channel inside a fileevent. From there it's passed to a proc using this command:
catch {$command $sock $id [lrange $data 2 end]} error

the proc looks like this:
proc SIZES { sock id line } {
    if { [llength [split [join $line]]] == 3 } {
        foreach { dir files bytes } [split [join $line]] { }
        tpool::post -nowait $::pool "SIZES $sock $dir $files $bytes"
        log SIZES $dir $files $bytes ($::sockets($sock,user))
    } else {
        puts $sock [list $id SIZES usage [list {SIZES <dir> <files> <bytes>}]]
    }
}
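
As a side note on the proc above: the job script is built by string interpolation, so any Tcl syntax in the received fields is evaluated by the worker. The sketch below is only an illustration of that difference, not a diagnosis of the crash. The local SIZES proc and the sample values are made up for the demonstration, and eval stands in for what the worker does with the posted script.

```tcl
# Hypothetical stand-in for the worker-side command; it just echoes
# the directory argument it received.
proc SIZES {sock dir files bytes} {
    return $dir
}

# Made-up sample values; dir deliberately contains Tcl syntax.
set sock  sock5
set dir   {[pid]}
set files 26
set bytes 360969

# Built by interpolation: [pid] is executed when the script is evaluated.
set byString [eval "SIZES $sock $dir $files $bytes"]

# Built with [list]: every field is quoted and arrives as literal text.
set byList [eval [list SIZES $sock $dir $files $bytes]]

puts "interpolated: $byString"
puts "list-quoted:  $byList"
```

With the list form, the call in the proc would read tpool::post -nowait $::pool [list SIZES $sock $dir $files $bytes].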

In the threadpool the exact string is never found.

The string is found again when the threadpool returns to the mainthread, where it sends it over a normal tcl (non tls) socket.

The threadpool is started with the following command:
set ::pool [tpool::create -minworkers $min -maxworkers $max -initcmd "source workers.tcl" -exitcmd {catch {mysql::close $::sql}}]

Both $min and $max are set to 20, so threads should only be created when the script starts (I had problems with threads not being created when needed).

I'm afraid I've been unable to write a small, simple script that reproduces this. The server also lasted much longer during its testing phase, with just 10 or so clients on it; now it has about 120-150 concurrent clients.

I use tpool::wait in a non-standard way. In my process that reads results, I also wait for future job ids: I call tpool::wait with a list of any incomplete jobs plus future job ids, up to 50 in total.
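
For comparison, a minimal sketch of the usual pattern, where tpool::wait is only given job ids that tpool::post has already returned. It assumes the Thread package is installed; the pool size and the squaring jobs are invented for the example and have nothing to do with the real server.

```tcl
package require Thread

# Small fixed-size pool; sizes chosen arbitrarily for this sketch.
set pool [tpool::create -minworkers 2 -maxworkers 2]

# Post a few jobs and remember the ids that tpool::post hands back.
set pending {}
foreach n {1 2 3 4} {
    lappend pending [tpool::post $pool [list expr $n * $n]]
}

# tpool::wait blocks until at least one listed job has finished. It
# returns the finished ids and stores the still-pending ones in the
# variable named by its last argument.
set results {}
while {[llength $pending] > 0} {
    set done [tpool::wait $pool $pending pending]
    foreach job $done {
        lappend results [tpool::get $pool $job]
    }
}

tpool::release $pool
puts [lsort -integer $results]
```

Each id is collected exactly once with tpool::get, so the waiting list only ever contains jobs that were really posted and not yet retrieved.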

Discussion

  • miguel sofer

    miguel sofer - 2008-05-16
    • priority: 5 --> 8
     
  • Zoran Vasiljevic

    Logged In: YES
    user_id=95086
    Originator: NO

    Can you please try checking out the current version
    from CVS and try again? I am not 100% sure, but it
    may be that I had a race condition at that place, which
    I have now corrected.

    Thanks,
    Zoran

     
  • rudenstam

    rudenstam - 2008-05-18

    Logged In: YES
    user_id=1687294
    Originator: YES

    Will do as soon as I can reach that box again. If it doesn't help, the crash might take a few days to show up; it will be at least a week before I can say with some confidence that it did help.

     
  • Zoran Vasiljevic

    Logged In: YES
    user_id=95086
    Originator: NO

    If it does not, make sure you collect the core dump,
    as you did in this case.

    Cheers
    Zoran

     
  • rudenstam

    rudenstam - 2008-05-28

    Logged In: YES
    user_id=1687294
    Originator: YES

    No crash yet, so it seems to be working. Many thanks; I'll re-open if it crashes for the same reason later.

     
  • rudenstam

    rudenstam - 2008-05-28
    • status: open --> closed