#3404 core dump on Linux while [read]ing file

obsolete: 8.4.12
closed-fixed
9
2006-04-05
2006-03-31
No

Hi *,

I occasionally get a core dump from a Sig11 on Linux
(x86_64) while reading plain files. It seems to depend
on the content of the file.

The Tcl interpreter is built from unmodified sources
and does not contain any extensions. It was built using
gcc 4.0.2 with -m32 to get i386 compatible executables.

% parray tcl_platform
tcl_platform(byteOrder) = littleEndian
tcl_platform(machine) = x86_64
tcl_platform(os) = Linux
tcl_platform(osVersion) = 2.6.13-15.7-smp
tcl_platform(platform) = unix
tcl_platform(user) = makr
tcl_platform(wordSize) = 4
% set tcl_patchLevel
8.4.12

As I just noticed, it also only happens if env(LANG) is
set to a UTF-8 encoding, e.g. de_DE.UTF-8 or en_US.UTF-8.

With a file like the attached "crashme" file, I do
the following:

% set f [open crashme]
file3
% while {![eof $f]} {read $f 4096}
Segmentation fault (core dumped)

Backtrace:

#0 0x5561858c in memcpy () from /lib/tls/libc.so.6
(gdb) bt
#0 0x5561858c in memcpy () from /lib/tls/libc.so.6
#1 0x080a973b in ReadChars (statePtr=0x810c338,
objPtr=0x810cff0, charsToRead=74, offsetPtr=0xffffac18,
factorPtr=0xffffac14)
at
/home/makr/cvs/M214/ginfix2/openSrc/tcl8.4/unix/../generic/tclIO.c:4818
#2 0x080a91f3 in DoReadChars (chanPtr=0x8114510,
objPtr=0x810cff0, toRead=74, appendFlag=0) at
/home/makr/cvs/M214/ginfix2/openSrc/tcl8.4/unix/../generic/tclIO.c:4488
#3 0x080a90dd in Tcl_ReadChars (chan=0x8114510,
objPtr=0x810cff0, toRead=4096, appendFlag=0) at
/home/makr/cvs/M214/ginfix2/openSrc/tcl8.4/unix/../generic/tclIO.c:4410
#4 0x080aeaf6 in Tcl_ReadObjCmd (dummy=0x0,
interp=0x80fc4f0, objc=3, objv=0x80ff78c) at
/home/makr/cvs/M214/ginfix2/openSrc/tcl8.4/unix/../generic/tclIOCmd.c:365
#5 0x080629e9 in TclEvalObjvInternal
(interp=0x80fc4f0, objc=3, objv=0x80ff78c, command=0x0,
length=0, flags=0) at
/home/makr/cvs/M214/ginfix2/openSrc/tcl8.4/unix/../generic/tclBasic.c:3085
#6 0x08090f25 in TclExecuteByteCode (interp=0x80fc4f0,
codePtr=0x810c3b8) at
/home/makr/cvs/M214/ginfix2/openSrc/tcl8.4/unix/../generic/tclExecute.c:1419
#7 0x08090106 in TclCompEvalObj (interp=0x80fc4f0,
objPtr=0x8105e58) at
/home/makr/cvs/M214/ginfix2/openSrc/tcl8.4/unix/../generic/tclExecute.c:981
#8 0x08063c7d in Tcl_EvalObjEx (interp=0x80fc4f0,
objPtr=0x8105e58, flags=131072) at
/home/makr/cvs/M214/ginfix2/openSrc/tcl8.4/unix/../generic/tclBasic.c:4049
#9 0x080a1f85 in Tcl_RecordAndEvalObj
(interp=0x80fc4f0, cmdPtr=0x8105e58, flags=131072) at
/home/makr/cvs/M214/ginfix2/openSrc/tcl8.4/unix/../generic/tclHistory.c:142
#10 0x0804ab09 in Tcl_Main (argc=-1, argv=0xffffb5c8,
appInitProc=0x804a461 <Tcl_AppInit>) at
/home/makr/cvs/M214/ginfix2/openSrc/tcl8.4/unix/../generic/tclMain.c:392
#11 0x0804a45a in main (argc=1, argv=0xffffb5c4) at
/home/makr/cvs/M214/ginfix2/openSrc/tcl8.4/unix/tclAppInit.c:90
(gdb) up
#1 0x080a973b in ReadChars (statePtr=0x810c338,
objPtr=0x810cff0, charsToRead=74, offsetPtr=0xffffac18,
factorPtr=0xffffac14)
at
/home/makr/cvs/M214/ginfix2/openSrc/tcl8.4/unix/../generic/tclIO.c:4818
4818 memcpy((VOID *) (nextPtr->buf +
nextPtr->nextRemoved), (VOID *) src,

I noticed that memcpy() got fed with srcLen=-22:

(gdb) print srcLen
$1 = -22
(gdb) print src
$2 = 0x812528e ""
(gdb) print nextPtr->buf
$3 = "\201�212\027"
(gdb) print nextPtr->nextRemoved
$4 = 38
(gdb) print nextPtr->buf + nextPtr->nextRemoved
$5 = 0x81252c6 ""
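
Why a negative srcLen is fatal here: memcpy() takes its length as a size_t, so a signed -22 is converted to a value close to SIZE_MAX and the copy runs far past both buffers. The following is only a minimal sketch of that conversion and of a defensive check. The helper names checked_copy and length_as_size_t are hypothetical and not part of Tcl, and this is not the fix that went into tclIO.c.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: shows what memcpy() receives when an int
 * length is passed where a size_t is expected. */
static size_t length_as_size_t(int len)
{
    return (size_t) len;   /* -22 wraps around to SIZE_MAX - 21 */
}

/* Hypothetical helper: refuse to copy when the signed length is
 * negative or larger than the destination capacity.
 * Returns the number of bytes copied, or -1 if the copy was refused. */
static int checked_copy(char *dst, size_t dstCap, const char *src, int srcLen)
{
    if (srcLen < 0 || (size_t) srcLen > dstCap) {
        return -1;
    }
    memcpy(dst, src, (size_t) srcLen);
    return srcLen;
}
```

With these helpers, length_as_size_t(-22) yields SIZE_MAX - 21, and checked_copy() rejects the call instead of handing that value to memcpy().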

I also just noticed that ActiveTcl's tclsh8.4 (8.4.11)
core dumps as well...

% parray ::activestate::ActiveTcl
::activestate::ActiveTcl(arch) = linux-ix86
::activestate::ActiveTcl(as,mode) = normal
::activestate::ActiveTcl(build) = 162119
::activestate::ActiveTcl(buildtime,fmt) = Tue Jul 19
10:40:31 AM PDT 2005
::activestate::ActiveTcl(buildtime,sec) = 1121794831
::activestate::ActiveTcl(maturity) = final
::activestate::ActiveTcl(product) = ActiveTcl
::activestate::ActiveTcl(release) = 8.4.11.0
% set f [open crashme]
file3
% while {![eof $f]} {read $f 4096}
Speicherzugriffsfehler (core dumped)

Although this core does not give much away, it
apparently crashes at the same position:

#0 0x556c158c in memcpy () from /lib/tls/libc.so.6
(gdb) bt
#0 0x556c158c in memcpy () from /lib/tls/libc.so.6
#1 0x555c535e in ReadChars () from
/usr/local/ActiveTcl/lib/libtcl8.4.so
#2 0x555c4fca in DoReadChars () from
/usr/local/ActiveTcl/lib/libtcl8.4.so
#3 0x555c4eb0 in Tcl_ReadChars () from
/usr/local/ActiveTcl/lib/libtcl8.4.so
#4 0x555c9582 in Tcl_ReadObjCmd () from
/usr/local/ActiveTcl/lib/libtcl8.4.so
#5 0x5558eb27 in TclEvalObjvInternal () from
/usr/local/ActiveTcl/lib/libtcl8.4.so
#6 0x555b2faa in TclExecuteByteCode () from
/usr/local/ActiveTcl/lib/libtcl8.4.so
#7 0x555b2475 in TclCompEvalObj () from
/usr/local/ActiveTcl/lib/libtcl8.4.so
#8 0x5558fbe7 in Tcl_EvalObjEx () from
/usr/local/ActiveTcl/lib/libtcl8.4.so
#9 0x555bf6f4 in Tcl_RecordAndEvalObj () from
/usr/local/ActiveTcl/lib/libtcl8.4.so
#10 0x555d3565 in Tcl_Main () from
/usr/local/ActiveTcl/lib/libtcl8.4.so
#11 0x080488aa in main ()

kind regards -- Matthias Kraft

PS: SF won't let me upload the file as it is 512 kB,
please download from here:
http://www.matkraft.de/files/crashme

Discussion

  • Andreas Kupries

    Andreas Kupries - 2006-03-31

    Logged In: YES
    user_id=75003

    Ideas regarding the crashme file. Compress it to see if the
    result goes below the upload limit of SF. gzip, or bzip2.

    Also, have you found smaller files exhibiting the problem?
    Maybe a halving search? Does the first half of crashme crash
    it? The second half? The half-size section centered on the
    middle of the file? If yes, we can try to divide further.

    Given the reference to LANG I consider it interesting to
    find out what system encoding is chosen by Tcl. Better, what
    default encoding for the channel you open. Can you add a
    'puts [fconfigure $f]' statement to your script before you
    start reading?

     
  • Jeffrey Hobbs

    Jeffrey Hobbs - 2006-03-31
    • priority: 5 --> 9
     
  • Vince Darley

    Vince Darley - 2006-04-01
    • labels: 104242 --> 25. Channel System
    • assigned_to: vincentdarley --> andreas_kupries
     
  • Matthias Kraft

    Matthias Kraft - 2006-04-03

    Logged In: YES
    user_id=330806

    If I understand the code correctly, the file crashme
    contains random data. It is generated to have some file to
    test procedures used for data conversion and unpacking archives.

    I currently have no smaller file, I'll try to produce one,
    but no luck so far.

    Tried the halving search (first half, second half, a moving
    window half the size), but none of those crash.

    % fconfigure $f
    -blocking 1 -buffering full -buffersize 4096 -encoding utf-8
    -eofchar {} -translation auto

     
  • Matthias Kraft

    Matthias Kraft - 2006-04-03

    Binary content.

     
  • Matthias Kraft

    Matthias Kraft - 2006-04-03

    Logged In: YES
    user_id=330806

    Attaching a reduced example file. Although the behavior to
    reproduce changed a little, the core dump still looks the same.

    With this file the crash occurs on Linux x86_64, i386 and
    s390x. With the full file the crash also occurs on AIX 5.2
    and HP-UX 11i.

    Will attach a script to reproduce the crash with this
    example file...

     
  • Matthias Kraft

    Matthias Kraft - 2006-04-03

    Script for crash with crashme3 file.

     
  • Andreas Kupries

    Andreas Kupries - 2006-04-03

    Logged In: YES
    user_id=75003

    Thanks for your efforts ... They were partly for naught. I
    am unable to repro the bug on my i386 machine using the
    crashme3 file with crash.tcl script. I get some nice output
    in the form

    3/12288 >>> 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
    19 20 21 22 23 24 25 26 27 28 29 30 31 32
    2/8192 >>> 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
    20 21 22 23 24 25 26 27 28 29 30 31 32 33
    1/4096 >>> 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
    20 21 22 23 24 25 26 27 28 29 30 31 32 33 34

    However, I can reproduce the seg. fault using the original
    large crashme file and using the trivial code in your
    initial report. Just one change was necessary, "fconfigure
    $f -encoding utf-8". This was on an i386 machine, i.e. 32-bit,
    tclsh 8.4.13 (not yet released, near CVS head essentially),
    compiled with gcc 2.95.3.

    The only part which seems to be the same in both your and my
    crash is the encoding, utf-8. Based on that my guess is that
    it is the utf8 code which gets into trouble for specific
    byte-sequences in the input.

    This is the next thing to check, using a debug build and
    other instrumentation to get a look into the behaviour.

    So, for now I can definitely confirm that there is a crash,
    even if the exact cause is not yet known.

     
  • Don Porter

    Don Porter - 2006-04-03

    Logged In: YES
    user_id=80530

    no crash for me using
    the tip of either
    development branch.
    Tested on both Solaris
    and Linux/Alpha.

    Can anyone reproduce
    the crash?

     
  • Andreas Kupries

    Andreas Kupries - 2006-04-03

    Logged In: YES
    user_id=75003

    Ok, there is something odd going on here. I can repro the
    crash only when using the script interactively. The moment I
    go non-interactive things are OK. And replacing the
    while-loop with a large read of the whole file makes it go
    away as well.

    This general elusiveness, i.e. vanishing at the slightest
    change, points to a memory-smash somewhere. :(

    Yep, very likely. Changing the size of blocks read by even
    one character, up or down, causes the crash to vanish as
    well ... changing the -buffersize in tandem with the #chars
    read is not effective, the crash vanishes. Changing the
    buffersize alone makes it vanish as well. Ah, one case is
    different: -buffer 4095, read 4096 blocks ... This seems to
    hang the tclsh instead of crashing it.

     
  • Andreas Kupries

    Andreas Kupries - 2006-04-03

    Logged In: YES
    user_id=75003

    Switching to debug build makes the crash vanish. No
    surprise. Usual heisenbug behaviour for a memory smash,
    debug changes the memory layout enough to let the smash run
    into nothing, disabling it. This requires direct fprintf in
    the code ... Hm, maybe not. Just --enable-symbols, but not
    the mem_debug stuff, then maybe just tracing it in the debugger.

     
  • Andreas Kupries

    Andreas Kupries - 2006-04-03
    • status: open --> open-fixed
     
  • Andreas Kupries

    Andreas Kupries - 2006-04-03

    Logged In: YES
    user_id=75003

    Ok. I know where the problem is, and I have a fix sitting in
    my sandbox. I have no original, so no patches, and
    committing the fix has to wait for SF to get their act
    together and CVS back running.

    Story time:

    (a) ReadChars is called with a buffer containing one byte.
    This buffer has no successor (end-of-input-queue EOIQ).
    (b) The Tcl_ExternalToUtf in ReadChars signals a split
    multi-byte character. Because of EOIQ ReadChars signals
    'nothing read' and 'channel_need_more_data'
    (c) The IO layer does its thing, reading more data.
    This detects EOF, and sets TCL_ENCODING_END.
    (d) ReadChars is called again.
    It now has two buffers. One with split multi-byte at the
    end, a second with the remainder of the multi-byte char.
    It tries to convert the first buffer again.
    This succeeds!! because of the TCL_ENCODING_END.

    This is wrong. It should not have succeeded, but failed
    as before. That would have caused ReadChars to copy the
    partial multi-byte char to the beginning of the now existing
    next buffer, and then to try again with the modified
    input queue.

    In essence the TCL_ENCODING_END is handed to the
    Tcl_ExternalToUtf too early. Yes, there is an EOF pending,
    but right now we have two buffers in the queue, so the first
    one cannot be the end.
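
The rule stated above (only the last buffer in the input queue may be converted with the end-of-input flag) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code committed to tclIO.c: SketchBuffer, SKETCH_ENCODING_END and sketch_conversion_flags are hypothetical names, and the flag value is arbitrary.

```c
#include <stddef.h>

/* Hypothetical stand-in for TCL_ENCODING_END; the value is arbitrary. */
#define SKETCH_ENCODING_END 0x02

/* Hypothetical, simplified model of one buffer in a channel's
 * input queue. */
typedef struct SketchBuffer {
    struct SketchBuffer *next;  /* NULL if this is the last buffer */
    size_t bytesUsed;           /* number of valid bytes in the buffer */
} SketchBuffer;

/* Compute the flags to hand to the converter for this buffer.
 * Even if EOF is pending on the channel, a buffer that still has a
 * successor cannot be the end of the input, so the end flag is
 * masked out for it. */
static int sketch_conversion_flags(const SketchBuffer *buf, int channelFlags)
{
    if (buf->next != NULL) {
        return channelFlags & ~SKETCH_ENCODING_END;
    }
    return channelFlags;
}
```

In the scenario from steps (a) to (d), the first buffer would be converted without the end flag, so the split multi-byte character is reported again and its bytes can be moved to the front of the second buffer.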

    There is a contributing bug hidden in the above description:
    Tcl_ExternalToUtf, i.e. Tcl_Utf2UtfProc is accessing memory
    behind the end of its input buffer if the last character is
    the start of a multi-byte character and TEE is set.

    And in this case the character found there was a valid
    completion of the multi-byte header, so it consumed 2 bytes
    and reported that, causing the upper layers to spiral out
    of control. Here the memory layout comes into play, causing
    the high sensitivity against any type of change, be it
    different buffer sizes or even a switch to non-interactive
    operation. In most cases the byte read is not a valid
    completion of the multi-byte char, so only one byte is
    consumed (as part of creating canonical utf-8 from the
    non-canonical input) and that keeps the upper layers stable
    and happy. The read information is bad, a multi-byte char
    was broken, but no crash.

    This bug has been fixed as well. A quick test with valgrind
    confirmed this bad memory access btw. (using a slightly
    modified make valgrind target).
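
To illustrate the contributing bug: a decoder that sees a multi-byte lead byte as the last byte of its input must not look at the byte behind it. The sketch below detects such an incomplete tail using only bytes inside the buffer. It is a generic UTF-8 check written for this ticket under stated assumptions, not the code in Tcl's encoding layer. The function names are hypothetical, and it accepts 4-byte sequences, which may be more than Tcl 8.4 itself handles.

```c
#include <stddef.h>

/* Expected length of a UTF-8 sequence, judged by its lead byte.
 * ASCII, stray continuation bytes and invalid lead bytes count as 1. */
static size_t utf8_expected_len(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Return how many bytes at the end of buf[0..len) belong to a
 * multi-byte sequence that is cut off by the buffer end, or 0 if the
 * buffer ends on a character boundary. Never reads buf[len] or
 * anything beyond it. */
static size_t utf8_incomplete_tail(const unsigned char *buf, size_t len)
{
    size_t i = len;
    size_t back = 0;

    /* Step back over at most three trailing continuation bytes. */
    while (i > 0 && back < 3 && (buf[i - 1] & 0xC0) == 0x80) {
        i--;
        back++;
    }
    if (i == 0) {
        return 0;   /* empty input or continuation bytes only */
    }

    size_t need = utf8_expected_len(buf[i - 1]);
    size_t have = back + 1;
    return (need > 1 && have < need) ? have : 0;
}
```

A caller could carry the reported tail bytes over to the next buffer before converting, rather than letting the converter guess what follows the lead byte.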

    The whole problem is likely present in 8.5 as well. Fixes
    will be done tomorrow. This took the whole day to trace
    in its entirety.

     
  • Andreas Kupries

    Andreas Kupries - 2006-04-05
    • status: open-fixed --> closed-fixed
     
  • Andreas Kupries

    Andreas Kupries - 2006-04-05

    Logged In: YES
    user_id=75003

    Fixes committed to both HEAD and 8.4 branch head.

     
