#11 Fix for incorrect pattern length

Sergei Golovan


I'd like to propose a patch that fixes a segfault when matching an exact string containing non-ASCII characters (more precisely, whenever the pattern's UTF-8 byte length is greater than its length in Unicode characters).

The bug description with a simple script which triggers it can be found at http://groups.google.com/group/comp.lang.tcl/browse_thread/thread/8ea4b666c2f31cac#

Another bug report is in the Ubuntu tracker (it is incomplete, but the segfault occurs in the same code fragment). See https://bugs.launchpad.net/ubuntu/+source/expect/+bug/608343

The issue seems to be in the matching function, where a UTF-8 pattern is used to match a Tcl_UniChar string. The matching itself is fine, but the length of the matched string segment is calculated incorrectly, as the UTF-8 byte length of the pattern rather than its length in characters. The attached patch switches to a Tcl_UniChar pattern.
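To illustrate the mismatch described above, here is a minimal standalone sketch. It does not use the Tcl or Expect sources; `utf8_char_count` and `match_length_overshoot` are hypothetical helper names invented for this example, and the input is assumed to be well-formed UTF-8. It shows by how many positions a match end computed from the byte length would run past the end computed from the character length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Tcl or Expect: counts the code points
 * in a well-formed, NUL-terminated UTF-8 string by skipping continuation
 * bytes (those of the form 10xxxxxx). */
size_t utf8_char_count(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            n++;
        }
    }
    return n;
}

/* Hypothetical helper: how far a match end computed from the pattern's
 * byte length overshoots the end computed from its character length.
 * The result is zero for a pure ASCII pattern. */
size_t match_length_overshoot(const char *utf8_pattern)
{
    return strlen(utf8_pattern) - utf8_char_count(utf8_pattern);
}
```

For a pure ASCII pattern the two lengths agree, which would explain why the problem only shows up with non-ASCII patterns. Each two-byte character in the pattern adds one position of overshoot.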


  • Sergei Golovan

    Fix for incorrect pattern length

  • Alternate patch, smaller.

  • Thank you for the investigation. I managed to reproduce the problem here too.
    Attached is my alternate patch for the problem.
    Instead of rewriting the match routines (which do not return the length info in question), I simply convert patLength from #bytes to #chars via Tcl_NumUtfChars().

    • priority: 5 --> 9
    • assigned_to: nobody --> andreas_kupries
    • status: open --> closed-fixed
  • Committed my patch

    • status: closed-fixed --> open-fixed
  • Reopening after a talk with teo(petuk) on the chat:

    [08:23] teo1 hi! i'm about your patch to expect (i can't comment in closed bugreports in sf.net). my patch eliminates double conversion between utf-8 and unicode. so it makes search a bit more efficient (it also removes one unused counter)
    [08:23] teo1 though it's more lengthy
    [08:24] aku moin. Ok, I'll have another look and meditation on the various conversions.
    [08:33] teo1 your patch is ok to me as well. it is small enough to push it into debian stable even in freeze time

    • priority: 9 --> 7
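
The smaller alternate patch discussed in the comments converts the pattern length from bytes to characters before it is used. A rough standalone sketch of that idea follows. It is not the committed patch: `num_utf_chars` is a hypothetical stand-in that takes the same kind of arguments as Tcl_NumUtfChars() (a UTF-8 string and a byte count), `match_end` is an invented name, and the input is assumed to be well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Tcl_NumUtfChars(): counts the code points in
 * the first byte_len bytes of a well-formed UTF-8 string. */
int num_utf_chars(const char *src, int byte_len)
{
    int i, n = 0;
    assert(src != NULL && byte_len >= 0);
    for (i = 0; i < byte_len; i++) {
        /* Every byte that is not a continuation byte starts a character. */
        if (((unsigned char)src[i] & 0xC0) != 0x80) {
            n++;
        }
    }
    return n;
}

/* Hypothetical helper: index one past the matched segment in a buffer of
 * fixed-width characters, given the index where the match starts. The
 * pattern length is converted from bytes to characters first. */
int match_end(int match_start, const char *utf8_pattern, int pat_bytes)
{
    int pat_chars = num_utf_chars(utf8_pattern, pat_bytes);
    return match_start + pat_chars;
}
```

The design trade-off raised in the chat is visible here: this approach keeps the change small, at the cost of one extra pass over the pattern, whereas matching against a Tcl_UniChar pattern directly avoids the repeated conversion.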