Hi!
I'd like to propose a patch that fixes a segfault when matching an exact string containing non-ASCII characters (more precisely, whenever the pattern's UTF-8 byte length is greater than its length in Unicode characters).
A bug description, with a simple script that triggers it, can be found at http://groups.google.com/group/comp.lang.tcl/browse_thread/thread/8ea4b666c2f31cac#
Another bug report is in the Ubuntu tracker (it is incomplete, but the segfault is in the same code fragment). See https://bugs.launchpad.net/ubuntu/+source/expect/+bug/608343
The issue seems to be in the matching function, where a UTF-8 pattern is used to match a Tcl_UniChar string. The matching itself is fine, but the length of the matched string segment is calculated incorrectly, as the UTF-8 byte length of the pattern. The attached patch switches to a Tcl_UniChar pattern.
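To make the length mismatch concrete, here is a small Python sketch. It is not the Expect code: a Python str merely stands in for the Tcl_UniChar buffer, and the function name match_end and the sample strings are made up for illustration. It only shows how using the pattern's UTF-8 byte length as the match length pushes the end offset past the end of the buffer.

```python
# Illustration only, not the Expect source. A Python str stands in for the
# Tcl_UniChar buffer; match_end and the sample strings are hypothetical.

def match_end(buffer: str, pattern: str, use_byte_length: bool) -> int:
    """Return the offset just past the match, counted in characters."""
    start = buffer.find(pattern)
    if start < 0:
        return -1
    if use_byte_length:
        # The miscalculation: length taken from the UTF-8 encoding.
        length = len(pattern.encode("utf-8"))
    else:
        # Intended behaviour: length in characters, like the buffer.
        length = len(pattern)
    return start + length

# U+017E and U+0165 each take two bytes in UTF-8.
pattern = "\u017elu\u0165"
buffer = "login: " + pattern

print(len(buffer))                        # buffer length in characters
print(match_end(buffer, pattern, False))  # end offset, character-based
print(match_end(buffer, pattern, True))   # end offset, byte-based (too large)
```

With the byte-based length the reported end of the match lies beyond the buffer, which in C means reading or copying past the end of the array.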
Fix for incorrect pattern length
Alternate patch, smaller.
Thank you for the investigation. I managed to reproduce the problem here too.
Attached is my alternate patch for the problem.
Instead of rewriting the match routines (which do not return the length info in question), I simply convert patLength from bytes to characters via Tcl_NumUtfChars().
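For readers unfamiliar with the conversion, the following Python sketch shows roughly what a bytes-to-characters count does. It is an assumption-laden stand-in, not the Tcl implementation: the helper name num_utf_chars is hypothetical, it assumes well-formed UTF-8 input, and it counts code points by skipping continuation bytes.

```python
# Rough stand-in for a bytes-to-characters count; not the Tcl implementation.
# num_utf_chars is a hypothetical helper and assumes well-formed UTF-8.

def num_utf_chars(data: bytes) -> int:
    """Count code points by counting bytes that are not continuation bytes."""
    # Continuation bytes in UTF-8 have the bit pattern 10xxxxxx.
    return sum(1 for b in data if (b & 0xC0) != 0x80)

# Same sample pattern idea: U+017E and U+0165 are two bytes each in UTF-8.
pat_bytes = "\u017elu\u0165".encode("utf-8")

pat_length = len(pat_bytes)             # length in bytes
pat_chars = num_utf_chars(pat_bytes)    # length in characters

print(pat_length, pat_chars)
```

The point of the smaller patch is that only this one length value changes; the match routines themselves stay as they are.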
Committed my patch
Reopening after a talk with teo(petuk) on the chat:
[08:23] teo1 hi! i'm about your patch to expect (i can't comment in closed bugreports in sf.net). my patch eliminates double conversion between utf-8 and unicode. so it makes search a bit more efficient (it also removes one unused counter)
[08:23] teo1 though it's more lengthy
[08:24] aku moin. Ok, I'll have another look and meditation on the various conversions.
[08:33] teo1 your patch is ok to me as well. it is small enough to push it into debian stable even in freeze time