Fix for incorrect pattern length

Brought to you by: andreas_kupries, hobbs, libes, pyssling

#11 Fix for incorrect pattern length

Status: open-fixed

Owner: Andreas Kupries

Labels: None

Priority: 7

Updated: 2010-10-27

Created: 2010-10-26

Creator: Sergei Golovan

Private: No

Hi!

I'd like to propose a patch that fixes a segfault when matching an exact string containing non-ASCII characters (more precisely, when the UTF-8 byte length is greater than the Unicode character length).

The bug description with a simple script which triggers it can be found at http://groups.google.com/group/comp.lang.tcl/browse_thread/thread/8ea4b666c2f31cac#

Another bug report is on the Ubuntu tracker (it is incomplete, but the segfault occurs in the same code fragment). See https://bugs.launchpad.net/ubuntu/+source/expect/+bug/608343

The issue seems to be in the matching function, where a UTF-8 pattern is used to match a Tcl_UniChar string. The matching itself is fine, but the length of the matched string segment is calculated incorrectly, as the UTF-8 byte length of the pattern. The attached patch switches to a Tcl_UniChar pattern.
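To illustrate the length mismatch described above, here is a minimal sketch. It is not the Expect code: a Python str stands in for the Tcl_UniChar buffer, bytes stand in for the UTF-8 pattern, and the sample strings and variable names are assumptions made up for the example.

```python
# Illustrative only: Python str stands in for a Tcl_UniChar buffer,
# and the encoded bytes stand in for the UTF-8 pattern.
pattern = "na\u00efve"        # hypothetical pattern, one non-ASCII character
buffer = "a na\u00efve"       # hypothetical input buffer, measured in characters

byte_length = len(pattern.encode("utf-8"))  # the non-ASCII character takes 2 bytes
char_length = len(pattern)                  # number of characters

start = buffer.find(pattern)       # the match position itself is found correctly
wrong_end = start + byte_length    # end offset computed from the byte length
right_end = start + char_length    # end offset computed from the character length

print(byte_length, char_length)          # byte length exceeds character length
print(wrong_end > len(buffer))           # True: the byte-based end is past the buffer
print(buffer[start:right_end] == pattern)
```

Python slicing tolerates an out-of-range end offset, but C code that uses such an offset on a fixed-size buffer can read past its end, which would be consistent with the reported segfault.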

Discussion

Sergei Golovan - 2010-10-26

Fix for incorrect pattern length

21-match.patch

Andreas Kupries - 2010-10-26

Alternate patch, smaller.

3095935.patch

Andreas Kupries - 2010-10-26

Thank you for the investigation. I managed to reproduce the problem here too.
Attached is my alternate patch for the problem.
Instead of rewriting the match routines (which do not return the length information in question), I simply convert patLength from a byte count to a character count via Tcl_NumUtfChars().
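The byte-to-character conversion can be sketched as follows. This is a rough stand-in, not the committed patch: num_utf_chars is a hypothetical helper that mimics what Tcl_NumUtfChars() does for well-formed UTF-8 (malformed input is not handled here), and the sample pattern is made up.

```python
def num_utf_chars(data: bytes) -> int:
    """Count characters in well-formed UTF-8.

    Hypothetical stand-in for Tcl_NumUtfChars(): every byte that is not a
    continuation byte (bit pattern 10xxxxxx) starts a new character.
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


# Hypothetical pattern: six Cyrillic letters, two bytes each in UTF-8.
pat = "\u043f\u0440\u0438\u0432\u0435\u0442".encode("utf-8")

pat_length_bytes = len(pat)            # length as stored with the UTF-8 pattern
pat_length_chars = num_utf_chars(pat)  # length usable against a character buffer

print(pat_length_bytes, pat_length_chars)
```

For a pure ASCII pattern both counts are equal, which fits the report that the crash only appears with non-ASCII patterns.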

Andreas Kupries - 2010-10-26

priority: 5 --> 9

assigned_to: nobody --> andreas_kupries

Andreas Kupries - 2010-10-26

status: open --> closed-fixed

Andreas Kupries - 2010-10-26

Committed my patch

Andreas Kupries - 2010-10-27

status: closed-fixed --> open-fixed

Andreas Kupries - 2010-10-27

Reopening after a talk with teo(petuk) on the chat:

[08:23] teo1 hi! i'm about your patch to expect (i can't comment in closed bugreports in sf.net). my patch eliminates double conversion between utf-8 and unicode. so it makes search a bit more efficient (it also removes one unused counter)
[08:23] teo1 though it's more lengthy
[08:24] aku moin. Ok, I'll have another look and mediation on the various conversions.
[08:33] teo1 your patch is ok to me as well. it is small enough to push it into debian stable even in freeze time

Andreas Kupries - 2010-10-27

priority: 9 --> 7