From: SourceForge.net <no...@so...> - 2006-01-23 11:25:53
Bugs item #1410553, was opened at 2006-01-20 05:13
Message generated for change (Comment added) made by msofer
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=110894&aid=1410553&group_id=10894

Please note that this message will contain a full copy of the comment
thread, including the initial issue submission, for this request,
not just the latest update.

Category: 10. Objects
Group: current: 8.4.12
>Status: Pending
>Resolution: Fixed
Priority: 8
Submitted By: Twylite (twylite)
Assigned to: miguel sofer (msofer)
Summary: String numchars and object length can be desynchronised

Initial Comment:
Under certain circumstances the Tcl_GetRange() function will incorrectly
treat the internal representation as a String (of byte-wide characters)
rather than Unicode, resulting in incorrect behaviour. The check is based
on a comparison between stringPtr->numChars and objPtr->length. The
commands "append" and "string length" may be involved in causing a
desynchronisation between numChars and length.

Bug present in 8.4.11, 8.4.12, 8.5a4 (from CVS 2006/01/19) on Win32.

Bug reproduction code with actual and expected output is below.

---

This code reproduces a bug that we found in a serial receive loop. The
loop would read characters from a serial channel and append them one by
one to a receive buffer; if a trigger character (ETX = \x03) was found it
would switch from data mode to CRC mode.

It appears that doing the "string length" when a NUL character (\x00) is
already in the buffer has a side-effect whereby a subsequent
"string range" misidentifies the end of the string.

Notes:
- It seems that the trigger character can be any non-NUL value.
- The indexes to "string range" can be relative to end, or absolute.
  Both fail.
- Not doing the "string length" negates the problem.
- Doing a "string bytelength" before the "string range" negates the
  problem.
- Only the first "string range" is affected; calling the same command
  again immediately after the first negates the problem.
- The problem is only reproducible when there is a \x00 in the string
  before the character that triggers the "string length".
- If the \x00 immediately precedes the trigger character, the
  "string range" brings it out as \x80 instead of \x00. I strongly
  suspect that this has to do with the internal (unicode) encoding of a
  NUL.

Further investigation:

Stepping through the code in a debugger, I found that under the
conditions given above, Tcl_GetRange() executes on a different path.

  if (stringPtr->numChars == objPtr->length) {
    // Use Tcl_GetString and Tcl_NewStringObj
  } else {
    // Use Tcl_NewUnicodeObj
  }

It would appear that this is actually a Unicode object, as the behaviour
is correct when the Unicode path is followed. The ->numChars and ->length
happen (coincidentally) to match on account of the encoding of \x00 and
the number of additional characters added after the trigger character.

My guess is along the lines of "string length" updating one of these
values, whereas "append" updates the other.

--- CODE TO REPRODUCE ---

proc test_stringbug {} {
  # set bytes "\x02 \x49 \x4B \x04 \x00 \x00 \x03 \x15 \x57"
  # set bytes "\x00 \x01 \x57 \x03 \x41"
  set bytes "\x00 \x03 \x41"
  set rxBuffer {}
  foreach ch $bytes {
    append rxBuffer $ch
    #1 Take this if{} out and it works
    if { $ch eq "\x03" } {
      string length $rxBuffer
    }
  }
  #2 Add this "string bytelength" in and it works
  # string bytelength $rxBuffer
  set rxCRC [string range $rxBuffer end-1 end]
  binary scan [join $bytes {}] "H*" input_hex
  puts "Input is ...... $input_hex"
  binary scan $rxBuffer "H*" rxBuffer_hex
  puts "rxBuffer is ... $rxBuffer_hex"
  binary scan $rxCRC "H*" rxCRC_hex
  puts "rxCRC is ...... $rxCRC_hex"
}

catch { console show ; update ; wm wi . }
puts "Tcl version: $tcl_patchLevel"
test_stringbug

--- OUTPUT ---

rxCRC should be the last two bytes (4 hex characters) of rxBuffer,
i.e. 0341 rather than 8003.

D:\>tclkit-sh-8411.exe d:\tcl_string_bug.tcl
Tcl version: 8.4.11
Input is ...... 000341
rxBuffer is ... 000341
rxCRC is ...... 8003

D:\>tclsh84-12.exe d:\tcl_string_bug.tcl
Tcl version: 8.4.12
Input is ...... 000341
rxBuffer is ... 000341
rxCRC is ...... 8003

D:\>tclsh85sg.exe d:\tcl_string_bug.tcl
Tcl version: 8.5a4
Input is ...... 000341
rxBuffer is ... 000341
rxCRC is ...... 8003

---

----------------------------------------------------------------------

>Comment By: miguel sofer (msofer)
Date: 2006-01-23 08:25

Message:
Logged In: YES
user_id=148712

Thanks to both of you; fixed in HEAD and core-8-4-branch.
Leaving as "Pending" to remind me to add a test.

----------------------------------------------------------------------

Comment By: Twylite (twylite)
Date: 2006-01-20 11:21

Message:
Logged In: YES
user_id=91629

Thanks :) I didn't really understand the problem because I'm not
familiar with Tcl internals.

I have tried the solution you suggest against Tcl 8.5a4 (from CVS
2006/01/19), and it corrects the problem.

----------------------------------------------------------------------

Comment By: Peter Spjuth (pspjuth)
Date: 2006-01-20 10:58

Message:
Logged In: YES
user_id=98900

The fault is in Tcl_GetRange. objPtr->length is not valid unless there
is a string rep, and Tcl_GetRange does not check whether
objPtr->bytes != NULL first.

It would probably work with:

  if (objPtr->bytes != NULL && stringPtr->numChars == objPtr->length) {

----------------------------------------------------------------------
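
The effect of the guard suggested in Peter Spjuth's comment can be illustrated with a small C sketch. Everything in it is hypothetical: MockObj and the three helper functions are simplified stand-ins, not Tcl's real Tcl_Obj or String structures. The field values are an assumed reading of the report, in which the last "append" dropped the string rep (bytes == NULL) but left length at its old value of 3 (Tcl stores NUL internally as the two bytes 0xC0 0x80, so "\x00\x03" occupies three bytes), while numChars had grown to 3 characters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of the fields discussed in the thread.
 * This is NOT the real Tcl_Obj / String layout. */
typedef struct {
    const char *bytes; /* string rep, or NULL when it has been invalidated */
    int length;        /* byte count of the string rep; stale if bytes == NULL */
    int numChars;      /* character count of the Unicode internal rep */
} MockObj;

/* Assumed state after the final append in the reproduction script:
 * no string rep, a leftover byte length of 3, and 3 characters. */
static MockObj stale_after_append(void) {
    MockObj o;
    o.bytes = NULL;
    o.length = 3;   /* left over from the 3-byte rep of "\x00\x03" */
    o.numChars = 3; /* "\x00", "\x03", "\x41" */
    return o;
}

/* The check as quoted in the report: trusts length unconditionally. */
static int unguarded_is_byte_wide(const MockObj *o) {
    return o->numChars == o->length;
}

/* The check as proposed in the thread: only trusts length when a
 * string rep actually exists. */
static int guarded_is_byte_wide(const MockObj *o) {
    return o->bytes != NULL && o->numChars == o->length;
}
```

With the assumed state, unguarded_is_byte_wide() returns 1 (the byte-wide path is taken by coincidence), while guarded_is_byte_wide() returns 0, so the Unicode path would be used.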