|
From: SourceForge.net <no...@so...> - 2012-02-08 20:47:27
|
Bugs item #3484402, was opened at 2012-02-04 10:04 Message generated for change (Comment added) made by dgp You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=110894&aid=3484402&group_id=10894 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: 10. Objects Group: current: 8.5.11 Status: Open Resolution: None Priority: 8 Private: No Submitted By: Poor Yorick (pooryorick) Assigned to: Don Porter (dgp) Summary: off-by-one address error in AppendUnicodeToUnicodeRep Initial Comment: [append var1 $var2], data-corruption, e.g., random bytes from memory, show up in $var1 problem traced to checkin f0785bb73d1ccd8a524b32c53d457c63b6bb912f, changes to AppendUnicodeToUnicodeRep, if (unicode >= stringPtr->unicode && unicode <= stringPtr->unicode + 1 + stringPtr->uallocated / sizeof(Tcl_UniChar)) { offset = unicode - stringPtr->unicode; } the "+1" seems to cause a comparison with the pointer immediately subsequent to the boundaries of stringPtr, and causing the 'unicode' to point to the wrong location after the ensuing realloc and offset manipulations. Removing resulted the bug disappearing: if (unicode >= stringPtr->unicode && unicode <= stringPtr->unicode + stringPtr->uallocated / sizeof(Tcl_UniChar)) { offset = unicode - stringPtr->unicode; } in order for the bug to be triggered, things have to fall just-so in memory, such that "unicode" is at the next address up from stringPtr. ---------------------------------------------------------------------- >Comment By: Don Porter (dgp) Date: 2012-02-08 12:47 Message: So the remaining dispute is just memcpy() vs. memmove() ? Unquestionably, memmove() is as correct as memcpy(). Unquestionably, memmove() is as safe as memcpy(). If there's even a possibility of overlapping sequences, memmove() is safe where memcpy() is not. That leaves only performance as a possible reason not to use memmove(), correct? I plan to commit the patch from the bugfix branch. If there's a performance argument to change back to memcpy(), let's see the measurements to support it. On my system the memmove() implementation is faster on the STR append benchmarks. 001 STR append 5.44 5.50 002 STR append (1KB + 1KB) 1.00 1.02 003 STR append (1MB + (1b+1K+1b)*100) 168.88 173.85 004 STR append (1MB + 1KB) 129.17 129.15 005 STR append (1MB + 1KB*20) 132.90 133.15 006 STR append (1MB + 1KB*1000) 291.17 300.10 007 STR append (1MB + 1MB*3) 522.82 436.00 008 STR append (1MB + 1MB*5) 653.54 541.52 009 STR append (1MB + 2b*1000) 236.96 245.06 This is core-8-4-branch tip vs. bug-3484402. ---------------------------------------------------------------------- Comment By: Jan Nijtmans (nijtmans) Date: 2012-02-08 12:14 Message: If we want to support Don's use of Tcl_SetObjLength officially, that would be a RFE, not a bug fix, so I would keep that separate from the bug fix we are handling here. I wouln't encourage people to use such code. In that respect, I'm with Sebres here. Sorry ;-) Apart from that, the suggested change is OK with me. ---------------------------------------------------------------------- Comment By: Serg G. Brester (sebres) Date: 2012-02-08 09:17 Message: To Don, IMO, it's intrinsically illegal, what you do here: p = Tcl_GetString(objPtr); Tcl_SetObjLength(objPtr, 3); Tcl_AppendToObj(objPtr, p+7, 8); cause TODAY the Tcl_SetObjLength leave the pointer to string yet conserved (It changes just the length). But, If TOMORROW it will be optimized, and frees or recreate new string representation also, that code will produce access violation. IMO, it is not practical. ---------------------------------------------------------------------- Comment By: Don Porter (dgp) Date: 2012-02-08 06:56 Message: Thanks for the review. Regarding memmove, consider this: objPtr = Tcl_NewStringObj("foo bar dingbat", -1); p = Tcl_GetString(objPtr); Tcl_SetObjLength(objPtr, 3); Tcl_AppendToObj(objPtr, p+7, 8); which should produce the value 'foo dingbat'. That involves a copy of overlapping byte sequences. Similar sequence with the unicode rep is also possible. It's easy to forget that stringPtr->allocated can be strictly larger than objPtr->length + 1 and stringPtr->uallocated can be strictly larger than (stringPtr->numChars + 1)*sizeof(Tcl_UniChar) ---------------------------------------------------------------------- Comment By: Serg G. Brester (sebres) Date: 2012-02-08 02:21 Message: Hi Don, the "memmove" with overlap safe copy does not really necessary by append functions (IMO). The name of the routine "memmove" says it already - move - but we have append. It's slower as "memcpy". Another change "STRING_UALLOC(numChars) > stringPtr->uallocated" seems to be ok. ---------------------------------------------------------------------- Comment By: Don Porter (dgp) Date: 2012-02-07 12:30 Message: The basic diagnosis reported is correct. Upon review, there are some other problems as well. The branch bug-3484402 off of core-8-4-branch is a patch repairing all the problems. Review and testing invited. ---------------------------------------------------------------------- Comment By: Don Porter (dgp) Date: 2012-02-06 13:12 Message: Not yet ready to endorse any diagnoses, but I wanted to say that ugly braintwisters like this are the reason the whole thing is rewritten in the trunk. ---------------------------------------------------------------------- Comment By: Poor Yorick (pooryorick) Date: 2012-02-06 09:57 Message: oops, I thought 1 was the highest. Sorry, Don. ---------------------------------------------------------------------- Comment By: Don Porter (dgp) Date: 2012-02-06 09:56 Message: What the hell are you talking about? Prio-8 is second highest. Multiple people are working on it within hours of receiving it. You're the first to notice the problem in more than 2 years. I think our response is quite good. ---------------------------------------------------------------------- Comment By: Poor Yorick (pooryorick) Date: 2012-02-06 09:54 Message: seems like a rather low priority for something which renders Tcl broken on a major platform (my scripts won't run reliably with macports tcl-8.5.11). ---------------------------------------------------------------------- Comment By: Serg G. Brester (sebres) Date: 2012-02-06 09:18 Message: Amendment: the solution can be so as suggested by Poor Yorick, because if one assumes, that pointer to unicode should be aligned to 2 ... also in my example "unicode" could not be 0xADDR0007. Also remove "+1", and instead of "<" use "<="; contents such as Poor Yorick. ---------------------------------------------------------------------- Comment By: Serg G. Brester (sebres) Date: 2012-02-06 09:02 Message: That could be calculated mathematically : stringPtr->uallocated; - amount of space actually allocated for the Unicode string (minus 2 bytes for the termination char). stringPtr->unicode - is a "unsigned short []" - also "unsigned short *" - also mathematically, stringPtr->unicode + 1 == ADDR + 2 ------------------------------------------------------------------------------------ For example by string "abc": stringPtr->unicode = 0xADDR0000 stringPtr->uallocated = 6 // + 2 bytes for NTS zeros; 0xADDR0000: 61 00 62 00 63 00 [ 00 00 ] stringPtr->unicode + 1 + stringPtr->uallocated / sizeof(Tcl_UniChar) = stringPtr->unicode + 1 + 6/2 = stringPtr->unicode + 4 (offset) = ADDR + 8 (mathematically) When: unicode = 0xADDR0004 // within buffer, "unicode" points to the last UNICHAR before NTS 0xADDR0000: 61 00 62 00 -> 63 00 [ 00 00 ] unicode <= stringPtr->unicode + 1 + stringPtr->uallocated / sizeof(Tcl_UniChar) 4 <= 8 == true -------------------------------- When: unicode = 0xADDR0006 // within buffer, "unicode" points to NTS after the last UNICHAR 0xADDR0000: 61 00 62 00 63 00 ->[ 00 00 ] unicode <= stringPtr->unicode + 1 + stringPtr->uallocated / sizeof(Tcl_UniChar) 6 <= 8 = true -------------------------------- When: unicode = 0xADDR0008 // out of buffer, "unicode" points to ANOTHER unicode string. 0xADDR0000: 61 00 62 00 63 00 [ 00 00 ] -> .. .. .. unicode <= stringPtr->unicode + 1 + stringPtr->uallocated / sizeof(Tcl_UniChar) 8 <= 8 = true That is why it must be the following ("<=" should be replaced with "<", but "+1" must remain) : [SOLUTION BEGIN] if (unicode >= stringPtr->unicode && unicode < stringPtr->unicode + 1 + stringPtr->uallocated / sizeof(Tcl_UniChar)) { offset = unicode - stringPtr->unicode; } [SOLUTION END] ---------------------------------------------------------------------- Comment By: Poor Yorick (pooryorick) Date: 2012-02-06 07:47 Message: It's not easy to recreate, because even with a certain set of values, they might not fall exactly right in memory to expose the problem. I did manage to get a test case that triggered the problem about 50% of the time, and I did observe that the problem only occurred when the 'unicode' symbol was located at exactly the next address up from stringPtr, making the "if" condition true even though the 'unicode' value was not in the boundaries of stringPtr. The test case was still 600 lines of code cut from another program, and very sensitive to contextual things like the length of the current working directory at the time of invocation. I don't currently know how to write a good test case for this. Any suggestions? ---------------------------------------------------------------------- Comment By: Donal K. Fellows (dkf) Date: 2012-02-06 06:39 Message: Agreed. This is the sort of thing where it's important to get a test written before we try fixing anything (so we can confirm the fix). ---------------------------------------------------------------------- Comment By: Jan Nijtmans (nijtmans) Date: 2012-02-06 00:33 Message: Poor, can you provide a more complete test case, with exact values of var1 and var2 you observed the problem with? Something like set var1 <any code to generate a suitable string> set var2 ??? append var1 $var2 Thanks! ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=110894&aid=3484402&group_id=10894 |