When testing version 6.30 I ran into a whole cluster of bugs related to the UTF-8 extension. In short, most of the new code in strngfun.c and utility.c does not handle real string sizes correctly. For example, the function SubStringFunction dumps core on an empty string because of the statement:
end = UTF8Offset(tempString,end + 1) - 1;
The value end=0, which was correct up to that point, is changed to end=-1 (or end=MAXINT). That does not cause a crash by itself, but the subsequent rewriting of characters in the for loop does.
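To illustrate the failure mode, here is a minimal sketch of a guarded version of that computation. It is not the CLIPS code: char_to_byte_offset is a hypothetical stand-in for UTF8Offset, assumed to return the byte offset of the character with the given index and to stop at the terminating NUL, and last_byte_of_char is likewise an invented helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for UTF8Offset (an assumption, not the CLIPS source):
   returns the byte offset of the character with index charnum,
   or the offset of the terminating NUL if the string is shorter. */
static size_t char_to_byte_offset(const char *s, size_t charnum)
{
    size_t offs = 0;

    assert(s != NULL);
    while (charnum > 0 && s[offs] != '\0') {
        offs++;
        /* Skip continuation bytes (10xxxxxx). A NUL byte is not a
           continuation byte, so this never runs past the terminator. */
        while (((unsigned char)s[offs] & 0xC0) == 0x80)
            offs++;
        charnum--;
    }
    return offs;
}

/* Stores in *out the offset of the last byte of the character with index
   `end` and returns 1. Returns 0 for an empty string instead of computing
   0 - 1, which would wrap to a huge value for an unsigned type. */
static int last_byte_of_char(const char *s, size_t end, size_t *out)
{
    size_t next = char_to_byte_offset(s, end + 1);

    if (next == 0)
        return 0;
    *out = next - 1;
    return 1;
}
```

With this guard the caller can skip the for loop entirely when the helper returns 0, rather than looping with a wrapped index.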
Also, most of the UTF8xxx functions do not take into account that the string can be shorter than they expect. They simply access the following 4 bytes regardless of whether those bytes still belong to the string.
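As a rough sketch of what a length-aware check could look like, the helper below refuses to look at any byte at or beyond the known length of the buffer. The names utf8_expected_len and utf8_seq_len_bounded are made up for illustration and are not part of utility.c. The check covers truncated sequences and bad continuation bytes only; it does not reject overlong encodings or surrogates.

```c
#include <assert.h>
#include <stddef.h>

/* Sequence length implied by a lead byte, or 0 if the byte cannot
   start a UTF-8 sequence (for example a stray continuation byte). */
static size_t utf8_expected_len(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Returns the number of bytes in the sequence starting at s[pos],
   or 0 if the sequence is truncated or malformed.
   Never reads s[len] or anything after it. */
static size_t utf8_seq_len_bounded(const char *s, size_t len, size_t pos)
{
    size_t need, i;

    assert(s != NULL);
    if (pos >= len)
        return 0;

    need = utf8_expected_len((unsigned char)s[pos]);
    if (need == 0 || need > len - pos)
        return 0;   /* bad lead byte, or the sequence runs off the end */

    for (i = 1; i < need; i++) {
        if (((unsigned char)s[pos + i] & 0xC0) != 0x80)
            return 0;   /* expected a continuation byte */
    }
    return need;
}
```

A caller that gets 0 back can treat the byte as a single invalid unit and advance by one, so a malformed tail never pushes an index past the end of the string.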
Does anybody plan to revise this code?
Thanks a lot Vranoch
There's a fix for substring in strngfun.c checked into svn.
That's great. Thanks. And how about the other string functions? Did you also check whether they suffer from the same problem? And did you look at the UTF8xxx functions in utility.c? Do you find them safe against various mixed strings or strangely sized strings?
substring is the only CLIPS function that seems to have an issue. I haven't decided yet on what type of C API I want to use to allow access to the characters of a UTF8 string, so the UTF8xxx functions may remain for internal use only.
Whether the UTF8xxx functions are only accessed internally so far is not the issue. They are frequently called from public string functions, and the problem is that they access memory without any regard for the real length of the string. Calling a public string function with a suitably crafted string parameter can easily cause memory corruption.
The problem was with substring, not the UTF8xxx function. The UTF8Offset function doesn't access memory beyond the valid length of a properly formed string.
sorry for re-posting after page refresh :-(