The docs for Tcl_UtfPrev(src, start) say:
Given src, a pointer to some location in a UTF-8 string,
Tcl_UtfPrev returns a pointer to the previous UTF-8 character
in the string. This function will not back up to a position
before start, the start of the UTF-8 string. If src was already
at start, the return value will be start.
The source code comments just before Tcl_UtfPrev also say:
 * Given a pointer to some current location in a UTF-8 string,
 * move backwards one character. This works correctly when the
 * pointer is in the middle of a UTF-8 character.
So the claim is that the src argument can be a pointer
to a trailing byte of a multi-byte character, and that
the routine will still move back one whole character.
However, that does not appear to be what the
routine really does. Instead, the routine starts
a search at one byte before the src argument,
searches backward for the beginning of a UTF-8
character byte sequence, and returns that.
So, for example,
CONST char *word="ab\303\251";
Tcl_UtfPrev(word+3,word);
will return word+2 (pointer to the é character in UTF-8)
and not word+1 (pointer to the 'b') as I would expect.
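The difference between the two readings can be sketched in a few lines of C. This is only a model of the behavior described in this report, not the Tcl source: utf_prev_sketch, utf_prev_as_expected and is_trail_byte are hypothetical names, and the real routine may apply further checks (for example a bound on how far it looks back) that are left out here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: true for a UTF-8 trailing byte (bit pattern 10xxxxxx). */
static int is_trail_byte(unsigned char c)
{
    return (c & 0xC0) == 0x80;
}

/* Model of the behavior observed in this report: begin at src-1 and
 * scan backward to the first byte that is not a trailing byte.
 * Never reads *src itself and never backs up past start. */
static const char *utf_prev_sketch(const char *src, const char *start)
{
    assert(start <= src);
    if (src == start) {
        return start;
    }
    src--;
    while (src > start && is_trail_byte((unsigned char)*src)) {
        src--;
    }
    return src;
}

/* Model of what the documentation seems to promise: first snap src to
 * the start of the character it points into, then move back one
 * character. Note that this variant must read *src. */
static const char *utf_prev_as_expected(const char *src, const char *start)
{
    while (src > start && is_trail_byte((unsigned char)*src)) {
        src--;
    }
    return utf_prev_sketch(src, start);
}
```

With word pointing at "ab\303\251", utf_prev_sketch(word+3, word) yields word+2, while utf_prev_as_expected(word+3, word) yields word+1.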
One feature of the current Tcl_UtfPrev()
implementation is that it is safe to pass
in a 'src' argument that points one byte
past the end of the allocated buffer, because
the search begins at src-1 and the byte at src
is never examined. That can be useful, even
though it seems counter to the documentation.
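As an illustration of why the one-past-the-end case is useful, here is a minimal sketch of walking a buffer backward from its end. It assumes the scan-from-src-1 behavior described in this report; prev_char_start and count_chars_backward are hypothetical names, not Tcl functions, and the stand-in does no validation of the byte sequences it skips over.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the behavior described in this report:
 * step to src-1, then keep backing up over UTF-8 trailing bytes
 * (10xxxxxx) without going before start. */
static const char *prev_char_start(const char *src, const char *start)
{
    if (src <= start) {
        return start;
    }
    do {
        src--;
    } while (src > start && ((unsigned char)*src & 0xC0) == 0x80);
    return src;
}

/* Count characters by walking backward from one byte past the end.
 * The pointer buf+len is passed in but never dereferenced. */
static size_t count_chars_backward(const char *buf, size_t len)
{
    const char *p;
    size_t n = 0;

    assert(buf != NULL);
    p = buf + len;
    while (p > buf) {
        p = prev_char_start(p, buf);
        n++;
    }
    return n;
}
```

For the four-byte buffer "ab\303\251" this walk visits the é, then 'b', then 'a'. A version that first had to inspect the byte at src could not start from buf+len without reading past the buffer.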
Either the implementation or the documentation
ought to be changed to bring them into
agreement.
Logged In: YES
user_id=72656
I think the docs should be changed. IIRC, there are uses of this
routine that only Tk makes, and any change in behavior would
have to be checked against them.
Logged In: YES
user_id=80530
Yes, that is the consensus.
Clarify the docs to accurately
describe existing behavior.
claiming...
Logged In: YES
user_id=79902
Minor tightening of Tcl_UtfNext() done at same time.