TIP #75 states that the new -indexvar option, which accompanies the -regexp option of switch, should behave like the -indices option of the regexp command: it should return the range of the overall match and the ranges of the sub-matches for each subexpression.
The man page points to "regexp -indices" too, but says the index variable contains range(s) running from the first matching character to the character after the last matching character.
So TIP #75 is not implemented as written, and the behavior of "switch -regexp -indexvar" is not comparable to "regexp -indices".
An example:
% switch -regexp -indexvar i -matchvar m "abcdef" {
    ^abc {
        puts "matchvar = '$m'";
        puts "indexvar = [list $i]";
        puts "string range = '[string range "abcdef" {*}[lindex $i 0]]'";
    }
}
matchvar = 'abc'
indexvar = {{0 3}}
string range = 'abcd'
% string range "abcdef" {*}[lindex [regexp -indices -inline {^abc} "abcdef"] 0]
abc
The two commands should behave consistently, so the switch option -indexvar should be corrected!
This bug affects 8.6 beta, too!
In tclCmdMZ.c:
lines 276-277 (Tcl_RegexpObjCmd):
    match = Tcl_RegExpExecObj(interp, regExpr, objPtr, offset, numMatchesSaved, eflags);
lines 3740-3741 (Tcl_SwitchObjCmd):
    int matched = Tcl_RegExpExecObj(interp, regExpr, stringObj, 0, numMatchesSaved, 0);
The first call to Tcl_RegExpExecObj takes an "offset", which later on in ...
lines 341-343:
    if (end >= offset) {
        end--;
    }
... is used to correct the "end" index so that it points to the last character of the match.
This is missing in Tcl_SwitchObjCmd!
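To make the missing step concrete, here is a minimal, self-contained sketch of the adjustment. It is not the actual Tcl source: MatchRange and inclusive_range are hypothetical names, and the sketch assumes the regexp engine reports each match as a start index plus the index one past the last matched character, relative to the given offset.

```c
#include <assert.h>

/* Hypothetical stand-in for one match entry reported by the regexp
 * engine: start is the first matched character, end is one past the
 * last matched character (both relative to the search offset). */
typedef struct {
    long start;
    long end;
} MatchRange;

/* Convert a half-open match range into the inclusive form that
 * "regexp -indices" reports, using the same guard as the snippet
 * quoted from Tcl_RegexpObjCmd. */
static MatchRange inclusive_range(MatchRange raw, long offset)
{
    MatchRange out;

    assert(raw.start <= raw.end);   /* sanity check on the input */

    out.start = offset + raw.start;
    out.end = offset + raw.end;
    if (out.end >= offset) {
        out.end--;                  /* point at the last matched character */
    }
    return out;
}
```

With offset 0, as in the Tcl_SwitchObjCmd call, the raw range {0, 3} for ^abc on "abcdef" becomes {0, 2}, which is what "regexp -indices" returns in the example above.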
Since my time is limited, I have not yet been able to think about the offset ... :(
Sure, the only problem is that the code is consistent with the documentation. Bug locked in :(
Ready to TIP ?
IMHO, software that is wrongly implemented, or not implemented as specified, is buggy; otherwise the specification must be changed.
And there should really be no need to TIP a change that makes "switch -regexp -indexvar" behave as specified!
At the end of the day, what counts as the specification is the manpage, not the unwritten intention behind the TIP. Here the manpage says:
... will be a two-element list specifying the index of
the start and index of the first character after the end of
the overall substring of the input string
Hence, any application working today and using the -indexvar option, with its current and documented semantics, will suddenly fail if the change is applied. That is an API change, hence it needs a TIP.
The current documentation contradicts itself by doing both: stating the (non-TIP75-conformant) behaviour AND referring to regexp as being alike.
Since the current behaviour is at odds not only with regexp but also with everything else in Tcl that involves ranges, it should be seen as a bug in both the implementation and the documentation. Keeping it is likely to create confusion and do more harm than changing this relatively new feature.
The intent was clearly to mirror [regexp -indices]. That it does not is a bug.
What is more important ... the described, but wrongly implemented, intention (regexp -indices behavior) ... or a man page that correctly describes the wrong behavior - which it should or must, because otherwise the man page would be "buggy"!
No - a behavior is not right merely because it is documented. It is right only if the way to that behavior is well documented and the final behavior matches the intentions, which may change along the way!
Here the intentions were clear from the start, but are not met at the end!
And - it is the specification that justifies the behavior, not the man page, which only describes it!
So even a bug fix could cause incompatibilities!
Sure, fixing a bug could cause incompatibilities, but it was nonetheless a bug. Now it's a fixed bug. :-) Note from the ChangeLog:
***POTENTIAL INCOMPATIBILITY***
Uses of [switch -regexp -indexvar] that previously compensated for the
wrong offsets (by subtracting 1 from the end indices) now do not need
to do so as the value is correct.