Hi,
I found that regexp changes the object type to string
type. I see this behavior in Tcl8.3.5 but not in Tcl8.1.
The stack trace when it calls the registered function to
free the old internal representation looks as follows:
#1 0x9d2a4 in SetStringFromAny (interp=0x0, objPtr=0x11ebf8)
   at ./../generic/tclStringObj.c:1505
#2 0x9bc3c in Tcl_GetCharLength (objPtr=0x11ebf8)
   at ./../generic/tclStringObj.c:329
#3 0x2fab4 in Tcl_RegexpObjCmd (dummy=0x0, interp=0xfdd68, objc=2, objv=0xffdac)
   at ./../generic/tclCmdMZ.c:227
The question is why Tcl_GetCharLength needs to change
the internal representation to a string object. It
should simply call Tcl_GetStringFromObj and do a
strlen on the returned C string.
Looking at the code in tclCmdMZ.c, I see the following
comment:
Get the length of the string that we are matching
before getting the regexp to avoid shimmering problems.
-- Yohan
Logged In: YES
user_id=72656
Er ... so? Why is this a bug report? In any case, the RE
works with Unicode string objects, not utf-8 string objects.
The code comments are accurate.
Logged In: YES
user_id=799156
At the very least, it should check whether the Tcl_Obj is
shared or not, and create a utf-8 string object on a new
Tcl_Obj. Don't you think?
Just curious, what is the shimmering problem? Where can I
find a reference that explains it?
Logged In: YES
user_id=79902
At the moment, I'm tempted to say that the modification in
behaviour in a place which is not generally exposed visibly
at the script level is a consequence of a bug getting fixed.
Please describe why you believe this to be a problem.
Object types are implementation features; mere caches of
representations, and not something deeper.
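The "cache" idea, and the shimmering asked about earlier, can be
illustrated with a toy model. This is only a sketch: ToyObj, toy_new,
toy_get_int and toy_get_length are hypothetical names, not Tcl's real
Tcl_Obj or its API. The string form is the only authoritative data; the
object holds at most one derived value at a time, and asking for a
different view throws the old one away and rebuilds from the string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: Tcl's real Tcl_Obj and Tcl_ObjType are more involved. */
typedef enum { REP_NONE, REP_INT, REP_LENGTH } ToyRep;

typedef struct {
    const char *bytes;  /* string form: the only authoritative data */
    ToyRep rep;         /* which derived value is currently cached */
    long cached;        /* the cached value itself */
    int rebuilds;       /* how often the cache had to be rebuilt */
} ToyObj;

ToyObj toy_new(const char *bytes)
{
    ToyObj o = { bytes, REP_NONE, 0, 0 };
    assert(bytes != NULL);
    return o;
}

/* Replace whatever is cached with a freshly derived value. */
static void toy_set_rep(ToyObj *o, ToyRep rep, long value)
{
    o->rep = rep;
    o->cached = value;
    o->rebuilds++;
}

/* Integer view: parse the string only if the cache holds something else. */
long toy_get_int(ToyObj *o)
{
    if (o->rep != REP_INT) {
        toy_set_rep(o, REP_INT, strtol(o->bytes, NULL, 10));
    }
    return o->cached;
}

/* Length view (ASCII input assumed, so bytes and characters coincide). */
long toy_get_length(ToyObj *o)
{
    if (o->rep != REP_LENGTH) {
        toy_set_rep(o, REP_LENGTH, (long)strlen(o->bytes));
    }
    return o->cached;
}
```

In this model, calling toy_get_int twice in a row rebuilds the cache
once, while alternating toy_get_int and toy_get_length rebuilds it on
every call. That repeated back-and-forth conversion is what "shimmering"
refers to; the result is always correct, only the caching is defeated.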
Logged In: YES
user_id=80530
Tcl_GetCharLength() returns the number of
characters in the string, not the number of
bytes.
strlen(Tcl_GetStringFromObj) returns the
number of bytes in the string, not the
number of characters.
They do not do the same thing, so you can't
propose the latter as a replacement for
the former.
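The difference is easy to see without any Tcl at all. The following is a
minimal standalone sketch, not Tcl's implementation: count_utf8_chars is
a hypothetical helper that counts characters in a string assumed to be
valid UTF-8, while strlen counts bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the Tcl API): count the code points
 * in a NUL-terminated string that is assumed to be valid UTF-8.
 * Continuation bytes have the bit pattern 10xxxxxx, so every byte that
 * does not match that pattern starts a new character.
 */
size_t count_utf8_chars(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            n++;
        }
    }
    return n;
}
```

For the string "h\xc3\xa9llo" (the word "hello" with an e-acute, which
takes two bytes in UTF-8), strlen reports 6 bytes and count_utf8_chars
reports 5 characters. For pure ASCII input the two agree.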
Logged In: YES
user_id=799156
I'm sorry, I said the wrong thing.
I said:
At the very least, it should check whether the Tcl_Obj is
shared or not, and create a utf-8 string object on a new
Tcl_Obj.
I meant to say:
At the very least, it should check whether the Tcl_Obj is
shared or not, and create a unicode string object on a new
Tcl_Obj.
I guess it's fine if it's a requirement that regexp will change
any object type to a string. I understand from your
explanation that regexp has to operate on an object with the
unicode type, so it converts the old representation (of
the type I registered) to the unicode representation and, in
doing so, frees the old representation.
I am trying to make use of Tcl memory management on
Tcl_Obj by registering my own object type so that it will call
the registered free function once it knows the object is not
needed anymore.
In other words, I registered a new object type, and I want
the Tcl_Obj that holds my object to stay as is throughout its
lifespan, i.e. the free function should only be called when the
refCount is zero.
Now I want to stop users from executing commands that will
convert the object of my type to something else. Probably
something like:
set a [newMyType]
regexp {hello} $a
--> Illegal operation on variable a,
--> converting 'MyType' to 'string' is prohibited
incr a
--> Illegal operation on variable a,
--> converting 'MyType' to 'int' is prohibited
Currently, I know only how to detect this by implementing a
free function that detects whether the refCount is already
zero or not. But this function returns void and I cannot stop
the users from doing this.
I'm okay if there is a way to prevent users from using such
commands on my obj. Is there currently a way to return an
error in a situation like this?
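The detection trick described above (a free function that looks at the
reference count) can be sketched with a toy model. Everything here is
hypothetical: ToyObj, my_type_free, toy_convert and toy_release are
illustrative names, not the Tcl API, and the sketch assumes, as the post
describes, that the reference count has already reached zero by the time
the free function runs during destruction.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model only; names are hypothetical and this is not the Tcl API. */
typedef struct ToyObj ToyObj;
typedef void (ToyFreeProc)(ToyObj *o);

struct ToyObj {
    int refCount;          /* number of holders of this object */
    void *intRep;          /* custom internal representation */
    ToyFreeProc *freeProc; /* frees intRep; NULL once it is gone */
};

int conversionsSeen = 0;   /* intRep dropped while object still in use */
int destructionsSeen = 0;  /* intRep dropped because object is going away */

/* The registered free function: it can observe the cause, not veto it. */
void my_type_free(ToyObj *o)
{
    if (o->refCount > 0) {
        conversionsSeen++;
    } else {
        destructionsSeen++;
    }
    free(o->intRep);
    o->intRep = NULL;
}

void toy_init(ToyObj *o)
{
    assert(o != NULL);
    o->refCount = 1;
    o->intRep = malloc(16);  /* stand-in for the custom data */
    o->freeProc = my_type_free;
}

/* Models a command such as regexp or incr replacing the internal rep. */
void toy_convert(ToyObj *o)
{
    if (o->freeProc != NULL) {
        o->freeProc(o);
        o->freeProc = NULL;
    }
}

/* Models releasing a reference; the last release frees the internal rep. */
void toy_release(ToyObj *o)
{
    if (--o->refCount <= 0 && o->freeProc != NULL) {
        o->freeProc(o);
        o->freeProc = NULL;
    }
}
```

The sketch also shows the limitation raised in the post: because the
free function returns void, it can record that a conversion happened but
has no way to report an error back to the command that caused it.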
Logged In: YES
user_id=79902
That's not a supported mode of operation, though it's one
that I know I would like to support. However, the core is
not currently up to it and is not likely to change in this
respect in 8.* as it would require both non-trivial
alterations to the Tcl_ObjType structure and a thorough
code-review of a large fraction of the core to ensure that
there aren't any holes where something could slip through.
The current policy is that internal representations are just
caches of what can be derived (possibly in some context)
from the utf-8 string version of the object.
BTW, I've done "magic objects" in the past and it works
fairly well so long as you don't look at them in any way
that doesn't understand exactly what they are. Putting them
into variables and lists is fine. Treat them as strings
or ints and you are on dangerous ground.
This is a FRQ, not a bug.
Logged In: YES
user_id=79902
See also RFE 219162