#290 Inhibit some type conversions

Status: open-later
Priority: 5
Updated: 2003-06-24
Created: 2003-06-11
Private: No

Hi,

I found that regexp converts the object passed to it to the
string type. I see this behavior in Tcl 8.3.5 but not in Tcl 8.1.

The stack trace when it calls the registered function to
free the old internal representation looks as follows:

#1 0x9d2a4 in SetStringFromAny (interp=0x0, objPtr=0x11ebf8) at ./../generic/tclStringObj.c:1505
#2 0x9bc3c in Tcl_GetCharLength (objPtr=0x11ebf8) at ./../generic/tclStringObj.c:329
#3 0x2fab4 in Tcl_RegexpObjCmd (dummy=0x0, interp=0xfdd68, objc=2, objv=0xffdac) at ./../generic/tclCmdMZ.c:227

The question is why Tcl_GetCharLength needs to change
the internal representation to a string object. It
could simply call Tcl_GetStringFromObj and do a
strlen on the returned C string.

The code in tclCmdMZ.c has the following comment:

Get the length of the string that we are matching
before getting the regexp to avoid shimmering problems.

-- Yohan

Discussion

  • Jeffrey Hobbs

    Jeffrey Hobbs - 2003-06-11
    • status: open --> pending
     
  • Jeffrey Hobbs

    Jeffrey Hobbs - 2003-06-11
    • status: pending --> pending-invalid
     
  • Jeffrey Hobbs

    Jeffrey Hobbs - 2003-06-11

    Er ... so? Why is this a bug report? In any case, the RE
    works with Unicode string objects, not utf-8 string objects.
    The code comments are accurate.

     
  • Yohan Sutjandra

    Yohan Sutjandra - 2003-06-11
    • status: pending-invalid --> open-invalid
     
  • Yohan Sutjandra

    Yohan Sutjandra - 2003-06-11

    At the very least, it should check whether the Tcl_Obj is
    shared or not, and create a utf-8 string object on a new
    Tcl_Obj. Don't you think?

    Just curious, what is the shimmering problem? Where can I
    find a reference that explains it?

     
  • Donal K. Fellows

    • labels: 104249 --> 105659
    • assigned_to: nobody --> dkf
    • status: open-invalid --> pending-invalid
     
  • Donal K. Fellows

    At the moment, I'm tempted to say that the modification in
    behaviour in a place which is not generally exposed visibly
    at the script level is a consequence of a bug getting fixed.
    Please describe why you believe this to be a problem.

    Object types are implementation features; mere caches of
    representations, and not something deeper.
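    The "cache" description, and the shimmering asked about
    earlier in the thread, can be illustrated with a toy model.
    This is only a sketch: ToyObj, ToyType, GetInt, GetCharLen
    and freedCount are hypothetical names invented here, not
    Tcl's real structures or API. The string form is always
    kept; the typed form is a cache that is freed and rebuilt
    whenever a caller asks for a different type. Shimmering is
    what happens when two callers keep asking for different
    types in turn.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct ToyObj ToyObj;

/* Hypothetical stand-in for an object type: a name plus a free callback. */
typedef struct ToyType {
    const char *name;
    void (*freeIntRep)(ToyObj *obj);   /* returns void: it cannot veto */
} ToyType;

struct ToyObj {
    const char *bytes;      /* string form; everything else derives from it */
    const ToyType *type;    /* NULL when nothing is cached */
    void *intRep;           /* cached typed form, owned by the object */
};

static int freedCount = 0;  /* how many cached forms have been discarded */

static void FreeBoxed(ToyObj *obj)
{
    free(obj->intRep);
    obj->intRep = NULL;
    freedCount++;
}

static const ToyType toyIntType = { "int", FreeBoxed };
static const ToyType toyLenType = { "charlen", FreeBoxed };

/* Discard whatever typed form is currently cached. */
static void DropIntRep(ToyObj *obj)
{
    if (obj->type != NULL && obj->type->freeIntRep != NULL) {
        obj->type->freeIntRep(obj);
    }
    obj->type = NULL;
}

static void *Box(size_t size)
{
    void *p = malloc(size);
    if (p == NULL) {
        abort();
    }
    return p;
}

/* Ask for the integer view; converts (and frees the old cache) if needed. */
static long GetInt(ToyObj *obj)
{
    if (obj->type != &toyIntType) {
        long *boxed = Box(sizeof *boxed);
        *boxed = strtol(obj->bytes, NULL, 10);
        DropIntRep(obj);
        obj->intRep = boxed;
        obj->type = &toyIntType;
    }
    return *(long *)obj->intRep;
}

/* Ask for the length view (toy: ASCII only, so bytes equal characters). */
static size_t GetCharLen(ToyObj *obj)
{
    if (obj->type != &toyLenType) {
        size_t *boxed = Box(sizeof *boxed);
        *boxed = strlen(obj->bytes);
        DropIntRep(obj);
        obj->intRep = boxed;
        obj->type = &toyLenType;
    }
    return *(size_t *)obj->intRep;
}

/* Alternate between the two views and watch the cache being thrown away. */
static void ShimmerDemo(void)
{
    ToyObj obj = { "42", NULL, NULL };
    long value = GetInt(&obj);
    size_t len = GetCharLen(&obj);     /* frees the "int" cache */
    assert(obj.type == &toyLenType);
    printf("value=%ld len=%zu type=%s freed=%d\n",
           value, len, obj.type->name, freedCount);
    DropIntRep(&obj);
}
```

    In this model nothing is lost on conversion, because the
    new typed form is rebuilt from the string. A type whose
    cached form cannot be rebuilt from its string does not fit
    this model.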

     
  • Don Porter

    Don Porter - 2003-06-11
    • assigned_to: dkf --> nobody
    • labels: 105659 --> 104249
    • status: pending-invalid --> open-invalid
     
  • Don Porter

    Don Porter - 2003-06-11

    Tcl_GetCharLength() returns the number of
    characters in the string, not the number of
    bytes.

    strlen() applied to the result of
    Tcl_GetStringFromObj() returns the number of
    bytes in the string, not the number of characters.

    They do not do the same thing, so you can't
    propose the latter as a replacement for
    the former.
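    The difference shows up as soon as the string contains a
    non-ASCII character. The sketch below uses plain C without
    the Tcl library; utf8_char_count and kSample are
    hypothetical names introduced here, and the counter assumes
    well-formed UTF-8 input.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* "naïve" in UTF-8: the ï takes two bytes (0xC3 0xAF). */
static const char kSample[] = "na" "\xC3\xAF" "ve";

/*
 * Count characters in a well-formed UTF-8 string: every byte that is
 * not a continuation byte (bit pattern 10xxxxxx) starts a new character.
 */
static size_t utf8_char_count(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            n++;
        }
    }
    return n;
}

/* Print both counts for the sample string. */
static void print_counts(void)
{
    size_t bytes = strlen(kSample);
    size_t chars = utf8_char_count(kSample);
    assert(chars <= bytes);   /* a character is never shorter than one byte */
    printf("bytes=%zu chars=%zu\n", bytes, chars);
}
```

    Calling print_counts() prints "bytes=6 chars=5": strlen
    reports one more than the character count because of the
    two-byte ï.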

     
  • Yohan Sutjandra

    Yohan Sutjandra - 2003-06-11

    I'm sorry, I said the wrong thing.

    I said:

    At the very least, it should check whether the Tcl_Obj is
    shared or not, and create a utf-8 string object on a new
    Tcl_Obj.

    I meant to say:

    At the very least, it should check whether the Tcl_Obj is
    shared or not, and create a unicode string object on a new
    Tcl_Obj.

    I guess it's fine if it's a requirement that regexp will change
    any object type to a string. I understand from your
    explanation that regexp has to operate on an object of
    unicode type, which means converting the old representation (of
    the type I registered) to the unicode representation and
    freeing the old representation.

    I am trying to make use of Tcl memory management on
    Tcl_Obj by registering my own object type so that it will call
    the registered free function once it knows the object is not
    needed anymore.

    In other words, I registered a new object type, and I want
    the Tcl_Obj that holds my object to stay as is throughout its
    lifespan, i.e. it should only call the free function when the
    refCount is zero.

    Now I want to stop users from executing commands that will
    convert the object of my type to something else. Probably
    something like:

    set a [newMyType]
    regexp {hello} $a
    --> Illegal operation on variable a,
    --> converting 'MyType' to 'string' is prohibited

    incr a
    --> Illegal operation on variable a,
    --> converting 'MyType' to 'int' is prohibited

    Currently, the only way I know to detect this is to implement a
    free function that checks whether the refCount is already
    zero or not. But this function returns void, so I cannot stop
    users from doing this.

    I'm okay with any way to prevent users from using such
    commands on my object. Is there a way to return an error in a
    situation like this now?

     
  • Donal K. Fellows

    • labels: 104249 -->
    • milestone: 248896 -->
     
  • Donal K. Fellows

    That's not a supported mode of operation, though it's one
    that I know I would like to support. However, the core is
    not currently up to it and is not likely to change in this
    respect in 8.* as it would require both non-trivial
    alterations to the Tcl_ObjType structure and a thorough
    code-review of a large fraction of the core to ensure that
    there aren't any holes where something could slip through.
    The current policy is that internal representations are just
    caches of what can be derived (possibly in some context)
    from the utf-8 string version of the object.

    BTW, I've done "magic objects" in the past and it works
    fairly well so long as you don't look at them in any way
    that doesn't understand exactly what they are. Putting them
    into variables and lists is fine. Treat them as strings
    or ints and you are on dangerous ground.

    This is a feature request, not a bug.

     
  • Donal K. Fellows

    • assigned_to: nobody --> dkf
    • labels: --> 10. Objects
    • status: open-invalid --> open-later
     
  • Donal K. Fellows

    • summary: Regexp change Tcl_Obj type to string --> Inhibit some type conversions
     
  • Donal K. Fellows

    See also RFE 219162