Re: [TCLCORE] Agressive shimmering

On Jan 17, 2013, at 10:37 AM, Andreas Kupries wrote:

> On Thu, Jan 17, 2013 at 10:22 AM, Brian Griffin
> <bri...@me...> wrote:
>> It seems that Tcl 8.5 is more aggressively shimmering Tcl_Obj's to a
>> StringObj than in 8.4.  This is causing problems and concerns here.  The
>> problems are ultimately self induced, but the concerns are still valid.  Why
>> is it that [string] operations blindly force conversion when there is
>> already a string present in the obj?
> 
> The difference is that one, the "string", i.e. the 'bytes'-field
> contains UTF-8 (var-length characters), and the other, the "string"
> intrep, i.e. internal.ptr1, points to "Unicode" (actually UCS-2, I
> believe, aka Tcl_UniChar, i.e. fixed-length, 2 byte/char). (Note that
> the length field is bytes, not characters.)
> 
> The string ops force conversion to the string intrep to be able to
> directly index into the string, AFAIK, i.e. make use of the
> fixed-length characters, to be fast.
> 
> One of the things some of us want to explore in the novem branches are
> different string intreps like ropes, and/or indexing structures which
> can use the UTF-8 without fully converting to a fixed-length
> representation and still be fast.
> 
> This however will not change the fact that even these will blow away a
> non-string intrep for theirs.

All these ideas are great, but that last sentence is troubling.

> 
>> This seems to defeat the purpose of
>> the Tcl_Obj!  If the string representation is not going to be used, why
>> would an alternative internal representation ever bother to produce one?
>> (rhetorical question)
>> 
>> Specifically, the [string length] operation is blowing away any internal
>> representation regardless of whether a string (->bytes != NULL) is present
>> or not.  This seems unnecessary, overly aggressive, and counterintuitive
>> given the fundamental design goal of Tcl_Obj to "behave like strings but
>> also hold an internal representation that can be manipulated more
>> efficiently".  Blowing away the list representation just to find out the
>> length of the string representation is darn less efficient in my book, not
>> more.
> 
> Oh, so the main point here is that the conversion happens even for
> bytes==NULL (length == 0), where the length could be computed fast,
> i.e. 0 for the empty string, without conversion !?

That's a very special case, and I would assume a length of 0 would also mean an uninteresting internal representation.  I'm more concerned about a largish object where it takes time to generate the internal rep.  Keeping that is powerful, even if there's a corresponding string present.  Having to recreate it, possibly multiple times because of otherwise benign string inspections, is disconcerting.

-Brian
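
To make the trade-off concrete, here is a minimal sketch in C of the two behaviours discussed above. It is a toy model under stated assumptions, not Tcl's implementation: ToyObj, toy_new_list, toy_length_shimmer, toy_length_keep and toy_free are hypothetical names, the "list" intrep is only a placeholder buffer, and the decoder handles UTF-8 sequences of at most three bytes. toy_length_shimmer mimics the conversion described in the thread (the existing intrep is discarded in favour of a fixed-width array), while toy_length_keep counts characters straight from the UTF-8 bytes and leaves the intrep alone.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a dual-representation value. NOT the real Tcl_Obj. */
typedef enum { REP_NONE, REP_LIST, REP_UNICODE } RepKind;

typedef struct {
    char   *bytes;   /* UTF-8 string rep (NUL-terminated) */
    size_t  length;  /* length of bytes, in bytes, not characters */
    RepKind kind;    /* which internal rep is currently held */
    void   *intrep;  /* payload of the internal rep */
    size_t  nchars;  /* character count, valid only for REP_UNICODE */
} ToyObj;

/* Build a value that has both a string rep and a (placeholder) list rep. */
static ToyObj toy_new_list(const char *utf8, size_t nelems) {
    ToyObj o;
    o.length = strlen(utf8);
    o.bytes = malloc(o.length + 1);
    memcpy(o.bytes, utf8, o.length + 1);
    o.kind = REP_LIST;
    o.intrep = calloc(nelems ? nelems : 1, sizeof(int));
    o.nchars = 0;
    return o;
}

/* Shimmering length: convert to fixed-width 16-bit units and drop
 * whatever internal rep was there before. */
static size_t toy_length_shimmer(ToyObj *o) {
    const unsigned char *s = (const unsigned char *)o->bytes;
    unsigned short *u;
    size_t i = 0, k = 0;

    assert(o->bytes != NULL);
    if (o->kind == REP_UNICODE) return o->nchars;  /* already converted */

    /* One output unit per input byte is a safe upper bound. */
    u = malloc((o->length + 1) * sizeof *u);
    while (i < o->length) {
        if (s[i] < 0x80) {
            u[k++] = s[i]; i += 1;
        } else if ((s[i] & 0xE0) == 0xC0 && i + 1 < o->length) {
            u[k++] = (unsigned short)(((s[i] & 0x1F) << 6) | (s[i + 1] & 0x3F));
            i += 2;
        } else if ((s[i] & 0xF0) == 0xE0 && i + 2 < o->length) {
            u[k++] = (unsigned short)(((s[i] & 0x0F) << 12)
                   | ((s[i + 1] & 0x3F) << 6) | (s[i + 2] & 0x3F));
            i += 3;
        } else {
            u[k++] = 0xFFFD; i += 1;  /* unsupported or malformed byte */
        }
    }
    free(o->intrep);          /* the list rep is lost here */
    o->intrep = u;
    o->kind = REP_UNICODE;
    o->nchars = k;
    return k;
}

/* Non-shimmering length: count UTF-8 lead bytes, keep the intrep. */
static size_t toy_length_keep(const ToyObj *o) {
    size_t i, count = 0;
    assert(o->bytes != NULL);
    for (i = 0; i < o->length; i++)
        if (((unsigned char)o->bytes[i] & 0xC0) != 0x80) count++;
    return count;
}

static void toy_free(ToyObj *o) {
    free(o->bytes);
    free(o->intrep);
    o->bytes = NULL; o->intrep = NULL; o->kind = REP_NONE;
}
```

In this sketch toy_length_keep pays for an O(n) scan of the bytes on every call, whereas toy_length_shimmer pays once for the conversion and then answers from the cached count. The cost that shows up only in the shimmering version is the discarded list rep, which a later list operation would have to rebuild from the string.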
