From: Brian G. <bri...@me...> - 2013-01-17 19:00:04
On Jan 17, 2013, at 10:37 AM, Andreas Kupries wrote:

> On Thu, Jan 17, 2013 at 10:22 AM, Brian Griffin
> <bri...@me...> wrote:
>> It seems that tcl 8.5 is more aggressively shimmering Tcl_Obj's to a
>> StringObj than in 8.4. This is causing problems and concerns here. The
>> problems are ultimately self induced, but the concerns are still valid. Why
>> is it that [string] operations blindly force conversion when there is
>> already a string present in the obj?
>
> The difference is that one, the "string", i.e. the 'bytes'-field
> contains UTF-8 (var-length characters), and the other, the "string"
> intrep, i.e. internal.ptr1, points to "Unicode" (actually UCS-2, I
> believe, aka Tcl_UniChar, i.e. fixed-length, 2 byte/char). (Note that
> the length field is bytes, not characters.)
>
> The string ops force conversion to the string intrep to be able to
> directly index into the string, AFAIK, i.e. make use of the
> fixed-length characters, to be fast.
>
> One of the things some of us want to explore in the novem branches are
> different string intreps like ropes, and/or indexing structures which
> can use the UTF-8 without fully converting to a fixed-length
> representation and still be fast.
>
> This however will not change the fact that even these will blow away a
> non-string intrep for theirs.

All these ideas are great, but that last sentence is troubling.

>
>> This seems to defeat the purpose of
>> the Tcl_Obj! If the string representation is not going to be used, why
>> would an alternative internal representation ever bother to produce one?
>> (rhetorical question)
>>
>> Specifically, the [string length] operation is blowing away any internal
>> representation regardless of whether a string (->bytes != NULL) is present
>> or not. This seems unnecessary, overly aggressive, and counterintuitive
>> given the fundamental design goal of Tcl_Obj to "behave like strings but
>> also hold an internal representation that can be manipulated more
>> efficiently". Blowing away the list representation just to find out the
>> length of the string representation is darn less efficient in my book, not
>> more.
>
> Oh, so the main point here is that the conversion happens even for
> bytes==NULL (length == 0), where the length could be computed fast,
> i.e. 0 for the empty string, without conversion !?

That's a very special case, and I would assume a length of 0 would also
mean an uninteresting internal representation. I'm more concerned about a
largish object where it takes time to generate the internal rep. Keeping
that is powerful, even if there's a corresponding string present. Having
to recreate it, possibly multiple times because of otherwise benign
string inspections, is disconcerting.

-Brian
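
To make the concern concrete, here is a minimal sketch in C of how a character count could be taken straight from an existing UTF-8 string rep while leaving a non-string internal rep untouched. It is a toy model only: `ToyObj`, `toy_make`, `toy_utf8_char_count`, and `toy_string_length` are hypothetical names made up for illustration, not Tcl's `Tcl_Obj` or any part of the Tcl C API, and the sketch assumes the bytes are valid UTF-8. (Tcl's own API offers `Tcl_NumUtfChars` for a similar count over a UTF-8 buffer.)

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a dual-representation value. NOT Tcl's Tcl_Obj;
 * all names in this sketch are hypothetical. */
typedef struct {
    const char *bytes;   /* UTF-8 string rep, or NULL if not generated */
    size_t      length;  /* size of bytes in BYTES, not characters */
    void       *intrep;  /* some expensive internal rep, e.g. a list */
} ToyObj;

/* Build a toy value that has both a string rep and an internal rep. */
ToyObj toy_make(const char *utf8, void *intrep)
{
    assert(utf8 != NULL);
    ToyObj obj = { utf8, strlen(utf8), intrep };
    return obj;
}

/* Count characters in valid UTF-8 by counting every byte that is not a
 * continuation byte (bit pattern 10xxxxxx). This is a linear scan that
 * allocates nothing. */
size_t toy_utf8_char_count(const char *bytes, size_t nbytes)
{
    size_t count = 0;
    for (size_t i = 0; i < nbytes; i++) {
        if (((unsigned char)bytes[i] & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}

/* Character length taken from the existing string rep. Returns 1 and
 * stores the count in *out_chars when a string rep is present; returns 0
 * when there is none, in which case the caller would first have to
 * generate one from the internal rep. obj->intrep is never modified. */
int toy_string_length(const ToyObj *obj, size_t *out_chars)
{
    if (obj->bytes == NULL) {
        return 0;
    }
    *out_chars = toy_utf8_char_count(obj->bytes, obj->length);
    return 1;
}
```

The trade-off is the one Andreas describes: this scan costs time proportional to the byte length on every call and gives no fast random indexing, whereas a fixed-width conversion pays once and then indexes directly. Whether that cost is worth it depends on how expensive the discarded internal rep is to rebuild.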