| 
     
      
      
      From: Joe E. <jen...@fl...> - 2013-01-18 04:59:47
      
     
   | 
Don Porter wrote:
>
> Remember that [string length] returns the number of characters in
> the string. That information is not stored anywhere in the Tcl_Obj struct.
>
> (objPtr->length is in bytes, not chars)
>
> The shimmer happens so that we have someplace to store the computed
> length in chars, so when it's asked for again we don't have to compute
> it again.

I wonder if that's worth revisiting for novem (or possibly earlier).
I have a hunch that in many (most?) real programs, the UCS-2
representation never pays off -- [string length] is only amortized O(1)
if you call it a lot :-) the first one still costs O(n).

From a quick scan through the manpages, it looks like [string length],
[string range], [string index], [string replace], and the regexp
operations are the main things that algorithmically benefit from a
UCS-2 representation. Pretty much everything else can be implemented
just as efficiently on UTF-8. (For that matter so can all the regexp
operations, if we're willing to replace the Spencer engine.)

--Joe English

* * * ... Well, there's also [string reverse], but if that's ever been
used outside of "How fast can *your* language reverse a string?"
shootouts I've yet to see it.  |
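
To make the bytes-versus-chars point concrete, here is a minimal sketch in C of why the first character count on a UTF-8 buffer costs O(n). The helper name utf8_char_count is hypothetical and is not part of the Tcl API (Tcl's own routine for this is Tcl_NumUtfChars); the sketch also assumes well-formed, NUL-terminated UTF-8 and ignores Tcl's special encoding of embedded NULs.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not a Tcl API: count the code points in a
 * NUL-terminated, well-formed UTF-8 string. Every byte that is not a
 * continuation byte (bit pattern 10xxxxxx) starts a new character,
 * so the whole buffer has to be walked once: O(n) in the byte length.
 */
static size_t utf8_char_count(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            n++;
        }
    }
    return n;
}
```

For the bytes "na\xC3\xAFve", strlen() reports 6 while utf8_char_count() reports 5. The byte count is the kind of value that can simply be stored alongside the string, whereas the character count has to be computed by a scan like this unless it is cached somewhere, which is the role the shimmer plays in the quoted explanation.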