From: Jan N. <nij...@us...> - 2013-01-25 08:00:16
2013/1/25 Brian Griffin <bri...@me...>:
> Executive summary: Jan's change cost about 2% more in performance on one of our regression tests. Combining this with the results shown below from tclbench, the conclusion I have is that overall it's not worth it. It would be more cost effective to avoid shimmering in the first place by careful construction of the tcl script(s).
....
> I think it would be worthwhile for someone to investigate for novem either A) storing charlength directly in the Tcl_Obj, or B) replacing (byte)length in the Tcl_Obj with charlength, and compute bytelength when needed. These may have been proposed before for better string performance, I don't know. If so, sorry for the duplication.
>
> Thanks Jan for the patch and for poking me to invest in testing it!

You're welcome.

For novem I would suggest another approach: the Unicode internal representation worked well while Unicode had only one plane, but now that we want to support characters outside the BMP (see TIP #389) it is no longer adequate. So I would abandon the Unicode internal representation entirely and put something else in its place. That might be ropes (from Colibri), or some kind of index table (to speed up looking up character indexes in the string), or something else entirely that some smart person comes up with.

This would mean that [string length] is calculated simply by counting the UTF-8 characters if no internal representation is present, or by using some smart speedup when a known internal representation is there. I expect that doing so would easily win back the 2% loss that you report. But more experiments will be needed to find out what the alternative internal representation for strings should be. In "novem" we are totally free to redesign that.

The "no-shimmer-string-length" branch is closed now. Thanks for doing all the work, I agree with your conclusion.

Regards,
Jan Nijtmans
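
For readers following the thread, "counting the UTF-8 characters" can be pictured roughly as below. This is only a minimal sketch, not code from the Tcl core or from the closed branch: the function name utf8_char_count is made up for illustration, and it assumes the buffer holds well-formed UTF-8. It counts every byte that is not a continuation byte (10xxxxxx), since each such byte starts one character.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a Tcl API): count the characters (code points)
 * in a UTF-8 buffer of `len` bytes without building any other
 * representation of the string.
 *
 * Assumes well-formed UTF-8. Every byte whose top two bits are not "10"
 * is the first byte of a character, so counting those bytes gives the
 * character length in one linear pass.
 */
size_t utf8_char_count(const char *s, size_t len)
{
    size_t count = 0;
    size_t i;

    assert(s != NULL || len == 0);

    for (i = 0; i < len; i++) {
        /* 0x80..0xBF are continuation bytes; skip them. */
        if (((unsigned char)s[i] & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}
```

Note that a character outside the BMP, which takes four bytes in UTF-8, is counted once here. The pass is linear in the byte length, which is why an index table or similar speedup is suggested above for strings that are queried repeatedly.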