On 2011/03/25 07:20 PM, miguel sofer wrote:
> On 03/25/2011 01:35 PM, Jeff Hobbs wrote:
>> http://wiki.tcl.tk/1611 test 280 LIST lset foreach
>> 8.6 is 6x slower, so we'll need to look at that.
>
> We did, didn't we? It's one of:

Yes we did, and none of the above.

I picked this up when the whole performance issue exploded late in January, tracked it down, and wrote the following to you and tcl-core:

On 2011/02/01 07:19 PM, miguel sofer wrote:
> No mystery: NRE is known to be sloooow on non-bytecompiled loops. Look there as the first suspicious spot whenever you see huge slowdowns.

Ah, of course.  And tcl::unsupported::disassemble is my friend ;)
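
For anyone following along, one way to check whether a loop body is byte-compiled is to dump its bytecode. A minimal sketch: the proc name "demo" and its body are made up for illustration, and the exact instruction names in the output differ between 8.5 and 8.6.

```tcl
# Hypothetical proc, for illustration only: a foreach loop around lset.
proc demo {lobj} {
    foreach i {1 2 3} {
        lset lobj 0 $i
    }
    return $lobj
}

# Dump the bytecode of the proc body. A compiled loop shows up as inline
# foreach/lset instructions rather than as an invocation of the command.
set listing [tcl::unsupported::disassemble proc demo]
puts $listing
```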

But no, that's not the issue.  Replace "lset" with "lappend" and you get 703ms -> 1078ms rather than 188ms -> 4796ms for "lset" (new timings, switched computers).
More info:

  time { apply {{lobj} { time { lset lobj 10 x } 1000 }} $lobj } 1000 ;# 4547.0 microseconds per iteration (328.0 microseconds per iteration on 8.5-head)
  time { apply {{lobj} { time { linsert $lobj 0 x } 1000 }} $lobj } 1000 ;# 4735.0 microseconds per iteration
  time { apply {{lobj} { for {set i 0} {$i < 1000} {incr i} { linsert $lobj 0 x }}} $lobj } 1000 ;# 4609.0 microseconds per iteration

On every iteration [lset] sees the value associated with "lobj" as shared, and duplicates the object (in TclLsetFlat in tclListObj.c).
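
The same copy-on-write rule is visible at the script level. A small illustration (the variable names are arbitrary): once two variables hold the same value, [lset] has to work on a private copy so that the other holder is not disturbed.

```tcl
# Build a list value and store it in one variable.
set a [lrepeat 5 0]

# A second variable now refers to the same underlying value (shared).
set b $a

# lset must not disturb $a, so it modifies a duplicate and stores that in b.
lset b 2 x

puts $a   ;# 0 0 0 0 0
puts $b   ;# 0 0 x 0 0
```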

When TEBCResume() exits, the refcount on the object is 2 (+1 from storing in variable "lobj", +1 from Tcl_SetObjResult).  On the next entry to TEBCResume(), and subsequently TclLsetFlat(), the interp result is still the same object.  Only after the duplicate is created and assigned to "lobj" does the old result's refcount hit 0, allowing it to be freed.

My best guess is that TclEvalObjEx() or similar should be doing a Tcl_ResetResult();


The key here is that the NRE changes have affected something that would have decremented the reference count on a Tcl_Obj used within the loop.  As a result, instead of the list being seen as unshared and [lset] altering the list itself, it sees the list as shared and duplicates it.
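
To get a feel for what that extra duplicate costs, the effect can be imitated in pure Tcl by keeping a second reference to the list alive across each [lset]. This is only a sketch of the symptom, not of the NRE code path; the proc names, list size and iteration counts are arbitrary, and no particular timings are implied.

```tcl
# lset on a value held only by the local variable: modified in place.
proc unsharedLset {size n} {
    set l [lrepeat $size 0]
    for {set i 0} {$i < $n} {incr i} {
        lset l 10 x
    }
    return $l
}

# Same loop, but "keep" holds a second reference to the value at the time
# lset runs, so every iteration has to duplicate the whole list first.
proc sharedLset {size n} {
    set l [lrepeat $size 0]
    for {set i 0} {$i < $n} {incr i} {
        set keep $l
        lset l 10 x
    }
    return $l
}

puts [time {unsharedLset 1000 1000} 10]
puts [time {sharedLset 1000 1000} 10]
```

Both procs return the same list; only the amount of copying per iteration differs.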

There didn't seem to be any changes in the [lset] code or the bytecode engine that introduced an extra +1 refcount, so I suspect that this is not unique to the lset/foreach case.