The newly available allocCache field in Interp can be used to speed up most alloc/free operations as well as Tcl_Obj allocation and freeing.
Specifically, a simple additional interface to the functions in generic/tclThreadAlloc that takes a Cache pointer would avoid the thread-specific data (TSD) lookup. For Tcl_Obj allocation and freeing, the work can be done mostly in macros (see the macro TclAllocObjStorageEx() in HEAD).
This means that the interp should be passed, where available, to the alloc and Tcl_Obj creation/deletion functions or macros (NULL if unknown). Most of the core activity happens with an interp at hand.
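
The idea of passing an interp (or NULL) to the allocator can be illustrated with a minimal, self-contained sketch. Everything in it is hypothetical: DemoCache, DemoInterp, DemoAllocEx() and DemoFreeEx() are stand-ins invented for illustration, not the real Tcl structures or the actual tclThreadAlloc interface, and the "TSD" here is simulated by a plain static variable plus a counter so that the slow path is observable. It only shows the shape of the fast path: when an interp is at hand, its cached pointer is used directly; otherwise the allocator falls back to the per-thread lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; NOT the real Tcl Cache or Interp structures. */
typedef struct DemoCache {
    void *freeList;              /* singly linked list of free blocks */
} DemoCache;

typedef struct DemoInterp {
    DemoCache *allocCache;       /* cached once, at interp creation */
} DemoInterp;

#define DEMO_BLOCK_SIZE 64       /* fixed block size, large enough for a pointer */

/* Simulated thread-specific data. A real allocator would use something
 * like pthread_getspecific() here; the counter records slow-path hits. */
static DemoCache *tsdCache = NULL;
static int tsdLookups = 0;

static DemoCache *DemoGetCacheFromTSD(void)
{
    tsdLookups++;
    if (tsdCache == NULL) {
        tsdCache = calloc(1, sizeof(DemoCache));
        if (tsdCache == NULL) {
            abort();
        }
    }
    return tsdCache;
}

int DemoTsdLookups(void)
{
    return tsdLookups;
}

/* Fast path if an interp is available, TSD lookup otherwise. */
static DemoCache *DemoPickCache(DemoInterp *interp)
{
    return (interp != NULL) ? interp->allocCache : DemoGetCacheFromTSD();
}

void DemoInterpInit(DemoInterp *interp)
{
    /* The only TSD lookup this interp ever needs. */
    interp->allocCache = DemoGetCacheFromTSD();
}

void *DemoAllocEx(DemoInterp *interp)
{
    DemoCache *cache = DemoPickCache(interp);
    void *block = cache->freeList;

    if (block != NULL) {
        cache->freeList = *(void **)block;   /* pop from the free list */
    } else {
        block = malloc(DEMO_BLOCK_SIZE);
    }
    return block;
}

void DemoFreeEx(DemoInterp *interp, void *block)
{
    DemoCache *cache = DemoPickCache(interp);

    *(void **)block = cache->freeList;       /* push onto the free list */
    cache->freeList = block;
}
```

In this sketch, callers that have an interp never touch the simulated TSD after DemoInterpInit(), while callers passing NULL pay for one lookup per call. Both paths end up in the same per-thread cache, so a block freed through one path can be reused through the other.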
This is a fairly trivial project, but a rather tedious one (2000+ locs to change). The "upgrade" can be done gradually, starting with the most critical parts. It would also make normal allocations competitive with TclStackAlloc in terms of speed, so that TclStackAlloc could be removed, along with its bothersome limitations.