Re: [Tcl9-cloverfield] [TCLCORE] Cloverfield references (was Re: Variable access)
Status: Alpha
Brought to you by:
fbonnet
From: Frédéric B. <fre...@fr...> - 2008-05-22 22:00:56
|
Alexandre Ferrieux wrote: > On 5/21/08, Larry McVoy <lm...@bi...> wrote: >>> My goal with Cloverfield is to keep the existing Tcl semantics >>> (copy-on-write, pass-by-value) while adding references. This implies >>> that the string rep of a structure holding references may change over >>> time. This would be grossly inefficient if the whole string rep had to >>> be recreated over and over again, especially for large structures, so we >>> need an efficient string representation that allows for selective >>> invalidation of substrings. >> Is this really true? It seems to me it is only true for things wanting >> to get at the string rep. And yes, you want to be able to puts($obj) >> but if all the code you are calling is compiled and knows how to get >> at the internals isn't that a have-your-cake-and-eat-it-too? > > I concur with Larry. See my attempts with the Mutability patch earlier > this year, and also a (long) thread on clt exploring different options > for this: > > http://groups.google.com/group/comp.lang.tcl/tree/browse_frm/thread/b6c26a33f5ebd3e0/b9e4201c7c7c6c6c?hl=en&rnum=101&q=fredderic+ferrieux&_done=%2Fgroup%2Fcomp.lang.tcl%2Fbrowse_frm%2Fthread%2Fb6c26a33f5ebd3e0%2Fb94505c1f6b335db%3Fhl%3Den%26lnk%3Dgst%26q%3Dfredderic%2Bferrieux%26#doc_b9e4201c7c7c6c6c Very interesting discussion (and a bit heated at times). Oddly enough, it took place at the same time I was designing references in Cloverfield. There must have been some kind of mental link between us at this moment ;-) I took some time to read the whole discussion. A lot of concepts were developed that are quite similar to what I propose in Cloverfield. Also, there seems to be a demand from one part of the community, and a rejection from an opposite part, either from plain dismissal of the concept, or out of frustration that it may even be implementable in the current state of Tcl. In short there seems to be a general consensus that references and EIAS don't mix well. > Basically, two approaches were imagined: > > (a) keep track of back references (from containee to container) and > invalidate string reps "up the container hierarchy" when a mutable > object has been updated. > > (b) keep a list of mutable objects whose string rep has been > computed, and mass-invalidate them when any time a mutable is poked > into. > > Only (b) was implemented in my stillborn patch (you can imagine why). > But, as Larry suggest, unless you stuff your code with [puts], the > performance hit of mass invalidation is fairly small in practice. > > Notice that this was done within true Tcl, where string reps are true > strings, not ropes. Of course the efficiency trade-off is between: > > - late rope-to-string (Cloverfield): no mutability overhead, but no > cacheing of the linearized string of a rope. > > - string rep caching (Tcl): variable mutability overhead (a,b), but > full cacheing of string reps until invalidated. > > Do you agree with this summary ? Do you see alternatives ? Regarding the string representation, I must clarify my point of view. I put EIAS (and hence the concept of string representation) on a more conceptual level than the current implementation does. I don't think that generating a string representation in the traditional sense is a strong requirement. Rather, I think that we should provide a means to obtain (notice the choice of word) this string representation on demand, for example using C++-style iterators, or using rope structures. Besides, there is a strong duality between iterators and ropes: they are two facets of the same concept, the former on the algorithmic level, the latter on the structural level. So a string representation only needs to be enforced at the script level. At the internal level, the iterator abstraction would be used instead of a plain string. This means that this concept cannot be used efficiently in the current state of Tcl. IOW, a rope object type won't solve our problems. Rather, Tcl_Objs should use ropes instead of plain strings for string representation. (*) Moreover I think that implementing references or partially mutable values is simply impossible in the existing implementation, given the current semantics of Tcl (COW, pass-by-value, immutability...), and that is the cause of the (relative) failure of your patch. That is also the reason why Cloverfield needs to be a global approach, because any partial approach would eventually clash with some other part of the core. My proposal implies a complete redesign of the core in term of graphs. Data structures are collections of nodes. Nodes are bound to each other by directed edges. Nodes hold string values, and edges are references. Variables are entry points to a graph. A node's string rep can be obtained by iterating over its adjacent nodes. At present Tcl data structures are trees, which are acyclic directed graphs, and whose string rep is generated by a depth-first traversal. Cloverfield proposes to extend the syntax and the serialization format, so as to introduce references and allow the representation of arbitrary directed graphs (potentially cyclic). There are a lot of implications regarding the implementation (backrefs and friends), but I think it's doable (I've already done that kind of things in other projects, details upon request). The first casualties will be Tcl_Objs and variables as we know them, so that means a lot of incompatibilities. This explains why Cloverfield targets Tcl 9. (*) There are ways to plug ropes in place of C strings even in the current implementation. See for example the wiki page on ropes (http://wiki.tcl.tk/rope) and take a look at the original paper on "C Cords". There is also some valuable info on the page about SGI implementation details. |