tcl9-cloverfield Mailing List for Tcl9 (Page 2)
Status: Alpha
From: Andy G. <unu...@ai...> - 2009-01-13 05:46:49
fre...@fr... wrote:
> Hi Andy!

It is good to hear from you again. I was worried that you had fallen off the edge of the earth, or (worse), off the edge of the Internet!

> I've had some time to work on my rope thing (as you can read from the
> message I've just posted)

I did read it (the email, not the sources), and I am quite excited. I've been thinking about ropes and whether they're really worth it, and the conclusion I came to is that they can help performance when they are implemented as indexes over concatenations of strings of mixed character width, especially if the strings are mostly contiguous in memory, such as when they were read from a file. But if the ropes are "merely" the container, then they negatively impact locality of reference. Your email agrees with my intuition and backs it up with both implementation and measurements. Very cool!

> I have some idea for the next stage, AKA the Grand Unification Scheme of
> string and object structures.

The ropes I had in mind were actually indexes over arrays of arbitrary objects. Characters are just one popular type of object. ;^)

> I'm also planning a (slight) revision of the syntax proposal

I'd like to see it.

> Glad to know you've made some progress on your side, I'd be happy to
> learn more.

I made substantial changes to the syntax, to the point of it being unfair to call my work "Cloverfield". I haven't come up with another name yet. I warn you: this is all very preliminary, and it's surely not self-consistent. It's just ideas.

One major difference is that I backed off completely from the word modifier idea. I started by dropping null, since I found that out-of-band signaling is simpler and works better for the use cases I had in mind for it. For example, I wanted it for detecting when an optional proc argument isn't used. This can be detected more reliably by checking for the variable's existence. (Of course, this requires a change to the way proc works.)
Otherwise, a proc can't tell the difference between not getting an argument and getting null as its argument. As for interfacing with SQL, just do "select ..., x, x is not null, ..." for the relatively few items that can be null. Or modify SQLite, etc. to optionally unset a variable to signify that it is null. And even that is only needed when the variable can assume absolutely any string value; often data is somehow constrained (numbers only, non-empty strings only, alphanumerics only, etc.), and it's easy to pick a sensible value that is distinct from all possible data.

With null gone, I next looked at metadata. I decided to throw it out as well, since its uses can be handled instead by data structures. For example, I often need multiple pieces of code to track internal "annotations" they place on common data. They most easily do this by using the data as the index into an internal associative array. But with the Cloverfield metadata facility, either only one annotation at a time is allowed, or every module in an application has to agree on a convention for not stomping on every other module's annotations.

I dropped delayed substitution since I can't think of a good use for it.

Argument expansion has proven to be very valuable to Tcl, so of course it stays. But I don't extend it to allow multiple levels of expansion. I have never seen a case where this would help, and it can be implemented in script.

I really want word comments, but I don't want to think of them as "word"-anything. Instead I use the more familiar term "inline comments", and the syntax is #{...}#, where the opening #{ starts wherever a word can. Braces are matched (modulo embedded quoting), and it is an error if a # does not immediately follow the final closing brace. (Note: Vim's syntax highlighter cannot handle nested comments, so I might have to submit a patch.)

I have a very different idea for references, to be explained below. But it is not implemented in terms of word modifier notation.
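The out-of-band signaling the email argues for can be sketched outside Tcl as well. A minimal Python illustration (the row contents are hypothetical, not from the thread): null is signaled by the *absence* of a key, so every string, including the empty one, remains a legal stored value.

```python
def fetch_row():
    """Hypothetical query result: the 'email' column is NULL,
    so the key is simply omitted rather than set to a sentinel."""
    return {"id": 4, "name": "andy"}

row = fetch_row()

# Any string, even "", is a legal in-band value...
assert row.get("name") == "andy"

# ...while NULL is detected by testing existence, much like
# [info exists] on an unset Tcl variable.
assert "email" not in row
```

The design choice mirrors the proposal: no value in the domain is sacrificed, and the "is it null?" question becomes an existence check rather than a comparison.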
I can do without raw data, since brace quoting has always been adequate for my needs. I also don't like picking the end-of-text delimiter string, because I have to be careful to pick different ones when nesting heredocs. If I have truly oddball data that can't be cleanly expressed using the existing quoting rules, I think I'll just put it in an external file. In that case, the end of the data is tracked out-of-band by the filesystem under the rubric of file size.

In my design, word modifiers have been cut back down to size, and the features they would have imparted are moved to other territory. Quoting, however, is just as it is in Cloverfield. (Well, except for raw data.) I have parenthesized words, and I don't count embedded braces that are inside double quotes or comments.

On to references! At the moment (I can change my mind in an instant!), my design actually has two kinds of references. The difference is in how they are dereferenced. One kind, which somewhat corresponds to C pointers, requires explicit dereferencing. The string representation of such a reference is "object 123" or similar. The other kind, which somewhat corresponds to C++ references, is dereferenced automatically. The string representation is the instantaneous value of the referent variable. For now, I name these "pointers" and "references", respectively.

A pointer to an object is constructed by prefixing the object's variable name with an @. "set ptr @foo" returns "object 123", and "puts $ptr" prints "object 123". To get the value of foo given ptr, an @ is written *after* the variable substitution: "puts $ptr@". Why @? I would have used * if it wasn't already part of [expr] syntax. Why after? Because dereferencing can be used in combination with vector/key indexing, and I want to avoid needing parentheses. It is legal to obtain a pointer to a nonexistent variable, but such a pointer can't be used to get the variable's value, not until the variable is initialized.
This corresponds to ordinary, unadorned names in Tcl. Any word is a potential variable name, and it's perfectly legal to write words that are not (yet) the names of existent variables. Just don't use them to take their value, not until they have been initialized.

A reference to an object is constructed by prefixing the object's variable name with an &. "set foo bar; set ref &foo" returns "bar", and "set foo quux; puts $ref" prints "quux". For now I think I will allow reseating a reference by assigning a variable to a new reference value. I won't allow converting a reference to a non-reference, not without unsetting and recreating the variable. But I need to experiment before I can be sure this is the right thing to do. Again, it is legal to obtain a reference to a nonexistent variable; just don't take its value.

Whereas multiple $variable substitutions can be concatenated within a single word, @pointers and &references can only be the entire word. Neither @ nor & syntax is recognized inside [expr].

Why two different kinds of references? Well, Cloverfield has two kinds of references. ;^) But seriously, the first kind ("pointers") is for building potentially circular data structures, and the second kind ("references") is for everyday use. And I really mean "everyday use". Let me explain.

For indexing to be safe, reliable, and efficient, the parser needs to know when a word is a name. When the programmer helpfully prefixes the word with $, this tells the parser that the word is a name, so it can correctly apply name quoting and indexing rules and will not eagerly perform nested substitutions (e.g. inside keyed indexing) but will defer them to the variable lookup code, where they belong. It's the same deal as brace-quoting with [expr] or SQLite. Okay, so how does the programmer tell the parser that a word is a name without asking that it be dereferenced? Start the word with an & instead of a $.
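The two proposed kinds can be modeled to make the distinction concrete. This Python sketch is purely illustrative of the semantics described above; the class and method names (Store, Pointer, Reference, deref) are mine, not part of the proposal.

```python
class Store:
    """A toy variable store standing in for the interpreter's object store."""
    def __init__(self):
        self.vars = {}

class Pointer:
    """Like '@foo': the string rep names the object; dereferencing is explicit."""
    def __init__(self, store, name):
        self.store, self.name = store, name
    def __str__(self):
        return f"object {self.name}"          # stable identity string
    def deref(self):
        # Models the trailing '@' in '$ptr@'.
        return self.store.vars[self.name]

class Reference:
    """Like '&foo': the string rep IS the referent's current value."""
    def __init__(self, store, name):
        self.store, self.name = store, name
    def __str__(self):
        return str(self.store.vars[self.name])  # automatic dereference

s = Store()
s.vars["foo"] = "bar"
ptr, ref = Pointer(s, "foo"), Reference(s, "foo")
assert str(ptr) == "object foo"   # pointer's string rep is the identity
assert ptr.deref() == "bar"       # explicit dereference
s.vars["foo"] = "quux"
assert str(ref) == "quux"         # tracks the referent, as in the email
```

The key behavioral difference the sketch captures: a pointer's string representation is stable identity, while a reference's string representation changes whenever the referent does.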
Moreover, in my language it is illegal for the first argument to [set] to be a bare, literal word; the name must be a reference! This is similar to the requirement in PHP that all variable names start with $, but it is superior (I think) in that the syntax clearly indicates whether the code is dealing with a variable's value ($) or the variable itself (&). (Yeah, all my code examples so far ignored this particular design decision.)

I should point out one major difference between my references and C++'s references. In C++, reference creation is transparent to the caller of a function that takes an argument by reference. This can catch the programmer by surprise. In my language, the caller has to explicitly create the reference. In both languages, the dereferencing is transparent. I note that with pointers in C, C++, and my language, both pointer construction and dereferencing are explicit, so there is no confusion. Fine point: in my language, "dereferencing" a pointer actually results in a reference to the value, not the literal value; this makes it possible to change a variable's value given a pointer to the variable.

I'm also thinking of supporting a $& form which can be thought of as creating an anonymous reference and immediately dereferencing it. It's only useful in combination with vectored or keyed indexing, but I think this will still be important. Examples: "$&(foo bar){0}" will be substituted with "foo". "proc &whatever () {return (foo bar)}; puts $&[whatever]{0}" will print "foo". I think this avoids the need for [dict get], [lindex], etc. on data that is not stored in a variable.

Miscellaneous:

- \e gives ASCII 27, just like with gcc.

- $(...) is like [expr {...}]. Empty string is not a valid variable name, and I don't consider parenthesized quoting to be useful in generating variable names.

- Vectored index range endpoints are separated by : instead of .., to match Python. Also, an optional nonzero third value (preceded by :) gives the stride.
Valid uses for references and pointers to vectored index ranges may be restricted.

- Numbers can be expressed in sexagesimal notation: 12'34'56.78. ' is used instead of : to avoid conflict with ?:. I do a lot of GIS stuff at work, so this would be directly useful to me. It can be used for time as well as geography.

- Variables, procs, commands, channels, etc. are all in a single object store, with string pseudo-values "object 123", "command 123", "channel 123", etc., kind of like in Python. Variables and procs have proper string representations. [unset] is used to destroy any reference to an object, and the object is garbage-collected when no references (e.g. variables) remain. Objects can be local to a stack frame.

- A proc's string representation is its lambda expression. The first word of a command line is treated as a variable name or a namespace name. If it is a namespace name, the next word is the variable name, etc., to help facilitate ensembles. The variable is automatically dereferenced to get a lambda or "command 123" identifier, which is invoked. Some non-"command" objects are invokable, such as channels.

- As much of the parsing, analysis, bytecode compilation, and execution machinery as possible is exposed at the script level as a standard package.

- The parser is a collection of C routines designed to be easy to incorporate into other projects, as a library.

- An object can have multiple representations, not just two. Non-string conversions may avoid an intermediate string form. This is optimized for the common dual-ported case: a hash table is used for tracking multiple representations, stale representations are removed from the hash table, and the hash table is only created when multiple non-string representations are valid. Example: taking the list representation of a dict.

- Word origins are tracked and accessed with [info origin]. This facilitates syntax error messages.
- Variadic and default arguments to proc can appear anywhere in the argument list, and the "args" argument can have any name; just append * to denote its catchall status, like in Python. Append ? to the name to make its default "value" be to leave the variable unset. Use a two-element list to supply a real default value, like in Tcl.

And that's what I have so far...

--
Andy Goth | http://andy.junkdrome.org/
unununium@{aircanopy.net,openverse.com}
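The sexagesimal notation proposed in the message above (12'34'56.78) is easy to pin down with a small parser. A sketch in Python, assuming each '-separated group is a base-60 digit, most significant first (the function name is mine):

```python
def parse_sexagesimal(s):
    """Parse the proposed 12'34'56.78 notation into a decimal value.
    Each '-separated group is a base-60 digit, most significant first;
    only the last group may carry a fractional part."""
    value = 0.0
    for part in s.split("'"):
        value = value * 60 + float(part)
    return value

# 12 hours/degrees, 34 minutes, 56.78 seconds:
assert parse_sexagesimal("12'34'56.78") == 12 * 3600 + 34 * 60 + 56.78
assert parse_sexagesimal("1'30") == 90.0
```

This covers both use cases mentioned (time and geographic coordinates), since hours:minutes:seconds and degrees:minutes:seconds share the same base-60 structure.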
From: <fre...@fr...> - 2009-01-12 15:39:29
Hi all,

First, happy new year!

Following the discussions we've had a while back about string representations, Unicode, Tcl9, Cloverfield and the like, I've been working during the past weeks on a rope package. You can find it here on the Tcl9 project on SourceForge:

http://sourceforge.net/project/showfiles.php?group_id=216988

The implementation is a materialization of several ideas I've developed over the years, with some borrowed from the seminal paper on Cedar Ropes by Hans Boehm et al., available here:

http://www.cs.ubc.ca/local/reading/proceedings/spe91-95/spe/vol25/issue12/spe986.pdf

Ropes are string structures where data is not stored in flat NUL-terminated arrays but in self-balancing binary trees, allowing for fast insertion/removal of arbitrarily long strings. The package is built around a dedicated memory allocator based on fixed-size cells, coupled with a generational, exact, mark-and-sweep garbage collector.

Data structures:
================

Ropes are made of chunks of Unicode string data that can use either fixed-width formats (native C strings; UCS 1, 2 or 4 bytes) or UTF-8. Such string data chunks can take a variable number of cells; if the provided data is larger than the maximum size (here 63 cells, i.e. 1008 bytes minus the 4- or 8-byte header), it is split into several chunks and assembled transparently. Ropes can be made of chunks having different representations, thus allowing maximum compactness when mixing e.g. typically 7-bit Tcl code and foreign-language data. Moreover, native NUL-terminated C strings are recognized as valid ropes from C code.

Basic string chunks are assembled by concatenation to form larger ropes. Small strings are transparently merged into flat leaves, whereas larger ones use a self-balancing binary tree made of concat nodes. Substrings can also be extracted, and form either flat leaves for the smallest ones, or use a substr node for larger ones.
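The concat-node idea described above can be shown in a few lines. This is a minimal sketch of the data structure only (leaf strings joined by internal nodes carrying lengths), not the package's actual C implementation, and it omits balancing and substr nodes.

```python
class Leaf:
    """A flat chunk of string data."""
    def __init__(self, s):
        self.s, self.length = s, len(s)
    def index(self, i):
        return self.s[i]

class Concat:
    """An internal node joining two ropes; stores only lengths, never copies."""
    def __init__(self, left, right):
        self.left, self.right = left, right
        self.length = left.length + right.length
    def index(self, i):
        # Descend left or right using the stored lengths: O(depth),
        # i.e. O(log n) when the tree is kept balanced.
        if i < self.left.length:
            return self.left.index(i)
        return self.right.index(i - self.left.length)

rope = Concat(Concat(Leaf("Hello, "), Leaf("rope ")), Leaf("world"))
assert rope.length == 17
assert "".join(rope.index(i) for i in range(rope.length)) == "Hello, rope world"
```

Concatenation is O(1) regardless of string size, which is the property that makes ropes attractive for building large strings incrementally.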
The combination of both techniques allows for easy handling of arbitrarily large strings with minimal duplication and maximal sharing of raw string data. Apart from strings, all nodes are designed to fit into one single cell, thus providing maximum allocation performance and minimal memory impact. Indexing is O(1) for flat fixed-width ropes, O(n) for UTF-8 (as with Tcl), and O(log n) in general for complex ropes. The fact that basic string chunk size is limited also means that very large UTF-8 strings perform better than flat UTF-8 strings (such as those used by Tcl) thanks to the intermediary indexing levels.

Memory management:
==================

The dedicated memory allocator is based on fixed-size cells (16 bytes on 32-bit architectures) within page-aligned memory. Each page contains a bitmask that indicates the cell status. For 1024-byte pages, this gives 64 16-byte cells, of which one is reserved. This results in a very small overhead (2 bits per cell) and very good memory locality, both improving performance dramatically (more on that below) on modern architectures compared to a traditional allocator (which typically uses linked-list structures). This choice has been made because, with the exception of pure string data, rope nodes are typically small and easily fit within a 16-byte cell.

Moreover, this allocator is coupled with a generational, exact (as opposed to conservative), mark-and-sweep garbage collector that again provides a huge speedup compared to manual free() calls. The only needed action is root declaration for all ropes that are externally referenced, using a simple reference counting scheme. The GC process is fully controllable in the sense that sections of code can be protected by pausing and resuming the automatic collection. Generational GC means that older ropes (having survived several GC cycles) are promoted to older pools that are collected less frequently; this limits the CPU impact of collections.
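The page arithmetic above, plus a bitmask scan for free cells, can be sketched as follows. This is a back-of-the-envelope model using the figures from the email (1024-byte pages, 16-byte cells), not the package's code; the function and constant names are mine.

```python
PAGE_SIZE, CELL_SIZE = 1024, 16
CELLS_PER_PAGE = PAGE_SIZE // CELL_SIZE        # 64 cells per page
BITMASK_BYTES = CELLS_PER_PAGE // 8            # 8 bytes of status bits

# One cell is reserved for the page header and bitmask, leaving 63 usable,
# which matches the 63-cell maximum chunk size quoted in the email.
USABLE_CELLS = CELLS_PER_PAGE - 1

def alloc(bitmask, ncells):
    """Find ncells consecutive free cells in the page bitmask.
    Returns (start_index, new_bitmask), or (-1, bitmask) if the page is full."""
    run = 0
    for i in range(1, CELLS_PER_PAGE):         # cell 0 is reserved
        run = run + 1 if not (bitmask >> i) & 1 else 0
        if run == ncells:
            start = i - ncells + 1
            for j in range(start, start + ncells):
                bitmask |= 1 << j              # mark cells as used
            return start, bitmask
    return -1, bitmask

mask = 1                                       # only the reserved cell is used
start, mask = alloc(mask, 4)                   # a 4-cell (64-byte) node
assert (start, USABLE_CELLS, CELLS_PER_PAGE) == (1, 63, 64)
```

Because the status bits live in the page itself, freeing is just clearing bits, and locality stays good: nodes allocated together sit in the same page.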
Other features:
===============

The package provides iterator structures and procedures, so porting existing code should be easy. I'm thinking especially about the regexp engine, which needs flat Unicode strings. Iterators abstract the whole structure into a random-access model. Direct traversal of the string is O(1).

A custom rope type is available for extensions. Typical uses would be, for example, memory-mapping of potentially very large datasets, even on memory-constrained systems (e.g. mobile platforms). Another use case would be programmatically generated data. Or it could be used to wrap malloc'd strings into ropes.

Connections with Tcl:
=====================

Tcl currently uses UTF-8 flat strings as its string representation. There have been some discussions about the format that future versions should use. One such proposal was augmented strings, where flat strings would be supplemented by additional information (indexing of UTF-8 strings, for instance). Another concern was the memory consumption needed for the support of UCS-4 (32-bit chars). I think ropes provide a good solution to all these problems.

Ropes are immutable strings, which fit the Tcl model perfectly. One can mix several formats transparently, making byte arrays unnecessary and removing a prominent cause of shimmering. And as complex types such as lists build their string representations from those of their elements, maximum reuse of existing strings is ensured, which limits the memory impact even further. Last, the impact on client code is minimal thanks to the automatic garbage collector and the backward compatibility with C strings. For these reasons I think that ropes would be a good choice as a native string representation for future versions of Tcl.

Performance:
============

Of course there are lies, damn lies and benchmarks, but the first performance tests I've run on my system are very satisfactory.
On a Core 2 Duo P8400 2.26GHz, WinXP SP3, with 2GB RAM, I get the following results (from test.c), figures in ms:

---------------------------------------------------------------------
testAlloc: ropes vs. malloc raw allocation performance

Ropes:  40000000 12-byte ropes      =  480000000 data bytes
        ... 1859 create + 610 GC                =  2469
malloc: 40000000 12-byte C strings  =  480000000 data bytes
        ... 6718 malloc + memcpy + 7922 free    = 14640

Ropes:  20000000 28-byte ropes      =  560000000 data bytes
        ... 1453 create + 625 GC                =  2078
malloc: 20000000 28-byte C strings  =  560000000 data bytes
        ... 3813 malloc + memcpy + 3828 free    =  7641

Ropes:  15000000 44-byte ropes      =  660000000 data bytes
        ... 1375 create + 703 GC                =  2078
malloc: 15000000 44-byte C strings  =  660000000 data bytes
        ... 3016 malloc + memcpy + 2953 free    =  5969

Ropes:  1000000 1000-byte ropes     = 1000000000 data bytes
        ... 1062 create + 1000 GC               =  2062
malloc: 1000000 1000-byte C strings = 1000000000 data bytes
        ... 1047 malloc + memcpy + 391 free     =  1438
---------------------------------------------------------------------

This shows that the allocator+GC usually performs faster than malloc+free, mostly because of the GC's performance compared to free(). The malloc version outperforms the ropes only in the case of a large number of large strings. So this benchmark shows the real benefits of automatic memory management even in the simplest cases; in the general real-world case, where a lot of small structures are allocated and freed during the lifetime of the application, the malloc version would perform closer to the worst case above, and maybe worse because of memory fragmentation.

The following test is closer to a real-world application: it runs several (10,000) cycles during which 80 large strings are allocated and preserved.
---------------------------------------------------------------------
testGeneration:

With all ropes preserved:
    10000 x 80 988-byte ropes + roots = 790400000 data bytes : 16953
With no more than 10000 ropes preserved:
    10000 x 80 988-byte ropes + roots = 790400000 data bytes :  4406
---------------------------------------------------------------------

This shows the generational properties of the GC: older ropes are promoted to older pools and thus traversed less often by the collector. A real-world application during its lifetime would typically store a fairly constant number of stable objects (global or static data, business models) and allocate a larger number of short-lived objects (input, output and temporary values). Generational GC ensures that the latter get collected more often than the former, and lets stable objects percolate to deeper layers.

Conclusion:
===========

The package needs some polish (test suite, docs, etc.) but I think the code is fairly usable in its current state. At present it only works on 32-bit architectures, but a port to 64-bit should be straightforward. Moreover, as the custom allocator needs page-aligned memory, I've only implemented a Win32 version based on VirtualAlloc(). POSIX systems would need posix_memalign() or something more suitable to get the same result (it uses a 1024-byte boundary on 32-bit systems).

I plan to design a similar package but for objects, especially lists since they are very similar to strings, and to integrate both closely.

Comments welcome!
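The generational promotion that the testGeneration figures demonstrate can be modeled in miniature. A toy sketch (mine, not the package's): survivors of a young-pool collection are promoted to an older pool that the collector scans less frequently.

```python
# Two pools: the young pool is collected often, the old pool rarely.
young, old = [], []

def minor_collect(roots):
    """Minor collection: sweep unreachable young objects and promote
    the survivors to the old pool (scanned less frequently)."""
    global young, old
    survivors = [obj for obj in young if obj in roots]
    old.extend(survivors)      # promotion: survived a GC cycle
    young = []                 # everything else is swept

young = ["a", "b", "c"]        # three freshly allocated objects
minor_collect(roots={"a", "c"})   # "b" is unreachable and is swept

assert old == ["a", "c"] and young == []
```

Stable, long-lived data thus "percolates" out of the frequently scanned pool, which is exactly why the second benchmark run above is so much cheaper than the first.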
From: Andy G. <unu...@ai...> - 2009-01-12 13:36:06
fre...@fr... wrote:
> Happy new year to everyone!

Heya. I've been wondering where you were. I have been doing some language design work, but in many ways it's in a different direction than your design proposes. I hadn't heard from you in so long, I figured my only choice was to do my own project. Want to see what I've been up to?

--
Andy Goth | http://andy.junkdrome.org/
unununium@{aircanopy.net,openverse.com}
From: <fre...@fr...> - 2009-01-12 13:16:38
Happy new year to everyone!
From: Andy G. <unu...@ai...> - 2008-07-02 02:56:52
Right now I'm in Oklahoma for a week. Earlier this month I was in Bagotville, Quebec, for a week or so. And in another week I'm going back to Bagotville for another three weeks. B U S Y!!!

--
Andy Goth | <unu...@ai...> | http://andy.junkdrome.org/
From: Neil M. <ne...@Cs...> - 2008-06-05 15:33:28
Mark Janssen wrote:
[...]
> What do you mean: "there is no special value", surely there is. If null
> becomes a special value in Tcl you will always need to check for it when
> calling out to library code (it might return 'null')
>
>     set a [libraryCall 4]
>     if {[null? $a]} {
>         # oops
>     }
>
> Instead when you need it you might just as well do:
>
>     proc createNull {} { return [list true {}] }
>     proc createValue {val} { return [list false $val] }
>
>     lassign [createNull] isnull val
>     if {$isnull} {
>         puts null
>     } else {
>         puts "not null: $val"
>     }

This is essentially the approach taken in the various functional programming languages I mentioned. For instance, in Haskell, there is the Maybe datatype:

    data Maybe a = Nothing | Just a

which we can have in Tcl much like you show:

    proc Nothing {} { list Nothing }
    proc Just a { list Just $a }

    switch [lindex $val 0] {
        Nothing { Nothing }
        Just    { Just [f [lindex $val 1]] }
    }

You can even extract this boilerplate switch into a separate function for chaining together sequences of nullable functions (much like a shell pipeline):

----
proc do {val args} {
    foreach {| f} $args {
        switch [lindex $val 0] {
            Nothing { break }
            Just    { set val [invoke $f [lindex $val 1]] }
            default { error "invalid value \"$val\"" }
        }
    }
    return $val
}
proc invoke {f args} { uplevel #0 $f $args }

# Wrappers - can be adapted
proc yield val { Just $val }
proc null {} { Nothing }
----

And a quick demo using this functionality:

    # A function that might return "null"
    # div :: Num -> Num -> Maybe Num
    proc div {a b} {
        if {$b == 0} { null } else { yield [expr {$a/double($b)}] }
    }

    # Now use it:
    # recip :: Num -> Maybe Num
    proc recip val { div 1 $val }

    # Debug:
    proc debug str { puts $str; yield $str }

    proc funny val { do [recip $val] -> debug -> {div 2} -> debug -> yield }

    % funny 12
    0.08333333333333333
    24.0
    Just 24.0
    % funny 0
    Nothing

(This is the Maybe monad from Haskell.)
This gives you precisely an application-defined definition of Null and lets you encapsulate all the logic for dealing with it in one place (the definitions of do/yield/null). This is still much simpler than null (both in terms of implementation and semantics) and much more elegant. And you only need to use this in the tiny portions of your application that need to use nulls and can't use dicts or some other simpler approach.

-- Neil

This message has been checked for viruses but the contents of an attachment may still contain software viruses, which could damage your computer system: you are advised to perform your own checks. Email communications with the University of Nottingham may be monitored as permitted by UK legislation.
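For readers more comfortable outside Tcl, the do/yield pipeline in Neil's message can be transcribed into Python. This is a hand-rolled sketch of the same Maybe-chaining idea, not a library API; all names are mine.

```python
# A value is either Nothing or ("Just", x), mirroring the tagged lists above.
Nothing = ("Nothing",)
def Just(x):
    return ("Just", x)

def do(val, *fs):
    """Chain Maybe-returning functions, short-circuiting on Nothing,
    like the Tcl [do] proc (or Haskell's Maybe monad bind)."""
    for f in fs:
        if val == Nothing:
            break
        val = f(val[1])
    return val

def div(a, b):
    # A function that might return "null".
    return Nothing if b == 0 else Just(a / b)

def recip(x):
    return div(1, x)

assert do(Just(4), recip, lambda v: div(2, v)) == Just(8.0)  # 2 / (1/4)
assert do(Just(0), recip, lambda v: div(2, v)) == Nothing    # short-circuits
```

As in the Tcl version, the null-handling logic lives in exactly one place (`do`), and callers never see a sentinel value leak into their domain.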
From: Neil M. <ne...@Cs...> - 2008-06-05 14:33:26
Frédéric Bonnet wrote:
> Neil Madden wrote:
>> Just to clarify, are we talking about a null value (used e.g. to
>> represent missing or unknown information in SQL) or a null reference?
>
> Null values. To be pedantic, I could say neither, as null is a non-value ;-)
>
>> From the context, I'd say the primary concern here is something like a
>> null/nil value in Tcl, rather than null references. What is presumably
>> actually wanted here is a way of representing missing/unknown
>> information in some "direct" manner. This is a semantic issue, of which
>> a distinguished nil (non-)value is only one possible solution. This is
>> not a good solution, either theoretically or practically. From a
>> theoretical point of view, introducing a special value just doesn't seem
>> well motivated. It complicates the semantics of the language, for
>> actually no real benefit. The semantics of how to handle nulls in
>> various operations are quite complicated, and often contradictory. I
>> don't know of a single language with null that doesn't have problems
>> with it. There are plenty of other ways to encode
>> missing/incomplete/unknown information within the existing language
>> elements, as have been discussed several times before, and they are not
>> just sufficient but also much better solutions.
>
> Ack'd. But I'm contemplating the problem not from a mathematically
> correct point of view, but from a pragmatic point of view (not that
> mathematical correctness isn't important, it's just off-topic). If your
> application doesn't need this concept of nothingness or out-of-domain,
> then don't use it. However there are many cases in real-world
> applications when such a concept comes in handy.

I quite clearly said that the problems were both theoretical *and* practical. I also was careful not to say that missing/unknown values aren't important -- there is a huge amount of literature and practical experience on how to handle them. Nulls are the worst approach I know of.
> To me, the exact semantics of this non-value is application-specific, so
> this should not be a matter of rejection.

How can you specify a language feature if you do not know what its semantics should be? There has to be *some* description of how nulls are handled and what they mean, otherwise it's just an ad-hoc hack which will cause problems. If the meaning and behaviour of nulls is always application-specific, then define it in your application using one of the many excellent approaches that already exist. I mean, what problem are you solving if you just introduce a bit of syntax and leave it up to application authors to ultimately work out how to deal with it?

> That's why I didn't want originally to define the semantics of string
> operations with nulls, and instead throw an error at every instance. I
> thought this would be simpler and closer to the Tcl philosophy. But this
> error-throwing behavior seemed to cause some interrogations, so I
> removed it. Guess I shouldn't have :*/
>
>> The main practical problem is that the domain of all arguments to all
>> commands is suddenly expanded to introduce this new option, which in 99%
>> of cases is neither needed nor expected, and yet it can turn up there.
>> This means there is now another error case that all code has to be
>> prepared to deal with, and, in most cases, won't be (e.g.
>> NullPointerExceptions in Java which really are a big problem and can be
>> a complete PITA to track down which 3rd party library unexpectedly
>> returned null).
>
> But this is a design problem that's outside of our scope. After all, you
> can write Fortran in all languages. Moreover NullPointerExceptions are
> about null references, not values, so it's a totally different class of
> problems.

It is the same class of problems: you expected a value in some domain, and instead got some weird non-value that you can't possibly do anything useful with.
This is very much an in-scope problem, as it simply doesn't exist with any other approach, so it must be explained and justified. Even the usual approaches to dealing with such error conditions won't work, due to the lack of a string rep:

    error "unknown option \"$foo\"" ;# boom!

[...]

>> Any discussion of introducing a nil/null value should start with some
>> pretty strong motivational arguments, rather than syntax.
>
> Agreed, but I thought that this problem was widely understood and
> accepted, and that the main objections would be on syntactic issues, as
> for {*}. It happens that this is the other way round.

The *problem* is widely understood, as are the solutions. Nulls are just about the worst solution to this problem though, which unfortunately has been propagated in a variety of languages.

>> It should explain exactly what problems this is supposed to solve
>> (3-state logic is a particular solution/means, not a problem/end in
>> itself), and just what is wrong with the existing approaches in Tcl:
>>
>> 1. For references, we already have variables which can be [unset] and
>>    tested for existence with [info exists]
>> 2. You can encode null fields in a record using missing keys in either a
>>    dict or array and similarly test for (non-)existence.
>> 3. You can use any container that has a natural "empty" element, e.g. a
>>    set or a list: {} = null, {a} = a
>> 4. You can use a tagged list to simulate the maybe/option datatype as
>>    used in ML and Haskell: [Nothing] / [Just $foo]
>> 5. You can just use a special string value that is outside of the domain
>>    of your application.
>
> I'm targeting cases where the domain has no empty element, i.e. accepted
> values can be arbitrary strings, and you need to encode missing info
> without having to use metadata or out-of-band signaling. So this rules
> out all of the above approaches.

There are no such application cases, unless you are specifically designing them that way.
> I first encountered this problem when writing a custom CSS-like text
> widget that I used to create multimedia CD-ROMs about 10 years ago.
> Basically I wanted to mix the cascading style and box model of CSS with
> the script interface model of the Tk text widget. So I started using the
> empty list as an indicator of inheritance. But some options accepted
> arbitrary strings, so this special indicator value was unsuitable
> because it was part of the domain. This meant that I had to choose
> another method to express inheritance. In the end I had to pair every
> inheritable option with a flag option to tell whether to inherit or not.
> This was tedious and awkward, but this was the only sensible option in
> the lack of better alternatives. With a null concept, one could have set
> a given option's value to null to indicate inheritance, given that null
> is (by design) outside the domain of accepted values.

This is a classic case for missing keys in a dictionary. Simply specify your configuration options as dicts of option/value pairs, and omit any configuration options that you want inherited -- [dict merge] will then do precisely the correct thing. This is much *easier* than dealing with nulls:

    set parent {
        font-family: Arial
        font-size: 12
    }
    # Inherit font-family from parent:
    set child {
        font-size: 10
    }
    set config [dict merge $parent $child]

This is, after all, how CSS works (with the addition of the distinguished "inherit" keyword, which you could also use).

-- Neil
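Neil's [dict merge] inheritance pattern translates directly to other languages. A Python restatement of the same idea (illustrative only): omitted keys inherit from the parent, and no null sentinel is ever needed, because merging by key precedence expresses "not specified here".

```python
parent = {"font-family": "Arial", "font-size": 12}

# "font-family" is omitted, so it is inherited from the parent.
child = {"font-size": 10}

# Child keys take precedence, like [dict merge $parent $child] in Tcl.
config = {**parent, **child}

assert config == {"font-family": "Arial", "font-size": 10}
```

The design point survives translation: absence of a key is the out-of-band "inherit" signal, so every possible string remains a legal option value.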
From: Kevin K. <ke...@ac...> - 2008-06-05 14:05:29
|
Neil Madden wrote:
> 1. For references, we already have variables which can be [unset] and
> tested for existence with [info exists]
> 2. You can encode null fields in a record using missing keys in either a
> dict or array and similarly test for (non-)existence.
> 3. You can use any container that has a natural "empty" element, e.g. a
> set or a list: {} = null, {a} = a
> 4. You can use a tagged list to simulate the maybe/option datatype as
> used in ML and Haskell: [Nothing] / [Just $foo]
> 5. You can just use a special string value that is outside of the domain
> of your application.

For what it's worth, TIP #308 adapts approaches 1 and 2 in Neil's list. Approach 5 is also available (the '-as lists' option in all the commands that interrogate result sets), but I foresee using it only in ad-hoc report generation.

It is my belief that most of the people who insist that Tcl should have a special NULL value in all contexts have been misled by SQL database interfaces that insist on dealing with rows as lists. Of course, if rows are represented as lists, you need sparse lists to accommodate empty columns. TIP 308's approach to the problem: Don't Do That.

-- 73 de ke9tv/2, Kevin |
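[The missing-key approach Kevin describes can be sketched in plain Tcl; the row contents and column names below are invented for illustration and are not taken from TIP #308 itself:]

```tcl
# A row with a NULL "phone" column is simply a dict where that key is
# absent -- no sentinel value needed, and "" stays a legitimate value.
set row {name Ada email ada@example.org}

if {[dict exists $row phone]} {
    puts "phone: [dict get $row phone]"
} else {
    puts "phone is NULL (key absent)"
}
```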
From: Frédéric B. <fb...@en...> - 2008-06-05 13:46:48
|
Donal K. Fellows wrote:
> As things stand, Tcl represents NULL by an unset variable or a missing
> dictionary key.

But these represent null *references*. I want to express null *values*, i.e. valid references or container elements that have no value.

> The only problem occurs when you want to construct
> sparse lists, but then I think “don't do that!” on that topic. :-)
>
> What are the use cases for NULL again?

Cascaded/stacked/inherited options. See my other message (in reply to Neil) for more details. To some extent this covers sparse lists. And in general, expressing undefined *values* when the domain covers arbitrary strings (i.e. there is no available "special" value). |
From: Frédéric B. <fb...@en...> - 2008-06-05 13:38:26
|
Neil Madden wrote:
> Just to clarify, are we talking about a null value (used e.g. to
> represent missing or unknown information in SQL) or a null reference?

Null values. To be pedantic, I could say neither, as null is a non-value ;-)

> From the context, I'd say the primary concern here is something like a
> null/nil-value in Tcl, rather than null references. What is presumably
> actually wanted here is a way of representing missing/unknown
> information in some "direct" manner. This is a semantic issue, of which
> a distinguished nil (non-)value is only one possible solution. This is
> not a good solution, either theoretically or practically. From a
> theoretical point of view, introducing a special value just doesn't seem
> well motivated. It complicates the semantics of the language, for
> actually no real benefit. The semantics of how to handle nulls in
> various operations are quite complicated, and often contradictory. I
> don't know of a single language with null that doesn't have problems
> with it. There are plenty of other ways to encode
> missing/incomplete/unknown information within the existing language
> elements, as have been discussed several times before, and they are not
> just sufficient but also much better solutions.

Ack'd. But I'm contemplating the problem not from a mathematically correct point of view, but from a pragmatic point of view (not that mathematical correctness isn't important, it's just off-topic). If your application doesn't need this concept of nothingness or out-of-domain, then don't use it. However, there are many cases in real-world applications where such a concept comes in handy. To me, the exact semantics of this non-value is application-specific, so this should not be grounds for rejection. That's why I originally didn't want to define the semantics of string operations with nulls, and instead to throw an error at every instance. I thought this would be simpler and closer to the Tcl philosophy. 
But this error-throwing behavior seemed to raise some questions, so I removed it. Guess I shouldn't have :*/

> The main practical problem is that the domain of all arguments to all
> commands is suddenly expanded to introduce this new option, which in 99%
> of cases is neither needed nor expected, and yet it can turn up there.
> This means there is now another error case that all code has to be
> prepared to deal with, and, in most cases, won't be (e.g.
> NullPointerExceptions in Java which really are a big problem and can be
> a complete PITA to track down which 3rd party library unexpectedly
> returned null).

But this is a design problem that's outside of our scope. After all, you can write Fortran in any language. Moreover, NullPointerExceptions are about null references, not values, so they are a totally different class of problems.

> Another practical problem is how to represent null
> values at the C level: the obvious approach of converting a null value
> into a NULL Tcl_Obj pointer just opens the door for Tcl code to cause
> crashes in C commands that aren't careful about checking the arguments
> they are passed (which I guess most aren't) -- that also would raise
> security issues.

I was rather thinking about a Tcl_Obj with both a NULL string and a NULL type. So it has no string rep, can't get one, and can't be converted to another type. A NULL Tcl_Obj pointer is another issue IMHO (and a cause of panic in the core).

> Any discussion of introducing a nil/null value should start with some
> pretty strong motivational arguments, rather than syntax.

Agreed, but I thought that this problem was widely understood and accepted, and that the main objections would be on syntactic issues, as for {*}. It happens that this is the other way round.

> It should
> explain exactly what problems this is supposed to solve (3-state logic
> is a particular solution/means, not a problem/end in itself), and just
> what is wrong with the existing approaches in Tcl:
>
> 1. 
For references, we already have variables which can be [unset] and
> tested for existence with [info exists]
> 2. You can encode null fields in a record using missing keys in either a
> dict or array and similarly test for (non-)existence.
> 3. You can use any container that has a natural "empty" element, e.g. a
> set or a list: {} = null, {a} = a
> 4. You can use a tagged list to simulate the maybe/option datatype as
> used in ML and Haskell: [Nothing] / [Just $foo]
> 5. You can just use a special string value that is outside of the domain
> of your application.

I'm targeting cases where the domain has no empty element, i.e. accepted values can be arbitrary strings, and you need to encode missing info without having to use metadata or out-of-band signaling. So this rules out all of the above approaches.

I first encountered this problem when writing a custom CSS-like text widget that I used to create multimedia CD-ROMs about 10 years ago. Basically I wanted to mix the cascading style and box model of CSS with the script interface model of the Tk text widget. So I started using the empty list as an indicator of inheritance. But some options accepted arbitrary strings, so this special indicator value was unsuitable because it was part of the domain. This meant that I had to choose another method to express inheritance. In the end I had to pair every inheritable option with a flag option to tell whether to inherit or not. This was tedious and awkward, but it was the only sensible option for lack of better alternatives. With a null concept, one could have set a given option's value to null to indicate inheritance, given that null is (by design) outside the domain of accepted values.

So, yes, there are ways to do things without nulls, but that doesn't mean that such solutions are elegant or satisfactory. The fact that one has never encountered such a situation doesn't mean that it doesn't exist. |
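[Approach 4 above, the tagged-list maybe/option encoding, can be sketched in plain Tcl as follows; the proc names are purely illustrative and not part of any proposal:]

```tcl
# Maybe/option encoding as tagged lists: {Nothing} or {Just $value}.
proc Just {value} { return [list Just $value] }
proc Nothing {}   { return [list Nothing] }

set inherited [Nothing]
set explicit  [Just ""]   ;# even the empty string counts as a real value

foreach opt [list $inherited $explicit] {
    if {[lindex $opt 0] eq "Just"} {
        puts "explicit value: '[lindex $opt 1]'"
    } else {
        puts "inherit from parent"
    }
}
```

This works even when accepted values are arbitrary strings, because the distinction is carried by the tag, not by a reserved value in the domain.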
From: Donal K. F. <don...@ma...> - 2008-06-05 09:42:19
|
Frédéric Bonnet wrote:
> Sorry but in 15+ yrs of a rather rich programming life with Tcl I've
> felt such a need more than once ;-). Jokes aside, I agree that TIP #185
> is far from adequate, to say the least. Hence my new proposal.

As things stand, Tcl represents NULL by an unset variable or a missing dictionary key. The only problem occurs when you want to construct sparse lists, but then I think “don't do that!” on that topic. :-)

What are the use cases for NULL again?

Donal. |
From: Alexandre F. <ale...@gm...> - 2008-06-05 09:19:54
|
On Thu, Jun 5, 2008 at 11:14 AM, Frédéric Bonnet <fb...@en...> wrote:
> Sure but this is just an application of nil. I've mentioned 3VL because
> that's the way it's done in SQL. However my primary use case is storing
> nil in containers when no special value is available (e.g. values are
> arbitrary strings including the empty string) and you can't (or don't
> want to) use metadata or out-of-band.

Come on, reserving a string value is not that bad. And if you really can't, out-of-band is the way to go: just prefix all your valid strings with a type character, and use another character for nil. Or, more Tclly efficient, store [list $type $string]. Plenty of ways.

>> I think 185 largely deserves a clear rejection.
>
> We both agree that TIP #185 cannot be accepted in its current state.
> However I made a new proposal that I think respects the Tao of Tcl. So
> the question is: should we call for a vote on TIP #185 right now and
> reject it, then proceed on a new TIP, or should we (I) rewrite the TIP?

Dunno, contact the author maybe...

> Moreover, can you comment on my specific proposal? Or do you think that
> the very concept of nil is worthless (this is a real question)?

If you read a bit between the lines of my arguments against 185, you'll quickly figure out what I think of adding special values with special semantics and special options ;-)

-Alex |
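[The out-of-band encoding Alexandre suggests, [list $type $string], might look like this in practice; the proc and tag names here are invented for illustration:]

```tcl
# Tag every value with a type; the "nil" tag carries no payload.
proc val {s} { return [list str $s] }
set nil {nil {}}

foreach v [list [val ""] [val "hello"] $nil] {
    lassign $v type payload
    if {$type eq "nil"} {
        puts "<nil>"
    } else {
        puts "string: '$payload'"   ;# "" remains a legitimate value
    }
}
```

The point is that nil-ness lives in the tag, outside the string domain, so no otherwise-valid value has to be reserved.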
From: Frédéric B. <fb...@en...> - 2008-06-05 09:08:52
|
Alexandre Ferrieux wrote:
> Sorry but in 10+ yrs of a rather rich programming life with Tcl I've
> never felt such an urge. I understand the importance of the concept in
> the database subfield, but looking at TIP185 I hope it will stay in
> that subfield :-}

Sorry but in 15+ yrs of a rather rich programming life with Tcl I've felt such a need more than once ;-). Jokes aside, I agree that TIP #185 is far from adequate, to say the least. Hence my new proposal.

>> But such a non-value has also a lot of applications in various other
>> domains such as three-state logic.
>
> Surely you don't need an entirely new, exotic internal Tcl value to
> handle 3-state logic. Just take 3 (string or numeric) constants,
> define the truth tables and you're done.

Sure, but this is just an application of nil. I've mentioned 3VL because that's the way it's done in SQL. However, my primary use case is storing nil in containers when no special value is available (e.g. values are arbitrary strings including the empty string) and you can't (or don't want to) use metadata or out-of-band signaling.

>> Meanwhile I also discovered the existence of TIP #185. However it seems
>> to be mostly defunct.
>
> Yeah, and rightly so. Frankly I think its posting date is off by a
> week. It should have been 01 (not 08) April 2004.
>
>> Comments welcome.
>
> I think 185 largely deserves a clear rejection. [...]

We both agree that TIP #185 cannot be accepted in its current state. However, I made a new proposal that I think respects the Tao of Tcl. So the question is: should we call for a vote on TIP #185 right now and reject it, then proceed on a new TIP, or should we (I) rewrite the TIP?

Moreover, can you comment on my specific proposal? Or do you think that the very concept of nil is worthless (this is a real question)? |
From: rna020 <tom...@fr...> - 2008-06-04 14:54:13
|
Frédéric

> 1. Instead of {null} or {nil} prefix, we could use {}. Not only is
> this shorter but this also follows the same logic as {*}: the latter
> expands the value in place, whereas the former turns any value into
> nothing. Moreover this frees us from cultural aspects about the
> differences between NULL vs. nil vs. void vs UNK in various languages.
> As a syntactic sugar one could write {}NULL or {}nil or {}whatever as a
> matter of personal taste, depending on the application.

I think the use of {} for a NULL indicator would be a big mistake because it would break the symmetry between quoting with the double quote character (") and curly braces ({}). In particular, the use of a set of empty curly braces (i.e. {}) as an alternative to an empty quoted string (i.e. "") is, I suspect, widely used in existing code where an empty string needs to be embedded in a quoted string. I personally would much prefer representing a true null with something like {NULL}, {NIL}, {\0}, {?}, etc. I don't think cultural concerns should drive the design. |
From: Neil M. <ne...@Cs...> - 2008-06-04 13:52:47
|
Hi Frédéric,

Frédéric Bonnet wrote:
> [cc'd to tcl9-cloverfield]
>
> Howdy folks,
>
> I have the strong impression (and following the discussion on c.l.t
> about merging variables and arrays) that Tcl would benefit from proper
> null (or nil) handling. This seems to be a long standing expectation
> from part of the community, especially for dealing with SQL databases.
> But such a non-value has also a lot of applications in various other
> domains such as three-state logic.

Just to clarify, are we talking about a null value (used e.g. to represent missing or unknown information in SQL) or a null reference? These are different things, although a null reference can simulate a null value by making everything a reference. However, I'd say both are bad ideas.

From the context, I'd say the primary concern here is something like a null/nil-value in Tcl, rather than null references. What is presumably actually wanted here is a way of representing missing/unknown information in some "direct" manner. This is a semantic issue, of which a distinguished nil (non-)value is only one possible solution. This is not a good solution, either theoretically or practically. From a theoretical point of view, introducing a special value just doesn't seem well motivated. It complicates the semantics of the language for no real benefit. The semantics of how to handle nulls in various operations are quite complicated, and often contradictory. I don't know of a single language with null that doesn't have problems with it. There are plenty of other ways to encode missing/incomplete/unknown information within the existing language elements, as have been discussed several times before, and they are not just sufficient but also much better solutions.

The main practical problem is that the domain of all arguments to all commands is suddenly expanded to introduce this new option, which in 99% of cases is neither needed nor expected, and yet it can turn up there. 
This means there is now another error case that all code has to be prepared to deal with, and, in most cases, won't be (e.g. NullPointerExceptions in Java, which really are a big problem and can be a complete PITA to track down which 3rd party library unexpectedly returned null).

Another practical problem is how to represent null values at the C level: the obvious approach of converting a null value into a NULL Tcl_Obj pointer just opens the door for Tcl code to cause crashes in C commands that aren't careful about checking the arguments they are passed (which I guess most aren't) -- that also would raise security issues.

Any discussion of introducing a nil/null value should start with some pretty strong motivational arguments, rather than syntax. It should explain exactly what problems this is supposed to solve (3-state logic is a particular solution/means, not a problem/end in itself), and just what is wrong with the existing approaches in Tcl:

1. For references, we already have variables which can be [unset] and
tested for existence with [info exists]
2. You can encode null fields in a record using missing keys in either a
dict or array and similarly test for (non-)existence.
3. You can use any container that has a natural "empty" element, e.g. a
set or a list: {} = null, {a} = a
4. You can use a tagged list to simulate the maybe/option datatype as
used in ML and Haskell: [Nothing] / [Just $foo]
5. You can just use a special string value that is outside of the domain
of your application.

These approaches work very well, have very clear and intuitive semantics (as they just build on the existing semantics), and don't cause any surprises: they can be confined to just that part of the application that actually needs them, and most operations on them behave in an intuitive manner (e.g. merging dictionaries with missing keys). 
The languages that really seem to have got references and dealing with missing information right (ML, Haskell, etc.) have all done so by completely abolishing any notion of null and instead introducing specific mechanisms for specific problems that have limited and controlled scope. This wasn't by accident, but by careful design. We should not ignore these lessons, as they are not just applicable to statically-typed languages.

Neil |
From: Alexandre F. <ale...@gm...> - 2008-06-04 12:24:22
|
On Wed, Jun 4, 2008 at 12:23 PM, Frédéric Bonnet <fb...@en...> wrote:
> I have the strong impression (and following the discussion on c.l.t
> about merging variables and arrays) that Tcl would benefit from proper
> null (or nil) handling. This seems to be a long standing expectation
> from part of the community, especially for dealing with SQL databases.

Sorry but in 10+ yrs of a rather rich programming life with Tcl I've never felt such an urge. I understand the importance of the concept in the database subfield, but looking at TIP185 I hope it will stay in that subfield :-}

> But such a non-value has also a lot of applications in various other
> domains such as three-state logic.

Surely you don't need an entirely new, exotic internal Tcl value to handle 3-state logic. Just take 3 (string or numeric) constants, define the truth tables and you're done.

> Meanwhile I also discovered the existence of TIP #185. However it seems
> to be mostly defunct.

Yeah, and rightly so. Frankly I think its posting date is off by a week. It should have been 01 (not 08) April 2004.

> Comments welcome.

I think 185 largely deserves a clear rejection. Here is why:

- asymmetric conversion (you can't shimmer list->string->list safely) violates EIAS
- null-propagation violates every possible expectation about the (String,concat) monoid (an absorbing element is an atomic bomb)
- the TIP quickly moves away from the true motivations (SQL), into an awkward in vitro construction with complex impacts all over the core (new options, new semantics)
- most cases are easily solved by a configurable reserved value for nulls

-Alex |
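[Alexandre's "3 constants plus truth tables" suggestion can be sketched like this; Kleene-style three-valued semantics and the proc/constant names are assumptions for illustration:]

```tcl
# Three-valued logic over the plain string constants true/false/unknown.
proc tv_not {a} {
    switch -- $a {
        true    { return false }
        false   { return true }
        default { return unknown }
    }
}
proc tv_and {a b} {
    # false dominates; otherwise any unknown operand makes it unknown.
    if {$a eq "false" || $b eq "false"} { return false }
    if {$a eq "unknown" || $b eq "unknown"} { return unknown }
    return true
}

puts [tv_and true unknown]   ;# unknown
puts [tv_and false unknown]  ;# false
puts [tv_not unknown]        ;# unknown
```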
From: Frédéric B. <fb...@en...> - 2008-06-04 10:18:31
|
[cc'd to tcl9-cloverfield]

Howdy folks,

I have the strong impression (and following the discussion on c.l.t about merging variables and arrays) that Tcl would benefit from proper null (or nil) handling. This seems to be a long-standing expectation from part of the community, especially for dealing with SQL databases. But such a non-value also has a lot of applications in various other domains, such as three-state logic. That's the reason why I included NULL value handling in Cloverfield. Meanwhile I also discovered the existence of TIP #185. However, it seems to be mostly defunct.

Cloverfield proposes the following syntax:

    {null}SomeArbitraryData
    {nil}SomeArbitraryData

That is, any word prefixed by {null} or {nil} is interpreted by the parser as a NULL value. However, Cloverfield doesn't develop the semantics for now. For example, what does the following do:

    set v {null}[puts foo]
    puts "v is $v"
    append v bar
    puts "v is $v"
    expr {$v == $v}
    expr {$v == {null}0}

See also the discussions on the Wiki:

    http://wiki.tcl.tk/17441
    http://wiki.tcl.tk/20638 (look for {null})

OTOH, TIP #185 proposes a similar syntax, {null}!, and describes the semantics, but also implies that many commands must be modified to accept new options for NULL handling, e.g. [switch -null]. I personally don't agree with some of the proposed semantics and think that such a value must be handled transparently. But I acknowledge that people may have various interpretations of what NULL exactly means.

So I'm requesting feedback from you people about the current TIP #185 and the Cloverfield proposal, and NULL handling in general. I'm ready to bite the bullet and rewrite the TIP in order to provide an implementation for the upcoming Tcl 8.6 if possible. For now, I can make the following proposal:

1. Instead of the {null} or {nil} prefix, we could use {}. Not only is this shorter, but it also follows the same logic as {*}: the latter expands the value in place, whereas the former turns any value into nothing. 
Moreover, this frees us from cultural aspects about the differences between NULL vs. nil vs. void vs. UNK in various languages. As syntactic sugar one could write {}NULL or {}nil or {}whatever as a matter of personal taste, depending on the application.

2. Nil should compare positively to itself, and be the only value to do so. TIP #185 proposed that it should compare positively with nothing, including itself, but I disagree, especially because this breaks the assumption that NaN is the only value with this property (which is the case in all languages I know, and is part of the IEEE standard), and because Tcl properly supports NaN and other special values such as Infinity anyway. It also makes switch and list handling more complicated (hence the need for special flags in TIP #185).

3. When used as an lvalue, mutating operations on a variable holding nil have no effect. For example:

    set v {}nil
    append v foo ; # v is still nil

4. When used as an rvalue, the string representation of nil can be controlled on a per-interpreter basis using, for example, a global variable (tcl_nil). For example:

    set tcl_nil NULL
    set v {}nil
    puts "v is $v" ; # Gives "v is NULL"

If this variable is not set, nil "taints" other values.

5. The {}-prefix is recognized by the list parser, so that nil can be included in lists (this includes switch patterns). The canonical form could be {}{}. The aforementioned global variable could also be used to control list serialization:

    set tcl_nil NULL
    set l [list 1 {}nil 2]
    puts "l is {$l}" ; # Gives "l is {1 {}NULL 2}"

Comments welcome. |
From: Frédéric B. <fb...@en...> - 2008-06-02 14:00:43
|
Andy Goth wrote: > On Thu, 29 May 2008 14:02:03 +0200, Frédéric Bonnet wrote >> Not that the very concept of grouping is worthless; after all this >> concept is implicit even in Tcl. > > It is implicit, but it's far behind the scenes. It may be worthwhile to > leverage the concept in our documentation to explain the parsing > process. "Characters are *grouped* into words according to the > following rules..." or something like that. Interesting idea. It may clarify the Tridekalogue a bit. > My proposal is to not add anything more to the quoting rules; we've > already made enough changes by adding parentheses and beefing up brace > matching. Instead key off of variable naming, which is already a > well-understood concept. Let me show you. [...] > This happens because Tcl doesn't know that the programmer is trying to > name (as opposed to substitute) a variable, so it doesn't apply any > special grouping rules. (See, that term comes in handy!) Indeed ;-) > My proposal is to enable the programmer to tell the interpreter that he > or she is, in fact, trying to name a variable. I suggest the notation > of a word starting with a commercial-at sign (@), since there are very > few places where @ currently is the first (but not only) character of a > word, and because it only makes sense to name (not substitute) a > variable if the whole word is the variable name. Moreover, as Alexandre pointed out, this can be used as a compiler hint. > I also suggest that it > be a single character because it will get used a lot. (You're invited > to think of a more appropriate single character.) It will be used a lot only if it's always required when dealing with variables. But I think the $ prefix also helps in limiting the number of rules and special chars. Moreover, I have some more ideas about extending the $-syntax (*hint* expr *hint*) so I don't want to multiply syntactic rules. 
If we require distinct prefixes for these two rules then this means that other rules will require their own prefix as well. > Upon seeing this character at the start of the word, the interpreter > turns on all the same grouping rules that it uses for variable > substitution, except that it is an error to have stray characters after > the end of the variable name, same as with quoted words. And the parse > tree of the output word is that of a variable name, not a concatenation > of strings. This latter fact is what gives us the security and > performance benefits of brace-quoted [expr] expressions and SQLite > queries. Ok so this would justify the use of a new quoting rule. But graceful degradation from reference to value would work as well (i.e. the distinction between variable *substitution* and variable *reference*). > I anticipate that the hardest thing to accept about my proposal is that > it insists on using this notation and refuses to directly accept > unvarnished strings as variable names. That's the feeling I have from the discussion on TCL-CORE. People (including myself) don't want to give up variable names in a variety of situations because the concepts of variable names and references don't overlap totally. > I propose for [set var value] to > be an error: "expected variable name; got string" or something like > that. But everything is a string! We can't have that! Oh no! > Variable references already run afoul of EIAS, so this'll happen anyway. No, because referenced values have a string representation. > Plus it is an error (I think) to attempt to take the string (or numeric > or list or whatever) representation of a word produced by {null}. I say > this to show you that this kind of error is not so out-of-place in > Cloverfield. This is partly true. The string rep of a null value cannot be rendered, and any attempt to do so generates an error. This means that a null cannot be converted to any other type, and it only compares positively to itself. 
However, a null can be serialized in the context of its container. For example, a list containing a single null element has the following string rep:

    % set l ({null}{})
    {null}{}

This means that interpreting the above as a list gives a list with one single null element. However:

    % set v {null}{}; puts $v
    can't convert null value to string

This is exactly the same with references: the {ref id} modifier is only output in the context of a container, and can only be specified in the context of a command. I don't think that it violates EIAS: while there is a distinction between data and metadata, the latter is serialized in-band in the string rep.

> Here's a case in Tcl where a programmatically generated variable name is
> obtained by accident:
>
> % set var 1
> # somewhere later in the code...
> % set $var 2
> # now the programmer is confused as to why $var is still 1
>
> Here's code in Cloverfield (including my proposal) that behaves the
> same way:
>
> % set @var 1
> % set @$var 2
>
> As you can see, it's not really any harder to get a programmatically
> generated variable name, but the syntax for it is visually distinct from
> both ordinary variable naming and variable substitution, on account of
> it starting with a two-character sequence. Plus, syntax highlighters
> are likely to put @var and $var in different colors, let's say green and
> red. @$var would show up as a green @ followed by a red $var, making
> this error very easy to spot.

OTOH, you almost guarantee that the community will reject this new syntax :-( This wouldn't be a problem if we designed a whole new language from scratch, but the goal is still to preserve most of the existing L&F, and get most of the community to use it without breaking most habits and idioms (that's a lot of mosts).

> Plus think of all the other places where people have trouble remembering
> when a command wants the name of a variable versus its value. 
This problem will be gone anyway, as references and values are interchangeable (*). OTOH your proposal only decorates the var names while preserving this dichotomy. (*) This is the key to understanding the concept of references that I have in mind. > With this > proposal, everywhere a name is expected but a value is passed, an error > can be raised, immediately showing the programmer where the problem is. > Again, indirection is still possible; it just takes proper syntax that > probably won't be typed by accident. You're just shifting the burden from the command to the parser. I fail to see the added value other than catching errors earlier (unless $ and @ are equally interchangeable under your proposal as well). > The remaining problem is getting the name of a variable as a string. > This can be done with an [info]-style command. I suggest that the > command also be capable of building up names from parts and decomposing > them back into constituent parts. Here, a variable's parts are its base > name, namespace qualifiers, vectored indexes, and keyed indexes. As for > [upvar]-like functionality, possibly a notation can be invented for > references into parent stack frames, most likely as a special kind of > namespace. IMHO, [upvar] is the kind of feature that needs to be preserved as is. It belongs to the declarative class of commands that takes variable names by nature. >> Back to the original problem. We wanted to be able to pass variables >> by name or by reference. Following the discussion on Tcl-Core, > > Actually I have not been following the Tcl-Core discussion. I have all > the mails that have been cross-posted back to Tcl9, but I don't know if > every message of the entire thread was cross-posted. Can you check for > me? I'm not on Tcl-Core, and I have never succeeded in viewing the > Tcl-Core archives online. (Freakin' SourceForge...) Once I am sure I > have the whole discussion, I can read it and reply to it as a whole. Ack'd. 
Most messages were cross-posted, but some were only posted on TCL-CORE. If you ask, I'll check whether they are significant, and in this case I will forward them to you. You should take some time to read the discussion, as the feedback I got from the community members there is very valuable. I can provide you with a summary if you want. >> I think that the distinction between names, values, strong and weak >> refs is interesting and would work quite well once the concept is >> understood. Pass-by-name semantics could be achieved by weak >> references using $@ for commands accepting plain values. > > I take it that the $@ notation arose from discussion on Tcl-Core. That's a proposal I made by mixing the existing syntax for strong references with your proposal of using @ for weak ones. > If a name is a weak reference, it must be possible to name variables in > the caller's stack frame. A special syntax for upvar? I'm not sure it would be very useful (just use upvar). >> The other side of the problem was brace matching in rule [5]. I came >> to the conclusion that the rules I propose are too complicated, as >> they duplicate some, but not all, of the other rules of the >> Tridekalogue, to identify the places where special chars are >> significant. > > That's true; there's a lot to it. But it does make it less surprising > to use. I don't think it's too much, since I was able to implement it. > >> If I remember correctly, your parser implemented rule [5] recursively, >> as if braced strings were properly formatted code. > > No, that's not how I did it. The only recursion in my parser is for > command substitution. I planned for variable substitution to also be > recursive, but I haven't got around to that part just yet. [...] > I suggest you study in detail how my parser works. It's heavily > commented, but it's still unfortunately quite difficult to understand. > You might need to draw a few diagrams to walk through some test cases. 
>
> Basically I have a second, inner state machine for braces. The primary
> state machine is driven by the following variables: [...]

Ok, I got it now: the parser is not implemented recursively; it uses linear parsing with a state machine. However, (some of) the parsing rules apply recursively. Am I correct?

>> While it was slightly incorrect given the rules at this time, I think
>> you had a good intuition there.
>
> Thanks, I guess. :^)
>
>> So the current rule [5] that allows arbitrary sequences of chars
>> between braces, with a set of exceptions to activate brace matching,
>> is a dead end.
>
> I disagree. You say it's a dead end because it's complicated and
> doesn't exactly match the rest of Cloverfield. Well, it is the way it
> is because you only kept the parts of Cloverfield that were relevant.
> That would be comments, double quotes, and backslashes. And in order to
> detect comments and double quotes, word starts must also be tracked. It
> took surprisingly little code for me to handle this.

Indeed. But this implies that most (all?) of the rules must apply within braces, even if substitutions aren't performed. I'm just trying to find a way to express rule [5] so that:

- it's short
- it's easy to understand
- it doesn't duplicate the other rules
- it isn't overly selective about the rules that may or may not apply
- it accurately reflects the way the parser works
- it makes writing code easy and WTF-free

For now I came to the conclusion that the only way all these conditions are met is when the rules apply recursively, i.e. braced expressions are whole scripts:

>> Rather, rule [5] should require braced expressions to be properly
>> formatted Tcl scripts, following rule [1] in a recursive manner.
>
> Does that mean that {[} will be illegal? Or what about {{xxx}x} or
> {"xxx"x}?

{[} will be illegal, as the bracketed expression is unclosed. This seems problematic at first, but OTOH we already crossed the Rubicon with double quotes.
However {{xxx}x} will be legal, as the interpretation of the braced expression is delayed. So it will be parsed as a word modifier, but the actual validity of this expression won't be checked until the expression is eval'd or converted into a list. Here it will fail with an 'unknown word modifier "xxx"' error.

This implies that arbitrary data cannot be put between braces anymore, but this Tcl idiom is due to the lack of a proper heredoc-like feature, which Cloverfield has. So we lose a bit of tolerance in brace parsing but gain a lot (comments, heredocs...).

Cheers, Fred |
From: Andy G. <unu...@ai...> - 2008-05-31 04:44:49
|
On Thu, 29 May 2008 14:02:03 +0200, Frédéric Bonnet wrote

> Not that the very concept of grouping is worthless; after all this
> concept is implicit even in Tcl.

It is implicit, but it's far behind the scenes. It may be worthwhile to leverage the concept in our documentation to explain the parsing process. "Characters are *grouped* into words according to the following rules..." or something like that.

> But it creates a new significant departure from Tcl that will make it
> even harder for Tclers to accept Cloverfield as a possible successor.

I know I had trouble with it. :^) It took me a very long time to discover that Tcl has anything like it, and it is only for matching of braces inside of braced words. In that case I never before thought in terms of grouping, merely of brace matching. In other words, I didn't look for the underlying concept; I remained at the surface and saw only simple brace matching.

> People already have a hard time grasping concepts such as quoting, let
> alone a new one. And it also opens a new can of worms.

My proposal is to not add anything more to the quoting rules; we've already made enough changes by adding parentheses and beefing up brace matching. Instead key off of variable naming, which is already a well-understood concept. Let me show you. Tcl currently accepts this:

% array set var {" key with spaces " value}
% puts $var( key with spaces )
% set foo key
% set bar with
% set quux spaces
% puts $var( $foo $bar $quux )

The above all works without any surprises. The problem is naming the variable without substituting in its value.
I skirted the issue in the above code by using [array set], but to use [set] it is necessary to do any of the following:

% set {var( key with spaces )} value
% set "var( key with spaces )" value
% set var(\ key\ with\ spaces\ ) value
% set "var( $foo $bar $quux )" value
% set var(\ $foo\ $bar\ $quux\ ) value

This happens because Tcl doesn't know that the programmer is trying to name (as opposed to substitute) a variable, so it doesn't apply any special grouping rules. (See, that term comes in handy!)

My proposal is to enable the programmer to tell the interpreter that he or she is, in fact, trying to name a variable. I suggest the notation of a word starting with a commercial-at sign (@), since there are very few places where @ currently is the first (but not only) character of a word, and because it only makes sense to name (not substitute) a variable if the whole word is the variable name. I also suggest that it be a single character because it will get used a lot. (You're invited to think of a more appropriate single character.)

Upon seeing this character at the start of the word, the interpreter turns on all the same grouping rules that it uses for variable substitution, except that it is an error to have stray characters after the end of the variable name, same as with quoted words. And the parse tree of the output word is that of a variable name, not a concatenation of strings. This latter fact is what gives us the security and performance benefits of brace-quoted [expr] expressions and SQLite queries.

% set @var( key with spaces ) value
% set @var( $foo $bar $quux ) value

You're right; it behaves like yet another quoting rule, since it's a whole-word affair. But I don't think that it's worthwhile to describe it as such, since it's really just the variable reference rule already in Cloverfield. Plus observe the massive overlap between quoting and grouping, meaning that it's not always worthwhile to classify any given language rule as one or the other.
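To make the grouping concrete, here is a rough sketch (in Python, purely for illustration) of how such an @-prefixed name word might be decomposed into a base name plus keyed indexes. The splitting rules below are one possible reading of the proposal, not a spec; variable substitution inside indexes (e.g. $foo) would happen at a later stage and is not handled here.

```python
def parse_name_word(word):
    """Decompose a hypothetical '@base( key )( key2 )' name word into a
    base name and a list of keyed indexes, embedded spaces preserved.
    Raises an error for a plain string, mirroring the proposed
    "expected variable name; got string" behavior."""
    if not word.startswith("@"):
        raise ValueError("expected variable name; got string")
    rest = word[1:]
    open_paren = rest.find("(")
    if open_paren == -1:
        return rest, []                    # plain variable, no indexes
    base, indexes = rest[:open_paren], []
    i = open_paren
    while i < len(rest):
        if rest[i] != "(":
            raise ValueError("stray characters after variable name")
        close = rest.index(")", i)
        indexes.append(rest[i + 1:close])  # keep spaces verbatim
        i = close + 1
    return base, indexes
```

For example, `parse_name_word("@var( key with spaces )")` yields the base name `var` and the single index `" key with spaces "`, with the spaces intact, rather than a concatenation of strings.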
I anticipate that the hardest thing to accept about my proposal is that it insists on using this notation and refuses to directly accept unvarnished strings as variable names. I propose for [set var value] to be an error: "expected variable name; got string" or something like that.

But everything is a string! We can't have that! Oh no!

Variable references already run afoul of EIAS, so this'll happen anyway. Plus it is an error (I think) to attempt to take the string (or numeric or list or whatever) representation of a word produced by {null}. I say this to show you that this kind of error is not so out-of-place in Cloverfield.

I should explain why I think this last part of my proposal is desirable. It encourages safe, consistent programming. Imagine if [expr] error'ed if passed an unbraced argument. Better yet, imagine if SQLite did. This would limit the flexibility of the language, as there are sometimes legitimate reasons to have the expression or query be programmatically generated, but it would also make it impossible to write code that is vulnerable to injection attacks! The world would definitely be a better place if SQL injection attacks didn't exist.

Unlike [expr] and SQLite requiring braces, my proposal does not limit the flexibility of the language; it does not prevent programmatically generated variable names. It just makes them slightly harder to have. In practice I imagine this will mean that no one will get them by accident. Here's a case in Tcl where a programmatically generated variable name is obtained by accident:

% set var 1
# somewhere later in the code...
% set $var 2
# now the programmer is confused as to why $var is still 1

Here's code in Cloverfield (including my proposal) that behaves the same way:

% set @var 1
% set @$var 2

As you can see, it's not really any harder to get a programmatically generated variable name, but the syntax for it is visually distinct from both ordinary variable naming and variable substitution, on account of it starting with a two-character sequence. Plus, syntax highlighters are likely to put @var and $var in different colors, let's say green and red. @$var would show up as a green @ followed by a red $var, making this error very easy to spot.

Plus think of all the other places where people have trouble remembering when a command wants the name of a variable versus its value. With this proposal, everywhere a name is expected but a value is passed, an error can be raised, immediately showing the programmer where the problem is. Again, indirection is still possible; it just takes proper syntax that probably won't be typed by accident.

The remaining problem is getting the name of a variable as a string. This can be done with an [info]-style command. I suggest that the command also be capable of building up names from parts and decomposing them back into constituent parts. Here, a variable's parts are its base name, namespace qualifiers, vectored indexes, and keyed indexes. As for [upvar]-like functionality, possibly a notation can be invented for references into parent stack frames, most likely as a special kind of namespace.

> Back to the original problem. We wanted to be able to pass variables
> by name or by reference. Following the discussion on Tcl-Core,

Actually I have not been following the Tcl-Core discussion. I have all the mails that have been cross-posted back to Tcl9, but I don't know if every message of the entire thread was cross-posted. Can you check for me? I'm not on Tcl-Core, and I have never succeeded in viewing the Tcl-Core archives online. (Freakin' SourceForge...)
Once I am sure I have the whole discussion, I can read it and reply to it as a whole.

> I think that the distinction between names, values, strong and weak
> refs is interesting and would work quite well once the concept is
> understood. Pass-by-name semantics could be achieved by weak
> references using $@ for commands accepting plain values.

I take it that the $@ notation arose from discussion on Tcl-Core.

If a name is a weak reference, it must be possible to name variables in the caller's stack frame.

> The other side of the problem was brace matching in rule [5]. I came
> to the conclusion that the rules I propose are too complicated, as
> they duplicate some, but not all, of the other rules of the
> Tridekalogue, to identify the places where special chars are
> significant.

That's true; there's a lot to it. But it does make it less surprising to use. I don't think it's too much, since I was able to implement it.

> If I remember correctly, your parser implemented rule [5] recursively,
> as if braced strings were properly formatted code.

No, that's not how I did it. The only recursion in my parser is for command substitution. I planned for variable substitution to also be recursive, but I haven't got around to that part just yet. I instead decided to return my focus to language design issues, having done enough implementation to give myself a feel for what we were getting ourselves into. :^)

My parser does occasionally reprocess a character after changing states. This is not recursion because a new, temporary set of state variables is not created.

I suggest you study in detail how my parser works. It's heavily commented, but it's still unfortunately quite difficult to understand. You might need to draw a few diagrams to walk through some test cases.

Basically I have a second, inner state machine for braces. The primary state machine is driven by the following variables:

$state ;# Current state.
$quote ;# Word quoting style.
The secondary state machine for braces is driven by the following:

$braces ;# Number of nested brace pairs.
$brquote ;# Inside double quotes inside brace pairs?
$brcomment ;# Inside a comment inside brace pairs?
$brspace ;# Inside whitespace inside brace pairs?

The following two variables are shared between the two:

$backslash ;# Inside a backslash sequence?
$bsseq ;# Characters in the backslash sequence.

$brquote is analogous to $quote, except that only double quotes are supported. $brcomment and $brspace are analogous to $state. It's probably a good idea to combine $brquote, $brcomment, and $brspace into $brstate, since they're mutually exclusive. I might just do that...

$begin doesn't affect the behavior of the state machine; it's only there to determine if $out_word contains a valid (possibly empty string) word or if it's invalid because the command just started. The other variables are either output buffers or lookup tables.

> While it was slightly incorrect given the rules at this time, I think
> you had a good intuition there.

Thanks, I guess. :^)

> So the current rule [5] that allows arbitrary sequences of chars
> between braces, with a set of exceptions to activate brace matching,
> is a dead end.

I disagree. You say it's a dead end because it's complicated and doesn't exactly match the rest of Cloverfield. Well, it is the way it is because you only kept the parts of Cloverfield that were relevant. That would be comments, double quotes, and backslashes. And in order to detect comments and double quotes, word starts must also be tracked. It took surprisingly little code for me to handle this.

> Rather, rule [5] should require braced expressions to be properly
> formatted Tcl scripts, following rule [1] in a recursive manner.

Does that mean that {[} will be illegal? Or what about {{xxx}x} or {"xxx"x}?

--
Andy Goth | <unu...@ai...> | http://andy.junkdrome.org/ |
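The flat brace machine described in the message above can be approximated in a few lines. This Python fragment is an illustration only: it keeps just the depth counter and the quote/backslash flags (named after $braces, $brquote, and $backslash), and omits the comment and word-start tracking the real parser does.

```python
def match_braces(text):
    """Return the index one past the brace matching text[0].

    A deliberately simplified model of the inner state machine: a depth
    counter plus flags, scanning linearly with no recursion.
    """
    assert text[0] == "{"
    braces = 0         # open brace pairs            (cf. $braces)
    brquote = False    # inside double quotes?       (cf. $brquote)
    backslash = False  # previous char was backslash (cf. $backslash)
    for i, ch in enumerate(text):
        if backslash:
            backslash = False      # escaped char loses special meaning
        elif ch == "\\":
            backslash = True
        elif ch == '"':
            brquote = not brquote  # braces inside quotes don't count
        elif not brquote:
            if ch == "{":
                braces += 1
            elif ch == "}":
                braces -= 1
                if braces == 0:
                    return i + 1
    raise ValueError("unmatched brace")
```

The point of the model is that no recursive call is ever made: nested braces, quoted braces, and backslashed braces are all handled by the same forward scan, which is the distinction drawn above between a recursive implementation and recursive *rules*.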
From: Frédéric B. <fb...@en...> - 2008-05-30 14:46:35
|
Alexandre Ferrieux wrote:
> On Thu, May 29, 2008 at 3:02 PM, Andy Goth <unu...@ai...> wrote:
>
>> I daresay that an elegant language will have a harder time being accepted by
>> the masses because it's harder to (or the designers are loath to) shove in
>> large amounts of random functionality for every little thing.
>
> Yes. It should be clear by now that acceptance by the masses is just
> as interesting a criterion for us as is East Coast weather to a Mars
> orbiter :-)

One of the goals of Cloverfield is to minimize the WTF/LOC ratio when compared to Tcl. Of course this won't turn it into an object of desire for Joe Random Coder, but I hope that this will generate enough PR buzz among the educated masses to bring back competent people who were otherwise put off by the peculiarities of the current Tcl (the worst offender being comments), hoping that they'll stay once they discover the inner gems hidden in the Tcl Way.

I plan to make an announcement on major sites such as OSNews, for example once we have an LLVM-based interpreter, even if it's barely functional. |
From: Alexandre F. <ale...@gm...> - 2008-05-29 13:21:03
|
On Thu, May 29, 2008 at 3:02 PM, Andy Goth <unu...@ai...> wrote:
> I daresay that an elegant language will have a harder time being accepted by
> the masses because it's harder to (or the designers are loath to) shove in
> large amounts of random functionality for every little thing.

Yes. It should be clear by now that acceptance by the masses is just as interesting a criterion for us as is East Coast weather to a Mars orbiter :-)

-Alex |
From: Andy G. <unu...@ai...> - 2008-05-29 13:02:26
|
On Thu, 29 May 2008 11:33:10 +0200, Frédéric Bonnet wrote
> http://www.codinghorror.com/blog/archives/001119.html

Nice article, thanks. It does answer my questions. I wasn't aware of this aspect of PHP because when I last used it (many years ago) I had limited myself to a small subset of its *cough* functionality. I pretended it was a simple language, and I used it to do simple things. :^) That's why I recently was able to tell myself with a straight face that I had reimplemented PHP in Tcl:

http://andy.junkdrome.org/growth/data/site.tcl

Look at [emit_template]. It gets used on files like this:

http://andy.junkdrome.org/growth/data/template

This file escapes to Tcl mode by $varsub, [cmdsub], \backsub, and lines beginning with %percent signs.

> "PHP Sucks, But It Doesn't Matter"

Interesting concept. Apparently momentum/"thrust" can make up for sucky design. Pigs do fly, if you have catapults. Or jet packs. (BACON!) This certainly isn't a desirable goal to have, though! But it does show that when languages (and other such systems, e.g. libraries, operating systems, methodologies, development tools) get compared, their design quality doesn't contribute much to their rank. What's important is the momentum the world has bestowed upon them.

The upshot is that no matter how many language comparisons show that Visual Basic is superior to Tcl, you can't conclude that Visual Basic is a better *language* than Tcl, only that it's more widely used by people or projects who directly or indirectly contributed to the comparison(s). I daresay that an elegant language will have a harder time being accepted by the masses because it's harder to (or the designers are loath to) shove in large amounts of random functionality for every little thing. Apparently the average workaday programmer just wants a language with a very large toolbox. It's like preferring to have a million wrenches over having a single adjustable wrench.
I can't understand this preference, but there's a very long list of things on which I can't understand my fellow man, so I'll leave it alone.

--
Andy Goth | <unu...@ai...> | http://andy.junkdrome.org/ |
From: Frédéric B. <fb...@en...> - 2008-05-29 11:56:40
|
Hi again,

After much thinking, I've decided to give up this whole grouping thing. Not that the very concept of grouping is worthless; after all this concept is implicit even in Tcl. But it creates a new significant departure from Tcl that will make it even harder for Tclers to accept Cloverfield as a possible successor. People already have a hard time grasping concepts such as quoting, let alone a new one. And it also opens a new can of worms. Further comments below.

Andy Goth wrote:
> In this email I discuss variable access as well as grouping, since it seems
> to me that your grouping proposal is primarily targeted at making variable
> access work the way we want it to. Making code work the same inside braced
> words as at the top level is a secondary benefit of your proposal, not the
> original impetus. There are other ways to achieve that if it truly is a
> goal rather than a pleasant side effect of something else.

Well, it was rather an attempt at killing two birds with one stone. On one hand, this grouping rule simplified brace matching a lot. On the other hand it improved the consistency of variable substitution vs. variable naming.

But regarding the latter, the show-stopper was the order of substitution. If we want to handle things consistently then the characters between braces wouldn't be substituted by the parser but by the command, e.g. [set]. This means that substitution would be delayed. But this is inconsistent with indexed variable substitution rules. OTOH if substitution is performed on such braced groups, then not only is it inconsistent with quoting rules, but the substitution also has to be word-preserving in order to work as with indexed variables. IOW, we end up with the same problem as with [expr $i+$j] vs. [expr {$i+$j}]. So this proposal didn't solve either of the problems in a satisfying manner.

> In general, the security hole is double evaluation.
> Both of these problems
> (performance and security) are also experienced when not bracing the
> argument to [expr].
>
> Here's another case, using parentheses instead of braces.
>
> % set b "first)(second"
> % set a($b) 0
>
> [set]'s first argument will be "a(first)(second)", which will result in the
> variable lookup mechanism seeing two keyed indexes. The programmer may have
> intended to have one key with embedded backwards parentheses. Heck, the
> programmer might not have even supplied the index name; maybe it's
> somebody's nickname read in from IRC!

I see you reached the same conclusion.

> Safe code:
>
> % set a{\$b} 0; set a(\$b) 0
> % set {a{$b}} 0; set {a($b)} 0
>
> If all variable names containing embedded substitutions must be quoted by
> the programmer in this manner, what is the benefit of your proposal?

So not only does it fail to solve anything, but it actually makes the problem worse. False good idea ;-)

********************************

Back to the original problem. We wanted to be able to pass variables by name or by reference. Following the discussion on Tcl-Core, I think that the distinction between names, values, strong and weak refs is interesting and would work quite well once the concept is understood. Pass-by-name semantics could be achieved by weak references using $@ for commands accepting plain values.

It would still be possible to write commands that expect plain names instead of references or values, for example introspection or declaration commands, but in this case:

- regular quoting rules would apply on the variable name, which is passed as string anyway
- in such use cases, it makes little sense to use indexed variables, so only the variable *name* part would be passed without the *index* part.

The latter implies that there will hardly be any problem passing a variable by name, since these problems arise because of a clash between quoting and variable substitution rules.
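The keyed-index clash in the quoted "first)(second" example is easy to reproduce: any lookup that splits indexes *after* substitution will see two keys where one was intended. A small Python sketch (illustrative only; this is my naive model, not the actual Tcl or Cloverfield parser):

```python
import re

def naive_indexes(varname):
    """Split a post-substitution name like 'a(x)(y)' into a base name
    and its keyed indexes, the way a lookup that parses *after*
    substitution would: every (...) group becomes a separate index."""
    m = re.match(r"([^(]*)((?:\([^()]*\))*)$", varname)
    if m is None:
        raise ValueError("malformed variable name")
    base = m.group(1)
    indexes = re.findall(r"\(([^()]*)\)", m.group(2))
    return base, indexes
```

With an attacker-controlled key `b = "first)(second"`, substituting into `a($b)` produces the string `a(first)(second)`, and the naive split yields two indexes instead of the one the programmer intended, which is exactly the double-evaluation hole described above.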
The only problem that would persist is passing a variable name that looks like an indexed variable, e.g.:

global var{$index}

In this case the whole word would designate the variable name, interpreted as such by the called command, without any index part. Clearly a beginner's problem, not something that we should worry about, because in the case of [global] it makes no sense to declare as global an element and not the whole variable.

Another problem is with [set]. If we want to keep the existing semantics (and I think we should) then passing indexed variables leads to the same problem as passing array elements with strings in their name. So this will require appropriate quoting as well. Again, nothing to worry about with proper documentation and an existing Tcl background.

********************************

The other side of the problem was brace matching in rule [5]. I came to the conclusion that the rules I propose are too complicated, as they duplicate some, but not all, of the other rules of the Tridekalogue, to identify the places where special chars are significant.

If I remember correctly, your parser implemented rule [5] recursively, as if braced strings were properly formatted code. While it was slightly incorrect given the rules at this time, I think you had a good intuition there.

So the current rule [5] that allows arbitrary sequences of chars between braces, with a set of exceptions to activate brace matching, is a dead end. Rather, rule [5] should require braced expressions to be properly formatted Tcl scripts, following rule [1] in a recursive manner. This is a significant departure from Tcl, but I don't think it's a problem as long as we provide alternative quoting rules for arbitrary data (which we do). After all, Tcl only provided two quoting rules using either doubly quoted strings or braced expressions. Handling arbitrary data required a combination of the two, plus quoting hell implying heavy uses of backslashes.
Since Cloverfield provides {data} heredocs, corner cases will be solved much more easily. For example, including plain C code in Critcl cprocs is much easier with heredocs than with plain braced expressions, as the latter require a careful examination of the quoted string in order not to break Tcl's syntactic rules.

********************************

Let me know if you think it solves our problems.

Cheers, Fred |
From: Frédéric B. <fb...@en...> - 2008-05-29 09:27:51
|
[sorry for the delay. Been busy catching up]

Andy Goth wrote:
>> We need to be more pragmatic (but of course not too pragmatic else we'll end
>> up like PHP).
>
> Please explain what it means for PHP to be pragmatic.

I was about to write about the ugliness and inconsistency of array and file system support in PHP, but Coding Horror did it better than I would:

http://www.codinghorror.com/blog/archives/001119.html
"PHP Sucks, But It Doesn't Matter"

PHP is pragmatic in the sense that it is geared towards Web development. And it does it very well. In that context it makes sense to have functions that convert arrays to and from HTML data, or a function that opens and parses file data as CSV. However PHP sucks for general purpose development because of this profusion of calls. OTOH Tcl is a general purpose development language, so we must avoid creeping featurism and remain minimalist and elegant in all circumstances. The profusion of domain-specific calls that is rampant in PHP needs to be properly packaged in Tcl. |
From: Frédéric B. <fb...@en...> - 2008-05-29 09:08:30
|
Alexandre Ferrieux wrote:
>> Rather, I think that we should provide a means to
>> obtain (notice the choice of word) this string representation on demand,
>> for example using C++-style iterators, or using rope structures.
>> Besides, there is a strong duality between iterators and ropes: they are
>> two facets of the same concept, the former on the algorithmic level, the
>> latter on the structural level.
>
> Yeah of course, but caches are the oil that allows many gears to roll smoothly.
> Without them, much more than Tcl would come to a grinding halt!
> The ->bytes field of Tcl_Obj can be seen as just that: a cache of the
> on-demand computation embodied by UpdateStringOf* functions. Then
> there is a delicate trade-off between the relative costs of forward
> string computation and cache consistency enforcement.
>
> Addressing this trade-off correctly is the result of the superior
> insight behind Tcl 8.0 (within the realm of immutable values).
> Addressing it with mutables is another story, but I doubt the
> solution will be "no cache".

That's right. However with ropes the cached values could be decentralized, i.e. each node could hold its own cached value, whereas today's Tcl_Objs only hold a single cached value for the whole object. If for example each element of a list manages its own cached value, then changing one element has no consequence on its siblings or on the whole list.

>> Moreover I think that implementing references or partially mutable
>> values is simply impossible in the existing implementation, given the
>> current semantics of Tcl (COW, pass-by-value, immutability...), and that
>> is the cause of the (relative) failure of your patch.
>
> You're roughly right, but the actual reason is a bit more subtle.
> COW is rather easily handled by proper perturbation of Tcl_IsShared
> (see the code).
> The show-stopper in my case was the interaction between mutables and
> uncontrolled iterators.
> Indeed, while it is easy to notice that
> iterating over a mutable poses a problem, it is impossible to prevent
> an extension defining its own iterators from taking (say) a snapshot of
> the "elements" array of a (secretly mutable) list. It's more a
> question of contract (in a years-old API) between extensions and the
> core than anything else. The API was born and designed in an immutable
> world, and it keeps the stigmata :-(

That's another reason to change the API. Which means that we lose binary compatibility with previous versions, but that's the price to pay, and major version bumps are made for that anyway.

> Yes. Simple refcounting GC gone. Go to mark-and-sweep. Worry about
> GC-triggering situations, and ugly worst cases, etc... You're
> rebuilding the Empire State Building, and you're right to start with
> the foundation, but what's not clear is whether you'll be reusing more
> than a few bricks from Tcl ... but maybe you want only the dwellers,
> not the bricks ;-)

If the cement is the core, and the bricks are the commands, then I'm only changing the cement while trying to keep the bricks. The new cement will glue better and allow us to build higher, stronger and more audacious buildings ;-)
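The decentralized-cache idea above (each rope node caching its own string, so touching one leaf leaves sibling caches intact) can be modeled in a few lines. This Python toy is my own illustration; the class names and structure are assumptions, nothing like the real C data structures under discussion.

```python
class Leaf:
    """Flat string segment; its text is its own representation."""
    def __init__(self, text):
        self.text = text
    def flatten(self):
        return self.text

class Concat:
    """Inner rope node holding its own string cache, in the spirit of
    Tcl_Obj->bytes but decentralized: one cache per node rather than
    one cache for the whole value."""
    def __init__(self, left, right):
        self.left, self.right = left, right
        self._cache = None
    def flatten(self):
        if self._cache is None:   # compute on demand, then keep it
            self._cache = self.left.flatten() + self.right.flatten()
        return self._cache
    def invalidate(self):
        self._cache = None        # drops only this node's cache
```

Replacing one child and invalidating only the ancestors on the path to the root leaves every sibling subtree's cached string valid, which is the locality win claimed above for per-element caches in lists.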