tcl9-cloverfield Mailing List for Tcl9 (Page 2)
Status: Alpha
From: Andy G. <unu...@ai...> - 2009-01-13 05:46:49
fre...@fr... wrote:
> Hi Andy!

It is good to hear from you again. I was worried that you had fallen off the edge of the earth, or (worse), off the edge of the Internet!

> I've had some time to work on my rope thing (as you can read from the
> message I've just posted)

I did read it (the email, not the sources), and I am quite excited. I've been thinking about ropes and whether they're really worth it, and the conclusion I came to is that they can help performance when they are implemented as indexes over concatenations of strings of mixed character width, especially if the strings are mostly contiguous in memory, such as when they were read from a file. But if the ropes are "merely" the container, then they negatively impact locality of reference. Your email agrees with my intuition and backs it up with both implementation and measurements. Very cool!

> I have some idea for the next stage, AKA the Grand Unification Scheme of
> string and object structures.

The ropes I had in mind were actually indexes over arrays of arbitrary objects. Characters are just one popular type of object. ;^)

> I'm also planning a (slight) revision of the syntax proposal

I'd like to see it.

> Glad to know you've made some progress on your side, I'd be happy to
> learn more.

I made substantial changes to the syntax, to the point of it being unfair to call my work "Cloverfield". I haven't come up with another name yet. I warn you: this is all very preliminary, and it's surely not self-consistent. It's just ideas.

One major difference is that I backed off completely from the word modifier idea. I started by dropping null, since I found that out-of-band signaling is simpler and works better for the use cases I had in mind for it. For example, I wanted it for detecting when an optional proc argument isn't used. This can be detected more reliably by checking for the variable's existence. (Of course, this requires a change to the way proc works.)
Otherwise, a proc can't tell the difference between not getting an argument and getting null as its argument. As for interfacing with SQL, just do "select ..., x, x is not null, ..." for the relatively few items that can be null. Or modify SQLite, etc. to optionally unset a variable to signify that it is null. And even that is only needed when the variable can assume absolutely any string value; often data is somehow constrained (numbers only, non-empty strings only, alphanumerics only, etc.), and it's easy to pick a sensible value that is distinct from all possible data.

With null gone, I next looked at metadata. I decided to throw it out as well, since its uses can be handled instead by data structures. For example, I often need multiple pieces of code to track internal "annotations" they place on common data. They most easily do this by using the data as the index into an internal associative array. But with the Cloverfield metadata facility, either only one annotation at a time is allowed, or every module in an application has to agree on a convention for not stomping on every other module's annotations.

I dropped delayed substitution since I can't think of a good use for it.

Argument expansion has proven to be very valuable to Tcl, so of course it stays. But I don't extend it to allow multiple levels of expansion. I have never seen a case where this would help, and it can be implemented in script.

I really want word comments, but I don't want to think of them as "word"-anything. Instead I use the more familiar term "inline comments", and the syntax is #{...}#, where the opening #{ starts wherever a word can. Braces are matched (modulo embedded quoting), and it is an error if a # does not immediately follow the final closing brace. (Note: Vim's syntax highlighter cannot handle nested comments, so I might have to submit a patch.)

I have a very different idea for references, to be explained below. But it is not implemented in terms of word modifier notation.
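The out-of-band signaling the email argues for can be sketched outside Tcl as well. A minimal Python illustration (the row contents are hypothetical, not from the thread): null is signaled by the *absence* of a key, so every string, including the empty one, remains a legal stored value.

```python
def fetch_row():
    """Hypothetical query result: the 'email' column is NULL,
    so the key is simply omitted rather than set to a sentinel."""
    return {"id": 4, "name": "andy"}

row = fetch_row()

# Any string, even "", is a legal in-band value...
assert row.get("name") == "andy"

# ...while NULL is detected by testing existence, much like
# [info exists] on an unset Tcl variable.
assert "email" not in row
```

The design choice mirrors the proposal: no value in the domain is sacrificed, and the "is it null?" question becomes an existence check rather than a comparison.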
I can do without raw data, since brace quoting has always been adequate for my needs. I also don't like picking the end-of-text delimiter string, because I have to be careful to pick different ones when nesting heredocs. If I have truly oddball data that can't be cleanly expressed using the existing quoting rules, I think I'll just put it in an external file. In that case, the end of the data is tracked out-of-band by the filesystem under the rubric of file size.

In my design, word modifiers have been cut back down to size, and the features they would have imparted are moved to other territory. Quoting, however, is just as it is in Cloverfield. (Well, except for raw data.) I have parenthesized words, and I don't count embedded braces that are inside double quotes or comments.

On to references! At the moment (I can change my mind in an instant!), my design actually has two kinds of references. The difference is in how they are dereferenced. One kind, which somewhat corresponds to C pointers, requires explicit dereferencing. The string representation of such a reference is "object 123" or similar. The other kind, which somewhat corresponds to C++ references, is dereferenced automatically. The string representation is the instantaneous value of the referent variable. For now, I name these "pointers" and "references", respectively.

A pointer to an object is constructed by prefixing the object's variable name with an @. "set ptr @foo" returns "object 123", and "puts $ptr" prints "object 123". To get the value of foo given ptr, an @ is written *after* the variable substitution: "puts $ptr@". Why @? I would have used * if it wasn't already part of [expr] syntax. Why after? Because dereferencing can be used in combination with vector/key indexing, and I want to avoid needing parentheses. It is legal to obtain a pointer to a nonexistent variable, but such a pointer can't be used to get the variable's value, not until the variable is initialized.
This corresponds to ordinary, unadorned names in Tcl. Any word is a potential variable name, and it's perfectly legal to write words that are not (yet) the names of existent variables. Just don't use them to take their value, not until they have been initialized.

A reference to an object is constructed by prefixing the object's variable name with an &. "set foo bar; set ref &foo" returns "bar", and "set foo quux; puts $ref" prints "quux". For now I think I will allow reseating a reference by assigning a variable to a new reference value. I won't allow converting a reference to a non-reference, not without unsetting and recreating the variable. But I need to experiment before I can be sure this is the right thing to do. Again, it is legal to obtain a reference to a nonexistent variable; just don't take its value.

Whereas multiple $variable substitutions can be concatenated within a single word, @pointers and &references can only be the entire word. Neither @ nor & syntax is recognized inside [expr].

Why two different kinds of references? Well, Cloverfield has two kinds of references. ;^) But seriously, the first kind ("pointers") is for building potentially circular data structures, and the second kind ("references") is for everyday use. And I really mean "everyday use". Let me explain.

For indexing to be safe, reliable, and efficient, the parser needs to know when a word is a name. When the programmer helpfully prefixes the word with $, this tells the parser that the word is a name, so it can correctly apply name quoting and indexing rules and will not eagerly perform nested substitutions (e.g. inside keyed indexing) but will defer them to the variable lookup code, where they belong. It's the same deal as brace-quoting with [expr] or SQLite. Okay, so how does the programmer tell the parser that a word is a name without asking that it be dereferenced? Start the word with an & instead of a $.
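The two proposed kinds can be modeled to make the distinction concrete. This Python sketch is purely illustrative of the semantics described above; the class and method names (Store, Pointer, Reference, deref) are mine, not part of the proposal.

```python
class Store:
    """A toy variable store standing in for the interpreter's object store."""
    def __init__(self):
        self.vars = {}

class Pointer:
    """Like '@foo': the string rep names the object; dereferencing is explicit."""
    def __init__(self, store, name):
        self.store, self.name = store, name
    def __str__(self):
        return f"object {self.name}"          # stable identity string
    def deref(self):
        # Models the trailing '@' in '$ptr@'.
        return self.store.vars[self.name]

class Reference:
    """Like '&foo': the string rep IS the referent's current value."""
    def __init__(self, store, name):
        self.store, self.name = store, name
    def __str__(self):
        return str(self.store.vars[self.name])  # automatic dereference

s = Store()
s.vars["foo"] = "bar"
ptr, ref = Pointer(s, "foo"), Reference(s, "foo")
assert str(ptr) == "object foo"   # pointer's string rep is the identity
assert ptr.deref() == "bar"       # explicit dereference
s.vars["foo"] = "quux"
assert str(ref) == "quux"         # tracks the referent, as in the email
```

The key behavioral difference the sketch captures: a pointer's string representation is stable identity, while a reference's string representation changes whenever the referent does.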
Moreover, in my language it is illegal for the first argument to [set] to be a bare, literal word; the name must be a reference! This is similar to the requirement in PHP that all variable names start with $, but it is superior (I think) in that the syntax clearly indicates whether the code is dealing with a variable's value ($) or the variable itself (&). (Yeah, all my code examples so far ignored this particular design decision.)

I should point out one major difference between my references and C++'s references. In C++, reference creation is transparent to the caller of a function that takes an argument by reference. This can catch the programmer by surprise. In my language, the caller has to explicitly create the reference. In both languages, the dereferencing is transparent. I note that with pointers in C, C++, and my language, both pointer construction and dereferencing are explicit, so there is no confusion. Fine point: in my language, "dereferencing" a pointer actually results in a reference to the value, not the literal value; this makes it possible to change a variable's value given a pointer to the variable.

I'm also thinking of supporting a $& form which can be thought of as creating an anonymous reference and immediately dereferencing it. It's only useful in combination with vectored or keyed indexing, but I think this will still be important. Examples: "$&(foo bar){0}" will be substituted with "foo". "proc &whatever () {return (foo bar)}; puts $&[whatever]{0}" will print "foo". I think this avoids the need for [dict get], [lindex], etc. on data that is not stored in a variable.

Miscellaneous:

- \e gives ASCII 27, just like with gcc.

- $(...) is like [expr {...}]. Empty string is not a valid variable name, and I don't consider parenthesized quoting to be useful in generating variable names.

- Vectored index range endpoints are separated by : instead of .., to match Python. Also, an optional nonzero third value (preceded by :) gives the stride.
Valid uses for references and pointers to vectored index ranges may be restricted.

- Numbers can be expressed in sexagesimal notation: 12'34'56.78. ' is used instead of : to avoid conflict with ?:. I do a lot of GIS stuff at work, so this would be directly useful to me. It can be used for time as well as geography.

- Variables, procs, commands, channels, etc. are all in a single object store, with string pseudo-values "object 123", "command 123", "channel 123", etc., kind of like in Python. Variables and procs have proper string representations. [unset] is used to destroy any reference to an object, and the object is garbage-collected when no references (e.g. variables) remain. Objects can be local to a stack frame.

- A proc's string representation is its lambda expression. The first word of a command line is treated as a variable name or a namespace name. If it is a namespace name, the next word is the variable name, etc., to help facilitate ensembles. The variable is automatically dereferenced to get a lambda or "command 123" identifier, which is invoked. Some non-"command" objects are invokable, such as channels.

- As much of the parsing, analysis, bytecode compilation, and execution machinery as possible is exposed at the script level as a standard package.

- The parser is a collection of C routines designed to be easy to incorporate into other projects, as a library.

- An object can have multiple representations, not just two. Non-string conversions may avoid an intermediate string form. This is optimized for the common dual-ported case: a hash table is used for tracking multiple representations, stale representations are removed from the hash table, and the hash table is only created when multiple non-string representations are valid. Example: taking the list representation of a dict.

- Word origins are tracked and accessed with [info origin]. This facilitates syntax error messages.
- Variadic and default arguments to proc can appear anywhere in the argument list, and the "args" argument can have any name; just append * to denote its catchall status, like in Python. Append ? to the name to make its default "value" be to leave the variable unset. Use a two-element list to supply a real default value, like in Tcl.

And that's what I have so far...

--
Andy Goth | http://andy.junkdrome.org/
unununium@{aircanopy.net,openverse.com}
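The sexagesimal notation proposed in the message above (12'34'56.78) is easy to pin down with a small parser. A sketch in Python, assuming each '-separated group is a base-60 digit, most significant first (the function name is mine):

```python
def parse_sexagesimal(s):
    """Parse the proposed 12'34'56.78 notation into a decimal value.
    Each '-separated group is a base-60 digit, most significant first;
    only the last group may carry a fractional part."""
    value = 0.0
    for part in s.split("'"):
        value = value * 60 + float(part)
    return value

# 12 hours/degrees, 34 minutes, 56.78 seconds:
assert parse_sexagesimal("12'34'56.78") == 12 * 3600 + 34 * 60 + 56.78
assert parse_sexagesimal("1'30") == 90.0
```

This covers both use cases mentioned (time and geographic coordinates), since hours:minutes:seconds and degrees:minutes:seconds share the same base-60 structure.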
From: <fre...@fr...> - 2009-01-12 15:39:29
Hi all,

First, happy new year!

Following the discussions we've had a while back about string representations, Unicode, Tcl9, Cloverfield and the like, I've been working during the past weeks on a rope package. You can find it here on the Tcl9 project on SourceForge:

http://sourceforge.net/project/showfiles.php?group_id=216988

The implementation is a materialization of several ideas I've developed over the years, with some borrowed from the seminal paper on Cedar Ropes by Hans Boehm et al., available here:

http://www.cs.ubc.ca/local/reading/proceedings/spe91-95/spe/vol25/issue12/spe986.pdf

Ropes are string structures where data is not stored in flat NUL-terminated arrays but in self-balancing binary trees, allowing for fast insertion/removal of arbitrarily long strings. The package is built around a dedicated memory allocator based on fixed-size cells, coupled with a generational, exact, mark-and-sweep garbage collector.

Data structures:
================

Ropes are made of chunks of Unicode string data that can use either fixed-width formats (native C strings; UCS 1, 2 or 4 bytes) or UTF-8. Such string data chunks can take a variable number of cells; if the provided data is larger than the maximum size (here 63 cells, i.e. 1008 bytes minus the 4- or 8-byte header), it is split into several chunks and assembled transparently. Ropes can be made of chunks having different representations, thus allowing maximum compactness when mixing e.g. typically 7-bit Tcl code and foreign-language data. Moreover, native NUL-terminated C strings are recognized as valid ropes from C code.

Basic string chunks are assembled by concatenation to form larger ropes. Small strings are transparently merged into flat leaves, whereas larger ones use a self-balancing binary tree made of concat nodes. Substrings can also be extracted, and form either flat leaves for the smallest ones, or use a substr node for larger ones.
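The concat-node idea described above can be shown in a few lines. This is a minimal sketch of the data structure only (leaf strings joined by internal nodes carrying lengths), not the package's actual C implementation, and it omits balancing and substr nodes.

```python
class Leaf:
    """A flat chunk of string data."""
    def __init__(self, s):
        self.s, self.length = s, len(s)
    def index(self, i):
        return self.s[i]

class Concat:
    """An internal node joining two ropes; stores only lengths, never copies."""
    def __init__(self, left, right):
        self.left, self.right = left, right
        self.length = left.length + right.length
    def index(self, i):
        # Descend left or right using the stored lengths: O(depth),
        # i.e. O(log n) when the tree is kept balanced.
        if i < self.left.length:
            return self.left.index(i)
        return self.right.index(i - self.left.length)

rope = Concat(Concat(Leaf("Hello, "), Leaf("rope ")), Leaf("world"))
assert rope.length == 17
assert "".join(rope.index(i) for i in range(rope.length)) == "Hello, rope world"
```

Concatenation is O(1) regardless of string size, which is the property that makes ropes attractive for building large strings incrementally.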
The combination of both techniques allows for easy handling of arbitrarily large strings with minimal duplication and maximal sharing of raw string data. Apart from strings, all nodes are designed to fit into one single cell, thus providing maximum allocation performance and minimal memory impact. Indexing is O(1) for flat fixed-width ropes, O(n) for UTF-8 (as with Tcl), and O(log n) in general for complex ropes. The fact that basic string chunk size is limited also means that very large UTF-8 strings perform better than flat UTF-8 strings (such as those used by Tcl) thanks to the intermediary indexing levels.

Memory management:
==================

The dedicated memory allocator is based on fixed-size cells (16 bytes on 32-bit architectures) within page-aligned memory. Each page contains a bitmask that indicates the cell status. For 1024-byte pages, this gives 64 16-byte cells, of which one is reserved. This results in a very small overhead (2 bits per cell) and very good memory locality, both improving performance dramatically (more on that below) on modern architectures compared to a traditional allocator (which typically uses linked-list structures). This choice has been made because, with the exception of pure string data, rope nodes are typically small and easily fit within a 16-byte cell.

Moreover, this allocator is coupled with a generational, exact (as opposed to conservative), mark-and-sweep garbage collector that again provides a huge speedup compared to manual free() calls. The only needed action is root declaration for all ropes that are externally referenced, using a simple reference counting scheme. The GC process is fully controllable in the sense that sections of code can be protected by pausing and resuming the automatic collection. Generational GC means that older ropes (having survived several GC cycles) are promoted to older pools that are collected less frequently; this limits the CPU impact of collections.
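The page arithmetic above, plus a bitmask scan for free cells, can be sketched as follows. This is a back-of-the-envelope model using the figures from the email (1024-byte pages, 16-byte cells), not the package's code; the function and constant names are mine.

```python
PAGE_SIZE, CELL_SIZE = 1024, 16
CELLS_PER_PAGE = PAGE_SIZE // CELL_SIZE        # 64 cells per page
BITMASK_BYTES = CELLS_PER_PAGE // 8            # 8 bytes of status bits

# One cell is reserved for the page header and bitmask, leaving 63 usable,
# which matches the 63-cell maximum chunk size quoted in the email.
USABLE_CELLS = CELLS_PER_PAGE - 1

def alloc(bitmask, ncells):
    """Find ncells consecutive free cells in the page bitmask.
    Returns (start_index, new_bitmask), or (-1, bitmask) if the page is full."""
    run = 0
    for i in range(1, CELLS_PER_PAGE):         # cell 0 is reserved
        run = run + 1 if not (bitmask >> i) & 1 else 0
        if run == ncells:
            start = i - ncells + 1
            for j in range(start, start + ncells):
                bitmask |= 1 << j              # mark cells as used
            return start, bitmask
    return -1, bitmask

mask = 1                                       # only the reserved cell is used
start, mask = alloc(mask, 4)                   # a 4-cell (64-byte) node
assert (start, USABLE_CELLS, CELLS_PER_PAGE) == (1, 63, 64)
```

Because the status bits live in the page itself, freeing is just clearing bits, and locality stays good: nodes allocated together sit in the same page.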
Other features:
===============

The package provides iterator structures and procedures, so porting existing code should be easy. I'm thinking especially about the regexp engine, which needs flat Unicode strings. Iterators abstract the whole structure into a random-access model. Direct traversal of the string is O(1).

A custom rope type is available for extensions. Typical uses would be, for example, memory-mapping of potentially very large datasets, even on memory-constrained systems (e.g. mobile platforms). Another use case would be programmatically generated data. Or it could be used to wrap malloc'd strings into ropes.

Connections with Tcl:
=====================

Tcl currently uses UTF-8 flat strings as its string representation. There have been some discussions about the format that future versions should use. One such proposal was augmented strings, where flat strings would be supplemented by additional information (indexing of UTF-8 strings, for instance). Another concern was the memory consumption needed for the support of UCS-4 (32-bit chars). I think ropes provide a good solution to all these problems.

Ropes are immutable strings, which fit the Tcl model perfectly. One can mix several formats transparently, making byte arrays unnecessary and removing a prominent cause of shimmering. And as complex types such as lists build their string representations from those of their elements, maximum reuse of existing strings is ensured, which limits the memory impact even further. Last, the impact on client code is minimal thanks to the automatic garbage collector and the backward compatibility with C strings. For these reasons I think that ropes would be a good choice as a native string representation for future versions of Tcl.

Performance:
============

Of course there are lies, damn lies and benchmarks, but the first performance tests I've run on my system are very satisfactory.
On a Core 2 Duo P8400 2.26GHz, WinXP SP3, with 2GB RAM, I get the following results (from test.c), figures in ms:

---------------------------------------------------------------------
testAlloc: ropes vs. malloc raw allocation performance

Ropes:  40000000 12-byte ropes      =  480000000 data bytes
        ... 1859 create + 610 GC                =  2469
malloc: 40000000 12-byte C strings  =  480000000 data bytes
        ... 6718 malloc + memcpy + 7922 free    = 14640

Ropes:  20000000 28-byte ropes      =  560000000 data bytes
        ... 1453 create + 625 GC                =  2078
malloc: 20000000 28-byte C strings  =  560000000 data bytes
        ... 3813 malloc + memcpy + 3828 free    =  7641

Ropes:  15000000 44-byte ropes      =  660000000 data bytes
        ... 1375 create + 703 GC                =  2078
malloc: 15000000 44-byte C strings  =  660000000 data bytes
        ... 3016 malloc + memcpy + 2953 free    =  5969

Ropes:  1000000 1000-byte ropes     = 1000000000 data bytes
        ... 1062 create + 1000 GC               =  2062
malloc: 1000000 1000-byte C strings = 1000000000 data bytes
        ... 1047 malloc + memcpy + 391 free     =  1438
---------------------------------------------------------------------

This shows that the allocator+GC usually performs faster than malloc+free, mostly because of the GC's performance compared to free(). The malloc version outperforms the ropes only in the case of a large number of large strings. So this benchmark shows the real benefits of automatic memory management even in the simplest cases; in the general real-world case, where a lot of small structures are allocated and freed during the lifetime of the application, the malloc version would perform closer to the worst case above, and maybe worse because of memory fragmentation.

The following test is closer to a real-world application: it runs several (10,000) cycles during which 80 large strings are allocated and preserved.
---------------------------------------------------------------------
testGeneration:

With all ropes preserved:
    10000 x 80 988-byte ropes + roots = 790400000 data bytes : 16953
With no more than 10000 ropes preserved:
    10000 x 80 988-byte ropes + roots = 790400000 data bytes :  4406
---------------------------------------------------------------------

This shows the generational properties of the GC: older ropes are promoted to older pools and thus traversed less often by the collector. A real-world application during its lifetime would typically store a fairly constant number of stable objects (global or static data, business models) and allocate a larger number of short-lived objects (input, output and temporary values). Generational GC ensures that the latter get collected more often than the former, and lets stable objects percolate to deeper layers.

Conclusion:
===========

The package needs some polish (test suite, docs, etc.) but I think the code is fairly usable in its current state. At present it only works on 32-bit architectures, but a port to 64-bit should be straightforward. Moreover, as the custom allocator needs page-aligned memory, I've only implemented a Win32 version based on VirtualAlloc(). POSIX systems would need posix_memalign() or something more suitable to get the same result (it uses a 1024-byte boundary on 32-bit systems).

I plan to design a similar package but for objects, especially lists since they are very similar to strings, and to integrate both closely.

Comments welcome!
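The generational promotion that the testGeneration figures demonstrate can be modeled in miniature. A toy sketch (mine, not the package's): survivors of a young-pool collection are promoted to an older pool that the collector scans less frequently.

```python
# Two pools: the young pool is collected often, the old pool rarely.
young, old = [], []

def minor_collect(roots):
    """Minor collection: sweep unreachable young objects and promote
    the survivors to the old pool (scanned less frequently)."""
    global young, old
    survivors = [obj for obj in young if obj in roots]
    old.extend(survivors)      # promotion: survived a GC cycle
    young = []                 # everything else is swept

young = ["a", "b", "c"]        # three freshly allocated objects
minor_collect(roots={"a", "c"})   # "b" is unreachable and is swept

assert old == ["a", "c"] and young == []
```

Stable, long-lived data thus "percolates" out of the frequently scanned pool, which is exactly why the second benchmark run above is so much cheaper than the first.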
From: Andy G. <unu...@ai...> - 2009-01-12 13:36:06
fre...@fr... wrote:
> Happy new year to everyone!

Heya. I've been wondering where you were. I have been doing some language design work, but in many ways it's in a different direction than your design proposes. I hadn't heard from you in so long, I figured my only choice was to do my own project. Want to see what I've been up to?

--
Andy Goth | http://andy.junkdrome.org/
unununium@{aircanopy.net,openverse.com}
From: <fre...@fr...> - 2009-01-12 13:16:38
Happy new year to everyone!
From: Andy G. <unu...@ai...> - 2008-07-02 02:56:52
Right now I'm in Oklahoma for a week. Earlier this month I was in Bagotville, Quebec, for a week or so. And in another week I'm going back to Bagotville for another three weeks. B U S Y!!!

--
Andy Goth | <unu...@ai...> | http://andy.junkdrome.org/
From: Neil M. <ne...@Cs...> - 2008-06-05 15:33:28
Mark Janssen wrote:
[...]
> What do you mean: "there is no special value", surely there is. If null
> becomes a special value in Tcl you will always need to check for it when
> calling out to library code (it might return 'null')
>
>     set a [libraryCall 4]
>     if {[null? $a]} {
>         # oops
>     }
>
> Instead when you need it you might just as well do:
>
>     proc createNull {} { return [list true {}] }
>     proc createValue {val} { return [list false $val] }
>
>     lassign [createNull] isnull val
>     if {$isnull} {
>         puts null
>     } else {
>         puts "not null: $val"
>     }

This is essentially the approach taken in the various functional programming languages I mentioned. For instance, in Haskell, there is the Maybe datatype:

    data Maybe a = Nothing | Just a

which we can have in Tcl much like you show:

    proc Nothing {} { list Nothing }
    proc Just a { list Just $a }

    switch [lindex $val 0] {
        Nothing { Nothing }
        Just    { Just [f [lindex $val 1]] }
    }

You can even extract this boilerplate switch into a separate function for chaining together sequences of nullable functions (much like a shell pipeline):

----
proc do {val args} {
    foreach {| f} $args {
        switch [lindex $val 0] {
            Nothing { break }
            Just    { set val [invoke $f [lindex $val 1]] }
            default { error "invalid value \"$val\"" }
        }
    }
    return $val
}
proc invoke {f args} { uplevel #0 $f $args }

# Wrappers - can be adapted
proc yield val { Just $val }
proc null {} { Nothing }
----

And a quick demo using this functionality:

    # A function that might return "null"
    # div :: Num -> Num -> Maybe Num
    proc div {a b} {
        if {$b == 0} { null } else { yield [expr {$a/double($b)}] }
    }

    # Now use it:
    # recip :: Num -> Maybe Num
    proc recip val { div 1 $val }

    # Debug:
    proc debug str { puts $str; yield $str }

    proc funny val { do [recip $val] -> debug -> {div 2} -> debug -> yield }

    % funny 12
    0.08333333333333333
    24.0
    Just 24.0
    % funny 0
    Nothing

(This is the Maybe monad from Haskell.)
This gives you precisely an application-defined definition of Null and lets you encapsulate all the logic for dealing with it in one place (the definitions of do/yield/null). This is still much simpler than null (both in terms of implementation and semantics) and much more elegant. And you only need to use this in the tiny portions of your application that need to use nulls and can't use dicts or some other simpler approach.

-- Neil

This message has been checked for viruses but the contents of an attachment may still contain software viruses, which could damage your computer system: you are advised to perform your own checks. Email communications with the University of Nottingham may be monitored as permitted by UK legislation.
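For readers more comfortable outside Tcl, the do/yield pipeline in Neil's message can be transcribed into Python. This is a hand-rolled sketch of the same Maybe-chaining idea, not a library API; all names are mine.

```python
# A value is either Nothing or ("Just", x), mirroring the tagged lists above.
Nothing = ("Nothing",)
def Just(x):
    return ("Just", x)

def do(val, *fs):
    """Chain Maybe-returning functions, short-circuiting on Nothing,
    like the Tcl [do] proc (or Haskell's Maybe monad bind)."""
    for f in fs:
        if val == Nothing:
            break
        val = f(val[1])
    return val

def div(a, b):
    # A function that might return "null".
    return Nothing if b == 0 else Just(a / b)

def recip(x):
    return div(1, x)

assert do(Just(4), recip, lambda v: div(2, v)) == Just(8.0)  # 2 / (1/4)
assert do(Just(0), recip, lambda v: div(2, v)) == Nothing    # short-circuits
```

As in the Tcl version, the null-handling logic lives in exactly one place (`do`), and callers never see a sentinel value leak into their domain.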
From: Neil M. <ne...@Cs...> - 2008-06-05 14:33:26
Frédéric Bonnet wrote:
> Neil Madden wrote:
>> Just to clarify, are we talking about a null value (used e.g. to
>> represent missing or unknown information in SQL) or a null reference?
>
> Null values. To be pedantic, I could say neither, as null is a non-value ;-)
>
>> From the context, I'd say the primary concern here is something like a
>> null/nil value in Tcl, rather than null references. What is presumably
>> actually wanted here is a way of representing missing/unknown
>> information in some "direct" manner. This is a semantic issue, of which
>> a distinguished nil (non-)value is only one possible solution. This is
>> not a good solution, either theoretically or practically. From a
>> theoretical point of view, introducing a special value just doesn't seem
>> well motivated. It complicates the semantics of the language, for
>> actually no real benefit. The semantics of how to handle nulls in
>> various operations are quite complicated, and often contradictory. I
>> don't know of a single language with null that doesn't have problems
>> with it. There are plenty of other ways to encode
>> missing/incomplete/unknown information within the existing language
>> elements, as have been discussed several times before, and they are not
>> just sufficient but also much better solutions.
>
> Ack'd. But I'm contemplating the problem not from a mathematically
> correct point of view, but from a pragmatic point of view (not that
> mathematical correctness isn't important, it's just off-topic). If your
> application doesn't need this concept of nothingness or out-of-domain,
> then don't use it. However there are many cases in real-world
> applications when such a concept comes in handy.

I quite clearly said that the problems were both theoretical *and* practical. I also was careful not to say that missing/unknown values aren't important -- there is a huge amount of literature and practical experience on how to handle them. Nulls are the worst approach I know of.
> To me, the exact semantics of this non-value is application-specific, so
> this should not be a matter of rejection.

How can you specify a language feature if you do not know what its semantics should be? There has to be *some* description of how nulls are handled and what they mean, otherwise it's just an ad-hoc hack which will cause problems. If the meaning and behaviour of nulls is always application-specific, then define it in your application using one of the many excellent approaches that already exist. I mean, what problem are you solving if you just introduce a bit of syntax and leave it up to application authors to ultimately work out how to deal with it?

> That's why I didn't want originally to define the semantics of string
> operations with nulls, and instead throw an error at every instance. I
> thought this would be simpler and closer to the Tcl philosophy. But this
> error-throwing behavior seemed to cause some interrogations, so I
> removed it. Guess I shouldn't have :*/
>
>> The main practical problem is that the domain of all arguments to all
>> commands is suddenly expanded to introduce this new option, which in 99%
>> of cases is neither needed nor expected, and yet it can turn up there.
>> This means there is now another error case that all code has to be
>> prepared to deal with, and, in most cases, won't be (e.g.
>> NullPointerExceptions in Java which really are a big problem and can be
>> a complete PITA to track down which 3rd party library unexpectedly
>> returned null).
>
> But this is a design problem that's outside of our scope. After all, you
> can write Fortran in all languages. Moreover NullPointerExceptions are
> about null references, not values, so it's a totally different class of
> problems.

It is the same class of problems: you expected a value in some domain, and instead got some weird non-value that you can't possibly do anything useful with.
This is very much an in-scope problem, as it simply doesn't exist with any other approach, so it must be explained and justified. Even the usual approaches to dealing with such error conditions won't work, due to the lack of a string rep:

    error "unknown option \"$foo\"" ;# boom!

[...]

>> Any discussion of introducing a nil/null value should start with some
>> pretty strong motivational arguments, rather than syntax.
>
> Agreed, but I thought that this problem was widely understood and
> accepted, and that the main objections would be on syntactic issues, as
> for {*}. It happens that this is the other way round.

The *problem* is widely understood, as are the solutions. Nulls are just about the worst solution to this problem though, which unfortunately has been propagated in a variety of languages.

>> It should explain exactly what problems this is supposed to solve
>> (3-state logic is a particular solution/means, not a problem/end in
>> itself), and just what is wrong with the existing approaches in Tcl:
>>
>> 1. For references, we already have variables which can be [unset] and
>>    tested for existence with [info exists]
>> 2. You can encode null fields in a record using missing keys in either a
>>    dict or array and similarly test for (non-)existence.
>> 3. You can use any container that has a natural "empty" element, e.g. a
>>    set or a list: {} = null, {a} = a
>> 4. You can use a tagged list to simulate the maybe/option datatype as
>>    used in ML and Haskell: [Nothing] / [Just $foo]
>> 5. You can just use a special string value that is outside of the domain
>>    of your application.
>
> I'm targeting cases where the domain has no empty element, i.e. accepted
> values can be arbitrary strings, and you need to encode missing info
> without having to use metadata or out-of-band signaling. So this rules
> out all of the above approaches.

There are no such application cases, unless you are specifically designing them that way.
> I first encountered this problem when writing a custom CSS-like text
> widget that I used to create multimedia CD-ROMs about 10 years ago.
> Basically I wanted to mix the cascading style and box model of CSS with
> the script interface model of the Tk text widget. So I started using the
> empty list as an indicator of inheritance. But some options accepted
> arbitrary strings, so this special indicator value was unsuitable
> because it was part of the domain. This meant that I had to choose
> another method to express inheritance. In the end I had to pair every
> inheritable option with a flag option to tell whether to inherit or not.
> This was tedious and awkward, but this was the only sensible option in
> the lack of better alternatives. With a null concept, one could have set
> a given option's value to null to indicate inheritance, given that null
> is (by design) outside the domain of accepted values.

This is a classic case for missing keys in a dictionary. Simply specify your configuration options as dicts of option/value pairs, and omit any configuration options that you want inherited -- [dict merge] will then do precisely the correct thing. This is much *easier* than dealing with nulls:

    set parent {
        font-family: Arial
        font-size: 12
    }
    # Inherit font-family from parent:
    set child {
        font-size: 10
    }
    set config [dict merge $parent $child]

This is, after all, how CSS works (with the addition of the distinguished "inherit" keyword, which you could also use).

-- Neil
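Neil's [dict merge] inheritance pattern translates directly to other languages. A Python restatement of the same idea (illustrative only): omitted keys inherit from the parent, and no null sentinel is ever needed, because merging by key precedence expresses "not specified here".

```python
parent = {"font-family": "Arial", "font-size": 12}

# "font-family" is omitted, so it is inherited from the parent.
child = {"font-size": 10}

# Child keys take precedence, like [dict merge $parent $child] in Tcl.
config = {**parent, **child}

assert config == {"font-family": "Arial", "font-size": 10}
```

The design point survives translation: absence of a key is the out-of-band "inherit" signal, so every possible string remains a legal option value.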
From: Kevin K. <ke...@ac...> - 2008-06-05 14:05:29
|
Neil Madden wrote:
> 1. For references, we already have variables which can be [unset] and
> tested for existence with [info exists]
> 2. You can encode null fields in a record using missing keys in either a
> dict or array and similarly test for (non-)existence.
> 3. You can use any container that has a natural "empty" element, e.g. a
> set or a list: {} = null, {a} = a
> 4. You can use a tagged list to simulate the maybe/option datatype as
> used in ML and Haskell: [Nothing] / [Just $foo]
> 5. You can just use a special string value that is outside of the domain
> of your application.

For what it's worth, TIP #308 adapts approaches 1 and 2 in Neil's list. Approach 5 is also available (the '-as lists' option in all the commands that interrogate result sets), but I foresee using it only in ad-hoc report generation.

It is my belief that most of the people who insist that Tcl should have a special NULL value in all contexts have been misled by SQL database interfaces that insist on dealing with rows as lists. Of course, if rows are represented as lists, you need sparse lists to accommodate empty columns. TIP 308's approach to the problem: Don't Do That.

-- 73 de ke9tv/2, Kevin |
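[The missing-key approach Kevin describes can be sketched in plain Tcl; the row contents and column names below are invented for illustration and are not taken from TIP #308 itself:]

```tcl
# A row with a NULL "phone" column is simply a dict where that key is
# absent -- no sentinel value needed, and "" stays a legitimate value.
set row {name Ada email ada@example.org}

if {[dict exists $row phone]} {
    puts "phone: [dict get $row phone]"
} else {
    puts "phone is NULL (key absent)"
}
```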
From: Frédéric B. <fb...@en...> - 2008-06-05 13:46:48
|
Donal K. Fellows wrote:
> As things stand, Tcl represents NULL by an unset variable or a missing
> dictionary key.

But these represent null *references*. I want to express null *values*, i.e. valid references or container elements that have no value.

> The only problem occurs when you want to construct
> sparse lists, but then I think “don't do that!” on that topic. :-)
>
> What are the use cases for NULL again?

Cascaded/stacked/inherited options. See my other message (in reply to Neil) for more details. To some extent this covers sparse lists. And in general, expressing undefined *values* when the domain covers arbitrary strings (i.e. there is no available "special" value). |
From: Frédéric B. <fb...@en...> - 2008-06-05 13:38:26
|
Neil Madden wrote:
> Just to clarify, are we talking about a null value (used e.g. to
> represent missing or unknown information in SQL) or a null reference?

Null values. To be pedantic, I could say neither, as null is a non-value ;-)

> From the context, I'd say the primary concern here is something like a
> null/nil-value in Tcl, rather than null references. What is presumably
> actually wanted here is a way of representing missing/unknown
> information in some "direct" manner. This is a semantic issue, of which
> a distinguished nil (non-)value is only one possible solution. This is
> not a good solution, either theoretically or practically. From a
> theoretical point of view, introducing a special value just doesn't seem
> well motivated. It complicates the semantics of the language, for
> actually no real benefit. The semantics of how to handle nulls in
> various operations are quite complicated, and often contradictory. I
> don't know of a single language with null that doesn't have problems
> with it. There are plenty of other ways to encode
> missing/incomplete/unknown information within the existing language
> elements, as have been discussed several times before, and they are not
> just sufficient but also much better solutions.

Ack'd. But I'm contemplating the problem not from a mathematically correct point of view, but from a pragmatic point of view (not that mathematical correctness isn't important, it's just off-topic). If your application doesn't need this concept of nothingness or out-of-domain, then don't use it. However, there are many cases in real-world applications where such a concept comes in handy. To me, the exact semantics of this non-value is application-specific, so this should not be grounds for rejection. That's why I originally didn't want to define the semantics of string operations with nulls, and instead to throw an error at every instance. I thought this would be simpler and closer to the Tcl philosophy. 
But this error-throwing behavior seemed to raise some questions, so I removed it. Guess I shouldn't have :*/

> The main practical problem is that the domain of all arguments to all
> commands is suddenly expanded to introduce this new option, which in 99%
> of cases is neither needed nor expected, and yet it can turn up there.
> This means there is now another error case that all code has to be
> prepared to deal with, and, in most cases, won't be (e.g.
> NullPointerExceptions in Java which really are a big problem and can be
> a complete PITA to track down which 3rd party library unexpectedly
> returned null).

But this is a design problem that's outside of our scope. After all, you can write Fortran in any language. Moreover, NullPointerExceptions are about null references, not values, so they are a totally different class of problems.

> Another practical problem is how to represent null
> values at the C level: the obvious approach of converting a null value
> into a NULL Tcl_Obj pointer just opens the door for Tcl code to cause
> crashes in C commands that aren't careful about checking the arguments
> they are passed (which I guess most aren't) -- that also would raise
> security issues.

I was rather thinking about a Tcl_Obj with both a NULL string and a NULL type. So it has no string rep, can't get one, and can't be converted to another type. A NULL Tcl_Obj pointer is another issue IMHO (and a cause of panic in the core).

> Any discussion of introducing a nil/null value should start with some
> pretty strong motivational arguments, rather than syntax.

Agreed, but I thought that this problem was widely understood and accepted, and that the main objections would be on syntactic issues, as for {*}. It happens that this is the other way round.

> It should
> explain exactly what problems this is supposed to solve (3-state logic
> is a particular solution/means, not a problem/end in itself), and just
> what is wrong with the existing approaches in Tcl:
>
> 1. 
For references, we already have variables which can be [unset] and
> tested for existence with [info exists]
> 2. You can encode null fields in a record using missing keys in either a
> dict or array and similarly test for (non-)existence.
> 3. You can use any container that has a natural "empty" element, e.g. a
> set or a list: {} = null, {a} = a
> 4. You can use a tagged list to simulate the maybe/option datatype as
> used in ML and Haskell: [Nothing] / [Just $foo]
> 5. You can just use a special string value that is outside of the domain
> of your application.

I'm targeting cases where the domain has no empty element, i.e. accepted values can be arbitrary strings, and you need to encode missing info without having to use metadata or out-of-band signaling. So this rules out all of the above approaches.

I first encountered this problem when writing a custom CSS-like text widget that I used to create multimedia CD-ROMs about 10 years ago. Basically I wanted to mix the cascading style and box model of CSS with the script interface model of the Tk text widget. So I started using the empty list as an indicator of inheritance. But some options accepted arbitrary strings, so this special indicator value was unsuitable because it was part of the domain. This meant that I had to choose another method to express inheritance. In the end I had to pair every inheritable option with a flag option to tell whether to inherit or not. This was tedious and awkward, but it was the only sensible option for lack of better alternatives. With a null concept, one could have set a given option's value to null to indicate inheritance, given that null is (by design) outside the domain of accepted values.

So, yes, there are ways to do things without nulls, but that doesn't mean that such solutions are elegant or satisfactory. The fact that one has never encountered such a situation doesn't mean that it doesn't exist. |
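[Approach 4 above, the tagged-list maybe/option encoding, can be sketched in plain Tcl as follows; the proc names are purely illustrative and not part of any proposal:]

```tcl
# Maybe/option encoding as tagged lists: {Nothing} or {Just $value}.
proc Just {value} { return [list Just $value] }
proc Nothing {}   { return [list Nothing] }

set inherited [Nothing]
set explicit  [Just ""]   ;# even the empty string counts as a real value

foreach opt [list $inherited $explicit] {
    if {[lindex $opt 0] eq "Just"} {
        puts "explicit value: '[lindex $opt 1]'"
    } else {
        puts "inherit from parent"
    }
}
```

This works even when accepted values are arbitrary strings, because the distinction is carried by the tag, not by a reserved value in the domain.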
From: Donal K. F. <don...@ma...> - 2008-06-05 09:42:19
|
Frédéric Bonnet wrote:
> Sorry but in 15+ yrs of a rather rich programming life with Tcl I've
> felt such a need more than once ;-). Jokes aside, I agree that TIP #185
> is far from adequate, to say the least. Hence my new proposal.

As things stand, Tcl represents NULL by an unset variable or a missing dictionary key. The only problem occurs when you want to construct sparse lists, but then I think “don't do that!” on that topic. :-)

What are the use cases for NULL again?

Donal. |
From: Alexandre F. <ale...@gm...> - 2008-06-05 09:19:54
|
On Thu, Jun 5, 2008 at 11:14 AM, Frédéric Bonnet <fb...@en...> wrote:
> Sure but this is just an application of nil. I've mentioned 3VL because
> that's the way it's done in SQL. However my primary use case is storing
> nil in containers when no special value is available (e.g. values are
> arbitrary strings including the empty string) and you can't (or don't
> want to) use metadata or out-of-band.

Come on, reserving a string value is not that bad. And if you really can't, out-of-band is the way to go: just prefix all your valid strings with a type character, and use another character for nil. Or, more Tclly efficient, store [list $type $string]. Plenty of ways.

>> I think 185 largely deserves a clear rejection.
>
> We both agree that TIP #185 cannot be accepted in its current state.
> However I made a new proposal that I think respects the Tao of Tcl. So
> the question is: should we call for a vote on TIP #185 right now and
> reject it, then proceed on a new TIP, or should we (I) rewrite the TIP?

Dunno, contact the author maybe...

> Moreover, can you comment on my specific proposal? Or do you think that
> the very concept of nil is worthless (this is a real question)?

If you read a bit between the lines of my arguments against 185, you'll quickly figure out what I think of adding special values with special semantics and special options ;-)

-Alex |
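[The out-of-band encoding Alexandre suggests, [list $type $string], might look like this in practice; the proc and tag names here are invented for illustration:]

```tcl
# Tag every value with a type; the "nil" tag carries no payload.
proc val {s} { return [list str $s] }
set nil {nil {}}

foreach v [list [val ""] [val "hello"] $nil] {
    lassign $v type payload
    if {$type eq "nil"} {
        puts "<nil>"
    } else {
        puts "string: '$payload'"   ;# "" remains a legitimate value
    }
}
```

The point is that nil-ness lives in the tag, outside the string domain, so no otherwise-valid value has to be reserved.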
From: Frédéric B. <fb...@en...> - 2008-06-05 09:08:52
|
Alexandre Ferrieux wrote:
> Sorry but in 10+ yrs of a rather rich programming life with Tcl I've
> never felt such an urge. I understand the importance of the concept in
> the database subfield, but looking at TIP185 I hope it will stay in
> that subfield :-}

Sorry but in 15+ yrs of a rather rich programming life with Tcl I've felt such a need more than once ;-). Jokes aside, I agree that TIP #185 is far from adequate, to say the least. Hence my new proposal.

>> But such a non-value has also a lot of applications in various other
>> domains such as three-state logic.
>
> Surely you don't need an entirely new, exotic internal Tcl value to
> handle 3-state logic. Just take 3 (string or numeric) constants,
> define the truth tables and you're done.

Sure, but this is just an application of nil. I've mentioned 3VL because that's the way it's done in SQL. However, my primary use case is storing nil in containers when no special value is available (e.g. values are arbitrary strings including the empty string) and you can't (or don't want to) use metadata or out-of-band signaling.

>> Meanwhile I also discovered the existence of TIP #185. However it seems
>> to be mostly defunct.
>
> Yeah, and rightly so. Frankly I think its posting date is off by a
> week. It should have been 01 (not 08) April 2004.
>
>> Comments welcome.
>
> I think 185 largely deserves a clear rejection. [...]

We both agree that TIP #185 cannot be accepted in its current state. However, I made a new proposal that I think respects the Tao of Tcl. So the question is: should we call for a vote on TIP #185 right now and reject it, then proceed on a new TIP, or should we (I) rewrite the TIP?

Moreover, can you comment on my specific proposal? Or do you think that the very concept of nil is worthless (this is a real question)? |
From: rna020 <tom...@fr...> - 2008-06-04 14:54:13
|
Frédéric

> 1. Instead of {null} or {nil} prefix, we could use {}. Not only is
> this shorter but this also follows the same logic as {*}: the latter
> expands the value in place, whereas the former turns any value into
> nothing. Moreover this frees us from cultural aspects about the
> differences between NULL vs. nil vs. void vs UNK in various languages.
> As a syntactic sugar one could write {}NULL or {}nil or {}whatever as a
> matter of personal taste, depending on the application.

I think the use of {} for a NULL indicator would be a big mistake because it would break the symmetry between quoting with the double quote character (") and curly braces ({}). In particular, the use of a set of empty curly braces (i.e. {}) as an alternative to an empty quoted string (i.e. "") is, I suspect, widely used in existing code where an empty string needs to be embedded in a quoted string. I personally would much prefer representing a true null with something like {NULL}, {NIL}, {\0}, {?}, etc. I don't think cultural concerns should drive the design. |
From: Neil M. <ne...@Cs...> - 2008-06-04 13:52:47
|
Hi Frédéric,

Frédéric Bonnet wrote:
> [cc'd to tcl9-cloverfield]
>
> Howdy folks,
>
> I have the strong impression (and following the discussion on c.l.t
> about merging variables and arrays) that Tcl would benefit from proper
> null (or nil) handling. This seems to be a long standing expectation
> from part of the community, especially for dealing with SQL databases.
> But such a non-value has also a lot of applications in various other
> domains such as three-state logic.

Just to clarify, are we talking about a null value (used e.g. to represent missing or unknown information in SQL) or a null reference? These are different things, although a null reference can simulate a null value by making everything a reference. However, I'd say both are bad ideas.

From the context, I'd say the primary concern here is something like a null/nil-value in Tcl, rather than null references. What is presumably actually wanted here is a way of representing missing/unknown information in some "direct" manner. This is a semantic issue, of which a distinguished nil (non-)value is only one possible solution. This is not a good solution, either theoretically or practically. From a theoretical point of view, introducing a special value just doesn't seem well motivated. It complicates the semantics of the language for no real benefit. The semantics of how to handle nulls in various operations are quite complicated, and often contradictory. I don't know of a single language with null that doesn't have problems with it. There are plenty of other ways to encode missing/incomplete/unknown information within the existing language elements, as have been discussed several times before, and they are not just sufficient but also much better solutions.

The main practical problem is that the domain of all arguments to all commands is suddenly expanded to introduce this new option, which in 99% of cases is neither needed nor expected, and yet it can turn up there. 
This means there is now another error case that all code has to be prepared to deal with, and, in most cases, won't be (e.g. NullPointerExceptions in Java, which really are a big problem and can be a complete PITA to track down which 3rd party library unexpectedly returned null).

Another practical problem is how to represent null values at the C level: the obvious approach of converting a null value into a NULL Tcl_Obj pointer just opens the door for Tcl code to cause crashes in C commands that aren't careful about checking the arguments they are passed (which I guess most aren't) -- that also would raise security issues.

Any discussion of introducing a nil/null value should start with some pretty strong motivational arguments, rather than syntax. It should explain exactly what problems this is supposed to solve (3-state logic is a particular solution/means, not a problem/end in itself), and just what is wrong with the existing approaches in Tcl:

1. For references, we already have variables which can be [unset] and
tested for existence with [info exists]
2. You can encode null fields in a record using missing keys in either a
dict or array and similarly test for (non-)existence.
3. You can use any container that has a natural "empty" element, e.g. a
set or a list: {} = null, {a} = a
4. You can use a tagged list to simulate the maybe/option datatype as
used in ML and Haskell: [Nothing] / [Just $foo]
5. You can just use a special string value that is outside of the domain
of your application.

These approaches work very well, have very clear and intuitive semantics (as they just build on the existing semantics), and don't cause any surprises: they can be confined to just that part of the application that actually needs them, and most operations on them behave in an intuitive manner (e.g. merging dictionaries with missing keys). 
The languages that really seem to have got references and dealing with missing information right (ML, Haskell, etc.) have all done so by completely abolishing any notion of null and instead introducing specific mechanisms for specific problems that have limited and controlled scope. This wasn't by accident, but by careful design. We should not ignore these lessons, as they are not just applicable to statically-typed languages.

Neil |
From: Alexandre F. <ale...@gm...> - 2008-06-04 12:24:22
|
On Wed, Jun 4, 2008 at 12:23 PM, Frédéric Bonnet <fb...@en...> wrote:
> I have the strong impression (and following the discussion on c.l.t
> about merging variables and arrays) that Tcl would benefit from proper
> null (or nil) handling. This seems to be a long standing expectation
> from part of the community, especially for dealing with SQL databases.

Sorry but in 10+ yrs of a rather rich programming life with Tcl I've never felt such an urge. I understand the importance of the concept in the database subfield, but looking at TIP185 I hope it will stay in that subfield :-}

> But such a non-value has also a lot of applications in various other
> domains such as three-state logic.

Surely you don't need an entirely new, exotic internal Tcl value to handle 3-state logic. Just take 3 (string or numeric) constants, define the truth tables and you're done.

> Meanwhile I also discovered the existence of TIP #185. However it seems
> to be mostly defunct.

Yeah, and rightly so. Frankly I think its posting date is off by a week. It should have been 01 (not 08) April 2004.

> Comments welcome.

I think 185 largely deserves a clear rejection. Here is why:

- asymmetric conversion (you can't shimmer list->string->list safely) violates EIAS
- null-propagation violates every possible expectation about the (String,concat) monoid (an absorbing element is an atomic bomb)
- the TIP quickly moves away from the true motivations (SQL), into an awkward in vitro construction with complex impacts all over the core (new options, new semantics)
- most cases are easily solved by a configurable reserved value for nulls

-Alex |
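[Alexandre's "3 constants plus truth tables" suggestion can be sketched like this; Kleene-style three-valued semantics and the proc/constant names are assumptions for illustration:]

```tcl
# Three-valued logic over the plain string constants true/false/unknown.
proc tv_not {a} {
    switch -- $a {
        true    { return false }
        false   { return true }
        default { return unknown }
    }
}
proc tv_and {a b} {
    # false dominates; otherwise any unknown operand makes it unknown.
    if {$a eq "false" || $b eq "false"} { return false }
    if {$a eq "unknown" || $b eq "unknown"} { return unknown }
    return true
}

puts [tv_and true unknown]   ;# unknown
puts [tv_and false unknown]  ;# false
puts [tv_not unknown]        ;# unknown
```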
From: Frédéric B. <fb...@en...> - 2008-06-04 10:18:31
|
[cc'd to tcl9-cloverfield]

Howdy folks,

I have the strong impression (and following the discussion on c.l.t about merging variables and arrays) that Tcl would benefit from proper null (or nil) handling. This seems to be a long-standing expectation from part of the community, especially for dealing with SQL databases. But such a non-value also has a lot of applications in various other domains, such as three-state logic. That's the reason why I included NULL value handling in Cloverfield. Meanwhile I also discovered the existence of TIP #185. However, it seems to be mostly defunct.

Cloverfield proposes the following syntax:

    {null}SomeArbitraryData
    {nil}SomeArbitraryData

That is, any word prefixed by {null} or {nil} is interpreted by the parser as a NULL value. However, Cloverfield doesn't develop the semantics for now. For example, what does the following do:

    set v {null}[puts foo]
    puts "v is $v"
    append v bar
    puts "v is $v"
    expr {$v == $v}
    expr {$v == {null}0}

See also the discussions on the Wiki:

    http://wiki.tcl.tk/17441
    http://wiki.tcl.tk/20638 (look for {null})

OTOH, TIP #185 proposes a similar syntax, {null}!, and describes the semantics, but also implies that many commands must be modified to accept new options for NULL handling, e.g. [switch -null]. I personally don't agree with some of the proposed semantics and think that such a value must be handled transparently. But I acknowledge that people may have various interpretations of what NULL exactly means.

So I'm requesting feedback from you people about the current TIP #185 and the Cloverfield proposal, and NULL handling in general. I'm ready to bite the bullet and rewrite the TIP in order to provide an implementation for the upcoming Tcl 8.6 if possible. For now, I can make the following proposal:

1. Instead of the {null} or {nil} prefix, we could use {}. Not only is this shorter, but it also follows the same logic as {*}: the latter expands the value in place, whereas the former turns any value into nothing. 
Moreover, this frees us from cultural aspects about the differences between NULL vs. nil vs. void vs. UNK in various languages. As syntactic sugar one could write {}NULL or {}nil or {}whatever as a matter of personal taste, depending on the application.

2. Nil should compare positively to itself, and be the only value to do so. TIP #185 proposed that it should compare positively with nothing, including itself, but I disagree, especially because this breaks the assumption that NaN is the only value with this property (which is the case in all languages I know, and is part of the IEEE standard), and because Tcl properly supports NaN and other special values such as Infinity anyway. It also makes switch and list handling more complicated (hence the need for special flags in TIP #185).

3. When used as an lvalue, mutating operations on a variable holding nil have no effect. For example:

    set v {}nil
    append v foo ; # v is still nil

4. When used as an rvalue, the string representation of nil can be controlled on a per-interpreter basis using, for example, a global variable (tcl_nil). For example:

    set tcl_nil NULL
    set v {}nil
    puts "v is $v" ; # Gives "v is NULL"

If this variable is not set, nil "taints" other values.

5. The {}-prefix is recognized by the list parser, so that nil can be included in lists (this includes switch patterns). The canonical form could be {}{}. The aforementioned global variable could also be used to control list serialization:

    set tcl_nil NULL
    set l [list 1 {}nil 2]
    puts "l is {$l}" ; # Gives "l is {1 {}NULL 2}"

Comments welcome. |
From: Frédéric B. <fb...@en...> - 2008-06-02 14:00:43
|
Andy Goth wrote: > On Thu, 29 May 2008 14:02:03 +0200, Frédéric Bonnet wrote >> Not that the very concept of grouping is worthless; after all this >> concept is implicit even in Tcl. > > It is implicit, but it's far behind the scenes. It may be worthwhile to > leverage the concept in our documentation to explain the parsing > process. "Characters are *grouped* into words according to the > following rules..." or something like that. Interesting idea. It may clarify the Tridekalogue a bit. > My proposal is to not add anything more to the quoting rules; we've > already made enough changes by adding parentheses and beefing up brace > matching. Instead key off of variable naming, which is already a > well-understood concept. Let me show you. [...] > This happens because Tcl doesn't know that the programmer is trying to > name (as opposed to substitute) a variable, so it doesn't apply any > special grouping rules. (See, that term comes in handy!) Indeed ;-) > My proposal is to enable the programmer to tell the interpreter that he > or she is, in fact, trying to name a variable. I suggest the notation > of a word starting with a commercial-at sign (@), since there are very > few places where @ currently is the first (but not only) character of a > word, and because it only makes sense to name (not substitute) a > variable if the whole word is the variable name. Moreover, as Alexandre pointed out, this can be used as a compiler hint. > I also suggest that it > be a single character because it will get used a lot. (You're invited > to think of a more appropriate single character.) It will be used a lot only if it's always required when dealing with variables. But I think the $ prefix also helps in limiting the number of rules and special chars. Moreover, I have some more ideas about extending the $-syntax (*hint* expr *hint*) so I don't want to multiply syntactic rules. 
If we require distinct prefixes for these two rules then this means that other rules will require their own prefix as well. > Upon seeing this character at the start of the word, the interpreter > turns on all the same grouping rules that it uses for variable > substitution, except that it is an error to have stray characters after > the end of the variable name, same as with quoted words. And the parse > tree of the output word is that of a variable name, not a concatenation > of strings. This latter fact is what gives us the security and > performance benefits of brace-quoted [expr] expressions and SQLite > queries. Ok so this would justify the use of a new quoting rule. But graceful degradation from reference to value would work as well (i.e. the distinction between variable *substitution* and variable *reference*). > I anticipate that the hardest thing to accept about my proposal is that > it insists on using this notation and refuses to directly accept > unvarnished strings as variable names. That's the feeling I have from the discussion on TCL-CORE. People (including myself) don't want to give up variable names in a variety of situations because the concepts of variable names and references don't overlap totally. > I propose for [set var value] to > be an error: "expected variable name; got string" or something like > that. But everything is a string! We can't have that! Oh no! > Variable references already run afoul of EIAS, so this'll happen anyway. No, because referenced values have a string representation. > Plus it is an error (I think) to attempt to take the string (or numeric > or list or whatever) representation of a word produced by {null}. I say > this to show you that this kind of error is not so out-of-place in > Cloverfield. This is partly true. The string rep of a null value cannot be rendered, and any attempt to do so generates an error. This means that a null cannot be converted to any other type, and it only compares positively to itself. 
However, a null can be serialized in the context of its container. For example, a list containing a single null element has the following string rep:

    % set l ({null}{})
    {null}{}

This means that interpreting the above as a list gives a list with one single null element. However:

    % set v {null}{}; puts $v
    can't convert null value to string

This is exactly the same with references: the {ref id} modifier is only output in the context of a container, and can only be specified in the context of a command. I don't think that it violates EIAS: while there is a distinction between data and metadata, the latter is serialized in-band in the string rep.

> Here's a case in Tcl where a programmatically generated variable name is
> obtained by accident:
>
> % set var 1
> # somewhere later in the code...
> % set $var 2
> # now the programmer is confused as to why $var is still 1
>
> Here's code in Cloverfield (including my proposal) that behaves the
> same way:
>
> % set @var 1
> % set @$var 2
>
> As you can see, it's not really any harder to get a programmatically
> generated variable name, but the syntax for it is visually distinct from
> both ordinary variable naming and variable substitution, on account of
> it starting with a two-character sequence. Plus, syntax highlighters
> are likely to put @var and $var in different colors, let's say green and
> red. @$var would show up as a green @ followed by a red $var, making
> this error very easy to spot.

OTOH, you almost guarantee that the community will reject this new syntax :-( This wouldn't be a problem if we designed a whole new language from scratch, but the goal is still to preserve most of the existing L&F, and get most of the community to use it without breaking most habits and idioms (that's a lot of mosts).

> Plus think of all the other places where people have trouble remembering
> when a command wants the name of a variable versus its value. 
This problem will be gone anyway, as references and values are interchangeable (*). OTOH your proposal only decorates the var names while preserving this dichotomy. (*) This is the key to understanding the concept of references that I have in mind. > With this > proposal, everywhere a name is expected but a value is passed, an error > can be raised, immediately showing the programmer where the problem is. > Again, indirection is still possible; it just takes proper syntax that > probably won't be typed by accident. You're just shifting the burden from the command to the parser. I fail to see the added value other than catching errors earlier (unless $ and @ are equally interchangeable under your proposal as well). > The remaining problem is getting the name of a variable as a string. > This can be done with an [info]-style command. I suggest that the > command also be capable of building up names from parts and decomposing > them back into constituent parts. Here, a variable's parts are its base > name, namespace qualifiers, vectored indexes, and keyed indexes. As for > [upvar]-like functionality, possibly a notation can be invented for > references into parent stack frames, most likely as a special kind of > namespace. IMHO, [upvar] is the kind of feature that needs to be preserved as is. It belongs to the declarative class of commands that takes variable names by nature. >> Back to the original problem. We wanted to be able to pass variables >> by name or by reference. Following the discussion on Tcl-Core, > > Actually I have not been following the Tcl-Core discussion. I have all > the mails that have been cross-posted back to Tcl9, but I don't know if > every message of the entire thread was cross-posted. Can you check for > me? I'm not on Tcl-Core, and I have never succeeded in viewing the > Tcl-Core archives online. (Freakin' SourceForge...) Once I am sure I > have the whole discussion, I can read it and reply to it as a whole. Ack'd. 
Most messages were cross-posted, but some were only posted on TCL-CORE. If you ask, I'll check whether they are significant, and in this case I will forward them to you. You should take some time to read the discussion, as the feedback I got from the community members there is very valuable. I can provide you with a summary if you want. >> I think that the distinction between names, values, strong and weak >> refs is interesting and would work quite well once the concept is >> understood. Pass-by-name semantics could be achieved by weak >> references using $@ for commands accepting plain values. > > I take it that the $@ notation arose from discussion on Tcl-Core. That's a proposal I made by mixing the existing syntax for strong references with your proposal of using @ for weak ones. > If a name is a weak reference, it must be possible to name variables in > the caller's stack frame. A special syntax for upvar? I'm not sure it would be very useful (just use upvar). >> The other side of the problem was brace matching in rule [5]. I came >> to the conclusion that the rules I propose are too complicated, as >> they duplicate some, but not all, of the other rules of the >> Tridekalogue, to identify the places where special chars are >> significant. > > That's true; there's a lot to it. But it does make it less surprising > to use. I don't think it's too much, since I was able to implement it. > >> If I remember correctly, your parser implemented rule [5] recursively, >> as if braced strings were properly formatted code. > > No, that's not how I did it. The only recursion in my parser is for > command substitution. I planned for variable substitution to also be > recursive, but I haven't got around to that part just yet. [...] > I suggest you study in detail how my parser works. It's heavily > commented, but it's still unfortunately quite difficult to understand. > You might need to draw a few diagrams to walk through some test cases. 
>
> Basically I have a second, inner state machine for braces. The primary
> state machine is driven by the following variables: [...]

Ok, I got it now: the parser is not implemented recursively; it uses linear parsing with a state machine. However, (some of) the parsing rules apply recursively. Am I correct?

>> While it was slightly incorrect given the rules at this time, I think
>> you had a good intuition there.
>
> Thanks, I guess. :^)
>
>> So the current rule [5] that allows arbitrary sequences of chars
>> between braces, with a set of exceptions to activate brace matching,
>> is a dead end.
>
> I disagree. You say it's a dead end because it's complicated and
> doesn't exactly match the rest of Cloverfield. Well, it is the way it
> is because you only kept the parts of Cloverfield that were relevant.
> That would be comments, double quotes, and backslashes. And in order to
> detect comments and double quotes, word starts must also be tracked. It
> took surprisingly little code for me to handle this.

Indeed. But this implies that most (all?) of the rules must apply within braces, even if substitutions aren't performed. I'm just trying to find a way to express rule [5] so that:

- it's short
- it's easy to understand
- it doesn't duplicate the other rules
- it isn't overly selective about the rules that may or may not apply
- it accurately reflects the way the parser works
- it makes writing code easy and WTF-free

For now I came to the conclusion that the only way all these conditions are met is when the rules apply recursively, i.e. braced expressions are whole scripts:

>> Rather, rule [5] should require braced expressions to be properly
>> formatted Tcl scripts, following rule [1] in a recursive manner.
>
> Does that mean that {[} will be illegal? Or what about {{xxx}x} or
> {"xxx"x}?

{[} will be illegal, as the bracketed expression is unclosed. This seems problematic at first, but OTOH we already crossed the Rubicon with double quotes.
However {{xxx}x} will be legal, as the interpretation of the braced expression is delayed. So it will be parsed as a word modifier, but the actual validity of this expression won't be checked until the expression is eval'd or converted into a list. Here it will fail with an 'unknown word modifier "xxx"' error.

This implies that arbitrary data cannot be put between braces anymore, but this Tcl idiom is due to the lack of a proper heredoc-like feature, which Cloverfield has. So we lose a bit of tolerance in brace parsing but gain a lot (comments, heredocs...).

Cheers, Fred |
From: Andy G. <unu...@ai...> - 2008-05-31 04:44:49
|
On Thu, 29 May 2008 14:02:03 +0200, Frédéric Bonnet wrote

> Not that the very concept of grouping is worthless; after all this
> concept is implicit even in Tcl.

It is implicit, but it's far behind the scenes. It may be worthwhile to leverage the concept in our documentation to explain the parsing process. "Characters are *grouped* into words according to the following rules..." or something like that.

> But it creates a new significant departure from Tcl that will make it
> even harder for Tclers to accept Cloverfield as a possible successor.

I know I had trouble with it. :^) It took me a very long time to discover that Tcl has anything like it, and it is only for matching of braces inside of braced words. In that case I never before thought in terms of grouping, merely of brace matching. In other words, I didn't look for the underlying concept; I remained at the surface and saw only simple brace matching.

> People already have a hard time grasping concepts such as quoting, let
> alone a new one. And it also opens a new can of worms.

My proposal is to not add anything more to the quoting rules; we've already made enough changes by adding parentheses and beefing up brace matching. Instead key off of variable naming, which is already a well-understood concept. Let me show you. Tcl currently accepts this:

% array set var {" key with spaces " value}
% puts $var( key with spaces )
% set foo key
% set bar with
% set quux spaces
% puts $var( $foo $bar $quux )

The above all works without any surprises. The problem is naming the variable without substituting in its value.
I skirted the issue in the above code by using [array set], but to use [set] it is necessary to do any of the following:

% set {var( key with spaces )} value
% set "var( key with spaces )" value
% set var(\ key\ with\ spaces\ ) value
% set "var( $foo $bar $quux )" value
% set var(\ $foo\ $bar\ $quux\ ) value

This happens because Tcl doesn't know that the programmer is trying to name (as opposed to substitute) a variable, so it doesn't apply any special grouping rules. (See, that term comes in handy!)

My proposal is to enable the programmer to tell the interpreter that he or she is, in fact, trying to name a variable. I suggest the notation of a word starting with a commercial-at sign (@), since there are very few places where @ currently is the first (but not only) character of a word, and because it only makes sense to name (not substitute) a variable if the whole word is the variable name. I also suggest that it be a single character because it will get used a lot. (You're invited to think of a more appropriate single character.)

Upon seeing this character at the start of the word, the interpreter turns on all the same grouping rules that it uses for variable substitution, except that it is an error to have stray characters after the end of the variable name, same as with quoted words. And the parse tree of the output word is that of a variable name, not a concatenation of strings. This latter fact is what gives us the security and performance benefits of brace-quoted [expr] expressions and SQLite queries.

% set @var( key with spaces ) value
% set @var( $foo $bar $quux ) value

You're right; it behaves like yet another quoting rule, since it's a whole-word affair. But I don't think that it's worthwhile to describe it as such, since it's really just the variable reference rule already in Cloverfield. Plus observe the massive overlap between quoting and grouping, meaning that it's not always worthwhile to classify any given language rule as one or the other.
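To make the grouping concrete, here is a rough sketch (in Python, purely for illustration) of how such an @-prefixed name word might be decomposed into a base name plus keyed indexes. The splitting rules below are one possible reading of the proposal, not a spec; variable substitution inside indexes (e.g. $foo) would happen at a later stage and is not handled here.

```python
def parse_name_word(word):
    """Decompose a hypothetical '@base( key )( key2 )' name word into a
    base name and a list of keyed indexes, embedded spaces preserved.
    Raises an error for a plain string, mirroring the proposed
    "expected variable name; got string" behavior."""
    if not word.startswith("@"):
        raise ValueError("expected variable name; got string")
    rest = word[1:]
    open_paren = rest.find("(")
    if open_paren == -1:
        return rest, []                    # plain variable, no indexes
    base, indexes = rest[:open_paren], []
    i = open_paren
    while i < len(rest):
        if rest[i] != "(":
            raise ValueError("stray characters after variable name")
        close = rest.index(")", i)
        indexes.append(rest[i + 1:close])  # keep spaces verbatim
        i = close + 1
    return base, indexes
```

For example, `parse_name_word("@var( key with spaces )")` yields the base name `var` and the single index `" key with spaces "`, with the spaces intact, rather than a concatenation of strings.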
I anticipate that the hardest thing to accept about my proposal is that it insists on using this notation and refuses to directly accept unvarnished strings as variable names. I propose for [set var value] to be an error: "expected variable name; got string" or something like that.

But everything is a string! We can't have that! Oh no!

Variable references already run afoul of EIAS, so this'll happen anyway. Plus it is an error (I think) to attempt to take the string (or numeric or list or whatever) representation of a word produced by {null}. I say this to show you that this kind of error is not so out-of-place in Cloverfield.

I should explain why I think this last part of my proposal is desirable. It encourages safe, consistent programming. Imagine if [expr] error'ed if passed an unbraced argument. Better yet, imagine if SQLite did. This would limit the flexibility of the language, as there are sometimes legitimate reasons to have the expression or query be programmatically generated, but it would also make it impossible to write code that is vulnerable to injection attacks! The world would definitely be a better place if SQL injection attacks didn't exist.

Unlike [expr] and SQLite requiring braces, my proposal does not limit the flexibility of the language; it does not prevent programmatically generated variable names. It just makes them slightly harder to have. In practice I imagine this will mean that no one will get them by accident. Here's a case in Tcl where a programmatically generated variable name is obtained by accident:

% set var 1
# somewhere later in the code...
% set $var 2
# now the programmer is confused as to why $var is still 1

Here's code in Cloverfield (including my proposal) that behaves the same way:

% set @var 1
% set @$var 2

As you can see, it's not really any harder to get a programmatically generated variable name, but the syntax for it is visually distinct from both ordinary variable naming and variable substitution, on account of it starting with a two-character sequence. Plus, syntax highlighters are likely to put @var and $var in different colors, let's say green and red. @$var would show up as a green @ followed by a red $var, making this error very easy to spot.

Plus think of all the other places where people have trouble remembering when a command wants the name of a variable versus its value. With this proposal, everywhere a name is expected but a value is passed, an error can be raised, immediately showing the programmer where the problem is. Again, indirection is still possible; it just takes proper syntax that probably won't be typed by accident.

The remaining problem is getting the name of a variable as a string. This can be done with an [info]-style command. I suggest that the command also be capable of building up names from parts and decomposing them back into constituent parts. Here, a variable's parts are its base name, namespace qualifiers, vectored indexes, and keyed indexes. As for [upvar]-like functionality, possibly a notation can be invented for references into parent stack frames, most likely as a special kind of namespace.

> Back to the original problem. We wanted to be able to pass variables
> by name or by reference. Following the discussion on Tcl-Core,

Actually I have not been following the Tcl-Core discussion. I have all the mails that have been cross-posted back to Tcl9, but I don't know if every message of the entire thread was cross-posted. Can you check for me? I'm not on Tcl-Core, and I have never succeeded in viewing the Tcl-Core archives online. (Freakin' SourceForge...)
Once I am sure I have the whole discussion, I can read it and reply to it as a whole.

> I think that the distinction between names, values, strong and weak
> refs is interesting and would work quite well once the concept is
> understood. Pass-by-name semantics could be achieved by weak
> references using $@ for commands accepting plain values.

I take it that the $@ notation arose from discussion on Tcl-Core.

If a name is a weak reference, it must be possible to name variables in the caller's stack frame.

> The other side of the problem was brace matching in rule [5]. I came
> to the conclusion that the rules I propose are too complicated, as
> they duplicate some, but not all, of the other rules of the
> Tridekalogue, to identify the places where special chars are
> significant.

That's true; there's a lot to it. But it does make it less surprising to use. I don't think it's too much, since I was able to implement it.

> If I remember correctly, your parser implemented rule [5] recursively,
> as if braced strings were properly formatted code.

No, that's not how I did it. The only recursion in my parser is for command substitution. I planned for variable substitution to also be recursive, but I haven't got around to that part just yet. I instead decided to return my focus to language design issues, having done enough implementation to give myself a feel for what we were getting ourselves into. :^)

My parser does occasionally reprocess a character after changing states. This is not recursion because a new, temporary set of state variables is not created.

I suggest you study in detail how my parser works. It's heavily commented, but it's still unfortunately quite difficult to understand. You might need to draw a few diagrams to walk through some test cases.

Basically I have a second, inner state machine for braces. The primary state machine is driven by the following variables:

$state ;# Current state.
$quote ;# Word quoting style.
The secondary state machine for braces is driven by the following:

$braces ;# Number of nested brace pairs.
$brquote ;# Inside double quotes inside brace pairs?
$brcomment ;# Inside a comment inside brace pairs?
$brspace ;# Inside whitespace inside brace pairs?

The following two variables are shared between the two:

$backslash ;# Inside a backslash sequence?
$bsseq ;# Characters in the backslash sequence.

$brquote is analogous to $quote, except that only double quotes are supported. $brcomment and $brspace are analogous to $state. It's probably a good idea to combine $brquote, $brcomment, and $brspace into $brstate, since they're mutually exclusive. I might just do that...

$begin doesn't affect the behavior of the state machine; it's only there to determine if $out_word contains a valid (possibly empty string) word or if it's invalid because the command just started. The other variables are either output buffers or lookup tables.

> While it was slightly incorrect given the rules at this time, I think
> you had a good intuition there.

Thanks, I guess. :^)

> So the current rule [5] that allows arbitrary sequences of chars
> between braces, with a set of exceptions to activate brace matching,
> is a dead end.

I disagree. You say it's a dead end because it's complicated and doesn't exactly match the rest of Cloverfield. Well, it is the way it is because you only kept the parts of Cloverfield that were relevant. That would be comments, double quotes, and backslashes. And in order to detect comments and double quotes, word starts must also be tracked. It took surprisingly little code for me to handle this.

> Rather, rule [5] should require braced expressions to be properly
> formatted Tcl scripts, following rule [1] in a recursive manner.

Does that mean that {[} will be illegal? Or what about {{xxx}x} or {"xxx"x}?

--
Andy Goth | <unu...@ai...> | http://andy.junkdrome.org/ |
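The flat brace machine described in the message above can be approximated in a few lines. This Python fragment is an illustration only: it keeps just the depth counter and the quote/backslash flags (named after $braces, $brquote, and $backslash), and omits the comment and word-start tracking the real parser does.

```python
def match_braces(text):
    """Return the index one past the brace matching text[0].

    A deliberately simplified model of the inner state machine: a depth
    counter plus flags, scanning linearly with no recursion.
    """
    assert text[0] == "{"
    braces = 0         # open brace pairs            (cf. $braces)
    brquote = False    # inside double quotes?       (cf. $brquote)
    backslash = False  # previous char was backslash (cf. $backslash)
    for i, ch in enumerate(text):
        if backslash:
            backslash = False      # escaped char loses special meaning
        elif ch == "\\":
            backslash = True
        elif ch == '"':
            brquote = not brquote  # braces inside quotes don't count
        elif not brquote:
            if ch == "{":
                braces += 1
            elif ch == "}":
                braces -= 1
                if braces == 0:
                    return i + 1
    raise ValueError("unmatched brace")
```

The point of the model is that no recursive call is ever made: nested braces, quoted braces, and backslashed braces are all handled by the same forward scan, which is the distinction drawn above between a recursive implementation and recursive *rules*.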
From: Frédéric B. <fb...@en...> - 2008-05-30 14:46:35
|
Alexandre Ferrieux wrote:
> On Thu, May 29, 2008 at 3:02 PM, Andy Goth <unu...@ai...> wrote:
>
>> I daresay that an elegant language will have a harder time being accepted by
>> the masses because it's harder to (or the designers are loath to) shove in
>> large amounts of random functionality for every little thing.
>
> Yes. It should be clear by now that acceptance by the masses is just
> as interesting a criterion for us as is East Coast weather to a Mars
> orbiter :-)

One of the goals of Cloverfield is to minimize the WTF/LOC ratio when compared to Tcl. Of course this won't turn it into an object of desire for Joe Random Coder, but I hope that this will generate enough PR buzz among the educated masses to bring back competent people who were otherwise put off by the peculiarities of the current Tcl (the worst offender being comments), hoping that they'll stay once they discover the inner gems hidden in the Tcl Way.

I plan to make an announcement on major sites such as OSNews, for example once we have an LLVM-based interpreter, even if it's barely functional. |
From: Alexandre F. <ale...@gm...> - 2008-05-29 13:21:03
|
On Thu, May 29, 2008 at 3:02 PM, Andy Goth <unu...@ai...> wrote:
> I daresay that an elegant language will have a harder time being accepted by
> the masses because it's harder to (or the designers are loath to) shove in
> large amounts of random functionality for every little thing.

Yes. It should be clear by now that acceptance by the masses is just as interesting a criterion for us as is East Coast weather to a Mars orbiter :-)

-Alex |
From: Andy G. <unu...@ai...> - 2008-05-29 13:02:26
|
On Thu, 29 May 2008 11:33:10 +0200, Frédéric Bonnet wrote
> http://www.codinghorror.com/blog/archives/001119.html

Nice article, thanks. It does answer my questions. I wasn't aware of this aspect of PHP because when I last used it (many years ago) I had limited myself to a small subset of its *cough* functionality. I pretended it was a simple language, and I used it to do simple things. :^) That's why I recently was able to tell myself with a straight face that I had reimplemented PHP in Tcl:

http://andy.junkdrome.org/growth/data/site.tcl

Look at [emit_template]. It gets used on files like this:

http://andy.junkdrome.org/growth/data/template

This file escapes to Tcl mode by $varsub, [cmdsub], \backsub, and lines beginning with %percent signs.

> "PHP Sucks, But It Doesn't Matter"

Interesting concept. Apparently momentum/"thrust" can make up for sucky design. Pigs do fly, if you have catapults. Or jet packs. (BACON!) This certainly isn't a desirable goal to have, though! But it does show that when languages (and other such systems, e.g. libraries, operating systems, methodologies, development tools) get compared, their design quality doesn't contribute much to their rank. What's important is the momentum the world has bestowed upon them.

The upshot is that no matter how many language comparisons show that Visual Basic is superior to Tcl, you can't conclude that Visual Basic is a better *language* than Tcl, only that it's more widely used by people or projects who directly or indirectly contributed to the comparison(s). I daresay that an elegant language will have a harder time being accepted by the masses because it's harder to (or the designers are loath to) shove in large amounts of random functionality for every little thing. Apparently the average workaday programmer just wants a language with a very large toolbox. It's like preferring to have a million wrenches over having a single adjustable wrench.
I can't understand this preference, but there's a very long list of things on which I can't understand my fellow man, so I'll leave it alone.

--
Andy Goth | <unu...@ai...> | http://andy.junkdrome.org/ |
From: Frédéric B. <fb...@en...> - 2008-05-29 11:56:40
|
Hi again,

After much thinking, I've decided to give up this whole grouping thing. Not that the very concept of grouping is worthless; after all this concept is implicit even in Tcl. But it creates a new significant departure from Tcl that will make it even harder for Tclers to accept Cloverfield as a possible successor. People already have a hard time grasping concepts such as quoting, let alone a new one. And it also opens a new can of worms. Further comments below.

Andy Goth wrote:
> In this email I discuss variable access as well as grouping, since it seems
> to me that your grouping proposal is primarily targeted at making variable
> access work the way we want it to. Making code work the same inside braced
> words as at the top level is a secondary benefit of your proposal, not the
> original impetus. There are other ways to achieve that if it truly is a
> goal rather than a pleasant side effect of something else.

Well, it was rather an attempt at killing two birds with one stone. On one hand, this grouping rule simplified brace matching a lot. On the other hand it improved the consistency of variable substitution vs. variable naming.

But regarding the latter, the show-stopper was the order of substitution. If we want to handle things consistently then the characters between braces wouldn't be substituted by the parser but by the command, e.g. [set]. This means that substitution would be delayed. But this is inconsistent with indexed variable substitution rules. OTOH if substitution is performed on such braced groups, then not only is it inconsistent with quoting rules, but the substitution also has to be word-preserving in order to work as with indexed variables. IOW, we end up with the same problem as with [expr $i+$j] vs. [expr {$i+$j}]. So this proposal didn't solve either of the problems in a satisfying manner.

> In general, the security hole is double evaluation.
> Both of these problems
> (performance and security) are also experienced when not bracing the
> argument to [expr].
>
> Here's another case, using parentheses instead of braces.
>
> % set b "first)(second"
> % set a($b) 0
>
> [set]'s first argument will be "a(first)(second)", which will result in the
> variable lookup mechanism seeing two keyed indexes. The programmer may have
> intended to have one key with embedded backwards parentheses. Heck, the
> programmer might not have even supplied the index name; maybe it's
> somebody's nickname read in from IRC!

I see you reached the same conclusion.

> Safe code:
>
> % set a{\$b} 0; set a(\$b) 0
> % set {a{$b}} 0; set {a($b)} 0
>
> If all variable names containing embedded substitutions must be quoted by
> the programmer in this manner, what is the benefit of your proposal?

So not only does it fail to solve anything, but it actually makes the problem worse. False good idea ;-)

********************************

Back to the original problem. We wanted to be able to pass variables by name or by reference. Following the discussion on Tcl-Core, I think that the distinction between names, values, strong and weak refs is interesting and would work quite well once the concept is understood. Pass-by-name semantics could be achieved by weak references using $@ for commands accepting plain values.

It would still be possible to write commands that expect plain names instead of references or values, for example introspection or declaration commands, but in this case:

- regular quoting rules would apply on the variable name, which is passed as string anyway
- in such use cases, it makes little sense to use indexed variables, so only the variable *name* part would be passed without the *index* part.

The latter implies that there will hardly be any problem passing a variable by name, since these problems arise because of a clash between quoting and variable substitution rules.
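The keyed-index clash in the quoted "first)(second" example is easy to reproduce: any lookup that splits indexes *after* substitution will see two keys where one was intended. A small Python sketch (illustrative only; this is my naive model, not the actual Tcl or Cloverfield parser):

```python
import re

def naive_indexes(varname):
    """Split a post-substitution name like 'a(x)(y)' into a base name
    and its keyed indexes, the way a lookup that parses *after*
    substitution would: every (...) group becomes a separate index."""
    m = re.match(r"([^(]*)((?:\([^()]*\))*)$", varname)
    if m is None:
        raise ValueError("malformed variable name")
    base = m.group(1)
    indexes = re.findall(r"\(([^()]*)\)", m.group(2))
    return base, indexes
```

With an attacker-controlled key `b = "first)(second"`, substituting into `a($b)` produces the string `a(first)(second)`, and the naive split yields two indexes instead of the one the programmer intended, which is exactly the double-evaluation hole described above.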
The only problem that would persist is passing a variable name that looks like an indexed variable, e.g.:

global var{$index}

In this case the whole word would designate the variable name, interpreted as such by the called command, without any index part. Clearly a beginner's problem, not something that we should worry about, because in the case of [global] it makes no sense to declare as global an element and not the whole variable.

Another problem is with [set]. If we want to keep the existing semantics (and I think we should) then passing indexed variables leads to the same problem as passing array elements with strings in their name. So this will require appropriate quoting as well. Again, nothing to worry about with proper documentation and an existing Tcl background.

********************************

The other side of the problem was brace matching in rule [5]. I came to the conclusion that the rules I propose are too complicated, as they duplicate some, but not all, of the other rules of the Tridekalogue, to identify the places where special chars are significant.

If I remember correctly, your parser implemented rule [5] recursively, as if braced strings were properly formatted code. While it was slightly incorrect given the rules at this time, I think you had a good intuition there.

So the current rule [5] that allows arbitrary sequences of chars between braces, with a set of exceptions to activate brace matching, is a dead end. Rather, rule [5] should require braced expressions to be properly formatted Tcl scripts, following rule [1] in a recursive manner. This is a significant departure from Tcl, but I don't think it's a problem as long as we provide alternative quoting rules for arbitrary data (which we do). After all, Tcl only provided two quoting rules using either doubly quoted strings or braced expressions. Handling arbitrary data required a combination of the two, plus quoting hell implying heavy uses of backslashes.
Since Cloverfield provides {data} heredocs, corner cases will be solved much more easily. For example, including plain C code in Critcl cprocs is much easier with heredocs than with plain braced expressions, as the latter require a careful examination of the quoted string in order not to break Tcl's syntactic rules.

********************************

Let me know if you think it solves our problems.

Cheers, Fred |
From: Frédéric B. <fb...@en...> - 2008-05-29 09:27:51
|
[sorry for the delay. Been busy catching up]

Andy Goth wrote:
>> We need to be more pragmatic (but of course not too pragmatic else we'll end
>> up like PHP).
>
> Please explain what it means for PHP to be pragmatic.

I was about to write about the ugliness and inconsistency of array and file system support in PHP, but Coding Horror did it better than I would:

http://www.codinghorror.com/blog/archives/001119.html
"PHP Sucks, But It Doesn't Matter"

PHP is pragmatic in the sense that it is geared towards Web development. And it does it very well. In that context it makes sense to have functions that convert arrays to and from HTML data, or a function that opens and parses file data as CSV. However PHP sucks for general purpose development because of this profusion of calls. OTOH Tcl is a general purpose development language, so we must avoid creeping featurism and remain minimalist and elegant in all circumstances. The profusion of domain-specific calls that is rampant in PHP needs to be properly packaged in Tcl. |
From: Frédéric B. <fb...@en...> - 2008-05-29 09:08:30
|
Alexandre Ferrieux wrote:
>> Rather, I think that we should provide a means to
>> obtain (notice the choice of word) this string representation on demand,
>> for example using C++-style iterators, or using rope structures.
>> Besides, there is a strong duality between iterators and ropes: they are
>> two facets of the same concept, the former on the algorithmic level, the
>> latter on the structural level.
>
> Yeah of course, but caches are the oil that allows many gears to roll smoothly.
> Without them, much more than Tcl would come to a grinding halt!
> The ->bytes field of Tcl_Obj can be seen as just that: a cache of the
> on-demand computation embodied by UpdateStringOf* functions. Then
> there is a delicate trade-off between the relative costs of forward
> string computation and cache consistency enforcement.
>
> Addressing this trade-off correctly is the result of the superior
> insight behind Tcl 8.0 (within the realm of immutable values).
> Addressing it with mutables is another story, but I doubt the
> solution will be "no cache".

That's right. However with ropes the cached values could be decentralized, i.e. each node could hold its own cached value, whereas today's Tcl_Objs only hold a single cached value for the whole object. If for example each element of a list manages its own cached value, then changing one element has no consequence on its siblings or on the whole list.

>> Moreover I think that implementing references or partially mutable
>> values is simply impossible in the existing implementation, given the
>> current semantics of Tcl (COW, pass-by-value, immutability...), and that
>> is the cause of the (relative) failure of your patch.
>
> You're roughly right, but the actual reason is a bit more subtle.
> COW is rather easily handled by proper perturbation of Tcl_IsShared
> (see the code).
> The show-stopper in my case was the interaction between mutables and
> uncontrolled iterators.
> Indeed, while it is easy to notice that
> iterating over a mutable poses a problem, it is impossible to prevent
> an extension defining its own iterators from taking (say) a snapshot of
> the "elements" array of a (secretly mutable) list. It's more a
> question of contract (in a years-old API) between extensions and the
> core than anything else. The API was born and designed in an immutable
> world, and it keeps the stigmata :-(

That's another reason to change the API. Which means that we lose binary compatibility with previous versions, but that's the price to pay, and major version bumps are made for that anyway.

> Yes. Simple refcounting GC gone. Go to mark-and-sweep. Worry about
> GC-triggering situations, and ugly worst cases, etc... You're
> rebuilding the Empire State Building, and you're right to start with
> the foundation, but what's not clear is whether you'll be reusing more
> than a few bricks from Tcl ... but maybe you want only the dwellers,
> not the bricks ;-)

If the cement is the core, and the bricks are the commands, then I'm only changing the cement while trying to keep the bricks. The new cement will glue better and allow us to build higher, stronger and more audacious buildings ;-)
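The decentralized-cache idea above (each rope node caching its own string, so touching one leaf leaves sibling caches intact) can be modeled in a few lines. This Python toy is my own illustration; the class names and structure are assumptions, nothing like the real C data structures under discussion.

```python
class Leaf:
    """Flat string segment; its text is its own representation."""
    def __init__(self, text):
        self.text = text
    def flatten(self):
        return self.text

class Concat:
    """Inner rope node holding its own string cache, in the spirit of
    Tcl_Obj->bytes but decentralized: one cache per node rather than
    one cache for the whole value."""
    def __init__(self, left, right):
        self.left, self.right = left, right
        self._cache = None
    def flatten(self):
        if self._cache is None:   # compute on demand, then keep it
            self._cache = self.left.flatten() + self.right.flatten()
        return self._cache
    def invalidate(self):
        self._cache = None        # drops only this node's cache
```

Replacing one child and invalidating only the ancestors on the path to the root leaves every sibling subtree's cached string valid, which is the locality win claimed above for per-element caches in lists.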