First, let me repeat my apology about not voting on TIP #202.
The one-week deadline took me rather by surprise, particularly
inasmuch as I was travelling for four days out of the week.
(I do hope that the community realizes that I seldom miss a
roll call.)
As I mentioned before, I've some vague misgivings about TIP #202.
On the surface, it seems like a fine TIP - small, self-contained,
and at least moderately useful (albeit primarily to achieve
performance). Nevertheless, I suspect that its proposed API may
encumber us moving forward.
I quite agree with the TIP authors that it is wise to keep the
interfaces to arrays and dicts as nearly parallel as possible.
This consideration strongly suggests that if dicts support [dict
values], then arrays indeed ought to support [array values].
On reflection, though, it seems to me that [dict values] as
implemented is not the best choice - and we still, fortunately,
have time to fix it before it becomes set in stone in a general
release.
I begin to see [dict] as a forerunner of other container classes.
In particular, I've been spending a little bit of time thinking
about an ordered dictionary - one in which the keys are kept in
ascending order according to some rule like the ones that [lsort]
uses. One could also imagine containers with free-text-searchable
keys (perhaps using an algorithm like Patricia), ones that support
orthogonal range queries (quadtrees, perhaps), and other similar
ideas. It is significant that a very large fraction of the [dict]
API could apply - entirely without modification - to such containers.
In most systems that support containers, though, all searching is
performed by key; if you need searching by value, you create an
inverted container to support the search. The fact that both dicts
and arrays are hash tables means that we Tcl'ers normally think of all
searching as linear ([dict keys] and [dict values] are both
about equally expensive), but with other containers, there is
a huge difference. A container that maintained its keys in
lexicographic order could support [ordereddict keys $dict foo*] very
efficiently, by probing for the smallest key greater than or
equal to 'foo' and then proceeding linearly until encountering
a key that does not match the pattern. (Many commercial SQL data
bases do this sort of optimisation on LIKE queries routinely.)
Given the idea that selecting among keys is natural, while selecting
among values is always a linear search, I have managed to convince
myself that [dict values] would be much more natural if it were to
return the values corresponding to keys that match a given pattern,
rather than the values that match the pattern; that is,
[dict values $dict foo*] should be analogous to
select value from $dict where key like 'foo%'
and not
select value from $dict where value like 'foo%'
If [dict values] and [array values] were both to work this way -
or if an alternative API that does work this way were proposed -
I'd be much more comfortable with this TIP.
I also suspect that any performance gains from [dict values] will
be moot if we ever get around to doing proper iterators. With
iterators, there should be little performance difference between
foreach value [array values theArray] { ... }
and
foreach { key value } [array get theArray] { ... }
since there would be no need to form an explicit list with
the result of [array get]. Another huge win would be to make
arrays use Tcl_Obj's as their keys, so that array operations don't
necessitate string copying.
Nevertheless, since I don't have the implementation time to spend
on doing things my way at the moment, I'm extremely reluctant to vote
NO - on the grounds that anything is better than nothing. My
indecision caused me to miss the rollcall.
--
73 de ke9tv/2, Kevin KENNY GE Corporate Research & Development
kennykb@... P. O. Box 8, Bldg. K-1, Rm. 5B36A
Schenectady, New York 12301-0008 USA
|