From: Andy G. <unu...@ai...> - 2010-02-03 04:33:55
|
"Tom Jackson" <tom...@gm...> wrote: > Your point is that a dict does not handle multiple entries with identical > keys. I don't want dict to handle multiple entries with identical keys. That's unmanageable and doesn't make sense. A key/value list, on the other hand, can sensibly have multiple entries with identical keys, because like all lists, key/value lists are indexed by numerical position (which is always unique for each entry) and not by key (which is just a convention imposed by this particular application of list). > HTTP headers do not fit into a dictionary type structure, HTTP headers > have an order, which is not maintained with a dict. Hence the key/value list, which does preserve order. But it should still be possible to leverage [dict get]'s constant-time key lookup capability, for keys that can reasonably be expected to not repeat. Dict also preserves order, but it does not preserve not duplication. Condensing the values for duplicate keys into a list brings back duplication but partially sacrifices order. However, for HTTP I don't think it's important to track precisely how different duplicate keys get interleaved, only the order of values for any particular key, so I don't mind the loss in this case. this: {key1 val1 key2 val2 key1 val3 key2 val4} is the same as: {key1 val1 key1 val3 key2 val2 key2 val4} and can safely be stored as: {key1 {val1 val3} key2 {val2 val4}} > When dict was still in development I complained about the use of an array > to store keys, because adding a new entry would destroy order...or order > would be different based upon the platform. Dict has a well defined order. The order of keys in the dict matches the order the keys were created. The value for each key is determined by what value was most recently set for that key. My first email included an example, which I paraphrase here: % dict get {a 0 b 1 a 2 a 3 c 4} a 3 b 1 c 4 The first version of dict I played around with didn't have this ordering property. Keys showed up in whatever order the Tcl hash table functions felt like using, same as with [array names]. Is that what you meant by array? Now it chains together the hash entries with a doubly linked list. > Even with the new list-like ordering, as soon as you add a duplicate key, > the order of the original data is lost. I don't want it to be possible to add a duplicate key to a dict. Any attempt to do so would only replace the key. My goal is to not change the outward behavior of [dict] or Tcl_*Dict*() in any way. External code won't be able to tell the difference unless it measures performance, and I expect performance to improve due to increased locality of reference and decreased usage of ckalloc(). > Another issue with HTTP headers is that the key is not case > sensitive. Host, hOst, hoSt, hosT and HOST are all equivalent. So I normalize the case first. My application doesn't care about case. If I decide that I need to remember case or ordering, I can just make a separate list to keep track of that. This won't cost all that much due to shared Tcl_Obj. Input: {key1 val1 key2 val2 KEY1 val3 KEY2 val4} Data : {key1 {val1 val3} key2 {val2 val4}} Order: {key1 key2 KEY1 KEY2} > One example of a data structure which fits HTTP headers...because it was > designed to handle them is an ns_set > http://rmadilo.com/files/nsapi/ns_set.html I'm sure it internally keeps multiple "views" on the same data, conceptually similar to what I show above. In other words, it's a data structure that is an aggregate of data structures. It could have both an ordered list and a hash table pointing at the same pool of values. > Although this is not available in Tcl outside of AOLserver, I have ported > most of the functionality to pure tcl, called an nvlist: > http://www.junom.com/gitweb/gitweb.perl?p=tnt.git;a=tree;f=packages/tnt/tcl I don't see a direct relationship between ns_set and nvlist. At one level, nvlist seems like a single-table relational database with the WHERE cause in SELECT, UPDATE, or DELETE matching zero or more contiguous columns, and with a UNIQUE constraint on the concatenation of all columns, such that no two rows can be equal to each other, even though they might have some equal columns. Then there are higher-level procedures designed for a two-column table mapping between key and value. Duplicate keys and duplicate values can exist, but the same key-value mapping can't be made multiple times. I think this can be used to implement ns_set by making a three-column table mapping between ordinal, key, and value. > There is example code which uses the nvlist to handle HTTP headers, I can't find it. > and I compare the use to tcl [dict] in the nvlist-init.tcl file. testEmployees? I see two different methods of using [addItem], one like Tcl [dict] and the other like a table. The tabular method uses lists where each element's position determines its meaning. There's an index list mapping between positions and column names. After I'm done with this [dict] change, I would like to write a TIP to create a new data structure that behaves like dict but doesn't have values. I can't really call it a set, since that name is taken. Maybe it's a monobag. ;^) It would have one important operation not present in classical sets: numerical indexes. Anyway, I mention this because a monobag would work very well for making a list of column headers. % set fields {id forenames surname street city phone} % monobag search $fields surname 2 % lappend data {12345-A Joe Schmoe "147 Short St" Springfield 555-1234} % lappend data {98372-J Anne Other "32995 Oakdale Wy" Springfield 555-8765} % set surnames {} % foreach row $data { lappend surnames [lindex $row [monobag search $fields surname]] } % set surnames {Schmoe Other} Really, it's just a list without duplicate elements, combined with constant-time lookup of index given value. It can be implemented using the data structure I already presented; just remove the *2 and /2's. > But these specialized features come with a penalty: you have to create a > fixed location for the data, just like a Tcl array. Your nvlist code can be modified to take nvlist values rather than variable names. [addItem] can return a new nvlist formed by modifying its argument, the same as [lreplace]. Tcl arrays have this restriction because they're not values, they're collections of variables. But they can easily be converted to and from values. I'll have to check if the Tcl_Obj returned by [array get] has a list or dict intrep. > A [dict] gives you lots of tools, but works on portable data. That is a > cool feature. Definitely. Artificially forcing the programmer to name stuff creates lots of headaches. Values are anonymous by default. :^) -- Andy Goth | http://andy.junkdrome.org./ unununium@{aircanopy.net,openverse.com} |