You can subscribe to this list here.
2000 |
Jan
|
Feb
|
Mar
|
Apr
|
May
|
Jun
(19) |
Jul
(96) |
Aug
(144) |
Sep
(222) |
Oct
(496) |
Nov
(171) |
Dec
(6) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2001 |
Jan
(4) |
Feb
(4) |
Mar
(9) |
Apr
(4) |
May
(12) |
Jun
(6) |
Jul
|
Aug
|
Sep
(1) |
Oct
(2) |
Nov
|
Dec
|
2002 |
Jan
|
Feb
|
Mar
|
Apr
|
May
|
Jun
(1) |
Jul
(52) |
Aug
(47) |
Sep
(47) |
Oct
(95) |
Nov
(56) |
Dec
(34) |
2003 |
Jan
(99) |
Feb
(116) |
Mar
(125) |
Apr
(99) |
May
(123) |
Jun
(69) |
Jul
(110) |
Aug
(130) |
Sep
(289) |
Oct
(211) |
Nov
(98) |
Dec
(140) |
2004 |
Jan
(85) |
Feb
(87) |
Mar
(342) |
Apr
(125) |
May
(101) |
Jun
(60) |
Jul
(151) |
Aug
(118) |
Sep
(162) |
Oct
(117) |
Nov
(125) |
Dec
(95) |
2005 |
Jan
(141) |
Feb
(54) |
Mar
(79) |
Apr
(83) |
May
(74) |
Jun
(125) |
Jul
(63) |
Aug
(89) |
Sep
(130) |
Oct
(89) |
Nov
(34) |
Dec
(39) |
2006 |
Jan
(98) |
Feb
(62) |
Mar
(56) |
Apr
(94) |
May
(169) |
Jun
(41) |
Jul
(34) |
Aug
(35) |
Sep
(132) |
Oct
(722) |
Nov
(381) |
Dec
(36) |
2007 |
Jan
(34) |
Feb
(174) |
Mar
(15) |
Apr
(35) |
May
(74) |
Jun
(15) |
Jul
(8) |
Aug
(18) |
Sep
(39) |
Oct
(125) |
Nov
(89) |
Dec
(129) |
2008 |
Jan
(176) |
Feb
(91) |
Mar
(69) |
Apr
(178) |
May
(310) |
Jun
(434) |
Jul
(171) |
Aug
(73) |
Sep
(187) |
Oct
(132) |
Nov
(259) |
Dec
(292) |
2009 |
Jan
(27) |
Feb
(54) |
Mar
(35) |
Apr
(54) |
May
(93) |
Jun
(10) |
Jul
(36) |
Aug
(36) |
Sep
(93) |
Oct
(52) |
Nov
(45) |
Dec
(74) |
2010 |
Jan
(20) |
Feb
(120) |
Mar
(165) |
Apr
(101) |
May
(56) |
Jun
(12) |
Jul
(73) |
Aug
(306) |
Sep
(154) |
Oct
(82) |
Nov
(63) |
Dec
(42) |
2011 |
Jan
(176) |
Feb
(86) |
Mar
(199) |
Apr
(86) |
May
(237) |
Jun
(50) |
Jul
(26) |
Aug
(56) |
Sep
(42) |
Oct
(62) |
Nov
(62) |
Dec
(52) |
2012 |
Jan
(35) |
Feb
(33) |
Mar
(128) |
Apr
(152) |
May
(133) |
Jun
(21) |
Jul
(74) |
Aug
(423) |
Sep
(165) |
Oct
(129) |
Nov
(387) |
Dec
(276) |
2013 |
Jan
(105) |
Feb
(30) |
Mar
(130) |
Apr
(42) |
May
(60) |
Jun
(79) |
Jul
(101) |
Aug
(46) |
Sep
(81) |
Oct
(14) |
Nov
(43) |
Dec
(4) |
2014 |
Jan
(25) |
Feb
(32) |
Mar
(30) |
Apr
(80) |
May
(42) |
Jun
(23) |
Jul
(68) |
Aug
(127) |
Sep
(112) |
Oct
(72) |
Nov
(29) |
Dec
(69) |
2015 |
Jan
(35) |
Feb
(49) |
Mar
(95) |
Apr
(10) |
May
(70) |
Jun
(64) |
Jul
(93) |
Aug
(85) |
Sep
(43) |
Oct
(38) |
Nov
(124) |
Dec
(29) |
2016 |
Jan
(253) |
Feb
(181) |
Mar
(132) |
Apr
(419) |
May
(68) |
Jun
(90) |
Jul
(52) |
Aug
(142) |
Sep
(131) |
Oct
(80) |
Nov
(84) |
Dec
(192) |
2017 |
Jan
(329) |
Feb
(842) |
Mar
(248) |
Apr
(85) |
May
(247) |
Jun
(186) |
Jul
(37) |
Aug
(73) |
Sep
(98) |
Oct
(108) |
Nov
(143) |
Dec
(143) |
2018 |
Jan
(155) |
Feb
(139) |
Mar
(72) |
Apr
(112) |
May
(82) |
Jun
(119) |
Jul
(24) |
Aug
(33) |
Sep
(179) |
Oct
(295) |
Nov
(111) |
Dec
(34) |
2019 |
Jan
(20) |
Feb
(29) |
Mar
(49) |
Apr
(89) |
May
(185) |
Jun
(131) |
Jul
(9) |
Aug
(59) |
Sep
(30) |
Oct
(44) |
Nov
(118) |
Dec
(53) |
2020 |
Jan
(70) |
Feb
(108) |
Mar
(50) |
Apr
(9) |
May
(70) |
Jun
(24) |
Jul
(103) |
Aug
(82) |
Sep
(132) |
Oct
(119) |
Nov
(174) |
Dec
(169) |
2021 |
Jan
(75) |
Feb
(51) |
Mar
(76) |
Apr
(73) |
May
(53) |
Jun
(120) |
Jul
(114) |
Aug
(73) |
Sep
(70) |
Oct
(18) |
Nov
(26) |
Dec
|
2022 |
Jan
(26) |
Feb
(63) |
Mar
(64) |
Apr
(64) |
May
(48) |
Jun
(74) |
Jul
(129) |
Aug
(106) |
Sep
(238) |
Oct
(169) |
Nov
(149) |
Dec
(111) |
2023 |
Jan
(110) |
Feb
(47) |
Mar
(82) |
Apr
(106) |
May
(168) |
Jun
(101) |
Jul
(155) |
Aug
(35) |
Sep
(51) |
Oct
(55) |
Nov
(134) |
Dec
(202) |
2024 |
Jan
(103) |
Feb
(129) |
Mar
(154) |
Apr
(89) |
May
(60) |
Jun
(162) |
Jul
(201) |
Aug
(61) |
Sep
(167) |
Oct
(111) |
Nov
(133) |
Dec
(141) |
2025 |
Jan
(122) |
Feb
(88) |
Mar
(106) |
Apr
(113) |
May
(203) |
Jun
(185) |
Jul
(124) |
Aug
(202) |
Sep
(176) |
Oct
(42) |
Nov
|
Dec
|
From: Donal K. F. <don...@ma...> - 2010-02-05 16:46:54
|
On 05/02/2010 16:12, Andreas Kupries wrote: > One thing I seem to remember regarding closing of a channel is that there are > tricks around serial devices, possibly only on windows. And I wonder if changes > made in and around that area caused anything. And conversely, we have to be > careful now to not break them with the proposed edits. I can remember removing explicit destruction of the pending serial buffers on exit from Tcl on Unix. It was losing error messages that had already been committed to stderr but not consumed by the other side of the terminal. (Evil bug that...) But I don't think that had anything to do with Tcl's flushing really. Donal. |
From: Andreas K. <and...@ac...> - 2010-02-05 16:16:27
|
Alexandre Ferrieux wrote: > On 2/5/10, Alexandre Ferrieux <ale...@gm...> wrote: >> On 2/4/10, Wayne Cuddy <wc...@us...> wrote: >> > >> > TIP #361: RELEASING CHANNEL BUFFERS >> >> >> Uh, is it just me, or is the problem very different in 8.6 HEAD ? >> >> Comment out the 'close' above, and the [exit] blocks again.... >> Note that this *might* be the effect of one of the shortcuts done in >> the 'exit reform' [Bug 2001201]. > > > Update: the exit reform is not guilty, since this bug is also present > in 8.4 and 8.5 HEADs ! > > I managed to put my hands on a 8.3.5, which doesn't have the bug. And > Wayne has a (presumably older) 8.4 without the bug. We should be able > to isolate the commit rather easily now ;-) > > What happens is that TclFinalizeIOSubsystem looks at the channels' > state flags, and doesn't attempt the flush when certain flags are set. > Among these, CHANNEL_CLOSED. This is Too Bad because a just-[closed] > channel in its background flush trip has this bit set, along with > BG_FLUSH_SCHEDULED. It then looks like the fix consists of reallowing > a flush when both flags are set. Not doing it immediately since I have > to figure out how to make that flush foreground again. > > Andreas must have an opinion ;-) Yes. This is a hairy monster, and I am glad that you (Alex) are willing to look into it. That said I recommend to bisect the thing to see where the change came in, and then look in the context for more information about the change. I.e. to find out if it was accidental, or intentional, and in the case of the latter, why. One thing I seem to remember regarding closing of a channel is that there are tricks around serial devices, possibly only on windows. And I wonder if changes made in and around that area caused anything. And conversely, we have to be careful now to not break them with the proposed edits. -- Sincerely, Andreas Kupries <an...@ac...> Developer @ <http://www.activestate.com/> |
From: Alexandre F. <ale...@gm...> - 2010-02-05 14:37:47
|
On 2/5/10, Alexandre Ferrieux <ale...@gm...> wrote: > > What happens is that TclFinalizeIOSubsystem looks at the channels' > state flags, and doesn't attempt the flush when certain flags are set. > Among these, CHANNEL_CLOSED. This is Too Bad because a just-[closed] > channel in its background flush trip has this bit set, along with > BG_FLUSH_SCHEDULED. It then looks like the fix consists of reallowing > a flush when both flags are set. Not doing it immediately since I have > to figure out how to make that flush foreground again. Bug now recorded as 2946474, and patch attached. -Alex |
From: Alexandre F. <ale...@gm...> - 2010-02-05 10:52:01
|
On 2/5/10, Alexandre Ferrieux <ale...@gm...> wrote: > On 2/4/10, Wayne Cuddy <wc...@us...> wrote: > > > > TIP #361: RELEASING CHANNEL BUFFERS > > > Uh, is it just me, or is the problem very different in 8.6 HEAD ? > > Comment out the 'close' above, and the [exit] blocks again.... > Note that this *might* be the effect of one of the shortcuts done in > the 'exit reform' [Bug 2001201]. Update: the exit reform is not guilty, since this bug is also present in 8.4 and 8.5 HEADs ! I managed to put my hands on a 8.3.5, which doesn't have the bug. And Wayne has a (presumably older) 8.4 without the bug. We should be able to isolate the commit rather easily now ;-) What happens is that TclFinalizeIOSubsystem looks at the channels' state flags, and doesn't attempt the flush when certain flags are set. Among these, CHANNEL_CLOSED. This is Too Bad because a just-[closed] channel in its background flush trip has this bit set, along with BG_FLUSH_SCHEDULED. It then looks like the fix consists of reallowing a flush when both flags are set. Not doing it immediately since I have to figure out how to make that flush foreground again. Andreas must have an opinion ;-) -Alex |
From: Fredderic U. <mag...@gm...> - 2010-02-05 05:09:48
|
On 5 February 2010 11:04, Alexandre Ferrieux <ale...@gm...> wrote: > On 2/4/10, Wayne Cuddy <wc...@us...> wrote: >> TIP #361: RELEASING CHANNEL BUFFERS > > Uh, is it just me, or is the problem very different in 8.6 HEAD ? > (in another terminal: mkfifo /tmp/fifo;sleep 9999 < /tmp/fifo) > set ff [open /tmp/fifo w] > fconfigure $ff -blocking 0 > puts -nonewline $ff [string repeat A 999999] > flush $ff > close $ff > exit > => exits immediately > > Comment out the 'close' above, and the [exit] blocks again.... > Note that this *might* be the effect of one of the shortcuts done in > the 'exit reform' [Bug 2001201]. I'd be inclined to think that the TIP is the right thing, and this is a bug. If I explicitly say [flush], I kind of assume TCL would understand that I really really would like the data to be written to the underlyng connection, even if that means waiting a while before carrying through with a [close] or [exit]. In a proper production system, the application should have both internal and external limits set on how long it's allowed to just sit there, and this TIP would allow those internal limits to be enforced on any remaining outgoign data, while leaving the default action to Do The Right Thing. With respect to the example, TCP socket connections should eventualy time out, the consumer should time out (again, due to internal and external limits - most Unix's provide mechanisms to do that and they're almost mandatory on at least a few mainframe type systems I've seen) which in fact does happen in the example above, and if those timeouts could possibly be too long, steps should have been taken to enforce more appropriate limits. I'm more interested now, in how much effort needs to be applied to ensure TCL *won't* dump unwritten data in this case, and how obvious that is to someone who might think of doing what you've just done in that example; imagine a small application that builds one specific component of a webpage, in a string of such components, doesn't want to have a whole pile of extra work to do just to make sure its data gets written at the appropriate time - that is, in many cases, the calling applications job in this instance. ALL it wants to have to worry about, is preparing its chunk of output, writing it, cleaning up (updating databases, etc.), and finishing. My personal preference would be that [flush]d data WILL be written, even if it has to wait indefinitely (unless the other end of the socket gets closed first), sort of like setting a madatory low-water mark on the output buffer. [close] will do its best to close as much of the connection now as possible, and the rest as soon as possible (ie. after unwritten data has been sent), with [flush] being a means to indicate that we really do want this data written. [exit] and falling off the end of the script should be equivelant, and should try to finish cleanly (including [close]ing any open channels, with everything that goes along with [close]). And there should be some mechanism to prevent waiting if so desired (normally, I'd hazard to guess that if you've written something to a channel, regardless of what mode that channel's in, it's probably for a pretty good reason unless you explicitly say otherwise). In the case where there wasn't just a flush, but the channel has been closed, I'd be willing to accept TCL would try its best to write it, but just close it and give up when it hits the [exit], and perhaps also if it finds it's run out of files. (The latter should help resolve the issue mentioned with denial of services, if at all possible.) Other options could also be no-flush and/or max-wait options to any/all of [close] and [exit], and even the channel itself through [fconfigure]. But a quick loop over all open channels, using this TIP to explicitly close them without flush, just before an [exit], would do the trick. Fredderic |
From: Alexandre F. <ale...@gm...> - 2010-02-05 00:04:28
|
On 2/4/10, Wayne Cuddy <wc...@us...> wrote: > > TIP #361: RELEASING CHANNEL BUFFERS Uh, is it just me, or is the problem very different in 8.6 HEAD ? (in another terminal: mkfifo /tmp/fifo;sleep 9999 < /tmp/fifo) set ff [open /tmp/fifo w] fconfigure $ff -blocking 0 puts -nonewline $ff [string repeat A 999999] flush $ff close $ff exit => exits immediately Comment out the 'close' above, and the [exit] blocks again.... Note that this *might* be the effect of one of the shortcuts done in the 'exit reform' [Bug 2001201]. If this is confirmed, then maybe the TIP is unneeded, since we've already broken compatibility, not for [exit] alone, but for explicit close of nonblocking channels. I am sorry to say that Wayne after pushing you to file a TIP... but for once Tcl seems to be doing the Right Thing :} -Alex |
From: Wayne C. <wc...@us...> - 2010-02-04 10:58:16
|
TIP #361: RELEASING CHANNEL BUFFERS ===================================== Version: $Revision: 1.1 $ Author: Wayne Cuddy <wcuddy_at_useunix.net> State: Draft Type: Project Tcl-Version: 8.7 Vote: Pending Created: Wednesday, 03 February 2010 URL: http://purl.org/tcl/tip/361.html WebEdit: http://purl.org/tcl/tip/edit/361 Post-History: ------------------------------------------------------------------------- ABSTRACT ========== Tcl should provide a mechanism by which a channel's output buffer can be released without requiring that Tcl flush any remaining data in the buffer to the operating system. This is of particular interest with output mechanisms which can block indefinitely causing the interpreter to consume unnecessary resources or prevent the interpreter from exiting. PROBLEM ========= When working with processes that handle multiplexing/non-blocking I/O it is not uncommon to write, or call *puts*, with more data than the operating system will accept. Thus Tcl begins to buffer this data using internal buffers at the application level and flushes the data in the background. Problems arise when the consumer of this data, be it the other end of a socket, pipe or FIFO, refuse to read data and do not close the channel. This has adverse effects on the *close* function and consequently the interpreter/process when it attempts to flush and close channels during finalization. CONSEQUENCES -------------- 1. When *close* is issued on a non-blocking channel Tcl removes the channel from pool of accessible channels from the current interpreter and attempts to flush any remaining data in the output buffers before closing the channel. Since the other end of the channel is not consuming data Tcl will never be able to flush it's internal buffers nor close the lower level channel driver. In a multiplexing server which opens new channels this will eventually result in resource starvation or denial of service. 2. When *exit* is called Tcl attempts to flush all output buffers, if any open channel blocks the interpreter will be unable exit and must be forcibly terminated by OS specific mechanisms. PROPOSE API ADDITIONS ======================= Provide a function and/or flags to existing functions which can be used to clear internal buffers without flushing. * *chan clear* /channelId/ * *close -noflush* /channelId/ COPYRIGHT =========== This document has been placed in the public domain. ------------------------------------------------------------------------- TIP AutoGenerator - written by Donal K. Fellows |
From: Donal K. F. <don...@ma...> - 2010-02-04 09:14:55
|
On 04/02/2010 06:31, Andy Goth wrote: > How does ckalloc() work? What are its costs? Joe English pointed out that > malloc()'s overhead can be significantly less than eight bytes, depending on > the implementation. But ckalloc() isn't malloc(); as far as I can tell, it > uses malloc() sparingly, subdividing malloc()'ed memory to satisfy demand. > It seems that when RCHECK isn't defined, the overhead per ckalloc() is four > bytes, not counting alignment padding or an occasional malloc(). It depends on compilation options, but with the HEAD, in the threaded case on a 32-bit system other than OSX, there is an overhead of 2 words per allocation plus whatever the system malloc has. (OTOH, the threaded allocator we use is much faster than the system malloc because it has less lock contention.) In the non-threaded case, it depends on whether USE_TCLALLOC is defined - it isn't by default anywhere these days - and without we just use the system memory allocator. Donal. |
From: Andy G. <unu...@ai...> - 2010-02-04 06:43:12
|
"Donal K. Fellows" <don...@ma...> wrote: > I'm a bit suspicious of your figures, you know. Good, because I completely forgot about 64-bit memory alignment. I just tallied up the sizes of the fields and neglected the loose bits in between. > Cost of malloc() on a 32-bit platform is 8 bytes. I've not looked at the > cost on 64-bit systems, but I'd expect it to be at least 16. The Tcl_Obj > allocator avoids a lot of these costs. How does ckalloc() work? What are its costs? Joe English pointed out that malloc()'s overhead can be significantly less than eight bytes, depending on the implementation. But ckalloc() isn't malloc(); as far as I can tell, it uses malloc() sparingly, subdividing malloc()'ed memory to satisfy demand. It seems that when RCHECK isn't defined, the overhead per ckalloc() is four bytes, not counting alignment padding or an occasional malloc(). I swapped downShift and mask so that mask can be 64-bit aligned, saving four bytes on a 64-bit machine. Thanks for bringing this to my attention. But then I decided to delete mask outright, since it's equal to maxBuckets-1. Here are the results of GCC sizeof() on an x86 and an x86_64, each running Slackware. I will be using these two platforms for performance comparisons. New code: Dict : 240/304 Dict w/o static arrays: 32/ 48 static arrays : 208/256 DictEntry : 12/ 16 List : 24/ 32 Old code: Dict : 76/120 Dict w/o static arrays: 60/ 88 static arrays : 16/ 32 DictEntry : 28/ 56 List : 20/ 24 I put the static arrays in Dict because of the static hash bucket array in Tcl_HashTable, which the comment says avoids mallocs and frees. I figured that was a good feature to emulate, so I kept it. However, I expanded from one array to three, since I didn't want to separately allocate each entry. Keeping the entries in a linear array meant having to add a third array to track the allocation of the entries array. Avoiding ckalloc() isn't the only reason for the entries and allocation arrays. I wanted to avoid putting linked list pointers in the entries, because they eat a comparatively large amount of memory (8/16 bytes), whereas the allocation array uses only four bytes per entry. This also provides constant-time lookup of an entry given its numerical index, but there isn't a use for this feature yet. If the ckalloc() overhead really is as low as four bytes, I wouldn't mind eliminating the static arrays. Alternately, if the static arrays are helpful for small dicts, they can be split into a separate struct which can be allocated as a single block, then ckfree()'d when the dict grows; this way larger dicts wouldn't waste so much space. I greatly prefer this approach to the ckrealloc() hack I mentioned previously, and I'm sorry I didn't think of it sooner. But I think I overall prefer eliminating the static arrays, since it removes the need for special cases. Let's say I drop the static arrays. This brings the baseline cost of a new dict to 56/80 bytes (including the list part) plus six ckalloc()s for list, dict, elements, buckets, entries, and allocation. The cost per entry is 24/36 bytes (12/16 for the DictEntry plus 8/16 for two slots in elements and 4/4 for one slot in allocation), with no ckalloc(). The cost of non-dict lists is 24/32 bytes plus one ckalloc(). Compare: The old dict has a baseline cost of 60/88 bytes (if static arrays are similarly eliminated), plus two ckalloc()s for dict and bucketPtr. The cost per entry is 28/56 bytes plus one ckalloc(). The cost of non-dict lists is 20/24 bytes plus one ckalloc(). Again, it's possible to save a ckalloc() by interleaving the allocation array into the entries array, but I'd prefer to not do this because it causes the meaning of the index into the combined array to vary depending on which fields are being accessed. Also, combining buckets, entries, and allocation into a single ckalloc() can be done but would make it impossible to simply use ckrealloc() to grow the table. Since I preallocate the entries as well as the buckets, I think it makes sense to change the growth policy from quadrupling to doubling. This would reduce the cost of overestimating the dict size, at the expense of rebuilding more often. (The cost of rebuilding should be greatly reduced by storing the hash key directly in the entry.) Also I could decouple growing buckets and growing elements/allocation so that elements/allocation is ckrealloc()'ed more frequently and in smaller steps. > The more I look at this, the more I'm concerned about the sheer complexity > of what you propose. Building a new complex data structure is a > non-trivial task and demonstrating that it is genuinely cheaper than > existing ones (which are probably about as simple as they can be; there's > very little waste) is quite hard. You've got some way to go to win me over > as yet. (Call me conservative if you want.) I'd like to continue implementing it so that I can get some real data to back up (or refute!) my gut feelings. My current problem is that the dict and list implementations need to be integrated a bit tighter, and I am unsure how to organize the code to accomplish this. (List barely changes; it only needs to kill the dict index when the list is modified. But dict needs to be able to do list operations to add elements, etc.) Do I merge tclDictObj.c into tclListObj.c? Or do I de-static selected dict and list functions and put their prototypes into tclInt.h? When I'm done, dict won't be a distinct Tcl_Obj type, so I slightly favor the first approach. > Have you considered whether a different hash function would be better? I plan to use BLT's hash function, which is the one described in TIP#69. Speaking of TIP#69, my changes address all of the issues in the TIP, except for the Hashing of Malicious Strings problem. I imagine that can be fixed by mixing the address of the Dict struct into the hashed value. In my second email I mentioned that redoing dict to not use Tcl_HashTable enables experimentation with alternative hash table implementations. By the way, starting Monday I'll be on two weeks of international business travel without a laptop, so I'll be mostly out of touch, unless I find a nice hotel computer and some unexpected free time. -- Andy Goth | http://andy.junkdrome.org./ unununium@{aircanopy.net,openverse.com} |
From: Joe E. <jen...@fl...> - 2010-02-03 21:32:55
|
Donal K. Fellows wrote: > [...] Cost of malloc() on a 32-bit platform is 8 bytes [...] Clarification/correction: this depends entirely on the malloc() implementation. Eight bytes is typical overhead for simple-minded allocators based on the classic K&R chapter 8 algorithm, but very few modern malloc()s work that way anymore. Under current state-of-the-art allocators, the overhead for a single malloc(N) can be anywhere from <1 bit (amortized) to 2**(floor(ln N)) bytes, depending on the implementation and the size of the request. --JE |
From: Twylite <tw...@cr...> - 2010-02-03 17:58:57
|
Hi all, > The more I look at this, the more I'm concerned about the sheer > complexity of what you propose. Building a new complex data structure is > a non-trivial task and demonstrating that it is genuinely cheaper than > existing ones (which are probably about as simple as they can be; > there's very little waste) is quite hard. You've got some way to go to > win me over as yet. (Call me conservative if you want.) > > Have you considered whether a different hash function would be better? > (This is independent of the memory management details that you've been > focussing on.) There's a long-standing issue with the fact that Tcl's > hash functions are actually very weak; much better ones are available > these days. Some performance data would be interesting. I'm just wondering if anyone has thought to take a look at what Lua does with its "tables", since they are central to the language. LuaJIT is currently leading the performance benchmarks for interpreted languages. And while I'm talking about it, has anyone considered the possibility of targeting LuaJIT for Tcl9? Twylite |
From: Twylite <tw...@cr...> - 2010-02-03 17:58:50
|
Hi, Missed the original comment so I'm not sure if I'm quoting the right person here (apologies otherwise): > "Tom Jackson" <tom...@gm...> wrote: > >> HTTP headers do not fit into a dictionary type structure, HTTP headers >> have an order, which is not maintained with a dict. >> Pedantic: Not true (with one exception). The order of HTTP headers is only significant for those headers that have the same name. If HTTP field order was in general significant and preserved then OAuth and AWS wouldn't have to go to lengths to define an ordering for elements in their signature formats. Ref. http://www.faqs.org/rfcs/rfc2616.html section 4.2: "The order in which header fields with differing field names are received is not significant. However, it is "good practice" to send general-header fields first, followed by request-header or response- header fields, and ending with the entity-header fields. "Multiple message-header fields with the same field-name MAY be present in a message if and only if the entire field-value for that header field is defined as a comma-separated list [i.e., #(values)]. It MUST be possible to combine the multiple header fields into one "field-name: field-value" pair, without changing the semantics of the message, by appending each subsequent field-value to the first, each separated by a comma. The order in which header fields with the same field-name are received is therefore significant to the interpretation of the combined field value, and thus a proxy MUST NOT change the order of these field values when a message is forwarded." Regards, Twylite |
From: Andreas L. <av...@lo...> - 2010-02-03 10:12:43
|
Andy Goth <unu...@ai...> wrote: > "Kevin Kenny" <kev...@gm...> wrote: > > I don't care that much what's in dead memory. If Alex things that > > 0xDECAFBAD would be a bit pattern that makes his debugging easier, > > I'm with Joe here. It doesn't matter for correct C code; and it's > > highly likely to be in the same space as NULL in terms of segv's and > > bus errors. > If you need the address of memory that's defined to not be readable, > writable, or executable, you could use OS-specific calls to get a page > that has those properties. On a platform where this feature doesn't > exist or is more trouble than it's worth, fall back on 0xDEADBEEF, > which is easy to spot and might even generate unaligned memory access > errors. There are probably programs out there that are arguably broken, but still work. E.g. they might have code paths where invalid memory from previos Tcl_Obj-lives is read, but then ignored, anyway. While it is good for the developer, if such progs finally crash so the bug can be fixed, it is of course much less desireable for users to have programs crash that previously appeared to work well. I'd suggest a TCL_MEM_DEBUG_LIGHT compile-time-flag, that would just do the 0xDEADBEEF- or 0xDECAFBAD-izing of the pointers in unused Tcl_Obj's, but none of the other safety-nets activated by debugging builds (which purportedly have a tendency of hiding bugs - those that only show up in production builds, but not in debug builds) PS: Andy, your own text of your mail was one single very long line. |
From: Donal K. F. <don...@ma...> - 2010-02-03 09:29:26
|
On 02/02/2010 17:28, Andy Goth wrote: > From the way the data structures are defined, it appears to me that this new > approach would use less memory. Let me try to put some numbers to this, > with 32-bit/64-bit byte counts. The old and new numbers should compare > directly because I don't plan to change the preallocation characteristics. > I don't know the overhead each ckalloc() incurs. > > If you like, I could make a table comparing the expected memory utilization > for dicts of various sizes. I'm a bit suspicious of your figures, you know. The issue is alignment of structure elements. Typically, these are on machine word boundaries (we don't play any games to pack them tighter because those things are typically not portable) so there's a doubling of size from 32-bit to 64-bit (unless arrays are involved, when the default alignment rules tend to be tighter). Alignment is *very* important on modern processors so it pays to not be over-smart in code layout. Cost of malloc() on a 32-bit platform is 8 bytes. I've not looked at the cost on 64-bit systems, but I'd expect it to be at least 16. (It's got to be more than 8 in order to hold the length of the memory block, and it must also be word-aligned. Since there's usually some guard bytes too, that makes for 16 bytes.) The Tcl_Obj allocator avoids a lot of these costs. > New code: > > Each hash entry, both actively used and preallocated, requires... > 1. One Tcl_Obj * inside List.elements (4/8 bytes). > 2. One DictEntry inside Dict.entries (12/16 bytes). > 3. One int inside Dict.allocation (4/4 bytes). > 4. Total (20/28 bytes). There's no guarantee that a DictEntry will actually be 16 bytes on 64-bit. It might be 24 bytes. > Each list+dict intrep requires... > 1. One List (24/32 bytes). > 2. One Dict (36/56 bytes, not counting the static* arrays). > 3. Total (60/88 bytes). > 4. Two ckallocs() for List and List.dict. Going with 32-bit (easier to calculate) I make your dictionary's baseline cost out to be 61 words for Dict and 6 for List (at least; I wasn't quite sure what the memory management strategy for the list's array of pointers was). Or 268 bytes. Plus malloc overheads (16 bytes I estimate). The cost of those static arrays is the major part of these costs. Rules for estimating the cost of Dict: 9 words for non-array fields 4*1 words for the staticBuckets array 3*4*3=36 words for the staticEntries array 3*4*1=12 words for the staticAllocation array Total: 61 words. On 32-bit, long is the same size as int and has the same alignment rules. On 64-bit, it depends on the details of the architecture, but the most common 64-bit platform these days is LP64. Due to the complexity of alignment rules though, it's important to measure the actual size. In particular, I don't know whether adjacent 32-bit words will be packed into a single 64-bit word. (You should move the downShift field of your Dict one earlier to take advantage of this effect if it does apply...) I'm not bothering to spin up a 64-bit VM to test my theories. :-) > Each list+dict over the STATIC size wastes... > 1. An unused Dict.staticBuckets array (16/16 bytes). > 2. An unused Dict.staticEntries array (240/336 bytes). > 3. An unused Dict.staticAllocation array (48/48 bytes). > 4. Total (304/400 bytes). I don't trust that calculation. I make the 32-bit size of your DictEntry out to be 3 words and STATIC_ENTRIES as 12. That means that the staticEntries array is 36 words 144 bytes (alignment doesn't cause any issues here since the structure's naturally aligned on 32-bit). > Each list+dict over the STATIC size has an extra... > 1. Three ckalloc()s for Dict.buckets, Dict.entries, and Dict.allocation. > > Each non-dict intrep wastes... > 1. An unused List.dict pointer (4/8 bytes). > > Old code: > > Each actively used hash entry requires... > 1. One ChainEntry (28/56 bytes). > 2. One ckalloc() for the ChainEntry. > > Each dict intrep requires... > 1. One Dict (60/88 bytes, not counting staticBuckets). > 2. One ckalloc() for the Dict. Excluding the staticBuckets field makes your whole analysis a bit bogus. Instead, you've got to include such things. > Each dict intrep over the SMALL_HASH_TABLE size wastes... > 1. An unused Dict.table.staticBuckets array (16/32 bytes). I think that's the wrong way to account for that. > Each dict intrep over the SMALL_HASH_TABLE size has an extra... > 1. One ckalloc() for Dict.table.buckets. > > Some cuts are possible. A 32-bit hash could be used on a 64-bit machine, > which is the current Tcl behavior, saving four bytes per entry on a 64-bit > machine. Probably not a worthwhile saving. It will also increase the amount of time spent when working with the hash as non-full-word accesses pay a speed penalty on modern architectures. > Maybe when the Dict struct exceeds the STATIC size (12 entries), > it could be shrunk with ckrealloc() to no longer contain the static* arrays, > possibly saving 304/400 bytes for large dicts. The allocation array could > be interleaved into the entries array, since they always have the same > length (just add an allocation field to the DictEntry struct), saving 4/8 > bytes for all dicts and one ckalloc() for large dicts. The more I look at this, the more I'm concerned about the sheer complexity of what you propose. Building a new complex data structure is a non-trivial task and demonstrating that it is genuinely cheaper than existing ones (which are probably about as simple as they can be; there's very little waste) is quite hard. You've got some way to go to win me over as yet. (Call me conservative if you want.) Have you considered whether a different hash function would be better? (This is independent of the memory management details that you've been focussing on.) There's a long-standing issue with the fact that Tcl's hash functions are actually very weak; much better ones are available these days. Some performance data would be interesting. Donal. |
From: Andy G. <unu...@ai...> - 2010-02-03 04:33:55
|
"Tom Jackson" <tom...@gm...> wrote: > Your point is that a dict does not handle multiple entries with identical > keys. I don't want dict to handle multiple entries with identical keys. That's unmanageable and doesn't make sense. A key/value list, on the other hand, can sensibly have multiple entries with identical keys, because like all lists, key/value lists are indexed by numerical position (which is always unique for each entry) and not by key (which is just a convention imposed by this particular application of list). > HTTP headers do not fit into a dictionary type structure, HTTP headers > have an order, which is not maintained with a dict. Hence the key/value list, which does preserve order. But it should still be possible to leverage [dict get]'s constant-time key lookup capability, for keys that can reasonably be expected to not repeat. Dict also preserves order, but it does not preserve not duplication. Condensing the values for duplicate keys into a list brings back duplication but partially sacrifices order. However, for HTTP I don't think it's important to track precisely how different duplicate keys get interleaved, only the order of values for any particular key, so I don't mind the loss in this case. this: {key1 val1 key2 val2 key1 val3 key2 val4} is the same as: {key1 val1 key1 val3 key2 val2 key2 val4} and can safely be stored as: {key1 {val1 val3} key2 {val2 val4}} > When dict was still in development I complained about the use of an array > to store keys, because adding a new entry would destroy order...or order > would be different based upon the platform. Dict has a well defined order. The order of keys in the dict matches the order the keys were created. The value for each key is determined by what value was most recently set for that key. My first email included an example, which I paraphrase here: % dict get {a 0 b 1 a 2 a 3 c 4} a 3 b 1 c 4 The first version of dict I played around with didn't have this ordering property. Keys showed up in whatever order the Tcl hash table functions felt like using, same as with [array names]. Is that what you meant by array? Now it chains together the hash entries with a doubly linked list. > Even with the new list-like ordering, as soon as you add a duplicate key, > the order of the original data is lost. I don't want it to be possible to add a duplicate key to a dict. Any attempt to do so would only replace the key. My goal is to not change the outward behavior of [dict] or Tcl_*Dict*() in any way. External code won't be able to tell the difference unless it measures performance, and I expect performance to improve due to increased locality of reference and decreased usage of ckalloc(). > Another issue with HTTP headers is that the key is not case > sensitive. Host, hOst, hoSt, hosT and HOST are all equivalent. So I normalize the case first. My application doesn't care about case. If I decide that I need to remember case or ordering, I can just make a separate list to keep track of that. This won't cost all that much due to shared Tcl_Obj. Input: {key1 val1 key2 val2 KEY1 val3 KEY2 val4} Data : {key1 {val1 val3} key2 {val2 val4}} Order: {key1 key2 KEY1 KEY2} > One example of a data structure which fits HTTP headers...because it was > designed to handle them is an ns_set > http://rmadilo.com/files/nsapi/ns_set.html I'm sure it internally keeps multiple "views" on the same data, conceptually similar to what I show above. In other words, it's a data structure that is an aggregate of data structures. It could have both an ordered list and a hash table pointing at the same pool of values. > Although this is not available in Tcl outside of AOLserver, I have ported > most of the functionality to pure tcl, called an nvlist: > http://www.junom.com/gitweb/gitweb.perl?p=tnt.git;a=tree;f=packages/tnt/tcl I don't see a direct relationship between ns_set and nvlist. At one level, nvlist seems like a single-table relational database with the WHERE cause in SELECT, UPDATE, or DELETE matching zero or more contiguous columns, and with a UNIQUE constraint on the concatenation of all columns, such that no two rows can be equal to each other, even though they might have some equal columns. Then there are higher-level procedures designed for a two-column table mapping between key and value. Duplicate keys and duplicate values can exist, but the same key-value mapping can't be made multiple times. I think this can be used to implement ns_set by making a three-column table mapping between ordinal, key, and value. > There is example code which uses the nvlist to handle HTTP headers, I can't find it. > and I compare the use to tcl [dict] in the nvlist-init.tcl file. testEmployees? I see two different methods of using [addItem], one like Tcl [dict] and the other like a table. The tabular method uses lists where each element's position determines its meaning. There's an index list mapping between positions and column names. After I'm done with this [dict] change, I would like to write a TIP to create a new data structure that behaves like dict but doesn't have values. I can't really call it a set, since that name is taken. Maybe it's a monobag. ;^) It would have one important operation not present in classical sets: numerical indexes. Anyway, I mention this because a monobag would work very well for making a list of column headers. % set fields {id forenames surname street city phone} % monobag search $fields surname 2 % lappend data {12345-A Joe Schmoe "147 Short St" Springfield 555-1234} % lappend data {98372-J Anne Other "32995 Oakdale Wy" Springfield 555-8765} % set surnames {} % foreach row $data { lappend surnames [lindex $row [monobag search $fields surname]] } % set surnames {Schmoe Other} Really, it's just a list without duplicate elements, combined with constant-time lookup of index given value. It can be implemented using the data structure I already presented; just remove the *2 and /2's. > But these specialized features come with a penalty: you have to create a > fixed location for the data, just like a Tcl array. Your nvlist code can be modified to take nvlist values rather than variable names. [addItem] can return a new nvlist formed by modifying its argument, the same as [lreplace]. Tcl arrays have this restriction because they're not values, they're collections of variables. But they can easily be converted to and from values. I'll have to check if the Tcl_Obj returned by [array get] has a list or dict intrep. > A [dict] gives you lots of tools, but works on portable data. That is a > cool feature. Definitely. Artificially forcing the programmer to name stuff creates lots of headaches. Values are anonymous by default. :^) -- Andy Goth | http://andy.junkdrome.org./ unununium@{aircanopy.net,openverse.com} |
From: Andy G. <unu...@ai...> - 2010-02-03 02:33:33
|
"Kevin Kenny" <kev...@gm...> wrote: > I don't care that much what's in dead memory. If Alex things that > 0xDECAFBAD would be a bit pattern that makes his debugging easier, > I'm with Joe here. It doesn't matter for correct C code; and it's > highly likely to be in the same space as NULL in terms of segv's and > bus errors. If you need the address of memory that's defined to not be readable, writable, or executable, you could use OS-specific calls to get a page that has those properties. On a platform where this feature doesn't exist or is more trouble than it's worth, fall back on 0xDEADBEEF, which is easy to spot and might even generate unaligned memory access errors. -- Andy Goth | http://andy.junkdrome.org./ unununium@{aircanopy.net,openverse.com} |
From: Kevin K. <kev...@gm...> - 2010-02-03 00:59:00
|
Andy Goth wrote: > (Is this the right forum for this type of discussion?) > > I believe it is possible to merge the list and dict Tcl_Obj types > such that a dict is simply a list whose intrep contains some extra > indexing information; dict extends list without replacing it, like > a subclass. This change can be made without affecting script-level, > API-level, or even binary-level compatibility. The only observable > difference should be a performance improvement in some circumstances > from reduced shimmering. Uhm. Is it time to mention Feather again? Some of Feather's ideas were - after sober thought - very bad ones. But I rather liked its idea that Tcl_ObjTypes could have inheritable interfaces. It would be a Very Good Thing if [dict for], the [array startsearch] stuff, and [foreach] could be unified. It would be a Very Good Thing if [dict keys] and [array keys] could return some sort of iterator, and not have to construct a list (particularly since their commonest use is as the list argument of [foreach]. And (in Tcl 9 territory, because we'd have either to grow Tcl_ObjType by at least one word or else result to an ugly kludge like the one Paul Duffin came up with), I think unified containers would make sense. -- 73 de ke9tv/2, Kevin PS: (The one I want is a sorted list or map, allowing for duplicate keys, and supporting O(logN) time for insertion (by key and perhaps position within equal keys, search by key, search by numeric position, and split (destructive [lrange]). It's amazing how useful such a structure is if done right. |
From: Kevin K. <kev...@gm...> - 2010-02-03 00:08:57
|
Alexandre Ferrieux wrote: > Hi, > > For what it's worth, Donal decided to ignore the suggestion in tclVar.c > Then let's hope that the remainder is 100% bug-free; otherwise many > hours will be wasted in investigating the next occurrence in a > production, non-mem-debug system. > > -Alex Alex, let's not give up so quickly. Donal, Alex is talking about laying a presumably-invalid background in dead memory, for a case where NULL in live memory would have a different meaning. I don't care that much what's in dead memory. If Alex things that 0xDECAFBAD would be a bit pattern that makes his debugging easier, I'm with Joe here. It doesn't matter for correct C code; and it's highly likely to be in the same space as NULL in terms of segv's and bus errors. |
From: Andy G. <unu...@ai...> - 2010-02-02 17:28:58
|
"Donal K. Fellows" <don...@ma...> wrote: > Have you evaluated the memory cost of these changes to code that only > needs basic lists or basic dicts and isn't transforming one to the other? >From the way the data structures are defined, it appears to me that this new approach would use less memory. Let me try to put some numbers to this, with 32-bit/64-bit byte counts. The old and new numbers should compare directly because I don't plan to change the preallocation characteristics. I don't know the overhead each ckalloc() incurs. If you like, I could make a table comparing the expected memory utilization for dicts of various sizes. New code: Each hash entry, both actively used and preallocated, requires... 1. One Tcl_Obj * inside List.elements (4/8 bytes). 2. One DictEntry inside Dict.entries (12/16 bytes). 3. One int inside Dict.allocation (4/4 bytes). 4. Total (20/28 bytes). Each list+dict intrep requires... 1. One List (24/32 bytes). 2. One Dict (36/56 bytes, not counting the static* arrays). 3. Total (60/88 bytes). 4. Two ckallocs() for List and List.dict. Each list+dict over the STATIC size wastes... 1. An unused Dict.staticBuckets array (16/16 bytes). 2. An unused Dict.staticEntries array (240/336 bytes). 3. An unused Dict.staticAllocation array (48/48 bytes). 4. Total (304/400 bytes). Each list+dict over the STATIC size has an extra... 1. Three ckalloc()s for Dict.buckets, Dict.entries, and Dict.allocation. Each non-dict intrep wastes... 1. An unused List.dict pointer (4/8 bytes). Old code: Each actively used hash entry requires... 1. One ChainEntry (28/56 bytes). 2. One ckalloc() for the ChainEntry. Each dict intrep requires... 1. One Dict (60/88 bytes, not counting staticBuckets). 2. One ckalloc() for the Dict. Each dict intrep over the SMALL_HASH_TABLE size wastes... 1. An unused Dict.table.staticBuckets array (16/32 bytes). Each dict intrep over the SMALL_HASH_TABLE size has an extra... 1. One ckalloc() for Dict.table.buckets. Some cuts are possible. A 32-bit hash could be used on a 64-bit machine, which is the current Tcl behavior, saving four bytes per entry on a 64-bit machine. Maybe when the Dict struct exceeds the STATIC size (12 entries), it could be shrunk with ckrealloc() to no longer contain the static* arrays, possibly saving 304/400 bytes for large dicts. The allocation array could be interleaved into the entries array, since they always have the same length (just add an allocation field to the DictEntry struct), saving 4/8 bytes for all dicts and one ckalloc() for large dicts. -- Andy Goth | http://andy.junkdrome.org./ unununium@{aircanopy.net,openverse.com} |
From: Donal K. F. <don...@ma...> - 2010-02-02 16:35:46
|
On 02/02/2010 15:43, Andy Goth wrote: > Since I'm unwilling to change the externally visible behavior of [dict] or > the Tcl_*Dict* functions, technically the "dict" wouldn't contain duplicate > keys, only the "key/value list" that the "dict" is indexing. I give an > example near the end of my first email. Have you evaluated the memory cost of these changes to code that only needs basic lists or basic dicts and isn't transforming one to the other? Donal. |
From: Andy G. <unu...@ai...> - 2010-02-02 15:43:46
|
"Neil Madden" <Nei...@no...> wrote: > Sounds good. If I understand correctly you add a hashtable to the list rep > which maps from string keys to list indices, rather than directly to the > associated value? This seems like a good approach to me. Exactly. > Presumably your direct concern could be solved by using [dict lappend] > instead of [dict set]? Yes. Alexandre and I discussed this and I agree. But as I said, I'm still interested in working on [dict]. This should improve the performance of normal operation as well as list and string conversion performance. > I.e., instead of trying to create a map with duplicate keys, you instead > store a list of values for each key. Since I'm unwilling to change the externally visible behavior of [dict] or the Tcl_*Dict* functions, technically the "dict" wouldn't contain duplicate keys, only the "key/value list" that the "dict" is indexing. I give an example near the end of my first email. -- Andy Goth | http://andy.junkdrome.org./ unununium@{aircanopy.net,openverse.com} |
From: Alexandre F. <ale...@gm...> - 2010-02-02 12:09:19
|
Hi, For what it's worth, Donal decided to ignore the suggestion in tclVar.c Then let's hope that the remainder is 100% bug-free; otherwise many hours will be wasted in investigating the next occurrence in a production, non-mem-debug system. -Alex |
From: Neil M. <Nei...@no...> - 2010-02-02 11:54:39
|
Hi Andy, On 1 Feb 2010, at 06:08, Andy Goth wrote: > (Is this the right forum for this type of discussion?) > > I believe it is possible to merge the list and dict Tcl_Obj types such that a dict is simply a list whose intrep contains some extra indexing information; dict extends list without replacing it, like a subclass. This change can be made without affecting script-level, API-level, or even binary-level compatibility. The only observable difference should be a performance improvement in some circumstances from reduced shimmering. Sounds good. If I understand correctly you add a hashtable to the list rep which maps from string keys to list indices, rather than directly to the associated value? Apologies, I don't have time to read through all the details you posted. This seems like a good approach to me. > > The idea came about while working on processing HTTP headers in Wibble. Headers can repeat, so I create a key/value list using [lappend] instead of [dict set]. Most of the time header repetition can be neglected, so the resulting list is conveniently accessed using [dict get], which returns the last (usually only) matching entry. However, this incurs a significant shimmering penalty for every lookup. If [dict get] were to access a persistent index over the list, rather than insisting on storing the values themselves in the index, this shimmering would not take place. (The Wibble version I describe hasn't been posted to the Wiki yet.) Presumably your direct concern could be solved by using [dict lappend] instead of [dict set]? I.e., instead of trying to create a map with duplicate keys, you instead store a list of values for each key. [...] -- NeilThis message has been checked for viruses but the contents of an attachment may still contain software viruses which could damage your computer system: you are advised to perform your own checks. Email communications with the University of Nottingham may be monitored as permitted by UK legislation. |
From: Andy G. <unu...@ai...> - 2010-02-02 04:06:50
|
"Alexandre Ferrieux" <ale...@gm...> wrote: > On 2/1/10, Andy Goth <unu...@ai...> wrote: > > (Is this the right forum for this type of discussion?) > > If it's not, we're doomed :} Next time I'll try 4chan. > > I believe it is possible to merge the list and dict Tcl_Obj types > > I sympathetize with the idea of inheritance among obj types, This is a specific performance optimization made to existing types, not the introduction of a new type or a new method of organizing types. Dicts are key/value lists, so let's store them in memory that way. They just happen to have fast access via hash tables. Also: Does my mailer really spit out long lines like that? Sorry. I'll see what I can do to fix it. > given Donal's reaction to the suggestion I have more or less given it up: Inheritance and Tcl_Obj operate at two wildly different levels. Inheritance is an arguably useful concept in systems of aggregating code and data, whereas Tcl_Obj is a way to store an individual piece of data. Sure, a Tcl_Obj can be a container for other Tcl_Objs, but only in simple ways. (This is true for the core types, at least. Extensions can do what they want.) Inheritance is best done either at the script level or with the aid of an extension, not by wiring it into the primitive value storage system. I'm not proposing a general mechanism of inheritance, and describing my proposal in terms of inheritance makes it harder to understand, not easier. > I must say that the lack of a "compelling need" plays an important role > though... This change improves performance and reduces memory usage. It provides a testbed for a better hash table implementation, which could eventually be adapted for Tcl-wide use, with Tcl_HashTable becoming a compatibility wrapper. It enables Tcl programmers to mix and match lists and dicts without guilt. None of these are compelling needs, but optimizations rarely are. My intention is to preserve the script interface or the C API, so by design this change doesn't open up any new possibilities. > Uh, maybe I'm missing something, but doing this you'll wreck repeated > headers, right ? The standard idiom for {HTTP,SMTP,SIP} headers is rather > (with array, translate to dict if you want): lappend values($header) value The idea is to only do [dict get] when reading headers for which repetition is illegal anyway, but use list methods when reading repeatable headers. But that approach derives from Wibble's lack of header parsing; it just made a key/value list of headers without attempting to understand them. The current development version of Wibble not only makes a list of headers, but it also converts many headers into easily parseable dicts and lists. It should be able to merge all repetitions of a header into a single list, and it should know which headers allow this. So for Wibble I will basically do as you suggest, but that doesn't mean I'm not still interested in improving the performance of [dict] in Tcl. > is there a "compelling need" for a new bump in code complexity ? Much of [dict]'s complexity is due to having to cover for Tcl_HashTable's shortcomings. For example, it has to chain all the entries together into a linked list. I expect much of the code will become simpler as a result of ditching Tcl_HashTable in favor of something more targeted. -- Andy Goth | http://andy.junkdrome.org./ unununium@{aircanopy.net,openverse.com} |
From: Alexandre F. <ale...@gm...> - 2010-02-01 21:21:43
|
On 2/1/10, Donald G Porter <don...@ni...> wrote: > Alexandre Ferrieux wrote: > > > (3) What about obj->typePtr itself ? > > > > The reform you are talking about does not apply to objPtr->typePtr. > That field must either hold a valid pointer to the Tcl_ObjType struct > that governs the current intrep, or NULL to indicate there is no current > intrep. No other state is valid. Of course: the point is precisely to get an unambiguously invalid state, so that a SIGBUS is obtained as soon as use is attempted from a stale reference. > Unless you talking exclusively about those Tcl_Obj structs that have > been returned to the free list. For those I don't think I care. This scheme applies to any kind of allocation, be it direct ckalloc, or block allocation like Tcl_Obj. So the objects in the free list are one case among many. And I believe all these discarded objects would uniformly benefit from a reliable "reuse detector" costing very little at free time, and nothing at valid use time. -Alex |