From: Damon C. <da...@tc...> - 2010-03-01 09:55:07

TIP #362: SIMPLE 32 AND 64 BIT REGISTRY SUPPORT
=================================================
Version:      $Revision: 1.1 $
Author:       Damon Courtney <damon_at_tclhome.com>
State:        Draft
Type:         Project
Tcl-Version:  8.6
Vote:         Pending
Created:      Monday, 01 March 2010
URL:          http://purl.org/tcl/tip/362.html
WebEdit:      http://purl.org/tcl/tip/edit/362
Post-History:

-------------------------------------------------------------------------

ABSTRACT
==========

Add new options to the *registry* command on Windows to allow callers to
specify that an action should be taken specifically on the 32-bit or
64-bit registry.

RATIONALE
===========

When Microsoft created a 64-bit version of Windows, it separated certain
areas of the registry into a new node for 32-bit programs to access. The
idea was that 64-bit programs would use the standard keys, while 32-bit
programs living in the 64-bit world would have their registry access
shuffled off to a separate node behind their backs. By default, a 32-bit
program gets the 32-bit registry node and a 64-bit program gets the
64-bit node, and in Tcl this default is all we currently get. The
underlying registry APIs accept flags to select which view to use; Tcl's
*registry* command just doesn't know anything about them yet.

SPECIFICATION
===============

The *registry* command will receive two new (mutually exclusive) options
to specify that the given *registry* command should be executed against
the 32-bit or 64-bit registry. The proposed implementation adds a -32bit
and a -64bit option, like so:

    *registry* /subcommand/ ?*-32bit* | *-64bit*? ?/options/?

By default, omitting both options means the *registry* command does what
it has always done: operate on the registry view that matches the
compiled architecture of the running Tcl. Specifying *-32bit* means to
operate on the 32-bit registry regardless of the current binary, and
*-64bit* means to operate on the 64-bit registry. The options will be
supported by all subcommands of *registry*.

REFERENCE IMPLEMENTATION
==========================

See Patch #2960976 at SourceForge
[<URL:https://sourceforge.net/support/tracker.php?aid=2960976>].

COPYRIGHT
===========

This document has been placed in the public domain.

-------------------------------------------------------------------------
TIP AutoGenerator - written by Donal K. Fellows
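As a concrete illustration of the proposed syntax: the following sketch
is hypothetical (the options are not in any released Tcl, and the key
path and value name are examples only); the option placement follows the
specification line above.

    # Read from the 64-bit registry view, even from a 32-bit tclsh
    # (proposed -64bit option; illustrative only).
    set ver [registry get -64bit \
        {HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion} \
        CurrentVersion]

    # Write into the 32-bit view (redirected under Wow6432Node on
    # 64-bit Windows).
    registry set -32bit {HKEY_LOCAL_MACHINE\SOFTWARE\Example} Greeting hello sz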
From: Brian G. <bgr...@mo...> - 2010-02-24 02:22:10

I know this may be uncool on this list, so I'll keep this short, hoping
my friends here can help: I'm looking for a good new-college-grad CS
person who thinks Tcl/Tk is cool. See http://wiki.tcl.tk/13259 for more
details.

-Brian

--
I love deadlines. I like the whooshing sound they make as they fly by.
        -- Douglas Adams

Mentor Graphics Corp., 8005 SW Boeckman Road, Wilsonville, OR 97070 USA
503.685.7000 tel / 503.685.0921 fax
Technical support: mailto:su...@mo...
Sales and marketing info: mailto:sa...@mo...
Licensing: mailto:li...@mo...
Home page: http://www.model.com
From: Jan N. <nij...@us...> - 2010-02-22 14:38:41

2010/2/22 Larry W. Virden <lv...@gm...>:
> +1 from me as well. I tried to build the tcl 8 sources on cygwin so
> that I had a relatively recent version - right now, the version that
> we seem to get is 8.4.1. We have been working on making Cygwin
> available and having something in the 8.5 family would really be
> useful.

+1 Count me in. Some work on Cygwin is already done in HEAD (at least it
compiles now, but that's all...). Cygwin is not what Anton was talking
about, but any patches improving portability in general are welcome.

Regards,
        Jan Nijtmans
From: Kevin K. <kev...@gm...> - 2010-02-22 14:10:02

Anton Kovalenko wrote:
> At Mon, 22 Feb 2010 08:54:11 -0500, Kevin Kenny wrote:
> [...]
>> (but not anything newer) would get a pointer smash deep inside the
>> system libraries if [file rename] or [file copy] was called on a
>> serial port. Win2k, WinXP, and everything newer correctly return an
>> error status. So, not an issue for 64-bitters.
>>
>> Others have tried to contribute 64-bit SEH code in the past, and
>> we've not put it in simply because it isn't needed.
>
> Sorry, but it should then _compile_ for amd64 without this code.
> And, last time I checked it, it didn't.
> Were there any changes regarding this issue?

I don't know when you checked it, but if there's a missing
#ifndef _WIN64, that's a bug, and would be easy to fix.

--
73 de ke9tv/2, Kevin
From: Anton K. <an...@sw...> - 2010-02-22 14:05:41

At Mon, 22 Feb 2010 16:58:49 +0300, Anton Kovalenko wrote:
>> Others have tried to contribute 64-bit SEH code in the past, and
>> we've not put it in simply because it isn't needed.
>
> Sorry, but it should then _compile_ for amd64 without this code.
> And, last time I checked it, it didn't.
> Were there any changes regarding this issue?

P.S. It compiled fine with MSVC for amd64, but not with mingw-w64,
exactly because of the i386-assembly SEH implementation.

--
Regards, Anton Kovalenko
+7(916)345-34-02 Elektrostal' MO, Russia
From: Anton K. <an...@sw...> - 2010-02-22 14:03:38

At Mon, 22 Feb 2010 08:54:11 -0500, Kevin Kenny wrote:
[...]
> (but not anything newer) would get a pointer smash deep inside the
> system libraries if [file rename] or [file copy] was called on a
> serial port. Win2k, WinXP, and everything newer correctly return an
> error status. So, not an issue for 64-bitters.
>
> Others have tried to contribute 64-bit SEH code in the past, and
> we've not put it in simply because it isn't needed.

Sorry, but it should then _compile_ for amd64 without this code.
And, last time I checked it, it didn't.
Were there any changes regarding this issue?

--
Regards, Anton Kovalenko
+7(916)345-34-02 Elektrostal' MO, Russia
From: Kevin K. <kev...@gm...> - 2010-02-22 14:01:56

Anton Kovalenko wrote:
> Hello Tcl developers.
>
> On the Windows platform, Tcl has to catch "structured exceptions" in
> some places. This is implemented using Microsoft's C language
> extension (for MSVC) or a bunch of x86 assembly code (for mingw).

As far as I've been able to determine, the SEH code is simply not an
issue for Win64. It appears in exactly four places in Tcl:

tclWin32Dll.c - It wraps around the CPUID instruction, which faulted on
the 80386, 80486 and 80486SX (the 80486DX supported it). None of these
ancient machines supported 64-bit operation. (Moreover, the features
we're looking for aren't relevant on 64-bit machines, so we don't even
look.)

tclWinChan.c - This is weird code that protects against a process crash
if an invalid handle is presented to Tcl_MakeFileChannel. I'd contend
that the correct action in that case is to crash.

tclWinFCmd.c (2 places) - Windows NT 3.51 and Windows NT 4.0 (but not
anything newer) would get a pointer smash deep inside the system
libraries if [file rename] or [file copy] was called on a serial port.
Win2k, WinXP, and everything newer correctly return an error status.
So, not an issue for 64-bitters.

Others have tried to contribute 64-bit SEH code in the past, and we've
not put it in simply because it isn't needed.

--
73 de ke9tv/2, Kevin
From: Larry W. V. <lv...@gm...> - 2010-02-22 12:04:24

On Sun, Feb 21, 2010 at 5:42 PM, Pat Thoyts <pat...@us...> wrote:
> +1 from me.

+1 from me as well. I tried to build the tcl 8 sources on cygwin so that
I had a relatively recent version - right now, the version that we seem
to get is 8.4.1. We have been working on making Cygwin available and
having something in the 8.5 family would really be useful.

--
Tcl - The glue of a new generation. http://wiki.tcl.tk/
Larry W. Virden
http://www.xanga.com/lvirden/
Even if explicitly stated to the contrary, nothing in this posting
should be construed as representing my employer's opinions.
From: Pat T. <pat...@us...> - 2010-02-22 06:03:46

Anton Kovalenko <an...@sw...> writes:
> Hello Tcl developers,
>
> There are a couple of issues with the Tcl autoconf script for Windows
> that I routinely run across when doing cross-builds with mingw from
> Linux. [...]
>
> I propose a fix: let the Makefile use $RUNNABLE_TCLSH in the
> installation process, with $RUNNABLE_TCLSH defaulting to the freshly
> built $TCLSH, as it was before. [...]
>
> I realize that cross-compilation support is probably not a
> high-priority goal for the Tcl team. But this target is almost reached
> anyway, so it'd be nice to take a minor additional step :)

+1 from me. I had to do something similar when building for android-arm
from Linux. I think I called it NATIVE_TCLSH but it's the same idea. I
don't remember noticing your first issue, but then I had some trouble
linking pthreads for that target.

--
Pat Thoyts                            http://www.patthoyts.tk/
PGP fingerprint 2C 6E 98 07 2C 59 C8 97 10 CE 11 E6 04 E0 B9 DD
From: Anton K. <an...@sw...> - 2010-02-21 13:15:42

Hello Tcl developers,

There are a couple of issues with the Tcl autoconf script for Windows
that I routinely run across when doing cross-builds with mingw from
Linux. After posting a message about SEH, concerning changes that
require autoconf modifications, I decided to initiate discussion on
these issues as well.

Cross-compilation is obviously _not_ what the tcl.m4 author(s) had in
mind. But, fortunately, this script works reasonably well even in this
case, for building across different-CPU Linuxes and for Windows as
well, with the following two exceptions:

1. System threading implementation selection seems to depend on `uname`
output unconditionally. When running on Linux, I can cross-compile
non-threaded Tcl/win32 and non-threaded extensions successfully, but
when attempting the threaded build, it tries to use pthreads and falls
back to a non-threaded one.

Overriding the uname command with something that comes earlier in the
$PATH and echoes MINGW32_NT-5.1_WIN32 solves this issue, but this
solution is unobvious and probably causes many people to abandon the
idea of cross-compiling Tcl. It should be solved in the script instead,
but, again, I don't have enough autoconf experience to provide a
_reliable_ solution (probably the build system's uname should not be
used at all).

2. When installing message catalogs and timezone data, the Makefile
insists on running the freshly built tclsh (which isn't generally
runnable when cross-compiling). For Windows builds, I usually leave it
alone and let Wine provide transparent execution. For cross-Linux
builds, I sometimes let qemu do the same. Before these solutions
started to work nicely (like they do now) I had to separate out
install-binaries, install-libraries and (install-tzdata install-msgs),
running the last pair of targets with an overridden TCLSH.

I propose a fix: let the Makefile use $RUNNABLE_TCLSH in the
installation process, with $RUNNABLE_TCLSH defaulting to the freshly
built $TCLSH, as it was before. That is, it should _install_ $TCLSH
into $(prefix)/bin, but _run_ (when needed) $(RUNNABLE_TCLSH), which
defaults to $(TCLSH) but can be overridden separately.

This way, Tcl's configure script doesn't have to support finding an
appropriate build-system runnable tclsh. The modification is two lines
long. But when I build an automatically cross-compiled tree, it would
save much time to just "make install RUNNABLE_TCLSH=/usr/bin/tclsh
INSTALL_ROOT=zzzz" instead of fiddling with Makefile internals (and
merging the changes when they happen).

I realize that cross-compilation support is probably not a
high-priority goal for the Tcl team. But this target is almost reached
anyway, so it'd be nice to take a minor additional step :)

--
Regards, Anton Kovalenko
+7(916)345-34-02 Elektrostal' MO, Russia
From: Anton K. <an...@sw...> - 2010-02-21 11:32:06

Hello Tcl developers.

On the Windows platform, Tcl has to catch "structured exceptions" in
some places. This is implemented using Microsoft's C language extension
(for MSVC) or a bunch of x86 assembly code (for mingw).

Looking at the current mingw headers, I came across <excpt.h>, which is
basically a drop-in replacement for __try .. __except (no __finally
provided). This implementation is probably incomplete, but it seems
that Tcl doesn't require more than this file provides. I would like to
propose using this header for mingw (probably with some autoconf test
added, like HAVE_MINGW_SEH), instead of hand-written assembly
fragments.

The most compelling reason is that no SEH code currently exists in Tcl
for 64-bit platforms, but <excpt.h> _does_ exist on these platforms and
contains assembly code for them. (I don't know how well this code
works, though.)

Patching Tcl to use it seems mostly trivial, but there are some things
that I would like to ask experienced developers about:

1. Are there any known problems with mingw's <excpt.h>, like
unexpected, unstable or buggy behavior? Or is it unused for now simply
because it did not exist when the appropriate parts of the Tcl code
were written?

2. Maybe the best solution would be to keep the current implementation
for mingw/i386, but to wrap it in definitions similar to mingw's __try1
.. __finally1. Then we'd be able to use these definitions (TCL_WIN_TRY
... TCL_WIN_FINALLY) in any place that is currently conditionalized on
HAVE_NO_SEH (and that now has copy-pasted instances of the same
assembly code). Does anyone have an opinion on it?

3. I'm not bad at writing C, but as for autotools, my experience is
mostly a cargo-cultist one :) I can fix the C code myself, but I'd be
grateful if someone implements the required autoconf tests properly. Is
there an experienced "autoconf guru" who's ready to help?

4. Last but not least: an appropriate #error directive should be
inserted when HAVE_NO_SEH is defined for a platform for which we cannot
provide a reasonable alternative. Compiling Tcl with mingw-w64 now
gives an obscure error about an invalid assembly opcode. Even if we fix
AMD64 and IA64, there may be some other cases (like WinCE/ARM?). We
should not assume that MINGW means X86, and, likely, we shouldn't
assume that MINGW && !X86 && !AMD64 means IA64, etc. Let's just do:

    #error "No SEH support in compiler and no implementation for this compiler or CPU"

Thank you for your attention.

--
Regards, Anton Kovalenko
+7(916)345-34-02 Elektrostal' MO, Russia
From: Tom J. <tom...@gm...> - 2010-02-18 21:31:53

On Thu, Feb 18, 2010 at 1:23 AM, Donal K. Fellows
<don...@ma...> wrote:
> On 17/02/2010 18:41, Tom Jackson wrote:
>> The only way to justify using an array instead of a linear list {{key
>> value} {key value}} is if you don't think it is important to maintain
>> the data in the form it was sent. This is a valid argument. But note
>> that in order to add a key, or lappend to an existing key, you have
>> to search the array first. A large array will consume more resources
>> during creation than a list containing the same data.
>
> I'm not criticising other parts of what they're doing. Just the
> process of going from key to bucket chain to add to. Theirs is
> particularly lame as it loads all keys that start with the same letter
> into the same bucket (so trivial to attack it's unfunny) but it's not
> as lame as when it was all just an alist. It was in older versions of
> Apache; adds might have been quick, but lookups would have sucked...

I really wish I could figure out how anyone ever got the idea that
arrays are better than lists if the data is only used a few times.
Somehow you have to pay for the creation of the array. This makes sense
if you need continuous access to the same data over a long period of
time.

Another important point about Tcl arrays is that their history suggests
they were designed as a method for mapping strings to opaque structures
(or pointers, whatever). The string-data to binary-data mapping is one
of the most important features of the Tcl language. So you can't use a
list to map strings to an opaque structure, because lists are "values".
Somehow this important fact is being turned upside down: we are
focusing on the key and not the value.

Now, back to performance: you have to pay for the creation and
maintenance of the array. This is an up-front cost. If you have 10
minutes to visit Disneyland, your per-ride cost will be very high.

Enough talk. Here is a script which uses the FNV algorithm to create an
array, using the collide keys. After the script I have timings (create
the array with the keys, create a name-value list with the keys). The
result is that it takes so much time to create the array, I have plenty
of time left over to do very slow linear searches... and I can do them
case-insensitively. In addition, I can recover the order of the input
list. Why anyone thinks it is a good idea to spend time destroying
information is beyond me, but basically that is what any list-to-array
transform gives you.

# original collide proc
proc collide {level {accum ""}} {
    global a
    if {$level} {
        incr level -1
        collide $level "aj$accum"
        collide $level "ba$accum"
    } else {
        lappend a($accum) 1
    }
}

# new proc just generates keys
proc collideNames {level {accum ""}} {
    global names
    if {$level} {
        incr level -1
        collideNames $level "aj$accum"
        collideNames $level "ba$accum"
    } else {
        lappend names $accum
    }
}

# generate lots of keys
collideNames 15

set hostChoice [list host Host hosT hOst hoSt HoSt HOst hoST]
set valueIndex 0
set hostIndex 0

# create array using keys; note collide runs faster because it is a
# compiled proc. The loop here matches the following list creation loop.
puts "create host array: [time {
    foreach name $names {
        lappend a($name) [incr valueIndex]
        if {!($valueIndex % 10007)} {
            incr hostIndex
            lappend a([lindex $hostChoice $hostIndex]) $valueIndex
        }
    }
}]"

puts [array statistics a]

set valueIndex 0
set hostIndex 0

puts "create host list: [time {
    foreach name $names {
        lappend headerList [list $name [incr valueIndex]]
        if {!($valueIndex % 10007)} {
            incr hostIndex
            lappend headerList [list [lindex $hostChoice $hostIndex] $valueIndex]
        }
    }
}]"

puts "finding First Host: [time {set hosts [lsearch -inline -index 0 $headerList Host]}]"
puts "finding First Host nocase: [time {set hosts [lsearch -inline -nocase -index 0 $headerList Host]}]"
puts "finding All Hosts: [time {set hosts [lsearch -inline -all -index 0 $headerList Host]}]"
puts "finding All Hosts nocase: [time {set hosts [lsearch -inline -all -nocase -index 0 $headerList Host]}]"
puts $hosts
### End script

The only thing interesting here is the extra time required to create
the array. But let's not even get into the problems of searching for a
"host" header with no case:

% array names a -regexp (h|H)(o|O)(s|S)(t|T)
hosT hOst Host
% time {array names a -regexp (h|H)(o|O)(s|S)(t|T)}
120592 microseconds per iteration

Then you have to retrieve the values. Here is what I get with a list
representation:

create host array: 1025203 microseconds per iteration (wow! over one second)
32771 entries in table, 16384 buckets
number of buckets with 0 entries: 8342
number of buckets with 1 entries: 604
number of buckets with 2 entries: 1144
number of buckets with 3 entries: 1657
number of buckets with 4 entries: 1616
number of buckets with 5 entries: 1271
number of buckets with 6 entries: 848
number of buckets with 7 entries: 487
number of buckets with 8 entries: 237
number of buckets with 9 entries: 111
number of buckets with 10 entries: 47
number of buckets with 11 entries: 15
number of buckets with 12 entries: 4
number of buckets with 14 entries: 1
number of buckets with 10000 or more entries: 0
average search distance for entry: 3.0
create host list: 381651 microseconds per iteration (2.5 times faster)
finding First Host: 1867 microseconds per iteration
finding First Host nocase: 1697 microseconds per iteration
finding All Hosts: 6618 microseconds per iteration
finding All Hosts nocase: 40670 microseconds per iteration
Host keys: {Host 10007} {hosT 20014} {hOst 30021}

Somewhere the conventional wisdom about list vs. array access needs to
be discredited.

tom jackson
From: Gustaf N. <ne...@wu...> - 2010-02-18 13:21:56

Am 17.02.10 19:41, schrieb Tom Jackson:
> The keyspace is only 32 bits, this is trivial by any measure. I guess
> Gustaf has already found several sets of keys. Of course, the real
> problem is that you only need the low order bits of the hash to match,
> so the keyspace for the buckets is much smaller than 32 bits.

Exactly. It is sufficient to attack a much smaller keyspace to achieve
the reported slowdowns. To make a lookup in a table with 55000 entries a
linear search, it is sufficient to find hash values with identical 14
low bits (in the current mapping from hash key to bucket). One can
construct such key sets for every hash function in a C program in less
than 20 seconds (this finds e.g. 70000 keys with the same hash suffix as
"hello" based on FNV).

As pointed out earlier, this attack is not possible when using a
randomized seed and hash functions like murmur or lookup3, which mix the
seed properly into the hash value, since on every start of Tcl it will
return sufficiently different hash values for the same keys. Since Tcl
supports custom hash tables, every C-level application can provide a
different hash function, with or without seeds, for its own hash tables.

In case there is still someone listening: below are some results from
loading dict/words into an array (the script Donal posted) with 4 hash
algorithms. They show that Tcl's classic hash function does very well;
murmur is second, and the Perl hash function is the worst of these.

-gustaf neumann

CLASSIC   218022.7906 microseconds per iteration
MURMUR    219907.73859999998 microseconds per iteration
LOOKUP3   225928.0354 microseconds per iteration
PERL_HASH 249029.0566 microseconds per iteration
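Donal's benchmark script itself is not reproduced in this thread. Purely
as a sketch of the kind of harness that yields per-iteration numbers
like those above, something along these lines would do; the dictionary
path and the iteration count are assumptions, not taken from the
original script.

    # Assumed shape of the benchmark, not Donal's actual script: load
    # every dictionary word into an array, then time an insert-plus-
    # lookup pass over the whole key set.
    set f [open /usr/share/dict/words]
    set words [split [string trim [read $f]] \n]
    close $f

    puts [time {
        unset -nocomplain a
        foreach w $words { set a($w) 1 }
        foreach w $words {
            if {![info exists a($w)]} { error "missing $w" }
        }
    } 10]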
From: Donal K. F. <don...@ma...> - 2010-02-18 09:23:49

On 17/02/2010 18:41, Tom Jackson wrote:
> My idea was to just apply the mod once (to the final hash and in the
> bucket selection code), however, on further testing the current Tcl
> hash produces values which (unless a collision) produce a good bucket
> distribution. You can't apply the mod inside the hash function because
> it doesn't have access to the size of the hash table (number of
> buckets).

You could (except for in the hash function in tclLiteral.c) as you've
got access to the tablePtr. But the problem with that is that it means
that when you resize the hash table you've got to recalculate the
hashcode for all the entries. That pushes up the cost of resizing quite
a lot.

> The keyspace is only 32 bits, this is trivial by any measure. I guess
> Gustaf has already found several sets of keys. Of course, the real
> problem is that you only need the low order bits of the hash to match,
> so the keyspace for the buckets is much smaller than 32 bits.

It's not so trivial if it takes around 2**31 attempts to find another
hash; brute-force trouble is unavoidable without using a much larger
hashcode, and that pushes the cost up.

The problem with the hashing scheme that we've currently got is that
there are just too many ways to generate collisions; concatenating
colliding bits with constant non-colliding bits generates more
colliding bits. But the cost to most Tcl code of fixing this is too
high. Most Tcl code just doesn't have the collision problem. Because of
*that*, I've reverted to the old hash function. It's fast, very very
fast; trying to better it for speed is hard since it is doing exactly
what modern microprocessors have been optimized to do. It's also
actually good enough for 99.$lots% of all uses, as it produces
reasonable results with short keys (much of the research on hashes
seems to focus on longer keys). Other approaches are needed for some
apps, sure, but they can be shuffled off to extensions without us
feeling too bad.

I hate reverting code. But in this case, the preponderance of the
evidence is that it's not fair on most code to make it bear the
performance cost of the malicious-key-choice edge cases. (It was larger
than I expected.)

> b) fits the typical definition of a collision. Of course collisions
> have nothing to do with cryptography, since the key is always compared
> anyway. Hopefully the key comparison stops on the first mismatch, so
> the actual number of bits compared is more related to the length of a
> common prefix between all keys in one bucket.

One of the problems with Tcl's hash is that it's trivial to make other
values that hash to the same thing as receiver-nominated keys. That is,
even if you don't look at keys that you don't know about (other than to
store them in case some part of the processing needs them; this is a
reasonable mode of operation when building plugin-extensible apps) you
still bear the cost.

> Note that Apache does not modify the key, it can store multiple copies
> of the same key, and can search for keys in a case-insensitive manner.

Unrelated to what we do. We implement a map; they implement a slightly
different structure (a multimap, IIRC). Hashing techniques would work
just as well with one as the other. Making Tcl's hash tables work that
way wouldn't be too hard, but isn't really about changing that part of
the code.

> The only way to justify using an array instead of a linear list {{key
> value} {key value}} is if you don't think it is important to maintain
> the data in the form it was sent. This is a valid argument. But note
> that in order to add a key, or lappend to an existing key, you have to
> search the array first. A large array will consume more resources
> during creation than a list containing the same data.

I'm not criticising other parts of what they're doing. Just the process
of going from key to bucket chain to add to. Theirs is particularly
lame as it loads all keys that start with the same letter into the same
bucket (so trivial to attack it's unfunny) but it's not as lame as when
it was all just an alist. It was in older versions of Apache; adds
might have been quick, but lookups would have sucked...

Donal.
From: Tom J. <tom...@gm...> - 2010-02-17 18:41:53

On Wed, Feb 17, 2010 at 3:56 AM, Donal K. Fellows
<don...@ma...> wrote:
> Mod is quite an expensive operation, at least on this machine.
> Applying a mod operation (mod a large prime) to the result of the
> classic hash before returning nearly takes the cost of hashing up to
> equivalence with FNV.

My idea was to just apply the mod once (to the final hash and in the
bucket selection code); however, on further testing the current Tcl
hash produces values which (unless there is a collision) produce a good
bucket distribution. You can't apply the mod inside the hash function
because it doesn't have access to the size of the hash table (number of
buckets). Also, the modulus is not "large": it is just a prime close to
the number of buckets. However, the current bucket code works very well
with the current hash function; both FNV and the djb hash perform
worse.

>> The system only needs to be broken once. All that is required is a
>> set of keys (maybe a few thousand) which hash to the same value.
>> After that the entire algorithm is broken (assuming the goal is to
>> poison the bucket code with collisions). Assuming a 32 bit key-space,
>> it should be easy to find at least one set of 10000 strings which
>> share the same hash value.
>>
>> Do I care to find such a set? NO! But this is a triviality for
>> cryptographers.
>
> Remember, cryptographers are mathematicians. They have a definition of
> "trivial" which doesn't match that used by software engineers. I can
> remember a number of acquaintances state that providing examples once
> an existence proof was done was trivial. Can't say that I really
> bought their argument... :-)

The keyspace is only 32 bits; this is trivial by any measure. I guess
Gustaf has already found several sets of keys. Of course, the real
problem is that you only need the low order bits of the hash to match,
so the keyspace for the buckets is much smaller than 32 bits.

> In terms of hash attacks, they only have real fangs (given that we
> don't conflate hashcode equivalence with equality) if it is possible
> to make either:
>
> a) A large number of *short* keys with the same hashcode, or
> b) A substantial number of keys with the same hashcode as some
>    particular key from a set not nominated by the attacker (typically
>    defined by the communication protocol)
>
> The aim of the first one is to swamp the hash table with totally
> irrelevant things. The aim of the second is to retard required
> lookups.

b) fits the typical definition of a collision. Of course collisions
have nothing to do with cryptography, since the key is always compared
anyway. Hopefully the key comparison stops on the first mismatch, so
the actual number of bits compared is more related to the length of a
common prefix between all keys in one bucket.

But since (again) the key is always compared, other types of attacks
would be just as effective in consuming resources: a few very long keys
which hash to the same value, or which land in the same bucket. Also,
the communication protocol itself will consume more resources
transmitting the data, and the hashing and storage of large quantities
of data is also an attack; in fact, the attack would be more effective
if you use a more expensive hash function. Like I said earlier: the
utility of cryptography is found in the ability to transfer the
expensive calculations to the attacker. If instead you provide a
platform where the attacker can make you perform many expensive
calculations, the attacker has already succeeded without lifting a
finger.

It is much better to solve this problem at the application level: limit
the number of headers, or the length of an entire request (see the
sketch after this message). Otherwise the attacker can just send
millions of headers instead of thousands; what is the difference how
performance is degraded? DOS attacks are not sophisticated attacks.
Maybe just send a very long URL and see if the whole URL gets written
to the log file. Eventually the disk fills up.

>> I know there is a huge bias against AOLserver, so I digress to
>> Apache. How does Apache handle headers?
>>
>> From http://thomas.eibner.dk/apache/table.html (Apache table API):
>>
>> "Tables in Apache works almost like hashes in Perl. The main
>> difference between the two is that you can have two of the same keys
>> in the Apache table API. The table lookups are done in a
>> case-insensitive manner."
>>
>> So for everyone who has some problem with AOLserver, maybe note that
>> headers in Apache are handled very similarly. More specifically, they
>> are not handled as associative arrays or dictionaries.
>
> Checking the source to Apache (strictly the APR), I see that they're
> using hashing inside their implementation of tables. OK, they're using
> a truly trivial hash, but I suppose it's better than what they had
> before (a single linear list).
>
> Check for yourself at:
> http://svn.apache.org/repos/asf/apr/apr/trunk/tables/apr_tables.c

Note that Apache does not modify the key, it can store multiple copies
of the same key, and it can search for keys in a case-insensitive
manner. Unfortunately their case-insensitive search is limited to ASCII
chars. Headers are also stored in a linear array (but it looks like you
can actually only store two keys with the same name).

The only way to justify using an array instead of a linear list {{key
value} {key value}} is if you don't think it is important to maintain
the data in the form it was sent. This is a valid argument. But note
that in order to add a key, or lappend to an existing key, you have to
search the array first. A large array will consume more resources
during creation than a list containing the same data.

tom jackson
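Tom's application-level mitigation is easy to express as a guard in
front of whatever structure eventually stores the headers. A minimal
sketch; the proc name and the limits are purely illustrative, not taken
from any real server:

    # Reject pathological requests before their headers reach any hash
    # table or list. Limits are examples only.
    proc checkRequest {headerList rawLength} {
        set maxHeaders 128
        set maxBytes   16384
        if {[llength $headerList] > $maxHeaders} {
            error "400 Bad Request: too many header fields"
        }
        if {$rawLength > $maxBytes} {
            error "413 Request Entity Too Large"
        }
    }

    checkRequest {{Host example.com} {Accept */*}} 64   ;# passes silently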
From: Larry M. <lm...@bi...> - 2010-02-17 17:18:16

On Wed, Feb 17, 2010 at 03:40:45PM +0000, Donal K. Fellows wrote:
> Following talking this over with Kevin Kenny, I suspect that the right
> thing to do is to roll back to the JO hash function

If nothing has been committed on top of these changes, can we just
strip the subsequent deltas from the RCS files (I can do that for you
if you like)? The reason is that the history is going to be completely
mangled otherwise; it will end up looking like Donal wrote the hash
functions that John wrote, and that sucks when you are trying to track
things in the source base.

If we can't do the surgery on the files, then could someone please log
in as John and do the commit?

--
Larry McVoy              lm at bitmover.com          http://www.bitkeeper.com
From: Tom J. <tom...@gm...> - 2010-02-17 16:27:03

On Wed, Feb 17, 2010 at 7:40 AM, Donal K. Fellows
<don...@ma...> wrote:
> I suspect that the right thing to do is to roll back to the JO hash
> function (because it's fast on most of the types of keys it
> encounters) and to have a way to let people plug in different mapping
> mechanisms for arrays.

This would be a major advancement.

> Which doesn't need to be done for 8.6 (and won't be; it requires
> work). Being able to put a different mapping mechanism under an
> array's clothes is something I've been wanting to do for a while
> anyway, since maybe 2003. For one thing, it would let us put things
> like BLT vectors and metakit's database on a more solid footing.
> Allowing the use of a stronger hash function - or a balanced tree -
> well, that ought to be trivial (and implementable in an extension) if
> the basics are done right.
>
> The only other truly satisfying solution - upgrading to a strong hash
> function with proper entropic salting - seems to be too expensive to
> really justify for all code. (FNV is only marginally more resistant
> than the classic hash function, not really enough to be helpful, so
> might as well keep the speed.)

The bucket code could use crit-bit tries, so searching for the entry
also validates the key. In fact, you could dispense with the hashing
completely, and with the rebuilding. A search would involve a linear
scan bounded by the length of the longest key (which you have to
traverse anyway). The crit-bit trie is especially efficient for storing
common-prefix strings... such as namespaces and proc names. You could
also adapt this to use entire words as branch points, which would make
it easier to search hierarchical data without probing each branch.

tom jackson
From: Donal K. F. <don...@ma...> - 2010-02-17 15:40:53

On 17/02/2010 12:37, Andreas Leitgeb wrote:
> What is the deal with this thread?
>
> The currently used hash-function is, while very good and fast for
> typical Tcl programs' use-patterns, very easily "exploitable" (towards
> bad performance) when storing (possibly malevolent) users' strings as
> keys.
>
> The obvious exit (maintaining random hash-seeds) is blocked for
> reasons of binary compatibility.

Doesn't help unless an expensive hash function is used. The cheap ones
are just as vulnerable with salting as without, since you can't use a
different salt per key, and the cheap hashes are vulnerable to the fact
that if h($a)==h($b), then h($s$a$t)==h($s$b$t), for any $s and $t (a
script-level illustration follows below).

> Until 9.0 at which point the currently blocked obvious exit becomes an
> option.

Following talking this over with Kevin Kenny, I suspect that the right
thing to do is to roll back to the JO hash function (because it's fast
on most of the types of keys it encounters) and to have a way to let
people plug in different mapping mechanisms for arrays. Which doesn't
need to be done for 8.6 (and won't be; it requires work). Being able to
put a different mapping mechanism under an array's clothes is something
I've been wanting to do for a while anyway, since maybe 2003. For one
thing, it would let us put things like BLT vectors and metakit's
database on a more solid footing. Allowing the use of a stronger hash
function - or a balanced tree - well, that ought to be trivial (and
implementable in an extension) if the basics are done right.

The only other truly satisfying solution - upgrading to a strong hash
function with proper entropic salting - seems to be too expensive to
really justify for all code. (FNV is only marginally more resistant
than the classic hash function, not really enough to be helpful, so
might as well keep the speed.)

Donal.
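The prefix/suffix property Donal mentions is easy to demonstrate at
script level. The following sketch reimplements the classic Tcl string
hash (h = h*9 + byte) as a Tcl proc purely for illustration; the proc
name and sample keys are hypothetical, and "aj"/"ba" are the same
colliding pair the collide script in this thread is built on.

    # Classic Tcl string hash, redone in script for illustration only.
    proc johash {s} {
        set h 0
        foreach c [split $s ""] {
            set h [expr {(($h << 3) + $h + [scan $c %c]) & 0xffffffff}]
        }
        return $h
    }

    # "aj" and "ba" collide (97*9 + 106 == 98*9 + 97 == 979)...
    puts [expr {[johash aj] == [johash ba]}]            ;# -> 1

    # ...and wrapping both in the same prefix and suffix still
    # collides, which is why salting alone doesn't rescue a cheap hash.
    puts [expr {[johash xxajyy] == [johash xxbayy]}]    ;# -> 1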
From: Gustaf N. <ne...@wu...> - 2010-02-17 14:24:05

Am 17.02.10 13:37, schrieb Andreas Leitgeb:
> What is the deal with this thread?

From my perspective: a fix of symptoms (collide script, replace tclhash
by FNV) was presented as a solution, but it is just shifting the
problem.

> The obvious exit (maintaining random hash-seeds) is blocked for
> reasons of binary compatibility.

Actually, this argument was brought up earlier, but I don't understand
it. It is easy to provide a seed at a per-script level without
requiring different interfaces or structures (see the sketch below).
For hash functions using the seed properly (like murmur, lookup3) this
solves the problem (it does not help for e.g. FNV). The biggest
difference is that the order of keys will change from run to run. As
pointed out earlier, no one can rely on this order.

-gn

=====

HashVarKey(....)
{
    static int seed = 0;
    static Tcl_Mutex initMutex = 0;

    if (seed == 0) {
        struct timeval t;
        Tcl_MutexLock(&initMutex);
        if (seed == 0) {
            gettimeofday(&t, NULL);
            seed = t.tv_usec;
            if (seed == 0) seed = 0xdeadbeaf; /* avoid seed == 0 */
        }
        Tcl_MutexUnlock(&initMutex);
    }
    ...
#if defined(MURMUR)
    result = hash(string, length, seed);
#endif
    ...
}
From: Andreas L. <av...@lo...> - 2010-02-17 13:10:34

What is the deal with this thread?

The currently used hash-function is, while very good and fast for
typical Tcl programs' use-patterns, very easily "exploitable" (towards
bad performance) when storing (possibly malevolent) users' strings as
keys.

The obvious exit (maintaining random hash-seeds) is blocked for reasons
of binary compatibility.

So the next best approach is just hiding the "core" of the
hash-function a bit better (say, on top of the wardrobe rather than
under the bed), at a cost to all the other Tcl scripts that do not have
to hash malevolent users' input.

Do I miss something? If not: are you serious?

Where unexploitable hashing is needed, why not just wrap an array with
some accessors in a namespace, and have the element setter scramble the
key and the getters unscramble it (see the sketch below)? Then you can
play hide'n'seek with your malevolent users as long as you like. They
find out your scrambling and create new bad strings? Just drop in a
modified (de-)scrambler and let them search again, rather than waiting
for the next (pre-9.0) version of Tcl coming with yet another
hash-function...

Until 9.0, at which point the currently blocked obvious exit becomes an
option.
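For concreteness, here is a minimal sketch of that namespace-wrapper
idea. All names are hypothetical, and [string reverse] merely stands in
for whatever site-specific, reversible scrambling one might choose:

    namespace eval keymap {
        variable store
        # Stand-in scrambler; swap in any cheap reversible transform.
        proc Scramble {key} { string reverse $key }
        proc put {key value} {
            variable store
            set store([Scramble $key]) $value
        }
        proc fetch {key} {
            variable store
            return $store([Scramble $key])
        }
    }

    keymap::put Host example.com
    puts [keymap::fetch Host]    ;# -> example.com

Swapping the scrambler then only requires changing one proc, which is
exactly the hide-and-seek move Andreas describes.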
From: Donal K. F. <don...@ma...> - 2010-02-17 11:56:54

On 17/02/2010 01:36, Tom Jackson wrote:
> No shit! The cost is extreme.

I've been probing the cost of using various hash functions (I've done
the comparison of the classic algorithm against FNV and lookup3) when
used in Tcl, and the attached script contains preliminary results for
the two keysets:

    all dictionary words
    numbers from 0 to 500k

The tests basically check the cost of doing 9 hash lookups per key;
array lookups are close to as pure a test of this as we can generate in
Tcl. (The cost of the "machinery" is about a third of the total time.)
The cost of FNV over these (presumably near-normal) keys is 3-10% over
the classic algorithm. The cost of lookup3 is 10-30% over the classic
algorithm. Other key-sets might have other costs.

Note that the only thing that changes between the runs is the contents
of the function TclHashObjKey in tclObj.c, and the change was done by
altering #ifdefs only. Everything else is just Tcl HEAD (plus a
compiled-in lookup3.c; I'm not retyping *that* code!)

> The problem here is that in order to get good distribution to all
> bits, you have to use a near cryptographic quality hash. A
> cryptographic hash can be easily truncated to generate near random
> bucket indexes. This is the desired goal, but it is much more easily
> achieved most of the time with a single modulus operation which takes
> into account all bits. Which is easier: one modulus operation or a
> near-cryptographic hash which can be safely truncated?

Mod is quite an expensive operation, at least on this machine. Applying
a mod operation (mod a large prime) to the result of the classic hash
before returning nearly takes the cost of hashing up to equivalence
with FNV.

> The system only needs to be broken once. All that is required is a set
> of keys (maybe a few thousand) which hash to the same value. After
> that the entire algorithm is broken (assuming the goal is to poison
> the bucket code with collisions). Assuming a 32 bit key-space, it
> should be easy to find at least one set of 10000 strings which share
> the same hash value.
>
> Do I care to find such a set? NO! But this is a triviality for
> cryptographers.

Remember, cryptographers are mathematicians. They have a definition of
"trivial" which doesn't match that used by software engineers. I can
remember a number of acquaintances state that providing examples once
an existence proof was done was trivial. Can't say that I really bought
their argument... :-)

In terms of hash attacks, they only have real fangs (given that we
don't conflate hashcode equivalence with equality) if it is possible to
make either:

a) A large number of *short* keys with the same hashcode, or
b) A substantial number of keys with the same hashcode as some
   particular key from a set not nominated by the attacker (typically
   defined by the communication protocol)

The aim of the first one is to swamp the hash table with totally
irrelevant things. The aim of the second is to retard required lookups.

> I know there is a huge bias against AOLserver, so I digress to Apache.
> How does Apache handle headers?
>
> From http://thomas.eibner.dk/apache/table.html (Apache table API):
>
> "Tables in Apache works almost like hashes in Perl. The main
> difference between the two is that you can have two of the same keys
> in the Apache table API. The table lookups are done in a
> case-insensitive manner."
>
> So for everyone who has some problem with AOLserver, maybe note that
> headers in Apache are handled very similarly. More specifically, they
> are not handled as associative arrays or dictionaries.

Checking the source to Apache (strictly the APR), I see that they're
using hashing inside their implementation of tables. OK, they're using
a truly trivial hash, but I suppose it's better than what they had
before (a single linear list).

Check for yourself at:
http://svn.apache.org/repos/asf/apr/apr/trunk/tables/apr_tables.c

> All I can say is that the more I investigate the original Tcl hash,
> the more my respect grows. It seems to perform better most of the
> time.

It's very very fast, but weak. With non-malicious inputs, its weakness
doesn't really matter all that much, and its speed is nice. With
malicious inputs, which are easy to create, it's no better than a
linear list (plus some memory costs). The alternative hash functions
that are recommended in the literature (FNV, lookup3) are slower but
harder to push into the ill-performing case (in the case of lookup3, it
is believed to require effort nearly equivalent to just scanning
through all possible inputs, which isn't a threat; FNV is faster but
weaker).

Donal.
From: Gustaf N. <ne...@wu...> - 2010-02-17 11:49:38

Am 16.02.10 19:30, schrieb Tom Jackson:
> The problem is not the hash function, it is generally the method of
> transforming the hash value into a bucket index.

Not sure what you refer to as a problem. If the hash function returns
the same hash value for many keys, no bucket distribution mechanism
(from hash value to index) can work, no matter whether the least
significant bits or a modulo is used.

The anomaly of "collide" can be fixed via e.g. FNV, but there are other
algorithms that seem to provide better results and which are faster.
However, with all these functions it is still possible to exploit the
linear search in the buckets with tailored sets of keys.

The problem can be solved by replacing the linear search in the buckets
with something with better properties (e.g. an AVL tree). But also
here, the implementation would benefit from better hash functions to
avoid string comparisons. Another, even more complex task would be to
replace the bucket management altogether with algorithms like linear
hashing or extendible hashing, but that is an even larger project
requiring much more evaluation and testing. Btw, that would be an
interesting GSOC topic...

I was starting to look into the topic, since XOTcl spends a lot of time
in the hash lookups - so any improvements here would help XOTcl (and
other Tcl OO systems) as well. The situation with XOTcl is in many
aspects different from the "attack" scenarios: in an OO environment,
many small hash tables are involved (the number of variables or methods
per class is typically less than 20, non-compiled local variables of
procs are in hash tables, etc.). Therefore I like Tcl's fast
"classical" hash function there. I would also be happy about a
reduction of the constant cost of the Tcl hash machinery. On the first
part, I am relaxed, since at least in some scenarios XOTcl could
provide its own custom hash type (as Tcl vars do).

-gustaf neumann
From: Kristoffer L. <se...@sc...> - 2010-02-17 11:18:29

On 17 Feb 2010, at 13:04, Gustaf Neumann wrote:
> guess, you got me wrong. The mentioned option was not to replace
> hashing by trees in general (imho the worst engineering option) but to
> replace the linear search in the bucket list by a balanced tree, such
> that a bucket list with 32k elements requires ~15 comparisons on
> average instead of 16k.

Sticking in my 2 cents here, but this is indeed a common-knowledge
solution to the hash problem, and a very reasonable one if you can
avoid heavy overhead for small hashes. I'm a bit of a data-structures
purist and love balanced trees. The aesthetic, structural and
self-managing aspects of them have a huge appeal to me. However, I do
understand their limitations, and a combination might indeed be a
decent solution - if there even is a serious problem to begin with.

--
Kristoffer Lawson, Co-Founder, Scred // http://www.scred.com/
From: Gustaf N. <ne...@wu...> - 2010-02-17 11:05:14

Am 16.02.10 17:45, schrieb Donal K. Fellows:
>> How bad is this in practice? This "attack" is not more than bad
>> performance in a pathological case, or?
> It's exactly a performance issue. The issue is exactly that it is far
> too easy to trigger this worst-case behaviour.

My point is: for an "attacker", it is still very easy to attack the
runtime behavior of the existing bucket distribution algorithm with FNV
(or some other hash function). Get
http://alice.wu-wien.ac.at:8000/xowiki/download/file/keys and try with
a seeded FNV (I used 0x811c9dc5 as seed):

    set f [open /tmp/keys r]; set keys [read $f]; close $f
    foreach key [split $keys] { set x($key) 1 }

> 55917 entries in table, 65536 buckets
> number of buckets with 0 entries: 65528
> number of buckets with 1 entries: 4
> number of buckets with 10 or more entries: 4
> average search distance for entry: 6989.14

You see, most buckets are empty and the average search distance is
horrible. Certainly, other hash functions need a different set of keys
for the "attack".

>> Concerning FNV: for string lengths less than two characters, FNV and
>> the classic Tcl function return the same hash value (just a side
>> note).
> Only because I'd made a mistake. :-) The FNV algorithm uses a baseline
> value which means that the hashcodes are different.

I should have spotted this as well! I fixed this in my implementation.
The overall effects are very small; the distribution pattern of the
"collide" test is not changed by using a seed value.

> If the hashcodes are the same, the problem happens.

Of course. The bucket distribution algorithm based on least significant
bits can't provide an even distribution when hash values are identical.
(Tom, a modulo function does not help either in this case.)

> If the hashcodes are only partially the same (to some number of LSBs)
> then small hash tables have a problem, but things will go away over
> some size.

No, the problem won't go away if "attacked" (just the likelihood of it
happening accidentally might go down). The point is: even if a hash
algorithm produced no duplicates (which is impossible), one could still
get a strongly unbalanced bucket distribution (as long as the number of
buckets is significantly smaller than the number of hash values). This
kind of attack is not hard at all.

> Switching to a wholly new map system has the disadvantage of needing
> really heavy testing. .... Changing the hash function has much smaller
> "engineering impact".

Of course. Especially when going towards e.g. linear hashing (using
more sophisticated bucket splitting). On the other hand, this area is
well understood and is used in many systems today.

> The main benefit of going to a balanced tree is that it provides a
> strong upper bound on the cost of mapping a key to a value for any
> particular size of map.

Well, it changes worst-case linear to worst-case logarithmic.

> Whether that's as low a cost as for Tcl's hash tables in the normal
> case, well, I don't know. For a balanced binary tree with 32k
> elements, I'd expect ~15 comparisons per lookup. With hash tables,
> we're talking 2 or 3 (by your figures).

Guess you got me wrong. The mentioned option was not to replace hashing
by trees in general (imho the worst engineering option) but to replace
the linear search in the bucket list by a balanced tree, such that a
bucket list with 32k elements requires ~15 comparisons on average
instead of 16k. This approach would require less engineering than going
towards e.g. linear hashing. It would be a remedy (safety net) against
all mentioned hash attacks.

> As noted before, lookup3 is probably the best hash.

I would consider Murmur 2.0 a serious contender. It is simpler and
apparently faster than lookup3 (also newer), with excellent properties
(see the statistics at http://murmurhash.googlepages.com/).

> IIRC, some of the others make alignment assumptions
> AIUI, murmur is one of the hashes with this problem

Versions for aligned and unaligned data exist for both lookup3 and
murmur. Part of the length of the lookup3 code is that it contains
essentially three versions (reading 32-bit chunks, 16-bit chunks and
8-bit chunks). The 32-bit chunk version can most probably lead to
unaligned access when the keys are not on word boundaries:

    const uint32_t *k = (const uint32_t *)key;   /* read 32-bit chunks */

Exactly the same is true for murmur; both support char and word chunks
for input.

>> PS: i have the suspicion that the avg search length reported by
>> "array statistics" is not correct
>
> I've not had problems with (admittedly trivial) testing. You're
> measuring at a point when there are about two times as many values as
> buckets, so perfect loading will produce two values per bucket.

I would think that the average search length per entry should be
calculated as sum(entries(bucket)^2)/numEntries for unsuccessful
searches, and half that for successful searches (every insert requires
an unsuccessful search).

-gustaf neumann
From: Joe E. <jen...@fl...> - 2010-02-17 05:54:04

Paul Alfille wrote:
> I am trying to eliminate compiler warnings and have run into a problem
> with the definition of Tcl_SetResult.
>
> The code in question is:
>
>     Tcl_SetResult(interp, Tcl_ErrnoMsg(Tcl_GetErrno()), TCL_STATIC);

The preferred replacement for Tcl_SetResult(..., TCL_STATIC) is:

    Tcl_SetObjResult(interp, Tcl_NewStringObj(..., -1));

where "..." can be a char * or a const char *. This is backward- and
forward-compatible with all versions of Tcl from 8.0 onwards. If you
need to support Tcl 7.6 or earlier, casting away the const (or just
ignoring the warning) is the way to go.

--Joe English

jen...@fl...