From: Donal K. F. <don...@ma...> - 2010-02-07 09:36:27
Attachments:
donal_k_fellows.vcf

Hi everyone,

This is just a short note to let everyone know that I've just upgraded Tcl 8.6's string (and Tcl_Obj) hash algorithm. Specifically, I've changed from using JO's old algorithm that was approximately[*]:

    hash = ∑ str[i] * 9**i

to the FNV algorithm <URL:http://isthe.com/chongo/tech/comp/fnv/> which is similar in many ways but uses a large prime as the power (instead of 9) and XOR as the combiner instead of addition. The result is a bit faster when I test it, and a little bit harder for code to push into situations where it performs badly. I don't know whether this is from faster computation or better distribution of the resulting hash values.

How much better is the new function? I'm sorry to say that the state of research in this area is remarkably poor; very few people are looking at what makes for a good *non-cryptographic* string hashing function. About the only thing that seems to be known for sure is that the power needs to be odd so that low order bits don't get lost. But the FNV algorithm seems to be the most thoroughly understood, and there was a Public Domain implementation (which helped). The big difference is that the amount of mixing of bits is much larger than before.

The main reason for this note is that it affects anyone who had code that assumed anything about hash iteration order. If you've got that assumption in your code, it's now been broken. This is more likely to impact on tests rather than production code - production code is more likely to have to deal with an expansion of the hash table which changes the order anyway, and so won't make the assumption - and the fix is to sort the values extracted from the hash before testing for equality. :-)

Donal.

[* The equation is true if you index from the end of the string... ]

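
(For reference, the two per-byte loops being compared look roughly like the C sketch below. The constants are the standard 32-bit FNV offset basis, 2166136261, and FNV prime, 16777619; whether Tcl's tree uses FNV-1 as shown, multiply then XOR, or FNV-1a, XOR then multiply, should be checked against the actual commit rather than taken from this sketch.)

    #include <stdio.h>

    /* Old Tcl string hash: result = result*9 + c, usually written as
     * result += (result<<3) + c. */
    static unsigned int
    OldTclHash(const char *string)
    {
        const unsigned char *p = (const unsigned char *) string;
        unsigned int result = 0;

        while (*p) {
            result += (result << 3) + *p++;
        }
        return result;
    }

    /* 32-bit FNV-1: multiply by the FNV prime, then XOR in the next byte. */
    static unsigned int
    FnvHash(const char *string)
    {
        const unsigned char *p = (const unsigned char *) string;
        unsigned int hash = 0x811c9dc5u;            /* offset basis 2166136261 */

        while (*p) {
            hash = (hash * 0x01000193u) ^ *p++;     /* prime 16777619 */
        }
        return hash;
    }

    int
    main(void)
    {
        const char *keys[] = { "foobar", "barfoo", "ballyhoo" };
        int i;

        for (i = 0; i < 3; i++) {
            printf("%-10s old=%#010x fnv=%#010x\n",
                   keys[i], OldTclHash(keys[i]), FnvHash(keys[i]));
        }
        return 0;
    }
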
From: Eric M. <er...@el...> - 2010-02-08 06:57:01

Donal K. Fellows wrote:
> Hi everyone
>
> This is just a short note to let everyone know that I've just upgraded Tcl 8.6's string (and Tcl_Obj) hash algorithm.

I'm not much involved in the development of Tcl anymore, but shouldn't a change of this magnitude require a TIP? The consequences of changing this algorithm are significant, yet you present this work as a fait accompli, with the implicit assumption that it is simply understood that the existing Tcl hash is bad and in need of replacement. Do you have data to support that conclusion? What are the specific scenarios in which the existing hash is inadequate?

> How much better is the new function? I'm sorry to say that the state of research in this area is remarkably poor; very few people are looking at what makes for a good *non-cryptographic* string hashing function.

This statement does not give me any confidence either in your analysis of the deficiencies of the existing hash or the advantages of the new hash. At the very least, it seems you should be able to demonstrate the superiority of FNV over the existing hash for the specific problem scenarios that prompted you to change the hash in the first place.

Bob Jenkins has spent a lot of time thinking about and designing hash algorithms. Did you look at his web page here:

    http://burtleburtle.net/bob/hash/index.html

Or his article in Dr. Dobbs Journal, reprinted and expanded here:

    http://burtleburtle.net/bob/hash/doobs.html

There are some (albeit brief) comments on FNV in the latter.

Best regards,

Eric Melski

From: Donal K. F. <don...@ma...> - 2010-02-08 11:37:16
Attachments:
donal_k_fellows.vcf

On 08/02/2010 06:40, Eric Melski wrote:
> I'm not much involved in the development of Tcl anymore, but shouldn't a change of this magnitude require a TIP? The consequences of changing this algorithm are significant, yet you present this work as a fait accompli, with the implicit assumption that it is simply understood that the existing Tcl hash is bad and in need of replacement. Do you have data to support that conclusion? What are the specific scenarios in which the existing hash is inadequate?

The primary scenarios involved were that the hash code, which is known to be part of the critical path for a lot of Tcl programs, is both slower than it might be (I'll return to the analysis of that later) and particularly easy to exploit in nasty ways. In particular, it was exceptionally simple to generate large numbers of strings with the property that they have the same full hash code. This made it trivial to exploit in situations such as processing HTTP headers; I can remember discussing this in depth back at the Tcl conference in 2003 (in the "Flintstones Cave" bar). While it is of course clear that there is no way to prevent that completely without using a cryptographic hash function (slow!) or extra per-hash randomization state (not compatible with existing code), using a stronger hash function that performs more mixing of the bits is a benefit, especially as table sizes increase.

The advantage of the FNV algorithm is that it does plenty of mixing and uses a multiply to do it. That latter part is an advantage because it turns out that on modern hardware (I've checked x86, SPARC and ARM[*]) there is a built-in multiply instruction, making the operation fast. This means that there's no real advantage to using a shift-based combiner; a multiply will do a pretty good job at a fraction of the cost. (There are some issues with the fact that it doesn't mix the bits downwards, but that's not a disaster with tables of non-trivial size.)

When I've measured the speed of the hashing, I've got these results. Using FNV hashing instead of Tcl's traditional algorithm with this code:

    proc foo {} {
        set x(foobar) [set x(barfoo) [set x(ballyhoo) 1]]
        return
    }
    time { foo } 10000000

Over the long-term, that does 3 hash lookups of fairly short strings (representative length?) per iteration. Specifically, it does one for each array element set. (The overall lookup of the command name gets cached in the 'foo' Tcl_Obj, the array name is a local variable and so is handled by index, the entries are unset at the end of the run without any hash lookups, and no traces are involved at all.)

    Old: 1.9507125463 microseconds per iteration
    New: 1.5804703108 microseconds per iteration

That's a consistent ~19% faster. By comparison (the builds I'm testing with aren't quite synchronized in all aspects) with this code:

    proc foo {} {set x [set y [set z 1]]; return}
    time { foo } 10000000

That is, with identical operations except for using all local variables (there's no penalty for any length of variable name at that point; all go to LVT accesses) then I get these differences:

    Old: 0.8757411546 microseconds per iteration
    New: 0.9598806448000001 microseconds per iteration

Don't know why there is a slowdown there; it seems to be something other than hashing. I get the pattern reproduced when I use other tests too, for example this (which has more non-hash operations):

    proc foo {} {for {set i 0} {$i<1000000} {incr i} {
        set x(foobar) [set x(barfoo) [set x(ballyhoo) 1]]
    }}
    time foo
    # Old:252616/New:233188 ms/iter

vs.

    proc foo {} {for {set i 0} {$i<1000000} {incr i} {
        set x [set y [set z 1]]
    }}
    time foo
    # Old:102212/New:168359 ms/iter

If anything, that would lead me to conclude that the improvement in hashing is better than the raw figures would suggest, but I can't prove that. All the above were tested with the default optimizing build on OSX.

> This statement does not give me any confidence either in your analysis of the deficiencies of the existing hash or the advantages of the new hash. At the very least, it seems you should be able to demonstrate the superiority of FNV over the existing hash for the specific problem scenarios that prompted you to change the hash in the first place.

See the above. As noted previously, there's no practical way right now to prevent hash collision attacks while still maintaining binary compatibility (I can't store a randomization factor in the hash table because the structure size needs to remain the same as it currently is in already-built extensions) but the FNV hash does perform better in practice.

> Bob Jenkins has spent a lot of time thinking about and designing hash algorithms. Did you look at his web page here:
>
>     http://burtleburtle.net/bob/hash/index.html
>
> Or his article in Dr. Dobbs Journal, reprinted and expanded here:
>
>     http://burtleburtle.net/bob/hash/doobs.html
>
> There are some (albeit brief) comments on FNV in the latter.

I did read that article prior to selecting the FNV algorithm. I'll admit to having some concerns about the FNV hash function as used, but it does offer some decent performance for short words (the Tcl case!) where we also can't guarantee alignment. And the FNV hash has one huge advantage over Bob's lookup3 hash: it's actually simple to implement (and to do so correctly). Bob's is definitely *not* simple and requires scanning the string twice when it is a normal zero-terminated C string (one of the cases supported by Tcl's hash table system). Again, changing the definition of a Tcl_HashEntry to work around that problem would break binary compatibility (Tcl_GetHashKey is a macro in tcl.h, and effectively fixes the size of a hash entry for the 8.* series).

The changes I've made are purely internal to Tcl. No public interface is impacted at all. The only behavioural difference is an alteration to iteration order, but that's the whole point. Moreover, robust code never relied on iteration order anyway; it's *never* been guaranteed to be constant[**] as the hash is manipulated.

It's probably a good idea for me to distil this into a TIP, even if just an informational one. The hash algorithm is an implementation detail, but it is an important detail that deserves to be documented.

Donal.

[* What other processor architectures do we care about in 2010? ]
[** Except for dictionaries, and they use a different mechanism that is wholly independent. ]

From: Joe E. <jen...@fl...> - 2010-02-08 19:56:11

Eric Melski wrote:
> Donal K. Fellows wrote:
> > Hi everyone
> >
> > This is just a short note to let everyone know that I've just upgraded Tcl 8.6's string (and Tcl_Obj) hash algorithm.
>
> I'm not much involved in the development of Tcl anymore, but shouldn't a change of this magnitude require a TIP?

I do not believe this requires a TIP. This is not really that big of a change. The only user-visible change involves the order of results returned by [array get] and others, which have always been explicitly specified to be indeterminate.

> [...]
> data to support that conclusion? What are the specific scenarios in which the existing hash is inadequate?

One area that I'm aware of: Tcl's traditional hash function is more prone to collisions (and it's trivially easy to generate an arbitrary number of them).

The FNV hash is one of three (that I'm aware of) that are considered among the state-of-the-art. (The other two I know of are Jenkins' hash, which you mentioned, and MurmurHash.)

--Joe English

From: Donal K. F. <don...@ma...> - 2010-02-08 21:56:18
Attachments:
donal_k_fellows.vcf

On 08/02/2010 19:56, Joe English wrote:
> Eric Melski wrote:
>> data to support that conclusion? What are the specific scenarios in which the existing hash is inadequate?
>
> One area that I'm aware of: Tcl's traditional hash function is more prone to collisions (and it's trivially easy to generate an arbitrary number of them).

That was what prompted me to act. It's been nagging at the back of my mind for a good few years now too, ever since the thread that started with http://groups.google.co.uk/group/comp.lang.tcl/msg/5d6888cd479cf583 way back in 2003. It doesn't *always* take me 6.5 years to act on something. (NB: the goal is to make sure that it is unlikely that we end up in a pathological case, which isn't quite the same thing as ensuring a minimal chain length is obtained in the normal case.)

> The FNV hash is one of three (that I'm aware of) that are considered among the state-of-the-art. (The other two I know of are Jenkins' hash, which you mentioned, and MurmurHash.)

Also, of the three you've just mentioned, only FNV is a simple slot-in replacement for the old code. In particular, it's easy to compute over C strings because it is strictly byte-at-a-time. Both Jenkins's and MurmurHash really want to know the length of the data being hashed first (which lets them be a bit faster, which in turn compensates for the fact that they are much more complex; byte-at-a-time forms for these would be excessively complex).

The other key point is that I've explicitly rejected taking any steps that involved a change to *any* public structure (i.e., Tcl_HashTable and Tcl_HashEntry; there are macros that point into these so we can't change things around radically). The only thing that changed was the hash algorithm. (We could do better if we had more freedom to change things, but we don't. Breaking either binary compatibility or source compatibility is not an option.)

Donal.

From: Eric M. <er...@el...> - 2010-02-08 18:45:28

Donal K. Fellows wrote:
> When I've measured the speed of the hashing, I've got these results:
>
> ...
>
>     Old: 1.9507125463 microseconds per iteration
>     New: 1.5804703108 microseconds per iteration
>
> That's a consistent ~19% faster.

That's very impressive, but with only three hash entries you can hardly say that's a representative test case. Where's the demonstration of the hash for, e.g., all of the standard commands in Tcl? How about all the words in /usr/dict/words? And of course, you've *said* that the new hash is better at avoiding collisions, but saying it don't make it so: where's your data to back up that claim?

Again, it would be nice to see timing information for computing the hash (and *only* computing the hash, not also creating Tcl variables and allocating memory and running Tcl byte codes, etc, etc); and information on the distribution of the resulting hash entries (something like the Tcl hash stats data, for example), on several large real-world sample sets. I'm sure you will agree that for a change as fundamental to the core operation of Tcl, it is worth the effort to do this properly.

> By comparison (the builds I'm testing with aren't quite synchronized in all aspects) with this code:

Wait a second: are you saying that the impressive results you just showed us are comparing two tclsh builds that have more than just the hash function changed? I think that invalidates any data you have shown. There's too many variables changed for you to convincingly state that the new hash function is performing any better than the original hash, especially when you're talking about differences of less than half a microsecond.

> If anything, that would lead me to conclude that the improvement in hashing is better than the raw figures would suggest, but I can't prove that.

If you can't prove that the new hash is better than the old hash, then you shouldn't be making changes to it.

> All the above were tested with the default optimizing build on OSX.

It would be nice to see comparisons on other platforms. According to Wikipedia, OSX represents only about 5% of the total market share of computers (http://en.wikipedia.org/wiki/Usage_share_of_operating_systems). By comparison, XP represents about 60%, and Vista about 25%. Do you have comparison data on either of those platforms? How about on any of the modern Linux distributions (Ubuntu, SUSE, etc)?

> I did read that article prior to selecting the FNV algorithm. [...] the FNV hash has one huge advantage over Bob's lookup3 hash: it's actually simple to implement (and to do so correctly).

Please don't interpret my comments as advocacy for Bob's lookup3 hash. I have no preference for any particular hash function. I do have concerns about changing a hash function that has worked quite well for 20 years, without sufficient consideration and experimentation to justify that change.

> The changes I've made are purely internal to Tcl. No public interface is impacted at all.

I don't believe that fact is at all relevant to the question of whether the change requires a TIP.

Best regards,

Eric Melski

From: Donal K. F. <don...@ma...> - 2010-02-09 02:21:39
Attachments:
hashes.c
donal_k_fellows.vcf

On 08/02/2010 18:45, Eric Melski wrote:
> That's very impressive, but with only three hash entries you can hardly say that's a representative test case.

Getting a properly representative evaluation of the hash function itself is tricky (actually impossible in Tcl itself; too much noise from other things). The attached program helps clarify (and shows also that speed of hashing isn't the whole story).

> It would be nice to see comparisons on other platforms. According to Wikipedia, OSX represents only about 5% of the total market share of computers (http://en.wikipedia.org/wiki/Usage_share_of_operating_systems). By comparison, XP represents about 60%, and Vista about 25%. Do you have comparison data on either of those platforms? How about on any of the modern Linux distributions (Ubuntu, SUSE, etc)?

OTOH, those all are typically deployed with the same processor architecture and none of the hashing requires *any* OS interaction. What changes there are between the platforms will be due to compiler differences (I'm using a reasonably recent gcc; I'm not about to acquire either the Microsoft or Intel compilers just for this argument).

FWIW, with a focussed test that *just* does comparison of the hashing speed (see attached; the contents of the two hashing functions were copied directly from HashStringKey in Tcl 8.5 and the HEAD) we find that on x86 the old algorithm is indeed faster. But that's not the whole story. Comparing the statistics out of the two hash algorithms (not speed, so easier to compare between builds), the loading distribution patterns for the words in /usr/share/dict/words are very similar. If we take a table containing all integers from 0 to 999999 though, we see a clearer difference:

Classic algorithm (as reported by [array statistics]):

    1000000 entries in table, 1048576 buckets
    number of buckets with 0 entries: 502271
    number of buckets with 1 entries: 276353
    number of buckets with 2 entries: 168961
    number of buckets with 3 entries: 51585
    number of buckets with 4 entries: 31912
    number of buckets with 5 entries: 7806
    number of buckets with 6 entries: 6553
    number of buckets with 7 entries: 1340
    number of buckets with 8 entries: 1086
    number of buckets with 9 entries: 344
    number of buckets with 10 or more entries: 365
    average search distance for entry: 1.8

FNV algorithm:

    1000000 entries in table, 1048576 buckets
    number of buckets with 0 entries: 400740
    number of buckets with 1 entries: 388150
    number of buckets with 2 entries: 185898
    number of buckets with 3 entries: 58036
    number of buckets with 4 entries: 13146
    number of buckets with 5 entries: 2308
    number of buckets with 6 entries: 266
    number of buckets with 7 entries: 30
    number of buckets with 8 entries: 2
    number of buckets with 9 entries: 0
    number of buckets with 10 or more entries: 0
    average search distance for entry: 1.5

Feel free to reproduce these figures yourself.

> Please don't interpret my comments as advocacy for Bob's lookup3 hash. I have no preference for any particular hash function. I do have concerns about changing a hash function that has worked quite well for 20 years, without sufficient consideration and experimentation to justify that change.

The figures above show that the FNV algorithm does *better* at distributing integers over the hash table. On dictionary words, it does slightly better (but only very slightly to be honest; it's only really a hair's-width of difference). Better distribution properties mean faster lookups, of course.

>> The changes I've made are purely internal to Tcl. No public interface is impacted at all.
>
> I don't believe that fact is at all relevant to the question of whether the change requires a TIP.

I'd say that the biggest problem left in the hash table implementation following the switch is that we're using hash bucket array sizes that are powers of 2. That's cheap, but discards a lot of information from the higher-order bits of the hash value. (OTOH, mod is fairly expensive so it would be nice to avoid it.)

Donal.

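
(For anyone who wants to approximate those figures outside of Tcl, the rough standalone sketch below, which is not Donal's attached hashes.c, hashes the decimal strings "0" through "999999", masks each hash into 2**20 buckets the way a power-of-two table does, and prints the chain-length histogram for both functions. The absolute counts will only match [array statistics] output to the extent that the masking and FNV variant here match the real implementation.)

    #include <stdio.h>
    #include <string.h>

    #define NBUCKETS (1u << 20)      /* 1048576 buckets, as in the stats above */

    static unsigned int OldHash(const char *s) {
        const unsigned char *p = (const unsigned char *) s;
        unsigned int r = 0;
        while (*p) { r += (r << 3) + *p++; }
        return r;
    }

    static unsigned int FnvHash(const char *s) {
        const unsigned char *p = (const unsigned char *) s;
        unsigned int h = 0x811c9dc5u;
        while (*p) { h = (h * 0x01000193u) ^ *p++; }
        return h;
    }

    static void Histogram(const char *name, unsigned int (*hash)(const char *)) {
        static unsigned int chain[NBUCKETS];      /* entries per bucket */
        unsigned long counts[11] = {0};
        char key[16];
        unsigned int i;

        memset(chain, 0, sizeof(chain));
        for (i = 0; i < 1000000; i++) {
            sprintf(key, "%u", i);
            chain[hash(key) & (NBUCKETS - 1)]++;  /* keep low-order bits only */
        }
        for (i = 0; i < NBUCKETS; i++) {
            counts[chain[i] > 10 ? 10 : chain[i]]++;
        }
        printf("%s:\n", name);
        for (i = 0; i < 10; i++) {
            printf("  buckets with %u entries: %lu\n", i, counts[i]);
        }
        printf("  buckets with 10 or more entries: %lu\n", counts[10]);
    }

    int main(void) {
        Histogram("classic", OldHash);
        Histogram("FNV", FnvHash);
        return 0;
    }
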
From: Larry M. <lm...@bi...> - 2010-02-09 02:42:21

Thanks for this Donal, it helps. And thanks for diving into this, it helps make tcl better.

I wanted to (briefly) touch on process. After a lovely discussion on #tcl it seems like it might be worthwhile to point out the value of a review process. I think that you especially may appreciate the benefits.

Reviews are not for beating people up. If people need beating that is best done in private.
Reviews are somewhat good at catching problems. That's what people think they are for.
Reviews are for educating other people. That's what people forget.

IMHO, if this sort of thing went through a review process
- you would get more strokes for taking on this problem
- at least the core would get educated about what you are doing
- maybe someone would have something useful to add

I agree with Colin (or anyone) if/when they say "don't slow down the people who are coding good stuff". But I disagree with Colin et al that reviews do that. Good code reviews reward the guy doing the work, educate the people following along; there is pretty much no downside when it is done right.

--lm

From: Gustaf N. <ne...@wu...> - 2010-02-09 17:19:26

On 09.02.10 02:47, Donal K. Fellows wrote:
> On 08/02/2010 18:45, Eric Melski wrote:
>> That's very impressive, but with only three hash entries you can hardly say that's a representative test case.
> ...
> FWIW, with a focussed test that *just* does comparison of the hashing speed (see attached; the contents of the two hashing functions were copied directly from HashStringKey in Tcl 8.5 and the HEAD) we find that on x86 the old algorithm is indeed faster.

Indeed, the attached program shows me that the old hash function is consistently faster than FNV (when compiled with -Os like Tcl, intel mac, 64bit, 10.6.2).

    old: 2137177 us (1517306493d)
    new: 2296518 us (3684793705d)

(I just added an outer loop with 500 iterations to obtain more precise timings.)

By using the function in tcl 8.5.7 for HashStringKey(), I could not get the 15% speedup, but instead see a slowdown for the first test.

    proc foo {} {set x(foobar) [set x(barfoo) [set x(ballyhoo) 1]];return}
    puts [time {time { foo } 10000} 1000]
    proc foo {} {set x [set y [set z 1]]; return}
    puts [time {time { foo } 10000} 1000]

    # classic
    # 9105.318082 microseconds per iteration
    # 4686.419043 microseconds per iteration
    # fnv
    # 9367.280143 microseconds per iteration
    # 4684.1961710000005 microseconds per iteration

Btw, the old function could be made slightly faster by avoiding one incr operation:

    for (c = *string ; c ; c = *++string) {
        result += (result<<3) + c;
    }

The output of Donal showing the buckets certainly indicates that FNV leads to fewer collisions. However, in my experience we have in tcl many hash tables with only a few items. I would not be surprised if most programs run rather slower with FNV than with the old tcl hash function.

Maybe some handcoded inline assembly could speed up the function.

-gn

From: Donal K. F. <don...@ma...> - 2010-02-10 01:21:48
Attachments:
donal_k_fellows.vcf

On 09/02/2010 16:15, Gustaf Neumann wrote:
> The output of Donal showing the buckets certainly indicates that FNV leads to fewer collisions. However, in my experience we have in tcl many hash tables with only a few items. I would not be surprised if most programs run rather slower with FNV than with the old tcl hash function.

It depends critically on the operation mix, what keys are being used, etc., and it's hard to test in situ because it's typically used in conjunction with other (much more expensive) operations like memory allocation. Given that, keeping bucket chains short is probably the best metric of all.

For truly small tables, a Lisp-style alist would actually be cheaper. Not that they scale...

> Maybe some handcoded inline assembly could speed up the function.

Maybe, but then we'd have to maintain it on all our supported platforms. :-( The only time we've ever used assembly in Tcl's implementation was because there was *no* other way to do it (and was for dealing with particularly odd parts of Windows on x86 only).

Donal.

From: Lars H. <Lar...@re...> - 2010-02-14 22:56:15

Donal K. Fellows wrote:
> On 09/02/2010 16:15, Gustaf Neumann wrote:
>> The output of Donal showing the buckets certainly indicates that FNV leads to fewer collisions. However, in my experience we have in tcl many hash tables with only a few items. I would not be surprised if most programs run rather slower with FNV than with the old tcl hash function.
>
> It depends critically on the operation mix, what keys are being used, etc., and it's hard to test in situ because it's typically used in conjunction with other (much more expensive) operations like memory allocation. Given that, keeping bucket chains short is probably the best metric of all.

Indeed. That's what can cause a difference in the asymptotic complexity of these algorithms.

> For truly small tables, a Lisp-style alist would actually be cheaper. Not that they scale...

That is exactly what is in the buckets, is it not? Raises the question of whether it might be a good idea to have hash tables start out with exactly one bucket and skip computing hash keys until the first time the table is rebuilt. (This adds a small constant time to all accesses, and drops one larger constant and one key-size-proportional term for accesses in small tables.) It's probably not possible as a change under-the-hood of Tcl8's hash tables (e.g. hash entries cache hash keys, if I read the code correctly), but might be of interest to those who consider alternatives for e.g. dicts or Tcl9.

>> Maybe some handcoded inline assembly could speed up the function.

An interesting possibility in that area might be to use something like the BSWAP instruction (http://www1.arl.wustl.edu/~lockwood/class/cs306/books/artofasm/Chapter_6/CH06-1.html#HEADING1-291) instead of a multiplication, on every second iteration of the loop (I'm imagining a "Duff's device" style loop unrolling). Since this switches what is most and least significant in the hash word, it would have the effect of making the remaining multiplications mix bits "downwards" as well as "upwards".

> Maybe, but then we'd have to maintain it on all our supported platforms. :-( The only time we've ever used assembly in Tcl's implementation was because there was *no* other way to do it (and was for dealing with particularly odd parts of Windows on x86 only).

Well, it's not technically necessary to use the same hash function on all platforms, is it? On the other hand, I agree it's probably not significant enough to bother with doing anyway, once the more obvious flaws of the current hash function have been dealt with.

One very glaring way of looking at these flaws is that the old hash function only distinguishes between 2551 different two-byte sequences, despite there being 65536 of them; this is less than 4%. Conversely, for *any* pair of neighbouring bytes of a key, there are on average 24 other pairs of bytes that could be substituted without changing the value of the old hash function. Yet another way of looking at it (though probably hard to formalise): the old hash function only keeps about 3.3 bits of "randomness" for every 8 bits of key fed in to it!

Gustaf Neumann wrote:
> - At least on x86 Intel, gcc compiles the classical Tcl string hash function pretty well. The loop body of the hash function, "result += (result<<3) + c;", results in just two machine instructions (one lea + one add). This is hard to beat by assembly coding (if someone is interested, I have an apparently slightly faster inline assembly version for x86 64bit). So, the classical Tcl hash function appears well suited for modern compilers, and gcc does very well here (I used gcc 4.2.1). FNV needs the slower integer multiply; that's why FNV is slower (measured by Donal's hashes.c test).

That the old function multiplies by such a low number as 9 is why it's not so good already at distinguishing between two-byte keys; the 2551 above arises as (255*9+255) - (0*9+0) + 1 -- maximal contribution from two bytes minus minimal contribution from two bytes plus 1. Multiplying by 1 instead (i.e., dropping that result<<3 term) would probably make it even faster, but also an even poorer hash. ;-)

Lars Hellström

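
(Lars's 2551 figure is easy to check mechanically; a tiny sketch that enumerates every ordered pair of byte values and counts the distinct contributions 9*b1 + b2 that such a pair makes to the old hash when starting from a zero accumulator:)

    #include <stdio.h>

    int
    main(void)
    {
        /* 9*b1 + b2 for b1, b2 in 0..255 can only take values 0..2550. */
        static unsigned char seen[9 * 255 + 255 + 1];
        int b1, b2, distinct = 0;

        for (b1 = 0; b1 <= 255; b1++) {
            for (b2 = 0; b2 <= 255; b2++) {
                if (!seen[9 * b1 + b2]) {
                    seen[9 * b1 + b2] = 1;
                    distinct++;
                }
            }
        }
        /* Prints "2551 of 65536", i.e. under 4%, matching the text above. */
        printf("distinct two-byte contributions: %d of %d\n",
               distinct, 256 * 256);
        return 0;
    }
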
From: Larry M. <lm...@bi...> - 2010-02-15 03:20:43

> That is exactly what is in the buckets, is it not? Raises the question of whether it might be a good idea to have hash tables start out with exactly one bucket and skip computing hash keys until the first time the table is rebuilt. (This adds a small constant time to all accesses, and drops one larger constant and one key-size-proportional term for accesses in small tables.) It's probably not possible as a change under-the-hood of Tcl8's hash tables (e.g. hash entries cache hash keys, if I read the code correctly), but might be of interest to those who consider alternatives for e.g. dicts or Tcl9.

We have a hash implementation that I and another guy built at SGI based on Oz Yigit's sdbm code. It's more or less dbm(3) compatible, has in-memory and mmap-based on-disk versions, and scales down somewhat. I believe that you can get the index and data in 128 bytes (it doesn't matter if you are 32 or 64 bits), but the key/data are limited to 16 bits (64K) in size. You can compile it to use 32 bit key/value sizes but that makes the memory footprint bigger.

I can make a shar file of this code if anyone wants to play with it to see if it would be useful for tcl. The actual hashing function used is pluggable, it doesn't care. It comes with a handful of them but you could plug the tcl one (new or old) in if you liked.

--
--- Larry McVoy                lm at bitmover.com                http://www.bitkeeper.com

From: John O. <ou...@pa...> - 2010-02-15 04:57:32

> One very glaring way of looking at these flaws is that the old hash function only distinguishes between 2551 different two-byte sequences, despite there being 65536 of them; this is less than 4%. Conversely, for *any* pair of neighbouring bytes of a key, there are on average 24 other pairs of bytes that could be substituted without changing the value of the old hash function. Yet another way of looking at it (though probably hard to formalise): the old hash function only keeps about 3.3 bits of "randomness" for every 8 bits of key fed in to it!

This is true. I knew this at the time, and it was an intentional (and actually correct!) choice. It turns out that typical strings don't actually have 8 bits of information per byte: for example, most strings that get hashed either consist of purely alphabetic stuff or sequences of digits. Even in the case of alphabetic strings, some characters are much more likely than others, so the space of strings doesn't even grow by a factor of 26 with each additional character that's added. For example, how many 2-character words and numbers are there in the English language? I'll bet it's a lot less than 2551.

Suppose instead of the current calculation "result = result*9 + c" we used "result = result*256 + c". The problem with this is that result grows way too quickly, so after processing just a few characters the first character in the string has been shifted into the high-order bits of result. We only use the low-order bits as the hash bucket index, so this means that only the last few characters of the string influence that index. As a result, there isn't enough entropy/information in the low-order bits that are actually used, so more hash collisions result. Ideally we'd like the magnitude of result to grow at the same rate as the overall table size, so that the useful information in the hash value falls into the bits that we actually use. I picked the constant 9 after a lot of experimentation with different hash values. Constants that are either larger or smaller cause worse hash table performance.

It's possible that the proposed new algorithm is better than the original Tcl one; it looks like it might use a more complex approach that tends to spread the information from each character more quickly across all of the bits of the hash value, whereas the multiply-by-9 approach only gradually does this.

However, BE VERY CAREFUL ABOUT HASH FUNCTIONS. Picking a good hash function seems simple but is actually very complex, with lots of subtle gotchas. Many people think they know good functions, but most of them don't actually behave very well and few people do extensive tests on data that is representative of how the hash functions will actually be used. Hash functions that work well on some kinds of data may not work well on other kinds.

I got started in this because I wrote an application using an "off-the-shelf" hash function and discovered that the application performed very poorly at large scale. I eventually discovered that it was only using a small portion of the hash bucket array because the hash function did not successfully spread the hash values across the entire array; there were so many bucket collisions that the entire application performed badly. I ended up with the current function after a lot of experiments.

So, if you decide to change, make sure you measure very carefully on representative datasets, and make sure you've studied the properties of hashing functions enough to understand exactly how yours is working and why it is better.

-John-

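
(A small illustration of John's low-order-bit point: with a power-of-two multiplier such as 256, every earlier character is eventually shifted above the bits that a small bucket mask looks at, whereas a multiplier of 9 keeps folding earlier characters back into the low bits. The 1024-bucket mask below is assumed purely for demonstration.)

    #include <stdio.h>

    #define MASK 1023u   /* pretend the table has 1024 buckets */

    static unsigned int
    HashWithMultiplier(const char *s, unsigned int mult)
    {
        const unsigned char *p = (const unsigned char *) s;
        unsigned int result = 0;

        while (*p) {
            result = result * mult + *p++;
        }
        return result;
    }

    int
    main(void)
    {
        /* Keys that differ only in their first character. */
        const char *keys[] = { "Aabc", "Babc", "Cabc", "Dabc" };
        int i;

        for (i = 0; i < 4; i++) {
            printf("%s: *256 -> bucket %4u, *9 -> bucket %4u\n", keys[i],
                   HashWithMultiplier(keys[i], 256) & MASK,
                   HashWithMultiplier(keys[i], 9) & MASK);
        }
        /* With multiplier 256 all four keys land in the same bucket: by the
         * time the last character is folded in, the leading character has
         * been shifted entirely above the masked bits.  With multiplier 9
         * the four keys spread across different buckets. */
        return 0;
    }
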
From: Donal K. F. <don...@ma...> - 2010-02-15 15:14:30
Attachments:
donal_k_fellows.vcf

On 15/02/2010 04:56, John Ousterhout wrote:
> It's possible that the proposed new algorithm is better than the original Tcl one; it looks like it might use a more complex approach that tends to spread the information from each character more quickly across all of the bits of the hash value, whereas the multiply-by-9 approach only gradually does this.

The original Tcl algorithm is not bad for most keys. It both keeps bits low down and spreads them up, and it uses operations that don't lose too much information (i.e., multiplication and addition). The FNV algorithm is remarkably similar in structure (so advantageous in terms of ease of transition) but uses a different multiplier that spreads the information more thoroughly and uses xor rather than addition (though we all know those are actually closely related operations).

The main issue with the old hash algorithm was that it was too easy to attack it and turn hash table accesses from O(1) to O(n**2). A literature search (not very easy; this is not a heavily studied area) indicates that the FNV hash is quite a bit more resistant; attacks would require using much longer keys and would be more likely to hit some other limit.

The other advantage of the FNV algorithm is that it is well-known, reasonably well-studied, and does a reasonably good job with producing keys for the space of "short words"; I'm not really the guy saying that it is quite good, a number of others are (URLs elsewhere in this thread). I don't know of anyone other than Tcl using the old hash algorithm (though related variants have been studied and found lacking).

I'm tempted by Bob Jenkins's lookup3 hash, but it's big, complex, and really wants to know the length of the string first so implementing it efficiently for string keys is awkward[*]. We could use it for Tcl_Obj keys, but I'm currently keener on using the same algorithm for both types of user-visible keys. (I'm also a bit hesitant because of the complexity of lookup3; it's non-trivial to code.)

[Good advice elided]
> So, if you decide to change, make sure you measure very carefully on representative datasets, and make sure you've studied the properties of hashing functions enough to understand exactly how yours is working and why it is better.

On non-malicious datasets, Tcl's original algorithm works pretty well. It's the case when someone's deliberately trying to provoke bad behaviour that concerned me. FNV seems to do better (and has reasonable distribution behaviour on the non-malicious datasets I've tried it with, such as the contents of /usr/share/dict/words and the first million cardinal numbers).

As a side note, changing the hash has flushed out a small number of places in Tcl where we had code that actually did depend on the order of hashing. These places were bugs waiting to happen...

Donal.

[* Maintaining binary and source compatibility was a cast-iron requirement for this change, so storing the length of a string key was not possible; there's nowhere in a Tcl_HashEntry for it to go and offsets of fields are baked into user code because of macros. ]

From: Larry M. <lm...@bi...> - 2010-02-15 16:16:44

On Mon, Feb 15, 2010 at 03:14:07PM +0000, Donal K. Fellows wrote:
> The main issue with the old hash algorithm was that it was too easy to attack it and turn hash table accesses from O(1) to O(n**2).

So there must be bug reports or feature requests from users complaining about this. Could we see those?

> On non-malicious datasets, Tcl's original algorithm works pretty well. It's the case when someone's deliberately trying to provoke bad behaviour that concerned me. FNV seems to do better (and has reasonable distribution behaviour on the non-malicious datasets I've tried it with, such as the contents of /usr/share/dict/words and the first million cardinal numbers).

What's wrong with letting people who are trying to break the system suffer poor performance?

You've got working code, well thought out, field tested for a couple of decades, no complaints that I have seen, and you want to change it. That's fine but there should be a compelling case for doing so. In the absence of one any sane group of engineers would not change something that works just because they think the new one might be better. Experience has shown all of us that every change has some unexpected costs.

--
--- Larry McVoy                lm at bitmover.com                http://www.bitkeeper.com

From: Tom J. <tom...@gm...> - 2010-02-15 20:55:39

Hey,

As far as I can tell this entire operation and the logic used to justify it is totally messed up. Unless I'm missing some additional code, one of the basic justifications for this change is completely wrong. That justification is the wiki entry which shows FNV somehow creates a hash key distribution that outsmarts the test program "collide".

From my reading of the collide program it was designed to find collisions. Unfortunately the code seems to find X candidates which have the same hash value and then create an array. Guess what! This trick works for any hash function. As far as I can tell it works for the new FNV hash, and also for the djb hash, which I tried. Then I made changes to the "collide" procedure. Apparently the result is very sensitive to the data pre-pended to the string to be hashed. Regardless of these results, the best distribution seems to be achieved with the djb hash:

    result = 5381;
    for (c=*string++ ; c ; c=*string++) {
        result = ( ( result<<5 ) + result ) ^ c;
    }

More interesting to me is why the similar function in tclLiteral.c was not changed:

    line 914:  result += (result<<3) + bytes[i];

But many words have been thrown around about cryptography, safety, whatever. This really makes no sense. The structure of the hash, which is weighted toward low memory usage, is designed to open the door to many collisions. The whole design creates the potential for many collisions. You could compare this to a secure hash which maps into an enormous keyspace. The measure of security is based upon the difficulty of finding a collision with a given string.

So basically all the pro-change arguments are based upon un-shown work. My tests show no difference when the collide proc is used as given, but if you change the collide proc, the current hash seems to work very well. For instance, change the collide proc to this:

    proc collide {level {accum ""}} {
        global a
        if {$level} {
            incr level -1
            collide $level "jb$accum"
            collide $level "ba$accum"
        } else {
            set a($accum) 1
        }
    }

One interesting change is that the run time goes way down. Maybe there is something special about the original proc. This special feature does not reduce the security of the hash; for that you need a generic method to find a collision with a particular string.

My recommendation: demonstrate a real vulnerability to the original hash before introducing a fix. No vulnerability has been demonstrated.

Basically a cryptographic hash maps an infinite keyspace into a fixed, but very large hashspace. If the hashspace is scaled to the number of used keys, you can imagine that the cryptographic function is lost, assuming it ever existed. Cryptography is the modern equivalent of a leprechaun tying ribbons around every tree in order to disguise the tree hiding the treasure. The scaled hash functions tend to cut down most of the trees, making the treasure much easier to find.

Regardless of any problems with the current hash, why does anyone care if some idiot chooses an array to store HTTP headers? Should the Tcl language protect everyone because they don't realize they need to understand that user data needs to be validated, or at least not trusted? Of course I'm on record as being opposed to using an array to handle HTTP headers; only the hard headed would continue to use arrays to represent HTTP headers, or create any local variables automatically. Instead, what you do is to create a representation of the user data and then query it using known keys. But of course the problem is that the core developers want everything: unconscious security.

Many years ago I learned a simple lesson: everyone that came before me knew what they were doing. If I intend to change it, I must first understand the choices they made. I don't see any understanding here.

tom jackson

From: Donal K. F. <don...@ma...> - 2010-02-16 11:57:05
Attachments:
donal_k_fellows.vcf

On 15/02/2010 20:55, Tom Jackson wrote:
> More interesting to me is why the similar function in tclLiteral.c was not changed:
>
>     line 914:  result += (result<<3) + bytes[i];

Oh dear, that was an oversight. It will be fixed very soon. :-)

> So basically all the pro-change arguments are based upon un-shown work. My tests show no difference when the collide proc is used as given, but if you change the collide proc, the current hash seems to work very well.

What's the point of that? With the collision demonstrator (which is published on the Wiki) you have a simple way of generating problematic keys. By "problematic" I mean "large numbers of keys with the same hashcode". To argue that changing the collide procedure gets rid of the problem is missing the whole damn point! We're not testing the collision generator. We're testing the hash function. What I would be interested in is a demonstration of an equivalent collision generator for FNV without requiring rapid growth of key sizes (as those would force the attacker to do more work).

> One interesting change is that the run time goes way down. Maybe there is something special about the original proc. This special feature does not reduce the security of the hash; for that you need a generic method to find a collision with a particular string.

You're mistaken. This is not a strict security issue because we never assume that two different keys have a different hashcode. What it is is a potentially-crippling performance attack, since it lets anyone who can control what the keys are make as many keys with the same hashcode as they want (hence they always end up in the same hash bucket even when we resize, leading to O(n**2) performance instead of O(1)). The collide procedure demonstrates the core of how to exploit this problem; all that's then needed is code that puts the keys in an array or dictionary (which is the natural way to write a handler for an http request in tclhttpd or Wub) and you've got a real problem. Yes, only a Denial-of-Service attack, but those matter.

> My recommendation: demonstrate a real vulnerability to the original hash before introducing a fix. No vulnerability has been demonstrated.

Do you accept the argument above?

> Regardless of any problems with the current hash, why does anyone care if some idiot chooses an array to store HTTP headers? Should the Tcl language protect everyone because they don't realize they need to understand that user data needs to be validated, or at least not trusted?

WTF are you talking about? I take it you seem to think that validating is done before the initial parse of the values into a data structure, rather than afterwards? Interesting concept. I'm not aware of any non-trivial web server that does that.

> Of course I'm on record as being opposed to using an array to handle HTTP headers; only the hard headed would continue to use arrays to represent HTTP headers, or create any local variables automatically. Instead, what you do is to create a representation of the user data and then query it using known keys.

Oh, we're supposed to use leprechaun juice to hold the representation?

> But of course the problem is that the core developers want everything: unconscious security.

All we're looking for is a way to generate hashcodes for arbitrary "short" values (short is probably "no more than 80 characters" really) that are unlikely to be the same. We can't (and never could) guarantee that there will be no clashes, but we do want it to be difficult to generate arbitrarily many clashes because that enables DoS attacks.

> Many years ago I learned a simple lesson: everyone that came before me knew what they were doing. If I intend to change it, I must first understand the choices they made. I don't see any understanding here.

I certainly see no understanding in your message. You're mixing up different hashing schemes and different types of hashes and different reasons for hashing and, well, the result is you're way off base.

Donal.

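
(To make the mechanism concrete without reproducing the wiki script: under the old hash, any two prefixes with equal hash values give equal hashes for every common suffix, so a single colliding two-character pair yields 2**N colliding keys of length 2N. The pair "aj"/"ba" is such a pair, since 9*'a'+'j' = 9*'b'+'a' = 979, while the pair "jb"/"ba" used in the modified proc quoted earlier is not, which is why swapping the pair makes the apparent problem vanish. The sketch below assumes 32-bit FNV-1 for the new hash; the keys actually generated by the wiki script should be checked against a real Tcl build.)

    #include <stdio.h>
    #include <string.h>

    static unsigned int OldHash(const char *s) {
        const unsigned char *p = (const unsigned char *) s;
        unsigned int r = 0;
        while (*p) { r += (r << 3) + *p++; }
        return r;
    }

    static unsigned int FnvHash(const char *s) {
        const unsigned char *p = (const unsigned char *) s;
        unsigned int h = 0x811c9dc5u;
        while (*p) { h = (h * 0x01000193u) ^ *p++; }
        return h;
    }

    /* Build all 2^levels keys made from two two-character fragments, the way
     * the collide proc does, and count how many distinct hash values result. */
    static void Report(const char *p0, const char *p1, int levels) {
        static char keys[1 << 10][32];
        int n = 1 << levels, i, j, oldDistinct = 0, fnvDistinct = 0;

        for (i = 0; i < n; i++) {
            keys[i][0] = '\0';
            for (j = 0; j < levels; j++) {
                strcat(keys[i], ((i >> j) & 1) ? p1 : p0);
            }
        }
        for (i = 0; i < n; i++) {       /* slow pairwise check; n is small */
            int dupOld = 0, dupFnv = 0;
            for (j = 0; j < i; j++) {
                if (OldHash(keys[j]) == OldHash(keys[i])) dupOld = 1;
                if (FnvHash(keys[j]) == FnvHash(keys[i])) dupFnv = 1;
            }
            oldDistinct += !dupOld;
            fnvDistinct += !dupFnv;
        }
        printf("pair \"%s\"/\"%s\": %d keys, %d distinct old hashes, "
               "%d distinct FNV hashes\n", p0, p1, n, oldDistinct, fnvDistinct);
    }

    int main(void) {
        Report("aj", "ba", 10);  /* colliding pair: one old hash for all keys */
        Report("jb", "ba", 10);  /* modified pair: no unusual collisions expected */
        return 0;
    }
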
From: Colin M. <co...@ch...> - 2010-02-16 13:44:42

Donal K. Fellows wrote:
> On 15/02/2010 20:55, Tom Jackson wrote:
>
>> Regardless of any problems with the current hash, why does anyone care if some idiot chooses an array to store HTTP headers? Should the Tcl language protect everyone because they don't realize they need to understand that user data needs to be validated, or at least not trusted?
>
> WTF are you talking about? I take it you seem to think that validating is done before the initial parse of the values into a data structure, rather than afterwards? Interesting concept. I'm not aware of any non-trivial web server that does that.
>
>> Of course I'm on record as being opposed to using an array to handle HTTP headers; only the hard headed would continue to use arrays to represent HTTP headers, or create any local variables automatically. Instead, what you do is to create a representation of the user data and then query it using known keys.

The HTTP spec permits a client to send headers with essentially arbitrary names. Anything which matches X-* is a valid HTTP header, according to the 1.0 and 1.1 specs. I presume it's possible to craft a series of such headers which would exploit an easily exploitable hash. Validation of individual headers, then, isn't really a viable approach. One could work around the possibility of a spammy malicious client sending too much of this stuff; pragmatically one would limit the size of headers.

I doubt whether the issue is that HTTP servers using hashed structures can be made to degenerate. I think it's more that any use of hashed structures can be thus subverted. HTTP servers are probably only an obvious example where a dict or array is exploitable.

BTW, I haven't read the record Tom is on, decrying the use of arrays to store HTTP headers. I think Donal's right, though - they're the obvious data type, and no other data type seems to me to have the desired properties.

Colin.

From: Colin M. <co...@ch...> - 2010-02-16 13:16:24

John Ousterhout wrote:
>> One very glaring way of looking at these flaws is that the old hash function only distinguishes between 2551 different two-byte sequences, despite there being 65536 of them; this is less than 4%. Conversely, for *any* pair of neighbouring bytes of a key, there are on average 24 other pairs of bytes that could be substituted without changing the value of the old hash function. Yet another way of looking at it (though probably hard to formalise): the old hash function only keeps about 3.3 bits of "randomness" for every 8 bits of key fed in to it!
>
> This is true. I knew this at the time, and it was an intentional (and actually correct!) choice. It turns out that typical strings don't actually have 8 bits of information per byte: for example, most strings that get hashed either consist of purely alphabetic stuff or sequences of digits. Even in the case of alphabetic strings, some characters are much more likely than others, so the space of strings doesn't even grow by a factor of 26 with each additional character that's added. For example, how many 2-character words and numbers are there in the English language? I'll bet it's a lot less than 2551.

If there is less entropy in a set of English words than in the set of extended-ASCII characters necessary to encode those words, then one would expect less entropy in the output than in the output of a random bunch of binary. That's not a virtue, it's the law. If, however, there is exploitable entropy in the domain being hashed, it's no virtue of the hash function that it discards it.

I would also like to point out that the set of identifiers (being that they are not English words, but often contractions and concatenations and mungings of them) is not the same as the set of English words ... is it? Someone should do the old ETAOIN SHRDLU comparison over them.

Colin.

From: Donal K. F. <don...@ma...> - 2010-02-15 22:02:16
Attachments:
donal_k_fellows.vcf
|
On 14/02/2010 22:56, Lars Hellström wrote: >> For truly small tables, a Lisp-style alist would actually be cheaper. >> Not that they scale... > > That is exactly what is in the buckets, is it not? Close, but not quite. Right now, we also keep the full hashcode so that we can do a cheap pre-check on the equality before going to the full equality test. Apart from that (and some housekeeping info), a linked list of key-value pairs is exactly what we've got. The net effect is that on any lookup (read or update) the hash function is only invoked once, and it's not invoked at all on rebuild. > Raises the question > of whether it might be a good idea to have hash tables start out with > exactly one bucket and skip computing hash keys until the first time > the table is rebuilt. (This adds a small constant time to all accesses, > and drops one larger constant and one key-size-proportional term for > accesses in small tables.) But it complicates the code further and would require a different default hash table size. Right now, we start with 4 cells (preallocated in the hash table structure) so that we minimize the number of mallocs for small tables. > It's probably not possible as a change > under-the-hood of Tcl8's hash tables (e.g. hash entries cache hash > keys, if I read the code correctly), but might be of interest to those > who consider alternatives for e.g. dicts or Tcl9. Further afield (9?) I want to do arrays with things other than hash tables backing them up. Vectors (as in BLT) and databases (as in metakit) are particularly interesting cases. (The main problem with this is working out what to do about traces, but that's a story for another time.) Once that mechanism is done, I'm sure we could try using balanced trees or something like that for array storage. However, I'm not at all convinced that the gain will be worth it for most applications. Let the code prove itself on its own merit when that happens! FWIW, the one thing that I wish had been more clearly enunciated in my undergraduate data structures course (long time ago now!) was just how effective hash tables really are. I suspect the lecturer just didn't believe in them, and so focused on tree balancing and things like that. But hashes are *so* good in practice and, as long as you have a good enough hash function, are easy to implement correctly too. > One very glaring way of looking at these flaws is that the old hash > function only distinguishes between 2551 different two-byte sequences, > despite there being 65536 of them; this is less than 4%. True, but because we apply hashes to our (pseudo-)UTF-8 representation, not all pairs are possible. (We use different hash algorithms for standard pointers and structures, and always have.) I'm not sure what the effect of this is, and I'm *definitely* not sure what effect this has on the attack-prone-ness of the FNV hash. > Conversely, > for *any* pair of neighbouring bytes of a key, there are on average 24 > other pairs of bytes that could be substituted without changing the > value of the old hash function. Yet another way of looking at it > (though probably hard to formalise): the old hash function only keeps > about 3.3 bits of "randomness" for every 8 bits of key fed in to it! I suspect that's not too bad for "nice" (i.e., normal) hash keys. It's probably not far off the entropy level of both numbers and letters in English words. 
> That the old function multiplies by such a low number as 9 is why it's
> not so good already at distinguishing between two-byte keys; the 2551
> above arises as (255*9+255) - (0*9+0) + 1 -- maximal contribution from
> two bytes minus minimal contribution from two bytes plus 1. Multiplying
> by 1 instead (i.e., dropping that result<<3 term) would probably make
> it even faster, but also an even poorer hash. ;-)

Everything's made more complex by the fact that we chop high bits off the hash value when building the table indexes. That's known (apparently) to be non-optimal for FNV. OTOH, we resize our hash tables too, and then use more bits of the hash so the bits aren't really ignored. There's not been that much study of the types of hash table that Tcl uses and what the effects of the resizing algorithm are on overall hash efficiency. (That may be because it's a sophisticated thing to be doing - a lot of hash table implementations don't bother - and complex to analyse too.) Too many things wrong that can't be fixed without breaking compatibility with lots of existing code.

Donal.
|
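[Editorial note: for readers who want to see the shapes being discussed in the message above, here is a rough sketch in C - not the actual Tcl source. It shows a multiply-by-9 string hash of the kind described (the result<<3 plus result form), an FNV-style hash using the published 32-bit offset basis and prime, and the low-bit masking used to turn a hash into a bucket index. The FNV constants are the standard published ones; everything else, including whether the core uses the FNV-1 or FNV-1a ordering of the multiply and XOR steps, is left as an assumption here.]

    /* Illustrative only -- not the code in the Tcl core. */

    /* Old style: result = result*9 + c, i.e. (result << 3) + result + c.
     * Dropping the (result << 3) term degenerates to multiplying by 1. */
    static unsigned int
    OldStyleHash(const char *s)
    {
        unsigned int result = 0;
        int c;

        while ((c = (unsigned char) *s++) != 0) {
            result += (result << 3) + (unsigned int) c;
        }
        return result;
    }

    /* FNV-1 style, with the published 32-bit offset basis and prime. */
    static unsigned int
    FnvStyleHash(const char *s)
    {
        unsigned int hash = 2166136261u;    /* FNV 32-bit offset basis */
        int c;

        while ((c = (unsigned char) *s++) != 0) {
            hash = (hash * 16777619u) ^ (unsigned int) c;  /* prime, then XOR */
        }
        return hash;
    }

    /* "Chopping off the high bits": with a power-of-two bucket count,
     * the bucket index is just the low bits of the hash. */
    static unsigned int
    BucketIndex(unsigned int hash, unsigned int numBuckets /* power of 2 */)
    {
        return hash & (numBuckets - 1);
    }

[When the table is resized to more buckets, the mask widens, so bits that were previously chopped off start to matter - which is the point made above about the high bits not really being ignored after a rebuild.]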
From: Colin M. <co...@ch...> - 2010-02-09 14:32:14
|
Larry McVoy wrote:
> I wanted to (briefly) touch on process. After a lovely discussion
> on #tcl it seems like it might be worthwhile to point out the
> value of a review process. I think that you especially may
> appreciate the benefits.

Reading your email, I'm moved to investigate the connection between 'review' and 'review process'. You argue cogently for code review, but nowhere for a code review process. You treat as synonyms what are clearly not.

Nobody would dissuade a person from reviewing open source code. No 'process' is required: you get a copy, you study it, it's reviewed. But that's not what 'review process' means.

Half of your email, Larry, extols the virtues of code review. The rest insinuates that since it's such a good thing, people should be forced to do it.

> Reviews are not for beating people up. If people need beating that
> is best done in private.
> Reviews are somewhat good at catching problems. That's what people
> think they are for.
> Reviews are for educating other people. That's what people forget.

Reviews are great, evidently, but what has this to do with process? Isn't what you call 'process' really just a nice way of saying 'obligation'?

> IMHO, if this sort of thing went through a review process

See how the word 'process' creeps back into a discussion of the merits of review, as if they were the same thing?

> - you would get more strokes for taking on this problem
> - at least the core would get educated about what you are doing
> - maybe someone would have something useful to add

That could all occur in any number of other ways - not merely from 'review', and not demonstrably from any kind of 'process'.

> I agree with Colin (or anyone) if/when they say "don't slow down the
> people who are coding good stuff". But I disagree with Colin et al
> that reviews do that.

I don't think I ever said reviews do that. I think imposed processes do that.

> Good code reviews reward the guy doing the work,
> educate the people following along, there is pretty much no downside
> when it is done right.

If there were no downside it wouldn't have to be imposed or mandated, and calling coercion 'process' doesn't make it any more palatable. Review is good when it's needed or desired, and not when it's not. And it's precisely when review is not needed or desired by those in the best position to know that 'process' reveals itself as obligation, resulting in wasted volunteer time and effort.

Bottom Line Time: the Tcl license explicitly states that there is no warranty. This strongly implies that any onus to inspect the code for quality, suitability or fitness for purpose is yours (the users'). Trying to foist that obligation onto the people who are giving you the code, as a condition of your accepting it, seems to me to be pretty grubby.

Colin.
|
From: Larry M. <lm...@bi...> - 2010-02-09 16:00:31
|
Dear Colin, Thank you for your opinion, but the mail was addressed to the core, it concerns how they do work, not how you do work, and I'm more interested in what they think. Thank you. |
From: Colin M. <co...@ch...> - 2010-02-09 23:15:15
|
Larry McVoy wrote:
> Dear Colin,
>
> Thank you for your opinion, but the mail was addressed to the core,
> it concerns how they do work

Collaboratively.
|
From: Eric M. <er...@el...> - 2010-02-09 23:35:55
Attachments:
hash_bench2.txt
hash_bench3.txt
|
000 VERSIONS: 1:8.5.8+ 2:8.5.8

[The remainder of this message is a table of 514+ benchmark timings (ARRAY, BASE64, BIN, CATCH, DATA, ENC, EVAL, EXPR, FCOPY, FILE, GCCont, HEAPSORT, IF, KLIST, LIST, LOOP, MAP, MATRIX, MD5, MTHD, NS, PARSE, PROC, RE, READ, SHA, SPLIT, STR, ...), reporting version 1 relative to version 2 normalized to 1.00. Most ratios fall within a few percent of 1.00; isolated outliers range from roughly 0.56 to 1.32. The message content is truncated in the archive.]
|
From: Alexandre F. <ale...@gm...> - 2010-02-11 00:08:08
Attachments:
bucketeer.tcl
|
Since it appears that timings are tricky, and that worst-case (including malevolent) performance is at least as interesting as the average, here is a small pure-Tcl function that computes the buckets' contents for a given array. It yields a list of {size bucket} pairs, where size == [llength bucket], sorted by decreasing size. You can verify its honesty by comparing the sizes with what [array statistics] says.

Hope this tool will let somebody discover the killing HTTP-header attack we are all dying to see.

-Alex
|