From: Tom J. <tom...@gm...> - 2010-02-17 18:41:53
On Wed, Feb 17, 2010 at 3:56 AM, Donal K. Fellows <don...@ma...> wrote:
>
> Mod is quite an expensive operation, at least on this machine. Applying
> a mod operation (mod a large prime) to the result of the classic hash
> before returning nearly takes the cost of hashing up to equivalence with
> FNV.

My idea was to apply the mod just once (to the final hash, in the bucket
selection code). On further testing, however, the current Tcl hash
produces values which (barring a collision) already give a good bucket
distribution. You can't apply the mod inside the hash function because
the hash function has no access to the size of the hash table (the number
of buckets). Also, the modulus is not "large"; it is just a prime close
to the number of buckets. In any case, the current bucket code works very
well with the current hash function; both FNV and the DJB hash perform
worse. (Both functions are sketched in the first code fragment below.)

>> The system only needs to be broken once. All that is required is a set
>> of keys (maybe a few thousand) which hash to the same value. After
>> that the entire algorithm is broken (assuming the goal is to poison
>> the bucket code with collisions). Assuming a 32 bit key-space, it
>> should be easy to find at least one set of 10000 strings which share
>> the same hash value.
>>
>> Do I care to find such a set? NO! But this is a triviality for
>> cryptographers.
>
> Remember, cryptographers are mathematicians. They have a definition of
> "trivial" which doesn't match that used by software engineers. I can
> remember a number of acquaintances state that providing examples once an
> existence proof was done was trivial. Can't say that I really bought
> their argument... :-)

The keyspace is only 32 bits; this is trivial by any measure. I guess
Gustav has already found several sets of keys. Of course, the real
problem is that you only need the low-order bits of the hash to match,
so the effective keyspace for the buckets is much smaller than 32 bits.
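For reference, here is a minimal sketch (in C, since that is where these
hash functions live) of the two functions being compared, with bucket
selection done by mod a prime near the table size as discussed above.
The prime 1021 and the sample keys are arbitrary choices for
illustration, not anything taken from the Tcl sources:

#include <stdio.h>

/* Classic Tcl string hash: h = h*9 + c for each character. */
static unsigned int
classic_hash(const char *s)
{
    unsigned int h = 0;
    while (*s) {
        h += (h << 3) + (unsigned char) *s++;
    }
    return h;
}

/* 32-bit FNV-1a. */
static unsigned int
fnv1a_hash(const char *s)
{
    unsigned int h = 2166136261u;
    while (*s) {
        h ^= (unsigned char) *s++;
        h *= 16777619u;
    }
    return h;
}

int
main(void)
{
    const unsigned int nearPrime = 1021;  /* prime near 1024 buckets */
    const char *keys[] = { "Content-Type", "Content-Length", "Host" };
    int i;

    for (i = 0; i < 3; i++) {
        unsigned int c = classic_hash(keys[i]);
        unsigned int f = fnv1a_hash(keys[i]);
        printf("%-15s classic=%10u bucket=%4u  fnv1a=%10u bucket=%4u\n",
               keys[i], c, c % nearPrime, f, f % nearPrime);
    }
    return 0;
}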
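And a rough illustration of how small that effective keyspace is: a
brute-force loop over four-letter lowercase strings that collects keys
landing in the same bucket as "aaaa" in a hypothetical 1024-bucket table,
i.e. keys whose classic hashes agree in their low 10 bits. The mask and
the four-letter alphabet are arbitrary, chosen only to keep the search
tiny:

#include <stdio.h>

/* Same classic Tcl hash as in the previous sketch. */
static unsigned int
classic_hash(const char *s)
{
    unsigned int h = 0;
    while (*s) {
        h += (h << 3) + (unsigned char) *s++;
    }
    return h;
}

int
main(void)
{
    unsigned int target = classic_hash("aaaa") & 0x3ff;
    unsigned int n, found = 0;

    for (n = 0; n < 26u * 26u * 26u * 26u && found < 10; n++) {
        char key[5];
        unsigned int i, m = n;

        for (i = 0; i < 4; i++) {      /* decode n as a base-26 string */
            key[i] = (char) ('a' + m % 26);
            m /= 26;
        }
        key[4] = '\0';
        if ((classic_hash(key) & 0x3ff) == target) {
            printf("%s\n", key);       /* same low bits, same bucket */
            found++;
        }
    }
    return 0;
}

With only 10 bits deciding placement, about one candidate in a thousand
lands in the chosen bucket, so matches turn up almost immediately.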
> In terms of hash attacks, they only have real fangs (given that we don't
> conflate hashcode equivalence with equality) if it is possible to make
> either:
>
> a) A large number of *short* keys with the same hashcode, or
> b) A substantial number of keys with the same hashcode as some
>    particular key from a set not nominated by the attacker (typically
>    defined by the communication protocol)
>
> The aim of the first one is to swamp the hash table with totally
> irrelevant things. The aim of the second is to retard required lookups.

Case b) fits the typical definition of a collision. Of course, collisions
have nothing to do with cryptography here, since the key is always
compared anyway. Hopefully the key comparison stops on the first
mismatch, so the actual number of bits compared is related more to the
length of the common prefix shared by the keys in one bucket. But since
(again) the key is always compared, other kinds of attack would be just
as effective at consuming resources: a few very long keys which hash to
the same value, or which land in the same bucket. The communication
protocol itself will also consume resources transmitting the data, and
the hashing and storage of large quantities of data is an attack in its
own right; in fact, that attack becomes more effective the more expensive
the hash function is. Like I said earlier: the utility of cryptography
lies in the ability to transfer the expensive calculations to the
attacker. If instead you provide a platform on which the attacker can
make you perform many expensive calculations, the attacker has already
succeeded without lifting a finger.

It is much better to solve this problem at the application level: limit
the number of headers, or the length of an entire request. Otherwise the
attacker can just send millions of headers instead of thousands; what
difference does it make how performance is degraded? DOS attacks are not
sophisticated attacks. Maybe just send a very long URL and see if the
whole URL gets written to disk in the log file. Eventually the disk
fills up.

>> I know there is a huge bias against AOLserver, so I digress to Apache.
>> How does Apache handle headers?
>>
>> From http://thomas.eibner.dk/apache/table.html (Apache table API):
>>
>> "Tables in Apache works almost like hashes in Perl. The main
>> difference between the two is that you can have two of the same keys
>> in the Apache table API. The table lookups are done in a
>> case-insensitive manner."
>>
>> So for everyone who has some problem with AOLserver, maybe note that
>> headers in Apache are handled very similarly. More specifically, they
>> are not handled as associative arrays or dictionaries.
>
> Checking the source to Apache (strictly the APR) I see that they're
> using hashing inside their implementation of tables. OK, they're using a
> truly trivial hash, but I suppose it's better than what they had before
> (a single linear list).
>
> Check for yourself at:
> http://svn.apache.org/repos/asf/apr/apr/trunk/tables/apr_tables.c

Note that Apache does not modify the key, can store multiple copies of
the same key, and can search for keys in a case-insensitive manner.
Unfortunately, the case-insensitive search is limited to ASCII
characters. Headers are also stored in a linear array (though it looks
like you can actually only store two keys with the same name). The only
way to justify using an array instead of a linear list of the form
{{key value} {key value}} is if you don't think it is important to
maintain the data in the form it was sent. That is a valid argument. But
note that in order to add a key, or lappend to an existing key, you have
to search the array first; a large array will consume more resources
during creation than a list containing the same data. A rough sketch of
that difference follows below.
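To make the creation-cost point concrete, here is a miniature sketch
(again in C, and emphatically not APR's actual code) of a table-style
insert that must search the flat array before adding, next to a
list-style insert that just appends the {key value} pair. The header
names are invented:

#include <stdio.h>
#include <string.h>

struct entry { const char *key; const char *val; };

/* Table style: linear search before every insert, so building n
 * entries costs O(n^2) comparisons overall. */
static int
table_set(struct entry *t, int n, const char *key, const char *val)
{
    int i;
    for (i = 0; i < n; i++) {
        if (strcmp(t[i].key, key) == 0) {
            t[i].val = val;            /* existing key: overwrite */
            return n;
        }
    }
    t[n].key = key;                    /* new key: append */
    t[n].val = val;
    return n + 1;
}

/* List style: append unconditionally, O(1) per pair, and the data
 * keeps the exact form in which it was sent. */
static int
list_append(struct entry *l, int n, const char *key, const char *val)
{
    l[n].key = key;
    l[n].val = val;
    return n + 1;
}

int
main(void)
{
    struct entry table[4], list[4];
    int tn = 0, ln = 0;

    tn = table_set(table, tn, "Host", "example.com");
    tn = table_set(table, tn, "Accept", "*/*");
    tn = table_set(table, tn, "Accept", "text/html");  /* repeated key */

    ln = list_append(list, ln, "Host", "example.com");
    ln = list_append(list, ln, "Accept", "*/*");
    ln = list_append(list, ln, "Accept", "text/html");

    printf("table holds %d entries, list holds %d\n", tn, ln);
    return 0;
}

The table ends up with two entries (the repeated key was found and
overwritten, at the cost of the search), while the list keeps all three
pairs in the order they arrived and pays nothing until lookup time.

tom jackson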