Re: [Algorithms] streams/<string> vs nasty old C style...
Brought to you by:
vexxed72
From: Neal T. <ne...@ps...> - 2000-12-01 12:12:51
|
From: Adam Moravanszky <amo...@dp...> > I also do this, but I keep the original string around so I can always check > if I really have two identical strings, or if I have a hash crash. I never > actually observed a crash happening -- and I also don't know what chance it > has of happening. This particular piece of code uses short (1 to 20 > character) strings. Does it make sense to worry about crashes? A crash in > this case wouldn't be catastrophic but it would lead to incorrect things > being displayed as long as two different strings map to the same code. Hi, This is why we're using CRCs (IEEE 1003.2 32 bit CRC generation, to be precise:-)). In theory I believe collisions are still possible, and would cause the same sort of effects you're describing, but in practice we've never seen any and a back of the envelope probability estimate seems to indicate that they're very unlikely. Having said that, I think we're currently still passing the strings through in case we end up needing them; we're just not doing any actual string based comparisons. An alternative approach would be to use a more conventional string hash key generator and then do the comparisons using an initial hash test and then a string compare whenever entries are in the same bucket. This is actually what I did for the last engine I worked on, and can be fast if the hashing function is good, but does take up memory for all the string storage and the hash table. (Dave (Hunt), please tell me if any of this post is actually completely wrong:-)) Neal Tringham (Sick Puppies / Empire Interactive) mailto: ne...@ps... ne...@em... |