RE: [Algorithms] Filename hash function
Brought to you by:
vexxed72
|
From: Jon W. <hp...@mi...> - 2001-11-30 17:36:03
|
We had a long thread on different hash functions in this forum a
few months ago. Try finding it in the archives.
The function you want to present can be created. The amount of data
you want to store is 36 tokens per position. Thus, you have to find
the highest number such that 36 to that number is less than or
equal to 2 to 32. As pow(2,32) is 4294967296 and pow(36,6) is
2176782336, the value for X in your case is 6 (giving you slightly
less than 1 bit to spare in a 32-bit word).
Now, that's not really useful anyway, because you can't store 32
bits worth of pointers in any table, so you have to collapse (fold)
the value; typically using modulus, bit masking, or shift-and-xor.
Thus, you will have to deal with collissions no matter what. The
idea behind hash functions is NOT to make them totally avoid
collissions (because you can't) but to make collissions evenly and
randomly distributed, when you view a large number of different
(input,output) tuples.
Cheers,
/ h+
Gareth Lewin wrote:
I was thinking about the hash function used in ( SGI's ) stl. And obviously
since the hash value is 32bit, the chances of 2 strings colliding is very
high, and from some simple tests I did it is even higher than I thought.
Then I started to think about writing a hash function for filenames, or in
other words a hash function for strings with limitations.
I want to write a hash function that is guarenteed to give me a differant
value for each string, assuming the following limitations ( for example )
Only letters(a-z) and numbers(0-9) are value.
The hash function is case-insensitive.
The length of a string is no more than X ( Need a way to work out X ).
But I have no idea how to go about this, any ideas ?
_____________________
Regards, Gareth Lewin
Lead Programmer
Climax Development
Fareham Heights
Standard Way
Fareham
P016 8XT
|