[Libclc-developers] Re: hashing
Status: Planning
Brought to you by:
augestad
|
From: Michael B.A. <mb...@io...> - 2003-03-19 21:46:00
|
On Wed, 19 Mar 2003 21:28:07 +0100 Thomas Stegen <ts...@ci...> wrote: > Michael B.Allen wrote: > > > > It's not entirely obvious to me why the comparator is really necessay. If > > the user supplies a hash function you can just compare hashes yes? Yes, > > there could be a collision but with a suitible hash function the chances > > of that are roughly 1 in ULONG_MAX which on my machine appears to be 1 > > in 4,294,967,295. > > That is a small chance. But can you make a hash function which produces > hashes over the whole range? I don't think you can. And if a collision That's why you need a suitable hash function for the data in question. > does happen it is going to be a rather subtle bug I would imagine. And True. > if the table has less than in 4,294,967,295 buckets then the chance of > collision increases. And equality can mean different things to different Well I don't mean bucket collisions. You can have more than one datum in a bucket. The only way to loose a datum is if the hashes match exactly and not just hash to the same bucket. So provided the hashing function gave you a remotely even distribution the chances of loosing a datum is really 1 in ULONG_MAX. Also I think as the hashtable gets *smaller* the chances of such a collision are less because the divide by size crontibutes to the hashing. > people. Same value, same object, matching only part of a string. Again a suitable hash function is required. You might provide a generic string hashing function though. Incedentally here's a nice page with a bit on hashing: http://burtleburtle.net/bob/ > In short, I think a compare function will improve the map/hashtable > and make it useful in more situations. I don't see how a comparator would "make it more useful in more situations" but it would eliminate the miniscule uncertaintly I described. I suppose you're right though. As slim as it is that there would be a collision resulting in a datum being removed I think it would be technically incorrect not to have it. Mike -- A program should be written to model the concepts of the task it performs rather than the physical world or a process because this maximizes the potential for it to be applied to tasks that are conceptually similar and, more important, to tasks that have not yet been conceived. |