12/17/2005 03:51 rmay@...
Constants and hashing [Was:
On approximately 12/17/2005 4:39 AM, came the following characters from
the keyboard of Robert May:
> So, it was my plan to do a later implementation moving the lookup back
> into XS, using a 'minimal perfect hash' and an efficient C array. The
> storage for this approach would be something like 8 bytes per constant
> identifier (string pointer + long value) plus the constant identifier
> string length (for the constant lookup array), plus the storage for the
> polynomial values used by the hash function.
Agreed, speed probably isn't the issue, and mph may or may not be
necessary (it mostly benefits speed issues). After some reflection from
our original discussions until now, I think it may be more worthwhile to
figure out how to make _all_ 18000 constants available to Win32::GUI
programs, without paying a huge price in memory. Those constants that
actually get used get cached in the stash anyway...
If speed isn't the issue then a binary search has at most 15 probes
to find any of the 18,000 entries in the table with a no time cost
expansion for another 14,000 entries. As benchmarking showed some
time back, the implemented 'hash' was better than a binary search
for some small number of searches. I would expect that the binary
search would be faster than a hash for some searches. Given a
uniform distribution of searches, the expected number of probes
is approximately 8.
Again, if speed is not a criteria and a minor speed decrease is
acceptable, then a binary search is a good way to proceed. The
only overhead are the string pointers in the table of entries.
If you want a tailored partitioned binary search then one data
structure / partition is required, the binary search function
code can be shared. This results in a decrease in the number
of probes (since the table size for each partition is smaller
than for the entire data structure), with added overhead to
determine what partition to access. And on the added overhead,
we esssentially trade time, i.e. probes, for code to determine
what partition to use, and we add some small amount of additional
complexity for multiple tables.
Since the animal breeds, I would expect that there will be
future additions and deletions of constants. Hence, a need
to address the future.
At the end, I'm not wealthy because I do a terrible job of