Menu

New algorithm? Recursive split

2014-08-14
2016-11-03
  • Thomas Mueller

    Thomas Mueller - 2014-08-14

    I have implemented a perfect hash / minimal perfect hash algorithm,
    and I'm not quite sure if this is a new or an existing one. I didn't
    find anything that matches this algorithm, but maybe I'm wrong.

    The implementation is here (Java):
    https://code.google.com/p/h2database/source/browse/trunk/h2/src/tools/org/h2/dev/hash/PerfectHash.java

    It needs about 1.4 bits per key, and the resulting hash table is about
    79% full. The minimal perfect hash function needs about 2.3 bits per key.
    Generating the hash function takes about 1 second per million keys (linear)
    for both perfect hash and minimal perfect hash.

    The algorithm is recursive: sets that contain no or only one entry are not
    processed as no conflicts are possible. Sets that contain between 2 and 16
    buckets, up to 16 hash functions are tested to check if they can store the
    data without conflict. If no function was found, the same is tested on a
    larger bucket (except for the minimal perfect hash). If no hash function was
    found, and for larger buckets, the bucket is split into a number of smaller
    buckets (up to 32).

    At the end of the generation process, the data is compressed using a general
    purpose compression tool (Deflate / Huffman coding). The uncompressed data is
    around 1.52 bits per key (perfect hash) and 3.72 (minimal perfect hash).

     
  • Thomas Mueller

    Thomas Mueller - 2016-11-03

    FYI the code has been moved to https://github.com/thomasmueller/minperf. Now the minimal perfect hash takes less space (down to about 1.52 bits/key).

     

    Last edit: Thomas Mueller 2016-11-03

Log in to post a comment.