The MPHFs for CHD algorithms do not generate unique ids even though the...
I am running the CHD algorithm from the CMPH library on a set of 500 keys and then checking the id the algorithm generates for each key.
As I understand it, the ids should be unique for each key, because the algorithm generates a minimal perfect hash function. However, I observe many duplicate ids for distinct keys. I also set the keys_per_bin and b parameters to 1.
    cmph_t* hash = NULL;
    // source of keys
    cmph_io_adapter_t *source = cmph_io_nlfile_adapter(fp);
    cmph_config_t *config = cmph_config_new(source);
    cmph_config_set_algo(config, CMPH_CHD);
    cmph_config_set_keys_per_bin(config, 1);
    cmph_config_set_b(config, 1);
    cmph_config_set_verbosity(config,1);
    hash = cmph_new(config);
    cmph_config_destroy(config);
    char *k = (char *)malloc(sizeof(15));
    while(fgets(k, INT_MAX, read) != NULL){
        string key = k;
        unsigned int id = cmph_search(hash, k, (cmph_uint32)key.length());
        ids<<id<<"\n";
    }
For 500 keys, I get around 90-100 duplicate ids.
Is there something that I am doing wrong?
Any help will be appreciated.
Thank You.
Please mark this as resolved.
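The issue was the length argument passed to the search function. fgets keeps the trailing newline character, so key.length() was one larger than the length of the key the function was built from. The lookup was therefore done on a string that is not in the key set, and for such a string a minimal perfect hash function can return any id in range, which is why duplicates appeared.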
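For anyone hitting the same problem, here is a minimal sketch of the fix, assuming the keys are read line by line as in the question. The helper name strip_line_ending is hypothetical and not part of CMPH; it only removes the line ending that fgets leaves in the buffer, so that the length passed to cmph_search matches the key without its newline.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of CMPH): drop a trailing "\n" or "\r\n"
// left in the buffer by fgets, so the key and its length match the
// newline-free key that the hash function was built from.
std::string strip_line_ending(std::string key) {
    while (!key.empty() && (key.back() == '\n' || key.back() == '\r')) {
        key.pop_back();
    }
    return key;
}

// Intended use inside the lookup loop from the question:
//   std::string key = strip_line_ending(k);
//   unsigned int id = cmph_search(hash, key.c_str(),
//                                 (cmph_uint32)key.length());
```

Reading the keys with std::getline instead of fgets would also work, since it discards the newline. It would also avoid the fixed-size buffer: malloc(sizeof(15)) in the question allocates only sizeof(int) bytes.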