The MPHFs for CHD algorithms do not generate unique ids even though the...
I am running the CHD algorithm from the CMPH library on a set of 500 keys and then checking the id it generates for each key.
As I understand it, the ids should be unique for each key, since the algorithm generates a minimal perfect hash function. However, I observe many duplicate ids for distinct keys, even with the keys_per_bin parameter set to 1.
cmph_t* hash = NULL;
// source of keys
cmph_io_adapter_t *source = cmph_io_nlfile_adapter(fp);
cmph_config_t *config = cmph_config_new(source);
cmph_config_set_algo(config, CMPH_CHD);
cmph_config_set_keys_per_bin(config, 1);
cmph_config_set_b(config, 1);
cmph_config_set_verbosity(config,1);
hash = cmph_new(config);
cmph_config_destroy(config);
char *k = (char *)malloc(sizeof(15));
while(fgets(k, INT_MAX, read) != NULL){
string key = k;
unsigned int id = cmph_search(hash, k, (cmph_uint32)key.length());
ids<<id<<"\n";
}
For 500 keys, I get around 90-100 duplicate ids.
Is there something that I am doing wrong?
Any help will be appreciated.
Thank You.
Please mark this as resolved.
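The issue was the length argument passed to cmph_search. Each line read with fgets still ended in a newline character, so the length was one more than the real key length. The lookup was therefore done on a string that was not one of the keys the function was built from, and a minimal perfect hash function only guarantees unique ids for keys in its original set.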
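For anyone hitting the same problem, here is a minimal sketch of the fix described above. It leaves out the CMPH calls and only shows the key handling. The helper names key_length, read_keys and count_duplicates are hypothetical and not part of CMPH. Reading into a std::string with std::getline also avoids the fixed buffer in the original snippet, where malloc(sizeof(15)) allocates sizeof(int) bytes rather than 15.

```cpp
#include <cstddef>
#include <istream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: length of a line without trailing '\n' or '\r'.
// This is the length that should be passed to the lookup.
std::size_t key_length(const std::string& line) {
    std::size_t n = line.size();
    while (n > 0 && (line[n - 1] == '\n' || line[n - 1] == '\r')) {
        --n;
    }
    return n;
}

// Hypothetical helper: read one key per line with the line ending removed.
// std::getline already drops '\n'; key_length also trims a stray '\r'.
std::vector<std::string> read_keys(std::istream& in) {
    std::vector<std::string> keys;
    std::string line;
    while (std::getline(in, line)) {
        line.resize(key_length(line));
        keys.push_back(line);
    }
    return keys;
}

// Hypothetical helper: count ids that repeat an earlier id.
// For a correct MPHF queried with its own key set this should be 0.
std::size_t count_duplicates(const std::vector<unsigned int>& ids) {
    std::set<unsigned int> seen;
    std::size_t duplicates = 0;
    for (unsigned int id : ids) {
        if (!seen.insert(id).second) {
            ++duplicates;
        }
    }
    return duplicates;
}
```

With keys read this way, the lookup in the original loop would become cmph_search(hash, key.c_str(), (cmph_uint32)key.length()), and count_duplicates over the collected ids gives a quick check that every key got its own id.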