#15 The MPHFs for CHD algorithms do not generate unique ids even though the parameters 't' and 'b' are set to 1.

Milestone: v1.0 (example)

Status: open

Owner: nobody

Labels: CHD (1) unique ids (1)

Priority: 6

Updated: 2017-01-07

Created: 2017-01-06

Creator: Deepti Sabnani

Private: No

I am running the CHD algorithm in the CMPH library on a set of 500 keys and then checking the id the algorithm generates for each key.
As I understand it, the ids should be unique for each key, since the algorithm generates a minimal perfect hash function. However, I observe many duplicate ids for distinct keys, even with the keys_per_bin and b parameters both set to 1.

cmph_t* hash = NULL;
// source of keys
cmph_io_adapter_t *source = cmph_io_nlfile_adapter(fp);
cmph_config_t *config = cmph_config_new(source);
cmph_config_set_algo(config, CMPH_CHD);
// the two parameters in question
cmph_config_set_keys_per_bin(config, 1);
cmph_config_set_b(config, 1);
cmph_config_set_verbosity(config,1);
hash = cmph_new(config);
cmph_config_destroy(config);
// look up every key and record its id
char k[256];  // line buffer; assumes each key fits in 255 characters
while (fgets(k, sizeof(k), read) != NULL) {
    string key = k;
    unsigned int id = cmph_search(hash, k, (cmph_uint32)key.length());
    ids << id << "\n";
}

For 500 keys, I get around 90-100 duplicate ids.
Is there something that I am doing wrong?
Any help will be appreciated.

Thank you.

Discussion

Deepti Sabnani - 2017-01-07

Please mark this as resolved.
The issue was caused by the length argument passed to cmph_search. Each key string read with fgets still ended in a newline character, so the length included that extra character and the lookups returned ambiguous ids.
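
For anyone hitting the same problem, here is a minimal sketch of one way to compute the key length without the line terminator. The helper name key_length is hypothetical and is not part of CMPH. It assumes the keys were hashed without their trailing newline, which appears to be what happened here given that the fix resolved the duplicates.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper (not part of CMPH): returns the length of a line
// as read by fgets, excluding any trailing "\n" or "\r\n".
static std::size_t key_length(const char *line) {
    std::size_t len = std::strlen(line);
    // Drop line terminators left in the buffer by fgets.
    while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r')) {
        --len;
    }
    return len;
}
```

In the lookup loop above, the call would then read cmph_search(hash, k, (cmph_uint32)key_length(k)), so the key bytes passed to the search match the key as it was stored when the function was built.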
