I use cmph to create minimal perfect hash function from integer sets (i dont need collisions). I use cmph_io_struct_vector_adapter interface for it. And it often works. But on some input data I get segmentation fault (exactly, gdb say that error in cmph_packed_size or cmph_search, but it was troublesome to build debug version of libcmph).
My sets's lengths is about 5-500 elements and speed of searching hash are very important, so CMPH_FCH algorithm provides the greatest interest, but it falls.
For creating hash function I used something like cmph_io_struct_vector_adapter(items_to_hash,sizeof(cmph_uint32),0,sizeof(cmph_uint32),items_len); and cmph_search(hash,(char*)&item_to_hash,sizeof(int)) for search.
For example, CMPH_FCH, CMPH_BDZ, CMPH_BDZ_PH falls on {7576423,7554496} set, and CMPH_CHD_PH, CMPH_CHD falls on {2184764,1882984,1170551}. I cann't find integer set for correctly work of CMPH_BRZ.
Simplified version of programm showed what I exactly do attached, libcmph version is 1.1.
So, is it a bug or is restriction? Is it possible to use libcmph to stable create hash of integer sets? or exists better way?
segfault_example
As a temporary solution can be used cycle change of graphsize parameter (from 0 to 10 with step .1 for example) with CMPH_CHM algorithm, that can be set using cmph_config_set_graphsize(config, graphsize).
Problem relates with failure of build graph algorithm.
We will have a release with all reported bugs fixed soon. If you use the CHM algorithm you will get order-preserving PHFs. However, they take much more space.
Let me know if I have not answered your questions.
Fabiano.
This will be fixed in the next release. I also want to clarify how the CMPH_BRZ algorithm is supposed to be used. The following comment has been added to src/brz.h
/*
* The BRZ algorithm has been built so to consume the bare minimum
* amount of memory to generate the MPHFs. Thereby we decided
* to dump the resulting MPHFs to disk while creating them. Thus,
* to use the BRZ algorithm, one has to call brz_config_set_mphf_fd
* before calling brz_new. Otherwise we will fail the MPHF creation.
* One side effect of this design decision is that the resulting
* MPHF cannot be used until its dumping process is finalized
* by calling brz_dump and the caller must use brz_load before
* any call to either one of the following functions is made:
* brz_search
* brz_pack
* brz_packed_size
* brz_search_packed
*/