Menu

#11 hash from integers: segfault

open
nobody
None
7
2011-12-27
2011-12-26
romanoved
No

I use cmph to create minimal perfect hash function from integer sets (i dont need collisions). I use cmph_io_struct_vector_adapter interface for it. And it often works. But on some input data I get segmentation fault (exactly, gdb say that error in cmph_packed_size or cmph_search, but it was troublesome to build debug version of libcmph).
My sets's lengths is about 5-500 elements and speed of searching hash are very important, so CMPH_FCH algorithm provides the greatest interest, but it falls.
For creating hash function I used something like cmph_io_struct_vector_adapter(items_to_hash,sizeof(cmph_uint32),0,sizeof(cmph_uint32),items_len); and cmph_search(hash,(char*)&item_to_hash,sizeof(int)) for search.
For example, CMPH_FCH, CMPH_BDZ, CMPH_BDZ_PH falls on {7576423,7554496} set, and CMPH_CHD_PH, CMPH_CHD falls on {2184764,1882984,1170551}. I cann't find integer set for correctly work of CMPH_BRZ.
Simplified version of programm showed what I exactly do attached, libcmph version is 1.1.
So, is it a bug or is restriction? Is it possible to use libcmph to stable create hash of integer sets? or exists better way?

Discussion

  • romanoved

    romanoved - 2011-12-26

    segfault_example

     
  • romanoved

    romanoved - 2011-12-27
    • priority: 5 --> 7
     
  • romanoved

    romanoved - 2013-02-05

    As a temporary solution can be used cycle change of graphsize parameter (from 0 to 10 with step .1 for example) with CMPH_CHM algorithm, that can be set using cmph_config_set_graphsize(config, graphsize).
    Problem relates with failure of build graph algorithm.

     
  • Fabiano C. Botelho

    We will have a release with all reported bugs fixed soon. If you use the CHM algorithm you will get order-preserving PHFs. However, they take much more space.

    Let me know if I have not answered your questions.
    Fabiano.

     
  • Fabiano C. Botelho

    This will be fixed in the next release. I also want to clarify how the CMPH_BRZ algorithm is supposed to be used. The following comment has been added to src/brz.h

    /*
    * The BRZ algorithm has been built so to consume the bare minimum
    * amount of memory to generate the MPHFs. Thereby we decided
    * to dump the resulting MPHFs to disk while creating them. Thus,
    * to use the BRZ algorithm, one has to call brz_config_set_mphf_fd
    * before calling brz_new. Otherwise we will fail the MPHF creation.
    * One side effect of this design decision is that the resulting
    * MPHF cannot be used until its dumping process is finalized
    * by calling brz_dump and the caller must use brz_load before
    * any call to either one of the following functions is made:
    * brz_search
    * brz_pack
    * brz_packed_size
    * brz_search_packed
    */

     

Log in to post a comment.