stxxl::sort gives segmentation fault

  • Hi.

    Somehow I seem to make the sorting algorithm stxxl::sort fail miserably - giving me a segmentation fault.

    I'm sorting elements looking like this:
    struct trie {
      trie() {
        state = ns_normal;
      trie(node_state state_):
        state(state_), Id(0), ParentId(0), i(0), branch_pos(0) {};
      node_state state;   
      unsigned Id;
      unsigned ParentId;
      unsigned i;
      unsigned branch_pos;
      word_type branch[BLOCK_COUNT];
      word_type cp1[BLOCK_COUNT];
      word_type cp2[BLOCK_COUNT];

    I use the stxxl::vector as a container for my elements - element which are 96 bytes large (on an amd64 arch.) and have approx. 2.7M elements to sort. Giving a total of something like 250MB of data (not much, but this is only a "small" test case on live data it should be >2GB.)

    Since all the fields in the struct are unsigned types (word_type is in this case unsigned 64bit int) i add a state field which can be ns_min, ns_normal or ns_max. Where min and max are used in the min/max_value compare functor calls to distinguish if it's a min or max element.

    The debugging i have done so far is using gdb on the core dump. The stacktrace can be found here:

    As far as i can see the problem here is that the maximum element is compared to something which doesn't have a valid memory address. I'm not entirely sure that it's not my fault, but it's not unlikely.
    I insert the elements using the stxxl::vectors .push(...) method and all elements contain valid data. No element contains pointers to other elements and all elements are unique when being compared to each other using the comparator functor.

    I have also tried compiling my program (and the stxxl library) with STXXL_DEBUG_LEVEL at level 1 and 2. But this has not given me any additional information. I can paste the output from this debug information on pastebin also if needed.

    I would like any help you can think of and I'm willing to post any debugging info required.

    Kind regards,

    • Hi Torsten,

      which stxxl version are you using? 1.1.0? or the (latest) trunk version from the svn repository?

      check if this bug has been already fixed in the trunk version (if you were not using it).

      As a temporary workaround (that might help) try to set your block size (which is in bytes) to a multiple of data item size (96 bytes).

      Best regards,

    • Dear Roman.

      First thanks for the kind response. It's always nice to see a developer care for the project and it's users.

      I've tried now with both 1.1.0 and the latest trunk version v1.1.1-20080205 (SVN r602M), but I still get the same result - segmentation fault during sort.

      I've also tried as you suggested, changing the block size to a multiple of item size - have currently tested with BlkSize_ = 96 * 1024, 96 * 1024 * 16, 96 * 1024 * 32, just to have some number which are close to the default 2MB block size. This unfortunately didn't help either... :(

      Doing a single step through the code of the sorting algorith, i found that the error always happens inside the  "create_runs(..)" method. To be more specific, it is always when sorting the last partial run. (In the 1.1.0 version it is in from line 245 - 262)

      I hope this is of some help to you, otherwise don't hesitate to ask for more information.

      Kind regards,
      Torsten B. Christiansen.

      • Update:

        I've been trying to dig deeper into this problem and this is what i can come up with so far.

        The sorting algorithm end up in the case where "first" is the first element in it's block and "last" is NOT the first element in it's block. So from reading the code i can see that it requests a single new block, copies data from the "last" block to the newly created and assign the remaining part of the new block with "max_value()" elements.

        Reading the stxxl file i've assigned (in the .stxxl file - only using a single file) with gnome hex editor (ghex2) i can see that all my elements are saved correctly to disk. And i can also see that the copy part described above also working correctly - since the new block contains the last elements in the vector and some element constructed by "max_value()".

        I hope this helps a little bit... :)

        Kind regards,

    • Hi Torsten,

      how does your comparision functor trie_compare_first look like?
      Does it work correctly if one operand is max_value()?
      Does the data to be sorted include max_value() or min_value() elements?

      Some things you might try:

      There is another development branch:

      To make the program more chatty, you could compile it with -DSTXXL_VERBOSE_LEVEL=1 (that should print a lot of sorting progress messages). (No need to compile the stxxl library with this option.)

      What happens if you change the number of elements to be sorted (add/remove some elements)?
      What happens if you change the amount of memory given to the sort call?
      (In both cases the size of the last run should change...)

      In .stxxl, add another diskfile. You can put it on the same disk, this will be inefficient, but we are just bug hunting.

      Is there a chance to provide a (minimized) example program that produces this error?

      Checking stxxl programs with valgrind is hard, but if you reduce the memory usage of the program, buffers, ... it might be possible (if you have a lot of RAM).


      • Hi Andreas.

        1) The element, comparison functor and defines can be found here:

        2) As you can see from the code, when a max_value element is compared with a normal element it should return either true or false based whether "l" or "r" is the max_value element. When I use gdb to backtrack the place where it crashes the max_value element it shown just fine, the problem is that the element being compared with does not exists - or has an invalid memory address.

        3) I do not insert neither min_value nor max_value in my own code. The only place I can see that there elements are inserted, is inside the sorting method itself.

        4) The branch you suggest does not work with my program. I get compilation error regarding missing -= operator for vector and so forth.

        5) Verbose output can be found here:

        6) Changing number of elements does nothing, unfortunately. Tried with a completely different dataset.

        7) Changing the memory has no effect as long as the amount used is smaller than the amount required to sort the entire vector in memory. However, if there is enough memory for sorting the entire vector, the sorting works just fine and continues without problems.

        8) Adding another diskfile made no difference. It still crashed if it cannot be sorted in memory.

        9) I will see if i can write a little sample program later today or tomorrow. My complete program would be far to much for pasting here and even on pastebin... :)

        Kind regards,

    • Hi Torsten,

      trie_compare_first does not fulfill the properties of a strict weak ordering for min_value()/max_value(), e.g.
      cmp(min_value(), min_value()) must return false.

      I added a check for this in the trunk (r603), your program should now die with a failed assertion at the beginning of the sort() function (unless compiled with NDEBUG defined).
      Please fix the comparator, hopefully this should fix the segmentation fault, too.


      • Hi Andreas.

        THANK YOU!!! Finally it's working as it should - no seg. faults anymore.

        Why on earth didn't I manage to think of this on my own...? Perhaps a little read up on what strict weak ordering is would help.

        Thanks again - i'm really grateful that you both taken the time to help.

        Kind regards,