#260 longjmp error

Core Dump
open
nobody
None
5
2023-06-20
2023-06-15
No

I am porting a machine learning application from SWI-Prolog that uses the C foreign language interface to implement numerical linear algebra.
More precisely, I am using the higher-level interface, but since I am importing and exporting float lists rather than primitive datatypes, I am currently still using a fair number of p2c, c2p and p2p functions on the C side.
Although the C code gives correct answers and works perfectly well, even when called repeatedly in test files, my main Prolog application exits with the error

*** longjmp causes uninitialized stack frame ***: terminated.

When I write the XSB log to a log file, it seems as if the error occurred during some innocuous Prolog list recursion, but surely it must be related to the C code?

Does anyone know how one could pin the error down any further?
Valgrind confirms there are no memory leaks in the core C procedure, but I have not managed to check the passing of Prolog terms back and forth, since I cannot compile the Prolog side to native code (and Valgrind doesn't take xwam files).

Any help would be greatly appreciated!

Discussion

  • David S. Warren

    David S. Warren - 2023-06-15

    Hi Felix,
    As you note, XSB doesn't have builtin routines to pass arrays of basic XSB datatypes back and forth to C, so you must use (as you do) the various p2c_ and c2p_ functions. So that all seems right.
    As to the longjmp error, I'm not an expert in longjmp, but I would first try to locate the particular longjmp call that generates the error. Grepping through the .[ch] files in the emu directory for longjmp, I find the following occurrences in XSB:

    biassert.c: longjmp(assertcmp_env, num);
    cinterf.c: longjmp(cinterf_env, num);
    error_xsb.c: longjmp(ccall_init_env, XSB_ERROR);
    error_xsb.c: longjmp(xsb_abort_fallback_environment, XSB_ERROR);
    error_xsb.c: longjmp(xsb_abort_fallback_environment, XSB_ERROR);
    error_xsb.c: longjmp(xsb_abort_fallback_environment, XSB_ERROR);

    I'm an old-fashioned debugger, so I add printf statements and recompile XSB (using the usual makexsb script). What system are you running on, and have you compiled the XSB emulator? I.e., is it easy for you to recompile?

    The suspicious longjmp to me is the one in cinterf.c, the C interface code. If that's the offending call, then we need to determine where setjmp was previously called to set the environment that seems to be missing, as indicated in the error message. If this is indeed the offending longjmp, then it probably has not been well tested and could well still have bugs in it. The other longjmps I recognize; they are used often in XSB execution, so I'd expect fewer bugs there. But...
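As a reference point for the setjmp/longjmp pairing discussed here, the following is a minimal standalone sketch in plain C. It is not XSB's code: `recovery_env`, `last_error`, `fail_with` and `run_guarded` are hypothetical names chosen for illustration. The point it illustrates is that a longjmp is only valid while the function that called setjmp on that environment is still active; jumping to an environment whose frame is no longer valid is undefined behaviour, and glibc's hardened longjmp check appears to be what prints the "uninitialized stack frame" message in such cases.

```c
#include <setjmp.h>

/* Hypothetical names for illustration; this is not taken from XSB. */
static jmp_buf recovery_env;          /* environment saved by setjmp */
static volatile int last_error = 0;   /* error code handed over to the handler */

/* Report an error by jumping back to the most recent setjmp on recovery_env.
   Only valid while the function that called setjmp has not yet returned. */
static void fail_with(int code) {
    last_error = code;
    longjmp(recovery_env, 1);
}

/* Run some work under a recovery point.
   Returns 0 if the work completes, otherwise the error code that was raised. */
int run_guarded(int should_fail) {
    if (setjmp(recovery_env) != 0) {
        /* Reached a second time, via longjmp from fail_with. */
        return last_error;
    }
    if (should_fail) {
        fail_with(42);                /* never returns to this point */
    }
    return 0;
}
```

Calling `run_guarded(1)` returns 42 through the longjmp path, while `run_guarded(0)` returns 0 normally. Calling `fail_with` after `run_guarded` has returned would be exactly the kind of invalid jump described above.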

    I don't know if all this is obvious to you and you've already tried it, or if it might be helpful. Thanks for using XSB and I'll try to work with you to get your application running.

    -David

     
  • Felix Weitkämper

    Dear David,

    Thank you so much for the hints!
    I am on Ubuntu 20.04 and have compiled from source, so recompiling is no problem at all.
    I was so hung up on trying to find a root cause in my foreign language code that it somehow never occurred to me simply to take the message at face value.

    To my great surprise, the procedure you suggested showed that the only one of these 6 longjmps that is invoked is the one in the function xsb_throw_internal in error_xsb.c.

     
  • Felix Weitkämper

    I have found the mistake in my code, and fixed it. In fact, it had nothing to do at all with my foreign language code, and everything with the "innocuous list recursion" I mentioned in my first post.
    The issue is that in SWI-Prolog I can write "inf" to denote infinity, which I use in the base case of a list minimum. On my XSB setup, comparing against "inf" seems to cause the illegal longjmp error.

     

    Last edit: Felix Weitkämper 2023-06-17
  • David S. Warren

    David S. Warren - 2023-06-17

    Interesting. I'm glad you found and fixed the problem. Yes, I know that our treatment of floats is more primitive than SWI's. I actually remember seeing that SWI had the inf constant and treated it as floating point infinity, but for some reason never incorporated that into XSB.
    But you have still uncovered a bug in XSB's handling of exceptions generated in C code. The idea of the error handling is to give you a more informative error message than the one you got :-). I'll take a look at that code to see if I can see anything obviously wrong with the error handling.
    Thanks!

     
  • Felix Weitkämper

    Thanks again! I assume what I should have gotten is the usual

    Error[XSB/Runtime/P]: [Type (inf in place of evaluable)] Wrong type in evaluable function compare-operator/2: (Goal: compare-operator(5.0,inf))

    from a well-behaved longjmp, as I would when running ?- 5.0 < inf on the command line.
    I will see if I can pin down the conditions any better from my end, if that would be helpful.

    Of course, I am very happy to have located the issue. My main motivation for the port is the more sophisticated and stable tabling support and the sheer performance gain from the WAM compilation, and that showed immediately, even on my toy benchmarks.

    Do you happen to know off the top of your head whether there is a "float_max" or similar that I could use instead?

     
  • David S. Warren

    David S. Warren - 2023-06-17

    Yes, if you have more information about where the exception was thrown from, that would be helpful. I looked quickly and didn't see anything obvious. But that is to be expected, since we throw those errors all the time in XSB and they work fine. You get what you suggest: a runtime XSB error and a forward trace.
    The largest floating point number in 64 bits is slightly larger than 1.797E308. XSB doesn't have a "float_max" function, which is a limitation. You can generate the +inf representation by dividing a positive float by 0.0. You could use this as the max. E.g.
    | ?- 1.797E308 < 1.0/0.0.

    yes
    | ?-
    But it would have to be computed at every use, which you may not want to do.
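The same idea can be sketched on the C side of a foreign interface. This is a standalone illustration, assuming IEEE 754 doubles (where dividing a positive float by 0.0 yields +infinity); `computed_inf` and `list_min` are hypothetical helper names, not XSB or cinterf functions. `DBL_MAX` from `<float.h>` plays the role of a "float_max", and `INFINITY` from `<math.h>` gives the base case of a list minimum.

```c
#include <float.h>
#include <math.h>
#include <stddef.h>

/* +infinity obtained by dividing a positive float by 0.0, as in the
   Prolog query above. Assumes IEEE 754 arithmetic; the volatile keeps
   the compiler from rejecting or folding the constant division. */
double computed_inf(void) {
    volatile double zero = 0.0;
    return 1.0 / zero;
}

/* Minimum of n doubles. For an empty array the result is +infinity,
   which is the natural base case for a list minimum. */
double list_min(const double *xs, size_t n) {
    double best = INFINITY;
    for (size_t i = 0; i < n; i++) {
        if (xs[i] < best) {
            best = xs[i];
        }
    }
    return best;
}
```

If an infinite base case is not wanted, `DBL_MAX` could be used as the initial value instead; it is the largest finite double and compares below `computed_inf()`.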

     
  • David S. Warren

    David S. Warren - 2023-06-18

    Hi Fritz,
    You motivated me to add inf as a constant float, which will evaluate to the IEEE inf float value when used in expressions, as in X < inf, or in is/2 expressions. I hope this would have avoided the error you encountered when using inf. I've committed it to the XSB repository.
    I assume you were using a cinterf.c macro or function when the error was thrown. If so, do you happen to know what particular macro/function you were using?
    My current conjecture on why you got the longjmp error is that somehow the loading/linking of your C routines might not have resolved the appropriate externals correctly. Just a conjecture.
    -David

     
  • David S. Warren

    David S. Warren - 2023-06-18

    I said "Fritz" when I meant "Felix". Sorry....

     
  • Felix Weitkämper

    Thank you so much for committing this extra feature!
    Regarding pinning down the issue, loading the foreign module is insufficient to provoke the crash dump, but a single execution of the foreign language routine suffices.

    See below for the console output, where matrix_popt.H is the FLI header and the last line is just the German expression for crash dump.

    [xsb_configuration loaded, cpu time used: 0.001 seconds]
    [sysinitrc loaded]
    [xsbbrat loaded]

    XSB Version 5.0.0 (Green Tea) of May 15, 2022
    [x86_64-pc-linux-gnu 64 bits; mode: optimal; engine: slg-wam; scheduling: local]
    [Build date: 2023-06-15]

    | ?- [matrix_popt].
    [matrix_popt loaded]

    yes
    | ?- 5 < inf.
    longjmp from XSB throw internal++Error[XSB/Runtime/P]: [Type (inf in place of evaluable)] Wrong type in evaluable function compare-operator/2: (Goal: compare-operator(5,inf))
    Forward Continuation...
    ... x_interp:_$call/1 From /home/weitkaemper/XSB/syslib/x_interp.xwam
    ... x_interp:call_query/1 From /home/weitkaemper/XSB/syslib/x_interp.xwam
    ... standard:call/1 From /home/weitkaemper/XSB/syslib/standard.xwam
    ... standard:catch/3 From /home/weitkaemper/XSB/syslib/standard.xwam
    ... x_interp:interpreter/0 From /home/weitkaemper/XSB/syslib/x_interp.xwam
    ... loader:ll_code_call/3 From /home/weitkaemper/XSB/syslib/loader.xwam
    ... loader:find_ofile_and_load/7 From /home/weitkaemper/XSB/syslib/loader.xwam
    ... standard:call/1 From /home/weitkaemper/XSB/syslib/standard.xwam
    ... standard:catch/3 From /home/weitkaemper/XSB/syslib/standard.xwam

    | ?- matrix_popt:popt_weight_score([1.0],[0.0],[0.5],1.0,1,_).

    yes
    | ?- 5 < inf.
    longjmp from XSB throw internal*** longjmp causes uninitialized stack frame ***: terminated
    Abgebrochen (Speicherabzug geschrieben)

     

    Last edit: Felix Weitkämper 2023-06-18
  • David S. Warren

    David S. Warren - 2023-06-20

    Very interesting. The only thing I can think of, and it's a long shot, is that maybe your C routine is overflowing the C runtime stack. Is your C routine recursively traversing a list? That is dangerous because the C runtime stack is quite small for such things. If that is a possibility, you could either rewrite the code to be iterative, or you could compile XSB with a larger runtime stack. (I don't know the particular option for your C compiler but it should not be hard to find.)
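To make the recursive-versus-iterative point concrete, here is a minimal standalone sketch in plain C. The `struct node` list is a hypothetical stand-in for a Prolog list and does not use the cinterf API; the function names are likewise illustrative. The recursive version consumes one C stack frame per element, whereas the iterative version uses constant stack space.

```c
#include <stddef.h>

/* Hypothetical singly linked list of doubles, standing in for a Prolog list. */
struct node {
    double value;
    struct node *next;
};

/* Recursive sum: one C stack frame per list element, so a very long
   list can exhaust the C runtime stack. */
double sum_recursive(const struct node *list) {
    if (list == NULL) {
        return 0.0;
    }
    return list->value + sum_recursive(list->next);
}

/* Iterative sum: a single frame regardless of list length. */
double sum_iterative(const struct node *list) {
    double total = 0.0;
    for (const struct node *p = list; p != NULL; p = p->next) {
        total += p->value;
    }
    return total;
}
```

Both functions return the same result on short lists; only the iterative one is safe for lists whose length is not bounded in advance.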

     
    • Felix Weitkämper

      My original suspicion was that it has something to do with my clumsy use of prolog_terms, since I don't understand sufficiently how memory allocation works for those.
      For instance, I am effectively doing this at some point:
      prolog_term NewList = p2p_new();
      p2p_unify(NewList,OldList);
      To me, this seems to be conjuring stack space for an arbitrarily long list out of nowhere, since I only reserved memory for a variable. And they are not just pointers, as far as I understood, since I think I can now manipulate NewList without changing OldList, so I must have copied them in memory.

       

      Last edit: Felix Weitkämper 2023-06-20
  • David S. Warren

    David S. Warren - 2023-06-20

    Yes, that may be a more likely issue. The C routine is responsible for being sure that there is enough space on the heap to hold whatever it causes to be put there. The way to do this is to determine the amount of heap space needed, and then to call a routine to ensure that that much space is available. To see if this might be a problem, what I'd do is start XSB with a very large heap (which should be large enough to handle any large term constructed on the heap) and see if the problem disappears. This can be done by starting XSB with, say,
    xsb -m 100000
    which would initialize the heap (and local stack) to 100 megabytes (the number is in kilobytes).
    Try this to see if the bad error message goes away....

     
    • Felix Weitkämper

      While trying to construct a minimal example, I have realised that the C function itself is in fact immaterial to the error.
      For instance, I get exactly the same behaviour with the file test.c:

      #include <cinterf.h>
      void test(){}
      

      and header file test.h

      :- foreign_pred ptest() from
      test():void.
      

      which leads to exactly the same behaviour:

      | ?- [test].
      [Compiling Foreign Module ./test]
      [test compiled, cpu time used: 0.004 seconds]
      [Compiling C file ./test.c using gcc]
      gcc -s -o ./test.so -shared ./test.c xsb_wrap_test.c  -Wall -fPIC -I/home/weitkaemper/XSB/emu -I/home/weitkaemper/XSB/config/x86_64-pc-linux-gnu   -I"/usr/lib/jvm/java-11-oracle/include"  -O3 -fno-strict-aliasing   -fPIC -Wall -pipe -fsigned-char      -lm -ldl -Wl,-export-dynamic -lpthread 
      [test loaded]
      
      yes
      | ?- 5 < a.
      longjmp from XSB throw internal++Error[XSB/Runtime/P]: [Type (a in place of evaluable)]  Wrong type in evaluable function compare-operator/2: (Goal: compare-operator(5,a))
      Forward Continuation...
      ... x_interp:_$call/1  From /home/weitkaemper/XSB/syslib/x_interp.xwam
      ... x_interp:call_query/1  From /home/weitkaemper/XSB/syslib/x_interp.xwam
      ... standard:call/1  From /home/weitkaemper/XSB/syslib/standard.xwam
      ... standard:catch/3  From /home/weitkaemper/XSB/syslib/standard.xwam
      ... x_interp:interpreter/0  From /home/weitkaemper/XSB/syslib/x_interp.xwam
      ... loader:ll_code_call/3  From /home/weitkaemper/XSB/syslib/loader.xwam
      ... loader:find_ofile_and_load/7  From /home/weitkaemper/XSB/syslib/loader.xwam
      ... standard:call/1  From /home/weitkaemper/XSB/syslib/standard.xwam
      ... standard:catch/3  From /home/weitkaemper/XSB/syslib/standard.xwam
      | ?- test:ptest.
      
      yes
      | ?- 5 < a.
      longjmp from XSB throw internal*** longjmp causes uninitialized stack frame ***: terminated
      Abgebrochen (Speicherabzug geschrieben)
      
       
  • David S. Warren

    David S. Warren - 2023-06-20

    Hmmm. That is interesting. I'll see what I can find.

     
