#23 segfault running test suite on py2.3 x86_64

Errors
closed-fixed
7
2005-02-28
2005-02-10
Matthew L Daniel
No

Testing: BuildClass, FindClass, ClassList, {...} ... ok
Testing: Clear, Reset, Class.PPForm, Class.Description,
Class.Module ... ok
Testing: Class.BuildSubclass ... ok
Testing: Class.WatchSlots, Class.WatchInstances ... ok
Testing: Class.MessageHandlerIndex,
Class.MessageHandlerWatched ... ok
Testing: Class.UnwatchMessageHandler,
Class.WatchMessageHandler ... ok
Testing: Class.MessageHandlerName,
Class.MessageHandlerType ... ok
Testing: Class.NextMessageHandlerIndex,
Class.MessageHandlerDeletable ... ok
Testing: BuildMessageHandler, Class.AddMessageHandler,
{...} ... ok
Testing: Class.BuildInstance, Class.RawInstance ... ok
Testing: Class.MessageHandlerList,
Class.AllMessageHandlerList ... ok
Testing: BuildInstance, Class.Deletable,
Instance.Slots, Instance.PPForm ... ok
Testing: FindInstance, Instance.Class, Instance.Name ... ok
Testing: LoadInstancesFromString ... ok
Testing: Slots.Names, Slots.Exists,
Slots.ExistsDefined, {...} ... ok
Testing: Slots.Cardinality, Slots.AllowedValues ... ok
Testing: Slots.Types, Slots.Sources ... ok
Testing: Slots.IsPublic, Slots.IsInitable, Slots.Range
... ok
Testing: Slots.IsWritable, Slots.HasDirectAccess,
Slots.Facets ... ok
Testing: ClassList, InitialClass, FindClass, Class.Next
... ok
Testing: InitialDeffacts, DeffactsList, Deffacts.Next,
Deffacts.Name ... ok
Testing: FindDeffacts, Deffacts.PPForm,
Deffacts.Deletable, {...} ... ok
Testing: InitialDefinstances, DefinstancesList,
Definstances.Next, {...} ... ok
After installing the module, I ran `cd testsuite;
python tests.py` and it yielded the following segfault.

If I can help further, please contact me.

===
Testing: Definstances.Name, Definstances.Module,
Definstances.Deletable ... ok
Testing: FactList, InitialFact, Fact.Next, Fact.Index
... ok
Testing: InitialFunction, FunctionList, Function.Name,
Function.Next ... ok
Testing: FindFunction, Function.PPForm, Function.Watch,
{...} ... ok
Testing: BuildGeneric, Generic.Name, Generic.PPForm,
Generic.Watch ... ok
Testing: InitialGeneric, GenericList, FindGeneric,
{...} ... ok
Testing: BuildGlobal, Global.Name, Global.PPForm, {...}
... ok
Testing: InitialGlobal, GlobalList, FindGlobal,
Global.Watch, {...} ... ok
Testing: InstancesChanged, InitialInstance,
FindInstance, Instance.Next ... ok
Testing: Instance.IsValid, Instance.Remove,
Instance.DirectRemove ... ok
Testing: Instance.Class, Instance.Slots, Instance.Send
... ok
Testing: Class.InitialInstance,
Class.InitialSubclassInstance, {...} ... Segmentation
fault (core dumped)

====
(gdb) bt
#0 EnvGetNextInstanceInClassAndSubclasses_PY (theEnv=0x57ff80, cptr=0x100000000, iptr=0x1, iterationInfo=0x7fbfffde40) at inscptch_py.c:98
#1 0x0000002a97f1c11a in g_getNextInstanceInClassAndSubclasses (self=0x57ff80, args=0x2a9821a3b0) at clipsmodule.c:5719
#2 0x0000003f0218973f in _PyEval_SliceIndex () from /usr/lib64/libpython2.3.so.1.0

Discussion

  • Logged In: YES
    user_id=328337

    Hi...

    Sorry for noticing it so late: I thought I was monitoring
    this forum and apparently I'm not. Now I'll start working on
    the bug.

    In fact the only 64 bit platform I have for testing is an
    old SPARC, and it did not show any problem using the test
    suite. I'll try to figure out what happened using your
    backtrace...

    Thank you for submitting the bug!

    F.

     
    • priority: 5 --> 7
    • assigned_to: nobody --> franzg
     
  • Logged In: YES
    user_id=88251

    I believe I am in a position to help you work through this.
    While direct ssh to my machine is not an option, I am
    relatively competent with gdb and C hacking. I am just not
    an expert at Python modules.

    If I can help, please let me know.

    Toward that end, doesn't Sourceforge have an x86_64 in their
    compile farm?

     
  • Logged In: YES
    user_id=328337

    Well... of course you can be of big, big help, especially as
    you are offering it! The fact is only that I usually do not want
    to bother other people *asking* for help. And I still have not
    investigated whether or not SF offers an x86_64 in their CF
    (but I was going to, I confess).

    I'm not so good with gdb: I'm so bad at it that I use ddd to
    feel more comfortable - I must say that this graphical front
    end is also a great tool.

    Apart from my stupid comments above, now I'm "unfolding"
    the guilty test, and if you don't mind I'll attach to my next
    comment a Python source file containing the tests in a more
    straightforward sequence: then I will be able to really see
    what happens.

    There are some things that make me think, in fact:

    1) the crash happens apparently when the g_* function is
    called for the first time (the instance pointer is NULL)
    because it's line 5719: from the traceback it looks like this
    NULL is passed as 0x1. Strange, isn't it? Also the other
    pointer (cptr), which should be almost ordinary, is a rather
    curious 0x100000000! I think we should look at this.

    2) the "self" parameter passed to a Python-visible function is
    also a pointer, actually to a PyObject, which should have
    nothing to do with CLIPS environments. But then my g_*
    function expands void *CurrentEnvironment() exactly to the
    same pointer as self (=0x57ff80). This is also quite strange,
    looks like I'm misusing something somewhere, but for now I
    can't figure out where or what.

    By the way, I lied before: I wasn't remembering that my
    SPARC Python is compiled in 32 bit mode, and thus all
    extensions are compiled exactly the same way. So I'm still
    unable to give any results about 64 bit environments: your
    help will surely be precious.

    I'll pop up later with these two or three tests, so that at least
    we will see what parameters are passed to g_... before
    the segfault.

    Thank you again,

    F.
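
One way to look at the two odd pointer values from the first backtrace is to split each into 32-bit halves. The sketch below is only arithmetic on the numbers gdb printed (`cptr=0x100000000`, `iptr=0x1`); the helper name `halves` is made up for illustration, and the result does not by itself establish where the values came from.

```python
# Values copied from frame #0 of the first gdb backtrace.
CPTR = 0x100000000
IPTR = 0x1

MASK32 = 0xFFFFFFFF


def halves(value):
    """Return (high 32 bits, low 32 bits) of a 64-bit value."""
    return (value >> 32) & MASK32, value & MASK32


cptr_high, cptr_low = halves(CPTR)
iptr_high, iptr_low = halves(IPTR)

# cptr is exactly 2**32: a 1 in the upper half and nothing in the lower half.
print("cptr:", hex(cptr_high), hex(cptr_low))
print("iptr:", hex(iptr_high), hex(iptr_low))
```

Neither value looks like a heap address such as `0x2a9821a3b0`, which fits the suspicion that the arguments were not plain, valid pointers when the function was entered.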

     
  • The "unfolded" test

     
    Attachments
  • Logged In: YES
    user_id=328337

    Here again...

    If you would like to use the attached file and report the
    output, I will see when the error occurs. Maybe it's a
    "border condition" (mathematically speaking) anyway.

    Til soon,

    F.

     
  • Logged In: YES
    user_id=88251

    I built _clips.so with debugging (-g) and ran the g_test.py.
    Please find the output below.

    I also discovered the joy of "bt full" in gdb, which prints
    out the local variables, too. That should help lots.

    #0 EnvIncrementInstanceCount (theEnv=0x520530, vptr=0x4554415254535f48) at ./clipssrc/insfun.c:112
    No locals.
    #1 0x0000002a9574b15a in g_getNextInstanceInClassAndSubclasses (self=0x520530, args=0x0) at clipsmodule.c:5725
    p = (clips_InstanceObject *) 0x2a955c8290
    q = (clips_InstanceObject *) 0x2a955c82b0
    c = (clips_DefclassObject *) 0x2a955c62b8
    o = {supplementalInfo = 0x0, type = 4, value = 0x522cd0, begin = 1, end = 4294967295, next = 0x0}
    ptr = (void *) 0x4554415254535f48
    #2 0x0000003f0218973f in _PyEval_SliceIndex () from /usr/lib64/libpython2.3.so.1.0

     
  • Logged In: YES
    user_id=88251

    Further, the vptr is actually an ASCII string; I am not a
    savvy enough stack-overflow guru to know which byte order it
    is in, but if either of these two strings looks familiar to
    you, so much the better:

    H_STRATE
    ETARTS_H
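
The two candidate strings can be reproduced from the pointer value itself. This is a minimal sketch that treats `0x4554415254535f48` as eight raw bytes and decodes them in both byte orders; the helper name `as_ascii` is hypothetical. On a little-endian machine such as x86_64, the little-endian reading is the order in which the bytes sit in memory.

```python
# The value gdb printed for vptr / ptr in the second backtrace.
VPTR = 0x4554415254535F48


def as_ascii(value, byteorder):
    """Decode a 64-bit integer as 8 ASCII characters in the given byte order."""
    return value.to_bytes(8, byteorder).decode("ascii")


little = as_ascii(VPTR, "little")  # memory order on x86_64
big = as_ascii(VPTR, "big")        # order in which the hex digits are printed

print("little-endian:", little)
print("big-endian:   ", big)
```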

     
  • Logged In: YES
    user_id=328337

    Hmmm... this looks more normal than the one before. But the
    vptr (the actual instance address) seems not to have been
    initialized by the underlying CLIPS engine to anything
    useful: in fact the string "H_STRATE" is a substring of
    either "BREADTH_STRATEGY" or "DEPTH_STRATEGY", which are
    declared as "manifest constants" in the high level module,
    and thus also have a string representation (their name in
    __dict__). This means that ptr was set to something that
    points to outer space. Unfortunately I do not have much
    control over CLIPS internals (it's the
    GetNextInstanceInClassAndSubclasses_PY function that lets
    the "cursor" advance in the list of instances, and that is
    almost copied from CLIPS source).

    But there is something that gives me some thought in the
    local variables dump you report. I guess that a DATA_OBJECT
    -- it is a struct -- uses the "begin" and "end" members to
    track some array bounds, or something like that: actually
    they are integers or integeroids (printf usually shows
    addresses in hex, and I bet gdb does the same ;-)). Did you
    see? The "end" member is exactly 4294967295. If it were a
    signed 32 bit integer, it would be 0xffffffff, which is -1
    in decimal (a good value to say "hey, there's nothing beyond
    this" or to construct a MULTIFIELD with no elements). But
    maybe the standard "int" type with gcc -m64 is 64 bits long,
    and in this case "end" is just a plain, useless number...
    Since I often see some "signed/unsigned" or "int/size_t"
    mismatches during the CLIPS compilation, I'm a little bit
    suspicious that one of these conversions might really be
    dangerous!

    Unfortunately I also don't see any of the output that
    g_test.py provides: I need just that to see what parameters
    are given to the function when the module crashes. Maybe gdb
    eats up the debuggee's output... Could you please just
    attach the output of a non-gdb session running g_test.py, so
    that I can see the progress? It would be very kind of you.

    Til soon!

    F.
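
The two readings of `end = 4294967295` discussed above can be checked with a few lines. This is a small sketch, not part of the test suite: it reinterprets the same bit pattern as a signed 32-bit and a signed 64-bit integer, then prints the widths the local C ABI uses for `int`, `long` and pointers. Those widths depend on the platform and compiler, so no particular output is assumed for them.

```python
import ctypes
import struct

END = 4294967295  # value gdb showed for o.end

# The same 32 bits, read back as a signed 32-bit integer.
as_signed32 = struct.unpack("<i", struct.pack("<I", END))[0]

# Stored in 64 bits, the value stays a large positive number.
as_signed64 = struct.unpack("<q", struct.pack("<Q", END))[0]

print(hex(END), as_signed32, as_signed64)

# Sizes in bytes on the machine running this script (platform dependent).
print("int:  ", ctypes.sizeof(ctypes.c_int))
print("long: ", ctypes.sizeof(ctypes.c_long))
print("void*:", ctypes.sizeof(ctypes.c_void_p))
```

If `int` turns out to be 4 bytes while pointers and `long` are 8, any code that stores a pointer or a `long` in an `int` (or the reverse) is a candidate for the kind of mismatch suspected here.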

     
  • Logged In: YES
    user_id=88251

    Sorry, I don't know why I forgot to post the output:

    Testing PyCLIPS top level module
    Building classes
    Initial/Next Instance 1
    Initial/Next SubclassInstance 1
    Test01
    Test02
    Test03
    Test04
    Initial/Next Instance 2
    Initial/Next SubclassInstance 2
    Test05
    Test06
    Testing ends of lists
    Test07 ok
    Test08 ok
    Test09 ok
    Segmentation fault (core dumped)

     