
#1449 Crash while executing createStackFrame()

5.0.0
closed
nobody
None
none
1
2023-01-01
2017-05-16
No

The enclosed self-contained archive includes all that is necessary to create a crash of ooRexx 5.0.

Just unzip the attached package and run "pgm_01.rex".

Please consult the "readme.txt" file for further information.

1 Attachments

Discussion

  • Erich

    Erich - 2017-05-16

    You have an uninitialized variable i in the uninit() method at line 170 of test_util.cls.
    The interpreter crashes while trying to report error 41.001, "Nonnumeric value .. used in arithmetic operation".

     
  • Rony G. Flatscher

    The ooRexx interpreter should never be able to crash in such a situation!

     
  • Per Olov Jonsson

    With the help of Rony I could make the test cases run also on a 64 bit ooRexx installation on a Mac, and they produce the same behaviour as on 32-bit ooRexx for Windows, or so it seems at least.

    I have enclosed a Zip with source, makefile and instructions on how to run the tests on a Mac as well. I have corrected the uninitialised variable referred to above, but I do not see a difference before and after, so I do not think that "i" is the culprit.

    Running the program pgm_02.rex as-is I get a Bus error: 10, whereas running it in Valgrind I get a Segmentation fault: 11 error. I have enclosed logs for both cases.

    I suggest starting to read the Valgrind log at

    ==2912== Process terminating with default action of signal 11 (SIGSEGV)
    ==2912== Bad permissions for mapped region at address 0x100E02C28

    This points to specific ooRexx libraries. I hope this can shed some further light on the bug(s) reported by Rony. I will continue to run this test with more options to try to capture more detail of where it goes wrong.

    PS: the following message in the Valgrind log

    Syscall param msg->desc.port.name points to uninitialised byte(s)

    emanates from a piece of code in syswrap-darwin.c (internal to Valgrind):

    static void pre_port_desc_read(ThreadId tid, mach_msg_port_descriptor_t *desc2)
    {
    #pragma pack(4)
        struct {
            mach_port_t name;
            mach_msg_size_t pad1;
            uint16_t pad2;
            uint8_t disposition;
            uint8_t type;
        } *desc = (void*)desc2;
    #pragma pack()

        PRE_FIELD_READ("msg->desc.port.name", desc->name);
        PRE_FIELD_READ("msg->desc.port.disposition", desc->disposition);
        PRE_FIELD_READ("msg->desc.port.type", desc->type);
    }

    I think it can be ignored, since the same error message also appears for fully functional ooRexx code.

     
  • Per Olov Jonsson

    I have continued to run this test case and the crash shows up consistently. I have tried to extract the most relevant parts from the Valgrind log (attached). Hopefully some of these method names in the named libraries ring a bell?

    The first reported error (in the first thread/interpreter instance) is that the syscall parameter socketcall.sendto(msg) points to uninitialised byte(s).

    The cause seems to be a chain of calls starting with RexxCreateSessionQueue (read from the bottom to the top) in librexxapi.5.0.0.dylib (the macOS equivalent of a DLL).

    --Begin Valgrind--

    Syscall param socketcall.sendto(msg) points to uninitialised byte(s)

    at sendto in /usr/lib/system/libsystem_kernel.dylib

    by

    SysSocketConnection::write(void*, unsigned long, unsigned long*)
    SysSocketConnection::write(void*, unsigned long, void*, unsigned long, unsigned long*)
    ServiceMessage::writeMessage(SysClientStream&)

    ClientMessage::send(SysClientStream*)
    ClientMessage::send()

    LocalAPIManager::establishServerConnection()
    LocalAPIManager::initProcess()
    LocalAPIManager::getInstance()
    LocalAPIContext::getAPIManager()
    RexxCreateSessionQueue

    in /Library/Frameworks/ooRexx.framework/Versions/A/Libraries/librexxapi.5.0.0.dylib

    --End Valgrind--

    The second reported error (in the first thread/interpreter instance) is

    Interpreter::startInterpreter(Interpreter::InterpreterStartupMode)
    in librexx.5.0.0.dylib

    Caused by an uninitialised value created by a stack allocation by LocalAPIManager::establishServerConnection()
    in librexxapi.5.0.0.dylib

    --Begin Valgrind--

    Interpreter::startInterpreter(Interpreter::InterpreterStartupMode)
    in /Library/Frameworks/ooRexx.framework/Versions/A/Libraries/librexx.5.0.0.dylib

    Address 0x7fff5fbfd452 is on thread 1's stack
    in frame #6, created by LocalAPIManager::establishServerConnection() (???:)
    Uninitialised value was created by a stack allocation

    in /Library/Frameworks/ooRexx.framework/Versions/A/Libraries/librexxapi.5.0.0.dylib

    --End Valgrind--

    The actual crash occurs the FIRST time a further thread is created. Thread 1 is the first interpreter instance and Thread 2 is the first new one. The code in test_util.cls is this:

    ::method create class
      use strict arg number=10000, useReply=.false -- number of objects to create
      if useReply=.true then -- do it in another thread
        reply                -- <-- the crash occurs HERE
      do i=1 to number       -- <-- this line is never reached in Thread 2 (but the thread gets created)
        self~new
        if i//100=0 then .output~charout(".")
        if i//1000=0 then .output~charout(i)
      end
      .output~say

    The library is libsystem_pthread.dylib and the functions are _pthread_body, _pthread_start and thread_start:

    --Begin Valgrind--

    Thread 2:
    Invalid read of size 4
    at 0x10087D899: _pthread_body (in /usr/lib/system/libsystem_pthread.dylib)
    by 0x10087D886: _pthread_start (in /usr/lib/system/libsystem_pthread.dylib)
    by 0x10087D08C: thread_start (in /usr/lib/system/libsystem_pthread.dylib)
    Address 0x18 is not stack'd, malloc'd or (recently) free'd

    --End Valgrind--

    I will continue to drill deeper. I assume I can get hold of the source for all the libraries above?

    There are also 4 warnings from the compiler when building the test case:

    test_gc.cpp:192:45: warning: format specifies type 'int' but the argument has
    type 'logical_t' (aka 'unsigned long') [-Wformat]
    instance, rc, rtc, pgmName, args, resu...

    It does not look worrying but I will try to silence it and see if it makes a difference.

    If there is ANYTHING you want me to test further just let me know.

     
  • Per Olov Jonsson

    I have now silenced the compiler warnings

    Warning: format specifies type 'int' but the argument has type 'logical_t'

    in

    test_gc.cpp:192:45:
    test_gc.cpp:192:77:
    test_gc.cpp:229:123:
    test_gc.cpp:229:146:

    Replacing %d with %lu in fprintf in
    rc=[%d]
    condition=[%d]
    bCondition=[%d]

    silenced the warnings, but the crash is the same.
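
The change described above can be sketched as follows. This is only an illustration, not the code from test_gc.cpp: the typedef of logical_t is an assumption taken from the compiler warning ("aka 'unsigned long'"), and the helper name formatResult is hypothetical.

```cpp
#include <cstdio>
#include <string>

// Assumption: logical_t is an alias for unsigned long, as the compiler
// warning quoted above reports. The real typedef lives in the ooRexx headers.
typedef unsigned long logical_t;

// Hypothetical helper: formats a return code and a condition flag with %lu,
// the conversion that matches unsigned long, instead of %d (which expects int).
std::string formatResult(logical_t rc, logical_t condition)
{
    char buffer[64];
    std::snprintf(buffer, sizeof buffer, "rc=[%lu] condition=[%lu]", rc, condition);
    return std::string(buffer);
}
```

Passing an unsigned long where %d expects an int is undefined behaviour in principle, so the fix is worthwhile even though it did not change the crash.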

     

    Last edit: Per Olov Jonsson 2017-10-11
  • Per Olov Jonsson

    Here are two more trace files with ooRexx built as debug. I still get "Segmentation fault: 11" when running this test case on macOS.

    -Checked out revision 11317, complete
    -cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=DEBUG ../../oorexxsvn/main/trunk
    -make

     
  • Rony G. Flatscher

    Unfortunately, this crash still occurs on the tested Windows (32-bit) and on Linux (64-bit)!

    Enclosed please find the updated zip-archive, the full Linux thread stack traces and the full Windows stack trace.

    [In order to get the thread stack traces with and without local vars, I updated test_gc.cpp so that it can be run on Linux (and MacOSX) by adding the external function BsfGetTid(), as SysQueryProcess("TID") is not available on Unix. I adapted the Rexx scripts accordingly and created the new zip archive "bug2a_20171021.zip".]

     
  • Erich

    Erich - 2017-10-21

    I'm seeing that, at the time of the crash, when RexxActivation::getArguments() retrieves argCount it receives 1145128260, which is '44414544'x, which is "DAED" in ASCII. That is the eye-catcher that dead objects are marked with (if running a DEBUG version).
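
The conversion from the decimal value to the eye-catcher text can be checked with a few lines of C++. This is a minimal sketch; the function name asEyeCatcher is hypothetical and not part of ooRexx.

```cpp
#include <cstdint>
#include <string>

// Renders a 32-bit value as four ASCII characters, most significant byte first.
std::string asEyeCatcher(std::uint32_t value)
{
    std::string text;
    for (int shift = 24; shift >= 0; shift -= 8)
    {
        // Extract one byte at a time, starting with the highest-order byte.
        text += static_cast<char>((value >> shift) & 0xFF);
    }
    return text;
}
```

Calling asEyeCatcher(1145128260u) yields "DAED", matching the hexadecimal value '44414544'x. On a little-endian machine the same four bytes lie in memory in the opposite order, so they would read "DEAD" in a memory dump.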

     
    • Rick McGuire

      Rick McGuire - 2017-10-21

      Something a little more subtle. The exception occurs because an error occurs in an uninit method. While building the stack trace, it is picking up the stack frame of the just-completed method that initiated the uninit processing. Because execution of that activation is complete, a lot of the information (in this case, the current instruction) is not valid. I believe this patch should fix the problem (untested in this scenario).
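
The general idea of guarding against a completed activation while building a stack trace might look like the sketch below. It is not the actual patch and does not use ooRexx classes: Frame, currentLine and buildStackTrace are hypothetical names, and skipping the frame is only one possible way to handle the missing instruction.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for an interpreter activation (not an ooRexx class).
struct Frame
{
    std::string name;        // routine or method name
    const int *currentLine;  // line of the current instruction, null once execution is complete
};

// Builds the trace lines, skipping frames that no longer have a valid
// current instruction instead of dereferencing stale data.
std::vector<std::string> buildStackTrace(const std::vector<Frame> &frames)
{
    std::vector<std::string> lines;
    for (const Frame &frame : frames)
    {
        if (frame.currentLine == nullptr)
        {
            continue;  // completed activation: nothing valid to report
        }
        lines.push_back(frame.name + " at line " + std::to_string(*frame.currentLine));
    }
    return lines;
}
```

With a running uninit frame and a completed create frame, only the uninit frame ends up in the trace.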

       
  • Erich

    Erich - 2017-10-21

    Committed code fix with revision [r11321].
    Regression tests run fine.
    The provided error case pgm01.rex runs fine

    Rick, thanks!!!!
    Rony, can you please confirm?

     

    Related

    Commit: [r11321]

  • Rony G. Flatscher

    Built 32-bit Windows and 64-bit Linux ooRexx from trunk and can confirm that the test case runs fine on either system.

     
  • Erich

    Erich - 2017-10-22
    • status: open --> pending
     
  • Rony G. Flatscher

    • Status: pending --> closed
     
