
#1736 Intermittent Segfaults

None
open
nobody
None
none
1
2021-02-15
2020-12-30
Bob Jewett
No

I am getting intermittent segmentation faults using Open Object Rexx Version 5.0.0 r12142.
System is Linux Debian 10, kernel 4.19.146-1.

Dec 29 19:33:25 main1 kernel: [265621.347007] CharlotteGlucos[30848]: segfault at 180 ip 00007f93711444ac sp 00007f936eadbd70 error 4 in librexx.so.4[7f9370fda000+218000]

Dec 29 19:33:25 main1 kernel: [265621.347011] Code: 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 55 53 48 83 ec 08 48 8b 1d 2b 25 2c 00 48 8b 43 10 eb 16 0f 1f 44 00 00 48 83 c0 08 <80> bd 80 01 00 00 00 48 89 43 10 74 4b 48 39 43 30 74 61 48 8b 4b

I'm not sure where to start. How do I go about troubleshooting this?
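
As a possible first step, the kernel line above already contains enough to locate the faulting instruction inside librexx.so.4: subtracting the mapping base (the value before the `+` in the brackets) from the `ip` value gives an offset into the library. This is only a sketch. The helper name `fault_offset` is made up for illustration, and the result is a plain file offset only if the mapping shown starts at the beginning of the library file.

```shell
#!/bin/sh
# Hypothetical helper (not part of ooRexx): compute the offset of the
# faulting instruction inside a shared library from the "ip" value and the
# mapping base printed in a kernel segfault line.
fault_offset() (
    ip=$1      # instruction pointer, e.g. 00007f93711444ac
    base=$2    # start of the library mapping, e.g. 7f9370fda000
    # Strip an optional 0x prefix, then subtract as hexadecimal numbers.
    printf '0x%x\n' $(( 0x${ip#0x} - 0x${base#0x} ))
)

# Values copied from the log line in this report.
fault_offset 00007f93711444ac 7f9370fda000   # prints 0x16a4ac
```

With a librexx.so.4 that carries debug symbols, an offset like this could then be handed to `addr2line -f -C -e <path to librexx.so.4> 0x16a4ac` to get a function name and source line. The library path depends on the installation.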

Discussion

  • Erich

    Erich - 2020-12-30

    Hi Bob, we will almost certainly need a piece of code that fails at least somewhat consistently.

    To get started with debugging, see our wiki at https://sourceforge.net/p/oorexx/wiki/how-to-debug-oorexx/, but beware: debugging segfaults isn't easy at all.

     
  • Anonymous

    Anonymous - 2020-12-30

    Thanks Erich.
    This may take a long time for several reasons. I am a little green when it comes to C++ code. I have never used the gdb tool. The problem I am having fails once in a blue moon. The program has several threads (using the GUARD ON WITH x<>y). I am using ADDRESS "sh" "command" WITH OUTPUT USING (stdout) ERROR USING (stderr) for the 1st time and in a thread to boot.

    I wonder if it has some kind of timing issue? Maybe with garbage collection or with the GUARD?

    I think this is going to be quite a challenge for me. I'm not very optimistic that I am going to succeed, but I will give it my best shot. I think I will start with the fully-optimized RELEASE code with gdb. If I get the optimized code to fail, I will proceed to the DEBUG version.

     
  • Bob Jewett

    Bob Jewett - 2020-12-30

    Sorry about the Anonymous post above.

    I have downloaded the source from svn and compiled both RELEASE and DEBUG versions. I have installed the executables in their own respective private folders, which are not included in the PATH environment variable. I have created a couple of scripts, one for RELEASE and one for DEBUG. They temporarily change the paths before invoking the rexx executable to start the ooRexx program passed to them. This seems to override the shebang on the first line of the ooRexx program.
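
A wrapper of the kind described above could look roughly like the following sketch. The function name `run_with_build` and the assumption that each private build has `bin/` and `lib/` under one prefix directory are illustrative, not taken from the actual scripts. Because the program is passed to `rexx` explicitly, the kernel never consults the shebang line, which would explain why it appears to be overridden.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command with a private ooRexx build placed
# first on PATH and LD_LIBRARY_PATH. The prefix layout (bin/ and lib/ under
# one directory) is an assumption made for this sketch.
run_with_build() (
    prefix=$1; shift
    PATH="$prefix/bin:$PATH"
    LD_LIBRARY_PATH="$prefix/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export PATH LD_LIBRARY_PATH
    # The function body is a subshell, so the caller's environment
    # is left unchanged once the command returns.
    "$@"
)

# Example call (paths are placeholders):
#   run_with_build "$HOME/oorexx-debug" rexx myprogram.rex
```

Running the body in a subshell keeps the path change temporary without having to save and restore the old values by hand.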

    I have several ooRexx programs running as daemons and from crontab using ooRexx from the installed ooRexx package. Do I need to worry about conflicts with rxapi when testing an application using the compiled RELEASE or DEBUG code?

     
  • Bob Jewett

    Bob Jewett - 2021-01-01

    Ok, I decided to remove the current ooRexx package and install ooRexx from a DEBUG deb package I compiled from a local oorexxsvn directory containing r12143.

    I have an oorexx program running as a daemon named evolution-sync. I just realized that this program is crashing quite frequently. It has been getting crashes and other errors since I installed r12142. I had an older 5.0.0 revision (maybe r11969) installed before this and it was not having a problem.

    So I have attached a file containing a stack trace with variables that seem to show the issue. Thread 1 stack #1 ActivityManager.cpp:323 seems to show a problem variable: activity = 0x0.

     
  • Erich

    Erich - 2021-02-09

    Bob, can you pinpoint the exact offending revision in the range r11969 - r12143? You'd have to build an intermediate revision in the range, test whether it fails, and repeat within the appropriate sub-range of revisions (a lot of work, I know).
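
The repeated halving described above is a binary search over revision numbers. A minimal sketch follows, assuming the lower revision is known to work and the upper one is known to fail. `first_bad_revision` and the `is_bad` callback are hypothetical names; the callback stands in for "check out, build, and test one revision" and must exit 0 when that revision shows the crash.

```shell
#!/bin/sh
# Sketch: binary search for the first failing revision.
# Assumes $1 (good) is known to pass and $2 (bad) is known to fail.
# $3 names a hypothetical command that builds and tests one revision and
# exits 0 when that revision fails.
first_bad_revision() (
    good=$1; bad=$2; is_bad=$3
    while [ $(( bad - good )) -gt 1 ]; do
        mid=$(( (good + bad) / 2 ))
        if "$is_bad" "$mid"; then
            bad=$mid      # failure reproduced: first bad revision is mid or earlier
        else
            good=$mid     # no failure seen: first bad revision is after mid
        fi
    done
    echo "$bad"
)
```

For the 174 revisions between r11969 and r12143 this needs about eight builds. With an intermittent crash, a revision should only be treated as good after enough runs to be reasonably sure the failure would have shown up.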

    Or, alternatively, do you have a piece of code that eventually fails, so that we can run tests against it?

     
  • Bob Jewett

    Bob Jewett - 2021-02-09

    Erich,
    The revision I am using is r12143. I'm not sure that I understand what you are looking for.

    I captured r12143 using svn to a local directory and compiled both debug and non debug ooRexx from it. I have not updated it since. The segfault is very intermittent. I do not have any sample code that shows the problem. I am currently running the ooRexx debug version on my system.

    One program (dispatched using Linux cron every 5 minutes) starts concurrent threads to test devices on my LAN (one device per thread, about 30). Each concurrent thread runs several bash shell commands (not using the ADDRESS instruction) to gather info about the device to see if it is working properly. This program seems to get the most segfaults (maybe 10-20 per day). I have attached the latest segfault for this program. I have no experience with "gdb" other than how to start it. :-)

    I am not an experienced C++ programmer. I know enough to be dangerous. :-) I did use C when I programmed firmware for my hardware before I retired from IBM, and I hated it! I currently do most of my programming (personal use only) using ooRexx and Python.

     
  • Anonymous

    Anonymous - 2021-02-14

    My Debian 10 (Buster) system uses coredumpctl to manage the coredumps. I realize now that the output produced using coredumpctl -1 debug (it runs gdb) does not contain detailed information. I'm so sorry about that.

    I'm still working on getting more info about this segfault.

    I have found one of my rexx programs (a very short program) that seems to fail about 1% of the time. I made it into a program called TestThread. I am trying to make it simpler and still fail. I have written a bash script to start this program (looping forever) and when it crashes run gdb in batch mode on the coredump file.
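
The looping part of such a script can be sketched as below. This is not the actual script: `run_until_crash` is a made-up name, and the run limit is added only so the sketch always terminates. It relies on the POSIX shell convention that an exit status above 128 means the command was killed by signal (status - 128), so a segfault (SIGSEGV, signal 11) shows up as status 139.

```shell
#!/bin/sh
# Sketch: rerun a command until it is killed by a signal, then report it.
# In POSIX shells an exit status above 128 means the command was
# terminated by signal number (status - 128).
run_until_crash() (
    max_runs=$1; shift
    run=0
    while [ "$run" -lt "$max_runs" ]; do
        run=$(( run + 1 ))
        "$@"
        status=$?
        if [ "$status" -gt 128 ]; then
            echo "run $run: killed by signal $(( status - 128 ))"
            exit 0      # crash reproduced; post-mortem steps would go here
        fi
    done
    echo "no crash in $max_runs runs"
    exit 1
)

# Example call (program name as used in this post):
#   run_until_crash 10000 rexx TestThread
```

The point where the function reports the signal is where the gdb batch run on the coredump file would be started.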

    I am using the following gdb command in the script.

    gdb -ex "set logging file /tmp/TestThread_msg" -ex "set logging on" -ex "info threads" -ex "thread apply all backtrace" -ex "thread apply all backtrace full" -ex quit --batch /usr/bin/rexx  /tmp/TestThread_coredump
    

    Do you think the gdb command above is sufficient for obtaining data that may be helpful?

    I have included the latest output from the above gdb. Please let me know if this is what you expect. This segfault seems to be different than the one I first reported.

    Bob

     
  • Bob Jewett

    Bob Jewett - 2021-02-15

    Sorry about the Anonymous message yesterday.

    I have been able to create an ooRexx "test_thread" program that shows the segfault on my Debian 10 (Buster) 64-bit system. Because it fails so infrequently, I have also created a bash script "test_loop" to run it repeatedly.

    Attached are the 2 programs and a "test_thread_msg" file containing the results of the "test_loop" bash script.

    Please let me know if this works.

     
