#217 segmentation violation in large programs

Milestone: v3.1
Status: closed
Priority: 5
Updated: 2012-08-14
Created: 2006-12-27
Private: No

We are developing a large infrastructure in ooRexx, version 3.1.1, on an x86 Linux box.

We are using Red Hat Enterprise Server 3 Update 2.

When I do rexx -v I get as output:
Open Object Rexx Interpreter Version 3.1.1 for LINUX
Build date: Nov 13 2006

The project consists of over 2700 lines of code, in approximately 50 routines defined with the ::ROUTINE directive.

At some point we seem to cross a certain limit and any program we write crashes with a segmentation violation.

If we arbitrarily remove one routine, or another, it will work - but eventually we will need all the routines to be available.

Please help asap - I have four programmers going around in circles and a deadline approaching...

Attached is a tar file with a sample that demonstrates the problem.

ttt.rexx is the main program. When we run it, we get a segmentation violation.
ttt.rexx does a ::requires of genned.rexx.
genned.rexx does a ::requires of seclev.rexx.

Note that we get the segmentation violation even before the first line of code is executed.

This gives a reasonable simulation of our code - I think the business of 2 levels of requires is critical to the bug - see below.

Please note the following that occurred when I was constructing the sample:

a/ You will notice that in genned I have commented out code.

b/ If I do NOT include the requires to seclev, and I uncomment ALL of genned, it runs correctly.

c/ If I then add the requires to seclev, but seclev is a file consisting solely of #!/usr/bin/rexx I get the following output:

11 *-* ::requires *genned.rexx*

REX0005E: Error 5 running /home/jcl2ksh/ari/rexxcrash/ttt.rexx line 11: System resources exhausted

d/ This is why I had to comment out parts of genned. Then the code ran, but when I added a body to seclev, it crashed with the segmentation violation.

Discussion

  • Ari Uniksoki

    Ari Uniksoki - 2006-12-27

    tar file with example

     
  • Mark Miesfeld

    Mark Miesfeld - 2006-12-27

    Logged In: YES
    user_id=191588
    Originator: NO

    Ari,

    Thanks for providing the additional information. I'll take a look at this, but I will have to do it tomorrow - I don't have a Linux box up and running here.

    Rick McGuire will have better insight into this than I and he might also comment.

     
  • Mark Miesfeld

    Mark Miesfeld - 2006-12-28

    Logged In: YES
    user_id=191588
    Originator: NO

    Ari,

    It was great that you provided the sample program. I'm going to put some notes on my initial investigation here. Then, hopefully, if Rick can take a look at this he won't have to duplicate my work.

    1.) The segmentation fault is easily reproduced on my system with your example program.

    a. The exact same program that crashes on Linux runs fine on Windows.

    b. The Linux box I used is SuSE 10.1, so it seems to be a generic problem on Linux.

    c. The program behaves the same way (seg faults) under ooRexx 3.1.0 also.

    2.) The program behaves on my system as you described: it will seg fault as written, but if you comment out some of the public routines, you will get the System resources exhausted message.

    Unfortunately, I do not have anywhere near the understanding of, or experience with, the internals of the ooRexx interpreter that Rick does. I will continue to work on this, but it may take me a while. When Rick has some free time, I'm sure he will give this his consideration. Ultimately, the solution is likely to come from him.

     
  • Mark Miesfeld

    Mark Miesfeld - 2006-12-28

    Logged In: YES
    user_id=191588
    Originator: NO

    The specific cause of the seg fault, when I run the program on a SuSE 10.1 box, is in RexxBehaviour::methodLookup.

    In that function, the problem is in this line:

    methodObject = (RexxMethod *)this->methodDictionary->stringGet(messageName);

    At the point of the crash, the methodDictionary object pointer is corrupt. The function tests that the pointer is not OREF_NULL, and it isn't. But it is not a valid pointer either. Some printed output showing this:

    methodLookup() this: 0xb78adce0 methodName ptr: 0xb78829fc methodName: UNINIT
    methodDictionary not null: 0xb78b5368
    methodLookup() this: 0xb788746c methodName ptr: 0xb78d5868 methodName: RUN_PROGRAM
    methodDictionary not null: 0xb788e6a4
    methodLookup() this: 0xb791ff78 methodName ptr: 0xb787119c methodName: ==
    methodDictionary not null: 0x5
    Segmentation fault
    Raven:/work/tools/work.ooRexx/bugs/bug.1623151 #

     
  • Rick McGuire

    Rick McGuire - 2006-12-28

    Logged In: YES
    user_id=1125291
    Originator: NO

    And it's already been getting some of my attention, though I'm a bit handicapped by A) a nasty case of the flu, and B) the current lack of a Linux system to try to reproduce and debug this. The initial report seemed to point to a garbage collection problem, but I couldn't figure out why it would fail on Linux but not on Windows. However, I think the System Resources Exhausted error message provides a vital clue. I'm guessing this program bumps up against the initial interpreter memory allocation, and the attempt to extend the memory by allocating additional memory segments is failing for some reason. This is handled in the platform-specific layer, which would explain why it wouldn't fail on Windows. Now we just need to narrow this down a little.

    Rick

     
  • Rick McGuire

    Rick McGuire - 2006-12-28

    Logged In: YES
    user_id=1125291
    Originator: NO

    Mark, any chance you can generate a call stack traceback of the point of the failure? Knowing the point where this method lookup is getting performed would be of enormous assistance.

    Rick

     
  • Mark Miesfeld

    Mark Miesfeld - 2006-12-28

    Logged In: YES
    user_id=191588
    Originator: NO

    Simple back trace from gdb:

    Raven:/work/tools/work.ooRexx/bugs/bug.1623151 # gdb rexx
    GNU gdb 6.4
    ...
    This GDB was configured as "i686-pc-linux-gnu"...(no debugging symbols
    found)
    Using host libthread_db library "/lib/libthread_db.so.1".

    (gdb) set args ttt.rexx
    (gdb) run
    Starting program: /usr/bin/rexx ttt.rexx
    ...
    [Thread debugging using libthread_db enabled]
    [New Thread -1212119376 (LWP 26200)]
    (no debugging symbols found)
    ...

    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread -1212119376 (LWP 26200)]
    0xb7f32f93 in RexxBehaviour::methodLookup () from
    /opt/ooRexx/lib/ooRexx/librexx.so.3
    (gdb) where

    #0  0xb7f32f93 in RexxBehaviour::methodLookup () from /opt/ooRexx/lib/ooRexx/librexx.so.3
    #1  0xb7eddd75 in RexxObject::messageSend () from /opt/ooRexx/lib/ooRexx/librexx.so.3
    #2  0xb7ee3ee9 in RexxString::hash () from /opt/ooRexx/lib/ooRexx/librexx.so.3
    #3  0xb7f36f3c in RexxHashTable::stringGet () from /opt/ooRexx/lib/ooRexx/librexx.so.3
    #4  0xb7f0fa95 in RexxSource::addVariable () from /opt/ooRexx/lib/ooRexx/librexx.so.3
    #5  0xb7f1226a in RexxSource::addText () from /opt/ooRexx/lib/ooRexx/librexx.so.3
    #6  0xb7f138b3 in RexxSource::subTerm () from /opt/ooRexx/lib/ooRexx/librexx.so.3
    #7  0xb7f142a7 in RexxSource::messageTerm () from /opt/ooRexx/lib/ooRexx/librexx.so.3
    #8  0xb7f143f9 in RexxSource::instruction () from /opt/ooRexx/lib/ooRexx/librexx.so.3
    #9  0xb7f14b18 in RexxSource::translateBlock () from /opt/ooRexx/lib/ooRexx/librexx.so.3
    #10 0xb7f16007 in RexxSource::directive () from /opt/ooRexx/lib/ooRexx/librexx.so.3
    #11 0xb7f166b8 in RexxSource::translate () from /opt/ooRexx/lib/ooRexx/librexx.so.3
    #12 0xb7f16968 in RexxSource::method () from /opt/ooRexx/lib/ooRexx/librexx.so.3
    #13 0xb7ed32f8 in RexxMethodClass::newRexxMethod () from /opt/ooRexx/lib/ooRexx/librexx.so.3
    #14 0xb7ed336b in RexxMethodClass::newRexxBuffer () from /opt/ooRexx/lib/ooRexx/librexx.so.3
    #15 0xb7f1bbac in SysRestoreProgram () from /opt/ooRexx/lib/ooRexx/librexx.so.3
    #16 0xb7f284cd in RexxActivation::loadRequired () from /opt/ooRexx/lib/ooRexx/librexx.so.3
    #17 0xb7f10c81 in RexxSource::processInstall () from /opt/ooRexx/lib/ooRexx/librexx.so.3
    #18 0xb7f29ea4 in RexxActivation::run () from /opt/ooRexx/lib/ooRexx/librexx.so.3
    #19 0xb7ed4775 in RexxMethod::call () from /opt/ooRexx/lib/ooRexx/librexx.so.3
    #20 0xb7f285c2 in RexxActivation::loadRequired () from /opt/ooRexx/lib/ooRexx/librexx.so.3
    #21 0xb7f10c81 in RexxSource::processInstall () from /opt/ooRexx/lib/ooRexx/librexx.so.3
    #22 0xb7f29ea4 in RexxActivation::run () from /opt/ooRexx/lib/ooRexx/librexx.so.3
    #23 0xb7ed4849 in RexxMethod::call () from /opt/ooRexx/lib/ooRexx/librexx.so.3
    #24 0xb7edcfc6 in RexxObject::shriekRun () from /opt/ooRexx/lib/ooRexx/librexx.so.3
    #25 0xb7f1abdf in SysRunProgram () from /opt/ooRexx/lib/ooRexx/librexx.so.3
    #26 0xb7f3ddb0 in RexxLocal::runProgram () from /opt/ooRexx/lib/ooRexx/librexx.so.3
    #27 0xb7ed446e in RexxMethod::run () from /opt/ooRexx/lib/ooRexx/librexx.so.3
    #28 0xb7edddd6 in RexxObject::messageSend () from /opt/ooRexx/lib/ooRexx/librexx.so.3
    #29 0xb7f3229f in RexxSendMessage () from /opt/ooRexx/lib/ooRexx/librexx.so.3
    #30 0xb7f1b1b3 in RexxStart () from /opt/ooRexx/lib/ooRexx/librexx.so.3
    #31 0x08048996 in ?? ()
    #32 0xb7c1f87c in __libc_start_main () from /lib/libc.so.6
    #33 0x080487c1 in ?? ()

    (gdb) quit

     
  • Ari Uniksoki

    Ari Uniksoki - 2007-01-28

    Logged In: YES
    user_id=1676439
    Originator: YES

    This bug has been fixed - thanks for the help!

     

