When Rexx interpreter instances are continuously created and run in multiple threads, a crash eventually occurs in RexxWaitForTermination(). This API is documented to have no effect in the new (4.0) C++ Rexx APIs.
A program, TestConcurrencyCrash4.java, accompanying a bug report in BSF4ooRexx (https://sourceforge.net/tracker/index.php?func=detail&aid=3604128&group_id=400586&atid=1660873) constantly creates Rexx interpreter instances in two different Java threads and executes a Rexx program from Java, using the BSF4ooRexx external function package.
Eventually a crash occurs that Java documents in a hs_err_pid???.log file, pointing at native code in rexx.dll, denoting the RexxWaitForTermination() function.
Environment:
ooRexx 4.1.2, 32-Bit Java 1.6
latest BSF4ooRexx (from: https://sourceforge.net/projects/bsf4oorexx/files/GA/BSF4ooRexx-410.20130107-GA/)
Java test program: TestConcurrencyCrash4.java from the above bug report
Once BSF4ooRexx is installed and the Java program is compiled, running "java TestConcurrencyCrash4" from the command line reproduces the crash.
Created a zip-archive to even further ease debugging of this bug.
Informed the ooRexx developers via the ooRexx developer list, cf. http://sourceforge.net/mailarchive/forum.php?thread_name=51A7699B.4080603%40wu.ac.at&forum_name=oorexx-devel.
The zip-archive can be found at: http://wi.wu.ac.at/rgf/rexx/misc/bugs/bugAndCrash20130530.zip.
This crash has nothing to do with RexxWaitForTermination.
I spent quite some time looking at this and I see that at the point of the crash, an object has had some fields overlaid with bad values.
For me, 90% of the time this happens in PackageClass::newRexx(), with the object being the RexxString resolvedName:
RexxString *resolvedName = instance->resolveProgramName(nameString, OREF_NULL, OREF_NULL);
In newRexx() the sequence goes loadRequires -> addRequires -> RexxDirectory::put() with the index being the resolvedName object.
From there it goes stringArgument() -> RexxObject::requiredString() -> RexxInternalObject::isBaseClass() -> RexxBehaviour::isPrimitive()
It crashes there with the this pointer in isPrimitive() being null. If you go back up the stack a bit, the behaviour field of the RexxInternalObject of the resolvedName object is NULL.
When I look, the _vfptr always looks good, and usually some or all of the header fields are incorrect, although in this example they are correct:
objectSize 0x00000000000000a0 unsigned int64
flags 0x0022 unsigned short
sizePadding 0x0000000000000022 unsigned int64
For instance, a bad header might be:
objectSize 0x00000000000000a0 unsigned int64
flags 0x0021 unsigned short
sizePadding 0x000007fffd390021 unsigned int64
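To make the difference between the two header dumps easier to see, here is a minimal sketch that splits a sizePadding value into its low 16 bits and the rest. It assumes, based only on the values above, that flags and sizePadding share storage so that the low 16 bits of sizePadding mirror flags. The names PaddingView, splitPadding and paddingLooksClean are hypothetical and not part of ooRexx, and whether non-zero upper bits actually indicate an overlay is itself an assumption.

```cpp
#include <cstdint>

// Hypothetical helper, not part of ooRexx: a view of a 64-bit sizePadding
// value as the debugger shows it, split into two parts.
struct PaddingView
{
    std::uint16_t lowBits;    // low 16 bits, assumed to mirror the flags field
    std::uint64_t upperBits;  // everything above the low 16 bits
};

// Split the value arithmetically, so the result does not depend on byte order.
PaddingView splitPadding(std::uint64_t sizePadding)
{
    PaddingView view;
    view.lowBits = static_cast<std::uint16_t>(sizePadding & 0xFFFFu);
    view.upperBits = sizePadding >> 16;
    return view;
}

// True when the low 16 bits equal flags and no other bits are set.
// This is an assumed consistency check, not a documented ooRexx invariant.
bool paddingLooksClean(std::uint16_t flags, std::uint64_t sizePadding)
{
    PaddingView view = splitPadding(sizePadding);
    return view.lowBits == flags && view.upperBits == 0;
}
```

Applied to the dumps above, the first header (flags 0x0022, sizePadding 0x22) passes this check, while the second (flags 0x0021, sizePadding 0x000007fffd390021) fails it because of the extra upper bits.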
Sometimes the crash is not in this path; it is then usually in markObjectsMain(). It crashes at this line:
Looking at markObject in the debugger, I see many of the fields of the object with bad values.
I haven't been able to make any progress in determining when or how the resolvedName object gets overlaid.
If this is consistently the same object, then it is likely this object
needs to be protected from garbage collection after it is created.
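As a rough illustration of what "protected from garbage collection" means here, the following toy model shows an object surviving a collection only while it is registered as a root. This is a sketch under stated assumptions: ToyHeap, ToyObject and ScopedProtect are made-up names, the collector is a trivial mark-and-sweep, and none of it reflects how the ooRexx memory manager is actually implemented.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Toy object with a mark bit; purely illustrative.
struct ToyObject
{
    std::string value;
    bool marked = false;
};

// Toy heap: objects stay alive across collect() only if they are roots.
class ToyHeap
{
public:
    ToyObject *allocate(const std::string &value)
    {
        objects.push_back(std::make_unique<ToyObject>());
        objects.back()->value = value;
        return objects.back().get();
    }

    void protect(ToyObject *obj)   { roots.insert(obj); }
    void unprotect(ToyObject *obj) { roots.erase(obj); }

    // Mark every root, then free everything that was not marked.
    void collect()
    {
        for (auto &obj : objects) obj->marked = false;
        for (ToyObject *root : roots) root->marked = true;

        std::vector<std::unique_ptr<ToyObject>> survivors;
        for (auto &obj : objects)
        {
            if (obj->marked) survivors.push_back(std::move(obj));
        }
        objects.swap(survivors);  // unmarked objects are destroyed here
    }

    std::size_t liveCount() const { return objects.size(); }

private:
    std::vector<std::unique_ptr<ToyObject>> objects;
    std::unordered_set<ToyObject *> roots;
};

// RAII guard: the object is a root for exactly the lifetime of the guard.
class ScopedProtect
{
public:
    ScopedProtect(ToyHeap &h, ToyObject *o) : heap(h), obj(o) { heap.protect(obj); }
    ~ScopedProtect() { heap.unprotect(obj); }
    ScopedProtect(const ScopedProtect &) = delete;
    ScopedProtect &operator=(const ScopedProtect &) = delete;

private:
    ToyHeap &heap;
    ToyObject *obj;
};
```

In this model, a freshly allocated object that is not yet referenced from anywhere else is freed by the next collect() unless a ScopedProtect guard is holding it, which is the kind of window the comment above is pointing at.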
On Sun, Jun 9, 2013 at 2:38 PM, Mark Miesfeld miesfeld@users.sf.net wrote:
Related Bugs: #1161
Committed revision 9301. [r9301] trunk
I believe this commit fixes the specific crash initially reported here and the crash I could reproduce. Prior to the code change I could reproduce the crash every time I ran the test program, within a reasonably short period of time.
After the code change, I ran the test around 30 times and never saw that crash.
However, if the program is not stopped after some amount of time, it eventually consumes all memory on my system, at which point the program needs to be halted. Out of the 30 tests, twice the test program did crash after the operating system warned that the system was unstable and could not continue. Twice the Visual Studio debugger itself crashed, and once Chrome crashed.
I think it is unreasonable to expect any program to be stable after the point the operating system warns that memory is so low the system cannot continue. Therefore I don't give any significance to the 2 crashes I did see.
Related
Commit: [r9301]
Thank you very much, indeed!
Regarding the constant memory consumption: over the weekend I tested a comparable program written in pure C++ that constantly creates Rexx interpreter instances, runs the same Rexx program in each, and calls Terminate() afterwards. This program runs indefinitely and remains stable (I even had to limit the loop to 750,000 Rexx interpreter creations :) )!
So over the weekend I have been looking into the BSF4ooRexx framework to find out why memory consumption keeps rising there when running the same Rexx program with the same logic. I think I found the culprit in BSF4ooRexx.cc (it seems that under certain conditions it does not really terminate a Rexx interpreter instance in the native code), which I need to debug further.
So to make a long story short: I think the problem with running out of system resources is not linked to ooRexx, but to BSF4ooRexx. Will update the tracker items once I know for sure.
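To illustrate why a skipped termination would show up as steadily growing memory, here is a toy model of the create/run/terminate loop. It is only a sketch: FakeInterpreter and runLoop are hypothetical stand-ins, the counter merely represents native resources that are released only by an explicit terminate() call, and the "skip every Nth termination" condition is invented for the example rather than taken from BSF4ooRexx.cc.

```cpp
#include <cstddef>

// Stand-in for a native interpreter instance (hypothetical, not the ooRexx API).
// Its resources are modelled by a global counter that only terminate() decrements.
class FakeInterpreter
{
public:
    FakeInterpreter() { ++liveInstances; }

    void terminate()
    {
        if (!terminated)
        {
            terminated = true;
            --liveInstances;
        }
    }

    static std::size_t live() { return liveInstances; }

private:
    bool terminated = false;
    static std::size_t liveInstances;
};

std::size_t FakeInterpreter::liveInstances = 0;

// Create `iterations` instances one after another. When skipEvery is non-zero,
// every skipEvery-th instance is never terminated (the invented failure mode).
// Returns how many instances were left unterminated by this call.
std::size_t runLoop(std::size_t iterations, std::size_t skipEvery)
{
    const std::size_t before = FakeInterpreter::live();
    for (std::size_t i = 1; i <= iterations; ++i)
    {
        FakeInterpreter interpreter;
        // ... a Rexx program would be run here ...
        const bool skip = (skipEvery != 0) && (i % skipEvery == 0);
        if (!skip)
        {
            interpreter.terminate();
        }
    }
    return FakeInterpreter::live() - before;
}
```

With every instance terminated the count stays flat no matter how long the loop runs, which matches the behaviour of the pure C++ test; as soon as some terminations are skipped, the leftover count grows linearly with the number of iterations.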
Committed revision 9302. [r9302] 4.1.3 branch
Committed revision 9303. [r9303] 4.1 fixes branch
This bug is fixed in the ooRexx 4.1.3 release.