While trying to set up a test case that replicates the problems I have been reporting, I arrived at the enclosed self-contained test case, which reliably evokes a runtime problem that causes performance to drop significantly.
Here is the brief description from the 'readme.txt' file that explains the setup and purpose of the different files:
Abysmal performance creating objects when running multithreaded Rexx programs on multiple Rexx instances, once Rexx instances get terminated!
This test application demonstrates that after terminating the two additional Rexx interpreter instances (RII), the creation of rgfTest objects on a separate thread (and also their uninits) all of a sudden drops to abysmal performance!
On one occasion a crash happened, which might help shed some light on the problem, so I enclosed the crash's stack trace together with the local window data for the crash position.
Anonymous
I have replicated this bug on macOS. To try it out, download and unpack the attached zip for Mac and read readme_mac.txt.
I do not see the "abysmal performance" mentioned; the test cases run at good speed, but I DO get the crash.
Running pgm_02 I do not get a crash but several errors. PLEASE look at my question below.
First Error
Syscall param socketcall.sendto(msg) points to uninitialised byte(s)
at sendto (in /usr/lib/system/libsystem_kernel.dylib)
(Which is in the kernel of macOS as it seems)
by this chain of calls
SysSocketConnection::write(void, unsigned long, unsigned long)
SysSocketConnection::write(void, unsigned long, void, unsigned long, unsigned long)
ServiceMessage::writeMessage(SysClientStream&)
ClientMessage::send(SysClientStream)
ClientMessage::send()
LocalAPIManager::establishServerConnection()
LocalAPIManager::initProcess()
LocalAPIContext::getAPIManager()
RexxCreateSessionQueue
all in librexxapi.5.0.0.dylib
QUESTION:
Looking into the source code of SysSocketConnection in SysCSStream.cpp, the definition is:
bool SysSocketConnection::write(void *buf, size_t bufsize, size_t *byteswritten)
i.e. it takes 3 arguments, yet I see one call with 5 arguments, which looks weird. Can this be the cause of the first error? The same applies for "bug2a".
-- Begin Valgrind --
Syscall param socketcall.sendto(msg) points to uninitialised byte(s)
==2020== at 0x100727FA6: sendto (in /usr/lib/system/libsystem_kernel.dylib)
==2020== by 0x100349CD7: SysSocketConnection::write(void, unsigned long, unsigned long) (in /Library/Frameworks/ooRexx.framework/Versions/A/Libraries/librexxapi.5.0.0.dylib)
==2020== by 0x100349D77: SysSocketConnection::write(void, unsigned long, void, unsigned long, unsigned long) (in /Library/Frameworks/ooRexx.framework/Versions/A/Libraries/librexxapi.5.0.0.dylib)
==2020== by 0x100349636: ServiceMessage::writeMessage(SysClientStream&) (in /Library/Frameworks/ooRexx.framework/Versions/A/Libraries/librexxapi.5.0.0.dylib)
==2020== by 0x10033CB76: ClientMessage::send(SysClientStream) (in /Library/Frameworks/ooRexx.framework/Versions/A/Libraries/librexxapi.5.0.0.dylib)
==2020== by 0x10033CA77: ClientMessage::send() (in /Library/Frameworks/ooRexx.framework/Versions/A/Libraries/librexxapi.5.0.0.dylib)
==2020== by 0x10033D033: LocalAPIManager::establishServerConnection() (in /Library/Frameworks/ooRexx.framework/Versions/A/Libraries/librexxapi.5.0.0.dylib)
==2020== by 0x10033CF06: LocalAPIManager::initProcess() (in /Library/Frameworks/ooRexx.framework/Versions/A/Libraries/librexxapi.5.0.0.dylib)
==2020== by 0x10033CDC3: LocalAPIManager::getInstance() (in /Library/Frameworks/ooRexx.framework/Versions/A/Libraries/librexxapi.5.0.0.dylib)
==2020== by 0x10033CC08: LocalAPIContext::getAPIManager() (in /Library/Frameworks/ooRexx.framework/Versions/A/Libraries/librexxapi.5.0.0.dylib)
==2020== by 0x1003461F2: RexxCreateSessionQueue (in /Library/Frameworks/ooRexx.framework/Versions/A/Libraries/librexxapi.5.0.0.dylib)
==2020== by 0x1001C16B2:
-- End Valgrind --
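For readers unfamiliar with this Valgrind message: it is reported when a buffer handed to a send-type system call contains bytes that were never written. A frequent and often harmless cause is a message struct whose alignment padding or unused tail is left indeterminate before the raw bytes go to a socket. Whether that is what happens in ServiceMessage::writeMessage is not established in this thread. The sketch below only illustrates the general pattern; DemoMessage, build_zeroed and padding_is_zero are hypothetical names, not ooRexx code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical wire message; NOT the real ooRexx ServiceMessage layout.
struct DemoMessage {
    std::uint8_t  opcode;    // 1 byte, usually followed by alignment padding
    std::uint64_t payload;   // 8 bytes
    char          name[16];  // fixed-size text field, often only partly used
};

using WireBytes = std::array<unsigned char, sizeof(DemoMessage)>;

// Zero the whole object first, then set the fields. In practice the padding
// and the unused tail of name then hold defined bytes, so a memory checker
// has nothing to report when the raw bytes are passed to a send-type call.
inline WireBytes build_zeroed(std::uint8_t opcode, std::uint64_t payload,
                              const char* name) {
    DemoMessage msg;
    std::memset(&msg, 0, sizeof msg);
    msg.opcode = opcode;
    msg.payload = payload;
    std::strncpy(msg.name, name, sizeof msg.name - 1);  // final NUL stays in place

    WireBytes bytes;
    std::memcpy(bytes.data(), &msg, sizeof msg);
    return bytes;
}

// True when every byte between opcode and payload (the padding) is zero.
inline bool padding_is_zero(const WireBytes& bytes) {
    for (std::size_t i = sizeof(std::uint8_t);
         i < offsetof(DemoMessage, payload); ++i) {
        if (bytes[i] != 0) return false;
    }
    return true;
}
```

If the memset is dropped, the same struct still works functionally, but the padding bytes are indeterminate and a tool like Valgrind can flag the write.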
First warning (in 1st thread running libtest_gc.dylib)
warning: no debug symbols in executable (-arch x86_64)
Maybe more info can be harvested by adding debug symbols to libtest_gc
-- Begin Valgrind --
Interpreter::startInterpreter(Interpreter::InterpreterStartupMode) (in /Library/Frameworks/ooRexx.framework/Versions/A/Libraries/librexx.5.0.0.dylib)
==2020== Address 0x7fff5fbfd3f2 is on thread 1's stack
==2020== in frame #6, created by LocalAPIManager::establishServerConnection() (???:)
==2020==
--2020-- run: /usr/bin/dsymutil "/Users/po/bug3aMac/libtest_gc.dylib"
warning: no debug symbols in executable (-arch x86_64)
-- End Valgrind --
2nd Error is a memory problem
Bad permissions for mapped region at ??? in /usr/lib/libc++abi.dylib (system library)
by a long chain of commands
NativeActivation::run(ActivityDispatcher&)
Activity::run(ActivityDispatcher&)
CallRoutine
all in librexx.5.0.0.dylib
RexxThreadContext_::CallRoutine(RexxRoutineObject, _RexxArrayObject)
RII_CallRoutine_impl(RexxCallContext, void, _RexxObjectPtr, _RexxArrayObject)
RII_CallRoutine
in libtest_gc.dylib
NativeActivation::callNativeRoutine(RoutineClass, NativeRoutine, RexxString, RexxObject, unsigned long, ProtectedObject&)
NativeRoutine::call(Activity, RoutineClass, RexxString, RexxObject, unsigned long, ProtectedObject&)
RoutineClass::call(Activity, RexxString, RexxObject, unsigned long, ProtectedObject&)
PackageManager::callNativeRoutine(Activity, RexxString, RexxObject, unsigned long, ProtectedObject&)
SystemInterpreter::invokeExternalFunction(RexxActivation, Activity, RexxString*, RexxObject, unsigned long, RexxString*, ProtectedObject&)
in librexx.5.0.0.dylib
-- Begin Valgrind --
==2020== Process terminating with default action of signal 11 (SIGSEGV)
==2020== Bad permissions for mapped region at address 0x100E02C28
==2020== at 0x100E02C28: ??? (in /usr/lib/libc++abi.dylib)
==2020== by 0x100136792: NativeActivation::run(ActivityDispatcher&) (in /Library/Frameworks/ooRexx.framework/Versions/A/Libraries/librexx.5.0.0.dylib)
==2020== by 0x100167CBD: Activity::run(ActivityDispatcher&) (in /Library/Frameworks/ooRexx.framework/Versions/A/Libraries/librexx.5.0.0.dylib)
==2020== by 0x10010AA7C: CallRoutine (in /Library/Frameworks/ooRexx.framework/Versions/A/Libraries/librexx.5.0.0.dylib)
==2020== by 0x103EE8785: RexxThreadContext_::CallRoutine(RexxRoutineObject, _RexxArrayObject) (in /Users/po/bug3aMac/libtest_gc.dylib)
==2020== by 0x103EE864D: RII_CallRoutine_impl(RexxCallContext, void, _RexxObjectPtr, _RexxArrayObject) (in /Users/po/bug3aMac/libtest_gc.dylib)
==2020== by 0x103EE85AB: RII_CallRoutine (in /Users/po/bug3aMac/libtest_gc.dylib)
==2020== by 0x100135C86: NativeActivation::callNativeRoutine(RoutineClass, NativeRoutine, RexxString, RexxObject, unsigned long, ProtectedObject&) (in /Library/Frameworks/ooRexx.framework/Versions/A/Libraries/librexx.5.0.0.dylib)
==2020== by 0x100139E75: NativeRoutine::call(Activity, RoutineClass, RexxString, RexxObject, unsigned long, ProtectedObject&) (in /Library/Frameworks/ooRexx.framework/Versions/A/Libraries/librexx.5.0.0.dylib)
==2020== by 0x1000E3CCD: RoutineClass::call(Activity, RexxString, RexxObject, unsigned long, ProtectedObject&) (in /Library/Frameworks/ooRexx.framework/Versions/A/Libraries/librexx.5.0.0.dylib)
==2020== by 0x100162412: PackageManager::callNativeRoutine(Activity, RexxString, RexxObject, unsigned long, ProtectedObject&) (in /Library/Frameworks/ooRexx.framework/Versions/A/Libraries/librexx.5.0.0.dylib)
==2020== by 0x1001B8150: SystemInterpreter::invokeExternalFunction(RexxActivation, Activity, RexxString*, RexxObject, unsigned long, RexxString*, ProtectedObject&) (in /Library/Frameworks/ooRexx.framework/Versions/A/Libraries/librexx.5.0.0.dylib)
The crash can be seen running valgrind rexx pgm_01.rex
All errors as in pgm_02.rex
In addition a crash occurs the first time a 2nd thread is created in the same manner as for bug2a
Thread 2:
Invalid read of size 4
at 0x10087D899: _pthread_body (in /usr/lib/system/libsystem_pthread.dylib)
by 0x10087D886: _pthread_start (in /usr/lib/system/libsystem_pthread.dylib)
by 0x10087D08C: thread_start (in /usr/lib/system/libsystem_pthread.dylib)
Address 0x18 is not stack'd, malloc'd or (recently) free'd
This bug is related to
1449 2a Crash while executing createStackFrame()
1450 3a Performance drops suddenly (once even a crash)
1459 6a Test case causing crashes, giving different stack traces
1461 8a Crash in "MemoryObject::markObjectsMain" et.al.
1462 8b Memory leak when creating many Rexx interpreter instances
1463 8c Using multiple Rexx interpreter instances causing crashes
1464 8d Further crashes with a slightly changed abc.cls
I have added a trace with more advanced trace options, but the conclusion is the same: the programs crash at the instance (no pun intended) the 2nd thread is created.
Last edit: Per Olov Jonsson 2017-10-14
Here is one more trace file with these options:
valgrind --track-origins=yes -v --trace-children=yes --leak-check=full --leak-resolution=high rexx <pgm>
P.O. thanks for your help, though I must say, I believe you'd need a DEBUG (or RELWITHDEBINFO (1)) ooRexx build and run it in the debugger to be able to figure out the root cause of this issue
(1) often issues which are easily reproduced with a RELEASE build, don't surface in a DEBUG build. In such a case the only alternative is a RELWITHDEBINFO build (which is more difficult to debug)
Hello Erich and thank you for your reply!
Indeed I have considered (and am still considering) creating a build of my own. I downloaded the complete trunk (I think that is the correct term?) oorexx-code-0 yesterday to start looking at the routines, methods etc. that are thrown up by Valgrind. I will look at the WIKI to see if I can pull it off and MAKE my own build; otherwise I will seek your guidance again.
The compiler for Mac, Clang/LLVM, seems to have some really great features for debugging. Xcode, the programming environment delivered with macOS, uses it, but I might use it from the command line as I am used to doing things the hard way :-).
I will grind through all of the related bug reports from Rony and generate MAKE files for Mac for all of them. Don't spend any more time on them unless I mention something I did not see before. All these bug reports from Rony seem to boil down to the same thing: at the very instant a 2nd interpreter instance is created, the program crashes.
Hälsningar/Regards/Grüsse,
P.O. Jonsson
oorexx@jonases.se
Sent from my MacBookPro
Hi P.O.,
when trying to build, I'd really appreciate it if you could start with our existing CMake build setup, make any modifications necessary for Darwin regarding e. g.
As of now, we don't have this. Rony's builds are custom-made "make" builds, and the Darwin build running on our Jenkins build machine (netrexx.org/jenkins/) doesn't make its build rpm's publicly available
If you'd be willing to set up your Mac to run as a Jenkins slave to our build machine, I'd offer any help I can provide.
Related
Bugs:
#1476
Hello Erich,
I am quite close to having my own build. I am documenting what I do as I go along; hopefully it can be added to the WIKI later. I guess some work is still necessary to prepare a standalone installer (I have no knowledge in that corner), but one step at a time. I am in Spain on holiday and do this using my telephone as a bridge to the internet, so I only occasionally have a little time to do something. Next week I am in Berlin and can set up one of my Macs as a slave. I trust I can come back to you for directions?
Most of the tools necessary (like SVN) are already present on the Mac, so the only problem now is the cmake settings (and nmake).
I have come this far and am experimenting with the settings:
cmake -G "Unix Makefiles" /volumes/"Macintosh HD"/Users/po/oorexxsvn/main/trunk
CMake Deprecation Warning at CMakeLists.txt:43 (cmake_policy):
The OLD behavior for policy CMP0010 will be removed from a future version
of CMake.
The cmake-policies(7) manual explains that the OLD behaviors of all
policies are deprecated and that a policy should be set to OLD only under
specific short-term circumstances. Projects should be ported to the NEW
behavior and not rely on setting a policy to OLD.
-- The C compiler identification is AppleClang 9.0.0.9000037
-- The CXX compiler identification is AppleClang 9.0.0.9000037
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc
<snip>
CMake Warning (dev):
Policy CMP0042 is not set: MACOSX_RPATH is enabled by default. Run "cmake
--help-policy CMP0042" for policy details. Use the cmake_policy command to
set the policy and suppress this warning.
MACOSX_RPATH is not specified for the following targets:
orxclassic
orxclassic1
orxexits
orxfunction
orxinvocation
orxmethod
wpipe1
wpipe2
wpipe3
This warning is for project developers. Use -Wno-dev to suppress it.
-- Generating done
-- Build files have been written to: /Users/po/oorexxbuild
But I have not had time to test the build; also, I cannot judge the seriousness of these warnings/errors. I will continue to experiment; all help is welcome.
These are the options I have
Generators
Unix Makefiles = Generates standard UNIX makefiles.
Ninja = Generates build.ninja files.
Xcode = Generate Xcode project files.
CodeBlocks - Ninja = Generates CodeBlocks project files.
CodeBlocks - Unix Makefiles = Generates CodeBlocks project files.
CodeLite - Ninja = Generates CodeLite project files.
CodeLite - Unix Makefiles = Generates CodeLite project files.
Sublime Text 2 - Ninja = Generates Sublime Text 2 project files.
Sublime Text 2 - Unix Makefiles = Generates Sublime Text 2 project files.
Kate - Ninja = Generates Kate project files.
Kate - Unix Makefiles = Generates Kate project files.
Eclipse CDT4 - Ninja = Generates Eclipse CDT 4.0 project files.
Eclipse CDT4 - Unix Makefiles = Generates Eclipse CDT 4.0 project files.
KDevelop3 = Generates KDevelop 3 project files.
KDevelop3 - Unix Makefiles = Generates KDevelop 3 project files.
I have no idea what the other options are, but I guess Xcode might be good for Mac since that is the standard programming environment/debugger.
If you want to look at the complete info from the cmake run let me know, I have saved it.
Hälsningar/Regards/Grüsse,
P.O. Jonsson
oorexx@jonases.se
Dear Erich,
I have had some time again and got one step further.
My first goal is to make a running, standalone version of ooRexx from scratch, after that I will try to make an installer.
Problem 1: There are a number of warnings from Cmake that I do not really know how to fix.
Problem 2: running make I get several warnings and in the end 2 errors, probably as a result of 1 above?
Remember I am doing this for the absolute first time so things that are obvious to you may not be obvious to me. Given that I cannot change anything in the source, where should I do the modifications to make this work? I will read your hints below in detail (again)
I have CCed Rony since I hope he might have some intel to share (that would be much appreciated)
Here is the shortened version of my trial, I have enclosed the entire journey as a text file:
CMake Warning (dev):
Policy CMP0042 is not set: MACOSX_RPATH is enabled by default. Run "cmake
--help-policy CMP0042" for policy details. Use the cmake_policy command to
set the policy and suppress this warning.
MACOSX_RPATH is not specified for the following targets:
orxclassic
orxclassic1
orxexits
orxfunction
orxinvocation
orxmethod
wpipe1
wpipe2
wpipe3
This warning is for project developers. Use -Wno-dev to suppress it.
-- Generating done
-- Build files have been written to: /Users/po/oorexxbuild/release
POs-MacBook-Pro:release po$ make
[ 9%] Building CXX object CMakeFiles/rexxapi.dir/common/platform/unix/SysThread.cpp.o
/Users/po/oorexxsvn/main/trunk/common/platform/unix/SysThread.cpp:90:32: warning:
unknown warning group '-Wreturn-local-addr', ignored
[-Wunknown-warning-option]
#pragma GCC diagnostic ignored "-Wreturn-local-addr"
1 warning generated.
[ 66%] Building CXX object CMakeFiles/rexx.dir/interpreter/platform/unix/SysActivity.cpp.o
/Users/po/oorexxsvn/main/trunk/interpreter/platform/unix/SysActivity.cpp:172:32: warning:
unknown warning group '-Wreturn-local-addr', ignored
[-Wunknown-warning-option]
#pragma GCC diagnostic ignored "-Wreturn-local-addr"
1 warning generated.
[ 67%] Building CXX object CMakeFiles/rexx.dir/interpreter/platform/unix/SysFileSystem.cpp.o
/Users/po/oorexxsvn/main/trunk/interpreter/platform/unix/SysFileSystem.cpp:174:12: warning:
'tmpnam' is deprecated: This function is provided for compatibility
reasons only. Due to security concerns inherent in the design of
tmpnam(3), it is highly recommended that you use mkstemp(3) instead.
[-Wdeprecated-declarations]
return tmpnam(NULL);
^
/usr/include/stdio.h:275:1: note: 'tmpnam' has been explicitly marked deprecated
here
__deprecated_msg("This function is provided for compatibility reasons on...
^
/usr/include/sys/cdefs.h:180:48: note: expanded from macro '__deprecated_msg'
#define __deprecated_msg(_msg) __attribute__((deprecated(_msg)))
^
1 warning generated.
[ 70%] Building CXX object CMakeFiles/rexx.dir/common/platform/unix/SysThread.cpp.o
/Users/po/oorexxsvn/main/trunk/common/platform/unix/SysThread.cpp:90:32: warning:
unknown warning group '-Wreturn-local-addr', ignored
[-Wunknown-warning-option]
#pragma GCC diagnostic ignored "-Wreturn-local-addr"
1 warning generated.
[ 74%] Linking CXX shared library bin/librxunixsys.dylib
ld: library not found for -lcrypt
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [bin/librxunixsys.5.0.0.dylib] Error 1
make[1]: *** [CMakeFiles/rxunixsys.dir/all] Error 2
make: *** [all] Error 2
POs-MacBook-Pro:release po$
This is where I am today.
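A side note on the repeated "unknown warning group '-Wreturn-local-addr'" warnings in the log above: that diagnostic name is GCC-specific, and Clang reports it as unknown. The warnings are harmless and unrelated to the -lcrypt link error. One common way to silence them is to restrict the pragma to real GCC, since Clang also defines __GNUC__. I have not checked how SysThread.cpp or SysActivity.cpp are written around those lines, so the sketch below is only the general pattern; approximate_stack_address and DEMO_GCC_ONLY_PRAGMA are made-up names.

```cpp
#include <cassert>
#include <cstdint>

// Clang defines __GNUC__ too, so __clang__ has to be excluded explicitly.
#if defined(__GNUC__) && !defined(__clang__)
#  pragma GCC diagnostic push
#  pragma GCC diagnostic ignored "-Wreturn-local-addr"
#  define DEMO_GCC_ONLY_PRAGMA 1
#else
#  define DEMO_GCC_ONLY_PRAGMA 0
#endif

// Hypothetical stand-in for code that deliberately exposes the address of a
// local variable, e.g. to estimate where the current stack frame lives.
inline std::uintptr_t approximate_stack_address() {
    volatile char marker = 0;
    return reinterpret_cast<std::uintptr_t>(&marker);
}

#if DEMO_GCC_ONLY_PRAGMA
#  pragma GCC diagnostic pop
#endif
```

With this guard Clang never sees the GCC-only diagnostic name, so it has nothing to complain about.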
Hälsningar/Regards/Grüsse,
P.O. Jonsson
oorexx@jonases.se
Related
Bugs:
#1450
Bugs:
#1476
Please find attached a trace file made with a debug build of ooRexx. I still get "Segmentation fault: 11" when running this test case on macOS.
-Checked out revision 11317, complete
-cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=DEBUG ../../oorexxsvn/main/trunk
-make
The attached example also crashes on Linux with current trunk. From what I see I suspect a memory corruption issue. With debugging enabled, one of the threads hangs in DeadObject::getObjectSize; without debugging I'm getting random segfaults. I'm almost certain that passing memory locations between different instances causes the problem because each instance has its own allocator. The easy solution is to avoid doing that, a second solution is to make sure that pointers are not passed between different allocations, and a third solution is to be very careful when passing pointers. As a fail-early mechanism one could trigger a GC cycle on every thread attach/detach point.
I am not particularly inclined to debug the problem as it seems like a bad example to me. Rexx has a global interpreter lock, so performance won't go up when using multiple threads; only interleaving is possible. However, it would be possible to write a serializer in the library code that makes sure only one Rexx interpreter exists at any point in time, or that the library only talks to one.
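To make the serializer idea concrete, here is a minimal sketch of what such a wrapper in the native library could look like. It assumes that funnelling every interpreter call through one mutex is acceptable. InstanceSerializer and demo_serialized_count are hypothetical names; nothing like this exists in ooRexx or in the test library as far as this thread shows.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper; ooRexx does not provide this class.
class InstanceSerializer {
public:
    // Runs fn while holding the lock, so at most one caller is "inside" an
    // interpreter instance at any point in time.
    template <typename Fn>
    auto run_exclusive(Fn&& fn) -> decltype(fn()) {
        std::lock_guard<std::mutex> guard(mutex_);
        return std::forward<Fn>(fn)();
    }

private:
    std::mutex mutex_;
};

// Demo: several threads bump a plain (non-atomic) counter, but only through
// the serializer, so no update is lost.
inline int demo_serialized_count(int threads, int iterations) {
    InstanceSerializer serializer;
    int counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&serializer, &counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                serializer.run_exclusive([&counter] { ++counter; });
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

In a real library the lambda passed to run_exclusive would wrap the call into the interpreter; the counter here only stands in for shared state that must not be touched concurrently.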
Moritz, if that's a quick fix .. what code changes would be required?
Or do you mean: don't use multiple instances?
I don't know a quick fix. Not using multiple instances might solve the problem right now.
There is only one memory allocator per process. All objects are allocated from the same memory heap. The allocator is serialized by the use of the kernel lock that only allows one thread access to the interpreter code at one time. Objects can be shared just fine between instances.
If I had to make a guess, something during the creation of a new instance is allocating an object while it does not hold the kernel lock, resulting in two threads calling the object allocator at the same time. A lot of the crashes could be explained by that happening, but I've not been able to spot any windows in the code where that might be occurring.
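The failure mode guessed at here (an allocation made by a thread that does not hold the kernel lock) can be turned into a fail-early check. The sketch below is not interpreter code: DemoKernelLock and DemoHeap are hypothetical stand-ins that record which thread owns the lock and refuse any allocation made without it.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the interpreter's kernel lock; it additionally
// records which thread currently owns it.
class DemoKernelLock {
public:
    void acquire() {
        mutex_.lock();
        owner_.store(std::this_thread::get_id());
    }
    void release() {
        owner_.store(std::thread::id());  // default id means "no owner"
        mutex_.unlock();
    }
    bool held_by_caller() const {
        return owner_.load() == std::this_thread::get_id();
    }

private:
    std::mutex mutex_;
    std::atomic<std::thread::id> owner_{std::thread::id()};
};

// Hypothetical heap that refuses (and counts) unlocked allocations.
class DemoHeap {
public:
    explicit DemoHeap(DemoKernelLock& lock) : lock_(lock) {}

    bool allocate(std::size_t bytes) {
        if (!lock_.held_by_caller()) {
            violations_.fetch_add(1);  // caller forgot to take the lock
            return false;
        }
        allocated_ += bytes;  // safe: only the lock holder reaches this line
        return true;
    }
    std::size_t allocated() const { return allocated_; }
    int violations() const { return violations_.load(); }

private:
    DemoKernelLock& lock_;
    std::size_t allocated_ = 0;
    std::atomic<int> violations_{0};
};
```

A check of this kind in a debug build would report the offending call site directly instead of leaving a corrupted heap to be discovered later.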
Unfortunately, people are only posting the top few entries of execution stacks and generally only the thread where the failures occur. Full stacks of all of the threads might actually identify where this is happening.
Oops, didn't notice I wasn't logged in. This was from Rick.
Hi Rick, thanks for the clarification. Attached you'll find two different traces, one resulting in a segmentation fault, the other just hanging in DeadObject. Let me know if you need more info, I can rerun things while in the office.
We may have a winner. The hang trace pinpointed a place where objects were being allocated without holding the lock. This patch might fix the problem.
Hmmmm, the hang traces likely point to a bug in the TableIterator class. This iterator is used when traversing the uninit table and is intended to allow iteration with the ability to delete items without affecting the iteration. It looks like it is stuck in an infinite loop. If that occurs again, you might be able to figure out what sort of condition caused that failure. The rest of the threads look like they are in good places.
Oops, strike that, I was looking at the segv trace, not the hang.
Committed Rick's 1450.patch code fix with revision [r11312].
Moritz, can you test if that fixes the issue (or parts thereof)?
Related
Commit: [r11312]
It seems to have fixed the memory corruption issue. Now it's getting stuck in pthread_cond_wait, but that might be a problem of the example. Nevertheless, attached are the stack traces.
On Mon, Oct 16, 2017 at 7:51 AM, Moritz Hoffmann antiguru@users.sf.net wrote:
I suspect it is. I noticed that several threads were in a guard wait state
in the hang trace.
Rick
Related
Bugs:
#1450
Was able to look at the tracebacks. All of the threads but one are blocked trying to call a guarded method on the same object. The remaining thread is in a guard wait, obviously waiting for something to happen that will not occur because all of the threads are blocked.
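As a loose C++ analogy for this kind of hang (it does not model ooRexx's actual guard implementation): a waiter that keeps a lock while waiting can never see its condition come true if the only threads able to change that condition need the same lock. The function names below are made up, and the timeout in the first function exists only so the demo terminates; in a real deadlock the blocked thread would wait forever.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// The waiter keeps the lock while it waits, so the setter never gets in.
inline bool wait_holding_lock(std::chrono::milliseconds give_up_after) {
    std::timed_mutex guard;
    std::atomic<bool> ready{false};
    std::unique_lock<std::timed_mutex> held(guard);  // waiter owns the guard
    std::thread setter([&guard, &ready, give_up_after] {
        if (guard.try_lock_for(give_up_after)) {     // blocked the whole time
            ready.store(true);
            guard.unlock();
        }
    });
    setter.join();        // the waiter "waits" without releasing the guard
    return ready.load();  // the condition was never set
}

// The waiter releases the lock while waiting (condition-variable style), so
// the setter can get in and make the condition true.
inline bool wait_releasing_lock(std::chrono::milliseconds timeout) {
    std::mutex guard;
    std::condition_variable changed;
    bool ready = false;
    std::thread setter([&guard, &changed, &ready] {
        std::lock_guard<std::mutex> lock(guard);
        ready = true;
        changed.notify_all();
    });
    bool seen;
    {
        std::unique_lock<std::mutex> lock(guard);
        seen = changed.wait_for(lock, timeout, [&ready] { return ready; });
    }
    setter.join();
    return seen;
}
```

If the test program shows the first shape, the fix would belong in the example's Rexx code rather than in the interpreter, which matches the suspicion voiced above.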
Diff: