From: Mike C. <MF...@uk...> - 2007-12-23 16:46:42
Thanks for the quick response!

> > OK, I have now got to the point where the finger is definitely
> > pointing at ooRexx. If I serialize calls to RexxStart by putting a
> > mutex around it, everything runs perfectly. If unserialized, the
> > segmentation fault occurs while Rexx is starting up and before any
> > lines in the Rexx file seem to be interpreted. Here's what GDB shows:
> > ...
> > The routine it fails in is almost trivial, but says: /* This method
> > should be called within a critical section */ and there are various
> > things in RexxInitialize that probably do the right thing (this works
> > fine in Windows) but they depend on conditional code (#if SHARED and
> > the like).
> > So I'm wondering if the .deb package was built without some vital
> > compile-time option?
>
> Mike,
>
> That is possible. But, Ubuntu is Linux and the compilation is done
> using the standard configure mechanism and gcc. So, in theory, it
> should be compiled the same as if it were compiled on Fedora or SuSE.
>
> That doesn't mean that there isn't a problem with a compile-time
> option of course. Just that I think the error will also show up if
> you were working on a SuSE or Fedora system.

Absolutely -- I assume that it would, on any Linux. [I should have
said: I wonder if the Linux package(s) ...]

This looks like exactly the same problem I had when porting to the
Nokia N800 earlier this year, and that was under Debian (on x86) and
Maemo (on ARM). I ran out of time then to investigate (and the tools
were inadequate), but worked around it by making the web server
single-threaded (which was fine for a single-user application).

So this happens on at least 3 Linux environments. I have now re-ported
the whole thing (with a complete review/rewrite of the threading and
critsec domains) from scratch .. and the problem is the same.
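
For anyone wanting to try the same workaround, the serialization described above (a mutex around the RexxStart call) might be sketched roughly as follows. This is only an illustration under stated assumptions: fake_rexx_start is a hypothetical stand-in that mimics a non-thread-safe start-up path, not the real RexxStart signature, and the names rexx_lock, serialized_rexx_start, worker and run_workers are invented for the example.

```c
#include <pthread.h>
#include <stddef.h>

/* HYPOTHETICAL stand-in for the real RexxStart call; it is NOT the ooRexx
 * API. It updates unprotected shared state to mimic a start-up path that
 * "should be called within a critical section". */
long interpreter_starts = 0;

int fake_rexx_start(const char *script_name)
{
    (void)script_name;
    long seen = interpreter_starts;    /* unprotected read-modify-write */
    interpreter_starts = seen + 1;
    return 0;
}

/* One process-wide lock: at most one thread is inside the interpreter. */
pthread_mutex_t rexx_lock = PTHREAD_MUTEX_INITIALIZER;

int serialized_rexx_start(const char *script_name)
{
    pthread_mutex_lock(&rexx_lock);
    int rc = fake_rexx_start(script_name);
    pthread_mutex_unlock(&rexx_lock);
    return rc;
}

struct worker_args { int calls; };

/* Each worker plays the role of one web-server thread running scripts. */
void *worker(void *p)
{
    const struct worker_args *args = p;
    for (int i = 0; i < args->calls; i++)
        serialized_rexx_start("request.rex");
    return NULL;
}

/* Starts up to 16 workers, waits for them, and returns how many
 * interpreter starts were recorded in total. */
long run_workers(int nthreads, int calls_each)
{
    pthread_t tids[16];
    struct worker_args args = { calls_each };
    int started = 0;

    if (nthreads > 16)
        nthreads = 16;
    interpreter_starts = 0;
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, worker, &args) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return interpreter_starts;
}
```

With the lock in place every call is counted, so run_workers(4, 10000) returns 40000. The cost is that only one script runs at a time, so this is a stopgap until the interpreter's own start-up code is made thread-safe, not a fix.
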
I have also now kept the application multi-threaded, and the only
difference between 'works' and 'crashes' is a mutex lock around the
RexxStart call so that calls to ooRexx are serialized.

> Rick has opened up a task, I believe, to remove as much of the
> conditional compile code as possible. (Tasks are where we are keeping
> track of things we are planning on doing.)
>
> Since it looks like the segmentation fault is in ooRexx, we definitely
> want to fix it. The probability is highest that Rick will fix it, but
> one of these days I might fix a hard one myself. <grin>

:-))

> You could open a bug in tracker. If you had some sample code that
> caused the problem and attached it to the bug that would be ideal.
> Almost every bug that has some simple program attached that
> demonstrates the bug has been fixed in a short time.

I can let you have a VMWare appliance that fails every time (it's about
2GB, zipped). Since this is a timing-related multi-thread thing, in my
experience trying to make a small failing testcase takes much longer
than applying some thought-experiment.

I'll open a bug if that helps, but I'm happy to help out too -- I just
need some suggestions to try (reverse engineering the ooRexx kernel
isn't really a good way to address this kind of -- probably very
simple -- problem).

Mike

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU