I had to apply a kernel update to a box running Ubuntu Server 9.10, and it would not fully boot after that. I tracked it down to the start of the rxapid service.
I looked inside the script and found that on this version of Ubuntu it starts the daemon via:
start_daemon /opt/ooRexx/bin/rxapi
which apparently was not forking the process, so the boot never arrived at a login prompt on tty1.
We tried various things, but finally settled on uninstalling the service via:
update-rc.d -f rxapid remove
and adding this line to /etc/rc.local:
/etc/init.d/rxapid start &
which still leaves the script hanging, but at least the box boots completely.
root 969 0.0 0.0 3048 836 ? S 18:48 0:00 /bin/bash /etc/init.d/rxapid start
nobody 971 0.0 0.0 3276 1124 ? Ss 18:48 0:00 /opt/ooRexx/bin/rxapi
It is very odd that start_daemon developed a dislike for forking processes. Suggestions?
Anonymous
I made the following post to the bug report about this in the Ubuntu bug tracker. I think this is an Ubuntu bug rather than an ooRexx bug. Updating both tickets / referencing the other ticket in both places:
"Suddenly start_daemon does not fork the new pid, hangs the script"
https://bugs.launchpad.net/ubuntu/+source/lsb/+bug/532341
I was able to recreate this problem in a VirtualBox test environment.
Steps to recreate this situation:
1) Install Ubuntu Server 9.10
2) Apply all available updates
3) Reboot
4) Add openssh-server
5) ssh to the VM test server
6) use wget to DL the latest ooRexx for Ubuntu:
wget http://downloads.sourceforge.net/project/oorexx/oorexx/4.0.0/ooRexx-4.0.0.i586.deb?use_mirror=cdnetworks-us-1
7) Install said package:
sudo dpkg -i ooRexx-4.0.0.i586.deb
8) Reboot
9) The rxapid process never forks, system will not complete booting on tty1
As ooRexx 4.0 has been out for a while, it must be something in one of the latest Ubuntu updates as the package of ooRexx has not changed since it was last working correctly.
I guess I never rebooted this server since I had installed ooRexx.
Now I was able to recreate this problem on Ubuntu Server 9.10 without any updates applied.
Updating both tickets with this information.
Works fine on other versions of Ubuntu.
This is what you get for using Ubuntu.
Just kidding, although I've hardly made it a secret that I don't like debian-based distributions. <grin>
I had problems with the deb packages on installing for the 4.0.0 release. I can't remember the exact details of what I did, now, but the problem was rxapid hanging on start up. Whatever the fix was that I used, it worked on all the debian-based systems I had to test with. I don't think I had 9.10 to test on.
The ideal solution would be to not use debian-based distributions. At least, it seems ideal to me. ;-)
I'll poke around and see if I can recall what solutions I tried.
--
Mark Miesfeld
The ooRexx v4 .deb package works fine on 9.04, for example, which has the same LSB way of starting services. In the shell script, the first option detected returns false on both 9.04 and 9.10. The second finds success on both 9.04 and 9.10. That second method corresponds to the lsb package in Ubuntu.
Michael, turns out I had an Ubuntu 9.10 system close by.
Both the ooRexx-4.0.0.i586.deb and the ooRexx-4.0.0.i586.debian50.deb packages work perfectly fine on the system. They both install without rxapid hanging. With both packages a reboot worked without problem. With both packages, as root, I can stop and start rxapid without any problems.
My guess would be the debian zealots added some extra security measure for the server edition. Or it is a debian bug. You'll need to see if you can get the debian community to give you a clue as to why it doesn't work on your 9.10 version and does work on my 9.10 version.
Very well, I will try to recreate this on Ubuntu 9.10 desktop edition and compare results. Thanks much!
BTW: Yes, once booted I can start/stop the service. It is only at boot time that it does not start properly. Most odd.
Michael, keep me posted on this. When I get a chance I'll install a server edition of Ubuntu and see if I have a problem.
I installed a 9.10 desktop VM. Then added ooRexx v4 without applying the updates. I ran:
$ sudo ps aux|grep rx
nobody 1750 0.0 0.1 3272 776 ? Ss 14:50 0:00 /opt/ooRexx/bin/rxapi
mdlueck 1755 0.0 0.1 3036 796 pts/0 R+ 14:50 0:00 grep --color=auto rx
Then I rebooted and I see...
$ sudo ps aux|grep rx
root 1089 0.0 0.1 3040 1432 ? S 14:51 0:00 /bin/bash /etc/rc2.d/S89rxapid start
nobody 1093 0.0 0.1 3272 1120 ? Ss 14:51 0:00 /opt/ooRexx/bin/rxapi
mdlueck 1258 0.0 0.1 3036 792 pts/0 R+ 14:51 0:00 grep --color=auto rx
This /etc/rc2.d/S89rxapid is what I am seeing on 9.10
So it seems that on desktop OS's that have a GUI, you just do not notice that tty1 is missing.
Next I did the following:
$ sudo kill 1093
$ sudo ps aux|grep rx
mdlueck 1329 0.0 0.1 3036 796 pts/0 R+ 14:55 0:00 grep --color=auto rx
$ sudo /etc/init.d/rxapid start
Starting rxapi:
$ sudo ps aux|grep rx
nobody 1337 0.0 0.1 3272 772 ? Ss 14:55 0:00 /opt/ooRexx/bin/rxapi
mdlueck 1340 0.0 0.1 3036 796 pts/0 R+ 14:55 0:00 grep --color=auto rx
$
Which shows that the service starts cleanly once the OS is booted. Only during the boot process is there trouble starting the service.
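The ps listings above suggest a simple check for this symptom: after boot, look for an rxapid init-script invocation that is still alive next to the daemon. The following is a minimal sketch, assuming the script is invoked as /etc/init.d/rxapid or through an /etc/rcN.d/SNNrxapid link as in the output above. The function name start_script_stuck is hypothetical.

```shell
#!/bin/sh
# Reads `ps aux`-style output on stdin.
# Succeeds (exit 0) if an "rxapid start" init-script invocation is listed,
# i.e. the start script has not returned yet.
# start_script_stuck is a hypothetical helper name.
start_script_stuck() {
    grep -Eq '/(init\.d|rc[0-9S]\.d)/[SK]?[0-9]*rxapid start'
}
```

Usage might look like `ps aux | start_script_stuck && echo "rxapid start script has not exited"`.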
This problem also affects the fully updated version of the Lucid Alpha.
For some reason, when the operating system boots, it is unable to fork the daemon.
If I kill the daemon, then the start script does finish / exit, which cleans up the environment.
However, manually starting and stopping the service via:
sudo /etc/init.d/rxapid start
sudo /etc/init.d/rxapid stop
which is the exact same script, works as expected. The script starts the daemon and exits.
What happened with the upgrade to 9.10 that forking the daemon fails at boot-up but is able to succeed when the system is fully booted?
Looks to me like a definite Ubuntu problem, and not with ooRexx.
Updating both tickets with the same text.
I went to the support forum and inquired there, and received a prompt reply. Seems with 9.10, Ubuntu made some changes to how services start. Poster seems to think that something ooRexx depends on might not be started yet, and that is why the start script gets stuck at boot-up but runs cleanly with a fully booted system.
The thread is here:
http://ubuntuforums.org/showthread.php?p=8978938#post8978938
Please advise.
Thanks!
Michael,
Try editing the rxapid script after you are booted and put this in the top:
#! /bin/bash
# The following is LSB information
### BEGIN INIT INFO
# Provides: rxapi
# Required-Start: $local_fs $network $time
# Required-Stop: $local_fs $network $time
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: start and stop rxapi daemon
# Description: rxapid provides the communication service for between all running
#  ooRexx scripts
### END INIT INFO
# rxapid.sh Start/Stop the rxapi daemon.
You just need the LSB stuff, the above has a couple of extra lines of context.
Not sure, but maybe that Required-Start line will help.
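After editing, it can help to confirm that the LSB block really is in the installed script and is delimited the way LSB tools expect, with "### BEGIN INIT INFO" and "### END INIT INFO" marker lines. A minimal sketch follows; lsb_header is a hypothetical helper name, and the path in the usage note assumes the script lives at /etc/init.d/rxapid as elsewhere in this thread.

```shell
#!/bin/sh
# Print the LSB comment block of an init script, from the
# "### BEGIN INIT INFO" line through the "### END INIT INFO" line.
# lsb_header is a hypothetical helper name, not part of any package.
lsb_header() {
    sed -n '/^### BEGIN INIT INFO/,/^### END INIT INFO/p' "$1"
}
```

For example, `lsb_header /etc/init.d/rxapid` should print the Required-Start line if the edit took effect. If nothing is printed, the marker lines are missing or malformed.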
I modified it accordingly, no difference.
Running on a fully up-to-date Lucid daily desktop build.
You could try commenting out a few things in the script and see if the system boots. I have a low expectation that this will help.
In the start() function comment out these lines:
In the rhstatus() function comment out the single line:
If you don't mind playing with it a little, comment out the lines in start(); try it. Comment out only the line in rhstatus(); try it. Comment out both sections; try it.
From the UbuntuForums post I got the impression that, with this change to parallel starting of services, the rxapid service may need to declare its dependencies so that it starts in the right order. I had to do something similar on Windows: declaring that Apache relies on RxAPI cured the race there, and no further RxAPI processes were started as non-service tasks.
So what does rxapid on Linux rely on being running?
I will check into how to prioritize services in Ubuntu 9.10.
The main dependency is networking. The sockets layer must be up for rxapid to start.
David's right, the only thing really needed is networking. Although, since rxapid does a touch:
touch /var/lock/subsys/rxapi
it would need a file system. But the LSB updates to rxapid that David made should handle that:
Required-Start: $local_fs $network $time
time is probably not needed, but it shouldn't hurt. The only thing is, if you are mounting your root file system over nfs, then you need $remote_fs.
I doubted that you were using a root file system on nfs, plus, touch should just fail. I don't see why that would hang the script. But, it's why I suggested commenting out the 'touch' lines. Did you try that?
I commented out the entire IF block involving the touch commands and rebooted (IPL); the service start script still gets stuck.
I think I will simply try to attach the script I am running with to this ticket - I see a place to attach things below...
My modified rxapid script
Michael,
I'm going to change this to a bug. When I get a chance I'll try to install the OS you are using and see if I can figure out what is wrong. In the mean time, you may want to just go with your work-around. <grin>
Michael,
I just noticed that there is a common thread between this and:
2946714 rpm installation hangs starting rxapi
That bug is for an installation on SLES, and it seems that start_daemon is also hanging there.
Could you see if changing start_daemon to startproc solves the problem for you? Thanks.
Already went looking... since there was one more option after start_daemon... alas startproc is not available on standard Ubuntu.
Just now I used the package search for parts of filenames, startproc is not in any Ubuntu package for several recent releases.
So did SLES also implement parallel starting of services?
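To see which launcher helpers a given system actually offers, a quick probe from the shell can stand in for a package search. This is only a sketch: first_available is a hypothetical helper name, and the exact detection order used inside the rxapid script is not shown in this thread.

```shell
#!/bin/sh
# Print the first name from the argument list that the current shell can
# resolve as a command or function; return 1 if none of them resolves.
# first_available is a hypothetical helper name.
first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}
```

start_daemon is a shell function rather than a binary, so on Debian or Ubuntu the LSB helpers would need to be sourced first (assumed here to live in /lib/lsb/init-functions), followed by something like `first_available startproc start_daemon`.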
I was thinking more about this. As I listed, our workaround is to remove auto-start of this service and start it manually via the /etc/rc.local script. Even there a "&" is required, as the daemon will not fork from the start script.
I would think that networking is up by the time the OS gets to running /etc/rc.local script, correct?
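The effect of the "&" can be shown in isolation. In the sketch below, blocking_start is a hypothetical stand-in for a start script that does not return on its own (it just sleeps instead of launching rxapi). Backgrounding it lets the calling script carry on, which is all the rc.local workaround relies on; the start script itself stays alive until the thing it is waiting on ends.

```shell
#!/bin/sh
# Hypothetical stand-in for a start script that blocks instead of returning.
blocking_start() {
    sleep 1
}

# Run it in the background so the caller (rc.local in the workaround)
# is not held up while the start script sits waiting.
blocking_start &
start_pid=$!

echo "caller continues while pid $start_pid is still running"

# Only for this demo: wait so the background job is reaped before exit.
# A real rc.local would not wait, since that would block boot again.
wait "$start_pid"
```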
This turns out to be the same problem as was reported on SuSE.
Committed revision 5765. (4.0.1 source tree.)
Committed revision 5765. (trunk)
Because of a problem with #defines, rxapi was being compiled for a special
case on AIX. As a result, on SuSE the rxapi process was not being started
correctly.
The fix for this item was in the 4.0.1 release.