
#891 rxapid service suddenly not forking on Ubuntu Server 9.10

v4.0.1
closed
5
2012-08-14
2010-03-05
mdlueck
No

I had to apply a kernel update to a box running Ubuntu Server 9.10, and it would not fully boot after that. I tracked it down to the start of the rxapid service.

I looked inside the script and found that on this version of Ubuntu it starts it via:

start_daemon /opt/ooRexx/bin/rxapi

which seems like it was not forking the process, thus would not arrive at a login prompt on tty1.

We tried various things, but finally settled on uninstalling the service via:

update-rc.d -f rxapid remove

and added to file /etc/rc.local

# Hack to get ooRexx rxapi started async

/etc/init.d/rxapid start &

which still leaves the script hanging, but at least the box boots completely.

root 969 0.0 0.0 3048 836 ? S 18:48 0:00 /bin/bash /etc/init.d/rxapid start
nobody 971 0.0 0.0 3276 1124 ? Ss 18:48 0:00 /opt/ooRexx/bin/rxapi

It is very odd that start_daemon developed a dislike for forking processes. Suggestions?
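
The symptom above (a start script that stays in the process list next to the daemon) is what you would expect whenever the launcher waits on a child that never detaches. The sketch below only illustrates that general behavior and is not a diagnosis of rxapi; `sleep 3` is a hypothetical stand-in for the daemon, and the function names are made up for the example.

```shell
#!/bin/sh
# "sleep 3" is a hypothetical stand-in for a daemon that stays in the foreground.

# Blocks for as long as the child runs (defined for comparison, not called here).
start_blocking() { sleep 3; }

# Returns at once; the child is left running in the background.
start_detached() {
    sleep 3 >/dev/null 2>&1 &
    child_pid=$!
}

before=$(date +%s)
start_detached
elapsed=$(( $(date +%s) - before ))
echo "start_detached returned after ${elapsed}s"

# Stop the stand-in so the sketch leaves nothing behind.
kill "$child_pid" 2>/dev/null
```

This is the same effect the trailing `&` in /etc/rc.local achieves: the boot sequence no longer waits on the start script, although the script itself is still stuck.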

Discussion

  • mdlueck

    mdlueck - 2010-03-05

    I made the following post to the bug report about this in the Ubuntu bug tracker. I think this is an Ubuntu bug rather than an ooRexx bug. Updating both tickets / referencing the other ticket in both places:

    "Suddenly start_daemon does not fork the new pid, hangs the script"
    https://bugs.launchpad.net/ubuntu/+source/lsb/+bug/532341

    I was able to recreate this problem in a VirtualBox test environment.

    Steps to recreate this situation:

    1) Install Ubuntu Server 9.10
    2) Apply all available updates
    3) Reboot
    4) Add openssh-server
    5) ssh to the VM test server
    6) use wget to DL the latest ooRexx for Ubuntu:
    wget http://downloads.sourceforge.net/project/oorexx/oorexx/4.0.0/ooRexx-4.0.0.i586.deb?use_mirror=cdnetworks-us-1
    7) Install said package:
    sudo dpkg -i ooRexx-4.0.0.i586.deb
    8) Reboot
    9) The rxapid process never forks, system will not complete booting on tty1

    As ooRexx 4.0 has been out for a while, it must be something in one of the latest Ubuntu updates as the package of ooRexx has not changed since it was last working correctly.

     
  • mdlueck

    mdlueck - 2010-03-05

    I guess I never rebooted this server since I had installed ooRexx.

    Now I was able to recreate this problem on Ubuntu Server 9.10 without any updates applied.

    Updating both tickets with this information.

    Works fine on other versions of Ubuntu.

     
  • Mark Miesfeld

    Mark Miesfeld - 2010-03-05

    This is what you get for using Ubuntu.

    Just kidding, although I've hardly made it a secret that I don't like debian-based distributions. <grin>

    I had problems with the deb packages on installing for the 4.0.0 release. I can't remember the exact details of what I did, now, but the problem was rxapid hanging on start up. Whatever the fix was that I used, it worked on all the debian-based systems I had to test with. I don't think I had 9.10 to test on.

    The ideal solution would be to not use debian-based distributions. At least, it seems ideal to me. ;-)

    I'll poke around and see if I can recall what solutions I tried.

    --
    Mark Miesfeld

     
  • mdlueck

    mdlueck - 2010-03-05

    The ooRexx v4 .deb package works fine on 9.04, for example, which has the same lsb way of starting services. In the shell script, the first option detected returns false on both 9.04 and 9.10. The second succeeds on both 9.04 and 9.10. That second method equates to the lsb package in Ubuntu.

     
  • Mark Miesfeld

    Mark Miesfeld - 2010-03-05

    Michael, turns out I had an Ubuntu 9.10 system close by.

    Both the ooRexx-4.0.0.i586.deb and the ooRexx-4.0.0.i586.debian50.deb packages work perfectly fine on the system. They both install without rxapid hanging. With both packages a reboot worked without problem. With both packages, as root, I can stop and start rxapid without any problems.

    My guess would be the debian zealots added some extra security measure for the server edition. Or it is a debian bug. You'll need to see if you can get the debian community to give you a clue as to why it doesn't work on your 9.10 version and does work on my 9.10 version.

     
  • mdlueck

    mdlueck - 2010-03-05

    Very well, I will try to recreate this on Ubuntu 9.10 desktop edition and compare results. Thanks much!

     
  • mdlueck

    mdlueck - 2010-03-05

    BTW: Yes, once booted I can start/stop the service. It is only at boot time that it does not start properly. Most odd.

     
  • Mark Miesfeld

    Mark Miesfeld - 2010-03-05

    Michael, keep me posted on this. When I get a chance I'll install a server edition of Ubuntu and see if I have a problem.

     
  • mdlueck

    mdlueck - 2010-03-05

    I installed a 9.10 desktop VM. Then added ooRexx v4 without applying the updates. I ran:

    $ sudo ps aux|grep rx
    nobody 1750 0.0 0.1 3272 776 ? Ss 14:50 0:00 /opt/ooRexx/bin/rxapi
    mdlueck 1755 0.0 0.1 3036 796 pts/0 R+ 14:50 0:00 grep --color=auto rx

    Then I rebooted and I see...

    $ sudo ps aux|grep rx
    root 1089 0.0 0.1 3040 1432 ? S 14:51 0:00 /bin/bash /etc/rc2.d/S89rxapid start
    nobody 1093 0.0 0.1 3272 1120 ? Ss 14:51 0:00 /opt/ooRexx/bin/rxapi
    mdlueck 1258 0.0 0.1 3036 792 pts/0 R+ 14:51 0:00 grep --color=auto rx

    This /etc/rc2.d/S89rxapid is what I am seeing on 9.10

    So it seems that on desktop OS's that have a GUI, you just do not notice that tty1 is missing.

    Next I did the following:

    $ sudo kill 1093
    $ sudo ps aux|grep rx
    mdlueck 1329 0.0 0.1 3036 796 pts/0 R+ 14:55 0:00 grep --color=auto rx
    $ sudo /etc/init.d/rxapid start
    Starting rxapi:
    $ sudo ps aux|grep rx
    nobody 1337 0.0 0.1 3272 772 ? Ss 14:55 0:00 /opt/ooRexx/bin/rxapi
    mdlueck 1340 0.0 0.1 3036 796 pts/0 R+ 14:55 0:00 grep --color=auto rx
    $

    Which shows that the service starts cleanly once the OS is booted. Only during the boot process is there trouble starting the service.
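
Spotting the leftover start script by eye in `ps aux | grep rx` output is easy to get wrong. As a minimal sketch, the filter below picks out the PID of any rxapid start script from `ps aux`-style lines. It assumes the column layout shown in the listings above; the function name `find_stuck_start_scripts` is hypothetical, and the sample text is copied from the post-reboot listing.

```shell
#!/bin/sh
# Reads `ps aux`-style lines on stdin and prints the PID (column 2) of any
# line whose command is "/bin/bash <something>rxapid start".
find_stuck_start_scripts() {
    awk '$11 == "/bin/bash" && $12 ~ /rxapid$/ && $13 == "start" { print $2 }'
}

# Sample input taken from the listing after the reboot.
sample='root 1089 0.0 0.1 3040 1432 ? S 14:51 0:00 /bin/bash /etc/rc2.d/S89rxapid start
nobody 1093 0.0 0.1 3272 1120 ? Ss 14:51 0:00 /opt/ooRexx/bin/rxapi
mdlueck 1258 0.0 0.1 3036 792 pts/0 R+ 14:51 0:00 grep --color=auto rx'

stuck=$(printf '%s\n' "$sample" | find_stuck_start_scripts)
echo "stuck start script PID(s): ${stuck:-none}"
```

On a live system the same function could be fed from `ps aux` directly; an empty result would mean the start script exited as it should.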

     
  • mdlueck

    mdlueck - 2010-03-07

    This problem also affects the fully updated version of the Lucid Alpha.

    For some reason, when the operating system boots, it is unable to fork the daemon.

    If I kill the daemon, then the start script does finish / exit, which cleans up the environment.

    However, manually starting and stopping the service via:

    sudo /etc/init.d/rxapid start
    sudo /etc/init.d/rxapid stop

    which is the exact same script, works as expected. The script starts the daemon and exits.

    What happened with the upgrade to 9.10 that forking the daemon fails at boot-up but is able to succeed when the system is fully booted?

    Looks to me like a definite Ubuntu problem, and not with ooRexx.

    Updating both tickets with the same text.

     
  • mdlueck

    mdlueck - 2010-03-17

    I went to the support forum and inquired there, and received a prompt reply. Seems with 9.10, Ubuntu made some changes to how services start. Poster seems to think that something ooRexx depends on might not be started yet, and that is why the start script gets stuck at boot-up but runs cleanly with a fully booted system.

    The thread is here:
    http://ubuntuforums.org/showthread.php?p=8978938#post8978938

    Please advise.
    Thanks!

     
  • Mark Miesfeld

    Mark Miesfeld - 2010-03-17

    Michael,

    Try editing the rxapid script after you are booted and put this in the top:

    #! /bin/bash
    # The following is LSB information
    ### BEGIN INIT INFO
    # Provides: rxapi
    # Required-Start: $local_fs $network $time
    # Required-Stop: $local_fs $network $time
    # Default-Start: 2 3 4 5
    # Default-Stop: 0 1 6
    # Short-Description: start and stop rxapi daemon
    # Description: rxapid provides the communication service for between all running
    #              ooRexx scripts
    ### END INIT INFO
    # rxapid.sh Start/Stop the rxapi daemon.

    You just need the LSB stuff, the above has a couple of extra lines of context.

    Not sure, but maybe that Required-Start line will help.
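
To confirm that an edited script really carries the `Required-Start` line, it can help to pull the value back out. The sketch below assumes the header is written as ordinary `# Key: value` comment lines, as LSB init scripts do; the function name `lsb_required_start` is hypothetical, and the sample header repeats the values quoted above.

```shell
#!/bin/sh
# Prints the value of the Required-Start line from an init script on stdin.
lsb_required_start() {
    sed -n 's/^# *Required-Start:[[:space:]]*//p'
}

# Single quotes keep $local_fs and friends literal, as they are in the script.
header='### BEGIN INIT INFO
# Provides: rxapi
# Required-Start: $local_fs $network $time
# Required-Stop: $local_fs $network $time
### END INIT INFO'

deps=$(printf '%s\n' "$header" | lsb_required_start)
echo "Required-Start: $deps"
```

Against the installed script this would be `lsb_required_start < /etc/init.d/rxapid`. Whether the boot system honors the line is a separate question that this check does not answer.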

     
  • mdlueck

    mdlueck - 2010-03-17

    I modified it accordingly, no difference.

    Running on a fully up-to-date Lucid daily desktop build.

     
  • Mark Miesfeld

    Mark Miesfeld - 2010-03-17

    You could try commenting out a few things in the script and see if the system boots. I have a low expectation that this will help.

    In the start() function comment out these lines:

    if [ $is_debian_like -eq 0 ]; then
        [ $RETVAL -eq 0 ] && touch /var/lock/subsys/rxapi
    else
        [ $RETVAL -eq 0 ] && touch /var/lock/rxapi
    fi
    

    In the rhstatus() function comment out the single line:

    status @prefix@/bin/rxapi
    

    If you don't mind playing with it a little, comment out the lines in start(); try it. Comment out only the line in rhstatus(); try it. Comment out both sections; try it.
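
For repeated experiments, the commenting-out can be scripted rather than done by hand. This is only a sketch: it works on a copy of the five quoted lines held in a variable and never touches /etc/init.d/rxapid, and the `sed` address patterns assume the block looks exactly as quoted above.

```shell
#!/bin/sh
# The five lines quoted above, held in a variable for illustration.
block=$(cat <<'EOF'
if [ $is_debian_like -eq 0 ]; then
    [ $RETVAL -eq 0 ] && touch /var/lock/subsys/rxapi
else
    [ $RETVAL -eq 0 ] && touch /var/lock/rxapi
fi
EOF
)

# Prefix every line from the "if" through the closing "fi" with "#".
patched=$(printf '%s\n' "$block" |
    sed '/is_debian_like -eq 0/,/^[[:space:]]*fi$/ s/^/#/')

printf '%s\n' "$patched"
```

The same `sed` expression could be run against a backup copy of the real script, with the result written to a new file, so that the original stays available for comparison.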

     
  • mdlueck

    mdlueck - 2010-03-19

    I got the impression from the UbuntuForums post that, with this change to starting services in parallel, the rxapid service needs some kind of prioritization. This is much like what I had to do on Windows, where I declared that Apache relies on RxAPI. That cured the race on Windows: no further RxAPI processes were started as a non-service task.

    So what does rxapid on Linux rely on being running?

    I will check into how to prioritize services in Ubuntu 9.10.

     
  • David Ashley

    David Ashley - 2010-03-19

    The main dependency is networking. The sockets layer must be up for rxapid to start.

     
  • Mark Miesfeld

    Mark Miesfeld - 2010-03-19

    David's right, the only thing really needed is networking. Although, since rxapid does a touch:

    touch /var/lock/subsys/rxapi

    it would need a file system. But the LSB updates to rxapid that David made should handle that:

    Required-Start: $local_fs $network $time

    time is probably not needed, but it shouldn't hurt. The only thing is, if you are mounting your root file system over nfs, then you need $remote_fs.

    I doubted that you were using a root file system on nfs, plus, touch should just fail. I don't see why that would hang the script. But, it's why I suggested commenting out the 'touch' lines. Did you try that?

     
  • mdlueck

    mdlueck - 2010-03-19

    I commented out the entire IF block involving the touch commands, IPL, service start script still gets stuck.

    I think I will simply try to attach the script I am running with to this ticket - I see a place to attach things below...

     
  • mdlueck

    mdlueck - 2010-03-19

    My modified rxapid script

     
  • Mark Miesfeld

    Mark Miesfeld - 2010-03-19

    Michael,

    I'm going to change this to a bug. When I get a chance I'll try to install the OS you are using and see if I can figure out what is wrong. In the meantime, you may want to just go with your work-around. <grin>

     
  • Mark Miesfeld

    Mark Miesfeld - 2010-03-28

    Michael,

    I just noticed that there is a common thread between this and:

    2946714 rpm installation hangs starting rxapi

    That bug is for an installation on SLES, and it seems that start_daemon is also hanging there.

    Could you see if changing start_daemon to startproc solves the problem for you? Thanks.

     
  • mdlueck

    mdlueck - 2010-03-28

    Already went looking... since there was one more option after start_daemon... alas startproc is not available on standard Ubuntu.

    Just now I used the package search for parts of filenames, startproc is not in any Ubuntu package for several recent releases.

    So did SLES also implement parallel starting of services?

     
  • mdlueck

    mdlueck - 2010-03-30

    I was thinking more about this. As I listed, our workaround is to remove auto-start of this service and manually start it via the /etc/rc.local script. Even there a "&" is required, as the daemon will not fork from the start script.

    I would think that networking is up by the time the OS gets to running /etc/rc.local script, correct?

     
  • Mark Miesfeld

    Mark Miesfeld - 2010-04-14

    This turns out to be the same problem as was reported on SuSE.

    Committed revision 5765. (4.0.1 source tree.)
    Committed revision 5765. (trunk)

    Because of a problem with #defines, rxapi was being compiled for a special
    case on AIX. As a result, on SuSE the rxapi process was not being started
    correctly.

     
  • Mark Miesfeld

    Mark Miesfeld - 2010-09-08

    The fix for this item was in the 4.0.1 release.

     
