
#48 Server instance controller process

open
None
5
2003-05-31
2003-05-31
No

For some time now we have been running nsd with an added option that
makes it fork twice instead of once when putting itself in the
background. This creates one additional nsd process, which monitors
the actual server and restarts it in case of failure. This has proven
very convenient, yet simple to do, and far simpler than fiddling with
the inittab and init machinery. It is just a matter of giving one
more startup option to the command starting the server.

Now, I'm getting tired of patching this simple change into
nsmain.c each time, and I think it would be interesting
for other nsd users as well.

I would like to get this into the regular distribution. If there
are any voices against, please speak up.
Again, this change has no effect on regular server
operation and is conveniently selectable.

Discussion

  • Dossy Shiobara

    Dossy Shiobara - 2003-05-31

    Logged In: YES
    user_id=21885

    As long as this is kept as a command line switch, I think
    there's no reason not to.

    My personal solution to this is to run the nsd process from a shell
    script. In the shell script, there's an infinite while loop, and the
    body of the loop runs nsd under strace (in Linux) or truss (in Solaris).
    I do this so that it can run in the background (IOW, avoiding having to
    use -f in the script) so that nsd logs to log/server.log instead of
    the console (then having to redirect it in the shell script back into
    log/server.log).

    When the nsd exits (dies gracefully, or segfaults, or whatever) control
    resumes in the shell script. So, when I shut down nsd, what I do is rm
    the log/nspid.${servername} file before issuing the nsd -K to shut it
    down. When strace/truss exit, the shell script looks to see if the
    pidfile exists. If it does, then nsd exited abnormally, and the while
    loop starts over and re-runs nsd. If the file doesn't exist, a break is
    issued and the shell script terminates.

    This might feel kludgy to some folks, but I've been running webservers
    like this for ages -- I believe Netscape Enterprise Server even ships
    with its start/stop scripts doing this! Of course, they also have some
    kind of watchdog process that runs as well, but I'm not 100% sure what
    that process actually does ... ;-)

    As I said, I think implementing this directly in code with a
    user-selectable command line switch might be really convenient. But,
    it's obviously not necessary to achieve the same results ... as you can
    do it simply with an 8-line shell script.

    -- Dossy
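
The script itself is not shown in the thread. As a rough illustration of the pid-file convention it relies on (remove the pid file before an intentional stop; if the file is still there after exit, start again), here is a minimal sketch in C, chosen to match the language of the nsd patch. The helper names `write_pidfile`, `mark_intentional_stop` and `exit_was_unexpected` are hypothetical and are not part of nsd, and the pid file's content format is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Record the server's pid in a file (assumed format: one decimal pid).
 * Returns 0 on success, -1 on failure. */
int write_pidfile(const char *path, pid_t pid)
{
    assert(path != NULL);
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    int ok = fprintf(fp, "%ld\n", (long)pid) > 0;
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Operator side: remove the pid file BEFORE stopping the server,
 * which marks the upcoming exit as intentional. */
int mark_intentional_stop(const char *path)
{
    assert(path != NULL);
    return unlink(path);
}

/* Supervisor side: after the server has exited, a surviving pid file
 * means nobody asked for the stop, so the server should be re-run. */
bool exit_was_unexpected(const char *path)
{
    assert(path != NULL);
    return access(path, F_OK) == 0;
}
```

A supervising loop would call `exit_was_unexpected()` each time the server process ends and break out of the loop when it returns false.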

     
  • Andrew Piskorski

    Logged In: YES
    user_id=43168

    This is still a good idea, and according to the thread
    below, it's also something that OpenACS could definitely
    use, now that they're planning to move away from daemontools
    to an init script based approach:

    http://openacs.org/forums/message-view?message_id=279841

    Zoran, I don't think you ever committed this. Will you do so
    please?

     
  • Zoran Vasiljevic

    Logged In: YES
    user_id=95086

    Attached is a patch to be applied against the current
    CVS head. It adds the watchdog functionality to nsd.
    Use the "-w" command line option to activate it.

    The [ns_shutdown] command is extended to allow for
    one optional argument: "-restart". This will instruct the
    watchdog to restart the server. If the watchdog is not
    running (i.e. no "-w" argument was used when starting the server),
    [ns_shutdown -restart] will throw an error.

    Operation is simple. The watchdog sits above the server process
    and restarts it if it exited with an exit code != 0 or was
    killed by a signal other than SIGTERM. It also logs events
    to the system syslog facility (see the SysLog() function).
    To stop the watchdog/server tandem, you either send
    SIGTERM to the watchdog *or* the server process (i.e. do 'kill <pid>')
    or have the server process call exit(0) at some point.

    Suggestions/comments are welcome. I'll commit this
    unless I hear some very good reasons against.
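
The attached patch is not reproduced in this thread. The restart rule described above (restart on a non-zero exit code or on death by any signal other than SIGTERM) can be sketched in C roughly as follows. This is an illustration under stated assumptions, not the patch's code: `should_restart`, `supervise`, the `max_starts` bound and the two demo "servers" are all hypothetical names, and syslog logging and SIGTERM forwarding from the watchdog are left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Restart unless the server exited with code 0 or was killed by SIGTERM. */
bool should_restart(int status)
{
    if (WIFEXITED(status))
        return WEXITSTATUS(status) != 0;
    if (WIFSIGNALED(status))
        return WTERMSIG(status) != SIGTERM;
    return false; /* neither exited nor signalled: nothing to restart */
}

/* Hypothetical watchdog loop. server_fn stands in for the real server and
 * receives the 1-based attempt number. max_starts only bounds the sketch.
 * Returns the number of times the server was started, or -1 on error. */
int supervise(int (*server_fn)(int attempt), int max_starts)
{
    assert(server_fn != NULL);
    int starts = 0;
    while (starts < max_starts) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(server_fn(starts + 1)); /* child: the server process */
        starts++;
        int status;
        pid_t w;
        do { /* parent: the watchdog waits, retrying if interrupted */
            w = waitpid(pid, &status, 0);
        } while (w < 0 && errno == EINTR);
        if (w < 0)
            return -1;
        if (!should_restart(status))
            break;
    }
    return starts;
}

/* Demo server: fails, then is killed as with 'kill -9', then exits cleanly. */
int demo_fail_kill_then_clean(int attempt)
{
    if (attempt == 1)
        return 1;
    if (attempt == 2)
        raise(SIGKILL);
    return 0;
}

/* Demo server: terminated by SIGTERM on the first attempt. */
int demo_sigterm(int attempt)
{
    (void)attempt;
    raise(SIGTERM);
    return 0; /* only reached if SIGTERM is being ignored */
}
```

With these demo functions, `supervise(demo_fail_kill_then_clean, 10)` starts the child three times, while `supervise(demo_sigterm, 10)` starts it once and then stops.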

     
  • Tom Jackson

    Tom Jackson - 2005-03-19

    Logged In: YES
    user_id=661593

    Why not do it? Can you explain how the watchdog works and how
    you start/stop/restart/signal AOLserver with the watchdog running?

    Although I was the first to publicize the use of daemontools to control
    AOLserver, and find the amount of control and security available
    unmatched, anything is better than twiddling with the inittab file for a
    user process. In situations where you don't have root access this will
    be a good addition.

     
  • Zoran Vasiljevic

    Logged In: YES
    user_id=95086

    To start with the watchdog:

    bin/nsd -w -t myconfig.tcl

    This will create two processes:
    # ps -ef | grep nsd
    zoran 23711     1 0 17:10 ? 00:00:00 bin/nsd -w -t myconf.tcl
    zoran 23712 23711 0 17:10 ? 00:00:00 bin/nsd -w -t myconf.tcl

    The one with ppid of 1 is the watchdog. The other is server.
    To stop:
    kill `cat log/nspid.server1`

    this will send SIGTERM to the server and both server and
    watchdog will exit.

    If you:
    kill -9 `cat log/nspid.server1`
    the watchdog will restart the server again (observe also
    the syslog file /var/adm/messages on Linux). The same
    will happen if the server cores.

    You can also:

    kill <watchdog-pid>

    and both server and watchdog will exit but this is just a
    convenience.

     
  • Tom Jackson

    Tom Jackson - 2005-03-19

    Logged In: YES
    user_id=661593

    The only thing I noticed is that you wrapped syslog into the same
    service. I know this is part of what you need, but is it a feature that
    should be part of a watchdog switch?

    There are also a number of hard coded #defines used in the patch.

    This points to a real, but unaddressed problem with AOLserver. The
    architecture is wonderful after startup has taken place, modules work
    great. However you are left to muck around with the hard coded startup
    script to add any basic functionality. Wouldn't it be nice if you had a
    command line switch to load basic modules prior to initialization from a
    config script, or a switch to load a file, similar to the -B switch? For one,
    many nsd/*.c files are really mini-modules. One example being
    config.c. Unfortunately config.c cannot be separated because it is used
    immediately upon startup. A watchdog module and syslog module are
    other examples of what could be mini-modules. I don't know if syslog
    could be moved into a after startup module, but I'm guessing not.

    I still don't object to the patch for reasons stated in my last comment: it
    allows easy user mode control/watchdog without root access.
    However, I guess if you grab port 80 you still need to be root, or sudo?
    How are you handling this situation?

     
  • Zoran Vasiljevic

    Logged In: YES
    user_id=95086

    Apropos syslog: yes, I think this is very valuable. But not
    *really* needed, of course. Does it hurt getting more
    info in syslog file? I do not think so.

    Apropos "loading mini-modules": I did not want to put any
    other functionality in watchdog. Not even load Tcl or such.
    It is designed to be simple. Yes, #define's are not that
    elegant, but I did not have any other idea how to pass them.
    Over the command line? It becomes very complicated.
    Over the config file? I need to load Tcl to parse config file, so
    I scrapped that. Well, if you have a better idea, yell :)

    Apropos "grab port 80": yes, you need to be root. Therefore I
    start the watchdog before binding. So the watchdog will run as
    root whereas the server process will run as the given user.
    Hence the decision to KISS (keep it simple and stupid) and not
    muck up with Tcl or anything else.

     
  • Tom Jackson

    Tom Jackson - 2005-03-19

    Logged In: YES
    user_id=661593

    Okay, so I'll stick my neck out and admit ignorance in a lot of areas
    here. For instance, you now have a continuously running process,
    running as root. I don't know if this is a security problem or not. Second,
    one thing I like about djb's svc is that he guarantees that with each
    restart the process gets the same 'clean' process state. I'm not sure
    that this is happening here, since somehow the restart doesn't fork
    another watchdog, so there must be some state that is maintained?

    I wasn't suggesting you put the mini-modules idea into your patch, only
    that it is impossible with the current setup to provide customizations
    early in server startup.

    How does your setup work at machine startup and shutdown?

     
  • Zoran Vasiljevic

    Logged In: YES
    user_id=95086

    Apropos security: the watchdog has no connections to the outside
    world. You can't do very much with it if you're not root.
    And even then, its only task is to monitor one process
    and restart it in case it breaks.

    Apropos "clean process state": watchdog is a separate
    process. All it encompasses is nsd, libnsd and libnsthread.
    If you change one of those during runtime: bang. Otherwise
    the server process loads whatever modules it needs and it is
    hence "clean" so to speak.

    Apropos "machine startup": this is entirely out of the scope.
    Usual start-stop machinery has to be installed. The watchdog
    is only conveniently monitoring your server and restarts it
    if needed. You can go and tweak all this with external shell
    scripts, daemontools and whatnot... The watchdog is just
    more convenient: do "-w" on startup and relax. This is no
    rocket science. It is done out of the necessity to make
    installation/management easy. You can really solve this
    issue in a myriad of ways. This is one of them.

    Why don't you patch your copy of AS and try it for yourself?

     
  • Tom Jackson

    Tom Jackson - 2005-03-19

    Logged In: YES
    user_id=661593

    Why don't you patch your copy of AS and try it for yourself?

    I use daemontools. I know how it works, and it seems to provide the
    easiest to use interface. It also seems to provide all the features of this
    patch and more. However, you have to be root to install daemontools.
    Once it is installed, and the server is setup with the right permissions,
    any user of a specific group can control the process, without the need
    for members of this group needing even read access to the startup
    script.

    I'm still wondering about syslog. The patch seems great for a local user
    who wants to use a high port, but you need root for lower ports. To use
    syslog, you still need root, I think. So why not just get root to set you up
    with something else, like sudo of a shell script?

     
  • Jeff Davis

    Jeff Davis - 2005-03-19

    Logged In: YES
    user_id=287865

    The reason I would like to see this in is that we want to make
    AOLserver more palatable to the various distributions as a standard
    service. daemontools is great and I will keep using it, but it's
    never going to come installed on all distributions (thanks djb!)

    Obviously it's easy enough to do from a shell script, as Dossy says,
    but some of the signal handling stuff and the ability to easily do an
    ns_shutdown -restart in the code is nice.

    Also, I noticed looking at the patch that we do not do a couple of
    things I was taught a daemon should do (the Stevens "Advanced
    Programming in the UNIX Environment" book says you should call
    setsid(); chdir("/"); umask(0);). We do setsid() but not the
    chdir("/") (so you won't prevent any filesystems from being
    unmounted), nor umask(0). Maybe we should consider these at the same
    time? (It's all in ch. 13 of the Stevens book, btw.)
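
For reference, the three calls named above can be sketched in C as shown below. This only illustrates the sequence after a fork and is not code from the patch or from nsd. The names `detach_from_caller`, `detached_state_ok` and `run_detached` are hypothetical, and the parent waits for the child only so that the sketch can report whether the calls took effect; a real daemon's parent would not wait.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The three calls, meant to run in a freshly forked child.
 * Returns 0 on success, -1 on failure. */
int detach_from_caller(void)
{
    if (setsid() < 0)   /* new session, no controlling terminal */
        return -1;
    if (chdir("/") < 0) /* do not keep the start directory's filesystem busy */
        return -1;
    umask(0);           /* do not inherit the caller's file mode mask */
    return 0;
}

/* Self-check: returns 0 if the process is a session leader,
 * its working directory is "/" and its umask is 0; otherwise 1. */
int detached_state_ok(void)
{
    char cwd[1024];
    if (getcwd(cwd, sizeof cwd) == NULL || strcmp(cwd, "/") != 0)
        return 1;
    mode_t mask = umask(0); /* umask() returns the previous mask */
    umask(mask);            /* put it back unchanged */
    if (mask != 0)
        return 1;
    return getsid(0) == getpid() ? 0 : 1;
}

/* Fork, detach in the child, run fn there, and hand fn's exit code back
 * to the parent. Returns -1 on error, 127 if detaching failed. */
int run_detached(int (*fn)(void))
{
    assert(fn != NULL);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(detach_from_caller() == 0 ? fn() : 127);
    int status;
    pid_t w;
    do {
        w = waitpid(pid, &status, 0);
    } while (w < 0 && errno == EINTR);
    if (w < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling `run_detached(detached_state_ok)` returns 0 when all three settings are in place in the child. setsid() fails if the caller is already a process group leader, which is why it is called after fork().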

     
  • Dossy Shiobara

    Dossy Shiobara - 2005-03-19

    Logged In: YES
    user_id=21885

    I'm very busy this weekend, so please give me some time
    (until next week) to dig up the shell script which I used in
    a production environment (not at AOL) for almost 4 years to
    run ~60 Netscape Enterprise Servers with great success.
    I'll publish the shell script and some brief documentation
    on how to use it, so that people can try it out and see how
    it works.

    Then, let's examine the gap of what has already been and/or
    could additionally be implemented in the shell script vs.
    what must go into the nsd, and decide how to proceed.

     
  • Zoran Vasiljevic

    Logged In: YES
    user_id=95086

    Respected colleagues,

    As I said: you can do this trick with a shell,
    with daemontools and with all other kind of whatever...

    Nobody said it can't be done. The question is how
    you package it, how easy it is to use, how
    it integrates into your environment, etc.

    Dossy,
    There is absolutely *no* need to go and show the usage
    with the shell script. I (like everybody) know it can be
    done. The question is: either you have something built
    in for immediate use or you don't. That is: either it
    is there, and you use it, or it is not there, and you
    have to cook your own solution.
    I personally have no vested interest in getting
    this into the server distro. I'm maintaining my
    own copy of the code anyway, hence I really do not care.
    The thing is that some people find this a good idea
    and they asked me to bring it in. Well, I tried...

    To my understanding: either people want this or not.
    Patch is here. Do whatever you want with it.

    Jeff, your hints about umask/chdir for watchdog are
    fine. I'll look into this...

     
  • Andrew Piskorski

    Logged In: YES
    user_id=43168

    Zoran, using your in-process Watchdog, how do you simply
    RESTART the server without kill -9'ing it? By running the
    "ns_shutdown -restart" from inside AOLserver? Is there some
    way to send it a "please restart" signal from OUTSIDE
    AOLserver? If there is (or that can be easily added), then
    that would be particularly cool, as it means this
    functionality could entirely replace using /etc/inittab.

    Jeff, I will have to look up Stevens' daemon
    recommendations, I'm not familiar with those. But these
    setsid(), chdir(), and umask() issues apply across the board
    to AOLserver, not just to this Watchdog patch, right?

    Dossy, contrary to Zoran, I would like to see that shell
    script of yours. I strongly suspect that putting the
    Watchdog functionality into AOLserver itself is going to be
    the simpler and more robust cross-platform way to go (think
    Windows support, for example), but it would be useful to
    compare the two approaches. (Then at least we'd be
    comparing apples to apples, this C patch which Zoran has
    used in production for years, and your shell script which was
    also heavily used for real.) Quite possibly, we should
    start out by providing BOTH approaches stock with AOLserver.

     
  • Zoran Vasiljevic

    Logged In: YES
    user_id=95086

    To restart the server from "inside",
    do "ns_shutdown -restart".

    To restart the server from "outside"
    you can send the server the SIGINT.

     
