Some time ago we added an option to nsd to fork
twice instead of once when putting itself in the background.
This creates one additional process, which monitors the
actual server and restarts it in case of failure. This has
proven very convenient, yet simple to do. Far simpler than
fiddling with the inittab and init machinery. It is just a
matter of giving one more startup option to the command
starting the server.
Now, I'm getting tired of patching this simple change into
nsmain.c each time, and I think it would be interesting
for other nsd users as well.
I would like to get this into the regular distribution; if there
are any objections, please speak up.
Again, this change has no effect on regular server
operation and is enabled only when selected at startup.
Logged In: YES
user_id=21885
As long as this is kept as a command line switch, I think
there's no reason not to.
My personal solution to this is to run the nsd process from a shell
script. In the shell script, there's an infinite while loop, and the
body of the loop runs nsd under strace (in Linux) or truss (in Solaris).
I do this so that it can run in the background (IOW, avoiding having to
use -f in the script) so that nsd logs to log/server.log instead of
the console (then having to redirect it in the shell script back into
log/server.log).
When the nsd exits (dies gracefully, or segfaults, or whatever) control
resumes in the shell script. So, when I shut down nsd, what I do is rm
the log/nspid.${servername} file before issuing the nsd -K to shut it
down. When strace/truss exit, the shell script looks to see if the
pidfile exists. If it does, then nsd exited abnormally, and the while
loop starts over and re-runs nsd. If the file doesn't exist, a break is
issued and the shell script terminates.
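The loop described above could be sketched roughly as follows. This is not Dossy's actual script (it is not shown in this thread), only a minimal illustration of the pid-file convention. The function name supervise, the RESTART_DELAY variable, and passing the server command as arguments are assumptions made for this example; whatever keeps the command in the foreground (the strace/truss wrapper, or -f) is left to the caller.

```shell
#!/bin/sh
# Hypothetical helper: rerun a command until its pid file has been removed.
# $1 = pid file path, remaining arguments = command that blocks until
# the server has exited.
supervise() {
    pidfile=$1
    shift
    while :; do
        status=0
        "$@" || status=$?          # returns when the server exits
        if [ -f "$pidfile" ]; then
            # pid file still present: treat as an abnormal exit and restart
            echo "server exited with status $status, restarting" >&2
            sleep "${RESTART_DELAY:-1}"
        else
            # pid file was removed before shutdown: intentional stop
            break
        fi
    done
}
```

To shut down for good, the operator removes the pid file first and then stops the server, exactly as in the procedure described above; any other exit leads to a restart.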
This might feel kludgy to some folks, but I've been running webservers
like this for ages -- I believe Netscape Enterprise Server even ships
with its start/stop scripts doing this! Of course, they also have some
kind of watchdog process that runs as well, but I'm not 100% sure what
that process actually does ... ;-)
As I said, I think implementing this directly in code with a
user-selectable command line switch might be really convenient. But,
it's obviously not necessary to achieve the same results ... as you can
do it simply with an 8-line shell script.
-- Dossy
Logged In: YES
user_id=43168
This is still a good idea, and according to the thread
below, it's also something that OpenACS could definitely
use, now that they're planning to move away from daemontools
to an init script based approach:
http://openacs.org/forums/message-view?message_id=279841
Zoran, I don't think you ever committed this. Will you do so,
please?
Logged In: YES
user_id=95086
Attached is a patch to be applied against the current
CVS head. It adds the watchdog functionality to nsd.
Use the "-w" command line option to activate it.
The [ns_shutdown] command is extended to allow for
one optional argument: "-restart". This instructs the
watchdog to restart the server. If the watchdog is not
running (i.e. no "-w" argument was used when starting the
server), [ns_shutdown -restart] will throw an error.
Operation is simple. The watchdog sits above the server process
and restarts it if it exits with an exit code != 0 or is
killed by a signal other than SIGTERM. It also logs events
to the system syslog facility (see the SysLog() function).
To stop the watchdog/server tandem, you either send
SIGTERM to watchdog *or* server process (i.e. do 'kill <pid>')
or have the server process call exit(0) at some point.
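The restart rule just described can be written down as a small decision function. This is an illustration in shell, not the C code from the patch. The function name watchdog_action is made up for this example, and it assumes the shell convention (bash, dash) of reporting a child killed by signal N as status 128 + N, so a plain exit code of 143 is indistinguishable from SIGTERM here; the real watchdog can tell the two apart.

```shell
#!/bin/sh
# Hypothetical helper, not code from the patch.
# $1 = status of the server process as seen in the shell's $?
#      (assumed to be 128 + N when the process was killed by signal N)
watchdog_action() {
    status=$1
    sigterm=15                          # SIGTERM's conventional number
    if [ "$status" -eq 0 ]; then
        echo stop                       # server called exit(0)
    elif [ "$status" -eq $((128 + sigterm)) ]; then
        echo stop                       # server was terminated by SIGTERM
    else
        echo restart                    # non-zero exit code or any other signal
    fi
}
```

Under these assumptions, a kill -9 (status 137) or a crash with a core dump yields "restart", while a plain kill (status 143) or a clean exit yields "stop".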
Suggestions/comments are welcome. I'll commit this
unless I hear some very good reasons against.
Logged In: YES
user_id=661593
Why not do it? Can you explain how the watchdog works and how
you start/stop/restart/signal AOLserver with the watchdog running?
Although I was the first to publicize the use of daemontools to control
AOLserver, and find the amount of control and security available
unmatched, anything is better than twiddling with the inittab file for a
user process. In situations where you don't have root access this will
be a good addition.
Logged In: YES
user_id=95086
To start with the watchdog:
bin/nsd -w -t myconfig.tcl
This will create two processes:
# ps -ef | grep nsd
zoran 23711 1 0 17:10 ? 00:00:00 bin/nsd -w -t myconf.tcl
zoran 23712 23711 0 17:10 ? 00:00:00 bin/nsd -w -t myconf.tcl
The one with a ppid of 1 is the watchdog. The other is the server.
To stop:
kill `cat log/nspid.server1`
This sends SIGTERM to the server, and both server and
watchdog will exit.
If you:
kill -9 `cat log/nspid.server1`
the watchdog will restart the server again (observe also
the syslog file /var/adm/messages on Linux). The same
will happen if the server cores.
You can also:
kill <watchdog-pid>
and both server and watchdog will exit but this is just a
convenience.
Logged In: YES
user_id=661593
The only thing I noticed is that you wrapped syslog into the same
service. I know this is part of what you need, but is it a feature that
should be part of a watchdog switch?
There are also a number of hard coded #defines used in the patch.
This points to a real, but unaddressed problem with AOLserver. The
architecture is wonderful after startup has taken place, modules work
great. However you are left to muck around with the hard coded startup
script to add any basic functionality. Wouldn't it be nice if you had a
command line switch to load basic modules prior to initialization from a
config script, or a switch to load a file, similar to the -B switch? For one,
many nsd/*.c files are really mini-modules. One example being
config.c. Unfortunately config.c cannot be separated because it is used
immediately upon startup. A watchdog module and syslog module are
other examples of what could be mini-modules. I don't know if syslog
could be moved into an after-startup module, but I'm guessing not.
I still don't object to the patch for reasons stated in my last comment: it
allows easy user mode control/watchdog without root access.
However, I guess if you grab port 80 you still need to be root, or sudo?
How are you handling this situation?
Logged In: YES
user_id=95086
Apropos syslog: yes, I think this is very valuable. But not
*really* needed, of course. Does it hurt to get more
info in the syslog file? I do not think so.
Apropos "loading mini-modules": I did not want to put any
other functionality in the watchdog. Not even load Tcl or such.
It is designed to be simple. Yes, #defines are not that
elegant, but I did not have any other idea how to pass them.
Over the command line? It becomes very complicated.
Over the config file? I need to load Tcl to parse the config file, so
I scrapped that. Well, if you have a better idea, yell :)
Apropos "grab port 80": yes, you need to be root. Therefore I
start the watchdog before binding. So the watchdog will run as
root whereas the server process will run as the given user.
Hence the decision to KISS (keep it simple and stupid) and not
muck about with Tcl or anything else.
Logged In: YES
user_id=661593
Okay, so I'll stick my neck out and admit ignorance in a lot of areas
here. For instance, you now have a continuously running process,
running as root. I don't know if this is a security problem or not. Second,
one thing I like about djb's svc is that he guarantees that with each
restart the process gets the same 'clean' process state. I'm not sure
that this is happening here, since somehow the restart doesn't fork
another watchdog, so there must be some state that is maintained?
I wasn't suggesting you put the mini-modules idea into your patch, only
that it is impossible with the current setup to provide customizations
early in server startup.
How does your setup work at machine startup and shutdown?
Logged In: YES
user_id=95086
Apropos security: the watchdog has no connections to the outer
world. You can't do very much with it if you're not root.
And even then, its only task is to monitor one process
and restart it in case it breaks.
Apropos "clean process state": watchdog is a separate
process. All it encompasses is nsd, libnsd and libnsthread.
If you change one of those during runtime: bang. Otherwise
the server process loads whatever modules it needs and it is
hence "clean" so to speak.
Apropos "machine startup": this is entirely out of scope.
The usual start-stop machinery has to be installed. The watchdog
just monitors your server and restarts it
if needed. You can go and tweak all this with external shell
scripts, daemontools and whatnot... The watchdog is just
more convenient: do "-w" on startup and relax. This is no
rocket science. It is done out of the necessity to make
installation/management easy. You can really solve this
issue in a myriad of ways. This is one of them.
Why don't you patch your copy of AS and try it for yourself?
Logged In: YES
user_id=661593
"Why don't you patch your copy of AS and try it for yourself?"
I use daemontools. I know how it works, and it seems to provide the
easiest to use interface. It also seems to provide all the features of this
patch and more. However, you have to be root to install daemontools.
Once it is installed, and the server is setup with the right permissions,
any user of a specific group can control the process, without the need
for members of this group needing even read access to the startup
script.
I'm still wondering about syslog. The patch seems great for a local user
who wants to use a high port, but you need root for lower ports. To use
syslog, you still need root, I think. So why not just get root to set you up
with something else, like sudo for a shell script?
Logged In: YES
user_id=287865
The reason I would like to see this in is that we want to make
AOLserver more palatable to the various distributions as a standard
service. daemontools is great and I will keep using it, but it's
never going to come installed on all distributions (thanks djb!).
Obviously it's easy enough to do from a shell script, as Dossy says, but
some of the signal handling stuff and the ability to easily do an
ns_shutdown -restart in the code is nice.
Also, I noticed looking at the patch that we do not do a couple of
things I was taught a daemon should do (the Stevens "Advanced
Programming in the UNIX Environment" book says you should call
setsid(); chdir("/"); umask(0);). We do setsid() but not the chdir("/")
(so that you won't prevent any filesystems from being unmounted), nor
umask(0). Maybe we should consider these at the same time? (It's all
in ch. 13 of the Stevens book, btw.)
Logged In: YES
user_id=21885
I'm very busy this weekend, so please give me some time
(until next week) to dig up the shell script which I used in
a production environment (not at AOL) for almost 4 years to
run ~60 Netscape Enterprise Servers with great success.
I'll publish the shell script and some brief documentation
on how to use it, so that people can try it out and see how
it works.
Then, let's examine the gap of what has already been and/or
could additionally be implemented in the shell script vs.
what must go into the nsd, and decide how to proceed.
Logged In: YES
user_id=95086
Respected colleagues,
As I said: you can do this trick with a shell script,
with daemontools, and with all kinds of other tools...
Nobody said it can't be done. The question is how
you package it, how easy it is to use, how
it integrates into your environment, etc.
Dossy,
There is absolutely *no* need to go and show the usage
with the shell script. I (and everybody else) know it can be
done. The question is: either you have something built
in for immediate use or you don't. That is: either it
is there, and you use it, or it is not there, and you
have to cook your own solution.
I personally have no vested interest in getting
this into the server distro. I'm maintaining my
own copy of the code anyway, so I really do not care.
The thing is that some people find this a good idea
and they asked me to bring it in. Well, I tried...
To my understanding: either people want this or not.
The patch is here. Do whatever you want with it.
Jeff, your hints about umask/chdir for watchdog are
fine. I'll look into this...
Logged In: YES
user_id=43168
Zoran, using your in-process Watchdog, how do you simply
RESTART the server without kill -9'ing it? By running the
"ns_shutdown -restart" from inside AOLserver? Is there some
way to send it a "please restart" signal from OUTSIDE
AOLserver? If there is (or that can be easily added), then
that would be particularly cool, as it means this
functionality could entirely replace using /etc/inittab.
Jeff, I will have to look up Stevens' daemon
recommendations, I'm not familiar with those. But these
setsid(), chdir(), and umask() issues apply across the board
to AOLserver, not just to this Watchdog patch, right?
Dossy, contrary to Zoran, I would like to see that shell
script of yours. I strongly suspect that putting the
Watchdog functionality into AOLserver itself is going to be
the simpler and more robust cross-platform way to go (think
Windows support, for example), but it would be useful to
compare the two approaches. (Then at least we'd be
comparing apples to apples: this C patch, which Zoran has
used in production for years, and your shell script, which was
also heavily used for real.) Quite possibly, we should
start out by providing BOTH approaches stock with AOLserver.
Logged In: YES
user_id=95086
To restart the server from "inside",
do "ns_shutdown -restart".
To restart the server from "outside"
you can send the server the SIGINT.
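For use from an init script, the two "outside" operations mentioned in this thread (SIGTERM to stop, SIGINT to restart) could be wrapped in a small helper like the one below. This is only a sketch: the function name nsd_signal and the NSD_HOME variable are assumptions for this example, and the pid file location follows the log/nspid.<server> naming used earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper, not part of AOLserver or the patch.
# $1 = server name (selects log/nspid.<server>), $2 = stop | restart
# NSD_HOME is an assumed variable naming the server's home directory.
nsd_signal() {
    pidfile="${NSD_HOME:-.}/log/nspid.$1"
    case $2 in
        stop)    sig=TERM ;;   # server and watchdog both exit
        restart) sig=INT ;;    # per the thread, the watchdog restarts the server
        *)       echo "usage: nsd_signal server stop|restart" >&2
                 return 2 ;;
    esac
    if [ ! -r "$pidfile" ]; then
        echo "no pid file: $pidfile" >&2
        return 1
    fi
    # send the chosen signal to the server process named in the pid file
    kill -s "$sig" "$(cat "$pidfile")"
}
```

With a server started as "bin/nsd -w -t myconfig.tcl", calling "nsd_signal server1 restart" from the server's home directory would then have the same effect as [ns_shutdown -restart], assuming the watchdog behaves as described above.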