#471 a cleanup and overhaul of jboss_init_redhat.sh

open
nobody
None
5
2004-03-09
2004-03-09
H X Pearlmutter
No

DETAILED DESCRIPTION

Background & symptoms
-------------------------
jboss_init_redhat.sh hasn't been maintained in a long
time.

It never worked reliably, especially for shutdown.

Although it did an ok job for simple manual start, and
maybe ok for stop if conditions were right, the original
jboss_init_redhat.sh script was never properly set up to
participate in the unix RC (run control) system.

To participate in RC, use of the /var/lock mechanism is
mandatory.

When run levels are changed, the script needs to be
called from the RC kill loop, which skips any subsystem
that has no lock file:

# First, run the KILL scripts.
for i in /etc/rc$runlevel.d/K* ; do
check_runlevel "$i" || continue

# Check if the subsystem is already up.
subsys=${i#/etc/rc$runlevel.d/K??}
[ -f /var/lock/subsys/$subsys -o -f /var/lock/subsys/$subsys.init ] \
    || continue

Worse yet, if the script is started and then RC changes
the run levels, the script is invoked again and
tries to start up a second time, and we get really hosed with
tons of "java.net.BindException: Address already in use"
messages in the log (or worse).
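
A minimal sketch of the lock-file handshake described above, not the actual script. The names lockfile, launch_server and shutdown_server are hypothetical placeholders, and LOCKDIR and SUBSYS are illustrative variables. Note that the RC loop derives the subsystem name from the K?? link name, so the lock file name has to match it.

```shell
# Sketch only: all function and variable names here are illustrative.
lockfile() {
    echo "${LOCKDIR:-/var/lock/subsys}/${SUBSYS:-jboss}"
}

# Placeholders for the real start and stop logic (assumptions).
launch_server() { :; }
shutdown_server() { :; }

start() {
    # A second "start" (e.g. after a runlevel change) must be a no-op,
    # otherwise a second JVM fights the first one for the same ports.
    if [ -f "$(lockfile)" ]; then
        echo "already started: $(lockfile) exists"
        return 0
    fi
    launch_server || return 1
    touch "$(lockfile)"
}

stop() {
    shutdown_server || return 1
    # Remove the lock only once the server is really down.
    rm -f "$(lockfile)"
}
```

With the lock in place, the RC kill loop finds /var/lock/subsys/$subsys and calls the script with "stop" on a runlevel change; without it, the script is skipped and the server is never shut down.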

~~~

Because the JVM creates a level of indirection not
present in most unix daemons, most of the normal bash
idioms for starting and stopping daemons need to be
rethought in the Java context.

(See the code to get a better idea of the PID problem.)
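
One common form of the PID problem can be sketched as follows. This is an assumption about what is meant, since the script itself is not reproduced here: when the JVM is launched through a wrapper shell, the PID you record is easily the wrapper's and not the JVM's. The helper name start_with_pidfile is hypothetical; the idea is to write the PID and then exec, so that the recorded PID belongs to the final process.

```shell
# Sketch only: start_with_pidfile is a hypothetical helper name.
# Usage: start_with_pidfile <pidfile> <command> [args...]
start_with_pidfile() {
    pidfile=$1
    shift
    # The inner sh writes its own PID ($$), then exec replaces that same
    # process with the target command, so the PID in the file stays valid.
    sh -c 'echo $$ > "$0"; exec "$@"' "$pidfile" "$@" &
}
```

The same trick applies inside a wrapper script you control: if its last line uses exec to start java, no intermediate shell is left between the init script and the JVM.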

~~~
Another problem was that a semi-dead JBoss server
can't really be relied on to shut itself down. For example:

[root@photon init.d]# ./jboss stop
00:41:16,087 WARN [NamingContext] Failed to connect to localhost:1099
javax.naming.CommunicationException: Failed to connect to server localhost:1099. Root exception is
javax.naming.ServiceUnavailableException: Failed to connect to server localhost:1099. Root exception is
java.net.ConnectException: Connection refused
....
at javax.naming.InitialContext.lookup(InitialContext.java:347)
at org.jboss.Shutdown.main(Shutdown.java:180)

If you can't get a connection with which to send the
shutdown request, then there is no internal way to recover.

In general, there is no way to solve the "kill yourself"
problem without OS cooperation... you need a
guaranteed way to fully get a hosed JBoss, JMX, &/or
JVM shutdown & cleaned up (with all sockets released,
etc)... or else you get really really hosed when you try
to start up again... and of course a good clean
shutdown is also a must for good OS citizenship, such as
for when we change runlevels...

We definitely want to have lots of eyeballs looking at
making sure the ShutdownHook in specific, and the
MBean server more generally, are debugged to the
fullest extent within the java code base, but there are a
few special things that can never be done within that
code base, and this is one of them. You can't give
yourself a head transplant, and you can't kill yourself
when you're already comatose. So this is a case where,
as a last resort, we need the OS to put us out of our
misery.
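
The "last resort" idea can be sketched like this; it is an illustration under stated assumptions, not the code from the script. The names is_alive, wait_for_exit and force_stop are hypothetical, and the liveness check assumes a Linux /proc filesystem. SIGTERM still gives the JVM a chance to run its shutdown hooks; SIGKILL does not, so it comes last.

```shell
# Sketch only: is_alive, wait_for_exit and force_stop are hypothetical names.

# True if PID $1 exists and is not a zombie (Linux /proc assumed).
is_alive() {
    state=$(sed -n 's/^State:[[:space:]]*\(.\).*/\1/p' "/proc/$1/status" 2>/dev/null)
    [ -n "$state" ] && [ "$state" != "Z" ]
}

# Poll once per second, up to $2 times, until PID $1 is gone.
wait_for_exit() {
    tries=$2
    while [ "$tries" -gt 0 ]; do
        is_alive "$1" || return 0
        sleep 1
        tries=$((tries - 1))
    done
    ! is_alive "$1"
}

# Escalate: polite SIGTERM first, SIGKILL only if the process ignores it.
# Usage: force_stop <pid> [grace-seconds]
force_stop() {
    pid=$1
    grace=${2:-10}
    is_alive "$pid" || return 0
    kill -TERM "$pid" 2>/dev/null
    wait_for_exit "$pid" "$grace" && return 0
    kill -KILL "$pid" 2>/dev/null
    wait_for_exit "$pid" "$grace"
}
```

In a stop routine this would run only after the normal shutdown request has failed or timed out, so a healthy server is never killed the hard way.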

~~~

# relevant bug reports
# [ #420297 ] JBoss startup/shutdown for RedHat
# [ #546360 ] jboss_init_redhat.sh fails

For a long time, 420297 was the "oldest bug in the
database"... so obviously nobody thought it was a top
priority... so I felt I had to be the one to do it, even
though I'm about as far from a bash fan as you can get.
http://sourceforge.net/tracker/index.php?func=detail&aid=546360&group_id=22866&atid=376685

======================================

Intended behaviour & further development ideas
----------------------------------------------

This "jboss_redhat_rcscript.sh" script seeks to fix all the
above problems.

It is conservative, in that it's pessimistic. It beats the
dead horse way beyond what is probably necessary. But
then, we really do want to make sure it's dead, and
don't much care about a few wasted cycles in doing the
overkill.

~~~

"jboss_redhat_rcscript.sh" seeks to be a good citizen
within the linux RC mechanism, yet it maintains a java-
centric view of the world.

It also looks at the issue of 'what is the best way
to "bounce" jboss?'

Restart won't work well unless stop and start work
reliably.

telinit 4? 5? Well, with the old script a runlevel change often tries to
start JBoss without a prior shutdown (see above). One of the issues is that JBoss is
started in too many runlevels by the earlier
configuration. And the script had no sense of the special
place J2EE/JBoss has in relation to the operating system.

My attitude is expressed in the comments in the code:
# These settings assume we're on a "java oriented" (mainly j2ee) box;
# i.e., higher-level apps typically run on top of java/jboss platform
# (and therefore don't participate in the init.d runcontrol system)
# while other linux processes are typically lower-level infrastructure
# (i.e., Apache & your DBMS are probably the highest level non-java apps).
# Therefore, from a *nix point of view, we start JBoss late & kill it early,
# and run it only in the higher runlevels.

(see script for details).
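
On Red Hat, "start late, kill early, higher runlevels only" is normally expressed through the chkconfig header at the top of the init script, in the form "chkconfig: <runlevels> <start priority> <stop priority>". The values below are illustrative assumptions only; the actual settings are in the script and are not reproduced here.

```shell
#!/bin/sh
#
# Illustrative values only (assumption, not the script's real settings):
# run in levels 3, 4 and 5; start priority 99 (late); stop priority 01 (early).
#
# chkconfig: 345 99 01
# description: JBoss application server
```

With such a header, "chkconfig --add" creates S99 links in the listed runlevels and K01 links in the others, so the RC loops start the script after the lower-level services and stop it before them.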

~~~

This script also attempts to begin to deal with the
question of "what if multiple JBosses are running
independently??" My solution is applicable for 1
autostart/autostop RC daemon, which will in turn control
any number of multiple separate JBoss instances running,
in separate processes.

NOTE THAT THIS DOES *NOT* PREVENT
ADDITIONAL SUBSIDIARY INSTANCES OF JBOSS; it only
integrates a single instance of JBoss into Redhat Linux's
SysV RC system, and this instance's JMX server
(potentially) can in turn spawn & manage other JBoss
instances.

If we want multiple independent JBoss instances to be
separately managed in the Linux RC system, then this
script will have to be substantially expanded, so that the
PID for each JBoss instance is reliably and independently
managed.

======================================

Request for review & testing
---------------------------
It would be good for a bash expert who's intimate with
PIDs to review my work (beware that many standard PID
assumptions don't apply here due to the uncommon way
a JVM relates to Unix), and it'd be nice to have some
people running other flavors of Redhat to try it out...
and then maybe some folks who want to adapt it to
other non-RH flavors of Linux, and other Sys-V style
*nixes.

This needs to be tested both on RH9 and pre-RH9.
Notice that RH9's new NPTL marks a big difference in the
area of processes and threads, so we'd expect different
kinds of gotchas on RH9 vs RH7.x

-------------------------------------------------
-- hxp
(Howard Pearlmutter)

Discussion

  • overhaul of jboss/bin/jboss_init_redhat.sh