On the assumption that the affected site runs on Linux, one thing to
check is the "OOM Killer". When all memory and swap space is
committed and the kernel is desperate to free some memory, it will
pick a large process and terminate it to reclaim its memory. Tomcat
tends to be large and in many ways looks like a good candidate for
reclamation. When the OOM Killer strikes, the process has no
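warning and its logs won't show anything. The kernel does normally
log such events, though: look for "Out of memory" or "oom-killer" in
the output of 'dmesg' or in the system log. 'top' will also give you
some useful numbers on memory and swap usage.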
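
One way to check for this is to scan the kernel log for OOM Killer
messages. The sketch below is only an illustration: the names
find_oom_events and scan_dmesg and the marker strings are my own
choices, the exact message wording varies between kernel versions, and
reading 'dmesg' may require root on some systems.

```python
import subprocess

# Substrings that commonly show up in kernel messages about OOM kills.
# Assumption: wording differs between kernel versions, so match loosely.
OOM_MARKERS = ("out of memory", "oom-killer", "oom_kill")


def find_oom_events(lines):
    """Return the log lines that look like OOM Killer activity."""
    hits = []
    for line in lines:
        lowered = line.lower()
        if any(marker in lowered for marker in OOM_MARKERS):
            hits.append(line.rstrip("\n"))
    return hits


def scan_dmesg():
    """Run 'dmesg' and return any lines that look like OOM kills."""
    # 'dmesg' prints the kernel ring buffer; it may need root privileges.
    result = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=False
    )
    return find_oom_events(result.stdout.splitlines())
```

Calling scan_dmesg() shortly after Tomcat vanishes should show whether
the kernel killed it. The ring buffer is finite, so on a busy machine
older events may only survive in the system log files.
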
If this is happening, then you need one or more of several things:
o reduce Tomcat's memory demand significantly. There is a lot of
advice out there about sizing Tomcat's memory, some of it better
than the rest. If Tomcat's heap bloats up to tremendous size and
then sheds much committed space during infrequent garbage
collections, try reducing the heap size until you get small garbage
collections at intervals of a few minutes. Any more heap than that
is probably unnecessary, unless your load is *very* bursty. Small
GCs are cheap and we can afford to have them frequently.
I try to keep the heap startup size (-Xms) slightly larger than the
steady-state committed size and the heap maximum (-Xmx) about twice that.
A monitor such as Psi Probe helps to get these numbers, and is good
to glance at regularly to understand loads and trends.
o migrate some other large services to some other machine, to free up
memory.
o install more memory. You can temporarily alleviate the problem
with more swap space, but swapping is slow. Normally the swap
space should be very little used. Add memory until every essential
process fits, you have a comfortable margin in free space and I/O
buffers, and swap space is mostly or entirely unused.
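
The heap sizing rule of thumb in the first item above (startup size
slightly above the steady-state committed size, maximum about twice
that) can be sketched as a small calculation. This is a rough sketch
under my own assumptions: the function name suggest_heap_flags is
hypothetical, "slightly larger" is taken to mean a 10% margin, and
"twice that" is read as twice the startup size. -Xms and -Xmx are the
standard JVM options for initial and maximum heap size.

```python
def suggest_heap_flags(steady_state_mb, margin=0.10):
    """Suggest JVM heap flags from a measured steady-state heap size.

    steady_state_mb: committed heap (in MB) observed under normal load,
    for example from a monitor such as Psi Probe.
    margin: assumed meaning of "slightly larger" (10% by default).
    """
    if steady_state_mb <= 0:
        raise ValueError("steady_state_mb must be positive")
    # Startup size: steady-state size plus the assumed margin.
    initial_mb = int(round(steady_state_mb * (1 + margin)))
    # Maximum: assumed to be twice the startup size.
    maximum_mb = 2 * initial_mb
    return f"-Xms{initial_mb}m -Xmx{maximum_mb}m"


print(suggest_heap_flags(400))
```

For a measured steady state of 400 MB this prints
"-Xms440m -Xmx880m". Such flags are usually passed to Tomcat through
CATALINA_OPTS, for instance in bin/setenv.sh. Treat the result as a
starting point and adjust it after watching the GC behaviour.
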
Other things to look for would be hardware errors that destroy the
process somehow. Is the kernel logging "oops"es, severe I/O errors,
or the like? You might have memory or interconnect issues.
Mark H. Wood, Lead System Programmer mwood@...
Machines should not be friendly. Machines should be obedient.