In the 'System and server status' module, it would be
good to be able to monitor processes that have
continuously used more than a specified
percentage of the CPU for a defined period of time.
For example, if PID 9999 reports more than 90% processor
utilisation over a 20-minute period, then an alert
is triggered.
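A minimal sketch of the requested check, written in Python for illustration rather than as a Webmin monitor. It assumes each poll yields a mapping of PID to CPU percentage; the names `HogTracker` and `update` are hypothetical and do not come from any attached script.

```python
from typing import Dict, List


class HogTracker:
    """Track how long each PID has stayed above a CPU threshold."""

    def __init__(self, cpu_threshold: float, time_threshold: float) -> None:
        self.cpu_threshold = cpu_threshold      # percent, e.g. 90
        self.time_threshold = time_threshold    # seconds, e.g. 20 * 60
        self.first_seen: Dict[int, float] = {}  # pid -> time it first exceeded

    def update(self, samples: Dict[int, float], now: float) -> List[int]:
        """Record one poll and return PIDs over the threshold for the full period."""
        # Forget PIDs that exited or dropped back to or under the threshold.
        for pid in list(self.first_seen):
            if samples.get(pid, 0.0) <= self.cpu_threshold:
                del self.first_seen[pid]
        alerts = []
        for pid, cpu in samples.items():
            if cpu > self.cpu_threshold:
                # Keep the earliest time this PID was seen above the threshold.
                start = self.first_seen.setdefault(pid, now)
                if now - start >= self.time_threshold:
                    alerts.append(pid)
        return sorted(alerts)
```

With a 90% threshold and a 1200-second period, PID 9999 would be reported only once it has been above 90% on every poll for 20 minutes. The sketch does not guard against a PID being reused by a different process between polls.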
Logged In: YES
user_id=635054
In addition, it may be handy to be able to narrow the check
down to a particular process name, or to all processes EXCEPT a
particular name (e.g. only look for looping sendmail processes,
or don't count ldap processes).
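The match / don't-match filter could look roughly like this. This is only a sketch in Python: it assumes processes are given as (PID, command) pairs and that the name is compared as a plain substring, which the request does not specify. `select_processes` is a hypothetical name.

```python
from typing import Iterable, List, Tuple


def select_processes(
    procs: Iterable[Tuple[int, str]], name: str = "", invert: bool = False
) -> List[int]:
    """Return PIDs to check: all, only those matching name, or all except them."""
    selected = []
    for pid, command in procs:
        if not name:
            # No name given: every process is checked.
            selected.append(pid)
            continue
        matches = name in command  # substring match is an assumption
        if matches != invert:
            selected.append(pid)
    return selected
```

Calling it with `name="sendmail"` keeps only sendmail processes, while `name="ldap", invert=True` checks everything except ldap processes.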
Logged In: YES
user_id=635054
I have attached a monitor which will hopefully achieve the
desired result. Perhaps it could be included in the status
module?
The relevant text values from the lang/en file are included here:
<snip lang/en>
proccpu_pid=These process(s) have used more than $1% cpu
for more than $2 seconds $3
proccpu_cmd=Optional process name
proccpu_not=Include process which
proccpu_not0=Match
proccpu_not1=Don't match
proccpu_cputhresh=Processor percentage threshold
proccpu_timethresh=Time period threshold in seconds
proccpu_ecputhresh=Missing or invalid threshold percentage
proccpu_etimethresh=Missing or invalid time threshold
</snip>
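To show how the `$1`, `$2` and `$3` placeholders in `proccpu_pid` are meant to be filled, here is a small Python sketch of positional substitution. It only illustrates the idea and is not Webmin's own text-handling code; `fill` is a hypothetical helper, and the example values are made up.

```python
import re


def fill(template: str, *args: object) -> str:
    """Replace $1, $2, ... with the corresponding positional argument."""

    def repl(match: "re.Match[str]") -> str:
        index = int(match.group(1)) - 1
        # Leave the placeholder untouched if no argument was supplied for it.
        return str(args[index]) if 0 <= index < len(args) else match.group(0)

    return re.sub(r"\$(\d+)", repl, template)


message = fill(
    "These process(s) have used more than $1% cpu for more than $2 seconds $3",
    90,
    1200,
    "(9999)",
)
```

Here `$1` is the CPU threshold, `$2` the time threshold in seconds, and `$3` is assumed to be the list of offending PIDs.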
Logged In: YES
user_id=635054
Mmm, this monitor might need to be modified to deal with multiple
instances of the monitor with different parameters,
e.g. one instance looking for 'ldap' processes using more than
50% cpu for 10 minutes and another looking for any process
using more than 80% cpu for 30 minutes.
Perhaps creating a unique file for each monitor would solve
this problem, but we would probably want to clean up this file
if the monitor instance was deleted. Maybe a hook to call an
optional entry point in a monitor script when it is deleted
would help with this?
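One way to picture the per-instance file and the delete hook, sketched in Python. The file naming scheme, the `STATE_DIR` location and the function names `state_path` and `on_delete` are all assumptions for illustration; the monitor ID is assumed to be a simple alphanumeric string.

```python
import os
import tempfile

# Assumed location for state files; a real monitor would use its own directory.
STATE_DIR = tempfile.gettempdir()


def state_path(monitor_id: str, state_dir: str = STATE_DIR) -> str:
    """Return a state file path that is unique to one monitor instance."""
    if not monitor_id.isalnum():
        raise ValueError("monitor_id must be alphanumeric")
    return os.path.join(state_dir, f"proccpu-{monitor_id}.state")


def on_delete(monitor_id: str, state_dir: str = STATE_DIR) -> bool:
    """Hypothetical hook run when a monitor is deleted: remove its state file."""
    try:
        os.remove(state_path(monitor_id, state_dir))
        return True
    except FileNotFoundError:
        # Nothing to clean up; the monitor never wrote any state.
        return False
```

Because the monitor ID is part of the file name, an 'ldap' monitor and a catch-all monitor would keep separate process lists, and deleting one would not disturb the other.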
Logged In: YES
user_id=635054
I have attached a new version that allows you to select
processes which do or don't match a given name AND makes
the monitor process list file unique to allow you to have
multiple instances of the monitor AND allows you to terminate
processes which may be looping according to the thresholds.
I also modified save_mon.cgi to call a delete and save
subroutine in the script (See last two subroutines in attached
file), to provide a way of cleaning up these files when you
change or delete a monitor.
I have also added two new English strings.
The new lang/en section is as follows:
<snip>
proccpu_pid=These process(s) have used more than $1% cpu
for more than $2 seconds $3
proccpu_cmd=Optional process name
proccpu_not=Include processes which
proccpu_not0=Match
proccpu_not1=Don't match
proccpu_cputhresh=Processor percentage threshold
proccpu_timethresh=Time period threshold in seconds
proccpu_ecputhresh=Missing or invalid threshold percentage
proccpu_etimethresh=Missing or invalid time threshold
proccpu_kill=Kill processes
proccpu_kill0=no
proccpu_kill1=yes
</snip>
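The `proccpu_ecputhresh` and `proccpu_etimethresh` strings imply that both thresholds are validated when the monitor is saved. A rough Python sketch of such a check follows; the form field names `cputhresh` and `timethresh`, the accepted ranges, and the function name `validate` are assumptions, not taken from the attached script.

```python
from typing import Dict, List


def validate(form: Dict[str, str]) -> List[str]:
    """Return the lang/en keys of the error messages that apply to the form."""
    errors = []
    try:
        cpu = float(form.get("cputhresh", ""))
        # Assumed range: a percentage greater than 0 and at most 100.
        if not 0 < cpu <= 100:
            raise ValueError("out of range")
    except ValueError:
        errors.append("proccpu_ecputhresh")
    seconds = form.get("timethresh", "")
    # Assumed rule: a positive whole number of seconds.
    if not (seconds.isdigit() and int(seconds) > 0):
        errors.append("proccpu_etimethresh")
    return errors
```

On systems that report per-process CPU above 100% on multi-core machines, the upper bound would need to be relaxed.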
Any chance of getting this monitor integrated into the system
status module?
Processor hog monitor script latest version
Logged In: YES
user_id=635054
Jamie, do you have any comments about this? Are you open
to code submissions into Webmin?