Patch highlights:
- Added support for 32-bit PowerPC. I added this platform for my P2020.
- Service recovery similar to the one in Windows: failures are caught, and the action taken depends on a failure counter.
- Number of restarts limited within a given period of time
- Backtrace in the signal handler (this made it possible to improve the PowerPC parts)
- Process ping (heartbeat) support
Some remarks on the patch:
- Makefile : changes needed for backtrace
- pcd/include/errlog.h : changes for our logs
- pcd/include/except.h : for backtrace and for the PowerPC platform
- pcd/include/parser.h : additions for the new rules
- pcd/include/pcd_api.h : added process_ping for hung app detection
- pcd/include/pcd.h : added a level to logs
- pcd/include/rules_db.h : additions for the new rules management
- pcd/include/rulestate.h : added a hung state for rules
- pcd/include/timer.h : additions for postponed start and for the alive timeout used by the hung state
- pcd/src/except.c : added a switch/case on the signal, a backtrace dump, a log file for post-mortem analysis, and a PowerPC register dump
- pcd/src/failact.c : added new rules
- pcd/src/main.c : logging changes only
- pcd/src/parser/src/parser.c and pcd/src/parser.c : added a directory scan for *.pcd config files (as xinetd does) and parsing of the new rules
- pcd/src/pcdapi/include/pcdapi.h : added send_process_ping and removed the redundant PCD_API_REGISTER_EXCEPTION_HANDLERS
- pcd/src/pcdapi/src/Makefile : modification for ifdef
- pcd/src/pcdapi/src/pcdapi.c : added PCD_api_send_process_ping(), the backtrace file and platform-specific context
- pcd/src/pcd_api.c : added process ping management
- pcd/src/process.c : added platform-specific context for backtrace
- pcd/src/timer.c : added timer management for failures and restart
- scripts/configs/PCDConfig.in : added the 32-bit PowerPC platform
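The directory scan for *.pcd files mentioned for parser.c could look roughly like the sketch below. This is an illustration only, not the patch code: has_pcd_suffix() and scan_pcd_dir() are hypothetical names, and the callback stands in for whatever the parser does with each file. It uses the POSIX opendir()/readdir() calls, which return entries in no particular order.

```c
#include <assert.h>
#include <dirent.h>
#include <stdbool.h>
#include <string.h>

/* Return true for names ending in ".pcd" with a non-empty base name. */
static bool has_pcd_suffix(const char *name)
{
    assert(name != NULL);
    size_t len = strlen(name);
    return len > 4 && strcmp(name + len - 4, ".pcd") == 0;
}

/* Hypothetical helper: call visit(dir, name, ctx) for every *.pcd entry in dir.
 * Returns the number of matching entries, or -1 if dir cannot be opened. */
static int scan_pcd_dir(const char *dir,
                        void (*visit)(const char *dir, const char *name, void *ctx),
                        void *ctx)
{
    assert(dir != NULL);
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int count = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (!has_pcd_suffix(e->d_name))
            continue;
        if (visit != NULL)
            visit(dir, e->d_name, ctx);
        count++;
    }
    closedir(d);
    return count;
}
```

If a deterministic load order matters, the matching names would have to be collected and sorted before parsing.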
And this is the header of the PCD config file:
#################################################################
# Index of the rule RULE = <GROUP>_<DESCRIPTION>[$]
# Condition to start rule, existence of one of the following
# START_COND = {NONE | FILE,[filename] | RULE_COMPLETED,[rule],..
# | NET_DEVICE,[netdev] | IPC_OWNER,[owner] | ENV_VAR,[variable,value]}
# Command with parameters
# COMMAND = <Full path> [parameters] [$variable]
# Scheduling (priority) of the process
# SCHED = {NICE,-19..19 | FIFO,1..99}
# Daemon flag - Process must not end
# DAEMON = {YES | NO}
# Condition to end rule and move to next rule, wait for:
# END_COND = {NONE | FILE,[filename] | NET_DEVICE,[netdevice]
# | WAIT,[delay] | EXIT,[status] | IPC_OWNER,[owner]}
# Timeout for end condition. Fail if timeout expires
# END_COND_TIMEOUT = {-1 | 0..99999}
# Action upon first failure, do one of the following actions. Only RESTART passes to the next action.
# FAILURE_ACTION = {NONE | REBOOT | RESTART | EXEC_RULE}
# If RESTART is followed by a number, it is the number of restarts allowed in FAILURE_RESET_COUNT hour(s).
# After this number of failures, the action becomes NONE permanently.
# Action upon second failure (only if the first one is RESTART), do one of the following actions
# FAILURE_ACTION2 = {NONE | REBOOT | RESTART | EXEC_RULE}
# Action upon subsequent failures (only if the second one is RESTART), do one of the following actions
# FAILURE_ACTION_NEXT = {NONE | REBOOT | RESTART | EXEC_RULE}
# Interval in seconds between a failure and the next failure action.
# FAILURE_INTERVAL = (0..4294967)
# Time in hours before resetting the failure counter. 0: use only the first failure action.
# FAILURE_RESET_COUNT = (0..1193)
# Active rule, start automatically or manually
# ACTIVE = {YES | NO}
# User id for the process
# USER = { UID | User name }
################################################################
Regards,
Marc