From: Anders W. <and...@er...> - 2013-05-31 11:24:38
Summary: osaf: Add time supervision of opensaf_reboot [#437]
Review request for Trac Ticket(s): 437
Peer Reviewer(s): Ramesh
Pull request to:
Affected branch(es): default
Development branch: default
--------------------------------
Impacted area          Impact y/n
--------------------------------
 Docs                      n
 Build system              n
 RPM/packaging             n
 Configuration files       y
 Startup scripts           y
 SAF services              n
 OpenSAF services          n
 Core libraries            y
 Samples                   n
 Tests                     n
 Other                     n
Comments (indicate scope for each "y" above):
---------------------------------------------
changeset d72e135fa6ad35adb9389f1872d84cdbdaf34bfb
Author: Anders Widell <and...@er...>
Date: Fri, 31 May 2013 12:55:28 +0200
osaf: Add common configuration file to configure reboot supervision [#437]
The opensaf reboot supervision time is configured using the environment
variable OSAF_REBOOT_SUPERVISION_TIME. This variable is used by the library
function opensaf_reboot(), as well as by the shell script opensaf_reboot.
Since the library function and the shell script can be called by any
service, the configuration variable must be set in all services. Hence, it
is put in a common configuration file.
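
For illustration only, a minimal sketch of how a service could read the
configured supervision time from its environment. The helper name
reboot_supervision_time() and the "-1 means unset or invalid" convention are
assumptions made for this example; they are not taken from the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper (not from the patch): parse the supervision time,
 * in seconds, from OSAF_REBOOT_SUPERVISION_TIME.
 * Returns -1 if the variable is unset, empty, negative or not a number. */
long reboot_supervision_time(void)
{
	const char *val = getenv("OSAF_REBOOT_SUPERVISION_TIME");
	char *end = NULL;
	long secs;

	if (val == NULL || *val == '\0')
		return -1;

	errno = 0;
	secs = strtol(val, &end, 10);
	/* Reject overflow, trailing garbage and negative values. */
	if (errno != 0 || *end != '\0' || secs < 0)
		return -1;

	return secs;
}
```

Because the value comes from the environment, it only reaches the library
function if the startup script of each service has sourced the common
configuration file before launching the daemon.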
changeset 7dc3c80d2539f0c10b749337e73c420ff756a43a
Author: Anders Widell <and...@er...>
Date: Fri, 31 May 2013 13:04:39 +0200
osaf: Add time supervision of opensaf_reboot [#437]
Add time supervision of the library function opensaf_reboot() as well as
the shell script opensaf_reboot. If the reboot has not happened before the
timeout expires, the OS is rebooted hard using the SysRq trigger
/proc/sysrq-trigger. This makes it possible to reboot the node even when the
system is in a very bad state, for example when fork() fails because the
system is out of resources (no free memory, process table full, etc.). It
also handles the case where the ordinary reboot command hangs while trying
to sync the file system, for example due to a disk or NFS problem.
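
The supervision idea can be sketched as follows. This is not the code in
sysf_def.c; it is a minimal illustration that assumes a SIGALRM-based timer.
The function names write_sysrq() and start_reboot_supervision(), and the
convention that 0 seconds disables supervision, are hypothetical. The handler
uses only async-signal-safe calls and does not fork, so it still works when
the process table is full.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Write one SysRq command character to a trigger file. Uses only
 * async-signal-safe calls, so it may be called from a signal handler.
 * Returns 0 on success, -1 on failure. */
int write_sysrq(const char *path, char cmd)
{
	int fd = open(path, O_WRONLY);
	ssize_t n;

	if (fd < 0)
		return -1;
	n = write(fd, &cmd, 1);
	close(fd);
	return n == 1 ? 0 : -1;
}

/* SIGALRM handler: the ordinary reboot did not complete in time. */
static void supervision_expired(int sig)
{
	(void)sig;
	/* 'b' asks the kernel to reboot immediately, without syncing disks. */
	write_sysrq("/proc/sysrq-trigger", 'b');
}

/* Arm the supervision timer before starting the ordinary reboot.
 * Assumption: seconds == 0 means supervision is disabled.
 * Returns 0 on success, -1 if the handler could not be installed. */
int start_reboot_supervision(unsigned int seconds)
{
	struct sigaction sa;

	if (seconds == 0)
		return 0;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = supervision_expired;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGALRM, &sa, NULL) != 0)
		return -1;

	alarm(seconds);
	return 0;
}
```

A caller would invoke start_reboot_supervision() first and only then try the
ordinary reboot path; if that path hangs or cannot be started, the alarm
fires and the kernel performs the hard reboot.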
Complete diffstat:
------------------
Makefile.am | 3 +++
config/common.conf | 16 ++++++++++++++++
opensaf.spec.in | 1 +
osaf/libs/core/include/ncssysf_def.h | 2 +-
osaf/libs/core/leap/sysf_def.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------
osaf/services/infrastructure/dtms/scripts/osaf-dtm.in | 1 +
osaf/services/infrastructure/fm/fms/scripts/osaf-fmd.in | 1 +
osaf/services/infrastructure/rde/scripts/osaf-rded.in | 1 +
osaf/services/saf/avsv/amfwdog/scripts/osaf-amfwd.in | 1 +
osaf/services/saf/avsv/avd/scripts/osaf-amfd.in | 1 +
osaf/services/saf/avsv/avnd/scripts/osaf-amfnd.in | 1 +
osaf/services/saf/clmsv/clms/scripts/osaf-clmd.in | 1 +
osaf/services/saf/clmsv/nodeagent/scripts/osaf-noded.in | 1 +
osaf/services/saf/cpsv/cpd/scripts/osaf-ckptd.in | 1 +
osaf/services/saf/cpsv/cpnd/scripts/osaf-ckptnd.in | 1 +
osaf/services/saf/edsv/eds/scripts/osaf-evtd.in | 1 +
osaf/services/saf/glsv/gld/scripts/osaf-lckd.in | 1 +
osaf/services/saf/glsv/glnd/scripts/osaf-lcknd.in | 1 +
osaf/services/saf/immsv/immd/scripts/osaf-immd.in | 1 +
osaf/services/saf/immsv/immnd/scripts/osaf-immnd.in | 1 +
osaf/services/saf/logsv/lgs/scripts/osaf-logd.in | 1 +
osaf/services/saf/mqsv/mqd/scripts/osaf-msgd.in | 1 +
osaf/services/saf/mqsv/mqnd/scripts/osaf-msgnd.in | 1 +
osaf/services/saf/ntfsv/ntfs/scripts/osaf-ntfd.in | 1 +
osaf/services/saf/plmsv/plms/scripts/osaf-plmd.in | 1 +
osaf/services/saf/smfsv/smfd/scripts/osaf-smfd.in | 1 +
osaf/services/saf/smfsv/smfnd/scripts/osaf-smfnd.in | 1 +
scripts/opensaf_reboot | 10 +++++++++-
28 files changed, 118 insertions(+), 9 deletions(-)
Testing Commands:
-----------------
Test the opensaf_reboot() supervision by causing execution of the opensaf_reboot
shell script to fail (e.g. by removing it). Test the supervision of the
opensaf_reboot shell script by causing "reboot -f" to fail (e.g. by removing the
reboot command).
Testing, Expected Results:
--------------------------
The node should reboot after 60 seconds in both test cases mentioned above.
Conditions of Submission:
-------------------------
Ack from Ramesh
Arch        Built    Started    Linux distro
-------------------------------------------
mips          n         n
mips64        n         n
x86           n         n
x86_64        y         y
powerpc       n         n
powerpc64     n         n
Reviewer Checklist:
-------------------
[Submitters: make sure that your review doesn't trigger any checkmarks!]
Your checkin has not passed review because (see checked entries):
___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.
___ You have failed to nominate the proper persons for review and push.
___ Your patches do not have proper short+long header
___ You have grammar/spelling in your header that is unacceptable.
___ You have exceeded a sensible line length in your headers/comments/text.
___ You have failed to put in a proper Trac Ticket # into your commits.
___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)
___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.
___ You have ^M present in some of your files. These have to be removed.
___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.
___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.
___ You need to refactor your submission into logical chunks; there is
too much content in a single commit.
___ You have extraneous garbage in your review (merge commits etc)
___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.
___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.
___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.
___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.
___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)
___ Your computer has a badly configured date and time, confusing the
threaded patch review.
___ Your changes affect the IPC mechanism, and you don't present any results
for in-service upgradability test.
___ Your changes affect the user manual and documentation, but your patch
series does not contain the patch that updates the Doxygen manual.