This defect report concerns the OpenSAF HiSV service, a program that makes calls into OpenHPI. For reference, I'm running OpenHPI 2.11.1 on c-Class with the OA Soap plugin. The OpenSAF version is 1.2.1.
HiSV makes calls into OpenHPI whenever a server blade is powered-up, powered-down, inserted, or removed. In the case of an existing blade powering-up or powering-down, or being removed, HiSV does not run into any issues with the OpenHPI calls it is making. Only when a server blade is inserted are there problems.
When a server blade is inserted into the c-Class enclosure, two events are generated, about a minute apart.
The first event occurs as the new blade goes from the "NOT-PRESENT" state to the "INSERTION-PENDING" state. The second event occurs about a minute later as the blade goes from the "INSERTION-PENDING" state to the "ACTIVE" state.
HiSV makes OpenHPI calls on the new resource after receiving the first (INSERTION-PENDING) event, and some of these calls fail.
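To make the two-event sequence concrete, here is a minimal sketch of how a client could tell the two insertion events apart. The enum and function names are hypothetical stand-ins invented for this illustration; real code would use the hot-swap state values from SaHpi.h and the event data delivered by OpenHPI. The rule that inventory reads wait for the second event is only an assumption drawn from the failures described in this report, not HiSV's actual logic.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the hot-swap states named in this report. */
enum sketch_hs_state {
    SKETCH_HS_NOT_PRESENT,
    SKETCH_HS_INSERTION_PENDING,
    SKETCH_HS_ACTIVE
};

/* Which of the two insertion events (if either) a transition represents. */
enum sketch_insert_phase {
    SKETCH_PHASE_OTHER,
    SKETCH_PHASE_FIRST_EVENT,   /* NOT-PRESENT -> INSERTION-PENDING */
    SKETCH_PHASE_SECOND_EVENT   /* INSERTION-PENDING -> ACTIVE */
};

/* Classify a hot-swap transition by its previous and current state. */
static enum sketch_insert_phase
sketch_classify(enum sketch_hs_state previous, enum sketch_hs_state current)
{
    if (previous == SKETCH_HS_NOT_PRESENT &&
        current == SKETCH_HS_INSERTION_PENDING)
        return SKETCH_PHASE_FIRST_EVENT;
    if (previous == SKETCH_HS_INSERTION_PENDING &&
        current == SKETCH_HS_ACTIVE)
        return SKETCH_PHASE_SECOND_EVENT;
    return SKETCH_PHASE_OTHER;
}

/*
 * Assumption for this sketch only: inventory reads are attempted once the
 * second event has arrived, because they failed after the first event.
 */
static bool sketch_inventory_read_allowed(enum sketch_insert_phase phase)
{
    return phase == SKETCH_PHASE_SECOND_EVENT;
}
```

Note that a call such as saHpiHotSwapPolicyCancel() is only meaningful while the resource is still pending, so deferring it in this way would not be appropriate; the sketch applies to inventory reads only.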
The first OpenHPI call that fails is:
saHpiHotSwapPolicyCancel()
which fails with a return value of:
-1002 (SA_ERR_HPI_UNSUPPORTED_API)
OpenHPI should never return this value on a managed hotswap device.
The second OpenHPI call that fails is:
saHpiIdrAreaHeaderGet()
which fails with a return value of:
-1011 (SA_ERR_HPI_NOT_PRESENT)
Note that if I add a sleep of 60 seconds in HiSV after receiving the first (INSERTION-PENDING) event, but prior to making these OpenHPI calls, then the calls do succeed. This implies some kind of race condition in OpenHPI. Also note that when powering-up an existing blade, the same two events are generated and HiSV makes these same OpenHPI calls on the powered-up resource, but no failures occur. I suspect that on blade insertion, the OpenHPI data structures do not yet have full integrity at the time the first (INSERTION-PENDING) event is generated, and this causes the above OpenHPI calls that HiSV is making to fail. The OA Soap plugin should perhaps force a re-discovery of the blade prior to sending out the first event, so that the internal OpenHPI data structures will have full integrity if a client application (such as HiSV) starts making OpenHPI calls on the new resource after receiving the first (INSERTION-PENDING) event.
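As an illustration of the client-side workaround, the fixed 60-second sleep could be replaced by a bounded retry that repeats a call only while it reports SA_ERR_HPI_NOT_PRESENT. The following is a minimal sketch under stated assumptions, not HiSV code: the callback types, helper names, and the fake operation are hypothetical, and the error value is defined locally from the number quoted in this report (real code would take it from SaHpi.h and wrap an actual call such as saHpiIdrAreaHeaderGet()).

```c
#include <assert.h>
#include <stddef.h>

/* Values as quoted in this report; SaHpi.h is the authoritative source. */
#define SKETCH_OK               0
#define SKETCH_ERR_NOT_PRESENT  (-1011)

/* Hypothetical callback: wraps one OpenHPI call and returns its result. */
typedef int (*sketch_op_fn)(void *ctx);
/* Hypothetical wait hook, so the caller decides how to sleep between tries. */
typedef void (*sketch_wait_fn)(unsigned seconds);

/*
 * Call op up to max_attempts times, retrying only while it returns
 * NOT_PRESENT. Any other result (success or a different error) is
 * returned immediately. Returns the last result seen.
 */
static int sketch_retry_while_not_present(sketch_op_fn op, void *ctx,
                                          unsigned max_attempts,
                                          unsigned delay_s,
                                          sketch_wait_fn wait)
{
    int rv = SKETCH_ERR_NOT_PRESENT;
    unsigned attempt;

    assert(op != NULL);
    for (attempt = 0; attempt < max_attempts; attempt++) {
        rv = op(ctx);
        if (rv != SKETCH_ERR_NOT_PRESENT)
            break;
        /* Wait only if another attempt will follow. */
        if (wait != NULL && attempt + 1 < max_attempts)
            wait(delay_s);
    }
    return rv;
}

/*
 * Stand-in for a real inventory read: fails with NOT_PRESENT until the
 * counter pointed to by ctx reaches zero, then succeeds.
 */
static int sketch_fake_idr_read(void *ctx)
{
    int *remaining_failures = ctx;

    if (*remaining_failures > 0) {
        (*remaining_failures)--;
        return SKETCH_ERR_NOT_PRESENT;
    }
    return SKETCH_OK;
}
```

A retry of this kind only hides the race from the client. It would not help with the SA_ERR_HPI_UNSUPPORTED_API result, which is not a timing-dependent value and is therefore returned to the caller without retrying.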
Also, the same situation is true for switch interconnects (on c-Class) as they also support the managed hotswap model. So a solution to the problem described here for server blades should also be applied to switch interconnects.
It is important that these defects are fixed; otherwise OpenSAF/HiSV will not work properly on c-Class enclosures.
Logged In: YES
user_id=814412
Originator: NO
Note that this tracker issue really addresses two problems:
1) Returning SA_ERR_HPI_UNSUPPORTED_API
* Duplicate of: [ 2021749 ] Correct plugin return values for [In]ActiveSet, PolicyCancel
* Fixed in commit 6852
2) The timing of the insertion event
* Fixed in commit 6871
This entire issue is now closed. Requesting (as much as possible) that defect reports don't combine unrelated problems.