From: SourceForge.net <no...@so...> - 2013-04-15 15:06:34
Bugs item #3610943, was opened at 2013-04-15 08:06
Message generated for change (Tracker Item Submitted) made by mbaldessari
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=532251&aid=3610943&group_id=71730

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: HP c-Class Plugin
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Michele Baldessari (mbaldessari)
Assigned to: dr_mohan (dr_mohan)
Summary: openhpi OA switchover issues

Initial Comment:
Hi,

with 3.2.0 we seem to be able to observe the following phenomenon in a BladeSystem c7000 Enclosure G2 configured as follows (note that ENCLOSURE_IP_MODE and LLF are non-default settings):

SET OA NAME 1 hp-bladectr-1
SET IPCONFIG STATIC 1 10.65.209.28 255.255.252.0 10.65.211.254 10.65.255.201 10.11.255.27
SET NIC AUTO 1
SET OA NAME 2 hp-bladectr-1-bkp
SET IPCONFIG STATIC 2 10.65.209.33 255.255.252.0 10.65.211.254 10.65.255.201 10.11.255.27
SET NIC AUTO 2
ENABLE ENCLOSURE_IP_MODE
SET LLF INTERVAL 60
ENABLE LLF
DISABLE ROUTER ADVERTISEMENTS
DISABLE DHCPV6
DISABLE IPV6

OA Firmware Ver. : 3.71 Dec 07 2012

Once openhpi 3.2.0 is started (configured with only the active IP, since ENCLOSURE_IP_MODE is set) and an OA switchover is forced, at least one thread gets stuck constantly trying to rediscover the STANDBY OA, as shown below (note that the issue takes a few tries to reproduce; it does not happen every time):

oa_soap: CRIT: oa_soap_callsupport.c:978: OA SOAP error 139: Not a valid request while running in standby mode.
oa_soap: CRIT: oa_soap_re_discover.c:695: Get blade info failed
oa_soap: CRIT: oa_soap_re_discover.c:172: Re-discovery of server blade failed
oa_soap: CRIT: oa_soap_event.c:417: Re-discovery failed for OA 10.65.209.33

After that, one of two things happens:
- At least once I observed that openhpi stopped responding (no more events processed?)
- Most of the time openhpi keeps working, but the thread polling the standby OA never stops: it keeps trying to rediscover the STANDBY OA and fills up the logs with errors.

I will attach the full log to the case. Below are the steps I took.

regards,
Michele

1) Run openhpid under gdb:
gdb /usr/local/sbin/openhpid
Reading symbols from /usr/local/sbin/openhpid...done.
(gdb) set args -c /etc/openhpi/openhpi.conf -n -v >& /tmp/3.2.0-20130413.log
(gdb) r
Wait for the discovery to finish, i.e. until we see only HEARTBEAT events.
Force the takeover (done at about Fri Apr 12 22:01:20 CEST 2013).

2) We see the failure at line 20751 of the log, shortly after the OA switchover done via "force takeover":
oa_soap: DBG: oa_soap_event.c:258: getAllEvents call failed, may be due to OA switchover
oa_soap: DBG: oa_soap_event.c:259: Re-try the getAllEvents SOAP call
oa_soap: DBG: oa_soap_callsupport.c:669: OA request(1):
POST /hpoa HTTP/1.1
Host: 10.65.209.33:443 <-- here we try the standby OA right away!!!
This is dangerous, as we don't yet know which OA has that IP.

We ask the standby OA the following:
oa_soap: DBG: oa_soap_callsupport.c:680: OA request(2):
<?xml version="1.0"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://www.w3.org/2003/05/soap-envelope" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:wsu="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd" xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd" xmlns:hpoa="hpoa.xsd">
<SOAP-ENV:Header><wsse:Security SOAP-ENV:mustUnderstand="true">
<hpoa:HpOaSessionKeyToken>
<hpoa:oaSessionKey>b518615c20ceb860</hpoa:oaSessionKey>
</hpoa:HpOaSessionKeyToken>
</wsse:Security>
</SOAP-ENV:Header>
<SOAP-ENV:Body>
<hpoa:getAllEvents><hpoa:pid>2837</hpoa:pid><hpoa:waitTilEventHappens>1</hpoa:waitTilEventHappens><hpoa:lcdEvents>0</hpoa:lcdEvents></hpoa:getAllEvents>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>

and we get:
oa_soap: DBG: oa_soap_callsupport.c:708: OA response(0):
HTTP/1.1 500 Internal Server Error
Date: Sat, 13 Apr 2013 08:57:02 GMT
Server: Apache
Connection: close
Content-Length: 1299
Content-Type: application/soap+xml; charset=utf-8

oa_soap: DBG: oa_soap_callsupport.c:728: OA response(1):
<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://www.w3.org/2003/05/soap-envelope" xmlns:SOAP-ENC="http://www.w3.org/2003/05/soap-encoding" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:wsu="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd" xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd" xmlns:hpoa="hpoa.xsd">
<SOAP-ENV:Header>
<wsse:Security>
<hpoa:HpOaSessionKeyToken>
<hpoa:oaSessionKey>b518615c20ceb860</hpoa:oaSessionKey>
</hpoa:HpOaSessionKeyToken>
</wsse:Security>
</SOAP-ENV:Header>
<SOAP-ENV:Body>
<SOAP-ENV:Fault>
<SOAP-ENV:Code>
<SOAP-ENV:Value>SOAP-ENV:Receiver</SOAP-ENV:Value>
</SOAP-ENV:Code>
<SOAP-ENV:Reason>
<SOAP-ENV:Text>Onboard Administrator Error</SOAP-ENV:Text>
</SOAP-ENV:Reason>
<SOAP-ENV:Detail>
<hpoa:faultInfo>
<hpoa:errorType>ONBOARD_ADMINISTRATOR</hpoa:errorType>
<hpoa:errorCode>201</hpoa:errorCode>
<hpoa:operationName>getAllEvents</hpoa:operationName>
<hpoa:errorText>Could not open event pipe for reading.</hpoa:errorText>
</hpoa:faultInfo>
</SOAP-ENV:Detail>
</SOAP-ENV:Fault>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>

After a couple of these, around line 20898 we send:
Host: 10.65.209.33:443
<hpoa:getOaStatus><hpoa:bayNumber>2</hpoa:bayNumber></hpoa:getOaStatus>
<-- So here we seem to ask IP address 10.65.209.33 (is it still the IP of bay 2? probably not) for the status of OA nr. 2,

and we get:
<hpoa:getOaStatusResponse>
<hpoa:oaStatus>
<hpoa:bayNumber>2</hpoa:bayNumber>
<hpoa:oaName>hp-bladectr-1-bkp</hpoa:oaName>
<hpoa:oaRole>ACTIVE</hpoa:oaRole> <-- We get the reply that OA nr. 2 is now ACTIVE (this is probably OA 1 telling us)

oa_soap: CRIT: oa_soap_utils.c:702: OA 10.65.209.33 has become Active (line 20977)

So my fear is that we opened a connection to 10.65.209.33 and it correctly tells us that OA 2 is ACTIVE, but we keep connecting to OA 1 (because we probably still have the wrong IP address).

Then comes the re-discovery, which I think confirms my theory:
21028 oa_soap: CRIT: oa_soap_re_discover.c:140: Re-discovery started
21029 oa_soap: DBG: oa_soap_callsupport.c:669: OA request(1):
21030 POST /hpoa HTTP/1.1
21031 Host: 10.65.209.33:443 <-- Re-discovery happens by talking to this IP, which at this point probably belongs to OA nr. 1?
21032 Content-Type: application/soap+xml; charset="utf-8"
21033 Content-Length: 749
21034
21035
21036
21037 oa_soap: DBG: oa_soap_callsupport.c:680: OA request(2):
21038 <?xml version="1.0"?>
21039 <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://www.w3.org/2003/05/soap-envelope" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:wsu="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd" xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd" xmlns:hpoa="hpoa.xsd">
21040 <SOAP-ENV:Header><wsse:Security SOAP-ENV:mustUnderstand="true">
21041 <hpoa:HpOaSessionKeyToken>
21042 <hpoa:oaSessionKey>b518615c20ceb860</hpoa:oaSessionKey>
21043 </hpoa:HpOaSessionKeyToken>
21044 </wsse:Security>
21045 </SOAP-ENV:Header>
21046 <SOAP-ENV:Body>
21047 <hpoa:getBladeInfo><hpoa:bayNumber>1</hpoa:bayNumber></hpoa:getBladeInfo>
21048 </SOAP-ENV:Body>
21049 </SOAP-ENV:Envelope>

And at this point the OA tells us off:
21052 oa_soap: DBG: oa_soap_callsupport.c:708: OA response(0):
21053 HTTP/1.1 500 Internal Server Error
21054 Date: Sat, 13 Apr 2013 08:57:16 GMT
21055 Server: Apache
21056 Connection: close
21057 Content-Length: 1370
21058 Content-Type: application/soap+xml; charset=utf-8
21059
21060
21061
21062 oa_soap: DBG: oa_soap_callsupport.c:728: OA response(1):
21063 <?xml version="1.0" encoding="UTF-8"?>
21064 <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://www.w3.org/2003/05/soap-envelope" xmlns:SOAP-ENC="http://www.w3.org/2003/05/soap-encoding" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:wsu="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd" xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd" xmlns:hpoa="hpoa.xsd">
21065 <SOAP-ENV:Header>
21066 <wsse:Security>
21067 <hpoa:HpOaSessionKeyToken>
21068 <hpoa:oaSessionKey>b518615c20ceb860</hpoa:oaSessionKey>
21069 </hpoa:HpOaSessionKeyToken>
21070 </wsse:Security>
21071 </SOAP-ENV:Header>
21072 <SOAP-ENV:Body>
21073 <SOAP-ENV:Fault>
21074 <SOAP-ENV:Code>
21075 <SOAP-ENV:Value>SOAP-ENV:Receiver</SOAP-ENV:Value>
21076 </SOAP-ENV:Code>
21077 <SOAP-ENV:Reason>
21078 <SOAP-ENV:Text>Onboard Administrator Error</SOAP-ENV:Text>
21079 </SOAP-ENV:Reason>
21080 <SOAP-ENV:Detail>
21081 <hpoa:faultInfo>
21082 <hpoa:errorType>ONBOARD_ADMINISTRATOR</hpoa:errorType>
21083 <hpoa:errorCode>139</hpoa:errorCode>
21084 <hpoa:operationName>getBladeInfo</hpoa:operationName>
21085 <hpoa:operationBayNumber>01</hpoa:operationBayNumber>
21086 <hpoa:errorText>Not a valid request while running in standby mode.</hpoa:errorText>
21087 </hpoa:faultInfo>
21088 </SOAP-ENV:Detail>
21089 </SOAP-ENV:Fault>
21090 </SOAP-ENV:Body>
21091 </SOAP-ENV:Envelope>
21093 oa_soap: CRIT: oa_soap_callsupport.c:978: OA SOAP error 139: Not a valid request while running in standby mode.
21094 oa_soap: CRIT: oa_soap_re_discover.c:695: Get blade info failed
21095 oa_soap: CRIT: oa_soap_re_discover.c:172: Re-discovery of server blade failed
21096 oa_soap: CRIT: oa_soap_event.c:417: Re-discovery failed for OA 10.65.209.33

This happens a couple of times until we check with the master IP (10.65.209.28) again and redo the re-discovery:
21428 oa_soap: CRIT: oa_soap_re_discover.c:140: Re-discovery started
21429 oa_soap: DBG: oa_soap_callsupport.c:669: OA request(1):
21430 POST /hpoa HTTP/1.1
21431 Host: 10.65.209.28:443
21432 Content-Type: application/soap+xml; charset="utf-8"
21433 Content-Length: 773
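A minimal sketch, in C (the language of the oa_soap plugin), of the behaviour the log argues for: read the OA error code out of the SOAP fault and, on error 139, move to the other configured OA address instead of retrying and re-discovering through the same one. The names fault_error_code(), next_oa_ip() and OA_ERR_STANDBY_MODE are hypothetical and not part of the openhpi oa_soap code. The sketch also assumes, as the log above suggests, that error 139 means the address being polled now belongs to the standby OA.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Error code seen in the log: "Not a valid request while running in standby mode." */
#define OA_ERR_STANDBY_MODE 139

/*
 * Hypothetical helper (not the openhpi API): return the integer inside the
 * first <hpoa:errorCode>...</hpoa:errorCode> element of a SOAP fault body,
 * or -1 if the element is missing or its content is not a number.
 * Plain string search keeps the sketch short; real code would use an XML parser.
 */
static int fault_error_code(const char *body)
{
    static const char open_tag[] = "<hpoa:errorCode>";
    static const char close_tag[] = "</hpoa:errorCode>";
    const char *start;
    char *end;
    long code;

    assert(body != NULL);
    start = strstr(body, open_tag);
    if (start == NULL)
        return -1;
    start += sizeof(open_tag) - 1;          /* skip past the opening tag */

    code = strtol(start, &end, 10);
    if (end == start || strncmp(end, close_tag, sizeof(close_tag) - 1) != 0)
        return -1;                          /* empty or non-numeric content */
    return (int)code;
}

/*
 * Hypothetical failover policy: if the OA at current_ip reports that it is
 * in standby mode, switch to the other configured address; otherwise keep
 * using the current one.
 */
static const char *next_oa_ip(const char *current_ip, const char *other_ip,
                              int error_code)
{
    return error_code == OA_ERR_STANDBY_MODE ? other_ip : current_ip;
}
```

Fed the fault from log lines 21082-21086, fault_error_code() would return 139, so next_oa_ip("10.65.209.33", "10.65.209.28", 139) would move polling to 10.65.209.28, the address the later re-discovery at line 21431 ends up using. Keying the switch on the OA's own answer, rather than on which bay an IP was configured for, is the point: under ENCLOSURE_IP_MODE the bay-to-IP mapping is exactly what the report suspects has changed.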