From: Thien M. H. <thi...@de...> - 2024-07-02 03:41:44
Hi Thang,

ACK from me.

Best Regards,
Thien

-----Original Message-----
From: Thang Duc Nguyen <tha...@de...>
Sent: Tuesday, June 25, 2024 8:56 AM
To: Thien Minh Huynh <thi...@de...>; Dat Tran Quoc Phan <dat...@de...>
Cc: ope...@li...; Thang Duc Nguyen <tha...@de...>
Subject: [PATCH 1/1] smf: fix one step upgrade failed [#3354]

In a large cluster, or on a system under high load, SMF orders AMF to
lock a node group (NG) during a one-step upgrade. There are many
requests to IMM to update attributes, which cause the response from IMM
to AMF to time out. SMF receives the timeout and retries the lock again
and again while the first lock is still ongoing. Once the first lock
succeeds, the repeated lock request from SMF receives a NO_OP error
from AMF. In this case, NO_OP should be treated as success.
---
 src/smf/smfd/SmfAdminState.cc | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/smf/smfd/SmfAdminState.cc b/src/smf/smfd/SmfAdminState.cc
index 958b7ae82..c20df8d74 100755
--- a/src/smf/smfd/SmfAdminState.cc
+++ b/src/smf/smfd/SmfAdminState.cc
@@ -926,6 +926,9 @@ bool SmfAdminStateHandler::nodeGroupAdminOperation(
         saImmOmAdminOperationInvoke_2(ownerHandle_, &nodeGroupName, 0,
                                       adminOp, params, &oi_rc,
                                       smfd_cb->adminOpTimeout);
+    if ((imm_rc != SA_AIS_OK) || (oi_rc != SA_AIS_OK))
+      LOG_WA("%s: imm_rc: %s, oi_rc: %s", __FUNCTION__,
+             saf_error(imm_rc), saf_error(oi_rc));
     if ((imm_rc == SA_AIS_ERR_TRY_AGAIN) ||
         (imm_rc == SA_AIS_OK && oi_rc == SA_AIS_ERR_TRY_AGAIN)) {
       base::Sleep(base::MillisToTimespec(2000));
@@ -933,7 +936,8 @@ bool SmfAdminStateHandler::nodeGroupAdminOperation(
     } else if (imm_rc == SA_AIS_ERR_TIMEOUT) {
       // Retry
       continue;
-    } else if (imm_rc == SA_AIS_ERR_NO_OP) {
+    } else if ((imm_rc == SA_AIS_ERR_NO_OP) ||
+               (oi_rc == SA_AIS_ERR_NO_OP)) {
       // If an admin operation is already performed SA_AIS_ERR_NO_OP
       // is returned. Treat this as OK, just log it and return
       // operation success
--
2.25.1
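
For readers following the thread, the decision logic the patch describes can be sketched in isolation as below. This is only an illustration, not the SMF code: the Rc and Action enums, Classify() and LockSucceeds() are hypothetical names invented here, and the enum values merely stand in for the SA_AIS_* codes named in the diff. The final two branches of Classify() (plain success and generic failure) are assumptions, since the hunk does not show what follows the NO_OP branch.

```cpp
// Hypothetical stand-ins for the SAF return codes named in the patch
// (SA_AIS_OK, SA_AIS_ERR_TRY_AGAIN, SA_AIS_ERR_TIMEOUT, SA_AIS_ERR_NO_OP).
enum class Rc { kOk, kTryAgain, kTimeout, kNoOp, kOther };

// What the caller does with one (imm_rc, oi_rc) pair.
enum class Action { kSuccess, kRetryAfterDelay, kRetryNow, kFail };

// Mirrors the branch order shown in the diff.
inline Action Classify(Rc imm_rc, Rc oi_rc) {
  if (imm_rc == Rc::kTryAgain ||
      (imm_rc == Rc::kOk && oi_rc == Rc::kTryAgain)) {
    return Action::kRetryAfterDelay;  // the patch context sleeps 2000 ms here
  }
  if (imm_rc == Rc::kTimeout) {
    return Action::kRetryNow;
  }
  // The change in the patch: NO_OP reported in either return code means
  // the operation was already performed, so it counts as success.
  if (imm_rc == Rc::kNoOp || oi_rc == Rc::kNoOp) {
    return Action::kSuccess;
  }
  // Assumption: both codes OK is success, anything else is a failure.
  if (imm_rc == Rc::kOk && oi_rc == Rc::kOk) {
    return Action::kSuccess;
  }
  return Action::kFail;
}

// One scripted reply to a lock request.
struct Reply {
  Rc imm_rc;
  Rc oi_rc;
};

// Walks a scripted list of replies the way a retry loop would and
// reports whether the lock ends in success. Running out of scripted
// replies is reported as failure.
inline bool LockSucceeds(const Reply* replies, int count) {
  for (int i = 0; i < count; ++i) {
    const Action action = Classify(replies[i].imm_rc, replies[i].oi_rc);
    if (action == Action::kSuccess) return true;
    if (action == Action::kFail) return false;
    // Either retry action: move on to the next scripted reply.
  }
  return false;
}
```

With a script of { {kTimeout, kOther}, {kOk, kNoOp} }, which models the scenario in the commit message (first request times out, the retry is answered with NO_OP in oi_rc), LockSucceeds() returns true. Checking only imm_rc for NO_OP, as the code did before the patch, would fall through to the failure branch for that second reply.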