Messages per month:

Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2013 | | | 2 | 79 | 32 | 26 | 39 | 31 | 9 | 44 | 29 | 21 |
2014 | 12 | 56 | 50 | 27 | 33 | 24 | 44 | 25 | 18 | 16 | 18 | 31 |
2015 | 52 | 49 | 28 | 78 | 109 | 18 | 31 | 25 | 12 | 73 | 13 | 13 |
2016 | 10 | 8 | 6 | 10 | 14 | 3 | 19 | 8 | 8 | 38 | 6 | 9 |
2017 | 8 | 3 | 33 | 7 | 26 | 8 | 9 | 16 | 9 | 11 | 26 | 5 |
2018 | 15 | 8 | 23 | 5 | 2 | 1 | 7 | | 1 | 8 | 7 | 5 |
2019 | 5 | 15 | 1 | | | 2 | 1 | | 11 | 1 | | 4 |
2020 | | 8 | 5 | 2 | 28 | 14 | 14 | 4 | | | 1 | 1 |
2021 | 4 | 7 | 12 | 2 | | 8 | 1 | | 1 | | | 4 |
2022 | 5 | 9 | 2 | | | 1 | 4 | | | | 1 | |
2023 | | | 1 | | | | 1 | | | | | |
2024 | | 1 | | | | | 3 | | | | | |
From: Hoyt, D. <DH...@rb...> - 2024-07-16 13:23:15
|
Ok, thanks for the suggestion, Thang!

Regards,
David |
From: Thang N. <tha...@en...> - 2024-07-16 06:15:40
|
Hi David,

From my understanding, there is no way to do that. But you could consider changing "saAmfCompRecoveryOnError" at the component level to control its recovery action.

Thang D Nguyen

The information in this email is confidential and may be legally privileged. It is intended solely for the addressee. Any opinions expressed are mine and do not necessarily represent the opinions of the Company. Emails are susceptible to interference. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is strictly prohibited and may be unlawful. If you have received this message in error, do not open any attachments but please notify the Endava Service Desk on (+44 (0)870 423 0187), and delete this message from your system. The sender accepts no responsibility for information, errors or omissions in this email, or for its use or misuse, or for any act committed or omitted in connection with this communication. If in doubt, please verify the authenticity of the contents with the sender. Please rely on your own virus checkers as no responsibility is taken by the sender for any damage rising out of any bug or virus infection. Endava plc is a company registered in England under company number 5722669 whose registered office is at 125 Old Broad Street, London, EC2N 1AR, United Kingdom. Endava plc is the Endava group holding company and does not provide any services to clients. Each of Endava plc and its subsidiaries is a separate legal entity and has no liability for another such entity's acts or omissions. |
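For reference, saAmfCompRecoveryOnError holds a value of the AMF recommended-recovery type, SaAmfRecommendedRecoveryT. The C sketch below mirrors those values as we understand them from the SA Forum AMF specification, under local names so that it compiles without the OpenSAF headers; please check them against saAmf.h in your OpenSAF tree before relying on them. The lookup helper recovery_value() is hypothetical. It only illustrates turning a specification name into the number you would set, for example with `immcfg -a saAmfCompRecoveryOnError=<value> <component DN>`, assuming your release accepts a runtime change of that attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local mirror of SaAmfRecommendedRecoveryT as we understand it from the
   AMF specification; verify against saAmf.h before relying on it. */
typedef enum {
    REC_NO_RECOMMENDATION = 1,
    REC_COMPONENT_RESTART = 2,
    REC_COMPONENT_FAILOVER = 3,
    REC_NODE_SWITCHOVER = 4,
    REC_NODE_FAILOVER = 5,
    REC_NODE_FAILFAST = 6,
    REC_CLUSTER_RESET = 7,
    REC_APPLICATION_RESTART = 8,
    REC_CONTAINER_RESTART = 9
} recovery_t;

/* Specification names indexed by numeric value (index 0 is unused). */
static const char *const recovery_names[] = {
    NULL,
    "SA_AMF_NO_RECOMMENDATION",
    "SA_AMF_COMPONENT_RESTART",
    "SA_AMF_COMPONENT_FAILOVER",
    "SA_AMF_NODE_SWITCHOVER",
    "SA_AMF_NODE_FAILOVER",
    "SA_AMF_NODE_FAILFAST",
    "SA_AMF_CLUSTER_RESET",
    "SA_AMF_APPLICATION_RESTART",
    "SA_AMF_CONTAINER_RESTART",
};

#define RECOVERY_COUNT (sizeof recovery_names / sizeof recovery_names[0])

/* Hypothetical helper: numeric attribute value for a specification name,
   or -1 if the name is not recognised. */
static int recovery_value(const char *name)
{
    assert(name != NULL);
    for (size_t v = 1; v < RECOVERY_COUNT; v++)
        if (strcmp(recovery_names[v], name) == 0)
            return (int)v;
    return -1;
}
```

Which recovery suits the less critical component is a design decision that this sketch does not make.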
From: Hoyt, D. <DH...@rb...> - 2024-07-11 14:57:12
|
Hi all,

I know that saAmfSGCompRestartMax and saAmfSGCompRestartProb are parameters that can be set for an SG. Is there any way for a specific component to override these values just for itself?

For example, I have an SU with 3 components. One of these components is not as critical, and I would like it to have different restart max and restart prob values than the other two. Is this possible, and if so, how?

Setup:
2 nodes: SC-1, SC-2
Running opensaf-5.19.10
Virtualization: kvm
Operating System: Red Hat Enterprise Linux Server 7.8 (Maipo)
Kernel: Linux 3.10.0-1127.el7.x86_64
Architecture: x86-64

Regards,
David

Disclaimer
This e-mail together with any attachments may contain information of Ribbon Communications Inc. and its Affiliates that is confidential and/or proprietary for the sole use of the intended recipient. Any review, disclosure, reliance or distribution by others or forwarding without express permission is strictly prohibited. If you are not the intended recipient, please notify the sender immediately and then delete all copies, including any attachments. |
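To see why a per-component override has no obvious place to live, it may help to model how the two SG attributes are applied. The sketch below is a simplified, hypothetical model, not OpenSAF's amfnd code. It keeps a single restart counter per service unit, limited by saAmfSGCompRestartMax within a window of saAmfSGCompRestartProb. It assumes that the failure arriving after the maximum number of restarts escalates to an SU restart and that the counter then resets; please check the AMF specification for the exact boundary behaviour. All type and function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical SG-level configuration (illustrative names). */
typedef struct {
    uint32_t comp_restart_max;  /* models saAmfSGCompRestartMax */
    uint64_t comp_restart_prob; /* models saAmfSGCompRestartProb; seconds here,
                                   whereas IMM stores a SaTimeT in nanoseconds */
} sg_config;

/* Hypothetical per-SU escalation state: one counter shared by all
   components of the service unit. */
typedef struct {
    uint32_t restart_cnt;  /* component restarts in the current window */
    uint64_t window_start; /* time of the first restart in the window */
} su_state;

typedef enum { RESTART_COMPONENT, ESCALATE_TO_SU_RESTART } recovery_action;

/* Decide the recovery for a component failure in this SU. `now` must be
   monotonic. The failing component is deliberately not a parameter: in
   this model every component of the SU shares the same counter and the
   same limits, which come from the SG. */
static recovery_action on_comp_failure(const sg_config *sg, su_state *su,
                                       uint64_t now)
{
    assert(sg != NULL && su != NULL);

    /* Open a fresh window if none is open or the probation has elapsed. */
    if (su->restart_cnt == 0 || now - su->window_start > sg->comp_restart_prob) {
        su->restart_cnt = 0;
        su->window_start = now;
    }

    /* Assumption: the failure after comp_restart_max restarts escalates. */
    if (su->restart_cnt >= sg->comp_restart_max) {
        su->restart_cnt = 0; /* assumption: counting restarts after escalation */
        return ESCALATE_TO_SU_RESTART;
    }

    su->restart_cnt++;
    return RESTART_COMPONENT;
}
```

With comp_restart_max = 2 and a 10-second probation, failures at t = 0, 1 and 2 give two component restarts followed by an SU restart, while a failure at t = 15 after only two restarts starts a new window. In this model, a component that should tolerate more restarts cannot get its own limits while it shares an SU with the others, because the limits are read from the SG rather than from the component.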
From: Gary L. <gar...@de...> - 2024-02-28 09:35:46
|
The OpenSAF community is pleased to announce the availability of the OpenSAF 5.24.02 release. The source code for OpenSAF 5.24.02 and the corresponding documentation can be downloaded using the following links: [opensaf-5.24.02.tar.gz](http://sourceforge.net/projects/opensaf/files/releases/opensaf-5.24.02.tar.gz/download), [opensaf-documentation-5.24.02.tar.gz](http://sourceforge.net/projects/opensaf/files/docs/opensaf-documentation-5.24.02.tar.gz/download). For a complete list of new features in this release, please refer to the [NEWS](https://sourceforge.net/p/opensaf/wiki/NEWS-5.24.02/) at the wiki. See the [ChangeLog](https://sourceforge.net/p/opensaf/wiki/ChangeLog-5.24.02/) for a full list of changes in this release. |
From: Gary L. <gar...@de...> - 2023-07-30 06:55:39
|
The OpenSAF community is pleased to announce the availability of the OpenSAF 5.23.07 release. The source code for OpenSAF 5.23.07 and the corresponding documentation can be downloaded using the following links:
http://sourceforge.net/projects/opensaf/files/releases/opensaf-5.23.07.tar.gz/download
http://sourceforge.net/projects/opensaf/files/docs/opensaf-documentation-5.23.07.tar.gz/download
For a complete list of new features in this release, please refer to the NEWS at the wiki: https://sourceforge.net/p/opensaf/wiki/NEWS-5.23.07/
See the ChangeLog for a full list of changes in this release: https://sourceforge.net/p/opensaf/wiki/ChangeLog-5.23.07/
Thank you for your continued interest in OpenSAF and to everyone who has contributed to this release. |
From: Gary L. <gar...@de...> - 2023-03-28 00:35:43
|
The OpenSAF community is pleased to announce the availability of the OpenSAF 5.23.03 release. The source code for OpenSAF 5.23.03 and the corresponding documentation can be downloaded using the following links:
http://sourceforge.net/projects/opensaf/files/releases/opensaf-5.23.03.tar.gz/download
http://sourceforge.net/projects/opensaf/files/docs/opensaf-documentation-5.23.03.tar.gz/download
For a complete list of new features in this release, please refer to the NEWS at the wiki: https://sourceforge.net/p/opensaf/wiki/NEWS-5.23.03/
See the ChangeLog for a full list of changes in this release: https://sourceforge.net/p/opensaf/wiki/ChangeLog-5.23.03/
Thank you for your continued interest in OpenSAF and to everyone who has contributed to this release. |
From: Gary L. <gar...@de...> - 2022-11-18 05:42:03
|
The OpenSAF community is pleased to announce the availability of the OpenSAF 5.22.11 release. The source code for OpenSAF 5.22.11 and the corresponding documentation can be downloaded using the following links:
http://sourceforge.net/projects/opensaf/files/releases/opensaf-5.22.11.tar.gz/download
http://sourceforge.net/projects/opensaf/files/docs/opensaf-documentation-5.22.11.tar.gz/download
For a complete list of new features in this release, please refer to the NEWS at the wiki: https://sourceforge.net/p/opensaf/wiki/NEWS-5.22.11/
See the ChangeLog for a full list of changes in this release: https://sourceforge.net/p/opensaf/wiki/ChangeLog-5.22.11/
Thank you for your continued interest in OpenSAF and to everyone who has contributed to this release. |
From: Mohan K. <mohan@GetHighAvailability.com> - 2022-07-26 23:38:58
|
Hi Prince,

Yes, it should work.

Thanks & Regards,
Mohan Kanakam, +91-8333082448, Senior Software Engineer
"Book a demo for our brand new product KubeHA, more information at https://GetHighAvailability.com/product/kubeha"
contact@GetHighAvailability.com
High Availability Solutions
www.GetHighAvailability.com
Get High Availability Today !
NJ, USA: 1 508-507-6507 | Hyderabad, India: 91 798-992-5293 |
From: Mohan K. <mohan@GetHighAvailability.com> - 2022-07-26 11:23:07
|
Hi Prince,

Sorry, I didn't get your question. OpenSAF supports the container-contained feature of AMF, and you can also containerize OpenSAF.

Thanks & Regards,
Mohan Kanakam | 91-8333082448, Senior Software Engineer |
From: Philip, P. <pp...@rb...> - 2022-07-26 11:15:50
|
Thank you, Mohan.

Below is my requirement. Currently we use AMF in a non-container environment, where one use case is to detect any process failure in the system so that we can take corrective actions (we also use it for HA, etc.). When we move to a container environment, each of these processes will become an independent container, but all inside one pod (yes, for the time being, all in one pod). I wanted to know whether AMF in containerized form can detect the failure of these processes, each running inside a different container.

thanks,
Prince |
From: Philip, P. <pp...@rb...> - 2022-07-26 04:18:06
|
Hi

Does OpenSAF AMF support container version?

thanks,
Prince

Notice: This e-mail together with any attachments may contain information of Ribbon Communications Inc. and its Affiliates that is confidential and/or proprietary for the sole use of the intended recipient. Any review, disclosure, reliance or distribution by others or forwarding without express permission is strictly prohibited. If you are not the intended recipient, please notify the sender immediately and then delete all copies, including any attachments. |
From: Gary L. <gar...@de...> - 2022-06-01 01:19:18
|
The OpenSAF community is pleased to announce the availability of the OpenSAF 5.22.06 release. The source code for OpenSAF 5.22.06 and the corresponding documentation can be downloaded using the following links: http://sourceforge.net/projects/opensaf/files/releases/opensaf-5.22.06.tar.gz/download http://sourceforge.net/projects/opensaf/files/docs/opensaf-documentation-5.22.06.tar.gz/download For a complete list of new features in this release, please refer to the NEWS at the wiki: https://sourceforge.net/p/opensaf/wiki/NEWS-5.22.06/ See the ChangeLog for a full list of changes in this release: https://sourceforge.net/p/opensaf/wiki/ChangeLog-5.22.06/ Thank you for your continued interest in OpenSAF and to everyone who has contributed to this release. |
From: Mohan K. <mohan@GetHighAvailability.com> - 2022-03-09 17:39:05
|
Hi Sergio,

We have analyzed the issue. Please find the detailed analysis below.

1. When SC-1 comes up after the reboot in step #4 you described below, an application on PL-2 (assumed) creates a checkpoint replica (ckptnd), and the request goes to the Active ckptd on SC-2 (Active):

Mar 7 08:26:06 OLT2T4-UNICOM-2 local0.notice osafamfd[1267]: NO Node 'CC-1' joined the cluster
Mar 7 08:26:07 OLT2T4-UNICOM-2 local0.notice osafimmnd[1219]: NO Implementer connected: 31 (MsgQueueService133391) <0, 2090f>
Mar 7 08:26:07 OLT2T4-UNICOM-2 local0.info osafimmnd[1219]: IN Create runtime object 'safReplica=safNode=CC-1\#safCluster=myClmCluster,safCkpt=CKPT_BACKPLANE_CONTROL,safApp=safCkptService' by Impl id: 22

2. During sync, the ckptd on SC-2 checkpoints the information (along with the nodes) to the upcoming Standby ckptd, but adding the checkpoint fails on SC-1 because CLMD doesn't have node id 136463 in its database.

<143>1 2022-03-07T08:26:07.300294Z OLT2T4-UNICOM-1 osafckptd 1288 osafckptd [meta sequenceId="580"] 1288:ckpt/ckptd/cpd_mbcsv.c:1066 >> cpd_mbcsv_dec_sync_resp
<143>1 2022-03-07T08:26:07.300481Z OLT2T4-UNICOM-1 osafckptd 1288 osafckptd [meta sequenceId="581"] 1288:base/hj_enc.c:418 >> osaf_decode_sanamet
<143>1 2022-03-07T08:26:07.300538Z OLT2T4-UNICOM-1 osafckptd 1288 osafckptd [meta sequenceId="582"] 1288:base/hj_enc.c:447 TR str: safCkpt=CKPT_BACKPLANE_CONTROL,safApp=safCkptService (52)
<143>1 2022-03-07T08:26:07.300638Z OLT2T4-UNICOM-1 osafckptd 1288 osafckptd [meta sequenceId="583"] 1288:base/hj_enc.c:451 << osaf_decode_sanamet
<143>1 2022-03-07T08:26:07.300831Z OLT2T4-UNICOM-1 osafckptd 1288 osafckptd [meta sequenceId="584"] 1288:ckpt/ckptd/cpd_sbevt.c:103 >> cpd_sb_proc_ckpt_create
……………………………….
<143>1 2022-03-07T08:26:07.315981Z OLT2T4-UNICOM-1 osafckptd 1288 osafckptd [meta sequenceId="630"] 1288:clm/agent/clma_api.cc:1232 << saClmClusterNodeGet
<143>1 2022-03-07T08:26:07.316023Z OLT2T4-UNICOM-1 osafckptd 1288 osafckptd [meta sequenceId="631"] 1288:ckpt/ckptd/cpd_sbevt.c:192 T4 cpd standby create evt failed for node_id:136463
<143>1 2022-03-07T08:26:07.316069Z OLT2T4-UNICOM-1 osafckptd 1288 osafckptd [meta sequenceId="632"] 1288:ckpt/ckptd/cpd_db.c:214 >> cpd_ckpt_node_and_ref_delete
<143>1 2022-03-07T08:26:07.316113Z OLT2T4-UNICOM-1 osafckptd 1288 osafckptd [meta sequenceId="633"] 1288:ckpt/ckptd/cpd_db.c:1121 >> cpd_ckpt_ref_info_del
<143>1 2022-03-07T08:26:07.316154Z OLT2T4-UNICOM-1 osafckptd 1288 osafckptd [meta sequenceId="634"] 1288:ckpt/ckptd/cpd_db.c:1143 << cpd_ckpt_ref_info_del
<143>1 2022-03-07T08:26:07.316201Z OLT2T4-UNICOM-1 osafckptd 1288 osafckptd [meta sequenceId="635"] 1288:ckpt/ckptd/cpd_db.c:244 << cpd_ckpt_node_and_ref_delete
<143>1 2022-03-07T08:26:07.316242Z OLT2T4-UNICOM-1 osafckptd 1288 osafckptd [meta sequenceId="636"] 1288:ckpt/ckptd/cpd_db.c:666 >> cpd_ckpt_map_node_delete
<143>1 2022-03-07T08:26:07.316288Z OLT2T4-UNICOM-1 osafckptd 1288 osafckptd [meta sequenceId="637"] 1288:ckpt/ckptd/cpd_db.c:682 << cpd_ckpt_map_node_delete
<143>1 2022-03-07T08:26:07.316329Z OLT2T4-UNICOM-1 osafckptd 1288 osafckptd [meta sequenceId="638"] 1288:ckpt/ckptd/cpd_sbevt.c:288 << cpd_sb_proc_ckpt_create
<143>1 2022-03-07T08:26:07.316368Z OLT2T4-UNICOM-1 osafckptd 1288 osafckptd [meta sequenceId="639"] 1288:ckpt/ckptd/cpd_mbcsv.c:1111 T4 cpd standby create evt failed
<143>1 2022-03-07T08:26:07.316415Z OLT2T4-UNICOM-1 osafckptd 1288 osafckptd [meta sequenceId="640"] 1288:ckpt/ckptd/cpd_mbcsv.c:1136 << cpd_mbcsv_dec_sync_resp

3. After the switchover, when SC-1 becomes Active, it doesn't have the checkpoint information itself, and hence immlist fails.
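The failure sequence in steps 1 to 3 can be illustrated with a small, hypothetical model; it is not the actual cpd_sb_proc_ckpt_create code. In the model, the standby applies a synced checkpoint record only if every replica node is already known to cluster membership and otherwise stores nothing, so a later switchover leaves the new active without the checkpoint. The membership check stands in for the saClmClusterNodeGet lookup seen in the trace, and the node ids other than 136463 are made up; all type and function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES 8
#define MAX_CKPTS 8

/* Hypothetical stand-in for the CLM membership the standby can see. */
typedef struct {
    uint32_t node_ids[MAX_NODES];
    size_t count;
} clm_view;

/* Hypothetical cold-sync record: a checkpoint and its replica nodes. */
typedef struct {
    const char *name;
    uint32_t replica_nodes[MAX_NODES];
    size_t replica_count;
} ckpt_record;

/* Checkpoints known to the standby director. */
typedef struct {
    const char *names[MAX_CKPTS];
    size_t count;
} ckpt_db;

static bool clm_is_member(const clm_view *clm, uint32_t node_id)
{
    for (size_t i = 0; i < clm->count; i++)
        if (clm->node_ids[i] == node_id)
            return true;
    return false;
}

/* Apply one synced record on the standby. As in the trace, an unknown
   replica node makes the create fail and nothing is stored, so the
   checkpoint is missing if this standby later becomes active. */
static bool standby_apply_sync(ckpt_db *db, const clm_view *clm,
                               const ckpt_record *rec)
{
    for (size_t i = 0; i < rec->replica_count; i++)
        if (!clm_is_member(clm, rec->replica_nodes[i]))
            return false; /* cf. "cpd standby create evt failed" */
    assert(db->count < MAX_CKPTS);
    db->names[db->count++] = rec->name;
    return true;
}
```

Assuming, as the analysis suggests, that node 136463 is the payload that has not rejoined yet, adding it to the membership before the sync (which is what starting PL-1 first would achieve) lets the same record through.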
It looks like the Active ckptd sends information about nodes that existed before (such as PL-1) but are not available right now. Adding the checkpoint replica seems to have some issues. This may be an older defect that you hit because of the timing.

Workaround:
1. When SC-1 is coming up, don't create the same checkpoint until PL-1 is up, i.e. avoid creating the checkpoint until PL-1 joins the cluster.
Or
2. In step #4, start PL-1 first and then SC-1; immlist shouldn't give any error.

Thanks & Regards
Mohan Kanakam | 91-8333082448
Senior Software Engineer
High Availability Solutions
www.GetHighAvailability.com
Get High Availability Today !
NJ, USA: 1 508-507-6507 | Hyderabad, India: 91 798-992-5293

From: Sérgio Marques [mailto:ser...@al...]
Sent: 07 March 2022 14:59
To: Mohan Kanakam
Cc: 'Nagendra Kumar'; ope...@li...
Subject: RE: [opensaf:tickets] #3306 ckpt: checkpoint node director responding to async call.

Hi Mohan,

I'm sending the logs as an attachment. Please feel free to ask for more info/logs/tests…

Many thanks and best regards,
Sérgio Marques

From: Mohan Kanakam <mohan@GetHighAvailability.com>
Sent: 3 March 2022 11:34
To: 'Mohan Kanakam' <mohan@GetHighAvailability.com>; Sérgio Marques <ser...@al...>
Cc: 'Nagendra Kumar' <nagendra@GetHighAvailability.com>; ope...@li...
Subject: RE: [opensaf:tickets] #3306 ckpt: checkpoint node director responding to async call.

Attention: This email originated outside Altice Portugal. Please do not click links or open attachments unless you know the sender and know that the content is safe.

Hi Sergio,

We are not able to reproduce the issue with the steps you shared on version 5.22.01. Could you please send us the immd, immnd, ckptd, ckptnd, syslog and mdslog of all the nodes in the cluster?

Thanks & Regards
Mohan Kanakam

From: Mohan Kanakam [mailto:mohan@GetHighAvailability.com]
Sent: 28 February 2022 20:55
To: 'Sérgio Marques'
Cc: 'Nagendra Kumar'
Subject: RE: [opensaf:tickets] #3306 ckpt: checkpoint node director responding to async call.

Hi Sergio,

Thanks for the information. We will try to reproduce it and get back to you.

Thanks & Regards
Mohan Kanakam

From: Sérgio Marques [mailto:ser...@al...]
Sent: 28 February 2022 20:46
To: Mohan Kanakam
Cc: 'Nagendra Kumar'
Subject: RE: [opensaf:tickets] #3306 ckpt: checkpoint node director responding to async call.

Hi Mohan,

I believe I have finally found a way for you to reproduce the problem. Please try the following steps:
1. Start 2 controllers with SC-2 Active and SC-1 Standby, and 2 payloads, PL-1 and PL-2.
2. On PL-2, create a checkpoint and a section, and write to it.
3. On PL-1, create exactly the same checkpoint created on PL-2 and try to create the same section as before. You will receive SA_AIS_ERR_EXIST. Do a SectionOverwrite.
4. On SC-1, perform an si-swap (amf-adm -t 10 si-swap safSi=SC-2N,safApp=OpenSAF) and reboot the SC-1 and PL-1 nodes.
5. Wait for SC-1 and PL-1 to rejoin the cluster.
6. On SC-2, perform an si-swap (amf-adm -t 10 si-swap safSi=SC-2N,safApp=OpenSAF) and then list the checkpoint using immlist.

Thanks and regards,
Sérgio Marques

From: Mohan Kanakam <mohan@GetHighAvailability.com>
Sent: 18 February 2022 13:40
To: Sérgio Marques <ser...@al...>
Cc: 'Nagendra Kumar' <nagendra@GetHighAvailability.com>; ope...@li...
Subject: RE: [opensaf:tickets] #3306 ckpt: checkpoint node director responding to async call.

Hi Sergio,

Thanks for testing and sharing the results. We tried to reproduce the issue in our lab setup but, unfortunately, we were not able to. These are the steps we followed:
1. Start 2 controllers with SC-1 Active and SC-2 Standby, and payload PL-3.
2. Create checkpoints from applications running on the payload.
3. Reboot SC-1 (Active). SC-2 becomes Active, and SC-1 joins as Standby.
4. Now perform an si-swap. SC-2 becomes Standby and SC-1 becomes Active.
5. Reboot SC-1 again.
6. While it is rebooting, perform immlist on the checkpoints created. Here we got the output of immlist.

Can you please confirm whether this is the way to reproduce it? Did the issue persist after the rebooted controller joined the cluster, i.e., did immlist work once the rebooted controller had rejoined? I was thinking this could be a transient issue. Can you please share the immd, immnd, amfd, amfnd, ckptd, ckptnd, mds.log and syslog from all the nodes?

Thanks & Regards
Mohan Kanakam

From: Sérgio Marques [mailto:ser...@al...]
Sent: 17 February 2022 22:53
To: mohan@GetHighAvailability.com
Subject: RE: [opensaf:tickets] #3306 ckpt: checkpoint node director responding to async call.

Hi Mohan,

I made a small change to your patch to be able to compile it: where you have "sinfo->ctxt->length", I changed it to "sinfo->ctxt.length". It resolves the problem. Now no "MDS_SND_RCV: Invalid Sync CTXT Len" events are registered in mds.log. Thanks!

Unfortunately, this does not resolve another issue that we also have and were hoping to resolve with this patch as well.
We set up a cluster with 2 controller and 2 payload nodes, and then we create a checkpoint like the following one:

[root@OLT2T4-UNICOM-2~]# immlist safCkpt=CKPT_BACKPLANE_CONTROL,safApp=safCkptService
Name Type Value(s)
========================================================================
safCkpt SA_STRING_T safCkpt=CKPT_BACKPLANE_CONTROL
saCkptCheckpointUsedSize SA_UINT64_T 2024 (0x7e8)
saCkptCheckpointSize SA_UINT64_T 2024 (0x7e8)
saCkptCheckpointRetDuration SA_TIME_T 9223372036854775807 (0x7fffffffffffffff, Sat Jan 27 10:50:44 1990)
saCkptCheckpointNumWriters SA_UINT32_T 7 (0x7)
saCkptCheckpointNumSections SA_UINT32_T 22 (0x16)
saCkptCheckpointNumReplicas SA_UINT32_T 2 (0x2)
saCkptCheckpointNumReaders SA_UINT32_T 7 (0x7)
saCkptCheckpointNumOpeners SA_UINT32_T 7 (0x7)
saCkptCheckpointNumCorruptSections SA_UINT32_T 0 (0x0)
saCkptCheckpointMaxSections SA_UINT32_T 22 (0x16)
saCkptCheckpointMaxSectionSize SA_UINT64_T 92 (0x5c)
saCkptCheckpointMaxSectionIdSize SA_UINT64_T 1 (0x1)
saCkptCheckpointCreationTimestamp SA_TIME_T 1645097377000000000 (0x16d48f5929030a00, Thu Feb 17 11:29:37 2022)
saCkptCheckpointCreationFlags SA_UINT32_T 2 (0x2)
SaImmAttrImplementerName SA_STRING_T safCheckPointService
SaImmAttrClassName SA_STRING_T SaCkptCheckpoint
SaImmAttrAdminOwnerName SA_STRING_T <Empty>

After swapping (amf-adm -t 10 si-swap safSi=SC-2N,safApp=OpenSAF) and rebooting the active controller node for the second time, immlist starts returning SA_AIS_ERR_NO_RESOURCES:

[root@OLT2T4-UNICOM-2~]# immlist safCkpt=CKPT_BACKPLANE_CONTROL,safApp=safCkptService
error - saImmOmAccessorGet_2 FAILED: SA_AIS_ERR_NO_RESOURCES (18)

The checkpoint can be found using immfind, but it can neither be listed with immlist nor accessed via the libSaCkpt.so library.
[root@OLT2T4-UNICOM-2~]# immfind safCkpt=CKPT_BACKPLANE_CONTROL,safApp=safCkptService
safCkpt=CKPT_BACKPLANE_CONTROL,safApp=safCkptService
safReplica=safNode=CC-1\,safCluster=myClmCluster,safCkpt=CKPT_BACKPLANE_CONTROL,safApp=safCkptService
safReplica=safNode=CC-2\,safCluster=myClmCluster,safCkpt=CKPT_BACKPLANE_CONTROL,safApp=safCkptService

If I only perform the swap command, without the reboot, the issue is not reproduced. I don't have this issue with OpenSAF version 4.5.2.

Do you have an idea of what could cause this, and how should we debug it?

Many thanks and regards,
Sérgio Marques

From: Mohan Kanakam <moh...@us...>
Sent: 17 February 2022 10:51
To: [opensaf:tickets] <33...@ti...>
Subject: [opensaf:tickets] #3306 ckpt: checkpoint node director responding to async call.

Attention: This email originated outside Altice Portugal. Please do not click on links or open attachments unless you know the sender and know that the content is safe.

Hi Sergio,

Can you please test the attached patch for your scenario and share your observations.

Thanks

Attachments:
* mds_error.patch <https://sourceforge.net/p/opensaf/tickets/_discuss/thread/04984c7ecf/8052/attachment/mds_error.patch> (703 Bytes; application/octet-stream)
_____
[tickets:#3306] <https://sourceforge.net/p/opensaf/tickets/3306/> ckpt: checkpoint node director responding to async call.

Status: accepted
Milestone: 5.22.04
Created: Thu Feb 17, 2022 10:46 AM UTC by Mohan Kanakam
Last Updated: Thu Feb 17, 2022 10:46 AM UTC
Owner: Mohan Kanakam

During section create, one ckptnd sends an async request (a normal MDS send) to another ckptnd. However, the receiving ckptnd responds to the request on the assumption that it received a sync request and has to reply to the sender ckptnd. A reply is required when a sync request arrives at ckptnd, but in some cases it receives an async request, which it need not answer.
We are getting the following message in the MDS log when creating the section:

sc1-VirtualBox osafckptnd 27692 mds.log [meta sequenceId="2"] MDS_SND_RCV: Invalid Sync CTXT Len
_____
Sent from sourceforge.net because you indicated interest in https://sourceforge.net/p/opensaf/tickets/3306/

To unsubscribe from further messages, please visit https://sourceforge.net/auth/subscriptions/
|
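The ticket description above reduces to one rule: ckptnd should send a reply only when the incoming request was sent synchronously, and should simply process asynchronous requests. The Python sketch below models just that rule; it is not OpenSAF code. `SyncContext`, `Message`, `send_reply`, `handle_section_create` and the convention that a zero-length context means "async" are hypothetical stand-ins, loosely modelled on the `sinfo->ctxt.length` field mentioned in the patch discussion.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SyncContext:
    # Hypothetical stand-in for an MDS sync-send context. Assumption: an
    # empty context (length 0) means the message was sent asynchronously.
    data: bytes = b""

    @property
    def length(self) -> int:
        return len(self.data)


@dataclass
class Message:
    payload: str
    # Embedded context object, accessed as msg.ctxt.length; compare the C
    # spelling sinfo->ctxt.length when ctxt is an embedded struct (assumed).
    ctxt: SyncContext = field(default_factory=SyncContext)


class InvalidSyncContext(Exception):
    """Raised when a reply is attempted for a message with no sync context."""


def send_reply(msg: Message, reply: str) -> str:
    # A reply only makes sense if the sender is blocked waiting for it.
    if msg.ctxt.length == 0:
        raise InvalidSyncContext("Invalid Sync CTXT Len")
    return f"reply[{msg.ctxt.data.hex()}]: {reply}"


def handle_section_create(msg: Message) -> Optional[str]:
    """Process a hypothetical section-create request.

    Returns the reply that was sent, or None for an async request.
    """
    # ... the section would be created locally here ...
    if msg.ctxt.length > 0:   # sync request: the sender awaits a reply
        return send_reply(msg, "OK")
    return None               # async request: process it, but do not reply
```

In this model, replying unconditionally to an async request raises the equivalent of the "Invalid Sync CTXT Len" error from the ticket, while guarding the reply on the context length avoids it. The embedded `ctxt` object also illustrates why `sinfo->ctxt.length` (member access) rather than `sinfo->ctxt->length` (pointer dereference) would be needed for the patch to compile, assuming `ctxt` is an embedded struct rather than a pointer.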
From: Mohan K. <mohan@GetHighAvailability.com> - 2022-03-03 11:34:15
|
Hi Sergio,

We are not able to reproduce the issue on version 5.22.01 by following the steps you shared. So, can you please send us the immd, immnd, ckptd, ckptnd, syslog and mds.log of all the nodes of the cluster.

Thanks & Regards
Mohan Kanakam

From: Mohan Kanakam [mailto:mohan@GetHighAvailability.com]
Sent: 28 February 2022 20:55
To: 'Sérgio Marques'
Cc: 'Nagendra Kumar'
Subject: RE: [opensaf:tickets] #3306 ckpt: checkpoint node director responding to async call.

Hi Sergio,

Thanks for the information. We will try to reproduce it and get back to you.

Thanks & Regards
Mohan Kanakam

From: Sérgio Marques [mailto:ser...@al...]
Sent: 28 February 2022 20:46
To: Mohan Kanakam
Cc: 'Nagendra Kumar'
Subject: RE: [opensaf:tickets] #3306 ckpt: checkpoint node director responding to async call.

Hi Mohan,

I believe I have finally found a way for you to reproduce the problem. Please try the following steps:

1. Start 2 controllers, with SC-2 Active and SC-1 Standby, and 2 payloads, PL-1 and PL-2.
2. On PL-2, create a checkpoint and a section, and write to it.
3. On PL-1, create exactly the same checkpoint created on PL-2 and try to create the same section as before. You will receive SA_AIS_ERR_EXIST. Do a SectionOverwrite.
4. On SC-1, perform an si-swap (amf-adm -t 10 si-swap safSi=SC-2N,safApp=OpenSAF) and reboot the SC-1 and PL-1 nodes.
5. Wait for SC-1 and PL-1 to rejoin the cluster.
6. On SC-2, perform an si-swap (amf-adm -t 10 si-swap safSi=SC-2N,safApp=OpenSAF) and then list the checkpoint using immlist.
Thanks and regards,
Sérgio Marques
|
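When repeating step 6, it helps to tell a normal immlist listing apart from the SA_AIS_ERR_NO_RESOURCES failure reported earlier in this thread without eyeballing the output. The Python sketch below parses captured immlist output into either an attribute dictionary or an (API, error name, code) tuple. The function name `parse_immlist` and both regular expressions are assumptions derived only from the output pasted in this thread, not from a documented immlist output format.

```python
import re
from typing import Dict, Tuple, Union

# Error line as pasted in this thread, e.g.
# "error - saImmOmAccessorGet_2 FAILED: SA_AIS_ERR_NO_RESOURCES (18)"
ERROR_RE = re.compile(r"error - (\w+) FAILED: (SA_AIS_ERR_\w+) \((\d+)\)")
# Attribute row: name, SA_*_T type, then the rest of the line as the value.
ROW_RE = re.compile(r"^(\S+)\s+(SA_\w+_T)\s+(.*)$")

Attrs = Dict[str, Tuple[str, str]]


def parse_immlist(output: str) -> Union[Attrs, Tuple[str, str, int]]:
    """Return {attr: (type, value)} on success, or (api, error, code) on failure."""
    match = ERROR_RE.search(output)
    if match:
        api, err, code = match.groups()
        return api, err, int(code)
    attrs: Attrs = {}
    for line in output.splitlines():
        row = ROW_RE.match(line.strip())
        if row:  # header ("Name Type Value(s)") and "====" lines do not match
            name, type_, value = row.groups()
            attrs[name] = (type_, value.strip())
    return attrs
```

In a reproduction script, the combined stdout and stderr of immlist could be passed to `parse_immlist`; a tuple result signals the failure mode described in this thread, while a dictionary lets you compare attributes such as saCkptCheckpointNumReplicas before and after the swap.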
From: Mohan K. <mohan@GetHighAvailability.com> - 2022-02-18 13:40:23
|
Hi Sergio,

Thanks for the testing and for sharing the results. We tried to reproduce the issue in our lab setup but, unfortunately, we were not able to. These are the steps we followed:

1. Start 2 controllers, with SC-1 Active and SC-2 Standby, and PL-3 as a payload.
2. Create checkpoints from applications running on the payload.
3. Reboot SC-1 (Active). SC-2 becomes Active and SC-1 joins as Standby.
4. Perform an si-swap. SC-2 becomes Standby and SC-1 becomes Active.
5. Reboot SC-1 again.
6. While it is rebooting, run immlist on the created checkpoints. At this point we got the immlist output.

Can you please confirm whether this is the way to reproduce it? Did the issue persist after the rebooted controller joined the cluster, i.e., did immlist work once the rebooted controller had rejoined? I was thinking that this could be a transient issue.

Can you please share the immd, immnd, amfd, amfnd, ckptd, ckptnd, mds.log and syslog from all the nodes.

Thanks & Regards
Mohan Kanakam
|
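Mohan's question is whether the immlist failure is transient (it clears once the rebooted controller rejoins) or persistent. One way to answer that is to re-run the same query a bounded number of times and record each result; the same bounded-retry idea also applies to the advice elsewhere in this thread that applications may need to resend lost checkpoint messages. The Python sketch below assumes that a failed immlist either exits non-zero or prints a line containing "FAILED" (as in the output pasted here). The `probe` and `run_command` names, the attempt count and the delay are illustrative assumptions, and the runner is injectable so the logic can be exercised without an OpenSAF cluster.

```python
import subprocess
import time
from typing import Callable, List, Sequence, Tuple

Runner = Callable[[Sequence[str]], Tuple[int, str]]


def run_command(cmd: Sequence[str]) -> Tuple[int, str]:
    """Run a command and return (exit status, combined stdout and stderr)."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True)
    return proc.returncode, proc.stdout


def probe(dn: str, attempts: int = 10, delay_s: float = 5.0,
          runner: Runner = run_command,
          sleep: Callable[[float], None] = time.sleep) -> List[bool]:
    """Run `immlist <dn>` repeatedly and return one success flag per attempt.

    Stops at the first success, so [False, False, True] suggests a
    transient failure, while all False after the controller has rejoined
    suggests a persistent one.
    """
    results: List[bool] = []
    for i in range(attempts):
        status, output = runner(["immlist", dn])
        ok = status == 0 and "FAILED" not in output  # assumed failure signals
        results.append(ok)
        if ok:
            break
        if i < attempts - 1:  # no pointless sleep after the last attempt
            sleep(delay_s)
    return results
```

For example, `probe("safCkpt=CKPT_BACKPLANE_CONTROL,safApp=safCkptService")` could be started right after step 5 of the reproduction, with a timestamped log of its results collected alongside the requested immd/immnd traces.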
From: Mohan K. <mo...@ha...> - 2022-02-17 11:17:24
|
Hi Sergio,

I raised a ticket (https://sourceforge.net/p/opensaf/tickets/3306/) in the OpenSAF community and attached the patch to it. Can you please download the patch, test it in your scenario and share your observations.

Thanks & Regards
Mohan Kanakam

-----Original Message-----
From: Mohan Kanakam [mailto:mohan@GetHighAvailability.com]
Sent: 17 February 2022 00:43
To: 'Mohan Kanakam'; 'Sérgio Marques'; ope...@li...
Subject: Re: [users] MDS_SND_RCV: Invalid Sync CTXT Len

Hi Sergio,

I have an update. We have a patch for the scenario we could reproduce. We will test it and then share it with you for testing your scenarios. Once you confirm, I will float the patch in the community.

Thanks & Regards
Mohan Kanakam

-----Original Message-----
From: Mohan Kanakam [mailto:mohan@GetHighAvailability.com]
Sent: 14 February 2022 22:24
To: 'Sérgio Marques'; 'ope...@li...'
Subject: RE: [users] MDS_SND_RCV: Invalid Sync CTXT Len

Hi Sergio,

Thanks for the logs. We could also reproduce the issue simply by running the demo application, as below:

<139>1 2022-02-14T22:15:09.496432+05:30 sc1-VirtualBox osafckptnd 27692 mds.log [meta sequenceId="2"] MDS_SND_RCV: Invalid Sync CTXT Len

We will debug the issue this week and let you know.

Thanks & Regards
Mohan Kanakam

-----Original Message-----
From: Sérgio Marques [mailto:ser...@al...]
Sent: 14 February 2022 16:03
To: Mohan Kanakam; ope...@li...
Subject: RE: [users] MDS_SND_RCV: Invalid Sync CTXT Len

Hi Mohan,

I have increased the MDS log level (export MDS_LOG_LEVEL=5) to get more detail on the "MDS_SND_RCV: Invalid Sync CTXT Len" error. I'm sending the logs attached. You can find the "Invalid Sync CTXT Len" errors in the logs/sc/cc-2/mds.log file.

Thanks and regards,
Sérgio Marques

-----Original Message-----
From: Sérgio Marques
Sent: 11 February 2022 15:23
To: Mohan Kanakam <mohan@GetHighAvailability.com>; ope...@li...
Subject: RE: [users] MDS_SND_RCV: Invalid Sync CTXT Len

Hi Mohan,

To reproduce the problem, I only need to power up the cluster nodes and launch our applications on the SC and PL nodes immediately after OpenSAF comes up. These applications start by creating some checkpoints, such as the one below.

[root@OLT2T4-UNICOM-2~]# immlist safCkpt=CKPT_BACKPLANE_CONTROL,safApp=safCkptService
Name                                Type         Value(s)
========================================================================
safCkpt                             SA_STRING_T  safCkpt=CKPT_BACKPLANE_CONTROL
saCkptCheckpointUsedSize            SA_UINT64_T  2024 (0x7e8)
saCkptCheckpointSize                SA_UINT64_T  2024 (0x7e8)
saCkptCheckpointRetDuration         SA_TIME_T    9223372036854775807 (0x7fffffffffffffff, Sat Jan 27 10:50:44 1990)
saCkptCheckpointNumWriters          SA_UINT32_T  7 (0x7)
saCkptCheckpointNumSections         SA_UINT32_T  22 (0x16)
saCkptCheckpointNumReplicas         SA_UINT32_T  2 (0x2)
saCkptCheckpointNumReaders          SA_UINT32_T  7 (0x7)
saCkptCheckpointNumOpeners          SA_UINT32_T  7 (0x7)
saCkptCheckpointNumCorruptSections  SA_UINT32_T  0 (0x0)
saCkptCheckpointMaxSections         SA_UINT32_T  22 (0x16)
saCkptCheckpointMaxSectionSize      SA_UINT64_T  92 (0x5c)
saCkptCheckpointMaxSectionIdSize    SA_UINT64_T  1 (0x1)
saCkptCheckpointCreationTimestamp   SA_TIME_T    1644591280000000000 (0x16d2c30e451ae000, Fri Feb 11 14:54:40 2022)
saCkptCheckpointCreationFlags       SA_UINT32_T  2 (0x2)
SaImmAttrImplementerName            SA_STRING_T  safCheckPointService
SaImmAttrClassName                  SA_STRING_T  SaCkptCheckpoint
SaImmAttrAdminOwnerName             SA_STRING_T  <Empty>

After creating these checkpoints, the nodes start creating, reading and writing their sections.

I'm sending the requested logs attached. Please feel free to ask for more info/logs/tests.

Thanks a lot for your help,
Regards,
Sérgio Marques

-----Original Message-----
From: Mohan Kanakam <mohan@GetHighAvailability.com>
Sent: 10 February 2022 17:57
To: 'Mohan Kanakam' <mohan@GetHighAvailability.com>; Sérgio Marques <ser...@al...>; ope...@li...
Subject: RE: [users] MDS_SND_RCV: Invalid Sync CTXT Len

Hi Sergio,

Can you please share the syslog and mds.log of all the nodes.

Thanks & Regards
Mohan Kanakam

-----Original Message-----
From: Mohan Kanakam [mailto:mohan@GetHighAvailability.com]
Sent: 10 February 2022 23:23
To: 'Sérgio Marques'; 'ope...@li...'
Subject: RE: [users] MDS_SND_RCV: Invalid Sync CTXT Len

Hi Sergio,

Thanks for the information. Can you please share the steps (kind of checkpoint, frequency of checkpointing, etc.) to reproduce the issue in our lab? Can you also share the ckptd and ckptnd traces (you can enable/disable them at runtime using "kill -USR2 <ckptnd_pid/ckptd_pid>") and the application API calls.

Thanks & Regards
Mohan Kanakam

-----Original Message-----
From: Sérgio Marques [mailto:ser...@al...]
Sent: 10 February 2022 20:17
To: Mohan Kanakam; ope...@li...
Subject: RE: [users] MDS_SND_RCV: Invalid Sync CTXT Len

Hi Mohan,

Thanks for your quick answer. I have a cluster with 2 controllers and 2 payload boards.
All of the nodes are using the latest OpenSAF version, 5.22.01. Some checkpoints are created by the payload and controller applications. I can see these errors every 3 to 10 seconds on the standby controller card. Is there a way of debugging this issue to find out where exactly the messages are being lost? Please feel free to ask for more info/logs/tests.

Thanks and regards,
Sérgio Marques

-----Original Message-----
From: Mohan Kanakam <mohan@GetHighAvailability.com>
Sent: 9 February 2022 17:48
To: Sérgio Marques <ser...@al...>; ope...@li...
Subject: RE: [users] MDS_SND_RCV: Invalid Sync CTXT Len

Hi Sergio,

Can you please let us know the OpenSAF version being used. Since I don't know the test scenario and the use case, I am giving a generic answer.

If ckptnd sends a sync message (checkpoint) to the Active ckptd and waits for its reply, and the Active ckptd is stopped, then ckptnd will never get a reply to those messages, and the context of the sync messages becomes invalid and unanswered. It looks like a few such messages are pending a reply at ckptd; if the Active controller reboots, the Active ckptd is stopped, and all the sync messages waiting for a reply at ckptnd may get this error because their context is lost.

To me, the messages look genuine, but it may be a concern that some of ckptnd's checkpoint messages are not answered and are lost. So the application (if running on a payload) needs to send them again.

Thanks & Regards
Mohan Kanakam

-----Original Message-----
From: Sérgio Marques [mailto:ser...@al...]
Sent: 09 February 2022 14:50
To: ope...@li...
Subject: [users] MDS_SND_RCV: Invalid Sync CTXT Len

Hi,

In my system, some "MDS_SND_RCV: Invalid Sync CTXT Len" events are being registered in mds.log. Are these errors normal?
I'm asking this because I'm experiencing very weird problems when rebooting the active controller node.

Thanks in advance,
Sérgio Marques

<141>1 2022-01-23T02:24:44.936356Z OLT2T4-UNICOM-2 osaflcknd 2151 mds.log [meta sequenceId="347"] MDTM: svc down event for svc_id = GLA(3), subscri. by svc_id = GLND(4) pwe_id=1 Adest = <nodeid[0x20c0f]:osaflcknd[2151]>
<141>1 2022-01-23T02:24:44.936699Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4505"] MDTM: svc down event for svc_id = CPA(18), subscri. by svc_id = CPND(17) pwe_id=1 Adest = <nodeid[0x20c0f]:osafckptnd[2169]>
<141>1 2022-01-23T02:24:44.936703Z OLT2T4-UNICOM-2 osafckptd 2210 mds.log [meta sequenceId="750"] MDTM: svc down event for svc_id = CPA(18), subscri. by svc_id = CPD(16) pwe_id=1 Adest = <nodeid[0x20c0f]:osafckptd[2210]>
<141>1 2022-01-23T02:24:44.937203Z OLT2T4-UNICOM-2 osafimmnd 1588 mds.log [meta sequenceId="743"] MDTM: svc down event for svc_id = IMMA_OM(26), subscri. by svc_id = IMMND(25) pwe_id=1 Adest = <nodeid[0x20c0f]:osafimmnd[1588]>
<141>1 2022-01-23T02:24:44.937479Z OLT2T4-UNICOM-2 osafimmnd 1588 mds.log [meta sequenceId="744"] MDTM: svc down event for svc_id = IMMA_OI(27), subscri. by svc_id = IMMND(25) pwe_id=1 Adest = <nodeid[0x20c0f]:osafimmnd[1588]>
<139>1 2022-01-23T02:24:46.71289Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4506"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.726201Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4507"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.73951Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4508"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.75349Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4509"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.768767Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4510"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.777688Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4511"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.788105Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4512"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.801785Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4513"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.815353Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4514"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.824419Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4515"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.833325Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4516"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.842368Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4517"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.851617Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4518"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.862324Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4519"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.876586Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4520"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.887585Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4521"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:48.764874Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4522"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:48.776664Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4523"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:48.788544Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4524"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:48.799566Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4525"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:48.806417Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4526"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:48.81328Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4527"] MDS_SND_RCV: Invalid Sync CTXT Len
_______________________________________________
Opensaf-users mailing list
Ope...@li...
https://lists.sourceforge.net/lists/listinfo/opensaf-users
|
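Sérgio reports seeing these errors every 3 to 10 seconds, and the excerpt above shows them arriving in bursts shortly after the svc-down events. One way to quantify that from a saved mds.log is to parse the RFC 5424-style lines and group the "Invalid Sync CTXT Len" entries by process. The Python sketch below assumes exactly the line layout shown in this thread (`<PRI>1 TIMESTAMP HOST APP PID mds.log [meta sequenceId="N"] MSG`) with UTC timestamps ending in "Z"; it is not a general syslog parser, and `summarize` and `parse_ts` are hypothetical helper names.

```python
import re
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple

# Layout as seen in this thread: <PRI>1 TIMESTAMP HOST APP PID mds.log [SD] MSG
LINE_RE = re.compile(
    r"^<(?P<pri>\d+)>1 (?P<ts>\S+) (?P<host>\S+) (?P<app>\S+) (?P<pid>\d+) "
    r'mds\.log \[meta sequenceId="(?P<seq>\d+)"\] (?P<msg>.*)$'
)


def parse_ts(ts: str) -> datetime:
    # Assumes a trailing "Z" (UTC); offsets such as +05:30 are not handled.
    # Fractional seconds vary in length, so pad/trim them to 6 digits.
    base, _, frac = ts.rstrip("Z").partition(".")
    return datetime.strptime(f"{base}.{(frac + '000000')[:6]}",
                             "%Y-%m-%dT%H:%M:%S.%f")


def summarize(lines: List[str], needle: str = "Invalid Sync CTXT Len"
              ) -> Dict[Tuple[str, str], Tuple[int, float]]:
    """Map (app, pid) -> (match count, seconds from first to last match)."""
    stamps: Dict[Tuple[str, str], List[datetime]] = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and needle in m.group("msg"):
            stamps[(m.group("app"), m.group("pid"))].append(parse_ts(m.group("ts")))
    return {key: (len(ts), (max(ts) - min(ts)).total_seconds())
            for key, ts in stamps.items()}
```

Run against the excerpt above, this would attribute every matching entry to osafckptnd (PID 2169); running it over the full cc-2 mds.log could show whether the bursts really line up with controller reboots or also occur during steady-state section writes.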
From: Mohan K. <mohan@GetHighAvailability.com> - 2022-02-16 19:13:41
|
Hi Sergio,

I have an update. We have a patch for the scenario we could reproduce. We will test it and then share it with you for testing your scenarios. Once you confirm, I will float the patch in the community.

Thanks & Regards
Mohan Kanakam

-----Original Message-----
From: Mohan Kanakam [mailto:mohan@GetHighAvailability.com]
Sent: 14 February 2022 22:24
To: 'Sérgio Marques'; 'ope...@li...'
Subject: RE: [users] MDS_SND_RCV: Invalid Sync CTXT Len

Hi Sergio,

Thanks for the logs. We could also reproduce the issue simply by running the demo application, as below:

<139>1 2022-02-14T22:15:09.496432+05:30 sc1-VirtualBox osafckptnd 27692 mds.log [meta sequenceId="2"] MDS_SND_RCV: Invalid Sync CTXT Len

We will debug the issue this week and let you know.

Thanks & Regards
Mohan Kanakam

-----Original Message-----
From: Sérgio Marques [mailto:ser...@al...]
Sent: 14 February 2022 16:03
To: Mohan Kanakam; ope...@li...
Subject: RE: [users] MDS_SND_RCV: Invalid Sync CTXT Len

Hi Mohan,

I have increased the MDS log level (export MDS_LOG_LEVEL=5) to get more detail on the "MDS_SND_RCV: Invalid Sync CTXT Len" error. I'm sending the logs attached. You can find the "Invalid Sync CTXT Len" errors in the logs/sc/cc-2/mds.log file.

Thanks and regards,
Sérgio Marques

-----Original Message-----
From: Sérgio Marques
Sent: 11 February 2022 15:23
To: Mohan Kanakam <mohan@GetHighAvailability.com>; ope...@li...
Subject: RE: [users] MDS_SND_RCV: Invalid Sync CTXT Len

Hi Mohan,

To reproduce the problem, I only need to power up the cluster nodes and launch our applications on the SC and PL nodes immediately after OpenSAF comes up. These applications start by creating some checkpoints, such as the one below.

[root@OLT2T4-UNICOM-2~]# immlist safCkpt=CKPT_BACKPLANE_CONTROL,safApp=safCkptService
Name                                Type         Value(s)
========================================================================
safCkpt                             SA_STRING_T  safCkpt=CKPT_BACKPLANE_CONTROL
saCkptCheckpointUsedSize            SA_UINT64_T  2024 (0x7e8)
saCkptCheckpointSize                SA_UINT64_T  2024 (0x7e8)
saCkptCheckpointRetDuration         SA_TIME_T    9223372036854775807 (0x7fffffffffffffff, Sat Jan 27 10:50:44 1990)
saCkptCheckpointNumWriters          SA_UINT32_T  7 (0x7)
saCkptCheckpointNumSections         SA_UINT32_T  22 (0x16)
saCkptCheckpointNumReplicas         SA_UINT32_T  2 (0x2)
saCkptCheckpointNumReaders          SA_UINT32_T  7 (0x7)
saCkptCheckpointNumOpeners          SA_UINT32_T  7 (0x7)
saCkptCheckpointNumCorruptSections  SA_UINT32_T  0 (0x0)
saCkptCheckpointMaxSections         SA_UINT32_T  22 (0x16)
saCkptCheckpointMaxSectionSize      SA_UINT64_T  92 (0x5c)
saCkptCheckpointMaxSectionIdSize    SA_UINT64_T  1 (0x1)
saCkptCheckpointCreationTimestamp   SA_TIME_T    1644591280000000000 (0x16d2c30e451ae000, Fri Feb 11 14:54:40 2022)
saCkptCheckpointCreationFlags       SA_UINT32_T  2 (0x2)
SaImmAttrImplementerName            SA_STRING_T  safCheckPointService
SaImmAttrClassName                  SA_STRING_T  SaCkptCheckpoint
SaImmAttrAdminOwnerName             SA_STRING_T  <Empty>

After creating these checkpoints, the nodes start creating, reading and writing their sections.

I'm sending the requested logs attached. Please feel free to ask for more info/logs/tests.

Thanks a lot for your help,
Regards,
Sérgio Marques

-----Original Message-----
From: Mohan Kanakam <mohan@GetHighAvailability.com>
Sent: 10 February 2022 17:57
To: 'Mohan Kanakam' <mohan@GetHighAvailability.com>; Sérgio Marques <ser...@al...>; ope...@li...
Subject: RE: [users] MDS_SND_RCV: Invalid Sync CTXT Len Atenção: Este email foi originado fora da Altice Portugal. Por favor, não clique em links nem abra anexos, a não ser que conheça o remetente e saiba que o seu conteúdo é seguro. Hi Sergio, Can you please share syslog and mdslog of all the nodes. Thanks & Regards Mohan Kanakam | 91-8333082448 Senior Software Engineer High Availability Solutions www.GetHighAvailability.com Get High Availability Today ! NJ, USA: 1 508-507-6507 | Hyderabad, India: 91 798-992-5293 -----Original Message----- From: Mohan Kanakam [mailto:mohan@GetHighAvailability.com] Sent: 10 February 2022 23:23 To: 'Sérgio Marques'; 'ope...@li...' Subject: RE: [users] MDS_SND_RCV: Invalid Sync CTXT Len Hi Sergio, Thanks for the information. Can you please share us the steps(kind of checkpoint, frequency of checkpoint, etc.) to reproduce the issue in our lab? Can you please share ckptd and ckptnd traces(you can enable/disable at runtime using "kill -USR2 <ckptnd_pid/ckptd_pid>") and application api calls. Thanks & Regards Mohan Kanakam | 91-8333082448 Senior Software Engineer High Availability Solutions www.GetHighAvailability.com Get High Availability Today ! NJ, USA: 1 508-507-6507 | Hyderabad, India: 91 798-992-5293 -----Original Message----- From: Sérgio Marques [mailto:ser...@al...] Sent: 10 February 2022 20:17 To: Mohan Kanakam; ope...@li... Subject: RE: [users] MDS_SND_RCV: Invalid Sync CTXT Len Hi Mohan, Thanks for your quick answer. I have a cluster with 2 controller and 2 payload boards. All of the nodes are using the last opensaf version 5.22.01. Some checkpoints are created by the payload and controller applications. I can see these errors every 3 to 10 seconds at the controller slave card. Is there a way of debugging this issue to find out where exactly are the messages being lost? Please feel free to ask for more info/logs/tests. 
Thanks and regards, Sérgio Marques -----Original Message----- From: Mohan Kanakam <mohan@GetHighAvailability.com> Sent: 9 de fevereiro de 2022 17:48 To: Sérgio Marques <ser...@al...>; ope...@li... Subject: RE: [users] MDS_SND_RCV: Invalid Sync CTXT Len Hi Sergio, Can you please let us know the opensaf version being used. Since, I don't know the test scenario and the use case, so I am giving generic answer. If ckptnd sends a sync message(checkpoint) to Active ckptd and waits for its reply and if Active ckptd is stopped, then ckptnd will never get reply of those messages and the context of sync messages gets invalid and unanswered. It looks, there are few such messages pending at ckptd to be replied, and if Active controller reboots, then Active ckptd is stopped and all the sync messages waiting to be replied at ckptnd may get such error because the context of the sync messages are lost. To me, it looks the messages are genuine, but then it may be a concern that few ckptnd's checkpoint messages are not being responded and they are lost. So, application(if running on payload) need to send it again. Thanks & Regards Mohan Kanakam | 91-8333082448 Senior Software Engineer High Availability Solutions www.GetHighAvailability.com Get High Availability Today ! NJ, USA: 1 508-507-6507 | Hyderabad, India: 91 798-992-5293 -----Original Message----- From: Sérgio Marques [mailto:ser...@al...] Sent: 09 February 2022 14:50 To: ope...@li... Subject: [users] MDS_SND_RCV: Invalid Sync CTXT Len Hi, In my system, some "MDS_SND_RCV: Invalid Sync CTXT Len" events are being registered in mds.log. Are these errors normal? I'm asking this because I'm experiencing very weird problems when rebooting the active controller node. Thanks in advance, Sérgio Marques <141>1 2022-01-23T02:24:44.936356Z OLT2T4-UNICOM-2 osaflcknd 2151 mds.log [meta sequenceId="347"] MDTM: svc down event for svc_id = GLA(3), subscri. 
by svc_id = GLND(4) pwe_id=1 Adest = <nodeid[0x20c0f]:osaflcknd[2151]> <141>1 2022-01-23T02:24:44.936699Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4505"] MDTM: svc down event for svc_id = CPA(18), subscri. by svc_id = CPND(17) pwe_id=1 Adest = <nodeid[0x20c0f]:osafckptnd[2169]> <141>1 2022-01-23T02:24:44.936703Z OLT2T4-UNICOM-2 osafckptd 2210 mds.log [meta sequenceId="750"] MDTM: svc down event for svc_id = CPA(18), subscri. by svc_id = CPD(16) pwe_id=1 Adest = <nodeid[0x20c0f]:osafckptd[2210]> <141>1 2022-01-23T02:24:44.937203Z OLT2T4-UNICOM-2 osafimmnd 1588 mds.log [meta sequenceId="743"] MDTM: svc down event for svc_id = IMMA_OM(26), subscri. by svc_id = IMMND(25) pwe_id=1 Adest = <nodeid[0x20c0f]:osafimmnd[1588]> <141>1 2022-01-23T02:24:44.937479Z OLT2T4-UNICOM-2 osafimmnd 1588 mds.log [meta sequenceId="744"] MDTM: svc down event for svc_id = IMMA_OI(27), subscri. by svc_id = IMMND(25) pwe_id=1 Adest = <nodeid[0x20c0f]:osafimmnd[1588]> <139>1 2022-01-23T02:24:46.71289Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4506"] MDS_SND_RCV: Invalid Sync CTXT Len <139>1 2022-01-23T02:24:46.726201Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4507"] MDS_SND_RCV: Invalid Sync CTXT Len <139>1 2022-01-23T02:24:46.73951Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4508"] MDS_SND_RCV: Invalid Sync CTXT Len <139>1 2022-01-23T02:24:46.75349Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4509"] MDS_SND_RCV: Invalid Sync CTXT Len <139>1 2022-01-23T02:24:46.768767Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4510"] MDS_SND_RCV: Invalid Sync CTXT Len <139>1 2022-01-23T02:24:46.777688Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4511"] MDS_SND_RCV: Invalid Sync CTXT Len <139>1 2022-01-23T02:24:46.788105Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4512"] MDS_SND_RCV: Invalid Sync CTXT Len <139>1 2022-01-23T02:24:46.801785Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta 
sequenceId="4513"] MDS_SND_RCV: Invalid Sync CTXT Len <139>1 2022-01-23T02:24:46.815353Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4514"] MDS_SND_RCV: Invalid Sync CTXT Len <139>1 2022-01-23T02:24:46.824419Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4515"] MDS_SND_RCV: Invalid Sync CTXT Len <139>1 2022-01-23T02:24:46.833325Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4516"] MDS_SND_RCV: Invalid Sync CTXT Len <139>1 2022-01-23T02:24:46.842368Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4517"] MDS_SND_RCV: Invalid Sync CTXT Len <139>1 2022-01-23T02:24:46.851617Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4518"] MDS_SND_RCV: Invalid Sync CTXT Len <139>1 2022-01-23T02:24:46.862324Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4519"] MDS_SND_RCV: Invalid Sync CTXT Len <139>1 2022-01-23T02:24:46.876586Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4520"] MDS_SND_RCV: Invalid Sync CTXT Len <139>1 2022-01-23T02:24:46.887585Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4521"] MDS_SND_RCV: Invalid Sync CTXT Len <139>1 2022-01-23T02:24:48.764874Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4522"] MDS_SND_RCV: Invalid Sync CTXT Len <139>1 2022-01-23T02:24:48.776664Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4523"] MDS_SND_RCV: Invalid Sync CTXT Len <139>1 2022-01-23T02:24:48.788544Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4524"] MDS_SND_RCV: Invalid Sync CTXT Len <139>1 2022-01-23T02:24:48.799566Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4525"] MDS_SND_RCV: Invalid Sync CTXT Len <139>1 2022-01-23T02:24:48.806417Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4526"] MDS_SND_RCV: Invalid Sync CTXT Len <139>1 2022-01-23T02:24:48.81328Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4527"] MDS_SND_RCV: Invalid Sync CTXT Len _______________________________________________ 
Opensaf-users mailing list Ope...@li... https://lists.sourceforge.net/lists/listinfo/opensaf-users |
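Mohan's generic explanation in this thread describes a ckptnd request whose reply context is lost when the active ckptd stops. It can be pictured with a toy model.

The Python sketch below is an illustration under stated assumptions, not OpenSAF code:
- `SyncChannel`, `RequestLost` and `send_with_retry` are hypothetical names.
- The real MDS bookkeeping behind "Invalid Sync CTXT Len" is not shown here and certainly differs.

The model works like this:
- Pending requests are tracked by a context id.
- A peer-down event drops every pending context.
- A reply that arrives after its context was dropped is counted as stale.
- The caller then resends, as Mohan suggests the application should.

```python
import itertools
from typing import Callable, Dict, List


class RequestLost(Exception):
    """Hypothetical error: the reply context for a request was dropped."""


class SyncChannel:
    """Toy model of synchronous requests tracked by reply-context ids.

    Illustration only; this is not the OpenSAF MDS implementation.
    """

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.pending: Dict[int, bytes] = {}    # context id -> request awaiting a reply
        self.completed: Dict[int, bytes] = {}  # context id -> reply received in time
        self.stale_replies = 0                 # replies whose context no longer exists

    def send(self, payload: bytes) -> int:
        """Register a request and return the context id its reply must carry."""
        ctx = next(self._ids)
        self.pending[ctx] = payload
        return ctx

    def on_reply(self, ctx: int, reply: bytes) -> bool:
        """Match a reply to its pending context; reject it if the context is gone."""
        if ctx not in self.pending:
            self.stale_replies += 1  # similar in spirit to an "invalid sync context" log
            return False
        del self.pending[ctx]
        self.completed[ctx] = reply
        return True

    def on_peer_down(self) -> List[int]:
        """The peer (e.g. the active ckptd) went away: every waiter loses its context."""
        lost = sorted(self.pending)
        self.pending.clear()
        return lost


def send_with_retry(send_once: Callable[[bytes], bytes], payload: bytes,
                    attempts: int = 3) -> bytes:
    """Application-level resend after a lost request, as suggested in the thread."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Exception = RequestLost("no attempt made")
    for _ in range(attempts):
        try:
            return send_once(payload)
        except RequestLost as err:
            last_error = err  # e.g. the reply context vanished during a failover
    raise last_error


if __name__ == "__main__":
    ch = SyncChannel()
    first, second = ch.send(b"section-1"), ch.send(b"section-2")
    ch.on_reply(first, b"ok")            # answered before the failover
    print(ch.on_peer_down())             # [2]: the second request lost its context
    print(ch.on_reply(second, b"late"))  # False: the late reply has no context
```

Resending is only safe when the operation is idempotent, for example rewriting a whole section with the same data. A blind retry of a non-idempotent update could apply it twice.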
From: Mohan K. <mohan@GetHighAvailability.com> - 2022-02-14 16:54:29
|
Hi Sergio,

Thanks for the logs. We could also reproduce the issue by simply running the demo application as below.

<139>1 2022-02-14T22:15:09.496432+05:30 sc1-VirtualBox osafckptnd 27692 mds.log [meta sequenceId="2"] MDS_SND_RCV: Invalid Sync CTXT Len

We will debug the issue this week and will let you know.

Thanks & Regards
Mohan Kanakam
|
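The mds.log lines can be parsed as RFC 5424 syslog records. This helps check the "every 3 to 10 seconds" pattern reported in the thread, or see which process logs the error.

This is a minimal Python sketch, not an OpenSAF tool, and the function names are hypothetical. It assumes every record has the `<PRI>1 TIMESTAMP HOST APP PROCID MSGID [SD] MSG` layout seen in the excerpts. Timestamps are normalised to UTC, because the excerpts mix `Z` and `+05:30` offsets.

```python
import re
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable, List, NamedTuple, Optional

# Layout of the mds.log excerpts: <PRI>1 TIMESTAMP HOST APP PROCID MSGID [SD] MSG
LINE_RE = re.compile(
    r"<(?P<pri>\d+)>1 (?P<ts>\S+) (?P<host>\S+) (?P<app>\S+) (?P<pid>\S+) "
    r"(?P<msgid>\S+) \[[^\]]*\] (?P<msg>.*)"
)
TS_RE = re.compile(r"(\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d)(?:\.(\d+))?(Z|[+-]\d\d:\d\d)$")


class Event(NamedTuple):
    when: datetime  # normalised to UTC
    host: str
    app: str
    pid: str
    msg: str


def parse_ts(ts: str) -> datetime:
    """Parse '...46.71289Z' or '...09.496432+05:30' into a UTC datetime."""
    m = TS_RE.match(ts)
    if m is None:
        raise ValueError(f"unexpected timestamp: {ts!r}")
    base, frac, offset = m.groups()
    frac = (frac or "0")[:6].ljust(6, "0")  # older fromisoformat wants 6 digits
    offset = "+00:00" if offset == "Z" else offset
    return datetime.fromisoformat(f"{base}.{frac}{offset}").astimezone(timezone.utc)


def parse_line(line: str) -> Optional[Event]:
    m = LINE_RE.match(line.strip())
    if m is None:
        return None  # not a syslog record (e.g. a mailing-list footer)
    return Event(parse_ts(m["ts"]), m["host"], m["app"], m["pid"], m["msg"])


def ctxt_len_errors(lines: Iterable[str]) -> List[Event]:
    """Keep only the 'Invalid Sync CTXT Len' records, in file order."""
    events = (parse_line(line) for line in lines)
    return [e for e in events if e is not None and "Invalid Sync CTXT Len" in e.msg]


def per_process(events: List[Event]) -> Counter:
    """Count errors per (host, app, pid)."""
    return Counter((e.host, e.app, e.pid) for e in events)


def gaps_seconds(events: List[Event]) -> List[float]:
    """Seconds between consecutive events (assumes they are in time order)."""
    return [(b.when - a.when).total_seconds() for a, b in zip(events, events[1:])]
```

In the January excerpt quoted in this thread, the errors arrive in two bursts. Consecutive records within a burst are roughly 7 to 15 ms apart, and the two bursts are separated by about 1.9 s. Running `gaps_seconds` over a full log would show whether that burst pattern holds over time.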
From: Mohan K. <mohan@GetHighAvailability.com> - 2022-02-10 17:57:18
|
Hi Sergio,

Can you please share syslog and mdslog of all the nodes.

Thanks & Regards
Mohan Kanakam
|
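Once syslog and mds.log have been collected from every node as requested, merging them into a single timeline makes patterns easier to spot. For example, it shows whether errors on one node follow a svc-down event on another.

This is a minimal Python sketch with these assumptions:
- Each node's file is already in time order.
- Node clocks are reasonably synchronised (e.g. via NTP).
- `merge_node_logs` and the node labels are hypothetical.

Non-record lines are skipped.

```python
import heapq
import re
from datetime import datetime, timezone
from typing import Dict, Iterable, Iterator, Tuple

# Leading "<PRI>1 TIMESTAMP " of an RFC 5424 record, as in the mds.log excerpts.
_HEAD = re.compile(r"<\d+>1 (\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d)(?:\.(\d+))?(Z|[+-]\d\d:\d\d) ")


def line_time(line: str) -> datetime:
    """Return the record timestamp in UTC; raise ValueError for non-records."""
    m = _HEAD.match(line)
    if m is None:
        raise ValueError(f"not a syslog record: {line[:40]!r}")
    base, frac, offset = m.groups()
    frac = (frac or "0")[:6].ljust(6, "0")
    offset = "+00:00" if offset == "Z" else offset
    return datetime.fromisoformat(f"{base}.{frac}{offset}").astimezone(timezone.utc)


def _tagged(node: str, lines: Iterable[str]) -> Iterator[Tuple[datetime, str, str]]:
    # A separate generator function binds `node` per stream (no late-binding bug).
    for line in lines:
        if _HEAD.match(line):
            yield line_time(line), node, line.rstrip("\n")


def merge_node_logs(logs: Dict[str, Iterable[str]]) -> Iterator[Tuple[datetime, str, str]]:
    """Lazily merge per-node logs, each already sorted, into one UTC timeline."""
    return heapq.merge(*(_tagged(node, lines) for node, lines in logs.items()))
```

Usage: open one file per node (the paths are up to you) and iterate over `merge_node_logs({"sc-1": f1, "sc-2": f2})`. `heapq.merge` streams the files instead of loading them all into memory. Its ordering is only as good as the node clocks.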
From: Mohan K. <mohan@GetHighAvailability.com> - 2022-02-10 17:53:21
|
Hi Sergio,

Thanks for the information. Can you please share the steps (kind of checkpoint, frequency of checkpointing, etc.) to reproduce the issue in our lab? Can you also share the ckptd and ckptnd traces (you can enable/disable them at runtime using "kill -USR2 <ckptnd_pid/ckptd_pid>") and the application API calls?

Thanks & Regards
Mohan Kanakam
|
From: Sérgio M. <ser...@al...> - 2022-02-10 14:46:57
|
Hi Mohan,

Thanks for your quick answer. I have a cluster with 2 controller boards and 2 payload boards. All of the nodes are running the latest OpenSAF version, 5.22.01. Some checkpoints are created by the payload and controller applications. I can see these errors every 3 to 10 seconds on the slave controller card.

Is there a way to debug this issue and find out exactly where the messages are being lost? Please feel free to ask for more info/logs/tests.

Thanks and regards,
Sérgio Marques

-----Original Message-----
From: Mohan Kanakam <mohan@GetHighAvailability.com>
Sent: 9 February 2022 17:48
To: Sérgio Marques <ser...@al...>; ope...@li...
Subject: RE: [users] MDS_SND_RCV: Invalid Sync CTXT Len

Hi Sergio,

Can you please let us know which OpenSAF version is being used?

Since I don't know the test scenario and the use case, I am giving a generic answer. If ckptnd sends a sync message (checkpoint) to the active ckptd and waits for its reply, and the active ckptd is stopped, then ckptnd will never get a reply to those messages, and the context of those sync messages becomes invalid and unanswered.

It looks like there are a few such messages pending at ckptd to be replied to. If the active controller reboots, the active ckptd is stopped, and all the sync messages waiting for a reply at ckptnd may get this error, because the context of the sync messages is lost. To me, the messages look genuine, but it may be a concern that a few of ckptnd's checkpoint messages are not being answered and are lost. So the application (if running on a payload) needs to send them again.

Thanks & Regards
Mohan Kanakam | 91-8333082448
Senior Software Engineer
High Availability Solutions
www.GetHighAvailability.com
Get High Availability Today !
NJ, USA: 1 508-507-6507 | Hyderabad, India: 91 798-992-5293

-----Original Message-----
From: Sérgio Marques [mailto:ser...@al...]
Sent: 09 February 2022 14:50
To: ope...@li...
Subject: [users] MDS_SND_RCV: Invalid Sync CTXT Len

Hi,

In my system, some "MDS_SND_RCV: Invalid Sync CTXT Len" events are being registered in mds.log. Are these errors normal? I'm asking this because I'm experiencing very weird problems when rebooting the active controller node.

Thanks in advance,
Sérgio Marques
|
From: Mohan K. <mohan@GetHighAvailability.com> - 2022-02-09 18:07:31
|
Hi Sergio,

Can you please let us know which OpenSAF version is being used?

Since I don't know the test scenario and the use case, I am giving a generic answer. If ckptnd sends a sync message (checkpoint) to the active ckptd and waits for its reply, and the active ckptd is stopped, then ckptnd will never get a reply to those messages, and the context of those sync messages becomes invalid and unanswered.

It looks like there are a few such messages pending at ckptd to be replied to. If the active controller reboots, the active ckptd is stopped, and all the sync messages waiting for a reply at ckptnd may get this error, because the context of the sync messages is lost. To me, the messages look genuine, but it may be a concern that a few of ckptnd's checkpoint messages are not being answered and are lost. So the application (if running on a payload) needs to send them again.

Thanks & Regards
Mohan Kanakam | 91-8333082448
Senior Software Engineer
High Availability Solutions
www.GetHighAvailability.com
Get High Availability Today !
NJ, USA: 1 508-507-6507 | Hyderabad, India: 91 798-992-5293

-----Original Message-----
From: Sérgio Marques [mailto:ser...@al...]
Sent: 09 February 2022 14:50
To: ope...@li...
Subject: [users] MDS_SND_RCV: Invalid Sync CTXT Len

Hi,

In my system, some "MDS_SND_RCV: Invalid Sync CTXT Len" events are being registered in mds.log. Are these errors normal? I'm asking this because I'm experiencing very weird problems when rebooting the active controller node.

Thanks in advance,
Sérgio Marques
|
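The behaviour described in the reply above (sync requests waiting for the active ckptd, contexts lost when it stops, and the application having to resend) can be illustrated with a toy model. This is a hypothetical sketch, not the MDS or CKPT API: `SyncSender`, `send_sync`, `peer_down` and `deliver_reply` are invented names, and the idea that a reply arriving for a discarded context gets rejected is an assumption about how such a log message could arise.

```python
import itertools


class SyncSender:
    """Toy model of a node director that waits for replies to sync sends.

    Hypothetical: real MDS keeps its own internal state; this only mirrors
    the behaviour described in the thread.
    """

    def __init__(self):
        self._ids = itertools.count(1)
        self.pending = {}   # context id -> request payload still awaiting a reply
        self.rejected = []  # replies whose context was no longer known

    def send_sync(self, payload):
        ctx = next(self._ids)
        self.pending[ctx] = payload
        return ctx

    def peer_down(self):
        """The active peer stopped: every waiting context is lost."""
        lost = list(self.pending.values())
        self.pending.clear()
        return lost  # the application has to send these again

    def deliver_reply(self, ctx):
        # Assumption: a reply for a discarded context is rejected.
        if ctx not in self.pending:
            self.rejected.append(ctx)
            return False
        del self.pending[ctx]
        return True


sender = SyncSender()
a = sender.send_sync("write ckpt section 1")
b = sender.send_sync("write ckpt section 2")
sender.deliver_reply(a)      # answered before the controller reboot
lost = sender.peer_down()    # b was still waiting and is lost
sender.deliver_reply(b)      # a late reply finds no context
for payload in lost:         # application-level resend
    sender.send_sync(payload)
print(lost, sender.rejected, len(sender.pending))
# → ['write ckpt section 2'] [2] 1
```

The point of the model is the last loop: nothing below the application re-sends a request whose context was dropped, so the caller has to keep enough state to retry.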
From: Sérgio M. <ser...@al...> - 2022-02-09 09:20:28
|
Hi,

In my system, some "MDS_SND_RCV: Invalid Sync CTXT Len" events are being registered in mds.log. Are these errors normal? I'm asking this because I'm experiencing very weird problems when rebooting the active controller node.

Thanks in advance,
Sérgio Marques

<141>1 2022-01-23T02:24:44.936356Z OLT2T4-UNICOM-2 osaflcknd 2151 mds.log [meta sequenceId="347"] MDTM: svc down event for svc_id = GLA(3), subscri. by svc_id = GLND(4) pwe_id=1 Adest = <nodeid[0x20c0f]:osaflcknd[2151]>
<141>1 2022-01-23T02:24:44.936699Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4505"] MDTM: svc down event for svc_id = CPA(18), subscri. by svc_id = CPND(17) pwe_id=1 Adest = <nodeid[0x20c0f]:osafckptnd[2169]>
<141>1 2022-01-23T02:24:44.936703Z OLT2T4-UNICOM-2 osafckptd 2210 mds.log [meta sequenceId="750"] MDTM: svc down event for svc_id = CPA(18), subscri. by svc_id = CPD(16) pwe_id=1 Adest = <nodeid[0x20c0f]:osafckptd[2210]>
<141>1 2022-01-23T02:24:44.937203Z OLT2T4-UNICOM-2 osafimmnd 1588 mds.log [meta sequenceId="743"] MDTM: svc down event for svc_id = IMMA_OM(26), subscri. by svc_id = IMMND(25) pwe_id=1 Adest = <nodeid[0x20c0f]:osafimmnd[1588]>
<141>1 2022-01-23T02:24:44.937479Z OLT2T4-UNICOM-2 osafimmnd 1588 mds.log [meta sequenceId="744"] MDTM: svc down event for svc_id = IMMA_OI(27), subscri. by svc_id = IMMND(25) pwe_id=1 Adest = <nodeid[0x20c0f]:osafimmnd[1588]>
<139>1 2022-01-23T02:24:46.71289Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4506"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.726201Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4507"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.73951Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4508"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.75349Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4509"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.768767Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4510"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.777688Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4511"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.788105Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4512"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.801785Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4513"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.815353Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4514"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.824419Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4515"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.833325Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4516"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.842368Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4517"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.851617Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4518"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.862324Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4519"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.876586Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4520"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.887585Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4521"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:48.764874Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4522"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:48.776664Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4523"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:48.788544Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4524"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:48.799566Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4525"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:48.806417Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4526"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:48.81328Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4527"] MDS_SND_RCV: Invalid Sync CTXT Len
|
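To see how the errors in a log like the one above are distributed, a short script can count the "Invalid Sync CTXT Len" entries per process and list the gaps between consecutive ones. This is a sketch that assumes the RFC 5424-style layout shown in the post (priority/version, ISO timestamp, host, process name, PID, `mds.log`, `[meta …]` structured data, message); the regular expression and the `parse`/`summarize` helpers are illustrative names, and the sample lines are copied from the post.

```python
import re
from collections import Counter
from datetime import datetime

# Layout seen in the thread:
# <PRI>1 TIMESTAMP HOST APP PROCID mds.log [meta sequenceId="N"] MESSAGE
LINE_RE = re.compile(
    r"<\d+>1 (?P<ts>\S+) \S+ (?P<proc>\S+) \d+ mds\.log \[meta [^\]]*\] (?P<msg>.*)"
)
ERROR_TEXT = "MDS_SND_RCV: Invalid Sync CTXT Len"


def parse(lines):
    """Yield (timestamp, process, message) for every line that matches."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            # %f accepts 1 to 6 fractional digits, e.g. "46.71289".
            ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S.%fZ")
            yield ts, m["proc"], m["msg"]


def summarize(lines):
    """Count ERROR_TEXT per process and list gaps (in seconds) between errors."""
    errors = [(ts, proc) for ts, proc, msg in parse(lines) if msg == ERROR_TEXT]
    per_proc = Counter(proc for _, proc in errors)
    times = [ts for ts, _ in errors]
    gaps = [round((b - a).total_seconds(), 3) for a, b in zip(times, times[1:])]
    return per_proc, gaps


sample = """\
<141>1 2022-01-23T02:24:44.936699Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4505"] MDTM: svc down event for svc_id = CPA(18), subscri. by svc_id = CPND(17) pwe_id=1 Adest = <nodeid[0x20c0f]:osafckptnd[2169]>
<139>1 2022-01-23T02:24:46.71289Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4506"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.726201Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4507"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:48.764874Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4522"] MDS_SND_RCV: Invalid Sync CTXT Len
""".splitlines()

per_proc, gaps = summarize(sample)
print(per_proc, gaps)
# → Counter({'osafckptnd': 3}) [0.013, 2.039]
```

On a real system you would pass an open file object for your mds.log instead of `sample`; the gap list then shows whether the errors arrive in bursts (for example, shortly after a "svc down" event) or at a steady rate.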
From: Mohan K. <mohan@GetHighAvailability.com> - 2022-01-27 12:12:12
|
Hi Jim,

We downloaded OpenSAF 5.2.0 and started SC-1 (active) and SC-2 (standby), running the AMF demo with the 2N redundancy model on both controllers. The AMF demo is active on SC-1 and standby on SC-2. We ran the following commands to perform the admin operations:

1. amf-adm lock safAmfNodeGroup=AllNodes,safAmfCluster=myAmfCluster
   [By default, this node group contains SC-1 and SC-2 in imm.xml.]
   The assignments were removed from the AMF demo instances.
2. amf-adm lock-in safAmfNodeGroup=AllNodes,safAmfCluster=myAmfCluster
   Both demo instances were terminated.
3. amf-adm unlock-in safAmfNodeGroup=AllNodes,safAmfCluster=myAmfCluster
   Both demo instances were started.
4. amf-adm unlock safAmfNodeGroup=AllNodes,safAmfCluster=myAmfCluster
   The AMF demo instances got active and standby assignments.

We ran immlist on the SUs of the AMF demo; they are all IN-SERVICE with assignments.

We also tried some other configurations:
- The standby instance was hosted on a payload (PL-3), and the same steps were performed. Everything behaved as expected.
- The standby instance was hosted on a payload (PL-3), but the payload was down while the above steps were performed. Everything behaved as expected here too, since SU2 was never up and always showed out of service.

So we couldn't reproduce the reported issue. If the steps we performed differ from yours, please correct them and we will try again. If possible, please share the amfnd and amfd traces and the immdump output when the issue is reproduced. You can enable AMF traces as follows:

kill -USR2 <amfd pid>
kill -USR2 <amfnd pid>

Then run the steps to reproduce. You can disable the traces again by running the same commands.
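The four admin operations and the trace toggling described above can be scripted so the same sequence is repeated exactly on each attempt. A minimal sketch in Python: the node-group DN and the order of operations come from the message, while the `run_sequence` and `toggle_amf_traces` helpers and the `dry_run` flag are hypothetical additions. It also assumes the AMF daemons are named `osafamfd` and `osafamfnd`, as in a standard OpenSAF install.

```python
import os
import signal
import subprocess

NODE_GROUP = "safAmfNodeGroup=AllNodes,safAmfCluster=myAmfCluster"
# Order used in the reproduction: lock, lock-in, unlock-in, unlock.
OPERATIONS = ["lock", "lock-in", "unlock-in", "unlock"]


def run_sequence(dn=NODE_GROUP, dry_run=False):
    """Run each amf-adm admin operation on dn, stopping at the first failure."""
    commands = [["amf-adm", op, dn] for op in OPERATIONS]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if amf-adm exits non-zero.
            subprocess.run(cmd, check=True)
    return commands


def toggle_amf_traces(daemons=("osafamfd", "osafamfnd")):
    """Send SIGUSR2 to each AMF daemon; calling it again turns tracing off."""
    for name in daemons:
        result = subprocess.run(["pidof", name], capture_output=True, text=True)
        for pid in result.stdout.split():
            os.kill(int(pid), signal.SIGUSR2)


cmds = run_sequence(dry_run=True)
```

On a controller, a reproduction run would be `toggle_amf_traces()`, then `run_sequence()`, then `toggle_amf_traces()` again, before collecting the traces and the immdump output.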
Also, you can check:
- the operational, administrative, and presence states of the service unit
- the operational state of its containing node
- the administrative states of its containing node, service group, application, and the cluster
- the administrative state of the ClmCluster

As per the AMF specification: "The operational, administrative, and presence states of a service unit, the operational state of its containing node, and the administrative states of its containing node, service group, application, and the cluster are combined into another state, called the readiness state of a service unit."

Hope it helps!

Thanks & Regards
Mohan Kanakam | 91-8333082448
Senior Software Engineer
High Availability Solutions
www.GetHighAvailability.com
Get High Availability Today !
NJ, USA: 1 508-507-6507 | Hyderabad, India: 91 798-992-5293

-----Original Message-----
From: Carroll, James R [mailto:jam...@lm...]
Sent: 26 January 2022 01:53
To: Ope...@li...
Subject: [users] troubles getting HA Assignment

Hi All,

We are using OpenSAF 5.2.0, and we are using the OpenSAF Node Group extension, which allows admin commands to be issued in parallel to nodes in the cluster. The node group commands all work as expected, and we get back a success code. However, the nodes never transition to a state where they receive an HA assignment. Below is the sequence of commands:

1. All nodes in the cluster are fully up and operational.
2. Send a Node Group command to lock all nodes.
   * Success - nodes and SUs reach the LOCKED state.
3. Send a Node Group command to lock-instantiate all nodes.
   * Success - nodes and SUs reach the LOCKED-INSTANTIATION state.
4. Note: at this point the cluster is completely down, as expected. The only processes running are OpenSAF processes.
5. Send a Node Group command to unlock-instantiate all nodes.
   * Success - nodes and SUs reach the LOCKED state.
6. Send a Node Group command to unlock all nodes.
   * Success - nodes and SUs reach the UNLOCKED state.
7. At this point, the system should be fully operational. But instead we have the following:
   * Node states:
     i. Admin State = unlocked
     ii. Operational State = enabled
   * SU states:
     i. Admin State = unlocked
     ii. Operational State = enabled
     iii. Presence State = instantiated
     iv. READINESS STATE = OUT OF SERVICE
     v. HA STATE = NONE ASSIGNED

We cannot figure out why the readiness state is OUT OF SERVICE. It seems that something has prevented AMF from assigning an HA state to the SUs, but it is not clear what. Has anyone encountered a similar issue, where HA states were not getting assigned? Does anyone have a recommendation on how to troubleshoot this issue?

Thanks
Jim
|
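The readiness rule quoted from the AMF specification can be turned into a checklist that names whichever state keeps an SU out of service, which is the question Jim is asking. This is a simplified sketch under stated assumptions: it only distinguishes IN-SERVICE from OUT-OF-SERVICE (it ignores the stopping readiness state used while something is shutting down), it treats presence states INSTANTIATED and RESTARTING as acceptable, it includes the ClmCluster check from the list above, and the dictionary keys are invented names, not IMM attribute names.

```python
# Hypothetical field names; values mirror the state names used in the thread.
REQUIRED = {
    "su_admin": "UNLOCKED",
    "su_oper": "ENABLED",
    "node_oper": "ENABLED",
    "node_admin": "UNLOCKED",
    "sg_admin": "UNLOCKED",
    "app_admin": "UNLOCKED",
    "cluster_admin": "UNLOCKED",
    "clm_cluster_admin": "UNLOCKED",
}
OK_PRESENCE = {"INSTANTIATED", "RESTARTING"}


def readiness(states):
    """Return (readiness, blockers) for one service unit.

    Simplified: a SHUTTING-DOWN admin state is reported as a blocker
    instead of yielding the STOPPING readiness state.
    """
    blockers = [key for key, want in REQUIRED.items() if states.get(key) != want]
    if states.get("su_presence") not in OK_PRESENCE:
        blockers.append("su_presence")
    return ("IN-SERVICE" if not blockers else "OUT-OF-SERVICE"), blockers


# SU and node values as reported by Jim; the others are made-up examples.
su = {
    "su_admin": "UNLOCKED", "su_oper": "ENABLED", "su_presence": "INSTANTIATED",
    "node_admin": "UNLOCKED", "node_oper": "ENABLED",
    "sg_admin": "LOCKED",  # hypothetical: not reported in the thread
    "app_admin": "UNLOCKED", "cluster_admin": "UNLOCKED",
    "clm_cluster_admin": "UNLOCKED",
}
print(readiness(su))  # → ('OUT-OF-SERVICE', ['sg_admin'])
```

Filling in the remaining fields from `immlist` output for the SU, its node, service group, application, and the cluster objects would show which of them is holding the readiness state at OUT OF SERVICE.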
From: Thang D. N. <tha...@de...> - 2022-01-26 02:25:32
|
Hi Carroll,

This version is too old; it seems it is no longer supported. You can try the latest release, i.e., 5.22.01.

B.R/Thang

-----Original Message-----
From: Carroll, James R <jam...@lm...>
Sent: Wednesday, January 26, 2022 3:23 AM
To: Ope...@li...
Subject: [users] troubles getting HA Assignment

Hi All,

We are using OpenSAF 5.2.0, and we are using the OpenSAF Node Group extension, which allows admin commands to be issued in parallel to nodes in the cluster. The node group commands all work as expected, and we get back a success code. However, the nodes never transition to a state where they receive an HA assignment. Below is the sequence of commands:

1. All nodes in the cluster are fully up and operational.
2. Send a Node Group command to lock all nodes.
   * Success - nodes and SUs reach the LOCKED state.
3. Send a Node Group command to lock-instantiate all nodes.
   * Success - nodes and SUs reach the LOCKED-INSTANTIATION state.
4. Note: at this point the cluster is completely down, as expected. The only processes running are OpenSAF processes.
5. Send a Node Group command to unlock-instantiate all nodes.
   * Success - nodes and SUs reach the LOCKED state.
6. Send a Node Group command to unlock all nodes.
   * Success - nodes and SUs reach the UNLOCKED state.
7. At this point, the system should be fully operational. But instead we have the following:
   * Node states:
     i. Admin State = unlocked
     ii. Operational State = enabled
   * SU states:
     i. Admin State = unlocked
     ii. Operational State = enabled
     iii. Presence State = instantiated
     iv. READINESS STATE = OUT OF SERVICE
     v. HA STATE = NONE ASSIGNED

We cannot figure out why the readiness state is OUT OF SERVICE. It seems that something has prevented AMF from assigning an HA state to the SUs, but it is not clear what. Has anyone encountered a similar issue, where HA states were not getting assigned? Does anyone have a recommendation on how to troubleshoot this issue?

Thanks
Jim
|