From: Mohan K. <mohan@GetHighAvailability.com> - 2022-02-16 19:13:41
Hi Sergio,

I have an update. We have a patch for the scenario we could reproduce. We will test it and then share it with you so that you can test your scenarios. Once you confirm, I will float the patch in the community.

Thanks & Regards
Mohan Kanakam | 91-8333082448
Senior Software Engineer
High Availability Solutions
www.GetHighAvailability.com
Get High Availability Today !
NJ, USA: 1 508-507-6507 | Hyderabad, India: 91 798-992-5293

-----Original Message-----
From: Mohan Kanakam [mailto:mohan@GetHighAvailability.com]
Sent: 14 February 2022 22:24
To: 'Sérgio Marques'; 'ope...@li...'
Subject: RE: [users] MDS_SND_RCV: Invalid Sync CTXT Len

Hi Sergio,

Thanks for the logs. We could also reproduce the issue by simply running the demo application, as shown below.

<139>1 2022-02-14T22:15:09.496432+05:30 sc1-VirtualBox osafckptnd 27692 mds.log [meta sequenceId="2"] MDS_SND_RCV: Invalid Sync CTXT Len

We will debug the issue this week and let you know.

Thanks & Regards
Mohan Kanakam

-----Original Message-----
From: Sérgio Marques [mailto:ser...@al...]
Sent: 14 February 2022 16:03
To: Mohan Kanakam; ope...@li...
Subject: RE: [users] MDS_SND_RCV: Invalid Sync CTXT Len

Hi Mohan,

I have increased the MDS log level (export MDS_LOG_LEVEL=5) to get more detail on the "MDS_SND_RCV: Invalid Sync CTXT Len" error. I'm sending the logs as an attachment. You can find the "Invalid Sync CTXT Len" errors in the logs/sc/cc-2/mds.log file.

Thanks and regards,
Sérgio Marques

-----Original Message-----
From: Sérgio Marques
Sent: 11 February 2022 15:23
To: Mohan Kanakam <mohan@GetHighAvailability.com>; ope...@li...
Subject: RE: [users] MDS_SND_RCV: Invalid Sync CTXT Len

Hi Mohan,

To reproduce the problem I only need to power up the cluster nodes and launch our applications on the SC and PL nodes immediately after OpenSAF comes up. These applications start creating checkpoints such as the one below.

[root@OLT2T4-UNICOM-2~]# immlist safCkpt=CKPT_BACKPLANE_CONTROL,safApp=safCkptService
Name                                Type         Value(s)
========================================================================
safCkpt                             SA_STRING_T  safCkpt=CKPT_BACKPLANE_CONTROL
saCkptCheckpointUsedSize            SA_UINT64_T  2024 (0x7e8)
saCkptCheckpointSize                SA_UINT64_T  2024 (0x7e8)
saCkptCheckpointRetDuration         SA_TIME_T    9223372036854775807 (0x7fffffffffffffff, Sat Jan 27 10:50:44 1990)
saCkptCheckpointNumWriters          SA_UINT32_T  7 (0x7)
saCkptCheckpointNumSections         SA_UINT32_T  22 (0x16)
saCkptCheckpointNumReplicas         SA_UINT32_T  2 (0x2)
saCkptCheckpointNumReaders          SA_UINT32_T  7 (0x7)
saCkptCheckpointNumOpeners          SA_UINT32_T  7 (0x7)
saCkptCheckpointNumCorruptSections  SA_UINT32_T  0 (0x0)
saCkptCheckpointMaxSections         SA_UINT32_T  22 (0x16)
saCkptCheckpointMaxSectionSize      SA_UINT64_T  92 (0x5c)
saCkptCheckpointMaxSectionIdSize    SA_UINT64_T  1 (0x1)
saCkptCheckpointCreationTimestamp   SA_TIME_T    1644591280000000000 (0x16d2c30e451ae000, Fri Feb 11 14:54:40 2022)
saCkptCheckpointCreationFlags       SA_UINT32_T  2 (0x2)
SaImmAttrImplementerName            SA_STRING_T  safCheckPointService
SaImmAttrClassName                  SA_STRING_T  SaCkptCheckpoint
SaImmAttrAdminOwnerName             SA_STRING_T  <Empty>

After creating these checkpoints, the nodes start creating, reading and writing their sections.

I'm sending the requested logs as an attachment.

Please feel free to ask for more info/logs/tests.

Thanks a lot for your help,
Regards,
Sérgio Marques

-----Original Message-----
From: Mohan Kanakam <mohan@GetHighAvailability.com>
Sent: 10 February 2022 17:57
To: 'Mohan Kanakam' <mohan@GetHighAvailability.com>; Sérgio Marques <ser...@al...>; ope...@li...
Subject: RE: [users] MDS_SND_RCV: Invalid Sync CTXT Len

Hi Sergio,

Can you please share the syslog and mds.log from all the nodes?

Thanks & Regards
Mohan Kanakam

-----Original Message-----
From: Mohan Kanakam [mailto:mohan@GetHighAvailability.com]
Sent: 10 February 2022 23:23
To: 'Sérgio Marques'; 'ope...@li...'
Subject: RE: [users] MDS_SND_RCV: Invalid Sync CTXT Len

Hi Sergio,

Thanks for the information. Can you please share with us the steps (kind of checkpoint, frequency of checkpoint, etc.) to reproduce the issue in our lab?

Can you also please share the ckptd and ckptnd traces (you can enable/disable them at runtime using "kill -USR2 <ckptnd_pid/ckptd_pid>") and the application API calls?

Thanks & Regards
Mohan Kanakam

-----Original Message-----
From: Sérgio Marques [mailto:ser...@al...]
Sent: 10 February 2022 20:17
To: Mohan Kanakam; ope...@li...
Subject: RE: [users] MDS_SND_RCV: Invalid Sync CTXT Len

Hi Mohan,

Thanks for your quick answer.

I have a cluster with 2 controller and 2 payload boards. All of the nodes are using the latest OpenSAF version, 5.22.01. Some checkpoints are created by the payload and controller applications. I can see these errors every 3 to 10 seconds on the slave controller card.

Is there a way of debugging this issue to find out exactly where the messages are being lost?

Please feel free to ask for more info/logs/tests.
Thanks and regards,
Sérgio Marques

-----Original Message-----
From: Mohan Kanakam <mohan@GetHighAvailability.com>
Sent: 9 February 2022 17:48
To: Sérgio Marques <ser...@al...>; ope...@li...
Subject: RE: [users] MDS_SND_RCV: Invalid Sync CTXT Len

Hi Sergio,

Can you please let us know which OpenSAF version is being used? Since I don't know the test scenario or the use case, I am giving a generic answer.

If ckptnd sends a sync message (checkpoint) to the Active ckptd and waits for its reply, and the Active ckptd is stopped, then ckptnd will never get a reply to those messages, and the context of the sync messages becomes invalid and unanswered. It looks like there are a few such messages pending a reply at ckptd. If the Active controller reboots, the Active ckptd is stopped, and all the sync messages awaiting a reply at ckptnd may get this error because the context of the sync messages is lost.

To me, the messages look genuine, but it may be a concern that a few of ckptnd's checkpoint messages are not being responded to and are lost. So the application (if running on a payload) needs to send them again.

Thanks & Regards
Mohan Kanakam

-----Original Message-----
From: Sérgio Marques [mailto:ser...@al...]
Sent: 09 February 2022 14:50
To: ope...@li...
Subject: [users] MDS_SND_RCV: Invalid Sync CTXT Len

Hi,

In my system, some "MDS_SND_RCV: Invalid Sync CTXT Len" events are being registered in mds.log. Are these errors normal? I'm asking because I'm experiencing very weird problems when rebooting the active controller node.

Thanks in advance,
Sérgio Marques

<141>1 2022-01-23T02:24:44.936356Z OLT2T4-UNICOM-2 osaflcknd 2151 mds.log [meta sequenceId="347"] MDTM: svc down event for svc_id = GLA(3), subscri. by svc_id = GLND(4) pwe_id=1 Adest = <nodeid[0x20c0f]:osaflcknd[2151]>
<141>1 2022-01-23T02:24:44.936699Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4505"] MDTM: svc down event for svc_id = CPA(18), subscri. by svc_id = CPND(17) pwe_id=1 Adest = <nodeid[0x20c0f]:osafckptnd[2169]>
<141>1 2022-01-23T02:24:44.936703Z OLT2T4-UNICOM-2 osafckptd 2210 mds.log [meta sequenceId="750"] MDTM: svc down event for svc_id = CPA(18), subscri. by svc_id = CPD(16) pwe_id=1 Adest = <nodeid[0x20c0f]:osafckptd[2210]>
<141>1 2022-01-23T02:24:44.937203Z OLT2T4-UNICOM-2 osafimmnd 1588 mds.log [meta sequenceId="743"] MDTM: svc down event for svc_id = IMMA_OM(26), subscri. by svc_id = IMMND(25) pwe_id=1 Adest = <nodeid[0x20c0f]:osafimmnd[1588]>
<141>1 2022-01-23T02:24:44.937479Z OLT2T4-UNICOM-2 osafimmnd 1588 mds.log [meta sequenceId="744"] MDTM: svc down event for svc_id = IMMA_OI(27), subscri. by svc_id = IMMND(25) pwe_id=1 Adest = <nodeid[0x20c0f]:osafimmnd[1588]>
<139>1 2022-01-23T02:24:46.71289Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4506"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.726201Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4507"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.73951Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4508"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.75349Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4509"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.768767Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4510"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.777688Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4511"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.788105Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4512"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.801785Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4513"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.815353Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4514"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.824419Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4515"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.833325Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4516"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.842368Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4517"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.851617Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4518"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.862324Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4519"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.876586Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4520"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.887585Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4521"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:48.764874Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4522"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:48.776664Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4523"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:48.788544Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4524"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:48.799566Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4525"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:48.806417Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4526"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:48.81328Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4527"] MDS_SND_RCV: Invalid Sync CTXT Len

_______________________________________________
Opensaf-users mailing list
Ope...@li...
https://lists.sourceforge.net/lists/listinfo/opensaf-users
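
For anyone triaging the same symptom, the mds.log excerpts quoted in this thread can be summarised with a small script. The following is only a sketch: it is not part of OpenSAF, the function name count_invalid_sync is hypothetical, and the regular expression assumes that every line follows the layout seen in the excerpts above (priority, timestamp, host, process, pid, log name, sequenceId, message). It counts "Invalid Sync CTXT Len" entries per host, process and second, which makes bursts like the ones above easy to spot.

```python
import re
from collections import Counter

# Line layout assumed from the mds.log excerpts quoted in this thread.
LINE_RE = re.compile(
    r'^<(?P<pri>\d+)>1 (?P<ts>\S+) (?P<host>\S+) (?P<proc>\S+) (?P<pid>\d+) '
    r'\S+ \[meta sequenceId="(?P<seq>\d+)"\] (?P<msg>.*)$'
)
ERROR_TEXT = "MDS_SND_RCV: Invalid Sync CTXT Len"


def count_invalid_sync(lines):
    """Count error entries keyed by (host, process, timestamp truncated to the second)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None or ERROR_TEXT not in match.group("msg"):
            continue  # skip unparsable lines and unrelated messages
        second = match.group("ts")[:19]  # "YYYY-MM-DDTHH:MM:SS"
        counts[(match.group("host"), match.group("proc"), second)] += 1
    return counts


# A few lines copied from the thread, used as sample input.
SAMPLE = [
    '<141>1 2022-01-23T02:24:44.936699Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log '
    '[meta sequenceId="4505"] MDTM: svc down event for svc_id = CPA(18), subscri. '
    'by svc_id = CPND(17) pwe_id=1 Adest = <nodeid[0x20c0f]:osafckptnd[2169]>',
    '<139>1 2022-01-23T02:24:46.71289Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log '
    '[meta sequenceId="4506"] MDS_SND_RCV: Invalid Sync CTXT Len',
    '<139>1 2022-01-23T02:24:46.726201Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log '
    '[meta sequenceId="4507"] MDS_SND_RCV: Invalid Sync CTXT Len',
    '<139>1 2022-01-23T02:24:48.764874Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log '
    '[meta sequenceId="4522"] MDS_SND_RCV: Invalid Sync CTXT Len',
]

if __name__ == "__main__":
    for (host, proc, second), n in sorted(count_invalid_sync(SAMPLE).items()):
        print(f"{second} {host} {proc}: {n}")
```

To run it against a real file, pass an open file object instead of SAMPLE, for example count_invalid_sync(open("mds.log", encoding="utf-8")).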