From: Nivrutti K. <nkale@Brocade.com> - 2015-07-02 10:33:02
Following are a few logs from /var/log/messages. Some malformed packets are received and the connection is closed by the controller.

//Controller Logs
Jul 2 13:14:19 VEM-1 osafimmd[2641]: Node 11d0f request sync sync-pid:2841 epoch:0
Jul 2 13:14:20 VEM-1 osafimmnd[2656]: Announce sync, epoch:3
Jul 2 13:14:20 VEM-1 osafimmnd[2656]: SERVER STATE: IMM_SERVER_READY --> IMM_SERVER_SYNC_SERVER
Jul 2 13:14:20 VEM-1 osafimmd[2641]: Successfully announced sync. New ruling epoch:3
Jul 2 13:14:20 VEM-1 osafimmnd[2656]: NODE STATE-> IMM_NODE_R_AVAILABLE
Jul 2 13:14:20 VEM-1 immload: Sync starting
Jul 2 13:14:20 VEM-1 immload: Synced 1541 objects in total
Jul 2 13:14:20 VEM-1 osafimmnd[2656]: NODE STATE-> IMM_NODE_FULLY_AVAILABLE 12197
Jul 2 13:14:20 VEM-1 osafimmnd[2656]: Epoch set to 3 in ImmModel
Jul 2 13:14:20 VEM-1 osafimmd[2641]: ACT: New Epoch for IMMND process at node 1010f old epoch: 2 new epoch:3
Jul 2 13:14:20 VEM-1 immload: Sync ending normally
Jul 2 13:14:21 VEM-1 osafimmnd[2656]: SERVER STATE: IMM_SERVER_SYNC_SERVER --> IMM SERVER READY
Jul 2 13:14:21 VEM-1 osafdtmd[2593]: DTM:dtm_comm_socket_recv() failed rc : 79
Jul 2 13:14:21 VEM-1 osafclmd[2707]: Node 72975 doesn't exist
Jul 2 13:14:21 VEM-1 osafimmnd[2656]: Global discard node received for nodeId:11d0f pid:2841
Jul 2 13:22:18 VEM-1 osafimmnd[2656]: Global discard node received for nodeId:11d0f pid:0
Jul 2 13:22:34 VEM-1 osafimmd[2641]: Node 11d0f request sync sync-pid:3382 epoch:0
Jul 2 13:22:35 VEM-1 osafimmnd[2656]: Announce sync, epoch:4
Jul 2 13:22:35 VEM-1 osafimmnd[2656]: SERVER STATE: IMM_SERVER_READY --> IMM_SERVER_SYNC_SERVER
Jul 2 13:22:35 VEM-1 osafimmd[2641]: Successfully announced sync. New ruling epoch:4

//Payload logs
Jul 2 13:14:16 SDB-1 opensafd: Starting OpenSAF Services
Jul 2 13:14:18 SDB-1 osafdtmd[2823]: Started
Jul 2 13:14:19 SDB-1 osafimmnd[2841]: Started
Jul 2 13:14:19 SDB-1 osafimmnd[2841]: Director Service is up
Jul 2 13:14:19 SDB-1 osafimmnd[2841]: SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Jul 2 13:14:19 SDB-1 osafimmnd[2841]: SERVER STATE: IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
Jul 2 13:14:19 SDB-1 osafimmnd[2841]: SERVER STATE: IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
Jul 2 13:14:19 SDB-1 osafimmnd[2841]: NODE STATE-> IMM_NODE_ISOLATED
Jul 2 13:14:20 SDB-1 osafimmnd[2841]: NODE STATE-> IMM_NODE_W_AVAILABLE
Jul 2 13:14:20 SDB-1 osafimmnd[2841]: SERVER STATE: IMM_SERVER_SYNC_PENDING --> IMM_SERVER_SYNC_CLIENT
Jul 2 13:14:21 SDB-1 osafdtmd[2823]: DTM: Malformed packet recd, Ident : 1097688659, ver : 101
Jul 2 13:14:21 SDB-1 osafdtmd[2823]: DTM: Malformed packet recd, Ident : 1634956110, ver : 97
Jul 2 13:14:21 SDB-1 osafdtmd[2823]: DTM: Malformed packet recd, Ident : 1883464806, ver : 0
Jul 2 13:14:21 SDB-1 osafimmnd[2841]: Director Service in NOACTIVE state
Jul 2 13:14:21 SDB-1 osafimmnd[2841]: Director Service is down
Jul 2 13:22:18 SDB-1 opensafd[2810]: Timed-out for response from IMMND
Jul 2 13:22:18 SDB-1 opensafd[2810]:
Jul 2 13:22:18 SDB-1 opensafd[2810]: Going for recovery
Jul 2 13:22:18 SDB-1 opensafd[2810]: Trying To RESPAWN /usr/lib64/opensaf/clc-cli/osaf-immnd attempt #1
Jul 2 13:22:18 SDB-1 opensafd[2810]: Sending SIGKILL to IMMND, pid=2829
Jul 2 13:22:34 SDB-1 osafimmnd[3382]: Started
Jul 2 13:22:34 SDB-1 osafimmnd[3382]: Director Service is up
Jul 2 13:22:34 SDB-1 osafimmnd[3382]: SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Jul 2 13:22:34 SDB-1 osafimmnd[3382]: SERVER STATE: IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
Jul 2 13:22:34 SDB-1 osafimmnd[3382]: SERVER STATE: IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
Jul 2 13:22:34 SDB-1 osafimmnd[3382]: NODE STATE-> IMM_NODE_ISOLATED
Jul 2 13:22:35 SDB-1 osafimmnd[3382]: NODE STATE-> IMM_NODE_W_AVAILABLE
Jul 2 13:22:35 SDB-1 osafimmnd[3382]: SERVER STATE: IMM_SERVER_SYNC_PENDING --> IMM_SERVER_SYNC_CLIENT
Jul 2 13:22:36 SDB-1 osafdtmd[2823]: DTM: Malformed packet recd, Ident : 1231908161, ver : 116
Jul 2 13:22:36 SDB-1 osafimmnd[3382]: Director Service in NOACTIVE state
Jul 2 13:22:36 SDB-1 osafimmnd[3382]: Director Service is down
Jul 2 13:25:36 SDB-1 osafimmnd[3382]: Director Service is down
Jul 2 13:30:34 SDB-1 opensafd[2810]: Timed-out for response from IMMND
Jul 2 13:30:34 SDB-1 opensafd[2810]: Could Not RESPAWN IMMND

Getting this resolved is very critical to our product. Does anyone have any idea what is happening here?

Thanks,
Nivrutti

-----Original Message-----
From: Anders Björnerstedt [mailto:and...@er...]
Sent: Thursday, July 02, 2015 2:14 PM
To: Nivrutti Kale; ope...@li...
Subject: RE: [users] Timeout for response from IMMD

Ok, 1600 objects is nothing. The sync should get done within seconds of being started. So you apparently have some configuration problem or communication problem. Hard to say what it is.

/AndersBj

-----Original Message-----
From: Nivrutti Kale [mailto:nkale@Brocade.com]
Sent: den 2 juli 2015 09:26
To: Anders Björnerstedt; ope...@li...
Subject: RE: [users] Timeout for response from IMMD

I have around 1600 objects, which synced up correctly. When I enabled the dtmd trace between the nodes, I saw that some malformed packets are received on the payload node and the connection between controller and payload is closed. After this, there is no dtmd trace for those 8 minutes. After 8 minutes, the "Time-out for response from IMMD" log appears in /var/log/messages.

Any idea what is happening here? Is it possible to have a communication break between the 2 VMs? Every time, the wait is almost exactly 8 minutes.

Thanks,
Nivrutti

-----Original Message-----
From: Anders Björnerstedt [mailto:and...@er...]
Sent: Tuesday, June 30, 2015 1:45 PM
To: Nivrutti Kale; ope...@li...
Subject: RE: [users] Timeout for response from IMMD

Eight minutes is extremely long for a sync. Sync time of course depends on the volume to be synced. Roughly how much data are you using? That is, the number of IMM objects and the average size per IMM object.

The IMM programmer's reference doc points out that the IMM is not suitable for storing large volumes of data. It is tested regularly to cope with 300 000 objects of 300 bytes average size. If you go beyond that then you are stretching the use case. The IMM is intended only for storing config data and runtime data for configuring and reflecting services running in that OpenSAF cluster.

/AndersBj

-----Original Message-----
From: Nivrutti Kale [mailto:ni...@co...]
Sent: den 30 juni 2015 10:05
To: ope...@li...
Subject: Re: [users] Timeout for response from IMMD

Hi All,

Sometimes I get the "Time-out for response from IMMD" issue while starting one of the payloads. I am using OpenSAF 4.5, though I have seen this issue with 4.2.2 as well.

Also, I want to understand how IMM sync works between payload and controller. In this case the payload waits for almost 8 minutes before any error recovery.

Please find attached logs for controller and payload.

Can we control these timeout values, or can we turn off the IMM sync, so that the payload does not wait before any error recovery?

Thanks,
Nivrutti

_______________________________________________
Opensaf-users mailing list
Ope...@li...
https://lists.sourceforge.net/lists/listinfo/opensaf-users
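
A side note on the "Malformed packet recd" lines in the payload log: the four Ident values are worth decoding as text. The sketch below does that under an assumption the thread does not confirm, namely that the logged Ident is a 32-bit field printed as an unsigned integer in network (big-endian) byte order and that ver is the single byte following it. The name decode_header is made up for illustration and is not an OpenSAF function.

```python
# (Ident, ver) pairs copied from the payload's "Malformed packet recd" lines.
MALFORMED = [
    (1097688659, 101),
    (1634956110, 97),
    (1883464806, 0),
    (1231908161, 116),
]


def decode_header(ident: int, ver: int) -> str:
    """Render a logged Ident/ver pair as text.

    Assumption (not confirmed in the thread): Ident is a 32-bit big-endian
    field and ver is the byte that follows it. Non-printable bytes are
    rendered as '.'.
    """
    raw = ident.to_bytes(4, "big") + bytes([ver])
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


if __name__ == "__main__":
    for ident, ver in MALFORMED:
        print(f"Ident {ident:>10} ver {ver:>3} -> {decode_header(ident, ver)!r}")
```

Under that assumption the four pairs decode to "AmfSe", "assNa", "pCdf." and "ImmAt", which look like fragments of attribute or class names rather than a fixed protocol identifier. That would be consistent with the receiver reading message payload where it expected a message header, that is, the byte stream losing its framing during the sync. This is only an inference from the logged numbers; nothing in the thread confirms the cause.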