From: Vlad Y. <vla...@hp...> - 2010-08-17 12:47:07
On 08/17/2010 04:59 AM, Kumar, Vivek (NSN - IN/Bangalore) wrote:
> Hello Vlad,
>
> After going through the traffic capture, taking into account the inputs you provided, I have observed a few things.
>
> Before the cable of the primary address of EP1 was unplugged:
>
> Primary of EP1 sent HEARTBEAT to primary of EP2.
> Primary of EP2 replied with HEARTBEAT_ACK to primary of EP1.
> Primary of EP2 sent HEARTBEAT to primary of EP1.
> Primary of EP1 replied with HEARTBEAT_ACK to primary of EP2.
>
> The cable remains unplugged for about 10 minutes.
>
> After the cable of the primary of EP1 was plugged back in:
> Primary of EP1 sent HEARTBEAT to primary of EP2.
> Primary of EP2 replied with HEARTBEAT_ACK to primary of EP1.
> Primary of EP2 sent HEARTBEAT to primary of EP1.
> No reply from primary of EP1.
> This behavior continues after that, i.e., no HEARTBEAT_ACKs are sent from the primary of EP1 to the primary of EP2 once the cable is plugged back in.
>
> We have handled SCTP_ADDR_AVAILABLE as part of processing the events at the local end.
> But we do not receive the SCTP_ADDR_AVAILABLE indication from the SCTP stack even after the network cable is plugged in and the heartbeat generated at the local end (i.e., primary of EP1) is acknowledged by the peer (i.e., primary of EP2).
>
> Do you think it could be a stack issue?

It certainly sounds like it. The app running on EP1 should have received the
SCTP_ADDR_AVAILABLE notification when the HEARTBEAT_ACK was received.

I am a bit concerned that EP1 doesn't seem to respond to the HB. Can you rebuild
the sctp kernel module with debugging enabled and run this test? Your kern.log
will contain debugging output.

I'll also try to simulate it here and see what happens.

-vlad

> Also, why does the primary of EP1 not reply to the heartbeats from the primary of EP2?
> What could cause the SCTP stack at the local end (i.e., EP1) to not raise SCTP_ADDR_AVAILABLE even though the heartbeat generated at the local end (i.e., primary of EP1) is acknowledged by the peer (i.e., primary of EP2)?
>
> Regards,
> Vivek
>
> -----Original Message-----
> From: ext Vlad Yasevich [mailto:vla...@hp...]
> Sent: Monday, August 16, 2010 7:17 PM
> To: Kumar, Vivek (NSN - IN/Bangalore)
> Cc: lks...@li...
> Subject: Re: SCTP responding back on secondary link even though primary is up
>
> Hello Vivek
>
> On 08/16/2010 08:57 AM, Kumar, Vivek (NSN - IN/Bangalore) wrote:
>> Hello Vlad,
>>
>> Thank you for taking the time to help me.
>>
>> The kernel version we are currently using is 2.6.21.7.
>>
>> Although we handle the SCTP_PEER_ADDR_CHANGE event, we don’t handle the SCTP_ADDR_AVAILABLE notification.
>> As part of handling the SCTP_PEER_ADDR_CHANGE event we handle the SCTP_ADDR_UNREACHABLE notification, though this
>> is just written to the log as a debug message, i.e., no explicit action is performed as part of handling this notification.
>> Should we be processing SCTP_ADDR_AVAILABLE?
>
> Well, this was more from a debugging perspective. You can log it to make sure you actually
> got the notification that the "primary" is back. Until you get this notification, the
> primary has not been probed. So just the fact that you connected the wire doesn't mean
> that SCTP knows that the destination is back. Also, the path comes back when the local
> end generates a Heartbeat that's answered. If all we are doing is answering someone else's
> Heartbeats, the path state does not change.
>
>
>> I believe that it serves an informational purpose more than anything else.
>>
>> For the third point, i.e., the possibility that the upper layer is specifying a different destination, the application logs don’t show
>> anything in its favor, which makes me conclude that there is no actual change in the primary destination.
>>
>
> I think that once you log that the address is reachable, you should be assured that it will
> be used (if it was primary). Prior to that, the address is not reachable and will not
> be used.
>
> -vlad
>
>> Regards,
>> Vivek
>>
>> -----Original Message-----
>> From: ext Vlad Yasevich [mailto:vla...@hp...]
>> Sent: Friday, August 13, 2010 7:10 PM
>> To: Kumar, Vivek (NSN - IN/Bangalore)
>> Cc: lks...@li...
>> Subject: Re: SCTP responding back on secondary link even though primary is up
>>
>> On 08/13/2010 08:18 AM, Kumar, Vivek (NSN - IN/Bangalore) wrote:
>>> Hello Sir,
>>> Let me introduce myself as Vivek Kumar, currently working with Nokia
>>> Siemens Networks.
>>> We are facing a strange situation while testing our stack.
>>>
>>> The issue is as described below:
>>> We have two multihomed endpoints (consisting of 2 IPs each, henceforth
>>> referred to as primary and secondary).
>>>
>>> On top of the SCTP stack we have Diameter running.
>>>
>>> We also have routing policies which enforce that messages for the
>>> primary of EP2 are always sent via the primary of EP1. The same holds
>>> true for the secondary.
>>>
>>>
>>> EP1(Primary/secondary)-----------------------------------------EP2(primary/secondary)
>>>
>>>
>>>
>>> The association is initially established with the primaries exchanging
>>> the 4-way handshake.
>>> No data other than the Diameter capability exchange message is passed
>>> between the two endpoints.
>>> After that, we observe heartbeats being exchanged between the respective
>>> primaries and secondaries.
>>>
>>> We then unplug the cable of the primary interface on EP1.
>>> After heartbeats are exchanged, the primary is marked as inactive and we
>>> get the SCTP_ADDR_UNREACHABLE notification, which is displayed by the
>>> Diameter stack (though no further processing takes place).
>>>
>>> At this time, any Diameter messages are exchanged between the
>>> secondaries of EP1 and EP2.
>>>
>>> After about 10 minutes the cable of the primary interface on EP1 is
>>> plugged back in.
>>>
>>> We see heartbeats being exchanged between the primaries of EP1 and EP2,
>>> which confirms that the primary of EP1 has come back to the ACTIVE state (and
>>> should theoretically reclaim its primary role).
>>>
>>> But when we try to establish a Diameter session, we see that the messages are
>>> still being exchanged between the secondaries of the endpoints (whereas
>>> the SCTP RFC clearly states that once the primary IPs are active, all
>>> data exchanges should happen on the primaries).
>>>
>>> Could you please help me understand why we could be facing such an issue?
>>>
>>
>> May I ask which kernel version you are using?
>>
>> Do you get a SCTP_PEER_ADDR_CHANGE notification to indicate that the primary
>> is up?
>>
>> From what I am seeing, if the primary comes back to life, it will be set as
>> current active and all traffic will automatically flow over that unless a
>> different destination is specified. Is it possible that the upper layer
>> is specifying a different destination?
>>
>> -vlad
>>
>>> Thanks in advance.
>>>
>>> Regards,
>>> Vivek
>>>
>>
>
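
For reference, the path behaviour described in this thread can be sketched as a toy state model: a destination returns to ACTIVE only when a heartbeat sent by the local end is acknowledged, and answering the peer's heartbeats changes nothing. This is a minimal Python sketch, not the lksctp implementation. The `Path` class, its method names, and the `max_retrans` threshold are hypothetical; only the notification names come from the messages above.

```python
from dataclasses import dataclass, field

ACTIVE = "ACTIVE"
INACTIVE = "INACTIVE"


@dataclass
class Path:
    """Toy model of one peer destination as tracked by the local endpoint.

    Hypothetical illustration only; names and threshold are assumptions.
    """
    max_retrans: int = 5          # assumed failure threshold for this sketch
    state: str = ACTIVE
    error_count: int = 0
    notifications: list = field(default_factory=list)

    def local_heartbeat_timed_out(self):
        # A HEARTBEAT we sent got no HEARTBEAT_ACK: count the error and
        # mark the path INACTIVE once the threshold is exceeded.
        self.error_count += 1
        if self.state == ACTIVE and self.error_count > self.max_retrans:
            self.state = INACTIVE
            self.notifications.append("SCTP_ADDR_UNREACHABLE")

    def local_heartbeat_acked(self):
        # A HEARTBEAT we sent was acknowledged: the path is confirmed.
        self.error_count = 0
        if self.state == INACTIVE:
            self.state = ACTIVE
            self.notifications.append("SCTP_ADDR_AVAILABLE")

    def peer_heartbeat_received(self):
        # We answer the peer's probe, but that says nothing about our own
        # probe of the path, so the local path state is left untouched.
        return "HEARTBEAT_ACK"


if __name__ == "__main__":
    path = Path(max_retrans=2)
    for _ in range(3):                 # cable unplugged: our probes time out
        path.local_heartbeat_timed_out()
    path.peer_heartbeat_received()     # cable back, peer probes us
    print(path.state, path.notifications)
    path.local_heartbeat_acked()       # our own probe is finally answered
    print(path.state, path.notifications)
```

In this model, the situation reported above (the local heartbeat is acknowledged but no SCTP_ADDR_AVAILABLE arrives) corresponds to `local_heartbeat_acked()` never firing the transition, which is why it looks like a stack issue rather than an application one.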