From: Stephens, A. <all...@wi...> - 2010-03-23 19:51:08
Hi Jon:

I'm currently examining Osamu's patch, which attempts to ensure that a TIPC link endpoint doesn't transmit an outdated link protocol message by mistake. On the surface, it doesn't seem like there should ever be a case where Osamu's code will purge a deferred protocol message, since it runs after tipc_link_send_proto_msg() has checked that the link's bearer is not congested -- in which case there should be no deferred messages associated with the link (either normal messages or a protocol message). However, I think I've identified at least one scenario that could result in the creation and later use of an "orphan" protocol message:

1) The link endpoint goes into the RR state.
2) The link endpoint attempts to transmit a link protocol message (an ACTIVATE, to be specific), but cannot because the bearer is congested; the ACTIVATE message is queued.
3) The link endpoint is reset for some reason (for example, by the arrival of a neighbor discovery message with a node signature different from the one it was expecting); the deferred ACTIVATE message is not flushed, because tipc_link_reset() does little work when the link endpoint is in the RU or RR states.
4) The link endpoint then comes up normally; the orphaned protocol message remains.
5) Numerous data and protocol messages are sent by the link endpoint.
6) The bearer becomes congested.
7) The bearer becomes uncongested; the orphaned protocol message is transmitted. (Aaack!)

Osamu's patch attempts to get rid of an orphaned protocol message by ensuring that it is purged whenever a new protocol message is about to be sent. However, I'm wondering if a better solution is to simply remove all of the code that deals with deferred protocol messages, thereby eliminating any chance that other as-yet-unknown paths could cause an orphan protocol message to be sent unexpectedly.
To be honest, I'm a bit dubious about the entire concept of deferring the sending of a link protocol message, since the code we currently have in place populates the message with information that could be outdated by the time the message is actually sent. In addition, I'm not sure there is much benefit in deferring them, since the most common scenarios in which protocol messages are sent appear to be as follows:

1) The link is working and the bearer is not congested.
2) The link is reset and the bearer is not congested.
3) The link is working and the bearer is congested (as a result of an impending link failure).
4) The link is working and the bearer is congested (temporarily, and will recover shortly).

In cases 1) and 2), there is no need for a deferred queue, since an attempt to send a protocol message will succeed. In case 3), there is also no need for a deferred queue, since the protocol message will be trashed when the link fails. Only in case 4) does the deferred queue provide any potential value, and since the message being sent (almost certainly a STATE message) isn't essential, the system should be able to continue without it. The only potential issue I can see is if the user sends so much traffic that the bearer is almost permanently congested, in which case an ongoing failure to transmit STATE messages could cause problems.

As far as I can tell, removing support for deferred protocol messages is pretty easy; however, we'd have to test case 4) under stress conditions to ensure there were no surprises. I'd be interested in hearing your views on this proposal.
If you're not comfortable with making this change in the short term, I think the next best thing would be for me to adapt Osamu's patch so that the purge of an unsent protocol message happens right at the start of tipc_link_send_proto_msg(), rather than further down, so that any attempt to send a protocol message invalidates any previously deferred protocol message that is now outdated.

Regards,
Al