From: Stephens, A. <all...@wi...> - 2010-03-23 19:51:08
Hi Jon:

I'm currently examining Osamu's patch, which attempts to ensure that a TIPC link endpoint doesn't transmit an outdated link protocol message by mistake. On the surface, it doesn't seem like there should ever be a case where Osamu's code will purge a deferred protocol message, since it runs after tipc_link_send_proto_msg() has checked that the link's bearer is not congested -- in which case there should be no deferred messages associated with the link (either normal messages or a protocol message). However, I think I've identified at least one scenario that could result in the creation and later use of an "orphan" protocol message:

1) The link endpoint goes into the RR state.
2) The link endpoint attempts to transmit a link protocol message (an ACTIVATE, to be specific), but cannot because the bearer is congested; the ACTIVATE message is queued.
3) The link endpoint is reset for some reason (for example, by the arrival of a neighbor discovery message with a node signature different from the one it was expecting); the deferred ACTIVATE message is not flushed, because tipc_link_reset() does little work when the link endpoint is in the RU or RR states.
4) The link endpoint then comes up normally; the orphaned protocol message remains.
5) Numerous data and protocol messages are sent by the link endpoint.
6) The bearer becomes congested.
7) The bearer becomes uncongested; the orphaned protocol message is transmitted. (Aaack!)

Osamu's patch attempts to get rid of an orphaned protocol message by ensuring that it is purged whenever a new protocol message is about to be sent. However, I'm wondering if a better solution is to simply remove all of the code that deals with deferred protocol messages, thereby eliminating any chance that other as-yet-unknown paths could cause an orphan protocol message to be sent unexpectedly.
To be honest, I'm a bit dubious about the entire concept of deferring the sending of a link protocol message, since the code we currently have in place populates the message with information that could be outdated by the time the message is actually sent. In addition, I'm not sure there is much benefit in deferring them, since the most common scenarios in which protocol messages are sent appear to be as follows:

1) The link is working and the bearer is not congested.
2) The link is reset and the bearer is not congested.
3) The link is working and the bearer is congested (as a result of an impending link failure).
4) The link is working and the bearer is congested (temporarily, and will recover shortly).

In cases 1) and 2), there is no need for a deferred queue, since an attempt to send a protocol message will succeed. In case 3), there is also no need for a deferred queue, since the protocol message will be trashed when the link fails. Only in case 4) does the deferred queue provide any potential value, and since the message being sent (almost certainly a STATE message) isn't essential, the system should be able to continue without it. The only potential issue I can see is if the user sends so much traffic that the bearer is almost permanently congested, in which case an ongoing failure to transmit STATE messages could cause problems.

As far as I can tell, removing support for deferred protocol messages is pretty easy; however, we'd have to test case 4) under stress conditions to ensure there were no surprises. I'd be interested in hearing your views on this proposal.
If you're not comfortable with making this change in the short term, I think the next best thing would be for me to adapt Osamu's patch so that the purge of an unsent protocol message happens right at the start of tipc_link_send_proto_msg(), rather than further down, so that any attempt to send a protocol message invalidates any previously deferred protocol message that is now outdated.

Regards,
Al