From: Stephens, A. <all...@wi...> - 2010-08-31 20:18:01
FYI, I've now pushed the patch for item 2) in the email below into the TIPC 1.7 code stream. I'll get to the remaining items as soon as I can.

Regards,
Al

________________________________
From: Stephens, Allan
Sent: Friday, August 27, 2010 1:54 PM
To: 'Andrew Booth'
Cc: tip...@li...
Subject: RE: Status of broadcast link-related patches

Hi Andrew:

This is a bit complicated, so I'm going to discuss this subject in numbered parts.

1) So far, there has only been one broadcast link-related patch accepted into TIPC 1.7 that isn't already in TIPC 1.7.7-rc1. You can find it in the Ericsson TIPC repository at:

http://tipc.cslab.ericsson.net/cgi-bin/gitweb.cgi?p=people/allan/tipc.git;a=shortlog;h=tipc1.7

It's the patch labelled "Prevent broadcast link stalling in dual LAN environments". I think this is the one commonly referred to as "Al's patch", although you'll note that Laser also contributed to the solution.

2) I've previously mentioned a second broadcast link fix that looks good to me, namely the one that swaps the lines:

    tipc_bclink_acknowledge(n_ptr, mod(n_ptr->bclink.acked + 10000));
    tipc_bclink_remove_node(n_ptr->elm.addr);

in node_lost_contact(). Unless someone objects, I'm planning on submitting it into TIPC 1.7 once I get a few minutes of spare time. This patch was developed as the result of an anonymous posting to the TIPC bug tracker.

3) I've got a patch from Surya from July 2009 which I think is the one referred to as "Surya's patch". This patch was partly incorporated into TIPC 1.7.7-rc1; you'll find it in the Ericsson repository (same as above) dated March 25, 2010 under the label "Enhance cleanup of broadcast link when contact with node is lost". I haven't yet incorporated the rest of the patch, as I haven't been able to determine whether the other parts are really correct and/or essential. If you want to try them out, the other parts are:

a) Change BCLINK_WIN_DEFAULT from 20 to 50.
b) In node_abort_link_changeover() replace:

    l_ptr->reset_checkpoint = l_ptr->next_in_no;

with:

    l_ptr->reset_checkpoint = 1;

I'm a bit suspicious of item a), since it seems like a change that is more likely to be masking a problem than actually solving it; however, it's probably relatively harmless, and if it makes your current problems go away then you might want to use it. I'm also suspicious of b), because I think the correct solution is to set reset_checkpoint to an invalid sequence number (e.g. 0x10000) and check for this case in link_recv_changeover_msg(), rather than setting the field to a potentially legitimate sequence number (i.e. 1). Use at your own risk for the time being ...

4) I haven't had a chance to look at Ying Xue's emails yet, so I can't comment on them.

5) Likewise, I haven't had a chance to look at Laser's work relating to the initialization of a link's peer session, which is what I think Jon meant when he referred to "Laser's patch". Check out the TIPC mailing list archive for Jon's post of June 18th on this subject. Jon seems to think the work is correct, so it'll probably be incorporated into TIPC 1.7 once I get some time.

Hope this info helps.

Regards,
Al

________________________________
From: Andrew Booth [mailto:ab...@pt...]
Sent: Friday, August 27, 2010 10:37 AM
To: Stephens, Allan
Cc: tip...@li...
Subject:

Hi Allan,

We're now running TIPC 1.7.7-rc1 and I'd like to apply the patches relevant to trouble with name table distribution and broadcast links getting stuck. I'm a little unclear as to what those patches are. They've been referred to informally as Allan's patch, Surya's patch and Laser's patch, but I'm not sure where to find the final versions.

There's also a patch referenced here that seems to handle a different broadcast link problem:

Aug 17, 2010 from Ying Xue Re: [tipc-discussion] Another possible fix for broadcast link woes?
Aug 17, 2010 from Ying Xue [tipc-discussion] [PATCH] tipc: Fix bclink outqueue is released completely when a node is lost

Any help would be appreciated.

Thanks,
Andrew
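Items 2) and 3b) in Al's reply both depend on TIPC's link sequence-number arithmetic: `mod(n_ptr->bclink.acked + 10000)` pushes the broadcast ack point well past the last acknowledged packet, and 0x10000 can serve as an "invalid" reset_checkpoint only because no real sequence number can reach it. The C sketch below illustrates both points under stated assumptions. It is not TIPC code: `seq_mod()`, `seq_less_eq()`, `flush_ack_point()`, `RESET_CHECKPOINT_NONE` and `checkpoint_is_set()` are hypothetical names. It assumes that sequence numbers are 16 bits wide, that TIPC's `mod()` masks a value into that range, and that sequence numbers are compared wraparound-style within half the sequence space. It does not try to model why the order of the two calls in node_lost_contact() matters.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: link sequence numbers are 16 bits wide (0..0xffff), and
 * TIPC's mod() reduces a value into that range. Hypothetical stand-in. */
static uint32_t seq_mod(uint32_t x)
{
	return x & 0xffffu;
}

/* Hypothetical wraparound comparison: left "<=" right when right lies
 * less than half the 16-bit sequence space ahead of left. */
static int seq_less_eq(uint32_t left, uint32_t right)
{
	return seq_mod(right - left) < 32768u;
}

/* Item 2: hypothetical helper computing the ack point that
 * node_lost_contact() passes to tipc_bclink_acknowledge(). */
static uint32_t flush_ack_point(uint32_t acked)
{
	return seq_mod(acked + 10000u);
}

/* Item 3b: hypothetical "no checkpoint" marker, one past the largest
 * 16-bit sequence number, so it never collides with a real one. */
#define RESET_CHECKPOINT_NONE 0x10000u
static_assert(RESET_CHECKPOINT_NONE > 0xffffu,
	      "sentinel must lie outside the 16-bit sequence space");

/* Hypothetical version of the check Al suggests adding to
 * link_recv_changeover_msg(): a checkpoint of 1 looks exactly like a
 * real sequence number, while the sentinel is unambiguous. */
static int checkpoint_is_set(uint32_t reset_checkpoint)
{
	return reset_checkpoint != RESET_CHECKPOINT_NONE;
}
```

With acked = 65530, flush_ack_point() wraps around to 9994, and seq_less_eq() reports packets 65531, 65535, 0 and 42 as covered by that ack point, while 20000 (more than 10000 past the old ack) is not. checkpoint_is_set(1) returns 1, because 1 cannot be told apart from a genuine checkpoint, whereas checkpoint_is_set(RESET_CHECKPOINT_NONE) returns 0.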