From: Suryanarayana G. <SGa...@go...> - 2010-08-25 04:15:40
Hi Felix,

As far as I have experimented, and in my development history, this issue was observed on redundant links only. With a single link this issue was never observed.

Regards
Surya

> -----Original Message-----
> From: Nayman Felix-QA5535 [mailto:Fel...@mo...]
> Sent: Tuesday, August 24, 2010 10:09 PM
> To: Stephens, Allan; tip...@li...
> Subject: Re: [tipc-discussion] TIPC Name table distribution problem in TIPC 1.7.6
>
> Al,
>
> So we ran the test you suggested to just use "Al's patch without Surya's" and we were able to reproduce our name table distribution problem.
> From reading your post and the comment about link changeover, I had a question. Are you saying that this problem is related to having redundant links?
> We recently (2 months ago) changed to use redundant links in an active/standby configuration. This coincided with our switch to TIPC 1.7.6 and with all of the problems we've seen recently. So I'm wondering: if we went back to using only 1 link between nodes instead of 2, do you think our name table distribution problem would go away?
>
> Thanks,
> Felix
>
>
> -----Original Message-----
> From: Stephens, Allan [mailto:all...@wi...]
> Sent: Monday, August 16, 2010 1:07 PM
> To: Nayman Felix-QA5535; Suryanarayana Garlapati; tip...@li...
> Subject: RE: [tipc-discussion] TIPC Name table distribution problem in TIPC 1.7.6
>
> Hi Felix:
>
> I'm finally back from vacation and can respond to your recent emails (at least to some degree).
>
> I'd be interested in knowing whether you experience any problems if you try running TIPC 1.7.7-rc1 with Al's patch (presumably the one entitled "Prevent broadcast link stalling in dual LAN environment") but not Surya's patch. From the data you supplied below, it is conceivable that this might be sufficient to resolve your issue. I did incorporate a portion of Surya's patch into the TIPC 1.7 stream, but I haven't yet brought in the part that changes the default link window size or that resets the link's reset_checkpoint when a link changeover is aborted; the delay is because I'm not yet convinced that either of these changes is necessary and/or correct. If you encounter problems when Surya's changes are missing, this would provide evidence that they are actually needed.
>
> I don't have any guidance to provide on what link window size values should be used for either the unicast links or the broadcast link; I suspect that the value will depend heavily on the type of hardware you're running in your system and the nature of the traffic load you're passing between nodes, and that you'll need to experiment to see what values work best for your system. It looks like Peter was running his traffic over high-speed Ethernet interfaces and found that he needed to increase his unicast window size to prevent the links from declaring congestion prematurely; presumably the larger window sizes helped improve his link throughput. I've got no idea where the broadcast link window size of 224 came from; as with the unicast links, you're probably best off experimenting to see what values work best in your system.
>
> I'm continuing to investigate the entire broadcast link and dual link areas to see what issues remain unresolved, as there are still some known issues that look to be problematic. I suspect that there will be a few more patches added to TIPC 1.7 before things are totally stabilized.
>
> Regards,
> Al
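
(As an aside on the window-size experiments discussed above: the window can also be changed at runtime rather than rebuilding TIPC with a different BCLINK_WIN_DEFAULT. A rough sketch using the tipc-config tool from tipcutils follows; the -ls and -l forms appear later in this thread, but the -lw argument format shown here is an assumption for the 1.7-era tool and may differ between versions, so check tipc-config --help on your system first.

    tipc-config -ls=broadcast-link      # shows the current "Window:" value plus congestion and send-queue counters
    tipc-config -lw=broadcast-link/50   # assumed syntax: set the broadcast link window to 50 packets at runtime
    tipc-config -l                      # lists the node's links; unicast link windows can be set the same way

A runtime change like this is per node and is lost on restart, whereas Surya's patch further down changes the compiled-in default.)
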
>
> > -----Original Message-----
> > From: Nayman Felix-QA5535 [mailto:Fel...@mo...]
> > Sent: Sunday, August 15, 2010 10:41 PM
> > To: Suryanarayana Garlapati; tip...@li...
> > Subject: Re: [tipc-discussion] TIPC Name table distribution problem in TIPC 1.7.6
> >
> > All,
> >
> > So we've run a number of tests:
> > 1) TIPC 1.7.6. Result: We were able to continuously reproduce the issue.
> > 2) Surya's patch on top of 1.7.6. Result: We could not reproduce the issue.
> > 3) Surya's patch on top of 1.7.6 with the window size set to 20 instead of 50. Result: We were able to reproduce the issue. This means that, at least for our test run, the window size change was the main reason why we didn't see the name table distribution problem.
> > 4) Surya's and Al's patches on top of 1.7.6. Result: We could not reproduce the issue.
> > 5) Surya's and Al's patches on top of 1.7.6 with the window size set to 20 instead of 50. Result: We could not reproduce the issue.
> > 6) TIPC 1.7.7-rc1 with Surya's and Al's patches. Result: We could not reproduce the issue.
> >
> > So it looks like the combination of Al's patch and Surya's patch prevents our name table distribution problem whether the window size is 20 or 50, whereas Surya's patch alone does not work when we reduce the window size to 20.
> >
> > I saw that in TIPC 1.7.6 support for window sizes as high as 8192 was added. We are considering increasing our window size for both the broadcast link and the unicast links from the current default size of 50. I'm wondering if there is a recommended maximum value. From Peter Litov's post it appears that he has tried 1024 for the unicast link window size. Are there any issues with this value? Has it been found to improve throughput and alleviate congestion issues? Are there any drawbacks or side effects? He mentions a broadcast link window size of 224. I'm not sure why 224 is a magic number, but are there any recommendations for the broadcast link window size?
> >
> > Thanks for any feedback,
> > Felix
> >
> > -----Original Message-----
> > From: Suryanarayana Garlapati [mailto:SGa...@go...]
> > Sent: Tuesday, August 03, 2010 12:45 AM
> > To: Nayman Felix-QA5535; tip...@li...
> > Subject: RE: TIPC Name table distribution problem in TIPC 1.7.6
> >
> > Hi Felix,
> > This is an issue of broadcast link congestion only. Please try my patch and let me know whether it works. I had faced a similar issue, and with my patch the issue was fixed.
> >
> > Regards
> > Surya
> >
> > > -----Original Message-----
> > > From: Nayman Felix-QA5535 [mailto:Fel...@mo...]
> > > Sent: Tuesday, August 03, 2010 1:06 AM
> > > To: Suryanarayana Garlapati; tip...@li...
> > > Subject: RE: TIPC Name table distribution problem in TIPC 1.7.6
> > >
> > > All,
> > >
> > > So I tried to disable both bearers and the broadcast-link is still up?? How is that possible?
> > > From the stats, it appears that we can't send out any messages on the broadcast link. I tried, and nothing got sent, so it appears that the link is permanently congested with respect to sending messages. The broadcast link does appear to be receiving messages, though. When I enabled both bearers the broadcast link did not recover.
> > >
> > > We have redundant links between nodes in an active/standby configuration, with one link having a priority of 10 and the other a priority of 9.
> > > How does the broadcast link choose a bearer? If we only had one bearer, would we not see this problem?
> > >
> > > bash-3.1# tipc-config -b
> > > Bearers:
> > >     eth:bond0
> > >     eth:bond1
> > > bash-3.1# tipc-config -bd eth:bond0
> > > bash-3.1# tipc-config -bd eth:bond1
> > > bash-3.1# tipc-config -ls=broadcast-link
> > > Link statistics:
> > > Link <broadcast-link>
> > >   Window:20 packets
> > >   RX packets:14407 fragments:0/0 bundles:39/62
> > >   TX packets:20 fragments:0/0 bundles:111/2314
> > >   RX naks:0 defs:0 dups:0
> > >   TX naks:0 acks:903 dups:0
> > >   Congestion bearer:0 link:0  Send queue max:131 avg:75
> > >
> > > bash-3.1# tipc-config -l
> > > Links:
> > >     broadcast-link: up
> > >
> > > Thanks,
> > > Felix
> > >
> > > -----Original Message-----
> > > From: Nayman Felix-QA5535
> > > Sent: Monday, August 02, 2010 10:14 AM
> > > To: 'Suryanarayana Garlapati'; tipc-dis...@li...
> > > Subject: RE: TIPC Name table distribution problem in TIPC 1.7.6
> > >
> > > Surya,
> > >
> > > Thanks for the quick response. Here are the broadcast link stats from that node:
> > >
> > > mpug@ATCA35_pl0_1:/usr/vob/mp/common/tools/linux$ tipc-config -ls=broadcast-link
> > > Link statistics:
> > > Link <broadcast-link>
> > >   Window:20 packets
> > >   RX packets:14375 fragments:0/0 bundles:39/62
> > >   TX packets:20 fragments:0/0 bundles:110/2301
> > >   RX naks:0 defs:0 dups:0
> > >   TX naks:0 acks:901 dups:0
> > >   Congestion bearer:0 link:0  Send queue max:130 avg:74
> > >
> > > Wouldn't I expect to see link congestion set to a non-zero value with a send queue max of 130? Yes, we are using redundant links between nodes, but with different Ethernet bearers for each link.
> > >
> > > I assume that the name table distributions are sent over the broadcast link, and that is why you suspect this is the problem. Would there be any sign of trouble in the syslog when this happens (kernel messages from TIPC)? I'll look through the syslog for clues.
> > >
> > > Thanks,
> > > Felix
> > >
> > > -----Original Message-----
> > > From: Suryanarayana Garlapati [mailto:SGa...@go...]
> > > Sent: Monday, August 02, 2010 9:09 AM
> > > To: Nayman Felix-QA5535; tip...@li...
> > > Subject: RE: TIPC Name table distribution problem in TIPC 1.7.6
> > >
> > > Hi Felix,
> > > I suspect that this is broadcast link congestion happening. Please use the following patch on top of 1.7.6-RC3 and let me know whether it solves your problem.
> > > By the way, are you using any redundant links for the bearer?
> > >
> > > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > diff -uNr findings_tipc/net/tipc/tipc_bcast.c patch_tipc/net/tipc/tipc_bcast.c
> > > --- findings_tipc/net/tipc/tipc_bcast.c    2008-10-01 23:27:21.000000000 +0530
> > > +++ patch_tipc/net/tipc/tipc_bcast.c       2009-07-30 10:43:35.000000000 +0530
> > > @@ -50,7 +50,7 @@
> > >
> > >  #define MAX_PKT_DEFAULT_MCAST 1500   /* bcast link max packet size (fixed) */
> > >
> > > -#define BCLINK_WIN_DEFAULT 20        /* bcast link window size (default) */
> > > +#define BCLINK_WIN_DEFAULT 50        /* bcast link window size (default) */
> > >
> > >  #define BCLINK_LOG_BUF_SIZE 0
> > >
> > > diff -uNr findings_tipc/net/tipc/tipc_node.c patch_tipc/net/tipc/tipc_node.c
> > > --- findings_tipc/net/tipc/tipc_node.c     2008-10-01 23:27:21.000000000 +0530
> > > +++ patch_tipc/net/tipc/tipc_node.c        2009-07-30 10:45:21.000000000 +0530
> > > @@ -313,7 +313,7 @@
> > >      for (i = 0; i < TIPC_MAX_BEARERS; i++) {
> > >          l_ptr = n_ptr->links[i];
> > >          if (l_ptr != NULL) {
> > > -            l_ptr->reset_checkpoint = l_ptr->next_in_no;
> > > +            l_ptr->reset_checkpoint = 1;
> > >              l_ptr->exp_msg_count = 0;
> > >              tipc_link_reset_fragments(l_ptr);
> > >          }
> > > @@ -361,6 +361,8 @@
> > >          tipc_bclink_acknowledge(n_ptr, mod(n_ptr->bclink.acked + 10000));
> > >          tipc_bclink_remove_node(n_ptr->elm.addr);
> > >      }
> > > +
> > > +    memset(&n_ptr->bclink,0,sizeof(n_ptr->bclink));
> > >
> > >  #ifdef CONFIG_TIPC_MULTIPLE_LINKS
> > >      node_abort_link_changeover(n_ptr);
> > > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >
> > > Regards
> > > Surya
> > >
> > > > -----Original Message-----
> > > > From: Nayman Felix-QA5535 [mailto:Fel...@mo...]
> > > > Sent: Monday, August 02, 2010 12:42 AM
> > > > To: tip...@li...
> > > > Subject: [tipc-discussion] TIPC Name table distribution problem in TIPC 1.7.6
> > > >
> > > > All,
> > > >
> > > > Just started running TIPC 1.7.6 on one of our lab systems, and I'm seeing a problem where it appears that one of the nodes in the system is not distributing its name table entries to the rest of the nodes in the system.
> > > > As an example, I have a process with a TIPC name type of 75, and on the node with TIPC address 1.1.2 those entries are present:
> > > >
> > > > mpug@ATCA35_pl0_1:~$ tipc-config -nt=75
> > > > Type       Lower      Upper      Port Identity          Publication  Scope
> > > > 75         1          1          <1.1.3:1384022146>     1384022147   cluster
> > > >                                  <1.1.7:282460177>      282460178    cluster
> > > >                                  <1.1.6:2055446601>     2055446602   cluster
> > > >                                  <1.1.5:2320122096>     2320122097   cluster
> > > >                                  <1.1.4:83304688>       83304689     cluster
> > > >                                  <1.1.2:1118560329>     1118560330   cluster
> > > >                                  <1.1.1:2317705313>     2317705314   cluster
> > > > <cut off for the sake of brevity>
> > > >
> > > > If I view the same entry on any other node in the system, I see the following:
> > > >
> > > > appadm@ATCA35_cm1:~$ tipc-config -nt=75
> > > > Type       Lower      Upper      Port Identity          Publication  Scope
> > > > 75         1          1          <1.1.3:1384022146>     1384022147   cluster
> > > >                                  <1.1.7:282460177>      282460178    cluster
> > > >                                  <1.1.6:2055446601>     2055446602   cluster
> > > >                                  <1.1.5:2320122096>     2320122097   cluster
> > > >                                  <1.1.4:83304688>       83304689     cluster
> > > >                                  <1.1.1:2317705313>     2317705314   cluster
> > > > <cut off for the sake of brevity>
> > > >
> > > > The entries for 1.1.2 are not there. In fact, none of the processes running on the 1.1.2 node are visible in the TIPC name table outside of that node. Therefore, we cannot send any messages to that node. (We use the topology service to verify that the TIPC name/domain combination we're sending to is available.) I've currently left the system in this state. Any idea why this could be happening, or what we can look for to debug this problem? We were not seeing this problem with 1.5.12.
> > > >
> > > > Thanks,
> > > >
> > > > Felix
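
(For background on the topology-service check Felix mentions above, here is a minimal sketch, not taken from the thread, of how an application can ask TIPC's topology server whether a name is published before sending to it. It assumes the standard <linux/tipc.h> socket API of the TIPC 1.7/2.0 era; the name type 75 and instance 1 simply mirror the example in the thread, and everything else is illustrative.)

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/tipc.h>

int main(void)
{
    struct sockaddr_tipc topsrv;
    struct tipc_subscr subscr;
    struct tipc_event event;
    int sd = socket(AF_TIPC, SOCK_SEQPACKET, 0);

    /* Connect to the topology server, which is published as name
     * {TIPC_TOP_SRV, TIPC_TOP_SRV} on every node. */
    memset(&topsrv, 0, sizeof(topsrv));
    topsrv.family = AF_TIPC;
    topsrv.addrtype = TIPC_ADDR_NAME;
    topsrv.addr.name.name.type = TIPC_TOP_SRV;
    topsrv.addr.name.name.instance = TIPC_TOP_SRV;
    if (sd < 0 || connect(sd, (struct sockaddr *)&topsrv, sizeof(topsrv)) < 0) {
        perror("connect to topology server");
        return 1;
    }

    /* Subscribe for publications of name {75, 1..1} visible to this node. */
    memset(&subscr, 0, sizeof(subscr));
    subscr.seq.type = 75;
    subscr.seq.lower = 1;
    subscr.seq.upper = 1;
    subscr.timeout = TIPC_WAIT_FOREVER;
    subscr.filter = TIPC_SUB_SERVICE;
    if (send(sd, &subscr, sizeof(subscr), 0) != sizeof(subscr)) {
        perror("send subscription");
        return 1;
    }

    /* Block until the topology server reports a matching event. If the name
     * table distribution problem described above occurs, a publication made
     * on the affected node never produces a TIPC_PUBLISHED event here. */
    if (recv(sd, &event, sizeof(event), 0) != sizeof(event)) {
        perror("recv event");
        return 1;
    }
    if (event.event == TIPC_PUBLISHED)
        printf("name {75,1} published by node 0x%x, port ref %u\n",
               event.port.node, event.port.ref);

    close(sd);
    return 0;
}

(In practice an application would keep the subscription socket open and react to TIPC_PUBLISHED and TIPC_WITHDRAWN events rather than doing a one-shot check.)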