From: Jon M. <jm...@re...> - 2020-11-20 17:25:34
Hi Howard,
This is the code executed when TIPC receives a NETDEV_CHANGE event:
switch (evt) {
case NETDEV_CHANGE:
        if (netif_carrier_ok(dev) && netif_oper_up(dev)) {
                test_and_set_bit_lock(0, &b->up);
                break;
        }
        fallthrough;
case NETDEV_GOING_DOWN:
        clear_bit_unlock(0, &b->up);
        tipc_reset_bearer(net, b);
        break;
case NETDEV_UP:
        test_and_set_bit_lock(0, &b->up);
        break;
case NETDEV_CHANGEMTU:
So, unless the bond interface actually reports that it is going down (or
delivers a NETDEV_CHANGE while its carrier/operational state is down), TIPC
doesn't reset any links. And if it *does* report that it is going down,
what else can we do?
To me this looks more like a problem with the bond device than with TIPC,
but we might of course have misunderstood its expected behavior.
We will look into this.
On a different note, you could instead omit the bond interface and try
using dual TIPC links, which work in active-active mode and give better
performance.
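As a rough sketch of what that could look like with the tipc tool from
iproute2 (the interface names eth0 and eth1 are just placeholders for your
two physical NICs):

    # Enable one Ethernet bearer per physical interface instead of bonding
    # them; TIPC then runs one link per bearer to each peer, in parallel.
    tipc bearer enable media eth device eth0
    tipc bearer enable media eth device eth1

With both bearers up, losing one interface only takes down the link on that
bearer, while traffic keeps flowing over the other one.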
Is that an option for you?
BR
Jon Maloy
On 11/19/20 11:36 PM, Howard Finer wrote:
> I am trying to use TIPC (kernel version 4.19) over a bond device that is
> configured for active-backup with ARP monitoring of the slaves. If a slave
> goes down, TIPC receives a NETDEV_CHANGE event during the timeframe in
> which the bond device is working towards bringing up the new slave. This
> causes TIPC to disable the bearer, which in turn causes a temporary loss of
> communication between the nodes.
>
>
>
> Instrumentation of the bond and tipc drivers shows the following:
>
> <6> 1 2020-11-19T23:58:33.111549+01:00 LABNBS5A kernel - - - [ 153.655776]
> Enabled bearer <eth:bond0>, priority 10
>
> <6> 1 2020-11-20T00:07:58.544040+01:00 LABNBS5A kernel - - - [ 718.799259]
> bond0: bond_ab_arp_commit: BOND_LINK_DOWN: link status definitely down for
> interface eth1, disabling it
>
> <6> 1 2020-11-20T00:07:58.544063+01:00 LABNBS5A kernel - - - [ 718.799261]
> bond0: bond_ab_arp_commit: do_failover, block netpoll_tx and call
> select_active_slave
>
> <6> 1 2020-11-20T00:07:58.544069+01:00 LABNBS5A kernel - - - [ 718.799263]
> bond0: bond_select_active_slave: bond_find_best_slave returned NULL
>
> <6> 1 2020-11-20T00:07:58.544072+01:00 LABNBS5A kernel - - - [ 718.799347]
> bond0: bond_select_active_slave: now running without any active interface!
>
> <6> 1 2020-11-20T00:07:58.544080+01:00 LABNBS5A kernel - - - [ 718.799349]
> bond0: bond_ab_arp_commit: do_failover, returned from select_active_slave
> and unblock netpoll tx
>
> <6> 1 2020-11-20T00:07:58.544081+01:00 LABNBS5A kernel - - - [ 718.799611]
> Resetting bearer <eth:bond0>
>
> <6> 1 2020-11-20T00:07:58.655535+01:00 LABNBS5A kernel - - - [ 718.907245]
> bond0: bond_ab_arp_commit: BOND_LINK_UP: link status definitely up for
> interface eth0
>
> <6> 1 2020-11-20T00:07:58.655545+01:00 LABNBS5A kernel - - - [ 718.907247]
> bond0: bond_ab_arp_commit: do_failover, block netpoll_tx and call
> select_active_slave
>
> <6> 1 2020-11-20T00:07:58.655548+01:00 LABNBS5A kernel - - - [ 718.907248]
> bond0: bond_select_active_slave: bond_find_best_slave returned slave eth0
>
> <6> 1 2020-11-20T00:07:58.655559+01:00 LABNBS5A kernel - - - [ 718.907249]
> bond0: making interface eth0 the new active one
>
> <6> 1 2020-11-20T00:07:58.655562+01:00 LABNBS5A kernel - - - [ 718.907560]
> bond0: bond_select_active_slave: first active interface up!
>
>
>
> With ARP-based monitoring only one slave will be 'up' at a time. When the
> active slave goes down, the other slave needs to be brought up. During that
> timeframe we see TIPC resetting the bearer, which defeats the entire
> purpose of using the bond device.
>
> It seems that the handling of the NETDEV_CHANGE event for an active-backup
> bond device is not correct. It needs to leave the bearer intact so that,
> when the backup slave is brought up, communication is properly restored
> without any upper-layer applications being aware that something happened at
> the lower level.
>
>
>
> Thanks,
>
> Howard
>
>
> _______________________________________________
> tipc-discussion mailing list
> tip...@li...
> https://lists.sourceforge.net/lists/listinfo/tipc-discussion
>