From: Alexander D. <ale...@gm...> - 2017-08-03 01:08:17
On Tue, Aug 1, 2017 at 8:29 AM, Codrut Grosu <co...@ix...> wrote:
> Hi,
>
> We believe there might be a bug in the i40evf driver.
>
> We think that there is race issue between i40evf_reset_task and i40evf_down / i40evf_open.
> The reason is that the functions napi_enable / napi_disable must be called in pairs
> in order not to loop indefinitely (or crash).
>
> Consider the following:
>
> ifconfig eth1 mtu 1000 & ifconfig eth1 up
>
> What happens now is that the change of mtu schedules a reset. Suppose the reset task starts,
> and the first call to netif_running (after continue_reset) returns false. Before the thread reaches
> the second call to netif_running, i40evf_open starts in another thread. Then the second netif_running
> in reset will return true, and we will have 2 consecutive calls of napi_enable.
>
> We could not reproduce this particular situation in practice (probably due to the short timing).
> However, we did hang the driver using a call to ndo_close() followed quickly by
> "ethtool -G eth1 rx 4096". In this case netif_running will return true always (as we bypassed
> the call to dev_close), the reset will be scheduled before the interface finishes going down,
> and 2 calls to napi_disable will happen.

So this last statement isn't exactly clear. There isn't an ndo_close() in the kernel, last I knew. There is an ndo_stop() and an i40evf_close(), but no ndo_close(). Can you clarify which one of these you were calling?

I ask because ndo_stop() and i40evf_close() should only be called with the RTNL lock held. It sounds like you may not be doing that, which could cause a collision with something like an "ethtool -G" command. The ethtool ioctl takes the RTNL lock precisely to avoid such collisions, so if the call path that gets you into i40evf_close() isn't holding the RTNL, that might explain the issue.

- Alex
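The pairing argument in the report can be made concrete with a small userspace model. This is a hypothetical sketch, not i40evf or kernel code: the model_* names, the structs, and the single `enabled` flag are assumptions standing in for the real NAPI state, and `running` stands in for netif_running(). The model counts unpaired napi_enable()/napi_disable() calls and replays three orderings: the interleaving from the report (open lands between the reset task's two netif_running() checks), the two orders you get when open and reset are serialized by one lock (the role RTNL is meant to play), and the reported close-without-dev_close() sequence followed by a reset.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one NAPI context: starts disabled, like a freshly added one. */
struct model_napi {
    bool enabled;
};

/* Hypothetical model of the device: `running` stands in for netif_running(). */
struct model_dev {
    bool running;
    struct model_napi napi;
    int errors; /* unpaired enable/disable calls observed */
};

/* An enable on an already-enabled context is the "crash" case in the report. */
static bool model_napi_enable(struct model_napi *n)
{
    if (n->enabled)
        return false;
    n->enabled = true;
    return true;
}

/* A disable on an already-disabled context is the "loop indefinitely" case. */
static bool model_napi_disable(struct model_napi *n)
{
    if (!n->enabled)
        return false;
    n->enabled = false;
    return true;
}

/* Open path: mark the device running and enable NAPI. */
static void model_open(struct model_dev *d)
{
    d->running = true;
    if (!model_napi_enable(&d->napi))
        d->errors++;
}

/* Hypothetical close path run directly, bypassing dev_close(): NAPI is
 * disabled but `running` is never cleared. */
static void model_close_bypassing_dev_close(struct model_dev *d)
{
    if (!model_napi_disable(&d->napi))
        d->errors++;
}

/* Reset task, first netif_running() check: quiesce if running. */
static void model_reset_first_check(struct model_dev *d)
{
    if (d->running && !model_napi_disable(&d->napi))
        d->errors++;
}

/* Reset task, second netif_running() check: re-enable if running. */
static void model_reset_second_check(struct model_dev *d)
{
    if (d->running && !model_napi_enable(&d->napi))
        d->errors++;
}

/* Interleaving from the report: open runs between the two checks. */
static int replay_racy(void)
{
    struct model_dev d = {0};
    model_reset_first_check(&d);  /* not running yet: no disable */
    model_open(&d);               /* another thread opens the device */
    model_reset_second_check(&d); /* now running: second enable */
    return d.errors;
}

/* Serialized by a shared lock, open runs wholly before the reset sequence... */
static int replay_open_before_reset(void)
{
    struct model_dev d = {0};
    model_open(&d);
    model_reset_first_check(&d);
    model_reset_second_check(&d);
    return d.errors;
}

/* ...or wholly after it. */
static int replay_open_after_reset(void)
{
    struct model_dev d = {0};
    model_reset_first_check(&d);
    model_reset_second_check(&d);
    model_open(&d);
    return d.errors;
}

/* Reported hang: close without dev_close(), then a reset sees `running`. */
static int replay_close_then_reset(void)
{
    struct model_dev d = {0};
    model_open(&d);                      /* running, NAPI enabled */
    model_close_bypassing_dev_close(&d); /* NAPI disabled, still "running" */
    model_reset_first_check(&d);         /* second disable: would spin here */
    return d.errors;
}
```

In this model `replay_racy()` and `replay_close_then_reset()` each observe one unpaired call, while both serialized orders observe none. That is the reason the RTNL question matters: the reset task's two checks are only consistent if nothing that flips `running` can run between them.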