|
From: Nicholas K. <xer...@gm...> - 2015-02-27 21:25:22
|
On February 27, 2015 2:42:29 PM EST, "Nelson, Shannon" <sha...@in...> wrote:
>> From: nick [mailto:xer...@gm...]
>> On 2015-02-27 09:16 AM, Stefan Assmann wrote:
>> > On 27.02.2015 15:02, nick wrote:
>> >
>> > [...]
>> >
>> >>> i40e: Fix a bug where Rx would stop after some time
>> >>> [...]
>> >>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c
>> b/drivers/net/ethernet/intel/i40e/i40e_main.c
>> >>> index f7464e8..ff6d94d 100644
>> >>> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
>> >>> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
>> >>> [...]
>> >>> @@ -9169,6 +9178,13 @@ static int i40e_probe(struct pci_dev
>*pdev,
>> const struct pci_device_id *ent)
>> >>> if (err)
>> >>> dev_info(&pf->pdev->dev, "set phy mask fail, aq_err %d\n",
>> err);
>> >>>
>> >>> + msleep(75);
>> >>> + err = i40e_aq_set_link_restart_an(&pf->hw, true, NULL);
>> >>> + if (err) {
>> >>> + dev_info(&pf->pdev->dev, "link restart failed, aq_err=%d\n",
>> >>> + pf->hw.aq.asq_last_status);
>> >>> + }
>> >>> +
>> >>> /* The main driver is (mostly) up and happy. We need to set
>this
>> state
>> >>> * before setting up the misc vector or we get a race and the
>> vector
>> >>> * ends up disabled forever.
>> >>>
>> >>> With this hunk removed the driver successfully unloaded/reloaded
>a
>> >>> couple of hundred times. Would it be safe to just remove this
>hunk?
>> >>> I haven't seen any negative effects by removing this yet.
>> >>>
>> >>> Stefan
>> >>>
>> >> Stefan,
>> >> I wouldn't remove them yet as this does look like a valid idea to
>> check to see if the link is
>> >> restarting successfully. On the other hand can you try removing
>the
>> msleep line as this one is
>> >> most likely causing the issue due to sleeping for some long in a
>> probe function is generally a
>> >> bad idea.
>> >> Thanks,
>> >> Nick
>> >
>> > Thanks Nick for the quick reply. I tested removing the msleep but
>that
>> > didn't make a difference. You actually need to remove the complete
>> hunk
>> > to get a stable driver reload.
>> >
>> > Stefan
>> >
>> Stefan,
>> Basically there are a few things that could be going wrong
>> 1. You are getting a error return for the
>> function,i40e_aq_set_link_restart_an
>> 2. You are trying to re able the device again when not needed
>> 3. You are sending a NULL value to a field for command arguments that
>> takes a 0 and not NULL
>> to take no arguments
>> Nick
>
>First of all, I would make sure you've got a short sleep in between
>each load and unload in this stress test. There's a lot going on under
>the covers in the Firmware that really should be allowed to settle out
>before jostling it again with another load/unload command.
>
>It would help to know what Firmware you have on your NIC - can you give
>us the output from "ethtool -i <ethX>"?
>
>The out-of-tree driver has just (finally!) been updated on SourceForge,
>so you might give this version 1.2.37 driver a try to see if it changes
>your result. That code still has the hunk in question, but protected
>by a FW version check. The related patch will be headed upstream to
>net-next very soon.
>
>Firmware updates have also just been released, but I'm not sure they've
>made it to the Intel Downloads site yet. Updating your FW will make a
>difference.
>
>sln
Thanks Shannon,
For the advice on there being a newer driver and firmware to test that on before reporting the bug against this driver. I am curious as to why this newer code is not up streamed to Linux next yet through.
Nick
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
|