#401 Potential races in e1000e: down() VS NAPI

Status: open
Milestone: None
Labels: in-kernel_driver
Priority: 1
Updated: 2015-05-12
Created: 2014-02-06
Private: No

The race detection tools [1] have found a couple of potential races in e1000e that could lead to crashes (and may be related to some skb_over_panic problems as well, like https://sourceforge.net/p/e1000/bugs/389/).

OS: ROSA Fresh R2 GNOME, x86-64
Kernel: 3.10.27

Hardware (from lspci -vnn):

--------------
00:19.0 Ethernet controller [0200]: Intel Corporation 82579LM Gigabit Network Connection [8086:1502] (rev 04)
    Subsystem: Lenovo Device [17aa:21f3]
    Flags: bus master, fast devsel, latency 0, IRQ 43
    Memory at f2500000 (32-bit, non-prefetchable) [size=128K]
    Memory at f253b000 (32-bit, non-prefetchable) [size=4K]
    I/O ports at 5080 [size=32]
    Capabilities: [c8] Power Management version 2
    Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
    Capabilities: [e0] PCI Advanced Features
    Kernel driver in use: e1000e
    Kernel modules: e1000e
--------------

The reported potential races are between e1000e_poll() and e1000e_down(). I flood-pinged the system under analysis while changing the MTU there at the same time. I suspect this may also happen when simply bringing the interface up and down (as suggested in ticket #389), but implementing that scenario is tricky when NetworkManager is involved.

From the report of the race detector:
--------------
(R1) Possible data race during read of size 2 at 0xffff8800a200e3a2: {{{
   T11 (L{}):
    #0  e1000_clean_rx_irq_ps (netdev.c:1289)
    #1  e1000e_poll (netdev.c:2654)
  Concurrent write(s) happened at (OR AFTER) these points:
   T39 (L{L3}):
    #0  e1000_clean_rx_ring (netdev.c:1701)
    #1  e1000e_down (netdev.c:4032)
    #2  e1000_change_mtu (netdev.c:5688)
  Location 0xffff8800a200e3a2 is 34 bytes inside a block 
  starting at 0xffff8800a200e380 of size 128 allocated by T1 from heap:
    #0  kmalloc (slub_def.h:170)
    #0  kzalloc (slab.h:519)
    #0  e1000_alloc_queues (netdev.c:2621)
    #0  e1000_sw_init (netdev.c:4096)
    #0  e1000_probe (netdev.c:6606)
  Locks involved in this report: {L3} # this is rtnl_lock
}}}
Thread T11: IRQ on CPU #2
Thread T39: ifconfig

(R2) Possible data race during read of size 8 at 0xffffc900047ef7e0: {{{
   T5 (L{}):
    #0  e1000_alloc_rx_buffers (netdev.c:653)
    #1  e1000_clean_rx_irq (netdev.c:1041)
    #2  e1000e_poll (netdev.c:2654)
  Concurrent write(s) happened at (OR AFTER) these points:
   T43 (L{L3}):
    #0  e1000_clean_rx_ring (netdev.c:1677)
    #1  e1000e_down (netdev.c:4032)
    #2  e1000_change_mtu (netdev.c:5688)
  Location 0xffffc900047ef7e0 is 10208 bytes inside a block 
  starting at 0xffffc900047ed000 of size 10240 allocated by T3 from heap:
    #0  e1000e_setup_rx_resources (netdev.c:2341)
    #1  e1000_open (netdev.c:4273)
  Locks involved in this report: {L3} # this is rtnl_lock
}}}
Thread T5:  IRQ on CPU #1
Thread T43: ifconfig
--------------

If e1000e_poll() may run concurrently with e1000_clean_rx_ring(), the driver may end up processing the same skb twice and hit a use-after-free.

Consider the potential race R1 (technically, it is on the accesses to the next_to_clean field).

e1000_clean_rx_ring() cleans the rx_ring, frees the appropriate skbs (netdev.c:1675):

if (buffer_info->skb) {
    dev_kfree_skb(buffer_info->skb);
    buffer_info->skb = NULL;
}

and then sets next_to_clean to 0 (netdev.c:1701):

rx_ring->next_to_clean = 0;

Suppose e1000_clean_rx_irq_ps() executes at the same time, reads next_to_clean, and fetches the corresponding skb after dev_kfree_skb() has been called for it by e1000_clean_rx_ring() but before 'buffer_info->skb = NULL' is executed (netdev.c:1289):

i = rx_ring->next_to_clean;
rx_desc = E1000_RX_DESC_PS(*rx_ring, i);
staterr = le32_to_cpu(rx_desc->wb.middle.status_error);
buffer_info = &rx_ring->buffer_info[i];

while (staterr & E1000_RXD_STAT_DD) {
    if (*work_done >= work_to_do)
        break;
    (*work_done)++;
    skb = buffer_info->skb;

e1000_clean_rx_irq_ps() then processes an skb that may already contain garbage, and skb_put() could result in skb_over_panic().

The story is similar for the potential race R2.

e1000_alloc_rx_buffers() gets the skb pointer and processes it (netdev.c:653):

skb = buffer_info->skb;
if (skb) {
    skb_trim(skb, 0);
    goto map_skb;
}

but the memory might have been freed already by e1000_clean_rx_ring() (netdev.c:1675):

if (buffer_info->skb) {
    dev_kfree_skb(buffer_info->skb);
    buffer_info->skb = NULL;
}

If the pointer is obtained after the memory has been freed but before it is set to NULL, there is trouble.

e1000e_down() calls napi_synchronize(&adapter->napi) (e1000e/netdev.c:4019) before e1000_clean_rx_ring(), but that alone does not prevent poll() from being called concurrently with e1000_clean_rx_ring(): it only waits for the currently scheduled poll() callbacks to finish. I cannot see what else in the code would prevent poll() from running after that.

To avoid such problems altogether, what about disabling NAPI in e1000e_down() rather than using napi_synchronize()? NAPI could then be enabled again at the appropriate moment (in e1000e_up(), maybe?).

FWIW, the e1000 driver disables NAPI in e1000_down() before cleaning the rings, unlike e1000e.

[1] http://code.google.com/p/kernel-strider/

Discussion

  • Todd Fujinaka
    2014-02-06

    • assigned_to: dertman

  • Todd Fujinaka
    2015-05-12

    • assigned_to: dertman --> Yanir Lubetkin