Menu

#43 Allocation of Latency Statistics is Non-Atomic

1.0
closed
nobody
None
2020-10-29
2020-10-16
No

Allocation of the latency statistics struct crashes with drivers that allocate sessions in an interrupt context. Allocating sessions in an interrupt context is explicitly allowed in the SCST documentation and works when latency statistics is disabled.

[ 6061.943473] kernel BUG at ../mm/vmalloc.c:2069!
[ 6061.943483] invalid opcode: 0000 [#1] SMP PTI
[ 6061.943490] CPU: 31 PID: 0 Comm: swapper/31 Kdump: loaded Tainted: P           OE  X   5.3.18-22-default #1 SLE15-SP2 (unreleased)
[ 6061.943492] Hardware name: Supermicro SYS-2029U-TN24R4T/X11DPU, BIOS 3.0b 03/04/2019
[ 6061.943516] RIP: 0010:__get_vm_area_node+0x159/0x160
[ 6061.943518] Code: 4c c8 b8 1e 00 00 00 83 f9 1e 0f 4f c8 49 d3 e7 e9 0d ff ff ff 48 89 ef e8 84 ea 02 00 31 ed eb ac 48 0f bd cb 83 c1 01 eb c7 <0f> 0b 0f 1f 44 00 00 0f 1f 44 00 00 ff 34 24 68 c0 0c 00 00 49 89
[ 6061.943520] RSP: 0018:ffffbd7b86f4cc00 EFLAGS: 00010206
[ 6061.943522] RAX: 0000000080000100 RBX: 00000000ffffffff RCX: ffffbd7b80000000
[ 6061.943523] RDX: 0000000000000022 RSI: 0000000000000001 RDI: 000000000000c000
[ 6061.943525] RBP: 000000000000b9a0 R08: ffffdd7b7fffffff R09: 00000000ffffffff
[ 6061.943527] R10: 000000000001ac39 R11: 000000000000026a R12: 0000000000000dc0
[ 6061.943530] R13: ffff9eab5e816cec R14: 0000000fffffffe0 R15: ffff9eab5e816cc0
[ 6061.943534] FS:  0000000000000000(0000) GS:ffff9eab5f7c0000(0000) knlGS:0000000000000000
[ 6061.943536] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6061.943538] CR2: 00007f4307d851c0 CR3: 00000009b6c0a002 CR4: 00000000007606e0
[ 6061.943539] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 6061.943540] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 6061.943541] PKRU: 55555554
[ 6061.943542] Call Trace:
[ 6061.943544]  <IRQ>
[ 6061.943547]  __vmalloc_node_range+0x6d/0x270
[ 6061.943573]  ? scst_alloc_session+0x1c9/0x270 [scst]
[ 6061.943587]  __vmalloc_node.constprop.39+0x43/0x60
[ 6061.943599]  ? scst_alloc_session+0x1c9/0x270 [scst]
[ 6061.943606]  scst_alloc_session+0x1c9/0x270 [scst]
[ 6061.943612]  ? handle_async_event_fc+0x150/0x150 [atto_scst]
[ 6061.943618]  __scst_register_session+0x35/0x130 [scst]
[ 6061.943626]  scst_register_session+0xc/0x10 [scst]
[ 6061.943628]  create_session+0xcc/0x240 [atto_scst]
[ 6061.943638]  HwTgtReportIni+0xb4/0xe0 [celerity16fc]
[ 6061.943646]  ? HwTgtHandleNewCmd+0x475/0x710 [celerity16fc]
[ 6061.943652]  ? HwRcvFrmReceived+0x205/0x3a0 [celerity16fc]
[ 6061.943658]  ? HwIntProcCqForWqRq+0x458/0xa10 [celerity16fc]
[ 6061.943670]  ? ixgbe_poll+0x614/0x740 [ixgbe]
[ 6061.943675]  ? HwIntProcEq+0xa9/0x100 [celerity16fc]
[ 6061.943680]  ? CoreMsixInterrupt+0x37/0x50 [celerity16fc]
[ 6061.943683]  ? tasklet_action_common.isra.21+0x5e/0x110
[ 6061.943686]  ? __do_softirq+0xe3/0x2dc
[ 6061.943688]  ? irq_exit+0xd5/0xe0
[ 6061.943690]  ? do_IRQ+0x7f/0xd0
[ 6061.943692]  ? common_interrupt+0xf/0xf
[ 6061.943693]  </IRQ>
[ 6061.943696]  ? cpuidle_enter_state+0xa7/0x430

The attached patch fixes the issue by atomically allocating the latency statistics memory with kmalloc instead of vmalloc.

1 Attachments

Discussion

  • Bart Van Assche

    Bart Van Assche - 2020-10-25

    Thanks for the report and for the patch. However, allocating large amounts of data with kmalloc() or kzalloc() is considered a bad practice since both allocate contiguous data and since Linux memory tends to get fragmented over time. How about the patch below? It's not perfect but may be good enough.

    diff --git a/scst/src/scst_lib.c b/scst/src/scst_lib.c
    index 1a9391c75801..af53dc68b1a0 100644
    --- a/scst/src/scst_lib.c
    +++ b/scst/src/scst_lib.c
    @@ -7173,7 +7173,9 @@ struct scst_session *scst_alloc_session(struct scst_tgt *tgt, gfp_t gfp_mask,
        }
    
        if (atomic_read(&scst_measure_latency)) {
    
    -       sess->lat_stats = vzalloc(sizeof(*sess->lat_stats));
    +       sess->lat_stats = gfp_mask & GFP_ATOMIC ?
    +           kzalloc(sizeof(*sess->lat_stats), gfp_mask) :
    +           vzalloc(sizeof(*sess->lat_stats));
            if (!sess->lat_stats)
                goto out_free_name;
        }
    @@ -7232,7 +7234,7 @@ void scst_free_session(struct scst_session *sess)
        mutex_unlock(&scst_mutex);
    
        kfree(sess->transport_id);
    
    -   vfree(sess->lat_stats);
    +   kvfree(sess->lat_stats);
        kfree(sess->initiator_name);
        if (sess->sess_name != sess->initiator_name)
            kfree(sess->sess_name);
    diff --git a/scst/src/scst_sysfs.c b/scst/src/scst_sysfs.c
    index c0f931374753..24a596aca6a4 100644
    --- a/scst/src/scst_sysfs.c
    +++ b/scst/src/scst_sysfs.c
    @@ -6986,7 +6986,7 @@ static void scst_free_lat_stats_mem(void)
            list_for_each_entry(tgt, &tt->tgt_list, tgt_list_entry) {
                list_for_each_entry(sess, &tgt->sess_list,
                            sess_list_entry) {
    -               vfree(sess->lat_stats);
    +               kvfree(sess->lat_stats);
                    sess->lat_stats = NULL;
                }
            }
    
     

    Last edit: Bart Van Assche 2020-10-25
    • Steven Siwinski

      Steven Siwinski - 2020-10-28

      Thanks Bart, The above patch looks like a good compromise.

       
      • Bart Van Assche

        Bart Van Assche - 2020-10-29

        Thanks for the feedback. That patch has been applied on the trunk.

         
  • Bart Van Assche

    Bart Van Assche - 2020-10-25
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -1,6 +1,6 @@
     Allocation of the latency statistics struct crashes with drivers that allocate sessions in an interrupt context. Allocating sessions in an interrupt context is explicitly allowed in the SCST documentation and works when latency statistics is disabled.
    
    -`
    +```
     [ 6061.943473] kernel BUG at ../mm/vmalloc.c:2069!
     [ 6061.943483] invalid opcode: 0000 [#1] SMP PTI
     [ 6061.943490] CPU: 31 PID: 0 Comm: swapper/31 Kdump: loaded Tainted: P           OE  X   5.3.18-22-default #1 SLE15-SP2 (unreleased)
    @@ -44,6 +44,6 @@
     [ 6061.943692]  ? common_interrupt+0xf/0xf
     [ 6061.943693]  &lt;/IRQ&gt;
     [ 6061.943696]  ? cpuidle_enter_state+0xa7/0x430
    -`
    +```
    
     The attached patch fixes the issue by atomically allocating the latency statistics memory with kmalloc instead of vmalloc.
    
     
  • Bart Van Assche

    Bart Van Assche - 2020-10-29
    • status: open --> closed
     

Log in to post a comment.

MongoDB Logo MongoDB