#343 Increase CMSIS_DAP usb timeout value

0.10.0
new
nobody
None
2022-03-22
2022-03-18
Gabor
No

The Linux USB stack locks up for 5 seconds while it tries to reconnect to an unresponsive USB device.

If I halt my device under test, it triggers the Linux USB stack lockup described above, and my OpenOCD connection to my DAPLink breaks.

The best solution I've found is increasing USB_TIMEOUT to 6000 ms in cmsis_dap.c. What do you think about submitting this fix? If there's an easy way for me to create the CL, I'm happy to do so.

Discussion

  • Tommy Murphy

    Tommy Murphy - 2022-03-18

    Would it be better to add a command to allow the USB_TIMEOUT to be configured rather than hardcoding a specific fixed value?
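
A minimal sketch of what a configurable timeout could look like, assuming the fixed macro is replaced by a variable with a validated setter. The names `usb_timeout_set`, `usb_timeout_get`, `USB_TIMEOUT_DEFAULT_MS` and `USB_TIMEOUT_MAX_MS` are hypothetical and are not part of OpenOCD; the upper bound is an arbitrary assumption, and a real patch would expose the setter through an OpenOCD command.

```c
#include <errno.h>

/* Hypothetical names, for illustration only; not OpenOCD's actual code. */
#define USB_TIMEOUT_DEFAULT_MS 6000   /* value proposed in this ticket */
#define USB_TIMEOUT_MAX_MS     60000  /* arbitrary upper bound (assumption) */

/* Current timeout, starting at the default. */
static int usb_timeout_ms = USB_TIMEOUT_DEFAULT_MS;

/* Set the timeout; reject non-positive or unreasonably large values. */
int usb_timeout_set(int timeout_ms)
{
    if (timeout_ms <= 0 || timeout_ms > USB_TIMEOUT_MAX_MS)
        return -EINVAL;
    usb_timeout_ms = timeout_ms;
    return 0;
}

/* Read back the timeout that USB transfers would use. */
int usb_timeout_get(void)
{
    return usb_timeout_ms;
}
```

With this shape, the default covers the common case and a user with an unusual setup can still override it.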

     
  • Antonio Borneo

    Antonio Borneo - 2022-03-18

    My understanding is:
    both the target MCU and the DAPLink adapter used by OpenOCD are connected to the USB of your Linux PC.
    With OpenOCD you halt the target. This blocks the Linux USB stack, so the DAPLink cannot be accessed either.
    Is this correct?

    Do you get any error messages in the Linux log?

    Can the problem be bypassed by moving the target to another USB port?
    "lsusb --tree" should report the USB buses and which bus each device is plugged into.

     
  • Berk Akinci

    Berk Akinci - 2022-03-18

    Unrelated to the bug, but possibly related to your issue:

    The linux USB stack locks up for 5 seconds if there's an unresponsive USB device

    USB is intolerant of link errors, at least USB 2 and below; I don't know about USB 3 and later. This extends to all devices and links connected to a single host controller. Make sure you don't have any marginal cables anywhere on that controller.

     
  • Chris Reed

    Chris Reed - 2022-03-20

    Since debugging USB devices is a common use case (I've certainly done it innumerable times), it is imo reasonable to have workarounds in OpenOCD and other debug tools to ensure this use case actually works reliably. Whether or not moving the two devices to separate USB controllers works (and it's a good idea to try!), that option won't necessarily be available to all users (e.g. debugging with a laptop that has only one USB controller).

    Making the timeout configurable is an option, but is that really necessary? It's not like the workaround timeout is 10 minutes. :)

     
  • Gabor

    Gabor - 2022-03-21

    Yes, Antonio, your understanding is correct.

    I've tried moving to different USB buses, which fixes this issue. However, I need all my devices to be plugged into the same programmable USB hub, which can do a power-on-reset when I'm working remotely.

    dmesg definitely spits out a bunch of errors about not being able to talk to the USB device under test. Linux tries disconnecting and power cycling, and the USB locks up around the message "xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command". Unfortunately I'm not familiar with Linux programming.

    Berk, everything is sturdy and it's definitely a software issue with a 100% repro rate.

    I agree with Chris, this workaround is no-risk/no-effort.

     
  • Antonio Borneo

    Antonio Borneo - 2022-03-21

    The xHCI driver's code has this comment:

            /* Section 4.6.1.2 of xHCI 1.0 spec says software should also time the
             * completion of the Command Abort operation. If CRR is not negated in 5
             * seconds then driver handles it as if host died (-ENODEV).
             * In the future we should distinguish between -ENODEV and -ETIMEDOUT
             * and try to recover a -ETIMEDOUT with a host controller reset.
             */
            ret = xhci_handshake(&xhci->op_regs->cmd_ring,
                            CMD_RING_RUNNING, 0, 5 * 1000 * 1000);
    

    This 5-second value comes from the xHCI spec; it's not configurable on the kernel side, and it should be present in other OSes too.

    To my knowledge, most of the timeouts in OpenOCD are chosen arbitrarily by the developer. Timeouts in flash drivers are often taken from the memory datasheet, but the rest are just "some reasonable value".
    From the comment above, the USB stack can try to recover, so it may eventually succeed and return a valid reply after 5 seconds. Imagine this during a "halt" request that times out after 2 seconds. The halt succeeds, but well after 2 seconds, and you get an error!
    I'm afraid that we start by stretching one timeout now and later have to extend one after the other...
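
The timing problem above can be sketched as a comparison between when the reply arrives and how long the caller is willing to wait. This is only an illustration; `classify_reply` and the `request_result` enum are hypothetical names and do not exist in OpenOCD.

```c
/* Hypothetical outcome of one request, for illustration only. */
enum request_result { REQUEST_OK, REQUEST_TIMED_OUT };

/*
 * A reply that arrives after the caller's timeout is reported as an error,
 * even though the underlying operation eventually succeeded.
 */
enum request_result classify_reply(int reply_delay_ms, int caller_timeout_ms)
{
    return reply_delay_ms <= caller_timeout_ms ? REQUEST_OK : REQUEST_TIMED_OUT;
}
```

For example, a reply delayed by 5000 ms is an error under a 2000 ms timeout but is accepted under a 6000 ms timeout.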

    Anyway, these 5 seconds are in the xHCI spec, so I agree to this fix.
    But please properly explain the reason in a comment above the new 6-second value.
    The proposed fix is for CMSIS-DAP, but all other USB drivers need the same. Having a proper explanation would help the development of other drivers. And maybe in the future the timeout should be moved into some common libusb wrapper, outside the driver.
    Also, libjaylink should revisit its 1-second USB_TIMEOUT macro.
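
One possible shape for such a commented value, as a sketch only. `USB_TIMEOUT` is the macro named in this ticket; `XHCI_CMD_ABORT_TIMEOUT_MS` and `USB_TIMEOUT_MARGIN_MS` are hypothetical helper names introduced here to make the reasoning visible, and the 1-second margin is simply the difference between the proposed 6000 ms and the 5 seconds from the xHCI comment.

```c
/*
 * The xHCI host controller driver may wait up to 5 seconds for a
 * Command Abort to complete (xHCI 1.0 spec, section 4.6.1.2). While it
 * waits, other devices on the same controller can be unreachable, so the
 * adapter timeout has to be longer than that stall.
 */
#define XHCI_CMD_ABORT_TIMEOUT_MS 5000  /* hypothetical name */
#define USB_TIMEOUT_MARGIN_MS     1000  /* hypothetical name, assumed margin */

#define USB_TIMEOUT (XHCI_CMD_ABORT_TIMEOUT_MS + USB_TIMEOUT_MARGIN_MS)

/* Compile-time check that the timeout really exceeds the xHCI stall. */
_Static_assert(USB_TIMEOUT > XHCI_CMD_ABORT_TIMEOUT_MS,
               "USB timeout must exceed the xHCI command abort timeout");
```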

     
  • Gabor

    Gabor - 2022-03-21

    Sounds good. I agree, Antonio, a systematic change would be the most appropriate. How about adding the timeout value to jtag/drivers/libusb_helper.h and having cmsis_dap and jaylink include that instead of each defining their own? This change would be fairly simple (although I'm not sure how appropriate this header file is).

    Do you want me to take a stab at it? If yes, can you share how to set up git and gerrit with the proper access permissions?

     
  • Antonio Borneo

    Antonio Borneo - 2022-03-21

    Yes, jtag/drivers/libusb_helper.h could be the right place to put such a value.
    For JLink, the file src/jtag/drivers/libjaylink/libjaylink/transport_usb.c is outside OpenOCD; we would need to change the library, so that is out of the scope of this ticket.
    It's not a big patch, but if you want to contribute to OpenOCD code, I would be glad to have you on board.
    The info in the HACKING file should be enough. It is also available online.
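
A rough sketch of the shared-constant idea, assuming a single definition that every USB driver reuses instead of a private macro. `JTAG_USB_TIMEOUT_MS`, `stub_bulk_transfer` and `example_driver_read` are hypothetical names; the stub only stands in for a real libusb transfer so that the example is self-contained.

```c
#include <stddef.h>

/* Hypothetical shared constant, e.g. placed in jtag/drivers/libusb_helper.h. */
#define JTAG_USB_TIMEOUT_MS 6000

/* Timeout most recently passed to the stand-in transfer function. */
static int recorded_timeout_ms;

/* Stand-in for a libusb transfer: records the timeout and "transfers" len bytes. */
static int stub_bulk_transfer(unsigned char *buf, size_t len, int timeout_ms)
{
    (void)buf;
    recorded_timeout_ms = timeout_ms;
    return (int)len;
}

/* Hypothetical driver-side read that uses the shared timeout. */
int example_driver_read(unsigned char *buf, size_t len)
{
    return stub_bulk_transfer(buf, len, JTAG_USB_TIMEOUT_MS);
}
```

Keeping the value in one place means the explanation for the 6 seconds is written once and later changes reach every driver that includes the header.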

     
  • Gabor

    Gabor - 2022-03-22
     
