From: Chris A. <chr...@sr...> - 2023-07-28 20:16:02
Xiaofan, Jörn,

Thanks for your replies. It seems I already have the 64-bit patch, but I will continue to follow the various ASMedia bugs in the kernel bug tracker.

I have done some more investigation. Both controllers (ASMedia and Starship) work well when I queue up a single libusb request at a time (as opposed to queueing up multiple asynchronous requests). The ASMedia failure seems like a kernel problem (something to do with soft reset, perhaps).

With Starship it fails in two ways. The first is a -32 error from libusb:

[ 0.257641] [00001ee2] libusb: debug [reap_for_handle] urb type=3 status=-32 transferred=0
[ 0.257648] [00001ee2] libusb: debug [handle_bulk_completion] handling completion status -32 of bulk urb 1/1
[ 0.257651] [00001ee2] libusb: debug [handle_bulk_completion] detected endpoint stall
[ 0.257657] [00001ee2] libusb: debug [arm_timer_for_next_timeout] next timeout originally 5000ms
[ 0.257663] [00001ee2] libusb: debug [usbi_handle_transfer_completion] transfer 0x55ea46c14a10 has callback 0x55ea456f663f

Corresponding kernel messages:

[ 8374.005433] xhci_hcd 0000:02:00.3: xhci_check_bandwidth called for udev 00000000afe62d98
[ 8374.005440] xhci_hcd 0000:02:00.3: // Ding dong!
[ 8374.005521] xhci_hcd 0000:02:00.3: Successful Endpoint Configure command
[ 8374.005716] xhci_hcd 0000:02:00.3: endpoint disable with ep_state 0x40
[ 8374.254236] xhci_hcd 0000:02:00.3: Stalled endpoint for slot 1 ep 2
[ 8374.254247] xhci_hcd 0000:02:00.3: Hard-reset ep 2, slot 1
[ 8374.254253] xhci_hcd 0000:02:00.3: // Ding dong!

I'm also able to get another kind of error if I change the Cypress FX3 firmware to use smaller burst lengths. According to their code:

/* Burst length in 1 KB packets. Only applicable to USB 3.0. */
#define CY_FX_EP_BURST_LENGTH (16)

Their instructions say that lowering this could improve compatibility with some host controllers.

Setting this to a lower value gives me the -32 error, which is then followed by a -71 error:

[ 0.017982] [000024c7] libusb: debug [libusb_handle_events_timeout_completed] doing our own event handling
[ 0.017985] [000024c7] libusb: debug [usbi_wait_for_events] poll() 3 fds with timeout in 60000ms
[ 0.018105] [000024c7] libusb: debug [usbi_wait_for_events] poll() returned 1
[ 0.018112] [000024c7] libusb: debug [reap_for_handle] urb type=3 status=-32 transferred=0
[ 0.018113] [000024c7] libusb: debug [handle_bulk_completion] handling completion status -32 of bulk urb 1/1
[ 0.018117] [000024c7] libusb: debug [handle_bulk_completion] detected endpoint stall
[ 0.018123] [000024c7] libusb: debug [arm_timer_for_next_timeout] next timeout originally 5000ms
[ 0.018124] [000024c7] libusb: debug [usbi_handle_transfer_completion] transfer 0x555e82f12f40 has callback 0x555e8224063f
[ 0.018129] [000024c7] libusb: debug [libusb_submit_transfer] transfer 0x555e82f12f40
[ 0.018132] [000024c7] libusb: debug [submit_bulk_transfer] need 1 urbs for new transfer with length 262144
[ 0.018147] [000024c7] libusb: debug [libusb_handle_events_timeout_completed] doing our own event handling
[ 0.018151] [000024c7] libusb: debug [usbi_wait_for_events] poll() 3 fds with timeout in 60000ms
[ 0.058847] [000024c7] libusb: debug [usbi_wait_for_events] poll() returned 1
[ 0.058871] [000024c7] libusb: debug [reap_for_handle] urb type=3 status=-71 transferred=0
[ 0.058877] [000024c7] libusb: debug [handle_bulk_completion] handling completion status -71 of bulk urb 1/1
[ 0.058881] [000024c7] libusb: debug [handle_bulk_completion] low-level bus error -71
[ 0.058884] [000024c7] libusb: debug [arm_timer_for_next_timeout] next timeout originally 5000ms
[ 0.058889] [000024c7] libusb: debug [usbi_handle_transfer_completion] transfer 0x555e82f0e310 has callback 0x555e8224063f
[ 0.058893] [000024c7] libusb: debug [libusb_submit_transfer] transfer 0x555e82f0e310
[ 0.058896] [000024c7] libusb: debug [submit_bulk_transfer] need 1 urbs for new transfer with length 262144
[ 0.058937] [000024c7] libusb: debug [libusb_handle_events_timeout_completed] doing our own event handling
[ 0.058945] [000024c7] libusb: debug [usbi_wait_for_events] poll() 3 fds with timeout in 60000ms
[ 0.066968] [000024c7] libusb: debug [usbi_wait_for_events] poll() returned 1

What to try next?

Chris

On 2023-07-27 08:14, Jörn Müller [Allied Vision] via libusb-devel wrote:

Hi,

Although I'm not sure whether they are connected to this particular issue, I remember there have been patches regarding ASMx142 chips and their handling of 64-bit addressing:

https://github.com/torvalds/linux/commit/b71c669ad8390dd1c866298319ff89fe68b45653
https://patchwork.kernel.org/project/linux-usb/patch/202...@sa.../#24125779

AFAIK only the first made it to the mainline, and I think it should be included in kernels >= 5.14.

Kind regards,
Jörn

-----Original Message-----
From: Xiaofan Chen <xia...@gm...>
Sent: Thursday, 27 July 2023 03:33
To: Chris Adams <chr...@sr...>
Cc: lib...@li...
Subject: Re: [libusb] Bulk Transfers Timeout on AMD EPYC Computers

Hi Chris,

In this case, it seems to be a Linux kernel issue and nothing to do with libusb. It may already have been fixed in mainline Linux. You may have to upgrade your Ubuntu installation to 22.04 to see if that helps, or use a later kernel version on your Ubuntu 20.04 installation.

Best regards,
Xiaofan

On Thu, Jul 27, 2023 at 1:28 AM Chris Adams <chr...@sr...> wrote:

I tried with an ASMedia ASM3142-based PCIe card. The behaviour is similar, except that there are exactly 254 successful transfers every time before they start failing. I get these messages from dmesg:

[20047.573189] xhci_hcd 0000:68:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1
[20047.573198] xhci_hcd 0000:68:00.0: Looking for event-dma 00000000fffc30b0 trb-start 00000000fffc4fb0 trb-end 0000000000000000 seg-start 00000000fffc4000 seg-end 00000000fffc4ff0
[20047.573204] xhci_hcd 0000:68:00.0: Looking for event-dma 00000000fffc30b0 trb-start 00000000fffc3000 trb-end 00000000fffc3070 seg-start 00000000fffc3000 seg-end 00000000fffc3ff0
[20052.571651] xhci_hcd 0000:68:00.0: Cancel URB 00000000d0831cdb, dev 1, ep 0x81, starting at offset 0xfffc3040
[20052.571671] xhci_hcd 0000:68:00.0: // Ding dong!
[20052.571790] xhci_hcd 0000:68:00.0: Stopped on No-op or Link TRB for slot 8 ep 2
[20052.571837] xhci_hcd 0000:68:00.0: Removing canceled TD starting at 0xfffc3040 (dma) in stream 0 URB 00000000d0831cdb
[20052.571850] xhci_hcd 0000:68:00.0: xhci_giveback_invalidated_tds: Giveback cancelled URB 00000000d0831cdb TD
[20053.572166] xhci_hcd 0000:68:00.0: Cancel URB 000000004ef649ab, dev 1, ep 0x81, starting at offset 0xfffc3080
[20053.572188] xhci_hcd 0000:68:00.0: // Ding dong!
[20053.572308] xhci_hcd 0000:68:00.0: Stopped on No-op or Link TRB for slot 8 ep 2
[20053.572355] xhci_hcd 0000:68:00.0: Removing canceled TD starting at 0xfffc3080 (dma) in stream 0 URB 000000004ef649ab
[20053.572364] xhci_hcd 0000:68:00.0: xhci_giveback_invalidated_tds: Giveback cancelled URB 000000004ef649ab TD

This is very similar to the following bug, which was fixed a long time ago:

https://ubuntu-bugs.narkive.com/i6kgAbRA/bug-1667750-new-xhci-hcd-error-transfer-event-trb-dma-ptr-not-part-of-current-td-ep-index-2-comp

My kernel version is 5.15.0-76-generic on Ubuntu 20.04.

Here is the output from LIBUSB_DEBUG=4:

https://pastebin.com/S2MwkgQr

Behaviourally, the ASM3142 and AMD's Starship controller are the same, but I suspect that the cause is different, since this one looks like an xhci/Linux issue? I would actually prefer to use the PCIe card anyway, since the Starship only has two ports exposed on this motherboard.

Chris

On 2023-07-25 21:45, Xiaofan Chen wrote:

On Wed, Jul 26, 2023 at 2:04 AM Chris Adams <chr...@sr...> wrote:

I messaged some time ago about an issue I was having with bulk endpoints timing out (https://sourceforge.net/p/libusb/mailman/message/37802472/).

I have narrowed down the problem now and it is repeatable. For the test I use the Cypress FX3 SDK (which essentially just wraps libusb) and run their example code 09_cyusb_performance.cpp, which uses the libusb asynchronous API to repeatedly read from a bulk endpoint (Cypress FX3 development board).

Steps to reproduce:

1. Start 09_cyusb_performance
2. In a separate terminal, start stress-ng --hdd 2
3. After a few successful transfers, LIBUSB_TIMEOUT errors start

The error occurs with the latest version of libusb (1.0.26) and with older versions (1.0.24). The device can be recovered using libusb_reset_device, but the problem happens again. I have tried long timeouts of 20 s and it does not make a difference.

I have tried this on 3 Intel machines with no issues, but it happens consistently on the two AMD EPYCs I have tried it on. The specs of the two machines I have tried it on:

...

Where do I even start to solve this issue?

I would say post the debug log first.
Then try libusb git to see if that makes any difference. This may be a Linux kernel issue as well, so after you post the debug log, you may also want to check out the linux-usb devel mailing list.

--
Xiaofan

_______________________________________________
libusb-devel mailing list
lib...@li...
https://lists.sourceforge.net/lists/listinfo/libusb-devel
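
For anyone triaging similar LIBUSB_DEBUG=4 output: the URB completion statuses quoted in this thread (-32 and -71) are negated Linux errno values. 32 is EPIPE and 71 is EPROTO, which libusb reports in the logs above as "detected endpoint stall" and "low-level bus error". The Python sketch below tallies those statuses from a debug log. The names summarize_urb_statuses, URB_STATUS and REAP_RE are illustrative assumptions and not part of libusb; only the log line format is taken from the output in this thread.

```python
import re
from collections import Counter

# Linux errno numbers seen in this thread. The descriptions are the ones
# libusb printed next to each status in the quoted debug output.
# This table is a hypothetical helper, not a libusb data structure.
URB_STATUS = {
    0: ("OK", "completed"),
    -32: ("EPIPE", "endpoint stall"),
    -71: ("EPROTO", "low-level bus error"),
}

# Matches the reap_for_handle lines shown in the thread, for example:
#   [reap_for_handle] urb type=3 status=-32 transferred=0
REAP_RE = re.compile(
    r"\[reap_for_handle\] urb type=(\d+) status=(-?\d+) transferred=(\d+)"
)


def summarize_urb_statuses(log_text):
    """Count URB completion statuses found in LIBUSB_DEBUG=4 output.

    finditer() scans the whole text, so this also works when an archive
    has flattened the log onto a single line.
    """
    counts = Counter()
    for match in REAP_RE.finditer(log_text):
        status = int(match.group(2))
        # Statuses outside the table are reported by their errno number.
        name, _description = URB_STATUS.get(
            status, ("errno %d" % -status, "not listed here")
        )
        counts[name] += 1
    return counts


# Two lines copied from the debug output quoted in this thread.
SAMPLE = (
    "[ 0.018112] [000024c7] libusb: debug [reap_for_handle] "
    "urb type=3 status=-32 transferred=0 "
    "[ 0.058871] [000024c7] libusb: debug [reap_for_handle] "
    "urb type=3 status=-71 transferred=0"
)

if __name__ == "__main__":
    for name, count in summarize_urb_statuses(SAMPLE).items():
        print(name, count)
```

Running it over a full capture gives a quick view of whether failures are mostly stalls, bus errors, or something else, which may be useful context when posting to the linux-usb list.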