From: Jerri L. <Jer...@pr...> - 2024-04-22 22:51:25
On Monday, April 22nd, 2024 at 8:35 PM, Tim Roberts <ti...@pr...> wrote:

> Jerri Lipp via libusb-devel wrote:
>
>> Also remember that USB is not a real-time bus -- each 1 millisecond frame is scheduled in advance. Once it is submitted to the hardware, that frame's schedule starts executing automatically while the next frame is scheduled. If you don't have a request pending when the frame is being scheduled, you miss that frame and will slip into the next one.
>>
>> If async can reach 900 KB/s with just one transfer pending at a time (by just resubmitting the transfer in the OUT callback), I don't see why the sync API shouldn't be able to as well. I suspect it boils down to the same issue. Since the sync API can't have an outstanding IN request, it's the same as the behavior of async without an IN packet. I just don't know why this behavior is manifesting.

First of all, thank you for the careful explanation. Much appreciated.

> Consider the processing of a sync loop like this. When frame N ends, the host controller hardware fires an interrupt. That interrupt starts up the HCD driver. It figures out which pending requests finished during the last frame. It then completes those requests, releasing them back to their drivers, and triggers the execution of frame N+1. That procedure will take your process out of sleep and make it "ready to run".
>
> Eventually, the system does a context switch to bring your process back onto a CPU and start running. Your completion handler runs, processes the data, preps the buffer, and submits a new request, which flows back down the driver stack into the host controller driver.
>
> That's a fair amount of overhead, and by the time your request gets into the driver, frame N+1 will already be executing, and your new request will be scheduled in frame N+2. Thus, you're only able to access every other frame, cutting the bandwidth in half.

I accept all of this. But I can always amortize the overhead by submitting larger transfers, right? From the libusb side, if I submit larger transfers, I may miss a frame in the time it takes to resubmit, but the overall throughput lost should diminish with increasing transfer size. That's not what I'm seeing, though. Also, I do see a throughput drop with just one transfer kept in flight, but that hit only takes me from 960 KB/s (the max I achieved) to 900 KB/s. So I still think the above, while true, doesn't explain what I'm seeing.

>> I'm not sure I follow. It's the lack of an IN request that results in lower throughput. Also, as I mentioned, it appears the dd/Linux kernel driver (so, no libusb involved) submits IN requests even when dd is engaged in strictly OUT transfers. That doesn't suggest an issue only with ST's middleware, but some kind of standardized requirement which the kernel driver follows. People have been using full-speed USB virtual COM ports for... 25 years? This can't be a new thing.
>
> I'm suggesting that those IN tokens cause interrupts that trigger the USB processing in the STM hardware, causing it to handle the other packets more often. This is just blue-sky brainstorming; I don't have any evidence for this.

So a single IN transfer submitted at the beginning gets resent periodically, which causes the MCU to flush its OUT buffers more regularly? It's not impossible, but it seems strange. The host sends the data to the device, but no matter how much is sent, the device may wait for an IN request to send the ACKs back to the host? Maybe, and lord knows embedded firmware can be buggy, but I'm still not convinced.
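For concreteness, the async arrangement I keep referring to (one OUT transfer resubmitted from its own callback, plus one IN transfer left permanently pending) looks roughly like this. It's a stripped-down sketch, not my actual test code: the endpoint addresses, the transfer size, and the omitted init/open/claim steps are placeholders for my setup.

#include <libusb-1.0/libusb.h>
#include <string.h>

#define EP_OUT    0x01           /* bulk OUT endpoint (device-specific placeholder) */
#define EP_IN     0x81           /* bulk IN endpoint (device-specific placeholder) */
#define XFER_SIZE (16 * 1024)    /* larger transfers should amortize per-URB overhead */

static unsigned char out_buf[XFER_SIZE];
static unsigned char in_buf[64];

/* Resubmit the OUT transfer from its own callback, so another URB is
 * always queued behind the one that just completed. */
static void LIBUSB_CALL out_cb(struct libusb_transfer *xfer)
{
    if (xfer->status == LIBUSB_TRANSFER_COMPLETED)
        libusb_submit_transfer(xfer);
}

/* Keep one IN request permanently outstanding. */
static void LIBUSB_CALL in_cb(struct libusb_transfer *xfer)
{
    if (xfer->status == LIBUSB_TRANSFER_COMPLETED)
        libusb_submit_transfer(xfer);
}

void run_async_out(libusb_device_handle *dev)
{
    struct libusb_transfer *out = libusb_alloc_transfer(0);
    struct libusb_transfer *in  = libusb_alloc_transfer(0);

    memset(out_buf, 0x55, sizeof(out_buf));
    libusb_fill_bulk_transfer(out, dev, EP_OUT, out_buf, sizeof(out_buf),
                              out_cb, NULL, 0);
    libusb_fill_bulk_transfer(in, dev, EP_IN, in_buf, sizeof(in_buf),
                              in_cb, NULL, 0);

    libusb_submit_transfer(out);
    libusb_submit_transfer(in);      /* presence/absence of this changes throughput */

    for (;;)
        libusb_handle_events(NULL);  /* drives the callbacks */
}

As far as I can tell, that single libusb_submit_transfer(in) call is the difference between the good and the degraded throughput in my tests.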
There's also the completely independent data point that the Linux kernel driver sends these IN requests even when no IN traffic is actually happening. I've floated the same question on the linux-usb mailing list; we'll see if someone picks it up.

> There is obviously no spec that requires this. IN and OUT endpoints are completely separate and unrelated. Endpoints do not affect one another.
>
> And, although it's true that full-speed USB support started in Windows 98, remember that the current host controllers are a tangled mess. They are all super-speed engines, with a high-speed EHCI engine tacked onto the side and a full-speed UHCI engine dangling off of that. They have three protocols built into one, and it's possible there are some problems.

Undoubtedly true, though as it happens, I'm running this on an old junk box that is USB 2/EHCI only, so that's not it, at least in this case. Thanks again for your patience and assistance.
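P.S. For anyone who wants to reproduce the sync-side numbers, the loop is essentially just this (again only a sketch; open/claim and error handling are omitted, and the endpoint and transfer size are placeholders for my device). Since libusb_bulk_transfer() blocks, there is never an outstanding IN request here, which matches the degraded async case:

#include <libusb-1.0/libusb.h>

#define EP_OUT    0x01           /* bulk OUT endpoint (device-specific placeholder) */
#define XFER_SIZE (16 * 1024)

int run_sync_out(libusb_device_handle *dev)
{
    static unsigned char buf[XFER_SIZE];
    int transferred, rc;

    for (;;) {
        /* Blocks until the whole buffer has been handed off (or the
         * call times out).  No IN request is ever pending here. */
        rc = libusb_bulk_transfer(dev, EP_OUT, buf, sizeof(buf),
                                  &transferred, 1000 /* ms */);
        if (rc != 0)
            return rc;
    }
}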