dfu-util fails to upload firmware to STM32L43x devices
Multiplatform USB DFU host utility
Brought to you by:
tormod
Opening DFU capable USB device... ID 0483:df11 Run-time device DFU version 011a Claiming USB DFU Interface... Setting Alternate Setting #0 ... Determining device status: state = dfuERROR, status = 10 dfuERROR, clearing status Determining device status: state = dfuIDLE, status = 0 dfuIDLE, continuing DFU mode device DFU version 011a Device returned transfer size 2048 DfuSe interface name: "Internal Flash " Downloading to address = 0x08000000, size = 133392 Download [== ] 9% 12288 bytesdfu-util: Error during special command "ERASE_PAGE" get_status
This appears to be due to the same issue experienced by dfuse-tool. The kernel is returning EPIPE when the bus stalls.
libusb: debug [libusb_alloc_transfer] transfer 0x560f17d7abc8 libusb: debug [libusb_submit_transfer] transfer 0x560f17d7abc8 libusb: debug [add_to_flying_list] arm timerfd for timeout in 5000ms (first in line) libusb: debug [libusb_handle_events_timeout_completed] doing our own event handling libusb: debug [handle_events] poll() 3 fds with timeout in 60000ms libusb: debug [handle_events] poll() returned 1 ----- libusb: debug [reap_for_handle] urb type=2 status=-32 transferred=0 libusb: debug [handle_control_completion] handling completion status -32 libusb: debug [handle_control_completion] unsupported control request ----- libusb: debug [disarm_timerfd] libusb: debug [usbi_handle_transfer_completion] transfer 0x560f17d7abc8 has callback 0x7f3078d2fe40 libusb: debug [sync_transfer_cb] actual_length=0 libusb: debug [libusb_free_transfer] transfer 0x560f17d7abc8 dfu-util: Error during special command "ERASE_PAGE" get_status
The -32 return value is -EPIPE, which is returned when the bus is stalled. The STM32L43xx system USB DFU bootloader can stall the bus while doing an erase and while writing.
This is what usbmon shows
ffff9d2a0732a600 2731283048 S Co:3:041:0 s 21 01 0002 0000 0800 2048 = 40ea0300 21fa04f3 45ea0301 30bdc4f1 0c04c4f1 200220fa 02f001fa 04f340ea ffff9d2a0732a600 2731287168 C Co:3:041:0 0 2048 > ffff9d2a0732a600 2731287205 S Ci:3:041:0 s a1 03 0000 0000 0006 6 < ffff9d2a0732a600 2731287361 C Ci:3:041:0 0 6 = 00780000 0400 ffff9d2a0732a600 2731407497 S Ci:3:041:0 s a1 03 0000 0000 0006 6 < ffff9d2a0732a600 2731407626 C Ci:3:041:0 0 6 = 00780000 0500 ffff9d2a0732a600 2731527763 S Co:3:041:0 s 21 01 0000 0000 0005 5 = 41001000 08 ffff9d2a0732a600 2731527839 C Co:3:041:0 0 5 > ffff9d2a0732a600 2731527873 S Ci:3:041:0 s a1 03 0000 0000 0006 6 < ffff9d2a0732a600 2731527981 C Ci:3:041:0 0 6 = 000a0000 0400 ffff9d2a0732a600 2731538107 S Ci:3:041:0 s a1 03 0000 0000 0006 6 < ffff9d2a0732a600 2731550004 C Ci:3:041:0 -32 0
Note the last line.
Anonymous
What is the bwPollTimeout reported before this happen? Please run with -v -v or -v -v -v to see. The bwPollTimeout is meant to prevent the client from trying to talk with the device while it is erasing or writing.
Downloading to address = 0x08000000, size = 133380
Download [ ] 0% 0 bytes Poll timeout 10 ms
Poll timeout 10 ms
Download from image offset 00000000 to memory 08000000-080007ff, size 2048
Poll timeout 120 ms
Poll timeout 120 ms
Poll timeout 10 ms
Poll timeout 10 ms
Download from image offset 00000800 to memory 08000800-08000fff, size 2048
Poll timeout 120 ms
Poll timeout 120 ms
Poll timeout 10 ms
Poll timeout 10 ms
There are two bootloader versions in the STM32L43xx chips. This is for the new/updated bootloader with ID 0x10. The bootloader with ID 0xFF has a 20ms poll timeout instead of 120ms. The 0xFF version exhibits the same problem but firmware load will actually succeed when using a smaller (less than 2K) transfer size. I have not managed to get a good load with the new 0x10 bootloader at all with dfu-util, no matter the transfer size.
So the 0xFF version also stalls on the ERASE_PAGE when using 2048 transfer size, but works fine with 1024?
What is the timeout on the ERASE_PAGE when the stall happens?
With the 0xFF version, it appears that it always stalls on Download, not Erase.
With the 0x10 version, it appears that it always stalls on Erase:
Just checked the datasheet for STM32L433xx, the page erase takes maximum 24.47ms and page programming 23.24ms. The poll timeout should probably have been longer than that in both cases. ST makes great hardware, but as soon as a little bit of code (bootloader) is involved...
That explains it. I wonder if adding a timeout command-line option to override the minimum timeouts might address this issue?
I am a bit wary about adding yet another command line option. We already have some device quirks in the code, e.g. for wrong mass erase time. The question is how to identify and filter down the affected devices in this case.
Do you know if ST's own tools recover from a stall and use this as a mechanism to monitor the progress?
I do not.
I decided to take alook at their code and was not well rewarded for doing so. I'm a Linux guy and their demo source code is only available via a Windows installer executable. So I had to install it on a Windows VM.
I have to say that the ST DFU client code is rather hard to follow since it appears to be event driven for use within a Windows GUI. It is not readily apparent to me how the timeout is actually used within the application.
I do not know how much of the USB protocol behavior is hidden by their device driver. I'm still looking at that. I am also not familiar with the Windows USB API at all.
I was able to confirm that their Windows DFU programmer exhibited no problems during the firmware download.
Thanks for checking that out. IIRC they have source code for the top level application, but not for the library doing the dirty work. Would it be possible for you to do a USB snoop on it?
View and moderate all "tickets Discussion" comments posted by this user
Mark all as spam, and block user from posting to "Tickets"
The driver source is there as well. It's in the STTubeDevice directory.
I will look at trying to sniff the USB traffic when I get home this evening. FWIW I'm running a Windows VM under OSX. It is not the easiest USB debugging environment to work in.
The anonymous poster above is me.
I ran through USB capture using wireshark on both Linux and Windows and the OS is behaving differently. On the Linux side, a final Get Status response packet that stops dfu-util looks like this:
Note the URB status.
On Windows, using ST's own tools, the only possible indication of a problem I see is occasional back to back Get Status packets sent about 30ms apart in Wireshark with no intervening response.
I tried to use Wireshark on Windows with LibUSB and dfu-util, but it just slowed everything to a crawl and no packets were captured. I was able to confirm that dfu-util behaves the same on Windows and Linux in this regard.
It would appear from my testing with the Python dfuse-tools that just ignoring EPIPE from a secondary Get Status call is safe. If the first Get Status fails, you know there was a problem with the request. As long as the final result is dfuDownloadIdle, it seems that the write or erase operation succeeded.
Let me know if there is anything else you want me to look at.
Thanks for the thorough investigation! Can you please try the attached patch? I hope the EPIPE is returned to us as LIBUSB_ERROR_PIPE. I am only targeting the page erase here, not the download in the 0xFF chip version.
Just back from vacation and finally had time to try this. The patch does not apply to my cloned git repo with "git am". Am I missing some other commits that are needed for this to apply cleanly?
Ah, I have a patch in my tree pending for ticket #38 that might interfere. Can you please apply that patch first?
View and moderate all "tickets Discussion" comments posted by this user
Mark all as spam, and block user from posting to "Tickets"
I'm having this the same issue right now. I'm using a STM32L432. With Volden's patch I stop seeing the error: bytesdfu-util: Error during special command "ERASE_PAGE" get_status
Instead I occasionally see "dfu-util: ERASE_PAGE not correctly executed" instead after 10+ failed 10ms polls
This also failed. Please see attached debug output (-vvv)
Rob, in your log file it looks like it is erasing the first page then programming it, then erasing the second page and so on. However, since commit 740858264113140aeb8f1504abb59f9140acba9d we are erasing all pages before starting to program. Are you using latest git?
That works. I applied those patches to the 0.9.0 tag. I forgot I had switched to that tag when the patch did not apply cleanly. Once I switched to master and applied the patches, everything works as expected on the 0x10 bootloader.
Rob, that's great. Now for the 0xFF bootloader, is it a released device or some engineering sample? Your log of it shows that it succeds also on the second get_status on download, but I guess a similar workaround would work here too.
Mr Anonymous, can you please attach a debug log? I looked up the STM32L432 and it has the same timings as the STM32L433xx.
SF seems to have lost my last update, so I'll try to repost it.
Yes, the 0xFF bootloader is a released chips. All of the STM32L43x MCUs I received prior to March 2017 had this bootloader, and all STM32L432KC Nucleo boards I have use this bootloader.
OK. What's weird about the 0xFF log above is that it successfully returns from get_status with a poll timeout also after the initial one, before the stall comes on the third. How come it was able to deal with the second one? Maybe it hadn't started the flashing operation yet?
Can you please try this patch?
About your lost comment, I think the web interface checks if you have a copy of your post in your clipboard, and if not it will most likely ignore your submission :-p There is some timeout issue there.
View and moderate all "tickets Discussion" comments posted by this user
Mark all as spam, and block user from posting to "Tickets"
I tried applying the patch but I still see the error. Homemade board but I can flash via SWD and everything seems to be an order.
View and moderate all "tickets Discussion" comments posted by this user
Mark all as spam, and block user from posting to "Tickets"
Last edit: Anonymous 2017-06-13