Menu

#40 dfu-util fails to upload firmware to STM32L43x devices

0.12
closed
nobody
None
2024-04-16
2017-03-25
Rob Riggs
No
Opening DFU capable USB device...
ID 0483:df11
Run-time device DFU version 011a
Claiming USB DFU Interface...
Setting Alternate Setting #0 ...
Determining device status: state = dfuERROR, status = 10
dfuERROR, clearing status
Determining device status: state = dfuIDLE, status = 0
dfuIDLE, continuing
DFU mode device DFU version 011a
Device returned transfer size 2048
DfuSe interface name: "Internal Flash  "
Downloading to address = 0x08000000, size = 133392
Download    [==                       ]   9%        12288 bytesdfu-util: Error during special command "ERASE_PAGE" get_status

This appears to be due to the same issue experienced by dfuse-tool. The kernel is returning EPIPE when the bus stalls.

libusb: debug [libusb_alloc_transfer] transfer 0x560f17d7abc8
libusb: debug [libusb_submit_transfer] transfer 0x560f17d7abc8
libusb: debug [add_to_flying_list] arm timerfd for timeout in 5000ms (first in line)
libusb: debug [libusb_handle_events_timeout_completed] doing our own event handling
libusb: debug [handle_events] poll() 3 fds with timeout in 60000ms
libusb: debug [handle_events] poll() returned 1
-----
libusb: debug [reap_for_handle] urb type=2 status=-32 transferred=0
libusb: debug [handle_control_completion] handling completion status -32
libusb: debug [handle_control_completion] unsupported control request
-----
libusb: debug [disarm_timerfd] 
libusb: debug [usbi_handle_transfer_completion] transfer 0x560f17d7abc8 has callback 0x7f3078d2fe40
libusb: debug [sync_transfer_cb] actual_length=0
libusb: debug [libusb_free_transfer] transfer 0x560f17d7abc8
dfu-util: Error during special command "ERASE_PAGE" get_status

The -32 return value is -EPIPE, which is returned when the bus is stalled. The STM32L43xx system USB DFU bootloader can stall the bus while doing an erase and while writing.

This is what usbmon shows

ffff9d2a0732a600 2731283048 S Co:3:041:0 s 21 01 0002 0000 0800 2048 = 40ea0300 
21fa04f3 45ea0301 30bdc4f1 0c04c4f1 200220fa 02f001fa 04f340ea
ffff9d2a0732a600 2731287168 C Co:3:041:0 0 2048 >
ffff9d2a0732a600 2731287205 S Ci:3:041:0 s a1 03 0000 0000 0006 6 <
ffff9d2a0732a600 2731287361 C Ci:3:041:0 0 6 = 00780000 0400
ffff9d2a0732a600 2731407497 S Ci:3:041:0 s a1 03 0000 0000 0006 6 <
ffff9d2a0732a600 2731407626 C Ci:3:041:0 0 6 = 00780000 0500
ffff9d2a0732a600 2731527763 S Co:3:041:0 s 21 01 0000 0000 0005 5 = 41001000 08
ffff9d2a0732a600 2731527839 C Co:3:041:0 0 5 >
ffff9d2a0732a600 2731527873 S Ci:3:041:0 s a1 03 0000 0000 0006 6 <
ffff9d2a0732a600 2731527981 C Ci:3:041:0 0 6 = 000a0000 0400
ffff9d2a0732a600 2731538107 S Ci:3:041:0 s a1 03 0000 0000 0006 6 <
ffff9d2a0732a600 2731550004 C Ci:3:041:0 -32 0

Note the last line.

Discussion

1 2 3 > >> (Page 1 of 3)
  • Tormod Volden

    Tormod Volden - 2017-03-26
    • Milestone: none --> 1.0
     
  • Tormod Volden

    Tormod Volden - 2017-03-26

    What is the bwPollTimeout reported before this happen? Please run with -v -v or -v -v -v to see. The bwPollTimeout is meant to prevent the client from trying to talk with the device while it is erasing or writing.

     
  • Rob Riggs

    Rob Riggs - 2017-03-26

    Downloading to address = 0x08000000, size = 133380
    Download [ ] 0% 0 bytes Poll timeout 10 ms
    Poll timeout 10 ms
    Download from image offset 00000000 to memory 08000000-080007ff, size 2048
    Poll timeout 120 ms
    Poll timeout 120 ms
    Poll timeout 10 ms
    Poll timeout 10 ms
    Download from image offset 00000800 to memory 08000800-08000fff, size 2048
    Poll timeout 120 ms
    Poll timeout 120 ms
    Poll timeout 10 ms
    Poll timeout 10 ms

    There are two bootloader versions in the STM32L43xx chips. This is for the new/updated bootloader with ID 0x10. The bootloader with ID 0xFF has a 20ms poll timeout instead of 120ms. The 0xFF version exhibits the same problem but firmware load will actually succeed when using a smaller (less than 2K) transfer size. I have not managed to get a good load with the new 0x10 bootloader at all with dfu-util, no matter the transfer size.

     
  • Tormod Volden

    Tormod Volden - 2017-03-26

    So the 0xFF version also stalls on the ERASE_PAGE when using 2048 transfer size, but works fine with 1024?

    What is the timeout on the ERASE_PAGE when the stall happens?

     
  • Rob Riggs

    Rob Riggs - 2017-03-26

    With the 0xFF version, it appears that it always stalls on Download, not Erase.

     Download from image offset 00002000 to memory 08002000-080027ff, size 2048
       Poll timeout 21 ms
       Poll timeout 21 ms
    dfu-util: Error during download get_status
    

    With the 0x10 version, it appears that it always stalls on Erase:

    Erasing page size 2048 at address 0x08001800, page starting at 0x08001800
       Poll timeout 10 ms
    dfu-util: Error during special command "ERASE_PAGE" get_status
    
     
  • Tormod Volden

    Tormod Volden - 2017-03-26

    Just checked the datasheet for STM32L433xx, the page erase takes maximum 24.47ms and page programming 23.24ms. The poll timeout should probably have been longer than that in both cases. ST makes great hardware, but as soon as a little bit of code (bootloader) is involved...

     
  • Rob Riggs

    Rob Riggs - 2017-03-26

    That explains it. I wonder if adding a timeout command-line option to override the minimum timeouts might address this issue?

     
  • Tormod Volden

    Tormod Volden - 2017-03-27

    I am a bit wary about adding yet another command line option. We already have some device quirks in the code, e.g. for wrong mass erase time. The question is how to identify and filter down the affected devices in this case.

    Do you know if ST's own tools recover from a stall and use this as a mechanism to monitor the progress?

     
  • Rob Riggs

    Rob Riggs - 2017-03-28

    I do not.

    I decided to take alook at their code and was not well rewarded for doing so. I'm a Linux guy and their demo source code is only available via a Windows installer executable. So I had to install it on a Windows VM.

    I have to say that the ST DFU client code is rather hard to follow since it appears to be event driven for use within a Windows GUI. It is not readily apparent to me how the timeout is actually used within the application.

    I do not know how much of the USB protocol behavior is hidden by their device driver. I'm still looking at that. I am also not familiar with the Windows USB API at all.

    I was able to confirm that their Windows DFU programmer exhibited no problems during the firmware download.

     
  • Tormod Volden

    Tormod Volden - 2017-03-28

    Thanks for checking that out. IIRC they have source code for the top level application, but not for the library doing the dirty work. Would it be possible for you to do a USB snoop on it?

     
  • Anonymous

    Anonymous - 2017-03-28

    The driver source is there as well. It's in the STTubeDevice directory.

    I will look at trying to sniff the USB traffic when I get home this evening. FWIW I'm running a Windows VM under OSX. It is not the easiest USB debugging environment to work in.

     
  • Rob Riggs

    Rob Riggs - 2017-03-29

    The anonymous poster above is me.

    I ran through USB capture using wireshark on both Linux and Windows and the OS is behaving differently. On the Linux side, a final Get Status response packet that stops dfu-util looks like this:

    Frame 360: 64 bytes on wire (512 bits), 64 bytes captured (512 bits) on interface 0
        Interface id: 0 (usbmon3)
        Encapsulation type: USB packets with Linux header and padding (115)
        Arrival Time: Mar 28, 2017 20:40:12.404542000 CDT
        [Time shift for this packet: 0.000000000 seconds]
        Epoch Time: 1490751612.404542000 seconds
        [Time delta from previous captured frame: 0.011896000 seconds]
        [Time delta from previous displayed frame: 0.011896000 seconds]
        [Time since reference or first frame: 10.003779000 seconds]
        Frame Number: 360
        Frame Length: 64 bytes (512 bits)
        Capture Length: 64 bytes (512 bits)
        [Frame is marked: False]
        [Frame is ignored: False]
        [Protocols in frame: usb:usbdfu]
    USB URB
        [Source: 3.127.0]
        [Destination: host]
        URB id: 0xffff9d29f8de06c0
        URB type: URB_COMPLETE ('C')
        URB transfer type: URB_CONTROL (0x02)
        Endpoint: 0x80, Direction: IN
        Device: 127
        URB bus id: 3
        Device setup request: not relevant ('-')
        Data: present (0)
        URB sec: 1490751612
        URB usec: 404542
        URB status: Broken pipe (-EPIPE) (-32)
        URB length [bytes]: 0
        Data length [bytes]: 0
        [Request in: 359]
        [Time from request: 0.011896000 seconds]
        Unused Setup Header
        Interval: 0
        Start frame: 0
        Copy of Transfer Flags: 0x00000200
        Number of ISO descriptors: 0
    USB Device Firmware Upgrade 
        [Response: Get Status (3)]
        [Interface: 0]
        [Command Frame: 359]
    [Malformed Packet: USB DFU]
        [Expert Info (Error/Malformed): Malformed Packet (Exception occurred)]
            [Malformed Packet (Exception occurred)]
            [Severity level: Error]
            [Group: Malformed]
    

    Note the URB status.

    On Windows, using ST's own tools, the only possible indication of a problem I see is occasional back to back Get Status packets sent about 30ms apart in Wireshark with no intervening response.

    I tried to use Wireshark on Windows with LibUSB and dfu-util, but it just slowed everything to a crawl and no packets were captured. I was able to confirm that dfu-util behaves the same on Windows and Linux in this regard.

    It would appear from my testing with the Python dfuse-tools that just ignoring EPIPE from a secondary Get Status call is safe. If the first Get Status fails, you know there was a problem with the request. As long as the final result is dfuDownloadIdle, it seems that the write or erase operation succeeded.

    Let me know if there is anything else you want me to look at.

     
  • Tormod Volden

    Tormod Volden - 2017-04-18

    Thanks for the thorough investigation! Can you please try the attached patch? I hope the EPIPE is returned to us as LIBUSB_ERROR_PIPE. I am only targeting the page erase here, not the download in the 0xFF chip version.

     
  • Rob Riggs

    Rob Riggs - 2017-05-08

    Just back from vacation and finally had time to try this. The patch does not apply to my cloned git repo with "git am". Am I missing some other commits that are needed for this to apply cleanly?

     
  • Tormod Volden

    Tormod Volden - 2017-05-08

    Ah, I have a patch in my tree pending for ticket #38 that might interfere. Can you please apply that patch first?

     
  • Anonymous

    Anonymous - 2017-05-08

    I'm having this the same issue right now. I'm using a STM32L432. With Volden's patch I stop seeing the error: bytesdfu-util: Error during special command "ERASE_PAGE" get_status

    Instead I occasionally see "dfu-util: ERASE_PAGE not correctly executed" instead after 10+ failed 10ms polls

     
  • Rob Riggs

    Rob Riggs - 2017-05-11

    This also failed. Please see attached debug output (-vvv)

     
  • Tormod Volden

    Tormod Volden - 2017-05-11

    Rob, in your log file it looks like it is erasing the first page then programming it, then erasing the second page and so on. However, since commit 740858264113140aeb8f1504abb59f9140acba9d we are erasing all pages before starting to program. Are you using latest git?

     
  • Rob Riggs

    Rob Riggs - 2017-05-13

    That works. I applied those patches to the 0.9.0 tag. I forgot I had switched to that tag when the patch did not apply cleanly. Once I switched to master and applied the patches, everything works as expected on the 0x10 bootloader.

     
  • Tormod Volden

    Tormod Volden - 2017-05-13

    Rob, that's great. Now for the 0xFF bootloader, is it a released device or some engineering sample? Your log of it shows that it succeds also on the second get_status on download, but I guess a similar workaround would work here too.

    Mr Anonymous, can you please attach a debug log? I looked up the STM32L432 and it has the same timings as the STM32L433xx.

     
  • Rob Riggs

    Rob Riggs - 2017-05-16

    SF seems to have lost my last update, so I'll try to repost it.

    Yes, the 0xFF bootloader is a released chips. All of the STM32L43x MCUs I received prior to March 2017 had this bootloader, and all STM32L432KC Nucleo boards I have use this bootloader.

     
  • Tormod Volden

    Tormod Volden - 2017-05-16

    OK. What's weird about the 0xFF log above is that it successfully returns from get_status with a poll timeout also after the initial one, before the stall comes on the third. How come it was able to deal with the second one? Maybe it hadn't started the flashing operation yet?

    Can you please try this patch?

    About your lost comment, I think the web interface checks if you have a copy of your post in your clipboard, and if not it will most likely ignore your submission :-p There is some timeout issue there.

     
  • Tormod Volden

    Tormod Volden - 2017-06-03
    • status: open --> needsinfo
     
  • Anonymous

    Anonymous - 2017-06-12

    I tried applying the patch but I still see the error. Homemade board but I can flash via SWD and everything seems to be an order.

     
1 2 3 > >> (Page 1 of 3)

Anonymous
Anonymous

Add attachments
Cancel