OpenOCD - Open On-Chip Debugger / Tickets / #203 programming st_nucleo

Sorry about the reference to program in the ticket; I simplified the reproduction to write_image.

Also, here's some log output that immediately followed the output above (I had to reset before trying again, with the same result):

> flash write_image fw.elf 0x100000                          
Flash write discontinued at 0x081020c4, next section at 0x08120000
Target is already running an algorithm
error starting target flash write algorithm
error writing to flash at address 0x08000000 at offset 0x00100000

> reset init
Unable to match requested speed 2000 kHz, using 1800 kHz
Unable to match requested speed 2000 kHz, using 1800 kHz
adapter speed: 1800 kHz
target halted due to debug-request, current mode: Thread 
xPSR: 00000000 pc: 00000000 msp: 00000000
Unable to match requested speed 8000 kHz, using 4000 kHz
Unable to match requested speed 8000 kHz, using 4000 kHz
adapter speed: 4000 kHz
> flash write_image fw.elf 0x100000
Flash write discontinued at 0x081020c4, next section at 0x08120000
timed out while waiting for target halted
target halted due to debug-request, current mode: Handler HardFault
xPSR: 0x00000003 pc: 00000000 msp: 0xffffffe0
error waiting for target flash write algorithm
error writing to flash at address 0x08000000 at offset 0x00100000

>

Cody Schafer - 2018-08-20

Attached is a log of the command output:

openocd -f board/st_nucleo_f7.cfg -c 'init' -c 'reset init' -c 'flash write_image fw.elf 0x100000' -d

stm32f767-write-image-debug-log.txt


Cody Schafer - 2018-08-21

As a side note: it isn't just the algorithm that's failing: the fallback/normal write mechanism fails too.
Here's some log output from running with set WORKAREASIZE 0 to force non-algorithm flash writing, which also fails.

This is kind enough to fail more quickly than the algorithm variant, which waits for a timeout (probably should try to catch hardfaults when executing an algorithm).

st-nucleo-f767zi-non-algo-flashwrite-fail.txt


Andreas Bolsch - 2018-08-22

Hm, just checked with current head on Nucleo-F767ZI via integrated ST-Link:
stm32f2x user_options 0xDFC, boot_add0 0x0080, boot_add1 0x0040,
so in dual-bank mode, after mass erase.

Programming the whole flash (2MBytes) with random data (flash write_bank 0 random.bin) and verify after read back (flash read_bank 0 verify.bin) works flawlessly for me.

Writing to the second bank only works for me, too.

I'd suggest you try again without the 'erase' (do a mass erase instead and an erase check before), and then use flash write_bank with a binary (or ihex, srec) file.

Maybe your elf file has some 'unusual' properties.

BTW: Any sector protection set?

- Cody Schafer - 2018-08-22
  
  I just tried reproducing and got the same failure when using telnet to command openocd. I've attached a telnet session (openocd -f board/st_nucleo_f7.cfg -c 'init' -c 'reset init').
  
  I then tried a fully automated variant immediately afterward with the same board and was not able to reproduce (the flash occurred successfully): openocd -f board/st_nucleo_f7.cfg -c 'init; reset init; stm32f2x mass_erase 0; flash write_bank 0 random_1MB.bin'
  
  (Attached file is the failure via telnet, the content is the telnet session)
  
  No sector protection set (I printed it in the attached log)
  
  write-bank-2-stm32f7x-telnet.txt
  

Info on my elf file:

Program headers (readelf -l fw.elf):

Elf file type is EXEC (Executable file)
Entry point 0x8020239
There are 6 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x010000 0x08000000 0x08000000 0x020c4 0x020c4 R   0x10000
  LOAD           0x020000 0x08020000 0x08020000 0x28ad0 0x28ad0 RWE 0x10000
  LOAD           0x055538 0x20025538 0x08048ad0 0x00e80 0x00e80 RW  0x10000
  LOAD           0x059950 0x08049950 0x08049950 0x00068 0x00068 R   0x10000
  LOAD           0x060000 0x20020000 0x20020000 0x00000 0x05538 RW  0x10000
  NOTE           0x059990 0x08049990 0x08049990 0x00024 0x00024 R   0x4

 Section to Segment mapping:
  Segment Sections...
   00     .bootldr 
   01     .vector_table .text .ARM .init_array .fini_array .rodata 
   02     .data 
   03     .build_info .note.gnu.build-id .build_info_suffix 
   04     .bss 
   05     .note.gnu.build-id

openocd -f board/st_nucleo_f7.cfg -c 'test_image fw.elf 0 elf':

address 0x08000000 length 0x000020c4
address 0x08020000 length 0x00028ad0
address 0x08048ad0 length 0x00000e80
address 0x08049950 length 0x00000068
verified 178812 bytes in 0.000513s (340392.000 KiB/s)
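
For reference, the four regions listed by test_image can be reproduced from the program headers above. This is a minimal Python sketch, assuming the loader keeps only LOAD segments with a nonzero FileSiz and places them at PhysAddr; the tuple layout and the function name are illustrative, not OpenOCD's own code.

```python
# Program headers copied from the readelf output above:
# (type, phys_addr, file_size)
PROGRAM_HEADERS = [
    ("LOAD", 0x08000000, 0x020C4),
    ("LOAD", 0x08020000, 0x28AD0),
    ("LOAD", 0x08048AD0, 0x00E80),
    ("LOAD", 0x08049950, 0x00068),
    ("LOAD", 0x20020000, 0x00000),  # .bss: no data stored in the file
    ("NOTE", 0x08049990, 0x00024),
]


def loadable_regions(headers):
    """Return (address, length) for LOAD segments that carry file data."""
    return [(addr, size) for kind, addr, size in headers
            if kind == "LOAD" and size > 0]


regions = loadable_regions(PROGRAM_HEADERS)
total = sum(length for _, length in regions)

for addr, length in regions:
    print(f"address 0x{addr:08x} length 0x{length:08x}")
print(f"total {total} bytes")
```

The total comes to 178812 bytes, which matches the "verified" figure printed by test_image.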

I've done some further testing with the following 2 commands:

A. a.cfg: write random 1MB to bank 1, random 1MB to bank 2 (openocd -f board/st_nucleo_f7.cfg -f a.cfg -c exit)
B. b.cfg: write fw.elf to bank 1, random 1MB to bank 2 (openocd -f board/st_nucleo_f7.cfg -f b.cfg -c exit)

Here's a sequence of executions, with OK, ERROR 1, and ERROR 2 indicating which operation failed (flashing bank 1 or bank 2):

# plug in nucleo-f767zi's stlink to computer
A OK
A OK
B ERROR 2
B ERROR 1
B ERROR 2
B ERROR 1
B ERROR 2
B ERROR 1
A OK
B ERROR 2
A ERROR 1
A OK
B ERROR 2
A ERROR 1
B ERROR 2
A ERROR 1
A OK
A OK

So:

Programming the elf file triggers this issue.
Programming that elf file appears to break the following 2 programming attempts (at least in the bank 1, bank 2, bank 1, bank 2, ... programming sequence I tested here).

Last edit: Cody Schafer 2018-08-22

a.cfg

b.cfg

Cody Schafer - 2018-08-22

fw.elf test_image with gap(s) annotated:

address 0x08000000 length 0x000020c4
#gap 0x080020c4 length 0x0001df3c
address 0x08020000 length 0x00028ad0
address 0x08048ad0 length 0x00000e80
address 0x08049950 length 0x00000068
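
The gap annotation can be derived mechanically from the test_image figures. This is a small Python sketch of that arithmetic; the list and function names are only for illustration.

```python
# (address, length) pairs as reported by test_image for fw.elf
SECTIONS = [
    (0x08000000, 0x000020C4),
    (0x08020000, 0x00028AD0),
    (0x08048AD0, 0x00000E80),
    (0x08049950, 0x00000068),
]


def find_gaps(sections):
    """Return (start, length) for every hole between consecutive sections."""
    ordered = sorted(sections)
    gaps = []
    for (addr, length), (next_addr, _) in zip(ordered, ordered[1:]):
        end = addr + length
        if next_addr > end:
            gaps.append((end, next_addr - end))
    return gaps


gaps = find_gaps(SECTIONS)
for start, length in gaps:
    print(f"#gap 0x{start:08x} length 0x{length:08x}")
```

Only one gap shows up, between the bootloader section and the section at 0x08020000; the remaining sections are contiguous.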

Andreas Bolsch - 2018-08-22

I've created an elf file with the same section addresses/sizes, filled with garbage; test_image reports the same figures as for your file. No problem whatsoever: programming and verification work fine for me. Checked with ST-Link V2J30M19 and V2J31M21.
Other than the chip rev. (yours is Z, mine is A), I don't see any difference.

So either it's defective hardware, or ... your firmware does weird things like fiddling with watchdogs, clocks, interrupts, sleep mode ...
This might explain your observations above.

Maybe add

reset_config srst_only srst_nogate connect_assert_srst

to your cfg or place an infinite loop at the very beginning of your startup code (but take care not to change the length of the startup code, so that all sections remain at precisely the same offsets).

- Cody Schafer - 2018-08-23
  
  Thank you for trying to reproduce.
  
  It seems very curious that the actions of my firmware (which doesn't write to flash, etc.) would affect the ability of openocd to program the chip's flash, especially given that reset init is being used here to reset & halt the target.
  
  I'll try out adding a loop in startup code so we can see if somehow openocd isn't managing to reset/halt the processor properly.
  
  My firmware does use clocks (it increases clock speed to 216MHz), interrupts (enables a bunch of them, including a few timers), and enables the watchdog (specifically, IWDG).
  
- Cody Schafer - 2018-08-27
  
  I tried the reset_config srst_only srst_nogate connect_assert_srst with fw.elf. No change in behavior (still fails every other time) was noted (test script attached).
  
  I also modified fw.elf to start with an infinite loop. No change in behavior was noted (still fails to flash via algorithm every-other time)
  
  Last edit: Cody Schafer 2018-08-27
  
  pfrr.cfg
  

Andreas Fritiofson - 2018-08-23

On Wed, Aug 22, 2018 at 8:33 PM Ismail Kose ihkose@gmail.com wrote:

I built openocd from 6060545458f6863710d576fc4bd2512d34f88f89 commit-id,
but cant make SWD working. I get invalid command name "swd" error
message when I run "sudo openocd -f max3263x_hdk.cfg" command on my Ubuntu
16.04.

You'll need to "transport select swd" after selecting the interface to get
access to the swd commands.

Also NEVER start openocd as root!

/Andreas

alternate


Cody Schafer - 2018-08-23

I've done some more testing wrt this issue and noticed that the failure is reproduced without even using the second bank. Instead, just programming my fw.elf multiple times causes exactly every-other attempt to program fw.elf to fail. (running openocd -f board/st_nucleo_f7.cfg -f d.cfg -c exit each time)

I've added some debug output to various code around stm32x_block_write to try and figure out what in particular is failing (see attached output & patches).

I've added various mov r5, #0xbb (etc.) instructions to try to track the progression of the algorithm, initially setting r5 to 0xaa. In the failure case, I've never seen anything but 0xaa.

I've observed that the r0 value returned by the algorithm (which should be the flash status register) does not appear to actually be the flash status register. It has bits set that are marked as "reserved" in the stm32f7/6xxx manual, and its value appears to change as I change/add to the algorithm asm. The value looks very much like a pointer to RAM, possibly to the end of the code composing the algorithm.

Edit:
Further examination indicated r0 was exactly source->address: a pointer to the working area where the circular buffer would have been stored, which is preloaded into r0 prior to algorithm execution. This seems to indicate that the algorithm is never getting started at all in the failure case, and that the preloading of r0 with source->address was hiding this (we should probably use an additional register for the return value and preload it with a sentinel to detect the "didn't execute" case).

Last edit: Cody Schafer 2018-08-23

0001-XXX-flash-stm32f2x-tweak-to-u32-writes-and-add-debug.patch

d.cfg

fail.txt

success.txt


Cody Schafer - 2018-08-24

After some more digging I've seen the following:

reading DHCSR before & after resuming the processor indicates that when the failure occurs the processor is in lockup (S_LOCKUP is set)

further examination of CFSR indicates that this is an imprecise usage fault

the T bit in xPSR was 0, which would cause a usage fault (can't disable thumb mode on armv7m)

tweaking run_algorithm (in armv7m.c) to set xPSR so the T bit is set causes flashing multiple times to be reliable (have not yet tested writing to offset parts of the flash).

Not yet clear to me why T is getting cleared in the first place, as resolving that would be ideal.
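
The register checks above come down to a few bit tests. This is a small Python sketch using bit positions from the ARMv7-M architecture (xPSR.T is bit 24, the active exception number is the low 9 bits of xPSR, and DHCSR.S_LOCKUP is bit 19). The helper names are illustrative and not part of OpenOCD.

```python
XPSR_T_BIT = 1 << 24      # EPSR.T: core executes in Thumb state
XPSR_EXC_MASK = 0x1FF     # IPSR: active exception number
DHCSR_S_LOCKUP = 1 << 19  # core is locked up


def thumb_bit_set(xpsr):
    """True if the T bit is set in an xPSR value."""
    return bool(xpsr & XPSR_T_BIT)


def exception_number(xpsr):
    """Active exception number (3 is HardFault)."""
    return xpsr & XPSR_EXC_MASK


def in_lockup(dhcsr):
    """True if DHCSR reports S_LOCKUP."""
    return bool(dhcsr & DHCSR_S_LOCKUP)


# xPSR value taken from the failing log at the top of this ticket
print(thumb_bit_set(0x00000003), exception_number(0x00000003))  # prints: False 3
```

The xPSR of 0x00000003 from the earlier log decodes to T=0 with exception 3 active, which agrees with the "Handler HardFault" message printed next to it.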

Tommy Murphy - 2018-08-24

Any chance the CPU is executing from zeroized or garbage memory on power on reset thus causing the T bit to be cleared at some stage, and then a double fault and lockup occurring?
However even if this was happening I would expect the debug connection and reset init to get it back into a known good state....


Cody Schafer - 2018-08-27

I've used the attached (4-line) patch on openocd master to work around this issue (by setting xPSR.T).

0001-armv7m-always-set-xPSR.T-1-when-starting-an-algorith.patch

- Cody Schafer - 2018-08-28
  
  I've submitted a variation of this for inclusion. http://openocd.zylin.com/#/c/4658/
  

Cody Schafer - 2018-08-27

A few more details (using stlink_usb_v2_read_debug() to get values)

In success case, while running the flash algorithm: stlink_usb_run(), xPSR.T==1 prior to clearing C_HALT and xPSR.T==0 after clearing C_HALT.

In the failing case, xPSR.T==0 also prior to clearing C_HALT (theory: it immediately faults in this case).

Multiple algorithm executions in a single flash write_image erase fw.elf work. Even though xPSR.T==0 on read back in stlink_usb_run(), on second algorithm execution xPSR.T==1 is seen prior to clearing C_HALT.

Multiple algorithm executions across multiple flash write_image erase fw.elf work (no reset between). (see pf.cfg attached)

Multiple algorithm executions with resets between them fail (see pfr.cfg, attached).

Theory: the reset is relevant because of the caching of register values by openocd. It's plausible that the st-link is failing to return the full xPSR in some cases, causing openocd to write back different values into xPSR, clearing the xPSR.T bit.

Last edit: Cody Schafer 2018-08-27

pf.cfg

pfr.cfg

Cody Schafer - 2018-08-29

Ah, found out why xPSR.T was 0:

My fw.elf file's first section (address 0x08000000 length 0x000020c4) is a bootloader, which is generated/included via (approximately) the following steps:

$CC -o boot.elf $BOOT_OBJ
$OBJCOPY -O binary boot.elf boot.bin
$LD -r -b binary boot.bin -o boot.o
$OBJCOPY --rename-section .data=.bootldr,alloc,load,readonly,rom,data boot.o boot-ldr.o

boot-ldr.o is then linked into the image with the following linker script snippet:

SECTIONS {
  .bootldr ORIGIN(FLASH) : {
    KEEP(*(.bootldr))
  }
}

The key part is (for some reason) the section flags set when objcopying: alloc,load,readonly,rom,data. These were added to the firmware somewhat recently.

When loading the fw.elf binary composed with the bootloader image (generated as described above), the bootloader section in fw.elf (.bootldr) appears to be filled with zeros rather than actual data. As a result, when the processor resets, it reads the second element of the interrupt vector (0, in this case), and sets pc=0, xPSR=0. This is why after a reset I would observe failures would begin to happen (as a reset would trigger loading 0 into xPSR).
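
How the reset vector word turns into pc and xPSR.T can be sketched in a few lines of Python. This assumes the usual ARMv7-M reset behaviour (pc takes the vector with bit 0 cleared, and T takes bit 0); the function name is made up for illustration.

```python
def state_after_reset(reset_vector):
    """Return (pc, thumb_bit) as derived from the reset vector word."""
    pc = reset_vector & 0xFFFFFFFE  # bit 0 is not part of the address
    thumb_bit = reset_vector & 1    # bit 0 becomes xPSR.T
    return pc, thumb_bit


# Zero-filled .bootldr section: the case that breaks target algorithms
print([hex(v) for v in state_after_reset(0x00000000)])
# Erased flash reads back as all ones
print([hex(v) for v in state_after_reset(0xFFFFFFFF)])
```

A zeroed vector gives pc=0 with T=0, while an erased vector gives pc=0xfffffffe with T=1.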

Failures only occurred every other time because erasing the flash (which happened without running a target algorithm on the device) causes the interrupt vector to contain 0xffffffff instead, resulting in pc=0xfffffffe, xPSR.T=1 on the next reset. This likely means that a reset between erasing and writing would also have worked around the issue.
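
The every-other-time pattern can be modelled with a toy Python simulation. It assumes each run is "reset init" followed by "flash write_image erase fw.elf" on an unpatched openocd, that the erase always succeeds, and that the write succeeds only when the reset left xPSR.T set. All names here are hypothetical.

```python
ERASED = 0xFFFFFFFF  # reset vector word after a flash erase
ZEROED = 0x00000000  # reset vector word after programming the zero-filled .bootldr


def run_once(reset_vector):
    """Model one 'reset init; flash write_image erase fw.elf' run.

    Returns (succeeded, reset_vector_left_in_flash).
    """
    thumb_bit = reset_vector & 1  # reset copies bit 0 into xPSR.T
    # The erase does not use a target algorithm, so it always takes effect.
    if thumb_bit:
        return True, ZEROED   # algorithm ran; the zeroed image is now in flash
    return False, ERASED      # algorithm faulted; flash is left erased


def simulate(runs, reset_vector=ERASED):
    """Run the model repeatedly, starting from the given flash state."""
    results = []
    for _ in range(runs):
        ok, reset_vector = run_once(reset_vector)
        results.append("OK" if ok else "ERROR")
    return results


print(simulate(6))
```

Starting from erased flash, the model alternates between OK and ERROR, which is the same pattern seen when programming fw.elf repeatedly.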

Removing the section flag specification from objcopy ($OBJCOPY --rename-section .data=.bootldr boot.o boot-ldr.o) results in the values in the .bootldr section being loaded as expected (rather than being set to zero). It's not yet clear to me why the section flags are having this effect. The diff from arm-none-eabi-objdump -h fw.elf is below, and only shows that I've removed the READONLY flag by not passing my explicit flags.

--- without-set-flags 2018-08-29 14:29:13.971991842 -0400
+++ with-set-flags 2018-08-29 14:28:53.065266231 -0400
@@ -4,7 +4,7 @@
 Sections:
 Idx Name          Size      VMA       LMA       File off  Algn
   0 .bootldr      000021b4  08000000  08000000  00010000  2**0
-                  CONTENTS, ALLOC, LOAD, DATA
+                  CONTENTS, ALLOC, LOAD, READONLY, DATA
   1 .vector_table 000001f8  08020000  08020000  00020000  2**2
                   CONTENTS, ALLOC, LOAD, READONLY, DATA
   2 .text         0001d410  080201f8  080201f8  000201f8  2**6

On a related note: I discovered this while loading with gdb's load rather than using openocd's program or flash write_image.

In any case: while it's true that xPSR.T being set to 0 is not entirely related to target algorithms, it's also the case that, given that reset halt can cause xPSR.T to be set to 0, we should explicitly set it when trying to run algorithms.
- Antonio Borneo - 2018-08-30
  
  When loading the fw.elf binary composed with the bootloader image (generated as described above), the bootloader section in fw.elf (.bootldr) appears to be filled with zeros rather than actual data. As a result, when the processor resets, it reads the second element of the interrupt vector (0, in this case), and sets pc=0, xPSR=0. This is why after a reset I would observe failures would begin to happen (as a reset would trigger loading 0 into xPSR).
  
  Failures only occurred every other time because erasing the flash (which happened without running a target algorithm on the device) causes the interrupt vector to contain 0xffffffff instead, resulting in pc=0xfffffffe, xPSR.T=1 on the next reset. This likely means that a reset between erasing and writing would also have worked around the issue.
  
  This makes sense!
  After a "reset halt" the PC is loaded from the reset vector and the thumb mode is set from the LSB of the reset vector.
  In OpenOCD there is nothing that forces thumb mode before executing an algorithm (and every algorithm for ARM in contrib/loaders/ is written in thumb).
  I have not tested your patch, but the functionality seems correct.
  But now that the root cause is clear, I suggest you update both the commit message and the comment in your patch.
  
- Paul Fertser - 2018-08-30
  
  On Wed, Aug 29, 2018 at 06:49:55PM -0000, Cody Schafer wrote:
  
  Ah, found out why xPSR.T was 0:
  ...
  
  So obvious in hindsight, but boy, what a rough trip you had finding
  it! Thank you so much for your persistence and for sharing the
  result.
  
  --
  Be free, use free (http://www.gnu.org/philosophy/free-sw.html) software!
  mailto:fercerpav@gmail.com
  

Cody Schafer - 2018-08-29

For others running into this: turns out the magical flag for objcopy is contents, without which it zeros the section's content.


programming st_nucleo_f7 (stm32f767) bank 2 consistently fails
