The STM32F767ZI (on the Nucleo-F767ZI board) has 2 MiB of flash. When it is configured into dual-bank mode (stm32f2x options_write 0 0xDFC 0x0080 0x0040, presuming all other options are left at their defaults), programming the second bank (bank1_start=0x0800_0000, bank2_start=0x0810_0000) with the command flash write_image erase fw.elf 0x100000 consistently fails with the following output:
openocd -f board/st_nucleo_f7.cfg
> flash write_image fw.elf 0x100000
Flash write discontinued at 0x081020c4, next section at 0x08120000
timed out while waiting for target halted
target halted due to debug-request, current mode: Handler HardFault
xPSR: 0x00000003 pc: 00000000 msp: 0xffffffe0
error waiting for target flash write algorithm
error writing to flash at address 0x08000000 at offset 0x00100000
This is using the embedded ST-LINK included on the nucleo. The st-link firmware version is V2J31M21.
fw.elf has sections starting in bank1 of flash, which is why the offset is only the difference between bank1 and bank2.
The banks referred to here are banks in the stm32f7x sense, and are not OpenOCD flash banks.
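To make the offset arithmetic concrete, here is a small sketch. It assumes the usual dual-bank layout (bank 1 at 0x08000000, bank 2 at 0x08100000) and a first section of length 0x20c4 at the start of bank 1, as reported later in the thread; the helper name is made up for illustration and is not an OpenOCD function.

```python
# Assumed dual-bank layout on the STM32F767ZI (illustrative constants).
BANK1_START = 0x0800_0000
BANK2_START = 0x0810_0000

# Offset passed to write_image: the distance between the two banks.
OFFSET = BANK2_START - BANK1_START


def relocated(addr: int, offset: int) -> int:
    """Hypothetical helper: address a byte linked at addr lands on
    when the image is written with the given offset."""
    return addr + offset


# End of the first section (0x08000000 + 0x20c4) after relocation.
first_section_end = relocated(BANK1_START + 0x20C4, OFFSET)

print(hex(OFFSET))
print(hex(first_section_end))
```

The relocated end of the first section is the same address as in the "Flash write discontinued at" line of the log above.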
Sorry about the reference to program in the ticket; I simplified the reproduction to write_image.
Also, here's some log following immediately after the above output (I had to reset before I tried again, with the same result):
Attached is a log of the command output:
openocd -f board/st_nucleo_f7.cfg -c 'init' -c 'reset init' -c 'flash write_image fw.elf 0x100000' -d
As a side note: it isn't just the algorithm that's failing: the fallback/normal write mechanism fails too.
Here's some log output from running with set WORKAREASIZE 0 to force non-algorithm flash writing, which also fails.
This is kind enough to fail more quickly than the algorithm variant, which waits for a timeout (OpenOCD should probably try to catch hard faults when executing an algorithm).
Hm, just checked with current head on Nucleo-F767ZI via integrated ST-Link:
stm32f2x user_options 0xDFC, boot_add0 0x0080, boot_add1 0x0040,
so in dual-bank mode, after mass erase.
Programming the whole flash (2MBytes) with random data (flash write_bank 0 random.bin) and verify after read back (flash read_bank 0 verify.bin) works flawlessly for me.
Doing the same for the second bank only works for me, too.
I'd suggest you try again without the 'erase' (do a mass erase instead and an erase check before), and then use flash write_bank with a binary (or ihex, srec) file.
Maybe your elf file has some 'unusual' properties.
BTW: Any sector protection set?
I just tried reproducing and got the same failure when using telnet to command openocd. I've attached a telnet session (openocd -f board/st_nucleo_f7.cfg -c 'init' -c 'reset init').
I then tried a fully automated variant immediately afterward with the same board and was not able to reproduce (flashing occurred successfully):
openocd -f board/st_nucleo_f7.cfg -c 'init; reset init; stm32f2x mass_erase 0; flash write_bank 0 random_1MB.bin
(Attached file is the failure via telnet, the content is the telnet session)
No sector protection set (I printed it in the attached log)
Info on my elf file:
program headers
readelf -l fw.elf:
openocd -f board/st_nucleo_f7.cfg -c 'test_image fw.elf 0 elf':
I've done some further testing with the following 2 commands:
A. a.cfg: write random 1MB to bank 1, random 1MB to bank 2 (openocd -f board/st_nucleo_f7.cfg -f a.cfg -c exit)
B. b.cfg: write fw.elf to bank 1, random 1MB to bank 2 (openocd -f board/st_nucleo_f7.cfg -f b.cfg -c exit)
Here's a sequence of executions with OK, ERROR 1, and ERROR 2 indicating the operation which failed (flashing bank 1 or 2)
So:
bank1, bank2, bank1, bank2, ... programming sequence I tested here.
Last edit: Cody Schafer 2018-08-22
fw.elf test_image with gap(s) annotated:
I've created an elf file with the same section addresses/sizes, filled with garbage; test_image reports the same figures as for your file. No problem whatsoever, programming and verification work OK for me. Checked with ST-Link V2J30M19 and V2J31M21.
Other than the chip rev. (yours is Z, mine is A), I don't see any difference.
So either it's defective hardware, or ... your firmware does weird things like fiddling with watchdogs, clocks, interrupts, sleep mode ...
This might explain your observations above.
Maybe add
reset_config srst_only srst_nogate connect_assert_srst
to your cfg or place an infinite loop at the very beginning of your startup code (but take care not to change the length of the startup code, so that all sections remain at precisely the same offsets).
Thank you for trying to reproduce.
It seems very curious that the actions of my firmware (which doesn't write to flash, etc.) would affect the ability of openocd to program the chip's flash, especially given that reset init is being used here to reset & halt the target.
I'll try out adding a loop in startup code so we can see if somehow openocd isn't managing to reset/halt the processor properly.
My firmware does use clocks (it increases clock speed to 216MHz), interrupts (enables a bunch of them, including a few timers), and enables the watchdog (specifically, IWDG).
I tried the reset_config srst_only srst_nogate connect_assert_srst with fw.elf. No change in behavior (still fails every other time) was noted (test script attached).
I also modified fw.elf to start with an infinite loop. No change in behavior was noted (still fails to flash via algorithm every other time).
Last edit: Cody Schafer 2018-08-27
On Wed, Aug 22, 2018 at 8:33 PM Ismail Kose ihkose@gmail.com wrote:
Also NEVER start openocd as root!
/Andreas
I've done some more testing wrt this issue and noticed that the failure is reproduced without even using the second bank. Instead, just programming my fw.elf multiple times causes exactly every other attempt to program fw.elf to fail (running openocd -f board/st_nucleo_f7.cfg -f d.cfg -c exit each time).
I've added some debug output to various code around stm32x_block_write to try and figure out what in particular is failing (see attached output & patches).
I've added various mov r5, #0xbb (etc) instructions to try to track the progression of the algorithm, initially setting r5 to 0xaa. In the failure case, I've never seen anything but 0xaa.
I've observed that the r0 value returned by the algorithm (which should be the flash status register) does not appear to actually be the flash status register. It has bits set that are marked as "reserved" in the stm32f7/6xxx manual and its value appears to change as I change/add to the algorithm asm. The value looks very much like a pointer to RAM, possibly to the end of the code composing the algorithm.
Edit:
Further examination indicated r0 was exactly source->address: a pointer to the working area where the circular buffer would have been stored, which is preloaded into r0 prior to algorithm execution. This seems to indicate that the issue is that the algorithm is never getting started at all in the failure case, and the preloading of r0 with source->address was hiding this (we should probably use an additional register for the return value and preload it with a sentinel to detect the "didn't execute" case).
Last edit: Cody Schafer 2018-08-23
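The sentinel idea can be illustrated with a toy model. This is only a sketch of the reasoning, not OpenOCD code: the register names, the sentinel value and both functions are hypothetical, and a real sentinel would have to be a value the flash status register can never hold.

```python
SENTINEL = 0xDEADBEEF  # hypothetical "never written" marker


def run_algorithm(executes: bool, buffer_addr: int, flash_status: int) -> dict:
    """Toy model of one algorithm run.

    Returns the registers as the host would read them back afterwards.
    r0 is preloaded with the buffer address (as in the post above),
    r1 is a separate, hypothetical return register preloaded with SENTINEL.
    """
    regs = {"r0": buffer_addr, "r1": SENTINEL}
    if executes:
        # A real algorithm would store the flash status register here.
        regs["r1"] = flash_status
    return regs


def did_execute(regs: dict) -> bool:
    """True if the return register was overwritten by the algorithm."""
    return regs["r1"] != SENTINEL


ok = run_algorithm(True, 0x20000100, 0x0)
dead = run_algorithm(False, 0x20000100, 0x0)
print(did_execute(ok), did_execute(dead))
```

In the failing case r0 still holds the preloaded buffer address, which is what made the original symptom look like a bogus status value; the separate register makes "never ran" distinguishable from "ran and reported an error".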
After some more digging I've seen the following:
It's not yet clear to me why T is getting cleared in the first place; resolving that would be ideal.
Any chance the CPU is executing from zeroized or garbage memory on power-on reset, thus causing the T bit to be cleared at some stage, and then a double fault and lockup occurring?
However even if this was happening I would expect the debug connection and reset init to get it back into a known good state....
I've used the attached (4-line) patch on openocd master to work around this issue (by setting xPSR.T).
I've submitted a variation of this for inclusion. http://openocd.zylin.com/#/c/4658/
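For readers unfamiliar with the bit in question: on Cortex-M the Thumb state bit is bit 24 of xPSR. The following is a minimal sketch of the bit manipulation only; it is not the patch itself, and the function names are made up for illustration.

```python
XPSR_T = 1 << 24  # Thumb state bit in the Cortex-M xPSR


def is_thumb(xpsr: int) -> bool:
    """True if the T bit is set in the given xPSR value."""
    return bool(xpsr & XPSR_T)


def force_thumb(xpsr: int) -> int:
    """Return xpsr with the T bit set, leaving all other bits alone."""
    return xpsr | XPSR_T


# xPSR value from the failure log at the top of the ticket.
xpsr_from_log = 0x00000003
print(is_thumb(xpsr_from_log))
print(hex(force_thumb(xpsr_from_log)))
```

The xPSR value in the original failure log (0x00000003) has the T bit clear, which is consistent with the core faulting as soon as it is resumed.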
A few more details (using stlink_usb_v2_read_debug() to get values)
stlink_usb_run(), xPSR.T==1 prior to clearing C_HALT and xPSR.T==0 after clearing C_HALT.
C_HALT (theory: it immediately faults in this case).
flash write_image erase fw.elf work. Even though xPSR.T==0 on read back in stlink_usb_run(), on second algorithm execution xPSR.T==1 is seen prior to clearing C_HALT.
flash write_image erase fw.elf work (no reset between). (see pf.cfg attached)
pfr.cfg, attached).
Theory: the reset is relevant because of the caching of register values by openocd. It's plausible that the st-link is failing to return the full xPSR in some cases, causing openocd to write back different values into xPSR, clearing the xPSR.T bit.
Last edit: Cody Schafer 2018-08-27
Ah, found out why xPSR.T was 0:
My fw.elf file's first section (address 0x08000000 length 0x000020c4) is a bootloader, which is generated/included via (approximately) the following steps:
boot-ldr.o is then linked into the image with the following linker script snippet:
The key part is (for some reason) the section flags set when objcopying: alloc,load,readonly,rom,data. These were added to the firmware somewhat recently.
When loading the fw.elf binary composed with the bootloader image (generated as described above), the bootloader section in fw.elf (.bootldr) appears to be filled with zeros rather than actual data. As a result, when the processor resets, it reads the second element of the interrupt vector table (0, in this case), and sets pc=0, xPSR=0. This is why I would observe failures beginning after a reset (as a reset would trigger loading 0 into xPSR).
Failures only occurred every other time because erasing the flash (which happened without running a target algorithm on the device) causes the interrupt vector table to contain (instead) 0xffffffff, resulting in pc=0xfffffffe, xPSR.T=1 on the next reset. This likely means that a reset between erase and writing would have also worked around the issue.
Removing the section flag specification from objcopy ($OBJCOPY --rename-section .data=.bootldr boot.o boot-ldr.o) results in the values in the .bootldr section being loaded as expected (rather than being set to zero). It's not yet clear to me why the section flags are having this effect. The diff from arm-none-eabi-objdump -h fw.elf is below, and only shows that I've removed the READONLY flag by not passing my explicit flags.
On a related note: I discovered this while loading with gdb's load rather than using openocd's program or flash write_image.
In any case: while it's true that xPSR.T being set to 0 is not entirely related to target algorithms, it's also the case that, given that reset halt can cause xPSR.T to be set to 0, we should explicitly set it when trying to run algorithms.
This makes sense!
After a "reset halt" the PC is loaded from the reset vector and the thumb mode is set from the LSB of the reset vector.
In OpenOCD there is nothing that forces thumb mode before executing an algorithm (and every algorithm for ARM in contrib/loaders/ is written in thumb).
I have not tested your patch, but the functionality seems correct.
But now that you have cleared up the root cause, I suggest you update both the commit message and the comment in your patch.
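The reset behaviour described here can be sketched in a few lines. This is a simplified model for illustration only (the function name is made up, and the "healthy" vector value is a hypothetical example); it just applies the rule that the PC takes the reset vector with bit 0 cleared and the Thumb bit takes bit 0.

```python
from typing import Tuple


def state_after_reset(reset_vector: int) -> Tuple[int, int]:
    """Return (pc, thumb_bit) as derived from the reset vector word."""
    pc = reset_vector & 0xFFFFFFFE   # bit 0 is not part of the address
    thumb_bit = reset_vector & 1     # bit 0 selects Thumb state
    return pc, thumb_bit


zeroed = state_after_reset(0x00000000)   # section loaded as zeros
erased = state_after_reset(0xFFFFFFFF)   # freshly erased flash
healthy = state_after_reset(0x080001C1)  # hypothetical valid Thumb entry point

print([(hex(pc), t) for pc, t in (zeroed, erased, healthy)])
```

The zeroed and erased cases reproduce the two states reported in the thread: pc=0 with T clear, and pc=0xfffffffe with T set. That is why an erase followed by a reset made the next algorithm run succeed.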
On Wed, Aug 29, 2018 at 06:49:55PM -0000, Cody Schafer wrote:
So obvious in hindsight, but boy what a rough trip you had finding
it! Thank you so much for your persistence and for sharing the
result.
--
Be free, use free (http://www.gnu.org/philosophy/free-sw.html) software!
mailto:fercerpav@gmail.com
For others running into this: it turns out the magical flag for objcopy is contents, without which it zeros the section's content.