Hi all,
I'm developing a device called ColorHug+ (the successor to the ColorHug and ColorHug2 OpenHardware projects). Because this is open hardware, people are actually encouraged to build and flash new firmware to the device. For the previous designs I've used a home-made custom HID bootloader, but for various reasons I now want to use the DFU mechanism instead. Another design choice was that there are no external buttons on the device, so if you flash a bad firmware you're basically hosed unless you have a screwdriver and a hardware programmer handy.
To fix this, there are actually two firmwares on the device: a bootloader (non-user-updatable) and a firmware image that the user can update. What I'm doing is making the bootloader emulate a DFU programmer, and the firmware emulate the DFU runtime. This works really well and allows the user to download and upload from the device using dfu-util.
The extra step I have in the firmware is to set the "autoboot" flag when the user does a GetStatus when in appIDLE mode. This causes the bootloader to set a flag in the EEPROM and then skip straight from bootloader mode into the firmware image on re-plug.
The logic for doing this is that if the new firmware can do GetStatus (i.e. get the USB stack working, and push a few packets back on the interface) then it's highly likely it will be able to do Detach() and then do the right thing on USB bus reset, so you can update in the future.
Although this is not fool-proof, it does mean that the user that flashes a 0x8000 byte /dev/random doesn't end up with a brick. In this case the bootloader dutifully loads /dev/random into the firmware space, the host does a bus reset and the bootloader execs the firmware. The host then waits for the USB device to reappear in runtime mode and attempts to just get the status. If the firmware crashes or hangs either the watchdog timer catches us or the user unplugs and replugs. Because GetStatus() was never called we never set the "the firmware is sane enough to auto-boot it in the future" flag and we remain in DFU bootloader mode waiting for a new file.
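To make the flow above concrete, here is a minimal sketch in C with the EEPROM simulated by a single byte. All names are hypothetical and not taken from the ColorHug+ sources. It also assumes the bootloader clears the flag when a new download starts, which the description implies but does not state outright.

```c
#include <stdint.h>

/* Hypothetical names; the EEPROM is simulated by one static byte. */
enum { FLAG_AUTOBOOT = 0x01 };
enum boot_target { BOOT_STAY_IN_DFU, BOOT_RUN_FIRMWARE };

static uint8_t eeprom_flags; /* persists across "re-plugs" in this simulation */

/* Bootloader, on power-on or watchdog reset: only auto-boot an image
 * that has previously proven it can answer GetStatus. */
enum boot_target bootloader_on_power_on(void)
{
    return (eeprom_flags & FLAG_AUTOBOOT) ? BOOT_RUN_FIRMWARE
                                          : BOOT_STAY_IN_DFU;
}

/* Bootloader, when a download begins (assumption: the old image's
 * "sane" status must not carry over to the new one). */
void bootloader_on_download_start(void)
{
    eeprom_flags &= (uint8_t)~FLAG_AUTOBOOT;
}

/* Bootloader, after the download and the host's bus reset: give the new
 * image exactly one chance to run, without marking it as trusted. */
enum boot_target bootloader_on_download_done(void)
{
    return BOOT_RUN_FIRMWARE;
}

/* Firmware, when it manages to answer GetStatus in appIDLE: the USB
 * stack works, so mark the image as safe to auto-boot in future. */
void firmware_on_getstatus_in_app_idle(void)
{
    eeprom_flags |= FLAG_AUTOBOOT;
}
```

With this arrangement a garbage image gets one boot attempt; if it never answers GetStatus, the next power-on lands back in the bootloader.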
Of course, another way of doing it would be to set the auto-boot flag only after we've done DFU_DETACH() and the timeout (I don't think you can do DFU_ABORT in runtime mode) returns us back into appIDLE mode, but that's slightly more complicated and also means we don't test "sending back" data using the USB stack.
So, what I'm asking (and well done if you're still reading this mega-email) is that dfu-util does everything it normally does with -D -R and then, once the download has completed, does:
For anything that implements DFU this extra step is completely harmless. If you're open to this idea I'm happy to submit a patch.
Thanks,
Richard.
Hi Richard,
Thanks for the suggestion and the long :) explanation.
Hmm, that is how most DFU capable devices do it, isn't it? Or am I already missing something here? A "DFU programmer" is not the device in DFU mode? I will reread it again next week, but in the meantime this is also unclear to me:
In appIDLE mode the "firmware" is running, so how does this cause the bootloader to set a flag? Or is your "firmware" (I guess we can call this the application) that sets this flag, and the bootloader only interprets it?
Would it be an alternative to have two flags in EEPROM? One is a "just programmed" flag, written by the bootloader when programming has finished (before it may decide to reboot the device), which it checks and then clears every time it boots. If the check is positive, it can jump straight to the application, sparing the user from, say, a one-minute timeout after which it would retry the application anyway as a failsafe measure. The other is an "app is fine" flag, which the application sets after being enumerated or after some other qualifying activity. This flag is cleared by the bootloader when it starts to program. On boot the bootloader jumps straight to the application if this flag is set. If I got this right, there would not be any need for special features in dfu-util here, because "app is fine" can be set by something else already used or available.
I understand your suggestion uses a GetStatus in appIDLE for this, which is close to what the firmware needs to handle in order to switch to DFU mode again. However, normal devices are not expected to handle a GetStatus request in runtime mode. We might upset many devices with this "innocent" request, even if the USB protocol dictates they should reject it gracefully. Many devices supported by dfu-util have tiny firmwares and bootloaders with minimal USB stacks, and not all are very robust.
Regards,
Tormod
Right, my DFU bootloader is in "DFU mode". I've got a little state diagram here https://github.com/hughski/colorhug-plus-firmware/blob/master/bootloader.c#L88 if that helps. We can't do a timeout as the bootloader hands over control to the firmware completely and does not keep running in the background; there's no RTC on most PIC models for this purpose either.
I also think it's fine to call GetStatus in appIDLE mode; "Figure A.1 Interface state transition diagram" seems to suggest that DFU_GETSTATUS can be requested in state 0 and 1, although I do accept some firmware might not implement the DFU runtime mode completely.
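For reference, a DFU_GETSTATUS reply is six bytes (bStatus, a 24-bit little-endian bwPollTimeout, bState, iString), and states 0 and 1 are appIDLE and appDETACH. A minimal sketch of decoding such a reply on the host side might look like the following; the struct and function names are made up for illustration and are not dfu-util's.

```c
#include <stddef.h>
#include <stdint.h>

/* Values from the DFU specification. */
enum {
    DFU_REQ_GETSTATUS    = 3, /* bRequest for DFU_GETSTATUS */
    DFU_STATE_APP_IDLE   = 0,
    DFU_STATE_APP_DETACH = 1
};

/* Hypothetical container for the decoded 6-byte reply. */
struct dfu_status {
    uint8_t  status;          /* bStatus */
    uint32_t poll_timeout_ms; /* bwPollTimeout, 24 bits on the wire */
    uint8_t  state;           /* bState */
    uint8_t  string_index;    /* iString */
};

/* Decode a raw GetStatus reply; returns 0 on success, -1 if too short. */
int dfu_parse_status(const uint8_t *buf, size_t len, struct dfu_status *out)
{
    if (buf == NULL || out == NULL || len < 6)
        return -1;
    out->status = buf[0];
    out->poll_timeout_ms = (uint32_t)buf[1]
                         | ((uint32_t)buf[2] << 8)
                         | ((uint32_t)buf[3] << 16);
    out->state = buf[4];
    out->string_index = buf[5];
    return 0;
}

/* Non-zero if the device reports one of the two runtime (app*) states. */
int dfu_is_runtime_state(const struct dfu_status *st)
{
    return st->state == DFU_STATE_APP_IDLE ||
           st->state == DFU_STATE_APP_DETACH;
}
```

A device that lacks a DFU runtime would more likely stall the request than return these six bytes, so a caller would still have to treat a failed transfer as "unknown" rather than "broken".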
I was thinking of a timeout before the bootloader jumps to the application. It would be skipped if "app is fine" or "just programmed" is set, so it would only mean staying in the bootloader on subsequent reboots when no app has declared itself "fine". The point of the timeout is to give the app a (second?) chance, or to cover the case where both flags are clear for some reason.
You're right about the A.1 diagram, it should be supported in theory. In practice, some devices don't even have a DFU runtime, but a DFU-less runtime and they enter DFU mode via button or other magic.
Yes, I agree. I think that setting the "firmware is okay" flag can be done on completion of a self test in the firmware, or in the case of the ColorHug+ by the sending of a color sample. The reason I chose GetStatus() was that it is on the same interface as DFU_DETACH, giving me some confidence the firmware can do enough to get back into the bootloader for a reflash.
Would you be okay if I made the "wait for appIDLE and GetStatus" step optional, behind a --wait-for-appidle argument? Or perhaps just do it unconditionally but ignore all errors, which would solve the case of holding a specific button to get to DFU mode?
Would you still need any "wait for appIDLE and GetStatus" feature in dfu-util if you use a "firmware is okay" and "just programmed" flag? The bootloader clears the former and sets the latter when programming, the new firmware sets the former on a successful run. If any of them is set, the bootloader jumps immediately to the firmware on power-on (just clearing "just programmed" first). If both are clear, the bootloader lingers around for, say, 30 seconds, giving the user a chance to program a new one, but if no download is initiated in that time, jumps to the existing firmware to give it another chance. The timeout is optional, it can alternatively just stay in the bootloader.
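As a rough sketch of the two-flag scheme described above, again in C with the non-volatile storage simulated by one byte (all names are hypothetical, and the actual lingering and jump are left out):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag layout; non-volatile storage is simulated. */
enum { FLAG_FW_OK = 0x01, FLAG_JUST_PROGRAMMED = 0x02 };
enum boot_action { BOOT_JUMP_NOW, BOOT_LINGER_THEN_JUMP };

static uint8_t nv_flags;

/* Bootloader, when programming: clear "firmware is okay",
 * set "just programmed". */
void bl_on_programming(void)
{
    nv_flags = (uint8_t)((nv_flags & ~FLAG_FW_OK) | FLAG_JUST_PROGRAMMED);
}

/* Bootloader, on power-on: jump immediately if either flag is set,
 * otherwise linger (for, say, 30 seconds) before trying the firmware.
 * "Just programmed" is one-shot, so it is cleared after being checked. */
enum boot_action bl_on_power_on(void)
{
    bool any_set = (nv_flags & (FLAG_FW_OK | FLAG_JUST_PROGRAMMED)) != 0;
    nv_flags &= (uint8_t)~FLAG_JUST_PROGRAMMED;
    return any_set ? BOOT_JUMP_NOW : BOOT_LINGER_THEN_JUMP;
}

/* Firmware, after whatever it counts as a successful run. */
void fw_on_successful_run(void)
{
    nv_flags |= FLAG_FW_OK;
}
```

In this sketch a freshly flashed image that never reports success gets one immediate boot, after which every later power-on goes through the lingering path.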
But how do we know if it's successful?
I was thinking the firmware itself would be capable of figuring that out, for instance by running a checksum over itself.
Right; my "successful" means "actually works on the device" rather than "matches checksum", so we actually have to start up and use a significant part of the new firmware (this stops a 32kb NUL-filled image from bricking the device). For me, I chose "can start up a USB stack and communicate with the host" as a sanity check that the firmware works. For communicating with the host, dfu-util is only really able to use the runtime DFU interface (rather than some device/vendor-specific protocol) during the upload process.
I assume the firmware has already been tested and proven to be working correctly, i.e. by the developer. So it is only necessary to verify that it has been loaded correctly and is able to execute. The firmware checksumming itself is a bit more of a sanity check than the bootloader doing the same, since somebody could have slammed a correct checksum onto a 0x8000 byte /dev/random dump.
Do you also want to catch the case where somebody has prepared a firmware that cannot communicate over USB but otherwise executes? Should it be part of some development kit where users cook their own, possibly broken, firmware?
Is it useful for dfu-util to do this check for a running, working firmware? Can't the user/script/whatever calls dfu-util simply do something like "lsusb" afterwards to see that the device is there in its runtime mode?
Well, in this case "the developer" is possibly the end user, as this is OpenHardware with firmware that the user is supposed to hack (but without requiring them to take it to bits, and have an external programmer handy).
Having dfu-util call GetStatus on upload seemed like a nice compromise between user-safety and not requiring the user to run some custom binary to mark the firmware as "good". If you're not interested that's okay, and I'll just get users to call a binary after flashing, although this makes updating more error prone.
Ah, I see. Things can fail in all unimaginable ways then :) It is hard to make this absolutely foolproof without any hardware button. I would recommend having a jumper (or just solder points that need to be short-circuited) inside. Having to pull out a screwdriver and open the case in the hopefully rare case of SW "bricking" should be acceptable.
Yes, since this is a pretty special case, non-standard and not common, I am reluctant to support it in dfu-util.