From: Steve S. <sa...@gm...> - 2008-01-22 20:35:02
Several folks have written to me asking about this issue, so I thought it would be good to give a quick status update via the mailing list.

The issue:

During boot, large numbers of CRC error messages are sent to the console. They are typically of the form:

JFFS2 notice: (213) jffs2_get_inode_nodes: Node header CRC failed at 0x9199dc. {0080,0080,00800080,00800080}

What we know:

Errors have only been seen on verdex XL6P motherboards
The issue occurs with both buildroot and OpenEmbedded images
Reflashing a new image does not cure the problem
There is no difference in board revision between working and non-working units
There are no part mis-loads on the failing boards
Extensive R/W testing of flash with u-boot shows *no* errors
Downgrading the CPU clock to 400 MHz does not cure the problem
Downgrading RAM and flash access timings to the most conservative values does not cure the problem
The data pattern in the CRC error message is always the same
The data pattern seems to suggest that the flash part is stuck in Read Status Register mode
In Read Status Register mode this bit pattern indicates: "Device is ready"
The Linux code handling this functionality is: drivers/mtd/chips/cfi_cmdset_0001.c

What we don't know:

Why only some motherboards end up in this state
How to safely modify the Linux code to either avoid this condition or, alternatively, to detect it and safely send the flash part back to Read Array mode

I will continue working on a solution to this problem. If you are an expert on the mtd code and want to help, please drop me an email!

Steve
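For readers who want to check the "stuck in Read Status Register mode" reading of the data pattern themselves, here is a minimal sketch in C. It assumes the Intel/StrataFlash status register layout (bit 7 = ready, lower bits = suspend, error and lock flags) and assumes the four values in braces are the JFFS2 node header fields magic, nodetype, totlen and hdr_crc as read through a 16-bit wide flash. The macro and function names are invented for illustration and do not come from the kernel.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Status register bits for the Intel command set (assumed from the
 * StrataFlash datasheets; the names are illustrative only). */
#define SR_READY          0x80u /* SR.7: write state machine ready */
#define SR_ERASE_SUSPEND  0x40u /* SR.6 */
#define SR_ERASE_ERROR    0x20u /* SR.5 */
#define SR_PROGRAM_ERROR  0x10u /* SR.4 */
#define SR_VPP_LOW        0x08u /* SR.3 */
#define SR_PROG_SUSPEND   0x04u /* SR.2 */
#define SR_BLOCK_LOCKED   0x02u /* SR.1 */

/* Magic value a valid JFFS2 node header starts with. */
#define JFFS2_MAGIC       0x1985u

/* True if the word means "ready, no suspend, error or lock flags set". */
bool sr_is_ready_clean(uint16_t word)
{
    return word == SR_READY;
}

/* True if the words look like status register data rather than array
 * data: every word is identical, the upper byte is clear and SR.7 is
 * set. A chip in Read Status mode returns its status at any address. */
bool looks_like_status_readback(const uint16_t *words, size_t n)
{
    if (n == 0)
        return false;
    for (size_t i = 0; i < n; i++) {
        if (words[i] != words[0])
            return false;
    }
    return (words[0] & 0xff00u) == 0 && (words[0] & SR_READY) != 0;
}
```

Split into 16-bit words, the header in the error message is six copies of 0x0080. That passes both checks above and does not match the 0x1985 magic, which is consistent with the chip returning status data where JFFS2 expected a node header.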
From: Felix D. <fe...@gm...> - 2008-01-24 05:13:33
So does this mean that you think it's definitely a software issue? Why then would the same images that used to work not work anymore on new hardware?

Felix

On Jan 23, 2008 6:35 AM, Steve Sakoman <sa...@gm...> wrote:
> Extensive R/W testing of flash with u-boot shows *no* errors
From: Steve S. <sa...@gm...> - 2008-01-24 05:56:39
Felix,

> So does this mean that you think it's definitely a software issue?

It smells that way, but I am far from being "definite" in that assessment.

The flash hardware seems to function perfectly when read/written by u-boot. I've done extensive tests writing and reading flash using u-boot (~32MB of data each time, with CRC checks) and have experienced zero failures.

From following the copious debug messages I placed in the Linux mtd code, I've determined that these errors always occur after a flash erase/write operation (but only occasionally) and that the flash memory stays in Read Status Register mode. When it receives a Read Array command the part functions again as expected until another erase/write occurs that happens to trigger this condition.

So it might be that the erase/write timing on this lot of flash components is different enough that it takes an uncommon path through the Linux mtd code, and that path is lacking the command to put the part back into Read Array mode.

It also could be a really subtle hardware issue that is triggered by the higher currents involved in the erase/write operation.

There's really no hard evidence at this point. But the fact that the u-boot code doesn't suffer from this problem definitely points towards software.

I'm certainly not an expert on this section of the Linux code! I've sent a description of the problem to the linux-mtd mailing list in the hope that the mtd team might be able to shed some light on what is going on here. No responses yet.

I also found via Google a two-year-old description of this same issue occurring in a PXA255/StrataFlash-based system. I wrote to the originator of that message to see if he was able to resolve it. No response yet there either.

If anyone else would like to dig into the code, all of the action is occurring in:

drivers/mtd/chips/cfi_cmdset_0001.c

Steve

On Jan 23, 2008 9:13 PM, Felix Duvallet <fe...@gm...> wrote:
> So does this mean that you think it's definitely a software issue? Why then
> would the same images that used to work not work anymore on new hardware?
>
> Felix
>
> On Jan 23, 2008 6:35 AM, Steve Sakoman <sa...@gm...> wrote:
> > Extensive R/W testing of flash with u-boot shows *no* errors
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Microsoft
> Defy all challenges. Microsoft(R) Visual Studio 2008.
> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> _______________________________________________
> gumstix-users mailing list
> gum...@li...
> https://lists.sourceforge.net/lists/listinfo/gumstix-users
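As an illustration of the kind of write/read test with CRC checks mentioned in this message, here is a minimal sketch in C. It is not the procedure that was actually run under u-boot: the "flash" here is an ordinary memory buffer, the function names are hypothetical, and the CRC is the common reflected CRC-32 (polynomial 0xEDB88320), which is an assumption about which CRC was used.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32, reflected polynomial 0xEDB88320, initial value and
 * final XOR of 0xFFFFFFFF (the zlib/Ethernet variant). */
uint32_t crc32_buf(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
    }
    return crc ^ 0xFFFFFFFFu;
}

/* "Write" len bytes of src to the stand-in flash buffer, then read the
 * buffer back and compare CRCs. On real hardware the memcpy would be
 * replaced by the platform's flash programming routine (not shown). */
bool write_and_verify(uint8_t *flash, const uint8_t *src, size_t len)
{
    uint32_t expected = crc32_buf(src, len);

    memcpy(flash, src, len);
    return crc32_buf(flash, len) == expected;
}
```

A test like this only exercises plain write-then-read sequences. It would not necessarily reproduce a failure that depends on a particular path through the Linux driver's suspend and resume handling, which fits the observation that u-boot sees no errors.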
From: Scott P. <spa...@ho...> - 2008-01-24 07:42:42
I give up.

Where PRECISELY does the robostix directory tree go in order to get it to compile?

I have a

gumstix-buildroot/
    /branches
    /trunk

structure pulled from svn. I saw a previous post about moving the robostix tree but that does not seem to work no matter where I put it.

Scott
From: Dave H. <dhy...@gm...> - 2008-01-24 08:34:05
Hi Scott,

> Where PRECISELY does the robostix directory tree go in order to get it to compile?

The robostix directory should be in the same directory as gumstix-buildroot. So if you have ~/gumstix/gumstix-buildroot, then you should have ~/gumstix/robostix

Some portions of the robostix tree (specifically, ~/gumstix/robostix/gumstix) use the gumstix compiler. The rest of the robostix tree uses avr-gcc, which you need to install. Under Ubuntu, you can install avr-gcc by using:

    sudo apt-get install gcc-avr avr-libc

> gumstix-buildroot/
>     /branches
>     /trunk

Hmm. What command did you use to pull this from svn? This looks like an incorrect directory structure for gumstix-buildroot.

The correct command to retrieve gumstix-buildroot is:

    svn co http://svn.gumstix.com/gumstix-buildroot/trunk gumstix-buildroot

ls gumstix-buildroot is expected to look something like this (after doing a build):

    build_arm_nofpu/  config.txt  Makefile                target/                     u-boot.bin*
    config2.txt       dl@         package/                toolchain/                  uImage
    Config.in         docs/       rootfs.arm_nofpu.jffs2  toolchain_build_arm_nofpu/

> I saw a previous post about moving the robostix tree but that does not seem
> to work no matter where I put it.

I think that's because you have the trunk directory in your gumstix-buildroot, which the robostix stuff is not expecting. I think you can fix it by renaming gumstix-buildroot to something else, and then moving trunk up a directory and calling it gumstix-buildroot.

--
Dave Hylands
Vancouver, BC, Canada
http://www.DaveHylands.com/
From: Koen K. <ko...@do...> - 2008-01-24 08:43:04
Steve Sakoman wrote:

| Several folks have written to me asking about this issue, so I thought
| it would be good to give a quick status update via the mailing list.
|
| The issue:
|
| During boot large numbers of CRC error messages are sent to the
| console. They typically are of the form:
|
| JFFS2 notice: (213) jffs2_get_inode_nodes: Node header CRC failed
| at 0x9199dc. {0080,0080,00800080,00800080}
|
| What we know:
|
| Errors have only been seen on verdex XL6P motherboards
| The issue occurs with both buildroot and OpenEmbedded images
| Reflashing a new image does not cure the problem
| There is no difference in board revision between working and non-working units
| There are no part mis-loads on the failing boards
| Extensive R/W testing of flash with u-boot shows *no* errors
| Downgrading the cpu clock to 400Mhz does not cure the problem
| Downgrading ram and flash access timings to the most conservative
| values does not cure the problem
| The data pattern in the CRC error message is always the same
| The data pattern seems to suggest that the flash part is stuck in
| Read Status Register mode
| In Read Status Register mode this bit pattern indicates: "Device is ready"
| The linux code handling this functionality is:
| drivers/mtd/chips/cfi_cmdset_0001.c

Do we know for sure this isn't a problem with the erase-blocksize?

regards,

Koen
From: Steve S. <sa...@gm...> - 2008-01-24 16:05:24
> Do we know for sure this isn't problem with the erase-blocksize?

I'm fairly certain it isn't. It's got to be something much more subtle than that.

Gumstix has shipped a large number of these motherboards with no issues. It's only the recent batch of boards where *some* units exhibit this problem.

My suspicion is that it is timing related. If you read through the code you will see stuff like:

    /* If the flash has finished erasing, then 'erase suspend'
     * appears to make some (28F320) flash devices switch to
     * 'read' mode. Make sure that we switch to 'read status'
     * mode so we get the right data. --rmk
     */
    map_write(map, CMD(0x70), adr);

and

    if (time_after(jiffies, timeo)) {
        /* Urgh. Resume and pretend we weren't here. */
        map_write(map, CMD(0xd0), adr);
        /* Make sure we're in 'read status' mode if it had finished */
        map_write(map, CMD(0x70), adr);

and

    /* What if one interleaved chip has finished and the
       other hasn't? The old code would leave the finished
       one in READY mode. That's bad, and caused -EROFS
       errors to be returned from do_erase_oneblock because
       that's the only bit it checked for at the time.
       As the state machine appears to explicitly allow
       sending the 0x70 (Read Status) command to an erasing
       chip and expecting it to be ignored, that's what we
       do. */
    map_write(map, CMD(0xd0), adr);
    map_write(map, CMD(0x70), adr);

and

    /* We've broken this before. It doesn't hurt to be safe */
    map_write(map, CMD(0x70), adr);

All of these send CMD(0x70), which puts the flash in Read Status Register mode. The chip has to be explicitly put back into Read Array mode (CMD(0xff)), and it is not clear to me that every exit path does this.

There are many timing-related paths through this code. The code is not particularly easy to follow, and I want to understand the issue rather than hack in CMD(0xff) in likely-looking spots.

If anyone wants to help track this down, your help would be much appreciated!

Steve
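One way to reason about whether an exit path leaves the chip in Read Array mode is to replay the command bytes that path writes and track the resulting read mode. The sketch below is a deliberately simplified model, not kernel code. The type and function names are hypothetical, only 0x70 and 0xff change the tracked mode, and the replay starts in status mode on the assumption that an erase or write is in flight. A real part also switches to status output implicitly on program and erase commands, which this model ignores.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* What a read from the chip returns in each mode. */
enum read_mode {
    MODE_ARRAY,   /* reads return stored data */
    MODE_STATUS   /* reads return the status register */
};

#define CMD_READ_ARRAY   0xffu
#define CMD_READ_STATUS  0x70u

/* Update the tracked mode for one command byte. Simplification: any
 * command other than 0x70 or 0xff (for example 0xd0, resume) leaves
 * the tracked mode unchanged. */
enum read_mode track_cmd(enum read_mode mode, uint8_t cmd)
{
    switch (cmd) {
    case CMD_READ_ARRAY:
        return MODE_ARRAY;
    case CMD_READ_STATUS:
        return MODE_STATUS;
    default:
        return mode;
    }
}

/* Replay the commands issued along one code path and report whether
 * the chip ends up in Read Array mode. Starts from MODE_STATUS, since
 * the paths of interest run while an erase or write is in progress. */
bool path_ends_in_array_mode(const uint8_t *cmds, size_t n)
{
    enum read_mode mode = MODE_STATUS;

    for (size_t i = 0; i < n; i++)
        mode = track_cmd(mode, cmds[i]);
    return mode == MODE_ARRAY;
}
```

Replaying the resume-then-status sequence from the timeout excerpt (0xd0, 0x70) ends in status mode under this model. The same sequence followed by 0xff ends in array mode. Whether the real driver issues that final 0xff on every path is exactly the open question in this thread.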