On Wed, Oct 30, 2013 at 1:25 PM, Jason Cipriani <jason.cipriani@gmail.com> wrote:

On Wed, Oct 30, 2013 at 1:22 AM, Jeff DeFouw <jeffd@grtavionics.com> wrote:
On 10/30/2013 12:40 AM, Jason Cipriani wrote:
> *Software*:
>
> Currently, all devices are being booted from an SD card containing a Linaro
> image created by following the instructions at
> http://www.b1gtuna.com/2012/08/installing-linaro-image-on-gumstix-overo/.
> There is our own custom software on there which is essentially just a
> fullscreen video player that starts with the window manager. Everything else
> is fairly stock.

I see the year 2012 in several places.  How old are your Linaro kernels?  It
was discovered late last year that the stock Linaro Overo kernel was missing
CONFIG_ARM_ERRATA_430973=y, which would lead to crashes.  The OMAP35xx
processor in the Water definitely needs that enabled.

The kernel is 3.2.1, dated July 2012:

3.2.1-linaro-omap #3 PREEMPT Thu Jul 26 17:05:26 PDT 2012

I can confirm that that configuration option is not enabled:

root:~# cat /boot/config-3.2.1-linaro-omap | grep ARM_ERRATA
# CONFIG_ARM_ERRATA_430973 is not set
# CONFIG_ARM_ERRATA_458693 is not set
# CONFIG_ARM_ERRATA_460075 is not set
CONFIG_ARM_ERRATA_720789=y
# CONFIG_ARM_ERRATA_743622 is not set
# CONFIG_ARM_ERRATA_751472 is not set
# CONFIG_ARM_ERRATA_754322 is not set

Although, reading the description of that erratum (http://cateee.net/lkddb/web-lkddb/ARM_ERRATA_430973.html), I wonder whether it can really lead to the effect I am seeing. Of course it could cause any number of strange things, but it may still be a red herring: the severity, predictability, and consistency of the issue I'm seeing *seem* mismatched with the effects of that erratum. There is definitely no evidence *against* it, and in any case it does mean that the kernel image I'm using isn't appropriate for the OLD boards, but I still want to be sure of the cause of the memory issue.

If I can't come up with any other good theories I will see if I can reproduce it with a controlled kernel build with and without that workaround enabled (for now I am going to avoid dedicating time to that).


The BAD Gumstix boards, when running an Angstrom kernel:

3.2.28-rt42+ #6 PREEMPT RT Fri Sep 21 12:23:39 EDT 2012

do not exhibit the memory corruption problem. These kernels do have the erratum workaround enabled:

root:~# gunzip -c /proc/config.gz | grep ARM_ERRATA
CONFIG_ARM_ERRATA_430973=y
# CONFIG_ARM_ERRATA_458693 is not set
# CONFIG_ARM_ERRATA_460075 is not set
# CONFIG_ARM_ERRATA_720789 is not set
# CONFIG_ARM_ERRATA_743622 is not set
# CONFIG_ARM_ERRATA_751472 is not set
# CONFIG_ARM_ERRATA_754322 is not set

That does lend support to the missing erratum workaround being the cause...
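
Since the two images expose their kernel config differently (a plain file under /boot on the Linaro image, /proc/config.gz on the Angstrom one), a small helper could make the comparison uniform across boards. This is only a rough sketch: config_state is a hypothetical name I made up, and it assumes gunzip and grep are available on the target, as they were for the commands above.

```shell
#!/bin/sh
# config_state OPTION FILE
# Hypothetical helper (not part of any existing tool): prints the value
# assigned to CONFIG_<OPTION> in FILE ("y", "m", ...), or "unset" if the
# option is commented out ("# CONFIG_... is not set") or missing.
# FILE may be plain text (/boot/config-*) or gzipped (/proc/config.gz).
config_state() {
    opt="$1"
    file="$2"

    # Choose a reader based on the file extension.
    case "$file" in
        *.gz) reader="gunzip -c" ;;
        *)    reader="cat" ;;
    esac

    # Only an uncommented "CONFIG_<OPTION>=" line counts as set.
    line=$($reader "$file" | grep -E "^CONFIG_${opt}=" | head -n 1)

    if [ -n "$line" ]; then
        # Strip everything up to and including the first "=".
        echo "${line#*=}"
    else
        echo "unset"
    fi
}
```

On the Linaro image this would be called as `config_state ARM_ERRATA_430973 /boot/config-3.2.1-linaro-omap`, and on the Angstrom image as `config_state ARM_ERRATA_430973 /proc/config.gz`. It only reports what the config file says, so it cannot tell whether the running kernel was actually built from that file.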

Jason