#933 Error 822 during Cache Edge detection on Raspberry Pi

Stable_(v3.10.x)
closed-fixed
None
5
2014-08-15
2014-01-09
Nikos Kyrtatas
No

Dear ATLAS developers,

First of all, thank you very much for creating and sharing ATLAS. I have already built ATLAS successfully on my Cortex-A8, Cortex-A9 and Intel Atom machines, but I get error 822 during cache edge detection when trying to build ATLAS 3.10.1 on my Raspberry Pi (ARM1176JZF-S processor, running Arch Linux, kernel 3.6.11-17-ARCH+). The installed gcc version is 4.7.2. The flags that I pass to configure are: -D c -DWALL -D c -DATL_ARM_HARDFP=1 -Si archdef 0 --nof77 -m 700. Also, is it normal that the architecture is configured as UNKNOWN32? Any help would be greatly appreciated.

Discussion

    • assigned_to: R. Clint Whaley
     
  • The UNKNOWN is understandable: I don't have access to a Pi, and so all ATLAS is saying is that "this is not one of the handful of ARM machines Clint has access to / user support for".

    The semester is about to begin, and presently all my time is going to getting ready for teaching a new class. Hopefully I'll catch up on the months long support queue once the semester gets going.

    In the meantime, maybe you could try the newest 3.11, on the off chance that it works (ARM support is quite a bit more advanced in the developer series).

    Thanks,
    Clint

     
  • Nikos Kyrtatas
    Nikos Kyrtatas
    2014-01-11

    Hi Clint,

    Thanks a lot for the fast response. I tried 3.11.22 as you suggested and got ERROR 653 DURING CACHESIZE SEARCH, with the same configuration flags as the ones mentioned above. I have attached the relevant error file in case you want to take a look when you have time.

    Thanks again,
    Nikos

     
  • OK, I took a quick look at your error files, and I think I see what is going on.

    In your first 3.10 install, you did indeed die in cacheedge, and the system killed your run. This almost certainly means that you don't have enough memory to run this test, so the system killed the run when memory ran out.

    In the second install, you died with an "illegal instruction" immediately (far before the cacheedge search). The difference in compiler flags is that the 3.11 install includes "-mfpu=vfpv3-d16". Since this results in illegal instructions, I suspect that your Pi does not have a floating point unit (FPU) at all. In that case, I suspect that the HARDFP you're asking for will not work even if you get a library to build, and that ATLAS (or any FPU-centric code) performance will be terrible even once you get an install going, since all floating point will be done in software.

    So, does your Pi have an FPU? If not, why are you using HARDFP (won't that require use of registers you don't have)?

    We can work around the out-of-mem error in cacheedge by overriding atlas's search with a bogus value, but ATLAS is likely to die on other tests that require a lot of memory, which we'll also have to manually override. If you don't have an FPU, then this might not be worth the hassle, since your performance will be so low. ATLAS might provide some speedup over netlib blas due to cache blocking, but software emulation may be so slow that it won't.

    Let me know,
    Clint
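
For anyone wanting to check the out-of-memory theory before starting a long install, here is a minimal sketch, assuming Linux with glibc. It reports total and currently free physical memory via `sysconf`. The function names and the `fits_in_memory` helper are hypothetical and are not part of ATLAS. Note that the free-page count excludes reclaimable cache, so it is a pessimistic estimate.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Total physical RAM in bytes, or -1 if sysconf cannot report it.
 * _SC_PHYS_PAGES is a glibc extension, not strict POSIX. */
long long phys_mem_bytes(void)
{
    long pages = sysconf(_SC_PHYS_PAGES);
    long psize = sysconf(_SC_PAGESIZE);
    if (pages < 0 || psize < 0)
        return -1;
    return (long long)pages * psize;
}

/* Currently free physical RAM in bytes, or -1 on failure.
 * _SC_AVPHYS_PAGES is also a glibc extension. */
long long avail_mem_bytes(void)
{
    long pages = sysconf(_SC_AVPHYS_PAGES);
    long psize = sysconf(_SC_PAGESIZE);
    if (pages < 0 || psize < 0)
        return -1;
    return (long long)pages * psize;
}

/* Hypothetical helper: does a working set of `need` bytes fit in `avail`?
 * A negative `avail` (unknown) is treated as "does not fit". */
int fits_in_memory(long long need, long long avail)
{
    assert(need >= 0);
    return avail >= 0 && need <= avail;
}

#ifdef MEM_CHECK_DEMO
/* Build with -DMEM_CHECK_DEMO to get a small command-line report. */
int main(void)
{
    long long need = 64LL * 1024 * 1024; /* illustrative figure only */
    printf("total: %lld bytes\n", phys_mem_bytes());
    printf("free:  %lld bytes\n", avail_mem_bytes());
    printf("64 MiB fits: %s\n",
           fits_in_memory(need, avail_mem_bytes()) ? "yes" : "no");
    return 0;
}
#endif
```

The 64 MiB figure in the demo is only an illustration; the thread does not say how much memory the cacheedge search actually needs.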

     
  • Nikos Kyrtatas
    Nikos Kyrtatas
    2014-01-16

    Hi Clint,

    The Raspberry Pi has an FPU, but it implements vfpv2 rather than the vfpv3-d16 that ATLAS detected in the 3.11 install, which is probably what led to the "illegal instruction" error. So I tried to build 3.10 after overriding ATLAS's CacheEdge search by creating a folder ARMHARDFP with the contents shown below:

    ARMHARDFP/UNKNOWN32/gemm/atlas_cacheedge.h:
    #ifndef ATLAS_CACHEEDGE_H
    #define ATLAS_CACHEEDGE_H
    #define CacheEdge 147456
    #endif

    and then calling configure with the flags: -D c -DWALL -D c -DATL_ARM_HARDFP=1 -Si archdef 0 --nof77 -m 700 -Ss ADdir <path>/ARMHARDFP -t 0 .

    In practice, I downloaded the arch defaults for ARMv732NEON, changed the folder name to UNKNOWN32 and deleted all files except atlas_cacheedge.h. Since my data are L1-cache resident in all my experiments, I left CacheEdge at the default value for ARMv732NEON. I am not sure if this was the right thing to do, especially after seeing in SUMMARY.LOG that CacheEdge was set to 786432 bytes and not to the value in atlas_cacheedge.h, but in the end the installation finished successfully after ~2 days of execution.

    Thanks a lot,
    Nikos

     
  • The reason your clever cacheedge work came to naught is that you threw the configure flag "-Si archdef 0", which tells configure not to use architectural defaults, including the ones you had created and pointed to with "-Ss ADdir" :)

    There are easier ways to override just cacheedge, but it looks like you don't need them! I'm marking this as open-fixed, and you can close if you agree.

    The performance doesn't look too stellar, but I'm not sure what to compare it to. Clearly, the machine is strongly memory-limited in perf (almost always the case with low-power machines).

    You can probably also get 3.11 to work by removing the offending flag from the compiler flags, since 3.10 finished. It is barely possible this might be slightly faster, but it would take even longer to complete. Also, since 3.10 had failed before, it may be hit or miss whether your install finishes (depending on how much memory the system is using and how large certain tunings in ATLAS go), so installs may often get killed.

    BTW, can you summarize the difference between vfpv2 and vfpv3-d16? I should eventually write a probe that distinguishes the two so other people don't have the problem in 3.11 that you had.

    Thanks,
    Clint

     
    • Mr. Morden
      Mr. Morden
      2014-08-15

      Hi, Clint,

      I'm having a similar problem with my Raspberry Pi on 3.11.28. Dying on cache edge. You said there was a way around the problem? I'm happy to start from scratch if you know the best flags to pass at the configure step.

      Matt

       
    • status: open --> open-fixed
     
  • Nikos Kyrtatas
    Nikos Kyrtatas
    2014-01-17

    I see. So it was plain luck that compilation finished this time.

    Regarding vfp versions, vfpv2 is an optional floating-point extension to older architectures like ARMv5 and ARMv6 (ARMv6 is the one implemented by the Pi's ARM1176 processor). It has 32 singleword floating-point registers that can also be seen as 16 doubleword ones. vfpv3 is an optional extension to ARMv7, implemented by processors like Cortex-A8 and Cortex-A9. It comes in two versions, vfpv3-d16 and vfpv3-d32, depending on the number of doubleword floating-point registers available (16 or 32). So vfpv2 and vfpv3-d16 are similar in number of registers, but they belong to different architectures. Also, the vfpv3 instruction set is a superset of the vfpv2 one.

    Thanks for the help,
    Nikos
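
As a rough illustration of the kind of probe Clint asks about, here is a minimal sketch that classifies the `Features` line of `/proc/cpuinfo` on ARM Linux. This is not ATLAS code: the enum, function names, and `VFP_PROBE_DEMO` macro are hypothetical. It assumes the kernel reports the tokens `vfp`, `vfpv3`, and `vfpv3d16`, and that `vfpv3` without `vfpv3d16` means 32 doubleword registers. A plain `vfp` token does not say which pre-v3 revision is present, so it is reported as "v2 or older".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical classification used only in this sketch. */
enum vfp_kind { VFP_NONE = 0, VFP_V2_OR_OLDER, VFP_V3_D16, VFP_V3_D32 };

/* Nonzero if `word` occurs in `list` as a whole whitespace-separated token,
 * so that "vfp" does not match inside "vfpv3". */
static int has_token(const char *list, const char *word)
{
    size_t n = strlen(word);
    const char *p = list;
    while ((p = strstr(p, word)) != NULL) {
        int starts = (p == list) || p[-1] == ' ' || p[-1] == '\t';
        int ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\t' || p[n] == '\n';
        if (starts && ends)
            return 1;
        p += n;
    }
    return 0;
}

/* Classify a cpuinfo-style feature string. Most specific token wins. */
enum vfp_kind classify_features(const char *features)
{
    assert(features != NULL);
    if (has_token(features, "vfpv3d16"))
        return VFP_V3_D16;
    if (has_token(features, "vfpv3"))
        return VFP_V3_D32;      /* assumption: no d16 token means d32 */
    if (has_token(features, "vfp"))
        return VFP_V2_OR_OLDER; /* exact revision is not reported */
    return VFP_NONE;
}

/* Read /proc/cpuinfo and classify its first "Features" line.
 * Returns VFP_NONE if the file or the line is missing (e.g. on x86). */
enum vfp_kind probe_cpuinfo(void)
{
    char line[1024];
    enum vfp_kind kind = VFP_NONE;
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (f == NULL)
        return VFP_NONE;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "Features", 8) == 0) {
            kind = classify_features(line);
            break;
        }
    }
    fclose(f);
    return kind;
}

#ifdef VFP_PROBE_DEMO
/* Build with -DVFP_PROBE_DEMO to print the result for this machine. */
int main(void)
{
    static const char *names[] = {
        "none", "vfp (v2 or older)", "vfpv3-d16", "vfpv3-d32"
    };
    printf("%s\n", names[probe_cpuinfo()]);
    return 0;
}
#endif
```

A configure step could use such a result to avoid passing "-mfpu=vfpv3-d16" on a machine that only reports `vfp`, which is the situation described above for the Pi.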

     