#955 xztfc does not converge for AARCH64

Developer_(v3.11.x)
closed-fixed
None
5
2014-09-17
2014-08-14
No

I am using 3.11.28 with redhat's patches to define aarch64. I am using a locally modified version of gcc (APM-6.0.4) 4.8.1.

I did a:
../configure -b 64 --with-netlib-lapack-tarfile=../../lapack-3.5.0.tgz -D c -DWALL

The xztfc step starts out fine, but eventually gets to the point where it is searching for larger and larger values of K, and never gets a speed up. Do you have any suggestions on what I should try?

tail of log:

TEST TA TB M N K alpha beta Time Mflop SpUp
==== == == === === === ===== ===== ===== ===== ====== ===== ====

42 T T 10 10 10 -1.0 0.0 1.0 0.0 12.49 820.8 1.00
42 T T 10 10 10 -1.0 0.0 1.0 0.0 11.78 870.2 1.06

TEST TA TB M N K alpha beta Time Mflop SpUp
==== == == === === === ===== ===== ===== ===== ====== ===== ====

43 N T 10 10 10 -1.0 0.0 1.0 0.0 11.48 893.3 1.00
43 N T 10 10 10 -1.0 0.0 1.0 0.0 14.78 693.8 0.78
44 N T 288 288 750 -1.0 0.0 1.0 0.0 3.03 3450.0 1.00
44 N T 288 288 750 -1.0 0.0 1.0 0.0 3.11 3364.1 0.98
45 N T 288 288 1500 -1.0 0.0 1.0 0.0 3.18 3442.9 1.00
45 N T 288 288 1500 -1.0 0.0 1.0 0.0 3.38 3243.2 0.94
46 N T 288 288 2250 -1.0 0.0 1.0 0.0 3.02 3464.1 1.00
46 N T 288 288 2250 -1.0 0.0 1.0 0.0 3.11 3361.8 0.97
47 N T 288 288 3000 -1.0 0.0 1.0 0.0 3.47 3444.9 1.00
47 N T 288 288 3000 -1.0 0.0 1.0 0.0 3.72 3209.0 0.93
48 N T 288 288 3750 -1.0 0.0 1.0 0.0 3.58 3470.6 1.00
48 N T 288 288 3750 -1.0 0.0 1.0 0.0 3.83 3246.6 0.94
49 N T 288 288 4500 -1.0 0.0 1.0 0.0 3.44 3470.9 1.00

Discussion

<< < 1 2 (Page 2 of 2)
  • Dave Nuechterlein

    Insufficient caffeine mistake. The failures are not in slv tests, but rather in the xXqrtst_pt tests.

    When I change just the ICC flags back to O3, then do the touches, the problem returns.

     
  • R. Clint Whaley

    R. Clint Whaley - 2014-08-22

    OK, the fact that it is failing ONLY in QR makes me suspicious the problem is due to an incomplete ARMv8 port, so let's make sure RedHat fixed it so you don't count on having strongly-ordered caches.

    Edit ATLAS/include/atlas_pca.h

    You can search for ATL_ARCH_ARMv7. In my file, line 46 of this file looks like

    #elif defined(ATL_ARCH_ARMv7)
    

    but yours should say something like

    #elif defined(ATL_ARCH_ARMv7) || defined(ATL_ARCH_ARMv8)
    

    This assumes their patches correctly make configure define ARMv8. If your file has no line like this, you can change it for now to:

    #elif 1 || defined(ATL_ARCH_ARMv7)
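
A minimal sketch of why the `1 ||` edit forces that branch: the condition becomes true no matter which architecture macros configure defined, so the preprocessor takes it whenever the earlier tests fail. All `DEMO_*` names below are hypothetical stand-ins, not ATLAS macros, and the branch layout is assumed for illustration rather than copied from atlas_pca.h.

```c
#include <assert.h>

/* DEMO_* names are hypothetical; they only mimic the shape of an
 * #if/#elif chain like the one being edited. */
#if defined(DEMO_ARCH_FIRST)
   #define DEMO_PATH 0   /* taken only if DEMO_ARCH_FIRST is defined */
#elif 1 || defined(DEMO_ARCH_ARMv7)
   #define DEMO_PATH 1   /* "1 ||" makes this test always true */
#else
   #define DEMO_PATH 2   /* unreachable while the "1 ||" is present */
#endif

/* Compile-time check that the forced branch was the one selected. */
static_assert(DEMO_PATH == 1, "the forced #elif branch should be selected");

/* Returns which branch the preprocessor picked. */
int demo_selected_path(void)
{
   return DEMO_PATH;
}
```

Because the override ignores the architecture macro entirely, it is only a temporary diagnostic; once configure is known to define the right macro, the `defined(...)` test alone should be enough.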
    

    There are some parallel ops in QR where we count on strongly ordered caches, and this file turns off those optimizations for architectures with weakly ordered caches (like ARMv7 and, I'm guessing, ARMv8). The reason turning down optimization fixes the problem is that it gives the weakly ordered caches extra time to cohere, so they usually behave as if they were strongly ordered.

    After making the fix, take your code with full opt, and redo make ptcheck.
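
The kind of race described above can be sketched in portable C11. This is not ATLAS code: `demo_handoff`, `payload`, and `ready` are hypothetical names, and the example only shows the general pattern of one thread publishing data and another thread polling a flag. On a weakly ordered machine, the flag and the data may become visible to the other core in either order unless the code asks for ordering explicitly, which the release/acquire pair does here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int payload;        /* plain data written before the flag */
static atomic_int ready;   /* flag the consumer polls */

static void *producer(void *arg)
{
   (void)arg;
   payload = 42;
   /* release: the payload write may not be reordered after this store */
   atomic_store_explicit(&ready, 1, memory_order_release);
   return NULL;
}

static void *consumer(void *arg)
{
   int *seen = arg;
   /* acquire: reads after the loop may not be reordered before this load */
   while (!atomic_load_explicit(&ready, memory_order_acquire))
      ;
   *seen = payload;
   return NULL;
}

/* Runs one handoff; returns the value the consumer observed,
 * or -1 if a thread could not be created. */
int demo_handoff(void)
{
   pthread_t prod, cons;
   int seen = -1, rc;

   payload = 0;
   atomic_store(&ready, 0);
   if (pthread_create(&cons, NULL, consumer, &seen) != 0)
      return -1;
   if (pthread_create(&prod, NULL, producer, NULL) != 0)
   {
      atomic_store(&ready, 1);   /* let the consumer leave its loop */
      pthread_join(cons, NULL);
      return -1;
   }
   rc = pthread_join(prod, NULL);
   assert(rc == 0);
   rc = pthread_join(cons, NULL);
   assert(rc == 0);
   (void)rc;
   return seen;
}
```

With the release/acquire pair, `demo_handoff()` returns 42. If both operations were weakened to `memory_order_relaxed`, the consumer would be allowed to see `ready == 1` and still read a stale `payload` on a weakly ordered CPU, and the failure would show up only intermittently. Build with `-pthread` on GCC.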

    Let me know,
    Clint

     
    Last edit: R. Clint Whaley 2014-08-22
  • Dave Nuechterlein

    No, it is even weirder than that. The problem appears random. If I just rerun:
    ./xdqrtst_pt -n 1 477 -m 1 517 -U 2 u l -S 2 r l

    sometimes it passes, sometimes it fails.

    dnuechte@mustang06:~/ATLAS/build.2/bin$ ./xdqrtst_pt -n 1 477 -m 1 517 -U 2 u l -S 2 r l
    Rt Maj M N lda TIME MFLOP RESIDUAL
    == === ===== ===== ===== ========== ========== =========
    QR Col 517 477 517 3.9082e-02 4180.62 1.81e-02
    QL Col 517 477 517 3.8134e-02 4284.55 1.87e-02
    RQ Col 517 477 517 3.7025e-02 4413.41 1.90e-02
    LQ Col 517 477 517 3.6747e-02 4446.80 1.82e-02

    4 cases ran, 4 cases passed

    dnuechte@mustang06:~/ATLAS/build.2/bin$ ./xdqrtst_pt -n 1 477 -m 1 517 -U 2 u l -S 2 r l
    Rt Maj M N lda TIME MFLOP RESIDUAL
    == === ===== ===== ===== ========== ========== =========
    QR Col 517 477 517 3.8993e-02 4190.17 1.81e-02
    QL Col 517 477 517 3.7965e-02 4303.63 2.63e+11
    RQ Col 517 477 517 3.6866e-02 4432.45 1.90e-02
    LQ Col 517 477 517 3.6685e-02 4454.32 1.82e-02

    4 cases ran, 1 cases failed, 3 cases passed

    dnuechte@mustang06:~/ATLAS/build.2/bin$ ./xdqrtst_pt -n 1 477 -m 1 517 -U 2 u l -S 2 r l
    Rt Maj M N lda TIME MFLOP RESIDUAL
    == === ===== ===== ===== ========== ========== =========
    QR Col 517 477 517 3.8941e-02 4195.76 1.83e-02
    QL Col 517 477 517 3.7906e-02 4310.32 1.89e-02
    RQ Col 517 477 517 3.7228e-02 4389.35 1.88e-02
    LQ Col 517 477 517 3.6754e-02 4445.96 6.83e+11

    4 cases ran, 1 cases failed, 3 cases passed

    dnuechte@mustang06:~/ATLAS/build.2/bin$ ./xdqrtst_pt -n 1 477 -m 1 517 -U 2 u l -S 2 r l
    Rt Maj M N lda TIME MFLOP RESIDUAL
    == === ===== ===== ===== ========== ========== =========
    QR Col 517 477 517 3.8776e-02 4213.61 1.81e-02
    QL Col 517 477 517 3.8043e-02 4294.80 2.45e+11
    RQ Col 517 477 517 3.6912e-02 4426.93 1.90e-02
    LQ Col 517 477 517 3.6828e-02 4437.02 1.82e-02

    4 cases ran, 1 cases failed, 3 cases passed

    dnuechte@mustang06:~/ATLAS/build.2/bin$ ./xdqrtst_pt -n 1 477 -m 1 517 -U 2 u l -S 2 r l
    Rt Maj M N lda TIME MFLOP RESIDUAL
    == === ===== ===== ===== ========== ========== =========
    QR Col 517 477 517 3.9012e-02 4188.12 1.81e-02
    QL Col 517 477 517 3.7836e-02 4318.30 1.87e-02
    RQ Col 517 477 517 3.7145e-02 4399.16 1.90e-02
    LQ Col 517 477 517 3.6759e-02 4445.35 1.82e-02

    4 cases ran, 4 cases passed
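
When rerunning an intermittent test many times, it can help to scan the saved output for outlier residuals instead of reading every table. The sketch below parses one result row in the format shown above. The cutoff `DEMO_RESID_LIMIT` is a hypothetical value chosen only to separate the ~1e-02 and ~1e+11 residuals seen in these runs; it is not the threshold the ATLAS tester uses, and `demo_row_is_suspect` is an illustrative name.

```c
#include <assert.h>
#include <stdio.h>

/* HYPOTHETICAL cutoff, not the tester's real pass/fail threshold. */
#define DEMO_RESID_LIMIT 1.0

/* Classifies one line of tester output.
 * Returns  1 if it is a result row whose residual exceeds the cutoff,
 *          0 if it is a result row with a small residual,
 *         -1 if the line is not a result row (header, rule, summary). */
int demo_row_is_suspect(const char *line)
{
   char rt[8], maj[8];
   int m, n, lda;
   double secs, mflop, resid;

   assert(line != NULL);
   /* Row layout: Rt Maj M N lda TIME MFLOP RESIDUAL */
   if (sscanf(line, " %7s %7s %d %d %d %lf %lf %lf",
              rt, maj, &m, &n, &lda, &secs, &mflop, &resid) != 8)
      return -1;
   return resid > DEMO_RESID_LIMIT;
}
```

Feeding each line of a captured log through this function and counting the rows that return 1 gives a rough failure rate across repeated runs.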

     
  • R. Clint Whaley

    R. Clint Whaley - 2014-08-22

    That's the expected behaviour if the problem is as I outlined (lack of strongly-ordered caches causes race conditions). So, have you scoped the problem as I said, and still have the problem, or what?

     
  • Dave Nuechterlein

    No, I have not yet handled the lack of strongly ordered cache support. I have been fighting with rebuilding the locally modified compiler to use the existing glibc instead of a newer one. The initial rootfs image on my board was not well put together. I am not using an Ubuntu or Red Hat distribution. We have our own version of the kernel and rootfs because we are ahead of everyone on the aarch64 board support.

     
  • Dave Nuechterlein

    I am now using 3.11.30 and I no longer see this issue. This may be closed.

     
  • R. Clint Whaley

    R. Clint Whaley - 2014-09-17
    • status: open --> closed-fixed
     