#197 PPCG4 cannot pass ptcheck

Milestone: Developer
Status: closed-fixed
Priority: 5
Updated: 2012-03-24
Created: 2012-03-07
Private: No

New PPCG4 passes make check and make time, but dies in make ptcheck:

/home/whaley/TEST/ATLAS3.9.69.1/obj32/bin/ATLrun.sh /home/whaley/TEST/ATLAS3.9.69.1/obj32/bin xsqrtst_pt -n 1 477 -m 1 517 -U 2 u l \
   -S 2 r l >> /home/whaley/TEST/ATLAS3.9.69.1/obj32/bin/ptsanity.out
make[3]: *** [ssanity_test_pt] Error 4
make[3]: Leaving directory `/home/whaley/TEST/ATLAS3.9.69.1/obj32/bin'
make[2]: *** [ptsanity_test] Error 2
make[2]: Leaving directory `/home/whaley/TEST/ATLAS3.9.69.1/obj32/bin'
make[1]: *** [ptsanity_test] Error 2
make[1]: Leaving directory `/home/whaley/TEST/ATLAS3.9.69.1/obj32'
make: *** [pttest] Error 2
whaley@etl-g42:~/TEST/ATLAS3.9.69.1/obj32$ d bin/

Here is the run by hand:

./xsqrtst_pt -n 1 477 -m 1 517 -U 2 u l -S 2 r l
Rt Maj     M     N   lda       TIME      MFLOP  RESIDUAL
== === ===== ===== ===== ========== ========== =========
QR Col   517   477   517 1.8141e-01     900.65  4.65e+04
QL Col   517   477   517 1.6771e-01     974.20  7.98e+04
RQ Col   517   477   517 1.7407e-01     938.77  4.99e+04
LQ Col   517   477   517 1.7403e-01     938.97  3.03e+04

That is, the threaded tester fails every test. Serial is fine:
./xsqrtst -n 1 477 -m 1 517 -U 2 u l -S 2 r l
Rt Maj     M     N   lda       TIME      MFLOP  RESIDUAL
== === ===== ===== ===== ========== ========== =========
QR Col   517   477   517 2.2692e-01     720.02  1.81e-02
QL Col   517   477   517 2.2317e-01     732.11  1.25e-02
RQ Col   517   477   517 2.1841e-01     748.17  1.44e-02
LQ Col   517   477   517 2.1909e-01     745.86  1.40e-02

Here's another run:
./xsqrtst_pt -N 20 200 20
Rt Maj     M     N   lda       TIME      MFLOP  RESIDUAL
== === ===== ===== ===== ========== ========== =========
QR Col    20    20    20 4.8400e-04      23.88  1.50e-01
QR Col    40    40    40 7.0200e-04     126.38  6.76e-02
QR Col    60    60    60 1.5630e-03     189.05  5.73e-02
QR Col    80    80    80 3.2320e-03     215.30  4.84e-02
QR Col   100   100   100 6.2740e-03     215.78  4.06e-02
QR Col   120   120   120 7.7710e-03     300.27  3.98e-02
QR Col   140   140   140 1.1118e-02     332.66  3.70e-02
QR Col   160   160   160 1.5052e-02     366.28  6.47e+04
QR Col   180   180   180 1.8676e-02     419.88  7.15e+04
QR Col   200   200   200 2.2592e-02     475.73  6.21e+04
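
In the runs above the good residuals are all below 1 and the bad ones are around 1e+04, so the first failing problem size can be picked out mechanically. The following is a minimal sketch for doing that, not part of ATLAS: the helper names and the 1.0 cutoff are assumptions made for illustration, and the tester's real pass/fail threshold may differ.

```python
# Hypothetical helper, not part of ATLAS: scan xsqrtst / xsqrtst_pt output
# and report rows whose residual looks bad.
# ASSUMPTION: a cutoff of 1.0 separates good from bad residuals. It is chosen
# only because it splits the values seen in this ticket; it is not the
# tester's actual threshold.

ROUTINES = ("QR", "QL", "RQ", "LQ")


def parse_rows(text):
    """Return one dict per data row; header and separator lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 8 whitespace-separated fields and start with a routine tag.
        if len(fields) != 8 or fields[0] not in ROUTINES:
            continue
        rows.append({
            "routine": fields[0],
            "m": int(fields[2]),
            "n": int(fields[3]),
            "residual": float(fields[7]),
        })
    return rows


def failing_rows(rows, threshold=1.0):
    """Return the rows whose residual exceeds the (assumed) threshold."""
    return [r for r in rows if r["residual"] > threshold]


# Three rows copied from the -N 20 200 20 run in this ticket.
SAMPLE = """\
Rt Maj M N lda TIME MFLOP RESIDUAL
== === ===== ===== ===== ========== ========== =========
QR Col 140 140 140 1.1118e-02 332.66 3.70e-02
QR Col 160 160 160 1.5052e-02 366.28 6.47e+04
QR Col 180 180 180 1.8676e-02 419.88 7.15e+04
"""

if __name__ == "__main__":
    for r in failing_rows(parse_rows(SAMPLE)):
        print(r["routine"], r["m"], r["n"], r["residual"])
```

Fed the full output of the last run, this would flag N = 160 as the first failing size.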

Discussion

  • R. Clint Whaley

    R. Clint Whaley - 2012-03-07

    If I make this link to the serial BLAS, the error still occurs, but if I link the serial lapack with the parallel blas, the error goes away, so apparently the error is in ptlapack, though it could be due to a blas call that goes bad in serial or parallel that only occurs in ptlapack. In that case, the likely suspect would be an L2BLAS call arising from PCA code.

     
  • R. Clint Whaley

    R. Clint Whaley - 2012-03-08

    double precision works fine.

     
  • R. Clint Whaley

    R. Clint Whaley - 2012-03-09
    • status: open --> open-fixed
     
  • R. Clint Whaley

    R. Clint Whaley - 2012-03-09

    This was a stack overwrite in my assembly file ATL_smm4x4x128_av.c that seemed to only cause a problem for 32-bit Linux. I have started a new search to update archdefs. Will need to make sure corrected file is still OK on G5.

     
  • R. Clint Whaley

    R. Clint Whaley - 2012-03-24

    This was really an out-of-order write problem. Fixed by 3.9.71.

     
  • R. Clint Whaley

    R. Clint Whaley - 2012-03-24
    • status: open-fixed --> closed-fixed
     
