elk-6.8.4 / gfortran-10.2.1 / libxc-5.0.0: tests crash on aarch64 and ppc64le on fedora 33

2020-09-08
  • marcindulak

    marcindulak - 2020-09-08

    See the Fedora 33 build logs at https://koji.fedoraproject.org/koji/taskinfo?taskID=50951060. An example aarch64 failure is shown below:

    + tee test-libxc.1_openmpi.log
    cd tests-libxc; ./test.sh
    Running test in directory test_001...
     Passed
    Running test in directory test_002...
     Passed
    Running test in directory test_003...
     Passed
    Running test in directory test_004...
    Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG
    ERROR STOP 
    Error termination. Backtrace:
    #0  0xffffb2fa8d7b in ???
    #1  0xffffb2fa9b17 in ???
    #2  0xffffb2fab05f in ???
    #3  0x4e7963 in testcheck_
        at /builddir/build/BUILD/elk-6.8.4/src/testcheck.f90:42
    #4  0x40e473 in elk
        at /builddir/build/BUILD/elk-6.8.4/src/elk.f90:214
    #5  0x406117 in main
        at /builddir/build/BUILD/elk-6.8.4/src/elk.f90:8
     Failed! See test.log and output files
    Running test in directory test_005...
     Passed
    Running test in directory test_006...
     Passed
    real    8m20.412s
    user    31m46.019s
    sys     0m44.353s
    

    There are more errors on ppc64le.

    For comparison, the elk-6.3.2 / gfortran-10.0.1 / libxc-4.3.3 build (https://koji.fedoraproject.org/koji/buildinfo?buildID=1455707) does not seem to show any test errors.

    See also the libxc issue on Fedora, https://bugzilla.redhat.com/show_bug.cgi?id=1876735, which asks for help diagnosing the source of the crashes.

     
  • J. K. Dewhurst

    J. K. Dewhurst - 2020-09-08

    That's because elk calls

    error stop
    

    if the test parameters are outside tolerance. Previously it simply called 'stop', which didn't generate an error; however, this would not stop the MPI test runs.

    Take a look at the test.log file. Perhaps the error is relatively small and the tolerance can be changed. This particular test is the Tran-Blaha functional, which is quite sensitive.

    Regards,
    Kay.
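
To illustrate why the exit status matters to a test script, here is a minimal shell sketch. The function `within_tolerance` is a hypothetical stand-in for Elk's internal check, not Elk code; the absolute-difference criterion and the numbers are assumptions chosen for illustration. A non-zero status plays the role of `error stop`, which gfortran typically reports to the caller as a failing exit code, whereas a plain `stop` exits with status 0.

```shell
#!/bin/sh
# Hypothetical stand-in for a tolerance check: succeed when
# |value - reference| <= tolerance, otherwise return non-zero,
# which is how a calling script would see a Fortran `error stop`.
within_tolerance() {
    # $1 = computed value, $2 = reference value, $3 = absolute tolerance
    awk -v v="$1" -v r="$2" -v t="$3" 'BEGIN {
        d = v - r
        if (d < 0) d = -d
        exit ((d <= t + 0) ? 0 : 1)
    }'
}

# Illustrative numbers only, not taken from any Elk test.
if within_tolerance 1.0000004 1.0 1e-6; then
    echo " Passed"
else
    echo " Failed! See test.log and output files"
fi
```

Loosening the third argument corresponds to the suggestion of changing the tolerance when the discrepancy is small.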

     
  • marcindulak

    marcindulak - 2020-09-08

    I'm running this on a build system and don't have access to those platforms.

    I could try to modify the test*sh scripts myself, but would it be possible to add an official "debug" option to them that prints the full current and reference output when a test fails?
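
A rough sketch of what such a debug option might look like follows. It is not part of Elk: the function name `report_failure` and the environment variable `ELK_TEST_DEBUG` are invented for illustration, and only test.log is printed because the names of the reference files are not given in this thread.

```shell
#!/bin/sh
# Hypothetical helper: report a failed test and, when ELK_TEST_DEBUG=1
# (an invented variable name), also print the log from the test directory.
report_failure() {
    # $1 = test directory
    echo " Failed! See test.log and output files"
    if [ "${ELK_TEST_DEBUG:-0}" = "1" ] && [ -f "$1/test.log" ]; then
        echo "----- $1/test.log -----"
        cat "$1/test.log"
    fi
}

# Demonstration on a scratch directory.
demo=$(mktemp -d)
echo "sample log line" > "$demo/test.log"
ELK_TEST_DEBUG=1
report_failure "$demo"
rm -rf "$demo"
```

Gating the extra output on a variable keeps the default output unchanged, while a build system that cannot reach the test machines can still capture the log in its build output.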

     
  • marcindulak

    marcindulak - 2020-09-08

    I've modified the test scripts to run "cat test.log" after a failure:

    sed -i "/Failed/ a \ \ \ \ cat test.log" tests-libxc/test.sh
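
The effect of this sed edit can be checked on a scratch file before patching the real script. In the sketch below, the contents of the scratch test.sh are only a guess at the shape of the failure branch, not the actual Elk script, and the one-line form of the `a` command assumes GNU sed.

```shell
#!/bin/sh
# Scratch copy only; these lines are a guess at what the failure branch
# in tests-libxc/test.sh looks like, not its actual contents.
tmp=$(mktemp -d)
cat > "$tmp/test.sh" <<'EOF'
if [ $? -ne 0 ]; then
    echo " Failed! See test.log and output files"
fi
EOF

# Same sed expression as above: append a "cat test.log" line after
# every line that matches "Failed" (GNU sed one-line `a` form).
sed -i "/Failed/ a \ \ \ \ cat test.log" "$tmp/test.sh"

# Show the patched scratch file.
cat "$tmp/test.sh"
```

The new line lands directly after the "Failed" message, so the log is printed only for tests that fail.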
    

    You can find the failed tests on aarch64 and ppc64le in the build.log files available at https://koji.fedoraproject.org/koji/taskinfo?taskID=51082141 (Fedora 34).

    There are two sets of tests, one for the openmpi build and one for the mpich build; both include "make test-libxc" and "make test-mpi". There are some large discrepancies with respect to the reference output, and some "diagonalisation failed" errors. The problems may also be related to the flexiblas-3.0.2 in use (Fedora 33 is switching to it, see https://bugzilla.redhat.com/show_bug.cgi?id=1860504).

     

    Last edit: marcindulak 2020-09-10
