#204 segfault in cblas_dgemm

Developer
closed-fixed
5
2012-06-23
2012-06-22
Volker Braun
No

The attached code segfaults in cblas_dgemm for me. Backtrace:

[vbraun@volker-desktop tmp]$ gdb a.out
GNU gdb (GDB) Fedora (7.4.50.20120120-42.fc17)
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /tmp/a.out...(no debugging symbols found)...done.
(gdb) run
Starting program: /tmp/a.out

Program received signal SIGSEGV, Segmentation fault.
0x000000000040d560 in MNLOOP ()
(gdb) bt
#0 0x000000000040d560 in MNLOOP ()
#1 0x00000000004089bb in ATL_dmmIJK2 ()
#2 0x00000000004096c4 in ATL_dmmIJK ()
#3 0x0000000000402a1b in ATL_dgemm ()
#4 0x0000000000402208 in cblas_dgemm ()
#5 0x0000000000401c4d in main ()
(gdb) disassemble
Dump of assembler code for function MNLOOP:
=> 0x000000000040d560 <+0>: vmovapd -0x80(%rax),%ymm4
0x000000000040d565 <+5>: vmovapd -0x80(%rcx),%ymm0
0x000000000040d56a <+10>: vmulpd %ymm0,%ymm4,%ymm7
0x000000000040d56e <+14>: vmovapd -0x80(%rcx,%rbx,1),%ymm1

Discussion

  • Volker Braun
    2012-06-22

    Sample code that segfaults for me

     
  • Hi,

    From your description, I assumed your library had built correctly but the resulting lib was segfaulting. However, the error file you posted is dying in essentially the first step, long before any GEMM is built?

    I can't tell you for sure why things are dying, but it looks like you did a make -j10 rather than just 'make' in order to build. If so, this is illegal: ATLAS will automatically use parallel make anywhere it is legal.

    Finally, it looks like you have a machine that ATLAS's configure cannot recognize. Can you post the output of 'cat /proc/cpuinfo'?

    Thanks,
    Clint

     
    • assigned_to: nobody --> rwhaley
     
  • Volker Braun
    2012-06-22

    Sorry it was a couple of days ago, I might have posted the wrong error log file. I'm recompiling right now. I have MAKE='make -j10' set in my .bash_profile, but I only typed "make" to build.

    This is an Ivy Bridge i7, 32GB RAM:

    [vbraun@volker-desktop ATLAS]$ cat /proc/cpuinfo
    processor : 0
    vendor_id : GenuineIntel
    cpu family : 6
    model : 58
    model name : Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
    stepping : 9
    microcode : 0x12
    cpu MHz : 3501.000
    cache size : 8192 KB
    physical id : 0
    siblings : 8
    core id : 0
    cpu cores : 4
    apicid : 0
    initial apicid : 0
    fpu : yes
    fpu_exception : yes
    cpuid level : 13
    wp : yes
    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
    bogomips : 7000.62
    clflush size : 64
    cache_alignment : 64
    address sizes : 36 bits physical, 48 bits virtual
    power management:

    (and repeated for a total of 8 = four physical cores + hyperthreading)

     
  • Volker Braun
    2012-06-22

    After recompiling everything, cblas_dgemm works. Thank you for the help & sorry for the false alarm!

     
    • status: open --> open-fixed
     
  • I assume you just compiled without the -j, and that made the difference? If so, then I guess this issue is completely resolved for you?

    Also, just out of curiosity, let us see how Corei2 archdefs do on your system. Can you do a new configure, but add the flag:
    -A 26
    to your configure line?

    After this completes (should only take 30 min or so), do a "make time", and then compare your install (which I'm assuming uses a full search), to this Corei2 install, by issuing the following in your Corei2 BINdir:
    ./xatlbench -dp <path to original install BLDdir>/bin/INSTALL_LOG -dc bin/INSTALL_LOG
    and post the results.

    Thanks,
    Clint

     
  • Volker Braun
    2012-06-23

    Yes, I unset MAKE before compiling. Otherwise the build will fail within a few minutes. For the record, I used the following configuration:

    ../ATLAS/configure --with-netlib-lapack-tarfile=../lapack.tgz --prefix=/home/vbraun/Code/ATLAS/local -Si latune 0 -Fa alg "-fPIC -g"

    ###################################################################
    Result of the full search (I'm pretty sure it did because it took a loong time):

    res+off= 24998.5 1.00 ---

    BIG_MM N=1600, mf=24998.50,24979.40!

    The times labeled Reference are for ATLAS as installed by the authors.
    NAMING ABBREVIATIONS:
    kSelMM : selected matmul kernel (may be hand-tuned)
    kGenMM : generated matmul kernel
    kMM_NT : worst no-copy kernel
    kMM_TN : best no-copy kernel
    BIG_MM : large GEMM timing (usually N=1600); estimate of asymptotic peak
    kMV_N : NoTranspose matvec kernel
    kMV_T : Transpose matvec kernel
    kGER : GER (rank-1 update) kernel
    Kernel routines are not called by the user directly, and their
    performance is often somewhat different than the total
    algorithm (eg, dGER perf may differ from dkGER)

    Clock rate=3501Mhz
                 single precision      double precision
               ********************  ********************
                    real    complex       real    complex
    Benchmark    % Clock    % Clock    % Clock    % Clock
    =========  =========  =========  =========  =========
    kSelMM        1420.7     1293.9      688.0      734.2
    kGenMM         210.8      218.5      226.2      222.3
    kMM_NT         208.8      214.2      220.1      207.5
    kMM_TN         215.0      224.5      219.0      210.5
    BIG_MM        1393.0     1411.9      684.3      714.0
    kMV_N          342.8      526.0      131.4      263.1
    kMV_T          342.8      526.0      131.5      263.0
    kGER           171.4      350.4       65.7      175.3

    ###################################################################
    Result of -A 26:

    res+off= 26072.4 1.00 ---

    BIG_MM N=1600, mf=26072.40,26134.80!

    The times labeled Reference are for ATLAS as installed by the authors.
    NAMING ABBREVIATIONS:
    kSelMM : selected matmul kernel (may be hand-tuned)
    kGenMM : generated matmul kernel
    kMM_NT : worst no-copy kernel
    kMM_TN : best no-copy kernel
    BIG_MM : large GEMM timing (usually N=1600); estimate of asymptotic peak
    kMV_N : NoTranspose matvec kernel
    kMV_T : Transpose matvec kernel
    kGER : GER (rank-1 update) kernel
    Kernel routines are not called by the user directly, and their
    performance is often somewhat different than the total
    algorithm (eg, dGER perf may differ from dkGER)

    Reference clock rate=3292Mhz, new rate=3501Mhz
    Refrenc : % of clock rate achieved by reference install
    Present : % of clock rate achieved by present ATLAS install

                        single precision                    double precision
               **********************************  **********************************
                     real            complex             real            complex
               ----------------  ----------------  ----------------  ----------------
    Benchmark  Refrenc  Present  Refrenc  Present  Refrenc  Present  Refrenc  Present
    =========  =======  =======  =======  =======  =======  =======  =======  =======
    kSelMM      1289.9   1508.0   1188.7   1292.6    686.7    496.4    647.4    474.1
    kGenMM       198.2    174.2    198.5    221.5    193.9    135.6    196.0    155.0
    kMM_NT       193.7    209.1    195.2    219.6    184.2    118.7    188.5    137.4
    kMM_TN       198.5    221.1    197.9    224.2    189.8    138.6    189.5    134.9
    BIG_MM      1213.8   1360.6   1241.3   1384.8    652.0    742.9    661.4    746.5
    kMV_N        224.3    171.5    438.8    350.6    115.9    131.4    205.8    131.5
    kMV_T        224.6    171.4    460.3    350.6    123.2     87.6    211.3    175.3
    kGER         148.3    114.3    290.2    262.9     73.3     65.7    144.3    131.4

    ###################################################################
    build-2 = full search
    build-3 = Corei2

    [vbraun@volker-desktop build-3]$ ./xatlbench -dp ../build-2/bin/INSTALL_LOG -dc bin/INSTALL_LOG

    The times labeled Reference are for ATLAS as installed by the authors.
    NAMING ABBREVIATIONS:
    kSelMM : selected matmul kernel (may be hand-tuned)
    kGenMM : generated matmul kernel
    kMM_NT : worst no-copy kernel
    kMM_TN : best no-copy kernel
    BIG_MM : large GEMM timing (usually N=1600); estimate of asymptotic peak
    kMV_N : NoTranspose matvec kernel
    kMV_T : Transpose matvec kernel
    kGER : GER (rank-1 update) kernel
    Kernel routines are not called by the user directly, and their
    performance is often somewhat different than the total
    algorithm (eg, dGER perf may differ from dkGER)

    Reference clock rate=3501Mhz, new rate=3501Mhz
    Refrenc : % of clock rate achieved by reference install
    Present : % of clock rate achieved by present ATLAS install

                        single precision                    double precision
               **********************************  **********************************
                     real            complex             real            complex
               ----------------  ----------------  ----------------  ----------------
    Benchmark  Refrenc  Present  Refrenc  Present  Refrenc  Present  Refrenc  Present
    =========  =======  =======  =======  =======  =======  =======  =======  =======
    kSelMM      1420.7   1508.0   1293.9   1292.6    688.0    496.4    734.2    474.1
    kGenMM       210.8    174.2    218.5    221.5    226.2    135.6    222.3    155.0
    kMM_NT       208.8    209.1    214.2    219.6    220.1    118.7    207.5    137.4
    kMM_TN       215.0    221.1    224.5    224.2    219.0    138.6    210.5    134.9
    BIG_MM      1393.0   1360.6   1411.9   1384.8    684.3    742.9    714.0    746.5
    kMV_N        342.8    171.5    526.0    350.6    131.4    131.4    263.1    131.5
    kMV_T        342.8    171.4    526.0    350.6    131.5     87.6    263.0    175.3
    kGER         171.4    114.3    350.4    262.9     65.7     65.7    175.3    131.4
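
For readers comparing the tables, here is a minimal sketch of how the "% Clock" figures appear to relate to the raw "mf=" values. It assumes that "% Clock" is simply MFLOPS divided by the clock rate in MHz, times 100. That is an inference from the numbers in this post, not something the xatlbench output states: mf=24998.50 at 3501 MHz gives about 714.0, and mf=26134.80 gives about 746.5, which match the double precision complex BIG_MM entries of the two builds. The function names are hypothetical.

```c
#include <assert.h>

/* Assumed relation (inferred from the tables, not from ATLAS docs):
   percent of clock = 100 * MFLOPS / clock rate in MHz. */
double pct_of_clock(double mflops, double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return 100.0 * mflops / clock_mhz;
}

/* Relative change of the "Present" column against the "Refrenc"
   column, in percent. Positive means the present install is faster. */
double relative_change_pct(double reference, double present)
{
    assert(reference != 0.0);
    return 100.0 * (present - reference) / reference;
}
```

For example, relative_change_pct(684.3, 742.9) compares the double precision real BIG_MM entries of the full search (build-2) and the -A 26 build (build-3), and relative_change_pct(688.0, 496.4) does the same for kSelMM, where the -A 26 build is slower.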

     
  • Volker Braun
    2012-06-23

    The "make time" data as a text file

     
  • Volker Braun
    2012-06-23

    gdb backtrace of cblas_dgemm crash

     
  • Volker Braun
    2012-06-23

    I ran the Sage testsuite with the tuned build (build-2) and again got a crash in cblas_dgemm(). This time it is different, though: it doesn't crash with the code sample that I originally attached. Since I compiled with -g, I get a useful backtrace, which I attached. The crash only happens with very specific matrix sizes; if I do the same computation for modular forms of slightly different weight, it works fine.

     
  • Volker Braun
    2012-06-23

    PS: If I change the matrix size in my original testcase to

    int n = 52;
    int m = 264;

    then I can trigger the crash with both the tuned build (build-2) and the -A 26 build (build-3)

     
  • Volker Braun
    2012-06-23

    improved testcase that loops over matrix sizes

     
  • Volker Braun
    2012-06-23

    I've attached an improved testcase that loops over matrix sizes. For me, it goes up to

    [...]
    Testing n=51
    Testing n=52
    Segmentation fault (core dumped)
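
The attached testcase itself is not reproduced in this thread. As a rough sketch of what such a size sweep might look like, the code below fixes m and loops n upward, comparing a GEMM under test against a naive reference. Everything here is an assumption for illustration: the names gemm_fn, ref_dgemm and sweep_sizes are hypothetical, k is set equal to n (as in the sizes m=264, n=52, k=52 reported in this ticket), and matrices are row-major.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical signature: C (m x n) = A (m x k) * B (k x n), row-major. */
typedef void (*gemm_fn)(int m, int n, int k,
                        const double *A, const double *B, double *C);

/* Naive triple loop, used as the trusted reference. */
void ref_dgemm(int m, int n, int k,
               const double *A, const double *B, double *C)
{
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++) {
            double s = 0.0;
            for (int p = 0; p < k; p++)
                s += A[i * k + p] * B[p * n + j];
            C[i * n + j] = s;
        }
}

/* Small integers in [-3, 3]: every product and sum is exact in double,
   so results can be compared with == regardless of summation order. */
static void fill(double *x, size_t len, size_t seed)
{
    for (size_t i = 0; i < len; i++)
        x[i] = (double)((int)((i + seed) % 7) - 3);
}

/* Runs gemm with m fixed and n = k = 1..max_n.
   Returns 0 if all results match, the first failing n otherwise,
   or -1 if memory could not be allocated. */
int sweep_sizes(gemm_fn gemm, int m, int max_n)
{
    assert(gemm != NULL && m > 0 && max_n > 0);
    for (int n = 1; n <= max_n; n++) {
        int k = n, bad = 0;
        size_t la = (size_t)m * k, lb = (size_t)k * n, lc = (size_t)m * n;
        double *A = malloc(la * sizeof *A);
        double *B = malloc(lb * sizeof *B);
        double *C = malloc(lc * sizeof *C);
        double *R = malloc(lc * sizeof *R);
        if (A && B && C && R) {
            fill(A, la, 1);
            fill(B, lb, 2);
            /* stderr is unbuffered, so the last size tried stays
               visible even if gemm crashes. */
            fprintf(stderr, "Testing n=%d\n", n);
            gemm(m, n, k, A, B, C);
            ref_dgemm(m, n, k, A, B, R);
            for (size_t i = 0; i < lc && !bad; i++)
                bad = (C[i] != R[i]);
        } else {
            bad = -1;
        }
        free(A); free(B); free(C); free(R);
        if (bad) return bad < 0 ? -1 : n;
    }
    return 0;
}
```

To exercise ATLAS, one would pass a small wrapper with the gemm_fn signature that calls cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, m, n, k, 1.0, A, k, B, n, 0.0, C, n). Comparing against a reference also catches wrong answers, not only crashes.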

     
    • milestone: 148063 -->
    • labels: 360153 -->
     
    • milestone: --> Developer
    • labels: --> Incorrect answer
    • status: open-fixed --> open-accepted
     
  • I confirm this as a bug in ATLAS. It can be reproduced by the ATLAS tester on my Corei2 arch:
    ./xdmmtst -m 264 -n 52 -k 52

     
  • To fix this bug, edit your ATLAS/include/atlas_misc.h, and change line 301 from:
    #define ATL_MinMMAlign 16
    to:
    #define ATL_MinMMAlign 32

    Then, force a recompile of the blas by issuing (from BLDdir/bin):
    make xdl3blastst xcl3blastst xzl3blastst xsl3blastst
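
The thread does not spell out why this change helps. A plausible reading, given that the disassembly at the top of this ticket faults on a vmovapd into a %ymm register, is that the kernel uses 256-bit aligned loads, which require a 32-byte aligned address, while memory that is only 16-byte aligned can fault. The sketch below illustrates that alignment distinction only. It is not ATLAS code, and alloc_doubles_avx is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if p sits on an align-byte boundary.
   align must be a power of two. */
int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical helper, not part of ATLAS: allocate room for n doubles
   on a 32-byte boundary. Release the result with free(). */
double *alloc_doubles_avx(size_t n)
{
    size_t bytes = n * sizeof(double);
    /* C11 aligned_alloc expects a size that is a multiple of the
       alignment, so round up to the next multiple of 32. */
    bytes = (bytes + 31) & ~(size_t)31;
    if (bytes == 0)
        bytes = 32;
    return aligned_alloc(32, bytes);
}
```

An address that is 16 bytes past a 32-byte boundary passes is_aligned(p, 16) but fails is_aligned(p, 32). That is the case a minimum alignment of 16 permits and a minimum alignment of 32 rules out.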

     
  • Fixed in basefiles, can be closed when 3.9.80 is released.

     
    • status: open-accepted --> open-fixed
     
  • Volker Braun
    2012-06-23

    I can confirm that this fixes the issue! Now the Sage testsuite runs without errors.

     
    • status: open-fixed --> closed-fixed
     
  • I have just released 3.9.80, which should have the bug fix already in it. If your tests show that 3.9.80 still suffers from this particular bug, reopen this report; for new bugs, open a new one.

    Thank you very much for bringing this bug to my attention, and confirming the fix!

    Thanks,
    Clint