
#148 Random crash on build time

Stable
closed-fixed
5
2014-06-27
2010-03-19
No

Hello,

I am currently repackaging Atlas 3.8.X for Debian. I simplified the
previous process and fixed many issues.

However, from time to time, I hit a random bug at build time.
It does not happen every time, but the failure rate is fairly high
(about once every 4 or 5 builds).
Does anyone know what is wrong?

The configure option:
./xconfig -d s /build/buildd-atlas_3.8.3-11-kfreebsd-i386-mfckCD/atlas-3.8.3/build/atlas-core2sse3/../../ -d b /build/buildd-atlas_3.8.3-11-kfreebsd-i386-mfckCD/atlas-3.8.3/build/atlas-core2sse3 -D c -DWALL -b 32 -Fa alg "-Wa,--noexecstack -fPIC" -Ss f77lib "-L/usr/lib/gcc/i486-kfreebsd-gnu/4.4.3/ -lgfortran -lgcc_s" -Ss flapack /usr/lib/liblapack_pic.a -A 14 -V 28 -v 2 -Si cputhrchk 0

[...]

The failing command is the following:
gcc -DL2SIZE=4194304 -I/build/buildd-atlas_3.8.3-11-kfreebsd-i386-mfckCD/atlas-3.8.3/build/atlas-sse3/include -I/build/buildd-atlas_3.8.3-11-kfreebsd-i386-mfckCD/atlas-3.8.3/build/atlas-sse3/../..//include -I/build/buildd-atlas_3.8.3-11-kfreebsd-i386-mfckCD/atlas-3.8.3/build/atlas-sse3/../..//include/contrib -DAdd_ -DF77_INTEGER=int -DStringSunStyle -DATL_OS_FreeBSD -DATL_ARCH_P4 -DATL_SSE3 -DATL_SSE2 -DATL_SSE1 -DATL_GAS_x8632 -DWALL -DATL_UCLEANM -DATL_UCLEANN -DATL_UCLEANK -fomit-frame-pointer -mfpmath=387 -O2 -falign-loops=4 -Wa,--noexecstack -fPIC -m32 -c ATL_zupNBmm_b0.c
ATL_zupNBmm_b0.c: In function "ATL_zpNBmm_b0":
ATL_zupNBmm_b0.c:61: error: "else" without a previous "if"
ATL_zupNBmm_b0.c:65: error: "else" without a previous "if"

I don't have the generated file at hand, but its syntax is clearly wrong
(I can reproduce it if really needed).

The log is available here:
https://buildd.debian.org/fetch.cgi?&pkg=atlas&ver=3.8.3-11&arch=kfreebsd-i386&stamp=1267920117&file=log
(atlas sse3)

Thanks,
Sylvestre

Discussion

  • R. Clint Whaley

    R. Clint Whaley - 2010-03-19
    • labels: 360151 -->
    • milestone: 148062 -->
    • status: open --> open-accepted
     
  • R. Clint Whaley

    R. Clint Whaley - 2010-03-19

    Sylvestre,

    I believe you are seeing a long-standing bug that I just recently tracked down and fixed in the developer release. I'm going to attach an updated emit_mm; can you tell me if it compiles and fixes the problem (you'll want to save it in your ATLAS/tune/blas/gemm directory)? I'm in the middle of a bunch of stuff and can't test it, but I did a hasty backpatch of the changes to fix the bug, so hopefully I did that OK . . .

    Thanks,
    Clint

     
  • R. Clint Whaley

    R. Clint Whaley - 2010-03-19
    • labels: --> Install problem
    • milestone: --> Stable
    • assigned_to: nobody --> rwhaley
     
  • R. Clint Whaley

    R. Clint Whaley - 2010-03-19
     
  • Sylvestre Ledru

    Sylvestre Ledru - 2010-03-19

    Thanks for this quick and useful answer!

    I just launched a new build of the packages (about 6 to 7 hours of build time). I will then upload it to Debian experimental. This will give me some chances to reproduce (or not) this bug ... but due to its random nature, it is not always easy to test.

    Thanks again!
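
    To put a rough number on why a random failure like this is hard to confirm as fixed, here is a minimal sketch. It assumes each build fails independently at the rate quoted in the report (about once every 4 or 5 builds); the function names and the 5% threshold are illustrative choices, not anything taken from ATLAS or the Debian build infrastructure.

```python
def prob_all_clean(failure_rate: float, builds: int) -> float:
    """Chance that `builds` independent builds all succeed
    even though the bug is still present (assumes independence)."""
    return (1.0 - failure_rate) ** builds


def builds_needed(failure_rate: float, max_false_pass: float = 0.05) -> int:
    """Smallest number of consecutive clean builds for which the chance of
    seeing them with the bug still present is at most `max_false_pass`.
    The 5% default is an arbitrary illustrative threshold."""
    n = 0
    while prob_all_clean(failure_rate, n) > max_false_pass:
        n += 1
    return n


if __name__ == "__main__":
    # Failure rates taken from the report: once every 4 or 5 builds.
    for rate in (1 / 4, 1 / 5):
        n = builds_needed(rate)
        print(f"failure rate {rate:.2f}: {n} clean builds needed "
              f"(false-pass chance {prob_all_clean(rate, n):.3f})")
```

    Under these assumptions, a handful of clean builds says little: more than ten consecutive successes are needed before a still-present bug becomes an unlikely explanation.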

     
  • R. Clint Whaley

    R. Clint Whaley - 2010-03-24

    Sylvestre,

    Did the file I posted compile OK, and work in your new builds? I'm wondering if I can post it to the errata to avoid this rarely occurring but annoying bug . . .

    Thanks,
    Clint

     
  • Sylvestre Ledru

    Sylvestre Ledru - 2010-03-24

    I made the mistake of adding other modifications to my upload, which, of course, broke the package. I will let you know when it is fixed and confirmed.

     
  • Sylvestre Ledru

    Sylvestre Ledru - 2010-04-07

    Hello Clint,

    I have built atlas several times with your patch and I haven't seen the crash at build time again.
    For now, I think we can consider it closed. Many thanks for the fix.

    Sylvestre
    PS: I wrote many patches to get ATLAS building correctly on Debian/Ubuntu. Would you consider including some of them in the ATLAS sources?

     
  • R. Clint Whaley

    R. Clint Whaley - 2010-04-07

    Sylvestre,

    Great news that the patch I whipped up is kosher. As for my accepting others' patches, absolutely! The only caveat is that of course I'll have to see them, with an explanation of what they are fixing, before I can say for sure that I'll take them.

    You can submit on the ATLAS patches tracker. It will really help me evaluate them if for each patch, you describe what problem you are fixing, and some idea of how you fix it (just looking at the code doesn't always tell me). Have each patch be as discrete as possible.

    Thanks!
    Clint

     
  • R. Clint Whaley

    R. Clint Whaley - 2010-04-07
    • status: open-accepted --> open-fixed
     
  • Sylvestre Ledru

    Sylvestre Ledru - 2010-05-16

    Hello Clint,

    Bad luck: this bug is still present, but it seems to occur less often ...
    I have hit the problem twice more.

    As usual, I am using Debian build infrastructure.
    https://buildd.debian.org/build.cgi?pkg=atlas

    With the version 3.8.3-22 under hppa, it failed while it worked with 3.8.3-21.
    and alpha 3.8.3-21 & 3.8.3-22

    Sounds like Gentoo might also have this issue:
    http://bugs.gentoo.org/show_bug.cgi?id=303185

    Thanks

    PS: I also see this problem with the 64-bit (amd64) build between version 3.8.3-20 (good) & 3.8.3-21 (failed), but that might be related to a CPU in the build chain that is too old.

     
  • R. Clint Whaley

    R. Clint Whaley - 2010-06-02

    Bummer. Now I've got to wait another epoch until I can reproduce the problem :(

     
  • R. Clint Whaley

    R. Clint Whaley - 2010-06-02
    • status: open-fixed --> open
     
  • Sylvestre Ledru

    Sylvestre Ledru - 2010-06-30

    Is there any way I can help you here?

     
  • R. Clint Whaley

    R. Clint Whaley - 2010-06-30

    You could try 3.9.25 if you like. The fix on the stable was a back-port of a fix for the developer. If the developer always works, then at least I know I just didn't backport everything. If the developer fails as well, then I've still got the bug everywhere . . .

    I don't have time to actively look at this bug right now. Does it still happen only rarely for you, or is it happening frequently now?

    Thanks,
    Clint

     
  • Sylvestre Ledru

    Sylvestre Ledru - 2010-06-30

    Yes, it happens pretty often. Something like once every 10 to 15 builds.

    I could try the 3.9 family, but I would have to upload it into Debian to do so... (is this version stable enough?)

    Do you have an ETA for 3.10? (That would help me decide whether to wait.)

    Thanks

     
  • R. Clint Whaley

    R. Clint Whaley - 2014-06-27

    This code should be fixed, and if not, is going to be eliminated in 3.12. Closing.

     
  • R. Clint Whaley

    R. Clint Whaley - 2014-06-27
    • status: open --> closed-fixed
     
