Hello,
I am currently repackaging ATLAS 3.8.X for Debian. I simplified the previous process and fixed many issues.
However, a random bug occasionally occurs at build time. It does not happen every time, but the rate is fairly high (about once every 4 or 5 builds).
Does anyone know what is wrong?
The configure options:
./xconfig -d s /build/buildd-atlas_3.8.3-11-kfreebsd-i386-mfckCD/atlas-3.8.3/build/atlas-core2sse3/../../ -d b /build/buildd-atlas_3.8.3-11-kfreebsd-i386-mfckCD/atlas-3.8.3/build/atlas-core2sse3 -D c -DWALL -b 32 -Fa alg "-Wa,--noexecstack -fPIC" -Ss f77lib "-L/usr/lib/gcc/i486-kfreebsd-gnu/4.4.3/ -lgfortran -lgcc_s" -Ss flapack /usr/lib/liblapack_pic.a -A 14 -V 28 -v 2 -Si cputhrchk 0
[...]
The failing command is:
gcc -DL2SIZE=4194304 -I/build/buildd-atlas_3.8.3-11-kfreebsd-i386-mfckCD/atlas-3.8.3/build/atlas-sse3/include -I/build/buildd-atlas_3.8.3-11-kfreebsd-i386-mfckCD/atlas-3.8.3/build/atlas-sse3/../..//include -I/build/buildd-atlas_3.8.3-11-kfreebsd-i386-mfckCD/atlas-3.8.3/build/atlas-sse3/../..//include/contrib -DAdd_ -DF77_INTEGER=int -DStringSunStyle -DATL_OS_FreeBSD -DATL_ARCH_P4 -DATL_SSE3 -DATL_SSE2 -DATL_SSE1 -DATL_GAS_x8632 -DWALL -DATL_UCLEANM -DATL_UCLEANN -DATL_UCLEANK -fomit-frame-pointer -mfpmath=387 -O2 -falign-loops=4 -Wa,--noexecstack -fPIC -m32 -c ATL_zupNBmm_b0.c
ATL_zupNBmm_b0.c: In function "ATL_zpNBmm_b0":
ATL_zupNBmm_b0.c:61: error: "else" without a previous "if"
ATL_zupNBmm_b0.c:65: error: "else" without a previous "if"
I don't have the generated file at hand, but its syntax is clearly wrong (I can reproduce it if really needed).
The log is available here:
https://buildd.debian.org/fetch.cgi?&pkg=atlas&ver=3.8.3-11&arch=kfreebsd-i386&stamp=1267920117&file=log
(atlas sse3)
Thanks,
Sylvestre
Sylvestre,
I believe you are seeing a long-standing bug that I only recently tracked down and fixed in the developer release. I'm attaching an updated emit_mm; can you tell me whether it compiles and fixes the problem (save it in your ATLAS/tune/blas/gemm directory)? I'm in the middle of a bunch of stuff and can't test it myself, but I did a hasty backport of the changes that fix the bug, so hopefully I did that OK . . .
Thanks,
Clint
Thanks for this quick and useful answer!
I just launched a new build of the packages (about 6 to 7 hours of build time). I will then upload it to Debian experimental. This will give me a chance to reproduce (or not) this bug ... but due to its random nature, it is not always easy to test.
Thanks again!
Sylvestre,
Did the file I posted compile OK, and does it work in your new builds? I'm wondering whether I can post it to the errata to avoid this rarely occurring but annoying bug . . .
Thanks,
Clint
I made the mistake of adding other modifications to my upload, which of course broke the package. I will let you know once it is fixed and confirmed.
Hello Clint,
I have built ATLAS several times with your patch and have not seen the build-time failure again.
For now, I think we can consider this closed. Many thanks for the fix.
Sylvestre
PS: I wrote many patches to get ATLAS building correctly in Debian/Ubuntu. Would you consider including some of them in the ATLAS sources?
Sylvestre,
Great news that the patch I whipped up is kosher. As for accepting others' patches, absolutely! The only caveat is that I'll have to see each one, with an explanation of what it fixes, before I can say for sure that I'll take it.
You can submit them on the ATLAS patches tracker. It will really help me evaluate them if, for each patch, you describe what problem you are fixing and give some idea of how you fix it (just looking at the code doesn't always tell me). Keep each patch as self-contained as possible.
Thanks!
Clint
Hello Clint,
Bad luck: this bug is still present, though it seems to occur less often ...
I hit the problem twice more.
As usual, I am using Debian build infrastructure.
https://buildd.debian.org/build.cgi?pkg=atlas
On hppa, version 3.8.3-22 failed while 3.8.3-21 worked.
On alpha, it failed with both 3.8.3-21 and 3.8.3-22.
It sounds like Gentoo might also have this issue:
http://bugs.gentoo.org/show_bug.cgi?id=303185
Thanks
PS: I also see this problem on 64-bit (amd64) builds: version 3.8.3-20 was good and 3.8.3-21 failed, but that might be due to a too-old CPU in the build infrastructure.
Bummer. Now I've got to wait another epoch until I can reproduce the problem :(
Is there any way I can help here?
You could try 3.9.25 if you like. The fix in the stable release was a backport of a fix in the developer release. If the developer release always works, then at least I know I just didn't backport everything. If the developer release fails as well, then I've still got the bug everywhere . . .
I don't have time to actively look at this bug right now. Does it still happen only rarely for you, or is it happening frequently now?
Thanks,
Clint
Yes, it happens fairly often: something like once every 10 to 15 builds.
I could try the 3.9 series, but I would have to upload it to Debian to do so... (is that version stable enough?)
Do you have an ETA for 3.10? (to decide whether I should wait)
Thanks
This code should be fixed, and if not, is going to be eliminated in 3.12. Closing.