As suggested, I tried to compile 7-Zip from scratch and stumbled upon an issue: SSE and AVX code is not accepted by the compiler (gcc). I could get it to pass by manually adding the `-msse/-mavx' options, but that's not what I want. Not to mention that AVX is not an option with this processor, and I have no idea how the build figured out it could use it.
Almost all sha1 files (`C/Sha1Opt.c')
/tmp/ccDR9z2w.s:55: Error: no such instruction: `sha1nexte %xmm0,%xmm2'
/tmp/ccDR9z2w.s:57: Error: no such instruction: `sha1msg1 %xmm0,%xmm1'
sha256 (`C/Sha256Opt.c')
/tmp/ccQNSw71.s:92: Error: no such instruction: `sha256rnds2 %xmm0,%xmm1,%xmm2'
/tmp/ccQNSw71.s:96: Error: no such instruction: `sha256rnds2 %xmm0,%xmm2,%xmm1'
then C/SwapBytes.c
../../../../C/SwapBytes.c: In function ‘ShufBytes_256’:
../../../../C/SwapBytes.c:312:7: warning: implicit declaration of function ‘_mm256_set_m128i’ [-Wimplicit-function-declaration]
_mm256_set_m128i(
^~~~~~~~~~~~~~~~
../../../../C/SwapBytes.c:312:7: error: incompatible types when initializing type ‘__m256i {aka const __vector(4) long long int}’ using type ‘int’
There is no option in any Makefile to control it. Even un/defining things like `-Dk_SwapBytes_Mode_MAX=0', which I thought would turn it off in `SwapBytes.c', doesn't work.
Moreover, if one wants to build with or without some specific optimization, there is no way to do it; one gets only whatever is decided automatically. Likewise, there is no way to make a generic build if one needs one.
When I tried to check which macros are defined in this file, hoping to control the process,
cpp -dM ../../../../C/SwapBytes.c
I got 200kB of text, which at this point is totally unmanageable.
What else can one do to make it work?
What compiler version?
SwapBytes.c checks for a compiler version that supports avx2: does your compiler accept
__attribute__((__target__("avx2")))
?
Last edit: Igor Pavlov 2023-11-12
It's gcc v6.5, and your test is for clang, as the `#if' states.
I also checked the documentation, and
__attribute__ ((__target__ ("target")))
works. According to the documentation it has since at least gcc-4.4 (Function-Attributes). It should recognize AVX as well (X86-Built_in-Functions), and AVX2 since 4.7. And the test adjudged that AVX2 is available when it is not: this processor does not have AVX2 although, somehow, the test thought it did.
Well, the question was actually different, although related: how does one set it up so that it uses, or does not use, these features? If one wanted it to utilize every available optimization, how does one do that? And if one wanted a generic build for clients, despite the new processor on board, how does one set that up?
PS. tried to test how these macros expand but didn't manage to do it. Test in attachment.
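For illustration, a small probe along these lines can show what the compiler itself reports, as opposed to what the CPU supports. This is only a sketch, not code from 7-Zip: the `demo_*' function names are made up, and it relies only on the predefined macros `__GNUC__', `__GNUC_MINOR__', `__GNUC_PATCHLEVEL__' and `__AVX2__' that GCC and clang provide.

```c
#include <stdio.h>

/* Hypothetical helper: GCC-style version as one number, e.g. 60500 for 6.5.0.
   Returns 0 when the compiler does not define __GNUC__.
   Note: clang also defines __GNUC__, but with a fixed compatibility value. */
int demo_gnuc_version(void)
{
#if defined(__GNUC__)
    return __GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__;
#else
    return 0;
#endif
}

/* Hypothetical helper: 1 if AVX2 is enabled for the whole translation unit
   (for example via -mavx2), 0 otherwise. This reflects compiler flags only;
   it says nothing about the CPU the binary will run on. */
int demo_avx2_enabled_by_flags(void)
{
#if defined(__AVX2__)
    return 1;
#else
    return 0;
#endif
}

/* Prints both values so they can be compared across compilers. */
void demo_print_report(void)
{
    printf("__GNUC__ version      : %d\n", demo_gnuc_version());
    printf("AVX2 enabled by flags : %d\n", demo_avx2_enabled_by_flags());
}
```

Calling `demo_print_report()' from `main' under each compiler gives two short lines instead of the full macro dump. A check based on the compiler version answers "can this compiler build an AVX2 branch", and that can be true on a machine whose CPU has no AVX2.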
Last edit: Sam Tansy 2023-11-12
I don't understand your question.
If the compiler supports avx2, 7-zip compiles the avx2 branch of code,
and 7-zip also checks cpuid for avx2 at runtime.
That scheme works without problems with the latest gcc versions.
If it doesn't work with an old gcc compiler, please try to find the reason.
What exact feature changed after gcc v6.5 that prevents compiling 7zip with the old compiler?
Last edit: Igor Pavlov 2023-11-12
The first question is how to solve that particular problem with the test recognizing features that are not present.
I just checked that with mingw-gcc-12:
# link or copy 7z-23.01 source to `7z2301'.
As far as I understand, `k_SwapBytes_Mode_MAX' is the 'maximal' feature available in the CPU. Is that right?
If so, then why is it always 3 (`#define k_SwapBytes_Mode_AVX2 3') when the CPU does not offer this feature? This applies to clang as well.
At the same time CPUID recognizes:
So it's not just 'old gcc'. The new one (gcc-12) has the same problem with these test macros.
Ed. There is something going on with gcc here, as the first program does not compile with gcc-6 but does with gcc-8+.
The second question, or shall I say request, is how to control it or, if that is not possible, to add a mechanism to do so.
As said before, what if one wants to compile in fewer features, in a more generic fashion, so that it works on other, possibly not so advanced and non-Intel (namely pre-Ryzen AMD) computers?
Maybe in a similar fashion to how it is done in `$Z7SRC/C/var_gcc.mak'.
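One conceivable shape for such a switch is a build-time cap on the selected mode. The sketch below is purely hypothetical: `DEMO_MODE_CAP', the `DEMO_MODE_*' values and `demo_select_mode()' are invented for illustration and are not taken from the 7-Zip sources; only the value 3 for AVX2 mirrors the `k_SwapBytes_Mode_AVX2' define quoted above.

```c
/* Hypothetical mode numbers; only AVX2 == 3 mirrors k_SwapBytes_Mode_AVX2. */
enum {
    DEMO_MODE_GENERIC = 0,
    DEMO_MODE_AVX2    = 3
};

/* Hypothetical build-time cap, e.g. cc -DDEMO_MODE_CAP=0 for a generic build.
   Without the define, the cap allows everything up to AVX2. */
#ifndef DEMO_MODE_CAP
#define DEMO_MODE_CAP DEMO_MODE_AVX2
#endif

/* Returns the mode to use: the best mode the CPU reports at runtime,
   limited by what the builder allowed at compile time. */
int demo_select_mode(int cpu_best_mode)
{
    if (cpu_best_mode < DEMO_MODE_GENERIC)
        cpu_best_mode = DEMO_MODE_GENERIC;
    return cpu_best_mode < DEMO_MODE_CAP ? cpu_best_mode : DEMO_MODE_CAP;
}
```

With a cap like this the runtime check stays in charge on capable machines, while a packager could still force a generic binary from the command line without editing the sources.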
Last edit: Sam Tansy 2023-11-12
I don't understand the problem.
I compile one binary that will be run on any system.
Also I support all compilers when it's possible.
So the source code checks what exact features are supported by compiler. I know what version of GCC supports AVX/AVX2 and so on. So I check version of compiler to enable SSE2/AVX2 code.
So I try to use all features (AVX, AVX2 and so on) if compiler supports them.
But at runtime there is another check with cpuid also. So there are two checks:
1) compile time check. If compiler is new, then we have many branches of code in binary.
2) runtime check that selects branch of code depending on cpuid.
Both checks work as expected in new compilers.
If something doesn't work with an old compiler, then I want to know which exact compiler and why.
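The two checks described above can be sketched roughly as follows. This is not 7-Zip's code: all `demo_*' and `DEMO_*' names are hypothetical, the gcc 4.9 threshold is a conservative assumption for using intrinsics inside a per-function `__target__', and the runtime test uses the GCC/clang builtin `__builtin_cpu_supports' in place of a hand-written cpuid check.

```c
#include <stddef.h>
#include <stdint.h>

/* Compile-time check: build the AVX2 branch only where the compiler can.
   DEMO_HAVE_AVX2_BRANCH is a hypothetical macro; the 4.9 threshold is an
   assumption, not the condition used by 7-Zip. */
#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__clang__) || (defined(__GNUC__) && \
     (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 9))))
#define DEMO_HAVE_AVX2_BRANCH 1
#include <immintrin.h>

/* Only this function is compiled with AVX2 enabled; the rest of the file
   stays at the baseline instruction set. */
__attribute__((__target__("avx2")))
static void demo_swap16_avx2(uint8_t *p, size_t n)
{
    /* Per 128-bit lane: swap the two bytes of every 16-bit word. */
    const __m256i mask = _mm256_setr_epi8(
        1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14,
        1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
    size_t i = 0;
    for (; i + 32 <= n; i += 32) {
        __m256i v = _mm256_loadu_si256((const __m256i *)(p + i));
        _mm256_storeu_si256((__m256i *)(p + i), _mm256_shuffle_epi8(v, mask));
    }
    for (; i + 2 <= n; i += 2) {   /* scalar tail */
        uint8_t t = p[i]; p[i] = p[i + 1]; p[i + 1] = t;
    }
}
#endif

/* Generic branch: always compiled, works on any CPU. */
static void demo_swap16_generic(uint8_t *p, size_t n)
{
    for (size_t i = 0; i + 2 <= n; i += 2) {
        uint8_t t = p[i]; p[i] = p[i + 1]; p[i + 1] = t;
    }
}

/* Runtime check: pick the branch according to what the CPU reports. */
void demo_swap16(uint8_t *p, size_t n)
{
#ifdef DEMO_HAVE_AVX2_BRANCH
    if (__builtin_cpu_supports("avx2")) {
        demo_swap16_avx2(p, n);
        return;
    }
#endif
    demo_swap16_generic(p, n);
}
```

In this arrangement the compile-time condition only decides whether the AVX2 branch exists in the binary; on a CPU without AVX2 that branch is present but never executed.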
Last edit: Igor Pavlov 2023-11-12
I tested it with newer compilers, namely mingw-gcc-8.1 and linux-gcc-8.2, so gcc-8, and also with mingw-gcc-12. While MinGW manages it fine, Linux throws similar errors (in AesOpt.o, Sha1Opt.o, Sha256Opt.o). So it is a relevant question.
Saying that it's because of the compiler is not going to change it. It's obviously not 'just a compiler' thing.
It looks more or less this way:
To avoid cluttering the thread, the log is in a paste.
The funny thing is that they (mingw-gcc, linux-gcc) produce similar intermediate assembler but do not compile it the same way (in attachment).
Last edit: Sam Tansy 2023-11-18
https://github.com/xmrig/xmrig/issues/3081
You were right about binutils. I had recently restarted the system and not loaded the new modules. Sorry for the mess.
So is it a usual situation that a user has a new gcc but an old gas (GNU Assembler) from binutils?
Doesn't gcc require new binutils when it is installed?
They are different packages. One can get a newer compiler working with older binutils. It can also happen when GCC is compiled from scratch.
Distributions often provide both, so both are equally up to date.
The chance of noticing a difference is actually very slim, since it works until one compiles new code that uses new CPU features unsupported by the old binutils.
The vast majority of programs don't use these CPU features, with the exception of games maybe, or some specialized applications, and choosing between implementations based on a runtime check of processor features is even rarer.
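A quick way to tell the two apart might be a tiny probe file that forces the assembler to encode the instructions from the error log. This is a sketch under assumptions: the `demo_*' and `DEMO_*' names are hypothetical, it is restricted to x86-64 with a GCC-compatible compiler, and the function is meant to be compiled only, never called.

```c
/* Hypothetical probe, restricted to x86-64 GCC-compatible compilers. */
#if defined(__GNUC__) && defined(__x86_64__)
#define DEMO_SHA_PROBE_BUILT 1

/* Never call this function: it exists only so that the assembler has to
   encode SHA instructions. On a CPU without SHA extensions, executing it
   would raise an illegal-instruction fault. */
void demo_sha_assembler_probe(void)
{
    __asm__ volatile(
        "sha1msg1 %%xmm0, %%xmm1\n\t"
        "sha256rnds2 %%xmm0, %%xmm1, %%xmm2\n\t"
        :
        :
        : "xmm1", "xmm2");
}
#else
#define DEMO_SHA_PROBE_BUILT 0
#endif

/* Reports whether the probe was part of this build at all. */
int demo_sha_probe_built(void)
{
    return DEMO_SHA_PROBE_BUILT;
}
```

If compiling this file with `gcc -c' fails with `no such instruction', the compiler accepted the source and the assembler from binutils is the part that is too old, which matches the errors at the top of the thread.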
The GCC documentation mentions support for AVX since gcc-4.7, or even earlier; the Binutils ChangeLog mentions xmm in 2011, 2012, and then 2017+; the Gas testsuite in 2008-2013; and `as.info' specifies xmm registers in section `9.15.6 Register Naming' in 2013 (I didn't check earlier).
One can expect it to be supported if it's in the documentation.
Last edit: Sam Tansy 2023-11-20