
#511 How to control compilation so there is/is not SSE, AVX and so on?

open
nobody
None
5
2023-11-20
2023-11-12
Sam Tansy
No

As suggested, I tried to compile 7-Zip from scratch and stumbled on errors about SSE or AVX instructions that the compiler (gcc) does not accept. I could get past them by manually adding the `-msse/-mavx' options to the compiler, but that's not what I want. Besides, AVX is not available on this processor, and I have no idea how the build decided it could use it.

Almost all sha1 files (`C/Sha1Opt.c')

/tmp/ccDR9z2w.s:55: Error: no such instruction: `sha1nexte %xmm0,%xmm2'
/tmp/ccDR9z2w.s:57: Error: no such instruction: `sha1msg1 %xmm0,%xmm1'

sha256 (`C/Sha256Opt.c')

/tmp/ccQNSw71.s:92: Error: no such instruction: `sha256rnds2 %xmm0,%xmm1,%xmm2'
/tmp/ccQNSw71.s:96: Error: no such instruction: `sha256rnds2 %xmm0,%xmm2,%xmm1'

then C/SwapBytes.c

../../../../C/SwapBytes.c: In function ‘ShufBytes_256’:
../../../../C/SwapBytes.c:312:7: warning: implicit declaration of function ‘_mm256_set_m128i’ [-Wimplicit-function-declaration]
_mm256_set_m128i(
^~~~~~~~~~~~~~~~
../../../../C/SwapBytes.c:312:7: error: incompatible types when initializing type ‘__m256i {aka const __vector(4) long long int}’ using type ‘int’

There is no option in any Makefile to control it. Even un/defining stuff like -Dk_SwapBytes_Mode_MAX=0, which I thought would turn it off in `SwapBytes.c', doesn't work.

Moreover, if one wants to build with or without some specific optimization, there is no way to do it; the build uses whatever is decided automatically. Likewise, there is no way to make a generic build if one needs it.

When I tried to check which macros are defined in this file, hoping to control the process,
cpp -dM ../../../../C/SwapBytes.c
I got 200 kB of text, which at this point is totally unmanageable.

What else can one do to make it work?

Discussion

  • Igor Pavlov

    Igor Pavlov - 2023-11-12

    What compiler version?

    SwapBytes.c checks whether the compiler version supports avx2:

      #if defined(__clang__) && (__clang_major__ >= 4) \
          || defined(Z7_GCC_VERSION) && (Z7_GCC_VERSION >= 40701)
          #define k_SwapBytes_Mode_MAX  k_SwapBytes_Mode_AVX2
          #define SWAP_ATTRIB_SSE2  __attribute__((__target__("sse2")))
          #define SWAP_ATTRIB_SSSE3 __attribute__((__target__("ssse3")))
          #define SWAP_ATTRIB_AVX2  __attribute__((__target__("avx2")))
    

    Does your compiler accept __attribute__((__target__("avx2")))?

     

    Last edit: Igor Pavlov 2023-11-12
    • Sam Tansy

      Sam Tansy - 2023-11-12

      It's gcc v6.5, and your test is for clang, as `#if' states.

      I also checked the documentation, and __attribute__ ((__target__ ("target"))) works. According to the documentation, it has been available since at least gcc-4.4 (Function-Attributes). It should recognize AVX as well (X86-Built_in-Functions), and AVX2 since 4.7.

      ... checks whether the compiler version supports avx2

      And it decided that it does when it does not. This processor does not have AVX2, yet somehow the test thought it did.

      Well, the question was actually different, although related: how to set it up so that it uses or does not use these features? If one wanted it to utilize every available optimization, how does one do that? And if one wanted a generic build for clients, despite the new processor on board, how does one set that up?

      PS. tried to test how these macros expand but didn't manage to do it. Test in attachment.

       

      Last edit: Sam Tansy 2023-11-12
      • Igor Pavlov

        Igor Pavlov - 2023-11-12

        I don't understand your question.
        If the compiler supports avx2, 7-Zip compiles the avx2 branch of the code,
        and 7-Zip also checks cpuid for avx2 at runtime.

        That scheme works without problems with the latest gcc compiler versions.
        If it doesn't work with an old gcc compiler, please try to find the reason.
        What exact feature changed after gcc v6.5 that prevents 7-Zip from compiling with the old compiler?

         

        Last edit: Igor Pavlov 2023-11-12
        • Sam Tansy

          Sam Tansy - 2023-11-12

          The first question is how to solve that particular problem of the test recognizing features that are not present.

          That scheme works without problems with the latest gcc compiler versions.

          I just checked that with mingw-gcc-12:

          #include <stdio.h>
          
          #include "Compiler.h"
          #include "CpuArch.h"
          #include "SwapBytes.h"
          
          #include "SwapBytes.c"
          
          int main()
              {
              //printf("MY_CPU_X86_OR_AMD64=%d\n", MY_CPU_X86_OR_AMD64);
              printf("k_SwapBytes_Mode_MAX=%d\n", k_SwapBytes_Mode_MAX);
          
              return 0;
              }
          

          # link or copy 7z-23.01 source to `7z2301'.

          $ gcc --version
          gcc version 12.2.0 (MinGW-W64 i686-msvcrt-posix-dwarf)

          $ gcc -oattr7z2 attr7z2.c; ./attr7z2

          MY_CPU_X86_OR_AMD64=1
          k_SwapBytes_Mode_MAX=3

          $ gcc -o attr7z3 -I7z2301/C attr7z3.c 7z2301/C/CpuArch.c; ./attr7z3

          MY_CPU_X86_OR_AMD64=1
          k_SwapBytes_Mode_MAX=3

          As far as I understand, `k_SwapBytes_Mode_MAX' is the 'maximal' feature available in the CPU. Is that right?
          If so, then why is it always 3 (`#define k_SwapBytes_Mode_AVX2 3') when the CPU does not offer this feature? This applies to clang as well.

          At the same time, CPUID reports:

          $ gcc -o attr7z_c -I7z2301/C attr7z_c.c 7z2301/C/CpuArch.c; ./attr7z_c

          MY_CPU_X86_OR_AMD64=1
          k_SwapBytes_Mode_MAX=3
          CPU_IsSupported_AES()=0
          CPU_IsSupported_AVX()=0
          CPU_IsSupported_AVX2()=0
          CPU_IsSupported_SSE()=1
          CPU_IsSupported_SSE2()=1
          CPU_IsSupported_SSSE3()=1
          CPU_IsSupported_SSE41()=0

          So it's not just 'old gcc'. The new one (gcc-12) has the same problem with these test macros.

          Edit: There is something gcc-specific here, as the first program does not compile with gcc-6 but does with gcc-8+.

          The second question, or shall I say request, is how to control it or, if that is not possible, to add a mechanism to do so.
          As said before, what if one wants to compile in fewer features, in a more generic fashion, so that it works on other, possibly less advanced and non-Intel (namely pre-Ryzen AMD) computers?
          Maybe in a similar fashion to how it is done in `$Z7SRC/C/var_gcc.mak'.

           

          Last edit: Sam Tansy 2023-11-12
          • Igor Pavlov

            Igor Pavlov - 2023-11-12

            I don't understand the problem.
            I compile one binary that will be run on any system.
            Also I support all compilers when it's possible.
            So the source code checks what exact features are supported by compiler. I know what version of GCC supports AVX/AVX2 and so on. So I check version of compiler to enable SSE2/AVX2 code.
            So I try to use all features (AVX, AVX2 and so on) if compiler supports them.
            But at runtime there is another check with cpuid also.
            So there are two checks:
            1) compile time check. If compiler is new, then we have many branches of code in binary.
            2) runtime check that selects branch of code depending on cpuid.
            Both checks work as expected in new compilers.
            If something doesn't work with an old compiler, then I want to know which exact compiler and why.

             

            Last edit: Igor Pavlov 2023-11-12
            • Sam Tansy

              Sam Tansy - 2023-11-18

              I tested it with newer compilers, namely mingw-gcc-8.1 and linux-gcc-8.2 (so gcc-8), and also with mingw-gcc-12. While MinGW handles it fine, Linux throws similar errors (in AesOpt.o, Sha1Opt.o, Sha256Opt.o). So it is a relevant question.
              And saying that it's because of the compiler is not going to change that. It's obviously not 'just a compiler' thing.

              It looks more or less like this:

              7z2301/CPP/7zip/Bundles/Alone2 $ gcc-8.2.0  -O2 -c -Wall -Wextra -Waddress -Waggressive-loop-optimizations -Wattributes -Wcast-align -Wcomment -Wdiv-by-zero -Wformat-contains-nul -Winit-self -Wint-to-pointer-cast -Wunused -Wunused-macros  -Wbool-compare -Wduplicated-cond  -Wcast-align -Wconversion -Wmaybe-uninitialized -Wmisleading-indentation  -Wno-strict-aliasing   -DNDEBUG -D_REENTRANT -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -fPIC   -o b/g/AesOpt.o ../../../../C/AesOpt.c
              /tmp/ccvq4qCk.s: Assembler messages:
              /tmp/ccvq4qCk.s:446: Error: operand type mismatch for `vaesdec'
              (x4)
              /tmp/ccvq4qCk.s:462: Error: operand type mismatch for `vaesdeclast'
              (x4)
              /tmp/ccvq4qCk.s:627: Error: operand type mismatch for `vaesenc'
              (x4)
              /tmp/ccvq4qCk.s:640: Error: operand type mismatch for `vaesenclast'
              (x4)
              make: *** [b/g/AesOpt.o] Error 1
              

              To avoid cluttering the thread, the log is in a paste.

              The funny thing is that they (mingw-gcc, linux-gcc) produce similar intermediate assembler but do not assemble it the same way (in attachment).

               

              Last edit: Sam Tansy 2023-11-18
              • Igor Pavlov

                Igor Pavlov - 2023-11-19

                https://github.com/xmrig/xmrig/issues/3081

                It looks like you have incompatible gas (GNU Assembler) version which doesn't support VAES instructions properly.

                That was it! Using the latest version of binutils (2.38) solved the problem. I was using 2.27.

                 
                • Sam Tansy

                  Sam Tansy - 2023-11-19

                  You were right about binutils. I had recently restarted the system and had not loaded the new modules. Sorry for the mess.

                   
                  • Igor Pavlov

                    Igor Pavlov - 2023-11-19

                    So is it a usual situation that a user has a new gcc but an old gas (GNU Assembler) from binutils?
                    Doesn't gcc require new binutils when it is installed?

                     
                    • Sam Tansy

                      Sam Tansy - 2023-11-20

                      They are different packages. One can get a newer compiler working with older binutils. It can also happen when GCC is compiled from scratch.
                      Distributions often provide both, so both are equally up to date.
                      The chance of noticing a difference is actually very slim, as it all works until one compiles new code that uses new CPU features unsupported by the old binutils.
                      The vast majority of programs don't use these CPU features, with the exception of games maybe, or some specialized applications, and programs that choose between implementations based on a runtime check of processor features are even rarer.

                       
                      • Sam Tansy

                        Sam Tansy - 2023-11-20

                        The GCC documentation mentions support for AVX since gcc-4.7, or even earlier; the Binutils ChangeLog mentions xmm in 2011, 2012, and then 2017+; the Gas testsuite in 2008-2013; and `as.info' specifies xmm registers in section `9.15.6 Register Naming' in 2013 (didn't check earlier).
                        One can expect it to be supported if it's in the documentation.

                         
  • Sam Tansy

    Sam Tansy - 2023-11-20

    -

     

    Last edit: Sam Tansy 2023-11-20
