This patch is an attempt to implement optimized code {C/C++ and Asm based} for the LZMA and AES benchmark algorithms using AVX512 instructions, for both single-threaded and multithreaded runs.
- Partially optimized the compression algorithm in the LZMA benchmark.
- Optimized the decryption algorithm in the AES benchmark.
Compiled and tested on Ubuntu {22.04} with the GCC compiler {11.4}, and on Windows with the MSVC compiler {Visual Studio 2022 with C/C++ compiler version 19.37.32824}.
Modified the code in CpuArch.c, MainAr.cpp and SystemInfo.cpp to implement the AVX512 code path separately.
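For readers who want to see what a separate AVX512 check can look like, here is a minimal sketch. It is not the code from the patch: the helper name cpu_has_avx512f is hypothetical, and it relies on the GCC/Clang builtin __builtin_cpu_supports rather than on whatever CpuArch.c does internally. On other compilers or non-x86 targets it simply reports no support.

```c
#include <stdio.h>

/* Hypothetical helper (not from the patch): returns 1 if the CPU reports
   AVX512F support, else 0. Uses the GCC/Clang x86 builtins; on any other
   compiler or architecture it conservatively returns 0. */
int cpu_has_avx512f(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx512f") ? 1 : 0;
#else
    return 0;
#endif
}

/* Prints which code path a caller would take based on the check above. */
void print_code_path(void)
{
    printf("code path: %s\n", cpu_has_avx512f() ? "AVX512" : "baseline");
}
```

A caller would run this check once at startup and cache the result, so the hot loops never repeat the query.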
LZMA benchmark:
* Command used: 7zz.exe b / 7zz b {Windows or Linux respectively}
* Added separate functions and code paths for AVX512 in LzmaEnc.c, LzFind.c, and LzFindOpt.c. The new functions carry "_512" in their names and are selected through function pointers.
* Optimized certain loops by declaring the UPDATEmaxlen macro in LzFind.c and using it in the ReadMatchDistances and GetMatchesSpecN_2 functions. Implemented similar changes in assembly, in the file LzFindOpt.asm, for the multithreaded LZMA benchmark.
* To enable AVX512 instructions on a machine with AVX512 support on Windows with MSVC, USE_AVX512=1 must be set in the environment both before the build and before running the executable.
* To enable AVX512 instructions on Linux with GCC, we use MY_ARCH = -march=native in the 7zip_gcc.mak file, so that the compiler automatically chooses the highest available architecture.
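The function-pointer selection described in the list above can be sketched roughly as follows. This is an illustration under assumptions, not the patch code: MatchLen, MatchLen_512, Avx512_Requested and Select_MatchLen are hypothetical names, the "_512" variant is portable C that compares 64 bytes per step as a stand-in for a real AVX512 compare, and reading USE_AVX512 with getenv at run time is only one possible way to honor that variable.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Common signature so both variants fit one function pointer. */
typedef size_t (*MatchLenFunc)(const unsigned char *a,
                               const unsigned char *b, size_t limit);

/* Baseline variant: counts matching bytes one at a time. */
static size_t MatchLen(const unsigned char *a, const unsigned char *b,
                       size_t limit)
{
    size_t len = 0;
    while (len < limit && a[len] == b[len])
        len++;
    return len;
}

/* Stand-in for a "_512" variant: skips whole 64-byte chunks that match,
   then finishes byte by byte. Portable C, not real AVX512 code. */
static size_t MatchLen_512(const unsigned char *a, const unsigned char *b,
                           size_t limit)
{
    size_t len = 0;
    while (limit - len >= 64 && memcmp(a + len, b + len, 64) == 0)
        len += 64;
    while (len < limit && a[len] == b[len])
        len++;
    return len;
}

/* Assumption: the USE_AVX512=1 switch is read from the environment. */
static int Avx512_Requested(void)
{
    const char *v = getenv("USE_AVX512");
    return v != NULL && strcmp(v, "1") == 0;
}

/* Picks the variant once; callers then go through the pointer. */
static MatchLenFunc Select_MatchLen(int cpuHasAvx512)
{
    return (cpuHasAvx512 && Avx512_Requested()) ? MatchLen_512 : MatchLen;
}
```

Selecting once and calling through the pointer keeps the baseline path untouched on machines without AVX512, which is the point of keeping the "_512" functions separate.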
AES Benchmark:
* Command used: 7zz.exe b -mm=AES256CBC:3 / 7zz b -mm=AES256CBC:3 {Windows or Linux respectively}
* Created a separate 512 variant of the AesCbc_Decode_HW function, named "AesCbc_Decode_HW_512".
* Implemented the assembly version of the same function in AesOpt.asm as a new function with the above name, and set num_ways for the AVX512 variant to 8 instead of the default 11 ways, as we found "num_ways=8" to perform better for the 512 variant.
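To illustrate why the number of ways matters for CBC decoding, here is a small sketch. In CBC, each plaintext block is the decrypted ciphertext block XORed with the previous ciphertext block, so the block decryptions are independent and can be done several at a time. The code below is an assumption-laden illustration: the block transform is a toy invertible function, NOT AES, all function names are hypothetical, and NUM_WAYS only mirrors the value 8 mentioned above without reproducing the assembly in AesOpt.asm.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 16
#define NUM_WAYS 8 /* mirrors the num_ways value reported for the 512 variant */

/* Toy invertible block transform used only for illustration. NOT AES. */
static void Toy_EncryptBlock(uint8_t *dst, const uint8_t *src, const uint8_t *key)
{
    size_t i;
    for (i = 0; i < BLOCK; i++)
        dst[i] = (uint8_t)((src[i] ^ key[i]) + 1);
}

static void Toy_DecryptBlock(uint8_t *dst, const uint8_t *src, const uint8_t *key)
{
    size_t i;
    for (i = 0; i < BLOCK; i++)
        dst[i] = (uint8_t)((uint8_t)(src[i] - 1) ^ key[i]);
}

/* CBC encode is serial: each block depends on the previous ciphertext. */
static void Cbc_Encode(uint8_t *dst, const uint8_t *src, size_t numBlocks,
                       const uint8_t *iv, const uint8_t *key)
{
    uint8_t prev[BLOCK], tmp[BLOCK];
    size_t n, i;
    memcpy(prev, iv, BLOCK);
    for (n = 0; n < numBlocks; n++) {
        for (i = 0; i < BLOCK; i++)
            tmp[i] = (uint8_t)(src[n * BLOCK + i] ^ prev[i]);
        Toy_EncryptBlock(dst + n * BLOCK, tmp, key);
        memcpy(prev, dst + n * BLOCK, BLOCK);
    }
}

/* CBC decode in groups of up to NUM_WAYS blocks. dst and src must not overlap. */
static void Cbc_Decode_Ways(uint8_t *dst, const uint8_t *src, size_t numBlocks,
                            const uint8_t *iv, const uint8_t *key)
{
    size_t n = 0;
    while (n < numBlocks) {
        size_t ways = (numBlocks - n < NUM_WAYS) ? numBlocks - n : NUM_WAYS;
        size_t w, i;
        /* Step 1: the block decryptions in a group are independent. */
        for (w = 0; w < ways; w++)
            Toy_DecryptBlock(dst + (n + w) * BLOCK, src + (n + w) * BLOCK, key);
        /* Step 2: XOR each result with the previous ciphertext block (or IV). */
        for (w = 0; w < ways; w++) {
            const uint8_t *prev = (n + w == 0) ? iv : src + (n + w - 1) * BLOCK;
            for (i = 0; i < BLOCK; i++)
                dst[(n + w) * BLOCK + i] ^= prev[i];
        }
        n += ways;
    }
}
```

In a vectorized version, the loop in step 1 is what gets spread across wide registers, so the best group size depends on how many blocks the hardware can keep in flight at once. That is why it is tuned per variant.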
Also tested the overall code on AVX2 machines.
Results for the LZMA and AES benchmarks, taken on Intel Icelake {I5-13600K} and AMD 7600X, are attached below along with the patch.
Hi @ipavlov, please refer to the updated performance results sheet attached to this comment. It contains just a minor update to the column names of the LZMA performance results. Also, the Intel machine we tested on is {I5-1035G1}, not I5-13600K as quoted in the above message. Thanks
Last edit: sai krishna 2024-01-29
Hi @ipavlov, we have re-run our implementations on AMD Ryzen 5 9600X {Granite}. With the AES decryption algorithm implemented with AVX512, we observe an approx. 77% boost in comparison to the base version. Kindly check the implementations and let us know if these can be integrated. Thanks.