Windows 11, limited user account: 7-max 24.01 (x64) : Copyright (c) 2003-2024 Igor Pavlov : 2024-05-10 Windows 10.0 26100 : Microsoft Hv : Hv#1 : 10.0.26100.6.0.6584 x64 6.A503 threads:12 128TB Large Page Min Size = 2097152 bytes ntdll.dll : NtAllocateVirtualMemory : 00007FF9B4DA3520 : 16 : 4c 8b d1 b8 18 00 00 00 f6 04 25 08 03 fe 7f 01 75 03 0f 05 kernelbase.dll : CreateProcessInternalW : 00007FF9B2324EB0 : 12 : 40 53 56 57 41 54 41 55 41 56 41 57 b8 90 1e 00 00 e8 9d 41 kernel32.dll : CreateProcessInternalW...
We don't want random blue screens and failures when 7-max is used. We are not trying to make 7-max work for everyone. We support only the latest systems that are stable for 7-max. Also, 7-max needs a large amount of RAM for best performance, and old systems usually don't have much RAM. You could try updating Windows 10 to a newer revision.
So please document all such known problems and their details, including the blocking, in a separate section of the help file, or in the "General Information" section if there is very little information. Otherwise, we are left guessing. Is there no chance that this was fixed by subsequent updates with a higher final UBR? And is it possible to bypass the blocking and try at your own risk? You could also describe the bypass and verification method in the help file, so that everyone could personally...
So please document all such known problems and their details, including the blocking, in a separate section of the help file. Otherwise, we are left guessing. Is there no chance that this was fixed by subsequent updates with a higher final UBR? And is it possible to bypass the blocking and try at your own risk? You could also describe the bypass and verification method in the help file, so that everyone could personally double-check their system if they want.
Some Windows 10 versions before builds 16299 and 18362 contain at least two different bugs for large pages that can cause a blue screen of death or other failures. So we block these problematic Windows revisions in 7-max. We suppose that Windows 10 works incorrectly with "Large Pages" at:
- Windows 10 1703 (15063) : incorrect allocating after VirtualFree()
- Windows 10 1709 (16299) : incorrect allocating after VirtualFree()
- Windows 10 1809 (17763) : the failures for blocks of 1 GiB and larger, if...
Pentium N3710 @ Windows 10 2015 LTSB (10.0.10240) x64 AMD E-450 @ Windows 10 2016 LTSB (10.0.14393) x32
Write information about your Windows Version and CPU name.
Always says "this Windows version is not safe for Large Pages". I've already tried two versions of the OS. On what basis does it determine this safety? Why couldn't you list in the help which OS versions are not safe? Or is this determined on the fly by some tests, and not by the OS version number? And what is the unsafety? Maybe everything is fine and the determination is wrong based on a false indicator? How can I bypass the restriction?
win 11 benchmark test
Now I'm not ready to debug it.
Crashes Chromium
I just checked with a more recent build of PDF-XChange Editor and OCRed a big file via 7-max with no problem :) I suppose the developers had a hard time implementing the Abbyy Fine Reader engine, but it's OK now. As for TC, it still refuses to start with the same error. I've checked with a clean profile and .ini, and it doesn't matter whether it's the release or the beta.
Igor, I've never been able to run TC via 7-max. This is the x64 version of TC and 7-max, and I just chose TOTALCMD64.EXE. The policy was set long ago, it works, and my account has administrative rights. I think you can't: PDF X-Change Editor doesn't have any trial, and without a working license it's just a viewer and you can't OCR anything. And I can't share my license because it's pinned to my 2 PCs via internet activation. They use the licensed Abbyy Fine Reader engine for it. I'll try ASAP, maybe it's solved...
Igor, I've never been able to run TC via 7-max. This is the x64 version of TC and 7-max, and I just chose TOTALCMD64.EXE. I think you can't: PDF X-Change Editor doesn't have any trial, and without a working license it's just a viewer and you can't OCR anything. And I can't share my license because it's pinned to my 2 PCs via internet activation. They use the licensed Abbyy Fine Reader engine for it. I'll try ASAP, maybe it's solved now with a newer version.
Please describe the exact simplified steps to reproduce the problem for each of these two issues: 1) issues with a PDF editor; 2) TC can't be run via 7-max. I'm not sure that I will debug it now. But maybe I'll look into it later.
No, Igor, where I had the issues was a PDF editor and I tried to OCR some not very large text, but there was enough memory. Something like this. And TC can't be run via 7-max.
No, Igor, where I had the issues was a PDF editor and I tried to OCR some not very large text, but there was enough memory. Something like this. And TC can't be run via 7-max. [url=https://www.upload.ee/image/17436977/TC_7-Max.jpg][img]https://www.upload.ee/thumb/17436977/TC_7-Max.jpg[/img][/url]
WinRAR here is just an example for 7-max testing, because we can see the performance difference in the WinRAR benchmark. So we can test how subprocess creation works in different cases: running Total Commander with 7-max, and WinRAR from Total Commander. Also, there are probably different ways to run a subprocess. Maybe 7-max supports only some of them.
The problem is that I tested it back then, and now I don't remember exactly. If I see something helpful, I'll report it. WinRAR 7.1 beta has -mlp itself, because the advantage is obvious.
7-max also intercepts subprocess creation functions. That is another point of possible problems. So if you know of bad cases, please describe them, and include information about your Windows version. Maybe some simplified test cases can help, where we run some programs (for example, WinRAR) from other programs (for example, Total Commander). Note that I don't have Total Commander.
@ipavlov Igor, what I noticed myself is that sometimes, even if software initially starts fine with 7-max, it then tries to create subprocesses, and at that point everything breaks. I think it's because, even if 7-max intercepts the memory calls, the subprocesses don't understand what is returned to them. But maybe I'm wrong, because I simply don't know.
We can have different situations: 1) Some error message or crash with 7-max. I don't know of such cases, but I haven't tested many programs with 7-max. 7-max works at a low level of the Windows Win32 API and uses some hacking methods to intercept low-level memory allocation functions, so some problems are possible there. We would need more testing with different programs, but 7-max does not have enough users for such a task yet. 2) 7-max works, but there is no gain. It's a normal case....
@ipavlov Igor, I've had these questions for a long time, and it's hard to formulate them exactly, but I'll try. When you released 7-max 24.01, many users of the Ru-Board forum tried to boost the software they used via 7-max, and we found out that not every program can handle this mode. For example, WinRAR and RapidCRC Unicode ran from the start as if this mode were meant for them, but other software gave "Access Violation" errors or crashed suddenly whatever we did. So my questions are: why the part...
A common problem with unreplaced strings.
Windows 10.0 19045 Intel(R) Xeon(R) CPU E5-2450 v2 @ 2.50GHz
Well, this was before the PC restart and I deleted the a.txt
OS: Microsoft Windows 11 Professional (x64) Build 22631.4391 (23H2) CPU: Intel Core i7-6700K (Skylake-S) OC 4702 MHz (47.00x100.0) Motherboard: ASUS PRIME Z270-A Chipset: Intel Z270 (Kaby Lake) Memory: 32768 MBytes @ 1500*2 MHz, 15-17-17-35 Graphics: NVIDIA GeForce GTX 1070, 8192 MB GDDR5 SDRAM Drive: Samsung SSD 970 EVO Plus 500GB, 488.4 GB, NVMe
We'll test it eventually, hopefully (low usage because it's not on GitHub).
"Sometimes we could need stdout for timer" There is no way to safely pipe anything with timer spoiling stdout. How do you imagine that:
x:\> timer.exe gzip.exe -c a.exe > a.exe.gz
It will work, sure, but the resulting archive is corrupt. That's it. Or piping, or simply decompressing some database backup through stdout:
x:\> timer.exe lzma.exe -dc db.lzma | process-a-users-script
Good luck. "Maybe we should use some switch between stdout/stderr" That makes more sense. Printing to stderr should...
7-max still has few users. I hoped that someone would test games with 7-max, but nobody has. So we don't know how useful 7-max can be.
Sometimes we may need stdout for timer. For example, that way we can see which operation was performed if we have many calls of timer in the same file. Maybe we should add a switch to choose between stdout and stderr.
Create 7max Github page
I used timer to measure processing time and, to my surprise, when I redirected the tested application's output to a file, timer's output was redirected too. What's worse, it mixes with the tested program's output. This should never happen: it should be printed to standard error, not to standard output. I checked the source, and sure enough, it uses `printf`. It could use `fprintf(stderr, ...)`.
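The change proposed in this post, combined with the stdout/stderr switch suggested in the reply, might look roughly like the sketch below. The helper names `timer_stream` and `print_report` and the message format are hypothetical; nothing here is taken from the timer source. It only illustrates sending the report to `stderr` by default so that redirected program output stays clean.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: choose where timer's own report goes.
   The default (use_stdout == 0) is stderr, so that
   "timer.exe prog > file" leaves the file untouched by the report. */
FILE *timer_stream(int use_stdout)
{
  return use_stdout ? stdout : stderr;
}

/* Hypothetical report function; the message format is an assumption.
   Returns the number of characters written, or a negative value on error. */
int print_report(FILE *out, double seconds)
{
  assert(out != NULL);
  return fprintf(out, "Elapsed: %.3f s\n", seconds);
}
```

A call such as `print_report(timer_stream(0), 1.25)` would then write only to the error stream, while `timer_stream(1)` keeps the old stdout behavior available behind a switch.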
"Your i5-3570 can install and work in Windows 11 without any problem?" I had to bypass the checks for TPM v. 2.0 (mine is v. 1.2), but Windows 11 installed and is running with no problems. "Please try 7-max test again after reboot, and close all another programs including browser." Done; file attached.
Can your i5-3570 install and run Windows 11 without any problems? Please try the 7-max test again after a reboot, and close all other programs, including the browser.
Intel Core i5-3570 L1 Cache:Instruction: 4 x 32 KBytes, Data: 4 x 32 KBytes L2 Cache:Integrated: 4 x 256 KBytes L3 Cache: 6 MBytes Instruction TLB: 2MB/4MB Pages, Fully associative, 8 entries Data TLB: 4 KB Pages, 4-way set associative, 64 entries Total Memory Size: 16 GBytes Maximum Supported Memory Clock: 800.0 MHz Current Memory Clock: 653.7 MHz Current Timing (tCAS-tRCD-tRP-tRAS): 9-9-9-24 Memory Channels Supported: 2 Memory Channels Active: 2 960 GB SSD Microsoft Windows [Version 10.0.26120...
Maybe there are some "Branch Prediction Optimization": https://wccftech.com/amd-branch-prediction-optimization-ryzen-9000-7000-cpus-available-windows-11-23h2/ https://wccftech.com/amd-ryzen-9000-gaming-performance-update-revised-testing-parity-intel-14th-gen-cpus-optimized-branch-prediction-boost/
R-110 probably overflows the uop cache.
R-xxx tests will overflow caches, where xxx is the number of additional instructions between branches. But there are also several branches per iteration, so the total number of instructions per loop iteration can be big. As I remember, there were some changes in 22.00 for the Linux version, because it used C code instead of asm code. Maybe there were some changes for Apple M1 support. The Windows version used asm code, so it doesn't depend on the C compiler. Why does KB5041587 affect the results?
I took some time today to resolve the compilation issues in Visual Studio 2022. After running some tests, I found that the results for 2200 and 1400 were not significantly different (with the Windows KB5041587 update). By the way, which tests overflow the uop cache?
It was not tested with new compilers. I just wanted to get some changes into the new version. You can fix that code to ignore the warning/error.
********************************************************************** ** Visual Studio 2017 Developer Command Prompt v15.9.64 ** Copyright (c) 2017 Microsoft Corporation ********************************************************************** [vcvarsall.bat] Environment initialized for: 'x86' C:\Program Files (x86)\Microsoft Visual Studio\2017\Community>cd D:\backup\benchmark\7bench2200\7bench2200-src\CPP\Utils\CPUTest\MemLat C:\Program Files (x86)\Microsoft Visual Studio\2017\Community>d: D:\backup\benchmark\7bench2200\7bench2200-src\CPP\Utils\CPUTest\MemLat>nmake...
cd /D CPP\Utils\CPUTest\MemLat\
nmake
Thanks for your reply. I downloaded version 2200, but I don't know how to build it. I tried nmake -f CPP/Build.mak, but got an error:
nmake -f build.mak
Microsoft (R) Program Maintenance Utility Version 14.40.33813.0
Copyright (C) Microsoft Corporation. All rights reserved.
link -nologo -OPT:REF -OPT:ICF /LARGEADDRESSAWARE /FIXED:NO -out:o\ oleaut32.lib ole32.lib user32.lib advapi32.lib shell32.lib
LINK : fatal error LNK1104: cannot open file 'o\'
NMAKE : fatal error U1077: 'link -nologo -OPT:REF...
If you want to test the latest version of 7-benchmark, you can download it here (source code only): https://sourceforge.net/projects/sevenmax/files/7-Benchmark/7bench2200-src.7z/download
That "r2" test doesn't depend on the L2 cache. But if you run the Windows version with pipelen r, there are some tests that have big code, and that big code overflows the micro-op cache. So we can try to estimate the micro-op cache miss penalty.
perf stat -C 0 -e cpu_core/instructions/,cpu_core/branch/,cpu_core/branch-miss/,cpu_core/l2_request.all/ taskset -c 0 ./pipelen r2 PipeLen64 14.00 : Igor Pavlov : Public domain : 2014-01-04 r2 Branches 0 1 0-1 Random Len1 Len2 32 4.39 24.52 13.98 12.91 -3.09 -2.14 64 7.56 28.58 17.79 18.64 1.14 1.69 128 9.24 25.25 19.77 19.35 4.21 -0.83 256 9.96 26.09 17.05 17.45 -1.14 0.80 512 10.30 26.49 17.42 16.84 -3.11 -1.16 1-K 10.47 26.69 17.59 17.95 -1.26 0.72 2-K 8.70 26.78 17.67 17.92 0.36 0.50 4-K 8.73...
Intel i7-8750H CPU, GTX 1050Ti Max Q, 1Tb SSD, 16Gb 2666 MT/s
The CPU Usage column shows an incorrect value for the 1-thread benchmark. I don't know the reason for that issue yet. Maybe something was changed in Windows 11.
GIGABYTE B760M GAMING AC DDR4 Microsoft Windows 11 Professional (x64) Build 22631.3880 (23H2) 2x 32GB = 64GB DDR4-3600 18-22-22-42 CR2 Intel Core i5-13500 (Raptor Lake-S 6+8) C:\Program Files\7-Zip>7z b -mmt1 7-Zip 24.07 (x64) : Copyright (c) 1999-2024 Igor Pavlov : 2024-06-19 mt1 Compiler: MSC 1400.140040310 Windows 10.0 22631 : Microsoft Hv : Hv#1 : 10.0.22621.3.0.3880 x64 6.BF02 threads:20 128TB f:5F310C2774C 13th Gen Intel(R) Core(TM) i5-13500 (B06F2) (35->35) 1T CPU Freq (MHz): 4398 4459 4476...
Size         6      12
24-K      3.97    3.96
32-K      4.16    3.97
48-K     11.86    3.96
64-K     11.87    3.97
96-K     11.92    3.97
128-K    11.83    3.97
192-K    11.91    3.97
256-K    11.94    3.97
384-K    11.97    3.97
512-K    15.11    3.97
768-K    34.19    3.96
1024-K   38.38    5.17
1536-K   43.94    9.58
2-M      46.46   13.01
3-M      50.13   16.74
4-M      50.88   18.17
6-M      52.35   18.64
8-M      53.05   19.14
column 6 (256-K) : L1 cache miss - 12 cycles (L2 latency)
column 12 (8-M) : ~19 cycles; the 19 cycles include both the L1 cache miss and the TLB miss.
19 cycles - 12 cycles = 7 cycles : that is DTLB L1...
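The subtraction at the end of this post can be written as a tiny helper. This is only a sketch of the arithmetic under the post's own reading of the table: the function name `tlb_miss_penalty` is hypothetical, and the inputs are the two table values the post points at (column 12 at 8-M and column 6 at 256-K).

```c
#include <assert.h>

/* Estimated extra latency, in cycles, added by a TLB miss:
   the latency measured with both an L1 cache miss and a TLB miss,
   minus the latency with the L1 cache miss alone (the L2 latency). */
double tlb_miss_penalty(double with_tlb_miss, double l2_latency)
{
  assert(with_tlb_miss >= l2_latency);
  return with_tlb_miss - l2_latency;
}
```

With the unrounded table values, `tlb_miss_penalty(19.14, 11.94)` gives about 7.2 cycles, which agrees with the rounded "19 - 12 = 7" in the post.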
Sorry, Turbo Boost to 2208 MHz in setting, and cpufreq governor set as schedutil.
https://www.7-cpu.com/cpu/Zen3.html
4 KB pages mode (64-bit, Linux)
Data TLB L1: 64 items (about 800 KB of memory). ?-assoc. Miss penalty = ? cycles. Parallel miss: ? cycle per access
Data TLB L2: 2048 items. 8-way. Miss penalty = ? cycles. Parallel miss: ? cycles per access (read from L3)
Size    Latency  Increase  Description
32 K       4
64 K       8        4      + 8 (L2)
128 K     10        2
256 K     11        1
512 K     21       10      + 7 (L1 TLB miss)
1 M       36       15      + 35 (L3)
2 M       46       10
4 M       51        5
8 M       53        2
My 5800X is 3.96 cycle in 64K Column "12", not 8 cycle,...
The results show 2200 MHz. Is it a timer bug? Or is the real frequency 2200 MHz instead of 2000 MHz?
GIGABYTE Z390 AORUS PRO WIFI Microsoft Windows 11 Professional (x64) Build 22631.3958 (23H2) KHX3200C18D4/16G * 4 = 64G 18-21-21-39 CR2 PS C:\Program Files\7-Zip> .\7z.exe b -mmt1 7-Zip 24.07 (x64) : Copyright (c) 1999-2024 Igor Pavlov : 2024-06-19 mt1 Compiler: MSC 1400.140040310 Windows 10.0 22631 : Microsoft Hv : Hv#1 : 10.0.22621.3.0.3958 x64 6.9E0D threads:8 128TB f:5F110C2774C Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz (906ED) (FA->FA) 1T CPU Freq (MHz): 4593 4588 4572 4552 4553 4558 4574 RAM...
SoC: Rockchip RK3399 CPU: big.LITTLE,arm64/aarch64, Dual-Core Cortex-A72(up to 2.0GHz) + Quad-Core Cortex-A53(up to 1.5GHz) RAM: 4GB LPDDR4 SYS: OpenWRT 23.05.4 (kernel 6.6.43) mt1 Compiler: ver:9.2.1 20191025 GCC 9.2.1 : UNALIGNED Linux : 6.6.43 : #0 SMP PREEMPT Sat Jul 27 17:42:11 2024 : aarch64 PageSize:4KB THP:always hwcap:8FF:CRC32:SHA1:SHA2:AES:ASIMD LE 1T CPU Freq (MHz): 1469 1446 2063 2203 2190 2200 2201 RAM size: 3858 MB, # CPU hardware threads: 6 RAM usage: 437 MB, # Benchmark threads:...
Column "12" is for the TLB miss. The Zen result was without Page Table Entry (PTE) Coalescing.
Thanks for your reply~ I found that https://www.7-cpu.com/cpu/Zen.html says: 512 K 20 4 + 8 (L1 TLB miss). How do I get the +8 cycle number?
Column "12" is for the 4 KB page TLB. But measuring can be complicated for AMD processors if Page Table Entry (PTE) Coalescing is working there, where one TLB entry can cover 4 pages: 4 * 4 KB = 16 KB.
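To see why coalescing complicates the measurement, it may help to compute how much memory a TLB level covers with and without it. The sketch below is only arithmetic: the function name `tlb_reach_kb` is hypothetical, and the 64-entry figure in the usage note is the L1 data TLB size quoted elsewhere in this thread for Zen 3, used purely as an illustrative input.

```c
#include <assert.h>

/* Memory covered by one TLB level, in KB.
   pages_per_entry is 1 without coalescing and 4 when one entry
   covers four 4 KB pages, as described in the post. */
unsigned long tlb_reach_kb(unsigned long entries,
                           unsigned long page_kb,
                           unsigned long pages_per_entry)
{
  assert(entries > 0 && page_kb > 0 && pages_per_entry > 0);
  return entries * page_kb * pages_per_entry;
}
```

For 64 entries and 4 KB pages, `tlb_reach_kb(64, 4, 1)` is 256 KB, while `tlb_reach_kb(64, 4, 4)` is 1024 KB, so the size at which TLB misses start to show up in a latency table can shift depending on whether coalescing is active.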
Start-Process -FilePath "MemLat64.exe" -ArgumentList "512 p" -Wait -NoNewWindow -PassThru | ForEach-Object { $_.ProcessorAffinity = 0x02 } MemLat64 14.00 : Igor Pavlov : Public domain : 2014-01-04 512 p Size 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 4-K 3.98 3.98 3.98 3.97 3.97 3.97 3.97 3.96 3.96 3.96 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 6-K 3.98 3.97 3.97 3.97 3.97 3.97 3.96 3.96 3.96 3.96 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 8-K 3.98 3.98 3.98...
Windows 10 IoT Enterprise LTSC
O_0 My goodness! I waited for 7max update for almost 18 years! And here it is! Finally! Thanks Igor. My version of report taken on i7-3770K + Windows 7 A couple of tests later...
Thanks. Right now 7-max has a small number of users, so development has been suspended. If the gain from 7-max can be confirmed and the number of users grows, I can try to improve the interface code.
I can test in about two weeks on 12900K and RTX 4090. Games. Will let you know if I don't forget. However two things I see that could be taken care of: 1. drag & drop on 7-max icon - run a program, not throw an error 2. drag & drop on empty space in 7-max window - the same
Hello! This software seems pretty interesting. Thank you!
I don't plan intensive development of 7-max now. We have a small number of responses about the usefulness of that program. I hoped that some games could get a performance gain with 7-max, but no one has tried 7-max with games yet. I suppose game benchmarking is not so simple. It would be good if some game benchmarking site or some gaming forum users tested 7-max with games for an accurate comparison of results.
Can you fix drag-and-drop of a program onto the 7-max icon?
Thanks. Can you test also WinRAR benchmark with multithreading and multithreading-off ?
Here's another benchmark. System: AMD Ryzen 7950X, 64 GB RAM, 2 TB SSD, Win11 23H2
In 7-Zip 24.05 there is a new RISCV filter. You can benchmark it like this:
7zz b -mm=riscv -md25 -mtic=29 -mmt1
7zz b -mm=riscv -md25 -mtic=29 -mmt1 -mfile=big_riscv_elf_file_path
where big_riscv_elf_file_path is some big ELF file for RISC-V that has a big .text section. The first test is for random data. You can get a difference in speed because a real RISC-V ELF file has many branch mispredictions and the filter converts more data. So a real file will be slower for the filter to process. You can get difference in compression...
Thanks! I'll use it. Also I checked new GCC/CLANG compilers in godbolt.org. And these compilers can use rev instructions for my macro
#define Z7_BSWAP32_CONST(v) \
( (((unsigned)(v) << 24) ) \
| (((unsigned)(v) << 8) & (unsigned)0xff0000) \
| (((unsigned)(v) >> 8) & (unsigned)0xff00 ) \
| (((unsigned)(v) >> 24) ))
So probably we can get good performance for swap4, even without __riscv_zbb and _riscv_xtheadbb checks, if extensions are available:
-O2 -march=rv64imafdczbkb
-O2 "-mcpu=thead-c906"
Also...
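For readers who want to check what the macro computes, here is a small self-contained sketch. The macro body is the one quoted in the post; the wrapper `bswap32` and the check function `bswap32_selfcheck` are hypothetical additions for illustration, and the code assumes `unsigned` is 32 bits wide. Whether a given compiler actually emits a single byte-reverse instruction for it depends on the compiler and target flags, as the post describes.

```c
#include <assert.h>

/* Macro as quoted in the post: byte swap of a 32-bit value.
   Assumes `unsigned` is 32 bits wide. */
#define Z7_BSWAP32_CONST(v) \
       ( (((unsigned)(v) << 24)                      ) \
       | (((unsigned)(v) <<  8) & (unsigned)0xff0000 ) \
       | (((unsigned)(v) >>  8) & (unsigned)0xff00   ) \
       | (((unsigned)(v) >> 24)                      ))

/* Hypothetical wrapper: applies the macro to a run-time value,
   which is the pattern a compiler may recognize as a byte swap. */
unsigned bswap32(unsigned v)
{
  return Z7_BSWAP32_CONST(v);
}

/* Hypothetical sanity check: the bytes are reversed, and swapping
   twice returns the original value. */
void bswap32_selfcheck(void)
{
  assert(bswap32(0x11223344u) == 0x44332211u);
  assert(bswap32(bswap32(0xdeadbeefu)) == 0xdeadbeefu);
}
```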