Improved whirlpool hash performance
Brought to you by:
aleksey14
This patch allows gcc to better optimize the whirlpool hash compression function and increases speed noticeably. Try it out. I attached only the new rhash_whirlpool_process_block function, since that is all that has changed.
Technically, I created an even faster version, but it requires a conversion to little endian and -funroll-loops, which isn't that safe.
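The little-endian variant was not posted, so the following is only a rough sketch of what a byte-order conversion involves, not the author's code. It assumes the common convention that Whirlpool implementations read the input block as big-endian 64-bit words. The names load_be64 and bswap64 are hypothetical helpers, not RHash functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read 8 bytes as one big-endian 64-bit word, regardless of host byte order.
   (Hypothetical helper, for illustration only.) */
uint64_t load_be64(const unsigned char *p)
{
    assert(p != NULL);
    return ((uint64_t)p[0] << 56) | ((uint64_t)p[1] << 48) |
           ((uint64_t)p[2] << 40) | ((uint64_t)p[3] << 32) |
           ((uint64_t)p[4] << 24) | ((uint64_t)p[5] << 16) |
           ((uint64_t)p[6] << 8)  |  (uint64_t)p[7];
}

/* Reverse the byte order of a 64-bit word (big endian <-> little endian). */
uint64_t bswap64(uint64_t x)
{
    /* swap adjacent bytes, then adjacent 16-bit pairs, then the 32-bit halves */
    x = ((x & 0x00FF00FF00FF00FFULL) << 8)  | ((x >> 8)  & 0x00FF00FF00FF00FFULL);
    x = ((x & 0x0000FFFF0000FFFFULL) << 16) | ((x >> 16) & 0x0000FFFF0000FFFFULL);
    return (x << 32) | (x >> 32);
}
```

On a little-endian host, every input word needs a swap like this if the algorithm works on big-endian words, which is one reason a different internal byte order could change the speed.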
100 MB file benchmark
old version: 0.895s
new version: 0.730s
little endian -funroll-loops (not posted): 0.662s
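To put the timings above in relative terms, here is a small sketch of the arithmetic. The function names are made up for illustration.

```c
/* Percentage of run time saved when going from old_sec to new_sec. */
double time_saved_percent(double old_sec, double new_sec)
{
    return (old_sec - new_sec) / old_sec * 100.0;
}

/* How many times faster the new version is (ratio of run times). */
double speedup_factor(double old_sec, double new_sec)
{
    return old_sec / new_sec;
}
```

With the numbers above, time_saved_percent(0.895, 0.730) comes out at about 18%, and the unposted little-endian build at about 26%.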
Thanks for the patch. I'll check it later, when I have time :)
Here is a newer version that is a bit faster and uses one less variable. I also recommend increasing the buffer size of rhash for slightly better performance on all hashes.
I managed to squeeze a bit more performance out of it. I think this is the most that can be done short of assembly, which I do have, but it's not portable.
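To illustrate the buffer size remark: a larger read buffer means fewer read and update calls for the same file. The sketch below is not RHash code. feed_file, update_fn and count_update are hypothetical names, and the callback only counts calls instead of hashing.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical update callback type, standing in for a hash update function. */
typedef void (*update_fn)(void *ctx, const unsigned char *data, size_t len);

/* Read fp in chunks of buf_size bytes and pass each chunk to update().
   Returns the number of bytes fed, or -1 on allocation or read error. */
long feed_file(FILE *fp, size_t buf_size, update_fn update, void *ctx)
{
    unsigned char *buf;
    size_t n;
    long total = 0;

    assert(fp != NULL && update != NULL && buf_size > 0);
    buf = malloc(buf_size);
    if (buf == NULL)
        return -1;
    while ((n = fread(buf, 1, buf_size, fp)) > 0) {
        update(ctx, buf, n);
        total += (long)n;
    }
    free(buf);
    return ferror(fp) ? -1 : total;
}

/* Example callback: counts how many times it was called (ctx is a size_t*). */
void count_update(void *ctx, const unsigned char *data, size_t len)
{
    size_t *calls = ctx;
    (void)data;
    (void)len;
    ++*calls;
}
```

For a 10000-byte file, a 1024-byte buffer results in 10 update calls and a 4096-byte buffer in 3. Whether that is measurable depends on the hash and the I/O layer.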
Running several benchmarks on a Core i7 under Windows 7 produced the following results.
Compilers used: MinGW 32/64-bit, and 64-bit MS VC 2013.
Notation: "orig" - the current version from git;
"optN" - the N-th optimization from this thread (see above);
"openssl" - the OpenSSL Whirlpool implementation (used for comparison).
gcc (mingw) 4.7.0 x64
orig: WHIRLPOOL 256 MiB total in 2,199 sec, 116,429 MBps, CPB=26,84
opt1: WHIRLPOOL 256 MiB total in 1,820 sec, 140,629 MBps, CPB=22,21
opt2: WHIRLPOOL 256 MiB total in 1,812 sec, 141,289 MBps, CPB=22,17
opt3: WHIRLPOOL 256 MiB total in 1,791 sec, 142,911 MBps, CPB=21,85
openssl: WHIRLPOOL 256 MiB total in 1,944 sec, 131,659 MBps, CPB=23,41
gcc (mingw) 4.8.1 32-bit
orig: WHIRLPOOL 256 MiB total in 4,636 sec, 55,223 MBps, CPB=56,53
opt1: WHIRLPOOL 256 MiB total in 4,230 sec, 60,525 MBps, CPB=51,64
opt2: WHIRLPOOL 256 MiB total in 4,176 sec, 61,297 MBps, CPB=51,00
opt3: WHIRLPOOL 256 MiB total in 3,809 sec, 67,200 MBps, CPB=46,52
openssl: WHIRLPOOL 256 MiB total in 1,869 sec, 136,970 MBps, CPB=22,52
msvc 2013 x64
orig: WHIRLPOOL 256 MiB total in 1,885 sec, 135,811 MBps, CPB=22,62
opt1: WHIRLPOOL 256 MiB total in 1,802 sec, 142,042 MBps, CPB=21,72
opt2: WHIRLPOOL 256 MiB total in 1,804 sec, 141,868 MBps, CPB=22,06
opt3: WHIRLPOOL 256 MiB total in 2,325 sec, 110,113 MBps, CPB=28,42
openssl: WHIRLPOOL 256 MiB total in 1,931 sec, 132,567 MBps, CPB=23,40
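For reading the figures above: the output uses a decimal comma, and the MBps column appears to be MiB per second (256 MiB in 2.199 s gives roughly 116.4). CPB is cycles per byte. The sketch below shows how the two quantities relate to the elapsed time. The thread does not state the CPU clock rate or how CPB was measured, so cpu_hz is an assumed input and both function names are hypothetical.

```c
#include <stdint.h>

/* Throughput in MiB per second for `bytes` processed in `seconds`. */
double throughput_mib_per_sec(uint64_t bytes, double seconds)
{
    return (double)bytes / (1024.0 * 1024.0) / seconds;
}

/* Cycles per byte, given an assumed constant clock rate cpu_hz (in Hz). */
double cycles_per_byte(double cpu_hz, uint64_t bytes, double seconds)
{
    return cpu_hz * seconds / (double)bytes;
}
```

If CPB is derived this way, 26.84 cycles per byte at 2.199 s for 256 MiB would imply a clock of roughly 3.3 GHz, but that is only an inference from the numbers.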
Summary: the 2nd and 3rd optimizations are the best for 32/64-bit GCC,
while the 3rd one degrades Whirlpool performance under MS VC.
So it's better to switch to the 2nd optimization.
The second optimization was partly incorporated into RHash 1.3.3.