Hi Pengwenwei,
the SPEED is VERY VERY important, sadly I am just an amateur and can't grasp your work, for now at least.
If you can compile/upload a Win64 executable I will be glad to test how it performs in my testdata.
If you can dumbdown your test tool to accept a testfile on command line (pretty much as my Knight-tours test), I will test it.
I am sorry for the delayed answer.
If you have something as a start (description, examples) I will try to help you. The problem is that I don't know C++ nor your promising project, so I am kinda useless.
If you can create some example (as hashing billions of Knight-Tours) I will be able to compare my FNV1A_YoshimitsuTRIAD with your results and make some useful comparison.
One more problem, for nearly 60 days I have to take care of sick family member, I have no time for anything.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Each Knight-Tour is an unique 128 bytes long line (key in fact), it is 64x2 bytes long.
The idea is to compare general purpose hash functions, no matter of what kind, and see how they fare under heavy loads - when keys are many times more than slots.
As you can see at: http://www.codeproject.com/Articles/716530/Fastest-hash-function-for-table-lookups-in-C
when ratio keys:slots is 1:1 CRC32 is slightly better than my FNV1A variant:
FNV1A_YoshimitsuTRIADii: Keys = 0,000,134,217,729; 1 x MAXcollisionsAtSomeSlots = 0,012; FeeSlots = 50,530,128
CRC32 0x8F6E37A0, iSCSI: Keys = 0,000,134,217,729; 4 x MAXcollisionsAtSomeSlots = 0,011; FeeSlots = 49,561,215
...
When testing many functions (Sbox was one of them) with above torture I found that from one point they start to accumulate many collisions onto several slots thus proving their degradation - meaning high number of maximum lookups.
In short, you can generate 1,000,000,000,000 unique keys in-memory and hash them with:
D:_KAZE>Knight-Tour_FNV1A_YoshimitsuTRIADii_vs_CRC32_TRISMUS.exe a8 4000000000
At end:
FNV1A_YoshimitsuTRIADii: Keys = 1,000,056,291,329; 1 x MAXcollisionsAtSomeSlots = 7,930; FeeSlots = 00,000,000
CRC32 0x8F6E37A0, iSCSI: Keys = 1,000,056,291,329; 2 x MAXcollisionsAtSomeSlots = 7,910; FeeSlots = 00,000,000
In above lines MAXcollisionsAtSomeSlots = 7,930 means that the maximum lookups for any key is 7,930 attempts.
Sure, for a given number of Knight-Tours, but what about dynamic such!
Let's say you have perfectly hashed 1 billion Knight-Tours, my suggestion was to show how your hash approach behaves for arbitrary number of keys, say 1 trillion.
Basically, if you want to illustrate perfect hashing versus e.g. SpookyHash you have to use some common benchmark.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Three different map algorithm, used in different application scenarios.
1,memMap Based on memory No hard disk consumption.
2,diskMap Based on the hard disk No memory consumption.
3,hashMap No delete function, but the best performance.
memMap and diskMap can turn to hashMap by memMap2HashMap and diskMap2HashMap.
In addition,Provide Google HashMap for comparative test.
Algorithm of Google,the performance is not good, but also has the collision probability .
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Hi Pengwenwei,
the SPEED is VERY VERY important, sadly I am just an amateur and can't grasp your work, for now at least.
If you can compile/upload a Win64 executable I will be glad to test how it performs in my testdata.
If you can dumbdown your test tool to accept a testfile on command line (pretty much as my Knight-tours test), I will test it.
D:_KAZE>Knight-Tour_FNV1A_YoshimitsuTRIADii_vs_CRC32_TRISMUS.exe a8 4000000000
...
Polynomial used:
CRC32: 0x8F6E37A0
KEYS to be hashed = 4,000,000,000x4x64
HashSizeInBits = 27
ReportAtEvery = 134,217,727
Allocating HASH memory 512MB ... OK
Allocating HASH memory 512MB ... OK
The first KT:
A8C7E8G7H5G3H1F2H3G1E2C1A2B4A6B8D7F8H7G5F7H8G6H4G2E1C2A1B3A5B7D8C6A7C8E7G8H6G4H2F1D2B1A3B5D6F5D4F3E5C4B2D3F4E6C5A4B6D5F6E4C3D1E3
...
FNV1A_YoshimitsuTRIADii: Keys = 0,000,134,217,729; 1 x MAXcollisionsAtSomeSlots = 0,012; FeeSlots = 50,530,128
CRC32 0x8F6E37A0, iSCSI: Keys = 0,000,134,217,729; 4 x MAXcollisionsAtSomeSlots = 0,011; FeeSlots = 49,561,215
FNV1A_YoshimitsuTRIADii: Keys = 0,000,268,435,457; 3 x MAXcollisionsAtSomeSlots = 0,015; FeeSlots = 19,089,321
CRC32 0x8F6E37A0, iSCSI: Keys = 0,000,268,435,457; 4 x MAXcollisionsAtSomeSlots = 0,014; FeeSlots = 18,307,048
FNV1A_YoshimitsuTRIADii: Keys = 0,000,402,653,185; 1 x MAXcollisionsAtSomeSlots = 0,019; FeeSlots = 07,194,504
CRC32 0x8F6E37A0, iSCSI: Keys = 0,000,402,653,185; 8 x MAXcollisionsAtSomeSlots = 0,017; FeeSlots = 06,762,415
FNV1A_YoshimitsuTRIADii: Keys = 0,000,536,870,913; 2 x MAXcollisionsAtSomeSlots = 0,021; FeeSlots = 02,708,588
CRC32 0x8F6E37A0, iSCSI: Keys = 0,000,536,870,913; 2 x MAXcollisionsAtSomeSlots = 0,020; FeeSlots = 02,496,170
FNV1A_YoshimitsuTRIADii: Keys = 0,000,671,088,641; 2 x MAXcollisionsAtSomeSlots = 0,023; FeeSlots = 01,022,485
CRC32 0x8F6E37A0, iSCSI: Keys = 0,000,671,088,641; 2 x MAXcollisionsAtSomeSlots = 0,023; FeeSlots = 00,922,884
...
FNV1A_YoshimitsuTRIADii: Keys = 1,000,056,291,329; 1 x MAXcollisionsAtSomeSlots = 7,930; FeeSlots = 00,000,000
CRC32 0x8F6E37A0, iSCSI: Keys = 1,000,056,291,329; 2 x MAXcollisionsAtSomeSlots = 7,910; FeeSlots = 00,000,000
How your superb tool is gonna hash the above 1 trillion Knight-Tours, it is up to you, we to find out.
P.S.
For some unknown reason your comment on CodeProject was erased, by whom?!
hi
my english is bad
and my dont know who erased my comment
will you help me write the documentary and load to codeproject
I am sorry for the delayed answer.
If you have something as a start (description, examples) I will try to help you. The problem is that I don't know C++ nor your promising project, so I am kinda useless.
If you can create some example (as hashing billions of Knight-Tours) I will be able to compare my FNV1A_YoshimitsuTRIAD with your results and make some useful comparison.
One more problem, for nearly 60 days I have to take care of sick family member, I have no time for anything.
could you outline what is "hashing billions of Knight-Tours" and "FNV1A_YoshimitsuTRIAD" ?
I thought that you have seen the benchmark hashing 1 trillion Knight-Tours at:
http://www.codeproject.com/script/articles/download.aspx?file=/KB/cpp/716530/_KAZE_32bit_hash_showdown_FNV1A_YoshimitsuTRIAD_source.zip&rp=http://www.codeproject.com/Articles/716530/Fastest-hash-function-for-table-lookups-in-C
Each Knight-Tour is an unique 128 bytes long line (key in fact), it is 64x2 bytes long.
The idea is to compare general purpose hash functions, no matter of what kind, and see how they fare under heavy loads - when keys are many times more than slots.
As you can see at:
http://www.codeproject.com/Articles/716530/Fastest-hash-function-for-table-lookups-in-C
when ratio keys:slots is 1:1 CRC32 is slightly better than my FNV1A variant:
FNV1A_YoshimitsuTRIADii: Keys = 0,000,134,217,729; 1 x MAXcollisionsAtSomeSlots = 0,012; FeeSlots = 50,530,128
CRC32 0x8F6E37A0, iSCSI: Keys = 0,000,134,217,729; 4 x MAXcollisionsAtSomeSlots = 0,011; FeeSlots = 49,561,215
...
When testing many functions (Sbox was one of them) with above torture I found that from one point they start to accumulate many collisions onto several slots thus proving their degradation - meaning high number of maximum lookups.
In short, you can generate 1,000,000,000,000 unique keys in-memory and hash them with:
D:_KAZE>Knight-Tour_FNV1A_YoshimitsuTRIADii_vs_CRC32_TRISMUS.exe a8 4000000000
At end:
FNV1A_YoshimitsuTRIADii: Keys = 1,000,056,291,329; 1 x MAXcollisionsAtSomeSlots = 7,930; FeeSlots = 00,000,000
CRC32 0x8F6E37A0, iSCSI: Keys = 1,000,056,291,329; 2 x MAXcollisionsAtSomeSlots = 7,910; FeeSlots = 00,000,000
In above lines MAXcollisionsAtSomeSlots = 7,930 means that the maximum lookups for any key is 7,930 attempts.
My favorite general purpose hash function is FNV1a-YoshimitsuTRIAD, you can see it tested for linear speed at:
http://encode.ru/threads/1371-Filesystem-benchmark/page7
Last edit: Sanmayce 2015-04-17
I will transfer all the byte stream to address pointer,
No key output,
If you need The output,
will be the form of 1,... Num.
Sure, for a given number of Knight-Tours, but what about dynamic such!
Let's say you have perfectly hashed 1 billion Knight-Tours, my suggestion was to show how your hash approach behaves for arbitrary number of keys, say 1 trillion.
Basically, if you want to illustrate perfect hashing versus e.g. SpookyHash you have to use some common benchmark.
Three different map algorithm, used in different application scenarios.
1,memMap Based on memory No hard disk consumption.
2,diskMap Based on the hard disk No memory consumption.
3,hashMap No delete function, but the best performance.
memMap and diskMap can turn to hashMap by memMap2HashMap and diskMap2HashMap.
In addition,Provide Google HashMap for comparative test.
Algorithm of Google,the performance is not good, but also has the collision probability .