When files being rsynced are hundreds of Gbytes size collisions in hash table kill librsync.
So linear collision resolution has been replaced with log n collision resolution based on binary search.
Size of hash table is 65536 buckets. So when files size is (block_size * 65536 * t) then linear collision resolution is t / (log t) slower than binary search resolution. If block size is 2048 bytes then for 1TB speed up is 630 times. for 100GB - 80 times.
patch for sumset.h supplements logn-search.patch
Hi - thanks for the patch. I was away for a long time when you posted it, but I will review it.
Hi Martin, I know that you're probably busy with other things, but could you take a look at this patch? It cut 8 minutes off of a diff computation for a 16GB disk image.
One issue with it is that the nested function declaration requires a newer C compiler. This might create some heartburn for downstream packagers. For example, on my Mac using homebrew, I had to hack the build recipe to use a specific gcc instead of clang.
David, please see update below.
Hi David, here is the patch without nested functions: https://github.com/librsync/librsync/pull/14
Last edit: Victor Denisov 2014-11-22
David,
I have a version of this patch without nested functions.
Also, this patch has a bug. I'll upload new version with fix and without nested functions in a couple of days.
Please don't use the patch just yet.
I thought nobody cares, so didn't do it earlier.
Thanks,
Victor.
please move this to https://github.com/librsync/librsync
On Fri, Oct 10, 2014 at 1:15 PM, Victor Denisov victordenisov@users.sf.net
wrote:
--
Martin
Related
Patches: #11