Menu

#11 performance issue resolution for large files

open
nobody
7
2014-10-10
2012-09-24
No

When files being rsynced are hundreds of Gbytes size collisions in hash table kill librsync.
So linear collision resolution has been replaced with log n collision resolution based on binary search.
Size of hash table is 65536 buckets. So when files size is (block_size * 65536 * t) then linear collision resolution is t / (log t) slower than binary search resolution. If block size is 2048 bytes then for 1TB speed up is 630 times. for 100GB - 80 times.

Related

Patches: #11

Discussion

  • Victor Denisov

    Victor Denisov - 2012-09-24
     
  • Victor Denisov

    Victor Denisov - 2012-09-24
    • summary: logn implementation of rs_search_for_block --> performance issue resolution for large files
     
  • Victor Denisov

    Victor Denisov - 2012-09-24
    • priority: 5 --> 7
     
  • Victor Denisov

    Victor Denisov - 2012-09-25

    patch for sumset.h supplements logn-search.patch

     
  • Martin Pool

    Martin Pool - 2012-12-23

    Hi - thanks for the patch. I was away for a long time when you posted it, but I will review it.

     
  • David Coppit

    David Coppit - 2014-10-10

    Hi Martin, I know that you're probably busy with other things, but could you take a look at this patch? It cut 8 minutes off of a diff computation for a 16GB disk image.

    One issue with it is that the nested function declaration requires a newer C compiler. This might create some heartburn for downstream packagers. For example, on my Mac using homebrew, I had to hack the build recipe to use a specific gcc instead of clang.

     
    • Victor Denisov

      Victor Denisov - 2014-10-10

      David, please see update below.

       
    • Victor Denisov

      Victor Denisov - 2014-11-22

      Hi David, here is the patch without nested functions: https://github.com/librsync/librsync/pull/14

       

      Last edit: Victor Denisov 2014-11-22
  • Victor Denisov

    Victor Denisov - 2014-10-10

    David,

    I have a version of this patch without nested functions.
    Also, this patch has a bug. I'll upload new version with fix and without nested functions in a couple of days.
    Please don't use the patch just yet.
    I thought nobody cares, so didn't do it earlier.

    Thanks,
    Victor.

     
    • Martin Pool

      Martin Pool - 2014-10-10

      please move this to https://github.com/librsync/librsync

      On Fri, Oct 10, 2014 at 1:15 PM, Victor Denisov victordenisov@users.sf.net
      wrote:

      David,

      I have a version of this patch without nested functions.
      Also, this patch has a bug. I'll upload new version with fix and without
      nested functions in a couple of days.
      Please don't use the patch just yet.
      I thought nobody cares, so didn't do it earlier.

      Thanks,
      Victor.


      [patches:#11] performance issue resolution for large files

      Status: open
      Group:
      Labels: Implementation
      Created: Mon Sep 24, 2012 05:07 PM UTC by Victor Denisov
      Last Updated: Fri Oct 10, 2014 03:53 PM UTC
      Owner: nobody

      When files being rsynced are hundreds of Gbytes size collisions in hash
      table kill librsync.
      So linear collision resolution has been replaced with log n collision
      resolution based on binary search.
      Size of hash table is 65536 buckets. So when files size is (block_size
      * 65536 * t) then linear collision resolution is t / (log t) slower
      than binary search resolution. If block size is 2048 bytes then for 1TB
      speed up is 630 times. for 100GB - 80 times.


      Sent from sourceforge.net because you indicated interest in <
      https://sourceforge.net/p/librsync/patches/11/>

      To unsubscribe from further messages, please visit <
      https://sourceforge.net/auth/subscriptions/>

      --
      Martin

       

      Related

      Patches: #11


Log in to post a comment.

Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.