#1 RFE: Parallelize file reads during patching

Status: open
Owner: nobody
Labels: None
Priority: 2
Updated: 2014-10-10
Created: 2004-09-09
Creator: tromer
Private: No

During patching, librsync currently reads old blocks
sequentially, waiting for each to be retrieved before
retrieving the next one. In highly mixed deltas (such
as those generated by the test script of bug 1022764, a
re-sorted database, a filesystem restored from tar,
etc.), many of these operations require a seek in the
underlying basis stream. When this stream is backed by
a file residing on a hard disk, these seeks can make
patching extremely slow.

If librsync is changed to perform many reads in
parallel, it will give the storage system an
opportunity to order the physical disk accesses more
efficiently through the usual mechanisms: kernel
elevator algorithm, parallel access to disks on a RAID
device, TCQ on SCSI or SATA drives, etc.

On the librsync side, this will require replacing the
rs_copy_cb callback interface with a more complex
interface that supports submitting multiple read
requests and querying which of them have completed. The
application will then do whatever it takes to implement
that; in rdiff's case, it probably means threads or forks.
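
As a rough illustration of the submit-and-query interface described
above, here is a minimal Python sketch. It is not librsync code and
does not reflect any existing librsync API: the names
ParallelBasisReader, submit, wait_any and apply_copies are all
hypothetical, and the thread pool is just one of the ways an
application might implement it. It assumes a Unix-like system, since
it relies on os.pread.

```python
import os
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


class ParallelBasisReader:
    """Hypothetical submit/poll reader over a basis file (not a librsync API)."""

    def __init__(self, path, max_workers=8):
        self._fd = os.open(path, os.O_RDONLY)
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._pending = {}  # future -> request id
        self._next_id = 0

    def submit(self, offset, length):
        """Queue a read of `length` bytes at `offset`; return a request id."""
        req_id = self._next_id
        self._next_id += 1
        # os.pread takes an explicit offset, so worker threads do not
        # compete over a shared file position.
        future = self._pool.submit(os.pread, self._fd, length, offset)
        self._pending[future] = req_id
        return req_id

    def wait_any(self):
        """Block until at least one read finishes; return [(request id, data)]."""
        done, _ = wait(self._pending, return_when=FIRST_COMPLETED)
        return [(self._pending.pop(f), f.result()) for f in done]

    def close(self):
        self._pool.shutdown(wait=True)
        os.close(self._fd)


def apply_copies(reader, copies):
    """Submit every (offset, length) copy up front, then emit data in delta order."""
    ids = [reader.submit(offset, length) for offset, length in copies]
    results = {}
    while len(results) < len(ids):
        results.update(reader.wait_any())
    return b"".join(results[i] for i in ids)
```

With several reads outstanding at once, the kernel and the drive are
free to service them in whatever order is cheapest, while
apply_copies still writes the output in the order the delta
specifies. A real implementation would presumably cap the number of
outstanding requests so that completed blocks do not pile up in
memory.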

Some of this functionality could be factored out into a
separate library, which may be of independent interest.

Discussion

  • Martin Pool - 2004-09-09

    Hm.

    Having the blocks be very unordered is probably a pretty
    unusual case. Realistic data seems likely to either not
    match at all, or match more-or-less in continuous sequences.
    It seems relatively rare to rearrange the order of blocks
    in a file, as opposed to overwriting, inserting, deleting, etc.

     
  • Martin Pool - 2004-09-09
    • priority: 5 --> 2
     
  • Donovan Baarda - 2006-02-21

    Wow... talk about a pie-in-the-sky wishlist. There are so
    many things in librsync that need improving first that
    something like this is probably never going to happen...

     
  • Donovan Baarda - 2006-02-21
    • labels: 506015 -->
     
  • David Coppit - 2014-10-10

    What is the underlying reason for this request? I assume it's for better performance. On my system I have an SSD, and for me the performance bottleneck is CPU. But I wouldn't be surprised if even for spinning disks the bottleneck is CPU.

     
