During patching, librsync currently reads old blocks
sequentially, waiting for each to be retrieved before
retrieving the next one. In highly mixed deltas (such
as those generated by the test script of bug 1022764, a
re-sorted database, a filesystem restored from tar,
etc.), many of these operations require a seek in the
underlying basis stream. When this stream is backed by
a file residing on a hard disk, the resulting seeks can
make patching extremely slow.
If librsync is changed to perform many reads in
parallel, it will give the storage system an
opportunity to order the physical disk accesses more
efficiently through the usual mechanisms: the kernel's
elevator (I/O scheduler) algorithm, parallel access to the
disks of a RAID array, command queuing (TCQ/NCQ) on SCSI
and SATA drives, etc.
On the librsync side, this will require replacing the
rs_copy_cb callback interface with a richer one that
supports submitting multiple read requests and querying
which of them have completed. The application will then do
whatever it takes to implement that; in rdiff's case, this
probably means threads or forks.
Some of this functionality could be split out into a
separate library, which may be of independent interest.
Logged In: YES
user_id=521
Hm.
Having the blocks be very unordered is probably a pretty
unusual case. Realistic data seems likely to either not
match at all, or match more or less in contiguous sequences.
It seems relatively rare to rearrange the order of blocks
in a file, as opposed to overwriting, inserting, deleting, etc.
Logged In: YES
user_id=10273
Wow... talk about a pie-in-the-sky wishlist. There are so
many things in librsync that need improving first;
something like this is probably never going to happen...
What is the underlying reason for this request? I assume it's for better performance. On my system I have an SSD, and for me the performance bottleneck is CPU. But I wouldn't be surprised if even for spinning disks the bottleneck is CPU.