During patching, librsync currently reads old blocks
sequentially, waiting for each to be retrieved before
retrieving the next one. In highly mixed deltas (such
as those generated by the test script of bug 1022764, a
re-sorted database, a filesystem restored from tar,
etc.), many of these operations require a seek in the
underlying basis stream. When this stream is backed by
a file residing on a hard disk, the resulting seeks can
make patching extremely slow.
If librsync is changed to perform many reads in
parallel, it will give the storage system an
opportunity to order the physical disk accesses more
efficiently through the usual mechanisms: the kernel's
elevator (I/O scheduler) algorithm, parallel access to the
disks of a RAID array, command queuing (TCQ/NCQ) on SCSI
and SATA drives, etc.
On the librsync side, this will require replacing the
rs_copy_cb callback interface with a richer one that
supports submitting multiple read requests and querying
which of them have completed. The application will then do
whatever it takes to implement that; in rdiff's case, this
probably means threads or forks.
Some of this functionality could be split out into a
separate library, which may be of independent interest.
Logged In: YES
user_id=521
Hm.
Having the blocks be very unordered is probably a pretty
unusual case. Realistic data seems likely to either not
match at all, or match more or less in contiguous sequences.
It seems relatively rare to rearrange the order of blocks
in a file, as opposed to overwriting, inserting, deleting, etc.
Logged In: YES
user_id=10273
Wow... talk about a pie-in-the-sky wishlist. There are so
many things in librsync that need improving first;
something like this is probably never going to happen...
What is the underlying reason for this request? I assume it's for better performance. On my system I have an SSD, and for me the performance bottleneck is CPU. But I wouldn't be surprised if even for spinning disks the bottleneck is CPU.