From: Stelian P. <st...@po...> - 2010-04-27 14:49:53
Hi Phillip,

On Mon, Apr 26, 2010 at 11:32:47AM -0400, Phillip Susi wrote:
> The archives show almost no activity on this list for some time,

Yeah, these are quiet times for dump :)

> so I hope this finds its way to someone. I have been trying to
> understand why dump uses multiple processes that seem to burn through
> a lot of cpu time when doing a compressed backup with a larger block
> size (512 kb).

The choice of using multiple processes in dump was made back in the
(very) old days, when the speed of tape drives exceeded the disk I/O
(and CPU) speeds, and when it was important - performance-wise - to
keep the tape drives fed with a constant flow of data.

> The code is very hard to follow, but it seems like the first process
> reads blocks from the disk and writes them to a pipe. The second
> process reads from the pipe and compresses the data, then writes it
> to another pipe. Finally the third process reads the compressed data
> from the pipe, slaps a header on it, and writes it to the tape. Is
> this correct? Does the third process attempt to reblock the data so
> it always writes fixed-size records? If so, how does it do this?

There is one main process (well, more precisely one main process per
volume) which traverses the filesystem, computing the list of disk
blocks to be saved. This list is written into a pipe, which is read by
several (#define SLAVES 3) slaves. Each slave reads a block number from
the pipe, seeks to the wanted offset on the disk, reads the block,
compresses it if asked to, writes it to the tape, then signals the next
slave to do its job. The slaves synchronize with one another,
serializing access to the tape drive while doing the disk reads and the
compression in parallel.

Hope this explains things a bit.

Stelian.

-- 
Stelian Pop <st...@po...>
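
To make that layout a bit more concrete, below is a minimal sketch in C
of the master/slave idea described above. It is an illustration only,
not dump's actual code: the per-slave command pipes, the ring of token
pipes and the fake "write block to tape" step are assumptions made for
the example.

/*
 * Illustrative sketch only -- not dump's actual code.  A master process
 * feeds disk block numbers to SLAVES child processes over per-slave
 * pipes, round-robin.  Each slave does its "read and compress" work in
 * parallel, then waits for a token from the previous slave before
 * "writing to tape", so the tape writes stay serialized and in order.
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

#define SLAVES  3
#define NBLOCKS 10              /* pretend block list, for the demo  */

int main(void)
{
    int cmd[SLAVES][2];         /* master -> slave i: block numbers  */
    int ring[SLAVES][2];        /* slave i -> slave (i+1) % SLAVES   */
    int i, j;

    for (i = 0; i < SLAVES; i++)
        if (pipe(cmd[i]) < 0 || pipe(ring[i]) < 0)
            exit(1);

    for (i = 0; i < SLAVES; i++) {
        if (fork() == 0) {      /* slave i */
            long blk;
            char token;
            int prev = (i + SLAVES - 1) % SLAVES;

            /* close the command write ends so EOF can be seen */
            for (j = 0; j < SLAVES; j++)
                close(cmd[j][1]);

            while (read(cmd[i][0], &blk, sizeof blk) == sizeof blk) {
                /* seek, read the block and compress it here,
                 * concurrently with the other slaves ... */
                read(ring[prev][0], &token, 1);   /* wait our turn  */
                printf("slave %d writes block %ld to tape\n", i, blk);
                fflush(stdout);
                write(ring[i][1], "t", 1);        /* pass the token */
            }
            exit(0);            /* EOF on the command pipe: done */
        }
    }

    /* master: hand the first token to slave 0 ... */
    write(ring[SLAVES - 1][1], "t", 1);

    /* ... and distribute the block list round-robin */
    for (long blk = 0; blk < NBLOCKS; blk++)
        write(cmd[blk % SLAVES][1], &blk, sizeof blk);
    for (i = 0; i < SLAVES; i++)
        close(cmd[i][1]);       /* slaves will see EOF and exit */

    for (i = 0; i < SLAVES; i++)
        wait(NULL);
    return 0;
}

The real code does considerably more (record headers, tape error and
end-of-tape handling, and so on), but the token passing is the part
that keeps the tape writes serialized while the disk reads and the
compression overlap across the slaves.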