From: Stelian P. <st...@po...> - 2010-04-27 14:49:53
Hi Phillip,

On Mon, Apr 26, 2010 at 11:32:47AM -0400, Phillip Susi wrote:
> The archives show almost no activity on this list for some time,

Yeah, these are quiet times for dump :)

> so I hope this finds its way to someone. I have been trying to
> understand why dump uses multiple processes that seem to burn through
> a lot of cpu time when doing a compressed backup with a larger block
> size (512 kb).

The choice of using multiple processes in dump was made back in the
(very) old days, when the speed of tape drives exceeded the disk I/O
(and CPU) speeds, and when it was important - performance-wise - to
keep the tape drives fed with a constant flow of data.

> The code is very hard to follow, but it seems like the first process
> reads blocks from the disk and writes them to a pipe. The second
> process reads from the pipe and compresses the data, then writes it
> to another pipe. Finally the third process reads the compressed data
> from the pipe, slaps a header on it, and writes it to the tape. Is
> this correct? Does the third process attempt to reblock the data so
> it always writes fixed-size records? If so, how does it do this?

There is one main process (well, more precisely one main process per
volume) which traverses the filesystem, computing the list of disk
blocks to be saved. This list is written into a pipe, which is read by
several (#define SLAVES 3) slaves. Each slave reads a block number from
the pipe, seeks to the wanted offset on the disk, reads the block,
compresses it if asked to, writes it to the tape, then signals the next
slave to do its job. The slaves synchronize with one another,
serializing access to the tape drive while doing the disk reads and the
compression in parallel.

Hope this explains things a bit.

Stelian.

-- 
Stelian Pop <st...@po...>
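
To make that layout a bit more concrete, below is a minimal sketch in C
of the master/slave idea described above. It is an illustration only,
not dump's actual code: the per-slave command pipes, the ring of token
pipes and the fake "write block to tape" step are assumptions made for
the example.

/*
 * Illustrative sketch only -- not dump's actual code.  A master process
 * feeds disk block numbers to SLAVES child processes over per-slave
 * pipes, round-robin.  Each slave does its "read and compress" work in
 * parallel, then waits for a token from the previous slave before
 * "writing to tape", so the tape writes stay serialized and in order.
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

#define SLAVES  3
#define NBLOCKS 10              /* pretend block list, for the demo  */

int main(void)
{
    int cmd[SLAVES][2];         /* master -> slave i: block numbers  */
    int ring[SLAVES][2];        /* slave i -> slave (i+1) % SLAVES   */
    int i, j;

    for (i = 0; i < SLAVES; i++)
        if (pipe(cmd[i]) < 0 || pipe(ring[i]) < 0)
            exit(1);

    for (i = 0; i < SLAVES; i++) {
        if (fork() == 0) {      /* slave i */
            long blk;
            char token;
            int prev = (i + SLAVES - 1) % SLAVES;

            /* close the command write ends so EOF can be seen */
            for (j = 0; j < SLAVES; j++)
                close(cmd[j][1]);

            while (read(cmd[i][0], &blk, sizeof blk) == sizeof blk) {
                /* seek, read the block and compress it here,
                 * concurrently with the other slaves ... */
                read(ring[prev][0], &token, 1);   /* wait our turn  */
                printf("slave %d writes block %ld to tape\n", i, blk);
                fflush(stdout);
                write(ring[i][1], "t", 1);        /* pass the token */
            }
            exit(0);            /* EOF on the command pipe: done */
        }
    }

    /* master: hand the first token to slave 0 ... */
    write(ring[SLAVES - 1][1], "t", 1);

    /* ... and distribute the block list round-robin */
    for (long blk = 0; blk < NBLOCKS; blk++)
        write(cmd[blk % SLAVES][1], &blk, sizeof blk);
    for (i = 0; i < SLAVES; i++)
        close(cmd[i][1]);       /* slaves will see EOF and exit */

    for (i = 0; i < SLAVES; i++)
        wait(NULL);
    return 0;
}

The real code does considerably more (record headers, tape error and
end-of-tape handling, and so on), but the token passing is the part
that keeps the tape writes serialized while the disk reads and the
compression overlap across the slaves.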