From: Steven P. <sl...@au...> - 2003-09-30 16:42:53
Here are the results of some I/O performance runs I have done on the 2.6.0-test6 and test6-mm1 kernels. In all the graphs, test6 is the 2.6.0-test6 baseline, test6mm1 is 2.6.0-test6-mm1, and test6mm1-noread is 2.6.0-test6-mm1 with the aio-readahead-speedup.patch backed out.

All tests were performed with rawread on an 8-way 700MHz PIII machine with 4 ServeRAID adapters and 80 drives combined to form 20 RAID 0 logical drives. All I/O is done to a 1GB file on each drive; the ext2 filesystem and data file are created before the first run, and the filesystem is unmounted and remounted between each test. The test files are opened with O_SYNC in all tests.

The first graph shows sequential read performance with 8 threads per drive cooperating on the sequential reads. There are two items of great interest. First, at block sizes smaller than 4k, the mm trees are significantly slower than base test6 even though at 2k they consume more CPU. This needs to be investigated. The second item is that at large block sizes (>32k) the mm trees outperform the base tree in terms of throughput, but there is a severe increase in the amount of CPU consumed while throughput remains constant.

The second graph is sequential write performance. About all I can say is WOW. The mm tree kicks serious butt.

The third graph is random read. Here we see the large benefit from the aio-readahead-speedup.patch at block sizes from 8k-128k.

The final graph is random write. Throughputs are very similar, but the mm trees use about 30%-40% more CPU.

I have lots of backup data, including sar output and readprofiles, for all of these runs if anyone is interested.

Steve
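
P.S. For anyone wanting to reproduce the access pattern, below is a minimal sketch of one sequential-read worker: O_SYNC open, fixed block size, front-to-back reads. The file path, block-size argument handling, and the 1GB size constant are illustrative assumptions; the actual rawread harness and the thread coordination across 8 workers per drive are not reproduced here.

/*
 * Minimal sketch of one sequential-read worker (assumed pread()-based).
 * Path and block size are hypothetical; adjust to match the real setup.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	const char *path = argc > 1 ? argv[1] : "/mnt/raid0/testfile"; /* hypothetical mount point */
	size_t bs = argc > 2 ? strtoul(argv[2], NULL, 0) : 4096;       /* block size under test */
	off_t filesize = 1UL << 30;                                    /* 1GB data file per drive */

	/* Test files are opened with O_SYNC in all tests. */
	int fd = open(path, O_RDONLY | O_SYNC);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	char *buf = malloc(bs);
	if (!buf)
		return 1;

	/* Read the file front to back in bs-sized chunks. */
	for (off_t off = 0; off < filesize; off += (off_t)bs) {
		ssize_t n = pread(fd, buf, bs, off);
		if (n <= 0)
			break;
	}

	free(buf);
	close(fd);
	return 0;
}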