From: Janet M. <jan...@us...> - 2001-09-28 19:15:21
I ran Bonnie and some of my own basic testcases to compare the following:

(1) Regular-file direct io versus raw
(2) Block-device direct io versus raw

I tested on 3 systems:

- a NUMA 4-way (cpu MHz 700.121, cache size 2048K) with 1G RAM and EMC (fiberchannel) scsi
- a NUMA 4-way (cpu MHz 495.046, cache size 2048K) with 878M RAM and IBM-OEM scsi
- a Netfinity 2-way (cpu MHz 731.069, cache size 256K) with 500M RAM and IBM-PSG scsi

All systems were running 2.4.9 plus Andrea's direct io patches (o_direct-14 and blkdev-pagecache-14). Bonnie was modified slightly, for example to use page-aligned data buffers and to parse a file/device name specified on the command line (see Bonnie.c attached).

Findings from the NUMA systems:

Regular-file direct io, block-device direct io and raw io performed comparably (within 1%).

*** The rest of this note pertains to the Netfinity test system. ***

Findings from the Netfinity system:

(1) Regular-file direct io versus raw: direct io sequential reads were 20% faster on average.

The data from Bonnie shows that regular-file direct io and raw performed comparably (within 1%), with the exception of the "Reading intelligently" phase (sequential reads using read(2)), where regular-file direct io was about 22% faster than raw. The appended results from my own tests support these findings. Note, however, that these results are based on tests using a 500M regular file and a 500M raw device. With smaller files/devices, direct io was as much as 5% faster across the board, with 2 exceptions: sequential writes always performed the same, and sequential reads were consistently about 20% faster on average.

(2) Block-device direct io versus raw: the two performed comparably.

The data from Bonnie shows that block-device direct io and raw performed equally well (within 1%), with the exception of the "Reading with getc()" phase, which reads the file sequentially using the getc() stdio macro; there, raw io was about 4% faster.
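Because Bonnie had to be changed to use page-aligned buffers, a minimal sketch of the kind of 4 KB sequential read loop the testcases appear to run may help. This is not the testcase source (which was offered but not posted); `alloc_page_aligned` and `sequential_read` are hypothetical helpers. It assumes Linux with glibc, where defining `_GNU_SOURCE` exposes `O_DIRECT`. The caller would pass `O_DIRECT` for a regular file or a block device such as /dev/sda7, or 0 for a /dev/raw/rawN binding, which bypasses the page cache without the flag. In both cases the buffer, file offset and transfer size generally need to be aligned to the block or sector size; a page-aligned 4096-byte buffer read from offset 0 satisfies that on typical setups.

```c
#define _GNU_SOURCE            /* glibc: exposes O_DIRECT in <fcntl.h> */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

#define IO_SIZE 4096           /* one 4 KB transfer per loop */

/* Hypothetical helper: allocate `len` bytes aligned to the page size. */
static void *alloc_page_aligned(size_t len)
{
    void *buf = NULL;
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0 || posix_memalign(&buf, (size_t)page, len) != 0)
        return NULL;
    assert((uintptr_t)buf % (uintptr_t)page == 0);
    return buf;
}

/*
 * Hypothetical helper: read `path` from offset 0 to EOF in IO_SIZE chunks.
 * `extra_flags` is OR-ed into O_RDONLY (assumed usage: O_DIRECT for a
 * regular file or block device, 0 for a raw device binding).
 * Stores the byte count in *total and returns the number of successful
 * reads, or -1 on any error.
 */
static long sequential_read(const char *path, int extra_flags, long long *total)
{
    assert(total != NULL);
    *total = 0;

    int fd = open(path, O_RDONLY | extra_flags);
    if (fd < 0)
        return -1;

    char *buf = alloc_page_aligned(IO_SIZE);
    if (buf == NULL) {
        close(fd);
        return -1;
    }

    long loops = 0;
    ssize_t rc;
    while ((rc = read(fd, buf, IO_SIZE)) > 0) {
        loops++;
        *total += rc;
    }

    free(buf);
    close(fd);
    return rc < 0 ? -1 : loops;
}
```

For example, `sequential_read("/dev/sda7", O_DIRECT, &total)` would exercise block-device direct io and `sequential_read("/dev/raw/raw1", 0, &total)` the raw path. Against a 500M target, 128000 reads of 4096 bytes account for the 524288000 bytes shown in the results.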
Bonnie's author describes the "Reading intelligently" phase (which uses read(2)) as the purer test of sequential read performance, and in that phase block-device direct io and raw performed approximately the same. I've appended the results from my own tests (I can furnish the source if anyone is interested), which also indicate that performance is about identical, again except for direct io sequential reads, which were slightly faster than raw (around 2%).

--Janet Morgan
jan...@us...
IBM Linux Technology Center

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Data from the Netfinity System

=================== (1) REGULAR FILE DIRECT IO vs RAW ========================

./Bonnie ./junk -D -s 500
Bonnie: Warning: You have 500MB RAM, but you test with only 500MB datasize!
Bonnie: This might yield unrealistically good results,
Bonnie: for reading and seeking and writing.
Bonnie 1.2: File './Bonnie.4795', size: 524288000, volumes: 1
Writing with putc()... done: 469 kB/s 4.3 %CPU
Rewriting... done: 1533 kB/s 0.4 %CPU
Writing intelligently... done: 1777 kB/s 0.4 %CPU
Reading with getc()... done: 8595 kB/s 84.5 %CPU
Reading intelligently... done: 26318 kB/s 3.9 %CPU
Seeker 1...Seeker 2...Seeker 3...start 'em...filename is ./junk filename is ./junk using direct I/O

              ---Sequential Output (nosync)--- ---Sequential Input-- --Rnd Seek-
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --04k (03)-
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
elm3b9 1* 500   469  4.3  1777  0.4  1533  0.4  8595 84.5 26318  3.9 137.9  0.0

./Bonnie /dev/raw/raw1 -s 500
Bonnie: Warning: You have 500MB RAM, but you test with only 500MB datasize!
Bonnie: This might yield unrealistically good results,
Bonnie: for reading and seeking and writing.
Bonnie 1.2: File './Bonnie.4770', size: 524288000, volumes: 1
Writing with putc()... done: 467 kB/s 4.2 %CPU
Rewriting... done: 1524 kB/s 1.0 %CPU
Writing intelligently... done: 1750 kB/s 0.6 %CPU
Reading with getc()... done: 8473 kB/s 84.1 %CPU
Reading intelligently... done: 21546 kB/s 7.2 %CPU
Seeker 2...Seeker 1...Seeker 3...start 'em...filename is /dev/raw/raw1 filename is /dev/raw/raw1 done...done...filename is /dev/raw/raw1 done... filename is /dev/raw/raw1

              ---Sequential Output (nosync)--- ---Sequential Input-- --Rnd Seek-
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --04k (03)-
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
elm3b9 1* 500   467  4.2  1750  0.6  1524  1.0  8473 84.1 21546  7.2 153.8  0.0

My testcases
------------
directSequentialRead
  loops = 128000, last read rc = 4096, total bytes read are 524288000
  real 0m19.431s   user 0m0.040s   sys 0m1.670s
rawSequentialRead
  loops = 128000, last read rc = 4096, total bytes read are 524288000
  real 0m23.901s   user 0m0.000s   sys 0m2.490s

* * * * * * * * * * * * * * * *

directSequentialWrite
  loops = 128000, last write rc = 4096, total bytes written are 524288000
  real 18m11.372s   user 0m0.020s   sys 0m1.620s
rawSequentialWrite
  loops = 128000, last write rc = 4096, total bytes written are 524288000
  real 18m15.565s   user 0m0.030s   sys 0m2.880s

* * * * * * * * * * * * * * * *

directSequentialRdWr
  loops = 128000, last read rc = 4096, total i/o bytes 524288000
  readbytes = 261091328 writebytes = 263196672
  real 10m2.364s   user 0m0.130s   sys 0m1.870s
rawSequentialRdWr
  loops = 128000, last read rc = 4096, total bytes read are 524288000
  readbytes = 261091328 writebytes = 263196672
  real 10m5.569s   user 0m0.050s   sys 0m2.530s

* * * * * * * * * * * * * * * *

directRandomRead
  loops = 128000, last read rc = 4096, total bytes read are 524288000; no coverage to 47209/128000 blocks
  real 15m17.250s   user 0m0.160s   sys 0m2.170s
rawRandomRead
  loops = 128000, last read rc = 4096, total bytes read are 524288000; no coverage to 47209/128000 blocks
  real 15m27.782s   user 0m0.090s   sys 0m2.800s

* * * * * * * * * * * * * * * *

directRandomWrite
  loops = 128000, last read rc = 4096, total bytes written are 524288000; no coverage to 47209/128000 blocks
  real 16m15.084s   user 0m0.150s   sys 0m1.930s
rawRandomWrite
  loops = 128000, last read rc = 4096, total bytes written are 524288000; no coverage to 47209/128000 blocks
  real 16m19.337s   user 0m0.160s   sys 0m2.550s

* * * * * * * * * * * * * * * *

directRandomRdWr
  loops = 128000, last i/o rc = 4096, total bytes read are 524288000; no coverage to 47209/128000 blocks
  readbytes = 262860800 writebytes = 261427200
  real 15m46.085s   user 0m0.210s   sys 0m1.900s
rawRandomRdWr
  loops = 128000, last read rc = 4096, total io bytes 524288000; no coverage to 47209/128000 blocks
  readbytes = 262860800, writebytes = 261427200
  real 15m51.176s   user 0m0.140s   sys 0m2.840s

=================== (2) BLOCK DIRECT IO vs RAW ========================

./Bonnie /dev/sda7 -D -s 500   <------ block direct io
Bonnie 1.2: File './Bonnie.3976', size: 524288000, volumes: 1
Writing with putc()... done: 467 kB/s 4.3 %CPU
Rewriting... done: 1519 kB/s 0.6 %CPU
Writing intelligently... done: 1745 kB/s 0.4 %CPU
Reading with getc()... done: 8561 kB/s 82.9 %CPU
Reading intelligently... done: 21422 kB/s 4.9 %CPU
Seeker 1...Seeker 3...Seeker 2...start 'em...filename is /dev/sda7 using direct I/O

              ---Sequential Output (nosync)--- ---Sequential Input-- --Rnd Seek-
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --04k (03)-
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
elm3b9 1* 500   467  4.3  1745  0.4  1519  0.6  8561 82.9 21422  4.9 153.8  0.0

./Bonnie /dev/raw/raw1 -s 500   <------ note raw1 maps to sda7
Bonnie 1.2: File './Bonnie.3933', size: 524288000, volumes: 1
Writing with putc()... done: 467 kB/s 4.5 %CPU
Rewriting... done: 1526 kB/s 1.1 %CPU
Writing intelligently... done: 1749 kB/s 0.6 %CPU
Reading with getc()... done: 8934 kB/s 82.2 %CPU
Reading intelligently... done: 21293 kB/s 7.4 %CPU
Seeker 2...Seeker 1...Seeker 3...start 'em...filename is /dev/raw/raw1 filename is /dev/raw/raw1

              ---Sequential Output (nosync)--- ---Sequential Input-- --Rnd Seek-
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --04k (03)-
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
elm3b9 1* 500   467  4.5  1749  0.6  1526  1.1  8934 82.2 21293  7.4 153.8  0.0

My testcases
------------
directSequentialRead
  loops = 128000, last read rc = 4096, total bytes read are 524288000
  real 0m23.863s   user 0m0.030s   sys 0m1.850s
rawSequentialRead
  loops = 128000, last read rc = 4096, total bytes read are 524288000
  real 0m24.247s   user 0m0.050s   sys 0m2.580s

* * * * * * * * * * * * * * * *

directSequentialWrite
  loops = 128000, last write rc = 4096, total bytes written are 524288000
  real 18m15.893s   user 0m0.030s   sys 0m2.650s
rawSequentialWrite
  loops = 128000, last write rc = 4096, total bytes written are 524288000
  real 18m15.401s   user 0m0.060s   sys 0m2.640s

* * * * * * * * * * * * * * * *

directSequentialRdWr
  loops = 128000, last read rc = 4096, total i/o bytes 524288000
  readbytes = 261091328 writebytes = 263196672
  real 10m6.107s   user 0m0.040s   sys 0m2.170s
rawSequentialRdWr
  loops = 128000, last read rc = 4096, total bytes read are 524288000
  readbytes = 261091328 writebytes = 263196672
  real 10m5.906s   user 0m0.080s   sys 0m2.450s

* * * * * * * * * * * * * * * *

directRandomRead
  loops = 128000, last read rc = 4096, total bytes read are 524288000; no coverage to 47209/128000 blocks
  real 15m37.309s   user 0m0.150s   sys 0m2.270s
rawRandomRead
  loops = 128000, last read rc = 4096, total bytes read are 524288000; no coverage to 47209/128000 blocks
  real 15m37.267s   user 0m0.130s   sys 0m2.870s

* * * * * * * * * * * * * * * *

directRandomWrite
  loops = 128000, last read rc = 4096, total bytes written are 524288000; no coverage to 47209/128000 blocks
  real 16m19.741s   user 0m0.180s   sys 0m2.360s
rawRandomWrite
  loops = 128000, last read rc = 4096, total bytes written are 524288000; no coverage to 47209/128000 blocks
  real 16m19.992s   user 0m0.150s   sys 0m2.490s

* * * * * * * * * * * * * * * *

directRandomRdWr
  loops = 128000, last i/o rc = 4096, total bytes read are 524288000; no coverage to 47209/128000 blocks
  readbytes = 262860800 writebytes = 261427200
  real 15m53.864s   user 0m0.170s   sys 0m2.480s
rawRandomRdWr
  loops = 128000, last read rc = 4096, total io bytes 524288000; no coverage to 47209/128000 blocks
  readbytes = 262860800, writebytes = 261427200
  real 15m54.398s   user 0m0.140s   sys 0m2.690s