From: Bart V. A. <bva...@ac...> - 2010-02-27 19:27:58
On Mon, Jan 11, 2010 at 7:44 PM, Vladislav Bolkhovitin <vs...@vl...> wrote:
>
> [ ... ]
>
> SRP initiator seems to be not too well optimized for the best
> performance. ISER initiator is noticeably better in this area.

(replying to an e-mail from one month ago)

I'm not sure the above statement makes sense. Below you can find the
performance results for 512-byte reads with a varying number of threads
and a NULLIO target. With a sufficiently high number of threads this
test saturated the two CPU cores of the initiator system, but not the
CPU core of the target system. For the initiator and target software
combinations used in this test, the numbers below show that, although
the difference is small, both the latency and the CPU usage of the SRP
traffic are slightly lower than those of the iSER traffic. These
numbers are quite impressive: for both protocols the initiator system
completes one I/O operation in about 17 microseconds, or about 44,000
clock cycles (see the P.S. below for the arithmetic).

iSER:
 1 read : io=128MB,   bw=13,755KB/s, iops=27,510, runt=  9529msec
 2 read : io=256MB,   bw=26,118KB/s, iops=52,235, runt= 10037msec
 4 read : io=512MB,   bw=48,985KB/s, iops=97,970, runt= 10703msec
 8 read : io=1,024MB, bw=57,519KB/s, iops=115K,   runt= 18230msec
16 read : io=2,048MB, bw=57,880KB/s, iops=116K,   runt= 36233msec
32 read : io=4,096MB, bw=57,990KB/s, iops=116K,   runt= 72328msec
64 read : io=8,192MB, bw=58,066KB/s, iops=116K,   runt=144468msec

CPU load for 64 threads (according to vmstat 2): 20% us + 80% sy on the
initiator and 40% us + 20% sy + 40% id on the target.

SRP:
 1 read : io=128MB,   bw=14,211KB/s, iops=28,422, runt=  9223msec
 2 read : io=256MB,   bw=26,275KB/s, iops=52,549, runt=  9977msec
 4 read : io=512MB,   bw=49,257KB/s, iops=98,513, runt= 10644msec
 8 read : io=1,024MB, bw=60,322KB/s, iops=121K,   runt= 17383msec
16 read : io=2,048MB, bw=61,272KB/s, iops=123K,   runt= 34227msec
32 read : io=4,096MB, bw=61,176KB/s, iops=122K,   runt= 68561msec
64 read : io=8,192MB, bw=60,963KB/s, iops=122K,   runt=137602msec

CPU load for 64 threads (according to vmstat 2): 20% us + 80% sy on the
initiator and 0% us + 50% sy + 50% id on the target.

Setup details:

* The above output was generated with the following command:

  for i in 1 2 4 8 16 32 64; do
      printf "%2d " $i
      io-load 512 $i ${initiator_device} | grep runt
  done

* The io-load script is as follows:

  #!/bin/sh
  # Direct-I/O read load via the fio SCSI generic (sg) engine.
  blocksize="${1:-512}"   # I/O size in bytes
  threads="${2:-1}"       # number of concurrent fio jobs
  dev="${3:-sdj}"         # device name under /dev
  fio --bs="${blocksize}" --buffered=0 --size=128M --ioengine=sg \
      --rw=read --invalidate=1 --end_fsync=1 --thread \
      --numjobs="${threads}" --loops=1 --group_reporting \
      --name=nullio --filename="/dev/${dev}"

* SRP target software: SCST r1522 compiled in release mode.

* iSER target software: tgt 1.0.2.

* InfiniBand hardware: QDR PCIe 2.0 HCAs.

* Initiator system: 2.6.33-rc7 kernel (for-next branch of Roland's
  InfiniBand repository, without the recently posted iSER and SRP
  performance improvement patches). The SRP initiator was loaded with
  the parameter srp_sg_tablesize=128. Frequency scaling was disabled.
  Runlevel: 3. CPU: E6750 @ 2.66 GHz.

* Target system: 2.6.30.7 kernel + SCST patches. Frequency scaling was
  disabled. Runlevel: 3. CPU: E8400 @ 3.00 GHz, booted with maxcpus=1.

Bart.
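
P.S. For the arithmetic behind the "17 microseconds or 44,000 clock
cycles" figure, here is a minimal sanity check, assuming (as the vmstat
output suggests) that both 2.66 GHz initiator cores were fully busy
during the 64-thread runs:

  awk 'BEGIN {
      cores = 2; hz = 2.66e9          # assumed: two E6750 cores at 2.66 GHz
      split("116000 122000", iops)    # iSER resp. SRP iops at 64 threads
      for (i = 1; i <= 2; i++)
          printf "%d iops -> %.1f us, %.0f cycles per I/O\n",
                 iops[i], cores * 1e6 / iops[i], cores * hz / iops[i]
  }'

This prints about 17.2 us / 45,900 cycles per I/O for iSER and about
16.4 us / 43,600 cycles per I/O for SRP, consistent with the figure
quoted above.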