From: <vl...@us...> - 2007-08-08 09:53:11
Revision: 157
          http://scst.svn.sourceforge.net/scst/?rev=157&view=rev
Author:   vlnb
Date:     2007-08-08 02:52:23 -0700 (Wed, 08 Aug 2007)

Log Message:
-----------
- Docs updated
- Minor fix

Modified Paths:
--------------
    trunk/scst/README
    trunk/scst/ToDo
    trunk/scst/src/dev_handlers/scst_vdisk.c
    trunk/scst/src/scst_targ.c

Modified: trunk/scst/README
===================================================================
--- trunk/scst/README   2007-08-07 17:12:21 UTC (rev 156)
+++ trunk/scst/README   2007-08-08 09:52:23 UTC (rev 157)
@@ -361,8 +361,9 @@
 
  - READ_ONLY - read only
 
- - O_DIRECT - both read and write caching disabled (doesn't work
-   currently).
+ - O_DIRECT - both read and write caching disabled. This mode isn't
+   currently fully implemented; use the user space fileio_tgt program
+   in O_DIRECT mode instead (see below).
 
  - NULLIO - in this mode no real IO will be done, but success will be
    returned. Intended to be used for performance measurements at the same
@@ -499,28 +500,21 @@
 
 User space program fileio_tgt uses interface of scst_user dev handler
 and allows to see how it work in various modes. Fileio_tgt provides
 mostly the same functionality as scst_vdisk handler with the only
-exception that it supports O_DIRECT mode. This mode is basically the
-same as BLOCKIO, but also supports files, so for some loads it could be
-significantly faster, than regular FILEIO access, provided by
-scst_vdisk. All the words about BLOCKIO from above apply to O_DIRECT as
-well. While running fileio_tgt if you don't understand some its options,
-use defaults for them, those values are the fastest.
+exceptions that it implements O_DIRECT mode and doesn't support
+BLOCKIO. O_DIRECT mode is basically the same as BLOCKIO, but it also
+supports files, so for some loads it could be significantly faster
+than regular FILEIO access. Everything said about BLOCKIO above
+applies to O_DIRECT as well. While running fileio_tgt, if you don't
+understand some of its options, use the defaults for them; those
+values are the fastest.
 
 Performance
 -----------
 
 Before doing any performance measurements note that:
 
-I. Currently maximum performance is possible only with real SCSI devices
-or VDISK BLOCKIO mode with several simultaneously executed commands
-(SCSI tagged queuing) or performance handlers. If you have enough CPU
-power, VDISK FILEIO handler also could provide the same results, when
-aggregate throughput is close to the aggregate throughput locally on the
-target from the same disks. Also note, that currently IO subsystem in
-Linux implemented on such way, so a VDISK FILEIO device over a single
-file occupied entire formatted with some file system device (eg
-/dev/hdc) could perform considerably better, than a VDISK FILEIO device
-over /dev/hdc itself without the file system involved.
+I. Performance results depend very much on your type of load, so it is
+crucial that you choose the access mode (FILEIO, BLOCKIO, O_DIRECT,
+pass-through) which suits your needs best.
 
 II. In order to get the maximum performance you should:
 
@@ -529,9 +523,9 @@
 
    - Disable in Makefile STRICT_SERIALIZING, EXTRACHECKS, TRACING, DEBUG*,
      SCST_STRICT_SECURITY, SCST_HIGHMEM
 
-2. For Qlogic target driver:
+2. For target drivers:
 
-  - Disable in Makefile EXTRACHECKS, TRACING, DEBUG_TGT, DEBUG_WORK_IN_THREAD
+  - Disable in Makefiles EXTRACHECKS, TRACING, DEBUG*
 
 3. For device handlers, including VDISK:
@@ -554,13 +548,40 @@
 
  - The default kernel read-ahead and queuing settings are optimized for
    locally attached disks, therefore they are not optimal if they
-   attached remotely (our case), which sometimes could lead to
-   unexpectedly low throughput. You should increase read-ahead size
-   (/sys/block/device/queue/read_ahead_kb) to at least 256Kb or even
-   more on all initiators and the target. Also experiment with other
-   parameters in /sys/block/device directory, they also affect the
-   performance. If you find the best values, please share them with us.
+   are attached remotely (the SCSI target case), which sometimes can
+   lead to unexpectedly low throughput. You should increase the
+   read-ahead size to at least 512KB or even more on all initiators
+   and the target.
+
+   You should also limit on all initiators the maximum number of
+   sectors per SCSI command. To do it on Linux initiators, run:
+
+   echo "64" > /sys/block/sdX/queue/max_sectors_kb
+
+   where X is the letter of the device imported from the target,
+   e.g. 'b' for sdb.
+
+   To increase the read-ahead size on Linux, run:
+
+   blockdev --setra N /dev/sdX
+
+   where N is the read-ahead size in 512-byte sectors and X is a
+   device letter as above.
+
+   Note: you need to set the read-ahead for device sdX again after
+   you have changed the maximum number of sectors per SCSI command
+   for that device.
+
+ - You may need to increase the number of requests that the OS on
+   the initiator sends to the target device. To do it on Linux
+   initiators, run:
+
+   echo "512" > /sys/block/sdX/queue/nr_requests
+
+   where X is a device letter as above.
+
+   You may also experiment with other parameters in the
+   /sys/block/sdX directory; they also affect performance. If you
+   find the best values, please share them with us.
+
  - Use on the target deadline IO scheduler with read_expire and
    write_expire increased on all exported devices to 5000 and 20000
    correspondingly.
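As an illustration, the settings from the hunk above combined for a
single device (sdb is only a placeholder for your device name; the
iosched sysfs file names are the standard ones for the deadline
scheduler and are not part of this patch):

   # On every initiator that imports the device: limit per-command
   # size and raise the number of outstanding requests.
   echo "64" > /sys/block/sdb/queue/max_sectors_kb
   echo "512" > /sys/block/sdb/queue/nr_requests

   # On the initiators and the target: read-ahead of at least 512KB
   # (1024 sectors of 512 bytes); re-run after changing max_sectors_kb.
   blockdev --setra 1024 /dev/sdb

   # On the target, for each exported device: deadline IO scheduler
   # with increased expire times (milliseconds).
   echo "deadline" > /sys/block/sdb/queue/scheduler
   echo "5000" > /sys/block/sdb/queue/iosched/read_expire
   echo "20000" > /sys/block/sdb/queue/iosched/write_expire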
@@ -571,41 +592,30 @@
 
 5. For hardware.
 
  - Make sure that your target hardware (e.g. target FC card) and underlaying
-   SCSI hardware (e.g. SCSI card to which your disks connected) stay on
-   different PCI buses. They will have to work in parallel, so it
-   will be better if they don't race for the bus. The problem is not
-   only in the bandwidth, which they have to share, but also in the
-   interaction between the cards during that competition. We have told
-   that in some cases it could lead to 5-10 times less performance, than
+   IO hardware (e.g. IO card, like SATA, SCSI or RAID, to which your
+   disks are connected) stay on different PCI buses. They have to work
+   in parallel, so it is better if they don't compete for the bus. The
+   problem is not only in the bandwidth, which they have to share, but
+   also in the interaction between the cards during that competition.
+   In some cases it could lead to 5-10 times lower performance than
    expected.
 
 IMPORTANT: If you use on initiator some versions of Windows (at least W2K)
=========  you can't get good write performance for VDISK FILEIO devices with
           default 512 bytes block sizes. You could get about 10% of the
-          expected one. This is because of "unusual" write access
-          pattern, with which Windows'es write data and which is
-          (simplifying) incompatible with how Linux page cache works,
-          so for each write the corresponding block must be read first.
-          With 4096 bytes block sizes for VDISK devices the write
-          performance will be as expected. Actually, any system on
-          initiator, not only Windows, will benefit from block size
+          expected one. This is because of partition alignment, which
+          is (simplifying) incompatible with how the Linux page cache
+          works, so for each write the corresponding block must be
+          read first. Use a 4096-byte block size for VDISK devices
+          and you will get the expected write performance. Actually,
+          any OS on initiators, not only Windows, will benefit from block size
           max(PAGE_SIZE, BLOCK_SIZE_ON_UNDERLYING_FS), where PAGE_SIZE
-          is the page size, BLOCK_SIZE_ON_UNDERLYING_FS is block size on
-          the underlying FS, on which the device file located, or 0, if
-          a device node is used. Both values are on the target.
+          is the page size and BLOCK_SIZE_ON_UNDERLYING_FS is the
+          block size of the underlying FS on which the device file is
+          located, or 0 if a device node is used. Both values are
+          taken on the target.
+          See also the important notes above about setting block
+          sizes >512 bytes for VDISK FILEIO devices.
 
-Just for reference: we had with 0.9.2 and "old" Qlogic driver on 2.4.2x
-kernel, where we did careful performance study, aggregate throughput
-about 390 Mb/sec from 2 qla2300 cards sitting on different 64-bit PCI
-buses and working simultaneously for two different initiators with
-several simultaneously working load programs on each. From one card -
-about 190 Mb/sec. We used tape_perf handler, so there was no influence
-from underlying SCSI hardware, i.e. we measured only SCST/FC overhead.
-The target computer configuration was not very modern for the moment:
-something like 2x1GHz Intel P3 Xeon CPUs. You can estimate the
-memory/PCI speed from that. CPU load was ~5%, there were ~30K IRQ/sec
-and no additional SCST related context switches.
-
 Credits
 -------
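A quick way to compute the suggested max(PAGE_SIZE,
BLOCK_SIZE_ON_UNDERLYING_FS) value on the target (a sketch only; the
file path is an example, and getconf/stat are assumed to be available
there):

   PAGE_SIZE=`getconf PAGESIZE`
   # Fundamental block size of the FS holding the device file;
   # use 0 instead if you export a device node directly.
   FS_BLOCK_SIZE=`stat -f -c %S /path/to/device_file`
   # The suggested VDISK block size is the larger of the two values.
   echo $((PAGE_SIZE > FS_BLOCK_SIZE ? PAGE_SIZE : FS_BLOCK_SIZE))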
Modified: trunk/scst/ToDo
===================================================================
--- trunk/scst/ToDo   2007-08-07 17:12:21 UTC (rev 156)
+++ trunk/scst/ToDo   2007-08-08 09:52:23 UTC (rev 157)
@@ -8,7 +8,7 @@
    the page cache (in order to avoid data copy between it and internal
    buffers). Requires modifications of the kernel.
 
- - O_DIRECT mode doesn't work for FILEIO (oops'es somewhere in the kernel)
+ - Fix in-kernel O_DIRECT mode.
 
  - Close integration with Linux initiator SCSI mid-level, including
    queue types (simple, ordered, etc.) and local initiators (sd, st, sg,

Modified: trunk/scst/src/dev_handlers/scst_vdisk.c
===================================================================
--- trunk/scst/src/dev_handlers/scst_vdisk.c   2007-08-07 17:12:21 UTC (rev 156)
+++ trunk/scst/src/dev_handlers/scst_vdisk.c   2007-08-08 09:52:23 UTC (rev 157)
@@ -2549,7 +2549,8 @@
 			TRACE_DBG("%s", "O_DIRECT");
 #else
 			PRINT_INFO_PR("%s flag doesn't currently"
-				" work, ignoring it", "O_DIRECT");
+				" work, ignoring it, use fileio_tgt "
+				"in O_DIRECT mode instead", "O_DIRECT");
 #endif
 		} else if (!strncmp("NULLIO", p, 6)) {
 			p += 6;

Modified: trunk/scst/src/scst_targ.c
===================================================================
--- trunk/scst/src/scst_targ.c   2007-08-07 17:12:21 UTC (rev 156)
+++ trunk/scst/src/scst_targ.c   2007-08-08 09:52:23 UTC (rev 157)
@@ -2447,9 +2447,16 @@
 
 	if (unlikely(test_bit(SCST_CMD_ABORTED, &cmd->cmd_flags))) {
 		if (test_bit(SCST_CMD_ABORTED_OTHER, &cmd->cmd_flags)) {
-			TRACE_MGMT_DBG("Flag ABORTED OTHER set for cmd %p "
-				"(tag %llu), returning TASK ABORTED", cmd, cmd->tag);
-			scst_set_cmd_error_status(cmd, SAM_STAT_TASK_ABORTED);
+			if (cmd->completed) {
+				/* It's completed and it's OK to return its result */
+				clear_bit(SCST_CMD_ABORTED, &cmd->cmd_flags);
+				clear_bit(SCST_CMD_ABORTED_OTHER, &cmd->cmd_flags);
+			} else {
+				TRACE_MGMT_DBG("Flag ABORTED OTHER set for cmd "
+					"%p (tag %llu), returning TASK ABORTED",
+					cmd, cmd->tag);
+				scst_set_cmd_error_status(cmd, SAM_STAT_TASK_ABORTED);
+			}
 		}
 	}


This was sent by the SourceForge.net collaborative development platform,
the world's largest Open Source development site.