Hi Dave and Dave,
Please see below for more discussion of O_DIRECT, posted by
Stephen C. Tweedie of Red Hat several years ago
(http://www.ussg.iu.edu/hypermail/linux/kernel/0107.0/0484.html).
I think Ming's suggestion is right: IOMeter can offer an option to
enable or disable O_DIRECT, depending on what the benchmark actually
needs to measure.
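
As a rough illustration of such an option (a Python sketch, not Iometer
code; build_open_flags, open_target and use_direct are hypothetical
names), the flag can simply be added to the open(2) flags on request.
os.O_DIRECT only exists on Linux, so the sketch falls back to 0
elsewhere:

```python
import os

# os.O_DIRECT is Linux-specific; fall back to 0 (no extra flag) elsewhere.
O_DIRECT = getattr(os, "O_DIRECT", 0)


def build_open_flags(use_direct: bool, writable: bool = False) -> int:
    """Return open(2) flags, adding O_DIRECT only when requested."""
    flags = os.O_RDWR if writable else os.O_RDONLY
    if use_direct:
        flags |= O_DIRECT
    return flags


def open_target(path: str, use_direct: bool, writable: bool = False) -> int:
    """Hypothetical helper: open a test target with or without direct I/O."""
    return os.open(path, build_open_flags(use_direct, writable))
```

Note that with O_DIRECT the buffer address, file offset and transfer
length typically have to be aligned to the device's block size, and
some filesystems reject the flag altogether, so a real implementation
would need more care than this.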
Regards,
Liang Yang
****************Posted by Stephen C. Tweedie (sct@...) Jul 04 2001
****
O_DIRECT does not speed up sequential file accesses. If anything, it
may well slow them down, especially for writes. What O_DIRECT does is
twofold --- it guarantees physical IO to the disk (so that you know
for sure that the data is on disk for writes, or that the data on disk
is readable for reads); and it avoids the memory and CPU overhead of
keeping any cached copy of the data.
But because O_DIRECT is completely synchronous, it's not possible for
the kernel to implement its normal readahead and writebehind IO
clustering for direct IO. If you use the normal approach of writing
4k at a time to an O_DIRECT file, things may well be *massively*
slower than usual because the kernel is sending individual 4k IOs to
the disk, and because it is waiting for each IO to complete before the
application provides the next one.
On the contrary, buffered writes allow the kernel to batch those 4k
writes into large disk IOs, perhaps 100k or more; and the kernel can
maintain a queue of more than one such IO, so that once the first IO
completes the next one is immediately ready to be sent out.
For these reasons, buffered IO is often faster than O_DIRECT for pure
sequential access. The downside is its greater CPU cost and the fact
that it pollutes the cache (which, in turn, causes even _more_ CPU
overhead when the VM is forced to start reclaiming old cache data to
make room for new blocks.)
O_DIRECT is great for cases like multimedia (where you want to
maximise CPU available to the application and where you know in
advance that the data is unlikely to fit in cache) and databases
(where the application is caching things already and extra copies in
memory are just a waste of memory). It is not an automatic win for
all applications.
Cheers,
Stephen
***************************************************************
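
To put rough numbers on Stephen's point about 4k writes versus batched
IOs, here is a toy count in Python. It is only arithmetic, not a
measurement: the 128k batch size is my assumption (he says "perhaps
100k or more"), and disk_ios is a made-up helper.

```python
import math
from typing import Optional


def disk_ios(total_bytes: int, app_write_size: int,
             batch_size: Optional[int] = None) -> int:
    """Count disk IOs in a toy model of a sequential write.

    batch_size=None models O_DIRECT: one disk IO per application write.
    Otherwise the kernel is assumed to merge writes into batch_size IOs.
    """
    io_size = app_write_size if batch_size is None else batch_size
    return math.ceil(total_bytes / io_size)


total = 1024 * 1024                           # 1 MiB written sequentially
direct = disk_ios(total, 4096)                # one IO per 4k write
buffered = disk_ios(total, 4096, 128 * 1024)  # writes merged into 128k IOs
```

In this model the direct case issues 256 small IOs, each waited on in
turn, while the buffered case issues 8 large ones that can also be
queued behind each other.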
----- Original Message -----
From: "Harder, David W." <David.Harder@...>
To: <david_solina@...>
Cc: <iometer-user@...>;
<iometer-devel@...>
Sent: Friday, October 27, 2006 3:05 PM
Subject: Re: [Iometer-devel] [Iometer-user] Questions about
outstandingI/OQueue of Linux vs.TCQ/NCQ.
>I think we got into that earlier on this mailing list.
>
> It appears to be a side effect of using O_DIRECT (a.k.a. Direct I/O). If
> you use Direct I/O, then all I/O to the media appears to be synchronous
> (it's a kernel problem).
>
> The problem is that if you don't use Direct I/O, then Linux makes heroic
> efforts to read-ahead or write-back the I/O (depending on read or write,
> respectively), in which case the page-cache ends up masking several of
> the characteristics of the I/O pattern you're trying to measure
> including but not limited to queue depth. (You sort-of end up with the
> opposite end of the spectrum, maximum possible queueing for write
> transactions and read-ahead for read transactions.)
>
> -Dave-
>
> -----Original Message-----
> From: Solina, David [mailto:david_solina@...]
> Sent: Friday, October 27, 2006 3:08 PM
> To: Harder, David W.
> Subject: RE: [Iometer-user] [Iometer-devel] Questions about outstanding
> I/OQueue of Linux vs.TCQ/NCQ.
>
>
> Hi Dave,
> On the IOmeter list you wrote:
>
> 1) You need one queue in order to populate the other. In other words,
> if your application does not tell the OS to issue multiple asynchronous
> I/O (or allow the OS to do this implicitly for you), then the block
> device will never have more than one I/O in its queue.
>
> Have you been able to get IOmeter to do this under Linux?
>
> We have done testing with multiple instances of dynamo on the target
> system and have shown we only get one IO to the subsystem per dynamo
> instance regardless of the queue depth selected per worker on the client
> side.
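
For comparison, one generic way for an application to keep several
I/Os outstanding is a pool of threads that each issue a blocking
positional read. The Python sketch below shows the idea only; it is
not how dynamo is implemented, and read_with_queue_depth is a
hypothetical name.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def read_with_queue_depth(path: str, block_size: int, queue_depth: int) -> int:
    """Read a whole file with up to queue_depth blocking reads in flight.

    Returns the total number of bytes read.
    """
    size = os.path.getsize(path)
    fd = os.open(path, os.O_RDONLY)
    try:
        offsets = range(0, size, block_size)
        with ThreadPoolExecutor(max_workers=queue_depth) as pool:
            # os.pread takes an explicit offset, so the worker threads can
            # share one file descriptor without racing on a file position.
            chunks = pool.map(lambda off: os.pread(fd, block_size, off), offsets)
            return sum(len(chunk) for chunk in chunks)
    finally:
        os.close(fd)
```

Whether those reads actually reach the block device as concurrent
requests still depends on the page cache and on O_DIRECT, which is
exactly the issue in this thread.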
>
> Thanks
> David
>
> -----Original Message-----
> From: iometer-user-bounces@...
> [mailto:iometer-user-bounces@...] On Behalf Of Harder,
> David W.
> Sent: Friday, October 27, 2006 3:49 PM
> To: yangliang_mr@...; iometer-user@...;
> iometer-devel@...
> Subject: Re: [Iometer-user] [Iometer-devel] Questions about outstanding
> I/OQueue of Linux vs.TCQ/NCQ.
>
> Depends on the RAID, your RAID configuration, and your IOMeter profile.
>
> For example, if you have 16 drives in a RAID-0 array and you have a
> capable RAID controller (that can handle 256+ I/Os), then a small-block
> random profile in IOMeter would typically result in roughly 16 I/Os to
> each drive.
>
> If you have a large-block sequential profile with the same hardware
> configuration, the number of I/Os would be somewhere in between 16 and
> 256 with a tendency towards 256 (depending on the RAID controller).
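
The "roughly 16 I/Os to each drive" figure for the small-block random
case can be checked with a toy model in Python. It assumes every I/O
fits within one stripe unit and lands on a uniformly random member
drive; real controllers and stripe sizes are ignored, and
spread_random_ios is a made-up helper.

```python
import random
from collections import Counter


def spread_random_ios(outstanding: int, drives: int, seed: int = 0) -> Counter:
    """Assign each outstanding I/O to a uniformly random RAID-0 member."""
    rng = random.Random(seed)
    return Counter(rng.randrange(drives) for _ in range(outstanding))


per_drive = spread_random_ios(outstanding=256, drives=16)
# The average is 256 / 16 = 16 by construction; individual drives will
# sit somewhat above or below that at any given moment.
average = sum(per_drive.values()) / 16
```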
>
> -Dave-
>
> -----Original Message-----
> From: Liang Yang [mailto:yangliang_mr@...]
> Sent: Friday, October 27, 2006 2:32 PM
> To: Harder, David W.; iometer-user@...;
> iometer-devel@...
> Subject: Re: [Iometer-devel] Questions about outstanding I/O Queue of
> Linux vs.TCQ/NCQ.
>
> Hi Dave,
>
> Thanks for your answer. The tag queue depth of a SAS drive could
> theoretically be 65535, as a SAS device can support up to that many
> initiators. In practice, most SAS drives use a depth of around 64.
>
> As you mentioned RAID in your answer, I have a question about
> outstanding I/Os on RAID volumes:
> Suppose you have a RAID volume built on 16 SAS drives. You treat
> this RAID volume as a single physical drive and use IOMeter to do
> some physical drive performance testing, with the queue depth in
> IOMeter set to 256. Does each drive behind this RAID volume get 256
> I/Os, or does the RAID controller card divide the 256 I/Os between the
> 16 drives so that each drive gets just 16 I/Os?
>
> Best regards,
>
> Liang
>
> ----- Original Message -----
> From: "Harder, David W." <David.Harder@...>
> To: <multisyncfe991@...>; <iometer-user@...>;
> <iometer-devel@...>
> Sent: Friday, October 27, 2006 12:04 PM
> Subject: RE: [Iometer-devel] Questions about outstanding I/O Queue of
> Linux vs.TCQ/NCQ.
>
>
> 1) You need one queue in order to populate the other. In other words,
> if your application does not tell the OS to issue multiple asynchronous
> I/O (or allow the OS to do this implicitly for you), then the block
> device will never have more than one I/O in its queue.
>
> 2) I might be wrong, but I believe it is possible for block devices to
> support queue depths much greater than 32 or 64. For example, you could
> have a block device that is actually a SAS-SAS (or SAS-SATA) RAID with 16
> drives behind it. (It might even be indistinguishable from a SAS drive
> as far as the host is concerned.)
>
> 3) IMHO, I doubt there are any measurable negative effects of having an
> application I/O queue that is greater than the block device's I/O queue.
> (i.e. If you have a block device with a queue size of 64 and your
> application queues 70 I/Os, the performance will not be any worse than
> if your application queued 64.)
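
Point 3 can be pictured with a small Python sketch. It assumes the
extra requests simply wait on the host side until a device slot frees
up; split_queue is a hypothetical helper, not a kernel interface.

```python
from typing import Tuple


def split_queue(app_outstanding: int, device_queue_depth: int) -> Tuple[int, int]:
    """Return (I/Os held by the device, I/Os waiting on the host side)."""
    in_device = min(app_outstanding, device_queue_depth)
    waiting = app_outstanding - in_device
    return in_device, waiting
```

With 70 queued I/Os and a device queue of 64, the device holds 64 and
the remaining 6 wait their turn, so the device is as busy as it would
be with exactly 64 queued.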
>
> -Dave-
>
> -----Original Message-----
> From: iometer-devel-bounces@...
> [mailto:iometer-devel-bounces@...] On Behalf Of Liang
> Yang
> Sent: Friday, October 27, 2006 1:35 PM
> To: iometer-user@...;
> iometer-devel@...
> Subject: [Iometer-devel] Questions about outstanding I/O Queue of Linux
> vs.TCQ/NCQ.
>
> Hi,
> Under Linux, we can issue different numbers of outstanding I/Os to the
> block device, e.g. the queue depth can be set from 1 to 256 in
> IOMeter. However, the hard disk drive itself may have its own queue
> (e.g. NCQ, Native Command Queuing, for SATA drives and TCQ, Tagged
> Command Queuing, for SAS/SCSI drives). A typical queue depth for NCQ is
> 32 and for TCQ is 64. The outstanding I/O queue should be controlled by
> the block device driver and the RAID controller card/SAS HBA firmware.
> The NCQ/TCQ queue should be controlled by the hard disk drive firmware.
>
> I have two questions here:
> If the outstanding I/O queue has a different depth from the NCQ/TCQ
> queue, will this affect I/O performance, e.g. when the Linux outstanding
> I/O queue is much deeper than the TCQ/NCQ queue in the hard disk drive?
>
> Why do we need to maintain two queues here? Does the duplicate queue
> cause additional overhead?
>
> Could anyone give me some explanation about this?
>
> Thanks,
>
> Liang
>
>
> ------------------------------------------------------------------------
> -
> Using Tomcat but need to do more? Need to support web services,
> security?
> Get stuff done quickly with pre-integrated technology to make your job
> easier Download IBM WebSphere Application Server v.1.0.1 based on Apache
> Geronimo
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
> _______________________________________________
> Iometer-devel mailing list
> Iometer-devel@...
> https://lists.sourceforge.net/lists/listinfo/iometer-devel