From: Vladislav B. <vs...@vl...> - 2006-03-21 12:42:49
Mark Buechler wrote:
> The bigger readahead may work well for highly sequential IO, but can
> be a major loss on, say, reading many smaller files, since there's a
> whole lot of unused IO for each read.

Yes, correct. This is why you shouldn't use a read-ahead size that is
too big. But the Linux read-ahead code does a good job of detecting
non-sequential access and shrinking the read-ahead window for it. What
you set via /sys (or /proc for 2.4) is the maximum read-ahead size; the
minimum one stays close to 0. (A small sketch of setting it from user
space follows below.)

> The performance I'm getting, though substantially lower than local
> IO, is acceptable for both my production and test setups. I'll
> continue to research based on your information, however. I have
> access to a very fast SAN-attached array at work which has a Linux
> initiator connected. I'll do some testing with tiobench and see what
> I get with that.

I would appreciate it if you also tuned the performance using Iometer
and sent us the results. Tiobench is very good for corner cases, but
Iometer should better suit real-life usage.
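For illustration, here is a minimal user-space sketch of raising the
maximum read-ahead with the BLKRASET ioctl; it is roughly equivalent to
"blockdev --setra" or to writing /sys/block/<dev>/queue/read_ahead_kb
on 2.6 (the corresponding /proc setting on 2.4). The device name and
the 512 KB value below are only examples, not recommendations:

/* Set the maximum read-ahead of a block device to 512 KB
 * (1024 x 512-byte sectors) via BLKRASET, then read it back. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

int main(int argc, char **argv)
{
	const char *dev = (argc > 1) ? argv[1] : "/dev/sda"; /* example */
	unsigned long ra = 1024;	/* 512-byte sectors => 512 KB */
	long cur;
	int fd = open(dev, O_RDONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (ioctl(fd, BLKRASET, ra) < 0) {	/* needs root */
		perror("BLKRASET");
		return 1;
	}
	if (ioctl(fd, BLKRAGET, &cur) == 0)	/* verify the new value */
		printf("%s: max read-ahead is now %ld sectors\n", dev, cur);
	close(fd);
	return 0;
}

Remember that this only sets the maximum; the kernel still shrinks the
window on its own when the access pattern isn't sequential.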
Vlad

> - Mark.
>
> On 3/20/06, Vladislav Bolkhovitin <vs...@vl...> wrote:
>
> At first, (just in case) a bit of theory. When SCSI commands with
> equal request sizes are issued from the initiator to the target one
> by one (the next one only after the previous one has finished), for a
> FILEIO device the overall execution time of each command consists of:
>
> 1. Time spent for local preparations on the initiator, like building
> IOCBs for the SCSI controller.
>
> 2. SCSI (FC in our case) transport latency (one way). It includes not
> only the time to transfer the data, but also the link setup time and
> any other overhead.
>
> 3. Time spent on the target receiving the command with its data.
>
> 4. Time spent by VFS reading or writing the data and time spent
> submitting the data to/from the underlying storage device.
>
> 5. The storage device latency.
>
> 6. Time spent on the target preparing the result for sending back to
> the initiator, like building IOCBs for the target card.
>
> 7. SCSI transport latency (way back).
>
> 8. Time spent for local preparations on the initiator, like reading
> the result from the card and delivering it up to the requesting
> program.
>
> The throughput will be request_size/overall_time.
>
> In the local case, when you read from local SCSI devices, there are
> no steps 1-3 and 6-7. Additionally, FC transport latency is much
> bigger than that of local SCSI. Moreover, there is additional latency
> on step 5.
>
> Thus, no matter what throughput your FC card has, when doing
> one-by-one command execution you will always get unexpectedly low
> performance.
>
> The only way to get the maximum out of your hardware is to have at
> least one command on each of the processing steps, i.e. the commands
> should be issued in batches. The kernel VM subsystem is optimized for
> the local case, so its settings are not optimal for our FC case. For
> example, increasing the read-ahead size to 512K doubles sequential
> READ throughput. This applies to both the initiator and the target
> systems.
>
> Unfortunately, I have no time to do the detailed investigation and
> find all the affecting kernel parameters as well as the best values
> for them. I would appreciate it if someone performed that study and
> shared the results with us. I hope that the above notes will help
> them to understand the background and where to dig. If you have any
> questions, don't hesitate to ask.
>
> Anyway, it looks like with a bigger read-ahead we have performance
> close to the expected one. Am I wrong?
>
> Vlad
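To make the arithmetic in the quoted explanation concrete, here is a
back-of-the-envelope sketch. The numbers (64 KB requests, 2 ms of
summed per-command latency, queue depth 8) are invented for the
example and do not come from any measurement; only the relation
matters:

/* With strictly one-by-one commands, throughput is simply
 * request_size / overall_per_command_time, no matter how fast the
 * link is.  Keeping several commands in flight overlaps steps 1-8,
 * so the ideal throughput scales with the queue depth (until the
 * slowest single step or the link itself becomes the bottleneck). */
#include <stdio.h>

int main(void)
{
	double request_kb  = 64.0;  /* request size, KB (assumed) */
	double latency_ms  = 2.0;   /* steps 1-8 summed, ms (assumed) */
	double queue_depth = 8.0;   /* commands kept in flight (assumed) */

	double serial    = request_kb / latency_ms;  /* KB/ms ~= MB/s */
	double pipelined = serial * queue_depth;     /* ideal upper bound */

	printf("one by one:      ~%.0f MB/s\n", serial);
	printf("queue depth %.0f:  ~%.0f MB/s (upper bound)\n",
	       queue_depth, pipelined);
	return 0;
}

This is why bigger batches (larger read-ahead, more commands in
flight) move the numbers toward what the hardware can actually do.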