#39 Poor performance under SLES 8

Chuck Tribolet

I'm running iometer-2004.07.30-post.DS1, doing 512-byte
random reads with eight workers to eight IBM FAStT LUNs,
so one LUN per worker and 1 IO outstanding per target.
Under Windows and Red Hat Enterprise Linux 3 U3, I get
1500 IO/s with eight LUNs, and proportionately fewer
with fewer LUNs. Under SuSE SLES 8, I get only 300 IO/s,
and I get 300 with anywhere from 2 to 8 LUNs. With
one LUN I get 150 (the same as Windows and RHEL).

Looking at the information the adapter card gives me,
I only ever get two commands outstanding at a time on
SLES, whereas on RHEL it's 8 (I can't see this under
Windows). The adapter card supports more than 2 IOs,
though: starting eight dd commands in eight shell
sessions, I can get 40 or so. This makes me think
IOMETER is the problem.

SLES kernel is 2.4.21-231-smp. Hardware is an IBM x345 with
two 3.0 GHz Xeons, 4 GB memory, and a QLogic QL2342 FC card.

Any ideas what's happening?

Chuck Tribolet


  • Chuck Tribolet

    I tried running four separate copies of dynamo on the same
    SLES 8 system and got almost a 4x improvement in throughput,
    with about 4x more IOs outstanding. So it would seem that
    something is constraining the number of IOs per process.

    Chuck Tribolet

  • Chuck Tribolet

    In my original post RHEL meant RHEL 3.

    I did some more experiments, and the same problem also occurs
    in SLES 9. So I wrote a small C program and determined that in
    SLES 9, if I use the same aio interfaces that dynamo is using,
    IOs are done in the same order they are issued, no parallelism
    happens, and performance is bad. On RHEL 3, the IOs complete in
    random order, there's parallelism, and performance is good.
    But multiple threads with one IO each get good performance,
    unlike IOmeter, which leads me to suspect that dynamo uses a
    single thread for all IO, not a thread per worker.
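    The program itself isn't posted in this thread. The following is a hypothetical sketch of that kind of test, using the standard POSIX aio calls (aio_read, aio_suspend, aio_error, aio_return): it queues eight 512-byte reads on a single descriptor and records the order in which they are seen to finish. The names completion_order and run_demo, the scratch file, and the scrambled offsets are assumptions. Against a small cached file the reads finish almost at once, so to observe the behaviour described above it would have to be pointed at a raw LUN instead (direct I/O there would also need aligned buffers, which this sketch does not set up). On older glibc, link with -lrt.

    ```c
    #define _POSIX_C_SOURCE 200809L
    #include <aio.h>
    #include <assert.h>
    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define NREQ  8
    #define BLOCK 512

    /* Hypothetical test: queue NREQ aio_read() calls on ONE descriptor, then
     * record the order in which they are seen to finish (order[k] = index of
     * the k-th request observed complete). Requests that finish between two
     * polls are recorded in index order. Returns 0 on success, -1 on error. */
    static int completion_order(int fd, const off_t offsets[NREQ], int order[NREQ])
    {
        static char bufs[NREQ][BLOCK];
        static struct aiocb cbs[NREQ];   /* static: must outlive in-flight I/O */
        const struct aiocb *pending[NREQ];
        int done[NREQ] = {0};
        int ndone = 0;

        memset(cbs, 0, sizeof cbs);
        for (int i = 0; i < NREQ; i++) {
            cbs[i].aio_fildes = fd;          /* every request uses the same fd */
            cbs[i].aio_buf = bufs[i];
            cbs[i].aio_nbytes = BLOCK;
            cbs[i].aio_offset = offsets[i];
            if (aio_read(&cbs[i]) != 0)
                return -1;
        }
        while (ndone < NREQ) {
            for (int i = 0; i < NREQ; i++)   /* NULL entries are ignored */
                pending[i] = done[i] ? NULL : &cbs[i];
            if (aio_suspend(pending, NREQ, NULL) != 0 && errno != EINTR)
                return -1;
            for (int i = 0; i < NREQ; i++) {
                if (done[i] || aio_error(&cbs[i]) == EINPROGRESS)
                    continue;
                if (aio_return(&cbs[i]) != BLOCK)
                    return -1;
                done[i] = 1;
                order[ndone++] = i;
            }
        }
        return 0;
    }

    /* Build a small scratch file and run the test against it. */
    static int run_demo(int order[NREQ])
    {
        static char data[NREQ * BLOCK];
        const off_t blocks[NREQ] = {7, 2, 5, 0, 3, 6, 1, 4};  /* scrambled */
        off_t offsets[NREQ];
        char path[] = "/tmp/aio_orderXXXXXX";
        int fd = mkstemp(path);
        if (fd < 0)
            return -1;
        unlink(path);                        /* removed once fd is closed */
        memset(data, 'x', sizeof data);
        if (write(fd, data, sizeof data) != (ssize_t)sizeof data) {
            close(fd);
            return -1;
        }
        for (int i = 0; i < NREQ; i++)
            offsets[i] = blocks[i] * BLOCK;
        int rc = completion_order(fd, offsets, order);
        close(fd);
        return rc;
    }

    int main(void)
    {
        int order[NREQ];
        if (run_demo(order) != 0) {
            fputs("aio test failed\n", stderr);
            return 1;
        }
        printf("completion order:");
        for (int i = 0; i < NREQ; i++)
            printf(" %d", order[i]);
        printf("\n");
        return 0;
    }
    ```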

    I've also had a couple of people e-mail me offline with
    the same problem.

    Chuck Tribolet

  • Chuck Tribolet

    It turns out that the 1 IO limitation is per file
    descriptor, not per thread.

    So if there are eight threads sharing one file descriptor,
    only one IO can be outstanding at a time. But if one thread
    has eight file descriptors open against the same file and
    issues one aio_read per descriptor, there will be four IOs
    outstanding to the LUN at the same time, getting handled by
    the RAID controller.
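    A minimal sketch of the workaround described above, assuming the reported behaviour (at most one outstanding request per descriptor): open the same file several times and give each descriptor exactly one aio_read. This fits how glibc's thread-based POSIX AIO is usually described, with requests for a given descriptor worked through one after another by a single helper thread, but treat that as background rather than something this sketch verifies. read_one_per_fd, run_demo, and the scratch file are hypothetical; in a real test the path would be the LUN's device node.

    ```c
    #define _POSIX_C_SOURCE 200809L
    #include <aio.h>
    #include <assert.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define NFD   8
    #define BLOCK 512

    /* Hypothetical workaround: open `path` NFD times and give each descriptor
     * exactly one aio_read(), so no descriptor ever has a queue of its own.
     * Returns how many reads returned a full block, or -1 if an open() fails. */
    static int read_one_per_fd(const char *path, const off_t offsets[NFD])
    {
        static char bufs[NFD][BLOCK];
        static struct aiocb cbs[NFD];
        const struct aiocb *list[NFD];
        int fds[NFD];
        int ok = 0;

        for (int i = 0; i < NFD; i++) {
            fds[i] = open(path, O_RDONLY);
            if (fds[i] < 0) {
                while (i-- > 0)
                    close(fds[i]);
                return -1;
            }
        }
        memset(cbs, 0, sizeof cbs);
        for (int i = 0; i < NFD; i++) {
            cbs[i].aio_fildes = fds[i];      /* a different fd per request */
            cbs[i].aio_buf = bufs[i];
            cbs[i].aio_nbytes = BLOCK;
            cbs[i].aio_offset = offsets[i];
            list[i] = (aio_read(&cbs[i]) == 0) ? &cbs[i] : NULL;
        }
        /* All requests are already in flight; now wait for each in turn. */
        for (int i = 0; i < NFD; i++) {
            if (list[i] == NULL)
                continue;
            while (aio_error(&cbs[i]) == EINPROGRESS)
                aio_suspend(&list[i], 1, NULL);
            if (aio_return(&cbs[i]) == BLOCK)
                ok++;
        }
        for (int i = 0; i < NFD; i++)
            close(fds[i]);
        return ok;
    }

    /* Build a scratch file, then read one block per descriptor from it. */
    static int run_demo(void)
    {
        static char data[NFD * BLOCK];
        off_t offsets[NFD];
        char path[] = "/tmp/aio_perfdXXXXXX";
        int fd = mkstemp(path);
        if (fd < 0)
            return -1;
        memset(data, 'x', sizeof data);
        ssize_t n = write(fd, data, sizeof data);
        close(fd);
        for (int i = 0; i < NFD; i++)
            offsets[i] = (off_t)((i * 5) % NFD) * BLOCK;  /* scrambled order */
        int ok = (n == (ssize_t)sizeof data) ? read_one_per_fd(path, offsets) : -1;
        unlink(path);
        return ok;
    }

    int main(void)
    {
        int ok = run_demo();
        printf("%d of %d reads completed\n", ok, NFD);
        assert(ok == NFD);
        return 0;
    }
    ```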

    Also, I tried lio_listio. Even with eight commands passed to
    lio_listio, they still get done one at a time, in sequence.
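    For reference, a batch like this is submitted with the POSIX lio_listio() call. This hypothetical sketch queues eight 512-byte reads on one descriptor with LIO_WAIT and counts how many completed; it only shows the call pattern and does not measure whether the reads overlap, which would need timing against a real LUN. listio_read, run_demo, and the scratch file are made-up names for illustration.

    ```c
    #define _POSIX_C_SOURCE 200809L
    #include <aio.h>
    #include <assert.h>
    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define NREQ  8
    #define BLOCK 512

    /* Submit NREQ reads on one descriptor as a single lio_listio() batch.
     * LIO_WAIT blocks until every request has finished. Returns the number of
     * reads that returned a full block, or -1 if the batch was rejected. */
    static int listio_read(int fd, const off_t offsets[NREQ])
    {
        static char bufs[NREQ][BLOCK];
        static struct aiocb cbs[NREQ];
        struct aiocb *list[NREQ];
        int ok = 0;

        memset(cbs, 0, sizeof cbs);
        for (int i = 0; i < NREQ; i++) {
            cbs[i].aio_fildes = fd;
            cbs[i].aio_buf = bufs[i];
            cbs[i].aio_nbytes = BLOCK;
            cbs[i].aio_offset = offsets[i];
            cbs[i].aio_lio_opcode = LIO_READ;
            list[i] = &cbs[i];
        }
        /* EIO means at least one request failed; each is checked below. */
        if (lio_listio(LIO_WAIT, list, NREQ, NULL) != 0 && errno != EIO)
            return -1;
        for (int i = 0; i < NREQ; i++)
            if (aio_error(&cbs[i]) == 0 && aio_return(&cbs[i]) == BLOCK)
                ok++;
        return ok;
    }

    /* Build a scratch file and read it back as one batch. */
    static int run_demo(void)
    {
        static char data[NREQ * BLOCK];
        off_t offsets[NREQ];
        char path[] = "/tmp/aio_lioXXXXXX";
        int fd = mkstemp(path);
        if (fd < 0)
            return -1;
        unlink(path);                        /* removed once fd is closed */
        memset(data, 'x', sizeof data);
        int ok = -1;
        if (write(fd, data, sizeof data) == (ssize_t)sizeof data) {
            for (int i = 0; i < NREQ; i++)
                offsets[i] = (off_t)((i * 5) % NREQ) * BLOCK;  /* scrambled */
            ok = listio_read(fd, offsets);
        }
        close(fd);
        return ok;
    }

    int main(void)
    {
        int ok = run_demo();
        printf("%d of %d reads completed\n", ok, NREQ);
        assert(ok == NREQ);
        return 0;
    }
    ```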

    Net: I don't think this is an IOMETER bug. I think it's
    probably in librt, which handles the aio calls and is part
    of in my case.

    Chuck Tribolet

  • AnneHoller

    I recently encountered what I believe is this problem ("IOs
    done in the same order they are issued, no parallelism
    and performance is bad"), and I wanted to post info about it.

    For me, the problem involved an aio thread that consumed
    large amounts of CPU polling for IO completion
    (io_getevents). When I switched from the librtkaio library
    to the binary-compatible librt library, I got the IO
    performance I was expecting.

    The material under:
    "o Changed the AIO glibc interface from librtkaio to librt"
    in the following file was helpful for me: