This was supposed to go to the list ...
-------- Original Message --------
Subject: Re: [Aoetools-discuss] bad performance with 64bit+32bit interaction
Date: Mon, 20 Aug 2007 10:27:01 -0700
From: kelsey hudson <khudson@...>
To: Vasco Névoa <vasco.nevoa@...>
Vasco Névoa wrote:
> I've been using the aoe module and the vblade server that come packaged
> by Ubuntu 7.04 with relative success.
> I've got a diskless system that boots PXE into an AoE root (a PATA
> disk), as well as mounting an AoE /home dir (a SW RAID assembly of 2
> PATA disks). As I said, it worked rather well, although not extremely fast.
> Now I've switched my diskless client system to a 64bit dual processor
> Ubuntu 7.04, and the performance has suffered greatly. The booting of
> the machine takes very long, and every time I start an application it
> takes many seconds for the system to have disk access. After the data is
> in RAM cache, everything is lightning fast.
> I've replaced the stock aoe and vblade with their latest compiled
> versions, but the behaviour is the same. Furthermore, the speed reported
> by hdparm -t /dev/etherd/e0.1 and e0.2 on the client (~30MB/s) is half
> of the speed declared by hdparm -t /dev/hda and /dev/md1 on the server.
> I can't see where the delays are coming from. Can you give me pointers
> on debugging this?
> -direct gigabit crossover cable, with IP stack configured by DHCP during
> initramfs boot
> -no errors or packet loss on either eth* interface
Use ethtool to enable flow control on both sides, for both TX and RX.
That helps tremendously. Also, increase the interface MTU to 9000 on
both sides and set net.core.rmem_default, net.core.rmem_max,
net.core.wmem_default and net.core.wmem_max to 262144. That will
enable jumbo frames and increase the kernel's socket buffer sizes to
compensate for the larger frames. The sysctl settings should be put
into /etc/sysctl.conf or whatever the Debian equivalent is.
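
For reference, here is a rough sketch of what those settings could look
like as commands. It only prints them and does not run anything. The
interface name eth0 is an assumption (use whichever interface carries
the AoE traffic), and the sysctl keys are written as net.core.*, which
is how sysctl(8) names the entries under /proc/sys/net/core. Whether
your NIC and driver accept pause frames and a 9000-byte MTU is
something you will have to check.

```python
# Sketch only: builds the tuning commands as strings, does not execute them.
IFACE = "eth0"       # assumption: replace with the interface carrying AoE
MTU = 9000           # jumbo frame MTU suggested above
BUF_BYTES = 262144   # buffer size suggested above

SYSCTL_KEYS = [
    "net.core.rmem_default",
    "net.core.rmem_max",
    "net.core.wmem_default",
    "net.core.wmem_max",
]


def tuning_commands(iface=IFACE, mtu=MTU, buf=BUF_BYTES):
    """Return the one-off commands to run as root on both machines."""
    cmds = [
        f"ethtool -A {iface} rx on tx on",     # enable RX and TX flow control
        f"ip link set dev {iface} mtu {mtu}",  # raise the interface MTU
    ]
    cmds += [f"sysctl -w {key}={buf}" for key in SYSCTL_KEYS]
    return cmds


def sysctl_conf_lines(buf=BUF_BYTES):
    """Return the lines to add to /etc/sysctl.conf so the values persist."""
    return [f"{key} = {buf}" for key in SYSCTL_KEYS]


if __name__ == "__main__":
    print("\n".join(tuning_commands()))
    print("\n".join(sysctl_conf_lines()))
```

The ethtool and MTU settings do not survive a reboot on their own, so
they would also need to go into the distribution's network
configuration.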
Also, consider getting a gigabit switch. Even for two devices, it will
make a big difference because of the port buffers.
> The only clue I've got so far is a cat /dev/etherd/err on the client
> that shows a few retries, but I don't think it is enough to justify
> these long delays:
> retransmit e0.1 oldtag=015be102@... newtag=015ce1f6
> s=001a92b8a88f d=00e04c69442e nout=1
> unexpected rsp e0.1 tag=015be102@...
Flow control and the changes above should fix this.
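
If you want to check whether the changes actually reduce the retries, a
small script along these lines could count them. This is a sketch that
assumes you have captured the output of /dev/etherd/err to a text file
first (the file name aoe-err.log below is hypothetical), and that the
lines begin with "retransmit" or "unexpected rsp" as in your paste.

```python
from collections import Counter


def count_aoe_events(lines):
    """Count retransmit and unexpected-response lines in captured err output."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line.startswith("retransmit "):
            counts["retransmit"] += 1
        elif line.startswith("unexpected rsp "):
            counts["unexpected rsp"] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Hypothetical capture file; pass a different path as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "aoe-err.log"
    try:
        with open(path) as fh:
            print(dict(count_aoe_events(fh)))
    except FileNotFoundError:
        print(f"no capture file at {path}")
```

Comparing the counts over the same workload before and after the tuning
gives a rough idea of whether the retries went away.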
Additionally, there's probably something a bit more sinister going on.
If you're not using jumbo frames, then the data carried in each frame
will be less than the size of a cache page: typically 1024 bytes (two
sectors), vs. 4096 bytes, the size of a cache page in Linux. That means
accesses to the underlying device will be performed in chunks smaller
than one cache page, which results in a penalty to fill in the rest of
the page (especially on writes). There are other problems regarding DOS
disk geometry, but I haven't worked out a good global solution to those
yet.
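
To put numbers on the frame size vs. page size point, here is the
arithmetic as a small sketch. It assumes 22 bytes of AoE and ATA headers
inside each Ethernet payload (my reading of the AoE header layout, so
treat it as an assumption) and whole 512-byte sectors per frame. How
many sectors a given aoe driver or vblade version really packs into a
jumbo frame may differ.

```python
SECTOR_BYTES = 512
PAGE_BYTES = 4096          # cache page size cited above
AOE_ATA_HEADER_BYTES = 22  # assumption: 10-byte AoE header + 12-byte ATA header


def sectors_per_frame(mtu):
    """Whole sectors that fit in one frame's payload after the headers."""
    return (mtu - AOE_ATA_HEADER_BYTES) // SECTOR_BYTES


def frames_per_page(mtu):
    """Frames needed to move one full cache page at this MTU."""
    data_bytes = sectors_per_frame(mtu) * SECTOR_BYTES
    return -(-PAGE_BYTES // data_bytes)  # ceiling division


if __name__ == "__main__":
    for mtu in (1500, 9000):
        data = sectors_per_frame(mtu) * SECTOR_BYTES
        print(f"MTU {mtu}: {data} data bytes/frame, "
              f"{frames_per_page(mtu)} frame(s) per page")
```

Under those assumptions a standard 1500-byte MTU carries 1024 bytes of
data per frame, so one page takes four frames, while a 9000-byte MTU
fits a whole page in a single frame.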
Good luck, and hope this helps!