From: Adi Kriegisch <adi@cg...> - 2006-11-23 12:15:04
Thanks a lot for your help! -- Short version: It works with aoe6-23! :-)
> The write is offset out of the initiator (ie, before processing by vblade
I had a look at the dump as well and already thought so... therefore I tried
aoe6-40 yesterday, with no change.
Compiling aoe6-23 was not that easy because I got the following error:
/root/aoe6-23/linux/drivers/block/aoe/aoenet.c: In function ‘aoenet
/root/aoe6-23/linux/drivers/block/aoe/aoenet.c:142: error: too many arguments to function ‘skb_linearize’
Looking at line 142 in aoenet.c I found the following:
if (skb_linearize(skb, GFP_ATOMIC) < 0)
Comparing this to the similar line in aoe6-40 where it looked like:
I decided to remove "GFP_ATOMIC" and try to use it. It works just fine, but
painfully slowly (ie transfer rates around 5MB/s). And even worse: I get about
the same values (although a little higher) for the loopback device.
The following page refers to the changed skb_linearize API:
From the page (for kernel 2.6.18):
"The skb_linearize() function has been reworked, and no longer has a GFP flags
argument. There is also a new skb_linearize_cow() function which ensures that
the resulting SKB is writable."
Looking into the code of skb_linearize I found that skb_is_nonlinear(skb)
is called just before skb_linearize(skb), and I decided to disable this call
since the very same function is called from within the new skb_linearize. That
gave me an enormous performance boost to where aoe should be: somewhere around
10MB/s. I did not do exact benchmarks but just had a quick look with dstat
and did a "size/time" to get rough estimates.
To me it seems that using aoe6-23 with kernel 2.6.18 requires the following
changes:
comment out line 141 [if (skb_is_nonlinear(skb))]
modify line 142:
- if (skb_linearize(skb, GFP_ATOMIC) < 0)
+ if (skb_linearize(skb) < 0)
Now I think it is time for a disclaimer: I AM NO KERNEL HACKER -- DON'T USE
THIS CODE; I HAVE NO IDEA WHAT I AM DOING HERE BUT IT WORKS FOR ME... ;-)
Sam, maybe you could comment on the code modifications?!
I am not sure about the "<0" part...
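
For anyone wondering about the same "<0" question, here is a small userspace
model of the modified check. It is only a sketch, not kernel code: struct
fake_skb and all fake_* / rx_should_drop names are made up for illustration,
and it assumes that the one-argument skb_linearize() tests for a nonlinear
buffer itself (as described above) and returns 0 on success or a negative
errno on failure. Under those assumptions the outer skb_is_nonlinear() test is
redundant and "< 0" only triggers on a real failure.

```c
/* Userspace model only; nothing here is taken from the kernel sources. */
#include <errno.h>

/* Hypothetical stand-in for struct sk_buff, reduced to what the model needs. */
struct fake_skb {
	unsigned int data_len; /* bytes held in fragments; 0 means already linear */
	int pull_calls;        /* how often the expensive copy path was taken */
	int fail_alloc;        /* set to 1 to simulate an allocation failure */
};

static int fake_skb_is_nonlinear(const struct fake_skb *skb)
{
	return skb->data_len != 0;
}

/* Models the expensive step: pull all fragments into the linear area. */
static int fake_do_linearize(struct fake_skb *skb)
{
	skb->pull_calls++;
	if (skb->fail_alloc)
		return -ENOMEM;
	skb->data_len = 0;
	return 0;
}

/* Assumed shape of the one-argument call: the nonlinear test happens inside. */
static int fake_skb_linearize(struct fake_skb *skb)
{
	return fake_skb_is_nonlinear(skb) ? fake_do_linearize(skb) : 0;
}

/* The check from the modified line 142, without the outer test from line 141. */
static int rx_should_drop(struct fake_skb *skb)
{
	return fake_skb_linearize(skb) < 0;
}
```

In this model an already linear buffer never reaches the copy path, a
fragmented one is copied exactly once, and only the simulated allocation
failure makes the "< 0" test fire.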
> Starting with aoe6-24 we began doing "zero copy writes" (ZCW). We
> My only guess right now is that something in your kernel doesn't like
> having to linearize the skb. You can test this theory by removing the
> The aoe6-23 driver doesn't support jumbo frames, but right now,
> neither do you. :)
So true... I should have started my tests with the final hardware; but then
this bug might not have been unveiled... :-)
> If this driver makes the problem go away then either we have a bug in
> our ZCW setup of the sk_buff, the network system has a bug in
> processing it ... or both!
I found some information about a bug in SCTP in combination with XEN in the
The trouble is actually caused by more heavy fragmentation when using xen.
Probably this is related and there is already someone working on that?
Thank you -- again -- very much for your help! If I may assist you in further
debugging, feel free to contact me; you might as well get an account on the
test machines if you need to have one (just send me your ssh key in