The AoE initiator (the side using the storage), the "aoe" driver, does retransmit AoE write commands, and it keeps doing so for aoe_deadsecs seconds.  aoe_deadsecs is a configurable module parameter.  The virtual memory subsystem also buffers writes to filesystems.  An issue that is possibly related to your problems is briefly described below.
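
Before getting to that, here is a minimal sketch for checking the current aoe_deadsecs value on the initiator.  It assumes the parameter is exposed at the usual sysfs location for module parameters (/sys/module/aoe/parameters/aoe_deadsecs); the helper name read_aoe_deadsecs is made up for illustration, and the path may differ on your system.

```python
from pathlib import Path

# Assumed sysfs location of the aoe module parameter; adjust if yours differs.
AOE_DEADSECS_PATH = Path("/sys/module/aoe/parameters/aoe_deadsecs")


def read_aoe_deadsecs(path=AOE_DEADSECS_PATH):
    """Return the aoe_deadsecs value in seconds, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # File missing (module not loaded?) or contents not an integer.
        return None


if __name__ == "__main__":
    value = read_aoe_deadsecs()
    if value is None:
        print("aoe_deadsecs not readable (is the aoe module loaded?)")
    else:
        print(f"aoe_deadsecs = {value} seconds")
```

If the value is much smaller than you expect, that would be worth knowing before looking at anything else.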

Often the problem is not too little buffering of writes but too much of it.  For writes to a filesystem, the data is first modified in RAM, and at some later point the dirty data in RAM is flushed out to the persistent storage.  If the system waits too long, a large backlog of dirty data builds up and has to be flushed in one go, which can clog things up.

In a nutshell, the virtual memory subsystem's defaults were created before 64-bit systems and large amounts of RAM were common.  You can use some VM settings to encourage dirty pages to be written out sooner by the process generating the writes, so that performance is more consistent.
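
To make that concrete, here is a rough sketch that prints the current dirty-page settings from /proc/sys/vm and estimates how much dirty data a percentage-based limit allows on machines of different sizes.  The helper names are made up for illustration, the 20% figure is only an example value, and the estimate uses total RAM, whereas the kernel works from the memory it considers dirtyable, so treat the numbers as an upper-bound approximation.

```python
from pathlib import Path

# Standard location of the VM tunables on Linux.
VM_DIR = Path("/proc/sys/vm")


def read_vm_setting(name, vm_dir=VM_DIR):
    """Return an integer VM setting such as dirty_ratio, or None if unreadable."""
    try:
        return int((Path(vm_dir) / name).read_text().strip())
    except (OSError, ValueError):
        return None


def approx_dirty_limit_bytes(mem_bytes, ratio_percent):
    """Approximate bytes of dirty data allowed by a percentage-based limit.

    Simplification: applies the percentage to mem_bytes directly.
    """
    return mem_bytes * ratio_percent // 100


if __name__ == "__main__":
    for name in ("dirty_ratio", "dirty_background_ratio",
                 "dirty_bytes", "dirty_background_bytes"):
        print(f"{name} = {read_vm_setting(name)}")

    # Example only: the same 20% limit on a small and a large machine.
    for gib in (4, 64):
        limit = approx_dirty_limit_bytes(gib * 2**30, 20)
        print(f"{gib} GiB RAM at 20%: roughly {limit / 2**30:.1f} GiB may be dirty")
```

The point of the comparison is that a percentage that was harmless on a small machine can translate into many gigabytes of unwritten data on a large one, all of which eventually has to go out over the network to the AoE target.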

There are some example settings in the EtherDrive HOWTO FAQ:

  http://support.coraid.com/support/linux/EtherDrive-2.6-HOWTO-5.html#ss5.19

Linux Weekly News article about this problem:

  http://lwn.net/Articles/572911/

On 1/15/14, 11:14 AM, James R. Leu wrote:
We see a similar issue with vblade when it becomes CPU starved
due to resource contention on our AOE server.

It would be nice if in these situations the AOE client would queue write
blocks and resend unack'd writes.

On Wed, Jan 15, 2014 at 04:52:36PM +0100, Lars Täuber wrote:
Hi,

I am experiencing some problems with the latest ggaoed version and a fresh Ubuntu 14.04 aoe client (from the daily snapshots).

http://code.google.com/p/ggaoed/source/list

The kernel version on the client side is 3.13.0-3-generic


# modinfo aoe
filename:       /lib/modules/3.13.0-3-generic/kernel/drivers/block/aoe/aoe.ko
version:        85
description:    AoE block/char driver for 2.6.2 and newer 2.6 kernels
author:         Sam Hopkins <sah@coraid.com>
license:        GPL
srcversion:     5F0AC5D858A1164C5170585

The client is a test box, but the server has been in production for years. So I can't change the server config.


I did a tcpdump and see that the server stops sending a response to the last write request of a series of write requests.
After the client has waited 9 seconds for responses without receiving any packet from the target, it issues a "Query Config Information Request" and marks the device as read-only. This results in a read-only filesystem.
The responses to the "Query Config Information Requests" can be seen right after the requests.

I can "repair" this with an aoe-revalidate and remounting rw.
But the problem appears to happen again with the very next longer write operation.

I'm stuck here.

It seems the client doesn't resend unanswered requests. Is this on purpose?

Thanks
Lars

------------------------------------------------------------------------------
CenturyLink Cloud: The Leader in Enterprise Cloud Services.
Learn Why More Businesses Are Choosing CenturyLink Cloud For
Critical Workloads, Development Environments & Everything In Between.
Get a Quote or Start a Free Trial Today. 
http://pubads.g.doubleclick.net/gampad/clk?id=119420431&iu=/4140/ostg.clktrk
_______________________________________________
Aoetools-discuss mailing list
Aoetools-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/aoetools-discuss


