AoEde Git
Virtual ATA over ethernet (AoE) target for Linux
Status: Alpha
Brought to you by:
killer-r
INTRODUCTION
------------
AoEde is a virtual EtherDrive (R) blade: a program that makes a
seekable file available over an Ethernet local area network (LAN) via
the ATA over Ethernet (AoE) protocol.
The seekable file is typically a block device like /dev/md0 but even
regular files will work. Sparse files can be especially convenient.
When AoEde exports the block storage over AoE it becomes a storage
target. Another host on the same LAN can access the storage if it has
a compatible aoe kernel driver.
BUILDING
--------
The following command should build the AoEde program on a Linux-based
system:

  make

For FreeBSD systems, include an extra parameter like so:

  make PLATFORM=freebsd

For the list of configurable options and their descriptions, see config.h.
EXAMPLES
--------
There is an "AoEded" script that daemonizes the program and sends its
output to the logger program. Make sure you have logger installed if
you would like to run AoEde as a daemon with the AoEded script.
  ecashin@kokone AoEde$ echo 'I have logger' | logger
  ecashin@kokone AoEde$ tail -3 /var/log/messages
  Feb 8 14:52:49 kokone -- MARK --
  Feb 8 15:12:49 kokone -- MARK --
  Feb 8 15:19:56 kokone logger: I have logger
Here is a short example showing how to export a block device with
AoEde. (This is a loop device backed by a sparse file, but you could
use any seekable file instead of /dev/loop7.)
  ecashin@kokone AoEde$ make
  cc -Wall -c -o aoe.o aoe.c
  cc -Wall -c -o linux.o linux.c
  cc -Wall -c -o ata.o ata.c
  cc -o AoEde aoe.o linux.o ata.o
  ecashin@kokone AoEde$ su
  Password:
  root@kokone AoEde# modprobe loop
  root@kokone AoEde# dd if=/dev/zero bs=1k count=1 seek=`expr 1024 \* 4096` of=bd-file
  1+0 records in
  1+0 records out
  1024 bytes transferred in 0.009901 seconds (103423 bytes/sec)
  root@kokone AoEde# losetup /dev/loop7 bd-file
  root@kokone AoEde# ./AoEde 9 0 eth0 /dev/loop7
  ioctl returned 0
  4294968320 bytes
  pid 16967: e9.0, 8388610 sectors
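The sizes AoEde reports follow directly from the dd command. As a quick
check, here is the same arithmetic in shell. It assumes 512-byte sectors,
which is what the two numbers in the output imply; the variable names are
made up for this example.

```shell
# dd seeks past 1024 * 4096 blocks of 1 KiB and then writes one more block.
block=1024
seek_blocks=$((1024 * 4096))
bytes=$(( (seek_blocks + 1) * block ))

# Assumption: one sector is 512 bytes.
sectors=$(( bytes / 512 ))

echo "$bytes bytes, $sectors sectors"   # 4294968320 bytes, 8388610 sectors
```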
Here's how you can use the Linux aoe driver to access the storage from
another host on the LAN.
  ecashin@kokone ecashin$ ssh makki
  Last login: Mon Feb 7 10:25:04 2005
  ecashin@makki ~$ su
  Password:
  root@makki ecashin# modprobe aoe
  root@makki ecashin# aoe-stat
  e9.0 eth1 up
  root@makki ecashin# mkfs -t ext3 /dev/etherd/e9.0
  mke2fs 1.35 (28-Feb-2004)
  ...
  Creating journal (8192 blocks): done
  Writing superblocks and filesystem accounting information: done
  This filesystem will be automatically checked every 24 mounts or
  180 days, whichever comes first. Use tune2fs -c or -i to override.
  root@makki ecashin# mkdir /mnt/e9.0
  root@makki ecashin# mount /dev/etherd/e9.0 /mnt/e9.0
  root@makki ecashin# echo hooray > /mnt/e9.0/test.txt
  root@makki ecashin# cat /mnt/e9.0/test.txt
  hooray
Remember: be as careful with these devices as you would with /dev/hda!
Jumbo Frame Compatibility
-------------------------
AoEde can use jumbo frames provided your initiator is jumbo frame
capable. There is one small configuration gotcha to consider
to keep the kernel on the AoEde host from frequently dropping frames.
AoEde uses a raw socket to perform AoE. The Linux kernel will
only buffer a certain amount of data for a raw socket. For 2.6
kernels, this value is managed through /proc:
  root@nai aoe# grep . /proc/sys/net/core/rmem_*
  /proc/sys/net/core/rmem_default:128000
  /proc/sys/net/core/rmem_max:128000
rmem_max is the max amount a user process may expand the receive
buffer to -- through setsockopt(...) -- and rmem_default is, as you
might expect, the default.
The gotcha is that this limit applies not to the amount of user
data buffered but to the amount of memory the kernel really uses
to hold each frame. As an example, the Intel GbE controller
must be given 16KB frames to use an MTU over 8KB.
For each received frame, the kernel must be able to buffer
16KB, even if the AoE frame is only 60 bytes in length.

The Linux aoe initiator will use 16 outstanding frames when
used with AoEde. A good default for ensuring frames are
not dropped is to allocate 16KB for each of 17 frames:

  for f in /proc/sys/net/core/rmem_*; do echo $((17 * 16 * 1024)) >$f; done

Be sure to start AoEde after changing the buffering defaults
as the buffer value is set when the socket is opened.
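To make the sizing rule above easier to adapt, here is a small sketch that
computes the buffer size for a given frame count and compares it with the
limit of the running kernel. The function name rmem_needed is made up for
this example; it is not part of AoEde.

```shell
# rmem_needed FRAMES FRAME_KB
# Bytes of receive buffer needed to hold FRAMES frames of FRAME_KB KiB each.
rmem_needed() {
    echo $(( $1 * $2 * 1024 ))
}

# 16 outstanding frames plus one spare, 16KB of kernel memory per frame.
need=$(rmem_needed 17 16)
echo "need $need bytes"

# Compare with the running kernel, if the /proc file is readable here.
f=/proc/sys/net/core/rmem_max
if [ -r "$f" ]; then
    cur=$(cat "$f")
    if [ "$cur" -lt "$need" ]; then
        echo "rmem_max is $cur: too small, frames may be dropped"
    else
        echo "rmem_max is $cur: large enough"
    fi
fi
```

With the values used above, the first line printed is "need 278528 bytes".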
AoE Initiator Compatibility
---------------------------
The Linux aoe driver for the 2.6 kernel is compatible if you use
aoe-2.6-7 or newer. You can use older aoe drivers but you will only
be able to see one AoEde per MAC address.
Extended features can be used only with an initiator that supports them.
Currently only AoEDisk does this. You can find it here:
https://sourceforge.net/projects/aoedisk/
Experimental Tag Tracking Functionality
---------------------------------------
The current AoE specification does not say how an initiator may use the tag
field of an AoE packet. At least some initiators, however, use it as a
monotonically incrementing unsigned 32-bit little-endian counter that is
advanced for every new AoE packet. This makes it possible to reduce the number
of duplicate replies sent over the network (RX tag tracking) and to reduce the
risk of data corruption when the hardware duplicates a packet and the copy
arrives out of order (WRITE tag tracking).

Both features are disabled by default and can be enabled with a command line
argument. Be extremely careful with WRITE tag tracking: using it with an
initiator that does not behave as described above can corrupt data! RX tag
tracking is safer. With a conforming initiator it improves performance; with
an incompatible initiator it can only degrade performance, not corrupt data.

TBD: Add a special command to the Aoemask request, or a flag to Aoehdr, so
that an initiator can explicitly declare that it follows the tag field
behaviour described above.
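
To illustrate the idea, here is a minimal sketch of how a target could tell a
new tag from a duplicate or late one. It assumes the initiator really does
advance the tag for every new packet and that the tag has already been decoded
into a plain integer. The function classify_tag is hypothetical; it is not
AoEde's actual code, and AoEde may track tags differently.

```shell
# classify_tag LAST TAG
# Prints "new" if TAG is ahead of LAST, "stale" if it is equal or behind.
# The comparison is done modulo 2^32 so that counter wraparound is handled.
classify_tag() {
    last=$1
    tag=$2
    diff=$(( ($tag - $last) & 0xFFFFFFFF ))
    if [ "$diff" -ne 0 ] && [ "$diff" -lt 2147483648 ]; then
        echo new
    else
        echo stale
    fi
}

classify_tag 100 101          # new
classify_tag 100 100          # stale: a duplicate
classify_tag 100 99           # stale: arrived late
classify_tag 4294967295 0     # new: the counter wrapped around
```

A packet classified as stale is the kind that tag tracking would skip. With an
initiator that does not count this way, the same test would reject valid
packets, which is why WRITE tag tracking is dangerous there.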
CRC Data Integrity Verification
-------------------------------
Ethernet hardware normally uses CRC32 to protect the integrity of transferred
data (search for 'Frame Check Sequence' for details). There are cases, however,
where some additional verification is useful: to check that the hardware works
flawlessly, or when the user is worried enough about data integrity to pay for
it with some CPU cycles and/or network throughput. AoEde can check the CRC of
incoming write requests and add a CRC to read responses before sending them.
Four CRC8 checksums are used for this. I selected this algorithm because it is
faster than CRC16/32 and still good enough to _diagnose_ errors. If, however,
your hardware for some reason does not handle its own FCS/CRC32 properly,
AoEde's CRC alone is not strong enough to guarantee data integrity.

Note that this feature is implemented as an extension: the initiator must
support it and must also explicitly activate it. When it is active, every
write request with a mismatched CRC is reported on stderr, and every read
response is supplemented with 4 bytes of CRC.
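
The exact polynomial and the way the four checksums are laid out are not
described here, so the following is only an illustration of the general idea,
not the wire format. It assumes polynomial 0x07 with a zero initial value and
hands payload bytes to the four checksums round-robin. Both choices, and the
function names, are assumptions made for this example.

```shell
# crc8_update CRC BYTE
# Fold one byte into a CRC-8 value (polynomial 0x07, an assumption).
crc8_update() {
    c=$(( ($1 ^ $2) & 0xFF ))
    i=0
    while [ "$i" -lt 8 ]; do
        if [ $(( c & 0x80 )) -ne 0 ]; then
            c=$(( ((c << 1) ^ 0x07) & 0xFF ))
        else
            c=$(( (c << 1) & 0xFF ))
        fi
        i=$(( i + 1 ))
    done
    echo "$c"
}

# crc8x4 BYTE...
# Four independent CRC-8 values over bytes given as decimal arguments.
# Byte number n goes to checksum n mod 4 (assumed layout).
crc8x4() {
    c0=0 c1=0 c2=0 c3=0 n=0
    for b in "$@"; do
        case $(( n % 4 )) in
            0) c0=$(crc8_update "$c0" "$b") ;;
            1) c1=$(crc8_update "$c1" "$b") ;;
            2) c2=$(crc8_update "$c2" "$b") ;;
            3) c3=$(crc8_update "$c3" "$b") ;;
        esac
        n=$(( n + 1 ))
    done
    echo "$c0 $c1 $c2 $c3"
}

sent=$(crc8x4 1 2 3 4 5 6 7 8)
got=$(crc8x4 1 2 3 4 5 6 7 9)     # last byte damaged in transit
[ "$sent" = "$got" ] || echo "CRC mismatch"
```

The four output values correspond to the 4 bytes of CRC mentioned above. A
receiver recomputes them over the payload and compares, as the last line does.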
Exported Image Freeze
---------------------
AoEde can freeze the disk image for a while by shadowing all writes. This
makes it possible to stop all modifications to the disk image file or device
while the client keeps working with it as usual (perhaps only a bit slower).
On receiving the 'freeze' signal, AoEde flushes all its internal buffers and
runs in a special frozen mode until it gets the 'unfreeze' signal.

You must also specify a temporary shadow file, which receives all writes while
the actual image remains frozen. Use the -f option for that, like:

  aoede -b 64 1 1 egiga0 /var/.AOE/disk11.fs -f /var/.AOE/shadow11.fs

Also make sure that the shadow file is located on a filesystem that supports
sparse files; otherwise it will instantly occupy a bit more space than the
size of the main image.

How to freeze/unfreeze? Very simple. Use kill for that:
freeze:
  kill -USR1 ${AOEDE_PID}
  kill -USR2 ${AOEDE_PID}
unfreeze:
  kill -USR1 ${AOEDE_PID}
  kill -USR1 ${AOEDE_PID}
  kill -USR2 ${AOEDE_PID}
Note that freezing and unfreezing are not instantaneous. After sending the
freeze sequence, wait until AoEde creates the shadow file. After sending the
unfreeze sequence, wait until that file is removed.
Unfreezing flushes the written data to the main image in the background,
without interrupting service to the initiator. However, this process
noticeably decreases performance until it finishes.
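
The signal sequences are easy to mistype, so you may want to wrap them in
small shell functions. The following is a sketch only: the function names and
the KILL override (used for dry runs) are inventions of this example, not part
of AoEde.

```shell
# send_seq PID SIGNAL...
# Send the given signals, in order, to one process.
# Set KILL=echo to print the arguments instead of sending signals.
send_seq() {
    pid=$1
    shift
    for sig in "$@"; do
        ${KILL:-kill} "-$sig" "$pid" || return 1
    done
}

aoede_freeze()   { send_seq "$1" USR1 USR2; }
aoede_unfreeze() { send_seq "$1" USR1 USR1 USR2; }

# wait_shadow present|absent FILE [TRIES]
# Poll once per second until the shadow file exists ("present", after a
# freeze) or is gone ("absent", after an unfreeze). Fails after TRIES polls.
wait_shadow() {
    want=$1
    file=$2
    tries=${3:-60}
    while [ "$tries" -gt 0 ]; do
        if [ -e "$file" ]; then state=present; else state=absent; fi
        [ "$state" = "$want" ] && return 0
        sleep 1
        tries=$(( tries - 1 ))
    done
    return 1
}
```

With the shadow file from the example above, a freeze/unfreeze cycle could
then look like this:

  aoede_freeze ${AOEDE_PID} && wait_shadow present /var/.AOE/shadow11.fs
  aoede_unfreeze ${AOEDE_PID} && wait_shadow absent /var/.AOE/shadow11.fs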
Multiple interfaces support
---------------------------
If you want to use AoEde with more than a single network interface, you should
build it with MAX_NICS (in tuneup.h) undefined or set to the maximum number of
interfaces you plan to use. Then launch it with all interfaces specified on the
command line, for example:

  aoede 1 4 eth0 eth1 /data/AOE.fs

Note that launching several instances of aoede for a single file (as was
possible with vblade) is not supported, because AoEde does its own userspace
buffering.