AoEde
Virtual ATA over Ethernet (AoE) target for Linux
Status: Alpha
Brought to you by: killer-r
INTRODUCTION
------------

AoEde is a virtual EtherDrive (R) blade: a program that makes a seekable file available over an Ethernet local area network (LAN) via the ATA over Ethernet (AoE) protocol. The seekable file is typically a block device like /dev/md0, but regular files work too, and sparse files can be especially convenient. When AoEde exports the block storage over AoE, it becomes a storage target. Another host on the same LAN can access the storage if it has a compatible aoe kernel driver.

BUILDING
--------

The following command should build the AoEde program on a Linux-based system:

  make

For FreeBSD systems, add an extra parameter:

  make PLATFORM=freebsd

For the list of configurable options and their descriptions, see config.h.

EXAMPLES
--------

There is an "AoEded" script that daemonizes the program and sends its output to the logger program. Make sure logger is installed if you would like to run AoEde as a daemon with the AoEded script.

  ecashin@kokone AoEde$ echo 'I have logger' | logger
  ecashin@kokone AoEde$ tail -3 /var/log/messages
  Feb 8 14:52:49 kokone -- MARK --
  Feb 8 15:12:49 kokone -- MARK --
  Feb 8 15:19:56 kokone logger: I have logger

Here is a short example showing how to export a block device with AoEde. (This is a loop device backed by a sparse file, but you could use any seekable file instead of /dev/loop7.)
  ecashin@kokone AoEde$ make
  cc -Wall -c -o aoe.o aoe.c
  cc -Wall -c -o linux.o linux.c
  cc -Wall -c -o ata.o ata.c
  cc -o AoEde aoe.o linux.o ata.o
  ecashin@kokone AoEde$ su
  Password:
  root@kokone AoEde# modprobe loop
  root@kokone AoEde# dd if=/dev/zero bs=1k count=1 seek=`expr 1024 \* 4096` of=bd-file
  1+0 records in
  1+0 records out
  1024 bytes transferred in 0.009901 seconds (103423 bytes/sec)
  root@kokone AoEde# losetup /dev/loop7 bd-file
  root@kokone AoEde# ./AoEde 9 0 eth0 /dev/loop7
  ioctl returned 0
  4294968320 bytes
  pid 16967: e9.0, 8388610 sectors

Here's how you can use the Linux aoe driver to access the storage from another host on the LAN.

  ecashin@kokone ecashin$ ssh makki
  Last login: Mon Feb 7 10:25:04 2005
  ecashin@makki ~$ su
  Password:
  root@makki ecashin# modprobe aoe
  root@makki ecashin# aoe-stat
      e9.0  eth1  up
  root@makki ecashin# mkfs -t ext3 /dev/etherd/e9.0
  mke2fs 1.35 (28-Feb-2004)
  ...
  Creating journal (8192 blocks): done
  Writing superblocks and filesystem accounting information: done

  This filesystem will be automatically checked every 24 mounts or
  180 days, whichever comes first. Use tune2fs -c or -i to override.
  root@makki ecashin# mkdir /mnt/e9.0
  root@makki ecashin# mount /dev/etherd/e9.0 /mnt/e9.0
  root@makki ecashin# echo hooray > /mnt/e9.0/test.txt
  root@makki ecashin# cat /mnt/e9.0/test.txt
  hooray

Remember: be as careful with these devices as you would with /dev/hda!

Jumbo Frame Compatibility
-------------------------

AoEde can use jumbo frames provided your initiator is jumbo frame capable. There is one small configuration gotcha to consider if you want to keep the kernel from frequently dropping frames destined for AoEde.

AoEde uses a raw socket to perform AoE. The Linux kernel will only buffer a certain amount of data for a raw socket. For 2.6 kernels, this value is managed through /proc:

  root@nai aoe# grep . /proc/sys/net/core/rmem_*
  /proc/sys/net/core/rmem_default:128000
  /proc/sys/net/core/rmem_max:128000

rmem_max is the maximum amount a user process may expand the receive buffer to (through setsockopt(...)), and rmem_default is, as you might expect, the default.

The gotcha is that this limit does not count the amount of user data buffered, but the amount of memory actually used to buffer each frame. For example, the Intel GbE controller must be given 16KB receive buffers to use an MTU over 8KB. For each received frame, the kernel must then be able to buffer 16KB, even if the aoe frame is only 60 bytes long.

The Linux aoe initiator will use 16 outstanding frames when used with AoEde. A good default for ensuring frames are not dropped is to allocate 16KB for each of 17 frames (17 * 16 * 1024 = 278528 bytes):

  for f in /proc/sys/net/core/rmem_*; do echo $((17 * 16 * 1024)) >$f; done

Be sure to start AoEde after changing the buffering defaults, as the buffer size is set when the socket is opened.

AoE Initiator Compatibility
---------------------------

The Linux aoe driver for the 2.6 kernel is compatible if you use aoe-2.6-7 or newer. You can use older aoe drivers, but you will only be able to see one AoEde per MAC address.

Extended features can be used only with an initiator that supports them. Currently only AoEDisk does this. You can find it here:

  https://sourceforge.net/projects/aoedisk/

Experimental tag tracking functionality
---------------------------------------

The current AoE specification doesn't define how an initiator may use the tag field of an AoE packet. However, at least some initiators use it as a monotonically incrementing unsigned 32-bit little-endian counter for every new AoE packet. This makes it possible to reduce duplicate replies transmitted over the network (RX tag tracking) and to reduce the risk of data corruption when a packet duplicated by hardware arrives out of order (WRITE tag tracking).

Both features are disabled by default and can be enabled with a command line argument.
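The tag tracking idea can be illustrated with a minimal C sketch. It assumes the counter behaviour described above and uses a swapped-in technique, an IPsec-style sliding anti-replay window, so that frames legitimately reordered among the outstanding requests are still accepted while exact repeats are flagged. This is not necessarily how AoEde implements tag tracking; struct tag_window, tag_from_le, tag_accept and the 64-tag window size are all hypothetical.

```c
#include <stdint.h>

/* Hypothetical per-initiator tracker; AoEde's real data structures may differ. */
struct tag_window {
    int      valid;    /* 0 until the first tag has been seen */
    uint32_t highest;  /* newest tag accepted so far */
    uint64_t seen;     /* bit i set => tag (highest - i) was already seen */
};

/* Decode the 4-byte AoE tag field as a little-endian counter. */
static uint32_t tag_from_le(const unsigned char b[4])
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Returns 1 if the tag is new (process the packet), 0 if it is a duplicate
 * or too old to judge (older than the 64-tag window). */
static int tag_accept(struct tag_window *w, uint32_t tag)
{
    if (!w->valid) {
        w->valid = 1;
        w->highest = tag;
        w->seen = 1;
        return 1;
    }
    /* Modular distance, so the window keeps working when the counter wraps. */
    uint32_t ahead = tag - w->highest;
    if (ahead != 0 && ahead < 0x80000000u) {
        /* Newer than anything seen: slide the window forward. */
        w->seen = (ahead >= 64) ? 0 : (w->seen << ahead);
        w->seen |= 1;
        w->highest = tag;
        return 1;
    }
    uint32_t behind = w->highest - tag;     /* 0 means "same as highest" */
    if (behind >= 64)
        return 0;                           /* too old: treat as stale */
    if (w->seen & ((uint64_t)1 << behind))
        return 0;                           /* exact repeat: duplicate */
    w->seen |= (uint64_t)1 << behind;       /* late but new: accept once */
    return 1;
}
```

Feeding the tags 5, 7, 6, 7, 5 into a fresh window accepts the first three (6 arrives late but is new) and flags the last two as duplicates. The window is what makes this safer than a plain "highest tag seen" check, which would wrongly discard the reordered tag 6.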
Be extremely careful with WRITE tag tracking, because using it with an initiator that doesn't behave as described above can corrupt data! The RX tag tracking option is safer: with a compatible initiator it improves performance, and with an incompatible initiator it can only degrade performance, not corrupt data.

TBD: Add a special command to the Aoemask request, or a flag to Aoehdr, so that an initiator can explicitly declare that it follows the tag field behaviour described above.

CRC data integrity verification
-------------------------------

Ethernet hardware normally uses CRC32 to ensure the integrity of transferred data (search for 'Frame Check Sequence' for details). However, there are cases when additional integrity verification is useful: to check that the hardware works flawlessly, or when the user is very concerned about data integrity and willing to pay some CPU cycles and/or network throughput.

AoEde can check the CRC of incoming write requests and apply a CRC to read responses before sending them. Four CRC8 checksums are used for this. I chose this algorithm because it is faster than CRC16/CRC32 and still good enough to _diagnose_ errors. However, if your hardware for some reason doesn't handle its own FCS/CRC32 properly, AoEde's CRC alone is not strong enough to guarantee data integrity.

Note that this feature is implemented as an extension that must be supported by the initiator and must also be explicitly activated by it in order to work. When it is active, any write request with a mismatched CRC is printed to stderr, and every read response is supplemented with 4 bytes of CRC.

Exported image freeze
---------------------

AoEde is able to freeze the disk image for a while by shadowing all writes. This means it is possible to stop all modifications to the disk image file or device while the client keeps working with it as usual (perhaps only a bit slower than usual).
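To make the shadowing idea concrete, here is a minimal C sketch under a simple assumed design that may not match AoEde's internals: while frozen, each written block goes to the same offset in the sparse shadow file and is marked in a per-block map; reads of marked blocks are served from the shadow; unfreezing copies the marked blocks back into the image. The names (struct frozen_image, img_init, img_freeze, img_write, img_read, img_unfreeze) and the 512-byte granularity are hypothetical, and unlike AoEde, which merges in the background, this sketch merges synchronously.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define SHADOW_BLOCK 512  /* hypothetical granularity: one 512-byte sector */

struct frozen_image {
    int      image_fd;   /* main image or device; not modified while frozen */
    int      shadow_fd;  /* sparse shadow file (the -f argument) */
    int      frozen;     /* nonzero while writes are being shadowed */
    size_t   nblocks;    /* image size in SHADOW_BLOCK units */
    uint8_t *in_shadow;  /* per block: 1 if the newest copy is in the shadow */
};

static int img_init(struct frozen_image *im, int image_fd, int shadow_fd,
                    size_t nblocks)
{
    im->image_fd = image_fd;
    im->shadow_fd = shadow_fd;
    im->frozen = 0;
    im->nblocks = nblocks;
    im->in_shadow = calloc(nblocks, 1);
    return im->in_shadow ? 0 : -1;
}

static void img_freeze(struct frozen_image *im) { im->frozen = 1; }

static int img_write(struct frozen_image *im, size_t blk, const void *buf)
{
    off_t off = (off_t)blk * SHADOW_BLOCK;
    if (blk >= im->nblocks)
        return -1;
    if (!im->frozen)
        return pwrite(im->image_fd, buf, SHADOW_BLOCK, off) == SHADOW_BLOCK ? 0 : -1;
    /* Same offset in the shadow: blocks never written stay holes in a sparse file. */
    if (pwrite(im->shadow_fd, buf, SHADOW_BLOCK, off) != SHADOW_BLOCK)
        return -1;
    im->in_shadow[blk] = 1;
    return 0;
}

/* Read a block from wherever its newest copy lives. */
static int img_read(struct frozen_image *im, size_t blk, void *buf)
{
    off_t off = (off_t)blk * SHADOW_BLOCK;
    if (blk >= im->nblocks)
        return -1;
    int fd = im->in_shadow[blk] ? im->shadow_fd : im->image_fd;
    return pread(fd, buf, SHADOW_BLOCK, off) == SHADOW_BLOCK ? 0 : -1;
}

/* Copy every shadowed block back into the main image, then leave frozen mode. */
static int img_unfreeze(struct frozen_image *im)
{
    unsigned char buf[SHADOW_BLOCK];
    for (size_t b = 0; b < im->nblocks; b++) {
        off_t off = (off_t)b * SHADOW_BLOCK;
        if (!im->in_shadow[b])
            continue;
        if (pread(im->shadow_fd, buf, SHADOW_BLOCK, off) != SHADOW_BLOCK ||
            pwrite(im->image_fd, buf, SHADOW_BLOCK, off) != SHADOW_BLOCK)
            return -1;
        im->in_shadow[b] = 0;
    }
    im->frozen = 0;
    return 0;
}
```

Writing shadowed blocks at their original offsets keeps the lookup trivial, and it is one way a shadow file could come to span the whole image, which is why a filesystem with sparse file support matters.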
When AoEde receives the 'freeze' signal, it flushes all its internal buffers and operates in a special frozen mode until it receives the 'unfreeze' signal. You must also specify a temporary shadow file, to which all writes go while the actual image remains frozen. Use the -f option for that, like:

  aoede -b 64 1 1 egiga0 /var/.AOE/disk11.fs -f /var/.AOE/shadow11.fs

Also make sure the shadow file is located on a filesystem that supports sparse files; otherwise it will immediately occupy slightly more space than the main image.

How do you freeze/unfreeze? Very simple: use kill.

  freeze:
    kill -USR1 ${AOEDE_PID}
    kill -USR2 ${AOEDE_PID}

  unfreeze:
    kill -USR1 ${AOEDE_PID}
    kill -USR1 ${AOEDE_PID}
    kill -USR2 ${AOEDE_PID}

Note that freezing/unfreezing is not performed instantly. After sending the freeze sequence, wait until AoEde creates the shadow file; after sending the unfreeze sequence, wait until that file is removed.

Unfreezing flushes the written data to the main image 'in the background', without stopping service of initiator requests. However, this process noticeably decreases performance until it finishes.

Multiple interfaces support
---------------------------

If you want to use AoEde with more than one network interface, build it with MAX_NICS (in tuneup.h) either undefined or set to the planned maximum number of interfaces. Then launch it with all the interfaces listed on the command line, for example:

  aoede 1 4 eth0 eth1 /data/AOE.fs

Note that running several instances of aoede on a single file (as was possible with vblade) is not supported, because AoEde has its own userspace buffering.
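The command line above takes the shelf and slot numbers (as in "./AoEde 9 0 eth0 /dev/loop7" exporting e9.0), one or more interface names, and finally the file. Below is a minimal C sketch of splitting those positional arguments, assuming options such as -b and -f have already been removed (for example by getopt) and that argv starts at the shelf number. struct target_args, parse_addr, parse_positional and the duplicate-interface check are hypothetical, not AoEde's code. When MAX_NICS is defined at compile time the sketch enforces it, mirroring the tuneup.h setting described above; when it is undefined there is no limit.

```c
#include <stdlib.h>
#include <string.h>

/* Positional arguments "shelf slot ifname... file" (options already removed). */
struct target_args {
    long         shelf;  /* AoE major address (16 bits on the wire) */
    long         slot;   /* AoE minor address (8 bits on the wire) */
    int          nnics;  /* number of interfaces listed */
    char       **nics;   /* interface names, pointing into argv */
    const char  *path;   /* exported file or block device */
};

/* Parse a decimal number in [0, max]; returns -1 on bad input. */
static long parse_addr(const char *s, long max)
{
    char *end;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || v < 0 || v > max)
        return -1;
    return v;
}

static int parse_positional(int argc, char **argv, struct target_args *t)
{
    if (argc < 4)  /* shelf, slot, at least one interface, and the file */
        return -1;
    /* 0xFFFF and 0xFF are the AoE broadcast addresses, so exclude them. */
    t->shelf = parse_addr(argv[0], 0xFFFE);
    t->slot = parse_addr(argv[1], 0xFE);
    if (t->shelf < 0 || t->slot < 0)
        return -1;
    t->nnics = argc - 3;
#ifdef MAX_NICS
    if (t->nnics > MAX_NICS)  /* compile-time interface limit */
        return -1;
#endif
    t->nics = argv + 2;
    /* Hypothetical policy: listing the same interface twice is an error. */
    for (int i = 0; i < t->nnics; i++)
        for (int j = i + 1; j < t->nnics; j++)
            if (strcmp(t->nics[i], t->nics[j]) == 0)
                return -1;
    t->path = argv[argc - 1];
    return 0;
}
```

For the example arguments "1 4 eth0 eth1 /data/AOE.fs", this yields shelf 1, slot 4, two interfaces and the path /data/AOE.fs. Taking the file as the last argument is what lets any number of interfaces sit in between.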