Thread: [Aoetools-discuss] multipath I/O
From: Matthew I. <ma...@di...> - 2009-02-25 04:09:07
|
I'm hoping somebody can verify my logic on how MPIO works with aoe/vblade.

If an aoe target machine exported a file using a command like

    vbladed 0 0 eth0 /path/to/file

and repeated this process using the same file to broadcast on a second ethernet port:

    vbladed 0 0 eth1 /path/to/file

On the initiator with 4 ethernet ports we would see:

    e0.0 xyzGB eth0,eth1,eth2,eth3 up

The results are what I expected:

In my write tests, I see traffic balancing on interfaces eth0-3. On the target side, traffic flows on eth0-1 in what appears to be a round-robin manner. On killing a single vblade, the throughput is almost cut in half after aoe sees it as down.

My question then is: can corruption occur when two vblades are running for the same file? Is this the recommended MPIO method, or are there better solutions (this one seems great to me)?

-- 
Matth Ingersoll <ma...@di...>
|
From: Tracy R. <tr...@ul...> - 2009-02-25 04:51:56
|
On Tue, Feb 24, 2009 at 07:55:03PM -0800, Matthew Ingersoll spake thusly:
> I'm hoping somebody can verify my logic on how MPIO works with aoe/vblade.

Awesome timing. I was just looking into setting up MPIO also. iSCSI has MC/S and certain people in my shop are big fans of it. I want to show that we can do the same thing with AoE and MPIO more simply. What distribution are you using? Do you find dm-multipath to be stable?

> If an aoe target machine exported a file using a command like
>
> vbladed 0 0 eth0 /path/to/file
>
> and repeated this process using the same file to broadcast on a second
> ethernet port:
>
> vbladed 0 0 eth1 /path/to/file
>
> On the initiator with 4 ethernet ports we would see:
> e0.0 xyzGB eth0,eth1,eth2,eth3 up

So the target has two interfaces but the initiator has 4?

> The results are what I expected:
>
> In my write tests, I see traffic balancing on interfaces eth0-3. On
> the target side, traffic flows on eth0-1 in what appears to be a
> round-robin manner. On killing a single vblade, the throughput is almost
> cut in half after aoe sees it as down.

Almost in half? How linear is the scaling? I'm wondering what happens when we get 8 interfaces in a machine. We have quad port gig-e stuff in our lab. It would be fun to put two of them in each target and initiator.

> My question then is, can corruption occur when having two vblades
> running for the same file or is this the recommended MPIO method or
> are there better solutions (this ones seems great to me)?

AoE itself is stateless so that's a big plus. As both vblade processes are dealing with exactly the same kernel page cache I would expect it to be fine. I look forward to seeing what others have to say.

-- 
Tracy Reed
http://tracyreed.org
|
From: Matthew I. <ma...@di...> - 2009-02-25 07:57:54
|
On Feb 24, 2009, at 8:51 PM, Tracy Reed wrote:

> On Tue, Feb 24, 2009 at 07:55:03PM -0800, Matthew Ingersoll spake thusly:
>> I'm hoping somebody can verify my logic on how MPIO works with aoe/vblade.
>
> Awesome timing. I was just looking into setting up MPIO also. iSCSI
> has MC/S and certain people in my shop are a big fan of it. I want to
> show that we can do the same thing with AoE and MPIO more simply. What
> distribution are you using? Do you find dm-multipath to be stable?

Running Debian 5.0. As for dm-multipath being stable, haven't used it enough to know and hopefully won't need it.

>> If an aoe target machine exported a file using a command like
>>
>> vbladed 0 0 eth0 /path/to/file
>>
>> and repeated this process using the same file to broadcast on a
>> second ethernet port:
>>
>> vbladed 0 0 eth1 /path/to/file
>>
>> On the initiator with 4 ethernet ports we would see:
>> e0.0 xyzGB eth0,eth1,eth2,eth3 up
>
> So the target has two interfaces but the initiator has 4?

Yep. I wanted to make it clear that the initiator was different and could keep up with the target. I'm going to bet that with only two interfaces on the initiator, you'll get the same throughput.

>> The results are what I expected:
>>
>> In my write tests, I see traffic balancing on interfaces eth0-3. On
>> the target side, traffic flows on eth0-1 in what appears to be a
>> round-robin manner. On killing a single vblade, the throughput is almost
>> cut in half after aoe sees it as down.
>
> Almost in half? How linear is the scaling? I'm wondering what happens
> when we get 8 interfaces in a machine. We have quad port gig-e stuff
> in our lab. It would be fun to put two of them in each target and
> initiator.

An example with two ports on the target (always 4 on the initiator) using dd:

    dd if=/dev/zero of=test bs=1M count=128 oflag=direct
    182 MB/s
    ...

and one port on the target:

    dd if=/dev/zero of=test bs=1M count=128 oflag=direct
    102 MB/s

So it doesn't really double, but that may come from bottlenecks other than disk I/O and network (though we're still pretty much hitting the limit here). These were run on an XFS filesystem. One thing to be aware of is how fast the system can run locally. So if the network scales right by adding more ports, I would keep putting in NICs until I reach the local I/O speed or the vblades eat up too many resources. It's basically a race to see what gets saturated first.

>> My question then is, can corruption occur when having two vblades
>> running for the same file or is this the recommended MPIO method or
>> are there better solutions (this ones seems great to me)?
>
> AoE itself is stateless so that's a big plus. As both vblade processes
> are dealing with exactly the same kernel page cache I would expect it
> to be fine. I look forward to seeing what others have to say.

Looking at the vblade code, it uses the system calls open, read and write. So I think the main question would be: what keeps the writes from overlapping? I didn't see any locking; is it another subsystem that handles the organization?

> --
> Tracy Reed
> http://tracyreed.org
|
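As a rough check of the scaling question, the two dd figures quoted above (182 MB/s with two target ports, 102 MB/s with one) work out to a speedup of about 1.78x, or roughly 89% of ideal linear scaling. The helper names below are made up for illustration; this is only the arithmetic, not a benchmark tool.

```c
#include <assert.h>

/* Ratio of multi-port throughput to single-port throughput. */
double
speedup(double multi_mb_s, double single_mb_s)
{
	assert(single_mb_s > 0.0);
	return multi_mb_s / single_mb_s;
}

/* Fraction of ideal linear scaling achieved with `ports` target ports. */
double
scaling_efficiency(double multi_mb_s, double single_mb_s, int ports)
{
	assert(ports > 0);
	return speedup(multi_mb_s, single_mb_s) / ports;
}
```

Calling `scaling_efficiency(182.0, 102.0, 2)` with the numbers from the message gives a value just under 0.9, which fits the "doesn't really double" observation.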
From: Ed C. <ec...@co...> - 2009-02-25 20:36:49
|
On Wed, Feb 25, 2009 at 12:01:10AM -0800, Matthew Ingersoll wrote:
...
> Looking at the vblade code, it uses the system calls open, read and
> write. So I think the main question would be, what makes it so the
> writes don't overlap? Didn't see any locking, is it another subsystem
> that handles the organization?

No, just the kernel. The process scheduler can schedule both vblade processes, and so I would not expect order of write operations to be preserved reliably. Although it's true that there's only one page cache, the buffers of each vblade process are independent.

In short, using two vblade processes on the same backing store in this way makes me very uncomfortable. It hasn't been tested, but even with testing, race conditions are not always going to be noticed.

It seems like using a single process capable of using more than one local interface would be a more straightforward approach.

-- 
Ed Cashin <ec...@co...>
|
From: Tracy R. <tr...@ul...> - 2009-02-25 20:45:25
|
On Wed, Feb 25, 2009 at 03:36:36PM -0500, Ed Cashin spake thusly:
> preserved reliably. Although it's true that there's only one page
> cache, the buffers of each vblade process are independent.

Ah. Good point. Any way to make the one vblade process listen on multiple interfaces? Would that solve this issue? Making AoE work with MPIO (which comes with RHEL among others) is a nice way to add scalability and redundancy. Previously one would have to use 802.3ad (LACP) to accomplish this. Not having to do such tweaking in the switch is attractive.

-- 
Tracy Reed
http://tracyreed.org
|
From: Gabor G. <go...@sz...> - 2009-02-26 07:06:22
|
On Wed, Feb 25, 2009 at 12:45:11PM -0800, Tracy Reed wrote:

> On Wed, Feb 25, 2009 at 03:36:36PM -0500, Ed Cashin spake thusly:
> > preserved reliably. Although it's true that there's only one page
> > cache, the buffers of each vblade process are independent.
>
> Ah. Good point. Any way to make the one vblade process listen on
> multiple interfaces? Would that solve this issue?

If I understand the issue correctly then no, that won't change anything. The network, the NIC and the kernel may each re-order packets arriving on different interfaces wrt. each other, so listening on multiple interfaces won't give you any extra ordering guarantees.

If you want to access the same target from the same initiator via multiple paths, then the initiator's kernel will ensure that there are no overlapping requests, and the target's kernel will ensure that the different vblade processes have a consistent view of the device's contents (except if one of the vblade processes is configured to use direct I/O while the other is not, but if you do such a thing then you deserve what you get).

If you want to access the same target from different initiators, then the initiators must use some form of distributed locking to ensure they do not step on each other's toes; no amount of hacking on the vblade side can substitute for that.

Gabor

-- 
---------------------------------------------------------
MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
---------------------------------------------------------
|
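To make "overlapping requests" concrete: two requests conflict only if their sector ranges intersect, and it is such pairs whose relative order matters when they travel over different paths. The sketch below is a generic interval test with hypothetical names; it is not taken from the aoe driver or from vblade.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True when the sector ranges [a_lba, a_lba + a_count) and
 * [b_lba, b_lba + b_count) share at least one sector.
 */
bool
ranges_overlap(uint64_t a_lba, uint32_t a_count,
	uint64_t b_lba, uint32_t b_count)
{
	assert(a_count > 0 && b_count > 0);
	return a_lba < b_lba + b_count && b_lba < a_lba + a_count;
}
```

Two back-to-back 8-sector writes at LBA 0 and LBA 8 do not overlap, so reordering them is harmless; a write at LBA 4 overlaps both.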
From: Matthew I. <ma...@di...> - 2009-02-26 06:23:15
|
Just finished a vblade-19 MPIO patch. So far it seems to work great for throughput and disables an interface on poll() error. I'm not sure how the change will be handled on the initiator - seems like aoe-revalidate works ok. Hopefully somebody can give me some feedback.

diff -uprN vblade-19.orig/aoe.c vblade-19/aoe.c
--- vblade-19.orig/aoe.c 2008-10-08 21:07:40.000000000 +0000
+++ vblade-19/aoe.c 2009-02-26 04:39:02.000000000 +0000
@@ -9,6 +9,7 @@
 #include <sys/stat.h>
 #include <fcntl.h>
 #include <netinet/in.h>
+#include <poll.h>
 #include "dat.h"
 #include "fns.h"
 
@@ -22,11 +23,12 @@ int nmasks;
 char config[Nconfig];
 int nconfig = 0;
 int maxscnt = 2;
-char *ifname;
+char **ifname;
+int ifname_count = 0;
 int bufcnt = Bufcount;
 
 void
-aoead(int fd) // advertise the virtual blade
+aoead(int fd, uchar *mac_n, char *ifname_n) // advertise the virtual blade
 {
 	uchar buf[2000];
 	Conf *p;
@@ -35,14 +37,14 @@ aoead(int fd) // advertise the virtual b
 	p = (Conf *)buf;
 	memset(p, 0, sizeof *p);
 	memset(p->h.dst, 0xff, 6);
-	memmove(p->h.src, mac, 6);
+	memmove(p->h.src, mac_n, 6);
 	p->h.type = htons(0x88a2);
 	p->h.flags = Resp;
 	p->h.maj = htons(shelf);
 	p->h.min = slot;
 	p->h.cmd = Config;
 	p->bufcnt = htons(bufcnt);
-	p->scnt = maxscnt = (getmtu(sfd, ifname) - sizeof (Ata)) / 512;
+	p->scnt = maxscnt = (getmtu(fd, ifname_n) - sizeof (Ata)) / 512;
 	p->firmware = htons(FWV);
 	p->vercmd = 0x10 | Qread;
 	memcpy(p->data, config, nconfig);
@@ -111,7 +113,7 @@ aoeata(Ata *p, int pktlen) // do ATA req
 // yes, this makes unnecessary copies.
 
 int
-confcmd(Conf *p, int payload) // process conf request
+confcmd(Conf *p, int payload, int fd, uchar *mac_n, char *ifname_n) // process conf request
 {
 	int len;
 
@@ -151,14 +153,14 @@ confcmd(Conf *p, int payload) // process
 	memmove(p->data, config, nconfig);
 	p->len = htons(nconfig);
 	p->bufcnt = htons(bufcnt);
-	p->scnt = maxscnt = (getmtu(sfd, ifname) - sizeof (Ata)) / 512;
+	p->scnt = maxscnt = (getmtu(fd, ifname_n) - sizeof (Ata)) / 512;
 	p->firmware = htons(FWV);
 	p->vercmd = 0x10 | QCMD(p); // aoe v.1
 	return nconfig + sizeof *p - sizeof p->data;
 }
 
 void
-doaoe(Aoehdr *p, int n)
+doaoe(Aoehdr *p, int n, int fd, uchar *mac_n, char *ifname_n)
 {
 	int len;
 	enum { // config query header size
@@ -174,7 +176,7 @@ doaoe(Aoehdr *p, int n)
 	case Config:
 		if (n < CHDR_SIZ)
 			return;
-		len = confcmd((Conf *)p, n - CHDR_SIZ);
+		len = confcmd((Conf *)p, n - CHDR_SIZ, fd, mac_n, ifname_n);
 		if (len == 0)
 			return;
 		break;
@@ -184,11 +186,11 @@ doaoe(Aoehdr *p, int n)
 		break;
 	}
 	memmove(p->dst, p->src, 6);
-	memmove(p->src, mac, 6);
+	memmove(p->src, mac_n, 6);
 	p->maj = htons(shelf);
 	p->min = slot;
 	p->flags |= Resp;
-	if (putpkt(sfd, (uchar *) p, len) == -1) {
+	if (putpkt(fd, (uchar *) p, len) == -1) {
 		perror("write to network");
 		exit(1);
 	}
@@ -199,10 +201,14 @@ aoe(void)
 {
 	Aoehdr *p;
 	uchar *buf;
-	int n, sh;
+	int i, n, sh;
+	int ifname_c = ifname_count;
+	int ifname_good = ifname_count;
 	long pagesz;
 	enum { bufsz = 1<<16, };
-
+	struct pollfd sfds[4];
+
+	memset(&sfds, 0, sizeof(sfds));
 	if ((pagesz = sysconf(_SC_PAGESIZE)) < 0) {
 		perror("sysconf");
 		exit(1);
@@ -215,36 +221,67 @@ aoe(void)
 	if (n & (pagesz - 1))
 		buf += pagesz - (n & (pagesz - 1));
 
-	aoead(sfd);
+	for(n = 0; n < ifname_c; n++) {
+		aoead(sfd[n], mac[n], ifname[n]);
+		sfds[n].fd = sfd[n];
+		sfds[n].events = POLLIN;
+	}
 
 	for (;;) {
-		n = getpkt(sfd, buf, bufsz);
-		if (n < 0) {
-			perror("read network");
-			exit(1);
+		if(poll(sfds, ifname_c, -1) < 1) {
+			perror("poll");
+			return;
+		}
+		for(i = 0; i < ifname_c; i++) {
+			if(sfds[i].revents & POLLIN) {
+				n = getpkt(sfds[i].fd, buf, bufsz);
+				if (n < 0) {
+					perror("read network");
+					exit(1);
+				}
+				if (n < sizeof(Aoehdr))
+					continue;
+				p = (Aoehdr *) buf;
+				if (ntohs(p->type) != 0x88a2)
+					continue;
+				if (p->flags & Resp)
+					continue;
+				sh = ntohs(p->maj);
+				if (sh != shelf && sh != (ushort)~0)
+					continue;
+				if (p->min != slot && p->min != (uchar)~0)
+					continue;
+				if (nmasks && !maskok(p->src))
+					continue;
+				doaoe(p, n, sfds[i].fd, mac[i], ifname[i]);
+
+			} else if (sfds[i].revents & POLLRDHUP || sfds[i].revents & POLLERR || sfds[i].revents & POLLNVAL) {
+
+				if(ifname_good-- < 1) {
+					fprintf(stderr, "exiting, no good interfaces left.\n");
+					fflush(stderr);
+					exit(1);
+				}
+				fprintf(stderr, "disabling interface %s because of poll() error. %d good interfaces left.\n", ifname[i], ifname_good);
+				sfds[i].revents = 0;
+				sfds[i].events = 0;
+				close(sfds[i].fd);
+				sfds[i].fd = -1;
+
+				/* seems like readvertising the blade works best */
+				/*for(n = 0; n < ifname_c; n++) {
+					if(i != n)
+						aoead(sfd[n], mac[n], ifname[n]);
+				}*/
+			}
 		}
-		if (n < sizeof(Aoehdr))
-			continue;
-		p = (Aoehdr *) buf;
-		if (ntohs(p->type) != 0x88a2)
-			continue;
-		if (p->flags & Resp)
-			continue;
-		sh = ntohs(p->maj);
-		if (sh != shelf && sh != (ushort)~0)
-			continue;
-		if (p->min != slot && p->min != (uchar)~0)
-			continue;
-		if (nmasks && !maskok(p->src))
-			continue;
-		doaoe(p, n);
 	}
 }
 
 void
 usage(void)
 {
-	fprintf(stderr, "usage: %s [-b bufcnt] [-d ] [-s] [-r] [ -m mac[,mac...] ] shelf slot netif filename\n",
+	fprintf(stderr, "usage: %s [-b bufcnt] [-d ] [-s] [-r] [ -i iface] [ -m mac[,mac...] ] shelf slot netif filename\n",
 		progname);
 	exit(1);
 }
@@ -305,12 +342,17 @@ int
 main(int argc, char **argv)
 {
 	int ch, omode = 0, readonly = 0;
-
+	int i = 0;
+
 	bufcnt = Bufcount;
 	setbuf(stdin, NULL);
 	atainit();
 	progname = *argv;
-	while ((ch = getopt(argc, argv, "b:dsrm:")) != -1) {
+	if((ifname = malloc(sizeof(*ifname)*1)) == NULL) {
+		perror("malloc");
+		exit(1);
+	}
+	while ((ch = getopt(argc, argv, "b:dsrm:i:")) != -1) {
 		switch (ch) {
 		case 'b':
 			bufcnt = atoi(optarg);
@@ -329,6 +371,14 @@ main(int argc, char **argv)
 		case 'm':
 			setmask(optarg);
 			break;
+		case 'i':
+			ifname_count++;
+			if((ifname = realloc(ifname, sizeof(*ifname)*(ifname_count+1))) == NULL) {
+				perror("malloc");
+				exit(1);
+			}
+			ifname[ifname_count-1] = optarg;
+			break;
 		case '?':
 		default:
 			usage();
@@ -348,9 +398,26 @@ main(int argc, char **argv)
 	slot = atoi(argv[1]);
 	size = getsize(bfd);
 	size /= 512;
-	ifname = argv[2];
-	sfd = dial(ifname, bufcnt);
-	getea(sfd, ifname, mac);
+	ifname_count++;
+	ifname[ifname_count-1] = argv[2];
+	if((sfd = malloc(sizeof(int *)*ifname_count)) == NULL) {
+		perror("malloc");
+		exit(1);
+	}
+	if((mac = malloc(sizeof(uchar *)*ifname_count)) == NULL) {
+		perror("malloc");
+		exit(1);
+	}
+
+	for(; i < ifname_count; i++) {
+		if((mac[i] = malloc(sizeof(uchar)*6)) == NULL) {
+			perror("malloc");
+			exit(1);
+		}
+
+		sfd[i] = dial(ifname[i], bufcnt);
+		getea(sfd[i], ifname[i], mac[i]);
+	}
 	printf("pid %ld: e%d.%d, %lld sectors %s\n",
 		(long) getpid(), shelf, slot, size,
 		readonly ? "O_RDONLY" : "O_RDWR");
diff -uprN vblade-19.orig/dat.h vblade-19/dat.h
--- vblade-19.orig/dat.h 2008-10-08 21:07:40.000000000 +0000
+++ vblade-19/dat.h 2009-02-26 03:39:11.000000000 +0000
@@ -115,8 +115,8 @@ enum {
 
 int shelf, slot;
 ulong aoetag;
-uchar mac[6];
+uchar **mac;
 int bfd; // block file descriptor
-int sfd; // socket file descriptor
+int *sfd; // socket file descriptor
 vlong size; // size of vblade
 char *progname;
diff -uprN vblade-19.orig/fns.h vblade-19/fns.h
--- vblade-19.orig/fns.h 2008-10-08 21:07:41.000000000 +0000
+++ vblade-19/fns.h 2009-02-26 03:38:32.000000000 +0000
@@ -6,7 +6,7 @@ void aoe(void);
 void aoeinit(void);
 void aoequery(void);
 void aoeconfig(void);
-void aoead(int);
+void aoead(int fd, uchar *, char *);
 void aoeflush(int, int);
 void aoetick(void);
 void aoerequest(int, int, vlong, int, uchar *, int);
diff -uprN vblade-19.orig/vblade.8 vblade-19/vblade.8
--- vblade-19.orig/vblade.8 2008-10-08 21:07:41.000000000 +0000
+++ vblade-19/vblade.8 2009-02-26 04:23:48.000000000 +0000
@@ -55,6 +55,11 @@ The -r flag restricts the export of the
 The -m flag takes an argument, a comma separated list of MAC addresses
 permitted access to the vblade. A MAC address can be specified in upper
 or lower case, with or without colons.
+.TP
+\fB-i\fP
+The -i flag initializes and broadcasts on ethernet network interfaces
+to enable MPIO support and increase throughput. You must still
+specify another interface without using the -i flag.
 .SH EXAMPLE
 In this example, the root user on a host named
 .I nai

On Feb 25, 2009, at 12:45 PM, Tracy Reed wrote:

> On Wed, Feb 25, 2009 at 03:36:36PM -0500, Ed Cashin spake thusly:
>> preserved reliably. Although it's true that there's only one page
>> cache, the buffers of each vblade process are independent.
>
> Ah. Good point. Any way to make the one vblade process listen on
> multiple interfaces? Would that solve this issue? Making AoE work with
> MPIO (which comes with RHEL among others) is a nice way to add
> scalability and redundancy. Previously one would have to use 802.3ad
> (LACP) to accomplish this. Not having to do such tweaking in the
> switch is attractive.
>
> --
> Tracy Reed
> http://tracyreed.org
|
From: Matthew I. <ma...@di...> - 2009-02-28 01:28:57
Attachments:
vblade-19-mpio-1.diff
|
Finally had some time to read through my own patch and fixed a bug (a static size for pollfd that should be dynamic), cleaned up the mallocs and made the naming more consistent. I also realized that the first patch was pasted inline and seems to have been corrupted at some point, so this time I'm sending it as an attachment - is there a recommended method?

-- 
Matth Ingersoll
|
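For readers without the attachment, the pollfd sizing issue can be illustrated in isolation. The sketch below is not the attached patch: `read_any` is a hypothetical helper that allocates one `struct pollfd` per descriptor at run time, waits with poll(), and reads from the first descriptor that reports POLLIN. It uses plain read() on ordinary descriptors rather than vblade's getpkt() on raw sockets.

```c
#include <assert.h>
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for any of the nfds descriptors to become
 * readable, then read from the first ready one. Returns the byte
 * count and stores the descriptor's index in *which; returns -1 on
 * timeout or error.
 */
ssize_t
read_any(const int *fds, int nfds, int timeout_ms,
	void *buf, size_t len, int *which)
{
	struct pollfd *pfds;
	ssize_t got = -1;
	int i;

	assert(fds != NULL && nfds > 0 && which != NULL);
	/* sized from nfds, not from a compile-time constant */
	pfds = calloc((size_t)nfds, sizeof *pfds);
	if (pfds == NULL)
		return -1;
	for (i = 0; i < nfds; i++) {
		pfds[i].fd = fds[i];
		pfds[i].events = POLLIN;
	}
	if (poll(pfds, (nfds_t)nfds, timeout_ms) > 0) {
		for (i = 0; i < nfds; i++) {
			if (pfds[i].revents & POLLIN) {
				got = read(pfds[i].fd, buf, len);
				*which = i;
				break;
			}
		}
	}
	free(pfds);
	return got;
}
```

A long-running server would allocate the array once outside its loop instead of per call; the per-call allocation here only keeps the example short.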
From: Matthew I. <ma...@di...> - 2009-02-28 04:52:39
|
I have done some testing today on the linux kernel 2.6.27.19 aoe implementation using the mpio vblade patch previously provided and have good results. The kernel driver from coraid does not seem like it will work correctly for this. The reason being, I couldn't find a rerouting of packets to another target like in the linux source. The specific lines in the linux kernel 2.6.27.19:

aoecmd.c line 550:

    if (n > HELPWAIT /* see if another target can help */
    && (tt != d->targets || d->targets[1]))
        d->htgt = tt;

aoe.h line 97:

    HELPWAIT = 20,

My test setup:

Target machine (tgt) with two network interfaces running a vblade for one file advertised on eth0 and eth1:

    vbladed -i eth1 1 0 eth0 /mnt/d2.img

Initiator (intr) just using one interface eth0:

    intr:/# aoe-stat
    e1.0 10.485GB eth0 up
    intr:/# mkfs.xfs /dev/etherd/e1.0
    ...
    intr:/# mount /dev/etherd/e1.0 /mnt -o _netdev

Run something I know will take a while:

    intr:/# cd /mnt/; dd if=/dev/zero of=1 bs=24k count=120000 oflag=sync

Network stats on target while dd runs:

    tgt:/# ifstat
           eth0                 eth1
     KB/s in  KB/s out   KB/s in  KB/s out
    20963.33   1203.90  20961.22   1203.71
    21828.00   1252.98  21828.47   1252.90
    22015.32   1263.72  22014.22   1263.52
    23466.55   1347.02  23468.52   1346.94
    ....

Bring down an interface on the target and check stats:

    tgt:/# ifconfig eth1 0 down; ifstat
           eth0
     KB/s in  KB/s out
        0.06      0.14
        0.12      0.13
        0.93      0.34
        0.18      0.13
        0.36      0.13
        0.06      0.13
        0.06      0.13
        0.18      0.17
        0.18      0.13
        0.47      0.41
        3.52      3.44
        0.15      0.13
        4.51      0.52
        0.21      0.13
        7.72      6.38
        1.98      1.88
        0.12      0.13
        0.06      0.13
        0.06      0.13
        0.06      0.13
        0.18      0.13
        0.18      0.13
    36904.46   2088.69
    43033.43   2436.36

Since HELPWAIT is set to 20 seconds and ifstat outputs about once per second, the idle period lines up with that setting before eth0 started receiving packets again. While this was happening the initiator was sending retransmits (failing):

    intr:/# cat /dev/etherd/err
    retransmit e1.0 oldtag=09ed1911@1004e1971 newtag=09f11971
    ...
    ....

When HELPWAIT is reached, the retransmit errors stop. And eventually the dd finishes _after_ failing eth1 on the target:

    2949316608 bytes (2.9 GB) copied, 121.712 seconds, 24.2 MB/s

Another note: dmesg is clear of xfs/aoe errors. Can somebody else try testing?

-- 
Matth Ingersoll

On Feb 27, 2009, at 5:32 PM, Matthew Ingersoll wrote:

> Finally had some time to read through my own patch and fixed a bug
> (static size set that should be dynamic for pollfd), cleaned up
> mallocs and made naming more consistent. I also realized that it
> was pasted inline and seems to have been corrupted at some point,
> this time I'm trying it as an attachment - is there a recommended
> method?
>
> --
> Matth Ingersoll
>
> <vblade-19-mpio-1.diff>
>
> On Feb 25, 2009, at 10:26 PM, Matthew Ingersoll wrote:
>
>> Just finished a vblade-19 MPIO patch. So far it seems to work great
>> for throughput and disables an interface on poll() error. I'm not
>> sure how the change will be handle on the initiator - seems like
>> aoe-revalidate works ok. Hopefully somebody can give me some feedback.
|
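The condition quoted from aoecmd.c can be read as: once a frame has waited more than HELPWAIT seconds, mark its target as needing help, provided the device has some other target to turn to. A simplified restatement follows; `struct tgt`, `struct dev` and `wants_help` are hypothetical stand-ins for illustration, not the kernel's definitions, and only the HELPWAIT value and the shape of the test come from the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { HELPWAIT = 20 };	/* seconds, as quoted from aoe.h */

struct tgt { int id; };			/* stand-in for one target */
struct dev { struct tgt *targets[4]; };	/* NULL-terminated list */

/*
 * tt points at the slot in d->targets for the target whose frame
 * timed out. Help is wanted when the frame has waited longer than
 * HELPWAIT and either tt is not the first slot or a second target
 * exists.
 */
bool
wants_help(const struct dev *d, struct tgt *const *tt, int waited_secs)
{
	assert(d != NULL && tt != NULL);
	return waited_secs > HELPWAIT
		&& (tt != d->targets || d->targets[1] != NULL);
}
```

With a single target the predicate never fires, which is consistent with the plain retransmits seen until the HELPWAIT threshold passed in the two-path test.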
From: Matthew I. <ma...@di...> - 2009-03-06 09:01:25
Attachments:
vblade-19-mpio-2.diff
|
The last patch was not from a clean source directory. This one should be fine. -- Matth Ingersoll |
From: Ed C. <ec...@co...> - 2009-03-16 19:40:46
|
On Fri, Mar 06, 2009 at 01:04:53AM -0800, Matthew Ingersoll wrote:
> The last patch was not from a clean source directory. This one should be
> fine.

Attachments seem to work OK for patches. It's a mail client issue, though. If your mail client can include text inline without changing it, then you can post patches inline.

I had a couple of requests, if you wouldn't mind ...

> ac01:/nfs/src# cat vblade-19-mpio-1.diff
...
> int
> -confcmd(Conf *p, int payload) // process conf request
> +confcmd(Conf *p, int payload, int fd, uchar *mac_n, char *ifname_n) // process conf request

I would rather avoid introducing new parts of the source that aren't comfortable for folks using 80-character editors.

Also, thanks for updating the documentation:

> diff -urpN vblade-19/vblade.8 vblade-19-mpio-1/vblade.8
> --- vblade-19/vblade.8 2008-10-08 14:07:41.000000000 -0700
> +++ vblade-19-mpio-1/vblade.8 2009-03-02 14:09:15.000000000 -0800
> @@ -55,6 +55,11 @@ The -r flag restricts the export of the
>  The -m flag takes an argument, a comma separated list of MAC addresses
>  permitted access to the vblade. A MAC address can be specified in upper
>  or lower case, with or without colons.
> +.TP
> +\fB-i\fP
> +The -i flag initializes and broadcasts on ethernet network interfaces
> +to enable MPIO support and increase throughput. You must still
> +specify another interface without using the -i flag.
>  .SH EXAMPLE
>  In this example, the root user on a host named
>  .I nai

I am in a bit of a hurry, so sorry if I am missing something obvious, but why does the user need to supply an interface explicitly when the "-i" flag is in use?

-- 
Ed Cashin <ec...@co...>
Find experimental aoe Linux driver patches at
http://coraid.typepad.com/aoe_linux_proving_grounds/
|
From: Matthew I. <ma...@di...> - 2009-03-16 21:20:42
|
On Mar 16, 2009, at 12:39 PM, Ed Cashin wrote:

> On Fri, Mar 06, 2009 at 01:04:53AM -0800, Matthew Ingersoll wrote:
> ...
> I would rather avoid introducing new parts of the source that aren't
> comfortable for folks using 80-character editors.

I overlooked this since I was pressed for time and wasn't using my usual editor - I'll fix it.

> I am in a bit of a hurry, so sorry if I am missing something obvious,
> but why does the user need to supply an interface explicitly when the
> "-i" flag is in use?

Normally you start a vblade process by specifying an interface as argv[3] (minus getopt options):

    vbladed 0 0 eth0 /path/to/file

So I wanted to keep that style, and any extra interfaces to use could be specified with the "-i" option:

    vbladed -i eth1 0 0 eth0 /path/to/file

or

    vbladed -i eth1 -i eth2 0 0 eth0 /path/to/file

I haven't really given it much thought other than keeping the argv/getopt parsing changes down to a minimum. It could be modified to change the argv index after getopt so that the parameters are assigned correctly without the "netif" parameter, and possibly to comma separate the "-i" list like the "-m" option. The current way works fine for me but feel free to suggest or modify.

-- 
Matth Ingersoll
|
From: Matthew I. <ma...@di...> - 2009-03-16 22:16:10
Attachments:
linux-aoe-helpsecs.diff
|
Here is a related linux kernel patch to manually set the HELPWAIT seconds, since it can conflict with aoe_deadsecs (it should be smaller). It works just like aoe_deadsecs, for example:

    echo '4' > /sys/module/aoe/parameters/aoe_helpsecs

The logic behind the patch, by example: in an mpio situation we could have one target advertised through two network interfaces, eth0 and eth1. eth0 is connected to switch0 and eth1 is connected to switch1. The initiator also has two interfaces, eth0 using switch0 and eth1 using switch1. In the event of a failure on a single switch, the initiator will either down the device due to aoe_deadsecs or try another target when hitting HELPWAIT (the old name, now defined as aoe_helpsecs). In normal mpio operation, traffic will flow through eth0 and eth1 on both the target and initiator. To reduce the wait time for path failure detection, setting aoe_helpsecs to a smaller value will ease the transition with automatic failover/failback. In doing so, aoe_deadsecs will not be reached and no errors should occur.

The patch has been generated from a vanilla kernel.org linux-2.6.26.8 kernel and has been shown to work on 2.6.28.1 also (that's all I have tested).

I looked through the source code for the driver from coraid's site and see that somewhere along the line HELPWAIT was abandoned, but no reason why (or whether it's handled in another manner - it doesn't seem to be, from testing). I would propose adding this back in since it seems vital for this type of setup.

-- 
Matth Ingersoll
|
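The relationship between the two timeouts can be shown with a toy model. This is only a sketch of the reasoning in the message above, with hypothetical names and illustrative values in the usage; it does not reproduce the kernel's retransmit timer.

```c
#include <assert.h>
#include <stdbool.h>

enum path_outcome { KEEP_WAITING, TRY_OTHER_TARGET, DEVICE_DOWN };

/*
 * What happens to a frame that has gone unanswered for waited_secs:
 * past deadsecs the device is failed; past helpsecs another target
 * is tried if one exists; otherwise retransmits simply continue.
 */
enum path_outcome
stalled_frame_outcome(int waited_secs, int helpsecs, int deadsecs,
	bool has_other_target)
{
	assert(helpsecs >= 0 && deadsecs >= 0);
	if (waited_secs > deadsecs)
		return DEVICE_DOWN;
	if (waited_secs > helpsecs && has_other_target)
		return TRY_OTHER_TARGET;
	return KEEP_WAITING;
}
```

In this model, with helpsecs at 4 and deadsecs at 30, a stalled path is abandoned for the other target after 5 seconds. If helpsecs is not smaller than deadsecs, the device goes down before failover is ever attempted, which is the conflict the patch description warns about.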