Thread: Re: [Aoetools-discuss] bond0: received packet with own address as
From: <b5...@en...> - 2008-02-11 12:11:46
> You'll likely want to use aoe-interfaces to restrict the aoe initiator
> driver to only use br0. By default it will use all interfaces it
> finds.

Yes, that solved the issue. I am not getting these messages anymore.
Thank you.

> If this setup is just for aoe and both local interfaces are equal, you
> can get rid of the bonding as the latest aoe initiators will round
> robin across both local interfaces. A mechanism is in the driver to
> stop using a local interface if it appears traffic has stopped flowing
> through it.

Actually, we need bonding to get fast failover, because STP is too slow
and RSTP is not available. The bridge is needed to get virtual machines
on the net.

Cheers,
Holger
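For readers unfamiliar with the idea of restricting the initiator to a list of interfaces, here is a minimal sketch in C of how a whitespace-separated allow-list could be matched against an interface name. The function name iface_allowed is hypothetical and this is not the aoe driver's code; the only behaviour taken from the message above is that an empty list means every interface is used.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from the aoe driver: returns 1 if
 * `name` may be used for AoE given a whitespace-separated allow-list,
 * else 0. An empty (or all-whitespace) list means "no restriction".
 */
int iface_allowed(const char *iflist, const char *name)
{
    static const char ws[] = " \t\n";
    const char *p;
    size_t name_len;

    assert(iflist != NULL && name != NULL);
    name_len = strlen(name);

    p = iflist + strspn(iflist, ws);     /* skip leading whitespace */
    if (*p == '\0')
        return 1;                        /* empty list: allow all */

    while (*p != '\0') {
        size_t len = strcspn(p, ws);     /* length of the current token */

        /* Compare whole tokens so "eth0" does not match "eth0.10". */
        if (len == name_len && strncmp(p, name, len) == 0)
            return 1;
        p += len;
        p += strspn(p, ws);              /* advance to the next token */
    }
    return 0;
}
```

With a list of "br0", the sketch accepts br0 and rejects bond0 and the physical interfaces underneath it, which is the effect described in the reply above.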
From: <b5...@en...> - 2008-02-11 15:22:35
Hi,

is there 802.1q support in vblade? Because I don't see any reaction from
vblade to VLAN packets... And yes, VLAN tagging is working for IP, ARP
and AoE packets. I tested it several times.

AoE Server:
# ifconfig eth0 0.0.0.0 up
# vconfig add eth0 10
# ifconfig eth0.10 0.0.0.0 up
# vblade 0 1 eth0.10 /dev/vg/aoe1

AoE Client:
# ifconfig eth0 0.0.0.0 up
# vconfig add eth0 10
# ifconfig eth0.10 0.0.0.0 up
# modprobe aoe aoe_iflist="eth0.10"
# aoe-discover

AoE Client interface eth0.10:
16:07:32.751016 00:1e:37:1c:e5:d3 > ff:ff:ff:ff:ff:ff, ethertype Unknown (0x88a2), length 32:
        0x0000:  ffff ffff ffff 001e 371c e5d3 88a2 1000  ........7.......
        0x0010:  ffff ff01 0000 0000 0000 0000 0000 0000  ................

AoE Server interface eth0.10:
16:07:23.737575 00:1e:37:1c:e5:d3 > Broadcast, ethertype Unknown (0x88a2), length 56:
        0x0000:  ffff ffff ffff 001e 371c e5d3 88a2 1000  ........7.......
        0x0010:  ffff ff01 0000 0000 0000 0000 0000 0000  ................
        0x0020:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0030:  0000 0000 0000 0000                      ........

But vblade doesn't send response packets.
Please CC me.

Cheers,
Holger
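To make the capture above easier to read, here is a small C sketch that decodes the AoE common header from the 32-byte client frame. The struct and function names are illustrative only; the field offsets follow the published AoE header layout (version/flags, error, major, minor, command, tag after the 14-byte Ethernet header). The frame decodes as a broadcast config query (major 0xffff, minor 0xff, command 1), which is what aoe-discover would be expected to send.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { AOE_HDR_LEN = 24, AOE_ETHERTYPE = 0x88a2 };

/* Illustrative container for the decoded fields (names are assumptions). */
struct aoe_hdr_fields {
    unsigned ver;    /* high nibble of byte 14 */
    unsigned flags;  /* low nibble of byte 14 */
    unsigned error;  /* byte 15 */
    unsigned major;  /* shelf address, 0xffff = broadcast */
    unsigned minor;  /* slot address, 0xff = broadcast */
    unsigned cmd;    /* 0 = ATA, 1 = query config information */
    uint32_t tag;    /* echoed back by the target in its reply */
};

/* Returns 0 on success, -1 if the frame is too short or not AoE. */
int aoe_decode(const unsigned char *f, size_t n, struct aoe_hdr_fields *out)
{
    assert(f != NULL && out != NULL);
    if (n < AOE_HDR_LEN)
        return -1;
    if (((f[12] << 8) | f[13]) != AOE_ETHERTYPE)
        return -1;

    out->ver   = f[14] >> 4;
    out->flags = f[14] & 0x0f;
    out->error = f[15];
    out->major = (unsigned)(f[16] << 8) | f[17];
    out->minor = f[18];
    out->cmd   = f[19];
    out->tag   = ((uint32_t)f[20] << 24) | ((uint32_t)f[21] << 16) |
                 ((uint32_t)f[22] << 8)  |  (uint32_t)f[23];
    return 0;
}

/* The 32-byte client-side capture quoted in the message above. */
const unsigned char sample_frame[32] = {
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00, 0x1e,
    0x37, 0x1c, 0xe5, 0xd3, 0x88, 0xa2, 0x10, 0x00,
    0xff, 0xff, 0xff, 0x01, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
};
```

The same bytes appear on the server side, only padded out to 56 bytes, so the request itself reaches the vblade host intact.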
From: Justin C. D. <wi...@wi...> - 2008-02-11 18:53:43
I had a similar bug, the basis of which was that vblade does a check on
the packet length to see if it's less than 60 bytes, and disregards the
packet if so. I had to change this check to 56 bytes, then it started
working fine. I saw it discussed somewhere and I believe this fix is in
for r15, but don't hold me to that.

It has to do with Linux removing the vlan tag from the packet header,
if I remember correctly. Anyway, it's in aoe.c. I can't work up a patch
right now, though.

Justin

On 2/11/08, b5...@en... <b5...@en...> wrote:
> is there 802.1q support in vblade? Because I don't see any reaction from
> vblade to VLAN packets... And yes, VLAN tagging is working for IP, ARP
> and AoE packets. I tested it several times.
> [...]
> But vblade doesn't send response packets.
> Please CC me.
From: <b5...@en...> - 2008-02-13 11:53:12
> The bug is in vblade, not the aoe driver.
>
> File: vblade-14/aoe.c
> Function: void aoe(void)
>
> ...
> if (n < 60)
>         continue;
> ...
>
> Replacing 60 with 56 fixed this for me.
>
> Justin

I'd like to report this to a bug tracker, but there is none except this
mailing list. So, please fix this in CVS.

>> is there 802.1q support in vblade? Because I don't see any reaction from
>> vblade to VLAN packets... And yes, VLAN tagging is working for IP, ARP
>> and AoE packets. I tested it several times.

Cheers,
Holger
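The 60-versus-56 difference is consistent with simple frame arithmetic, sketched below in C. The model is an assumption, not something stated in the thread beyond Justin's recollection: the sender pads the tagged frame to the 60-byte Ethernet minimum (64 bytes minus the 4-byte FCS), and the receiving VLAN layer then strips the 4-byte 802.1Q tag before the frame is delivered on eth0.10. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum {
    ETH_HDR_LEN          = 14, /* destination, source, ethertype */
    ETH_MIN_FRAME_NO_FCS = 60, /* 64-byte Ethernet minimum minus 4-byte FCS */
    VLAN_TAG_LEN         = 4   /* 802.1Q tag: TPID (2) + TCI (2) */
};

/*
 * Length seen by a program reading from the receiving interface, under
 * the assumed model: tag is added, frame is padded to the minimum, and
 * the tag is stripped again on receipt.
 */
size_t delivered_len(size_t sent_len, int vlan_tagged)
{
    size_t tag = vlan_tagged ? VLAN_TAG_LEN : 0;
    size_t on_wire;

    assert(sent_len >= ETH_HDR_LEN);   /* must at least hold the header */
    on_wire = sent_len + tag;
    if (on_wire < ETH_MIN_FRAME_NO_FCS)
        on_wire = ETH_MIN_FRAME_NO_FCS;
    return on_wire - tag;
}

/* The vblade-14 test quoted above: frames shorter than 60 are skipped. */
int old_check_accepts(size_t n)
{
    return !(n < 60);
}
```

Under this model the 32-byte discovery frame arrives as 60 bytes on an untagged interface but as 56 bytes on a VLAN interface, matching the "length 56" in the server-side capture, and the old check drops it.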
From: <b5...@en...> - 2008-03-02 12:39:20
Hi,

I found the following messages in my log:

kernel: aoe: aoe_init: AoE v22 initialised.
kernel: aoe: 000ea61b8064 e0.2 v400b has 4194304 sectors
kernel: aoe: 000ea61b8064 e201.0 v400b has 2097152 sectors
kernel: aoe: 000ea61b8064 e9.1 v400b has 8388608 sectors
kernel: aoe: 000ea61b8064 e7.1 v400b has 4194304 sectors
kernel: aoe: 000ea61b8064 e202.0 v400b has 2097152 sectors
kernel: aoe: 000ea61b8064 e1.4 v400b has 2097152 sectors
kernel: aoe: 000ea61b8064 e1.1 v400b has 2097152 sectors
kernel: aoe: 000ea61b8064 e1.3 v400b has 2097152 sectors
kernel: aoe: 000ea61b8064 e1.2 v400b has 2097152 sectors
kernel: aoe: 000ea61b8064 e0.1 v400b has 4194304 sectors
kernel: aoe: 000ea61b8064 e1.0 v400b has 6291456 sectors
kernel: aoe: 000ea61b8064 e101.0 v400b has 2097152 sectors
kernel: etherd/e0.2: unknown partition table
kernel: etherd/e201.0:<6>aoe: ataid_complete: can't schedule work for e1.3, it's already on! (This really shouldn't happen).
kernel:
kernel: aoe: ataid_complete: can't schedule work for e9.1, it's already on! (This really shouldn't happen).
kernel:
kernel: aoe: ataid_complete: can't schedule work for e7.1, it's already on! (This really shouldn't happen).
kernel:
kernel: aoe: ataid_complete: can't schedule work for e202.0, it's already on! (This really shouldn't happen).
kernel:
kernel: aoe: ataid_complete: can't schedule work for e1.4, it's already on! (This really shouldn't happen).
kernel:
kernel: aoe: ataid_complete: can't schedule work for e1.1, it's already on! (This really shouldn't happen).
kernel:
kernel: aoe: ataid_complete: can't schedule work for e1.2, it's already on! (This really shouldn't happen).
kernel:
kernel: aoe: ataid_complete: can't schedule work for e0.1, it's already on! (This really shouldn't happen).
kernel:
kernel: aoe: ataid_complete: can't schedule work for e1.0, it's already on! (This really shouldn't happen).
kernel:
kernel: aoe: ataid_complete: can't schedule work for e101.0, it's already on! (This really shouldn't happen).

But it seems to cause no harm. After that I can use these devices
without any problems. At least I don't notice any.

Does anyone have a clue what this error means and why it happens if it
shouldn't happen?

Your help is appreciated, and please CC me.

Cheers,
Holger
From: Ed L. C. <ec...@co...> - 2008-02-13 21:09:47
Hi. There is a patch for vblade-14 that brings it to version 15 and
includes the change where vblade only tests for the existence of the
data it uses, not for the presence of 60 bytes.

Anyone interested in this change, please test this prerelease patch.
I would like to receive confirmation that it helps and works well.

Here is a URL where you can find the patch (for now).

  http://noserose.net/e/temp/vblade-14-15.diff

It should apply in the vblade-14 source directory when you use patch
as follows:

  patch -p1 < /tmp/vblade-14-15.diff

... with whatever download location you used instead of "/tmp".

-- 
  Ed L Cashin <ec...@co...>
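As a rough illustration of "only tests for the existence of the data it uses", the C sketch below checks the received length against the headers a handler would read for each AoE command, rather than against a fixed 60 bytes. This is not the code from the vblade-15 patch, which is not reproduced in this thread; the function name and constants are assumptions based on the published AoE header sizes (24 bytes of Ethernet plus common header, 12 more for ATA, 8 more for config).

```c
#include <assert.h>
#include <stddef.h>

enum {
    AOE_HDR_LEN     = 24, /* Ethernet header (14) + AoE common header (10) */
    AOE_ATA_HDR_LEN = 36, /* common header + 12-byte ATA section */
    AOE_CFG_HDR_LEN = 32, /* common header + 8-byte config section */
    AOE_CMD_OFFSET  = 19, /* command byte within the frame */
    AOE_CMD_ATA     = 0,
    AOE_CMD_CFG     = 1
};

/*
 * Hypothetical check: returns 1 if the n received bytes cover every
 * header field a handler for this command would read, else 0. It does
 * not validate the length of any data that follows the headers.
 */
int frame_is_long_enough(const unsigned char *buf, size_t n)
{
    assert(buf != NULL);
    if (n < AOE_HDR_LEN)
        return 0;                     /* command byte is not even present */

    switch (buf[AOE_CMD_OFFSET]) {
    case AOE_CMD_ATA:
        return n >= AOE_ATA_HDR_LEN;
    case AOE_CMD_CFG:
        return n >= AOE_CFG_HDR_LEN;
    default:
        return 1;                     /* common header is all we can read */
    }
}
```

A check of this shape accepts the 56-byte config query seen on the VLAN interface as well as the unpadded 32-byte form, while still rejecting frames that end before the fields being parsed.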
From: Justin C. D. <wi...@wi...> - 2008-02-19 18:23:08
Hi, Ed. I've been out of the office for a week, but:

I haven't had time to figure out why, but vblade-14 with this patch
seems to drop my average single vblade process throughput to about
50mb/sec from 250mb/sec. It does seem to be specific to the build,
though, as I can stack five vblade processes running at 50mb/sec or so
without losing performance to any of them. CPU usage of vblade doesn't
seem to change, though.

As an aside, we're investigating writing an alternate, threaded vblade
daemon that uses a thread pool to support tagging and emulate async I/O
per the AoE RFC, since this disk array is actually capable of about
1.5gb/s reads if the local machine I/O scheduler can see far enough in
advance. It's attached via dual 10GbE to 14 machines, so getting the
demand put onto the disk array shouldn't be a problem -- NFS is
considerably faster than vblade in this regard right now (over 700mb/s
on average .. but a lot of TCP overhead, even with offloading on the
controllers enabled, netpipe isn't much faster). Part of the problem
we're having now with vblade is that the number of seeks to the raw
block devices is kind of high under load since the local machine can't
optimize them.

Before I put effort into this, I have two questions:

A) Is this necessary if using kvblade (e.g. can kvblade process
   multiple tagged requests simultaneously -- if so I may just try to
   start trying to maintain it since it's fallen behind a little)?

And, B) does the AOE driver actually support tagging (I haven't been
   able to find a mention of it anywhere)?

Thanks,
Justin

On 2/13/08, Ed L. Cashin <ec...@co...> wrote:
> Hi. There is a patch for vblade-14 that brings it to version 15 and
> includes the change where vblade only tests for the existence of the
> data it uses, not for the presence of 60 bytes.
> [...]
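The point about seeks can be illustrated with a short C sketch: if a target can hold several outstanding requests at once, it can reorder them by starting sector before touching the disk, and the AoE tag carried with each request still identifies which reply belongs to which request. This is only a sketch of that idea under assumed names (struct pending_req, order_batch); it is not code from vblade or kvblade, and a real daemon would also need the thread pool and I/O submission Justin describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One outstanding request; field names are illustrative. */
struct pending_req {
    uint32_t tag;  /* AoE tag copied from the request, echoed in the reply */
    uint64_t lba;  /* starting sector of the request */
};

/* qsort comparator: ascending by starting sector. */
static int by_lba(const void *a, const void *b)
{
    const struct pending_req *x = a;
    const struct pending_req *y = b;

    return (x->lba > y->lba) - (x->lba < y->lba);
}

/*
 * Reorders a batch of outstanding requests so the disk is visited in
 * ascending sector order. Replies can still be matched to requests by
 * tag, so the initiator does not care about the service order.
 */
void order_batch(struct pending_req *reqs, size_t count)
{
    assert(reqs != NULL || count == 0);
    if (count > 1)
        qsort(reqs, count, sizeof reqs[0], by_lba);
}
```

A single-request-at-a-time server never has more than one entry in such a batch, which is one way to read the remark that the local machine cannot optimize the seeks.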
From: Ed L. C. <ec...@co...> - 2008-02-20 21:26:02
Justin C. Darby, hello.

I would not expect the version 15 prerelease patch to have any effect
on performance. It did have a flaw that interfered with AoE discovery,
and I will create a new patch soon.

For information about kvblade, I think you should check its
documentation and source.

I have not heard of any AoE-related software that works with VLAN tags.

-- 
  Ed L Cashin <ec...@co...>
From: Ed L. C. <ec...@co...> - 2008-02-20 22:31:51
On Wed, Feb 13, 2008 at 04:10:00PM -0500, Ed L. Cashin wrote:
> Hi. There is a patch for vblade-14 that brings it to version 15 and
> includes the change where vblade only tests for the existence of the
> data it uses, not for the presence of 60 bytes.

The patch at the URL below fixes and enhances the work done to create
the preliminary patch mentioned above.

  http://noserose.net/e/temp/vblade-14-15-v3.diff

Any feedback is welcome.

-- 
  Ed L Cashin <ec...@co...>
From: Tracy R R. <tr...@ul...> - 2008-03-02 22:26:51
b5...@en... wrote:
> Hi,
>
> I found the following messages in my log:
>
> kernel: aoe: aoe_init: AoE v22 initialised.

I know this used to happen in a really old version of AoE. I think I saw
it a year and a half ago. Always make sure you are using the latest
version of both the kernel module and vblade or whatever target you are
using.

-- 
Tracy R Reed                  Read my blog at http://ultraviolet.org
Key fingerprint = D4A8 4860 535C ABF8 BA97  25A6 F4F2 1829 9615 02AD
Non-GPG signed mail gets read only if I can find it among the spam.
From: <b5...@en...> - 2008-03-03 13:05:23
> b5...@en... wrote:
>> Hi,
>>
>> I found the following messages in my log:
>>
>> kernel: aoe: aoe_init: AoE v22 initialised.
>
> I know this used to happen in a really old version of AoE. I think I saw
> it a year and a half ago. Always make sure you are using the latest
> version of both the kernel module and vblade or whatever target you are
> using.

But it's not a destructive error, right? Because I use the Ubuntu gutsy
aoetools in a test environment and would rather not recompile them. I
will definitely switch to newer versions for production use.

Thanks for responding,
Holger
From: Sam H. <sa...@co...> - 2008-03-03 15:03:07
Hello Holger,

> But it's not a destructive error, right? Because I use the Ubuntu gutsy
> aoetools in a test environment and would rather not recompile them. I
> will definitely switch to newer versions for production use.

Correct, it should not be destructive. The driver is structured a
little differently now, but there used to be a few things that the
driver needed to do that required the ability to go to sleep. The
soft-interrupt routines that handle the incoming aoe packets can't
sleep, so they'd arrange for a worker thread to handle these things out
of band. When the work can't be scheduled you'll see the message you've
seen.

This sleepy work surrounds installing the devices in the system, so it's
not surprising that when you see the message you only see some of the
aoe devices you know are available. You should be able to run
aoe-discover and have the missing devices appear, but I agree with
Tracy; best to update to the latest available driver:

  http://coraid.com/support/linux/

The aoetools are bundled with this standalone driver. If you've become
accustomed to using the in-kernel aoe driver and the Ubuntu aoetools,
you'll probably want to uninstall the Ubuntu aoetools before upgrading
just to keep the system clean. That's if I understand you correctly. I
don't use Ubuntu. :)

Cheers,
Sam
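The "can't schedule work ... it's already on" situation Sam describes can be modelled in a few lines of userspace C. This is a single-threaded sketch under assumed names (struct work_item, try_schedule, run_work), not the kernel workqueue API and not the aoe driver's code: a handler that cannot sleep marks an item as pending for a worker, and a second attempt to queue the same item before the worker has picked it up is refused.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative deferred-work item; not a kernel structure. */
struct work_item {
    int pending;            /* 1 while queued and not yet picked up */
    void (*fn)(void *arg);  /* the part that may sleep */
    void *arg;
};

/*
 * Called from the context that must not sleep.
 * Returns 1 if the item was queued, 0 if it was already pending
 * (the case that produced the log message in this thread).
 */
int try_schedule(struct work_item *w)
{
    assert(w != NULL);
    if (w->pending)
        return 0;
    w->pending = 1;
    return 1;
}

/*
 * Worker side: clears the pending flag, then runs the deferred function,
 * so the item can be queued again while or after it runs.
 */
void run_work(struct work_item *w)
{
    assert(w != NULL && w->fn != NULL);
    w->pending = 0;
    w->fn(w->arg);
}

/* Example deferred function: counts how many times it has been run. */
void count_call(void *arg)
{
    ++*(int *)arg;
}
```

In this model the refused second request is simply lost, which fits Sam's remark that some devices may not appear until aoe-discover is run again.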
From: Tracy R R. <tr...@ul...> - 2008-03-03 17:45:18
b5...@en... wrote:
> But it's not a destructive error, right? Because I use the Ubuntu gutsy
> aoetools in a test environment and would rather not recompile them. I
> will definitely switch to newer versions for production use.

Correct. It should not hurt anything.
From: Jon N. <jne...@ja...> - 2008-03-13 02:29:27
On Wed, Feb 20, 2008 at 5:33 PM, Ed L. Cashin <ec...@co...> wrote:
> On Wed, Feb 13, 2008 at 04:10:00PM -0500, Ed L. Cashin wrote:
> > Hi. There is a patch for vblade-14 that brings it to version 15 and
> > includes the change where vblade only tests for the existence of the
> > data it uses, not for the presence of 60 bytes.
>
> The patch at the URL below fixes and enhances the work done to create
> the preliminary patch mentioned above.
>
> http://noserose.net/e/temp/vblade-14-15-v3.diff
>
> Any feedback is welcome.

Using the recently-released vblade-15 I can see AoE devices over loopback.
Yay!

Using version 59 of the driver on the openSUSE x86-64 kernel
2.6.22.17-0.1-default, I run into a problem. Within 30 seconds of
writing to the aoe device I get:

Mar 12 21:18:00 turnip kernel: end_request: I/O error, dev etherd/e0.0, sector 137181
Mar 12 21:18:00 turnip kernel: end_request: I/O error, dev etherd/e0.0, sector 137182
Mar 12 21:18:00 turnip kernel: end_request: I/O error, dev etherd/e0.0, sector 137183
Mar 12 21:18:00 turnip kernel: end_request: I/O error, dev etherd/e0.0, sector 137184
Mar 12 21:18:00 turnip kernel: end_request: I/O error, dev etherd/e0.0, sector 137185

The log lines (some 200,000+ of them) continue for only another 30
seconds or so, but 2-3 minutes go by until I can do *any* I/O on the
system at all. I mean to say that *all* I/O is completely stalled, not
just that to the AoE device. Any ideas?

-- 
Jon
From: Ed L. C. <ec...@co...> - 2008-03-13 12:50:35
On Wed, Mar 12, 2008 at 09:29:28PM -0500, Jon Nelson wrote:
..
> Using the recently-released vblade-15 I can see AoE devices over loopback.
> Yay!

Great. Thanks for the report.

> Using version 59 of the driver on the openSUSE x86-64 kernel
> 2.6.22.17-0.1-default, I run into a problem. Within 30 seconds of
> writing to the aoe device I get:
>
> Mar 12 21:18:00 turnip kernel: end_request: I/O error, dev
> etherd/e0.0, sector 137181

The I/O error can happen after the aoe driver marks a device as
"down". Do you see a message like that from the aoe driver before you
see the I/O errors?

...
> The loglines (some 200,000+ of them) continue for only another 30
> seconds or so but 2-3 minutes goes by until I can do *any* I/O on the
> system at all. I mean to say that *all* I/O is completely stalled, not
> just that to the AoE device. Any ideas?

Sometimes you can get too much batching, so that there's so much
in-kernel work to do that the system is not as responsive. I notice
that you are talking about writing data. You can encourage the dirty
data to be flushed out more quickly by tightening some controls.

  http://www.coraid.com/support/linux/EtherDrive-2.6-HOWTO-5.html#ss5.19

One thing you could do on the target side would be to run the vblade
in strace or in a debugger, to see whether there's something happening
to it.

-- 
  Ed L Cashin <ec...@co...>
From: Jon N. <jne...@ja...> - 2008-03-13 13:36:35
On Thu, Mar 13, 2008 at 7:48 AM, Ed L. Cashin <ec...@co...> wrote:
> On Wed, Mar 12, 2008 at 09:29:28PM -0500, Jon Nelson wrote:
> > Using version 59 of the driver on the openSUSE x86-64 kernel
> > 2.6.22.17-0.1-default, I run into a problem. Within 30 seconds of
> > writing to the aoe device I get:
> >
> > Mar 12 21:18:00 turnip kernel: end_request: I/O error, dev
> > etherd/e0.0, sector 137181
>
> The I/O error can happen after the aoe driver marks a device as
> "down". Do you see a message like that from the aoe driver before you
> see the I/O errors?

No.

> > The loglines (some 200,000+ of them) continue for only another 30
> > seconds or so but 2-3 minutes goes by until I can do *any* I/O on the
> > system at all. I mean to say that *all* I/O is completely stalled, not
> > just that to the AoE device. Any ideas?
>
> Sometimes you can get too much batching, so that there's so much
> in-kernel work to do that the system is not as responsive. I notice
> that you are talking about writing data. You can encourage the dirty
> data to be flushed out more quickly by tightening some controls.

It's not unresponsive (as in slow), it's unresponsive as in
already-open shells will hang doing *anything* for 2-3 minutes. All
I/O appears to be completely stalled.

> http://www.coraid.com/support/linux/EtherDrive-2.6-HOWTO-5.html#ss5.19
>
> One thing you could do on the target side would be to run the vblade
> in strace or in a debugger, to see whether there's something happening
> to it.

Ah - I did run it under strace. Sorry about that - the vblade is in a
stuck write(2) call, typically a small one, to the disk.

-- 
Jon