Thread: [Aoetools-discuss] Retransmit issues
From: Tracy R. <tr...@ul...> - 2009-08-07 10:00:08
I currently have two AoE SANs deployed, and they both have the same problem, so I must be missing something somewhere. I originally wrote about this last November:

http://www.mail-archive.com/aoe...@li.../msg00136.html

and never got the problem solved. I also did not get around to trying the patch Ed suggested, but I have a feeling there has got to be something I am doing wrong in the setup here. Performance didn't really matter much on that deployment at the time, although it is becoming more important, and I have just set up a second SAN with the same issue where I really do need it to perform. I have set up AoE SANs a few times before and got great performance, so I'm not sure what could possibly be different this time.

vblade-19 on the target side, AoE v72 kernel module on the initiator. Using MTU 9000 on all of the interfaces involved. HP ProCurve 2810 switch with a dedicated VLAN for the AoE SAN. The switch is set up for 9000 MTU also. The initiator says:

  aoe: e0.0: setting 8704 byte data frames
  aoe: e1.0: setting 8704 byte data frames

so I know it is getting the MTU right on that side. The initiator has a VLAN interface for the SAN, which then goes over the bonded link.
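The 8704-byte figure in those log lines can be reproduced from the 9000 MTU. The minimal sketch below assumes the initiator subtracts a 24-byte AoE header (which includes the 14-byte Ethernet header) and a 12-byte ATA header from the MTU, then rounds down to whole 512-byte sectors. That arithmetic matches the log, but it is an assumption about the driver rather than a quote of its source, and the real driver may also cap the result by what the target advertises. The function name `aoe_data_frame_bytes` is hypothetical.

```python
# Hypothetical helper: estimate the AoE data payload per frame for a given MTU.
# Assumption: per-frame overhead is a 24-byte AoE header (Ethernet header
# included) plus a 12-byte ATA header, and the payload is rounded down to
# whole 512-byte sectors. Any cap imposed by the target is ignored here.
AOE_HDR_BYTES = 24
ATA_HDR_BYTES = 12
SECTOR_BYTES = 512


def aoe_data_frame_bytes(mtu: int) -> int:
    """Return the largest whole-sector payload that fits in one frame."""
    usable = mtu - AOE_HDR_BYTES - ATA_HDR_BYTES
    if usable < SECTOR_BYTES:
        return 0  # not even one sector fits
    return (usable // SECTOR_BYTES) * SECTOR_BYTES


if __name__ == "__main__":
    for mtu in (1500, 4200, 9000):
        print(mtu, aoe_data_frame_bytes(mtu))
```

Under the same assumptions, a 9000 MTU gives 8704-byte data frames (17 sectors), a 4200 MTU gives 4096-byte frames, and a standard 1500 MTU gives 1024-byte frames.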
cat /dev/etherd/err on the initiator produces lots of:

  unexpected rsp e2.0 tag=7e426f7f@102e56f91 s=0024e860c18a d=00219b916485
  retransmit e2.0 oldtag=00736fa5@102e56faa newtag=00826faa s=00219b916485 d=0024e860c18a nout=1
  unexpected rsp e2.0 tag=00736fa5@102e56fad s=0024e860c18a d=00219b916485
  retransmit e2.0 oldtag=083b7005@102e5700e newtag=083c700e s=00219b916485 d=0024e860c18a nout=1
  unexpected rsp e2.0 tag=083b7005@102e57016 s=0024e860c18a d=00219b916485
  retransmit e2.0 oldtag=123d7082@102e5708b newtag=1245708b s=00219b916485 d=0024e860c18a nout=1
  unexpected rsp e2.0 tag=123d7082@102e57095 s=0024e860c18a d=00219b916485
  retransmit e2.0 oldtag=173870c9@102e570d6 newtag=174170d6 s=00219b916485 d=0024e860c18a nout=1
  unexpected rsp e2.0 tag=173870c9@102e570dc s=0024e860c18a d=00219b916485
  retransmit e2.0 oldtag=20c87147@102e57153 newtag=20cf7153 s=00219b916485 d=0024e860c18a nout=1
  unexpected rsp e2.0 tag=20c87147@102e5715b s=0024e860c18a d=00219b916485
  retransmit e2.0 oldtag=2aae71ca@102e571d0 newtag=2ab471d0 s=00219b916485 d=0024e860c18a nout=1

At this point I'm at a loss for what the problem could be.

--
Tracy Reed
http://tracyreed.org
From: Ed C. <ec...@co...> - 2009-08-07 12:30:28
A couple of things come to mind.

One is that there was a period where drivers could not use jumbo frames. That was followed by a long period where aoe drivers reported the maximum usable payload size in `aoe-stat`, but would never use more than 4096-byte payloads. Starting with aoe6-49, the payload reported by `aoe-stat` can be fully used. So if you used to get good performance and now can't, it might just be because, before upgrading, you were only using jumbo frames up to a size your network equipment handled well, and the newer aoe driver uses even bigger ones. You can test for that by using a 4200 MTU on your initiator network interfaces.

Second, the latest drivers handle network congestion dynamically, in response to actual network conditions, and it is normal for retransmissions to occur as the "ideal" rate is momentarily exceeded and then backed away from. If you see short bursts of retransmits every few seconds, it probably isn't anything to worry about. A steady, rapid stream of retransmits could indicate a problem, and retransmits without related "unexpected responses" could also indicate a problem, specifically network packet loss.

--
Ed
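Ed's rule of thumb can be checked mechanically against a capture of the `err` output. In Tracy's excerpt, each `retransmit ... oldtag=X` is followed shortly by `unexpected rsp ... tag=X` (except the last one, where the excerpt ends), meaning the original response arrived after the initiator had already retransmitted. The hypothetical helper `classify` below pairs the two message types by device and tag; the regular expressions are assumed from that excerpt only, not from any documented log format. Late responses fit the congestion explanation, while retransmits that never get a matching response (away from the edges of the capture) would point toward the packet loss Ed mentions.

```python
import re
from typing import Iterable

# Line formats assumed from the /dev/etherd/err excerpt in this thread:
#   retransmit e2.0 oldtag=00736fa5@102e56faa newtag=... s=... d=... nout=1
#   unexpected rsp e2.0 tag=00736fa5@102e56fad s=... d=...
RETRANSMIT_RE = re.compile(r"retransmit (e\d+\.\d+) oldtag=([0-9a-f]+)@")
UNEXPECTED_RE = re.compile(r"unexpected rsp (e\d+\.\d+) tag=([0-9a-f]+)@")


def classify(lines: Iterable[str]) -> dict:
    """Pair retransmits with the late responses to their original tags."""
    pending = set()   # (device, old tag) retransmitted but not yet answered
    retransmits = 0   # total retransmit lines seen
    late = 0          # original responses that showed up after a retransmit
    stray = 0         # unexpected responses with no retransmit in the capture
    for line in lines:
        m = RETRANSMIT_RE.search(line)
        if m:
            retransmits += 1
            pending.add((m.group(1), m.group(2)))
            continue
        m = UNEXPECTED_RE.search(line)
        if m:
            key = (m.group(1), m.group(2))
            if key in pending:
                pending.remove(key)
                late += 1
            else:
                stray += 1  # e.g. its retransmit happened before the capture
    return {
        "retransmits": retransmits,
        "late": late,
        "unanswered": len(pending),  # candidates for real packet loss
        "stray": stray,
    }
```

For example, save a stretch of the `err` output to a file and pass the open file object to `classify`, since a file iterates line by line. Run over Tracy's excerpt, it would report six retransmits, five late responses, one stray response at the start and one unanswered retransmit at the end, which is the "late, not lost" pattern rather than loss.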
From: Matthew I. <ma...@di...> - 2009-08-08 00:18:45
> vblade-19 on the target side AoE v72 kernel module on the
> initiator. Using mtu 9000 on all of the interfaces involved. HP
> ProCurve 2810 switch with a dedicated VLAN for the AoE SAN. The switch
> is set up for 9000 MTU also. The initiator says:

I had issues with a different HP switch set to a 9000 MTU. The switch worked fine at a 1500 MTU, with very few retransmits and no noticeable throughput degradation, but switching to 9000 seemed to confuse it: throughput went from around 90 MB/s to around 1 MB/s after a few seconds of continuous writing. Replacing it with a switch from a different brand cleared this up (still running at a 9000 MTU), and it now shows a normal level of retransmissions without decreased throughput. This was running a newer kernel (2.6.26.8), the newest vblade, and the aoe initiator that comes with the vanilla kernel.

A direct connection worked fine when I tested, and like Ed said, it may have to do with your versions.

--
Matth Ingersoll