Re: [Etherboot-discuss] SRP timeout
From: Itay G. <ita...@gm...> - 2010-07-13 14:41:08
Michael,

Do you have an idea? What can be the problem with the arbel driver?

Itay

On Mon, Jul 12, 2010 at 3:18 AM, M Lowe <ml...@sh...> wrote:
> I have been able to log the debug messages now however I see no errors
> that would indicate where the problem is.
>
> Just to recap quickly, the problem is that san-booting over InfiniBand
> using SRP doesn't work and just times out. The timeout occurs while
> waiting for a response to the SRP login request. I'm fairly certain the
> problem lies within gPXE because I can access the SRP target just fine
> through a local installation of Windows. In addition, on the SRP target
> side I have traced through the ib_srpt module and found that a login
> response is generated and sent (or at least posted to the mthca module
> work queue).
>
> On the gPXE side I've found that I'm not receiving the SRP_LOGIN_RSP
> packet even at the InfiniBand protocol level (net/infiniband.c). So far
> I have been able to determine the packet is lost at some point in the
> Arbel driver (drivers/infiniband/arbel.c) before arbel_complete(). This
> would indicate the problem exists within the Arbel driver and explains
> why SRP sanboot worked with the Hermon driver. Despite compiling with
> DEBUG=arbel:3 I get no errors indicating there are any problems or
> dropped packets.
>
> Here is the output from autoboot with
> DEBUG=srp,ipoib,arp,infiniband,ib_cm,ib_cmrc,ib_mcast,ib_mi,ib_packet,ib_pathrec,ib_sma,ib_smc,ib_srp
>
> Note: I have added some debug messages to help illustrate the flow of
> packets. At the beginning of ipoib_complete_recv, ib_complete_recv, and
> ib_mi_complete_recv I have added "RX" debug messages.
>
> Booting from root path
> "ib_srp::::fe800000000000000002c9020022e5e5::0002c9020022e5e4::0002c9020022e5e4:0002c9020022e5e4"
> SRP 0xbb134 using ib_srp::::fe800000000000000002c9020022e5e5::0002c9020022e5e4::0002c9020022e5e4:0002c9020022e5e4
> SRP attached successfully
> IBDEV 0xb9a84 creating completion queue
> IBDEV 0xb9a84 created 8-entry completion queue 0xbb4c4 (0xbb214) with CQN 0x83
> IBDEV 0xb9a84 creating queue pair
> IBDEV 0xb9a84 created queue pair 0xbb4f4 (0xbb5c4) with QPN 0x550403
> IBDEV 0xb9a84 QPN 0x550403 has 4 send entries at [0xbb5a0,0xbb5b0)
> IBDEV 0xb9a84 QPN 0x550403 has 2 receive entries at [0xbb5b0,0xbb5b8)
> CMRC 0xbb1b4 using QPN 550403
> SRP 0xbb134 TX login request tag 0000000000000001
> CM 0xbbb64 created for IBDEV 0xb9a84 QPN 550403
> CM 0xbbb64 connecting to fe800000:00000000:0002c902:0022e5e5 0002c902:0022e5e4
> MI 0xba564 TX TID 6750584500000003 (03,02,01,0035) status 0000
> infiniband RX
> MI 0xba564 RX
> MI 0xba564 RX TID 6750584500000003 (03,02,81,0035) status 0000
> IBDEV 0xb9a84 path to fe800000:00000000:0002c902:0022e5e5 is 0007 sl 0 rate 6
> MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000
> MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000
> MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000
> MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000
> infiniband RX
> IPoIB 0xb9ccc RX
> ARP cache add: IP 10.20.76.1 => IPoIB 80000404:fe800000:00000000:0002c902:0022e5e5
> ARP reply: IP 10.20.76.45 => IPoIB 00550402:fe800000:00000000:0002c902:00243035
> IPoIB peer 4 has MAC 80000404:fe800000:00000000:0002c902:0022e5e5
> MI 0xba564 TX TID 6750584500000005 (03,02,01,0035) status 0000
> infiniband RX
> MI 0xba564 RX
> MI 0xba564 RX TID 6750584500000005 (03,02,81,0035) status 0000
> MI 0xba564 RX TID 6750584500000005 handling via transaction handler
> IBDEV 0xb9a84 path to fe800000:00000000:0002c902:0022e5e5 is 0007 sl 0 rate 6
> infiniband RX
> IPoIB 0xb9ccc RX
> ARP cache update: IP 10.20.76.1 => IPoIB 80000404:fe800000:00000000:0002c902:0022e5e5
> ARP reply: IP 10.20.76.45 => IPoIB 00550402:fe800000:00000000:0002c902:00243035
> MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000
> MI 0xba564 abandoning TID 6750584500000004
> CM 0xbbb64 connection request failed: Connection timed out (0x4c206035)
> CMRC 0xbb1b4 disconnected: Connection timed out (0x4c206035)
> SRP 0xbb134 socket closed: Connection timed out (0x4c206035)
>
> From: Itay Gazit [mailto:ita...@gm...]
> Sent: Friday, June 25, 2010 11:47 AM
> To: Stefan Hajnoczi; M Lowe
> Cc: eth...@li...; gpxe; Michael Brown
> Subject: Re: [Etherboot-discuss] SRP timeout
>
> Hi Matthew,
> Stefan is right, you should reduce the DEBUG messages depth to find the
> fail cause.
> I have tried SRP boot only with Hermon driver (ConnectX) and it worked
> for me.
> Regards,
> Itay
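
For anyone reading a trace like the one quoted above, a small script can help spot which management transactions went out but never got a reply. This is only a rough sketch and not part of gPXE: the function name `unanswered_mads` and the regular expression are made up for illustration, and it assumes the "MI ... TX TID ..." / "MI ... RX TID ..." line format shown in the log, with the first field in parentheses taken to be the management class. On the lines quoted above it reports that TID 6750584500000004 was transmitted five times with no matching RX line, which fits the "abandoning TID" message and the connection request timing out.

```python
import re
from collections import OrderedDict

# Matches lines such as:
#   MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000
# The parenthesised part is optional because some RX lines omit it.
MI_LINE = re.compile(
    r"MI\s+0x[0-9a-f]+\s+(TX|RX)\s+TID\s+([0-9a-f]{16})"
    r"(?:\s+\(([0-9a-f]{2}),)?"
)


def unanswered_mads(log_text):
    """Return (tid, first_field, tx_count) for every TID that was
    transmitted but never appears on an RX line. Hypothetical helper."""
    tx_counts = OrderedDict()   # TID -> number of TX attempts, in log order
    first_field = {}            # TID -> first value in parentheses (assumed class)
    answered = set()            # TIDs seen on at least one RX line

    for line in log_text.splitlines():
        match = MI_LINE.search(line)
        if match is None:
            continue
        direction, tid, field = match.groups()
        if direction == "TX":
            tx_counts[tid] = tx_counts.get(tid, 0) + 1
            first_field[tid] = field
        else:
            answered.add(tid)

    return [
        (tid, first_field[tid], count)
        for tid, count in tx_counts.items()
        if tid not in answered
    ]


# MI lines copied from the quoted trace (other lines left out for brevity).
SAMPLE = """\
> MI 0xba564 TX TID 6750584500000003 (03,02,01,0035) status 0000
> MI 0xba564 RX TID 6750584500000003 (03,02,81,0035) status 0000
> MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000
> MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000
> MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000
> MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000
> MI 0xba564 TX TID 6750584500000005 (03,02,01,0035) status 0000
> MI 0xba564 RX TID 6750584500000005 (03,02,81,0035) status 0000
> MI 0xba564 RX TID 6750584500000005 handling via transaction handler
> MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000
> MI 0xba564 abandoning TID 6750584500000004
"""

for tid, field, count in unanswered_mads(SAMPLE):
    print("TID %s (first field %s): %d TX, no RX" % (tid, field, count))
```

The script only counts what the log prints, so it cannot say where a reply was lost; it just narrows down which request to look at when adding further debug output.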