From: Denis V. <vd...@il...> - 2005-09-29 06:13:46
|
On Wednesday 28 September 2005 23:50, Carlos Martin wrote: > On 28/09/05, Andreas Bie=DFmann <an...@bi...> wrote: > > Hi Carlos, > > > > just started a new thread to inform about this problem. > > > > Well, usb is working at the moment with acx-0926 by carlos included last > > patch from denis (see "Another USB patch). System hang's are gone. > > > > The next problem is unsuspected stop of the device while printing a lot > > of "txdesc is full, dropping tx" messages. > > Analyzing the log shows, that most of rx buffers died by EOVERFLOW > > signal before dropping tx, they are never used after this "acx: error in > > rx, data overrun -> emergency stop" message. > > > > Well, kerneldoc shows in Documentation/usb/error-codes.txt: > > ---8<--- > > (*) Error codes like -EPROTO, -EILSEQ and -EOVERFLOW normally indicate > > hardware problems such as bad devices (including firmware) or cables. > > ---8<--- > > > > Ok, defective cable rule out ;) >=20 > Try doing while true; ip l s wlan0 up; ip l s wlan0 down; done and see > what happens. You'll receive -EPROTO errors, but the device will > actually do as told (or so it seems). I should try to time how much we > should wait, but for example, when we send _CMD_DISABLE_TX/RX, there's > no time at all between them yet it works. I'm also guessing room > temperature might have something to do. When I managed to download at > 60k/s, it was cold because I had the window open, but it's usually > boiling here, and the device heats up quite a bit on its own. >=20 > > > > My idea is, that this overflow could be related to truncated rxbuf. This > > is a snippet of my log: > > > > ---8<--- > > Sep 27 20:41:26 localhost kernel: rx: unsupported 802.11 protocol > > Sep 27 20:41:26 localhost kernel: 65709219 <=3D=3D > > acx_l_rx_ieee802_11_frame: 00000001 >=20 > My guess is that somewhere we're doing something the rest doesn't underst= and. I suspect that Andreas forgot to show us a bit of log preceding this one. In that part of log we ought to see truncated packet or usb buffer overflow. Look here: static void acx100usb_i_complete_rx(struct urb *urb, struct pt_regs *regs) =2E.. /* check if previous frame was truncated ** FIXME: this code can only handle truncation ** of consecutive packets! */ ptr =3D inbuf; if (priv->rxtruncsize) { int tail_size; ptr =3D &priv->rxtruncbuf; packetsize =3D RXBUF_BYTES_RCVD(ptr) + RXBUF_HDRSIZE; if (acx_debug & L_USBRXTX) { printk("handling truncated frame (truncsize=3D%d si= ze=3D%d " "packetsize(from trunc)=3D%d)\n", priv->rxtruncsize, size, packetsize= ); acx_dump_bytes(ptr, RXBUF_HDRSIZE); acx_dump_bytes(inbuf, RXBUF_HDRSIZE); } This part of code suggests that acx may return more than one packet. They are about to arrive glued together in one big lump. We should identify them by looking at RXBUF_BYTES_RCVD(ptr) and splitting data accordingly. (btw... #define RXBUF_BYTES_RCVD(rxbuf) (le16_to_cpu(rxbuf->mac_cnt_rcvd) & 0xfff) #define RXBUF_BYTES_USED(rxbuf) \ ((le16_to_cpu(rxbuf->mac_cnt_rcvd) & 0xfff) + RXBUF_HDRSIZE) I'll use replace RXBUF_BYTES_RCVD+RXBUF_HDRSIZE -> RXBUF_BYTES_USED immedia= tely) Note that entire "combined" blob of data must fit into usb buffer we provide. But currently we provide buf sized for just one full-sized wlan frame: typedef struct rxbuffer { u16 mac_cnt_rcvd ACX_PACKED; /* only 12 bits are len! (0= xfff) */ u8 mac_cnt_mblks ACX_PACKED; u8 mac_status ACX_PACKED; u8 phy_stat_baseband ACX_PACKED; /* bit 0x80: used LNA (Low-= Noise Amplifier) */ u8 phy_plcp_signal ACX_PACKED; u8 phy_level ACX_PACKED; /* PHY stat */ u8 phy_snr ACX_PACKED; /* PHY stat */ u32 time ACX_PACKED; /* timestamp upon MAC rcv f= irst byte */ wlan_hdr_a3_t hdr_a3 ACX_PACKED; /* maximally sized data part of wlan packet */ u8 data_a3[WLAN_A4FR_MAXLEN_WEP_FCS - WLAN_HDR_A3_LEN] ACX_PAC= KED; } rxbuffer_t; typedef struct usb_rx { unsigned busy:1; struct urb *urb; wlandevice_t *priv; rxbuffer_t bulkin; } usb_rx_t; This may be the source of our problems. Older driver used 3000 byte buffer rounded down to 64 bytes. I believe we should not blindly return to that, but experiment a bit with heavy rx floods in order to find out how large buffer do we really need, and document that in comments. One way to do this may be to add logging to rx which reports all usb rx urbs which were larger than one full wlan packet. By using printk, not acxlog, we will force user to see/report it. inbuf =3D &rx->bulkin; size =3D urb->actual_length; remsize =3D size; acxlog(L_USBRXTX, "RETURN RX (%p) status=3D%d size=3D%d\n", rx, urb->status, size); + if(size > sizeof(rxbuffer_t)) + printk("acx: large usb rx: %d bytes, please report\n", size); Combined with bogus_padding which enlarges usb_rx to 4k: +struct usb_rx_for_sizeof { + unsigned busy:1; + struct urb *urb; + wlandevice_t *priv; + rxbuffer_t bulkin; +}; typedef struct usb_rx { unsigned busy:1; struct urb *urb; wlandevice_t *priv; rxbuffer_t bulkin; + u8 bogus_padding[4*1024 - sizeof(struct usb_rx_for_sizeof)]; } usb_rx_t; this will allow for mostly crash-less experiments. In general, guys, please take a look at this rx truncation/gluing code, I cleaned it up but it still needs review/testing. =46or example, what will happen if wlan packet gets split across two usb buffers and second usb transfer will error out? We will try to glue together first part of wlan packet which we already have with _unrelated_ packet which will be received next, right?! Don't we need to set priv->rxtruncsize =3D 0 in several places (usb rx errors, ifdown, etc..) in order to avoid this scenario? > > Sep 27 20:41:26 localhost kernel: post-merge size=3D1552 remsize=3D1406 > > Sep 27 20:41:26 localhost kernel: packet with packetsize=3D1459 > > Sep 27 20:41:26 localhost kernel: need to truncate packet, > > packetsize=3D1459 remsize=3D1406 size=3D1552 > > Sep 27 20:41:26 localhost kernel: A7 15 06 84 D0 42 67 43 65 AC 2B 83 > > Sep 27 20:41:26 localhost kernel: 65709219 =3D=3D> acx100usb_l_poll_rx > > Sep 27 20:41:26 localhost kernel: SUBMIT RX (cf1c12c8) inpipe=3D0xC0008= 980 > > size=3D2358 errcode=3D0 num=3D2 > > Sep 27 20:41:26 localhost kernel: 65709219 <=3D=3D acx100usb_l_poll_rx > > Sep 27 20:41:26 localhost kernel: 65709219 <=3D=3D acx100usb_i_complete= _rx > > Sep 27 20:41:26 localhost kernel: 65709220 =3D=3D> acx100usb_i_complete= _rx > > Sep 27 20:41:26 localhost kernel: RETURN RX (cf1c1c2c) status=3D-75 > > size=3D2358 num=3D3 > > Sep 27 20:41:26 localhost kernel: acx: error in rx, data overrun -> > > emergency stop > > ---8<--- > > > > You can see a wrong 802.11 packet, a truncation and next rxbuf dies ... > > (BTW, log is at > > http://www2.fh-sw.de/~k3317/acx100_logs/acx-0926-patched_rx-problems2.l= og.bz2) > > (another BTW, num=3D%i is only a number of rxbuffer, it is easier to re= ad > > than the pointer .. i hide the number in rx->padding[0]) >=20 > My sources should have that instead of the pointer. That was mainly a > side effect of Denis' patch where we didn't need to pass the number > anymore. Padding is to be filled where we otherwise would get > -EOVERFLOW by faking the rxbuffer_t size. That is, when I finally get > around to implementing that. This is btw the old method (almost), > which seemed to work. Or maybe that's just hiding the bug. =2D- vda |