[Madwifi-devel] MADWifi & XSupplicant - 100% CPU Spike/malformed packets
From: Terry S. <Terry.Simons@m.cc.utah.edu> - 2003-12-12 06:36:45
I think I've finally begun to track down the problem that keeps xsupplicant and MADWifi from working together. The problem is intriguing, to say the least.

MADWifi receives 802.1x frames just fine, but the AP never seems to receive the responses from the Atheros card, even though Ethereal (running locally) claims that the frames left the interface. At first I thought the problem was that the LLC frame was being stuck in place of the source address of the 802.1x frames, but as Greg and Mathieu explained to me, that is a different, seemingly harmless bug.

I set up a sniffer on another box so I could get a look at the frames leaving the interface, but oddly enough there aren't any! That's the reason xsupplicant doesn't work with MADWifi.

Because of the earlier LLC issues, I had this odd notion that maybe frames were leaving the interface (especially since Ethereal claimed that they were) but were messed up somehow. I turned debugging on in the driver and ran Ethereal with the card in monitor mode (locally again, since I don't have another computer that can run in monitor mode at the moment).

What I found was rather odd. There appears to be a bug that xsupplicant triggers that causes the driver to lose its lunch. This causes 100% CPU utilization on my laptop as well.

Here's the issue: after throwing the card into monitor mode and running xsupplicant, Ethereal immediately started receiving a very large number of "malformed" packets (more than 32,000 malformed frames in less than 1 second!). Each of the packets is identical, and is essentially the 802.1x EAPOL start frame that xsupplicant sent out, but each one of these malformed packets is missing the LLC header...
The exact contents of the packet, as reported by Ethereal, are:

    00 40 05 d0 53 80 00 0d 54 98 ac e1 88 8e 01 01 00 00

The breakdown of the frame is like this:

    [     DST MAC     ] [     SRC MAC     ] [TYPE ] [EAPoL GUNK ]
    [00 40 05 d0 53 80] [00 0d 54 98 ac e1] [88 8e] [01 01 00 00]

The LLC information should have been stuck right before the TYPE field, but it isn't there. Compared to a correct EAP packet grabbed by Ethereal, the packet is missing both the Prism monitoring header and the LLC gunk. The frame matches the buffer passed in from xsupplicant byte for byte, and only appears to be missing the wireless bits that would make it a good frame.

Now, the weird thing is that xsupplicant only needs to send one of these frames and the driver freaks out, spewing the same thing over and over again and filling my syslog with messages like:

    Dec 11 21:32:19 centrinix kernel: ath_hardstart: discard, invalid 0 flags 1143
    Dec 11 21:32:20 centrinix last message repeated 36134 times
    Dec 11 21:32:20 centrinix kernel: ath_hardstart: discard, invalid 0 flags 1143id 0 flags 1143
    Dec 11 21:32:20 centrinix kernel: ath_hardstart: discard, invalid 0 flags 1143
    Dec 11 21:32:20 centrinix last message repeated 6586 times
    Dec 11 21:32:20 centrinix kernel: ath_hardstart: discard, invalid 0 flags 1143id 0 flags 1143
    Dec 11 21:32:20 centrinix kernel: ath_hardstart: discard, invalid 0 flags 1143
    Dec 11 21:32:22 centrinix last message repeated 70721 times
    Dec 11 21:32:22 centrinix kernel: ath_hardstart: discard, invalid 0 flags 1id 0 flags 1143
    Dec 11 21:32:22 centrinix kernel: ath_hardstart: discard, invalid 0 flags 1143

I also got a few register dumps here and there.

Stopping the supplicant doesn't stop the messages in syslog or in Ethereal. The same malformed frames keep showing up over and over in Ethereal, along with the syslog messages. Top showed that my CPU was running at 100% utilization for the duration of the tests.
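For anyone who wants to check the breakdown above without a sniffer handy, here is a minimal Python sketch that decodes the same 18 bytes as a plain Ethernet II frame carrying EAPOL. The function names (`mac`, `decode`) are made up for illustration, and the LLC/SNAP byte pattern is an assumption based on the usual RFC 1042 encapsulation, not something taken from the driver source.

```python
import struct

# The 18 bytes Ethereal reported, copied from the message above.
RAW = bytes.fromhex("00 40 05 d0 53 80 00 0d 54 98 ac e1 88 8e 01 01 00 00")

# Assumption: on the air, EAPOL would normally follow an RFC 1042 LLC/SNAP
# header (aa aa 03 00 00 00) placed directly before the 88 8e type field.
LLC_SNAP = bytes.fromhex("aa aa 03 00 00 00")

# EAPOL packet types from IEEE 802.1X.
EAPOL_TYPES = {0: "EAP-Packet", 1: "EAPOL-Start", 2: "EAPOL-Logoff", 3: "EAPOL-Key"}


def mac(raw):
    """Format six raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def decode(frame):
    """Decode the buffer as a plain Ethernet II frame carrying EAPOL."""
    # 6-byte destination, 6-byte source, 2-byte type, all in network order.
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    # EAPOL header: 1-byte version, 1-byte packet type, 2-byte body length.
    version, ptype, length = struct.unpack("!BBH", frame[14:18])
    return {
        "dst": mac(dst),
        "src": mac(src),
        "ethertype": hex(ethertype),
        "eapol_version": version,
        "eapol_type": EAPOL_TYPES.get(ptype, "unknown"),
        "eapol_body_length": length,
        "has_llc_snap": LLC_SNAP in frame,
    }


if __name__ == "__main__":
    for key, value in decode(RAW).items():
        print(f"{key}: {value}")
```

The bytes decode cleanly as an Ethernet II frame holding a version 1 EAPOL-Start with an empty body and no LLC/SNAP header, which fits the observation that the frame is just the buffer handed over by xsupplicant with none of the wireless framing added.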
Popping the wireless card made the CPU load drop and the messages stop. Reinserting the card did not cause any ill effects until xsupplicant was run once again.

This is eerily similar to a 100% CPU bug that seems to exist in the reference Atheros driver for Windows (triggerable on Centrinos). I'm testing with a Centrino right now, but I'm not sure if the bug is even related. (The Windows bug exists on every 5212 I've tested, so I really think it's an Atheros reference driver bug...)

I have put the syslog information and a sniff of the traffic here:

http://www.laptop.lib.utah.edu/~terry/linux/madwifi/

Packet #333 (at 32.28 seconds) is where the malformed packet hammering that causes the 100% CPU drain begins. Packet 32735 shows up less than 1 second later, at 33.04 seconds. The sniff is fairly large (over 1MB when gzipped), so I didn't want to attach it to this message.

Information that may or may not be useful:

Laptop Model: HP/Compaq Centrino nx7000
Card type: 5212
gcc 3.3.2

Linux centrinix 2.4.23 #3 Fri Nov 28 17:17:57 MST 2003 i686 unknown

iwconfig  Version 26
          Compatible with Wireless Extension v16 or earlier,
          Currently compiled with Wireless Extension v16.

Kernel    Currently compiled with Wireless Extension v16.

ath0      Recommend Wireless Extension v13 or later,
          Currently compiled with Wireless Extension v16.

ath0      IEEE 802.11  ESSID:"WardriveMe"
          Mode:Monitor  Frequency:2.437GHz  Access Point: 00:40:05:D0:53:80
          Bit Rate:0kb/s  Tx-Power:off  Sensitivity=0/3
          Retry:off  RTS thr:off  Fragment thr:off
          Encryption key:0000-0000-0000-0000-0000-0000-00  Security mode:open
          Power Management:off
          Link Quality:0/94  Signal level:-95 dBm  Noise level:-95 dBm
          Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
          Tx excessive retries:0  Invalid misc:0  Missed beacon:0

Any suggestions would be greatly appreciated. I tried to dig through the driver once again, but it's still way over my head and I really have no clue where to start.

Thanks,

- Terry