Thread: [Madwifi-devel] bmiss problem
From: Gerald B. <gbr...@do...> - 2003-11-16 01:45:44
I did a little digging in an attempt to solve the bmiss problem. I ripped out the bmiss interrupt handling and put in my own tracking of received packets to drive a bmiss timer in software. My hope was to let any received packet kick the timer, not only beacons (the assumption being that we were missing only beacons).

It turns out we stop receiving packets entirely if the data rate goes too high. Doing a scan somehow kicks things so that we start receiving data again. I set my soft beacon timeout to several seconds, and sure enough: we stop receiving data entirely, several seconds later my beacon miss timer fires and starts a scan, and bam, we start receiving data again. Clearly something else is going on, probably a bug in the receive path.

I've also found that my card very much dislikes the processor throttling or power saving. Perhaps the HAL is doing something timing-related, and the processor dynamically changing speed is making it rather pissed off (in 2.6, if I load the ACPI processor module, enabling the processor to throttle into C-states, the card stops working entirely). Perhaps this is related to the receive problems I'm seeing, perhaps not.

-- Gerald
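A minimal sketch of the software beacon-miss timer described above, assuming the caller supplies a monotonic clock in milliseconds. The names (`struct soft_bmiss`, `soft_bmiss_kick`, and so on) are hypothetical and not madwifi code; the point is only that every received frame, beacon or data, pushes the deadline out, so the timer fires only after a full timeout with no receive activity at all.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software beacon-miss watchdog (not madwifi code).
 * Times are milliseconds from a monotonic clock supplied by the caller. */
struct soft_bmiss {
    uint64_t timeout_ms;   /* e.g. several seconds, as in the experiment */
    uint64_t deadline_ms;  /* time at which a beacon miss is declared */
};

static void soft_bmiss_init(struct soft_bmiss *w, uint64_t now_ms,
                            uint64_t timeout_ms)
{
    w->timeout_ms = timeout_ms;
    w->deadline_ms = now_ms + timeout_ms;
}

/* Called for every received frame, beacon or data: push the deadline out. */
static void soft_bmiss_kick(struct soft_bmiss *w, uint64_t now_ms)
{
    w->deadline_ms = now_ms + w->timeout_ms;
}

/* Polled periodically; true means nothing was received for timeout_ms,
 * which is where the driver would start a scan. */
static bool soft_bmiss_expired(const struct soft_bmiss *w, uint64_t now_ms)
{
    return now_ms >= w->deadline_ms;
}
```

With a 5000 ms timeout armed at t=0 and a frame at t=4000, the watchdog stays quiet until t=9000; in the failure Gerald describes, receive simply stops, so the deadline is never pushed and the scan follows one timeout later.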
From: Gerald B. <gbr...@do...> - 2003-11-16 02:24:46
On Sat, Nov 15, 2003 at 08:45:06PM -0500, Gerald Britton wrote:
> I've also found my card very much dislikes the processor doing throttling
> or power saving. Perhaps there is something the hal is doing related to
> timing where the processor dynamically changing speeds is making it rather
> pissed off (in 2.6, if i load the ACPI processor module, enabling the
> processor to throttle into C-states, the card stops working entirely).
> Perhaps this is related to the receiving problems I'm seeing, perhaps not.

A little more info here. The "jerky" behavior is best seen when ssh'ing to a remote host and then holding down the space bar so it repeats. The behavior changes between power modes: sometimes it's really smooth, other times it appears to freeze for 300-500 ms.

I've often found the card to be in an unhappy state after a suspend/resume cycle. Doing this and then reloading the module seems to help:

# echo '0%100%100%performance' > /proc/cpufreq
# echo '0%0%0%powersave' > /proc/cpufreq

I have the speedstep-centrino module loaded here to do speed shifting.

Often, right after a suspend, the card will simply stop receiving any data, or will work for a while and then stop receiving data (so the bmiss problem occurs). I'm really confused as to how some people have been reporting good success with the madwifi driver and reasonable (though not amazing) data rates, given that I have trouble even transferring small files over scp: the card just reverts to searching until it eventually stops receiving data altogether.

-- Gerald
From: Tom M. <to...@ho...> - 2003-11-16 17:16:16
> # echo '0%100%100%performance' > /proc/cpufreq
> # echo '0%0%0%powersave' > /proc/cpufreq
>
> I have the speedstep-centrino module loaded here to do speed shifting.
>
> Often right after a suspend, the card will simply stop receiving any data,
> or will work for a while then stop receiving data (so the bmiss problem
> occurs). I'm really confused as to how some people have been reporting
> good success with the madwifi driver and getting reasonable data rates
> (though not amazing rates) given I have trouble even transfering small
> files over scp, the card just reverts to searching until it eventually
> stops receiving data alltogether.

The madwifi driver uses udelay() (from the kernel). See ath_hal_delay() in hal/linux/ah_osdep.c. Is it possible that udelay() is not behaving properly when the CPU frequency changes?

--
"I went to the museum where they had all the heads and arms from the statues that are in all the other museums." -- Steven Wright
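To make Tom's question concrete, here is a deliberately simplified model, not kernel code: a calibrated busy-wait chooses its iteration count from a boot-time calibration and implicitly assumes the loop still runs at the calibrated speed. The function name `actual_delay_us` and the assumption that loop speed scales linearly with CPU clock are hypothetical, and the frequencies used are illustrative.

```c
/* Simplified model (assumption): the delay loop was calibrated at
 * calib_mhz, and loop iterations per microsecond scale linearly with
 * the CPU clock. Returns how long a requested delay really takes. */
static double actual_delay_us(double requested_us, double calib_mhz,
                              double current_mhz)
{
    /* iteration count chosen from the boot-time calibration */
    double iterations = requested_us * calib_mhz;
    /* time those iterations take at the clock the CPU is running now */
    return iterations / current_mhz;
}
```

Under these assumptions, a loop calibrated at 600 MHz and run at 1600 MHz waits only 37.5 us when 100 us was requested. That is the dangerous direction for hardware that needs a minimum settle time; the opposite mismatch (calibrated fast, running slow) only wastes time.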
From: Gerald B. <gbr...@do...> - 2003-11-16 20:38:56
On Sun, Nov 16, 2003 at 09:15:35AM -0800, Tom Marshall wrote:
> The madwifi driver uses udelay() (from the kernel). See ath_hal_delay()
> in hal/linux/ah_osdep.c. Is it possible that udelay() is not behaving
> properly when the CPU frequency changes?

That's certainly possible, and it's roughly what I figured might be occurring. Cpufreq should make udelay work properly (it updates the loop calibration in the kernel). However, I've found that laptops tend to make some speed changes independently of cpufreq (especially with ACPI), which is unfortunate if hard timing requirements are being met with udelay.

-- Gerald
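Gerald's point about cpufreq can be illustrated with a sketch of the rescaling step: when a frequency transition is announced, the delay calibration is scaled by the ratio of the new frequency to the frequency at which it was measured. This is in the spirit of how the kernel adjusts its loop calibration on cpufreq transitions, but `struct delay_calib` and `on_freq_change` are illustrative names, not kernel identifiers.

```c
#include <stdint.h>

/* Illustrative calibration state (names are hypothetical). */
struct delay_calib {
    uint64_t ref_loops;  /* loops per tick measured at ref_khz */
    uint32_t ref_khz;    /* CPU frequency at calibration time */
    uint64_t loops;      /* value the delay loop currently uses */
};

/* Hook assumed to run on every *announced* frequency transition:
 * rescale from the reference measurement, not from the previous value,
 * so repeated transitions do not accumulate rounding error. */
static void on_freq_change(struct delay_calib *c, uint32_t new_khz)
{
    c->loops = c->ref_loops * new_khz / c->ref_khz;
}
```

The weakness Gerald describes follows directly: a speed change made by firmware or ACPI without going through cpufreq never calls the hook, so `loops` keeps describing the old frequency and every udelay() is off by the ratio of the two clocks.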
From: Mathieu L. <Mat...@so...> - 2003-11-20 14:36:19
Hi Gerald,

I recently started looking into my "revert to SCAN mode randomly" problem, which looked like it might be related to your bmiss issue. However, it looks like:

1) I never get a bmiss interrupt.
2) I get data from the rx_tasklet and I forward it to the networking stack.
3) My dhclient does not seem to be receiving any DHCPACK packets.
4) Ethereal seems to be receiving all DHCPREQUEST and DHCPACK packets.
5) The algorithm my dhclient follows when deciding whether to shut down the interface because its lease has expired seems pretty weird: it tries really hard to survive in case the DHCP server is down, so it takes a long time and a lot of randomness before it shuts down the network interface.
6) All transmission ceases when dhclient shuts down the network interface (as expected).
7) Sometimes the network interface seems to come back to life more or less at random while it is down.

With all this data, I am a bit lost... Does anyone who knows more about networking than I do know how to interpret it?

Mathieu

-- Mathieu Lacage <mat...@so...>
From: Gerald B. <gbr...@do...> - 2003-11-20 15:37:40
On Thu, Nov 20, 2003 at 03:35:27PM +0100, Mathieu Lacage wrote:
> 1) I never get a bmiss interrupt.

In times of high-rate data reception I end up seeing bmiss interrupts.

> 2) I get data from the rx_tasklet and I forward it to the networking
> stack.

In times of high-rate data reception, I stop seeing the rx_tasklet being called, and a few moments later I see a bmiss interrupt.

> 3) my dhclient does not seem to be receiving any DHCPACK packet

If it's scanning and not associated, this makes sense; it also makes sense if it has gone to the bad place as mine does (rx stops happening).

> 4) ethereal seems to be receiving all DHCPREQUEST and DHCPACK packets

Ethereal on the same machine, or Ethereal on an independent machine?

> 5) the algorithm my dhclient follows when deciding whether or not to
> shut down the interface because its lease has expired seems to be pretty
> weird: it seems to be trying really hard to survive in case the DHCP
> server is down so it takes a lot of time and a lot of randomness to shut
> down the network interface.
> 6) all transmission ceases when the dhclient shuts down the network
> interface. (as expected)
> 7) sometimes, it looks like the network interface comes back to life
> more or less randomly when it is down.

I've determined that I really hate dhclient. Are you using Red Hat 9 or Fedora Core 1? I seem to end up with a pile of dhclient processes, which all have a habit of conflicting and bouncing the interface a lot. At one point I wrote my own network-scripts, which are a lot nicer for debugging purposes. Hopefully I'll revive those soon.

You say that you're looking into your "revert to SCAN mode" problem. Where in here are you seeing it go into scan mode?

-- Gerald
From: Mathieu L. <Mat...@so...> - 2003-11-21 07:58:47
On Thu, 2003-11-20 at 16:37, Gerald Britton wrote:
> On Thu, Nov 20, 2003 at 03:35:27PM +0100, Mathieu Lacage wrote:
> > 1) I never get a bmiss interrupt.
>
> In times of high rate data reception I endup seeing bmiss interrupts.
>
> > 2) I get data from the rx_tasklet and I forward it to the networking
> > stack.
>
> In times of high rate data reception, I stop seeing the rx_tasklet being
> called, a few moments later I see a bmiss interrupt.
>
> > 3) my dhclient does not seem to be receiving any DHCPACK packet
>
> If it's scanning and not associated this makes sense, also if it's gone to
> the bad place as mine does (rx stops happening) this makes sense.

It is in RUN state. rx_ does happen.

> > 4) ethereal seems to be receiving all DHCPREQUEST and DHCPACK packets
>
> ethereal on the same machine? or ethereal on an independant machine?

Same machine.

> > 5) the algorithm my dhclient follows when deciding whether or not to
> > shut down the interface because its lease has expired seems to be pretty
> > weird: it seems to be trying really hard to survive in case the DHCP
> > server is down so it takes a lot of time and a lot of randomness to shut
> > down the network interface.
> > 6) all transmission ceases when the dhclient shuts down the network
> > interface. (as expected)
> > 7) sometimes, it looks like the network interface comes back to life
> > more or less randomly when it is down.
>
> I've determined that I really hate dhclient. Are you using Red Hat 9 or

I have determined the same :)

> Fedora Core 1? I seem to endup with a pile of dhclient processes which all
> have a habit of conflicting and bouncing the interface a lot. At one point I
> did up my own network-scripts which are a lot nicer for debugging purposes.
> Hopefully I'll revive those soon.
>
> You say that you're looking for your "revert to SCAN mode problem" Where
> in here are you seeing it go into scan mode?

When the network interface is downed by dhclient.

Mathieu

-- Mathieu Lacage <mat...@so...>
From: Mathieu L. <Mat...@so...> - 2003-11-21 13:34:46
Hi Gerald,

I finally decided to get rid of my dhclient because I just cannot figure out what it does or why it does it. I can now fairly reliably reproduce the bmiss interrupt problem, i.e. the bmiss interrupt is triggered.

I have connected a 3 s timer to the interrupt. When the interrupt is triggered, I start counting the number of packets I receive from the hardware (specifically, the number of times I get into tx_tasklet and one of the rx descriptors is ready) until the timer expires. I get a non-zero number (more than 50 and fewer than 200 packets in 3 seconds), which would indicate that we keep receiving data from the network interface, despite what you said in your previous email.

Hope this helps.

Regards,
Mathieu

-- Mathieu Lacage <mat...@so...>
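A minimal sketch of the measurement Mathieu describes: arm a fixed window when the bmiss interrupt fires, count completed rx descriptors until the window closes, then report the total. All names are hypothetical, and time is passed in explicitly (milliseconds) so the logic can be exercised in userspace; in a driver the three hooks would sit in the interrupt handler, the rx processing path, and a timer callback respectively.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical instrumentation: after a beacon-miss interrupt, count
 * received frames for a fixed window (3 s in the experiment above). */
struct bmiss_probe {
    bool     armed;
    uint64_t window_end_ms;
    uint32_t rx_count;
};

/* Call from the bmiss interrupt: open a new counting window. */
static void probe_on_bmiss(struct bmiss_probe *p, uint64_t now_ms,
                           uint64_t window_ms)
{
    p->armed = true;
    p->window_end_ms = now_ms + window_ms;
    p->rx_count = 0;
}

/* Call once per completed rx descriptor; only counts inside the window. */
static void probe_on_rx(struct bmiss_probe *p, uint64_t now_ms)
{
    if (p->armed && now_ms < p->window_end_ms)
        p->rx_count++;
}

/* Returns true (and disarms) once the window has closed; *count then
 * holds the number of frames seen after the beacon miss. */
static bool probe_poll(struct bmiss_probe *p, uint64_t now_ms,
                       uint32_t *count)
{
    if (!p->armed || now_ms < p->window_end_ms)
        return false;
    p->armed = false;
    *count = p->rx_count;
    return true;
}
```

A non-zero count at the end of the window is the signal Mathieu reports: frames are still coming off the rx ring while the card's beacon-miss logic has already fired.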
From: Gerald B. <gbr...@do...> - 2003-11-21 15:10:31
On Fri, Nov 21, 2003 at 02:34:37PM +0100, Mathieu Lacage wrote:
> I can now relatively reliably reproduce the bmiss interrupt pb. ie: the
> bmiss interrupt is triggered. I have connected a 3s timer to the
> interrupt. When the interrupt is triggered, I start counting the number
> of packets I receive from the hardware (specifically, the number of
> times I get in tx_tasklet and one of the rx descriptors is ready) until
> the timer expires. I get a non-zero number (more than 50 and less than
> 200 packets in 3 seconds) which would indicate that we keep on receiving
> data from the network interface despite what you said in your previous
> email.

In my testing, I actually turned off the bmiss interrupt and created a timer with a 5 second timeout. Every time we run through the rx_tasklet I kick the timer, so that it now expires 5 seconds after that rx packet arrived. Beacons are received in the rx_tasklet too. I'm currently debugging by the verbose-printk method, so I've got it printing the current jiffies when it kicks the tasklet, along with the expiry time.

I ssh to my server through the AP (all seems good) and fire up something that prints a lot of output back over the link; it seizes up (rx_tasklet is not getting called) and doesn't complete, and 5 seconds later the timer fires.

I haven't gone digging into the hard interrupt yet, but that was my next plan (to see whether it was somehow hitting a race condition when scheduling the rx_tasklet). Even if it was, since my software timer seems to replicate the beacon miss interrupt I'm getting, I'm not sure it would be a driver issue: presumably the beacon miss counters in the card ought to be independent of the driver. (Yeah, they need setting up, but shouldn't they be independent of what the driver actually receives, since the driver can elect not to receive beacons?)

-- Gerald
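Since Gerald's software timer is expressed in jiffies, one detail worth getting right is that jiffies is an unsigned counter that wraps. The kernel's time_after()/time_after_eq() macros compare ticks through a signed difference rather than a plain `>=`. The sketch below re-creates that idea in userspace with hypothetical names (`tick_after_eq`, `struct rx_watchdog`) and shows a watchdog re-armed from the rx path; it is an illustration, not the code Gerald used.

```c
#include <stdbool.h>

/* Userspace re-creation of the idea behind the kernel's time_after_eq():
 * a is at or after b if the signed difference a - b is non-negative.
 * Relies on the usual two's-complement conversion, as the kernel does. */
static bool tick_after_eq(unsigned long a, unsigned long b)
{
    return (long)(a - b) >= 0;
}

/* Hypothetical rx-driven watchdog, in ticks. */
struct rx_watchdog {
    unsigned long expires;  /* tick at which the watchdog fires */
    unsigned long timeout;  /* e.g. 5 seconds' worth of ticks */
};

/* Re-arm from the rx path: expire `timeout` ticks after this packet. */
static void rx_watchdog_kick(struct rx_watchdog *w, unsigned long now)
{
    w->expires = now + w->timeout;  /* may wrap past zero; that is fine */
}

/* Wrap-safe check: fired once now has reached expires. */
static bool rx_watchdog_fired(const struct rx_watchdog *w, unsigned long now)
{
    return tick_after_eq(now, w->expires);
}
```

If the watchdog is re-armed 100 ticks before the counter wraps with a 500-tick timeout, `expires` becomes 400. A naive `now >= expires` test would then report the timer as fired immediately, while the signed-difference comparison correctly waits until the counter reaches 400.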