Re: [Madwifi-devel] Whats the point?

Hi,

On Wed, 8 Oct 2008, Scott Raynel wrote:
>
> There's a lot of knowledge about all the various parts of the driver
> out there - for example, Derek has a lot of knowledge about the inner
> workings of the IBSS code. None of us have a full understanding of all
> of the issues in madwifi or even how the whole driver works. We need
> to bring that knowledge together in order to make a solid release. For
> example, I'd hope that Derek could provide us with some pointers on
> what's wrong with IBSS and which patches need to be merged in (from
> trunk, openwrt, etc) to get it going better. The same applies for devs
> who have experience in other areas - Benoit with DFS, for example -
> what needs doing to get DFS working for the next release?

Thanks for your vote of confidence Scott - I appreciate it very much.

Let's start with the simple stuff. I am basing the following comments on
madwifi trunk.
ath_recv_mgmt() is called by the rx packet tasklet for all
     received management frames, including those belonging to other BSSs.

     call ieee80211_recv_mgmt()
        (This 1100-line function fails the readability test.)
        If in adhoc mode, and the frame is a valid 802.11 beacon or probe
         response, add the sender to our neighbour table.
           [Even if the sender belongs to a different network.]
           If the sender is not added, things like iwlist ath0 scan stop
           working.
        If the checks for a valid 802.11 management frame failed, return.
        If the syncbeacon flag is set (the channel recently changed, or
         a previous frame had a tsf > our_tsf), update the beacon timers
         by calling ath_beacon_config().
        If it is our bssid, check the ATIM registers (which can indicate if
           ramping is happening) and correct for possible ramping.
        If we have a preset BSSID, escape out now. This means that
           *there will be no beacon synchronisation,
           *there will be no event for upper layers to find out about new
              nodes in the network.
           *many of the nasty beacon setting races will be avoided.

         Now take all received probe responses into consideration as well.
          If the TSF of the frame under consideration is > our_tsf, call
           ieee80211_ibss_merge().
           Now, do you see the issue - if the channel has just been changed
             (by the scanning tasklet, say), the next beacon we receive will
              cause us to call ath_beacon_config()
                - without testing the ESSID in the beacon.
            Further, the neighbour table contains an entry for every node
            in the neighbourhood that has sent a beacon/probe response to us.
            Thus, there will be entries in the neighbour table that don't have
            our ESSID. Since the neighbour table is of finite length, we have
            an excellent DoS attack: simply send random beacons, which fill
            the neighbour table of the victim.
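
To make the table-filling problem concrete, here is a minimal userspace sketch of a fixed-size neighbour table that checks the ESSID before storing a sender. It is not madwifi code: `struct neigh_table`, `neigh_add()` and `NEIGH_MAX` are hypothetical names invented for illustration, and ESSIDs are modelled as C strings for simplicity. As noted above, rejecting foreign senders this early would also stop scans from seeing other networks, so treat it as an illustration of the trade-off rather than a fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NEIGH_MAX 4   /* hypothetical size, tiny so exhaustion is easy to see */
#define ESSID_MAX 32  /* an 802.11 SSID is at most 32 bytes */

struct neighbour {
    unsigned char mac[6];
    char essid[ESSID_MAX + 1];
};

struct neigh_table {
    struct neighbour entry[NEIGH_MAX];
    size_t used;
    const char *our_essid;
};

/* Returns true if the sender is in the table after the call,
 * false if it was rejected (foreign ESSID) or the table is full. */
static bool neigh_add(struct neigh_table *t, const unsigned char mac[6],
                      const char *essid)
{
    size_t i, len;

    assert(t != NULL && t->our_essid != NULL && essid != NULL);

    /* Assumed filter: senders from other networks never take a slot. */
    if (strcmp(essid, t->our_essid) != 0)
        return false;

    /* Already known: nothing to do. */
    for (i = 0; i < t->used; i++)
        if (memcmp(t->entry[i].mac, mac, 6) == 0)
            return true;

    if (t->used == NEIGH_MAX)
        return false;

    len = strlen(essid);
    if (len > ESSID_MAX)
        len = ESSID_MAX;
    memcpy(t->entry[t->used].mac, mac, 6);
    memcpy(t->entry[t->used].essid, essid, len);
    t->entry[t->used].essid[len] = '\0';
    t->used++;
    return true;
}
```

With the filter in place, any number of beacons carrying random ESSIDs leave `used` at zero, so slots remain available for real members of the network.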

In ieee80211_ibss_merge()
          test for matching essid of the source node, and that capabilities
           match.
          call ieee80211_sta_join1()

ieee80211_sta_join1
         sets some variables, and calls ic->ic_set_channel (ath_set_channel).
         Mostly, this does not require a channel change, so not very much of
         interest happens, except that it does set the flag for syncbeacon.

     Oops - do you see the issue? We have received a frame that has our essid
      and an older TSF - so we set the syncbeacon flag. If the next beacon frame
      we receive is from a different network, then we will resync our beacon
      timers (ath_beacon_config()) to that frame from that network.
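
The syncbeacon hazard can be shown with a small userspace model. This is only a sketch under assumptions: `struct sc_model`, `rx_beacon_unchecked()` and `rx_beacon_checked()` are hypothetical names, `beacon_resync()` stands in for the effect of ath_beacon_config(), and the ESSID comparison is one possible guard, not something the driver is known to do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

struct beacon {
    const char *essid;
    uint64_t tsf;
};

struct sc_model {
    const char *our_essid;
    bool syncbeacon;      /* set after a channel change or a join */
    uint64_t synced_tsf;  /* TSF we last aligned our timers to */
    int resyncs;          /* how often the stand-in resync ran */
};

/* Stand-in for the timer update done via ath_beacon_config(). */
static void beacon_resync(struct sc_model *sc, const struct beacon *b)
{
    sc->synced_tsf = b->tsf;
    sc->resyncs++;
    sc->syncbeacon = false;
}

/* Behaviour as described in the mail: whichever beacon arrives next wins. */
static bool rx_beacon_unchecked(struct sc_model *sc, const struct beacon *b)
{
    if (!sc->syncbeacon)
        return false;
    beacon_resync(sc, b);
    return true;
}

/* Assumed guard: only a beacon carrying our ESSID may consume the flag. */
static bool rx_beacon_checked(struct sc_model *sc, const struct beacon *b)
{
    assert(sc->our_essid != NULL && b->essid != NULL);
    if (!sc->syncbeacon)
        return false;
    if (strcmp(b->essid, sc->our_essid) != 0)
        return false;     /* flag stays pending for the next beacon */
    beacon_resync(sc, b);
    return true;
}
```

In the unchecked variant a foreign beacon that happens to arrive first sets `synced_tsf`; in the checked variant the flag stays set until a beacon from our own network turns up.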

     Lastly, we schedule the tasklet iv_stajoin1tq, which runs
ieee80211_sta_join1_tasklet() and drives us to a new state of RUN. We
were in RUN before (or rather, we should have been in RUN before...)

This stajoin tasklet does the following:
call ieee80211_new_state()
          spin_lock_bh           /* Locking! - but only on states*/
          call vap->iv_newstate(), which resolves to ath_newstate
          spin_unlock_bh

Now, you see the locking here - but the received packet tasklet does not
use the same spinlock. What is to stop the inevitable races when multiple
consecutive frames come in - one which launches the stajoin tasklet and
the other handled by the rx packet tasklet?

ath_newstate() does a number of different things, among them:
       may delete the periodic calibration timer
       netif_stop_queue
       copy vap->bssid to sc->sc_curbssid,
       ath_calc_rxfilter
       ath_hal_setrxfilter
       notifies the rate algorithm of the new state. Even if we do a
        RUN->RUN transition, the rate algorithm is notified. The sample rate
        algorithm does the following in this case - resets all internal tables
        for all remote nodes to zero (discards all current information for all
        nodes). Do you see the problem? One node has joined our network with
        an older TSF. We promptly throw away all the information on the
        appropriate rate to use with the other nodes in our neighbour table.
       ath_hal_stoptxdma
       ath_beacon_alloc (set a flag for next time swba happens)
       call avp->av_newstate (ieee80211_newstate(), not ieee80211_new_state())
          call __ieee80211_newstate
          call ieee80211_reset_bss (no ath/hal related things)
       start periodic recalibration timer
       netif_wake_queue()
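
The RUN->RUN reset described in the list above can be sketched as follows. This is a userspace model, not the sample module's code: `enum vap_state`, `struct rate_stats` and `rate_newstate()` are hypothetical names, and skipping the reset on RUN->RUN is an assumption about what a guard could look like.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum vap_state { ST_INIT, ST_SCAN, ST_RUN };   /* hypothetical, simplified */

struct rate_stats {
    unsigned int packets;         /* frames sent to this node so far */
    unsigned int best_rate_kbps;  /* rate currently believed to be best */
};

/* Returns 1 if the per-node tables were cleared, 0 if they were kept. */
static int rate_newstate(struct rate_stats *nodes, size_t n,
                         enum vap_state ostate, enum vap_state nstate)
{
    assert(nodes != NULL || n == 0);

    /* Assumed guard: a RUN->RUN transition (e.g. an IBSS merge) keeps
     * what we have already learned about every neighbour. */
    if (ostate == ST_RUN && nstate == ST_RUN)
        return 0;

    if (n > 0)
        memset(nodes, 0, n * sizeof(*nodes));
    return 1;
}
```

Any other transition still clears the tables, so a fresh join starts from a clean slate.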

You follow all those processes?

Here is my problem:
     I want to
      *move the contents of the stajoin tasklet into the rx packet tasklet,
        so there is no overlap in functionality.
        This will break lots of things, among them notification of a
        new node in the network.
      *put a flag at the start of the rx packet tasklet, to ensure we
        only process one management frame at a time.
      *at the beginning of the rx packet tasklet, dump all frames which are
       not our essid - this will break the scanning of the neighbourhood.
      *remove the reset of the rate algorithm's data tables when we have
       transitioned from RUN to RUN. This will show whether some rate
       algorithms descend to the lowest possible rate and stay there. On some
       of my cut-down builds that were used here, I found that sample descends
       to 1 Mbit/s and stays there, despite the radio conditions being good.
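
The "one management frame at a time" flag from the list above could look roughly like this in userspace C11. It is a sketch only: `mgmt_busy`, `rx_mgmt_frame()` and the counters are hypothetical names, a frame that arrives while the flag is held is simply dropped (an assumption; it could equally be deferred), and in the kernel one would more likely reach for the kernel's own atomic bit operations or a spinlock than for `<stdatomic.h>`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct frame { int id; };

static atomic_flag mgmt_busy = ATOMIC_FLAG_INIT;
static int handled, dropped, last_handled;

/* Process one management frame unless another one is already in flight.
 * 'work' is an optional callback standing in for the real processing. */
static bool rx_mgmt_frame(const struct frame *f, void (*work)(void))
{
    assert(f != NULL);

    /* test-and-set returns the previous value: true means someone else
     * already holds the flag, so this frame is not processed. */
    if (atomic_flag_test_and_set(&mgmt_busy)) {
        dropped++;
        return false;
    }

    if (work != NULL)
        work();
    handled++;
    last_handled = f->id;

    atomic_flag_clear(&mgmt_busy);
    return true;
}

/* Simulates a second frame arriving while the first is still in flight. */
static const struct frame nested = { 2 };
static void nested_frame(void)
{
    rx_mgmt_frame(&nested, NULL);
}
```

Calling `rx_mgmt_frame()` with `nested_frame` as the callback handles the outer frame and counts the inner one as dropped, which is the overlap this flag is meant to prevent.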

But I ain't doing all this surgery. There is not much point...

Derek.
     -- 
Derek Smithies Ph.D.
IndraNet Technologies Ltd.
Email: de...@in...
ph +64 3 365 6485
Web: http://www.indranet-technologies.com/