Hi Vitalii.
I am having problems again testing mstpd.
Now I am working on a simple RSTP configuration of two switches connected like this:
Root Mac xx:xx:xx is the same for both switches.
Switch 1 8.000.xx:xx:xx:04:01:02
Switch 2 8.000.xx:xx:xx:23:12:02
|-------------| |--------------|
| Switch1 |1++++++++++++++++++++1| Switch2 |
| root |2****2| |
|-------------| |--------------|
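For reference, when both bridges do see each other's BPDUs, the one with the numerically lowest bridge ID (priority first, then MAC address) becomes root. A minimal sketch of that comparison follows; the OUI `00:11:22` is a hypothetical stand-in for the masked `xx:xx:xx` prefix, and the priority `0x8000` is assumed from the `8.000` shown in the bridge IDs.

```python
from typing import Tuple


def bridge_id(priority: int, mac: str) -> Tuple[int, bytes]:
    """Return a sortable (priority, MAC bytes) pair for a bridge."""
    return priority, bytes.fromhex(mac.replace(":", ""))


# Hypothetical OUI 00:11:22 replaces the masked "xx:xx:xx" prefix.
switch1 = bridge_id(0x8000, "00:11:22:04:01:02")
switch2 = bridge_id(0x8000, "00:11:22:23:12:02")

# The lowest bridge ID wins the root election.
root = min(("switch1", switch1), ("switch2", switch2), key=lambda item: item[1])
print(root[0])
```

With equal priorities the MAC decides, which is consistent with Switch1 being marked as root in the diagram.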
Sometimes the system ends up in an incorrect state in which both switches report themselves as designated root and regional root and do not seem to see each other.
When does this situation occur?
When I kill mstpd (or it dies for some reason) on switch 1 and start it again.
Switch 1:
br0 CIST info
enabled yes
bridge id 8.000.xx:xx:xx:04:01:02
designated root 8.000.xx:xx:xx:04:01:02
regional root 8.000.xx:xx:xx:04:01:02
root port none
path cost 0 internal path cost 0
max age 20 bridge max age 20
forward delay 15 bridge forward delay 15
tx hold count 6 max hops 20
force protocol version rstp
time since topology change 899
topology change count 1
topology change no
br0 MST Configuration Identifier:
Format Selector: 0
Configuration Name: xxxxxx040102
Revision Level: 0
Configuration Digest: AC36177F50283CD4B83821D8AB26DE62
Switch 2:
mstpctl showbridge br0
br0 CIST info
enabled yes
bridge id 8.000.xx:xx:xx:23:12:02
designated root 8.000.xx:xx:xx:23:12:02
regional root 8.000.xx:xx:xx:23:12:02
root port none
path cost 0 internal path cost 0
max age 20 bridge max age 20
forward delay 15 bridge forward delay 15
tx hold count 6 max hops 20
force protocol version rstp
time since topology change 28330
topology change count 5
topology change no
br0 MST Configuration Identifier:
Format Selector: 0
Configuration Name: 000958231202
Revision Level: 0
Configuration Digest: AC36177F50283CD4B83821D8AB26DE62
What I have been seeing is that the problem is at switch 1: although all ports are initialized to forwarding state when mstpd starts (my driver_deps does that), ports 1 and 2 fall to blocking state.
The way I can get things working is to force ports 1 and 2 of switch 1 to forwarding state; at that point the mstpd state machine starts working and the system goes to the correct state.
Last edit: Francis 2012-09-20
Hello, Francis!
First of all, please make sure that you are using the most recent revision from the SVN - which is now revision #35. There was a crucial bugfix in that revision.
Second: it is completely unrelated to your problem, but worth noting that if you want switches to be in the same MSTP region, not only digests should be equal, but all four fields in the MST Configuration Identifier (Format Selector, Configuration Name, Revision Level, Configuration Digest). Again, this is only a side note and is totally unrelated to the issue because RSTP does not use MST Configuration Identifier at all and does not know anything about regions.
Now, to the actual problem at hand. It sounds like switch 1 does not receive (or does not send, or both) STP BPDUs when ports are in blocking state. Is my understanding correct? Can you see received BPDUs in the logs when ports are in blocking state?
If my understanding is correct, then that is a problem. Your hardware and/or driver should not prevent ports from receiving BPDUs when in blocking state. In blocking state a port should block reception of almost all frames save for a few special cases, and the BPDU is one of those special cases.
If you can still see in the log that mstpd receives BPDUs while ports are in blocking state, and in spite of that they still don't transition into learning state, then we have a different problem.
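The side note about MSTP regions can be illustrated with a small sketch: two bridges are in the same region only if all four fields of the MST Configuration Identifier match. The class and function names below are made up for illustration; the field values are the ones shown in the `mstpctl` output above.

```python
from typing import NamedTuple


class MstConfigId(NamedTuple):
    """The four fields of the MST Configuration Identifier."""
    format_selector: int
    name: str
    revision: int
    digest: str


def same_region(a: MstConfigId, b: MstConfigId) -> bool:
    # All four fields must be equal, not only the digest.
    return a == b


switch1 = MstConfigId(0, "xxxxxx040102", 0, "AC36177F50283CD4B83821D8AB26DE62")
switch2 = MstConfigId(0, "000958231202", 0, "AC36177F50283CD4B83821D8AB26DE62")

print(same_region(switch1, switch2))  # False: the configuration names differ
```

As noted in the post, this does not matter while the bridges run with force protocol version rstp.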
What I can see is that ports 1 and 2 of switch 1, once they fall into 1 state, do not seem to send BPDUs over the physical port while in that state.
At switch 1, a tcpdump shows that ports 1 and 2 send and receive BPDUs,
but switch 2 doesn't receive them.
However, I have tested that the STP proposal frame is going out through the switch 1 ports, but for some reason that I don't know it is not present in the switch 2 tcpdump capture.
An extract of tcpdump at switch 1:
received:
12:20:51.023792 STP 802.1w, Rapid STP, Flags [Proposal, Learn, Forward, Agreement], bridge-id 8000.00:xx:xx:23:12:02.8001, length 43
sent:
12:20:51.075442 STP 802.1w, Rapid STP, Flags [Proposal, Agreement], bridge-id 8000.xx:xx:xx:04:01:02.8001, length 36
As soon as I manually change the state of ports 1 and 2 of switch 1 to forwarding, the spanning tree state machine negotiates correctly.
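For readers comparing such captures by hand, the flag names printed by tcpdump come from a single flags octet in the RSTP BPDU. A minimal decoding sketch, following the bit layout in IEEE 802.1D-2004, is shown below. The raw value `0x7E` is a hypothetical example chosen to match the "Proposal, Learn, Forward, Agreement" line; the capture above does not show the actual byte or the port role.

```python
# Port role is encoded in bits 2-3 of the flags octet.
ROLE_NAMES = {0: "Unknown", 1: "Alternate/Backup", 2: "Root", 3: "Designated"}


def decode_rstp_flags(flags: int) -> dict:
    """Split the RSTP BPDU flags octet into its individual fields."""
    return {
        "topology_change": bool(flags & 0x01),
        "proposal": bool(flags & 0x02),
        "port_role": ROLE_NAMES[(flags >> 2) & 0x03],
        "learning": bool(flags & 0x10),
        "forwarding": bool(flags & 0x20),
        "agreement": bool(flags & 0x40),
        "topology_change_ack": bool(flags & 0x80),
    }


# Hypothetical flags byte, not taken from the capture.
example = decode_rstp_flags(0x7E)
print(example)
```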
Last edit: Francis 2012-09-20
That's it. I guess the root of the problem is in your driver: it somehow prevents ALL frames from being emitted when ports are in blocking state.
I'd suggest digging into your driver and trying to find out why BPDUs are not emitted to the outside world toward switch 2. They definitely SHOULD go out in any state of the port (except disabled/powered-down state, of course).
Alternatively, if BPDUs are actually being emitted out of switch 1, you should find out why they are dropped by the hardware or driver in switch 2. They definitely SHOULD pass to the kernel and/or mstpd in any state of the port (except disabled/powered-down state, of course).
I am considering this, but the certain (and misleading) thing is that just changing the port states to forwarding makes the system recover to a correct state. I am able to lead the spanning tree daemon into this state just by killing the mstpd daemon and starting it again. Also, if I restart the mstpd daemon on switch 2 when this occurs, STP starts working properly. So apparently the driver does not seem to be responsible for this malfunction.
My driver sends BPDU frames from the switch engine to a tun/tap. This mechanism is working right; the only reason for a BPDU not being delivered would be that it is malformed or something like that.
I am not sure if I have explained the situation right...
Last edit: Francis 2012-09-20
I can't see how mstpd can be responsible for the BPDU loss.
I'm afraid that now it is your responsibility to find the exact place where and why BPDUs are dropped. When you find this place, it will be clear what's happening and what to do next ;)
Ok... You are right, but I am quite confused right now by the fact that restarting mstpd on both sides makes things work properly.
Oops, missed your update )
Ok, so what do we have:
1) At some point BPDUs are transmitted by mstpd but for an unknown reason are not delivered to the final destination;
2a) You manually change the port state to forwarding - this recovers the situation;
2b) You restart mstpd (thus forcing some new state to be set on ports) - this recovers the situation too.
Is my understanding right?
If yes, it seems that in the initial situation ports are in some strange undefined state which causes them to drop BPDUs.
Could you look at the following in that state when BPDUs are being dropped:
- in what state must the ports be from the mstpd point of view, i.e. what state is reported by "mstpctl showport"?
- in what actual state are the actual physical (or tun/tap, whatever) ports?
If those states differ, we should find out why they differ.
Yes, you are understanding right.
The mstpd port states always match the hardware spanning tree state.
I will update with more information when I am completely sure.
OK. Thinking about it right now, I have a clear idea of what is happening.
About the BPDUs transmitted over the interfaces that are not seen on the other side: these ports are the ones that have fallen into blocking state, so BPDUs sent by mstpd are dropped in the output queue and do not go out.
I was thinking that frames were going out because I tested a port that was part of the bridge (but was not connected to the peer) and frames were going out... so this port was not in blocking state.
What is happening here? I start the mstpd service on both switches.
mstpd converges to a correct state.
If there are variations in port status, mstpd converges to a correct state.
After the mstpd state machines of both switches have reached a stable state:
a. I shut down the mstpd service on one side (on the root switch).
b. After that, switch 2 becomes the root switch.
c. I start the mstpd service again on switch 1. Ports are initialized to forwarding state, the state machine starts working, and it finally falls into an incorrect state in which all ports of switch 1 connected to switch 2 fall into blocking state.
How I am able to get to a correct state:
Forcing the blocking ports of switch 1 into forwarding -> the state machine starts doing its work fine.
Restarting the mstpd service on switch 2.
This is clearly what I am observing...
Last edit: Francis 2012-09-21
This is the problem. Ports in "blocking" state must not block BPDUs. "Blocking" state means that all frames are to be dropped except BPDUs (and several other special cases).
So, you need to investigate how the "blocking" state is implemented on the affected ports and why they drop BPDUs in that state when they should not.
I don't know the details of the STP protocol in depth, but I thought that a blocking port doesn't send any BPDUs, but just waits for them and checks them. But I am guessing that my switch driver is perhaps doing something wrong...
Last edit: Francis 2012-09-21
Oops, you are right, in general ports in "blocking" state are not supposed to emit BPDUs, just receive them. I should have double-checked that before drawing conclusions. My bad :(
Anyway, it all looks like mstpd on one of the switches (switch-1 or switch-2) does not receive BPDUs when it should.
Just a wild guess - maybe ports do not receive incoming BPDUs in the "blocking" state? It is very suspicious that changing the state of the ports in one way or another helps things to start moving...
I will think about the situation during the holidays.
Last edit: Vitalii Demianets 2012-09-21
I don't really know about the standard, but my chip specification says:
Blocking and Listening: All incoming frames
except BPDUs will be discarded. All outgoing frames
except BPDUs will be masked.
However, this is not the behavior of my driver API code: when I send a BPDU frame to the processor via the API, the frame is dropped if the port is not in forwarding state.
Regarding the mstpd state machine running on my chip: if I stop either daemon at any moment, it is not possible to get mstpd working again unless the ports are forced to forwarding.
I have to correct one thing I said before. When the system is in this incorrect state, "if we restart the mstpd daemon, it repairs STP" is not always true: switch 1 ports fall into a correct state, but many times switch 2 ports start changing from learning to discarding, except the root port, which remains in forwarding state. (I have updated to rev 35.)
What I see is that, because BPDUs are only forwarded when a port is in forwarding state, the stable state that the state machine reaches differs depending on when the bridges are started.
That's good ;) My chip does the same thing.
That's bad. I do not know your driver implementation, but in any case it is not the driver's responsibility to drop any BPDUs. Maybe we should focus on that and find out why your driver drops the BPDUs if the port is not in forwarding state?
Bottom line: I think that neither hardware chip nor software driver should drop any BPDU, except when port is in disabled (powered-down) state.
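The rule argued for here, and quoted from the chip specification, can be sketched as a small per-port filter. This is an illustration only, not code from mstpd or from any driver: the function name and the state strings are hypothetical, and treating every enabled non-forwarding state (including learning) like blocking is an assumption. BPDUs are recognized by the standard STP bridge group address `01:80:C2:00:00:00`.

```python
# Destination MAC used by STP/RSTP/MSTP BPDUs (bridge group address).
BPDU_DST_MAC = "01:80:c2:00:00:00"


def frame_passes(port_state: str, dst_mac: str) -> bool:
    """Decide whether a frame may cross a port in the given STP state."""
    if port_state == "disabled":
        # A disabled (powered-down) port passes nothing, not even BPDUs.
        return False
    if port_state == "forwarding":
        return True
    # Assumption: blocking, listening and learning all drop data frames
    # but must still let BPDUs through.
    return dst_mac.lower() == BPDU_DST_MAC


print(frame_passes("blocking", "01:80:C2:00:00:00"))
print(frame_passes("blocking", "00:11:22:33:44:55"))
```

A driver that returns False for the first call is exhibiting the behavior described in this thread: BPDUs are dropped whenever the port is not forwarding.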
In blocking/learning state, ports are able to receive BPDUs, and these are forwarded to mstpd.
Also, maybe you have in-kernel STP enabled alongside mstpd?
Please do the following (substitute br0 with the actual name of your bridge):
and check the output. It must be 2 (2 = kernel knows about user-space STP daemon).
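The exact command from the original post is not preserved here. One way to perform an equivalent check, assuming a Linux bridge that exposes the `stp_state` attribute under sysfs (0 = STP off, 1 = in-kernel STP, 2 = user-space STP), is sketched below. The helper names and the `sysfs_root` parameter are hypothetical and exist only to keep the example self-contained.

```python
from pathlib import Path

# Meaning of the values the kernel reports in stp_state.
STP_MODES = {0: "STP disabled", 1: "in-kernel STP", 2: "user-space STP daemon"}


def read_stp_state(bridge: str, sysfs_root: str = "/sys/class/net") -> int:
    """Read the bridge's stp_state attribute as an integer."""
    path = Path(sysfs_root) / bridge / "bridge" / "stp_state"
    return int(path.read_text().strip())


def describe_stp_state(state: int) -> str:
    """Translate a numeric stp_state into a readable label."""
    return STP_MODES.get(state, "unknown")


# Usage on the switch itself (substitute br0 with the actual bridge name):
#   state = read_stp_state("br0")
#   print(state, describe_stp_state(state))
```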
This is the first thing I checked, and STP is working in user space: the value is 2.