Thread: [Ebtables-devel] Re: 2.6.12: connection tracking broken?
Brought to you by:
bdschuym
From: Patrick M. <ka...@tr...> - 2005-06-19 13:05:48
|
Santiago Garcia Mantinan wrote: >>I have sent this right now to the bridge list, I'm copying it here so that >>more info is available about this bug. > > > I have selected patches from 2.6.12 that I thought could be related to this > issue, and I have finaly identified this patch... > > http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b31e5b1bb53b99dfd5e890aa07e943aff114ae1c > > as the patch causing the problem, I have reversed it on my kernel tree and > now the firewall is working again. > > I have not really looked at what the patch does and how it does that, I have > just identified it as the one causing the break of this connection tracking > relating to the bridges. The patch drops the conntrack reference when a packet leaves IP to avoid problems with module unload because of indefinitely queued packets. The bridge-netfilter code defers calling of some NF_IP_* hooks to the bridge layer, when the conntrack reference is already gone, so the entry is neither confirmed (enters the hashtable) nor available for use by matches or targets. Reverting the patch is not a good option, I'll look into other possiblities. Regards Patrick |
From: Herbert Xu <he...@go...> - 2005-06-20 00:06:36
|
Patrick McHardy <ka...@tr...> wrote: > > The bridge-netfilter code defers calling of some NF_IP_* hooks to the > bridge layer, when the conntrack reference is already gone, so the entry Why does it defer them at all? Shouldn't the fact that the device is bridged be transparent to the IP layer? Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <he...@go...> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt |
From: David S. M. <da...@da...> - 2005-06-20 00:19:28
|
From: Herbert Xu <he...@go...> Date: Mon, 20 Jun 2005 10:05:42 +1000 > Patrick McHardy <ka...@tr...> wrote: > > > > The bridge-netfilter code defers calling of some NF_IP_* hooks to the > > bridge layer, when the conntrack reference is already gone, so the entry > > Why does it defer them at all? Shouldn't the fact that the device is > bridged be transparent to the IP layer? The bridge netfilter layer uses netif_rx(skb) at the deepest level in order to avoid too deep stack usage. This is also why the NF_HOOK*() macros were semantically changed a little bit several months ago. |
From: Herbert Xu <he...@go...> - 2005-06-20 00:50:48
|
On Sun, Jun 19, 2005 at 05:18:13PM -0700, David S. Miller wrote: > From: Herbert Xu <he...@go...> > Date: Mon, 20 Jun 2005 10:05:42 +1000 > > > Why does it defer them at all? Shouldn't the fact that the device is > > bridged be transparent to the IP layer? > > The bridge netfilter layer uses netif_rx(skb) at the deepest level in > order to avoid too deep stack usage. Sorry, but I don't see the connection between this and deferring NF_IP_* hooks on the transmit path. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <he...@go...> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt |
From: Patrick M. <ka...@tr...> - 2005-06-20 02:45:36
|
On Mon, 20 Jun 2005, Herbert Xu wrote: > Patrick McHardy <ka...@tr...> wrote: >> >> The bridge-netfilter code defers calling of some NF_IP_* hooks to the >> bridge layer, when the conntrack reference is already gone, so the entry > > Why does it defer them at all? Shouldn't the fact that the device is > bridged be transparent to the IP layer? I couldn't figure out the reason, it seems to have something to do with setting up device pointers for iptables and ebtables. It looks like the only way to fix this problem without keeping the conntrack reference while packets are queued at the device is to avoid defering the NF_IP_* hooks. Bart, can you explain why the hooks are defered please? Regards Patrick |
From: Bart De S. <bds...@te...> - 2005-06-20 06:26:44
|
Op ma, 20-06-2005 te 04:45 +0200, schreef Patrick McHardy: > On Mon, 20 Jun 2005, Herbert Xu wrote: > > Patrick McHardy <ka...@tr...> wrote: > >> > >> The bridge-netfilter code defers calling of some NF_IP_* hooks to the > >> bridge layer, when the conntrack reference is already gone, so the entry > > > > Why does it defer them at all? Shouldn't the fact that the device is > > bridged be transparent to the IP layer? > > I couldn't figure out the reason, it seems to have something to do > with setting up device pointers for iptables and ebtables. It looks > like the only way to fix this problem without keeping the conntrack > reference while packets are queued at the device is to avoid defering > the NF_IP_* hooks. Bart, can you explain why the hooks are defered > please? This is done so that iptables knows which bridge port the output device is, using the iptables physdev match. Can't you release the conntrack reference with a function registered on the POSTROUTING hook with a prio higher than nat POSTROUTING (or something like that)? cheers, Bart |
From: Patrick M. <ka...@tr...> - 2005-06-20 12:16:04
|
Bart De Schuymer wrote: > Op ma, 20-06-2005 te 04:45 +0200, schreef Patrick McHardy: > >> Bart, can you explain why the hooks are defered please? > > This is done so that iptables knows which bridge port the output device > is, using the iptables physdev match. In which cases is this necessary? AFAICT the output device is determined in br_handle_frame_finish() for a normally bridged packet. > Can't you release the conntrack reference with a function registered on > the POSTROUTING hook with a prio higher than nat POSTROUTING (or > something like that)? We would have to hold the reference while the packet is queued at the device for the bridge case, which we want to avoid. Regards Patrick |
From: Bart De S. <bds...@pa...> - 2005-06-20 18:33:47
|
Op ma, 20-06-2005 te 14:15 +0200, schreef Patrick McHardy: > Bart De Schuymer wrote: > > Op ma, 20-06-2005 te 04:45 +0200, schreef Patrick McHardy: > > > >> Bart, can you explain why the hooks are defered please? > > > > This is done so that iptables knows which bridge port the output device > > is, using the iptables physdev match. > > In which cases is this necessary? AFAICT the output device is determined > in br_handle_frame_finish() for a normally bridged packet. When the _routing_ decision sends the packet to br0 (a bridge device), it is unknown which bridge port(s) the packet will be sent out. This is only known after the packet enters the bridge code. Therefore, for iptables to know the bridge port out device, the hooks must be postponed until in the bridge code. > > Can't you release the conntrack reference with a function registered on > > the POSTROUTING hook with a prio higher than nat POSTROUTING (or > > something like that)? > > We would have to hold the reference while the packet is queued at the > device for the bridge case, which we want to avoid. Trust me, people will complain if they can no longer use the physdev match for routed packets. People using a bridging firewall will just have to live with the fact that the reference is held until in the bridge code. cheers, Bart |
From: Phil O. <ke...@li...> - 2005-06-20 18:57:48
Attachments:
patch-fixbridge
|
The changes were originally made to fix conntrack unload problems with raw sockets. My original patch performed the nf_reset in the socket code, but Patrick suggested moving it to ip_output. The below patch reverts the ip_output changes, and implements the original suggested changes to raw socket handling. While this is unlikely to be the permanent solution, it will fix the current bridging problems while retaining the raw socket fixes. I'd suggest that this could be included in -stable while researching other solutions. Phil Signed-off-by: Phil Oester <ke...@li...> |
From: Patrick M. <ka...@tr...> - 2005-06-20 23:27:48
|
Phil Oester wrote: > The changes were originally made to fix conntrack unload problems with > raw sockets. My original patch performed the nf_reset in the socket > code, but Patrick suggested moving it to ip_output. The below patch > reverts the ip_output changes, and implements the original suggested > changes to raw socket handling. > > While this is unlikely to be the permanent solution, it will fix the > current bridging problems while retaining the raw socket fixes. I'd > suggest that this could be included in -stable while researching > other solutions. I disagree. We will probably have a final solution within a few days, so I don't think we need to submit "temporary" fixes yet. Your patch actually causes a regression for people not using bringe-netfilter, unplugging a network cable can cause the conntrack module to become unloadable again. Regards Patrick |
From: Patrick M. <ka...@tr...> - 2005-06-20 23:23:04
|
Bart De Schuymer wrote: > When the _routing_ decision sends the packet to br0 (a bridge device), > it is unknown which bridge port(s) the packet will be sent out. This is > only known after the packet enters the bridge code. Therefore, for > iptables to know the bridge port out device, the hooks must be postponed > until in the bridge code. This seems to be a really bad idea, for a single match that violates layering we need to deal with this inconsistency. It's not just the conntrack reference, with IPsec the packet passed to the defered hooks is totally different from when it was still inside IP, which for example breaks the policy match. > Trust me, people will complain if they can no longer use the physdev > match for routed packets. I'm sure they will, now that they got used to it. Why can't people match on the bridge port inside ebtables and only match on the device within iptables? Is there a case that can't be handled this way? Regards Patrick |
From: Bart De S. <bds...@pa...> - 2005-06-21 07:06:45
|
Op di, 21-06-2005 te 01:22 +0200, schreef Patrick McHardy: > This seems to be a really bad idea, for a single match that violates > layering we need to deal with this inconsistency. It's not just the > conntrack reference, with IPsec the packet passed to the defered hooks > is totally different from when it was still inside IP, which for example > breaks the policy match. I agree it is annoying. > > Trust me, people will complain if they can no longer use the physdev > > match for routed packets. > > I'm sure they will, now that they got used to it. Why can't people > match on the bridge port inside ebtables and only match on the device > within iptables? Is there a case that can't be handled this way? ebtables doesn't have all the targets/matches that iptables has. Perhaps a rule structure using marking can always be used so that the ACCEPT/DROP can be deferred until inside ebtables, I don't know if that will always be possible though. Deferring the hooks makes the bridge-nf code alot more complicated, so I would be glad to get rid of it if it is the right thing to do. But backwards compatibility can't be maintained and I'd be surprised if every ruleset that now works will still be possible using an iptables/ebtables scheme. It has worked fine in the past and I see no reason why to stop making it work now because of some recently found and unrelated referencing problem. Of course, if the netfilter people decide it should be removed, then so be it. cheers, Bart |
From: Patrick M. <ka...@tr...> - 2005-06-21 17:23:34
|
Bart De Schuymer wrote: > Deferring the hooks makes the bridge-nf code alot more complicated, so I > would be glad to get rid of it if it is the right thing to do. But > backwards compatibility can't be maintained and I'd be surprised if > every ruleset that now works will still be possible using an > iptables/ebtables scheme. I unfortunately don't see a way to remove it, but we should keep thinking about it. Can you please check if the attached patch is correct? It should exclude all packets handled by bridge-netfilter from having their conntrack reference dropped. I didn't add nf_reset()'s to the bridging code because with tc actions the packets can end up anywhere else anyway, and this will hopefully get fixed right sometime. BTW. this line from ip_sabotage_out() looks wrong, it will clear all flags instead of setting the BRNF_DONT_TAKE_PARENT flag (second patch): nf_bridge->mask &= BRNF_DONT_TAKE_PARENT; Signed-off-by: Patrick McHardy <ka...@tr...> |
From: Bart De S. <bds...@pa...> - 2005-06-21 20:33:15
|
Op di, 21-06-2005 te 17:16 +0200, schreef Patrick McHardy: > I unfortunately don't see a way to remove it, but we should keep > thinking about it. Can you please check if the attached patch is > correct? It should exclude all packets handled by bridge-netfilter > from having their conntrack reference dropped. I didn't add nf_reset()'s > to the bridging code because with tc actions the packets can end up > anywhere else anyway, and this will hopefully get fixed right sometime. Looks good. Perhaps a compile time option to disable postponing the hooks would be nice... > BTW. this line from ip_sabotage_out() looks wrong, it will clear all > flags instead of setting the BRNF_DONT_TAKE_PARENT flag (second > patch): > > nf_bridge->mask &= BRNF_DONT_TAKE_PARENT; Thanks, Bart |
From: Chris W. <ch...@os...> - 2005-06-21 21:23:40
|
* Bart De Schuymer (bds...@pa...) wrote: > Op di, 21-06-2005 te 17:16 +0200, schreef Patrick McHardy: > > I unfortunately don't see a way to remove it, but we should keep > > thinking about it. Can you please check if the attached patch is > > correct? It should exclude all packets handled by bridge-netfilter > > from having their conntrack reference dropped. I didn't add nf_reset()'s > > to the bridging code because with tc actions the packets can end up > > anywhere else anyway, and this will hopefully get fixed right sometime. > > Looks good. Is this one good for -stable? thanks, -chris |
From: David S. M. <da...@da...> - 2005-06-21 22:32:56
|
From: Chris Wright <ch...@os...> Subject: Re: 2.6.12: connection tracking broken? Date: Tue, 21 Jun 2005 14:23:17 -0700 > * Bart De Schuymer (bds...@pa...) wrote: > > Op di, 21-06-2005 te 17:16 +0200, schreef Patrick McHardy: > > > I unfortunately don't see a way to remove it, but we should keep > > > thinking about it. Can you please check if the attached patch is > > > correct? It should exclude all packets handled by bridge-netfilter > > > from having their conntrack reference dropped. I didn't add nf_reset()'s > > > to the bridging code because with tc actions the packets can end up > > > anywhere else anyway, and this will hopefully get fixed right sometime. > > > > Looks good. > > Is this one good for -stable? We will push it to st...@ke... if we deem it should. I do review the stream going into 2.6.x, and decide if it should be bound for -stable as well, so you never need to inquire about this specifically. |
From: Chris W. <ch...@os...> - 2005-06-21 22:34:54
|
* David S. Miller (da...@da...) wrote: > We will push it to st...@ke... if we deem it should. > > I do review the stream going into 2.6.x, and decide if it > should be bound for -stable as well, so you never need to > inquire about this specifically. Great, much appreciated. thanks, -chris |
From: Patrick M. <ka...@tr...> - 2005-06-22 00:27:34
|
David S. Miller wrote: >>Is this one good for -stable? > > We will push it to st...@ke... if we deem it should. I would like to get confirmation from someone affected by this bug, after that I think it should go in -stable. Chris, could you give it a try? Thanks Patrick |
From: Chris R. <ran...@ya...> - 2005-06-22 22:58:23
|
--- Patrick McHardy <ka...@tr...> wrote: > I would like to get confirmation from someone affected by this > bug, after that I think it should go in -stable. Chris, could > you give it a try? I trust you're talking about the following patches? diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -188,7 +188,12 @@ static inline int ip_finish_output2(stru skb = skb2; } - nf_reset(skb); +#ifdef CONFIG_BRIDGE_NETFILTER + /* bridge-netfilter defers calling some IP hooks to the bridge layer and + * still needs the conntrack reference */ + if (skb->nf_bridge == NULL) +#endif + nf_reset(skb); if (hh) { int hh_alen; diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c --- a/net/bridge/br_netfilter.c +++ b/net/bridge/br_netfilter.c @@ -882,7 +882,7 @@ static unsigned int ip_sabotage_out(unsi * doesn't use the bridge parent of the indev by using * the BRNF_DONT_TAKE_PARENT mask. */ if (hook == NF_IP_FORWARD && nf_bridge->physindev == NULL) { - nf_bridge->mask &= BRNF_DONT_TAKE_PARENT; + nf_bridge->mask |= BRNF_DONT_TAKE_PARENT; nf_bridge->physindev = (struct net_device *)in; } #if defined(CONFIG_VLAN_8021Q) || defined(CONFIG_VLAN_8021Q_MODULE) I have just installed them, and my bridging firewall is working again with 2.6.12. Thanks, Chris ___________________________________________________________ How much free photo storage do you get? Store your holiday snaps for FREE with Yahoo! Photos http://uk.photos.yahoo.com |
From: Patrick M. <ka...@tr...> - 2005-06-23 17:43:40
|
Chris Rankin wrote: > I have just installed them, and my bridging firewall is working again with 2.6.12. Thanks Chris. Dave, did you get the patches or do you want me to send them again? Regards Patrick |
From: David S. M. <da...@da...> - 2005-06-23 19:50:27
|
From: Patrick McHardy <ka...@tr...> Date: Thu, 23 Jun 2005 19:42:38 +0200 > Thanks Chris. Dave, did you get the patches or do you want me to send > them again? I have the patch, can you give me a nice changelog entry for it? Thanks. |
From: Patrick M. <ka...@tr...> - 2005-06-24 08:39:49
|
David S. Miller wrote: > I have the patch, can you give me a nice changelog entry > for it? Here you go: In 2.6.12 we started dropping the conntrack reference when a packet leaves the IP layer. This broke connection tracking on a bridge, because bridge-netfilter defers calling some NF_IP_* hooks to the bridge layer for locally generated packets going out a bridge, where the conntrack reference is no longer available. This patch keeps the reference in this case as a temporary solution, long term we will remove the defered hook calling. No attempt is made to drop the reference in the bridge-code when it is no longer needed, tc actions could already have sent the packet anywhere. Regards Patrick |
From: David S. M. <da...@da...> - 2005-06-28 23:08:10
|
From: Patrick McHardy <ka...@tr...> Date: Fri, 24 Jun 2005 10:39:08 +0200 > In 2.6.12 we started dropping the conntrack reference when a packet > leaves the IP layer. This broke connection tracking on a bridge, > because bridge-netfilter defers calling some NF_IP_* hooks to the bridge > layer for locally generated packets going out a bridge, where the > conntrack reference is no longer available. This patch keeps the > reference in this case as a temporary solution, long term we will > remove the defered hook calling. No attempt is made to drop the > reference in the bridge-code when it is no longer needed, tc actions > could already have sent the packet anywhere. Patch applied and pushed to st...@ke... Thanks a lot. |
From: Patrick M. <ka...@tr...> - 2005-06-22 00:46:13
|
Bart De Schuymer wrote: > Op di, 21-06-2005 te 17:16 +0200, schreef Patrick McHardy: > >>I unfortunately don't see a way to remove it, but we should keep >>thinking about it. Can you please check if the attached patch is >>correct? It should exclude all packets handled by bridge-netfilter >>from having their conntrack reference dropped. I didn't add nf_reset()'s >>to the bridging code because with tc actions the packets can end up >>anywhere else anyway, and this will hopefully get fixed right sometime. > > Looks good. Thanks. > Perhaps a compile time option to disable postponing the hooks would be > nice... I think we want to reduce the number of possible paths to ideally one, not make them a compile-time option. Regards Patrick |
From: Herbert Xu <he...@go...> - 2005-06-22 21:50:13
|
On Tue, Jun 21, 2005 at 05:16:05PM +0200, Patrick McHardy wrote: > Bart De Schuymer wrote: > > Deferring the hooks makes the bridge-nf code alot more complicated, so I > > would be glad to get rid of it if it is the right thing to do. But > > backwards compatibility can't be maintained and I'd be surprised if > > every ruleset that now works will still be possible using an > > iptables/ebtables scheme. > > I unfortunately don't see a way to remove it, but we should keep > thinking about it. Can you please check if the attached patch is > correct? It should exclude all packets handled by bridge-netfilter > from having their conntrack reference dropped. I didn't add nf_reset()'s > to the bridging code because with tc actions the packets can end up > anywhere else anyway, and this will hopefully get fixed right sometime. I think this patch will be fine for 2.6.12.*. Longer term though we should obsolete the ipt_physdev module. The rationale there is that this creates a precedence that we can't possibly maintain in a consistent way. For example, we don't have a target that matches by hardware MAC address. If you wanted to do that, you'd hook into the arptables interface rather than deferring iptables after the creation of the hardware header. This can be done in two stages to minimise pain for people already using it: 1) We rewrite ipt_physdev to do the lookups necessary to get the output physical devices through the bridge layer. Of course this may not be the real output device due to changes in the environment. So this should be accompanied with a warning that users should switch to ebt. 2) We remove the iptables deferring since ipt_physdev will no longer need it. 3) After a set period (say a year or so) we remove ipt_physdev altogether. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <he...@go...> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt |