Re: [mpls-linux-general] Problems with BGP and mplsd
From: James R. L. <jl...@mi...> - 2003-05-06 13:30:00
Excellent debugging. During my last rewrite of the nexthop handling
I forgot to take into consideration BGP routes whose nexthop is an
IP address that is not in a directly connected subnet. I should be able
to come up with a quick fix.
Are you using 'p4' to get access to my development tree? If so, I can
tell you when the fix is in and you can sync your client
at that time.
On Tue, May 06, 2003 at 11:17:07AM +0200, Fredrik Pettersson wrote:
> I work on the same project as Mattias, and here are some more details.
>
> On Tuesday 06 May 2003 07.36, James R. Leu wrote:
> > What version of ldp-portable are you using, and can you get a stack trace or trace output
> > from 'mplsd'?
>
> ldp-portable is version 0.310.
>
> Zebra, ospfd and bgpd are all up and running. BGP is peering with a router
> which does not redistribute the external routes into OSPF; the intention is to
> distribute those by using BGP. When mplsd is started, it crashes with the
> following backtrace.
>
> (gdb) bt
> #0 0x0804c66b in mpls_zebra_read_ipv4 (command=7, client=0x80cf7d8, length=22) at mpls_zebra.c:282
> #1 0x0809caf7 in zclient_read (thread=0xbffffaa0) at zclient.c:867
> #2 0x08095100 in thread_call (thread=0xbffffaa0) at thread.c:647
> #3 0x0804cba7 in main (argc=3, argv=0xbffffb84) at mpls_main.c:223
> #4 0x420158d4 in __libc_start_main () from /lib/i686/libc.so.6
>
> The segfault comes from a debug print which dereferences a pointer without
> checking whether it is NULL. The simple fix is to add a NULL check, as in
> the following patch.
>
> --- mpls_zebra.c.orig Tue May 6 11:02:19 2003
> +++ mpls_zebra.c Tue May 6 11:02:41 2003
> @@ -279,7 +279,9 @@
> zlog_info("\tnexthop %s", inet_ntoa(tmp));
> }
> if (nexthop.type & MPLS_NH_IF) {
> - zlog_info("\tifindex %d", nexthop.if_handle->ifindex);
> + if (nexthop.if_handle != NULL) {
> + zlog_info("\tifindex %d", nexthop.if_handle->ifindex);
> + }
> }
>
> if (command == ZEBRA_IPV4_ROUTE_ADD) {
>
> This allows mplsd to survive for several seconds longer, but it then segfaults
> again with this stack trace.
>=20
> Program received signal SIGSEGV, Segmentation fault.
> 0x080839dd in mpls_nexthop_compare (nh1=0x81b05b0, nh2=0xbffff97c) at mpls_compare.c:38
> 38 if ((retval = mpls_if_handle_compare(nh1->if_handle, nh2->if_handle))) {
> (gdb) bt
> #0 0x080839dd in mpls_nexthop_compare (nh1=0x81b05b0, nh2=0xbffff97c) at mpls_compare.c:38
> #1 0x08049cf1 in mpls_fib_getnext_route (handle=0x80d1e08, dest=0xbffff960) at impl_fib.c:154
> #2 0x08069940 in ldp_label_mapping_initial_callback (timer=0x81fdff0, extra=0x81f8dc8, handle=0x80d1ec8) at ldp_label_mapping.c:523
> #3 0x0804b82c in mpls_timer (thread=0xbffffaa0) at impl_timer.c:28
> #4 0x08095108 in thread_call (thread=0xbffffaa0) at thread.c:647
> #5 0x0804cbaf in main (argc=3, argv=0xbffffb84) at mpls_main.c:223
> #6 0x420158d4 in __libc_start_main () from /lib/i686/libc.so.6
> (gdb) print nh1->if_handle
> $8 = (struct interface *) 0x0
> (gdb) print nh2->if_handle
> $9 = (struct interface *) 0x0
>
> I don't know whether I would gain anything by fixing this segfault, even though
> it is simple to check for NULL in mpls_nexthop_compare as well. The basic
> problem seems to be that if_handle should be something other than NULL. As
> Mattias mentioned, the cause of the error is that BGP reports the gateway for
> external routes as an address on a subnet to which the Linux machine is not
> directly connected. A possible fix is for mplsd to recognize this situation,
> find the route towards the gateway, and use the interface associated with that
> route instead, but that is only a guess on my part. In the diagram below, which
> Mattias made, the mplsd at B should replace gateway 10.1.0.105 with 10.7.2.10.
> Let us know if there is any other information you need.
>
> /Fredrik Pettersson
>
> > On Wed, Apr 30, 2003 at 10:49:43AM +0200, Mattias Persson wrote:
> > > The following setup results in a segmentation fault of mplsd:
> > >
> > >  .105                .106  .10             .9
> > >   X----10.1.0.104/29------A---10.7.2.8/29----B
> > >
> > > Router A is a Cisco 3620 and router B is a linux machine.
> > > Both run OSPF, BGP and MPLS. A and B are BGP peers. They are in the same
> > > BGP AS and in the same OSPF area.
> > >
> > > Router X runs BGP. It is in another BGP AS and peers with router A.
> > >
> > > mplsd on B crashes when it receives some routes from
> > > the BGP protocol. I have traced the problem, and it
> > > occurs because of the way BGP announces its routes:
> > > Router A tells router B that network 10.1.0.8/29 can be
> > > reached via nexthop 10.1.0.105. mplsd on router B,
> > > however, cannot handle this because it
> > > does not have a direct connection to 10.1.0.105.
> > >
> > > This results in a segfault in mplsd (if_handle is NULL for the nexthop).
> > >
> > > /Mattias Persson
>
> --
> Fredrik Pettersson (Fre...@op...)
> Operax AB (www.operax.com)
> Aurorum 8, 977 75 Luleå
> +46 920 75502
--
James R. Leu