Re: [mpls-linux-general] Problems with BGP and mplsd
From: James R. L. <jl...@mi...> - 2003-05-06 13:30:00
Excellent debugging. During my last re-write of the nexthop handling I forgot
to take into consideration BGP routes whose nexthop is an IP address which is
not in a directly connected subnet. I should be able to come up with a quick
fix.

Are you using 'p4' to get access to my development tree? If so, I can just
tell you when the fix is in and you can sync your client at that time.

On Tue, May 06, 2003 at 11:17:07AM +0200, Fredrik Pettersson wrote:
> I work on the same project as Mattias, and here are some more details.
>
> On Tuesday 06 May 2003 07.36, James R. Leu wrote:
> > What version of ldp-portable and can you get a stack trace or trace output
> > from 'mplsd'
>
> ldp-portable is version 0.310.
>
> Zebra, ospfd and bgpd are all up and running. BGP is peering with a router
> which does not redistribute the external routes in OSPF, the intention is to
> distribute those by using BGP. When mplsd is started it will crash with the
> following backtrace.
>
> (gdb) bt
> #0  0x0804c66b in mpls_zebra_read_ipv4 (command=7, client=0x80cf7d8,
>     length=22) at mpls_zebra.c:282
> #1  0x0809caf7 in zclient_read (thread=0xbffffaa0) at zclient.c:867
> #2  0x08095100 in thread_call (thread=0xbffffaa0) at thread.c:647
> #3  0x0804cba7 in main (argc=3, argv=0xbffffb84) at mpls_main.c:223
> #4  0x420158d4 in __libc_start_main () from /lib/i686/libc.so.6
>
> The segfault is from a debug print which tries to access a pointer without
> checking to see if it is NULL. The simple fix is to add a check for NULL,
> with the following patch.
>
> --- mpls_zebra.c.orig   Tue May  6 11:02:19 2003
> +++ mpls_zebra.c        Tue May  6 11:02:41 2003
> @@ -279,7 +279,9 @@
>          zlog_info("\tnexthop %s", inet_ntoa(tmp));
>      }
>      if (nexthop.type & MPLS_NH_IF) {
> -        zlog_info("\tifindex %d", nexthop.if_handle->ifindex);
> +        if (nexthop.if_handle != NULL) {
> +            zlog_info("\tifindex %d", nexthop.if_handle->ifindex);
> +        }
>      }
>
>      if (command == ZEBRA_IPV4_ROUTE_ADD) {
>
> This allows mplsd to survive for several seconds longer, but it will segfault
> again, with this stacktrace.
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x080839dd in mpls_nexthop_compare (nh1=0x81b05b0, nh2=0xbffff97c) at
> mpls_compare.c:38
> 38        if ((retval = mpls_if_handle_compare(nh1->if_handle,
> nh2->if_handle))) {
> (gdb) bt
> #0  0x080839dd in mpls_nexthop_compare (nh1=0x81b05b0, nh2=0xbffff97c) at
>     mpls_compare.c:38
> #1  0x08049cf1 in mpls_fib_getnext_route (handle=0x80d1e08, dest=0xbffff960)
>     at impl_fib.c:154
> #2  0x08069940 in ldp_label_mapping_initial_callback (timer=0x81fdff0,
>     extra=0x81f8dc8, handle=0x80d1ec8) at ldp_label_mapping.c:523
> #3  0x0804b82c in mpls_timer (thread=0xbffffaa0) at impl_timer.c:28
> #4  0x08095108 in thread_call (thread=0xbffffaa0) at thread.c:647
> #5  0x0804cbaf in main (argc=3, argv=0xbffffb84) at mpls_main.c:223
> #6  0x420158d4 in __libc_start_main () from /lib/i686/libc.so.6
> (gdb) print nh1->if_handle
> $8 = (struct interface *) 0x0
> (gdb) print nh2->if_handle
> $9 = (struct interface *) 0x0
>
> I don't know if I would gain anything by fixing this segfault, even if it is
> simple to check for NULL in mpls_nexthop_compare as well. The basic problem
> seems to be that the if_handle should be something other than NULL. As
> Mattias mentioned, the reason for the error is that BGP reports the gateway
> for external routes to be an interface which the Linux machine is not
> directly connected to.
> A possible fix is for mplsd to recognize this
> situation and find the route towards the gateway, and use the interface
> related to that route instead, but that is only a guess from my side. In the
> image below which Mattias made, the mplsd at B should replace gateway
> 10.1.0.105 with 10.7.2.10. Let us know if there is any other information you
> need.
>
> /Fredrik Pettersson
>
> > On Wed, Apr 30, 2003 at 10:49:43AM +0200, Mattias Persson wrote:
> > > The following setup results in a segmentation fault of mplsd:
> > >
> > > .105                .106 .10              .9
> > > X----10.1.0.104/29------A---10.7.2.8/29----B
> > >
> > > Router A is a Cisco 3620 and router B is a linux machine.
> > > Both run OSPF, BGP and MPLS. A and B are bgp peers. They are in the same
> > > BGP AS and in the same BGP and OSPF areas.
> > >
> > > Router X runs BGP. It is in another BGP AS and peers with router A.
> > >
> > > mplsd on B crashes when it receives some routes from
> > > the bgp protocol. I have traced the problem and it
> > > occurs because of the way bgp announces its routes:
> > > Router A tells router B that network 10.1.0.8/29 can be
> > > reached on nexthop 10.1.0.105. mplsd on Router B
> > > however, can not handle this because it
> > > does not have a direct connection to 10.1.0.105.
> > >
> > > This results in a seg fault in mplsd (if_handle is null for the nexthop).
> > >
> > > /Mattias Persson
>
> --
> Fredrik Pettersson (Fre...@op...)
> Operax AB (www.operax.com)
> Aurorum 8, 977 75 Luleå
> +46 920 75502

--
James R. Leu
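
For reference, the approach Fredrik suggests (look up the route towards the
gateway and use that route's interface) could be sketched roughly as below.
This is only an illustration and not the fix that went into the tree: struct
route, ip4(), lookup() and resolve_nexthop() are hypothetical names, not part
of mplsd, ldp-portable or zebra, and the sketch assumes a small in-memory
table where gateway == 0 marks a directly connected route.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical route entry; addresses are in host byte order. */
struct route {
    uint32_t prefix;   /* network address */
    int      len;      /* prefix length, 0..32 */
    uint32_t gateway;  /* 0 means the prefix is directly connected */
    int      ifindex;  /* meaningful only for directly connected routes */
};

/* Build an IPv4 address from its four octets. */
static uint32_t ip4(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8) | (uint32_t)d;
}

/* Netmask for a prefix length; len == 0 is special-cased because
 * shifting a 32-bit value by 32 is undefined in C. */
static uint32_t mask_of(int len)
{
    return len == 0 ? 0 : 0xFFFFFFFFu << (32 - len);
}

/* Longest-prefix match by linear scan; returns NULL if nothing matches. */
static const struct route *lookup(const struct route *tbl, size_t n,
                                  uint32_t addr)
{
    const struct route *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if ((addr & mask_of(tbl[i].len)) == tbl[i].prefix &&
            (best == NULL || tbl[i].len > best->len))
            best = &tbl[i];
    }
    return best;
}

/* Follow gateways until one lies in a directly connected subnet.
 * On success stores that gateway and its interface index and returns 0.
 * Returns -1 if there is no route or the depth limit (an arbitrary
 * guard against routing loops) is exceeded. */
static int resolve_nexthop(const struct route *tbl, size_t n, uint32_t gw,
                           uint32_t *out_gw, int *out_ifindex)
{
    assert(out_gw != NULL && out_ifindex != NULL);
    for (int depth = 0; depth < 8; depth++) {
        const struct route *r = lookup(tbl, n, gw);
        if (r == NULL)
            return -1;
        if (r->gateway == 0) {
            *out_gw = gw;
            *out_ifindex = r->ifindex;
            return 0;
        }
        gw = r->gateway;
    }
    return -1;
}
```

With a table for router B holding 10.7.2.8/29 as connected (say on ifindex 2)
and 10.1.0.104/29 via 10.7.2.10 (assuming B has learned that subnet, e.g.
through OSPF), resolve_nexthop() for 10.1.0.105 yields gateway 10.7.2.10 on
ifindex 2, which is the substitution described above. If B has no route
covering 10.1.0.105, the call returns -1 and the caller still has to cope
with a nexthop that has no interface.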