Re: [mpls-linux-general] Problems with BGP and mplsd
From: Fredrik P. <Fre...@op...> - 2003-05-06 09:17:38
I work on the same project as Mattias, and here are some more details.

On Tuesday 06 May 2003 07.36, James R. Leu wrote:
> What version of ldp-portable and can you get a stack trace or trace output
> from 'mplsd'

ldp-portable is version 0.310.

Zebra, ospfd and bgpd are all up and running. BGP is peering with a router
which does not redistribute the external routes into OSPF; the intention is
to distribute those by using BGP.

When mplsd is started it crashes with the following backtrace.

(gdb) bt
#0  0x0804c66b in mpls_zebra_read_ipv4 (command=7, client=0x80cf7d8,
    length=22) at mpls_zebra.c:282
#1  0x0809caf7 in zclient_read (thread=0xbffffaa0) at zclient.c:867
#2  0x08095100 in thread_call (thread=0xbffffaa0) at thread.c:647
#3  0x0804cba7 in main (argc=3, argv=0xbffffb84) at mpls_main.c:223
#4  0x420158d4 in __libc_start_main () from /lib/i686/libc.so.6

The segfault comes from a debug print which dereferences a pointer without
checking whether it is NULL. The simple fix is to add a check for NULL, with
the following patch.

--- mpls_zebra.c.orig   Tue May  6 11:02:19 2003
+++ mpls_zebra.c        Tue May  6 11:02:41 2003
@@ -279,7 +279,9 @@
       zlog_info("\tnexthop %s", inet_ntoa(tmp));
     }
     if (nexthop.type & MPLS_NH_IF) {
-      zlog_info("\tifindex %d", nexthop.if_handle->ifindex);
+      if (nexthop.if_handle != NULL) {
+        zlog_info("\tifindex %d", nexthop.if_handle->ifindex);
+      }
     }
 
     if (command == ZEBRA_IPV4_ROUTE_ADD) {

This allows mplsd to survive for several seconds longer, but it then
segfaults again, with this stack trace.

Program received signal SIGSEGV, Segmentation fault.
0x080839dd in mpls_nexthop_compare (nh1=0x81b05b0, nh2=0xbffff97c)
    at mpls_compare.c:38
38        if ((retval = mpls_if_handle_compare(nh1->if_handle, nh2->if_handle))) {
(gdb) bt
#0  0x080839dd in mpls_nexthop_compare (nh1=0x81b05b0, nh2=0xbffff97c)
    at mpls_compare.c:38
#1  0x08049cf1 in mpls_fib_getnext_route (handle=0x80d1e08, dest=0xbffff960)
    at impl_fib.c:154
#2  0x08069940 in ldp_label_mapping_initial_callback (timer=0x81fdff0,
    extra=0x81f8dc8, handle=0x80d1ec8) at ldp_label_mapping.c:523
#3  0x0804b82c in mpls_timer (thread=0xbffffaa0) at impl_timer.c:28
#4  0x08095108 in thread_call (thread=0xbffffaa0) at thread.c:647
#5  0x0804cbaf in main (argc=3, argv=0xbffffb84) at mpls_main.c:223
#6  0x420158d4 in __libc_start_main () from /lib/i686/libc.so.6
(gdb) print nh1->if_handle
$8 = (struct interface *) 0x0
(gdb) print nh2->if_handle
$9 = (struct interface *) 0x0

I don't know if I would gain anything by fixing this segfault, even though it
is simple to check for NULL in mpls_nexthop_compare as well. The basic
problem seems to be that if_handle should be something other than NULL.

As Mattias mentioned, the reason for the error is that BGP reports the
gateway for external routes to be an address on a network which the Linux
machine is not directly connected to. A possible fix is for mplsd to
recognize this situation, find the route towards the gateway, and use the
interface related to that route instead, but that is only a guess from my
side. In the diagram below, which Mattias made, the mplsd at B should replace
gateway 10.1.0.105 with 10.7.2.10.

Let us know if there is any other information you need.

/Fredrik Pettersson

> On Wed, Apr 30, 2003 at 10:49:43AM +0200, Mattias Persson wrote:
> > The following setup results in a segmentation fault of mplsd:
> >
> >      .105               .106 .10             .9
> >     X----10.1.0.104/29------A---10.7.2.8/29----B
> >
> > Router A is a Cisco 3620 and router B is a linux machine.
> > Both run OSPF, BGP and MPLS. A and B are bgp peers. They are in the same
> > BGP AS and in the same BGP and OSPF areas.
> >
> > Router X runs BGP. It is in another BGP AS and peers with router A.
> >
> > mplsd on B crashes when it receives some routes from
> > the bgp protocol. I have traced the problem and it
> > occurs because of the way bgp announces its routes:
> > Router A tells router B that network 10.1.0.8/29 can be
> > reached on nexthop 10.1.0.105. mplsd on Router B
> > however, can not handle this because it
> > does not have a direct connection to 10.1.0.105.
> >
> > This results in a seg fault in mplsd (if_handle is null for the nexthop).
> >
> > /Mattias Persson

-- 
Fredrik Pettersson (Fre...@op...)
Operax AB (www.operax.com)
Aurorum 8, 977 75 Luleå
+46 920 75502
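
To make the guessed fix a bit more concrete, here is a rough sketch of what "find the route towards the gateway and use that route's interface" could look like. It is only an illustration under assumptions: struct route, lookup() and resolve_nexthop() are hypothetical names and do not exist in ldp-portable or zebra, and the table is a plain array rather than the real FIB. The idea is a longest-prefix match on the gateway address, repeated until a directly connected route is reached.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified route entry. NOT the ldp-portable or zebra types. */
struct route {
    uint32_t prefix;   /* network address, host byte order */
    int      plen;     /* prefix length, 0..32 */
    uint32_t gateway;  /* 0 means the network is directly connected */
    int      ifindex;  /* outgoing interface, meaningful for connected routes */
};

/* Build an IPv4 address in host byte order from its four octets. */
#define IP(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | ((uint32_t)(c) << 8) | (uint32_t)(d))

static uint32_t mask_of(int plen)
{
    assert(plen >= 0 && plen <= 32);
    /* Shifting a 32-bit value by 32 is undefined, so handle /0 separately. */
    return plen == 0 ? 0u : 0xFFFFFFFFu << (32 - plen);
}

/* Longest-prefix match of addr against the table; NULL if nothing matches. */
static const struct route *lookup(const struct route *tbl, size_t n, uint32_t addr)
{
    const struct route *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if ((addr & mask_of(tbl[i].plen)) == tbl[i].prefix &&
            (best == NULL || tbl[i].plen > best->plen)) {
            best = &tbl[i];
        }
    }
    return best;
}

/*
 * Follow gateways until a directly connected route is found.
 * On success, stores the directly reachable next hop and its interface
 * and returns 0. Returns -1 if the gateway cannot be resolved.
 */
static int resolve_nexthop(const struct route *tbl, size_t n, uint32_t gw,
                           uint32_t *out_gw, int *out_ifindex)
{
    /* Bounded number of steps so a routing loop cannot hang the caller. */
    for (int depth = 0; depth < 8; depth++) {
        const struct route *r = lookup(tbl, n, gw);
        if (r == NULL) {
            return -1;
        }
        if (r->gateway == 0) {
            *out_gw = gw;
            *out_ifindex = r->ifindex;
            return 0;
        }
        gw = r->gateway;
    }
    return -1;
}
```

With a table for router B holding 10.7.2.8/29 as connected on some interface, 10.1.0.104/29 via 10.7.2.10 (assumed here to be learned through OSPF) and the BGP route 10.1.0.8/29 via 10.1.0.105, resolving 10.1.0.105 would yield 10.7.2.10 and the interface of the connected network, which matches the replacement suggested in the message. A caller would still have to decide what to do when resolution fails, for example skipping the route instead of storing a NULL if_handle.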