AW: [mpls-linux-general] RES: Zebra LDP Crash (Discovered source of problem)
From: Georg K. <gk...@gi...> - 2002-11-04 07:59:30
Hi all,
I had the same problem here. The problem lies in the zebra code: the rib
structure (in particular its rib_table member) is not initialized for static
routes. Please try adding the following line to zrib.c:
---------------------------------------------------------------
--- zrib.c.orig Mon Nov 4 08:51:09 2002
+++ zrib.c Mon Nov 4 08:51:31 2002
@@ -133,6 +133,7 @@
rib->type = ZEBRA_ROUTE_STATIC;
rib->distance = si->distance;
rib->metric = 0;
+ rib->rib_table = rib_table_ipv4;
rib->nexthop_num =3D 0;
switch (si->type)
---------------------------------------------------------------
Unfortunately, there might be other members of the struct that are not
assigned correctly. Anyway, with this change zebra no longer crashes.
Please note: ospfd will only propagate these static routes if you add the
"redistribute static" line to ospfd.conf.
Unfortunately, LDP is not aware of these static routes, and I couldn't get it
to assign labels for them :-(
Kind regards,
Georg Klug
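The backtrace quoted below stops at rib.c:661, `if (rib->rib_table->rib_install_kernel)`, which dereferences `rib_table` without checking it. The C sketch below reproduces that pattern with simplified, hypothetical stand-ins for zebra's `struct rib` and its table descriptor (the real definitions in the LDP-patched zebra-0.93b are not shown in this thread). It assumes the rib is zero-filled on allocation, so an unpatched static route carries a NULL `rib_table`; with a non-zeroing allocator it would hold garbage instead. The helper names `rib_install_lower_sketch`, `static_rib_new` and `rib_table_ops` are illustrative only, and the NULL guard is an addition that zebra itself does not have.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for zebra's structures. Only the
 * member names rib_table and rib_install_kernel come from the backtrace
 * and the patch; everything else is illustrative. */
struct rib_table_ops {
    int (*rib_install_kernel)(void *rn, void *rib);
};

struct rib {
    int type;
    int distance;
    int metric;
    struct rib_table_ops *rib_table; /* never set for static routes before the patch */
    int nexthop_num;
};

/* Pretend kernel install that always succeeds. */
static int install_ipv4(void *rn, void *rib) {
    (void)rn;
    (void)rib;
    return 0;
}

/* Hypothetical stand-in for the rib_table_ipv4 object the patch assigns. */
static struct rib_table_ops rib_table_ipv4_ops = { install_ipv4 };
static struct rib_table_ops *rib_table_ipv4 = &rib_table_ipv4_ops;

/* Mirrors the dereference at rib.c:661, plus a NULL guard (not in zebra)
 * so that the unpatched case returns -1 instead of raising SIGSEGV. */
static int rib_install_lower_sketch(void *rn, struct rib *rib) {
    if (rib->rib_table == NULL)
        return -1;
    if (rib->rib_table->rib_install_kernel)
        return rib->rib_table->rib_install_kernel(rn, rib);
    return 0;
}

/* Loosely mirrors the field assignments in the patched zrib.c hunk.
 * calloc zero-fills, so rib_table stays NULL unless apply_patch is set. */
static struct rib *static_rib_new(int distance, int apply_patch) {
    struct rib *rib = calloc(1, sizeof *rib);
    if (rib == NULL)
        return NULL;
    rib->type = 1;          /* placeholder for ZEBRA_ROUTE_STATIC */
    rib->distance = distance;
    rib->metric = 0;
    if (apply_patch)
        rib->rib_table = rib_table_ipv4;  /* the one-line fix */
    rib->nexthop_num = 0;
    return rib;
}
```

Calling `rib_install_lower_sketch()` on a rib from `static_rib_new(1, 0)` takes the guard path, where the original code would have faulted, while `static_rib_new(1, 1)` reaches the install callback. As the message above says, other members may still be left unset, so assigning `rib_table` alone is not necessarily a complete fix.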
> After you patched zebra to create mplsd, did you do a 'make distclean' from
> the top level? Are you sure you're running the zebra binary that was built
> at the same time as the mplsd binary?
>
> I will look into this failure, but it looks like it is in some code that I
> don't mess with.
>
> Thanks for the backtrace; this is very helpful.
>
> On Fri, Nov 01, 2002 at 12:08:14PM -0300, Plínio de Paula wrote:
> > Zebra stopped generating core dumps (maybe because I upgraded the OS), but
> > the problem persists...
> > I compiled zebra with debug info, and here is the backtrace:
> > --------------------------------------
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x0805c8fb in rib_install_lower (rn=0x8087600, rib=0x8087638) at rib.c:661
> > 661 if (rib->rib_table->rib_install_kernel)
> > (gdb) bt
> > #0 0x0805c8fb in rib_install_lower (rn=0x8087600, rib=0x8087638)
> >     at rib.c:661
> > #1 0x0805cae0 in rib_process (rn=0x8087600, del=0x0) at rib.c:785
> > #2 0x0804d6d6 in static_ipv4_add (p=0xbffff8c0, gate=0x0,
> >     ifname=0x8087a20 "eth4", distance=1 '\001', table=0) at zrib.c:297
> > #3 0x0804da5a in static_ipv4_func (vty=0xbffff8b8, add_cmd=1,
> >     dest_str=0x80870b8 "10.10.1.10/32",
> >     mask_str=0x1 <Address 0x1 out of bounds>, gate_str=0x8087a20 "eth4",
> >     distance_str=0x0) at zrib.c:440
> > #4 0x0804daa4 in ip_route (self=0x806b4e0, vty=0x8087240, argc=2, argv=0x2)
> >     at zrib.c:457
> > #5 0x08054952 in cmd_execute_command_strict (vline=0x8087550, vty=0x8087240,
> >     cmd=0x0) at command.c:1963
> > #6 0x08054a7b in config_from_file (vty=0x8087240, fp=0x80870d0)
> >     at command.c:2001
> > #7 0x08051755 in vty_read_file (confp=0x80870d0) at vty.c:2079
> > #8 0x08051a23 in vty_read_config (config_file=0x0,
> >     config_current_dir=0x806b260 "zebra.conf",
> >     config_default_dir=0x806b26b "/usr/local/etc/zebra.conf") at vty.c:2266
> > #9 0x0804b682 in main (argc=0, argv=0xbffffb54) at main.c:287
> > #10 0x420158d4 in __libc_start_main () from /lib/i686/libc.so.6
> > --------------------------------------
> >
> > This is my zebra.conf:
> > --------------------------------------
> > hostname routerA
> >
> > interface eth5
> > description Fiber1000 Interface -> routerB
> > ip address 10.10.2.1/24
> > shutdown
> >
> > interface eth4
> > description Fiber1000 Interface -> clientA
> > ip address 10.10.1.1/24
> > no shutdown ***
> >
> > interface eth3
> > description Fiber100 Interface -> Optical Network
> > no shutdown
> >
> > interface eth2
> > description Fiber100 Interface -> Optical Network
> > no shutdown
> >
> > interface eth1
> > description Fiber100 Interface -> Optical Network
> > no shutdown
> >
> > ip route 10.10.1.10/32 eth4 <- This causes a seg fault if eth4 (***) is up
> >                              | Instant seg fault if eth4 is initially down and then brought up within the zebra vty
> >                              | The same problem applies to Gigabit and Fast Ethernet NICs (all optical)
> > ----------------------------------------
> >
> >
> > -----Original Message-----
> > From: James R. Leu [mailto:jl...@mi...]
> > Sent: Thursday, October 31, 2002 20:45
> > To: Plínio de Paula
> > Cc: mpl...@li...
> > Subject: Re: RES: [mpls-linux-general] Zebra LDP Crash (Discovered
> > source of problem)
> >
> >
> > No one else is using static routes (that I know of). I know I've never
> > tried it. Do you get a core file? Give me the backtrace from it and I'll
> > try to fix it.
> >
> > On Thu, Oct 31, 2002 at 06:48:07PM -0300, Plínio de Paula wrote:
> > > My zebra configuration included static routes! With the LDP patch, they
> > > cause a zebra segmentation fault!
> > >
> > > Without static routes, the LDP-patched zebra runs OK...
> > >
> > > Is this happening to everybody?
> > >
> > > See you!
> > >
> > > Plínio de Paula
> > > UNICAMP/Brazil
> > >
> > > -----Original Message-----
> > > From: James R. Leu [mailto:jl...@mi...]
> > > Sent: Thursday, October 31, 2002 15:11
> > > To: Plínio de Paula
> > > Cc: Gianfranco Delli Carri; mpl...@li...
> > > Subject: Re: [mpls-linux-general] Zebra LDP Crash
> > >
> > >
> > > Do you have a core file? Can you get me the backtrace from the core dump?
> > >
> > > On Thu, Oct 31, 2002 at 02:04:17PM -0300, Plínio de Paula wrote:
> > > > Hello Gianfranco,
> > > >
> > > > I'm trying to compile zebra with the LDP patch in the same configuration
> > > > as yours. The compilation goes OK, but when I start zebra, it dumps core.
> > > > Have you run into similar problems? What did you do about them?
> > > >
> > > > Plínio de Paula
> > > > UNICAMP
> > > >
> > > > -----Original Message-----
> > > > From: Gianfranco Delli Carri [mailto:gf....@nc...]
> > > > Sent: Wednesday, October 30, 2002 22:13
> > > > To: 'mpl...@li...'
> > > > Subject: [mpls-linux-general] Zebra LDP session
> > > >
> > > >
> > > > Hi to all,
> > > >
> > > > I have a Linux box (2.4.19) patched with mpls-linux-1.170, and
> > > > zebra-0.93b patched with ldp-portable-0.250.
> > > >
> > > > When I start mplsd after zebra and ospfd, I can see the LDP connection
> > > > come up on my MPLS/LDP-enabled Cisco router, but after a few seconds
> > > > (hold timer) it goes down.
> > > >
> > > > Debugging MPLSD I can see:
> > > >
> > > > /usr/local/sbin/mplsd
> > > > ldp_if_new:
> > > > 2002/10/31 02:00:24 MPLS: MPLSd (0.93b) starts
> > > > 2002/10/31 02:00:24 MPLS: interface add lo index 1 flags 73 metric 1 mtu 16436
> > > > 2002/10/31 02:00:24 MPLS: address add 127.0.0.1 to interface lo
> > > > 2002/10/31 02:00:24 MPLS: interface add eth0 index 2 flags 4419 metric 1 mtu 1500
> > > > 2002/10/31 02:00:24 MPLS: address add 10.254.0.250 to interface eth0
> > > > 2002/10/31 02:00:24 MPLS: router-id change 10.254.0.250
> > > > 2002/10/31 02:00:24 MPLS: router-id update 10.254.0.250
> > > > 2002/10/31 02:00:24 MPLS: router add 0.0.0.0/0
> > > > 2002/10/31 02:00:24 MPLS: nexthop 10.254.0.1
> > > > 2002/10/31 02:00:24 MPLS: ifindex 2
> > > > session delete
> > > >
> > > > Debugging Cisco LDP:
> > > >
> > > > Oct 31 02:00:24.584 CET: ldp: Opening ldp conn; adj 0x67827E30, 10.254.2.6 <-> 10.254.0.250
> > > > Oct 31 02:00:24.584 CET: ldp: ldp conn is up; adj 0x67827E30, 10.254.2.6:11439 <-> 10.254.0.250:646
> > > > Oct 31 02:00:24.584 CET: ldp: Sent init msg to 10.254.0.250 (pp 0x0)
> > > > Oct 31 02:00:24.604 CET: ldp: ldp conn closed by peer; adj 0x67827E30, 10.254.2.6:11439 <-> 10.254.0.250:646, FastEthernet0/0
> > > > Oct 31 02:00:24.604 CET: ldp: Closing ldp conn 10.254.2.6:11439 <-> 10.254.0.250:646, adj 0x67827E30
> > > > Oct 31 02:00:29.588 CET: ldp: Opening ldp conn; adj 0x67827E30, 10.254.2.6 <-> 10.254.0.250
> > > > Oct 31 02:00:29.588 CET: ldp: ldp conn is up; adj 0x67827E30, 10.254.2.6:11440 <-> 10.254.0.250:646
> > > > Oct 31 02:00:29.588 CET: ldp: Sent init msg to 10.254.0.250 (pp 0x0)
> > > > Oct 31 02:00:29.600 CET: ldp: Rcvd init msg from 10.254.0.250 (pp 0x0)
> > > > Oct 31 02:00:29.600 CET: ldp: Sent keepalive msg to 10.254.0.250:0 (pp 0x0)
> > > > Oct 31 02:00:29.604 CET: ldp: Rcvd keepalive msg from 10.254.0.250:0 (pp 0x0)
> > > > Oct 31 02:00:29.608 CET: ldp: Sent address msg to 10.254.0.250:0 (pp 0x6225D768)
> > > > Oct 31 02:00:29.608 CET: ldp: Sent label mapping msg to 10.254.0.250:0 (pp 0x6225D768)
> > > > Oct 31 02:00:29.608 CET: ldp: Sent label mapping msg to 10.254.0.250:0 (pp 0x6225D768)
> > > > Oct 31 02:00:29.608 CET: ldp: Sent label mapping msg to 10.254.0.250:0 (pp 0x6225D768)
> > > > Oct 31 02:00:29.608 CET: ldp: Sent label mapping msg to 10.254.0.250:0 (pp 0x6225D768)
> > > > Oct 31 02:00:29.608 CET: ldp: Sent label mapping msg to 10.254.0.250:0 (pp 0x6225D768)
> > > > Oct 31 02:00:29.608 CET: ldp: Sent label mapping msg to 10.254.0.250:0 (pp 0x6225D768)
> > > > Oct 31 02:00:29.608 CET: ldp: Sent label mapping msg to 10.254.0.250:0 (pp 0x6225D768)
> > > > etc...
> > > > Oct 31 02:00:44.605 CET: ldp: Discovery hold timer expired for adj 0x67827E30, 10.254.0.250:0, will close conn
> > > > Oct 31 02:00:44.605 CET: ldp: Sent notif msg to 10.254.0.250:0 (pp 0x6225D768)
> > > > Oct 31 02:00:44.605 CET: ldp: Sent notif msg to 10.254.0.250:0 (pp 0x6225D768)
> > > > Oct 31 02:00:44.605 CET: ldp: Close LDP transport conn for adj 0x67827E30
> > > > Oct 31 02:00:44.605 CET: ldp: Closing ldp conn 10.254.2.6:11440 <-> 10.254.0.250:646, adj 0x67827E30
> > > >
> > > > Ah... my mplsd process ends up using all the CPU time:
> > > >
> > > > ps aux
> > > > USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
> > > > root 769 98.4 0.7 2076 904 pts/0 R 02:00 5:10 /usr/local/sbin/mplsd
> > > >
> > > > and I'm never able to telnet into it; the session freezes.
> > > >
> > > > telnet 10.254.0.250 2610
> > > > Trying 10.254.0.250...
> > > > Connected to 10.254.0.250.
> > > > Escape character is '^]'.
> > > >
> > > >
> > > >
> > > > Do you have any idea?
> > > >
> > > > Thanks in advance.
> > > >
> > > > Regards,
> > > >
> > > > Gianfranco
> > > >
> > > >
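In Gianfranco's Cisco log, the second TCP session to 10.254.0.250:646 comes up at 02:00:29.588 and is closed at 02:00:44.605 with "Discovery hold timer expired", roughly 15 seconds later. In LDP (RFC 3036) a discovery adjacency stays alive only while Hello messages keep arriving; a Hello hold time of 0 means the default, which is 15 seconds for Link Hellos, so the log suggests the Cisco stopped seeing Hellos from mplsd. The C sketch below shows such a hold-timer check. The struct and function names are hypothetical and not taken from ldp-portable or IOS, and it assumes the hold time in effect is the smaller of the proposed and locally configured values; the infinite value 0xFFFF is not handled.

```c
#include <stdbool.h>

/* Hypothetical Hello adjacency record; times are in milliseconds. */
struct ldp_adjacency {
    long last_hello_ms;  /* when the most recent Hello was received */
    long hold_time_ms;   /* hold time currently in effect */
};

/* A Hello hold time of 0 means "use the default", which RFC 3036 sets
 * to 15 seconds for Link Hellos. 0xFFFF (infinite) is not handled here. */
static unsigned link_hold_or_default(unsigned hold_s) {
    return hold_s == 0 ? 15u : hold_s;
}

/* Assumed rule: use the smaller of the value proposed in the received
 * Hello and the locally configured value. */
static unsigned negotiate_hold_s(unsigned proposed_s, unsigned local_s) {
    unsigned a = link_hold_or_default(proposed_s);
    unsigned b = link_hold_or_default(local_s);
    return a < b ? a : b;
}

/* Every received Hello restarts the hold timer. */
static void adj_hello_received(struct ldp_adjacency *adj, long now_ms,
                               unsigned proposed_s, unsigned local_s) {
    adj->last_hello_ms = now_ms;
    adj->hold_time_ms = (long)negotiate_hold_s(proposed_s, local_s) * 1000L;
}

/* True once no Hello has arrived for a full hold time; at that point the
 * adjacency, and the session built on it, would be torn down. */
static bool adj_hold_expired(const struct ldp_adjacency *adj, long now_ms) {
    return now_ms - adj->last_hello_ms >= adj->hold_time_ms;
}
```

With both sides using the default, an adjacency whose last Hello arrived at t = 29588 ms is still alive at 40000 ms and expired at 44605 ms, which matches the spacing in the log. A daemon would poll `adj_hold_expired()` from a periodic timer rather than on every packet.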
> > > > _______________________________________________
> > > > mpls-linux-general mailing list
> > > > mpl...@li...
> > > > https://lists.sourceforge.net/lists/listinfo/mpls-linux-general
> > >
> > > --
> > > James R. Leu
> >
> > --
> > James R. Leu
> >
>
> --
> James R. Leu
>