Thread: [Keepalived-devel] Keepalived and Cisco Switches
From: Shaun M. <sha...@ma...> - 2004-05-27 11:46:23
Hi,

I've encountered some flapping problems with Keepalived v1.1.1 (on RH Linux 7.3, kernel 2.4.18-5) when used with Cisco 2948 & C3548-XL switches. Both Master and Backup PCs use 3Com 905C NICs.

As an experiment I ran 'ifconfig eth2 down' on the Backup system to check that it recovered from a FAULT state. The system went into FAULT state as expected, but when I ran 'ifconfig eth2 up', keepalived initially went to BACKUP state, then started oscillating between MASTER and BACKUP state.

I fixed the problem by increasing advert_int to 35 seconds (on both Master and Backup systems). The drawback is that when Keepalived is started, the VIPs take much longer to come up than with advert_int set to 5 seconds.

Has anybody else come across this? I'd be grateful for suggestions as to what to investigate, as I'd quite like to set advert_int back to 5 seconds.

Cheers

Shaun

! Configuration File for keepalived

global_defs {
    notification_email {
        sh...@na...
    }
    smtp_server 192.168.200.92
    smtp_connect_timeout 30
    lvs_id OCTOPUSSY2
}

vrrp_sync_group G1 {
    group {
        VI_11
        VI_12
        VI_13
        VI_14
        VI_15
        VI_16
    }
    smtp_alert
}

vrrp_instance VI_11 {
    state BACKUP
    track_interface {
        eth0
    }
    interface eth0.4
    mcast_src_ip 10.32.5.252
    garp_master_delay 1
    virtual_router_id 11
    priority 100
    advert_int 35
    authentication {
        auth_type AH
        auth_pass <PW>
    }
    virtual_ipaddress {
        10.32.5.254/23
    }
}

vrrp_instance VI_12 {
    state BACKUP
    interface eth0.6
    track_interface {
        eth0
    }
    mcast_src_ip 10.32.7.252
    garp_master_delay 1
    virtual_router_id 12
    priority 100
    advert_int 35
    authentication {
        auth_type AH
        auth_pass <PW>
    }
    virtual_ipaddress {
        10.32.7.254/23
    }
}

vrrp_instance VI_13 {
    state BACKUP
    interface eth0.8
    track_interface {
        eth0
    }
    mcast_src_ip 10.32.9.252
    garp_master_delay 1
    virtual_router_id 13
    priority 100
    advert_int 35
    authentication {
        auth_type AH
        auth_pass <PW>
    }
    virtual_ipaddress {
        10.32.9.254/24
    }
}

vrrp_instance VI_14 {
    state BACKUP
    track_interface {
        eth1
    }
    interface eth1
    mcast_src_ip 10.32.0.252
    garp_master_delay 1
    virtual_router_id 14
    priority 100
    advert_int 35
    authentication {
        auth_type AH
        auth_pass <PW>
    }
    virtual_ipaddress {
        10.32.0.254/24
    }
    virtual_routes {
        0.0.0.0/0 via 10.32.0.200 dev eth1
    }
}

vrrp_instance VI_15 {
    state BACKUP
    track_interface {
        eth2
    }
    interface eth2.104
    virtual_router_id 15
    mcast_src_ip 192.168.200.252
    garp_master_delay 1
    priority 100
    advert_int 35
    authentication {
        auth_type AH
        auth_pass <PW>
    }
    virtual_ipaddress {
        192.168.200.254/24
    }
    virtual_routes {
        172.16.1.0/24 via 192.168.200.8 dev eth2.104
        172.16.2.0/24 via 192.168.200.8 dev eth2.104
        10.0.0.0/16 via 192.168.200.8 dev eth2.104
    }
}

vrrp_instance VI_16 {
    state BACKUP
    track_interface {
        eth2
    }
    interface eth2.120
    mcast_src_ip 10.0.12.252
    virtual_router_id 16
    garp_master_delay 1
    priority 100
    advert_int 35
    authentication {
        auth_type AH
        auth_pass <PW>
    }
    virtual_ipaddress {
        10.0.12.254/24
    }
}
From: Graeme F. <kee...@gr...> - 2004-05-27 12:06:08
On Thu, 27 May 2004, Shaun McCullagh wrote:
> I've encountered some flapping problems with Keepalived v1.1.1 (on RH
> Linux 7.3 Kernel 2.4.18-5) when used with Cisco 2948 & C3548-XL switches.

Hard set your switch speed/duplex settings for those ports, and use "mii-tool" (assuming it will support your cards) to do the same at the server end.

Cisco switches take up to 30 seconds to complete their autonegotiation - if they're hard set, they don't.

HTH

Graeme
From: Kjetil T. H. <kje...@if...> - 2004-05-27 18:15:42
On Thu, 2004-05-27 at 13:06 +0100, Graeme Fowler wrote:
> On Thu, 27 May 2004, Shaun McCullagh wrote:
> > I've encountered some flapping problems with Keepalived v1.1.1 (on RH
> > Linux 7.3 Kernel 2.4.18-5) when used with Cisco 2948 & C3548-XL switches.
>
> Hard set your switch speed/duplex settings for those ports, and use "mii-tool"
> (assuming it will support your cards) to do the same at the server end.
>
> Cisco switches take up to 30 seconds to complete their autonegotiation - if
> they're hard set, they don't.

it's not auto-negotiation which takes time, it's the spanning tree algorithm. it's required to wait for 30 seconds to discover loops in the topology (nodes will only announce their presence so often). you can turn this off if you're certain the port will never be used to connect to switches with the configuration option "spanning-tree portfast".

--
Kjetil T.
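The timing argument can be sketched numerically. The snippet below is a rough illustration, not a diagnosis: it assumes the VRRPv2 timer formulas from RFC 3768 (Master_Down_Interval = 3 x advert_int + Skew_Time, Skew_Time = (256 - priority) / 256) and the classic 802.1D default forward delay of 15 seconds applied twice (listening, then learning). The function names are hypothetical, and the actual timers on the switches in this thread are not known.

```python
# Sketch: compare the VRRPv2 master-down timer with the time a classic
# spanning-tree port stays non-forwarding after link-up.
# Assumptions (not from the thread): RFC 3768 timer formulas and the
# 802.1D default forward delay of 15 s for each of listening and learning.

STP_FORWARD_DELAY_S = 15.0  # assumed default, applied twice


def master_down_interval(advert_int: float, priority: int) -> float:
    """Seconds a backup waits without adverts before it becomes master."""
    skew_time = (256 - priority) / 256
    return 3 * advert_int + skew_time


def stp_non_forwarding_time(forward_delay: float = STP_FORWARD_DELAY_S) -> float:
    """Listening + learning time before the port starts forwarding."""
    return 2 * forward_delay


def backup_times_out_while_blocked(advert_int: float, priority: int) -> bool:
    """True if the backup would give up on the master before the port forwards."""
    return master_down_interval(advert_int, priority) < stp_non_forwarding_time()


if __name__ == "__main__":
    # advert_int values and priority 100 are the ones quoted in the thread.
    for advert_int in (5, 35):
        interval = master_down_interval(advert_int, 100)
        blocked = backup_times_out_while_blocked(advert_int, 100)
        print(f"advert_int={advert_int}s master_down={interval:.2f}s "
              f"times_out_while_blocked={blocked}")
```

Under these assumptions, advert_int 5 gives a master-down timer shorter than the non-forwarding window, while advert_int 35 gives one well beyond it, which is consistent with the behaviour reported at the start of the thread.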
From: Graeme F. <kee...@gr...> - 2004-05-28 08:21:20
On Thu, 27 May 2004, Kjetil Torgrim Homme wrote:
> it's not auto-negotiation which takes time, it's the spanning tree
> algorithm. it's required to wait for 30 seconds to discover loops in
> the topology (nodes will only announce their presence so often). you
> can turn this off if you're certain the port will never be used to
> connect to switches with the configuration option "spanning-tree
> portfast".

Whoops! My mistake; indeed it is the spanning tree algorithm. I also ensure that I have "spanning-tree portfast" set on interfaces which I know will always be connected to hosts rather than switches (or in fact where I know that the port may connect to a switch which is not spanning tree capable).

One point of note though is that I have on occasion been bitten by interfaces which continually autonegotiate - whilst connectivity seems OK, the interface itself flaps wildly every few seconds. Hence the comments about hard-setting port speeds :)

Graeme
From: Shaun M. <sha...@ma...> - 2004-05-28 12:10:48
Hi,

Many thanks to Graeme, Kjetil and Alexandre for all the advice offered.

I've reconfigured our switch and NICs to operate in Fixed/Full mode. The good news is that all the errors we were getting on the switch have now gone. The bad news is that Keepalived reports eth1 is down about 2 minutes after starting, then goes into FAULT state. In fact eth1 is not down: ifconfig eth1 shows it is up, and all hosts on the associated network can ping it. The same problem happens when I disable iptables.

One thing I do notice when I start Keepalived is this message:

May 28 13:45:46 octopussy2 Keepalived_vrrp: VRRP sockpool: [ifindex(3), proto(51), fd(-1,-1)]
.....
May 28 13:45:51 octopussy2 Keepalived: Watchdog: success connecting /tmp/.vrrp wdog socket
May 28 13:47:32 octopussy2 Keepalived_vrrp: Kernel is reporting: interface eth1 DOWN
May 28 13:47:32 octopussy2 Keepalived_vrrp: Kernel is reporting: interface eth1 DOWN
May 28 13:47:32 octopussy2 Keepalived_vrrp: VRRP_Instance(VI_14) Now in FAULT state

I think ifindex(3) corresponds to eth1, and I notice it gets a negative file descriptor...

We are using the 3c59x driver v1.1.16 patched to support VLANs. Our modules.conf file looks like this:

alias eth0 3c59x
alias eth1 3c59x
alias eth2 3c59x
options 3c59x vlan=1,,1 options=0x204,0x204,0x204

Note that eth1 does not have any VLANs associated with it, so there is no '1' for it.

Any ideas?

Thanks again, and thanks to everybody who has contributed to the Keepalived Project, which we have been using with great success for over 18 months on our Production systems (they use Bcom NICs tho' :-) )

Cheers

Shaun
From: Alexandre C. <ac...@fr...> - 2004-05-28 12:36:19
Hi,

> May 28 13:45:46 octopussy2 Keepalived_vrrp: VRRP sockpool: [ifindex(3),
> proto(51), fd(-1,-1)]
> .....

the fd(-1,-1) is bad... this is a problem while creating the VRRP socket bound to the device using IPSEC_AH.

BTW, I would recommend not using IPSEC_AH if one or more VRRP instances are in a sync_group... use IPSEC_AH only on instances not related to any sync_group. I have left IPSEC_AH support in since I spent time coding it, but at the last IETF summit the VRRP WG decided to remove any kind of authentication.

Best regards,
Alexandre
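One way to catch this condition early is to scan the startup log for sockpool entries with a negative descriptor. The following is only a sketch: the regular expression is written against the single log line quoted in this thread, so the exact format in other Keepalived versions is an assumption, and the helper name failed_sockpool_entries is hypothetical. proto(51) is the IP protocol number for IPsec AH.

```python
import re

# Pattern modelled on the one "VRRP sockpool" line quoted in the thread;
# other Keepalived versions may log this differently (assumption).
SOCKPOOL_RE = re.compile(
    r"VRRP sockpool: \[ifindex\((?P<ifindex>\d+)\),\s*"
    r"proto\((?P<proto>\d+)\),\s*"
    r"fd\((?P<fd_a>-?\d+),\s*(?P<fd_b>-?\d+)\)\]"
)


def failed_sockpool_entries(lines):
    """Yield (ifindex, proto) for sockpool entries with a negative descriptor."""
    for line in lines:
        match = SOCKPOOL_RE.search(line)
        if match is None:
            continue
        if int(match["fd_a"]) < 0 or int(match["fd_b"]) < 0:
            yield int(match["ifindex"]), int(match["proto"])


if __name__ == "__main__":
    sample = [
        "May 28 13:45:46 octopussy2 Keepalived_vrrp: VRRP sockpool: "
        "[ifindex(3), proto(51), fd(-1,-1)]",
    ]
    for ifindex, proto in failed_sockpool_entries(sample):
        print(f"socket creation failed: ifindex={ifindex} proto={proto}")
```

The ifindex can then be matched against the interface list on the host to see which device the failed socket belongs to.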
From: Shaun M. <sha...@ma...> - 2004-05-28 13:12:24
Alexandre Cassen wrote:
> Hi,
>
> > May 28 13:45:46 octopussy2 Keepalived_vrrp: VRRP sockpool: [ifindex(3),
> > proto(51), fd(-1,-1)]
> > .....
>
> the fd(-1,-1) is bad... this is a problem while creating the VRRP socket
> bound to the device using IPSEC_AH.
>
> BTW, I would recommend not using IPSEC_AH if one or more VRRP instances
> are in a sync_group... use IPSEC_AH only on instances not related to any
> sync_group. I have left IPSEC_AH support in since I spent time coding it,
> but at the last IETF summit the VRRP WG decided to remove any kind of
> authentication.
>
> Best regards,
> Alexandre

Hi Alexandre,

We don't really need IPSEC_AH on these systems, but could I use this revised config instead?

However on some of our other systems, I would rather continue to use IPSEC_AH as they have Public VIPs, and I'm concerned that if I change to auth PASS the packets might be spoofed. Are there better ways to prevent spoofing?

Cheers

Shaun

! Configuration File for keepalived

global_defs {
    notification_email {
        sh...@na...
    }
    smtp_server 192.168.200.92
    smtp_connect_timeout 30
    lvs_id OCTOPUSSY2
}

vrrp_sync_group G1 {
    group {
        VI_11
    }
    smtp_alert
}

vrrp_instance VI_11 {
    state BACKUP
    track_interface {
        eth0
        eth1
        eth2
    }
    virtual_router_id 11
    interface eth1
    priority 100
    advert_int 35
    authentication {
        auth_type AH
        auth_pass <PW>
    }
    virtual_ipaddress {
        10.32.5.254/23 dev eth0.4
        10.32.7.254/23 dev eth0.6
        10.32.9.254/24 dev eth0.8
        10.32.0.254/24 dev eth1
        192.168.200.254/24 dev eth2.104
        10.0.12.254/24 eth2.120
    }
    virtual_routes {
        172.16.1.0/24 via 192.168.200.8 dev eth2.104
        172.16.2.0/24 via 192.168.200.8 dev eth2.104
        10.0.0.0/16 via 192.168.200.8 dev eth2.104
        0.0.0.0/0 via 10.32.0.200 dev eth1
    }
}
From: Alexandre C. <ac...@fr...> - 2004-05-28 13:24:26
Hi Shaun,

> We don't really need IPSEC_AH, on these systems, but could I use this
> revised config instead?

If you only have one vrrp_instance then you don't need a sync_group... and conversely, a sync_group with only one vrrp_instance is not consistent.

> However on some of our other systems, I would rather continue to use
> IPSEC_AH as they have Public VIPs, and I'm concerned that if I change
> to auth PASS the packets might be spoofed. Are there better ways to
> prevent spoofing?

If you are using IPSEC_AH on instances that are not in a sync_group (I mean a standalone vrrp_instance), then IPSEC_AH is OK. In past regression tests, I found some weird cases where using IPSEC_AH with a sync_group introduced some mad things... I just want to WARN on this point.

Best regards,
Alexandre
From: Shaun M. <sha...@ma...> - 2004-05-28 13:53:02
Alexandre Cassen wrote:
> Hi Shaun,
>
> > We don't really need IPSEC_AH, on these systems, but could I use this
> > revised config instead?
>
> If you only have one vrrp_instance then you don't need a sync_group... and
> conversely, a sync_group with only one vrrp_instance is not consistent.

Point taken.... I'll remove the sync_group. Can I put smtp_alert in the vrrp_instance?

> > However on some of our other systems, I would rather continue to use
> > IPSEC_AH as they have Public VIPs, and I'm concerned that if I change
> > to auth PASS the packets might be spoofed. Are there better ways to
> > prevent spoofing?
>
> If you are using IPSEC_AH on instances that are not in a sync_group (I
> mean a standalone vrrp_instance), then IPSEC_AH is OK. In past regression
> tests, I found some weird cases where using IPSEC_AH with a sync_group
> introduced some mad things... I just want to WARN on this point.

We are using IPSEC_AH inside a sync_group on other systems, and these work fine. However I don't want to ignore your warning. The problem is I have 40 Public VIPs running on one Keepalived config. I believe a vrrp_instance cannot support more than 20 VIPs, so I have to have at least two vrrp_instances with IPSEC_AH.

The reason I want to use IPSEC_AH on these instances is to prevent IP spoofing. I'm unsure how to resolve this. Do I need to worry about IP spoofing or am I just being paranoid?

Many thanks.... (again)

Shaun
From: Alexandre C. <ac...@fr...> - 2004-05-28 14:12:23
> Point taken.... I'll remove the sync_group. Can I put smtp_alert in the
> vrrp_instance?

ya (cf: doc/keepalived.conf.SYNOPSIS)

> We are using IPSEC_AH, inside a sync_group, on other systems, but these
> work fine.
> However I don't want to ignore your warning. The problem is I have 40
> Public VIPs running on one Keepalived config.
> I believe a vrrp_instance cannot support more than 20 VIPs, so I have to
> have at least two vrrp_instances with IPSEC_AH.

Place your VIPs in the block:

    virtual_ipaddress_excluded {
        ...
    }

instead of virtual_ipaddress.

> The reason I want to use IPSEC_AH on these instances is to prevent IP
> spoofing. I'm unsure how to resolve this.
> Do I need to worry about IP spoofing or am I just being paranoid?

Well, there are multiple ways to protect against crafted packet injection... First, VRRP uses TTL=255 in the IP header to protect against injection from other hops... IMHO, this solves most injection issues. OTOH, most VRRP routers are gateways or border routers, which means they split the network topology into 2 segments, and with the TTL=255 sanity check this guarantees a good security level. You can still have the problem if an intruder crafting packets is on the same network segment as the VRRP NICs; otherwise the TTL sanity check is good.

Anyway, IPSEC_AH is the best solution, but it requires IPSEC seq_num synchronization during takeover, which can introduce 'some' issues if used in conjunction with a sync_group.

regards,
Alexandre
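The TTL argument can be made concrete with a small model. This is a sketch under stated assumptions, not Keepalived's code: it assumes each router on the path decrements the TTL by one and that a receiver accepts a VRRP packet only when the TTL is exactly 255, as RFC 3768 requires. The function names are hypothetical.

```python
# Sketch of the TTL=255 sanity check for received VRRP packets.
# Assumptions: every router hop decrements the TTL by one, and the
# receiver drops anything whose TTL is not exactly 255 (RFC 3768).

VRRP_TTL = 255
MAX_IP_TTL = 255  # the TTL field is 8 bits wide


def ttl_on_arrival(initial_ttl: int, router_hops: int) -> int:
    """TTL seen by the receiver after the packet crossed router_hops routers."""
    return max(initial_ttl - router_hops, 0)


def passes_ttl_check(received_ttl: int) -> bool:
    """True if the packet can only have originated on the local segment."""
    return received_ttl == VRRP_TTL


if __name__ == "__main__":
    # A remote sender cannot start above 255, so one hop is enough to fail.
    print("on-link sender :", passes_ttl_check(ttl_on_arrival(MAX_IP_TTL, 0)))
    print("one hop away   :", passes_ttl_check(ttl_on_arrival(MAX_IP_TTL, 1)))
```

The model also shows the limit described above: a sender on the same segment crosses zero routers and passes the check, so the TTL test does nothing against an on-link attacker.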
From: Shaun M. <sha...@ma...> - 2004-06-08 12:05:20
Hi,

After much wringing of hands I've persuaded keepalived v1.1.1 to work again with our 3c905 NICs.

The original problem was that our Cisco 3548-XL was reporting thousands of runt packets from our Keepalived hosts. I thought that forcing the NICs into 100Mb Full Duplex and changing the config of the switch to match would fix things. Well, the runts disappeared, but Keepalived always reported a negative file descriptor for one interface every time it started, and 10 seconds later it logged that the kernel was flagging the NIC as down. Consequently it went into a FAULT state, even though all the NICs were working correctly.

I tried a simpler Keepalived config with only one vrrp_instance and plain text authentication: same result. Then I tried Keepalived v1.1.3, all to no avail :(. This morning I loaded the latest production kernel for RH 7.3 from http://www.fedoralegacy.org -- same problem.

Finally I stopped forcing the NICs into 100 Mb Fixed Full Duplex and changed the switch back to auto for both duplex and speed selection. Result: no errors and Keepalived works purrfect. Note that this time I'm using the default 3c59x driver as this supports VLAN tagging.

I suspect this problem only occurs with 3c905s; perhaps it's a driver-related problem rather than Keepalived.

Thanks to Alexandre, et al, for all your help.

Shaun
From: Jeremy R. <ru...@os...> - 2004-05-27 19:14:42
On Thursday 27 May 2004 14:14 pm, Kjetil Torgrim Homme wrote:
> On Thu, 2004-05-27 at 13:06 +0100, Graeme Fowler wrote:
> > On Thu, 27 May 2004, Shaun McCullagh wrote:
> > > I've encountered some flapping problems with Keepalived v1.1.1 (on RH
> > > Linux 7.3 Kernel 2.4.18-5) when used with Cisco 2948 & C3548-XL
> > > switches.
> >
> > Hard set your switch speed/duplex settings for those ports, and use
> > "mii-tool" (assuming it will support your cards) to do the same at the
> > server end.
> >
> > Cisco switches take up to 30 seconds to complete their autonegotiation -
> > if they're hard set, they don't.
>
> it's not auto-negotiation which takes time, it's the spanning tree
> algorithm. it's required to wait for 30 seconds to discover loops in
> the topology (nodes will only announce their presence so often). you
> can turn this off if you're certain the port will never be used to
> connect to switches with the configuration option "spanning-tree
> portfast".

I just ran into this problem and want to confirm that it is probably the spanning tree in the switches. It also has adverse effects on bootp/dhcp requests. Configuring your host ports (not interswitch links) to enable portfast should remedy the problem.

Thanks,

--
Jeremy Rumpf
----------------------------
The Ohio State University
OIT Network Information Services
464 Baker Systems
ru...@os...
Office # (614)292-4062
----------------------------
From: Shaun M. <sha...@ma...> - 2004-05-28 14:42:09
> Place your VIPs in the block:
>
>     virtual_ipaddress_excluded {
>         ...
>     }
>
> instead of virtual_ipaddress.

Hi,

Is this OK?

Cheers

Shaun

==============================================================

global_defs {
    notification_email {
        me...@ma...
    }
    smtp_server 192.168.124.92
    smtp_connect_timeout 30
    lvs_id PO1
}

vrrp_instance VI {
    state MASTER
    interface eth0
    track_interface {
        eth0
        eth1
    }
    virtual_router_id 101
    priority 150
    advert_int 35
    authentication {
        auth_type AH
        auth_pass <PW>
    }
    virtual_ipaddress {
        10.1.12.16 dev eth0.12
    }
    virtual_ipaddress_excluded {
        10.1.43.254 dev eth1
        10.1.143.254 dev eth0
        10.1.50.5/26 dev eth1.50
        10.1.52.5/26 dev eth1.52
        10.1.53.5/26 dev eth1.53
        10.1.54.5/26 dev eth1.54
        10.1.55.5/26 dev eth1.55
        10.1.56.5/26 dev eth1.56
        10.1.57.5/26 dev eth1.57
        10.1.58.5/26 dev eth1.58
        10.1.59.5/26 dev eth1.59
        10.1.60.5/26 dev eth1.60
        10.1.61.5/26 dev eth1.61
        10.1.62.5/26 dev eth1.62
        10.1.63.5/26 dev eth1.63
        10.1.64.5/26 dev eth1.64
        10.1.65.5/26 dev eth1.65
        10.1.66.5/26 dev eth1.66
        10.1.67.5/26 dev eth1.67
        10.1.68.5/26 dev eth1.68
        10.1.69.5/26 dev eth1.69
        80.247.34.36/26 dev eth0.11
        80.247.34.37/26 dev eth0.11
        80.247.34.38/26 dev eth0.11
        80.247.34.39/26 dev eth0.11
        80.247.34.40/26 dev eth0.11
        80.247.34.41/26 dev eth0.11
        80.247.34.43/26 dev eth0.11
        80.247.34.44/26 dev eth0.11
        80.247.34.45/26 dev eth0.11
        80.247.34.46/26 dev eth0.11
        80.247.34.65/26 dev eth0.11
        80.247.34.66/26 dev eth0.11
        80.247.34.67/26 dev eth0.11
        80.247.34.68/26 dev eth0.11
        80.247.34.69/26 dev eth0.11
        80.247.34.70/26 dev eth0.11
        80.247.34.71/26 dev eth0.11
        80.247.34.72/26 dev eth0.11
        80.247.34.73/26 dev eth0.11
        80.247.34.74/26 dev eth0.11
        80.247.34.75/26 dev eth0.11
        80.247.34.76/26 dev eth0.11
        80.247.34.77/26 dev eth0.11
        80.247.34.78/26 dev eth0.11
        80.247.34.79/26 dev eth0.11
        80.247.34.80/26 dev eth0.11
        80.247.34.81/26 dev eth0.11
        80.247.34.82/26 dev eth0.11
        80.247.34.83/26 dev eth0.11
        80.247.34.84/26 dev eth0.11
        80.247.34.85/26 dev eth0.11
        80.247.34.86/26 dev eth0.11
        80.247.34.87/26 dev eth0.11
        80.247.34.88/26 dev eth0.11
        80.247.34.89/26 dev eth0.11
        80.247.34.90/26 dev eth0.11
        80.247.34.91/26 dev eth0.11
        80.247.34.92/26 dev eth0.11
        80.247.34.93/26 dev eth0.11
        80.247.34.94/26 dev eth0.11
        80.247.34.95/26 dev eth0.11
        80.247.34.96/26 dev eth0.11
    }
    virtual_routes {
        0.0.0.0/0 via 80.247.34.3 dev eth0.11
    }
    notify_fault /usr/local/etc/keepalived_fault.sh
    smtp_alert
}
From: Alexandre C. <ac...@fr...> - 2004-05-28 14:53:48
> Is this OK?

ya

regards,
Alexandre