Re: [Keepalived-devel] ksoftirqd/0 100% Processor Usage (SOLVED)
From: David R. <da...@ad...> - 2006-02-20 03:21:35
Hello,

I've since found a solution to the problem mentioned in this thread, so I thought I'd share it with the rest of the list for completeness.

Essentially, what was happening can be described as a "packet storm" or infinite redirection loop: one director would forward a packet to the other, which would forward it back to the original, and so on. This loop is what caused the high load.

The solution is to create a script to be called by keepalived on state change, as described in this message [ http://sourceforge.net/mailarchive/message.php?msg_id=12901268 ]. The script backs up and flushes the ipvs table when a director enters BACKUP state, to prevent the infinite redirection loop problem.

Thanks again Alexandre for this great software!

David Rusenko
Aderes, Inc. - www.aderes.net

David Rusenko wrote:
> Dear Alexandre et al:
>
> I haven't heard back from you on this subject, so I am considering
> taking it to the linux-kernel mailing list. I understand you may not
> have any real idea of what is going on, but if you have any quick tips,
> ideas, or suggestions, it would all be very helpful.
>
> Thanks again,
>
> David Rusenko
> Aderes, Inc - www.aderes.net
>
>
>
> David Rusenko wrote:
>> Hello Alexandre and friends,
>>
>> I am experiencing a very weird issue with keepalived 1.1.11 and
>> ksoftirqd. This issue did not crop up in testing, and has only
>> manifested once the servers were put in production (!).
>>
>> In the interest of time, for you and others, I have included the details
>> of the Keepalived setup, and server setup, at the bottom of the message.
>>
>> Here is what happens: When keepalived is started on the second director,
>> ksoftirqd/0 slowly begins to use increasingly more processor, and the
>> load slowly begins to rise. If keepalived is not shut off on the second
>> server, the two will eventually become unresponsive, and "sweat it out"
>> with loads of 100+ for about 45 mins, until the load comes back down.
>> During the periods of high load, services become unresponsive and don't
>> reply to queries. The servers operate fine if keepalived is only running
>> on the Master Director.
>>
>> advert_int is set to 1 -- could this be a problem? I've included the
>> vrrp_instance definition at the bottom of this message as well.
>>
>> All interrupts look good for systems with disk usage and high network
>> usage, except perhaps the "timer" interrupt. Does Keepalived use any
>> soft interrupts? I've included a listing of /proc/interrupts at the
>> bottom of the message.
>>
>> Is there any other information that might be useful in debugging the
>> problem? I have thought of emailing the linux-kernel mailing list, but
>> would prefer to ask here first.
>>
>> Thanks in advance for your help on this issue, and for your great work
>> on Keepalived. I look forward to hearing from you soon.
>>
>> Sincerely,
>>
>> David Rusenko
>> President/CEO
>> Aderes, Inc - www.aderes.net
>>
>>
>>
>> KEEPALIVED SETUP
>> 2 servers both configured as Real Servers, and as Directors. Localnode
>> is used to point services from the director back to itself, and LVS-DR
>> is used for the services. With the new 2.6 kernel, and a recent
>> Keepalived code base, the VIP is added to the NIC on MASTER transition,
>> and a gARP is sent out -- all is fine. No ARPs are sent out by the
>> backup director, as it does not have the IP address on its interface.
>> All is well at this point, and the setup worked perfectly in testing.
>>
>> SERVER SETUP
>> The systems in question are IBM xSeries 305 servers, with "Broadcom
>> Corporation NetXtreme BCM5703 Gigabit Ethernet (rev 02)" network cards.
>> Both have sufficient RAM (1.5+ GB) and CPU (2.4 Ghz). They are both
>> running SuSE 9.1 Linux.
>>
>> VRRP_INSTANCE DEFINITION
>> vrrp_instance VI_1 {
>>     state MASTER
>>     interface eth1
>>     virtual_router_id 51
>>     priority 200
>>     advert_int 1
>>     authentication {
>>         auth_type PASS
>>         auth_pass somepassword
>>     }
>>     virtual_ipaddress {
>>         x.x.x.x
>>     }
>> }
>>
>> LISTING OF /proc/interrupts
>> # cat /proc/interrupts
>>            CPU0
>>   0: 1250767820          XT-PIC  timer
>>   2:          0          XT-PIC  cascade
>>   5:   99823249          XT-PIC  eth1
>>   7:   42909170          XT-PIC  eth0
>>   8:          2          XT-PIC  rtc
>>   9:         14          XT-PIC  acpi
>>  10:          0          XT-PIC  ohci_hcd
>>  14:   34660278          XT-PIC  ide0
>> NMI:       5844
>> LOC:          0
>> ERR:          0
>> MIS:          0
>>
>>
>>
>> -------------------------------------------------------
>> This SF.Net email is sponsored by:
>> Power Architecture Resource Center: Free content, downloads, discussions,
>> and more. http://solutions.newsforge.com/ibmarch.tmpl
>> _______________________________________________
>> Keepalived-devel mailing list
>> Kee...@li...
>> https://lists.sourceforge.net/lists/listinfo/keepalived-devel
>>
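
For readers who cannot reach the linked message, the fix described at the top of this thread (a script that keepalived calls on state change, which backs up and flushes the ipvs table on entering BACKUP) might look roughly like the sketch below. This is not the script from the linked message, which is not reproduced here. It assumes keepalived invokes a notify script with three arguments (type, name, new state) and that `ipvsadm -S -n`, `ipvsadm -C` and `ipvsadm -R` save, clear and restore the table. The backup file path, the function names, and the restore-on-MASTER step are assumptions made for illustration.

```python
#!/usr/bin/env python3
"""Hypothetical keepalived notify hook: flush the IPVS table on BACKUP."""
import os
import subprocess
import sys

# Assumed location for the saved table; not taken from the original thread.
BACKUP_FILE = "/var/run/ipvs-table.save"


def apply_state(state, backup_file=BACKUP_FILE, run=subprocess.run):
    """React to a VRRP state change and return the commands that were run.

    `run` is injectable so the logic can be exercised without ipvsadm.
    """
    executed = []
    if state == "BACKUP":
        # Save the current table in numeric form, then clear it so this
        # director stops forwarding packets back to the master.
        save_cmd = ["ipvsadm", "-S", "-n"]
        saved = run(save_cmd, check=True, capture_output=True, text=True)
        with open(backup_file, "w") as fh:
            fh.write(saved.stdout)
        clear_cmd = ["ipvsadm", "-C"]
        run(clear_cmd, check=True)
        executed = [save_cmd, clear_cmd]
    elif state == "MASTER" and os.path.exists(backup_file):
        # Assumption: reload the saved table when this node takes over.
        restore_cmd = ["ipvsadm", "-R"]
        with open(backup_file) as fh:
            run(restore_cmd, check=True, stdin=fh)
        executed = [restore_cmd]
    # Any other state (e.g. FAULT) is left alone in this sketch.
    return executed


# Assumed call form: <script> <"INSTANCE"|"GROUP"> <name> <state>
if __name__ == "__main__" and len(sys.argv) >= 4:
    apply_state(sys.argv[3])
```

To try it, the script would be made executable and referenced from the vrrp_instance block with a `notify` line pointing at its path (the path is up to you). It needs to run as root, since ipvsadm modifies kernel tables. Whether restoring on MASTER is needed depends on how keepalived repopulates the table in your setup, so treat that branch as optional.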