Thread: [Javagroups-development] FD_SOCK...
From: Ilan G. <ila...@el...> - 2006-02-10 18:12:29
Bela et al.,

I set up my platform with 2.2.9.1 and put some new-member creation load on it (launching 10 JGroups member JVMs at a time on a total of about 5 machines, getting to at least 100 members in the group).

Some of my machines simply froze (they were replying to ping, but that's about it; no remote reboot was possible). After a restart, system messages indicated that the kernel had reported out-of-memory problems and had killed Java processes... This put the system in an interesting state. Once I restarted my machines (and killed the JVMs on all but one of the surviving machines), I tried to start a new member. It tried to contact the old (dead) coordinator, because the one surviving JVM that I had forgotten to kill still thought this was the coordinator (monitored by FD_SOCK) and sent its identity to the new member.

I quickly looked at FD_SOCK and I have a few questions:

1. It seems that down() in FD_SOCK does not handle Event.SUSPECT. Does that mean that if another member is suspecting the member FD_SOCK is monitoring, FD_SOCK will just ignore the suspicion?

2. I had the impression that FD_SOCK does not take advantage of the symmetric nature of the TCP connection. Does the monitored (server) member use the TCP connection to monitor the monitoring (client) member?

3. I feel strongly that some heartbeat message should be sent on these idle FD_SOCK TCP connections to detect router or server failure... The incurred network cost can't be significant in a 'normal' system that is already using network resources. This would prevent indefinite lockups like the one I experienced (requiring a complete shutdown of ALL machines in a platform... not always easy, especially when the platform still provides degraded service despite a partial failure).

I'm going to do more tests under 'normal' load and see if the problem occurs again.

Thanks,
Ilan
From: Bela B. <be...@ya...> - 2006-02-15 08:26:28
Ilan Ginzburg wrote:
> Bela et al.,
>
> I set up my platform with 2.2.9.1 and put some new-member creation load
> on it (launching 10 JGroups member JVMs at a time on a total of about
> 5 machines, getting to at least 100 members in the group).
>
> Some of my machines simply froze (they were replying to ping, but that's
> about it; no remote reboot was possible). After a restart, system messages
> indicated that the kernel had reported out-of-memory problems and had
> killed Java processes... This put the system in an interesting state.
>
> Once I restarted my machines (and killed the JVMs on all but one of the
> surviving machines), I tried to start a new member. It tried to contact
> the old (dead) coordinator, because the one surviving JVM that I had
> forgotten to kill still thought this was the coordinator (monitored by
> FD_SOCK) and sent its identity to the new member.
>
> I quickly looked at FD_SOCK and I have a few questions:
>
> 1. It seems that down() in FD_SOCK does not handle Event.SUSPECT. Does
> that mean that if another member is suspecting the member FD_SOCK is
> monitoring, FD_SOCK will just ignore the suspicion?

SUSPECT messages will *never* be received from *above*, always from *below*, so we don't need to handle them in down(), but in up(). In general, however, we will ignore SUSPECT messages and only handle VIEW messages. That's because a SUSPECT doesn't necessarily lead to a new VIEW (VERIFY_SUSPECT might drop it).

> 2. I had the impression that FD_SOCK does not take advantage of the
> symmetric nature of the TCP connection. Does the monitored (server)
> member use the TCP connection to monitor the monitoring (client) member?

No. Failure detection is *not* tied to the transport; it is a separate aspect. Besides, the TCP transport might close connections when they're idle, so we cannot rely on a connection close at the transport to indicate that a member has crashed.

> 3. I feel strongly that some heartbeat message should be sent on these
> idle FD_SOCK TCP connections to detect router or server failure... The
> incurred network cost can't be significant in a 'normal' system that is
> already using network resources. This would prevent indefinite lockups
> like the one I experienced (requiring a complete shutdown of ALL machines
> in a platform... not always easy, especially when the platform still
> provides degraded service despite a partial failure).

This leads to a problem though: at what interval do you send the heartbeat messages? That interval could be too small, or too large, but it will never be ideal.

You should be able to achieve that by simply adding FD on top of FD_SOCK, so you have both failure detection protocols in one stack. The interval of FD should then be high. I have never tried this out though...

--
Bela Ban
Lead JGroups / JBossCache
callto://belaban
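A minimal sketch of what "FD on top of FD_SOCK" could look like with the old plain-text stack configuration. The property values are illustrative placeholders rather than tuned recommendations; in the string format protocols are listed bottom-up, so placing FD after FD_SOCK puts FD above it in the running stack, with a long FD timeout so FD_SOCK remains the primary detector:

import org.jgroups.JChannel;

public class FdOnTopOfFdSock {
    public static void main(String[] args) throws Exception {
        String props =
            "UDP(mcast_addr=228.8.8.8;mcast_port=45566;ip_ttl=32):" +
            "PING(timeout=2000;num_initial_members=3):" +
            "FD_SOCK:" +                        // fast detection of crashed neighbours via the TCP ring
            "FD(timeout=30000;max_tries=3):" +  // slow heartbeat catches hung hosts and dead routers
            "VERIFY_SUSPECT(timeout=1500):" +
            "pbcast.NAKACK(retransmit_timeout=600,1200,2400,4800):" +
            "UNICAST(timeout=600,1200,2400):" +
            "pbcast.STABLE(desired_avg_gossip=20000):" +
            "pbcast.GMS(join_timeout=5000;shun=false;print_local_addr=true)";

        JChannel channel = new JChannel(props);
        channel.connect("fd-test-group");
        System.out.println("Connected, view: " + channel.getView());
        channel.close();
    }
}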
From: Ilan G. <ila...@el...> - 2006-02-15 09:40:58
Bela Ban wrote:
> SUSPECT messages will *never* be received from *above*, always from
> *below*, so we don't need to handle them in down(), but in up().

My mistake; I later discovered the ring structure of FD_SOCK anyway...

>> 2. I had the impression that FD_SOCK does not take advantage of the
>> symmetric nature of the TCP connection. Does the monitored (server)
>> member use the TCP connection to monitor the monitoring (client) member?
>
> No. Failure detection is *not* tied to the transport; it is a separate
> aspect.

I wasn't thinking about the transport but about the TCP connection FD_SOCK opens. In the FD_SOCK ring where A connects to B, which connects to C, which connects to A, if the connection between A and B breaks for some reason, A will suspect B but B will not suspect A.

>> 3. I feel strongly that some heartbeat message should be sent on these
>> idle FD_SOCK TCP connections to detect router or server failure... The
>> incurred network cost can't be significant in a 'normal' system that is
>> already using network resources. This would prevent indefinite lockups
>> like the one I experienced (requiring a complete shutdown of ALL machines
>> in a platform... not always easy, especially when the platform still
>> provides degraded service despite a partial failure).
>
> This leads to a problem though: at what interval do you send the heartbeat
> messages? That interval could be too small, or too large, but it will
> never be ideal.
> You should be able to achieve that by simply adding FD on top of FD_SOCK,
> so you have both failure detection protocols in one stack. The interval
> of FD should then be high.
> I have never tried this out though...

I wasn't aware I could get away with adding FD on top of FD_SOCK, and because I don't have a lot of time to spend on understanding and hacking JGroups, I did a quick "fix": I added a thread that sends bytes on the FD_SOCK link from A to B, with B sending a byte back to A when it receives one. Every time A sends a byte it also checks when the last reply from B was received, and if that reply is too old, A considers that there is a problem (handled the same way as detecting a broken FD_SOCK TCP connection). I added two configuration parameters to FD_SOCK (the number of milliseconds between sending ping bytes, and the number of milliseconds since the last pong reply before deciding something went wrong). I'll send you a patch (is this list the place?) once I'm happy enough with the result, although I don't think it'll have universal appeal ;-)

Ilan
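In outline, the keepalive described above could look roughly like the sketch below, assuming a plain java.net.Socket for the FD_SOCK link. This is not the actual patch; the class and parameter names (pingIntervalMs, maxSilenceMs) are made up for illustration:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;

public class SocketKeepalive implements Runnable {

    private final Socket sock;             // the already-open FD_SOCK connection to the monitored member
    private final long pingIntervalMs;     // how often to send a ping byte
    private final long maxSilenceMs;       // how long without a pong before suspecting the peer
    private volatile long lastPongTime = System.currentTimeMillis();
    private volatile boolean running = true;

    public SocketKeepalive(Socket sock, long pingIntervalMs, long maxSilenceMs) {
        this.sock = sock;
        this.pingIntervalMs = pingIntervalMs;
        this.maxSilenceMs = maxSilenceMs;
    }

    /** Called by the reader thread whenever a pong byte arrives from the peer. */
    public void pongReceived() {
        lastPongTime = System.currentTimeMillis();
    }

    public void stop() {
        running = false;
    }

    public void run() {
        try {
            OutputStream out = sock.getOutputStream();
            while (running) {
                out.write(0);              // ping byte; the peer echoes one byte back
                out.flush();
                if (System.currentTimeMillis() - lastPongTime > maxSilenceMs) {
                    suspectPeer();         // same handling as a broken FD_SOCK connection
                    return;
                }
                Thread.sleep(pingIntervalMs);
            }
        } catch (Exception e) {            // a write failure or interruption also counts as a broken link
            suspectPeer();
        }
    }

    /** Peer side: echo each received ping byte so the sender's timer is refreshed. */
    public static void echoLoop(Socket sock) throws IOException {
        InputStream in = sock.getInputStream();
        OutputStream out = sock.getOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            out.write(b);
            out.flush();
        }
    }

    private void suspectPeer() {
        // In FD_SOCK this would translate into broadcasting a SUSPECT
        // message for the monitored member; here it is just a placeholder.
        System.err.println("Peer " + sock.getInetAddress() + " considered dead");
    }
}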
From: Roman R. <rro...@ac...> - 2006-02-15 11:42:45
>>> 3. I feel strongly that some heartbeat message should be sent on these
>>> idle FD_SOCK TCP connections to detect router or server failure... The
>>> incurred network cost can't be significant in a 'normal' system that is
>>> already using network resources. This would prevent indefinite lockups
>>> like the one I experienced (requiring a complete shutdown of ALL machines
>>> in a platform... not always easy, especially when the platform still
>>> provides degraded service despite a partial failure).
>>
>> This leads to a problem though: at what interval do you send the heartbeat
>> messages? That interval could be too small, or too large, but it will
>> never be ideal.
>> You should be able to achieve that by simply adding FD on top of FD_SOCK,
>> so you have both failure detection protocols in one stack. The interval
>> of FD should then be high.
>> I have never tried this out though...
>
> I wasn't aware I could get away with adding FD on top of FD_SOCK, and
> because I don't have a lot of time to spend on understanding and hacking
> JGroups, I did a quick "fix": I added a thread that sends bytes on the
> FD_SOCK link from A to B, with B sending a byte back to A when it receives
> one. Every time A sends a byte it also checks when the last reply from B
> was received, and if that reply is too old, A considers that there is a
> problem (handled the same way as detecting a broken FD_SOCK TCP
> connection). I added two configuration parameters to FD_SOCK (the number
> of milliseconds between sending ping bytes, and the number of milliseconds
> since the last pong reply before deciding something went wrong). I'll send
> you a patch (is this list the place?) once I'm happy enough with the
> result, although I don't think it'll have universal appeal ;-)

A long time ago I implemented the same approach, because FD was firing false alarms under heavy load (100% CPU) and I thought this might be a better solution. However, after extensive testing I found that FD usually works better than a "patched" FD_SOCK, so I never committed the code to CVS.

Hope this helps.

Roman
From: Ilan G. <ila...@el...> - 2006-02-15 11:50:28
Roman Rokytskyy wrote:
> A long time ago I implemented the same approach, because FD was firing
> false alarms under heavy load (100% CPU) and I thought this might be a
> better solution. However, after extensive testing I found that FD usually
> works better than a "patched" FD_SOCK, so I never committed the code to
> CVS.

If your system is CPU-bound I can understand that getting a TCP message through is harder than getting a UDP packet through.

In my production environment, CPU usage might be low on all machines yet I still get false alarms using FD (though maybe some other network traffic is going through the switch).

Unless I find some obvious problem, I'll deploy my patched FD_SOCK and I'll send feedback here in a few months once I get a better feel for how it behaves for real.

Thanks for your feedback,
Ilan
From: Bela B. <be...@ya...> - 2006-02-15 12:07:50
Okay, why don't you do that. I'd be very interested, though, in knowing why FD is causing false alarms when the cluster is idle! Have you scrutinized the logs on the switch, e.g. to see whether UDP datagrams were dropped at some point in time?

Ilan Ginzburg wrote:
> Roman Rokytskyy wrote:
>> A long time ago I implemented the same approach, because FD was firing
>> false alarms under heavy load (100% CPU) and I thought this might be a
>> better solution. However, after extensive testing I found that FD usually
>> works better than a "patched" FD_SOCK, so I never committed the code to
>> CVS.
>
> If your system is CPU-bound I can understand that getting a TCP message
> through is harder than getting a UDP packet through.
>
> In my production environment, CPU usage might be low on all machines yet
> I still get false alarms using FD (though maybe some other network traffic
> is going through the switch).
>
> Unless I find some obvious problem, I'll deploy my patched FD_SOCK and
> I'll send feedback here in a few months once I get a better feel for how
> it behaves for real.
>
> Thanks for your feedback,
> Ilan

--
Bela Ban
Lead JGroups / JBossCache
callto://belaban
From: Roman R. <rro...@ac...> - 2006-02-15 12:08:45
> If your system is CPU-bound I can understand that getting a TCP message
> through is harder than getting a UDP packet through.

It was not related to TCP or UDP. The issue with FD is simple: when the FD thread did not receive enough CPU "quanta" to run, it eventually noticed that the last heartbeat had been received longer ago than the timeout and issued a SUSPECT message. That message went into a queue that already contained other messages, FD was unable to process them in time, so the next SUSPECT was issued... and so on. I used UDP as the transport, but the same behavior could be seen with TCP (on Windows, Linux and Solaris; however, that was more than 2 years ago, so things could have changed since then).

> In my production environment, CPU usage might be low on all machines yet
> I still get false alarms using FD (though maybe some other network traffic
> is going through the switch).
>
> Unless I find some obvious problem, I'll deploy my patched FD_SOCK and
> I'll send feedback here in a few months once I get a better feel for how
> it behaves for real.

Test your system's behavior under high load. My normal operation mode was low-CPU; however, under some conditions (CPU load, network load, etc.) FD or the "patched" FD_SOCK were not able to process heartbeats, the group was split into two or more smaller ones that started to work independently, then MERGE2 merged them back together, and so on. So even if you expect the system to work under low load, test its behavior under very high load too; you might discover undesirable behavior during such peaks.

We just had such a problem in a very large production system (though it uses TIBCO Rendezvous): under load the nodes lost each other, the system tried to recover and failed, the RV buffers overflowed, and in the end there was a big bang. We added two more CPUs and the problems went away.

Roman
From: Bela B. <be...@ya...> - 2006-02-15 12:05:55
Ilan Ginzburg wrote:
>> No. Failure detection is *not* tied to the transport; it is a separate
>> aspect.
>
> I wasn't thinking about the transport but about the TCP connection
> FD_SOCK opens. In the FD_SOCK ring where A connects to B, which connects
> to C, which connects to A, if the connection between A and B breaks for
> some reason, A will suspect B but B will not suspect A.

That is as designed. Any member X will always only suspect the member to its right. So in A - B - C, A will suspect B, B will suspect C, and C will suspect A. This is unidirectional, not bidirectional.

> I wasn't aware I could get away with adding FD on top of FD_SOCK, and
> because I don't have a lot of time to spend on understanding and hacking
> JGroups, I did a quick "fix": I added a thread that sends bytes on the
> FD_SOCK link from A to B, with B sending a byte back to A when it receives
> one. Every time A sends a byte it also checks when the last reply from B
> was received, and if that reply is too old, A considers that there is a
> problem (handled the same way as detecting a broken FD_SOCK TCP
> connection). I added two configuration parameters to FD_SOCK (the number
> of milliseconds between sending ping bytes, and the number of milliseconds
> since the last pong reply before deciding something went wrong). I'll send
> you a patch (is this list the place?) once I'm happy enough with the
> result, although I don't think it'll have universal appeal ;-)

I'm not sure I want to mix the static failure detection of the socket connection with heartbeating; to do that more elegantly, simply add FD to the stack.

--
Bela Ban
Lead JGroups / JBossCache
callto://belaban
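A small sketch of the unidirectional ring selection described above, assuming the current view is an ordered member list; this is not the actual FD_SOCK source, just the idea that each member monitors only the next member in the view, wrapping around at the end:

import java.util.Arrays;
import java.util.List;

public class RingNeighbour {

    // Return the member that 'self' should monitor: the next one in the view,
    // wrapping around, or null if there is nobody else to monitor.
    static <T> T pingDest(List<T> members, T self) {
        int idx = members.indexOf(self);
        if (idx < 0 || members.size() < 2)
            return null;
        return members.get((idx + 1) % members.size());
    }

    public static void main(String[] args) {
        List<String> view = Arrays.asList("A", "B", "C");
        for (String m : view)
            System.out.println(m + " monitors " + pingDest(view, m));
        // Prints: A monitors B, B monitors C, C monitors A -- never the reverse direction
    }
}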