RE: [Javagroups-development] Members hanging on JChannel.connect()
From: Ilan G. <ila...@el...> - 2006-01-13 09:19:12
Bela,

The coordinator was not dead (I could see application logs coming from it) and it was sending JGroups messages that were not received by the other members (those that had the correct view, not only the one trying to join - that's when I started to suspect something was going wrong).

You write:
> There are 2 cases when the coord leaves:
> (a) it crashed or (b) it left gracefully.

There could also be a case when some functionality is ok and some is not... I believe that was the case because I updated the jars while the code was running.

Would it be complicated for other protocols in the stack to issue failure 'hints'? (Then FD_SOCK might have to send a byte around to test if the TCP connection is alive or not.) Or maybe simply let FD_SOCK be configurable to send a byte around from time to time. The small increase in network use might be worth it for some applications (I moved to FD_SOCK not to reduce network bandwidth but to reduce/eliminate members being excluded for no reason).

Note this would not solve the problem (if it exists) of other protocols being broken and FD_SOCK being OK. This is hard to eliminate, as you must be certain that your code deals correctly with all possible exceptions and throwables, including those thrown by the user code it calls.

I'll update everything and let you know (with better logs) if I run into the problem again.

Thanks again for your time and effort,
Ilan
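
P.S. To make the "send a byte around" idea concrete, here is a minimal sketch in plain java.net. It is not how FD_SOCK is implemented and it uses no JGroups API; the class HeartbeatProbe, its method names, the PING value and the timeout are all made up for illustration. It also assumes the peer echoes the byte back, which is my assumption and not something the protocol does today.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical helper for illustration only; not part of JGroups.
final class HeartbeatProbe {
    static final int PING = 0x01; // arbitrary probe byte (assumption)

    /** Sends one byte and waits up to timeoutMs for the peer to echo it back. */
    static boolean isPeerResponsive(Socket socket, int timeoutMs) {
        try {
            socket.setSoTimeout(timeoutMs);      // bound the blocking read below
            OutputStream out = socket.getOutputStream();
            out.write(PING);
            out.flush();
            InputStream in = socket.getInputStream();
            return in.read() == PING;            // -1 means the peer closed the connection
        } catch (IOException e) {                // includes SocketTimeoutException
            return false;
        }
    }

    /** Starts a daemon thread that accepts one connection and echoes every byte it receives. */
    static Thread startEchoPeer(ServerSocket server) {
        Thread t = new Thread(() -> {
            try (Socket s = server.accept()) {
                InputStream in = s.getInputStream();
                OutputStream out = s.getOutputStream();
                int b;
                while ((b = in.read()) != -1) {
                    out.write(b);
                    out.flush();
                }
            } catch (IOException ignored) {
                // connection or server socket went away; nothing left to echo
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            startEchoPeer(server);
            try (Socket socket = new Socket(loopback, server.getLocalPort())) {
                System.out.println("responsive: " + isPeerResponsive(socket, 1000));
            }
        }
    }
}
```

A probe like this only shows that the peer's socket handling still answers. As noted above, it says nothing about whether the other protocols in the stack are still working.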