Re: [javagroups-users] Logs filling up

Eric Dalquist wrote:
>> Can you reproduce this? Is the cluster under stress, with lots of
>> messages being sent? I don't think so, as the few logs I looked at
>> had sent ca. 18,000 messages.
>
> I haven't been able to. We had been seeing this quite often a few 
> weeks ago then realized we had some firewall rules that were blocking 
> some of the FD sockets. We removed those and it seemed that everything 
> was good. Our 4-machine QA cluster (same hardware/network configs)
> doesn't appear to be having this problem, even under load tests, but it
> is only 4 machines versus 9 in production.

OK

>>> I've been looking through it and can't really see anything that 
>>> stands out. The logs include DEBUG level info for all of the jgroups 
>>> package but that generally isn't much data compared to the number of 
>>> warnings we're getting.
>>>
>>> The log files are available here: 
>>> https://mywebspace.wisc.edu/dalquist/web/JGroups/portal.jgroups.log.tar.bz2 
>>>
>>>
>>> I've attached our JGroups config; we're using 2.10.
>>
>> I would suggest either removing FD_ALL from the config or increasing
>> its timeouts, e.g. timeout=35000 interval=10000. In a large cluster,
>> FD_ALL can send a lot of messages, and I created [1] today to
>> look into it.
>>
>> [1] https://jira.jboss.org/browse/JGRP-1241
>>
> I'm assuming FD_SOCK will still behave correctly without FD_ALL in the 
> configuration or do I need to add in some other FD layer to replace 
> FD_ALL?

I would actually leave FD_ALL in your config but increase the timeouts,
to reduce the risk of a broadcast storm when it triggers. Once I've
fixed [1], you could try it out, and that should really help.

In most cases, FD_SOCK will do the job and detect a crashed member 
quickly, *before* FD_ALL kicks in.
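
For illustration, the failure detection part of the XML stack could look
roughly like the fragment below. This is only a sketch: the FD_ALL values
are the ones suggested earlier in this thread, FD_SOCK is shown with its
defaults as an assumption, and the rest of the stack (transport, discovery,
etc.) is omitted, so merge the attributes into your existing config rather
than copying the fragment as-is.

```xml
<!-- Sketch of the failure detection protocols only; the rest of the
     stack is omitted. -->

<!-- Socket-based detection: usually notices a crashed member first.
     Shown with default settings (an assumption, not from the attached
     config). -->
<FD_SOCK/>

<!-- Heartbeat-based detection with the increased values from this thread:
     a heartbeat every 10 s, a member is suspected after 35 s of silence. -->
<FD_ALL timeout="35000" interval="10000"/>
```

With these values, a member would have to miss roughly three consecutive
heartbeats before FD_ALL suspects it.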

[1] https://jira.jboss.org/browse/JGRP-1241

-- 
Bela Ban
Lead JGroups / Clustering Team
JBoss