This may not be a JGroups issue, but a little more knowledge may help me figure out the problem. I am using Infinispan (which uses JGroups) for clustering. I have a 4-node setup with JGroups 2.12.x and the TCP stack. After a while the logs are flooded with "failed sending message to NodeX, cause Queue Full" exceptions. I debugged the code and found that TCPConnectionMap.TCPConnection.Sender's addToQueue method throws this exception after the call to TCPConnection._send fails from within Sender.run. After enabling logging, I found that _send encounters a SocketClosed exception.
Please help me understand a few things:
1) Does JGroups make only one TCP connection for data transfer between any two nodes?
2) Does it try to close this connection if it has not been used for a while, or does it rely on keepalive for that?
3) Do you see any reason why the socket would be getting closed frequently? My understanding is that a socket should be held open for as long as the processes are running, provided there is reasonable traffic between the nodes. Is that not true?
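To make the failure mode I am describing concrete, here is a minimal sketch of how I understand it. This is not the JGroups source: the class `QueueFullSketch`, the nested `Sender`, the `socketClosed` flag and the queue capacity of 3 are all hypothetical stand-ins. It only assumes that each connection has a bounded queue drained by a sender thread, and that the thread stops draining once a send fails.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueueFullSketch {

    /** Hypothetical stand-in for a per-connection sender; not JGroups code. */
    static class Sender implements Runnable {
        // Bounded queue; capacity 3 is an arbitrary value for the demo.
        final BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(3);
        // Simulates the underlying socket having been closed.
        volatile boolean socketClosed = false;

        void addToQueue(byte[] data) {
            // offer() returns false instead of blocking when the queue is full.
            if (!queue.offer(data)) {
                throw new IllegalStateException("Queue full");
            }
        }

        @Override
        public void run() {
            try {
                while (true) {
                    queue.take(); // wait for the next message
                    if (socketClosed) {
                        return;   // simulated send failure: the loop ends
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Returns how many messages are accepted after the sender has died. */
    static int acceptedBeforeFull() {
        Sender sender = new Sender();
        sender.socketClosed = true;
        Thread t = new Thread(sender);
        t.start();
        sender.addToQueue(new byte[1]); // taken by the sender, which then exits
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        int accepted = 0;
        try {
            for (int i = 0; i < 10; i++) {
                sender.addToQueue(new byte[1]);
                accepted++;
            }
        } catch (IllegalStateException queueFull) {
            // Nobody drains the queue any more, so it fills up.
        }
        return accepted;
    }

    public static void main(String[] args) {
        System.out.println("accepted before full: " + acceptedBeforeFull());
    }
}
```

If this picture is right, the "Queue Full" messages are only a symptom, and the closed socket is the real problem.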
I have some further information on this issue. I captured tcpdump logs and found that whenever the SocketClosed exception was encountered, RST packets were being sent. My understanding is that the OS sends an RST when a socket goes down and then comes back up, but in this case the nodes are stable. Does JGroups force an RST in some cases? Any advice would be helpful.