From: <er...@fr...> - 2007-03-13 12:07:02
Hi,
I was trying to set up a TCP socket based perfparsed daemon server on
SPARC/Solaris 9 to collect data from remote perfparse-log2socket_output
clients. The configuration was quite simple and straightforward, but I
ran into a problem on the server side. Perfparsed on the server started
fine: it opened the server port and began waiting for connections, as
expected. However, after the first client connection, perfparsed drove
one of the server CPUs to nearly 100% utilization. Moreover, after a
few hours of running, it dumped core.
After investigating further and analyzing the core file, I realized
that perfparsed did not release the sockets of closed client
connections, but kept trying to read data from them. This caused the
high CPU utilization: select calls on remotely closed sockets return
immediately, so the server fell into a busy-wait loop after the first
client connection terminated. Once the number of sockets/FDs reached
FD_SETSIZE, perfparsed dumped core in log_reader.c when calling
FD_ISSET or FD_SET.
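The behaviour described above can be reproduced outside perfparsed. The
following is only a minimal sketch, not perfparse code: it uses a local
socketpair in place of a real TCP client, and the helper names
is_readable and read_after_peer_close are made up for illustration. It
shows that once the peer has closed, select reports the socket as
readable and read returns 0.

```c
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if select() reports fd readable within
 * timeout_ms, 0 if it does not, and -1 on error or if fd cannot be
 * stored in an fd_set. */
static int is_readable(int fd, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int n;

    /* FD_SET/FD_ISSET are undefined for descriptors >= FD_SETSIZE. */
    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    n = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (n < 0)
        return -1;
    return FD_ISSET(fd, &rfds) ? 1 : 0;
}

/* Hypothetical demo: closes one end of a socket pair and returns what
 * read() yields on the other end (0 means end-of-file). Returns -1 if
 * the pair could not be created or the socket was not readable. */
static long read_after_peer_close(void)
{
    int sv[2];
    char buf[16];
    long r = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    close(sv[1]);                     /* simulate the client going away */

    if (is_readable(sv[0], 1000) == 1)
        r = (long)read(sv[0], buf, sizeof buf);

    close(sv[0]);
    return r;
}
```

As long as such a descriptor stays in the set passed to select, every
call returns at once, which matches the busy loop seen here.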
I did some debugging and testing, analyzed the source code as well,
and found that perfparsed closes the client socket only when it
receives an 'exit' command from the client. But the
perfparse-log2socket_output client does not seem to send this. The
storage_socket_output module just shuts down the connection on
storage_disconnect, without sending 'exit' to the server.
I saw two ways to fix this: modify the storage_socket_output module to
send 'exit' on storage_disconnect, or modify the perfparsed side to
handle remotely closed sockets correctly. The second option is clearly
better and results in a more stable perfparsed. Therefore I fixed the
log_reader function in log_reader.c to close remotely terminated sockets
properly. (In such a case, the read call on the socket returns 0.)
Here is the patch:
--- ./perfparse-0.105.6/perfparse/log_reader.c Fri Jan 14 15:40:32 2005
+++ ./perfparse-0.105.6-mod/perfparse/log_reader.c Mon Mar 12 16:29:25 2007
@@ -506,6 +506,10 @@
if(strchr(tmp,'\n')) break;
if((LOG_FD_CLIENT_SOCKET != log_fd[i]->type) && (r!= 10)) break;
}
+ if((LOG_FD_SOCKET == log_fd[i]->type) && (r==0)) {
+ close_log_source(i,1);
+ return(NULL);
+ }
}
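For comparison, here is the same idea as a standalone sketch of a
select loop that drops a client as soon as read returns 0. It is not
the perfparsed implementation: the clients table, MAX_CLIENTS and the
clients_* functions are hypothetical names, and a real server would
parse the received data instead of discarding it.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

#define MAX_CLIENTS 8

/* Hypothetical client table; -1 marks a free slot. */
static int clients[MAX_CLIENTS];

static void clients_init(void)
{
    int i;
    for (i = 0; i < MAX_CLIENTS; i++)
        clients[i] = -1;
}

/* Registers fd; returns the slot index, or -1 if the table is full or
 * fd does not fit into an fd_set (fd >= FD_SETSIZE). */
static int clients_add(int fd)
{
    int i;
    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;
    for (i = 0; i < MAX_CLIENTS; i++) {
        if (clients[i] == -1) {
            clients[i] = fd;
            return i;
        }
    }
    return -1;
}

static int clients_active(void)
{
    int i, n = 0;
    for (i = 0; i < MAX_CLIENTS; i++)
        if (clients[i] >= 0)
            n++;
    return n;
}

/* One non-blocking pass over all clients. Returns the number of
 * clients closed because the peer disconnected or a read failed. */
static int clients_poll_once(void)
{
    fd_set rfds;
    struct timeval tv = { 0, 0 };     /* poll, do not block */
    char buf[256];
    int i, maxfd = -1, closed = 0;

    FD_ZERO(&rfds);
    for (i = 0; i < MAX_CLIENTS; i++) {
        if (clients[i] >= 0) {
            FD_SET(clients[i], &rfds);
            if (clients[i] > maxfd)
                maxfd = clients[i];
        }
    }
    if (maxfd < 0 || select(maxfd + 1, &rfds, NULL, NULL, &tv) <= 0)
        return 0;

    for (i = 0; i < MAX_CLIENTS; i++) {
        ssize_t r;
        if (clients[i] < 0 || !FD_ISSET(clients[i], &rfds))
            continue;
        r = read(clients[i], buf, sizeof buf);
        /* r == 0: peer closed the connection; r < 0: read error
         * (an interrupted call is simply retried on the next pass). */
        if (r == 0 || (r < 0 && errno != EINTR)) {
            close(clients[i]);
            clients[i] = -1;          /* never select on it again */
            closed++;
        }
    }
    return closed;
}
```

Freeing the slot on end-of-file keeps the descriptor out of later
select calls and stops closed connections from accumulating towards
FD_SETSIZE.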
What do you think of this issue and fix?
If you agree with it, the fix could be incorporated into CVS and the
next release.
(I am just wondering why others have not run into this problem so far.)
Bye,
Tamas
-----------------------
Tamas Erdei
KFKI-LNX Ltd.
http://www.kfki-lnx.hu