From: <er...@fr...> - 2007-03-13 12:07:02
Hi,

I was trying to set up a TCP socket based perfparsed daemon server on SPARC/Solaris 9 to collect data from remote perfparse-log2socket_output clients. The configuration was simple and straightforward, but I ran into a problem on the server side.

Perfparsed started fine on the server: it opened the server port and began waiting for connections, as expected. However, after the first client connection, perfparsed drove one of the server's CPUs to nearly 100% utilization. Moreover, after a few hours of running, it dumped core.

After investigating the issue and analyzing the core file, I found that perfparsed did not free the sockets of closed client connections, but kept trying to read data from them. This caused the high CPU utilization: the select calls referencing remotely closed sockets returned immediately, resulting in a busy-wait loop after the first terminated client connection. Once the number of sockets/FDs reached FD_SETSIZE, perfparsed dumped core in log_reader.c when calling FD_ISSET or FD_SET.

I did some debugging and testing, analyzed the source code as well, and found that perfparsed closes the client socket only when it receives an 'exit' command from the client. But the perfparse-log2socket_output client does not seem to send this. The storage_socket_output module just shuts down the connection on storage_disconnect, without sending 'exit' to the server.

I saw two ways to fix this: modify the storage_socket_output module to send 'exit' on storage_disconnect, or modify the perfparsed side to handle remotely closed sockets correctly. The second option is clearly better, since it results in a more stable perfparsed that no longer depends on clients behaving well. Therefore I fixed the log_reader function in log_reader.c to close remotely terminated sockets properly. (In such a case, the read call on the socket returns 0.)
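For illustration, the behaviour described above can be reproduced outside of perfparse. The following is only a minimal sketch, not perfparse code: service_fd and demo_peer_close are hypothetical names, and a local socketpair stands in for the TCP client connection. It shows that select reports a remotely closed socket as readable, that read then returns 0, and that the descriptor has to be closed and removed from the fd_set at that point to avoid the busy loop.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of perfparse): service one readable fd.
 * Returns the number of bytes read (> 0), 0 if the peer closed the
 * connection (fd is then closed and removed from *master), or -1 on error. */
static ssize_t service_fd(int fd, fd_set *master, char *buf, size_t len)
{
    ssize_t r;

    do {
        r = read(fd, buf, len);
    } while (r < 0 && errno == EINTR);   /* retry if interrupted by a signal */

    if (r == 0) {
        /* EOF: the peer closed its end. If fd stayed in the set, select()
         * would report it readable on every call and the loop would spin. */
        FD_CLR(fd, master);
        close(fd);
    }
    return r;
}

/* Demonstration using a local socket pair instead of a real TCP client.
 * Returns 1 if the closed peer was detected and the fd was dropped,
 * 0 if the fd is still in the set, -1 on an unexpected failure. */
static int demo_peer_close(void)
{
    int sv[2];
    fd_set master, ready;
    char buf[64];
    struct timeval tv = { 1, 0 };        /* do not block forever */
    ssize_t r;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    FD_ZERO(&master);
    FD_SET(sv[0], &master);

    close(sv[1]);                        /* client goes away without 'exit' */

    ready = master;                      /* select() modifies its argument */
    if (select(sv[0] + 1, &ready, NULL, NULL, &tv) <= 0) {
        close(sv[0]);
        return -1;
    }
    assert(FD_ISSET(sv[0], &ready));     /* closed peer shows up as readable */

    r = service_fd(sv[0], &master, buf, sizeof buf);
    if (r != 0) {
        close(sv[0]);
        return -1;
    }
    return FD_ISSET(sv[0], &master) ? 0 : 1;
}
```

Calling demo_peer_close() should return 1 on a POSIX system. Without the FD_CLR/close step, the same descriptor would be reported readable on every subsequent select call.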
Here is the patch:

--- ./perfparse-0.105.6/perfparse/log_reader.c Fri Jan 14 15:40:32 2005
+++ ./perfparse-0.105.6-mod/perfparse/log_reader.c Mon Mar 12 16:29:25 2007
@@ -506,6 +506,10 @@
         if(strchr(tmp,'\n')) break;
         if((LOG_FD_CLIENT_SOCKET != log_fd[i]->type) && (r!= 10)) break;
     }
+    if((LOG_FD_SOCKET == log_fd[i]->type) && (r==0)) {
+        close_log_source(i,1);
+        return(NULL);
+    }
 }

What do you think of this issue and the fix? If you agree with it, it could be incorporated into CVS and the next release. (I am just wondering why others have not run into this problem so far.)

Bye,
Tamas

-----------------------
Tamas Erdei
KFKI-LNX Ltd.
http://www.kfki-lnx.hu