From: SourceForge.net <no...@so...> - 2008-08-30 02:52:11
Bugs item #2049872, was opened at 2008-08-13 12:15
Message generated for change (Comment added) made by buccella
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=712784&aid=2049872&group_id=128809

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: sfcb
Group: Function
Status: Open
Resolution: Accepted
Priority: 7
Private: No
Submitted By: Manish Tomar (manishtomar)
Assigned to: Sean Swehla (smswehla)
Summary: "--- Warning: fd is closed" message from binary intf app

Initial Comment:
Invoking the following sequence of calls continuously

1. _Create_SfcbLocal_Env returns ce
2. ce->ft->connect returns cc
3. cc->ft->enumerateInstances or cc->ft->associators or any other intrinsic call
4. cc->ft->release
5. ce->ft->release

causes the following message on the console and aborts the application linked to libcimcClientSfcbLocal.so:

"--- Warning: fd is closed: Interrupted system call
spGetMsg receiving from 10 0-14 Bad address"

This eventually happens when libsfcBrokerCore.so calls the libc abort() function, causing the following core dump as seen in gdb.

Program terminated with signal 6, Aborted.
#0 0x2adf4874 in kill () from /lib/libc.so.0
(gdb) bt
#0 0x2adf4874 in kill () from /lib/libc.so.0
#1 0x2ae3c58c in raise () from /lib/libc.so.0
#2 0x2ae3f5d0 in abort () from /lib/libc.so.0
#3 0x2b7095b4 in spRcvMsg () from /usr/lib/libsfcBrokerCore.so.0
#4 0x2b709af8 in spRecvCtlResult () from /usr/lib/libsfcBrokerCore.so.0
#5 0x2b6a0630 in localConnect () from /usr/lib/libcimcClientSfcbLocal.so
#6 0x2b6a093c in CMPIConnect2 () from /usr/lib/libcimcClientSfcbLocal.so
#7 0x2b6a0a28 in CMPIConnect () from /usr/lib/libcimcClientSfcbLocal.so

----------------------------------------------------------------------

>Comment By: Chris Buccella (buccella)
Date: 2008-08-29 22:52

Message:
Logged In: YES
user_id=1550470
Originator: NO

-Aug. 29 Braindump-

This error starts in msgqueue.c:spGetMsg(), where the recvmsg() syscall returns 0. According to the glibc manpage for this function, this indicates that the peer closed the socket. The strange part is that errno is set to EINTR, which indicates that the system call was interrupted. If this is what happens, we should just retry; sfcb takes care of this, but the condition (immediately below recvmsg() in the if block) is never reached, since it expects EINTR only if recvmsg() returns an error (<0).

uClibc's implementation of recvmsg() may be setting the return code wrong. Or perhaps recvmsg() isn't setting errno at all, and it is being set by some other function. I need to investigate what errno should be for all return cases.

Another possibility is that sfcb closed the socket on itself... perhaps some other running thread closed the wrong socket. Tracing on the msgqueue component should tell us if this happens; I've been tracing for the past 2 hours and haven't hit the error yet.

The crash is the result of an abort() call in a case statement at the end of spRcvMsg(). If EINTR had been detected correctly earlier, this case statement would not be reached; instead sfcb would retry until a successful return from recvmsg().

----------------------------------------------------------------------

Comment By: Manish Tomar (manishtomar)
Date: 2008-08-28 03:38

Message:
Logged In: YES
user_id=2179806
Originator: YES

What I meant is that if sfcbd is started with trace, the trace log is not accurate, as it eats up all the space in the filesystem. We normally run it without trace, with enough space in the filesystem. The error still occurs.

----------------------------------------------------------------------

Comment By: Chris Buccella (buccella)
Date: 2008-08-27 19:45

Message:
Logged In: YES
user_id=1550470
Originator: NO

manishtomar, I think what Sean was trying to determine is whether the filling up of the filesystem is the cause of the error. Can you confirm this?
Does the error occur when there is plenty of disk space available?

----------------------------------------------------------------------

Comment By: Manish Tomar (manishtomar)
Date: 2008-08-26 10:44

Message:
Logged In: YES
user_id=2179806
Originator: YES

Mostly, yes. The error normally occurs after space in /tmp gets filled up due to loads of output from sfcb. Other areas are read-only and are not advised to be accessed frequently. Recently, the error has been quite frequent and I'll try to capture it with the trace. Do you want the trace on the client app also (i.e. the app linking to libcimcClientSfcbLocal.so)? Or just in the sfcb server?

----------------------------------------------------------------------

Comment By: Sean Swehla (smswehla)
Date: 2008-08-26 10:09

Message:
Logged In: YES
user_id=1939165
Originator: NO

When you say "space is completely filled up", do you mean space on the file system? Does the error occur only after space has filled completely? Does it ever occur when there is still space available?

----------------------------------------------------------------------

Comment By: Manish Tomar (manishtomar)
Date: 2008-08-26 05:58

Message:
Logged In: YES
user_id=2179806
Originator: YES

Sometimes, the second output is different:

Nov 22 09:21:07 (none) lighttpd[8345]: --- Warning: fd is closed: Resource temporarily unavailable
Nov 22 09:21:07 (none) lighttpd[8345]: spGetMsg receiving from 14 0-14 Bad address

----------------------------------------------------------------------

Comment By: Manish Tomar (manishtomar)
Date: 2008-08-25 10:45

Message:
Logged In: YES
user_id=2179806
Originator: YES

The following is written to syslog when this occurs:

Aug 25 19:35:05 (none) lighttpd[9697]: --- Warning: fd is closed: Resource temporarily unavailable
Aug 25 19:35:05 (none) lighttpd[9697]: ### 0 ??? 0-0

----------------------------------------------------------------------

Comment By: Manish Tomar (manishtomar)
Date: 2008-08-23 00:47

Message:
Logged In: YES
user_id=2179806
Originator: YES

Chris,
It might be difficult to give the trace output, as it is occurring in an embedded system where we do not have much space, and the error occurs after the space is completely filled up. I'll attach the syslog output once the error occurs.

----------------------------------------------------------------------

Comment By: Chris Buccella (buccella)
Date: 2008-08-22 21:31

Message:
Logged In: YES
user_id=1550470
Originator: NO

I have changed my test program so that the for loop executes inside of it, as you suggested. I still do not see the problem occur. My virtual MIPS box is using glibc... perhaps the issue is with uclibc?

Could you please attach the information I requested in my previous post? This would help us.

----------------------------------------------------------------------

Comment By: Venkatesh Ramamurthy (vkat)
Date: 2008-08-21 21:06

Message:
Logged In: YES
user_id=2180106
Originator: NO

Chris,
I believe that the problem will not happen if the test util is run 200 times from the BASH shell, as test process termination cleans up everything. The connect()/enumerate()/release() sequence needs to be looped several times from within the test application.

----------------------------------------------------------------------

Comment By: Chris Buccella (buccella)
Date: 2008-08-21 20:31

Message:
Logged In: YES
user_id=1550470
Originator: NO

1) I am trying to recreate this problem. I have attached the test program that I am using to do so. I have run this program in a BASH for loop for 200 iterations, and have not seen the error you describe. Please check the attached source code and verify that it is similar to what you are using.

2) As Sean suggested, log output would be very useful. Please consult your syslog and attach a copy of the messages from sfcbd.

3) Trace output would also be useful.
Please run "sfcbd -t 65536 2> sfcb-output" and recreate your problem scenario. Then attach the sfcb-output file to this tracker item along with the output for #2.

File Added: v2test_ei-ami.c

----------------------------------------------------------------------

Comment By: Sean Swehla (smswehla)
Date: 2008-08-21 15:26

Message:
Logged In: YES
user_id=1939165
Originator: NO

Do you get any messages in your log file? There's only one abort() call in spRcvMsg, and it should be logging an error before the call is made.

----------------------------------------------------------------------

Comment By: Manish Tomar (manishtomar)
Date: 2008-08-13 12:20

Message:
Logged In: YES
user_id=2179806
Originator: YES

Please note that this does not appear on x86 with glibc. It seems to appear only on uclibc with a big endian processor.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=712784&aid=2049872&group_id=128809
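
For reference, the return-value distinction raised in the Aug. 29 comment can be sketched as below. This is only an illustration under stated assumptions, not the code in msgqueue.c: recv_retry and the recv_status codes are hypothetical names made up for this sketch. The idea is that errno is consulted only when recvmsg() returns a negative value, and a return of 0 is treated as "peer closed" regardless of what errno happens to hold.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch; these are not sfcb's own. */
enum recv_status { RECV_OK = 0, RECV_PEER_CLOSED = 1, RECV_ERROR = -1 };

/*
 * Receive up to len bytes from fd into buf.
 * Retries only when recvmsg() fails (returns < 0) with errno == EINTR.
 * A return of 0 from recvmsg() is reported as RECV_PEER_CLOSED without
 * looking at errno, since errno is only meaningful after a failure.
 */
static enum recv_status recv_retry(int fd, void *buf, size_t len, ssize_t *got)
{
    struct iovec iov;
    struct msghdr msg;
    ssize_t n;

    assert(buf != NULL && got != NULL);

    for (;;) {
        iov.iov_base = buf;
        iov.iov_len = len;
        memset(&msg, 0, sizeof msg);
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;

        errno = 0; /* drop any stale value left behind by an earlier call */
        n = recvmsg(fd, &msg, 0);

        if (n > 0) {
            *got = n;
            return RECV_OK;
        }
        if (n == 0) {
            *got = 0;
            return RECV_PEER_CLOSED;
        }
        if (errno == EINTR) {
            continue; /* interrupted before any data arrived: try again */
        }
        *got = -1;
        return RECV_ERROR;
    }
}
```

With a helper shaped like this, a stale EINTR left in errno by some other function could not be mistaken for an interrupted recvmsg(), which is one of the possibilities mentioned above. Whether that is what actually happens on uClibc is still an open question in this thread.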