Invoking the following sequence of calls repeatedly
1. _Create_SfcbLocal_Env returns ce
2. ce->ft->connect returns cc
3. cc->ft->enumerateInstances or cc->ft->associators or any other intrinsic call
4. cc->ft->release
5. ce->ft->release
causes the following message on the console and aborts the application that links to libcimcClientSfcbLocal.so:
"--- Warning: fd is closed: Interrupted system call
spGetMsg receiving from 10 0-14 Bad address"
The abort eventually happens when libsfcBrokerCore.so calls the libc abort() function, producing the following core dump as shown by gdb:
Program terminated with signal 6, Aborted.
#0 0x2adf4874 in kill () from /lib/libc.so.0
(gdb) bt
#0 0x2adf4874 in kill () from /lib/libc.so.0
#1 0x2ae3c58c in raise () from /lib/libc.so.0
#2 0x2ae3f5d0 in abort () from /lib/libc.so.0
#3 0x2b7095b4 in spRcvMsg () from /usr/lib/libsfcBrokerCore.so.0
#4 0x2b709af8 in spRecvCtlResult () from /usr/lib/libsfcBrokerCore.so.0
#5 0x2b6a0630 in localConnect () from /usr/lib/libcimcClientSfcbLocal.so
#6 0x2b6a093c in CMPIConnect2 () from /usr/lib/libcimcClientSfcbLocal.so
#7 0x2b6a0a28 in CMPIConnect () from /usr/lib/libcimcClientSfcbLocal.so
Logged In: YES
user_id=2179806
Originator: YES
Please note that this does not occur on x86 with glibc. It seems to occur only with uClibc on a big-endian processor.
Logged In: YES
user_id=1939165
Originator: NO
Do you get any messages in your log file? There's only one abort() call in spRcvMsg, and it should be logging an error before the call is made.
client test program (sfcc)
Logged In: YES
user_id=1550470
Originator: NO
1) I am trying to recreate this problem. I have attached the test program that I am using to do so. I have run this program in a BASH for loop for 200 iterations, and have not seen the error you describe. Please check the attached source code and verify that it is similar to what you are using.
2) As Sean suggested, log output would be very useful. Please consult your syslog and attach a copy of the messages from sfcbd.
3) Trace output would also be useful. Please run "sfcbd -t 65536 2> sfcb-output" and recreate your problem scenario. Then attach the sfcb-output file to this tracker item along with the output for #2.
File Added: v2test_ei-ami.c
Logged In: YES
user_id=2180106
Originator: NO
Chris,
I believe the problem will not happen if the test util is run 200 times from the BASH shell, because test process termination cleans up everything. The connect()/enumerate()/release() sequence needs to be looped several times from within the test application.
Logged In: YES
user_id=1550470
Originator: NO
I have changed my test program so that the for loop executes inside of it as you suggested. I still do not see the problem occur. My virtual MIPS box is using glibc... perhaps the issue is with uclibc?
Could you please attach the information I requested in my previous post? This would help us.
Logged In: YES
user_id=2179806
Originator: YES
Chris,
It might be difficult to give the trace output, as the problem is occurring on an embedded system where we do not have much space, and the error occurs after the space is completely filled up. I'll attach the syslog output once the error occurs.
Logged In: YES
user_id=2179806
Originator: YES
The following is written to syslog when this occurs:
Aug 25 19:35:05 (none) lighttpd[9697]: --- Warning: fd is closed: Resource temporarily unavailable
Aug 25 19:35:05 (none) lighttpd[9697]: ### 0 ??? 0-0
Logged In: YES
user_id=2179806
Originator: YES
Sometimes the second line of output is different:
Nov 22 09:21:07 (none) lighttpd[8345]: --- Warning: fd is closed: Resource temporarily unavailable
Nov 22 09:21:07 (none) lighttpd[8345]: spGetMsg receiving from 14 0-14 Bad address
Logged In: YES
user_id=1939165
Originator: NO
When you say "space is completely filled up", do you mean space on the file system? Does the error occur only after space has filled completely? Does it ever occur when there is still space available?
Logged In: YES
user_id=2179806
Originator: YES
Mostly, yes. The error normally occurs after the space in /tmp gets filled up by the large volume of output from sfcb. Other areas are read-only, and frequent access to them is not advised. Recently, the error has been quite frequent and I'll try to capture it with the trace. Do you want the trace on the client app also (i.e. the app linking to libcimcClientSfcbLocal.so)? Or just on the sfcb server?
Logged In: YES
user_id=1550470
Originator: NO
manishtomar, I think what Sean was trying to determine is if the filling up of the filesystem is the cause of the error. Can you confirm this? Does the error occur when there is plenty of disk space available?
Logged In: YES
user_id=2179806
Originator: YES
What I meant is that if sfcbd is started with trace enabled, the trace log is not accurate, as it eats up all the space in the filesystem. We normally run it without trace, with enough space in the filesystem. The error still occurs.
Logged In: YES
user_id=1550470
Originator: NO
-Aug. 29 Braindump-
This error starts in msgqueue.c:spGetMsg() where the recvmsg() syscall returns 0. According to the glibc manpage for this function, this indicates that the peer closed the socket. The strange part is that errno is set to EINTR, which indicates that the system call was interrupted. If this is what happens, we should just retry; sfcb takes care of this, but the condition (immediately below recvmsg() in the if block) is never reached, since it expects EINTR only if recvmsg() returns an error (<0). uClibc's implementation of recvmsg() may be setting the return code incorrectly. Or perhaps recvmsg() isn't setting errno at all, and the value was left behind by some other function. I need to investigate what errno should be for all return cases.
Another possibility is that sfcb closed the socket on itself... perhaps some other thread running closed the wrong socket. Tracing on the msgqueue component should tell us if this happens; I've been tracing for the past 2 hours and haven't hit the error yet.
The crash is the result of an abort() call in a case statement at the end of spRcvMsg(). If EINTR was detected correctly previously, this case statement is not reached; instead sfcb retries until a successful return from recvmsg().
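To make the return-value cases discussed above concrete, here is a minimal sketch of a recvmsg() wrapper that consults errno only when the call returns -1 and retries on EINTR. It is not the sfcb code and not the attached patch; the names recv_retry and enum recv_status are hypothetical, and it assumes a stream socket, where a return of 0 means the peer closed the connection.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical result codes for illustration; not taken from sfcb. */
enum recv_status { RECV_OK = 0, RECV_PEER_CLOSED = 1, RECV_ERROR = 2 };

/*
 * Receive into buf from a stream socket, retrying when the call is
 * interrupted by a signal. errno is examined only when recvmsg()
 * returns -1; a return of 0 is reported as "peer closed" no matter
 * what value errno happens to hold from an earlier call.
 */
static enum recv_status recv_retry(int fd, void *buf, size_t len, ssize_t *got)
{
    struct iovec iov;
    struct msghdr msg;
    ssize_t n;

    assert(buf != NULL && got != NULL);

    for (;;) {
        /* Rebuild the header on every attempt. */
        memset(&msg, 0, sizeof msg);
        iov.iov_base = buf;
        iov.iov_len = len;
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;

        n = recvmsg(fd, &msg, 0);
        if (n > 0) {            /* data received */
            *got = n;
            return RECV_OK;
        }
        if (n == 0) {           /* orderly shutdown by the peer */
            *got = 0;
            return RECV_PEER_CLOSED;
        }
        if (errno == EINTR)     /* n == -1: interrupted, try again */
            continue;
        *got = -1;              /* n == -1: genuine error, errno is valid */
        return RECV_ERROR;
    }
}
```

A caller would treat RECV_PEER_CLOSED as a reason to clean up the connection, and would reserve any fatal handling for RECV_ERROR, where errno is known to have been set by the failing call.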
committed to HEAD.
This is LTC bug #47412.
File Added: 2049872-fd_is_closed.patch
This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 60 days (the time period specified by
the administrator of this Tracker).