#1310 "--- Warning: fd is closed" message from binary intf app

Function
closed-fixed
sfcb (1090)
7
2008-12-02
2008-08-13
No

Repeatedly invoking the following sequence of calls
1. _Create_SfcbLocal_Env returns ce
2. ce->ft->connect returns cc
3. cc->ft->enumerateInstances or cc->ft->associators or any other intrinsic call
4. cc->ft->release
5. ce->ft->release

causes the following message on the console and aborts the application that links to libcimcClientSfcbLocal.so:

"--- Warning: fd is closed: Interrupted system call
spGetMsg receiving from 10 0-14 Bad address"

Eventually, libsfcBrokerCore.so calls the libc abort() function, producing a core dump with the following gdb backtrace:

Program terminated with signal 6, Aborted.
#0 0x2adf4874 in kill () from /lib/libc.so.0
(gdb) bt
#0 0x2adf4874 in kill () from /lib/libc.so.0
#1 0x2ae3c58c in raise () from /lib/libc.so.0
#2 0x2ae3f5d0 in abort () from /lib/libc.so.0
#3 0x2b7095b4 in spRcvMsg () from /usr/lib/libsfcBrokerCore.so.0
#4 0x2b709af8 in spRecvCtlResult () from /usr/lib/libsfcBrokerCore.so.0
#5 0x2b6a0630 in localConnect () from /usr/lib/libcimcClientSfcbLocal.so
#6 0x2b6a093c in CMPIConnect2 () from /usr/lib/libcimcClientSfcbLocal.so
#7 0x2b6a0a28 in CMPIConnect () from /usr/lib/libcimcClientSfcbLocal.so

Discussion

  • Manish Tomar

    Manish Tomar - 2008-08-13

    Logged In: YES
    user_id=2179806
    Originator: YES

    Please note that this does not appear on x86 with glibc. It seems to appear only on uClibc with a big-endian processor.

     
  • Chris Buccella

    Chris Buccella - 2008-08-20
    • assigned_to: buccella --> smswehla
     
  • Chris Buccella

    Chris Buccella - 2008-08-21
    • priority: 5 --> 7
     
  • Sean Swehla

    Sean Swehla - 2008-08-21

    Logged In: YES
    user_id=1939165
    Originator: NO

    Do you get any messages in your log file? There's only one abort() call in spRcvMsg, and it should be logging an error before the call is made.

     
  • Chris Buccella

    Chris Buccella - 2008-08-22

    client test program (sfcc)

     
  • Chris Buccella

    Chris Buccella - 2008-08-22

    Logged In: YES
    user_id=1550470
    Originator: NO

    1) I am trying to recreate this problem. I have attached the test program that I am using to do so. I have run this program in a BASH for loop for 200 iterations, and have not seen the error you describe. Please check the attached source code and verify that it is similar to what you are using.

    2) As Sean suggested, log output would be very useful. Please consult your syslog and attach a copy of the messages from sfcbd.

    3) Trace output would also be useful. Please run "sfcbd -t 65536 2> sfcb-output" and recreate your problem scenario. Then attach the sfcb-output file to this tracker item along with the output for #2.
    File Added: v2test_ei-ami.c

     
  • Chris Buccella

    Chris Buccella - 2008-08-22
    • status: open --> open-accepted
     
  • Venkatesh Ramamurthy

    Logged In: YES
    user_id=2180106
    Originator: NO

    Chris,
    I believe the problem will not occur if the test utility is run 200 times from the BASH shell, because process termination cleans up everything after each run. The connect()/enumerate()/release() sequence needs to be looped several times from within the test application.

     
  • Chris Buccella

    Chris Buccella - 2008-08-23

    Logged In: YES
    user_id=1550470
    Originator: NO

    I have changed my test program so that the for loop executes inside of it as you suggested. I still do not see the problem occur. My virtual MIPS box is using glibc... perhaps the issue is with uclibc?

    Could you please attach the information I requested in my previous post? This would help us.

     
  • Manish Tomar

    Manish Tomar - 2008-08-23

    Logged In: YES
    user_id=2179806
    Originator: YES

    Chris,
    It might be difficult to provide the trace output, as the error is occurring on an embedded system where we do not have much space, and the error occurs only after the space is completely filled up. I'll attach the syslog output once the error occurs.

     
  • Manish Tomar

    Manish Tomar - 2008-08-25

    Logged In: YES
    user_id=2179806
    Originator: YES

    The following is written to syslog when this occurs:

    Aug 25 19:35:05 (none) lighttpd[9697]: --- Warning: fd is closed: Resource temporarily unavailable
    Aug 25 19:35:05 (none) lighttpd[9697]: ### 0 ??? 0-0

     
  • Manish Tomar

    Manish Tomar - 2008-08-26

    Logged In: YES
    user_id=2179806
    Originator: YES

    Sometimes the second line of output is different:

    Nov 22 09:21:07 (none) lighttpd[8345]: --- Warning: fd is closed: Resource temporarily unavailable
    Nov 22 09:21:07 (none) lighttpd[8345]: spGetMsg receiving from 14 0-14 Bad address

     
  • Sean Swehla

    Sean Swehla - 2008-08-26

    Logged In: YES
    user_id=1939165
    Originator: NO

    When you say "space is completely filled up", do you mean space on the file system? Does the error occur only after space has filled completely? Does it ever occur when there is still space available?

     
  • Manish Tomar

    Manish Tomar - 2008-08-26

    Logged In: YES
    user_id=2179806
    Originator: YES

    Mostly, yes. The error normally occurs after the space in /tmp is filled up by the large volume of output from sfcb. Other areas are read-only and should not be accessed frequently. Recently, the error has been quite frequent, and I'll try to capture it with the trace. Do you want the trace from the client app as well (i.e. the app linking to libcimcClientSfcbLocal.so)? Or just from the sfcb server?

     
  • Chris Buccella

    Chris Buccella - 2008-08-27

    Logged In: YES
    user_id=1550470
    Originator: NO

    manishtomar, I think what Sean was trying to determine is if the filling up of the filesystem is the cause of the error. Can you confirm this? Does the error occur when there is plenty of disk space available?

     
  • Manish Tomar

    Manish Tomar - 2008-08-28

    Logged In: YES
    user_id=2179806
    Originator: YES

    What I meant is that if sfcbd is started with trace enabled, the trace log is not accurate, because it eats up all the space in the filesystem. We normally run it without trace, with enough space available in the filesystem. The error still occurs.

     
  • Chris Buccella

    Chris Buccella - 2008-08-30

    Logged In: YES
    user_id=1550470
    Originator: NO

    -Aug. 29 Braindump-

    This error starts in msgqueue.c:spGetMsg(), where the recvmsg() syscall returns 0. According to the glibc manpage for this function, a return of 0 indicates that the peer closed the socket. The strange part is that errno is set to EINTR, which indicates that the system call was interrupted. If that is what happens, we should just retry. sfcb does handle this case, but the check (immediately below recvmsg() in the if block) is never reached, since it expects EINTR only when recvmsg() returns an error (<0). uClibc's implementation of recvmsg() may be setting the return code incorrectly. Or perhaps recvmsg() isn't setting errno at all, and the value was set by some other function. I need to investigate what errno should be for all return cases.

    Another possibility is that sfcb closed the socket on itself... perhaps some other thread running closed the wrong socket. Tracing on the msgqueue component should tell us if this happens; I've been tracing for the past 2 hours and haven't hit the error yet.

    The crash is the result of an abort() call in a case statement at the end of spRcvMsg(). If EINTR were detected correctly earlier, this case statement would not be reached; instead sfcb would retry until recvmsg() returns successfully.
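
The return-value handling discussed in the braindump above can be illustrated with a minimal C sketch. It is not the committed patch (the contents of 2049872-fd_is_closed.patch are not shown in this thread) and it is not sfcb's spGetMsg(); the names recv_status, recv_retry, and recv_retry_selfcheck are hypothetical. It only shows the conventional pattern: consult errno solely when recvmsg() returns a negative value, retry on EINTR, and treat a return of 0 as the peer closing the socket even if errno still holds a stale EINTR.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch; these are not sfcb's own. */
enum recv_status { RECV_OK = 0, RECV_PEER_CLOSED = 1, RECV_ERROR = -1 };

/*
 * Receive one message into buf, retrying when the call is interrupted.
 * errno is consulted only when recvmsg() reports failure (< 0). A return
 * of 0 is treated as the peer having closed the socket, whatever errno
 * holds, because errno may be left over from an earlier call.
 */
enum recv_status recv_retry(int fd, void *buf, size_t len, ssize_t *got)
{
    struct iovec iov;
    struct msghdr msg;
    ssize_t n;

    assert(buf != NULL && got != NULL);

    for (;;) {
        iov.iov_base = buf;
        iov.iov_len = len;
        memset(&msg, 0, sizeof msg);
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;

        n = recvmsg(fd, &msg, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: try again */
            return RECV_ERROR;     /* genuine failure: caller inspects errno */
        }
        *got = n;
        return n == 0 ? RECV_PEER_CLOSED : RECV_OK;
    }
}

/* Self-check over a socketpair: one message, then the peer closes. */
int recv_retry_selfcheck(void)
{
    int sv[2];
    char buf[16];
    ssize_t got = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    if (write(sv[1], "ping", 4) != 4)
        return -2;

    errno = EINTR;  /* a stale errno must not affect a successful read */
    if (recv_retry(sv[0], buf, sizeof buf, &got) != RECV_OK || got != 4)
        return -3;

    close(sv[1]);
    errno = EINTR;  /* a stale errno must not turn end-of-stream into a retry */
    if (recv_retry(sv[0], buf, sizeof buf, &got) != RECV_PEER_CLOSED)
        return -4;

    close(sv[0]);
    return 0;
}
```

Calling recv_retry_selfcheck() from a small main() should return 0 on a system whose recvmsg() follows this convention. A nonzero result on the affected uClibc target would point at the C library rather than at the caller.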

     
  • Chris Buccella

    Chris Buccella - 2008-10-02

    committed to HEAD.

    This is LTC bug #47412.
    File Added: 2049872-fd_is_closed.patch

     
  • Chris Buccella

    Chris Buccella - 2008-10-02
    • status: open-accepted --> pending-fixed
     
  • SourceForge Robot

    • status: pending-fixed --> closed-fixed
     
  • SourceForge Robot

    This Tracker item was closed automatically by the system. It was
    previously set to a Pending status, and the original submitter
    did not respond within 60 days (the time period specified by
    the administrator of this Tracker).

     
