#89 Gracefully handle providers doing abort()

1.4.8
fixed
nobody
None
None
Function
2014-03-27
2013-11-15
Klaus Kämpf
No

If a provider executes abort(), sfcb does not detect this and communication with this provider stalls.

Attached patch fixes this.

1 Attachments

Related

News: 2014/03/new-release-sfcb-148

Discussion

  • Dave Heller
    2014-01-06

    I agree, we should handle an abort in the same way as a segfault. The main reason being: the provider process should try to close any outstanding http request handlers before it dies. That's one of the things the current handleSigSegv() handler does.

    Sending a CMPI_RC_ERR_FAILED response to the req handler in turn sends the appropriate response back to the client. It also allows the req handler to complete normally so it does not hang in spRcvMsg().

    I didn't like the idea of reusing the existing handleSigSegv() because I wanted a response message that matches the signal. But also: when I tested a provider crash with SIGSEGV, I would get a "bounce", or second call to the handler immediately after the first, since handleSigSegv() calls abort() at the end and that abort() raises SIGABRT.

    I wrote a new generic handler that handles SIGABRT, SIGSEGV and SIGFPE. I used a flag to ensure the handler will be called only once. This seemed easier than trying to change the sigmask, since to prevent the "bounce" I think we need to change the sigmask of the main thread from within the signal handler thread, which is not straightforward. (Although I'm open to suggestions here.)
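
    For readers who want to picture the "called only once" flag, here is a minimal sketch. It is not the code from the commit: the names enter_once(), build_exit_message(), handle_fatal_signal() and install_fatal_handlers() are made up for illustration, the provider name is a placeholder, and the message goes to stderr instead of being sent as a CMPI_RC_ERR_FAILED response to the req handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Set on first entry so the handler body runs at most once. */
static volatile sig_atomic_t handler_entered = 0;

/* Returns 1 for the first caller and 0 for every later one. */
int enter_once(void)
{
    if (handler_entered)
        return 0;
    handler_entered = 1;
    return 1;
}

/* Name used in the response text for the three handled signals. */
const char *fatal_signal_name(int sig)
{
    switch (sig) {
    case SIGABRT: return "SIGABRT";
    case SIGSEGV: return "SIGSEGV";
    case SIGFPE:  return "SIGFPE";
    default:      return "unknown";
    }
}

/* Append src at buf[pos] without stdio (len must be >= 1); returns new length. */
size_t append_str(char *buf, size_t len, size_t pos, const char *src)
{
    while (*src && pos + 1 < len)
        buf[pos++] = *src++;
    buf[pos] = '\0';
    return pos;
}

/* Append a non-negative number in decimal; returns new length. */
size_t append_num(char *buf, size_t len, size_t pos, long n)
{
    char digits[24];
    size_t i = sizeof digits;
    digits[--i] = '\0';
    do {
        digits[--i] = (char)('0' + n % 10);
        n /= 10;
    } while (n > 0);
    return append_str(buf, len, pos, &digits[i]);
}

/* Build "*** Provider NAME(PID) exiting due to a SIGxxx signal". */
void build_exit_message(char *buf, size_t len, const char *provider,
                        long pid, int sig)
{
    size_t pos = 0;
    pos = append_str(buf, len, pos, "*** Provider ");
    pos = append_str(buf, len, pos, provider);
    pos = append_str(buf, len, pos, "(");
    pos = append_num(buf, len, pos, pid);
    pos = append_str(buf, len, pos, ") exiting due to a ");
    pos = append_str(buf, len, pos, fatal_signal_name(sig));
    append_str(buf, len, pos, " signal");
}

/* One handler for SIGABRT, SIGSEGV and SIGFPE. */
void handle_fatal_signal(int sig)
{
    char msg[160];
    ssize_t ignored;

    if (!enter_once())
        return; /* the "bounce" caused by abort() below ends up here */

    /* "ExampleProvider" is a placeholder; a real provider would report
     * its own name and send the text to the req handler, not to stderr. */
    build_exit_message(msg, sizeof msg, "ExampleProvider", (long)getpid(), sig);
    ignored = write(STDERR_FILENO, msg, strlen(msg));
    ignored = write(STDERR_FILENO, "\n", 1);
    (void)ignored;

    abort(); /* still terminates (and can dump core) after the bounce returns */
}

/* Route all three signals to the same handler. */
void install_fatal_handlers(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handle_fatal_signal;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGABRT, &sa, NULL);
    sigaction(SIGSEGV, &sa, NULL);
    sigaction(SIGFPE, &sa, NULL);
}
```

    The second entry simply returns, and abort() then goes on to terminate the process, so no sigmask changes are needed in this sketch. The message is assembled by hand here only to stay clear of stdio inside a signal handler.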

    To test, I added a new method to TestMethodProvider to simulate various provider problems. This was inspired by your test provider, Klaus! Hopefully it's clear enough for the moment, by looking at the code... you can call all of these:

    wbemcli cm http://localhost/root/cimv2:Sample_Method Misbehave.Action=hang
    wbemcli cm http://localhost/root/cimv2:Sample_Method Misbehave.Action=abort
    wbemcli cm http://localhost/root/cimv2:Sample_Method Misbehave.Action=fpe
    wbemcli cm http://localhost/root/cimv2:Sample_Method Misbehave.Action=segfault
    wbemcli cm http://localhost/root/cimv2:Sample_Method Misbehave.Action=sigabrt
    wbemcli cm http://localhost/root/cimv2:Sample_Method Misbehave.Action=sigfpe
    wbemcli cm http://localhost/root/cimv2:Sample_Method Misbehave.Action=sigsegv
    wbemcli cm http://localhost/root/cimv2:Sample_Method Misbehave.Action=sigkill
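
    A rough sketch of how such an Action string could be turned into a fault is below. This is an assumption about the shape of the test method, not the TestMethodProvider code itself: misbehave_signal_for() and misbehave() are hypothetical names, and every fault is produced with raise(), whereas the real provider presumably distinguishes e.g. "abort" (calling abort()) from "sigabrt" (sending the signal).

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Signal an action is expected to end in.
 * Returns 0 for "hang" (no signal) and -1 for an unknown action. */
int misbehave_signal_for(const char *action)
{
    if (strcmp(action, "hang") == 0)
        return 0;
    if (strcmp(action, "abort") == 0 || strcmp(action, "sigabrt") == 0)
        return SIGABRT;
    if (strcmp(action, "fpe") == 0 || strcmp(action, "sigfpe") == 0)
        return SIGFPE;
    if (strcmp(action, "segfault") == 0 || strcmp(action, "sigsegv") == 0)
        return SIGSEGV;
    if (strcmp(action, "sigkill") == 0)
        return SIGKILL;
    return -1;
}

/* Carry out the action; returns -1 for an unknown action. */
int misbehave(const char *action)
{
    int sig = misbehave_signal_for(action);

    if (sig < 0)
        return -1;
    if (sig == 0) {
        for (;;)
            pause(); /* "hang": never return to the caller */
    }
    raise(sig);      /* deliver the signal to this process */
    return 0;        /* reached only if a handler caught the signal and returned */
}
```

    SIGKILL cannot be caught, so that case always ends without any response from the provider itself.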
    

    And get responses like:

    $ wbemcli cm http://localhost/root/cimv2:Sample_Method Misbehave.Action=abort
    *
    * wbemcli: Cim: (1) CIM_ERR_FAILED: *** Provider TestMethodProvider(13499) exiting due to a SIGABRT signal
    *
    $ wbemcli cm http://localhost/root/cimv2:Sample_Method Misbehave.Action=fpe
    *
    * wbemcli: Cim: (1) CIM_ERR_FAILED: *** Provider TestMethodProvider(13812) exiting due to a SIGFPE signal
    *
    $ wbemcli cm http://localhost/root/cimv2:Sample_Method Misbehave.Action=segfault
    *
    * wbemcli: Cim: (1) CIM_ERR_FAILED: *** Provider TestMethodProvider(13829) exiting due to a SIGSEGV signal
    *
    

    With no hung http request handlers!

    I moved some of the response generation code into the signal handler. This is arguably a lot for a signal handler to do, but it greatly simplifies the code. I think it's pretty safe since I'm ensuring the handler can only be called once. In the worst case, if the handler fails and we don't generate the response, the client gets no reply and we may get a hung http req handler, which happens in many cases currently. And we may get some core dump that's slightly different from the one we would generate by calling the final abort().

     
  • Klaus Kämpf
    2014-01-07

    Awesome, thanks so much !

     
  • Dave Heller
    2014-01-12

    Commit [8f807c] for v1.4

     


  • Dave Heller
    2014-01-12

    • status: open --> pending
     
  • Dave Heller
    2014-02-07

    • Release: backlog --> 1.4.8
     
  • Dave Heller
    2014-03-27

    • Status: pending --> fixed