#928 base: Selection object fails due to re-cycled file descriptor

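Milestone: 4.6.FC
Status: accepted
Owner: nobody
Labels: None
Type: defect
Component: base
Part: -
Version: 4.4.0
Severity: minor
Updated: 2014-10-01
Created: 2014-05-28
Creator: Anders Widell
Private: No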

A case has been seen where syslog gets filled with thousands of messages like the one below:

May 3 15:37:48 SC-1 osaflogd[7643]: ncs_sel_obj_rmv_ind: recv failed - Socket operation on non-socket

Probably the wrong file descriptor is being used here when this happens. When looking at the code, there are some obvious improvements that can be made:

  • Whenever the file descriptors raise_obj and/or rmv_obj are closed, the file descriptors in the data structure should be overwritten with -1 to indicate that they are no longer valid. Relying on subsequent system calls to fail with EBADF is not a good idea, since the file descriptor number may be recycled for something else. This might be what happened in the syslog entry above.
  • The function ncs_sel_obj_rmv_ind() should check whether either file descriptor is less than zero and, if so, return immediately without trying to operate on the file descriptors. It may log to syslog in this case, but to avoid spamming the log it should log only once. This can be achieved by, e.g., logging if the file descriptor is -1 and then changing it to -2 so that the next call will not log to syslog.
  • If, after implementing the changes suggested above, recv() still fails for any reason other than EAGAIN, EWOULDBLOCK or EINTR, we should call osaf_abort() to generate a core dump. Errors like "Socket operation on non-socket" are an indication of a bug.
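
The three suggestions above could be sketched roughly as follows. This is only an illustration, not the actual OpenSAF code: the type sel_obj_t, the functions sel_obj_create(), sel_obj_destroy() and sel_obj_rmv_ind(), and the FD_CLOSED constants are hypothetical names, the socket pair is assumed to carry one byte per indication, and abort() stands in for osaf_abort().

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <syslog.h>
#include <unistd.h>

/* Hypothetical stand-in for the selection object; the real type may differ. */
typedef struct {
    int raise_obj;
    int rmv_obj;
} sel_obj_t;

enum { FD_CLOSED = -1, FD_CLOSED_LOGGED = -2 };

int sel_obj_create(sel_obj_t *o)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    o->raise_obj = sv[0];
    o->rmv_obj = sv[1];
    return 0;
}

/* Close both descriptors and overwrite them with -1 so that a recycled
 * descriptor number can never be used through this object. */
void sel_obj_destroy(sel_obj_t *o)
{
    if (o->raise_obj >= 0)
        close(o->raise_obj);
    if (o->rmv_obj >= 0)
        close(o->rmv_obj);
    o->raise_obj = FD_CLOSED;
    o->rmv_obj = FD_CLOSED;
}

/* Returns the number of bytes drained, or -1 if the object is closed. */
int sel_obj_rmv_ind(sel_obj_t *o)
{
    char buf[32];
    int count = 0;

    if (o->raise_obj < 0 || o->rmv_obj < 0) {
        /* Log once: -1 means "closed, not yet logged", -2 means "logged". */
        if (o->raise_obj == FD_CLOSED || o->rmv_obj == FD_CLOSED) {
            syslog(LOG_ERR, "sel_obj_rmv_ind: called on closed selection object");
            if (o->raise_obj == FD_CLOSED)
                o->raise_obj = FD_CLOSED_LOGGED;
            if (o->rmv_obj == FD_CLOSED)
                o->rmv_obj = FD_CLOSED_LOGGED;
        }
        return -1;
    }

    for (;;) {
        ssize_t n = recv(o->rmv_obj, buf, sizeof buf, MSG_DONTWAIT);

        if (n > 0) {
            count += (int)n;
            continue;
        }
        if (n == 0)
            return count;               /* peer end was shut down */
        if (errno == EINTR)
            continue;                   /* interrupted, retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return count;               /* nothing more to drain */
        abort();                        /* any other error indicates a bug */
    }
}
```

With this shape, a call made after sel_obj_destroy() returns -1 without touching any descriptor, and only the first such call writes to syslog.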

Related

Tickets: #928
Wiki: ChangeLog-4.3.3
Wiki: ChangeLog-4.4.1

Discussion

  • Ramesh
    2014-06-03

    The primary cause I see for this scenario is that the fds were not handled properly within the process execution flow.

    How it can happen:
    An fd is closed twice in a particular flow; that is, after the closed fd number has been reallocated for some other use, the initial flow closes the same fd again.

    Similar behavior was described in ticket #147.

    Yes, in this situation it is better to call osaf_abort() to generate a core dump.
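
As a rough illustration of how a recycled descriptor number can produce the "Socket operation on non-socket" error, the sketch below closes one end of a socket pair, lets open() reuse the number for a regular file descriptor, and then calls recv() on the stale number. The function name recv_errno_on_recycled_fd() is hypothetical, and the sketch assumes a single-threaded process on a system where /dev/null exists and new descriptors receive the lowest free number.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns the errno reported by recv() on a stale descriptor number, or -1
 * if the setup failed or the number was not reused. */
int recv_errno_on_recycled_fd(void)
{
    int sv[2];
    char buf[1];
    int result = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    int stale = sv[0];  /* the original flow still remembers this number */
    close(sv[0]);       /* after this, the number is free for reuse */

    /* Another part of the process opens a file and may be handed the
     * same descriptor number. */
    int file_fd = open("/dev/null", O_RDONLY);

    if (file_fd == stale) {
        /* The original flow uses its stale number: it is a valid descriptor
         * again, so EBADF is not returned, but it is no longer a socket. */
        if (recv(stale, buf, sizeof buf, MSG_DONTWAIT) < 0)
            result = errno;
    }

    if (file_fd >= 0)
        close(file_fd);
    close(sv[1]);
    return result;
}
```

If the original flow called close() on the stale number instead of recv(), it would silently close the other user's descriptor, which is the double-close case described above.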

     
  • Ramesh
    2014-06-20

    Fix pushed to default, 4.4 and 4.3.

    changeset: 5429:19bbcda1b15a
    tag: tip
    parent: 5426:89f247c08c4e
    user: Ramesh ramesh.betham@oracle.com
    date: Fri Jun 20 18:41:01 2014 +0530
    summary: base: Corrected handling of raise_obj, rmv_obj file descriptors of Selection object [#928]

    changeset: 5428:3ddbecc11a98
    branch: opensaf-4.4.x
    parent: 5414:dba5f3bbbf6f
    user: Ramesh ramesh.betham@oracle.com
    date: Fri Jun 20 18:41:01 2014 +0530
    summary: base: Corrected handling of raise_obj, rmv_obj file descriptors of Selection object [#928]

    changeset: 5427:4c1bea3021ba
    branch: opensaf-4.3.x
    parent: 5413:02e77b43ee5b
    user: Ramesh ramesh.betham@oracle.com
    date: Fri Jun 20 18:36:24 2014 +0530
    summary: base: Corrected handling of raise_obj, rmv_obj file descriptors of Selection object [#928]

    Clean-up of the SEL_OBJ macros on the "default" branch is pending and will be pushed subsequently.

    Thanks,
    Ramesh.

     

    Related

    Tickets: #928

  • Ramesh
    2014-06-25

    The following ER messages show that similar corrections are needed in "rda" as well.

    ...............
    Jun 10 09:47:58 SC-1 osaflogd[7691]: ER recv: PCSRDA_RC_IPC_RECV_FAILED: rc=88-Socket operation on non-socket
    Jun 10 09:47:58 SC-1 osaflogd[7691]: ncs_sel_obj_rmv_ind: recv failed - Socket operation on non-socket
    ....................

    Thanks,
    Ramesh.

     
  • Ramesh
    2014-07-14

    The fix has already been pushed to the 4.3, 4.4 and default branches.

    The SEL_OBJ macros cleanup will be done on the default (4.5) branch.

     
  • Ramesh
    2014-07-14

    • status: unassigned --> accepted
    • Milestone: 4.3.3 --> 4.5.FC
     
  • Anders Widell
    2014-08-15

    • Milestone: 4.5.FC --> 4.5.0
     
  • Mathi Naickan
    2014-10-01

    • Milestone: 4.5.0 --> 4.6.FC