A case has been seen where syslog gets filled with thousands of messages like the one below:
May 3 15:37:48 SC-1 osaflogd: ncs_sel_obj_rmv_ind: recv failed - Socket operation on non-socket
The wrong file descriptor is probably being used when this happens. Looking at the code, some obvious improvements can be made:
- Whenever the file descriptors raise_obj and/or rmv_obj are closed, the file descriptors in the data structure should be overwritten with -1 to indicate that they are no longer valid. Relying on subsequent system calls to fail with EBADF is not a good idea, since the kernel may recycle the descriptor number for an unrelated file that is opened later. This might be what has happened in the syslog entry above.
- The function ncs_sel_obj_rmv_ind() should check if either file descriptor is less than zero, and if so, return immediately without trying to operate on the file descriptors. It may log to syslog in this case, but in order to avoid spamming the log it should make sure to log only once. This can be achieved by e.g. logging if the file descriptor is -1, and then changing it to -2 so that the next call will not log to syslog.
- If, after implementing the changes suggested above, recv() still fails for any reason other than EAGAIN, EWOULDBLOCK or EINTR, we should call osaf_abort() to generate a core dump. Errors like "Socket operation on non-socket" are an indication of a bug.
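
The three suggestions above could be sketched roughly as follows. This is not the actual OpenSAF code: the type sel_obj_sketch, the functions sel_obj_destroy_sketch() and sel_obj_rmv_ind_sketch(), the counter sketch_log_count and the return values are hypothetical names chosen for illustration, the use of non-blocking recv() on rmv_obj is an assumption, and plain abort() stands in for osaf_abort().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <syslog.h>
#include <unistd.h>

/* Hypothetical stand-in for the real selection object structure. */
typedef struct {
    int raise_obj;
    int rmv_obj;
} sel_obj_sketch;

enum {
    FD_CLOSED = -1,        /* descriptor closed, not yet reported */
    FD_CLOSED_LOGGED = -2  /* descriptor closed, already reported once */
};

/* Hypothetical counter, only here so the log-once behaviour can be observed. */
static int sketch_log_count = 0;

/* Close both descriptors and overwrite the stored values with -1, so that
 * later calls never touch a descriptor number the kernel may have recycled. */
static void sel_obj_destroy_sketch(sel_obj_sketch *obj)
{
    assert(obj != NULL);
    if (obj->raise_obj >= 0) close(obj->raise_obj);
    if (obj->rmv_obj >= 0) close(obj->rmv_obj);
    obj->raise_obj = FD_CLOSED;
    obj->rmv_obj = FD_CLOSED;
}

/* Drain pending indications. Returns the number of bytes drained, or -1 if
 * the object is no longer valid (return convention is an assumption). */
static int sel_obj_rmv_ind_sketch(sel_obj_sketch *obj)
{
    char buf[64];
    int drained = 0;

    assert(obj != NULL);
    if (obj->raise_obj < 0 || obj->rmv_obj < 0) {
        /* Log only on the first call after close, then mark with -2. */
        if (obj->raise_obj == FD_CLOSED || obj->rmv_obj == FD_CLOSED) {
            syslog(LOG_ERR, "rmv_ind called on closed selection object");
            sketch_log_count++;
            if (obj->raise_obj == FD_CLOSED) obj->raise_obj = FD_CLOSED_LOGGED;
            if (obj->rmv_obj == FD_CLOSED) obj->rmv_obj = FD_CLOSED_LOGGED;
        }
        return -1;
    }

    for (;;) {
        ssize_t n = recv(obj->rmv_obj, buf, sizeof(buf), MSG_DONTWAIT);
        if (n > 0) {
            drained += (int)n;
            continue;
        }
        if (n == 0) break;                 /* peer closed the connection */
        if (errno == EINTR) continue;      /* interrupted, retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK) break; /* nothing left */

        /* Any other error (e.g. ENOTSOCK) indicates a bug: dump core.
         * abort() is a stand-in for osaf_abort() in this sketch. */
        syslog(LOG_ERR, "recv failed: %s", strerror(errno));
        abort();
    }
    return drained;
}
```

Using two distinct negative values keeps the "log once" state inside the descriptor fields themselves, so no extra flag is needed in the structure. Aborting on unexpected errno values means a recycled or corrupted descriptor is caught at the first failing call instead of producing thousands of syslog entries.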