Menu

#1834 accept() failed in openhpi daemon

3.5.0
closed-fixed
nobody
OpenHPI Daemon
5
2014-10-06
2014-06-06
No

When openhpi service is restarted, the below errors are observed in syslog:
"openhpid: transport: strmsock.cpp:558: accept failed."
and
"openhpid: transport: strmsock.cpp:296: error while sending message"

The error messages go away if attached changes are applied to the sources.

1 Attachments

Discussion

  • dr_mohan

    dr_mohan - 2014-06-18

    The above error messages seems to be coming from different location than where the code changes are. Some of the error messages are not showing up when the process is killed. There are few questions on the patch. Bugs are usually fixed on the trunk.

    1. When the accept call failed, it used to give an error message and break out of the loop. Now it does not give the error message and continuous with the loop after a sleep. Why? Could the error message be avoided the first time? Could we continue only few times and break out of the loop?
    2. Generally, trying to understand what the patch is trying to accomplish
    3. The vaiable, cStreamSock::eWaitCc wc, in server.cpp is not getting used.
    4. If the error message is appearing only once but continuing without problem, is there a way to avoid the error message only the first time.
     
    • Vladimir Pavlov

      Vladimir Pavlov - 2014-06-18

      I wouldn't like call my changes in the source as "patch". It is rather workaround.
      The problem is persistent appearance of the error messages during openhpid restart.
      So I need patch which doesn't show error messages only when openhpi daemon is stopping. To manage it I added error recognition in cStreamSock::ReadMsg() from strmsock.cpp if peer closed connection in ahead of server finished reading (via payload_len and such decision is an ugly hack). Also I removed critical error if the one appears because of timeout. Timeout error processing was moved to server.cpp. I hope it is answer for first, second and partly fourth questions.
      Regarding question 3: O, yes, in service_thread(). I forgot to delete unused declaration. But in oh_server_run() this variable is used.

       
  • dr_mohan

    dr_mohan - 2014-06-18
     
  • Praveen

    Praveen - 2014-07-04

    1) Tested the patch by running the clients on different machine rather than where the openhpid is running. Without the patch we get the two errors transport: CRIT: strmsock.cpp:558: accept failed. and transport: CRIT: strmsock.cpp:296: error while sending message.
    2) When I applied the patch and tested it, the first error(transport: CRIT: strmsock.cpp:558: accept failed) goes off but the second one(transport: CRIT: strmsock.cpp:296: error while sending message) appears sometime. Second one does not appear all the time but it comes sometime.
    3) Variable 'wc' is declared in service_thread() but not used in this function, we can remove this declaration from the patch.
    4) The message [CRIT( "select failed" );] has been removed , I feel we can retain this one.

     
  • dr_mohan

    dr_mohan - 2014-07-07

    We will checkin the attached patch, if it is ok. Please review

     
  • Anton Pak

    Anton Pak - 2014-07-07

    Thanks for the patch!
    Several comments and questions:

    o) Just whining: let's have {} even for a single line if construct!

    o) Are we trying to handle a transient accept error?

    o) What if a client drops connection attempt between Wait and Accept? Looks like a race condition.

    o) "Select failed" logging in server.cpp may be confusing. We call sock->Wait() and wait potentially can be implemented with another mechanism.

    o) Assigning "payload_len = (uint32_t)(-1)" looks dangerous.
    Can we assign it to zero instead?

     
  • Praveen

    Praveen - 2014-07-16

    Thanks Anton for reviewing the patch and comments.

    We will take of care of first and last point in the patch.

    For second point, Indeed Vladimir is trying to avoid two error messages as described in the bug.

    Regarding third point, we have not seen failure of Accept() call so far. We will have a check with Vladimir if he has faced any time Accept() call failure. And in case Aceept() call does not fail there will be no race condition. In case of Accept() call fails, could you please suggest us that either should we continue in the loop or break out of the loop.

    On the fourth point, could you please give us some idea on implementation of different mechanism for Wait() instead of calling sock->Wait() and wait?

     
  • dr_mohan

    dr_mohan - 2014-07-31

    Anton,

    The attached is the modified patch for the defect. Looks like the higher level error message could be completely removed. We have not seen the accept call failure though

     
  • Anton Pak

    Anton Pak - 2014-08-01

    Mohan, thanks!

    My points:

    • let's not mix tabs and spaces in indentation :)
    • errno is no cross-platform. See http://msdn.microsoft.com/en-us/library/windows/desktop/ms740121(v=vs.85).aspx
    • the same about <unistd.h></unistd.h>
    • "Select failed" message now reports about sock->Wait() error. Let's change it to "Waiting on server socker failed"
    • suggest to have {} even for a single line "if" action.
     
  • Praveen

    Praveen - 2014-08-14

    Anton,

    Please find the attached modified patch which has addressed most of your above mentioned points.

     
  • Anton Pak

    Anton Pak - 2014-08-14

    Thanks!
    I have only few points:

    • Let's not mix tabs and spaces in indentation. The whole file is space-indented.
    • Let's use cross-platform g_usleep instead of Unix sleep.
    • Also let's remove Unix-specific errno.h and unistd.h
    • Let's remove "// select timeout" line. I think the code that follows that file is self-explanatory.
     
  • Praveen

    Praveen - 2014-08-22

    Anton,

    I have attached the modified patch(which addresses all of your above points).

     
  • Anton Pak

    Anton Pak - 2014-08-22

    Cool!
    Now I have only two comments.

    1) please use spaces instead of tabs for indentation in "Error accepting server socket." block.

    2) In the block:
    g_usleep( 1000000 );
    if (stop) {
    break;
    }

    let's have "stop" check first and then g_usleep.

     
  • Praveen

    Praveen - 2014-08-25

    Thanks Anton for comments.

    I have addressed the above two points in the attached patch.

     
  • Anton Pak

    Anton Pak - 2014-08-25

    Great! Looks perfect for me.

     
  • Hemantha Beecherla

    Fixed in revision 7588.

     
  • Hemantha Beecherla

    • status: open --> closed-fixed
    • 3.5.0: 3.2.1 --> 3.5.0
     
  • dr_mohan

    dr_mohan - 2014-10-06
    • labels: --> OpenHPI Daemon
     
MongoDB Logo MongoDB