#399 Player SVN trunk: fixed error in server<->server connections

closed
G Biggs
Player (393)
5
2009-06-07
2009-03-20
No

Some time ago I wrote a message on plauyerstage-developers that I'm experiencing errors with server<->server communications. Finally I found some time for a deeper investigation and found that it had to be caused by recent WIN32-compatibility improvements. There are two solutions for this:
1. Use recv() instead of read() on unixes, as in other parts of Player (see libplayerc code) - that is what I'm doing in my patch (whenever WIN32 is not defined); note that communication errors are indicated different way on different unixes (different sets of errno constants), but one thing is sure: if communication is broken, recv() returns 0 (we can assume then, -1 means no data available).
2. Do not use non-blocking sockets, in this cause read() behaves much the same way on WIN32 and unixes. We may use select() or poll() instead of relying on non-blocking communication, however, I don't know how it should be done on WIN32. Therefore I have choosen solution 1 which was easier to implement.
This solved my server<->server communication problems.

Discussion

  • Paul Osmialowski

    • labels: --> Player
    • assigned_to: nobody --> gerkey
     
  • Toby Collett

    Toby Collett - 2009-03-22
    • assigned_to: gerkey --> gbiggs
     
  • G Biggs

    G Biggs - 2009-03-31

    recv() works much the same on both Windows and POSIX, irrespective of if the socket is non-blocking or not. I think we should be able to change the read() calls into recv() calls. playertcp.cc uses recv() instead of read() on non-blocking sockets. We then just need to alter the result and errno checks to handle the various cases (0 = closed cleanly but probably unexpectedly, -1 = some kind of error, possibly just "no data yet", etc).

     
  • G Biggs

    G Biggs - 2009-03-31

    I've changed all the read() calls into recv() calls, with appropriate error checking. Try it out and see if the problem is fixed or not.

     
  • Paul Osmialowski

    libplayertcp/remote_driver.cc - with additional non-zero errno check

     
  • Paul Osmialowski

    Recent changes didn't solve the problem. As I wrote once
    before: errno setting for blocking differs from Unix to Unix and recv()
    call have differ in part where errno values are listed. I've added one
    debugging message in remote_driver.cc code and look what I saw on Solaris:
    invoking player_driver_init()...
    success
    listening on 6665
    Listening on ports: 6665
    error : recv() failed: Error 0
    warning : errno: 0, ErrNo: 0, ERRNO_EAGAIN: 11, EAGAIN: 11, EINPROGRESS:
    150, EWOULDBLOCK: 11
    error : error reading message header from remote device
    error : unable to subscribe to position2d device
    error : Driver failed to Setup (-1)

    On Solaris, recv() man page has no EAGAIN listed at all!

    Proposed patch (slightly based on what I saw in client.c file) adds additional checks for errno values (most important is check if errno is non-zero for every recv() and send() - however uplink buffer overrun that would block send() is less likely). I guess errno == 0 (Error 0 message) should always be treated as no_error. Similar thing needs to be done for client.c.

     
  • G Biggs

    G Biggs - 2009-04-02

    I suspected EWOULDBLOCK might be involved, but wasn't sure because on Linux it's the same as EAGAIN.

    Just want to clarify something: why are you checking for the EINPROGRESS error? The socket is non-blocking, so if this occurs it's probably indicative of something really weird going on rather than a geuine "in progress" error.

    Good catch on the PLAYER_MSG calls.

     
  • Paul Osmialowski

    @EINPROGRESS: I was wondering why client side works when it is done similar way. What I saw in client.c was the use of EWOULDBLOCK and EINPROGRESS checks. So I copied this to remote_driver.cc. Unfortunately it didn't help, so I added mentioned PLAYER_WARN6 for a while. Thad showed me, on Solaris errno is 0 if recv() would block.I don't know how it would be on other unixex (next week I plan to compile the most recent SVN snapshot on my QNX 6.2 box), so I decided to add all checks in one 'if'. I was wondering how come client side works for me anyway. Looks like client side uses blocking sockets and poll() calls for avoiding blocking, compare:
    remote_driver.cc non-blocking socket settings:
    if(fcntl(this->sock, F_SETFL, O_NONBLOCK) == -1)
    {
    PLAYER_ERROR1("fcntl() failed: %s", strerror(errno));
    close(this->sock);
    return(-1);
    }
    client.c settings:
    if ((old_flags = fcntl(client->sock, F_GETFL)) < 0)
    {
    PLAYERC_ERR1("error getting socket flags [%s]", strerror(errno));
    return -1;
    }
    if (fcntl(client->sock, F_SETFL, old_flags & ~O_NONBLOCK) < 0)
    {
    PLAYERC_ERR1("error setting socket non-blocking [%s]", strerror(errno));
    return -1;
    }
    (it seems like it is good idea to use old_flags in remote_driver.cc too)
    In client.c O_NONBLOCK is bitwise inverted, so no matter what error message says ("error setting socket non-blocking") this fcntl() sets blocking sockets. As we know, blocking sockets does not cause any troubles with cross-systems programming, as long as we know how to manage with select()/poll() on different architectures.

     
  • G Biggs

    G Biggs - 2009-04-03

    I've added checks for errno == 0 and EWOULDBLOCK. Since the socket is non-blocking, I've left out EINPROGRESS, as I don't think it's necessary, nor a good idea to potentially hide a different error. If it turns out to be necessary, we can add it then.

    Looking at playertcp.cc, we may want to apply the same changes there once we have this sorted out, as it uses similar logic for receiving.

     
  • Paul Osmialowski

    I don't know if playertcp.cc should be fixed, would be nice probably, however, I was testing current SVN snapshot for about half an hour running two player servers on different machines (Linux x86_64 and Solaris 10 UltraSPARC), even swapping server roles (which server subscribes devices from which), sending command and receiving data (position2d and dio mostly) and everything just worked fine.

     
  • Toby Collett

    Toby Collett - 2009-06-06

    Where are we up to with the stuff? can we close this ticket yet?

     
  • Paul Osmialowski

    It works for me now, so I guess it can be closed as fixed.

     
  • G Biggs

    G Biggs - 2009-06-07

    The changes have been made in SVN trunk.
    Thank you for your contribution.

     
  • G Biggs

    G Biggs - 2009-06-07
    • status: open --> closed
     

Log in to post a comment.

Get latest updates about Open Source Projects, Conferences and News.

Sign up for the SourceForge newsletter:

JavaScript is required for this form.





No, thanks