
#216 fails to detect loss of connection

important
open
nobody
5
2001-05-01
2001-05-01
Mark Whitis
No

Gabber 0.8.2 tends not to detect a loss of connection,
such as when a dialup connection times out and
has to redial (getting a different IP address).

I thought the Jabber server and client were supposed to
send each other blank lines occasionally to determine
whether they are still in contact: each side times out if
no such message has been received, and sending one
will provoke "connection reset by peer"
or "no route to host" if the connection is gone.

gabber not only does not reconnect to the server, it
doesn't even let you know you aren't online. The
client/server connection could have been broken hours
ago and you have no way of knowing it.

Discussion

  • Julian Missig

    Julian Missig - 2001-05-01
    • labels: --> gabber backend
    • milestone: --> important
     
  • Julian Missig

    Julian Missig - 2001-05-01

    Logged In: YES
    user_id=9539

    I'm fairly certain Gabber's TCP code *does* send a blank
    space every now and then to be able to detect whether it's
    still connected or not.

    The last time I had any reports of this, or experienced it
    myself, was before we figured out that we needed to remove
    all optimizations when compiling Gabber without SSL
    support... but perhaps g++ is also miscompiling it with
    SSL support...

     
  • Konrad Podloucky

    Logged In: YES
    user_id=36533

    Yep, Gabber definitely sends a blank every 15 seconds to
    check if the connection's still there. What does 'netstat'
    tell you about the connection to the Jabber server in these
    cases?

     
  • Nobody/Anonymous

    Logged In: NO

    still happens as of 0.8.6 - you can be disconnected, and
    gabber will just display the same status as of the time you
    disconnected.

     
  • Julian Missig

    Julian Missig - 2002-02-01

    Logged In: YES
    user_id=9539

    I think you can assume any bug not marked closed still
    happens. Please send a patch.

     
  • Mark Whitis

    Mark Whitis - 2002-02-02

    Logged In: YES
    user_id=131590

    To reproduce this, try "ifconfig eth0 down" if you have a permanent connection. That should be close enough to
    the conditions of a dialup connection going down.
    Gabber blindly reports that it is online hours after
    the connection has dropped.

    You cannot rely on the TCP/IP stack to tell you when this happens. That is not its job. TCP is intended to deliver over flaky networks and will retransmit until the cows come home. By default, TCP will wait 2 hours before it sends the first keep-alive probe. You can change this with sysctl(2) but only on a system-wide basis, rather than per connection.
    On Linux kernel 2.4.3:
    tcp_keepalive_time=7200
    tcp_keepalive_probes=9
    tcp_keepalive_intvl=75
    I.e., the system will wait 2 hours, then send up to 9 keepalive probes at 75-second intervals, before dropping the connection. These timeouts are intentionally very long so that, for example, your 600MB file transfer doesn't go splat just because there is a temporary network outage upstream.
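
A rough worked example of the arithmetic above, using only the three values quoted in this comment. It assumes keepalive is actually enabled on the socket (SO_KEEPALIVE) and that every probe goes unanswered; the function name is made up for illustration.

```python
# Values quoted in the comment above (Linux 2.4.3 defaults).
tcp_keepalive_time = 7200    # seconds of idle time before the first probe
tcp_keepalive_probes = 9     # unanswered probes before the connection is dropped
tcp_keepalive_intvl = 75     # seconds between probes


def seconds_until_drop(idle, probes, interval):
    """Idle time plus the time spent on unanswered probes (hypothetical helper)."""
    return idle + probes * interval


total = seconds_until_drop(tcp_keepalive_time, tcp_keepalive_probes, tcp_keepalive_intvl)
hours, rest = divmod(total, 3600)
minutes, seconds = divmod(rest, 60)
print(f"{total} s = {hours} h {minutes} min {seconds} s")
```

So with these defaults a silently dead connection can sit in "ESTABLISHED" for a little over two hours before the kernel gives up on it.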
    There is one case, however, in which TCP will typically tell you the connection has been lost: when it can actually communicate with the other side, and the other side no longer recognizes the connection and sends a RST packet. This could happen when you reestablish a dialup connection but your IP address has changed.

    Netstat still shows "ESTABLISHED". The TCP stack doesn't complain because a lost route to a host is often only a temporary condition (keep alive might
    make it eventually report a problem).

    However, if each end sends a blank line once every 30 seconds and resets a 2-minute timer every time it receives incoming data, it will know when the connection has been lost or become unreliable, because it will get a SIGALRM when the timer expires. The actual times should be tunable in preferences.
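
A minimal sketch of the scheme proposed above, assuming the 30-second send interval and 2-minute receive timeout from this comment. It is not Gabber's TCPTransmitter code: `LinkWatchdog`, `FakeClock` and `simulate` are hypothetical names, and it polls timestamps instead of using an alarm signal so that it can run against a simulated clock.

```python
import time


class LinkWatchdog:
    """Decides when to send a keep-alive and when to declare the link dead.

    Hypothetical helper for illustration only.
    """

    def __init__(self, ping_interval=30.0, timeout=120.0, clock=time.monotonic):
        self.ping_interval = ping_interval  # seconds between blank lines we send
        self.timeout = timeout              # seconds of silence we tolerate
        self.clock = clock
        now = clock()
        self.last_sent = now
        self.last_received = now

    def on_data_sent(self):
        """Call after writing anything (including a blank line) to the socket."""
        self.last_sent = self.clock()

    def on_data_received(self):
        """Call whenever any bytes arrive from the peer; restarts the timer."""
        self.last_received = self.clock()

    def should_ping(self):
        return self.clock() - self.last_sent >= self.ping_interval

    def is_dead(self):
        return self.clock() - self.last_received >= self.timeout


class FakeClock:
    """Manually advanced clock so the example needs no real waiting."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


def simulate(link_drops_at, step=10.0, limit=600.0):
    """Return the simulated time at which the loss is noticed, or None."""
    clock = FakeClock()
    dog = LinkWatchdog(clock=clock)
    while clock.now < limit:
        clock.now += step
        if dog.should_ping():
            dog.on_data_sent()          # a real client would write "\n" here
            if clock.now < link_drops_at:
                dog.on_data_received()  # peer's blank line arrives while the link is up
        if dog.is_dead():
            return clock.now
    return None


print(simulate(link_drops_at=60.0))
print(simulate(link_drops_at=float("inf")))
```

In the first run the last data arrives at 30 s and the link goes silent at 60 s, so the loss is reported at 150 s, once 120 s have passed without incoming data. In the second run the peer keeps answering and no loss is ever reported. A client would flip its status to offline (or try to reconnect) at that point instead of continuing to show the old presence.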

    I am still using Gabber 0.8.4. I cannot upgrade to 0.8.6 without an unreasonable amount of work (I have already spent an hour or two today on this bug), because it requires a newer version of OpenSSL than is available for Red Hat 7.1, and the upgrade is known not to fix the problem (another user reported that the problem still exists in 0.8.6).

     
  • Julian Missig

    Julian Missig - 2002-02-02

    Logged In: YES
    user_id=9539

    For upgrading when you have Red Hat 7.1... go to your local
    friendly Red Hat 7.2 mirror, and grab package 'openssl'
    along with 'openssl096' - if you install them at the same
    time, it'll upgrade fine with no problems.

    If you'll note TCPTransmitter::startPoll and
    TCPTransmitter::_connection_poll, TCPTransmitter does send a
    blank line every 2.5 minutes.

     
  • Matthias Wimmer

    Matthias Wimmer - 2002-02-02

    Logged In: YES
    user_id=59036

    Can anybody confirm that this bug is still open in 0.8.6? In
    older versions I had the problem too, but for me it has been
    fixed some time ago.
    (I have a dial up connection to the internet.)

     
