full backup dies - tape change request

Help
Roy Bixler
2005-11-24
2012-12-07
  • Roy Bixler

    Roy Bixler - 2005-11-24

    I'm encountering a strange problem with "afbackup" on a Debian sarge client and server setup.  I configured the full backup in 2 parts.  When I execute the 1st part, the backup runs for several hours and then it requests another tape.  The date stamp on the message requesting the tape change is "Thu, 24 Nov 2005 00:49:49 -0600".  Then, about 40 minutes later, the client dies:

    Thu Nov 24 01:29:22 2005, afbackup got signal 13, exiting
    Thu Nov 24 01:29:23 2005, Connection to client process lost, exiting.
    Thu Nov 24 01:29:26 2005, Full backup part 1 interrupted.

    The server is still running.  Does anyone have an idea about why the client stopped waiting?
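
    For reference, signal 13 on Linux is SIGPIPE, which a process gets when it writes to a connection that is already gone.  The snippet below is only a minimal sketch of that mechanism, not a diagnosis of "afbackup": it uses a local socket pair instead of a real client/server link, and the function name is made up for illustration.  Python ignores SIGPIPE by default, so the failure shows up as an EPIPE error instead of killing the process.

```python
import errno
import signal
import socket


def write_after_peer_close():
    """Write to a stream socket whose peer has already closed; return the errno."""
    ours, peer = socket.socketpair()
    peer.close()  # simulate the connection going away underneath the writer
    try:
        ours.send(b"more backup data")
    except OSError as exc:
        # CPython ignores SIGPIPE at startup, so the failed write surfaces
        # as an exception (EPIPE) instead of terminating the process.
        return exc.errno
    finally:
        ours.close()
    return None


if __name__ == "__main__":
    print("SIGPIPE number:", int(signal.SIGPIPE))
    print("errno from write:", errno.errorcode.get(write_after_peer_close()))
```

    A C program that leaves SIGPIPE at its default disposition would be terminated by the same write, which would be consistent with the "got signal 13, exiting" line above.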

     
    • Roy Bixler

      Roy Bixler - 2005-11-27

      To follow up, I made a Debian package for v. 3.3.9beta and installed the packages on both client and server.  I then reran the backup and found that it still dies while waiting for the tape change.  If it helps, both client and server are running Debian sarge.  The server has kernel 2.4.31 and the client has 2.6.14.2.  Interestingly, I also recall that I do not have this problem of the clients exiting prematurely on Solaris.

      Also, I noticed this post from Dec. 2003 from "dnillson":

      https://sourceforge.net/forum/forum.php?thread_id=984572&forum_id=92201

      It describes exactly the same problem.  He observed that the server no longer seemed to respond to the client once it asked for a tape change, and the client simply quit in response.  I observed the following while the server was asking for a new tape:

      $ netstat -t
      Active Internet connections (w/o servers)
      Proto Recv-Q Send-Q Local Address           Foreign Address         State
      tcp    70863      0 server:afbackup client:33224 ESTABLISHED

      That is, the server receive buffer has some bytes pending.  The original poster "solved" the problem by increasing the TCP retries setting.  Does anyone have any better ideas?
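
      In case it helps anyone watching for the same symptom, here is a rough sketch that picks out TCP connections with unread bytes in Recv-Q from "netstat -t" output.  The function name and the threshold parameter are my own inventions, and it assumes the six-column layout shown above; the sample text is the output pasted above.

```python
def pending_receive_queues(netstat_output, threshold=0):
    """Return (local, foreign, recv_q) for TCP rows whose Recv-Q exceeds threshold."""
    stuck = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows look like: proto recv-q send-q local foreign state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skip the banner and the column header
        recv_q = int(fields[1])
        if recv_q > threshold:
            stuck.append((fields[3], fields[4], recv_q))
    return stuck


SAMPLE = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp    70863      0 server:afbackup client:33224 ESTABLISHED
"""

if __name__ == "__main__":
    for local, foreign, recv_q in pending_receive_queues(SAMPLE):
        print(f"{local} <- {foreign}: {recv_q} bytes unread")
```

      Running it repeatedly during a tape change would show whether the server's Recv-Q stays stuck at a nonzero value, that is, whether the server process has stopped reading from the socket.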

       
    • Roy Bixler

      Roy Bixler - 2005-12-12

      I've traced the problem to some kind of general TCP/IP communication problem.  While experimenting with removing "keepalive" from "afbackup", I noticed a few hundred "overflows" on the server's Ethernet interface.  So I did 3 things, and now the client no longer dies on a tape change:

      1) changed the server to a different port
      2) upgraded server's kernel to 2.4.32 (one of the changes with 2.4.32 is "to avoid 'over-clamping' the TCP/IP receive buffer")
      3) I noticed that the client's Ethernet card was not configured to do flow control.  I fixed that by using "ethtool -A eth0 autoneg off" and then "ethtool -A eth0 rx on tx on"
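
      To check step 3 on other machines, something like the sketch below could parse the output of "ethtool -a eth0" and report whether RX and TX pause (flow control) are both on.  The function names are hypothetical, and the sample text is only an illustration of the usual "ethtool -a" layout, not output captured from my machines.

```python
def parse_pause_params(ethtool_a_output):
    """Turn `ethtool -a <iface>` text into a dict such as {"rx": True, "tx": False}."""
    params = {}
    for line in ethtool_a_output.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip().lower()
        # Keep only "Name: on/off" rows; the "Pause parameters for ..." banner
        # has nothing after its colon and is skipped.
        if sep and value in ("on", "off"):
            params[key.strip().lower()] = (value == "on")
    return params


def flow_control_enabled(params):
    """True only when both RX and TX pause are reported as on."""
    return params.get("rx", False) and params.get("tx", False)


# Illustrative sample only (assumed layout), showing flow control switched off.
SAMPLE_BEFORE = """\
Pause parameters for eth0:
Autonegotiate:  on
RX:             off
TX:             off
"""

if __name__ == "__main__":
    params = parse_pause_params(SAMPLE_BEFORE)
    print(params, "-> flow control on:", flow_control_enabled(params))
```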

       
