I'm encountering a strange problem with "afbackup" on a Debian sarge client and server setup. I configured the full backup to run in 2 parts. When I execute the 1st part, the backup runs for several hours and then requests another tape. The message requesting the tape change is stamped "Thu, 24 Nov 2005 00:49:49 -0600". About 40 minutes later, the client dies:
Thu Nov 24 01:29:22 2005, afbackup got signal 13, exiting
Thu Nov 24 01:29:23 2005, Connection to client process lost, exiting.
Thu Nov 24 01:29:26 2005, Full backup part 1 interrupted.
The server is still running. Does anyone have an idea why the client stopped waiting?
To follow up, I made a Debian package for v. 3.3.9beta and installed the packages on both client and server. I then reran the backup and found that it still dies while waiting for the tape change. If it helps, both client and server are running Debian sarge. The server has kernel 2.4.31 and the client has 220.127.116.11. Interestingly, I also recall that I do not have this problem of the clients exiting prematurely on Solaris.
Also, I noticed this post from Dec. 2003 from "dnillson":
It describes exactly the same problem. He observed that the server no longer seemed to respond to the client once it had asked for a tape change, and the client just quit in response. I observed the following while the server was asking for a new tape:
$ netstat -t
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 70863 0 server:afbackup client:33224 ESTABLISHED
That is, the server receive buffer has some bytes pending. The original poster "solved" the problem by increasing the TCP retries setting. Does anyone have any better ideas?
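For anyone who wants to watch for the same symptom, here is a minimal sketch that filters "netstat -t" output down to the TCP connections with bytes still sitting in the receive queue. The function name "pending_recvq" is my own invention, and it assumes the column layout shown above (Recv-Q in the 2nd column, addresses in the 4th and 5th):

```shell
# Hypothetical helper: list TCP connections whose Recv-Q is non-zero.
# Reads "netstat -t" style output on stdin; assumes Recv-Q is column 2
# and the local/foreign addresses are columns 4 and 5.
pending_recvq() {
    awk '$1 ~ /^tcp/ && $2 + 0 > 0 { print $4, $5, "Recv-Q=" $2 }'
}

# Intended use on the server while it waits for the tape:
#   netstat -t | pending_recvq
```

If the afbackup connection keeps showing up here with a Recv-Q that never shrinks, the server process is not reading what the client sent.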
I've tracked the problem down to some kind of general TCP/IP communication problem. While experimenting with removing "keepalive" from "afbackup", I noticed that there were a few hundred "overflows" on the server's Ethernet interface. So I did 3 things, and now the client no longer dies on a tape change:
1) changed the server to a different port
2) upgraded server's kernel to 2.4.32 (one of the changes with 2.4.32 is "to avoid 'over-clamping' the TCP/IP receive buffer")
3) I noticed that the client's Ethernet card was not configured to do flow control. I fixed that by using "ethtool -A eth0 autoneg off" and then "ethtool -A eth0 rx on tx on"
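To check the two network-level items above after the fact, something like the following rough sketch may help. Both function names are made up for illustration. The first assumes the usual "ethtool -a" layout with "RX:" and "TX:" lines. The second assumes that the "overflows" I saw correspond to the receive "fifo" column of /proc/net/dev, which may not be the same counter on every driver:

```shell
# Hypothetical helper: succeed only if both RX and TX pause (flow control)
# are reported as "on". Reads the output of "ethtool -a <iface>" on stdin.
pause_enabled() {
    awk '$1 == "RX:" { rx = $2 }
         $1 == "TX:" { tx = $2 }
         END { exit !(rx == "on" && tx == "on") }'
}

# Hypothetical helper: print the receive FIFO error counter for one interface.
# $1 = interface name; stdin = contents of /proc/net/dev.
# Assumption: the 5th receive counter ("fifo") is the overflow count.
rx_fifo_errors() {
    awk -v ifc="$1" '{ sub(/:/, " ") } $1 == ifc { print $6 }'
}

# Intended use:
#   ethtool -a eth0 | pause_enabled && echo "flow control is on"
#   rx_fifo_errors eth0 < /proc/net/dev
```

Running the second command before and after a backup shows whether the counter is still climbing.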