The following issue has been ASSIGNED.
======================================================================
http://bugs.bacula.org/view.php?id=1817
======================================================================
Reported By: jstarek
Assigned To: kern
======================================================================
Project: bacula
Issue ID: 1817
Category: File Daemon
Reproducibility: always
Severity: minor
Priority: normal
Status: closed
Resolution: won't fix
Fixed in Version:
======================================================================
Date Submitted: 2012-01-09 21:01 GMT
Last Modified: 2012-04-18 11:33 BST
======================================================================
Summary: Restore of a big file from one OpenSolaris machine (SD) to another
(FD) always terminates after a few gigabytes.
Description:
We use one big OpenSolaris (NexentaCore 3.0) filer as a backup filer with the
tape drive connected to it. We normally do zfs send/recv to that filer and use a
local File Daemon to back up to the attached tape. So for backups we use the
loopback interface (FD -> SD).
Local restore from the SD to the File Daemon on the same machine works without
problems. However, when I try to restore to another box over a 1 Gb LAN, the
jobs always fail after several gigabytes of data.
Steps to Reproduce:
Environment:
- NexentaCore (OpenSolaris) backup filer
- NexentaCore restore client connected via 1 Gb LAN
- LTO-5 tapes attached to the backup filer
The last point is important: LTO-5 can restore at up to 180 MB/s, but the
network can deliver at most 110 MB/s. So when streaming to the remote FD, the SD
must slow down. (I don't remember seeing this problem with the LTO-3
tapes/drives we had before.)
Additional Information:
I found three problems in the bnet.c/bsock.c code; after fixing them at my
site, I was able to do successful restores and backups again.
a) I changed the read()/write() calls to recv()/send(), as on Windows, and
changed write_nbytes()/read_nbytes() in bnet.c to transfer the data in 8 KB
chunks.
The previous code tried to send the whole buffer in one call, which might be
too much for the system buffers.
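A minimal sketch of change a), assuming a plain blocking stream-socket descriptor rather than Bacula's own socket class. The names `send_all_chunked()`, `recv_all_chunked()` and `SEND_CHUNK` are hypothetical; the 8 KB value is the reporter's proof-of-concept limit, not a Bacula setting.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical cap per send()/recv() call (the 8 KB proof-of-concept value). */
#define SEND_CHUNK 8192

/* Send exactly nbytes from buf, passing at most SEND_CHUNK bytes to each
 * send() call. Returns nbytes on success or -1 on error (errno from send()). */
ssize_t send_all_chunked(int fd, const char *buf, size_t nbytes)
{
    size_t done = 0;

    while (done < nbytes) {
        size_t want = nbytes - done;
        if (want > SEND_CHUNK) {
            want = SEND_CHUNK;
        }
        ssize_t n = send(fd, buf + done, want, 0);
        if (n < 0) {
            if (errno == EINTR) {
                continue;              /* interrupted by a signal: retry */
            }
            return -1;
        }
        assert((size_t)n <= want);
        done += (size_t)n;             /* send() may accept fewer bytes */
    }
    return (ssize_t)done;
}

/* Receive up to nbytes into buf in SEND_CHUNK pieces. Returns the number of
 * bytes read (less than nbytes if the peer closed early) or -1 on error. */
ssize_t recv_all_chunked(int fd, char *buf, size_t nbytes)
{
    size_t done = 0;

    while (done < nbytes) {
        size_t want = nbytes - done;
        if (want > SEND_CHUNK) {
            want = SEND_CHUNK;
        }
        ssize_t n = recv(fd, buf + done, want, 0);
        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            return -1;
        }
        if (n == 0) {
            break;                     /* orderly shutdown by the peer */
        }
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

With flags set to 0, send()/recv() on a connected socket behave like write()/read(); the part of the change that actually alters behavior is the per-call size cap.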
b) In bsock.c the send() function writes the packet length BEFORE the "msg"
pointer. However, contrary to the comment before the code, NOTHING makes sure
that there are actually 4 bytes of space BEFORE "msg". In fact, some code in
Bacula changes the "msg" pointer to point at other (private) data before
calling send(), and then restores msg to its old value.
So this actually causes a memory overwrite bug.
After changing this to a more reasonable two-step write (first the 4-byte
header, then the data), restores became more stable.
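The two-step write can be sketched as follows, assuming the 4-byte length prefix is sent in network byte order (an assumption for illustration). `send_fully()` and `send_framed()` are hypothetical names, not Bacula functions.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Send all len bytes, retrying after partial writes and EINTR. */
static int send_fully(int fd, const void *data, size_t len)
{
    const char *p = data;

    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Two-step framing: a 4-byte length header in network byte order, then the
 * payload. No spare bytes in front of msg are needed, at the cost of one
 * extra system call per message. */
int send_framed(int fd, const char *msg, uint32_t len)
{
    assert(msg != NULL || len == 0);
    uint32_t hdr = htonl(len);

    if (send_fully(fd, &hdr, sizeof hdr) < 0) {
        return -1;
    }
    if (len > 0 && send_fully(fd, msg, len) < 0) {
        return -1;
    }
    return 0;
}
```

Because header and payload leave in separate calls, two threads sharing one socket could interleave them unless writes are serialized.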
c) The bsock class has a mutex for locking send() and recv() calls.
However, the default setting seems to be "do not use the locks".
Why? For safety reasons, I enabled these locks by default in the socket class.
I don't know whether they are usable in general, but they are vital for Bacula's
functionality in my case.
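A sketch of opt-in locking around a send, assuming POSIX threads. `struct locked_sock`, its `use_locking` flag and the `ls_*` functions are hypothetical and do not reflect Bacula's actual socket class; the lock is taken only when `use_locking` is set, mirroring the "off by default" behavior described above.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical socket wrapper with an optional per-socket mutex. */
struct locked_sock {
    int fd;
    bool use_locking;              /* off unless the caller asks for it */
    pthread_mutex_t mutex;
};

int ls_init(struct locked_sock *s, int fd, bool use_locking)
{
    assert(s != NULL);
    s->fd = fd;
    s->use_locking = use_locking;
    return pthread_mutex_init(&s->mutex, NULL);
}

/* Send all len bytes, retrying after partial writes and EINTR. */
static int send_fully(int fd, const void *data, size_t len)
{
    const char *p = data;

    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Send one length-prefixed message. With locking enabled, the header and
 * payload of this message cannot interleave with another thread's message. */
int ls_send(struct locked_sock *s, const char *msg, uint32_t len)
{
    uint32_t hdr = htonl(len);
    int rc = 0;

    if (s->use_locking) {
        pthread_mutex_lock(&s->mutex);
    }
    if (send_fully(s->fd, &hdr, sizeof hdr) < 0 ||
        (len > 0 && send_fully(s->fd, msg, len) < 0)) {
        rc = -1;
    }
    if (s->use_locking) {
        pthread_mutex_unlock(&s->mutex);
    }
    return rc;
}

void ls_destroy(struct locked_sock *s)
{
    pthread_mutex_destroy(&s->mutex);
}
```

The cost of enabling the lock is one lock/unlock pair per message, even when only one thread ever uses the socket.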
======================================================================
----------------------------------------------------------------------
(0006269) kern (administrator) - 2012-03-30 20:35
http://bugs.bacula.org/view.php?id=1817#c6269
----------------------------------------------------------------------
Items b and c in your changes are not necessary. There *are* always four bytes
before the msg pointer, which can be verified by studying the code, and adding
additional locking to what is already implemented in Bacula is unnecessary and
would just slow down communications.
Item a is interesting, but in principle there is no difference between
read/write and send/recv, so I don't see the need to change it, especially
since it involves some rather intrusive coding. If you can explain why
send/recv are better or useful, we can take another look at it.
Most likely the problems you are seeing are caused by a bad switch or by
incorrect (or changed) timeout or window size parameters in the OS at one end of
the communication line (typically we see hangup problems on Windows machines
that were "hardened").
----------------------------------------------------------------------
(0006274) jstarek (reporter) - 2012-04-01 23:23
http://bugs.bacula.org/view.php?id=1817#c6274
----------------------------------------------------------------------
Hi kern,
thanks for the update. You are right, there *are* always 4 bytes before each
pool allocation. I finally found the code in the memory pool functions: the
special 4 bytes at the end of each allocation header. However, I consider this
a rather crude optimization ;-), but that's not my problem. If it works, it's
OK.
So yes, b) IS unnecessary.
c) I have to double-check again.
But what I recall about a) (it was some months ago) is the following; correct
me if I'm wrong:
- When you create the socket, you check the netbuf size and allocate the msg
pool accordingly.
- But when the data you try to send over the wire is bigger, you expand that
pool to fit the block.
- Then you send the block as a whole (and may therefore exceed the initially
correct OS limits).
So I think the main benefit of my changes may simply be capping the send size
at a fixed limit (8 KB in my case). Capping it at the system NETBUF limit might
have helped just as well.
The send()/recv() functions may not offer other benefits here, but according to
the man pages they can return ENOMEM and EMSGSIZE, which could flag this
condition as recoverable, whereas write() would simply return an unrecoverable
error and thus terminate the restore.
I chose the 8 KB limit because I only wanted a proof of concept. If this is the
real problem, one would normally add a configurable parameter that caps the
maximum send size at a fixed value (or react to EMSGSIZE by reducing the send
size dynamically).
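The dynamic variant could look roughly like this. It is a sketch of the reporter's idea only: `send_adaptive()`, its `max_chunk`/`min_chunk` parameters and the halving policy are hypothetical. Note that on a TCP stream socket EMSGSIZE is not normally expected (it applies to sockets that must send each message atomically), so in practice ENOBUFS/ENOMEM would be the more likely triggers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Send nbytes, starting with chunks of max_chunk bytes (a hypothetical
 * configurable limit). If send() fails with EMSGSIZE, ENOBUFS or ENOMEM,
 * halve the chunk size and retry, down to min_chunk. Returns nbytes on
 * success or -1 on an error that backing off cannot fix. */
ssize_t send_adaptive(int fd, const char *buf, size_t nbytes,
                      size_t max_chunk, size_t min_chunk)
{
    size_t chunk = max_chunk;
    size_t done = 0;

    assert(min_chunk > 0 && min_chunk <= max_chunk);
    while (done < nbytes) {
        size_t want = nbytes - done;
        if (want > chunk) {
            want = chunk;
        }
        ssize_t n = send(fd, buf + done, want, 0);
        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            if ((errno == EMSGSIZE || errno == ENOBUFS || errno == ENOMEM)
                && chunk > min_chunk) {
                chunk /= 2;            /* back off and retry smaller */
                if (chunk < min_chunk) {
                    chunk = min_chunk;
                }
                continue;
            }
            return -1;                 /* give up */
        }
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```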
I'll be able to run some tests in a week or two, when the backup filers are
free again.
Regards,
Jürgen.
----------------------------------------------------------------------
(0006276) kern (administrator) - 2012-04-02 18:24
http://bugs.bacula.org/view.php?id=1817#c6276
----------------------------------------------------------------------
Concerning storing the 4 bytes in front of the buffer, it is a bit ugly,
but it works. I may find a better solution someday, but for now it saves
one system call per buffer write.
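The trade-off can be sketched as follows, assuming a hypothetical allocator that reserves `HDR_LEN` bytes in front of every message buffer. Bacula's real memory pool keeps these bytes at the end of its allocation header; `msgbuf_alloc()`, `msgbuf_free()` and `send_prefixed()` are illustrations, not its API.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

#define HDR_LEN 4   /* spare bytes reserved in front of every message buffer */

/* Allocate a message buffer and return a pointer just past HDR_LEN spare
 * bytes, so the sender can later store the length in front of it. */
char *msgbuf_alloc(size_t size)
{
    char *base = malloc(HDR_LEN + size);
    return base ? base + HDR_LEN : NULL;
}

void msgbuf_free(char *msg)
{
    if (msg != NULL) {
        free(msg - HDR_LEN);
    }
}

/* Store the length in the 4 bytes before msg, then send header and payload
 * together: one send() call unless the kernel accepts only part of it.
 * msg MUST come from msgbuf_alloc(), or the header overwrites foreign memory. */
int send_prefixed(int fd, char *msg, uint32_t len)
{
    assert(msg != NULL);
    uint32_t hdr = htonl(len);
    char *p = msg - HDR_LEN;
    size_t left = HDR_LEN + (size_t)len;

    memcpy(p, &hdr, HDR_LEN);
    while (left > 0) {
        ssize_t n = send(fd, p, left, 0);
        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }
    return 0;
}
```

The saving is one system call per message compared with writing the header and the data separately; the price is the invariant noted in the comment, which is exactly what point b) of the report questioned.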
On the packet size: Bacula first determines the largest buffer the system can
handle, then sets that as the system buffer size. This size can also be set by
the user.
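A probe of that kind might look like the sketch below, assuming it steps down by a fixed amount until setsockopt(SO_SNDBUF) accepts a value. `probe_send_buffer()` and its step policy are hypothetical and may differ from what Bacula actually does. Some kernels adjust the value rather than reject it (Linux, for example, doubles the requested size for bookkeeping overhead and caps it at a system limit), so the effective size is read back with getsockopt().

```c
#include <assert.h>
#include <sys/socket.h>

/* Try SO_SNDBUF sizes from `wanted` downward in steps of `step` bytes.
 * Returns the send buffer size the kernel reports after the first accepted
 * setsockopt(), or -1 if no size was accepted or the read-back failed. */
int probe_send_buffer(int fd, int wanted, int step)
{
    assert(step > 0);
    for (int size = wanted; size >= step; size -= step) {
        if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &size, sizeof size) == 0) {
            int actual = 0;
            socklen_t len = sizeof actual;
            if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &actual, &len) < 0) {
                return -1;
            }
            return actual;     /* may differ from the requested size */
        }
    }
    return -1;
}
```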
I forget the exact details, because the code was written 12 years ago, but I
believe Bacula then uses buffers up to the determined size when sending data.
So aside from impossibly long messages, the buffers should never exceed that
size. Even if they do, in principle it should not cause any problem, because
the OS exists precisely to chop up writes and reassemble them at the other end.
So adding such code to Bacula seems to me to mean extra work, extra
complexity, and possibly new bugs.
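The claim that the OS splits and reassembles large writes on a stream socket can be checked with a small experiment. This sketch uses a local AF_UNIX stream socket pair as a stand-in for the TCP connection between SD and FD (an assumption: TCP adds segmentation and flow control but offers the same byte-stream guarantee). A large write() is consumed by the reader in many 4 KB reads, and every byte arrives. `demo_large_write()` is a hypothetical name.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

struct reader_arg {
    int fd;
    size_t received;   /* total bytes read */
    size_t reads;      /* number of read() calls that returned data */
};

/* Drain the socket until EOF, counting bytes and read() calls. */
static void *reader(void *p)
{
    struct reader_arg *a = p;
    char buf[4096];
    ssize_t n;

    while ((n = read(a->fd, buf, sizeof buf)) != 0) {
        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            break;
        }
        a->received += (size_t)n;
        a->reads++;
    }
    return NULL;
}

/* Write `total` (> 0) bytes with as few write() calls as the kernel allows
 * while a reader thread consumes them. Returns the bytes the reader saw,
 * or 0 on setup failure; *writes and *reads receive the call counts. */
size_t demo_large_write(size_t total, size_t *writes, size_t *reads)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return 0;
    }
    char *data = calloc(total, 1);
    struct reader_arg arg = { sv[1], 0, 0 };
    pthread_t tid;
    if (data == NULL || pthread_create(&tid, NULL, reader, &arg) != 0) {
        free(data);
        close(sv[0]);
        close(sv[1]);
        return 0;
    }
    size_t off = 0;
    *writes = 0;
    while (off < total) {
        ssize_t n = write(sv[0], data + off, total - off);
        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            break;
        }
        off += (size_t)n;
        (*writes)++;
    }
    close(sv[0]);              /* signals EOF to the reader */
    pthread_join(tid, NULL);
    close(sv[1]);
    free(data);
    *reads = arg.reads;
    return arg.received;
}
```

A blocking write larger than the socket buffer simply waits for the reader to free space, so it typically completes in very few calls; the exact count is not guaranteed, which is why the loop still handles partial writes.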
I need a convincing argument before I can seriously consider changing code that
has functioned absolutely correctly for a very long time. The problems we have
seen are all due to bad Ethernet cards, bad switches, or poor OS configuration
parameters (typically "hardening" efforts on Windows). All of these can be
solved by getting properly working communications equipment, and possibly by
adjusting some of the Directives that already exist in Bacula.
----------------------------------------------------------------------
(0006288) kern (administrator) - 2012-04-18 11:33
http://bugs.bacula.org/view.php?id=1817#c6288
----------------------------------------------------------------------
At this point, I don't see any reason to change the code. The new code is
fairly complex compared to the original and could introduce errors, so I would
need really definitive proof that it improves communications.
For that reason, I am closing this bug report.
Issue History
Date Modified    Username       Field                    Change
======================================================================
2012-01-09 21:01 jstarek        New Issue
2012-01-09 21:05 jstarek        File Added: bnet.c
2012-01-09 21:06 jstarek        File Added: bsock.c
2012-01-09 21:17 jstarek        File Added: bnet_and_bsock_diff.patch
2012-02-10 07:38 ebollengier    Severity                 major => minor
2012-03-30 20:35 kern           Note Added: 0006269
2012-03-30 20:35 kern           Status                   new => feedback
2012-04-01 23:23 jstarek        Note Added: 0006274
2012-04-01 23:23 jstarek        Status                   feedback => new
2012-04-02 18:24 kern           Note Added: 0006276
2012-04-02 18:24 kern           Status                   new => feedback
2012-04-18 11:33 kern           Note Added: 0006288
2012-04-18 11:33 kern           Assigned To              => kern
2012-04-18 11:33 kern           Status                   feedback => closed
2012-04-18 11:33 kern           Resolution               open => won't fix
======================================================================