Sorry, this is much longer than I had planned. If you have large FileSets,
please read at least the last 3 or 4 paragraphs.
I would like to say that I am relatively happy with release 2.0.x. As with
every major release, there are always "teething" problems in the first few
versions, but my feeling is that this release has had fewer and far less
serious bugs than previous versions.
In case you are not aware, version 2.0.2 is the most recent release. I now
have a few more bug fixes in the CVS, but I don't consider them serious
enough to make an additional release just for them.
As you probably know, my two personal topics for the next release are:
1. Bacula GUI, now going by the user-preferred name "bat".
2. Performance improvements.
bat is progressing nicely, but a bit slower than I expected due to item #2
(see below). It now communicates with the Director, and the interface is
defined entirely using designer (a GUI layout program) forms. Though my
implementation is not very elegant, I've also managed to implement the
console and a dummy restore "page" with designer in separate subdirectories.
The restore page does nothing, but it does put up an interface that Eric sent
me, which is the same as the brestore GUI.
Having the pages implemented with designer and in separate subdirectories will
vastly simplify adding new pages (i.e. functionality) and will allow multiple
developers to work at the same time.
I'm currently adding code to obtain the default values from Bacula (jobs,
pools, storage, ...) so that it can display dialogs similar to the gnome
console, where the dialog "knows" which jobs can be selected, ...
I have planned on implementing the following things:
- Immediate disconnect of the FD from the SD after sending all
files to the SD. This will permit Laptops to send the data, then
disconnect, even if there is spooling or attribute insertion to be
This is now implemented in the CVS.
- Database performance improvements in several different areas,
the most important being faster insertion of attributes. Eric and
Marc are working on this, and have submitted a working patch
that is not yet integrated, but that gives significant speed improvements
especially for PostgreSQL.
Another area is faster pruning, for which I have a patch, but it is
not yet tested.
- Transmitting attributes to the Director in a separate thread while
spooling/despooling the data. This remains to be started.
- Improving the performance of building the in-memory restore tree.
You probably don't know that in 2005 (yes, two years ago), I wrote
a red-black binary tree class for Bacula with the intention of using it
for the in-memory restore tree file lists. I completed the code, but
never integrated it into the tree routines (if I remember correctly, this
was because the tree traversal routines were hand-crafted linked lists).
Since then, I have converted the tree lists to be Bacula dlist classes
(doubly linked lists) with a "fake" binary sort, which improved performance
significantly. The red-black binary trees remained unused awaiting
An amazing thing recently happened. Rudolf Cejka, being hit by directories
with lots of files, implemented AA binary tree routines that he calls tlist
to replace the Bacula dlist routines in the restore code. It turns out that
AA trees are a simplified form of red-black trees that give similar
performance, though not quite as consistently as red-black trees (AA trees
stay better balanced than RB trees; that extra balancing costs a bit more,
but it speeds up searches).
Rudolf needed to make only trivial changes
to something like 5 lines of code to integrate his tlist routines. I would
like to integrate his tlist code since AA tree handling is much simpler than
RB trees. However, while we are working out the licensing issues, I
corrected one bug that Rudolf found in my RB tree routines, made one other
minor design change, and integrated the routines into the CVS HEAD.
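
For readers who have not met AA trees, here is a minimal sketch of AA tree insertion. It is written in Python purely for illustration and is not Rudolf's tlist code or the Bacula rblist code; the names AANode, skew, split, insert, height and in_order are all hypothetical. It shows the two rebalancing steps that make AA trees simpler to handle than general red-black trees: "skew" removes a left link between nodes on the same level, and "split" breaks up two consecutive right links on the same level.

```python
import math


class AANode:
    """One node of an AA tree; new nodes always start at level 1."""
    __slots__ = ("key", "level", "left", "right")

    def __init__(self, key):
        self.key = key
        self.level = 1
        self.left = None
        self.right = None


def skew(node):
    """Rotate right if the left child sits on the same level as node."""
    if node is not None and node.left is not None \
            and node.left.level == node.level:
        left = node.left
        node.left = left.right
        left.right = node
        return left
    return node


def split(node):
    """Rotate left and raise the middle node if two consecutive
    right links stay on the same level."""
    if node is not None and node.right is not None \
            and node.right.right is not None \
            and node.right.right.level == node.level:
        right = node.right
        node.right = right.left
        right.left = node
        right.level += 1
        return right
    return node


def insert(node, key):
    """Insert key below node and return the (possibly new) subtree root."""
    if node is None:
        return AANode(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    else:
        return node  # duplicate keys are ignored in this sketch
    return split(skew(node))


def height(node):
    """Number of nodes on the longest root-to-leaf path."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


def in_order(node, out=None):
    """Return all keys in sorted order."""
    if out is None:
        out = []
    if node is not None:
        in_order(node.left, out)
        out.append(node.key)
        in_order(node.right, out)
    return out


if __name__ == "__main__":
    root = None
    n = 10000
    for k in range(n):  # keys arrive already sorted, the worst case for a naive tree
        root = insert(root, k)
    print("height:", height(root), "limit:", 2 * math.log2(n + 1))
```

Even when the keys arrive in sorted order, the height stays within twice log2(n + 1), so each lookup or insert touches only a few dozen nodes for a directory with hundreds of thousands of entries.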
Last night I did some performance testing with the new RB binary tree
code in restore. I was hoping for a 10 times improvement. My test was
rather stupid -- I created a directory containing two subdirectories, one
has 419,549 files (with simple names like a.0 - a.9999, ab.xx ...) and the
second directory has the same number of files with the same names.
This is not really representative of what one would have on most systems
(though it may simulate certain mailbox directories). So together the two
subdirectories hold approximately 840,000 files.
I then backed up the directory containing the two subdirectories (SQLite2
DB) with a full save and did a restore, but stopped the process once all
data was loaded into the in-memory tree. I.e.
1 (select client)
quit (quit after in-memory tree is built)
Now the amazing part:
For Bacula 2.0.2, which uses the dlist routines, it took 58 minutes to load
the in-memory tree (including the time for SQLite to look up the records).
For Bacula 2.1.2, which uses my rblist routines, it took 10.05 seconds to
load the in-memory tree.
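
One plausible reason the gap can be so large: inserting into a sorted linked list costs time proportional to the list length, so building the whole list is quadratic, while a balanced tree needs only a logarithmic number of node visits per insert. The sketch below is illustrative Python, not Bacula's dlist or rblist code. ListNode, sorted_list_build_steps and tree_visit_bound are hypothetical names, and a plain linear walk over a singly linked list stands in for the real routines to keep things short. The timings above also include SQLite lookup time, which this does not model.

```python
import math


class ListNode:
    """Node of a singly linked list kept in sorted order."""
    __slots__ = ("key", "next")

    def __init__(self, key):
        self.key = key
        self.next = None


def sorted_list_build_steps(keys):
    """Insert keys one by one into a sorted linked list and return
    how many links were walked in total."""
    head = None
    steps = 0
    for key in keys:
        node = ListNode(key)
        if head is None or key < head.key:
            node.next = head
            head = node
            continue
        cur = head
        # Walk forward until the next node is larger than the new key.
        while cur.next is not None and cur.next.key <= key:
            cur = cur.next
            steps += 1
        node.next = cur.next
        cur.next = node
    return steps


def tree_visit_bound(n):
    """Upper bound on node visits for n inserts into a balanced tree
    whose height never exceeds 2 * log2(n + 1)."""
    return n * 2 * math.ceil(math.log2(n + 1))


if __name__ == "__main__":
    for n in (1000, 2000, 4000):
        print(n, sorted_list_build_steps(range(n)), tree_visit_bound(n))
```

Doubling the number of files roughly quadruples the work for the list, but only slightly more than doubles the bound for the tree, so the difference keeps widening as directories grow.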
A speedup of roughly 346 times (3,480 seconds versus 10.05 seconds). I certainly expected an improvement, but not that
At this point, I would appreciate it if some of you could pull down the CVS
code (if you have problems with that let me know, and I will post a .tar.gz
file to my web site) and test it with some large number of files to restore
using real data. Several of you have FileSets of over 1,000,000 files, and I
will be very interested to see what this code produces. I look forward to
If the code proves stable (it passes all my file-based regression scripts), I
will probably release it in 2.0.3 or 2.0.4.
Many thanks to Rudolf for showing me how simple it was to integrate the RB