Re: [Bacula-devel] Project #19

Hello,

Please copy the bacula-devel list.

On Monday 10 July 2006 00:57, Clay Atkins wrote:
> I'm fairly familiar with the following databases:
>
> sybase asa - I have access to this one
> sybase ase
> mysql - I have access to this one
> ms sql - I have access to this one
> oracle - I have access to this one
> postgresql - I have access to this one
>
> All of these, except for mysql, have sophisticated facilities for
> backup. Some people will be satisfied with a backup of the entire
> content of the schemas and data, but for larger databases I think the
> DBAs will want to back up the REDO logs (as in postgresql) between
> full database backups, which means keeping the full backup longer.
> Also, it's sometimes a good policy to spread the backup of a database
> system that has multiple schemas/databases over different intervals.
> Oracle makes it possible to back up different "sections" of the
> database at different times, which I did for one client because the
> thing was just too big to back up all at once.

This is fine, and I will provide as much information to facilitate this as 
possible, but with the idea that backing up the catalog is the user's 
responsibility.

>
> I'm thinking from the user's perspective that the database backups
> for large databases will probably be configured into unique pools.
> I'm also thinking that it'll be necessary to pass parameters to the
> file director.

What "parameters"?  You need to be specific.   Please read the Bacula  
documentation that describes the components, because I can see that you don't 
exactly have the Bacula vocabulary correct, and that makes it difficult to 
discuss things.  There is a Director (Dir), a File daemon (FD), and a Storage 
daemon (SD).  There is no such thing in Baculaese as file director.  

>
> The relationship between full and incremental is perfect for what
> will be the typical "full" database backup then transaction log
> backups afterward. One enhancement that will be necessary is a two-
> phase commit between the file director and the storage director, so
> that the file director can move, delete or tell the database manager
> that the transaction logs are no longer needed.

There is no such thing as a two-phase commit between the FD and the SD.  The 
FD sends data to the SD and the SD accepts it.  I don't expect this to 
change.  In addition, there is no direct communication between the FD and the 
database manager, and I'm not sure that I foresee that.  I suspect that most 
of what you want to do can be accomplished with ClientRunBefore and After 
jobs.
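
As a rough sketch of what I mean, a Job resource in the Director's conf file 
might look something like the following.  The job name and both script paths 
are made up for the example, and the other directives a Job needs are left 
out:

```
Job {
  Name = "db-logs"     # hypothetical job name
  # Client, FileSet, Storage, Pool, etc. omitted for brevity
  ClientRunBeforeJob = "/usr/local/bin/db-dump-logs.sh"     # hypothetical script
  ClientRunAfterJob  = "/usr/local/bin/db-release-logs.sh"  # hypothetical script
}
```

The first script would run on the client before the FD starts sending data, 
for example to dump or copy the transaction logs, and the second would run on 
the client afterwards, for example to tell the database manager that those 
logs are no longer needed.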

>
> I think in a segmented backup configuration, where the DBA wants to
> backup different sections at different intervals, that there will
> need to be a way to relate the "full" backups of the different
> segments and the incremental backups to that set of full backups.
> Maybe the set of fulls could be represented as a "full" backup and
> then the segments as differentials. Would that keep one of the
> individual, segment backups from being auto pruned?

Any files that are backed up by Bacula fall under the normal auto pruning 
rules, so it *should* be just a matter of setting them up correctly in the 
Director's conf file.

>
> On restore the database plug-in module for some of the databases will
> need some extra information. Especially oracle. I can see that this
> information could be put in the stream created during backup.

Well, I don't know what "extra information" is.  You need to be specific.

>
> On configuration issues, it seems bacula-like to put everything in
> the direction configuration file. Perhaps a tag within the fileset
> that means "module" then allow that module statement to be bracketed
> with sub-parameters. To eliminate name-space collisions, perhaps each
> module needs a way to identify itself with a name and version that
> gets passed to the file director. Actually, sounds like a thorough
> API needs to be thought out for plug-in modules.

Here you are leaping way ahead, talking about subparameters and name-space 
collisions without explaining what you are talking about.  Again, you need to 
be specific.  I lose interest when someone tells me: "perhaps each
module needs a way to identify itself with a name and version that
gets passed to the FD" but does not explain why.  I am not even sure what you 
mean by "module".

>
> That brings up the other issue, where I noticed that everything is
> pretty much "C" and I was wondering if there was anything thought of
> migrating/drifting to "C++"; once you start coding in real c++, it's
> hard to go back :)

Bacula started out as C code.  However, it *is* now C++.  The rules for what 
C++ syntax is permitted or tolerated are described in the Developer's guide.

>
> Is there a description of the datastream between the director and
> file director? I found job.c under the file director and can see how
> it receives commands and processes them, but it would save me some
> time if there was a document describing the commands and parameters.

There is *very* old documentation in the Developer's guide, which will give 
you the basic underlying architecture, but the details have evolved 
enormously.  You can see almost all the transactions that take place by 
turning on debug level 100 or more.

I would be very happy to have someone work on this project as it is something 
that is really needed, but it is a very difficult project to do right ...

My suggestions are:

1. Read the Developer's guide very carefully.
2. Turn on debug level starting at low levels (maybe 50) and increase it until 
you see the information you want.
3. Look at the current code a bit more.
4. Don't forget to copy the bacula-devel list.
5. Please be much more specific in what you write.  I don't mean definitive, 
since there are a lot of open questions that are not easy to answer and a lot 
of design issues that need to be decided, *but* if you talk about modules or 
information or parameters, make sure to be specific with one or more examples.

Best regards, Kern

>
> Clay
>
> On Jul 9, 2006, at 3:41 PM, Kern Sibbald wrote:
> > On Sunday 09 July 2006 21:33, Clay Atkins wrote:
> >> Project #19 - Pluggable Modules
> >>
> >> Has a decision been made on how this will be implemented: new
> >> command, parameters, how it will be configured?
> >
> > No.
> >
> >> I'd like to work on this. I have large databases to backup that need
> >> to be streamed from the server to the storage device.
> >
> > There are three possible ways that I have thought of to implement this:
> >
> > 1. Implement calling an external program or script that can read/write
> > via stdin/stdout to backup and restore a file.  There is just a little
> > bit of code written for this already.
> >
> > 2. Implement a Python interface that allows a Python event to be called
> > and the Python program can read/write the appropriate file using Python
> > binary buffers.
> >
> > 3. Add a new mechanism to allow the Fileset to tell Bacula that a
> > particular file should be read/written via a specially named FIFO.
> >
> > Advantages of each:
> > 1. Easiest for the user to program.
> >
> > 2. Extremely efficient -- almost as efficient as Bacula doing the
> > backup/restore internally.
> >
> > 3. Extremely easy to implement.
> >
> > Disadvantages of each:
> > 1. Very inefficient; requires a lot of thought about how to do a
> > recovery; probably requires some additions to the Volume format;
> > moderate to a lot of programming.
> >
> > 2. Probably rather difficult to program.
> >
> > 3. A bit of a kludge.
> >
> > I have a few more written details on the FIFO idea, but nothing on the
> > other two.
> >
> > ========== FIFO ideas ==========
> > - Given all the problems with FIFOs, I think the solution is to do
> >   something a little different, though I will look at the code and see
> >   if there is not some simple solution (i.e. some bug that was
> >   introduced).  What might be a better solution would be to use a FIFO
> >   as a sort of "key" to tell Bacula to read and write data to a program
> >   rather than the FIFO.  For example, suppose you create a FIFO named:
> >
> >      /home/kern/my-fifo
> >
> >   Then I could imagine that if you back up and restore this file with a
> >   direct reference, as is currently done for FIFOs, Bacula will instead,
> >   during backup, execute:
> >
> >     /home/kern/my-fifo.backup
> >
> >   and read the data that my-fifo.backup writes to stdout. For restore,
> >   Bacula will execute:
> >
> >     /home/kern/my-fifo.restore
> >
> >   and send the backed-up data to that program's stdin. These programs
> >   can be either an executable or a shell script, and they need only
> >   read/write to stdin/stdout.
> >
> >   I think this would give a lot of flexibility to the user without
> >   making any significant changes to Bacula.
> >
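
To make the FIFO-as-key idea quoted above a bit more concrete, here is a 
minimal sketch in Python rather than in Bacula's own C++.  It is not Bacula 
code.  The function names are invented for the example, it assumes the restore 
program takes the saved data on its stdin, and it buffers the whole stream in 
memory where a real implementation would stream it to the SD block by block:

```python
import subprocess

# The two companion programs derived from the key file's name.
ACTIONS = ("backup", "restore")


def companion_program(key_path: str, action: str) -> str:
    """Return the program named after the key file, e.g. my-fifo.backup."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    return f"{key_path}.{action}"


def backup_via_program(key_path: str) -> bytes:
    """Run <key_path>.backup and return everything it writes to stdout."""
    result = subprocess.run(
        [companion_program(key_path, "backup")],
        stdout=subprocess.PIPE,
        check=True,  # a non-zero exit status raises CalledProcessError
    )
    return result.stdout


def restore_via_program(key_path: str, data: bytes) -> None:
    """Run <key_path>.restore and feed the saved data to its stdin."""
    subprocess.run(
        [companion_program(key_path, "restore")],
        input=data,
        check=True,
    )
```

With the key file /home/kern/my-fifo, backup_via_program() would run 
/home/kern/my-fifo.backup and restore_via_program() would run 
/home/kern/my-fifo.restore.  Either one can be a shell script or a compiled 
executable, as long as it is marked executable.
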
> >
> > Best regards,
> >
> > Kern
>
> Clay Atkins
> cl...@e3...