Re: [Bacula-devel] Pre-alpha version of Bacula plugins working

Hello,

On Tue, Feb 19, 2008 at 05:13:04PM -0500, John Stoffel wrote:
> 
> Kern> Mark today in your calendar.  Bacula just did its first backup
> Kern> and restore of a MySQL database using a plugin.  I did it with
> Kern> using a simplistic "pipe" plugin.
> 
> Congrats!
> 
> Kern> The operation consisted of adding the following line to the
> Kern> Include section of the FileSet:
> 
> Kern>             1            2                            3                      4
> Kern> Plugin = "bpipe:/@MYSQL/regress.sql:mysqldump -f --opt --databases regress:mysql"
> 
> Kern> This plugin line goes in the FileSet section where you have your
> Kern> File = xxx lines, and for this plugin is composed of 4 fields
> Kern> separated by colons (I've indicated the field numbers above the
> Kern> Plugin line for reference only).
> 
> Ugh... sorry to be negative, but could we spend some time coming up
> with a nicer syntax please?  
> 
> Kern> Field 1 specifies a specific plugin name (in this case bpipe).
> 
> Ok.  But would it make more sense to have a plugin { ... } resource block
> instead in the FileSet?  
> 
> Kern> Field 2 specifies the namespace (in this case the pseudo
> Kern>    path/filename under which the backup will be saved -- this
> Kern>    will show up in the restore tree selection).
> 
> Hmm... what are the limits in Bacula's design in terms of namespace
> support?  Does it strictly follow the unix rooted tree, or does it
> also support the notion of drive letters as in PCs?  If so, why not
> just extend the drive letter to be:
> 
>      DB:/MYSQL[345N]/<database>/<table>
> 
> instead?  Maybe the version of mysql doesn't matter, but might be
> useful if you try to restore a version 5 mysql dump onto a version 3
> server to get a warning.  

What about developing the notion of plugins, and of what is being backed
up, even further? I think that with the plugin architecture it would be
better to get rid of file-oriented backups. Instead, a notion of
entities would better describe what is being backed up. These entities
would not require a globally matching naming scheme, i.e. one would not
be forced to try to represent all backed-up entities as Unix file names
with funny prefixes. Databases are not logically files.

Instead, the naming scheme, the configuration that defines what to back
up and the needed metadata, and the methods for backing up and restoring
data would be implemented by various plugins. A plugin architecture
would be needed for both the Director and the File Daemon. The Director
plugins would handle the backup configuration of the plugin (think of
the FileSet block in the current Director configuration) and the
interfacing with the catalog, or alternatively provide plugin-specific
helper hooks for the Director core code that accesses the catalog. File
Daemon plugins would handle the actual backing up and restoring of the
data in a plugin-specific way.
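
To make the split a bit more concrete, here is a rough Python sketch of
the two plugin sides. All class and method names are made up for
illustration and are not Bacula's plugin API; the "Memory" plugin pair
is just a toy that keeps its entities in a dict.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Iterator, List


class DirectorPlugin(ABC):
    """Director side: understands the plugin's own configuration block."""

    @abstractmethod
    def entities(self, config: dict) -> List[str]:
        """Return the names of the entities this configuration selects."""


class FileDaemonPlugin(ABC):
    """File Daemon side: moves the data of one entity in or out."""

    @abstractmethod
    def backup(self, entity: str) -> Iterator[bytes]:
        """Yield the backup data for the named entity."""

    @abstractmethod
    def restore(self, entity: str, data: Iterable[bytes]) -> None:
        """Recreate the named entity from previously backed-up data."""


# Toy plugin pair (hypothetical): entities are just keys in a dict.
class MemoryDirectorPlugin(DirectorPlugin):
    def entities(self, config: dict) -> List[str]:
        # "Entity" is an invented keyword for this sketch.
        return list(config.get("Entity", []))


class MemoryFileDaemonPlugin(FileDaemonPlugin):
    def __init__(self, store: Dict[str, bytes]) -> None:
        self.store = store

    def backup(self, entity: str) -> Iterator[bytes]:
        yield self.store[entity]

    def restore(self, entity: str, data: Iterable[bytes]) -> None:
        self.store[entity] = b"".join(data)
```

The core code would only ever talk to the two abstract classes; what an
entity name means is left entirely to the plugin.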

In practice, this would mean that the current feature set of Bacula
would be implemented by a Filesystem plugin. Once the plugin API is
defined, it would be quite trivial to move the current Bacula backup and
restore code into the Filesystem plugin. The naming scheme of the
Filesystem plugin would be the current way of naming files (/path on
Unix, DriveLetter:/Path on Windows), so the backup entity in the
Filesystem plugin case is a single file residing on a filesystem. The
configuration of the Filesystem plugin would be done using the familiar
FileSet block in the Director configuration. On the File Daemon side,
too, the current code for providing the data to be backed up to the
Storage Daemon (reading file contents from the file system and so on)
and for writing restored data in usable form (writing files to the file
system with the given content) could be re-used pretty much as-is in the
File Daemon Filesystem plugin.
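
As a minimal sketch of what the File Daemon half of such a Filesystem
plugin boils down to, assuming the entity name is simply the file path:
the function names and the chunk size below are my own inventions, and
file metadata (permissions, ownership, timestamps) is left out entirely.

```python
import os
from typing import Iterable, Iterator

CHUNK_SIZE = 64 * 1024  # arbitrary block size for this sketch


def backup_file(path: str) -> Iterator[bytes]:
    """Yield the contents of one file in chunks, as they would be sent on."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            yield chunk


def restore_file(path: str, chunks: Iterable[bytes]) -> None:
    """Write restored data back to the file system under the entity's name."""
    # Create missing parent directories so the restore target can be opened.
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
```

Restoring is then just `restore_file(target, backup_file(source))`, with
the Storage Daemon sitting in between in real life.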

As for handling databases natively, the plugins would work like this
(I'm using PostgreSQL as an example, but this is not tied to PostgreSQL
in any way; plugins for other RDBMSs would be pretty much the same at
the high level I'm talking about).

In the Director, there would be a configuration block that specifies the
database resources to be backed up. This block would not be called a
FileSet: remember that handling of the block is implemented by each
plugin, and the PostgreSQL plugin has nothing to do with the Filesystem
plugin. These resources include at least the names of the databases
and/or tables to be backed up and the usual info needed to connect to
the PostgreSQL server (hostname, port, user, password and so on). This
info is delivered to the configured File Daemon, which uses it to
actually read the backup data from the database.

So the actual PostgreSQL instance to be backed up does not need to
reside on the same machine that is running the File Daemon. For
instance, you can use just one File Daemon in your institution to back
up several PostgreSQL servers running on distinct hosts just fine, since
the File Daemon PostgreSQL plugin uses the native PostgreSQL API to
connect to the actual server as specified in the Director configuration.
The Director does not know anything about what or where the actual
PostgreSQL server is, and it does not need to know; it's just instructed
to back up some database using this or that File Daemon, which
interfaces with the actual service. As for implementing the PostgreSQL
File Daemon plugin, a lot of code for backing up and restoring could be
borrowed from the pg_dump and psql tools of the PostgreSQL distribution.

The backup entity of the PostgreSQL plugin would be the name of the
database and/or tables. The pair of the database/table name and the
configured File Daemon is unique within the domain of the PostgreSQL
plugin. It is just like with files: you can have many /etc/passwd files
backed up, and you can have many databases with the same name backed up,
but the /etc/passwd on a certain File Daemon is unique among all the
/etc/passwd's. The main point is that the entry in the catalog for a
PostgreSQL database does not need to resemble a valid file name in any
way (what the string would actually look like is a trivial technical
detail, so I'm not proposing any format here).
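
The uniqueness argument can be sketched in a few lines of Python. The
types and field names here are hypothetical, and the entity is kept as a
structured value on purpose, so that no string format for the catalog is
implied.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class PgResource:
    """Hypothetical Director-side block: what to back up and how to connect."""
    host: str
    port: int
    user: str
    password: str
    database: str
    tables: Tuple[str, ...] = ()  # empty means the whole database


@dataclass(frozen=True)
class PgEntity:
    """One backup entity: a database, or a single table within it."""
    database: str
    table: Optional[str] = None


def catalog_keys(file_daemon: str, res: PgResource) -> List[Tuple[str, PgEntity]]:
    """Pair every entity with its File Daemon; the pair is what is unique."""
    if not res.tables:
        return [(file_daemon, PgEntity(res.database))]
    return [(file_daemon, PgEntity(res.database, t)) for t in res.tables]
```

Two File Daemons backing up a database called "sales" produce equal
entities but different keys, just like /etc/passwd on two machines.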

Of course, this is not a simple thing to design and implement. There
would be no universally known grammar for the Director and File Daemon
configuration files, for instance. The set of available keywords depends
on which plugins are loaded, and since anybody is able to implement a
new plugin using the public plugin API, the number of possible keywords
is unbounded. Also, the catalog database schema would need a complete
redesign. For instance, the filename, file and path tables in their
current form would make sense only for the Filesystem Director plugin.
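
One way the grammar problem might be handled, sketched in Python under
the assumption that each loaded plugin registers the keywords it
understands: the configuration parser then checks a block only against
its own plugin. The keyword sets and names below are illustrative, not
actual Bacula directives.

```python
from typing import Dict, Set


class ConfigError(ValueError):
    """Raised when a configuration block cannot be handled by a loaded plugin."""


def check_block(plugin: str, block: Dict[str, str],
                loaded: Dict[str, Set[str]]) -> None:
    """Validate one block against the keywords declared by its plugin."""
    if plugin not in loaded:
        raise ConfigError(f"no plugin loaded for block type {plugin!r}")
    unknown = sorted(set(block) - loaded[plugin])
    if unknown:
        raise ConfigError(f"{plugin}: unknown keyword(s): {', '.join(unknown)}")


# Illustrative registry; in this sketch each plugin would add its own entry
# when it is loaded.
LOADED = {
    "Filesystem": {"File", "Exclude"},
    "PostgreSQL": {"Host", "Port", "User", "Password", "Database", "Table"},
}
```

So `check_block("PostgreSQL", {"Host": "db1"}, LOADED)` passes, while a
`File` keyword in the same block raises `ConfigError`.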

Ideas?

Best Regards,

-- 
Tuomas Jormola <tj at solitudo.net>