From: James H. <jam...@be...> - 2007-09-02 10:22:50
> My plan was to have plugins as fine as the Options level (as currently
> partially implemented), which is finer grained than the Job level, and it
> does not at all solve the problem, but creates new problems; the main one
> being that unless you figure out how to put these plugin names on the
> Volume, the Volume is no longer complete, but requires a *special* conf
> file to be properly read. I consider this a partial implementation that
> will get the user into trouble, so it is something I have ruled out
> implementing -- at least for the moment.

Hmmm... maybe there could be some value in writing out a 'dummy' header
file at the start of the backup that contains the 'options' section that
was used to create this backup.

I do know that Veritas Backup Exec makes it look like everything is backed
up in one big job, but really all the different bits and pieces are in
separate jobs on tape (eg file1=C:, file2=D:, file3='Microsoft Information
Store' (exchange), file4='SQL Server', file5='System State'). Have you
ever thought about implementing 'Job Groups' in Bacula?

> > It would be nice to be able to override this though, as for at least
> > MSSQL backups, the format on tape is exactly the same as the format
> > stored on disk when you use the "BACKUP DATABASE xxx TO
> > DISK='filename.bak'" command, so as a last resort you could just
> > restore your SQL backup to a plain file and use MSSQL to do the
> > restore from there, eg in disaster recovery mode where you don't have
> > a working plugin yet.
>
> It seems to me that if a particular plugin is not available, the user
> will not be able to restore his data. I believe that you are viewing the
> problem from too narrow a perspective, because in general the kinds of
> plugins that will be written will not be easy to simulate in some sort
> of disaster recovery mode without the plugin. Were it that simple, I
> don't think we would need the plugins.

Maybe.
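To make the fallback concrete, here is a minimal sketch of what I mean (just Python generating the T-SQL a user would run by hand; the database name and dump paths are made-up examples, not anything Bacula produces):

```python
# Sketch of the 'no working plugin' fallback: Bacula restores the MSSQL
# backup stream to a plain file, then the user loads it with MSSQL itself.
# The database name and dump path below are hypothetical examples.

def mssql_fallback_commands(database: str, dump_path: str):
    """Return the T-SQL pair: how the dump was made, and how a user would
    load it by hand after restoring the stream to a plain file."""
    backup = f"BACKUP DATABASE {database} TO DISK='{dump_path}'"
    restore = f"RESTORE DATABASE {database} FROM DISK='{dump_path}'"
    return backup, restore

backup_cmd, restore_cmd = mssql_fallback_commands("Orders", r"C:\dumps\orders.bak")
print(backup_cmd)
print(restore_cmd)
```

The point is that the on-tape format and the on-disk format are identical, so the restore side needs nothing more than a plain file restore.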
I'm not convinced of this yet though... (see my MSSQL example below)

> > The problem with trying to integrate your whole backup (eg files +
> > exchange + mssql) into one job is that each deals with different
> > logical things. A file backup obviously just deals with plain files,
> > but an exchange backup logically deals with storage groups and
> > databases (databases consist of multiple files - normally two - and a
> > storage group has multiple databases, and then multiple log files
> > which hold transactions for all databases in that storage group).
>
> At the very lowest level of the code, you are correct, but in reality a
> "file backup" does not just deal with plain files. It deals with a tree
> of objects. Normally you call those objects files, which is OK, but in
> fact to Bacula they are a whole bunch of different types of objects
> (directories, which form the tree; sockets; character files; block
> files; normal files; FIFOs, ...). There is no limit to the number of
> objects that Bacula can deal with. Each one has a different backup and
> restore method (though the code is not really that well organized). The
> only *current* requirement is that they be in a tree relationship.
>
> I'm not sure why the above is a problem, other than finding a proper
> namespace and browsing the backup objects.

The problem I was attempting to describe is that the files Bacula
currently backs up consist of a small known number of streams (as you say,
depending on what type of 'file' it is). But as long as the directory it
lives in exists, any of these files can be restored on its own without any
problems; a single 'file' (device node, fifo, etc) exists as a single
useful entity.

The files backed up in my latest backup of an Exchange Server are (MIS =
'Microsoft Information Store', FSG = 'First Storage Group' - reduced to
fit in a readable way - I hope):

"
 1. /MIS/FSG/Mailbox Store (CMSERVER1)/
 2. /MIS/FSG/Mailbox Store (CMSERVER1)/DatabaseBackupInfo
 3. /MIS/FSG/Mailbox Store (CMSERVER1)/D:\...\priv1.edb
 4. /MIS/FSG/Mailbox Store (CMSERVER1)/D:\...\priv1.stm
 5. /MIS/FSG/Public Folder Store (CMSERVER1)/
 6. /MIS/FSG/Public Folder Store (CMSERVER1)/DatabaseBackupInfo
 7. /MIS/FSG/Public Folder Store (CMSERVER1)/D:\...\pub1.edb
 8. /MIS/FSG/Public Folder Store (CMSERVER1)/D:\...\pub1.stm
 9. /MIS/FSG/D:\...\E000010B.log
10. /MIS/FSG/
11. /MIS/
"

Ignoring folders, possible restore combinations are:

"
 2. /MIS/FSG/Mailbox Store (CMSERVER1)/DatabaseBackupInfo
 3. /MIS/FSG/Mailbox Store (CMSERVER1)/D:\...\priv1.edb
 4. /MIS/FSG/Mailbox Store (CMSERVER1)/D:\...\priv1.stm
 6. /MIS/FSG/Public Folder Store (CMSERVER1)/DatabaseBackupInfo
 7. /MIS/FSG/Public Folder Store (CMSERVER1)/D:\...\pub1.edb
 8. /MIS/FSG/Public Folder Store (CMSERVER1)/D:\...\pub1.stm
 9. /MIS/FSG/D:\...\E000010B.log
"
(Full Restore)

Or:

"
 2. /MIS/FSG/Mailbox Store (CMSERVER1)/DatabaseBackupInfo
 3. /MIS/FSG/Mailbox Store (CMSERVER1)/D:\...\priv1.edb
 4. /MIS/FSG/Mailbox Store (CMSERVER1)/D:\...\priv1.stm
 9. /MIS/FSG/D:\...\E000010B.log
"
(Just user mailboxes)

Or:

"
 6. /MIS/FSG/Public Folder Store (CMSERVER1)/DatabaseBackupInfo
 7. /MIS/FSG/Public Folder Store (CMSERVER1)/D:\...\pub1.edb
 8. /MIS/FSG/Public Folder Store (CMSERVER1)/D:\...\pub1.stm
 9. /MIS/FSG/D:\...\E000010B.log
"
(Just the public folders)

Any other combination is not valid - you must restore the logfile(s) with
any restore that you do, and you must restore all files in a given
database (.edb and .stm - DatabaseBackupInfo is a metafile that exists to
allow the agent to know the GUID and filenames of the files in the
database before it gets to them when restoring).
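To spell the rule out, here is a little Python sketch of the validity check (paths are shortened versions of the listing above - the elided D:\... parts are left out, so the names are illustrative only):

```python
# Sketch of the selection rule above: a restore set is valid only if it
# includes the log file(s) and treats each database as all-or-nothing
# (DatabaseBackupInfo + .edb + .stm together). Paths are shortened,
# illustrative versions of the example listing.

MAILBOX = {
    "/MIS/FSG/Mailbox Store/DatabaseBackupInfo",
    "/MIS/FSG/Mailbox Store/priv1.edb",
    "/MIS/FSG/Mailbox Store/priv1.stm",
}
PUBLIC = {
    "/MIS/FSG/Public Folder Store/DatabaseBackupInfo",
    "/MIS/FSG/Public Folder Store/pub1.edb",
    "/MIS/FSG/Public Folder Store/pub1.stm",
}
LOGS = {"/MIS/FSG/E000010B.log"}

def valid_restore(selection):
    """True if 'selection' is one of the valid combinations above."""
    selection = set(selection)
    if not LOGS <= selection:
        return False                       # log files are always required
    chosen = selection - LOGS
    if not chosen:
        return False                       # logs alone restore nothing
    for database in (MAILBOX, PUBLIC):
        partial = chosen & database
        if partial and partial != database:
            return False                   # partial database selection
    return chosen <= MAILBOX | PUBLIC      # no unknown files

print(valid_restore(MAILBOX | PUBLIC | LOGS))  # full restore -> True
print(valid_restore(MAILBOX | LOGS))           # just user mailboxes -> True
print(valid_restore(MAILBOX))                  # log file missing -> False
```

This is exactly the kind of constraint that a 'plugin unaware' restore interface cannot enforce on its own.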
Now it may make sense for the exchange agent to internally roll some of
the files together into a single stream, eg create a stream that has the
DatabaseBackupInfo + .edb + .stm files all stored sequentially (and I may
yet do that...), but you still need the database files separate from the
log files so that each database can be selected for restore independently.
As long as the instructions are clear from a user's point of view, a
restore is still pretty straightforward.

This leads into another item to (maybe) add to the list - does a plugin
need to have any influence over how files are selected for restore in the
user interface? This would be really nice for the above case, but a
nightmare to implement, as suddenly you need the plugin running in the
director too. Unless maybe we developed some sort of restore recipe that
would be stored at the director for each backup... maybe just ignore this
whole paragraph for now :)

> > True, but a _lot_ more work for bacula. Although I haven't been
> > following the discussions on 'true' incremental/differential backups
> > so you may already have worked out a nice solution to this.
>
> Yes, I know how to do project #1 "Accurate restoration of
> renamed/deleted files". The only unknowns are some minor details, mainly
> concerning performance.

That last sentence was the bit I was referring to.

> > Your above statement is also true for Exchange right now, as per my
> > previous paragraph. Exchange keeps track internally of when the last
> > full backup was done, so bacula needs to somehow know that it doesn't
> > have all the control it would like over incremental and differential
> > backups.
> >
> > I guess one of the other things a plugin needs to do is tell bacula
> > about its capabilities, eg 'Can do Incremental Backup', 'Can do
> > Differential Backup', etc.
>
> I don't think that is quite the right question. All plugins will have to
> know how to do all implemented backup types.
> It doesn't make sense to do a partial implementation. That said, I
> imagine the plugin may have a certain flexibility in the type of backup
> it does. If an Incremental is requested, there is no real harm if it
> does a Differential other than efficiency ... however, I don't imagine
> that would be a normal case.

Well... one of the things MSSQL (and probably other databases) can do is a
restore to a point in time. Eg suppose I had forgotten the WHERE clause on
a delete and done something stupid like 'DELETE FROM Order_Detail', the
last backup was run 3 hours earlier, and in that time 100 people had been
madly entering orders. That's 300 person-hours of labour down the drain!
What I could do though is:

1. backup the current transaction logs
2. restore the most recent full copy of the database
3. restore all transactions since then, up until right before I issued my
   monumentally stupid DELETE command (this is done via the RESTORE syntax
   in MSSQL).

That repairs the database exactly as it was, and everyone can keep moving
ahead.

Now I imagine that driving all of that from within Bacula would be quite
hard using a 'plugin unaware' user interface, but it might be nice to have
some of the back-end framework in place for when we design it.

Incidentally, how I would get around this with a simpler MSSQL agent is to
simply restore everything to plain files in the filesystem and get MSSQL
to restore from those. But maybe MSSQL is the only agent where it would
make sense to be able to do this...

James
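P.S. For the archives, a sketch of what steps 1-3 might look like if a plugin (or a user by hand) drove them - just Python generating the T-SQL, and the database name, dump paths, and STOPAT timestamp are all made-up examples:

```python
# Sketch of the point-in-time recovery sequence (steps 1-3 above).
# Database name, dump paths, and the STOPAT time are hypothetical.

def point_in_time_restore(db, full_bak, log_bak, stop_at):
    """Return the T-SQL statements, in order, for a point-in-time restore."""
    return [
        # 1. back up the tail of the current transaction log
        f"BACKUP LOG {db} TO DISK='{log_bak}'",
        # 2. restore the most recent full backup; NORECOVERY leaves the
        #    database ready to accept further log restores
        f"RESTORE DATABASE {db} FROM DISK='{full_bak}' WITH NORECOVERY",
        # 3. replay transactions up to just before the bad DELETE
        f"RESTORE LOG {db} FROM DISK='{log_bak}' WITH STOPAT='{stop_at}', RECOVERY",
    ]

for stmt in point_in_time_restore("Orders", r"C:\dumps\orders_full.bak",
                                  r"C:\dumps\orders_log.bak",
                                  "2007-09-02 10:00:00"):
    print(stmt)
```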