Re: [mbackup-devel] Process block diagram
From: James O'K. <jo...@mi...> - 2001-04-17 18:23:53
On Tue, 17 Apr 2001, Brent B. Powers wrote:

> I have the suspicion that modules mean different things, also. If I
> understand correctly, the Master Control Module interprets one or more
> script files (or something similar), whereas the external
> communication module is a glue and interface layer.

By module, I mean something that is self-contained and can be easily
removed and replaced, because the interface it uses is well defined.

I thought of an analogy today for the different modules. Imagine that
mbackup is a large factory. The MCM is the CEO: he decides what the
factory will be doing, when, and in what order. He knows the master
plan. The RCM is the head of the purchasing department. He has several
people under him, and when the factory needs new materials the MCM
tells the RCM to get something, and the RCM delegates it to one of his
workers. A similar thing happens with the FCM: they are the assembly
line, and the MCM tells them to process the new input from the RCM.
The WCM is the shipping department; once we have the processed data,
we need to ship it off somewhere. The ECM is like the secretary or a
PR person: if we need to contact another machine or process, we do it
through the ECM.

> OK, so if modules don't need to create a tcp connection, how _do_ they
> communicate?

Communicate with what? If it's another part of the same process, it
will be function calls. If it's another process on the same machine,
they can use named pipes (a rough sketch of that follows below). What
example case were you referring to?

> Does the mcm ask this for the next file, or for the next file spec?

I haven't decided exactly. I've been thinking that it will just ask for
the next block of data and process things in blocks instead of files
(also sketched below). I'm still pondering this, so input would be
good.

> I'd like to consider a couple of scenarios, and see how you think that
> they might be handled.
>
> The obvious ones (any of these should go to disk, rom, tape, or ???):
> 1.) Full Backup (file by file, all files)
> 2.) Incremental backup (file by file, all files since a certain date)
> 3.) Image backup (an entire file system)
>
> More difficult:
> 1.) Tower of Hanoi backup (file by file, all files that have not been
> backed up in their current version to at least n other data sets)
> 2.) Distributed backup (multiple machines)

See below.

> Proprietary:
> 1.) Suppose we had the ability to read a database partition while the
> database was live? How would we trigger the program to read the data?

This would be taken care of by a module under the RCM.

> Obviously, I've some things in mind. My conception of the master
> control module is that it reads a file describing the backup. I think
> that your system allows reader modules to handle file readers, image
> readers, or remote readers, and the proprietary steps are a reasonable
> extension. However, where is backup history being kept, as well as
> media content lists? (i.e. if you know you've backed up a file, and
> you're looking at a pile of tapes and cdroms, how do you find the
> file?)
>
> Now, suppose that you want to keep that information in a text file;
> sometimes in a database; in a Sybase database for one set of clients,
> in an Oracle database for another set of clients, and.... I think a
> new module is required. Can we call it the database or history module?

I've been thinking about this too. Lots of things will need this info;
I just haven't decided how I want it to interact with the rest of the
program.
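As for the named-pipe case mentioned above, here is a minimal sketch of
what the writing side of such a pipe could look like. The pipe path,
the "READ /dev/hda1" request format, and everything else in it are
invented for illustration; none of it comes from the actual mbackup
code.

    /* Writing side of a hypothetical MCM -> RCM named pipe. */
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <sys/types.h>
    #include <unistd.h>

    #define RCM_FIFO "/tmp/mbackup.rcm"      /* invented pipe name */

    int main(void)
    {
        const char request[] = "READ /dev/hda1\n";  /* invented wire format */
        int fd;

        /* Create the FIFO if it is not there yet; an existing one is fine. */
        if (mkfifo(RCM_FIFO, 0600) == -1 && errno != EEXIST) {
            perror("mkfifo");
            return 1;
        }

        /* open() blocks until the reading process opens the other end. */
        fd = open(RCM_FIFO, O_WRONLY);
        if (fd == -1) {
            perror("open");
            return 1;
        }
        if (write(fd, request, sizeof request - 1) == -1)
            perror("write");
        close(fd);
        return 0;
    }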
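And for the blocks-instead-of-files idea, one possible shape for a
block-oriented reader interface. Every name here (mb_block, mb_reader,
next_block, pump) is invented just for this sketch and only meant to
show the general idea:

    /* Hypothetical block-oriented reader interface. */
    #include <stddef.h>

    #define MB_BLOCK_SIZE 65536

    struct mb_block {
        size_t len;                     /* bytes actually filled in */
        unsigned char data[MB_BLOCK_SIZE];
    };

    struct mb_reader {
        void *state;                    /* private to the reader module */
        /* Fill in the next block; return 1 on data, 0 at end of stream,
         * -1 on error.  The caller never needs to know about files. */
        int (*next_block)(struct mb_reader *self, struct mb_block *out);
        void (*close)(struct mb_reader *self);
    };

    /* How the consuming side of the conversation might look. */
    static int pump(struct mb_reader *r,
                    int (*consume)(const struct mb_block *))
    {
        struct mb_block blk;            /* 64K on the stack; fine for a sketch */
        int rc;

        while ((rc = r->next_block(r, &blk)) > 0)
            if (consume(&blk) != 0)
                return -1;              /* downstream refused the block */
        return rc;                      /* 0 = clean end, -1 = read error */
    }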
> Now, what makes the decisions as to _what_ is backed up? A file called
> fspec exists. Or maybe it does. How do we know? Is the reader module
> responsible for determining that? If so, what has the responsibility
> of determining that the file fspec should actually be backed up? I've
> not really figured this one out at all for the generic case. I did
> come up with something like: a control module of some kind requests a
> directory listing from a directory-reader. Based on the directory
> listing, it compiles a list of items to back up. The master control
> module takes that list, along with other lists that are included in
> the same backup set, and arranges them. For each item, then, it
> requests the data from a reader.

My thoughts at the moment are that the MCM makes a high-level decision,
such as "it's time to start the backup" or "I want to back up
/dev/hda1". It's the RCM's job to look at /dev/hda1 and decide which
files are to be backed up. If it needs to contact the to-be-named
indexing module, it either does so directly, if that module is part of
the current process, or it asks the ECM to contact the index on its
behalf.

> This scenario leads to some kind of reader architecture that allows
> two modes of operation for each reader: directory and data. Note that
> the directory itself may have to be backed up, so that, for some
> readers, the directory mode might be a noop (for instance, a unix
> partition reader (dump)); but for other readers, the data mode might
> be a noop (think ldap-ily).
>
> This also leads (somehow) to the idea that the Master Control Module
> might not want to simply parse a list, but be able to execute a
> program or script in any situation where it would otherwise parse a
> list. (I'm not sure if that sentence makes sense.)

I'm not sure either.

> Finally, it is pretty clear that if there is a database module,
> communication to the database module has to be granted to all of the
> control modules, and some of the individual modules. For simplicity,
> you might want to say all of the individual modules.

All modules will have access to the database module if needed, but it
will be handled via the ECM or the indexing module. The various modules
don't need to know how the data is indexed or stored; they just want
the info they need.

I hope that answered some of your questions?

-james
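Picking up the two-mode reader idea quoted above (a directory pass and
a data pass, either of which may be a noop for a given reader), here is
a rough sketch of how that interface could be expressed. All of the
identifiers (mb_item, mb_reader2, list_items, read_item) are invented
for illustration and do not come from mbackup:

    /* Hypothetical two-mode reader: a directory pass that lists items
     * and a data pass that streams one item. */
    #include <stddef.h>

    struct mb_item {
        char name[256];                 /* path, LDAP DN, partition, ... */
    };

    struct mb_reader2 {
        void *state;

        /* Directory mode: write up to max items into list and return
         * the count, or 0 if this reader has no meaningful directory
         * pass (e.g. a whole-partition, dump-style reader). */
        int (*list_items)(struct mb_reader2 *self,
                          struct mb_item *list, int max);

        /* Data mode: copy up to len bytes of the named item into buf.
         * Return bytes copied, 0 at end of item, -1 on error; a reader
         * whose data mode is a noop (think LDAP, where the directory
         * listing is the whole story) can simply return 0. */
        long (*read_item)(struct mb_reader2 *self, const char *item,
                          void *buf, size_t len);
    };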