From: Kern S. <ke...@si...> - 2005-11-28 15:14:23
Hello,

I'm probably more than half way (possibly 3/4) to getting the first migration/copy job running, so I thought I would write up a few of the details for those who are interested in checking my "design" or who simply want to make comments.

What I have implemented already (it passes regression testing, so all existing features work despite the new code):

- Separation of the read/write descriptors in the Storage daemon.
- Separation of the read/write Storage names in the Director that are sent to the Storage daemon (both a read and a write storage device can now be sent).
- Implementation of a skeleton of the Migration/Copy job code in the Director.
- Implementation of the following new Pool directives:
    Migration Time = <duration>
    Migration High Bytes = <size>
    Migration Low Bytes = <size>
    Next Pool = <Pool-res-name>
  (nothing is done with them yet, but all the Catalog variables already exist in 1.38.x).
- Implementation of a new Job directive:
    Migration Job = <Job-res-name>
  This is identical to the current Verify Job directive. It allows specification of the Job to be migrated/copied.
- Implementation of a Migrate and a Copy Job Type (similar to Backup, Restore, and Verify).

What remains to be done:

- Finish the skeleton of the Migration job code in the Director, including having it check the new Pool directives ...
- Implement sending the bootstrap file directly from the Director to the Storage daemon (cutting out the FD). This requires a DIR<->SD protocol change.
- Implement the Migration/Copy code in the SD (rather trivial, I think).

How does it work? Much like a Verify job. You define a Migrate or Copy job much the same as you do a Verify job, with the exception that you specify a Migration Job (the target) rather than a Verify Job name (i.e. you tell it what job you want migrated). The "from" Storage daemon is defined in the specified target Migration Job.
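To make the above concrete, here is a rough sketch of how the new directives might fit together in bacula-dir.conf. This is only an illustration of the design described above, not tested configuration: the resource names (Default, LongTerm, NightlySave, MigrateNightly) are invented, other directives a Job normally requires are omitted, and the exact syntax may well change before this is released.

```conf
# Source pool: backups land here first.  When Migration High Bytes is
# exceeded (or Migration Time has passed), the data becomes eligible
# to be migrated to the pool named by Next Pool.
Pool {
  Name = Default
  Pool Type = Backup
  Migration Time = 30 days       # <duration>
  Migration High Bytes = 200G    # <size> -- triggers migration
  Migration Low Bytes = 100G     # <size>
  Next Pool = LongTerm           # <Pool-res-name> -- destination pool
}

# Destination pool for migrated data.
Pool {
  Name = LongTerm
  Pool Type = Backup
}

# A Migrate job names the job to be migrated, much as a Verify job
# names the job to be verified.
Job {
  Name = "MigrateNightly"
  Type = Migrate
  Migration Job = "NightlySave"  # <Job-res-name> -- the target job
  Pool = Default
  Schedule = "WeeklyCycle"
}
```

As described above, the job itself does nothing until its schedule fires and one of the Pool conditions (time passed or high-water bytes exceeded) is met.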
The "from" Pool is specified in the target Job's Pool's Next Pool, and if that is not specified, it is taken from the Pool specified in the current job. You then schedule this Migration job using a schedule. When it runs, it checks that either the Migration Time has passed (it has if it is zero) or that the Migration High Bytes are exceeded in the target's Pool. If one of those is true, the job starts and migrates the last run of the target job (this needs to be improved) by reading that job, much like a restore, and writing it to the destination pool. Then, for a Migration, the old job is deleted from the catalog (perhaps the Volume will be removed -- another Feature Request), while in the case of a Copy, the old Job information is left unchanged.

Consequences/problems:

- This is a bit simplistic (OK with me) and relies on the user to schedule Migration jobs rather than Bacula internally checking and automatically firing off a Migration job. I'm not too comfortable with automatically generated jobs.
- There exist deadlock situations in acquiring resources. For example, once the read device is acquired, that device is blocked, and if another job has the output device that is needed, the Migrate job will block until the other job completes. If that other job needs the read device of the Migrate job, a deadlock will occur.
- Each Migration job migrates a single previous job. Scheduling multiple migration jobs will, assuming the migration conditions are met, migrate one job each time. This is fine with me.
- You need a different Migration job for each job to be migrated. This is a bit annoying, but it is mitigated by JobDefs.
- I haven't worked out exactly what to keep in the catalog for Migration jobs (a job that runs but does nothing should probably be recorded; a job that runs and migrates data should probably be labeled as a Backup job to simplify the restore code ...).
- The last 20% of the programming effort requires 80% of the work :-)
- I'm thinking about adding an interactive migration console command, similar to the restore command, except that the files selected will be written to the output. This is a way to "migrate" multiple jobs (i.e. the current state of the system), or in other words to do a "virtual Full backup" or a "consolidation". To be consistent, this command would not allow selection of individual files; i.e. it will take all the files from the specified jobs.
- An Archive feature can also be implemented from this -- it is simply a Copy with no record being put in the catalog and with the output Volume type being Archive rather than Backup. (Note, my concept of Archive is that no record of them remains in the catalog -- this may be a subject of discussion ...)

Comments?

--
Best regards,

Kern

(">
/\
V_V