From: Kern S. <ke...@si...> - 2005-11-28 23:34:00
On Monday 28 November 2005 17:01, David Boyes wrote:
> > What I have implemented already is (passes regression
> > testing, so all existing features work despite the new code):
> > - Separation of read/write descriptors in the Storage daemon.
> > - Separation of the read/write Storage names in the Director that
> >   are sent to the Storage daemon (both a read and a write storage
> >   device can now be sent).
>
> Neat.
>
> > - Implementation of a skeleton of Migration/Copy job code in
> >   the Director.
> > - Implementation of the following new Pool Directives:
> >     Migration Time = <duration>
> >     Migration High Bytes = <size>
> >     Migration Low Bytes = <size>
> >     Next Pool = <Pool-res-name>
> >   (nothing is done with them yet, but all the Catalog variables
> >   exist already in 1.38.x).
>
> And these values apply to all volumes in the pool, right?

Yes, which may encourage some users to create different Pools for
different Media Types.

> Did you want to treat the values for the entire pool (e.g. number of
> volumes * size of volume), or for individual volumes?

I consider them to be values for the whole pool.

> Operationally, I think you'll be more concerned with the entire pool
> case, using the volume occupancy information to select what volumes
> are eligible for migration.

Well, Bacula is a bit brain-damaged, so for the moment it is not going
to be able to use volume occupancy information to select particular
Volumes for migration. What I am talking about for the moment is Job
migration (see project #1 for details). Now, if one arranges things
right (Migration Time, for example), this might translate into Volume
migration. If users want Volume migration, we could possibly come up
with such a project, but it is considerably more complex, and not as
useful, IMO, as Job migration (at least for a first cut at migration),
because Job migration implements 99.9% of the code necessary for Copy,
Archive, and Consolidation (Virtual Full backups) at no extra cost.

> > How does it work? Much like a Verify job.
> > You define a Migrate or Copy job much the same as you do a
> > Verify job, with the exception that you specify a Migration
> > Job (target) rather than a Verify Job name (i.e. you tell it
> > what job you want migrated). The from Storage daemon is
> > defined in the specified target Migration Job. The from Pool
> > is specified in the target Job's Pool's Next Pool, and if not
> > specified, it is taken from the Pool specified in the current job.
> >
> > You then schedule this Migration job using a schedule. When
> > it runs, it will check that either the Migration Time is
> > exceeded (it is if it is zero) or the Migration High Bytes
> > are exceeded in the target's Pool. If one of those is true,
> > the job will start and will migrate the last target job run
> > (this needs to be improved) by reading that job, much like a
> > restore, and writing it to the destination pool. Then, for a
> > Migration, the old job is deleted from the catalog (perhaps
> > the Volume will be removed -- another Feature Request), or in
> > the case of a Copy, the old Job information will be left unchanged.
>
> Hmm. I really think migration should be volume-oriented, not job
> oriented. You really want to use this to clear entire volumes, not
> job-by-job.

Yes, moving an entire Volume is also a useful feature -- especially if
you want to free up some space.
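For illustration, here is roughly how I picture the pieces described
above fitting together in bacula-dir.conf. The Pool directives are the
ones listed above; the Job keywords (Type = Migrate, Migration Job)
are provisional guesses at syntax that is still in flux:

   Pool {
     Name = Full-Pool
     Pool Type = Backup
     Migration Time = 30 days       # zero would mean always eligible
     Migration High Bytes = 200 GB  # migration may start above this
     Migration Low Bytes = 100 GB   # presumably the stop threshold
     Next Pool = Tape-Pool          # destination for migrated data
   }

   # The target job whose data is to be migrated; it defines the
   # "from" Storage daemon.
   Job {
     Name = "client1-backup"
     Type = Backup
     Client = client1-fd
     FileSet = "Full Set"
     Pool = Full-Pool
     Storage = File-Storage
     Messages = Standard
   }

   # Defined much like a Verify job, except that it names a target
   # Migration Job rather than a Verify Job.
   Job {
     Name = "migrate-client1"
     Type = Migrate                    # provisional keyword
     Migration Job = "client1-backup"  # provisional keyword
     Pool = Full-Pool
     Schedule = "WeeklyCycle"
     Messages = Standard
   }

When "migrate-client1" runs, it does nothing unless the Migration Time
test or the Migration High Bytes test passes, as described above.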
> I'd suggest that the migration code check whether the pool exceeds the
> "Migration High Bytes" value, and then select volumes for migration
> starting with the ones with the least space in use (it's faster to
> clear an almost empty volume, minimizing the time the volume is
> unavailable for appends) that are not in use by another job. The
> migration code should then move ALL the jobs off the volume to the
> volumes in the next pool, and release the original volume as available
> for use. The migration code could then continue with the next volume
> if pool utilization is still above the threshold, or stop if below the
> threshold. If there are absolutely no volumes available in the source
> pool for whatever reason, dip into your scratch pool and log the event.

This is something I could see being implemented, and I don't see it as
being excluded by what I have currently implemented. Please let me get
migration of a single job working; then the step to doing a full Volume
should be a lot easier. For example, I could imagine that the current
code would have a Job Level of "Job", and what you want would be Job
Level = "Volume".
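Purely as a sketch of that distinction -- none of these keywords exist
yet -- such a Volume-level job might one day look like:

   Job {
     Name = "migrate-full-pool"
     Type = Migrate           # provisional keyword
     Level = Volume           # hypothetical: move ALL jobs off the
                              # least-used eligible volumes
     Pool = Full-Pool         # source pool; its Next Pool is the target
     Schedule = "WeeklyCycle"
     Messages = Standard
   }

Such a job could keep clearing volumes while the pool remains above
Migration High Bytes, presumably stopping once it drops below
Migration Low Bytes.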
> That way, you don't have to tie anything to the job itself, and you
> can run regularly scheduled jobs that just "do the right thing" at
> regular intervals. It also lends itself easily to later adding a
> trigger-based process that would fire automatically if a threshold in
> a pool was exceeded.
>
> > - You need a different Migration job for each job to be
> >   migrated. This is a bit annoying but is mitigated by JobDefs.
>
> See above. IMHO, this really isn't job management, it's volume
> management. All we're doing is rearranging where the server stored
> the data, not the characteristics of an individual job.
>
> > - I haven't worked out exactly what to keep in the catalog
> >   for Migration jobs (a job that runs but does nothing should
> >   probably be recorded, a job that runs and migrates data
> >   should probably be labeled as a Backup job to simplify the
> >   restore code ...).
>
> Start/stop times, amount of data migrated, number of volumes
> processed, pool occupancy at start, pool occupancy at end.
>
> I think labeling it as a Backup job would be confusing. It's not
> really a backup; it's a migration job.

Yes, except that I am planning to have the Migration job function much
like a backup. When anything is moved or copied, a whole new set of
database records needs to be written. I find it cleaner to attach them
to the Migration job, which just falls out of the code, rather than
trying to go back and delete part of the previous job data and patch
the new records in. In fact, it would be *very* difficult to get it
right.

> > - The last 20% of the programming effort requires 80% of the work :-)
>
> And most of the grief.
>
> > - I'm thinking about adding an interactive migration console
> >   command, similar to the restore command, except that the
> >   files selected will be written to the output. This is a way
> >   to "migrate" multiple jobs (i.e. the current state of the
> >   system), or in other words do a "virtual Full backup" or a
> >   "consolidation".
> >   To be consistent, this command would not allow selection of
> >   individual files, i.e. it will take all files from the
> >   specified jobs.
>
> There are two cases, I think: migration and data movement. I'd add
> two commands: MOVE DATA and MIGRATE DATA.
>
> MOVE DATA just moves data from volume A to volume B, using the
> standard Bacula append rules, updating the database as it goes along
> so that the active location for a file in the database is now
> recorded as volume B rather than volume A. Takes two specific volume
> names as arguments.

I don't exclude having a Move job, which might work as you describe.
After all, I am planning to have a Copy job.

> MIGRATE DATA would take a pool name as input, and perform the
> equivalent of a migrate job. The input pool name is required, and the
> default behavior would be to use the nextpool attribute. If an output
> pool is specified, the nextpool attribute in the database is ignored,
> and the output pool is used as specified.
>
> > - An Archive feature can also be implemented from this -- it is
> >   simply a Copy with no record being put in the catalog and the
> >   output Volume types being Archive rather than Backup. (Note,
> >   my concept of Archive is that no record of them remains in
> >   the catalog -- this may be a subject of discussion ...)
>
> I'd say that you absolutely *want* records of where you archive
> stuff. You could add a COPY DATA command that would behave like the
> MOVE DATA command I described above, and give it an option as to
> whether to record the copy in the database or not (default being
> yes). There you probably want to allow either volume names or
> specifying individual job numbers to copy to the new volume.

The Copy job will record detailed data locations and File level
information in the database. The Archive does not record detailed data
locations or File level information in the database (except perhaps
the Job record).

> Coupled with my feature request for multiple copypool processing, I
> think this removes the last few barriers to calling Bacula
> enterprise-grade -- this is a really seriously cool step forward.

Thanks for your comments. As always, they help me clarify my ideas and
keep me from overlooking important considerations.

-- 
Best regards,

Kern

  (">
  /\
  V_V