I want to run a series of jobs that extract data from a database and refresh it into the data warehouse. Execution occurs in three phases: first I refresh all the reference tables, which are largely independent; then I pull in transactional tables, which depend on the reference data being up to date; then I perform other processing based on the refreshed transactional and reference tables.
Within each phase, the jobs are independent, but since they all hit the same destination, and may consume a lot of resources on the batch server, I only want to run one at a time. If one fails, there's a good chance it will work on retry, and a good chance that other jobs in the same phase will succeed even during the first try.
So I have been looking for a good pattern for a "phase" -- that is, a single-threaded job that executes its children one at a time, but tries all children even after one has failed, and waits for all of the children to have been tried before completing itself, so that subsequent "phases" don't start until the earlier one completes. If the phase is wrapped in a retry schedule, it is desirable to retry each job up to 5 times (say), rather than retrying the whole sequence 5 times (maybe 3 tries used on job 1, then 2 on job 2, and job 3 never gets a look in).
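To make the retry point concrete, here is a rough Python sketch of per-job retries. It is not code from the scheduler: `run_with_retries` and `make_flaky` are hypothetical names made up for illustration, and the jobs are stand-ins that fail a fixed number of times before succeeding.

```python
def run_with_retries(job, max_tries=5):
    """Run one job up to max_tries times; return True once it succeeds."""
    for _attempt in range(max_tries):
        try:
            job()
            return True
        except Exception:
            continue  # transient failure: try this same job again
    return False


def make_flaky(failures_before_success):
    """Hypothetical stand-in job that fails N times, then succeeds."""
    state = {"calls": 0}

    def job():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise RuntimeError("transient failure")

    job.state = state  # expose the call counter for inspection
    return job


jobs = {"job1": make_flaky(3), "job2": make_flaky(2), "job3": make_flaky(0)}

# Each job gets its own budget of 5 tries, so job3 still runs even though
# job1 and job2 between them used up several attempts.
results = {name: run_with_retries(job, max_tries=5) for name, job in jobs.items()}
```

With a per-job budget, all three jobs get their turn. If the same five tries were shared across the whole sequence, the failures in job1 and job2 would exhaust the budget before job3 was ever attempted.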
Sequential jobs get the single-threaded and waiting part right, but fail as soon as one child fails, on the assumption that there are implied dependencies between the child jobs.
A parallel job runs all the jobs independently and pools their status, but normally runs them in parallel (of course!), which is not desirable in my case, as I don't want lots of competing extractions hitting a single destination server.
So I tried using a parallel job with a throttled executor service that only allowed a single thread, and this worked well for a while. But a recent change to parallel jobs has broken this strategy, as the parallel job now completes as soon as it has submitted all its child jobs to the executor service. The comment on that commit said to use state:join to wait for the job, so to get the behaviour I want I need to put a single-thread throttled parallel job into a state:join container.
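As an analogy only, the throttle-then-join arrangement looks roughly like this in Python's `concurrent.futures`. This is not the scheduler's API: `run_throttled`, `ok` and `bad` are hypothetical names, and the explicit `wait` call plays the part that state:join plays in the real configuration.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def run_throttled(jobs):
    """Submit every job to a one-thread pool, then block until all are done."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Submitting returns immediately; nothing here waits for the work.
        futures = {name: pool.submit(job) for name, job in jobs.items()}
        # The explicit join: without it, the caller only knows that the jobs
        # were queued, not that they have finished.
        wait(list(futures.values()))
    # A child counts as successful if it raised no exception.
    return {name: f.exception() is None for name, f in futures.items()}


order = []  # records the order in which the children actually ran


def ok(name):
    def job():
        order.append(name)
    return job


def bad(name):
    def job():
        order.append(name)
        raise RuntimeError(name + " failed")
    return job


statuses = run_throttled({"a": ok("a"), "b": bad("b"), "c": ok("c")})
```

The single worker thread runs the children one at a time in submission order, and a failure in `b` does not stop `c`, but the caller has to remember both the throttle and the join to get that behaviour.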
This seems quite complicated for something that is a fairly simple task. Looking through the other possibilities provided with the various state jobs, I can't seem to find one that does what I want any more easily than the parallel job, but I might be missing something obvious here.
The patch provided takes a different approach and says that what I want is basically a sequential job, but one that considers the child jobs independent, and so it runs them all one after the other, not stopping at the first failure, and then sets its own status based on the child states.
This simplifies the configuration a lot for me, as there's no executor service to set up and no complications posed by asynchronous execution.
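The idea behind the patch can be pictured with the following Python sketch. It is an illustration and not the patch itself: `Status` and `run_independent_sequence` are hypothetical names, and the rule for the parent's status (failed if any child failed) is my assumption about one reasonable way to derive it from the child states.

```python
from enum import Enum


class Status(Enum):
    SUCCESS = "success"
    FAILED = "failed"


def run_independent_sequence(children):
    """Run children one after another, never stopping at a failure."""
    child_states = {}
    for name, child in children.items():
        try:
            child()
            child_states[name] = Status.SUCCESS
        except Exception:
            # Record the failure and carry on with the next child.
            child_states[name] = Status.FAILED
    # Assumed rule: the parent succeeds only if every child succeeded.
    all_ok = all(s is Status.SUCCESS for s in child_states.values())
    overall = Status.SUCCESS if all_ok else Status.FAILED
    return overall, child_states


def refresh_ok():
    pass  # stand-in for a job that works


def refresh_broken():
    raise RuntimeError("extract failed")  # stand-in for a job that fails


overall, child_states = run_independent_sequence(
    {"ref_a": refresh_ok, "ref_b": refresh_broken, "ref_c": refresh_ok}
)
```

Everything runs on the calling thread, so the function cannot return before the last child has been tried, and no executor service or join step is needed.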