#11 Failure to handle Daylight Saving transition

closed-fixed
nobody
None
5
2011-10-07
2011-10-02
No

I set up a daily schedule to run a job at 2am each day, launched on Friday 30 September. On Monday 3rd October, when I checked the logs, it turned out the job failed on Sunday 2nd October after Daylight Saving Time became active at 2am.

The message was:

ERROR 02-Oct-11 01:00:16: Failed notifiying listener [org.oddjob.scheduling.TimerBase$RescheduleStateListener@bcc8f4] of event [JobStateEvent, source=PB_All_Daily_Attempt, COMPLETE]
java.lang.IllegalStateException: An interval can not have a to [Mon Oct 03 02:00:00 EST 2011] before the from [Mon Oct 03 03:00:00 EST 2011]

I think the listener here is a scheduling:retry container that was triggered from the
It seems that Oddjob did partly account for DST, because the log shows the job ran at 01:00 on the Sunday, which was 24 hours after the previous run. But for some reason one of the job listeners failed with the error above (see attached stack trace).

The job XML includes some custom jobs/beans that I can attach if they are of interest, but I don't think they are relevant:

* batch-logging service is a bean that has methods for writing a log record for each job attempt that is used by the ETL jobs.

* datasource is a bean that wraps a commons DBCP connection pooling data source so I can re-use connections over time

* batch is a StructuralJob subclass that accepts one subjob, and registers a "parent" job that it exposes on a property to link subjobs launched together.

* talend is a job that builds the command line to run a Java VM that executes a Talend ETL job and then uses ProcessBuilder to run it.

Discussion

  • William Rose

    William Rose - 2011-10-02

    Full log file of "Batch" job where error appears

     
  • William Rose

    William Rose - 2011-10-02

    Oddjob XML that was running

     
  • William Rose

    William Rose - 2011-10-02

    The missing words after "I think the listener here is a scheduling:retry container that was triggered from the" are "scheduling:daily container".

     
  • William Rose

    William Rose - 2011-10-02

    Also: having failed in this way, the job then didn't run at 01:00 or 02:00 on Monday 3rd of October.

     
  • William Rose

    William Rose - 2011-10-02

    To clear this error, I first tried a hard reset of the scheduling:daily job, then a reset of the top-level oddjob that calls PB_All.xml. Neither of these stopped the same message appearing on a re-run. Opening and closing Oddjob explorer also did not clear the issue. It seems that whatever was causing the issue was persisted as part of the job state, and I had to clear out the serialized state files before rerunning would work normally.

     
  • William Rose

    William Rose - 2011-10-02

    Same error occurs after hard reset of scheduling:daily and parent oddjob

     
  • William Rose

    William Rose - 2011-10-02

    Appearance of oddjob explorer after error

     
  • William Rose

    William Rose - 2011-10-02

    Normal appearance of oddjob explorer after clearing serialized state and re-launching

     
  • Rob Gordon

    Rob Gordon - 2011-10-03

    Hi - I can reproduce the probem. It's caused by Oddjob adding 1 day to a Calendar that is on the DST boundry and adding 1 day to a Calendar that is 1 millisecond after that. I need to dig deeper how the java Calendar works at these points.

    You needed to clear out the serialized state because without the skipMissedRuns property set, Oddjob will alway try and schedule from when it last ran and so continued to hit the bug.

     
  • Rob Gordon

    Rob Gordon - 2011-10-07

    I've committed a fix for this. It includes extensive unit tests so I'm reasonably confident there should be no more issues around the transition to DST and back.

    A recent change also includes Timers going to READY when stopped and hard reset resetting schedules, so deleting the persisted state is no longer necessary. I'm considering if other jobs should go to READY and not COMPLETE when stopped - including services. Any feedback would be welcome.

    Rob.

     
  • Rob Gordon

    Rob Gordon - 2011-10-07
    • status: open --> closed-fixed