Scheduling Tasks with TurboGears

status:Draft

TGScheduler is a scheduler based on Kronos by Irmen de Jong. It makes it easy to run one-time or recurring tasks as needed, and it works with both TurboGears 1 and 2.

You can schedule Python functions to be called at specific intervals or days. It uses the standard sched module for the actual task scheduling, but provides much more:

  • Repeated (at intervals, or on specific days) and one-time tasks.
  • Error handling (exceptions in your tasks don't kill the scheduler).
  • You can run the scheduler in its own thread or a separate process.
  • You can run a task in its own thread or a separate process.
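
For orientation, the standard sched module on its own only handles one-shot events: a recurring task has to re-enter itself, and an uncaught exception stops the run loop. The following stdlib-only sketch shows that boilerplate. It is not TGScheduler's code; `FakeClock` and `run_every` are hypothetical names, and the simulated clock is only there so the example finishes instantly.

```python
import sched

class FakeClock:
    """Simulated clock so the example runs without real waiting."""
    def __init__(self):
        self.now = 0.0
    def time(self):
        return self.now
    def sleep(self, seconds):
        self.now += seconds

def run_every(s, interval, action, remaining, log):
    """Run action, record (not raise) its errors, re-enter until done."""
    try:
        action()
    except Exception as exc:
        # A failing task must not stop the scheduler loop.
        log.append("error: %s" % exc)
    if remaining > 1:
        s.enter(interval, 1, run_every,
                (s, interval, action, remaining - 1, log))

clock = FakeClock()
s = sched.scheduler(clock.time, clock.sleep)
calls = []   # simulated times at which the task ran
log = []     # errors raised by the task

def flaky():
    calls.append(clock.time())
    if len(calls) == 2:
        raise ValueError("boom")

# Run flaky() three times, ten simulated seconds apart.
s.enter(0, 1, run_every, (s, 10, flaky, 3, log))
s.run()
```

After `s.run()` returns, `calls` holds the three simulated run times and `log` holds the single error from the second run, which did not prevent the third.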

To use the scheduler, you must start it. For example, add the following lines in a method that is run when starting your project:

from tgscheduler import scheduler
scheduler.start_scheduler()

A convenient place to do that is the <yourproject>/<yourpkg>/commands.py file.

Scheduling Jobs

There are five functions in the tgscheduler.scheduler module you can use to schedule jobs. They are called add_interval_task, add_weekday_task, add_monthday_task, add_single_task and add_cron_like_task.

All five functions return a Task object. If you hold on to that Task object, you can later cancel it by calling tgscheduler.scheduler.cancel() with that Task.

Here is an example that runs the function "do_something" every ten seconds:

from tgscheduler import scheduler

def do_something():
    print("Hello world.")

scheduler.add_interval_task(action=do_something, taskname='do_something',
    initialdelay=0, interval=10)

All five scheduling functions take the following arguments:

action
The callable that will be called at the time you request
args
Tuple of positional parameters to pass to the action
kw
Keyword arguments to pass to the action
taskname
Tasks can have a name (stored in task.name), which can help if you're trying to keep track of many tasks.
processmethod

By default, each task will be run in a new thread. You can also pass in tgscheduler.scheduler.method.sequential or tgscheduler.scheduler.method.forked. The default is tgscheduler.scheduler.method.threaded.

Sequential means that the task will run in the same thread as the scheduler, so tasks are executed one after another. This should only be used for quick tasks, because a slow task delays every task scheduled after it.

Forked means to fork a new process to run the job, which is sometimes more effective for intense jobs, particularly on multiprocessor machines (due to Python's architecture).

Warning

It is impossible to add new tasks to a ForkedScheduler after the scheduler has been started!

Here's an example of how to schedule the same function as above as a task using the sequential method:

from tgscheduler import scheduler

scheduler.add_interval_task(
    processmethod=scheduler.method.sequential,
    action=do_something,
    taskname='do_something',
    initialdelay=5,
    interval=60)
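
To make the args and kw parameters concrete, here is a small stdlib-only sketch of how a scheduler would typically hand them to the action. It assumes the action is invoked as `action(*args, **kw)`; `run_action` and `greet` are hypothetical names for illustration, not TGScheduler functions.

```python
def run_action(action, args=None, kw=None):
    """Call action with positional args first, then keyword arguments."""
    return action(*(args or ()), **(kw or {}))

def greet(name, punctuation="."):
    return "Hello %s%s" % (name, punctuation)

# Note the trailing comma: ("world",) is a one-element tuple,
# whereas ("world") would be just the string "world".
result = run_action(greet, args=("world",), kw={"punctuation": "!"})
```

The one-element tuple is the usual stumbling block when an action takes a single positional parameter.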

In addition to these common parameters, the five scheduling functions each offer additional options to determine when they run. Here are the five functions and their parameters for how often to run:

add_interval_task

Pass in initialdelay with a number of seconds to wait before running and an interval with the number of seconds between runs.

For example, an initialdelay of 600 and interval of 60 would mean "start running after 10 minutes and run every 1 minute after that".

add_weekday_task
Runs on certain days of the week. Pass in a list or tuple of weekdays from 1-7 (where 1 is Monday). Additionally, you need to pass in timeonday which is the time of day to run. timeonday should be a tuple with (hour, minute).
add_monthday_task
Runs on certain days of the month. Pass in a list or tuple of monthdays from 1-31, and also pass in timeonday which is an (hour, minute) tuple of the time of day to run the task.
add_single_task
Runs a task once. Pass in initialdelay with a number of seconds to wait before running.
add_cron_like_task

Pass in cron_str with a string representing the scheduling in a cron-like syntax.

For example, a task that needs to be executed every 15 minutes on working days / hours will be expressed as */15 8-12,14-18 * * 1-5

Note:

  • just like with cron, the week starts with 0 on Sunday

  • months and days of week can alternatively be specified by the first 3 letters of their name, case-insensitively. For example:

    • 3 is the same as WED
    • 1-5 is the same as mon-fri
  • months and days of week can be specified either by their number or the first 3 letters of their name, not both:

    • JAN-12 will raise an exception
  • the task is executed only when all the fields match the current time. This is an important difference from UNIX cron, where a task is executed when the minute, hour, and month fields match the current time and at least one of the two day fields (day of month or day of week) matches.
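
The "all fields must match" rule can be sketched in a few lines of plain Python. This is an illustration of the semantics described above, not TGScheduler's actual parser: `expand_field` and `matches` are hypothetical helpers, and the sketch handles numeric fields only (no names such as MON or JAN).

```python
import datetime

# Field order and value ranges of a five-field cron string.
FIELDS = [("minute", 0, 59), ("hour", 0, 23), ("day", 1, 31),
          ("month", 1, 12), ("weekday", 0, 6)]

def expand_field(spec, low, high):
    """Expand a numeric field such as '*/15' or '8-12,14-18' into a set."""
    values = set()
    for part in spec.split(","):
        rng, _, step = part.partition("/")
        step = int(step) if step else 1
        if rng == "*":
            start, end = low, high
        elif "-" in rng:
            start, end = (int(x) for x in rng.split("-"))
        else:
            start = end = int(rng)
        values.update(range(start, end + 1, step))
    return values

def matches(cron_str, when):
    """True only if every field matches `when` (AND semantics)."""
    current = {
        "minute": when.minute, "hour": when.hour, "day": when.day,
        "month": when.month,
        "weekday": when.isoweekday() % 7,  # cron numbering: 0 is Sunday
    }
    return all(current[name] in expand_field(spec, low, high)
               for spec, (name, low, high) in zip(cron_str.split(), FIELDS))

office_hours = "*/15 8-12,14-18 * * 1-5"
on_tuesday = matches(office_hours, datetime.datetime(2024, 1, 2, 9, 30))
on_saturday = matches(office_hours, datetime.datetime(2024, 1, 6, 9, 30))

# "0 0 13 * 5" fires only on a Friday the 13th under AND semantics.
friday_13th = matches("0 0 13 * 5", datetime.datetime(2024, 9, 13, 0, 0))
other_friday = matches("0 0 13 * 5", datetime.datetime(2024, 9, 20, 0, 0))
```

Here `on_tuesday` and `friday_13th` are true while `on_saturday` and `other_friday` are false. Under UNIX cron, the last expression would also fire on every other Friday and on every 13th of the month.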

Retrieving Jobs

As described above, a task can be canceled by calling tgscheduler.scheduler.cancel() with that Task object:

from tgscheduler import scheduler

def do_something():
    print("Hello world.")

task = scheduler.add_single_task(
        action=do_something,
        taskname='do_something',
        initialdelay=60)

scheduler.cancel(task)

However, it is not always possible to hold on to the task object so you can cancel it. If the Task object has a name, it can be retrieved later on:

from tgscheduler import scheduler

task = scheduler.get_task(taskname)

This assumes that the task was given a name when it was scheduled, which is normally not mandatory.

Retrieving tasks can be useful for modifying the scheduling of an already scheduled task:

from tgscheduler import scheduler

def reschedule_interval_task(taskname, action, interval):
    task = scheduler.get_task(taskname)
    scheduler.cancel(task)

    scheduler.add_interval_task(
        action=action,
        taskname=taskname,
        initialdelay=10,
        interval=interval)


# we schedule a task
scheduler.add_interval_task(
    action=do_something,
    taskname='do_something',
    initialdelay=10,
    interval=10)

# later in the code, we need to change the interval (user input, hot configuration,...)
reschedule_interval_task('do_something', do_something, 60)

Using Task Objects Directly

For more control you can create one of the following Task sub-classes:

IntervalTask ThreadedIntervalTask ForkedIntervalTask
SingleTask ThreadedSingleTask ForkedSingleTask
WeekdayTask ThreadedWeekdayTask ForkedWeekdayTask
MonthdayTask ThreadedMonthdayTask ForkedMonthdayTask
CronLikeTask ThreadedCronLikeTask ForkedCronLikeTask

All Task sub-classes support the following methods:

execute(self)
Execute the actual task.
reschedule(self, scheduler)
*SingleTask
Does nothing.
*IntervalTask
Reschedule this task according to its interval (in seconds).
*WeekdayTask
Reschedule this for tomorrow, for the given daytime.
*MonthdayTask
Not applicable (raises a NotImplementedError exception).
*CronLikeTask
Reschedule this for the next execution, based on its cron-like string.

You can then schedule a task using one of the following methods of a Scheduler instance. To get an instance, you can either call tgscheduler.scheduler._get_scheduler() or create your own instance of one of the following Scheduler classes:

  • Scheduler
  • ThreadedScheduler
  • ForkedScheduler
schedule_task(self, task, delay)
Add a new task to the scheduler with the given delay (in seconds).
schedule_task_abs(self, task, abstime)
Add a new task to the scheduler for the given absolute time value.
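
schedule_task expects a delay in seconds, whereas the convenience functions accept a timeonday tuple. The stdlib-only helper below sketches one way to turn an (hour, minute) pair into such a delay. `seconds_until` is a hypothetical name, not part of TGScheduler, and the sketch uses naive datetimes, so it ignores daylight saving changes.

```python
import datetime

def seconds_until(timeonday, now):
    """Seconds from `now` until the next occurrence of (hour, minute)."""
    hour, minute = timeonday
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        # That time has already passed today, so aim for tomorrow.
        target += datetime.timedelta(days=1)
    return (target - now).total_seconds()

now = datetime.datetime(2024, 1, 2, 22, 30)
to_midnight = seconds_until((0, 0), now)    # 90 minutes ahead
to_eleven = seconds_until((23, 0), now)     # 30 minutes ahead
```

The returned value could be passed as the delay of schedule_task or as the initialdelay of add_single_task.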

Usage Example

You might want to use the scheduler as a kind of mini-cron to execute tasks at regular intervals, or to schedule new tasks dynamically at runtime (e.g., for an RSS reader). Where you place the code for your tasks and their scheduling depends on your application. In any case, you will need to run a function to (re)schedule your tasks after application startup.

For example, you could use the scheduler to regularly run jobs ranging from reporting to updating the database with external data. For this purpose, you might create a jobs.py file in your application's package, containing one function per job and a schedule() function to schedule the tasks during startup. jobs.py basically reads as follows:

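# jobs.py
import datetime
from tgscheduler.scheduler import start_scheduler, add_weekday_task, add_interval_task

def generate_product_ranking():
    # do something useful here...
    pass

def synchronize_stock(since=None):
    # "from" is a reserved word in Python, so the parameter is named "since".
    # It receives a callable that returns the cutoff datetime for this run.
    # do something useful here...
    pass

def schedule():
    start_scheduler()

    # every day of the week (1 is Monday); timeonday must be an
    # (hour, minute) tuple, here midnight
    add_weekday_task(action=generate_product_ranking,
        weekdays=range(1,8), timeonday=(0, 0))

    # every two minutes; the lambda defers computing "two minutes ago"
    # until the task calls it
    add_interval_task(action=synchronize_stock,
        args=[lambda:datetime.datetime.now() -
            datetime.timedelta(minutes=2)],
        interval=120)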

Then add two lines to the start() function in commands.py, just before the TurboGears server is started, to run the schedule function at each startup (assuming your application's package is called yourpkg):

# commands.py

...

def start():

    ...

    # following two lines added
    from yourpkg import jobs
    turbogears.startup.call_on_startup.append(jobs.schedule)

    from yourpkg.controllers import Root
    turbogears.start_server(Root())