[Saturn-devel] proposal for needed functionality
From: Christian S. <Chr...@gm...> - 2005-04-15 05:57:17
Hello list,

sorry for jumping straight in with what follows. I have been considering a similar project for quite some time, but I would like to suggest a slightly different direction and a wider scope. What I need is not just a scheduler, but a scheduler for a cluster of machines: every machine runs its own scheduler, and all of them can be administered centrally. Besides that, high reliability would be a must for such a service.

I would say such a service should have roughly the following parts:

- Central configuration for the whole cluster of machines. Each machine in the cluster belongs to a certain type, and you specify the "crontab" per type of machine. This configuration should allow for parameters like ${hostname} or similar.
- The scheduler itself. It should not only schedule the start of tasks, but also let you set timeouts for which you can configure alternative tasks to run or error messages to be sent. E.g. if a task is triggered by the arrival of a file, you should be able to say that the file has to arrive every day by 22:00, and that otherwise this is an error condition.
- The tasks to execute. These tasks are "functionality" that takes arguments, like the time of execution or the name of the file which just arrived and triggered the execution of the task. It is necessary that these parameters are arguments to the task and that the task does not discover them at run time, because of the following point:
- A local and a remote entry point. You should be able to trigger every task by hand with arguments, either locally from the command line or remotely via a network protocol.
- Logging of info and error messages.
- A system that reacts to error events and autonomously starts "recovery" tasks.

Perhaps the system should even contain a reliable file transfer mechanism to feed the results of one task on one machine into another task on another machine in the cluster.
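To make the points about parameters, task arguments and timeouts a bit more concrete, here is a minimal Java sketch. It is only an illustration under my own assumptions: the names TaskSketch, Task, expand and checkDeadline are hypothetical and are not part of Saturn or any existing library.

```java
import java.time.LocalTime;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; none of these names come from Saturn.
final class TaskSketch {

    /** A task gets everything it needs as explicit arguments (no run-time discovery). */
    interface Task {
        String run(Map<String, String> args);
    }

    /** Replaces ${name} placeholders in a central configuration entry. */
    static String expand(String template, Map<String, String> params) {
        String result = template;
        for (Map.Entry<String, String> e : params.entrySet()) {
            result = result.replace("${" + e.getKey() + "}", e.getValue());
        }
        return result;
    }

    /**
     * Timeout rule: if the trigger (e.g. a file arrival) has been seen, all is fine.
     * If it has not been seen and the deadline has passed, the fallback task runs.
     */
    static String checkDeadline(LocalTime now, LocalTime deadline, boolean triggerSeen,
                                Task fallback, Map<String, String> args) {
        if (triggerSeen) {
            return "ok";
        }
        if (now.isBefore(deadline)) {
            return "waiting";
        }
        return fallback.run(args);
    }

    /** Local entry point: trigger the check by hand with key=value arguments. */
    public static void main(String[] argv) {
        Map<String, String> args = new LinkedHashMap<>();
        for (String a : argv) {
            int eq = a.indexOf('=');
            if (eq > 0) {
                args.put(a.substring(0, eq), a.substring(eq + 1));
            }
        }
        args.putIfAbsent("hostname", "localhost");

        Task alert = m -> "error: no file arrived on " + m.get("hostname");
        System.out.println(expand("/data/${hostname}/incoming", args));
        System.out.println(checkDeadline(LocalTime.now(), LocalTime.of(22, 0),
                                         false, alert, args));
    }
}
```

Because the task only sees the argument map, the same task could be started by the scheduler, from the command line (e.g. `java TaskSketch hostname=node1`), or by a remote call that fills the map.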
I personally would suggest using Java as the implementation basis for such a cron replacement, because then it will be easily portable. Besides that, I would suggest describing tasks as Ant tasks in XML files. There is already a huge base of predefined Ant tasks which are ideal for most system maintenance activities, and Ant has already proven to run on all kinds of different platforms. In addition, you should be able to write tasks directly in Java if they cannot be expressed as Ant tasks.

Java supports all kinds of remote network interfaces like RMI, CORBA, SOAP, Jini, ... which you could use to manually trigger tasks on any machine of the cluster. In this remote scenario you would have to provide some sort of authentication scheme so that you can restrict access to your defined tasks. The central configuration for the tasks, and possibly the logging, would come from or go to a database, and there are JDBC drivers for basically all major databases out there.

In your description of what Saturn should be it is written: "Job Scheduler for Networks. Control local and remote job scheduling through simple commands". I am not sure whether that means that you have a central scheduler which triggers commands on remote machines, or that you have schedulers on all machines of the cluster which can be administered from a central point. I guess the second, so that the scheduler can do its job even if no networking is available. Because of the possibility of network failures, the logging mechanism should allow for queuing messages, so that when the network comes back the messages can be sent to the central collection point for all logging messages.

If you could agree to most of the points above, then you have an additional developer for your project.

Thanks,
--
Christian Schuhegger
http://www.el-chef.de/
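P.S. A rough Java sketch of the log queuing idea mentioned above, under my own assumptions: QueuedLogSketch and Sink are hypothetical names, the sink stands in for whatever transport reaches the central collection point, and the queue is kept in memory only. A real implementation would probably have to persist the queue to disk to survive a restart.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch; not an existing Saturn class.
final class QueuedLogSketch {

    /** Transport to the central collection point; returns false if it is unreachable. */
    interface Sink {
        boolean send(String message);
    }

    private final Deque<String> pending = new ArrayDeque<>();
    private final Sink sink;

    QueuedLogSketch(Sink sink) {
        this.sink = sink;
    }

    /** Queue the message first, then try to deliver everything that is pending. */
    void log(String message) {
        pending.addLast(message);
        flush();
    }

    /** Sends queued messages in order; stops at the first failure and keeps the rest. */
    void flush() {
        while (!pending.isEmpty()) {
            if (!sink.send(pending.peekFirst())) {
                return; // network is down, try again later
            }
            pending.removeFirst();
        }
    }

    int pendingCount() {
        return pending.size();
    }
}
```

The local scheduler would call flush() periodically, so messages written while the network was down reach the central point in their original order once it is back.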