saturn-devel Mailing List for saturn Network Job Scheduler
Status: Beta
Brought to you by: mazzabr
From: Fabio A M. <fab...@gm...> - 2005-05-11 18:58:58
Jorge:

I'm trying to reorganize the .pm files to better fit the Event-Job model, which includes changing the database. I'm also starting to develop a GUI using Perl/GTK (I'm still learning, but I'll have something useful soon). The code already written will be reintegrated into the new file organization, and no code will be discarded, including yours; the reorganization was always planned to happen this way.

About MySQL support: I really hope to have MySQL support at some point. My only concern is the setup routines, which must be slightly different.

Until the reorganization is done, I think you can send patches to me and I'll apply them to CVS.

About the reorganization: I think you are already familiar with the Event-Job model. If not, please reply for a better explanation. The reorganization consists in separating Job management (SQL routines) and execution into a separate file, and the same happens with Event management and triggering, so each file holds more specific content. The database has also changed to support a better Event-Job relation: a single job can be triggered by several different events; other kinds of events are supported instead of only crontab-like ones; and time events are enhanced, including 'at' functionality and a limited number of executions per event.

Due to the increasing complexity of the database, I think the CLI is becoming too complicated, because we now need more and more options besides the existing ones. Of course we'll probably still have a CLI, mainly for trivial tasks like at-like commands. So a GUI is needed for basic administration. It will probably be developed using Perl/GTK: Perl, again, just to keep a single language in use; GTK version 2 because of the eye candy compared to Tk.

One more thing: I'm sending you a copy, and this message is also being sent to the mailing list: sat...@li...
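The many-to-many Event-Job relation described above (one job triggered by several events of different kinds, with an optional execution limit for 'at'-style events) could be sketched roughly like this. This is purely an illustration: the table and column names are invented, not Saturn's actual schema, and while Saturn itself is Perl, Python's bundled sqlite3 keeps the sketch self-contained.

```python
import sqlite3

# Hypothetical schema: one job may be triggered by several events, and
# each event has a kind (time, filesystem, system, job) plus an optional
# cap on how many times it may fire ('at'-like behaviour = max_runs 1).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE job   (job_id   INTEGER PRIMARY KEY, command TEXT NOT NULL);
CREATE TABLE event (event_id INTEGER PRIMARY KEY,
                    kind     TEXT NOT NULL,   -- 'time', 'filesystem', ...
                    spec     TEXT NOT NULL,   -- e.g. a cron spec or a path
                    max_runs INTEGER);        -- NULL = unlimited
CREATE TABLE event_job (event_id INTEGER REFERENCES event(event_id),
                        job_id   INTEGER REFERENCES job(job_id),
                        PRIMARY KEY (event_id, job_id));
""")

# One job triggered by two different kinds of events.
conn.execute("INSERT INTO job VALUES (1, '/usr/local/bin/rotate-logs')")
conn.execute("INSERT INTO event VALUES (1, 'time', '0 0 * * *', NULL)")
conn.execute("INSERT INTO event VALUES (2, 'filesystem', '/var/spool/in', 1)")
conn.executemany("INSERT INTO event_job VALUES (?, 1)", [(1,), (2,)])

# The scheduler's view: which events trigger which commands.
rows = conn.execute("""
    SELECT e.kind, j.command FROM event e
    JOIN event_job ej ON ej.event_id = e.event_id
    JOIN job j        ON j.job_id    = ej.job_id
""").fetchall()
```

The link table is what lets one job be reused by a time event and a filesystem event without duplicating the job row.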
Hope you understand that this is the right way for the developers to communicate.

Fabio.

Fabio A Mazzarino wrote:
>--------------------------- Original Message ----------------------------
>Subject: [Saturn] Patches
>From: "Jorge Valdes" <jo...@jo...>
>Date: Tue, May 10, 2005 1:09 am
>To: "Fabio A Mazzarino" <ma...@al...>
>--------------------------------------------------------------------------
>
> Fabio,
>
> How can I get patches/changes to you?
> Do you have CVS available?
>
> I have some time after work to code the project, and I already have the
> SMTP mailer stuff done.
>
> Let me know...
>
> Jorge.
From: Christian S. <Chr...@gm...> - 2005-04-18 06:08:01
hi fabio,

Fabio A Mazzarino wrote:
> I'm sorry for that; actually I'm not subscribed to either of the lists at
> SF.net. I shouldn't have believed that the list administrator would be a
> subscriber by default.

i thought so :) i made the same mistake.

>> - central configuration for the whole cluster of machines. the machines
>> in the cluster belong to a certain type and you specify the "crontab" of
>> types of machines. this configuration should allow for some parameters
>> like ${hostname} or similar.
>
> The basic idea is to make it possible to centralize the configuration of
> each machine in a network, server or workstation. I'd like to keep the
> 'option', and not the obligation, of configuring it centrally.

having the option between several choices is a good thing. but did you see the concept of having "types of machines"? e.g. in a cluster you would not want to configure every single machine; you would say: configure all web servers like this, configure all application servers like that ...

>> - the scheduler itself. but it should not only provide the scheduling
>> for starting tasks, but also let you set timeouts on which you can
>> configure alternative tasks to run or error messages to be sent. e.g. if
>> a task is triggered by the arrival of a file you should be able to say
>> that the file has to arrive every day until 22:00 o'clock, or this is an
>> error condition.
>
> Let's see some of the concepts involved in Saturn.
> . What you call tasks are called JOBs in Saturn.
> . JOBs are triggered by EVENTs, and these events are triggered not only
>   by time events, like in crontab, but also filesystem events, system
>   events, or even job-generating events.
>
> With this conception it's possible to determine a set of events to
> trigger a job.

in order to provide for timeouts, the "language" in which you define your trigger condition would have to be able to refer to "job A already ran today" or something similar.
otherwise you won't be able to define a job that runs when a timeout occurs. timeouts are really something special. you even have two sorts of timeouts: one is triggered if the job did not run by a given point in time, and one is triggered if the job runs too long (watchdog).

>> - the tasks to execute. these tasks are "functionality" that take
>> arguments like the time of execution or the filename of the file which
>> just arrived and triggered the execution of this task. it is necessary
>> that these parameters are arguments to the task and that you don't
>> discover them at run-time, because of the following point:
>
> KISS. By creating these special tasks/JOBs (I understood you want to
> create a special JOB) you'll add unnecessary complexity. Try to avoid
> this. Discovering the arguments at run-time makes it easier to run.
>
> I think I haven't gotten your concept. Let's talk more about this.

exactly because of KISS i would propose to give arguments to jobs :) the scheduler is only responsible for kicking off activity (jobs/tasks) or for killing activity (watchdog). then you have your jobs. i would like to see a job as a normal function like in C or Perl, and functions take arguments.

for example, we currently have several jobs in our system that work with the current date. but if for some reason a file transfer did not work for several days, you have to run the same functionality, but for a past date and not today. currently i am copying the script that performs the task and modifying the place where it gets the current date. that's a pain in the ass! it would be no problem at all if all of these scripts took the date they work on as an argument.

and besides that, discovery at runtime does not make a task less complex or easier; i would say quite the opposite:
- the scheduler already knows the time, because it triggered the execution of the job.
- at a file arrival event the scheduler already knows the filename of the file!

if you coded this information into the job, then your job would always be tied to the current date and/or an explicitly hardcoded filename, or you would have to introduce an additional link from all of your jobs back to your configuration database.

>> - a local and a remote entry point (you should be able to trigger every
>> task by hand with arguments, either locally from the command line or
>> remotely via a network protocol)
>
> It'll be possible to trigger events remotely; that's why it's called a
> network job scheduler. The EVENTxJOB concept allows us to easily
> configure user-generated events, whether through the CLI, GUI or a
> remote GUI.

yes, but modularity would be much better preserved if the scheduler were not in the way for executing jobs. e.g. if you wrote a little shell script that uses functionality already existing as a job in the scheduler, i would prefer to be able to call that job directly. besides that, if i called a job in that context i would expect it to give me the log/error messages on stderr and not only in its internally configured logging mechanism. as soon as you have a defined remote interface to your jobs (CORBA, RMI, RPC, SOAP, JINI, ...) it does not matter anymore whether you use a CLI or a GUI; in both cases you just call a remote procedure.

>> - logging of info and error messages and
>
> We plan not only to log error messages, but also running statistics, and
> in the future to use these execution statistics to preview job execution.

that's a very good point!

>> - a system that reacts on error events and autonomously starts
>> "recovery" tasks
>
> An error can be handled as an event, which can then run an error
> recovery job.

yes, that's true. i was not very explicit in my original mail, but actually i have something more like a "workflow" engine in mind that defines on a very high level how tasks are related to each other.
an error condition would relate two tasks with each other. concretely, something similar to: http://openemcee.org/

you are right that an error can be handled as an event. but how would you identify it in the condition that runs the recovery task? by name, e.g. "job-name-event-0x01"? if you define it in a "workflow"-like manner, it is immediately clear from which component an event flows to which recovery component.

>> perhaps the system should even contain a reliable file transfer
>> mechanism to feed the results of one task on one machine into another
>> task on another machine in the cluster.
>
> Reliable file transfer -> ftp + ssl + md5 (KISS)
>
> I'm not sure, but I'm thinking of doing this using the database.

ftp + ssl + md5 is probably a good solution, but i would like to have something that, when you put it on a given machine, is self-contained and will just work. i would not want to have to configure several other services first. besides that, on a unix file system you have a problem anyway with discovering that a file was created, modified or deleted (dnotify/FAM are just add-ons). therefore if you integrate this functionality into the overall product you don't have that problem, because the file reception side will just feed an event into the scheduler on file arrival.

one more thing about file transfers: because i am mostly interested in a solution for a cluster of machines, you would also have to provide a solution not only for how to update the crontab, but also for how to upgrade existing jobs or install new jobs. we are not yet at the point of discussing "java", but ideally i would like to have a job = "a jar archive" mapping so that the file transfer mechanism can be used to upgrade the jobs as well.

>> i personally would suggest to use java as the implementation basis for
>> such a cron replacement, because then it will be easily portable.
>
> <zealot chat>
> I'm a FreeSoftware zealot, and I won't use Java.
> :o)
> </zealot chat>

what makes you think FreeSoftware != Java? have a look at: http://www.apache.org/

> Take a look at CPAN for more information about Perl portability.
> http://www.cpan.org/

i know perl and i write scripts in it. but for anything bigger i want something that supports proper data structures ;) no, let's not go in that direction. ok, perl is also portable, has a huge software base in CPAN, has a database interface via DBI, ... i prefer java over perl because:

- java supports better and easier deep data structures
- java has immediate support for most network protocols (CORBA, RMI, RPC, SOAP, JINI, ...)
- you have the eclipse development environment
- you can properly debug java with a visual debugger
- you have stack traces in your logfiles that let you immediately identify the source location of a problem
- you have exception handling
- if you want a GUI, you immediately have a portable GUI via Swing or SWT
- you can integrate python, scheme or lisp into your java programs via jython, sisc or abcl
- you have with hsql a pure-java embedded database solution
- you have two good object-relational mapping tools: castor and hibernate
- you have several workflow management engines to choose from (http://java-source.net/open-source/workflow-engines)
- ...

>> besides that i would suggest to describe tasks as ant tasks in xml
>> files. there is already a huge base of predefined ant tasks which are
>> ideal for most system maintenance activities, and besides that ant has
>> already proven to run on all kinds of different platforms. in addition
>> you should be able to write tasks directly in java if they cannot be
>> expressed as ant tasks.
>
> <zealot chat>
> Sorry, for me XML is vapourware
> </zealot chat>

to that extent you are right. xml is nothing more than scheme/lisp s-expressions.
but nevertheless, for any language other than scheme/lisp you don't have the advantage that the language already comes with a configurable reader to parse configuration files. i see the big advantage of xml in that i don't have to write a parser for it in any language. all languages by now have libraries that let you read xml files and validate them.

> I can't understand why XML would be interesting as long as we already use
> a database. I don't see advantages in storing XML in a database, except
> when strictly needed.

a database is not everything. a database is good for storing the configuration data that corresponds to cron's crontab. but defining the jobs is a bit more complex. the best would be a real scripting language, e.g. SISC (http://sisc.sourceforge.net/), that could then call into the ant library for a lot of its functionality. but how many people out there know scheme? in our team nobody does. but all of them know ant and its xml configuration format. besides that, i would like to have the job = "a jar file" correspondence so that you always know what you have to upgrade when you want to upgrade a job, and you don't have to track dependencies between components.

>> in your description of what saturn should be it is written "Job
>> Scheduler for Networks. Control local and remote job scheduling through
>> simple commands". i am not sure if that means that you have a central
>> scheduler which then triggers commands on remote machines, or that you
>> have schedulers on all machines of the cluster which can be
>> administrated from a central point? i guess version 2, so that even if
>> you don't have networking available the scheduler can do its job.
>
> About the goal:
> . There are a number of products that call themselves schedulers, some as
>   simple as crontab, and others involving many features, like Control-M.
>   Saturn aims at Control-M; we hope to get there someday.
>
> About remote triggering:
> That can be dangerous, don't you think? And it can be very easy to
> generate unnecessary load on a single machine (imagine a network with
> 2500 Saturn instances).

obviously remote triggering of tasks should be protected by an authentication mechanism. and yes, you have to know what you are doing :)

>> because of the possibility of network failures the logging mechanism
>> should allow for queuing messages, so that when the network comes back
>> the messages can be sent to the central collection point of all logging
>> messages.
>
> Interesting point. I'll think more about it. As long as all the data is
> stored in a possibly remote database, I'll think about a future solution
> to this.

perhaps this is a good point to mention the installation of new jobs, or the upgrade of existing jobs, again. i would like to have both push and pull options, e.g. either the central configuration host tells the other instances to upgrade a given job, or they are configured to regularly check for upgrades.

> Well, most of the functionality you requested will be covered in future
> releases using the current concept. But as I can see you want to develop
> the project using Java, which I disagree with at the moment.

i explained this several times to a good friend of mine who is a strong proponent of perl: java is not about the language (there are a lot of languages out there which are much, much better, but perl is not among them ;) ); it is about the HUGE functionality that you immediately inherit via its standardized libraries and interfaces.

i am already looking forward to seeing the project grow, and i would like to continue the discussion at that point. obviously you have to be happy with the implementation choices in your project, otherwise it won't be fun.

> If you get interested I'll be very proud to have you as a developer.

i am already interested :) and i will continue to watch your project,

--
Christian Schuhegger
http://www.el-chef.de
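Christian's central argument in this thread, that a job should be an ordinary function taking its trigger context (the date, the arrived filename) as explicit arguments so the same code can be re-run for a past date, can be sketched in a few lines. The job name and its parameters are invented for illustration; nothing here is Saturn code, and Python stands in for whatever language the jobs would really be written in.

```python
from datetime import date

def transfer_report(for_date: date, filename: str) -> str:
    """Hypothetical job: process one day's report file.

    The scheduler passes in the date and filename it already knows
    from the triggering event, instead of the job rediscovering the
    'current' date at run-time."""
    return f"processed {filename} for {for_date.isoformat()}"

# Normal run: the scheduler supplies today's date and the arrived file.
today_run = transfer_report(date.today(), "report.csv")

# Recovery run: the same code, re-invoked for a past date -- no script
# copying and no editing of a hardcoded current-date lookup needed.
backfill = transfer_report(date(2005, 5, 9), "report-20050509.csv")
```

Because the date is a parameter rather than something the job discovers itself, the backfill case is just another call, which is exactly the point being made against run-time discovery.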
From: Christian S. <Chr...@gm...> - 2005-04-15 05:57:17
hello list,

sorry for the head-start that follows. i have been considering a similar project for quite some time, but i would like to suggest a slightly different direction and a wider scope. i would need a scheduler not just for one machine, but for a cluster of machines, where on every machine there is such a scheduler and all of them can be centrally administrated. besides that, high reliability would be a must for such a service. i would say such a service should have roughly the following parts:

- central configuration for the whole cluster of machines. the machines in the cluster belong to a certain type, and you specify the "crontab" of types of machines. this configuration should allow for some parameters like ${hostname} or similar.

- the scheduler itself. but it should not only provide the scheduling for starting tasks, but also let you set timeouts on which you can configure alternative tasks to run or error messages to be sent. e.g. if a task is triggered by the arrival of a file, you should be able to say that the file has to arrive every day until 22:00 o'clock, or this is an error condition.

- the tasks to execute. these tasks are "functionality" that take arguments like the time of execution or the filename of the file which just arrived and triggered the execution of this task. it is necessary that these parameters are arguments to the task and that you don't discover them at run-time, because of the following point:

- a local and a remote entry point (you should be able to trigger every task by hand with arguments, either locally from the command line or remotely via a network protocol)

- logging of info and error messages

- a system that reacts on error events and autonomously starts "recovery" tasks

perhaps the system should even contain a reliable file transfer mechanism to feed the results of one task on one machine into another task on another machine in the cluster.
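The "${hostname} or similar" parameters in the first point above could be expanded with something as simple as template substitution. This is only a sketch of the idea: the per-type "crontab" entries and the `render_crontab` helper are made up for illustration, and Python's `string.Template` stands in for whatever mechanism a real implementation would use.

```python
from string import Template

# Hypothetical central config: one "crontab" per machine type, with
# placeholders filled in per machine when the config is distributed.
type_crontabs = {
    "web-server": "0 3 * * * /usr/local/bin/rotate-logs --host=${hostname}",
    "app-server": "*/5 * * * * /usr/local/bin/health-check ${hostname}",
}

def render_crontab(machine_type: str, hostname: str) -> str:
    # substitute() raises KeyError for a placeholder with no value,
    # which is what you want for a typo in a centrally managed template.
    return Template(type_crontabs[machine_type]).substitute(hostname=hostname)

line = render_crontab("web-server", "web01.example.org")
```

Configuring "all web servers like this" then means editing one template instead of touching every machine.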
i personally would suggest using java as the implementation basis for such a cron replacement, because then it will be easily portable. besides that, i would suggest describing tasks as ant tasks in xml files. there is already a huge base of predefined ant tasks which are ideal for most system maintenance activities, and besides that, ant has already proven to run on all kinds of different platforms. in addition you should be able to write tasks directly in java if they cannot be expressed as ant tasks.

java supports all kinds of remote network interfaces like rmi, corba, soap, jini, ... which you could use to manually trigger tasks on any machine of the cluster. in this remote scenario you would have to provide some sort of authentication scheme so that you can restrict access to your defined tasks. the central configuration for the tasks, and possibly the logging, would come from or go to a database, and there are jdbc drivers for basically all major databases out there.

in your description of what saturn should be it is written "Job Scheduler for Networks. Control local and remote job scheduling through simple commands". i am not sure if that means that you have a central scheduler which then triggers commands on remote machines, or that you have schedulers on all machines of the cluster which can be administrated from a central point? i guess version 2, so that even if you don't have networking available the scheduler can do its job.

because of the possibility of network failures, the logging mechanism should allow for queuing messages, so that when the network comes back the messages can be sent to the central collection point of all logging messages.

if you can agree to most of the points above, then you have an additional developer for your project.

thanks,
--
Christian Schuhegger
http://www.el-chef.de/
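The store-and-forward logging asked for in the last message, buffer records locally while the central collector is unreachable and flush them in order once it is back, could look roughly like this. The `QueuedLogger` class and the `send` callback (assumed to raise `OSError` on network failure) are assumptions for the sketch, not anything from Saturn.

```python
from collections import deque

class QueuedLogger:
    """Buffers log records while the central collector is down and
    flushes them, oldest first, once it is reachable again."""

    def __init__(self, send):
        self.send = send          # callable that raises OSError on failure
        self.queue = deque()

    def log(self, message: str) -> None:
        self.queue.append(message)
        self.flush()

    def flush(self) -> None:
        while self.queue:
            try:
                self.send(self.queue[0])
            except OSError:
                return            # network still down; keep messages queued
            self.queue.popleft()  # drop a message only once it was sent

# Simulated collector that is down for the first two send attempts.
delivered, failures = [], [2]
def send(msg):
    if failures[0]:
        failures[0] -= 1
        raise OSError("network unreachable")
    delivered.append(msg)

log = QueuedLogger(send)
log.log("job A started")   # send fails, message stays queued
log.log("job A finished")  # fails again, both messages stay queued
log.flush()                # network back: both delivered, in order
```

Sending the head of the queue before removing it is what preserves ordering and avoids losing a record if the network drops mid-flush.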