[Saturn-devel] Re: i sent a mail to the saturn mailing list
From: Christian S. <Chr...@gm...> - 2005-04-18 06:08:01
hi fabio,

Fabio A Mazzarino wrote:
> I'm sorry for that, actually I'm not subscribed to either of the lists
> at SF.net. I shouldn't have believed that the list administrator would
> be a subscriber by default.

i thought so :) i made the same mistake.

>> - central configuration for the whole cluster of machines. the
>> machines in the cluster belong to a certain type and you specify the
>> "crontab" of types of machines. this configuration should allow for
>> some parameters like ${hostname} or similar.
>
> The basic idea is to make it possible to centralize the configuration
> of each machine in a network, server or workstation. I'd like to keep
> the 'option', and not the obligation, of configuring it centrally.

having the option between several choices is a good thing. but did you
see the concept of having "types of machines"? e.g. in a cluster you
would not want to configure every single machine; you would say configure
all web servers like this, configure all application servers like that ...

>> - the scheduler itself. but it should not only provide the scheduling
>> for starting tasks, but also let you set timeouts for which you can
>> configure alternative tasks to run or error messages to be sent. e.g.
>> if a task is triggered by the arrival of a file you should be able to
>> say that the file has to arrive every day until 22:00 o'clock or this
>> is an error condition.
>
> Let's see some of the concepts involved in Saturn.
> . What you call tasks are called JOBs in Saturn.
> . JOBs are triggered by EVENTs, and these events are triggered not only
> by time events, like in crontab, but also by filesystem events, system
> events, or even job-generating events.
>
> With this conception it's possible to determine a set of events to
> trigger a job.

in order to provide for timeouts you would have to allow the "language"
in which you define your trigger condition to refer to "job A did run
today already" or something similar. otherwise you won't be able to
define a job that runs if a timeout occurs. timeouts are really something
special. you even have two sorts of timeouts: one is triggered if the job
did not run until a given point in time, and one is triggered if the job
runs too long (watchdog).

>> - the tasks to execute. these tasks are "functionality" that take
>> arguments like the time of execution or the filename of the file which
>> just arrived and triggered the execution of this task. it is necessary
>> that these parameters are arguments to the task and that you don't
>> discover them at run-time because of the following point:
>
> KISS. By creating this special task/JOB (I understood you want to
> create a special JOB) you'll add unnecessary complexity. Try to avoid
> this. Discovering the arguments at run-time makes it easier to run.
>
> I think that I haven't gotten your concept. Let's talk more about this.

exactly because of KISS i would propose to give arguments to jobs :) the
scheduler is only responsible for kicking off activity (jobs/tasks) or
for killing activity (watchdog). then you have your jobs. i would like to
see a job as a normal function like in C or Perl, and functions take
arguments.

for example, currently we have in our system several jobs that work with
the current date. but if for some reason a file transfer did not work for
several days, you will have to run the same functionality, but for a past
date and not for today. currently i am copying the script that performs
the task and modifying the place where it gets the current date. that's a
pain in the ass! it would be no problem at all if all of these scripts
took the date on which they work as an argument.

and besides that, discovery at runtime does not make a task less complex
or easier; i would say quite the opposite.
- the scheduler already knows the time, because it triggered the
  execution of the job.
- at a file arrival event the scheduler already knows the filename of the
  file!
if you code this information into the job, then your job will always be
tied to the current date and/or an explicitly hardcoded filename, or you
will have to introduce an additional link to your configuration database
from all of your jobs.
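just to make concrete what i mean by "a job is a normal function that
takes arguments", here is a very rough java sketch. all of the names
(Job, businessDate, the "filename" parameter, ArchiveTransferredFile) are
invented for illustration; nothing of this exists in saturn:

    import java.util.Date;
    import java.util.Map;

    // a job is an ordinary function-like object; the scheduler hands it
    // its parameters instead of the job discovering them at run-time
    interface Job {
        void run(Date businessDate, Map parameters) throws Exception;
    }

    // example job: archive the file whose arrival triggered the event
    class ArchiveTransferredFile implements Job {
        public void run(Date businessDate, Map parameters) throws Exception {
            String filename = (String) parameters.get("filename");
            System.out.println("archiving " + filename
                               + " for business date " + businessDate);
        }
    }

re-running such a job for a past date is then just a matter of the
scheduler (or a human on the command line) passing a different date.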
>> - a local and a remote entry point (you should be able to trigger
>> every task by hand with arguments either locally from the command line
>> or remotely via a network protocol)
>
> It'll be possible to trigger events remotely, that's why it's called a
> network job scheduler. The EVENTxJOB concept allows us to easily
> configure user-generated events, even through a CLI, GUI or remote GUI.

yes, but modularity would be a lot better preserved if the scheduler were
not in the way for executing jobs. e.g. if you would write a little shell
script that uses functionality already existing as a job in the
scheduler, i would prefer to be able to call it directly. besides that,
if i called a job in that context i would expect it to give me the
log/error messages on stderr and not only in its internally configured
logging mechanism.

as soon as you have a defined remote interface to your jobs (CORBA, RMI,
RPC, SOAP, JINI, ...) it does not matter anymore if you use a CLI or a
GUI. in both cases you just call a remote procedure.

>> - logging of info and error messages and
>
> We plan not only to log error messages, but also running statistics,
> and in the future to use these execution statistics to preview job
> execution.

that's a very good point!

>> - a system that reacts on error events and autonomously starts
>> "recovery" tasks
>
> An error can be handled as an event. And then run an error recovery
> job.

yes, that's true. i was not very explicit in my original mail, but
actually i have something more like a "workflow" engine in mind that
defines on a very high level how tasks are related to each other. an
error condition would relate two tasks with each other. concretely,
something similar to: http://openemcee.org/

you are right that an error can be handled as an event. but how would you
identify it in the condition that runs the recovery task? by name, e.g.
"job-name-event-0x01"? if you define it in a "workflow" like manner it is
immediately clear from which component an event flows to which recovery
component.
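to illustrate the difference, a rough java sketch of what such a
workflow-style definition could look like. the Workflow class, the
onSuccess/onError methods and the job names are purely hypothetical:

    // hypothetical workflow-style wiring: the recovery job is attached
    // directly to the job it recovers, so you never have to match an
    // event name like "job-name-event-0x01" elsewhere in the
    // configuration
    class Workflow {
        void onSuccess(String job, String next)   { /* normal flow */ }
        void onError(String job, String recovery) { /* error flow  */ }
    }

    class NightlyLoadDefinition {
        Workflow define() {
            Workflow w = new Workflow();
            w.onSuccess("load-customer-file", "generate-report");
            w.onError("load-customer-file", "request-file-resend");
            return w;
        }
    }

reading the definition you immediately see that "request-file-resend" is
the recovery component for "load-customer-file".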
>> perhaps the system should even contain a reliable file transfer
>> mechanism to feed the results of one task on one machine into another
>> task on another machine in the cluster.
>
> Reliable file transfer -> ftp + ssl + md5 (KISS)
>
> I'm not that sure, but I think to do this using the database.

ftp + ssl + md5 is probably a good solution, but i would like to have
something that, when you put it on a given machine, is self-contained and
will just work. i would not want to configure several other services
first. besides that, on a unix file system you anyway have a problem with
discovering that a file was created, modified or deleted (dnotify/FAM are
just add-ons). therefore, if you integrate this functionality in the
overall product you don't have that problem, because the file reception
side will just feed an event into the scheduler on file arrival.

one more thing with file transfers: because i am mostly interested in a
solution for a cluster of machines, you would also have to provide a
solution not only for how to update the "crontab", but also for how to
upgrade existing jobs or install new jobs. we are not yet at the point of
discussing "java", but ideally i would like to have a job = "a jar
archive" mapping so that the file transfer mechanism can be used to
upgrade the jobs as well.

>> i personally would suggest to use java as the implementation basis for
>> such a cron replacement, because then it will be easily portable.
>
> <zealot chat>
> I'm a FreeSoftware zealot, and I won't use Java. :o)
> </zealot chat>

what makes you think FreeSoftware != Java? have a look at:
http://www.apache.org/

> Take a look at CPAN for more information about Perl portability.
> http://www.cpan.org/

i know perl and i write scripts in it. but for anything bigger i want
something that supports proper data structures ;) no, let's not go in
that direction. ok, perl is also portable, has a huge software base in
cpan, has a database interface via DBI, ... i prefer java over perl
because:
- java supports better and easier deep data structures
- java has immediate support for most network protocols (CORBA, RMI,
  RPC, SOAP, JINI, ...)
- you have the eclipse development environment
- you can properly debug java with a visual debugger
- you have stack traces in your logfiles that allow you to immediately
  identify the source location of the problem
- you have exception handling
- if you want a GUI you immediately have a portable GUI via Swing or SWT
- you can integrate python, scheme or lisp into your java programs via
  jython, sisc or abcl
- you have with hsql a pure java integrated database solution
- you have two good object-relational mapping tools: castor and hibernate
- you have several workflow management engines at your choice
  (http://java-source.net/open-source/workflow-engines)
- ...

>> besides that i would suggest to describe tasks as ant tasks in xml
>> files. there is already a huge base of predefined ant tasks which are
>> ideal for most system maintenance activities, and besides that ant has
>> already proven to run on all kinds of different platforms. in addition
>> you should be able to write tasks directly in java if they cannot be
>> expressed as ant tasks.
>
> <zealot chat>
> Sorry, for me XML is vapourware
> </zealot chat>

to that extent you are right. xml is nothing more than scheme/lisp
s-expressions. but nevertheless, for any other language than scheme/lisp
you don't have the advantage that the language already comes along with a
configurable reader to parse configuration files. i see the big advantage
of xml in that i don't have to write a parser for it in any language. all
languages have in the meantime libraries that allow you to read xml files
and to validate them.

> I can't understand why XML would be interesting as long as we already
> use a database. I don't see advantages in storing XML into a database,
> besides when strictly needed.

a database is not everything. a database is good for storing the
configuration data that corresponds to the crontab of cron. but defining
the jobs is a bit more complex. the best would be a real scripting
language, e.g. SISC (http://sisc.sourceforge.net/), that could then call
into the ant library for a lot of its functionality. but how many people
out there know scheme? in our team nobody does. but all of them know ant
and its xml configuration format. besides that, i would like to have the
job = "a jar file" correspondence so that you always know what you have
to upgrade when you want to upgrade a job and you don't have to track
dependencies between components.
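to give an idea of what a job described as an ant build file could look
like, here is a small sketch. the project layout, the ${run.date} and
${trigger.file} properties and the directory names are all made up; the
idea is only that the scheduler fills in these properties when it starts
the job:

    <?xml version="1.0"?>
    <!-- hypothetical job description as an ant build file -->
    <project name="archive-transferred-file" default="run">
      <target name="run">
        <!-- ${run.date} and ${trigger.file} would be passed in by the
             scheduler -->
        <mkdir dir="/var/archive/${run.date}"/>
        <copy file="${trigger.file}" todir="/var/archive/${run.date}"/>
        <checksum file="${trigger.file}" algorithm="MD5"/>
      </target>
    </project>

mkdir, copy and checksum are standard ant tasks, so a job like this would
run unchanged on any platform that runs ant.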
>> in your description of what saturn should be it is written "Job
>> Scheduler for Networks. Control local and remote job scheduling
>> through simple commands". i am not sure if that means that you have a
>> central scheduler which then triggers commands on remote machines, or
>> that you have schedulers on all machines of the cluster which can be
>> administrated from a central point? i guess version 2, so that even if
>> you don't have networking available the scheduler can do its job.
>
> About the goal.
> . There are a number of products that call themselves schedulers, some
> as simple as crontab, and others involving as many features as
> ControlM. Saturn aims at ControlM, we hope to get there someday.
>
> About remote triggering.
> That can be dangerous, don't you think? And it can be very easy to
> generate unnecessary overload on a single machine (imagine a network
> with 2500 Saturn instances).

obviously remote triggering of tasks should be protected by an
authentication mechanism. and yes, you have to know what you are doing :)

>> because of the possibility of network failures the logging mechanism
>> should allow for queuing messages so that when the network comes back
>> the messages can be sent to the central collection point of all
>> logging messages.
>
> Interesting point. I'll think more about it. As long as all the data is
> stored in a possibly remote database, I'll think about a future
> solution to this.

perhaps this is a good point to mention the installation of new jobs or
the upgrade of existing jobs again. i would like to have both push and
pull options, e.g. either the central configuration host tells the other
instances to upgrade a given job, or they are configured to regularly
check for upgrades.

> Well, most of the functionality you requested will be covered in future
> releases using the current concept. But as I can see you want to
> develop the project using Java, which I disagree with at the moment.

i explained this several times to a good friend of mine who is a strong
proponent of perl: java is not about the language (there are a lot of
languages out there which are much much better, but perl is not among
them ;) ), it is about the HUGE functionality that you immediately
inherit via its standardized libraries and interfaces. i am already
looking forward to seeing the project grow and i would like to continue
the discussion at that point. obviously you have to be happy with the
implementation choices in your project, otherwise it won't be fun.

> If you get interested I'll be very proud to have you as a developer.

i am already interested :) and i will continue to watch your project,

--
Christian Schuhegger
http://www.el-chef.de