Re: [Queue-developers] new design details
From: wernerkrebs <wer...@ya...> - 2005-05-10 15:56:20
Two comments.

1. Regarding the protocol: GQ's protocols largely predate modern XML-based RPC standards such as SOAP. These are most suited to easy implementation in Java or C#/Mono, where there are excellent development environments (e.g., Eclipse, JBuilder, Visual Studio .NET, etc.) that will almost write the code for you automagically. Of these, Eclipse and Mono are Open Source, so suitable for consideration here. SOAP does, however, fully support C, C++, and even Perl, although development there is slightly more difficult IMHO. I would think some of the current features of the GQ TCP/IP protocol, for example aspects of the initial authentication and the querying of load information, would be best done using some sort of SOAP implementation.

Also, since GQ was written, standard protocols for this type of thing have emerged. Look at the Apst/Apstd system at SDSC (where, ironically, I used to work, although not on that project): http://grail.sdsc.edu/projects/apst/

Apst is a meta-daemon for cluster daemons. It doesn't currently support starting jobs using GQ, but it does support starting other (commercial) systems. GQ support would be fairly trivial for them to add, if they wanted to. SDSC (part of UCSD) receives grant money from a firm that makes a GQ-like commercial product, so it's not clear whether that's a direction they want to go in; they do support the commercial product. However, the source code is available, so the community is free to add support for GQ as well.

Apst will query each cluster manager (this would be similar to the qm program you are proposing) and obtain load information via an XML file returned from the cluster manager. It then decides how many jobs to start on that particular cluster, submitting them with a crude ssh command-line protocol (using scp to first transfer the relevant files into place). It's up to the cluster manager to then distribute the jobs to the cluster nodes. Apst, which is C/C++ based (Apstd is available in Java), is similar to Nimrod, which is Java-based. Source code for all of these is available.

2. Regarding qm: a division of Texas Instruments actually contributed an SQL-based qm in C++. (It would require an SQL database, preferably Open Source and free such as PostgreSQL, to be running on a server.) This is part of the GQ distribution, but it is optional and not compiled by default (due to C++ autoconf problems at the time, since resolved; also, users wrote to me explaining their preference for a small, simple package with peer-to-peer behavior, rather than a centralized package with a manager that might crash, so the original behavior of GQ remained the default). Before writing a manager from scratch, you might want to look at the manager code and documentation that TI's subsidiary contributed.

--
Werner G. Krebs, Ph.D.
Technical Specialist

Personal website: http://www.wernergkrebs.com

--- Koni <mh...@co...> wrote:

> As promised before, here are some more details about the new system I mentioned before.
>
> Werner: This is more or less identical to what I sent you previously.
>
> I envision 4 separate programs working together in this system:
>
> qs: Users use this program (like "queue" or "qsh" in GNU Queue) to submit jobs. [ Presently not implemented at all ]
>
> qm: the queue manager, running on some central host. qs sends job requests to qm. [ ~60% implemented ]
>
> qd: daemon running on slave or "compute" nodes, possibly on the same host as qm as well. More than one qd may run on any host, and there may be any number of them on any number of hosts. Only qm talks to qd's, sending jobs as available. The distribution protocol works as an offer/volunteer system: qm sends offers to multiple qd's at once for the same job, and willing qd's respond with a volunteer. qm assigns the job to exactly one qd. The qd may refuse at this point too (resetting the job to the offer stage), or commit and receive transfer of the job and begin execution. The important point is that qd's decide autonomously whether they can spare resources for the job. qm keeps some state information about the availability of the qd's it knows about and does not send offers to qd's it knows are fully committed, but qm does not need an accurate picture; it is the qd's decision. [ ~70% implemented ]
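Aside: to make the offer/volunteer handshake above concrete, here is a rough sketch of how qm-side handling of qd replies might look. The message types, state names, and everything else here are invented for illustration; none of it is taken from Koni's actual code.

/* offer_volunteer.c -- illustrative sketch of the qm-side offer/volunteer
 * state machine described above.  All names are invented for this example. */
#include <stdio.h>

/* Message types exchanged between qm and qd over UDP. */
enum msg_type {
    MSG_OFFER,      /* qm -> qd: "job N is available, can you take it?"   */
    MSG_VOLUNTEER,  /* qd -> qm: "I can spare the resources, assign it"   */
    MSG_ASSIGN,     /* qm -> qd: "the job is yours, prepare for transfer" */
    MSG_REFUSE,     /* qd -> qm: "changed my mind, put it back on offer"  */
    MSG_COMMIT      /* qd -> qm: "job received, execution started"        */
};

/* Per-job state kept by qm. */
enum job_state { JOB_OFFERED, JOB_ASSIGNED, JOB_RUNNING };

struct job {
    int            id;
    enum job_state state;
    int            assigned_qd;  /* id of the qd the job was assigned to */
};

/* Called by qm's UDP receive loop for each signed, verified qd message. */
static void handle_qd_message(struct job *job, enum msg_type type, int qd_id)
{
    switch (type) {
    case MSG_VOLUNTEER:
        /* First volunteer wins; later volunteers for the same job are ignored. */
        if (job->state == JOB_OFFERED) {
            job->state = JOB_ASSIGNED;
            job->assigned_qd = qd_id;
            printf("job %d: assign to qd %d (send MSG_ASSIGN)\n", job->id, qd_id);
        }
        break;
    case MSG_REFUSE:
        /* qd backed out after assignment: reset the job to the offer stage. */
        if (job->state == JOB_ASSIGNED && job->assigned_qd == qd_id) {
            job->state = JOB_OFFERED;
            printf("job %d: back to offer stage, re-offer to other qd's\n", job->id);
        }
        break;
    case MSG_COMMIT:
        if (job->state == JOB_ASSIGNED && job->assigned_qd == qd_id) {
            job->state = JOB_RUNNING;
            printf("job %d: running on qd %d\n", job->id, qd_id);
        }
        break;
    default:
        break;  /* MSG_OFFER and MSG_ASSIGN are sent by qm, never received by it */
    }
}

int main(void)
{
    struct job j = { .id = 1, .state = JOB_OFFERED, .assigned_qd = -1 };
    handle_qd_message(&j, MSG_VOLUNTEER, 7);  /* qd 7 volunteers       */
    handle_qd_message(&j, MSG_COMMIT, 7);     /* qd 7 commits and runs */
    return 0;
}

Because the transport is stateless UDP, the real qm would presumably also need per-offer retransmit timers and duplicate suppression; those are omitted here.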
> qe: Execution agent forked and exec'd by the qd process for running a job. qe is responsible for setting up the environment, calling back to the waiting qs if foreground mode is selected (called interactive mode in GNU Queue, I think), validating and changing to the user of the job, monitoring for the termination of the job and its return code, etc. qe is the only part of this system that needs to be setuid root. qd and qm may need to start as root to read the system-wide key file (see below) but can drop privilege permanently after that. [ currently a trivial program which just returns immediately is implemented ]
>
> Some design goals/choices:
>
> NFS is not used for communication and distribution of the jobs. This was a primary goal in the design for me. After getting into it, I have new appreciation for the design of GNU Queue though. :)
>
> Stateless UDP is used for communication between qm and qd, which results in some complexity in the code due to the possibility of lost messages. This is a design goal, as persistent TCP connections consume file descriptors, limiting the number of qd's that can be connected to qm. I would like this to scale well beyond typical limits for open file descriptors.
>
> All messages between qd and qm are cryptographically signed [ this is already fully implemented ] using keyed SHA-1. On connection, a registration protocol verifies the authenticity of both qm and qd by proving knowledge of a system-wide key. After registration, each qd is assigned a session key used to sign messages after that. qs will communicate username/password information (encrypted) to qm, which will ultimately be passed from qd to qe, which will authenticate before switching to the requested user.
>
> Much effort is being put into low-latency distribution of jobs. Experimenting with the version of GNU Queue I have, after making several changes to get it to go and all, it takes a second or more between submission of a job and onset of execution on an idle cluster. Much of this I think is due to built-in deliberate delays to work around NFS race conditions, hence my interest in eliminating NFS as a communication layer between submitting users and execution agents. My present system is seemingly instantaneous on an idle cluster (but much is not implemented yet); my goal is to have the latency for executing, say, 1000 no-op jobs on a system with a single qd agent comparable to that of a shell script doing the same directly.
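Aside on the keyed SHA-1 signing a couple of paragraphs up: it maps naturally onto HMAC-SHA1, so a minimal sketch with OpenSSL might look like the following. The function names, key contents, and payload format are invented for illustration and are not taken from Koni's implementation. Compile with -lcrypto.

/* sign_msg.c -- illustrative keyed-SHA-1 (HMAC-SHA1) signing of a UDP
 * datagram payload, roughly as described above.  The real code may differ. */
#include <string.h>
#include <stdio.h>
#include <openssl/hmac.h>
#include <openssl/evp.h>

#define SIG_LEN 20  /* SHA-1 digest size in bytes */

/* Compute the HMAC-SHA1 signature of msg[0..msg_len) into sig[]. */
static void sign_message(const unsigned char *key, int key_len,
                         const unsigned char *msg, size_t msg_len,
                         unsigned char sig[SIG_LEN])
{
    unsigned int out_len = 0;
    HMAC(EVP_sha1(), key, key_len, msg, msg_len, sig, &out_len);
}

/* Verify a received datagram by recomputing the signature and comparing.
 * (A constant-time comparison would be preferable to memcmp here.) */
static int verify_message(const unsigned char *key, int key_len,
                          const unsigned char *msg, size_t msg_len,
                          const unsigned char sig[SIG_LEN])
{
    unsigned char expected[SIG_LEN];
    sign_message(key, key_len, msg, msg_len, expected);
    return memcmp(expected, sig, SIG_LEN) == 0;
}

int main(void)
{
    const unsigned char session_key[] = "example-session-key"; /* per-qd key from registration */
    const unsigned char payload[]     = "OFFER job=42";        /* invented payload format      */
    unsigned char sig[SIG_LEN];

    sign_message(session_key, sizeof(session_key) - 1,
                 payload, sizeof(payload) - 1, sig);
    printf("verify: %d\n",
           verify_message(session_key, sizeof(session_key) - 1,
                          payload, sizeof(payload) - 1, sig));
    return 0;
}

Presumably the signature travels with each datagram, keyed with the system-wide key during registration and with the per-qd session key afterwards, as described above.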
> Some drawbacks:
>
> Security rests ultimately with the privacy of the system-wide key file, which must be installed on, or accessible to, both qm and all qd agents.
>
> All systems running qd must have access to the same authentication system for validating the username/password of submitting users. NIS or something equivalent is probably the easiest, both for me as developer and for administrators at large who might use this thing. We could potentially use a custom arrangement through PAM too.
>
> NFS or some other shared network filesystem is still required for user jobs to read/write input and output, unless they only want to use stdin/stdout, in which case qs can handle it. I don't consider this a problem really for dedicated systems.
>
> Job transfer takes place over a transient TCP connection, but I've noticed this can cause a hiccup (qm pauses for several seconds but eventually resumes rapid distribution of jobs) if the TCP SYN packet is lost, which seems to happen after about 30,000 jobs have been sent and executed as fast as possible. The TIME_WAIT state of a closed TCP connection hogs system resources on the qm host, potentially blocking the opening of new connections until resources are available. This is only a problem in the pathological case of >30,000 no-op jobs at once, surely not a real-world problem. Presently the system will pause if the SYN packet is dropped when forming a new connection, and will wait until both enough old TIME_WAIT TCP connections are cleared and the SYN retransmit timer expires, at which point the connection is established and distribution commences again.
>
> This system has a central manager, qm, which the present GNU Queue does not. Failure at qm will cause the whole cluster to stop executing jobs after their present assignments. This does not happen with GNU Queue, unless the NFS server goes down; however, when NFS comes back, provided there is no corruption to the filesystem, everything continues. My system will need some crash-recovery complexity for qm. qd's can die and come back all they like.
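On the authentication drawback above: a custom arrangement through PAM is probably less work than it sounds; qe could validate the forwarded username/password with a handful of calls. A minimal sketch follows; the "queue" service name and all other identifiers are invented for illustration and are not from Koni's code. Build with -lpam.

/* pam_check.c -- illustrative check of a username/password pair via PAM,
 * roughly what qe might do with the credentials forwarded from qs. */
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <security/pam_appl.h>

/* PAM conversation callback: answer prompts with the password passed
 * through from qs (carried in appdata_ptr). */
static int conv_fn(int n, const struct pam_message **msg,
                   struct pam_response **resp, void *appdata_ptr)
{
    struct pam_response *replies = calloc(n, sizeof(*replies));
    int i;

    if (replies == NULL)
        return PAM_CONV_ERR;
    for (i = 0; i < n; i++) {
        if (msg[i]->msg_style == PAM_PROMPT_ECHO_OFF ||
            msg[i]->msg_style == PAM_PROMPT_ECHO_ON)
            replies[i].resp = strdup((const char *)appdata_ptr);
    }
    *resp = replies;  /* PAM frees these */
    return PAM_SUCCESS;
}

/* Returns 1 if the username/password authenticate, 0 otherwise. */
static int check_user(const char *user, const char *password)
{
    struct pam_conv conv = { conv_fn, (void *)password };
    pam_handle_t *pamh = NULL;
    int rc;

    rc = pam_start("queue", user, &conv, &pamh);  /* "queue" = invented PAM service name */
    if (rc != PAM_SUCCESS)
        return 0;
    rc = pam_authenticate(pamh, 0);
    if (rc == PAM_SUCCESS)
        rc = pam_acct_mgmt(pamh, 0);              /* account not expired/locked */
    pam_end(pamh, rc);
    return rc == PAM_SUCCESS;
}

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s user password\n", argv[0]);
        return 2;
    }
    return check_user(argv[1], argv[2]) ? 0 : 1;
}

The site's PAM configuration for that service would then decide whether this means NIS, LDAP, local files, or something else, which keeps the daemons themselves agnostic about the authentication system.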
> Comments are welcome. If you want to peek at the code, reply back to this list. If there is interest and no objections, I will post a copy of the source as-is to the list. It doesn't do much for the moment except implement the qm -> qd -> qe chain of events and demonstrate the distribution of jobs.
>
> Cheers,
> Koni
>
> On Sun, 2005-05-01 at 19:38 -0400, Richard Stallman wrote:
> >     Anyhow, I suggested in my email to Koni and Mike that we wait a week or two for Mike to respond.
> >
> > I think that is a reasonable plan. The program needs a maintainer who will make releases, and more generally, who will give the program proper attention.
> >
> >     At some point, we'd post publically, and then wait about 30 days or some reasonable time.
> >
> > I don't understand that part. Wait 30 days for what?
> >
> >     What's the standard procedure for reclaiming an abandoned GNU project?
> >
> > I can appoint (and remove) maintainers at any time. So once the situation is clear, I can simply appoint a new maintainer for GNU Queue.