Re: [Queue-developers] New site at Savannah


Koni wrote:
> The extant GNU queue didn't really do anything special to attempt to
> support heterogenous environments or even attempt to be aware of
> heterogeneity that I know of, what I have in mind will be no less
> supportive of mixed setups than the old GNU queue. From what I see in
> the old code GQ wouldn't have even handled distribution correctly
> between a PPC and x86 system, because the communication between "queue"
> and "queued" used binary formats.

To be honest the killer for me in the previous implementation was the
reliance on shared NFS.  And my environment at the time was all big
endian so I don't know if there was an existing problem with
endianness or not.  But today I have a mixed big and little endian
environment.

Just because there is shared use of binary data file formats does not
mean this is going to be a problem between big and little endian
machines.  Programs that write binary data structures *are supposed
to* handle the difference between big and little endian data
structures.  That is, even in the original K&R, doing a write(2) of a
binary structure was listed as non-portable.  Applications desiring
portability often used the byte order macros htonl() and ntohl() and
so forth to achieve cross platform binary compatibility.
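To make that concrete, here is a minimal C sketch of the htonl()/ntohl()
approach.  The struct and its field names are hypothetical, made up for
illustration and not taken from the GNU Queue sources.  Each field is
converted to network (big endian) byte order and copied into a byte
buffer instead of doing a write(2) of the raw struct, so big and little
endian hosts both see the same bytes on the wire.

```c
#include <arpa/inet.h>   /* htonl(), ntohl() */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical job header; field names are illustrative only. */
struct job_header {
    uint32_t job_id;
    uint32_t priority;
    uint32_t mem_mb;
};

/* Three 32-bit fields, packed with no padding. */
#define JOB_HEADER_WIRE_SIZE 12

/* Convert each field to network byte order and copy it into buf,
   which must hold at least JOB_HEADER_WIRE_SIZE bytes. */
void job_header_encode(const struct job_header *h, unsigned char *buf)
{
    assert(h != NULL && buf != NULL);
    uint32_t fields[3] = { htonl(h->job_id), htonl(h->priority),
                           htonl(h->mem_mb) };
    memcpy(buf, fields, sizeof fields);
}

/* Inverse of job_header_encode(): read network-order fields from buf
   and convert them to this host's byte order. */
void job_header_decode(const unsigned char *buf, struct job_header *h)
{
    uint32_t fields[3];
    assert(buf != NULL && h != NULL);
    memcpy(fields, buf, sizeof fields);
    h->job_id   = ntohl(fields[0]);
    h->priority = ntohl(fields[1]);
    h->mem_mb   = ntohl(fields[2]);
}
```

A sender writes JOB_HEADER_WIRE_SIZE bytes from the buffer to the
socket; the receiver reads the same number of bytes and decodes them,
whatever its own byte order happens to be.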

None of this is an argument for binary data formats.  Just that binary
data formats by themselves do not mean endianness problems.  I actually
prefer plain text formats whenever possible.

> I will at least deal with making sure that new GQ itself can handle
> architecture differences when talking to itself across nodes,

Good!  That is 90% of the problem.

A large percentage of what is left is environment problems such as
PATH which may be different on different machines, for example.  So
one would always want to use a local environment.  A common misfeature
that I often see is copying PATH from one host to another and
expecting an HP-UX PATH (no /bin) to work on Solaris (needs
/usr/xpg4/bin) or some such.  With the growing use of GNU/Linux,
especially for development, it is very easy to believe all of the
world is as nice.  Unfortunately it is not.  And in the upcoming
GNU/Hurd I have been hearing of many interesting differences from the
previous compute model.  Things are going to be interesting.
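One way a queue daemon could follow the "use a local environment" rule
is sketched below in C.  It assumes the executing host should supply
its own PATH: any PATH= entry sent along with the job is dropped and
replaced by the value of the POSIX confstr(_CS_PATH), which is defined
as a PATH that finds the standard utilities on that host.  The function
name localize_env() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>   /* confstr(), _CS_PATH */

/* Hypothetical helper: return a malloc'd, NULL-terminated copy of the
   submitted environment with any PATH= entry replaced by this host's
   POSIX default PATH.  Non-PATH strings are shared with the input; only
   the new PATH string is freshly allocated.  Returns NULL on failure. */
char **localize_env(char *const submitted[])
{
    size_t n = 0, out = 0;
    assert(submitted != NULL);
    while (submitted[n] != NULL)
        n++;

    size_t len = confstr(_CS_PATH, NULL, 0);  /* length incl. the NUL */
    if (len == 0)
        return NULL;
    char *path = malloc(len + 5);             /* "PATH=" plus value */
    char **env = malloc((n + 2) * sizeof *env);
    if (path == NULL || env == NULL) {
        free(path);
        free(env);
        return NULL;
    }
    memcpy(path, "PATH=", 5);
    confstr(_CS_PATH, path + 5, len);

    for (size_t i = 0; i < n; i++)
        if (strncmp(submitted[i], "PATH=", 5) != 0)  /* drop remote PATH */
            env[out++] = submitted[i];
    env[out++] = path;                                /* local PATH last */
    env[out] = NULL;
    return env;
}
```

The result can be handed to execve(2) when starting the job; the caller
later frees the PATH entry and the array itself.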

> but otherwise it seems to me that it invites trouble to the
> non-programmer user unless the application being distributed is both
> programmed properly to handle input/output generated on other
> architectures, and installed correctly and locally on each system.

But that is exactly the case for people desiring heterogeneous
environments.  It is all taken care of already.

Let's say I have a CAD tool, a circuit simulation program for example,
that runs on GNU/Linux amd64, GNU/Linux i686 and HP-UX in both 32-bit and
64-bit mode.  At that point from a queue point of view you can treat
it like 'cat', 'grep', 'sed', etc.  It will run on any of those
platforms.  Just invoke it.  Don't get caught up in the details of how
that can run on the different platforms because it is not important
from the perspective of the queue software.  Treat it as a black box
API just like 'sed'.  The same is true for the reverse of those
programs using a queue system.

In reality most CAD programs like the one in my example are never
invoked directly.  Usually they are invoked through a wrapper script.
The #!/bin/sh script runs on all systems, detects the PATH needed for
that system, loads up the environment as needed, then calls the real
underlying binary to do the rest of the work.  The promise of Java as a
compile-once, run-anywhere platform has never been truly realized in
real life, as far as I can tell.

> Following up on that, we can make the new GQ system itself aware of what
> architecture each of the compute nodes are, and allow a user to specify
> which architecture(s) are acceptable for a job. That would allow
> developers of distributed apps and competent users to use GQ to take
> advantage of mixed environments where desired, provided their apps/jobs
> can handle it.

It is desirable to be able to specify that only 64-bit systems can
execute a task.  Or that only systems with more than X amount of
memory or more than Y amount of disk space or so forth.  Because I
have big tasks and little tasks.

> Beyond that, I flinch a bit at trying to make the complexities of
> distributed computing on ad-hoc heterogeneous clusters part of GQ's
> problem space.

I am not sure exactly to what you are referring.  I don't think we are
talking about building a Beowulf-style tightly integrated cluster.  But
regardless I don't think that is needed either.  Beowulf already
exists and serves a good use.  Other queue systems serve a different
niche.  (And the new GNU Queue is still not sure what niche it will
fill.  Time will tell.)

> Personally, I see a decline in ad-hoc clusters formed from spare or idle
> systems and a rise in small dedicated clusters where the systems are
> purchased all at once.

I disagree.  I have a couple of thousand machines in my current
queues.  It is not possible to purchase a complete replacement at any
given time.  We do buy a rack here and a rack there all at once.  But
the n-1 equipment is still quite useful and does not get removed from
the queues until it is truly obsolete as n-3 equipment.

I think you are thinking that users would use one queue for old
equipment set A and a different queue for new equipment set B.  But my
experience is that users hate this type of coarse queue management.
Sure they like the fast new machines.  But the old ones are the bulk
of the system.  They then write some type of queue on the front of the
queue to be able to stuff jobs into both queues.  This is from actual
experience where users have done this and not something I am making up
as a contrived example.  So of course I would like to see GNU Queue
handle heterogeneous queues natively.
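A rough C sketch of what handling it natively could mean: one pool that
holds both old and new machines, where a job goes to the fastest idle
node meeting its requirements and spills over onto older equipment
rather than waiting.  The node record, the speed rating and pick_node()
are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node record for one mixed pool of old and new machines. */
struct node {
    const char *name;
    unsigned    speed;      /* relative speed rating, higher is faster */
    unsigned    word_bits;  /* 32 or 64 */
    bool        busy;
};

/* Return the index of the fastest idle node whose word size is at least
   min_word_bits, or -1 if none is free right now.  New machines win when
   idle; otherwise the job falls through to an older one. */
int pick_node(const struct node pool[], size_t count, unsigned min_word_bits)
{
    int best = -1;
    assert(pool != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (pool[i].busy || pool[i].word_bits < min_word_bits)
            continue;
        if (best < 0 || pool[i].speed > pool[best].speed)
            best = (int)i;
    }
    return best;
}
```

With a single queue like this, users would not need to write their own
queue in front of the queue just to spread jobs across both sets of
machines.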

> Thus, if the latter truly is an expanding market, we need to have a
> release as soon as possible that can handle this simpler case well
> enough to establish a user base. The extensions above will be simple
> enough I think at that point.
>
> How does this plan sound?

If your design goal is to make a very simple queue that serves a very
simple set of hardware capabilities then that is fine.  It will
certainly have usefulness to many.  It is a perfectly valid design
goal.  But I don't think that is what most people think of when they
think of a queuing system.  Personally I think that it is
significantly less useful than one that does support a mixed computing
environment.

Supporting a mixed environment is definitely incrementally harder than
assuming a homogeneous one.  So starting off small and growing larger
later may be a good development roadmap.  But developing without that
as an end goal may make it much more difficult to add later than
designing with that in mind up front.  I have a worry that if this is
not thought about as the design progresses that it becomes too
difficult to add later and becomes a lockout.

Let me finish by saying that I am not unhappy in any way if the new
GNU Queue does not fit my particular needs.  And unfortunately I am
not in a position at this moment to produce my own free software queue
project.  Therefore I can only stand on the sidelines and cheer on
those who are trying to volunteer their time to do this.  So let me
cheer you on and see what is produced.  It is your itch to scratch.
Don't let me dissuade you from your needs.  But if you ask my opinion
I will provide what I think are the most useful features as I see them
from my viewpoint.

Bob