Re: [Queue-developers] New site at Savannah
From: <bo...@pr...> - 2005-07-05 22:54:34
Koni wrote:
> The extant GNU queue didn't really do anything special to attempt to
> support heterogenous environments or even attempt to be aware of
> heterogeneity that I know of, what I have in mind will be no less
> supportive of mixed setups than the old GNU queue. From what I see in
> the old code GQ wouldn't have even handled distribution correctly
> between a PPC and x86 system, because the communication between "queue"
> and "queued" used binary formats.

To be honest the killer for me in the previous implementation was the reliance on shared NFS. And my environment at the time was all big endian so I don't know if there was an existing problem with endianness or not. But today I have a mixed big and little endian environment.

Just because there is shared use of binary data file formats does not mean this is going to be a problem between big and little endian machines. Programs that write binary data structures *are supposed to* handle the difference between big and little endian data structures. That is, even in the original K&R doing a write(2) of a binary structure was listed as non-portable. Applications desiring portability often used the byte order macros htonl() and ntohl() and so forth to achieve cross platform binary compatibility.

None of this is an argument for binary data formats. Just that binary data formats by themselves do not mean endianness problems. I actually prefer plain text formats whenever possible.

> I will at least deal with making sure that new GQ itself can handle
> architecture differences when talking to itself across nodes,

Good! That is 90% of the problem. A large percentage of what is left is environment problems such as PATH which may be different on different machines, for example. So one would always want to use a local environment. A common misfeature that I often see is copying PATH from one host to another and expecting an HP-UX PATH (no /bin) to work on Solaris (needs /usr/xpg4/bin) or some such.
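
As a rough illustration of the two points above (not taken from the GNU Queue code), the sketch below packs a job message in network byte order and lets the receiving host substitute its own PATH. Python's struct module with the "!" prefix stands in here for the C htonl()/ntohl() macros. The header layout, the function names and the per-platform PATH values are all assumptions made up for the example.

```python
import struct

# Hypothetical wire header: job id and payload length, each an unsigned
# 32-bit integer. "!" selects network (big-endian) byte order, the same
# order htonl() produces in C, so the bytes are identical on any host.
HEADER = struct.Struct("!II")

# Hypothetical per-platform search paths; real values are site specific.
LOCAL_PATHS = {
    "hpux": "/usr/bin:/usr/local/bin",
    "solaris": "/usr/xpg4/bin:/usr/bin",
    "linux": "/usr/local/bin:/usr/bin:/bin",
}


def encode_job(job_id, command):
    """Serialize a job as a fixed-order header followed by UTF-8 text."""
    payload = command.encode("utf-8")
    return HEADER.pack(job_id, len(payload)) + payload


def decode_job(message):
    """Inverse of encode_job; works regardless of the host's endianness."""
    job_id, length = HEADER.unpack_from(message)
    payload = message[HEADER.size:HEADER.size + length]
    return job_id, payload.decode("utf-8")


def local_environment(platform, submitted_env):
    """Keep the submitted variables but replace PATH with the local one."""
    env = dict(submitted_env)
    env["PATH"] = LOCAL_PATHS[platform]
    return env
```

With this layout a job submitted from a little endian host decodes to the same job id on a big endian host, and a job arriving on Solaris runs with /usr/xpg4/bin on its PATH no matter what PATH the submitting host had.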

With the growing use of GNU/Linux, especially for development, it is very easy to believe all of the world is as nice. Unfortunately it is not. And in the upcoming GNU/Hurd I have been hearing of many interesting differences from the previous compute model. Things are going to be interesting.

> but otherwise it seems to me that it invites trouble to the
> non-programmer user unless the application being distributed is both
> programmed properly to handle input/output generated on other
> architectures, and installed correctly and locally on each system.

But that is exactly the case for people desiring heterogeneous environments. It is all taken care of already. Let's say I have a CAD tool, a circuit simulation program for example, that runs on GNU/Linux amd64, GNU/Linux i686 and HP-UX in both 32-bit and 64-bit mode. At that point from a queue point of view you can treat it like 'cat', 'grep', 'sed', etc. It will run on any of those platforms. Just invoke it. Don't get caught up in the details of how that can run on the different platforms because it is not important from the perspective of the queue software. Treat it as a black box API just like 'sed'. The same is true for the reverse of those programs using a queue system.

In reality most CAD programs like what I am talking about in my example are never invoked directly. Usually they are invoked through a wrapper script. The #!/bin/sh script runs on all systems and detects the PATH needed for that system, loads up the environment as needed, then calls the real underlying binary to do the rest of the work. The promise of Java as a compile once and run anywhere application has never been truly realized that I can tell in real life.

> Following up on that, we can make the new GQ system itself aware of what
> architecture each of the compute nodes are, and allow a user to specify
> which architecture(s) are acceptable for a job.
> That would allow
> developers of distributed apps and competent users to use GQ to take
> advantage of mixed environments where desired, provided their apps/jobs
> can handle it.

It is desirable to be able to specify that only 64-bit systems can execute a task. Or that only systems with more than X amount of memory or more than Y amount of disk space or so forth. Because I have big tasks and little tasks.

> Beyond that, I flinch a bit at trying to make the complexities of
> distributed computing on ad-hoc heterogeneous clusters part of GQ's
> problem space.

I am not sure exactly to what you are referring. I don't think we are talking about building a Beowulf style tightly integrated cluster. But regardless I don't think that is needed either. Beowulf already exists and serves a good use. Other queue systems serve a different niche. (And the new GNU Queue is still not sure what niche it will fill. Time will tell.)

> Personally, I see a decline in ad-hoc clusters formed from spare or idle
> systems and a rise in small dedicated clusters where the systems are
> purchased all at once.

I disagree. I have a couple of thousand machines in my current queues. It is not possible to purchase a complete replacement at any given time. We do buy a rack here and a rack there all at once. But the n-1 equipment is still quite useful and does not get removed from the queues until it is truly obsolete as n-3 equipment.

I think you are thinking that users would use one queue for old equipment set A and a different queue for new equipment set B. But my experience is that users hate this type of coarse queue management. Sure they like the fast new machines. But the old ones are the bulk of the system. They then write some type of queue on the front of the queue to be able to stuff jobs into both queues. This is from actual experience where users have done this and not something I am making up as a contrived example.
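
As a rough sketch of the per-job constraints mentioned above (64-bit only, minimum memory, minimum disk), a single mixed queue could filter nodes per job instead of splitting old and new machines into separate queues. Nothing here comes from GNU Queue itself; the Node and Requirements types, their field names and the sample values are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical description of one compute node in a mixed queue."""
    name: str
    arch: str
    bits: int
    memory_mb: int
    disk_mb: int


@dataclass(frozen=True)
class Requirements:
    """Hypothetical per-job constraints; the defaults accept any node."""
    archs: frozenset = frozenset()  # empty set means any architecture
    min_bits: int = 0
    min_memory_mb: int = 0
    min_disk_mb: int = 0


def eligible(node, req):
    """Return True if the node satisfies every constraint of the job."""
    return ((not req.archs or node.arch in req.archs)
            and node.bits >= req.min_bits
            and node.memory_mb >= req.min_memory_mb
            and node.disk_mb >= req.min_disk_mb)


def candidates(nodes, req):
    """Names of all nodes in the queue that may run the job, in order."""
    return [node.name for node in nodes if eligible(node, req)]
```

A little task with default requirements can land on any machine, old or new, while a big task that asks for min_bits=64 is only offered the 64-bit nodes, all from the same queue.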

So of course I would like to see GNU Queue handle heterogeneous queues natively.

> Thus, if the latter truly is an expanding market, we need to have a
> release as soon as possible that can handle this simpler case well
> enough to establish a user base. The extensions above will be simple
> enough I think at that point.
>
> How does this plan sound?

If your design goal is to make a very simple queue that serves a very simple set of hardware capabilities then that is fine. It will certainly have usefulness to many. It is a perfectly valid design goal. But I don't think that is what most people think of when they think of a queuing system. Personally I think that it is significantly less useful than one that does support a mixed computing environment.

Supporting a mixed environment is definitely incrementally harder than assuming a homogeneous one. So starting off small and growing larger later may be a good development roadmap. But developing without that as an end goal may make it much more difficult to add later than designing with that in mind up front. I worry that if this is not thought about as the design progresses it becomes too difficult to add later and becomes a lockout.

Let me finish by saying that I am not unhappy in any way if the new GNU Queue does not fit my particular needs. And unfortunately I am not in a position at this moment to produce my own free software queue project. Therefore I can only stand on the sidelines and cheer on those who are trying to volunteer their time to do this. So let me cheer you on and see what is produced. It is your itch to scratch. Don't let me dissuade you from your needs. But if you ask my opinion I will provide what I think are the most useful features as I see them from my viewpoint.

Bob