Re: [Queue-developers] new design details
From: <bo...@pr...> - 2005-05-11 05:21:39
Koni wrote:
> I envision 4 separate programs working together in this system:
>
> qs: Users use this program (like "queue" or "qsh" in GNU Queue) to
> submit jobs [ Presently not implemented at all ]

It would be bonus points to make this interoperable with the POSIX
standard. As far as I can tell, the standard simply describes PBS.

  http://www.opengroup.org/onlinepubs/009695399/utilities/qsub.html

Most queue software is completely unique, and people don't really
expect POSIX conformance at this point in history. It just does not
seem to have the grip in the industry that conformance to other parts
of the POSIX spec has. So I would not say this is really important at
this time. But at least being aware of it would be good.

> Some design goals/choices:
>
> NFS is not used for communication and distribution of the jobs. This was
> a primary goal in the design for me. After getting into it, I have new
> appreciation for the design of GNU Queue though. :)

I looked into GNU queue way back in the beginning. But having NFS as an
integral part of the design made it unsuitable for my use with several
thousand compute servers. I have been a lurker on the list ever since
because GNU queue had some nice features.

It would be nice if a project goal were to have code portable to a wide
range of platforms. I would say GNU/Linux, HP-UX, Solaris, AIX, SGI,
and Mac OS X, for a start. What this really means is avoiding a lot of
heavy dependencies on weird libraries. Trying to build a large project
with fifty library dependencies in a mix of C++ with heavy STL and C#
and Java from scratch on AIX is not an easy task.

> executed as fast as possible. the TIME_WAIT state of a closed TCP
> connection hogs the system resources on the qm host, potentially

I have actually seen this often on my network just doing normal TCP
activity. I believe that flaky network hardware can raise the
likelihood of this problem drastically. I don't have a solution other
than waiting for the TIME_WAIT timeout to clear the network stack.

If you are opening and closing a lot of network connections very
rapidly, I think this could be a problem on bigger networks. But if
this is only one connection per job, then on my network that would not
be a problem, as that is well within tolerable limits.

Bob
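
As a rough back-of-envelope for the TIME_WAIT point above, here is a minimal sketch of how many closed connections would be parked in TIME_WAIT on one host at a steady connection rate. The function name and the 60 second default are assumptions for illustration only: Linux holds TIME_WAIT for about 60 seconds, other systems use different values, and none of these numbers come from the proposed design.

```python
def time_wait_sockets(connections_per_second, time_wait_seconds=60.0):
    """Estimate the steady-state count of sockets held in TIME_WAIT.

    Each closed connection lingers for time_wait_seconds, so at a steady
    close rate the number lingering at any moment is rate * duration.
    The 60 second default is an assumption (the Linux value).
    """
    if connections_per_second < 0 or time_wait_seconds < 0:
        raise ValueError("rate and duration must be non-negative")
    return connections_per_second * time_wait_seconds


if __name__ == "__main__":
    # Hypothetical rates: one connection per job at a few jobs per second,
    # versus many short connections opened and closed very rapidly.
    for rate in (2, 50, 2000):
        print(rate, "conn/s ->", time_wait_sockets(rate), "sockets in TIME_WAIT")
```

With one connection per job, the count stays small unless jobs are submitted at a very high rate. It only becomes large when connections are opened and closed much faster than that.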