queue-developers Mailing List for GNU Queue
Brought to you by:
wkrebs
From: Srivatsa, S. <Sha...@qw...> - 2007-04-05 07:20:23
|
Hi,

I am trying to build the source code for GNU Queue on Linux. I executed the command

    ./configure --prefix=queue ; make ; make install

on the following sources:

    queue-1.40.1beta
    queue-1.30.1
    queue-1.20.1

At the compilation stage I am getting the following error for all of the above sources:

    queued.c:3923: dereferencing pointer to incomplete type
    queued.c:3923: dereferencing pointer to incomplete type
    queued.c:3923: dereferencing pointer to incomplete type
    queued.c:3923: dereferencing pointer to incomplete type
    queued.c:3923: dereferencing pointer to incomplete type
    queued.c:3923: dereferencing pointer to incomplete type
    queued.c:3931: dereferencing pointer to incomplete type
    queued.c:3931: dereferencing pointer to incomplete type
    queued.c:3935: dereferencing pointer to incomplete type
    queued.c:3936: dereferencing pointer to incomplete type
    queued.c:3936: dereferencing pointer to incomplete type
    queued.c:3938: dereferencing pointer to incomplete type
    queued.c:3938: dereferencing pointer to incomplete type
    queued.c:3946: dereferencing pointer to incomplete type
    queued.c:3946: dereferencing pointer to incomplete type
    queued.c:3947: dereferencing pointer to incomplete type
    queued.c:3947: dereferencing pointer to incomplete type
    queued.c:3955: dereferencing pointer to incomplete type
    queued.c:3955: dereferencing pointer to incomplete type
    queued.c: In function `check_query':
    queued.c:4015: warning: passing arg 2 of `bind' from incompatible pointer type
    queued.c:4041: warning: passing arg 2 of `bind' from incompatible pointer type
    queued.c:4268: warning: assignment makes pointer from integer without a cast
    queued.c:4300: warning: assignment makes pointer from integer without a cast

Can someone help me resolve this error?

Thanks & Regards,
Sharath |
From: arthy g. <ar...@ya...> - 2007-01-12 10:38:52
|
Hello Henry Tillotson,

I saw your mail about the CC dependency option on the queue-developers mailing list. Please refer to the following link:

http://sourceforge.net/mailarchive/message.php?msg_id=449223

You specified that, to generate dependencies, we have to use CC as shown below.

    $(COMPILE) -xM $< > .deps/$(*F).pp
    $(COMPILE) -c $<

By default, I think the -xM option assumes that all the object files are in the same directory as the output file. If the object files are in a different directory than the source files, how does -xM recognize that? When I write my makefile as shown below, the depend option works fine.

    $(OBJS):%.o:%.cxx
            $(CXX) -xM1 $(CFLAGS) $< > $*.P
            $(CXX) $(CFLAGS) $(@F:.o=.cxx) -o $@

    -include $(SRCS:.cxx=.P)

But when the object files are in a different directory, as given below, it does not work properly.

    $(OBJS):$(OUTPUT_DIR)/%.o:%.cxx
            $(CXX) -xM1 $(CFLAGS) $< > $(OUTPUT_DIR)/$*.P
            $(CXX) $(CFLAGS) $(@F:.o=.cxx) -o $@

    -include $(SRCS:%.cxx=$(OUTPUT_DIR)/%.P)

From what I found, when -xM1 outputs its information it assumes that all the object files are in the same directory as the source files. When I run "make" in the source directory it displays this:

    CEventQueue.o : CEventQueue.cxx
    CEventQueue.o : CEventQueue.h
    CEventQueue.o : ../../pdfcore/src//CCoQueueService.h
    CEventQueue.o : ../../pdfcore/src//CCoPthreadObj.h
    CEventQueue.o : ../../pdfcommon/src//CPdfObject.h
    CEventQueue.o : ../../pdfcommon/src//CPdfTrace.h
    CEventQueue.o : ../../pdfcommon/src//CPdfException.h
    CEventQueue.o : ../../pdfcommon/src//PdfType.h

But it should actually refer to the .o files like this:

    ../../build/CEventQueue.o : CEventQueue.cxx

where ../../build is the OUTPUT_DIR. I found that there is no CC compiler option for adding a prefix to object files. Could you tell me what to do now? All the makefiles in our project create the .o files in a separate directory from the source directory.

Thanks in advance,
GA |
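
[Editor's sketch] The message above asks how to get dependency lines whose targets carry the output directory. One possible workaround, offered here only as an assumption and not as anything proposed in the thread, is to post-process the generated dependency text and prepend the directory to each target. The function name `prefix_targets` is hypothetical, and Python is used purely for illustration; it assumes the "target : prerequisite" line shape shown in the message.

```python
def prefix_targets(dep_text, output_dir):
    """Prepend output_dir to the target of each 'target : prerequisite' line.

    Assumes one dependency per line in the form shown in the message,
    e.g. 'CEventQueue.o : CEventQueue.cxx'.
    """
    rewritten = []
    for line in dep_text.splitlines():
        target, sep, prereq = line.partition(" : ")
        if sep and target.endswith(".o"):
            rewritten.append(f"{output_dir}/{target}{sep}{prereq}")
        else:
            # Leave anything that does not look like an object rule untouched.
            rewritten.append(line)
    return "\n".join(rewritten)


if __name__ == "__main__":
    sample = "CEventQueue.o : CEventQueue.cxx\nCEventQueue.o : CEventQueue.h"
    print(prefix_targets(sample, "../../build"))
```

In a makefile, the same rewrite would sit between the dependency-generating compiler call and the redirect into the .P file.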
From: Koni <mh...@co...> - 2006-02-17 21:10:09
|
Hi Folks, Anyone here at this mailing list interested in following GNU queue development discussions should now join the savannah mailing list "gnuqueue-devel", accessible from the GNU Queue project pages there. Here is a link to the mailing list page: https://savannah.gnu.org/mail/?group=gnu-queue I will post a message to that mailing list sometime this weekend regarding the problems with the code base and design I developed last summer and propose a different approach that I hope may generate more interest in active involvement from other developers. Cheers, K |
From: Koni <mh...@co...> - 2005-07-24 14:57:17
|
Hi folks,

The new gnu-queue vaporware is now publicly inspectable at least... I've created a new CVS repository at the Savannah site and imported all of the new source tree, automake/autoconf build files & other GNU project standard files (place holders at the moment). Check it out if you want; it will not work out of the box or do anything impressive, and I haven't had a chance to write some decent instructions for how to get it to at least try to do something.

At the very least it will need a system wide key file created called "system-keyfile" which the programs just expect to find in their cwd. It just needs 20 random bytes in there. To make one, just do

    dd if=/dev/random of=system-keyfile count=20

When this is all a bit more mature, this file will be created by an install script, and it must be readable only by root or by a non-root "system user" defined specifically for running the gnu-queue system. Security depends entirely on the privacy of this file. If that sounds sketchy (it is sketchy), I might point out that security of your SSH daemon rests entirely with the privacy of your private host key file...

Some brief explanations:

To build, just ./configure && make -- if you get build errors on your system, send the output to the list and what system you are trying on. It has been tested on one system, mine, which is FC3. Anybody who has different gnu/linux distros, or non-gnu/linux systems, we'll need your help getting the build cleaned up.

There are 5 programs, here is what they do:

qs: user command submission program

qd: daemon running on compute nodes, receives job advertisements and autonomously decides whether or not its local system can execute them. (If so, it responds with a volunteer message -- qm (below) picks one of the volunteers.)

qm: scheduler. Receives connections from qs, distributes jobs to qd.

qe: execution agent. This program must be setuid (you will need to do that manually).
This program is the only program which must run as root (except qlogin), which it only needs to change to the user of the job submitter. It is spawned by qd, but qd does not (and should not) run as root. qe does not listen to the network, and only switches users and performs execution if the username and password supplied by qd can be validated via PAM. This is a security feature. qlogin: This program must be setuid so it can read the system-keyfile. It will create a file in the user's home directory called ".qtoken" which contains the user's password in an encrypted form, as well as some other bull. This saves the user the trouble of typing their password on every invocation of qs. presently, this is required as qs can not prompt for the password directly. qlogin will check the password via PAM to make sure it's right. Otherwise, a user could queue something with a bad .qtoken and then some time later, when their jobs come up in the queue, they will all fail because qe will refuse to execute them if it can't verify the submitting user's identity. So, after creating the system-keyfile, you could try to test out on your local system, like this: [NOTE: qm and qd do not daemonize themselves, because I'm still working on them quite a bit. There may or may not be a lot of debugging output depending on whether I left that on in the source I just checked in to CVS ] open a terminal, ./qm open another terminal, ./qd open a third terminal, then ./qlogin ./qs "ls -l > ls-test" if you get a file called "ls-test" in the directory when that command returns, and it looks like the output of ls -l, then it worked. If it doesn't, open up the source and see if you can figure out why... Note there is no terminal support or any I/O redirect back to ./qs itself, so you must quote the I/O redirect so that it happens at the remote end. Ok, that's all I got time for. 
I won't have time for much else for quite a while as I am collecting samples for my research now at the field which seems to take all the time and energy I have... nothing that can be done about that, plants are ready when they're ready. Cheers, Koni |
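
[Editor's sketch] The key file step described in the message above (20 random bytes, private to one account) could also be done programmatically. The following is a minimal sketch under stated assumptions: the function name `create_keyfile` is hypothetical, Python stands in for whatever an eventual install script would use, and `os.urandom` is swapped in for reading /dev/random directly. As a side note on the dd command in the message: dd uses 512-byte blocks by default, so `count=20` on its own requests 20 blocks, whereas `bs=1 count=20` requests exactly 20 bytes.

```python
import os

KEY_BYTES = 20  # size stated in the message; everything else here is an assumption


def create_keyfile(path="system-keyfile", nbytes=KEY_BYTES):
    """Write nbytes of OS-provided randomness to path, accessible by the owner only."""
    # O_EXCL refuses to overwrite an existing key file.
    # Mode 0o600 means owner read/write only (the umask can only restrict it further).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        os.write(fd, os.urandom(nbytes))
    finally:
        os.close(fd)
    return path


if __name__ == "__main__":
    print("created", create_keyfile())
```

Creating the file with restrictive permissions from the start avoids a window in which the key is briefly readable by others.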
From: <bo...@pr...> - 2005-07-05 22:54:34
|
Koni wrote:
> The extant GNU queue didn't really do anything special to attempt to
> support heterogenous environments or even attempt to be aware of
> heterogeneity that I know of, what I have in mind will be no less
> supportive of mixed setups than the old GNU queue. From what I see in
> the old code GQ wouldn't have even handled distribution correctly
> between a PPC and x86 system, because the communication between "queue"
> and "queued" used binary formats.

To be honest the killer for me in the previous implementation was the reliance on shared NFS. And my environment at the time was all big endian so I don't know if there was an existing problem with endianness or not. But today I have a mixed big and little endian environment.

Just because there is shared use of binary data file formats does not mean this is going to be a problem between big and little endian machines. Programs that write binary data structures *are supposed to* handle the difference between big and little endian data structures. That is, even in the original K&R doing a write(2) of a binary structure was listed as non-portable. Applications desiring portability often used the byte order macros htonl() and ntohl() and so forth to achieve cross platform binary compatibility.

None of this is an argument for binary data formats. Just that binary data formats by themselves do not mean endianness problems. I actually prefer plain text formats whenever possible.

> I will at least deal with making sure that new GQ itself can handle
> architecture differences when talking to itself across nodes,

Good! That is 90% of the problem. A large percentage of what is left is environment problems such as PATH which may be different on different machines, for example. So one would always want to use a local environment. A common misfeature that I often see is copying PATH from one host to another and expecting an HP-UX PATH (no /bin) to work on Solaris (needs /usr/xpg4/bin) or some such.
With the growing use of GNU/Linux, especially for development, it is very easy to believe all of the world is as nice. Unfortunately it is not. And in the upcoming GNU/Hurd I have been hearing of many interesting differences from the previous compute model. Things are going to be interesting.

> but otherwise it seems to me that it invites trouble to the
> non-programmer user unless the application being distributed is both
> programmed properly to handle input/output generated on other
> architectures, and installed correctly and locally on each system.

But that is exactly the case for people desiring heterogeneous environments. It is all taken care of already. Let's say I have a CAD tool, a circuit simulation program for example, that runs on GNU/Linux amd64, GNU/Linux i686 and HP-UX both 32-bit and 64-bit mode. At that point from a queue point of view you can treat it like 'cat', 'grep', 'sed', etc. It will run on any of those platforms. Just invoke it. Don't get caught up in the details of how that can run on the different platforms because it is not important from the perspective of the queue software. Treat it as a black box API just like 'sed'.

The same is true for the reverse of those programs using a queue system. In reality most CAD programs like what I am talking about in my example are never invoked directly. Usually they are invoked as a wrapper script. The #!/bin/sh script runs on all systems and detects the PATH needed for that system, loads up the environment as needed, then calls the real underlying binary to do the rest of the work. The promise of Java as a compile once and run anywhere application has never been truly realized that I can tell in real life.

> Following up on that, we can make the new GQ system itself aware of what
> architecture each of the compute nodes are, and allow a user to specify
> which architecture(s) are acceptable for a job.
> That would allow
> developers of distributed apps and competent users to use GQ to take
> advantage of mixed environments where desired, provided their apps/jobs
> can handle it.

It is desirable to be able to specify that only 64-bit systems can execute a task. Or that only systems with more than X amount of memory or more than Y amount of disk space or so forth. Because I have big tasks and little tasks.

> Beyond that, I flinch a bit at trying to make the complexities of
> distributed computing on ad-hoc heterogeneous clusters part of GQ's
> problem space.

I am not sure exactly to what you are referring. I don't think we are talking about building a beowulf style tightly integrated cluster. But regardless I don't think that is needed either. Beowulf already exists and serves a good use. Other queue systems serve a different niche. (And the new GNU Queue is still not sure what niche it will fill. Time will tell.)

> Personally, I see a decline in ad-hoc clusters formed from spare or idle
> systems and a rise in small dedicated clusters where the systems are
> purchased all at once.

I disagree. I have a couple of thousand machines in my current queues. It is not possible to purchase a complete replacement at any given time. We do buy a rack here and a rack there all at once. But the n-1 equipment is still quite useful and does not get removed from the queues until it is truly obsolete as n-3 equipment.

I think you are thinking that users would use one queue for old equipment set A and a different queue for new equipment set B. But my experience is that users hate this type of coarse queue management. Sure they like the fast new machines. But the old ones are the bulk of the system. They then write some type of queue on the front of the queue to be able to stuff jobs into both queues. This is from actual experience where users have done this and not something I am making up as a contrived example.
So of course I would like to see GNU Queue handle heterogeneous queues natively. > Thus, if the latter truly is an expanding market, we need to have a > release as soon as possible that can handle this simpler case well > enough to establish a user base. The extensions above will be simple > enough I think at that point. > > How does this plan sound? If your design goal is to make a very simple queue that serves a very simple set of hardware capabilities then that is fine. It will certainly have usefulness to many. It is a perfectly valid design goal. But I don't think that is what most people think of when they think of a queuing system. Personally I think that it is significantly not as useful as one that does support a mixed computing environment. Supporting a mixed environment is definitely incrementally harder than assuming a homogeneous one. So starting off small and growing larger later may be a good development roadmap. But developing without that as an end goal may make it much more difficult to add later than designing with that in mind up front. I have a worry that if this is not thought about as the design progresses that it becomes too difficult to add later and becomes a lockout. Let me finish by saying that I am not unhappy in any way if the new GNU Queue does not fit my particular needs. And unfortunately I am not in a position at this moment to produce my own free software queue project. Therefore I can only stand on the sidelines and cheer on those who are trying to volunteer their time to do this. So let me cheer you on and see what is produced. It is your itch to scratch. Don't let me dissuade you from your needs. But if you ask my opinion I will provide what I think is the most useful features as I see them from my viewpoint. Bob |
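
[Editor's sketch] Bob's point about htonl() and ntohl() is that a binary format is portable as long as the byte order on the wire is fixed and every host converts to and from it. The same idea can be sketched briefly in Python, where the `!` prefix of the `struct` module selects network (big-endian) byte order regardless of the host. The record layout below (a 32-bit job id and a 16-bit priority) is hypothetical and is not GNU Queue's actual protocol.

```python
import struct

# Hypothetical wire record: job id (unsigned 32-bit), priority (unsigned 16-bit).
# "!" selects network byte order (big-endian), standard sizes, no padding.
WIRE_FORMAT = "!IH"


def pack_job(job_id, priority):
    """Encode the record the same way on big- and little-endian hosts."""
    return struct.pack(WIRE_FORMAT, job_id, priority)


def unpack_job(data):
    """Decode a record produced by pack_job on any host."""
    return struct.unpack(WIRE_FORMAT, data)


if __name__ == "__main__":
    wire = pack_job(1, 2)
    print(wire.hex())        # byte layout is identical on every architecture
    print(unpack_job(wire))
```

The sender and receiver never need to know each other's native endianness; they only need to agree on the wire format.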
From: Koni <mh...@co...> - 2005-06-29 04:08:25
|
Hi Bob, thanks for joining up and offering your help. Below is relevant for the developer list so I am copying this there as well. > But reading through some of your initial design postings I see that > you seem to be concentrating on a completely homogeneous design where > all of the machines are identical. That will definitely limit the > usefulness to me personally. That is not to say that it won't be > useful to you or to someone else. But most compute pools are > heterogeneous and so this may be less useful to me. We shall see. > Mike also objected to my previous statements about that, I should clarify my thinking some. The extant GNU queue didn't really do anything special to attempt to support heterogenous environments or even attempt to be aware of heterogeneity that I know of, what I have in mind will be no less supportive of mixed setups than the old GNU queue. From what I see in the old code GQ wouldn't have even handled distribution correctly between a PPC and x86 system, because the communication between "queue" and "queued" used binary formats. I will at least deal with making sure that new GQ itself can handle architecture differences when talking to itself across nodes, but otherwise it seems to me that it invites trouble to the non-programmer user unless the application being distributed is both programmed properly to handle input/output generated on other architectures, and installed correctly and locally on each system. Following up on that, we can make the new GQ system itself aware of what architecture each of the compute nodes are, and allow a user to specify which architecture(s) are acceptable for a job. That would allow developers of distributed apps and competent users to use GQ to take advantage of mixed environments where desired, provided their apps/jobs can handle it. Beyond that, I flinch a bit at trying to make the complexities of distributed computing on ad-hoc heterogeneous clusters part of GQ's problem space. 
Personally, I see a decline in ad-hoc clusters formed from spare or idle systems and a rise in small dedicated clusters where the systems are purchased all at once. Thus, if the latter truly is an expanding market, we need to have a release as soon as possible that can handle this simpler case well enough to establish a user base. The extensions above will be simple enough I think at that point. How does this plan sound? K |
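
[Editor's sketch] The proposal above, in which the system knows each compute node's architecture and the user names the acceptable architecture(s) for a job, amounts to a filter over the node list. A minimal sketch follows; the `Node` type, the `eligible_nodes` function, and the architecture strings are all hypothetical and do not come from the GNU Queue code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    arch: str  # hypothetical labels such as "x86" or "ppc"


def eligible_nodes(nodes, acceptable_archs):
    """Return the nodes a job may run on.

    An empty acceptable_archs means the job did not restrict the architecture.
    """
    if not acceptable_archs:
        return list(nodes)
    return [node for node in nodes if node.arch in acceptable_archs]


if __name__ == "__main__":
    cluster = [Node("n1", "x86"), Node("n2", "ppc"), Node("n3", "x86")]
    print([n.name for n in eligible_nodes(cluster, {"x86"})])
```

Further constraints, such as minimum memory or disk space, could be added as extra fields and predicates in the same filter.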
From: <bo...@pr...> - 2005-06-25 23:26:49
|
Koni wrote: > For the meantime, I will continue to use the sourceforge queue- > developers mailing this (this list) to communicate until enough active > interest has been registered through Savannah. At that time I will close > down this list and begin using new lists through Savannah. I suggest going ahead and creating the mailing list on gnu.org so that it is ready go to. You could subscribe it to the sourceforge list, or the reverse, until you decide to switch over. I think this would be good because the mailing list archives would then be all in one place. And I find the archive interface on gnu.org more pleasant to search through than many of the alternatives. > I will create a new CVS tree on Savannah with the new code base shortly. > Before doing so I will need to read some of the GNU maintainer docs and > identify what should be done to conform to GNU standards, especially > things like directory structure which is difficult to change in CVS once > established. Do you have a preview of the structure that you are planning? Something like 'find . -type d -print' output? Bob |
From: <bo...@pr...> - 2005-06-25 23:23:13
|
Koni wrote: > if anybody is strongly opposed to moving to savannah, please speak out > loud now. Savannah is preferred for official GNU projects. For most of the user base there is little difference to choose between them. > this means eventually disbanding this list and creating a new one > through gnu/savannah. There are dozens of subscribers to this list > according to sourceforge, but few respondents. I imagine that due to the low activity on the mailing list the last few years that people have gotten out of the habit of reading it. I know that I have. Bob |
From: Koni <mh...@co...> - 2005-06-24 15:53:50
|
There were no complaints or objections to closing the sourceforge development site and moving to GNU's Savannah site. Since we are a formal GNU project, I am moving the queue development website to Savannah to support GNU development and philosophy. The Savannah folks have now approved a new GNU queue development website on their system. You can find it at https://savannah.gnu.org/projects/gnu-queue/ If you are interested in participating or even just silently monitoring GNU queue's development, please add yourself to the project there. For the meantime, I will continue to use the sourceforge queue- developers mailing this (this list) to communicate until enough active interest has been registered through Savannah. At that time I will close down this list and begin using new lists through Savannah. I will create a new CVS tree on Savannah with the new code base shortly. Before doing so I will need to read some of the GNU maintainer docs and identify what should be done to conform to GNU standards, especially things like directory structure which is difficult to change in CVS once established. |
From: Koni <mh...@co...> - 2005-06-18 01:26:20
|
if anybody is strongly opposed to moving to savannah, please speak out loud now. this means eventually disbanding this list and creating a new one through gnu/savannah. There are dozens of subscribers to this list according to sourceforge, but few respondents. If we move to savannah and start using a new list, it will be up to you to subscribe yourself if you are interested in what develops with GNU queue. K |
From: Koni <mh...@co...> - 2005-06-15 14:38:48
|
> Richard, Werner, and Mark: > > My apologies for the issues I have caused. > > When I first took over Queue, I had ample spare time, being unemployed at > the time. I had expected to make some quick work of the project. > > But, as I got into it, it turned out to be more work than I anticipated, > for reasons I'll get into below. > > Then, of course, I found a job. Between work and raising a family, it > turned out I was never able to find the time to properly fix the problems. > Heck, finding time to properly reply to these messages from Werner and > RMS was just another indication of the problem. > > I could not, as much as Werner requested it, simply do a release with > just my name as the new maintainer. Releasing something that wouldn't > even compile just wasn't right, in my opinion. > > Some work I did get done on Queue is the following: > > Updated to modern autoconf. I think all of that is taken care of. > It should work with any autoconf-2.59+. Excellent. In the past I have had a lot of trouble understanding how to use the autoconf/automake system as a developer, hopefully I can follow the work you have done with autoconf and move it forward into a new code base. > > It does compile again, but all of the terminal allocation code is now absent > (that is, you HAVE to use -n now). > > Two main issues and several minor ones (plans really) still exist: > > 1) As I mentioned, no terminal code. The previous stuff was too outdated > to work on modern systems. I could have just borrowed code from a package > like screen, expect, or script or something. While screen and expect at > GPL, I was actually hoping to get something owned by FSF, and use it. > To that end, a post to one of the GNU lists asking for pointers would > probably be a good start. 
For the new development plans, I think we'll have to start in the same way, without the terminal capabilities and then add these features once job distribution, scheduling, and hopefully some management tools for users (i.e., ability to list, delete, or delay jobs) to control the system are functional.

>
> 2) The more important issue, I think, is that the protocol, as currently
> implemented, is subject to race conditions. I can deadlock in less
> than 3 seconds with nothing more complicated than the `date' command.
> This requires a complete overhaul, which is where I got caught up.

I have observed the deadlock problem in the queue-stable branch and traced its sources to a couple of different race conditions. Some of these can be worked around by inserting delays but guaranteeing correctness is complex. I agree here that a complete overhaul of job distribution is the preferred approach.

>
> 3) Starting with the minor issues, or would be nices, would be a migration
> from SF to Savannah. At one time, when I thought I was going to give
> energy to Queue, I was going to do this migration, then they had the
> security issue in 2003.

I too would like to move away from sourceforge. I need to check out Savannah though. My main complaint with sourceforge is that it's too busy and complicated, with way too much irrelevant stuff on any webpage except for the "home" pages for the project. There are 4 or 5 different ways to post something, which seems unnecessary and confusing. For us as developers, it's difficult to keep track of all the different places a user might post a bug, patch, or comment.

>
> 4) Slowly rewrite all of GQ to enable a definitive set of authorship,
> to enable safely transferring the code to FSF ownership. (This was why
> I didn't want to just pull terminal code from expect, or even screen,
> as neither of those are FSF owned either.)

I'm not sure how I feel about FSF ownership of copyright.
I'm not opposed to it, but I'm not particularly in favor either because I'm not sure what it gains for GNU queue, and it may restrict what I might want to do in the future with my own code. I asked RMS about some details, he simply indicated that it is not required for FSF to have ownership for GNU queue. I guess we can think about it more when we have a new code base out and distributed under GPL with a known author list.

>
> 5) I'd had some grand ideas about rewriting both the config files and
> communication protocols using some sort of XML structure. I'm less
> convinced of that now. But the current set of configuration items are
> too system specific, and every time I see a double go across the wire,
> I wince. I really think that there should be more emphasis on heterogenous
> environments, including configurations shared by multiple architectures.

For the new stuff I've already defined config formats as a CFG with a flex/bison parser, likewise for transfer of job information to execution agents. I don't know XML and am loath to learn it (shame on me) as the syntax of the crap is just so damn ugly. See below about heterogeneous environments.

>
> 6) I'd also thought it's be cool to have some sort of library suitable for
> use with linking into GNU Make for remote processing when using `make -j'.
> I now think this can be accomplished by using SH=/some/wrapper/if/not/qsh
> instead. I may be wrong though.

distcc covers this territory. As for queue, if the cluster is already busy at all nodes, compilation would end up submitting jobs that wait in the queue. I'm not sure if we want to deal with the complexity of "if resources available immediately, distribute job, else run locally", but it would be a cool feature.

>
> In looking over Mark's proposals, some of this may be addressed soon already,
> particularly the protocol race-condition issue.
At one point the question > was raised on whether or not any code from Queue could be reused or not > to implement some of his ideas. My gut reaction is probably not. Ideas, > sure. But probably not any code. Not to implement what he had in mind. > A re-write from scratch would probably be easier than trying to retrofit > some those ideas on top of the current code base. Well, reusing some of > the autotools stuff should work. > > I would like to emphasize heterogeneity again. Once thing that I read in > Mark's proposal was a seeming focus on Linux-kernel based systems. Or at > least homogenous environments. I strongly feel this is a big mistake. > All the world is not a VAX. Let's not continue to relearn this lesson. > By homogeneous systems I mean a dedicated collection of systems with the same architecture and operating system. I think this is the most common setup that potential users of queue will have, based on my own experience and what I see others around here doing. I would like queue to run on any such environment, not specifically linux/x86 setups. The lab down the hall has a homogeneous 100 node G5 cluster with Mac OS X, for instance. They use Sun GRID. Cornell theory center has several windows based clusters, they use their own queuing system. I'm not sure how many new purchases of solaris clusters or other unix systems there will be given that linux is highly competitive price/performance wise for anyone considering making the investment and wants a unix environment. Anyone buying Sun would probably use Sun GRID I suppose. Anyway, I would like queue to be a viable option no matter what the hardware or operating system is. That said, I only have access really to Linux/x86 based clusters here, so I can only develop/test/deploy with that. I hope that volunteers from the community will help test/adapt to other environments. 
As for mixed setups, say a old cluster on one architecture, and a new one on a different architecture, wearing the sys admin hat I would just opt for running two GNU queue installations and keeping them separate. Otherwise, a single queue installation can control both clusters and require the user to specify the architecture/environment required to run the job, but this ultimately passes responsibility to the user to understand that the same program on different systems, even if proper binaries are installed locally on those systems, may not produce output readable on other architectures due to binary formats of integers, floating points, etc. Some groups here develop in Java though and want to farm out those jobs to get better throughput. HPC with Java doesn't make much sense to me (like trying to haul a big load using an army of snails), but that kind of application could farm out to any system. For the near future development, I want to first focus on the simpler problem of homogeneous environments and do this well, build an active user base and development team, then broaden the range of application for GNU queue to include more complicated setups. > Werner, if you've not yet done so, please go ahead and remove me from the > SF project. I don't for see having any time to even participate in a role > as being able to compile+test, much less contribute any code. > > Again, my apologies. It shouldn't have required an external event like > this to kick start the process. > > Good luck, Mark! > > Cheers, mrc |
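
[Editor's sketch] The "if resources available immediately, distribute job, else run locally" idea raised in the reply to point 6 above can be illustrated as a small placement decision. This is only a sketch under assumptions: the function `place_job`, the load figures, and the threshold are hypothetical, and nothing here reflects how GNU Queue or distcc actually decide.

```python
def place_job(node_loads, max_load=1.0):
    """Pick a node for immediate execution, or None to run locally.

    node_loads maps a node name to its current load (hypothetical metric).
    A node counts as immediately available when its load is below max_load.
    """
    available = {name: load for name, load in node_loads.items() if load < max_load}
    if not available:
        return None  # nothing free right now: caller runs the job locally
    # Prefer the least-loaded available node.
    return min(available, key=available.get)


if __name__ == "__main__":
    target = place_job({"n1": 0.2, "n2": 0.9})
    print(target if target is not None else "run locally")
```

The caller never waits in a queue under this policy: a job either starts on a free node at once or falls back to the submitting machine.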
From: Koni <mh...@co...> - 2005-06-15 13:47:10
|
Hi folks,

Message below is from Mike Castle regarding the recent changes for GNU Queue, originally sent to just myself, Werner, and RMS, but Mike has authorized me to forward it to the development list. Please see his comments on future directions for development in the second half of the letter. I will add some comments in a reply following this message.

Cheers,
Koni

-------- Forwarded Message --------
From: Mike Castle <da...@ix...>
Reply-To: Mike Castle <da...@ix...>
To: Richard Stallman <rm...@gn...>
Cc: Mike Castle <da...@ix...>, mh...@co..., wer...@ya...
Subject: Re: Maintaining GNU Queue
Date: Mon, 13 Jun 2005 16:10:17 -0700

Richard, Werner, and Mark:

My apologies for the issues I have caused.

When I first took over Queue, I had ample spare time, being unemployed at the time. I had expected to make some quick work of the project. But, as I got into it, it turned out to be more work than I anticipated, for reasons I'll get into below. Then, of course, I found a job. Between work and raising a family, it turned out I was never able to find the time to properly fix the problems. Heck, finding time to properly reply to these messages from Werner and RMS was just another indication of the problem.

I could not, as much as Werner requested it, simply do a release with just my name as the new maintainer. Releasing something that wouldn't even compile just wasn't right, in my opinion.

Some work I did get done on Queue is the following:

Updated to modern autoconf. I think all of that is taken care of. It should work with any autoconf-2.59+.

It does compile again, but all of the terminal allocation code is now absent (that is, you HAVE to use -n now).

Two main issues and several minor ones (plans really) still exist:

1) As I mentioned, no terminal code. The previous stuff was too outdated to work on modern systems. I could have just borrowed code from a package like screen, expect, or script or something. While screen and expect are GPL, I was actually hoping to get something owned by FSF, and use it. To that end, a post to one of the GNU lists asking for pointers would probably be a good start.

2) The more important issue, I think, is that the protocol, as currently implemented, is subject to race conditions. I can deadlock in less than 3 seconds with nothing more complicated than the `date' command. This requires a complete overhaul, which is where I got caught up.

3) Starting with the minor issues, or would-be-nices: a migration from SF to Savannah. At one time, when I thought I was going to give energy to Queue, I was going to do this migration, then they had the security issue in 2003.

4) Slowly rewrite all of GQ to enable a definitive set of authorship, to enable safely transferring the code to FSF ownership. (This was why I didn't want to just pull terminal code from expect, or even screen, as neither of those are FSF owned either.)

5) I'd had some grand ideas about rewriting both the config files and communication protocols using some sort of XML structure. I'm less convinced of that now. But the current set of configuration items is too system specific, and every time I see a double go across the wire, I wince. I really think that there should be more emphasis on heterogeneous environments, including configurations shared by multiple architectures.

6) I'd also thought it'd be cool to have some sort of library suitable for use with linking into GNU Make for remote processing when using `make -j'. I now think this can be accomplished by using SH=/some/wrapper/if/not/qsh instead. I may be wrong though.

In looking over Mark's proposals, some of this may be addressed soon already, particularly the protocol race-condition issue. At one point the question was raised on whether or not any code from Queue could be reused or not to implement some of his ideas. My gut reaction is probably not. Ideas, sure. But probably not any code. Not to implement what he had in mind. A re-write from scratch would probably be easier than trying to retrofit some of those ideas on top of the current code base. Well, reusing some of the autotools stuff should work.

I would like to emphasize heterogeneity again. One thing that I read in Mark's proposal was a seeming focus on Linux-kernel based systems. Or at least homogeneous environments. I strongly feel this is a big mistake. All the world is not a VAX. Let's not continue to relearn this lesson.

Werner, if you've not yet done so, please go ahead and remove me from the SF project. I don't foresee having any time to even participate in a role as being able to compile+test, much less contribute any code.

Again, my apologies. It shouldn't have required an external event like this to kick start the process.

Good luck, Mark!

Cheers,
mrc |
From: Koni <mh...@co...> - 2005-06-13 22:10:12
|
Hello Everyone,

With the encouragement of Werner Krebs and Richard Stallman, I humbly accept the responsibility for maintaining and leading development of GNU Queue. I have posted to this list a few times already, so hopefully I don't seem like I'm coming out of nowhere.

I will briefly introduce myself: I am presently a graduate student in the field of Genetics and Development at Cornell University. My present research interests are in population genetics and the study of standing diversity in wild plant populations, with potential applications to improvement of domesticated varieties. My undergraduate study was in Computer Science, also at Cornell. As a Computer Science student my interests were primarily in distributed systems and cryptography.

My interest in GNU Queue comes from an increasing need in the age of genomics for computational power. With cheap PCs making it possible for many labs to purchase their own small dedicated cluster systems, many groups, including my lab here at Cornell, need a simple and robust scheduling and distribution system. I identified GNU Queue (from the online documentation) as the best solution for our group's needs, but quickly found it had withered on the vine and became interested in trying to update/fix GQ.

The next few weeks will be very busy for me, but I hope to post more information shortly on what I want to do to get started. Since I began inquiring into what was happening with GNU Queue development, I have begun development of a new code base. I will propose shortly to this list to replace the existing GNU Queue code as given on SourceForge with this system, once I have time to write more about it.

I should also mention that I am somewhat concerned about whether or not I can spare sufficient time to provide consistent development and responsiveness to the community. I will see what I can manage for the next 6 months and resign at that time if I find the project is too much of a burden for me, or if the community feels insufficient progress has been made and another developer is interested in taking the lead.

Comments are welcome; please make sure your reply goes back to the developers list.

Koni |
From: wernerkrebs <wer...@ya...> - 2005-06-13 20:46:17
|
Hi all --- This is the announcement some of you have been expecting. RMS has appointed Cornell graduate student Mark Wright (aka Koni) <mh...@co...> as the new maintainer of GNU Queue. Please welcome him! -- Werner G. Krebs, Ph.D. Technical Specialist Personal homepage: http://www.wernergkrebs.com |
From: wernerkrebs <wer...@ya...> - 2005-05-12 18:07:36
|
--- Koni <mh...@co...> wrote:
> On Wed, 2005-05-11 at 22:01 -0700, wernerkrebs wrote:
> > Programming is partially an artform as much as a science, and it's usually best to try to use the most modern techniques that everyone else is using. This way, you can leverage off what other people in the community are doing, and develop synergy with other projects. GQ was modern for its time, but lots has happened since it came out.
>
> I agree here in principle, but some things that might be called modern techniques are really just trendy to me.
>
> I think we (we as in all interested parties on this list) first need to sort out a couple critical items:
>
> First, do we wish or intend to subscribe firmly to the RMS point of view of free software for this project? If GNU Queue remains "GNU", then I think this is most relevant albeit restrictive. A firm commitment to that would limit our discussions about mysql or high performance commercial database backends and use of Java or C#/Mono.

Up to you. >;->

> Second, I think we should define a limited scope of problems that GQ will purport to solve and do that well. I am confused by the idea of meta-clustering. It seems like it won't be much work on us to have some meta-controller interface with GQ's controller, but why do people do this?

Lack of centralized administrative control of the different clusters, which are spread out at different institutions.

Computational biology (specifically simulation and modelling of protein molecules for rational drug design) in support of the multi-billion dollar pharmaceutical R&D budget is the current "killer app" for "the grid", in part because the ageing population of affluent countries is seen as creating the business model to sell the next generation of supercomputers. (GQ was originally written to help in econometrics problems, but I apparently moved into comp. bio. with the rest of the supercomputer market as I underwent my doctoral studies.)

"The Grid" is the current buzzword that describes what GQ does (aka "clustering", but on a wide-area, multi-institution scale). You can read books on Amazon.com on it; I know some of the authors personally. It's a word that didn't even exist when GQ was first released.

The idea is that multiple institutions in different countries across the globe would loan their clusters to form a meta-cluster, connected by a dedicated TeraBit backbone network currently being built. Time on this meta cluster might then be rented to pharmaceutical companies. There is very serious government money in this area. (It's also an area that your institution, Cornell, has been heavily involved in, and it has its own dedicated Linux clusters. The person at Cornell is Prof. Harold Scheraga, whom I've had the privilege of meeting.)

Your analysis may be dead on: it might not be the thing for GQ, which maybe should focus on small clusters and easy installation. Still, it's something to keep in mind.

If you want to write a GQ manual someday, it will sell a lot more copies if you advertise GQ as facilitating "The Grid" or "The TeraGrid." Similarly, the GQ website would pick up a lot more traffic if we added "Grid Applications" or "TeraGrid Applications" somewhere on the webpage. "Clustering" is the old-school term, I'm afraid. >:->

> I can think of a few reasons, but are the needs of these applications fitting within the problem space we want to address?

Don't know; your call.

> I see a growing market of small dedicated homogeneous systems with simple distribution needs, driven by cheap computer prices. In time up to present or perhaps into the future some more, interest in heterogeneous and non-dedicated systems I think is coming from interest in tapping the resource of idle desktop CPUs. With that comes substantial complexity that other projects like condor have embraced and seem to be doing well.
>
> I don't think it's beneficial to GQ to try to tackle some of the same problems if condor already fills that niche. I also think many groups with new interest in this area will have small dedicated systems and be looking for something simple and easy. Thus, this seems like an expanding niche that would be well filled by something that takes 2 minutes to set up and start using. GQ can do this, especially if it is stand-alone with no backend database and no library dependencies.
>
> I think the ease of setup and use will make a big difference in the adoption of GQ as a solution in the situations where it is applicable.

Sounds good. >:->

> -------------------------------------------------------
> This SF.Net email is sponsored by Oracle Space Sweepstakes
> Want to be the first software developer in space?
> Enter now for the Oracle Space Sweepstakes!
> http://ads.osdn.com/?ad_id=7393&alloc_id=16281&op=click
> _______________________________________________
> Queue-developers mailing list
> Que...@li...
> To unsubscribe, subscribe, or set options:
> https://lists.sourceforge.net/lists/listinfo/queue-developers
|
From: Koni <mh...@co...> - 2005-05-12 13:14:56
|
On Wed, 2005-05-11 at 22:01 -0700, wernerkrebs wrote:
> Programming is partially an artform as much as a science, and it's usually best to try to use the most modern techniques that everyone else is using. This way, you can leverage off what other people in the community are doing, and develop synergy with other projects. GQ was modern for its time, but lots has happened since it came out.

I agree here in principle, but some things that might be called modern techniques are really just trendy to me.

I think we (we as in all interested parties on this list) first need to sort out a couple critical items:

First, do we wish or intend to subscribe firmly to the RMS point of view of free software for this project? If GNU Queue remains "GNU", then I think this is most relevant albeit restrictive. A firm commitment to that would limit our discussions about mysql or high performance commercial database backends and use of Java or C#/Mono.

Second, I think we should define a limited scope of problems that GQ will purport to solve and do that well. I am confused by the idea of meta-clustering. It seems like it won't be much work on us to have some meta-controller interface with GQ's controller, but why do people do this? I can think of a few reasons, but are the needs of these applications fitting within the problem space we want to address?

I see a growing market of small dedicated homogeneous systems with simple distribution needs, driven by cheap computer prices. In time up to present or perhaps into the future some more, interest in heterogeneous and non-dedicated systems I think is coming from interest in tapping the resource of idle desktop CPUs. With that comes substantial complexity that other projects like condor have embraced and seem to be doing well.

I don't think it's beneficial to GQ to try to tackle some of the same problems if condor already fills that niche. I also think many groups with new interest in this area will have small dedicated systems and be looking for something simple and easy. Thus, this seems like an expanding niche that would be well filled by something that takes 2 minutes to set up and start using. GQ can do this, especially if it is stand-alone with no backend database and no library dependencies.

I think the ease of setup and use will make a big difference in the adoption of GQ as a solution in the situations where it is applicable. |
From: wernerkrebs <wer...@ya...> - 2005-05-12 05:01:44
|
Programming is partially an artform as much as a science, and it's usually best to try to use the most modern techniques that everyone else is using. This way, you can leverage off what other people in the community are doing, and develop synergy with other projects. GQ was modern for its time, but lots has happened since it came out.

1. In certain modern environments (Java or C#/Mono), using SOAP/XML actually simplifies things. The complexity gets shifted to the libraries. Moreover, it adds the ability to then become a "standard" that works together with other environments --- opening the possibility of easily making GQ work in non-homogeneous environments, something that was much requested by the users.

Check out Ganglia, another open source project, originally developed at UCB.

http://ganglia.sourceforge.net/

This is becoming one of the standards for determining load averages on remote machines. Meta-clustering systems, such as APST, recognize this as one of the protocols for querying load averages and other information.

So, it would be good if GQ supported a standard monitoring protocol like ganglia. You'd have to run ganglia somewhere anyway if you wanted to use a meta system like apst, so GQ could feed off that, rather than, or in addition to, its own load monitoring code. Or, GQ could continue to use its own monitoring code, but also support the ganglia protocol (it's open source, after all), so there wouldn't be a need to run two load monitoring daemons.

Ganglia exchanges information using XML, of course.

2. Regarding your concerns about SQL scalability: a lot of work has gone into making SQL environments highly scalable --- it's a huge issue for corporations and anyone trying to run a high-traffic, mission-critical website (been there). Remember the old commercials for a certain computer services firm about the startup that didn't consult with them, and therefore its website wasn't scalable?

mysql (which RMS wouldn't want us to mention here because it is semi-commercial --- he wants us to say 'postgresql' because that's completely free), may not be one of them, but the two major commercial SQL databases are designed to be highly scalable in a cluster environment (this is why one pays big bucks in support and fees for them instead of mysql).

The other solution, commonly used in the website world (J2EE and .NET/Mono), involves so-called 3-tiered architectures. Basically, the idea is to have another set of threads, which can potentially run on another CPU or another machine, handle all the actual communication with the client (web browser) and cache some of the SQL queries (pre-compilation), and, in some cases, even cache the results when possible. This takes a significant load off the back-end SQL database, which can now handle thousands more clients.

This approach would work with a GQ manager by creating an intermediary gm daemon that would reply to a large number of clients by caching and periodically refreshing the results of a small number of SQL queries.

SQL databases are at the leading edge of scalability technology (although often in the commercial rather than open source worlds) and have other benefits (standardization, so other clients can interact, and existing SQL database management tools can be used).

Still, I'll let you decide how you think it's best to do this.

--- Koni <mh...@co...> wrote:
> On Tue, 2005-05-10 at 08:56 -0700, wernerkrebs wrote:
> >
> > Two comments.
> >
> > 1. Regarding the protocol, GQ's protocols largely predated modern RPC standards, such as SOAP and XML.
>
> I'm not sure any of these things are worth their weight in a homogeneous system. The communication between the GQ system as I have envisioned it is pretty lightweight and there is very little structure to the information.
> In this case, I think using XML or SOAP for a communication layer adds complexity (in my mind) which is contrary to their purpose in general.
>
> [snip]
>
> > I would think some of the current features of the GQ TCP/IP protocol would be best done using some sort of SOAP implementation. For example, aspects of the initial authentication, and querying load information would be best done using SOAP.
>
> I don't think SOAP will do much for us regarding authentication. The authentication stuff here is really simple (to me). Perhaps for load information if a lot of detail is returned (like all the information ps would return say). As for authentication, it's already implemented as a simple challenge handshake (initial authentication):
>
>   qd                                      qm
>
>   auth/register request
>   (send nonce)          -------->
>                                           sign nonce with system key,
>                         <--------         reply with our own nonce
>   verify response       -------->
>   sign qm nonce
>                         <--------         verify response,
>                                           send session key
>
> If either verification fails, the offended party stops the protocol. Receipt of the session key indicates to qd that the challenge handshake protocol completed successfully. After that, all communication between the qd and qm come with simple signatures using that key. The complexity of the generation of signatures and verification of them is already more or less isolated from the logic of handling the message payload.
>
> > Also, since GQ was written, standard protocols for this type of thing have emerged. Look at Apst/Apstd system at SDSC (where, ironically, I used to work, although not on that project):
> >
> > http://grail.sdsc.edu/projects/apst/
> >
> > Apst is a meta daemon for cluster daemons. It doesn't currently support starting jobs using GQ, but does support starting other (commercial) systems. GQ support would be fairly trivial for them to add, if they wanted to. SDSC (part of UCSD) receives grant money from a firm that makes a GQ-like commercial product, so it's not clear if that's a direction they want to go in. They do support the commercial product. However, the source code is available, so the community is free to add support for GQ as well.
> >
> > Apst will query each cluster manager (this would be similar to the qm program you are proposing) and obtain load information via an XML file returned from the cluster manager. It will then decide how many jobs to start on that particular cluster (which it will start using a crude ssh command-line protocol to submit the jobs and scp to first transfer the relevant files into place). It's up to the cluster manager to then distribute the jobs to the cluster nodes.
> >
> > Apst, which is C/C++ based (Apstd is available in Java) is similar to Nimrod, which is Java-based. Source code for all of these is available.
>
> This sounds interesting. Whether GQ becomes my new proposed implementation, remains as is, or becomes something else altogether, contributing a "driver" (so to speak) so that this meta system can work with it would be cool and perhaps broaden the market for us.
>
> > 2. Regarding qm, a division of Texas Instruments actually contributed a SQL-based qm in C++. (It would require that an SQL database, preferably Open Source and free such as Postgresql, be running on a server).
>
> Cool. I was first thinking about job information being managed by a mysql (or postgres) backend, where the SQL engine would handle things like atomicity and persistent state information across failure. Would have been cake if I had written qm in Perl (I am very familiar with Perl-DBI). The only thing I don't like about this is the potential high latency -- one (or more) threads insert to the job table (qs) while some other thread polls (qm) the table for new rows. Perhaps in postgres there is a way to install a trigger or something so polling is unnecessary. I don't think there is a way to do that in mysql. qm is actually unnecessary if qd's can talk to the SQL engine directly. SQL can handle authentication and atomicity and qd's can just compete for jobs. That's kind of nice. Not sure it will scale well though. 1000 qd's each with a persistent TCP connection to mysql would create 1000 forked processes at the database server.
>
> > This is part of the GQ distribution, but is optional and not compiled by default (due to C++ autoconf problems at the time since resolved. Also, users wrote to me explaining their preference for a small, simple package with peer-to-peer behavior, rather than a centralized package with a manager that might crash, so the original behavior of GQ remained the default.)
> >
> > Before writing a manager from scratch, you might want to look at the manager code and documentation that TI's subsidiary contributed.
>
> OK, I'll try to have a look. The manager is almost already all written though in my haste to flesh out ideas rolling around in my head. I shall post a tarball of the code shortly. I want to add at least a rudimentary support for actually submitting a job to the system and having it execute. While I'm doing that, we can get a better feel for who is out there reading this list and what interest there is.
>
> Thanks for your comments Werner, I appreciate your insights greatly.
>
> Cheers,
> Koni
|
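The intermediary described in Werner's message, one that answers many clients from cached results and only periodically refreshes a small number of SQL queries, can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `CachedQuery`, its parameters, and the stand-in query function are hypothetical, and no real database or GQ code is involved.

```python
import time


class CachedQuery:
    """Serve a cached query result, re-running the query only when it is stale."""

    def __init__(self, run_query, max_age_seconds, clock=time.monotonic):
        self._run_query = run_query      # callable that performs the expensive query
        self._max_age = max_age_seconds  # how long a cached result may be served
        self._clock = clock              # injectable clock, handy for testing
        self._value = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        stale = self._fetched_at is None or now - self._fetched_at >= self._max_age
        if stale:
            # Only this path touches the back-end database.
            self._value = self._run_query()
            self._fetched_at = now
        return self._value


if __name__ == "__main__":
    # Stand-in for an SQL query returning per-host load averages (made-up data).
    load_table = CachedQuery(lambda: {"node1": 0.5, "node2": 1.25}, max_age_seconds=5.0)
    for _ in range(3):
        print(load_table.get())  # the stand-in query runs once; later calls hit the cache
```

With this shape, the number of queries reaching the database depends on the refresh interval rather than on the number of clients asking, which is the load-shedding effect the 3-tier description is after.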
From: <bo...@pr...> - 2005-05-11 05:21:39
|
Koni wrote:
> I envision 4 separate programs working together in this system:
>
> qs: Users use this program (like "queue" or "qsh" in GNU Queue) to submit jobs [ Presently not implemented at all ]

It would be bonus points to make this interoperable with the POSIX standard. As far as I can tell this is simply PBS.

http://www.opengroup.org/onlinepubs/009695399/utilities/qsub.html

Most queue software is completely unique and people don't really expect POSIX conformance at this point in history. It just does not seem to have the grip in the industry that conformance to other parts of the POSIX spec have. So I would not say this is really important at this time. But at least being aware of it would be good.

> Some design goals/choices:
>
> NFS is not used for communication and distribution of the jobs. This was a primary goal in the design for me. After getting into it, I have new appreciation for the design of GNU Queue though. :)

I looked into GNU queue way back in the beginning. But the integrated NFS as an integral part of the design made it unsuitable for my use with several thousand compute servers. I have been a lurker on the list ever since because GNU queue had some nice features.

It would be nice if a project goal were to have code portable to a wide range of platforms. I would say GNU/Linux, HP-UX, Solaris, AIX, SGI, Mac OS X, as a start. What this really means is avoiding a lot of heavy dependencies from weird libraries. Trying to build a large project with fifty library dependencies in a mix of C++ with heavy STL and C# and Java from scratch on AIX is not an easy task.

> executed as fast as possible. the TIME_WAIT state of a closed TCP connection hogs the system resources on the qm host, potentially

I have actually seen this often on my network just doing normal TCP activity. I believe that flaky network hardware can raise the likelihood of this problem drastically. I don't have a solution other than waiting for the TIME_WAIT timeout to clear the network stack. If you are opening and closing a lot of network connections very rapidly I think this could be a problem on bigger networks. But if this is only one connection per job then on my network that would not be a problem as that is well within tolerable limits.

Bob |
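One general way to avoid opening and closing many connections in quick succession, and so to limit the number of sockets sitting in TIME_WAIT, is to keep one connection open and send several messages over it. The sketch below shows that idea with simple length-prefixed framing. It is an assumption-laden illustration, not how qd and qm actually talk: the function names and the 4-byte length prefix are hypothetical, and `socket.socketpair()` stands in for a real TCP connection purely to keep the example self-contained.

```python
import socket
import struct


def send_message(sock, payload):
    """Send one message, prefixed with its length as a 4-byte big-endian integer."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock, count):
    """Read exactly `count` bytes, since recv() may return fewer than requested."""
    chunks = []
    while count > 0:
        chunk = sock.recv(count)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


def recv_message(sock):
    """Receive one length-prefixed message sent by send_message."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


if __name__ == "__main__":
    # A connected socket pair stands in for a long-lived qd <-> qm connection.
    a, b = socket.socketpair()
    with a, b:
        for text in (b"job 1 done", b"job 2 done"):
            send_message(a, text)  # several messages, one connection
        print(recv_message(b), recv_message(b))
```

Whether a persistent connection per host is acceptable depends on how many hosts the manager has to serve at once, so this trades TIME_WAIT pressure for a larger number of open sockets.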
From: Koni <mh...@co...> - 2005-05-11 00:05:38
|
On Tue, 2005-05-10 at 08:56 -0700, wernerkrebs wrote:
>
> Two comments.
>
> 1. Regarding the protocol, GQ's protocols largely predated modern RPC standards, such as SOAP and XML.

I'm not sure any of these things are worth their weight in a homogeneous system. The communication between the GQ system as I have envisioned it is pretty lightweight and there is very little structure to the information. In this case, I think using XML or SOAP for a communication layer adds complexity (in my mind) which is contrary to their purpose in general.

[snip]

> I would think some of the current features of the GQ TCP/IP protocol would be best done using some sort of SOAP implementation. For example, aspects of the initial authentication, and querying load information would be best done using SOAP.

I don't think SOAP will do much for us regarding authentication. The authentication stuff here is really simple (to me). Perhaps for load information if a lot of detail is returned (like all the information ps would return say). As for authentication, it's already implemented as a simple challenge handshake (initial authentication):

  qd                                      qm

  auth/register request
  (send nonce)          -------->
                                          sign nonce with system key,
                        <--------         reply with our own nonce
  verify response       -------->
  sign qm nonce
                        <--------         verify response,
                                          send session key

If either verification fails, the offended party stops the protocol. Receipt of the session key indicates to qd that the challenge handshake protocol completed successfully. After that, all communication between the qd and qm come with simple signatures using that key. The complexity of the generation of signatures and verification of them is already more or less isolated from the logic of handling the message payload.

> Also, since GQ was written, standard protocols for this type of thing have emerged. Look at Apst/Apstd system at SDSC (where, ironically, I used to work, although not on that project):
>
> http://grail.sdsc.edu/projects/apst/
>
> Apst is a meta daemon for cluster daemons. It doesn't currently support starting jobs using GQ, but does support starting other (commercial) systems. GQ support would be fairly trivial for them to add, if they wanted to. SDSC (part of UCSD) receives grant money from a firm that makes a GQ-like commercial product, so it's not clear if that's a direction they want to go in. They do support the commercial product. However, the source code is available, so the community is free to add support for GQ as well.
>
> Apst will query each cluster manager (this would be similar to the qm program you are proposing) and obtain load information via an XML file returned from the cluster manager. It will then decide how many jobs to start on that particular cluster (which it will start using a crude ssh command-line protocol to submit the jobs and scp to first transfer the relevant files into place). It's up to the cluster manager to then distribute the jobs to the cluster nodes.
>
> Apst, which is C/C++ based (Apstd is available in Java) is similar to Nimrod, which is Java-based. Source code for all of these is available.

This sounds interesting. Whether GQ becomes my new proposed implementation, remains as is, or becomes something else altogether, contributing a "driver" (so to speak) so that this meta system can work with it would be cool and perhaps broaden the market for us.

> 2. Regarding qm, a division of Texas Instruments actually contributed a SQL-based qm in C++. (It would require that an SQL database, preferably Open Source and free such as Postgresql, be running on a server).

Cool. I was first thinking about job information being managed by a mysql (or postgres) backend, where the SQL engine would handle things like atomicity and persistent state information across failure. Would have been cake if I had written qm in Perl (I am very familiar with Perl-DBI). The only thing I don't like about this is the potential high latency -- one (or more) threads insert to the job table (qs) while some other thread polls (qm) the table for new rows. Perhaps in postgres there is a way to install a trigger or something so polling is unnecessary. I don't think there is a way to do that in mysql. qm is actually unnecessary if qd's can talk to the SQL engine directly. SQL can handle authentication and atomicity and qd's can just compete for jobs. That's kind of nice. Not sure it will scale well though. 1000 qd's each with a persistent TCP connection to mysql would create 1000 forked processes at the database server.

> This is part of the GQ distribution, but is optional and not compiled by default (due to C++ autoconf problems at the time since resolved. Also, users wrote to me explaining their preference for a small, simple package with peer-to-peer behavior, rather than a centralized package with a manager that might crash, so the original behavior of GQ remained the default.)
>
> Before writing a manager from scratch, you might want to look at the manager code and documentation that TI's subsidiary contributed.

OK, I'll try to have a look. The manager is almost already all written though in my haste to flesh out ideas rolling around in my head. I shall post a tarball of the code shortly. I want to add at least a rudimentary support for actually submitting a job to the system and having it execute. While I'm doing that, we can get a better feel for who is out there reading this list and what interest there is.

Thanks for your comments Werner, I appreciate your insights greatly.

Cheers,
Koni |
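The challenge handshake described in the message above can be sketched in a few lines. This is not the actual qd/qm code, which is not shown on the list: it assumes that "signing" means an HMAC-SHA256 under a shared system key, runs both sides in one process for brevity, and uses hypothetical function names. How the session key is protected in transit is not specified in the message and is left open here.

```python
import hashlib
import hmac
import secrets


def sign(key, message):
    # Assumption: a "signature" is modelled as HMAC-SHA256 under a shared key.
    return hmac.new(key, message, hashlib.sha256).digest()


def handshake(qd_key, qm_key):
    """Run the four-step exchange in-process; return a session key, or None on failure."""
    # Step 1: qd sends a fresh nonce with its auth/register request.
    qd_nonce = secrets.token_bytes(16)

    # Step 2: qm signs qd's nonce with the system key and replies with its own nonce.
    qm_signature = sign(qm_key, qd_nonce)
    qm_nonce = secrets.token_bytes(16)

    # Step 3: qd verifies qm's response; on failure it stops the protocol.
    if not hmac.compare_digest(qm_signature, sign(qd_key, qd_nonce)):
        return None
    qd_signature = sign(qd_key, qm_nonce)

    # Step 4: qm verifies qd's response; on failure it stops the protocol.
    if not hmac.compare_digest(qd_signature, sign(qm_key, qm_nonce)):
        return None

    # qm issues a session key; later messages would carry sign(session_key, payload).
    return secrets.token_bytes(32)


if __name__ == "__main__":
    system_key = secrets.token_bytes(32)
    print(handshake(system_key, system_key) is not None)          # matching keys
    print(handshake(system_key, secrets.token_bytes(32)) is None)  # mismatched keys
```

Because each side signs a nonce chosen by the other, a recorded exchange cannot simply be replayed, and each side learns that the other holds the system key without the key itself being sent.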
From: wernerkrebs <wer...@ya...> - 2005-05-10 15:56:20
|
Two comments.

1. Regarding the protocol, GQ's protocols largely predated modern RPC standards, such as SOAP and XML. These are most suited for easy implementation in Java or C#/Mono, where there are excellent development environments (e.g., Eclipse, JBuilder, Visual Studio .NET, etc.) that will almost write the code for you automagically. Of these, Eclipse and Mono are Open Source, so suitable for consideration here. SOAP does fully support C, C++, and even Perl, however, although development is slightly more difficult IMHO.

I would think some of the current features of the GQ TCP/IP protocol would be best done using some sort of SOAP implementation. For example, aspects of the initial authentication, and querying load information would be best done using SOAP.

Also, since GQ was written, standard protocols for this type of thing have emerged. Look at Apst/Apstd system at SDSC (where, ironically, I used to work, although not on that project):

http://grail.sdsc.edu/projects/apst/

Apst is a meta demon for cluster demons. It doesn't currently support starting jobs using GQ, but does support starting other (commercial) systems. GQ support would be fairly trivial for them to add, if they wanted to. SDSC (part of UCSD) receives grant money from a firm that makes a GQ-like commercial product, so it's not clear if that's a direction they want to go in. They do support the commercial product. However, the source code is available, so the community is free to add support for GQ as well.

Apst will query each cluster manager (this would be similar to the qm program you are proposing) and obtain load information via an XML file returned from the cluster manager. It will then decide how many jobs to start on that particular cluster (which it will start using a crude ssh command-line protocol to submit the jobs and scp to first transfer the relevant files into place). It's up to the cluster manager to then distribute the jobs to the cluster nodes.
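To make the "XML load report" idea above concrete, here is a rough sketch of how a meta scheduler might turn such a report into a job count. Everything about the report format is an assumption: the `cluster` and `node` elements and the `cpus` and `running` attributes are invented for illustration and are not Apst's actual schema, and `free_slots` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def free_slots(report_xml: str) -> int:
    """Count idle CPU slots in a hypothetical cluster load report."""
    root = ET.fromstring(report_xml)
    total = 0
    for node in root.iter("node"):
        cpus = int(node.get("cpus", "0"))
        busy = int(node.get("running", "0"))
        # An overloaded node contributes zero slots, never a negative number.
        total += max(cpus - busy, 0)
    return total
```

A caller would fetch the report from the cluster manager, call `free_slots`, and submit at most that many jobs to that cluster, leaving per-node placement to the cluster manager as described above.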
Apst, which is C/C++ based (Apstd is available in Java), is similar to Nimrod, which is Java-based. Source code for all of these is available.

2. Regarding qm, a division of Texas Instruments actually contributed a SQL-based qm in C++. (It would require that an SQL database, preferably Open Source and free such as Postgresql, be running on a server). This is part of the GQ distribution, but is optional and not compiled by default (due to C++ autoconf problems at the time, since resolved. Also, users wrote to me explaining their preference for a small, simple package with peer-to-peer behavior, rather than a centralized package with a manager that might crash, so the original behavior of GQ remained the default.)

Before writing a manager from scratch, you might want to look at the manager code and documentation that TI's subsidiary contributed.

--
Werner G. Krebs, Ph.D.
Technical Specialist
Personal website: http://www.wernergkrebs.com

--- Koni <mh...@co...> wrote:
> As promised before, here is some more details about the new system I mentioned before.
> [...]
|
From: Jordi B. C. <jbu...@oc...> - 2005-05-10 08:36:33
|
Hi Koni, all,

On Monday 09 May 2005 23:47, Koni wrote:
> Comments are welcome. If you want to peek at the code, reply back to
> this list. If there is interest and no objections, I will post a copy of
> the source as is to the list. It doesn't do much for the moment except
> implement the qm -> qd -> qe chain of events and demonstrate the
> distribution of jobs.

I would be very interested in having a look at the code. I tried to use GNU Queue a few weeks ago but found the same problems you did. Now I am considering Sun ONE Grid Engine as a batch system for our clusters, but would like to experiment with your new code.

Cheers,
jordi |
From: Koni <mh...@co...> - 2005-05-09 21:45:35
|
As promised before, here are some more details about the new system I mentioned before.

Werner: This is more or less identical to what I sent you previously.

I envision 4 separate programs working together in this system:

qs: Users use this program (like "queue" or "qsh" in GNU Queue) to submit jobs. [ Presently not implemented at all ]

qm: The queue manager running on some central host. qs sends job requests to qm. [ ~60% implemented ]

qd: Daemon running on slave or "compute" nodes, possibly on the same host as qm as well. More than one qd may run on any host, and there may be any number of them on any number of hosts. Only qm talks to qd's, sending jobs as available. The distribution protocol works as an offer/volunteer system: qm sends offers to multiple qd's at once for the same job, and willing qd's respond with a volunteer. qm assigns the job to exactly one qd. qd may refuse at this point too (which resets the job to the offer stage), or commit and receive transfer of the job and begin execution. The important point is that qd's decide autonomously if they can spare resources for the job. qm has some state information on the availability of the qd's it knows about and does not send offers to qd's it knows are fully committed, but qm does not need an accurate perception; it's the qd's decision. [ ~70% implemented ]

qe: Execution agent forked and exec'd by the qd process for running a job. qe is responsible for setting the environment, calling back to the waiting qs if foreground mode is selected (called interactive mode in GNU Queue I think), validating and changing to the user of the job, monitoring for the termination of the job and its return code, etc. qe is the only part of this system that needs to be setuid root. qd and qm may need to start as root to read the system-wide key file (see below) but can drop privilege permanently after that. [ presently only a trivial program which just returns immediately is implemented ]

Some design goals/choices:

NFS is not used for communication and distribution of the jobs. This was a primary goal in the design for me. After getting into it, I have new appreciation for the design of GNU Queue though. :)

Stateless UDP is used for communication between qm and qd, which results in some complexity in the code due to the possibility of lost messages. This is a design goal, as persistent TCP connections consume file descriptors, limiting the number of qd's that can be connected to qm. I would like this to scale well beyond typical limits for open file descriptors.

All messages between qd and qm are cryptographically signed [ this is already fully implemented ] using keyed SHA-1. On connection, a registration protocol verifies the authenticity of both qm and qd by proving knowledge of a system-wide key. After registration, each qd is assigned a session key used to sign messages after that. qs will communicate username/password information (encrypted) to qm, which will ultimately be passed from qd to qe, which will authenticate before switching to the user requested.

Much effort is being put into low-latency distribution of jobs. Experimenting with the version of GNU Queue I have, after making several changes to get it to go and all, it takes a second or more between submission of a job and onset of execution in an idle cluster. Much of this I think is due to built-in deliberate delays to work around NFS race conditions, hence my interest in eliminating NFS as a communication layer between submitting users and execution agents. My present system is seemingly instantaneous on an idle cluster (but much is not implemented yet). My goal is to have latency for executing, say, 1000 no-op jobs on a system with a single qd agent comparable to that of a shell script doing the same directly.
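As an illustration of the signed qm/qd messages described above, here is a minimal sketch. It assumes that "keyed SHA-1" can be approximated by HMAC-SHA1 from Python's standard library; the message does not say which keyed construction the real code uses, and the datagram layout (payload followed by a 20-byte tag) and the names `sign` and `verify` are hypothetical.

```python
import hashlib
import hmac

TAG_LEN = hashlib.sha1().digest_size  # 20 bytes for SHA-1


def sign(key: bytes, payload: bytes) -> bytes:
    """Return payload with an HMAC-SHA1 tag appended (assumed layout)."""
    tag = hmac.new(key, payload, hashlib.sha1).digest()
    return payload + tag


def verify(key: bytes, datagram: bytes):
    """Return the payload if the tag matches, otherwise None."""
    if len(datagram) < TAG_LEN:
        return None  # too short to contain a tag
    payload, tag = datagram[:-TAG_LEN], datagram[-TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha1).digest()
    # compare_digest avoids leaking the mismatch position through timing.
    return payload if hmac.compare_digest(tag, expected) else None
```

In the scheme described above, the system-wide key would be used during registration and the per-qd session key afterwards. The sketch does not address replayed datagrams, which a UDP protocol would also have to handle, for example with sequence numbers.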
Some drawbacks:

Security rests ultimately with the privacy of the system-wide key file, which must be installed or accessible to both qm and all qd agents.

All systems running qd must have access to the same authentication system for validating username/password for submitting users. NIS or something equivalent is probably the easiest both for me as developer and for administrators at large who might use this thing. We can potentially use a custom arrangement through PAM too.

NFS or another shared network filesystem is still required for user jobs to read/write input and output, unless they only want to use stdin/stdout, in which case qs can handle it. I don't consider this a problem really for dedicated systems.

Job transfer takes place over a transient TCP connection, but I've noticed this can cause a hiccup (qm pauses for several seconds but eventually resumes rapid distribution of jobs) if the TCP SYN packet is lost, which seems to happen after about 30,000 jobs have been sent and executed as fast as possible. The TIME_WAIT state of a closed TCP connection hogs the system resources on the qm host, potentially blocking the opening of new connections until resources are available. This is only a problem in the pathological case of >30,000 no-op jobs at once, surely not a real-world problem. Presently the system will pause if the SYN packet is dropped when forming a new connection, and will wait until both enough old TIME_WAIT TCP connections are cleared and the SYN retransmit timer expires, at which point the connection is established and distribution commences again.

This system has a central manager, qm, which the present GNU Queue does not. Failure at qm will cause the whole cluster to stop executing jobs after their present assignments. This does not happen with GNU Queue, unless the NFS server goes down. However, when NFS comes back, provided there is no corruption to the filesystem, everything continues. My system will need some crash-recovery complexity for qm. qd's can die and come back all they like.

Comments are welcome. If you want to peek at the code, reply back to this list. If there is interest and no objections, I will post a copy of the source as is to the list. It doesn't do much for the moment except implement the qm -> qd -> qe chain of events and demonstrate the distribution of jobs.

Cheers,
Koni

On Sun, 2005-05-01 at 19:38 -0400, Richard Stallman wrote:
>     Anyhow, I suggested in my email to Koni and Mike that
>     we wait a week or two for Mike to respond.
>
> I think that is a reasonable plan. The program needs a maintainer who
> will make releases, and more generally, who will give the program
> proper attention.
>
>     At some
>     point, we'd post publicly, and then wait about 30
>     days or some reasonable time.
>
> I don't understand that part. Wait 30 days for what?
>
>     What's the standard procedure for reclaiming an
>     abandoned GNU project?
>
> I can appoint (and remove) maintainers at any time.
> So once the situation is clear, I can simply appoint
> a new maintainer for GNU Queue.
|
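For readers following the design in the message above, the offer/volunteer exchange between qm and qd can be sketched as a small state machine on the qm side. This is a rough sketch only: the class, method, and state names are hypothetical and are not taken from the actual code, and networking, timeouts, and lost-message handling are left out entirely.

```python
from enum import Enum


class JobState(Enum):
    OFFERED = "offered"    # offers are out to several qd's
    ASSIGNED = "assigned"  # exactly one volunteer was picked
    RUNNING = "running"    # that qd committed and took the job


class Job:
    def __init__(self, job_id):
        self.job_id = job_id
        self.state = JobState.OFFERED
        self.owner = None


class Manager:
    """Hypothetical qm-side bookkeeping for offered jobs."""

    def __init__(self):
        self.jobs = {}

    def offer(self, job_id):
        self.jobs[job_id] = Job(job_id)

    def volunteer(self, job_id, qd):
        """First volunteer wins; later volunteers are turned down."""
        job = self.jobs[job_id]
        if job.state is not JobState.OFFERED:
            return False
        job.state, job.owner = JobState.ASSIGNED, qd
        return True

    def refuse(self, job_id, qd):
        """The assigned qd backs out, so the job returns to the offer stage."""
        job = self.jobs[job_id]
        if job.state is JobState.ASSIGNED and job.owner == qd:
            job.state, job.owner = JobState.OFFERED, None

    def commit(self, job_id, qd):
        """The assigned qd accepts; only the current owner may commit."""
        job = self.jobs[job_id]
        if job.state is JobState.ASSIGNED and job.owner == qd:
            job.state = JobState.RUNNING
            return True
        return False
```

The sketch keeps the decision with the qd, as the message describes: qm only records who volunteered first, and a qd can still refuse after being assigned.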
From: Koni <mh...@co...> - 2005-05-09 17:37:23
|
Hello all,

My name is Koni, and I am interested to know what the plans and outlook are for the future development of the GNU queue package. I've recently attempted to contact the current maintainer Mike Castle through his sourceforge contact address, but there has been no response for over two weeks now. The present release available through sourceforge does not compile on recent Linux systems, as I imagine many subscribers to this list already know.

I have resolved several compile and run-time issues with the queue-stable branch on sourceforge and succeeded in making GNU queue run on my small Linux cluster here in my lab at Cornell. Please see the patch posted by sourceforge user "cryptopup" (me) in the patches section for more information.

While queue runs on my systems here, it still has problems that make it unsuitable for general use in my lab. Since the sourceforge site indicates the project has been essentially dormant for several years, I decided to contact Dr. Krebs and Mr. Castle to figure out if it was worth more of my time to work on fixing GNU queue to use for my needs here, or whether I should look into something else or perhaps even do my own thing.

In the meantime, while I waited for a response from Mike, I read through more of the GNU queue code. I have since decided to begin a de novo development of my own system, inspired largely by what I like about how GNU queue is intended to be used. Experimenting with ideas, I am now nearly 3000 lines of code into this and I'm reasonably confident the direction I am going will result in a robust and scalable system. I will post a description of the design to this list shortly for your reference.

Dr. Krebs, as well as Richard Stallman, have expressed their interest in having me take over the development of GNU queue. As I am nearly ready to start my own open-source project with the code base I have developed, I am now writing to the broad community via this mailing list to raise this issue.

This is a community decision. I have no interest in promoting myself here. Instead I am offering what I am doing to this already established community that seems to be lacking active development at the moment. The benefit for me is the immediate gain of a capable group of developers and a user base to help take what is presently a toy on my laptop to a respectable and broadly usable system that achieves the goals of GNU queue.

I will be perfectly plain and upfront and say that if I am assigned the responsibility for maintaining this project, I will first replace the existing code base with my own. As developers, you should consider that seriously. This will be a completely disruptive change. My intended path of development is to complete the partially implemented system I have now, and add features that are present in GNU queue (like tty emulation) to my system by cribbing some of the existing code.

Again, I do not wish to preemptively take over someone else's project and circumvent a presently unknown (to me) development path. Instead, I am offering to pick up a dropped ball here if the community is interested. If it is decided that the community would prefer the present GNU queue system stay as is, I will simply continue my own project as a completely separate system with a different name.

Your comments are needed to decide what to do here. All feedback is welcome, including from anyone who wants to tell me to just nose out and do my own thing somewhere else.

Cheers,
Koni |
From: wernerkrebs <wer...@ya...> - 2005-04-26 22:22:14
|
Hi, thanks for your comments, but please direct replies to the list (que...@li...) so there can be a community discussion. (Sending replies to the list will also save me from any interpretations of the net-etiquette of responding publicly to someone's private GQ comments. ;-) Years ago replies to queue-developers automatically went to the list, but I think SF changed policy.) As you know, I resigned as project admin some time ago, so I don't really have any more say than anyone else does. (As the official admin, Mike has the most say right now, actually, so he would be the only person to write to directly with GQ suggestions, really.) The Free Software way is that any big decision should be made by the community. Thanks. ---------------------------------------------------------------- Werner G. Krebs, Ph.D., Technical Specialist Personal website: http://www.wernergkrebs.com ---------------------------------------------------------------- --- "John McKowen Taylor Jr." <jo...@ca...> wrote: > Hi Werner, > > Mike's talk but no action is what led me to > lose interest in Queue, btw. > > best regards, > > -- johnT > John McKowen Taylor, Jr. > Cadence Design Systems, Inc. > 200 Regency Forest Dr. > Cary, NC 27511 USA > +1 919 481 6835 > > wernerkrebs wrote: > > >Hmmmm, what you say sounds good, but I think this > >discussion should be moved to queue-developers, as > I > >think it might be taking place in a vacuum. > > > >---------------------------------------------------------------- > >Werner G. Krebs, Ph.D., > >Technical Specialist > > > >Personal website: http://www.wernergkrebs.com > >---- > > > >--- Koni <mh...@co...> wrote: > > > > > > > >>Hi Werner, > >> > >>Thanks for your response. I guess I should clarify > >>for Mike here that I > >>don't mean to sound like I want to take things > over. > >>What I do want to > >>do is make it work like it must have at one time, > as > >>outlined in my > >>previous email. 
This is all primarily motived by > >>needs I have right now, > >>as well as another group here on campus that may > be > >>interested after > >>those goals are accomplished. > >> > >>I am undecided as to whether its worth the trouble > >>for my own needs here > >>given that there are other options out there. I > must > >>say I just really > >>like what I've read from the online docs for GQ, > and > >>I don't really like > >>what I read for condor and other options I've > >>encountered so far. What > >>would tip the balance for me is if my efforts to > do > >>this could feed back > >>to the well known project named "GNU queue" as > >>opposed to totally > >>unknown and probably never to be discovered GQ > >>derivative under a > >>different name, or something that is just used > >>locally by myself and my > >>collaborators but not publically released. > >> > >>That I guess depends on whether there is a > >>perception that there are a > >>lot of people like me out there using small but > >>dedicated linux-based > >>clusters, and whether or not what I'm proposing is > >>compatible with > >>existing plans. I don't mean to barge in, just > >>trying to figure out > >>whether or not I should keep trying to make GQ > work > >>here or not. What I > >>would do would probably totally break any existing > >>attempts at > >>checkpoint support as I don't think its worth the > >>trouble to do this. > >>Perhaps it could be attempted again on a new code > >>base using what was > >>learned from before. > >> > >>Cheers, > >>Koni > >> > >>On Fri, 2005-04-22 at 13:23 -0700, wernerkrebs > >>wrote: > >> > >> > >>>Hi Koni, > >>> > >>>Nice to hear that you're interested in GQ! > >>> > >>>Yes, I formally signed off on the project some > >>> > >>> > >>time > >> > >> > >>>ago as a result of working for the University of > >>>California, whose polices on IP created by > >>> > >>> > >>employees > >> > >> > >>>is famous for being less than GPL-friendly. 
(And > >>> > >>> > >>now > >> > >> > >>>I've changed jobs again.) > >>> > >>>Mike is the current project admin. > >>> > >>>Last I heard --- which was some time ago --- he > >>> > >>> > >>was > >> > >> > >>>still very interested, and doing all sorts of > cool > >>>stuff in the CVS tree. > >>> > >>>(I think he was doing most of the work in the > >>>queue-development branch, which should have the > >>> > >>> > >>latest > >> > >> > >>>versions. One problem with SourceForge is that it > >>> > >>> > >>is > >> > >> > >>>impossible to rename a CVS branch, so if you > start > >>> > >>> > >>to > >> > >> > >>>deposit stuff in a branch whose name is > >>>less-than-ideal, it tends to stay there. You can > >>> > >>> > >>ask > >> > >> > >>>the SourceForge admins theoretically, although > >>> > >>> > >>tend > >> > >> > >>>not to process those type of requests.) > >>> > >>>I think standard Open Source is etiquette is to > >>> > >>> > >>first > >> > >> > >>>write the authors privately (which you've now > >>> > >>> > >>done), > >> > >> > >>>then wait some reasonable period of time for a > >>>response. If there's no response, then you can > ask > >>>publicaly, in, say, queue-developers. If there's > >>> > >>> > >>still > >> > >> > >>>no response after, say, 30 days, then it's yours. > >>> > >>>However, please be patient with Mike. I know he > >>> > >>> > >>has a > >> > >> > >>>full-time job (jobs?) as well as growing family. > >>> > >>> > >>So, > >> > >> > >>>give him at least a week before posting the > >>> > >>> > >>question > >> > >> > >>>publicaly, and don't be offended if he doesn't > get > >>>back to you right away. > >>> > >>>Mike will probably write back in a week or so > >>> > >>> > >>saying > >> > >> > >>>he's still doing work on GQ, but would really > >>> > >>> > >>welcome > >> > >> > >>>your help on GQ as a developer. 
> >>> > >>>However, if Mike is agreeable to your taking over > >>> > >>> > >>the > >> > >> > >>>project, or doesn't get back to you after you've > >>>posted publically for a month or so, then you can > >>> > >>> > >>just > >> > >> > >>>write RMS (with a CC to me), and it's yours. > >>> > >>>You could probably get some developer admins > >>> > >>> > >>(e.g., > >> > >> > >>>write access to the CVS tree) faster than that. > >>> > >>> > >>Ask > >> > >> > >>>Mike first, though. If he's agreeable, or if we > >>>haven't heard back from him in a week or two, I > >>> > >>> > >>can > >> > >> > >>>probably set you up with some write access. > >>> > >>>BTW, It's "Werner" to everyone except > >>> > >>> > >>telemarketers > >> > >> > >>>and such ilk, who are required to use "Dr. > Krebs." > >>> > >>> > >>> > >>> > >---------------------------------------------------------------- > > > > > >>>Werner G. Krebs, Ph.D., > >>>Technical Specialist > >>> > >>>Personal website: http://www.wernergkrebs.com > >>> > >>> > >>> > >---------------------------------------------------------------- > > > > > >>> > >>>--- Koni <mh...@co...> wrote: > >>> > >>> > >>> > >>>>Hello Mr. Castle and Mr. Krebs, > >>>> > >>>>If that should be Dr. for either of you, my > >>>>apologies. My name is Koni, > >>>>alias cryptopup on sourceforge. I am writing to > >>>>inquire about what > >>>>either of you are planning to do with GNU Queue. > >>>> > >>>> > >>I > >> > >> > >>>>know Mr. Krebs has > >>>>formally signed off from the project, but I am > >>>>sending this your way > >>>>anyway since you are still listed as project > >>>> > >>>> > >>admin > >> > >> > >>>>on sourceforge. > >>>> > >>>>I encountered GNU queue through a google search > >>>> > >>>> > >>and > >> > >> > >>>>was immediately > >>>>seduced by the documentation I found at > >>>>http://www.gnuqueue.org/queue_man/queue.html > >>>> > >>>> > >>which > >> > >> > >>>>comes up first at > >>>>google. 
By the way, this is way better than > >>>>sourceforge, don't take it > >>>>offline, point sourceforge back to there. > >>>>Sourceforge is so busy with > >>>>adds and irrelevent links. > >>>> > >>>>Anyway, files as given for release at source > >>>> > >>>> > >>forge, > >> > >> > >>>>or the queue-stable > >>>>cvs checkout do not compile on modern Linux > >>>> > >>>> > >>systems > >> > >> > >>>>(specifically fedora > >>>>or RHEL here) and from the postings there it > >>>> > >>>> > >>looks > >> > >> > >>>>like this has been > >>>>the case for perhaps up to 4 years. > >>>> > >>>>I have succeeded in correcting both compile and > >>>> > >>>> > >>some > >> > >> > >>>>runtime bugs and > >>>>posted a patch, FWIW, to sourceforge in the > >>>> > >>>> > >>patches > >> > >> > >>>>section. Check that > >>>>out to gauqe my abilities. I apologize for the > >>>>reformatting changes in > >>>>the diff, using -b didn't make very much > >>>> > >>>> > >>difference > >> > >> > >>>>from cvs diff. > >>>> > >>>>There are still serious problems though which I > >>>> > >>>> > >>am > >> > >> > >>>>trying to work > >>>>through. I want a system that can be used by > >>>>non-programmers as easily > >>>>(?) as it is to use simple job control of a UNIX > >>>>shell. Nothing too > >>>>fancy, just take a job, execute it when and > >>>> > >>>> > >>where > >> > >> > >>>>there are resources, > >>>>and that's it. Don't care about accounting, > >>>> > >>>> > >>process > >> > >> > >>>>migration, or > >>>>anything else. I actually found queue by > >>>> > >>>> > >>searching > >> > >> > >>>>for rsh replacement. > >>>> > >>>>>From working through the code to track the > >>>> > >>>> > >>bugs, it > >> > >> > >>>>seems clear to me > >>>>that a substantial overhaul/rewrite is badly > >>>> > >>>> > >>needed. 
> >> > >> > >>>>What I would like > >>>>to know from either of you is the following: > >>>> > >>>>1 - Is there an active known user base out there > >>>> > >>>> > >>for > >> > >> > >>>>this, or is GNU > >>>>Queue gone defunct by lack of user interest > >>>> > >>>> > >>and/or > >> > >> > >>>>development? <another > >>>>story> I know how this goes. If you hit my > >>>>sourceforge account you'll > >>>>find SLAN, a product which works (but not the > >>>>sourceforge copy) and I'm > >>>>using right now actually, but I can't be > >>>> > >>>> > >>bothered > >> > >> > >>>>maintain because there > >>>>is no interest now that IPSec is built-in to > >>>> > >>>> > >>modern > >> > >> > >>>>operating systems > >>>>and you can buy an IPSec VPN server at Staples > >>>> > >>>> > >>for > >> > >> > >>>>$60. </another > >>>>story> > >>>> > >>>>2 - If it worked as documented, is there still a > >>>>niche market for it? > >>>>Ie, has sun GRID, condor, openmosix, and > >>>> > >>>> > >>whatever > >> > >> > >>>>else covering the > >>>>range of interest out there? > >>>> > >>>>3 - Do either of you want to do anything else > >>>> > >>>> > >>with > >> > >> > >>>>this, and/or can I do > >>>>my own thing? > >>>> > >>>>I am thinking about rewriting gnu queue to root > >>>> > >>>> > >>out > >> > >> > >>>>the bugs, simplify > >>>>the code, clean it up, and add a few features > >>>> > >>>> > >>that > >> > >> > >>>>users in my lab here > >>>>will probably need. > >>>> > >>>>If there are (active?) solid development plans > >>>> > >>>> > >>and > >> > >> > >>>>things are just > >>>>happening beneath the radar, then I'll fork off > >>>> > >>>> > >>and > >> > >> > >>>>do my own thing if I > >>>>decide its worth the effort for just the local > >>>> > >>>> > >>group > >> > >> > >>>>here. Otherwise, > >>>>based on the empirical evidence at sourceforge, > >>>> > >>>> > >>it > >> > >> > >>>>seems both of you are > >>>>more or less done with this project. 
> >>>>I apologize if I'm being presumptuous in that conclusion.
> >>>>
> >>>>It may be worth mentioning that I do not plan to do anything with checkpoint support, and probably never will. Condor addresses this need in the case of non-dedicated systems. For small dedicated cluster systems, which is my perception of the proper market for GNU queue, checkpoint is something that is either not necessary or something that is better handled at the application level in whatever special cases. It seems safe to me to presume that those special cases are custom development apps anyway. I think it's a complexity nightmare that the code base in its present form (as per cvs queue-stable) cannot sustain. I have not looked at the development branch.
> >>>>
> >>>>Thus, if I put serious effort into this, my primary goal will be to clean up the code and try to create a stable system suitable for any current linux small cluster system and add tools to allow users to see the queue list, remove or kill jobs.
> >>>>
> >>>>This email is way longer than I thought it would be. Sorry 'bout that. Brevity is not my skill. :)
> >>>>
> >>>>Cheers
> >>>>Koni
> >>>>
> >>>>--
> >>>>mh...@co...
> >>>>Koni (Mark Wright)
> >>>>233 Biotechnology - Cornell University
> >>>>Graduate Student - Genomics / Plant Cell Molecular Biology
|
From: wernerkrebs <wer...@ya...> - 2005-04-25 21:44:01
|
Hmmmm, what you say sounds good, but I think this discussion should be moved to queue-developers, as I think it might be taking place in a vacuum.

----------------------------------------------------------------
Werner G. Krebs, Ph.D.,
Technical Specialist

Personal website: http://www.wernergkrebs.com
----

--- Koni <mh...@co...> wrote:

> Hi Werner,
>
> Thanks for your response. I guess I should clarify for Mike here that I don't mean to sound like I want to take things over. What I do want to do is make it work like it must have at one time, as outlined in my previous email. This is all primarily motivated by needs I have right now, as well as another group here on campus that may be interested after those goals are accomplished.
>
> I am undecided as to whether it's worth the trouble for my own needs here given that there are other options out there. I must say I just really like what I've read from the online docs for GQ, and I don't really like what I read for condor and other options I've encountered so far. What would tip the balance for me is if my efforts to do this could feed back to the well known project named "GNU queue" as opposed to a totally unknown and probably never to be discovered GQ derivative under a different name, or something that is just used locally by myself and my collaborators but not publicly released.
>
> That I guess depends on whether there is a perception that there are a lot of people like me out there using small but dedicated linux-based clusters, and whether or not what I'm proposing is compatible with existing plans. I don't mean to barge in, just trying to figure out whether or not I should keep trying to make GQ work here or not. What I would do would probably totally break any existing attempts at checkpoint support as I don't think it's worth the trouble to do this.
> Perhaps it could be attempted again on a new code base using what was learned from before.
>
> Cheers,
> Koni
>
> On Fri, 2005-04-22 at 13:23 -0700, wernerkrebs wrote:
> > Hi Koni,
> >
> > Nice to hear that you're interested in GQ!
> >
> > Yes, I formally signed off on the project some time ago as a result of working for the University of California, whose policies on IP created by employees are famous for being less than GPL-friendly. (And now I've changed jobs again.)
> >
> > Mike is the current project admin.
> >
> > Last I heard --- which was some time ago --- he was still very interested, and doing all sorts of cool stuff in the CVS tree.
> >
> > (I think he was doing most of the work in the queue-development branch, which should have the latest versions. One problem with SourceForge is that it is impossible to rename a CVS branch, so if you start to deposit stuff in a branch whose name is less-than-ideal, it tends to stay there. You can ask the SourceForge admins theoretically, although they tend not to process those types of requests.)
> >
> > I think standard Open Source etiquette is to first write the authors privately (which you've now done), then wait some reasonable period of time for a response. If there's no response, then you can ask publicly, in, say, queue-developers. If there's still no response after, say, 30 days, then it's yours.
> >
> > However, please be patient with Mike. I know he has a full-time job (jobs?) as well as a growing family. So, give him at least a week before posting the question publicly, and don't be offended if he doesn't get back to you right away.
> >
> > Mike will probably write back in a week or so saying he's still doing work on GQ, but would really welcome your help on GQ as a developer.
> >
> > However, if Mike is agreeable to your taking over the project, or doesn't get back to you after you've posted publicly for a month or so, then you can just write RMS (with a CC to me), and it's yours.
> >
> > You could probably get some developer admins (e.g., write access to the CVS tree) faster than that. Ask Mike first, though. If he's agreeable, or if we haven't heard back from him in a week or two, I can probably set you up with some write access.
> >
> > BTW, It's "Werner" to everyone except telemarketers and such ilk, who are required to use "Dr. Krebs."
> >
> > ----------------------------------------------------------------
> > Werner G. Krebs, Ph.D.,
> > Technical Specialist
> >
> > Personal website: http://www.wernergkrebs.com
> > ----------------------------------------------------------------
> >
> > --- Koni <mh...@co...> wrote:
> >
> > > Hello Mr. Castle and Mr. Krebs,
> > >
> > > If that should be Dr. for either of you, my apologies. My name is Koni, alias cryptopup on sourceforge. I am writing to inquire about what either of you are planning to do with GNU Queue. I know Mr. Krebs has formally signed off from the project, but I am sending this your way anyway since you are still listed as project admin on sourceforge.
> > >
> > > I encountered GNU queue through a google search and was immediately seduced by the documentation I found at http://www.gnuqueue.org/queue_man/queue.html which comes up first at google. By the way, this is way better than sourceforge, don't take it offline, point sourceforge back to there. Sourceforge is so busy with ads and irrelevant links.
> > >
> > > Anyway, files as given for release at sourceforge, or the queue-stable cvs checkout do not compile on modern Linux systems (specifically fedora or RHEL here) and from the postings there it looks like this has been the case for perhaps up to 4 years.
> > >
> > > I have succeeded in correcting both compile and some runtime bugs and posted a patch, FWIW, to sourceforge in the patches section. Check that out to gauge my abilities. I apologize for the reformatting changes in the diff, using -b didn't make very much difference from cvs diff.
> > >
> > > There are still serious problems though which I am trying to work through. I want a system that can be used by non-programmers as easily (?) as it is to use simple job control of a UNIX shell. Nothing too fancy, just take a job, execute it when and where there are resources, and that's it. Don't care about accounting, process migration, or anything else. I actually found queue by searching for rsh replacement.
> > >
> > > From working through the code to track the bugs, it seems clear to me that a substantial overhaul/rewrite is badly needed. What I would like to know from either of you is the following:
> > >
> > > 1 - Is there an active known user base out there for this, or has GNU Queue gone defunct by lack of user interest and/or development? <another story> I know how this goes. If you hit my sourceforge account you'll find SLAN, a product which works (but not the sourceforge copy) and I'm using right now actually, but I can't be bothered to maintain because there is no interest now that IPSec is built-in to modern operating systems and you can buy an IPSec VPN server at Staples for $60.
> > > </another story>
> > >
> > > 2 - If it worked as documented, is there still a niche market for it? I.e., are sun GRID, condor, openmosix, and whatever else covering the range of interest out there?
> > >
> > > 3 - Do either of you want to do anything else with this, and/or can I do my own thing?
> > >
> > > I am thinking about rewriting gnu queue to root out the bugs, simplify the code, clean it up, and add a few features that users in my lab here will probably need.
> > >
> > > If there are (active?) solid development plans and things are just happening beneath the radar, then I'll fork off and do my own thing if I decide it's worth the effort for just the local group here. Otherwise, based on the empirical evidence at sourceforge, it seems both of you are more or less done with this project. I apologize if I'm being presumptuous in that conclusion.
> > >
> > > It may be worth mentioning that I do not plan to do anything with checkpoint support, and probably never will. Condor addresses this need in the case of non-dedicated systems. For small dedicated cluster systems, which is my perception of the proper market for GNU queue, checkpoint is something that is either not necessary or something that is better handled at the application level in whatever special cases. It seems safe to me to presume that those special cases are custom development apps anyway. I think it's a complexity nightmare that the code base in its present form (as per cvs queue-stable) cannot sustain. I have not looked at the development branch.
> > >
> > > Thus, if I put serious effort into this, my primary goal will be to clean up the code and try to create a stable system suitable for any current linux small cluster system and add tools to allow users to see the queue list, remove or kill jobs.
> > >
> > > This email is way longer than I thought it would be. Sorry 'bout that. Brevity is not my skill. :)
> > >
> > > Cheers
> > > Koni
> > >
> > > --
> > > mh...@co...
> > > Koni (Mark Wright)
> > > 233 Biotechnology - Cornell University
> > > Graduate Student - Genomics / Plant Cell Molecular Biology
|