From: Travis B. <tr...@be...> - 2005-04-29 22:43:19
First of all, Tiger is finally out, so Daniel is now released from his NDA and can tell us what's changed in Xgrid, and perhaps what we need to change in xgridagent to keep up.

I've found some bugs in xgridagent. There are also some subtle differences between xgridagent and Apple's GridAgent.

Bugs:

(1) xgridagent 1.0.3 hangs when sending large amounts of data (e.g. a couple of MB, tarred and gzipped) in the work directory back to the controller. Running with logging at level 4 (Debug with Beep messages), I see some XML for a message of type XgMessageReplyTypeValue, named TaskFinalResults, which seems to indicate that it contains the TarredWorkingDiretory. This is followed by about 110 lines like this:

    04/29/05 15:18:46 [64784] Sending frame of size 32700, size of window currently 181644

followed by a single line like this:

    04/29/05 15:18:46 [64784] Sending frame of size 31186, size of window currently 181644

Then nothing; it hangs there. The Xgrid tachometer drops to 0, but the Status in the Xgrid client is "Job running". Files are never delivered to the client that submitted the job. An identical job works fine on an OS X / Apple GridAgent agent, and a similar job with a much smaller output also works fine on a FreeBSD / xgridagent agent. I haven't figured out what the critical output size limit is yet, but I can work on that. Any ideas on this one?

The following bugs are subtle differences between xgridagent and Apple's GridAgent:

(2) This one involves the handling of literal parameters (and perhaps other parameters) in Custom Plug-ins. Suppose I create a custom plug-in that accepts one literal parameter and passes it to a shell script. With OS X GridAgent, if the literal I pass has spaces (e.g. "-b 4"), it is passed to the shell script as a single parameter (it's stored in $1). With xgridagent, it's passed as two parameters. There are also differences in how quotes within literals are handled.
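The difference in (2) comes down to how the argument vector for the script gets built. As a rough illustration only (this is not xgridagent's or GridAgent's actual code; the helper name `count_args` and the inline `sh -c` script are made up for the example, and a POSIX `sh` is assumed), this Python sketch shows the same literal arriving as one parameter or two, depending on whether it is split on whitespace before launch:

```python
import subprocess

# Hypothetical stand-in for the plug-in's shell script: prints the number of
# positional parameters on one line, then the value of $1 on the next.
SCRIPT = r'printf "%s\n" "$#" "$1"'


def count_args(args):
    """Run the stand-in script with the given arguments; return (argc, $1)."""
    result = subprocess.run(
        ["sh", "-c", SCRIPT, "script"] + list(args),
        capture_output=True,
        text=True,
        check=True,
    )
    lines = result.stdout.splitlines()
    return int(lines[0]), lines[1]


literal = "-b 4"

# Literal handed over as a single argv entry (the behavior reported for GridAgent).
print(count_args([literal]))

# Literal split on whitespace first (the behavior reported for xgridagent).
print(count_args(literal.split()))
```

In the first call the script sees one parameter and `$1` is `-b 4`; in the second it sees two and `$1` is just `-b`. If xgridagent is building a command string and splitting it on spaces, that would explain the mismatch, but that is a guess until someone checks the source.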
(3) There seem to be other differences in whether stdout is returned with custom plug-ins, and whether files produced by a job are returned if no working directory was passed in as part of the custom plug-in. I will try to better characterize these differences.

Finally, I'd like to suggest the following behavior changes for xgridagent. Comments?

(4) xgridagent should look somewhere other than the current working directory for xgrid.config.xml (maybe /etc?).

(5) xgridagent should try to create its "cookie" file somewhere other than the current working directory (/tmp? /var/run?).

(6) xgridagent needs better support for multiple machines sharing the same xgrid.config.xml file. Either automatically include the hostname in the agentName, or provide an option to specify the agentName via the command line at launch time, so that xgridagent can be launched by a script that passes in the hostname or some other unique identifier. This is necessary so xgridagent can be used in a cluster where all the nodes share a common drive.
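To make suggestions (4) to (6) a bit more concrete, here is a rough sketch of the lookup logic. It is written in Python for brevity rather than as a patch against xgridagent; the search order, the `--agent-name` option, and every function name below are assumptions made for illustration, not existing xgridagent behavior.

```python
import argparse
import os
import socket

CONFIG_NAME = "xgrid.config.xml"


def find_config(search_dirs=(".", "/etc")):
    """Return the first existing config path in search_dirs, or None."""
    for directory in search_dirs:
        path = os.path.join(directory, CONFIG_NAME)
        if os.path.isfile(path):
            return path
    return None


def cookie_dir(candidates=("/var/run", "/tmp")):
    """Return the first writable candidate directory for the cookie file, or None."""
    for directory in candidates:
        if os.path.isdir(directory) and os.access(directory, os.W_OK):
            return directory
    return None


def resolve_agent_name(configured, override=None, hostname=None):
    """Prefer a command-line override; otherwise append the hostname."""
    if override:
        return override
    host = hostname or socket.gethostname()
    return f"{configured}-{host}"


def parse_args(argv=None):
    """Parse the hypothetical --agent-name launch option."""
    parser = argparse.ArgumentParser(description="sketch of proposed options")
    parser.add_argument(
        "--agent-name",
        default=None,
        help="hypothetical flag: overrides the agentName from the config file",
    )
    return parser.parse_args(argv)
```

With something like this, nodes sharing one xgrid.config.xml on a common drive would each get a distinct name by default (for example `resolve_agent_name("agent", hostname="node1")` gives `agent-node1`), and a launch script could still pass its own identifier via the option.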