From: <ha...@no...> - 2002-03-19 15:39:39
> PBS is not a good match for a BProc based system. It comes with a lot
> of baggage for managing remote machines that you just don't need with
> BProc.

That should be true for any serious scheduler from pre-BProc times -
they just had to do it somehow :-)

> We have a student working on an entirely new (and very simple)
> scheduler for BProc based systems.

I had my own simple (pre-BProc) scheduler but gave up further
maintenance when I needed better job dependencies and a multiuser
environment. I found two free systems with job dependencies: PBS and
GE (Grid Engine). PBS was not open source enough for me, so my current
bet is GE. I plan to port GE to BProc and expect this to be relatively
easy. I am still frustrated by the fact that, to get say twice the
functionality, I adopted GE, which is four orders of magnitude bigger,
but I am getting used to this. If GE works nicely with BProc, is it a
viable option for your site?

> As far as scripts go, I try to discourage people from trying to run
> scripts on nodes. It's really not designed for it. That being said,
> there are some facilities for running scripts. There is a gross hack
> to make #! style execs work with execmove (bpsh). There's also an
> Aexecve() hook which uses the ghost process on the front end to exec()
> non-existent binaries. There's no caching of binaries on the slave
> nodes though.

I am trying to avoid scripts on nodes, but in batch spooling they are
handy - though they just prepare the environment for one heavy
executable which does the real work. If I got the implications right,
such scripts (sent to nodes by bpsh) could work as long as they use
absolute pathnames for executables (because for relative ones the shell
looks around and gets mad)?
(If we ever manage to provide all the other things a shell might want
to touch - like .profile or .*rc)

Putting all this together, probably the best approach is to let
batch-spooled scripts execute on the master and only migrate selected
binaries by prefixing their command lines with a special command. This
command can look at environment variables set by the spooling system,
find something like the queue name, and bpsh the executable to a node?

> There's no caching of binaries on the slave
> ...
> > NFS is flaky and does not move data as quickly as BProc?
>
> BProc should always be faster for the reason you mention.

But with a hypothetical clever, solid networked filesystem (caching in
RAM and maybe on a local hard disk, streaming all data needed by exec),
the speed of BProc would be the same or even lower if BProc does not
cache executables?

And furthermore, I guess BProc has to get the whole executable into RAM
and move it, while after a local exec() only the pages actually needed
(visited by program execution) are demand loaded? (Which is probably a
small difference and contradicts my idea of a clever filesystem
streaming the whole executable to the node doing a local exec.)

I know I am comparing a working BProc with a non-existent super-NFS; I
just wanted to make sure I got it right.

Best Regards
Vaclav
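
P.S. The prefix command proposed above (the spooled script runs on the master, and only the selected binary is sent to a node) could look roughly like the following. This is only a sketch in Python under stated assumptions: the variable name `SPOOL_NODE` is made up here and stands for whatever the spooling system would really export, and bpsh is assumed to be called as `bpsh <node> <command> [args...]`. It also insists on an absolute pathname, for the reason discussed above.

```python
import os
import sys

# Hypothetical variable that the spooling system would export for each job.
NODE_VAR = "SPOOL_NODE"


def build_command(argv, env):
    """Return the command line to exec for one job step."""
    if not argv:
        raise ValueError("no command given")
    node = env.get(NODE_VAR)
    if node is None:
        # Not started by the spooler: run the command on the master unchanged.
        return list(argv)
    if not os.path.isabs(argv[0]):
        # Avoid path searching when the binary is migrated to a node.
        raise ValueError("use an absolute pathname: " + argv[0])
    # Assumed invocation: bpsh <node> <command> [args...]
    return ["bpsh", node] + list(argv)


def main(argv):
    cmd = build_command(argv, os.environ)
    # Replace this wrapper process with the real command.
    os.execvp(cmd[0], cmd)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

In a spooled script only the heavy executable would then be prefixed, for example `bnode /path/to/heavy args`, where `bnode` is just a placeholder name for this wrapper.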