From: Peter K. <pe...@pe...> - 2014-11-17 16:45:46
On 11/12/14, 3:05 PM, Daniel Povey wrote:
> For a while it has bothered me that there is no very good unified
> interface to the queue-invoking scripts, i.e. no universal way to say
> that you want a certain number of threads, a certain amount of memory,
> etc, or a GPU, independent of queue mechanism; having a unified
> mechanism would make it easier for the scripts to tell the queue what
> resources they need. I'm writing this email to say how I propose to
> improve this, and to ask for help (i.e. if anyone has time to
> implement this).
>
> I propose to modify queue.pl and similar scripts such as run.pl,
> ssh.pl and slurm.pl, so that they all accept some additional options,
> so for instance you could invoke
>
> queue.pl --mem 10G --num-threads 12 JOB=1:8 exp/foo/something.JOB.log ....
> or
> queue.pl --mem 10G --gpu 1 --max-jobs-run 4 JOB=1:8 exp/bar/something.JOB.log ....
> (max-jobs-run would limit the simultaneously running jobs, just like
> -tc 4 to GridEngine).
>
> All the other parallelization scripts would take the same options, and
> would probably just ignore options that they didn't already recognize
> (for future-proofing).
> Some of these scripts would have to be configurable, e.g. GridEngine
> can be configured in various ways.
>
> For example, queue.pl could look for a file located by default in
> conf/queue.conf
> which would tell it how to convert the things above into actual
> options, e.g. the following, which looks a bit like bash but would be
> interpreted by the perl script. Below I try to show a case where the
> "gpu" option requires a change in queue, which makes the script a
> little more complicated. But I don't want to make the config language
> super-powerful so it's hard to implement; if someone has a weird queue
> setup that requires extra configuration, they can always modify
> queue.pl.
>
> # cat conf/queue.conf
> standard_opts -l arch=*64*
> mem=* -l mem_free=$0,ram_free=$0
> num_threads=* -pe smp $0
> max_jobs_run=* -tc $0
> default gpu=0
> gpu=0 -q all.q
> gpu=* -l gpu=$0 -q gpu.q
>
> The idea is that once queue.pl and similar scripts are updated to
> include these standardized options, with a mechanism to convert them
> into "normal" options, we can then start extending the scripts to take
> advantage of this standardization, so instead of having the user pass
> in "gpu_opts" and so on, we can just have the script add the option
> --gpu 1 itself. And scripts can start working out how much memory
> different stages will need, and set the --mem option themselves.
>

I think a sane common configuration format is a great idea, and a
common Perl library to read it and mix it in with CLI options would be
ideal. I'd be happy to contribute in this way.

Do you have any restrictions on the project requiring or using CPAN
modules?

There are several different ways to approach a solution, and several
existing implementations on CPAN. E.g. a common config format (.ini,
.yml, .json, .conf) read with something like
https://metacpan.org/pod/Config::Any
coupled with
https://metacpan.org/pod/Getopt::Long
can work well.

Of course, Moose combines these even more easily, but I expect a
dependency list as large as the one Moose brings would not be welcome.

Thoughts?

--
Peter Karman . http://peknet.com/ . pe...@pe...
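
To make the quoted conf/queue.conf proposal concrete, here is a minimal sketch of how its rules might be turned into queue options. It is written in Python purely for illustration (the real scripts are Perl), and the function names parse_conf and translate are hypothetical. The matching semantics are assumptions, not something the proposal spells out: an exact rule such as gpu=0 wins over a wildcard rule such as gpu=*, "default" lines fill in options the caller did not pass, hyphens in option names map to underscores, and unrecognized options are silently ignored.

```python
# Sample config, copied from the quoted proposal.
CONF = """\
standard_opts -l arch=*64*
mem=* -l mem_free=$0,ram_free=$0
num_threads=* -pe smp $0
max_jobs_run=* -tc $0
default gpu=0
gpu=0 -q all.q
gpu=* -l gpu=$0 -q gpu.q
"""

def parse_conf(text):
    """Split the config into always-on options, defaults, and rules."""
    standard, defaults, rules = [], {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        head, _, rest = line.partition(" ")
        if head == "standard_opts":
            standard += rest.split()
        elif head == "default":
            name, _, value = rest.strip().partition("=")
            defaults[name] = value
        else:
            # e.g. "mem=*" -> name "mem", pattern "*"
            name, _, pattern = head.partition("=")
            rules.append((name, pattern, rest.strip()))
    return standard, defaults, rules

def translate(text, options):
    """Map generic options (e.g. {"mem": "10G"}) to queue arguments."""
    standard, defaults, rules = parse_conf(text)
    # Assumption: "--num-threads" on the command line is "num_threads" here.
    given = {k.replace("-", "_"): str(v) for k, v in options.items()}
    merged = {**defaults, **given}
    out = list(standard)
    for name, value in merged.items():
        exact = [t for n, p, t in rules if n == name and p == value]
        wild = [t for n, p, t in rules if n == name and p == "*"]
        chosen = exact or wild  # assumption: exact match beats wildcard
        if chosen:
            out += chosen[0].replace("$0", value).split()
        # Options with no matching rule are ignored (future-proofing).
    return out

if __name__ == "__main__":
    print(translate(CONF, {"mem": "10G", "num-threads": "12"}))
    print(translate(CONF, {"mem": "10G", "gpu": "1", "max-jobs-run": "4"}))
```

Keeping the rule table as plain (name, pattern, template) tuples keeps the config language deliberately weak, in line with the stated wish not to make it "super-powerful". The same table could equally be populated from an .ini or .yml file if a common config format were adopted.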