
Multiplex jobs and harvest results

2001-02-19
2001-02-22
  • Michael J White

    Michael J White - 2001-02-19

    I'm wondering if I'm going about this correctly.
    No one has discussed this aspect of queue very much, so maybe I missed something important!

    Typically, when using a queue, I need to run many jobs (one per CPU) and harvest the results into a summary file when all jobs complete.  Each job has a few input parameters, writes some results and needs its own temporary file work space.  The number of workstations (about 3) is usually much smaller than the number of jobs (about 1000) in a batch.

    How best to allocate these to queue?  Perhaps something like:
    while read each_problem
    do
       queue --queue --batch -n -- solver $each_problem
    done < problems.txt

    Where,
      problems.txt -- a file with one problem's input parameters per line
      solver -- a script (with no standard input or output) that takes a single problem, builds a working directory on the remote host (under remotehost:/tmp), runs a calculation engine, deposits the result in a file in a shared directory (named something like localhost:./`hostname`$$), and then removes the /tmp working directory; a rough sketch of such a script follows
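
    For concreteness, here is roughly the shape of solver I have in mind; the engine name and the shared results directory below are only placeholders for my actual setup:

       #!/bin/sh
       # solver <problem parameters...>  -- runs on whichever host queue picks
       resultdir=$HOME/results                 # shared (NFS-mounted) results directory; placeholder
       workdir=/tmp/solver.`hostname`.$$       # private scratch space on the remote host
       mkdir -p $workdir
       cd $workdir
       engine "$@" > $resultdir/`hostname`$$   # "engine" stands in for the real calculation engine
       cd /
       rm -rf $workdir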

    I haven't tried this yet, because it leaves some open problems, e.g. how should I detect batch completion or a node failure?  I'm trying to avoid something like
       (queue --queue --wait -n -- solver $each_problem) &
    and then waiting for the background processes, because the loop might spawn more stub processes than the OS can handle at once.
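
    (For reference, the crudest throttle I can think of is to fall back on wait after every batch of N stubs -- the cap of 10 below is only a placeholder -- but that still stalls the whole batch on its slowest job:)

    n=0
    while read each_problem
    do
       (queue --queue --wait -n -- solver $each_problem) &
       n=`expr $n + 1`
       if [ $n -ge 10 ]        # placeholder cap on concurrent stub processes
       then
          wait                 # block until this whole batch of stubs finishes
          n=0
       fi
    done < problems.txt
    wait                       # catch the final partial batch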

    Regards,
    Mike White

    • Anonymous

      Anonymous - 2001-02-19

      Hi Mike,

      I'm just starting to play with queue, so I may be off base
      as well...

      It occurred to me that a good way to fire off a large batch
      of jobs could be with gnu make's -j option.  Your makefile
      could look something like:

      SOLVER = queue -i -w -- solver

      %.out: %.in
          $(SOLVER) $< $@

      Then you could queue up all of your jobs to run 3 at a time
      with

      make -j 3 1.out 2.out 3.out 4.out ...
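
      If listing the targets by hand gets tedious, something like this should expand them from whatever .in files are sitting in the directory (assuming one job per .in file):

      make -j 3 `ls *.in | sed 's/\.in$/.out/'`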

      A down side of this approach is that you have to specify up
      front how many jobs you want running concurrently.

      I've tried this approach and it seems to work, except that
      I'm having (I think) unrelated sporadic problems with queue.

      -- Trey

      • Michael J White

        Michael J White - 2001-02-22

        Trey,

        Thanks for the clever tip with make -j !

        I have been running a few experiments, with pretty good results.  I let the makefile generate the names of the targets, and hand the input and output file names over to the solver as command line args, with something like:

        SOLVER = queue -i -w -- solver

        list := $(patsubst %.tbd,%.done,$(wildcard *.tbd))

        finally.dat: $(list)
            cat $(list) > finally.dat

        %.done: %.tbd
            $(SOLVER) $< $@

        With make -j 3, and all 3 machines at a load below 0.10, the initial allocation is uneven.  For example, two jobs might start on machine A, none on B, and one on C.  Perhaps because "queue" looks at the 1-minute load average, it doesn't anticipate quickly enough that a job just handed to "queued" will raise the load by one.
        However, things start to balance once one of the jobs on the doubly loaded machine completes.  So this works pretty well as long as the number of submitted jobs is much larger than the number of machines.
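
        As for batch completion and node failures: completion falls out of the finally.dat target, and (assuming solver only writes its .done file on success) a job lost to a dead node simply leaves its target missing, so re-running make should resubmit only what is still outstanding, e.g.

        make -k -j 3 finally.dat    # -k keeps one failed job from stopping the rest
        make -j 3 finally.dat       # a second pass re-runs only the missing .done targets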

        I've had no luck so far with "queue" using the -r option, or the -q option.  I can sometimes see the command start on the remote machine, and it shows up under the process list with ps, but it doesn't run for some reason.  The combination queue -i -w seems to be much more reliable.

        Regards,
        Mike

