Re: [Queue-developers] [patch] Same job launched on multiple hosts, unlink errors
From: Mike C. <da...@ix...> - 2001-05-07 23:26:50
On Thu, May 03, 2001 at 11:10:10PM -0700, Cyril Bortolato wrote:
> I came up with the patch to queued.c below. In startjob() I check if
> there's already an efmXXX file for the job passed to startjob(). If
> there is, it means that job is already running on some host and we'd
> better not start it again. So I set the job's pid accordingly and
> return ALREADY_LOCKED.

There is still a race condition here, unfortunately. The efm file could
still show up after you look for it but before you create it. You've
reduced the window, but not eliminated it.

One solution might be from the Linux open(2) man page:

       O_EXCL When used with O_CREAT, if the file already exists it is
              an error and the open will fail. O_EXCL is broken on NFS
              file systems; programs which rely on it for performing
              locking tasks will contain a race condition. The solution
              for performing atomic file locking using a lockfile is to
              create a unique file on the same fs (e.g., incorporating
              hostname and pid), use link(2) to make a link to the
              lockfile. If link() returns 0, the lock is successful.
              Otherwise, use stat(2) on the unique file to check if its
              link count has increased to 2, in which case the lock is
              also successful.

We can't necessarily rely on flock working (not everyone has a working
lockd). We could build a lock protocol into queue, but there are so many
other things broken with queue right now it's not even funny. (I've
pretty much given up on queue for now and wrote a few cheesy shell
scripts that work much better.)

mrc
--
Mike Castle               Life is like a clock: You can work constantly
da...@ix...               and be right all the time, or not work at all
www.netcom.com/~dalgoda/  and be right at least twice a day.  -- mrc
We are all of us living in the shadow of Manhattan.  -- Watchmen