Re: [Queue-developers] [patch] Same job launched on multiple hosts, unlink errors
From: Mike C. <da...@ix...> - 2001-05-07 23:26:50
On Thu, May 03, 2001 at 11:10:10PM -0700, Cyril Bortolato wrote:
> I came up with the patch to queued.c below. In startjob() I check if
> there's already an efmXXX file for the job passed to startjob(). If
> there is, it means that job is already running on some host and we'd
> better not start it again. So I set the job's pid accordingly and
There is still a race condition here, unfortunately: it's a classic
check-then-act race. The efm file could still show up after you look for
it but before you create it. You've reduced the window, but not
eliminated it.
One solution might be from the linux open(2) man page:
    O_EXCL When used with O_CREAT, if the file already exists
           it is an error and the open will fail. O_EXCL is
           broken on NFS file systems, programs which rely on
           it for performing locking tasks will contain a race
           condition. The solution for performing atomic file
           locking using a lockfile is to create a unique file
           on the same fs (e.g., incorporating hostname and
           pid), use link(2) to make a link to the lockfile.
           If link() returns 0, the lock is successful. Otherwise,
           use stat(2) on the unique file to check if
           its link count has increased to 2, in which case
           the lock is also successful.
We can't necessarily rely on flock() working (not everyone has a working
lockd). We could build a locking protocol into queue, but there are so
many other things broken with queue right now it's not even funny. (I've
pretty much given up on queue for now and written a few cheesy shell
scripts that work much better.)
> return ALREADY_LOCKED.
mrc
--
Mike Castle Life is like a clock: You can work constantly
da...@ix... and be right all the time, or not work at all
www.netcom.com/~dalgoda/ and be right at least twice a day. -- mrc
We are all of us living in the shadow of Manhattan. -- Watchmen