Thread: [BProc] NFS file locking problems

Attached is a simple Perl script that I can use to tank the system.
The script uses blocking NFS file locking (a great, simple way to
coordinate jobs across a cluster), and it works fine on other
computers.  For example, if you spawn a bunch of copies at once:

for i in `seq 1 8 ` ; do filelocktest name_of_existing_file & done

The last copy will finish about 8 seconds later, because each copy
takes a turn holding the lock on the file for 1 second.  It also
works across multiple (non-Clustermatic) machines if
name_of_existing_file is in a directory that all of them have
NFS mounted.
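
For anyone reading without the attachment, here is a minimal sketch of
what a lock test of this kind might look like.  It is NOT the attached
Perl script: it is a Python stand-in, and the function name hold_lock
and the printed output format are my own assumptions.  It assumes the
test takes a blocking exclusive fcntl-style lock, which is the kind a
Linux NFS client passes on to the server's lock manager.

```python
import fcntl
import os
import sys
import time


def hold_lock(path, hold_seconds=1.0):
    """Take a blocking exclusive lock on an existing file, hold it, release it.

    Returns (seconds spent waiting for the lock, seconds the lock was held).
    hold_lock is a hypothetical name, not taken from the attached script.
    """
    start = time.monotonic()
    # Open read-write: an exclusive fcntl lock needs a writable descriptor.
    with open(path, "r+b") as f:
        # Blocks until no other process holds a conflicting lock.
        fcntl.lockf(f, fcntl.LOCK_EX)
        acquired = time.monotonic()
        try:
            time.sleep(hold_seconds)
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
        released = time.monotonic()
    return acquired - start, released - acquired


# Run as: python filelocktest.py name_of_existing_file
if __name__ == "__main__" and len(sys.argv) == 2:
    waited, held = hold_lock(sys.argv[1])
    print("%s pid %d: waited %.2fs, held %.2fs"
          % (os.uname().nodename, os.getpid(), waited, held))
```

With N copies contending for one file, the waits reported should climb
in steps of roughly one second if the locks are being handed out at
the expected rate; the wait time is the number to watch when comparing
machines.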

However, if you try the script on our cluster (where all
the nodes have /home NFS mounted and /proc/sys/bproc/shell_hack
is off):

bpsh 1-30 filelocktest name_of_existing_file

It does not run in 30 seconds as expected.  The locks are obtained
much more slowly than 1/sec, and after a little while the whole
system freezes up and dumps the message that I sent earlier.
Note that while using ~10 nodes takes much longer than 10 seconds,
it usually succeeds eventually and doesn't crash.  Using 30 or more
nodes crashes the system pretty reliably.

On another note, the final piece of cluster weirdness that I've
detected is also NFS related, though not as important.
When I read a file from a drive on the master NFS server from a
node, I get 50 MB/s, which is how fast the drive goes (Yay! The 2.4
kernel maxed out at ~20 MB/s over NFS for a single client.)  But
when I then read the same file from the master NFS server again
from a different node, now that it is cached on the server, I get
only 10 MB/s.  To make certain that I'm not nuts, I read the same
file over NFS from a non-Clustermatic computer and I get 100 MB/s,
about the gigabit limit (Sweet!).

Summary: NFS to Clustermatic nodes is much slower if the file is
cached in the master NFS server.
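
For anyone who wants to repeat the comparison, a minimal sketch of a
sequential read timer follows.  This is not the tool used for the
numbers above; the function name read_throughput and the 1 MiB chunk
size are assumptions chosen for illustration.  Run it from a different
node each time, since a second read on the same client would likely be
served from that client's own page cache rather than over NFS.

```python
import sys
import time


def read_throughput(path, chunk_size=1 << 20):
    """Read a file sequentially; return (bytes_read, megabytes_per_second).

    read_throughput is a hypothetical helper, with MB meaning 10**6 bytes.
    """
    total = 0
    start = time.monotonic()
    # buffering=0 gives a raw file object so each read() is one system call.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.monotonic() - start
    rate = total / elapsed / 1e6 if elapsed > 0 else float("inf")
    return total, rate


# Run as: python readtest.py /home/some_large_file
if __name__ == "__main__" and len(sys.argv) == 2:
    nbytes, mb_per_s = read_throughput(sys.argv[1])
    print("read %d bytes at %.1f MB/s" % (nbytes, mb_per_s))
```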

It seems very odd that I'm getting these NFS problems.  Shouldn't NFS
be pretty much independent of the bproc changes to the kernel?
Would having an NFS server separate from the bproc master fix things?
