From: Luke S. <lu...@ac...> - 2005-03-25 16:03:17
Attached is a simple Perl script that I can use to tank the system. The script uses blocking NFS file locking (a great, simple way to coordinate jobs across a cluster), and works fine on other computers. For example, if you spawn a bunch of them at once:

    for i in `seq 1 8`; do filelocktest name_of_existing_file & done

The last script will finish 8 seconds later, each script taking a turn holding the lock on the file for 1 second. It also works across multiple (non-clustermatic) machines if name_of_existing_file is in a commonly NFS-mounted directory.

However, if you try the script on our cluster (where all the nodes have /home NFS mounted and /proc/sys/bproc/shell_hack is off):

    bpsh 1-30 filelocktest name_of_existing_file

It does not finish in 30 seconds as expected. The locks are obtained much more slowly than 1/sec, and after a little while the whole system freezes up and dumps the message that I sent earlier. Note that while using ~10 nodes takes much longer than 10 seconds, it usually succeeds after a certain amount of time and doesn't crash. 30 nodes or more crashes pretty reliably.

On another note, the final piece of cluster weirdness that I've detected is also NFS related, though not as important. When I read a file off a master NFS server drive from a node, I get 50 MB/s, which is how fast the drive goes. (Yay! The 2.4 kernel maxed out at ~20 MB/s over NFS for a single client.) But when I then read the same file from the master NFS server again from a different node, now that it is cached on the server, I get only 10 MB/s. To make certain that I'm not nuts, I read the same file over NFS from a non-clustermatic computer and I get 100 MB/s, the legal gigabit limit (Sweet!).

Summary: NFS to clustermatic nodes is much slower if the file is cached in the master NFS server.

It seems very odd that I'm getting these NFS problems. Shouldn't NFS be pretty much independent of the bproc changes to the kernel?
Would having an NFS server separate from the bproc master fix things?
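
For anyone reading without the attachment, here is a rough sketch of the kind of blocking-lock test described above. It is not the attached Perl script: it is written in Python, the function name hold_lock is made up for illustration, and it assumes the lock is taken with POSIX fcntl-style locking (which is what Python's fcntl.lockf wraps) on a file that already exists.

```python
import fcntl
import time


def hold_lock(path, hold_seconds=1.0):
    """Block until an exclusive lock on `path` is granted, hold it, then release.

    Returns how many seconds were spent waiting for the lock.
    `hold_lock` is a hypothetical name, not taken from the attached script.
    """
    start = time.monotonic()
    # Open read-write: an exclusive lockf() lock needs a writable descriptor.
    with open(path, "r+") as handle:
        # Blocks here until no other process holds the lock.
        fcntl.lockf(handle, fcntl.LOCK_EX)
        waited = time.monotonic() - start
        try:
            # Simulate one second of work done while holding the lock.
            time.sleep(hold_seconds)
        finally:
            fcntl.lockf(handle, fcntl.LOCK_UN)
    return waited
```

If every process calls hold_lock("name_of_existing_file") at the same moment, each should get its turn about one second after the previous holder, so N processes should finish in roughly N seconds. Printing the returned wait time per node would show where the per-lock delay on the cluster starts to grow.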