From: Vipul D. <vip...@ya...> - 2004-08-13 21:34:29
One other point: I have 1GB RAM on my master versus 256MB on all my slave nodes. Could that cause any problems? Do I need to set some ulimits (-m) on the slave?

Another oddity is that when I strace bpsh or node_up, I do not see any special "bproc" system calls getting executed like SYSCALL(....), though I should have executed them per the code path. Insights?

Thanks.
Vipul

-----Original Message-----
From: bpr...@li... [mailto:bpr...@li...] On Behalf Of Vipul Deokar
Sent: Friday, August 13, 2004 12:25 PM
To: YhLu; bpr...@li...
Subject: RE: [BProc] Newbie questions

Hi folks,

I went ahead and installed ClusterMatic 4 over a clean RedHat 9.0 on the master of my 5-node Intel P4-based cluster. The beoboot part works fine to bring the slave nodes up and running connected to port 2223 of master, but the node_up script seems to fail at the end, so node gets marked as "error" state. What could the reasons be? Any help will be appreciated.

ON MASTER (node_up script fails):

$# tail -15 /var/log/clustermatic/node.0
vmadlib : loaded /lib/ld-2.3.2.so (size=103044;id=0,0;mode=100755)
vmadlib : loaded /lib/libc-2.3.2.so (size=1549556;id=0,0;mode=100755)
vmadlib : loaded /lib/librt-2.3.2.so (size=37552;id=0,0;mode=100755)
vmadlib : loaded /lib/libpthread-0.10.so (size=103104;id=0,0;mode=100755)
vmadlib : loaded /lib/libm-2.3.2.so (size=211876;id=0,0;mode=100755)
vmadlib : loaded /lib/libnss_bproc.so.2 (size=25043;id=0,0;mode=100755)
vmadlib : loaded /usr/lib/libbproc.so.4.0.0 (size=21388;id=0,0;mode=100755)
nodeup : Plugin vmadlib returned status 0 (ok)
nodeup : No premove function for nodeinfo
nodeup : No premove function for kmod
nodeup : Starting 1 child processes.
nodeup : Finished creating child processes.
nodeup : I/O error talking to child
nodeup : Child process for node 0 died with signal 4
nodeup : Node setup returned status 1

ON SLAVE (things look fine):

boot: Server IP Address : 10.0.0.1
boot: My IP Address : 10.0.0.100
boot: starting bpslave : bpslave -d -i -v 10.0.0.1 2223
bpslave: connecting to 10.0.0.1:2223
bpslave: IO Daemon started; pid 15
bpslave connection to 10.0.0.1:2223 up and running
bpslave: Setting node number to 0

Now, if I force the master to mark status of slave node to be "up", bpsh fails like this. I tried "vmadlib -l" and it does show /lib/ld-2.3.2.so, but execing slave cannot seem to find it. I also looked at the strace of (strace -f) bpmaster, and one odd thing is it tries to close a whole host of socket descriptors (4096 instances) that are not open. Please - any help will be appreciated.

ON MASTER:

$# bpctl -S 0 -s up
$# bpstat
Node(s)  Status  Mode        User  Group
1-3      down    ----------  root  root
0        up      ---x------  root  root
$# bpsh 0 sleep 1
0: No such file or directory

ON SLAVE:

vmadump: mmap failed: /lib/ld-2.3.2-so

Thanks.
Vipul

_______________________________________________
BProc-users mailing list
BPr...@li...
https://lists.sourceforge.net/lists/listinfo/bproc-users
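
A note on the node_up log line "Child process for node 0 died with signal 4": on Linux, signal 4 is SIGILL (illegal instruction). The Python sketch below is not part of BProc or Clustermatic. Assuming a POSIX system, it only illustrates how a parent process sees a child that was killed by that signal, and how to turn the number into a name. The helper names (signal_name, describe_exit, run_child_that_raises_sigill) are hypothetical and exist only for this illustration.

```python
import signal
import subprocess
import sys


def signal_name(signum: int) -> str:
    """Return the symbolic name for a signal number, e.g. 4 -> 'SIGILL'."""
    try:
        return signal.Signals(signum).name
    except ValueError:
        return f"unknown signal {signum}"


def describe_exit(returncode: int) -> str:
    """Describe a subprocess return code the way a parent process sees it."""
    if returncode < 0:
        # subprocess reports death-by-signal as the negated signal number.
        signum = -returncode
        return f"died with signal {signum} ({signal_name(signum)})"
    return f"exited with status {returncode}"


def run_child_that_raises_sigill() -> int:
    """Spawn a throwaway child that sends itself SIGILL; return its return code."""
    code = "import os, signal; os.kill(os.getpid(), signal.SIGILL)"
    return subprocess.run([sys.executable, "-c", code]).returncode


if __name__ == "__main__":
    # Demo only: the child here is a plain Python process, not a BProc node.
    print(describe_exit(run_child_that_raises_sigill()))
```

This only names the signal that killed the child; it says nothing about why the node setup process received it on this cluster.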