bproc-users Mailing List for BProc: Beowulf Distributed Process Space (Page 14)


Hi folks,

I went ahead and installed ClusterMatic 4 over a clean Red Hat 9.0 install on the master of my 5-node Intel P4 cluster. The beoboot stage works fine: the slave nodes come up and connect to port 2223 on the master. However, the node_up script seems to fail at the end, so the node gets marked as being in the "error" state. What could be causing this? Any help will be appreciated.

ON MASTER (node_up script fails):
$# tail -15 /var/log/clustermatic/node.0
vmadlib   :     loaded /lib/ld-2.3.2.so (size=103044;id=0,0;mode=100755)
vmadlib   :     loaded /lib/libc-2.3.2.so (size=1549556;id=0,0;mode=100755)
vmadlib   :     loaded /lib/librt-2.3.2.so (size=37552;id=0,0;mode=100755)
vmadlib   :     loaded /lib/libpthread-0.10.so (size=103104;id=0,0;mode=100755)
vmadlib   :     loaded /lib/libm-2.3.2.so (size=211876;id=0,0;mode=100755)
vmadlib   :     loaded /lib/libnss_bproc.so.2 (size=25043;id=0,0;mode=100755)
vmadlib   :     loaded /usr/lib/libbproc.so.4.0.0 (size=21388;id=0,0;mode=100755)
nodeup    :   Plugin vmadlib returned status 0 (ok)
nodeup    :   No premove function for nodeinfo
nodeup    :   No premove function for kmod
nodeup    : Starting 1 child processes.
nodeup    : Finished creating child processes.
nodeup    : I/O error talking to child
nodeup    : Child process for node 0 died with signal 4
nodeup    : Node setup returned status 1

ON SLAVE (things look fine):
boot: Server IP Address	: 10.0.0.1
boot: My IP Address	: 10.0.0.100
boot: starting bpslave : bpslave -d -i -v 10.0.0.1 2223
bpslave: connecting to 10.0.0.1:2223
bpslave: IO Daemon started; pid 15
bpslave connection to 10.0.0.1:2223 up and running
bpslave: Setting node number to 0
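When reading the master log above, it may help to know that signal 4 on Linux is SIGILL (illegal instruction). The Python sketch below is one hypothetical way to pull the loaded-library list and the fatal signal out of a node log. The function names and regular expressions are illustrative assumptions keyed to the log format quoted here; they are not part of ClusterMatic. The patterns also tolerate the line wrapping seen when the log is pasted into an email.

```python
import re
import signal

# Sample lines taken from the node.0 log above (hypothetical test input).
SAMPLE_LOG = """\
vmadlib   :     loaded /lib/ld-2.3.2.so (size=103044;id=0,0;mode=100755)
vmadlib   :     loaded /lib/libc-2.3.2.so (size=1549556;id=0,0;mode=100755)
nodeup    : I/O error talking to child
nodeup    : Child process for node 0 died with signal 4
nodeup    : Node setup returned status 1
"""

# "loaded <path> (size=<n>;id=<uid>,<gid>;mode=<octal>)"; \s* allows a line break
# between the path and the parenthesised details.
LIB_RE = re.compile(r"loaded (\S+)\s*\(size=(\d+);id=(\d+),(\d+);mode=([0-7]+)\)")
# "died with signal <n>"; \s+ allows the number to be wrapped onto the next line.
SIG_RE = re.compile(r"died with signal\s+(\d+)")


def loaded_libraries(text):
    """Return (path, size, mode) for every library vmadlib reports as loaded."""
    return [(m.group(1), int(m.group(2)), int(m.group(5), 8))
            for m in LIB_RE.finditer(text)]


def child_signal_name(text):
    """Return the symbolic name of the signal that killed the child, or None."""
    m = SIG_RE.search(text)
    if m is None:
        return None
    return signal.Signals(int(m.group(1))).name


if __name__ == "__main__":
    for path, size, mode in loaded_libraries(SAMPLE_LOG):
        print(f"{path}: {size} bytes, mode {oct(mode)}")
    print("child killed by", child_signal_name(SAMPLE_LOG))
```

The signal lookup uses the host's own signal table, so the name it prints is only meaningful when the script runs on the same platform that produced the log.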

Now, if I force the master to mark the slave node as "up", bpsh fails as shown below. I tried "vmadlib -l" and it does list /lib/ld-2.3.2.so, but the slave cannot seem to find it when exec'ing. I also ran bpmaster under strace (strace -f), and one odd thing is that it tries to close a whole host of socket descriptors (4096 of them) that are not open. Please, any help will be appreciated.
ON MASTER:
$# bpctl -S 0 -s up
$# bpstat
Node(s)   Status   Mode         User   Group
1-3       down     ----------   root   root
0         up       ---x------   root   root

$# bpsh 0 sleep 1
0: No such file or directory

ON SLAVE:
 vmadump: mmap failed: /lib/ld-2.3.2-so
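The strace observation above is consistent with a common daemon idiom: after fork(), call close() on every descriptor number up to the process limit (often taken from sysconf(_SC_OPEN_MAX)), whether or not anything is open there, so most calls fail with EBADF. Whether bpmaster does exactly this is an assumption, not something confirmed here. A minimal Python sketch of the idiom follows; `close_all_descriptors` and `fake_close_factory` are hypothetical names, and the demo injects a fake close() so it can run without closing the interpreter's own descriptors.

```python
import errno
import os


def close_all_descriptors(max_fd, keep=(0, 1, 2), close=os.close):
    """Call close() on every descriptor below max_fd except those in keep.

    Descriptors that were never open make close() fail with EBADF, which is
    counted rather than treated as an error. Returns (closed, not_open).
    """
    closed = not_open = 0
    for fd in range(max_fd):
        if fd in keep:
            continue
        try:
            close(fd)
            closed += 1
        except OSError as exc:
            if exc.errno != errno.EBADF:
                raise  # anything other than "not open" is a real failure
            not_open += 1
    return closed, not_open


def fake_close_factory(open_fds):
    """Hypothetical stand-in for os.close() that only knows about open_fds."""
    remaining = set(open_fds)

    def fake_close(fd):
        if fd not in remaining:
            raise OSError(errno.EBADF, os.strerror(errno.EBADF))
        remaining.discard(fd)

    return fake_close


if __name__ == "__main__":
    # Pretend only stdin/stdout/stderr and descriptor 5 are open, limit 4096.
    fake = fake_close_factory({0, 1, 2, 5})
    closed, not_open = close_all_descriptors(4096, close=fake)
    print(f"closed {closed}, EBADF on {not_open}")
```

In a real daemon the call would be close_all_descriptors(os.sysconf("SC_OPEN_MAX")) with the default os.close; with a limit of 4096, strace would then show about 4096 close() calls, nearly all returning EBADF.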

Thanks.
Vipul

bproc-users — General discussion about BProc.