From: Daniel G. <dg...@ti...> - 2004-10-19 16:07:19
|
HI Ted, I attach my node_up script, which sources the nfs.init script (you call it nfs_node.conf, I guess). You can ignore the lines for sensors.init and pathscale.init. In addition to this, I have the following in my /etc/clustermatic/config: librariesfrombinary /sbin/rpc.statd /sbin/portmap My nodes now work properly, and mount my master:/home instantaneously. Daniel On Tue, Oct 19, 2004 at 12:01:05PM -0400, Ted Sariyski wrote: > I need some more help. The script Michal wrote for NFS node support > ends with: > > # bpsh $node rpc.statd > > There is no rpc.statd in my distribution but there is rpc.rstatd (I use > SuSe Server 9), so I changed it corespondingly. > Than I add > > /etc/clustermatic/nfs_node.conf $* > > at the end of node_up (nfs_node.conf if the Michal's script, attached). > > It mounts but is slow. I'm not sure that I understand how to use: > # If you want to put more setup stuff here, make sure do replace the > # "exec" above with the following: > # /usr/lib/beoboot/bin/node_up $* || exit 1 > > What I'm doing wrong ? > Thanks, > Ted > > #!/bin/bash -x > # > # A sample how to get NFS modules on a node. > # Make sure that /etc/modules.conf.dist for a node does not > # define any "install" actions for these > # > # Michal Jaegermann, 2004/Aug/19, michal@ha... > # > # 2004/Oct/15, michal@ha... 
> # - start portmap and rpc.statd on nodes > # - fix "case m" typo and do not use "-N" option to bpsh > > node=$1 > mod=nfs > modules=$( grep $mod.ko /lib/modules/$(uname -r)/modules.dep) > modules=${modules/:/} > modules=$( > for m in $modules ; do > echo $m > done | tac ) > ( cd / > for m in $modules ; do > echo $m > done > ) | ( cd / ; cpio -o -c --quiet ) | bpsh $node cpio -imd --quiet > bpsh $node depmod -a > for m in $modules ; do > m=$(basename $m .ko) > m=${m/_/-} > case $m in > sunrpc) > bpsh $node modprobe -i sunrpc > bpsh $node mkdir -p /var/lib/nfs/rpc_pipefs > bpsh $node mount | grep -q rpc_pipefs || \ > bpsh $node mount -t rpc_pipefs sunrpc > /var/lib/nfs/rpc_pipefs > ;; > *) bpsh $node modprobe -i $m > esac > done > # these are for a benfit of rpc.statd > bpsh $node mkdir -p /var/lib/nfs/statd/ > bpsh $node mkdir -p /var/run > bpsh $node portmap > bpsh $node rpc.rstatd > > #mount -t nfs MASTER:/public/home /u -o > nfsvers=3,tcp,bg,rw,rsize=16384,wsize=16384,hard,intr > #mount -t nfs MASTER:/scratch /scratch1 -o > nfsvers=3,tcp,bg,rw,rsize=16384,wsize=16384,hard,intr > #mount -t nfs MASTER:/public/code /code -o > nfsvers=3,tcp,bg,rw,rsize=16384,wsize=16384,hard,intr > > > Daniel Gruner wrote: > > >Ted, > > > >See the posting from Michal Jagermann on Oct 16. You need to run > >both portmap and rpc.statd on the nodes, and then mounting and umounting > >work fine. > > > >Daniel > > > > > >On Mon, Oct 18, 2004 at 11:01:59AM -0400, Ted Sariyski wrote: > > > > > >>Finally I was able to build a customized version of clustermatic with > >>kernel 2.6.7 for AMD64. All nodes use Tian B2882 Transport GX28 > >>mainboard, the head node have two SATA hard disks running in RAID1 mode > >>and I use PXE to boot the diskless nodes (only 16 nodes). > >> > >>I have a couple of questions concerning mounting remote file systems, it > >>takes really long. Besides some nodes come up fast while for other it > >>takes 5-10 minutes. 
For example node1 boots in 2-3 minutes while node0 > >>issued errors on the console (there are not records on the log file): > >> > >>mmap failed: /lib64/ld-2.3.3.so > >>vmadump: mmap failed: /lib64/ld-2.3.3.so > >>portmap: server localhost not responding, time out > >>RPC: failed to contact portmap > >>Lockd_up: no pid, 2 users?? > >> > >>before somehow come up: > >> > >>[root@xtreme101 root]# bpsh 0 mount > >>rootfs on / type rootfs (rw) > >>none on /proc type proc (rw,nodiratime) > >>none on /bpfs type bpfs (rw) > >>192.168.0.101:/home on /home type nfs > >>(rw,v3,rsize=32768,wsize=32768,hard,udp,nolock,addr=192.168.0.101) > >>192.168.0.200:/public/home on /u type nfs > >>(rw,v3,rsize=16384,wsize=16384,hard,intr,tcp,lock,addr=192.168.0.200) > >>192.168.0.200:/scratch on /scratch1 type nfs > >>(rw,v3,rsize=16384,wsize=16384,hard,intr,tcp,lock,addr=192.168.0.200) > >>192.168.0.200:/public/code on /code type nfs > >>(rw,v3,rsize=16384,wsize=16384,hard,intr,tcp,lock,addr=192.168.0.200) > >> > >>Currently I work only with three nodes and I believe that it's not a PXE > >>issue. What is the meaning of mmap and portmap errors issued by node0? > >>Is it normal for mount to take so long or I miss something in config? > >> > >>Thanks, > >>Ted > >> > >> > >> > >> > >> > >>------------------------------------------------------- > >>This SF.net email is sponsored by: IT Product Guide on ITManagersJournal > >>Use IT products in your business? Tell us what you think of them. Give us > >>Your Opinions, Get Free ThinkGeek Gift Certificates! Click to find out more > >>http://productguide.itmanagersjournal.com/guidepromo.tmpl > >>_______________________________________________ > >>BProc-users mailing list > >>BPr...@li... > >>https://lists.sourceforge.net/lists/listinfo/bproc-users > >> > >> > > > > > > > -- Dr. Daniel Gruner dg...@ti... Dept. of Chemistry dan...@ut... University of Toronto phone: (416)-978-8689 80 St. 
George Street fax: (416)-978-5325 Toronto, ON M5S 3H6, Canada finger for PGP public key |
From: Steven J. <py...@li...> - 2004-10-19 16:06:29
|
Greetings, Often when I see slow NFS mounts, it's because the server isn't running lockd (or hasn't loaded the lockd module) and the client doesn't mount with -onolock G'day, sjames On Tue, 19 Oct 2004, Ted Sariyski wrote: > I need some more help. The script Michal wrote for NFS node support > ends with: > > # bpsh $node rpc.statd > > There is no rpc.statd in my distribution but there is rpc.rstatd (I use > SuSe Server 9), so I changed it corespondingly. > Than I add > > /etc/clustermatic/nfs_node.conf $* > > at the end of node_up (nfs_node.conf if the Michal's script, attached). > > It mounts but is slow. I'm not sure that I understand how to use: > # If you want to put more setup stuff here, make sure do replace the > # "exec" above with the following: > # /usr/lib/beoboot/bin/node_up $* || exit 1 > > What I'm doing wrong ? > Thanks, > Ted > > #!/bin/bash -x > # > # A sample how to get NFS modules on a node. > # Make sure that /etc/modules.conf.dist for a node does not > # define any "install" actions for these > # > # Michal Jaegermann, 2004/Aug/19, michal@ha... > # > # 2004/Oct/15, michal@ha... 
> # - start portmap and rpc.statd on nodes > # - fix "case m" typo and do not use "-N" option to bpsh > > node=$1 > mod=nfs > modules=$( grep $mod.ko /lib/modules/$(uname -r)/modules.dep) > modules=${modules/:/} > modules=$( > for m in $modules ; do > echo $m > done | tac ) > ( cd / > for m in $modules ; do > echo $m > done > ) | ( cd / ; cpio -o -c --quiet ) | bpsh $node cpio -imd --quiet > bpsh $node depmod -a > for m in $modules ; do > m=$(basename $m .ko) > m=${m/_/-} > case $m in > sunrpc) > bpsh $node modprobe -i sunrpc > bpsh $node mkdir -p /var/lib/nfs/rpc_pipefs > bpsh $node mount | grep -q rpc_pipefs || \ > bpsh $node mount -t rpc_pipefs sunrpc > /var/lib/nfs/rpc_pipefs > ;; > *) bpsh $node modprobe -i $m > esac > done > # these are for a benfit of rpc.statd > bpsh $node mkdir -p /var/lib/nfs/statd/ > bpsh $node mkdir -p /var/run > bpsh $node portmap > bpsh $node rpc.rstatd > > #mount -t nfs MASTER:/public/home /u -o > nfsvers=3,tcp,bg,rw,rsize=16384,wsize=16384,hard,intr > #mount -t nfs MASTER:/scratch /scratch1 -o > nfsvers=3,tcp,bg,rw,rsize=16384,wsize=16384,hard,intr > #mount -t nfs MASTER:/public/code /code -o > nfsvers=3,tcp,bg,rw,rsize=16384,wsize=16384,hard,intr > > > Daniel Gruner wrote: > > >Ted, > > > >See the posting from Michal Jagermann on Oct 16. You need to run > >both portmap and rpc.statd on the nodes, and then mounting and umounting > >work fine. > > > >Daniel > > > > > >On Mon, Oct 18, 2004 at 11:01:59AM -0400, Ted Sariyski wrote: > > > > > >>Finally I was able to build a customized version of clustermatic with > >>kernel 2.6.7 for AMD64. All nodes use Tian B2882 Transport GX28 > >>mainboard, the head node have two SATA hard disks running in RAID1 mode > >>and I use PXE to boot the diskless nodes (only 16 nodes). > >> > >>I have a couple of questions concerning mounting remote file systems, it > >>takes really long. Besides some nodes come up fast while for other it > >>takes 5-10 minutes. 
For example node1 boots in 2-3 minutes while node0 > >>issued errors on the console (there are not records on the log file): > >> > >>mmap failed: /lib64/ld-2.3.3.so > >>vmadump: mmap failed: /lib64/ld-2.3.3.so > >>portmap: server localhost not responding, time out > >>RPC: failed to contact portmap > >>Lockd_up: no pid, 2 users?? > >> > >>before somehow come up: > >> > >>[root@xtreme101 root]# bpsh 0 mount > >>rootfs on / type rootfs (rw) > >>none on /proc type proc (rw,nodiratime) > >>none on /bpfs type bpfs (rw) > >>192.168.0.101:/home on /home type nfs > >>(rw,v3,rsize=32768,wsize=32768,hard,udp,nolock,addr=192.168.0.101) > >>192.168.0.200:/public/home on /u type nfs > >>(rw,v3,rsize=16384,wsize=16384,hard,intr,tcp,lock,addr=192.168.0.200) > >>192.168.0.200:/scratch on /scratch1 type nfs > >>(rw,v3,rsize=16384,wsize=16384,hard,intr,tcp,lock,addr=192.168.0.200) > >>192.168.0.200:/public/code on /code type nfs > >>(rw,v3,rsize=16384,wsize=16384,hard,intr,tcp,lock,addr=192.168.0.200) > >> > >>Currently I work only with three nodes and I believe that it's not a PXE > >>issue. What is the meaning of mmap and portmap errors issued by node0? > >>Is it normal for mount to take so long or I miss something in config? > >> > >>Thanks, > >>Ted > >> > >> > >> > >> > >> > >>------------------------------------------------------- > >>This SF.net email is sponsored by: IT Product Guide on ITManagersJournal > >>Use IT products in your business? Tell us what you think of them. Give us > >>Your Opinions, Get Free ThinkGeek Gift Certificates! Click to find out more > >>http://productguide.itmanagersjournal.com/guidepromo.tmpl > >>_______________________________________________ > >>BProc-users mailing list > >>BPr...@li... > >>https://lists.sourceforge.net/lists/listinfo/bproc-users > >> > >> > > > > > > > > > > > ------------------------------------------------------- > This SF.net email is sponsored by: IT Product Guide on ITManagersJournal > Use IT products in your business? 
Tell us what you think of them. Give us > Your Opinions, Get Free ThinkGeek Gift Certificates! Click to find out more > http://productguide.itmanagersjournal.com/guidepromo.tmpl > _______________________________________________ > BProc-users mailing list > BPr...@li... > https://lists.sourceforge.net/lists/listinfo/bproc-users > ||||| |||| ||||||||||||| ||| by Linux Labs International, Inc. Steven James, CTO 55 Marietta Street Suite 1830 Atlanta, Ga 30303 866 824 9737 support |
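Steven's suggestion can be checked from the master before changing anything. The following is a minimal sketch, not something posted in the thread: it assumes `mount` output in the "SRC on DIR type nfs (options)" form that Ted posted, and the function name `nfs_mounts_with_locking` is made up for illustration. It prints the mount points of NFS mounts that were not mounted with `nolock`, which are the ones that depend on lockd and statd being reachable.

```shell
#!/bin/bash
# Read `mount` output on stdin and print the mount point of every NFS
# mount whose option list does not contain the exact option "nolock".
# (Hypothetical helper; field layout assumed: SRC on DIR type FSTYPE (OPTS).)
nfs_mounts_with_locking() {
    awk '$5 == "nfs" {
        opts = $6
        gsub(/[()]/, "", opts)          # strip the surrounding parentheses
        n = split(opts, o, ",")
        nolock = 0
        for (i = 1; i <= n; i++)
            if (o[i] == "nolock") nolock = 1
        if (!nolock) print $3           # $3 is the mount point
    }'
}

# Example use on the master, for node 0:
#   bpsh 0 mount | nfs_mounts_with_locking
```

Fed the `bpsh 0 mount` output from Ted's message, this should list /u, /scratch1 and /code, since only /home carries `nolock` there.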
From: Ted S. <tsa...@cr...> - 2004-10-19 16:01:07
|
I need some more help. The script Michal wrote for NFS node support ends with: # bpsh $node rpc.statd There is no rpc.statd in my distribution but there is rpc.rstatd (I use SuSe Server 9), so I changed it corespondingly. Than I add /etc/clustermatic/nfs_node.conf $* at the end of node_up (nfs_node.conf if the Michal's script, attached). It mounts but is slow. I'm not sure that I understand how to use: # If you want to put more setup stuff here, make sure do replace the # "exec" above with the following: # /usr/lib/beoboot/bin/node_up $* || exit 1 What I'm doing wrong ? Thanks, Ted #!/bin/bash -x # # A sample how to get NFS modules on a node. # Make sure that /etc/modules.conf.dist for a node does not # define any "install" actions for these # # Michal Jaegermann, 2004/Aug/19, michal@ha... # # 2004/Oct/15, michal@ha... # - start portmap and rpc.statd on nodes # - fix "case m" typo and do not use "-N" option to bpsh node=$1 mod=nfs modules=$( grep $mod.ko /lib/modules/$(uname -r)/modules.dep) modules=${modules/:/} modules=$( for m in $modules ; do echo $m done | tac ) ( cd / for m in $modules ; do echo $m done ) | ( cd / ; cpio -o -c --quiet ) | bpsh $node cpio -imd --quiet bpsh $node depmod -a for m in $modules ; do m=$(basename $m .ko) m=${m/_/-} case $m in sunrpc) bpsh $node modprobe -i sunrpc bpsh $node mkdir -p /var/lib/nfs/rpc_pipefs bpsh $node mount | grep -q rpc_pipefs || \ bpsh $node mount -t rpc_pipefs sunrpc /var/lib/nfs/rpc_pipefs ;; *) bpsh $node modprobe -i $m esac done # these are for a benfit of rpc.statd bpsh $node mkdir -p /var/lib/nfs/statd/ bpsh $node mkdir -p /var/run bpsh $node portmap bpsh $node rpc.rstatd #mount -t nfs MASTER:/public/home /u -o nfsvers=3,tcp,bg,rw,rsize=16384,wsize=16384,hard,intr #mount -t nfs MASTER:/scratch /scratch1 -o nfsvers=3,tcp,bg,rw,rsize=16384,wsize=16384,hard,intr #mount -t nfs MASTER:/public/code /code -o nfsvers=3,tcp,bg,rw,rsize=16384,wsize=16384,hard,intr Daniel Gruner wrote: >Ted, > >See the posting from 
Michal Jagermann on Oct 16. You need to run >both portmap and rpc.statd on the nodes, and then mounting and umounting >work fine. > >Daniel > > >On Mon, Oct 18, 2004 at 11:01:59AM -0400, Ted Sariyski wrote: > > >>Finally I was able to build a customized version of clustermatic with >>kernel 2.6.7 for AMD64. All nodes use Tian B2882 Transport GX28 >>mainboard, the head node have two SATA hard disks running in RAID1 mode >>and I use PXE to boot the diskless nodes (only 16 nodes). >> >>I have a couple of questions concerning mounting remote file systems, it >>takes really long. Besides some nodes come up fast while for other it >>takes 5-10 minutes. For example node1 boots in 2-3 minutes while node0 >>issued errors on the console (there are not records on the log file): >> >>mmap failed: /lib64/ld-2.3.3.so >>vmadump: mmap failed: /lib64/ld-2.3.3.so >>portmap: server localhost not responding, time out >>RPC: failed to contact portmap >>Lockd_up: no pid, 2 users?? >> >>before somehow come up: >> >>[root@xtreme101 root]# bpsh 0 mount >>rootfs on / type rootfs (rw) >>none on /proc type proc (rw,nodiratime) >>none on /bpfs type bpfs (rw) >>192.168.0.101:/home on /home type nfs >>(rw,v3,rsize=32768,wsize=32768,hard,udp,nolock,addr=192.168.0.101) >>192.168.0.200:/public/home on /u type nfs >>(rw,v3,rsize=16384,wsize=16384,hard,intr,tcp,lock,addr=192.168.0.200) >>192.168.0.200:/scratch on /scratch1 type nfs >>(rw,v3,rsize=16384,wsize=16384,hard,intr,tcp,lock,addr=192.168.0.200) >>192.168.0.200:/public/code on /code type nfs >>(rw,v3,rsize=16384,wsize=16384,hard,intr,tcp,lock,addr=192.168.0.200) >> >>Currently I work only with three nodes and I believe that it's not a PXE >>issue. What is the meaning of mmap and portmap errors issued by node0? >>Is it normal for mount to take so long or I miss something in config? 
>> >>Thanks, >>Ted >> >> >> >> >> >>------------------------------------------------------- >>This SF.net email is sponsored by: IT Product Guide on ITManagersJournal >>Use IT products in your business? Tell us what you think of them. Give us >>Your Opinions, Get Free ThinkGeek Gift Certificates! Click to find out more >>http://productguide.itmanagersjournal.com/guidepromo.tmpl >>_______________________________________________ >>BProc-users mailing list >>BPr...@li... >>https://lists.sourceforge.net/lists/listinfo/bproc-users >> >> > > > |
From: Daniel G. <dg...@ti...> - 2004-10-18 16:06:20
|
Ted, See the posting from Michal Jagermann on Oct 16. You need to run both portmap and rpc.statd on the nodes, and then mounting and umounting work fine. Daniel On Mon, Oct 18, 2004 at 11:01:59AM -0400, Ted Sariyski wrote: > Finally I was able to build a customized version of clustermatic with > kernel 2.6.7 for AMD64. All nodes use Tian B2882 Transport GX28 > mainboard, the head node have two SATA hard disks running in RAID1 mode > and I use PXE to boot the diskless nodes (only 16 nodes). > > I have a couple of questions concerning mounting remote file systems, it > takes really long. Besides some nodes come up fast while for other it > takes 5-10 minutes. For example node1 boots in 2-3 minutes while node0 > issued errors on the console (there are not records on the log file): > > mmap failed: /lib64/ld-2.3.3.so > vmadump: mmap failed: /lib64/ld-2.3.3.so > portmap: server localhost not responding, time out > RPC: failed to contact portmap > Lockd_up: no pid, 2 users?? > > before somehow come up: > > [root@xtreme101 root]# bpsh 0 mount > rootfs on / type rootfs (rw) > none on /proc type proc (rw,nodiratime) > none on /bpfs type bpfs (rw) > 192.168.0.101:/home on /home type nfs > (rw,v3,rsize=32768,wsize=32768,hard,udp,nolock,addr=192.168.0.101) > 192.168.0.200:/public/home on /u type nfs > (rw,v3,rsize=16384,wsize=16384,hard,intr,tcp,lock,addr=192.168.0.200) > 192.168.0.200:/scratch on /scratch1 type nfs > (rw,v3,rsize=16384,wsize=16384,hard,intr,tcp,lock,addr=192.168.0.200) > 192.168.0.200:/public/code on /code type nfs > (rw,v3,rsize=16384,wsize=16384,hard,intr,tcp,lock,addr=192.168.0.200) > > Currently I work only with three nodes and I believe that it's not a PXE > issue. What is the meaning of mmap and portmap errors issued by node0? > Is it normal for mount to take so long or I miss something in config? 
> > Thanks, > Ted > > > > > > ------------------------------------------------------- > This SF.net email is sponsored by: IT Product Guide on ITManagersJournal > Use IT products in your business? Tell us what you think of them. Give us > Your Opinions, Get Free ThinkGeek Gift Certificates! Click to find out more > http://productguide.itmanagersjournal.com/guidepromo.tmpl > _______________________________________________ > BProc-users mailing list > BPr...@li... > https://lists.sourceforge.net/lists/listinfo/bproc-users -- Dr. Daniel Gruner dg...@ti... Dept. of Chemistry dan...@ut... University of Toronto phone: (416)-978-8689 80 St. George Street fax: (416)-978-5325 Toronto, ON M5S 3H6, Canada finger for PGP public key |
From: Ted S. <tsa...@cr...> - 2004-10-18 15:02:02
|
Finally I was able to build a customized version of clustermatic with kernel 2.6.7 for AMD64. All nodes use the Tyan B2882 Transport GX28 mainboard, the head node has two SATA hard disks running in RAID1 mode, and I use PXE to boot the diskless nodes (only 16 nodes).

I have a couple of questions about mounting remote file systems: it takes really long. Besides, some nodes come up fast while others take 5-10 minutes. For example, node1 boots in 2-3 minutes, while node0 issued errors on the console (there are no records in the log file):

mmap failed: /lib64/ld-2.3.3.so
vmadump: mmap failed: /lib64/ld-2.3.3.so
portmap: server localhost not responding, time out
RPC: failed to contact portmap
Lockd_up: no pid, 2 users??

before somehow coming up:

[root@xtreme101 root]# bpsh 0 mount
rootfs on / type rootfs (rw)
none on /proc type proc (rw,nodiratime)
none on /bpfs type bpfs (rw)
192.168.0.101:/home on /home type nfs (rw,v3,rsize=32768,wsize=32768,hard,udp,nolock,addr=192.168.0.101)
192.168.0.200:/public/home on /u type nfs (rw,v3,rsize=16384,wsize=16384,hard,intr,tcp,lock,addr=192.168.0.200)
192.168.0.200:/scratch on /scratch1 type nfs (rw,v3,rsize=16384,wsize=16384,hard,intr,tcp,lock,addr=192.168.0.200)
192.168.0.200:/public/code on /code type nfs (rw,v3,rsize=16384,wsize=16384,hard,intr,tcp,lock,addr=192.168.0.200)

Currently I am working with only three nodes, and I believe that it's not a PXE issue. What is the meaning of the mmap and portmap errors issued by node0? Is it normal for mount to take so long, or am I missing something in the config?

Thanks,
Ted |
From: Michal J. <mi...@ha...> - 2004-10-16 21:06:43
|
On Fri, Aug 20, 2004 I posted a note on how to use extra modules on nodes if "kmod" is absent (or if things require extra handling anyway). As an example I added a helper script to set up NFS on nodes.

Later experience showed that you had better run rpc.statd on the nodes, which in turn wants to see portmap, or some operations will be very slooooow (mount in particular, and umount even more so). There were also some minor typos in that script. So here is a new version which makes NFS operations so much better. :-)

This is meant to be sourced from the node_up startup script; or you may use it "standalone", but then give it a node number as an argument (and wrap it in a loop for multiple nodes). It is clearly expected that portmap and rpc.statd will find the libraries they require (you may use the 'librariesfrombinary' directive in your 'config' for that). If you are running different kernels on your master machine and on the nodes then obvious tweaks need to be made.

Michal

#!/bin/bash
#
# A sample how to get NFS modules on a node.
# Make sure that /etc/modules.conf.dist for a node does not
# define any 'install' actions for these
#
# Michal Jaegermann, 2004/Aug/19, mi...@ha...
#
# 2004/Oct/15, mi...@ha...
#  - start portmap and rpc.statd on nodes
#  - fix 'case m' typo and do not use '-N' option to bpsh

node=$1
mod=nfs
# the modules.dep line for nfs.ko: the module itself, then its dependencies
modules=$( grep $mod.ko /lib/modules/$(uname -r)/modules.dep)
modules=${modules/:/}
# reverse the list so that dependencies come first
modules=$(
    for m in $modules ; do
        echo $m
    done | tac )
# copy the module files from the master to the node
( cd /
  for m in $modules ; do
      echo $m
  done
) | ( cd / ; cpio -o -c --quiet ) | bpsh $node cpio -imd --quiet
bpsh $node depmod -a
for m in $modules ; do
    m=$(basename $m .ko)
    m=${m/_/-}
    case $m in
        sunrpc)
            bpsh $node modprobe -i sunrpc
            bpsh $node mkdir -p /var/lib/nfs/rpc_pipefs
            bpsh $node mount | grep -q rpc_pipefs || \
                bpsh $node mount -t rpc_pipefs sunrpc /var/lib/nfs/rpc_pipefs
            ;;
        *)  bpsh $node modprobe -i $m
    esac
done
# these are for the benefit of rpc.statd
bpsh $node mkdir -p /var/lib/nfs/statd/
bpsh $node mkdir -p /var/run
bpsh $node portmap
bpsh $node rpc.statd |
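The ordering step in the script is easy to miss: the matching `modules.dep` line names `nfs.ko` first and its dependencies after it, and piping the list through `tac` reverses it, so that dependencies are loaded before the modules that need them. A minimal sketch of just that step follows; the function name `load_order` and the sample line in the comment are illustrative assumptions, not taken from a real system.

```shell
#!/bin/bash
# Print the module names from one modules.dep line, dependencies first.
# $1 is a line of the form "path/a.ko: path/b.ko path/c.ko".
# (Hypothetical helper mirroring the list handling in the script above.)
load_order() {
    local line=${1/:/}      # drop the colon after the first path
    local m
    for m in $line; do      # unquoted on purpose: split on whitespace
        basename "$m" .ko
    done | tac              # reverse: last-listed dependency comes out first
}

# Example with a made-up modules.dep line:
#   load_order "/lib/modules/2.6.7/kernel/fs/nfs/nfs.ko: /lib/modules/2.6.7/kernel/fs/lockd/lockd.ko /lib/modules/2.6.7/kernel/net/sunrpc/sunrpc.ko"
```

To run the full script standalone for several nodes, a loop such as `for n in 0 1 2; do ./nfs_node.sh $n; done` should do, where `nfs_node.sh` is only a placeholder for whatever name the script was saved under.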
From: Gustavo G. M. <gu...@ma...> - 2004-10-11 14:55:06
|
Luke,

I really agree with you. PVM, BPROC and MPI will be in the same kernel build, but openMosix will be in a different kernel build. So far I have been doing things quickly and learning only by practice, but now I am reading some material that is opening my mind on how to avoid crashes. :o)

In my tests with BPROC I was really, really amazed. BPROC is a great tool.

--
Regards,
Gustavo Gobi Martinelli
Linux User# 270627

Quoting Luke Palmer <lop...@wi...>:

> Gustavo,
>
> openMosix and bproc together sounds like a really bad idea. Both of them do
> very nonstandard things to processes, and the results could be unpredictable
> (actually, I predict crashes :)
>
> If you use openMosix, then you will need to have rsh to get MPI to work. If
> you use bproc, you will not need to use rsh.
>
> -Luke |
From: Luke P. <lop...@wi...> - 2004-10-11 14:06:13
|
Gustavo, openMosix and bproc together sounds like a really bad idea. Both of them do very nonstandard things to processes, and the results could be unpredictable (actually, I predict crashes :) If you use openMosix, then you will need to have rsh to get MPI to work. If you use bproc, you will not need to use rsh. -Luke |
From: Greg W. <gw...@la...> - 2004-10-10 16:48:17
|
You need to load the nodeinfo plugin on the nodes. Make sure /etc/clustermatic/node_up.conf contains the line:

plugin nodeinfo

The nodeinfo plugin is available in the beoboot package and is used to make some node-specific information available via the /bpfs filesystem. mpictrl uses this to find out how many cpus there are on each node.

Greg

On Oct 9, 2004, at 2:25 PM, Daniel Purcell wrote:

> BPROC mailing list subscribers,
>
> I'm a beginner with linux clusters and am having a warning message
> show up every time I run mpirun.
>
> I'm using bproc-4.0.0pre6 and a 2.6.7 kernel. I'm using
> mpich-1.2.5.3, and applied the patches I found in cmtools-1.4 (there
> was an error when trying to patch the ch_gm section, but I'm not using
> gm so I ignored the error). I also made cmtool's mpirun without a
> problem. I can run the test cpi program on my 14 node cluster without
> a problem, but whenever I do, I see this warning about not being able
> to find the number of cpus on each machine:
>
> wulfman@wulfpack ~/cmtools-1.4/mpirun $ ./mpirun --p4 --np 14 ~/mpich-1.2.5.3/examples/basic/cpi
> Could not get number of cpus for node 1, assuming 1
> Could not get number of cpus for node 2, assuming 1
> Could not get number of cpus for node 3, assuming 1
> Could not get number of cpus for node 4, assuming 1
> Could not get number of cpus for node 5, assuming 1
> Could not get number of cpus for node 6, assuming 1
> Could not get number of cpus for node 7, assuming 1
> Could not get number of cpus for node 8, assuming 1
> Could not get number of cpus for node 9, assuming 1
> Could not get number of cpus for node 10, assuming 1
> Could not get number of cpus for node 11, assuming 1
> Could not get number of cpus for node 12, assuming 1
> Could not get number of cpus for node 13, assuming 1
> Could not get number of cpus for node 14, assuming 1
> Process 0 of 14 on localhost
> pi is approximately 3.1415926.... yadda yadda
> wall clock time = 1.33525
> Process 2 of 14 on localhost
> ...
> ...
>
> I get this output (about not being able to find the number of cpus for
> each cluster node) for any mpi program I run.
>
> Is there something I did wrong during the install? Has anyone else
> had this problem before?
>
> -Daniel Purcell |
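A quick way to confirm that the line Greg mentions is really present is to grep for it. This is only a sketch: the function name `has_plugin` is invented here, and it assumes that a directive sits on its own line as `plugin NAME`, optionally indented, so that a line starting with `#` would not count.

```shell
#!/bin/bash
# Succeed (exit status 0) if config file $1 contains a "plugin $2" line.
# (Hypothetical helper; the one-directive-per-line layout is an assumption.)
has_plugin() {
    grep -Eq "^[[:space:]]*plugin[[:space:]]+$2([[:space:]]|\$)" "$1"
}

# Example use on the master:
#   has_plugin /etc/clustermatic/node_up.conf nodeinfo \
#       || echo "add 'plugin nodeinfo' to node_up.conf"
```

The nodes presumably need to be rebooted, or node_up rerun, before a newly added plugin line takes effect.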
From: Greg W. <gw...@la...> - 2004-10-10 05:11:24
|
Good luck.

The problem is probably with rsh. I seem to remember that it doesn't copy your environment to the remote machine, and because bash is not run as a login shell your .bash_profile is not run. You might be able to convince it to run a .bashrc though.

Greg

On Oct 9, 2004, at 7:26 PM, Gustavo Gobi Martinelli wrote:

> Well,
>
> Let´s I explain my project.
>
> I have to make a beowulf cluster using: PVM, MPICH and openMOSIX.
>
> I installed BPROC, because MPI needs it. And a cluster of 64 machines installed
> in a University (UFES) here, use BPROC with MPI.
>
> I have to study this 03 tecnologies. The first one is PVM.
>
> PVM doesn't use BPROC, but it is already installed because in the future I will
> use MPI.
>
> My problem is the rsh session on a remote host.
>
> If there are variables on this session, I have to know where its was declared,
> so I will include the variable that I need. That´s the point.
>
> --
> Regards,
> Gustavo Gobi Martinelli
> Linux User# 270627
>
> Quoting Greg Watson <gw...@la...>:
>
>> Why are you using rsh? The whole idea behind bproc is that you don't
>> need to log into the nodes, ever. Use the bpsh command if you want to
>> run something on a node.
>>
>> Greg
>>
>> On Oct 9, 2004, at 2:27 PM, Gustavo Gobi Martinelli wrote:
>>
>>> My beowulf cluster is already finish.
>>>
>>> But I'm having a problem that I really don´t know what I have to do.
>>>
>>> When I execute the command:
>>>
>>> # rsh 192.168.0.1 'set'
>>>
>>> I can see some variables that are a default system variables
>>>
>>> But I have to see the variables that is on .bash_profile and
>>> /etc/profile
>>>
>>> If I execute the
>>>
>>> # rsh 192.168.0.1
>>>
>>> The login occurs, and after, I execute
>>>
>>> # set
>>>
>>> In this way I can see the variables that I need.
>>>
>>> But why can´t I see if I execute
>>>
>>> # rsh 192.168.0.1 'set' ??????
>>>
>>> Someone, please help. I lost my day on this problem. :o)
>>>
>>> --
>>> Regards,
>>> Gustavo Gobi Martinelli
>>> Linux User# 270627 |
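The difference Greg describes can be reproduced locally without rsh. The sketch below assumes bash and uses a throwaway HOME directory; `show_demo_var` and `DEMO_VAR` are made-up names. Here `bash -c` stands in for what `rsh host 'cmd'` ends up running (a non-login shell), and `bash -l -c` for a login shell.

```shell
#!/bin/bash
# Print the value of DEMO_VAR as seen by a fresh bash started with the
# given flags, using a temporary HOME whose .bash_profile sets DEMO_VAR.
# (Hypothetical demo helper; pass -l to make bash act as a login shell.)
show_demo_var() {
    local home
    home=$(mktemp -d)
    echo 'export DEMO_VAR=from_profile' > "$home/.bash_profile"
    # tail -n 1 guards against anything /etc/profile might print first
    HOME="$home" bash "$@" -c 'echo "${DEMO_VAR:-unset}"' 2>/dev/null | tail -n 1
    rm -rf "$home"
}

show_demo_var        # non-login shell: .bash_profile is not read
show_demo_var -l     # login shell: /etc/profile, then ~/.bash_profile
```

The first call should report the variable as unset and the second should show the value from .bash_profile. If the same holds on the remote host, something like `rsh 192.168.0.1 'bash -l -c set'` may be enough to see the missing variables, though that has not been tried on the setup discussed here.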
From: Gustavo G. M. <gu...@ma...> - 2004-10-10 01:26:46
|
Well, let me explain my project.

I have to build a Beowulf cluster using PVM, MPICH and openMosix.

I installed BPROC because MPI needs it, and a 64-machine cluster installed at a university here (UFES) uses BPROC with MPI.

I have to study these three technologies. The first one is PVM.

PVM doesn't use BPROC, but BPROC is already installed because in the future I will use MPI.

My problem is the rsh session on a remote host.

If there are variables in this session, I have to know where they were declared, so that I can add the variable that I need. That's the point.

--
Regards,
Gustavo Gobi Martinelli
Linux User# 270627

Quoting Greg Watson <gw...@la...>:

> Why are you using rsh? The whole idea behind bproc is that you don't
> need to log into the nodes, ever. Use the bpsh command if you want to
> run something on a node.
>
> Greg
>
> On Oct 9, 2004, at 2:27 PM, Gustavo Gobi Martinelli wrote:
>
> > My beowulf cluster is already finish.
> >
> > But I'm having a problem that I really don´t know what I have to do.
> >
> > When I execute the command:
> >
> > # rsh 192.168.0.1 'set'
> >
> > I can see some variables that are a default system variables
> >
> > But I have to see the variables that is on .bash_profile and
> > /etc/profile
> >
> > If I execute the
> >
> > # rsh 192.168.0.1
> >
> > The login occurs, and after, I execute
> >
> > # set
> >
> > In this way I can see the variables that I need.
> >
> > But why can´t I see if I execute
> >
> > # rsh 192.168.0.1 'set' ??????
> >
> > Someone, please help. I lost my day on this problem. :o)
> >
> > --
> > Regards,
> > Gustavo Gobi Martinelli
> > Linux User# 270627 |
From: Gustavo G. M. <gu...@ma...> - 2004-10-09 20:27:30
|
My Beowulf cluster is already finished, but I have a problem that I really don't know how to solve.

When I execute the command:

# rsh 192.168.0.1 'set'

I see only some default system variables. But I need to see the variables that are set in .bash_profile and /etc/profile.

If I execute

# rsh 192.168.0.1

the login occurs, and if I then execute

# set

I can see the variables that I need.

So why can't I see them when I execute

# rsh 192.168.0.1 'set' ?

Could someone please help? I lost my whole day on this problem. :o)

--
Regards,
Gustavo Gobi Martinelli
Linux User# 270627
|
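A note on the behaviour described in this message: when rsh is given a command, the remote shell runs as a non-interactive, non-login shell, and bash reads /etc/profile and ~/.bash_profile only for login shells. The sketch below reproduces that difference locally, without rsh. The temporary home directory and the variable name MY_CLUSTER_VAR are made up for illustration.

```shell
#!/bin/bash
# Local demonstration; no rsh or remote host is needed.
# MY_CLUSTER_VAR and the temporary HOME are hypothetical names.
unset MY_CLUSTER_VAR
tmp_home=$(mktemp -d)
echo 'export MY_CLUSTER_VAR=from_profile' > "$tmp_home/.bash_profile"

# Non-login shell: roughly what "rsh host 'set'" ends up running.
nonlogin=$(HOME="$tmp_home" bash -c 'echo "${MY_CLUSTER_VAR:-unset}"' | tail -n 1)

# Login shell: reads /etc/profile first, then ~/.bash_profile.
login=$(HOME="$tmp_home" bash -l -c 'echo "${MY_CLUSTER_VAR:-unset}"' | tail -n 1)

echo "non-login shell: $nonlogin"
echo "login shell:     $login"
rm -rf "$tmp_home"
```

On the cluster, the equivalent would probably be to ask for a login shell explicitly, for example rsh 192.168.0.1 'bash -l -c set', or to source the profile files inside the quoted command.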
From: Daniel P. <dan...@gm...> - 2004-10-09 20:25:57
|
BPROC mailing list subscribers,

I'm a beginner with Linux clusters, and a warning message shows up every time I run mpirun. I'm using bproc-4.0.0pre6 and a 2.6.7 kernel. I'm using mpich-1.2.5.3, and applied the patches I found in cmtools-1.4 (there was an error when trying to patch the ch_gm section, but I'm not using gm so I ignored the error). I also built the cmtools mpirun without a problem.

I can run the test cpi program on my 14 node cluster without a problem, but whenever I do, I see this warning about not being able to find the number of cpus on each machine:

wulfman@wulfpack ~/cmtools-1.4/mpirun $ ./mpirun --p4 --np 14 ~/mpich-1.2.5.3/examples/basic/cpi
Could not get number of cpus for node 1, assuming 1
Could not get number of cpus for node 2, assuming 1
Could not get number of cpus for node 3, assuming 1
Could not get number of cpus for node 4, assuming 1
Could not get number of cpus for node 5, assuming 1
Could not get number of cpus for node 6, assuming 1
Could not get number of cpus for node 7, assuming 1
Could not get number of cpus for node 8, assuming 1
Could not get number of cpus for node 9, assuming 1
Could not get number of cpus for node 10, assuming 1
Could not get number of cpus for node 11, assuming 1
Could not get number of cpus for node 12, assuming 1
Could not get number of cpus for node 13, assuming 1
Could not get number of cpus for node 14, assuming 1
Process 0 of 14 on localhost
pi is approximately 3.1415926.... yadda yadda
wall clock time = 1.33525
Process 2 of 14 on localhost
...
...

I get this output (about not being able to find the number of cpus for each cluster node) for any MPI program I run. Is there something I did wrong during the install? Has anyone else had this problem before?

-Daniel Purcell
|
From: Ted S. <tsa...@cr...> - 2004-10-06 18:33:28
|
Michal Jaegermann wrote:

>On Wed, Oct 06, 2004 at 11:43:18AM -0400, Ted Sariyski wrote:
>
>>Here is what I did:
>>
>>1. I moved e100.ko out of /lib/modules/2.6.7/...
>>2. Compiled a kernel only with eepro and without e100.
>
>I do not understand. If you had done 2 then there was nothing
>to move in point 1. I would have rather use e100 but that is
>your choice.

I did the steps one at a time: compile a kernel, then make boot images. The only reason I use eepro is that the master node uses eepro.

>>3. Included eepro in the kernel and tg3 as a module.
>
>Do you have two network interfaces on your nodes? Are you trying
>to use both while booting? What for?

There are three interfaces on the motherboard: one 100BaseT (eepro) and two 1Gb (tg3). I use eepro for remote connection, eth1 is configured for the cluster subnet, and eth2 is not used and not connected.

>Why instead of recompiling kernels you will not use '--noautomod'
>and configure explicitly which modules are used in your
>config.boot? Quicker and allows to try more combinations. Of
>course you have to create new images for nodes if you are changing
>configurations.

I do use --noautomod. With each build I usually try both with and without it. It is supposed to prevent the automatic gathering of all the network drivers, but it doesn't: if there is an e100 module, the boot process still tries to load it, with or without --noautomod.

>>4. Include both eepro and tg3 in the kernel.
>
>????

Just a desperate attempt :)

>>The boot process always ends with "socket: Address family not supported".
>
>My guess would be that either you have a cable on a node hooked
>up to one interface while your setup is attempting to use another
>one or a version of a driver you happen to use has broadcast
>troubles. Have you done a bit of a research on that?

I'll do it now.

>While you will not configure yourself to use only ONE interface
>on a node? At least until you will find out what is going on.

So far I have only one master and one node. I'm going to add a few more nodes and try with another one. Only one MAC address is listed in /etc/clustermatic/config and only one interface is physically connected to the net. There are probably jumpers on the motherboard that I could use to disable all but one interface. Is that what you mean?

>>In the case 4 I'm pretty sure that I can see a message that tg3
>>passed the test but it scrolls too fast.
>
>You can always use a serial console on a node and hook up a terminal
>emulator with a screen capture while debugging.
>
>>By the way, is there a way (like a hot key, e.g. 'pause') that
>>will allow to scroll the screen messages back and forth?
>
>Ctrl-S should stop it if you have a keyboard hooked up. If
>a normal Shift-PgUp works from such keyboard I do not remember
>right now (but most likely it does).
>
>Did you try to check if there is anything in your logs? Various
>interesting things can often be found there.

The only strange record is:

in.tftpd[19628]: tftp: client does not accept options

Is it strange enough to cause the problem?

>>During the compilation I got the following warnings:
>>
>>beoboot:
>>boot.h:105: warning: conflicting types for built-in function `log'
>>modhelper.c:239: warning: field precision is not type int (arg 2)
>>boot.h:105: warning: conflicting types for built-in function `log'
>>nfsmount_xdr.c:46: warning: unused variable `buf'
>
>So? In the worst case you have an editor.
>
>>bproc:
>>/root/clustermatic/2.6.7/bproc-4.0.0pre6/kernel/masq.c:937:2: warning:
>>#warning "This is almost certainly busted."
>
>Quite possibly true. Still works (usually?). Erik clearly left
>that as a message to himself but that point is far from obvious.
>
>>bpstat.c:303:2: warning: #warning print_node_number is busted.
>>bpstat.c:279: warning: unused variable `sz'
>>bpstat.c:281: warning: unused variable `addr'
>
>And these affect you how?

I don't know, I'm just trying to provide more information.

>>I tried to check out beoboot but got an error:
>
>This is a script (pretty standard) I am using to get that stuff:
>
>#!/bin/sh
>
>modules="
>bproc
>beoboot
>bjs
>"
>
>## cvs -d:pserver:ano...@cv...:/cvsroot/bproc login
>for modulename in $modules ; do
>cvs -z3 -d:pserver:ano...@cv...:/cvsroot/bproc co $modulename
>done
>
>Type anything if using login and asked for a password. It worked a minute ago.
>
> Michal
|
From: Steven J. <py...@li...> - 2004-10-05 20:42:32
|
Greetings,

It sounds like it's getting messed up by having two modules for the same PCI IDs. You may have to get rid of one or the other by moving it out of /lib/modules while you run beoboot.

G'day,
sjames

On Tue, 5 Oct 2004, Ted Sariyski wrote:

> It was a misconfiguration, eth1 and eth2 were configured to the same
> IP. After I fix it the diskless node loads successfully tg3 and eepro.
> The problem is that it proceeds with loading e100. There is no e100
> interface and it fails with "socket: Address family not supported".
> I do not have e100 in config.boot. I tried with and without
> --noautomod without success. I am unable to find 'e100' in the code,
> so if it is hardwired it's not transparent. What is the next move?
> Thanks, Ted
>
> P.S. Where is the cvs repository for beoboot?
>
> Michal Jaegermann wrote:
>
> >On Tue, Oct 05, 2004 at 11:48:49AM -0400, Ted Sariyski wrote:
> >
> >>>At level 2 the diskless node loads bproc module correctly
> >>>but complains that cannot find supermon_proc module.
> >>>
> >>Ok, supermon_proc is optional but is not controlled from
> >>config.boot. It's hardwired in beoboot's boot.c:
> >>
> >> /* Load the modules we require */
> >> mod = module_get("bproc");
> >> modprobe(mod, mod->args);
> >> mod = module_get("supermon_proc");
> >> modprobe(mod, mod->args);
> >
> >This code is a bit different in a version of beoboot from the
> >current cvs tree.
> >
> > /* Load the modules we require */
> > mod = module_get(0, "bproc");
> > if (!mod) fatal("Failed to load the bproc module.\n");
> > modprobe(mod, mod->args);
> >
> > /* Load optional modules */
> > mod = module_get(0, "supermon_proc");
> > if (mod)
> > modprobe(mod, mod->args);
> >
> >>Now the boot process dies with (grabbed from the screen):
> >....
> >>Installing module tg3: tg3.c (ver. 3.6)
> >>eth0: Tigon3 [partno BCM9570] [ver. 2003] [PCIX: 100MHz:64 bit
> >>10/100/1000BaseT] [PHY 15704]
> >>socket: Address family not supported by protocol
> >
> >Broadcast troubles? Do you have some fancy switch between master
> >and nodes?
> >
> > Michal

Linux Labs International, Inc.
Steven James, CTO
55 Marietta Street Suite 1830
Atlanta, Ga 30303
866 824 9737 support
|
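The suggestion in this message, moving one of the two competing driver modules out of /lib/modules while beoboot runs, can be sketched as a pair of helper functions. This is only an illustration under stated assumptions: the function names, the stash directory and the scratch module tree are hypothetical, and the demo deliberately works on a temporary directory rather than on the real /lib/modules.

```shell
#!/bin/sh
# Sketch only: temporarily hide one driver module while boot images are built.
# stash_module, restore_module and all paths below are hypothetical names.

# stash_module MODULE_TREE MODULE_FILE STASH_DIR
# Moves the first matching module file into STASH_DIR and records its origin.
stash_module() {
    path=$(find "$1" -type f -name "$2" | head -n 1)
    [ -n "$path" ] || { echo "no $2 under $1" >&2; return 1; }
    mkdir -p "$3" && echo "$path" > "$3/$2.origin" && mv "$path" "$3/$2"
}

# restore_module MODULE_FILE STASH_DIR
# Puts a stashed module back where it was found.
restore_module() {
    origin=$(cat "$2/$1.origin") || return 1
    mv "$2/$1" "$origin" && rm -f "$2/$1.origin"
}

# Demo on a scratch tree that imitates /lib/modules/2.6.7.
demo=$(mktemp -d)
mkdir -p "$demo/lib/modules/2.6.7/kernel/drivers/net"
: > "$demo/lib/modules/2.6.7/kernel/drivers/net/e100.ko"

stash_module "$demo/lib/modules" e100.ko "$demo/stash"
# ... this is the point where the node boot image would be rebuilt ...
hidden=$(find "$demo/lib/modules" -name e100.ko | wc -l | tr -d ' ')
restore_module e100.ko "$demo/stash"
restored=$(find "$demo/lib/modules" -name e100.ko | wc -l | tr -d ' ')

echo "copies while stashed: $hidden, after restore: $restored"
rm -rf "$demo"
```

On a real master node the tree would be /lib/modules/2.6.7, and it is probably wise to rerun depmod after moving a module in either direction.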
From: Michal J. <mi...@ha...> - 2004-10-05 20:35:20
|
On Tue, Oct 05, 2004 at 02:48:48PM -0400, Ted Sariyski wrote:
> It was a misconfiguration, eth1 and eth2 were configured to the
> same IP. After I fix it the diskless node loads successfully tg3
> and eepro. The problem is that it proceeds with loading e100.

This is somewhat excessive, as both e100 and eepro support the same hardware (and you should rather use e100). A comment in /etc/clustermatic/config.boot says:

# Beoboot will now automatically gather all the network drivers it can
# find. Use --noautomod to prevent this behavior.

which likely means that you configured both modules for your kernel and got beoboot confused. You may fix that with '--noautomod' when making your kernel images for nodes, but then you will have to specify what you want loaded, and this likely means only tg3.

> There is no e100 interface and it fails with "socet: Address
> family not supported".

Well, there is an interface where this module could be used, but it is already "taken".

> P.S. Where is the cvs repository for beoboot?

http://sourceforge.net/projects/bproc/

Look at the bottom of that page, and also at a message from Erik on the 17th of August, in the archive of this list, with a big "ANNOUNCE" in the subject.

Michal
|
From: Ted S. <tsa...@cr...> - 2004-10-05 18:52:16
|
It was a misconfiguration: eth1 and eth2 were configured to the same IP. After I fixed it, the diskless node loads tg3 and eepro successfully. The problem is that it then proceeds with loading e100. There is no e100 interface and it fails with "socket: Address family not supported". I do not have e100 in config.boot. I tried with and without --noautomod, without success. I am unable to find 'e100' in the code, so if it is hardwired it's not transparent. What is the next move?

Thanks,
Ted

P.S. Where is the cvs repository for beoboot?

Michal Jaegermann wrote:

>On Tue, Oct 05, 2004 at 11:48:49AM -0400, Ted Sariyski wrote:
>
>>>At level 2 the diskless node loads bproc module correctly
>>>but complains that cannot find supermon_proc module.
>>>
>>Ok, supermon_proc is optional but is not controlled from
>>config.boot. It's hardwired in beoboot's boot.c:
>>
>> /* Load the modules we require */
>> mod = module_get("bproc");
>> modprobe(mod, mod->args);
>> mod = module_get("supermon_proc");
>> modprobe(mod, mod->args);
>
>This code is a bit different in a version of beoboot from the
>current cvs tree.
>
> /* Load the modules we require */
> mod = module_get(0, "bproc");
> if (!mod) fatal("Failed to load the bproc module.\n");
> modprobe(mod, mod->args);
>
> /* Load optional modules */
> mod = module_get(0, "supermon_proc");
> if (mod)
> modprobe(mod, mod->args);
>
>>Now the boot process dies with (grabbed from the screen):
>....
>>Installing module tg3: tg3.c (ver. 3.6)
>>eth0: Tigon3 [partno BCM9570] [ver. 2003] [PCIX: 100MHz:64 bit
>>10/100/1000BaseT] [PHY 15704]
>>socket: Address family not supported by protocol
>
>Broadcast troubles? Do you have some fancy switch between master
>and nodes?
>
> Michal
|
From: Michal J. <mi...@ha...> - 2004-10-05 16:34:48
|
On Tue, Oct 05, 2004 at 11:48:49AM -0400, Ted Sariyski wrote:
> > At level 2 the diskless node loads bproc module correctly
> > but complains that cannot find supermon_proc module.
> >
> Ok, supermon_proc is optional but is not controlled from
> config.boot. It's hardwired in beoboot's boot.c:
>
>     /* Load the modules we require */
>     mod = module_get("bproc");
>     modprobe(mod, mod->args);
>     mod = module_get("supermon_proc");
>     modprobe(mod, mod->args);

This code is a bit different in a version of beoboot from the current cvs tree.

    /* Load the modules we require */
    mod = module_get(0, "bproc");
    if (!mod) fatal("Failed to load the bproc module.\n");
    modprobe(mod, mod->args);

    /* Load optional modules */
    mod = module_get(0, "supermon_proc");
    if (mod)
        modprobe(mod, mod->args);

> Now the boot process dies with (grabbed from the screen):
....
> Installing module tg3: tg3.c (ver. 3.6)
> eth0: Tigon3 [partno BCM9570] [ver. 2003] [PCIX: 100MHz:64 bit
> 10/100/1000BaseT] [PHY 15704]
> socket: Address family not supported by protocol

Broadcast troubles? Do you have some fancy switch between master and nodes?

Michal
|
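The difference between the two boot.c fragments in this message is that the newer one treats bproc as required and supermon_proc as optional. The same pattern, written as a shell sketch, looks roughly like this. It is not beoboot's code: LOADER, fake_loader, load_required and load_optional are hypothetical names, and the fake loader exists only so the logic can be tried without root or a real module tree.

```shell
#!/bin/sh
# Sketch of the required-vs-optional pattern, not beoboot's actual code.
# LOADER stands in for the real module loader; on a node it would be
# something like "modprobe". All function names here are made up.
LOADER=${LOADER:-fake_loader}

# Pretend that only bproc is available in this boot image.
fake_loader() { [ "$1" = "bproc" ]; }

# A missing required module is an error the caller must handle.
load_required() {
    "$LOADER" "$1" || { echo "fatal: failed to load $1" >&2; return 1; }
}

# A missing optional module is reported and then ignored.
load_optional() {
    "$LOADER" "$1" || echo "note: optional module $1 not loaded, continuing"
}

load_required bproc
required_status=$?
load_optional supermon_proc
optional_status=$?
echo "required: $required_status, optional: $optional_status"
```

With this split, a boot image that lacks supermon_proc no longer stops at that step, which matches what the newer fragment does with its if (mod) check.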
From: Ted S. <tsa...@co...> - 2004-10-05 15:53:14
|
> At level 2 the diskless node loads bproc module correctly
> but complains that cannot find supermon_proc module.

OK, supermon_proc is optional but is not controlled from config.boot. It's hardwired in beoboot's boot.c:

    /* Load the modules we require */
    mod = module_get("bproc");
    modprobe(mod, mod->args);
    mod = module_get("supermon_proc");
    modprobe(mod, mod->args);

I commented out the last two lines and was able to move one step ahead. Now the boot process dies with (grabbed from the screen):

...
TCP: Hash tables configured ...
...
NET: Registered protocol family 2
...
NET: Registered protocol family 1
...
bproc: BProc ... (version info)
Installing module tg3: tg3.c (ver. 3.6)
eth0: Tigon3 [partno BCM9570] [ver. 2003] [PCIX: 100MHz:64 bit 10/100/1000BaseT] [PHY 15704]
socket: Address family not supported by protocol

tg3 is the right module. What am I missing now?

Thanks,
Ted

Ted Sariyski wrote:

> Hi,
>
> I sent this mail a week or so ago but didn't get a response. I have CM4
> running on a P4 cluster. Now I'm trying to move from CM4 to a newer
> custom kernel 2.6.7 (for a new x86_64 Opteron cluster). I compiled
> 2.6.7 kernel with bproc-4.0.0pre6, cmtools-1.4, beoboot-cm1.9 and
> bjs-1.5. I generated boot level 1 kernel/initrd and boot level 2
> boot.img. At level 2 the diskless node loads bproc module correctly
> but complains that cannot find supermon_proc module. I am able to
> compile all the objects from supermon-1.4-sc2003 except the kernel
> modules, where supermon_proc module is (errors are attached). I tried
> everything that comes to my head but I really stuck there. What do I
> do wrong? Any help will be highly appreciated.
> Thanks, Ted
>
> [root@xtreme101 kernel]# make
> gcc -Wall -Wstrict-prototypes -O2 -fomit-frame-pointer -D__KERNEL__ -DMODULE -I. -I/usr/src/linux-2.6.7/include -c supermon_proc.c
> supermon_proc.c:43:31: linux/modversions.h: No such file or directory
> supermon_proc.c: In function `supermon_proc_doinfo':
> supermon_proc.c:258: warning: `MOD_INC_USE_COUNT' is deprecated (declared at /usr/src/linux-2.6.7/include/linux/module.h:529)
> supermon_proc.c:265: warning: `MOD_DEC_USE_COUNT' is deprecated (declared at /usr/src/linux-2.6.7/include/linux/module.h:541)
> supermon_proc.c: In function `supermon_convert_value':
> supermon_proc.c:389: error: `kstat' undeclared (first use in this function)
> supermon_proc.c:389: error: (Each undeclared identifier is reported only once
> supermon_proc.c:389: error: for each function it appears in.)
> supermon_proc.c:407: warning: long unsigned int format, different type arg (arg 4)
> supermon_proc.c:407: warning: long unsigned int format, different type arg (arg 5)
> supermon_proc.c:407: warning: long unsigned int format, different type arg (arg 6)
> supermon_proc.c:407: warning: int format, pointer arg (arg 7)
> supermon_proc.c: In function `supermon_proc_dovalue':
> supermon_proc.c:466: warning: `MOD_INC_USE_COUNT' is deprecated (declared at /usr/src/linux-2.6.7/include/linux/module.h:529)
> supermon_proc.c:470: warning: `MOD_DEC_USE_COUNT' is deprecated (declared at /usr/src/linux-2.6.7/include/linux/module.h:541)
> make: *** [supermon_proc.o] Error 1
|
From: Ted S. <tsa...@co...> - 2004-10-04 23:13:23
|
Hi,

I sent this mail a week or so ago but didn't get a response.

I have CM4 running on a P4 cluster. Now I'm trying to move from CM4 to a newer custom kernel 2.6.7 (for a new x86_64 Opteron cluster). I compiled the 2.6.7 kernel with bproc-4.0.0pre6, cmtools-1.4, beoboot-cm1.9 and bjs-1.5. I generated the boot level 1 kernel/initrd and the boot level 2 boot.img. At level 2 the diskless node loads the bproc module correctly but complains that it cannot find the supermon_proc module. I am able to compile all the objects from supermon-1.4-sc2003 except the kernel modules, which is where supermon_proc lives (errors are attached). I have tried everything I can think of, but I am really stuck. What am I doing wrong? Any help will be highly appreciated.

Thanks,
Ted

[root@xtreme101 kernel]# make
gcc -Wall -Wstrict-prototypes -O2 -fomit-frame-pointer -D__KERNEL__ -DMODULE -I. -I/usr/src/linux-2.6.7/include -c supermon_proc.c
supermon_proc.c:43:31: linux/modversions.h: No such file or directory
supermon_proc.c: In function `supermon_proc_doinfo':
supermon_proc.c:258: warning: `MOD_INC_USE_COUNT' is deprecated (declared at /usr/src/linux-2.6.7/include/linux/module.h:529)
supermon_proc.c:265: warning: `MOD_DEC_USE_COUNT' is deprecated (declared at /usr/src/linux-2.6.7/include/linux/module.h:541)
supermon_proc.c: In function `supermon_convert_value':
supermon_proc.c:389: error: `kstat' undeclared (first use in this function)
supermon_proc.c:389: error: (Each undeclared identifier is reported only once
supermon_proc.c:389: error: for each function it appears in.)
supermon_proc.c:407: warning: long unsigned int format, different type arg (arg 4)
supermon_proc.c:407: warning: long unsigned int format, different type arg (arg 5)
supermon_proc.c:407: warning: long unsigned int format, different type arg (arg 6)
supermon_proc.c:407: warning: int format, pointer arg (arg 7)
supermon_proc.c: In function `supermon_proc_dovalue':
supermon_proc.c:466: warning: `MOD_INC_USE_COUNT' is deprecated (declared at /usr/src/linux-2.6.7/include/linux/module.h:529)
supermon_proc.c:470: warning: `MOD_DEC_USE_COUNT' is deprecated (declared at /usr/src/linux-2.6.7/include/linux/module.h:541)
make: *** [supermon_proc.o] Error 1
|
From: Steven J. <py...@li...> - 2004-10-04 18:30:10
|
Greetings,

You need to:

mkdir /bpfs
mount -t bpfs /bpfs

G'day,
sjames

On Mon, 4 Oct 2004, Gustavo Gobi Martinelli wrote:

> Someone knows something about this error?
>
> bproc_notifies: No such file or directory
>
> I could install the bproc-4.0.0pre6 in the kernel ok and install it on my
> computer ok too.
>
> I executed:
>
> # insmod bproc.ko
> # insmod vmadump.ko
> # ldconfig
>
> And it is ok.
>
> So I check it with:
>
> #lsmod
> #ldconfig -p | grep bproc
>
> It is ok too.
>
> I can execute
> bpctl, bplib, bpmaster, bpslave, bpsh, bpcp. But bpstat returns this error for
> me:
>
> bproc_notifier: No such file or directory.
>
> I'm already standalone, but the other commands execute without an error. Why
> bpstat returns this?
>
> --
> Atenciosamente,
> Gustavo Gobi Martinelli
> Linux User# 270627

Linux Labs International, Inc.
Steven James, CTO
55 Marietta Street Suite 1830
Atlanta, Ga 30303
866 824 9737 support
|
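Before or after running the two commands in this reply, it can help to check whether a bpfs filesystem is mounted at all. The helper below is a minimal sketch that reads the mounts table; the function name fs_is_mounted is made up, and the optional second argument exists only so the check can be tried against a sample file.

```shell
#!/bin/sh
# fs_is_mounted FSTYPE [MOUNTS_FILE]
# Succeeds if a filesystem of the given type appears in the mounts table.
# MOUNTS_FILE defaults to /proc/mounts, where the third field is the type.
# The function name is hypothetical; it is not part of BProc.
fs_is_mounted() {
    awk -v want="$1" '$3 == want { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

if fs_is_mounted bpfs 2>/dev/null; then
    echo "bpfs is mounted"
else
    echo "bpfs is not mounted"
fi
```

If mount complains about a missing device argument, a placeholder such as none is commonly used for virtual filesystems (mount -t bpfs none /bpfs); that is a general mount convention and an assumption here, not something this thread confirms for bpfs.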
From: Gustavo G. M. <gu...@ma...> - 2004-10-04 17:15:38
|
Does anyone know something about this error?

bproc_notifies: No such file or directory

I applied bproc-4.0.0pre6 to the kernel without problems and installed it on my computer without problems too.

I executed:

# insmod bproc.ko
# insmod vmadump.ko
# ldconfig

and it went fine. Then I checked it with:

# lsmod
# ldconfig -p | grep bproc

and that is fine too.

I can execute bpctl, bplib, bpmaster, bpslave, bpsh and bpcp. But bpstat returns this error:

bproc_notifier: No such file or directory.

I'm running standalone so far, but the other commands execute without an error. Why does bpstat return this?

--
Regards,
Gustavo Gobi Martinelli
Linux User# 270627
|
From: Eugenio J. Y. <eji...@ds...> - 2004-10-01 09:15:16
|
Hi all

I'm trying to move from CM4 to a newer custom kernel (2.6.7 seems to be the limit for bproc). Of course, I have set up two test nodes, one master and one slave (I don't want to hear my users crying).

The master node is a RH9 box with a custom compiled gcc 3.4.0. I compiled a bproc enabled kernel (2.6.7), bproc4.0pre6, cmtools1.4 and bjs1.5 without a glitch. Beoboot-cm1.9 was a bit difficult; I cannot compile the files in the monte directory. If I put HAVE_MONTE=n in the main Makefile, I can compile and install all the utilities, generate cdrom phase2 images (lovely -e option) and boot my slave node... but I really would like to have monte for my future 2.6.7 cluster (one image for all nodes is my idea of an admin's dream).

This is the output of the failed compilation:

make -C monte LINUX=/lib/modules/2.6.7-nodo/build EXTRAKDEFS="" kmonte.ko
make[1]: Entering directory `/usr/src/Bproc/beoboot-cm1.9/monte'
gcc -Wall -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -pipe -msoft-float -mpreferred-stack-boundary=2 -fno-unit-at-a-time -march=pentium4
-I/lib/modules/2.6.7-nodo/build/include/asm-i386/mach-default
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
What does this include directive mean?
-O2 -fomit-frame-pointer -Wdeclaration-after-statement -DMODULE -DPACKAGE_VERSION='"cm1.9"' -I. -c -o kmonte.o kmonte.c
In file included from /usr/include/linux/sched.h:14,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
including kernel headers from /usr/include/linux???
                 from /usr/include/linux/mm.h:4,
                 from /usr/include/linux/pagemap.h:10,
                 from kmonte.c:66:
/usr/include/linux/timex.h:173: error: field `time' has incomplete type
....a bunch of errors omitted here...............................

If I change that include line to /lib/modules/2.6.7-nodo/build/include (kernel headers):

make -C monte LINUX=/lib/modules/2.6.7-nodo/build EXTRAKDEFS="" kmonte.ko
make[1]: Entering directory `/usr/src/Bproc/beoboot-cm1.9/monte'
gcc -Wall -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -pipe -msoft-float -mpreferred-stack-boundary=2 -fno-unit-at-a-time -march=pentium4 -I/lib/modules/2.6.7-nodo/build/include/ -O2 -fomit-frame-pointer -Wdeclaration-after-statement -DMODULE -DPACKAGE_VERSION='"cm1.9"' -I. -c -o kmonte.o kmonte.c
In file included from /lib/modules/2.6.7-nodo/build/include/linux/spinlock.h:16,
                 from /lib/modules/2.6.7-nodo/build/include/linux/jiffies.h:6,
                 from /lib/modules/2.6.7-nodo/build/include/linux/sched.h:12,
                 from /lib/modules/2.6.7-nodo/build/include/linux/module.h:10,
                 from kmonte.c:65:
/lib/modules/2.6.7-nodo/build/include/asm/processor.h: In function `load_esp0':
/lib/modules/2.6.7-nodo/build/include/asm/processor.h:459: warning: implicit declaration of function `unlikely'
In file included from /lib/modules/2.6.7-nodo/build/include/linux/jiffies.h:6,
                 from /lib/modules/2.6.7-nodo/build/include/linux/sched.h:12,
                 from /lib/modules/2.6.7-nodo/build/include/linux/module.h:10,
                 from kmonte.c:65:
/lib/modules/2.6.7-nodo/build/include/linux/spinlock.h: In function `bit_spin_lock':
/lib/modules/2.6.7-nodo/build/include/linux/spinlock.h:415: warning: implicit declaration of function `current_thread_info'
/lib/modules/2.6.7-nodo/build/include/linux/spinlock.h:415: error: invalid type argument of `->'
....a bunch of errors omitted here...............................

Any idea?

BTW: I have been using v9fs for CM4 and I like it, but it seems its support stopped at 2.4.22. Is there anything like v9fs but with 2.6 support?

TIA

---------------------------------------------------------------------
Eugenio Jimenez Yguacel        eji...@ds...
E.T.S.I. Telecomunicacion      Tfno: (+34)-28-457368
Campus de Tafira S/N           Fax: (+34)-28-451243
35017 Las Palmas, Spain
Beer, breakfast for champions!
---------------------------------------------------------------------
|
From: Gustavo G. M. <gu...@ma...> - 2004-09-30 19:53:01
|
Luke,

As you suggested, I got kernel 2.6.7 from kernel.org; before that I was using the kernel-source RPM from Fedora's site. I installed the original kernel source and applied the bproc-4.0.0pre6 patch, and it applied without a single error.

Now I'm following the next steps to finish my Beowulf cluster. :o\ Difficult.

Thanks.

--
Regards,
Gustavo Gobi Martinelli
Linux User# 270627
|
From: Luke P. <lop...@wi...> - 2004-09-29 15:08:20
|
Gustavo,

Are you using a kernel downloaded directly from kernel.org? It should just work. I have successfully used patched kernels with this version of bproc, but it required some manual editing (simple stuff if you know C).

-Luke

-----Original Message-----
From: bpr...@li... [mailto:bpr...@li...] On Behalf Of Gustavo Gobi Martinelli
Sent: Wednesday, September 29, 2004 9:44 AM
To: bpr...@li...
Subject: [BProc] bproc-4.0.0pre6 patch error on kernel 2.6.7.

Hi everybody again,

Luke has helped me and this message is the continuation of the problems.

Luke, as you said, I installed the kernel 2.6.7 and recompiled it. Up to this point, OK: the kernel 2.6.7 was compiled.

The next step was to apply bproc-4.0.0pre6. I executed the following command:

patch -f -p1 < ../bproc-4.0.0pre6/patches/bproc-4.0.0pre6

But, again, I got some errors. Even so, I tried to recompile the kernel, but without success.

A question: can I edit the files where the patch failed, looking for the code block in the bproc-4.0.0pre6 patch and pasting it into the kernel file?

The errors applying the patch are:

patching file arch/i386/kernel/i386_ksyms.c
Hunk #1 FAILED at 206.
1 out of 1 hunk FAILED -- saving rejects to file arch/i386/kernel/i386_ksyms.c.rej
patching file arch/i386/kernel/process.c
Hunk #1 FAILED at 36.
1 out of 3 hunks FAILED -- saving rejects to file arch/i386/kernel/process.c.rej
patching file arch/i386/kernel/traps.c
Hunk #1 FAILED at 61.
1 out of 1 hunk FAILED -- saving rejects to file arch/i386/kernel/traps.c.rej
patching file arch/x86_64/kernel/x8664_ksyms.c
Hunk #1 FAILED at 219.
1 out of 1 hunk FAILED -- saving rejects to file arch/x86_64/kernel/x8664_ksyms.c.rej
patching file kernel/sched.c
Hunk #1 FAILED at 40.
1 out of 6 hunks FAILED -- saving rejects to file kernel/sched.c.rej

--
Regards,
Gustavo Gobi Martinelli
Linux User# 270627
|
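Rejected hunks like the ones quoted in this message usually mean the tree being patched differs from the one the patch was made for. One way to find that out before any file is modified is a dry run. The sketch below shows the idea on a tiny scratch tree; the file contents and the patch are made up for the demo and stand in for the real kernel source and BProc patch.

```shell
#!/bin/sh
# Self-contained demo in a scratch directory; all file names are made up.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p tree/kernel
printf 'line one\nline two\nline three\n' > tree/kernel/sched.c

# Build a small unified diff to stand in for the real patch.
cp -r tree tree.new
printf 'line one\nline 2 (patched)\nline three\n' > tree.new/kernel/sched.c
diff -urN tree tree.new > demo.patch

# Step 1: dry run. Failed hunks are reported but no file is touched.
cd tree || exit 1
if patch -p1 --dry-run < ../demo.patch > /dev/null; then
    dry_run=clean
    # Step 2: apply for real only once the dry run is clean.
    patch -p1 < ../demo.patch > /dev/null
else
    dry_run=rejects
fi

patched_line=$(sed -n 2p kernel/sched.c)
echo "dry run: $dry_run; line 2 is now: $patched_line"
```

If the dry run reports rejects, starting again from a pristine source tree is often less work than merging each .rej file by hand; a later message in this thread reports that the patch applied cleanly to the 2.6.7 source from kernel.org.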