From: Daryl W. G. <dw...@la...> - 2004-08-30 14:52:51
|
Attached is a perl script that I use to do this en masse on our clusters.
I call it bpformat, and it expects a config file of your partition scheme
in /etc/bpformat.conf (but you can override the location/name on the
command line). Type 'bpformat --man' to get the man page, esp. for
examples of what the config file should look like.

Daryl

> Date: Sat, 28 Aug 2004 22:06:45 +0000 (UTC)
> From: Steven James <py...@li...>
> To: Luke Palmer <lop...@wi...>
> cc: bpr...@li...
> Subject: Re: [BProc] partitioning, scyld
>
> Greetings,
>
> I don't know what's up with Scyld, but I can address disk partitioning.
>
> I just use mknod to create the needed entries in /dev or add something
> like:
>     plugin miscfiles /dev/hd*
> in /etc/clustermatic/node_up.conf
>
> Then partition one node the way I want it with fdisk, and duplicate it
> with sfdisk:
>
>     bpsh 0 fdisk /dev/hda
>     <make some partitions>
>     bpsh 0 sfdisk -d /dev/hda >part
>     bpsh allup sfdisk /dev/hda <part
>
> G'day,
> sjames
>
> On Sat, 28 Aug 2004, Luke Palmer wrote:
>
> > Hello,
> >
> > This may seem a dumb/obvious question, but I'm trying to figure out how
> > to easily partition disks on slave nodes. I can't just do 'bpsh X
> > fdisk' because the /dev filesystem isn't present on slave nodes. So-
> > what's a good way to partition?
> >
> > Of course I read about beofdisk after some google searches. Now, I know
> > scyld has always liked to more or less hide the ways of getting their
> > software for free, but I can't find scyld ANYWHERE. I recall that
> > penguin computing has something to do with it now- poked around there
> > too with no luck. Can any one comment of the fate of scyld?
> >
> > -Luke
From: Steven J. <py...@li...> - 2004-08-28 22:06:32
|
Greetings,

I don't know what's up with Scyld, but I can address disk partitioning.

I just use mknod to create the needed entries in /dev or add something
like:
    plugin miscfiles /dev/hd*
in /etc/clustermatic/node_up.conf

Then partition one node the way I want it with fdisk, and duplicate it
with sfdisk:

    bpsh 0 fdisk /dev/hda
    <make some partitions>
    bpsh 0 sfdisk -d /dev/hda >part
    bpsh allup sfdisk /dev/hda <part

G'day,
sjames

by Linux Labs International, Inc.
Steven James, CTO

55 Marietta Street
Suite 1830
Atlanta, Ga 30303
866 824 9737 support

On Sat, 28 Aug 2004, Luke Palmer wrote:

> Hello,
>
> This may seem a dumb/obvious question, but I'm trying to figure out how
> to easily partition disks on slave nodes. I can't just do 'bpsh X
> fdisk' because the /dev filesystem isn't present on slave nodes. So-
> what's a good way to partition?
>
> Of course I read about beofdisk after some google searches. Now, I know
> scyld has always liked to more or less hide the ways of getting their
> software for free, but I can't find scyld ANYWHERE. I recall that
> penguin computing has something to do with it now- poked around there
> too with no luck. Can any one comment of the fate of scyld?
>
> -Luke
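The dump-and-replay steps above can be wrapped in a small helper. This is only a sketch: the function name clone_partitions, the DRY_RUN switch and the dump file path are made up for illustration, and it loops over explicit node numbers instead of using 'allup'. With DRY_RUN=1 it prints the commands rather than running them, which is a cheap way to check the plan before touching any disk.

```shell
# Sketch: copy one node's partition table to other nodes via bpsh + sfdisk.
# clone_partitions, DRY_RUN and the dump path are illustrative names only.
# usage: clone_partitions <source-node> <disk> <dump-file> <target-node>...
clone_partitions() {
    src_node=$1
    disk=$2
    dump=$3
    shift 3

    # Step 1: dump the reference table in sfdisk's re-readable format.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "bpsh $src_node sfdisk -d $disk > $dump"
    else
        bpsh "$src_node" sfdisk -d "$disk" > "$dump" || return 1
    fi

    # Step 2: replay the dump on each target node.
    for node in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "bpsh $node sfdisk $disk < $dump"
        else
            bpsh "$node" sfdisk "$disk" < "$dump" || return 1
        fi
    done
}
```

For example, 'DRY_RUN=1 clone_partitions 0 /dev/hda /tmp/part 1 2 3' lists the four commands that would be run. The nodes still need the /dev entries described above before sfdisk can open the disk.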
From: Luke P. <lop...@wi...> - 2004-08-28 20:01:46
|
Hello, This may seem a dumb/obvious question, but I'm trying to figure out how to easily partition disks on slave nodes. I can't just do 'bpsh X fdisk' because the /dev filesystem isn't present on slave nodes. So- what's a good way to partition? Of course I read about beofdisk after some google searches. Now, I know scyld has always liked to more or less hide the ways of getting their software for free, but I can't find scyld ANYWHERE. I recall that penguin computing has something to do with it now- poked around there too with no luck. Can any one comment of the fate of scyld? -Luke |
From: Vipul D. <vip...@ya...> - 2004-08-24 23:01:28
|
Folks,

I have a configuration of 4 identical "compute" nodes (with disks) and
1 slightly different "master" node. The master node has a slightly more
powerful CPU, more RAM, an additional 10/100 NIC interface to connect to
the extranet, a CDROM drive on secondary IDE, and a more recent version
of the BIOS firmware.

With Red Hat 9 (runlevel 3) and Clustermatic4 i386 RPMS installed on
computeNode#1 as master, I can build a cluster with the other 4 nodes as
slaves. (I see one node rebooting every 5-6 minutes that I need to
debug.) I am using this cluster currently.

However, with Red Hat 9 (runlevel 3 or 5; more packages) and
Clustermatic4 i386 RPMS installed on the "master", I cannot get any of
the "compute" nodes up as slaves. The node_up script fails on all nodes
with:

    nodeup : Starting 1 child processes.
    nodeup : Finished creating child processes.
    nodeup : I/O error talking to child
    nodeup : Child process for node 1 died with signal 4

The same config, node_up and config.boot scripts execute in both cluster
configuration attempts - one succeeds and the other fails. Any insight
why this would happen?

Thanks.
Vipul
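As a small aid for reading the last line of that output: the shell can translate a signal number into its name. On Linux, signal 4 is SIGILL (illegal instruction). One possible reading, offered only as a guess, is that something copied from the master to the nodes uses instructions the node CPUs lack; the thread itself does not confirm this. The function name signal_name is illustrative.

```shell
# Sketch: translate a signal number, as reported by nodeup, into its name.
# signal_name is a made-up helper; it only wraps the shell's 'kill -l'.
signal_name() {
    kill -l "$1"
}
```

Running 'signal_name 4' on a Linux master prints the name for the signal reported above.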
From: Adrian <scr...@be...> - 2004-08-23 02:16:08
|
Has anyone used it with a bproc style setup? I'm liking it so far; a
kernel module can be used for pvfs2. Wondering if there are known issues.

Adrian
From: <er...@he...> - 2004-08-20 16:50:51
|
On Wed, Aug 18, 2004 at 01:04:07PM -0600, Michal Jaegermann wrote:
> On Wed, Aug 18, 2004 at 02:08:37PM -0400, Adrian Thiele wrote:
> > I use a /28 on mine, and have not experienced any problems. But I also
> > specify my netmasks.
>
> Well, that patch was really about that that a clustermatic startup
> file ignores your netmask specification and a way to fix that
> without relying on utilities which are present only on some
> installations.

I think it only crops up with IP multicast and broadcast situations.
The script should definitely get it right.

> > Leaving networking config to the OS means less changes are needed
> > for things like ipv6. I do however, see your point and concern
> > for correct configuration of the nodes.

I've checked in Michal's patch to do the ip + netmask -> broadcast
address correctly. I also modified it so that having the IP and netmask
at all is now optional. If you do use it, it burps out a little message
stating that using the clustermatic config to set IP addresses is
deprecated. I think it should be removed at some point.

> > Slightly different topic 2.6.8.1 looks to be fairly sane . I
> > haven`t been able to use 2.6 until now, for a variety of non
> > Bproc issues, mostly the ips driver. 2.6.8.1 built very clean
> > for me yesterday. I`ll try it with Bproc today.
>
> I tried that with bproc-4.0.0pre6 a few days ago. A kernel patch
> needed some adjustments. No idea yet how different that is from
> what is now in CVS. A node came up without any fuss and looked ok
> but bpsh got an indigestion. A filp_open() call errored out.
> I cannot tell much more at the moment.

I have a 2.6.8 patch in CVS at this point. So far it looks functional
to me.

- Erik
From: Michal J. <mi...@ha...> - 2004-08-20 15:40:18
|
As you can find from the sources, the old "kmod" mechanism of inserting
extra modules on cluster nodes is, at least so far, not implemented in
bproc-4.0.0pre... This does not mean that 'modprobe' cannot achieve the
same results. There are some catches (although some would show up
regardless of the method you employ).

Obviously you need the required modules on nodes but also the files
/etc/modprobe.conf.dist and /etc/modprobe. The latter are the easiest to
plant there with the help of 'plugin miscfiles ...' in node_up.conf,
although you may want versions tailored to the needs of nodes, which may
be somewhat different from those of a master.

The other thing is that various modules may have defined 'install'
actions which are often not that simple pieces of shell code, referring
to programs by an absolute path, and bpsh does not take kindly to any of
these. Therefore you should mostly execute modprobe with '--ignore', and
the same for 'modprobe -r', and if any actions are needed then perform
them yourself.

Likely the best explanation of what I mean is some bash code which I
wrote to enable NFS mounts from clients. It assumes that the same kernel
is used on your master and on nodes. If this is not true then adjust
accordingly. The same pattern can be used for loading other module sets.

    #!/bin/bash
    #
    # A sample of setting up NFS modules on a node.
    # It is better if /etc/modules.conf.dist for a node does not
    # define any 'install' actions for these although we can '--ignore'
    #
    # Michal Jaegermann, 2004/Aug/19, mi...@ha...
    #
    node=$1
    mod=nfs

    # The modules.dep entry for $mod: the module itself, then its dependencies.
    modules=$(grep $mod.ko /lib/modules/$(uname -r)/modules.dep)
    modules=${modules/:/}
    # Reverse the list so that dependencies are loaded first.
    modules=$(
        for m in $modules ; do
            echo $m
        done | tac
    )

    # Copy the module files to the node, preserving their paths.
    (
        cd /
        for m in $modules ; do
            echo $m
        done
    ) | ( cd / ; cpio -o -c --quiet ) | bpsh -N $node cpio -imd --quiet
    bpsh $node depmod -a

    for m in $modules ; do
        m=$(basename $m .ko)
        m=${m/_/-}   # names "on-disk" and "processed" may differ
        case $m in
            sunrpc)
                bpsh $node modprobe -i sunrpc
                bpsh $node mkdir -p /var/lib/nfs/rpc_pipefs
                bpsh $node mount | grep -q rpc_pipefs || \
                    bpsh -N $node mount -t rpc_pipefs sunrpc /var/lib/nfs/rpc_pipefs
                ;;
            *)
                bpsh $node modprobe -i $m
                ;;
        esac
    done

This can be, for example, sourced from the /etc/clustermatic/node_up
script, and after it has completed you can do things like
'mount master:/something/exported /my/mount/point'. Otherwise, if NFS
support is not built in, then you see "nfs file system is not supported
by this kernel" or something in that style.

One more note about extra entries in /dev/ you may need on nodes. If you
want access to a disk, say, /dev/hda, then 'device hda' in
/etc/clustermatic/config works fine but the old 'device hda*' now seems
to be ignored. OTOH, in node_up.conf, 'plugin miscfiles /dev/hda*' works
fine, but if you need devices to support, say, lm_sensors then
'plugin miscfiles /dev/i2*' will create on a node only the entries
directly below /dev and you need the more explicit
'plugin miscfiles /dev/i2* /dev/i2o/*' to get everything.

Michal
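The list handling in the script above (take one modules.dep entry, drop the colon, reverse the order so dependencies come first) can be tried on its own without a cluster. The following is a sketch only: load_order is a made-up name, the sample paths in the usage note are invented, and it assumes, as the script does, that reversing the entry yields a workable load order.

```shell
# Sketch: turn one modules.dep entry into module names in load order.
# load_order is an illustrative helper, not part of bproc or modutils.
# $1: a line of the form "/path/target.ko: /path/dep1.ko /path/dep2.ko"
load_order() {
    out=""
    for path in $1; do
        path=${path%:}          # drop the colon that follows the target
        name=${path##*/}        # strip the directory part
        name=${name%.ko}        # strip the .ko suffix
        out="$name${out:+ $out}"   # prepend, which reverses the list
    done
    echo "$out"
}
```

Given a line such as "/lib/modules/2.6.8/kernel/fs/nfs/nfs.ko: /lib/modules/2.6.8/kernel/fs/lockd/lockd.ko /lib/modules/2.6.8/kernel/net/sunrpc/sunrpc.ko", the function lists sunrpc first and nfs last, which is the order in which the loop above would call modprobe.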
From: Michal J. <mi...@ha...> - 2004-08-18 19:05:24
|
On Wed, Aug 18, 2004 at 02:08:37PM -0400, Adrian Thiele wrote:
> I use a /28 on mine, and have not experienced any problems. But I also
> specify my netmasks.

Well, that patch was really about the fact that a clustermatic startup
file ignores your netmask specification, and a way to fix that without
relying on utilities which are present only on some installations.

> Leaving networking config to the OS means less changes are needed
> for things like ipv6. I do however, see your point and concern
> for correct configuration of the nodes.

Yes, there are some tradeoffs both ways. I am not sure if ipv6 is
needed at all for a cluster. At least for now.

> Slightly different topic 2.6.8.1 looks to be fairly sane . I
> haven`t been able to use 2.6 until now, for a variety of non
> Bproc issues, mostly the ips driver. 2.6.8.1 built very clean
> for me yesterday. I`ll try it with Bproc today.

I tried that with bproc-4.0.0pre6 a few days ago. A kernel patch
needed some adjustments. No idea yet how different that is from
what is now in CVS. A node came up without any fuss and looked ok
but bpsh got an indigestion. A filp_open() call errored out.
I cannot tell much more at the moment.

Michal
From: Adrian T. <ad....@gm...> - 2004-08-18 18:08:40
|
I use a /28 on mine, and have not experienced any problems. But I also
specify my netmasks.

Leaving networking config to the OS means fewer changes are needed for
things like ipv6. I do, however, see your point and concern for correct
configuration of the nodes.

Slightly different topic: 2.6.8.1 looks to be fairly sane. I haven't
been able to use 2.6 until now, for a variety of non-Bproc issues,
mostly the ips driver. 2.6.8.1 built very clean for me yesterday. I'll
try it with Bproc today.
From: Michal J. <mi...@ha...> - 2004-08-18 15:41:26
|
On Wed, Aug 18, 2004 at 05:44:59AM -0400, Adrian wrote:
>
> It's late or early , however you look at it, but let me see if I'm
> following along. You may possibly have trouble with the second
> monte image , IF you elect to use non-standard ip-addressing for
> the nodes?

Probably not in "simple" setups but it is hard to predict what needs
may arise and what will be effects of incorrect network parameters.

> Can I assume 'atypical' to be a deviation from the spec.

What specs and which "deviation"? To be more concrete if you will
use an address starting with 10. then, does not matter what network
mask you specified, if ifconfig is not given an explicit mask it
will assume 10.255.255.255 for broadcast, i.e. /8 mask, which is
wrong and contravenes specs. Old "Class A, Class B, Class C"
routing is obsolete and dead for many years. ifconfig could be
"smarter" but it is not and the mask in question could be equally
well 24, or 25 or 22 for that matter, bits wide. "Atypical" above
meant not something which violates standards but it was short for
"cases when ifconfig is screwing up without an explicit help".

> Can anyone possibly give me a legitimate reason for this?

For what?

> I have never needed atypical ip addressing.

In other words "I rely on guessing correctly which implicit
network parameters my interface configuration utility may use"?
I prefer some better defined behaviour which follows standards;
especially when this can be achieved quickly using regular
tools.

> Knowing which interface to "talk" to the nodes is a routing issue.

'beoserv' needs to know where to listen for RARP requests
before nodes will have their network interfaces configured.
No routing yet involved here.

Michal
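To make the 10.x case above concrete, here is a sketch of the legacy classful rule that produces a 10.255.255.255 broadcast when no mask is taken into account. It illustrates the old Class A/B/C defaults only; it is not taken from ifconfig's source, and the function name classful_broadcast is made up.

```shell
# Sketch: broadcast address under the obsolete classful defaults.
# classful_broadcast is an illustrative helper, not a real utility.
classful_broadcast() {
    local IFS
    IFS=.
    set -- $1                       # split the dotted quad into $1..$4
    if   [ "$1" -lt 128 ]; then echo "$1.255.255.255"    # class A, /8
    elif [ "$1" -lt 192 ]; then echo "$1.$2.255.255"     # class B, /16
    elif [ "$1" -lt 224 ]; then echo "$1.$2.$3.255"      # class C, /24
    else return 1                   # multicast/reserved: no default
    fi
}
```

For a node network of 10.0.4.0 with mask 255.255.255.0, the correct broadcast is 10.0.4.255, while the classful guess for 10.0.4.1 is 10.255.255.255. That mismatch is the reason for passing an explicit broadcast to ifconfig.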
From: Adrian <scr...@be...> - 2004-08-18 09:42:43
|
It's late or early, however you look at it, but let me see if I'm
following along. You may possibly have trouble with the second monte
image, IF you elect to use non-standard ip-addressing for the nodes?

Can I assume 'atypical' to be a deviation from the spec? Can anyone
possibly give me a legitimate reason for this?

I work on a very large network with over 580,000 public ip addresses,
and I beoboot like a madman. I have never needed atypical ip addressing.
I have had almost every bizarre problem available. I have never had a
broadcast problem.

Knowing which interface to "talk" to the nodes is a routing issue. This
is not a good idea. It should be handled by the OS mechanism; the chance
for conflict is increased by having two separate processes calculate and
configure the same setting.
From: Michal J. <mi...@ha...> - 2004-08-18 04:22:50
|
On Tue, Aug 17, 2004 at 03:25:06PM -0600, er...@he... wrote:
>
> How about optional? If the addresses are there, do something, else
> ignore the interface and presume some other part of the OS setup has
> configured it.

Hm, I always thought that the reason why this is in a clustermatic
config file is that beoserv, and possibly other pieces, need to know on
which interface to talk to a cluster and how it is configured. If the
other part is not required then indeed this may be configured somewhere
else, but this has a much higher probability that something will get
out-of-sync. If you need to change something you have to remember to
change a few different places.

> I think it's reasonable to phase it out.

This seems like a feasible but quite radical approach and people will
get some "strange" errors. A global search-and-replace on one file will
not be enough to reconfigure networking as you will need to provide a
range of IP numbers for cluster nodes. Even if you do not forget about
that other location, typos are not something unheard of.

> > Yeah, random values are bad.

So here is a promised replacement for finding BCAST without ipcalc.
One of the reasons I used awk is that it lives in /bin/awk (although
some other things presume that /usr/bin is already available).

--- beoboot-cm1.9/rc.clustermatic.bk 2004-08-16 09:45:35.000000000 -0600
+++ beoboot-cm1.9/rc.clustermatic 2004-08-17 21:40:04.016426648 -0600
@@ -155,8 +155,22 @@
             echo >&2 "Error: No netmask given for interface."
             exit 1
         fi
-        #BCAST=`ipcalc --broadcast $ADDR $NMSK | sed -e 's/.*=//'`
-        if ifconfig $IF $ADDR netmask $NMSK; then
+        BCAST=`echo $ADDR $NMSK | awk '
+            function octstonum(octs, a)
+            {
+                split(octs, a, /\./);
+                return ((a[1]*0x100 + a[2])*0x100 + a[3])*0x100 + a[4];
+            }
+            {
+                bcast = or(octstonum($1), xor(octstonum($2), 0xffffffff));
+                printf ("%d.%d.%d.%d",
+                        rshift(and(bcast, 0xff000000), 24),
+                        rshift(and(bcast, 0x00ff0000), 16),
+                        rshift(and(bcast, 0x0000ff00), 8),
+                        and(bcast, 0x000000ff));
+            }
+        '`
+        if ifconfig $IF $ADDR netmask $NMSK broadcast $BCAST; then
             Xsuccess
         else
             Xfailure

The above does work on 64-bit machines too. Actually 'ifconfig' will be
just as happy with an output of
'print or(octstonum($1), xor(octstonum($2), 0xffffffff));' for $BCAST,
so splitting that into octets is not really necessary and is done mostly
for the benefit of debugging. The following, simpler, patch works
equally well:

--- beoboot-cm1.9/rc.clustermatic.bk 2004-08-16 09:45:35.000000000 -0600
+++ beoboot-cm1.9/rc.clustermatic 2004-08-17 22:15:53.438664800 -0600
@@ -155,8 +155,17 @@
             echo >&2 "Error: No netmask given for interface."
             exit 1
         fi
-        #BCAST=`ipcalc --broadcast $ADDR $NMSK | sed -e 's/.*=//'`
-        if ifconfig $IF $ADDR netmask $NMSK; then
+        BCAST=`echo $ADDR $NMSK | awk '
+            function octstonum(octs, a)
+            {
+                split(octs, a, /\./);
+                return ((a[1]*0x100 + a[2])*0x100 + a[3])*0x100 + a[4];
+            }
+            {
+                print or (octstonum($1), xor(octstonum($2), 0xffffffff));
+            }
+        '`
+        if ifconfig $IF $ADDR netmask $NMSK broadcast $BCAST; then
             Xsuccess
         else
             Xfailure

Michal
From: <er...@he...> - 2004-08-17 22:57:51
|
On Tue, Aug 17, 2004 at 11:01:38AM -0600, Michal Jaegermann wrote:
> > One option would be to leave interface configuration out of beoboot
> > and let people do that using whatever distribution provided mechanism
> > they have. There might be fewer collisions that way too.
>
> OTOH this will force everybody to hack their startup files and
> for many this may be a PITA. I think that this is a bit too
> radical.

How about optional? If the addresses are there, do something, else
ignore the interface and presume some other part of the OS setup has
configured it.

I think it's reasonable to phase it out.

> > The rc.clustermatic script is the only piece that looks at the
> > addresses on the interface lines. bpmaster and beoserv look at those
> > lines to get interface names, not addresses. The daemons all get
> > addresses directly from the interface (as they should).
>
> That is true but probably not a good enough reason to configure
> interfaces to some random values. :-)

Yeah, random values are bad. This is an argument to remove (or phase
out) the interface configuration from beoboot altogether. It doesn't
make much sense to me to have multiple mechanisms for sticking an
address on a network interface in one system.

- Erik
From: <er...@he...> - 2004-08-17 21:18:32
|
In light of the high level of input I've been getting lately, I decided
to move my CVS repositories to Sourceforge. I hope this makes life
easier for people that are actively patching and improving the code. It
should also make it easy for people to try the latest 'n' greatest 'n'
brokenest versions of stuff.

See the Sourceforge project page for more information on how to get to
the CVS repository. (http://sourceforge.net/projects/bproc/)

So far I've moved the repositories for bproc and beoboot. I plan to
move the rest of the stuff "soon".

Thanks again to all the people that have been submitting patches.

- Erik
From: <er...@he...> - 2004-08-17 17:35:17
|
On Mon, Aug 16, 2004 at 10:09:15AM -0600, Michal Jaegermann wrote:
> This is not earth shattering but sometimes confusing. If you happen
> to use some "atypical" addresses and/or masks on a cluster interface
> then a broadcast address is guessed wrong and a confusion may ensue.
>
> There used to be an invocation of 'ipcalc' to compute that broadcast
> correctly in a startup file but for some reasons it got commented
> out. If 'sed' was of a concern then this is sed-less version.
>
> --- beoboot-cm1.9/rc.clustermatic~ 2004-08-16 09:13:00.093179048 -0600
> +++ beoboot-cm1.9/rc.clustermatic 2004-08-16 09:24:35.400476272 -0600
> @@ -155,8 +155,9 @@
>              echo >&2 "Error: No netmask given for interface."
>              exit 1
>          fi
> -        #BCAST=`ipcalc --broadcast $ADDR $NMSK | sed -e 's/.*=//'`
> -        if ifconfig $IF $ADDR netmask $NMSK; then
> +        BCAST=$(ipcalc --broadcast $ADDR $NMSK)
> +        BCAST=${BCAST#*=}
> +        if ifconfig $IF $ADDR netmask $NMSK broadcast $BCAST; then
>              Xsuccess
>          else
>              Xfailure
>
> This will set that interface parameters to what they were really
> configured in /etc/clustermatic/config.

The problem here is that ipcalc doesn't exist everywhere. It's a
Red Hat-ism, I believe. The SuSE systems I'm using don't have it.
I'd like to be more vendor neutral if possible.

One option would be to leave interface configuration out of beoboot
and let people do that using whatever distribution-provided mechanism
they have. There might be fewer collisions that way too.

The rc.clustermatic script is the only piece that looks at the
addresses on the interface lines. bpmaster and beoserv look at those
lines to get interface names, not addresses. The daemons all get
addresses directly from the interface (as they should).

- Erik
From: Michal J. <mi...@ha...> - 2004-08-17 17:08:01
|
On Tue, Aug 17, 2004 at 09:17:23AM -0600, er...@he... wrote:
>
> The problem here is that ipcalc doesn't exist on everywhere. It's a
> Red Hat-ism, I believe. The SuSE systems I'm using don't have it.
> I'd like to be more vendor neutral if possible.

OK, but ipcalc although convenient is far from necessary here.
We have an address and a mask so we can produce a broadcast address
using, likely only, a shell. Later on when I will have some fifteen
minutes for that I will make up something.

> One option would be to leave interface configuration out of beoboot
> and let people do that using whatever distribution provided mechanism
> they have. There might be fewer collisions that way too.

OTOH this will force everybody to hack their startup files and
for many this may be a PITA. I think that this is a bit too
radical.

> The rc.clustermatic script is the only piece that looks at the
> addresses on the interface lines. bpmaster and beoserv look at those
> lines to get interface names, not addresses. The daemons all get
> addresses directly from the interface (as they should).

That is true but probably not a good enough reason to configure
interfaces to some random values. :-)

Michal
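The "address and mask in, broadcast out" idea can be sketched with shell arithmetic alone, no ipcalc or awk. This is only an illustration of the computation (broadcast = address OR inverted mask); the function names ip_to_int, int_to_ip and broadcast are made up and nothing here is taken from rc.clustermatic. It assumes dotted-quad input without leading zeros in the octets.

```shell
# Sketch: compute a broadcast address from address + netmask in plain shell.
# All three function names are illustrative, not part of beoboot.

# Convert "a.b.c.d" to a single integer.
ip_to_int() {
    local IFS
    IFS=.
    set -- $1                       # split the dotted quad into $1..$4
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Convert an integer back to "a.b.c.d".
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# broadcast <address> <netmask>: set every host bit of the address.
broadcast() {
    addr=$(ip_to_int "$1")
    mask=$(ip_to_int "$2")
    int_to_ip $(( addr | (~mask & 0xffffffff) ))
}
```

For instance, 'broadcast 10.0.4.1 255.255.255.240' gives the broadcast for a /28 inside 10.0.4.0. The result could then be passed to 'ifconfig ... broadcast' in the same way as the ipcalc output was.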
From: Peter E. <Pet...@un...> - 2004-08-17 08:10:46
|
Hi,

I've seen this on broken systems, that is, with a faulty power supply
or bad capacitors on the main board, etc. To figure out if this is the
case, I suggest booting some stand-alone linux on it (knoppix or some
boot floppy) and doing some heavy disk i/o (e.g. find) and a memory
test.

You may also try some boot options with the CM boot kernel to turn off
apm or use less memory.

Another possible reason may be that your kernel didn't fit on the
floppy and this is why it crashes.

Best, Peter.

On Mon, Aug 16, 2004 at 10:17:13PM -0700, Vipul Deokar wrote:
> Thanks Michal.
>
> I used your suggestion, and tried booting directly off
> floppy and then disk on slave from phase 2 kernel and
> initrd image. However, as soon as control passes to
> this kernel the system resets to the BIOS - I can't
> even see the message that gets printed momentarily
> before the reset.
>
> So, it looks like a beoboot issue for generating the
> kernel. The same kernel image is running fine on the
> master node, of course with a different initrd image.
> Has anyone seen this?
>
> Thanks.
> Vipul
From: Vipul D. <vip...@ya...> - 2004-08-17 05:17:19
|
Thanks Michal.

I used your suggestion, and tried booting directly off floppy and then
disk on slave from phase 2 kernel and initrd image. However, as soon as
control passes to this kernel the system resets to the BIOS - I can't
even see the message that gets printed momentarily before the reset.

So, it looks like a beoboot issue for generating the kernel. The same
kernel image is running fine on the master node, of course with a
different initrd image. Has anyone seen this?

Thanks.
Vipul
From: Michal J. <mi...@ha...> - 2004-08-17 04:13:23
|
On Mon, Aug 16, 2004 at 06:58:07PM -0700, Vipul Deokar wrote:
>
> When beobooting off a floppy, my slave node gets into
> a reboot loop. It manages to boot off the SYSLINUX
> image, successfully RARPs IP addr off the master, and
> downloads the phase 2 image but then fails with the
> following error message.
>
> monte: entry point (protected mode): 0x100000
> ...rebooting in 2 seconds...

If you can boot from "external" media, be that a floppy, a CD, a local disk on a node, or the network (say via PXE), then you can just as well boot a phase 2 kernel directly. Use the 'beoboot' script to make a separate kernel and its initrd, and boot those.

On various occasions I have found the kmonte hack iffy or not working at all. In particular, at least in its current state, it does not seem to have much chance on 2.6 kernels. There the kexec mechanism could be used instead, but that is still a work in progress. OTOH, on today's x86 boards a PXE client in the BIOS is pretty much to be expected.

The main attraction of the two-phase boot is that when you change your kernel you do not have to install it on the nodes, where such things as local disks may not even exist; you just load it as a phase 2 kernel. (On Alphas, which had to boot via ARC/milo, this was the only way I found to get nodes going.) If you can boot over a network anyway, then what is the difference? Load what you really want to run from your boot server and be done with it.

A tip: when PXE-booting that way, I set up DHCP and tftp on one network and configure clustermatic to use the same hardware interface, but on an alias on another network, so the two are independent. DHCP will not run on an alias, so there is not much choice there, but clustermatic will.

Michal |
From: Vipul D. <vip...@ya...> - 2004-08-17 02:00:45
|
Hi guys,

I am using CM4 (i686 RPMs) on my 5-node P4 cluster. When beobooting off a floppy, my slave node gets into a reboot loop. It manages to boot off the SYSLINUX image, successfully RARPs its IP address off the master, and downloads the phase 2 image, but then fails with the following error message:

monte: entry point (protected mode): 0x100000
...rebooting in 2 seconds...

I searched the web for this behavior but couldn't find what's going wrong. I built the phase 2 boot image with "beoboot -2 -n", as root, under the new kernel. I previously had CM4 with i386 RPMs on this cluster and did not see this problem then.

Please help.

Thanks.
Vipul |
From: <er...@he...> - 2004-08-16 17:38:39
|
On Thu, Aug 12, 2004 at 05:43:51PM -0600, Michal Jaegermann wrote:
> I think that it speaks for itself. :-)
>
> --- beoboot-cm1.9/node_up/nodeinfo.c~ 2003-11-05 11:52:07.000000000 -0700
> +++ beoboot-cm1.9/node_up/nodeinfo.c 2004-08-12 15:34:57.354997882 -0600
> @@ -91,7 +91,7 @@ int nodeup_postmove(int argc, char *argv
>      {"cpus active : %Ld", &values[0], 1},
>      {"cycle frequency [Hz] : %Ld", &values[1], 1},
>  #endif
> -#if defined(__i386__)
> +#if defined(__i386__) || defined(__x86_64__)
>      {"cpu MHz : %Ld", &values[1], 1000000},
>      {"processor\t:", &values[0], 0},
>  #endif
>
> Not much bad will happen without it. Simply in clustermatic log
> files you will see on x86_64
>     cpus=1; hz=0; mem=0
> or something like that if that is missing.

Thanks, applied.

> What formats should be used for Sparc and PowerPC I am afraid that
> I do not know.

BProc is busted on ppc right now and never really worked on sparc anyway...

- Erik |
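For readers who want to see what the two x86 patterns in the patch are after, here is a rough shell sketch. It is not the nodeinfo.c code: `cpu_summary` is a hypothetical helper, and it assumes that the `processor` pattern is used to count CPUs and that the `cpu MHz` value is truncated to whole MHz before being scaled by 1000000, which is how the table entries read.

```shell
#!/bin/sh
# Hypothetical helper (not part of beoboot): summarise cpuinfo-style text
# read from stdin in roughly the shape the clustermatic log uses.
cpu_summary() {
    awk -F: '
        # Assumption: one "processor" line per CPU, so count the lines.
        /^processor[ \t]*:/ { cpus++ }
        # Assumption: keep whole MHz only, then scale to Hz.
        /^cpu MHz[ \t]*:/   { hz = int($2) * 1000000 }
        # %.0f avoids 32-bit limits some awk versions have for %d.
        END { printf "cpus=%.0f; hz=%.0f\n", cpus, hz }
    '
}
```

On a node, `cpu_summary < /proc/cpuinfo` would print one summary line. If neither pattern matches, both values come out as 0, which resembles the `hz=0` symptom described for unpatched x86_64.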
From: <er...@he...> - 2004-08-16 16:48:26
|
On Sun, Aug 15, 2004 at 01:54:03PM -0600, Michal Jaegermann wrote:
> There is a problem with vmadump_common.c "recycling" kernel memory.
> In load_map it _does_ print a strange looking garbage in a case
> of a failure (leaving me scratching my head for quite a while :-).
> Here is a patch:
>
> --- bproc-4.0.0pre6/vmadump/vmadump_common.c~ 2004-06-22 12:24:20.000000000 -0600
> +++ bproc-4.0.0pre6/vmadump/vmadump_common.c 2004-08-15 13:33:49.090716846 -0600
> @@ -335,8 +335,8 @@ int vmadump_del_hook(struct vmadump_hook
>          if (h->hook == hook) {
>              list_del(&h->list);
>              up_write(&hook_lock);
> -            kfree(h);
>              printk(KERN_INFO "vmadump: Unregistered hook \"%s\"\n", hook->tag);
> +            kfree(h);
>              return 0;
>          }
>      }
> @@ -544,8 +544,8 @@ int load_map(struct vmadump_map_ctx *ctx
>          r = mmap_file(ctx, head, filename,
>                        PROT_READ|PROT_WRITE|PROT_EXEC, mmap_flags);
>          if (r) {
> -            kfree(filename);
>              printk("vmadump: mmap failed: %s\n", filename);
> +            kfree(filename);
>              return r;
>          }
>          kfree(filename);
>
> The first chunk is not really needed but just in case if somebody
> would like later to add a printout of 'h' to this printk. :-)
> The second chunk is the real bug.
>
> A quick scan through the rest of bproc code does not show up
> similar troubles elsewhere but I was not very thorough.
>
> Michal

Applied, thanks.

- Erik |
From: Michal J. <mi...@ha...> - 2004-08-16 16:09:28
|
This is not earth shattering, but it is sometimes confusing. If you happen to use "atypical" addresses and/or masks on a cluster interface, the broadcast address is guessed wrong and confusion may ensue. There used to be an invocation of 'ipcalc' in the startup file to compute the broadcast address correctly, but for some reason it got commented out. If 'sed' was the concern, here is a sed-less version.

--- beoboot-cm1.9/rc.clustermatic~ 2004-08-16 09:13:00.093179048 -0600
+++ beoboot-cm1.9/rc.clustermatic 2004-08-16 09:24:35.400476272 -0600
@@ -155,8 +155,9 @@
         echo >&2 "Error: No netmask given for interface."
         exit 1
     fi
-    #BCAST=`ipcalc --broadcast $ADDR $NMSK | sed -e 's/.*=//'`
-    if ifconfig $IF $ADDR netmask $NMSK; then
+    BCAST=$(ipcalc --broadcast $ADDR $NMSK)
+    BCAST=${BCAST#*=}
+    if ifconfig $IF $ADDR netmask $NMSK broadcast $BCAST; then
         Xsuccess
     else
         Xfailure

This sets the interface parameters to what was actually configured in /etc/clustermatic/config.

Michal |
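As a cross-check on what the patch asks 'ipcalc' to compute, the broadcast address is just the interface address with every host bit set. The sketch below does that with plain shell arithmetic; `broadcast_of` and `strip_key` are hypothetical helpers, not part of rc.clustermatic, and the `BROADCAST=...` output form is an assumption based on the `s/.*=//` expression in the old commented-out line.

```shell
#!/bin/sh
# Hypothetical helper: broadcast address for a dotted-quad address and
# netmask, using only shell arithmetic (no ipcalc, no sed).
broadcast_of() {
    addr=$1
    mask=$2
    old_ifs=$IFS
    IFS=.
    # Split both dotted quads on '.', giving eight positional parameters:
    # $1..$4 are the address octets, $5..$8 the netmask octets.
    set -- $addr $mask
    IFS=$old_ifs
    # Each broadcast octet is the address octet with all host bits set.
    echo "$(( $1 | (~$5 & 255) )).$(( $2 | (~$6 & 255) )).$(( $3 | (~$7 & 255) )).$(( $4 | (~$8 & 255) ))"
}

# The sed-less stripping used in the patch: drop everything up to and
# including the first '=' (assumed output form: BROADCAST=a.b.c.d).
strip_key() {
    value=$1
    echo "${value#*=}"
}

broadcast_of 10.0.0.1 255.255.255.0    # prints 10.0.0.255
```

With an "atypical" mask such as 255.255.255.192, `broadcast_of 192.168.5.130 255.255.255.192` gives 192.168.5.191 rather than the 192.168.5.255 a classful guess would produce.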
From: Michal J. <mi...@ha...> - 2004-08-15 19:54:14
|
There is a problem with vmadump_common.c "recycling" kernel memory: it uses a buffer after freeing it. In load_map it really _does_ print strange-looking garbage when mmap fails (leaving me scratching my head for quite a while :-). Here is a patch:

--- bproc-4.0.0pre6/vmadump/vmadump_common.c~ 2004-06-22 12:24:20.000000000 -0600
+++ bproc-4.0.0pre6/vmadump/vmadump_common.c 2004-08-15 13:33:49.090716846 -0600
@@ -335,8 +335,8 @@ int vmadump_del_hook(struct vmadump_hook
         if (h->hook == hook) {
             list_del(&h->list);
             up_write(&hook_lock);
-            kfree(h);
             printk(KERN_INFO "vmadump: Unregistered hook \"%s\"\n", hook->tag);
+            kfree(h);
             return 0;
         }
     }
@@ -544,8 +544,8 @@ int load_map(struct vmadump_map_ctx *ctx
         r = mmap_file(ctx, head, filename,
                       PROT_READ|PROT_WRITE|PROT_EXEC, mmap_flags);
         if (r) {
-            kfree(filename);
             printk("vmadump: mmap failed: %s\n", filename);
+            kfree(filename);
             return r;
         }
         kfree(filename);

The first chunk is not really needed; it is there just in case somebody later wants to add a printout of 'h' to that printk. :-) The second chunk is the real bug.

A quick scan through the rest of the bproc code does not show similar trouble elsewhere, but I was not very thorough.

Michal |
From: Vipul D. <vip...@ya...> - 2004-08-13 21:34:29
|
One other point: I have 1GB RAM on my master versus 256MB on all my slave nodes. Could that cause any problems? Do I need to set some ulimits (-m) on the slave?

Another oddity is that when I strace bpsh or node_up, I do not see any special "bproc" system calls being executed, like SYSCALL(....), though according to the code path I should have hit them. Insights?

Thanks.
Vipul

-----Original Message-----
From: bpr...@li... [mailto:bpr...@li...] On Behalf Of Vipul Deokar
Sent: Friday, August 13, 2004 12:25 PM
To: YhLu; bpr...@li...
Subject: RE: [BProc] Newbie questions

Hi folks,

I went ahead and installed ClusterMatic 4 over a clean RedHat 9.0 on the master of my 5-node Intel P4-based cluster. The beoboot part works fine: the slave nodes come up and connect to port 2223 of the master. But the node_up script seems to fail at the end, so the node gets marked as being in the "error" state. What could the reasons be? Any help will be appreciated.

ON MASTER (node_up script fails):

$# tail -15 /var/log/clustermatic/node.0
vmadlib : loaded /lib/ld-2.3.2.so (size=103044;id=0,0;mode=100755)
vmadlib : loaded /lib/libc-2.3.2.so (size=1549556;id=0,0;mode=100755)
vmadlib : loaded /lib/librt-2.3.2.so (size=37552;id=0,0;mode=100755)
vmadlib : loaded /lib/libpthread-0.10.so (size=103104;id=0,0;mode=100755)
vmadlib : loaded /lib/libm-2.3.2.so (size=211876;id=0,0;mode=100755)
vmadlib : loaded /lib/libnss_bproc.so.2 (size=25043;id=0,0;mode=100755)
vmadlib : loaded /usr/lib/libbproc.so.4.0.0 (size=21388;id=0,0;mode=100755)
nodeup : Plugin vmadlib returned status 0 (ok)
nodeup : No premove function for nodeinfo
nodeup : No premove function for kmod
nodeup : Starting 1 child processes.
nodeup : Finished creating child processes.
nodeup : I/O error talking to child
nodeup : Child process for node 0 died with signal 4
nodeup : Node setup returned status 1

ON SLAVE (things look fine):

boot: Server IP Address : 10.0.0.1
boot: My IP Address : 10.0.0.100
boot: starting bpslave : bpslave -d -i -v 10.0.0.1 2223
bpslave: connecting to 10.0.0.1:2223
bpslave: IO Daemon started; pid 15
bpslave connection to 10.0.0.1:2223 up and running
bpslave: Setting node number to 0

Now, if I force the master to mark the status of the slave node as "up", bpsh fails as shown below. I tried "vmadlib -l" and it does show /lib/ld-2.3.2.so, but the exec on the slave cannot seem to find it. I also looked at the strace (strace -f) of bpmaster, and one odd thing is that it tries to close a whole host of socket descriptors (4096 instances) that are not open. Please, any help will be appreciated.

ON MASTER:

$# bpctl -S 0 -s up
$# bpstat
Node(s)  Status  Mode        User  Group
1-3      down    ----------  root  root
0        up      ---x------  root  root
$# bpsh 0 sleep 1
0: No such file or directory

ON SLAVE:

vmadump: mmap failed: /lib/ld-2.3.2-so

Thanks.
Vipul

_______________________________________________
BProc-users mailing list
BPr...@li...
https://lists.sourceforge.net/lists/listinfo/bproc-users |