From: Luke P. <lop...@wi...> - 2004-07-20 18:34:39
Hi everyone,

My cluster is made up of dual Xeon nodes with 2GB memory. We formerly ran openMosix, which actually worked quite well other than the one hour uptimes...

Anyway, we observe that when placing two processes on a node, they run about half as fast as they would were a single process placed on a node. If I look at ps or top, both processes stay at 99.9% or so, and the load averages stay just shy of 2, but the wall clock doesn't lie...

I'm running the most recent clustermatic stuff, so nodes and master have 2.4.22-cm36smp kernels over Fedora Core 1. This didn't used to happen with openMosix for the exact same executables.

Any ideas of things to look at?

Thanks
-Luke
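One way to put numbers on the wall-clock observation above is to time one copy of a job and then two concurrent copies aimed at the same node. A minimal sketch, assuming a POSIX shell on the master and a CPU-bound command of your own; `time_copies` is a hypothetical helper, not part of BProc or Clustermatic:

```shell
#!/bin/sh
# time_copies N CMD [ARGS...]
# Start N copies of CMD in parallel, wait for all of them, and print the
# elapsed wall-clock time in whole seconds. (Hypothetical helper.)
# Note: "date +%s" is a common extension rather than strict POSIX.
time_copies() {
    n=$1
    shift
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" >/dev/null 2>&1 &
        i=$((i + 1))
    done
    wait    # block until every background copy has exited
    end=$(date +%s)
    echo $((end - start))
}
```

Comparing `time_copies 1 bpsh 1 ./job` with `time_copies 2 bpsh 1 ./job` (where `./job` stands for your own executable) would show whether the second copy really doubles the elapsed time on a dual-CPU node.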
From: Daniel G. <dg...@ti...> - 2004-07-20 14:20:07
On Tue, Jul 20, 2004 at 09:03:01AM -0500, Brian Barrett wrote:
> On Jul 20, 2004, at 8:08 AM, Thomas Eckert wrote:
>
> > this thread seems to have slipped off the bproc-list -- most likely I
> > replied to the wrong message -- so here is a forward of my reply :(
> >
> > I'm interested in the bproc3<->lam-7.0.x results: have you tried
> > bproc3 with the latest stable lam (7.0.x) and it did not work or are
> > you focusing on bproc4 now anyway due to other reasons (want to use
> > 2.6-kernels, ...)?
>
> Luke was using BProc 4, which LAM 7.0.x does not support (LAM 7.1,
> which just went into beta, supports what is currently in the BProc 4
> API. Hopefully, that means it will support BProc 4 when it goes
> stable).
>
> If you have any problems using LAM 7.0.x with BProc 3, please let us
> (the LAM developers) know. There has been some fairly extensive
> testing, so I would be surprised if there were problems in that area.

I have recently installed LAM 7.0.2 on a CM3 (BProc 3) cluster. It mostly works, but there are a few disturbing glitches:

- I cannot seem to run 2 MPI jobs as the same user simultaneously (on different sets of nodes, of course), since when I do the second invocation of mpiexec (or its equivalent lamboot/run/lamhalt) it kills the first lamd on the master node. It does seem to work for different users, though.

- Just doing lamboot followed by lamhalt (whether or not some mpi job is run) produces a core dump (I guess it is by lamhalt). Always. Running mpiexec does it too.

???

Daniel

--
Dr. Daniel Gruner                dg...@ti...
Dept. of Chemistry               dan...@ut...
University of Toronto            phone: (416)-978-8689
80 St. George Street             fax:   (416)-978-5325
Toronto, ON  M5S 3H6, Canada     finger for PGP public key
From: Thomas E. <eck...@gm...> - 2004-07-20 14:16:16
On Tue, 20 Jul 2004, Brian Barrett wrote:
(...)
> Luke was using BProc 4, which LAM 7.0.x does not support (LAM 7.1,
> which just went into beta, supports what is currently in the BProc 4
> API. Hopefully, that means it will support BProc 4 when it goes
> stable).

for the record: the snapshot lam-7.1a1r9708 seems to work fine with bproc-4.0.0_pre5 (with a small additional patch from this list) on ia32 and x86_64 (mpirun'ing jobs works). If I can help with some testing please let me know.

> If you have any problems using LAM 7.0.x with BProc 3, please let us
> (the LAM developers) know. There has been some fairly extensive
> testing, so I would be surprised if there were problems in that area.

Up to now I've only tested the snapshot -- lam-7.0.x with bproc3 is on my todo-list (which may be obsoleted by newer releases ;).

Thanks for your reply,
Thomas

--
"Someday I'll write my own philosophy book." -Calvin
From: Brian B. <brb...@la...> - 2004-07-20 14:03:09
On Jul 20, 2004, at 8:08 AM, Thomas Eckert wrote:

> this thread seems to have slipped off the bproc-list -- most likely I
> replied to the wrong message -- so here is a forward of my reply :(
>
> I'm interested in the bproc3<->lam-7.0.x results: have you tried
> bproc3 with the latest stable lam (7.0.x) and it did not work or are
> you focusing on bproc4 now anyway due to other reasons (want to use
> 2.6-kernels, ...)?

Luke was using BProc 4, which LAM 7.0.x does not support (LAM 7.1, which just went into beta, supports what is currently in the BProc 4 API. Hopefully, that means it will support BProc 4 when it goes stable).

If you have any problems using LAM 7.0.x with BProc 3, please let us (the LAM developers) know. There has been some fairly extensive testing, so I would be surprised if there were problems in that area.

Brian

--
  Brian Barrett
  LAM/MPI developer and all around nice guy
  Have a LAM/MPI day: http://www.lam-mpi.org/
From: Thomas E. <eck...@gm...> - 2004-07-20 13:20:16
Hi Luke,

this thread seems to have slipped off the bproc-list -- most likely I replied to the wrong message -- so here is a forward of my reply :(

I'm interested in the bproc3<->lam-7.0.x results: have you tried bproc3 with the latest stable lam (7.0.x) and it did not work or are you focusing on bproc4 now anyway due to other reasons (want to use 2.6-kernels, ...)?

Thomas

---------- Forwarded message ----------
Date: Thu, 24 Jun 2004 11:56:09 +0200 (MEST)
From: Thomas Eckert <eck...@gm...>
To: Luke Palmer <lo...@du...>
Subject: Re: [BProc] Re: More LAM / BProc changes

Luke,

On Wed, 23 Jun 2004, Luke Palmer wrote:
> Thanks for the email. Did you build the kernel, or all of bproc as
> well, using the 2.6?

as the system evolved over time that's a tricky question. bproc-4.0.0_pre5 (kmods, libs, clients, ...) was completely built against 2.6 for sure. from the logs of the package-manager I see that beoboot, cmtools, beonss, ... were built (Gentoo is a "from-source" distribution) earlier (most likely against 2.4.22 with bproc-4.0.0_pre3).

> It's easy for me to upgrade to Fedora Core 2,
> which would have the 2.6 kernels, but next to impossible to back out. I
> want to be good and sure before I do this...

the execve hook is not implemented in bproc4 yet -- so if you rely on it (i.e. your batch system uses shell-scripts) you'll have a problem with that.

you have tried LAM with bproc3 without luck, right? (at least according to the lam v7.0.5-docs bproc>=3.2.5 _should_ work) if time permits I'll do a quick test with bproc-3.2.6 ...

Cheers,
Thomas

> On Mon, 2004-06-21 at 02:56, Thomas Eckert wrote:
> > Luke,
> >
> > I only can offer a kind of "me too" statement with a little addon:
> > bproc-4.0.0_pre{4,5} (I'm not sure for _pre3) did not build with
> > 2.4-kernels (result as you describe it)
> > BUT both do with 2.6-kernels (vanilla-kernels with only bproc-patch applied).
> > All testing was done with a fairly current Gentoo.
> >
> > My _guess_ would be that testing concentrates on 2.6-kernels at the moment as
> > this is the interesting new stuff (Erik?).
> >
> > For the LAM-side:
> > I've not tested LAM with bproc3 up to now but with the nightly snapshots on
> > LAM ("lam-7.1a1r9708.tar.gz" in this case) and the bproc-4.0.0_pre5-patch
> > suggested by Kevin Russell to clients/bproc.c a few days ago on bproc-users it
> > compiles and is able of "mpirun"ning jobs on x86.
> >
> > Hope this helps a bit,
> >
> > Thomas
> >
> > On Sun, 20 Jun 2004, Luke wrote:
> >
> > > Daniel,
> > >
> > > Thanks for the reply. I'm cc-ing the bproc list on this just in case
> > > anyone else has ideas.
> > >
> > > Unless you know of a download location that I don't, all the
> > > Clustermatic stuff is bproc4.0.0pre3. I wasn't able to mix versions-
> > > how are you using pre4? I don't have an immediate need to use a newer
> > > version of bproc, but I do have an immediate need for LAM. I'm going to
> > > ask Brian if I can help with development.
> > >
> > > Here's a rundown of my problems. At the very start of the pre4 build, I
> > > see:
> > >
> > > make -C vmadump vmadump.ko kver
> > > make[1]: Entering directory `/usr/src/bproc-4.0.0pre4/vmadump'
> > > gcc -D__KERNEL__
> > > -I/lib/modules/2.4.25-bproc/build//usr/src/linux-2.4.25-bproc/include
> > > -Wall -Wstrict-prototypes -Wno-trigraphs -O2 -fno-strict-aliasing
> > > -fno-common -fomit-frame-pointer -pipe -mpreferred-stack-boundary=2
> > > -march=i686 -DMODULE -DPACKAGE_VERSION='"4.0.0pre4"' -I. -c
> > > vmadump_common.c
> > >
> > > The include line, which is automatically generated, is obviously wrong.
> > > If I correct it, the build makes sense a bit. The client programs build
> > > fine. When building vmadump and the kernel modules, I see errors like
> > > these, repeated many times. Except the second error, they all have to
> > > do with the variable "current", but I'm afraid I've been unable to
> > > figure out what it is (my knowledge of kernel-ish stuff is quite
> > > little). There are many, many warnings as well, that I have left out.
> > >
> > > vmadump_common.c:679: error: structure has no member named `sighand'
> > > vmadump_common.c:681: error: too few arguments to function
> > > `recalc_sigpending'
> > > vmadump_common.c:707: error: structure has no member named `clear_child_tid'
> > > ghost.c:432: error: structure has no member named `utime'
> > > ghost.c:433: error: structure has no member named `stime'
> > > ghost.c:434: error: structure has no member named `cutime'
> > > ghost.c:435: error: structure has no member named `cstime'
> > >
> > > Any ideas?
> > >
> > > Thanks
> > > -Luke
> > >
> > > Daniel Gruner wrote:
> > >
> > > >Hi Luke,
> > > >
> > > >I am using the stuff from clustermatic. At least the kernels.
> > > >Some of the other packages too, as far as I recall.
> > > >
> > > >Now, I am not working with Fedora on any of my clusters yet, and that
> > > >may be why you are seeing problems. Can you tell me what the build
> > > >problems are? I have built bproc on many different systems, such
> > > >as RH7.2 on alpha, RH7.3 on i386, RH9 on athlon, etc.
> > > >
> > > >Also, in my experience, Fedora 1 is missing a bunch of stuff, such as
> > > >libraries missing from packages, and whatever else. Let me know more
> > > >details, and perhaps I can help.
> > > >
> > > >Now as for success with LAM... that is another story. Brian was still
> > > >working on it, but it seems that some functions in the BProc api are
> > > >either broken or not properly documented. The note you quote below is old,
> > > >and has been revised since to "not working".
> > > >
> > > >Regards,
> > > >Daniel
> > > >
> > > >On Sun, Jun 20, 2004 at 01:36:17AM -0500, Luke Palmer wrote:
> > > >
> > > >>Hey Daniel,
> > > >>
> > > >>I have a couple of questions for you. I cannot replicate your success
> > > >>with LAM on bproc. I am hoping the difference is that I am using
> > > >>bproc4.0.0pre3 (from clustermatic).
> > > >>
> > > >>You say you are using bproc4.0.0pre4. I was wondering if there is any
> > > >>trick to getting it to build? I am on Fedora Core 1, and the build is
> > > >>quite badly broken in my environment.
> > > >>
> > > >>Please let me know if you can help!
> > > >>
> > > >>Thanks
> > > >>-Luke
> > > >>
> > > >>On Sun, 2004-06-13 at 18:42, Daniel Gruner wrote:
> > > >>
> > > >>>Brian,
> > > >>>
> > > >>>Success!!! (well, at least apparently). The thing lamboots fine, starts
> > > >>>the lamd on all the nodes, and seems to run mpi jobs, so until further
> > > >>>testing it looks like we are in business!
> > > >>>
> > > >>>I agree with you about BProc's lack of documentation, but I still like
> > > >>>the system. It is the only cluster software that makes the cluster
> > > >>>seem like a single system image. I have been running different versions
> > > >>>of it for over 2 years, and I insist on it for all my clusters. It may
> > > >>>be time to hack on the BProc guys for more documentation (Erik...).
> > > >>>
> > > >>>Anyway, thanks for your work on LAM, and let me know if I can be of more
> > > >>>help.
> > > >>>
> > > >>>Regards,
> > > >>>Daniel
> > > >>>
> > > >>>On Sun, Jun 13, 2004 at 04:01:11PM -0700, Brian Barrett wrote:
> > > >>>
> > > >>>>On Jun 13, 2004, at 3:32 PM, Daniel Gruner wrote:
> > > >>>>
> > > >>>>>Ok, here it goes again. The master node is still somehow screwed up,
> > > >>>>>according to lamboot... Here is the output:
> > > >>>>>
> > > >>>><snip>
> > > >>>>
> > > >>>>>n-1<8638> ssi:boot:bproc: n-1 nodestatus failed (-1)
> > > >>>>>n-1<8638> ssi:boot:bproc: n-1 node status down, failure
> > > >>>>>n-1<8638> ssi:boot:bproc: n0 node status: up
> > > >>>>>
> > > >>>>Well, on the good side, we detect the master node correctly now :).
> > > >>>>
> > > >>>>You know, life would be so much better if BProc documented things like
> > > >>>>that. I added some code to make sure that NODE_MASTER is never used
> > > >>>>for the parameter to bproc_nodestatus or the like. Hopefully, that
> > > >>>>will make life better all around. If you could svn up and let me know
> > > >>>>how it goes, I'd appreciate it. Only file changed is
> > > >>>>share/ssi/boot/bproc/src/ssi_boot_bproc.c, so you should be able to svn
> > > >>>>up and run make (without the autogen or configure stuff).
> > > >>>>
> > > >>>>Thanks!
> > > >>>>
> > > >>>>Brian
> > > >>>>
> > > >>>>--
> > > >>>>  Brian Barrett
> > > >>>>  LAM/MPI developer and all around nice guy
> > > >>>>  Have a LAM/MPI day: http://www.lam-mpi.org/

--
Sometimes I think the surest sign that intelligent life exists elsewhere in the universe is that none of it has tried to contact us. -- Calvin
From: Michal J. <mi...@ha...> - 2004-07-18 19:43:43
On Sun, Jul 18, 2004 at 11:24:23AM +0200, Thomas Eckert wrote:
> On Sat, 17 Jul 2004, Michal Jaegermann wrote:
>
> (...)
> > ls[199] general protection rip: 3a1cd0ada7 rsp: 7fbffff010
>
> I saw similar errors too with _pre4 -- from my docs I cannot reconstruct if it
> happened with _pre5 too but it's running on amd64 in my setup (2.6.6, _pre5,
> Gentoo-dist) when I disable "COMPAT32"-support in bproc:
> (bproc-src-dir/Makefile.conf)
> s/^COMPAT32:=y/COMPAT32:=n/

I am afraid that some other magic is involved here as the above did not solve the issue for me. The message itself comes from a kernel trap which for me is not that illuminating.

While we are at it, here is a slight problem in clients/Makefile:

--- bproc-4.0.0pre5/clients/Makefile~	2004-05-20 16:25:08.000000000 -0600
+++ bproc-4.0.0pre5/clients/Makefile	2004-07-18 12:19:40.949094720 -0600
@@ -94,6 +94,7 @@
 compat_install: $(A_LIB32) $(SO_LIB32)
 	# Libraries (renaming the 32 bit stuff as we go...)
+	install -d -m 755 $(prefix)$(lib32dir)
 	install -m 644 $(A_LIB32) $(prefix)$(lib32dir)/$(A_LIB)
 	install -m 755 $(SO_LIB32) $(prefix)$(lib32dir)/$(SO_LIB)
 	ln -sf $(SONAME) $(prefix)$(lib32dir)/$(SOLINK)

This target directory may not exist, especially if $(prefix) is non-empty.

Michal
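The patch above matters because `install -m MODE SRC DEST/NAME` does not create the DEST directory on its own. A small sketch that reproduces the failure and the fix in a scratch directory; the file and directory names are made up for illustration and do not come from the bproc tree:

```shell
#!/bin/sh
# Scratch area standing in for $(prefix); all names below are illustrative.
prefix=$(mktemp -d)
lib32dir=/usr/lib32
printf 'dummy\n' > "$prefix/libdemo.a"

# Without the target directory, install cannot create the file.
if install -m 644 "$prefix/libdemo.a" "$prefix$lib32dir/libdemo.a" 2>/dev/null; then
    first_try=ok
else
    first_try=failed
fi

# Create the directory first (the line the patch adds), then install.
install -d -m 755 "$prefix$lib32dir"
install -m 644 "$prefix/libdemo.a" "$prefix$lib32dir/libdemo.a"
```

With a staging prefix (for example when building a package) the library directory usually does not exist yet, which is the case Michal describes.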
From: Luke P. <lop...@wi...> - 2004-07-18 16:16:08
|
Yep. Adding /lib/libnss_files* to libraries in my config lets the hosts file work correctly. Good catch, Thomas. -Luke |
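The fix generalizes: glibc loads a `libnss_<service>.so.2` module for each service named on an nsswitch.conf line, so every service on the `hosts:` line needs its library present on the node. A rough sketch that lists what a given nsswitch.conf expects and flags what is missing; `check_nss_libs` is a hypothetical helper and the default `/lib` location is an assumption:

```shell
#!/bin/sh
# check_nss_libs NSSWITCH_CONF [LIBDIR]
# For each service on the "hosts:" line, report whether a matching
# libnss_<service>.so.2 exists in LIBDIR (default /lib).
check_nss_libs() {
    conf=$1
    libdir=${2:-/lib}
    # Drop comments and [action] items, then keep what follows "hosts:".
    services=$(sed -e 's/#.*//' -e 's/\[[^]]*]//g' "$conf" |
               sed -n 's/^hosts:[[:space:]]*//p')
    for svc in $services; do
        if [ -e "$libdir/libnss_$svc.so.2" ]; then
            echo "ok      libnss_$svc.so.2"
        else
            echo "MISSING libnss_$svc.so.2"
        fi
    done
}
```

Run it against the nsswitch.conf you ship to the nodes and a directory holding the libraries you ship; with `hosts: files bproc` and only `libnss_bproc.so.2` present, it would flag `libnss_files.so.2`.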
From: Thomas E. <eck...@gm...> - 2004-07-18 09:24:29
On Sat, 17 Jul 2004, Michal Jaegermann wrote:
> I am trying to get bproc cluster running on 64-bit Opterons with
> a kernel based on 2.6.6 and patched up bproc-4.0.0pre5. Front end
> is running Fedora Core 2 distro plus support. That
> hardware/software combo seems to limit my choices to that bproc
> version.
(...)
> ls[199] general protection rip: 3a1cd0ada7 rsp: 7fbffff010

I saw similar errors too with _pre4 -- from my docs I cannot reconstruct if it happened with _pre5 too but it's running on amd64 in my setup (2.6.6, _pre5, Gentoo-dist) when I disable "COMPAT32"-support in bproc:

(bproc-src-dir/Makefile.conf)
s/^COMPAT32:=y/COMPAT32:=n/

Hope this helps,
Thomas
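The substitution above can be applied from the shell. A sketch, assuming you pass the path to `Makefile.conf` inside your own bproc source tree; `disable_compat32` is a hypothetical helper, and it keeps a `.orig` backup next to the file:

```shell
#!/bin/sh
# disable_compat32 MAKEFILE_CONF
# Flip COMPAT32:=y to COMPAT32:=n, keeping a backup as MAKEFILE_CONF.orig.
disable_compat32() {
    conf=$1
    cp "$conf" "$conf.orig" || return 1
    sed 's/^COMPAT32:=y/COMPAT32:=n/' "$conf.orig" > "$conf"
}
```

Rebuild bproc afterwards. Michal reports elsewhere in this thread that the change alone did not cure the general protection fault on his setup, so treat it as one thing to try.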
From: Michal J. <mi...@ha...> - 2004-07-18 04:05:34
I am trying to get a bproc cluster running on 64-bit Opterons with a kernel based on 2.6.6 and patched up bproc-4.0.0pre5. Front end is running Fedora Core 2 distro plus support. That hardware/software combo seems to limit my choices to that bproc version. I have one client node so far.

At this stage I can boot a node and I see from it

....
nodeup : Node setup completed successfully.
nodeup : Node setup returned status 0

although 'nodeinfo' tells me "cpus=0; hz=0; mem=4163420160" (which is hopefully minor). Also 'bpstat' prints:

0 up ---x--x--x root root

So far so good. But if I try, say, 'bpsh -a ls' then I end up with "bpsh: Child process exited abnormally" and on a monitor hooked up to a node I see lines like this:

ls[199] general protection rip: 3a1cd0ada7 rsp: 7fbffff010

OTOH bpstat seems to work fine so I can power down a node, or change its ownership, but that is about it at this moment.

Any ideas?

Michal
From: Luke P. <lop...@wi...> - 2004-07-16 20:26:34
I have the default, which only consists of libnss_bproc.so.2. Ahh, that makes a bit of sense. I'll stick the files lib in there and see what happens.

-Luke

On Fri, 2004-07-16 at 14:56, Thomas Eckert wrote:
> Luke,
>
> do you have the nss-libs on the nodes, especially "/lib/libnss_files*"?
>
> Thomas
From: Thomas E. <eck...@gm...> - 2004-07-16 20:24:01
On Fri, 16 Jul 2004, Willem Schreuder wrote:
> On Fri, 16 Jul 2004, Willem Schreuder wrote:
>
>> I've run into exactly the same problem with several versions of BProc. It
>> appears that the BProc nss mods used to allow node addressing by symbolic
>> names breaks name resolution by both files and dns. I've stared at the
>> code, but it is not obvious to me where it is broken.
>
> I should qualify that - beonss breaks name resolution for hosts not part
> of the cluster.

This would explain why my setup works. I'll try resolving a non-cluster-member on Monday.

Have you tested with "files" only or also given "dns" a try?
From: Thomas E. <eck...@gm...> - 2004-07-16 19:56:25
|
Luke, do you have the nss-libs on the nodes, especially "/lib/libnss_files*"? Thomas |
From: Thomas E. <eck...@gm...> - 2004-07-16 19:41:11
On Fri, 16 Jul 2004, Luke Palmer wrote:
> Your nsswitch leads me to believe that your nodes have more of a full
> install than just the minimal bproc setup (which I am using). Is that
> the case?

nope. it's a beoboot-booted minimal beowulf; no nfs-root, fat initrd or stuff like that. some stuff (like /etc/protocols) was added to the bare minimum but i think it's comparable to your install.

Thomas
From: <Wil...@pr...> - 2004-07-16 18:41:24
On Fri, 16 Jul 2004, Luke Palmer wrote:
> Your nsswitch leads me to believe that your nodes have more of a full
> install than just the minimal bproc setup (which I am using). Is that
> the case? Willem, what are you running?

I have had this problem since a Vanilla CM3. I've upgraded that by recompiling some components including beonss, although not recently.

Why this is a problem for me is that I have a SAN to some NFS servers which I access via a second ethernet card in each slave. When I reference the servers by their IP address, all is well. When I try to resolve them via /etc/hosts or DNS, no joy. This also applies to anything else (ping, etc.) trying to resolve the host name.

My conclusion has been that somehow the bproc hooks break the name resolution cascade so that only the bproc name resolution works, but I can't prove that with a patch :-(

-Willem

--
================================================================
Dr. Willem A. Schreuder, President, Principia Mathematica
Address: 575 Union Blvd, Suite 320, Lakewood, CO 80228, USA
Tel: (303) 716-3573 Fax: (303) 716-3575
WWW: www.prinmath.com Email: Wil...@pr...
From: Luke P. <lop...@wi...> - 2004-07-16 18:17:32
Thomas,

Your nsswitch leads me to believe that your nodes have more of a full install than just the minimal bproc setup (which I am using). Is that the case? Willem, what are you running?

-Luke

> I tested with nearly the same setup regarding nsswitch.conf and hosts (my
> foo-host points to another cluster-node so there is no NATing) and it works:
> nsswitch.conf:
> n-1 # bpsh 0 cat /etc/nsswitch.conf
> passwd: files bproc
> shadow: files
> group: files
> hosts: files bproc
From: Willem S. <wi...@pr...> - 2004-07-16 18:08:33
On Fri, 16 Jul 2004, Willem Schreuder wrote:
> I've run into exactly the same problem with several versions of BProc. It
> appears that the BProc nss mods used to allow node addressing by symbolic
> names breaks name resolution by both files and dns. I've stared at the
> code, but it is not obvious to me where it is broken.

I should qualify that - beonss breaks name resolution for hosts not part of the cluster.

-Willem
From: Willem S. <wi...@pr...> - 2004-07-16 18:00:33
On Fri, 16 Jul 2004, Luke Palmer wrote:
> Yep, that's exactly what I did (see original post).
> hosts: files bproc

I've run into exactly the same problem with several versions of BProc. It appears that the BProc nss mods used to allow node addressing by symbolic names breaks name resolution by both files and dns. I've stared at the code, but it is not obvious to me where it is broken.

-Willem

--
================================================================
Dr. Willem A. Schreuder, President, Principia Mathematica
Address: 575 Union Blvd, Suite 320, Lakewood, CO 80228, USA
Tel: (303) 716-3573 Fax: (303) 716-3575
WWW: www.prinmath.com Email: Wil...@pr...
From: Thomas E. <eck...@gm...> - 2004-07-16 18:00:23
|
addition: a similar setup to the one described in my post a few minutes ago with bproc-4.0.0_pre5 (nsswitch.conf: "hosts: files") works too. |
From: Thomas E. <eck...@gm...> - 2004-07-16 17:52:47
Luke,

On Thu, 15 Jul 2004, Luke Palmer wrote:
> I'm trying to do some host faking to make flexlm licensed software work
> on nodes. Say I have a flexlm license that refers to the servers foo
> and bar. I want to make nodes think that foo and bar are the master,
> which will then do NAT and send data to the real foo and bar.
>
> So, on nodes here is what I have done. nsswitch.conf looks like this:
>
> passwd: bproc
> hosts: files bproc
>
> and /etc/hosts is this:
>
> 10.0.4.100 foo bar

I tested with nearly the same setup regarding nsswitch.conf and hosts (my foo-host points to another cluster-node so there is no NATing) and it works:

nsswitch.conf:
  n-1 # bpsh 0 cat /etc/nsswitch.conf
  passwd: files bproc
  shadow: files
  group: files
  hosts: files bproc

hosts:
  n-1 # bpsh 0 cat /etc/hosts
  127.0.0.1 localhost
  10.0.4.1 m-1
  10.0.4.10 m0
  10.0.4.11 m1

do the ping:
  n-1 # bpsh 0 ping -c 1 m1
  PING m1 (10.0.4.11): 56 octets data
  64 octets from 10.0.4.11: icmp_seq=0 ttl=64 time=0.1 ms

  --- m1 ping statistics ---
  1 packets transmitted, 1 packets received, 0% packet loss
  round-trip min/avg/max = 0.1/0.1/0.1 ms

(bproc-3.2.6)

Do you have /etc/protocols with the needed entries on the nodes -- I see "unknown protocol icmp"-errors if it's missing -- but who knows ...

> Unfortunately, I see the following:
>
> # bpsh 1 ping foo
> ping: unknown host foo

Thomas
From: Luke P. <lop...@wi...> - 2004-07-16 17:48:04
Yep, that's exactly what I did (see original post).

Thanks again
-Luke

> Have you tried to modify /etc/nsswitch.conf (the one on the nodes) so
> that it tries to use /etc/hosts first in order to resolve addresses?
> Normally /etc/nsswitch.conf contains:
> passwd: bproc
> hosts: bproc
>
> In a normal running system it has many more entries. One could modify it so that
> it looks like:
> passwd: bproc
> hosts: files bproc
From: Luke P. <lop...@wi...> - 2004-07-16 16:57:23
Thanks for the reply, Daniel!

I'm pretty sure my NAT is working properly. I can do:

#bpsh 1 telnet 10.0.4.100 1705

and talk to the external FlexLM server, but not

#bpsh 1 telnet foo 1705

The problem is that the FlexLM license file refers to the servers by name (foo), and you can't change those names without invalidating the license. So, what you have to do is associate a fake IP address (10.0.4.100) with the names in the license file. I would typically do this by modifying the /etc/hosts file, but for some reason, my bproc nodes are ignoring either /etc/hosts or /etc/nsswitch.conf.

Any other ideas?

-Luke

> This is precisely what NAT should help you do, though... I can easily do
> something like "bpsh 0 ping 128.100.100.128", and it works properly through
> the NAT setup of the master node. Are you sure your NAT is working properly?
From: Luke P. <lop...@wi...> - 2004-07-16 14:11:30
Thanks for the reply. I unfortunately can't test without bpsh- I don't have full installs on nodes. I also couldn't use your setup, as nodes can't see the globally valid IP's outside their subnet.

I do have appropriate NAT working, though! I thought that was going to be the hard part.... :)

Does anyone have other ideas, or things to check?

Thanks
-Luke

> Try testing without bpsh (if you can). We use this setup for a commercial
> Flexlm licensed application, and it works. We have full OS installs on the
> internal compute nodes, however, so they only use hosts: files.
>
> /etc/hosts has the *external* addresses of the flexlm servers (we have a
> 3-backup-server setup for Flexlm). Routing all goes through the master
> server (router set via DHCP). Master server has NAT set up via iptables
> script (iptable_nat module, set up FORWARD chain for appropriate ports,
> enable MASQUERADEing in nat table (POSTROUTING chain), and "echo 1 >
> /proc/sys/net/ipv4/ip_forward").
From: Daniel W. <wi...@ci...> - 2004-07-16 03:00:55
Luke,

Try testing without bpsh (if you can). We use this setup for a commercial Flexlm licensed application, and it works. We have full OS installs on the internal compute nodes, however, so they only use hosts: files.

/etc/hosts has the *external* addresses of the flexlm servers (we have a 3-backup-server setup for Flexlm). Routing all goes through the master server (router set via DHCP). Master server has NAT set up via iptables script (iptable_nat module, set up FORWARD chain for appropriate ports, enable MASQUERADEing in nat table (POSTROUTING chain), and "echo 1 > /proc/sys/net/ipv4/ip_forward").

If you haven't done NAT before, there's a decent HOWTO in TLDP.

HTH,
Dan W.

On Thu, Jul 15, 2004 at 05:54:15PM -0500, Luke Palmer wrote:
> Hello,
>
> I'm trying to do some host faking to make flexlm licensed software work
> on nodes. Say I have a flexlm license that refers to the servers foo
> and bar. I want to make nodes think that foo and bar are the master,
> which will then do NAT and send data to the real foo and bar.
>
> So, on nodes here is what I have done. nsswitch.conf looks like this:
>
> passwd: bproc
> hosts: files bproc
>
> and /etc/hosts is this:
>
> 10.0.4.100 foo bar
>
> Unfortunately, I see the following:
>
> # bpsh 1 ping foo
> ping: unknown host foo
>
> This trick would work on a normal linux box- can anyone see what I am
> doing wrong, or suggest an alternate approach?
>
> Thanks
> -Luke

--
-- Daniel Widyono http://www.cis.upenn.edu/~widyono
-- Liniac Project, CIS Dept., SEAS, University of Pennsylvania
-- Mail: CIS Dept, 302 Levine 3330 Walnut St Philadelphia, PA 19104
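The NAT steps described above might look roughly like this on the master. This is a dry-run sketch, not Dan's actual script: the interface names, the node subnet mask and the port are assumptions (1705 is the port Luke uses elsewhere in the thread), and the commands are only printed unless `DRY_RUN=0` is set and you are root.

```shell
#!/bin/sh
# Assumed values; adjust to your cluster.
EXT_IF=${EXT_IF:-eth0}            # interface facing the license servers
INT_IF=${INT_IF:-eth1}            # interface facing the compute nodes
NODE_NET=${NODE_NET:-10.0.4.0/24} # node subnet (mask is a guess)
LM_PORT=${LM_PORT:-1705}          # license server port
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_nat() {
    run modprobe iptable_nat
    # Let license traffic from the nodes out, and replies back in.
    run iptables -A FORWARD -i "$INT_IF" -o "$EXT_IF" -s "$NODE_NET" -p tcp --dport "$LM_PORT" -j ACCEPT
    run iptables -A FORWARD -i "$EXT_IF" -o "$INT_IF" -m state --state ESTABLISHED,RELATED -j ACCEPT
    # Rewrite node source addresses to the master's external address.
    run iptables -t nat -A POSTROUTING -s "$NODE_NET" -o "$EXT_IF" -j MASQUERADE
    # Same effect as: echo 1 > /proc/sys/net/ipv4/ip_forward
    run sysctl -w net.ipv4.ip_forward=1
}
```

Calling `setup_nat` with the defaults prints the five commands for review. Add one FORWARD rule per port your license setup actually uses.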
From: Michal J. <mi...@ha...> - 2004-07-16 00:06:37
Another gotcha! In bproc-4.0.0pre5/clients/sys/bproc_common.h we see the following code:

enum {
    BPROC_ARCH_X86 = 1,
    BPROC_ARCH_ALPHA = 2,
    BPROC_ARCH_PPC = 3,
    BPROC_ARCH_X86_64 = 4
};
#if defined(__i386__)
#define BPROC_ARCH BPROC_ARCH_X86
#elif defined(__alpha__)
#define BPROC_ARCH BPROC_ARCH_ALPHA
#elif defined(powerpc)
#define BPROC_ARCH BPROC_ARCH_PPC
#elif defined(__x86_64__)
#define BPROC_ARCH BPROC_ARCH_X86_64
#else
....

but boot.h from beoboot-cm1.9 says

#define BEOBOOT_ARCH_I386 1
....
#if defined(__i386__) || defined(__x86_64__)
#define BEOBOOT_ARCH BEOBOOT_ARCH_I386
#elif
....

and later nodeadd checks if four equals one, as I am trying to get that running on x86_64, leaving me scratching my head why rarp is steadfastly refusing to talk to me. Sigh!

I guess that the idea is that beoboot is using beboot_common.h, so we will not get out of sync like that, but unfortunately this is not the case at the moment. Watch out if you are hacking with these sources!

Michal
From: Luke P. <lop...@wi...> - 2004-07-15 22:54:19
Hello,

I'm trying to do some host faking to make flexlm licensed software work on nodes. Say I have a flexlm license that refers to the servers foo and bar. I want to make nodes think that foo and bar are the master, which will then do NAT and send data to the real foo and bar.

So, on nodes here is what I have done. nsswitch.conf looks like this:

passwd: bproc
hosts: files bproc

and /etc/hosts is this:

10.0.4.100 foo bar

Unfortunately, I see the following:

# bpsh 1 ping foo
ping: unknown host foo

This trick would work on a normal linux box- can anyone see what I am doing wrong, or suggest an alternate approach?

Thanks
-Luke