From: Luke <lo...@du...> - 2004-06-20 19:35:09
Daniel,

Thanks for the reply. I'm cc-ing the bproc list on this just in case anyone
else has ideas.

Unless you know of a download location that I don't, all the Clustermatic
stuff is bproc4.0.0pre3. I wasn't able to mix versions; how are you using
pre4? I don't have an immediate need for a newer version of bproc, but I do
have an immediate need for LAM. I'm going to ask Brian if I can help with
development.

Here's a rundown of my problems. At the very start of the pre4 build, I see:

make -C vmadump vmadump.ko kver
make[1]: Entering directory `/usr/src/bproc-4.0.0pre4/vmadump'
gcc -D__KERNEL__ -I/lib/modules/2.4.25-bproc/build//usr/src/linux-2.4.25-bproc/include -Wall -Wstrict-prototypes -Wno-trigraphs -O2 -fno-strict-aliasing -fno-common -fomit-frame-pointer -pipe -mpreferred-stack-boundary=2 -march=i686 -DMODULE -DPACKAGE_VERSION='"4.0.0pre4"' -I. -c vmadump_common.c

The include line, which is automatically generated, is obviously wrong. If I
correct it, the build gets a bit further. The client programs build fine.
When building vmadump and the kernel modules, I see errors like these,
repeated many times. Except for the second one, they all involve the
variable "current", but I've been unable to figure out what it is (my
knowledge of kernel internals is quite limited). There are many, many
warnings as well, which I have left out.

vmadump_common.c:679: error: structure has no member named `sighand'
vmadump_common.c:681: error: too few arguments to function `recalc_sigpending'
vmadump_common.c:707: error: structure has no member named `clear_child_tid'
ghost.c:432: error: structure has no member named `utime'
ghost.c:433: error: structure has no member named `stime'
ghost.c:434: error: structure has no member named `cutime'
ghost.c:435: error: structure has no member named `cstime'

Any ideas?

Thanks
-Luke

Daniel Gruner wrote:
>Hi Luke,
>
>I am using the stuff from clustermatic. At least the kernels.
>Some of the other packages too, as far as I recall.
>
>Now, I am not working with Fedora on any of my clusters yet, and that
>may be why you are seeing problems. Can you tell me what the build
>problems are? I have built bproc on many different systems, such
>as RH7.2 on alpha, RH7.3 on i386, RH9 on athlon, etc.
>
>Also, in my experience, Fedora 1 is missing a bunch of stuff, such as
>libraries missing from packages, and whatever else. Let me know more
>details, and perhaps I can help.
>
>Now as for success with LAM... that is another story. Brian was still
>working on it, but it seems that some functions in the BProc API are
>either broken or not properly documented. The note you quote below is
>old, and has since been revised to "not working".
>
>Regards,
>Daniel
>
>On Sun, Jun 20, 2004 at 01:36:17AM -0500, Luke Palmer wrote:
>
>>Hey Daniel,
>>
>>I have a couple of questions for you. I cannot replicate your success
>>with LAM on bproc. I am hoping the difference is that I am using
>>bproc4.0.0pre3 (from clustermatic).
>>
>>You say you are using bproc4.0.0pre4. I was wondering if there is any
>>trick to getting it to build? I am on Fedora Core 1, and the build is
>>quite badly broken in my environment.
>>
>>Please let me know if you can help!
>>
>>Thanks
>>-Luke
>>
>>On Sun, 2004-06-13 at 18:42, Daniel Gruner wrote:
>>
>>>Brian,
>>>
>>>Success!!! (well, at least apparently). The thing lamboots fine, starts
>>>the lamd on all the nodes, and seems to run mpi jobs, so until further
>>>testing it looks like we are in business!
>>>
>>>I agree with you about BProc's lack of documentation, but I still like
>>>the system. It is the only cluster software that makes the cluster
>>>seem like a single system image. I have been running different versions
>>>of it for over 2 years, and I insist on it for all my clusters. It may
>>>be time to hack on the BProc guys for more documentation (Erik...).
>>>Anyway, thanks for your work on LAM, and let me know if I can be of
>>>more help.
>>>
>>>Regards,
>>>Daniel
>>>
>>>On Sun, Jun 13, 2004 at 04:01:11PM -0700, Brian Barrett wrote:
>>>
>>>>On Jun 13, 2004, at 3:32 PM, Daniel Gruner wrote:
>>>>
>>>>>Ok, here it goes again. The master node is still somehow screwed up,
>>>>>according to lamboot... Here is the output:
>>>>>
>>>><snip>
>>>>
>>>>>n-1<8638> ssi:boot:bproc: n-1 nodestatus failed (-1)
>>>>>n-1<8638> ssi:boot:bproc: n-1 node status down, failure
>>>>>n-1<8638> ssi:boot:bproc: n0 node status: up
>>>>
>>>>Well, on the good side, we detect the master node correctly now :).
>>>>
>>>>You know, life would be so much better if BProc documented things like
>>>>that. I added some code to make sure that NODE_MASTER is never used
>>>>for the parameter to bproc_nodestatus or the like. Hopefully, that
>>>>will make life better all around. If you could svn up and let me know
>>>>how it goes, I'd appreciate it. The only file changed is
>>>>share/ssi/boot/bproc/src/ssi_boot_bproc.c, so you should be able to
>>>>svn up and run make (without the autogen or configure stuff).
>>>>
>>>>Thanks!
>>>>
>>>>Brian
>>>>
>>>>--
>>>> Brian Barrett
>>>> LAM/MPI developer and all around nice guy
>>>> Have a LAM/MPI day: http://www.lam-mpi.org/