From: Gustavo G. M. <gu...@ma...> - 2004-09-29 14:44:16
Hi everybody again. Luke has helped me, and this message is the continuation of the problems. Luke, as you said, I installed kernel 2.6.7 and recompiled it. Up to that point everything was OK: kernel 2.6.7 compiled. The next step was to apply the bproc-4.0.0pre6 patch. I executed the following command:

patch -f -p1 < ../bproc-4.0.0pre6/patches/bproc-4.0.0pre6

But, again, I got some errors. Even so, I tried to recompile the kernel, but without success. A question: can I edit the files where the patch failed, looking for the code block in the bproc-4.0.0pre6 patch and pasting it into the kernel file? The errors from applying the patch are:

patching file arch/i386/kernel/i386_ksyms.c
Hunk #1 FAILED at 206.
1 out of 1 hunk FAILED -- saving rejects to file arch/i386/kernel/i386_ksyms.c.rej
patching file arch/i386/kernel/process.c
Hunk #1 FAILED at 36.
1 out of 3 hunks FAILED -- saving rejects to file arch/i386/kernel/process.c.rej
patching file arch/i386/kernel/traps.c
Hunk #1 FAILED at 61.
1 out of 1 hunk FAILED -- saving rejects to file arch/i386/kernel/traps.c.rej
patching file arch/x86_64/kernel/x8664_ksyms.c
Hunk #1 FAILED at 219.
1 out of 1 hunk FAILED -- saving rejects to file arch/x86_64/kernel/x8664_ksyms.c.rej
patching file kernel/sched.c
Hunk #1 FAILED at 40.
1 out of 6 hunks FAILED -- saving rejects to file kernel/sched.c.rej

--
Best regards,
Gustavo Gobi Martinelli
Linux User# 270627
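For readers hitting the same wall: yes, the rejected hunks land in the .rej files listed above, and they can be merged by hand. A minimal sketch of reviewing them, run from the top of the kernel tree (nothing here is specific to bproc):

```shell
# List every reject that patch(1) left behind and print its contents,
# so each failed hunk can be compared against the target file and
# merged by hand.
find . -name '*.rej' | sort | while read -r rej; do
    echo "== $rej =="
    cat "$rej"
done
```

Each .rej is itself in unified-diff form; once a hunk is merged manually, delete the corresponding .rej so a later run shows only the unresolved conflicts.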
From: Greg W. <gw...@la...> - 2004-09-29 00:15:38
From: gw...@la...
Subject: Workshop - Clustermatic: An Innovative Approach To Cluster Computing
Date: September 28, 2004 6:12:31 PM MDT
To: cc...@la..., cc...@la...

Clustermatic: An Innovative Approach To Cluster Computing

The 2004 LACSI Symposium will present this workshop on October 12. This workshop will give participants extensive, hands-on experience installing, configuring and using the Clustermatic software suite on a real cluster computer system. One can register for the workshop only, or the entire Symposium, at http://lacsi.lanl.gov/symposium, then click on "Registration". After October 5, registration is available onsite at the Eldorado Hotel in Santa Fe.

AGENDA
8:00-9:00   Registration and Continental Breakfast
9:00-9:30   Introduction to Clustermatic, Greg Watson, Cluster Research Team, LANL
9:30-10:30  BProc & Beoboot, Erik Hendriks, Cluster Research Team, LANL
10:30-11:00 Break
11:00-11:30 BProc & Beoboot continued..., Erik Hendriks, Cluster Research Team, LANL
11:30-12:30 LinuxBIOS, Ron Minnich, Cluster Research Team Lead, LANL
12:30-2:00  Lunch (on your own)
2:00-3:00   Filesystems, Ron Minnich, Cluster Research Team Lead, LANL
3:00-3:30   Supermon, Matt Sottile, Cluster Research Team, LANL
3:30-4:00   Break
4:00-4:30   BProc Job Scheduler (BJS), Matt Sottile, Cluster Research Team, LANL
4:30-5:30   Using MPI, Matt Sottile, Cluster Research Team, LANL
From: Ted S. <tsa...@cr...> - 2004-09-28 15:04:12
Hi,
I am working to install clustermatic4 on an Opteron cluster running SuSE Server 9 for AMD64 (Tyan B2882 Transport GX28). I compiled kernel 2.6.7 with modules from http://sf.net/projects/bproc (thanks Erik). At boot phase 2 I get an error:

boot: Install module e100
e100: Intel PRO/100 driver 3.0.18
socket: Address family not supported by protocol

If I disable the hardware (I don't use it) I get a similar error for the gigabit interface:

tg3: no version for kfree found: kernel tainted
socket: Address family not supported by protocol

What do I do about this kind of error? I'll highly appreciate any help. Thanks,
-- Ted
From: Gustavo G. M. <gu...@ma...> - 2004-09-28 14:39:44
Luke,

Thanks for your answer. One question: isn't the bproc-4.0.0pre6 patch only for kernel 2.6.7? My kernel is 2.6.5.

--
Best regards,
Gustavo Gobi Martinelli
Linux User# 270627

Quoting Luke Palmer <lop...@wi...>:

> Gustavo,
>
> The build procedure in 2.6 is probably different than what you're used to.
> I'm not sure if this would help your problem or not. I can tell you that
> "make bzImage" is deprecated in favor of just "make", but reading the docs
> would probably be a good idea. Try it that way with a clean source tree,
> and let us know what happens.
>
> Also, why are you using pre4? pre6 is available, and will very likely be
> cleaner. pre6 on FC2 compiles fine for me.
>
> -Luke
>
> -----Original Message-----
> From: bpr...@li...
> [mailto:bpr...@li...] On Behalf Of Gustavo Gobi Martinelli
> Sent: Tuesday, September 28, 2004 8:07 AM
> To: bpr...@li...
> Cc: clu...@gr...
> Subject: [BProc] BPROC error on Kernel 2.6.5
>
> Hi everybody,
>
> I'm trying to install a beowulf cluster and I applied the bproc-4.0.0pre4 patch
> on my kernel 2.6.5 (Fedora Core 2).
>
> But when I try to recompile my kernel, I find an error exactly as below, after
> the "make bzImage" command:
>
> "arch/i386/kernel/ptrace.c: At top level:
> arch/i386/kernel/ptrace.c:529: warning: type defaults to `int' in declaration of `EXPORT_SYMBOL'
> arch/i386/kernel/ptrace.c:529: warning: parameter names (without types) in function declaration
> arch/i386/kernel/ptrace.c:529: warning: data definition has no type or storage class
> make[1]: ** [arch/i386/kernel/ptrace.o] Error 1
> make: ** [arch/i386/kernel] Error 2"
>
> What do I have to do about this error? Can I edit the file ptrace.c to check
> these errors?
>
> Another thing to know: I only executed the "make bzImage" command, because
> the recompilation for kernel 2.6.5 changed some old commands like "make dep".
> Is that OK?
>
> --
> Best regards,
> Gustavo Gobi Martinelli
> Linux User# 270627
>
> -------------------------------------------------------
> This SF.Net email is sponsored by: YOU BE THE JUDGE. Be one of 170
> Project Admins to receive an Apple iPod Mini FREE for your judgement on
> who ports your project to Linux PPC the best. Sponsored by IBM.
> Deadline: Sept. 24. Go here: http://sf.net/ppc_contest.php
> _______________________________________________
> BProc-users mailing list
> BPr...@li...
> https://lists.sourceforge.net/lists/listinfo/bproc-users
From: Luke P. <lop...@wi...> - 2004-09-28 14:16:45
Gustavo,

The build procedure in 2.6 is probably different than what you're used to. I'm not sure if this would help your problem or not. I can tell you that "make bzImage" is deprecated in favor of just "make", but reading the docs would probably be a good idea. Try it that way with a clean source tree, and let us know what happens.

Also, why are you using pre4? pre6 is available, and will very likely be cleaner. pre6 on FC2 compiles fine for me.

-Luke

-----Original Message-----
From: bpr...@li... [mailto:bpr...@li...] On Behalf Of Gustavo Gobi Martinelli
Sent: Tuesday, September 28, 2004 8:07 AM
To: bpr...@li...
Cc: clu...@gr...
Subject: [BProc] BPROC error on Kernel 2.6.5

Hi everybody,

I'm trying to install a beowulf cluster and I applied the bproc-4.0.0pre4 patch on my kernel 2.6.5 (Fedora Core 2).

But when I try to recompile my kernel, I find an error exactly as below, after the "make bzImage" command:

"arch/i386/kernel/ptrace.c: At top level:
arch/i386/kernel/ptrace.c:529: warning: type defaults to `int' in declaration of `EXPORT_SYMBOL'
arch/i386/kernel/ptrace.c:529: warning: parameter names (without types) in function declaration
arch/i386/kernel/ptrace.c:529: warning: data definition has no type or storage class
make[1]: ** [arch/i386/kernel/ptrace.o] Error 1
make: ** [arch/i386/kernel] Error 2"

What do I have to do about this error? Can I edit the file ptrace.c to check these errors?

Another thing to know: I only executed the "make bzImage" command, because the recompilation for kernel 2.6.5 changed some old commands like "make dep". Is that OK?

--
Best regards,
Gustavo Gobi Martinelli
Linux User# 270627
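For anyone following along: the 2.6 build flow Luke describes drops "make dep" entirely, and plain "make" now builds everything "make bzImage" used to. A sketch, with an illustrative tree path:

```shell
# Typical Linux 2.6 build sequence; the tree path is an example and the
# steps only run if such a tree is actually unpacked here.
if [ -d linux-2.6.5 ]; then
    cd linux-2.6.5
    make oldconfig          # carry answers over from an existing .config
    make                    # plain "make" replaces "make bzImage" in 2.6
    make modules_install    # installs modules under /lib/modules/<version>
fi
```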
From: Gustavo G. M. <gu...@ma...> - 2004-09-28 13:07:06
Hi everybody,

I'm trying to install a beowulf cluster and I applied the bproc-4.0.0pre4 patch on my kernel 2.6.5 (Fedora Core 2).

But when I try to recompile my kernel, I find an error exactly as below, after the "make bzImage" command:

"arch/i386/kernel/ptrace.c: At top level:
arch/i386/kernel/ptrace.c:529: warning: type defaults to `int' in declaration of `EXPORT_SYMBOL'
arch/i386/kernel/ptrace.c:529: warning: parameter names (without types) in function declaration
arch/i386/kernel/ptrace.c:529: warning: data definition has no type or storage class
make[1]: ** [arch/i386/kernel/ptrace.o] Error 1
make: ** [arch/i386/kernel] Error 2"

What do I have to do about this error? Can I edit the file ptrace.c to check these errors?

Another thing to know: I only executed the "make bzImage" command, because the recompilation for kernel 2.6.5 changed some old commands like "make dep". Is that OK?

--
Best regards,
Gustavo Gobi Martinelli
Linux User# 270627
From: YhLu <Yh...@ty...> - 2004-09-24 01:58:24
It does not apply cleanly to kernel 2.6.8.1 from kernel.org either.

YH

-----Original Message-----
From: er...@he... [mailto:er...@he...]
Sent: Thursday, September 23, 2004 7:03 AM
To: YhLu
Cc: bpr...@li...
Subject: Re: bproc patch for 2.6.4-52 in Suse 9.1 pro amd64

On Wed, Sep 22, 2004 at 04:07:44PM -0700, YhLu wrote:
> Erik,
>
> I used the patch in bproc-4.0.0pre6 to the Kernel 2.6.4-52 in Suse 9.1 for
> AMD64.
> Got some rej.

The patches I generate are against the stock kernels from kernel.org. Vendors like SuSE and Red Hat apply literally hundreds of patches to their kernels. My patches never apply cleanly after that. I recommend just using a kernel.org kernel. Otherwise it's up to you to resolve the conflicts.

- Erik
From: <er...@he...> - 2004-09-23 15:18:28
On Wed, Sep 22, 2004 at 04:07:44PM -0700, YhLu wrote:
> Erik,
>
> I used the patch in bproc-4.0.0pre6 to the Kernel 2.6.4-52 in Suse 9.1 for
> AMD64.
> Got some rej.

The patches I generate are against the stock kernels from kernel.org. Vendors like SuSE and Red Hat apply literally hundreds of patches to their kernels. My patches never apply cleanly after that. I recommend just using a kernel.org kernel. Otherwise it's up to you to resolve the conflicts.

- Erik
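A way to confirm Erik's point before spending any time on conflicts: dry-run the patch against the tree you actually have. The paths below are illustrative:

```shell
# patch --dry-run reports which hunks would fail without modifying any
# files; a vendor tree (SuSE, Red Hat) will typically show FAILED hunks
# while a pristine kernel.org tree of the right version will not.
if [ -d linux-2.6.7 ]; then
    cd linux-2.6.7
    if patch --dry-run -p1 < ../bproc-4.0.0pre6/patches/bproc-4.0.0pre6; then
        echo "patch applies cleanly"
    else
        echo "conflicts -- use a stock kernel.org tree or merge by hand"
    fi
fi
```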
From: <er...@he...> - 2004-09-20 23:02:52
On Mon, Sep 20, 2004 at 08:22:46AM -0400, Ted Sariyski wrote:
> Hi,
> I'm trying to install clustermatic4 on an Opteron cluster running SuSE Server 9
> for AMD64 (Tyan B2882 Transport GX28). While installing RPMs I get
> errors for bproc-modules and v9fs-mounter-smp:
>
> #> depmod: QM_MODULES: Function not implemented
>
> Has anybody else had this problem?

The Clustermatic 4 RPMs are for Linux 2.4. A lot has changed between Linux 2.4 and Linux 2.6. In order to use 2.6, you'll have to build from newer sources (available at http://sf.net/projects/bproc).

- Erik
From: Steven J. <py...@li...> - 2004-09-20 15:15:02
Greetings,

The kernel's module API changed between 2.4 and 2.6. You'll need to update your module utilities to use the new syscalls.

G'day,
sjames

||||| |||| ||||||||||||| ||| by Linux Labs International, Inc.
Steven James, CTO
55 Marietta Street
Suite 1830
Atlanta, Ga 30303
866 824 9737 support

On Mon, 20 Sep 2004, Ted Sariyski wrote:
> Hi,
> I'm trying to install clustermatic4 on an Opteron cluster running SuSE Server 9
> for AMD64 (Tyan B2882 Transport GX28). While installing RPMs I get
> errors for bproc-modules and v9fs-mounter-smp:
>
> #> depmod: QM_MODULES: Function not implemented
>
> Has anybody else had this problem?
>
> Thanks,
> Ted
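Steven's diagnosis in concrete form: QM_MODULES is a 2.4-era modutils query interface that Linux 2.6 removed, so a depmod from old modutils fails on a 2.6 kernel and module-init-tools is needed instead. A hypothetical helper (the function name is invented here) that classifies the banner printed by `depmod -V`:

```shell
# Classify a "depmod -V" banner: module-init-tools understands the 2.6
# module interface, old modutils (the QM_MODULES era) does not.
supports_26_modules() {
    case "$1" in
        module-init-tools*) return 0 ;;   # 2.6-ready tools
        *)                  return 1 ;;   # assume old modutils
    esac
}

# Typical use on a live system:
#   supports_26_modules "$(depmod -V 2>&1 | head -n1)" || echo "upgrade needed"
```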
From: Ted S. <tsa...@cr...> - 2004-09-20 12:22:50
Hi,
I'm trying to install clustermatic4 on an Opteron cluster running SuSE Server 9 for AMD64 (Tyan B2882 Transport GX28). While installing RPMs I get errors for bproc-modules and v9fs-mounter-smp:

#> depmod: QM_MODULES: Function not implemented

Has anybody else had this problem?

Thanks,
Ted
From: <er...@he...> - 2004-09-17 17:16:02
On Wed, Sep 15, 2004 at 07:59:46PM +0100, Miguel Dias Costa wrote:
> Hello all
>
> does bjs set a stack ulimit to 8192?
>
> We verified that bpsh migrates ulimits to the nodes, but with bjssub there
> is always this stack limit.

bjs doesn't do anything with ulimits. If you're running in batch mode, you'll probably get whatever bjs itself gets. In interactive mode, your job will get whatever you've got in your shell.

bjs really should deal with the ulimits. I was thinking that bjs should actually get PAM in the loop when starting jobs so that things like pam_limits can work too.

- Erik
From: Miguel D. C. <mc...@fc...> - 2004-09-15 18:53:10
Hello all

does bjs set a stack ulimit to 8192?

We verified that bpsh migrates ulimits to the nodes, but with bjssub there is always this stack limit.

--
Miguel Dias Costa <mc...@fc...>
Centro de Física do Porto
From: <er...@he...> - 2004-09-11 02:31:16
On Wed, Sep 08, 2004 at 10:43:17AM +0200, Peter Englmaier wrote:
> Hi, we had a 'Scyld' cluster running for years until some disk
> on the master died. Some MPI programs and other 'normal' programs
> were running. After the disk crash the bproc system was hanging
> and after a reboot all nodes didn't come up. Finally,
> we installed RH9+Clustermatic 4. Main reason for this: we
> had no good documentation about Scyld and the old sysadmin had
> left the institute. Setting up clustermatic was quite easy.
>
> Nodes had scratch disks with three partitions: Scyld boot
> partition (actually an ext2 partition), swap, and /scratch.
> All three partitions were destroyed - I could not mount any of them.
> On about 10 of 22 nodes! Even swap was not recognized as such, although
> fdisk reported all partitions. I suspect something was going wrong
> with the kernel part of bproc. Is this possible?

BProc doesn't get its fingers into any of the kernel's file system stuff, so I think it's unlikely that BProc would induce file system problems. I suppose it's possible if something went crazy and started scribbling on memory. Personally, I've never seen a problem like that with BProc. I can't speak for Scyld though - they've modified BProc and I don't know the details of what they've done.

Personally, I would pull out one of the disks to see if it works on some other stand-alone system. If it does, then I'd start picking through the node setup process to see what's going on there.

- Erik
From: <er...@he...> - 2004-09-10 21:59:21
On Thu, Sep 09, 2004 at 09:41:41AM -0700, J S wrote:
>
> We have a 3 computer cluster (each one a dual AMD),
> and wish to use them all to do work. We have one of
> them configured as a master, and the other 2 as slaves
> (nodes 0-1).
>
> Executing
>
> NODES=-1,0,1 mpirun -np 6 -s --p4 ./cpi
> Could not get number of cpus for node -1, assuming 1
> Not enough nodes to allocate all processes
>
> (It works well with NODES=0,1 for -np 4)
>
> We are using BProc and MPICH w/ clustermatic. How can
> I tell mpirun to start 2 processes on the master node?

I'm afraid it's busted. There's a -l switch but it doesn't mix well with using the NODES string. I guess we never fixed it because it doesn't come up much here.

If you decide to fix it, please send a patch.

- Erik
From: J S <joa...@ya...> - 2004-09-09 16:41:50
We have a 3 computer cluster (each one a dual AMD), and wish to use them all to do work. We have one of them configured as a master, and the other 2 as slaves (nodes 0-1).

Executing

NODES=-1,0,1 mpirun -np 6 -s --p4 ./cpi
Could not get number of cpus for node -1, assuming 1
Not enough nodes to allocate all processes

(It works well with NODES=0,1 for -np 4)

We are using BProc and MPICH w/ clustermatic. How can I tell mpirun to start 2 processes on the master node?

Thanks in advance,
João Silva
From: Peter E. <Pet...@un...> - 2004-09-08 08:43:27
Hi, we had a 'Scyld' cluster running for years until some disk on the master died. Some MPI programs and other 'normal' programs were running. After the disk crash the bproc system was hanging and after a reboot all nodes didn't come up. Finally, we installed RH9+Clustermatic 4. Main reason for this: we had no good documentation about Scyld and the old sysadmin had left the institute. Setting up clustermatic was quite easy.

Nodes had scratch disks with three partitions: Scyld boot partition (actually an ext2 partition), swap, and /scratch. All three partitions were destroyed - I could not mount any of them. On about 10 of 22 nodes! Even swap was not recognized as such, although fdisk reported all partitions. I suspect something was going wrong with the kernel part of bproc. Is this possible?

Best, Peter.

--
Astron. Institut Uni Basel, Venusstr. 7, CH-4102 Binningen, Switzerland
Phone: +41 (61) 2055-434, fax: -455, http://www.astro.unibas.ch/~ppe/
From: <er...@he...> - 2004-09-07 21:44:53
On Mon, Sep 06, 2004 at 07:41:13PM +0200, Florian Bruckner wrote:
> can anybody tell me how the bproc slave is connected with the bpmaster?
>
> as I understand it, the daemons communicate with their kernel module via
> a file descriptor which they get by a system call. And all communication
> between master and slave is done via a user space socket connection.
> Requests sent from the master are interpreted by the slave or passed on
> to the kernel (masqfs).
>
> But why are there sockets in kernel mode? I think they are only used for
> moving processes around. But why aren't they also transferred via the
> user space sockets?
>
> Is there any documentation around that describes all connections used by
> bproc?

The best documentation on the guts is probably here:

http://public.lanl.gov/cluster/papers/papers/hendriks-ics02.pdf

It talks a little about connections. In a nutshell, the kernel stuff passes messages to the daemons. The slaves all have a TCP connection to the master. These connections are only used to pass small messages around.

When a process wants to move, a message is sent to the destination telling it that a process wants to move. The destination establishes a new TCP connection back to the source, and the actual process data is sent on that connection. The kernel code establishes this new connection, for security reasons: that way user space processes can't poke weird stuff in there.

- Erik
From: Florian B. <flo...@ao...> - 2004-09-06 17:39:08
can anybody tell me how the bproc slave is connected with the bpmaster?

As I understand it, the daemons communicate with their kernel module via a file descriptor which they get by a system call. And all communication between master and slave is done via a user space socket connection. Requests sent from the master are interpreted by the slave or passed on to the kernel (masqfs).

But why are there sockets in kernel mode? I think they are only used for moving processes around. But why aren't they also transferred via the user space sockets?

Is there any documentation around that describes all connections used by bproc?

thanks in advance
FloB

flo...@ao...
From: Joshua A. <lu...@ln...> - 2004-09-04 20:24:41
On Sat, Sep 04, 2004 at 01:50:23PM -0600, Michal Jaegermann wrote:
> On 2004/Aug/20 I posted a message with a subject "Note on modules on
> nodes (NFS support in particular)" which included a sample script.
> Techniques demonstrated there apply here as well.
>
> Michal

Here is a sample script that I put together for a system last year; it is probably the progenitor of Daryl's disk formatting script. You call it with 'format_disk $BSH_NODERAGE'. If you are familiar with fdisk, the gobbledygook that you feed it through stdin will make sense :)

Josh
From: Michal J. <mi...@ha...> - 2004-09-04 19:50:50
On Sat, Aug 28, 2004 at 03:01:02PM -0500, Luke Palmer wrote:
> I can't just do 'bpsh X fdisk' because the /dev filesystem isn't
> present on slave nodes.

There is, although the various special files you would like to have in a specific situation may not be there; but nothing prevents you from adding them. A simple way is to use a 'plugin miscfiles' directive in your node_up.conf and add whatever devices you would like to see. Alternatively, something in this style will do (assuming that you want all /dev/hda* on nodes - as an example):

find /dev -name 'hda*' | cpio -o -c --quiet | \
    bpsh $node cpio --quiet -imd

for every "$node" you want that to happen on. This can be run from the 'node_up' script, or later, or separately - whatever you need/desire. 'tar' instead of 'cpio' will do as well, since GNU tar handles special device files too, and there are still other ways to accomplish that goal.

> So- what's a good way to partition?

'sfdisk' run on nodes is handy, as you can feed it a requested partitioning scheme from stdin. You may need some additional libraries for that, and 'ldd' will tell you what they are. Add them to your configuration, or copy them to the nodes similarly to the above, and you are set. Scripting that is a good idea :-) For a specific situation that will be a few lines; things get longer if you are trying to be general.

You will likely need still other shared libraries if you want to make file systems, or run 'mkswap', on freshly created disk partitions.

On 2004/Aug/20 I posted a message with a subject "Note on modules on nodes (NFS support in particular)" which included a sample script. Techniques demonstrated there apply here as well.

Michal
From: <er...@he...> - 2004-09-03 19:26:27
On Tue, Aug 24, 2004 at 04:01:23PM -0700, Vipul Deokar wrote:
> Folks,
>
> I have a configuration of 4 identical "compute" nodes
> (with disks) and 1 slightly different "master" node.
> The master node has a slightly more powerful CPU, more
> RAM, an additional 10/100 NIC interface to connect to
> the extranet, a CDROM drive on secondary IDE, and a more
> recent version of BIOS firmware.
>
> With Red Hat 9 (runlevel 3) and Clustermatic4 i386
> RPMS installed on computeNode#1 as master, I can build
> a cluster with the other 4 nodes as slaves. (I see one
> node rebooting every 5-6 minutes that I need to
> debug). I am using this cluster currently.
>
> However, with Red Hat 9 (runlevel 3 or 5; more
> packages) and Clustermatic4 i386 RPMS installed on the
> "master", I cannot get any of the "compute" nodes up
> as slaves. The node_up script fails on all nodes with:
>
> nodeup : Starting 1 child processes.
> nodeup : Finished creating child processes.
> nodeup : I/O error talking to child
> nodeup : Child process for node 1 died with signal 4
>
> The same config, node_up, config.boot scripts execute
> in both cluster configuration attempts - one succeeds
> and the other fails. Any insight why this would happen?

Signal 4 = SIGILL. This usually happens when the front end node is some newer rev of processor than the back end nodes. Red Hat is pretty good about installing the best glibc it can on the front end (e.g. installing the i686 version instead of the i386 one). I usually see something like this when the slave node is some other CPU type that doesn't qualify as i686. The issue is really that the library gets loaded on the front end and those instructions turn out not to be valid on the destination node.

My work-around is to downgrade the glibc on the front end to one that will work on all nodes. In other words, load the i386 one instead of the i686 one.

- Erik
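Erik's diagnosis reduces to an ordering on x86 sub-architectures: code paged in on an i686 master can SIGILL on anything older. A hypothetical helper (all names invented here) that encodes that check, given the master's glibc package arch and the weakest CPU class among the slaves:

```shell
# Rank the common x86 sub-architectures; a master glibc built for a
# higher rank than the weakest slave reproduces Erik's SIGILL scenario.
ranks="i386 i486 i586 i686"

cpu_rank() {
    i=0
    for r in $ranks; do
        i=$((i + 1))
        if [ "$r" = "$1" ]; then echo "$i"; return 0; fi
    done
    echo 0   # unknown arch
}

glibc_safe() {   # $1 = glibc arch on the master, $2 = weakest slave CPU
    [ "$(cpu_rank "$1")" -le "$(cpu_rank "$2")" ]
}

# e.g. glibc_safe i686 i586 fails -> downgrade the master's glibc to
# the i386 build, as Erik suggests.
```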
From: <er...@he...> - 2004-09-03 19:19:00
On Sat, Aug 28, 2004 at 03:01:02PM -0500, Luke Palmer wrote:
> Hello,
>
> This may seem a dumb/obvious question, but I'm trying to figure out how
> to easily partition disks on slave nodes. I can't just do 'bpsh X
> fdisk' because the /dev filesystem isn't present on slave nodes. So-
> what's a good way to partition?
>
> Of course I read about beofdisk after some google searches. Now, I know
> Scyld has always liked to more or less hide the ways of getting their
> software for free, but I can't find Scyld ANYWHERE. I recall that
> Penguin Computing has something to do with it now - poked around there
> too with no luck. Can anyone comment on the fate of Scyld?

I have no idea about Scyld. Daryl's disk partitioning script is newer, so I figure it's more likely to work, but I do have an old mirror of their FTP site from about 1.5 yrs ago. I've been keeping it around since Becker is in the habit of accusing us of IP theft. Email me directly if you want an old beoboot source RPM or something. The beofdisk script is in there and it's got a nice big GPL license banner on it, so it's fair game to pass around.

- Erik
From: <er...@he...> - 2004-09-03 19:10:05
On Tue, Aug 31, 2004 at 11:33:12AM -0500, Luke Palmer wrote:
> Hello,
>
> I've noticed some daemons having trouble running on bproc nodes, usually
> complaining about child processes dying. I'm running pre6 with 2.6.7,
> libraries via an NFS mount.
>
> Attached is a simple pthreads test program. It runs fine on the master,
> but segfaults when running via bpsh. I've attached the last few lines
> of an strace of the test program. Looks fishy.
>
> Would others mind trying this test program, and commenting on their
> findings?

pthreads has been a sore point for a long time. However, I think those days might be over. *cross fingers* Grab the latest 'n' greatest bproc from CVS on sourceforge.net. It's got a lot of fixes for threading related stuff in there. I think I'm going to have a tagged version again pretty soon.

Also, make sure that you're using NPTL and not the old linuxthreads stuff. I've done testing with NPTL only at this point. If linuxthreads doesn't work I'm prepared to say "use NPTL".

Note that you can't migrate a multithreaded thing at this point. The result of that is undefined.

- Erik
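Erik's "make sure you're using NPTL" can be checked with a standard glibc facility; on an NPTL system the output looks like "NPTL 2.3.x", while old linuxthreads systems report "linuxthreads ...":

```shell
# Ask glibc which thread implementation userspace is built on; per
# Erik's note, BProc's threading fixes were tested against NPTL only.
getconf GNU_LIBPTHREAD_VERSION
```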
From: Luke P. <lop...@wi...> - 2004-08-31 16:34:09
Hello,

I've noticed some daemons having trouble running on bproc nodes, usually complaining about child processes dying. I'm running pre6 with 2.6.7, libraries via an NFS mount.

Attached is a simple pthreads test program. It runs fine on the master, but segfaults when running via bpsh. I've attached the last few lines of an strace of the test program. Looks fishy.

Would others mind trying this test program, and commenting on their findings?

Thanks
-Luke

clone(child_stack=0x40817b08, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID|CLONE_DETACHED, parent_tidptr=0x40817bf8, {entry_number:6, base_addr:0x40817bb0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}, child_tidptr=0x40817bf8) = -1 EFAULT (Bad address)
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++

#include <iostream>
#include <pthread.h>
#include <stdlib.h>

using namespace std;

void *task(void *arg)
{
    for (;;) {
        cout << (char *)arg;
        cout.flush();
    }
    return NULL;
}

int main()
{
    pthread_t t1;

    if ( pthread_create(&t1, NULL, task, (void *)"1") != 0 ) {
        cout << "pthread_create() error" << endl;
        abort();
    }

    task((void *)"2");
}