Thread: [SSI-users] Newbie trouble adding 2nd node...
From: Jonathan D. P. <jo...@cs...> - 2005-12-09 18:24:38
Hi,

I'm trying to set up a two-node test cluster on "whitebox" Supermicro servers, both dual-processor Xeons with HT disabled in the BIOS, using tg3 NICs with PXE. I'm using atftpd as the TFTP server, which seems to go fine.

After using nodeadd to configure the second node (no root failover), it comes up, gets a kernel and seems to chug pretty far through init, configuring the network, loading modules, even reporting joining the cluster in the 1st node's syslog:

Dec 9 13:10:11 borg1 kernel: nm_add_node: Node 2 added

At this point cluster -v says that node2 is "comingup" or some such, but it never gets to UP. node2 panics and drops into the debugger with:

Setting up IP spoofing protection: rp_filter
Configuring network interfaces ...
Disabling Privacy extentions on device c05ec740(lo)
nm_send: Error 22 sending imalive!
nm_send: Error 22 sending imalive!
Kernel panic - not syncing: lost network connection to all potential root nodes!
Instruction(i) breakpoint #0 at 0xc01272e0 (adjusted)
0xc01272e0 panik_hook: int3

Ouch!

The cluster network is configured on eth0 through a cheapish Netgear gigabit switch (one of the little blue bastards).

Hopefully the fix is obvious to someone who's done this before, help?

Thanks,
-Jon
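Note on the "Error 22" in the console output above: assuming nm_send reports a standard Linux errno value (the thread does not confirm this), 22 is EINVAL. A minimal Python sketch for looking the number up follows; describe_errno is a hypothetical helper name, not part of OpenSSI.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, platform message) for an errno value."""
    # errno.errorcode maps numeric values to names such as "EINVAL".
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror gives the human-readable message for this platform.
    return name, os.strerror(code)


if __name__ == "__main__":
    name, message = describe_errno(22)
    print("errno 22 = %s: %s" % (name, message))
```

This only translates the number; it says nothing about why the imalive send failed on node2.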
From: Ivan K. <kr...@fa...> - 2005-12-09 18:44:58
Jonathan D. Proulx wrote:
> Hopefully the fix is obvious to someone who's done this before, help?

You've omitted both the OpenSSI version in question and the distribution you're using. Telepathic debugging is scheduled for OpenSSI 3.0. Please bear with us until then by providing complete information in trouble reports ;)

--
Ivan Krstic <kr...@fa...> | 0x147C722D
From: Jonathan D. P. <jo...@cs...> - 2005-12-09 18:51:08
On Fri, Dec 09, 2005 at 07:44:35PM +0100, Ivan Krstic wrote:
:Jonathan D. Proulx wrote:
:> Hopefully the fix is obvious to someone who's done this before, help?

:Telepathic debugging is scheduled for OpenSSI 3.0. Please bear with us
:until then by providing complete information in trouble reports ;)

I thought I saw that in the changelog, but it must have been a todo...

Debian stable
openSSI 1.9.1 - prepackaged .debs

uname -r
2.6.10-ssi-686-smp
From: Ivan K. <kr...@fa...> - 2005-12-09 19:08:15
Jonathan D. Proulx wrote:
> Debian stable
> openSSI 1.9.1 - prepackaged .debs

1.9.x is our development series, and is still considered unstable. In particular, a great many known bugs existed in 1.9.1, and many of them have been fixed in the 1.9.2 preview release, which we don't yet have packaged for Debian.

If you intend to use this cluster in production, I strongly recommend you try the 1.2.2 release instead. If you can't live with the 2.4 kernel, your best bet is trying out the 1.9.2 preview release for Fedora Core 3, or helping us port the 1.9.2 changes to Debian and prepare packages.

--
Ivan Krstic <kr...@fa...> | 0x147C722D
From: Jonathan D. P. <jo...@cs...> - 2005-12-14 19:25:36
On Fri, Dec 09, 2005 at 08:07:54PM +0100, Ivan Krstic wrote:
:If you intend to use this cluster in production, I strongly recommend
:you try the 1.2.2 release instead. If you can't live with the 2.4
:kernel, your best bet is trying out the 1.9.2 preview release for Fedora
:Core 3, or helping us port the 1.9.2 changes to Debian and prepare packages.

I'd thought I needed the 1.9 series for the OpenAFS patches, but with further reading saw that these were available for 1.2, so I've taken your advice.

This brings up a new problem. After a fresh install of Debian on the machine, adding and pinning the apt sources for openssi (1.2.2-1), and dist-upgrading, I cannot get it to build a proper initrd.

It seems to be ignoring the /etc/mkinitrd/modules file. I need the tg3 module, which I added but which does not show up in the initrd. I added some more network drivers that I don't need, and none of those made it in either.

mkinitrd complains about:

/usr/sbin/mkinitrd: MODULES=dep cannot be done due to version conflict
/usr/sbin/mkinitrd: using MODULES=most instead

I don't know if this is due to some remnant of the previous initrd-tools install or what.

I also tried mounting the initrd on the loopback and jamming net/tg3.o in there myself. On boot this complained that there was no such device.

I have identical hardware running custom 2.4.28 kernels, and they use tg3 for the Broadcom NIC. When I had this machine under OpenSSI 1.9.1, adding tg3 to the modules file just worked (for the first node at least).

Any thoughts on what I'm doing wrong?

Thanks,
-Jon
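Note on checking the initrd contents: the comparison described above (modules listed in /etc/mkinitrd/modules versus modules that actually landed in the initrd) can be sketched in Python as below. This is only an illustration under stated assumptions: the initrd is already loop-mounted or unpacked at some directory, the modules file has one module name per line with optional arguments and # comments, and module files end in .o, .ko, .o.gz or .ko.gz. All function names are hypothetical and not part of mkinitrd.

```python
import os

# Assumed module file suffixes for 2.4 (.o) and 2.6 (.ko) kernels.
MODULE_SUFFIXES = (".o", ".ko", ".o.gz", ".ko.gz")


def parse_modules_file(text):
    """Return module names from modules-file text, skipping comments."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            # First field is the module name; the rest are its arguments.
            names.append(line.split()[0])
    return names


def module_names_in_tree(root):
    """Collect module names found anywhere under an unpacked initrd."""
    found = set()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            for suffix in MODULE_SUFFIXES:
                if filename.endswith(suffix):
                    stem = filename[: -len(suffix)]
                    # Treat '-' and '_' as equivalent in module names.
                    found.add(stem.replace("-", "_"))
                    break
    return found


def missing_modules(modules_text, initrd_root):
    """List requested modules that are absent from the initrd tree."""
    present = module_names_in_tree(initrd_root)
    return [name for name in parse_modules_file(modules_text)
            if name.replace("-", "_") not in present]
```

For example, with the initrd mounted at a hypothetical /mnt/initrd, calling missing_modules(open("/etc/mkinitrd/modules").read(), "/mnt/initrd") would return ["tg3"] if the driver was left out. It checks file presence only; it does not verify that a module matches the running kernel version.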