Thread: RE: [SSI-devel] OpenSSI kernel on SuSE
From: Walker, B. J <bru...@hp...> - 2004-10-26 14:56:45
Bharata,

Great to see you have gotten that far. Several observations:

A. You commented out the cluster initialization in your linuxrc
   (pre-root and post-root), so lots of data structures didn't get
   initialized, and I am actually surprised you got as far as you did.
   I would suggest putting those back in so that at least the kernel
   thinks it is a single-node cluster, with all the SSI kernel stuff
   set up.
B. According to the kdb ps output, init is being run as process 2,
   which is not good. This might get fixed when you do step A, so I
   wouldn't worry about it yet.
C. According to the trace, /dev/initctl is a fifo, which it probably
   is; however, the system seems to think it is being serviced on
   another node (the reference to ICS), which is not good. Again, not
   having the cluster initialization might be the cause of this; it
   might also be that devfs and devfsd are not completely set up.

Bruce

P.S. While one might expect that an SSI kernel on which the SSI
initialization calls were NOT done could function as a std base kernel,
that was not a priority and was probably never tested. This of course
will be a focus in the "hooks" version, in which the SSI code will be
modules that can be optionally loaded, functioning as a std base if the
modules are not loaded.

> -----Original Message-----
> From: ssi...@li...
> [mailto:ssi...@li...] On Behalf Of RAO, BHARATA BHASKER (STSD)
> Sent: Tuesday, October 26, 2004 1:30 AM
> To: Bruce Walker
> Cc: ssi...@li...
> Subject: [SSI-devel] OpenSSI kernel on SuSE
>
> On Sat, 2004-10-16 at 11:19, Bruce Walker wrote:
> > Developers,
> >
> > As you can see below, Bharata has started looking into getting OpenSSI
> > running on SUSE (finally someone is doing it).
> > I would suggest the following plan:
> > 1. On an unmodified SUSE system, try to boot an OpenSSI kernel with an
> >    OpenSSI ramdisk (being careful to modify the "boottab" file to
> >    correspond to the node you are trying to boot).
> > 2. If step #1 doesn't work, we will have to figure out why.
> > 3. After step #1 seems to work, try to make sure the networking was
> >    successfully set up during the ramdisk (otherwise we will never get
> >    a second node to join). You could check this perhaps by just booting
> >    the node to single user and running ifconfig -a and trying to use the
> >    network.
>
> When I tried the above approach, I could get the kernel to boot, with
> the network set up properly (to and fro ping ok) during ramdisk. However,
> /sbin/init could not run fully and it hangs while trying to open
> /dev/initctl. (strace output given below)
>
> The kdb backtrace at the time of the hang is pasted below. The linuxrc
> file from the initrd image I am using is attached with this mail.
>
> I would like to know if there are any issues with unmodified init
> running on an ssi kernel.
>
> Any hints welcome.
>
> Regards,
> Bharata.
>
>
> ================= kdb backtrace begin =================================
>
> Entering kdb (current=0xc17dc000, pid 14) on processor 0 due to Keyboard
> Entry
> [0]kdb> bt
> Stack traceback for pid 14
> 0xc17dc000   14       2    1    0  R      0xc17dc450 *kupdated
> EBP        EIP        Function (args)
>            0xc015eab2 .text.lock.buffer+0x22f
>                       kernel .text 0xc0100000 0xc015e883 0xc015eb10
> 0xc17ddf3c 0xc015e389 sync_old_buffers+0x19 (0x1, 0x1, 0xc17dc000, 0x0, 0x0)
>                       kernel .text 0xc0100000 0xc015e370 0xc015e420
> 0xc17ddfec 0xc015e7cd kupdate+0x16d
>                       kernel .text 0xc0100000 0xc015e660 0xc015e840
>            0xc01077ed kernel_thread_helper+0x5
>                       kernel .text 0xc0100000 0xc01077e8 0xc0107800
>
> [0]kdb> ps
> Task Addr   Pid  Parent  [*]  cpu  State  Thread      Command
> 0xc17dc000   14       2    1    0  R      0xc17dc450 *kupdated
> 0xdfff2000    2       0    1    1  R      0xdfff2450  init
>
> [0]kdb> btp 2
> Stack traceback for pid 2
> 0xdfff2000    2       0    1    1  R      0xdfff2450  init
> EBP        EIP        Function (args)
>            0xc028842a .text.lock.ics_cli+0x9
>                       kernel .text 0xc0100000 0xc0288421 0xc0288500
> 0xdfff3e70 0xc0287ad5 icscli_handle_get+0xb5 (0x0, 0x90003, 0x0, 0xdfff3f34, 0xc0000000)
>                       kernel .text 0xc0100000 0xc0287a20 0xc0287b40
> 0xdfff3ea0 0xc0236918 cli_fifonmsvr_getsvr+0x38 (0x0, 0xdfff3ec8, 0x0, 0x28015, 0x100001)
>                       kernel .text 0xc0100000 0xc02368e0 0xc0236a10
> 0xdfff3edc 0xc0236303 fifonm_getsvr+0xd3 (0xdee9a080, 0xc0159fbc, 0xdfb4db80, 0xdfb4db80, 0xdee9a080)
>                       kernel .text 0xc0100000 0xc0236230 0xc0236480
> 0xdfff3efc 0xc016f4d3 fifo_open+0x73 (0xdee9a080, 0xdfb4db80, 0x8001, 0xc03c71ec, 0xdfff3f64)
>                       kernel .text 0xc0100000 0xc016f460 0xc016f8b7
> 0xdfff3f18 0xc01584bd dentry_open_it+0xed (0xdeda0d80, 0xdef0ff80, 0x8001, 0xdfff3f34, 0xdfff3f34)
>                       kernel .text 0xc0100000 0xc01583d0 0xc01585f0
> 0xdfff3fa0 0xc01583cd filp_open+0x8d (0xdeef2000, 0x8001, 0x0, 0xdfff2000, 0x0)
>                       kernel .text 0xc0100000 0xc0158340 0xc01583d0
> 0xdfff3fbc 0xc0158801 sys_open+0x51 (0x80a5f10, 0x8001, 0x0, 0x0, 0x80a5d60)
>                       kernel .text 0xc0100000 0xc01587b0 0xc0158860
>            0xc010be37 system_call+0x33
>                       kernel .text 0xc0100000 0xc010be04 0xc010be3c
>
> ================= kdb backtrace end ===================================
>
> ================= strace o/p for /sbin/init ==========================
>
> sh-2.05b# strace /sbin/init 3
> execve("/sbin/init", ["/sbin/init", "3"], [/* 7 vars */]) = 0
> uname({sys="Linux", node="(none)", ...}) = 0
> brk(0) = 0x80bf4ac
> brk(0x80e04ac) = 0x80e04ac
> brk(0x80e1000) = 0x80e1000
> umask(022) = 022
> getpid() = 108
> geteuid32() = 0
> rt_sigaction(SIGSTOP, {SIG_IGN}, NULL, 8) = -1 EINVAL (Invalid argument)
> rt_sigaction(SIGTERM, {SIG_IGN}, NULL, 8) = 0
> rt_sigaction(SIGALRM, {0x8048b80, [], SA_RESTORER, 0x804db68}, NULL, 8) = 0
> alarm(3)
> open("/dev/initctl", O_WRONLY|O_LARGEFILE
>
> ================= strace o/p for /sbin/init ==========================
>
> =================== dmesg begin ======================================
>
> Linux version 2.4.22-1.2199.nptl_ssi_3develsmp (ro...@dl...) (gcc version 3.3.3 20040412 (Red Hat Linux 3.3.3-7)) #1 SMP Wed Sep 29 15:58:35 IST 2004
> BIOS-provided physical RAM map:
> BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
> BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
> BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
> BIOS-e820: 0000000000100000 - 000000001fffa000 (usable)
> BIOS-e820: 000000001fffa000 - 0000000020000000 (ACPI data)
> BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
> BIOS-e820: 00000000fee00000 - 00000000fee10000 (reserved)
> BIOS-e820: 00000000ffc00000 - 0000000100000000 (reserved)
> 0MB HIGHMEM available.
> 511MB LOWMEM available.
> found SMP MP-table at 000f4fd0
> hm, page 000f4000 reserved twice.
> hm, page 000f5000 reserved twice.
> hm, page 000fe000 reserved twice.
> hm, page 000ff000 reserved twice.
> On node 0 totalpages: 131066
> zone(0): 4096 pages.
> zone(1): 126970 pages.
> zone(2): 0 pages.
> ACPI: RSDP (v000 COMPAQ ) @ 0x000f4f70
> ACPI: RSDT (v001 COMPAQ P31 0x00000002 .. 0x0000162e) @ 0x1fffa000
> ACPI: FADT (v001 COMPAQ P31 0x00000002 .. 0x0000162e) @ 0x1fffa040
> ACPI: MADT (v001 COMPAQ 00000083 0x00000002 0x00000000) @ 0x1fffa100
> ACPI: SPCR (v001 COMPAQ SPCRRBSU 0x00000001 .. 0x0000162e) @ 0x1fffa1c0
> ACPI: DSDT (v001 COMPAQ DSDT 0x00000001 MSFT 0x0100000b) @ 0x00000000
> ACPI: Local APIC address 0xfee00000
> ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
> Processor #0 Pentium 4(tm) XEON(tm) APIC version 20
> ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] disabled)
> ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] disabled)
> ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] enabled)
> Processor #6 Pentium 4(tm) XEON(tm) APIC version 20
> ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
> Processor #1 Pentium 4(tm) XEON(tm) APIC version 20
> ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] disabled)
> ACPI: LAPIC (acpi_id[0x05] lapic_id[0x05] disabled)
> ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] enabled)
> Processor #7 Pentium 4(tm) XEON(tm) APIC version 20
> ACPI: LAPIC_NMI (acpi_id[0xff] polarity[0x0] trigger[0x0] lint[0x1])
> Using ACPI for processor (LAPIC) configuration information
> Intel MultiProcessor Specification v1.4
> Virtual Wire compatibility mode.
> OEM ID: COMPAQ Product ID: PROLIANT APIC at: 0xFEE00000
> I/O APIC #2 Version 17 at 0xFEC00000.
> I/O APIC #3 Version 17 at 0xFEC01000.
> I/O APIC #4 Version 17 at 0xFEC02000.
> I/O APIC #5 Version 17 at 0xFEC03000.
> Processors: 4
> xAPIC support is present
> Enabling APIC mode: Flat. Using 4 I/O APICs
> Kernel command line: root=6807 console=ttyS0,38400 console=tty0 text desktop splash=silent
> Initializing CPU#0
> Detected 2799.308 MHz processor.
> Console: colour VGA+ 80x25
> Calibrating delay loop... 5583.66 BogoMIPS
> Memory: 508072k/524264k available (2842k kernel code, 15804k reserved, 2448k data, 176k init, 0k highmem)
> kdb version 4.3 by Keith Owens, Scott Lurndal. Copyright SGI, All Rights Reserved
> kdb_cmd[0]: bpa panic_hook
> Instruction(i) BP #0 at 0xc0128890 (panic_hook)
> is enabled globally adjust 1
> Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
> Inode cache hash table entries: 32768 (order: 6, 262144 bytes)
> Mount cache hash table entries: 512 (order: 0, 4096 bytes)
> Buffer cache hash table entries: 32768 (order: 5, 131072 bytes)
> Page-cache hash table entries: 131072 (order: 7, 524288 bytes)
> CPU: Trace cache: 12K uops, L1 D cache: 8K
> CPU: L2 cache: 512K
> CPU: Physical Processor ID: 0
> Intel machine check architecture supported.
> Intel machine check reporting enabled on CPU#0.
> Enabling fast FPU save and restore... done.
> Enabling unmasked SIMD FPU exception support... done.
> Checking 'hlt' instruction... OK.
> POSIX conformance testing by UNIFIX
> mtrr: v1.40 (20010327) Richard Gooch (rg...@at...)
> mtrr: detected mtrr type: Intel
> CPU: Trace cache: 12K uops, L1 D cache: 8K
> CPU: L2 cache: 512K
> CPU: Physical Processor ID: 0
> Intel machine check reporting enabled on CPU#0.
> CPU0: Intel(R) Xeon(TM) CPU 2.80GHz stepping 09
> per-CPU timeslice cutoff: 1462.64 usecs.
> task migration cache decay timeout: 10 msecs.
> enabled ExtINT on CPU#0
> ESR value before enabling vector: 00000000
> ESR value after enabling vector: 00000000
> Booting processor 1/1 eip 3000
> Initializing CPU#1
> masked ExtINT on CPU#1
> ESR value before enabling vector: 00000000
> ESR value after enabling vector: 00000000
> Calibrating delay loop... 5596.77 BogoMIPS
> CPU: Trace cache: 12K uops, L1 D cache: 8K
> CPU: L2 cache: 512K
> CPU: Physical Processor ID: 0
> Intel machine check reporting enabled on CPU#1.
> CPU1: Intel(R) Xeon(TM) CPU 2.80GHz stepping 09
> Booting processor 2/6 eip 3000
> Initializing CPU#2
> masked ExtINT on CPU#2
> ESR value before enabling vector: 00000000
> ESR value after enabling vector: 00000000
> Calibrating delay loop... 5596.77 BogoMIPS
> CPU: Trace cache: 12K uops, L1 D cache: 8K
> CPU: L2 cache: 512K
> CPU: Physical Processor ID: 3
> Intel machine check reporting enabled on CPU#2.
> CPU2: Intel(R) Xeon(TM) CPU 2.80GHz stepping 09
> Booting processor 3/7 eip 3000
> Initializing CPU#3
> masked ExtINT on CPU#3
> ESR value before enabling vector: 00000000
> ESR value after enabling vector: 00000000
> Calibrating delay loop... 5596.77 BogoMIPS
> CPU: Trace cache: 12K uops, L1 D cache: 8K
> CPU: L2 cache: 512K
> CPU: Physical Processor ID: 3
> Intel machine check reporting enabled on CPU#3.
> CPU3: Intel(R) Xeon(TM) CPU 2.80GHz stepping 09
> Total of 4 processors activated (22373.99 BogoMIPS).
> ENABLING IO-APIC IRQs
> Setting 2 in the phys_id_present_map
> ...changing IO-APIC physical APIC ID to 2 ... ok.
> Setting 3 in the phys_id_present_map
> ...changing IO-APIC physical APIC ID to 3 ... ok.
> Setting 4 in the phys_id_present_map
> ...changing IO-APIC physical APIC ID to 4 ... ok.
> Setting 5 in the phys_id_present_map
> ...changing IO-APIC physical APIC ID to 5 ... ok.
> ..TIMER: vector=0x31 pin1=2 pin2=0
> testing the IO APIC.......................
>
> .................................... done.
> Using local APIC timer interrupts.
> calibrating APIC timer ...
> ..... CPU clock speed is 2799.1766 MHz.
> ..... host bus clock speed is 133.2940 MHz.
> cpu: 0, clocks: 1332940, slice: 266588
> CPU0<T0:1332928,T1:1066336,D:4,S:266588,C:1332940>
> cpu: 1, clocks: 1332940, slice: 266588
> cpu: 2, clocks: 1332940, slice: 266588
> cpu: 3, clocks: 1332940, slice: 266588
> CPU2<T0:1332928,T1:533136,D:28,S:266588,C:1332940>
> CPU3<T0:1332928,T1:266576,D:0,S:266588,C:1332940>
> CPU1<T0:1332928,T1:799744,D:8,S:266588,C:1332940>
> cpu_sibling_map[0] = 1
> cpu_sibling_map[1] = 0
> cpu_sibling_map[2] = 3
> cpu_sibling_map[3] = 2
> mapping CPU#0's runqueue to CPU#1's runqueue.
> mapping CPU#2's runqueue to CPU#3's runqueue.
> Starting migration thread for cpu 0
> smp_num_cpus: 4.
> Starting migration thread for cpu 1
> Starting migration thread for cpu 2
> Starting migration thread for cpu 3
> mtrr: your CPUs had inconsistent fixed MTRR settings
> mtrr: probably your BIOS does not setup all CPUs
> ACPI: Subsystem revision 20031002
> ACPI: Interpreter disabled.
> PCI: PCI BIOS revision 2.10 entry at 0xf0094, last bus=6
> PCI: Using configuration type 1
> PCI: Probing PCI hardware
> PCI: Probing PCI hardware (bus 00)
> PCI: Ignoring BAR0-3 of IDE controller 00:0f.1
> PCI: Discovered peer bus 01
> PCI: Discovered peer bus 04
> PCI->APIC IRQ transform: (B0,I4,P0) -> 31
> PCI->APIC IRQ transform: (B0,I5,P0) -> 23
> PCI->APIC IRQ transform: (B0,I5,P1) -> 22
> PCI->APIC IRQ transform: (B0,I15,P0) -> 10
> PCI->APIC IRQ transform: (B1,I2,P0) -> 30
> PCI->APIC IRQ transform: (B4,I2,P0) -> 29
> PCI: Device 00:00 not found by BIOS
> PCI: Device 00:01 not found by BIOS
> PCI: Device 00:02 not found by BIOS
> PCI: Device 00:78 not found by BIOS
> PCI: Device 00:7b not found by BIOS
> PCI: Device 00:88 not found by BIOS
> PCI: Device 00:8a not found by BIOS
> isapnp: Scanning for PnP cards...
> isapnp: No Plug & Play device found
> Linux NET4.0 for Linux 2.4
> Based upon Swansea University Computer Society NET3.039
> Initializing RT netlink socket
> apm: BIOS not found.
> Starting kswapd
> VFS: Disk quotas vdquot_6.5.1
> Journalled Block Device driver loaded
> devfs: v1.12c (20020818) Richard Gooch (rg...@at...)
> devfs: boot_options: 0x0
> pty: 2048 Unix98 ptys configured
> Serial driver version 5.05c (2001-07-08) with MANY_PORTS MULTIPORT SHARE_IRQ SERIAL_PCI ISAPNP enabled
> ttyS0 at 0x03f8 (irq = 4) is a 16550A
> Real Time Clock Driver v1.10e
> NET4: Frame Diverter 0.46
> RAMDISK driver initialized: 16 RAM disks of 9000K size 1024 blocksize
> Uniform Multi-Platform E-IDE driver Revision: 7.00beta4-2.4
> ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
> SvrWks CSB5: IDE controller at PCI slot 00:0f.1
> SvrWks CSB5: chipset revision 147
> SvrWks CSB5: not 100% native mode: will probe irqs later
> SvrWks CSB5: simplex device: DMA forced
> ide0: BM-DMA at 0x2000-0x2007, BIOS settings: hda:pio, hdb:pio
> SvrWks CSB5: simplex device: DMA forced
> ide1: BM-DMA at 0x2008-0x200f, BIOS settings: hdc:pio, hdd:pio
> hda: COMPAQ CD-ROM SN-124, ATAPI CD/DVD-ROM drive
> ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
> ide: late registration of driver.
> md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
> md: Autodetecting RAID arrays.
> md: autorun ...
> md: ... autorun DONE.
> pci_hotplug: PCI Hot Plug PCI Core version: 0.5
> Initializing Cryptographic API
> NET4: Linux TCP/IP 1.0 for NET4.0
> IP Protocols: ICMP, UDP, TCP, IGMP
> IP: routing cache hash table of 4096 buckets, 32Kbytes
> TCP: Hash tables configured (established 32768 bind 32768)
> Linux IP multicast router 0.06 plus PIM-SM
> NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
> IPVS: Connection hash table configured (size=65536, memory=512Kbytes)
> IPVS: ipvs loaded.
> IPVS: [wlc] scheduler registered.
> RAMDISK: Compressed image found at block 0
> Freeing initrd memory: 1575k freed
> VFS: Mounted root (ext2 filesystem).
> Freeing unused kernel memory: 176k freed
> Note: unable to open serial console.
> kmod: failed to exec /sbin/modprobe -s -k sysfs, errno = 2
> kmod: failed to exec /sbin/modprobe -s -k freesysfs, errno = 2
> SCSI subsystem driver Revision: 1.00
> kmod: failed to exec /sbin/modprobe -s -k scsi_hostadapter, errno = 2
> HP CISS Driver (v 2.4.50)
> blocks= 35553120 block_size= 512
> heads= 255, sectors= 32, cylinders= 4357 RAID 0
>
> blk: queue c071c320, I/O limit 4294967295Mb (mask 0xffffffffffffffff)
> Partition check:
> cciss/c0d0: p1 < p5 p6 p7 p8 p9 >
> tg3.c:v2.2 (August 24, 2003)
> eth0: Tigon3 [partno(N/A) rev 1002 PHY(5703)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:0b:cd:82:76:5e
> eth1: Tigon3 [partno(N/A) rev 1002 PHY(5703)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:0b:cd:82:76:55
> kjournald starting. Commit interval 5 seconds
> EXT3-fs: mounted filesystem with ordered data mode.
> tg3: eth0: Link is up at 100 Mbps, full duplex.
> tg3: eth0: Flow control is on for TX and on for RX.
>
> =================== dmesg end ========================================
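Bruce's observation C turns on whether /dev/initctl really is a FIFO on the booted system. A minimal sketch of how that could be confirmed from the single-user shell, assuming only a POSIX sh with the standard `test` operators; the helper name `describe_node` is made up for illustration and is not part of OpenSSI:

```shell
#!/bin/sh
# Hypothetical helper (not an OpenSSI tool): report what kind of file a
# path is, using only standard POSIX test operators.
describe_node() {
    if [ -p "$1" ]; then
        echo fifo          # named pipe, as /dev/initctl normally is
    elif [ -c "$1" ]; then
        echo chardev       # character device
    elif [ -b "$1" ]; then
        echo blockdev      # block device
    elif [ -e "$1" ]; then
        echo other         # exists, but none of the above
    else
        echo missing
    fi
}
```

Running `describe_node /dev/initctl` should print `fifo` on a healthy system. Note that opening a FIFO write-only normally blocks until some process has it open for reading, which is presumably why the strace shows alarm(3) immediately before the open; in the kdb trace the open is instead stuck below fifo_open in the ICS client code.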
From: Bharata B R. <bha...@hp...> - 2004-11-04 12:27:30
Hello,

The OpenSSI kernel with its corresponding ramdisk runs almost perfectly
on a SuSE 9.0 system.

While the kernel had to be force-installed, the linuxrc in the ramdisk
needed only some minor changes (modified linuxrc attached).

The system boots completely to runlevel 3 with a few issues:

- Lots of devices (like tty, pty etc.) in /dev are not set up properly.
  This mainly results in the failure of network configuration and of
  getting a terminal. After hand-creating the missing devices, the
  system does come up fully into runlevel 3.
- Some other minor issues all seem related to missing devices in /dev.

Next steps to be tried:

- Resolve the /dev problems.
- Try to boot a 2nd node by installing the openssi-tools and
  cluster-tools rpms.

Will keep the list updated on the progress...

Regards,
Bharata.

On Tue, 2004-10-26 at 20:26, Walker, Bruce J wrote:
> Bharata,
>
> Great to see you have gotten that far. Several observations:
> A. you commented out the cluster initialization in your linuxrc
> (pre-root and post-root) so lots of data structures didn't get
> initialized and I am actually surprised you got as far as you did. I
> would suggest putting those back in so at least the kernel thinks it is
> a single node cluster, with all the SSI kernel stuff set up.
> B. according the strace, init is being run as process 2, which is not
> good. This might get fixed when you do step A so I wouldn't worry about
> it yet.
> C. according to the trace, /dev/initctl is a fifo, which it probably is;
> however, the system seems to think it is being serviced on another node
> (the reference to ICS), which is not good; again, not having the
> cluster initialization might be the cause of this; also might be the
> case that devfs and devfsd are not completely set up.
>
> Bruce
>
> P.s. While one might expect that an SSI kernel on which the SSI
> initialization calls were NOT done could function as a std base kernel,
> that was not a priority and probably never tested. This of course will
> be a focus in the "hooks" version in which the SSI code will be modules
> that can be optionally loaded, functioning as a std. base if the modules
> are not loaded.
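The message above mentions hand-creating the missing devices without saying which ones. A minimal sketch of one way to find candidates, assuming a POSIX sh: it only prints the mknod commands for nodes that are absent, so nothing is created until the output is reviewed. The list of names is an assumption about which nodes matter (the thread only says "tty, pty etc."); the major/minor numbers are the standard Linux assignments for those names. The function name `suggest_mknod` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: for each listed character device that does not
# exist under the given directory, print the mknod command that would
# create it. Entries are "name major minor".
suggest_mknod() {
    devdir=$1
    while read -r name major minor; do
        [ -n "$name" ] || continue
        if [ ! -e "$devdir/$name" ]; then
            echo "mknod $devdir/$name c $major $minor"
        fi
    done <<EOF
null 1 3
zero 1 5
tty 5 0
console 5 1
ptmx 5 2
EOF
}
```

For example, `suggest_mknod /dev` lists only the nodes that still need attention; the printed commands can then be run by hand as root.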
From: Bharata B R. <bha...@hp...> - 2004-11-04 12:44:23
Missed the attachment, sorry; including the linuxrc inline here:

#!/bin/bash

mount -t proc /proc /proc
setquiet
echo Mounted /proc filesystem
echo Mounting sysfs
mount -t sysfs none /sys
echo "Loading scsi_mod.o module"
insmod /lib/scsi_mod.o
echo "Loading sd_mod.o module"
insmod /lib/sd_mod.o
echo "Loading cciss.o module"
insmod /lib/cciss.o
echo "Loading mii.o module"
insmod /lib/mii.o
echo "Loading 8390.o module"
insmod /lib/8390.o
echo "Loading tg3.o module"
insmod /lib/tg3.o
echo Gathering cluster info
nicfound=""
for iface in `LC_ALL='C' ifconfig -a | grep HWaddr | sed 's/\(eth[0-9]*\).*HWaddr \(.*\)/\1-\2/'`
do
    ifdev=`echo $iface | cut -f 1 -d -`
    ifaddr=`echo $iface | cut -f 2 -d -`
    rec=`tail +2 /etc/boottab | grep $ifaddr`
    if [ $? -eq 0 ]; then
        ifaddr=`echo $rec | cut -f 3 -d / | cut -f 1 -d:`
        ifmask=`echo $rec | cut -f 3 -d / | cut -f 2 -d:`
        echo "Configuring $ifdev: $ifaddr/$ifmask"
        ifconfig $ifdev $ifaddr netmask $ifmask
        nodenum=`echo $rec | cut -f 1 -d /`
        if [ $nodenum -gt 0 ]
        then
            # set variables used for cluster IP address
            dev=$ifdev
            node=$nodenum
            nicfound=1
        fi
    fi
done
if [ -z "$nicfound" ]; then
    echo "ERROR: Could not find the NIC used to add this node to the cluster."
    echo "Unable to continue. Halting."
    halt -L -f
    exit 1 # NOT REACHED
fi
master=`head -1 /etc/boottab`
cat >/etc/cluster.conf <<EOF-RC
INTERFACES=$dev
CLUSTER_MASTER=$master
CLUSTER_NODENUM=$node
EOF-RC
echo Configuring cluster
cluster_config --prep
echo Running pre-root cluster initialization
cluster_config --preroot
echo Mounting root in linuxrc
mount_remote_root /sysroot
if [ $? -ne "0" ]
then
    echo Creating root device
    echo mkrootdev /dev/root | nash --quiet --force
    mount -o defaults --ro -t ext3 /dev/root /sysroot
    if [ $? -ne "0" ]
    then
        echo "ERROR: Mounting root file system failed."
        echo "Unable to continue. Halting."
        halt -L -f
        exit 1 # NOT REACHED
    fi
fi
echo Unmounting /proc
doumount /proc
echo Attempting pivot_root
bash
cd /sysroot
pivot_root /sysroot /sysroot/initrd
echo Running post-root cluster initialization
#/sbin/cluster_config --postroot
export LD_LIBRARY_PATH=/lib:/usr/lib:/usr/local/lib:/initrd/lib
/initrd/bin/cluster_config --postroot
echo Starting init
exec /initrd/bin/chroot . /initrd/bin/cluster_config --initproc </dev/console >/dev/console 2>&1

On Thu, 2004-11-04 at 17:55, Bharata B Rao wrote:
> Hello,
>
> OpenSSI kernel with corresponding ramdisk runs almost perfectly on SuSE
> 9.0 system.
>
> While the kernel had to be force installed, the linuxrc in the ramdisk
> needed some minor changes (modified linuxrc attached)
>
> The system boots completely to runlevel 3 with a few issues:
>
> - Lots of devices(like tty, pty etc) in /dev not set up properly. This
> mainly results in the failure of correct network configuration and
> terminal get. By hand-creating the missing devices, system does comeup
> fully into runlevel 3.
> - Some other minor issues all seem related to missing devices in /dev.
>
> Next steps to be tried:
>
> - Resolve /dev problems.
> - Try to boot 2nd node by installing openssi-tools and cluster-tools
> rpm.
>
> Will keep the list updated on the progress...
>
> Regards,
> Bharata.
>
>
> On Tue, 2004-10-26 at 20:26, Walker, Bruce J wrote:
> > Bharata,
> >
> > Great to see you have gotten that far. Several observations:
> > A. you commented out the cluster initialization in your linuxrc
> > (pre-root and post-root) so lots of data structures didn't get
> > initialized and I am actually surprised you got as far as you did. I
> > would suggest putting those back in so at least the kernel thinks it is
> > a single node cluster, with all the SSI kernel stuff set up.
> > B. according the strace, init is being run as process 2, which is not
> > good. This might get fixed when you do step A so I wouldn't worry about
> > it yet.
> > C. according to the trace, /dev/initctl is a fifo, which it probably is;
> > however, the system seems to think it is being serviced on another node
> > (the reference to ICS), which is not good; again, not having the
> > cluster initialization might be the cause of this; also might be the
> > case that devfs and devfsd are not completely set up.
> >
> > Bruce
> >
> > P.s. While one might expect that an SSI kernel on which the SSI
> > initialization calls were NOT done could function as a std base kernel,
> > that was not a priority and probably never tested. This of course will
> > be a focus in the "hooks" version in which the SSI code will be modules
> > that can be optionally loaded, functioning as a std. base if the modules
> > are not loaded.
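The linuxrc in this message picks apart /etc/boottab records with cut, but the thread never shows the file itself. A minimal sketch of that field extraction in isolation, assuming a POSIX sh; the record layout used here (node number, then MAC address, then address:netmask, separated by "/") is inferred from the cut calls and is an assumption, as are the sample values and the function name `parse_boottab_record`:

```shell
#!/bin/sh
# Hypothetical helper mirroring the cut calls in the linuxrc.
# Assumed record layout (inferred, not confirmed by the thread):
#   nodenum/MAC/address:netmask
parse_boottab_record() {
    rec=$1
    nodenum=$(echo "$rec" | cut -f 1 -d /)                  # field 1: node number
    addr=$(echo "$rec" | cut -f 3 -d / | cut -f 1 -d :)     # field 3, before ':'
    mask=$(echo "$rec" | cut -f 3 -d / | cut -f 2 -d :)     # field 3, after ':'
    echo "node=$nodenum addr=$addr mask=$mask"
}
```

With a made-up record, `parse_boottab_record "1/00:0B:CD:82:76:5E/192.168.0.1:255.255.255.0"` prints `node=1 addr=192.168.0.1 mask=255.255.255.0`. Splitting on "/" first matters because the MAC address itself contains colons.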
From: Brian J. W. <Bri...@hp...> - 2004-11-04 21:33:45
Bharata B Rao wrote:
> Hello,
>
> OpenSSI kernel with corresponding ramdisk runs almost perfectly on SuSE
> 9.0 system.
>
> While the kernel had to be force installed, the linuxrc in the ramdisk
> needed some minor changes (modified linuxrc attached)
>
> The system boots completely to runlevel 3 with a few issues:
>
> - Lots of devices(like tty, pty etc) in /dev not set up properly. This
> mainly results in the failure of correct network configuration and
> terminal get. By hand-creating the missing devices, system does comeup
> fully into runlevel 3.
> - Some other minor issues all seem related to missing devices in /dev.

John,

Do you think these issues could be fixed by installing our enhanced
devfs? My understanding is that the OpenSSI kernel mounts devfs on /dev,
but devfsd is required to create the full complement of devices that the
system needs.

Brian
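Brian's question rests on two checkable facts: whether devfs is what is mounted on /dev, and whether devfsd is running. A minimal sketch of the first check, assuming a POSIX sh with awk and the usual /proc/mounts field order (device, mount point, filesystem type, ...); the function name `fstype_of` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the filesystem type mounted on a mount
# point, reading /proc/mounts (or another file in the same format,
# given as the optional second argument). If the mount point appears
# more than once, the last entry wins.
fstype_of() {
    mnt=$1
    mounts=${2:-/proc/mounts}
    awk -v m="$mnt" '$2 == m { t = $3 } END { if (t != "") print t }' "$mounts"
}
```

On the SuSE test system, `fstype_of /dev` printing `devfs` would confirm the first half of Brian's description; whether devfsd is running can be checked separately, for example with `pidof devfsd`.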
From: John B. <joh...@hp...> - 2004-11-04 23:00:55
Brian J. Watson wrote:
> Bharata B Rao wrote:
>
>> Hello,
>>
>> OpenSSI kernel with corresponding ramdisk runs almost perfectly on SuSE
>> 9.0 system.
>>
>> While the kernel had to be force installed, the linuxrc in the ramdisk
>> needed some minor changes (modified linuxrc attached)
>>
>> The system boots completely to runlevel 3 with a few issues:
>>
>> - Lots of devices(like tty, pty etc) in /dev not set up properly. This
>> mainly results in the failure of correct network configuration and
>> terminal get. By hand-creating the missing devices, system does comeup
>> fully into runlevel 3.
>> - Some other minor issues all seem related to missing devices in /dev.
>
> John,
>
> Do you think these issues could be fixed by installing our enhanced
> devfs? My understanding is that the OpenSSI kernel mounts devfs on /dev,
> but devfsd is required to create the full complement of devices that the
> system needs.
>
> Brian

It will probably make things better, but it might require some tweaks to
work with SuSE.

John