Ran into a few issues with KDB.  The main one: the do_int3() prototype in the existing KDB patch is declared fastcall, so it looks for its arguments in CPU registers, but the revised int3 trap pushes the arguments onto the stack.  That was a tricky one to catch.
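
To make the mismatch concrete, here is a toy user-space model of it. This is not the KDB patch or the kernel's entry code, and every name in it is made up for illustration. It only assumes what is described above, plus the fact that on i386 a regparm(3)/fastcall callee takes its first arguments from eax, edx and ecx.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of the i386 state seen by a handler at trap entry. */
struct cpu_model {
    uint32_t eax, edx, ecx;  /* where a regparm(3)/fastcall callee looks */
    uint32_t stack[2];       /* where a stack-argument callee looks */
};

/* Hypothetical entry stub: pushes (regs pointer, error code) on the
 * stack and leaves whatever happened to be in the registers. */
static void entry_push_args(struct cpu_model *cpu,
                            uint32_t regs_ptr, uint32_t error_code)
{
    assert(cpu != NULL);
    cpu->eax = 0xdeadbeefu;  /* stale value, not an argument */
    cpu->edx = 0xdeadbeefu;
    cpu->ecx = 0xdeadbeefu;
    cpu->stack[0] = regs_ptr;
    cpu->stack[1] = error_code;
}

/* Handler built with a fastcall prototype: first argument comes from eax. */
static uint32_t handler_reads_registers(const struct cpu_model *cpu)
{
    return cpu->eax;
}

/* Handler built with a stack-argument prototype: first argument
 * comes from the first stack slot. */
static uint32_t handler_reads_stack(const struct cpu_model *cpu)
{
    return cpu->stack[0];
}
```

With the stub above, handler_reads_stack() sees the regs pointer that was pushed, while handler_reads_registers() picks up the stale register value. That is the kind of silent garbage-argument failure that makes this hard to spot.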

Now it looks like we've got a working CentOS 4 kernel for OpenSSI...

PXELINUX 2.11 2004-08-16  Copyright (C) 1994-2004 H. Peter Anvin
boot:
Loading kernel......................................
Loading initrd........................................
Ready.
Linux version 2.6.9-89.0.29.EL-ssi6_3.EL4.7 (root@node1) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-11)) #11 SMP Thu Oct 7 01:04:15 EDT 2010
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
 BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000007fee0000 (usable)
 BIOS-e820: 000000007fee0000 - 000000007fee3000 (ACPI NVS)
 BIOS-e820: 000000007fee3000 - 000000007fef0000 (ACPI data)
 BIOS-e820: 000000007fef0000 - 000000007ff00000 (reserved)
 BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
 BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
1150MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 000f5810
NX (Execute Disable) protection: active
DMI 2.3 present.
ACPI: PM-Timer IO Port: 0x4008
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 15:3 APIC version 16
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
Processor #1 15:3 APIC version 16
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
Enabling APIC mode:  Flat.  Using 0 I/O APICs
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, version 3, address 0xfec00000, GSI 0-23
ACPI: IOAPIC (id[0x03] address[0xfecc0000] gsi_base[24])
IOAPIC[1]: apic_id 3, version 3, address 0xfecc0000, GSI 24-47
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 80000000 (gap: 7ff00000:60100000)
Built 1 zonelists
Kernel command line: initrd=initrd ro BOOT_IMAGE=kernel
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 65536 bytes)
Detected 1801.249 MHz processor.
Using pmtmr for high-res timesource
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 2065708k/2096000k available (3332k kernel code, 29476k reserved, 1719k data, 200k init, 1178496k highmem)
Calibrating delay using timer specific routine.. 3603.40 BogoMIPS (lpj=1801700)
kdb version 4.4 by Keith Owens, Scott Lurndal. Copyright SGI, All Rights Reserved
kdb_cmd[0]: bpa panic_hook
Instruction(i) BP #0 at 0xc0124ca0 (panic_hook)
    is enabled globally adjust 1
kdb_cmd[1]: defcmd archkdb "" "First line arch debugging"
kdb_cmd[7]: defcmd archkdbcpu "" "archkdb with only tasks on cpus"
kdb_cmd[13]: defcmd archkdbshort "" "archkdb with less detailed backtrace"
kdb_cmd[19]: defcmd archkdbcommon "" "Common arch debugging"
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
VProc hash table entries: 65536 (order: 7, 524288 bytes)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 0(2) -> Core 0
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
CPU0: AMD Dual Core AMD Opteron(tm) Processor 165 stepping 02
per-CPU timeslice cutoff: 2925.15 usecs.
task migration cache decay timeout: 2 msecs.
Booting processor 1/1 eip 2000
Initializing CPU#1
Calibrating delay using timer specific routine.. 3599.60 BogoMIPS (lpj=1799801)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 1(2) -> Core 1
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#1.
CPU1: AMD Dual Core AMD Opteron(tm) Processor 165 stepping 02
Total of 2 processors activated (7203.00 BogoMIPS).
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 pin1=2 pin2=-1
checking TSC synchronization across 2 CPUs:
CPU#0 had 0 usecs TSC skew, fixed it up.
CPU#1 had 0 usecs TSC skew, fixed it up.
Brought up 2 CPUs
zapping low mappings.
checking if image is initramfs...it isn't (no cpio magic); looks like an initrd
Freeing initrd memory: 2425k freed
NET: Registered protocol family 16
PCI: PCI BIOS revision 3.00 entry at 0xf1e40, last bus=6
PCI: Using MMCONFIG
mtrr: v2.0 (20020519)
ACPI: Subsystem revision 20040816
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (00:00)
PCI: Probing PCI hardware (bus 00)
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 6 7 10 11 12) *5
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 6 *7 10 11 12)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 6 7 *10 11 12)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 6 7 10 11 12) *0, disabled.
ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 6 7 10 11 12) *0, disabled.
ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 6 7 10 11 12) *0, disabled.
ACPI: PCI Interrupt Link [LNK0] (IRQs 3 4 6 7 10 11 12) *0, disabled.
ACPI: PCI Interrupt Link [LNK1] (IRQs 3 4 6 7 10 *11 12)
ACPI: PCI Interrupt Link [ALKA] (IRQs *20)
ACPI: PCI Interrupt Link [ALKB] (IRQs *21)
ACPI: PCI Interrupt Link [ALKC] (IRQs *22), disabled.
ACPI: PCI Interrupt Link [ALKD] (IRQs *23), disabled.
SCSI subsystem initialized
PCI: Using ACPI for IRQ routing
ACPI: PCI Interrupt 0000:00:02.0[A] -> GSI 27 (level, low) -> IRQ 177
ACPI: PCI Interrupt 0000:00:03.0[A] -> GSI 31 (level, low) -> IRQ 185
ACPI: PCI Interrupt 0000:00:03.1[B] -> GSI 35 (level, low) -> IRQ 193
ACPI: PCI Interrupt 0000:00:03.2[C] -> GSI 39 (level, low) -> IRQ 201
ACPI: PCI Interrupt 0000:00:03.3[D] -> GSI 43 (level, low) -> IRQ 209
ACPI: PCI Interrupt 0000:00:0b.0[A] -> GSI 16 (level, low) -> IRQ 217
ACPI: PCI Interrupt 0000:00:0d.0[A] -> GSI 18 (level, low) -> IRQ 225
ACPI: PCI Interrupt Link [ALKA] enabled at IRQ 20
ACPI: PCI Interrupt 0000:00:0f.0[B] -> GSI 20 (level, low) -> IRQ 233
ACPI: PCI Interrupt 0000:00:0f.1[A] -> GSI 20 (level, low) -> IRQ 233
ACPI: PCI Interrupt Link [ALKB] enabled at IRQ 21
ACPI: PCI Interrupt 0000:00:10.0[A] -> GSI 21 (level, low) -> IRQ 50
ACPI: PCI Interrupt 0000:00:10.1[A] -> GSI 21 (level, low) -> IRQ 50
ACPI: PCI Interrupt 0000:00:10.2[B] -> GSI 21 (level, low) -> IRQ 50
ACPI: PCI Interrupt 0000:00:10.3[B] -> GSI 21 (level, low) -> IRQ 50
ACPI: PCI Interrupt 0000:00:10.4[C] -> GSI 21 (level, low) -> IRQ 50
ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 36 (level, low) -> IRQ 58
Machine check exception polling timer started.
highmem bounce pool size: 64 pages
Initializing Cryptographic API
CFS server token hash table entries: 262144 (order: 8, 1048576 bytes)
SSI Token Message hash table entries: 4096 (order: 3, 32768 bytes)
PCI: Via IRQ fixup for 0000:00:10.0, from 5 to 2
PCI: Via IRQ fixup for 0000:00:10.1, from 5 to 2
PCI: Via IRQ fixup for 0000:00:10.2, from 7 to 2
PCI: Via IRQ fixup for 0000:00:10.3, from 7 to 2
ACPI: Processor [CPU0] (supports C1)
ACPI: Processor [CPU1] (supports C1)
ACPI: Thermal Zone [THRM] (37 C)
Real Time Clock Driver v1.12
Hangcheck: starting hangcheck timer 0.9.0 (tick is 180 seconds, margin is 60 seconds).
Hangcheck: Using get_cycles().
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 8 ports, IRQ sharing disabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
RAMDISK driver initialized: 16 RAM disks of 16384K size 1024 blocksize
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: IDE controller at PCI slot 0000:00:0f.1
ACPI: PCI Interrupt 0000:00:0f.1[A] -> GSI 20 (level, low) -> IRQ 233
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
VP_IDE: VIA vt8237 (rev 00) IDE UDMA133 controller on pci0000:00:0f.1
    ide0: BM-DMA at 0xd000-0xd007, BIOS settings: hda:DMA, hdb:pio
    ide1: BM-DMA at 0xd008-0xd00f, BIOS settings: hdc:DMA, hdd:pio
hda: WDC WD2000JB-00EVA0, ATA DISK drive
Using cfq io scheduler
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hdc: HL-DT-STDVD-RAM GH22NP20, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
hda: max request size: 1024KiB
hda: Host Protected Area detected.
        current capacity is 390719855 sectors (200048 MB)
        native  capacity is 390721968 sectors (200049 MB)
hda: 390719855 sectors (200048 MB) w/8192KiB Cache, CHS=24321/255/63, UDMA(100)
 hda: hda1 hda2 hda3
hdc: ATAPI 48X DVD-ROM DVD-R-RAM CD-R/RW drive, 2048kB Cache, UDMA(66)
Uniform CD-ROM driver Revision: 3.20
ACPI: PCI Interrupt 0000:00:0f.0[B] -> GSI 20 (level, low) -> IRQ 233
sata_via 0000:00:0f.0: routed to hard irq line 11
ata1: SATA max UDMA/133 cmd 0xB800 ctl 0xBC02 bmdma 0xC800 irq 233
ata2: SATA max UDMA/133 cmd 0xC000 ctl 0xC402 bmdma 0xC808 irq 233
scsi0 : sata_via
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: ATA-7, max UDMA/133, 586112591 sectors: LBA48 NCQ (depth 0/32)
ata1.00: ata1: dev 0 multi count 16
ata1.00: configured for UDMA/133
scsi1 : sata_via
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata2.00: ATA-7, max UDMA/133, 976773168 sectors: LBA48 NCQ (depth 0/32)
ata2.00: ata2: dev 0 multi count 16
ata2.00: configured for UDMA/133
  Vendor: ATA       Model: Maxtor 6V300F0    Rev: VA11
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sda: xxx
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
  Vendor: ATA       Model: WDC WD5000ABYS-0  Rev: 12.0
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sdb: xxx
Attached scsi disk sdb at scsi1, channel 0, id 0, lun 0
mice: PS/2 mouse device common for all mice
u32 classifier
    OLD policer on
NET: Registered protocol family 2
IP route cache hash table entries: 65536 (order: 6, 262144 bytes)
TCP established hash table entries: 262144 (order: 10, 4194304 bytes)
TCP bind hash table entries: 262144 (order: 9, 3145728 bytes)
TCP: Hash tables configured (established 262144 bind 262144)
IPVS: Registered protocols (TCP, UDP)
IPVS: Connection hash table configured (size=4096, memory=32Kbytes)
IPVS: ipvs loaded.
NET: Registered protocol family 1
NET: Registered protocol family 17
RAMDISK: Compressed image found at block 0
VFS: Mounted root (ext2 filesystem).
Freeing unused kernel memory: 200k freed
Mounted /proc filesystem
Mounting sysfs
Creating /dev
Starting udev
Loading dm-mod.ko module
device-mapper: 4.5.5-ioctl (2006-12-01) initialised: dm-devel@redhat.com
Loading forcedeth.ko module
Loading r8169.ko module
via-rhine.c:v1.10-LK1.2.0-2.6 June-10-2004 Written by Donald Becker
pcnet32.c:v1.31 29.04.2005 tsbogend@alpha.franken.de
Loading sk98lin.ko module
Loading via-rhine.ko module
Loading pcnet32.ko module
Gathering cluster info
WARNING: Could not find a NIC with a static node configuration.
Dynamically allocating an IP address and node number.
Press [Enter] within 10 seconds to halt...

On Thu, Sep 30, 2010 at 11:41 PM, Roger Tsang <roger.tsang@gmail.com> wrote:
As of yesterday I finished fixing all OpenSSI related compiler warnings and errors.

There are some changes in the core that required rework in OpenSSI.  One of them is the ldt element in the mm_context_t structure.  It used to point to a contiguous memory region and is now an array of highmem page pointers.  This affects the data marshaling code in the architecture-dependent XDR interface for process migration.  I've only updated the i386 portion.
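
As a rough sketch of what that change means for marshaling (user-space C, not the OpenSSI code): with a contiguous region the LDT can be copied out in one go, whereas with an array of pages it has to be gathered page by page. The struct, constants and function name below are hypothetical stand-ins, and in the kernel each highmem page would also have to be mapped before it is read.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE     4096u
#define SKETCH_MAX_LDT_PAGES 16u

/* Hypothetical stand-in for the new layout: page pointers, not one region. */
struct ldt_paged {
    unsigned char *pages[SKETCH_MAX_LDT_PAGES];
    size_t size;  /* total LDT bytes in use */
};

/* Gather a paged LDT into one flat buffer so it can be marshaled as a
 * single run of bytes.  Returns the number of bytes copied; returns 0
 * for an empty LDT, a missing page, or a buffer that is too small. */
static size_t ldt_flatten(const struct ldt_paged *ldt,
                          unsigned char *out, size_t out_len)
{
    size_t copied = 0;
    size_t i;

    assert(ldt != NULL && out != NULL);
    if (ldt->size > out_len ||
        ldt->size > (size_t)SKETCH_MAX_LDT_PAGES * SKETCH_PAGE_SIZE)
        return 0;

    for (i = 0; copied < ldt->size; i++) {
        size_t chunk = ldt->size - copied;

        if (chunk > SKETCH_PAGE_SIZE)
            chunk = SKETCH_PAGE_SIZE;  /* at most one page per step */
        if (ldt->pages[i] == NULL)
            return 0;
        /* In the kernel, the page would be mapped here before copying. */
        memcpy(out + copied, ldt->pages[i], chunk);
        copied += chunk;
    }
    return copied;
}
```

The receiving side would do the reverse, scattering the flat buffer back into freshly allocated pages.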

We still need to add KDB, and I haven't tried the kernel yet.  When it works, we'll gain all the drivers and fixes that come with CentOS 4.8 and aren't in the vanilla 2.6.11 kernel.

-Roger


On Fri, Sep 24, 2010 at 12:52 AM, Roger Tsang <roger.tsang@gmail.com> wrote:
Hi,

Since we've seen users having trouble with hardware compatibility, I got a little curious tonight and started porting to kernel-2.6.9-89.0.29.EL for CentOS 4.  So far I'm 1/3 of the way through the patch rejects.  If the core hasn't changed much from 2.6.11, we might have something this weekend.  =)

-Roger