From: Daniel R. <cos...@gm...> - 2006-04-08 19:45:46

> I tested it on MP bochs and the first AP does not run into a triple fault
> just after booting anymore - I just found out that this happens because
> it is not even booted anymore... Instead, a page fault occurs on the BSP
> at 0xe0001a34, which is somewhere in mp_detect::DetectFloatingPointer. This
> is also reported correctly by the kernel. The same thing happens on UP
> bochs.
Since my last update on Friday, DetectFloatingPointer() no longer allocates
a big block of memory to search for the structure; instead it uses a
single page that is assigned a new physical address on each loop
iteration. Although I looked at the function very thoroughly, I
couldn't find the problem, so I now assume that it must be a bug in
either the heap_manager or the mmu class.
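For readers unfamiliar with the single-page approach, here is a minimal hosted sketch of scanning for the MP floating pointer structure through one window page that is remapped on every iteration. The names WindowPage and ScanForFloatingPointer are hypothetical, physical memory is simulated with a byte vector, and the checks ("_MP_" signature, length field of one 16-byte unit, zero byte checksum, 16-byte alignment) follow the Intel MultiProcessor Specification rather than the kernel's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::uint32_t kPageSize = 4096;

// Hypothetical stand-in for the kernel's single virtual window page. In the
// kernel, Map() would rewrite the page-table entry (and flush the matching
// TLB entry) so the same virtual address points at a new physical page;
// here "physical memory" is simulated by a byte vector.
class WindowPage {
public:
    explicit WindowPage(const std::vector<std::uint8_t>& phys) : phys_(phys) {}
    void Map(std::uint32_t phys_addr) {
        assert(phys_addr % kPageSize == 0 && phys_addr + kPageSize <= phys_.size());
        base_ = phys_.data() + phys_addr;
    }
    const std::uint8_t* Base() const { return base_; }
private:
    const std::vector<std::uint8_t>& phys_;
    const std::uint8_t* base_ = nullptr;
};

// True if the 16 bytes at p look like an MP floating pointer structure:
// "_MP_" signature, length field (offset 8) of one 16-byte unit, and all
// 16 bytes summing to zero modulo 256.
bool IsFloatingPointer(const std::uint8_t* p) {
    if (std::memcmp(p, "_MP_", 4) != 0 || p[8] != 1) return false;
    std::uint8_t sum = 0;
    for (int i = 0; i < 16; ++i) sum = static_cast<std::uint8_t>(sum + p[i]);
    return sum == 0;
}

// Scans `pages` pages starting at physical address pbase through one window,
// remapping it on every iteration. Returns the structure's physical address,
// or 0 if it was not found.
std::uint32_t ScanForFloatingPointer(WindowPage& window, std::uint32_t pbase,
                                     std::uint32_t pages) {
    for (std::uint32_t i = 0; i < pages; ++i) {
        const std::uint32_t phys = pbase + i * kPageSize;
        window.Map(phys);  // same virtual window, new physical page
        for (std::uint32_t off = 0; off < kPageSize; off += 16) {
            if (IsFloatingPointer(window.Base() + off)) return phys + off;
        }
    }
    return 0;
}
```

Because the structure is 16 bytes long and 16-byte aligned, it can never straddle a page boundary, which is what makes a one-page window sufficient.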
What you might want to try is to add a cout statement at the beginning of
the main loop (line 137) that prints the current virtual address
(virt_range.GetVirtualBase()) as well as the physical address
(pbase + i*4096). While the physical address should increase in 4 KB
steps, the virtual address must always be 0xE003F000; if it isn't,
there's a problem in the heap_manager.
Also check on which iteration of the loop the page fault occurs. If the
first iteration works but the second one crashes, something must have
gone wrong when the page was assigned its new physical address.
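Once the cout output has been collected, the check described in the two paragraphs above can be done mechanically. This hypothetical helper assumes the printed values have been copied into a list of (virtual, physical) pairs and reports the first iteration that deviates from the expected pattern; it is an analysis aid, not kernel code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One line of the suggested debug output: the window's virtual address
// (virt_range.GetVirtualBase()) and the physical address (pbase + i*4096)
// it was mapped to on that loop iteration.
struct TraceEntry {
    std::uint32_t virt;
    std::uint32_t phys;
};

// Returns the index of the first iteration that breaks the expected pattern,
// or -1 if the trace is consistent. Expected: the virtual address never
// changes, and the physical address grows by 4096 per iteration.
//   result 0  -> the virtual address is already wrong on the first iteration
//   result >0 -> something changed between two iterations
int FirstBadIteration(const std::vector<TraceEntry>& trace,
                      std::uint32_t expected_virt) {
    for (std::size_t i = 0; i < trace.size(); ++i) {
        if (trace[i].virt != expected_virt) return static_cast<int>(i);
        if (i > 0 && trace[i].phys != trace[i - 1].phys + 4096u)
            return static_cast<int>(i);
    }
    return -1;
}
```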
I just had a look at the old kernel's page class, and the only real
difference I could find is that it invalidates the TLB entry before the
new physical address gets written to the page table, while I do it the
other way round. On real hardware this doesn't make any difference, but
if your bochs loads the TLB entry right away (rather than only when the
page is accessed), it might simply miss the new mapping. You should
therefore try reversing the order of these two instructions
(swap /hal/mmu.cpp line 103 with line 106).
> Even more interesting is the result on AMD SimNow!: The UP version boots
> correctly and reports that there is only one cpu present. On the MP
> version, our kernel correctly finds 2 cpus and reports that all AP's
> have been booted - but the AP does not print its "Hello world" message.
> (There was a small bug in mp_detect::BootAllProcessors - it always
> returned true.) In fact, the AP is still not booted.
Could it be that you're using a slightly outdated version of the code?
The latest version uses an integer return value that either holds the
number of processors booted, or zero if there were some problems during
startup.
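A minimal sketch of the return convention described above, assuming a hypothetical boot_ap callback that performs the INIT/SIPI sequence for one AP and reports whether it came up. Whether the kernel's count includes the BSP is not stated; this sketch counts APs only, so a machine without APs also yields 0.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical sketch: returns the number of APs booted, or 0 if anything
// went wrong during startup. boot_ap stands in for the real INIT/SIPI
// sequence for one AP and returns true once that AP has signalled it is up.
int BootAllProcessorsSketch(const std::vector<unsigned>& ap_ids,
                            const std::function<bool(unsigned)>& boot_ap) {
    int booted = 0;
    for (unsigned id : ap_ids) {
        if (!boot_ap(id)) return 0;  // any failure -> report 0
        ++booted;
    }
    return booted;
}
```

Unlike a bool that is always true, a count lets the caller compare the result against the number of CPUs found in the MP tables.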
> IMHO the usage of bit fields with "empty bits" is quite dangerous because
> the values of these empty bits are undefined and may cause conflicts.
> However, I just checked the value of the ICR and it is absolutely
> correct - but the AP still does not respond. I'll keep on trying...
I would argue that it's no worse than with traditional flags. If you
want to make sure that the unused bits are all set to zero, all you have
to do is initialize the bitfield (lapic_icr icr = {0}). As for forward
compatibility, however, this is hardly any better, as some of the
reserved bits might actually have to be set in the future. To be on
the safe side, one would therefore have to read the register first and
only modify those bits that really have to be altered. This should
also be possible using a bitfield:
lapic_icr icr = GetICR();
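A sketch of that read-modify-write approach, assuming a hypothetical icr_low struct laid out after the Intel SDM description of the low ICR dword (this is not the kernel's lapic_icr definition). Bit-field ordering is implementation-defined; the layout relies on GCC/Clang on x86 allocating fields from the least significant bit upwards. ToRaw/FromRaw stand in for writing and reading the register.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical bitfield for the low 32 bits of the local APIC ICR
// (layout per the Intel SDM, LSB-first allocation as done by GCC on x86).
struct icr_low {
    std::uint32_t vector          : 8;   // bits 0-7
    std::uint32_t delivery_mode   : 3;   // bits 8-10 (110b = startup IPI)
    std::uint32_t dest_mode       : 1;   // bit 11 (0 = physical)
    std::uint32_t delivery_status : 1;   // bit 12 (read-only)
    std::uint32_t reserved0       : 1;   // bit 13
    std::uint32_t level           : 1;   // bit 14
    std::uint32_t trigger_mode    : 1;   // bit 15
    std::uint32_t reserved1       : 2;   // bits 16-17
    std::uint32_t dest_shorthand  : 2;   // bits 18-19
    std::uint32_t reserved2       : 12;  // bits 20-31
};
static_assert(sizeof(icr_low) == 4, "ICR low dword must be 32 bits");

std::uint32_t ToRaw(const icr_low& icr) {
    std::uint32_t v;
    std::memcpy(&v, &icr, sizeof v);
    return v;
}

icr_low FromRaw(std::uint32_t v) {
    icr_low icr;
    std::memcpy(&icr, &v, sizeof icr);
    return icr;
}

// Start from the current register value so reserved bits keep whatever the
// hardware put there, then change only vector, delivery_mode and
// dest_shorthand.
std::uint32_t PrepareStartupIpi(std::uint32_t current, std::uint8_t vector) {
    icr_low icr = FromRaw(current);  // analogous to: lapic_icr icr = GetICR();
    icr.vector = vector;
    icr.delivery_mode = 6;   // startup IPI
    icr.dest_shorthand = 0;  // no shorthand: use the destination in ICR1
    return ToRaw(icr);
}
```

With a current register value of 0, PrepareStartupIpi(0, 0x08) yields 0x608, the same 0x0608 used in the manual ICR update.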
You might nevertheless be right that the unused bits are part of the
problem, as AMD's 64-bit processors may use an updated APIC version
that requires new flags. See what happens if you only modify those bits
of the ICR that are really necessary (use lapic_icr icr = GetICR(), then
only set vector, delivery_mode and dest_shorthand). Also check whether
you can get the code to work if you update the ICR's value manually:
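WriteRegister(lapic_reg(ICR1), target << 24);  // destination APIC ID in bits 24-31
// SetICR(flags);
// keep status/reserved bits; clear vector, delivery mode, dest mode and shorthand,
// then set vector 0x08 with delivery mode 110b (startup IPI)
uint value = (ReadRegister(lapic_reg(ICR0)) & 0xFFF3F000) | 0x0608;
WriteRegister(lapic_reg(ICR0), value);  // writing ICR0 sends the IPI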
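To make the mask in the manual version easier to check, here is a hosted sketch of the same arithmetic with the affected fields spelled out (field positions per the Intel SDM; the constant and function names are hypothetical). A startup IPI with vector 0x08 makes the AP begin executing at physical address 0x8000.

```cpp
#include <cassert>
#include <cstdint>

// Bits cleared by the mask 0xFFF3F000 in the ICR low dword:
//   bits 0-7   vector
//   bits 8-10  delivery mode
//   bit  11    destination mode
//   bits 18-19 destination shorthand
// Delivery status, level, trigger mode and reserved bits keep their values.
constexpr std::uint32_t kIcrKeepMask = 0xFFF3F000u;

// Vector 0x08 with delivery mode 110b (startup IPI).
constexpr std::uint32_t kStartupIpi = 0x08u | (6u << 8);

constexpr std::uint32_t ManualIcr0(std::uint32_t current) {
    return (current & kIcrKeepMask) | kStartupIpi;
}

static_assert(kStartupIpi == 0x0608u, "vector 0x08, startup delivery mode");
static_assert(ManualIcr0(0xFFFFFFFFu) == 0xFFF3F608u,
              "only the fields being set are cleared");
```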
To rule out even the faintest possibility that gcc for some reason
changes the order of memory accesses to the ICRs, you might try
declaring the local APIC's base address (/hal/apic_local line 152) as
volatile.
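A minimal sketch of volatile register access, assuming hypothetical names (LocalApicRegs, SendStartupIpi) and the Intel SDM offsets 0x300/0x310 for the low and high ICR dwords. It is the pointed-to data that has to be volatile (volatile uint32_t*), not just the pointer variable. In a hosted test the "MMIO" region is simply an array.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register accessors. Because base_ points to volatile data,
// the compiler must perform every read and write exactly as written and may
// not reorder them relative to other volatile accesses.
class LocalApicRegs {
public:
    explicit LocalApicRegs(volatile std::uint32_t* base) : base_(base) {}
    // Registers are 32 bits wide at 16-byte aligned offsets, so byte
    // offset `reg` maps to index reg / 4.
    void Write(std::uint32_t reg, std::uint32_t value) { base_[reg / 4] = value; }
    std::uint32_t Read(std::uint32_t reg) const { return base_[reg / 4]; }
private:
    volatile std::uint32_t* base_;
};

constexpr std::uint32_t kIcrLow = 0x300;   // ICR bits 0-31
constexpr std::uint32_t kIcrHigh = 0x310;  // ICR bits 32-63 (destination)

// Destination first, then the low dword, whose write sends the IPI.
void SendStartupIpi(LocalApicRegs& apic, std::uint8_t apic_id) {
    apic.Write(kIcrHigh, static_cast<std::uint32_t>(apic_id) << 24);
    const std::uint32_t value = (apic.Read(kIcrLow) & 0xFFF3F000u) | 0x0608u;
    apic.Write(kIcrLow, value);
}
```

volatile only constrains the compiler; it does not emit any CPU memory barrier.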
cheers,
Daniel