|
From: Victor A. <vic...@ly...> - 2006-04-06 17:39:54
|
Whether or not you consider it stable, it would be nice to put it up for download, so the file can be downloaded without CVS. I think that the new kernel is good enough to be the next "target". |
|
From: Manuel H. <man...@on...> - 2006-04-06 17:47:46
|
I just downloaded the sources from CVS, compiled them and tested the kernel with bochs; it works fine. I think this is a good check and indicates that the sources are ready for making a tarball. |
|
From: Daniel R. <cos...@gm...> - 2006-04-07 19:47:54
|
I'm currently working on some I/O APIC code that is meant to enable IRQ handling. With this new code the tarball should be much more impressive, as we could easily add some basic keyboard support or a small clock for demonstration purposes. Since it doesn't look as if there are any bigger problems ahead, I expect to finish coding this weekend. Just think of it as a deadline: either I manage to get the new code working by Monday or we'll release the tarball without it.
> Manuel wrote:
> I just downloaded the sources from CVS, compiled them and tested the
> kernel with bochs, it works fine.
Ehm.. does this mean that the multiprocessor code works now, or did you just run bochs as a uniprocessor?
regards,
cosmo86 |
|
From: Manuel H. <mho...@ph...> - 2006-04-08 11:28:55
|
> > Manuel wrote:
> > I just downloaded the sources from CVS, compiled them and tested the
> > kernel with bochs, it works fine.
>
> Ehm.. does this mean that the multiprocessor code works now, or did you
> just run bochs as a uniprocessor.
I tested it on MP bochs and the first AP no longer runs into a triple fault just after booting. I just found out that this is because it is no longer booted at all. Instead, a page fault occurs on the BSP at 0xe0001a34, which is somewhere in mp_detect::DetectFloatingPointer. The kernel also reports this correctly. The same thing happens on UP bochs.
Even more interesting is the result on AMD SimNow!: the UP version boots correctly and reports that there is only one CPU present. On the MP version, our kernel correctly finds 2 CPUs and reports that all APs have been booted, but the AP does not print its "Hello world" message. (There was a small bug in mp_detect::BootAllProcessors: it always returned true.) In fact, the AP is still not booted.
IMHO the usage of bit fields with "empty bits" is quite dangerous because the values of these empty bits are undefined and may cause conflicts. However, I just checked the value of the ICR and it is absolutely correct, but the AP still does not respond. I'll keep on trying...
Regards,
Manuel |
|
From: Daniel R. <cos...@gm...> - 2006-04-08 19:45:46
|
> I tested it on MP bochs and the first AP does not run into a triple fault
> just after booting anymore - I just found out that this happens because
> it is not even booted anymore... Instead, a page fault occurs on the BSP
> at
> 0xe0001a34, which is somewhere in mp_detect::DetectFloatingPointer. This
> is also reported correctly by the kernel. The same thing happens on UP
> bochs.
Since my last update on Friday, DetectFloatingPointer() no longer allocates
a big block of memory to search for the structure, but rather uses a
single page, that is assigned a new physical address on each loop
iteration. Although I had a really thorough look at the function I
couldn't find the problem, and thus now assume that it must be some bug in
either the heap_manager or the mmu class.
What you might want to try is to add a cout statement at the beginning of
the main loop (line 137) that prints the current virtual
(virt_range.GetVirtualBase()) as well as the physical (pbase + i*4096)
address. While the physical address should increase in 4kB steps, the
virtual address must always be 0xE003F000; if it isn't, there's some
problem in the heap_manager.
Also check on which run of the loop the page fault occurs. If it does
work the first time but crashes on the second run, something must have
gone wrong when the page was assigned its new physical address.
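For reference, the per-page search that such a loop performs can be sketched as follows. This is only an illustration, not the code in CVS: FindMpFloatingPointer is a hypothetical helper, and it assumes the page has already been mapped and is passed in as a plain byte buffer. It relies on what the MP specification says about the floating pointer structure: it starts on a 16-byte boundary with the signature "_MP_", byte 8 holds its length in 16-byte units, and all of its bytes add up to zero.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper, not the kernel's DetectFloatingPointer().
// Scans one already-mapped page for the MP floating pointer structure.
// Returns the byte offset of the structure inside the page, or -1.
inline long FindMpFloatingPointer(const std::uint8_t* page, std::size_t size)
{
    // The structure is 16-byte aligned and begins with "_MP_".
    for (std::size_t off = 0; off + 16 <= size; off += 16) {
        if (std::memcmp(page + off, "_MP_", 4) != 0)
            continue;

        // Byte 8 holds the structure length in 16-byte units.
        std::size_t len = static_cast<std::size_t>(page[off + 8]) * 16;
        if (len == 0 || off + len > size)
            continue;

        // All bytes of the structure must sum to zero (modulo 256).
        std::uint8_t sum = 0;
        for (std::size_t i = 0; i < len; ++i)
            sum = static_cast<std::uint8_t>(sum + page[off + i]);
        if (sum == 0)
            return static_cast<long>(off);
    }
    return -1;
}
```

Running the search on a buffer rather than on a fixed virtual address keeps it independent of how the page gets remapped, so a fault in the loop can be attributed to the mapping and not to the search.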
I just had a look at the old kernel's page class, and the only real
difference I could find is that it invalidates the TLB entry before the
new physical address gets written to it, while I do it the other way
round. On real hardware this doesn't make any difference, but if your
bochs loads the TLB right away (and not only once the page gets accessed),
it might simply miss the new mapping. You should therefore try to reverse
the order of these two instructions (/hal/mmu.cpp line 103 <-> 106).
> Even more interesting is the result on AMD SimNow!: The UP version boots
> correctly and reports that there is only one cpu present. On the MP
> version, our kernel correctly finds 2 cpus and reports that all AP's
> have been
> booted - but the AP does not print its "Hello world" message. (There was
> a small bug in mp_detect::BootAllProcessors - it always returned true.)
> In
> fact, the AP is still not booted.
Could it be that you're using a slightly outdated version of the code?
The latest version uses an integer return value that either holds the
number of processors booted, or zero if there were some problems during
startup.
> IMHO the usage of bit fields with "empty bits" is quite dangerous because
> the values of these empty bits are undefined and may cause conflicts.
> However, I just checked the value of the ICR and it is absolutely
> correct - but the AP still does not respond. I'll keep on trying...
I would argue that it's not any worse than with traditional flags. If you
want to make sure that the unused bits are all set to zero, all you have
to do is initialize the bitfield (lapic_icr icr = {0}). Regarding upwards
compatibility this is however hardly any better, as it might as well be that
some of the reserved bits actually have to be set in the future. To be on
the safe side one would therefore have to read the register out first and
only mask those bits that really have to be altered. This however should
also be possible using a bitfield:
lapic_icr icr = GetICR();
You might nevertheless be right that the unused bits are part of the
problem, as AMD's 64-bit processors may use an updated APIC version
that requires new flags. Try what happens if you only mask those bits of
the ICR that are really necessary (use lapic_icr icr = GetICR(), then only
set vector, delivery_mode & dest_shorthand). Also check if you can get
the code to work if you update the ICR's value manually:
46: WriteRegister(lapic_reg(ICR1), target << 24);
47: // SetICR(flags);
48: uint value = (ReadRegister(lapic_reg(ICR0)) & 0xFFF3F000) | 0x0608;
49: WriteRegister(lapic_reg(ICR0), value);
To exclude the faintest possibility that gcc for some reason changes
memory ordering when the ICRs are accessed, you might try to declare the
local APIC's base address (/hal/apic_local line 152) as volatile.
cheers,
Daniel
|
|
From: Manuel H. <mho...@ph...> - 2006-04-09 09:19:21
|
You are right, I updated the code on Thursday and so I missed your updates from Friday. Now the code works perfectly fine on both UP and MP bochs; on the latter it also boots up the APs and they print their messages. It also works on UP SimNow!.
Unfortunately the kernel hangs on MP SimNow!... For some strange reason the AP is not booted. Even worse: the timeout counter does not stop at 0, but instead overflows and continues below 0! Unless the kernel checks the counter in the very short moment when it passes 0, it will wait forever... This seems to be a bug in SimNow!; I'll try to figure that out.
However, it should be possible to boot the AP; at least my kernel does. I had a look at the MP specification and it says that there should always be an INIT IPI at the beginning. I've tried that, too, without any effect...
Regards,
Manuel |
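One way to make such a wait loop robust against a counter that runs past zero is to treat "the value went up since the last read" as expiry too, instead of waiting for an exact 0. The following is only a sketch under assumptions: WaitForExpiry is a hypothetical helper, and read_count stands in for reading the timer's current-count register.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical helper: polls a down-counting timer until it expires.
// read_count stands in for reading the timer's current-count register.
// Returns true on expiry, false if max_polls reads passed without it.
inline bool WaitForExpiry(const std::function<std::uint32_t()>& read_count,
                          unsigned max_polls)
{
    std::uint32_t previous = read_count();
    if (previous == 0)
        return true;

    for (unsigned i = 0; i < max_polls; ++i) {
        std::uint32_t now = read_count();
        // Expired if the counter reached zero, or if it is larger than
        // the last reading, which means it wrapped (or was reloaded).
        if (now == 0 || now > previous)
            return true;
        previous = now;
    }
    return false;
}
```

The max_polls bound is there so that a timer that never counts at all cannot hang the caller either.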
|
From: Daniel R. <cos...@gm...> - 2006-04-09 14:26:50
|
> Unfortunately the kernel hangs on MP SimNow!... For some strange reason
> the AP is not booted. Even worse: The timeout counter does not stop at
> 0, but instead overflows and continues below 0! Unless the kernel checks
> the code in the very short moment when it passes 0, it will wait forever...
I hope you made sure that the local APIC's base address is set up correctly. Right now the kernel always uses the default constructor, which assumes that the register base is 0xFEE00000. You can print the actual base address from mp_detect.cpp in line 93. If it's anything other than the default, we've found the bug and all I'll have to do is change the initialization routines a bit.
Apart from that, try doing with the LVTT what I described for the ICR yesterday.
regards,
cosmo86 |
|
From: Daniel R. <cos...@gm...> - 2006-04-20 19:19:33
|
> I had a look at the MP specification and it says that there should
> always be an INIT IPI at the beginning. I've tried that, too - without
> any effect...
Hi Manuel,
I just looked it up and, according to the multiprocessor specification, you are right here: an INIT IPI really seems to be needed before the AP will even accept a STARTUP IPI. Unless we want to support 486 processors, it is however not necessary to set the CMOS warm reboot vector, as modern processors don't really reset on an INIT IPI. All they do is return to real mode before they enter a wait-for-SIPI state. Apart from that, the STARTUP IPI apparently has to be sent twice in a row (Intel Reference Manual, 7.5).
The code I just added to the CVS now sends an INIT-SIPI-SIPI sequence exactly as specified by the documentation. I also fixed a possible bug in SendStartupIPI() by explicitly setting the assert flag. Actually all IPIs, except for INIT de-assert, must have this flag set, although the Intel reference manual also states that modern CPUs should ignore it. If we're lucky, AMD is just a bit more picky than the rest.
regards,
cosmo86 |
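As an illustration of the sequence described above, here is a small sketch that only computes the register values and delays, without touching hardware. It is not the code in CVS: IpiStep and BuildApStartSequence are hypothetical names, and the delays (10 ms after INIT, 200 microseconds after each STARTUP IPI) follow the example algorithm in the MP specification.

```cpp
#include <cstdint>
#include <vector>

// One step of the sequence: values for the high and low ICR words and
// the delay to wait after writing them. Hypothetical type.
struct IpiStep {
    std::uint32_t icr_high;  // destination APIC ID in bits 24..31
    std::uint32_t icr_low;   // writing this word sends the IPI
    unsigned      delay_us;  // wait time after sending
};

constexpr std::uint32_t kDeliveryInit    = 5u << 8;   // 101b
constexpr std::uint32_t kDeliveryStartup = 6u << 8;   // 110b
constexpr std::uint32_t kLevelAssert     = 1u << 14;  // assert flag

// Builds INIT-SIPI-SIPI for one target AP. 'trampoline' is the physical
// address of the real-mode start code; it is assumed to be 4 KiB aligned
// and below 1 MiB, because the SIPI vector is its page number.
inline std::vector<IpiStep> BuildApStartSequence(std::uint8_t apic_id,
                                                 std::uint32_t trampoline)
{
    const std::uint32_t vector = (trampoline >> 12) & 0xFFu;
    const std::uint32_t high   = static_cast<std::uint32_t>(apic_id) << 24;
    const std::uint32_t sipi   = kDeliveryStartup | kLevelAssert | vector;

    return {
        {high, kDeliveryInit | kLevelAssert, 10000},  // INIT, wait 10 ms
        {high, sipi, 200},                            // first STARTUP IPI
        {high, sipi, 200},                            // second STARTUP IPI
    };
}
```

For APIC ID 1 and a trampoline at 0x8000 the STARTUP steps come out as 0x4608 in the low word, i.e. the 0x0608 used earlier in this thread plus the assert flag. When the steps are applied, the high word has to be written before the low word, since the write to the low word is what triggers the send.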
|
From: Manuel H. <mho...@ph...> - 2006-04-23 14:04:59
|
Hi Daniel,
using the INIT-STARTUP-STARTUP sequence finally works on AMD SimNow!. The AP is booted up and interrupts are enabled. If I press some keys, the scancodes are displayed.
I discovered that there is still a slight problem on uniprocessor bochs. The current IRQ code relies on the presence of an I/O APIC, which in general does not exist on a UP system. The code tries to read one of the registers, which is not mapped into virtual memory, causing a page fault.
Regards,
Manuel |
|
From: Daniel R. <cos...@gm...> - 2006-04-23 19:39:22
|
> Using the INIT-STARTUP-STARTUP sequence finally works and boots up AMD
> SimNow! The AP is booted up and interrupts are enabled. If I press some
> keys, the scancodes are displayed.
That's excellent news, although I still don't quite understand where the problem really was. Bochs, as well as my HT machine, starts booting right after the first SIPI is sent, so the second one is probably only meant to ensure that the BSP really waits until the AP has received the message. That's just a wild guess though; the APIC documentation doesn't explain at all what the sequence really does.
> I discovered that there is still a slight problem on uniprocessor bochs.
> The current IRQ code relies on the presence of an I/O APIC - which does
> in general not exist on a UP system. The code tries to read one of the
> registers, which is not mapped into virtual memory, causing a page fault.
Hmm, that shouldn't be too hard to fix.
During the weekend I played around with hyperthreading a bit and managed to start my second logical processor with merely some 10 lines of code; I had expected it to be far less trivial. Before the new feature can be added to the kernel, I'll however have to write some cpuid & msr instruction wrappers first. As I'll be quite busy during the next few days (school starts again - yuck), I don't expect to be able to write the code before Thursday.
Apart from that I also collected some more information about local APICs on uniprocessor systems. It seems that virtually all processors from the P54C (1994) onwards actually have a local APIC, which is however very often disabled by the BIOS. Luckily the Linux kernel includes a small hack that tries to activate the APIC by simply re-setting the very flag the BIOS cleared to disable it. According to the Intel manuals this hack is not guaranteed to work, but I guess that there can't be that many problems with it if it's in the official Linux source.
I've already tried to implement it for our kernel, but it doesn't work yet. After executing the hack the cpuid instruction does report a local APIC, but I can't access its registers. Most probably I'll just have to install an MTRR that marks the region as strong uncacheable to make it work.
regards,
cosmo86 |
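The bit manipulation behind such a re-enable step can be sketched as below. This is an assumption-laden illustration, not the Linux code and not what is in CVS: the function names are hypothetical, and the actual rdmsr/wrmsr access is left out so that only the value handling is shown. It uses the IA32_APIC_BASE MSR (0x1B), whose bit 11 is the global enable flag and whose bits 12 and up hold the physical base address of the local APIC registers.

```cpp
#include <cstdint>

constexpr std::uint32_t kMsrApicBase     = 0x1B;            // IA32_APIC_BASE
constexpr std::uint64_t kApicEnable      = 1ull << 11;      // global enable
constexpr std::uint64_t kApicBaseMask    = 0xFFFFFF000ull;  // bits 12..35
constexpr std::uint64_t kDefaultApicBase = 0xFEE00000ull;

// Physical base address of the local APIC registers encoded in the MSR.
inline std::uint64_t ApicBaseAddress(std::uint64_t msr)
{
    return msr & kApicBaseMask;
}

inline bool ApicEnabled(std::uint64_t msr)
{
    return (msr & kApicEnable) != 0;
}

// Returns the value to write back in order to switch the local APIC on.
// An already enabled APIC is left alone, including a non-default base;
// a disabled one is enabled at the default base address.
inline std::uint64_t ForceEnableApic(std::uint64_t msr)
{
    if (ApicEnabled(msr))
        return msr;
    return (msr & ~kApicBaseMask) | kDefaultApicBase | kApicEnable;
}
```

ApicBaseAddress() is also the value one would compare against the 0xFEE00000 that the default constructor assumes, as suggested earlier in the thread.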