Re: [flickertcb-devel] Crash on "reload CR3 with the kernel's original value"
From: Alexander N. <ale...@gm...> - 2014-03-15 21:29:40
I'm very sorry for spamming you, but I wanted to add: when I switched from
the latest Flicker code in your git repository to version 0.7, I got a
kernel panic after a few minutes:

> Kernel panic - not syncing: attempted to kill init! exitcode=0x00000007
> drm_kms_helper: panic occured, switching back to text console

This was the only thing showing on a black screen. Also, go.sh was stuck
trying and failing to output anything from dmesg just before the kernel
panic occurred.

I don't know if this happened because I switched from one Flicker version
to another or if it was just random chance, but I thought you should know,
in case it means anything to you.

As always, I'm very grateful for any assistance you may send my way.

//Alexander Nilsson

2014-03-15 21:56 GMT+01:00 Alexander Nilsson <ale...@gm...>:

> Hi again!
>
> It appears that the OS now resumes on every try (I can see "top" updating
> in the background), but somehow it does not seem to register mouse
> movements or keyboard input.
>
> So I modified go.sh and added some output from dmesg right after the
> Flicker session has finished, and I got the following output (JPEG):
> https://drive.google.com/file/d/0B5DRLAjJvOKtSXh1LXd1Vm9tV2xFdlJ3RkpYdTJoR0lNLWg4/edit?usp=sharing
>
> I tried to pipe dmesg to a file, but that gives me "Input/Output error",
> which as you can see is the same message I get when the script tries to
> enable the other CPUs again.
>
> Oh, and after a while I get a message that my wifi connection has been
> lost.
>
> Can it be that interrupts are not enabled again after OS resume? Or is it
> something else?
>
> //Alexander Nilsson
>
>
> 2014-03-15 20:31 GMT+01:00 Alexander Nilsson <ale...@gm...>:
>
>> Okay, so this is interesting: I've installed a new non-PAE kernel, and now
>> when I try Flicker, 50% of the time I get a computer freeze (infinite) and
>> the other time I get a short freeze (~1 s) followed by the next echo in
>> go.sh ("Retrieving outputs from Flicker session:") and _then_ an infinite
>> freeze! But here's the really interesting part: the marker in my terminal
>> continues to blink after the first 1 s freeze, so the OS appears to be
>> resumed but it does not execute any new commands.
>>
>> I also tried to execute the PAL by modifying go.sh to
>>
>>> echo -n G > $SYSFSPATH/control ; echo 1 > /sys/devices/system/cpu/cpu2/online
>>
>> thinking that maybe the first CPU has locked somehow, but to no avail.
>>
>> I will continue investigating, and perhaps I'll be able to describe the
>> symptoms a bit more clearly then, but is this something you have
>> encountered before?
>>
>> //Alexander Nilsson
>>
>>
>> 2014-03-15 17:50 GMT+01:00 Justin King-Lacroix <jus...@cs...>:
>>
>>> On AMT: As far as I knew, AMT only worked over the built-in ethernet
>>> port; if the machine you're working on doesn't have one, I see your
>>> predicament. (I don't *think* AMT works over wireless... unless Intel has
>>> written that feature into recent-ish versions of the platform.)
>>>
>>> On the kernel: yeah, I'm pretty sure PAE kernels have 64-bit wide CR3
>>> and paging data structures, so that sounds like a potential source of
>>> problems. (It makes sense for the *buntu variants to all switch to PAE
>>> kernels; a few features are only available in that mode, most importantly
>>> the NX bit.)
>>>
>>> J
>>>
>>>
>>> On 15 March 2014 16:41, Alexander Nilsson <ale...@gm...> wrote:
>>>
>>>> Thank you for your quick reply.
>>>>
>>>> Yes, it does have AMT, but I have not yet managed to get a connection. I
>>>> think it has something to do with the fact that I only have a wifi
>>>> network card (plus a USB ethernet card). I haven't been able to find out
>>>> whether that is the cause or I'm just doing it wrong; I will continue my
>>>> investigation.
>>>>
>>>> Your reply got me thinking and I investigated further: PAE appears to
>>>> be enabled in my kernel, despite the fact that I specifically installed
>>>> xubuntu in order to skip that feature (apparently there was a policy
>>>> change that I was unaware of regarding kernel features in ubuntu versions
>>>> later than 12.10).
>>>>
>>>> I will change kernel and then I will let you know whether it's working
>>>> after that!
>>>>
>>>> Thanks!
>>>>
>>>> //Alexander Nilsson
>>>>
>>>>
>>>> 2014-03-15 16:08 GMT+01:00 Justin King-Lacroix <jus...@cs...>:
>>>>
>>>>> "... my machine does not have a serial port...": Does your machine
>>>>> support any remote management spec, like Intel AMT? If so, you can
>>>>> fiddle with the serial port code in Flicker to use that.
>>>>>
>>>>> "... my machine reboots on the following line...":
>>>>> (The following is a total guess, knowing nothing about your
>>>>> environment...)
>>>>> Are you using a 32-bit or 64-bit OS/kernel? The reason I ask:
>>>>> From memory, CR3 is the page directory pointer. That line looks like
>>>>> it restores 32 bits of CR3; if you're in 64-bit mode, that means the
>>>>> top 32 bits are zero, which will result in a triple-fault when
>>>>> attempting to do anything in kernel memory (like, say, execute). A
>>>>> triple-fault causes a CPU reset.
>>>>>
>>>>> </$0.02>
>>>>>
>>>>> J
>>>>>
>>>>>
>>>>> On 15 March 2014 13:55, Alexander Nilsson <ale...@gm...> wrote:
>>>>>
>>>>>> Hi once again!
>>>>>>
>>>>>> I'm sorry for spamming you but maybe you know how to investigate the
>>>>>> following problem?
>>>>>>
>>>>>> Whenever I start a PAL my machine reboots.
>>>>>> Since my machine does not have a serial port, the only tool I have
>>>>>> available for debugging is to put an infinite loop in the code to see
>>>>>> if the computer hangs or reboots.
>>>>>>
>>>>>> By using this approach I now know that my machine reboots on the
>>>>>> following line (in bold) in asm.S (about line 630 and forward):
>>>>>>
>>>>>>> /* reload CR3 with the kernel's original value */
>>>>>>> movl %esi, %ebx /* start addr of struct cpu_state */
>>>>>>> movl CPU_STATE_OFFSET_CR3(%ebx), %eax
>>>>>>> *movl %eax, %cr3* ; <----------- reboots here
>>>>>>> SayAsm // print something without touching stack or data
>>>>>>
>>>>>> Unfortunately I do not have the skills to figure out what it is
>>>>>> exactly that goes wrong, and I was hoping that someone here might know
>>>>>> how to debug this further?
>>>>>>
>>>>>> Thank you for any and all assistance!
>>>>>>
>>>>>> //Alexander Nilsson
>>>>>>
>>>>>> _______________________________________________
>>>>>> flickertcb-devel mailing list
>>>>>> fli...@li...
>>>>>> https://lists.sourceforge.net/lists/listinfo/flickertcb-devel
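
For anyone retracing the PAE question raised in this thread, here is a minimal sketch of one way to check whether an installed 32-bit kernel was built with PAE. It is not part of Flicker or go.sh. It assumes the distribution ships the kernel build configuration as /boot/config-$(uname -r) (the case on Ubuntu variants, but not everywhere) and that PAE shows up there as the kernel option CONFIG_X86_PAE=y. The function name kernel_is_pae is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not part of Flicker: reports whether a kernel build
# config file enables PAE.
# Returns 0 if CONFIG_X86_PAE=y is present, 1 if it is not,
# and 2 if the config file cannot be read.
kernel_is_pae() {
    config_file="$1"
    [ -r "$config_file" ] || return 2
    grep -q '^CONFIG_X86_PAE=y$' "$config_file"
}

# Assumed location of the running kernel's config (Ubuntu-style layout).
rc=0
kernel_is_pae "/boot/config-$(uname -r)" || rc=$?
case "$rc" in
    0) echo "CONFIG_X86_PAE=y: 32-bit PAE kernel" ;;
    1) echo "CONFIG_X86_PAE is not set" ;;
    *) echo "kernel config not readable at the assumed path" ;;
esac
```

CONFIG_X86_PAE only exists for 32-bit x86 builds, so a 64-bit kernel also reports "not set". The check says nothing about how Flicker saves or restores CR3; it only narrows down which kind of kernel is running.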