From: Tong Li <to...@cs...> - 2005-11-04 19:41:10
I have an x86_64 SMP machine running Linux 2.6.14. The kernel was compiled
with frame pointers enabled. I've run into two problems with oprofile, in
both 0.9.1 and 0.8.2.

First, when I started oprofile with call-graph collection on, a panic
occurred right away. The problem seems to be that
check_user_page_readable() does not correctly filter out illegal
user-space addresses, so a garbage frame_head value gets passed to
dump_backtrace() in backtrace.c. 2.6.14-rc5-mm1 has a patch that kills the
use of check_user_page_readable() and replaces it with
__copy_from_user_inatomic(). I applied this patch and it solved the
problem.

The new problem I'm having is that opreport -c doesn't print kernel call
graphs correctly: every kernel function shows only [self], never its
callers (see below). Call graphs for user functions (e.g., libc calls) are
shown correctly.

CPU: P4 / Xeon, speed 3336.37 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not
stopped) with a unit mask of 0x01 (mandatory) count 100000

samples  %        image name  app name  symbol name
-------------------------------------------------------------------------------
17606    28.1277  vmlinux     vmlinux   find_busiest_group
17606    100.000  vmlinux     vmlinux   find_busiest_group [self]
-------------------------------------------------------------------------------
11753    18.7769  vmlinux     vmlinux   _raw_spin_lock
11753    100.000  vmlinux     vmlinux   _raw_spin_lock [self]

Does anyone know how to fix this?

Thanks,
tong
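[Editor's note: for readers hitting the same panic, the crash pattern is a frame-pointer walk that dereferences an unvalidated address. Below is a minimal userspace sketch of the safer walk; the names, the readable() callback, and walk_frames() itself are illustrative inventions, not the kernel API, and a plain bounds-checked memcpy() stands in for the role __copy_from_user_inatomic() plays in the 2.6.14-rc5-mm1 patch.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the layout oprofile's backtrace.c walks when the kernel is
 * built with frame pointers: each frame begins with the saved frame
 * pointer, followed by the return address into the caller. */
struct frame_head {
    struct frame_head *ebp;  /* caller's frame pointer */
    unsigned long ret;       /* return address */
};

/* Userspace analogue of the fixed walk: every candidate frame is
 * bounds-checked and then copied into a local buffer before use, so a
 * garbage frame_head value is rejected instead of being dereferenced
 * directly.  'readable' stands in for the kernel's access check and is
 * supplied by the caller. */
static int walk_frames(const struct frame_head *head,
                       int (*readable)(const void *, size_t),
                       unsigned long *trace, int max_depth)
{
    int depth = 0;

    while (head && depth < max_depth) {
        struct frame_head copy;

        if (!readable(head, sizeof(copy)))  /* reject bogus pointers */
            break;
        memcpy(&copy, head, sizeof(copy));  /* the "safe copy" step */
        trace[depth++] = copy.ret;

        /* Frames must move strictly toward the stack base, or we could
         * loop forever on a corrupted stack. */
        if (copy.ebp <= head)
            break;
        head = copy.ebp;
    }
    return depth;
}
```

The monotonic check on the saved frame pointer also cuts off cycles in corrupted stacks, which matters when a sample can land mid-prologue before the frame is fully linked.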
From: John L. <le...@mo...> - 2005-11-04 20:15:34
On Fri, Nov 04, 2005 at 02:41:05PM -0500, Tong Li wrote:

> Does anyone know how to fix this?

You need to enable the CONFIG_FRAME_POINTER option (or similar) in the
kernel.

john
From: Tong Li <to...@cs...> - 2005-11-04 20:57:08
> You need to enable the CONFIG_FRAME_POINTER option (or similar) in the
> kernel

Yes, but I did have it enabled ...

tong
From: Tong Li <to...@cs...> - 2006-01-12 04:51:26
Regarding the problem that oprofile does not generate the correct call
graphs for kernel functions, I think it is due to a bug in
valid_kernel_stack() in arch/i386/oprofile/backtrace.c. The reasoning that
the frame pointer (headaddr) should be greater than stack (i.e., regs) is
not correct, because stack points to the top of the new stack that is set
up when an NMI occurs, which is at a different (and on my system higher)
address than the frame pointer (headaddr) on the stack of the interrupted
process. I think the correct comparison should be between headaddr and
regs->rsp (or regs->esp for 32-bit). So I changed the line

    unsigned long stack = (unsigned long)regs;

to

    #ifdef CONFIG_X86_64
    unsigned long stack = (unsigned long)regs->rsp;
    #else
    unsigned long stack = (unsigned long)regs->esp;
    #endif

With this change, I'm now able to backtrace kernel functions.

tong

On Fri, 4 Nov 2005, John Levon wrote:

> On Fri, Nov 04, 2005 at 02:41:05PM -0500, Tong Li wrote:
>
>> Does anyone know how to fix this?
>
> You need to enable the CONFIG_FRAME_POINTER option (or similar) in the
> kernel
>
> john
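[Editor's note: for reference, a self-contained sketch of the check with this change applied. This is a userspace paraphrase, not the real function from arch/i386/oprofile/backtrace.c: struct mock_regs stands in for pt_regs, and the THREAD_SIZE value is illustrative only.]

```c
#include <assert.h>

/* Illustrative only: the real value comes from the kernel config
 * (8 KB was typical at the time). */
#define THREAD_SIZE 8192UL

/* Minimal stand-in for the one pt_regs field the check needs. */
struct mock_regs {
    unsigned long rsp;  /* stack pointer of the interrupted task */
};

/* Sketch of valid_kernel_stack() with the proposed fix: the lower
 * bound for a plausible frame pointer is the interrupted task's stack
 * pointer (regs->rsp), not the address of the pt_regs block, because
 * on x86-64 the NMI runs on a separate IST stack and &regs points
 * into that stack, not the task's. */
static int valid_kernel_stack(unsigned long headaddr,
                              const struct mock_regs *regs)
{
    unsigned long stack = regs->rsp;  /* was: (unsigned long)regs */
    unsigned long stack_base = (stack & ~(THREAD_SIZE - 1)) + THREAD_SIZE;

    return headaddr > stack && headaddr < stack_base;
}
```

With the old form, headaddr (on the task stack) is compared against an address on the IST stack, so the range check can reject every legitimate kernel frame.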
From: John L. <le...@mo...> - 2006-01-16 13:26:43
On Wed, Jan 11, 2006 at 11:51:17PM -0500, Tong Li wrote:

> Regarding the problem that oprofile does not generate the correct call
> graphs for kernel functions, I think it is due to a bug in
> valid_kernel_stack() in arch/i386/oprofile/backtrace.c. The reasoning
> that the frame pointer (headaddr) should be greater than stack (i.e.,
> regs) is not correct, because stack points to the top of the new stack
> that is set up when an NMI occurs, which is at a different (and on my
> system higher) address than the frame pointer (headaddr) on the stack of
> the interrupted process. I think the correct comparison should be
> between headaddr and regs->rsp (or regs->esp for 32-bit). So I changed
> the line

Does x86-64 use separate interrupt stacks? Where is the code that does
this stack setup?

> With this change, I'm now able to backtrace kernel functions.

This change looks OK, but you need to modify the comment above it, which
describes the stack layout, too.

john
From: Tong Li <to...@cs...> - 2006-01-18 18:24:54
> Does x86-64 use separate interrupt stacks? Where is the code that does
> this stack setup?

NMI and several other exceptions use special stacks, maintained by the
interrupt stack table (IST). The code is in trap_init() in
arch/x86_64/kernel/traps.c. You'll see it's different from i386's code.

> This change looks OK, but you need to modify the comment above it, which
> describes the stack layout, too.

In the comments, we could change the line "struct pt_regs stored on stack
(struct pt_regs *)" to "saved regs->esp value", and the last sentence
"Thus, &pt_regs <-> stack base restricts ..." to "Thus, regs->esp <->
stack base restricts ...", or something like that. What do you think?

Please let me know if you'd like me to make a patch or whatever (I've
never done this and don't know what the right way is).

Thanks,
tong
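[Editor's note: a rough picture of the x86-64 layout the revised comment would have to describe, sketched from this thread rather than copied from backtrace.c. Stacks grow downward, so "stack base" is the highest address of the task stack.]

```
  high addr                             high addr
 +------------------+ <- stack base   +------------------+
 | caller frames    |                 |       ...        |
 | saved rbp / ret  | <- headaddr     |  struct pt_regs  | <- regs
 |       ...        |                 | (NMI entry state)|
 +------------------+ <- regs->rsp    +------------------+
    task stack                           NMI (IST) stack
```

A valid kernel frame pointer thus satisfies regs->rsp < headaddr < stack base; comparing headaddr against regs itself fails because regs lives on the other stack entirely.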
From: John L. <le...@mo...> - 2006-01-18 18:37:20
On Wed, Jan 18, 2006 at 01:24:43PM -0500, Tong Li wrote:

> NMI and several other exceptions use special stacks, maintained by the
> interrupt stack table (IST). The code is in trap_init() in
> arch/x86_64/kernel/traps.c. You'll see it's different from i386's code.

OK.

>> This change looks OK, but you need to modify the comment above it,
>> which describes the stack layout, too.
>
> In the comments, we could change the line "struct pt_regs stored on
> stack (struct pt_regs *)" to "saved regs->esp value", and the last
> sentence "Thus, &pt_regs <-> stack base restricts ..." to "Thus,
> regs->esp <-> stack base restricts ...", or something like that. What do
> you think?

There's also a picture that shows the regs being on the same stack.
That'll need fixing too.

> Please let me know if you'd like me to make a patch or whatever

Please do. Documentation/ in the Linux kernel source shows you how.

john
From: Tong Li <to...@cs...> - 2006-01-24 01:10:48
I'm getting a call graph like the following:

177091   64.7726  vmlinux  vmlinux  _spin_lock
2562451   8.6735  vmlinux  vmlinux  _raw_spin_lock
2562451  100.000  vmlinux  vmlinux  _raw_spin_lock [self]
--------------------------------------------------------------------------
138919   79.5186  vmlinux  vmlinux  scheduler_tick
0              0  vmlinux  vmlinux  _spin_lock
177091   99.9972  vmlinux  vmlinux  _raw_spin_lock

My questions are:

1. In the second block, why does _spin_lock have 0 in its samples field,
as opposed to the 177091 in the first block?

2. Why does _raw_spin_lock show 177091 samples, as opposed to 2562451 as
in the first block?

It looks like the samples field doesn't show the total number of times
that the function is on the stack when a sample is taken (i.e., when an
NMI occurs), because otherwise each function would have a fixed value in
all its occurrences. But what exactly do these numbers represent?

Thanks,
tong
From: John L. <le...@mo...> - 2006-01-24 01:26:29
On Mon, Jan 23, 2006 at 08:10:35PM -0500, Tong Li wrote:

> I'm getting a call graph like the following:
>
> 177091   64.7726  vmlinux  vmlinux  _spin_lock
> 2562451   8.6735  vmlinux  vmlinux  _raw_spin_lock
> 2562451  100.000  vmlinux  vmlinux  _raw_spin_lock [self]
> --------------------------------------------------------------------------
> 138919   79.5186  vmlinux  vmlinux  scheduler_tick
> 0              0  vmlinux  vmlinux  _spin_lock
> 177091   99.9972  vmlinux  vmlinux  _raw_spin_lock
>
> 1. In the second block, why does _spin_lock have 0 in its samples field,
> as opposed to the 177091 in the first block?

We took no samples where the PC was within _spin_lock itself.

> 2. Why does _raw_spin_lock show 177091 samples, as opposed to 2562451 as
> in the first block?

The unindented sample value is the same as you get from a normal
opreport, not a call-graph value.

> It looks like the samples field doesn't show the total number of times
> that the function is on the stack when a sample is taken (i.e., when an
> NMI occurs), because otherwise each function would have a fixed value in
> all its occurrences. But what exactly do these numbers represent?

They are the number of call-graph arcs between the caller/callee and the
subject function. Of all the times we interrupted, 2562451 had a PC value
within _raw_spin_lock. Of all the times, 177091 had a call stack where we
could observe a call from _spin_lock to _raw_spin_lock.

This /is/ a fixed value, as you can see by comparing the _raw_spin_lock
entry (where the _spin_lock caller counts 177091) with the _spin_lock
entry (where the callee value is the same).

See the manual, especially the section on interpretation.

regards
john
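[Editor's note: to make the distinction concrete, here is a small self-contained sketch that derives both numbers from simulated NMI call stacks: a flat count (PC inside the function, the unindented value) and an arc count (caller/callee pair observed on a sampled stack, the indented value). The sample data and both helper functions are invented for illustration; they are not part of oprofile.]

```c
#include <assert.h>
#include <string.h>

#define MAX_DEPTH 4

/* One NMI sample: the call stack from innermost frame (the PC) outward. */
struct sample {
    const char *frames[MAX_DEPTH];
    int depth;
};

/* Flat count: samples whose PC (innermost frame) is inside 'func'.
 * This models the unindented number opreport prints. */
static int flat_count(const struct sample *s, int n, const char *func)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        if (strcmp(s[i].frames[0], func) == 0)
            count++;
    return count;
}

/* Arc count: samples on whose stack 'caller' is observed invoking
 * 'callee' directly.  This models the indented number. */
static int arc_count(const struct sample *s, int n,
                     const char *caller, const char *callee)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j + 1 < s[i].depth; j++)
            if (strcmp(s[i].frames[j], callee) == 0 &&
                strcmp(s[i].frames[j + 1], caller) == 0) {
                count++;
                break;  /* count each sample at most once */
            }
    return count;
}
```

A tiny function like _spin_lock can have a flat count of zero (the PC almost never lands inside it) while still appearing on many sampled stacks, which is exactly the pattern in the report above.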