From: Amitabha R. <ami...@gm...> - 2006-12-04 14:42:38
|
Hi I am using oprofile on Xen (xenoprofile) and I have come across a problem with profiling vdso regions. Oprofile does not currently distinguish these from anonymous regions. Here is an example : 86741 68.3065 dbench 6098 4.8020 anon (tgid:7143 range:0xbfffe000-0xbffff000) When this should ideally be reported as 599 30.2068 [VDSO]18553.0xbfffe000.0xbffff000 While is this is fine right now because of the fixed address range for vdso it can turn into a problem when we start to randomize this location. Also it is misleading to people who are not aware of this region ( I wasnt initially and was wondering where the anon map came from :). I've put together a patch to solve this problem (below). Its in a very hackish state right now. For one its on the xen kernel rather than the pristine linux one. My questions to the list are: a) whether there is sufficient interest in this that I clean it up and send it out for testing. While an application "typically" is not supposed to spend too much time in that region dbench is doing that for me and I think it is worthwhile to isolate the vdso region from anon pages to help in such situations. b) Any comments on my approach to the problem. Thanks -Amitabha The way I have solved this problem is to 1) Mark that region as VDSO via the mmu_context structure. This change is due anyway from one of Ingo Molnar's patches (http://lkml.org/lkml/2006/5/20/26) but I have no idea which linux kernel version or when Xen will sync to it. *** include/asm-i386/mach-xen/asm/mmu.h 2006-12-03 17:12:55.000000000 +0530 --- /opt/xen-unstable.hg/linux-2.6.16.29-xen/include/asm/mach-xen/asm/mmu.h 2006-12-01 23:56:51.000000000 +0530 *************** *** 13,18 **** --- 13,19 ---- struct semaphore sem; void *ldt; #ifdef CONFIG_XEN + void * vdso; int has_foreign_mappings; #endif } mm_context_t; 2) Added code to sysenter.c to indicate the vdso region in mm struct *** arch/i386/kernel/sysenter.c 2006-12-03 17:12:55.000000000 +0530 --- /opt/xen-unstable.hg/linux-2.6.16.29-xen/arch/i386/kernel/sysenter.c 2006-12-01 23:57:05.000000000 +0530 *************** *** 5,11 **** * * This file contains the needed initializations to support sysenter. */ - #include <linux/init.h> #include <linux/smp.h> #include <linux/thread_info.h> --- 5,10 ---- *************** *** 130,135 **** --- 129,135 ---- kmem_cache_free(vm_area_cachep, vma); return ret; } + mm->context.vdso = (void *)vma->vm_start; mm->total_vm++; up_write(&mm->mmap_sem); return 0; 3) Added a special VDSO cookie in the oprofile output types. *** drivers/oprofile/event_buffer.h 2006-12-03 19:54:21.000000000 +0530 --- /opt/xen-unstable.hg/linux-2.6.16.29-xen/drivers/oprofile/event_buffer.h 2006-12-02 00:50:30.000000000 +0530 *************** *** 12,17 **** --- 12,18 ---- #include <linux/types.h> #include <asm/semaphore.h> + #include <asm/elf.h> int alloc_event_buffer(void); *************** *** 36,45 **** --- 37,49 ---- #define TRACE_END_CODE 9 #define XEN_ENTER_SWITCH_CODE 10 #define DOMAIN_SWITCH_CODE 11 + #define COOKIE_VDSO 12 #define INVALID_COOKIE ~0UL #define NO_COOKIE 0UL + + /* Constant used to refer to coordinator domain (Xen) */ #define COORDINATOR_DOMAIN -1 4) Added code to the dcookie lookup function in buffer sync to correctly identify these regions and emit the right cookie. I think I will have to fix this part beause vdso should obviously move to the arch dependent code. *** drivers/oprofile/buffer_sync.c 2006-12-03 19:54:21.000000000 +0530 --- /opt/xen-unstable.hg/linux-2.6.16.29-xen/drivers/oprofile/buffer_sync.c 2006-12-02 00:47:04.000000000 +0530 *************** *** 30,35 **** --- 30,37 ---- #include <linux/profile.h> #include <linux/module.h> #include <linux/fs.h> + + #include <asm/elf.h> #include "oprofile_stats.h" #include "event_buffer.h" *************** *** 260,266 **** vma->vm_file->f_vfsmnt); *offset = (vma->vm_pgoff << PAGE_SHIFT) + addr - vma->vm_start; ! } else { /* must be an anonymous map */ *offset = addr; } --- 262,275 ---- vma->vm_file->f_vfsmnt); *offset = (vma->vm_pgoff << PAGE_SHIFT) + addr - vma->vm_start; ! } ! /* Should this be moved outside the loop by comparing ! with page aligned address straight away ? */ ! else if(mm->context.vdso == (void *)vma->vm_start){ ! cookie = COOKIE_VDSO; ! *offset = addr; ! } ! else { /* must be an anonymous map */ *offset = addr; } 5) I also fixed the userspace tools. Here is the patch. I now correctly get reports of the form GLOBAL_POWER_E...| samples| %| ------------------ 67 94.3662 bash 3 4.2254 [VDSO]18604.0xbfffe000.0xbffff000 diff -rc pristine/oprofile-0.9.1/daemon/opd_cookie.c oprofile-0.9.1/daemon/opd_cookie.c *** pristine/oprofile-0.9.1/daemon/opd_cookie.c 2005-05-26 05:30:02.000000000 +0530 --- oprofile-0.9.1/daemon/opd_cookie.c 2006-12-02 11:47:22.000000000 +0530 *************** *** 175,180 **** --- 175,183 ---- if (cookie == NO_COOKIE) return "anonymous"; + if(cookie == COOKIE_VDSO) + return "vdso"; + list_for_each(pos, &hashes[hash]) { entry = list_entry(pos, struct cookie_entry, list); if (entry->value == cookie) { diff -rc pristine/oprofile-0.9.1/daemon/opd_cookie.h oprofile-0.9.1/daemon/opd_cookie.h *** pristine/oprofile-0.9.1/daemon/opd_cookie.h 2005-04-27 22:35:55.000000000 +0530 --- oprofile-0.9.1/daemon/opd_cookie.h 2006-12-02 11:52:21.000000000 +0530 *************** *** 15,20 **** --- 15,21 ---- #define INVALID_COOKIE ~0LLU #define NO_COOKIE 0LLU + #define COOKIE_VDSO 12 /** * Shift value to remove trailing zero on a dcookie value, 7 is sufficient diff -rc pristine/oprofile-0.9.1/daemon/opd_mangling.c oprofile-0.9.1/daemon/opd_mangling.c *** pristine/oprofile-0.9.1/daemon/opd_mangling.c 2005-05-02 20:36:58.000000000 +0530 --- oprofile-0.9.1/daemon/opd_mangling.c 2006-12-02 14:20:37.000000000 +0530 *************** *** 62,67 **** --- 62,75 ---- return name; } + static char * mangle_vdso(struct anon_mapping const * anon) + { + char * name = xmalloc(PATH_MAX); + snprintf(name, 1024, "[VDSO]%u.0x%llx.0x%llx", (unsigned int)anon->tgid, + anon->start, anon->end); + return name; + } + static char * mangle_filename(struct sfile * last, struct sfile const * sf, int counter, int cg) *************** *** 76,83 **** values.image_name = sf->kernel->name; values.flags |= MANGLE_KERNEL; } else if (sf->anon) { ! values.flags |= MANGLE_ANON; ! values.image_name = mangle_anon(sf->anon); } else { values.image_name = find_cookie(sf->cookie); } --- 84,98 ---- values.image_name = sf->kernel->name; values.flags |= MANGLE_KERNEL; } else if (sf->anon) { ! if(sf->cookie == COOKIE_VDSO){ ! values.flags |=MANGLE_VDSO; ! values.image_name = mangle_vdso(sf->anon); ! } ! else{ ! values.flags |= MANGLE_ANON; ! values.image_name = mangle_anon(sf->anon); ! } ! } else { values.image_name = find_cookie(sf->cookie); } diff -rc pristine/oprofile-0.9.1/daemon/opd_trans.c oprofile-0.9.1/daemon/opd_trans.c *** pristine/oprofile-0.9.1/daemon/opd_trans.c 2006-12-03 23:15:51.000000000 +0530 --- oprofile-0.9.1/daemon/opd_trans.c 2006-12-02 14:21:52.000000000 +0530 *************** *** 114,120 **** if (trans->in_kernel != 0) clear_trans_current(trans); ! if (!trans->in_kernel && trans->cookie == NO_COOKIE) trans->anon = find_anon_mapping(trans); /* get the current sfile if needed */ --- 114,121 ---- if (trans->in_kernel != 0) clear_trans_current(trans); ! ! if (!trans->in_kernel && (trans->cookie == NO_COOKIE || trans->cookie == COOKIE_VDSO)) trans->anon = find_anon_mapping(trans); /* get the current sfile if needed */ diff -rc pristine/oprofile-0.9.1/libop/op_mangle.h oprofile-0.9.1/libop/op_mangle.h *** pristine/oprofile-0.9.1/libop/op_mangle.h 2005-05-02 20:37:01.000000000 +0530 --- oprofile-0.9.1/libop/op_mangle.h 2006-12-02 14:11:05.000000000 +0530 *************** *** 27,32 **** --- 27,33 ---- MANGLE_CALLGRAPH = (1 << 4), MANGLE_ANON = (1 << 5), MANGLE_CG_ANON = (1 << 6), + MANGLE_VDSO = (1 <<7 ), }; /** |
From: John L. <le...@mo...> - 2006-12-14 00:43:35
|
On Mon, Dec 04, 2006 at 08:12:35PM +0530, Amitabha Roy wrote: > a) whether there is sufficient interest in this that I clean it up and > send it out for testing. While an application "typically" is not > supposed to spend too much time in that region dbench is doing that > for me and I think it is worthwhile to isolate the vdso region from > anon pages to help in such situations. > > b) Any comments on my approach to the problem. I'd prefer a user-space solution, so we don't have a new kernel/user incompatibility. We could default to the static location, and have something that exports from the kernel the location of the VDSO if relevant. regards john |
From: Amitabha R. <ami...@gm...> - 2006-12-15 15:51:48
|
On 12/15/06, John Levon <le...@mo...> wrote: > On Fri, Dec 15, 2006 at 08:36:12PM +0530, Amitabha Roy wrote: > > > Well, as I have been saying, adding this patch to userspace WILL NOT > > break compatibility with the kernel. Everything that was or is running > > fine will continue to do so. > > But it's completely pointless as we can never merge the kernel patch. Well I can change the kernel patch so it is predicated by a configuration switch. That will not break anything. But that is pointless if you dont merge the userspace patch. We have to start somewhere. > > > bug. Perhaps you have a cleaner solution in mind that will also > > magically be backward and forward compatible, that I am not seeing > > here. > > I've said twice now what you need is some way of exporting the vdso > address from the kernel, and we can check if it's there. > individual vdso addresses for each binary being profiled. Clearly a non trivial task. We would have to seriously hack the dcookie mechanism to remember that kind of information. Thats a bit excessive for simply marking out VDSO regions, no ? While I would love to have VDSOs marked out I am not desperate enough to mess around with the dcookie code :) Even assuming that I did, I dont want to change the cookie protocol (else we are back to the compatibility question). The options then are export it via the /proc interface or perhaps a new device file. Or perhaps enhance the dcookie sycall to lookup VDSO maps for individual binaries or a cookie for the vdso region itself. Getting that kind of patch into the kernel IMHO will be very challenging. I think the kernel patch (modified to be predicated by a switch) and the userspace patch as is will solve the problem simply and everything will be compatible with everything else. Thanks -Amitabha |
From: John L. <le...@mo...> - 2006-12-15 16:41:40
|
On Fri, Dec 15, 2006 at 09:21:47PM +0530, Amitabha Roy wrote: > Well I can change the kernel patch so it is predicated by a > configuration switch. How would that help you? As soon as you turn it in, you've broken compatibility. CONFIG_BREAK_COMPAT isn't acceptable... > >I've said twice now what you need is some way of exporting the vdso > >address from the kernel, and we can check if it's there. > > individual vdso addresses for each binary being profiled. I thought the VDSO is at a fixed address across an entire system. > Clearly a > non trivial task. We would have to seriously hack the dcookie > mechanism to remember that kind of information. The daemon would read it, nothing in the kernel except exporting the info somewhere. regards john |
From: Amitabha R. <ami...@gm...> - 2006-12-14 08:37:20
|
Hi John A pure userspace solution would not work if the VDSO regions were to be randomized (which it is in the current 2.6 head of tree). So we must have some kind of support in the kernel. I've already sent a kernel patch (to LKML and copied phil.el on it). I sent out a cleaned up userspace patch recently to this list. Incorporating it will cause *no incompatibility* with the kernel since the special cookie will not be emitted unless my kernel patch is applied. On the other hand we have to make changes to the kernel anyway so incorporating this will make a strong case for Phil to apply the kernel patch. This would also be the first step to extracting symbols in the VDSO region from the kernel and display them in the detailed report. It would be very useful to diagnose cases like frequent syscall restarts which seems to be happening in the experiments I am running. In theory of course one could do something purely in userspace like open the /proc/map file of some process (perhaps oprofile itself) and then look for the vdso keyword. But this would not be backward compatible with older kernels which do not mark the vdso region. Also the /proc/map file is unreadable on some FC4 systems. Do you have any other pure userspace solution in mind ? Thanks -Amitabha On 12/14/06, John Levon <le...@mo...> wrote: > On Mon, Dec 04, 2006 at 08:12:35PM +0530, Amitabha Roy wrote: > > > a) whether there is sufficient interest in this that I clean it up and > > send it out for testing. While an application "typically" is not > > supposed to spend too much time in that region dbench is doing that > > for me and I think it is worthwhile to isolate the vdso region from > > anon pages to help in such situations. > > > > b) Any comments on my approach to the problem. > > I'd prefer a user-space solution, so we don't have a new kernel/user > incompatibility. We could default to the static location, and have > something that exports from the kernel the location of the VDSO if > relevant. > > regards > john > |
From: John L. <le...@mo...> - 2006-12-14 20:05:50
|
On Thu, Dec 14, 2006 at 02:07:19PM +0530, Amitabha Roy wrote: > I've already sent a kernel patch (to LKML and copied phil.el on it). > I sent out a cleaned up userspace patch recently to this list. > Incorporating it will cause *no incompatibility* with the kernel since Nonsense. A new cookie in the kernel requires a new userspace. Like I said: > >I'd prefer a user-space solution, so we don't have a new kernel/user > >incompatibility. We could default to the static location, and have > >something that exports from the kernel the location of the VDSO if > >relevant. regards john |
From: Amitabha R. <ami...@gm...> - 2006-12-15 16:59:25
|
Hi John As an aside, I appreciate that you are taking the time to discuss this with me. This is a good discussion. On 12/15/06, John Levon <le...@mo...> wrote: > How would that help you? As soon as you turn it in, you've broken > compatibility. CONFIG_BREAK_COMPAT isn't acceptable... > At least the user can choose CONFIG_OPROF_VDSO=y and make sure that they have a version of oprofile that has the userspace vdso patch. There is an option. And if enough users migrate to the latest version of Oprofile (why not if it is in active dev ?) make that the default in the kernel. > > > > individual vdso addresses for each binary being profiled. > > I thought the VDSO is at a fixed address across an entire system. > Not in the newer kernels. Its randomized. Here are snippets from an oprof output : GLOBAL_POWER_E...| samples| %| ------------------ 301758 81.3305 dbench 7236 1.9503 [VDSO]4681.0xb7f85000.0xb7f86000 GLOBAL_POWER_E...| samples| %| ------------------ 40 93.0233 sshd 3 6.9767 [VDSO]3884.0xb7fd2000.0xb7fd3000 As you can see the vdso region is not at the same adress for different binaries. This is top of the 2.6 tree. Thanks -Amitabha |
From: John L. <le...@mo...> - 2006-12-15 17:05:48
|
On Fri, Dec 15, 2006 at 10:29:24PM +0530, Amitabha Roy wrote: > At least the user can choose CONFIG_OPROF_VDSO=y and make sure that > they have a version of oprofile that has the userspace vdso patch. > There is an option. And if enough users migrate to the latest version > of Oprofile (why not if it is in active dev ?) make that the default > in the kernel. It can't be the default, this is the problem. > Not in the newer kernels. Its randomized. Here are snippets from an > oprof output : I still don't get it. We have to read /proc/pid/maps for anon regions; why can't we handle the vdso entry there? regards john |
From: Amitabha R. <ami...@gm...> - 2006-12-15 06:36:30
|
Correct which is why the userspace patch must be in place before you can apply the kernel patch. But if the userspace patch is already applied it does not matter if the kernel is patched yet or not. We will never see that cookie. Case where this would break things: User does a kernel upgrade and gets the kernel patch but does not upgrade oprofile. Oprofile sees the new cookie and breaks. Yeah ok, so the user has to upgrade their oprofile in this case. Is that your point ? -Amitabha On 12/15/06, John Levon <le...@mo...> wrote: > On Thu, Dec 14, 2006 at 02:07:19PM +0530, Amitabha Roy wrote: > > > I've already sent a kernel patch (to LKML and copied phil.el on it). > > I sent out a cleaned up userspace patch recently to this list. > > Incorporating it will cause *no incompatibility* with the kernel since > > Nonsense. A new cookie in the kernel requires a new userspace. > > Like I said: > > > >I'd prefer a user-space solution, so we don't have a new kernel/user > > >incompatibility. We could default to the static location, and have > > >something that exports from the kernel the location of the VDSO if > > >relevant. > > regards > john > |
From: John L. <le...@mo...> - 2006-12-15 13:36:27
|
On Fri, Dec 15, 2006 at 12:06:28PM +0530, Amitabha Roy wrote: > Case where this would break things: User does a kernel upgrade and > gets the kernel patch but does not upgrade oprofile. Oprofile sees the > new cookie and breaks. Yeah ok, so the user has to upgrade their > oprofile in this case. Is that your point ? Yes john |
From: Amitabha R. <ami...@gm...> - 2006-12-15 17:25:04
|
On 12/15/06, John Levon <le...@mo...> wrote: > On Fri, Dec 15, 2006 at 10:29:24PM +0530, Amitabha Roy wrote: > > I still don't get it. We have to read /proc/pid/maps for anon regions; > why can't we handle the vdso entry there? > That would be an option. Where can I find this code which reads the maps file ? |
From: John L. <le...@mo...> - 2006-12-15 17:25:41
|
On Fri, Dec 15, 2006 at 10:55:02PM +0530, Amitabha Roy wrote: > >On Fri, Dec 15, 2006 at 10:29:24PM +0530, Amitabha Roy wrote: > > > >I still don't get it. We have to read /proc/pid/maps for anon regions; > >why can't we handle the vdso entry there? > > > > That would be an option. Where can I find this code which reads the maps > file ? daemon/ in the oprofile source john |
From: Amitabha R. <ami...@gm...> - 2006-12-16 05:35:23
|
Hi John I sent out a userspace only patch that prints out the name of the anon region when possible. That will also solve this VDSO problem. RFC. Thanks -Amitabha On 12/15/06, John Levon <le...@mo...> wrote: > On Fri, Dec 15, 2006 at 10:55:02PM +0530, Amitabha Roy wrote: > > > >On Fri, Dec 15, 2006 at 10:29:24PM +0530, Amitabha Roy wrote: > > > > > >I still don't get it. We have to read /proc/pid/maps for anon regions; > > >why can't we handle the vdso entry there? > > > > > > > That would be an option. Where can I find this code which reads the maps > > file ? > > daemon/ in the oprofile source > > john > |
From: Amitabha R. <ami...@gm...> - 2006-12-15 14:12:35
|
So why not accept the patch into oprofile user space so users at least have a choice to upgrade oprofile when they upgrade kernels ? It doesnt hurt backward compatibility with kernels that do not have the VDSO patch. Is your objection to the userspace patch or the kernel one ? It clearly cant hurt to apply the userspace one. If you are worried about the kernel one I can redo the patch to make it conditional on a configuration option. -Amitabha On 12/15/06, John Levon <le...@mo...> wrote: > On Fri, Dec 15, 2006 at 12:06:28PM +0530, Amitabha Roy wrote: > > > Case where this would break things: User does a kernel upgrade and > > gets the kernel patch but does not upgrade oprofile. Oprofile sees the > > new cookie and breaks. Yeah ok, so the user has to upgrade their > > oprofile in this case. Is that your point ? > > Yes > > john > |
From: John L. <le...@mo...> - 2006-12-15 14:15:31
|
On Fri, Dec 15, 2006 at 07:42:34PM +0530, Amitabha Roy wrote: > So why not accept the patch into oprofile user space so users at least > have a choice to upgrade oprofile when they upgrade kernels ? It > doesnt hurt backward compatibility with kernels that do not have the > VDSO patch. Because we can't break compatibility between the kernel and userspace, full stop. john |
From: Amitabha R. <ami...@gm...> - 2006-12-15 15:06:14
|
Well, as I have been saying, adding this patch to userspace WILL NOT break compatibility with the kernel. Everything that was or is running fine will continue to do so. Anyway its your call. I think VDSO reported as anon is a profiling bug. Perhaps you have a cleaner solution in mind that will also magically be backward and forward compatible, that I am not seeing here. Bottomline : you need to change the kernel AND userspace to do anything about it. If both sides point to each other and say compatibility, its a deadlock. <shrug> -Amitabha On 12/15/06, John Levon <le...@mo...> wrote: > On Fri, Dec 15, 2006 at 07:42:34PM +0530, Amitabha Roy wrote: > > > So why not accept the patch into oprofile user space so users at least > > have a choice to upgrade oprofile when they upgrade kernels ? It > > doesnt hurt backward compatibility with kernels that do not have the > > VDSO patch. > > Because we can't break compatibility between the kernel and userspace, > full stop. > > john > |
From: John L. <le...@mo...> - 2006-12-15 15:24:15
|
On Fri, Dec 15, 2006 at 08:36:12PM +0530, Amitabha Roy wrote: > Well, as I have been saying, adding this patch to userspace WILL NOT > break compatibility with the kernel. Everything that was or is running > fine will continue to do so. But it's completely pointless as we can never merge the kernel patch. > bug. Perhaps you have a cleaner solution in mind that will also > magically be backward and forward compatible, that I am not seeing > here. I've said twice now what you need is some way of exporting the vdso address from the kernel, and we can check if it's there. regards john |