From: SourceForge.net <no...@so...> - 2009-07-23 22:47:08
|
Bugs item #1685267, was opened at 2007-03-21 08:07 Message generated for change (Comment added) made by mishikal You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=116191&aid=1685267&group_id=16191 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: Fixed Priority: 9 Private: No Submitted By: Sami Farin (safari) Assigned to: Maynard Johnson (maynardj) Summary: (callgraph)opreport error: parse_filename() invalid filename Initial Comment: when using callgraph feature, opreport fails with the following words: # opreport -c -p /lib/modules/2.6.19.7-exec-shield-1/kernel/ -l -g /usr/bin/Xorg opreport error: parse_filename() invalid filename [5]: /var/lib/oprofile/samples/current/{root}/var/lib/oprofile/samples/current/{root}/usr/bin/Xorg/{dep}/{anon}/12410.0xeb2000.0xeb3000/GLOBAL_POWER_EVENTS.45000.1.all.all.all/{dep}/{root}/var/lib/oprofile/samples/current/{root}/usr/bin/Xorg/{dep}/{anon}/12410.0xeb2000.0xeb3000/GLOBAL_POWER_EVENTS.45000.1.all.all.all/{cg}/{root}/usr/local/bin/oprofiled/GLOBAL_POWER_EVENTS.45000.1.all.all.all # opreport -v opreport: oprofile 0.9.2 compiled on Mar 18 2007 13:06:58 I added [5] into parse_filename() to find out which of them failed. [5] is the latest one (skip "{cg}"). ---------------------------------------------------------------------- Comment By: Quanah Gibson-Mount (mishikal) Date: 2009-07-23 15:47 Message: Also stuff like this: warning: /lib64/libcrypto.so.0.9.7a/CPU_CLK_UNHALTED.100000.0.all.all.all/{dep}/{root}/var/lib/oprofile/samples/current/{root}/lib64/libcrypto.so.0.9.7a/{dep}/{root}/lib64/libcrypto.so.0.9. 7a/CPU_CLK_UNHALTED.100000.0.all.all.all could not be found. warning: /lib64/tls/libc-2.3.4.so/CPU_CLK_UNHALTED.100000.0.all.all.all/{dep}/{root}/var/lib/oprofile/samples/current/{root}/lib64/tls/libc-2.3.4.so/{dep}/{root}/lib64/tls/libc-2.3.4.so/CPU _CLK_UNHALTED.100000.0.all.all.all could not be found. warning: /lib64/tls/libpthread-2.3.4.so/CPU_CLK_UNHALTED.100000.0.all.all.all/{dep}/{root}/var/lib/oprofile/samples/current/{root}/lib64/tls/libpthread-2.3.4.so/{dep}/{root}/lib64/tls/libpt hread-2.3.4.so/CPU_CLK_UNHALTED.100000.0.all.all.all could not be found. warning: /no-vmlinux could not be found. warning: /no-vmlinux/CPU_CLK_UNHALTED.100000.0.all.all.all/{dep}/{root}/var/lib/oprofile/samples/current/{kern}/no-vmlinux/{dep}/{kern}/no-vmlinux/CPU_CLK_UNHALTED.100000.0.all.all.all coul d not be found. warning: /usr/lib64/libbfd-2.15.92.0.2.so/CPU_CLK_UNHALTED.100000.0.all.all.all/{dep}/{root}/var/lib/oprofile/samples/current/{root}/usr/lib64/libbfd-2.15.92.0.2.so/{dep}/{root}/usr/lib64/l ibbfd-2.15.92.0.2.so/CPU_CLK_UNHALTED.100000.0.all.all.all could not be found. warning: /usr/lib64/libstdc++.so.6.0.3/CPU_CLK_UNHALTED.100000.0.all.all.all/{dep}/{root}/var/lib/oprofile/samples/current/{root}/usr/lib64/libstdc++.so.6.0.3/{dep}/{root}/usr/lib64/libstdc ++.so.6.0.3/CPU_CLK_UNHALTED.100000.0.all.all.all could not be found. warning: /usr/local/bin/opreport/CPU_CLK_UNHALTED.100000.0.all.all.all/{dep}/{root}/var/lib/oprofile/samples/current/{root}/usr/local/bin/opreport/{dep}/{root}/usr/local/bin/opreport/CPU_CL K_UNHALTED.100000.0.all.all.all could not be found. warning: /usr/local/bin/oprofiled/CPU_CLK_UNHALTED.100000.0.all.all.all/{dep}/{root}/var/lib/oprofile/samples/current/{root}/usr/local/bin/oprofiled/{dep}/{root}/usr/local/bin/oprofiled/CPU _CLK_UNHALTED.100000.0.all.all.all could not be found. warning: /usr/local/stow/bdb-4.5.20.2/lib/libdb-4.5.so/CPU_CLK_UNHALTED.100000.0.all.all.all/{dep}/{root}/var/lib/oprofile/samples/current/{root}/usr/local/stow/bdb-4.5.20.2/lib/libdb-4.5.s o/{dep}/{root}/usr/local/stow/bdb-4.5.20.2/lib/libdb-4.5.so/CPU_CLK_UNHALTED.100000.0.all.all.all could not be found. warning: /usr/local/stow/openldap-2.4.17-bdb45/lib/liblber-2.4.so.2.5.0/CPU_CLK_UNHALTED.100000.0.all.all.all/{dep}/{root}/var/lib/oprofile/samples/current/{root}/usr/local/stow/openldap-2. 4.17-bdb45/lib/liblber-2.4.so.2.5.0/{dep}/{root}/usr/local/stow/openldap-2.4.17-bdb45/lib/liblber-2.4.so.2.5.0/CPU_CLK_UNHALTED.100000.0.all.all.all could not be found. warning: /usr/local/stow/openldap-2.4.17-bdb45/lib/libldap_r-2.4.so.2.5.0/CPU_CLK_UNHALTED.100000.0.all.all.all/{dep}/{root}/var/lib/oprofile/samples/current/{root}/usr/local/stow/openldap- 2.4.17-bdb45/lib/libldap_r-2.4.so.2.5.0/{dep}/{root}/usr/local/stow/openldap-2.4.17-bdb45/lib/libldap_r-2.4.so.2.5.0/CPU_CLK_UNHALTED.100000.0.all.all.all could not be found. warning: /usr/local/stow/openldap-2.4.17-bdb45/lib/openldap/back_hdb-2.4.so.2.5.0/CPU_CLK_UNHALTED.100000.0.all.all.all/{dep}/{root}/var/lib/oprofile/samples/current/{root}/usr/local/stow/o penldap-2.4.17-bdb45/lib/openldap/back_hdb-2.4.so.2.5.0/{dep}/{root}/usr/local/stow/openldap-2.4.17-bdb45/lib/openldap/back_hdb-2.4.so.2.5.0/CPU_CLK_UNHALTED.100000.0.all.all.all could not be found. warning: /usr/local/stow/openldap-2.4.17-bdb45/lib/slapd/CPU_CLK_UNHALTED.100000.0.all.all.all/{dep}/{root}/var/lib/oprofile/samples/current/{root}/usr/local/stow/openldap-2.4.17-bdb45/lib/ slapd/{dep}/{root}/usr/local/stow/openldap-2.4.17-bdb45/lib/slapd/CPU_CLK_UNHALTED.100000.0.all.all.all could not be found. warning: /usr/sbin/hald/CPU_CLK_UNHALTED.100000.0.all.all.all/{dep}/{root}/var/lib/oprofile/samples/current/{root}/usr/sbin/hald/{dep}/{root}/usr/sbin/hald/CPU_CLK_UNHALTED.100000.0.all.all .all could not be found. warning: /usr/sbin/ntpd/CPU_CLK_UNHALTED.100000.0.all.all.all/{dep}/{root}/var/lib/oprofile/samples/current/{root}/usr/sbin/ntpd/{dep}/{root}/usr/sbin/ntpd/CPU_CLK_UNHALTED.100000.0.all.all .all could not be found. ---------------------------------------------------------------------- Comment By: Quanah Gibson-Mount (mishikal) Date: 2009-07-23 15:46 Message: 0.9.4 fails for me when doing opreport -c -l: [root@vmhost401 ~]# opreport -c -l opreport error: parse_filename() invalid filename: /var/lib/oprofile/samples/current/{root}/var/lib/oprofile/samples/current/{root}/usr/local/jdk1.6.0_14/bin/java/{dep}/{anon:anon}/21168.0x7fa8a9a2f000.0x7fa8aaff0000/CPU_CLK_UNHALTED.100000.0.all.all.all/{dep}/{root}/var/lib/oprofile/samples/current/{root}/usr/local/jdk1.6.0_14/bin/java/{dep}/{anon:anon}/21168.0x7fa8a9a2f000.0x7fa8aaff0000/CPU_CLK_UNHALTED.100000.0.all.all.all/{cg}/{root}/usr/local/bin/oprofiled/CPU_CLK_UNHALTED.100000.0.all.all.all ---------------------------------------------------------------------- Comment By: Maynard Johnson (maynardj) Date: 2009-07-14 06:03 Message: I've not received any feedback on the attached patch yet. I'm getting ready to roll out a new release, and I intend to include this patch in the release. The patch has been reviewed and committed, so I'm marking this bug as "Fixed" (until someone proves otherwise). ---------------------------------------------------------------------- Comment By: Maynard Johnson (maynardj) Date: 2009-07-06 06:43 Message: I posted a patch on the oprofile list on June 25, subject heading "[PATCH -- v2] Handle bad samples from kernel due to overflows", that is intended to address this issue. I would appreciate any testing that could be done, especially from those who can reproduce the kinds of errors that are described in this bug. Comments are welcome. ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2009-03-25 12:32 Message: First, we need to verify that oprofiled is being built correctly with frame pointers. This might just be an RPM build problem. If you rebuild current CVS with the default flags, and retest, does it still occur? Remember to rm -rf /var/lib/oprofile/ It's not possible for someone who cannot reproduce this problem to fix it. My comment stands and describes a fix - we need an interested hacker to fix this problem *properly*. ---------------------------------------------------------------------- Comment By: Brian Barnes (bcbarnes) Date: 2009-03-24 21:54 Message: I am running into this bug while using Fedora 10. Linux hackworth 2.6.27.19-170.2.35.fc10.x86_64 #1 SMP Mon Feb 23 13:00:23 EST 2009 x86_64 x86_64 x86_64 GNU/Linux Fedora release 10 (Cambridge) oprofile-0.9.4-3.fc10.x86_64 oprofile-gui-0.9.4-3.fc10.x86_64 There is a Fedora bugzilla entry at https://bugzilla.redhat.com/show_bug.cgi?id=465996 Fedora would prefer upstream to apply the known patch, since the fix is here. Can anyone apply the fix? It has been known for ~2 years now? I hope some oprofile maintainer is still reading this :) ---------------------------------------------------------------------- Comment By: John Levon (movement) Date: 2008-10-13 09:39 Message: Try this workaround: find /var/lib/oprofile -name '*/var/lib/oprofile*' | xargs rm -rf This issue must be root-caused by a willing investigator. It's not OK to just paper over the cracks. Does anyone have an example of this failure where oprofiled itself isn't implicated, like phil_e suggests? The last attempt to fix this oprofiled issue, as it seems to be, was this commit: 2004-02-23 Philippe Elie <ph...@wa...> * daemon/Makefile.am: * daemon/liblegacy/Makefile.am: force daemon build with frame pointer. If daemon is built w/o frame pointer we get random arc, some of them go in the vma used by an existing mmaped sample file, so cookie lookup return a sample filename which is used as a from or to component, this break pptools by creating invalid filename. Note this doesn't protect agaisnt a make CFLAGS=-fomit-frame-pointer. Sadly, there's no mailing list discussion from that time, and we don't have IRC logs. Clearly, this fix isn't really sufficient. Nor can we simply escape the filenames because we then still have the same problem, no matter how escaped we become. We can't easily detect this exact situation. One possibility, and possibly the cleanest, is to make oprofiled ignore anything that would attempt to use a sample file in session_dir. So whenever the main image, the app image or to/from begin with the string sample_dir, we must never open a sample file. opd_sfile.c:get_file() might be a good place for this, you would do a check for this, and not call opd_open_sample_file() if so. You'd have to mark the sf as ignored for this reason and probably keep statistics. ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2008-10-12 05:43 Message: Wow, what a massive time-waster. It's a shame that this kind of fragileness causes this great tool, which is otherwise extremely useful, to be completely useless, at least for callgraph profiling on x86_64. I agree with bgamari's sentiments. I can confirm that (1) this bug also affects openSUSE 11.0 on x86_64 (kernel 2.6.25.16-0.1-default, fwiw) when using vanilla oprofile-0.9.4 and that (2) rutt's patch (scroll down a bit) fixes it, or at least works around the problem. My thanks to rutt for that. The patch doesn't apply cleanly to 0.9.4 any more, so here's an updated version of it. Julian Seward --- ORIG/oprofile-0.9.4/libpp/profile_spec.cpp 2008-05-20 17:04:25.000000000 +0200 +++ oprofile-0.9.4/libpp/profile_spec.cpp 2008-10-12 14:16:54.000000000 +0200 @@ -376,14 +376,21 @@ if (is_jit_sample(sub)) return false; - filename_spec file_spec(filename, spec.extra_found_images); - if (spec.match(file_spec)) { - if (exclude_dependent && file_spec.is_dependent()) - return false; - return true; - } - - return false; + try + { + filename_spec file_spec(filename, spec.extra_found_images); + if (spec.match(file_spec)) { + if (exclude_dependent && file_spec.is_dependent()) + return false; + return true; + } + return false; + } + catch(invalid_argument arg) + { + cerr << "Caught invalid argument for" << filename << std::endl; + return false; + } } } // anonymous namespace ---------------------------------------------------------------------- Comment By: Ben Gamari (bgamari) Date: 2008-10-07 10:42 Message: My x86-64 machine running kernel 2.6.27-rc8 is also exhibiting this behavior. I have submitting a Fedora bug (https://bugzilla.redhat.com/show_bug.cgi?id=465996) since there doesn't seem to be any activity to speak of here. Can someone please merge rutt's patch so those of us who actually need to _use_ their profiler can move on with our lives? I understand the desire to track down the underlying kernel issue but the patch seems like a pretty common-sense precaution and this bug is rendering oprofile entirely useless. ---------------------------------------------------------------------- Comment By: Ani (anirbans) Date: 2008-02-20 08:28 Message: Logged In: YES user_id=1963853 Originator: NO Any updates on this bug please? We are using a mixed 64 bit kernel running 32 bit apps on our mips systems and the callgraph feature appears to be broken. ---------------------------------------------------------------------- Comment By: John Ruttenberg (rutt) Date: 2007-04-26 12:56 Message: Logged In: YES user_id=113232 Originator: NO I spoke too soon. Can't figure out how to append a file. So here is the patch: diff -dur /u/jruttenberg/tmp/junk/oprofile-0.9.2/libpp/profile_spec.cpp ./libpp/profile_spec.cpp --- /u/jruttenberg/tmp/junk/oprofile-0.9.2/libpp/profile_spec.cpp 2005-08-07 07:15:51.000000000 -0400 +++ ./libpp/profile_spec.cpp 2007-04-26 11:55:08.234052000 -0400 @@ -357,14 +357,21 @@ if (!is_prefix(sub, "/{root}/") && !is_prefix(sub, "/{kern}/")) return false; - filename_spec file_spec(filename); - if (spec.match(file_spec)) { - if (exclude_dependent && file_spec.is_dependent()) - return false; - return true; + try + { + filename_spec file_spec(filename); + if (spec.match(file_spec)) { + if (exclude_dependent && file_spec.is_dependent()) + return false; + return true; + } + return false; + } + catch(invalid_argument arg) + { + cerr << "Caught invalid argument for" << filename << std::endl; + return false; } - - return false; } } // anonymous namespace ---------------------------------------------------------------------- Comment By: John Ruttenberg (rutt) Date: 2007-04-26 12:51 Message: Logged In: YES user_id=113232 Originator: NO I'm appending a patch which is a not very perfect workaround which at least allows me to print a call graph profile for the problems case. This is actually the first C++ program I've ever worked with were I was really glad for the use of exceptions as an error mechanism. I just caught the one that parse_filename throws at the appropriate place. Since this file was likely garbage, it just proceeded and made what looks like a reasonable profile. It would still be good to track down the kernel problem which is responsible for the bad cookies, but this seems like not a terrible filter. Another approach would be to filter these in oprofiled at the point where the bad cookies are returned from the kernel. If they point into the oprofile sample directory and have the {...} components, we know they can't be right. ---------------------------------------------------------------------- Comment By: Jason Kraftcheck (kraftcheck) Date: 2007-04-26 12:04 Message: Logged In: YES user_id=665142 Originator: NO I'm encountering this bug also. I am using an x86_86 build of oprofiled to profile x86_64 executables and am not using NFS for anything. Also, I noticed that the time required before these invalid paths start showing up in the samples directory increases by a lot if oprofiled is compiled w/out optimization. ---------------------------------------------------------------------- Comment By: Sami Farin (safari) Date: 2007-04-26 08:38 Message: Logged In: YES user_id=13022 Originator: YES No NFS involved. When I debug some program/library, they get compiled with -ggdb3 -fno-omit-frame-pointer -O0. Normally I compile with -O2 -g0 -fomit-frame-pointer ... . ---------------------------------------------------------------------- Comment By: John Ruttenberg (rutt) Date: 2007-04-26 08:25 Message: Logged In: YES user_id=113232 Originator: NO Phil, actually run was of a program unrelated to oprofile or the daemon itself. The fact that "oprofiled" ended up in the cg field is the result of some sort of bug in either the kernel or oprofiled. (See below.) There are a few facts which I realized might be relevant: 1. The OS is x86-64, but the programs involved are 32 bit. 2. Some of the programs and/or their libraries might reside on NFS mounted filesystems. I did some more debugging work. I added logging for create_cookie like this: fedora-1:daemon$ diff -du opd_cookie.c ~/tmp/oprofile-0.9.2/daemon --- opd_cookie.c 2005-10-18 19:28:26.000000000 -0400 +++ /u/jruttenberg/tmp/oprofile-0.9.2/daemon/opd_cookie.c 2007-04-26 11:15:37.211232000 -0400 @@ -114,6 +114,8 @@ entry->ignored = 0; } else { entry->ignored = is_image_ignored(entry->name); + if (! entry->ignored) + printf("cc: dcookie: entry->name %s\n", entry->name); } return entry; and this ended up in the oprofiled.log: cc: dcookie: entry->name /var/lib/oprofile/samples/current/{root}/var/lib/oprofile/samples/current/{root}/var/tmp/install/bin/tile-sim/{dep}/{root}/var/tmp/install/bin/tile-sim/GLOBAL_POWER_EVENTS.100000.1.all.all.all/{dep}/{root}/var/lib/oprofile/samples/current/{root}/var/tmp/install/bin/tile-sim/{dep}/{root}/var/tmp/install/bin/tile-sim/GLOBAL_POWER_EVENTS.100000.1.all.all.all/{cg}/{root}/usr/local/bin/oprofiled/GLOBAL_POWER_EVENTS.100000.1.all.all.all which seems quite wrong. I'm liking the idea about the frame pointer. All our code is compiled with -g, but perhaps some of the libraries are not (they are mostly stripped in any case.) ---------------------------------------------------------------------- Comment By: Philippe Elie (phil_e) Date: 2007-04-25 19:47 Message: Logged In: YES user_id=318973 Originator: NO Rutt, I'm unsure about what you're talking, perhaps a different problem, can you reproduce Safari's problem with a sample file unrelated to the damon itself ? For the problem reportd by Safari I suspect than some binary used by the daemon (shared libs) are compiled w/o frame pointer. During the stack trace the driver return some random invalid EIP. Some of these EIP point into one sample file (sample file are mmaped by the daemon), so the name of a sample files is used as a compoment of the new sample file which completly break the name demangling. Look how the sample filename /var/lib/oprofile/samples/current/{root}/usr/bin/Xorg/... is repeated inside the complete sample filename. I increased the priority of this bug but I've no idea how to fix it in a clean way. ---------------------------------------------------------------------- Comment By: John Ruttenberg (rutt) Date: 2007-04-25 17:49 Message: Logged In: YES user_id=113232 Originator: NO Still a little more work and I'm starting to form a theory. Safari, was there any NFS involved in your esample? Where any of the programs you ran or libraries they used on NFS file systems? ---------------------------------------------------------------------- Comment By: John Ruttenberg (rutt) Date: 2007-04-25 09:54 Message: Logged In: YES user_id=113232 Originator: NO Perhaps someone isn't null terminating? ---------------------------------------------------------------------- Comment By: John Ruttenberg (rutt) Date: 2007-04-25 09:53 Message: Logged In: YES user_id=113232 Originator: NO Added some logging to op_mangle_filename to see what the daemon was thinking when it made these files. In particular, I had it print out the three image, dep, and cg fields of the mangle_values input. Here is an example I got: mf: image_name /var/lib/oprofile/samples/current/{root}/home/scratch/jruttenberg/main2/install/bin/tile-sim/{dep}/{anon}/25234.0xf7d04000.0xf7da8000/{cg}/{root}/home/scratch/jruttenberg/main2/install/bin/tile-sim/GLOBAL_POWER_EVENTS.100000.1.all.all.all mf: dep_name /var/lib/oprofile/samples/current/{root}/home/scratch/jruttenberg/main2/install/bin/tile-sim/{dep}/{anon}/25234.0xf7d04000.0xf7da8000/{cg}/{root}/home/scratch/jruttenberg/main2/install/bin/tile-sim/GLOBAL_POWER_EVENTS.100000.1.all.all.all mf: cg_image_name /usr/local/bin/oprofiled How can this be right? ---------------------------------------------------------------------- Comment By: John Ruttenberg (rutt) Date: 2007-04-25 09:21 Message: Logged In: YES user_id=113232 Originator: NO I've done a little work on this and desire a little guidance. It seems that the given filename really is invalid (at least according to parse_filename. If I'm reading it correctly the filename must be in three parts: 1. The base spec which should be the path to the samples directory. It is terminated by {root} or {kern}. 2. The image filename, which I think should name the actual program which was invoked. It is terminated by {dep} UNLESS it starts with {anon} in which case it's just one component. 3. The dependent filename, whatever that is. It is terminated by {cg} 4. A call graph filename Here we have: 1. Base = /var/lib/oprofile/samples/current 2. Image = /var/lib/oprofile/samples/current/{root}/usr/bin/Xorg 3. Dep = {anon}/12410.0xeb2000.0xeb3000/GLOBAL_POWER_EVENTS.45000.1.all.all.all But now instead of finding {cg} as we might expect, we have a new {dep} filename. This seems wrong. So I looked at the daemon to try to figure out what's going on there. The function op_mangle_filename is getting this kind of mess passed to it in its mangle_values parameter. I'll investigate further, but come guidance is appreciated. There are references in the sources like PP:1 which seem like formal specs for these files. Where is this document? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=116191&aid=1685267&group_id=16191 |