From: SourceForge.net <no...@so...> - 2010-03-24 17:00:05
Bugs item #2968895, was opened at 2010-03-11 20:29
Message generated for change (Comment added) made by assarbad
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=116191&aid=2968895&group_id=16191

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Oliver Schneider (assarbad)
Assigned to: Maynard Johnson (maynardj)
Summary: Recurrence of bug 1930788

Initial Comment:
The symptoms are identical, the version is 0.9.6 (I used the source package and compiled on Debian Lenny with all updates installed). I'm currently trying to track it down in GDB, but it seems to be a hard nut to crack, since the reason appears to be a C++ exception "somewhere".

NB: bug 1930788 was closed and there doesn't seem to be a method to open it as bug reporter. Probably that is only allowed for tracker admins.

----------------------------------------------------------------------

>Comment By: Oliver Schneider (assarbad)
Date: 2010-03-24 17:00

Message:
Hi,

I actually already did that before your comment from 2010-03-17 22:07 and since it did disappear I thought you removed it because it contained my email address in plain text in order to prevent spam bots from harvesting it.

So you want that again? Actually the comment back then contained a bit more than just the signed-off line.

// Oliver

----------------------------------------------------------------------

Comment By: Maynard Johnson (maynardj)
Date: 2010-03-24 16:25

Message:
Oliver, one minor thing before I push this patch into CVS . . . as I mentioned earlier, our project policy is to get a Signed-off-by line from each patch contributor. That also gives me your email address, which we put into the ChangeLog. Can you provide that please? Thanks.
----------------------------------------------------------------------

Comment By: Oliver Schneider (assarbad)
Date: 2010-03-20 18:31

Message:
Sorry for the delay. Yes, indeed. The files/folders were likely created by an older version. But this also creates a conceivable scenario in which others can encounter the same problem.

Oh and since I forgot that: thanks a bunch for writing oprofile and making it available ;-)

----------------------------------------------------------------------

Comment By: Maynard Johnson (maynardj)
Date: 2010-03-17 22:07

Message:
The form of sample file causing the problem should never exist, according to how I read the code in daemon/opd_anon.c:get_anon_maps. Here's an example of one of your problematic sample files:
{root}/usr/bin/python2.5/{dep}/{anon:/lib/ld-2.7.so}/19902.0xb7f4d000.0xb7f67000

Pruning off the uninteresting parts . . .
<blah>/{anon:/lib/ld-2.7.so}/<blah>

The current code in get_anon_maps prevents adding anon map objects having such a form, so of course, we'd never have an actual sample file created either.

In older oprofile code (e.g. 0.9.3), we just listed the contents of /proc/<pid>/maps and blindly created an anon mapping for everything, even though most entries were mmap'ed binary code files (libraries plus executable). So anon mappings of the above form were expected. However, we counted on the fact that samples against a binary code file would have a non-null cookie and would then be attributed to its corresponding sfile. If the cookie value associated with the sample was null, we assumed it was an anon sample and we called find_anon_mapping. Presumably, this function would find the "valid" anon mapping whose address range encompassed the sample. Then you get real sample files written to disk *ONLY* for anon mappings that had samples attributed to them.
The fact that you're getting real sample files of the above form tells me that either the sample data was corrupted or there was a bug in the oprofile daemon code as to how it was attributing the samples.

You mentioned that perhaps the sample files where you ran into the opreport fault may have been from some earlier oprofile version. I believe that must be the case for reasons stated above.

With this said, I think your patch is *partially* OK in that it does no harm and prevents opreport from crashing. But it does hide the fact that your sample data is suspect. I will commit your fix, but probably add a warning message along with it.

----------------------------------------------------------------------

Comment By: Oliver Schneider (assarbad)
Date: 2010-03-13 00:58

Message:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Oh, that was silly of me. Here is the ID and fingerprint of my key:

ID: 0x0E88590F
FP: 38B5 5EBA A470 C0F7 0942 81B8 C779 D829 0E88 590F

// Oliver

-----BEGIN PGP SIGNATURE-----
Version: PGP Desktop 10.0.0
Charset: utf-8

wsFVAwUBS5rjL8d52CkOiFkPAQp4XA//VZ8eVYjVkki0MPcRLQdXoFxS6P2rCLrr
vS5ehCuPOtu08MnE0pd73PuSY2MWHuc5EjMAJ7mcimAnaj/oCQqpZMNM2Eb9utkb
2QgYMLOc1VTw/VQEh547XXc+h9EipkVTfQqKy783caid9yTV5tcb6ndSbgdm7go2
beMxBy3rTCzhyPRtlbKmA9pgPUDFAJsrigexA9bWdS7DH8jzb1kreDr3HaP7imHB
PKmS3oMSZ0IZrP+sNmY1ItXuov/5Wmn9ew6cDZfnc2W0aUWNZTOlB4Jx0u7WmOnC
POVR1+nuPMrtadIsLUYyRXWuDVESrGzSTq4bLYGbT1EX+/vVBWQ4vRCv8NgixgmK
HQkIRiEOzLqAdr6+VUJwNB9lGXTkczromqun28jpaqT26sELFvy3WloJwe6m/TsJ
LA0JALXzl3dfN6Cis+S0AArBKMkCTsDy781t1UQjpow6EMxZP/vykUM98diVCkT+
1yVKIu3XpIshn8DKU1HzBz22Y+HqjseXXAnv4PcPW0X/4GkSD7lIAGU+ZtkEHRrn
08Z7b7cyITbb/uT8elEGFIMql/SFtjkFd+tj1Oo20xx53r6ufb2KykDXVhJ8hX8I
F6ach5lmZqOqncNigu/8C+7N26YFQ6uwXIUOBdVEgSjm4gwzjhZs8C6jpvBVujfW
wuv8Q9gB/sc=
=DN9I
-----END PGP SIGNATURE-----

----------------------------------------------------------------------

Comment By: Oliver Schneider (assarbad)
Date: 2010-03-13 00:50

Message:
Hi there,

sure can do, do you have a
public GPG/PGP key against which I can encrypt it? The data included there is not actually from OpenSource development, so it is confidential.

// Oliver

----------------------------------------------------------------------

Comment By: Maynard Johnson (maynardj)
Date: 2010-03-12 18:43

Message:
Please attach the tar ball containing the offending sample data. You've done all the hard work already -- feel free to attach a patch. Just include the Signed-off-by line, as described at http://oprofile.sourceforge.net/contribute/. Thanks!

----------------------------------------------------------------------

Comment By: Oliver Schneider (assarbad)
Date: 2010-03-11 23:06

Message:
BTW: the "opcontrol --reset" removed whatever the offending item in the samples folder was. I still have the original contents in a .tgz file, just in case.

----------------------------------------------------------------------

Comment By: Oliver Schneider (assarbad)
Date: 2010-03-11 23:00

Message:
In the source tree of 0.9.6 as in the tar.gz I ran this

grep -n -R '\.size\(\)' ./*|grep '-'|grep --color '\.size\(\)'

to see further potential candidates prone to the same issue. I didn't check the context of each of them, though. However, this one also looks a bit suspicious:

./libutil++/string_manip.cpp:132: formatted.erase(formatted.size() - 1);

// Oliver

----------------------------------------------------------------------

Comment By: Oliver Schneider (assarbad)
Date: 2010-03-11 22:55

Message:
Yeah well, so much for nice formatting. The indentation in my previous post got swallowed, apparently.

----------------------------------------------------------------------

Comment By: Oliver Schneider (assarbad)
Date: 2010-03-11 22:52

Message:
Hi again,

here's when the crash happens ;) ...

parse_filename.cpp:
77 /// Handle an anon region. Pretty print the details.
78 /// The second argument is the anon portion of the path which will
79 /// contain extra details such as the anon region name (unknown, vdso, heap etc.)
80 string const parse_anon(string const & str, string const & str2)
81 {
82 string name = str2;
83 // Get rid of "{anon:
84 name.erase(0, 6);
85 // Get rid of the trailing '}'
86 name.erase(name.size() - 1, 1);
87 vector<string> parts = separate_token(str, '.');
88 if (parts.size() != 3)
89 throw invalid_argument("parse_anon() invalid name: " + str);
...

From what I see the problem is this:

name.erase(name.size() - 1, 1);

GDB allows calling functions inside the running program, and a call to name.size() at that point gave me a return value of 0, so the call was name.erase(-1, 1) ... it never reached the next line (87) after that.

I did have an older version of oprofile running before, but it gave me the same error. The files *could* be remnants from that time. However, I think that the size of the string should be checked after the erase() call on line 84 and before calling it again on line 86.

I'll try to attach a file that details what I was doing in GDB.

// Oliver

----------------------------------------------------------------------

Comment By: Oliver Schneider (assarbad)
Date: 2010-03-11 22:23

Message:
Hi there,

thanks for your reply. Reset will remove all the current sample files, right? So it'd be wise to create a backup in case the reset removes the symptom. After all, the program probably shouldn't crash like that.

Anyhow I'm getting pretty close in GDB. I was able to set a bpx on basic_string.h:1133 (which is the erase member function) and will see whether I get a usable stack backtrace this time.

----------------------------------------------------------------------

Comment By: Maynard Johnson (maynardj)
Date: 2010-03-11 21:44

Message:
This is unlikely to be the exact same problem as bug #1930788. I assume you're seeing the same high-level symptom of "opreport error: basic_string::erase" -- correct?
If you do 'opcontrol --reset' and re-run the profile, are you still seeing this problem? If so, please do 'opreport -l --verbose=all' and direct the output to a file. Look into the file to see the last messages printed before the error. Hopefully, that will give us something to go on since gdb doesn't seem to help.

----------------------------------------------------------------------

Comment By: Oliver Schneider (assarbad)
Date: 2010-03-11 21:08

Message:
A problem seems to be that GDB 6.8 gets derailed by the exception somehow. I'm unable to proceed until after profile_spec::generate_file_list() as called on line 256 in opreport_options.cpp and when trying to set a bpx there everything goes foobar.

----------------------------------------------------------------------

Comment By: Oliver Schneider (assarbad)
Date: 2010-03-11 20:39

Message:
Okay, the problem seems to come from the handle_options call on line 500 in opreport.cpp.

btw: I'm running it with "opreport -l" ...

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=116191&aid=2968895&group_id=16191
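
For readers following the thread, here is a minimal sketch of the failure described in the 2010-03-11 22:52 comment and of the size check suggested there. It is not the patch that was committed to oprofile, and the function names strip_anon_wrapper, unguarded_erase_throws and check_examples are hypothetical, chosen only for this illustration. The point it shows: on an empty std::string, size() - 1 wraps around to a huge unsigned value, so erase() is called with a position past the end and throws std::out_of_range, which would fit the "basic_string::erase" symptom. The line flagged in string_manip.cpp uses the same size() - 1 pattern; whether it can actually see an empty string was not checked in the thread.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper (not oprofile code): mirrors lines 82-86 of the
// parse_anon() excerpt, but checks the size before the second erase().
std::string strip_anon_wrapper(std::string const & component)
{
    std::string name = component;
    // Get rid of the leading "{anon:" (6 characters). erase() clamps the
    // count to the string length, so a short or empty input is safe here.
    name.erase(0, 6);
    // Only drop the trailing '}' if something is left; otherwise
    // name.size() - 1 would wrap around and erase() would throw.
    if (!name.empty())
        name.erase(name.size() - 1, 1);
    return name;
}

// Reproduces the unguarded pattern and reports whether it threw.
bool unguarded_erase_throws(std::string s)
{
    try {
        s.erase(s.size() - 1, 1);
    } catch (std::out_of_range const &) {
        return true;
    }
    return false;
}

// Small usage example; call it from main() to exercise both helpers.
void check_examples()
{
    assert(strip_anon_wrapper("{anon:/lib/ld-2.7.so}") == "/lib/ld-2.7.so");
    assert(strip_anon_wrapper("").empty());
    assert(unguarded_erase_throws(""));
    assert(!unguarded_erase_throws("}"));
}
```

A real fix would probably also want to report the malformed name, along the lines of the warning message mentioned in the 2010-03-17 comment, rather than silently returning an empty string as this sketch does.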