#189 Recurrence of bug 1930788

closed-fixed
None
5
2011-08-12
2010-03-11
Oliver Schneider
No

The symptoms are identical, the version is 0.9.6 (I used the source package and compiled on Debian Lenny with all updates installed).
I'm currently trying to track it down in GDB, but it seems to be a hard nut to crack, since the reason appears to be a C++ exception "somewhere".

NB: bug 1930788 was closed, and there doesn't seem to be a way to reopen it as the bug reporter. That is probably only allowed for tracker admins.

Discussion

  • Okay, the problem seems to come from the handle_options call on line 500 in opreport.cpp.

    btw: I'm running it with "opreport -l" ...

     
  • A problem seems to be that GDB 6.8 gets derailed by the exception somehow.

    I'm unable to proceed until after profile_spec::generate_file_list() as called on line 256 in opreport_options.cpp and when trying to set a bpx there everything goes foobar.

     
  • This is unlikely to be the exact same problem as bug #1930788. I assume you're seeing the same high-level symptom of "opreport error: basic_string::erase" -- correct? If you do 'opcontrol --reset' and re-run the profile, are you still seeing this problem? If so, please do 'opreport -l --verbose=all' and direct the output to a file. Look into the file to see the last messages printed before the error. Hopefully, that will give us something to go on since gdb doesn't seem to help.

     
  • Hi there,

    thanks for your reply. Reset will remove all the current sample files, right? So it'd be wise to create a backup in case the reset removes the symptom. After all, the program probably shouldn't crash like that.

    Anyhow I'm getting pretty close in GDB. I was able to set a bpx on basic_string.h:1133 (which is the erase member function) and will see whether I get a usable stack backtrace this time.

     
  • Hi again,

    here's when the crash happens ;) ... parse_filename.cpp:

    77 /// Handle an anon region. Pretty print the details.
    78 /// The second argument is the anon portion of the path which will
    79 /// contain extra details such as the anon region name (unknown, vdso, heap etc.)
    80 string const parse_anon(string const & str, string const & str2)
    81 {
    82 string name = str2;
    83 // Get rid of "{anon:
    84 name.erase(0, 6);
    85 // Get rid of the trailing '}'
    86 name.erase(name.size() - 1, 1);
    87 vector<string> parts = separate_token(str, '.');
    88 if (parts.size() != 3)
    89 throw invalid_argument("parse_anon() invalid name: " + str);
    ...

    From what I see the problem is this:
    name.erase(name.size() - 1, 1);
    GDB allows calling functions inside the running program, and a call to name.size() at that point gave me a return value of 0, so the call was effectively name.erase(-1, 1) ... it never reached the next line (87) after that.

    I did have an older version of oprofile running before, but it gave me the same error. The files *could* be remnants from that time. However, I think that the size of the string should be checked after the erase() call on line 84 and before calling it again on line 86.

    I'll try to attach a file that details what I was doing in GDB.

    // Oliver
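
The failure mode described above can be reproduced outside of oprofile. The following is a minimal sketch, not the committed patch: `strip_anon_wrapper` and `unguarded_erase_throws` are hypothetical names introduced only for illustration, and the assumption is that the anon component is expected to look like `{anon:NAME}`. Because `size()` returns an unsigned type, `size() - 1` on an empty string wraps around to `std::string::npos`, and `erase()` then throws `std::out_of_range`.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper, not oprofile code: strips the "{anon:" prefix and the
// trailing '}' from an anon path component, validating the shape first.
std::string strip_anon_wrapper(std::string const & component)
{
    static std::string const prefix = "{anon:";

    // Require the prefix, the closing brace, and room for both.
    if (component.size() < prefix.size() + 1
        || component.compare(0, prefix.size(), prefix) != 0
        || component[component.size() - 1] != '}')
        throw std::invalid_argument("malformed anon component: " + component);

    // Safe now: size() >= 7, so neither subtraction can wrap around.
    return component.substr(prefix.size(),
                            component.size() - prefix.size() - 1);
}

// Shows what the unguarded call does on an empty string: size() - 1 wraps
// to std::string::npos, and erase() throws std::out_of_range.
bool unguarded_erase_throws()
{
    std::string name;  // empty, as observed in GDB
    try {
        name.erase(name.size() - 1, 1);
    } catch (std::out_of_range const &) {
        return true;
    }
    return false;
}
```

Checking the length before subtracting turns the opaque string exception into an error that names the offending input, which is roughly what a size check between lines 84 and 86 would achieve.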

     
  • The GDB run which helped track it down ...

     
  • Yeah well, so much for nice formatting. The indentation in my previous post got swallowed, apparently.

     
  • In the source tree of 0.9.6 as in the tar.gz I ran this

    grep -n -R '\.size\(\)' ./*|grep '-'|grep --color '\.size\(\)'

    to see further potential candidates prone to the same issue. I didn't check the context of each of them, though. However, this one also looks a bit suspicious:

    ./libutil++/string_manip.cpp:132: formatted.erase(formatted.size() - 1);

    // Oliver

     
  • BTW: the "opcontrol --reset" removed whatever the offending item in the samples folder was. I still have the original contents in a .tgz file, just in case.

     
  • Please attach the tar ball containing the offending sample data. You've done all the hard work already -- feel free to attach a patch. Just include the Signed-off-by line, as described at http://oprofile.sourceforge.net/contribute/. Thanks!

     
    • assigned_to: nobody --> maynardj
     
  • Hi there, sure, can do. Do you have a public GPG/PGP key against which I can encrypt it? The data included there is not from open-source development, so it is confidential.

    // Oliver

     
  • -----BEGIN PGP SIGNED MESSAGE-----
    Hash: SHA512

    Oh, that was silly of me. Here is the ID and fingerprint of my key:

    ID: 0x0E88590F
    FP: 38B5 5EBA A470 C0F7 0942 81B8 C779 D829 0E88 590F

    // Oliver

    -----BEGIN PGP SIGNATURE-----
    Version: PGP Desktop 10.0.0
    Charset: utf-8

    wsFVAwUBS5rjL8d52CkOiFkPAQp4XA//VZ8eVYjVkki0MPcRLQdXoFxS6P2rCLrr
    vS5ehCuPOtu08MnE0pd73PuSY2MWHuc5EjMAJ7mcimAnaj/oCQqpZMNM2Eb9utkb
    2QgYMLOc1VTw/VQEh547XXc+h9EipkVTfQqKy783caid9yTV5tcb6ndSbgdm7go2
    beMxBy3rTCzhyPRtlbKmA9pgPUDFAJsrigexA9bWdS7DH8jzb1kreDr3HaP7imHB
    PKmS3oMSZ0IZrP+sNmY1ItXuov/5Wmn9ew6cDZfnc2W0aUWNZTOlB4Jx0u7WmOnC
    POVR1+nuPMrtadIsLUYyRXWuDVESrGzSTq4bLYGbT1EX+/vVBWQ4vRCv8NgixgmK
    HQkIRiEOzLqAdr6+VUJwNB9lGXTkczromqun28jpaqT26sELFvy3WloJwe6m/TsJ
    LA0JALXzl3dfN6Cis+S0AArBKMkCTsDy781t1UQjpow6EMxZP/vykUM98diVCkT+
    1yVKIu3XpIshn8DKU1HzBz22Y+HqjseXXAnv4PcPW0X/4GkSD7lIAGU+ZtkEHRrn
    08Z7b7cyITbb/uT8elEGFIMql/SFtjkFd+tj1Oo20xx53r6ufb2KykDXVhJ8hX8I
    F6ach5lmZqOqncNigu/8C+7N26YFQ6uwXIUOBdVEgSjm4gwzjhZs8C6jpvBVujfW
    wuv8Q9gB/sc=
    =DN9I
    -----END PGP SIGNATURE-----

     
  • The folder names that were causing the problem, including a few samples, as they were present when the problem surfaced.

     
  • The patch for the issue

     
  • The form of sample file causing the problem should never exist, according to how I read the code in daemon/opd_anon.c:get_anon_maps. Here's an example of one of your problematic sample files:
    \{root\}/usr/bin/python2.5/{dep}/{anon:/lib/ld-2.7.so}/19902.0xb7f4d000.0xb7f67000

    Pruning off the uninteresting parts . . .
    <blah>/{anon:/lib/ld-2.7.so}/<blah>

    The current code in get_anon_maps prevents adding anon map objects having such a form, so of course, we'd never have an actual sample file created either. In older oprofile code (e.g. 0.9.3), we just listed the contents of /proc/<pid>/maps and blindly created an anon mapping for everything, even though most entries were mmap'ed binary code files (libraries plus executable). So anon mappings of the above form were expected. However, we counted on the fact that samples against a binary code file would have a non-null cookie and would then be attributed to its corresponding sfile. If the cookie value associated with the sample was null, we assumed it was an anon sample and we called find_anon_mapping. Presumably, this function would find the "valid" anon mapping whose address range encompassed the sample. Then you get real sample files written to disk *ONLY* for anon mappings that had samples attributed to them. The fact that you're getting real sample files of the above form tells me that either the sample data was corrupted or there was a bug in the oprofile daemon code as to how it was attributing the samples.

    You mentioned that perhaps the sample files where you ran into the opreport fault may have been from some earlier oprofile version. I believe that must be the case for reasons stated above. With this said, I think your patch is *partially* OK in that it does no harm and prevents opreport from crashing. But it does hide the fact that your sample data is suspect. I will commit your fix, but probably add a warning message along with it.

     
  • Sorry for the delay. Yes, indeed. The files/folders were likely created by an older version. But this also creates a conceivable scenario in which others can encounter the same problem.

    Oh and since I forgot that: thanks a bunch for writing oprofile and making it available ;-)

     
  • Oliver, one minor thing before I push this patch into CVS . . . as I mentioned earlier, our project policy is to get a Signed-off-by line from each patch contributor. That also gives me your email address, which we put into the ChangeLog. Can you provide that please? Thanks.

     
  • Hi, I actually already did that before your comment from 2010-03-17 22:07. Since it disappeared, I thought you had removed it because it contained my email address in plain text, in order to prevent spam bots from harvesting it. So you want that again? Actually, the comment back then contained a bit more than just the Signed-off-by line.

    // Oliver

     
  • The patch was committed to CVS on Mar 25, so I'll set this bug to FIXED.

     
    • status: open --> open-fixed
     
    • status: open-fixed --> closed-fixed