From: Maynard J. <may...@us...> - 2009-06-25 20:12:57
This is version 2 of a patch, originally posted on Jun 19.
=======================================================

As has been discussed on this list (and reported in bug http://sourceforge.net/tracker/?func=detail&aid=1685267&group_id=16191&atid=116191), the oprofile kernel driver has sometimes recorded incorrect data, leading to mis-attribution of samples, warnings about files that can't be found (because the filenames are bogus), and "invalid filename" errors that cause opreport to choke and die. It's clear that high sample rates (or high logging rates due to callgraph data collection) can open a timing window in which task switches are lost, so samples are attributed to the wrong task and binary file.

The attached patch does the following to address this problem:

1. Extremely high sampling rates can result in samples being mis-attributed to the oddest things, for example a sample file that oprofiled has open at the time. A cookie switch is then stored in the sample data stream coming from the kernel. When the daemon processes this cookie switch, it creates a sample file for the sample file! This craziness has the form:
     <session-dir>/samples/[blah]/<session-dir>/samples/[blah]
   When any of the post-processing tools are run, sample filenames are examined to determine whether they are of the above form; if so, we discard them and print a single message (not a message per file) warning the user that we found invalid sample files, which were probably caused by too high a sampling rate.

2. Implements a set of changes to keep overflow stats with the profile session data, including when oparchive is used to archive sample data. When any of the post-processing tools are run against the session data, the code added in this patch checks the overflow stats and prints a warning message if it finds any overflows.

Comments are welcome.

Thanks.
-Maynard
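
For illustration only, the filename check described in item 1 could look roughly like the sketch below. This is not the code from the patch: it is written in Python for brevity, and the function names, the session_dir argument, the wording of the warning, and the example paths are all assumptions made up for this sketch.

```python
import sys


def is_nested_sample_file(path, session_dir):
    """Return True if path has the bogus form
    <session-dir>/samples/[blah]/<session-dir>/samples/[blah].

    session_dir is the profile session directory (hypothetical argument).
    """
    prefix = session_dir.rstrip("/") + "/samples/"
    if not path.startswith(prefix):
        return False
    # A valid sample file mentions the samples directory only once, at the
    # start. If the same prefix shows up again later in the path, this is a
    # sample file that was created for another sample file.
    return prefix in path[len(prefix):]


def filter_sample_files(paths, session_dir):
    """Drop bogus sample files and warn once, not once per file."""
    valid = [p for p in paths if not is_nested_sample_file(p, session_dir)]
    if len(valid) != len(paths):
        # Hypothetical wording; the actual message in the patch may differ.
        print("warning: invalid sample files were found and ignored; "
              "this was probably caused by too high a sampling rate",
              file=sys.stderr)
    return valid
```

With a session directory of /var/lib/oprofile (again, just an example value), a path such as /var/lib/oprofile/samples/current/{root}/var/lib/oprofile/samples/current/foo would be discarded, while /var/lib/oprofile/samples/current/{root}/usr/bin/foo would be kept.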