From: Maynard J. <may...@us...> - 2013-04-19 22:24:34
|
My colleague, Mauricio (on cc), used operf to do system-wide profiling of the specjbb benchmark, and he identified several scenarios where operf was dropping samples or attributing them to incorrect applications. We worked together to debug the problems with operf and re-test it under heavy load. This was an iterative process, as many of the problems were not easily reproduced and difficult to debug. The culmination of several weeks of work to improve operf is the patch attached below. Please test and review it. Comments are very welcome. --------------------------------------------------------------------- Fix holes in operf system-wide profiling of forked processes Using operf to do system-wide profiling of the specjbb benchmark exposed some holes in how operf was processing the perf_events data coming from the kernel. Some of the events we can get from the kernel are: - PERF_RECORD_COMM - PERF_RECORD_FORK - PERF_RECORD_MMAP - PERF_RECORD_SAMPLE The "COMM" event is to notify us of the start of an executable application. The "FORK" event tells us when a process forks another process. The "MMAP" event informs us when a shared library (or executable anonymous memory, or the executable file itself, etc.) has been mmap'ed into a process's address space. A "SAMPLE" event occurs each time the kernel takes a sample for a process. There is no guarantee in what order these events may arrive from the kernel, and when a large system (say, 64 CPUs) is running the specjbb benchmark full bore, with all processors pegged to 100%, you can get some very strange out-of-order looking sequence of events. Things get even stranger when using Java7 versus Java6 since Java7 spawns many more threads. The operf code had several issues where such out-of-order events were not handled properly, so some major changes were required in the code. Signed-off-by: Maynard Johnson <may...@us...> --- libperf_events/operf_process_info.cpp | 340 +++++++++++++++++++++------------ libperf_events/operf_process_info.h | 68 +++---- libperf_events/operf_utils.cpp | 244 ++++++++++++------------ 3 files changed, 367 insertions(+), 285 deletions(-) diff --git a/libperf_events/operf_process_info.cpp b/libperf_events/operf_process_info.cpp index a5710cc..0cfc26b 100644 --- a/libperf_events/operf_process_info.cpp +++ b/libperf_events/operf_process_info.cpp @@ -9,12 +9,18 @@ * Created on: Dec 13, 2011 * @author Maynard Johnson * (C) Copyright IBM Corp. 2011 + * + * Modified by Maynard Johnson <may...@us...> + * (C) Copyright IBM Corporation 2013 + * */ #include <stdio.h> +#include <unistd.h> #include <iostream> #include <map> #include <string.h> +#include <errno.h> #include "operf_process_info.h" #include "file_manip.h" #include "operf_utils.h" @@ -22,23 +28,13 @@ using namespace std; using namespace OP_perf_utils; -operf_process_info::operf_process_info(pid_t tgid, const char * appname, bool app_arg_is_fullname, bool is_valid) -: pid(tgid), _appname(appname ? appname : ""), valid(is_valid) +operf_process_info::operf_process_info(pid_t tgid, const char * appname, + bool app_arg_is_fullname, bool is_valid) +: pid(tgid), valid(is_valid), appname_valid(false), forked(false), look_for_appname_match(false), + appname_is_fullname(NOT_FULLNAME), num_app_chars_matched(-1) { - if (app_arg_is_fullname && appname) { - appname_is_fullname = YES_FULLNAME; - app_basename = op_basename(appname); - num_app_chars_matched = (int)app_basename.length(); - } else if (appname) { - appname_is_fullname = MAYBE_FULLNAME; - num_app_chars_matched = -1; - app_basename = appname; - } else { - appname_is_fullname = NOT_FULLNAME; - num_app_chars_matched = -1; - app_basename = ""; - } - forked = false; + _appname = ""; + set_appname(appname, app_arg_is_fullname); parent_of_fork = NULL; } @@ -50,64 +46,96 @@ operf_process_info::~operf_process_info() if (valid) { it = mmappings.begin(); end = mmappings.end(); - } else { - it = deferred_mmappings.begin(); - end = deferred_mmappings.end(); } mmappings.clear(); - deferred_mmappings.clear(); } -void operf_process_info::process_new_mapping(struct operf_mmap * mapping) +/* A combination of non-null appname and app_arg_is_fullname==true may be passed + * from various locations. But a non-null appname and app_arg_is_fullname==false + * may only be passed as a result of a PERF_RECORD_COMM event. Thus, in such + * a situation, if we find that the readlink technique for finding the appname + * does not work (most likely, because the process has already ended), we must + * try other means to get an appname. + */ +void operf_process_info::set_appname(const char * appname, bool app_arg_is_fullname) { - // If we do not know the full pathname of our app yet, - // let's try to determine if the passed filename is a good - // candidate appname. + char exe_symlink[64]; + char exe_realpath[PATH_MAX]; + bool from_COMM_event = (appname && !app_arg_is_fullname); - if (!mapping->is_anon_mapping && (appname_is_fullname < YES_FULLNAME) && (num_app_chars_matched < (int)app_basename.length())) { - string basename; - int num_matched_chars = get_num_matching_chars(mapping->filename, basename); - if (num_matched_chars > num_app_chars_matched) { - appname_is_fullname = MAYBE_FULLNAME; - _appname = mapping->filename; - app_basename = basename; - num_app_chars_matched = num_matched_chars; - cverb << vmisc << "Best appname match is " << _appname << endl; + /* - If the stored _appname is already valid, then return + * - If the passed appname is NULL and the stored _appname is not empty, then + * return, since a non-empty _appname implies we've been through this + * function before (and would have tried the readlink method or, perhaps, + * fallen back to some other method to set the stored _appname), and the NULL + * appname argument is going to help here. + */ + if (appname_valid || (!appname && _appname.length())) + return; + + if (look_for_appname_match && !from_COMM_event) + return find_best_match_appname_all_mappings(); + + snprintf(exe_symlink, 64, "/proc/%d/exe", pid); + memset(exe_realpath, '\0', PATH_MAX); + + /* If the user is running a command via taskset, the kernel will send us a PERF_RECORD_COMM + * for both comm=taskset and comm=<user_command> for the same process ID !! + * The user will not be interested in taskset samples; thus, we ignore such COMM events. + * This is a hack, but there doesn't seem to be a better way around the possibility of having + * application samples attributed to "taskset" instead of the application. + */ + if (readlink(exe_symlink, exe_realpath, sizeof(exe_realpath)-1) > 0) { + _appname = exe_realpath; + app_basename = op_basename(_appname); + if (!strncmp(app_basename.c_str(), "taskset", strlen("taskset"))) { + _appname = "unknown"; + app_basename = "unknown"; + } else { + appname_valid = true; } + } else { + /* Most likely that the process has ended already, so we'll need to determine + * the appname through different means. + */ + if (cverb << vmisc) + cerr << "PID: " << hex << pid << " Unable to obtain appname from " << exe_symlink << endl + << "\t" << strerror(errno) << endl; + if (appname && strcmp(appname, "taskset")) { + _appname = appname; + if (app_arg_is_fullname) { + appname_valid = true; + } else { + look_for_appname_match = true; + } + } else { + _appname = "unknown"; + } + app_basename = _appname; } - mmappings[mapping->start_addr] = mapping; - vector<operf_process_info *>::iterator it = forked_processes.begin(); - while (it != forked_processes.end()) { - operf_process_info * p = *it; - p->copy_new_parent_mapping(mapping); - cverb << vmisc << "Copied new parent mapping for " << mapping->filename - << " for forked process " << p->pid << endl; - it++; - } - + cverb << vmisc << "PID: " << hex << pid << " appname is set to " + << _appname << endl; + if (look_for_appname_match) + find_best_match_appname_all_mappings(); } -/* This method should only be invoked when a "delayed" COMM event is processed. - * By "delayed", I mean that we have already received MMAP events for the associated - * process, for which we've had to create a partial operf_process_info object -- one - * that has no _appname yet and is marked invalid. - * - * Given the above statement, the passed app_shortname "must" come from a comm.comm - * field, which is 16 chars in length (thus the name of the arg). +/* This operf_process_info object may be a parent to processes that it has forked. + * If the forked process has not done an 'exec' yet (i.e., we've not received a + * COMM event for it), then it's still a dependent process of its parent. + * If so, it will be in the parent's collection of forked processes. So, + * when adding a new mapping, we should copy that mapping to each forked + * child's operf_process_info object. Then, if samples are taken for that + * mapping for that forked process, the samples can be correctly attributed. */ -void operf_process_info::process_deferred_mappings(string app_shortname) +void operf_process_info::process_mapping(struct operf_mmap * mapping, bool do_self) { - _appname = app_shortname; - app_basename = app_shortname; - valid = true; - map<u64, struct operf_mmap *>::iterator it = deferred_mmappings.begin(); - while (it != deferred_mmappings.end()) { - process_new_mapping(it->second); - cverb << vmisc << "Processed deferred mapping for " << it->second->filename << endl; - it++; + if (!appname_valid && !is_forked()) { + if (look_for_appname_match) + check_mapping_for_appname(mapping); + else + set_appname(NULL, false); } - deferred_mmappings.clear(); - process_deferred_forked_processes(); + set_new_mapping_recursive(mapping, do_self); } int operf_process_info::get_num_matching_chars(string mapped_filename, string & basename) @@ -141,6 +169,48 @@ int operf_process_info::get_num_matching_chars(string mapped_filename, string & return num_matched_chars ? num_matched_chars : -1; } +/* If we do not know the full pathname of our app yet, + * let's try to determine if the passed filename is a good + * candidate appname. + * ASSUMPTION: This function is called only when look_for_appname_match==true. + */ +void operf_process_info::check_mapping_for_appname(struct operf_mmap * mapping) +{ + if (!mapping->is_anon_mapping) { + string basename; + int num_matched_chars = get_num_matching_chars(mapping->filename, basename); + if (num_matched_chars > num_app_chars_matched) { + if (num_matched_chars == app_basename.length()) { + appname_is_fullname = YES_FULLNAME; + look_for_appname_match = false; + appname_valid = true; + } else { + appname_is_fullname = MAYBE_FULLNAME; + } + _appname = mapping->filename; + app_basename = basename; + num_app_chars_matched = num_matched_chars; + cverb << vmisc << "Best appname match is " << _appname << endl; + } + } +} + +void operf_process_info::find_best_match_appname_all_mappings(void) +{ + map<u64, struct operf_mmap *>::iterator it; + + // We may not even have a candidate shortname (from a COMM event) for the app yet + if (_appname == "unknown") + return; + + it = mmappings.begin(); + while (it != mmappings.end()) { + check_mapping_for_appname(it->second); + it++; + } + +} + const struct operf_mmap * operf_process_info::find_mapping_for_sample(u64 sample_addr) { map<u64, struct operf_mmap *>::iterator it = mmappings.begin(); @@ -178,23 +248,15 @@ void operf_process_info::process_hypervisor_mapping(u64 ip) map<u64, struct operf_mmap *>::iterator end; curr_end = curr_start = ~0ULL; - if (valid) { - it = mmappings.begin(); - end = mmappings.end(); - } else { - it = deferred_mmappings.begin(); - end = deferred_mmappings.end(); - } + it = mmappings.begin(); + end = mmappings.end(); while (it != end) { if (it->second->is_hypervisor) { struct operf_mmap * _mmap = it->second; curr_start = _mmap->start_addr; curr_end = _mmap->end_addr; if (curr_start > ip) { - if (valid) - mmappings.erase(it); - else - deferred_mmappings.erase(it); + mmappings.erase(it); delete _mmap; } else { create_new_hyperv_mmap = false; @@ -220,10 +282,7 @@ void operf_process_info::process_hypervisor_mapping(u64 ip) cout << "\tstart_addr: " << hex << hypervisor_mmap->start_addr; cout << "; end addr: " << hypervisor_mmap->end_addr << endl; } - if (valid) - process_new_mapping(hypervisor_mmap); - else - add_deferred_mapping(hypervisor_mmap); + process_mapping(hypervisor_mmap, false); } } @@ -236,51 +295,40 @@ void operf_process_info::copy_mappings_to_forked_process(operf_process_info * fo * original object is created in operf_utils:__handle_mmap_event and * is saved in the global all_images_map. */ - forked_pid->process_new_mapping(mapping); + forked_pid->process_mapping(mapping, true); it++; } } -void operf_process_info::connect_forked_process_to_parent(operf_process_info * parent) +void operf_process_info::set_fork_info(operf_process_info * parent) { forked = true; parent_of_fork = parent; - if (parent->is_valid()) { - valid = true; - _appname = parent->get_app_name(); - if (parent->is_appname_valid() && !_appname.empty()) { - appname_is_fullname = YES_FULLNAME; - app_basename = op_basename(_appname); - num_app_chars_matched = (int)app_basename.length(); - } else if (!_appname.empty()) { - appname_is_fullname = MAYBE_FULLNAME; - num_app_chars_matched = -1; - app_basename = _appname; - } else { - appname_is_fullname = NOT_FULLNAME; - num_app_chars_matched = -1; - app_basename = ""; - } - parent->copy_mappings_to_forked_process(this); - } + parent_of_fork->add_forked_pid_association(this); + parent_of_fork->copy_mappings_to_forked_process(this); } -void operf_process_info::process_deferred_forked_processes(void) +/* ASSUMPTION: This function should only be called during reprocessing phase + * since we blindly set the _appname to that of the parent. If this function + * were called from elsewhere, the parent's _appname might not yet be fully baked. + */ +void operf_process_info::connect_forked_process_to_parent(void) { - vector<operf_process_info *>::iterator it = forked_processes.begin(); - while (it != forked_processes.end()) { - operf_process_info * p = *it; - p->connect_forked_process_to_parent(this); - cverb << vmisc << "Processed deferred forked process " << p->pid << endl; - it++; - } + if (cverb << vmisc) + cout << "Connecting forked proc " << pid << " to parent " << parent_of_fork << endl; + valid = true; + _appname = parent_of_fork->get_app_name(); + app_basename = op_basename(_appname); + appname_valid = true; } + void operf_process_info::remove_forked_process(pid_t forked_pid) { std::vector<operf_process_info *>::iterator it = forked_processes.begin(); while (it != forked_processes.end()) { - if ((*it)->pid == forked_pid) { + operf_process_info * p = *it; + if (p->pid == forked_pid) { forked_processes.erase(it); break; } @@ -288,29 +336,75 @@ void operf_process_info::remove_forked_process(pid_t forked_pid) } } -/* This function is called as a result of the following scenario: - * 1. An operf_process_info was created for a FORK event - * 2. The forked process was connected to (associated with) its parent, - * adding the parent's mmappings to the forked process's operf_process_info. - * 3. Then the forked process does an exec, which results in a COMM - * event. The forked process is now considered completely separate - * from its parent, so we need to disassociate it from the parent. +/* See comment in operf_utils::__handle_comm_event for conditions under + * which this function is called. */ -void operf_process_info::disassociate_from_parent(char * app_shortname) +void operf_process_info::try_disassociate_from_parent(char * app_shortname) { - _appname = app_shortname; - app_basename = app_shortname; - appname_is_fullname = NOT_FULLNAME; + if (parent_of_fork && (parent_of_fork->pid == this->pid)) + return; + + if (cverb << vmisc && parent_of_fork) + cout << "Dis-associating forked proc " << pid + << " from parent " << parent_of_fork->pid << endl; + valid = true; - /* Now that we have a valid app shortname (from the COMM event data), - * let's spin through our mmappings and process them -- see if we can - * find one that has a good appname candidate. - */ - num_app_chars_matched = 0; + set_appname(app_shortname, false); + map<u64, struct operf_mmap *>::iterator it = mmappings.begin(); while (it != mmappings.end()) { - process_new_mapping(it->second); + operf_mmap * cur = it->second; + /* mmappings from the parent may have been added to this proc info prior + * to this proc info becoming valid since we could not know at the time if + * this proc would ever be valid. But now we know it's valid (which is why + * we're dis-associating from the parent), so we remove these unnecessary + * parent mmappings. + */ + if (mmappings_from_parent[cur->start_addr]) { + mmappings_from_parent[cur->start_addr] = false; + mmappings.erase(it); + } else { + process_mapping(cur, false); + } + it++; + } + if (parent_of_fork) { + parent_of_fork->remove_forked_process(this->pid); + parent_of_fork = NULL; + } + forked = false; +} + +/* This function adds a new mapping to the current operf_process_info + * and then calls the same function on each of its forked children. + * If do_self==true, it means this function is being called by a parent + * on a forked child's operf_process_info. Then, if the mapping already + * exists, we do not set the corresponding mmappings_from_parent since we + * want to retain the knowledge that the mapping had already been added for + * this process versus from the parent. If do_self==false, it means this + * operf_process_info is the top-level parent and should set the corresponding + * mmappings_from_parent to false. The mmappings_from_parent map allows us to + * know whether to keep or discard the mapping if/when we dis-associate from + * the parent, + */ +void operf_process_info::set_new_mapping_recursive(struct operf_mmap * mapping, bool do_self) +{ + if (do_self) { + map<u64, struct operf_mmap *>::iterator it = mmappings.find(mapping->start_addr); + if (it == mmappings.end()) + mmappings_from_parent[mapping->start_addr] = true; + else + mmappings_from_parent[mapping->start_addr] = false; + } else { + mmappings_from_parent[mapping->start_addr] = false; + } + mmappings[mapping->start_addr] = mapping; + std::vector<operf_process_info *>::iterator it = forked_processes.begin(); + while (it != forked_processes.end()) { + operf_process_info * fp = *it; + fp->set_new_mapping_recursive(mapping, true); + cverb << vmisc << "Copied new parent mapping for " << mapping->filename + << " for forked process " << fp->pid << endl; it++; } - parent_of_fork->remove_forked_process(this->pid); } diff --git a/libperf_events/operf_process_info.h b/libperf_events/operf_process_info.h index 675cae3..2bd5212 100644 --- a/libperf_events/operf_process_info.h +++ b/libperf_events/operf_process_info.h @@ -46,54 +46,32 @@ struct operf_mmap { * be "ET_EXEC"). * * This class is designed to handle the possibility that MMAP events may occur for a process - * prior to the COMM event. I don't know if this is possible, but it didn't take much to - * add code to handle this exigency. + * prior to the COMM event. */ class operf_process_info { public: - operf_process_info(pid_t tgid, const char * appname, bool app_arg_is_fullname, bool is_valid); + operf_process_info(pid_t tgid, const char * appname, bool app_arg_is_fullname, + bool is_valid); ~operf_process_info(void); bool is_valid(void) { return (valid); } + bool is_appname_valid(void) { return (valid && appname_valid); } + void set_valid(void) { valid = true; } + void set_appname_valid(void) { appname_valid = true; } bool is_forked(void) { return forked; } - void process_new_mapping(struct operf_mmap * mapping); + void process_mapping(struct operf_mmap * mapping, bool do_self); void process_hypervisor_mapping(u64 ip); - void process_deferred_mappings(std::string app_shortname); - void connect_forked_process_to_parent(operf_process_info * parent); - void copy_new_parent_mapping(struct operf_mmap * mapping) - { mmappings[mapping->start_addr] = mapping; } + void connect_forked_process_to_parent(void); + void set_fork_info(operf_process_info * parent); void add_forked_pid_association(operf_process_info * forked_pid) { forked_processes.push_back(forked_pid); } void copy_mappings_to_forked_process(operf_process_info * forked_pid); - void disassociate_from_parent(char * appname); + void try_disassociate_from_parent(char * appname); void remove_forked_process(pid_t forked_pid); std::string get_app_name(void) { return _appname; } - void add_deferred_mapping(struct operf_mmap * mapping) - { deferred_mmappings[mapping->start_addr] = mapping; } const struct operf_mmap * find_mapping_for_sample(u64 sample_addr); + void set_appname(const char * appname, bool app_arg_is_fullname); + void check_mapping_for_appname(struct operf_mmap * mapping); - /* The valid bit is set when a COMM event has been received for the process - * represented by this object. But since the COMM event only gives a shortname - * for the app (16 chars at most), the process_info object is not completely - * baked until appname_valid() returns true. In truth, if appname_valid returns - * true, we can't really be sure we've got a valid full app name since the true - * result could be from: - * (appname_is_fullname == MAYBE_FULLNAME) &&(num_app_chars_matched > 0) - * But this is the best guess we can make. - */ - bool is_appname_valid(void) - { - bool result; - if (!valid) - return false; - if (appname_is_fullname == YES_FULLNAME) - result = true; - else if ((appname_is_fullname == MAYBE_FULLNAME) && - (num_app_chars_matched > 0)) - result = true; - else - result = false; - return result; - } private: typedef enum { @@ -103,24 +81,28 @@ private: } op_fullname_t; pid_t pid; std::string _appname; - bool valid; + bool valid, appname_valid, look_for_appname_match; bool forked; op_fullname_t appname_is_fullname; std::string app_basename; int num_app_chars_matched; std::map<u64, struct operf_mmap *> mmappings; - std::map<u64, struct operf_mmap *> deferred_mmappings; - /* When a FORK event is recieved, we try to associate that forked - * process with its parent, but if the parent operf_process_info is - * not yet valid, we have to defer this association until - * after the parent becomes valid. This forked_processes collection - * holds those forked processes for which the association to the - * parent has been deferred. + std::map<u64, bool> mmappings_from_parent; + /* When a FORK event is received, we associate that forked process + * with its parent by adding it to the parent's forked_processes + * collection. The main reason we need this collection is because + * PERF_RECORD_MMAP events may arrive for the parent out of order, + * after a PERF_RECORD_FORK. Since forked processes inherit their + * parent's mmappings, we want to make sure those mmappings exist + * for the forked process so that samples may be properly attributed. + * Therefore, the various paths of adding mmapings to a parent, will + * also result in adding those mmappings to forked children. */ std::vector<operf_process_info *> forked_processes; operf_process_info * parent_of_fork; + void set_new_mapping_recursive(struct operf_mmap * mapping, bool do_self); int get_num_matching_chars(std::string mapped_filename, std::string & basename); - void process_deferred_forked_processes(void); + void find_best_match_appname_all_mappings(void); }; diff --git a/libperf_events/operf_utils.cpp b/libperf_events/operf_utils.cpp index da46d33..ca504b0 100644 --- a/libperf_events/operf_utils.cpp +++ b/libperf_events/operf_utils.cpp @@ -10,7 +10,7 @@ * (C) Copyright IBM Corp. 2011 * * Modified by Maynard Johnson <may...@us...> - * (C) Copyright IBM Corporation 2012 + * (C) Copyright IBM Corporation 2012, 2013 * */ @@ -64,19 +64,6 @@ static list<event_t *> unresolved_events; static struct operf_transient trans; static bool sfile_init_done; -/* The handling of mmap's for a process was a bit tricky to get right, in particular, - * the handling of what I refer to as "deferred mmap's" -- i.e., when we receive an - * mmap event for which we've not yet received a comm event (so we don't know app name - * for the process). I have left in some debugging code here (compiled out via #ifdef) - * so we can easily test and validate any changes we ever may need to make to this code. - */ -//#define _TEST_DEFERRED_MAPPING -#ifdef _TEST_DEFERRED_MAPPING -static bool do_comm_event; -static event_t comm_event; -#endif - - /* Some architectures (e.g., ppc64) do not use the same event value (code) for oprofile * and for perf_events. The operf-record process requires event values that perf_events * understands, but the operf-read process requires oprofile event values. The purpose of @@ -233,7 +220,8 @@ static void __handle_fork_event(event_t * event) { if (cverb << vconvert) cout << "PERF_RECORD_FORK for tgid/tid = " << event->fork.pid - << "/" << event->fork.tid << endl; + << "/" << event->fork.tid << "; parent " << event->fork.ppid + << "/" << event->fork.ptid << endl; map<pid_t, operf_process_info *>::iterator it; operf_process_info * parent = NULL; @@ -246,62 +234,68 @@ static void __handle_fork_event(event_t * event) // Create a new proc info object for the parent, but mark it invalid since we have // not yet received a COMM event for this PID. parent = new operf_process_info(event->fork.ppid, app_name ? app_name : NULL, - app_name != NULL, false); + app_name != NULL, false); if (cverb << vconvert) - cout << "Adding new proc info to collection for PID " << event->fork.ppid << endl; + cout << "Adding new proc info to collection for parent PID " + << event->fork.ppid << endl; process_map[event->fork.ppid] = parent; } + /* If the forked process's pid is the same as the parent's, we simply ignore the FORK + * event. This is because operf_process_info objects are stored in the map collection + * by pid, meaning that the forked process and its parent reference the same + * operf_process_info object. + */ + if (event->fork.pid == event->fork.ppid) + return; + it = process_map.find(event->fork.pid); if (it == process_map.end()) { - forked_proc = new operf_process_info(event->fork.pid, - parent->get_app_name().c_str(), - parent->is_appname_valid(), parent->is_valid()); + forked_proc = new operf_process_info(event->fork.pid, NULL, false, false); if (cverb << vconvert) - cout << "Adding new proc info to collection for PID " << event->fork.pid << endl; + cout << "Adding new proc info to collection for forked PID " + << event->fork.pid << endl; process_map[event->fork.pid] = forked_proc; - forked_proc->connect_forked_process_to_parent(parent); - parent->add_forked_pid_association(forked_proc); - if (cverb << vconvert) - cout << "Connecting forked proc " << event->fork.pid << " to parent" << endl; + forked_proc->set_fork_info(parent); } else { - /* There are two ways that we may get to this point. One way is if - * we've received a COMM event for the forked process before the FORK event. + /* * Normally, if parent process A forks child process B which then does an exec, we - * first see a FORK event, followed by a COMM event. But apparently there's no - * guarantee in what order these events may be seen by userspace. No matter -- since - * the exec'ed process is now a standalone process (which will get MMAP events - * for all of its mmappings, there's no need to re-associate it back to the parent - * as we do for a non-exec'ed forked process. So we'll just ignore it. + * first see a FORK event, followed by a COMM event. In this case, the + * operf_process_info created for the forked process is marked as valid. But there's + * no guarantee what order these events may be seen by userspace -- we could easily + * get MMAP, FORK, and finally a COMM event, which is opposite of "expected". So we + * must handle this. * - * But the second way that there may be an existing operf_process_info object is if - * a new mmap event (a real MMAP event or a synthesized event (e.g. for hypervisor - * mmapping) occurred for the forked process before a COMM event was received for it. - * In this case, the forked process will be marked invalid until the COMM event - * is received. But if this process does *not* do an exec, there will never be a - * COMM event for it. Such forked processes should be tightly connected to their - * parent, so we'll go ahead and associate the forked process with its parent. - * If a COMM event comes later for the forked process, we'll disassociate them. + * For a valid operf_process_info, if the forked process pid is unique from that of + * the parent, it implies a COMM event was already received for this forked process. + * Such processes are treated as standalone processes, so we ignore the FORK event. + * For all other cases, if the forked process has not already been associated with + * its parent (i.e., !is_forked()), we go ahead and set that association. */ + forked_proc = it->second; - if (!forked_proc->is_valid()) { - forked_proc->connect_forked_process_to_parent(parent); - parent->add_forked_pid_association(forked_proc); + if (forked_proc->is_valid()) { + // Ignore the FORK event if (cverb << vconvert) - cout << "Connecting existing incomplete forked proc " << event->fork.pid - << " to parent" << endl; + cout << "Forked proc " << event->fork.pid + << " is currently valid (i.e., PERF_RECORD_COMM already received)," + << " so is independent from parent " + << event->fork.ppid << endl; + return; + } + + if (!forked_proc->is_forked()) { + forked_proc->set_fork_info(parent); + if (cverb << vconvert) + cout << "Set fork info for PID " << event->fork.pid + << " with parent " << event->fork.ppid << endl; } } } + static void __handle_comm_event(event_t * event) { -#ifdef _TEST_DEFERRED_MAPPING - if (!do_comm_event) { - comm_event = event; - return; - } -#endif if (cverb << vconvert) cout << "PERF_RECORD_COMM for " << event->comm.comm << ", tgid/tid = " << event->comm.pid << "/" << event->comm.tid << endl; @@ -309,14 +303,6 @@ static void __handle_comm_event(event_t * event) map<pid_t, operf_process_info *>::iterator it; it = process_map.find(event->comm.pid); if (it == process_map.end()) { - /* TODO: Handle system housekeeping tasks. For certain kinds of processes, - * we will get a COMM event, but never get an MMAP event (e.g, kpsmoused). - * Without receiving an MMAP event, we have no clue whether the name given - * with the COMM event is a full "appname" or not, so the operf_process_info - * is marked invalid. We end up dropping all samples for such tasks when - * doing a system-wide profile. - */ - /* A COMM event can occur as the result of the app doing a fork/exec, * where the COMM event is for the forked process. In that case, we * pass the event->comm field as the appname argument to the ctor. @@ -330,34 +316,58 @@ static void __handle_comm_event(event_t * event) appname_arg = event->comm.comm; is_complete_appname = false; } - operf_process_info * proc = new operf_process_info(event->comm.pid,appname_arg, - is_complete_appname, true); + /* If tid != pid, this may be a forked process for which we've not yet received + * the PERF_RECORD_FORK event, nor have we received any other events for the + * process (e.g., COMM event for parent). We mark such proc infos as "invalid" so we + * don't falsely attribute samples to a child thread which should, instead, + * be attributed to its parent. If this is indeed a forked process, we should + * eventually receive a COMM event for the parent (where tid==pid), at which time, + * we'll mark the proc info valid. If we never receive a COMM event for a parent, + * the proc info will get marked valid during reprocessing so we can attribute + * deferred samples at that time. + */ + + bool valid_bit = (event->comm.pid == event->comm.tid); + operf_process_info * proc = new operf_process_info(event->comm.pid, appname_arg, + is_complete_appname, valid_bit); if (cverb << vconvert) cout << "Adding new proc info to collection for PID " << event->comm.pid << endl; process_map[event->comm.pid] = proc; } else { - if (it->second->is_valid()) { - if (it->second->is_forked()) { - /* If the operf_process_info object we found was created as a result of - * a FORK event, then it was associated with the parent process and contains - * the parent's appname. But now we're getting a COMM event for this forked - * process, which means it did an exec, so we need to change the appname - * to the executable associated with this COMM event, which is done via - * calling disassociate_from_parent(). - */ - if (cverb << vconvert) - cout << "Dis-associating forked proc " << event->comm.pid - << " from parent" << endl; - it->second->disassociate_from_parent(event->comm.comm); - } else { - if (cverb << vconvert) - cout << "Received extraneous COMM event for " << event->comm.comm - << ", PID " << event->comm.pid << endl; + /* If we reach this point, it means a proc info object for this pid already exists; + * however, if it was created by something other than a "valid" COMM event (e.g., MMAP event), + * it's 'valid' bit will be set to false. NOTE: A "valid" COMM event is one in which + * tid==pid. + * + * We must handle the following situations: + * o If valid: + * - Existing proc info created for a parent (i.e., tid == pid), and the current + * COMM event is for a child -- and we ignore all child COMM events. + * - Existing proc info may have invalid appname, so we call set_appname() + * and see if this COMM event has an appropriate appname. + * + * o If not valid: + * - Existing proc info was created for the parent by an MMAP type of event, and the + * current COMM event is for the parent. + * - Existing proc info was created by FORK; now that we have a COMM event for it, + * the process should be treated as a standalone process, so we call + * try_disassociate_from_parent(). + */ + if (!it->second->is_valid()) { + // Ignore child COMM events (i.e., pid != tid). + if (event->comm.pid == event->comm.tid) { + if (it->second->is_forked()) { + it->second->try_disassociate_from_parent(event->comm.comm); + } else { + // Existing proc info created by MMAP event or some such + it->second->set_valid(); + it->second->set_appname(event->comm.comm, false); + } } } else { - if (cverb << vconvert) - cout << "Processing deferred mappings" << endl; - it->second->process_deferred_mappings(event->comm.comm); + if ((event->comm.pid == event->comm.tid) && !it->second->is_appname_valid()) { + it->second->set_appname(event->comm.comm, false); + } } } } @@ -399,9 +409,10 @@ static void __handle_mmap_event(event_t * event) mapping->pgoff = event->mmap.pgoff; if (cverb << vconvert) { - cout << "PERF_RECORD_MMAP for " << event->mmap.filename << endl; - cout << "\tstart_addr: " << hex << mapping->start_addr; - cout << "; end addr: " << mapping->end_addr << endl; + cout << "PERF_RECORD_MMAP for process " << hex << event->mmap.pid << "/" + << event->mmap.tid << ": " << event->mmap.filename << endl; + cout << "\tstart_addr: " << hex << mapping->start_addr + << "; end addr: " << mapping->end_addr << endl; } if (event->header.misc & PERF_RECORD_MISC_USER) @@ -458,28 +469,14 @@ static void __handle_mmap_event(event_t * event) operf_process_info * proc = new operf_process_info(event->mmap.pid, appname_arg, is_complete_appname, false); - proc->add_deferred_mapping(mapping); - if (cverb << vconvert) - cout << "Added deferred mapping " << event->mmap.filename - << " for new process_info object" << endl; process_map[event->mmap.pid] = proc; -#ifdef _TEST_DEFERRED_MAPPING - if (!do_comm_event) { - do_comm_event = true; - __handle_comm_event(comm_event, out); - } -#endif - } else if (!it->second->is_valid()) { - it->second->add_deferred_mapping(mapping); - if (cverb << vconvert) - cout << "Added deferred mapping " << event->mmap.filename - << " for existing but incomplete process_info object" << endl; + proc->process_mapping(mapping, false); } else { - if (cverb << vconvert) - cout << "Process mapping for " << event->mmap.filename << " on behalf of " - << event->mmap.pid << endl; - it->second->process_new_mapping(mapping); + it->second->process_mapping(mapping, false); } + if (cverb << vconvert) + cout << "Process mapping for " << event->mmap.filename << " on behalf of " + << event->mmap.pid << endl; } } @@ -498,22 +495,12 @@ static struct operf_transient * __get_operf_trans(struct sample_data * data, boo } else { // Find operf_process info for data.tgid. std::map<pid_t, operf_process_info *>::const_iterator it = process_map.find(data->pid); - if (it != process_map.end() && (it->second->is_appname_valid())) { + if (it != process_map.end() && it->second->is_appname_valid()) { proc = it->second; } else { - /* This can happen for the following reasons: - * - We get a sample before getting a COMM or MMAP - * event for the process being profiled - * - The COMM event has been processed, but since that - * only gives 16 chars of the app name, we don't - * have a valid app name yet - * - The kernel incorrectly records a sample for a - * process other than the one we requested (not - * likely -- this would be a kernel bug if it did) - * - */ + // This can validly happen if get a sample before getting a COMM event for the process if ((cverb << vconvert) && !first_time_processing) { - cerr << "Dropping sample -- process info unavailable" << endl; + cout << "Dropping sample -- process info unavailable for PID " << data->pid << endl; if (kernel_mode) operf_stats[OPERF_NO_APP_KERNEL_SAMPLE]++; else @@ -585,7 +572,7 @@ static struct operf_transient * __get_operf_trans(struct sample_data * data, boo } else { if ((cverb << vconvert) && !first_time_processing) { string domain = trans.in_kernel ? "kernel" : "userspace"; - cerr << "Discarding " << domain << " sample for process " << data->pid + cout << "Discarding " << domain << " sample for process " << data->pid << " where no appropriate mapping was found. (pc=0x" << hex << data->ip <<")" << endl; operf_stats[OPERF_LOST_NO_MAPPING]++; @@ -772,7 +759,7 @@ static int __handle_sample_event(event_t * event, u64 sample_type) domain = "unknown"; break; } - cerr << "Discarding sample from " << domain << " domain: " + cout << "Discarding sample from " << domain << " domain: " << hex << data.ip << endl; } goto out; @@ -952,13 +939,32 @@ void OP_perf_utils::op_reprocess_unresolved_events(u64 sample_type, bool print_p int num_recs = 0; cverb << vconvert << "Reprocessing samples" << endl; + + map<pid_t, operf_process_info *>::iterator procs = process_map.begin(); + for (; procs != process_map.end(); procs++) { + if (!procs->second->is_valid()) { + if (procs->second->is_forked()) { + procs->second->connect_forked_process_to_parent(); + } else { + procs->second->set_valid(); + } + } + // Force the appname_valid to true so we don't drop any samples for this process. + // The appname may not be accurate, but it's the best we can do now. + procs->second->set_appname_valid(); + } list<event_t *>::const_iterator it = unresolved_events.begin(); + int data_error = 0; for (; it != unresolved_events.end(); it++) { event_t * evt = (*it); + if (data_error < 0) { + free(evt); + continue; + } // This is just a sanity check, since all events in this list // are unresolved sample events. if (evt->header.type == PERF_RECORD_SAMPLE) { - __handle_sample_event(evt, sample_type); + data_error = __handle_sample_event(evt, sample_type); free(evt); num_recs++; if ((num_recs % 1000000 == 0) && print_progress) -- 1.7.1 |
From: Mauricio F. de O. <mau...@li...> - 2013-04-24 18:22:13
|
On 04/19/2013 07:24 PM, Maynard Johnson wrote: > My colleague, Mauricio (on cc), used operf to do system-wide profiling > of the specjbb benchmark, and he identified several scenarios where > operf was dropping samples or attributing them to incorrect applications. > We worked together to debug the problems with operf and re-test it > under heavy load. This was an iterative process, as many of the > problems were not easily reproduced and difficult to debug. The > culmination of several weeks of work to improve operf is the > patch attached below. Please test and review it. Comments are > very welcome. > Hi Maynard, The patch significantly improved the consistency of reports (opreport) between the scenarios of starting operf 'after' or 'before' the JVMs. (The 'before' case was inconsistent/problematic, which is now addressed w/ this patch) Samples from Java application/libraries/JITed code are now assigned to the java program image. Since the benchmark consumes all available CPUs to up ~100% userspace utilization, any overhead from profiling tools was a concern. So, the test scenarios included enabling/disabling lazy-conversion and reserving or not reserving CPUs for operf. The results across those permutations were consistent between starting operf before/after the JVMs. So, I believe this is good to go, as it resolved the issues with samples being incorrectly attributed. Thank you very much, -- Mauricio Faria de Oliveira IBM Linux Technology Center > --------------------------------------------------------------------- > > Fix holes in operf system-wide profiling of forked processes > > Using operf to do system-wide profiling of the specjbb benchmark > exposed some holes in how operf was processing the perf_events > data coming from the kernel. Some of the events we can get from > the kernel are: > - PERF_RECORD_COMM > - PERF_RECORD_FORK > - PERF_RECORD_MMAP > - PERF_RECORD_SAMPLE > > The "COMM" event is to notify us of the start of an executable > application. The "FORK" event tells us when a process forks > another process. The "MMAP" event informs us when a shared library > (or executable anonymous memory, or the executable file itself, etc.) > has been mmap'ed into a process's address space. A "SAMPLE" > event occurs each time the kernel takes a sample for a process. > > There is no guarantee in what order these events may arrive from > the kernel, and when a large system (say, 64 CPUs) is running > the specjbb benchmark full bore, with all processors pegged to > 100%, you can get some very strange out-of-order looking > sequence of events. Things get even stranger when using Java7 > versus Java6 since Java7 spawns many more threads. > > The operf code had several issues where such out-of-order > events were not handled properly, so some major changes were > required in the code. > > Signed-off-by: Maynard Johnson <may...@us...> > --- > libperf_events/operf_process_info.cpp | 340 +++++++++++++++++++++------------ > libperf_events/operf_process_info.h | 68 +++---- > libperf_events/operf_utils.cpp | 244 ++++++++++++------------ > 3 files changed, 367 insertions(+), 285 deletions(-) > > diff --git a/libperf_events/operf_process_info.cpp b/libperf_events/operf_process_info.cpp > index a5710cc..0cfc26b 100644 > --- a/libperf_events/operf_process_info.cpp > +++ b/libperf_events/operf_process_info.cpp > @@ -9,12 +9,18 @@ > * Created on: Dec 13, 2011 > * @author Maynard Johnson > * (C) Copyright IBM Corp. 2011 > + * > + * Modified by Maynard Johnson <may...@us...> > + * (C) Copyright IBM Corporation 2013 > + * > */ > > #include <stdio.h> > +#include <unistd.h> > #include <iostream> > #include <map> > #include <string.h> > +#include <errno.h> > #include "operf_process_info.h" > #include "file_manip.h" > #include "operf_utils.h" > @@ -22,23 +28,13 @@ > using namespace std; > using namespace OP_perf_utils; > > -operf_process_info::operf_process_info(pid_t tgid, const char * appname, bool app_arg_is_fullname, bool is_valid) > -: pid(tgid), _appname(appname ? appname : ""), valid(is_valid) > +operf_process_info::operf_process_info(pid_t tgid, const char * appname, > + bool app_arg_is_fullname, bool is_valid) > +: pid(tgid), valid(is_valid), appname_valid(false), forked(false), look_for_appname_match(false), > + appname_is_fullname(NOT_FULLNAME), num_app_chars_matched(-1) > { > - if (app_arg_is_fullname && appname) { > - appname_is_fullname = YES_FULLNAME; > - app_basename = op_basename(appname); > - num_app_chars_matched = (int)app_basename.length(); > - } else if (appname) { > - appname_is_fullname = MAYBE_FULLNAME; > - num_app_chars_matched = -1; > - app_basename = appname; > - } else { > - appname_is_fullname = NOT_FULLNAME; > - num_app_chars_matched = -1; > - app_basename = ""; > - } > - forked = false; > + _appname = ""; > + set_appname(appname, app_arg_is_fullname); > parent_of_fork = NULL; > } > > @@ -50,64 +46,96 @@ operf_process_info::~operf_process_info() > if (valid) { > it = mmappings.begin(); > end = mmappings.end(); > - } else { > - it = deferred_mmappings.begin(); > - end = deferred_mmappings.end(); > } > mmappings.clear(); > - deferred_mmappings.clear(); > } > > -void operf_process_info::process_new_mapping(struct operf_mmap * mapping) > +/* A combination of non-null appname and app_arg_is_fullname==true may be passed > + * from various locations. But a non-null appname and app_arg_is_fullname==false > + * may only be passed as a result of a PERF_RECORD_COMM event. Thus, in such > + * a situation, if we find that the readlink technique for finding the appname > + * does not work (most likely, because the process has already ended), we must > + * try other means to get an appname. > + */ > +void operf_process_info::set_appname(const char * appname, bool app_arg_is_fullname) > { > - // If we do not know the full pathname of our app yet, > - // let's try to determine if the passed filename is a good > - // candidate appname. > + char exe_symlink[64]; > + char exe_realpath[PATH_MAX]; > + bool from_COMM_event = (appname && !app_arg_is_fullname); > > - if (!mapping->is_anon_mapping && (appname_is_fullname < YES_FULLNAME) && (num_app_chars_matched < (int)app_basename.length())) { > - string basename; > - int num_matched_chars = get_num_matching_chars(mapping->filename, basename); > - if (num_matched_chars > num_app_chars_matched) { > - appname_is_fullname = MAYBE_FULLNAME; > - _appname = mapping->filename; > - app_basename = basename; > - num_app_chars_matched = num_matched_chars; > - cverb << vmisc << "Best appname match is " << _appname << endl; > + /* - If the stored _appname is already valid, then return > + * - If the passed appname is NULL and the stored _appname is not empty, then > + * return, since a non-empty _appname implies we've been through this > + * function before (and would have tried the readlink method or, perhaps, > + * fallen back to some other method to set the stored _appname), and the NULL > + * appname argument is going to help here. > + */ > + if (appname_valid || (!appname && _appname.length())) > + return; > + > + if (look_for_appname_match && !from_COMM_event) > + return find_best_match_appname_all_mappings(); > + > + snprintf(exe_symlink, 64, "/proc/%d/exe", pid); > + memset(exe_realpath, '\0', PATH_MAX); > + > + /* If the user is running a command via taskset, the kernel will send us a PERF_RECORD_COMM > + * for both comm=taskset and comm=<user_command> for the same process ID !! > + * The user will not be interested in taskset samples; thus, we ignore such COMM events. > + * This is a hack, but there doesn't seem to be a better way around the possibility of having > + * application samples attributed to "taskset" instead of the application. > + */ > + if (readlink(exe_symlink, exe_realpath, sizeof(exe_realpath)-1) > 0) { > + _appname = exe_realpath; > + app_basename = op_basename(_appname); > + if (!strncmp(app_basename.c_str(), "taskset", strlen("taskset"))) { > + _appname = "unknown"; > + app_basename = "unknown"; > + } else { > + appname_valid = true; > } > + } else { > + /* Most likely that the process has ended already, so we'll need to determine > + * the appname through different means. > + */ > + if (cverb << vmisc) > + cerr << "PID: " << hex << pid << " Unable to obtain appname from " << exe_symlink << endl > + << "\t" << strerror(errno) << endl; > + if (appname && strcmp(appname, "taskset")) { > + _appname = appname; > + if (app_arg_is_fullname) { > + appname_valid = true; > + } else { > + look_for_appname_match = true; > + } > + } else { > + _appname = "unknown"; > + } > + app_basename = _appname; > } > - mmappings[mapping->start_addr] = mapping; > - vector<operf_process_info *>::iterator it = forked_processes.begin(); > - while (it != forked_processes.end()) { > - operf_process_info * p = *it; > - p->copy_new_parent_mapping(mapping); > - cverb << vmisc << "Copied new parent mapping for " << mapping->filename > - << " for forked process " << p->pid << endl; > - it++; > - } > - > + cverb << vmisc << "PID: " << hex << pid << " appname is set to " > + << _appname << endl; > + if (look_for_appname_match) > + find_best_match_appname_all_mappings(); > } > > -/* This method should only be invoked when a "delayed" COMM event is processed. > - * By "delayed", I mean that we have already received MMAP events for the associated > - * process, for which we've had to create a partial operf_process_info object -- one > - * that has no _appname yet and is marked invalid. > - * > - * Given the above statement, the passed app_shortname "must" come from a comm.comm > - * field, which is 16 chars in length (thus the name of the arg). > +/* This operf_process_info object may be a parent to processes that it has forked. > + * If the forked process has not done an 'exec' yet (i.e., we've not received a > + * COMM event for it), then it's still a dependent process of its parent. > + * If so, it will be in the parent's collection of forked processes. So, > + * when adding a new mapping, we should copy that mapping to each forked > + * child's operf_process_info object. Then, if samples are taken for that > + * mapping for that forked process, the samples can be correctly attributed. > */ > -void operf_process_info::process_deferred_mappings(string app_shortname) > +void operf_process_info::process_mapping(struct operf_mmap * mapping, bool do_self) > { > - _appname = app_shortname; > - app_basename = app_shortname; > - valid = true; > - map<u64, struct operf_mmap *>::iterator it = deferred_mmappings.begin(); > - while (it != deferred_mmappings.end()) { > - process_new_mapping(it->second); > - cverb << vmisc << "Processed deferred mapping for " << it->second->filename << endl; > - it++; > + if (!appname_valid && !is_forked()) { > + if (look_for_appname_match) > + check_mapping_for_appname(mapping); > + else > + set_appname(NULL, false); > } > - deferred_mmappings.clear(); > - process_deferred_forked_processes(); > + set_new_mapping_recursive(mapping, do_self); > } > > int operf_process_info::get_num_matching_chars(string mapped_filename, string & basename) > @@ -141,6 +169,48 @@ int operf_process_info::get_num_matching_chars(string mapped_filename, string & > return num_matched_chars ? num_matched_chars : -1; > } > > +/* If we do not know the full pathname of our app yet, > + * let's try to determine if the passed filename is a good > + * candidate appname. > + * ASSUMPTION: This function is called only when look_for_appname_match==true. > + */ > +void operf_process_info::check_mapping_for_appname(struct operf_mmap * mapping) > +{ > + if (!mapping->is_anon_mapping) { > + string basename; > + int num_matched_chars = get_num_matching_chars(mapping->filename, basename); > + if (num_matched_chars > num_app_chars_matched) { > + if (num_matched_chars == app_basename.length()) { > + appname_is_fullname = YES_FULLNAME; > + look_for_appname_match = false; > + appname_valid = true; > + } else { > + appname_is_fullname = MAYBE_FULLNAME; > + } > + _appname = mapping->filename; > + app_basename = basename; > + num_app_chars_matched = num_matched_chars; > + cverb << vmisc << "Best appname match is " << _appname << endl; > + } > + } > +} > + > +void operf_process_info::find_best_match_appname_all_mappings(void) > +{ > + map<u64, struct operf_mmap *>::iterator it; > + > + // We may not even have a candidate shortname (from a COMM event) for the app yet > + if (_appname == "unknown") > + return; > + > + it = mmappings.begin(); > + while (it != mmappings.end()) { > + check_mapping_for_appname(it->second); > + it++; > + } > + > +} > + > const struct operf_mmap * operf_process_info::find_mapping_for_sample(u64 sample_addr) > { > map<u64, struct operf_mmap *>::iterator it = mmappings.begin(); > @@ -178,23 +248,15 @@ void operf_process_info::process_hypervisor_mapping(u64 ip) > map<u64, struct operf_mmap *>::iterator end; > > curr_end = curr_start = ~0ULL; > - if (valid) { > - it = mmappings.begin(); > - end = mmappings.end(); > - } else { > - it = deferred_mmappings.begin(); > - end = deferred_mmappings.end(); > - } > + it = mmappings.begin(); > + end = mmappings.end(); > while (it != end) { > if (it->second->is_hypervisor) { > struct operf_mmap * _mmap = it->second; > curr_start = _mmap->start_addr; > curr_end = _mmap->end_addr; > if (curr_start > ip) { > - if (valid) > - mmappings.erase(it); > - else > - deferred_mmappings.erase(it); > + mmappings.erase(it); > delete _mmap; > } else { > create_new_hyperv_mmap = false; > @@ -220,10 +282,7 @@ void operf_process_info::process_hypervisor_mapping(u64 ip) > cout << "\tstart_addr: " << hex << hypervisor_mmap->start_addr; > cout << "; end addr: " << hypervisor_mmap->end_addr << endl; > } > - if (valid) > - process_new_mapping(hypervisor_mmap); > - else > - add_deferred_mapping(hypervisor_mmap); > + process_mapping(hypervisor_mmap, false); > } > } > > @@ -236,51 +295,40 @@ void operf_process_info::copy_mappings_to_forked_process(operf_process_info * fo > * original object is created in operf_utils:__handle_mmap_event and > * is saved in the global all_images_map. > */ > - forked_pid->process_new_mapping(mapping); > + forked_pid->process_mapping(mapping, true); > it++; > } > } > > -void operf_process_info::connect_forked_process_to_parent(operf_process_info * parent) > +void operf_process_info::set_fork_info(operf_process_info * parent) > { > forked = true; > parent_of_fork = parent; > - if (parent->is_valid()) { > - valid = true; > - _appname = parent->get_app_name(); > - if (parent->is_appname_valid() && !_appname.empty()) { > - appname_is_fullname = YES_FULLNAME; > - app_basename = op_basename(_appname); > - num_app_chars_matched = (int)app_basename.length(); > - } else if (!_appname.empty()) { > - appname_is_fullname = MAYBE_FULLNAME; > - num_app_chars_matched = -1; > - app_basename = _appname; > - } else { > - appname_is_fullname = NOT_FULLNAME; > - num_app_chars_matched = -1; > - app_basename = ""; > - } > - parent->copy_mappings_to_forked_process(this); > - } > + parent_of_fork->add_forked_pid_association(this); > + parent_of_fork->copy_mappings_to_forked_process(this); > } > > -void operf_process_info::process_deferred_forked_processes(void) > +/* ASSUMPTION: This function should only be called during reprocessing phase > + * since we blindly set the _appname to that of the parent. If this function > + * were called from elsewhere, the parent's _appname might not yet be fully baked. > + */ > +void operf_process_info::connect_forked_process_to_parent(void) > { > - vector<operf_process_info *>::iterator it = forked_processes.begin(); > - while (it != forked_processes.end()) { > - operf_process_info * p = *it; > - p->connect_forked_process_to_parent(this); > - cverb << vmisc << "Processed deferred forked process " << p->pid << endl; > - it++; > - } > + if (cverb << vmisc) > + cout << "Connecting forked proc " << pid << " to parent " << parent_of_fork << endl; > + valid = true; > + _appname = parent_of_fork->get_app_name(); > + app_basename = op_basename(_appname); > + appname_valid = true; > } > > + > void operf_process_info::remove_forked_process(pid_t forked_pid) > { > std::vector<operf_process_info *>::iterator it = forked_processes.begin(); > while (it != forked_processes.end()) { > - if ((*it)->pid == forked_pid) { > + operf_process_info * p = *it; > + if (p->pid == forked_pid) { > forked_processes.erase(it); > break; > } > @@ -288,29 +336,75 @@ void operf_process_info::remove_forked_process(pid_t forked_pid) > } > } > > -/* This function is called as a result of the following scenario: > - * 1. An operf_process_info was created for a FORK event > - * 2. The forked process was connected to (associated with) its parent, > - * adding the parent's mmappings to the forked process's operf_process_info. > - * 3. Then the forked process does an exec, which results in a COMM > - * event. The forked process is now considered completely separate > - * from its parent, so we need to disassociate it from the parent. > +/* See comment in operf_utils::__handle_comm_event for conditions under > + * which this function is called. > */ > -void operf_process_info::disassociate_from_parent(char * app_shortname) > +void operf_process_info::try_disassociate_from_parent(char * app_shortname) > { > - _appname = app_shortname; > - app_basename = app_shortname; > - appname_is_fullname = NOT_FULLNAME; > + if (parent_of_fork && (parent_of_fork->pid == this->pid)) > + return; > + > + if (cverb << vmisc && parent_of_fork) > + cout << "Dis-associating forked proc " << pid > + << " from parent " << parent_of_fork->pid << endl; > + > valid = true; > - /* Now that we have a valid app shortname (from the COMM event data), > - * let's spin through our mmappings and process them -- see if we can > - * find one that has a good appname candidate. > - */ > - num_app_chars_matched = 0; > + set_appname(app_shortname, false); > + > map<u64, struct operf_mmap *>::iterator it = mmappings.begin(); > while (it != mmappings.end()) { > - process_new_mapping(it->second); > + operf_mmap * cur = it->second; > + /* mmappings from the parent may have been added to this proc info prior > + * to this proc info becoming valid since we could not know at the time if > + * this proc would ever be valid. But now we know it's valid (which is why > + * we're dis-associating from the parent), so we remove these unnecessary > + * parent mmappings. > + */ > + if (mmappings_from_parent[cur->start_addr]) { > + mmappings_from_parent[cur->start_addr] = false; > + mmappings.erase(it); > + } else { > + process_mapping(cur, false); > + } > + it++; > + } > + if (parent_of_fork) { > + parent_of_fork->remove_forked_process(this->pid); > + parent_of_fork = NULL; > + } > + forked = false; > +} > + > +/* This function adds a new mapping to the current operf_process_info > + * and then calls the same function on each of its forked children. > + * If do_self==true, it means this function is being called by a parent > + * on a forked child's operf_process_info. Then, if the mapping already > + * exists, we do not set the corresponding mmappings_from_parent since we > + * want to retain the knowledge that the mapping had already been added for > + * this process versus from the parent. If do_self==false, it means this > + * operf_process_info is the top-level parent and should set the corresponding > + * mmappings_from_parent to false. The mmappings_from_parent map allows us to > + * know whether to keep or discard the mapping if/when we dis-associate from > + * the parent, > + */ > +void operf_process_info::set_new_mapping_recursive(struct operf_mmap * mapping, bool do_self) > +{ > + if (do_self) { > + map<u64, struct operf_mmap *>::iterator it = mmappings.find(mapping->start_addr); > + if (it == mmappings.end()) > + mmappings_from_parent[mapping->start_addr] = true; > + else > + mmappings_from_parent[mapping->start_addr] = false; > + } else { > + mmappings_from_parent[mapping->start_addr] = false; > + } > + mmappings[mapping->start_addr] = mapping; > + std::vector<operf_process_info *>::iterator it = forked_processes.begin(); > + while (it != forked_processes.end()) { > + operf_process_info * fp = *it; > + fp->set_new_mapping_recursive(mapping, true); > + cverb << vmisc << "Copied new parent mapping for " << mapping->filename > + << " for forked process " << fp->pid << endl; > it++; > } > - parent_of_fork->remove_forked_process(this->pid); > } > diff --git a/libperf_events/operf_process_info.h b/libperf_events/operf_process_info.h > index 675cae3..2bd5212 100644 > --- a/libperf_events/operf_process_info.h > +++ b/libperf_events/operf_process_info.h > @@ -46,54 +46,32 @@ struct operf_mmap { > * be "ET_EXEC"). > * > * This class is designed to handle the possibility that MMAP events may occur for a process > - * prior to the COMM event. I don't know if this is possible, but it didn't take much to > - * add code to handle this exigency. > + * prior to the COMM event. > */ > class operf_process_info { > public: > - operf_process_info(pid_t tgid, const char * appname, bool app_arg_is_fullname, bool is_valid); > + operf_process_info(pid_t tgid, const char * appname, bool app_arg_is_fullname, > + bool is_valid); > ~operf_process_info(void); > bool is_valid(void) { return (valid); } > + bool is_appname_valid(void) { return (valid && appname_valid); } > + void set_valid(void) { valid = true; } > + void set_appname_valid(void) { appname_valid = true; } > bool is_forked(void) { return forked; } > - void process_new_mapping(struct operf_mmap * mapping); > + void process_mapping(struct operf_mmap * mapping, bool do_self); > void process_hypervisor_mapping(u64 ip); > - void process_deferred_mappings(std::string app_shortname); > - void connect_forked_process_to_parent(operf_process_info * parent); > - void copy_new_parent_mapping(struct operf_mmap * mapping) > - { mmappings[mapping->start_addr] = mapping; } > + void connect_forked_process_to_parent(void); > + void set_fork_info(operf_process_info * parent); > void add_forked_pid_association(operf_process_info * forked_pid) > { forked_processes.push_back(forked_pid); } > void copy_mappings_to_forked_process(operf_process_info * forked_pid); > - void disassociate_from_parent(char * appname); > + void try_disassociate_from_parent(char * appname); > void remove_forked_process(pid_t forked_pid); > std::string get_app_name(void) { return _appname; } > - void add_deferred_mapping(struct operf_mmap * mapping) > - { deferred_mmappings[mapping->start_addr] = mapping; } > const struct operf_mmap * find_mapping_for_sample(u64 sample_addr); > + void set_appname(const char * appname, bool app_arg_is_fullname); > + void check_mapping_for_appname(struct operf_mmap * mapping); > > - /* The valid bit is set when a COMM event has been received for the process > - * represented by this object. But since the COMM event only gives a shortname > - * for the app (16 chars at most), the process_info object is not completely > - * baked until appname_valid() returns true. In truth, if appname_valid returns > - * true, we can't really be sure we've got a valid full app name since the true > - * result could be from: > - * (appname_is_fullname == MAYBE_FULLNAME) &&(num_app_chars_matched > 0) > - * But this is the best guess we can make. > - */ > - bool is_appname_valid(void) > - { > - bool result; > - if (!valid) > - return false; > - if (appname_is_fullname == YES_FULLNAME) > - result = true; > - else if ((appname_is_fullname == MAYBE_FULLNAME) && > - (num_app_chars_matched > 0)) > - result = true; > - else > - result = false; > - return result; > - } > > private: > typedef enum { > @@ -103,24 +81,28 @@ private: > } op_fullname_t; > pid_t pid; > std::string _appname; > - bool valid; > + bool valid, appname_valid, look_for_appname_match; > bool forked; > op_fullname_t appname_is_fullname; > std::string app_basename; > int num_app_chars_matched; > std::map<u64, struct operf_mmap *> mmappings; > - std::map<u64, struct operf_mmap *> deferred_mmappings; > - /* When a FORK event is recieved, we try to associate that forked > - * process with its parent, but if the parent operf_process_info is > - * not yet valid, we have to defer this association until > - * after the parent becomes valid. This forked_processes collection > - * holds those forked processes for which the association to the > - * parent has been deferred. > + std::map<u64, bool> mmappings_from_parent; > + /* When a FORK event is received, we associate that forked process > + * with its parent by adding it to the parent's forked_processes > + * collection. The main reason we need this collection is because > + * PERF_RECORD_MMAP events may arrive for the parent out of order, > + * after a PERF_RECORD_FORK. Since forked processes inherit their > + * parent's mmappings, we want to make sure those mmappings exist > + * for the forked process so that samples may be properly attributed. > + * Therefore, the various paths of adding mmapings to a parent, will > + * also result in adding those mmappings to forked children. > */ > std::vector<operf_process_info *> forked_processes; > operf_process_info * parent_of_fork; > + void set_new_mapping_recursive(struct operf_mmap * mapping, bool do_self); > int get_num_matching_chars(std::string mapped_filename, std::string & basename); > - void process_deferred_forked_processes(void); > + void find_best_match_appname_all_mappings(void); > }; > > > diff --git a/libperf_events/operf_utils.cpp b/libperf_events/operf_utils.cpp > index da46d33..ca504b0 100644 > --- a/libperf_events/operf_utils.cpp > +++ b/libperf_events/operf_utils.cpp > @@ -10,7 +10,7 @@ > * (C) Copyright IBM Corp. 2011 > * > * Modified by Maynard Johnson <may...@us...> > - * (C) Copyright IBM Corporation 2012 > + * (C) Copyright IBM Corporation 2012, 2013 > * > */ > > @@ -64,19 +64,6 @@ static list<event_t *> unresolved_events; > static struct operf_transient trans; > static bool sfile_init_done; > > -/* The handling of mmap's for a process was a bit tricky to get right, in particular, > - * the handling of what I refer to as "deferred mmap's" -- i.e., when we receive an > - * mmap event for which we've not yet received a comm event (so we don't know app name > - * for the process). I have left in some debugging code here (compiled out via #ifdef) > - * so we can easily test and validate any changes we ever may need to make to this code. > - */ > -//#define _TEST_DEFERRED_MAPPING > -#ifdef _TEST_DEFERRED_MAPPING > -static bool do_comm_event; > -static event_t comm_event; > -#endif > - > - > /* Some architectures (e.g., ppc64) do not use the same event value (code) for oprofile > * and for perf_events. The operf-record process requires event values that perf_events > * understands, but the operf-read process requires oprofile event values. The purpose of > @@ -233,7 +220,8 @@ static void __handle_fork_event(event_t * event) > { > if (cverb << vconvert) > cout << "PERF_RECORD_FORK for tgid/tid = " << event->fork.pid > - << "/" << event->fork.tid << endl; > + << "/" << event->fork.tid << "; parent " << event->fork.ppid > + << "/" << event->fork.ptid << endl; > > map<pid_t, operf_process_info *>::iterator it; > operf_process_info * parent = NULL; > @@ -246,62 +234,68 @@ static void __handle_fork_event(event_t * event) > // Create a new proc info object for the parent, but mark it invalid since we have > // not yet received a COMM event for this PID. > parent = new operf_process_info(event->fork.ppid, app_name ? app_name : NULL, > - app_name != NULL, false); > + app_name != NULL, false); > if (cverb << vconvert) > - cout << "Adding new proc info to collection for PID " << event->fork.ppid << endl; > + cout << "Adding new proc info to collection for parent PID " > + << event->fork.ppid << endl; > process_map[event->fork.ppid] = parent; > } > > + /* If the forked process's pid is the same as the parent's, we simply ignore the FORK > + * event. This is because operf_process_info objects are stored in the map collection > + * by pid, meaning that the forked process and its parent reference the same > + * operf_process_info object. > + */ > + if (event->fork.pid == event->fork.ppid) > + return; > + > it = process_map.find(event->fork.pid); > if (it == process_map.end()) { > - forked_proc = new operf_process_info(event->fork.pid, > - parent->get_app_name().c_str(), > - parent->is_appname_valid(), parent->is_valid()); > + forked_proc = new operf_process_info(event->fork.pid, NULL, false, false); > if (cverb << vconvert) > - cout << "Adding new proc info to collection for PID " << event->fork.pid << endl; > + cout << "Adding new proc info to collection for forked PID " > + << event->fork.pid << endl; > process_map[event->fork.pid] = forked_proc; > - forked_proc->connect_forked_process_to_parent(parent); > - parent->add_forked_pid_association(forked_proc); > - if (cverb << vconvert) > - cout << "Connecting forked proc " << event->fork.pid << " to parent" << endl; > + forked_proc->set_fork_info(parent); > } else { > - /* There are two ways that we may get to this point. One way is if > - * we've received a COMM event for the forked process before the FORK event. > + /* > * Normally, if parent process A forks child process B which then does an exec, we > - * first see a FORK event, followed by a COMM event. But apparently there's no > - * guarantee in what order these events may be seen by userspace. No matter -- since > - * the exec'ed process is now a standalone process (which will get MMAP events > - * for all of its mmappings, there's no need to re-associate it back to the parent > - * as we do for a non-exec'ed forked process. So we'll just ignore it. > + * first see a FORK event, followed by a COMM event. In this case, the > + * operf_process_info created for the forked process is marked as valid. But there's > + * no guarantee what order these events may be seen by userspace -- we could easily > + * get MMAP, FORK, and finally a COMM event, which is opposite of "expected". So we > + * must handle this. > * > - * But the second way that there may be an existing operf_process_info object is if > - * a new mmap event (a real MMAP event or a synthesized event (e.g. for hypervisor > - * mmapping) occurred for the forked process before a COMM event was received for it. > - * In this case, the forked process will be marked invalid until the COMM event > - * is received. But if this process does *not* do an exec, there will never be a > - * COMM event for it. Such forked processes should be tightly connected to their > - * parent, so we'll go ahead and associate the forked process with its parent. > - * If a COMM event comes later for the forked process, we'll disassociate them. > + * For a valid operf_process_info, if the forked process pid is unique from that of > + * the parent, it implies a COMM event was already received for this forked process. > + * Such processes are treated as standalone processes, so we ignore the FORK event. > + * For all other cases, if the forked process has not already been associated with > + * its parent (i.e., !is_forked()), we go ahead and set that association. > */ > + > forked_proc = it->second; > - if (!forked_proc->is_valid()) { > - forked_proc->connect_forked_process_to_parent(parent); > - parent->add_forked_pid_association(forked_proc); > + if (forked_proc->is_valid()) { > + // Ignore the FORK event > if (cverb << vconvert) > - cout << "Connecting existing incomplete forked proc " << event->fork.pid > - << " to parent" << endl; > + cout << "Forked proc " << event->fork.pid > + << " is currently valid (i.e., PERF_RECORD_COMM already received)," > + << " so is independent from parent " > + << event->fork.ppid << endl; > + return; > + } > + > + if (!forked_proc->is_forked()) { > + forked_proc->set_fork_info(parent); > + if (cverb << vconvert) > + cout << "Set fork info for PID " << event->fork.pid > + << " with parent " << event->fork.ppid << endl; > } > } > } > > + > static void __handle_comm_event(event_t * event) > { > -#ifdef _TEST_DEFERRED_MAPPING > - if (!do_comm_event) { > - comm_event = event; > - return; > - } > -#endif > if (cverb << vconvert) > cout << "PERF_RECORD_COMM for " << event->comm.comm << ", tgid/tid = " > << event->comm.pid << "/" << event->comm.tid << endl; > @@ -309,14 +303,6 @@ static void __handle_comm_event(event_t * event) > map<pid_t, operf_process_info *>::iterator it; > it = process_map.find(event->comm.pid); > if (it == process_map.end()) { > - /* TODO: Handle system housekeeping tasks. For certain kinds of processes, > - * we will get a COMM event, but never get an MMAP event (e.g, kpsmoused). > - * Without receiving an MMAP event, we have no clue whether the name given > - * with the COMM event is a full "appname" or not, so the operf_process_info > - * is marked invalid. We end up dropping all samples for such tasks when > - * doing a system-wide profile. > - */ > - > /* A COMM event can occur as the result of the app doing a fork/exec, > * where the COMM event is for the forked process. In that case, we > * pass the event->comm field as the appname argument to the ctor. > @@ -330,34 +316,58 @@ static void __handle_comm_event(event_t * event) > appname_arg = event->comm.comm; > is_complete_appname = false; > } > - operf_process_info * proc = new operf_process_info(event->comm.pid,appname_arg, > - is_complete_appname, true); > + /* If tid != pid, this may be a forked process for which we've not yet received > + * the PERF_RECORD_FORK event, nor have we received any other events for the > + * process (e.g., COMM event for parent). We mark such proc infos as "invalid" so we > + * don't falsely attribute samples to a child thread which should, instead, > + * be attributed to its parent. If this is indeed a forked process, we should > + * eventually receive a COMM event for the parent (where tid==pid), at which time, > + * we'll mark the proc info valid. If we never receive a COMM event for a parent, > + * the proc info will get marked valid during reprocessing so we can attribute > + * deferred samples at that time. > + */ > + > + bool valid_bit = (event->comm.pid == event->comm.tid); > + operf_process_info * proc = new operf_process_info(event->comm.pid, appname_arg, > + is_complete_appname, valid_bit); > if (cverb << vconvert) > cout << "Adding new proc info to collection for PID " << event->comm.pid << endl; > process_map[event->comm.pid] = proc; > } else { > - if (it->second->is_valid()) { > - if (it->second->is_forked()) { > - /* If the operf_process_info object we found was created as a result of > - * a FORK event, then it was associated with the parent process and contains > - * the parent's appname. But now we're getting a COMM event for this forked > - * process, which means it did an exec, so we need to change the appname > - * to the executable associated with this COMM event, which is done via > - * calling disassociate_from_parent(). > - */ > - if (cverb << vconvert) > - cout << "Dis-associating forked proc " << event->comm.pid > - << " from parent" << endl; > - it->second->disassociate_from_parent(event->comm.comm); > - } else { > - if (cverb << vconvert) > - cout << "Received extraneous COMM event for " << event->comm.comm > - << ", PID " << event->comm.pid << endl; > + /* If we reach this point, it means a proc info object for this pid already exists; > + * however, if it was created by something other than a "valid" COMM event (e.g., MMAP event), > + * it's 'valid' bit will be set to false. NOTE: A "valid" COMM event is one in which > + * tid==pid. > + * > + * We must handle the following situations: > + * o If valid: > + * - Existing proc info created for a parent (i.e., tid == pid), and the current > + * COMM event is for a child -- and we ignore all child COMM events. > + * - Existing proc info may have invalid appname, so we call set_appname() > + * and see if this COMM event has an appropriate appname. > + * > + * o If not valid: > + * - Existing proc info was created for the parent by an MMAP type of event, and the > + * current COMM event is for the parent. > + * - Existing proc info was created by FORK; now that we have a COMM event for it, > + * the process should be treated as a standalone process, so we call > + * try_disassociate_from_parent(). > + */ > + if (!it->second->is_valid()) { > + // Ignore child COMM events (i.e., pid != tid). > + if (event->comm.pid == event->comm.tid) { > + if (it->second->is_forked()) { > + it->second->try_disassociate_from_parent(event->comm.comm); > + } else { > + // Existing proc info created by MMAP event or some such > + it->second->set_valid(); > + it->second->set_appname(event->comm.comm, false); > + } > } > } else { > - if (cverb << vconvert) > - cout << "Processing deferred mappings" << endl; > - it->second->process_deferred_mappings(event->comm.comm); > + if ((event->comm.pid == event->comm.tid) && !it->second->is_appname_valid()) { > + it->second->set_appname(event->comm.comm, false); > + } > } > } > } > @@ -399,9 +409,10 @@ static void __handle_mmap_event(event_t * event) > mapping->pgoff = event->mmap.pgoff; > > if (cverb << vconvert) { > - cout << "PERF_RECORD_MMAP for " << event->mmap.filename << endl; > - cout << "\tstart_addr: " << hex << mapping->start_addr; > - cout << "; end addr: " << mapping->end_addr << endl; > + cout << "PERF_RECORD_MMAP for process " << hex << event->mmap.pid << "/" > + << event->mmap.tid << ": " << event->mmap.filename << endl; > + cout << "\tstart_addr: " << hex << mapping->start_addr > + << "; end addr: " << mapping->end_addr << endl; > } > > if (event->header.misc & PERF_RECORD_MISC_USER) > @@ -458,28 +469,14 @@ static void __handle_mmap_event(event_t * event) > > operf_process_info * proc = new operf_process_info(event->mmap.pid, appname_arg, > is_complete_appname, false); > - proc->add_deferred_mapping(mapping); > - if (cverb << vconvert) > - cout << "Added deferred mapping " << event->mmap.filename > - << " for new process_info object" << endl; > process_map[event->mmap.pid] = proc; > -#ifdef _TEST_DEFERRED_MAPPING > - if (!do_comm_event) { > - do_comm_event = true; > - __handle_comm_event(comm_event, out); > - } > -#endif > - } else if (!it->second->is_valid()) { > - it->second->add_deferred_mapping(mapping); > - if (cverb << vconvert) > - cout << "Added deferred mapping " << event->mmap.filename > - << " for existing but incomplete process_info object" << endl; > + proc->process_mapping(mapping, false); > } else { > - if (cverb << vconvert) > - cout << "Process mapping for " << event->mmap.filename << " on behalf of " > - << event->mmap.pid << endl; > - it->second->process_new_mapping(mapping); > + it->second->process_mapping(mapping, false); > } > + if (cverb << vconvert) > + cout << "Process mapping for " << event->mmap.filename << " on behalf of " > + << event->mmap.pid << endl; > } > } > > @@ -498,22 +495,12 @@ static struct operf_transient * __get_operf_trans(struct sample_data * data, boo > } else { > // Find operf_process info for data.tgid. > std::map<pid_t, operf_process_info *>::const_iterator it = process_map.find(data->pid); > - if (it != process_map.end() && (it->second->is_appname_valid())) { > + if (it != process_map.end() && it->second->is_appname_valid()) { > proc = it->second; > } else { > - /* This can happen for the following reasons: > - * - We get a sample before getting a COMM or MMAP > - * event for the process being profiled > - * - The COMM event has been processed, but since that > - * only gives 16 chars of the app name, we don't > - * have a valid app name yet > - * - The kernel incorrectly records a sample for a > - * process other than the one we requested (not > - * likely -- this would be a kernel bug if it did) > - * > - */ > + // This can validly happen if get a sample before getting a COMM event for the process > if ((cverb << vconvert) && !first_time_processing) { > - cerr << "Dropping sample -- process info unavailable" << endl; > + cout << "Dropping sample -- process info unavailable for PID " << data->pid << endl; > if (kernel_mode) > operf_stats[OPERF_NO_APP_KERNEL_SAMPLE]++; > else > @@ -585,7 +572,7 @@ static struct operf_transient * __get_operf_trans(struct sample_data * data, boo > } else { > if ((cverb << vconvert) && !first_time_processing) { > string domain = trans.in_kernel ? "kernel" : "userspace"; > - cerr << "Discarding " << domain << " sample for process " << data->pid > + cout << "Discarding " << domain << " sample for process " << data->pid > << " where no appropriate mapping was found. (pc=0x" > << hex << data->ip <<")" << endl; > operf_stats[OPERF_LOST_NO_MAPPING]++; > @@ -772,7 +759,7 @@ static int __handle_sample_event(event_t * event, u64 sample_type) > domain = "unknown"; > break; > } > - cerr << "Discarding sample from " << domain << " domain: " > + cout << "Discarding sample from " << domain << " domain: " > << hex << data.ip << endl; > } > goto out; > @@ -952,13 +939,32 @@ void OP_perf_utils::op_reprocess_unresolved_events(u64 sample_type, bool print_p > int num_recs = 0; > > cverb << vconvert << "Reprocessing samples" << endl; > + > + map<pid_t, operf_process_info *>::iterator procs = process_map.begin(); > + for (; procs != process_map.end(); procs++) { > + if (!procs->second->is_valid()) { > + if (procs->second->is_forked()) { > + procs->second->connect_forked_process_to_parent(); > + } else { > + procs->second->set_valid(); > + } > + } > + // Force the appname_valid to true so we don't drop any samples for this process. > + // The appname may not be accurate, but it's the best we can do now. > + procs->second->set_appname_valid(); > + } > list<event_t *>::const_iterator it = unresolved_events.begin(); > + int data_error = 0; > for (; it != unresolved_events.end(); it++) { > event_t * evt = (*it); > + if (data_error < 0) { > + free(evt); > + continue; > + } > // This is just a sanity check, since all events in this list > // are unresolved sample events. > if (evt->header.type == PERF_RECORD_SAMPLE) { > - __handle_sample_event(evt, sample_type); > + data_error = __handle_sample_event(evt, sample_type); > free(evt); > num_recs++; > if ((num_recs % 1000000 == 0) && print_progress) > |
From: Maynard J. <may...@us...> - 2013-04-25 16:14:25
|
On 04/24/2013 01:21 PM, Mauricio Faria de Oliveira wrote: > On 04/19/2013 07:24 PM, Maynard Johnson wrote: >> My colleague, Mauricio (on cc), used operf to do system-wide profiling >> of the specjbb benchmark, and he identified several scenarios where >> operf was dropping samples or attributing them to incorrect applications. >> We worked together to debug the problems with operf and re-test it >> under heavy load. This was an iterative process, as many of the >> problems were not easily reproduced and difficult to debug. The >> culmination of several weeks of work to improve operf is the >> patch attached below. Please test and review it. Comments are >> very welcome. >> > > Hi Maynard, > > The patch significantly improved the consistency of reports (opreport) > between the scenarios of starting operf 'after' or 'before' the JVMs. > (The 'before' case was inconsistent/problematic, which is now addressed > w/ this patch) > > Samples from Java application/libraries/JITed code are now assigned to > the java program image. > > Since the benchmark consumes all available CPUs to up ~100% userspace > utilization, any overhead from profiling tools was a concern. So, the > test scenarios included enabling/disabling lazy-conversion and reserving > or not reserving CPUs for operf. > > The results across those permutations were consistent between starting > operf before/after the JVMs. > > So, I believe this is good to go, as it resolved the issues with > samples being incorrectly attributed. > > Thank you very much, Mauricio, Thanks again for bringing this issue to my attention and for all of the help in testing various scenarios. I've gone ahead and pushed this patch upstream since the fixes are definitely needed in system-wide profiling scenarios where the apps of interest are doing fork/execs (e.g., java). I am more than open to any review comments or other feedback.. -Maynard > > -- > Mauricio Faria de Oliveira > IBM Linux Technology Center > > >> --------------------------------------------------------------------- >> >> Fix holes in operf system-wide profiling of forked processes >> >> Using operf to do system-wide profiling of the specjbb benchmark >> exposed some holes in how operf was processing the perf_events >> data coming from the kernel. Some of the events we can get from >> the kernel are: >> - PERF_RECORD_COMM >> - PERF_RECORD_FORK >> - PERF_RECORD_MMAP >> - PERF_RECORD_SAMPLE >> >> The "COMM" event is to notify us of the start of an executable >> application. The "FORK" event tells us when a process forks >> another process. The "MMAP" event informs us when a shared library >> (or executable anonymous memory, or the executable file itself, etc.) >> has been mmap'ed into a process's address space. A "SAMPLE" >> event occurs each time the kernel takes a sample for a process. >> >> There is no guarantee in what order these events may arrive from >> the kernel, and when a large system (say, 64 CPUs) is running >> the specjbb benchmark full bore, with all processors pegged to >> 100%, you can get some very strange out-of-order looking >> sequence of events. Things get even stranger when using Java7 >> versus Java6 since Java7 spawns many more threads. >> >> The operf code had several issues where such out-of-order >> events were not handled properly, so some major changes were >> required in the code. >> >> Signed-off-by: Maynard Johnson <may...@us...> >> --- >> libperf_events/operf_process_info.cpp | 340 +++++++++++++++++++++------------ >> libperf_events/operf_process_info.h | 68 +++---- >> libperf_events/operf_utils.cpp | 244 ++++++++++++------------ >> 3 files changed, 367 insertions(+), 285 deletions(-) >> [snip] |