Thread: [PATCH] Improve sample collection in multi-threaded apps when using "--pid" option

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 454-5900

Improve sample collection in multi-threaded apps when using "--pid" option

See oprofile bug # 260 (https://sourceforge.net/p/oprofile/bugs/260/) for reference.

operf may fail to collect samples for new threads or forked
processes in a multi-threaded app under certain conditions:

  - If operf is started on a multi-threaded app (i.e., one that uses
    pthreads) with the "--pid" option
  - If said app has already started at least one thread
  - It the app either creates new threads or forks new processes
    after operf has started

then no samples are collected for those new threads/processes.

This patch fixes that problem (mostly) and documents the limitations and
issues when using the "--pid" option. The limitations and issues are:

 1. When using "--pid" to profile a multi-threaded application that also forks
    new processes, samples for processes that are forked before profiling is
    started may not be recorded (depending on timing of thread creation and
    when operf is started)

 2. The "--lazy-conversion" option is not recommended to be used in conjunction
    with the --pid option for profiling multi-threaded processes. Depending on
    the order of thread creation (or forking of new processes), you may not get
    any samples for the new threads/processes.

Signed-off-by: Maynard Johnson <may...@us...>
---
 doc/operf.1.in                   |   35 ++++++---
 libperf_events/operf_counter.cpp |  151 ++++++++++++++++++++++++++++++++------
 libperf_events/operf_counter.h   |   24 ++++--
 libperf_events/operf_utils.cpp   |   39 +++++++++-
 pe_profiling/operf.cpp           |   58 ++++++++++++--
 5 files changed, 254 insertions(+), 53 deletions(-)

diff --git a/doc/operf.1.in b/doc/operf.1.in
index 72a46ed..5ce43c0 100644
--- a/doc/operf.1.in
+++ b/doc/operf.1.in
@@ -35,21 +35,19 @@ and lost samples, are written to the operf.log file that can be found in the
 <session_dir>/samples directory.
 .br
 
-.SH OPTIONS
+.SH RUN MODES
+One (and only one) of the following
+.SB run modes
+must be specified:
 .TP
 .BI command [args]
 The command or application to be profiled.
 .I args
-are the input arguments that the command or application requires.  One (and only one) of either
-.I command
-,
-.I --pid
-or
-.I --system-wide
-is required.
+are the input arguments that the command or application requires.
 .br
 .TP
 .BI "--pid / -p " PID
+.RS
 This option enables operf to profile a running application.
 .I PID
 should be the process ID of the process you wish to profile.  When
@@ -65,7 +63,13 @@ data it has collected.  Use
 .BI -SIGINT
 .BI <operf-PID>
 for this purpose.
-.br
+.P
+.B Limitation:
+When using this option to profile a multi-threaded application that also forks
+new processes, be aware that samples for processes that are forked before profiling
+is started may not be recorded (depending on timing of thread creation and when
+operf is started).
+.RE
 .TP
 .BI "--system-wide / -s"
 This option is for performing a system-wide profile.  You must
@@ -86,6 +90,7 @@ that when running operf with this option, the user's current working
 directory should be /root or a subdirectory of /root to avoid
 storing sample data files in locations accessible by regular users.
 .br
+.SH OTHER OPTIONS
 .TP
 .BI "--vmlinux / -k " vmlinux_path
 .RS
@@ -188,6 +193,7 @@ directory on the current path.
 .br
 .TP
 .BI "--lazy-conversion / -l"
+.RS
 Use this option to reduce the overhead of
 .BI operf
 during profiling. Normally, profile data received from the kernel is converted
@@ -200,7 +206,16 @@ particularly on busy multi-processor systems. The
 option directs
 .BI operf
 to wait until profiling is completed to do the conversion of profile data.
-.br
+.P
+.B Note:
+This option is
+.B not
+recommended to be used in conjunction with the
+.I --pid
+option for profiling multi-threaded processes. Depending
+on the order of thread creation (or forking of new processes),
+you may not get any samples for the new threads/processes.
+.RE
 .TP
 .BI "--append / -a"
 By default,
diff --git a/libperf_events/operf_counter.cpp b/libperf_events/operf_counter.cpp
index dd738aa..9986576 100644
--- a/libperf_events/operf_counter.cpp
+++ b/libperf_events/operf_counter.cpp
@@ -1,5 +1,5 @@
 /**
- * @file pe_profiling/operf_counter.cpp
+ * @file libperf_events/operf_counter.cpp
  * C++ class implementation that abstracts the user-to-kernel interface
  * for using Linux Performance Events Subsystem.
  *
@@ -11,7 +11,7 @@
  * (C) Copyright IBM Corp. 2011
  *
  * Modified by Maynard Johnson <may...@us...>
- * (C) Copyright IBM Corporation 2012
+ * (C) Copyright IBM Corporation 2012, 2014
  *
 */
 
@@ -50,6 +50,7 @@ extern bool throttled;
 extern size_t mmap_size;
 extern size_t pg_sz;
 extern bool use_cpu_minus_one;
+extern bool track_new_forks;
 
 namespace {
 
@@ -233,7 +234,7 @@ operf_counter::~operf_counter() {
 }
 
 
-int operf_counter::perf_event_open(pid_t pid, int cpu, operf_record * rec)
+int operf_counter::perf_event_open(pid_t pid, int cpu, operf_record * rec, bool print_error)
 {
 	struct {
 		u64 count;
@@ -249,16 +250,21 @@ int operf_counter::perf_event_open(pid_t pid, int cpu, operf_record * rec)
 		int ret = -1;
 		cverb << vrecord << "perf_event_open failed: " << strerror(errno) << endl;
 		if (errno == EBUSY) {
-			cerr << "The performance monitoring hardware reports EBUSY. Is another profiling tool in use?" << endl
-			     << "On some architectures, tools such as oprofile and perf being used in system-wide "
-			     << "mode can cause this problem." << endl;
+			if (print_error) {
+				cerr << "The performance monitoring hardware reports EBUSY. Is another profiling tool in use?" << endl
+				     << "On some architectures, tools such as oprofile and perf being used in system-wide "
+				     << "mode can cause this problem." << endl;
+			}
 			ret = OP_PERF_HANDLED_ERROR;
 		} else if (errno == ESRCH) {
-			cerr << "!!!! No samples collected !!!" << endl;
-			cerr << "The target program/command ended before profiling was started." << endl;
+			if (print_error) {
+				cerr << "!!!! No samples collected !!!" << endl;
+				cerr << "The target program/command ended before profiling was started." << endl;
+			}
 			ret = OP_PERF_HANDLED_ERROR;
 		} else {
-			cerr << "perf_event_open failed with " << strerror(errno) << endl;
+			if (print_error)
+				cerr << "perf_event_open failed with " << strerror(errno) << endl;
 		}
 		return ret;
 	}
@@ -299,7 +305,8 @@ operf_record::~operf_record()
 
 operf_record::operf_record(int out_fd, bool sys_wide, pid_t the_pid, bool pid_running,
                            vector<operf_event_t> & events, vmlinux_info_t vi, bool do_cg,
-                           bool separate_by_cpu, bool out_fd_is_file)
+                           bool separate_by_cpu, bool out_fd_is_file,
+                           int _convert_read_pipe, int _convert_write_pipe)
 {
 	struct sigaction sa;
 	sigset_t ss;
@@ -317,6 +324,8 @@ operf_record::operf_record(int out_fd, bool sys_wide, pid_t the_pid, bool pid_ru
 	valid = false;
 	poll_data = NULL;
 	output_fd = out_fd;
+	read_comm_pipe = _convert_read_pipe;
+	write_comm_pipe = _convert_write_pipe;
 	write_to_file = out_fd_is_file;
 	opHeader.data_size = 0;
 	num_cpus = -1;
@@ -657,6 +666,7 @@ void operf_record::setup()
 		 * "inherit" on the perf_event_attr we pass to perf_event_open when cpu is '-1'.
 		 */
 		bool inherit = !profile_process_group;
+		std::map<u32, struct comm_event>::iterator proc_it = procs.begin();
 		for (unsigned proc_idx = 0; proc_idx < num_procs; proc_idx++) {
 			for (unsigned event = 0; event < evts.size(); event++) {
 				/* For a parent process, comm.tid==comm.pid, but for child
@@ -664,14 +674,17 @@ void operf_record::setup()
 				 * we must use comm.tid for the perf_event_open call.  So
 				 * we can use comm.tid for all cases.
 				 */
-				pid_t pid_for_open = profile_process_group ? procs[proc_idx].tid
-				                                           : pid_to_profile;
+				pid_t pid_for_open;
+				if (profile_process_group)
+					pid_for_open = proc_it++->second.tid;
+				else
+					pid_for_open = pid_to_profile;
 				operf_counter op_ctr(operf_counter(evts[event],
 				                                   (!pid_started && !system_wide),
 				                                   callgraph, separate_cpu,
 				                                   inherit, event));
 				if ((rc = op_ctr.perf_event_open(pid_for_open,
-				                                 real_cpu, this)) < 0) {
+				                                 real_cpu, this, true)) < 0) {
 					err_msg = "Internal Error.  Perf event setup failed.";
 					goto error;
 				}
@@ -716,24 +729,93 @@ void operf_record::record_process_info(void)
 {
 	map<unsigned int, unsigned int> pids_mapped;
 	pid_t last_tgid = -1;
-	for (unsigned int proc_idx = 0; proc_idx < procs.size(); proc_idx++)
+	std::map<u32, struct comm_event>::iterator proc_it = procs.begin();
+	for (unsigned int proc_idx = 0; proc_idx < procs.size(); proc_idx++, proc_it++)
 	{
-		int num = OP_perf_utils::op_write_output(output_fd, &procs[proc_idx],
-		                                         procs[proc_idx].header.size);
+		struct comm_event ce = proc_it->second;
+		int num = OP_perf_utils::op_write_output(output_fd, &ce, ce.header.size);
 		add_to_total(num);
 		if (cverb << vrecord)
-			cout << "Created COMM event for " << procs[proc_idx].comm << endl;
+			cout << "Created COMM event for " << ce.comm << endl;
 
-		if (((pid_t)(procs[proc_idx].pid) == last_tgid) ||
-				(pids_mapped.find(procs[proc_idx].pid) != pids_mapped.end()))
+		if (((pid_t)(ce.pid) == last_tgid) ||
+				(pids_mapped.find(ce.pid) != pids_mapped.end()))
 			continue;
-		OP_perf_utils::op_record_process_exec_mmaps(procs[proc_idx].tid,
-		                                            procs[proc_idx].pid,
+		OP_perf_utils::op_record_process_exec_mmaps(ce.tid,
+		                                            ce.pid,
 		                                            output_fd, this);
-		pids_mapped[procs[proc_idx].pid] = last_tgid = procs[proc_idx].pid;
+		pids_mapped[ce.pid] = last_tgid = ce.pid;
+	}
+}
+
+int operf_record::_start_recoding_new_thread(pid_t id)
+{
+	string err_msg;
+	int num_mmaps, rc, fd_for_set_output = -1;
+	struct comm_event ce;
+	u64 sample_id;
+	struct pollfd * old_polldata = poll_data;
+
+	num_mmaps = sizeof(poll_data)/sizeof(poll_data[0]);
+	num_mmaps++;
+	poll_data = new struct pollfd [num_mmaps];
+	// Copy only the existing pollfd objects from the array.  The new pollfd will
+	// be filled in via the call to _prepare_to_record_one_fd.
+	for (int i = 0; i < num_mmaps - 1; i++)
+		poll_data[i] = old_polldata[i];
+	delete[] old_polldata;
+	// Make a pseudo comm_event object.  At this point, the
+	// only field we need to set is tid.
+	memset(&ce, sizeof(ce), 0);
+	ce.tid = id;
+	add_process(ce);
+
+	for (unsigned event = 0; event < evts.size(); event++) {
+		operf_counter op_ctr(operf_counter(evts[event],
+		                                   (!pid_started && !system_wide),
+		                                   callgraph, separate_cpu,
+		                                   false, event));
+		if (op_ctr.perf_event_open(id, -1, this, false) < 0) {
+			sample_id = OP_PERF_NO_SAMPLE_ID;
+			// Send special value to convert process to indicate failure
+			ssize_t len = write(write_comm_pipe, &sample_id, sizeof(sample_id));
+			if (len < 0)
+				perror("Internal error on convert write_comm_pipe");
+			return -1;
+		}
+		perfCounters.push_back(op_ctr);
+		int fd = op_ctr.get_fd();
+		if (event == 0) {
+			rc = _prepare_to_record_one_fd(num_mmaps - 1, fd);
+			fd_for_set_output = fd;
+		} else {
+			if ((rc = ioctl(fd, PERF_EVENT_IOC_SET_OUTPUT,
+			                fd_for_set_output)) < 0)
+				perror("_start_recoding_new_thread: ioctl #1 failed");
+		}
+
+		if (rc < 0)
+			return rc;
+
+		if ((rc = ioctl(fd, PERF_EVENT_IOC_ENABLE)) < 0) {
+			perror("_start_recoding_new_thread: ioctl #2 failed");
+			return rc;
+		}
+
+		sample_id = opHeader.h_attrs[event].ids.back();
+		ssize_t len = write(write_comm_pipe, &sample_id, sizeof(sample_id));
+		if (len < 0)
+			perror("Internal error on convert write_comm_pipe");
+		else if (len != sizeof(sample_id))
+			cerr << "Incomplete write convert to write_comm_pipe" << endl;
+		else
+			cverb << vrecord << "Sent sample_id " << sample_id << " to convert process" << endl;
 	}
+
+	return 0;
 }
 
+
 void operf_record::recordPerfData(void)
 {
 	bool disabled = false;
@@ -744,6 +826,8 @@ void operf_record::recordPerfData(void)
 	cerr << "operf: Profiler started" << endl;
 	while (1) {
 		int prev = sample_reads;
+		pid_t pi;
+		ssize_t len;
 
 		for (size_t i = 0; i < samples_array.size(); i++) {
 			if (samples_array[i].base)
@@ -755,6 +839,20 @@ void operf_record::recordPerfData(void)
 		if (prev == sample_reads) {
 			(void)poll(poll_data, poll_count, -1);
 		}
+		if (!quit && track_new_forks && procs.size() > 1) {
+			len = read(read_comm_pipe, &pi, sizeof(pi));
+
+			if (len < 0 && errno != EAGAIN) {
+				cverb << vrecord << "Non-fatal error: read_comm_pipe returned too few bytes" << endl;
+			} else if (len == sizeof(pi) && (procs.find(pi) == procs.end())) {
+				// Start profiling this new thread
+				cverb << vrecord << "Start recording for new thread " << pi << endl;
+				// Don't treat as fatal error if it doesn't work
+				if (_start_recoding_new_thread(pi) < 0)
+					cerr << "Unable to collect samples for forked process " << pi
+					     << ". Process may have ended before recording could be started." << endl;
+			}
+		}
 
 		if (quit) {
 			for (unsigned int i = 0; i < perfCounters.size(); i++)
@@ -768,9 +866,11 @@ void operf_record::recordPerfData(void)
 }
 
 void operf_read::init(int sample_data_pipe_fd, string input_filename, string samples_loc, op_cpu cputype,
-                      bool systemwide)
+                      bool systemwide, int _record_write_pipe, int _record_read_pipe)
 {
 	sample_data_fd = sample_data_pipe_fd;
+	read_comm_pipe = _record_read_pipe;
+	write_comm_pipe = _record_write_pipe;
 	inputFname = input_filename;
 	sampledir = samples_loc;
 	cpu_type = cputype;
@@ -782,6 +882,11 @@ operf_read::~operf_read()
 	evts.clear();
 }
 
+void operf_read::add_sample_id_to_opHeader(u64 sample_id)
+{
+	for (unsigned int i = 0; i < evts.size(); i++)
+		opHeader.h_attrs[i].ids.push_back(sample_id);
+}
 
 int operf_read::_read_header_info_with_ifstream(void)
 {
diff --git a/libperf_events/operf_counter.h b/libperf_events/operf_counter.h
index 0874616..dfb4336 100644
--- a/libperf_events/operf_counter.h
+++ b/libperf_events/operf_counter.h
@@ -1,5 +1,5 @@
 /**
- * @file pe_profiling/operf_counter.h
+ * @file libperf_events/operf_counter.h
  * C++ class definition that abstracts the user-to-kernel interface
  * for using Linux Performance Events Subsystem.
  *
@@ -11,7 +11,7 @@
  * (C) Copyright IBM Corp. 2011
  *
  * Modified by Maynard Johnson <may...@us...>
- * (C) Copyright IBM Corporation 2012
+ * (C) Copyright IBM Corporation 2012, 2014
  *
  */
 
@@ -51,6 +51,7 @@ op_perf_event_open(struct perf_event_attr * attr,
 }
 
 #define OP_PERF_HANDLED_ERROR -101
+#define OP_PERF_NO_SAMPLE_ID 0xdeadbeefdeadbeefULL
 
 
 class operf_counter {
@@ -58,7 +59,7 @@ public:
 	operf_counter(operf_event_t & evt, bool enable_on_exec, bool callgraph,
 	              bool separate_by_cpu, bool inherit, int event_number);
 	~operf_counter();
-	int perf_event_open(pid_t pid, int cpu, operf_record * pr);
+	int perf_event_open(pid_t pid, int cpu, operf_record * pr, bool print_error);
 	const struct perf_event_attr * the_attr(void) const { return &attr; }
 	int get_fd(void) const { return fd; }
 	int get_id(void) const { return id; }
@@ -82,12 +83,13 @@ public:
 	 */
 	operf_record(int output_fd, bool sys_wide, pid_t the_pid, bool pid_running,
 	             std::vector<operf_event_t> & evts, OP_perf_utils::vmlinux_info_t vi,
-	             bool callgraph, bool separate_by_cpu, bool output_fd_is_file);
+	             bool callgraph, bool separate_by_cpu, bool output_fd_is_file,
+	             int _convert_read_pipe, int _convert_write_pipe);
 	~operf_record();
 	void recordPerfData(void);
 	int out_fd(void) const { return output_fd; }
 	void add_to_total(int n) { total_bytes_recorded += n; }
-	void add_process(struct comm_event proc) { procs.push_back(proc); }
+	void add_process(struct comm_event proc) { procs[proc.tid] = proc; }
 	unsigned int get_total_bytes_recorded(void) const { return total_bytes_recorded; }
 	void register_perf_event_id(unsigned counter, u64 id, perf_event_attr evt_attr);
 	bool get_valid(void) { return valid; }
@@ -97,11 +99,14 @@ private:
 	void setup(void);
 	int prepareToRecord(void);
 	int _prepare_to_record_one_fd(int idx, int fd);
+	int _start_recoding_new_thread(pid_t id);
 	void record_process_info(void);
 	void write_op_header_info(void);
 	int _write_header_to_file(void);
 	int _write_header_to_pipe(void);
 	int output_fd;
+	int read_comm_pipe;
+	int write_comm_pipe;
 	bool write_to_file;
 	// Array of size 'num_cpus_used_for_perf_event_open * num_pids * num_events'
 	struct pollfd * poll_data;
@@ -113,7 +118,7 @@ private:
 	 * that information in a collection of type 'comm_event'.  We'll use this collection
 	 * for synthesizing PERF_RECORD_COMM events into the profile data stream.
 	 */
-	std::vector<struct comm_event> procs;
+	std::map<u32, struct comm_event> procs;
 	bool pid_started;
 	bool system_wide;
 	bool callgraph;
@@ -134,16 +139,21 @@ public:
 	: sample_data_fd(-1), inputFname(""), evts(_evts), cpu_type(CPU_NO_GOOD)
 	  { valid = syswide = false;}
 	void init(int sample_data_pipe_fd, std::string input_filename, std::string samples_dir, op_cpu cputype,
-	          bool systemwide);
+	          bool systemwide, int _record_write_pipe, int _record_read_pipe);
 	~operf_read();
 	int readPerfHeader(void);
 	unsigned int convertPerfData(void);
 	bool is_valid(void) {return valid; }
 	int get_eventnum_by_perf_event_id(u64 id) const;
 	inline const operf_event_t * get_event_by_counter(u32 counter) { return &evts[counter]; }
+	int get_write_comm_pipe(void) { return write_comm_pipe; }
+	int get_read_comm_pipe(void)  { return read_comm_pipe; }
+	void add_sample_id_to_opHeader(u64 sample_id);
 
 private:
 	int sample_data_fd;
+	int write_comm_pipe;
+	int read_comm_pipe;
 	std::string inputFname;
 	std::string sampledir;
 	std::ifstream istrm;
diff --git a/libperf_events/operf_utils.cpp b/libperf_events/operf_utils.cpp
index 68010c1..54a0ace 100644
--- a/libperf_events/operf_utils.cpp
+++ b/libperf_events/operf_utils.cpp
@@ -10,7 +10,7 @@
  * (C) Copyright IBM Corp. 2011
  *
  * Modified by Maynard Johnson <may...@us...>
- * (C) Copyright IBM Corporation 2012, 2013
+ * (C) Copyright IBM Corporation 2012, 2013, 2014
  *
  */
 
@@ -48,6 +48,7 @@ extern pid_t app_PID;
 extern verbose vrecord;
 extern verbose vconvert;
 extern void __set_event_throttled(int index);
+extern bool track_new_forks;
 
 using namespace std;
 
@@ -87,6 +88,8 @@ static void __handle_fork_event(event_t * event)
 	operf_process_info * parent = NULL;
 	operf_process_info * forked_proc = NULL;
 
+	// First, see if we already have a proc_info object for the parent process
+	// that did the fork
 	it = process_map.find(event->fork.ppid);
 	if (it != process_map.end()) {
 		parent = it->second;
@@ -101,14 +104,42 @@ static void __handle_fork_event(event_t * event)
 		process_map[event->fork.ppid] = parent;
 	}
 
-	 /* If the forked process's pid is the same as the parent's, we simply ignore the FORK
-	 * event. This is because operf_process_info objects are stored in the map collection
-	 * by pid, meaning that the forked process and its parent reference the same
+	/* If the user requested to profile by "--pid", then we must notify the
+	 * recording process whenever we see a fork event. If the record process
+	 * isn't already recording samples for this thread/process, it will start
+	 * recording now.
+	 */
+	if (track_new_forks) {
+		if (cverb << vconvert)
+			cout << "Inform record process of new pid/tid "
+			     << event->fork.pid << "/" << event->fork.tid << endl;
+		pid_t id = (event->fork.pid == event->fork.ppid) ? event->fork.tid :
+				event->fork.pid;
+		ssize_t len = write(operfRead.get_write_comm_pipe(), &id, sizeof(id));
+		if (len < 0)
+			perror("Internal error on record write_comm_pipe");
+		else if (len != sizeof(id))
+			cerr << "Incomplete write to record write_comm_pipe" << endl;
+		u64 sample_id;
+		// get sample id from recording process
+		len = read(operfRead.get_read_comm_pipe(), &sample_id, sizeof(sample_id));
+		if (sample_id == OP_PERF_NO_SAMPLE_ID) {
+			cverb << vconvert << "convert: No sample_id from record process" << endl;
+		} else {
+			cverb << vconvert << "Add sample_id " << sample_id << " to opHeader" << endl;
+			operfRead.add_sample_id_to_opHeader(sample_id);
+		}
+	}
+
+	/* If the forked process's pid is the same as the parent's, we simply ignore
+	 * the FORK event. This is because operf_process_info objects are stored in the map
+	 * collection by pid, meaning that the forked process and its parent reference the same
 	 * operf_process_info object.
 	 */
 	if (event->fork.pid == event->fork.ppid)
 		return;
 
+	// Now try to find a proc_info for the forked process itself.
 	it = process_map.find(event->fork.pid);
 	if (it == process_map.end()) {
 		forked_proc = new operf_process_info(event->fork.pid, NULL, false, false);
diff --git a/pe_profiling/operf.cpp b/pe_profiling/operf.cpp
index ec85f6c..51f55a5 100644
--- a/pe_profiling/operf.cpp
+++ b/pe_profiling/operf.cpp
@@ -11,7 +11,7 @@
  * (C) Copyright IBM Corp. 2011
  *
  * Modified by Maynard Johnson <may...@us...>
- * (C) Copyright IBM Corporation 2012, 2013
+ * (C) Copyright IBM Corporation 2012, 2013, 2014
  *
  */
 
@@ -77,12 +77,19 @@ int kptr_restrict;
 char * start_time_human_readable;
 std::vector<operf_event_t> events;
 operf_read operfRead(events);
+/* With certain operf options, we have to take extra steps to track new threads
+ * and processes that an app may create via pthread_create, fork, etc.  Note that
+ * any such thread or process creation APIs will result in a PERF_RECORD_FORK event,
+ * so we handle these new threads/processes in operf_utils::__handle_fork_event.
+ */
+bool track_new_forks;
 
 
 #define DEFAULT_OPERF_OUTFILE "operf.data"
 #define KERN_ADDR_SPACE_START_SYMBOL  "_text"
 #define KERN_ADDR_SPACE_END_SYMBOL    "_etext"
 
+static operf_record * operfRecord = NULL;
 static char * app_name_SAVE = NULL;
 static char ** app_args = NULL;
 static 	pid_t jitconv_pid = -1;
@@ -96,6 +103,14 @@ static char start_time_str[32];
 static bool jit_conversion_running;
 static void convert_sample_data(void);
 static int sample_data_pipe[2];
+static int app_ready_pipe[2], start_app_pipe[2], operf_record_ready_pipe[2];
+// The operf_convert_record_write_pipe is used for the convert process to send
+// forked PID data to the record process.
+static int operf_convert_record_write_pipe[2];
+// The operf_record_convert_write_pipe is used for the record process to send
+// data to the convert process in response to the forked PID data.
+static int operf_record_convert_write_pipe[2];
+
 bool ctl_c = false;
 bool pipe_closed = false;
 
@@ -225,8 +240,6 @@ void set_signals_for_parent(void)
 	}
 }
 
-static int app_ready_pipe[2], start_app_pipe[2], operf_record_ready_pipe[2];
-
 static string args_to_string(void)
 {
 	string ret;
@@ -310,6 +323,14 @@ int start_profiling(void)
 		perror("Internal error: could not create pipe");
 		return -1;
 	}
+	if (pipe2(operf_convert_record_write_pipe, O_NONBLOCK) < 0) {
+		perror("Internal error: could not create pipe");
+		return -1;
+	}
+	if (pipe(operf_record_convert_write_pipe) < 0) {
+		perror("Internal error: could not create pipe");
+		return -1;
+	}
 	operf_record_pid = fork();
 	if (operf_record_pid < 0) {
 		return -1;
@@ -318,6 +339,8 @@ int start_profiling(void)
 		int exit_code = EXIT_SUCCESS;
 		_set_basic_SIGINT_handler_for_child();
 		close(operf_record_ready_pipe[0]);
+		close(operf_convert_record_write_pipe[1]);
+		close(operf_record_convert_write_pipe[0]);
 		if (!operf_options::post_conversion)
 			close(sample_data_pipe[0]);
 		/*
@@ -343,11 +366,12 @@ int start_profiling(void)
 			} else {
 				outfd = sample_data_pipe[1];
 			}
-			operf_record operfRecord(outfd, operf_options::system_wide, app_PID,
+			operfRecord = new operf_record(outfd, operf_options::system_wide, app_PID,
 			                         (operf_options::pid == app_PID), events, vi,
 			                         operf_options::callgraph,
-			                         operf_options::separate_cpu, operf_options::post_conversion);
-			if (operfRecord.get_valid() == false) {
+			                         operf_options::separate_cpu, operf_options::post_conversion,
+			                         operf_convert_record_write_pipe[0], operf_record_convert_write_pipe[1]);
+			if (operfRecord->get_valid() == false) {
 				/* If valid is false, it means that one of the "known" errors has
 				 * occurred:
 				 *   - profiled process has already ended
@@ -370,9 +394,10 @@ int start_profiling(void)
 			}
 
 			// start recording
-			operfRecord.recordPerfData();
+			operfRecord->recordPerfData();
 			cverb << vdebug << "Total bytes recorded from perf events: " << dec
-					<< operfRecord.get_total_bytes_recorded() << endl;
+					<< operfRecord->get_total_bytes_recorded() << endl;
+			delete operfRecord;
 		} catch (const runtime_error & re) {
 			/* If the user does ctl-c, the operf-record process may get interrupted
 			 * in a system call, causing problems with writes to the sample data pipe.
@@ -388,6 +413,9 @@ int start_profiling(void)
 		_exit(exit_code);
 
 fail_out:
+		if (operfRecord)
+			delete operfRecord;
+
 		if (!ready){
 			/* ready==0 means we've not yet told parent we're ready,
 			 * but the parent is reading our pipe.  So we tell the
@@ -615,6 +643,10 @@ static end_code_t _run(void)
 			// parent
 			close(sample_data_pipe[0]);
 			close(sample_data_pipe[1]);
+			close(operf_convert_record_write_pipe[0]);
+			close(operf_convert_record_write_pipe[1]);
+			close(operf_record_convert_write_pipe[0]);
+			close(operf_record_convert_write_pipe[1]);
 		}
 	}
 
@@ -909,7 +941,11 @@ static void convert_sample_data(void)
 		inputfd = sample_data_pipe[0];
 		inputfname = "";
 	}
-	operfRead.init(inputfd, inputfname, current_sampledir, cpu_type, operf_options::system_wide);
+	close(operf_record_convert_write_pipe[1]);
+	close(operf_convert_record_write_pipe[0]);
+	operfRead.init(inputfd, inputfname, current_sampledir, cpu_type,
+	               operf_options::system_wide, operf_convert_record_write_pipe[1],
+	               operf_record_convert_write_pipe[0]);
 	if ((rc = operfRead.readPerfHeader()) < 0) {
 		if (rc != OP_PERF_HANDLED_ERROR)
 			cerr << "Error: Cannot create read header info for sample data " << endl;
@@ -1389,6 +1425,10 @@ static void process_args(int argc, char * const argv[])
 		string startEnd = _process_vmlinux(operf_options::vmlinux);
 		operf_create_vmlinux(operf_options::vmlinux.c_str(), startEnd.c_str());
 	}
+	if (operf_options::pid && !operf_options::post_conversion)
+		track_new_forks = true;
+	else
+		track_new_forks = false;
 
 	return;
 }
-- 
1.7.1





Thread: [PATCH] Improve sample collection in multi-threaded apps when using "--pid" option

oprofile-list