Thread: using oprofile to debug multi-processes programs on linux

oprofile-list

From: benzhi c. <cao...@gm...> - 2013-09-13 07:18:22

Hi, can OProfile be used to profile the performance of multi-process
programs? And if it can, how do I see the performance of each process?
(P.S.: The online manual shows that it can be used to profile multi-threaded
programs, but I don't know whether it can be used for multi-process programs.)
Any help will be appreciated, thanks a lot~
Best~
Emily

Re: using oprofile to debug multi-processes programs on linux

From: Maynard J. <may...@us...> - 2013-09-13 19:17:23

On 09/13/2013 02:18 AM, benzhi cao wrote:
> Hi, can OProfile be used to profile the performance of multi-process programs? And if it can, how do I see the performance of each process? (P.S.: The online manual shows that it can be used to profile multi-threaded programs, but I don't know whether it can be used for multi-process programs.) Any help will be appreciated, thanks a lot~
Hi, Emily,
Hopefully, you're using oprofile 0.9.9 so you can use operf instead of the older "legacy" opcontrol commands.  Using operf, you can profile just the particular application (or process) you're interested in.  If your application does fork/exec to create new child processes, operf will, by default, collect all sample data for the parent and children, but will aggregate all sample data.  (ATTENTION:  0.9.9 has some key bug fixes for operf relating to following forked children.)

You can specify "--separate-thread" (see operf's man page for details) so that samples are separated by process and thread.  If you do collect a --separate-thread profile, be aware that opreport, being a text-based report generator, does not handle too many axes of separation very well.  You may get a report that looks like a jumbled mess, but it will show a list of process IDs near the top of the report.  You can use that list of PIDs to generate per-process reports -- e.g., 'opreport tgid:<pid#> [options]'.

In some cases, opreport gives up and tells you that you have to provide a profile specification (e.g., 'tgid:<pid#>' or, if profiling with multiple events, 'event:<event_name>').  More information on profile specifications can be found at http://oprofile.sourceforge.net/doc/results.html#profile-spec.

-Maynard
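
The workflow described in this reply can be sketched as a small Python helper that only builds the command lines (it does not run them). This is a minimal sketch under stated assumptions: the application name './my_app' and the PIDs 1234 and 1240 are hypothetical placeholders, and the only oprofile pieces used are the ones named in the reply, namely 'operf --separate-thread' and the 'tgid:<pid#>' profile specification.

```python
import shlex


def operf_command(app_argv):
    """Build an operf invocation that separates samples by process and thread."""
    # --separate-thread is the option named in the reply above.
    return ["operf", "--separate-thread", *app_argv]


def per_process_report_commands(pids):
    """Build one 'opreport tgid:<pid>' command per process ID."""
    return [["opreport", f"tgid:{pid}"] for pid in pids]


if __name__ == "__main__":
    # './my_app' and the PIDs are hypothetical; take real PIDs from the
    # list printed near the top of the plain 'opreport' output.
    print(shlex.join(operf_command(["./my_app"])))
    for cmd in per_process_report_commands([1234, 1240]):
        print(shlex.join(cmd))
```

Building argument lists rather than shell strings keeps the sketch easy to pass to subprocess.run later, if you choose to execute the commands from a script.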



Re: using oprofile to debug multi-processes programs on linux

From: Maynard J. <may...@us...> - 2013-09-16 13:35:17

On 09/14/2013 08:24 PM, benzhi cao wrote:
> Thanks so much for your reply. I can now collect the information for every process.
> I also want to collect L2 cache misses, so I tried to use ophelp to find an event
> for that. I think the event is LLC_MISSES, but I also found some people who use
> l2_lines_in to profile L2 cache misses, so I am confused about which is the right
> event. My hardware is Intel 64 architecture.
> Best~
> Emily
Adding oprofile-list back to cc so that maybe someone else on the list can help, since Intel is not my primary architecture of expertise.

-Maynard

Re: Re: using oprofile to debug multi-processes programs on linux

From: benzhi c. <cao...@gm...> - 2013-09-21 10:46:52

Thanks for your reply. But now I have another question. When I run my app
with 32 threads and use opreport to show the results, the output is a mess
and I can't read it easily.
Do you know how to see the results clearly?
Best~
Emily


2013/9/18 Michael Petlan <mp...@re...>

> Hi,
>
> As far as I know, L2_LINES_IN can be used for that; see this reference guide:
> http://software.intel.com/sites/products/documentation/doclib/stdxe/2013/amplifierxe/win/win_reference/pmp/events/about_l2_cache_events.html
>
> LLC_MISSES should cover the last-level cache, which may be the L3.
>
> I have an L2_CACHE_MISS event for this, but you may not.
>
> Please treat this as unofficial information.
>
> Regards,
> Michael

Re: using oprofile to debug multi-processes programs on linux

From: Maynard J. <may...@us...> - 2013-09-23 15:16:53

On 09/21/2013 05:46 AM, benzhi cao wrote:
> Thanks for your reply. But now I have another question. When I run my app with 32 threads and use opreport to show the results, the output is a mess and I can't read it easily.
> Do you know how to see the results clearly?
I mentioned in my first response that this likely would be the case.  Did you try the tips I suggested?  Here's an example of what I was trying to say:

If I use 'operf --separate-thread' to profile a Java 1.6 app, doing 'opreport' with no options shows the following jumbled mess:

[mpjohn@oc1757000783 myJavaStuff]$ opreport 
Using /home/mpjohn/myJavaStuff/oprofile_data/samples/ for samples directory.
CPU: Intel Sandy Bridge microarchitecture, speed 2.401e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
Processes with a thread ID of 21373
Processes with a thread ID of 21376
Processes with a thread ID of 21378
Processes with a thread ID of 21379
Processes with a thread ID of 21380
Processes with a thread ID of 21382
Processes with a thread ID of 21383
Processes with a thread ID of 21384
Processes with a thread ID of 21385
Processes with a thread ID of 21386
Processes with a thread ID of 21387
Processes with a thread ID of 21388
Processes with a thread ID of 21389
Processes with a thread ID of 21390
Processes with a thread ID of 21391
        tid:21373|        tid:21376|        tid:21378|        tid:21379|        tid:21380|        tid:21382|        tid:21383|        tid:21384|        tid:21385|        tid:21386|        tid:21387|        tid:21388|        tid:21389|        tid:21390|        tid:21391|
  samples|      %|  samples|      %|  samples|      %|  samples|      %|  samples|      %|  samples|      %|  samples|      %|  samples|      %|  samples|      %|  samples|      %|  samples|      %|  samples|      %|  samples|      %|  samples|      %|  samples|      %|
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
       82 100.000      3763 100.000      2763 100.000        91 100.000         1 100.000         3 100.000        43 100.000         3 100.000         2 100.000         1 100.000         6 100.000         2 100.000       163 100.000         2 100.000    109761 100.000 java

 . . . . blah, blah

=======================================

It's practically impossible to read such a report manually.  The easiest thing for you to do is to pick individual processes (or threads) to focus on, one at a time; for example:

Focusing on the first process using 'opreport tgid:21373', I get the exact same jumbled mess, showing all the individual thread IDs.  Notice the profile specification "tgid:21373".  "tgid" stands for "thread group ID", which basically means you're asking opreport to show you all data for that process and its child threads.  Since this is Java 1.6, I happen to know that the JVM creates threads to do its work (versus fork/exec, which would create new child *processes*).  So I then randomly choose one of the other threads in the list above and use "tid" in the profile specification to see profile data for that thread:

opreport tid:21378
Using /home/mpjohn/myJavaStuff/oprofile_data/samples/ for samples directory.
CPU: Intel Sandy Bridge microarchitecture, speed 2.401e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
CPU_CLK_UNHALT...|
  samples|      %|
------------------
     2763 100.000 java
	CPU_CLK_UNHALT...|
	  samples|      %|
	------------------
	     2554 92.4358 libj9jit24.so
	       68  2.4611 no-vmlinux
	       50  1.8096 libc-2.12.so
	       27  0.9772 libj9vm24.so
	       24  0.8686 libj9thr24.so
	       16  0.5791 libj9prt24.so
	       12  0.4343 libpthread-2.12.so
	        5  0.1810 libj9hookable24.so

======================

Hope that helps.

-Maynard
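
Instead of choosing a thread at random, the thread list can be ranked by sample count. The following is a minimal Python sketch, not part of oprofile: it assumes the header lines and the summary row have exactly the layout shown in the report above ("Processes with a thread ID of N", then alternating samples and percent fields ending in the image name), which may differ with other opreport versions or options. The function names are hypothetical.

```python
import re

# Header line format taken from the report shown in this message.
TID_LINE = re.compile(r"^Processes with a thread ID of (\d+)$")


def thread_ids(report_text):
    """Return the thread IDs listed in the report header, in report order."""
    tids = []
    for line in report_text.splitlines():
        match = TID_LINE.match(line.strip())
        if match:
            tids.append(int(match.group(1)))
    return tids


def busiest_threads(tids, summary_row):
    """Pair each thread ID with its sample count, highest count first."""
    fields = summary_row.split()
    # The row alternates <samples> <percent> per thread, then the image name.
    counts = [int(field) for field in fields[0:2 * len(tids):2]]
    if len(counts) != len(tids):
        raise ValueError("summary row has fewer columns than thread IDs")
    return sorted(zip(tids, counts), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # First three threads of the report above, trimmed for brevity.
    header = "\n".join(
        f"Processes with a thread ID of {tid}" for tid in (21373, 21376, 21378)
    )
    row = "82 100.000 3763 100.000 2763 100.000 java"
    for tid, samples in busiest_threads(thread_ids(header), row):
        print(f"opreport tid:{tid}    # {samples} samples")
```

Each printed line is a per-thread report command of the kind used above, ordered so that the threads with the most samples come first.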







Re: using oprofile to debug multi-processes programs on linux

From: benzhi c. <cao...@gm...> - 2013-09-25 06:46:28

Thanks a lot, that is very helpful.
One more thing: when I profile with oprofile, I cannot tell which
functions call glibc functions such as memmove. (I have already used
the --callgraph option, but still get no result.) Do you know how to do
that? Thanks~
Best
Emily


2013/9/23, Maynard Johnson <may...@us...>:
> On 09/21/2013 05:46 AM, benzhi cao wrote:
>> Thanks for your reply. But now I have another questions. When I use 32
>> threads to run my app, and use the opreport to show the results, the
>> results were mess, and I cann't see the results easily.
>> Do you know how to see the results clearly?
> I mentioned in my first response that this likely would be the case.  Did
> you try the tips I suggested?  Here's an example of what I was trying to
> say:
>
> If I use 'operf --separate-thread' to profile a Java 1.6 app, doing
> 'opreport' with no options shows the following jumbled mess:
>
> [mpjohn@oc1757000783 myJavaStuff]$ opreport
> Using /home/mpjohn/myJavaStuff/oprofile_data/samples/ for samples
> directory.
> CPU: Intel Sandy Bridge microarchitecture, speed 2.401e+06 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit
> mask of 0x00 (No unit mask) count 100000
> Processes with a thread ID of 21373
> Processes with a thread ID of 21376
> Processes with a thread ID of 21378
> Processes with a thread ID of 21379
> Processes with a thread ID of 21380
> Processes with a thread ID of 21382
> Processes with a thread ID of 21383
> Processes with a thread ID of 21384
> Processes with a thread ID of 21385
> Processes with a thread ID of 21386
> Processes with a thread ID of 21387
> Processes with a thread ID of 21388
> Processes with a thread ID of 21389
> Processes with a thread ID of 21390
> Processes with a thread ID of 21391
>         tid:21373|        tid:21376|        tid:21378|        tid:21379|
>    tid:21380|        tid:21382|        tid:21383|        tid:21384|
> tid:21385|        tid:21386|        tid:21387|        tid:21388|
> tid:21389|        tid:21390|        tid:21391|
>   samples|      %|  samples|      %|  samples|      %|  samples|      %|
> samples|      %|  samples|      %|  samples|      %|  samples|      %|
> samples|      %|  samples|      %|  samples|      %|  samples|      %|
> samples|      %|  samples|      %|  samples|      %|
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>        82 100.000      3763 100.000      2763 100.000        91 100.000
>    1 100.000         3 100.000        43 100.000         3 100.000         2
> 100.000         1 100.000         6 100.000         2 100.000       163
> 100.000         2 100.000    109761 100.000 java
>
>  . . . . blah, blah
>
> =======================================
>
> It's practically impossible to read such a report manually.  The easiest
> thing for you to do is to pick individual processes (or threads) to focus
> on, one at a time; for example:
>
> Focusing on the first process using 'opreport tgid:21373' I can get the
> exact same jumbled mess, showing all the individual thread IDs.  Notice the
> profile specification of "tgid:21373'.  The "tgid" is "thread group ID",
> which basically means you're asking opreport to show you all data for that
> process and its child threads. Since this is Java 1.6, I happen to know that
> the JVM creates threads to do its work (versus fork/exec which would create
> new child *processes*).  So I then randomly choose one of the other threads
> in the list above and use "tid" in the profile specification to see profile
> data for that thread:
>
> opreport tid:21378
> Using /home/mpjohn/myJavaStuff/oprofile_data/samples/ for samples
> directory.
> CPU: Intel Sandy Bridge microarchitecture, speed 2.401e+06 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit
> mask of 0x00 (No unit mask) count 100000
> CPU_CLK_UNHALT...|
>   samples|      %|
> ------------------
>      2763 100.000 java
> 	CPU_CLK_UNHALT...|
> 	  samples|      %|
> 	------------------
> 	     2554 92.4358 libj9jit24.so
> 	       68  2.4611 no-vmlinux
> 	       50  1.8096 libc-2.12.so
> 	       27  0.9772 libj9vm24.so
> 	       24  0.8686 libj9thr24.so
> 	       16  0.5791 libj9prt24.so
> 	       12  0.4343 libpthread-2.12.so
> 	        5  0.1810 libj9hookable24.so
>
> ======================
>
> Hope that helps.
>
> -Maynard
>
>
>
>
>
>
>> Best~
>> Emily
>>
>>
>> 2013/9/18 Michael Petlan <mp...@re... <mailto:mp...@re...>>
>>
>>     Hi,
>>
>>     As I know, the L2_LINES_IN can be used for that, see this reference
>> guide:
>>
>> http://software.intel.com/__sites/products/documentation/__doclib/stdxe/2013/amplifierxe/__win/win_reference/pmp/events/__about_l2_cache_events.html
>> <http://software.intel.com/sites/products/documentation/doclib/stdxe/2013/amplifierxe/win/win_reference/pmp/events/about_l2_cache_events.html>
>>
>>     The LLC_MISSES should care about the last level cache, it may be the
>> L3.
>>
>>     I have L2_CACHE_MISS event for this, but maybe you haven't.
>>
>>     Please take it as a non-official information.
>>
>>     Regards,
>>     Michael
>>
>>
>>
>>     -------- Original message --------
>>     Předmět: Re: using oprofile to debug multi-processes programs on
>> linux
>>     Datum: Mon, 16 Sep 2013 08:34:03 -0500
>>     Od: Maynard Johnson <may...@us...
>> <mailto:may...@us...>>
>>     Komu: benzhi cao <cao...@gm...
>> <mailto:cao...@gm...>>
>>     Kopie: oprofile-list <oprofile-list@lists.__sourceforge.net
>> <mailto:opr...@li...>>
>>
>>     On 09/14/2013 08:24 PM, benzhi cao wrote:
>>
>>         Thanks so much for your reply. And I can collect the information
>> for every process now.
>>         Also I want to collect the L2 cache miss, so I try to use ophelp
>> to find the event that
>>         I can use for L2 cache miss. And I think the event is LLC_MISSES.
>> But I also find
>>         some guys who use l2_lines_in to profile l2 cache miss, so I was
>> confused, I don't know
>>         which is the right event? What's more, my hardware is intel
>> architecture 64.
>>         Best~
>>         Emily
>>
>>     Adding oprofile-list back to cc so that maybe someone else on the list
>> can help, since Intel is not my primary architecture of expertise.
>>
>>     -Maynard
>>
>>
>>
>>         2013/9/14 Maynard Johnson <may...@us...
>> <mailto:may...@us...> <mailto:may...@us...
>> <mailto:may...@us...>>>
>>
>>             On 09/13/2013 02:18 AM, benzhi cao wrote:
>>             > Hi, can Oprofile be  used to profile performance of
>> multi-processes programs ? And if it can, how to see the the performance
>> of each process? (P.S: The online manual shows that it can be used to
>> profile multi-threads programs, but I don't know whether it  can be used
>> for multi-processes). Any help will be appreciated, thanks a lot~
>>             Hi, Emily,
>>             Hopefully, you're using oprofile 0.9.9 so you can use operf
>> instead of the older "legacy" opcontrol commands.  Using operf, you can
>> specify to profile just the particular application (or process) you're
>> interested in. If your application does fork/exec to create new child
>> processes, operf will, by default, collect all sample data for the parent
>> and children, but will aggregate all sample data. (ATTENTION:  0.9.9 has
>> some key bug fixes for operf relating to following forked children.)  You
>> can specify "--separate-thread" (see operf's man page for details) so that
>> samples are separated by process and thread.  If you do collect a
>> --separate-thread profile, be aware that opreport, being a text-based
>> report generator does not handle too many axes of separation very well.
>> You may get a report that looks like a jumbled mess, but would show a list
>> of process IDs near the top of the report.  You could use that list of
>> PIDs to generate per-process reports -- e.g.,
>>         'opreport tgi!
>>
>>      d:<pid!
>>
>>              #> [option
>>             s]'.  In some cases, opreport gives up and tells you that you
>> have to either provide a profile specification (e.g., 'tgid:<pid#">' or,
>> if profiling with multiple events, 'event:<event_name>').  More
>> information on profile specifications can be found at
>> http://oprofile.sourceforge.__net/doc/results.html#profile-__spec
>> <http://oprofile.sourceforge.net/doc/results.html#profile-spec>.
>>
>>             -Maynard

Re: using oprofile to debug multi-processes programs on linux

From: benzhi c. <cao...@gm...> - 2013-09-25 07:45:10

By the way, I also think the callgraph is not accurate. Is that true?


2013/9/25 benzhi cao <cao...@gm...>

> Thanks a lot, it's very helpful to me.
> Also, when I profile with oprofile, I cannot tell which functions
> call glibc functions such as memmove (I have already used the
> --callgraph option, but still get no result). Do you know how to do
> that? Thanks~
> Best
> Emily

Re: using oprofile to debug multi-processes programs on linux

From: Maynard J. <may...@us...> - 2013-09-25 13:22:10

On 09/25/2013 01:46 AM, benzhi cao wrote:
> Thanks a lot, it's very helpful to me.
> Also, when I profile with oprofile, I cannot tell which functions
> call glibc functions such as memmove (I have already used the
> --callgraph option, but still get no result). Do you know how to do
> that? Thanks~
Please be specific: tell us the commands you're using, the results you get, and what you think is wrong.  The callgraph option *mostly* works, but some corner cases are mishandled.  For example, see http://oprofile.sourceforge.net/doc/interpreting-callgraph.html.

-Maynard

Re: using oprofile to debug multi-processes programs on linux

From: benzhi c. <cao...@gm...> - 2013-09-25 15:07:39

Thanks for the reminder; I forgot to include the details. The commands I
used are as follows:
1. sudo opcontrol --setup --vmlinux=/home/ssg/vmlinux --separate=lib,thread,kernel --event=CPU_CLK_UNHALTED:100000 --callgraph=10
2. sudo opcontrol --reset
3. sudo opcontrol --start
4. Run my_program.
5. sudo opcontrol --stop
6. opreport -l --callgraph=10 --merge=tgid ./my_program | less
(I use the legacy mode instead of operf because I use many signal events
in my program. I tried operf, but it did not work, so I can only profile
in legacy mode.)
The result I get is as follows:
samples  %        image name               symbol name
-------------------------------------------------------------------------------
  133       0.3858  wc                       mr_worker
  34345    99.6142  wc                       out_cmp
35263    20.9390  libc-2.15.so             __memset_sse2
  35263    100.000  libc-2.15.so             __memset_sse2 [self]
According to the online manual, this means that the out_cmp function calls
memset. But my out_cmp function is just a strcmp; it contains no memset
calls at all, so I find this strange. What do you think? Any help
would be appreciated. Thanks a lot~
Best
Emily
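
The reading of the call-graph block described above can be made explicit with a small Python sketch. It assumes the layout explained in the oprofile manual: within one entry, the non-indented line is the function being reported, indented lines above it are its callers, and indented lines below it are its callees. The function name `parse_callgraph_entry` and the dictionary keys are made up for this illustration, and the sketch only restates what opreport printed; it says nothing about whether that attribution is correct.

```python
def parse_callgraph_entry(lines):
    """Split one opreport call-graph entry into (callers, focus, callees)."""
    callers, callees, focus = [], [], None
    for line in lines:
        fields = line.split(None, 3)
        # Skip blank lines, dashed separators, and the column header.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        row = {
            "samples": int(fields[0]),
            "percent": float(fields[1]),
            "image": fields[2],
            "symbol": fields[3].strip(),
        }
        if not line[0].isspace():
            focus = row            # non-indented line: the function itself
        elif focus is None:
            callers.append(row)    # indented, above the focus line
        else:
            callees.append(row)    # indented, below the focus line
    return callers, focus, callees


# The entry posted above, copied with its original indentation.
ENTRY = """\
-------------------------------------------------------------------------------
  133       0.3858  wc                       mr_worker
  34345    99.6142  wc                       out_cmp
35263    20.9390  libc-2.15.so             __memset_sse2
  35263    100.000  libc-2.15.so             __memset_sse2 [self]
"""

if __name__ == "__main__":
    callers, focus, callees = parse_callgraph_entry(ENTRY.splitlines())
    print("function:", focus["symbol"])
    print("callers: ", [c["symbol"] for c in callers])
    print("callees: ", [c["symbol"] for c in callees])
```

Under that layout, the report lists mr_worker and out_cmp as callers of __memset_sse2, which matches the interpretation given in the message.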





2013/9/25 Maynard Johnson <may...@us...>

> On 09/25/2013 01:46 AM, benzhi cao wrote:
> > Thanks a lot, it's very helpful to me.
> > What's more, when I profile with oprofile, I can not know which
> > function call the glibc functions like memmove,(I have already used
> > the --callgraph options, but still no result). Do you know how to do
> > that? Thanks~
> Please be specific by telling us the commands you're using, the results
> you get, and what you think is wrong.  The callgraph option works *mostly*.
>  There are some corner cases mis-handled.  For example, see
> http://oprofile.sourceforge.net/doc/interpreting-callgraph.html.
>
> -Maynard
> > Best
> > Emily
> >
> >
> > 2013/9/23, Maynard Johnson <may...@us...>:
> >> On 09/21/2013 05:46 AM, benzhi cao wrote:
> >>> Thanks for your reply. But now I have another questions. When I use 32
> >>> threads to run my app, and use the opreport to show the results, the
> >>> results were mess, and I cann't see the results easily.
> >>> Do you know how to see the results clearly?
> >> I mentioned in my first response that this likely would be the case.
>  Did
> >> you try the tips I suggested?  Here's an example of what I was trying to
> >> say:
> >>
> >> If I use 'operf --separate-thread' to profile a Java 1.6 app, doing
> >> 'opreport' with no options shows the following jumbled mess:
> >>
> >> [mpjohn@oc1757000783 myJavaStuff]$ opreport
> >> Using /home/mpjohn/myJavaStuff/oprofile_data/samples/ for samples
> >> directory.
> >> CPU: Intel Sandy Bridge microarchitecture, speed 2.401e+06 MHz
> (estimated)
> >> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a
> unit
> >> mask of 0x00 (No unit mask) count 100000
> >> Processes with a thread ID of 21373
> >> Processes with a thread ID of 21376
> >> Processes with a thread ID of 21378
> >> Processes with a thread ID of 21379
> >> Processes with a thread ID of 21380
> >> Processes with a thread ID of 21382
> >> Processes with a thread ID of 21383
> >> Processes with a thread ID of 21384
> >> Processes with a thread ID of 21385
> >> Processes with a thread ID of 21386
> >> Processes with a thread ID of 21387
> >> Processes with a thread ID of 21388
> >> Processes with a thread ID of 21389
> >> Processes with a thread ID of 21390
> >> Processes with a thread ID of 21391
> >>         tid:21373|        tid:21376|        tid:21378|        tid:21379|
> >>    tid:21380|        tid:21382|        tid:21383|        tid:21384|
> >> tid:21385|        tid:21386|        tid:21387|        tid:21388|
> >> tid:21389|        tid:21390|        tid:21391|
> >>   samples|      %|  samples|      %|  samples|      %|  samples|      %|
> >> samples|      %|  samples|      %|  samples|      %|  samples|      %|
> >> samples|      %|  samples|      %|  samples|      %|  samples|      %|
> >> samples|      %|  samples|      %|  samples|      %|
> >>
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> >>        82 100.000      3763 100.000      2763 100.000        91 100.000
> >>    1 100.000         3 100.000        43 100.000         3 100.000
>     2
> >> 100.000         1 100.000         6 100.000         2 100.000       163
> >> 100.000         2 100.000    109761 100.000 java
> >>
> >>  . . . . blah, blah
> >>
> >> =======================================
> >>
> >> It's practically impossible to read such a report manually.  The easiest
> >> thing for you to do is to pick individual processes (or threads) to focus
> >> on, one at a time; for example:
> >>
> >> Focusing on the first process using 'opreport tgid:21373' I can get the
> >> exact same jumbled mess, showing all the individual thread IDs.  Notice the
> >> profile specification of 'tgid:21373'.  The "tgid" is "thread group ID",
> >> which basically means you're asking opreport to show you all data for that
> >> process and its child threads. Since this is Java 1.6, I happen to know that
> >> the JVM creates threads to do its work (versus fork/exec which would create
> >> new child *processes*).  So I then randomly choose one of the other threads
> >> in the list above and use "tid" in the profile specification to see profile
> >> data for that thread:
> >>
> >> opreport tid:21378
> >> Using /home/mpjohn/myJavaStuff/oprofile_data/samples/ for samples
> >> directory.
> >> CPU: Intel Sandy Bridge microarchitecture, speed 2.401e+06 MHz (estimated)
> >> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit
> >> mask of 0x00 (No unit mask) count 100000
> >> CPU_CLK_UNHALT...|
> >>   samples|      %|
> >> ------------------
> >>      2763 100.000 java
> >>      CPU_CLK_UNHALT...|
> >>        samples|      %|
> >>      ------------------
> >>           2554 92.4358 libj9jit24.so
> >>             68  2.4611 no-vmlinux
> >>             50  1.8096 libc-2.12.so
> >>             27  0.9772 libj9vm24.so
> >>             24  0.8686 libj9thr24.so
> >>             16  0.5791 libj9prt24.so
> >>             12  0.4343 libpthread-2.12.so
> >>              5  0.1810 libj9hookable24.so
> >>
> >> ======================
> >>
> >> Hope that helps.
> >>
> >> -Maynard
> >>
> >>
> >>
> >>
> >>
> >>
> >>> Best~
> >>> Emily
> >>>
> >>>
> >>> 2013/9/18 Michael Petlan <mp...@re...>
> >>>
> >>>     Hi,
> >>>
> >>>     As far as I know, L2_LINES_IN can be used for that; see this reference
> >>>     guide:
> >>>
> >>>     http://software.intel.com/sites/products/documentation/doclib/stdxe/2013/amplifierxe/win/win_reference/pmp/events/about_l2_cache_events.html
> >>>
> >>>     LLC_MISSES should cover the last-level cache, which may be the L3.
> >>>
> >>>     I have an L2_CACHE_MISS event for this, but you may not.
> >>>
> >>>     Please treat this as unofficial information.
> >>>
> >>>     Regards,
> >>>     Michael
> >>>
> >>>
> >>>
> >>>     -------- Original message --------
> >>>     Subject: Re: using oprofile to debug multi-processes programs on linux
> >>>     Date: Mon, 16 Sep 2013 08:34:03 -0500
> >>>     From: Maynard Johnson <may...@us...>
> >>>     To: benzhi cao <cao...@gm...>
> >>>     Cc: oprofile-list <opr...@li...>
> >>>
> >>>     On 09/14/2013 08:24 PM, benzhi cao wrote:
> >>>
> >>>         Thanks so much for your reply. I can now collect the information
> >>>         for every process.
> >>>         I also want to collect L2 cache misses, so I tried to use ophelp
> >>>         to find an event for that. I think the event is LLC_MISSES, but I
> >>>         have also seen people use l2_lines_in to profile L2 cache misses,
> >>>         so I am confused about which event is the right one. My hardware
> >>>         is Intel 64 architecture.
> >>>         Best~
> >>>         Emily
> >>>
> >>>     Adding oprofile-list back to cc so that maybe someone else on the list
> >>>     can help, since Intel is not my primary architecture of expertise.
> >>>
> >>>     -Maynard
> >>>
> >>>
> >>>
> >>>         2013/9/14 Maynard Johnson <may...@us...>
> >>>
> >>>             On 09/13/2013 02:18 AM, benzhi cao wrote:
> >>>             > Hi, can Oprofile be used to profile performance of
> >>>             > multi-processes programs? And if it can, how to see the
> >>>             > performance of each process? (P.S: The online manual shows
> >>>             > that it can be used to profile multi-threads programs, but I
> >>>             > don't know whether it can be used for multi-processes). Any
> >>>             > help will be appreciated, thanks a lot~
> >>>             Hi, Emily,
> >>>             Hopefully, you're using oprofile 0.9.9 so you can use operf
> >>>             instead of the older "legacy" opcontrol commands.  Using operf,
> >>>             you can specify to profile just the particular application (or
> >>>             process) you're interested in. If your application does
> >>>             fork/exec to create new child processes, operf will, by
> >>>             default, collect all sample data for the parent and children,
> >>>             but will aggregate all sample data. (ATTENTION:  0.9.9 has some
> >>>             key bug fixes for operf relating to following forked children.)
> >>>             You can specify "--separate-thread" (see operf's man page for
> >>>             details) so that samples are separated by process and thread.
> >>>             If you do collect a --separate-thread profile, be aware that
> >>>             opreport, being a text-based report generator, does not handle
> >>>             too many axes of separation very well.  You may get a report
> >>>             that looks like a jumbled mess, but would show a list of
> >>>             process IDs near the top of the report.  You could use that
> >>>             list of PIDs to generate per-process reports -- e.g.,
> >>>             'opreport tgid:<pid#> [options]'.  In some cases, opreport
> >>>             gives up and tells you that you have to either provide a
> >>>             profile specification (e.g., 'tgid:<pid#>' or, if profiling
> >>>             with multiple events, 'event:<event_name>').  More information
> >>>             on profile specifications can be found at
> >>>             http://oprofile.sourceforge.net/doc/results.html#profile-spec.
> >>>
> >>>             -Maynard
> >>>
> >>>
> >>>             > Best~
> >>>             > Emily
> >>
> >>
> >
>
>
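
The per-thread approach quoted above (pick one thread ID from the report header, then run opreport with a "tid:" specification) can be sketched as a small shell helper. This is only an illustration: list_tid_commands is a hypothetical name, not part of oprofile, and it assumes the header lines have exactly the "Processes with a thread ID of N" form shown in the quoted report.

```shell
#!/bin/sh
# Hypothetical helper (not part of oprofile): read an unfiltered opreport
# listing on stdin and print one per-thread opreport command per thread ID.
list_tid_commands() {
    # Header lines look like "Processes with a thread ID of 21373";
    # the ID is the last whitespace-separated field.
    awk '/Processes with a thread ID of [0-9]+[ \t]*$/ { print "opreport tid:" $NF }'
}

# Example input copied from the header of the report quoted in this thread.
# With real data, the input would come from opreport itself (assumption:
# something like "opreport 2>/dev/null | list_tid_commands").
list_tid_commands <<'EOF'
CPU: Intel Sandy Bridge microarchitecture, speed 2.401e+06 MHz (estimated)
Processes with a thread ID of 21373
Processes with a thread ID of 21376
Processes with a thread ID of 21378
EOF
```

The helper only prints the commands; running them one at a time (or redirecting each to its own file) keeps every report narrow enough to read.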

Re: using oprofile to debug multi-processes programs on linux

From: Maynard J. <may...@us...> - 2013-09-25 16:24:28

On 09/25/2013 10:07 AM, benzhi cao wrote:
> Thanks for the reminder; I forgot to include the details. The commands I used were as follows:
> 1. sudo opcontrol --setup --vmlinux=/home/ssg/vmlinux --separate=lib,thread,kernel --event=CPU_CLK_UNHALTED:100000 --callgraph=10
> 2. sudo opcontrol --reset
> 3. sudo opcontrol --start
> 4. Run my_program.
> 5. sudo opcontrol --stop
> 6. opreport -l --callgraph=10 --merge=tgid ./my_program | less
> (I use legacy mode instead of operf because my program uses many signals. I tried operf, but it did not work, so I can only profile in legacy mode.)
> The result I get looks like this:
> samples  %        image name               symbol name
> -------------------------------------------------------------------------------
>   133       0.3858  wc                       mr_worker
>   34345    99.6142  wc                       out_cmp
> 35263    20.9390  libc-2.15.so             __memset_sse2
>   35263    100.000  libc-2.15.so             __memset_sse2 [self]
> According to the online manual, this means that the out_cmp function calls memset. But my out_cmp function is just a strcmp; it does not call memset at all, so this looks strange to me. What do you think? Any help would be appreciated. Thanks a lot~

First, a question: what version of oprofile are you using?

It's unlikely that strcmp is calling memset (and thereby producing the issue described in http://oprofile.sourceforge.net/doc/interpreting-callgraph.html). But without seeing the source of your out_cmp function, I can't guess what else might be involved.  Another possibility is that some sample information was lost during profiling, causing opreport to falsely conclude that out_cmp calls memset.  Did you see any messages about lost/dropped samples or overflows?
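
As a rough sketch of how one might look for such messages after a legacy-mode run: the helper below scans a log file for lines mentioning lost or dropped samples or overflows. Everything specific here is an assumption rather than something stated in this thread: find_sample_loss is a hypothetical name, the log path is the usual default for the opcontrol daemon and may differ on your system, and the keywords are a guess at what to search for, not a list of oprofile's actual messages.

```shell
#!/bin/sh
# Hypothetical helper: print log lines that mention lost or dropped samples
# or overflows. The keywords are a guess, not an exhaustive list of
# oprofile's messages.
find_sample_loss() {
    # $1 = path to a log file; -i ignores case, -E enables alternation.
    grep -iE 'lost|dropped|overflow' "$1"
}

# Assumed default log location for the legacy opcontrol daemon; adjust it if
# your session directory differs.
log=/var/lib/oprofile/samples/oprofiled.log
if [ -r "$log" ]; then
    find_sample_loss "$log" || echo "no matching lines in $log"
else
    echo "cannot read $log" >&2
fi
```

A matching line is not a problem in itself; counters that stay at zero are harmless, so it is the nonzero counts that are worth reporting to the list.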

By the way, we're not really very interested in opcontrol anymore, since it has been deprecated for the last two releases.  By the next release, it will be gone.  You said above that operf does not work with your program because of the signals it uses.  Please start a new thread on the list with a complete description of the problem and how we can reproduce it.

Thanks.
-Maynard
> Best
> Emily
> 
> 
> 
> 
> 
> 2013/9/25 Maynard Johnson <may...@us...>
> 
>     On 09/25/2013 01:46 AM, benzhi cao wrote:
>     > Thanks a lot, it's very helpful to me.
>     > One more thing: when I profile with oprofile, I cannot tell which
>     > functions call glibc functions like memmove (I have already used
>     > the --callgraph option, but still get no result). Do you know how to do
>     > that? Thanks~
>     Please be specific by telling us the commands you're using, the results you get, and what you think is wrong.  The callgraph option works *mostly*.  There are some corner cases mis-handled.  For example, see http://oprofile.sourceforge.net/doc/interpreting-callgraph.html.
> 
>     -Maynard
>     > Best
>     > Emily