From: J L. <joh...@ho...> - 2020-07-21 01:03:01
Oprofiling an intensive postgresql workload running on a Lenovo P52 and linux kernel 5.7.4, I see very high numbers of both of these: approximately two-thirds of all user-space samples are being lost. See operf.log below.

The operf command I've used is

    operf --vmlinux /mnt/bullbild/linux-5.7.4-bullarch/vmlinux --session-dir /tmp/oprof_session.200720-195024 --events cpu_clk_unhalted:30000045:thread:1:1 --callgraph --separate-cpu --system-wide

and it works but tells me

    WARNING: Lost samples detected! See /tmp/oprof_session.200720-195024/samples/operf.log for details.

Lowering the sampling rate as suggested does not help: I tried with a count of 99999999 and saw the same high number of lost samples.

I assume oprofile is somehow unable to map the instruction pointer of the event to any process? How does it do this? Is there any setting, control, or hint that I can set to help it? Or, failing that, is there any way I can ask opreport to give me the results as raw instruction addresses?

Help!

Cheers, John Lumby

___________________________________________________________

Profiling started at Mon Jul 20 19:50:24 2020
Profiling stopped at Mon Jul 20 19:51:25 2020

-- OProfile/operf Statistics --
Nr. non-backtrace samples: 9435
Nr. kernel samples: 1924
Nr. user space samples: 7511
Nr. samples lost due to sample address not in expected range for domain: 0
Nr. lost kernel samples: 0
Nr. samples lost due to sample file open failure: 0
Nr. samples lost due to no permanent mapping: 4967
Nr. user context kernel samples lost due to no app info available: 0
Nr. user samples lost due to no app info available: 0
Nr. backtraces skipped due to no file mapping: 4967
Nr. hypervisor samples dropped due to address out-of-range: 0
Nr. samples lost reported by perf_events kernel: 0
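The "two-thirds" figure follows directly from the stats above; a quick arithmetic check (variable names are illustrative):

```python
# Quick check of the loss fraction reported in the operf.log stats above.
user_samples = 7511      # Nr. user space samples
lost_no_mapping = 4967   # Nr. samples lost due to no permanent mapping

loss_fraction = lost_no_mapping / user_samples
print(f"{loss_fraction:.1%} of user-space samples lost")  # → 66.1%
```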
From: William C. <wc...@re...> - 2020-07-21 13:49:01
On 7/20/20 9:03 PM, J Lumby wrote:
> Oprofiling an intensive postgresql workload running on a Lenovo P52 and linux kernel 5.7.4, I see very high numbers of both of these: approximately two-thirds of all user-space samples are being lost. See operf.log below.
> [...]

Hi,

Which version of oprofile was being used? I just did a release of oprofile-1.4.0; it has commit [a3742f]:

    Use the mmap offset to correctly compute the IP location in a file

    Newer versions of binutils are now splitting the file into multiple
    mmap loads. The assumption that the mmap for the executable code in
    the file starts at the beginning of the file (that the file offset is
    always zero) is no longer true. The computation to convert the IP
    address into an offset needs to also use the offset.

I wonder if a number of the samples might be discarded because they are not being mapped to reasonable locations in executables.

Could you give that new release of oprofile a try?

-Will
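The arithmetic the commit message describes can be sketched as follows. This is an illustrative sketch, not oprofile's actual code; the function name and parameters are assumptions, though the mapping start and file offset (pgoff) do correspond to fields of a perf mmap event:

```python
# Illustrative sketch (not oprofile's code) of the fix in commit [a3742f]:
# converting a sampled instruction pointer into an offset within the
# mapped file.
def ip_to_file_offset(ip, mmap_start, mmap_pgoff):
    # Old assumption: the mapping always starts at file offset 0, so
    #     offset = ip - mmap_start
    # With binutils now splitting a binary into several mmap loads, each
    # mapping's own file offset must be added back in:
    return ip - mmap_start + mmap_pgoff

# A text segment mapped at 0x401000 from file offset 0x1000:
print(hex(ip_to_file_offset(0x401234, 0x401000, 0x1000)))  # → 0x1234
```

With the old zero-offset assumption the same sample would resolve to 0x234, a location that may not even be executable code in the file, which is consistent with samples being discarded as unmappable.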
From: J L. <joh...@ho...> - 2020-07-21 21:47:23
On 7/21/20 9:48 AM, William Cohen wrote:
> Which version of oprofile was being used? I just did a release of oprofile-1.4.0; it has commit [a3742f]:

That run was on 1.3.0.

> [...]
> Could you give that new release of oprofile a try?
>
> -Will

I saw 1.4.0 available almost immediately after I posted that. I've now tried the same run on 1.4.0 (compiled on the target machine just to be sure it compiles with the same bfd headers and libs) and there are mixed results.

It is still losing 80% of all userspace events: see below. But the reports now show a completely different and much more "what-I-would-expect" list of tick counts by function, so maybe I can make do with this release.

This improvement makes me wonder if the fix is in the right direction but not quite there yet. Is that possible? Would switching on the -V "all" debugging help narrow down what is happening? Also, if you can point me to the code changes, I can take a look and maybe spot something.

Cheers, John

___________________________________________________________

on 1.4.0:

    operf --vmlinux /mnt/bullbild/linux-5.7.4-bullarch/vmlinux --session-dir /tmp/oprof_session.200721-173111 --events cpu_clk_unhalted:50000000:thread:1:1 --callgraph --separate-cpu --system-wide

Profiling started at Tue Jul 21 17:31:11 2020
Profiling stopped at Tue Jul 21 17:33:30 2020

-- OProfile/operf Statistics --
Nr. non-backtrace samples: 12423
Nr. kernel samples: 2170
Nr. user space samples: 10253
Nr. samples lost due to sample address not in expected range for domain: 0
Nr. lost kernel samples: 0
Nr. samples lost due to sample file open failure: 0
Nr. samples lost due to no permanent mapping: 8031
Nr. user context kernel samples lost due to no app info available: 0
Nr. user samples lost due to no app info available: 0
Nr. backtraces skipped due to no file mapping: 8029
Nr. hypervisor samples dropped due to address out-of-range: 0
Nr. samples lost reported by perf_events kernel: 0
From: J L. <joh...@ho...> - 2020-07-24 15:47:36
On 7/21/20 5:47 PM, J Lumby wrote:
> On 7/21/20 9:48 AM, William Cohen wrote:
>> Could you give that new release of oprofile a try?
>>
>> -Will
>
> I saw 1.4.0 available almost immediately after I posted that. [...]
>
> It is still losing 80% of all userspace events: see below.

I turned on --verbose="debug,convert" and from that discovered the explanation for the very high loss.

My workload was forking a large number of processes in sequence, each of which did a certain amount of work (typically around 5 seconds' worth on a single CPU) and then exited. I guess operf's handling of the mapping events takes something of the same order of time to understand each process, by which time it has gone.

I changed the workload to do all the work in a single continuous process and now it works well:

Profiling started at Fri Jul 24 11:15:41 2020
Profiling stopped at Fri Jul 24 11:17:07 2020

-- OProfile/operf Statistics --
Nr. non-backtrace samples: 12791
Nr. kernel samples: 2248
Nr. user space samples: 10543
Nr. samples lost due to sample address not in expected range for domain: 0
Nr. lost kernel samples: 0
Nr. samples lost due to sample file open failure: 0
Nr. samples lost due to no permanent mapping: 34
Nr. user context kernel samples lost due to no app info available: 89
Nr. user samples lost due to no app info available: 0
Nr. backtraces skipped due to no file mapping: 34
Nr. hypervisor samples dropped due to address out-of-range: 0
Nr. samples lost reported by perf_events kernel: 0
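The two workload shapes John contrasts can be sketched as below. This is a hedged reproduction sketch, not his postgresql workload: function names and durations are illustrative (his children each ran ~5 s; shortened here), but the structural difference is the one he describes — a sequence of short-lived forked children versus one long-lived process doing all the work:

```python
import os
import time

def burn_cpu(seconds):
    """Spin on the CPU for roughly the given duration."""
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        pass

def forked_workload(n_children, work_seconds):
    # Problematic shape: each child forks, works briefly, and exits.
    # operf may still be resolving a child's mappings after it has gone,
    # so its samples are lost with "no permanent mapping".
    reaped = 0
    for _ in range(n_children):
        pid = os.fork()
        if pid == 0:
            burn_cpu(work_seconds)
            os._exit(0)
        os.waitpid(pid, 0)
        reaped += 1
    return reaped

def single_process_workload(n_chunks, work_seconds):
    # Fixed shape: the same total work, all in one long-lived process
    # whose mappings operf has ample time to resolve.
    for _ in range(n_chunks):
        burn_cpu(work_seconds)

if __name__ == "__main__":
    forked_workload(3, 0.05)
    single_process_workload(3, 0.05)
```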
From: William C. <wc...@re...> - 2020-07-28 21:16:31
On 7/24/20 11:47 AM, J Lumby wrote:
> I turned on --verbose="debug,convert" and from that discovered the explanation for the very high loss.
>
> My workload was forking a large number of processes in sequence, each of which did a certain amount of work (typically around 5 seconds' worth on a single CPU) and then exited. I guess operf's handling of the mapping events takes something of the same order of time to understand each process, by which time it has gone.

One thought: how exactly are you starting the profiling? Like the following:

    operf <command_to_profile>

Or attaching to a running process?

    operf --pid <pid>

Or doing system-wide monitoring with --system-wide?

You might check to see if the linux perf command has a similar problem with the quick spawn and death of processes. operf is using the same mechanism in the kernel to collect performance event samples. There are some cases where the scanning of /proc can get behind the rapid creation and death of processes. It would be useful to know whether the problem lies with oprofile or is also seen in perf.

-Will

> I changed the workload to do all the work in a single continuous process and now it works well:
> [...]
From: John L. <joh...@ho...> - 2020-07-31 15:32:39
On 07/28/20 17:00, William Cohen wrote:
> One thought: how exactly are you starting the profiling?
> [...]
> Or doing system-wide monitoring with --system-wide?

Thanks William, my operf command is

    operf --vmlinux /mnt/bullbild/linux-5.7.4-bullarch/vmlinux --session-dir=/tmp/oprof_session.200724_120854 --callgraph --separate-cpu --system-wide

so yes, system-wide.

> You might check to see if the linux perf command has a similar problem with the quick spawn and death of processes. operf is using the same mechanism in the kernel to collect performance event samples. There are some cases where the scanning of /proc can get behind the rapid creation and death of processes. It would be useful to know whether the problem lies with oprofile or is also seen in perf.

I will try with perf some time, but I have never used it before (did not know about it), so it may be a little while before I get to it.

Cheers, John