From: Maynard J. <may...@us...> - 2011-04-04 15:13:37
|
Mitsutaka Amano wrote: > On Wed, Mar 30, 2011 at 11:00 PM, Maynard Johnson <may...@us...> wrote: >> Mitsutaka Amano wrote: >> [snip] >>> I will inform you my accurately process based on your information. >>> I tried to run commands. my app's symbol could show using >>> --separate=library option. >>> >>> oprofile log will write on tmpfs because of my rootfs is jffs2 >>> (JFFS2 does not provide OProfile requirements. http://elinux.org/Profilers) >>> >>> [root@localhost TP]# mount -t tmpfs /dev/shm /var/lib/oprofile >>> [root@localhost TP]# mkdir /var/lib/oprofile/samples >>> >>> >>> [root@localhost TP]# opcontrol --init >>> [root@localhost TP]# opcontrol --reset >>> [root@localhost TP]# opcontrol --setup --no-vmlinux --separate=library >>> [root@localhost TP]# opcontrol --start >>> Using 2.6+ OProfile kernel interface. >>> Using log file /var/lib/oprofile/samples/oprofiled.log >>> Daemon started. >>> Profiler running. >>> [root@localhost TP]# ./thread_test_a 100 >>> 100 threads create OK. pid: 1993 >>> . (Waiting for 1 min. and then Press Ctrl+C to kill ./thread_test_a) >>> >>> [root@localhost TP]# opcontrol --stop >>> Stopping profiling. >>> [root@localhost TP]# opcontrol --shutdown >>> Killing daemon. >>> [root@localhost TP]# opreport -l ./thread_test_a >> >> >>> WARNING! The OProfile kernel driver reports sample buffer overflows. >>> Such overflows can result in incorrect sample attribution, invalid sample >>> files and other symptoms. See the oprofiled.log for details. >>> You should adjust your sampling frequency to eliminate (or at least minimize) >>> these overflows. >>> CPU: e500, speed 1320 MHz (estimated) >>> Counted CPU_CLK events (Cycles) with a unit mask of 0x00 (No unit mask) count 10 >>> 0000 >>> samples % image name symbol name >>> 7 15.2174 libc-2.12.so __nanosleep_nocancel >>> 6 13.0435 libc-2.12.so __libc_disable_asynccancel >>> 5 10.8696 libc-2.12.so _wordcopy_fwd_aligned >>> 4 8.6957 libc-2.12.so _dl_addr >>> 3 6.5217 ld-2.12.so strcmp >>> 3 6.5217 libpthread-2.12.so start_thread >>> 2 4.3478 ld-2.12.so _dl_allocate_tls >>> 2 4.3478 ld-2.12.so _dl_relocate_object >>> 2 4.3478 ld-2.12.so check_match.9343 >>> 2 4.3478 ld-2.12.so do_lookup_x >>> 2 4.3478 libc-2.12.so calloc >>> 2 4.3478 libpthread-2.12.so pthread_create@@GLIBC_2.1 >>> 1 2.1739 libc-2.12.so _IO_setb >>> 1 2.1739 libc-2.12.so __libc_enable_asynccancel >>> 1 2.1739 libc-2.12.so __libc_sigaction >>> 1 2.1739 libc-2.12.so sigprocmask >>> 1 2.1739 libc-2.12.so sleep >>> 1 2.1739 libpthread-2.12.so do_clone.clone.0 >> >> OK, good. So you are now getting samples for for thread_test_a. >> >> >>> [root@localhost TP]# opreport -c ./thread_test_a >> >> First, there's no point in running 'opreport -c' on profile data that was collected without enabling callgraph (via 'opcontrol --callgraph=<n>'). Second, so you're back to the problem of seeing the "start > end" error. I presume you're still getting the error with the _SDA_BASE_ symbol, so why aren't you adding the '--exclude-symbols' option? If you exclude _SDA_BASE_ and still get the error, generate a report with '--verbose=all' and look at the tail of the report to see what symbol it was processing when it died; then exclude that one, too. Or else, build a newer version of binutils and re-build oprofile with that new binutils. > I forget below option(--callgraph and --exclude-symbols). > > [root@localhost TP]# mount -t tmpfs /dev/shm /var/lib/oprofile > [root@localhost TP]# mkdir /var/lib/oprofile/samples > [root@localhost TP]# opcontrol --init > [root@localhost TP]# opcontrol --reset > [root@localhost TP]# opcontrol --setup --no-vmlinux --separate=library > --callgraph=20 > [root@localhost TP]# opcontrol --status > Daemon not running > Event 0: CPU_CLK:100000:0:1:1 > Separate options: library > vmlinux file: none > Image filter: none > Call-graph depth: 20 > [root@localhost TP]# opcontrol --start > Using 2.6+ OProfile kernel interface. > Using log file /var/lib/oprofile/samples/oprofiled.log > Daemon started. > Profiler running. > [root@localhost TP]# ./thread_test_a 100 > 100 threads create OK. pid: 2044 > [root@localhost TP]# opcontrol --stop > Stopping profiling. > [root@localhost TP]# opcontrol --shutdown > Killing daemon. > [root@localhost TP]# opreport -c --exclude-symbols="_SDA_BASE_" > WARNING! The OProfile kernel driver reports sample buffer overflows. > Such overflows can result in incorrect sample attribution, invalid sample > files and other symptoms. See the oprofiled.log for details. > You should adjust your sampling frequency to eliminate (or at least minimize) > these overflows. > warning: /no-vmlinux could not be found. > warning: [heap] (tgid:2039 range:0x10021000-0x10080000) could not be found. > warning: [heap] (tgid:2147 range:0x100d5000-0x1013d000) could not be found. > warning: [vdso] (tgid:2147 range:0x100000-0x103000) could not be found. > CPU: e500, speed 1320 MHz (estimated) > Counted CPU_CLK events (Cycles) with a unit mask of 0x00 (No unit mask) count 10 > 0000 > opreport error: profile_t::samples_range(): start > end something wrong with ker > nel or module layout ? > please report problem to opr...@li... > > When I add --verbose=all option, A part of log is the following. > > [root@localhost TP]# opreport -c --exclude-symbols='_SDA_BASE_' --verbose=all > ~~snip~~ > Caller sym: __do_global_ctors_aux filepos 9d8-a20 > symbol call___do_global_ctors_aux, value 4f0 > start a20, end a3c > in section .text, filepos 530 > Caller sym: call___do_global_ctors_aux filepos a20-a3c > symbol _fini, value 0 > start a3c, end a5c > in section .fini, filepos a3c > Caller sym: _fini filepos a3c-a5c > symbol _GLOBAL_OFFSET_TABLE_, value 8 > start c68, end 8c74 > in section .got, filepos c60 > Caller sym: _GLOBAL_OFFSET_TABLE_ filepos c68-8c74 > symbol _SDA_BASE_, value 8014 > start 8c74, end 10000260a > in section .got, filepos c60 > opreport error: profile_t::samples_range(): start > end something wrong with ker > please report problem to opr...@li... > > Can't the --exclude-symbols option work or not? mmm . . . right. The --exclude-symbols workaround does not work when generating callgraph reports, because even if you're filtering out a given symbol, that symbol still may show up in the calltree for a different symbol, and we don't filter that out. Coincidentally, I've gotten another report of this very same issue from a user within my company and have been able to reproduce it myself. I'm still debugging the problem, but what I have found out so far is that we're seeing the problem when oprofile is built with *newer* versions of binutils. On a sles 10 sp3 system with binutils 2.16.91, it works OK. -Maynard > > Currently binutils version is 2.20.51.0.2 (from RHEL6). I'll try the > latest version. > > Mitsutaka, > >> >> -Maynard >> >>> WARNING! The OProfile kernel driver reports sample buffer overflows. >>> Such overflows can result in incorrect sample attribution, invalid sample >>> files and other symptoms. See the oprofiled.log for details. >>> You should adjust your sampling frequency to eliminate (or at least minimize) >>> these overflows. >>> CPU: e500, speed 1320 MHz (estimated) >>> Counted CPU_CLK events (Cycles) with a unit mask of 0x00 (No unit mask) count 10 >>> 0000 >>> opreport error: profile_t::samples_range(): start > end something wrong with ker >>> nel or module layout ? >>> please report problem to opr...@li... >>> [root@localhost TP]# opreport -d ./thread_test_a >>> WARNING! The OProfile kernel driver reports sample buffer overflows. >>> Such overflows can result in incorrect sample attribution, invalid sample >>> files and other symptoms. See the oprofiled.log for details. >>> You should adjust your sampling frequency to eliminate (or at least minimize) >>> these overflows. >>> CPU: e500, speed 1320 MHz (estimated) >>> Counted CPU_CLK events (Cycles) with a unit mask of 0x00 (No unit mask) count 10 >>> 0000 >>> vma samples % image name symbol name >>> 000b86e4 7 15.2174 libc-2.12.so __nanosleep_nocancel >>> 000b8720 1 14.2857 >>> 000b8724 2 28.5714 >>> 000b872c 2 28.5714 >>> 000b8734 2 28.5714 >>> 0010bca0 6 13.0435 libc-2.12.so __libc_disable_asynccancel >>> 0010bcbc 5 83.3333 >>> 0010bcc0 1 16.6667 >>> 00092ef0 5 10.8696 libc-2.12.so _wordcopy_fwd_aligned >>> 00092f38 5 100.000 >>> 0013f5e0 4 8.6957 libc-2.12.so _dl_addr >>> 0013f754 2 50.0000 >>> 0013f784 1 25.0000 >>> 0013f7b8 1 25.0000 >>> 00019740 3 6.5217 ld-2.12.so strcmp >>> 0001974c 1 33.3333 >>> 00019750 1 33.3333 >>> 000197f8 1 33.3333 >>> 000070ec 3 6.5217 libpthread-2.12.so start_thread >>> 000070ec 1 33.3333 >>> 00007104 2 66.6667 >>> 00012b38 2 4.3478 ld-2.12.so _dl_allocate_tls >>> 00012b38 2 100.000 >>> 0000c040 2 4.3478 ld-2.12.so _dl_relocate_object >>> 0000c794 1 50.0000 >>> 0000c79c 1 50.0000 >>> 0000a1ac 2 4.3478 ld-2.12.so check_match.9343 >>> 0000a1d0 1 50.0000 >>> 0000a2a8 1 50.0000 >>> 0000a348 2 4.3478 ld-2.12.so do_lookup_x >>> 0000a450 1 50.0000 >>> 0000aa68 1 50.0000 >>> 0008b200 2 4.3478 libc-2.12.so calloc >>> 0008b298 2 100.000 >>> 00005e98 2 4.3478 libpthread-2.12.so pthread_create@@GLIBC_2.1 >>> 000064a4 1 50.0000 >>> 000066a0 1 50.0000 >>> 000845b8 1 2.1739 libc-2.12.so _IO_setb >>> 000845b8 1 100.000 >>> 0010bd44 1 2.1739 libc-2.12.so __libc_enable_asynccancel >>> 0010bd74 1 100.000 >>> 000359b4 1 2.1739 libc-2.12.so __libc_sigaction >>> 00035a28 1 100.000 >>> 00035b20 1 2.1739 libc-2.12.so sigprocmask >>> 00035b78 1 100.000 >>> 000b838c 1 2.1739 libc-2.12.so sleep >>> 000b84b8 1 100.000 >>> 00005cb4 1 2.1739 libpthread-2.12.so do_clone.clone.0 >>> 00005cb4 1 100.000 >>> >>> Also the all of callgraph couldn't be show due to show my TP. I think. >>> >>> [root@localhost TP]# opreport -c >>> WARNING! The OProfile kernel driver reports sample buffer overflows. >>> Such overflows can result in incorrect sample attribution, invalid sample >>> files and other symptoms. See the oprofiled.log for details. >>> You should adjust your sampling frequency to eliminate (or at least minimize) >>> these overflows. >>> warning: /no-vmlinux could not be found. >>> warning: [heap] (tgid:2096 range:0x100d5000-0x1013d000) could not be found. >>> warning: [stack] (tgid:1984 range:0x5be41000-0x5be62000) could not be found. >>> CPU: e500, speed 1320 MHz (estimated) >>> Counted CPU_CLK events (Cycles) with a unit mask of 0x00 (No unit mask) count 10 >>> 0000 >>> opreport error: profile_t::samples_range(): start > end something wrong with ker >>> nel or module layout ? >>> please report problem to opr...@li... >>> >>> Mitsutaka, >>> >>>> >>>> >>>>> >>>>> [root@localhost TP]# opreport -c >>>>> warning: /no-vmlinux could not be found. >>>>> warning: [heap] (tgid:2890 range:0x100d5000-0x1013d000) could not be found. >>>>> CPU: e500, speed 1320 MHz (estimated) >>>>> Counted CPU_CLK events (Cycles) with a unit mask of 0x00 (No unit mask) count 10 >>>>> 0000 >>>>> samples % image name app name symbol name >>>>> ------------------------------------------------------------------------------- >>>>> 1553 55.6830 no-vmlinux no-vmlinux /no-vmlinux >>>>> 1553 100.000 no-vmlinux no-vmlinux >>>>> /no-vmlinux [self] >>>>> ------------------------------------------------------------------------------- >>>>> 292 10.4697 bash bash /bin/bash >>>>> 292 100.000 bash bash >>>>> /bin/bash[self] >>>>> ------------------------------------------------------------------------------- >>>>> 81 2.9043 libc-2.12.so libc-2.12.so >>>>> __gconv_transform_ascii_internal >>>>> 81 100.000 libc-2.12.so libc-2.12.so >>>>> __gconv_transform_ascii_internal [self] >>>>> ~~snip~~ >>>>> >>>>>> >>>>>>> >>>>>>> >>>>>>>>> >>>>>>>>> Now, as for your error . . . There are clues from the messages below that >>>>>>>>> imply >>>>>>>>> you're using oprofile 0.9.6 (or something close to that). Correct? >>>>>>>>> There's >>>>>>>>> been at least one fix committed to oprofile upstream that addresses your >>>>>>>>> symptom >>>>>>>>> since 0.9.6 was released. Please pull down the latest code from >>>>>>>>> oprofile's git >>>>>>>>> repo (git clone >>>>>>>>> git://oprofile.git.sourceforge.net/gitroot/oprofile/oprofile ) >>>>>>>>> and try it again. If you get the same error, please do the following: >>>>>>>>> >>>>>>>>> opreport --symbols --verbose=all> op-verbose.txt >>>>>>> I'm sorry to be duplicate entry. opreport log is the follow URL. >>>>>>> >>>>>>> http://marc.info/?l=oprofile-list&m=130106337208283&w=2 >>>>>>> >>>>>>> This log is the latest code from git repo >>>>>>> 4e8426faec431858e4c8b2fc5acd52e44b9c3b54 >>>>>>> >>>>>>> Also I attached below command log(op-verbose.txt). >>>>>> >>>>>> Please do not attach such big logs to mailing list postings. Instead, try to copy/paste just the relevant parts (like I suggested in my earlier response to you. >>>>>> >>>>>>> >>>>>>>>> >>>>>>>>> Open up the op-verbose.txt file created above and find the error message >>>>>>>>> at the >>>>>>>>> bottom. Now scroll back until you find "bfd_info::get_symbols() for" some >>>>>>>>> binary. Copy the text from that point down to the error message and >>>>>>>>> include it >>>>>>>>> in your response to this message. >>>>>>>>> >>>>>>>>> Thanks. >>>>>>>>> -Maynard >>>>>>>> >>>>>>>> Mitsutaka, >>>>>>>> I just saw your other posting pop up on the list -- the one with the verbose >>>>>>>> output attached. You anticipated my request. :-) The fact that opreport >>>>>>>> is choking on _SDA_BASE_ seems very familiar. I'm pretty sure I encountered >>>>>>>> this once myself a couple years ago. And if my memory serves me correctly, >>>>>>>> I ran oparchive and pulled the data over to another system (different >>>>>>>> distro) and was able to run opreport successfully. I can't find any notes >>>>>>>> on the topic anymore, so I can't be sure, but I believe the assumption was >>>>>>>> that the version of binutils (and its libbfd) had something to do with the >>>>>>>> problem, which is why running opreport on the same sample data on a >>>>>>>> different system/distro worked. If you can, please try to do that. As a >>>>>>>> workaround, you can also use "--exclude-symbols=_SDA_BASE" with opreport. >>>>>>> Thank you for your re-checking. I tried to do using >>>>>>> "--exclude-symbols=_SDA_BASE". >>>>>> So did this option get rid of the "profile_t::samples_range(): start > end" error? It seems so, since you're now reporting a new problem (above, where opreport is saying no samples found for thread_test_a). >>>>> >>>>> Yes --exclude-symbols option got the same error. >>>>> >>>>> [root@localhost TP]# opreport -c --exclude-symbols=_SDA_BASE >>>> Is this -^ what you actually used for the symbol name? It's spelled wrong. My fault. I mis-spelled it above. It should be "_SDA_BASE_". >>>> >>>> -Maynard >>>> >>>>> WARNING! The OProfile kernel driver reports sample buffer overflows. >>>>> Such overflows can result in incorrect sample attribution, invalid sample >>>>> files and other symptoms. See the oprofiled.log for details. >>>>> You should adjust your sampling frequency to eliminate (or at least minimize) >>>>> these overflows. >>>>> warning: /no-vmlinux could not be found. >>>>> warning: [heap] (tgid:1907 range:0x100d5000-0x1013d000) could not be found. >>>>> warning: [vdso] (tgid:1907 range:0x100000-0x103000) could not be found. >>>>> CPU: e500, speed 1320 MHz (estimated) >>>>> Counted CPU_CLK events (Cycles) with a unit mask of 0x00 (No unit mask) count 10 >>>>> 0000 >>>>> opreport error: profile_t::samples_range(): start > end something wrong with ker >>>>> nel or module layout ? >>>>> please report problem to opr...@li... >>>>> >>>>> As you said, I think what opreport is saying no samples found for >>>>> thread_test_a is a new problem. This problem is happen on nfsroot. >>>>> >>>>> >>>>> Mitsutaka >>>>> >>>>>>> But It could be showed the callgraph. >>>>>> Do you mean you're not getting callgraph information for thread_test_a? But as I said above, the message you're getting is telling you there's NO profile data at all for thread_test_a, so there certainly wouldn't be any callgraph data for it. >>>>>>> Then I understand the cause might be the version of binutils or >>>>>>> libbfd. I need to check them. >>>>>> To be clear, your version of binutils has nothing to do with callgraph. It *may* have something to do with opreport failing to process the _SDA_BASE_ symbol. That was just a theory of mine, which you could prove or disprove by pulling down a newer release of binutils from their home page and building oprofile with it (using the '--with-binutils' configure option). >>>>>> >>>>>> -Maynard >>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> Mitsutaka >>>>>>> >>>>>>>> >>>>>>>> -Maynard >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> By the way, do >>>>>>>>>> >>>>>>>>>> mount -t tmpfs /dev/shm /var/lib/oprofile >>>>>>>>>> mkdir /var/lib/oprofile/samples >>>>>>>>>> opcontrol --init >>>>>>>>>> opcontrol --reset >>>>>>>>>> opcontrol --start >>>>>>>>>> >>>>>>>>>> Profiling is started. >>>>>>>>>> >>>>>>>>>> ./thread_test_a >>>>>>>>>> [Ctrl+C] >>>>>>>>>> >>>>>>>>>> opcontrol --stop >>>>>>>>>> opreport -c >>>>>>>>>> # opreport -c >>>>>>>>>> Overflow stats not available >>>>>>>>>> warning: /no-vmlinuz could not be found. >>>>>>>>>> warning: [heap] (tgid:3210 range:0x10021000-0x100de000) could not be >>>>>>>>>> found. >>>>>>>>>> warning: [heap] (tgid:3633 range:0x20527000-0x20570000) could not be >>>>>>>>>> found. >>>>>>>>>> warning: [heap] (tgid:6549 range:0x10027000-0x10048000) could not be >>>>>>>>>> found. >>>>>>>>>> warning: [heap] (tgid:6555 range:0x10027000-0x10048000) could not be >>>>>>>>>> found. >>>>>>>>>> warning: [heap] (tgid:6558 range:0x100d5000-0x1013e000) could not be >>>>>>>>>> found. >>>>>>>>>> warning: [stack] (tgid:3210 range:0x5beba000-0x5bedb000) could not be >>>>>>>>>> found. >>>>>>>>>> warning: [stack] (tgid:3633 range:0x5be40000-0x5be61000) could not be >>>>>>>>>> found. >>>>>>>>>> warning: [stack] (tgid:6516 range:0x5bf42000-0x5bf63000) could not be >>>>>>>>>> found. >>>>>>>>>> warning: [vdso] (tgid:6558 range:0x100000-0x103000) could not be found. >>>>>>>>>> CPU: e500, speed 1320 MHz (estimated) >>>>>>>>>> Counted CPU_CLK events (Cycles) with a unit mask of 0x00 (No unit >>>>>>>>>> mask) count 100000 >>>>>>>>>> opreport error: profile_t::samples_range(): start> end something >>>>>>>>>> wrong with kernel or module layout ? >>>>>>>>>> please report problem to opr...@li... >>>>>>>>>> >>>>>>>>>> configuration is the following. >>>>>>>>>> # cat .oprofile/daemonrc >>>>>>>>>> SESSION_DIR=/var/lib/oprofile >>>>>>>>>> NR_CHOSEN=0 >>>>>>>>>> SEPARATE_LIB=0 >>>>>>>>>> SEPARATE_KERNEL=0 >>>>>>>>>> SEPARATE_THREAD=0 >>>>>>>>>> SEPARATE_CPU=0 >>>>>>>>>> VMLINUX=none >>>>>>>>>> IMAGE_FILTER= >>>>>>>>>> CPU_BUF_SIZE=0 >>>>>>>>>> CALLGRAPH=0 >>>>>>>>>> XENIMAGE=none >>>>>>>>>> >>>>>>>>>> Why did I get this error? I don't know where was the problem(pthread, >>>>>>>>>> jffs2, arch/powerpc/oprofile..). >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Mitsutaka >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> ------------------------------------------------------------------------------ >>>>>>>>>> Enable your software for Intel(R) Active Management Technology to meet >>>>>>>>>> the >>>>>>>>>> growing manageability and security demands of your customers. Businesses >>>>>>>>>> are taking advantage of Intel(R) vPro (TM) technology - will your >>>>>>>>>> software >>>>>>>>>> be a part of the solution? Download the Intel(R) Manageability Checker >>>>>>>>>> today! http://p.sf.net/sfu/intel-dev2devmar >>>>>>>>>> _______________________________________________ >>>>>>>>>> oprofile-list mailing list >>>>>>>>>> opr...@li... >>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/oprofile-list >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> ------------------------------------------------------------------------------ >>>>>>>>> Enable your software for Intel(R) Active Management Technology to meet the >>>>>>>>> growing manageability and security demands of your customers. Businesses >>>>>>>>> are taking advantage of Intel(R) vPro (TM) technology - will your software >>>>>>>>> be a part of the solution? Download the Intel(R) Manageability Checker >>>>>>>>> today! http://p.sf.net/sfu/intel-dev2devmar >>>>>>>>> _______________________________________________ >>>>>>>>> oprofile-list mailing list >>>>>>>>> opr...@li... >>>>>>>>> https://lists.sourceforge.net/lists/listinfo/oprofile-list >>>>>>>> >>>>>>>> >>>>>> >>>>>> >>>> >>>> >> >> |