Thank you for your helpful guidance.

With the command list below I now get good profiling results, but there is another issue that I cannot understand (described further down).

Command List:
> rm -rf /var/lib/oprofile
> rm -rf /root/.oprofile
> opcontrol --init
> opcontrol --no-vmlinux
> opcontrol --setup --event=CPU_CYCLES:10000 --separate=lib,kernel
> opcontrol --start --image=all
> ./array
> ./../mpeg2dec/oprofile_results/mpeg2dec -b ../mpeg2dec/input_base/input_base_4CIF_96bps.mpg -o3  output_base_4CIF_96bps_%03d
> opcontrol --dump
> opreport -l array
> opreport -l ./../mpeg2dec/oprofile_results/mpeg2dec
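
Since I repeat this sequence many times, it can be wrapped in a small script. This is only a minimal sketch of the same commands as above: run() and profile_once() are hypothetical helper names of my own, not part of oprofile, and DRY_RUN is an assumed convenience switch that prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch only: wraps the command list above so it can be repeated unchanged.
# run() and profile_once() are hypothetical helpers, not oprofile commands.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# profile_once COUNT PROGRAM [ARGS...]
# COUNT is the value used in --event=CPU_CYCLES:<COUNT>.
profile_once() {
    count=$1
    shift
    run rm -rf /var/lib/oprofile
    run rm -rf /root/.oprofile
    run opcontrol --init
    run opcontrol --no-vmlinux
    run opcontrol --setup "--event=CPU_CYCLES:${count}" --separate=lib,kernel
    run opcontrol --start --image=all
    run "$@"                  # the program being profiled
    run opcontrol --dump
    run opreport -l "$1"      # report for the profiled binary
}
```

For example, "DRY_RUN=1; profile_once 10000 ./array" prints the nine commands without touching the system, which is a cheap way to check the sequence before running it as root.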

Good results:
[root]$ opreport -l array
Using /var/lib/oprofile/samples/ for samples directory.
warning: /no-vmlinux could not be found.
CPU: ARM Cortex-A9, speed 1998 MHz (estimated)
Counted CPU_CYCLES events (CPU cycle) with a unit mask of 0x00 (No unit mask) count 10000
samples  %        image name               symbol name
70547    95.8858  array                    slow_multiply
1181      1.6052  array                    fast_multiply
983       1.3361  array                    main
828       1.1254  no-vmlinux               /no-vmlinux
29        0.0394  ld-2.13.so               /lib/arm-linux-gnueabi/ld-2.13.so
6         0.0082  libc-2.13.so             /lib/arm-linux-gnueabi/libc-2.13.so
[root]$ opreport -l  mpeg2decode
Using /var/lib/oprofile/samples/ for samples directory.
warning: /no-vmlinux could not be found.
CPU: ARM Cortex-A9, speed 1998 MHz (estimated)
Counted CPU_CYCLES events (CPU cycle) with a unit mask of 0x00 (No unit mask) count 10000
samples  %        image name               symbol name
23899    16.9694  mpeg2decode              conv420to422
23648    16.7912  mpeg2decode              store_ppm_tga
16695    11.8542  mpeg2decode              conv422to444
16072    11.4119  mpeg2decode              Decode_Picture
15934    11.3139  mpeg2decode              Fast_IDCT
15133    10.7451  no-vmlinux               /no-vmlinux
14614    10.3766  mpeg2decode              putbyte
9260      6.5750  mpeg2decode              form_component_prediction
1631      1.1581  mpeg2decode              Flush_Buffer
825       0.5858  mpeg2decode              Decode_MPEG2_Intra_Block
481       0.3415  mpeg2decode              form_prediction.constprop.0
415       0.2947  mpeg2decode              Decode_MPEG2_Non_Intra_Block
304       0.2159  mpeg2decode              Get_Bits
200       0.1420  mpeg2decode              Show_Bits
195       0.1385  mpeg2decode              macroblock_modes
........

(*) However, there is one issue that I can hardly understand: when I repeat the command list above several times (same event, same sample count, exactly the same command sequence), it sometimes gives very different results. The total number of samples is only a few tens, and there are no samples at all for the user application's functions.

I would expect the profile to be at least similar to the good results above.

Bad results:
[root]$ opreport -l array
Using /var/lib/oprofile/samples/ for samples directory.
warning: /no-vmlinux could not be found.
CPU: ARM Cortex-A9, speed 1998 MHz (estimated)
Counted CPU_CYCLES events (CPU cycle) with a unit mask of 0x00 (No unit mask) count 10000
samples  %        image name               symbol name
63       92.6471  no-vmlinux               /no-vmlinux
5         7.3529  libc-2.13.so             /lib/arm-linux-gnueabi/libc-2.13.so
(0)-(linaro-chenp)-[Sat Apr 27][15:06:36]-[.=~/Workspace/Zynq/testbench-zynq/hotcode-profiling/mediabench2_video/workspace]
[root]$ opreport -l mpeg2decode
Using /var/lib/oprofile/samples/ for samples directory.
warning: /no-vmlinux could not be found.
CPU: ARM Cortex-A9, speed 1998 MHz (estimated)
Counted CPU_CYCLES events (CPU cycle) with a unit mask of 0x00 (No unit mask) count 10000
samples  %        image name               symbol name
69       95.8333  no-vmlinux               /no-vmlinux
3         4.1667  libc-2.13.so             /lib/arm-linux-gnueabi/libc-2.13.so

Besides, once this happens, every subsequent oprofile run gives the same kind of result. I can get the good profiling results again only after I reboot the system.
I ran about a dozen experiments. The number of repetitions of the command sequence after which the results turn 'bad' (only a few tens of kernel samples) is not regular. It just suddenly ends up in this state after several profiling runs.
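
To tell a 'good' run from a 'bad' one without reading each report by eye, the samples column can be summed per image name. A minimal sketch, assuming the four-column 'opreport -l' layout shown above (samples, %, image name, symbol name) and names without spaces; image_samples is a hypothetical helper of my own, not an oprofile command.

```shell
#!/bin/sh
# Sketch: sum the "samples" column of `opreport -l` output for one image name.
# image_samples is a hypothetical helper; it reads the report on stdin.
# Assumes the column layout: samples, %, image name, symbol name.
image_samples() {
    awk -v img="$1" '
        # Only data rows start with an integer sample count.
        $1 ~ /^[0-9]+$/ && $3 == img { sum += $1 }
        END { print sum + 0 }
    '
}
```

Used as "opreport -l array | image_samples array", this gives 72711 for the good array report above (70547 + 1181 + 983) and 0 for the bad one, where only no-vmlinux and libc-2.13.so appear.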

I hope I have described the problem clearly.

Regards


On Sat, Apr 27, 2013 at 12:07 AM, Maynard Johnson <maynardj@us.ibm.com> wrote:
On 04/26/2013 09:25 AM, RocChen wrote:
>
> Sorry, I forgot to cc the mailing list.
>
>
> ---------- Forwarded message ----------
> From: RocChen <singleroc@gmail.com>
> Date: Fri, Apr 26, 2013 at 10:14 PM
> Subject: Re: no sample when profiling ARM Cortex-A9 with Linux kernel 3.3
> To: Koteswararao Nelakurthi <knelakurthi@mvista.com>
>
>
> Hi Koteswararao,
>
> Thanks for the quick reply.
>
> Here is my profiling procedure (opcontrol: oprofile 0.9.7 compiled on Apr 26 2013 08:47:51):
>
> [root]$ rm -rf /var/lib/oprofile/
> (0)-(linaro)-[Fri Apr 26][22:03:05]-[.=~/Workspace/Zynq/testbench-zynq/hotcode-profiling/mediabench2_video/mpeg2dec/oprofile_results]
> [root]$ rm -rf ~/.oprofile/
> (0)-(linaro)-[Fri Apr 26][22:03:13]-[.=~/Workspace/Zynq/testbench-zynq/hotcode-profiling/mediabench2_video/mpeg2dec/oprofile_results]
> [root]$ opcontrol --init
> (0)-(linaro)-[Fri Apr 26][22:03:29]-[.=~/Workspace/Zynq/testbench-zynq/hotcode-profiling/mediabench2_video/mpeg2dec/oprofile_results]
> [root]$ opcontrol --setup --event=CPU_CYCLES:1000 --separate=all --no-vmlinux

The '--separate=all' option categorizes your samples by kernel, library, thread, and CPU.  The 'thread' and 'cpu' categorization is rarely needed and just leads to confusion when trying to generate reports.  Use '--separate=lib,kernel' instead.

CPU_CYCLES:1000 is a very high sampling rate, which is undoubtedly why you get the "WARNING! The OProfile kernel driver reports sample buffer overflows" message.  I recommend a count of 100000 (or maybe even higher) versus 1000.
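
To put rough numbers on this: one sample is taken every <count> CPU cycles, so a fully busy core produces roughly clock / count samples per second. A back-of-the-envelope sketch, assuming the "1998 MHz (estimated)" clock that opreport prints elsewhere in this thread and a core that is busy the whole time (CLOCK_HZ and samples_per_second are illustrative names only):

```shell
#!/bin/sh
# Rough estimate only: samples per second = CPU clock (Hz) / event count.
# CLOCK_HZ is an assumption taken from opreport's "speed 1998 MHz (estimated)".
CLOCK_HZ=1998000000

samples_per_second() {
    # $1 = count value passed to --event=CPU_CYCLES:<count>
    echo $(( CLOCK_HZ / $1 ))
}

for count in 1000 10000 100000; do
    echo "CPU_CYCLES:$count -> about $(samples_per_second "$count") samples/s per busy core"
done
```

Under those assumptions this works out to about 1998000 samples/s for a count of 1000, 199800 for 10000, and 19980 for 100000, which shows why the smallest count is the one most likely to overflow the sample buffer.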

> (0)-(linaro)-[Fri Apr 26][22:04:20]-[.=~/Workspace/Zynq/testbench-zynq/hotcode-profiling/mediabench2_video/mpeg2dec/oprofile_results]
> [root]$ opcontrol --start --image=../mpeg2-oprofiling/src/mpeg2dec/mpeg2decode

For starters, don't use the '--image' option.  Please revert that with 'opcontrol --image=all' and try again.

-Maynard
> Using 2.6+ OProfile kernel interface.
> Using log file /var/lib/oprofile/samples/oprofiled.log
> Daemon started.
> Profiler running.
> (0)-(linaro)-[Fri Apr 26][22:05:25]-[.=~/Workspace/Zynq/testbench-zynq/hotcode-profiling/mediabench2_video/mpeg2dec/oprofile_results]
> [root]$ time ./../mpeg2-oprofiling/src/mpeg2dec/mpeg2decode -b input_base_4CIF_96bps.mpg -o3 output_base_4CIF_96bps_%03d
> saving output_base_4CIF_96bps_000.ppm
> saving output_base_4CIF_96bps_001.ppm
> saving output_base_4CIF_96bps_002.ppm
> saving output_base_4CIF_96bps_003.ppm
> saving output_base_4CIF_96bps_004.ppm
> saving output_base_4CIF_96bps_005.ppm
> saving output_base_4CIF_96bps_006.ppm
> saving output_base_4CIF_96bps_007.ppm
> saving output_base_4CIF_96bps_008.ppm
>
> real    0m1.593s
> user    0m1.490s
> sys     0m0.100s
> (0)-(linaro)-[Fri Apr 26][22:05:54]-[.=~/Workspace/Zynq/testbench-zynq/hotcode-profiling/mediabench2_video/mpeg2dec/oprofile_results]
> [root]$ opcontrol --dump
> (0)-(linaro)-[Fri Apr 26][22:06:06]-[.=~/Workspace/Zynq/testbench-zynq/hotcode-profiling/mediabench2_video/mpeg2dec/oprofile_results]
> [root]$ opreport -l ../mpeg2-oprofiling/src/mpeg2dec/mpeg2decode
> WARNING! The OProfile kernel driver reports sample buffer overflows.
> Such overflows can result in incorrect sample attribution, invalid sample
> files and other symptoms.  See the oprofiled.log for details.
> You should adjust your sampling frequency to eliminate (or at least minimize)
> these overflows.
> error: no sample files found: profile specification too strict ?
>
> ******************************************
> I checked the dmesg log for anything related to the PMU:
>
>> dmesg | grep PMU
>  hw perfevents: enabled with ARMv7 Cortex-A9 PMU driver, 7 counters available
>> dmesg | grep pmu
>  registering platform device 'arm-pmu' id 0
>
>
>
>
> On Fri, Apr 26, 2013 at 9:52 PM, Koteswararao Nelakurthi <knelakurthi@mvista.com> wrote:
>
>     >>opcontrol --start --image=<application name>
>     Provide the name of the application binary.
>     ex. opcontrol --start --image=array
>
>     Regards
>     koteswararao
>
>
>     On Fri, Apr 26, 2013 at 7:17 PM, Koteswararao Nelakurthi <knelakurthi@mvista.com> wrote:
>
>         Dear RocChen,
>
>         I hope I have understood your situation. From the log you showed
>         in your previous mail, you have successfully updated the oprofile
>         userland tool.
>         As for profiling applications, you need to do it as below:
>
>         rm -rf /var/lib/oprofile/
>         rm -rf /root/.oprofile
>
>         opcontrol --start --image=<application name>
>
>         gcc -g <applicationname> -o <binary application name>
>         ex: gcc -g array.c -o array
>
>         ./<binary application name>
>         ex. ./array
>
>         opcontrol --dump
>
>         opreport
>
>         array is a sample application that simply does some multiplication, etc.
>         You can use any application that puts load on the CPU, so that it can use
>         the H/W counter to count samples.
>
>         A light-load application will not generate many events, because the CPU
>         does not spend much of its time on it, and hence samples might not be generated.
>
>
>         Regards
>         koteswararao
>
>
>