From: <Eri...@en...> - 2003-11-28 12:35:10
|
Hello,

I'm measuring L2_LINES_OUT on a 4-way PIII (see /proc/cpuinfo below) using
oprofile-0.7, and CPU0 happens not to be profiled, while all other CPUs
are. Is this a known problem?

Thanks.

-- Eric

PS: in case you reply, please CC: me, I'm not on the alias.

----
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 7
model name      : Pentium III (Katmai)
stepping        : 3
cpu MHz         : 549.474
cache size      : 1024 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips        : 1094.45

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 7
model name      : Pentium III (Katmai)
stepping        : 3
cpu MHz         : 549.474
cache size      : 1024 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips        : 1097.72

processor       : 2
vendor_id       : GenuineIntel
cpu family      : 6
model           : 7
model name      : Pentium III (Katmai)
stepping        : 3
cpu MHz         : 549.474
cache size      : 1024 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips        : 1097.72

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 7
model name      : Pentium III (Katmai)
stepping        : 3
cpu MHz         : 549.474
cache size      : 1024 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips        : 1097.72
|
From: John L. <le...@mo...> - 2003-11-28 13:54:28
|
On Fri, Nov 28, 2003 at 01:35:05PM +0100, Eri...@en... wrote:
> I'm measuring L2_LINES_OUT on a 4-way PIII (see /proc/cpuinfo below)
> using oprofile-0.7, and CPU0 happens not to be profiled, while all other
> CPUs are.
>
> Is this a known problem?

No, it is not. A little more info would be good. Are you sure all CPUs
are loaded? What kernel version? What are your results? What is the
workload?

regards
john
--
Khendon's Law: If the same point is made twice by the same person, the thread is over. |
From: <Eri...@en...> - 2003-11-28 14:21:01
|
> > I'm measuring L2_LINES_OUT on a 4-way PIII (see /proc/cpuinfo below)
> > using oprofile-0.7, and CPU0 happens not to be profiled, while all other
> > CPUs are.
> >
> > Is this a known problem?
>
> No, it is not. A little more info would be good. Are you sure all CPUs
> are loaded? What kernel version? What are your results? What is the
> workload?

I was not sure what info would be needed.

kernel: 2.4.20 + syscall_cpu_affinity patch

I'm running network benchmarks. When I attach the NIC's irq to CPU0,
oprofile/L2_LINES_OUT reports very few cache misses in network-related
kernel functions. When attaching the NIC's irq to CPU1, CPU2, or CPU3,
oprofile/L2_LINES_OUT reports many more cache misses.

-- Eric |
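(For readers unfamiliar with the IRQ-pinning step described above: on Linux it is done by writing a hex CPU bitmask to /proc/irq/&lt;N&gt;/smp_affinity. A minimal sketch of computing those masks follows; the IRQ number 19 is hypothetical, on a real box you would look it up in /proc/interrupts.)

```python
def smp_affinity_mask(cpu: int) -> str:
    """Return the 8-digit hex bitmask selecting a single CPU,
    as written to /proc/irq/<N>/smp_affinity."""
    return format(1 << cpu, "08x")

# Commands one would run (IRQ 19 is a hypothetical NIC interrupt):
for cpu in range(4):
    print(f"CPU{cpu}: echo {smp_affinity_mask(cpu)} > /proc/irq/19/smp_affinity")
```

So pinning the NIC to CPU0 writes 00000001, and pinning it to CPU3 writes 00000008.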
From: John L. <le...@mo...> - 2003-11-28 15:11:15
|
On Fri, Nov 28, 2003 at 03:20:56PM +0100, Eri...@en... wrote:
> > No, it is not. A little more info would be good. Are you sure all CPUs
> > are loaded? What kernel version? What are your results? What is the
> > workload?
>
> I was not sure what info would be needed.
>
> kernel: 2.4.20 + syscall_cpu_affinity patch
>
> I'm running network benchmarks. When I attach the NIC's irq to CPU0,
> oprofile/L2_LINES_OUT reports very very few cache misses in
> network-related kernel functions. When attaching the NIC's irq to either
> CPU1, CPU2, or CPU3, oprofile/L2_LINES_OUT reports a lot more cache
> misses.

So what makes you sure it's oprofile rather than reality?

What happens if you load each CPU with a cpu-consuming infinite loop,
and run CPU_CLK_UNHALTED?

regards
john
--
Khendon's Law: If the same point is made twice by the same person, the thread is over. |
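(John's "busy loop pinned to each CPU" test can be sketched with a modern affinity API. This is an assumption about equivalent tooling: the 2003 setup used the syscall_cpu_affinity patch, whereas os.sched_setaffinity is the Linux-only interface available in current Python.)

```python
import os
import time
import multiprocessing

def busy_loop(cpu: int, duration: float = 1.0) -> None:
    """Pin the calling process to one CPU and spin, so CPU_CLK_UNHALTED
    accumulates samples on that CPU while the profiler runs."""
    os.sched_setaffinity(0, {cpu})   # Linux-only: 0 means "this process"
    end = time.monotonic() + duration
    while time.monotonic() < end:
        pass                         # burn cycles

if __name__ == "__main__":
    procs = [multiprocessing.Process(target=busy_loop, args=(c,))
             for c in sorted(os.sched_getaffinity(0))]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

With one such loop per CPU, every processor should show roughly equal CPU_CLK_UNHALTED samples; a CPU that stays silent would point at the profiler rather than the workload.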
From: <Eri...@en...> - 2003-11-28 16:08:30
|
> > > No, it is not. A little more info would be good. Are you sure all CPUs
> > > are loaded? What kernel version? What are your results? What is the
> > > workload?
> >
> > I was not sure what info would be needed.
> >
> > kernel: 2.4.20 + syscall_cpu_affinity patch
> >
> > I'm running network benchmarks. When I attach the NIC's irq to CPU0,
> > oprofile/L2_LINES_OUT reports very very few cache misses in
> > network-related kernel functions. When attaching the NIC's irq to either
> > CPU1, CPU2, or CPU3, oprofile/L2_LINES_OUT reports a lot more cache
> > misses.
>
> So what makes you sure it's oprofile rather than reality?

Because profiling the kernel with only CPU0 processing incoming packets
leads to very few cache misses, while doing exactly the same thing on
CPU1 (or CPU2 or CPU3) leads to cache misses.

> What happens if you load each CPU with a cpu-consuming infinite loop,
> and run CPU_CLK_UNHALTED?

Ok, I ran the following two tests, both with CPU_CLK_UNHALTED and
L2_LINES_OUT:

1. NIC irq and web server process both attached to CPU0 (CPU0 is 0%
   idle while CPU1, CPU2, and CPU3 are 100% idle)

2. NIC irq and web server process both attached to CPU3 (CPU3 is 0%
   idle while CPU0, CPU1, and CPU2 are 100% idle)

See results below. My conclusion: there's nothing wrong with
CPU_CLK_UNHALTED, but there definitely is something wrong for
L2_LINES_OUT on CPU#0. I have to say, though, that I don't know if the
problem comes from oprofile.

Thank you.

-- Eric

----
CPU_CLK_UNHALTED results for test #1:

# opreport -l /usr/src/elemoine/linux-2.4.20/vmlinux | head -14
CPU: PIII, speed 549.474 MHz (estimated)
Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a
unit mask of 0x00 (No unit mask) count 10000
samples  %        symbol name
1636499  49.7816  default_idle
55275     1.6814  kmalloc
54338     1.6529  tcp_v4_rcv
54139     1.6469  __kfree_skb
51174     1.5567  tcp_transmit_skb
49522     1.5064  ip_queue_xmit
45780     1.3926  do_tcp_sendpages
45297     1.3779  alloc_skb
42335     1.2878  skb_release_data
42120     1.2813  ip_output

CPU_CLK_UNHALTED results for test #2:

# opreport -l /usr/src/elemoine/linux-2.4.20/vmlinux | head -14
CPU: PIII, speed 549.474 MHz (estimated)
Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a
unit mask of 0x00 (No unit mask) count 10000
samples  %        symbol name
1623638  49.6087  default_idle
56166     1.7161  kmalloc
53248     1.6269  tcp_v4_rcv
53217     1.6260  __kfree_skb
50478     1.5423  tcp_transmit_skb
47835     1.4616  ip_queue_xmit
47416     1.4488  alloc_skb
44143     1.3487  do_tcp_sendpages
41743     1.2754  ip_output
41576     1.2703  skb_release_data

L2_LINES_OUT results for test #1:

CPU: PIII, speed 549.474 MHz (estimated)
Counted L2_LINES_OUT events (number of recovered lines from L2) with a
unit mask of 0x00 (No unit mask) count 5000
samples  %        symbol name
57       35.4037  default_idle
18       11.1801  statm_pgd_range
12        7.4534  do_wp_page
10        6.2112  tcp_twkill__thr
8         4.9689  collect_sigign_sigcatch
8         4.9689  proc_pid_stat
7         4.3478  tcp_timewait_kill
2         1.2422  do_IRQ
2         1.2422  do_anonymous_page
2         1.2422  do_no_page
2         1.2422  fget
2         1.2422  task_dumpable

L2_LINES_OUT results for test #2:

Counted L2_LINES_OUT events (number of recovered lines from L2) with a
unit mask of 0x00 (No unit mask) count 5000
samples  %        symbol name
92        5.7716  ip_queue_xmit
77        4.8306  tcp_sendmsg
69        4.3287  tcp_v4_rcv
61        3.8269  tcp_transmit_skb
58        3.6386  ip_output
43        2.6976  default_idle
43        2.6976  do_generic_file_read
42        2.6349  tcp_time_wait
32        2.0075  schedule
32        2.0075  tcp_recvmsg
30        1.8821  __kfree_skb
30        1.8821  tcp_poll
---- |
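(For scale, the opreport sample counts above can be converted back to approximate raw event totals: each sample fires after roughly `count` events, 10000 for the CPU_CLK_UNHALTED runs and 5000 for the L2_LINES_OUT runs. A rough sketch:)

```python
def estimated_events(samples: int, count: int) -> int:
    """Each profile sample represents ~count hardware events,
    so the raw event total is approximately samples * count."""
    return samples * count

# default_idle in the CPU_CLK_UNHALTED run for test #1 (count 10000):
print(estimated_events(1636499, 10000))  # ~1.6e10 unhalted cycles
# default_idle in the L2_LINES_OUT run for test #1 (count 5000):
print(estimated_events(57, 5000))        # ~285000 L2 lines evicted
```

This makes the asymmetry concrete: the L2_LINES_OUT runs collected only a few dozen samples per symbol, orders of magnitude fewer than the cycle-count runs.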
From: John L. <le...@mo...> - 2003-11-28 16:36:49
|
On Fri, Nov 28, 2003 at 05:08:23PM +0100, Eri...@en... wrote:
> > So what makes you sure it's oprofile rather than reality?
>
> Because profiling the kernel with only CPU0 processing incoming packets
> leads to very few cache misses while doing exactly the same thing on
> CPU1 (or CPU2 or CPU3) leads to cache misses.

That's just repeating *what* is happening. Just because you get some
unusual results does *not* imply that the results are broken. After all,
is this not exactly what profiling is for?

Of course, I can't possibly say what the cause might be (and I am /not/
discounting an oprofile/cpu problem), but we haven't seen any problems
like this before.

> L2_LINES_OUT:
>
> 1. NIC irq and web server process both attached to CPU0 (CPU0 is 0%
>    idle while CPU1, CPU2, and CPU3 are 100% idle)
>
> 2. NIC irq and web server process both attached to CPU3 (CPU3 is 0%
>    idle while CPU0, CPU1, and CPU2 are 100% idle)
>
> My conclusion: there's nothing wrong with CPU_CLK_UNHALTED but there
> definitely is something wrong for L2_LINES_OUT on CPU#0.
>
> L2_LINES_OUT results for test #1:
>
> CPU: PIII, speed 549.474 MHz (estimated)
> Counted L2_LINES_OUT events (number of recovered lines from L2) with a
> unit mask of 0x00 (No unit mask) count 5000
> samples  %        symbol name
> 57       35.4037  default_idle
> 18       11.1801  statm_pgd_range
>
> L2_LINES_OUT results for test #2:
>
> Counted L2_LINES_OUT events (number of recovered lines from L2) with a
> unit mask of 0x00 (No unit mask) count 5000
> samples  %        symbol name
> 92        5.7716  ip_queue_xmit
> 77        4.8306  tcp_sendmsg

But even so, how did you reach such a conclusion? These values are very
low, and well into the level of statistical noise. Please remember that
OProfile is not an exact or accounting profiler.

Try running the tests for much longer, or reducing the count value
considerably. When the results are into the thousands/tens of thousands,
then it's more statistically reasonable.

regards
john |
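(John's point about statistical noise can be quantified. Profile samples in a bucket are roughly Poisson-distributed, so a symbol with n samples carries a relative error of about 1/sqrt(n). A minimal sketch under that assumption:)

```python
import math

def relative_error(samples: int) -> float:
    """Approximate 1-sigma relative error of a Poisson-distributed
    sample count: 1/sqrt(n)."""
    return 1.0 / math.sqrt(samples)

# 57 samples, like test #1's default_idle, vs. a 100x longer run:
for n in (57, 5700):
    print(f"{n:>5} samples: ~{relative_error(n):.1%} noise")
# 57 samples carry roughly 13% noise; 5700 samples roughly 1.3%.
```

This is why runs whose per-symbol counts reach the thousands are needed before per-CPU comparisons of L2_LINES_OUT mean anything.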
From: <Eri...@en...> - 2003-11-28 18:39:50
|
> Try running the tests for much longer, or reducing the count value
> considerably. When the results are into the thousands/tens of thousands,
> then it's more statistically reasonable.

You are right, I was not running the tests long enough. What I still
don't get is that when I ran the too-short test many times, CPU0 was
always the processor that seemed not to work!

Thanks a lot for taking the time to reply.

-- Eric |