From: Dave H. <hav...@us...> - 2003-02-27 06:32:39
|
This happened while running dbench on 2.5.62. I haven't seen it before, but I thought I'd report it anyway. I'm using the 0.5 version of the userspace tools. I'm pretty sure it happened on this line in oprofile_add_sample(): cpu_buf->buffer[cpu_buf->pos].eip = eip; Unable to handle kernel paging request at virtual address f8c3c000 c0212022 *pde = 00000000 Oops: 0002 CPU: 13 EIP: 0060:[<c0212022>] Not tainted Using defaults from ksymoops -t elf32-i386 -a i386 EFLAGS: 00010046 eax: 40082d94 ebx: 00000340 ecx: 00002000 edx: f8c2c000 esi: 00000000 edi: c0310f00 ebp: 00000000 esp: e2fbdf60 ds: 007b es: 007b ss: 0068 Stack: 00000000 00000000 c03a22c8 c02139f0 40082d94 00000000 00000000 0000000d e2fbc000 e2fbdfc4 bffff7a0 bffffbb8 40082d94 c0213291 0000000d c03a22c8 e2fbdfc4 c010a1eb e2fbdfc4 0000000d 40122020 00000000 c010962e e2fbdfc4 Call Trace: [<c02139f0>] ppro_check_ctrs+0x4c/0x80 [<c0213291>] nmi_callback+0x21/0x28 [<c010a1eb>] do_nmi+0x2b/0x48 [<c010962e>] nmi+0x1e/0x30 Code: 89 04 ca 8b 8b c4 0b 31 c0 8b 93 d0 0b 31 c0 8b 44 24 18 89 p >>EIP; c0212022 <oprofile_add_sample+102/128> <===== >>edi; c0310f00 <cpu_buffer+340/800> Trace; c02139f0 <ppro_check_ctrs+4c/80> Trace; c0213291 <nmi_callback+21/28> Trace; c010a1eb <do_nmi+2b/48> Trace; c010962e <nmi+1e/30> Code; c0212022 <oprofile_add_sample+102/128> 00000000 <_EIP>: Code; c0212022 <oprofile_add_sample+102/128> <===== 0: 89 04 ca mov %eax,(%edx,%ecx,8) <===== Code; c0212025 <oprofile_add_sample+105/128> 3: 8b 8b c4 0b 31 c0 mov 0xc0310bc4(%ebx),%ecx Code; c021202b <oprofile_add_sample+10b/128> 9: 8b 93 d0 0b 31 c0 mov 0xc0310bd0(%ebx),%edx Code; c0212031 <oprofile_add_sample+111/128> f: 8b 44 24 18 mov 0x18(%esp,1),%eax Code; c0212035 <oprofile_add_sample+115/128> 13: 89 00 mov %eax,(%eax) -- Dave Hansen hav...@us... |
From: John L. <le...@mo...> - 2003-02-27 17:41:14
|
On Wed, Feb 26, 2003 at 10:29:43PM -0800, Dave Hansen wrote: > I'm pretty sure it happened on this line in oprofile_add_sample(): > cpu_buf->buffer[cpu_buf->pos].eip = eip; > > Unable to handle kernel paging request at virtual address f8c3c000 Odd. UP or SMP ? Were you shutting down oprofile at the time (or did the daemon crash ? check /var/lib/oprofile/oprofiled.log) The only thing I can think is that the buffer got freed then we got an NMI afterwards (which obviously isn't supposed to happen...) UP oprofile in current linus is still totally broken, but I can't imagine it causing the above. regards john |
From: Dave H. <hav...@us...> - 2003-02-27 17:58:32
|
John Levon wrote: > On Wed, Feb 26, 2003 at 10:29:43PM -0800, Dave Hansen wrote: > >>I'm pretty sure it happened on this line in oprofile_add_sample(): >> cpu_buf->buffer[cpu_buf->pos].eip = eip; >> >>Unable to handle kernel paging request at virtual address f8c3c000 > > Odd. UP or SMP ? Were you shutting down oprofile at the time (or did the > daemon crash ? check /var/lib/oprofile/oprofiled.log) 16-way NUMAQ :) It was in the middle of a dbench run, so nothing critical was happening with oprofile. > The only thing I can think is that the buffer got freed then we got an > NMI afterwards (which obviously isn't supposed to happen...) The irqbalance code is causing some funniness on these machines, so it might be related. I'll speak up if I see it again. -- Dave Hansen hav...@us... |
From: Philippe E. <ph...@wa...> - 2003-03-05 19:11:56
|
Dave Hansen wrote: > This happened while running dbench on 2.5.62. I haven't seen it before, > but I thought I'd report it anyway. I'm using the 0.5 version of the > userspace tools. > > I'm pretty sure it happened on this line in oprofile_add_sample(): > cpu_buf->buffer[cpu_buf->pos].eip = eip; yes, in the last chunk of code in oprofile_add_sample() > Unable to handle kernel paging request at virtual address f8c3c000 > c0212022 > *pde = 00000000 > Oops: 0002 > CPU: 13 > EIP: 0060:[<c0212022>] Not tainted > Using defaults from ksymoops -t elf32-i386 -a i386 > EFLAGS: 00010046 > eax: 40082d94 ebx: 00000340 ecx: 00002000 edx: f8c2c000 ^^^^^^^^ buffer overrrun by one entry (8192 entry by default of 8 bytes each on x86), potentially oprofile_add_sample() add 3 events in buffer but the protection at begin of code protect against two addition not three The bug is rare because add_sample use three entry in rare case, and thing are wrong only when cpu_buf->pos == buffer_size - 2 the code is not fixed in 2.5.64, John if you have not already a patch pending for this can you push it in your tree ? void oprofile_add_sample(unsigned long eip, unsigned int is_kernel, unsigned long event, int cpu) ..... - if (cpu_buf->pos > buffer_size - 2) { + if (cpu_buf->pos > buffer_size - 3) { cpu_buf->sample_lost_overflow++; goto out; } >>>EIP; c0212022 <oprofile_add_sample+102/128> <===== >> > >>>edi; c0310f00 <cpu_buffer+340/800> >> > > Trace; c02139f0 <ppro_check_ctrs+4c/80> > Trace; c0213291 <nmi_callback+21/28> > Trace; c010a1eb <do_nmi+2b/48> > Trace; c010962e <nmi+1e/30> > > Code; c0212022 <oprofile_add_sample+102/128> > 00000000 <_EIP>: > Code; c0212022 <oprofile_add_sample+102/128> <===== > 0: 89 04 ca mov %eax,(%edx,%ecx,8) <===== ecx == cpu_bufffer->pos == buffer_size ... boom ... regards, Philippe Elie |