From: Adrian B. <bu...@st...> - 2006-11-09 22:52:54
|
On Tue, Nov 07, 2006 at 06:33:44PM -0800, Linus Torvalds wrote:
>...
> The rest is really mostly one-liners (or close) to various subsystems. New
> PCI ID's, trivial fixes, cifs, dvb, things like that. I'm feeling better
> about this - there may be a -rc6, but maybe we don't even need one.
>...

Famous last words... ;-)

This email lists some known regressions in 2.6.19-rc5 compared to 2.6.18.

If you find your name in the Cc header, you are either the submitter of one
of the bugs, the maintainer of an affected subsystem or driver, the author of
a patch that caused a breakage, or someone I consider possibly involved with
one or more of these issues in some other way.

Due to the large number of recipients, please trim the Cc when answering.

Subject    : ThinkPad R50p: boot fail with (lapic && on_battery)
References : http://lkml.org/lkml/2006/10/31/333
Submitter  : Ernst Herzberg <ea...@ne...>
Handled-By : Len Brown <len...@in...>
Status     : problem is being debugged

Subject    : ThinkPad T60: no screen after resume
References : http://mail.matrix.de/pipermail/linux-thinkpad/2006-November/037011.html
Submitter  : Martin Lorenz <ma...@lo...>
Status     : unknown

Subject    : ThinkPad T60: lose ACPI events after suspend/resume
References : http://lkml.org/lkml/2006/10/10/39
Submitter  : Martin Lorenz <ma...@lo...>
Status     : problem might be fixed by commit
             f9dadfa71bc594df09044da61d1c72701121d802

Subject    : i386: more DWARFs and strange messages
References : http://lkml.org/lkml/2006/10/29/127
Submitter  : Martin Lorenz <ma...@lo...>
Status     : should be fixed by commit
             4b96b1a10cb00c867103b21f0f2a6c91b705db11

Subject    : BUG: scheduling while atomic: events/0/0x00000001/4, etc..
References : http://lkml.org/lkml/2006/11/2/209
Submitter  : Paolo Ornati <or...@fa...>
Status     : unknown

Subject    : weird battery charge level reported ACPI Error method parse / execution failed
References : http://bugzilla.kernel.org/show_bug.cgi?id=7466
Submitter  : Olivier Mondoloni <oli...@wa...>
Status     : unknown

Subject    : sata-via doesn't detect anymore disks attached to VIA vt6421
References : http://bugzilla.kernel.org/show_bug.cgi?id=7255
Submitter  : Thierry Vignaud <tvi...@ma...>
Status     : unknown

Subject    : unable to rip cd
References : http://lkml.org/lkml/2006/10/13/100
Submitter  : Alex Romosan <ro...@sy...>
Status     : unknown

Subject    : x86_64: oprofile doesn't work
References : http://lkml.org/lkml/2006/10/27/3
Submitter  : Prakash Punnoor <pr...@pu...>
Status     : unknown

Subject    : x86_64: NR_IRQ increase causes 11.5% slowdown in lmbench's fork benchmark
References : http://lkml.org/lkml/2006/11/2/192
Submitter  : Tim Chen <tim...@li...>
Caused-By  : Eric W. Biederman <ebi...@xm...>
             commit 550f2299ac8ffaba943cf211380d3a8d3fa75301
Status     : unknown

Subject    : PCI: MMCONFIG breakage
References : http://lkml.org/lkml/2006/10/27/251
Submitter  : Jeff Chua <jef...@gm...>
Caused-By  : Andi Kleen <ak...@su...>
             commit de09bddb9d6f96785be470c832b881e6d72d589f
Handled-By : Andi Kleen <ak...@su...>
             Aaron Durbin <ad...@go...>
             Matthew Wilcox <ma...@wi...>
Status     : people are investigating

Subject    : SMP kernel can not generate ISA irq properly
References : http://lkml.org/lkml/2006/10/22/15
Submitter  : Komuro <kom...@ni...>
Handled-By : Thomas Gleixner <tg...@li...>
Status     : Thomas is investigating

Subject    : ipath driver MCEs system on load when HT chip present
References : http://bugzilla.kernel.org/show_bug.cgi?id=7455
Submitter  : Bryan O'Sullivan <bo...@se...>
Caused-By  : Eric W. Biederman <ebi...@xm...>
Handled-By : Bryan O'Sullivan <bo...@se...>
             Eric W. Biederman <ebi...@xm...>
Status     : Bryan and Eric are working on fixing the ipath driver

Subject    : boot hang in the microcode driver
References : http://lkml.org/lkml/2006/11/6/117
Submitter  : Arjan van de Ven <ar...@li...>
Caused-By  : Shaohua Li <sha...@in...>
             commit a30a6a2cb0fdc2c9701d6ddfb21affeb8146c038
Handled-By : Arjan van de Ven <ar...@li...>
Patch      : http://lkml.org/lkml/2006/11/6/117
Status     : workaround-patch available
|
From: Adrian B. <bu...@st...> - 2006-11-11 01:51:03
|
This email lists some known regressions in 2.6.19-rc5 compared to 2.6.18
that are not yet fixed in Linus' tree.

If you find your name in the Cc header, you are either the submitter of one
of the bugs, the maintainer of an affected subsystem or driver, the author of
a patch that caused a breakage, or someone I consider possibly involved with
one or more of these issues in some other way.

Due to the large number of recipients, please trim the Cc when answering.

Subject    : PCI MSI setting corrupted during resume
References : http://bugzilla.kernel.org/show_bug.cgi?id=7479
Submitter  : Stephen Hemminger <she...@os...>
Status     : unknown

Subject    : x86_64 boot failure: irq 22: nobody cared (hda_intel MSI)
References : http://lkml.org/lkml/2006/11/8/98
Submitter  : Olivier Nicolas <ol...@tr...>
Status     : unknown

Subject    : SMP kernel can not generate ISA irq properly
References : http://lkml.org/lkml/2006/10/22/15
             http://lkml.org/lkml/2006/11/10/142
Submitter  : Komuro <kom...@ni...>
Handled-By : Thomas Gleixner <tg...@li...>
Status     : Thomas is investigating

Subject    : x86_64: Fix partial page check to ensure unusable memory is not being marked usable
References : http://lkml.org/lkml/2006/11/9/239
Submitter  : Aaron Durbin <ad...@go...>
Caused-By  : Mel Gorman <me...@cs...>
             commit 5cb248abf5ab65ab543b2d5fc16c738b28031fc0
Patch      : http://lkml.org/lkml/2006/11/9/239
Status     : patch available

Subject    : x86_64: Bad page state in process 'swapper'
References : http://lkml.org/lkml/2006/11/10/135
             http://lkml.org/lkml/2006/11/10/208
Submitter  : Andre Noll <ma...@sy...>
Handled-By : Andi Kleen <ak...@su...>
Status     : Andi is investigating

Subject    : x86_64: oprofile doesn't work
References : http://lkml.org/lkml/2006/10/27/3
Submitter  : Prakash Punnoor <pr...@pu...>
Status     : unknown

Subject    : weird battery charge level reported ACPI Error method parse / execution failed
References : http://bugzilla.kernel.org/show_bug.cgi?id=7466
Submitter  : Olivier Mondoloni <oli...@wa...>
Status     : unknown

Subject    : ThinkPad R50p: boot fail with (lapic && on_battery)
References : http://lkml.org/lkml/2006/10/31/333
Submitter  : Ernst Herzberg <ea...@ne...>
Handled-By : Len Brown <len...@in...>
Status     : problem is being debugged

Subject    : BUG: scheduling while atomic: events/0/0x00000001/4 after resume
References : http://lkml.org/lkml/2006/11/2/209
Submitter  : Paolo Ornati <or...@fa...>
Status     : unknown

Subject    : sata-via doesn't detect anymore disks attached to VIA vt6421
References : http://bugzilla.kernel.org/show_bug.cgi?id=7255
Submitter  : Thierry Vignaud <tvi...@ma...>
Status     : unknown

Subject    : libata must be initialized earlier
References : http://ozlabs.org/pipermail/linuxppc-dev/2006-November/027945.html
Submitter  : Paul Mackerras <pa...@sa...>
Handled-By : Brian King <br...@us...>
Patch      : http://marc.theaimsgroup.com/?l=linux-ide&m=116169938407596&w=2
Status     : patch available

Subject    : unable to rip cd
References : http://lkml.org/lkml/2006/10/13/100
             http://lkml.org/lkml/2006/11/8/42
Submitter  : Alex Romosan <ro...@sy...>
Handled-By : Jens Axboe <jen...@or...>
Status     : Jens is investigating
|
From: Adrian B. <bu...@st...> - 2006-11-15 10:22:10
|
This email lists some known regressions in 2.6.19-rc5 compared to 2.6.18
that are not yet fixed in Linus' tree.

If you find your name in the Cc header, you are either the submitter of one
of the bugs, the maintainer of an affected subsystem or driver, the author of
a patch that caused a breakage, or someone I consider possibly involved with
one or more of these issues in some other way.

Due to the large number of recipients, please trim the Cc when answering.

Subject    : PCI MSI setting corrupted during resume
References : http://bugzilla.kernel.org/show_bug.cgi?id=7479
Submitter  : Stephen Hemminger <she...@os...>
Status     : unknown

Subject    : SMP kernel can not generate ISA irq properly
References : http://lkml.org/lkml/2006/10/22/15
             http://lkml.org/lkml/2006/11/10/142
Submitter  : Komuro <kom...@ni...>
Handled-By : "Eric W. Biederman" <ebi...@xm...>
             Ingo Molnar <mi...@re...>
Status     : problem is being debugged

Subject    : ThinkPad R50p: boot fail with (lapic && on_battery)
References : http://lkml.org/lkml/2006/10/31/333
Submitter  : Ernst Herzberg <ea...@ne...>
Handled-By : Len Brown <len...@in...>
Status     : problem is being debugged

Subject    : x86_64: Bad page state in process 'swapper'
References : http://lkml.org/lkml/2006/11/10/135
             http://lkml.org/lkml/2006/11/10/208
Submitter  : Andre Noll <ma...@sy...>
Handled-By : Andi Kleen <ak...@su...>
Status     : Andi is investigating

Subject    : x86_64: oprofile doesn't work
References : http://lkml.org/lkml/2006/10/27/3
Submitter  : Prakash Punnoor <pr...@pu...>
Status     : unknown

Subject    : unable to rip cd
References : http://lkml.org/lkml/2006/10/13/100
             http://lkml.org/lkml/2006/11/8/42
Submitter  : Alex Romosan <ro...@sy...>
Handled-By : Jens Axboe <jen...@or...>
Status     : Jens is investigating

Subject    : can't disable OHCI wakeup via sysfs
References : http://lkml.org/lkml/2006/11/11/33
Submitter  : Andrey Borzenkov <arv...@ma...>
Handled-By : Alan Stern <st...@ro...>
Patch      : http://lkml.org/lkml/2006/11/13/261
Status     : patch available
|
From: Eric D. <da...@co...> - 2006-11-15 11:33:45
|
On Wednesday 15 November 2006 11:21, Adrian Bunk wrote:
> Subject    : x86_64: oprofile doesn't work
> References : http://lkml.org/lkml/2006/10/27/3
> Submitter  : Prakash Punnoor <pr...@pu...>
> Status     : unknown
>

I confirm I got this one too.

On a working kernel on an Opteron, we have normally 4 directories
in /dev/oprofile :

# ls -ld /dev/oprofile/?
drwxr-xr-x 1 root root 0 15. Nov 12:38 /dev/oprofile/0
drwxr-xr-x 1 root root 0 15. Nov 12:38 /dev/oprofile/1
drwxr-xr-x 1 root root 0 15. Nov 12:38 /dev/oprofile/2
drwxr-xr-x 1 root root 0 15. Nov 12:38 /dev/oprofile/3

With linux-2.6.19-rc5, the first one (0) is missing and we get 1,2,3

Maybe the 'bug' is in oprofile tools, that currently expect to find '0'

Eric
|
From: Andi K. <ak...@su...> - 2006-11-15 10:50:52
|
> On a working kernel on an Opteron, we have normally 4 directories
> in /dev/oprofile :
>
> # ls -ld /dev/oprofile/?
> drwxr-xr-x 1 root root 0 15. Nov 12:38 /dev/oprofile/0
> drwxr-xr-x 1 root root 0 15. Nov 12:38 /dev/oprofile/1
> drwxr-xr-x 1 root root 0 15. Nov 12:38 /dev/oprofile/2
> drwxr-xr-x 1 root root 0 15. Nov 12:38 /dev/oprofile/3
>
> With linux-2.6.19-rc5, the first one (0) is missing and we get 1,2,3

That's because 0 was never available. It is used by the NMI watchdog.
The new kernel doesn't give it to oprofile anymore.

> Maybe the 'bug' is in oprofile tools, that currently expect to find '0'

Yes, it's likely a user space issue.

-Andi
|
From: William C. <wc...@re...> - 2006-11-15 16:41:32
|
Andi Kleen wrote:
>>On a working kernel on an Opteron, we have normally 4 directories
>>in /dev/oprofile :
>>
>># ls -ld /dev/oprofile/?
>>drwxr-xr-x 1 root root 0 15. Nov 12:38 /dev/oprofile/0
>>drwxr-xr-x 1 root root 0 15. Nov 12:38 /dev/oprofile/1
>>drwxr-xr-x 1 root root 0 15. Nov 12:38 /dev/oprofile/2
>>drwxr-xr-x 1 root root 0 15. Nov 12:38 /dev/oprofile/3
>>
>>With linux-2.6.19-rc5, the first one (0) is missing and we get 1,2,3
>
>
> That's because 0 was never available. It is used by the NMI watchdog.
> The new kernel doesn't give it to oprofile anymore.
>
>
>>Maybe the 'bug' is in oprofile tools, that currently expect to find '0'
>
>
> Yes, it's likely a user space issue.
>
> -Andi

OProfile has a simplistic view of the performance monitoring hardware. The
routines in libop/op_alloc_counter.c determine what set of performance registers
is available from the processor in use. There is no check to see what registers
are actually available in the /dev/oprofile directory.

opcontrol executes ophelp to determine which specific counters to count which
events. The function map_event_to_counter() in libop/op_alloc_counter.c does the
actual selection. It seems what is needed is for map_event_to_counter() to check
to see which counters are available and mark the others as unavailable.

-Will
|
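The check Will describes (look at which counter directories /dev/oprofile actually exposes and treat the rest as unavailable) can be sketched roughly as below. This is only an illustration in Python, not the oprofile patch: the real logic lives in C in libop/op_alloc_counter.c, and the names available_counters(), usable_counter_mask() and demo() are hypothetical. It assumes counters appear as numeric subdirectories, as in the listings earlier in this thread.

```python
import os
import tempfile


def available_counters(oprofile_dir="/dev/oprofile"):
    """Return the sorted counter numbers exposed as numeric subdirectories."""
    counters = []
    for name in os.listdir(oprofile_dir):
        path = os.path.join(oprofile_dir, name)
        # Counter directories are named 0, 1, 2, ...; skip plain files such
        # as "enable" and non-numeric directories such as "stats".
        if name.isascii() and name.isdigit() and os.path.isdir(path):
            counters.append(int(name))
    return sorted(counters)


def usable_counter_mask(nr_hw_counters, available):
    """Bitmask of hardware counters that user space may actually allocate."""
    mask = 0
    for counter in available:
        if 0 <= counter < nr_hw_counters:
            mask |= 1 << counter
    return mask


def demo():
    # Build a scratch tree resembling the 2.6.19-rc5 report: counters 1, 2, 3
    # are present and counter 0 is missing (held by the NMI watchdog).
    with tempfile.TemporaryDirectory() as root:
        for name in ("1", "2", "3", "stats"):
            os.mkdir(os.path.join(root, name))
        open(os.path.join(root, "enable"), "w").close()
        found = available_counters(root)
        return found, usable_counter_mask(4, found)
```

An event-to-counter allocator could then intersect each event's allowed-counter set with this mask before picking a counter, instead of assuming every hardware counter is free.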
From: Eric D. <da...@co...> - 2006-11-22 11:22:02
|
On Wednesday 15 November 2006 11:35, Eric Dumazet wrote:
> On Wednesday 15 November 2006 11:21, Adrian Bunk wrote:
> > Subject    : x86_64: oprofile doesn't work
> > References : http://lkml.org/lkml/2006/10/27/3
> > Submitter  : Prakash Punnoor <pr...@pu...>
> > Status     : unknown
>

I hit the same problem on i386 architecture too, if CONFIG_ACPI is not set.

# opcontrol --setup --event=RESOURCE_STALLS:1000 --vmlinux=$VMFILE
# opcontrol --start
/usr/bin/opcontrol: line 911: /dev/oprofile/0/enabled: No such file or directory
/usr/bin/opcontrol: line 911: /dev/oprofile/0/event: No such file or directory
/usr/bin/opcontrol: line 911: /dev/oprofile/0/count: No such file or directory
/usr/bin/opcontrol: line 911: /dev/oprofile/0/kernel: No such file or directory
/usr/bin/opcontrol: line 911: /dev/oprofile/0/user: No such file or directory
/usr/bin/opcontrol: line 911: /dev/oprofile/0/unit_mask: No such file or directory
Using 2.6+ OProfile kernel interface.
Reading module info.
Using log file /var/lib/oprofile/oprofiled.log
Daemon started.
Profiler running.

# ls -l /dev/oprofile/
total 0
drwxr-xr-x 1 root root 0 Nov 22 11:18 1
-rw-r--r-- 1 root root 0 Nov 22 11:18 backtrace_depth
-rw-r--r-- 1 root root 0 Nov 22 11:18 buffer
-rw-r--r-- 1 root root 0 Nov 22 11:18 buffer_size
-rw-r--r-- 1 root root 0 Nov 22 11:18 buffer_watershed
-rw-r--r-- 1 root root 0 Nov 22 11:18 cpu_buffer_size
-rw-r--r-- 1 root root 0 Nov 22 11:18 cpu_type
-rw-rw-rw- 1 root root 0 Nov 22 11:18 dump
-rw-r--r-- 1 root root 0 Nov 22 11:18 enable
-rw-r--r-- 1 root root 0 Nov 22 11:18 pointer_size
drwxr-xr-x 1 root root 0 Nov 22 11:18 stats
# dmesg | grep oprofile
oprofile: using NMI interrupt.
# opcontrol --version
opcontrol: oprofile 0.9.2 compiled on Nov 22 2006 11:24:09

Eric
|
From: William C. <wc...@nc...> - 2006-11-22 17:59:33
|
Eric Dumazet wrote: > On Wednesday 15 November 2006 11:35, Eric Dumazet wrote: > >>On Wednesday 15 November 2006 11:21, Adrian Bunk wrote: >> >>>Subject : x86_64: oprofile doesn't work >>>References : http://lkml.org/lkml/2006/10/27/3 >>>Submitter : Prakash Punnoor <pr...@pu...> >>>Status : unknown >> > > I hit the same problem on i386 architecture too, if CONFIG_ACPI is not set. > > # opcontrol --setup --event=RESOURCE_STALLS:1000 --vmlinux=$VMFILE > # opcontrol --start > /usr/bin/opcontrol: line 911: /dev/oprofile/0/enabled: No such file or > directory > /usr/bin/opcontrol: line 911: /dev/oprofile/0/event: No such file or directory > /usr/bin/opcontrol: line 911: /dev/oprofile/0/count: No such file or directory > /usr/bin/opcontrol: line 911: /dev/oprofile/0/kernel: No such file or > directory > /usr/bin/opcontrol: line 911: /dev/oprofile/0/user: No such file or directory > /usr/bin/opcontrol: line 911: /dev/oprofile/0/unit_mask: No such file or > directory > Using 2.6+ OProfile kernel interface. > Reading module info. > Using log file /var/lib/oprofile/oprofiled.log > Daemon started. > Profiler running. > > # ls -l /dev/oprofile/ > total 0 > drwxr-xr-x 1 root root 0 Nov 22 11:18 1 > -rw-r--r-- 1 root root 0 Nov 22 11:18 backtrace_depth > -rw-r--r-- 1 root root 0 Nov 22 11:18 buffer > -rw-r--r-- 1 root root 0 Nov 22 11:18 buffer_size > -rw-r--r-- 1 root root 0 Nov 22 11:18 buffer_watershed > -rw-r--r-- 1 root root 0 Nov 22 11:18 cpu_buffer_size > -rw-r--r-- 1 root root 0 Nov 22 11:18 cpu_type > -rw-rw-rw- 1 root root 0 Nov 22 11:18 dump > -rw-r--r-- 1 root root 0 Nov 22 11:18 enable > -rw-r--r-- 1 root root 0 Nov 22 11:18 pointer_size > drwxr-xr-x 1 root root 0 Nov 22 11:18 stats > # dmesg | grep oprofile > oprofile: using NMI interrupt. 
> # opcontrol --version
> opcontrol: oprofile 0.9.2 compiled on Nov 22 2006 11:24:09
>
> Eric

Could you try the patch that I posted on the oprofile mailing list last week
(November 17, 2006) for op_allocate.c and see if that resolves the problem you
are having?

http://sourceforge.net/mailarchive/message.php?msg_id=37316102

-Will
|
From: William C. <wc...@re...> - 2006-11-22 18:05:46
Attachments:
opalloc.diff
|
Eric Dumazet wrote: > On Wednesday 15 November 2006 11:35, Eric Dumazet wrote: > >>On Wednesday 15 November 2006 11:21, Adrian Bunk wrote: >> >>>Subject : x86_64: oprofile doesn't work >>>References : http://lkml.org/lkml/2006/10/27/3 >>>Submitter : Prakash Punnoor <pr...@pu...> >>>Status : unknown >> > > I hit the same problem on i386 architecture too, if CONFIG_ACPI is not set. > > # opcontrol --setup --event=RESOURCE_STALLS:1000 --vmlinux=$VMFILE > # opcontrol --start > /usr/bin/opcontrol: line 911: /dev/oprofile/0/enabled: No such file or > directory > /usr/bin/opcontrol: line 911: /dev/oprofile/0/event: No such file or directory > /usr/bin/opcontrol: line 911: /dev/oprofile/0/count: No such file or directory > /usr/bin/opcontrol: line 911: /dev/oprofile/0/kernel: No such file or > directory > /usr/bin/opcontrol: line 911: /dev/oprofile/0/user: No such file or directory > /usr/bin/opcontrol: line 911: /dev/oprofile/0/unit_mask: No such file or > directory > Using 2.6+ OProfile kernel interface. > Reading module info. > Using log file /var/lib/oprofile/oprofiled.log > Daemon started. > Profiler running. > > # ls -l /dev/oprofile/ > total 0 > drwxr-xr-x 1 root root 0 Nov 22 11:18 1 > -rw-r--r-- 1 root root 0 Nov 22 11:18 backtrace_depth > -rw-r--r-- 1 root root 0 Nov 22 11:18 buffer > -rw-r--r-- 1 root root 0 Nov 22 11:18 buffer_size > -rw-r--r-- 1 root root 0 Nov 22 11:18 buffer_watershed > -rw-r--r-- 1 root root 0 Nov 22 11:18 cpu_buffer_size > -rw-r--r-- 1 root root 0 Nov 22 11:18 cpu_type > -rw-rw-rw- 1 root root 0 Nov 22 11:18 dump > -rw-r--r-- 1 root root 0 Nov 22 11:18 enable > -rw-r--r-- 1 root root 0 Nov 22 11:18 pointer_size > drwxr-xr-x 1 root root 0 Nov 22 11:18 stats > # dmesg | grep oprofile > oprofile: using NMI interrupt. 
> # opcontrol --version
> opcontrol: oprofile 0.9.2 compiled on Nov 22 2006 11:24:09
>
> Eric

You will also need another patch checked into the oprofile cvs last week
mentioned:

http://sourceforge.net/mailarchive/message.php?msg_id=35422937

-Will
|
From: Eric D. <da...@co...> - 2006-11-22 18:26:30
|
On Wednesday 22 November 2006 19:05, William Cohen wrote:
> You will also need another patch checked into the oprofile cvs last week
> mentioned:
>
> http://sourceforge.net/mailarchive/message.php?msg_id=35422937
>
> -Will

Thank you William.

I confirm that the CVS version of oprofile plus the patches you gave here
works with linux-2.6.16-rc6 on i386, whether or not nmi_watchdog is disabled
(with or without nmi_watchdog=0 in the boot params).

Eric
|
From: Andi K. <ak...@su...> - 2006-11-15 16:49:07
|
> OProfile has a simplistic view of the performance monitoring hardware. The
> routines in libop/op_alloc_counter.c determine what set of performance registers
> is available from the processor in use. There is no check to see what registers
> are actually available in the /dev/oprofile directory.
>
> opcontrol executes ophelp to determine which specific counters to count which
> events. The function map_event_to_counter() in libop/op_alloc_counter.c does the
> actual selection. It seems what is needed is for map_event_to_counter() to check
> to see which counters are available and mark the others as unavailable

Thanks for the explanation. Can you please fix it and release a new version?
Documentation/Changes could be adapted then.

-Andi
|
From: Andrew M. <ak...@os...> - 2006-11-15 18:39:31
|
On Wed, 15 Nov 2006 17:48:05 +0100 Andi Kleen <ak...@su...> wrote:

>
> > OProfile has a simplistic view of the performance monitoring hardware. The
> > routines in libop/op_alloc_counter.c determine what set of performance registers
> > is available from the processor in use. There is no check to see what registers
> > are actually available in the /dev/oprofile directory.
> >
> > opcontrol executes ophelp to determine which specific counters to count which
> > events. The function map_event_to_counter() in libop/op_alloc_counter.c does the
> > actual selection. It seems what is needed is for map_event_to_counter() to check
> > to see which counters are available and mark the others as unavailable
>
> Thanks for the explanation. Can you please fix it and release a new version?
> Documentation/Changes could be adapted then.
>

Meanwhile we should restore the NMI counter to fix this bug.
|
From: Andi K. <ak...@su...> - 2006-11-15 18:46:07
|
On Wednesday 15 November 2006 19:39, Andrew Morton wrote:
> On Wed, 15 Nov 2006 17:48:05 +0100
> Andi Kleen <ak...@su...> wrote:
>
> >
> > > OProfile has a simplistic view of the performance monitoring hardware. The
> > > routines in libop/op_alloc_counter.c determine what set of performance registers
> > > is available from the processor in use. There is no check to see what registers
> > > are actually available in the /dev/oprofile directory.
> > >
> > > opcontrol executes ophelp to determine which specific counters to count which
> > > events. The function map_event_to_counter() in libop/op_alloc_counter.c does the
> > > actual selection. It seems what is needed is for map_event_to_counter() to check
> > > to see which counters are available and mark the others as unavailable
> >
> > Thanks for the explanation. Can you please fix it and release a new version?
> > Documentation/Changes could be adapted then.
> >
>
> Meanwhile we should restore the NMI counter to fix this bug.

No, it was always oprofile who was buggy here, silently taking
the nmi watchdog away.

-Andi
|
From: Linus T. <tor...@os...> - 2006-11-15 19:08:19
|
On Wed, 15 Nov 2006, Andi Kleen wrote:
> >
> > Meanwhile we should restore the NMI counter to fix this bug.
>
> No, it was always oprofile who was buggy here, silently taking
> the nmi watchdog away.

Andi, your "blame game" doesn't matter.

The fact is, it used to work, and the kernel changed interfaces, so now it
doesn't.

In other words, a kernel interface to user land changed. THAT IS ALWAYS A
BUG. We don't change UI.

Yes, "oprofile" should be fixed to not depend on that, but the kernel
shouldn't change the interfaces, and we should add back the zero entry.

Linus
|
From: Andi K. <ak...@su...> - 2006-11-15 19:24:15
|
> The fact is, it used to work, and the kernel changed interfaces, so now it
> doesn't.

No, it didn't work. oprofile may have done something, but it
just silently killed the NMI watchdog in the process.
That was never acceptable.

Now we do proper accounting of NMI sources and also proper allocation
of performance counters.

> Yes, "oprofile" should be fixed to not depend on that, but the kernel
> shouldn't change the interfaces, and we should add back the zero entry.

That would break the nmi watchdog again.

Anyways, there is a sysctl to disable the nmi watchdog if someone
is desperate.

But I think it is clearly oprofile who did wrong here and needs
to be fixed.

-Andi
|
From: Andrew M. <ak...@os...> - 2006-11-15 20:21:37
|
On Wed, 15 Nov 2006 20:23:53 +0100 Andi Kleen <ak...@su...> wrote:

>
> > The fact is, it used to work, and the kernel changed interfaces, so now it
> > doesn't.
>
> No, it didn't work. oprofile may have done something, but it
> just silently killed the NMI watchdog in the process.
> That was never acceptable.

But people could get profiles out. I know, I've seen them!

> Now we do proper accounting of NMI sources and also proper allocation
> of performance counters.
>
>
> > Yes, "oprofile" should be fixed to not depend on that, but the kernel
> > shouldn't change the interfaces, and we should add back the zero entry.
>
> That would break the nmi watchdog again.
>
> Anyways, there is a sysctl to disable the nmi watchdog if someone
> is desperate.
>
> But I think it is clearly oprofile who did wrong here and needs
> to be fixed.
>

Is it correct to say that oprofile-on-2.6.18 works, and that
oprofile-on-2.6.19-rc5 does not?

Or is there some sort of workaround for this, or does 2.6.19-rc5 only fail
in some particular scenarios?

If it's really true that oprofile is simply busted then that's a serious
problem and we should find some way of unbusting it. If that means just
adding a dummy "0" entry which always returns zero or something like that,
then fine.

But we can't just go and bust it.
|
From: <ebi...@xm...> - 2006-11-15 21:20:09
|
Andrew Morton <ak...@os...> writes:

> Is it correct to say that oprofile-on-2.6.18 works, and that
> oprofile-on-2.6.19-rc5 does not?
>
> Or is there some sort of workaround for this, or does 2.6.19-rc5 only fail
> in some particular scenarios?
>
> If it's really true that oprofile is simply busted then that's a serious
> problem and we should find some way of unbusting it. If that means just
> adding a dummy "0" entry which always returns zero or something like that,
> then fine.
>
> But we can't just go and bust it.

The simple question. If we turn off the NMI watchdog on 2.6.19-rc5
does oprofile work? I believe that is what Andi said.

The description I read was a resource conflict. The resources oprofile
just expects it can use are already in use, so we tell it no and
the user space oprofile doesn't cope.

Now I don't know if the interface allows us to rename the interfaces
from 1 2 3 to 0 1 2. If we can then that looks like something we can
fix. Otherwise from the description I tend to agree with Andi.

The user space application assumed it owned hardware that it did not.

Hmm. I bet if nothing else we could move the NMI watchdog from 0 to 3
and make things work that way...

Eric
|
From: Andrew M. <ak...@os...> - 2006-11-15 21:31:26
|
On Wed, 15 Nov 2006 14:18:24 -0700 ebi...@xm... (Eric W. Biederman) wrote:

> Andrew Morton <ak...@os...> writes:
>
> > Is it correct to say that oprofile-on-2.6.18 works, and that
> > oprofile-on-2.6.19-rc5 does not?
> >
> > Or is there some sort of workaround for this, or does 2.6.19-rc5 only fail
> > in some particular scenarios?
> >
> > If it's really true that oprofile is simply busted then that's a serious
> > problem and we should find some way of unbusting it. If that means just
> > adding a dummy "0" entry which always returns zero or something like that,
> > then fine.
> >
> > But we can't just go and bust it.
>
> The simple question. If we turn off the NMI watchdog on 2.6.19-rc5
> does oprofile work? I believe that is what Andi said.
>
> The description I read was a resource conflict. The resources oprofile
> just expects it can use are already in use, so we tell it no and
> the user space oprofile doesn't cope.

That would have been a bug in earlier kernels.

> Now I don't know if the interface allows us to rename the interfaces
> from 1 2 3 to 0 1 2. If we can then that looks like something we can
> fix. Otherwise from the description I tend to agree with Andi.
>
> The user space application assumed it owned hardware that it did not.
>
> Hmm. I bet if nothing else we could move the NMI watchdog from 0 to 3
> and make things work that way...

Surely the appropriate behaviour is to allow oprofile to steal the NMI and
to then put the NMI back to doing the watchdog thing after oprofile has
finished with it.

If that's not a feasible thing to do for 2.6.19 then some short-term hack
which makes oprofile work again is needed.
|
From: Mikael P. <mi...@it...> - 2006-11-16 10:56:07
|
Andrew Morton writes:
> Surely the appropriate behaviour is to allow oprofile to steal the NMI and
> to then put the NMI back to doing the watchdog thing after oprofile has
> finished with it.

Which is _exactly_ what pre-2.6.19-rc1 kernels did. I implemented
the in-kernel API allowing real performance counter drivers like
oprofile (and perfctr) to claim the HW from the NMI watchdog,
do their work, and then release it which resumed the watchdog.

Note that oprofile (and perfctr) didn't do anything behind the
NMI watchdog's back. They went via the API. Nothing dodgy going on.
|
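The claim/release hand-off Mikael describes (a profiler claims the counter hardware from the NMI watchdog, does its work, then releases it so the watchdog resumes) can be pictured with a toy model. This is a conceptual sketch only, not the in-kernel API: the class PerfCounterHardware, its claimed_by() method and demo() are hypothetical names invented for illustration.

```python
from contextlib import contextmanager


class PerfCounterHardware:
    """Toy model: the watchdog owns the counter unless a profiler holds it."""

    WATCHDOG = "nmi_watchdog"

    def __init__(self):
        self.owner = self.WATCHDOG

    @contextmanager
    def claimed_by(self, profiler):
        # Only one profiler may take the hardware away from the watchdog.
        if self.owner != self.WATCHDOG:
            raise RuntimeError(f"counter already claimed by {self.owner}")
        self.owner = profiler  # the watchdog is suspended while claimed
        try:
            yield self
        finally:
            self.owner = self.WATCHDOG  # releasing resumes the watchdog


def demo():
    hw = PerfCounterHardware()
    owners = [hw.owner]
    with hw.claimed_by("oprofile"):
        owners.append(hw.owner)
    owners.append(hw.owner)
    return owners
```

The point of the model is that ownership always returns to the watchdog on release, which is the behaviour under dispute later in the thread.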
From: Andrew M. <ak...@os...> - 2006-11-16 20:24:54
|
On Thu, 16 Nov 2006 11:55:46 +0100 Mikael Pettersson <mi...@it...> wrote:

> Andrew Morton writes:
> > Surely the appropriate behaviour is to allow oprofile to steal the NMI and
> > to then put the NMI back to doing the watchdog thing after oprofile has
> > finished with it.
>
> Which is _exactly_ what pre-2.6.19-rc1 kernels did. I implemented
> the in-kernel API allowing real performance counter drivers like
> oprofile (and perfctr) to claim the HW from the NMI watchdog,
> do their work, and then release it which resumed the watchdog.

OK. But from Andi's comments it seems that the NMI watchdog was failing to
resume its operation.

> Note that oprofile (and perfctr) didn't do anything behind the
> NMI watchdog's back. They went via the API. Nothing dodgy going on.
|
From: Mikael P. <mi...@it...> - 2006-11-17 09:59:33
|
Andrew Morton writes:
> On Thu, 16 Nov 2006 11:55:46 +0100
> Mikael Pettersson <mi...@it...> wrote:
>
> > Andrew Morton writes:
> > > Surely the appropriate behaviour is to allow oprofile to steal the NMI and
> > > to then put the NMI back to doing the watchdog thing after oprofile has
> > > finished with it.
> >
> > Which is _exactly_ what pre-2.6.19-rc1 kernels did. I implemented
> > the in-kernel API allowing real performance counter drivers like
> > oprofile (and perfctr) to claim the HW from the NMI watchdog,
> > do their work, and then release it which resumed the watchdog.
>
> OK. But from Andi's comments it seems that the NMI watchdog was failing to
> resume its operation.

It certainly worked when I originally implemented it. If it didn't work
that way before 2.6.19-rc1 butchered it then that would have been a bug
that should have been fixed.
|
From: Andi K. <ak...@su...> - 2006-11-17 10:30:04
|
On Friday 17 November 2006 10:59, Mikael Pettersson wrote:
> It certainly worked when I originally implemented it.

I don't think so. NMI watchdog never recovered no matter if oprofile
used the counter or not.

-Andi
|
From: Andrew M. <ak...@os...> - 2006-11-17 10:14:15
|
On Fri, 17 Nov 2006 10:59:07 +0100 Mikael Pettersson <mi...@it...> wrote:

> Andrew Morton writes:
> > On Thu, 16 Nov 2006 11:55:46 +0100
> > Mikael Pettersson <mi...@it...> wrote:
> >
> > > Andrew Morton writes:
> > > > Surely the appropriate behaviour is to allow oprofile to steal the NMI and
> > > > to then put the NMI back to doing the watchdog thing after oprofile has
> > > > finished with it.
> > >
> > > Which is _exactly_ what pre-2.6.19-rc1 kernels did. I implemented
> > > the in-kernel API allowing real performance counter drivers like
> > > oprofile (and perfctr) to claim the HW from the NMI watchdog,
> > > do their work, and then release it which resumed the watchdog.
> >
> > OK. But from Andi's comments it seems that the NMI watchdog was failing to
> > resume its operation.
>
> It certainly worked when I originally implemented it. If it didn't work
> that way before 2.6.19-rc1 butchered it then that would have been a bug
> that should have been fixed.

Oh. OK.

Meanwhile, 2.6.19-rc6 remains unfixed.
|
From: Bill D. <dav...@tm...> - 2006-11-19 03:05:33
|
Andrew Morton wrote: > On Fri, 17 Nov 2006 10:59:07 +0100 > Mikael Pettersson <mi...@it...> wrote: > >> Andrew Morton writes: >> > On Thu, 16 Nov 2006 11:55:46 +0100 >> > Mikael Pettersson <mi...@it...> wrote: >> > >> > > Andrew Morton writes: >> > > > Surely the appropriate behaviour is to allow oprofile to steal the NMI and >> > > > to then put the NMI back to doing the watchdog thing after oprofile has >> > > > finished with it. >> > > >> > > Which is _exactly_ what pre-2.6.19-rc1 kernels did. I implemented >> > > the in-kernel API allowing real performance counter drivers like >> > > oprofile (and perfctr) to claim the HW from the NMI watchdog, >> > > do their work, and then release it which resumed the watchdog. >> > >> > OK. But from Andi's comments it seems that the NMI watchdog was failing to >> > resume its operation. >> >> It certainly worked when I originally implemented it. If it didn't work >> that way before 2.6.19-rc1 butchered it then that would have been a bug >> that should have been fixed. > > Oh. OK. > > Meanwhile, 2.6.19-rc6 remains unfixed. > Has anyone verified that nmi watchdog works at all in 2.6.19-rc6? I haven't built a kernel since rc2, other things have been taking my time. -- Bill Davidsen <dav...@tm...> Obscure bug of 2004: BASH BUFFER OVERFLOW - if bash is being run by a normal user and is setuid root, with the "vi" line edit mode selected, and the character set is "big5," an off-by-one errors occurs during wildcard (glob) expansion. |
From: Andi K. <ak...@su...> - 2006-11-16 03:21:39
|
On Wed, Nov 15, 2006 at 12:21:18PM -0800, Andrew Morton wrote:
> Andi Kleen <ak...@su...> wrote:
>
> >
> > > The fact is, it used to work, and the kernel changed interfaces, so now it
> > > doesn't.
> >
> > No, it didn't work. oprofile may have done something, but it
> > just silently killed the NMI watchdog in the process.
> > That was never acceptable.
>
> But people could get profiles out. I know, I've seen them!

Just the nmi watchdog was gone then.

> > Now we do proper accounting of NMI sources and also proper allocation
> > of performance counters.
> >
> >
> > > Yes, "oprofile" should be fixed to not depend on that, but the kernel
> > > shouldn't change the interfaces, and we should add back the zero entry.
> >
> > That would break the nmi watchdog again.
> >
> > Anyways, there is a sysctl to disable the nmi watchdog if someone
> > is desperate.
> >
> > But I think it is clearly oprofile who did wrong here and needs
> > to be fixed.
> >
>
> Is it correct to say that oprofile-on-2.6.18 works, and that
> oprofile-on-2.6.19-rc5 does not?
>
> Or is there some sort of workaround for this, or does 2.6.19-rc5 only fail

echo 0 > /proc/sys/kernel/nmi_watchdog before the oprofile module is loaded.
With builtin oprofile probably nmi_watchdog=0

> in some particular scenarios?

On x86-64 and on newer i386 machines (based on DMI year)

> If it's really true that oprofile is simply busted then that's a serious
> problem and we should find some way of unbusting it. If that means just
> adding a dummy "0" entry which always returns zero or something like that,
> then fine.

That could probably be done.

> But we can't just go and bust it.

It just did something unbelievably broken before. I would say it busted itself.

-Andi
|
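For anyone scripting the workaround Andi gives above (write 0 to /proc/sys/kernel/nmi_watchdog before the oprofile module is loaded), a minimal sketch follows. It assumes the sysctl file exists on the running kernel and that the script runs as root; the helper names disable_nmi_watchdog() and demo() are hypothetical, and the demo writes to a scratch file rather than to /proc. With oprofile built into the kernel, Andi suggests the nmi_watchdog=0 boot parameter instead, which this sketch does not cover.

```python
import os
import tempfile

# Path taken from the message above.
NMI_WATCHDOG_SYSCTL = "/proc/sys/kernel/nmi_watchdog"


def disable_nmi_watchdog(path=NMI_WATCHDOG_SYSCTL):
    """Equivalent of `echo 0 > /proc/sys/kernel/nmi_watchdog`."""
    with open(path, "w") as f:
        f.write("0\n")


def demo():
    # Exercise the helper against a scratch file instead of the real sysctl.
    # The initial "1" is just placeholder content for the stand-in file.
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, "nmi_watchdog")
        with open(path, "w") as f:
            f.write("1\n")
        disable_nmi_watchdog(path)
        with open(path) as f:
            return f.read().strip()
```

On a real system the call would be made once, before `modprobe oprofile` or `opcontrol --start`.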