From: dzickus <dz...@re...> - 2006-05-09 20:57:07
|
A major cleanup of the nmi watchdog on the i386 and x86_64 archs. This little project started out as a way to get unknown_nmi_panic working on an x86_64 SMP machine. But after Andi rejected my dirty little hack ;), it grew into a spring cleaning of the nmi watchdog. The fix Andi wanted led to a conflict with how oprofile worked. This led to a reservation layer around the performance counters and a bunch of changes in oprofile startup. All the reservation layer consists of is reserving a bit that corresponds to a performance counter. The intent here is to prevent subsystems like oprofile and nmi watchdog from stomping on each other when trying to manipulate these counters. Currently, oprofile must stop the nmi watchdog before using the counters. This approach does not encourage any sharing. :) During the testing of this reservation layer, I ran into more and more bugs from old hacks (notably in the SMP cases). As a result I found myself rewriting a lot of this code. I think I straightened everything out, except for enabling/disabling the nmi watchdog on an ioapic. The enable_irq/disable_irq macros don't seem to work for me and I don't know enough about the subsystem to figure it out either. Of course the oprofile code is shared with the i386 arch, so all the changes I did in x86_64 had to be duplicated over there too. I tested these changes on a Xeon (UP/SMP), P4 (32/64, UP/SMP) and under various boot-up defaults (nmi-watchdog=0,1,2). I also worked on some of these changes with Andi Kleen and the oprofile folks, so these ideas shouldn't be too surprising, just perhaps the implementation. :o Any thoughts/feedback/criticism to clean this up more and probably fix some of my coding style would be appreciated. Cheers, Don -- |
From: dzickus <dz...@re...> - 2006-05-09 20:57:07
|
Removes the un/set_nmi_callback and reserve/release_lapic_nmi functions as they are no longer needed. The various subsystems are modified to register with the die_notifier instead. Signed-off-by: Don Zickus <dz...@re...> Index: linux-don/arch/x86_64/kernel/nmi.c =================================================================== --- linux-don.orig/arch/x86_64/kernel/nmi.c +++ linux-don/arch/x86_64/kernel/nmi.c @@ -42,20 +42,6 @@ static DEFINE_PER_CPU(unsigned, evntsel_ */ #define NMI_MAX_COUNTER_BITS 66 -/* - * lapic_nmi_owner tracks the ownership of the lapic NMI hardware: - * - it may be reserved by some other driver, or not - * - when not reserved by some other driver, it may be used for - * the NMI watchdog, or not - * - * This is maintained separately from nmi_active because the NMI - * watchdog may also be driven from the I/O APIC timer. - */ -static DEFINE_SPINLOCK(lapic_nmi_owner_lock); -static unsigned int lapic_nmi_owner; -#define LAPIC_NMI_WATCHDOG (1<<0) -#define LAPIC_NMI_RESERVED (1<<1) - /* nmi_active: * >0: the lapic NMI watchdog is active, but can be disabled * <0: the lapic NMI watchdog has not been set up, and cannot @@ -322,33 +308,6 @@ static void enable_lapic_nmi_watchdog(vo touch_nmi_watchdog(); } -int reserve_lapic_nmi(void) -{ - unsigned int old_owner; - - spin_lock(&lapic_nmi_owner_lock); - old_owner = lapic_nmi_owner; - lapic_nmi_owner |= LAPIC_NMI_RESERVED; - spin_unlock(&lapic_nmi_owner_lock); - if (old_owner & LAPIC_NMI_RESERVED) - return -EBUSY; - if (old_owner & LAPIC_NMI_WATCHDOG) - disable_lapic_nmi_watchdog(); - return 0; -} - -void release_lapic_nmi(void) -{ - unsigned int new_owner; - - spin_lock(&lapic_nmi_owner_lock); - new_owner = lapic_nmi_owner & ~LAPIC_NMI_RESERVED; - lapic_nmi_owner = new_owner; - spin_unlock(&lapic_nmi_owner_lock); - if (new_owner & LAPIC_NMI_WATCHDOG) - enable_lapic_nmi_watchdog(); -} - void disable_timer_nmi_watchdog(void) { BUG_ON(nmi_watchdog != NMI_IO_APIC); @@ -763,13 +722,6 @@ done: return rc; } -static __kprobes int dummy_nmi_callback(struct pt_regs * regs, int cpu) -{ - return 0; -} - -static nmi_callback_t nmi_callback = dummy_nmi_callback; - asmlinkage __kprobes void do_nmi(struct pt_regs * regs, long error_code) { nmi_enter(); @@ -780,18 +732,11 @@ asmlinkage __kprobes void do_nmi(struct int do_nmi_callback(struct pt_regs * regs, int cpu) { - return rcu_dereference(nmi_callback)(regs, cpu); -} - -void set_nmi_callback(nmi_callback_t callback) -{ - vmalloc_sync_all(); - rcu_assign_pointer(nmi_callback, callback); -} - -void unset_nmi_callback(void) -{ - nmi_callback = dummy_nmi_callback; +#ifdef CONFIG_SYSCTL + if (unknown_nmi_panic) + return unknown_nmi_panic_callback(regs, cpu); +#endif + return 0; } #ifdef CONFIG_SYSCTL @@ -801,37 +746,8 @@ static int unknown_nmi_panic_callback(st unsigned char reason = get_nmi_reason(); char buf[64]; - if (!(reason & 0xc0)) { - sprintf(buf, "NMI received for unknown reason %02x\n", reason); - die_nmi(buf,regs); - } - return 0; -} - -/* - * proc handler for /proc/sys/kernel/unknown_nmi_panic - */ -int proc_unknown_nmi_panic(struct ctl_table *table, int write, struct file *file, - void __user *buffer, size_t *length, loff_t *ppos) -{ - int old_state; - - old_state = unknown_nmi_panic; - proc_dointvec(table, write, file, buffer, length, ppos); - if (!!old_state == !!unknown_nmi_panic) - return 0; - - if (unknown_nmi_panic) { - if (reserve_lapic_nmi() < 0) { - unknown_nmi_panic = 0; - return -EBUSY; - } else { - set_nmi_callback(unknown_nmi_panic_callback); - } - } else { - release_lapic_nmi(); - unset_nmi_callback(); - } + sprintf(buf, "NMI received for unknown reason %02x\n", reason); + die_nmi(buf,regs); return 0; } @@ -845,8 +761,6 @@ EXPORT_SYMBOL(reserve_perfctr_nmi); EXPORT_SYMBOL(release_perfctr_nmi); EXPORT_SYMBOL(reserve_evntsel_nmi); EXPORT_SYMBOL(release_evntsel_nmi); -EXPORT_SYMBOL(reserve_lapic_nmi); -EXPORT_SYMBOL(release_lapic_nmi); EXPORT_SYMBOL(disable_timer_nmi_watchdog); EXPORT_SYMBOL(enable_timer_nmi_watchdog); EXPORT_SYMBOL(touch_nmi_watchdog); Index: linux-don/arch/x86_64/kernel/x8664_ksyms.c =================================================================== --- linux-don.orig/arch/x86_64/kernel/x8664_ksyms.c +++ linux-don/arch/x86_64/kernel/x8664_ksyms.c @@ -104,9 +104,6 @@ EXPORT_SYMBOL(screen_info); EXPORT_SYMBOL(rtc_lock); -EXPORT_SYMBOL_GPL(set_nmi_callback); -EXPORT_SYMBOL_GPL(unset_nmi_callback); - /* Export string functions. We normally rely on gcc builtin for most of these, but gcc sometimes decides not to inline them. */ #undef memcpy Index: linux-don/include/asm-x86_64/nmi.h =================================================================== --- linux-don.orig/include/asm-x86_64/nmi.h +++ linux-don/include/asm-x86_64/nmi.h @@ -7,25 +7,6 @@ #include <linux/pm.h> #include <asm/io.h> -struct pt_regs; - -typedef int (*nmi_callback_t)(struct pt_regs * regs, int cpu); - -/** - * set_nmi_callback - * - * Set a handler for an NMI. Only one handler may be - * set. Return 1 if the NMI was handled. - */ -void set_nmi_callback(nmi_callback_t callback); - -/** - * unset_nmi_callback - * - * Remove the handler previously set. - */ -void unset_nmi_callback(void); - /** * do_nmi_callback * @@ -72,8 +53,6 @@ extern int reserve_evntsel_nmi(unsigned extern void release_evntsel_nmi(unsigned int); extern void setup_apic_nmi_watchdog (void *); -extern int reserve_lapic_nmi(void); -extern void release_lapic_nmi(void); extern void disable_timer_nmi_watchdog(void); extern void enable_timer_nmi_watchdog(void); extern int nmi_watchdog_tick (struct pt_regs * regs, unsigned reason); Index: linux-don/arch/i386/oprofile/nmi_int.c =================================================================== --- linux-don.orig/arch/i386/oprofile/nmi_int.c +++ linux-don/arch/i386/oprofile/nmi_int.c @@ -16,14 +16,15 @@ #include <asm/nmi.h> #include <asm/msr.h> #include <asm/apic.h> +#include <asm/kdebug.h> #include "op_counter.h" #include "op_x86_model.h" - + static struct op_x86_model_spec const * model; static struct op_msrs cpu_msrs[NR_CPUS]; static unsigned long saved_lvtpc[NR_CPUS]; - + static int nmi_start(void); static void nmi_stop(void); @@ -81,13 +82,24 @@ static void exit_driverfs(void) #define exit_driverfs() do { } while (0) #endif /* CONFIG_PM */ - -static int nmi_callback(struct pt_regs * regs, int cpu) +int profile_exceptions_notify(struct notifier_block *self, + unsigned long val, void *data) { - return model->check_ctrs(regs, &cpu_msrs[cpu]); + struct die_args *args = (struct die_args *)data; + int ret = NOTIFY_DONE; + int cpu = smp_processor_id(); + + switch(val) { + case DIE_NMI: + if (model->check_ctrs(args->regs, &cpu_msrs[cpu])) + ret = NOTIFY_STOP; + break; + default: + break; + } + return ret; } - - + static void nmi_cpu_save_registers(struct op_msrs * msrs) { unsigned int const nr_ctrs = model->num_counters; @@ -173,27 +185,29 @@ static void nmi_cpu_setup(void * dummy) apic_write(APIC_LVTPC, APIC_DM_NMI); } +static struct notifier_block profile_exceptions_nb = { + .notifier_call = profile_exceptions_notify, + .next = NULL, + .priority = 0 +}; static int nmi_setup(void) { + int err=0; + if (!allocate_msrs()) return -ENOMEM; - /* We walk a thin line between law and rape here. - * We need to be careful to install our NMI handler - * without actually triggering any NMIs as this will - * break the core code horrifically. - */ - if (reserve_lapic_nmi() < 0) { + if ((err = register_die_notifier(&profile_exceptions_nb))){ free_msrs(); - return -EBUSY; + return err; } + /* We need to serialize save and setup for HT because the subset * of msrs are distinct for save and setup operations */ on_each_cpu(nmi_save_registers, NULL, 0, 1); on_each_cpu(nmi_cpu_setup, NULL, 0, 1); - set_nmi_callback(nmi_callback); nmi_enabled = 1; return 0; } @@ -249,8 +263,7 @@ static void nmi_shutdown(void) { nmi_enabled = 0; on_each_cpu(nmi_cpu_shutdown, NULL, 0, 1); - unset_nmi_callback(); - release_lapic_nmi(); + unregister_die_notifier(&profile_exceptions_nb); free_msrs(); } Index: linux-don/arch/i386/oprofile/nmi_timer_int.c =================================================================== --- linux-don.orig/arch/i386/oprofile/nmi_timer_int.c +++ linux-don/arch/i386/oprofile/nmi_timer_int.c @@ -17,32 +17,49 @@ #include <asm/nmi.h> #include <asm/apic.h> #include <asm/ptrace.h> +#include <asm/kdebug.h> -static int nmi_timer_callback(struct pt_regs * regs, int cpu) +int profile_timer_exceptions_notify(struct notifier_block *self, + unsigned long val, void *data) { - oprofile_add_sample(regs, 0); - return 1; + struct die_args *args = (struct die_args *)data; + int ret = NOTIFY_DONE; + + switch(val) { + case DIE_NMI: + oprofile_add_sample(args->regs, 0); + ret = NOTIFY_STOP; + break; + default: + break; + } + return ret; } +static struct notifier_block profile_timer_exceptions_nb = { + .notifier_call = profile_timer_exceptions_notify, + .next = NULL, + .priority = 0 +}; + static int timer_start(void) { - disable_timer_nmi_watchdog(); - set_nmi_callback(nmi_timer_callback); + if (register_die_notifier(&profile_timer_exceptions_nb)) + return 1; return 0; } static void timer_stop(void) { - enable_timer_nmi_watchdog(); - unset_nmi_callback(); + unregister_die_notifier(&profile_timer_exceptions_nb); synchronize_sched(); /* Allow already-started NMIs to complete. */ } int __init op_nmi_timer_init(struct oprofile_operations * ops) { - if (atomic_read(&nmi_active) <= 0) + if ((nmi_watchdog != NMI_IO_APIC) || (atomic_read(&nmi_active) <= 0)) return -ENODEV; ops->start = timer_start; Index: linux-don/arch/i386/kernel/nmi.c =================================================================== --- linux-don.orig/arch/i386/kernel/nmi.c +++ linux-don/arch/i386/kernel/nmi.c @@ -42,20 +42,6 @@ static DEFINE_PER_CPU(unsigned long, evn */ #define NMI_MAX_COUNTER_BITS 66 -/* - * lapic_nmi_owner tracks the ownership of the lapic NMI hardware: - * - it may be reserved by some other driver, or not - * - when not reserved by some other driver, it may be used for - * the NMI watchdog, or not - * - * This is maintained separately from nmi_active because the NMI - * watchdog may also be driven from the I/O APIC timer. - */ -static DEFINE_SPINLOCK(lapic_nmi_owner_lock); -static unsigned int lapic_nmi_owner; -#define LAPIC_NMI_WATCHDOG (1<<0) -#define LAPIC_NMI_RESERVED (1<<1) - /* nmi_active: * >0: the lapic NMI watchdog is active, but can be disabled * <0: the lapic NMI watchdog has not been set up, and cannot @@ -325,33 +311,6 @@ static void enable_lapic_nmi_watchdog(vo touch_nmi_watchdog(); } -int reserve_lapic_nmi(void) -{ - unsigned int old_owner; - - spin_lock(&lapic_nmi_owner_lock); - old_owner = lapic_nmi_owner; - lapic_nmi_owner |= LAPIC_NMI_RESERVED; - spin_unlock(&lapic_nmi_owner_lock); - if (old_owner & LAPIC_NMI_RESERVED) - return -EBUSY; - if (old_owner & LAPIC_NMI_WATCHDOG) - disable_lapic_nmi_watchdog(); - return 0; -} - -void release_lapic_nmi(void) -{ - unsigned int new_owner; - - spin_lock(&lapic_nmi_owner_lock); - new_owner = lapic_nmi_owner & ~LAPIC_NMI_RESERVED; - lapic_nmi_owner = new_owner; - spin_unlock(&lapic_nmi_owner_lock); - if (new_owner & LAPIC_NMI_WATCHDOG) - enable_lapic_nmi_watchdog(); -} - void disable_timer_nmi_watchdog(void) { BUG_ON(nmi_watchdog != NMI_IO_APIC); @@ -865,6 +824,15 @@ done: return rc; } +int do_nmi_callback(struct pt_regs * regs, int cpu) +{ +#ifdef CONFIG_SYSCTL + if (unknown_nmi_panic) + return unknown_nmi_panic_callback(regs, cpu); +#endif + return 0; +} + #ifdef CONFIG_SYSCTL static int unknown_nmi_panic_callback(struct pt_regs *regs, int cpu) @@ -872,37 +840,8 @@ static int unknown_nmi_panic_callback(st unsigned char reason = get_nmi_reason(); char buf[64]; - if (!(reason & 0xc0)) { - sprintf(buf, "NMI received for unknown reason %02x\n", reason); - die_nmi(regs, buf); - } - return 0; -} - -/* - * proc handler for /proc/sys/kernel/unknown_nmi_panic - */ -int proc_unknown_nmi_panic(ctl_table *table, int write, struct file *file, - void __user *buffer, size_t *length, loff_t *ppos) -{ - int old_state; - - old_state = unknown_nmi_panic; - proc_dointvec(table, write, file, buffer, length, ppos); - if (!!old_state == !!unknown_nmi_panic) - return 0; - - if (unknown_nmi_panic) { - if (reserve_lapic_nmi() < 0) { - unknown_nmi_panic = 0; - return -EBUSY; - } else { - set_nmi_callback(unknown_nmi_panic_callback); - } - } else { - release_lapic_nmi(); - unset_nmi_callback(); - } + sprintf(buf, "NMI received for unknown reason %02x\n", reason); + die_nmi(regs, buf); return 0; } @@ -916,7 +855,5 @@ EXPORT_SYMBOL(reserve_perfctr_nmi); EXPORT_SYMBOL(release_perfctr_nmi); EXPORT_SYMBOL(reserve_evntsel_nmi); EXPORT_SYMBOL(release_evntsel_nmi); -EXPORT_SYMBOL(reserve_lapic_nmi); -EXPORT_SYMBOL(release_lapic_nmi); EXPORT_SYMBOL(disable_timer_nmi_watchdog); EXPORT_SYMBOL(enable_timer_nmi_watchdog); Index: linux-don/arch/i386/kernel/traps.c =================================================================== --- linux-don.orig/arch/i386/kernel/traps.c +++ linux-don/arch/i386/kernel/traps.c @@ -673,13 +673,6 @@ void die_nmi (struct pt_regs *regs, cons do_exit(SIGSEGV); } -static int dummy_nmi_callback(struct pt_regs * regs, int cpu) -{ - return 0; -} - -static nmi_callback_t nmi_callback = dummy_nmi_callback; - static void default_do_nmi(struct pt_regs * regs) { unsigned char reason = 0; @@ -699,9 +692,10 @@ static void default_do_nmi(struct pt_reg */ if (nmi_watchdog_tick(regs, reason)) return; + if (!do_nmi_callback(regs, smp_processor_id())) #endif - if (!rcu_dereference(nmi_callback)(regs, smp_processor_id())) unknown_nmi_error(reason, regs); + return; } if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT) == NOTIFY_STOP) @@ -732,19 +726,6 @@ fastcall void do_nmi(struct pt_regs * re nmi_exit(); } -void set_nmi_callback(nmi_callback_t callback) -{ - vmalloc_sync_all(); - rcu_assign_pointer(nmi_callback, callback); -} -EXPORT_SYMBOL_GPL(set_nmi_callback); - -void unset_nmi_callback(void) -{ - nmi_callback = dummy_nmi_callback; -} -EXPORT_SYMBOL_GPL(unset_nmi_callback); - #ifdef CONFIG_KPROBES fastcall void __kprobes do_int3(struct pt_regs *regs, long error_code) { Index: linux-don/include/asm-i386/nmi.h =================================================================== --- linux-don.orig/include/asm-i386/nmi.h +++ linux-don/include/asm-i386/nmi.h @@ -6,24 +6,13 @@ #include <linux/pm.h> -struct pt_regs; - -typedef int (*nmi_callback_t)(struct pt_regs * regs, int cpu); - -/** - * set_nmi_callback - * - * Set a handler for an NMI. Only one handler may be - * set. Return 1 if the NMI was handled. - */ -void set_nmi_callback(nmi_callback_t callback); - /** - * unset_nmi_callback + * do_nmi_callback * - * Remove the handler previously set. + * Check to see if a callback exists and execute it. Return 1 + * if the handler exists and was handled successfully. */ -void unset_nmi_callback(void); +int do_nmi_callback(struct pt_regs *regs, int cpu); extern int avail_to_resrv_perfctr_nmi_bit(unsigned int); extern int avail_to_resrv_perfctr_nmi(unsigned int); @@ -33,8 +22,6 @@ extern int reserve_evntsel_nmi(unsigned extern void release_evntsel_nmi(unsigned int); extern void setup_apic_nmi_watchdog (void *); -extern int reserve_lapic_nmi(void); -extern void release_lapic_nmi(void); extern void disable_timer_nmi_watchdog(void); extern void enable_timer_nmi_watchdog(void); extern int nmi_watchdog_tick (struct pt_regs * regs, unsigned reason); Index: linux-don/kernel/sysctl.c =================================================================== --- linux-don.orig/kernel/sysctl.c +++ linux-don/kernel/sysctl.c @@ -75,8 +75,6 @@ extern int percpu_pagelist_fraction; #if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86) int unknown_nmi_panic; -extern int proc_unknown_nmi_panic(ctl_table *, int, struct file *, - void __user *, size_t *, loff_t *); #endif /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */ @@ -630,7 +628,7 @@ static ctl_table kern_table[] = { .data = &unknown_nmi_panic, .maxlen = sizeof (int), .mode = 0644, - .proc_handler = &proc_unknown_nmi_panic, + .proc_handler = &proc_dointvec, }, #endif #if defined(CONFIG_X86) -- |
From: dzickus <dz...@re...> - 2006-05-09 20:57:10
|
Misc header cleanup for nmi watchdog. Signed-off-by: Don Zickus <dz...@re...> Index: linux-don/arch/i386/kernel/apic.c =================================================================== --- linux-don.orig/arch/i386/kernel/apic.c +++ linux-don/arch/i386/kernel/apic.c @@ -36,6 +36,7 @@ #include <asm/arch_hooks.h> #include <asm/hpet.h> #include <asm/i8253.h> +#include <asm/nmi.h> #include <mach_apic.h> #include <mach_apicdef.h> Index: linux-don/arch/i386/kernel/io_apic.c =================================================================== --- linux-don.orig/arch/i386/kernel/io_apic.c +++ linux-don/arch/i386/kernel/io_apic.c @@ -38,6 +38,7 @@ #include <asm/desc.h> #include <asm/timer.h> #include <asm/i8259.h> +#include <asm/nmi.h> #include <mach_apic.h> Index: linux-don/arch/i386/kernel/nmi.c =================================================================== --- linux-don.orig/arch/i386/kernel/nmi.c +++ linux-don/arch/i386/kernel/nmi.c @@ -14,20 +14,15 @@ */ #include <linux/config.h> -#include <linux/mm.h> #include <linux/delay.h> -#include <linux/bootmem.h> -#include <linux/smp_lock.h> #include <linux/interrupt.h> -#include <linux/mc146818rtc.h> -#include <linux/kernel_stat.h> #include <linux/module.h> #include <linux/nmi.h> #include <linux/sysdev.h> #include <linux/sysctl.h> +#include <linux/percpu.h> #include <asm/smp.h> -#include <asm/div64.h> #include <asm/nmi.h> #include "mach_traps.h" Index: linux-don/arch/i386/kernel/smpboot.c =================================================================== --- linux-don.orig/arch/i386/kernel/smpboot.c +++ linux-don/arch/i386/kernel/smpboot.c @@ -52,6 +52,7 @@ #include <asm/tlbflush.h> #include <asm/desc.h> #include <asm/arch_hooks.h> +#include <asm/nmi.h> #include <mach_apic.h> #include <mach_wakecpu.h> Index: linux-don/arch/i386/oprofile/op_model_athlon.c =================================================================== --- linux-don.orig/arch/i386/oprofile/op_model_athlon.c +++ linux-don/arch/i386/oprofile/op_model_athlon.c @@ -13,6 +13,7 @@ #include <linux/oprofile.h> #include <asm/ptrace.h> #include <asm/msr.h> +#include <asm/nmi.h> #include "op_x86_model.h" #include "op_counter.h" Index: linux-don/arch/i386/oprofile/op_model_p4.c =================================================================== --- linux-don.orig/arch/i386/oprofile/op_model_p4.c +++ linux-don/arch/i386/oprofile/op_model_p4.c @@ -14,6 +14,7 @@ #include <asm/ptrace.h> #include <asm/fixmap.h> #include <asm/apic.h> +#include <asm/nmi.h> #include "op_x86_model.h" #include "op_counter.h" Index: linux-don/arch/i386/oprofile/op_model_ppro.c =================================================================== --- linux-don.orig/arch/i386/oprofile/op_model_ppro.c +++ linux-don/arch/i386/oprofile/op_model_ppro.c @@ -14,6 +14,7 @@ #include <asm/ptrace.h> #include <asm/msr.h> #include <asm/apic.h> +#include <asm/nmi.h> #include "op_x86_model.h" #include "op_counter.h" Index: linux-don/arch/x86_64/kernel/io_apic.c =================================================================== --- linux-don.orig/arch/x86_64/kernel/io_apic.c +++ linux-don/arch/x86_64/kernel/io_apic.c @@ -41,6 +41,7 @@ #include <asm/mach_apic.h> #include <asm/acpi.h> #include <asm/dma.h> +#include <asm/nmi.h> #define __apicdebuginit __init Index: linux-don/include/asm-i386/apic.h =================================================================== --- linux-don.orig/include/asm-i386/apic.h +++ linux-don/include/asm-i386/apic.h @@ -112,24 +112,12 @@ extern void init_apic_mappings (void); extern void smp_local_timer_interrupt (struct pt_regs * regs); extern void setup_boot_APIC_clock (void); extern void setup_secondary_APIC_clock (void); -extern void setup_apic_nmi_watchdog (void); -extern int reserve_lapic_nmi(void); -extern void release_lapic_nmi(void); -extern void disable_timer_nmi_watchdog(void); -extern void enable_timer_nmi_watchdog(void); -extern void nmi_watchdog_tick (struct pt_regs * regs); extern int APIC_init_uniprocessor (void); extern void disable_APIC_timer(void); extern void enable_APIC_timer(void); extern void enable_NMI_through_LVT0 (void * dummy); -extern unsigned int nmi_watchdog; -#define NMI_NONE 0 -#define NMI_IO_APIC 1 -#define NMI_LOCAL_APIC 2 -#define NMI_INVALID 3 - extern int disable_timer_pin_1; void smp_send_timer_broadcast_ipi(struct pt_regs *regs); Index: linux-don/include/asm-i386/nmi.h =================================================================== --- linux-don.orig/include/asm-i386/nmi.h +++ linux-don/include/asm-i386/nmi.h @@ -5,24 +5,38 @@ #define ASM_NMI_H #include <linux/pm.h> - + struct pt_regs; - + typedef int (*nmi_callback_t)(struct pt_regs * regs, int cpu); - -/** + +/** * set_nmi_callback * * Set a handler for an NMI. Only one handler may be * set. Return 1 if the NMI was handled. */ void set_nmi_callback(nmi_callback_t callback); - -/** + +/** * unset_nmi_callback * * Remove the handler previously set. */ void unset_nmi_callback(void); - + +extern void setup_apic_nmi_watchdog (void); +extern int reserve_lapic_nmi(void); +extern void release_lapic_nmi(void); +extern void disable_timer_nmi_watchdog(void); +extern void enable_timer_nmi_watchdog(void); +extern void nmi_watchdog_tick (struct pt_regs * regs); + +extern unsigned int nmi_watchdog; +#define NMI_DEFAULT -1 +#define NMI_NONE 0 +#define NMI_IO_APIC 1 +#define NMI_LOCAL_APIC 2 +#define NMI_INVALID 3 + #endif /* ASM_NMI_H */ Index: linux-don/include/asm-x86_64/apic.h =================================================================== --- linux-don.orig/include/asm-x86_64/apic.h +++ linux-don/include/asm-x86_64/apic.h @@ -80,27 +80,11 @@ extern void init_apic_mappings (void); extern void smp_local_timer_interrupt (struct pt_regs * regs); extern void setup_boot_APIC_clock (void); extern void setup_secondary_APIC_clock (void); -extern void setup_apic_nmi_watchdog (void); -extern int reserve_lapic_nmi(void); -extern void release_lapic_nmi(void); -extern void disable_timer_nmi_watchdog(void); -extern void enable_timer_nmi_watchdog(void); -extern void nmi_watchdog_tick (struct pt_regs * regs, unsigned reason); extern int APIC_init_uniprocessor (void); extern void disable_APIC_timer(void); extern void enable_APIC_timer(void); extern void clustered_apic_check(void); -extern void nmi_watchdog_default(void); -extern int setup_nmi_watchdog(char *); - -extern unsigned int nmi_watchdog; -#define NMI_DEFAULT -1 -#define NMI_NONE 0 -#define NMI_IO_APIC 1 -#define NMI_LOCAL_APIC 2 -#define NMI_INVALID 3 - extern int disable_timer_pin_1; extern void setup_threshold_lvt(unsigned long lvt_off); Index: linux-don/include/asm-x86_64/nmi.h =================================================================== --- linux-don.orig/include/asm-x86_64/nmi.h +++ linux-don/include/asm-x86_64/nmi.h @@ -5,26 +5,27 @@ #define ASM_NMI_H #include <linux/pm.h> +#include <asm/io.h> struct pt_regs; - + typedef int (*nmi_callback_t)(struct pt_regs * regs, int cpu); - -/** + +/** * set_nmi_callback * * Set a handler for an NMI. Only one handler may be * set. Return 1 if the NMI was handled. */ void set_nmi_callback(nmi_callback_t callback); - -/** + +/** * unset_nmi_callback * * Remove the handler previously set. */ void unset_nmi_callback(void); - + #ifdef CONFIG_PM /** Replace the PM callback routine for NMI. */ @@ -56,4 +57,21 @@ extern int unknown_nmi_panic; extern int check_nmi_watchdog(void); +extern void setup_apic_nmi_watchdog (void); +extern int reserve_lapic_nmi(void); +extern void release_lapic_nmi(void); +extern void disable_timer_nmi_watchdog(void); +extern void enable_timer_nmi_watchdog(void); +extern void nmi_watchdog_tick (struct pt_regs * regs, unsigned reason); + +extern void nmi_watchdog_default(void); +extern int setup_nmi_watchdog(char *); + +extern unsigned int nmi_watchdog; +#define NMI_DEFAULT -1 +#define NMI_NONE 0 +#define NMI_IO_APIC 1 +#define NMI_LOCAL_APIC 2 +#define NMI_INVALID 3 + #endif /* ASM_NMI_H */ Index: linux-don/arch/x86_64/kernel/nmi.c =================================================================== --- linux-don.orig/arch/x86_64/kernel/nmi.c +++ linux-don/arch/x86_64/kernel/nmi.c @@ -15,11 +15,7 @@ #include <linux/config.h> #include <linux/mm.h> #include <linux/delay.h> -#include <linux/bootmem.h> -#include <linux/smp_lock.h> #include <linux/interrupt.h> -#include <linux/mc146818rtc.h> -#include <linux/kernel_stat.h> #include <linux/module.h> #include <linux/sysdev.h> #include <linux/nmi.h> @@ -27,13 +23,9 @@ #include <linux/kprobes.h> #include <asm/smp.h> -#include <asm/mtrr.h> -#include <asm/mpspec.h> #include <asm/nmi.h> -#include <asm/msr.h> #include <asm/proto.h> #include <asm/kdebug.h> -#include <asm/local.h> #include <asm/mce.h> /* -- |
From: dzickus <dz...@re...> - 2006-05-09 20:57:10
|
This patch cleans up the NMI interrupt path. Instead of being gated by if the 'nmi callback' is set, the interrupt handler now calls everyone who is registered on the die_chain and additionally checks the nmi watchdog, reseting it if enabled. This allows more subsystems to hook into the NMI if they need to (without being block by set_nmi_callback). Signed-off-by: Don Zickus <dz...@re...> Index: linux-don/arch/i386/kernel/nmi.c =================================================================== --- linux-don.orig/arch/i386/kernel/nmi.c +++ linux-don/arch/i386/kernel/nmi.c @@ -780,7 +780,7 @@ void touch_nmi_watchdog (void) extern void die_nmi(struct pt_regs *, const char *msg); -void nmi_watchdog_tick (struct pt_regs * regs, unsigned reason) +int nmi_watchdog_tick (struct pt_regs * regs, unsigned reason) { /* @@ -793,10 +793,12 @@ void nmi_watchdog_tick (struct pt_regs * int cpu = smp_processor_id(); struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk); u64 dummy; + int rc=0; /* check for other users first */ if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT) == NOTIFY_STOP) { + rc = 1; touched = 1; } @@ -849,10 +851,18 @@ void nmi_watchdog_tick (struct pt_regs * } /* start the cycle over again */ write_watchdog_counter(wd->perfctr_msr, NULL); - } + rc = 1; + } else if (nmi_watchdog == NMI_IO_APIC) { + /* don't know how to accurately check for this. + * just assume it was a watchdog timer interrupt + * This matches the old behaviour. + */ + rc = 1; + } else + printk(KERN_WARNING "Unknown enabled NMI hardware?!\n"); } done: - return; + return rc; } #ifdef CONFIG_SYSCTL Index: linux-don/arch/i386/kernel/traps.c =================================================================== --- linux-don.orig/arch/i386/kernel/traps.c +++ linux-don/arch/i386/kernel/traps.c @@ -673,6 +673,13 @@ void die_nmi (struct pt_regs *regs, cons do_exit(SIGSEGV); } +static int dummy_nmi_callback(struct pt_regs * regs, int cpu) +{ + return 0; +} + +static nmi_callback_t nmi_callback = dummy_nmi_callback; + static void default_do_nmi(struct pt_regs * regs) { unsigned char reason = 0; @@ -690,12 +697,11 @@ static void default_do_nmi(struct pt_reg * Ok, so this is none of the documented NMI sources, * so it must be the NMI watchdog. */ - if (nmi_watchdog) { - nmi_watchdog_tick(regs, reason); + if (nmi_watchdog_tick(regs, reason)) return; - } #endif - unknown_nmi_error(reason, regs); + if (!rcu_dereference(nmi_callback)(regs, smp_processor_id())) + unknown_nmi_error(reason, regs); return; } if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT) == NOTIFY_STOP) @@ -711,13 +717,6 @@ static void default_do_nmi(struct pt_reg reassert_nmi(); } -static int dummy_nmi_callback(struct pt_regs * regs, int cpu) -{ - return 0; -} - -static nmi_callback_t nmi_callback = dummy_nmi_callback; - fastcall void do_nmi(struct pt_regs * regs, long error_code) { int cpu; @@ -728,8 +727,7 @@ fastcall void do_nmi(struct pt_regs * re ++nmi_count(cpu); - if (!rcu_dereference(nmi_callback)(regs, cpu)) - default_do_nmi(regs); + default_do_nmi(regs); nmi_exit(); } Index: linux-don/arch/x86_64/kernel/nmi.c =================================================================== --- linux-don.orig/arch/x86_64/kernel/nmi.c +++ linux-don/arch/x86_64/kernel/nmi.c @@ -683,16 +683,18 @@ void touch_nmi_watchdog (void) touch_softlockup_watchdog(); } -void __kprobes nmi_watchdog_tick(struct pt_regs * regs, unsigned reason) +int __kprobes nmi_watchdog_tick(struct pt_regs * regs, unsigned reason) { int sum; int touched = 0; struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk); u64 dummy; + int rc=0; /* check for other users first */ if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT) == NOTIFY_STOP) { + rc = 1; touched = 1; } @@ -747,10 +749,18 @@ void __kprobes nmi_watchdog_tick(struct } /* start the cycle over again */ wrmsrl(wd->perfctr_msr, -((u64)cpu_khz * 1000 / nmi_hz)); - } + rc = 1; + } else if (nmi_watchdog == NMI_IO_APIC) { + /* don't know how to accurately check for this. + * just assume it was a watchdog timer interrupt + * This matches the old behaviour. + */ + rc = 1; + } else + printk(KERN_WARNING "Unknown enabled NMI hardware?!\n"); } done: - return; + return rc; } static __kprobes int dummy_nmi_callback(struct pt_regs * regs, int cpu) @@ -762,15 +772,17 @@ static nmi_callback_t nmi_callback = dum asmlinkage __kprobes void do_nmi(struct pt_regs * regs, long error_code) { - int cpu = safe_smp_processor_id(); - nmi_enter(); add_pda(__nmi_count,1); - if (!rcu_dereference(nmi_callback)(regs, cpu)) - default_do_nmi(regs); + default_do_nmi(regs); nmi_exit(); } +int do_nmi_callback(struct pt_regs * regs, int cpu) +{ + return rcu_dereference(nmi_callback)(regs, cpu); +} + void set_nmi_callback(nmi_callback_t callback) { vmalloc_sync_all(); Index: linux-don/arch/x86_64/kernel/traps.c =================================================================== --- linux-don.orig/arch/x86_64/kernel/traps.c +++ linux-don/arch/x86_64/kernel/traps.c @@ -657,12 +657,12 @@ asmlinkage __kprobes void default_do_nmi * Ok, so this is none of the documented NMI sources, * so it must be the NMI watchdog. */ - if (nmi_watchdog > 0) { - nmi_watchdog_tick(regs,reason); + if (nmi_watchdog_tick(regs,reason)) return; - } + if (!do_nmi_callback(regs,cpu)) #endif - unknown_nmi_error(reason, regs); + unknown_nmi_error(reason, regs); + return; } if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT) == NOTIFY_STOP) Index: linux-don/include/asm-i386/nmi.h =================================================================== --- linux-don.orig/include/asm-i386/nmi.h +++ linux-don/include/asm-i386/nmi.h @@ -37,7 +37,7 @@ extern int reserve_lapic_nmi(void); extern void release_lapic_nmi(void); extern void disable_timer_nmi_watchdog(void); extern void enable_timer_nmi_watchdog(void); -extern void nmi_watchdog_tick (struct pt_regs * regs, unsigned reason); +extern int nmi_watchdog_tick (struct pt_regs * regs, unsigned reason); extern atomic_t nmi_active; extern unsigned int nmi_watchdog; Index: linux-don/include/asm-x86_64/nmi.h =================================================================== --- linux-don.orig/include/asm-x86_64/nmi.h +++ linux-don/include/asm-x86_64/nmi.h @@ -26,6 +26,14 @@ void set_nmi_callback(nmi_callback_t cal */ void unset_nmi_callback(void); +/** + * do_nmi_callback + * + * Check to see if a callback exists and execute it. Return 1 + * if the handler exists and was handled successfully. + */ +int do_nmi_callback(struct pt_regs *regs, int cpu); + #ifdef CONFIG_PM /** Replace the PM callback routine for NMI. */ @@ -68,7 +76,7 @@ extern int reserve_lapic_nmi(void); extern void release_lapic_nmi(void); extern void disable_timer_nmi_watchdog(void); extern void enable_timer_nmi_watchdog(void); -extern void nmi_watchdog_tick (struct pt_regs * regs, unsigned reason); +extern int nmi_watchdog_tick (struct pt_regs * regs, unsigned reason); extern void nmi_watchdog_default(void); extern int setup_nmi_watchdog(char *); -- |
From: dzickus <dz...@re...> - 2006-05-09 20:57:12
|
Incorporates the new performance counter reservation system in oprofile. Also cleans up a lot of the initialization code. The code original zero'd out every register associated with performance counters regardless if those registers were used or not. This causes issues with the nmi watchdog. Signed-off-by: Don Zickus <dz...@re... Index: linux-don/arch/i386/oprofile/nmi_int.c =================================================================== --- linux-don.orig/arch/i386/oprofile/nmi_int.c +++ linux-don/arch/i386/oprofile/nmi_int.c @@ -97,15 +97,19 @@ static void nmi_cpu_save_registers(struc unsigned int i; for (i = 0; i < nr_ctrs; ++i) { - rdmsr(counters[i].addr, - counters[i].saved.low, - counters[i].saved.high); + if (counters[i].addr){ + rdmsr(counters[i].addr, + counters[i].saved.low, + counters[i].saved.high); + } } for (i = 0; i < nr_ctrls; ++i) { - rdmsr(controls[i].addr, - controls[i].saved.low, - controls[i].saved.high); + if (controls[i].addr){ + rdmsr(controls[i].addr, + controls[i].saved.low, + controls[i].saved.high); + } } } @@ -204,15 +208,19 @@ static void nmi_restore_registers(struct unsigned int i; for (i = 0; i < nr_ctrls; ++i) { - wrmsr(controls[i].addr, - controls[i].saved.low, - controls[i].saved.high); + if (controls[i].addr){ + wrmsr(controls[i].addr, + controls[i].saved.low, + controls[i].saved.high); + } } for (i = 0; i < nr_ctrs; ++i) { - wrmsr(counters[i].addr, - counters[i].saved.low, - counters[i].saved.high); + if (counters[i].addr){ + wrmsr(counters[i].addr, + counters[i].saved.low, + counters[i].saved.high); + } } } @@ -233,6 +241,7 @@ static void nmi_cpu_shutdown(void * dumm apic_write(APIC_LVTPC, saved_lvtpc[cpu]); apic_write(APIC_LVTERR, v); nmi_restore_registers(msrs); + model->shutdown(msrs); } @@ -283,6 +292,14 @@ static int nmi_create_files(struct super struct dentry * dir; char buf[2]; + /* quick little hack to _not_ expose a counter if it is not + * available for use. This should protect userspace app. + * NOTE: assumes 1:1 mapping here (that counters are organized + * sequentially in their struct assignment). + */ + if (unlikely(!avail_to_resrv_perfctr_nmi_bit(i))) + continue; + snprintf(buf, 2, "%d", i); dir = oprofilefs_mkdir(sb, root, buf); oprofilefs_create_ulong(sb, dir, "enabled", &counter_config[i].enabled); Index: linux-don/arch/i386/oprofile/op_model_athlon.c =================================================================== --- linux-don.orig/arch/i386/oprofile/op_model_athlon.c +++ linux-don/arch/i386/oprofile/op_model_athlon.c @@ -21,10 +21,12 @@ #define NUM_COUNTERS 4 #define NUM_CONTROLS 4 +#define CTR_IS_RESERVED(msrs,c) (msrs->counters[(c)].addr ? 1 : 0) #define CTR_READ(l,h,msrs,c) do {rdmsr(msrs->counters[(c)].addr, (l), (h));} while (0) #define CTR_WRITE(l,msrs,c) do {wrmsr(msrs->counters[(c)].addr, -(unsigned int)(l), -1);} while (0) #define CTR_OVERFLOWED(n) (!((n) & (1U<<31))) +#define CTRL_IS_RESERVED(msrs,c) (msrs->controls[(c)].addr ? 1 : 0) #define CTRL_READ(l,h,msrs,c) do {rdmsr(msrs->controls[(c)].addr, (l), (h));} while (0) #define CTRL_WRITE(l,h,msrs,c) do {wrmsr(msrs->controls[(c)].addr, (l), (h));} while (0) #define CTRL_SET_ACTIVE(n) (n |= (1<<22)) @@ -40,15 +42,21 @@ static unsigned long reset_value[NUM_COU static void athlon_fill_in_addresses(struct op_msrs * const msrs) { - msrs->counters[0].addr = MSR_K7_PERFCTR0; - msrs->counters[1].addr = MSR_K7_PERFCTR1; - msrs->counters[2].addr = MSR_K7_PERFCTR2; - msrs->counters[3].addr = MSR_K7_PERFCTR3; - - msrs->controls[0].addr = MSR_K7_EVNTSEL0; - msrs->controls[1].addr = MSR_K7_EVNTSEL1; - msrs->controls[2].addr = MSR_K7_EVNTSEL2; - msrs->controls[3].addr = MSR_K7_EVNTSEL3; + int i; + + for (i=0; i < NUM_COUNTERS; i++) { + if (reserve_perfctr_nmi(MSR_K7_PERFCTR0 + i)) + msrs->counters[i].addr = MSR_K7_PERFCTR0 + i; + else + msrs->counters[i].addr = 0; + } + + for (i=0; i < NUM_CONTROLS; i++) { + if (reserve_evntsel_nmi(MSR_K7_EVNTSEL0 + i)) + msrs->controls[i].addr = MSR_K7_EVNTSEL0 + i; + else + msrs->controls[i].addr = 0; + } } @@ -59,19 +67,23 @@ static void athlon_setup_ctrs(struct op_ /* clear all counters */ for (i = 0 ; i < NUM_CONTROLS; ++i) { + if (unlikely(!CTRL_IS_RESERVED(msrs,i))) + continue; CTRL_READ(low, high, msrs, i); CTRL_CLEAR(low); CTRL_WRITE(low, high, msrs, i); } - + /* avoid a false detection of ctr overflows in NMI handler */ for (i = 0; i < NUM_COUNTERS; ++i) { + if (unlikely(!CTR_IS_RESERVED(msrs,i))) + continue; CTR_WRITE(1, msrs, i); } /* enable active counters */ for (i = 0; i < NUM_COUNTERS; ++i) { - if (counter_config[i].enabled) { + if ((counter_config[i].enabled) && (CTR_IS_RESERVED(msrs,i))) { reset_value[i] = counter_config[i].count; CTR_WRITE(counter_config[i].count, msrs, i); @@ -98,6 +110,8 @@ static int athlon_check_ctrs(struct pt_r int i; for (i = 0 ; i < NUM_COUNTERS; ++i) { + if (!reset_value[i]) + continue; CTR_READ(low, high, msrs, i); if (CTR_OVERFLOWED(low)) { oprofile_add_sample(regs, i); @@ -132,12 +146,27 @@ static void athlon_stop(struct op_msrs c /* Subtle: stop on all counters to avoid race with * setting our pm callback */ for (i = 0 ; i < NUM_COUNTERS ; ++i) { + if (!reset_value[i]) + continue; CTRL_READ(low, high, msrs, i); CTRL_SET_INACTIVE(low); CTRL_WRITE(low, high, msrs, i); } } +static void athlon_shutdown(struct op_msrs const * const msrs) +{ + int i; + + for (i = 0 ; i < NUM_COUNTERS ; ++i) { + if (CTR_IS_RESERVED(msrs,i)) + release_perfctr_nmi(MSR_K7_PERFCTR0 + i); + } + for (i = 0 ; i < NUM_CONTROLS ; ++i) { + if (CTRL_IS_RESERVED(msrs,i)) + release_evntsel_nmi(MSR_K7_EVNTSEL0 + i); + } +} struct op_x86_model_spec const op_athlon_spec = { .num_counters = NUM_COUNTERS, @@ -146,5 +175,6 @@ struct op_x86_model_spec const op_athlon .setup_ctrs = &athlon_setup_ctrs, .check_ctrs = &athlon_check_ctrs, .start = &athlon_start, - .stop = &athlon_stop + .stop = &athlon_stop, + .shutdown = &athlon_shutdown }; Index: linux-don/arch/i386/oprofile/op_model_p4.c =================================================================== --- linux-don.orig/arch/i386/oprofile/op_model_p4.c +++ linux-don/arch/i386/oprofile/op_model_p4.c @@ -32,7 +32,7 @@ #define NUM_CONTROLS_HT2 (NUM_ESCRS_HT2 + NUM_CCCRS_HT2) static unsigned int num_counters = NUM_COUNTERS_NON_HT; - +static unsigned int num_controls = NUM_CONTROLS_NON_HT; /* this has to be checked dynamically since the hyper-threadedness of a chip is discovered at @@ -40,8 +40,10 @@ static unsigned int num_counters = NUM_C static inline void setup_num_counters(void) { #ifdef CONFIG_SMP - if (smp_num_siblings == 2) + if (smp_num_siblings == 2){ num_counters = NUM_COUNTERS_HT2; + num_controls = NUM_CONTROLS_HT2; + } #endif } @@ -97,15 +99,6 @@ static struct p4_counter_binding p4_coun #define NUM_UNUSED_CCCRS NUM_CCCRS_NON_HT - NUM_COUNTERS_NON_HT -/* All cccr we don't use. */ -static int p4_unused_cccr[NUM_UNUSED_CCCRS] = { - MSR_P4_BPU_CCCR1, MSR_P4_BPU_CCCR3, - MSR_P4_MS_CCCR1, MSR_P4_MS_CCCR3, - MSR_P4_FLAME_CCCR1, MSR_P4_FLAME_CCCR3, - MSR_P4_IQ_CCCR0, MSR_P4_IQ_CCCR1, - MSR_P4_IQ_CCCR2, MSR_P4_IQ_CCCR3 -}; - /* p4 event codes in libop/op_event.h are indices into this table. */ static struct p4_event_binding p4_events[NUM_EVENTS] = { @@ -372,6 +365,8 @@ static struct p4_event_binding p4_events #define CCCR_OVF_P(cccr) ((cccr) & (1U<<31)) #define CCCR_CLEAR_OVF(cccr) ((cccr) &= (~(1U<<31))) +#define CTRL_IS_RESERVED(msrs,c) (msrs->controls[(c)].addr ? 1 : 0) +#define CTR_IS_RESERVED(msrs,c) (msrs->counters[(c)].addr ? 1 : 0) #define CTR_READ(l,h,i) do {rdmsr(p4_counters[(i)].counter_address, (l), (h));} while (0) #define CTR_WRITE(l,i) do {wrmsr(p4_counters[(i)].counter_address, -(u32)(l), -1);} while (0) #define CTR_OVERFLOW_P(ctr) (!((ctr) & 0x80000000)) @@ -401,29 +396,34 @@ static unsigned long reset_value[NUM_COU static void p4_fill_in_addresses(struct op_msrs * const msrs) { unsigned int i; - unsigned int addr, stag; + unsigned int addr, cccraddr, stag; setup_num_counters(); stag = get_stagger(); - /* the counter registers we pay attention to */ + /* initialize some registers */ for (i = 0; i < num_counters; ++i) { - msrs->counters[i].addr = - p4_counters[VIRT_CTR(stag, i)].counter_address; + msrs->counters[i].addr = 0; } - - /* FIXME: bad feeling, we don't save the 10 counters we don't use. */ - - /* 18 CCCR registers */ - for (i = 0, addr = MSR_P4_BPU_CCCR0 + stag; - addr <= MSR_P4_IQ_CCCR5; ++i, addr += addr_increment()) { - msrs->controls[i].addr = addr; + for (i = 0; i < num_controls; ++i) { + msrs->controls[i].addr = 0; } + /* the counter & cccr registers we pay attention to */ + for (i = 0; i < num_counters; ++i) { + addr = p4_counters[VIRT_CTR(stag, i)].counter_address; + cccraddr = p4_counters[VIRT_CTR(stag, i)].cccr_address; + if (reserve_perfctr_nmi(addr)){ + msrs->counters[i].addr = addr; + msrs->controls[i].addr = cccraddr; + } + } + /* 43 ESCR registers in three or four discontiguous group */ for (addr = MSR_P4_BSU_ESCR0 + stag; addr < MSR_P4_IQ_ESCR0; ++i, addr += addr_increment()) { - msrs->controls[i].addr = addr; + if (reserve_evntsel_nmi(addr)) + msrs->controls[i].addr = addr; } /* no IQ_ESCR0/1 on some models, we save a seconde time BSU_ESCR0/1 @@ -431,47 +431,57 @@ static void p4_fill_in_addresses(struct if (boot_cpu_data.x86_model >= 0x3) { for (addr = MSR_P4_BSU_ESCR0 + stag; addr <= MSR_P4_BSU_ESCR1; ++i, addr += addr_increment()) { - msrs->controls[i].addr = addr; + if (reserve_evntsel_nmi(addr)) + msrs->controls[i].addr = addr; } } else { for (addr = MSR_P4_IQ_ESCR0 + stag; addr <= MSR_P4_IQ_ESCR1; ++i, addr += addr_increment()) { - msrs->controls[i].addr = addr; + if (reserve_evntsel_nmi(addr)) + msrs->controls[i].addr = addr; } } for (addr = MSR_P4_RAT_ESCR0 + stag; addr <= MSR_P4_SSU_ESCR0; ++i, addr += addr_increment()) { - msrs->controls[i].addr = addr; + if (reserve_evntsel_nmi(addr)) + msrs->controls[i].addr = addr; } for (addr = MSR_P4_MS_ESCR0 + stag; addr <= MSR_P4_TC_ESCR1; ++i, addr += addr_increment()) { - msrs->controls[i].addr = addr; + if (reserve_evntsel_nmi(addr)) + msrs->controls[i].addr = addr; } for (addr = MSR_P4_IX_ESCR0 + stag; addr <= MSR_P4_CRU_ESCR3; ++i, addr += addr_increment()) { - msrs->controls[i].addr = addr; + if (reserve_evntsel_nmi(addr)) + msrs->controls[i].addr = addr; } /* there are 2 remaining non-contiguously located ESCRs */ if (num_counters == NUM_COUNTERS_NON_HT) { /* standard non-HT CPUs handle both remaining ESCRs*/ - msrs->controls[i++].addr = MSR_P4_CRU_ESCR5; - msrs->controls[i++].addr = MSR_P4_CRU_ESCR4; + if (reserve_evntsel_nmi(MSR_P4_CRU_ESCR5)) + msrs->controls[i++].addr = MSR_P4_CRU_ESCR5; + if (reserve_evntsel_nmi(MSR_P4_CRU_ESCR4)) + msrs->controls[i++].addr = MSR_P4_CRU_ESCR4; } else if (stag == 0) { /* HT CPUs give the first remainder to the even thread, as the 32nd control register */ - msrs->controls[i++].addr = MSR_P4_CRU_ESCR4; + if (reserve_evntsel_nmi(MSR_P4_CRU_ESCR4)) + msrs->controls[i++].addr = MSR_P4_CRU_ESCR4; } else { /* and two copies of the second to the odd thread, for the 22st and 23nd control registers */ - msrs->controls[i++].addr = MSR_P4_CRU_ESCR5; - msrs->controls[i++].addr = MSR_P4_CRU_ESCR5; + if (reserve_evntsel_nmi(MSR_P4_CRU_ESCR5)) { + msrs->controls[i++].addr = MSR_P4_CRU_ESCR5; + msrs->controls[i++].addr = MSR_P4_CRU_ESCR5; + } } } @@ -544,7 +554,6 @@ static void p4_setup_ctrs(struct op_msrs { unsigned int i; unsigned int low, high; - unsigned int addr; unsigned int stag; stag = get_stagger(); @@ -557,59 +566,24 @@ static void p4_setup_ctrs(struct op_msrs /* clear the cccrs we will use */ for (i = 0 ; i < num_counters ; i++) { + if (unlikely(!CTRL_IS_RESERVED(msrs,i))) + continue; rdmsr(p4_counters[VIRT_CTR(stag, i)].cccr_address, low, high); CCCR_CLEAR(low); CCCR_SET_REQUIRED_BITS(low); wrmsr(p4_counters[VIRT_CTR(stag, i)].cccr_address, low, high); } - /* clear cccrs outside our concern */ - for (i = stag ; i < NUM_UNUSED_CCCRS ; i += addr_increment()) { - rdmsr(p4_unused_cccr[i], low, high); - CCCR_CLEAR(low); - CCCR_SET_REQUIRED_BITS(low); - wrmsr(p4_unused_cccr[i], low, high); - } - /* clear all escrs (including those outside our concern) */ - for (addr = MSR_P4_BSU_ESCR0 + stag; - addr < MSR_P4_IQ_ESCR0; addr += addr_increment()) { - wrmsr(addr, 0, 0); - } - - /* On older models clear also MSR_P4_IQ_ESCR0/1 */ - if (boot_cpu_data.x86_model < 0x3) { - wrmsr(MSR_P4_IQ_ESCR0, 0, 0); - wrmsr(MSR_P4_IQ_ESCR1, 0, 0); - } - - for (addr = MSR_P4_RAT_ESCR0 + stag; - addr <= MSR_P4_SSU_ESCR0; ++i, addr += addr_increment()) { - wrmsr(addr, 0, 0); - } - - for (addr = MSR_P4_MS_ESCR0 + stag; - addr <= MSR_P4_TC_ESCR1; addr += addr_increment()){ - wrmsr(addr, 0, 0); - } - - for (addr = MSR_P4_IX_ESCR0 + stag; - addr <= MSR_P4_CRU_ESCR3; addr += addr_increment()){ - wrmsr(addr, 0, 0); + for (i = num_counters; i < num_controls; i++) { + if (unlikely(!CTRL_IS_RESERVED(msrs,i))) + continue; + wrmsr(msrs->controls[i].addr, 0, 0); } - if (num_counters == NUM_COUNTERS_NON_HT) { - wrmsr(MSR_P4_CRU_ESCR4, 0, 0); - wrmsr(MSR_P4_CRU_ESCR5, 0, 0); - } else if (stag == 0) { - wrmsr(MSR_P4_CRU_ESCR4, 0, 0); - } else { - wrmsr(MSR_P4_CRU_ESCR5, 0, 0); - } - /* setup all counters */ for (i = 0 ; i < num_counters ; ++i) { - if (counter_config[i].enabled) { + if ((counter_config[i].enabled) && (CTRL_IS_RESERVED(msrs,i))) { reset_value[i] = counter_config[i].count; pmc_setup_one_p4_counter(i); CTR_WRITE(counter_config[i].count, VIRT_CTR(stag, i)); @@ -696,12 +670,32 @@ static void p4_stop(struct op_msrs const stag = get_stagger(); for (i = 0; i < num_counters; ++i) { + if (!reset_value[i]) + continue; CCCR_READ(low, high, VIRT_CTR(stag, i)); CCCR_SET_DISABLE(low); CCCR_WRITE(low, high, VIRT_CTR(stag, i)); } } +static void p4_shutdown(struct op_msrs const * const msrs) +{ + int i; + + for (i = 0 ; i < num_counters ; ++i) { + if (CTR_IS_RESERVED(msrs,i)) + release_perfctr_nmi(msrs->counters[i].addr); + } + /* some of the control registers are specially reserved in + * conjunction with the counter registers (hence the starting offset). + * This saves a few bits. + */ + for (i = num_counters ; i < num_controls ; ++i) { + if (CTRL_IS_RESERVED(msrs,i)) + release_evntsel_nmi(msrs->controls[i].addr); + } +} + #ifdef CONFIG_SMP struct op_x86_model_spec const op_p4_ht2_spec = { @@ -711,7 +705,8 @@ struct op_x86_model_spec const op_p4_ht2 .setup_ctrs = &p4_setup_ctrs, .check_ctrs = &p4_check_ctrs, .start = &p4_start, - .stop = &p4_stop + .stop = &p4_stop, + .shutdown = &p4_shutdown }; #endif @@ -722,5 +717,6 @@ struct op_x86_model_spec const op_p4_spe .setup_ctrs = &p4_setup_ctrs, .check_ctrs = &p4_check_ctrs, .start = &p4_start, - .stop = &p4_stop + .stop = &p4_stop, + .shutdown = &p4_shutdown }; Index: linux-don/arch/i386/oprofile/op_model_ppro.c =================================================================== --- linux-don.orig/arch/i386/oprofile/op_model_ppro.c +++ linux-don/arch/i386/oprofile/op_model_ppro.c @@ -22,10 +22,12 @@ #define NUM_COUNTERS 2 #define NUM_CONTROLS 2 +#define CTR_IS_RESERVED(msrs,c) (msrs->counters[(c)].addr ? 1 : 0) #define CTR_READ(l,h,msrs,c) do {rdmsr(msrs->counters[(c)].addr, (l), (h));} while (0) #define CTR_WRITE(l,msrs,c) do {wrmsr(msrs->counters[(c)].addr, -(u32)(l), -1);} while (0) #define CTR_OVERFLOWED(n) (!((n) & (1U<<31))) +#define CTRL_IS_RESERVED(msrs,c) (msrs->controls[(c)].addr ? 1 : 0) #define CTRL_READ(l,h,msrs,c) do {rdmsr((msrs->controls[(c)].addr), (l), (h));} while (0) #define CTRL_WRITE(l,h,msrs,c) do {wrmsr((msrs->controls[(c)].addr), (l), (h));} while (0) #define CTRL_SET_ACTIVE(n) (n |= (1<<22)) @@ -41,11 +43,21 @@ static unsigned long reset_value[NUM_COU static void ppro_fill_in_addresses(struct op_msrs * const msrs) { - msrs->counters[0].addr = MSR_P6_PERFCTR0; - msrs->counters[1].addr = MSR_P6_PERFCTR1; + int i; + + for (i=0; i < NUM_COUNTERS; i++) { + if (reserve_perfctr_nmi(MSR_P6_PERFCTR0 + i)) + msrs->counters[i].addr = MSR_P6_PERFCTR0 + i; + else + msrs->counters[i].addr = 0; + } - msrs->controls[0].addr = MSR_P6_EVNTSEL0; - msrs->controls[1].addr = MSR_P6_EVNTSEL1; + for (i=0; i < NUM_CONTROLS; i++) { + if (reserve_evntsel_nmi(MSR_P6_EVNTSEL0 + i)) + msrs->controls[i].addr = MSR_P6_EVNTSEL0 + i; + else + msrs->controls[i].addr = 0; + } } @@ -56,6 +68,8 @@ static void ppro_setup_ctrs(struct op_ms /* clear all counters */ for (i = 0 ; i < NUM_CONTROLS; ++i) { + if (unlikely(!CTRL_IS_RESERVED(msrs,i))) + continue; CTRL_READ(low, high, msrs, i); CTRL_CLEAR(low); CTRL_WRITE(low, high, msrs, i); @@ -63,12 +77,14 @@ static void ppro_setup_ctrs(struct op_ms /* avoid a false detection of ctr overflows in NMI handler */ for (i = 0; i < NUM_COUNTERS; ++i) { + if (unlikely(!CTR_IS_RESERVED(msrs,i))) + continue; CTR_WRITE(1, msrs, i); } /* enable active counters */ for (i = 0; i < NUM_COUNTERS; ++i) { - if (counter_config[i].enabled) { + if ((counter_config[i].enabled) && (CTR_IS_RESERVED(msrs,i))) { reset_value[i] = counter_config[i].count; CTR_WRITE(counter_config[i].count, msrs, i); @@ -81,6 +97,8 @@ static void ppro_setup_ctrs(struct op_ms CTRL_SET_UM(low, counter_config[i].unit_mask); CTRL_SET_EVENT(low, counter_config[i].event); CTRL_WRITE(low, high, msrs, i); + } else { + reset_value[i] = 0; } } } @@ -93,6 +111,8 @@ static int ppro_check_ctrs(struct pt_reg int i; for (i = 0 ; i < NUM_COUNTERS; ++i) { + if (!reset_value[i]) + continue; CTR_READ(low, high, msrs, i); if (CTR_OVERFLOWED(low)) { oprofile_add_sample(regs, i); @@ -118,18 +138,38 @@ static int ppro_check_ctrs(struct pt_reg static void ppro_start(struct op_msrs const * const msrs) { unsigned int low,high; - CTRL_READ(low, high, msrs, 0); - CTRL_SET_ACTIVE(low); - CTRL_WRITE(low, high, msrs, 0); + + if (reset_value[0]) { + CTRL_READ(low, high, msrs, 0); + CTRL_SET_ACTIVE(low); + CTRL_WRITE(low, high, msrs, 0); + } } static void ppro_stop(struct op_msrs const * const msrs) { unsigned int low,high; - CTRL_READ(low, high, msrs, 0); - CTRL_SET_INACTIVE(low); - CTRL_WRITE(low, high, msrs, 0); + + if (reset_value[0]) { + CTRL_READ(low, high, msrs, 0); + CTRL_SET_INACTIVE(low); + CTRL_WRITE(low, high, msrs, 0); + } +} + +static void ppro_shutdown(struct op_msrs const * const msrs) +{ + int i; + + for (i = 0 ; i < NUM_COUNTERS ; ++i) { + if (CTR_IS_RESERVED(msrs,i)) + release_perfctr_nmi(MSR_P6_PERFCTR0 + i); + } + for (i = 0 ; i < NUM_CONTROLS ; ++i) { + if (CTRL_IS_RESERVED(msrs,i)) + release_evntsel_nmi(MSR_P6_EVNTSEL0 + i); + } } @@ -140,5 +180,6 @@ struct op_x86_model_spec const op_ppro_s .setup_ctrs = &ppro_setup_ctrs, .check_ctrs = &ppro_check_ctrs, .start = &ppro_start, - .stop = &ppro_stop + .stop = &ppro_stop, + .shutdown = &ppro_shutdown }; Index: linux-don/arch/i386/oprofile/op_x86_model.h =================================================================== --- linux-don.orig/arch/i386/oprofile/op_x86_model.h +++ linux-don/arch/i386/oprofile/op_x86_model.h @@ -40,6 +40,7 @@ struct op_x86_model_spec { struct op_msrs const * const msrs); void (*start)(struct op_msrs const * const msrs); void (*stop)(struct op_msrs const * const msrs); + void (*shutdown)(struct op_msrs const * const msrs); }; extern struct op_x86_model_spec const op_ppro_spec; -- |
From: dzickus <dz...@re...> - 2006-05-09 20:57:12
|
This patch includes the changes to make the nmi watchdog on x86_64 SMP aware. A bunch of code was moved around to make it simpler to read. In addition, it is now possible to determine if a particular NMI was the result of the watchdog or not. This feature allows the kernel to filter out unknown NMIs easier. Signed-off-by: Don Zickus <dz...@re...> Index: linux-don/arch/x86_64/kernel/apic.c =================================================================== --- linux-don.orig/arch/x86_64/kernel/apic.c +++ linux-don/arch/x86_64/kernel/apic.c @@ -480,8 +480,7 @@ void __cpuinit setup_local_APIC (void) } nmi_watchdog_default(); - if (nmi_watchdog == NMI_LOCAL_APIC) - setup_apic_nmi_watchdog(); + setup_apic_nmi_watchdog(NULL); apic_pm_activate(); } Index: linux-don/arch/x86_64/kernel/nmi.c =================================================================== --- linux-don.orig/arch/x86_64/kernel/nmi.c +++ linux-don/arch/x86_64/kernel/nmi.c @@ -57,53 +57,29 @@ static unsigned int lapic_nmi_owner; #define LAPIC_NMI_RESERVED (1<<1) /* nmi_active: - * +1: the lapic NMI watchdog is active, but can be disabled - * 0: the lapic NMI watchdog has not been set up, and cannot + * >0: the lapic NMI watchdog is active, but can be disabled + * <0: the lapic NMI watchdog has not been set up, and cannot * be enabled - * -1: the lapic NMI watchdog is disabled, but can be enabled + * 0: the lapic NMI watchdog is disabled, but can be enabled */ -int nmi_active; /* oprofile uses this */ +atomic_t nmi_active = ATOMIC_INIT(0); /* oprofile uses this */ int panic_on_timeout; unsigned int nmi_watchdog = NMI_DEFAULT; static unsigned int nmi_hz = HZ; -static unsigned int nmi_perfctr_msr; /* the MSR to reset in NMI handler */ -static unsigned int nmi_p4_cccr_val; -/* Note that these events don't tick when the CPU idles. This means - the frequency varies with CPU load. */ - -#define K7_EVNTSEL_ENABLE (1 << 22) -#define K7_EVNTSEL_INT (1 << 20) -#define K7_EVNTSEL_OS (1 << 17) -#define K7_EVNTSEL_USR (1 << 16) -#define K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING 0x76 -#define K7_NMI_EVENT K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING +struct nmi_watchdog_ctlblk { + int enabled; + u64 check_bit; + unsigned int cccr_msr; + unsigned int perfctr_msr; /* the MSR to reset in NMI handler */ + unsigned int evntsel_msr; /* the MSR to select the events to handle */ +}; +static DEFINE_PER_CPU(struct nmi_watchdog_ctlblk, nmi_watchdog_ctlblk); -#define MSR_P4_MISC_ENABLE 0x1A0 -#define MSR_P4_MISC_ENABLE_PERF_AVAIL (1<<7) -#define MSR_P4_MISC_ENABLE_PEBS_UNAVAIL (1<<12) -#define MSR_P4_PERFCTR0 0x300 -#define MSR_P4_CCCR0 0x360 -#define P4_ESCR_EVENT_SELECT(N) ((N)<<25) -#define P4_ESCR_OS (1<<3) -#define P4_ESCR_USR (1<<2) -#define P4_CCCR_OVF_PMI0 (1<<26) -#define P4_CCCR_OVF_PMI1 (1<<27) -#define P4_CCCR_THRESHOLD(N) ((N)<<20) -#define P4_CCCR_COMPLEMENT (1<<19) -#define P4_CCCR_COMPARE (1<<18) -#define P4_CCCR_REQUIRED (3<<16) -#define P4_CCCR_ESCR_SELECT(N) ((N)<<13) -#define P4_CCCR_ENABLE (1<<12) -/* Set up IQ_COUNTER0 to behave like a clock, by having IQ_CCCR0 filter - CRU_ESCR0 (with any non-null event selector) through a complemented - max threshold. [IA32-Vol3, Section 14.9.9] */ -#define MSR_P4_IQ_COUNTER0 0x30C -#define P4_NMI_CRU_ESCR0 (P4_ESCR_EVENT_SELECT(0x3F)|P4_ESCR_OS|P4_ESCR_USR) -#define P4_NMI_IQ_CCCR0 \ - (P4_CCCR_OVF_PMI0|P4_CCCR_THRESHOLD(15)|P4_CCCR_COMPLEMENT| \ - P4_CCCR_COMPARE|P4_CCCR_REQUIRED|P4_CCCR_ESCR_SELECT(4)|P4_CCCR_ENABLE) +/* local prototypes */ +static void stop_apic_nmi_watchdog(void *unused); +static int unknown_nmi_panic_callback(struct pt_regs *regs, int cpu); /* converts an msr to an appropriate reservation bit */ static inline unsigned int nmi_perfctr_msr_to_bit(unsigned int msr) @@ -242,6 +218,12 @@ int __init check_nmi_watchdog (void) int *counts; int cpu; + if ((nmi_watchdog == NMI_NONE) || (nmi_watchdog == NMI_DEFAULT)) + return 0; + + if (!atomic_read(&nmi_active)) + return 0; + counts = kmalloc(NR_CPUS * sizeof(int), GFP_KERNEL); if (!counts) return -1; @@ -259,19 +241,22 @@ int __init check_nmi_watchdog (void) mdelay((10*1000)/nmi_hz); // wait 10 ticks for_each_online_cpu(cpu) { + if (!per_cpu(nmi_watchdog_ctlblk.enabled, cpu)) + continue; if (cpu_pda(cpu)->__nmi_count - counts[cpu] <= 5) { - endflag = 1; printk("CPU#%d: NMI appears to be stuck (%d->%d)!\n", cpu, counts[cpu], cpu_pda(cpu)->__nmi_count); - nmi_active = 0; - lapic_nmi_owner &= ~LAPIC_NMI_WATCHDOG; - nmi_perfctr_msr = 0; - kfree(counts); - return -1; + per_cpu(nmi_watchdog_ctlblk.enabled, cpu) = 0; + atomic_dec(&nmi_active); } } + if (!atomic_read(&nmi_active)) { + kfree(counts); + atomic_set(&nmi_active, -1); + return -1; + } endflag = 1; printk("OK.\n"); @@ -298,8 +283,11 @@ int __init setup_nmi_watchdog(char *str) get_option(&str, &nmi); - if (nmi >= NMI_INVALID) + if ((nmi >= NMI_INVALID) || (nmi < NMI_NONE)) return 0; + + if ((nmi == NMI_LOCAL_APIC) && (nmi_known_cpu() == 0)) + return 0; /* no lapic support */ nmi_watchdog = nmi; return 1; } @@ -308,31 +296,30 @@ __setup("nmi_watchdog=", setup_nmi_watch static void disable_lapic_nmi_watchdog(void) { - if (nmi_active <= 0) + BUG_ON(nmi_watchdog != NMI_LOCAL_APIC); + + if (atomic_read(&nmi_active) <= 0) return; - switch (boot_cpu_data.x86_vendor) { - case X86_VENDOR_AMD: - wrmsr(MSR_K7_EVNTSEL0, 0, 0); - break; - case X86_VENDOR_INTEL: - if (boot_cpu_data.x86 == 15) { - wrmsr(MSR_P4_IQ_CCCR0, 0, 0); - wrmsr(MSR_P4_CRU_ESCR0, 0, 0); - } - break; - } - nmi_active = -1; - /* tell do_nmi() and others that we're not active any more */ - nmi_watchdog = 0; + + on_each_cpu(stop_apic_nmi_watchdog, NULL, 0, 1); + + BUG_ON(atomic_read(&nmi_active) != 0); } static void enable_lapic_nmi_watchdog(void) { - if (nmi_active < 0) { - nmi_watchdog = NMI_LOCAL_APIC; - touch_nmi_watchdog(); - setup_apic_nmi_watchdog(); - } + BUG_ON(nmi_watchdog != NMI_LOCAL_APIC); + + /* are we already enabled */ + if (atomic_read(&nmi_active) != 0) + return; + + /* are we lapic aware */ + if (nmi_known_cpu() <= 0) + return; + + on_each_cpu(setup_apic_nmi_watchdog, NULL, 0, 1); + touch_nmi_watchdog(); } int reserve_lapic_nmi(void) @@ -364,21 +351,24 @@ void release_lapic_nmi(void) void disable_timer_nmi_watchdog(void) { - if ((nmi_watchdog != NMI_IO_APIC) || (nmi_active <= 0)) + BUG_ON(nmi_watchdog != NMI_IO_APIC); + + if (atomic_read(&nmi_active) <= 0) return; disable_irq(0); - unset_nmi_callback(); - nmi_active = -1; - nmi_watchdog = NMI_NONE; + on_each_cpu(stop_apic_nmi_watchdog, NULL, 0, 1); + + BUG_ON(atomic_read(&nmi_active) != 0); } void enable_timer_nmi_watchdog(void) { - if (nmi_active < 0) { - nmi_watchdog = NMI_IO_APIC; + BUG_ON(nmi_watchdog != NMI_IO_APIC); + + if (atomic_read(&nmi_active) == 0) { touch_nmi_watchdog(); - nmi_active = 1; + on_each_cpu(setup_apic_nmi_watchdog, NULL, 0, 1); enable_irq(0); } } @@ -389,7 +379,7 @@ static int nmi_pm_active; /* nmi_active static int lapic_nmi_suspend(struct sys_device *dev, pm_message_t state) { - nmi_pm_active = nmi_active; + nmi_pm_active = atomic_read(&nmi_active); disable_lapic_nmi_watchdog(); return 0; } @@ -397,7 +387,7 @@ static int lapic_nmi_suspend(struct sys_ static int lapic_nmi_resume(struct sys_device *dev) { if (nmi_pm_active > 0) - enable_lapic_nmi_watchdog(); + enable_lapic_nmi_watchdog(); return 0; } @@ -416,7 +406,13 @@ static int __init init_lapic_nmi_sysfs(v { int error; - if (nmi_active == 0 || nmi_watchdog != NMI_LOCAL_APIC) + /* should really be a BUG_ON but b/c this is an + * init call, it just doesn't work. -dcz + */ + if (nmi_watchdog != NMI_LOCAL_APIC) + return 0; + + if ( atomic_read(&nmi_active) < 0 ) return 0; error = sysdev_class_register(&nmi_sysclass); @@ -429,100 +425,232 @@ late_initcall(init_lapic_nmi_sysfs); #endif /* CONFIG_PM */ +/* + * Activate the NMI watchdog via the local APIC. + * Original code written by Keith Owens. + */ + +/* Note that these events don't tick when the CPU idles. This means + the frequency varies with CPU load. */ + +#define K7_EVNTSEL_ENABLE (1 << 22) +#define K7_EVNTSEL_INT (1 << 20) +#define K7_EVNTSEL_OS (1 << 17) +#define K7_EVNTSEL_USR (1 << 16) +#define K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING 0x76 +#define K7_NMI_EVENT K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING + static int setup_k7_watchdog(void) { + unsigned int perfctr_msr, evntsel_msr; unsigned int evntsel; + struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk); - nmi_perfctr_msr = MSR_K7_PERFCTR0; - - if (!reserve_perfctr_nmi(nmi_perfctr_msr)) + perfctr_msr = MSR_K7_PERFCTR0; + evntsel_msr = MSR_K7_EVNTSEL0; + if (!reserve_perfctr_nmi(perfctr_msr)) goto fail; - if (!reserve_evntsel_nmi(MSR_K7_EVNTSEL0)) + if (!reserve_evntsel_nmi(evntsel_msr)) goto fail1; /* Simulator may not support it */ - if (checking_wrmsrl(MSR_K7_EVNTSEL0, 0UL)) + if (checking_wrmsrl(evntsel_msr, 0UL)) goto fail2; - wrmsrl(MSR_K7_PERFCTR0, 0UL); + wrmsrl(perfctr_msr, 0UL); evntsel = K7_EVNTSEL_INT | K7_EVNTSEL_OS | K7_EVNTSEL_USR | K7_NMI_EVENT; - wrmsr(MSR_K7_EVNTSEL0, evntsel, 0); - wrmsrl(MSR_K7_PERFCTR0, -((u64)cpu_khz * 1000 / nmi_hz)); + /* setup the timer */ + wrmsr(evntsel_msr, evntsel, 0); + wrmsrl(perfctr_msr, -((u64)cpu_khz * 1000 / nmi_hz)); apic_write(APIC_LVTPC, APIC_DM_NMI); evntsel |= K7_EVNTSEL_ENABLE; - wrmsr(MSR_K7_EVNTSEL0, evntsel, 0); + wrmsr(evntsel_msr, evntsel, 0); + + wd->perfctr_msr = perfctr_msr; + wd->evntsel_msr = evntsel_msr; + wd->cccr_msr = 0; //unused + wd->check_bit = 1ULL<<63; return 1; fail2: - release_evntsel_nmi(MSR_K7_EVNTSEL0); + release_evntsel_nmi(evntsel_msr); fail1: - release_perfctr_nmi(nmi_perfctr_msr); + release_perfctr_nmi(perfctr_msr); fail: return 0; } +static void stop_k7_watchdog(void) +{ + struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk); + + wrmsr(wd->evntsel_msr, 0, 0); + + release_evntsel_nmi(wd->evntsel_msr); + release_perfctr_nmi(wd->perfctr_msr); +} + +/* Note that these events don't tick when the CPU idles. This means + the frequency varies with CPU load. */ + +#define MSR_P4_MISC_ENABLE_PERF_AVAIL (1<<7) +#define P4_ESCR_EVENT_SELECT(N) ((N)<<25) +#define P4_ESCR_OS (1<<3) +#define P4_ESCR_USR (1<<2) +#define P4_CCCR_OVF_PMI0 (1<<26) +#define P4_CCCR_OVF_PMI1 (1<<27) +#define P4_CCCR_THRESHOLD(N) ((N)<<20) +#define P4_CCCR_COMPLEMENT (1<<19) +#define P4_CCCR_COMPARE (1<<18) +#define P4_CCCR_REQUIRED (3<<16) +#define P4_CCCR_ESCR_SELECT(N) ((N)<<13) +#define P4_CCCR_ENABLE (1<<12) +#define P4_CCCR_OVF (1<<31) +/* Set up IQ_COUNTER0 to behave like a clock, by having IQ_CCCR0 filter + CRU_ESCR0 (with any non-null event selector) through a complemented + max threshold. [IA32-Vol3, Section 14.9.9] */ static int setup_p4_watchdog(void) { + unsigned int perfctr_msr, evntsel_msr, cccr_msr; + unsigned int evntsel, cccr_val; unsigned int misc_enable, dummy; + unsigned int ht_num; + struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk); - rdmsr(MSR_P4_MISC_ENABLE, misc_enable, dummy); + rdmsr(MSR_IA32_MISC_ENABLE, misc_enable, dummy); if (!(misc_enable & MSR_P4_MISC_ENABLE_PERF_AVAIL)) return 0; - nmi_perfctr_msr = MSR_P4_IQ_COUNTER0; - nmi_p4_cccr_val = P4_NMI_IQ_CCCR0; #ifdef CONFIG_SMP - if (smp_num_siblings == 2) - nmi_p4_cccr_val |= P4_CCCR_OVF_PMI1; + /* detect which hyperthread we are on */ + if (smp_num_siblings == 2) { + unsigned int ebx, apicid; + + ebx = cpuid_ebx(1); + apicid = (ebx >> 24) & 0xff; + ht_num = apicid & 1; + } else #endif + ht_num = 0; + + /* performance counters are shared resources + * assign each hyperthread its own set + * (re-use the ESCR0 register, seems safe + * and keeps the cccr_val the same) + */ + if (!ht_num) { + /* logical cpu 0 */ + perfctr_msr = MSR_P4_IQ_PERFCTR0; + evntsel_msr = MSR_P4_CRU_ESCR0; + cccr_msr = MSR_P4_IQ_CCCR0; + cccr_val = P4_CCCR_OVF_PMI0 | P4_CCCR_ESCR_SELECT(4); + } else { + /* logical cpu 1 */ + perfctr_msr = MSR_P4_IQ_PERFCTR1; + evntsel_msr = MSR_P4_CRU_ESCR0; + cccr_msr = MSR_P4_IQ_CCCR1; + cccr_val = P4_CCCR_OVF_PMI1 | P4_CCCR_ESCR_SELECT(4); + } - if (!reserve_perfctr_nmi(nmi_perfctr_msr)) + if (!reserve_perfctr_nmi(perfctr_msr)) goto fail; - if (!reserve_evntsel_nmi(MSR_P4_CRU_ESCR0)) + if (!reserve_evntsel_nmi(evntsel_msr)) goto fail1; - wrmsr(MSR_P4_CRU_ESCR0, P4_NMI_CRU_ESCR0, 0); - wrmsr(MSR_P4_IQ_CCCR0, P4_NMI_IQ_CCCR0 & ~P4_CCCR_ENABLE, 0); - Dprintk("setting P4_IQ_COUNTER0 to 0x%08lx\n", -(cpu_khz * 1000UL / nmi_hz)); - wrmsrl(MSR_P4_IQ_COUNTER0, -((u64)cpu_khz * 1000 / nmi_hz)); + evntsel = P4_ESCR_EVENT_SELECT(0x3F) + | P4_ESCR_OS + | P4_ESCR_USR; + + cccr_val |= P4_CCCR_THRESHOLD(15) + | P4_CCCR_COMPLEMENT + | P4_CCCR_COMPARE + | P4_CCCR_REQUIRED; + + wrmsr(evntsel_msr, evntsel, 0); + wrmsr(cccr_msr, cccr_val, 0); + wrmsrl(perfctr_msr, -((u64)cpu_khz * 1000 / nmi_hz)); apic_write(APIC_LVTPC, APIC_DM_NMI); - wrmsr(MSR_P4_IQ_CCCR0, nmi_p4_cccr_val, 0); + cccr_val |= P4_CCCR_ENABLE; + wrmsr(cccr_msr, cccr_val, 0); + + wd->perfctr_msr = perfctr_msr; + wd->evntsel_msr = evntsel_msr; + wd->cccr_msr = cccr_msr; + wd->check_bit = 1ULL<<39; return 1; fail1: - release_perfctr_nmi(nmi_perfctr_msr); + release_perfctr_nmi(perfctr_msr); fail: return 0; } -void setup_apic_nmi_watchdog(void) +static void stop_p4_watchdog(void) { - switch (boot_cpu_data.x86_vendor) { - case X86_VENDOR_AMD: - if (boot_cpu_data.x86 != 15) - return; - if (strstr(boot_cpu_data.x86_model_id, "Screwdriver")) - return; - if (!setup_k7_watchdog()) - return; - break; - case X86_VENDOR_INTEL: - if (boot_cpu_data.x86 != 15) - return; - if (!setup_p4_watchdog()) + struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk); + + wrmsr(wd->cccr_msr, 0, 0); + wrmsr(wd->evntsel_msr, 0, 0); + + release_evntsel_nmi(wd->evntsel_msr); + release_perfctr_nmi(wd->perfctr_msr); +} + +void setup_apic_nmi_watchdog(void *unused) +{ + /* only support LOCAL and IO APICs for now */ + if ((nmi_watchdog != NMI_LOCAL_APIC) && + (nmi_watchdog != NMI_IO_APIC)) + return; + + if (nmi_watchdog == NMI_LOCAL_APIC) { + switch (boot_cpu_data.x86_vendor) { + case X86_VENDOR_AMD: + if (strstr(boot_cpu_data.x86_model_id, "Screwdriver")) + return; + if (!setup_k7_watchdog()) + return; + break; + case X86_VENDOR_INTEL: + if (!setup_p4_watchdog()) + return; + break; + default: return; - break; + } + } + __get_cpu_var(nmi_watchdog_ctlblk.enabled) = 1; + atomic_inc(&nmi_active); +} - default: - return; +static void stop_apic_nmi_watchdog(void *unused) +{ + /* only support LOCAL and IO APICs for now */ + if ((nmi_watchdog != NMI_LOCAL_APIC) && + (nmi_watchdog != NMI_IO_APIC)) + return; + + if (nmi_watchdog == NMI_LOCAL_APIC) { + switch (boot_cpu_data.x86_vendor) { + case X86_VENDOR_AMD: + if (strstr(boot_cpu_data.x86_model_id, "Screwdriver")) + return; + stop_k7_watchdog(); + break; + case X86_VENDOR_INTEL: + stop_p4_watchdog(); + break; + default: + return; + } } - lapic_nmi_owner = LAPIC_NMI_WATCHDOG; - nmi_active = 1; + __get_cpu_var(nmi_watchdog_ctlblk.enabled) = 0; + atomic_dec(&nmi_active); } /* @@ -559,50 +687,70 @@ void __kprobes nmi_watchdog_tick(struct { int sum; int touched = 0; + struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk); + u64 dummy; + + /* check for other users first */ + if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT) + == NOTIFY_STOP) { + touched = 1; + } sum = read_pda(apic_timer_irqs); if (__get_cpu_var(nmi_touch)) { __get_cpu_var(nmi_touch) = 0; touched = 1; } + #ifdef CONFIG_X86_MCE /* Could check oops_in_progress here too, but it's safer not too */ if (atomic_read(&mce_entry) > 0) touched = 1; #endif + /* if the apic timer isn't firing, this cpu isn't doing much */ if (!touched && __get_cpu_var(last_irq_sum) == sum) { /* * Ayiee, looks like this CPU is stuck ... * wait a few IRQs (5 seconds) before doing the oops ... */ local_inc(&__get_cpu_var(alert_counter)); - if (local_read(&__get_cpu_var(alert_counter)) == 5*nmi_hz) { - if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT) - == NOTIFY_STOP) { - local_set(&__get_cpu_var(alert_counter), 0); - return; - } + if (local_read(&__get_cpu_var(alert_counter)) == 5*nmi_hz) die_nmi("NMI Watchdog detected LOCKUP on CPU %d\n", regs); - } } else { __get_cpu_var(last_irq_sum) = sum; local_set(&__get_cpu_var(alert_counter), 0); } - if (nmi_perfctr_msr) { - if (nmi_perfctr_msr == MSR_P4_IQ_COUNTER0) { - /* - * P4 quirks: - * - An overflown perfctr will assert its interrupt - * until the OVF flag in its CCCR is cleared. - * - LVTPC is masked on interrupt and must be - * unmasked by the LVTPC handler. - */ - wrmsr(MSR_P4_IQ_CCCR0, nmi_p4_cccr_val, 0); - apic_write(APIC_LVTPC, APIC_DM_NMI); - } - wrmsrl(nmi_perfctr_msr, -((u64)cpu_khz * 1000 / nmi_hz)); + + /* see if the nmi watchdog went off */ + if (wd->enabled) { + if (nmi_watchdog == NMI_LOCAL_APIC) { + rdmsrl(wd->perfctr_msr, dummy); + if (dummy & wd->check_bit){ + /* this wasn't a watchdog timer interrupt */ + goto done; + } + + /* only Intel uses the cccr msr */ + if (wd->cccr_msr != 0) { + /* + * P4 quirks: + * - An overflown perfctr will assert its interrupt + * until the OVF flag in its CCCR is cleared. + * - LVTPC is masked on interrupt and must be + * unmasked by the LVTPC handler. + */ + rdmsrl(wd->cccr_msr, dummy); + dummy &= ~P4_CCCR_OVF; + wrmsrl(wd->cccr_msr, dummy); + apic_write(APIC_LVTPC, APIC_DM_NMI); + } + /* start the cycle over again */ + wrmsrl(wd->perfctr_msr, -((u64)cpu_khz * 1000 / nmi_hz)); + } } +done: + return; } static __kprobes int dummy_nmi_callback(struct pt_regs * regs, int cpu) Index: linux-don/include/asm-x86_64/nmi.h =================================================================== --- linux-don.orig/include/asm-x86_64/nmi.h +++ linux-don/include/asm-x86_64/nmi.h @@ -63,7 +63,7 @@ extern void release_perfctr_nmi(unsigned extern int reserve_evntsel_nmi(unsigned int); extern void release_evntsel_nmi(unsigned int); -extern void setup_apic_nmi_watchdog (void); +extern void setup_apic_nmi_watchdog (void *); extern int reserve_lapic_nmi(void); extern void release_lapic_nmi(void); extern void disable_timer_nmi_watchdog(void); @@ -73,6 +73,7 @@ extern void nmi_watchdog_tick (struct pt extern void nmi_watchdog_default(void); extern int setup_nmi_watchdog(char *); +extern atomic_t nmi_active; extern unsigned int nmi_watchdog; #define NMI_DEFAULT -1 #define NMI_NONE 0 -- |
From: Andi K. <ak...@su...> - 2006-05-09 21:42:29
|
On Tuesday 09 May 2006 22:50, dzickus wrote: > > This patch includes the changes to make the nmi watchdog on x86_64 SMP > aware. A bunch of code was moved around to make it simpler to read. In > addition, it is now possible to determine if a particular NMI was the result > of the watchdog or not. This feature allows the kernel to filter out > unknown NMIs easier. Looks all good to me. I merged it all up. Thanks. Regarding the ioapic watchdog not switching with the sysctl - i guess we can live with that. It is deprecated anyways. Thanks for doing all that work. -Andi |
From: dzickus <dz...@re...> - 2006-05-09 20:57:12
|
This patch includes the changes to make the nmi watchdog on i386 SMP aware. A bunch of code was moved around to make it simpler to read. In addition, it is now possible to determine if a particular NMI was the result of the watchdog or not. This feature allows the kernel to filter out unknown NMIs easier. Signed-off-by: Don Zickus <dz...@re...> Index: linux-don/arch/i386/kernel/apic.c =================================================================== --- linux-don.orig/arch/i386/kernel/apic.c +++ linux-don/arch/i386/kernel/apic.c @@ -587,8 +587,7 @@ void __devinit setup_local_APIC(void) printk("No ESR for 82489DX.\n"); } - if (nmi_watchdog == NMI_LOCAL_APIC) - setup_apic_nmi_watchdog(); + setup_apic_nmi_watchdog(NULL); apic_pm_activate(); } Index: linux-don/arch/i386/kernel/nmi.c =================================================================== --- linux-don.orig/arch/i386/kernel/nmi.c +++ linux-don/arch/i386/kernel/nmi.c @@ -24,16 +24,10 @@ #include <asm/smp.h> #include <asm/nmi.h> +#include <asm/kdebug.h> #include "mach_traps.h" -unsigned int nmi_watchdog = NMI_NONE; -extern int unknown_nmi_panic; -static unsigned int nmi_hz = HZ; -static unsigned int nmi_perfctr_msr; /* the MSR to reset in NMI handler */ -static unsigned int nmi_p4_cccr_val; -extern void show_registers(struct pt_regs *regs); - /* perfctr_nmi_owner tracks the ownership of the perfctr registers: * evtsel_nmi_owner tracks the ownership of the event selection * - different performance counters/ event selection may be reserved for @@ -63,51 +57,31 @@ static unsigned int lapic_nmi_owner; #define LAPIC_NMI_RESERVED (1<<1) /* nmi_active: - * +1: the lapic NMI watchdog is active, but can be disabled - * 0: the lapic NMI watchdog has not been set up, and cannot + * >0: the lapic NMI watchdog is active, but can be disabled + * <0: the lapic NMI watchdog has not been set up, and cannot * be enabled - * -1: the lapic NMI watchdog is disabled, but can be enabled + * 0: the lapic NMI watchdog is disabled, but can be enabled */ -int nmi_active; +atomic_t nmi_active = ATOMIC_INIT(0); /* oprofile uses this */ -#define K7_EVNTSEL_ENABLE (1 << 22) -#define K7_EVNTSEL_INT (1 << 20) -#define K7_EVNTSEL_OS (1 << 17) -#define K7_EVNTSEL_USR (1 << 16) -#define K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING 0x76 -#define K7_NMI_EVENT K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING +unsigned int nmi_watchdog = NMI_DEFAULT; +static unsigned int nmi_hz = HZ; -#define P6_EVNTSEL0_ENABLE (1 << 22) -#define P6_EVNTSEL_INT (1 << 20) -#define P6_EVNTSEL_OS (1 << 17) -#define P6_EVNTSEL_USR (1 << 16) -#define P6_EVENT_CPU_CLOCKS_NOT_HALTED 0x79 -#define P6_NMI_EVENT P6_EVENT_CPU_CLOCKS_NOT_HALTED +struct nmi_watchdog_ctlblk { + int enabled; + u64 check_bit; + unsigned int cccr_msr; + unsigned int perfctr_msr; /* the MSR to reset in NMI handler */ + unsigned int evntsel_msr; /* the MSR to select the events to handle */ +}; +static DEFINE_PER_CPU(struct nmi_watchdog_ctlblk, nmi_watchdog_ctlblk); -#define MSR_P4_MISC_ENABLE 0x1A0 -#define MSR_P4_MISC_ENABLE_PERF_AVAIL (1<<7) -#define MSR_P4_MISC_ENABLE_PEBS_UNAVAIL (1<<12) -#define MSR_P4_PERFCTR0 0x300 -#define MSR_P4_CCCR0 0x360 -#define P4_ESCR_EVENT_SELECT(N) ((N)<<25) -#define P4_ESCR_OS (1<<3) -#define P4_ESCR_USR (1<<2) -#define P4_CCCR_OVF_PMI0 (1<<26) -#define P4_CCCR_OVF_PMI1 (1<<27) -#define P4_CCCR_THRESHOLD(N) ((N)<<20) -#define P4_CCCR_COMPLEMENT (1<<19) -#define P4_CCCR_COMPARE (1<<18) -#define P4_CCCR_REQUIRED (3<<16) -#define P4_CCCR_ESCR_SELECT(N) ((N)<<13) -#define P4_CCCR_ENABLE (1<<12) -/* Set up IQ_COUNTER0 to behave like a clock, by having IQ_CCCR0 filter - CRU_ESCR0 (with any non-null event selector) through a complemented - max threshold. [IA32-Vol3, Section 14.9.9] */ -#define MSR_P4_IQ_COUNTER0 0x30C -#define P4_NMI_CRU_ESCR0 (P4_ESCR_EVENT_SELECT(0x3F)|P4_ESCR_OS|P4_ESCR_USR) -#define P4_NMI_IQ_CCCR0 \ - (P4_CCCR_OVF_PMI0|P4_CCCR_THRESHOLD(15)|P4_CCCR_COMPLEMENT| \ - P4_CCCR_COMPARE|P4_CCCR_REQUIRED|P4_CCCR_ESCR_SELECT(4)|P4_CCCR_ENABLE) +/* local prototypes */ +static void stop_apic_nmi_watchdog(void *unused); +static int unknown_nmi_panic_callback(struct pt_regs *regs, int cpu); + +extern void show_registers(struct pt_regs *regs); +extern int unknown_nmi_panic; /* converts an msr to an appropriate reservation bit */ static inline unsigned int nmi_perfctr_msr_to_bit(unsigned int msr) @@ -208,6 +182,17 @@ void release_evntsel_nmi(unsigned int ms clear_bit(counter, &__get_cpu_var(evntsel_nmi_owner[0])); } +static __cpuinit inline int nmi_known_cpu(void) +{ + switch (boot_cpu_data.x86_vendor) { + case X86_VENDOR_AMD: + return ((boot_cpu_data.x86 == 15) || (boot_cpu_data.x86 == 6)); + case X86_VENDOR_INTEL: + return ((boot_cpu_data.x86 == 15) || (boot_cpu_data.x86 == 6)); + } + return 0; +} + #ifdef CONFIG_SMP /* The performance counters used by NMI_LOCAL_APIC don't trigger when * the CPU is idle. To make sure the NMI watchdog really ticks on all @@ -234,7 +219,10 @@ static int __init check_nmi_watchdog(voi unsigned int *prev_nmi_count; int cpu; - if (nmi_watchdog == NMI_NONE) + if ((nmi_watchdog == NMI_NONE) || (nmi_watchdog == NMI_DEFAULT)) + return 0; + + if (!atomic_read(&nmi_active)) return 0; prev_nmi_count = kmalloc(NR_CPUS * sizeof(int), GFP_KERNEL); @@ -258,18 +246,22 @@ static int __init check_nmi_watchdog(voi if (!cpu_isset(cpu, cpu_callin_map)) continue; #endif + if (!per_cpu(nmi_watchdog_ctlblk.enabled, cpu)) + continue; if (nmi_count(cpu) - prev_nmi_count[cpu] <= 5) { - endflag = 1; printk("CPU#%d: NMI appears to be stuck (%d->%d)!\n", cpu, prev_nmi_count[cpu], nmi_count(cpu)); - nmi_active = 0; - lapic_nmi_owner &= ~LAPIC_NMI_WATCHDOG; - kfree(prev_nmi_count); - return -1; + per_cpu(nmi_watchdog_ctlblk.enabled, cpu) = 0; + atomic_dec(&nmi_active); } } + if (!atomic_read(&nmi_active)) { + kfree(prev_nmi_count); + atomic_set(&nmi_active, -1); + return -1; + } endflag = 1; printk("OK.\n"); @@ -290,31 +282,16 @@ static int __init setup_nmi_watchdog(cha get_option(&str, &nmi); - if (nmi >= NMI_INVALID) + if ((nmi >= NMI_INVALID) || (nmi < NMI_NONE)) return 0; - if (nmi == NMI_NONE) - nmi_watchdog = nmi; /* * If any other x86 CPU has a local APIC, then * please test the NMI stuff there and send me the * missing bits. Right now Intel P6/P4 and AMD K7 only. */ - if ((nmi == NMI_LOCAL_APIC) && - (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) && - (boot_cpu_data.x86 == 6 || boot_cpu_data.x86 == 15)) - nmi_watchdog = nmi; - if ((nmi == NMI_LOCAL_APIC) && - (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) && - (boot_cpu_data.x86 == 6 || boot_cpu_data.x86 == 15)) - nmi_watchdog = nmi; - /* - * We can enable the IO-APIC watchdog - * unconditionally. - */ - if (nmi == NMI_IO_APIC) { - nmi_active = 1; - nmi_watchdog = nmi; - } + if ((nmi == NMI_LOCAL_APIC) && (nmi_known_cpu() == 0)) + return 0; /* no lapic support */ + nmi_watchdog = nmi; return 1; } @@ -322,41 +299,30 @@ __setup("nmi_watchdog=", setup_nmi_watch static void disable_lapic_nmi_watchdog(void) { - if (nmi_active <= 0) + BUG_ON(nmi_watchdog != NMI_LOCAL_APIC); + + if (atomic_read(&nmi_active) <= 0) return; - switch (boot_cpu_data.x86_vendor) { - case X86_VENDOR_AMD: - wrmsr(MSR_K7_EVNTSEL0, 0, 0); - break; - case X86_VENDOR_INTEL: - switch (boot_cpu_data.x86) { - case 6: - if (boot_cpu_data.x86_model > 0xd) - break; - wrmsr(MSR_P6_EVNTSEL0, 0, 0); - break; - case 15: - if (boot_cpu_data.x86_model > 0x4) - break; + on_each_cpu(stop_apic_nmi_watchdog, NULL, 0, 1); - wrmsr(MSR_P4_IQ_CCCR0, 0, 0); - wrmsr(MSR_P4_CRU_ESCR0, 0, 0); - break; - } - break; - } - nmi_active = -1; - /* tell do_nmi() and others that we're not active any more */ - nmi_watchdog = 0; + BUG_ON(atomic_read(&nmi_active) != 0); } static void enable_lapic_nmi_watchdog(void) { - if (nmi_active < 0) { - nmi_watchdog = NMI_LOCAL_APIC; - setup_apic_nmi_watchdog(); - } + BUG_ON(nmi_watchdog != NMI_LOCAL_APIC); + + /* are we already enabled */ + if (atomic_read(&nmi_active) != 0) + return; + + /* are we lapic aware */ + if (nmi_known_cpu() <= 0) + return; + + on_each_cpu(setup_apic_nmi_watchdog, NULL, 0, 1); + touch_nmi_watchdog(); } int reserve_lapic_nmi(void) @@ -388,20 +354,25 @@ void release_lapic_nmi(void) void disable_timer_nmi_watchdog(void) { - if ((nmi_watchdog != NMI_IO_APIC) || (nmi_active <= 0)) + BUG_ON(nmi_watchdog != NMI_IO_APIC); + + if (atomic_read(&nmi_active) <= 0) return; - unset_nmi_callback(); - nmi_active = -1; - nmi_watchdog = NMI_NONE; + disable_irq(0); + on_each_cpu(stop_apic_nmi_watchdog, NULL, 0, 1); + + BUG_ON(atomic_read(&nmi_active) != 0); } void enable_timer_nmi_watchdog(void) { - if (nmi_active < 0) { - nmi_watchdog = NMI_IO_APIC; + BUG_ON(nmi_watchdog != NMI_IO_APIC); + + if (atomic_read(&nmi_active) == 0) { touch_nmi_watchdog(); - nmi_active = 1; + on_each_cpu(setup_apic_nmi_watchdog, NULL, 0, 1); + enable_irq(0); } } @@ -411,7 +382,7 @@ static int nmi_pm_active; /* nmi_active static int lapic_nmi_suspend(struct sys_device *dev, pm_message_t state) { - nmi_pm_active = nmi_active; + nmi_pm_active = atomic_read(&nmi_active); disable_lapic_nmi_watchdog(); return 0; } @@ -439,7 +410,13 @@ static int __init init_lapic_nmi_sysfs(v { int error; - if (nmi_active == 0 || nmi_watchdog != NMI_LOCAL_APIC) + /* should really be a BUG_ON but b/c this is an + * init call, it just doesn't work. -dcz + */ + if (nmi_watchdog != NMI_LOCAL_APIC) + return 0; + + if ( atomic_read(&nmi_active) < 0 ) return 0; error = sysdev_class_register(&nmi_sysclass); @@ -457,143 +434,312 @@ late_initcall(init_lapic_nmi_sysfs); * Original code written by Keith Owens. */ -static void write_watchdog_counter(const char *descr) +static void write_watchdog_counter(unsigned int perfctr_msr, const char *descr) { u64 count = (u64)cpu_khz * 1000; do_div(count, nmi_hz); if(descr) Dprintk("setting %s to -0x%08Lx\n", descr, count); - wrmsrl(nmi_perfctr_msr, 0 - count); + wrmsrl(perfctr_msr, 0 - count); } +/* Note that these events don't tick when the CPU idles. This means + the frequency varies with CPU load. */ + +#define K7_EVNTSEL_ENABLE (1 << 22) +#define K7_EVNTSEL_INT (1 << 20) +#define K7_EVNTSEL_OS (1 << 17) +#define K7_EVNTSEL_USR (1 << 16) +#define K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING 0x76 +#define K7_NMI_EVENT K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING + static int setup_k7_watchdog(void) { + unsigned int perfctr_msr, evntsel_msr; unsigned int evntsel; + struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk); - nmi_perfctr_msr = MSR_K7_PERFCTR0; - - if (!reserve_perfctr_nmi(nmi_perfctr_msr)) + perfctr_msr = MSR_K7_PERFCTR0; + evntsel_msr = MSR_K7_EVNTSEL0; + if (!reserve_perfctr_nmi(perfctr_msr)) goto fail; - if (!reserve_evntsel_nmi(MSR_K7_EVNTSEL0)) + if (!reserve_evntsel_nmi(evntsel_msr)) goto fail1; - wrmsrl(MSR_K7_PERFCTR0, 0UL); + wrmsrl(perfctr_msr, 0UL); evntsel = K7_EVNTSEL_INT | K7_EVNTSEL_OS | K7_EVNTSEL_USR | K7_NMI_EVENT; - wrmsr(MSR_K7_EVNTSEL0, evntsel, 0); - write_watchdog_counter("K7_PERFCTR0"); + /* setup the timer */ + wrmsr(evntsel_msr, evntsel, 0); + write_watchdog_counter(perfctr_msr, "K7_PERFCTR0"); apic_write(APIC_LVTPC, APIC_DM_NMI); evntsel |= K7_EVNTSEL_ENABLE; - wrmsr(MSR_K7_EVNTSEL0, evntsel, 0); + wrmsr(evntsel_msr, evntsel, 0); + + wd->perfctr_msr = perfctr_msr; + wd->evntsel_msr = evntsel_msr; + wd->cccr_msr = 0; //unused + wd->check_bit = 1ULL<<63; return 1; fail1: - release_perfctr_nmi(nmi_perfctr_msr); + release_perfctr_nmi(perfctr_msr); fail: return 0; } +static void stop_k7_watchdog(void) +{ + struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk); + + wrmsr(wd->evntsel_msr, 0, 0); + + release_evntsel_nmi(wd->evntsel_msr); + release_perfctr_nmi(wd->perfctr_msr); +} + +#define P6_EVNTSEL0_ENABLE (1 << 22) +#define P6_EVNTSEL_INT (1 << 20) +#define P6_EVNTSEL_OS (1 << 17) +#define P6_EVNTSEL_USR (1 << 16) +#define P6_EVENT_CPU_CLOCKS_NOT_HALTED 0x79 +#define P6_NMI_EVENT P6_EVENT_CPU_CLOCKS_NOT_HALTED + static int setup_p6_watchdog(void) { + unsigned int perfctr_msr, evntsel_msr; unsigned int evntsel; + struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk); - nmi_perfctr_msr = MSR_P6_PERFCTR0; - - if (!reserve_perfctr_nmi(nmi_perfctr_msr)) + perfctr_msr = MSR_P6_PERFCTR0; + evntsel_msr = MSR_P6_EVNTSEL0; + if (!reserve_perfctr_nmi(perfctr_msr)) goto fail; - if (!reserve_evntsel_nmi(MSR_P6_EVNTSEL0)) + if (!reserve_evntsel_nmi(evntsel_msr)) goto fail1; + wrmsrl(perfctr_msr, 0UL); + evntsel = P6_EVNTSEL_INT | P6_EVNTSEL_OS | P6_EVNTSEL_USR | P6_NMI_EVENT; - wrmsr(MSR_P6_EVNTSEL0, evntsel, 0); - write_watchdog_counter("P6_PERFCTR0"); + /* setup the timer */ + wrmsr(evntsel_msr, evntsel, 0); + write_watchdog_counter(perfctr_msr, "P6_PERFCTR0"); apic_write(APIC_LVTPC, APIC_DM_NMI); - evntsel |= P6_EVNTSEL0_ENABLE; - wrmsr(MSR_P6_EVNTSEL0, evntsel, 0); + evntsel |= K7_EVNTSEL_ENABLE; + wrmsr(evntsel_msr, evntsel, 0); + + wd->perfctr_msr = perfctr_msr; + wd->evntsel_msr = evntsel_msr; + wd->cccr_msr = 0; //unused + wd->check_bit = 1ULL<<39; return 1; fail1: - release_perfctr_nmi(nmi_perfctr_msr); + release_perfctr_nmi(perfctr_msr); fail: return 0; } +static void stop_p6_watchdog(void) +{ + struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk); + + wrmsr(wd->evntsel_msr, 0, 0); + + release_evntsel_nmi(wd->evntsel_msr); + release_perfctr_nmi(wd->perfctr_msr); +} + +/* Note that these events don't tick when the CPU idles. This means + the frequency varies with CPU load. */ + +#define MSR_P4_MISC_ENABLE_PERF_AVAIL (1<<7) +#define P4_ESCR_EVENT_SELECT(N) ((N)<<25) +#define P4_ESCR_OS (1<<3) +#define P4_ESCR_USR (1<<2) +#define P4_CCCR_OVF_PMI0 (1<<26) +#define P4_CCCR_OVF_PMI1 (1<<27) +#define P4_CCCR_THRESHOLD(N) ((N)<<20) +#define P4_CCCR_COMPLEMENT (1<<19) +#define P4_CCCR_COMPARE (1<<18) +#define P4_CCCR_REQUIRED (3<<16) +#define P4_CCCR_ESCR_SELECT(N) ((N)<<13) +#define P4_CCCR_ENABLE (1<<12) +#define P4_CCCR_OVF (1<<31) +/* Set up IQ_COUNTER0 to behave like a clock, by having IQ_CCCR0 filter + CRU_ESCR0 (with any non-null event selector) through a complemented + max threshold. [IA32-Vol3, Section 14.9.9] */ + static int setup_p4_watchdog(void) { + unsigned int perfctr_msr, evntsel_msr, cccr_msr; + unsigned int evntsel, cccr_val; unsigned int misc_enable, dummy; + unsigned int ht_num; + struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk); - rdmsr(MSR_P4_MISC_ENABLE, misc_enable, dummy); + rdmsr(MSR_IA32_MISC_ENABLE, misc_enable, dummy); if (!(misc_enable & MSR_P4_MISC_ENABLE_PERF_AVAIL)) return 0; - nmi_perfctr_msr = MSR_P4_IQ_COUNTER0; - nmi_p4_cccr_val = P4_NMI_IQ_CCCR0; #ifdef CONFIG_SMP - if (smp_num_siblings == 2) - nmi_p4_cccr_val |= P4_CCCR_OVF_PMI1; + /* detect which hyperthread we are on */ + if (smp_num_siblings == 2) { + unsigned int ebx, apicid; + + ebx = cpuid_ebx(1); + apicid = (ebx >> 24) & 0xff; + ht_num = apicid & 1; + } else #endif + ht_num = 0; - if (!reserve_perfctr_nmi(nmi_perfctr_msr)) + /* performance counters are shared resources + * assign each hyperthread its own set + * (re-use the ESCR0 register, seems safe + * and keeps the cccr_val the same) + */ + if (!ht_num) { + /* logical cpu 0 */ + perfctr_msr = MSR_P4_IQ_PERFCTR0; + evntsel_msr = MSR_P4_CRU_ESCR0; + cccr_msr = MSR_P4_IQ_CCCR0; + cccr_val = P4_CCCR_OVF_PMI0 | P4_CCCR_ESCR_SELECT(4); + } else { + /* logical cpu 1 */ + perfctr_msr = MSR_P4_IQ_PERFCTR1; + evntsel_msr = MSR_P4_CRU_ESCR0; + cccr_msr = MSR_P4_IQ_CCCR1; + cccr_val = P4_CCCR_OVF_PMI1 | P4_CCCR_ESCR_SELECT(4); + } + + if (!reserve_perfctr_nmi(perfctr_msr)) goto fail; - if (!reserve_evntsel_nmi(MSR_P4_CRU_ESCR0)) + if (!reserve_evntsel_nmi(evntsel_msr)) goto fail1; - wrmsr(MSR_P4_CRU_ESCR0, P4_NMI_CRU_ESCR0, 0); - wrmsr(MSR_P4_IQ_CCCR0, P4_NMI_IQ_CCCR0 & ~P4_CCCR_ENABLE, 0); - write_watchdog_counter("P4_IQ_COUNTER0"); + evntsel = P4_ESCR_EVENT_SELECT(0x3F) + | P4_ESCR_OS + | P4_ESCR_USR; + + cccr_val |= P4_CCCR_THRESHOLD(15) + | P4_CCCR_COMPLEMENT + | P4_CCCR_COMPARE + | P4_CCCR_REQUIRED; + + wrmsr(evntsel_msr, evntsel, 0); + wrmsr(cccr_msr, cccr_val, 0); + write_watchdog_counter(perfctr_msr, "P4_IQ_COUNTER0"); apic_write(APIC_LVTPC, APIC_DM_NMI); - wrmsr(MSR_P4_IQ_CCCR0, nmi_p4_cccr_val, 0); + cccr_val |= P4_CCCR_ENABLE; + wrmsr(cccr_msr, cccr_val, 0); + wd->perfctr_msr = perfctr_msr; + wd->evntsel_msr = evntsel_msr; + wd->cccr_msr = cccr_msr; + wd->check_bit = 1ULL<<39; return 1; fail1: - release_perfctr_nmi(nmi_perfctr_msr); + release_perfctr_nmi(perfctr_msr); fail: return 0; } -void setup_apic_nmi_watchdog (void) +static void stop_p4_watchdog(void) { - switch (boot_cpu_data.x86_vendor) { - case X86_VENDOR_AMD: - if (boot_cpu_data.x86 != 6 && boot_cpu_data.x86 != 15) - return; - if (!setup_k7_watchdog()) - return; - break; - case X86_VENDOR_INTEL: - switch (boot_cpu_data.x86) { - case 6: - if (boot_cpu_data.x86_model > 0xd) - return; + struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk); + + wrmsr(wd->cccr_msr, 0, 0); + wrmsr(wd->evntsel_msr, 0, 0); + + release_evntsel_nmi(wd->evntsel_msr); + release_perfctr_nmi(wd->perfctr_msr); +} + +void setup_apic_nmi_watchdog (void *unused) +{ + /* only support LOCAL and IO APICs for now */ + if ((nmi_watchdog != NMI_LOCAL_APIC) && + (nmi_watchdog != NMI_IO_APIC)) + return; - if(!setup_p6_watchdog()) + if (nmi_watchdog == NMI_LOCAL_APIC) { + switch (boot_cpu_data.x86_vendor) { + case X86_VENDOR_AMD: + if (boot_cpu_data.x86 != 6 && boot_cpu_data.x86 != 15) return; - break; - case 15: - if (boot_cpu_data.x86_model > 0x4) + if (!setup_k7_watchdog()) return; + break; + case X86_VENDOR_INTEL: + switch (boot_cpu_data.x86) { + case 6: + if (boot_cpu_data.x86_model > 0xd) + return; - if (!setup_p4_watchdog()) + if (!setup_p6_watchdog()) + return; + break; + case 15: + if (boot_cpu_data.x86_model > 0x4) + return; + + if (!setup_p4_watchdog()) + return; + break; + default: return; + } + break; + default: + return; + } + } + __get_cpu_var(nmi_watchdog_ctlblk.enabled) = 1; + atomic_inc(&nmi_active); +} + +static void stop_apic_nmi_watchdog(void *unused) +{ + /* only support LOCAL and IO APICs for now */ + if ((nmi_watchdog != NMI_LOCAL_APIC) && + (nmi_watchdog != NMI_IO_APIC)) + return; + + if (nmi_watchdog == NMI_LOCAL_APIC) { + switch (boot_cpu_data.x86_vendor) { + case X86_VENDOR_AMD: + stop_k7_watchdog(); + break; + case X86_VENDOR_INTEL: + switch (boot_cpu_data.x86) { + case 6: + if (boot_cpu_data.x86_model > 0xd) + break; + stop_p6_watchdog(); + break; + case 15: + if (boot_cpu_data.x86_model > 0x4) + break; + stop_p4_watchdog(); + break; + } break; default: return; } - break; - default: - return; } - lapic_nmi_owner = LAPIC_NMI_WATCHDOG; - nmi_active = 1; + __get_cpu_var(nmi_watchdog_ctlblk.enabled) = 0; + atomic_dec(&nmi_active); } /* @@ -634,7 +780,7 @@ void touch_nmi_watchdog (void) extern void die_nmi(struct pt_regs *, const char *msg); -void nmi_watchdog_tick (struct pt_regs * regs) +void nmi_watchdog_tick (struct pt_regs * regs, unsigned reason) { /* @@ -643,11 +789,21 @@ void nmi_watchdog_tick (struct pt_regs * * smp_processor_id(). */ unsigned int sum; + int touched = 0; int cpu = smp_processor_id(); + struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk); + u64 dummy; + + /* check for other users first */ + if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT) + == NOTIFY_STOP) { + touched = 1; + } sum = per_cpu(irq_stat, cpu).apic_timer_irqs; - if (last_irq_sums[cpu] == sum) { + /* if the apic timer isn't firing, this cpu isn't doing much */ + if (!touched && last_irq_sums[cpu] == sum) { /* * Ayiee, looks like this CPU is stuck ... * wait a few IRQs (5 seconds) before doing the oops ... @@ -662,26 +818,41 @@ void nmi_watchdog_tick (struct pt_regs * last_irq_sums[cpu] = sum; alert_counter[cpu] = 0; } - if (nmi_perfctr_msr) { - if (nmi_perfctr_msr == MSR_P4_IQ_COUNTER0) { - /* - * P4 quirks: - * - An overflown perfctr will assert its interrupt - * until the OVF flag in its CCCR is cleared. - * - LVTPC is masked on interrupt and must be - * unmasked by the LVTPC handler. - */ - wrmsr(MSR_P4_IQ_CCCR0, nmi_p4_cccr_val, 0); - apic_write(APIC_LVTPC, APIC_DM_NMI); - } - else if (nmi_perfctr_msr == MSR_P6_PERFCTR0) { - /* Only P6 based Pentium M need to re-unmask - * the apic vector but it doesn't hurt - * other P6 variant */ - apic_write(APIC_LVTPC, APIC_DM_NMI); + /* see if the nmi watchdog went off */ + if (wd->enabled) { + if (nmi_watchdog == NMI_LOCAL_APIC) { + rdmsrl(wd->perfctr_msr, dummy); + if (dummy & wd->check_bit){ + /* this wasn't a watchdog timer interrupt */ + goto done; + } + + /* only Intel P4 uses the cccr msr */ + if (wd->cccr_msr != 0) { + /* + * P4 quirks: + * - An overflown perfctr will assert its interrupt + * until the OVF flag in its CCCR is cleared. + * - LVTPC is masked on interrupt and must be + * unmasked by the LVTPC handler. + */ + rdmsrl(wd->cccr_msr, dummy); + dummy &= ~P4_CCCR_OVF; + wrmsrl(wd->cccr_msr, dummy); + apic_write(APIC_LVTPC, APIC_DM_NMI); + } + else if (wd->perfctr_msr == MSR_P6_PERFCTR0) { + /* Only P6 based Pentium M need to re-unmask + * the apic vector but it doesn't hurt + * other P6 variant */ + apic_write(APIC_LVTPC, APIC_DM_NMI); + } + /* start the cycle over again */ + write_watchdog_counter(wd->perfctr_msr, NULL); } - write_watchdog_counter(NULL); } +done: + return; } #ifdef CONFIG_SYSCTL Index: linux-don/arch/i386/kernel/traps.c =================================================================== --- linux-don.orig/arch/i386/kernel/traps.c +++ linux-don/arch/i386/kernel/traps.c @@ -691,7 +691,7 @@ static void default_do_nmi(struct pt_reg * so it must be the NMI watchdog. */ if (nmi_watchdog) { - nmi_watchdog_tick(regs); + nmi_watchdog_tick(regs, reason); return; } #endif Index: linux-don/include/asm-i386/nmi.h =================================================================== --- linux-don.orig/include/asm-i386/nmi.h +++ linux-don/include/asm-i386/nmi.h @@ -32,13 +32,14 @@ extern void release_perfctr_nmi(unsigned extern int reserve_evntsel_nmi(unsigned int); extern void release_evntsel_nmi(unsigned int); -extern void setup_apic_nmi_watchdog (void); +extern void setup_apic_nmi_watchdog (void *); extern int reserve_lapic_nmi(void); extern void release_lapic_nmi(void); extern void disable_timer_nmi_watchdog(void); extern void enable_timer_nmi_watchdog(void); -extern void nmi_watchdog_tick (struct pt_regs * regs); +extern void nmi_watchdog_tick (struct pt_regs * regs, unsigned reason); +extern atomic_t nmi_active; extern unsigned int nmi_watchdog; #define NMI_DEFAULT -1 #define NMI_NONE 0 Index: linux-don/arch/i386/oprofile/nmi_timer_int.c =================================================================== --- linux-don.orig/arch/i386/oprofile/nmi_timer_int.c +++ linux-don/arch/i386/oprofile/nmi_timer_int.c @@ -42,9 +42,7 @@ static void timer_stop(void) int __init op_nmi_timer_init(struct oprofile_operations * ops) { - extern int nmi_active; - - if (nmi_active <= 0) + if (atomic_read(&nmi_active) <= 0) return -ENODEV; ops->start = timer_start; -- |
From: Markus A. <ar...@re...> - 2006-05-22 17:32:00
|
@@ -457,143 +434,312 @@ late_initcall(init_lapic_nmi_sysfs); [...] static int setup_p6_watchdog(void) [...] apic_write(APIC_LVTPC, APIC_DM_NMI); - evntsel |= P6_EVNTSEL0_ENABLE; - wrmsr(MSR_P6_EVNTSEL0, evntsel, 0); + evntsel |= K7_EVNTSEL_ENABLE; Me thinks you want P6_EVNTSEL0_ENABLE here, although the value is the same. |
From: Don Z. <dz...@re...> - 2006-05-22 17:51:52
|
On Mon, May 22, 2006 at 07:31:41PM +0200, Markus Armbruster wrote: > @@ -457,143 +434,312 @@ late_initcall(init_lapic_nmi_sysfs); > [...] > static int setup_p6_watchdog(void) > [...] > apic_write(APIC_LVTPC, APIC_DM_NMI); > - evntsel |= P6_EVNTSEL0_ENABLE; > - wrmsr(MSR_P6_EVNTSEL0, evntsel, 0); > + evntsel |= K7_EVNTSEL_ENABLE; > > Me thinks you want P6_EVNTSEL0_ENABLE here, although the value is the > same. Yup, the downside of copying/pasting... Thanks. |
From: dzickus <dz...@re...> - 2006-05-09 20:57:12
|
Adds basic infrastructure to allow subsystems to reserve performance counters on the x86 chips. Only UP kernels are supported in this patch to make reviewing easier. The SMP portion makes a lot more changes. Think of this as a locking mechanism where each bit represents a different counter. In addition, each subsystem should also reserve an appropriate event selection register that will correspond to the performance counter it will be using (this is mainly neccessary for the Pentium 4 chips as they break the 1:1 relationship to performance counters). This will help prevent subsystems like oprofile from interfering with the nmi watchdog. Signed-off-by: Don Zickus <dz...@re...> Index: linux-don/arch/x86_64/kernel/nmi.c =================================================================== --- linux-don.orig/arch/x86_64/kernel/nmi.c +++ linux-don/arch/x86_64/kernel/nmi.c @@ -28,6 +28,20 @@ #include <asm/kdebug.h> #include <asm/mce.h> +/* perfctr_nmi_owner tracks the ownership of the perfctr registers: + * evtsel_nmi_owner tracks the ownership of the event selection + * - different performance counters/ event selection may be reserved for + * different subsystems this reservation system just tries to coordinate + * things a little + */ +static DEFINE_PER_CPU(unsigned, perfctr_nmi_owner); +static DEFINE_PER_CPU(unsigned, evntsel_nmi_owner[2]); + +/* this number is calculated from Intel's MSR_P4_CRU_ESCR5 register and it's + * offset from MSR_P4_BSU_ESCR0. It will be the max for all platforms (for now) + */ +#define NMI_MAX_COUNTER_BITS 66 + /* * lapic_nmi_owner tracks the ownership of the lapic NMI hardware: * - it may be reserved by some other driver, or not @@ -91,6 +105,95 @@ static unsigned int nmi_p4_cccr_val; (P4_CCCR_OVF_PMI0|P4_CCCR_THRESHOLD(15)|P4_CCCR_COMPLEMENT| \ P4_CCCR_COMPARE|P4_CCCR_REQUIRED|P4_CCCR_ESCR_SELECT(4)|P4_CCCR_ENABLE) +/* converts an msr to an appropriate reservation bit */ +static inline unsigned int nmi_perfctr_msr_to_bit(unsigned int msr) +{ + /* returns the bit offset of the performance counter register */ + switch (boot_cpu_data.x86_vendor) { + case X86_VENDOR_AMD: + return (msr - MSR_K7_PERFCTR0); + case X86_VENDOR_INTEL: + return (msr - MSR_P4_BPU_PERFCTR0); + } + return 0; +} + +/* converts an msr to an appropriate reservation bit */ +static inline unsigned int nmi_evntsel_msr_to_bit(unsigned int msr) +{ + /* returns the bit offset of the event selection register */ + switch (boot_cpu_data.x86_vendor) { + case X86_VENDOR_AMD: + return (msr - MSR_K7_EVNTSEL0); + case X86_VENDOR_INTEL: + return (msr - MSR_P4_BSU_ESCR0); + } + return 0; +} + +/* checks for a bit availability (hack for oprofile) */ +int avail_to_resrv_perfctr_nmi_bit(unsigned int counter) +{ + BUG_ON(counter > NMI_MAX_COUNTER_BITS); + + return (!test_bit(counter, &__get_cpu_var(perfctr_nmi_owner))); +} + +/* checks the an msr for availability */ +int avail_to_resrv_perfctr_nmi(unsigned int msr) +{ + unsigned int counter; + + counter = nmi_perfctr_msr_to_bit(msr); + BUG_ON(counter > NMI_MAX_COUNTER_BITS); + + return (!test_bit(counter, &__get_cpu_var(perfctr_nmi_owner))); +} + +int reserve_perfctr_nmi(unsigned int msr) +{ + unsigned int counter; + + counter = nmi_perfctr_msr_to_bit(msr); + BUG_ON(counter > NMI_MAX_COUNTER_BITS); + + if (!test_and_set_bit(counter, &__get_cpu_var(perfctr_nmi_owner))) + return 1; + return 0; +} + +void release_perfctr_nmi(unsigned int msr) +{ + unsigned int counter; + + counter = nmi_perfctr_msr_to_bit(msr); + BUG_ON(counter > NMI_MAX_COUNTER_BITS); + + clear_bit(counter, &__get_cpu_var(perfctr_nmi_owner)); +} + +int reserve_evntsel_nmi(unsigned int msr) +{ + unsigned int counter; + + counter = nmi_evntsel_msr_to_bit(msr); + BUG_ON(counter > NMI_MAX_COUNTER_BITS); + + if (!test_and_set_bit(counter, &__get_cpu_var(evntsel_nmi_owner))) + return 1; + return 0; +} + +void release_evntsel_nmi(unsigned int msr) +{ + unsigned int counter; + + counter = nmi_evntsel_msr_to_bit(msr); + BUG_ON(counter > NMI_MAX_COUNTER_BITS); + + clear_bit(counter, &__get_cpu_var(evntsel_nmi_owner)); +} + static __cpuinit inline int nmi_known_cpu(void) { switch (boot_cpu_data.x86_vendor) { @@ -326,34 +429,22 @@ late_initcall(init_lapic_nmi_sysfs); #endif /* CONFIG_PM */ -/* - * Activate the NMI watchdog via the local APIC. - * Original code written by Keith Owens. - */ - -static void clear_msr_range(unsigned int base, unsigned int n) +static int setup_k7_watchdog(void) { - unsigned int i; - - for(i = 0; i < n; ++i) - wrmsr(base+i, 0, 0); -} - -static void setup_k7_watchdog(void) -{ - int i; unsigned int evntsel; nmi_perfctr_msr = MSR_K7_PERFCTR0; - for(i = 0; i < 4; ++i) { - /* Simulator may not support it */ - if (checking_wrmsrl(MSR_K7_EVNTSEL0+i, 0UL)) { - nmi_perfctr_msr = 0; - return; - } - wrmsrl(MSR_K7_PERFCTR0+i, 0UL); - } + if (!reserve_perfctr_nmi(nmi_perfctr_msr)) + goto fail; + + if (!reserve_evntsel_nmi(MSR_K7_EVNTSEL0)) + goto fail1; + + /* Simulator may not support it */ + if (checking_wrmsrl(MSR_K7_EVNTSEL0, 0UL)) + goto fail2; + wrmsrl(MSR_K7_PERFCTR0, 0UL); evntsel = K7_EVNTSEL_INT | K7_EVNTSEL_OS @@ -365,6 +456,13 @@ static void setup_k7_watchdog(void) apic_write(APIC_LVTPC, APIC_DM_NMI); evntsel |= K7_EVNTSEL_ENABLE; wrmsr(MSR_K7_EVNTSEL0, evntsel, 0); + return 1; +fail2: + release_evntsel_nmi(MSR_K7_EVNTSEL0); +fail1: + release_perfctr_nmi(nmi_perfctr_msr); +fail: + return 0; } @@ -383,22 +481,11 @@ static int setup_p4_watchdog(void) nmi_p4_cccr_val |= P4_CCCR_OVF_PMI1; #endif - if (!(misc_enable & MSR_P4_MISC_ENABLE_PEBS_UNAVAIL)) - clear_msr_range(0x3F1, 2); - /* MSR 0x3F0 seems to have a default value of 0xFC00, but current - docs doesn't fully define it, so leave it alone for now. */ - if (boot_cpu_data.x86_model >= 0x3) { - /* MSR_P4_IQ_ESCR0/1 (0x3ba/0x3bb) removed */ - clear_msr_range(0x3A0, 26); - clear_msr_range(0x3BC, 3); - } else { - clear_msr_range(0x3A0, 31); - } - clear_msr_range(0x3C0, 6); - clear_msr_range(0x3C8, 6); - clear_msr_range(0x3E0, 2); - clear_msr_range(MSR_P4_CCCR0, 18); - clear_msr_range(MSR_P4_PERFCTR0, 18); + if (!reserve_perfctr_nmi(nmi_perfctr_msr)) + goto fail; + + if (!reserve_evntsel_nmi(MSR_P4_CRU_ESCR0)) + goto fail1; wrmsr(MSR_P4_CRU_ESCR0, P4_NMI_CRU_ESCR0, 0); wrmsr(MSR_P4_IQ_CCCR0, P4_NMI_IQ_CCCR0 & ~P4_CCCR_ENABLE, 0); @@ -407,6 +494,10 @@ static int setup_p4_watchdog(void) apic_write(APIC_LVTPC, APIC_DM_NMI); wrmsr(MSR_P4_IQ_CCCR0, nmi_p4_cccr_val, 0); return 1; +fail1: + release_perfctr_nmi(nmi_perfctr_msr); +fail: + return 0; } void setup_apic_nmi_watchdog(void) @@ -417,7 +508,8 @@ void setup_apic_nmi_watchdog(void) return; if (strstr(boot_cpu_data.x86_model_id, "Screwdriver")) return; - setup_k7_watchdog(); + if (!setup_k7_watchdog()) + return; break; case X86_VENDOR_INTEL: if (boot_cpu_data.x86 != 15) @@ -587,6 +679,12 @@ int proc_unknown_nmi_panic(struct ctl_ta EXPORT_SYMBOL(nmi_active); EXPORT_SYMBOL(nmi_watchdog); +EXPORT_SYMBOL(avail_to_resrv_perfctr_nmi); +EXPORT_SYMBOL(avail_to_resrv_perfctr_nmi_bit); +EXPORT_SYMBOL(reserve_perfctr_nmi); +EXPORT_SYMBOL(release_perfctr_nmi); +EXPORT_SYMBOL(reserve_evntsel_nmi); +EXPORT_SYMBOL(release_evntsel_nmi); EXPORT_SYMBOL(reserve_lapic_nmi); EXPORT_SYMBOL(release_lapic_nmi); EXPORT_SYMBOL(disable_timer_nmi_watchdog); Index: linux-don/include/asm-x86_64/nmi.h =================================================================== --- linux-don.orig/include/asm-x86_64/nmi.h +++ linux-don/include/asm-x86_64/nmi.h @@ -56,7 +56,13 @@ extern int panic_on_timeout; extern int unknown_nmi_panic; extern int check_nmi_watchdog(void); - +extern int avail_to_resrv_perfctr_nmi_bit(unsigned int); +extern int avail_to_resrv_perfctr_nmi(unsigned int); +extern int reserve_perfctr_nmi(unsigned int); +extern void release_perfctr_nmi(unsigned int); +extern int reserve_evntsel_nmi(unsigned int); +extern void release_evntsel_nmi(unsigned int); + extern void setup_apic_nmi_watchdog (void); extern int reserve_lapic_nmi(void); extern void release_lapic_nmi(void); Index: linux-don/arch/i386/kernel/nmi.c =================================================================== --- linux-don.orig/arch/i386/kernel/nmi.c +++ linux-don/arch/i386/kernel/nmi.c @@ -34,6 +34,20 @@ static unsigned int nmi_perfctr_msr; /* static unsigned int nmi_p4_cccr_val; extern void show_registers(struct pt_regs *regs); +/* perfctr_nmi_owner tracks the ownership of the perfctr registers: + * evtsel_nmi_owner tracks the ownership of the event selection + * - different performance counters/ event selection may be reserved for + * different subsystems this reservation system just tries to coordinate + * things a little + */ +static DEFINE_PER_CPU(unsigned long, perfctr_nmi_owner); +static DEFINE_PER_CPU(unsigned long, evntsel_nmi_owner[3]); + +/* this number is calculated from Intel's MSR_P4_CRU_ESCR5 register and it's + * offset from MSR_P4_BSU_ESCR0. It will be the max for all platforms (for now) + */ +#define NMI_MAX_COUNTER_BITS 66 + /* * lapic_nmi_owner tracks the ownership of the lapic NMI hardware: * - it may be reserved by some other driver, or not @@ -95,6 +109,105 @@ int nmi_active; (P4_CCCR_OVF_PMI0|P4_CCCR_THRESHOLD(15)|P4_CCCR_COMPLEMENT| \ P4_CCCR_COMPARE|P4_CCCR_REQUIRED|P4_CCCR_ESCR_SELECT(4)|P4_CCCR_ENABLE) +/* converts an msr to an appropriate reservation bit */ +static inline unsigned int nmi_perfctr_msr_to_bit(unsigned int msr) +{ + /* returns the bit offset of the performance counter register */ + switch (boot_cpu_data.x86_vendor) { + case X86_VENDOR_AMD: + return (msr - MSR_K7_PERFCTR0); + case X86_VENDOR_INTEL: + switch (boot_cpu_data.x86) { + case 6: + return (msr - MSR_P6_PERFCTR0); + case 15: + return (msr - MSR_P4_BPU_PERFCTR0); + } + } + return 0; +} + +/* converts an msr to an appropriate reservation bit */ +static inline unsigned int nmi_evntsel_msr_to_bit(unsigned int msr) +{ + /* returns the bit offset of the event selection register */ + switch (boot_cpu_data.x86_vendor) { + case X86_VENDOR_AMD: + return (msr - MSR_K7_EVNTSEL0); + case X86_VENDOR_INTEL: + switch (boot_cpu_data.x86) { + case 6: + return (msr - MSR_P6_EVNTSEL0); + case 15: + return (msr - MSR_P4_BSU_ESCR0); + } + } + return 0; +} + +/* checks for a bit availability (hack for oprofile) */ +int avail_to_resrv_perfctr_nmi_bit(unsigned int counter) +{ + BUG_ON(counter > NMI_MAX_COUNTER_BITS); + + return (!test_bit(counter, &__get_cpu_var(perfctr_nmi_owner))); +} + +/* checks the an msr for availability */ +int avail_to_resrv_perfctr_nmi(unsigned int msr) +{ + unsigned int counter; + + counter = nmi_perfctr_msr_to_bit(msr); + BUG_ON(counter > NMI_MAX_COUNTER_BITS); + + return (!test_bit(counter, &__get_cpu_var(perfctr_nmi_owner))); +} + +int reserve_perfctr_nmi(unsigned int msr) +{ + unsigned int counter; + + counter = nmi_perfctr_msr_to_bit(msr); + BUG_ON(counter > NMI_MAX_COUNTER_BITS); + + if (!test_and_set_bit(counter, &__get_cpu_var(perfctr_nmi_owner))) + return 1; + return 0; +} + +void release_perfctr_nmi(unsigned int msr) +{ + unsigned int counter; + + counter = nmi_perfctr_msr_to_bit(msr); + BUG_ON(counter > NMI_MAX_COUNTER_BITS); + + clear_bit(counter, &__get_cpu_var(perfctr_nmi_owner)); +} + +int reserve_evntsel_nmi(unsigned int msr) +{ + unsigned int counter; + + counter = nmi_evntsel_msr_to_bit(msr); + BUG_ON(counter > NMI_MAX_COUNTER_BITS); + + if (!test_and_set_bit(counter, &__get_cpu_var(evntsel_nmi_owner[0]))) + return 1; + return 0; +} + +void release_evntsel_nmi(unsigned int msr) +{ + unsigned int counter; + + counter = nmi_evntsel_msr_to_bit(msr); + BUG_ON(counter > NMI_MAX_COUNTER_BITS); + + clear_bit(counter, &__get_cpu_var(evntsel_nmi_owner[0])); +} + #ifdef CONFIG_SMP /* The performance counters used by NMI_LOCAL_APIC don't trigger when * the CPU is idle. To make sure the NMI watchdog really ticks on all @@ -344,14 +457,6 @@ late_initcall(init_lapic_nmi_sysfs); * Original code written by Keith Owens. */ -static void clear_msr_range(unsigned int base, unsigned int n) -{ - unsigned int i; - - for(i = 0; i < n; ++i) - wrmsr(base+i, 0, 0); -} - static void write_watchdog_counter(const char *descr) { u64 count = (u64)cpu_khz * 1000; @@ -362,14 +467,19 @@ static void write_watchdog_counter(const wrmsrl(nmi_perfctr_msr, 0 - count); } -static void setup_k7_watchdog(void) +static int setup_k7_watchdog(void) { unsigned int evntsel; nmi_perfctr_msr = MSR_K7_PERFCTR0; - clear_msr_range(MSR_K7_EVNTSEL0, 4); - clear_msr_range(MSR_K7_PERFCTR0, 4); + if (!reserve_perfctr_nmi(nmi_perfctr_msr)) + goto fail; + + if (!reserve_evntsel_nmi(MSR_K7_EVNTSEL0)) + goto fail1; + + wrmsrl(MSR_K7_PERFCTR0, 0UL); evntsel = K7_EVNTSEL_INT | K7_EVNTSEL_OS @@ -381,16 +491,24 @@ static void setup_k7_watchdog(void) apic_write(APIC_LVTPC, APIC_DM_NMI); evntsel |= K7_EVNTSEL_ENABLE; wrmsr(MSR_K7_EVNTSEL0, evntsel, 0); + return 1; +fail1: + release_perfctr_nmi(nmi_perfctr_msr); +fail: + return 0; } -static void setup_p6_watchdog(void) +static int setup_p6_watchdog(void) { unsigned int evntsel; nmi_perfctr_msr = MSR_P6_PERFCTR0; - clear_msr_range(MSR_P6_EVNTSEL0, 2); - clear_msr_range(MSR_P6_PERFCTR0, 2); + if (!reserve_perfctr_nmi(nmi_perfctr_msr)) + goto fail; + + if (!reserve_evntsel_nmi(MSR_P6_EVNTSEL0)) + goto fail1; evntsel = P6_EVNTSEL_INT | P6_EVNTSEL_OS @@ -402,6 +520,11 @@ static void setup_p6_watchdog(void) apic_write(APIC_LVTPC, APIC_DM_NMI); evntsel |= P6_EVNTSEL0_ENABLE; wrmsr(MSR_P6_EVNTSEL0, evntsel, 0); + return 1; +fail1: + release_perfctr_nmi(nmi_perfctr_msr); +fail: + return 0; } static int setup_p4_watchdog(void) @@ -419,22 +542,11 @@ static int setup_p4_watchdog(void) nmi_p4_cccr_val |= P4_CCCR_OVF_PMI1; #endif - if (!(misc_enable & MSR_P4_MISC_ENABLE_PEBS_UNAVAIL)) - clear_msr_range(0x3F1, 2); - /* MSR 0x3F0 seems to have a default value of 0xFC00, but current - docs doesn't fully define it, so leave it alone for now. */ - if (boot_cpu_data.x86_model >= 0x3) { - /* MSR_P4_IQ_ESCR0/1 (0x3ba/0x3bb) removed */ - clear_msr_range(0x3A0, 26); - clear_msr_range(0x3BC, 3); - } else { - clear_msr_range(0x3A0, 31); - } - clear_msr_range(0x3C0, 6); - clear_msr_range(0x3C8, 6); - clear_msr_range(0x3E0, 2); - clear_msr_range(MSR_P4_CCCR0, 18); - clear_msr_range(MSR_P4_PERFCTR0, 18); + if (!reserve_perfctr_nmi(nmi_perfctr_msr)) + goto fail; + + if (!reserve_evntsel_nmi(MSR_P4_CRU_ESCR0)) + goto fail1; wrmsr(MSR_P4_CRU_ESCR0, P4_NMI_CRU_ESCR0, 0); wrmsr(MSR_P4_IQ_CCCR0, P4_NMI_IQ_CCCR0 & ~P4_CCCR_ENABLE, 0); @@ -442,6 +554,10 @@ static int setup_p4_watchdog(void) apic_write(APIC_LVTPC, APIC_DM_NMI); wrmsr(MSR_P4_IQ_CCCR0, nmi_p4_cccr_val, 0); return 1; +fail1: + release_perfctr_nmi(nmi_perfctr_msr); +fail: + return 0; } void setup_apic_nmi_watchdog (void) @@ -450,7 +566,8 @@ void setup_apic_nmi_watchdog (void) case X86_VENDOR_AMD: if (boot_cpu_data.x86 != 6 && boot_cpu_data.x86 != 15) return; - setup_k7_watchdog(); + if (!setup_k7_watchdog()) + return; break; case X86_VENDOR_INTEL: switch (boot_cpu_data.x86) { @@ -458,7 +575,8 @@ void setup_apic_nmi_watchdog (void) if (boot_cpu_data.x86_model > 0xd) return; - setup_p6_watchdog(); + if(!setup_p6_watchdog()) + return; break; case 15: if (boot_cpu_data.x86_model > 0x4) @@ -611,6 +729,12 @@ int proc_unknown_nmi_panic(ctl_table *ta EXPORT_SYMBOL(nmi_active); EXPORT_SYMBOL(nmi_watchdog); +EXPORT_SYMBOL(avail_to_resrv_perfctr_nmi); +EXPORT_SYMBOL(avail_to_resrv_perfctr_nmi_bit); +EXPORT_SYMBOL(reserve_perfctr_nmi); +EXPORT_SYMBOL(release_perfctr_nmi); +EXPORT_SYMBOL(reserve_evntsel_nmi); +EXPORT_SYMBOL(release_evntsel_nmi); EXPORT_SYMBOL(reserve_lapic_nmi); EXPORT_SYMBOL(release_lapic_nmi); EXPORT_SYMBOL(disable_timer_nmi_watchdog); Index: linux-don/include/asm-i386/nmi.h =================================================================== --- linux-don.orig/include/asm-i386/nmi.h +++ linux-don/include/asm-i386/nmi.h @@ -25,6 +25,13 @@ void set_nmi_callback(nmi_callback_t cal */ void unset_nmi_callback(void); +extern int avail_to_resrv_perfctr_nmi_bit(unsigned int); +extern int avail_to_resrv_perfctr_nmi(unsigned int); +extern int reserve_perfctr_nmi(unsigned int); +extern void release_perfctr_nmi(unsigned int); +extern int reserve_evntsel_nmi(unsigned int); +extern void release_evntsel_nmi(unsigned int); + extern void setup_apic_nmi_watchdog (void); extern int reserve_lapic_nmi(void); extern void release_lapic_nmi(void); -- |
From: dzickus <dz...@re...> - 2006-05-09 20:57:13
|
Adds a new /proc/sys/kernel/nmi call that will enable/disable the nmi watchdog. Signed-off-by: Don Zickus <dz...@re...> Index: linux-don/arch/i386/kernel/nmi.c =================================================================== --- linux-don.orig/arch/i386/kernel/nmi.c +++ linux-don/arch/i386/kernel/nmi.c @@ -845,6 +845,56 @@ static int unknown_nmi_panic_callback(st return 0; } +/* + * proc handler for /proc/sys/kernel/nmi + */ +int proc_nmi_enabled(struct ctl_table *table, int write, struct file *file, + void __user *buffer, size_t *length, loff_t *ppos) +{ + int old_state; + + nmi_watchdog_enabled = (atomic_read(&nmi_active) > 0) ? 1 : 0; + old_state = nmi_watchdog_enabled; + proc_dointvec(table, write, file, buffer, length, ppos); + if (!!old_state == !!nmi_watchdog_enabled) + return 0; + + if (atomic_read(&nmi_active) < 0) { + printk( KERN_WARNING "NMI watchdog is permanently disabled\n"); + return 0; + } + + if (nmi_watchdog == NMI_DEFAULT) { + if (nmi_known_cpu() > 0) + nmi_watchdog = NMI_LOCAL_APIC; + else + nmi_watchdog = NMI_IO_APIC; + } + + if (nmi_watchdog == NMI_LOCAL_APIC) + { + if (nmi_watchdog_enabled) + enable_lapic_nmi_watchdog(); + else + disable_lapic_nmi_watchdog(); + } else if (nmi_watchdog == NMI_IO_APIC) { + /* FIXME + * for some reason these functions don't work + */ + printk("Can not enable/disable NMI on IO APIC\n"); + return 0; + /*if (nmi_watchdog_enabled) + enable_timer_nmi_watchdog(); + else + disable_timer_nmi_watchdog(); + */ + } else { + printk( KERN_WARNING + "NMI watchdog doesn't know what hardware to touch\n"); + } + return 0; +} + #endif EXPORT_SYMBOL(nmi_active); Index: linux-don/arch/x86_64/kernel/nmi.c =================================================================== --- linux-don.orig/arch/x86_64/kernel/nmi.c +++ linux-don/arch/x86_64/kernel/nmi.c @@ -751,6 +751,52 @@ static int unknown_nmi_panic_callback(st return 0; } +/* + * proc handler for /proc/sys/kernel/nmi + */ +int proc_nmi_enabled(struct ctl_table *table, int write, struct file *file, + void __user *buffer, size_t *length, loff_t *ppos) +{ + int old_state; + + nmi_watchdog_enabled = (atomic_read(&nmi_active) > 0) ? 1 : 0; + old_state = nmi_watchdog_enabled; + proc_dointvec(table, write, file, buffer, length, ppos); + if (!!old_state == !!nmi_watchdog_enabled) + return 0; + + if (atomic_read(&nmi_active) < 0) { + printk( KERN_WARNING "NMI watchdog is permanently disabled\n"); + return 0; + } + + /* if nmi_watchdog is not set yet, then set it */ + nmi_watchdog_default(); + + if (nmi_watchdog == NMI_LOCAL_APIC) + { + if (nmi_watchdog_enabled) + enable_lapic_nmi_watchdog(); + else + disable_lapic_nmi_watchdog(); + } else if (nmi_watchdog == NMI_IO_APIC) { + /* FIXME + * for some reason these functions don't work + */ + printk("Can not enable/disable NMI on IO APIC\n"); + return 0; + /*if (nmi_watchdog_enabled) + enable_timer_nmi_watchdog(); + else + disable_timer_nmi_watchdog(); + */ + } else { + printk( KERN_WARNING + "NMI watchdog doesn't know what hardware to touch\n"); + } + return 0; +} + #endif EXPORT_SYMBOL(nmi_active); Index: linux-don/include/asm-i386/nmi.h =================================================================== --- linux-don.orig/include/asm-i386/nmi.h +++ linux-don/include/asm-i386/nmi.h @@ -14,6 +14,7 @@ */ int do_nmi_callback(struct pt_regs *regs, int cpu); +extern int nmi_watchdog_enabled; extern int avail_to_resrv_perfctr_nmi_bit(unsigned int); extern int avail_to_resrv_perfctr_nmi(unsigned int); extern int reserve_perfctr_nmi(unsigned int); Index: linux-don/include/asm-x86_64/nmi.h =================================================================== --- linux-don.orig/include/asm-x86_64/nmi.h +++ linux-don/include/asm-x86_64/nmi.h @@ -43,6 +43,7 @@ extern void die_nmi(char *str, struct pt extern int panic_on_timeout; extern int unknown_nmi_panic; +extern int nmi_watchdog_enabled; extern int check_nmi_watchdog(void); extern int avail_to_resrv_perfctr_nmi_bit(unsigned int); Index: linux-don/include/linux/sysctl.h =================================================================== --- linux-don.orig/include/linux/sysctl.h +++ linux-don/include/linux/sysctl.h @@ -148,6 +148,7 @@ enum KERN_SPIN_RETRY=70, /* int: number of spinlock retries */ KERN_ACPI_VIDEO_FLAGS=71, /* int: flags for setting up video after ACPI sleep */ KERN_IA64_UNALIGNED=72, /* int: ia64 unaligned userland trap enable */ + KERN_NMI_ENABLED=73, /* int: enable/disable nmi watchdog */ }; Index: linux-don/kernel/sysctl.c =================================================================== --- linux-don.orig/kernel/sysctl.c +++ linux-don/kernel/sysctl.c @@ -75,6 +75,9 @@ extern int percpu_pagelist_fraction; #if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86) int unknown_nmi_panic; +int nmi_watchdog_enabled; +extern int proc_nmi_enabled(struct ctl_table *, int , struct file *, + void __user *, size_t *, loff_t *); #endif /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */ @@ -630,6 +633,14 @@ static ctl_table kern_table[] = { .mode = 0644, .proc_handler = &proc_dointvec, }, + { + .ctl_name = KERN_NMI_ENABLED, + .procname = "nmi", + .data = &nmi_watchdog_enabled, + .maxlen = sizeof (int), + .mode = 0644, + .proc_handler = &proc_nmi_enabled, + }, #endif #if defined(CONFIG_X86) { -- |
From: Andi K. <ak...@su...> - 2006-05-09 21:41:59
|
On Tuesday 09 May 2006 22:50, dzickus wrote: > > Adds a new /proc/sys/kernel/nmi call that will enable/disable the nmi > watchdog. The subject is wrong i think - sysctl != sysfs. And shouldn't that be called nmi_watchdog? > > Signed-off-by: Don Zickus <dz...@re...> > > Index: linux-don/arch/i386/kernel/nmi.c > =================================================================== > --- linux-don.orig/arch/i386/kernel/nmi.c > +++ linux-don/arch/i386/kernel/nmi.c > @@ -845,6 +845,56 @@ static int unknown_nmi_panic_callback(st > return 0; > } > > +/* > + * proc handler for /proc/sys/kernel/nmi > + */ > +int proc_nmi_enabled(struct ctl_table *table, int write, struct file *file, > + void __user *buffer, size_t *length, loff_t *ppos) > +{ > + int old_state; > + > + nmi_watchdog_enabled = (atomic_read(&nmi_active) > 0) ? 1 : 0; > + old_state = nmi_watchdog_enabled; > + proc_dointvec(table, write, file, buffer, length, ppos); > + if (!!old_state == !!nmi_watchdog_enabled) > + return 0; > + > + if (atomic_read(&nmi_active) < 0) { > + printk( KERN_WARNING "NMI watchdog is permanently disabled\n"); > + return 0; > + } The returns should be probably -EIO or similar. Same further down. > + > + if (nmi_watchdog == NMI_DEFAULT) { > + if (nmi_known_cpu() > 0) > + nmi_watchdog = NMI_LOCAL_APIC; > + else > + nmi_watchdog = NMI_IO_APIC; > + } > + > + if (nmi_watchdog == NMI_LOCAL_APIC) > + { Move the { up Anyways I fixed these up myself, no need to resubmit. -Andi |
From: Stephane E. <er...@hp...> - 2006-05-10 09:17:50
|
Don, Congratulations on the patch. I am glad to see that some of the SMP issues I reported a long time ago are now fixed. On Tue, May 09, 2006 at 04:50:43PM -0400, dzickus wrote: > > Adds a new /proc/sys/kernel/nmi call that will enable/disable the nmi > watchdog. > This means you can at runtime enable/disbale nmi_watchdog, i.e., reserve some performance counters on the fly. This gets complicated because now the perfmon subsystem (and probably oprofile) cannot check register availability when they are first initialized. Basically each time, the /sys entry is modified, they would have to scan the list of available performance counters. I don't know exactly when Oprofile does this checking. For perfmon, this is done only once, when the PMU description table is loaded. Also something that I did not see in this code is the error detection in case enable_lapic_nmi_watchdog() fails. Oprofile runs on all CPUs or none. Perfmon lets you monitor on subsets on CPUs. In case NMI was disabled and a monitoring session was active on some CPUs. The enable_lapic_nmi_watchdog() will fail on some CPUs. How is that handled? Thanks. > Signed-off-by: Don Zickus <dz...@re...> > > Index: linux-don/arch/i386/kernel/nmi.c > =================================================================== > --- linux-don.orig/arch/i386/kernel/nmi.c > +++ linux-don/arch/i386/kernel/nmi.c > @@ -845,6 +845,56 @@ static int unknown_nmi_panic_callback(st > return 0; > } > > +/* > + * proc handler for /proc/sys/kernel/nmi > + */ > +int proc_nmi_enabled(struct ctl_table *table, int write, struct file *file, > + void __user *buffer, size_t *length, loff_t *ppos) > +{ > + int old_state; > + > + nmi_watchdog_enabled = (atomic_read(&nmi_active) > 0) ? 1 : 0; > + old_state = nmi_watchdog_enabled; > + proc_dointvec(table, write, file, buffer, length, ppos); > + if (!!old_state == !!nmi_watchdog_enabled) > + return 0; > + > + if (atomic_read(&nmi_active) < 0) { > + printk( KERN_WARNING "NMI watchdog is permanently disabled\n"); > + return 0; > + } > + > + if (nmi_watchdog == NMI_DEFAULT) { > + if (nmi_known_cpu() > 0) > + nmi_watchdog = NMI_LOCAL_APIC; > + else > + nmi_watchdog = NMI_IO_APIC; > + } > + > + if (nmi_watchdog == NMI_LOCAL_APIC) > + { > + if (nmi_watchdog_enabled) > + enable_lapic_nmi_watchdog(); > + else > + disable_lapic_nmi_watchdog(); > + } else if (nmi_watchdog == NMI_IO_APIC) { > + /* FIXME > + * for some reason these functions don't work > + */ > + printk("Can not enable/disable NMI on IO APIC\n"); > + return 0; > + /*if (nmi_watchdog_enabled) > + enable_timer_nmi_watchdog(); > + else > + disable_timer_nmi_watchdog(); > + */ > + } else { > + printk( KERN_WARNING > + "NMI watchdog doesn't know what hardware to touch\n"); > + } > + return 0; > +} > + > #endif > > EXPORT_SYMBOL(nmi_active); > Index: linux-don/arch/x86_64/kernel/nmi.c > =================================================================== > --- linux-don.orig/arch/x86_64/kernel/nmi.c > +++ linux-don/arch/x86_64/kernel/nmi.c > @@ -751,6 +751,52 @@ static int unknown_nmi_panic_callback(st > return 0; > } > > +/* > + * proc handler for /proc/sys/kernel/nmi > + */ > +int proc_nmi_enabled(struct ctl_table *table, int write, struct file *file, > + void __user *buffer, size_t *length, loff_t *ppos) > +{ > + int old_state; > + > + nmi_watchdog_enabled = (atomic_read(&nmi_active) > 0) ? 1 : 0; > + old_state = nmi_watchdog_enabled; > + proc_dointvec(table, write, file, buffer, length, ppos); > + if (!!old_state == !!nmi_watchdog_enabled) > + return 0; > + > + if (atomic_read(&nmi_active) < 0) { > + printk( KERN_WARNING "NMI watchdog is permanently disabled\n"); > + return 0; > + } > + > + /* if nmi_watchdog is not set yet, then set it */ > + nmi_watchdog_default(); > + > + if (nmi_watchdog == NMI_LOCAL_APIC) > + { > + if (nmi_watchdog_enabled) > + enable_lapic_nmi_watchdog(); > + else > + disable_lapic_nmi_watchdog(); > + } else if (nmi_watchdog == NMI_IO_APIC) { > + /* FIXME > + * for some reason these functions don't work > + */ > + printk("Can not enable/disable NMI on IO APIC\n"); > + return 0; > + /*if (nmi_watchdog_enabled) > + enable_timer_nmi_watchdog(); > + else > + disable_timer_nmi_watchdog(); > + */ > + } else { > + printk( KERN_WARNING > + "NMI watchdog doesn't know what hardware to touch\n"); > + } > + return 0; > +} > + > #endif > > EXPORT_SYMBOL(nmi_active); > Index: linux-don/include/asm-i386/nmi.h > =================================================================== > --- linux-don.orig/include/asm-i386/nmi.h > +++ linux-don/include/asm-i386/nmi.h > @@ -14,6 +14,7 @@ > */ > int do_nmi_callback(struct pt_regs *regs, int cpu); > > +extern int nmi_watchdog_enabled; > extern int avail_to_resrv_perfctr_nmi_bit(unsigned int); > extern int avail_to_resrv_perfctr_nmi(unsigned int); > extern int reserve_perfctr_nmi(unsigned int); > Index: linux-don/include/asm-x86_64/nmi.h > =================================================================== > --- linux-don.orig/include/asm-x86_64/nmi.h > +++ linux-don/include/asm-x86_64/nmi.h > @@ -43,6 +43,7 @@ extern void die_nmi(char *str, struct pt > > extern int panic_on_timeout; > extern int unknown_nmi_panic; > +extern int nmi_watchdog_enabled; > > extern int check_nmi_watchdog(void); > extern int avail_to_resrv_perfctr_nmi_bit(unsigned int); > Index: linux-don/include/linux/sysctl.h > =================================================================== > --- linux-don.orig/include/linux/sysctl.h > +++ linux-don/include/linux/sysctl.h > @@ -148,6 +148,7 @@ enum > KERN_SPIN_RETRY=70, /* int: number of spinlock retries */ > KERN_ACPI_VIDEO_FLAGS=71, /* int: flags for setting up video after ACPI sleep */ > KERN_IA64_UNALIGNED=72, /* int: ia64 unaligned userland trap enable */ > + KERN_NMI_ENABLED=73, /* int: enable/disable nmi watchdog */ > }; > > > Index: linux-don/kernel/sysctl.c > =================================================================== > --- linux-don.orig/kernel/sysctl.c > +++ linux-don/kernel/sysctl.c > @@ -75,6 +75,9 @@ extern int percpu_pagelist_fraction; > > #if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86) > int unknown_nmi_panic; > +int nmi_watchdog_enabled; > +extern int proc_nmi_enabled(struct ctl_table *, int , struct file *, > + void __user *, size_t *, loff_t *); > #endif > > /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */ > @@ -630,6 +633,14 @@ static ctl_table kern_table[] = { > .mode = 0644, > .proc_handler = &proc_dointvec, > }, > + { > + .ctl_name = KERN_NMI_ENABLED, > + .procname = "nmi", > + .data = &nmi_watchdog_enabled, > + .maxlen = sizeof (int), > + .mode = 0644, > + .proc_handler = &proc_nmi_enabled, > + }, > #endif > #if defined(CONFIG_X86) > { > > -- > > > ------------------------------------------------------- > Using Tomcat but need to do more? Need to support web services, security? > Get stuff done quickly with pre-integrated technology to make your job easier > Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 > _______________________________________________ > oprofile-list mailing list > opr...@li... > https://lists.sourceforge.net/lists/listinfo/oprofile-list -- -Stephane |
From: Andi K. <ak...@su...> - 2006-05-10 09:29:16
|
> On Tue, May 09, 2006 at 04:50:43PM -0400, dzickus wrote: > > > > Adds a new /proc/sys/kernel/nmi call that will enable/disable the nmi > > watchdog. > > > > This means you can at runtime enable/disbale nmi_watchdog, i.e., reserve > some performance counters on the fly. This gets complicated because now > the perfmon subsystem Right now we don't care about perfmon at all because it's not in (x86) mainline If you want anybody to care you have to submit and pass review > (and probably oprofile) cannot check register > availability when they are first initialized. Basically each time, > the /sys entry is modified, they would have to scan the list of available > performance counters. I don't know exactly when Oprofile does this checking. > For perfmon, this is done only once, when the PMU description table is loaded. I think the NMI watchdog will fail if the register is already allocated. oprofile should check and allocate when it fills in the register. -Andi |
From: Don Z. <dz...@re...> - 2006-05-10 14:59:08
|
On Wed, May 10, 2006 at 02:10:26AM -0700, Stephane Eranian wrote: > Don, > > Congratulations on the patch. I am glad to see that some of the SMP > issues I reported a long time ago are now fixed. Thanks and glad I could help. > > Adds a new /proc/sys/kernel/nmi call that will enable/disable the nmi > > watchdog. > > > > This means you can at runtime enable/disbale nmi_watchdog, i.e., reserve > some performance counters on the fly. This gets complicated because now > the perfmon subsystem (and probably oprofile) cannot check register > availability when they are first initialized. Basically each time, > the /sys entry is modified, they would have to scan the list of available > performance counters. I don't know exactly when Oprofile does this checking. > For perfmon, this is done only once, when the PMU description table is loaded. How often did you plan on enabling/disabling the nmi_watchdog? My understanding was you disable nmi_watchdog, run oprofile/perfmon, re-enable nmi_watchdog. I guess I don't understand what type of funky scenarios you are dealing with. > > Also something that I did not see in this code is the error detection in > case enable_lapic_nmi_watchdog() fails. Oprofile runs on all CPUs or none. > Perfmon lets you monitor on subsets on CPUs. In case NMI was disabled and > a monitoring session was active on some CPUs. The enable_lapic_nmi_watchdog() > will fail on some CPUs. How is that handled? It's not. In fact I wouldn't know what to do in such situations. Is it really wrong to only have a subset of cpus being monitored by the nmi_watchdog? This seems to be wandering into the area where the user is looking to do something complicated (profiling a subset of cpus) and as such might be expected to make sure the nmi_watchdog is properly enabled on all cpus when they are done. Cheers, Don |
From: Stephane E. <er...@hp...> - 2006-05-10 15:11:17
|
Don, On Wed, May 10, 2006 at 10:59:53AM -0400, Don Zickus wrote: > > > > This means you can at runtime enable/disbale nmi_watchdog, i.e., reserve > > some performance counters on the fly. This gets complicated because now > > the perfmon subsystem (and probably oprofile) cannot check register > > availability when they are first initialized. Basically each time, > > the /sys entry is modified, they would have to scan the list of available > > performance counters. I don't know exactly when Oprofile does this checking. > > For perfmon, this is done only once, when the PMU description table is loaded. > How often did you plan on enabling/disabling the nmi_watchdog? My > understanding was you disable nmi_watchdog, run oprofile/perfmon, > re-enable nmi_watchdog. I guess I don't understand what type of funky > scenarios you are dealing with. > I thought that part of the exercise was to allow oprofile/perfmon and nmi_watchdog to work simultaneously, i.e., don't disable watchdog when you monitor. Leave watchdog on and have oprofile/perfmon run with fewer counters. The alternative would be to do what you say: disbale nmi, monitor, re-enable nmi. > > > > Also something that I did not see in this code is the error detection in > > case enable_lapic_nmi_watchdog() fails. Oprofile runs on all CPUs or none. > > Perfmon lets you monitor on subsets on CPUs. In case NMI was disabled and > > a monitoring session was active on some CPUs. The enable_lapic_nmi_watchdog() > > will fail on some CPUs. How is that handled? > > It's not. In fact I wouldn't know what to do in such situations. Is it > really wrong to only have a subset of cpus being monitored by the > nmi_watchdog? This seems to be wandering into the area where the user is > looking to do something complicated (profiling a subset of cpus) and as > such might be expected to make sure the nmi_watchdog is properly enabled > on all cpus when they are done. > On larger machines (think NUMA-style), it does not always make sense to monitor all CPUs. -- -Stephane |
From: Don Z. <dz...@re...> - 2006-05-10 19:05:21
|
On Wed, May 10, 2006 at 08:03:57AM -0700, Stephane Eranian wrote: > Don, > > On Wed, May 10, 2006 at 10:59:53AM -0400, Don Zickus wrote: > > > > > > This means you can at runtime enable/disbale nmi_watchdog, i.e., reserve > > > some performance counters on the fly. This gets complicated because now > > > the perfmon subsystem (and probably oprofile) cannot check register > > > availability when they are first initialized. Basically each time, > > > the /sys entry is modified, they would have to scan the list of available > > > performance counters. I don't know exactly when Oprofile does this checking. > > > For perfmon, this is done only once, when the PMU description table is loaded. > > > How often did you plan on enabling/disabling the nmi_watchdog? My > > understanding was you disable nmi_watchdog, run oprofile/perfmon, > > re-enable nmi_watchdog. I guess I don't understand what type of funky > > scenarios you are dealing with. > > > I thought that part of the exercise was to allow oprofile/perfmon and nmi_watchdog > to work simultaneously, i.e., don't disable watchdog when you monitor. Leave watchdog > on and have oprofile/perfmon run with fewer counters. The alternative would be to do > what you say: disbale nmi, monitor, re-enable nmi. It was and the patch allows one to run the monitor and the nmi_watchdog at the same time. I guess I am confused to what your original concern was. > > > > > > > Also something that I did not see in this code is the error detection in > > > case enable_lapic_nmi_watchdog() fails. Oprofile runs on all CPUs or none. > > > Perfmon lets you monitor on subsets on CPUs. In case NMI was disabled and > > > a monitoring session was active on some CPUs. The enable_lapic_nmi_watchdog() > > > will fail on some CPUs. How is that handled? > > > > It's not. In fact I wouldn't know what to do in such situations. Is it > > really wrong to only have a subset of cpus being monitored by the > > nmi_watchdog? This seems to be wandering into the area where the user is > > looking to do something complicated (profiling a subset of cpus) and as > > such might be expected to make sure the nmi_watchdog is properly enabled > > on all cpus when they are done. > > > On larger machines (think NUMA-style), it does not always make sense to monitor all CPUs. That makes sense. Thinking about it, it probably wouldn't be too hard to add code to try and re-scan the cpus and enable those nmi_watchdogs that were disabled during your monitoring. But you would have to manually do this once you were done testing. Cheers, Don |