You can subscribe to this list here.
| 2009 |
Jan
|
Feb
|
Mar
|
Apr
|
May
(32) |
Jun
(66) |
Jul
(102) |
Aug
(78) |
Sep
(106) |
Oct
(137) |
Nov
(147) |
Dec
(147) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2010 |
Jan
(71) |
Feb
(139) |
Mar
(86) |
Apr
(76) |
May
(57) |
Jun
(10) |
Jul
(12) |
Aug
(6) |
Sep
(8) |
Oct
(12) |
Nov
(12) |
Dec
(18) |
| 2011 |
Jan
(16) |
Feb
(19) |
Mar
(3) |
Apr
(1) |
May
(16) |
Jun
(17) |
Jul
(74) |
Aug
(22) |
Sep
(18) |
Oct
(24) |
Nov
(21) |
Dec
(30) |
| 2012 |
Jan
(31) |
Feb
(16) |
Mar
(22) |
Apr
(25) |
May
(18) |
Jun
(13) |
Jul
(83) |
Aug
(49) |
Sep
(20) |
Oct
(60) |
Nov
(35) |
Dec
(28) |
| 2013 |
Jan
(39) |
Feb
(61) |
Mar
(35) |
Apr
(21) |
May
(45) |
Jun
(56) |
Jul
(20) |
Aug
(9) |
Sep
(10) |
Oct
(31) |
Nov
(8) |
Dec
(4) |
| 2014 |
Jan
(6) |
Feb
(7) |
Mar
(7) |
Apr
(6) |
May
(4) |
Jun
(8) |
Jul
(5) |
Aug
(2) |
Sep
(4) |
Oct
(4) |
Nov
(11) |
Dec
(5) |
| 2015 |
Jan
(4) |
Feb
(4) |
Mar
(3) |
Apr
(4) |
May
(9) |
Jun
(4) |
Jul
(15) |
Aug
(8) |
Sep
(16) |
Oct
(18) |
Nov
(15) |
Dec
(7) |
| 2016 |
Jan
(20) |
Feb
(9) |
Mar
(15) |
Apr
(24) |
May
(16) |
Jun
(28) |
Jul
(22) |
Aug
(23) |
Sep
(18) |
Oct
(30) |
Nov
(40) |
Dec
(9) |
| 2017 |
Jan
(1) |
Feb
(8) |
Mar
(37) |
Apr
(26) |
May
(25) |
Jun
(46) |
Jul
(24) |
Aug
(9) |
Sep
|
Oct
|
Nov
|
Dec
|
|
From: Seiji A. <sei...@hd...> - 2013-01-10 17:34:31
|
[Purpose of this patch] As Vaibhav explained in the thread below, tracepoints for irq vectors are useful. http://www.spinics.net/lists/mm-commits/msg85707.html <snip> The current interrupt traces from irq_handler_entry and irq_handler_exit provide when an interrupt is handled. They provide good data about when the system has switched to kernel space and how it affects the currently running processes. There are some IRQ vectors which trigger the system into kernel space, which are not handled in generic IRQ handlers. Tracing such events gives us the information about IRQ interaction with other system events. The trace also tells where the system is spending its time. We want to know which cores are handling interrupts and how they are affecting other processes in the system. Also, the trace provides information about when the cores are idle and which interrupts are changing that state. <snip> On the other hand, my usecase is tracing just local timer event and getting a value of instruction pointer. I suggested to add an argument local timer event to get instruction pointer before. But there is another way to get it with external module like systemtap. So, I don't need to add any argument to irq vector tracepoints now. [Patch Description] Vaibhav's patch shared a trace point ,irq_vector_entry/irq_vector_exit, in all events. But there is an above use case to trace specific irq_vector rather than tracing all events. In this case, we are concerned about overhead due to unwanted events. This patch adds following tracepoints instead of introducing irq_vector_entry/exit. so that we can enable them independently. - local_timer_vector - reschedule_vector - call_function_vector - call_function_single_vector - irq_work_entry_vector - error_apic_vector - thermal_apic_vector - threshold_apic_vector - spurious_apic_vector - x86_platform_ipi_vector Also, it introduces a logic switching IDT at enabling/disabling time so that a time penalty makes a zero when tracepoints are disabled. Detailed explanations are as follows. - Create new irq handlers with tracepoints. - Create a new IDT, trace_idt_table, at boot time by adding a logic to _set_gate(). It is just a copy of original idt table. - Registering the new handers for tracpoints to the new IDT by introducing macros to alloc_intr_gate() called at regstering time of irq_vector handlers. - Switch IDT to new one at enabling TP time. - Restore to an original IDT at disabling TP time. The new IDT is created only when CONFIG_TRACING is enabled to avoid being used for other purposes. Signed-off-by: Seiji Aguchi <sei...@hd...> --- arch/x86/include/asm/desc.h | 31 ++++++- arch/x86/include/asm/entry_arch.h | 5 +- arch/x86/include/asm/hw_irq.h | 16 +++ arch/x86/include/asm/trace/irq_vectors.h | 154 ++++++++++++++++++++++++++++++ arch/x86/kernel/Makefile | 1 + arch/x86/kernel/apic/apic.c | 94 ++++++++++++++++++ arch/x86/kernel/cpu/mcheck/therm_throt.c | 14 +++ arch/x86/kernel/cpu/mcheck/threshold.c | 14 +++ arch/x86/kernel/entry_32.S | 12 ++- arch/x86/kernel/entry_64.S | 27 ++++- arch/x86/kernel/head_64.S | 6 + arch/x86/kernel/irq.c | 23 +++++ arch/x86/kernel/irq_work.c | 12 +++ arch/x86/kernel/smp.c | 35 +++++++ arch/x86/kernel/tracepoint.c | 61 ++++++++++++ include/xen/events.h | 3 + 16 files changed, 498 insertions(+), 10 deletions(-) create mode 100644 arch/x86/include/asm/trace/irq_vectors.h create mode 100644 arch/x86/kernel/tracepoint.c diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h index 8bf1c06..55f2fe5 100644 --- a/arch/x86/include/asm/desc.h +++ b/arch/x86/include/asm/desc.h @@ -320,6 +320,17 @@ static inline void set_nmi_gate(int gate, void *addr) } #endif +#ifdef CONFIG_TRACING +extern gate_desc trace_idt_table[]; +static inline void trace_set_intr_gate(unsigned int gate, void *addr) +{ + gate_desc s; + + pack_gate(&s, GATE_INTERRUPT, (unsigned long)addr, 0, 0, __KERNEL_CS); + write_idt_entry(trace_idt_table, gate, &s); +} +#endif + static inline void _set_gate(int gate, unsigned type, void *addr, unsigned dpl, unsigned ist, unsigned seg) { @@ -331,6 +342,9 @@ static inline void _set_gate(int gate, unsigned type, void *addr, * setup time */ write_idt_entry(idt_table, gate, &s); +#ifdef CONFIG_TRACING + write_idt_entry(trace_idt_table, gate, &s); +#endif } /* @@ -360,12 +374,25 @@ static inline void alloc_system_vector(int vector) } } -static inline void alloc_intr_gate(unsigned int n, void *addr) +#ifdef CONFIG_TRACING +static inline void __trace_alloc_intr_gate(unsigned int n, void *addr) +{ + trace_set_intr_gate(n, addr); +} +#else +#define __trace_alloc_intr_gate(n, addr) +#endif + +static inline void __alloc_intr_gate(unsigned int n, void *addr) { - alloc_system_vector(n); set_intr_gate(n, addr); } +#define alloc_intr_gate(n, addr) \ + alloc_system_vector(n); \ + __alloc_intr_gate(n, addr); \ + __trace_alloc_intr_gate(n, trace_##addr); + /* * This routine sets up an interrupt gate at directory privilege level 3. */ diff --git a/arch/x86/include/asm/entry_arch.h b/arch/x86/include/asm/entry_arch.h index 40afa00..0bb99d8 100644 --- a/arch/x86/include/asm/entry_arch.h +++ b/arch/x86/include/asm/entry_arch.h @@ -13,8 +13,9 @@ BUILD_INTERRUPT(reschedule_interrupt,RESCHEDULE_VECTOR) BUILD_INTERRUPT(call_function_interrupt,CALL_FUNCTION_VECTOR) BUILD_INTERRUPT(call_function_single_interrupt,CALL_FUNCTION_SINGLE_VECTOR) -BUILD_INTERRUPT(irq_move_cleanup_interrupt,IRQ_MOVE_CLEANUP_VECTOR) -BUILD_INTERRUPT(reboot_interrupt,REBOOT_VECTOR) +BUILD_INTERRUPT3(irq_move_cleanup_interrupt, IRQ_MOVE_CLEANUP_VECTOR, + smp_irq_move_cleanup_interrupt) +BUILD_INTERRUPT3(reboot_interrupt, REBOOT_VECTOR, smp_reboot_interrupt) #endif BUILD_INTERRUPT(x86_platform_ipi, X86_PLATFORM_IPI_VECTOR) diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h index eb92a6e..2e297d8 100644 --- a/arch/x86/include/asm/hw_irq.h +++ b/arch/x86/include/asm/hw_irq.h @@ -76,6 +76,22 @@ extern void threshold_interrupt(void); extern void call_function_interrupt(void); extern void call_function_single_interrupt(void); +#ifdef CONFIG_TRACING +/* Interrupt handlers registered during init_IRQ */ +extern void trace_apic_timer_interrupt(void); +extern void trace_x86_platform_ipi(void); +extern void trace_error_interrupt(void); +extern void trace_irq_work_interrupt(void); +extern void trace_spurious_interrupt(void); +extern void trace_thermal_interrupt(void); +extern void trace_reschedule_interrupt(void); +extern void trace_threshold_interrupt(void); +extern void trace_call_function_interrupt(void); +extern void trace_call_function_single_interrupt(void); +#define trace_irq_move_cleanup_interrupt irq_move_cleanup_interrupt +#define trace_reboot_interrupt reboot_interrupt +#endif /* CONFIG_TRACING */ + /* IOAPIC */ #define IO_APIC_IRQ(x) (((x) >= NR_IRQS_LEGACY) || ((1<<(x)) & io_apic_irqs)) extern unsigned long io_apic_irqs; diff --git a/arch/x86/include/asm/trace/irq_vectors.h b/arch/x86/include/asm/trace/irq_vectors.h new file mode 100644 index 0000000..9bcb27b --- /dev/null +++ b/arch/x86/include/asm/trace/irq_vectors.h @@ -0,0 +1,154 @@ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM irq_vectors + +#if !defined(_TRACE_IRQ_VECTORS_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_IRQ_VECTORS_H + +#include <linux/tracepoint.h> + +extern void trace_irq_vector_regfunc(void); +extern void trace_irq_vector_unregfunc(void); + +#define DECLARE_IRQ_VECTOR_EVENT(name) \ +TRACE_EVENT_FN(name, \ + TP_PROTO(int vector), \ + \ + TP_ARGS(vector), \ + \ + TP_STRUCT__entry( \ + __field( int, vector ) \ + ), \ + \ + TP_fast_assign( \ + __entry->vector = vector; \ + ), \ + \ + TP_printk("vector=%d", __entry->vector), \ + trace_irq_vector_regfunc, trace_irq_vector_unregfunc \ +); + +/* + * local_timer_entry - called before entering a local timer interrupt + * vector handler + */ +DECLARE_IRQ_VECTOR_EVENT(local_timer_entry) + +/* + * local_timer_exit - called immediately after the interrupt vector + * handler returns + */ +DECLARE_IRQ_VECTOR_EVENT(local_timer_exit) + +/* + * reschedule_entry - called before entering a reschedule vector handler + */ +DECLARE_IRQ_VECTOR_EVENT(reschedule_entry) + +/* + * reschedule_exit - called immediately after the interrupt vector + * handler returns + */ +DECLARE_IRQ_VECTOR_EVENT(reschedule_exit) + +/* + * spurious_apic_entry - called before entering a spurious apic vector handler + */ +DECLARE_IRQ_VECTOR_EVENT(spurious_apic_entry) + +/* + * spurious_apic_exit - called immediately after the interrupt vector + * handler returns + */ +DECLARE_IRQ_VECTOR_EVENT(spurious_apic_exit) + +/* + * error_apic_entry - called before entering an error apic vector handler + */ +DECLARE_IRQ_VECTOR_EVENT(error_apic_entry) + +/* + * error_apic_exit - called immediately after the interrupt vector + * handler returns + */ +DECLARE_IRQ_VECTOR_EVENT(error_apic_exit) + +/* + * x86_platform_ipi_entry - called before entering a x86 platform ipi interrupt + * vector handler + */ +DECLARE_IRQ_VECTOR_EVENT(x86_platform_ipi_entry) + +/* + * x86_platform_ipi_exit - called immediately after the interrupt vector + * handler returns + */ +DECLARE_IRQ_VECTOR_EVENT(x86_platform_ipi_exit) + +/* + * irq_work_entry - called before entering a irq work interrupt + * vector handler + */ +DECLARE_IRQ_VECTOR_EVENT(irq_work_entry) + +/* + * irq_work_exit - called immediately after the interrupt vector + * handler returns + */ +DECLARE_IRQ_VECTOR_EVENT(irq_work_exit) + +/* + * call_function_entry - called before entering a call function interrupt + * vector handler + */ +DECLARE_IRQ_VECTOR_EVENT(call_function_entry) + +/* + * call_function_exit - called immediately after the interrupt vector + * handler returns + */ +DECLARE_IRQ_VECTOR_EVENT(call_function_exit) + +/* + * call_function_single_entry - called before entering a call function + * single interrupt vector handler + */ +DECLARE_IRQ_VECTOR_EVENT(call_function_single_entry) + +/* + * call_function_single_exit - called immediately after the interrupt vector + * handler returns + */ +DECLARE_IRQ_VECTOR_EVENT(call_function_single_exit) + +/* + * threshold_apic_entry - called before entering a threshold apic interrupt + * vector handler + */ +DECLARE_IRQ_VECTOR_EVENT(threshold_apic_entry) + +/* + * threshold_apic_exit - called immediately after the interrupt vector + * handler returns + */ +DECLARE_IRQ_VECTOR_EVENT(threshold_apic_exit) + +/* + * thermal_apic_entry - called before entering a thermal apic interrupt + * vector handler + */ +DECLARE_IRQ_VECTOR_EVENT(thermal_apic_entry) + +/* + * thrmal_apic_exit - called immediately after the interrupt vector + * handler returns + */ +DECLARE_IRQ_VECTOR_EVENT(thermal_apic_exit) + +#undef TRACE_INCLUDE_PATH +#define TRACE_INCLUDE_PATH ../../arch/x86/include/asm/trace +#define TRACE_INCLUDE_FILE irq_vectors +#endif /* _TRACE_IRQ_VECTORS_H */ + +/* This part must be outside protection */ +#include <trace/define_trace.h> + diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 34e923a..24e2080 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -100,6 +100,7 @@ obj-$(CONFIG_OF) += devicetree.o obj-$(CONFIG_UPROBES) += uprobes.o obj-$(CONFIG_PERF_EVENTS) += perf_regs.o +obj-$(CONFIG_TRACING) += tracepoint.o ### # 64 bit specific files diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index b994cc8..89f3f4d 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -55,6 +55,9 @@ #include <asm/tsc.h> #include <asm/hypervisor.h> +#define CREATE_TRACE_POINTS +#include <asm/trace/irq_vectors.h> + unsigned int num_processors; unsigned disabled_cpus __cpuinitdata; @@ -934,6 +937,30 @@ void __irq_entry smp_apic_timer_interrupt(struct pt_regs *regs) set_irq_regs(old_regs); } +void __irq_entry smp_trace_apic_timer_interrupt(struct pt_regs *regs) +{ + struct pt_regs *old_regs = set_irq_regs(regs); + + /* + * NOTE! We'd better ACK the irq immediately, + * because timer handling can be slow. + */ + ack_APIC_irq(); + /* + * update_process_times() expects us to have done irq_enter(). + * Besides, if we don't timer interrupts ignore the global + * interrupt lock, which is the WrongThing (tm) to do. + */ + irq_enter(); + exit_idle(); + trace_local_timer_entry(LOCAL_TIMER_VECTOR); + local_apic_timer_interrupt(); + trace_local_timer_exit(LOCAL_TIMER_VECTOR); + irq_exit(); + + set_irq_regs(old_regs); +} + int setup_profiling_timer(unsigned int multiplier) { return -EINVAL; @@ -1931,6 +1958,31 @@ void smp_spurious_interrupt(struct pt_regs *regs) irq_exit(); } +void smp_trace_spurious_interrupt(struct pt_regs *regs) +{ + u32 v; + + irq_enter(); + exit_idle(); + trace_spurious_apic_entry(SPURIOUS_APIC_VECTOR); + /* + * Check if this really is a spurious interrupt and ACK it + * if it is a vectored one. Just in case... + * Spurious interrupts should not be ACKed. + */ + v = apic_read(APIC_ISR + ((SPURIOUS_APIC_VECTOR & ~0x1f) >> 1)); + if (v & (1 << (SPURIOUS_APIC_VECTOR & 0x1f))) + ack_APIC_irq(); + + inc_irq_stat(irq_spurious_count); + + /* see sw-dev-man vol 3, chapter 7.4.13.5 */ + pr_info("spurious APIC interrupt on CPU#%d, " + "should never happen.\n", smp_processor_id()); + trace_spurious_apic_exit(SPURIOUS_APIC_VECTOR); + irq_exit(); +} + /* * This interrupt should never happen with our APIC/SMP architecture */ @@ -1974,6 +2026,48 @@ void smp_error_interrupt(struct pt_regs *regs) irq_exit(); } +void smp_trace_error_interrupt(struct pt_regs *regs) +{ + u32 v0, v1; + u32 i = 0; + static const char * const error_interrupt_reason[] = { + "Send CS error", /* APIC Error Bit 0 */ + "Receive CS error", /* APIC Error Bit 1 */ + "Send accept error", /* APIC Error Bit 2 */ + "Receive accept error", /* APIC Error Bit 3 */ + "Redirectable IPI", /* APIC Error Bit 4 */ + "Send illegal vector", /* APIC Error Bit 5 */ + "Received illegal vector", /* APIC Error Bit 6 */ + "Illegal register address", /* APIC Error Bit 7 */ + }; + + irq_enter(); + exit_idle(); + trace_error_apic_entry(ERROR_APIC_VECTOR); + /* First tickle the hardware, only then report what went on. -- REW */ + v0 = apic_read(APIC_ESR); + apic_write(APIC_ESR, 0); + v1 = apic_read(APIC_ESR); + ack_APIC_irq(); + atomic_inc(&irq_err_count); + + apic_printk(APIC_DEBUG, KERN_DEBUG "APIC error on CPU%d: %02x(%02x)", + smp_processor_id(), v0 , v1); + + v1 = v1 & 0xff; + while (v1) { + if (v1 & 0x1) + apic_printk(APIC_DEBUG, KERN_CONT " : %s", error_interrupt_reason[i]); + i++; + v1 >>= 1; + } + + apic_printk(APIC_DEBUG, KERN_CONT "\n"); + + trace_error_apic_exit(ERROR_APIC_VECTOR); + irq_exit(); +} + /** * connect_bsp_APIC - attach the APIC to the interrupt system */ diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c index 47a1870..e7aa7fc 100644 --- a/arch/x86/kernel/cpu/mcheck/therm_throt.c +++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c @@ -29,6 +29,7 @@ #include <asm/idle.h> #include <asm/mce.h> #include <asm/msr.h> +#include <asm/trace/irq_vectors.h> /* How long to wait between reporting thermal events */ #define CHECK_INTERVAL (300 * HZ) @@ -389,6 +390,19 @@ asmlinkage void smp_thermal_interrupt(struct pt_regs *regs) ack_APIC_irq(); } +asmlinkage void smp_trace_thermal_interrupt(struct pt_regs *regs) +{ + irq_enter(); + exit_idle(); + trace_thermal_apic_entry(THERMAL_APIC_VECTOR); + inc_irq_stat(irq_thermal_count); + smp_thermal_vector(); + trace_thermal_apic_exit(THERMAL_APIC_VECTOR); + irq_exit(); + /* Ack only at the end to avoid potential reentry */ + ack_APIC_irq(); +} + /* Thermal monitoring depends on APIC, ACPI and clock modulation */ static int intel_thermal_supported(struct cpuinfo_x86 *c) { diff --git a/arch/x86/kernel/cpu/mcheck/threshold.c b/arch/x86/kernel/cpu/mcheck/threshold.c index aa578ca..0cbef99 100644 --- a/arch/x86/kernel/cpu/mcheck/threshold.c +++ b/arch/x86/kernel/cpu/mcheck/threshold.c @@ -8,6 +8,7 @@ #include <asm/apic.h> #include <asm/idle.h> #include <asm/mce.h> +#include <asm/trace/irq_vectors.h> static void default_threshold_interrupt(void) { @@ -27,3 +28,16 @@ asmlinkage void smp_threshold_interrupt(void) /* Ack only at the end to avoid potential reentry */ ack_APIC_irq(); } + +asmlinkage void smp_trace_threshold_interrupt(void) +{ + irq_enter(); + exit_idle(); + trace_threshold_apic_entry(THRESHOLD_APIC_VECTOR); + inc_irq_stat(irq_threshold_count); + mce_threshold_vector(); + trace_threshold_apic_exit(THRESHOLD_APIC_VECTOR); + irq_exit(); + /* Ack only at the end to avoid potential reentry */ + ack_APIC_irq(); +} diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S index ff84d54..bb52af6 100644 --- a/arch/x86/kernel/entry_32.S +++ b/arch/x86/kernel/entry_32.S @@ -846,7 +846,17 @@ ENTRY(name) \ CFI_ENDPROC; \ ENDPROC(name) -#define BUILD_INTERRUPT(name, nr) BUILD_INTERRUPT3(name, nr, smp_##name) + +#ifdef CONFIG_TRACING +#define TRACE_BUILD_INTERRUPT(name, nr) \ + BUILD_INTERRUPT3(trace_##name, nr, smp_trace_##name) +#else +#define TRACE_BUILD_INTERRUPT(name, nr) +#endif + +#define BUILD_INTERRUPT(name, nr) \ + BUILD_INTERRUPT3(name, nr, smp_##name); \ + TRACE_BUILD_INTERRUPT(name, nr) /* The include is where all of the SMP etc. interrupts come from */ #include <asm/entry_arch.h> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S index 07a7a04..34279ad 100644 --- a/arch/x86/kernel/entry_64.S +++ b/arch/x86/kernel/entry_64.S @@ -1146,7 +1146,7 @@ END(common_interrupt) /* * APIC interrupts. */ -.macro apicinterrupt num sym do_sym +.macro apicinterrupt3 num sym do_sym ENTRY(\sym) INTR_FRAME ASM_CLAC @@ -1158,15 +1158,32 @@ ENTRY(\sym) END(\sym) .endm +#ifdef CONFIG_TRACING +#define trace(sym) trace_##sym +#define smp_trace(sym) smp_trace_##sym + +.macro trace_apicinterrupt num sym +apicinterrupt3 \num trace(\sym) smp_trace(\sym) +.endm +#else +.macro trace_apicinterrupt num sym do_sym +.endm +#endif + +.macro apicinterrupt num sym do_sym +apicinterrupt3 \num \sym \do_sym +trace_apicinterrupt \num \sym +.endm + #ifdef CONFIG_SMP -apicinterrupt IRQ_MOVE_CLEANUP_VECTOR \ +apicinterrupt3 IRQ_MOVE_CLEANUP_VECTOR \ irq_move_cleanup_interrupt smp_irq_move_cleanup_interrupt -apicinterrupt REBOOT_VECTOR \ +apicinterrupt3 REBOOT_VECTOR \ reboot_interrupt smp_reboot_interrupt #endif #ifdef CONFIG_X86_UV -apicinterrupt UV_BAU_MESSAGE \ +apicinterrupt3 UV_BAU_MESSAGE \ uv_bau_message_intr1 uv_bau_message_interrupt #endif apicinterrupt LOCAL_TIMER_VECTOR \ @@ -1454,7 +1471,7 @@ ENTRY(xen_failsafe_callback) CFI_ENDPROC END(xen_failsafe_callback) -apicinterrupt XEN_HVM_EVTCHN_CALLBACK \ +apicinterrupt3 XEN_HVM_EVTCHN_CALLBACK \ xen_hvm_callback_vector xen_evtchn_do_upcall #endif /* CONFIG_XEN */ diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index 980053c..054213b 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -471,6 +471,12 @@ ENTRY(idt_table) ENTRY(nmi_idt_table) .skip IDT_ENTRIES * 16 +#ifdef CONFIG_TRACING + .align L1_CACHE_BYTES +ENTRY(trace_idt_table) + .skip IDT_ENTRIES * 16 +#endif + __PAGE_ALIGNED_BSS .align PAGE_SIZE ENTRY(empty_zero_page) diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c index e4595f1..216bec1 100644 --- a/arch/x86/kernel/irq.c +++ b/arch/x86/kernel/irq.c @@ -17,6 +17,7 @@ #include <asm/idle.h> #include <asm/mce.h> #include <asm/hw_irq.h> +#include <asm/trace/irq_vectors.h> atomic_t irq_err_count; @@ -228,6 +229,28 @@ void smp_x86_platform_ipi(struct pt_regs *regs) set_irq_regs(old_regs); } +void smp_trace_x86_platform_ipi(struct pt_regs *regs) +{ + struct pt_regs *old_regs = set_irq_regs(regs); + + ack_APIC_irq(); + + irq_enter(); + + exit_idle(); + + trace_x86_platform_ipi_entry(X86_PLATFORM_IPI_VECTOR); + inc_irq_stat(x86_platform_ipis); + + if (x86_platform_ipi_callback) + x86_platform_ipi_callback(); + + trace_x86_platform_ipi_exit(X86_PLATFORM_IPI_VECTOR); + irq_exit(); + + set_irq_regs(old_regs); +} + EXPORT_SYMBOL_GPL(vector_used_by_percpu_irq); #ifdef CONFIG_HOTPLUG_CPU diff --git a/arch/x86/kernel/irq_work.c b/arch/x86/kernel/irq_work.c index ca8f703..09e6262 100644 --- a/arch/x86/kernel/irq_work.c +++ b/arch/x86/kernel/irq_work.c @@ -8,6 +8,7 @@ #include <linux/irq_work.h> #include <linux/hardirq.h> #include <asm/apic.h> +#include <asm/trace/irq_vectors.h> void smp_irq_work_interrupt(struct pt_regs *regs) { @@ -18,6 +19,17 @@ void smp_irq_work_interrupt(struct pt_regs *regs) irq_exit(); } +void smp_trace_irq_work_interrupt(struct pt_regs *regs) +{ + irq_enter(); + ack_APIC_irq(); + trace_irq_work_entry(IRQ_WORK_VECTOR); + inc_irq_stat(apic_irq_work_irqs); + irq_work_run(); + trace_irq_work_exit(IRQ_WORK_VECTOR); + irq_exit(); +} + void arch_irq_work_raise(void) { #ifdef CONFIG_X86_LOCAL_APIC diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c index 48d2b7d..aad58af 100644 --- a/arch/x86/kernel/smp.c +++ b/arch/x86/kernel/smp.c @@ -30,6 +30,7 @@ #include <asm/proto.h> #include <asm/apic.h> #include <asm/nmi.h> +#include <asm/trace/irq_vectors.h> /* * Some notes on x86 processor bugs affecting SMP operation: * @@ -259,6 +260,18 @@ void smp_reschedule_interrupt(struct pt_regs *regs) */ } +void smp_trace_reschedule_interrupt(struct pt_regs *regs) +{ + ack_APIC_irq(); + trace_reschedule_entry(RESCHEDULE_VECTOR); + inc_irq_stat(irq_resched_count); + scheduler_ipi(); + trace_reschedule_exit(RESCHEDULE_VECTOR); + /* + * KVM uses this interrupt to force a cpu out of guest mode + */ +} + void smp_call_function_interrupt(struct pt_regs *regs) { ack_APIC_irq(); @@ -268,6 +281,17 @@ void smp_call_function_interrupt(struct pt_regs *regs) irq_exit(); } +void smp_trace_call_function_interrupt(struct pt_regs *regs) +{ + ack_APIC_irq(); + irq_enter(); + trace_call_function_entry(CALL_FUNCTION_VECTOR); + generic_smp_call_function_interrupt(); + inc_irq_stat(irq_call_count); + trace_call_function_exit(CALL_FUNCTION_VECTOR); + irq_exit(); +} + void smp_call_function_single_interrupt(struct pt_regs *regs) { ack_APIC_irq(); @@ -277,6 +301,17 @@ void smp_call_function_single_interrupt(struct pt_regs *regs) irq_exit(); } +void smp_trace_call_function_single_interrupt(struct pt_regs *regs) +{ + ack_APIC_irq(); + irq_enter(); + trace_call_function_single_entry(CALL_FUNCTION_SINGLE_VECTOR); + generic_smp_call_function_single_interrupt(); + inc_irq_stat(irq_call_count); + trace_call_function_single_exit(CALL_FUNCTION_SINGLE_VECTOR); + irq_exit(); +} + static int __init nonmi_ipi_setup(char *str) { smp_no_nmi_ipi = true; diff --git a/arch/x86/kernel/tracepoint.c b/arch/x86/kernel/tracepoint.c new file mode 100644 index 0000000..bd5642a --- /dev/null +++ b/arch/x86/kernel/tracepoint.c @@ -0,0 +1,61 @@ +/* + * Code for supporting irq vector tracepoints. + * + * Copyright (C) 2013 Seiji Aguchi <sei...@hd...> + * + */ +#include <asm/hw_irq.h> +#include <asm/desc.h> + +static struct desc_ptr trace_idt_descr = { NR_VECTORS * 16 - 1, + (unsigned long) trace_idt_table }; + +#ifndef CONFIG_X86_64 +gate_desc trace_idt_table[NR_VECTORS] __page_aligned_data + = { { { { 0, 0 } } }, }; +#endif + +static struct desc_ptr orig_idt_descr[NR_CPUS]; +static int trace_irq_vector_refcount; + +static void switch_to_trace_idt(void *arg) +{ + store_idt(&orig_idt_descr[smp_processor_id()]); + load_idt(&trace_idt_descr); + + return; +} + +static void restore_original_idt(void *arg) +{ + if (orig_idt_descr[smp_processor_id()].address) { + load_idt(&orig_idt_descr[smp_processor_id()]); + memset(&orig_idt_descr[smp_processor_id()], 0, + sizeof(struct desc_ptr)); + } + + return; +} + +void trace_irq_vector_regfunc(void) +{ + if (!trace_irq_vector_refcount) { + smp_call_function(switch_to_trace_idt, NULL, 0); + local_irq_disable(); + switch_to_trace_idt(NULL); + local_irq_enable(); + } + trace_irq_vector_refcount++; +} + +void trace_irq_vector_unregfunc(void) +{ + trace_irq_vector_refcount--; + if (!trace_irq_vector_refcount) { + smp_call_function(restore_original_idt, NULL, 0); + local_irq_disable(); + restore_original_idt(NULL); + local_irq_enable(); + } +} + diff --git a/include/xen/events.h b/include/xen/events.h index c6bfe01..9216d07 100644 --- a/include/xen/events.h +++ b/include/xen/events.h @@ -76,6 +76,9 @@ unsigned irq_from_evtchn(unsigned int evtchn); /* Xen HVM evtchn vector callback */ void xen_hvm_callback_vector(void); +#ifdef CONFIG_TRACING +#define trace_xen_hvm_callback_vector xen_hvm_callback_vector +#endif extern int xen_have_vector_callback; int xen_set_callback_via(uint64_t via); void xen_evtchn_do_upcall(struct pt_regs *regs); -- 1.7.1 |
|
From: Seiji A. <sei...@hd...> - 2013-01-10 17:33:37
|
Change log
v6 -> v7
- Divide into two patches to make a code review easier.
Summery of each patch is as follows.
- Patch 1/2
- Add an irq_vector tracing infrastructure.
- Create idt_table for tracing. It is refactored to avoid duplicating
existing logic.
- Duplicate new irq handlers inserted tracepoints.
- Patch 2/2
- Share a common logic among irq handlers to make them
manageable and readable.
v5 -> v6
- Rebased to 3.7
v4 -> v5
- Rebased to 3.6.0
- Introduce a logic switching IDT at enabling/disabling TP time
so that a time penalty makes a zero when tracepoints are disabled.
This IDT is created only when CONFIG_TRACEPOINTS is enabled.
- Remove arch_irq_vector_entry/exit and add followings again
so that we can add each tracepoint in a generic way.
- error_apic_vector
- thermal_apic_vector
- threshold_apic_vector
- spurious_apic_vector
- x86_platform_ipi_vector
- Drop nmi tracepoints to begin with apic interrupts and discuss a logic switching
IDT first.
- Move irq_vectors.h in the directory of arch/x86/include/asm/trace because
I'm not sure if a logic switching IDT is sharable with other architectures.
v3 -> v4
- Add a latency measurement of each tracepoint
- Rebased to 3.6-rc6
v2 -> v3
- Remove an invalidate_tlb_vector event because it was replaced by a call function vector
in a following commit.
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=52aec3308db85f4e9f5c8b9f5dc4fbd0138c6fa4
v1 -> v2
- Modify variable name from irq to vector.
- Merge arch-specific tracepoints below to an arch_irq_vector_entry/exit.
- error_apic_vector
- thermal_apic_vector
- threshold_apic_vector
- spurious_apic_vector
- x86_platform_ipi_vector
[Purpose of this patch]
As Vaibhav explained in the thread below, tracepoints for irq vectors
are useful.
http://www.spinics.net/lists/mm-commits/msg85707.html
<snip>
The current interrupt traces from irq_handler_entry and irq_handler_exit
provide when an interrupt is handled. They provide good data about when
the system has switched to kernel space and how it affects the currently
running processes.
There are some IRQ vectors which trigger the system into kernel space,
which are not handled in generic IRQ handlers. Tracing such events gives
us the information about IRQ interaction with other system events.
The trace also tells where the system is spending its time. We want to
know which cores are handling interrupts and how they are affecting other
processes in the system. Also, the trace provides information about when
the cores are idle and which interrupts are changing that state.
<snip>
On the other hand, my usecase is tracing just local timer event and
getting a value of instruction pointer.
I suggested to add an argument local timer event to get instruction pointer before.
But there is another way to get it with external module like systemtap.
So, I don't need to add any argument to irq vector tracepoints now.
[Patch Description]
Vaibhav's patch shared a trace point ,irq_vector_entry/irq_vector_exit, in all events.
But there is an above use case to trace specific irq_vector rather than tracing all events.
In this case, we are concerned about overhead due to unwanted events.
This patch adds following tracepoints instead of introducing irq_vector_entry/exit.
so that we can enable them independently.
- local_timer_vector
- reschedule_vector
- call_function_vector
- call_function_single_vector
- irq_work_entry_vector
- error_apic_vector
- thermal_apic_vector
- threshold_apic_vector
- spurious_apic_vector
- x86_platform_ipi_vector
Please see descriptions in each patch.
Seiji Aguchi (2):
[RFC][PATCH v7 1/2] trace,x86: add x86 irq vector tracepoints
[RFC][PATCH v7 2/2] trace,x86: code-sharing between non-trace and trace irq handlers
arch/x86/include/asm/desc.h | 31 ++++++-
arch/x86/include/asm/entry_arch.h | 5 +-
arch/x86/include/asm/hw_irq.h | 16 +++
arch/x86/include/asm/trace/irq_vectors.h | 154 ++++++++++++++++++++++++++++++
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/apic/apic.c | 71 +++++++++++---
arch/x86/kernel/cpu/mcheck/therm_throt.c | 24 ++++-
arch/x86/kernel/cpu/mcheck/threshold.c | 24 ++++-
arch/x86/kernel/entry_32.S | 12 ++-
arch/x86/kernel/entry_64.S | 27 ++++-
arch/x86/kernel/head_64.S | 6 +
arch/x86/kernel/irq.c | 31 ++++--
arch/x86/kernel/irq_work.c | 24 ++++-
arch/x86/kernel/smp.c | 65 +++++++++++--
arch/x86/kernel/tracepoint.c | 61 ++++++++++++
include/linux/hardirq.h | 25 +++++
include/xen/events.h | 3 +
17 files changed, 526 insertions(+), 54 deletions(-)
|
|
From: Seiji A. <sei...@hd...> - 2013-01-05 00:01:42
|
Tony, Can you review this patchset? Seiji > -----Original Message----- > From: Anton Vorontsov [mailto:cbo...@gm...] > Sent: Friday, December 21, 2012 6:37 PM > To: Seiji Aguchi > Cc: Luck, Tony (ton...@in...); lin...@vg...; cc...@an...; kee...@ch...; Satoru Moriya; > dle...@li...; dz...@re... > Subject: Re: [PATCH v3 0/2] pstore,efi_pstore: Avoid deadlock in non-blocking paths > > On Fri, Dec 21, 2012 at 11:27:00PM +0000, Seiji Aguchi wrote: > > Tony, > > > > Could you please apply this patchset to your tree? > > Actually, I'd prefer to take both patches via pstore tree. The EFI part is isolated and small, so even if it conflicts, it would be easy to > resolve, unlike to pstore core part. > > So, would be great if Tony could give an Ack for EFI part and we'd merge this via pstore tree. > > Thanks, > Anton |
|
From: Anton V. <cbo...@gm...> - 2012-12-21 23:40:46
|
On Fri, Dec 21, 2012 at 11:27:00PM +0000, Seiji Aguchi wrote: > Tony, > > Could you please apply this patchset to your tree? Actually, I'd prefer to take both patches via pstore tree. The EFI part is isolated and small, so even if it conflicts, it would be easy to resolve, unlike to pstore core part. So, would be great if Tony could give an Ack for EFI part and we'd merge this via pstore tree. Thanks, Anton |
|
From: Seiji A. <sei...@hd...> - 2012-12-21 23:27:20
|
Tony, Could you please apply this patchset to your tree? Seiji > -----Original Message----- > From: Don Zickus [mailto:dz...@re...] > Sent: Thursday, December 20, 2012 2:09 PM > To: Seiji Aguchi > Cc: lin...@vg...; Luck, Tony (ton...@in...); cc...@an...; kee...@ch...; > cbo...@gm...; Satoru Moriya; dle...@li... > Subject: Re: [PATCH v3 0/2] pstore,efi_pstore: Avoid deadlock in non-blocking paths > > On Thu, Dec 20, 2012 at 03:11:14PM +0000, Seiji Aguchi wrote: > > Changelog > > v2 -> v3 > > - Merge modifications of pstore part in 2/2 to 1/2. > > - Rename pstore_is_non_blocking_path() to pstore_cannot_block_path(). > > > > v1 -> v2 > > - Erase a logic checking the number of online cpus. > > - Create a patchset to fix deadlocking issue in both pstore filesystem and > > efi_pstore driver. > > - Introduce a function, is_non_blocking_path(), to check if pstore > > is in panic and emergency-restart paths (PATCH 1/2) > > - Avoid efi_pstore_driver is blocked in non-blocking paths > > such as nmi, panic and emergency-restart paths (PATCH 2/2) > > > > [Issue] > > > > There are some paths in kernel which shouldn't be blocked, like NMI, > > panic case after stopping cpus, emergency-restart. > > I am ok with it. > > Acked-by: Don Zickus <dz...@re...> |
|
From: Satoru M. <sat...@hd...> - 2012-12-21 22:46:51
|
Changelog v1 -> v2 - add RFC tag again - change semantics as follows - set event threads' priority to maxprio - set vcpu threads' priority to maxprio - 1 - isolate all the posix stuff and put them into os_prioritize() in os-posix.c/qemu-os-win32.h to avoid breaking win32 build - introduce qemu_init_realtime(), qemu_realtime_is_enable and qemu_realtime_get_parameters() and struct QemuRealtimeInfo to keep realtime option and remove related global variables in vl.c - add other benchmark(qpid-latency-test) result We have some plans to migrate old enterprise/control systems which require low latency (msec order) to kvm virtualized environment. In order to satify the requirements, this patch adds realtime option to qemu: -realtime maxprio=<prio>,policy=<pol> This option change the scheduling policy and priority to realtime one (event threads: maxprio, vcpu threads: maxprio - 1) and mlock all qemu and guest memory. Of course, we need more improvements to keep latency low in qemu virtualized environment and this is a first step. OTOH, we can meet the requirement of our first migration project with this patch. [ Note ] This version doesn't support vhost, vpnc and linux-aio. These are some basic performance results: Host : 4 core, 4GB Guest: 1 core, 512MB Benchmark: qpid-latency-test http://qpid.apache.org/ https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_MRG/2/html/Messaging_Installation_and_Configuration_Guide/qpid_latency_test.html Command: - qemu $ qemu -smp 1 -m 512 -enable-kvm -netdev tap,id=hostnet1 -device virtio-net-pci,netdev=hostnet1 -drive file=vm.img,if=virtio (-realtime maxprio=99,policy=fifo) - benchmark $ chrt -f 99 qpid-latency-test --tcp-nodelay --rate 10000 -b <server> Results: worst latency (msec) from 100 run - no load 1. normal qemu : 17.468400 2. chrt qemu(*) : 10.019900 3. realtime qemu: 8.048370 - load (iperf, server:vm, client:other physical sercer) 4. normal qemu : 26.711100 5. chrt qemu : 8.485140 6. realtime qemu: 10.176700 (*) $ chrt -f -p 99 <event_thread_tid> $ chrt -f -p 98 <vcpu_thread_tid> Any comments are welcome. Regards, Satoru Signed-off-by: Satoru Moriya <sat...@hd...> --- cpus.c | 17 +++++++++++++++++ include/qemu/thread.h | 4 ++++ include/sysemu/os-posix.h | 1 + include/sysemu/os-win32.h | 1 + os-posix.c | 48 +++++++++++++++++++++++++++++++++++++++++++++++ qemu-config.c | 16 ++++++++++++++++ qemu-options.hx | 9 +++++++++ qemu-thread-posix.c | 27 ++++++++++++++++++++++++++ qemu-thread-win32.c | 13 +++++++++++++ vl.c | 33 ++++++++++++++++++++++++++++++++ 10 files changed, 169 insertions(+) diff --git a/cpus.c b/cpus.c index 4a7782a..a049970 100644 --- a/cpus.c +++ b/cpus.c @@ -734,6 +734,9 @@ static void *qemu_kvm_cpu_thread_fn(void *arg) CPUArchState *env = arg; CPUState *cpu = ENV_GET_CPU(env); int r; + int rt_policy, rt_priority; + struct sched_param sp; + qemu_mutex_lock(&qemu_global_mutex); qemu_thread_get_self(cpu->thread); @@ -746,6 +749,20 @@ static void *qemu_kvm_cpu_thread_fn(void *arg) exit(1); } + if (qemu_realtime_is_enabled()) { + qemu_realtime_get_parameters(&rt_policy, &rt_priority); + /* + * vcpu threads' priority must be set to event thread priority -1 + * to avoid starvation. + */ + sp.sched_priority = rt_priority - 1; + r = sched_setscheduler(0, rt_policy, &sp); + if (r < 0) { + perror("Setting realtime policy failed"); + exit(1); + } + } + qemu_kvm_init_cpu_signals(env); /* signal CPU creation */ diff --git a/include/qemu/thread.h b/include/qemu/thread.h index c02404b..3d8b3d2 100644 --- a/include/qemu/thread.h +++ b/include/qemu/thread.h @@ -53,4 +53,8 @@ void qemu_thread_get_self(QemuThread *thread); bool qemu_thread_is_self(QemuThread *thread); void qemu_thread_exit(void *retval); +void qemu_init_realtime(int, int); +bool qemu_realtime_is_enabled(void); +void qemu_realtime_get_parameters(int *, int *); + #endif diff --git a/include/sysemu/os-posix.h b/include/sysemu/os-posix.h index 7f198e4..e5995b0 100644 --- a/include/sysemu/os-posix.h +++ b/include/sysemu/os-posix.h @@ -31,6 +31,7 @@ void os_set_proc_name(const char *s); void os_setup_signal_handling(void); void os_daemonize(void); void os_setup_post(void); +void os_prioritize(const char *, int); typedef struct timeval qemu_timeval; #define qemu_gettimeofday(tp) gettimeofday(tp, NULL) diff --git a/include/sysemu/os-win32.h b/include/sysemu/os-win32.h index d0e9234..946b566 100644 --- a/include/sysemu/os-win32.h +++ b/include/sysemu/os-win32.h @@ -78,6 +78,7 @@ static inline void os_daemonize(void) {} static inline void os_setup_post(void) {} void os_set_line_buffering(void); static inline void os_set_proc_name(const char *dummy) {} +static inline void os_prioritize(const char *pol, int prio) {} #if !defined(EPROTONOSUPPORT) # define EPROTONOSUPPORT EINVAL diff --git a/os-posix.c b/os-posix.c index 5c64518..8fe0fa2 100644 --- a/os-posix.c +++ b/os-posix.c @@ -33,12 +33,14 @@ #include <pwd.h> #include <grp.h> #include <libgen.h> +#include <sched.h> /* Needed early for CONFIG_BSD etc. */ #include "config-host.h" #include "sysemu/sysemu.h" #include "net/slirp.h" #include "qemu-options.h" +#include "qemu-thread.h" #ifdef CONFIG_LINUX #include <sys/prctl.h> @@ -363,3 +365,49 @@ bool is_daemonized(void) { return daemonize; } + +void os_prioritize(const char *rt_sched_policy, int max_sched_priority) +{ + int rt_pol, sys_min_prio, sys_max_prio; + + if (rt_sched_policy) { + if (!strcmp(rt_sched_policy, "rr")) { + rt_pol = SCHED_RR; + } else if (!strcmp(rt_sched_policy, "fifo")) { + rt_pol = SCHED_FIFO; + } else { + fprintf(stderr, "qemu: invalid option value '%s'\n", + rt_sched_policy); + exit(1); + } + } else { + rt_pol = SCHED_RR; + } + + sys_min_prio = sched_get_priority_min(rt_pol); + sys_max_prio = sched_get_priority_max(rt_pol); + + if (max_sched_priority < sys_min_prio + 1) { + /* + * We set event threads' priority to max_sched_priorty and + * vcpu threads' to max_sched_priority - 1 in order to avoid + * starvation. So, it must be > sys_min_prio + 1. + */ + fprintf(stderr, "qemu: invalid option maxprio=%d. It must be >= %d\n", + max_sched_priority, sys_min_prio + 1); + exit(1); + } + + if (sys_max_prio < max_sched_priority) { + fprintf(stderr, "qemu: invalid option maxprio=%d. It must be <= %d\n", + max_sched_priority, sys_max_prio); + exit(1); + } + + qemu_init_realtime(rt_pol, max_sched_priority); + + if (mlockall(MCL_CURRENT | MCL_FUTURE)) { + perror("mlockall"); + exit(1); + } +} diff --git a/qemu-config.c b/qemu-config.c index 2188c3e..b945d07 100644 --- a/qemu-config.c +++ b/qemu-config.c @@ -647,6 +647,21 @@ static QemuOptsList qemu_object_opts = { }, }; +static QemuOptsList qemu_realtime_opts = { + .name = "realtime", + .head = QTAILQ_HEAD_INITIALIZER(qemu_realtime_opts.head), + .desc = { + { + .name = "maxprio", + .type = QEMU_OPT_NUMBER, + }, { + .name = "policy", + .type = QEMU_OPT_STRING, + }, + { /* end of list */ } + }, +}; + static QemuOptsList *vm_config_groups[32] = { &qemu_drive_opts, &qemu_chardev_opts, @@ -664,6 +679,7 @@ static QemuOptsList *vm_config_groups[32] = { &qemu_sandbox_opts, &qemu_add_fd_opts, &qemu_object_opts, + &qemu_realtime_opts, NULL, }; diff --git a/qemu-options.hx b/qemu-options.hx index 9df0cde..968a20a 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -2447,6 +2447,15 @@ STEXI Do not start CPU at startup (you must type 'c' in the monitor). ETEXI +DEF("realtime", HAS_ARG, QEMU_OPTION_realtime, + "-realtime maxprio=prio[,policy=pol]\n", + QEMU_ARCH_ALL) +STEXI +@item -realtime maxprio=@var{prio}[,policy=@var{pold}] +@findex -realtime +run qemu as realtime process with priority @var{prio} and policy @var{pol}. +ETEXI + DEF("gdb", HAS_ARG, QEMU_OPTION_gdb, \ "-gdb dev wait for gdb connection on 'dev'\n", QEMU_ARCH_ALL) STEXI diff --git a/qemu-thread-posix.c b/qemu-thread-posix.c index 7be292e..10a97cc 100644 --- a/qemu-thread-posix.c +++ b/qemu-thread-posix.c @@ -22,6 +22,15 @@ #include <sys/time.h> #include "qemu/thread.h" +struct QemuRealtimeInfo { + bool is_realtime; + int policy; + int max_priority; +}; +typedef struct QemuRealtimeInfo QemuRealtimeInfo; + +static QemuRealtimeInfo rt_info; + static void error_exit(int err, const char *msg) { fprintf(stderr, "qemu: %s: %s\n", msg, strerror(err)); @@ -324,3 +333,21 @@ void *qemu_thread_join(QemuThread *thread) } return ret; } + +void qemu_init_realtime(int rt_sched_policy, int max_sched_priority) +{ + rt_info.is_realtime = true; + rt_info.policy = rt_sched_policy; + rt_info.max_priority = max_sched_priority; +} + +bool qemu_realtime_is_enabled(void) +{ + return rt_info.is_realtime; +} + +void qemu_realtime_get_parameters(int *policy, int *max_priority) +{ + *policy = rt_info.policy; + *max_priority = rt_info.max_priority; +} diff --git a/qemu-thread-win32.c b/qemu-thread-win32.c index 8037b39..3beebcf 100644 --- a/qemu-thread-win32.c +++ b/qemu-thread-win32.c @@ -369,3 +369,16 @@ bool qemu_thread_is_self(QemuThread *thread) { return GetCurrentThreadId() == thread->tid; } + +void qemu_init_realtime(int rt_sched_policy, int max_sched_priority) +{ +} + +bool qemu_realtime_is_enabled(void) +{ + return false; +} + +void qemu_realtime_get_parameters(int *policy, int *max_priority) +{ +} diff --git a/vl.c b/vl.c index e6a8d89..c310587 100644 --- a/vl.c +++ b/vl.c @@ -29,6 +29,7 @@ #include <sys/time.h> #include <zlib.h> #include "qemu/bitmap.h" +#include "qemu-thread.h" /* Needed early for CONFIG_BSD etc. */ #include "config-host.h" @@ -1148,6 +1149,17 @@ static void smp_parse(const char *optarg) max_cpus = smp_cpus; } +static void configure_realtime(QemuOpts *opts) +{ + const char *pol; + int prio; + + pol = qemu_opt_get(opts, "policy"); + prio = qemu_opt_get_number(opts, "maxprio", 1); + + os_prioritize(pol, prio); +} + /***********************************************************/ /* USB devices */ @@ -1754,9 +1766,22 @@ static void main_loop(void) { bool nonblocking; int last_io = 0; + int rt_policy, rt_priority; + struct sched_param sp; #ifdef CONFIG_PROFILER int64_t ti; #endif + + if (qemu_realtime_is_enabled()) { + qemu_realtime_get_parameters(&rt_policy, &rt_priority); + + sp.sched_priority = rt_priority;; + if (sched_setscheduler(0, rt_policy, &sp) < 0) { + perror("Setting realtime policy failed"); + exit(1); + } + } + do { nonblocking = !kvm_enabled() && last_io > 0; #ifdef CONFIG_PROFILER @@ -2758,6 +2783,14 @@ int main(int argc, char **argv, char **envp) } numa_add(optarg); break; + case QEMU_OPTION_realtime: + opts = qemu_opts_parse(qemu_find_opts("realtime"), optarg, 0); + if (!opts) { + fprintf(stderr, "parse error: %s\n", optarg); + exit(1); + } + configure_realtime(opts); + break; case QEMU_OPTION_display: display_type = select_display(optarg); break; -- 1.7.11.7 |
|
From: Don Z. <dz...@re...> - 2012-12-20 19:09:42
|
On Thu, Dec 20, 2012 at 03:11:14PM +0000, Seiji Aguchi wrote: > Changelog > v2 -> v3 > - Merge modifications of pstore part in 2/2 to 1/2. > - Rename pstore_is_non_blocking_path() to pstore_cannot_block_path(). > > v1 -> v2 > - Erase a logic checking the number of online cpus. > - Create a patchset to fix deadlocking issue in both pstore filesystem and > efi_pstore driver. > - Introduce a function, is_non_blocking_path(), to check if pstore > is in panic and emergency-restart paths (PATCH 1/2) > - Avoid efi_pstore_driver is blocked in non-blocking paths > such as nmi, panic and emergency-restart paths (PATCH 2/2) > > [Issue] > > There are some paths in kernel which shouldn't be blocked, > like NMI, panic case after stopping cpus, emergency-restart. I am ok with it. Acked-by: Don Zickus <dz...@re...> |
|
From: Seiji A. <sei...@hd...> - 2012-12-20 15:13:03
|
[Issue]
There is a scenario which efi_pstore may hang up:
- cpuA grabs efivars->lock
- cpuB panics and calls smp_send_stop
- smp_send_stop sends IRQ to cpuA
- after 1 second, cpuB gives up on cpuA and sends an NMI instead
- cpuA is now in an NMI handler while still holding efivars->lock
- cpuB is deadlocked
This case may happen if a firmware has a bug and
cpuA is stuck talking with it.
[Solution]
This patch changes a spin_lock to a spin_trylock in non-blocking paths.
and if the spin_lock has already taken by another cpu,
it returns without accessing to a firmware to avoid the deadlock.
Signed-off-by: Seiji Aguchi <sei...@hd...>
---
drivers/firmware/efivars.c | 11 ++++++++++-
1 files changed, 10 insertions(+), 1 deletions(-)
diff --git a/drivers/firmware/efivars.c b/drivers/firmware/efivars.c
index 7b1c374..ef5070d 100644
--- a/drivers/firmware/efivars.c
+++ b/drivers/firmware/efivars.c
@@ -1209,7 +1209,16 @@ static int efi_pstore_write(enum pstore_type_id type,
u64 storage_space, remaining_space, max_variable_size;
efi_status_t status = EFI_NOT_FOUND;
- spin_lock(&efivars->lock);
+ if (pstore_cannot_block_path(reason)) {
+ /*
+ * If the lock is taken by another cpu in non-blocking path,
+ * this driver returns without entering firmware to avoid
+ * hanging up.
+ */
+ if (!spin_trylock(&efivars->lock))
+ return -EBUSY;
+ } else
+ spin_lock(&efivars->lock);
/*
* Check if there is a space enough to log.
-- 1.7.1
|
|
From: Seiji A. <sei...@hd...> - 2012-12-20 15:12:23
|
[Issue]
When pstore is in panic and emergency-restart paths, it may be blocked
in those paths because it simply takes spin_lock.
This is an example scenario which pstore may hang up in a panic path:
- cpuA grabs psinfo->buf_lock
- cpuB panics and calls smp_send_stop
- smp_send_stop sends IRQ to cpuA
- after 1 second, cpuB gives up on cpuA and sends an NMI instead
- cpuA is now in an NMI handler while still holding buf_lock
- cpuB is deadlocked
This case may happen if a firmware has a bug and
cpuA is stuck talking with it more than one second.
Also, this is a similar scenario in an emergency-restart path:
- cpuA grabs psinfo->buf_lock and stucks in a firmware
- cpuB kicks emergency-restart via either sysrq-b or hangcheck timer.
And then, cpuB is deadlocked by taking psinfo->buf_lock again.
[Solution]
This patch avoids the deadlocking issues in both panic and emergency_restart
paths by introducing a function, pstore_cannot_block_path(), to check if a cpu
can be blocked in current path.
With this patch, pstore is not blocked even if another cpu has
taken a spin_lock, in those paths by changing from spin_lock_irqsave
to spin_trylock_irqsave.
In addition, according to a comment of emergency_restart() in kernel/sys.c,
spin_lock shouldn't be taken in an emergency_restart path to avoid
deadlock. This patch fits the comment below.
<snip>
/**
* emergency_restart - reboot the system
*
* Without shutting down any hardware or taking any locks
* reboot the system. This is called when we know we are in
* trouble so this is our best effort to reboot. This is
* safe to call in interrupt context.
*/
void emergency_restart(void)
<snip>
Signed-off-by: Seiji Aguchi <sei...@hd...>
---
fs/pstore/platform.c | 34 ++++++++++++++++++++++++++++------
include/linux/pstore.h | 6 ++++++
2 files changed, 34 insertions(+), 6 deletions(-)
diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
index 5ea2e77..6bd3369 100644
--- a/fs/pstore/platform.c
+++ b/fs/pstore/platform.c
@@ -96,6 +96,26 @@ static const char *get_reason_str(enum kmsg_dump_reason reason)
}
}
+bool pstore_cannot_block_path(enum kmsg_dump_reason reason)
+{
+ /*
+ * In case of NMI path, pstore shouldn't be blocked
+ * regardless of reason.
+ */
+ if (in_nmi())
+ return true;
+
+ switch (reason) {
+ /* In panic case, other cpus are stopped by smp_send_stop(). */
+ case KMSG_DUMP_PANIC:
+ /* Emergency restart shouldn't be blocked by spin lock. */
+ case KMSG_DUMP_EMERG:
+ return true;
+ default:
+ return false;
+ }
+}
+
/*
* callback from kmsg_dump. (s2,l2) has the most recently
* written bytes, older bytes are in (s1,l1). Save as much
@@ -114,10 +134,12 @@ static void pstore_dump(struct kmsg_dumper *dumper,
why = get_reason_str(reason);
- if (in_nmi()) {
- is_locked = spin_trylock(&psinfo->buf_lock);
- if (!is_locked)
- pr_err("pstore dump routine blocked in NMI, may corrupt error record\n");
+ if (pstore_cannot_block_path(reason)) {
+ is_locked = spin_trylock_irqsave(&psinfo->buf_lock, flags);
+ if (!is_locked) {
+ pr_err("pstore dump routine blocked in %s path, may corrupt error record\n"
+ , in_nmi() ? "NMI" : why);
+ }
} else
spin_lock_irqsave(&psinfo->buf_lock, flags);
oopscount++;
@@ -143,9 +165,9 @@ static void pstore_dump(struct kmsg_dumper *dumper,
total += hsize + len;
part++;
}
- if (in_nmi()) {
+ if (pstore_cannot_block_path(reason)) {
if (is_locked)
- spin_unlock(&psinfo->buf_lock);
+ spin_unlock_irqrestore(&psinfo->buf_lock, flags);
} else
spin_unlock_irqrestore(&psinfo->buf_lock, flags);
}
diff --git a/include/linux/pstore.h b/include/linux/pstore.h
index 1788909..75d0176 100644
--- a/include/linux/pstore.h
+++ b/include/linux/pstore.h
@@ -68,12 +68,18 @@ struct pstore_info {
#ifdef CONFIG_PSTORE
extern int pstore_register(struct pstore_info *);
+extern bool pstore_cannot_block_path(enum kmsg_dump_reason reason);
#else
static inline int
pstore_register(struct pstore_info *psi)
{
return -ENODEV;
}
+static inline bool
+pstore_cannot_block_path(enum kmsg_dump_reason reason)
+{
+ return false;
+}
#endif
#endif /*_LINUX_PSTORE_H*/
-- 1.7.1
|
|
From: Seiji A. <sei...@hd...> - 2012-12-20 15:11:30
|
Changelog
v2 -> v3
- Merge modifications of pstore part in 2/2 to 1/2.
- Rename pstore_is_non_blocking_path() to pstore_cannot_block_path().
v1 -> v2
- Erase a logic checking the number of online cpus.
- Create a patchset to fix deadlocking issue in both pstore filesystem and
efi_pstore driver.
- Introduce a function, is_non_blocking_path(), to check if pstore
is in panic and emergency-restart paths (PATCH 1/2)
- Avoid efi_pstore_driver is blocked in non-blocking paths
such as nmi, panic and emergency-restart paths (PATCH 2/2)
[Issue]
There are some paths in kernel which shouldn't be blocked,
like NMI, panic case after stopping cpus, emergency-restart.
On the other hand, current pstore avoids blocking in a NMI path
but it may be blocked in other paths.
Also, an efi_pstore driver may be blocked in all of those paths
because it simply takes a spin lock at writing time.
If they are blocked in those paths, the system will hang up and
it has a big impact for users.
Here is an example scenario which pstore is blocked in panic path.
- cpuA grabs psinfo->buf_lock
- cpuB panics and calls smp_send_stop
- smp_send_stop sends IRQ to cpuA
after 1 second, cpuB gives up on cpuA and sends an NMI instead
- cpuA is now in an NMI handler while still holding buf_lock.
And then, cpuB is deadlocked by taking efi_pstore->lock again.
This case may happen if a firmware has a bug and cpuA is stuck in it.
[Solution]
This patchset avoids that pstore and efi_pstore driver are blocked
in the non-blocking paths like NMI, panic, and emrgency-restart
by introducing a function checking if they are in those paths.
Please see each patch for detailed explanations.
Seiji Aguchi (2):
[PATCH v3 1/2] pstore: Avoid deadlock in panic and emergency-restart path
[PATCH v3 2/2] efi_pstore: Avoid deadlock in non-blocking paths
drivers/firmware/efivars.c | 11 ++++++++++-
fs/pstore/platform.c | 34 ++++++++++++++++++++++++++++------
include/linux/pstore.h | 6 ++++++
3 files changed, 44 insertions(+), 7 deletions(-)
|
|
From: Seiji A. <sei...@hd...> - 2012-12-20 15:02:24
|
>Though it seems most of patch2 should have been in patch1 (except for the efi part). OK. I will merge pstore part in patch2 to patch1. > > The only nitpick I have is the name 'is_non_blocking_path'. I don't know why, but the name doesn't seem exactly right to me. I was > thinking something like 'pstore_cannot_block_path' or something. But it is just a nitpick, so it doesn't matter either way to me. I will rename the function in next patch. Seiji |
|
From: Seiji A. <sei...@hd...> - 2012-12-18 05:06:30
|
Thank you for reviewing my patch. I will update it in accordance with your comment. Seiji > -----Original Message----- > From: Steven Rostedt [mailto:ro...@go...] > Sent: Monday, December 17, 2012 10:02 PM > To: Seiji Aguchi > Cc: x8...@ke...; lin...@vg...; H. Peter Anvin (hp...@zy...); Thomas Gleixner (tg...@li...); > 'mi...@el...' (mi...@el...); Borislav Petkov (bp...@al...); Satoru Moriya; dle...@li...; linux- > ed...@vg...; Luck, Tony (ton...@in...) > Subject: Re: [RFC][PATCH v6]trace,x86: add x86 irq vector tracepoints > > On Tue, 2012-12-18 at 01:34 +0000, Seiji Aguchi wrote: > > Change log > > > > v5 -> v6 > > - Rebased to 3.7 > > > > v4 -> v5 > > - Rebased to 3.6.0 > > > > - Introduce a logic switching IDT at enabling/disabling TP time > > so that a time penalty makes a zero when tracepoints are disabled. > > This IDT is created only when CONFIG_TRACEPOINTS is enabled. > > > > - Remove arch_irq_vector_entry/exit and add followings again > > so that we can add each tracepoint in a generic way. > > - error_apic_vector > > - thermal_apic_vector > > - threshold_apic_vector > > - spurious_apic_vector > > - x86_platform_ipi_vector > > > > - Drop nmi tracepoints to begin with apic interrupts and discuss a logic switching > > IDT first. > > > > - Move irq_vectors.h in the directory of arch/x86/include/asm/trace because > > I'm not sure if a logic switching IDT is sharable with other architectures. > > > > v3 -> v4 > > - Add a latency measurement of each tracepoint > > - Rebased to 3.6-rc6 > > > > v2 -> v3 > > - Remove an invalidate_tlb_vector event because it was replaced by a call function vector > > in a following commit. > > > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit; > > h=52aec3308db85f4e9f5c8b9f5dc4fbd0138c6fa4 > > > > v1 -> v2 > > - Modify variable name from irq to vector. > > - Merge arch-specific tracepoints below to an arch_irq_vector_entry/exit. > > - error_apic_vector > > - thermal_apic_vector > > - threshold_apic_vector > > - spurious_apic_vector > > - x86_platform_ipi_vector > > > > [Purpose of this patch] > > > > As Vaibhav explained in the thread below, tracepoints for irq vectors > > are useful. > > > > http://www.spinics.net/lists/mm-commits/msg85707.html > > > > <snip> > > The current interrupt traces from irq_handler_entry and > > irq_handler_exit provide when an interrupt is handled. They provide > > good data about when the system has switched to kernel space and how > > it affects the currently running processes. > > > > There are some IRQ vectors which trigger the system into kernel space, > > which are not handled in generic IRQ handlers. Tracing such events > > gives us the information about IRQ interaction with other system events. > > > > The trace also tells where the system is spending its time. We want > > to know which cores are handling interrupts and how they are affecting > > other processes in the system. Also, the trace provides information > > about when the cores are idle and which interrupts are changing that state. > > <snip> > > > > On the other hand, my usecase is tracing just local timer event and > > getting a value of instruction pointer. > > > > I suggested to add an argument local timer event to get instruction pointer before. > > But there is another way to get it with external module like systemtap. > > So, I don't need to add any argument to irq vector tracepoints now. > > > > [Patch Description] > > > > Vaibhav's patch shared a trace point ,irq_vector_entry/irq_vector_exit, in all events. > > But there is an above use case to trace specific irq_vector rather than tracing all events. > > In this case, we are concerned about overhead due to unwanted events. > > > > This patch adds following tracepoints instead of introducing irq_vector_entry/exit. > > so that we can enable them independently. > > - local_timer_vector > > - reschedule_vector > > - call_function_vector > > - call_function_single_vector > > - irq_work_entry_vector > > - error_apic_vector > > - thermal_apic_vector > > - threshold_apic_vector > > - spurious_apic_vector > > - x86_platform_ipi_vector > > > > Also, it introduces a logic switching IDT at enabling/disabling time > > so that a time penalty makes a complete zero when tracepoints are disabled. Detailed explanations are as follows. > > - Create new irq handlers inserted tracepoints by using macros. > > - Create a new IDT, trace_idt_table, at boot time by duplicating original IDT, idt table, and > > registering the new handers for tracpoints. > > - Switch IDT to new one at enabling TP time. > > - Restore to an original IDT at disabling TP time. > > The new IDT is created only when CONFIG_TRACEPOINTS is enabled to avoid being used for other purposes. > > > > Signed-off-by: Seiji Aguchi <sei...@hd...> > > --- > > arch/x86/include/asm/desc.h | 27 +++++ > > arch/x86/include/asm/entry_arch.h | 32 +++++ > > arch/x86/include/asm/hw_irq.h | 14 +++ > > arch/x86/kernel/Makefile | 1 + > > arch/x86/kernel/apic/apic.c | 186 +++++++++++++++++------------- > > arch/x86/kernel/cpu/mcheck/therm_throt.c | 26 +++-- > > arch/x86/kernel/cpu/mcheck/threshold.c | 27 +++-- > > arch/x86/kernel/entry_64.S | 33 ++++++ > > arch/x86/kernel/head_64.S | 6 + > > arch/x86/kernel/irq.c | 44 ++++--- > > arch/x86/kernel/irq_work.c | 22 +++- > > arch/x86/kernel/irqinit.c | 2 + > > arch/x86/kernel/smp.c | 68 ++++++++---- > > 13 files changed, 345 insertions(+), 143 deletions(-) > > > > diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h > > index 8bf1c06..52becf4 100644 > > --- a/arch/x86/include/asm/desc.h > > +++ b/arch/x86/include/asm/desc.h > > @@ -345,6 +345,33 @@ static inline void set_intr_gate(unsigned int n, void *addr) > > _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS); } > > > > +#ifdef CONFIG_TRACEPOINTS > > +extern gate_desc trace_idt_table[]; > > +extern void trace_idt_table_init(void); static inline void > > +_trace_set_gate(int gate, unsigned type, void *addr, > > + unsigned dpl, unsigned ist, unsigned seg) { > > + gate_desc s; > > + > > + pack_gate(&s, type, (unsigned long)addr, dpl, ist, seg); > > + /* > > + * does not need to be atomic because it is only done once at > > + * setup time > > + */ > > + write_idt_entry(trace_idt_table, gate, &s); } > > + > > +static inline void trace_set_intr_gate(unsigned int n, void *addr) { > > + BUG_ON((unsigned)n > 0xFF); > > + _trace_set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS); } #else > > +static inline void trace_idt_table_init(void) { } #endif > > + > > extern int first_system_vector; > > /* used_vectors is BITMAP for irq is not managed by percpu vector_irq > > */ extern unsigned long used_vectors[]; diff --git > > a/arch/x86/include/asm/entry_arch.h > > b/arch/x86/include/asm/entry_arch.h > > index 40afa00..8ef3900 100644 > > --- a/arch/x86/include/asm/entry_arch.h > > +++ b/arch/x86/include/asm/entry_arch.h > > @@ -45,3 +45,35 @@ > > BUILD_INTERRUPT(threshold_interrupt,THRESHOLD_APIC_VECTOR) > > #endif > > > > #endif > > + > > +#ifdef CONFIG_TRACEPOINTS > > +#ifdef CONFIG_SMP > > +BUILD_INTERRUPT(trace_reschedule_interrupt, RESCHEDULE_VECTOR) > > +BUILD_INTERRUPT(trace_call_function_interrupt, CALL_FUNCTION_VECTOR) > > +BUILD_INTERRUPT(trace_call_function_single_interrupt, > > + CALL_FUNCTION_SINGLE_VECTOR) > > +#endif > > + > > +BUILD_INTERRUPT(trace_x86_platform_ipi, X86_PLATFORM_IPI_VECTOR) > > + > > +#ifdef CONFIG_X86_LOCAL_APIC > > + > > +BUILD_INTERRUPT(trace_apic_timer_interrupt, LOCAL_TIMER_VECTOR) > > +BUILD_INTERRUPT(trace_error_interrupt, ERROR_APIC_VECTOR) > > +BUILD_INTERRUPT(trace_spurious_interrupt, SPURIOUS_APIC_VECTOR) > > + > > +#ifdef CONFIG_IRQ_WORK > > +BUILD_INTERRUPT(trace_irq_work_interrupt, IRQ_WORK_VECTOR) #endif > > + > > +#ifdef CONFIG_X86_THERMAL_VECTOR > > +BUILD_INTERRUPT(trace_thermal_interrupt, THERMAL_APIC_VECTOR) #endif > > + > > +#ifdef CONFIG_X86_MCE_THRESHOLD > > +BUILD_INTERRUPT(trace_threshold_interrupt, THRESHOLD_APIC_VECTOR) > > +#endif > > + > > +#endif > > Um, you could save all the above duplication if you just did: > > in arch/x86/kernel/entry_32.S > > #ifdef CONFIG_TRACING > #define BUILD_TRACE_INTERRUPT(name, nr) \ > BUILD_INTERRUPT3(trace_##name, nr, smp_trace_##name) #else #define BUILD_TRACE_INTERRUPT(name, nr) #endif > > #define BUILD_INTERRUPT(name, nr) \ > BUILD_INTERRUPT3(name, nr, smp_##name) \ > BUILD_TRACE_INTERRUPT(name, nr) > > #include <asm/entry_arch.h> > > > > + > > +#endif /* CONFIG_TRACEPOINTS */ > > diff --git a/arch/x86/include/asm/hw_irq.h > > b/arch/x86/include/asm/hw_irq.h index eb92a6e..4472a78 100644 > > --- a/arch/x86/include/asm/hw_irq.h > > +++ b/arch/x86/include/asm/hw_irq.h > > @@ -76,6 +76,20 @@ extern void threshold_interrupt(void); extern void > > call_function_interrupt(void); extern void > > call_function_single_interrupt(void); > > > > +#ifdef CONFIG_TRACEPOINTS > > +/* Interrupt handlers registered during init_IRQ */ extern void > > +trace_apic_timer_interrupt(void); > > +extern void trace_x86_platform_ipi(void); extern void > > +trace_error_interrupt(void); extern void > > +trace_irq_work_interrupt(void); extern void > > +trace_spurious_interrupt(void); extern void > > +trace_thermal_interrupt(void); extern void > > +trace_reschedule_interrupt(void); > > +extern void trace_threshold_interrupt(void); extern void > > +trace_call_function_interrupt(void); > > +extern void trace_call_function_single_interrupt(void); > > +#endif /* CONFIG_TRACEPOINTS */ > > + > > /* IOAPIC */ > > #define IO_APIC_IRQ(x) (((x) >= NR_IRQS_LEGACY) || ((1<<(x)) & > > io_apic_irqs)) extern unsigned long io_apic_irqs; diff --git > > a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index > > 34e923a..33504ea 100644 > > --- a/arch/x86/kernel/Makefile > > +++ b/arch/x86/kernel/Makefile > > @@ -100,6 +100,7 @@ obj-$(CONFIG_OF) += devicetree.o > > obj-$(CONFIG_UPROBES) += uprobes.o > > > > obj-$(CONFIG_PERF_EVENTS) += perf_regs.o > > +obj-$(CONFIG_TRACEPOINTS) += tracepoint.o > > > > ### > > # 64 bit specific files > > diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c > > index b994cc8..20c4b1f 100644 > > --- a/arch/x86/kernel/apic/apic.c > > +++ b/arch/x86/kernel/apic/apic.c > > @@ -55,6 +55,9 @@ > > #include <asm/tsc.h> > > #include <asm/hypervisor.h> > > > > +#define CREATE_TRACE_POINTS > > +#include <asm/trace/irq_vectors.h> > > + > > unsigned int num_processors; > > > > unsigned disabled_cpus __cpuinitdata; @@ -912,27 +915,34 @@ static > > void local_apic_timer_interrupt(void) > > * [ if a single-CPU system runs an SMP kernel then we call the local > > * interrupt as well. Thus we cannot inline the local irq ... ] > > */ > > -void __irq_entry smp_apic_timer_interrupt(struct pt_regs *regs) -{ > > - struct pt_regs *old_regs = set_irq_regs(regs); > > - > > - /* > > - * NOTE! We'd better ACK the irq immediately, > > - * because timer handling can be slow. > > - */ > > - ack_APIC_irq(); > > - /* > > - * update_process_times() expects us to have done irq_enter(). > > - * Besides, if we don't timer interrupts ignore the global > > - * interrupt lock, which is the WrongThing (tm) to do. > > - */ > > - irq_enter(); > > - exit_idle(); > > - local_apic_timer_interrupt(); > > - irq_exit(); > > - > > - set_irq_regs(old_regs); > > -} > > +#define SMP_APIC_TIMER_INTERRUPT(trace, trace_enter, trace_exit) \ > > +void __irq_entry smp_##trace##apic_timer_interrupt(struct pt_regs *regs)\ > > +{ \ > > + struct pt_regs *old_regs = set_irq_regs(regs); \ > > + \ > > + /* \ > > + * NOTE! We'd better ACK the irq immediately, \ > > + * because timer handling can be slow. \ > > + */ \ > > + ack_APIC_irq(); \ > > + /* \ > > + * update_process_times() expects us to have done irq_enter(). \ > > + * Besides, if we don't timer interrupts ignore the global \ > > + * interrupt lock, which is the WrongThing (tm) to do. \ > > + */ \ > > + irq_enter(); \ > > + exit_idle(); \ > > + trace_enter; \ > > + local_apic_timer_interrupt(); \ > > + trace_exit; \ > > + irq_exit(); \ > > + \ > > + set_irq_regs(old_regs); \ > > +} > > + > > +SMP_APIC_TIMER_INTERRUPT(,,) > > +SMP_APIC_TIMER_INTERRUPT(trace_, trace_local_timer_entry(LOCAL_TIMER_VECTOR), > > + trace_local_timer_exit(LOCAL_TIMER_VECTOR)) > > > These macros are just plan ugly as all hell. What about using static inlines and something like: > > static inline void entering_irq(void) > { > irq_enter(); > exit_idle(); > } > > static inline void exiting_irq(void) > { > irq_exit(); > } > > -- Just duplicate the two apic timer interrupts because it has the needed ack first. > > > Then the rest could be: > > static inline void __smp_spurious_interrupt(struct pt_regs *regs) { > u32 v; > > > /* > * Check if this really is a spurious interrupt and ACK it > * if it is a vectored one. Just in case... > * Spurious interrupts should not be ACKed. > */ > v = apic_read(APIC_ISR + ((SPURIOUS_APIC_VECTOR & ~0x1f) >> 1)); > if (v & (1 << (SPURIOUS_APIC_VECTOR & 0x1f))) > ack_APIC_irq(); > > inc_irq_stat(irq_spurious_count); > > /* see sw-dev-man vol 3, chapter 7.4.13.5 */ > pr_info("spurious APIC interrupt on CPU#%d, " > "should never happen.\n", smp_processor_id()); > > } > > void smp_spurious_interrupt(struct pt_regs *regs) { > entering_irq(); > __smp_spurious_interrupt(regs); > exiting_irq(); > } > > void smp_trace_spurious_interrupt(struct pt_regs *regs) { > entering_irq(); > trace_spurious_apic_enter(SPURIOUS_APIC_VECTOR); > __smp_spurious_interrupt(regs); > trace_spurious_apic_exit(SPURIOUS_APIC_VECTOR); > exiting_irq(); > } > > This is so much more readable (and maintainable) than using those ugly macros. > > -- Steve > > > > > int setup_profiling_timer(unsigned int multiplier) { @@ -1908,71 > > +1918,91 @@ int __init APIC_init_uniprocessor(void) > > /* > > * This interrupt should _never_ happen with our APIC/SMP architecture > > */ > > -void smp_spurious_interrupt(struct pt_regs *regs) -{ > > - u32 v; > > - > > - irq_enter(); > > - exit_idle(); > > - /* > > - * Check if this really is a spurious interrupt and ACK it > > - * if it is a vectored one. Just in case... > > - * Spurious interrupts should not be ACKed. > > - */ > > - v = apic_read(APIC_ISR + ((SPURIOUS_APIC_VECTOR & ~0x1f) >> 1)); > > - if (v & (1 << (SPURIOUS_APIC_VECTOR & 0x1f))) > > - ack_APIC_irq(); > > - > > - inc_irq_stat(irq_spurious_count); > > - > > - /* see sw-dev-man vol 3, chapter 7.4.13.5 */ > > - pr_info("spurious APIC interrupt on CPU#%d, " > > - "should never happen.\n", smp_processor_id()); > > - irq_exit(); > > -} > > +#define SMP_SPURIOUS_INTERRUPT(trace, trace_enter, trace_exit) \ > > +void smp_##trace##spurious_interrupt(struct pt_regs *regs) \ > > +{ \ > > + u32 v; \ > > + \ > > + irq_enter(); \ > > + exit_idle(); \ > > + trace_enter; \ > > + /* \ > > + * Check if this really is a spurious interrupt and ACK it \ > > + * if it is a vectored one. Just in case... \ > > + * Spurious interrupts should not be ACKed. \ > > + */ \ > > + v = apic_read(APIC_ISR + ((SPURIOUS_APIC_VECTOR & ~0x1f) >> 1));\ > > + if (v & (1 << (SPURIOUS_APIC_VECTOR & 0x1f))) \ > > + ack_APIC_irq(); \ > > + \ > > + inc_irq_stat(irq_spurious_count); \ > > + \ > > + /* see sw-dev-man vol 3, chapter 7.4.13.5 */ \ > > + pr_info("spurious APIC interrupt on CPU#%d, " \ > > + "should never happen.\n", smp_processor_id()); \ > > + trace_exit; \ > > + irq_exit(); \ > > +} > > + > > +SMP_SPURIOUS_INTERRUPT(,,) > > +SMP_SPURIOUS_INTERRUPT(trace_, trace_spurious_apic_entry(SPURIOUS_APIC_VECTOR), > > + trace_spurious_apic_exit(SPURIOUS_APIC_VECTOR)) > > > > /* > > * This interrupt should never happen with our APIC/SMP architecture > > */ > > -void smp_error_interrupt(struct pt_regs *regs) -{ > > - u32 v0, v1; > > - u32 i = 0; > > - static const char * const error_interrupt_reason[] = { > > - "Send CS error", /* APIC Error Bit 0 */ > > - "Receive CS error", /* APIC Error Bit 1 */ > > - "Send accept error", /* APIC Error Bit 2 */ > > - "Receive accept error", /* APIC Error Bit 3 */ > > - "Redirectable IPI", /* APIC Error Bit 4 */ > > - "Send illegal vector", /* APIC Error Bit 5 */ > > - "Received illegal vector", /* APIC Error Bit 6 */ > > - "Illegal register address", /* APIC Error Bit 7 */ > > - }; > > - > > - irq_enter(); > > - exit_idle(); > > - /* First tickle the hardware, only then report what went on. -- REW */ > > - v0 = apic_read(APIC_ESR); > > - apic_write(APIC_ESR, 0); > > - v1 = apic_read(APIC_ESR); > > - ack_APIC_irq(); > > - atomic_inc(&irq_err_count); > > - > > - apic_printk(APIC_DEBUG, KERN_DEBUG "APIC error on CPU%d: %02x(%02x)", > > - smp_processor_id(), v0 , v1); > > - > > - v1 = v1 & 0xff; > > - while (v1) { > > - if (v1 & 0x1) > > - apic_printk(APIC_DEBUG, KERN_CONT " : %s", error_interrupt_reason[i]); > > - i++; > > - v1 >>= 1; > > - } > > - > > - apic_printk(APIC_DEBUG, KERN_CONT "\n"); > > - > > - irq_exit(); > > -} > |
|
From: Steven R. <ro...@go...> - 2012-12-18 04:49:10
|
On Tue, 2012-12-18 at 01:34 +0000, Seiji Aguchi wrote: > Change log > > v5 -> v6 > - Rebased to 3.7 > > v4 -> v5 > - Rebased to 3.6.0 > > - Introduce a logic switching IDT at enabling/disabling TP time > so that a time penalty makes a zero when tracepoints are disabled. > This IDT is created only when CONFIG_TRACEPOINTS is enabled. > > - Remove arch_irq_vector_entry/exit and add followings again > so that we can add each tracepoint in a generic way. > - error_apic_vector > - thermal_apic_vector > - threshold_apic_vector > - spurious_apic_vector > - x86_platform_ipi_vector > > - Drop nmi tracepoints to begin with apic interrupts and discuss a logic switching > IDT first. > > - Move irq_vectors.h in the directory of arch/x86/include/asm/trace because > I'm not sure if a logic switching IDT is sharable with other architectures. > > v3 -> v4 > - Add a latency measurement of each tracepoint > - Rebased to 3.6-rc6 > > v2 -> v3 > - Remove an invalidate_tlb_vector event because it was replaced by a call function vector > in a following commit. > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=52aec3308db85f4e9f5c8b9f5dc4fbd0138c6fa4 > > v1 -> v2 > - Modify variable name from irq to vector. > - Merge arch-specific tracepoints below to an arch_irq_vector_entry/exit. > - error_apic_vector > - thermal_apic_vector > - threshold_apic_vector > - spurious_apic_vector > - x86_platform_ipi_vector > > [Purpose of this patch] > > As Vaibhav explained in the thread below, tracepoints for irq vectors > are useful. > > http://www.spinics.net/lists/mm-commits/msg85707.html > > <snip> > The current interrupt traces from irq_handler_entry and irq_handler_exit > provide when an interrupt is handled. They provide good data about when > the system has switched to kernel space and how it affects the currently > running processes. > > There are some IRQ vectors which trigger the system into kernel space, > which are not handled in generic IRQ handlers. Tracing such events gives > us the information about IRQ interaction with other system events. > > The trace also tells where the system is spending its time. We want to > know which cores are handling interrupts and how they are affecting other > processes in the system. Also, the trace provides information about when > the cores are idle and which interrupts are changing that state. > <snip> > > On the other hand, my usecase is tracing just local timer event and > getting a value of instruction pointer. > > I suggested to add an argument local timer event to get instruction pointer before. > But there is another way to get it with external module like systemtap. > So, I don't need to add any argument to irq vector tracepoints now. > > [Patch Description] > > Vaibhav's patch shared a trace point ,irq_vector_entry/irq_vector_exit, in all events. > But there is an above use case to trace specific irq_vector rather than tracing all events. > In this case, we are concerned about overhead due to unwanted events. > > This patch adds following tracepoints instead of introducing irq_vector_entry/exit. > so that we can enable them independently. > - local_timer_vector > - reschedule_vector > - call_function_vector > - call_function_single_vector > - irq_work_entry_vector > - error_apic_vector > - thermal_apic_vector > - threshold_apic_vector > - spurious_apic_vector > - x86_platform_ipi_vector > > Also, it introduces a logic switching IDT at enabling/disabling time so that a time penalty makes > a complete zero when tracepoints are disabled. Detailed explanations are as follows. > - Create new irq handlers inserted tracepoints by using macros. > - Create a new IDT, trace_idt_table, at boot time by duplicating original IDT, idt table, and > registering the new handers for tracpoints. > - Switch IDT to new one at enabling TP time. > - Restore to an original IDT at disabling TP time. > The new IDT is created only when CONFIG_TRACEPOINTS is enabled to avoid being used for other purposes. > > Signed-off-by: Seiji Aguchi <sei...@hd...> > --- > arch/x86/include/asm/desc.h | 27 +++++ > arch/x86/include/asm/entry_arch.h | 32 +++++ > arch/x86/include/asm/hw_irq.h | 14 +++ > arch/x86/kernel/Makefile | 1 + > arch/x86/kernel/apic/apic.c | 186 +++++++++++++++++------------- > arch/x86/kernel/cpu/mcheck/therm_throt.c | 26 +++-- > arch/x86/kernel/cpu/mcheck/threshold.c | 27 +++-- > arch/x86/kernel/entry_64.S | 33 ++++++ > arch/x86/kernel/head_64.S | 6 + > arch/x86/kernel/irq.c | 44 ++++--- > arch/x86/kernel/irq_work.c | 22 +++- > arch/x86/kernel/irqinit.c | 2 + > arch/x86/kernel/smp.c | 68 ++++++++---- > 13 files changed, 345 insertions(+), 143 deletions(-) > > diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h > index 8bf1c06..52becf4 100644 > --- a/arch/x86/include/asm/desc.h > +++ b/arch/x86/include/asm/desc.h > @@ -345,6 +345,33 @@ static inline void set_intr_gate(unsigned int n, void *addr) > _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS); > } > > +#ifdef CONFIG_TRACEPOINTS > +extern gate_desc trace_idt_table[]; > +extern void trace_idt_table_init(void); > +static inline void _trace_set_gate(int gate, unsigned type, void *addr, > + unsigned dpl, unsigned ist, unsigned seg) > +{ > + gate_desc s; > + > + pack_gate(&s, type, (unsigned long)addr, dpl, ist, seg); > + /* > + * does not need to be atomic because it is only done once at > + * setup time > + */ > + write_idt_entry(trace_idt_table, gate, &s); > +} > + > +static inline void trace_set_intr_gate(unsigned int n, void *addr) > +{ > + BUG_ON((unsigned)n > 0xFF); > + _trace_set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS); > +} > +#else > +static inline void trace_idt_table_init(void) > +{ > +} > +#endif > + > extern int first_system_vector; > /* used_vectors is BITMAP for irq is not managed by percpu vector_irq */ > extern unsigned long used_vectors[]; > diff --git a/arch/x86/include/asm/entry_arch.h b/arch/x86/include/asm/entry_arch.h > index 40afa00..8ef3900 100644 > --- a/arch/x86/include/asm/entry_arch.h > +++ b/arch/x86/include/asm/entry_arch.h > @@ -45,3 +45,35 @@ BUILD_INTERRUPT(threshold_interrupt,THRESHOLD_APIC_VECTOR) > #endif > > #endif > + > +#ifdef CONFIG_TRACEPOINTS > +#ifdef CONFIG_SMP > +BUILD_INTERRUPT(trace_reschedule_interrupt, RESCHEDULE_VECTOR) > +BUILD_INTERRUPT(trace_call_function_interrupt, CALL_FUNCTION_VECTOR) > +BUILD_INTERRUPT(trace_call_function_single_interrupt, > + CALL_FUNCTION_SINGLE_VECTOR) > +#endif > + > +BUILD_INTERRUPT(trace_x86_platform_ipi, X86_PLATFORM_IPI_VECTOR) > + > +#ifdef CONFIG_X86_LOCAL_APIC > + > +BUILD_INTERRUPT(trace_apic_timer_interrupt, LOCAL_TIMER_VECTOR) > +BUILD_INTERRUPT(trace_error_interrupt, ERROR_APIC_VECTOR) > +BUILD_INTERRUPT(trace_spurious_interrupt, SPURIOUS_APIC_VECTOR) > + > +#ifdef CONFIG_IRQ_WORK > +BUILD_INTERRUPT(trace_irq_work_interrupt, IRQ_WORK_VECTOR) > +#endif > + > +#ifdef CONFIG_X86_THERMAL_VECTOR > +BUILD_INTERRUPT(trace_thermal_interrupt, THERMAL_APIC_VECTOR) > +#endif > + > +#ifdef CONFIG_X86_MCE_THRESHOLD > +BUILD_INTERRUPT(trace_threshold_interrupt, THRESHOLD_APIC_VECTOR) > +#endif > + > +#endif Um, you could save all the above duplication if you just did: in arch/x86/kernel/entry_32.S #ifdef CONFIG_TRACING #define BUILD_TRACE_INTERRUPT(name, nr) \ BUILD_INTERRUPT3(trace_##name, nr, smp_trace_##name) #else #define BUILD_TRACE_INTERRUPT(name, nr) #endif #define BUILD_INTERRUPT(name, nr) \ BUILD_INTERRUPT3(name, nr, smp_##name) \ BUILD_TRACE_INTERRUPT(name, nr) #include <asm/entry_arch.h> > + > +#endif /* CONFIG_TRACEPOINTS */ > diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h > index eb92a6e..4472a78 100644 > --- a/arch/x86/include/asm/hw_irq.h > +++ b/arch/x86/include/asm/hw_irq.h > @@ -76,6 +76,20 @@ extern void threshold_interrupt(void); > extern void call_function_interrupt(void); > extern void call_function_single_interrupt(void); > > +#ifdef CONFIG_TRACEPOINTS > +/* Interrupt handlers registered during init_IRQ */ > +extern void trace_apic_timer_interrupt(void); > +extern void trace_x86_platform_ipi(void); > +extern void trace_error_interrupt(void); > +extern void trace_irq_work_interrupt(void); > +extern void trace_spurious_interrupt(void); > +extern void trace_thermal_interrupt(void); > +extern void trace_reschedule_interrupt(void); > +extern void trace_threshold_interrupt(void); > +extern void trace_call_function_interrupt(void); > +extern void trace_call_function_single_interrupt(void); > +#endif /* CONFIG_TRACEPOINTS */ > + > /* IOAPIC */ > #define IO_APIC_IRQ(x) (((x) >= NR_IRQS_LEGACY) || ((1<<(x)) & io_apic_irqs)) > extern unsigned long io_apic_irqs; > diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile > index 34e923a..33504ea 100644 > --- a/arch/x86/kernel/Makefile > +++ b/arch/x86/kernel/Makefile > @@ -100,6 +100,7 @@ obj-$(CONFIG_OF) += devicetree.o > obj-$(CONFIG_UPROBES) += uprobes.o > > obj-$(CONFIG_PERF_EVENTS) += perf_regs.o > +obj-$(CONFIG_TRACEPOINTS) += tracepoint.o > > ### > # 64 bit specific files > diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c > index b994cc8..20c4b1f 100644 > --- a/arch/x86/kernel/apic/apic.c > +++ b/arch/x86/kernel/apic/apic.c > @@ -55,6 +55,9 @@ > #include <asm/tsc.h> > #include <asm/hypervisor.h> > > +#define CREATE_TRACE_POINTS > +#include <asm/trace/irq_vectors.h> > + > unsigned int num_processors; > > unsigned disabled_cpus __cpuinitdata; > @@ -912,27 +915,34 @@ static void local_apic_timer_interrupt(void) > * [ if a single-CPU system runs an SMP kernel then we call the local > * interrupt as well. Thus we cannot inline the local irq ... ] > */ > -void __irq_entry smp_apic_timer_interrupt(struct pt_regs *regs) > -{ > - struct pt_regs *old_regs = set_irq_regs(regs); > - > - /* > - * NOTE! We'd better ACK the irq immediately, > - * because timer handling can be slow. > - */ > - ack_APIC_irq(); > - /* > - * update_process_times() expects us to have done irq_enter(). > - * Besides, if we don't timer interrupts ignore the global > - * interrupt lock, which is the WrongThing (tm) to do. > - */ > - irq_enter(); > - exit_idle(); > - local_apic_timer_interrupt(); > - irq_exit(); > - > - set_irq_regs(old_regs); > -} > +#define SMP_APIC_TIMER_INTERRUPT(trace, trace_enter, trace_exit) \ > +void __irq_entry smp_##trace##apic_timer_interrupt(struct pt_regs *regs)\ > +{ \ > + struct pt_regs *old_regs = set_irq_regs(regs); \ > + \ > + /* \ > + * NOTE! We'd better ACK the irq immediately, \ > + * because timer handling can be slow. \ > + */ \ > + ack_APIC_irq(); \ > + /* \ > + * update_process_times() expects us to have done irq_enter(). \ > + * Besides, if we don't timer interrupts ignore the global \ > + * interrupt lock, which is the WrongThing (tm) to do. \ > + */ \ > + irq_enter(); \ > + exit_idle(); \ > + trace_enter; \ > + local_apic_timer_interrupt(); \ > + trace_exit; \ > + irq_exit(); \ > + \ > + set_irq_regs(old_regs); \ > +} > + > +SMP_APIC_TIMER_INTERRUPT(,,) > +SMP_APIC_TIMER_INTERRUPT(trace_, trace_local_timer_entry(LOCAL_TIMER_VECTOR), > + trace_local_timer_exit(LOCAL_TIMER_VECTOR)) These macros are just plan ugly as all hell. What about using static inlines and something like: static inline void entering_irq(void) { irq_enter(); exit_idle(); } static inline void exiting_irq(void) { irq_exit(); } -- Just duplicate the two apic timer interrupts because it has the needed ack first. Then the rest could be: static inline void __smp_spurious_interrupt(struct pt_regs *regs) { u32 v; /* * Check if this really is a spurious interrupt and ACK it * if it is a vectored one. Just in case... * Spurious interrupts should not be ACKed. */ v = apic_read(APIC_ISR + ((SPURIOUS_APIC_VECTOR & ~0x1f) >> 1)); if (v & (1 << (SPURIOUS_APIC_VECTOR & 0x1f))) ack_APIC_irq(); inc_irq_stat(irq_spurious_count); /* see sw-dev-man vol 3, chapter 7.4.13.5 */ pr_info("spurious APIC interrupt on CPU#%d, " "should never happen.\n", smp_processor_id()); } void smp_spurious_interrupt(struct pt_regs *regs) { entering_irq(); __smp_spurious_interrupt(regs); exiting_irq(); } void smp_trace_spurious_interrupt(struct pt_regs *regs) { entering_irq(); trace_spurious_apic_enter(SPURIOUS_APIC_VECTOR); __smp_spurious_interrupt(regs); trace_spurious_apic_exit(SPURIOUS_APIC_VECTOR); exiting_irq(); } This is so much more readable (and maintainable) than using those ugly macros. -- Steve > > int setup_profiling_timer(unsigned int multiplier) > { > @@ -1908,71 +1918,91 @@ int __init APIC_init_uniprocessor(void) > /* > * This interrupt should _never_ happen with our APIC/SMP architecture > */ > -void smp_spurious_interrupt(struct pt_regs *regs) > -{ > - u32 v; > - > - irq_enter(); > - exit_idle(); > - /* > - * Check if this really is a spurious interrupt and ACK it > - * if it is a vectored one. Just in case... > - * Spurious interrupts should not be ACKed. > - */ > - v = apic_read(APIC_ISR + ((SPURIOUS_APIC_VECTOR & ~0x1f) >> 1)); > - if (v & (1 << (SPURIOUS_APIC_VECTOR & 0x1f))) > - ack_APIC_irq(); > - > - inc_irq_stat(irq_spurious_count); > - > - /* see sw-dev-man vol 3, chapter 7.4.13.5 */ > - pr_info("spurious APIC interrupt on CPU#%d, " > - "should never happen.\n", smp_processor_id()); > - irq_exit(); > -} > +#define SMP_SPURIOUS_INTERRUPT(trace, trace_enter, trace_exit) \ > +void smp_##trace##spurious_interrupt(struct pt_regs *regs) \ > +{ \ > + u32 v; \ > + \ > + irq_enter(); \ > + exit_idle(); \ > + trace_enter; \ > + /* \ > + * Check if this really is a spurious interrupt and ACK it \ > + * if it is a vectored one. Just in case... \ > + * Spurious interrupts should not be ACKed. \ > + */ \ > + v = apic_read(APIC_ISR + ((SPURIOUS_APIC_VECTOR & ~0x1f) >> 1));\ > + if (v & (1 << (SPURIOUS_APIC_VECTOR & 0x1f))) \ > + ack_APIC_irq(); \ > + \ > + inc_irq_stat(irq_spurious_count); \ > + \ > + /* see sw-dev-man vol 3, chapter 7.4.13.5 */ \ > + pr_info("spurious APIC interrupt on CPU#%d, " \ > + "should never happen.\n", smp_processor_id()); \ > + trace_exit; \ > + irq_exit(); \ > +} > + > +SMP_SPURIOUS_INTERRUPT(,,) > +SMP_SPURIOUS_INTERRUPT(trace_, trace_spurious_apic_entry(SPURIOUS_APIC_VECTOR), > + trace_spurious_apic_exit(SPURIOUS_APIC_VECTOR)) > > /* > * This interrupt should never happen with our APIC/SMP architecture > */ > -void smp_error_interrupt(struct pt_regs *regs) > -{ > - u32 v0, v1; > - u32 i = 0; > - static const char * const error_interrupt_reason[] = { > - "Send CS error", /* APIC Error Bit 0 */ > - "Receive CS error", /* APIC Error Bit 1 */ > - "Send accept error", /* APIC Error Bit 2 */ > - "Receive accept error", /* APIC Error Bit 3 */ > - "Redirectable IPI", /* APIC Error Bit 4 */ > - "Send illegal vector", /* APIC Error Bit 5 */ > - "Received illegal vector", /* APIC Error Bit 6 */ > - "Illegal register address", /* APIC Error Bit 7 */ > - }; > - > - irq_enter(); > - exit_idle(); > - /* First tickle the hardware, only then report what went on. -- REW */ > - v0 = apic_read(APIC_ESR); > - apic_write(APIC_ESR, 0); > - v1 = apic_read(APIC_ESR); > - ack_APIC_irq(); > - atomic_inc(&irq_err_count); > - > - apic_printk(APIC_DEBUG, KERN_DEBUG "APIC error on CPU%d: %02x(%02x)", > - smp_processor_id(), v0 , v1); > - > - v1 = v1 & 0xff; > - while (v1) { > - if (v1 & 0x1) > - apic_printk(APIC_DEBUG, KERN_CONT " : %s", error_interrupt_reason[i]); > - i++; > - v1 >>= 1; > - } > - > - apic_printk(APIC_DEBUG, KERN_CONT "\n"); > - > - irq_exit(); > -} |
|
From: Seiji A. <sei...@hd...> - 2012-12-18 01:35:50
|
Change log v5 -> v6 - Rebased to 3.7 v4 -> v5 - Rebased to 3.6.0 - Introduce a logic switching IDT at enabling/disabling TP time so that a time penalty makes a zero when tracepoints are disabled. This IDT is created only when CONFIG_TRACEPOINTS is enabled. - Remove arch_irq_vector_entry/exit and add followings again so that we can add each tracepoint in a generic way. - error_apic_vector - thermal_apic_vector - threshold_apic_vector - spurious_apic_vector - x86_platform_ipi_vector - Drop nmi tracepoints to begin with apic interrupts and discuss a logic switching IDT first. - Move irq_vectors.h in the directory of arch/x86/include/asm/trace because I'm not sure if a logic switching IDT is sharable with other architectures. v3 -> v4 - Add a latency measurement of each tracepoint - Rebased to 3.6-rc6 v2 -> v3 - Remove an invalidate_tlb_vector event because it was replaced by a call function vector in a following commit. http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=52aec3308db85f4e9f5c8b9f5dc4fbd0138c6fa4 v1 -> v2 - Modify variable name from irq to vector. - Merge arch-specific tracepoints below to an arch_irq_vector_entry/exit. - error_apic_vector - thermal_apic_vector - threshold_apic_vector - spurious_apic_vector - x86_platform_ipi_vector [Purpose of this patch] As Vaibhav explained in the thread below, tracepoints for irq vectors are useful. http://www.spinics.net/lists/mm-commits/msg85707.html <snip> The current interrupt traces from irq_handler_entry and irq_handler_exit provide when an interrupt is handled. They provide good data about when the system has switched to kernel space and how it affects the currently running processes. There are some IRQ vectors which trigger the system into kernel space, which are not handled in generic IRQ handlers. Tracing such events gives us the information about IRQ interaction with other system events. The trace also tells where the system is spending its time. We want to know which cores are handling interrupts and how they are affecting other processes in the system. Also, the trace provides information about when the cores are idle and which interrupts are changing that state. <snip> On the other hand, my usecase is tracing just local timer event and getting a value of instruction pointer. I suggested to add an argument local timer event to get instruction pointer before. But there is another way to get it with external module like systemtap. So, I don't need to add any argument to irq vector tracepoints now. [Patch Description] Vaibhav's patch shared a trace point ,irq_vector_entry/irq_vector_exit, in all events. But there is an above use case to trace specific irq_vector rather than tracing all events. In this case, we are concerned about overhead due to unwanted events. This patch adds following tracepoints instead of introducing irq_vector_entry/exit. so that we can enable them independently. - local_timer_vector - reschedule_vector - call_function_vector - call_function_single_vector - irq_work_entry_vector - error_apic_vector - thermal_apic_vector - threshold_apic_vector - spurious_apic_vector - x86_platform_ipi_vector Also, it introduces a logic switching IDT at enabling/disabling time so that a time penalty makes a complete zero when tracepoints are disabled. Detailed explanations are as follows. - Create new irq handlers inserted tracepoints by using macros. - Create a new IDT, trace_idt_table, at boot time by duplicating original IDT, idt table, and registering the new handers for tracpoints. - Switch IDT to new one at enabling TP time. - Restore to an original IDT at disabling TP time. The new IDT is created only when CONFIG_TRACEPOINTS is enabled to avoid being used for other purposes. Signed-off-by: Seiji Aguchi <sei...@hd...> --- arch/x86/include/asm/desc.h | 27 +++++ arch/x86/include/asm/entry_arch.h | 32 +++++ arch/x86/include/asm/hw_irq.h | 14 +++ arch/x86/kernel/Makefile | 1 + arch/x86/kernel/apic/apic.c | 186 +++++++++++++++++------------- arch/x86/kernel/cpu/mcheck/therm_throt.c | 26 +++-- arch/x86/kernel/cpu/mcheck/threshold.c | 27 +++-- arch/x86/kernel/entry_64.S | 33 ++++++ arch/x86/kernel/head_64.S | 6 + arch/x86/kernel/irq.c | 44 ++++--- arch/x86/kernel/irq_work.c | 22 +++- arch/x86/kernel/irqinit.c | 2 + arch/x86/kernel/smp.c | 68 ++++++++---- 13 files changed, 345 insertions(+), 143 deletions(-) diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h index 8bf1c06..52becf4 100644 --- a/arch/x86/include/asm/desc.h +++ b/arch/x86/include/asm/desc.h @@ -345,6 +345,33 @@ static inline void set_intr_gate(unsigned int n, void *addr) _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS); } +#ifdef CONFIG_TRACEPOINTS +extern gate_desc trace_idt_table[]; +extern void trace_idt_table_init(void); +static inline void _trace_set_gate(int gate, unsigned type, void *addr, + unsigned dpl, unsigned ist, unsigned seg) +{ + gate_desc s; + + pack_gate(&s, type, (unsigned long)addr, dpl, ist, seg); + /* + * does not need to be atomic because it is only done once at + * setup time + */ + write_idt_entry(trace_idt_table, gate, &s); +} + +static inline void trace_set_intr_gate(unsigned int n, void *addr) +{ + BUG_ON((unsigned)n > 0xFF); + _trace_set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS); +} +#else +static inline void trace_idt_table_init(void) +{ +} +#endif + extern int first_system_vector; /* used_vectors is BITMAP for irq is not managed by percpu vector_irq */ extern unsigned long used_vectors[]; diff --git a/arch/x86/include/asm/entry_arch.h b/arch/x86/include/asm/entry_arch.h index 40afa00..8ef3900 100644 --- a/arch/x86/include/asm/entry_arch.h +++ b/arch/x86/include/asm/entry_arch.h @@ -45,3 +45,35 @@ BUILD_INTERRUPT(threshold_interrupt,THRESHOLD_APIC_VECTOR) #endif #endif + +#ifdef CONFIG_TRACEPOINTS +#ifdef CONFIG_SMP +BUILD_INTERRUPT(trace_reschedule_interrupt, RESCHEDULE_VECTOR) +BUILD_INTERRUPT(trace_call_function_interrupt, CALL_FUNCTION_VECTOR) +BUILD_INTERRUPT(trace_call_function_single_interrupt, + CALL_FUNCTION_SINGLE_VECTOR) +#endif + +BUILD_INTERRUPT(trace_x86_platform_ipi, X86_PLATFORM_IPI_VECTOR) + +#ifdef CONFIG_X86_LOCAL_APIC + +BUILD_INTERRUPT(trace_apic_timer_interrupt, LOCAL_TIMER_VECTOR) +BUILD_INTERRUPT(trace_error_interrupt, ERROR_APIC_VECTOR) +BUILD_INTERRUPT(trace_spurious_interrupt, SPURIOUS_APIC_VECTOR) + +#ifdef CONFIG_IRQ_WORK +BUILD_INTERRUPT(trace_irq_work_interrupt, IRQ_WORK_VECTOR) +#endif + +#ifdef CONFIG_X86_THERMAL_VECTOR +BUILD_INTERRUPT(trace_thermal_interrupt, THERMAL_APIC_VECTOR) +#endif + +#ifdef CONFIG_X86_MCE_THRESHOLD +BUILD_INTERRUPT(trace_threshold_interrupt, THRESHOLD_APIC_VECTOR) +#endif + +#endif + +#endif /* CONFIG_TRACEPOINTS */ diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h index eb92a6e..4472a78 100644 --- a/arch/x86/include/asm/hw_irq.h +++ b/arch/x86/include/asm/hw_irq.h @@ -76,6 +76,20 @@ extern void threshold_interrupt(void); extern void call_function_interrupt(void); extern void call_function_single_interrupt(void); +#ifdef CONFIG_TRACEPOINTS +/* Interrupt handlers registered during init_IRQ */ +extern void trace_apic_timer_interrupt(void); +extern void trace_x86_platform_ipi(void); +extern void trace_error_interrupt(void); +extern void trace_irq_work_interrupt(void); +extern void trace_spurious_interrupt(void); +extern void trace_thermal_interrupt(void); +extern void trace_reschedule_interrupt(void); +extern void trace_threshold_interrupt(void); +extern void trace_call_function_interrupt(void); +extern void trace_call_function_single_interrupt(void); +#endif /* CONFIG_TRACEPOINTS */ + /* IOAPIC */ #define IO_APIC_IRQ(x) (((x) >= NR_IRQS_LEGACY) || ((1<<(x)) & io_apic_irqs)) extern unsigned long io_apic_irqs; diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 34e923a..33504ea 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -100,6 +100,7 @@ obj-$(CONFIG_OF) += devicetree.o obj-$(CONFIG_UPROBES) += uprobes.o obj-$(CONFIG_PERF_EVENTS) += perf_regs.o +obj-$(CONFIG_TRACEPOINTS) += tracepoint.o ### # 64 bit specific files diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index b994cc8..20c4b1f 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -55,6 +55,9 @@ #include <asm/tsc.h> #include <asm/hypervisor.h> +#define CREATE_TRACE_POINTS +#include <asm/trace/irq_vectors.h> + unsigned int num_processors; unsigned disabled_cpus __cpuinitdata; @@ -912,27 +915,34 @@ static void local_apic_timer_interrupt(void) * [ if a single-CPU system runs an SMP kernel then we call the local * interrupt as well. Thus we cannot inline the local irq ... ] */ -void __irq_entry smp_apic_timer_interrupt(struct pt_regs *regs) -{ - struct pt_regs *old_regs = set_irq_regs(regs); - - /* - * NOTE! We'd better ACK the irq immediately, - * because timer handling can be slow. - */ - ack_APIC_irq(); - /* - * update_process_times() expects us to have done irq_enter(). - * Besides, if we don't timer interrupts ignore the global - * interrupt lock, which is the WrongThing (tm) to do. - */ - irq_enter(); - exit_idle(); - local_apic_timer_interrupt(); - irq_exit(); - - set_irq_regs(old_regs); -} +#define SMP_APIC_TIMER_INTERRUPT(trace, trace_enter, trace_exit) \ +void __irq_entry smp_##trace##apic_timer_interrupt(struct pt_regs *regs)\ +{ \ + struct pt_regs *old_regs = set_irq_regs(regs); \ + \ + /* \ + * NOTE! We'd better ACK the irq immediately, \ + * because timer handling can be slow. \ + */ \ + ack_APIC_irq(); \ + /* \ + * update_process_times() expects us to have done irq_enter(). \ + * Besides, if we don't timer interrupts ignore the global \ + * interrupt lock, which is the WrongThing (tm) to do. \ + */ \ + irq_enter(); \ + exit_idle(); \ + trace_enter; \ + local_apic_timer_interrupt(); \ + trace_exit; \ + irq_exit(); \ + \ + set_irq_regs(old_regs); \ +} + +SMP_APIC_TIMER_INTERRUPT(,,) +SMP_APIC_TIMER_INTERRUPT(trace_, trace_local_timer_entry(LOCAL_TIMER_VECTOR), + trace_local_timer_exit(LOCAL_TIMER_VECTOR)) int setup_profiling_timer(unsigned int multiplier) { @@ -1908,71 +1918,91 @@ int __init APIC_init_uniprocessor(void) /* * This interrupt should _never_ happen with our APIC/SMP architecture */ -void smp_spurious_interrupt(struct pt_regs *regs) -{ - u32 v; - - irq_enter(); - exit_idle(); - /* - * Check if this really is a spurious interrupt and ACK it - * if it is a vectored one. Just in case... - * Spurious interrupts should not be ACKed. - */ - v = apic_read(APIC_ISR + ((SPURIOUS_APIC_VECTOR & ~0x1f) >> 1)); - if (v & (1 << (SPURIOUS_APIC_VECTOR & 0x1f))) - ack_APIC_irq(); - - inc_irq_stat(irq_spurious_count); - - /* see sw-dev-man vol 3, chapter 7.4.13.5 */ - pr_info("spurious APIC interrupt on CPU#%d, " - "should never happen.\n", smp_processor_id()); - irq_exit(); -} +#define SMP_SPURIOUS_INTERRUPT(trace, trace_enter, trace_exit) \ +void smp_##trace##spurious_interrupt(struct pt_regs *regs) \ +{ \ + u32 v; \ + \ + irq_enter(); \ + exit_idle(); \ + trace_enter; \ + /* \ + * Check if this really is a spurious interrupt and ACK it \ + * if it is a vectored one. Just in case... \ + * Spurious interrupts should not be ACKed. \ + */ \ + v = apic_read(APIC_ISR + ((SPURIOUS_APIC_VECTOR & ~0x1f) >> 1));\ + if (v & (1 << (SPURIOUS_APIC_VECTOR & 0x1f))) \ + ack_APIC_irq(); \ + \ + inc_irq_stat(irq_spurious_count); \ + \ + /* see sw-dev-man vol 3, chapter 7.4.13.5 */ \ + pr_info("spurious APIC interrupt on CPU#%d, " \ + "should never happen.\n", smp_processor_id()); \ + trace_exit; \ + irq_exit(); \ +} + +SMP_SPURIOUS_INTERRUPT(,,) +SMP_SPURIOUS_INTERRUPT(trace_, trace_spurious_apic_entry(SPURIOUS_APIC_VECTOR), + trace_spurious_apic_exit(SPURIOUS_APIC_VECTOR)) /* * This interrupt should never happen with our APIC/SMP architecture */ -void smp_error_interrupt(struct pt_regs *regs) -{ - u32 v0, v1; - u32 i = 0; - static const char * const error_interrupt_reason[] = { - "Send CS error", /* APIC Error Bit 0 */ - "Receive CS error", /* APIC Error Bit 1 */ - "Send accept error", /* APIC Error Bit 2 */ - "Receive accept error", /* APIC Error Bit 3 */ - "Redirectable IPI", /* APIC Error Bit 4 */ - "Send illegal vector", /* APIC Error Bit 5 */ - "Received illegal vector", /* APIC Error Bit 6 */ - "Illegal register address", /* APIC Error Bit 7 */ - }; - - irq_enter(); - exit_idle(); - /* First tickle the hardware, only then report what went on. -- REW */ - v0 = apic_read(APIC_ESR); - apic_write(APIC_ESR, 0); - v1 = apic_read(APIC_ESR); - ack_APIC_irq(); - atomic_inc(&irq_err_count); - - apic_printk(APIC_DEBUG, KERN_DEBUG "APIC error on CPU%d: %02x(%02x)", - smp_processor_id(), v0 , v1); - - v1 = v1 & 0xff; - while (v1) { - if (v1 & 0x1) - apic_printk(APIC_DEBUG, KERN_CONT " : %s", error_interrupt_reason[i]); - i++; - v1 >>= 1; - } - - apic_printk(APIC_DEBUG, KERN_CONT "\n"); - - irq_exit(); -} +#define SMP_ERROR_INTERRUPT(trace, trace_enter, trace_exit) \ +void smp_##trace##error_interrupt(struct pt_regs *regs) \ +{ \ + u32 v0, v1; \ + u32 i = 0; \ + static const char * const error_interrupt_reason[] = { \ + "Send CS error", /* APIC Error Bit 0 */ \ + "Receive CS error", /* APIC Error Bit 1 */ \ + "Send accept error", /* APIC Error Bit 2 */ \ + "Receive accept error", /* APIC Error Bit 3 */ \ + "Redirectable IPI", /* APIC Error Bit 4 */ \ + "Send illegal vector", /* APIC Error Bit 5 */ \ + "Received illegal vector", /* APIC Error Bit 6 */ \ + "Illegal register address", /* APIC Error Bit 7 */ \ + }; \ + \ + irq_enter(); \ + exit_idle(); \ + trace_enter; \ + /* \ + * First tickle the hardware, only then report what went on. \ + * -- REW \ + */ \ + v0 = apic_read(APIC_ESR); \ + apic_write(APIC_ESR, 0); \ + v1 = apic_read(APIC_ESR); \ + ack_APIC_irq(); \ + atomic_inc(&irq_err_count); \ + \ + apic_printk(APIC_DEBUG, \ + KERN_DEBUG "APIC error on CPU%d: %02x(%02x)", \ + smp_processor_id(), v0 , v1); \ + \ + v1 = v1 & 0xff; \ + while (v1) { \ + if (v1 & 0x1) \ + apic_printk(APIC_DEBUG, KERN_CONT " : %s", \ + error_interrupt_reason[i]); \ + i++; \ + v1 >>= 1; \ + } \ + \ + apic_printk(APIC_DEBUG, KERN_CONT "\n"); \ + \ + trace_exit; \ + irq_exit(); \ +} + + +SMP_ERROR_INTERRUPT(,,) +SMP_ERROR_INTERRUPT(trace_, trace_error_apic_entry(ERROR_APIC_VECTOR), + trace_error_apic_exit(ERROR_APIC_VECTOR)) /** * connect_bsp_APIC - attach the APIC to the interrupt system diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c index 47a1870..a1c86ab 100644 --- a/arch/x86/kernel/cpu/mcheck/therm_throt.c +++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c @@ -23,6 +23,7 @@ #include <linux/init.h> #include <linux/smp.h> #include <linux/cpu.h> +#include <asm/trace/irq_vectors.h> #include <asm/processor.h> #include <asm/apic.h> @@ -378,17 +379,24 @@ static void unexpected_thermal_interrupt(void) static void (*smp_thermal_vector)(void) = unexpected_thermal_interrupt; -asmlinkage void smp_thermal_interrupt(struct pt_regs *regs) -{ - irq_enter(); - exit_idle(); - inc_irq_stat(irq_thermal_count); - smp_thermal_vector(); - irq_exit(); - /* Ack only at the end to avoid potential reentry */ - ack_APIC_irq(); +#define SMP_THERMAL_INTERRUPT(trace, trace_enter, trace_exit) \ +asmlinkage void smp_##trace##thermal_interrupt(struct pt_regs *regs) \ +{ \ + irq_enter(); \ + exit_idle(); \ + trace_enter; \ + inc_irq_stat(irq_thermal_count); \ + smp_thermal_vector(); \ + trace_exit; \ + irq_exit(); \ + /* Ack only at the end to avoid potential reentry */ \ + ack_APIC_irq(); \ } +SMP_THERMAL_INTERRUPT(,,) +SMP_THERMAL_INTERRUPT(trace_, trace_thermal_apic_entry(THERMAL_APIC_VECTOR), + trace_thermal_apic_exit(THERMAL_APIC_VECTOR)) + /* Thermal monitoring depends on APIC, ACPI and clock modulation */ static int intel_thermal_supported(struct cpuinfo_x86 *c) { diff --git a/arch/x86/kernel/cpu/mcheck/threshold.c b/arch/x86/kernel/cpu/mcheck/threshold.c index aa578ca..b7a95c5 100644 --- a/arch/x86/kernel/cpu/mcheck/threshold.c +++ b/arch/x86/kernel/cpu/mcheck/threshold.c @@ -4,6 +4,7 @@ #include <linux/interrupt.h> #include <linux/kernel.h> +#include <asm/trace/irq_vectors.h> #include <asm/irq_vectors.h> #include <asm/apic.h> #include <asm/idle.h> @@ -17,13 +18,21 @@ static void default_threshold_interrupt(void) void (*mce_threshold_vector)(void) = default_threshold_interrupt; -asmlinkage void smp_threshold_interrupt(void) -{ - irq_enter(); - exit_idle(); - inc_irq_stat(irq_threshold_count); - mce_threshold_vector(); - irq_exit(); - /* Ack only at the end to avoid potential reentry */ - ack_APIC_irq(); +#define SMP_THRESHOLD_INTERRUPT(trace, trace_enter, trace_exit) \ +asmlinkage void smp_##trace##threshold_interrupt(void) \ +{ \ + irq_enter(); \ + exit_idle(); \ + trace_enter; \ + inc_irq_stat(irq_threshold_count); \ + mce_threshold_vector(); \ + trace_exit; \ + irq_exit(); \ + /* Ack only at the end to avoid potential reentry */ \ + ack_APIC_irq(); \ } + +SMP_THRESHOLD_INTERRUPT(,,) +SMP_THRESHOLD_INTERRUPT(trace_, + trace_threshold_apic_entry(THRESHOLD_APIC_VECTOR), + trace_threshold_apic_exit(THRESHOLD_APIC_VECTOR)) diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S index 70641af..f2e1a06 100644 --- a/arch/x86/kernel/entry_64.S +++ b/arch/x86/kernel/entry_64.S @@ -1201,6 +1201,39 @@ apicinterrupt IRQ_WORK_VECTOR \ irq_work_interrupt smp_irq_work_interrupt #endif +#ifdef CONFIG_TRACEPOINTS + +apicinterrupt LOCAL_TIMER_VECTOR \ + trace_apic_timer_interrupt smp_trace_apic_timer_interrupt +apicinterrupt X86_PLATFORM_IPI_VECTOR \ + trace_x86_platform_ipi smp_trace_x86_platform_ipi + +apicinterrupt THRESHOLD_APIC_VECTOR \ + trace_threshold_interrupt smp_trace_threshold_interrupt +apicinterrupt THERMAL_APIC_VECTOR \ + trace_thermal_interrupt smp_trace_thermal_interrupt + +#ifdef CONFIG_SMP +apicinterrupt CALL_FUNCTION_SINGLE_VECTOR \ + trace_call_function_single_interrupt \ + smp_trace_call_function_single_interrupt +apicinterrupt CALL_FUNCTION_VECTOR \ + trace_call_function_interrupt smp_trace_call_function_interrupt +apicinterrupt RESCHEDULE_VECTOR \ + trace_reschedule_interrupt smp_trace_reschedule_interrupt +#endif + +apicinterrupt ERROR_APIC_VECTOR \ + trace_error_interrupt smp_trace_error_interrupt +apicinterrupt SPURIOUS_APIC_VECTOR \ + trace_spurious_interrupt smp_trace_spurious_interrupt + +#ifdef CONFIG_IRQ_WORK +apicinterrupt IRQ_WORK_VECTOR \ + trace_irq_work_interrupt smp_trace_irq_work_interrupt +#endif +#endif /* CONFIG_TRACEPOINTS */ + /* * Exception entry points. */ diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index 980053c..fd0e7f9 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -471,6 +471,12 @@ ENTRY(idt_table) ENTRY(nmi_idt_table) .skip IDT_ENTRIES * 16 +#ifdef CONFIG_TRACEPOINTS + .align L1_CACHE_BYTES +ENTRY(trace_idt_table) + .skip IDT_ENTRIES * 16 +#endif + __PAGE_ALIGNED_BSS .align PAGE_SIZE ENTRY(empty_zero_page) diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c index e4595f1..9fd70ad 100644 --- a/arch/x86/kernel/irq.c +++ b/arch/x86/kernel/irq.c @@ -18,6 +18,8 @@ #include <asm/mce.h> #include <asm/hw_irq.h> +#include <asm/trace/irq_vectors.h> + atomic_t irq_err_count; /* Function pointer for generic interrupt vector handling */ @@ -208,26 +210,32 @@ unsigned int __irq_entry do_IRQ(struct pt_regs *regs) /* * Handler for X86_PLATFORM_IPI_VECTOR. */ -void smp_x86_platform_ipi(struct pt_regs *regs) -{ - struct pt_regs *old_regs = set_irq_regs(regs); - - ack_APIC_irq(); - - irq_enter(); - - exit_idle(); - - inc_irq_stat(x86_platform_ipis); - - if (x86_platform_ipi_callback) - x86_platform_ipi_callback(); - - irq_exit(); - - set_irq_regs(old_regs); +#define SMP_X86_PLATFORM_IPI(trace, trace_enter, trace_exit) \ +void smp_##trace##x86_platform_ipi(struct pt_regs *regs) \ +{ \ + struct pt_regs *old_regs = set_irq_regs(regs); \ + \ + ack_APIC_irq(); \ + \ + irq_enter(); \ + \ + exit_idle(); \ + trace_enter; \ + inc_irq_stat(x86_platform_ipis); \ + \ + if (x86_platform_ipi_callback) \ + x86_platform_ipi_callback(); \ + trace_exit; \ + irq_exit(); \ + \ + set_irq_regs(old_regs); \ } +SMP_X86_PLATFORM_IPI(,,) +SMP_X86_PLATFORM_IPI(trace_, + trace_x86_platform_ipi_entry(X86_PLATFORM_IPI_VECTOR), + trace_x86_platform_ipi_exit(X86_PLATFORM_IPI_VECTOR)) + EXPORT_SYMBOL_GPL(vector_used_by_percpu_irq); #ifdef CONFIG_HOTPLUG_CPU diff --git a/arch/x86/kernel/irq_work.c b/arch/x86/kernel/irq_work.c index ca8f703..a669b94 100644 --- a/arch/x86/kernel/irq_work.c +++ b/arch/x86/kernel/irq_work.c @@ -8,16 +8,24 @@ #include <linux/irq_work.h> #include <linux/hardirq.h> #include <asm/apic.h> +#include <asm/trace/irq_vectors.h> -void smp_irq_work_interrupt(struct pt_regs *regs) -{ - irq_enter(); - ack_APIC_irq(); - inc_irq_stat(apic_irq_work_irqs); - irq_work_run(); - irq_exit(); +#define SMP_IRQ_WORK_INTERRUPT(trace, trace_enter, trace_exit) \ +void smp_##trace##irq_work_interrupt(struct pt_regs *regs) \ +{ \ + irq_enter(); \ + ack_APIC_irq(); \ + trace_enter; \ + inc_irq_stat(apic_irq_work_irqs); \ + irq_work_run(); \ + trace_exit; \ + irq_exit(); \ } +SMP_IRQ_WORK_INTERRUPT(,,) +SMP_IRQ_WORK_INTERRUPT(trace_, trace_irq_work_entry(IRQ_WORK_VECTOR), + trace_irq_work_exit(IRQ_WORK_VECTOR)) + void arch_irq_work_raise(void) { #ifdef CONFIG_X86_LOCAL_APIC diff --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c index 6e03b0d..cf76128 100644 --- a/arch/x86/kernel/irqinit.c +++ b/arch/x86/kernel/irqinit.c @@ -251,4 +251,6 @@ void __init native_init_IRQ(void) irq_ctx_init(smp_processor_id()); #endif + + trace_idt_table_init(); } diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c index 48d2b7d..d8e1a2c 100644 --- a/arch/x86/kernel/smp.c +++ b/arch/x86/kernel/smp.c @@ -23,6 +23,7 @@ #include <linux/interrupt.h> #include <linux/cpu.h> #include <linux/gfp.h> +#include <asm/trace/irq_vectors.h> #include <asm/mtrr.h> #include <asm/tlbflush.h> @@ -249,34 +250,57 @@ finish: /* * Reschedule call back. */ -void smp_reschedule_interrupt(struct pt_regs *regs) -{ - ack_APIC_irq(); - inc_irq_stat(irq_resched_count); - scheduler_ipi(); - /* - * KVM uses this interrupt to force a cpu out of guest mode - */ +#define SMP_RESCHEDULE_INTERRUPT(trace, trace_enter, trace_exit) \ +void smp_##trace##reschedule_interrupt(struct pt_regs *regs) \ +{ \ + ack_APIC_irq(); \ + trace_enter; \ + inc_irq_stat(irq_resched_count); \ + scheduler_ipi(); \ + trace_exit; \ + /* \ + * KVM uses this interrupt to force a cpu out of guest mode \ + */ \ } -void smp_call_function_interrupt(struct pt_regs *regs) -{ - ack_APIC_irq(); - irq_enter(); - generic_smp_call_function_interrupt(); - inc_irq_stat(irq_call_count); - irq_exit(); +SMP_RESCHEDULE_INTERRUPT(,,) +SMP_RESCHEDULE_INTERRUPT(trace_, trace_reschedule_entry(RESCHEDULE_VECTOR), + trace_reschedule_exit(RESCHEDULE_VECTOR)) + +#define SMP_CALL_FUNCTION_INTERRUPT(trace, trace_enter, trace_exit) \ +void smp_##trace##call_function_interrupt(struct pt_regs *regs) \ +{ \ + ack_APIC_irq(); \ + irq_enter(); \ + trace_enter; \ + generic_smp_call_function_interrupt(); \ + inc_irq_stat(irq_call_count); \ + trace_exit; \ + irq_exit(); \ } -void smp_call_function_single_interrupt(struct pt_regs *regs) -{ - ack_APIC_irq(); - irq_enter(); - generic_smp_call_function_single_interrupt(); - inc_irq_stat(irq_call_count); - irq_exit(); +SMP_CALL_FUNCTION_INTERRUPT(,,) +SMP_CALL_FUNCTION_INTERRUPT(trace_, + trace_call_function_entry(CALL_FUNCTION_VECTOR), + trace_call_function_exit(CALL_FUNCTION_VECTOR)) + +#define SMP_CALL_FUNCTION_SINGLE_INTERRUPT(trace, trace_enter, trace_exit)\ +void smp_##trace##call_function_single_interrupt(struct pt_regs *regs) \ +{ \ + ack_APIC_irq(); \ + irq_enter(); \ + trace_enter; \ + generic_smp_call_function_single_interrupt(); \ + inc_irq_stat(irq_call_count); \ + trace_exit; \ + irq_exit(); \ } +SMP_CALL_FUNCTION_SINGLE_INTERRUPT(,,) +SMP_CALL_FUNCTION_SINGLE_INTERRUPT(trace_, + trace_call_function_single_entry(CALL_FUNCTION_SINGLE_VECTOR), + trace_call_function_single_exit(CALL_FUNCTION_SINGLE_VECTOR)) + static int __init nonmi_ipi_setup(char *str) { smp_no_nmi_ipi = true; -- 1.7.1 |
|
From: Don Z. <dz...@re...> - 2012-12-17 21:45:28
|
On Mon, Dec 17, 2012 at 08:56:27PM +0000, Seiji Aguchi wrote: > Changelog v1 -> v2 > - Erase a logic checking the number of online cpus. > - Create a patchset to fix deadlocking issue in both pstore filesystem and > efi_pstore driver. > - Introduce a function, is_non_blocking_path(), to check if pstore > is in panic and emergency-restart paths (PATCH 1/2) > - Avoid efi_pstore_driver is blocked in non-blocking paths > such as nmi, panic and emergency-restart paths (PATCH 2/2) This patch set seems cleaner. Though it seems most of patch2 should have been in patch1 (except for the efi part). The only nitpick I have is the name 'is_non_blocking_path'. I don't know why, but the name doesn't seem exactly right to me. I was thinking something like 'pstore_cannot_block_path' or something. But it is just a nitpick, so it doesn't matter either way to me. Cheers, Don > > [Issue] > > There are some paths which shouldn't be blocked in kernel, like NMI, panic case after stopping cpus, > emergency-restart. > > On the other hand, current pstore avoids blocking in a NMI path but it may be blocked in other paths. > Also, an efi_pstore driver may be blocked in all of those paths because it simply takes a spin lock > at writing time. > > If they are blocked in those paths, the system will hang up and it has a big impact for users. > > Here is an example scenario which pstore is blocked in panic path. > > - cpuA grabs psinfo->buf_lock > - cpuB panics and calls smp_send_stop > - smp_send_stop sends IRQ to cpuA > after 1 second, cpuB gives up on cpuA and sends an NMI instead > - cpuA is now in an NMI handler while still holding buf_lock. > And then, cpuB is deadlocked by taking efi_pstore->lock again. > > This case may happen if a firmware has a bug and cpuA is stuck in it. > > [Solution] > > This patchset avoids that pstore and efi_pstore driver are blocked in the non-blocking paths > like NMI, panic, and emrgency-restart by introducing a function checking if they are in those paths. > Please see each patch for detailed explanations. > > Seiji Aguchi (2): > [PATCH v2 1/2] pstore: Avoid deadlock in panic and emergency-restart path > [PATCH v2 2/2] efi_pstore: Avoid deadlock in non-blocking paths > > drivers/firmware/efivars.c | 11 ++++++++++- > fs/pstore/platform.c | 34 ++++++++++++++++++++++++++++------ > include/linux/pstore.h | 6 ++++++ > 3 files changed, 44 insertions(+), 7 deletions(-) |
|
From: Seiji A. <sei...@hd...> - 2012-12-17 20:58:10
|
[Issue]
There is a scenario which efi_pstore may hang up:
- cpuA grabs efivars->lock
- cpuB panics and calls smp_send_stop
- smp_send_stop sends IRQ to cpuA
- after 1 second, cpuB gives up on cpuA and sends an NMI instead
- cpuA is now in an NMI handler while still holding efivars->lock
- cpuB is deadlocked
This case may happen if a firmware has a bug and
cpuA is stuck talking with it.
[Solution]
This patch changes a spin_lock to a spin_trylock in non-blocking paths.
and if the spin_lock has already taken by another cpu,
it returns without accessing to a firmware to avoid the deadlock.
Signed-off-by: Seiji Aguchi <sei...@hd...>
---
drivers/firmware/efivars.c | 11 ++++++++++-
fs/pstore/platform.c | 6 +++---
include/linux/pstore.h | 6 ++++++
3 files changed, 19 insertions(+), 4 deletions(-)
diff --git a/drivers/firmware/efivars.c b/drivers/firmware/efivars.c
index 52c5d89..25d464e 100644
--- a/drivers/firmware/efivars.c
+++ b/drivers/firmware/efivars.c
@@ -1210,7 +1210,16 @@ static int efi_pstore_write(enum pstore_type_id type,
u64 storage_space, remaining_space, max_variable_size;
efi_status_t status = EFI_NOT_FOUND;
- spin_lock(&efivars->lock);
+ if (pstore_is_non_blocking_path(reason)) {
+ /*
+ * If the lock is taken by another cpu in non-blocking path,
+ * this driver returns without entering firmware to avoid
+ * hanging up.
+ */
+ if (!spin_trylock(&efivars->lock))
+ return -EBUSY;
+ } else
+ spin_lock(&efivars->lock);
/*
* Check if there is a space enough to log.
diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
index b87d1de..f959eba 100644
--- a/fs/pstore/platform.c
+++ b/fs/pstore/platform.c
@@ -96,7 +96,7 @@ static const char *get_reason_str(enum kmsg_dump_reason reason)
}
}
-static bool is_non_blocking_path(enum kmsg_dump_reason reason)
+bool pstore_is_non_blocking_path(enum kmsg_dump_reason reason)
{
/*
* In case of NMI path, pstore shouldn't be blocked
@@ -134,7 +134,7 @@ static void pstore_dump(struct kmsg_dumper *dumper,
why = get_reason_str(reason);
- if (is_non_blocking_path(reason)) {
+ if (pstore_is_non_blocking_path(reason)) {
is_locked = spin_trylock_irqsave(&psinfo->buf_lock, flags);
if (!is_locked) {
pr_err("pstore dump routine blocked in %s path, may corrupt error record\n"
@@ -165,7 +165,7 @@ static void pstore_dump(struct kmsg_dumper *dumper,
total += hsize + len;
part++;
}
- if (is_non_blocking_path(reason)) {
+ if (pstore_is_non_blocking_path(reason)) {
if (is_locked)
spin_unlock_irqrestore(&psinfo->buf_lock, flags);
} else
diff --git a/include/linux/pstore.h b/include/linux/pstore.h
index 1788909..595a040 100644
--- a/include/linux/pstore.h
+++ b/include/linux/pstore.h
@@ -68,12 +68,18 @@ struct pstore_info {
#ifdef CONFIG_PSTORE
extern int pstore_register(struct pstore_info *);
+extern bool pstore_is_non_blocking_path(enum kmsg_dump_reason reason);
#else
static inline int
pstore_register(struct pstore_info *psi)
{
return -ENODEV;
}
+static inline bool
+pstore_is_non_blocking_path(enum kmsg_dump_reason reason)
+{
+ return false;
+}
#endif
#endif /*_LINUX_PSTORE_H*/
--
1.7.1
|
|
From: Seiji A. <sei...@hd...> - 2012-12-17 20:57:30
|
[Issue]
When pstore is in panic and emergency-restart paths, it may be blocked
in those paths because it simply takes spin_lock.
This is an example scenario which pstore may hang up in a panic path:
- cpuA grabs psinfo->buf_lock
- cpuB panics and calls smp_send_stop
- smp_send_stop sends IRQ to cpuA
- after 1 second, cpuB gives up on cpuA and sends an NMI instead
- cpuA is now in an NMI handler while still holding buf_lock
- cpuB is deadlocked
This case may happen if a firmware has a bug and
cpuA is stuck talking with it more than one second.
Also, this is a similar scenario in an emergency-restart path:
- cpuA grabs psinfo->buf_lock and stucks in a firmware
- cpuB kicks emergency-restart via either sysrq-b or hangcheck timer.
And then, cpuB is deadlocked by taking psinfo->buf_lock again.
[Solution]
This patch avoids the deadlocking issues in both panic and emergency_restart
paths by introducing a function, is_non_blocking_path(), to check if a cpu
can be blocked in current path.
With this patch, pstore is not blocked even if another cpu has
taken a spin_lock, in those paths by changing from spin_lock_irqsave
to spin_trylock_irqsave.
In addition, according to a comment of emergency_restart() in kernel/sys.c,
spin_lock shouldn't be taken in an emergency_restart path to avoid
deadlock. This patch fits the comment below.
<snip>
/**
* emergency_restart - reboot the system
*
* Without shutting down any hardware or taking any locks
* reboot the system. This is called when we know we are in
* trouble so this is our best effort to reboot. This is
* safe to call in interrupt context.
*/
void emergency_restart(void)
<snip>
Signed-off-by: Seiji Aguchi <sei...@hd...>
---
fs/pstore/platform.c | 34 ++++++++++++++++++++++++++++------
1 files changed, 28 insertions(+), 6 deletions(-)
diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
index 5ea2e77..b87d1de 100644
--- a/fs/pstore/platform.c
+++ b/fs/pstore/platform.c
@@ -96,6 +96,26 @@ static const char *get_reason_str(enum kmsg_dump_reason reason)
}
}
+static bool is_non_blocking_path(enum kmsg_dump_reason reason)
+{
+ /*
+ * In case of NMI path, pstore shouldn't be blocked
+ * regardless of reason.
+ */
+ if (in_nmi())
+ return true;
+
+ switch (reason) {
+ /* In panic case, other cpus are stopped by smp_send_stop(). */
+ case KMSG_DUMP_PANIC:
+ /* Emergency restart shouldn't be blocked by spin lock. */
+ case KMSG_DUMP_EMERG:
+ return true;
+ default:
+ return false;
+ }
+}
+
/*
* callback from kmsg_dump. (s2,l2) has the most recently
* written bytes, older bytes are in (s1,l1). Save as much
@@ -114,10 +134,12 @@ static void pstore_dump(struct kmsg_dumper *dumper,
why = get_reason_str(reason);
- if (in_nmi()) {
- is_locked = spin_trylock(&psinfo->buf_lock);
- if (!is_locked)
- pr_err("pstore dump routine blocked in NMI, may corrupt error record\n");
+ if (is_non_blocking_path(reason)) {
+ is_locked = spin_trylock_irqsave(&psinfo->buf_lock, flags);
+ if (!is_locked) {
+ pr_err("pstore dump routine blocked in %s path, may corrupt error record\n"
+ , in_nmi() ? "NMI" : why);
+ }
} else
spin_lock_irqsave(&psinfo->buf_lock, flags);
oopscount++;
@@ -143,9 +165,9 @@ static void pstore_dump(struct kmsg_dumper *dumper,
total += hsize + len;
part++;
}
- if (in_nmi()) {
+ if (is_non_blocking_path(reason)) {
if (is_locked)
- spin_unlock(&psinfo->buf_lock);
+ spin_unlock_irqrestore(&psinfo->buf_lock, flags);
} else
spin_unlock_irqrestore(&psinfo->buf_lock, flags);
}
--
1.7.1
|
|
From: Seiji A. <sei...@hd...> - 2012-12-17 20:56:49
|
Changelog v1 -> v2
- Erase a logic checking the number of online cpus.
- Create a patchset to fix deadlocking issue in both pstore filesystem and
efi_pstore driver.
- Introduce a function, is_non_blocking_path(), to check if pstore
is in panic and emergency-restart paths (PATCH 1/2)
- Avoid efi_pstore_driver is blocked in non-blocking paths
such as nmi, panic and emergency-restart paths (PATCH 2/2)
[Issue]
There are some paths which shouldn't be blocked in kernel, like NMI, panic case after stopping cpus,
emergency-restart.
On the other hand, current pstore avoids blocking in a NMI path but it may be blocked in other paths.
Also, an efi_pstore driver may be blocked in all of those paths because it simply takes a spin lock
at writing time.
If they are blocked in those paths, the system will hang up and it has a big impact for users.
Here is an example scenario which pstore is blocked in panic path.
- cpuA grabs psinfo->buf_lock
- cpuB panics and calls smp_send_stop
- smp_send_stop sends IRQ to cpuA
after 1 second, cpuB gives up on cpuA and sends an NMI instead
- cpuA is now in an NMI handler while still holding buf_lock.
And then, cpuB is deadlocked by taking efi_pstore->lock again.
This case may happen if a firmware has a bug and cpuA is stuck in it.
[Solution]
This patchset avoids that pstore and efi_pstore driver are blocked in the non-blocking paths
like NMI, panic, and emrgency-restart by introducing a function checking if they are in those paths.
Please see each patch for detailed explanations.
Seiji Aguchi (2):
[PATCH v2 1/2] pstore: Avoid deadlock in panic and emergency-restart path
[PATCH v2 2/2] efi_pstore: Avoid deadlock in non-blocking paths
drivers/firmware/efivars.c | 11 ++++++++++-
fs/pstore/platform.c | 34 ++++++++++++++++++++++++++++------
include/linux/pstore.h | 6 ++++++
3 files changed, 44 insertions(+), 7 deletions(-)
|
|
From: <vj...@te...> - 2012-12-14 03:47:53
|
<HTML><HEAD> <META content="MSHTML 6.00.2900.5512" name=GENERATOR></HEAD> <BODY style="MARGIN: 0px"><SPAN style="FONT-SIZE: 9px; COLOR: rgb(0,0,0); FONT-FAMILY: Arial,Helvetica,Arial,sans-serif; TEXT-ALIGN: center"><FONT face=Verdana color=#0000ff size=2><SPAN style="COLOR: #87787d"> <P style="MARGIN-TOP: 0pt; MARGIN-BOTTOM: 0pt; COLOR: #827d7e" align=center><FONT size=1>dle-develop, si no podes visualizar este correo, podes hacerlo <A href="http://bit.ly/T8Vn2B">clickeando en este enlace.</A><BR>1800 Modelos De Contratos Editables Listos Para Usar !! | Documentos Legales Pre-Diseñados<BR></FONT><SPAN style="COLOR: #f72008"></P> <P style="MARGIN-TOP: 0pt; MARGIN-BOTTOM: 0pt" align=center><STRONG><FONT face="Segoe UI"><A href="http://bit.ly/T8Vn2B"><IMG style="WIDTH: 636px; HEIGHT: 2946px" alt="1800 Modelos De Contratos Pre Diseñados Listos Para Usar" hspace=0 src="http://img10.imageshack.us/img10/715/cntr80p.jpg" align=baseline border=0></A><BR></SPAN></FONT></STRONG><FONT face="Segoe UI" size=3><STRONG style="COLOR: #0782f8"><FONT color=#7a6f90>UNICA OPORTUNIDAD A PRECIO PROMOCIONAL<BR> Para visitar la web y ver mas detalles haga</FONT> <A href="http://bit.ly/T8Vn2B">CLIC ACA</A></STRONG></FONT><BR><BR></P> <P style="MARGIN-TOP: 0pt; MARGIN-BOTTOM: 0pt" align=center><FONT size=1>Este email tiene como unico destinatario dle-develop</FONT></P> <P style="MARGIN-TOP: 0pt; MARGIN-BOTTOM: 0pt" align=center><FONT size=1>Para ser eliminado de nuestras listas envienos un email y en asunto aclarar REMOVER<BR></FONT></P> <P style="MARGIN-TOP: 0pt; MARGIN-BOTTOM: 0pt" align=center><FONT size=1><SPAN style="COLOR: #e0ffff">xxxbrbescpenigqamogt</SPAN><BR></FONT></SPAN></FONT></SPAN></P></BODY></HTML> |
|
From: Seiji A. <sei...@hd...> - 2012-12-11 00:06:54
|
> A boot argument might help - so we can force use of pstore in cases where kdump is failing (or prevent use of pstore in cases where it > seem to be preventing us getting to kdump ... I don't have a preference). BUT this would only be useful if we had a repeatable > problem so that we could switch to the other mode ... and it seems likely that the kinds of problems that cause pstore or kdump to fail > would be weird cases that are not very repeatable :-( I think it is helpful for reproducible problems because reasons of kdump failure vary widely. For example, users may forget to erase vmcores from a dump disk and kdump fails due to lack of space. Also, FC cable or other hardware may be broken. But, The person I have to convince is not you but kexec engineers... > > -Tony |
|
From: Seiji A. <sei...@hd...> - 2012-12-10 23:56:04
|
> > If we can fix it with a small patch in adance, it is really helpful for us. > > As I said in my email I just sent, it may not help you without testing it. > As there are probably other problems in that un-tested theoretical scenario. OK. I understood. > > > > 2) > > In the long term, I plan to add a kmsg_dump to a kexec path because kdump may fail in the real world. > > In that case, we need another troubleshooting material like pstore to detect a root cause of failure. > > But you are assuming that kmsg_dump is perfect and it isn't, in which case by putting kmsg_dump in the kdump path, you actually may > be blocking kdump from working. That is why I'm trying to fix problems of pstore in panic path right now. If kmsg_dump/pstore/platform drivers work well in panic path, they work in kdump path as well. Seiji |
|
From: Seiji A. <sei...@hd...> - 2012-12-10 23:52:34
|
> Now my first reaction would be, if that is the scenario, why couldn't cpuA release the lock within one second. Because if cpuA is stuck > talking with firmware, then your patch to force the unlock is probably going to trip over the same problems. > (those problems include dealing with resetting a firmware state machine) > > IOW, your patch is just one of many minefields you will have to cross in order to be successful. I agree that I need to fix the platform driver's deadlocking as well to make an overall system workable. (I just wanted to fix the deadlocking problems step by step.) > > Without any testing to show that you have cleared all those minefields, I am leaning against your patch for now. I would rather have a > complete patchset that fixes all the problems in this path. Thanks. I will try to make the complete patchset. > > If you did have to go this route, why not take the 'reason' variable and pass it to some function 'in_panic_path(reason)' that tells you if > you are panicing or not. Then you can change the 'if in_nmi()' to 'if in_nmi() || in_panic'. > > The panic path already disables irqs for you, not sure you need to do it again. Unless you have some other path you are trying to > protect in mind. Thank you for the suggestion. I will update my patch in accordance with your comment if anyone else disagree that. > > Cheers, > Don |
|
From: Luck, T. <ton...@in...> - 2012-12-10 18:19:23
|
> But you are assuming that kmsg_dump is perfect and it isn't, in which case > by putting kmsg_dump in the kdump path, you actually may be blocking kdump > from working. I think the concern is that kdump isn't perfect, so sometimes we don't get a good dump from it. In those cases it would have been nice to have a pstore log of the original problem. But ... I don't see an answer to this problem. Adding more code just increases the number of possible places we can fail (especially as we are executing in a state where we know that things are all messed up ... the first kernel panic'd because something bad happened that we didn't know how to fix). A boot argument might help - so we can force use of pstore in cases where kdump is failing (or prevent use of pstore in cases where it seem to be preventing us getting to kdump ... I don't have a preference). BUT this would only be useful if we had a repeatable problem so that we could switch to the other mode ... and it seems likely that the kinds of problems that cause pstore or kdump to fail would be weird cases that are not very repeatable :-( -Tony |
|
From: Don Z. <dz...@re...> - 2012-12-10 16:48:55
|
On Fri, Dec 07, 2012 at 11:43:03PM +0000, Seiji Aguchi wrote: > > Can all these things really happen (did you run into this problem on a real system?). Or is this just a theoretical problem. Ugly (but > > practical) hacks might be OK to solve real problems. > > It is a theoretical problem right now. > But it is a timing issue and there is a possibility to happen actually. > > > But do we really want them to fix problems that actually never happen? > > If we find a problem (even if it is theoretical), we can't say "It actually never happen.". > > I have some reasons to submit this patch before reproducing actually. > > 1) > It is too late if we fix a problem after it actually happened in case where we apply Linux, including pstore, > to mission critical systems, because the failure of those systems has a great impact on a whole society. > Customers in this area ask us to fix a problem as soon as possible. > On the other hand, this kind of timing issue is hard to reproduce. > So, our support service engineers often work all night to reproduce it. > It is a nightmare for us. > > If we can fix it with a small patch in adance, it is really helpful for us. As I said in my email I just sent, it may not help you without testing it. As there are probably other problems in that un-tested theoretical scenario. > > 2) > In the long term, I plan to add a kmsg_dump to a kexec path because kdump may fail in the real world. > In that case, we need another troubleshooting material like pstore to detect a root cause of failure. But you are assuming that kmsg_dump is perfect and it isn't, in which case by putting kmsg_dump in the kdump path, you actually may be blocking kdump from working. That is the biggest hold up for those guys from including it I believe. Cheers, Don |
|
From: Don Z. <dz...@re...> - 2012-12-10 16:42:44
|
On Fri, Dec 07, 2012 at 09:41:13PM +0000, Seiji Aguchi wrote: > [Issue] > > If one cpu ,which is taking a psinfo->buf_lock, > receive NMI from a panicked cpu via smp_send_stop(), > the panicked cpu hangs up in pstore_dump() called by kmsg_dump(KMSG_DUMP_PANIC) > because the psinfo->buf_lock is taken again in it. Hi Seiji, I am trying to understand this scenario. You said it is theoretical. Looking through smp_send_stop, it only sends an NMI if the IRQ did not work after a second. So the scenario you are looking at is: cpuA grabs psinfo->buf_lock cpuB panics and calls smp_send_stop smp_send_stop sends IRQ to cpuA after 1 second, cpuB gives up on cpuA and sends an NMI instead cpuA is now in an NMI handler while still holding buf_lock cpuB is deadlocked Now my first reaction would be, if that is the scenario, why couldn't cpuA release the lock within one second. Because if cpuA is stuck talking with firmware, then your patch to force the unlock is probably going to trip over the same problems. (those problems include dealing with resetting a firmware state machine) IOW, your patch is just one of many minefields you will have to cross in order to be successful. Without any testing to show that you have cleared all those minefields, I am leaning against your patch for now. I would rather have a complete patchset that fixes all the problems in this path. > > To avoid the deadlock, an easy solution is moving kmsg_dump above > smp_send_stop() in panic path. > > But, it is not safe to kick pstore while multiple cpus are running in panic case, > because they may touch corrupted data/variables and unnecessary failures may happen. > In that case, we can't guarantee that a panicked cpu can log messages reliably > because it may have harmful effects due to the failures. > > [Solution] > > This patch skips taking a psinfo->buf_lock when just one cpu is online > because stopped cpus turn to offline via smp_send_stop() > in some architectures like x86, powerpc or arm64. > > It may be a hack but solves my concern deadlocking in x86 architecture. If you did have to go this route, why not take the 'reason' variable and pass it to some function 'in_panic_path(reason)' that tells you if you are panicing or not. Then you can change the 'if in_nmi()' to 'if in_nmi() || in_panic'. The panic path already disables irqs for you, not sure you need to do it again. Unless you have some other path you are trying to protect in mind. Cheers, Don |