You can subscribe to this list here.
| 2009 |
Jan
|
Feb
|
Mar
|
Apr
|
May
(32) |
Jun
(66) |
Jul
(102) |
Aug
(78) |
Sep
(106) |
Oct
(137) |
Nov
(147) |
Dec
(147) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2010 |
Jan
(71) |
Feb
(139) |
Mar
(86) |
Apr
(76) |
May
(57) |
Jun
(10) |
Jul
(12) |
Aug
(6) |
Sep
(8) |
Oct
(12) |
Nov
(12) |
Dec
(18) |
| 2011 |
Jan
(16) |
Feb
(19) |
Mar
(3) |
Apr
(1) |
May
(16) |
Jun
(17) |
Jul
(74) |
Aug
(22) |
Sep
(18) |
Oct
(24) |
Nov
(21) |
Dec
(30) |
| 2012 |
Jan
(31) |
Feb
(16) |
Mar
(22) |
Apr
(25) |
May
(18) |
Jun
(13) |
Jul
(83) |
Aug
(49) |
Sep
(20) |
Oct
(60) |
Nov
(35) |
Dec
(28) |
| 2013 |
Jan
(39) |
Feb
(61) |
Mar
(35) |
Apr
(21) |
May
(45) |
Jun
(56) |
Jul
(20) |
Aug
(9) |
Sep
(10) |
Oct
(31) |
Nov
(8) |
Dec
(4) |
| 2014 |
Jan
(6) |
Feb
(7) |
Mar
(7) |
Apr
(6) |
May
(4) |
Jun
(8) |
Jul
(5) |
Aug
(2) |
Sep
(4) |
Oct
(4) |
Nov
(11) |
Dec
(5) |
| 2015 |
Jan
(4) |
Feb
(4) |
Mar
(3) |
Apr
(4) |
May
(9) |
Jun
(4) |
Jul
(15) |
Aug
(8) |
Sep
(16) |
Oct
(18) |
Nov
(15) |
Dec
(7) |
| 2016 |
Jan
(20) |
Feb
(9) |
Mar
(15) |
Apr
(24) |
May
(16) |
Jun
(28) |
Jul
(22) |
Aug
(23) |
Sep
(18) |
Oct
(30) |
Nov
(40) |
Dec
(9) |
| 2017 |
Jan
(1) |
Feb
(8) |
Mar
(37) |
Apr
(26) |
May
(25) |
Jun
(46) |
Jul
(24) |
Aug
(9) |
Sep
|
Oct
|
Nov
|
Dec
|
|
From: Seiji A. <sei...@hd...> - 2013-06-04 18:38:09
|
> > The point is that the IDT address itself should not be mutable state if > > it can be at all avoided. > > Hmm, maybe we can do it. Have two counters, a debug_idt_ctr and a > trace_idt_ctr, then have a function that basically does this: > > if (this_cpu_read(debug_idt_ctr)) > load_idt(&nmi_idt_descr); /* probably should rename to debug_idt_descr) */ > else if (trace_idt_ctr) > load_idt(&trace_idt_descr); > else > load_idt(&idt_descr); > > Then all modifications of the idt would call this function. I think it will work. I will make the patch. Seiji > > -- Steve > > |
|
From: Steven R. <ro...@go...> - 2013-06-04 18:32:51
|
On Tue, 2013-06-04 at 11:15 -0700, H. Peter Anvin wrote: > On 06/03/2013 04:53 PM, Steven Rostedt wrote: > > > > This way we wont be opening up any easy root holes where if a process > > finds a way to modify some arbitrary kernel memory, we can prevent it > > from modifying the current_idt_descr_ptr and have a nice way to exploit > > the IDT. Sure, one can argue that if they can modify arbitrary kernel > > memory, we may already be lost, but lets not make it easier for them > > than need be. > > > > I don't like current_idt_descr_ptr if we can avoid it. It is a direct > proxy for reading and writing the original IDT, in other words, it > really hasn't really addressed the issue. > > What I'm thinking we really should have is a function that returns the > IDT that we currently should be using, based on the current state. If > that state is, say, tracing on/off and NMI on/off, then that can be > indicated by bits in a state vector. The NMI on/off may be a bit trickier, as it is also a debug state as well. When we go into a nested debug or NMI state we use the same IDT. > > The point is that the IDT address itself should not be mutable state if > it can be at all avoided. Hmm, maybe we can do it. Have two counters, a debug_idt_ctr and a trace_idt_ctr, then have a function that basically does this: if (this_cpu_read(debug_idt_ctr)) load_idt(&nmi_idt_descr); /* probably should rename to debug_idt_descr) */ else if (trace_idt_ctr) load_idt(&trace_idt_descr); else load_idt(&idt_descr); Then all modifications of the idt would call this function. -- Steve |
|
From: H. P. A. <hp...@zy...> - 2013-06-04 18:17:32
|
On 06/03/2013 04:53 PM, Steven Rostedt wrote: > > This way we wont be opening up any easy root holes where if a process > finds a way to modify some arbitrary kernel memory, we can prevent it > from modifying the current_idt_descr_ptr and have a nice way to exploit > the IDT. Sure, one can argue that if they can modify arbitrary kernel > memory, we may already be lost, but lets not make it easier for them > than need be. > I don't like current_idt_descr_ptr if we can avoid it. It is a direct proxy for reading and writing the original IDT, in other words, it really hasn't really addressed the issue. What I'm thinking we really should have is a function that returns the IDT that we currently should be using, based on the current state. If that state is, say, tracing on/off and NMI on/off, then that can be indicated by bits in a state vector. The point is that the IDT address itself should not be mutable state if it can be at all avoided. -hpa |
|
From: Seiji A. <sei...@hd...> - 2013-06-04 15:51:28
|
> Yeah, I believe this does work. But you probably should add a comment
> like the following:
OK. I will add some comment above " extern atomic_long_t current_idt_descr_ptr;".
>
> /*
> * The current_idt_descr_ptr can only be set out of interrupt context
> * to avoid races.
I will introduce set_current_idt() as follows.
set_current_idt(unsigned long idt)
{
If (WARN_ON_ONCE(in_interrupt()))
return;
atomic_long_set(idt);
}
> * Once set, the load_current_idt() is called by interrupt
> * context either by NMI, debug, or via a smp_call_function(). That way
> * the IDT will always be set back to the expected descriptor.
> */
The important thing is not "called by interrupt context" but "called with interrupt disabled"
to avoid races.
Actually, load_current_idt() is called in process context in irq_vector_{reg/unreg}func().
In next patch, I will rewrite the comment.
> >
> > +extern atomic_long_t current_idt_descr_ptr;
> > +static inline void load_current_idt(void)
> > +{
> > + if (atomic_long_read(¤t_idt_descr_ptr))
>
> Also, we should probably add here:
> unsigned long new_idt = atomic_long_read(¤t_idt_descr_ptr);
>
> if (WARN_ON_ONCE(!validate_idt(new_idt))
> return;
> load_idt((const struct desc_ptr *)new_idt);
>
> > + load_idt((const struct desc_ptr *)
> > + atomic_long_read(¤t_idt_descr_ptr));
> > + else
> > + load_idt((const struct desc_ptr *)&idt_descr);
> > +}
> > +
>
> Then have
>
> bool validate_idt(unsigned long idt)
> {
> switch(idt) {
> case (unsigned long)&trace_idt_descr:
> case (unsigned long)&idt_descr:
> return 0;
> }
> return -1;
> }
>
> This way we wont be opening up any easy root holes where if a process
> finds a way to modify some arbitrary kernel memory, we can prevent it
> from modifying the current_idt_descr_ptr and have a nice way to exploit
> the IDT. Sure, one can argue that if they can modify arbitrary kernel
> memory, we may already be lost, but lets not make it easier for them
> than need be.
I will introduce the validate_idt() as above in a next patch.
Thanks.
Seiji
>
> -- Steve
>
|
|
From: Steven R. <ro...@go...> - 2013-06-03 23:53:10
|
On Mon, 2013-06-03 at 15:29 -0400, Seiji Aguchi wrote:
Yeah, I believe this does work. But you probably should add a comment
like the following:
/*
* The current_idt_descr_ptr can only be set out of interrupt context
* to avoid races. Once set, the load_current_idt() is called by
interrupt
* context either by NMI, debug, or via a smp_call_function(). That way
* the IDT will always be set back to the expected descriptor.
*/
>
> +extern atomic_long_t current_idt_descr_ptr;
> +static inline void load_current_idt(void)
> +{
> + if (atomic_long_read(¤t_idt_descr_ptr))
Also, we should probably add here:
unsigned long new_idt = atomic_long_read(¤t_idt_descr_ptr);
if (WARN_ON_ONCE(!validate_idt(new_idt))
return;
load_idt((const struct desc_ptr *)new_idt);
> + load_idt((const struct desc_ptr *)
> + atomic_long_read(¤t_idt_descr_ptr));
> + else
> + load_idt((const struct desc_ptr *)&idt_descr);
> +}
> +
Then have
bool validate_idt(unsigned long idt)
{
switch(idt) {
case (unsigned long)&trace_idt_descr:
case (unsigned long)&idt_descr:
return 0;
}
return -1;
}
This way we wont be opening up any easy root holes where if a process
finds a way to modify some arbitrary kernel memory, we can prevent it
from modifying the current_idt_descr_ptr and have a nice way to exploit
the IDT. Sure, one can argue that if they can modify arbitrary kernel
memory, we may already be lost, but lets not make it easier for them
than need be.
-- Steve
|
|
From: Seiji A. <sei...@hd...> - 2013-06-03 19:30:55
|
[Purpose of this patch] As Vaibhav explained in the thread below, tracepoints for irq vectors are useful. http://www.spinics.net/lists/mm-commits/msg85707.html <snip> The current interrupt traces from irq_handler_entry and irq_handler_exit provide when an interrupt is handled. They provide good data about when the system has switched to kernel space and how it affects the currently running processes. There are some IRQ vectors which trigger the system into kernel space, which are not handled in generic IRQ handlers. Tracing such events gives us the information about IRQ interaction with other system events. The trace also tells where the system is spending its time. We want to know which cores are handling interrupts and how they are affecting other processes in the system. Also, the trace provides information about when the cores are idle and which interrupts are changing that state. <snip> On the other hand, my usecase is tracing just local timer event and getting a value of instruction pointer. I suggested to add an argument local timer event to get instruction pointer before. But there is another way to get it with external module like systemtap. So, I don't need to add any argument to irq vector tracepoints now. [Patch Description] Vaibhav's patch shared a trace point ,irq_vector_entry/irq_vector_exit, in all events. But there is an above use case to trace specific irq_vector rather than tracing all events. In this case, we are concerned about overhead due to unwanted events. This patch adds following tracepoints instead of introducing irq_vector_entry/exit. so that we can enable them independently. - local_timer_vector - reschedule_vector - call_function_vector - call_function_single_vector - irq_work_entry_vector - error_apic_vector - thermal_apic_vector - threshold_apic_vector - spurious_apic_vector - x86_platform_ipi_vector Also, it introduces a logic switching IDT at enabling/disabling time so that a time penalty makes a zero when tracepoints are disabled. Detailed explanations are as follows. - Create non-trace and trace irq handlers with entering_irq()/exiting_irq(). - Create a new IDT, trace_idt_table, at boot time by adding a logic to _set_gate(). It is just a copy of original idt table. - Registering the new handers for tracpoints to the new IDT by introducing macros to alloc_intr_gate() called at regstering time of irq_vector handlers. - Switch IDT to new one at enabling TP time. - Restore to an original IDT at disabling TP time. The new IDT is created only when CONFIG_TRACING is enabled to avoid being used for other purposes. Signed-off-by: Seiji Aguchi <sei...@hd...> --- arch/x86/include/asm/desc.h | 55 +++++++++++++++- arch/x86/include/asm/entry_arch.h | 8 ++- arch/x86/include/asm/hw_irq.h | 17 +++++ arch/x86/include/asm/mshyperv.h | 1 + arch/x86/include/asm/trace/irq_vectors.h | 104 ++++++++++++++++++++++++++++++ arch/x86/kernel/Makefile | 1 + arch/x86/kernel/apic/Makefile | 1 + arch/x86/kernel/apic/apic.c | 71 +++++++++++++++++---- arch/x86/kernel/cpu/common.c | 5 +- arch/x86/kernel/cpu/mcheck/therm_throt.c | 24 +++++-- arch/x86/kernel/cpu/mcheck/threshold.c | 24 +++++-- arch/x86/kernel/entry_32.S | 12 +++- arch/x86/kernel/entry_64.S | 31 +++++++-- arch/x86/kernel/head_64.S | 6 ++ arch/x86/kernel/irq.c | 31 ++++++--- arch/x86/kernel/irq_work.c | 24 ++++++- arch/x86/kernel/smp.c | 65 ++++++++++++++++-- arch/x86/kernel/tracepoint.c | 58 +++++++++++++++++ include/xen/events.h | 3 + 19 files changed, 482 insertions(+), 59 deletions(-) create mode 100644 arch/x86/include/asm/trace/irq_vectors.h create mode 100644 arch/x86/kernel/tracepoint.c diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h index 8bf1c06..400f0db 100644 --- a/arch/x86/include/asm/desc.h +++ b/arch/x86/include/asm/desc.h @@ -320,6 +320,19 @@ static inline void set_nmi_gate(int gate, void *addr) } #endif +#ifdef CONFIG_TRACING +extern struct desc_ptr trace_idt_descr; +extern gate_desc trace_idt_table[]; +static inline void write_trace_idt_entry(int entry, const gate_desc *gate) +{ + write_idt_entry(trace_idt_table, entry, gate); +} +#else +static inline void write_trace_idt_entry(int entry, const gate_desc *gate) +{ +} +#endif + static inline void _set_gate(int gate, unsigned type, void *addr, unsigned dpl, unsigned ist, unsigned seg) { @@ -331,6 +344,7 @@ static inline void _set_gate(int gate, unsigned type, void *addr, * setup time */ write_idt_entry(idt_table, gate, &s); + write_trace_idt_entry(gate, &s); } /* @@ -360,12 +374,39 @@ static inline void alloc_system_vector(int vector) } } -static inline void alloc_intr_gate(unsigned int n, void *addr) +#ifdef CONFIG_TRACING +static inline void trace_set_intr_gate(unsigned int gate, void *addr) +{ + gate_desc s; + + pack_gate(&s, GATE_INTERRUPT, (unsigned long)addr, 0, 0, __KERNEL_CS); + write_idt_entry(trace_idt_table, gate, &s); +} + +static inline void __trace_alloc_intr_gate(unsigned int n, void *addr) +{ + trace_set_intr_gate(n, addr); +} +#else +static inline void trace_set_intr_gate(unsigned int gate, void *addr) +{ +} + +#define __trace_alloc_intr_gate(n, addr) +#endif + +static inline void __alloc_intr_gate(unsigned int n, void *addr) { - alloc_system_vector(n); set_intr_gate(n, addr); } +#define alloc_intr_gate(n, addr) \ + do { \ + alloc_system_vector(n); \ + __alloc_intr_gate(n, addr); \ + __trace_alloc_intr_gate(n, trace_##addr); \ + } while (0) + /* * This routine sets up an interrupt gate at directory privilege level 3. */ @@ -405,4 +446,14 @@ static inline void set_system_intr_gate_ist(int n, void *addr, unsigned ist) _set_gate(n, GATE_INTERRUPT, addr, 0x3, ist, __KERNEL_CS); } +extern atomic_long_t current_idt_descr_ptr; +static inline void load_current_idt(void) +{ + if (atomic_long_read(¤t_idt_descr_ptr)) + load_idt((const struct desc_ptr *) + atomic_long_read(¤t_idt_descr_ptr)); + else + load_idt((const struct desc_ptr *)&idt_descr); +} + #endif /* _ASM_X86_DESC_H */ diff --git a/arch/x86/include/asm/entry_arch.h b/arch/x86/include/asm/entry_arch.h index 9bd4eca..dc5fa66 100644 --- a/arch/x86/include/asm/entry_arch.h +++ b/arch/x86/include/asm/entry_arch.h @@ -13,14 +13,16 @@ BUILD_INTERRUPT(reschedule_interrupt,RESCHEDULE_VECTOR) BUILD_INTERRUPT(call_function_interrupt,CALL_FUNCTION_VECTOR) BUILD_INTERRUPT(call_function_single_interrupt,CALL_FUNCTION_SINGLE_VECTOR) -BUILD_INTERRUPT(irq_move_cleanup_interrupt,IRQ_MOVE_CLEANUP_VECTOR) -BUILD_INTERRUPT(reboot_interrupt,REBOOT_VECTOR) +BUILD_INTERRUPT3(irq_move_cleanup_interrupt, IRQ_MOVE_CLEANUP_VECTOR, + smp_irq_move_cleanup_interrupt) +BUILD_INTERRUPT3(reboot_interrupt, REBOOT_VECTOR, smp_reboot_interrupt) #endif BUILD_INTERRUPT(x86_platform_ipi, X86_PLATFORM_IPI_VECTOR) #ifdef CONFIG_HAVE_KVM -BUILD_INTERRUPT(kvm_posted_intr_ipi, POSTED_INTR_VECTOR) +BUILD_INTERRUPT3(kvm_posted_intr_ipi, POSTED_INTR_VECTOR, + smp_kvm_posted_intr_ipi) #endif /* diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h index 1da97ef..e4ac559 100644 --- a/arch/x86/include/asm/hw_irq.h +++ b/arch/x86/include/asm/hw_irq.h @@ -77,6 +77,23 @@ extern void threshold_interrupt(void); extern void call_function_interrupt(void); extern void call_function_single_interrupt(void); +#ifdef CONFIG_TRACING +/* Interrupt handlers registered during init_IRQ */ +extern void trace_apic_timer_interrupt(void); +extern void trace_x86_platform_ipi(void); +extern void trace_error_interrupt(void); +extern void trace_irq_work_interrupt(void); +extern void trace_spurious_interrupt(void); +extern void trace_thermal_interrupt(void); +extern void trace_reschedule_interrupt(void); +extern void trace_threshold_interrupt(void); +extern void trace_call_function_interrupt(void); +extern void trace_call_function_single_interrupt(void); +#define trace_irq_move_cleanup_interrupt irq_move_cleanup_interrupt +#define trace_reboot_interrupt reboot_interrupt +#define trace_kvm_posted_intr_ipi kvm_posted_intr_ipi +#endif /* CONFIG_TRACING */ + /* IOAPIC */ #define IO_APIC_IRQ(x) (((x) >= NR_IRQS_LEGACY) || ((1<<(x)) & io_apic_irqs)) extern unsigned long io_apic_irqs; diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h index c2934be..cc40e03 100644 --- a/arch/x86/include/asm/mshyperv.h +++ b/arch/x86/include/asm/mshyperv.h @@ -12,6 +12,7 @@ struct ms_hyperv_info { extern struct ms_hyperv_info ms_hyperv; void hyperv_callback_vector(void); +#define trace_hyperv_callback_vector hyperv_callback_vector void hyperv_vector_handler(struct pt_regs *regs); void hv_register_vmbus_handler(int irq, irq_handler_t handler); diff --git a/arch/x86/include/asm/trace/irq_vectors.h b/arch/x86/include/asm/trace/irq_vectors.h new file mode 100644 index 0000000..2874df2 --- /dev/null +++ b/arch/x86/include/asm/trace/irq_vectors.h @@ -0,0 +1,104 @@ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM irq_vectors + +#if !defined(_TRACE_IRQ_VECTORS_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_IRQ_VECTORS_H + +#include <linux/tracepoint.h> + +extern void trace_irq_vector_regfunc(void); +extern void trace_irq_vector_unregfunc(void); + +DECLARE_EVENT_CLASS(x86_irq_vector, + + TP_PROTO(int vector), + + TP_ARGS(vector), + + TP_STRUCT__entry( + __field( int, vector ) + ), + + TP_fast_assign( + __entry->vector = vector; + ), + + TP_printk("vector=%d", __entry->vector) ); + +#define DEFINE_IRQ_VECTOR_EVENT(name) \ +DEFINE_EVENT_FN(x86_irq_vector, name##_entry, \ + TP_PROTO(int vector), \ + TP_ARGS(vector), \ + trace_irq_vector_regfunc, \ + trace_irq_vector_unregfunc); \ +DEFINE_EVENT_FN(x86_irq_vector, name##_exit, \ + TP_PROTO(int vector), \ + TP_ARGS(vector), \ + trace_irq_vector_regfunc, \ + trace_irq_vector_unregfunc); + + +/* + * local_timer - called when entering/exiting a local timer interrupt + * vector handler + */ +DEFINE_IRQ_VECTOR_EVENT(local_timer); + +/* + * reschedule - called when entering/exiting a reschedule vector handler + */ +DEFINE_IRQ_VECTOR_EVENT(reschedule); + +/* + * spurious_apic - called when entering/exiting a spurious apic vector handler + */ +DEFINE_IRQ_VECTOR_EVENT(spurious_apic); + +/* + * error_apic - called when entering/exiting an error apic vector handler + */ +DEFINE_IRQ_VECTOR_EVENT(error_apic); + +/* + * x86_platform_ipi - called when entering/exiting a x86 platform ipi interrupt + * vector handler + */ +DEFINE_IRQ_VECTOR_EVENT(x86_platform_ipi); + +/* + * irq_work - called when entering/exiting a irq work interrupt + * vector handler + */ +DEFINE_IRQ_VECTOR_EVENT(irq_work); + +/* + * call_function - called when entering/exiting a call function interrupt + * vector handler + */ +DEFINE_IRQ_VECTOR_EVENT(call_function); + +/* + * call_function_single - called when entering/exiting a call function + * single interrupt vector handler + */ +DEFINE_IRQ_VECTOR_EVENT(call_function_single); + +/* + * threshold_apic - called when entering/exiting a threshold apic interrupt + * vector handler + */ +DEFINE_IRQ_VECTOR_EVENT(threshold_apic); + +/* + * thermal_apic - called when entering/exiting a thermal apic interrupt + * vector handler + */ +DEFINE_IRQ_VECTOR_EVENT(thermal_apic); + +#undef TRACE_INCLUDE_PATH +#define TRACE_INCLUDE_PATH . +#define TRACE_INCLUDE_FILE irq_vectors +#endif /* _TRACE_IRQ_VECTORS_H */ + +/* This part must be outside protection */ +#include <trace/define_trace.h> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 7bd3bd3..74b3891 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -102,6 +102,7 @@ obj-$(CONFIG_OF) += devicetree.o obj-$(CONFIG_UPROBES) += uprobes.o obj-$(CONFIG_PERF_EVENTS) += perf_regs.o +obj-$(CONFIG_TRACING) += tracepoint.o ### # 64 bit specific files diff --git a/arch/x86/kernel/apic/Makefile b/arch/x86/kernel/apic/Makefile index 0ae0323..5274c3a 100644 --- a/arch/x86/kernel/apic/Makefile +++ b/arch/x86/kernel/apic/Makefile @@ -2,6 +2,7 @@ # Makefile for local APIC drivers and for the IO-APIC code # +CFLAGS_apic.o := -I$(src)/../../include/asm/trace obj-$(CONFIG_X86_LOCAL_APIC) += apic.o apic_noop.o ipi.o obj-y += hw_nmi.o diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index 904611b..61ced40 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -55,6 +55,9 @@ #include <asm/tsc.h> #include <asm/hypervisor.h> +#define CREATE_TRACE_POINTS +#include <asm/trace/irq_vectors.h> + unsigned int num_processors; unsigned disabled_cpus __cpuinitdata; @@ -919,17 +922,35 @@ void __irq_entry smp_apic_timer_interrupt(struct pt_regs *regs) /* * NOTE! We'd better ACK the irq immediately, * because timer handling can be slow. + * + * update_process_times() expects us to have done irq_enter(). + * Besides, if we don't timer interrupts ignore the global + * interrupt lock, which is the WrongThing (tm) to do. */ - ack_APIC_irq(); + entering_ack_irq(); + local_apic_timer_interrupt(); + exiting_irq(); + + set_irq_regs(old_regs); +} + +void __irq_entry smp_trace_apic_timer_interrupt(struct pt_regs *regs) +{ + struct pt_regs *old_regs = set_irq_regs(regs); + /* + * NOTE! We'd better ACK the irq immediately, + * because timer handling can be slow. + * * update_process_times() expects us to have done irq_enter(). * Besides, if we don't timer interrupts ignore the global * interrupt lock, which is the WrongThing (tm) to do. */ - irq_enter(); - exit_idle(); + entering_ack_irq(); + trace_local_timer_entry(LOCAL_TIMER_VECTOR); local_apic_timer_interrupt(); - irq_exit(); + trace_local_timer_exit(LOCAL_TIMER_VECTOR); + exiting_irq(); set_irq_regs(old_regs); } @@ -1907,12 +1928,10 @@ int __init APIC_init_uniprocessor(void) /* * This interrupt should _never_ happen with our APIC/SMP architecture */ -void smp_spurious_interrupt(struct pt_regs *regs) +static inline void __smp_spurious_interrupt(void) { u32 v; - irq_enter(); - exit_idle(); /* * Check if this really is a spurious interrupt and ACK it * if it is a vectored one. Just in case... @@ -1927,13 +1946,28 @@ void smp_spurious_interrupt(struct pt_regs *regs) /* see sw-dev-man vol 3, chapter 7.4.13.5 */ pr_info("spurious APIC interrupt on CPU#%d, " "should never happen.\n", smp_processor_id()); - irq_exit(); +} + +void smp_spurious_interrupt(struct pt_regs *regs) +{ + entering_irq(); + __smp_spurious_interrupt(); + exiting_irq(); +} + +void smp_trace_spurious_interrupt(struct pt_regs *regs) +{ + entering_irq(); + trace_spurious_apic_entry(SPURIOUS_APIC_VECTOR); + __smp_spurious_interrupt(); + trace_spurious_apic_exit(SPURIOUS_APIC_VECTOR); + exiting_irq(); } /* * This interrupt should never happen with our APIC/SMP architecture */ -void smp_error_interrupt(struct pt_regs *regs) +static inline void __smp_error_interrupt(struct pt_regs *regs) { u32 v0, v1; u32 i = 0; @@ -1948,8 +1982,6 @@ void smp_error_interrupt(struct pt_regs *regs) "Illegal register address", /* APIC Error Bit 7 */ }; - irq_enter(); - exit_idle(); /* First tickle the hardware, only then report what went on. -- REW */ v0 = apic_read(APIC_ESR); apic_write(APIC_ESR, 0); @@ -1970,7 +2002,22 @@ void smp_error_interrupt(struct pt_regs *regs) apic_printk(APIC_DEBUG, KERN_CONT "\n"); - irq_exit(); +} + +void smp_error_interrupt(struct pt_regs *regs) +{ + entering_irq(); + __smp_error_interrupt(regs); + exiting_irq(); +} + +void smp_trace_error_interrupt(struct pt_regs *regs) +{ + entering_irq(); + trace_error_apic_entry(ERROR_APIC_VECTOR); + __smp_error_interrupt(regs); + trace_error_apic_exit(ERROR_APIC_VECTOR); + exiting_irq(); } /** diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 22018f7..5878d0a 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -1069,6 +1069,7 @@ static __init int setup_disablecpuid(char *arg) } __setup("clearcpuid=", setup_disablecpuid); +atomic_long_t current_idt_descr_ptr = ATOMIC_LONG_INIT(0); #ifdef CONFIG_X86_64 struct desc_ptr idt_descr = { NR_VECTORS * 16 - 1, (unsigned long) idt_table }; struct desc_ptr nmi_idt_descr = { NR_VECTORS * 16 - 1, @@ -1161,7 +1162,7 @@ void debug_stack_reset(void) if (WARN_ON(!this_cpu_read(debug_stack_use_ctr))) return; if (this_cpu_dec_return(debug_stack_use_ctr) == 0) - load_idt((const struct desc_ptr *)&idt_descr); + load_current_idt(); } #else /* CONFIG_X86_64 */ @@ -1257,7 +1258,7 @@ void __cpuinit cpu_init(void) switch_to_new_gdt(cpu); loadsegment(fs, 0); - load_idt((const struct desc_ptr *)&idt_descr); + load_current_idt(); memset(me->thread.tls_array, 0, GDT_ENTRY_TLS_ENTRIES * 8); syscall_init(); diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c index 47a1870..2f3a799 100644 --- a/arch/x86/kernel/cpu/mcheck/therm_throt.c +++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c @@ -29,6 +29,7 @@ #include <asm/idle.h> #include <asm/mce.h> #include <asm/msr.h> +#include <asm/trace/irq_vectors.h> /* How long to wait between reporting thermal events */ #define CHECK_INTERVAL (300 * HZ) @@ -378,15 +379,26 @@ static void unexpected_thermal_interrupt(void) static void (*smp_thermal_vector)(void) = unexpected_thermal_interrupt; -asmlinkage void smp_thermal_interrupt(struct pt_regs *regs) +static inline void __smp_thermal_interrupt(void) { - irq_enter(); - exit_idle(); inc_irq_stat(irq_thermal_count); smp_thermal_vector(); - irq_exit(); - /* Ack only at the end to avoid potential reentry */ - ack_APIC_irq(); +} + +asmlinkage void smp_thermal_interrupt(struct pt_regs *regs) +{ + entering_irq(); + __smp_thermal_interrupt(); + exiting_ack_irq(); +} + +asmlinkage void smp_trace_thermal_interrupt(struct pt_regs *regs) +{ + entering_irq(); + trace_thermal_apic_entry(THERMAL_APIC_VECTOR); + __smp_thermal_interrupt(); + trace_thermal_apic_exit(THERMAL_APIC_VECTOR); + exiting_ack_irq(); } /* Thermal monitoring depends on APIC, ACPI and clock modulation */ diff --git a/arch/x86/kernel/cpu/mcheck/threshold.c b/arch/x86/kernel/cpu/mcheck/threshold.c index aa578ca..fe6b1c8 100644 --- a/arch/x86/kernel/cpu/mcheck/threshold.c +++ b/arch/x86/kernel/cpu/mcheck/threshold.c @@ -8,6 +8,7 @@ #include <asm/apic.h> #include <asm/idle.h> #include <asm/mce.h> +#include <asm/trace/irq_vectors.h> static void default_threshold_interrupt(void) { @@ -17,13 +18,24 @@ static void default_threshold_interrupt(void) void (*mce_threshold_vector)(void) = default_threshold_interrupt; -asmlinkage void smp_threshold_interrupt(void) +static inline void __smp_threshold_interrupt(void) { - irq_enter(); - exit_idle(); inc_irq_stat(irq_threshold_count); mce_threshold_vector(); - irq_exit(); - /* Ack only at the end to avoid potential reentry */ - ack_APIC_irq(); +} + +asmlinkage void smp_threshold_interrupt(void) +{ + entering_irq(); + __smp_threshold_interrupt(); + exiting_ack_irq(); +} + +asmlinkage void smp_trace_threshold_interrupt(void) +{ + entering_irq(); + trace_threshold_apic_entry(THRESHOLD_APIC_VECTOR); + __smp_threshold_interrupt(); + trace_threshold_apic_exit(THRESHOLD_APIC_VECTOR); + exiting_ack_irq(); } diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S index 8f3e2de..2cfbc3a 100644 --- a/arch/x86/kernel/entry_32.S +++ b/arch/x86/kernel/entry_32.S @@ -801,7 +801,17 @@ ENTRY(name) \ CFI_ENDPROC; \ ENDPROC(name) -#define BUILD_INTERRUPT(name, nr) BUILD_INTERRUPT3(name, nr, smp_##name) + +#ifdef CONFIG_TRACING +#define TRACE_BUILD_INTERRUPT(name, nr) \ + BUILD_INTERRUPT3(trace_##name, nr, smp_trace_##name) +#else +#define TRACE_BUILD_INTERRUPT(name, nr) +#endif + +#define BUILD_INTERRUPT(name, nr) \ + BUILD_INTERRUPT3(name, nr, smp_##name); \ + TRACE_BUILD_INTERRUPT(name, nr) /* The include is where all of the SMP etc. interrupts come from */ #include <asm/entry_arch.h> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S index 7272089..11eef43 100644 --- a/arch/x86/kernel/entry_64.S +++ b/arch/x86/kernel/entry_64.S @@ -1138,7 +1138,7 @@ END(common_interrupt) /* * APIC interrupts. */ -.macro apicinterrupt num sym do_sym +.macro apicinterrupt3 num sym do_sym ENTRY(\sym) INTR_FRAME ASM_CLAC @@ -1150,15 +1150,32 @@ ENTRY(\sym) END(\sym) .endm +#ifdef CONFIG_TRACING +#define trace(sym) trace_##sym +#define smp_trace(sym) smp_trace_##sym + +.macro trace_apicinterrupt num sym +apicinterrupt3 \num trace(\sym) smp_trace(\sym) +.endm +#else +.macro trace_apicinterrupt num sym do_sym +.endm +#endif + +.macro apicinterrupt num sym do_sym +apicinterrupt3 \num \sym \do_sym +trace_apicinterrupt \num \sym +.endm + #ifdef CONFIG_SMP -apicinterrupt IRQ_MOVE_CLEANUP_VECTOR \ +apicinterrupt3 IRQ_MOVE_CLEANUP_VECTOR \ irq_move_cleanup_interrupt smp_irq_move_cleanup_interrupt -apicinterrupt REBOOT_VECTOR \ +apicinterrupt3 REBOOT_VECTOR \ reboot_interrupt smp_reboot_interrupt #endif #ifdef CONFIG_X86_UV -apicinterrupt UV_BAU_MESSAGE \ +apicinterrupt3 UV_BAU_MESSAGE \ uv_bau_message_intr1 uv_bau_message_interrupt #endif apicinterrupt LOCAL_TIMER_VECTOR \ @@ -1167,7 +1184,7 @@ apicinterrupt X86_PLATFORM_IPI_VECTOR \ x86_platform_ipi smp_x86_platform_ipi #ifdef CONFIG_HAVE_KVM -apicinterrupt POSTED_INTR_VECTOR \ +apicinterrupt3 POSTED_INTR_VECTOR \ kvm_posted_intr_ipi smp_kvm_posted_intr_ipi #endif @@ -1451,13 +1468,13 @@ ENTRY(xen_failsafe_callback) CFI_ENDPROC END(xen_failsafe_callback) -apicinterrupt HYPERVISOR_CALLBACK_VECTOR \ +apicinterrupt3 HYPERVISOR_CALLBACK_VECTOR \ xen_hvm_callback_vector xen_evtchn_do_upcall #endif /* CONFIG_XEN */ #if IS_ENABLED(CONFIG_HYPERV) -apicinterrupt HYPERVISOR_CALLBACK_VECTOR \ +apicinterrupt3 HYPERVISOR_CALLBACK_VECTOR \ hyperv_callback_vector hyperv_vector_handler #endif /* CONFIG_HYPERV */ diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index 08f7e80..dae6a9d 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -519,6 +519,12 @@ ENTRY(idt_table) ENTRY(nmi_idt_table) .skip IDT_ENTRIES * 16 +#ifdef CONFIG_TRACING + .align L1_CACHE_BYTES +ENTRY(trace_idt_table) + .skip IDT_ENTRIES * 16 +#endif + __PAGE_ALIGNED_BSS NEXT_PAGE(empty_zero_page) .skip PAGE_SIZE diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c index ac0631d..06af119 100644 --- a/arch/x86/kernel/irq.c +++ b/arch/x86/kernel/irq.c @@ -17,6 +17,7 @@ #include <asm/idle.h> #include <asm/mce.h> #include <asm/hw_irq.h> +#include <asm/trace/irq_vectors.h> atomic_t irq_err_count; @@ -204,23 +205,21 @@ unsigned int __irq_entry do_IRQ(struct pt_regs *regs) /* * Handler for X86_PLATFORM_IPI_VECTOR. */ -void smp_x86_platform_ipi(struct pt_regs *regs) +void __smp_x86_platform_ipi(void) { - struct pt_regs *old_regs = set_irq_regs(regs); - - ack_APIC_irq(); - - irq_enter(); - - exit_idle(); - inc_irq_stat(x86_platform_ipis); if (x86_platform_ipi_callback) x86_platform_ipi_callback(); +} - irq_exit(); +void smp_x86_platform_ipi(struct pt_regs *regs) +{ + struct pt_regs *old_regs = set_irq_regs(regs); + entering_ack_irq(); + __smp_x86_platform_ipi(); + exiting_irq(); set_irq_regs(old_regs); } @@ -246,6 +245,18 @@ void smp_kvm_posted_intr_ipi(struct pt_regs *regs) } #endif +void smp_trace_x86_platform_ipi(struct pt_regs *regs) +{ + struct pt_regs *old_regs = set_irq_regs(regs); + + entering_ack_irq(); + trace_x86_platform_ipi_entry(X86_PLATFORM_IPI_VECTOR); + __smp_x86_platform_ipi(); + trace_x86_platform_ipi_exit(X86_PLATFORM_IPI_VECTOR); + exiting_irq(); + set_irq_regs(old_regs); +} + EXPORT_SYMBOL_GPL(vector_used_by_percpu_irq); #ifdef CONFIG_HOTPLUG_CPU diff --git a/arch/x86/kernel/irq_work.c b/arch/x86/kernel/irq_work.c index ca8f703..636a55e 100644 --- a/arch/x86/kernel/irq_work.c +++ b/arch/x86/kernel/irq_work.c @@ -8,14 +8,34 @@ #include <linux/irq_work.h> #include <linux/hardirq.h> #include <asm/apic.h> +#include <asm/trace/irq_vectors.h> -void smp_irq_work_interrupt(struct pt_regs *regs) +static inline void irq_work_entering_irq(void) { irq_enter(); ack_APIC_irq(); +} + +static inline void __smp_irq_work_interrupt(void) +{ inc_irq_stat(apic_irq_work_irqs); irq_work_run(); - irq_exit(); +} + +void smp_irq_work_interrupt(struct pt_regs *regs) +{ + irq_work_entering_irq(); + __smp_irq_work_interrupt(); + exiting_irq(); +} + +void smp_trace_irq_work_interrupt(struct pt_regs *regs) +{ + irq_work_entering_irq(); + trace_irq_work_entry(IRQ_WORK_VECTOR); + __smp_irq_work_interrupt(); + trace_irq_work_exit(IRQ_WORK_VECTOR); + exiting_irq(); } void arch_irq_work_raise(void) diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c index 48d2b7d..f4fe0b8 100644 --- a/arch/x86/kernel/smp.c +++ b/arch/x86/kernel/smp.c @@ -30,6 +30,7 @@ #include <asm/proto.h> #include <asm/apic.h> #include <asm/nmi.h> +#include <asm/trace/irq_vectors.h> /* * Some notes on x86 processor bugs affecting SMP operation: * @@ -249,32 +250,80 @@ finish: /* * Reschedule call back. */ -void smp_reschedule_interrupt(struct pt_regs *regs) +static inline void __smp_reschedule_interrupt(void) { - ack_APIC_irq(); inc_irq_stat(irq_resched_count); scheduler_ipi(); +} + +void smp_reschedule_interrupt(struct pt_regs *regs) +{ + ack_APIC_irq(); + __smp_reschedule_interrupt(); /* * KVM uses this interrupt to force a cpu out of guest mode */ } -void smp_call_function_interrupt(struct pt_regs *regs) +void smp_trace_reschedule_interrupt(struct pt_regs *regs) +{ + ack_APIC_irq(); + trace_reschedule_entry(RESCHEDULE_VECTOR); + __smp_reschedule_interrupt(); + trace_reschedule_exit(RESCHEDULE_VECTOR); + /* + * KVM uses this interrupt to force a cpu out of guest mode + */ +} + +static inline void call_function_entering_irq(void) { ack_APIC_irq(); irq_enter(); +} + +static inline void __smp_call_function_interrupt(void) +{ generic_smp_call_function_interrupt(); inc_irq_stat(irq_call_count); - irq_exit(); } -void smp_call_function_single_interrupt(struct pt_regs *regs) +void smp_call_function_interrupt(struct pt_regs *regs) +{ + call_function_entering_irq(); + __smp_call_function_interrupt(); + exiting_irq(); +} + +void smp_trace_call_function_interrupt(struct pt_regs *regs) +{ + call_function_entering_irq(); + trace_call_function_entry(CALL_FUNCTION_VECTOR); + __smp_call_function_interrupt(); + trace_call_function_exit(CALL_FUNCTION_VECTOR); + exiting_irq(); +} + +static inline void __smp_call_function_single_interrupt(void) { - ack_APIC_irq(); - irq_enter(); generic_smp_call_function_single_interrupt(); inc_irq_stat(irq_call_count); - irq_exit(); +} + +void smp_call_function_single_interrupt(struct pt_regs *regs) +{ + call_function_entering_irq(); + __smp_call_function_single_interrupt(); + exiting_irq(); +} + +void smp_trace_call_function_single_interrupt(struct pt_regs *regs) +{ + call_function_entering_irq(); + trace_call_function_single_entry(CALL_FUNCTION_SINGLE_VECTOR); + __smp_call_function_single_interrupt(); + trace_call_function_single_exit(CALL_FUNCTION_SINGLE_VECTOR); + exiting_irq(); } static int __init nonmi_ipi_setup(char *str) diff --git a/arch/x86/kernel/tracepoint.c b/arch/x86/kernel/tracepoint.c new file mode 100644 index 0000000..09a5fa7 --- /dev/null +++ b/arch/x86/kernel/tracepoint.c @@ -0,0 +1,58 @@ +/* + * Code for supporting irq vector tracepoints. + * + * Copyright (C) 2013 Seiji Aguchi <sei...@hd...> + * + */ +#include <asm/hw_irq.h> +#include <asm/desc.h> +#include <linux/atomic.h> + +struct desc_ptr trace_idt_descr = { NR_VECTORS * 16 - 1, + (unsigned long) trace_idt_table }; + +#ifndef CONFIG_X86_64 +gate_desc trace_idt_table[NR_VECTORS] __page_aligned_data + = { { { { 0, 0 } } }, }; +#endif + +static int trace_irq_vector_refcount; +static DEFINE_MUTEX(irq_vector_mutex); + +static void switch_idt(void *arg) +{ + load_current_idt(); +} + +void trace_irq_vector_regfunc(void) +{ + mutex_lock(&irq_vector_mutex); + if (!trace_irq_vector_refcount) { + atomic_long_set(¤t_idt_descr_ptr, + (unsigned long)&trace_idt_descr); + wmb(); + smp_call_function(switch_idt, NULL, 0); + local_irq_disable(); + switch_idt(NULL); + local_irq_enable(); + } + trace_irq_vector_refcount++; + mutex_unlock(&irq_vector_mutex); +} + +void trace_irq_vector_unregfunc(void) +{ + mutex_lock(&irq_vector_mutex); + trace_irq_vector_refcount--; + if (!trace_irq_vector_refcount) { + atomic_long_set(¤t_idt_descr_ptr, + (unsigned long)&idt_descr); + wmb(); + smp_call_function(switch_idt, NULL, 0); + local_irq_disable(); + switch_idt(NULL); + local_irq_enable(); + } + mutex_unlock(&irq_vector_mutex); +} + diff --git a/include/xen/events.h b/include/xen/events.h index b2b27c6..c9ea10e 100644 --- a/include/xen/events.h +++ b/include/xen/events.h @@ -76,6 +76,9 @@ unsigned irq_from_evtchn(unsigned int evtchn); /* Xen HVM evtchn vector callback */ void xen_hvm_callback_vector(void); +#ifdef CONFIG_TRACING +#define trace_xen_hvm_callback_vector xen_hvm_callback_vector +#endif extern int xen_have_vector_callback; int xen_set_callback_via(uint64_t via); void xen_evtchn_do_upcall(struct pt_regs *regs); -- 1.7.1 |
|
From: Seiji A. <sei...@hd...> - 2013-06-03 19:29:54
|
When implementing tracepoints in interrupt handers, if the tracepoints are
simply added in the performance sensitive path of interrupt handers,
it may cause potential performance problem due to the time penalty.
To solve the problem, an idea is to prepare non-trace/trace irq handers and
switch their IDTs at the enabling/disabling time.
To do this, this patch introduces entering_irq()/exiting_irq() for pre/post-
processing of each irq handler.
A way to use them is as follows.
Non-trace irq handler:
smp_irq_handler()
{
entering_irq(); /* pre-processing of this handler */
__smp_irq_handler(); /*
* common logic between non-trace and trace handlers
* in a vector.
*/
exiting_irq(); /* post-processing of this handler */
}
Trace irq_handler:
smp_trace_irq_handler()
{
entering_irq(); /* pre-processing of this handler */
trace_irq_entry(); /* tracepoint for irq entry */
__smp_irq_handler(); /*
* common logic between non-trace and trace handlers
* in a vector.
*/
trace_irq_exit(); /* tracepoint for irq exit */
exiting_irq(); /* post-processing of this handler */
}
If tracepoints can place outside entering_irq()/exiting_irq() as follows,
it looks cleaner.
smp_trace_irq_handler()
{
trace_irq_entry();
smp_irq_handler();
trace_irq_exit();
}
But it doesn't work.
The problem is with irq_enter/exit() being called. They must be called before
trace_irq_enter/exit(), because of the rcu_irq_enter() must be called before
any tracepoints are used, as tracepoints use rcu to synchronize.
As a possible alternative, we may be able to call irq_enter() first as follows
if irq_enter() can nest.
smp_trace_irq_hander()
{
irq_entry();
trace_irq_entry();
smp_irq_handler();
trace_irq_exit();
irq_exit();
}
But it doesn't work, either.
If irq_enter() is nested, it may have a time penalty because it has to check if it
was already called or not. The time penalty is not desired in performance sensitive
paths even if it is tiny.
Signed-off-by: Seiji Aguchi <sei...@hd...>
---
arch/x86/include/asm/apic.h | 27 +++++++++++++++++++++++++++
1 files changed, 27 insertions(+), 0 deletions(-)
diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 3388034..f8119b5 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -12,6 +12,7 @@
#include <asm/fixmap.h>
#include <asm/mpspec.h>
#include <asm/msr.h>
+#include <asm/idle.h>
#define ARCH_APICTIMER_STOPS_ON_C3 1
@@ -687,5 +688,31 @@ extern int default_check_phys_apicid_present(int phys_apicid);
#endif
#endif /* CONFIG_X86_LOCAL_APIC */
+extern void irq_enter(void);
+extern void irq_exit(void);
+
+static inline void entering_irq(void)
+{
+ irq_enter();
+ exit_idle();
+}
+
+static inline void entering_ack_irq(void)
+{
+ ack_APIC_irq();
+ entering_irq();
+}
+
+static inline void exiting_irq(void)
+{
+ irq_exit();
+}
+
+static inline void exiting_ack_irq(void)
+{
+ irq_exit();
+ /* Ack only at the end to avoid potential reentry */
+ ack_APIC_irq();
+}
#endif /* _ASM_X86_APIC_H */
-- 1.7.1
|
|
From: Seiji A. <sei...@hd...> - 2013-06-03 19:29:25
|
From: Steven Rostedt <ro...@go...> Each TRACE_EVENT() adds several helper functions. If two or more trace events share the same structure and print format, they can also share most of these helper functions and save a lot of space from duplicate code. This is why the DECLARE_EVENT_CLASS() and DEFINE_EVENT() were created. Some events require a trigger to be called at registering and unregistering of the event and to do so they use TRACE_EVENT_FN(). If multiple events require a trigger, they currently have no choice but to use TRACE_EVENT_FN() as there's no DEFINE_EVENT_FN() available. This unfortunately causes a lot of wasted duplicate code created. By adding a DEFINE_EVENT_FN(), these events can still use a DECLARE_EVENT_CLASS() and then define their own triggers. Signed-off-by: Steven Rostedt <ro...@go...> Signed-off-by: Seiji Aguchi <sei...@hd...> --- include/linux/tracepoint.h | 2 ++ include/trace/define_trace.h | 5 +++++ include/trace/ftrace.h | 4 ++++ 3 files changed, 11 insertions(+), 0 deletions(-) diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h index 2f322c3..9bf59e5 100644 --- a/include/linux/tracepoint.h +++ b/include/linux/tracepoint.h @@ -378,6 +378,8 @@ static inline void tracepoint_synchronize_unregister(void) #define DECLARE_EVENT_CLASS(name, proto, args, tstruct, assign, print) #define DEFINE_EVENT(template, name, proto, args) \ DECLARE_TRACE(name, PARAMS(proto), PARAMS(args)) +#define DEFINE_EVENT_FN(template, name, proto, args, reg, unreg)\ + DECLARE_TRACE(name, PARAMS(proto), PARAMS(args)) #define DEFINE_EVENT_PRINT(template, name, proto, args, print) \ DECLARE_TRACE(name, PARAMS(proto), PARAMS(args)) #define DEFINE_EVENT_CONDITION(template, name, proto, \ diff --git a/include/trace/define_trace.h b/include/trace/define_trace.h index 1905ca8..02e1003 100644 --- a/include/trace/define_trace.h +++ b/include/trace/define_trace.h @@ -44,6 +44,10 @@ #define DEFINE_EVENT(template, name, proto, args) \ DEFINE_TRACE(name) +#undef DEFINE_EVENT_FN +#define DEFINE_EVENT_FN(template, name, proto, args, reg, unreg) \ + DEFINE_TRACE_FN(name, reg, unreg) + #undef DEFINE_EVENT_PRINT #define DEFINE_EVENT_PRINT(template, name, proto, args, print) \ DEFINE_TRACE(name) @@ -91,6 +95,7 @@ #undef TRACE_EVENT_CONDITION #undef DECLARE_EVENT_CLASS #undef DEFINE_EVENT +#undef DEFINE_EVENT_FN #undef DEFINE_EVENT_PRINT #undef DEFINE_EVENT_CONDITION #undef TRACE_HEADER_MULTI_READ diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h index 19edd7f..d615f78 100644 --- a/include/trace/ftrace.h +++ b/include/trace/ftrace.h @@ -71,6 +71,10 @@ static struct ftrace_event_call __used \ __attribute__((__aligned__(4))) event_##name +#undef DEFINE_EVENT_FN +#define DEFINE_EVENT_FN(template, name, proto, args, reg, unreg) \ + DEFINE_EVENT(template, name, PARAMS(proto), PARAMS(args)) + #undef DEFINE_EVENT_PRINT #define DEFINE_EVENT_PRINT(template, name, proto, args, print) \ DEFINE_EVENT(template, name, PARAMS(proto), PARAMS(args)) -- 1.7.1 |
|
From: Seiji A. <sei...@hd...> - 2013-06-03 19:27:01
|
Change log
v12 -> v13
- Rebase to 3.10-rc3
- Patch 2/3
- Separate a patch introducing entering/exiting_irq() from patch-v12 3/3.
- Patch 3/3
- Introduce write_trace_idt_entry() to remove "ifdef CONFIG_TRACING"
from _set_gate().
- Change DEFINE_IRQ_VECTOR_EVENT to save on repeat code.
- Add a path of irq_vector.h to Makefile.
- Introduce current_idt_descr_ptr and load_current_idt() to switch
IDT in a generic way with handling Debug traps/NMI and cpu hotplug.
v11 -> v12
- Rebase to 3.9-rc5
v10 -> v11
- Rebase to 3.9-rc2
- Add a modification for hyperv_callback vector. (patch 2/3)
- Change a way to switch idt to check the table in use instead of saving/restoring it,
because saving/restoring functions will break if we have to add another one. (patch 2/3)
v9 -> v10
- Add an explanation the reason why tracepoints has to place inside irq enter/exit handling. (patch 3/3)
v8 -> v9
- Rebase to 3.8-rc6
- Add Steven's email address at the top of the message and
move my signed-off-by below Steven's one because it is
originally created by Steven. (patch 1/3)
- Introduce a irq_vector_mutex to avoid a race at registering/unregistering
time. (patch 2/3)
- Use a per_cpu data to orig_idt_descr because IDT descritor is needed to each cpu
and the appropriate data type is per_cpu data. It is suggested by Steven.
(patch 2/3)
v7 -> v8
- Rebase to 3.8-rc4
- Add a patch 1 introducing DEFINE_EVENT_FN() macro.
- Rename original patches 1 and 2 to 2 and 3.
- Change a definition of tracepoint to use DEFINE_EVENT_FN(). (patch 2)
- Change alloc_intr_gate() to use do{}while(0) to avoid a warning
of checkpatch.pl. (patch 2)
- Move entering_irq()/exiting_irq() to arch/x86/include/asm/apic.h (patch 3)
v6 -> v7
- Divide into two patches to make a code review easier.
Summery of each patch is as follows.
- Patch 1/2
- Add an irq_vector tracing infrastructure.
- Create idt_table for tracing. It is refactored to avoid duplicating
existing logic.
- Duplicate new irq handlers inserted tracepoints.
- Patch 2/2
- Share a common logic among irq handlers to make them
manageable and readable.
v5 -> v6
- Rebased to 3.7
v4 -> v5
- Rebased to 3.6.0
- Introduce a logic switching IDT at enabling/disabling TP time
so that a time penalty makes a zero when tracepoints are disabled.
This IDT is created only when CONFIG_TRACEPOINTS is enabled.
- Remove arch_irq_vector_entry/exit and add followings again
so that we can add each tracepoint in a generic way.
- error_apic_vector
- thermal_apic_vector
- threshold_apic_vector
- spurious_apic_vector
- x86_platform_ipi_vector
- Drop nmi tracepoints to begin with apic interrupts and discuss a logic switching
IDT first.
- Move irq_vectors.h in the directory of arch/x86/include/asm/trace because
I'm not sure if a logic switching IDT is sharable with other architectures.
v3 -> v4
- Add a latency measurement of each tracepoint
- Rebased to 3.6-rc6
v2 -> v3
- Remove an invalidate_tlb_vector event because it was replaced by a call function vector
in a following commit.
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=52aec3308db85f4e9f5c8b9f5dc4fbd0138c6fa4
v1 -> v2
- Modify variable name from irq to vector.
- Merge arch-specific tracepoints below to an arch_irq_vector_entry/exit.
- error_apic_vector
- thermal_apic_vector
- threshold_apic_vector
- spurious_apic_vector
- x86_platform_ipi_vector
[Purpose of this patch]
As Vaibhav explained in the thread below, tracepoints for irq vectors are useful.
http://www.spinics.net/lists/mm-commits/msg85707.html
<snip>
The current interrupt traces from irq_handler_entry and irq_handler_exit provide
when an interrupt is handled.
They provide good data about when the system has switched to kernel space and
how it affects the currently running processes.
There are some IRQ vectors which trigger the system into kernel space, which are
not handled in generic IRQ handlers.
Tracing such events gives us the information about IRQ interaction with other
system events.
The trace also tells where the system is spending its time. We want to know which
cores are handling interrupts and
how they are affecting other processes in the system. Also, the trace provides
information about when the cores are
idle and which interrupts are changing that state.
<snip>
On the other hand, my usecase is tracing just local timer event and getting
a value of instruction pointer.
I suggested to add an argument local timer event to get instruction pointer before.
But there is another way to get it with external module like systemtap.
So, I don't need to add any argument to irq vector tracepoints now.
[Patch Description]
Vaibhav's patch shared a trace point ,irq_vector_entry/irq_vector_exit, in all events.
But there is an above use case to trace specific irq_vector rather than tracing all events.
In this case, we are concerned about overhead due to unwanted events.
This patch adds following tracepoints instead of introducing irq_vector_entry/exit.
so that we can enable them independently.
- local_timer_vector
- reschedule_vector
- call_function_vector
- call_function_single_vector
- irq_work_entry_vector
- error_apic_vector
- thermal_apic_vector
- threshold_apic_vector
- spurious_apic_vector
- x86_platform_ipi_vector
Please see descriptions in each patch.
Seiji Aguchi (2):
trace,x86: Introduce entering/exiting_irq()
trace,x86: Add irq vector tracepoints
Steven Rostedt (1):
tracing: Add DEFINE_EVENT_FN() macro
arch/x86/include/asm/apic.h | 27 ++++++++
arch/x86/include/asm/desc.h | 55 +++++++++++++++-
arch/x86/include/asm/entry_arch.h | 8 ++-
arch/x86/include/asm/hw_irq.h | 17 +++++
arch/x86/include/asm/mshyperv.h | 1 +
arch/x86/include/asm/trace/irq_vectors.h | 104 ++++++++++++++++++++++++++++++
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/apic/Makefile | 1 +
arch/x86/kernel/apic/apic.c | 71 +++++++++++++++++----
arch/x86/kernel/cpu/common.c | 5 +-
arch/x86/kernel/cpu/mcheck/therm_throt.c | 24 +++++--
arch/x86/kernel/cpu/mcheck/threshold.c | 24 +++++--
arch/x86/kernel/entry_32.S | 12 +++-
arch/x86/kernel/entry_64.S | 31 +++++++--
arch/x86/kernel/head_64.S | 6 ++
arch/x86/kernel/irq.c | 31 ++++++---
arch/x86/kernel/irq_work.c | 24 ++++++-
arch/x86/kernel/smp.c | 65 ++++++++++++++++--
arch/x86/kernel/tracepoint.c | 58 +++++++++++++++++
include/linux/tracepoint.h | 2 +
include/trace/define_trace.h | 5 ++
include/trace/ftrace.h | 4 +
include/xen/events.h | 3 +
23 files changed, 520 insertions(+), 59 deletions(-)
create mode 100644 arch/x86/include/asm/trace/irq_vectors.h
create mode 100644 arch/x86/kernel/tracepoint.c
|
|
From: Seiji A. <sei...@hd...> - 2013-05-29 00:14:45
|
Eric,
> -----Original Message-----
> From: lib...@re... [mailto:lib...@re...] On Behalf Of Eric Blake
> Sent: Tuesday, May 28, 2013 7:08 PM
> Cc: lib...@re...; dle...@li...; Tomoki Sekiyama
> Subject: Re: [libvirt] [PATCHv2] xml: introduce startupPolicy for chardev device
>
> On 05/24/2013 04:46 PM, Eric Blake wrote:
> > From: Seiji Aguchi <sei...@hd...>
> >
> > [Problem]
> > Currently, guest OS's messages can be logged to a local disk of host OS
> > by creating chadevs with options below.
> > -chardev file,id=charserial0,path=<log file's path> -device isa-serial,chardev=chardevserial0,id=serial0
>
> >
> > Seiji's patch plus my initial review comments; I've ack'd this much,
> > but want to add a round-trip xml2xml test before pushing anything
> > (Seiji, if you want to use this as a starting point, rather than
> > waiting for my long weekend, feel free to post a v3).
>
>
> Sorry, but this needs a v3, and I'm going to hand it back to you to fix.
> I tried adding a testsuite, but it proved that you weren't outputting
> the startupPolicy during dumpxml formatting. My initial attempt to add
> it backfired (I'll post it below);
Thank you for handling/testing my patch.
>I'm worried that we are playing too
> fast and loose with a union type, since even with my patch in place,
> 'make check' fails with problems like:
>
> 84) QEMU XML-2-XML serial-dev ...
> Offset 913
> Expect [/>
> <target port='0'/>
> </serial>
> <console type='dev'>
> <source path='/dev/ttyS2]
> Actual [ startupPolicy='(null)'/>
> <target port='0'/>
> </serial>
> <console type='dev'>
> <source path='/dev/ttyS2' startupPolicy='(null)]
>
> which is a bug, since you said startupPolicy should only be tied to
> files and not other <serial> types. Do we need to repeat the
> startupPolicy on the <console> mirror of the first <serial> device, or
> is that unnecessary back-compat, and where we can safely declare that
> startupPolicy is only emitted for the <serial> side of things?
>
I need to think carefully about it before deciding it in a hurry.
> All told, your patch is not complete until it passes 'make check' with a
> new xml2xml test that proves we can round trip the new policy.
Anyway, I will post the v3 patch by fixing the issue.
Seiji
>
> diff --git c/src/conf/domain_conf.c w/src/conf/domain_conf.c
> index c24a9f0..5af5e40 100644
> --- c/src/conf/domain_conf.c
> +++ w/src/conf/domain_conf.c
> @@ -14547,8 +14547,14 @@ virDomainChrSourceDefFormat(virBufferPtr buf,
> if (def->type != VIR_DOMAIN_CHR_TYPE_PTY ||
> (def->data.file.path &&
> !(flags & VIR_DOMAIN_XML_INACTIVE))) {
> - virBufferEscapeString(buf, " <source path='%s'/>\n",
> + virBufferEscapeString(buf, " <source path='%s'",
> def->data.file.path);
> +
> + if (def->data.file.path && def->data.file.startupPolicy) {
> + const char *policy =
> virDomainStartupPolicyTypeToString(def->data.file.startupPolicy);
> + virBufferAsprintf(buf, " startupPolicy='%s'", policy);
> + }
> + virBufferAddLit(buf, "/>\n");
> }
> break;
>
> diff --git
> c/tests/qemuxml2argvdata/qemuxml2argv-serial-source-optional.args
> w/tests/qemuxml2argvdata/qemuxml2argv-serial-source-optional.args
> new file mode 100644
> index 0000000..24977f2
> --- /dev/null
> +++ w/tests/qemuxml2argvdata/qemuxml2argv-serial-source-optional.args
> @@ -0,0 +1,8 @@
> +LC_ALL=C PATH=/bin HOME=/home/test USER=test LOGNAME=test /usr/bin/qemu
> -S -M \
> +pc -m 214 -smp 1 -nographic -nodefconfig -nodefaults \
> +-chardev socket,id=charmonitor,path=/tmp/test-monitor,server,nowait \
> +-mon chardev=charmonitor,id=monitor,mode=readline \
> +-no-acpi -boot c -usb -hdc /tmp/idedisk.img \
> +-chardev file,id=charserial0,path=/tmp/serial.log \
> +-device isa-serial,chardev=charserial0,id=serial0 \
> +-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3
> diff --git
> c/tests/qemuxml2argvdata/qemuxml2argv-serial-source-optional.xml
> w/tests/qemuxml2argvdata/qemuxml2argv-serial-source-optional.xml
> new file mode 100644
> index 0000000..34e0eb9
> --- /dev/null
> +++ w/tests/qemuxml2argvdata/qemuxml2argv-serial-source-optional.xml
> @@ -0,0 +1,35 @@
> +<domain type='qemu'>
> + <name>QEMUGuest1</name>
> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid>
> + <memory unit='KiB'>219136</memory>
> + <currentMemory unit='KiB'>219136</currentMemory>
> + <vcpu placement='static'>1</vcpu>
> + <os>
> + <type arch='i686' machine='pc'>hvm</type>
> + <boot dev='hd'/>
> + </os>
> + <clock offset='utc'/>
> + <on_poweroff>destroy</on_poweroff>
> + <on_reboot>restart</on_reboot>
> + <on_crash>destroy</on_crash>
> + <devices>
> + <emulator>/usr/bin/qemu</emulator>
> + <disk type='file' device='disk'>
> + <source file='/tmp/idedisk.img'/>
> + <target dev='hdc' bus='ide'/>
> + <address type='drive' controller='0' bus='0' target='0' unit='2'/>
> + </disk>
> + <controller type='usb' index='0'/>
> + <controller type='ide' index='0'/>
> + <controller type='pci' index='0' model='pci-root'/>
> + <serial type='file'>
> + <source path='/tmp/serial.log' startupPolicy='optional'/>
> + <target port='0'/>
> + </serial>
> + <console type='file'>
> + <source path='/tmp/serial.log'/>
> + <target type='serial' port='0'/>
> + </console>
> + <memballoon model='virtio'/>
> + </devices>
> +</domain>
> diff --git c/tests/qemuxml2argvtest.c w/tests/qemuxml2argvtest.c
> index 2c7fd01..22f4782 100644
> --- c/tests/qemuxml2argvtest.c
> +++ w/tests/qemuxml2argvtest.c
> @@ -727,6 +727,8 @@ mymain(void)
> QEMU_CAPS_CHARDEV, QEMU_CAPS_DEVICE, QEMU_CAPS_NODEFCONFIG);
> DO_TEST("console-compat-chardev",
> QEMU_CAPS_CHARDEV, QEMU_CAPS_DEVICE, QEMU_CAPS_NODEFCONFIG);
> + DO_TEST("serial-source-optional",
> + QEMU_CAPS_CHARDEV, QEMU_CAPS_DEVICE, QEMU_CAPS_NODEFCONFIG);
>
> DO_TEST("channel-guestfwd",
> QEMU_CAPS_CHARDEV, QEMU_CAPS_DEVICE, QEMU_CAPS_NODEFCONFIG);
> diff --git c/tests/qemuxml2xmltest.c w/tests/qemuxml2xmltest.c
> index 64a271c..1147519 100644
> --- c/tests/qemuxml2xmltest.c
> +++ w/tests/qemuxml2xmltest.c
> @@ -86,6 +86,8 @@ testCompareXMLToXMLHelper(const void *data)
> ret = testCompareXMLToXMLFiles(xml_in,
> info->different ? xml_out : xml_in,
> false);
> + if (ret < 0)
> + goto cleanup;
> }
> if (info->when & WHEN_ACTIVE) {
> ret = testCompareXMLToXMLFiles(xml_in,
> @@ -231,6 +233,7 @@ mymain(void)
> DO_TEST("console-virtio-many");
> DO_TEST("channel-guestfwd");
> DO_TEST("channel-virtio");
> + DO_TEST("serial-source-optional");
>
> DO_TEST("hostdev-usb-address");
> DO_TEST("hostdev-pci-address");
>
>
> --
> Eric Blake eblake redhat com +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
|
|
From: Eric B. <eb...@re...> - 2013-05-28 23:07:44
|
On 05/24/2013 04:46 PM, Eric Blake wrote:
> From: Seiji Aguchi <sei...@hd...>
>
> [Problem]
> Currently, guest OS's messages can be logged to a local disk of host OS
> by creating chadevs with options below.
> -chardev file,id=charserial0,path=<log file's path> -device isa-serial,chardev=chardevserial0,id=serial0
>
> Seiji's patch plus my initial review comments; I've ack'd this much,
> but want to add a round-trip xml2xml test before pushing anything
> (Seiji, if you want to use this as a starting point, rather than
> waiting for my long weekend, feel free to post a v3).
Sorry, but this needs a v3, and I'm going to hand it back to you to fix.
I tried adding a testsuite, but it proved that you weren't outputting
the startupPolicy during dumpxml formatting. My initial attempt to add
it backfired (I'll post it below); I'm worried that we are playing too
fast and loose with a union type, since even with my patch in place,
'make check' fails with problems like:
84) QEMU XML-2-XML serial-dev ...
Offset 913
Expect [/>
<target port='0'/>
</serial>
<console type='dev'>
<source path='/dev/ttyS2]
Actual [ startupPolicy='(null)'/>
<target port='0'/>
</serial>
<console type='dev'>
<source path='/dev/ttyS2' startupPolicy='(null)]
which is a bug, since you said startupPolicy should only be tied to
files and not other <serial> types. Do we need to repeat the
startupPolicy on the <console> mirror of the first <serial> device, or
is that unnecessary back-compat, and where we can safely declare that
startupPolicy is only emitted for the <serial> side of things?
All told, your patch is not complete until it passes 'make check' with a
new xml2xml test that proves we can round trip the new policy.
diff --git c/src/conf/domain_conf.c w/src/conf/domain_conf.c
index c24a9f0..5af5e40 100644
--- c/src/conf/domain_conf.c
+++ w/src/conf/domain_conf.c
@@ -14547,8 +14547,14 @@ virDomainChrSourceDefFormat(virBufferPtr buf,
if (def->type != VIR_DOMAIN_CHR_TYPE_PTY ||
(def->data.file.path &&
!(flags & VIR_DOMAIN_XML_INACTIVE))) {
- virBufferEscapeString(buf, " <source path='%s'/>\n",
+ virBufferEscapeString(buf, " <source path='%s'",
def->data.file.path);
+
+ if (def->data.file.path && def->data.file.startupPolicy) {
+ const char *policy =
virDomainStartupPolicyTypeToString(def->data.file.startupPolicy);
+ virBufferAsprintf(buf, " startupPolicy='%s'", policy);
+ }
+ virBufferAddLit(buf, "/>\n");
}
break;
diff --git
c/tests/qemuxml2argvdata/qemuxml2argv-serial-source-optional.args
w/tests/qemuxml2argvdata/qemuxml2argv-serial-source-optional.args
new file mode 100644
index 0000000..24977f2
--- /dev/null
+++ w/tests/qemuxml2argvdata/qemuxml2argv-serial-source-optional.args
@@ -0,0 +1,8 @@
+LC_ALL=C PATH=/bin HOME=/home/test USER=test LOGNAME=test /usr/bin/qemu
-S -M \
+pc -m 214 -smp 1 -nographic -nodefconfig -nodefaults \
+-chardev socket,id=charmonitor,path=/tmp/test-monitor,server,nowait \
+-mon chardev=charmonitor,id=monitor,mode=readline \
+-no-acpi -boot c -usb -hdc /tmp/idedisk.img \
+-chardev file,id=charserial0,path=/tmp/serial.log \
+-device isa-serial,chardev=charserial0,id=serial0 \
+-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3
diff --git
c/tests/qemuxml2argvdata/qemuxml2argv-serial-source-optional.xml
w/tests/qemuxml2argvdata/qemuxml2argv-serial-source-optional.xml
new file mode 100644
index 0000000..34e0eb9
--- /dev/null
+++ w/tests/qemuxml2argvdata/qemuxml2argv-serial-source-optional.xml
@@ -0,0 +1,35 @@
+<domain type='qemu'>
+ <name>QEMUGuest1</name>
+ <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid>
+ <memory unit='KiB'>219136</memory>
+ <currentMemory unit='KiB'>219136</currentMemory>
+ <vcpu placement='static'>1</vcpu>
+ <os>
+ <type arch='i686' machine='pc'>hvm</type>
+ <boot dev='hd'/>
+ </os>
+ <clock offset='utc'/>
+ <on_poweroff>destroy</on_poweroff>
+ <on_reboot>restart</on_reboot>
+ <on_crash>destroy</on_crash>
+ <devices>
+ <emulator>/usr/bin/qemu</emulator>
+ <disk type='file' device='disk'>
+ <source file='/tmp/idedisk.img'/>
+ <target dev='hdc' bus='ide'/>
+ <address type='drive' controller='0' bus='0' target='0' unit='2'/>
+ </disk>
+ <controller type='usb' index='0'/>
+ <controller type='ide' index='0'/>
+ <controller type='pci' index='0' model='pci-root'/>
+ <serial type='file'>
+ <source path='/tmp/serial.log' startupPolicy='optional'/>
+ <target port='0'/>
+ </serial>
+ <console type='file'>
+ <source path='/tmp/serial.log'/>
+ <target type='serial' port='0'/>
+ </console>
+ <memballoon model='virtio'/>
+ </devices>
+</domain>
diff --git c/tests/qemuxml2argvtest.c w/tests/qemuxml2argvtest.c
index 2c7fd01..22f4782 100644
--- c/tests/qemuxml2argvtest.c
+++ w/tests/qemuxml2argvtest.c
@@ -727,6 +727,8 @@ mymain(void)
QEMU_CAPS_CHARDEV, QEMU_CAPS_DEVICE, QEMU_CAPS_NODEFCONFIG);
DO_TEST("console-compat-chardev",
QEMU_CAPS_CHARDEV, QEMU_CAPS_DEVICE, QEMU_CAPS_NODEFCONFIG);
+ DO_TEST("serial-source-optional",
+ QEMU_CAPS_CHARDEV, QEMU_CAPS_DEVICE, QEMU_CAPS_NODEFCONFIG);
DO_TEST("channel-guestfwd",
QEMU_CAPS_CHARDEV, QEMU_CAPS_DEVICE, QEMU_CAPS_NODEFCONFIG);
diff --git c/tests/qemuxml2xmltest.c w/tests/qemuxml2xmltest.c
index 64a271c..1147519 100644
--- c/tests/qemuxml2xmltest.c
+++ w/tests/qemuxml2xmltest.c
@@ -86,6 +86,8 @@ testCompareXMLToXMLHelper(const void *data)
ret = testCompareXMLToXMLFiles(xml_in,
info->different ? xml_out : xml_in,
false);
+ if (ret < 0)
+ goto cleanup;
}
if (info->when & WHEN_ACTIVE) {
ret = testCompareXMLToXMLFiles(xml_in,
@@ -231,6 +233,7 @@ mymain(void)
DO_TEST("console-virtio-many");
DO_TEST("channel-guestfwd");
DO_TEST("channel-virtio");
+ DO_TEST("serial-source-optional");
DO_TEST("hostdev-usb-address");
DO_TEST("hostdev-pci-address");
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
|
|
From: Nicholas M. <ni...@wd...> - 2013-05-28 20:49:43
|
Dear Dynamic and Savvy Software Engineers,
My outstanding client offers a rare and exciting opportunity to learn Ruby and Rails on the job. This is a well-funded small company (already profitable) working in the field of GIS "big data". They produce the nation's leading geographical/statistics software and are poised for excellent growth. This is a pre-IPO opportunity offering stock options and competitive salaries and will be located in the general Boston area. If you are hired for this position, you will become an ultimate Ruby on Rails expert and dramatically expand your programming skills.
They are looking for 1-2 year experienced BS and MS Computer Science graduates (exceptionally talented individuals still in school will be considered) with very good general coding skills, who may not necessarily know Ruby or Rails yet, but would jump at a chance to learn. The candidates should have prior web-based applications development experience and should have code samples that they can provide for evaluation. Ideal candidates would have experience coding in multiple languages, agile environments, Unix data-mining and familiarity with GIS (mapping) software.
There are also opportunities for more experienced RoR coders with some solid experience, but not much. Please feel free to submit resumes even if you have more than 2 years’ experience and want to dramatically sharpen your Ruby on Rails coding skills.
Lastly, they are also looking for a Lead RoR Coding expert who will have a ‘walks-on-water’ track-record.
Please submit resume to Nick Meyler at <ni...@wd...>;
Phone (818)597-3200 ext. 211
FORMAL DESCRIPTION:
Web Applications Developer (Ruby on Rails)
We are a fun and exciting Internet-based Big Data company looking to expand our team by
hiring two full-stack Web Applications Developers. This is an exceptional opportunity for skilled and self-starting Computer Science students or recent graduates to join our team and build unique, market-disrupting technologies, while mastering Ruby on Rails on the job in our small and growing, profitable, venture-backed company.
What You’ll Be Doing:
• Solving challenging problems working with massive datasets that include crime statistics,
schools, lifestyles, neighborhoods and more
• Working in an Agile, TDD environment writing code you can be proud of
• Keeping existing products up-to-date with the latest technologies, architectures, and standards
• Expanding your programming skills, and transforming into the ultimate Ruby on Rails expert
What’s In it For You?
• Significant potential to become a leading member of our development team.
• Casual fun working environment, solving some of the most exhilarating Big Data problems
in the industry, alongside a team that’s as passionate as you are about this stuff!
• Competitive salary, stock options for you to participate in company success, and great
benefits (401(k), 100% medical coverage, 11 paid holidays)
• Great work/life balance with 40-45 hour work weeks the norm for full time; 30-35 hours if
part time while in school!
• Long-term stability with a venture backed start up that is already profitable and growing
• Investors include the Founders of MapQuest and CoreLogic
Desired Skills & Experience:
• A BS in Computer Science, Computer Engineering, or similar; MS preferred
• Experience with Web Applications Development, ideally in multiple languages
• Self-directed time management, documentation and communication skills
• To be passionate about your craft
• An ideal candidate would also possess:
o Familiarity with GIS-related software, including spatial databases such as PostGIS and
the Google Maps Javascript APIs
o Knowledge of all aspects/tiers of web and mobile web app development
o Strong Unix skills including API/data mining experience
o Scaling and performance tuning of high-traffic, data-heavy sites
CURRENT POSITIONS OPEN:
3 potential opportunities:
1. Lead Rails Engineer (we still need a leading person, great top-notch RoR programmer).
2.Jr/Mid-level Rails Developer (some Rails experience, but not much, but a programmer with some solid experience)
3.Fresh BS or MS graduate w/ web applications development experience, who can learn RoR on the job.
Please send a resume and code sample to ni...@wd...
Best Regards,
Nicholas Meyler
GM/President, Technology
Wingate Dunross, Inc.
ph (818)597-3200 ext. 211
http://app.streamsend.com/private/u4Kt/nKR/rTa88dI/unsubscribe/18715885
|
|
From: Eric B. <eb...@re...> - 2013-05-24 23:00:00
|
From: Seiji Aguchi <sei...@hd...> [Problem] Currently, guest OS's messages can be logged to a local disk of host OS by creating chadevs with options below. -chardev file,id=charserial0,path=<log file's path> -device isa-serial,chardev=chardevserial0,id=serial0 When a hardware failure happens in the disk, qemu-kvm can't create the chardevs. In this case, guest OS doesn't boot up. Actually, there are users who don't desire that guest OS goes down due to a hardware failure of a log disk only. Therefore, qemu should offer some way to boot guest OS up even if the log disk is broken. [Solution] This patch supports startupPolicy for chardev. The starupPolicy is introduced just in cases where chardev is "file" because this patch aims for making guest OS boot up when a hardware failure happens. In other cases (pty, dev, pipe and unix) it is not introduced because they don't access to hardware. The policy works as follows. - If the value is "optional", guestOS boots up by dropping the chardev. - If other values are specified, guestOS fails to boot up. (the default) Description about original startupPolicy attribute: http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=e5a84d74a278 Signed-off-by: Seiji Aguchi <sei...@hd...> Signed-off-by: Eric Blake <eb...@re...> --- Seiji's patch plus my initial review comments; I've ack'd this much, but want to add a round-trip xml2xml test before pushing anything (Seiji, if you want to use this as a starting point, rather than waiting for my long weekend, feel free to post a v3). docs/formatdomain.html.in | 15 ++++++++++++++- docs/schemas/domaincommon.rng | 3 +++ src/conf/domain_conf.c | 8 ++++++++ src/conf/domain_conf.h | 1 + src/qemu/qemu_process.c | 23 ++++++++++++++++------- tests/virt-aa-helper-test | 3 +++ 6 files changed, 45 insertions(+), 8 deletions(-) diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index 755d084..299fa49 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -4212,13 +4212,26 @@ qemu-kvm -net nic,model=? /dev/null <p> A file is opened and all data sent to the character device is written to the file. + <span class="since">Since 1.0.6</span>, it is possible to define + policy on what happens if the file is not accessible when + booting or migrating. This is done by + a <code>startupPolicy</code> attribute: </p> + <ul> + <li>If the value is "mandatory" (the default), the guest fails + to boot or migrate if the file is not found.</li> + <li>If the value is "optional", a missing file is at boot or + migration is substituted with /dev/null, so the guest still sees + the device but the host no longer tracks guest data on the device.</li> + <li>If the value is "requisite", the file is required for + booting, but optional on migration.</li> + </ul> <pre> ... <devices> <serial type="file"> - <source path="/var/log/vm/vm-serial.log"/> + <source path="/var/log/vm/vm-serial.log" startupPolicy="optional"/> <target port="1"/> </serial> </devices> diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index 3cace35..e8eec6c 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -2780,6 +2780,9 @@ </optional> <optional> <attribute name="path"/> + <optional> + <ref name='startupPolicy'/> + </optional> </optional> <optional> <attribute name="host"/> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index a9656af..c24a9f0 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -6626,6 +6626,7 @@ virDomainChrSourceDefParseXML(virDomainChrSourceDefPtr def, char *path = NULL; char *mode = NULL; char *protocol = NULL; + char *startupPolicy = NULL; int remaining = 0; while (cur != NULL) { @@ -6646,6 +6647,9 @@ virDomainChrSourceDefParseXML(virDomainChrSourceDefPtr def, !(flags & VIR_DOMAIN_XML_INACTIVE))) path = virXMLPropString(cur, "path"); + if (startupPolicy == NULL && + def->type == VIR_DOMAIN_CHR_TYPE_FILE) + startupPolicy = virXMLPropString(cur, "startupPolicy"); break; case VIR_DOMAIN_CHR_TYPE_UDP: @@ -6718,6 +6722,10 @@ virDomainChrSourceDefParseXML(virDomainChrSourceDefPtr def, def->data.file.path = path; path = NULL; + + def->data.file.startupPolicy = + virDomainStartupPolicyTypeFromString(startupPolicy); + startupPolicy = NULL; break; case VIR_DOMAIN_CHR_TYPE_STDIO: diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 3a71d6c..4d49dd5 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -1074,6 +1074,7 @@ struct _virDomainChrSourceDef { /* no <source> for null, vc, stdio */ struct { char *path; + int startupPolicy; /* enum virDomainStartupPolicy */ } file; /* pty, file, pipe, or device */ struct { char *host; diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index d4fd4fb..154445d 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -2439,15 +2439,24 @@ qemuProcessPrepareChardevDevice(virDomainDefPtr def ATTRIBUTE_UNUSED, return 0; if ((fd = open(dev->source.data.file.path, - O_CREAT | O_APPEND, S_IRUSR|S_IWUSR)) < 0) { - virReportSystemError(errno, - _("Unable to pre-create chardev file '%s'"), - dev->source.data.file.path); - return -1; + O_CREAT | O_APPEND | O_RDONLY, S_IRUSR|S_IWUSR)) < 0) { + if (dev->source.data.file.startupPolicy != + VIR_DOMAIN_STARTUP_POLICY_OPTIONAL) { + virReportSystemError(errno, + _("Unable to pre-create chardev file '%s'"), + dev->source.data.file.path); + return -1; + } + /* Change destination to /dev/null to work around missing file. */ + VIR_WARN("chardev file '%s' not found, switching to /dev/null", + dev->source.data.file.path); + VIR_FREE(dev->source.data.file.path); + if (VIR_STRDUP(dev->source.data.file.path, "/dev/null") < 0) + return -1; + } else { + VIR_FORCE_CLOSE(fd); } - VIR_FORCE_CLOSE(fd); - return 0; } diff --git a/tests/virt-aa-helper-test b/tests/virt-aa-helper-test index af91c61..7172fd6 100755 --- a/tests/virt-aa-helper-test +++ b/tests/virt-aa-helper-test @@ -255,6 +255,9 @@ testme "0" "disk (empty cdrom)" "-r -u $valid_uuid" "$test_xml" sed -e "s,###UUID###,$uuid,g" -e "s,###DISK###,$disk1,g" -e "s,</devices>,<serial type='file'><source path='$tmpdir/serial.log'/><target port='0'/></serial></devices>,g" "$template_xml" > "$test_xml" testme "0" "serial" "-r -u $valid_uuid" "$test_xml" +sed -e "s,###UUID###,$uuid,g" -e "s,###DISK###,$disk1,g" -e "s,</devices>,<serial type='file'><source path='$tmpdir/serial.log' startupPolicy='optional'/><target port='0'/></serial></devices>,g" "$template_xml" > "$test_xml" +testme "0" "serial" "-r -u $valid_uuid" "$test_xml" + sed -e "s,###UUID###,$uuid,g" -e "s,###DISK###,$disk1,g" -e "s,</devices>,<serial type='pty'><target port='0'/></serial></devices>,g" "$template_xml" > "$test_xml" testme "0" "serial (pty)" "-r -u $valid_uuid" "$test_xml" -- 1.8.1.4 |
|
From: Eric B. <eb...@re...> - 2013-05-24 22:38:38
|
On 05/02/2013 11:26 AM, Seiji Aguchi wrote: The subject line is a bit long for a patch. 'git shortlog -30' will give you an idea of typical subject lines; I'm modifying this to: xml: introduce startupPolicy for chardev device > [Problem] > Currently, guest OS's messages can be logged to a local disk of host OS > by creating chadevs with options below. > -chardev file,id=charserial0,path=<log file's path> -device isa-serial,chardev=chardevserial0,id=serial0 This long line is okay (quoting a command line)... > > When a hardware failure happens in the disk, qemu-kvm can't create the chardevs. > In this case, guest OS doesn't boot up. > > Actually, there are users who don't desire that guest OS goes down due to a hardware failure > of a log disk only.Therefore, qemu should offer some way to boot guest OS up even if the log ...but here, wrapping below 80 characters is desirable. Also, space after '.' ending a sentence, in English. So I rewrapped it. > disk is broken. > > [Solution] > This patch supports startupPolicy for chardev. > > The starupPolicy is introduced just in case where chardev is "file" s/case/cases/ > because this patch aims for making guest OS boot up when a hardware failure happens. > > In other cases ,pty, dev, pipe and unix, it is not introduced > because they don't access to hardware. > > The policy works as follows. > - If the value is "optional", guestOS boots up by dropping the chardev. > - If other values are specified, guestOS fails to boot up. (the default) > > Description about original startupPolicy attribute: > http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=e5a84d74a2789a917bf394f15de9989ec48fded0 > > Signed-off-by: Seiji Aguchi <sei...@hd...> > --- > docs/formatdomain.html.in | 9 ++++++++- > docs/schemas/domaincommon.rng | 3 +++ Good - adding documentation alongside the feature. > src/conf/domain_conf.c | 8 ++++++++ > src/conf/domain_conf.h | 1 + > src/qemu/qemu_process.c | 25 ++++++++++++++++++++++++- It's a bit nicer to split the XML addition and a driver's implementation of the XML (makes backporting easier if we later implement in a different driver, and want to backport the second implementation without the first); but this patch is small enough that I'm not too worried about it. > tests/virt-aa-helper-test | 3 +++ This added one test, but not quite enough. I generally also expect a .xml and .args that shows the new XML in use, and that there is no corresponding change to a generated qemu command line (as the automatic dropping is handled solely by libvirt reworking the XML). I can probably do that on your behalf. > 6 files changed, 47 insertions(+), 2 deletions(-) > > diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in > index f325c3c..1e1bf27 100644 > --- a/docs/formatdomain.html.in > +++ b/docs/formatdomain.html.in > @@ -4044,13 +4044,20 @@ qemu-kvm -net nic,model=? /dev/null > <p> > A file is opened and all data sent to the character > device is written to the file. > + It is possible to define policy whether guestOS boots up s/guestOS/the guest/ > + if the file is not accessible. This is done by a startupPolicy > + attribute: > + <ul> Fails 'make check' with: Validating formatdomain.html formatdomain.html.tmp:4516: element p: validity error : Element ul is not declared in p list of possible children </p> ^ Solved by closing the <p> before starting the <ul>, or ditching <ul> altogether. > + <li>If the vaule is "optional", guestOS boots up by dropping the file.</li> s/vaule/value/; s/guestOS/guest/ > + <li>If other values are specified, guestOS fails to boot up. (the default)</li> > + </ul> Needs a "since 1.0.6" notation. Rather than leaving it as 'optional' and a nebulous 'anything else', I'd rather we match existing practice, and specifically call out the same states as elsewhere ('optional'|'requisite'|'mandatory'); especially since your RNG change reused that same enum. > +++ b/src/qemu/qemu_process.c > @@ -2442,7 +2442,30 @@ qemuProcessPrepareChardevDevice(virDomainDefPtr def ATTRIBUTE_UNUSED, > virReportSystemError(errno, > _("Unable to pre-create chardev file '%s'"), > dev->source.data.file.path); > - return -1; > + if (dev->source.data.file.startupPolicy != > + VIR_DOMAIN_STARTUP_POLICY_OPTIONAL) { > + return -1; > + } This issues an error even when policy said we want a replacement and thus succeed; we only want to issue an error when we can't succeed, or else clear the error when we try the fallback. > + VIR_FREE(dev->source.data.file.path); > + /* > + * Change a destination to /dev/null to boot guest OS up > + * even if a log disk is broken. Not the typical comment style, and should come before we release the old name. > + */ > + VIR_WARN("Switch the destination to /dev/null"); > + dev->source.data.file.path = strdup("/dev/null"); > + > + if (!(dev->source.data.file.path)) { > + virReportOOMError(); > + return -1; > + } VIR_STRDUP has gone in since you posted, simplifying this. > + > + if ((fd = open(dev->source.data.file.path, > + O_CREAT | O_APPEND, S_IRUSR|S_IWUSR)) < 0) { O_CREAT and O_APPEND are pointless now that we know we are opening /dev/null. POSIX requires that you use at least one of O_RDONLY, O_WRONLY, or O_RDWR (yes, I know you were copying existing bad code, but no need to repeat that mistake). Just because O_RDONLY is 0 on Linux doesn't make it correct to omit an access mode in your code. For that matter, we KNOW that /dev/null already exists (POSIX says it does, and any system where it doesn't has lots more problems on its hand), so we can exit early as soon as we've malloc'd the replacement file name. > + virReportSystemError(errno, > + _("Unable to pre-create chardev file '%s'"), Since you've already changed the file name, this error message is not as useful as if it were the user's original file name; but changing that is a bit harder. Here's what I've changed so far; I won't push until I've also added a testcase that proves xml2xml round-trip works on the new attribute (but it's late for me on a holiday weekend, so that might not be until Tuesday). Since I've reviewed the patch prior to code freeze for 1.0.6, I think it will be okay pushing next week even if I don't finish the testsuite addition before DV does the freeze. Thanks again for a first time contribution, and apologies for the delays in getting it reviewed (I think others will agree with me that we've had a LOT of list traffic lately). Interdiff to your original mail (I'll send the complete patch, as-is, as a reply). diff --git i/docs/formatdomain.html.in w/docs/formatdomain.html.in index 92fc496..299fa49 100644 --- i/docs/formatdomain.html.in +++ w/docs/formatdomain.html.in @@ -4212,14 +4212,20 @@ qemu-kvm -net nic,model=? /dev/null <p> A file is opened and all data sent to the character device is written to the file. - It is possible to define policy whether guestOS boots up - if the file is not accessible. This is done by a startupPolicy - attribute: - <ul> - <li>If the vaule is "optional", guestOS boots up by dropping the file.</li> - <li>If other values are specified, guestOS fails to boot up. (the default)</li> - </ul> + <span class="since">Since 1.0.6</span>, it is possible to define + policy on what happens if the file is not accessible when + booting or migrating. This is done by + a <code>startupPolicy</code> attribute: </p> + <ul> + <li>If the value is "mandatory" (the default), the guest fails + to boot or migrate if the file is not found.</li> + <li>If the value is "optional", a missing file is at boot or + migration is substituted with /dev/null, so the guest still sees + the device but the host no longer tracks guest data on the device.</li> + <li>If the value is "requisite", the file is required for + booting, but optional on migration.</li> + </ul> <pre> ... diff --git i/src/qemu/qemu_process.c w/src/qemu/qemu_process.c index 1bc3305..154445d 100644 --- i/src/qemu/qemu_process.c +++ w/src/qemu/qemu_process.c @@ -2439,38 +2439,24 @@ qemuProcessPrepareChardevDevice(virDomainDefPtr def ATTRIBUTE_UNUSED, return 0; if ((fd = open(dev->source.data.file.path, - O_CREAT | O_APPEND, S_IRUSR|S_IWUSR)) < 0) { - virReportSystemError(errno, - _("Unable to pre-create chardev file '%s'"), - dev->source.data.file.path); + O_CREAT | O_APPEND | O_RDONLY, S_IRUSR|S_IWUSR)) < 0) { if (dev->source.data.file.startupPolicy != VIR_DOMAIN_STARTUP_POLICY_OPTIONAL) { - return -1; - } - VIR_FREE(dev->source.data.file.path); - /* - * Change a destination to /dev/null to boot guest OS up - * even if a log disk is broken. - */ - VIR_WARN("Switch the destination to /dev/null"); - dev->source.data.file.path = strdup("/dev/null"); - - if (!(dev->source.data.file.path)) { - virReportOOMError(); - return -1; - } - - if ((fd = open(dev->source.data.file.path, - O_CREAT | O_APPEND, S_IRUSR|S_IWUSR)) < 0) { virReportSystemError(errno, _("Unable to pre-create chardev file '%s'"), dev->source.data.file.path); return -1; } + /* Change destination to /dev/null to work around missing file. */ + VIR_WARN("chardev file '%s' not found, switching to /dev/null", + dev->source.data.file.path); + VIR_FREE(dev->source.data.file.path); + if (VIR_STRDUP(dev->source.data.file.path, "/dev/null") < 0) + return -1; + } else { + VIR_FORCE_CLOSE(fd); } - VIR_FORCE_CLOSE(fd); - return 0; } -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org |
|
From: Seiji A. <sei...@hd...> - 2013-05-24 13:29:56
|
> This as a separate patch actually makes things more confusing to review.
> It should be merged into the previous patch. If you want to break up the
> changes, I would first add the entering_irq(), and exiting_irq() as
> patch 1, and then do the rest of the changes in patch 2.
Thanks. I will update this patchset as above.
Seiji
> -----Original Message-----
> From: Steven Rostedt [mailto:ro...@go...]
> Sent: Thursday, May 23, 2013 3:55 PM
> To: Seiji Aguchi
> Cc: lin...@vg...; x8...@ke...; hp...@zy...; Thomas Gleixner (tg...@li...); 'mi...@el...'
> (mi...@el...); Borislav Petkov (bp...@al...); lin...@vg...; Luck, Tony (ton...@in...); dle-
> de...@li...; Tomoki Sekiyama
> Subject: Re: [PATCH v12 3/3] trace,x86: code-sharing between non-trace and trace irq handlers
>
> On Fri, 2013-04-05 at 19:21 +0000, Seiji Aguchi wrote:
> > [Issue]
> >
> > Currently, irq vector handlers for tracing are just
> > copied non-trace handlers by simply inserting tracepoints.
> >
> > It is difficult to manage the codes.
> >
> > [Solution]
>
>
> This as a separate patch actually makes things more confusing to review.
> It should be merged into the previous patch. If you want to break up the
> changes, I would first add the entering_irq(), and exiting_irq() as
> patch 1, and then do the rest of the changes in patch 2.
>
> -- Steve
>
> > This patch shares common codes between non-trace and trace handlers
> > as follows to make them manageable and readable.
> >
> > Non-trace irq handler:
> > smp_irq_handler()
> > {
> > entering_irq(); /* pre-processing of this handler */
> > __smp_irq_handler(); /*
> > * common logic between non-trace and trace handlers
> > * in a vector.
> > */
> > exiting_irq(); /* post-processing of this handler */
> >
> > }
> >
> > Trace irq_handler:
> > smp_trace_irq_handler()
> > {
> > entering_irq(); /* pre-processing of this handler */
> > trace_irq_entry(); /* tracepoint for irq entry */
> > __smp_irq_handler(); /*
> > * common logic between non-trace and trace handlers
> > * in a vector.
> > */
> > trace_irq_exit(); /* tracepoint for irq exit */
> > exiting_irq(); /* post-processing of this handler */
> >
> > }
> >
> > If tracepoints can place outside entering_irq()/exiting_irq() as follows, it looks \
> > cleaner.
> >
> > smp_trace_irq_handler()
> > {
> > trace_irq_entry();
> > smp_irq_handler();
> > trace_irq_exit();
> > }
> >
> > But it doesn't work.
> > The problem is with irq_enter/exit() being called. They must be called before \
> > trace_irq_enter/exit(), because of the rcu_irq_enter() must be called before any \
> > tracepoints are used, as tracepoints use rcu to synchronize.
> >
> > As a possible alternative, we may be able to call irq_enter() first as follows if \
> > irq_enter() can nest.
> >
> > smp_trace_irq_hander()
> > {
> > irq_entry();
> > trace_irq_entry();
> > smp_irq_handler();
> > trace_irq_exit();
> > irq_exit();
> > }
> >
> > But it doesn't work, either.
> > If irq_enter() is nested, it may have a time penalty because it has to check if it \
> > was already called or not. The time penalty is not desired in performance sensitive \
> > paths even if it is tiny.
> >
> > Signed-off-by: Seiji Aguchi <sei...@hd...>
>
|
|
From: Seiji A. <sei...@hd...> - 2013-05-24 13:29:40
|
> > @@ -1156,8 +1156,14 @@ void debug_stack_reset(void)
Thank you for reviewing.
> > {
> > if (WARN_ON(!this_cpu_read(debug_stack_use_ctr)))
> > return;
> > - if (this_cpu_dec_return(debug_stack_use_ctr) == 0)
> > - load_idt((const struct desc_ptr *)&idt_descr);
> > + if (this_cpu_dec_return(debug_stack_use_ctr) == 0) {
> > +#ifdef CONFIG_TRACING
> > + if (this_cpu_read(trace_idt_in_use))
> > + load_idt((const struct desc_ptr *)&trace_idt_descr);
> > + else
> > +#endif
> > + load_idt((const struct desc_ptr *)&idt_descr);
> > + }
>
> I have a patch that makes this more generic. I'm wondering if that would
> be better than having everything so coupled.
>
Do you mean it should be discussed in another thread as you posted "[PATCH] x86: Have debug/nmi restore the IDT it replaced"?
Anyway, I will update my patch in accordance with your comments.
Seiji
> -----Original Message-----
> From: Steven Rostedt [mailto:ro...@go...]
> Sent: Thursday, May 23, 2013 3:52 PM
> To: Seiji Aguchi
> Cc: lin...@vg...; x8...@ke...; hp...@zy...; Thomas Gleixner (tg...@li...); 'mi...@el...'
> (mi...@el...); Borislav Petkov (bp...@al...); lin...@vg...; Luck, Tony (ton...@in...); dle-
> de...@li...; Tomoki Sekiyama
> Subject: Re: [PATCH v12 2/3] trace,x86: add x86 irq vector tracepoints
>
> On Fri, 2013-04-05 at 19:20 +0000, Seiji Aguchi wrote:
> > Signed-off-by: Seiji Aguchi <sei...@hd...>
> > ---
> > arch/x86/include/asm/desc.h | 36 +++++++-
> > arch/x86/include/asm/entry_arch.h | 5 +-
> > arch/x86/include/asm/hw_irq.h | 16 +++
> > arch/x86/include/asm/mshyperv.h | 1 +
> > arch/x86/include/asm/trace/irq_vectors.h | 159 ++++++++++++++++++++++++++++++
> > arch/x86/kernel/Makefile | 1 +
> > arch/x86/kernel/apic/apic.c | 94 ++++++++++++++++++
> > arch/x86/kernel/cpu/common.c | 12 ++-
> > arch/x86/kernel/cpu/mcheck/therm_throt.c | 14 +++
> > arch/x86/kernel/cpu/mcheck/threshold.c | 14 +++
> > arch/x86/kernel/entry_32.S | 12 ++-
> > arch/x86/kernel/entry_64.S | 29 +++++-
> > arch/x86/kernel/head_64.S | 6 +
> > arch/x86/kernel/irq.c | 23 +++++
> > arch/x86/kernel/irq_work.c | 12 +++
> > arch/x86/kernel/smp.c | 35 +++++++
> > arch/x86/kernel/tracepoint.c | 71 +++++++++++++
> > include/xen/events.h | 3 +
> > 18 files changed, 529 insertions(+), 14 deletions(-)
> > create mode 100644 arch/x86/include/asm/trace/irq_vectors.h
> > create mode 100644 arch/x86/kernel/tracepoint.c
> >
> > diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
> > index 8bf1c06..e718c72 100644
> > --- a/arch/x86/include/asm/desc.h
> > +++ b/arch/x86/include/asm/desc.h
> > @@ -39,6 +39,9 @@ extern gate_desc idt_table[];
> > extern struct desc_ptr nmi_idt_descr;
> > extern gate_desc nmi_idt_table[];
> >
> > +extern struct desc_ptr trace_idt_descr;
> > +extern DEFINE_PER_CPU(u32, trace_idt_in_use);
> > +extern DEFINE_PER_CPU(u32, debug_stack_use_ctr);
>
> Header files should use DECLARE_PER_CPU() not DEFINE_PER_CPU. And drop
> the "extern" when doing so, as it's implicit in the macro.
>
>
> > struct gdt_page {
> > struct desc_struct gdt[GDT_ENTRIES];
> > } __attribute__((aligned(PAGE_SIZE)));
> > @@ -320,6 +323,17 @@ static inline void set_nmi_gate(int gate, void *addr)
> > }
> > #endif
> >
> > +#ifdef CONFIG_TRACING
> > +extern gate_desc trace_idt_table[];
> > +static inline void trace_set_intr_gate(unsigned int gate, void *addr)
> > +{
> > + gate_desc s;
> > +
> > + pack_gate(&s, GATE_INTERRUPT, (unsigned long)addr, 0, 0, __KERNEL_CS);
> > + write_idt_entry(trace_idt_table, gate, &s);
> > +}
>
> Perhaps add a:
>
> static inline void write_trace_idt_entry(gate_desc *idt, int entry,
> const gate_desc *gate)
> {
> write_idt_entry(idt, entry, gate);
> }
> #else
> static inline void write_trace_idt_entry(gate_desc *idt, int entry,
> const gate_desc *gate)
> {
> }
> > +#endif
>
> Then you can below just do:
>
>
> > +
> > static inline void _set_gate(int gate, unsigned type, void *addr,
> > unsigned dpl, unsigned ist, unsigned seg)
> > {
> > @@ -331,6 +345,9 @@ static inline void _set_gate(int gate, unsigned type, void *addr,
> > * setup time
> > */
> > write_idt_entry(idt_table, gate, &s);
>
> write_trace_idt_entry(idt_table, gate, &s);
>
> and remove the #ifdef. Makes it nice to read.
>
> > +#ifdef CONFIG_TRACING
> > + write_idt_entry(trace_idt_table, gate, &s);
> > +#endif
> > }
> >
> > /*
> > @@ -360,12 +377,27 @@ static inline void alloc_system_vector(int vector)
> > }
> > }
> >
> > -static inline void alloc_intr_gate(unsigned int n, void *addr)
> > +#ifdef CONFIG_TRACING
> > +static inline void __trace_alloc_intr_gate(unsigned int n, void *addr)
> > +{
> > + trace_set_intr_gate(n, addr);
> > +}
> > +#else
> > +#define __trace_alloc_intr_gate(n, addr)
> > +#endif
> > +
> > +static inline void __alloc_intr_gate(unsigned int n, void *addr)
> > {
> > - alloc_system_vector(n);
> > set_intr_gate(n, addr);
> > }
> >
> > +#define alloc_intr_gate(n, addr) \
> > + do { \
> > + alloc_system_vector(n); \
> > + __alloc_intr_gate(n, addr); \
> > + __trace_alloc_intr_gate(n, trace_##addr); \
> > + } while (0)
> > +
> > /*
> > * This routine sets up an interrupt gate at directory privilege level 3.
> > */
> > diff --git a/arch/x86/include/asm/entry_arch.h b/arch/x86/include/asm/entry_arch.h
> > index 40afa00..0bb99d8 100644
> > --- a/arch/x86/include/asm/entry_arch.h
> > +++ b/arch/x86/include/asm/entry_arch.h
> > @@ -13,8 +13,9 @@
> > BUILD_INTERRUPT(reschedule_interrupt,RESCHEDULE_VECTOR)
> > BUILD_INTERRUPT(call_function_interrupt,CALL_FUNCTION_VECTOR)
> > BUILD_INTERRUPT(call_function_single_interrupt,CALL_FUNCTION_SINGLE_VECTOR)
> > -BUILD_INTERRUPT(irq_move_cleanup_interrupt,IRQ_MOVE_CLEANUP_VECTOR)
> > -BUILD_INTERRUPT(reboot_interrupt,REBOOT_VECTOR)
> > +BUILD_INTERRUPT3(irq_move_cleanup_interrupt, IRQ_MOVE_CLEANUP_VECTOR,
> > + smp_irq_move_cleanup_interrupt)
> > +BUILD_INTERRUPT3(reboot_interrupt, REBOOT_VECTOR, smp_reboot_interrupt)
> > #endif
> >
> > BUILD_INTERRUPT(x86_platform_ipi, X86_PLATFORM_IPI_VECTOR)
> > diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
> > index 10a78c3..e84e91a 100644
> > --- a/arch/x86/include/asm/hw_irq.h
> > +++ b/arch/x86/include/asm/hw_irq.h
> > @@ -76,6 +76,22 @@ extern void threshold_interrupt(void);
> > extern void call_function_interrupt(void);
> > extern void call_function_single_interrupt(void);
> >
> > +#ifdef CONFIG_TRACING
> > +/* Interrupt handlers registered during init_IRQ */
> > +extern void trace_apic_timer_interrupt(void);
> > +extern void trace_x86_platform_ipi(void);
> > +extern void trace_error_interrupt(void);
> > +extern void trace_irq_work_interrupt(void);
> > +extern void trace_spurious_interrupt(void);
> > +extern void trace_thermal_interrupt(void);
> > +extern void trace_reschedule_interrupt(void);
> > +extern void trace_threshold_interrupt(void);
> > +extern void trace_call_function_interrupt(void);
> > +extern void trace_call_function_single_interrupt(void);
> > +#define trace_irq_move_cleanup_interrupt irq_move_cleanup_interrupt
> > +#define trace_reboot_interrupt reboot_interrupt
> > +#endif /* CONFIG_TRACING */
> > +
> > /* IOAPIC */
> > #define IO_APIC_IRQ(x) (((x) >= NR_IRQS_LEGACY) || ((1<<(x)) & io_apic_irqs))
> > extern unsigned long io_apic_irqs;
> > diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> > index c2934be..cc40e03 100644
> > --- a/arch/x86/include/asm/mshyperv.h
> > +++ b/arch/x86/include/asm/mshyperv.h
> > @@ -12,6 +12,7 @@ struct ms_hyperv_info {
> > extern struct ms_hyperv_info ms_hyperv;
> >
> > void hyperv_callback_vector(void);
> > +#define trace_hyperv_callback_vector hyperv_callback_vector
> > void hyperv_vector_handler(struct pt_regs *regs);
> > void hv_register_vmbus_handler(int irq, irq_handler_t handler);
> >
> > diff --git a/arch/x86/include/asm/trace/irq_vectors.h b/arch/x86/include/asm/trace/irq_vectors.h
> > new file mode 100644
> > index 0000000..b4f1c53
> > --- /dev/null
> > +++ b/arch/x86/include/asm/trace/irq_vectors.h
> > @@ -0,0 +1,159 @@
> > +#undef TRACE_SYSTEM
> > +#define TRACE_SYSTEM irq_vectors
> > +
> > +#if !defined(_TRACE_IRQ_VECTORS_H) || defined(TRACE_HEADER_MULTI_READ)
> > +#define _TRACE_IRQ_VECTORS_H
> > +
> > +#include <linux/tracepoint.h>
> > +
> > +extern void trace_irq_vector_regfunc(void);
> > +extern void trace_irq_vector_unregfunc(void);
> > +
> > +DECLARE_EVENT_CLASS(x86_irq_vector,
> > +
> > + TP_PROTO(int vector),
> > +
> > + TP_ARGS(vector),
> > +
> > + TP_STRUCT__entry(
> > + __field( int, vector )
> > + ),
> > +
> > + TP_fast_assign(
> > + __entry->vector = vector;
> > + ),
> > +
> > + TP_printk("vector=%d", __entry->vector) );
> > +
> > +#define DEFINE_IRQ_VECTOR_EVENT(name) \
> > +DEFINE_EVENT_FN(x86_irq_vector, name, \
> > + TP_PROTO(int vector), \
> > + TP_ARGS(vector), \
> > + trace_irq_vector_regfunc, \
> > + trace_irq_vector_unregfunc);
> > +
> > +/*
> > + * local_timer_entry - called before entering a local timer interrupt
> > + * vector handler
> > + */
> > +DEFINE_IRQ_VECTOR_EVENT(local_timer_entry);
> > +
> > +/*
> > + * local_timer_exit - called immediately after the interrupt vector
> > + * handler returns
> > + */
> > +DEFINE_IRQ_VECTOR_EVENT(local_timer_exit);
>
> Looks like you can save on repeat code by doing the following:
>
> #define DEFINE_IRQ_VECTOR_EVENT(name) \
> DEFINE_EVENT_FN(x86_irq_vector, name##_entry, \
> TP_PROTO(int vector), \
> TP_ARGS(vector), \
> trace_irq_vector_regfunc, \
> trace_irq_vector_unregfunc); \
> DEFINE_EVENT_FN(x86_irq_vector, name##_exit, \
> TP_PROTO(int vector), \
> TP_ARGS(vector), \
> trace_irq_vector_regfunc, \
> trace_irq_vector_unregfunc); \
>
> Then just do for each:
>
> DEFINE_IRQ_VECTOR_EVENT(local_timer);
>
> [...]
>
> > +
> > +/*
> > + * reschedule_entry - called before entering a reschedule vector handler
> > + */
> > +DEFINE_IRQ_VECTOR_EVENT(reschedule_entry);
> > +
> > +/*
> > + * reschedule_exit - called immediately after the interrupt vector
> > + * handler returns
> > + */
> > +DEFINE_IRQ_VECTOR_EVENT(reschedule_exit);
> > +
> > +/*
> > + * spurious_apic_entry - called before entering a spurious apic vector handler
> > + */
> > +DEFINE_IRQ_VECTOR_EVENT(spurious_apic_entry);
> > +
> > +/*
> > + * spurious_apic_exit - called immediately after the interrupt vector
> > + * handler returns
> > + */
> > +DEFINE_IRQ_VECTOR_EVENT(spurious_apic_exit);
> > +
> > +/*
> > + * error_apic_entry - called before entering an error apic vector handler
> > + */
> > +DEFINE_IRQ_VECTOR_EVENT(error_apic_entry);
> > +
> > +/*
> > + * error_apic_exit - called immediately after the interrupt vector
> > + * handler returns
> > + */
> > +DEFINE_IRQ_VECTOR_EVENT(error_apic_exit);
> > +
> > +/*
> > + * x86_platform_ipi_entry - called before entering a x86 platform ipi interrupt
> > + * vector handler
> > + */
> > +DEFINE_IRQ_VECTOR_EVENT(x86_platform_ipi_entry);
> > +
> > +/*
> > + * x86_platform_ipi_exit - called immediately after the interrupt vector
> > + * handler returns
> > + */
> > +DEFINE_IRQ_VECTOR_EVENT(x86_platform_ipi_exit);
> > +
> > +/*
> > + * irq_work_entry - called before entering a irq work interrupt
> > + * vector handler
> > + */
> > +DEFINE_IRQ_VECTOR_EVENT(irq_work_entry);
> > +
> > +/*
> > + * irq_work_exit - called immediately after the interrupt vector
> > + * handler returns
> > + */
> > +DEFINE_IRQ_VECTOR_EVENT(irq_work_exit);
> > +
> > +/*
> > + * call_function_entry - called before entering a call function interrupt
> > + * vector handler
> > + */
> > +DEFINE_IRQ_VECTOR_EVENT(call_function_entry);
> > +
> > +/*
> > + * call_function_exit - called immediately after the interrupt vector
> > + * handler returns
> > + */
> > +DEFINE_IRQ_VECTOR_EVENT(call_function_exit);
> > +
> > +/*
> > + * call_function_single_entry - called before entering a call function
> > + * single interrupt vector handler
> > + */
> > +DEFINE_IRQ_VECTOR_EVENT(call_function_single_entry);
> > +
> > +/*
> > + * call_function_single_exit - called immediately after the interrupt vector
> > + * handler returns
> > + */
> > +DEFINE_IRQ_VECTOR_EVENT(call_function_single_exit);
> > +
> > +/*
> > + * threshold_apic_entry - called before entering a threshold apic interrupt
> > + * vector handler
> > + */
> > +DEFINE_IRQ_VECTOR_EVENT(threshold_apic_entry);
> > +
> > +/*
> > + * threshold_apic_exit - called immediately after the interrupt vector
> > + * handler returns
> > + */
> > +DEFINE_IRQ_VECTOR_EVENT(threshold_apic_exit);
> > +
> > +/*
> > + * thermal_apic_entry - called before entering a thermal apic interrupt
> > + * vector handler
> > + */
> > +DEFINE_IRQ_VECTOR_EVENT(thermal_apic_entry);
> > +
> > +/*
> > + * thrmal_apic_exit - called immediately after the interrupt vector
> > + * handler returns
> > + */
> > +DEFINE_IRQ_VECTOR_EVENT(thermal_apic_exit);
> > +
> > +#undef TRACE_INCLUDE_PATH
> > +#define TRACE_INCLUDE_PATH ../../arch/x86/include/asm/trace
>
> Hmm, this is dangerous. It's better to use the Makefile.
>
> here do:
>
> #define TRACE_INCLUDE_PATH .
>
> > +#define TRACE_INCLUDE_FILE irq_vectors
> > +#endif /* _TRACE_IRQ_VECTORS_H */
>
> Then in arch/x86/kernel/apic/Makefile:
>
> CFLAGS_apic.o := -I$(src)
>
> > +
> > +/* This part must be outside protection */
> > +#include <trace/define_trace.h>
> > +
> > diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> > index 7bd3bd3..74b3891 100644
> > --- a/arch/x86/kernel/Makefile
> > +++ b/arch/x86/kernel/Makefile
> > @@ -102,6 +102,7 @@ obj-$(CONFIG_OF) += devicetree.o
> > obj-$(CONFIG_UPROBES) += uprobes.o
> >
> > obj-$(CONFIG_PERF_EVENTS) += perf_regs.o
> > +obj-$(CONFIG_TRACING) += tracepoint.o
> >
> > ###
> > # 64 bit specific files
> > diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> > index 904611b..7d39c68 100644
> > --- a/arch/x86/kernel/apic/apic.c
> > +++ b/arch/x86/kernel/apic/apic.c
> > @@ -55,6 +55,9 @@
> > #include <asm/tsc.h>
> > #include <asm/hypervisor.h>
> >
> > +#define CREATE_TRACE_POINTS
> > +#include <asm/trace/irq_vectors.h>
> > +
> > unsigned int num_processors;
> >
> > unsigned disabled_cpus __cpuinitdata;
> > @@ -934,6 +937,30 @@ void __irq_entry smp_apic_timer_interrupt(struct pt_regs *regs)
> > set_irq_regs(old_regs);
> > }
> >
> > +void __irq_entry smp_trace_apic_timer_interrupt(struct pt_regs *regs)
> > +{
> > + struct pt_regs *old_regs = set_irq_regs(regs);
> > +
> > + /*
> > + * NOTE! We'd better ACK the irq immediately,
> > + * because timer handling can be slow.
> > + */
> > + ack_APIC_irq();
> > + /*
> > + * update_process_times() expects us to have done irq_enter().
> > + * Besides, if we don't timer interrupts ignore the global
> > + * interrupt lock, which is the WrongThing (tm) to do.
> > + */
> > + irq_enter();
> > + exit_idle();
> > + trace_local_timer_entry(LOCAL_TIMER_VECTOR);
> > + local_apic_timer_interrupt();
> > + trace_local_timer_exit(LOCAL_TIMER_VECTOR);
> > + irq_exit();
> > +
> > + set_irq_regs(old_regs);
> > +}
> > +
> > int setup_profiling_timer(unsigned int multiplier)
> > {
> > return -EINVAL;
> > @@ -1930,6 +1957,31 @@ void smp_spurious_interrupt(struct pt_regs *regs)
> > irq_exit();
> > }
> >
> > +void smp_trace_spurious_interrupt(struct pt_regs *regs)
> > +{
> > + u32 v;
> > +
> > + irq_enter();
> > + exit_idle();
> > + trace_spurious_apic_entry(SPURIOUS_APIC_VECTOR);
> > + /*
> > + * Check if this really is a spurious interrupt and ACK it
> > + * if it is a vectored one. Just in case...
> > + * Spurious interrupts should not be ACKed.
> > + */
> > + v = apic_read(APIC_ISR + ((SPURIOUS_APIC_VECTOR & ~0x1f) >> 1));
> > + if (v & (1 << (SPURIOUS_APIC_VECTOR & 0x1f)))
> > + ack_APIC_irq();
> > +
> > + inc_irq_stat(irq_spurious_count);
> > +
> > + /* see sw-dev-man vol 3, chapter 7.4.13.5 */
> > + pr_info("spurious APIC interrupt on CPU#%d, "
> > + "should never happen.\n", smp_processor_id());
> > + trace_spurious_apic_exit(SPURIOUS_APIC_VECTOR);
> > + irq_exit();
> > +}
> > +
> > /*
> > * This interrupt should never happen with our APIC/SMP architecture
> > */
> > @@ -1973,6 +2025,48 @@ void smp_error_interrupt(struct pt_regs *regs)
> > irq_exit();
> > }
> >
> > +void smp_trace_error_interrupt(struct pt_regs *regs)
> > +{
> > + u32 v0, v1;
> > + u32 i = 0;
> > + static const char * const error_interrupt_reason[] = {
> > + "Send CS error", /* APIC Error Bit 0 */
> > + "Receive CS error", /* APIC Error Bit 1 */
> > + "Send accept error", /* APIC Error Bit 2 */
> > + "Receive accept error", /* APIC Error Bit 3 */
> > + "Redirectable IPI", /* APIC Error Bit 4 */
> > + "Send illegal vector", /* APIC Error Bit 5 */
> > + "Received illegal vector", /* APIC Error Bit 6 */
> > + "Illegal register address", /* APIC Error Bit 7 */
> > + };
> > +
> > + irq_enter();
> > + exit_idle();
> > + trace_error_apic_entry(ERROR_APIC_VECTOR);
> > + /* First tickle the hardware, only then report what went on. -- REW */
> > + v0 = apic_read(APIC_ESR);
> > + apic_write(APIC_ESR, 0);
> > + v1 = apic_read(APIC_ESR);
> > + ack_APIC_irq();
> > + atomic_inc(&irq_err_count);
> > +
> > + apic_printk(APIC_DEBUG, KERN_DEBUG "APIC error on CPU%d: %02x(%02x)",
> > + smp_processor_id(), v0 , v1);
> > +
> > + v1 = v1 & 0xff;
> > + while (v1) {
> > + if (v1 & 0x1)
> > + apic_printk(APIC_DEBUG, KERN_CONT " : %s", error_interrupt_reason[i]);
> > + i++;
> > + v1 >>= 1;
> > + }
> > +
> > + apic_printk(APIC_DEBUG, KERN_CONT "\n");
> > +
> > + trace_error_apic_exit(ERROR_APIC_VECTOR);
> > + irq_exit();
> > +}
> > +
> > /**
> > * connect_bsp_APIC - attach the APIC to the interrupt system
> > */
> > diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> > index d814772..23045cd 100644
> > --- a/arch/x86/kernel/cpu/common.c
> > +++ b/arch/x86/kernel/cpu/common.c
> > @@ -1144,7 +1144,7 @@ int is_debug_stack(unsigned long addr)
> > addr > (__get_cpu_var(debug_stack_addr) - DEBUG_STKSZ));
> > }
> >
> > -static DEFINE_PER_CPU(u32, debug_stack_use_ctr);
> > +DEFINE_PER_CPU(u32, debug_stack_use_ctr);
> >
> > void debug_stack_set_zero(void)
> > {
> > @@ -1156,8 +1156,14 @@ void debug_stack_reset(void)
> > {
> > if (WARN_ON(!this_cpu_read(debug_stack_use_ctr)))
> > return;
> > - if (this_cpu_dec_return(debug_stack_use_ctr) == 0)
> > - load_idt((const struct desc_ptr *)&idt_descr);
> > + if (this_cpu_dec_return(debug_stack_use_ctr) == 0) {
> > +#ifdef CONFIG_TRACING
> > + if (this_cpu_read(trace_idt_in_use))
> > + load_idt((const struct desc_ptr *)&trace_idt_descr);
> > + else
> > +#endif
> > + load_idt((const struct desc_ptr *)&idt_descr);
> > + }
>
> I have a patch that makes this more generic. I'm wondering if that would
> be better than having everything so coupled.
>
> > }
> >
> > #else /* CONFIG_X86_64 */
> > diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
> > index 47a1870..e7aa7fc 100644
> > --- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
> > +++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
> > @@ -29,6 +29,7 @@
> > #include <asm/idle.h>
> > #include <asm/mce.h>
> > #include <asm/msr.h>
> > +#include <asm/trace/irq_vectors.h>
> >
> > /* How long to wait between reporting thermal events */
> > #define CHECK_INTERVAL (300 * HZ)
> > @@ -389,6 +390,19 @@ asmlinkage void smp_thermal_interrupt(struct pt_regs *regs)
> > ack_APIC_irq();
> > }
> >
> > +asmlinkage void smp_trace_thermal_interrupt(struct pt_regs *regs)
> > +{
> > + irq_enter();
> > + exit_idle();
> > + trace_thermal_apic_entry(THERMAL_APIC_VECTOR);
> > + inc_irq_stat(irq_thermal_count);
> > + smp_thermal_vector();
> > + trace_thermal_apic_exit(THERMAL_APIC_VECTOR);
> > + irq_exit();
> > + /* Ack only at the end to avoid potential reentry */
> > + ack_APIC_irq();
> > +}
> > +
> > /* Thermal monitoring depends on APIC, ACPI and clock modulation */
> > static int intel_thermal_supported(struct cpuinfo_x86 *c)
> > {
> > diff --git a/arch/x86/kernel/cpu/mcheck/threshold.c b/arch/x86/kernel/cpu/mcheck/threshold.c
> > index aa578ca..0cbef99 100644
> > --- a/arch/x86/kernel/cpu/mcheck/threshold.c
> > +++ b/arch/x86/kernel/cpu/mcheck/threshold.c
> > @@ -8,6 +8,7 @@
> > #include <asm/apic.h>
> > #include <asm/idle.h>
> > #include <asm/mce.h>
> > +#include <asm/trace/irq_vectors.h>
> >
> > static void default_threshold_interrupt(void)
> > {
> > @@ -27,3 +28,16 @@ asmlinkage void smp_threshold_interrupt(void)
> > /* Ack only at the end to avoid potential reentry */
> > ack_APIC_irq();
> > }
> > +
> > +asmlinkage void smp_trace_threshold_interrupt(void)
> > +{
> > + irq_enter();
> > + exit_idle();
> > + trace_threshold_apic_entry(THRESHOLD_APIC_VECTOR);
> > + inc_irq_stat(irq_threshold_count);
> > + mce_threshold_vector();
> > + trace_threshold_apic_exit(THRESHOLD_APIC_VECTOR);
> > + irq_exit();
> > + /* Ack only at the end to avoid potential reentry */
> > + ack_APIC_irq();
> > +}
> > diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
> > index 8f3e2de..2cfbc3a 100644
> > --- a/arch/x86/kernel/entry_32.S
> > +++ b/arch/x86/kernel/entry_32.S
> > @@ -801,7 +801,17 @@ ENTRY(name) \
> > CFI_ENDPROC; \
> > ENDPROC(name)
> >
> > -#define BUILD_INTERRUPT(name, nr) BUILD_INTERRUPT3(name, nr, smp_##name)
> > +
> > +#ifdef CONFIG_TRACING
> > +#define TRACE_BUILD_INTERRUPT(name, nr) \
> > + BUILD_INTERRUPT3(trace_##name, nr, smp_trace_##name)
> > +#else
> > +#define TRACE_BUILD_INTERRUPT(name, nr)
> > +#endif
> > +
> > +#define BUILD_INTERRUPT(name, nr) \
> > + BUILD_INTERRUPT3(name, nr, smp_##name); \
> > + TRACE_BUILD_INTERRUPT(name, nr)
> >
> > /* The include is where all of the SMP etc. interrupts come from */
> > #include <asm/entry_arch.h>
> > diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> > index c1d01e6..4570477 100644
> > --- a/arch/x86/kernel/entry_64.S
> > +++ b/arch/x86/kernel/entry_64.S
> > @@ -1138,7 +1138,7 @@ END(common_interrupt)
> > /*
> > * APIC interrupts.
> > */
> > -.macro apicinterrupt num sym do_sym
> > +.macro apicinterrupt3 num sym do_sym
> > ENTRY(\sym)
> > INTR_FRAME
> > ASM_CLAC
> > @@ -1150,15 +1150,32 @@ ENTRY(\sym)
> > END(\sym)
> > .endm
> >
> > +#ifdef CONFIG_TRACING
> > +#define trace(sym) trace_##sym
> > +#define smp_trace(sym) smp_trace_##sym
> > +
> > +.macro trace_apicinterrupt num sym
> > +apicinterrupt3 \num trace(\sym) smp_trace(\sym)
> > +.endm
> > +#else
> > +.macro trace_apicinterrupt num sym do_sym
> > +.endm
> > +#endif
> > +
> > +.macro apicinterrupt num sym do_sym
> > +apicinterrupt3 \num \sym \do_sym
> > +trace_apicinterrupt \num \sym
> > +.endm
> > +
> > #ifdef CONFIG_SMP
> > -apicinterrupt IRQ_MOVE_CLEANUP_VECTOR \
> > +apicinterrupt3 IRQ_MOVE_CLEANUP_VECTOR \
> > irq_move_cleanup_interrupt smp_irq_move_cleanup_interrupt
> > -apicinterrupt REBOOT_VECTOR \
> > +apicinterrupt3 REBOOT_VECTOR \
> > reboot_interrupt smp_reboot_interrupt
> > #endif
> >
> > #ifdef CONFIG_X86_UV
> > -apicinterrupt UV_BAU_MESSAGE \
> > +apicinterrupt3 UV_BAU_MESSAGE \
> > uv_bau_message_intr1 uv_bau_message_interrupt
> > #endif
> > apicinterrupt LOCAL_TIMER_VECTOR \
> > @@ -1446,13 +1463,13 @@ ENTRY(xen_failsafe_callback)
> > CFI_ENDPROC
> > END(xen_failsafe_callback)
> >
> > -apicinterrupt HYPERVISOR_CALLBACK_VECTOR \
> > +apicinterrupt3 HYPERVISOR_CALLBACK_VECTOR \
> > xen_hvm_callback_vector xen_evtchn_do_upcall
> >
> > #endif /* CONFIG_XEN */
> >
> > #if IS_ENABLED(CONFIG_HYPERV)
> > -apicinterrupt HYPERVISOR_CALLBACK_VECTOR \
> > +apicinterrupt3 HYPERVISOR_CALLBACK_VECTOR \
> > hyperv_callback_vector hyperv_vector_handler
> > #endif /* CONFIG_HYPERV */
> >
> > diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> > index 6859e96..e827e67 100644
> > --- a/arch/x86/kernel/head_64.S
> > +++ b/arch/x86/kernel/head_64.S
> > @@ -518,6 +518,12 @@ ENTRY(idt_table)
> > ENTRY(nmi_idt_table)
> > .skip IDT_ENTRIES * 16
> >
> > +#ifdef CONFIG_TRACING
> > + .align L1_CACHE_BYTES
> > +ENTRY(trace_idt_table)
> > + .skip IDT_ENTRIES * 16
> > +#endif
> > +
> > __PAGE_ALIGNED_BSS
> > NEXT_PAGE(empty_zero_page)
> > .skip PAGE_SIZE
> > diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
> > index e4595f1..216bec1 100644
> > --- a/arch/x86/kernel/irq.c
> > +++ b/arch/x86/kernel/irq.c
> > @@ -17,6 +17,7 @@
> > #include <asm/idle.h>
> > #include <asm/mce.h>
> > #include <asm/hw_irq.h>
> > +#include <asm/trace/irq_vectors.h>
> >
> > atomic_t irq_err_count;
> >
> > @@ -228,6 +229,28 @@ void smp_x86_platform_ipi(struct pt_regs *regs)
> > set_irq_regs(old_regs);
> > }
> >
> > +void smp_trace_x86_platform_ipi(struct pt_regs *regs)
> > +{
> > + struct pt_regs *old_regs = set_irq_regs(regs);
> > +
> > + ack_APIC_irq();
> > +
> > + irq_enter();
> > +
> > + exit_idle();
> > +
> > + trace_x86_platform_ipi_entry(X86_PLATFORM_IPI_VECTOR);
> > + inc_irq_stat(x86_platform_ipis);
> > +
> > + if (x86_platform_ipi_callback)
> > + x86_platform_ipi_callback();
> > +
> > + trace_x86_platform_ipi_exit(X86_PLATFORM_IPI_VECTOR);
> > + irq_exit();
> > +
> > + set_irq_regs(old_regs);
> > +}
> > +
> > EXPORT_SYMBOL_GPL(vector_used_by_percpu_irq);
> >
> > #ifdef CONFIG_HOTPLUG_CPU
> > diff --git a/arch/x86/kernel/irq_work.c b/arch/x86/kernel/irq_work.c
> > index ca8f703..09e6262 100644
> > --- a/arch/x86/kernel/irq_work.c
> > +++ b/arch/x86/kernel/irq_work.c
> > @@ -8,6 +8,7 @@
> > #include <linux/irq_work.h>
> > #include <linux/hardirq.h>
> > #include <asm/apic.h>
> > +#include <asm/trace/irq_vectors.h>
> >
> > void smp_irq_work_interrupt(struct pt_regs *regs)
> > {
> > @@ -18,6 +19,17 @@ void smp_irq_work_interrupt(struct pt_regs *regs)
> > irq_exit();
> > }
> >
> > +void smp_trace_irq_work_interrupt(struct pt_regs *regs)
> > +{
> > + irq_enter();
> > + ack_APIC_irq();
> > + trace_irq_work_entry(IRQ_WORK_VECTOR);
> > + inc_irq_stat(apic_irq_work_irqs);
> > + irq_work_run();
> > + trace_irq_work_exit(IRQ_WORK_VECTOR);
> > + irq_exit();
> > +}
> > +
> > void arch_irq_work_raise(void)
> > {
> > #ifdef CONFIG_X86_LOCAL_APIC
> > diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
> > index 48d2b7d..aad58af 100644
> > --- a/arch/x86/kernel/smp.c
> > +++ b/arch/x86/kernel/smp.c
> > @@ -30,6 +30,7 @@
> > #include <asm/proto.h>
> > #include <asm/apic.h>
> > #include <asm/nmi.h>
> > +#include <asm/trace/irq_vectors.h>
> > /*
> > * Some notes on x86 processor bugs affecting SMP operation:
> > *
> > @@ -259,6 +260,18 @@ void smp_reschedule_interrupt(struct pt_regs *regs)
> > */
> > }
> >
> > +void smp_trace_reschedule_interrupt(struct pt_regs *regs)
> > +{
> > + ack_APIC_irq();
> > + trace_reschedule_entry(RESCHEDULE_VECTOR);
> > + inc_irq_stat(irq_resched_count);
> > + scheduler_ipi();
> > + trace_reschedule_exit(RESCHEDULE_VECTOR);
> > + /*
> > + * KVM uses this interrupt to force a cpu out of guest mode
> > + */
> > +}
> > +
> > void smp_call_function_interrupt(struct pt_regs *regs)
> > {
> > ack_APIC_irq();
> > @@ -268,6 +281,17 @@ void smp_call_function_interrupt(struct pt_regs *regs)
> > irq_exit();
> > }
> >
> > +void smp_trace_call_function_interrupt(struct pt_regs *regs)
> > +{
> > + ack_APIC_irq();
> > + irq_enter();
> > + trace_call_function_entry(CALL_FUNCTION_VECTOR);
> > + generic_smp_call_function_interrupt();
> > + inc_irq_stat(irq_call_count);
> > + trace_call_function_exit(CALL_FUNCTION_VECTOR);
> > + irq_exit();
> > +}
> > +
> > void smp_call_function_single_interrupt(struct pt_regs *regs)
> > {
> > ack_APIC_irq();
> > @@ -277,6 +301,17 @@ void smp_call_function_single_interrupt(struct pt_regs *regs)
> > irq_exit();
> > }
> >
> > +void smp_trace_call_function_single_interrupt(struct pt_regs *regs)
> > +{
> > + ack_APIC_irq();
> > + irq_enter();
> > + trace_call_function_single_entry(CALL_FUNCTION_SINGLE_VECTOR);
> > + generic_smp_call_function_single_interrupt();
> > + inc_irq_stat(irq_call_count);
> > + trace_call_function_single_exit(CALL_FUNCTION_SINGLE_VECTOR);
> > + irq_exit();
> > +}
> > +
> > static int __init nonmi_ipi_setup(char *str)
> > {
> > smp_no_nmi_ipi = true;
> > diff --git a/arch/x86/kernel/tracepoint.c b/arch/x86/kernel/tracepoint.c
> > new file mode 100644
> > index 0000000..3403dc3
> > --- /dev/null
> > +++ b/arch/x86/kernel/tracepoint.c
> > @@ -0,0 +1,71 @@
> > +/*
> > + * Code for supporting irq vector tracepoints.
> > + *
> > + * Copyright (C) 2013 Seiji Aguchi <sei...@hd...>
> > + *
> > + */
> > +#include <asm/hw_irq.h>
> > +#include <asm/desc.h>
> > +
> > +struct desc_ptr trace_idt_descr = { NR_VECTORS * 16 - 1,
> > + (unsigned long) trace_idt_table };
> > +
> > +#ifndef CONFIG_X86_64
> > +gate_desc trace_idt_table[NR_VECTORS] __page_aligned_data
> > + = { { { { 0, 0 } } }, };
> > +#endif
> > +
> > +static int trace_irq_vector_refcount;
> > +static DEFINE_MUTEX(irq_vector_mutex);
> > +DEFINE_PER_CPU(u32, trace_idt_in_use);
> > +
> > +static void switch_to_trace_idt(void *arg)
> > +{
> > +
> > + this_cpu_inc(trace_idt_in_use);
> > + load_idt(&trace_idt_descr);
> > +
> > + return;
> > +}
> > +
> > +static void restore_original_idt(void *arg)
> > +{
> > + if (WARN_ON(!this_cpu_read(trace_idt_in_use)))
> > + return;
> > + this_cpu_dec(trace_idt_in_use);
> > +
> > +#ifdef CONFIG_X86_64
> > + if (this_cpu_read(debug_stack_use_ctr))
> > + load_idt(&nmi_idt_descr);
> > + else
> > +#endif
>
> This is unneeded, as debug_stack_use_ctr should never be set here. It
> only gets set when interrupts are disabled and cleared before they are
> enabled again, and this is called from interrupt context. If it is set,
> then it is truly a bug.
>
> Only NMIs themselves can see this set, if it wasn't the one to set it.
>
> > + load_idt(&idt_descr);
> > + return;
> > +}
> > +
> > +void trace_irq_vector_regfunc(void)
> > +{
> > + mutex_lock(&irq_vector_mutex);
> > + if (!trace_irq_vector_refcount) {
> > + smp_call_function(switch_to_trace_idt, NULL, 0);
> > + local_irq_disable();
> > + switch_to_trace_idt(NULL);
> > + local_irq_enable();
> > + }
> > + trace_irq_vector_refcount++;
> > + mutex_unlock(&irq_vector_mutex);
> > +}
> > +
> > +void trace_irq_vector_unregfunc(void)
> > +{
> > + mutex_lock(&irq_vector_mutex);
> > + trace_irq_vector_refcount--;
> > + if (!trace_irq_vector_refcount) {
>
> Hmm, this needs to handle cpu hotplug. I just did the following:
>
> echo 0 > /sys/devices/system/cpu/cpu3/online
> echo 1 > /debug/tracing/events/irq_vectors/enable
> echo 1 > /sys/devices/systems/cpu/cpu3/online
> echo 0 > /debug/tracing/events/irq/vectors/enable
>
> and got this:
>
> [ 224.595931] ------------[ cut here ]------------
> [ 224.596928] WARNING: at /home/rostedt/work/git/linux-trace.git/arch/x86/kernel/tracepoint.c:33 restore_original_idt+0x26/0x4e()
> [ 224.596928] Modules linked in: ebtables ipt_MASQUERADE sunrpc bridge stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6
> ip6table_filter ip6_tables ipv6 uinput snd_hda_codec_idt snd_hda_intel snd_hda_codec kvm_intel snd_hwdep snd_seq kvm
> snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc shpchp microcode i2c_i801 pata_acpi firewire_ohci firewire_core
> crc_itu_t ata_generic i915 drm_kms_helper drm i2c_algo_bit i2c_core video
> [ 224.596928] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 3.10.0-rc2-test+ #137
> [ 224.596928] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
> [ 224.596928] ffffffff817c948b ffff88007d583ef8 ffffffff814c9ec6 ffff88007d583f38
> [ 224.596928] ffffffff810363a5 ffff88007d5971b8 0000000000000000 ffff88007d5971b8
> [ 224.596928] ffff88007d583f68 0000000000000001 0000000000000000 ffff88007d583f48
> [ 224.596928] Call Trace:
> [ 224.596928] <IRQ> [<ffffffff814c9ec6>] dump_stack+0x19/0x1b
> [ 224.596928] [<ffffffff810363a5>] warn_slowpath_common+0x67/0x80
> [ 224.596928] [<ffffffff810363d8>] warn_slowpath_null+0x1a/0x1c
> [ 224.596928] [<ffffffff8102864c>] restore_original_idt+0x26/0x4e
> [ 224.596928] [<ffffffff8108bd52>] generic_smp_call_function_single_interrupt+0xb5/0xee
> [ 224.596928] [<ffffffff8101f500>] smp_call_function_interrupt+0x18/0x27
> [ 224.596928] [<ffffffff814d646f>] call_function_interrupt+0x6f/0x80
> [ 224.596928] <EOI> [<ffffffff81009727>] ? default_idle+0x21/0x32
> [ 224.596928] [<ffffffff81009725>] ? default_idle+0x1f/0x32
> [ 224.596928] [<ffffffff81009e37>] arch_cpu_idle+0x18/0x22
> [ 224.596928] [<ffffffff810799da>] cpu_startup_entry+0xc8/0x12e
> [ 224.596928] [<ffffffff814bc6ad>] start_secondary+0x250/0x257
>
> -- Steve
>
>
> > + smp_call_function(restore_original_idt, NULL, 0);
> > + local_irq_disable();
> > + restore_original_idt(NULL);
> > + local_irq_enable();
> > + }
> > + mutex_unlock(&irq_vector_mutex);
> > +}
> > +
> > diff --git a/include/xen/events.h b/include/xen/events.h
> > index c6bfe01..9216d07 100644
> > --- a/include/xen/events.h
> > +++ b/include/xen/events.h
> > @@ -76,6 +76,9 @@ unsigned irq_from_evtchn(unsigned int evtchn);
> >
> > /* Xen HVM evtchn vector callback */
> > void xen_hvm_callback_vector(void);
> > +#ifdef CONFIG_TRACING
> > +#define trace_xen_hvm_callback_vector xen_hvm_callback_vector
> > +#endif
> > extern int xen_have_vector_callback;
> > int xen_set_callback_via(uint64_t via);
> > void xen_evtchn_do_upcall(struct pt_regs *regs);
>
|
|
From: Steven R. <ro...@go...> - 2013-05-23 19:55:00
|
On Fri, 2013-04-05 at 19:21 +0000, Seiji Aguchi wrote:
> [Issue]
>
> Currently, irq vector handlers for tracing are just
> copied non-trace handlers by simply inserting tracepoints.
>
> It is difficult to manage the codes.
>
> [Solution]
This as a separate patch actually makes things more confusing to review.
It should be merged into the previous patch. If you want to break up the
changes, I would first add the entering_irq(), and exiting_irq() as
patch 1, and then do the rest of the changes in patch 2.
-- Steve
> This patch shares common codes between non-trace and trace handlers
> as follows to make them manageable and readable.
>
> Non-trace irq handler:
> smp_irq_handler()
> {
> entering_irq(); /* pre-processing of this handler */
> __smp_irq_handler(); /*
> * common logic between non-trace and trace handlers
> * in a vector.
> */
> exiting_irq(); /* post-processing of this handler */
>
> }
>
> Trace irq_handler:
> smp_trace_irq_handler()
> {
> entering_irq(); /* pre-processing of this handler */
> trace_irq_entry(); /* tracepoint for irq entry */
> __smp_irq_handler(); /*
> * common logic between non-trace and trace handlers
> * in a vector.
> */
> trace_irq_exit(); /* tracepoint for irq exit */
> exiting_irq(); /* post-processing of this handler */
>
> }
>
> If tracepoints can place outside entering_irq()/exiting_irq() as follows, it looks \
> cleaner.
>
> smp_trace_irq_handler()
> {
> trace_irq_entry();
> smp_irq_handler();
> trace_irq_exit();
> }
>
> But it doesn't work.
> The problem is with irq_enter/exit() being called. They must be called before \
> trace_irq_enter/exit(), because of the rcu_irq_enter() must be called before any \
> tracepoints are used, as tracepoints use rcu to synchronize.
>
> As a possible alternative, we may be able to call irq_enter() first as follows if \
> irq_enter() can nest.
>
> smp_trace_irq_hander()
> {
> irq_entry();
> trace_irq_entry();
> smp_irq_handler();
> trace_irq_exit();
> irq_exit();
> }
>
> But it doesn't work, either.
> If irq_enter() is nested, it may have a time penalty because it has to check if it \
> was already called or not. The time penalty is not desired in performance sensitive \
> paths even if it is tiny.
>
> Signed-off-by: Seiji Aguchi <sei...@hd...>
|
|
From: Steven R. <ro...@go...> - 2013-05-23 19:51:47
|
On Fri, 2013-04-05 at 19:20 +0000, Seiji Aguchi wrote:
> Signed-off-by: Seiji Aguchi <sei...@hd...>
> ---
> arch/x86/include/asm/desc.h | 36 +++++++-
> arch/x86/include/asm/entry_arch.h | 5 +-
> arch/x86/include/asm/hw_irq.h | 16 +++
> arch/x86/include/asm/mshyperv.h | 1 +
> arch/x86/include/asm/trace/irq_vectors.h | 159 ++++++++++++++++++++++++++++++
> arch/x86/kernel/Makefile | 1 +
> arch/x86/kernel/apic/apic.c | 94 ++++++++++++++++++
> arch/x86/kernel/cpu/common.c | 12 ++-
> arch/x86/kernel/cpu/mcheck/therm_throt.c | 14 +++
> arch/x86/kernel/cpu/mcheck/threshold.c | 14 +++
> arch/x86/kernel/entry_32.S | 12 ++-
> arch/x86/kernel/entry_64.S | 29 +++++-
> arch/x86/kernel/head_64.S | 6 +
> arch/x86/kernel/irq.c | 23 +++++
> arch/x86/kernel/irq_work.c | 12 +++
> arch/x86/kernel/smp.c | 35 +++++++
> arch/x86/kernel/tracepoint.c | 71 +++++++++++++
> include/xen/events.h | 3 +
> 18 files changed, 529 insertions(+), 14 deletions(-)
> create mode 100644 arch/x86/include/asm/trace/irq_vectors.h
> create mode 100644 arch/x86/kernel/tracepoint.c
>
> diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
> index 8bf1c06..e718c72 100644
> --- a/arch/x86/include/asm/desc.h
> +++ b/arch/x86/include/asm/desc.h
> @@ -39,6 +39,9 @@ extern gate_desc idt_table[];
> extern struct desc_ptr nmi_idt_descr;
> extern gate_desc nmi_idt_table[];
>
> +extern struct desc_ptr trace_idt_descr;
> +extern DEFINE_PER_CPU(u32, trace_idt_in_use);
> +extern DEFINE_PER_CPU(u32, debug_stack_use_ctr);
Header files should use DECLARE_PER_CPU() not DEFINE_PER_CPU. And drop
the "extern" when doing so, as it's implicit in the macro.
> struct gdt_page {
> struct desc_struct gdt[GDT_ENTRIES];
> } __attribute__((aligned(PAGE_SIZE)));
> @@ -320,6 +323,17 @@ static inline void set_nmi_gate(int gate, void *addr)
> }
> #endif
>
> +#ifdef CONFIG_TRACING
> +extern gate_desc trace_idt_table[];
> +static inline void trace_set_intr_gate(unsigned int gate, void *addr)
> +{
> + gate_desc s;
> +
> + pack_gate(&s, GATE_INTERRUPT, (unsigned long)addr, 0, 0, __KERNEL_CS);
> + write_idt_entry(trace_idt_table, gate, &s);
> +}
Perhaps add a:
static inline void write_trace_idt_entry(gate_desc *idt, int entry,
const gate_desc *gate)
{
write_idt_entry(idt, entry, gate);
}
#else
static inline void write_trace_idt_entry(gate_desc *idt, int entry,
const gate_desc *gate)
{
}
> +#endif
Then you can below just do:
> +
> static inline void _set_gate(int gate, unsigned type, void *addr,
> unsigned dpl, unsigned ist, unsigned seg)
> {
> @@ -331,6 +345,9 @@ static inline void _set_gate(int gate, unsigned type, void *addr,
> * setup time
> */
> write_idt_entry(idt_table, gate, &s);
write_trace_idt_entry(idt_table, gate, &s);
and remove the #ifdef. Makes it nice to read.
> +#ifdef CONFIG_TRACING
> + write_idt_entry(trace_idt_table, gate, &s);
> +#endif
> }
>
> /*
> @@ -360,12 +377,27 @@ static inline void alloc_system_vector(int vector)
> }
> }
>
> -static inline void alloc_intr_gate(unsigned int n, void *addr)
> +#ifdef CONFIG_TRACING
> +static inline void __trace_alloc_intr_gate(unsigned int n, void *addr)
> +{
> + trace_set_intr_gate(n, addr);
> +}
> +#else
> +#define __trace_alloc_intr_gate(n, addr)
> +#endif
> +
> +static inline void __alloc_intr_gate(unsigned int n, void *addr)
> {
> - alloc_system_vector(n);
> set_intr_gate(n, addr);
> }
>
> +#define alloc_intr_gate(n, addr) \
> + do { \
> + alloc_system_vector(n); \
> + __alloc_intr_gate(n, addr); \
> + __trace_alloc_intr_gate(n, trace_##addr); \
> + } while (0)
> +
> /*
> * This routine sets up an interrupt gate at directory privilege level 3.
> */
> diff --git a/arch/x86/include/asm/entry_arch.h b/arch/x86/include/asm/entry_arch.h
> index 40afa00..0bb99d8 100644
> --- a/arch/x86/include/asm/entry_arch.h
> +++ b/arch/x86/include/asm/entry_arch.h
> @@ -13,8 +13,9 @@
> BUILD_INTERRUPT(reschedule_interrupt,RESCHEDULE_VECTOR)
> BUILD_INTERRUPT(call_function_interrupt,CALL_FUNCTION_VECTOR)
> BUILD_INTERRUPT(call_function_single_interrupt,CALL_FUNCTION_SINGLE_VECTOR)
> -BUILD_INTERRUPT(irq_move_cleanup_interrupt,IRQ_MOVE_CLEANUP_VECTOR)
> -BUILD_INTERRUPT(reboot_interrupt,REBOOT_VECTOR)
> +BUILD_INTERRUPT3(irq_move_cleanup_interrupt, IRQ_MOVE_CLEANUP_VECTOR,
> + smp_irq_move_cleanup_interrupt)
> +BUILD_INTERRUPT3(reboot_interrupt, REBOOT_VECTOR, smp_reboot_interrupt)
> #endif
>
> BUILD_INTERRUPT(x86_platform_ipi, X86_PLATFORM_IPI_VECTOR)
> diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
> index 10a78c3..e84e91a 100644
> --- a/arch/x86/include/asm/hw_irq.h
> +++ b/arch/x86/include/asm/hw_irq.h
> @@ -76,6 +76,22 @@ extern void threshold_interrupt(void);
> extern void call_function_interrupt(void);
> extern void call_function_single_interrupt(void);
>
> +#ifdef CONFIG_TRACING
> +/* Interrupt handlers registered during init_IRQ */
> +extern void trace_apic_timer_interrupt(void);
> +extern void trace_x86_platform_ipi(void);
> +extern void trace_error_interrupt(void);
> +extern void trace_irq_work_interrupt(void);
> +extern void trace_spurious_interrupt(void);
> +extern void trace_thermal_interrupt(void);
> +extern void trace_reschedule_interrupt(void);
> +extern void trace_threshold_interrupt(void);
> +extern void trace_call_function_interrupt(void);
> +extern void trace_call_function_single_interrupt(void);
> +#define trace_irq_move_cleanup_interrupt irq_move_cleanup_interrupt
> +#define trace_reboot_interrupt reboot_interrupt
> +#endif /* CONFIG_TRACING */
> +
> /* IOAPIC */
> #define IO_APIC_IRQ(x) (((x) >= NR_IRQS_LEGACY) || ((1<<(x)) & io_apic_irqs))
> extern unsigned long io_apic_irqs;
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index c2934be..cc40e03 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -12,6 +12,7 @@ struct ms_hyperv_info {
> extern struct ms_hyperv_info ms_hyperv;
>
> void hyperv_callback_vector(void);
> +#define trace_hyperv_callback_vector hyperv_callback_vector
> void hyperv_vector_handler(struct pt_regs *regs);
> void hv_register_vmbus_handler(int irq, irq_handler_t handler);
>
> diff --git a/arch/x86/include/asm/trace/irq_vectors.h b/arch/x86/include/asm/trace/irq_vectors.h
> new file mode 100644
> index 0000000..b4f1c53
> --- /dev/null
> +++ b/arch/x86/include/asm/trace/irq_vectors.h
> @@ -0,0 +1,159 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM irq_vectors
> +
> +#if !defined(_TRACE_IRQ_VECTORS_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_IRQ_VECTORS_H
> +
> +#include <linux/tracepoint.h>
> +
> +extern void trace_irq_vector_regfunc(void);
> +extern void trace_irq_vector_unregfunc(void);
> +
> +DECLARE_EVENT_CLASS(x86_irq_vector,
> +
> + TP_PROTO(int vector),
> +
> + TP_ARGS(vector),
> +
> + TP_STRUCT__entry(
> + __field( int, vector )
> + ),
> +
> + TP_fast_assign(
> + __entry->vector = vector;
> + ),
> +
> + TP_printk("vector=%d", __entry->vector) );
> +
> +#define DEFINE_IRQ_VECTOR_EVENT(name) \
> +DEFINE_EVENT_FN(x86_irq_vector, name, \
> + TP_PROTO(int vector), \
> + TP_ARGS(vector), \
> + trace_irq_vector_regfunc, \
> + trace_irq_vector_unregfunc);
> +
> +/*
> + * local_timer_entry - called before entering a local timer interrupt
> + * vector handler
> + */
> +DEFINE_IRQ_VECTOR_EVENT(local_timer_entry);
> +
> +/*
> + * local_timer_exit - called immediately after the interrupt vector
> + * handler returns
> + */
> +DEFINE_IRQ_VECTOR_EVENT(local_timer_exit);
Looks like you can save on repeat code by doing the following:
#define DEFINE_IRQ_VECTOR_EVENT(name) \
DEFINE_EVENT_FN(x86_irq_vector, name##_entry, \
TP_PROTO(int vector), \
TP_ARGS(vector), \
trace_irq_vector_regfunc, \
trace_irq_vector_unregfunc); \
DEFINE_EVENT_FN(x86_irq_vector, name##_exit, \
TP_PROTO(int vector), \
TP_ARGS(vector), \
trace_irq_vector_regfunc, \
trace_irq_vector_unregfunc); \
Then just do for each:
DEFINE_IRQ_VECTOR_EVENT(local_timer);
[...]
> +
> +/*
> + * reschedule_entry - called before entering a reschedule vector handler
> + */
> +DEFINE_IRQ_VECTOR_EVENT(reschedule_entry);
> +
> +/*
> + * reschedule_exit - called immediately after the interrupt vector
> + * handler returns
> + */
> +DEFINE_IRQ_VECTOR_EVENT(reschedule_exit);
> +
> +/*
> + * spurious_apic_entry - called before entering a spurious apic vector handler
> + */
> +DEFINE_IRQ_VECTOR_EVENT(spurious_apic_entry);
> +
> +/*
> + * spurious_apic_exit - called immediately after the interrupt vector
> + * handler returns
> + */
> +DEFINE_IRQ_VECTOR_EVENT(spurious_apic_exit);
> +
> +/*
> + * error_apic_entry - called before entering an error apic vector handler
> + */
> +DEFINE_IRQ_VECTOR_EVENT(error_apic_entry);
> +
> +/*
> + * error_apic_exit - called immediately after the interrupt vector
> + * handler returns
> + */
> +DEFINE_IRQ_VECTOR_EVENT(error_apic_exit);
> +
> +/*
> + * x86_platform_ipi_entry - called before entering a x86 platform ipi interrupt
> + * vector handler
> + */
> +DEFINE_IRQ_VECTOR_EVENT(x86_platform_ipi_entry);
> +
> +/*
> + * x86_platform_ipi_exit - called immediately after the interrupt vector
> + * handler returns
> + */
> +DEFINE_IRQ_VECTOR_EVENT(x86_platform_ipi_exit);
> +
> +/*
> + * irq_work_entry - called before entering a irq work interrupt
> + * vector handler
> + */
> +DEFINE_IRQ_VECTOR_EVENT(irq_work_entry);
> +
> +/*
> + * irq_work_exit - called immediately after the interrupt vector
> + * handler returns
> + */
> +DEFINE_IRQ_VECTOR_EVENT(irq_work_exit);
> +
> +/*
> + * call_function_entry - called before entering a call function interrupt
> + * vector handler
> + */
> +DEFINE_IRQ_VECTOR_EVENT(call_function_entry);
> +
> +/*
> + * call_function_exit - called immediately after the interrupt vector
> + * handler returns
> + */
> +DEFINE_IRQ_VECTOR_EVENT(call_function_exit);
> +
> +/*
> + * call_function_single_entry - called before entering a call function
> + * single interrupt vector handler
> + */
> +DEFINE_IRQ_VECTOR_EVENT(call_function_single_entry);
> +
> +/*
> + * call_function_single_exit - called immediately after the interrupt vector
> + * handler returns
> + */
> +DEFINE_IRQ_VECTOR_EVENT(call_function_single_exit);
> +
> +/*
> + * threshold_apic_entry - called before entering a threshold apic interrupt
> + * vector handler
> + */
> +DEFINE_IRQ_VECTOR_EVENT(threshold_apic_entry);
> +
> +/*
> + * threshold_apic_exit - called immediately after the interrupt vector
> + * handler returns
> + */
> +DEFINE_IRQ_VECTOR_EVENT(threshold_apic_exit);
> +
> +/*
> + * thermal_apic_entry - called before entering a thermal apic interrupt
> + * vector handler
> + */
> +DEFINE_IRQ_VECTOR_EVENT(thermal_apic_entry);
> +
> +/*
> + * thrmal_apic_exit - called immediately after the interrupt vector
> + * handler returns
> + */
> +DEFINE_IRQ_VECTOR_EVENT(thermal_apic_exit);
> +
> +#undef TRACE_INCLUDE_PATH
> +#define TRACE_INCLUDE_PATH ../../arch/x86/include/asm/trace
Hmm, this is dangerous. It's better to use the Makefile.
here do:
#define TRACE_INCLUDE_PATH .
> +#define TRACE_INCLUDE_FILE irq_vectors
> +#endif /* _TRACE_IRQ_VECTORS_H */
Then in arch/x86/kernel/apic/Makefile:
CFLAGS_apic.o := -I$(src)
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>
> +
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index 7bd3bd3..74b3891 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -102,6 +102,7 @@ obj-$(CONFIG_OF) += devicetree.o
> obj-$(CONFIG_UPROBES) += uprobes.o
>
> obj-$(CONFIG_PERF_EVENTS) += perf_regs.o
> +obj-$(CONFIG_TRACING) += tracepoint.o
>
> ###
> # 64 bit specific files
> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index 904611b..7d39c68 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -55,6 +55,9 @@
> #include <asm/tsc.h>
> #include <asm/hypervisor.h>
>
> +#define CREATE_TRACE_POINTS
> +#include <asm/trace/irq_vectors.h>
> +
> unsigned int num_processors;
>
> unsigned disabled_cpus __cpuinitdata;
> @@ -934,6 +937,30 @@ void __irq_entry smp_apic_timer_interrupt(struct pt_regs *regs)
> set_irq_regs(old_regs);
> }
>
> +void __irq_entry smp_trace_apic_timer_interrupt(struct pt_regs *regs)
> +{
> + struct pt_regs *old_regs = set_irq_regs(regs);
> +
> + /*
> + * NOTE! We'd better ACK the irq immediately,
> + * because timer handling can be slow.
> + */
> + ack_APIC_irq();
> + /*
> + * update_process_times() expects us to have done irq_enter().
> + * Besides, if we don't timer interrupts ignore the global
> + * interrupt lock, which is the WrongThing (tm) to do.
> + */
> + irq_enter();
> + exit_idle();
> + trace_local_timer_entry(LOCAL_TIMER_VECTOR);
> + local_apic_timer_interrupt();
> + trace_local_timer_exit(LOCAL_TIMER_VECTOR);
> + irq_exit();
> +
> + set_irq_regs(old_regs);
> +}
> +
> int setup_profiling_timer(unsigned int multiplier)
> {
> return -EINVAL;
> @@ -1930,6 +1957,31 @@ void smp_spurious_interrupt(struct pt_regs *regs)
> irq_exit();
> }
>
> +void smp_trace_spurious_interrupt(struct pt_regs *regs)
> +{
> + u32 v;
> +
> + irq_enter();
> + exit_idle();
> + trace_spurious_apic_entry(SPURIOUS_APIC_VECTOR);
> + /*
> + * Check if this really is a spurious interrupt and ACK it
> + * if it is a vectored one. Just in case...
> + * Spurious interrupts should not be ACKed.
> + */
> + v = apic_read(APIC_ISR + ((SPURIOUS_APIC_VECTOR & ~0x1f) >> 1));
> + if (v & (1 << (SPURIOUS_APIC_VECTOR & 0x1f)))
> + ack_APIC_irq();
> +
> + inc_irq_stat(irq_spurious_count);
> +
> + /* see sw-dev-man vol 3, chapter 7.4.13.5 */
> + pr_info("spurious APIC interrupt on CPU#%d, "
> + "should never happen.\n", smp_processor_id());
> + trace_spurious_apic_exit(SPURIOUS_APIC_VECTOR);
> + irq_exit();
> +}
> +
> /*
> * This interrupt should never happen with our APIC/SMP architecture
> */
> @@ -1973,6 +2025,48 @@ void smp_error_interrupt(struct pt_regs *regs)
> irq_exit();
> }
>
> +void smp_trace_error_interrupt(struct pt_regs *regs)
> +{
> + u32 v0, v1;
> + u32 i = 0;
> + static const char * const error_interrupt_reason[] = {
> + "Send CS error", /* APIC Error Bit 0 */
> + "Receive CS error", /* APIC Error Bit 1 */
> + "Send accept error", /* APIC Error Bit 2 */
> + "Receive accept error", /* APIC Error Bit 3 */
> + "Redirectable IPI", /* APIC Error Bit 4 */
> + "Send illegal vector", /* APIC Error Bit 5 */
> + "Received illegal vector", /* APIC Error Bit 6 */
> + "Illegal register address", /* APIC Error Bit 7 */
> + };
> +
> + irq_enter();
> + exit_idle();
> + trace_error_apic_entry(ERROR_APIC_VECTOR);
> + /* First tickle the hardware, only then report what went on. -- REW */
> + v0 = apic_read(APIC_ESR);
> + apic_write(APIC_ESR, 0);
> + v1 = apic_read(APIC_ESR);
> + ack_APIC_irq();
> + atomic_inc(&irq_err_count);
> +
> + apic_printk(APIC_DEBUG, KERN_DEBUG "APIC error on CPU%d: %02x(%02x)",
> + smp_processor_id(), v0 , v1);
> +
> + v1 = v1 & 0xff;
> + while (v1) {
> + if (v1 & 0x1)
> + apic_printk(APIC_DEBUG, KERN_CONT " : %s", error_interrupt_reason[i]);
> + i++;
> + v1 >>= 1;
> + }
> +
> + apic_printk(APIC_DEBUG, KERN_CONT "\n");
> +
> + trace_error_apic_exit(ERROR_APIC_VECTOR);
> + irq_exit();
> +}
> +
> /**
> * connect_bsp_APIC - attach the APIC to the interrupt system
> */
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index d814772..23045cd 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -1144,7 +1144,7 @@ int is_debug_stack(unsigned long addr)
> addr > (__get_cpu_var(debug_stack_addr) - DEBUG_STKSZ));
> }
>
> -static DEFINE_PER_CPU(u32, debug_stack_use_ctr);
> +DEFINE_PER_CPU(u32, debug_stack_use_ctr);
>
> void debug_stack_set_zero(void)
> {
> @@ -1156,8 +1156,14 @@ void debug_stack_reset(void)
> {
> if (WARN_ON(!this_cpu_read(debug_stack_use_ctr)))
> return;
> - if (this_cpu_dec_return(debug_stack_use_ctr) == 0)
> - load_idt((const struct desc_ptr *)&idt_descr);
> + if (this_cpu_dec_return(debug_stack_use_ctr) == 0) {
> +#ifdef CONFIG_TRACING
> + if (this_cpu_read(trace_idt_in_use))
> + load_idt((const struct desc_ptr *)&trace_idt_descr);
> + else
> +#endif
> + load_idt((const struct desc_ptr *)&idt_descr);
> + }
I have a patch that makes this more generic. I'm wondering if that would
be better than having everything so coupled.
> }
>
> #else /* CONFIG_X86_64 */
> diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
> index 47a1870..e7aa7fc 100644
> --- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
> +++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
> @@ -29,6 +29,7 @@
> #include <asm/idle.h>
> #include <asm/mce.h>
> #include <asm/msr.h>
> +#include <asm/trace/irq_vectors.h>
>
> /* How long to wait between reporting thermal events */
> #define CHECK_INTERVAL (300 * HZ)
> @@ -389,6 +390,19 @@ asmlinkage void smp_thermal_interrupt(struct pt_regs *regs)
> ack_APIC_irq();
> }
>
> +asmlinkage void smp_trace_thermal_interrupt(struct pt_regs *regs)
> +{
> + irq_enter();
> + exit_idle();
> + trace_thermal_apic_entry(THERMAL_APIC_VECTOR);
> + inc_irq_stat(irq_thermal_count);
> + smp_thermal_vector();
> + trace_thermal_apic_exit(THERMAL_APIC_VECTOR);
> + irq_exit();
> + /* Ack only at the end to avoid potential reentry */
> + ack_APIC_irq();
> +}
> +
> /* Thermal monitoring depends on APIC, ACPI and clock modulation */
> static int intel_thermal_supported(struct cpuinfo_x86 *c)
> {
> diff --git a/arch/x86/kernel/cpu/mcheck/threshold.c b/arch/x86/kernel/cpu/mcheck/threshold.c
> index aa578ca..0cbef99 100644
> --- a/arch/x86/kernel/cpu/mcheck/threshold.c
> +++ b/arch/x86/kernel/cpu/mcheck/threshold.c
> @@ -8,6 +8,7 @@
> #include <asm/apic.h>
> #include <asm/idle.h>
> #include <asm/mce.h>
> +#include <asm/trace/irq_vectors.h>
>
> static void default_threshold_interrupt(void)
> {
> @@ -27,3 +28,16 @@ asmlinkage void smp_threshold_interrupt(void)
> /* Ack only at the end to avoid potential reentry */
> ack_APIC_irq();
> }
> +
> +asmlinkage void smp_trace_threshold_interrupt(void)
> +{
> + irq_enter();
> + exit_idle();
> + trace_threshold_apic_entry(THRESHOLD_APIC_VECTOR);
> + inc_irq_stat(irq_threshold_count);
> + mce_threshold_vector();
> + trace_threshold_apic_exit(THRESHOLD_APIC_VECTOR);
> + irq_exit();
> + /* Ack only at the end to avoid potential reentry */
> + ack_APIC_irq();
> +}
> diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
> index 8f3e2de..2cfbc3a 100644
> --- a/arch/x86/kernel/entry_32.S
> +++ b/arch/x86/kernel/entry_32.S
> @@ -801,7 +801,17 @@ ENTRY(name) \
> CFI_ENDPROC; \
> ENDPROC(name)
>
> -#define BUILD_INTERRUPT(name, nr) BUILD_INTERRUPT3(name, nr, smp_##name)
> +
> +#ifdef CONFIG_TRACING
> +#define TRACE_BUILD_INTERRUPT(name, nr) \
> + BUILD_INTERRUPT3(trace_##name, nr, smp_trace_##name)
> +#else
> +#define TRACE_BUILD_INTERRUPT(name, nr)
> +#endif
> +
> +#define BUILD_INTERRUPT(name, nr) \
> + BUILD_INTERRUPT3(name, nr, smp_##name); \
> + TRACE_BUILD_INTERRUPT(name, nr)
>
> /* The include is where all of the SMP etc. interrupts come from */
> #include <asm/entry_arch.h>
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index c1d01e6..4570477 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -1138,7 +1138,7 @@ END(common_interrupt)
> /*
> * APIC interrupts.
> */
> -.macro apicinterrupt num sym do_sym
> +.macro apicinterrupt3 num sym do_sym
> ENTRY(\sym)
> INTR_FRAME
> ASM_CLAC
> @@ -1150,15 +1150,32 @@ ENTRY(\sym)
> END(\sym)
> .endm
>
> +#ifdef CONFIG_TRACING
> +#define trace(sym) trace_##sym
> +#define smp_trace(sym) smp_trace_##sym
> +
> +.macro trace_apicinterrupt num sym
> +apicinterrupt3 \num trace(\sym) smp_trace(\sym)
> +.endm
> +#else
> +.macro trace_apicinterrupt num sym do_sym
> +.endm
> +#endif
> +
> +.macro apicinterrupt num sym do_sym
> +apicinterrupt3 \num \sym \do_sym
> +trace_apicinterrupt \num \sym
> +.endm
> +
> #ifdef CONFIG_SMP
> -apicinterrupt IRQ_MOVE_CLEANUP_VECTOR \
> +apicinterrupt3 IRQ_MOVE_CLEANUP_VECTOR \
> irq_move_cleanup_interrupt smp_irq_move_cleanup_interrupt
> -apicinterrupt REBOOT_VECTOR \
> +apicinterrupt3 REBOOT_VECTOR \
> reboot_interrupt smp_reboot_interrupt
> #endif
>
> #ifdef CONFIG_X86_UV
> -apicinterrupt UV_BAU_MESSAGE \
> +apicinterrupt3 UV_BAU_MESSAGE \
> uv_bau_message_intr1 uv_bau_message_interrupt
> #endif
> apicinterrupt LOCAL_TIMER_VECTOR \
> @@ -1446,13 +1463,13 @@ ENTRY(xen_failsafe_callback)
> CFI_ENDPROC
> END(xen_failsafe_callback)
>
> -apicinterrupt HYPERVISOR_CALLBACK_VECTOR \
> +apicinterrupt3 HYPERVISOR_CALLBACK_VECTOR \
> xen_hvm_callback_vector xen_evtchn_do_upcall
>
> #endif /* CONFIG_XEN */
>
> #if IS_ENABLED(CONFIG_HYPERV)
> -apicinterrupt HYPERVISOR_CALLBACK_VECTOR \
> +apicinterrupt3 HYPERVISOR_CALLBACK_VECTOR \
> hyperv_callback_vector hyperv_vector_handler
> #endif /* CONFIG_HYPERV */
>
> diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> index 6859e96..e827e67 100644
> --- a/arch/x86/kernel/head_64.S
> +++ b/arch/x86/kernel/head_64.S
> @@ -518,6 +518,12 @@ ENTRY(idt_table)
> ENTRY(nmi_idt_table)
> .skip IDT_ENTRIES * 16
>
> +#ifdef CONFIG_TRACING
> + .align L1_CACHE_BYTES
> +ENTRY(trace_idt_table)
> + .skip IDT_ENTRIES * 16
> +#endif
> +
> __PAGE_ALIGNED_BSS
> NEXT_PAGE(empty_zero_page)
> .skip PAGE_SIZE
> diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
> index e4595f1..216bec1 100644
> --- a/arch/x86/kernel/irq.c
> +++ b/arch/x86/kernel/irq.c
> @@ -17,6 +17,7 @@
> #include <asm/idle.h>
> #include <asm/mce.h>
> #include <asm/hw_irq.h>
> +#include <asm/trace/irq_vectors.h>
>
> atomic_t irq_err_count;
>
> @@ -228,6 +229,28 @@ void smp_x86_platform_ipi(struct pt_regs *regs)
> set_irq_regs(old_regs);
> }
>
> +void smp_trace_x86_platform_ipi(struct pt_regs *regs)
> +{
> + struct pt_regs *old_regs = set_irq_regs(regs);
> +
> + ack_APIC_irq();
> +
> + irq_enter();
> +
> + exit_idle();
> +
> + trace_x86_platform_ipi_entry(X86_PLATFORM_IPI_VECTOR);
> + inc_irq_stat(x86_platform_ipis);
> +
> + if (x86_platform_ipi_callback)
> + x86_platform_ipi_callback();
> +
> + trace_x86_platform_ipi_exit(X86_PLATFORM_IPI_VECTOR);
> + irq_exit();
> +
> + set_irq_regs(old_regs);
> +}
> +
> EXPORT_SYMBOL_GPL(vector_used_by_percpu_irq);
>
> #ifdef CONFIG_HOTPLUG_CPU
> diff --git a/arch/x86/kernel/irq_work.c b/arch/x86/kernel/irq_work.c
> index ca8f703..09e6262 100644
> --- a/arch/x86/kernel/irq_work.c
> +++ b/arch/x86/kernel/irq_work.c
> @@ -8,6 +8,7 @@
> #include <linux/irq_work.h>
> #include <linux/hardirq.h>
> #include <asm/apic.h>
> +#include <asm/trace/irq_vectors.h>
>
> void smp_irq_work_interrupt(struct pt_regs *regs)
> {
> @@ -18,6 +19,17 @@ void smp_irq_work_interrupt(struct pt_regs *regs)
> irq_exit();
> }
>
> +void smp_trace_irq_work_interrupt(struct pt_regs *regs)
> +{
> + irq_enter();
> + ack_APIC_irq();
> + trace_irq_work_entry(IRQ_WORK_VECTOR);
> + inc_irq_stat(apic_irq_work_irqs);
> + irq_work_run();
> + trace_irq_work_exit(IRQ_WORK_VECTOR);
> + irq_exit();
> +}
> +
> void arch_irq_work_raise(void)
> {
> #ifdef CONFIG_X86_LOCAL_APIC
> diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
> index 48d2b7d..aad58af 100644
> --- a/arch/x86/kernel/smp.c
> +++ b/arch/x86/kernel/smp.c
> @@ -30,6 +30,7 @@
> #include <asm/proto.h>
> #include <asm/apic.h>
> #include <asm/nmi.h>
> +#include <asm/trace/irq_vectors.h>
> /*
> * Some notes on x86 processor bugs affecting SMP operation:
> *
> @@ -259,6 +260,18 @@ void smp_reschedule_interrupt(struct pt_regs *regs)
> */
> }
>
> +void smp_trace_reschedule_interrupt(struct pt_regs *regs)
> +{
> + ack_APIC_irq();
> + trace_reschedule_entry(RESCHEDULE_VECTOR);
> + inc_irq_stat(irq_resched_count);
> + scheduler_ipi();
> + trace_reschedule_exit(RESCHEDULE_VECTOR);
> + /*
> + * KVM uses this interrupt to force a cpu out of guest mode
> + */
> +}
> +
> void smp_call_function_interrupt(struct pt_regs *regs)
> {
> ack_APIC_irq();
> @@ -268,6 +281,17 @@ void smp_call_function_interrupt(struct pt_regs *regs)
> irq_exit();
> }
>
> +void smp_trace_call_function_interrupt(struct pt_regs *regs)
> +{
> + ack_APIC_irq();
> + irq_enter();
> + trace_call_function_entry(CALL_FUNCTION_VECTOR);
> + generic_smp_call_function_interrupt();
> + inc_irq_stat(irq_call_count);
> + trace_call_function_exit(CALL_FUNCTION_VECTOR);
> + irq_exit();
> +}
> +
> void smp_call_function_single_interrupt(struct pt_regs *regs)
> {
> ack_APIC_irq();
> @@ -277,6 +301,17 @@ void smp_call_function_single_interrupt(struct pt_regs *regs)
> irq_exit();
> }
>
> +void smp_trace_call_function_single_interrupt(struct pt_regs *regs)
> +{
> + ack_APIC_irq();
> + irq_enter();
> + trace_call_function_single_entry(CALL_FUNCTION_SINGLE_VECTOR);
> + generic_smp_call_function_single_interrupt();
> + inc_irq_stat(irq_call_count);
> + trace_call_function_single_exit(CALL_FUNCTION_SINGLE_VECTOR);
> + irq_exit();
> +}
> +
> static int __init nonmi_ipi_setup(char *str)
> {
> smp_no_nmi_ipi = true;
> diff --git a/arch/x86/kernel/tracepoint.c b/arch/x86/kernel/tracepoint.c
> new file mode 100644
> index 0000000..3403dc3
> --- /dev/null
> +++ b/arch/x86/kernel/tracepoint.c
> @@ -0,0 +1,71 @@
> +/*
> + * Code for supporting irq vector tracepoints.
> + *
> + * Copyright (C) 2013 Seiji Aguchi <sei...@hd...>
> + *
> + */
> +#include <asm/hw_irq.h>
> +#include <asm/desc.h>
> +
> +struct desc_ptr trace_idt_descr = { NR_VECTORS * 16 - 1,
> + (unsigned long) trace_idt_table };
> +
> +#ifndef CONFIG_X86_64
> +gate_desc trace_idt_table[NR_VECTORS] __page_aligned_data
> + = { { { { 0, 0 } } }, };
> +#endif
> +
> +static int trace_irq_vector_refcount;
> +static DEFINE_MUTEX(irq_vector_mutex);
> +DEFINE_PER_CPU(u32, trace_idt_in_use);
> +
> +static void switch_to_trace_idt(void *arg)
> +{
> +
> + this_cpu_inc(trace_idt_in_use);
> + load_idt(&trace_idt_descr);
> +
> + return;
> +}
> +
> +static void restore_original_idt(void *arg)
> +{
> + if (WARN_ON(!this_cpu_read(trace_idt_in_use)))
> + return;
> + this_cpu_dec(trace_idt_in_use);
> +
> +#ifdef CONFIG_X86_64
> + if (this_cpu_read(debug_stack_use_ctr))
> + load_idt(&nmi_idt_descr);
> + else
> +#endif
This is unneeded, as debug_stack_use_ctr should never be set here. It
only gets set when interrupts are disabled and cleared before they are
enabled again, and this is called from interrupt context. If it is set,
then it is truly a bug.
Only NMIs themselves can see this set, if it wasn't the one to set it.
> + load_idt(&idt_descr);
> + return;
> +}
> +
> +void trace_irq_vector_regfunc(void)
> +{
> + mutex_lock(&irq_vector_mutex);
> + if (!trace_irq_vector_refcount) {
> + smp_call_function(switch_to_trace_idt, NULL, 0);
> + local_irq_disable();
> + switch_to_trace_idt(NULL);
> + local_irq_enable();
> + }
> + trace_irq_vector_refcount++;
> + mutex_unlock(&irq_vector_mutex);
> +}
> +
> +void trace_irq_vector_unregfunc(void)
> +{
> + mutex_lock(&irq_vector_mutex);
> + trace_irq_vector_refcount--;
> + if (!trace_irq_vector_refcount) {
Hmm, this needs to handle cpu hotplug. I just did the following:
echo 0 > /sys/devices/system/cpu/cpu3/online
echo 1 > /debug/tracing/events/irq_vectors/enable
echo 1 > /sys/devices/systems/cpu/cpu3/online
echo 0 > /debug/tracing/events/irq/vectors/enable
and got this:
[ 224.595931] ------------[ cut here ]------------
[ 224.596928] WARNING: at /home/rostedt/work/git/linux-trace.git/arch/x86/kernel/tracepoint.c:33 restore_original_idt+0x26/0x4e()
[ 224.596928] Modules linked in: ebtables ipt_MASQUERADE sunrpc bridge stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipv6 uinput snd_hda_codec_idt snd_hda_intel snd_hda_codec kvm_intel snd_hwdep snd_seq kvm snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc shpchp microcode i2c_i801 pata_acpi firewire_ohci firewire_core crc_itu_t ata_generic i915 drm_kms_helper drm i2c_algo_bit i2c_core video
[ 224.596928] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 3.10.0-rc2-test+ #137
[ 224.596928] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
[ 224.596928] ffffffff817c948b ffff88007d583ef8 ffffffff814c9ec6 ffff88007d583f38
[ 224.596928] ffffffff810363a5 ffff88007d5971b8 0000000000000000 ffff88007d5971b8
[ 224.596928] ffff88007d583f68 0000000000000001 0000000000000000 ffff88007d583f48
[ 224.596928] Call Trace:
[ 224.596928] <IRQ> [<ffffffff814c9ec6>] dump_stack+0x19/0x1b
[ 224.596928] [<ffffffff810363a5>] warn_slowpath_common+0x67/0x80
[ 224.596928] [<ffffffff810363d8>] warn_slowpath_null+0x1a/0x1c
[ 224.596928] [<ffffffff8102864c>] restore_original_idt+0x26/0x4e
[ 224.596928] [<ffffffff8108bd52>] generic_smp_call_function_single_interrupt+0xb5/0xee
[ 224.596928] [<ffffffff8101f500>] smp_call_function_interrupt+0x18/0x27
[ 224.596928] [<ffffffff814d646f>] call_function_interrupt+0x6f/0x80
[ 224.596928] <EOI> [<ffffffff81009727>] ? default_idle+0x21/0x32
[ 224.596928] [<ffffffff81009725>] ? default_idle+0x1f/0x32
[ 224.596928] [<ffffffff81009e37>] arch_cpu_idle+0x18/0x22
[ 224.596928] [<ffffffff810799da>] cpu_startup_entry+0xc8/0x12e
[ 224.596928] [<ffffffff814bc6ad>] start_secondary+0x250/0x257
-- Steve
> + smp_call_function(restore_original_idt, NULL, 0);
> + local_irq_disable();
> + restore_original_idt(NULL);
> + local_irq_enable();
> + }
> + mutex_unlock(&irq_vector_mutex);
> +}
> +
> diff --git a/include/xen/events.h b/include/xen/events.h
> index c6bfe01..9216d07 100644
> --- a/include/xen/events.h
> +++ b/include/xen/events.h
> @@ -76,6 +76,9 @@ unsigned irq_from_evtchn(unsigned int evtchn);
>
> /* Xen HVM evtchn vector callback */
> void xen_hvm_callback_vector(void);
> +#ifdef CONFIG_TRACING
> +#define trace_xen_hvm_callback_vector xen_hvm_callback_vector
> +#endif
> extern int xen_have_vector_callback;
> int xen_set_callback_via(uint64_t via);
> void xen_evtchn_do_upcall(struct pt_regs *regs);
|
|
From: Andrew M. <ak...@li...> - 2013-05-21 23:01:11
|
Guys, can we please get some review attention to this? From: Seiji Aguchi <sei...@hd...> Subject: Add kmsg_dump() to kexec path Problem ======= >From our support service experience, we always need to detect root cause of OS panic. And customers in enterprise area never forgive us if we can't detect the root cause of panic due to lack of materials for investigation. Kdump is a powerful troubleshooting feature, but it may accesses to multiple hardware, like HBA, FC-cable, to get to dump disk. This means kdump is not robust against hardware failure. Solution ======== Logging kernel message to persistent device is an effective way to get materials for investigation in case of kdump failure. So this patch adds kmsg_dump() to a kexec path. Also, it adds KMSG_DUMP_KEXEC to pstore_cannot_block_path() so that it can avoid deadlocking in kexec path. Please see the detail of pstore_cannot_block_path(). https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/pstore/platform.c?id=9f244e9cfd70c7c0f82d3c92ce772ab2a92d9f64 Actually, there are some objections about kmsg_dump(KMSG_DUMP_KEXEC) and EFI below. But I still think adding kmsg_dump() to a kexec path is useful. - http://marc.info/?l=linux-kernel&m=130698519720887&w=2 (1) kdump already saves kernel messages inside /proc/vmcore - https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/kernel/kexec.c?id=a3dd3323058d281abd584b15ad4c5b65064d7a61 It is correct, but the content of /proc/vmcore is stored a dump disk as well. So, if kdump fails due to hardware failures, the kernel messages will be lost. (2) EFI firmware is buggy - http://marc.info/?l=linux-kernel&m=130698519720887&w=2 I haven't seen actual firmware bugs which may cause kdump failure. So I don't think we need to care about it too much. However, just to be safe, I introduced pstore_cannot_block_path() avoid deadlocking to pstore. Also, this patch doesn't affect almost all users because kmsg_dump() is kicked only when specifying both pstore.backend and printk.always_kmsg_dump parameters. Even if a buggy firmware causes a kdump failure and someone blames kdump, we can ask them to reproduce the kdump failure by removing the parameters. In addition, I checked current coding of platform drivers. There is no obvious problem as follows. - mtdoops/ramoops They are designed to be kicked in panic and oops cases only. So, they never run in a kexec path. - erst/efi/early_printk_mrst/nvram driver for powerpc I don't see any bugs which may causes kdump failure because deadlocking/dynamic memory allocation don't happen in their write callbacks. Signed-off-by: Seiji Aguchi <sei...@hd...> Signed-off-by: Andrew Morton <ak...@li...> --- fs/pstore/platform.c | 4 ++++ include/linux/kmsg_dump.h | 1 + kernel/kexec.c | 2 ++ 3 files changed, 7 insertions(+) diff -puN fs/pstore/platform.c~add-kmsg_dump-to-kexec-path fs/pstore/platform.c --- a/fs/pstore/platform.c~add-kmsg_dump-to-kexec-path +++ a/fs/pstore/platform.c @@ -91,6 +91,8 @@ static const char *get_reason_str(enum k return "Halt"; case KMSG_DUMP_POWEROFF: return "Poweroff"; + case KMSG_DUMP_KEXEC: + return "Kexec"; default: return "Unknown"; } @@ -110,6 +112,8 @@ bool pstore_cannot_block_path(enum kmsg_ case KMSG_DUMP_PANIC: /* Emergency restart shouldn't be blocked by spin lock. */ case KMSG_DUMP_EMERG: + /* In kexec path, pstore shouldn't be blocked to avoid kexec failure. */ + case KMSG_DUMP_KEXEC: return true; default: return false; diff -puN include/linux/kmsg_dump.h~add-kmsg_dump-to-kexec-path include/linux/kmsg_dump.h --- a/include/linux/kmsg_dump.h~add-kmsg_dump-to-kexec-path +++ a/include/linux/kmsg_dump.h @@ -28,6 +28,7 @@ enum kmsg_dump_reason { KMSG_DUMP_RESTART, KMSG_DUMP_HALT, KMSG_DUMP_POWEROFF, + KMSG_DUMP_KEXEC, }; /** diff -puN kernel/kexec.c~add-kmsg_dump-to-kexec-path kernel/kexec.c --- a/kernel/kexec.c~add-kmsg_dump-to-kexec-path +++ a/kernel/kexec.c @@ -32,6 +32,7 @@ #include <linux/vmalloc.h> #include <linux/swap.h> #include <linux/syscore_ops.h> +#include <linux/kmsg_dump.h> #include <asm/page.h> #include <asm/uaccess.h> @@ -1089,6 +1090,7 @@ void crash_kexec(struct pt_regs *regs) crash_setup_regs(&fixed_regs, regs); crash_save_vmcoreinfo(); + kmsg_dump(KMSG_DUMP_KEXEC); machine_crash_shutdown(&fixed_regs); machine_kexec(kexec_crash_image); } _ |
|
From: Andrew M. <ak...@li...> - 2013-05-21 23:01:11
|
On Tue, 21 May 2013 15:47:41 -0700 "gr...@li..." <gr...@li...> wrote: > On Tue, May 21, 2013 at 03:41:37PM -0700, Andrew Morton wrote: > > Guys, can we please get some review attention to this? > > I thought it was reviewed, and rejected, already. Did you miss those > emails? I had memories of the same thing, but can't find any record in lkml archives. Either we're thinking of a different patch or this one has appeared under multiple titles. |
|
From: <gr...@li...> - 2013-05-21 22:58:48
|
On Tue, May 21, 2013 at 03:52:14PM -0700, Andrew Morton wrote: > On Tue, 21 May 2013 15:47:41 -0700 "gr...@li..." <gr...@li...> wrote: > > > On Tue, May 21, 2013 at 03:41:37PM -0700, Andrew Morton wrote: > > > Guys, can we please get some review attention to this? > > > > I thought it was reviewed, and rejected, already. Did you miss those > > emails? > > I had memories of the same thing, but can't find any record in lkml > archives. Either we're thinking of a different patch or this one has > appeared under multiple titles. Eric responded to it, perhaps due to the huge number of people on the Cc, the mailing lists dropped it? |
|
From: <ebi...@xm...> - 2013-05-21 22:54:17
|
Andrew Morton <ak...@li...> writes: > Guys, can we please get some review attention to this? I think my original reply got lost. The review was nack. I don't see anything in this patch that makes this any more palatable than any other time this idea has come up. It is perfectly possible in the kexec on panic kernel to write the information to the kmsg. So all this appears to do is make us more dependent on a known broken kernel and make things more fragile and less reliable, for no apparent gain. With the kexec on panic code path the focus should be on making it simpler and more reliable. Especially when we are going to be calling out to firmware this becomes a completely impossible to debug code path. It is simple enough to not load modules in the kexec on panic kernel to avoid dealing with hardware that is silly and drivers that don't work, and those problems can be debugged and fixed. The justification for the patch below starts from an impossible set of requirements "Always needing to detect the root cause of an OS panic" and then says but we choose to use weird crazy configurations that we know are problematic and instead of making those more robust we choose to use an architecture that is less reliable. The only problem with kdump here is the implementation in the initial ram disk. Fixing the initial ramdisk so it logs to kmsg before it touches scarier hardware should be the solution. Eric > From: Seiji Aguchi <sei...@hd...> > Subject: Add kmsg_dump() to kexec path > > Problem > ======= > > From our support service experience, we always need to detect root cause > of OS panic. And customers in enterprise area never forgive us if we > can't detect the root cause of panic due to lack of materials for > investigation. > > Kdump is a powerful troubleshooting feature, but it may accesses to > multiple hardware, like HBA, FC-cable, to get to dump disk. > > This means kdump is not robust against hardware failure. > > Solution > ======== > > Logging kernel message to persistent device is an effective way to get > materials for investigation in case of kdump failure. > > So this patch adds kmsg_dump() to a kexec path. Also, it adds > KMSG_DUMP_KEXEC to pstore_cannot_block_path() so that it can avoid > deadlocking in kexec path. > > Please see the detail of pstore_cannot_block_path(). > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/pstore/platform.c?id=9f244e9cfd70c7c0f82d3c92ce772ab2a92d9f64 > > Actually, there are some objections about kmsg_dump(KMSG_DUMP_KEXEC) and > EFI below. But I still think adding kmsg_dump() to a kexec path is > useful. > > - http://marc.info/?l=linux-kernel&m=130698519720887&w=2 > > (1) kdump already saves kernel messages inside /proc/vmcore > > - https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/kernel/kexec.c?id=a3dd3323058d281abd584b15ad4c5b65064d7a61 > > It is correct, but the content of /proc/vmcore is stored a dump disk > as well. So, if kdump fails due to hardware failures, the kernel > messages will be lost. > > (2) EFI firmware is buggy > > - http://marc.info/?l=linux-kernel&m=130698519720887&w=2 > > I haven't seen actual firmware bugs which may cause kdump failure. So > I don't think we need to care about it too much. However, just to be > safe, I introduced pstore_cannot_block_path() avoid deadlocking to > pstore. > > Also, this patch doesn't affect almost all users because kmsg_dump() is > kicked only when specifying both pstore.backend and > printk.always_kmsg_dump parameters. Even if a buggy firmware causes a > kdump failure and someone blames kdump, we can ask them to reproduce the > kdump failure by removing the parameters. > > In addition, I checked current coding of platform drivers. There is no > obvious problem as follows. > > - mtdoops/ramoops > They are designed to be kicked in panic and oops cases only. > So, they never run in a kexec path. > > - erst/efi/early_printk_mrst/nvram driver for powerpc > I don't see any bugs which may causes kdump failure because > deadlocking/dynamic memory allocation don't happen in their write callbacks. > > Signed-off-by: Seiji Aguchi <sei...@hd...> > Signed-off-by: Andrew Morton <ak...@li...> > --- > > fs/pstore/platform.c | 4 ++++ > include/linux/kmsg_dump.h | 1 + > kernel/kexec.c | 2 ++ > 3 files changed, 7 insertions(+) > > diff -puN fs/pstore/platform.c~add-kmsg_dump-to-kexec-path fs/pstore/platform.c > --- a/fs/pstore/platform.c~add-kmsg_dump-to-kexec-path > +++ a/fs/pstore/platform.c > @@ -91,6 +91,8 @@ static const char *get_reason_str(enum k > return "Halt"; > case KMSG_DUMP_POWEROFF: > return "Poweroff"; > + case KMSG_DUMP_KEXEC: > + return "Kexec"; > default: > return "Unknown"; > } > @@ -110,6 +112,8 @@ bool pstore_cannot_block_path(enum kmsg_ > case KMSG_DUMP_PANIC: > /* Emergency restart shouldn't be blocked by spin lock. */ > case KMSG_DUMP_EMERG: > + /* In kexec path, pstore shouldn't be blocked to avoid kexec failure. */ > + case KMSG_DUMP_KEXEC: > return true; > default: > return false; > diff -puN include/linux/kmsg_dump.h~add-kmsg_dump-to-kexec-path include/linux/kmsg_dump.h > --- a/include/linux/kmsg_dump.h~add-kmsg_dump-to-kexec-path > +++ a/include/linux/kmsg_dump.h > @@ -28,6 +28,7 @@ enum kmsg_dump_reason { > KMSG_DUMP_RESTART, > KMSG_DUMP_HALT, > KMSG_DUMP_POWEROFF, > + KMSG_DUMP_KEXEC, > }; > > /** > diff -puN kernel/kexec.c~add-kmsg_dump-to-kexec-path kernel/kexec.c > --- a/kernel/kexec.c~add-kmsg_dump-to-kexec-path > +++ a/kernel/kexec.c > @@ -32,6 +32,7 @@ > #include <linux/vmalloc.h> > #include <linux/swap.h> > #include <linux/syscore_ops.h> > +#include <linux/kmsg_dump.h> > > #include <asm/page.h> > #include <asm/uaccess.h> > @@ -1089,6 +1090,7 @@ void crash_kexec(struct pt_regs *regs) > > crash_setup_regs(&fixed_regs, regs); > crash_save_vmcoreinfo(); > + kmsg_dump(KMSG_DUMP_KEXEC); > machine_crash_shutdown(&fixed_regs); > machine_kexec(kexec_crash_image); > } > _ |
|
From: <gr...@li...> - 2013-05-21 22:47:49
|
On Tue, May 21, 2013 at 03:41:37PM -0700, Andrew Morton wrote: > Guys, can we please get some review attention to this? I thought it was reviewed, and rejected, already. Did you miss those emails? thanks, greg k-h |
|
From: Eric B. <eb...@re...> - 2013-05-18 04:07:33
|
On 05/10/2013 06:10 PM, Seiji Aguchi wrote: > Any comments? Apologies that no one has spoken up yet, but this is on my list of things to review. I got waylaid by debugging two nasty races today, or I would have had time to actually give comments; but come Monday, you should have some better feedback. > >> -----Original Message----- >> From: Seiji Aguchi >> Sent: Thursday, May 02, 2013 1:26 PM >> To: lib...@re... >> Cc: dle...@li...; Tomoki Sekiyama (tom...@hd...) >> Subject: [PATCH] Introduce startupPolicy for chardev to make guest OS bootable when hardware failure happens in log disk >> >> [Problem] >> Currently, guest OS's messages can be logged to a local disk of host OS >> by creating chadevs with options below. >> -chardev file,id=charserial0,path=<log file's path> -device isa-serial,chardev=chardevserial0,id=serial0 >> >> When a hardware failure happens in the disk, qemu-kvm can't create the chardevs. >> In this case, guest OS doesn't boot up. >> >> Actually, there are users who don't desire that guest OS goes down due to a hardware failure >> of a log disk only.Therefore, qemu should offer some way to boot guest OS up even if the log >> disk is broken. >> >> [Solution] >> This patch supports startupPolicy for chardev. >> >> The starupPolicy is introduced just in case where chardev is "file" >> because this patch aims for making guest OS boot up when a hardware failure happens. >> >> In other cases ,pty, dev, pipe and unix, it is not introduced >> because they don't access to hardware. >> >> The policy works as follows. >> - If the value is "optional", guestOS boots up by dropping the chardev. >> - If other values are specified, guestOS fails to boot up. (the default) >> >> Description about original startupPolicy attribute: >> http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=e5a84d74a2789a917bf394f15de9989ec48fded0 >> >> Signed-off-by: Seiji Aguchi <sei...@hd...> >> --- >> docs/formatdomain.html.in | 9 ++++++++- >> docs/schemas/domaincommon.rng | 3 +++ >> src/conf/domain_conf.c | 8 ++++++++ >> src/conf/domain_conf.h | 1 + >> src/qemu/qemu_process.c | 25 ++++++++++++++++++++++++- >> tests/virt-aa-helper-test | 3 +++ >> 6 files changed, 47 insertions(+), 2 deletions(-) >> >> diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in >> index f325c3c..1e1bf27 100644 >> --- a/docs/formatdomain.html.in >> +++ b/docs/formatdomain.html.in >> @@ -4044,13 +4044,20 @@ qemu-kvm -net nic,model=? /dev/null >> <p> >> A file is opened and all data sent to the character >> device is written to the file. >> + It is possible to define policy whether guestOS boots up >> + if the file is not accessible. This is done by a startupPolicy >> + attribute: >> + <ul> >> + <li>If the vaule is "optional", guestOS boots up by dropping the file.</li> >> + <li>If other values are specified, guestOS fails to boot up. (the default)</li> >> + </ul> >> </p> >> >> <pre> >> ... >> <devices> >> <serial type="file"> >> - <source path="/var/log/vm/vm-serial.log"/> >> + <source path="/var/log/vm/vm-serial.log" startupPolicy="optional"/> >> <target port="1"/> >> </serial> >> </devices> >> diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng >> index 10596dc..6fc0a3c 100644 >> --- a/docs/schemas/domaincommon.rng >> +++ b/docs/schemas/domaincommon.rng >> @@ -2706,6 +2706,9 @@ >> </optional> >> <optional> >> <attribute name="path"/> >> + <optional> >> + <ref name='startupPolicy'/> >> + </optional> >> </optional> >> <optional> >> <attribute name="host"/> >> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c >> index a8b5dfd..6680f15 100644 >> --- a/src/conf/domain_conf.c >> +++ b/src/conf/domain_conf.c >> @@ -6467,6 +6467,7 @@ virDomainChrSourceDefParseXML(virDomainChrSourceDefPtr def, >> char *path = NULL; >> char *mode = NULL; >> char *protocol = NULL; >> + char *startupPolicy = NULL; >> int remaining = 0; >> >> while (cur != NULL) { >> @@ -6487,6 +6488,9 @@ virDomainChrSourceDefParseXML(virDomainChrSourceDefPtr def, >> !(flags & VIR_DOMAIN_XML_INACTIVE))) >> path = virXMLPropString(cur, "path"); >> >> + if (startupPolicy == NULL && >> + def->type == VIR_DOMAIN_CHR_TYPE_FILE) >> + startupPolicy = virXMLPropString(cur, "startupPolicy"); >> break; >> >> case VIR_DOMAIN_CHR_TYPE_UDP: >> @@ -6559,6 +6563,10 @@ virDomainChrSourceDefParseXML(virDomainChrSourceDefPtr def, >> >> def->data.file.path = path; >> path = NULL; >> + >> + def->data.file.startupPolicy = >> + virDomainStartupPolicyTypeFromString(startupPolicy); >> + startupPolicy = NULL; >> break; >> >> case VIR_DOMAIN_CHR_TYPE_STDIO: >> diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h >> index 3a0f23a..e709951 100644 >> --- a/src/conf/domain_conf.h >> +++ b/src/conf/domain_conf.h >> @@ -1052,6 +1052,7 @@ struct _virDomainChrSourceDef { >> /* no <source> for null, vc, stdio */ >> struct { >> char *path; >> + int startupPolicy; /* enum virDomainStartupPolicy */ >> } file; /* pty, file, pipe, or device */ >> struct { >> char *host; >> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c >> index e75c8c9..6e2f78e 100644 >> --- a/src/qemu/qemu_process.c >> +++ b/src/qemu/qemu_process.c >> @@ -2442,7 +2442,30 @@ qemuProcessPrepareChardevDevice(virDomainDefPtr def ATTRIBUTE_UNUSED, >> virReportSystemError(errno, >> _("Unable to pre-create chardev file '%s'"), >> dev->source.data.file.path); >> - return -1; >> + if (dev->source.data.file.startupPolicy != >> + VIR_DOMAIN_STARTUP_POLICY_OPTIONAL) { >> + return -1; >> + } >> + VIR_FREE(dev->source.data.file.path); >> + /* >> + * Change a destination to /dev/null to boot guest OS up >> + * even if a log disk is broken. >> + */ >> + VIR_WARN("Switch the destination to /dev/null"); >> + dev->source.data.file.path = strdup("/dev/null"); >> + >> + if (!(dev->source.data.file.path)) { >> + virReportOOMError(); >> + return -1; >> + } >> + >> + if ((fd = open(dev->source.data.file.path, >> + O_CREAT | O_APPEND, S_IRUSR|S_IWUSR)) < 0) { >> + virReportSystemError(errno, >> + _("Unable to pre-create chardev file '%s'"), >> + dev->source.data.file.path); >> + return -1; >> + } >> } >> >> VIR_FORCE_CLOSE(fd); >> diff --git a/tests/virt-aa-helper-test b/tests/virt-aa-helper-test >> index af91c61..7172fd6 100755 >> --- a/tests/virt-aa-helper-test >> +++ b/tests/virt-aa-helper-test >> @@ -255,6 +255,9 @@ testme "0" "disk (empty cdrom)" "-r -u $valid_uuid" "$test_xml" >> sed -e "s,###UUID###,$uuid,g" -e "s,###DISK###,$disk1,g" -e "s,</devices>,<serial type='file'><source >> path='$tmpdir/serial.log'/><target port='0'/></serial></devices>,g" "$template_xml" > "$test_xml" >> testme "0" "serial" "-r -u $valid_uuid" "$test_xml" >> >> +sed -e "s,###UUID###,$uuid,g" -e "s,###DISK###,$disk1,g" -e "s,</devices>,<serial type='file'><source path='$tmpdir/serial.log' >> startupPolicy='optional'/><target port='0'/></serial></devices>,g" "$template_xml" > "$test_xml" >> +testme "0" "serial" "-r -u $valid_uuid" "$test_xml" >> + >> sed -e "s,###UUID###,$uuid,g" -e "s,###DISK###,$disk1,g" -e "s,</devices>,<serial type='pty'><target >> port='0'/></serial></devices>,g" "$template_xml" > "$test_xml" >> testme "0" "serial (pty)" "-r -u $valid_uuid" "$test_xml" >> >> -- >> 1.7.1 > > > -- > libvir-list mailing list > lib...@re... > https://www.redhat.com/mailman/listinfo/libvir-list > > -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org |
|
From: Matt F. <mat...@in...> - 2013-05-13 19:51:43
|
On 05/10/2013 09:45 PM, Seiji Aguchi wrote: > The loop in efivar_update_sysfs_entries() reuses the same allocation for > entries each time it calls efivar_create_sysfs_entry(entry). This is > wrong because efivar_create_sysfs_entry() expects to keep the memory it > was passed, so the caller may not free it (and may not pass the same > memory in multiple times). This leads to the oops below. Fix by > getting a new allocation each time we go around the loop. > > ---[ end trace ba4907d5c519d111 ]--- > BUG: unable to handle kernel NULL pointer dereference at (null) > IP: [<ffffffff8142f81f>] efivar_entry_find+0x14f/0x2d0 > PGD 0 > Oops: 0000 [#2] SMP > Modules linked in: oops(OF+) ebtable_nat ebtables xt_CHECKSUM iptable_mangle bridge autofs4 sunrpc 8021q garp stp llc cpufreq_ondemand ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 vfat fat vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel arc4 ghash_clmulni_intel aesni_intel ablk_helper iwldvm cryptd lrw gf128mul glue_helper aes_x86_64 microcode mac80211 sg thinkpad_acpi pcspkr i2c_i801 lpc_ich mfd_core iwlwifi cfg80211 rfkill snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc e1000e ptp pps_core wmi ext4(F) jbd2(F) mbcache(F) sd_mod(F) crc_t10dif(F) ahci(F) libahci(F) sdhci_pci(F) sdhci(F) mmc_core(F) i915(F) drm_kms_helper(F) drm(F) i2c_algo_bit(F) i2c_core(F) video(F) dm_mirror(F) dm > > _region_hash(F) dm_log(F) dm_mod(F) > CPU: 0 PID: 301 Comm: kworker/0:2 Tainted: GF D O 3.9.0+ #1 > Hardware name: LENOVO 4291EV7/4291EV7, BIOS 8DET52WW (1.22 ) 09/15/2011 > Workqueue: events efivar_update_sysfs_entries > task: ffff8801955920c0 ti: ffff88019413e000 task.ti: ffff88019413e000 > RIP: 0010:[<ffffffff8142f81f>] [<ffffffff8142f81f>] efivar_entry_find+0x14f/0x2d0 > RSP: 0018:ffff88019413fa48 EFLAGS: 00010006 > RAX: 0000000000000000 RBX: ffff880195d87c00 RCX: ffffffff81ab6f60 > RDX: ffff88019413fb88 RSI: 0000000000000400 RDI: ffff880196254000 > RBP: ffff88019413fbd8 R08: 0000000000000000 R09: ffff8800dad99037 > R10: ffff880195d87c00 R11: 0000000000000430 R12: ffffffff81ab6f60 > R13: fffffffffffff7d8 R14: ffff880196254000 R15: 0000000000000000 > FS: 0000000000000000(0000) GS:ffff88019e200000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000000000000 CR3: 0000000001a0b000 CR4: 00000000000407f0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Stack: > ffff88019413fb78 ffff88019413fb88 ffffffff81e85d60 03000000972b5c00 > ffff88019413fa29 ffffffff81e85d60 ffff88019413fbfb 0000000197087280 > 00000000000000fe 0000000000000001 ffffffff81e85dd9 ffff880197087280 > Call Trace: > [<ffffffff81254371>] ? idr_get_empty_slot+0x131/0x240 > [<ffffffff8125b6d2>] ? put_dec+0x72/0x90 > [<ffffffff81158e40>] ? cache_alloc_refill+0x170/0x2f0 > [<ffffffff81430420>] efivar_update_sysfs_entry+0x150/0x220 > [<ffffffff8103dd29>] ? efi_call2+0x9/0x70 > [<ffffffff8103d787>] ? virt_efi_get_next_variable+0x47/0x1b0 > [<ffffffff8115a8df>] ? kmem_cache_alloc_trace+0x1af/0x1c0 > [<ffffffff81430033>] efivar_init+0x2c3/0x380 > [<ffffffff814302d0>] ? efivar_delete+0xd0/0xd0 > [<ffffffff8143111f>] efivar_update_sysfs_entries+0x6f/0x90 > [<ffffffff810605f3>] process_one_work+0x183/0x490 > [<ffffffff81061780>] worker_thread+0x120/0x3a0 > [<ffffffff81061660>] ? manage_workers+0x160/0x160 > [<ffffffff8106752e>] kthread+0xce/0xe0 > [<ffffffff81067460>] ? kthread_freezable_should_stop+0x70/0x70 > [<ffffffff81543c5c>] ret_from_fork+0x7c/0xb0 > [<ffffffff81067460>] ? kthread_freezable_should_stop+0x70/0x70 > Code: 8d 55 b0 48 8d 45 a0 49 81 ed 28 08 00 00 48 89 95 78 fe ff ff 48 89 85 70 fe ff ff eb 27 66 0f 1f 44 00 00 4d 8d bd 28 08 00 00 <49> 8b 85 28 08 00 00 4d 39 e7 0f 84 21 01 00 00 4d 89 ee 4c 8d > RIP [<ffffffff8142f81f>] efivar_entry_find+0x14f/0x2d0 > RSP <ffff88019413fa48> > CR2: 0000000000000000 > ---[ end trace ba4907d5c519d112 ]--- > > Signed-off-by: Seiji Aguchi <sei...@hd...> > --- > drivers/firmware/efi/efivars.c | 8 +++----- > 1 files changed, 3 insertions(+), 5 deletions(-) Applied, thanks. |