You can subscribe to this list here.
| 2009 |
Jan
|
Feb
|
Mar
|
Apr
|
May
(32) |
Jun
(66) |
Jul
(102) |
Aug
(78) |
Sep
(106) |
Oct
(137) |
Nov
(147) |
Dec
(147) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2010 |
Jan
(71) |
Feb
(139) |
Mar
(86) |
Apr
(76) |
May
(57) |
Jun
(10) |
Jul
(12) |
Aug
(6) |
Sep
(8) |
Oct
(12) |
Nov
(12) |
Dec
(18) |
| 2011 |
Jan
(16) |
Feb
(19) |
Mar
(3) |
Apr
(1) |
May
(16) |
Jun
(17) |
Jul
(74) |
Aug
(22) |
Sep
(18) |
Oct
(24) |
Nov
(21) |
Dec
(30) |
| 2012 |
Jan
(31) |
Feb
(16) |
Mar
(22) |
Apr
(25) |
May
(18) |
Jun
(13) |
Jul
(83) |
Aug
(49) |
Sep
(20) |
Oct
(60) |
Nov
(35) |
Dec
(28) |
| 2013 |
Jan
(39) |
Feb
(61) |
Mar
(35) |
Apr
(21) |
May
(45) |
Jun
(56) |
Jul
(20) |
Aug
(9) |
Sep
(10) |
Oct
(31) |
Nov
(8) |
Dec
(4) |
| 2014 |
Jan
(6) |
Feb
(7) |
Mar
(7) |
Apr
(6) |
May
(4) |
Jun
(8) |
Jul
(5) |
Aug
(2) |
Sep
(4) |
Oct
(4) |
Nov
(11) |
Dec
(5) |
| 2015 |
Jan
(4) |
Feb
(4) |
Mar
(3) |
Apr
(4) |
May
(9) |
Jun
(4) |
Jul
(15) |
Aug
(8) |
Sep
(16) |
Oct
(18) |
Nov
(15) |
Dec
(7) |
| 2016 |
Jan
(20) |
Feb
(9) |
Mar
(15) |
Apr
(24) |
May
(16) |
Jun
(28) |
Jul
(22) |
Aug
(23) |
Sep
(18) |
Oct
(30) |
Nov
(40) |
Dec
(9) |
| 2017 |
Jan
(1) |
Feb
(8) |
Mar
(37) |
Apr
(26) |
May
(25) |
Jun
(46) |
Jul
(24) |
Aug
(9) |
Sep
|
Oct
|
Nov
|
Dec
|
|
From: Seiji A. <sei...@hd...> - 2013-04-05 19:26:06
|
Steven,
I confirmed that smp_apic_timer_interrupt is traced in a function graph tracing below.
If additional testing is needed, please let me know.
trace-cmd start -p function_graph -g "smp_trace_apic_timer_interrupt" -g "smp_apic_timer_interrupt" -e irq_vectors
plugin function_graph
Seiji
> -----Original Message-----
> From: Seiji Aguchi
> Sent: Friday, April 05, 2013 3:22 PM
> To: lin...@vg...; x8...@ke...; ro...@go...; hp...@zy...
> Cc: Thomas Gleixner (tg...@li...); 'mi...@el...' (mi...@el...); Borislav Petkov (bp...@al...); linux-
> ed...@vg...; Luck, Tony (ton...@in...); dle...@li...; Tomoki Sekiyama
> Subject: [PATCH v12 3/3] trace,x86: code-sharing between non-trace and trace irq handlers
>
> [Issue]
>
> Currently, irq vector handlers for tracing are just copied non-trace handlers by simply inserting tracepoints.
>
> It is difficult to manage the codes.
>
> [Solution]
>
> This patch shares common codes between non-trace and trace handlers as follows to make them manageable and readable.
>
> Non-trace irq handler:
> smp_irq_handler()
> {
> entering_irq(); /* pre-processing of this handler */
> __smp_irq_handler(); /*
> * common logic between non-trace and trace handlers
> * in a vector.
> */
> exiting_irq(); /* post-processing of this handler */
>
> }
>
> Trace irq_handler:
> smp_trace_irq_handler()
> {
> entering_irq(); /* pre-processing of this handler */
> trace_irq_entry(); /* tracepoint for irq entry */
> __smp_irq_handler(); /*
> * common logic between non-trace and trace handlers
> * in a vector.
> */
> trace_irq_exit(); /* tracepoint for irq exit */
> exiting_irq(); /* post-processing of this handler */
>
> }
>
> If tracepoints can place outside entering_irq()/exiting_irq() as follows, it looks \ cleaner.
>
> smp_trace_irq_handler()
> {
> trace_irq_entry();
> smp_irq_handler();
> trace_irq_exit();
> }
>
> But it doesn't work.
> The problem is with irq_enter/exit() being called. They must be called before \ trace_irq_enter/exit(), because of the rcu_irq_enter()
> must be called before any \ tracepoints are used, as tracepoints use rcu to synchronize.
>
> As a possible alternative, we may be able to call irq_enter() first as follows if \
> irq_enter() can nest.
>
> smp_trace_irq_hander()
> {
> irq_entry();
> trace_irq_entry();
> smp_irq_handler();
> trace_irq_exit();
> irq_exit();
> }
>
> But it doesn't work, either.
> If irq_enter() is nested, it may have a time penalty because it has to check if it \ was already called or not. The time penalty is not
> desired in performance sensitive \ paths even if it is tiny.
>
> Signed-off-by: Seiji Aguchi <sei...@hd...>
> ---
> arch/x86/include/asm/apic.h | 27 ++++++++
> arch/x86/kernel/apic/apic.c | 103 ++++++++----------------------
> arch/x86/kernel/cpu/mcheck/therm_throt.c | 24 +++----
> arch/x86/kernel/cpu/mcheck/threshold.c | 24 +++----
> arch/x86/kernel/irq.c | 34 +++-------
> arch/x86/kernel/irq_work.c | 22 ++++--
> arch/x86/kernel/smp.c | 54 ++++++++++------
> 7 files changed, 137 insertions(+), 151 deletions(-)
>
> diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h index 3388034..f8119b5 100644
> --- a/arch/x86/include/asm/apic.h
> +++ b/arch/x86/include/asm/apic.h
> @@ -12,6 +12,7 @@
> #include <asm/fixmap.h>
> #include <asm/mpspec.h>
> #include <asm/msr.h>
> +#include <asm/idle.h>
>
> #define ARCH_APICTIMER_STOPS_ON_C3 1
>
> @@ -687,5 +688,31 @@ extern int default_check_phys_apicid_present(int phys_apicid); #endif
>
> #endif /* CONFIG_X86_LOCAL_APIC */
> +extern void irq_enter(void);
> +extern void irq_exit(void);
> +
> +static inline void entering_irq(void)
> +{
> + irq_enter();
> + exit_idle();
> +}
> +
> +static inline void entering_ack_irq(void) {
> + ack_APIC_irq();
> + entering_irq();
> +}
> +
> +static inline void exiting_irq(void)
> +{
> + irq_exit();
> +}
> +
> +static inline void exiting_ack_irq(void) {
> + irq_exit();
> + /* Ack only at the end to avoid potential reentry */
> + ack_APIC_irq();
> +}
>
> #endif /* _ASM_X86_APIC_H */
> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index 7d39c68..61ced40 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -922,17 +922,14 @@ void __irq_entry smp_apic_timer_interrupt(struct pt_regs *regs)
> /*
> * NOTE! We'd better ACK the irq immediately,
> * because timer handling can be slow.
> - */
> - ack_APIC_irq();
> - /*
> + *
> * update_process_times() expects us to have done irq_enter().
> * Besides, if we don't timer interrupts ignore the global
> * interrupt lock, which is the WrongThing (tm) to do.
> */
> - irq_enter();
> - exit_idle();
> + entering_ack_irq();
> local_apic_timer_interrupt();
> - irq_exit();
> + exiting_irq();
>
> set_irq_regs(old_regs);
> }
> @@ -944,19 +941,16 @@ void __irq_entry smp_trace_apic_timer_interrupt(struct pt_regs *regs)
> /*
> * NOTE! We'd better ACK the irq immediately,
> * because timer handling can be slow.
> - */
> - ack_APIC_irq();
> - /*
> + *
> * update_process_times() expects us to have done irq_enter().
> * Besides, if we don't timer interrupts ignore the global
> * interrupt lock, which is the WrongThing (tm) to do.
> */
> - irq_enter();
> - exit_idle();
> + entering_ack_irq();
> trace_local_timer_entry(LOCAL_TIMER_VECTOR);
> local_apic_timer_interrupt();
> trace_local_timer_exit(LOCAL_TIMER_VECTOR);
> - irq_exit();
> + exiting_irq();
>
> set_irq_regs(old_regs);
> }
> @@ -1934,12 +1928,10 @@ int __init APIC_init_uniprocessor(void)
> /*
> * This interrupt should _never_ happen with our APIC/SMP architecture
> */
> -void smp_spurious_interrupt(struct pt_regs *regs)
> +static inline void __smp_spurious_interrupt(void)
> {
> u32 v;
>
> - irq_enter();
> - exit_idle();
> /*
> * Check if this really is a spurious interrupt and ACK it
> * if it is a vectored one. Just in case...
> @@ -1954,38 +1946,28 @@ void smp_spurious_interrupt(struct pt_regs *regs)
> /* see sw-dev-man vol 3, chapter 7.4.13.5 */
> pr_info("spurious APIC interrupt on CPU#%d, "
> "should never happen.\n", smp_processor_id());
> - irq_exit();
> }
>
> -void smp_trace_spurious_interrupt(struct pt_regs *regs)
> +void smp_spurious_interrupt(struct pt_regs *regs)
> {
> - u32 v;
> + entering_irq();
> + __smp_spurious_interrupt();
> + exiting_irq();
> +}
>
> - irq_enter();
> - exit_idle();
> +void smp_trace_spurious_interrupt(struct pt_regs *regs) {
> + entering_irq();
> trace_spurious_apic_entry(SPURIOUS_APIC_VECTOR);
> - /*
> - * Check if this really is a spurious interrupt and ACK it
> - * if it is a vectored one. Just in case...
> - * Spurious interrupts should not be ACKed.
> - */
> - v = apic_read(APIC_ISR + ((SPURIOUS_APIC_VECTOR & ~0x1f) >> 1));
> - if (v & (1 << (SPURIOUS_APIC_VECTOR & 0x1f)))
> - ack_APIC_irq();
> -
> - inc_irq_stat(irq_spurious_count);
> -
> - /* see sw-dev-man vol 3, chapter 7.4.13.5 */
> - pr_info("spurious APIC interrupt on CPU#%d, "
> - "should never happen.\n", smp_processor_id());
> + __smp_spurious_interrupt();
> trace_spurious_apic_exit(SPURIOUS_APIC_VECTOR);
> - irq_exit();
> + exiting_irq();
> }
>
> /*
> * This interrupt should never happen with our APIC/SMP architecture
> */
> -void smp_error_interrupt(struct pt_regs *regs)
> +static inline void __smp_error_interrupt(struct pt_regs *regs)
> {
> u32 v0, v1;
> u32 i = 0;
> @@ -2000,8 +1982,6 @@ void smp_error_interrupt(struct pt_regs *regs)
> "Illegal register address", /* APIC Error Bit 7 */
> };
>
> - irq_enter();
> - exit_idle();
> /* First tickle the hardware, only then report what went on. -- REW */
> v0 = apic_read(APIC_ESR);
> apic_write(APIC_ESR, 0);
> @@ -2022,49 +2002,22 @@ void smp_error_interrupt(struct pt_regs *regs)
>
> apic_printk(APIC_DEBUG, KERN_CONT "\n");
>
> - irq_exit();
> }
>
> -void smp_trace_error_interrupt(struct pt_regs *regs)
> +void smp_error_interrupt(struct pt_regs *regs)
> {
> - u32 v0, v1;
> - u32 i = 0;
> - static const char * const error_interrupt_reason[] = {
> - "Send CS error", /* APIC Error Bit 0 */
> - "Receive CS error", /* APIC Error Bit 1 */
> - "Send accept error", /* APIC Error Bit 2 */
> - "Receive accept error", /* APIC Error Bit 3 */
> - "Redirectable IPI", /* APIC Error Bit 4 */
> - "Send illegal vector", /* APIC Error Bit 5 */
> - "Received illegal vector", /* APIC Error Bit 6 */
> - "Illegal register address", /* APIC Error Bit 7 */
> - };
> + entering_irq();
> + __smp_error_interrupt(regs);
> + exiting_irq();
> +}
>
> - irq_enter();
> - exit_idle();
> +void smp_trace_error_interrupt(struct pt_regs *regs) {
> + entering_irq();
> trace_error_apic_entry(ERROR_APIC_VECTOR);
> - /* First tickle the hardware, only then report what went on. -- REW */
> - v0 = apic_read(APIC_ESR);
> - apic_write(APIC_ESR, 0);
> - v1 = apic_read(APIC_ESR);
> - ack_APIC_irq();
> - atomic_inc(&irq_err_count);
> -
> - apic_printk(APIC_DEBUG, KERN_DEBUG "APIC error on CPU%d: %02x(%02x)",
> - smp_processor_id(), v0 , v1);
> -
> - v1 = v1 & 0xff;
> - while (v1) {
> - if (v1 & 0x1)
> - apic_printk(APIC_DEBUG, KERN_CONT " : %s", error_interrupt_reason[i]);
> - i++;
> - v1 >>= 1;
> - }
> -
> - apic_printk(APIC_DEBUG, KERN_CONT "\n");
> -
> + __smp_error_interrupt(regs);
> trace_error_apic_exit(ERROR_APIC_VECTOR);
> - irq_exit();
> + exiting_irq();
> }
>
> /**
> diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
> index e7aa7fc..2f3a799 100644
> --- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
> +++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
> @@ -379,28 +379,26 @@ static void unexpected_thermal_interrupt(void)
>
> static void (*smp_thermal_vector)(void) = unexpected_thermal_interrupt;
>
> -asmlinkage void smp_thermal_interrupt(struct pt_regs *regs)
> +static inline void __smp_thermal_interrupt(void)
> {
> - irq_enter();
> - exit_idle();
> inc_irq_stat(irq_thermal_count);
> smp_thermal_vector();
> - irq_exit();
> - /* Ack only at the end to avoid potential reentry */
> - ack_APIC_irq();
> +}
> +
> +asmlinkage void smp_thermal_interrupt(struct pt_regs *regs) {
> + entering_irq();
> + __smp_thermal_interrupt();
> + exiting_ack_irq();
> }
>
> asmlinkage void smp_trace_thermal_interrupt(struct pt_regs *regs) {
> - irq_enter();
> - exit_idle();
> + entering_irq();
> trace_thermal_apic_entry(THERMAL_APIC_VECTOR);
> - inc_irq_stat(irq_thermal_count);
> - smp_thermal_vector();
> + __smp_thermal_interrupt();
> trace_thermal_apic_exit(THERMAL_APIC_VECTOR);
> - irq_exit();
> - /* Ack only at the end to avoid potential reentry */
> - ack_APIC_irq();
> + exiting_ack_irq();
> }
>
> /* Thermal monitoring depends on APIC, ACPI and clock modulation */ diff --git a/arch/x86/kernel/cpu/mcheck/threshold.c
> b/arch/x86/kernel/cpu/mcheck/threshold.c
> index 0cbef99..fe6b1c8 100644
> --- a/arch/x86/kernel/cpu/mcheck/threshold.c
> +++ b/arch/x86/kernel/cpu/mcheck/threshold.c
> @@ -18,26 +18,24 @@ static void default_threshold_interrupt(void)
>
> void (*mce_threshold_vector)(void) = default_threshold_interrupt;
>
> -asmlinkage void smp_threshold_interrupt(void)
> +static inline void __smp_threshold_interrupt(void)
> {
> - irq_enter();
> - exit_idle();
> inc_irq_stat(irq_threshold_count);
> mce_threshold_vector();
> - irq_exit();
> - /* Ack only at the end to avoid potential reentry */
> - ack_APIC_irq();
> +}
> +
> +asmlinkage void smp_threshold_interrupt(void) {
> + entering_irq();
> + __smp_threshold_interrupt();
> + exiting_ack_irq();
> }
>
> asmlinkage void smp_trace_threshold_interrupt(void)
> {
> - irq_enter();
> - exit_idle();
> + entering_irq();
> trace_threshold_apic_entry(THRESHOLD_APIC_VECTOR);
> - inc_irq_stat(irq_threshold_count);
> - mce_threshold_vector();
> + __smp_threshold_interrupt();
> trace_threshold_apic_exit(THRESHOLD_APIC_VECTOR);
> - irq_exit();
> - /* Ack only at the end to avoid potential reentry */
> - ack_APIC_irq();
> + exiting_ack_irq();
> }
> diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c index 216bec1..ae836cd 100644
> --- a/arch/x86/kernel/irq.c
> +++ b/arch/x86/kernel/irq.c
> @@ -209,23 +209,21 @@ unsigned int __irq_entry do_IRQ(struct pt_regs *regs)
> /*
> * Handler for X86_PLATFORM_IPI_VECTOR.
> */
> -void smp_x86_platform_ipi(struct pt_regs *regs)
> +void __smp_x86_platform_ipi(void)
> {
> - struct pt_regs *old_regs = set_irq_regs(regs);
> -
> - ack_APIC_irq();
> -
> - irq_enter();
> -
> - exit_idle();
> -
> inc_irq_stat(x86_platform_ipis);
>
> if (x86_platform_ipi_callback)
> x86_platform_ipi_callback();
> +}
>
> - irq_exit();
> +void smp_x86_platform_ipi(struct pt_regs *regs) {
> + struct pt_regs *old_regs = set_irq_regs(regs);
>
> + entering_ack_irq();
> + __smp_x86_platform_ipi();
> + exiting_irq();
> set_irq_regs(old_regs);
> }
>
> @@ -233,21 +231,11 @@ void smp_trace_x86_platform_ipi(struct pt_regs *regs) {
> struct pt_regs *old_regs = set_irq_regs(regs);
>
> - ack_APIC_irq();
> -
> - irq_enter();
> -
> - exit_idle();
> -
> + entering_ack_irq();
> trace_x86_platform_ipi_entry(X86_PLATFORM_IPI_VECTOR);
> - inc_irq_stat(x86_platform_ipis);
> -
> - if (x86_platform_ipi_callback)
> - x86_platform_ipi_callback();
> -
> + __smp_x86_platform_ipi();
> trace_x86_platform_ipi_exit(X86_PLATFORM_IPI_VECTOR);
> - irq_exit();
> -
> + exiting_irq();
> set_irq_regs(old_regs);
> }
>
> diff --git a/arch/x86/kernel/irq_work.c b/arch/x86/kernel/irq_work.c index 09e6262..636a55e 100644
> --- a/arch/x86/kernel/irq_work.c
> +++ b/arch/x86/kernel/irq_work.c
> @@ -10,24 +10,32 @@
> #include <asm/apic.h>
> #include <asm/trace/irq_vectors.h>
>
> -void smp_irq_work_interrupt(struct pt_regs *regs)
> +static inline void irq_work_entering_irq(void)
> {
> irq_enter();
> ack_APIC_irq();
> +}
> +
> +static inline void __smp_irq_work_interrupt(void) {
> inc_irq_stat(apic_irq_work_irqs);
> irq_work_run();
> - irq_exit();
> +}
> +
> +void smp_irq_work_interrupt(struct pt_regs *regs) {
> + irq_work_entering_irq();
> + __smp_irq_work_interrupt();
> + exiting_irq();
> }
>
> void smp_trace_irq_work_interrupt(struct pt_regs *regs) {
> - irq_enter();
> - ack_APIC_irq();
> + irq_work_entering_irq();
> trace_irq_work_entry(IRQ_WORK_VECTOR);
> - inc_irq_stat(apic_irq_work_irqs);
> - irq_work_run();
> + __smp_irq_work_interrupt();
> trace_irq_work_exit(IRQ_WORK_VECTOR);
> - irq_exit();
> + exiting_irq();
> }
>
> void arch_irq_work_raise(void)
> diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c index aad58af..f4fe0b8 100644
> --- a/arch/x86/kernel/smp.c
> +++ b/arch/x86/kernel/smp.c
> @@ -250,11 +250,16 @@ finish:
> /*
> * Reschedule call back.
> */
> -void smp_reschedule_interrupt(struct pt_regs *regs)
> +static inline void __smp_reschedule_interrupt(void)
> {
> - ack_APIC_irq();
> inc_irq_stat(irq_resched_count);
> scheduler_ipi();
> +}
> +
> +void smp_reschedule_interrupt(struct pt_regs *regs) {
> + ack_APIC_irq();
> + __smp_reschedule_interrupt();
> /*
> * KVM uses this interrupt to force a cpu out of guest mode
> */
> @@ -264,52 +269,61 @@ void smp_trace_reschedule_interrupt(struct pt_regs *regs) {
> ack_APIC_irq();
> trace_reschedule_entry(RESCHEDULE_VECTOR);
> - inc_irq_stat(irq_resched_count);
> - scheduler_ipi();
> + __smp_reschedule_interrupt();
> trace_reschedule_exit(RESCHEDULE_VECTOR);
> /*
> * KVM uses this interrupt to force a cpu out of guest mode
> */
> }
>
> -void smp_call_function_interrupt(struct pt_regs *regs)
> +static inline void call_function_entering_irq(void)
> {
> ack_APIC_irq();
> irq_enter();
> +}
> +
> +static inline void __smp_call_function_interrupt(void)
> +{
> generic_smp_call_function_interrupt();
> inc_irq_stat(irq_call_count);
> - irq_exit();
> +}
> +
> +void smp_call_function_interrupt(struct pt_regs *regs) {
> + call_function_entering_irq();
> + __smp_call_function_interrupt();
> + exiting_irq();
> }
>
> void smp_trace_call_function_interrupt(struct pt_regs *regs) {
> - ack_APIC_irq();
> - irq_enter();
> + call_function_entering_irq();
> trace_call_function_entry(CALL_FUNCTION_VECTOR);
> - generic_smp_call_function_interrupt();
> - inc_irq_stat(irq_call_count);
> + __smp_call_function_interrupt();
> trace_call_function_exit(CALL_FUNCTION_VECTOR);
> - irq_exit();
> + exiting_irq();
> }
>
> -void smp_call_function_single_interrupt(struct pt_regs *regs)
> +static inline void __smp_call_function_single_interrupt(void)
> {
> - ack_APIC_irq();
> - irq_enter();
> generic_smp_call_function_single_interrupt();
> inc_irq_stat(irq_call_count);
> - irq_exit();
> +}
> +
> +void smp_call_function_single_interrupt(struct pt_regs *regs) {
> + call_function_entering_irq();
> + __smp_call_function_single_interrupt();
> + exiting_irq();
> }
>
> void smp_trace_call_function_single_interrupt(struct pt_regs *regs) {
> - ack_APIC_irq();
> - irq_enter();
> + call_function_entering_irq();
> trace_call_function_single_entry(CALL_FUNCTION_SINGLE_VECTOR);
> - generic_smp_call_function_single_interrupt();
> - inc_irq_stat(irq_call_count);
> + __smp_call_function_single_interrupt();
> trace_call_function_single_exit(CALL_FUNCTION_SINGLE_VECTOR);
> - irq_exit();
> + exiting_irq();
> }
>
> static int __init nonmi_ipi_setup(char *str)
> --
> 1.7.1
|
|
From: Seiji A. <sei...@hd...> - 2013-04-05 19:22:10
|
[Issue]
Currently, irq vector handlers for tracing are just
copied non-trace handlers by simply inserting tracepoints.
It is difficult to manage the codes.
[Solution]
This patch shares common codes between non-trace and trace handlers
as follows to make them manageable and readable.
Non-trace irq handler:
smp_irq_handler()
{
entering_irq(); /* pre-processing of this handler */
__smp_irq_handler(); /*
* common logic between non-trace and trace handlers
* in a vector.
*/
exiting_irq(); /* post-processing of this handler */
}
Trace irq_handler:
smp_trace_irq_handler()
{
entering_irq(); /* pre-processing of this handler */
trace_irq_entry(); /* tracepoint for irq entry */
__smp_irq_handler(); /*
* common logic between non-trace and trace handlers
* in a vector.
*/
trace_irq_exit(); /* tracepoint for irq exit */
exiting_irq(); /* post-processing of this handler */
}
If tracepoints can place outside entering_irq()/exiting_irq() as follows, it looks \
cleaner.
smp_trace_irq_handler()
{
trace_irq_entry();
smp_irq_handler();
trace_irq_exit();
}
But it doesn't work.
The problem is with irq_enter/exit() being called. They must be called before \
trace_irq_enter/exit(), because of the rcu_irq_enter() must be called before any \
tracepoints are used, as tracepoints use rcu to synchronize.
As a possible alternative, we may be able to call irq_enter() first as follows if \
irq_enter() can nest.
smp_trace_irq_hander()
{
irq_entry();
trace_irq_entry();
smp_irq_handler();
trace_irq_exit();
irq_exit();
}
But it doesn't work, either.
If irq_enter() is nested, it may have a time penalty because it has to check if it \
was already called or not. The time penalty is not desired in performance sensitive \
paths even if it is tiny.
Signed-off-by: Seiji Aguchi <sei...@hd...>
---
arch/x86/include/asm/apic.h | 27 ++++++++
arch/x86/kernel/apic/apic.c | 103 ++++++++----------------------
arch/x86/kernel/cpu/mcheck/therm_throt.c | 24 +++----
arch/x86/kernel/cpu/mcheck/threshold.c | 24 +++----
arch/x86/kernel/irq.c | 34 +++-------
arch/x86/kernel/irq_work.c | 22 ++++--
arch/x86/kernel/smp.c | 54 ++++++++++------
7 files changed, 137 insertions(+), 151 deletions(-)
diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 3388034..f8119b5 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -12,6 +12,7 @@
#include <asm/fixmap.h>
#include <asm/mpspec.h>
#include <asm/msr.h>
+#include <asm/idle.h>
#define ARCH_APICTIMER_STOPS_ON_C3 1
@@ -687,5 +688,31 @@ extern int default_check_phys_apicid_present(int phys_apicid);
#endif
#endif /* CONFIG_X86_LOCAL_APIC */
+extern void irq_enter(void);
+extern void irq_exit(void);
+
+static inline void entering_irq(void)
+{
+ irq_enter();
+ exit_idle();
+}
+
+static inline void entering_ack_irq(void)
+{
+ ack_APIC_irq();
+ entering_irq();
+}
+
+static inline void exiting_irq(void)
+{
+ irq_exit();
+}
+
+static inline void exiting_ack_irq(void)
+{
+ irq_exit();
+ /* Ack only at the end to avoid potential reentry */
+ ack_APIC_irq();
+}
#endif /* _ASM_X86_APIC_H */
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 7d39c68..61ced40 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -922,17 +922,14 @@ void __irq_entry smp_apic_timer_interrupt(struct pt_regs *regs)
/*
* NOTE! We'd better ACK the irq immediately,
* because timer handling can be slow.
- */
- ack_APIC_irq();
- /*
+ *
* update_process_times() expects us to have done irq_enter().
* Besides, if we don't timer interrupts ignore the global
* interrupt lock, which is the WrongThing (tm) to do.
*/
- irq_enter();
- exit_idle();
+ entering_ack_irq();
local_apic_timer_interrupt();
- irq_exit();
+ exiting_irq();
set_irq_regs(old_regs);
}
@@ -944,19 +941,16 @@ void __irq_entry smp_trace_apic_timer_interrupt(struct pt_regs *regs)
/*
* NOTE! We'd better ACK the irq immediately,
* because timer handling can be slow.
- */
- ack_APIC_irq();
- /*
+ *
* update_process_times() expects us to have done irq_enter().
* Besides, if we don't timer interrupts ignore the global
* interrupt lock, which is the WrongThing (tm) to do.
*/
- irq_enter();
- exit_idle();
+ entering_ack_irq();
trace_local_timer_entry(LOCAL_TIMER_VECTOR);
local_apic_timer_interrupt();
trace_local_timer_exit(LOCAL_TIMER_VECTOR);
- irq_exit();
+ exiting_irq();
set_irq_regs(old_regs);
}
@@ -1934,12 +1928,10 @@ int __init APIC_init_uniprocessor(void)
/*
* This interrupt should _never_ happen with our APIC/SMP architecture
*/
-void smp_spurious_interrupt(struct pt_regs *regs)
+static inline void __smp_spurious_interrupt(void)
{
u32 v;
- irq_enter();
- exit_idle();
/*
* Check if this really is a spurious interrupt and ACK it
* if it is a vectored one. Just in case...
@@ -1954,38 +1946,28 @@ void smp_spurious_interrupt(struct pt_regs *regs)
/* see sw-dev-man vol 3, chapter 7.4.13.5 */
pr_info("spurious APIC interrupt on CPU#%d, "
"should never happen.\n", smp_processor_id());
- irq_exit();
}
-void smp_trace_spurious_interrupt(struct pt_regs *regs)
+void smp_spurious_interrupt(struct pt_regs *regs)
{
- u32 v;
+ entering_irq();
+ __smp_spurious_interrupt();
+ exiting_irq();
+}
- irq_enter();
- exit_idle();
+void smp_trace_spurious_interrupt(struct pt_regs *regs)
+{
+ entering_irq();
trace_spurious_apic_entry(SPURIOUS_APIC_VECTOR);
- /*
- * Check if this really is a spurious interrupt and ACK it
- * if it is a vectored one. Just in case...
- * Spurious interrupts should not be ACKed.
- */
- v = apic_read(APIC_ISR + ((SPURIOUS_APIC_VECTOR & ~0x1f) >> 1));
- if (v & (1 << (SPURIOUS_APIC_VECTOR & 0x1f)))
- ack_APIC_irq();
-
- inc_irq_stat(irq_spurious_count);
-
- /* see sw-dev-man vol 3, chapter 7.4.13.5 */
- pr_info("spurious APIC interrupt on CPU#%d, "
- "should never happen.\n", smp_processor_id());
+ __smp_spurious_interrupt();
trace_spurious_apic_exit(SPURIOUS_APIC_VECTOR);
- irq_exit();
+ exiting_irq();
}
/*
* This interrupt should never happen with our APIC/SMP architecture
*/
-void smp_error_interrupt(struct pt_regs *regs)
+static inline void __smp_error_interrupt(struct pt_regs *regs)
{
u32 v0, v1;
u32 i = 0;
@@ -2000,8 +1982,6 @@ void smp_error_interrupt(struct pt_regs *regs)
"Illegal register address", /* APIC Error Bit 7 */
};
- irq_enter();
- exit_idle();
/* First tickle the hardware, only then report what went on. -- REW */
v0 = apic_read(APIC_ESR);
apic_write(APIC_ESR, 0);
@@ -2022,49 +2002,22 @@ void smp_error_interrupt(struct pt_regs *regs)
apic_printk(APIC_DEBUG, KERN_CONT "\n");
- irq_exit();
}
-void smp_trace_error_interrupt(struct pt_regs *regs)
+void smp_error_interrupt(struct pt_regs *regs)
{
- u32 v0, v1;
- u32 i = 0;
- static const char * const error_interrupt_reason[] = {
- "Send CS error", /* APIC Error Bit 0 */
- "Receive CS error", /* APIC Error Bit 1 */
- "Send accept error", /* APIC Error Bit 2 */
- "Receive accept error", /* APIC Error Bit 3 */
- "Redirectable IPI", /* APIC Error Bit 4 */
- "Send illegal vector", /* APIC Error Bit 5 */
- "Received illegal vector", /* APIC Error Bit 6 */
- "Illegal register address", /* APIC Error Bit 7 */
- };
+ entering_irq();
+ __smp_error_interrupt(regs);
+ exiting_irq();
+}
- irq_enter();
- exit_idle();
+void smp_trace_error_interrupt(struct pt_regs *regs)
+{
+ entering_irq();
trace_error_apic_entry(ERROR_APIC_VECTOR);
- /* First tickle the hardware, only then report what went on. -- REW */
- v0 = apic_read(APIC_ESR);
- apic_write(APIC_ESR, 0);
- v1 = apic_read(APIC_ESR);
- ack_APIC_irq();
- atomic_inc(&irq_err_count);
-
- apic_printk(APIC_DEBUG, KERN_DEBUG "APIC error on CPU%d: %02x(%02x)",
- smp_processor_id(), v0 , v1);
-
- v1 = v1 & 0xff;
- while (v1) {
- if (v1 & 0x1)
- apic_printk(APIC_DEBUG, KERN_CONT " : %s", error_interrupt_reason[i]);
- i++;
- v1 >>= 1;
- }
-
- apic_printk(APIC_DEBUG, KERN_CONT "\n");
-
+ __smp_error_interrupt(regs);
trace_error_apic_exit(ERROR_APIC_VECTOR);
- irq_exit();
+ exiting_irq();
}
/**
diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
index e7aa7fc..2f3a799 100644
--- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
+++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
@@ -379,28 +379,26 @@ static void unexpected_thermal_interrupt(void)
static void (*smp_thermal_vector)(void) = unexpected_thermal_interrupt;
-asmlinkage void smp_thermal_interrupt(struct pt_regs *regs)
+static inline void __smp_thermal_interrupt(void)
{
- irq_enter();
- exit_idle();
inc_irq_stat(irq_thermal_count);
smp_thermal_vector();
- irq_exit();
- /* Ack only at the end to avoid potential reentry */
- ack_APIC_irq();
+}
+
+asmlinkage void smp_thermal_interrupt(struct pt_regs *regs)
+{
+ entering_irq();
+ __smp_thermal_interrupt();
+ exiting_ack_irq();
}
asmlinkage void smp_trace_thermal_interrupt(struct pt_regs *regs)
{
- irq_enter();
- exit_idle();
+ entering_irq();
trace_thermal_apic_entry(THERMAL_APIC_VECTOR);
- inc_irq_stat(irq_thermal_count);
- smp_thermal_vector();
+ __smp_thermal_interrupt();
trace_thermal_apic_exit(THERMAL_APIC_VECTOR);
- irq_exit();
- /* Ack only at the end to avoid potential reentry */
- ack_APIC_irq();
+ exiting_ack_irq();
}
/* Thermal monitoring depends on APIC, ACPI and clock modulation */
diff --git a/arch/x86/kernel/cpu/mcheck/threshold.c b/arch/x86/kernel/cpu/mcheck/threshold.c
index 0cbef99..fe6b1c8 100644
--- a/arch/x86/kernel/cpu/mcheck/threshold.c
+++ b/arch/x86/kernel/cpu/mcheck/threshold.c
@@ -18,26 +18,24 @@ static void default_threshold_interrupt(void)
void (*mce_threshold_vector)(void) = default_threshold_interrupt;
-asmlinkage void smp_threshold_interrupt(void)
+static inline void __smp_threshold_interrupt(void)
{
- irq_enter();
- exit_idle();
inc_irq_stat(irq_threshold_count);
mce_threshold_vector();
- irq_exit();
- /* Ack only at the end to avoid potential reentry */
- ack_APIC_irq();
+}
+
+asmlinkage void smp_threshold_interrupt(void)
+{
+ entering_irq();
+ __smp_threshold_interrupt();
+ exiting_ack_irq();
}
asmlinkage void smp_trace_threshold_interrupt(void)
{
- irq_enter();
- exit_idle();
+ entering_irq();
trace_threshold_apic_entry(THRESHOLD_APIC_VECTOR);
- inc_irq_stat(irq_threshold_count);
- mce_threshold_vector();
+ __smp_threshold_interrupt();
trace_threshold_apic_exit(THRESHOLD_APIC_VECTOR);
- irq_exit();
- /* Ack only at the end to avoid potential reentry */
- ack_APIC_irq();
+ exiting_ack_irq();
}
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 216bec1..ae836cd 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -209,23 +209,21 @@ unsigned int __irq_entry do_IRQ(struct pt_regs *regs)
/*
* Handler for X86_PLATFORM_IPI_VECTOR.
*/
-void smp_x86_platform_ipi(struct pt_regs *regs)
+void __smp_x86_platform_ipi(void)
{
- struct pt_regs *old_regs = set_irq_regs(regs);
-
- ack_APIC_irq();
-
- irq_enter();
-
- exit_idle();
-
inc_irq_stat(x86_platform_ipis);
if (x86_platform_ipi_callback)
x86_platform_ipi_callback();
+}
- irq_exit();
+void smp_x86_platform_ipi(struct pt_regs *regs)
+{
+ struct pt_regs *old_regs = set_irq_regs(regs);
+ entering_ack_irq();
+ __smp_x86_platform_ipi();
+ exiting_irq();
set_irq_regs(old_regs);
}
@@ -233,21 +231,11 @@ void smp_trace_x86_platform_ipi(struct pt_regs *regs)
{
struct pt_regs *old_regs = set_irq_regs(regs);
- ack_APIC_irq();
-
- irq_enter();
-
- exit_idle();
-
+ entering_ack_irq();
trace_x86_platform_ipi_entry(X86_PLATFORM_IPI_VECTOR);
- inc_irq_stat(x86_platform_ipis);
-
- if (x86_platform_ipi_callback)
- x86_platform_ipi_callback();
-
+ __smp_x86_platform_ipi();
trace_x86_platform_ipi_exit(X86_PLATFORM_IPI_VECTOR);
- irq_exit();
-
+ exiting_irq();
set_irq_regs(old_regs);
}
diff --git a/arch/x86/kernel/irq_work.c b/arch/x86/kernel/irq_work.c
index 09e6262..636a55e 100644
--- a/arch/x86/kernel/irq_work.c
+++ b/arch/x86/kernel/irq_work.c
@@ -10,24 +10,32 @@
#include <asm/apic.h>
#include <asm/trace/irq_vectors.h>
-void smp_irq_work_interrupt(struct pt_regs *regs)
+static inline void irq_work_entering_irq(void)
{
irq_enter();
ack_APIC_irq();
+}
+
+static inline void __smp_irq_work_interrupt(void)
+{
inc_irq_stat(apic_irq_work_irqs);
irq_work_run();
- irq_exit();
+}
+
+void smp_irq_work_interrupt(struct pt_regs *regs)
+{
+ irq_work_entering_irq();
+ __smp_irq_work_interrupt();
+ exiting_irq();
}
void smp_trace_irq_work_interrupt(struct pt_regs *regs)
{
- irq_enter();
- ack_APIC_irq();
+ irq_work_entering_irq();
trace_irq_work_entry(IRQ_WORK_VECTOR);
- inc_irq_stat(apic_irq_work_irqs);
- irq_work_run();
+ __smp_irq_work_interrupt();
trace_irq_work_exit(IRQ_WORK_VECTOR);
- irq_exit();
+ exiting_irq();
}
void arch_irq_work_raise(void)
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index aad58af..f4fe0b8 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -250,11 +250,16 @@ finish:
/*
* Reschedule call back.
*/
-void smp_reschedule_interrupt(struct pt_regs *regs)
+static inline void __smp_reschedule_interrupt(void)
{
- ack_APIC_irq();
inc_irq_stat(irq_resched_count);
scheduler_ipi();
+}
+
+void smp_reschedule_interrupt(struct pt_regs *regs)
+{
+ ack_APIC_irq();
+ __smp_reschedule_interrupt();
/*
* KVM uses this interrupt to force a cpu out of guest mode
*/
@@ -264,52 +269,61 @@ void smp_trace_reschedule_interrupt(struct pt_regs *regs)
{
ack_APIC_irq();
trace_reschedule_entry(RESCHEDULE_VECTOR);
- inc_irq_stat(irq_resched_count);
- scheduler_ipi();
+ __smp_reschedule_interrupt();
trace_reschedule_exit(RESCHEDULE_VECTOR);
/*
* KVM uses this interrupt to force a cpu out of guest mode
*/
}
-void smp_call_function_interrupt(struct pt_regs *regs)
+static inline void call_function_entering_irq(void)
{
ack_APIC_irq();
irq_enter();
+}
+
+static inline void __smp_call_function_interrupt(void)
+{
generic_smp_call_function_interrupt();
inc_irq_stat(irq_call_count);
- irq_exit();
+}
+
+void smp_call_function_interrupt(struct pt_regs *regs)
+{
+ call_function_entering_irq();
+ __smp_call_function_interrupt();
+ exiting_irq();
}
void smp_trace_call_function_interrupt(struct pt_regs *regs)
{
- ack_APIC_irq();
- irq_enter();
+ call_function_entering_irq();
trace_call_function_entry(CALL_FUNCTION_VECTOR);
- generic_smp_call_function_interrupt();
- inc_irq_stat(irq_call_count);
+ __smp_call_function_interrupt();
trace_call_function_exit(CALL_FUNCTION_VECTOR);
- irq_exit();
+ exiting_irq();
}
-void smp_call_function_single_interrupt(struct pt_regs *regs)
+static inline void __smp_call_function_single_interrupt(void)
{
- ack_APIC_irq();
- irq_enter();
generic_smp_call_function_single_interrupt();
inc_irq_stat(irq_call_count);
- irq_exit();
+}
+
+void smp_call_function_single_interrupt(struct pt_regs *regs)
+{
+ call_function_entering_irq();
+ __smp_call_function_single_interrupt();
+ exiting_irq();
}
void smp_trace_call_function_single_interrupt(struct pt_regs *regs)
{
- ack_APIC_irq();
- irq_enter();
+ call_function_entering_irq();
trace_call_function_single_entry(CALL_FUNCTION_SINGLE_VECTOR);
- generic_smp_call_function_single_interrupt();
- inc_irq_stat(irq_call_count);
+ __smp_call_function_single_interrupt();
trace_call_function_single_exit(CALL_FUNCTION_SINGLE_VECTOR);
- irq_exit();
+ exiting_irq();
}
static int __init nonmi_ipi_setup(char *str)
--
1.7.1
|
|
From: Seiji A. <sei...@hd...> - 2013-04-05 19:21:47
|
[Purpose of this patch] As Vaibhav explained in the thread below, tracepoints for irq vectors are useful. http://www.spinics.net/lists/mm-commits/msg85707.html <snip> The current interrupt traces from irq_handler_entry and irq_handler_exit provide when an interrupt is handled. They provide good data about when the system has switched to kernel space and how it affects the currently running processes. There are some IRQ vectors which trigger the system into kernel space, which are not handled in generic IRQ handlers. Tracing such events gives us the information about IRQ interaction with other system events. The trace also tells where the system is spending its time. We want to know which cores are handling interrupts and how they are affecting other processes in the system. Also, the trace provides information about when the cores are idle and which interrupts are changing that state. <snip> On the other hand, my usecase is tracing just local timer event and getting a value of instruction pointer. I suggested to add an argument local timer event to get instruction pointer before. But there is another way to get it with external module like systemtap. So, I don't need to add any argument to irq vector tracepoints now. [Patch Description] Vaibhav's patch shared a trace point ,irq_vector_entry/irq_vector_exit, in all events. But there is an above use case to trace specific irq_vector rather than tracing all events. In this case, we are concerned about overhead due to unwanted events. This patch adds following tracepoints instead of introducing irq_vector_entry/exit. so that we can enable them independently. - local_timer_vector - reschedule_vector - call_function_vector - call_function_single_vector - irq_work_entry_vector - error_apic_vector - thermal_apic_vector - threshold_apic_vector - spurious_apic_vector - x86_platform_ipi_vector Also, it introduces a logic switching IDT at enabling/disabling time so that a time penalty makes a zero when tracepoints are disabled. Detailed explanations are as follows. - Duplicate new irq handlers inserted tracepoints. - Create a new IDT, trace_idt_table, at boot time by adding a logic to _set_gate(). It is just a copy of original idt table. - Registering the new handers for tracpoints to the new IDT by introducing macros to alloc_intr_gate() called at regstering time of irq_vector handlers. - Switch IDT to new one at enabling TP time. - Restore to an original IDT at disabling TP time. The new IDT is created only when CONFIG_TRACING is enabled to avoid being used for other purposes. Signed-off-by: Seiji Aguchi <sei...@hd...> --- arch/x86/include/asm/desc.h | 36 +++++++- arch/x86/include/asm/entry_arch.h | 5 +- arch/x86/include/asm/hw_irq.h | 16 +++ arch/x86/include/asm/mshyperv.h | 1 + arch/x86/include/asm/trace/irq_vectors.h | 159 ++++++++++++++++++++++++++++++ arch/x86/kernel/Makefile | 1 + arch/x86/kernel/apic/apic.c | 94 ++++++++++++++++++ arch/x86/kernel/cpu/common.c | 12 ++- arch/x86/kernel/cpu/mcheck/therm_throt.c | 14 +++ arch/x86/kernel/cpu/mcheck/threshold.c | 14 +++ arch/x86/kernel/entry_32.S | 12 ++- arch/x86/kernel/entry_64.S | 29 +++++- arch/x86/kernel/head_64.S | 6 + arch/x86/kernel/irq.c | 23 +++++ arch/x86/kernel/irq_work.c | 12 +++ arch/x86/kernel/smp.c | 35 +++++++ arch/x86/kernel/tracepoint.c | 71 +++++++++++++ include/xen/events.h | 3 + 18 files changed, 529 insertions(+), 14 deletions(-) create mode 100644 arch/x86/include/asm/trace/irq_vectors.h create mode 100644 arch/x86/kernel/tracepoint.c diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h index 8bf1c06..e718c72 100644 --- a/arch/x86/include/asm/desc.h +++ b/arch/x86/include/asm/desc.h @@ -39,6 +39,9 @@ extern gate_desc idt_table[]; extern struct desc_ptr nmi_idt_descr; extern gate_desc nmi_idt_table[]; +extern struct desc_ptr trace_idt_descr; +extern DEFINE_PER_CPU(u32, trace_idt_in_use); +extern DEFINE_PER_CPU(u32, debug_stack_use_ctr); struct gdt_page { struct desc_struct gdt[GDT_ENTRIES]; } __attribute__((aligned(PAGE_SIZE))); @@ -320,6 +323,17 @@ static inline void set_nmi_gate(int gate, void *addr) } #endif +#ifdef CONFIG_TRACING +extern gate_desc trace_idt_table[]; +static inline void trace_set_intr_gate(unsigned int gate, void *addr) +{ + gate_desc s; + + pack_gate(&s, GATE_INTERRUPT, (unsigned long)addr, 0, 0, __KERNEL_CS); + write_idt_entry(trace_idt_table, gate, &s); +} +#endif + static inline void _set_gate(int gate, unsigned type, void *addr, unsigned dpl, unsigned ist, unsigned seg) { @@ -331,6 +345,9 @@ static inline void _set_gate(int gate, unsigned type, void *addr, * setup time */ write_idt_entry(idt_table, gate, &s); +#ifdef CONFIG_TRACING + write_idt_entry(trace_idt_table, gate, &s); +#endif } /* @@ -360,12 +377,27 @@ static inline void alloc_system_vector(int vector) } } -static inline void alloc_intr_gate(unsigned int n, void *addr) +#ifdef CONFIG_TRACING +static inline void __trace_alloc_intr_gate(unsigned int n, void *addr) +{ + trace_set_intr_gate(n, addr); +} +#else +#define __trace_alloc_intr_gate(n, addr) +#endif + +static inline void __alloc_intr_gate(unsigned int n, void *addr) { - alloc_system_vector(n); set_intr_gate(n, addr); } +#define alloc_intr_gate(n, addr) \ + do { \ + alloc_system_vector(n); \ + __alloc_intr_gate(n, addr); \ + __trace_alloc_intr_gate(n, trace_##addr); \ + } while (0) + /* * This routine sets up an interrupt gate at directory privilege level 3. */ diff --git a/arch/x86/include/asm/entry_arch.h b/arch/x86/include/asm/entry_arch.h index 40afa00..0bb99d8 100644 --- a/arch/x86/include/asm/entry_arch.h +++ b/arch/x86/include/asm/entry_arch.h @@ -13,8 +13,9 @@ BUILD_INTERRUPT(reschedule_interrupt,RESCHEDULE_VECTOR) BUILD_INTERRUPT(call_function_interrupt,CALL_FUNCTION_VECTOR) BUILD_INTERRUPT(call_function_single_interrupt,CALL_FUNCTION_SINGLE_VECTOR) -BUILD_INTERRUPT(irq_move_cleanup_interrupt,IRQ_MOVE_CLEANUP_VECTOR) -BUILD_INTERRUPT(reboot_interrupt,REBOOT_VECTOR) +BUILD_INTERRUPT3(irq_move_cleanup_interrupt, IRQ_MOVE_CLEANUP_VECTOR, + smp_irq_move_cleanup_interrupt) +BUILD_INTERRUPT3(reboot_interrupt, REBOOT_VECTOR, smp_reboot_interrupt) #endif BUILD_INTERRUPT(x86_platform_ipi, X86_PLATFORM_IPI_VECTOR) diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h index 10a78c3..e84e91a 100644 --- a/arch/x86/include/asm/hw_irq.h +++ b/arch/x86/include/asm/hw_irq.h @@ -76,6 +76,22 @@ extern void threshold_interrupt(void); extern void call_function_interrupt(void); extern void call_function_single_interrupt(void); +#ifdef CONFIG_TRACING +/* Interrupt handlers registered during init_IRQ */ +extern void trace_apic_timer_interrupt(void); +extern void trace_x86_platform_ipi(void); +extern void trace_error_interrupt(void); +extern void trace_irq_work_interrupt(void); +extern void trace_spurious_interrupt(void); +extern void trace_thermal_interrupt(void); +extern void trace_reschedule_interrupt(void); +extern void trace_threshold_interrupt(void); +extern void trace_call_function_interrupt(void); +extern void trace_call_function_single_interrupt(void); +#define trace_irq_move_cleanup_interrupt irq_move_cleanup_interrupt +#define trace_reboot_interrupt reboot_interrupt +#endif /* CONFIG_TRACING */ + /* IOAPIC */ #define IO_APIC_IRQ(x) (((x) >= NR_IRQS_LEGACY) || ((1<<(x)) & io_apic_irqs)) extern unsigned long io_apic_irqs; diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h index c2934be..cc40e03 100644 --- a/arch/x86/include/asm/mshyperv.h +++ b/arch/x86/include/asm/mshyperv.h @@ -12,6 +12,7 @@ struct ms_hyperv_info { extern struct ms_hyperv_info ms_hyperv; void hyperv_callback_vector(void); +#define trace_hyperv_callback_vector hyperv_callback_vector void hyperv_vector_handler(struct pt_regs *regs); void hv_register_vmbus_handler(int irq, irq_handler_t handler); diff --git a/arch/x86/include/asm/trace/irq_vectors.h b/arch/x86/include/asm/trace/irq_vectors.h new file mode 100644 index 0000000..b4f1c53 --- /dev/null +++ b/arch/x86/include/asm/trace/irq_vectors.h @@ -0,0 +1,159 @@ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM irq_vectors + +#if !defined(_TRACE_IRQ_VECTORS_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_IRQ_VECTORS_H + +#include <linux/tracepoint.h> + +extern void trace_irq_vector_regfunc(void); +extern void trace_irq_vector_unregfunc(void); + +DECLARE_EVENT_CLASS(x86_irq_vector, + + TP_PROTO(int vector), + + TP_ARGS(vector), + + TP_STRUCT__entry( + __field( int, vector ) + ), + + TP_fast_assign( + __entry->vector = vector; + ), + + TP_printk("vector=%d", __entry->vector) ); + +#define DEFINE_IRQ_VECTOR_EVENT(name) \ +DEFINE_EVENT_FN(x86_irq_vector, name, \ + TP_PROTO(int vector), \ + TP_ARGS(vector), \ + trace_irq_vector_regfunc, \ + trace_irq_vector_unregfunc); + +/* + * local_timer_entry - called before entering a local timer interrupt + * vector handler + */ +DEFINE_IRQ_VECTOR_EVENT(local_timer_entry); + +/* + * local_timer_exit - called immediately after the interrupt vector + * handler returns + */ +DEFINE_IRQ_VECTOR_EVENT(local_timer_exit); + +/* + * reschedule_entry - called before entering a reschedule vector handler + */ +DEFINE_IRQ_VECTOR_EVENT(reschedule_entry); + +/* + * reschedule_exit - called immediately after the interrupt vector + * handler returns + */ +DEFINE_IRQ_VECTOR_EVENT(reschedule_exit); + +/* + * spurious_apic_entry - called before entering a spurious apic vector handler + */ +DEFINE_IRQ_VECTOR_EVENT(spurious_apic_entry); + +/* + * spurious_apic_exit - called immediately after the interrupt vector + * handler returns + */ +DEFINE_IRQ_VECTOR_EVENT(spurious_apic_exit); + +/* + * error_apic_entry - called before entering an error apic vector handler + */ +DEFINE_IRQ_VECTOR_EVENT(error_apic_entry); + +/* + * error_apic_exit - called immediately after the interrupt vector + * handler returns + */ +DEFINE_IRQ_VECTOR_EVENT(error_apic_exit); + +/* + * x86_platform_ipi_entry - called before entering a x86 platform ipi interrupt + * vector handler + */ +DEFINE_IRQ_VECTOR_EVENT(x86_platform_ipi_entry); + +/* + * x86_platform_ipi_exit - called immediately after the interrupt vector + * handler returns + */ +DEFINE_IRQ_VECTOR_EVENT(x86_platform_ipi_exit); + +/* + * irq_work_entry - called before entering a irq work interrupt + * vector handler + */ +DEFINE_IRQ_VECTOR_EVENT(irq_work_entry); + +/* + * irq_work_exit - called immediately after the interrupt vector + * handler returns + */ +DEFINE_IRQ_VECTOR_EVENT(irq_work_exit); + +/* + * call_function_entry - called before entering a call function interrupt + * vector handler + */ +DEFINE_IRQ_VECTOR_EVENT(call_function_entry); + +/* + * call_function_exit - called immediately after the interrupt vector + * handler returns + */ +DEFINE_IRQ_VECTOR_EVENT(call_function_exit); + +/* + * call_function_single_entry - called before entering a call function + * single interrupt vector handler + */ +DEFINE_IRQ_VECTOR_EVENT(call_function_single_entry); + +/* + * call_function_single_exit - called immediately after the interrupt vector + * handler returns + */ +DEFINE_IRQ_VECTOR_EVENT(call_function_single_exit); + +/* + * threshold_apic_entry - called before entering a threshold apic interrupt + * vector handler + */ +DEFINE_IRQ_VECTOR_EVENT(threshold_apic_entry); + +/* + * threshold_apic_exit - called immediately after the interrupt vector + * handler returns + */ +DEFINE_IRQ_VECTOR_EVENT(threshold_apic_exit); + +/* + * thermal_apic_entry - called before entering a thermal apic interrupt + * vector handler + */ +DEFINE_IRQ_VECTOR_EVENT(thermal_apic_entry); + +/* + * thrmal_apic_exit - called immediately after the interrupt vector + * handler returns + */ +DEFINE_IRQ_VECTOR_EVENT(thermal_apic_exit); + +#undef TRACE_INCLUDE_PATH +#define TRACE_INCLUDE_PATH ../../arch/x86/include/asm/trace +#define TRACE_INCLUDE_FILE irq_vectors +#endif /* _TRACE_IRQ_VECTORS_H */ + +/* This part must be outside protection */ +#include <trace/define_trace.h> + diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 7bd3bd3..74b3891 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -102,6 +102,7 @@ obj-$(CONFIG_OF) += devicetree.o obj-$(CONFIG_UPROBES) += uprobes.o obj-$(CONFIG_PERF_EVENTS) += perf_regs.o +obj-$(CONFIG_TRACING) += tracepoint.o ### # 64 bit specific files diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index 904611b..7d39c68 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -55,6 +55,9 @@ #include <asm/tsc.h> #include <asm/hypervisor.h> +#define CREATE_TRACE_POINTS +#include <asm/trace/irq_vectors.h> + unsigned int num_processors; unsigned disabled_cpus __cpuinitdata; @@ -934,6 +937,30 @@ void __irq_entry smp_apic_timer_interrupt(struct pt_regs *regs) set_irq_regs(old_regs); } +void __irq_entry smp_trace_apic_timer_interrupt(struct pt_regs *regs) +{ + struct pt_regs *old_regs = set_irq_regs(regs); + + /* + * NOTE! We'd better ACK the irq immediately, + * because timer handling can be slow. + */ + ack_APIC_irq(); + /* + * update_process_times() expects us to have done irq_enter(). + * Besides, if we don't timer interrupts ignore the global + * interrupt lock, which is the WrongThing (tm) to do. + */ + irq_enter(); + exit_idle(); + trace_local_timer_entry(LOCAL_TIMER_VECTOR); + local_apic_timer_interrupt(); + trace_local_timer_exit(LOCAL_TIMER_VECTOR); + irq_exit(); + + set_irq_regs(old_regs); +} + int setup_profiling_timer(unsigned int multiplier) { return -EINVAL; @@ -1930,6 +1957,31 @@ void smp_spurious_interrupt(struct pt_regs *regs) irq_exit(); } +void smp_trace_spurious_interrupt(struct pt_regs *regs) +{ + u32 v; + + irq_enter(); + exit_idle(); + trace_spurious_apic_entry(SPURIOUS_APIC_VECTOR); + /* + * Check if this really is a spurious interrupt and ACK it + * if it is a vectored one. Just in case... + * Spurious interrupts should not be ACKed. + */ + v = apic_read(APIC_ISR + ((SPURIOUS_APIC_VECTOR & ~0x1f) >> 1)); + if (v & (1 << (SPURIOUS_APIC_VECTOR & 0x1f))) + ack_APIC_irq(); + + inc_irq_stat(irq_spurious_count); + + /* see sw-dev-man vol 3, chapter 7.4.13.5 */ + pr_info("spurious APIC interrupt on CPU#%d, " + "should never happen.\n", smp_processor_id()); + trace_spurious_apic_exit(SPURIOUS_APIC_VECTOR); + irq_exit(); +} + /* * This interrupt should never happen with our APIC/SMP architecture */ @@ -1973,6 +2025,48 @@ void smp_error_interrupt(struct pt_regs *regs) irq_exit(); } +void smp_trace_error_interrupt(struct pt_regs *regs) +{ + u32 v0, v1; + u32 i = 0; + static const char * const error_interrupt_reason[] = { + "Send CS error", /* APIC Error Bit 0 */ + "Receive CS error", /* APIC Error Bit 1 */ + "Send accept error", /* APIC Error Bit 2 */ + "Receive accept error", /* APIC Error Bit 3 */ + "Redirectable IPI", /* APIC Error Bit 4 */ + "Send illegal vector", /* APIC Error Bit 5 */ + "Received illegal vector", /* APIC Error Bit 6 */ + "Illegal register address", /* APIC Error Bit 7 */ + }; + + irq_enter(); + exit_idle(); + trace_error_apic_entry(ERROR_APIC_VECTOR); + /* First tickle the hardware, only then report what went on. -- REW */ + v0 = apic_read(APIC_ESR); + apic_write(APIC_ESR, 0); + v1 = apic_read(APIC_ESR); + ack_APIC_irq(); + atomic_inc(&irq_err_count); + + apic_printk(APIC_DEBUG, KERN_DEBUG "APIC error on CPU%d: %02x(%02x)", + smp_processor_id(), v0 , v1); + + v1 = v1 & 0xff; + while (v1) { + if (v1 & 0x1) + apic_printk(APIC_DEBUG, KERN_CONT " : %s", error_interrupt_reason[i]); + i++; + v1 >>= 1; + } + + apic_printk(APIC_DEBUG, KERN_CONT "\n"); + + trace_error_apic_exit(ERROR_APIC_VECTOR); + irq_exit(); +} + /** * connect_bsp_APIC - attach the APIC to the interrupt system */ diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index d814772..23045cd 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -1144,7 +1144,7 @@ int is_debug_stack(unsigned long addr) addr > (__get_cpu_var(debug_stack_addr) - DEBUG_STKSZ)); } -static DEFINE_PER_CPU(u32, debug_stack_use_ctr); +DEFINE_PER_CPU(u32, debug_stack_use_ctr); void debug_stack_set_zero(void) { @@ -1156,8 +1156,14 @@ void debug_stack_reset(void) { if (WARN_ON(!this_cpu_read(debug_stack_use_ctr))) return; - if (this_cpu_dec_return(debug_stack_use_ctr) == 0) - load_idt((const struct desc_ptr *)&idt_descr); + if (this_cpu_dec_return(debug_stack_use_ctr) == 0) { +#ifdef CONFIG_TRACING + if (this_cpu_read(trace_idt_in_use)) + load_idt((const struct desc_ptr *)&trace_idt_descr); + else +#endif + load_idt((const struct desc_ptr *)&idt_descr); + } } #else /* CONFIG_X86_64 */ diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c index 47a1870..e7aa7fc 100644 --- a/arch/x86/kernel/cpu/mcheck/therm_throt.c +++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c @@ -29,6 +29,7 @@ #include <asm/idle.h> #include <asm/mce.h> #include <asm/msr.h> +#include <asm/trace/irq_vectors.h> /* How long to wait between reporting thermal events */ #define CHECK_INTERVAL (300 * HZ) @@ -389,6 +390,19 @@ asmlinkage void smp_thermal_interrupt(struct pt_regs *regs) ack_APIC_irq(); } +asmlinkage void smp_trace_thermal_interrupt(struct pt_regs *regs) +{ + irq_enter(); + exit_idle(); + trace_thermal_apic_entry(THERMAL_APIC_VECTOR); + inc_irq_stat(irq_thermal_count); + smp_thermal_vector(); + trace_thermal_apic_exit(THERMAL_APIC_VECTOR); + irq_exit(); + /* Ack only at the end to avoid potential reentry */ + ack_APIC_irq(); +} + /* Thermal monitoring depends on APIC, ACPI and clock modulation */ static int intel_thermal_supported(struct cpuinfo_x86 *c) { diff --git a/arch/x86/kernel/cpu/mcheck/threshold.c b/arch/x86/kernel/cpu/mcheck/threshold.c index aa578ca..0cbef99 100644 --- a/arch/x86/kernel/cpu/mcheck/threshold.c +++ b/arch/x86/kernel/cpu/mcheck/threshold.c @@ -8,6 +8,7 @@ #include <asm/apic.h> #include <asm/idle.h> #include <asm/mce.h> +#include <asm/trace/irq_vectors.h> static void default_threshold_interrupt(void) { @@ -27,3 +28,16 @@ asmlinkage void smp_threshold_interrupt(void) /* Ack only at the end to avoid potential reentry */ ack_APIC_irq(); } + +asmlinkage void smp_trace_threshold_interrupt(void) +{ + irq_enter(); + exit_idle(); + trace_threshold_apic_entry(THRESHOLD_APIC_VECTOR); + inc_irq_stat(irq_threshold_count); + mce_threshold_vector(); + trace_threshold_apic_exit(THRESHOLD_APIC_VECTOR); + irq_exit(); + /* Ack only at the end to avoid potential reentry */ + ack_APIC_irq(); +} diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S index 8f3e2de..2cfbc3a 100644 --- a/arch/x86/kernel/entry_32.S +++ b/arch/x86/kernel/entry_32.S @@ -801,7 +801,17 @@ ENTRY(name) \ CFI_ENDPROC; \ ENDPROC(name) -#define BUILD_INTERRUPT(name, nr) BUILD_INTERRUPT3(name, nr, smp_##name) + +#ifdef CONFIG_TRACING +#define TRACE_BUILD_INTERRUPT(name, nr) \ + BUILD_INTERRUPT3(trace_##name, nr, smp_trace_##name) +#else +#define TRACE_BUILD_INTERRUPT(name, nr) +#endif + +#define BUILD_INTERRUPT(name, nr) \ + BUILD_INTERRUPT3(name, nr, smp_##name); \ + TRACE_BUILD_INTERRUPT(name, nr) /* The include is where all of the SMP etc. interrupts come from */ #include <asm/entry_arch.h> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S index c1d01e6..4570477 100644 --- a/arch/x86/kernel/entry_64.S +++ b/arch/x86/kernel/entry_64.S @@ -1138,7 +1138,7 @@ END(common_interrupt) /* * APIC interrupts. */ -.macro apicinterrupt num sym do_sym +.macro apicinterrupt3 num sym do_sym ENTRY(\sym) INTR_FRAME ASM_CLAC @@ -1150,15 +1150,32 @@ ENTRY(\sym) END(\sym) .endm +#ifdef CONFIG_TRACING +#define trace(sym) trace_##sym +#define smp_trace(sym) smp_trace_##sym + +.macro trace_apicinterrupt num sym +apicinterrupt3 \num trace(\sym) smp_trace(\sym) +.endm +#else +.macro trace_apicinterrupt num sym do_sym +.endm +#endif + +.macro apicinterrupt num sym do_sym +apicinterrupt3 \num \sym \do_sym +trace_apicinterrupt \num \sym +.endm + #ifdef CONFIG_SMP -apicinterrupt IRQ_MOVE_CLEANUP_VECTOR \ +apicinterrupt3 IRQ_MOVE_CLEANUP_VECTOR \ irq_move_cleanup_interrupt smp_irq_move_cleanup_interrupt -apicinterrupt REBOOT_VECTOR \ +apicinterrupt3 REBOOT_VECTOR \ reboot_interrupt smp_reboot_interrupt #endif #ifdef CONFIG_X86_UV -apicinterrupt UV_BAU_MESSAGE \ +apicinterrupt3 UV_BAU_MESSAGE \ uv_bau_message_intr1 uv_bau_message_interrupt #endif apicinterrupt LOCAL_TIMER_VECTOR \ @@ -1446,13 +1463,13 @@ ENTRY(xen_failsafe_callback) CFI_ENDPROC END(xen_failsafe_callback) -apicinterrupt HYPERVISOR_CALLBACK_VECTOR \ +apicinterrupt3 HYPERVISOR_CALLBACK_VECTOR \ xen_hvm_callback_vector xen_evtchn_do_upcall #endif /* CONFIG_XEN */ #if IS_ENABLED(CONFIG_HYPERV) -apicinterrupt HYPERVISOR_CALLBACK_VECTOR \ +apicinterrupt3 HYPERVISOR_CALLBACK_VECTOR \ hyperv_callback_vector hyperv_vector_handler #endif /* CONFIG_HYPERV */ diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index 6859e96..e827e67 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -518,6 +518,12 @@ ENTRY(idt_table) ENTRY(nmi_idt_table) .skip IDT_ENTRIES * 16 +#ifdef CONFIG_TRACING + .align L1_CACHE_BYTES +ENTRY(trace_idt_table) + .skip IDT_ENTRIES * 16 +#endif + __PAGE_ALIGNED_BSS NEXT_PAGE(empty_zero_page) .skip PAGE_SIZE diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c index e4595f1..216bec1 100644 --- a/arch/x86/kernel/irq.c +++ b/arch/x86/kernel/irq.c @@ -17,6 +17,7 @@ #include <asm/idle.h> #include <asm/mce.h> #include <asm/hw_irq.h> +#include <asm/trace/irq_vectors.h> atomic_t irq_err_count; @@ -228,6 +229,28 @@ void smp_x86_platform_ipi(struct pt_regs *regs) set_irq_regs(old_regs); } +void smp_trace_x86_platform_ipi(struct pt_regs *regs) +{ + struct pt_regs *old_regs = set_irq_regs(regs); + + ack_APIC_irq(); + + irq_enter(); + + exit_idle(); + + trace_x86_platform_ipi_entry(X86_PLATFORM_IPI_VECTOR); + inc_irq_stat(x86_platform_ipis); + + if (x86_platform_ipi_callback) + x86_platform_ipi_callback(); + + trace_x86_platform_ipi_exit(X86_PLATFORM_IPI_VECTOR); + irq_exit(); + + set_irq_regs(old_regs); +} + EXPORT_SYMBOL_GPL(vector_used_by_percpu_irq); #ifdef CONFIG_HOTPLUG_CPU diff --git a/arch/x86/kernel/irq_work.c b/arch/x86/kernel/irq_work.c index ca8f703..09e6262 100644 --- a/arch/x86/kernel/irq_work.c +++ b/arch/x86/kernel/irq_work.c @@ -8,6 +8,7 @@ #include <linux/irq_work.h> #include <linux/hardirq.h> #include <asm/apic.h> +#include <asm/trace/irq_vectors.h> void smp_irq_work_interrupt(struct pt_regs *regs) { @@ -18,6 +19,17 @@ void smp_irq_work_interrupt(struct pt_regs *regs) irq_exit(); } +void smp_trace_irq_work_interrupt(struct pt_regs *regs) +{ + irq_enter(); + ack_APIC_irq(); + trace_irq_work_entry(IRQ_WORK_VECTOR); + inc_irq_stat(apic_irq_work_irqs); + irq_work_run(); + trace_irq_work_exit(IRQ_WORK_VECTOR); + irq_exit(); +} + void arch_irq_work_raise(void) { #ifdef CONFIG_X86_LOCAL_APIC diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c index 48d2b7d..aad58af 100644 --- a/arch/x86/kernel/smp.c +++ b/arch/x86/kernel/smp.c @@ -30,6 +30,7 @@ #include <asm/proto.h> #include <asm/apic.h> #include <asm/nmi.h> +#include <asm/trace/irq_vectors.h> /* * Some notes on x86 processor bugs affecting SMP operation: * @@ -259,6 +260,18 @@ void smp_reschedule_interrupt(struct pt_regs *regs) */ } +void smp_trace_reschedule_interrupt(struct pt_regs *regs) +{ + ack_APIC_irq(); + trace_reschedule_entry(RESCHEDULE_VECTOR); + inc_irq_stat(irq_resched_count); + scheduler_ipi(); + trace_reschedule_exit(RESCHEDULE_VECTOR); + /* + * KVM uses this interrupt to force a cpu out of guest mode + */ +} + void smp_call_function_interrupt(struct pt_regs *regs) { ack_APIC_irq(); @@ -268,6 +281,17 @@ void smp_call_function_interrupt(struct pt_regs *regs) irq_exit(); } +void smp_trace_call_function_interrupt(struct pt_regs *regs) +{ + ack_APIC_irq(); + irq_enter(); + trace_call_function_entry(CALL_FUNCTION_VECTOR); + generic_smp_call_function_interrupt(); + inc_irq_stat(irq_call_count); + trace_call_function_exit(CALL_FUNCTION_VECTOR); + irq_exit(); +} + void smp_call_function_single_interrupt(struct pt_regs *regs) { ack_APIC_irq(); @@ -277,6 +301,17 @@ void smp_call_function_single_interrupt(struct pt_regs *regs) irq_exit(); } +void smp_trace_call_function_single_interrupt(struct pt_regs *regs) +{ + ack_APIC_irq(); + irq_enter(); + trace_call_function_single_entry(CALL_FUNCTION_SINGLE_VECTOR); + generic_smp_call_function_single_interrupt(); + inc_irq_stat(irq_call_count); + trace_call_function_single_exit(CALL_FUNCTION_SINGLE_VECTOR); + irq_exit(); +} + static int __init nonmi_ipi_setup(char *str) { smp_no_nmi_ipi = true; diff --git a/arch/x86/kernel/tracepoint.c b/arch/x86/kernel/tracepoint.c new file mode 100644 index 0000000..3403dc3 --- /dev/null +++ b/arch/x86/kernel/tracepoint.c @@ -0,0 +1,71 @@ +/* + * Code for supporting irq vector tracepoints. + * + * Copyright (C) 2013 Seiji Aguchi <sei...@hd...> + * + */ +#include <asm/hw_irq.h> +#include <asm/desc.h> + +struct desc_ptr trace_idt_descr = { NR_VECTORS * 16 - 1, + (unsigned long) trace_idt_table }; + +#ifndef CONFIG_X86_64 +gate_desc trace_idt_table[NR_VECTORS] __page_aligned_data + = { { { { 0, 0 } } }, }; +#endif + +static int trace_irq_vector_refcount; +static DEFINE_MUTEX(irq_vector_mutex); +DEFINE_PER_CPU(u32, trace_idt_in_use); + +static void switch_to_trace_idt(void *arg) +{ + + this_cpu_inc(trace_idt_in_use); + load_idt(&trace_idt_descr); + + return; +} + +static void restore_original_idt(void *arg) +{ + if (WARN_ON(!this_cpu_read(trace_idt_in_use))) + return; + this_cpu_dec(trace_idt_in_use); + +#ifdef CONFIG_X86_64 + if (this_cpu_read(debug_stack_use_ctr)) + load_idt(&nmi_idt_descr); + else +#endif + load_idt(&idt_descr); + return; +} + +void trace_irq_vector_regfunc(void) +{ + mutex_lock(&irq_vector_mutex); + if (!trace_irq_vector_refcount) { + smp_call_function(switch_to_trace_idt, NULL, 0); + local_irq_disable(); + switch_to_trace_idt(NULL); + local_irq_enable(); + } + trace_irq_vector_refcount++; + mutex_unlock(&irq_vector_mutex); +} + +void trace_irq_vector_unregfunc(void) +{ + mutex_lock(&irq_vector_mutex); + trace_irq_vector_refcount--; + if (!trace_irq_vector_refcount) { + smp_call_function(restore_original_idt, NULL, 0); + local_irq_disable(); + restore_original_idt(NULL); + local_irq_enable(); + } + mutex_unlock(&irq_vector_mutex); +} + diff --git a/include/xen/events.h b/include/xen/events.h index c6bfe01..9216d07 100644 --- a/include/xen/events.h +++ b/include/xen/events.h @@ -76,6 +76,9 @@ unsigned irq_from_evtchn(unsigned int evtchn); /* Xen HVM evtchn vector callback */ void xen_hvm_callback_vector(void); +#ifdef CONFIG_TRACING +#define trace_xen_hvm_callback_vector xen_hvm_callback_vector +#endif extern int xen_have_vector_callback; int xen_set_callback_via(uint64_t via); void xen_evtchn_do_upcall(struct pt_regs *regs); -- 1.7.1 |
|
From: Seiji A. <sei...@hd...> - 2013-04-05 19:20:34
|
From: Steven Rostedt <ro...@go...> Each TRACE_EVENT() adds several helper functions. If two or more trace events share the same structure and print format, they can also share most of these helper functions and save a lot of space from duplicate code. This is why the DECLARE_EVENT_CLASS() and DEFINE_EVENT() were created. Some events require a trigger to be called at registering and unregistering of the event and to do so they use TRACE_EVENT_FN(). If multiple events require a trigger, they currently have no choice but to use TRACE_EVENT_FN() as there's no DEFINE_EVENT_FN() available. This unfortunately causes a lot of wasted duplicate code created. By adding a DEFINE_EVENT_FN(), these events can still use a DECLARE_EVENT_CLASS() and then define their own triggers. Signed-off-by: Steven Rostedt <ro...@go...> Signed-off-by: Seiji Aguchi <sei...@hd...> --- include/linux/tracepoint.h | 2 ++ include/trace/define_trace.h | 5 +++++ include/trace/ftrace.h | 4 ++++ 3 files changed, 11 insertions(+), 0 deletions(-) diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h index 2f322c3..9bf59e5 100644 --- a/include/linux/tracepoint.h +++ b/include/linux/tracepoint.h @@ -378,6 +378,8 @@ static inline void tracepoint_synchronize_unregister(void) #define DECLARE_EVENT_CLASS(name, proto, args, tstruct, assign, print) #define DEFINE_EVENT(template, name, proto, args) \ DECLARE_TRACE(name, PARAMS(proto), PARAMS(args)) +#define DEFINE_EVENT_FN(template, name, proto, args, reg, unreg)\ + DECLARE_TRACE(name, PARAMS(proto), PARAMS(args)) #define DEFINE_EVENT_PRINT(template, name, proto, args, print) \ DECLARE_TRACE(name, PARAMS(proto), PARAMS(args)) #define DEFINE_EVENT_CONDITION(template, name, proto, \ diff --git a/include/trace/define_trace.h b/include/trace/define_trace.h index 1905ca8..02e1003 100644 --- a/include/trace/define_trace.h +++ b/include/trace/define_trace.h @@ -44,6 +44,10 @@ #define DEFINE_EVENT(template, name, proto, args) \ DEFINE_TRACE(name) +#undef DEFINE_EVENT_FN +#define DEFINE_EVENT_FN(template, name, proto, args, reg, unreg) \ + DEFINE_TRACE_FN(name, reg, unreg) + #undef DEFINE_EVENT_PRINT #define DEFINE_EVENT_PRINT(template, name, proto, args, print) \ DEFINE_TRACE(name) @@ -91,6 +95,7 @@ #undef TRACE_EVENT_CONDITION #undef DECLARE_EVENT_CLASS #undef DEFINE_EVENT +#undef DEFINE_EVENT_FN #undef DEFINE_EVENT_PRINT #undef DEFINE_EVENT_CONDITION #undef TRACE_HEADER_MULTI_READ diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h index 40dc5e8..7bab676 100644 --- a/include/trace/ftrace.h +++ b/include/trace/ftrace.h @@ -71,6 +71,10 @@ static struct ftrace_event_call __used \ __attribute__((__aligned__(4))) event_##name +#undef DEFINE_EVENT_FN +#define DEFINE_EVENT_FN(template, name, proto, args, reg, unreg) \ + DEFINE_EVENT(template, name, PARAMS(proto), PARAMS(args)) + #undef DEFINE_EVENT_PRINT #define DEFINE_EVENT_PRINT(template, name, proto, args, print) \ DEFINE_EVENT(template, name, PARAMS(proto), PARAMS(args)) -- 1.7.1 |
|
From: Seiji A. <sei...@hd...> - 2013-04-05 19:16:51
|
Change log
v11 -> v12
- Rebase to 3.9-rc5
v10 -> v11
- Rebase to 3.9-rc2
- Add a modification for hyperv_callback vector. (patch 2/3)
- Change a way to switch idt to check the table in use instead of saving/restoring it,
because saving/restoring functions will break if we have to add another one. (patch 2/3)
v9 -> v10
- Add an explanation the reason why tracepoints has to place inside irq enter/exit handling. (patch 3/3)
v8 -> v9
- Rebase to 3.8-rc6
- Add Steven's email address at the top of the message and
move my signed-off-by below Steven's one because it is
originally created by Steven. (patch 1/3)
- Introduce a irq_vector_mutex to avoid a race at registering/unregistering
time. (patch 2/3)
- Use a per_cpu data to orig_idt_descr because IDT descritor is needed to each cpu
and the appropriate data type is per_cpu data. It is suggested by Steven.
(patch 2/3)
v7 -> v8
- Rebase to 3.8-rc4
- Add a patch 1 introducing DEFINE_EVENT_FN() macro.
- Rename original patches 1 and 2 to 2 and 3.
- Change a definition of tracepoint to use DEFINE_EVENT_FN(). (patch 2)
- Change alloc_intr_gate() to use do{}while(0) to avoid a warning
of checkpatch.pl. (patch 2)
- Move entering_irq()/exiting_irq() to arch/x86/include/asm/apic.h (patch 3)
v6 -> v7
- Divide into two patches to make a code review easier.
Summery of each patch is as follows.
- Patch 1/2
- Add an irq_vector tracing infrastructure.
- Create idt_table for tracing. It is refactored to avoid duplicating
existing logic.
- Duplicate new irq handlers inserted tracepoints.
- Patch 2/2
- Share a common logic among irq handlers to make them
manageable and readable.
v5 -> v6
- Rebased to 3.7
v4 -> v5
- Rebased to 3.6.0
- Introduce a logic switching IDT at enabling/disabling TP time
so that a time penalty makes a zero when tracepoints are disabled.
This IDT is created only when CONFIG_TRACEPOINTS is enabled.
- Remove arch_irq_vector_entry/exit and add followings again
so that we can add each tracepoint in a generic way.
- error_apic_vector
- thermal_apic_vector
- threshold_apic_vector
- spurious_apic_vector
- x86_platform_ipi_vector
- Drop nmi tracepoints to begin with apic interrupts and discuss a logic switching
IDT first.
- Move irq_vectors.h in the directory of arch/x86/include/asm/trace because
I'm not sure if a logic switching IDT is sharable with other architectures.
v3 -> v4
- Add a latency measurement of each tracepoint
- Rebased to 3.6-rc6
v2 -> v3
- Remove an invalidate_tlb_vector event because it was replaced by a call function vector
in a following commit.
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=52aec3308db85f4e9f5c8b9f5dc4fbd0138c6fa4
v1 -> v2
- Modify variable name from irq to vector.
- Merge arch-specific tracepoints below to an arch_irq_vector_entry/exit.
- error_apic_vector
- thermal_apic_vector
- threshold_apic_vector
- spurious_apic_vector
- x86_platform_ipi_vector
[Purpose of this patch]
As Vaibhav explained in the thread below, tracepoints for irq vectors are useful.
http://www.spinics.net/lists/mm-commits/msg85707.html
<snip>
The current interrupt traces from irq_handler_entry and irq_handler_exit provide when an interrupt is handled.
They provide good data about when the system has switched to kernel space and how it affects the currently running processes.
There are some IRQ vectors which trigger the system into kernel space, which are not handled in generic IRQ handlers.
Tracing such events gives us the information about IRQ interaction with other system events.
The trace also tells where the system is spending its time. We want to know which cores are handling interrupts and
how they are affecting other processes in the system. Also, the trace provides information about when the cores are
idle and which interrupts are changing that state.
<snip>
On the other hand, my usecase is tracing just local timer event and getting a value of instruction pointer.
I suggested to add an argument local timer event to get instruction pointer before.
But there is another way to get it with external module like systemtap.
So, I don't need to add any argument to irq vector tracepoints now.
[Patch Description]
Vaibhav's patch shared a trace point ,irq_vector_entry/irq_vector_exit, in all events.
But there is an above use case to trace specific irq_vector rather than tracing all events.
In this case, we are concerned about overhead due to unwanted events.
This patch adds following tracepoints instead of introducing irq_vector_entry/exit.
so that we can enable them independently.
- local_timer_vector
- reschedule_vector
- call_function_vector
- call_function_single_vector
- irq_work_entry_vector
- error_apic_vector
- thermal_apic_vector
- threshold_apic_vector
- spurious_apic_vector
- x86_platform_ipi_vector
Please see descriptions in each patch.
Seiji Aguchi (2):
trace,x86: add x86 irq vector tracepoints
trace,x86: code-sharing between non-trace and trace irq handlers
Steven Rostedt (1):
tracing: Add DEFINE_EVENT_FN() macro
arch/x86/include/asm/apic.h | 27 +++++
arch/x86/include/asm/desc.h | 36 +++++++-
arch/x86/include/asm/entry_arch.h | 5 +-
arch/x86/include/asm/hw_irq.h | 16 +++
arch/x86/include/asm/mshyperv.h | 1 +
arch/x86/include/asm/trace/irq_vectors.h | 159 ++++++++++++++++++++++++++++++
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/apic/apic.c | 71 +++++++++++---
arch/x86/kernel/cpu/common.c | 12 ++-
arch/x86/kernel/cpu/mcheck/therm_throt.c | 24 ++++-
arch/x86/kernel/cpu/mcheck/threshold.c | 24 ++++-
arch/x86/kernel/entry_32.S | 12 ++-
arch/x86/kernel/entry_64.S | 29 +++++-
arch/x86/kernel/head_64.S | 6 +
arch/x86/kernel/irq.c | 31 ++++--
arch/x86/kernel/irq_work.c | 24 ++++-
arch/x86/kernel/smp.c | 65 +++++++++++--
arch/x86/kernel/tracepoint.c | 71 +++++++++++++
include/linux/tracepoint.h | 2 +
include/trace/define_trace.h | 5 +
include/trace/ftrace.h | 4 +
include/xen/events.h | 3 +
22 files changed, 570 insertions(+), 58 deletions(-)
create mode 100644 arch/x86/include/asm/trace/irq_vectors.h
create mode 100644 arch/x86/kernel/tracepoint.c
|
|
From: Seiji A. <sei...@hd...> - 2013-04-05 15:35:26
|
> NACK this is putting inappropriate policy decisions into libvirt. > This may be how you want this scenario to work, but it is certainly not what every user of libvirt will want. How about adding some option to libvirt so that the users can choose policies? > If the storage is fubar & the management application wants to still start the guest, they should change the XML config appropriately. I think that providing options is user-friendly rather than asking users to edit xml by hand after the storage is broken. Seiji |
|
From: Daniel P. B. <ber...@re...> - 2013-04-05 15:21:30
|
On Fri, Apr 05, 2013 at 02:36:45PM +0000, Seiji Aguchi wrote:
> [Problem]
> Currently, guest OS's messages can be logged to a local disk of host OS
> by creating chadevs with options below.
> -chardev file,id=charserial0,path=<log file's path> -device isa-serial,chardev=chardevserial0,id=serial0
>
> When a hardware failure happens in the disk, qemu-kvm can't create the chardevs.
> In this case, guest OS doesn't boot up.
>
> Actually, there are users who don't desire that guest OS goes down due to a hardware failure
> of a log disk only.Therefore, libvirt should offer some way to boot guest OS up even if the log
> disk is broken.
>
> [Solution]
> This patch changes a destination to /dev/null in case a log file can't be opened.
>
> Signed-off-by: Seiji Aguchi <sei...@hd...>
> ---
> src/qemu/qemu_process.c | 20 +++++++++++++++++++-
> 1 files changed, 19 insertions(+), 1 deletions(-)
>
> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> index 8c4bfb7..fdf26ad 100644
> --- a/src/qemu/qemu_process.c
> +++ b/src/qemu/qemu_process.c
> @@ -2440,7 +2440,25 @@ qemuProcessPrepareChardevDevice(virDomainDefPtr def ATTRIBUTE_UNUSED,
> virReportSystemError(errno,
> _("Unable to pre-create chardev file '%s'"),
> dev->source.data.file.path);
> - return -1;
> + VIR_FREE(dev->source.data.file.path);
> + /*
> + * Change a destination to /dev/null to boot guest OS up
> + * even if a log disk is broken.
> + */
> + dev->source.data.file.path = strdup("/dev/null");
> +
> + if (!dev->source.data.file.path) {
> + virReportOOMError();
> + return -1;
> + }
> +
> + if ((fd = open(dev->source.data.file.path,
> + O_CREAT | O_APPEND, S_IRUSR|S_IWUSR)) < 0) {
> + virReportSystemError(errno,
> + _("Unable to pre-create chardev file '%s'"),
> + dev->source.data.file.path);
> + return -1;
> + }
> }
NACK this is putting inappropriate policy decisions into libvirt.
This may be how you want this scenario to work, but it is certainly
not what every user of libvirt will want.
If the storage is fubar & the management application wants to still
start the guest, they should change the XML config appropriately.
REgards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
|
|
From: Seiji A. <sei...@hd...> - 2013-04-05 14:37:02
|
[Problem]
Currently, guest OS's messages can be logged to a local disk of host OS
by creating chadevs with options below.
-chardev file,id=charserial0,path=<log file's path> -device isa-serial,chardev=chardevserial0,id=serial0
When a hardware failure happens in the disk, qemu-kvm can't create the chardevs.
In this case, guest OS doesn't boot up.
Actually, there are users who don't desire that guest OS goes down due to a hardware failure
of a log disk only.Therefore, libvirt should offer some way to boot guest OS up even if the log
disk is broken.
[Solution]
This patch changes a destination to /dev/null in case a log file can't be opened.
Signed-off-by: Seiji Aguchi <sei...@hd...>
---
src/qemu/qemu_process.c | 20 +++++++++++++++++++-
1 files changed, 19 insertions(+), 1 deletions(-)
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 8c4bfb7..fdf26ad 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -2440,7 +2440,25 @@ qemuProcessPrepareChardevDevice(virDomainDefPtr def ATTRIBUTE_UNUSED,
virReportSystemError(errno,
_("Unable to pre-create chardev file '%s'"),
dev->source.data.file.path);
- return -1;
+ VIR_FREE(dev->source.data.file.path);
+ /*
+ * Change a destination to /dev/null to boot guest OS up
+ * even if a log disk is broken.
+ */
+ dev->source.data.file.path = strdup("/dev/null");
+
+ if (!dev->source.data.file.path) {
+ virReportOOMError();
+ return -1;
+ }
+
+ if ((fd = open(dev->source.data.file.path,
+ O_CREAT | O_APPEND, S_IRUSR|S_IWUSR)) < 0) {
+ virReportSystemError(errno,
+ _("Unable to pre-create chardev file '%s'"),
+ dev->source.data.file.path);
+ return -1;
+ }
}
VIR_FORCE_CLOSE(fd);
-- 1.7.1
|
|
From: Paolo B. <pbo...@re...> - 2013-04-04 15:20:42
|
Il 04/04/2013 16:54, Laszlo Ersek ha scritto:
> Side note: strictly in theory, this would result in two vfprintf()
> calls. The message log would remain "record oriented", but (again, in
> theory) another thread *might* get interleaved and mess up our format
> with a parallel call to error_report() (or more deeply, to fprintf()).
>
> Importantly I'm not talking about "corrupting data"; stdio streams are
> automatically locked by the fprintf() family. The thing (theoretically,
> possibly) corrupted would be our record-oriented message format, by
> interleaved printfs.
>
> (a) I'm not sure if this is possible at all in qemu.
It would be one more thing that is protected by the big QEMU lock.
Regarding your other comment, please use gmtime_r, not gmtime.
Paolo
> (b) Anyway, there are two ways to fix it:
>
> (b1) In error_report(), lock the stream across the two printfs with
> flockfile(). Probably overkill, and in case we're printing to the
> monitor, wasteful/useless. Or,
> (b2) Format the full message (including the timestamp) into a buffer
> (sprintf, vsprintf(), or their glib wrappers with automatic allocation,
> if any) and print it with a single error_printf("%s", buf).
>
> Anyway I absolutely do not insist on this, so sorry for the noise.
|
|
From: Laszlo E. <le...@re...> - 2013-04-04 14:52:16
|
On 02/01/13 15:53, Seiji Aguchi wrote:
> /*
> * Print an error message to current monitor if we have one, else to stderr.
> * Format arguments like sprintf(). The result should not contain
> @@ -207,6 +296,7 @@ void error_report(const char *fmt, ...)
> {
> va_list ap;
>
> + error_print_timestamp();
> error_print_loc();
> va_start(ap, fmt);
> error_vprintf(fmt, ap);
> -- 1.7.1
Side note: strictly in theory, this would result in two vfprintf()
calls. The message log would remain "record oriented", but (again, in
theory) another thread *might* get interleaved and mess up our format
with a parallel call to error_report() (or more deeply, to fprintf()).
Importantly I'm not talking about "corrupting data"; stdio streams are
automatically locked by the fprintf() family. The thing (theoretically,
possibly) corrupted would be our record-oriented message format, by
interleaved printfs.
(a) I'm not sure if this is possible at all in qemu.
(b) Anyway, there are two ways to fix it:
(b1) In error_report(), lock the stream across the two printfs with
flockfile(). Probably overkill, and in case we're printing to the
monitor, wasteful/useless. Or,
(b2) Format the full message (including the timestamp) into a buffer
(sprintf, vsprintf(), or their glib wrappers with automatic allocation,
if any) and print it with a single error_printf("%s", buf).
Anyway I absolutely do not insist on this, so sorry for the noise.
Thanks
Laszlo
|
|
From: Markus A. <ar...@re...> - 2013-04-03 08:19:49
|
Seiji Aguchi <sei...@hd...> writes: > Anthony, > >> Currently my company doesn't allow me to use git-send-email to send a email. >> So now I'm trying to find work around with IT. > > Satoru and I still talk if we can use git-send-email with my company IT. > But, it seems to take a long time... > > So, I attached Satoru's patch. > Could you please apply it to your tree? If your company's e-mail is broken, consider using a non-broken, non-company e-mail provider for patch submissions. |
|
From: Anthony L. <ali...@us...> - 2013-04-02 19:12:41
|
Seiji Aguchi <sei...@hd...> writes: > Anthony, > >> Currently my company doesn't allow me to use git-send-email to send a email. >> So now I'm trying to find work around with IT. > > Satoru and I still talk if we can use git-send-email with my company IT. > But, it seems to take a long time... > > So, I attached Satoru's patch. > Could you please apply it to your tree? Even attached, the patch still has carriage returns. I don't have a lot of confidence in accept a patch if it cannot be submitted in the correct fashion at least. Regards, Anthony Liguori > > Seiji > >> -----Original Message----- >> From: Satoru Moriya >> Sent: Thursday, March 28, 2013 12:43 AM >> To: Anthony Liguori; qem...@no... >> Cc: Jan Kiszka; mto...@re...; Paolo Bonzini; Seiji Aguchi; Tomoki Sekiyama; dle...@li...; >> sat...@hi... >> Subject: RE: [Qemu-devel] [PATCH v4] Add option to mlock qemu and guest memory >> >> > -----Original Message----- >> > From: Anthony Liguori [mailto:ali...@us...] >> > Sent: Wednesday, March 27, 2013 11:18 AM >> > To: Satoru Moriya; qem...@no... >> > Cc: Jan Kiszka; mto...@re...; Paolo Bonzini; Seiji Aguchi; >> > Tomoki Sekiyama; dle- de...@li...; >> > sat...@hi... >> > Subject: Re: [Qemu-devel] [PATCH v4] Add option to mlock qemu and >> > guest memory >> > >> > Satoru Moriya <sat...@hd...> writes: >> > >> > > In certain scenario, latency induced by paging is significant and >> > > memory locking is needed. Also, in the scenario with untrusted >> > > guests, latency improvement due to mlock is desired. >> > > >> > > This patch introduces a following new option to mlock guest and qemu >> > > memory: >> > > >> > > -realtime mlock=on|off >> > > >> > > Signed-off-by: Satoru Moriya <sat...@hd...> >> > > Reviewed-by: Paolo Bonzini <pbo...@re...> >> > > Reviewed-by: Marcelo Tosatti <mto...@re...> >> > >> > This patch doesn't apply because you're sending it MIME encoded and >> > the patch has carriage returns in it. I think your mailer is badly munging the patch. >> > >> > Please send it via git-send-email directly from the repository. >> >> I'm sorry for bothering you... >> >> Currently my company doesn't allow me to use git-send-email to send a email. >> So now I'm trying to find work around with IT. >> >> Once it is solved, I re-post the patch. >> >> Regards, >> Satoru |
|
From: Seiji A. <sei...@hd...> - 2013-04-02 15:03:33
|
Anthony, > Currently my company doesn't allow me to use git-send-email to send a email. > So now I'm trying to find work around with IT. Satoru and I still talk if we can use git-send-email with my company IT. But, it seems to take a long time... So, I attached Satoru's patch. Could you please apply it to your tree? Seiji > -----Original Message----- > From: Satoru Moriya > Sent: Thursday, March 28, 2013 12:43 AM > To: Anthony Liguori; qem...@no... > Cc: Jan Kiszka; mto...@re...; Paolo Bonzini; Seiji Aguchi; Tomoki Sekiyama; dle...@li...; > sat...@hi... > Subject: RE: [Qemu-devel] [PATCH v4] Add option to mlock qemu and guest memory > > > -----Original Message----- > > From: Anthony Liguori [mailto:ali...@us...] > > Sent: Wednesday, March 27, 2013 11:18 AM > > To: Satoru Moriya; qem...@no... > > Cc: Jan Kiszka; mto...@re...; Paolo Bonzini; Seiji Aguchi; > > Tomoki Sekiyama; dle- de...@li...; > > sat...@hi... > > Subject: Re: [Qemu-devel] [PATCH v4] Add option to mlock qemu and > > guest memory > > > > Satoru Moriya <sat...@hd...> writes: > > > > > In certain scenario, latency induced by paging is significant and > > > memory locking is needed. Also, in the scenario with untrusted > > > guests, latency improvement due to mlock is desired. > > > > > > This patch introduces a following new option to mlock guest and qemu > > > memory: > > > > > > -realtime mlock=on|off > > > > > > Signed-off-by: Satoru Moriya <sat...@hd...> > > > Reviewed-by: Paolo Bonzini <pbo...@re...> > > > Reviewed-by: Marcelo Tosatti <mto...@re...> > > > > This patch doesn't apply because you're sending it MIME encoded and > > the patch has carriage returns in it. I think your mailer is badly munging the patch. > > > > Please send it via git-send-email directly from the repository. > > I'm sorry for bothering you... > > Currently my company doesn't allow me to use git-send-email to send a email. > So now I'm trying to find work around with IT. > > Once it is solved, I re-post the patch. > > Regards, > Satoru |
|
From: Paolo B. <pbo...@re...> - 2013-03-28 23:45:23
|
Il 28/03/2013 20:43, Seiji Aguchi ha scritto: > [Problem] > Currently, guest OS's messages can be logged to a local disk of host OS by creating chadevs with options below. > -chardev file,id=charserial0,path=<log file's path> -device isa-serial,chardev=chardevserial0,id=serial0 > > When a hardware failure happens in the disk, qemu-kvm can't create the chardevs. > In this case, guest OS doesn't boot up. > > Actually, there are users who don't desire that guest OS goes down due to a hardware failure of a log disk only. > Therefore, qemu should offer some way to boot guest OS up even if the log disk is broken. > > [Solution] > This patch skips error checks in case where opening a log file and creating chardev fail. This kind of policy does not belong in QEMU. QEMU should always fail if it sees a problem. Your management tool should just open the file outside QEMU, and then use either /dev/fd/NN or /dev/null as the destination. Paolo |
|
From: Seiji A. <sei...@hd...> - 2013-03-28 19:43:57
|
[Problem]
Currently, guest OS's messages can be logged to a local disk of host OS by creating chadevs with options below.
-chardev file,id=charserial0,path=<log file's path> -device isa-serial,chardev=chardevserial0,id=serial0
When a hardware failure happens in the disk, qemu-kvm can't create the chardevs.
In this case, guest OS doesn't boot up.
Actually, there are users who don't desire that guest OS goes down due to a hardware failure of a log disk only.
Therefore, qemu should offer some way to boot guest OS up even if the log disk is broken.
[Solution]
This patch skips error checks in case where opening a log file and creating chardev fail.
Signed-off-by: Seiji Aguchi <sei...@hd...>
---
hw/qdev-properties-system.c | 2 +-
hw/serial-isa.c | 11 +++++++----
hw/serial-pci.c | 27 ++++++++++++++++-----------
hw/serial.c | 33 +++++++++++++++++++--------------
hw/serial.h | 2 +-
vl.c | 4 +++-
6 files changed, 47 insertions(+), 32 deletions(-)
diff --git a/hw/qdev-properties-system.c b/hw/qdev-properties-system.c
index d9934b5..99eff4f 100644
--- a/hw/qdev-properties-system.c
+++ b/hw/qdev-properties-system.c
@@ -121,7 +121,7 @@ static int parse_chr(DeviceState *dev, const char *str, void **ptr)
{
CharDriverState *chr = qemu_chr_find(str);
if (chr == NULL) {
- return -ENOENT;
+ return 0;
}
if (chr->avail_connections < 1) {
return -EEXIST;
diff --git a/hw/serial-isa.c b/hw/serial-isa.c
index a630a7d..b6cfa0d 100644
--- a/hw/serial-isa.c
+++ b/hw/serial-isa.c
@@ -46,6 +46,7 @@ static int serial_isa_initfn(ISADevice *dev)
static int index;
ISASerialState *isa = DO_UPCAST(ISASerialState, dev, dev);
SerialState *s = &isa->state;
+ int rc;
if (isa->index == -1) {
isa->index = index;
@@ -63,11 +64,13 @@ static int serial_isa_initfn(ISADevice *dev)
s->baudbase = 115200;
isa_init_irq(dev, &s->irq, isa->isairq);
- serial_init_core(s);
- qdev_set_legacy_instance_id(&dev->qdev, isa->iobase, 3);
+ rc = serial_init_core(s);
+ if (!rc) {
+ qdev_set_legacy_instance_id(&dev->qdev, isa->iobase, 3);
- memory_region_init_io(&s->io, &serial_io_ops, s, "serial", 8);
- isa_register_ioport(dev, &s->io, isa->iobase);
+ memory_region_init_io(&s->io, &serial_io_ops, s, "serial", 8);
+ isa_register_ioport(dev, &s->io, isa->iobase);
+ }
return 0;
}
diff --git a/hw/serial-pci.c b/hw/serial-pci.c
index 954657b..3e8b2d5 100644
--- a/hw/serial-pci.c
+++ b/hw/serial-pci.c
@@ -49,15 +49,18 @@ static int serial_pci_init(PCIDevice *dev)
{
PCISerialState *pci = DO_UPCAST(PCISerialState, dev, dev);
SerialState *s = &pci->state;
+ int rc;
s->baudbase = 115200;
- serial_init_core(s);
+ rc = serial_init_core(s);
- pci->dev.config[PCI_INTERRUPT_PIN] = 0x01;
- s->irq = pci->dev.irq[0];
+ if (!rc) {
+ pci->dev.config[PCI_INTERRUPT_PIN] = 0x01;
+ s->irq = pci->dev.irq[0];
- memory_region_init_io(&s->io, &serial_io_ops, s, "serial", 8);
- pci_register_bar(&pci->dev, 0, PCI_BASE_ADDRESS_SPACE_IO, &s->io);
+ memory_region_init_io(&s->io, &serial_io_ops, s, "serial", 8);
+ pci_register_bar(&pci->dev, 0, PCI_BASE_ADDRESS_SPACE_IO, &s->io);
+ }
return 0;
}
@@ -80,7 +83,7 @@ static int multi_serial_pci_init(PCIDevice *dev)
PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev);
PCIMultiSerialState *pci = DO_UPCAST(PCIMultiSerialState, dev, dev);
SerialState *s;
- int i;
+ int i, rc;
switch (pc->device_id) {
case 0x0003:
@@ -102,11 +105,13 @@ static int multi_serial_pci_init(PCIDevice *dev)
for (i = 0; i < pci->ports; i++) {
s = pci->state + i;
s->baudbase = 115200;
- serial_init_core(s);
- s->irq = pci->irqs[i];
- pci->name[i] = g_strdup_printf("uart #%d", i+1);
- memory_region_init_io(&s->io, &serial_io_ops, s, pci->name[i], 8);
- memory_region_add_subregion(&pci->iobar, 8 * i, &s->io);
+ rc = serial_init_core(s);
+ if (!rc) {
+ s->irq = pci->irqs[i];
+ pci->name[i] = g_strdup_printf("uart #%d", i+1);
+ memory_region_init_io(&s->io, &serial_io_ops, s, pci->name[i], 8);
+ memory_region_add_subregion(&pci->iobar, 8 * i, &s->io);
+ }
}
return 0;
}
diff --git a/hw/serial.c b/hw/serial.c
index 0ccc499..ab8929e 100644
--- a/hw/serial.c
+++ b/hw/serial.c
@@ -670,11 +670,11 @@ static void serial_reset(void *opaque)
qemu_irq_lower(s->irq);
}
-void serial_init_core(SerialState *s)
+int serial_init_core(SerialState *s)
{
if (!s->chr) {
fprintf(stderr, "Can't create serial device, empty char device\n");
- exit(1);
+ return 1;
}
s->modem_status_poll = qemu_new_timer_ns(vm_clock, (QEMUTimerCB *) serial_update_msl, s);
@@ -684,6 +684,7 @@ void serial_init_core(SerialState *s)
qemu_chr_add_handlers(s->chr, serial_can_receive1, serial_receive1,
serial_event, s);
+ return 0;
}
void serial_exit_core(SerialState *s)
@@ -713,19 +714,20 @@ SerialState *serial_init(int base, qemu_irq irq, int baudbase,
CharDriverState *chr, MemoryRegion *system_io)
{
SerialState *s;
+ int rc;
s = g_malloc0(sizeof(SerialState));
s->irq = irq;
s->baudbase = baudbase;
s->chr = chr;
- serial_init_core(s);
-
- vmstate_register(NULL, base, &vmstate_serial, s);
-
- memory_region_init_io(&s->io, &serial_io_ops, s, "serial", 8);
- memory_region_add_subregion(system_io, base, &s->io);
+ rc = serial_init_core(s);
+ if (!rc) {
+ vmstate_register(NULL, base, &vmstate_serial, s);
+ memory_region_init_io(&s->io, &serial_io_ops, s, "serial", 8);
+ memory_region_add_subregion(system_io, base, &s->io);
+ }
return s;
}
@@ -769,6 +771,7 @@ SerialState *serial_mm_init(MemoryRegion *address_space,
CharDriverState *chr, enum device_endian end)
{
SerialState *s;
+ int rc;
s = g_malloc0(sizeof(SerialState));
@@ -777,13 +780,15 @@ SerialState *serial_mm_init(MemoryRegion *address_space,
s->baudbase = baudbase;
s->chr = chr;
- serial_init_core(s);
- vmstate_register(NULL, base, &vmstate_serial, s);
+ rc = serial_init_core(s);
+ if (!rc) {
+ vmstate_register(NULL, base, &vmstate_serial, s);
- memory_region_init_io(&s->io, &serial_mm_ops[end], s,
- "serial", 8 << it_shift);
- memory_region_add_subregion(address_space, base, &s->io);
+ memory_region_init_io(&s->io, &serial_mm_ops[end], s,
+ "serial", 8 << it_shift);
+ memory_region_add_subregion(address_space, base, &s->io);
- serial_update_msl(s);
+ serial_update_msl(s);
+ }
return s;
}
diff --git a/hw/serial.h b/hw/serial.h
index e884499..0ba2332 100644
--- a/hw/serial.h
+++ b/hw/serial.h
@@ -83,7 +83,7 @@ struct SerialState {
extern const VMStateDescription vmstate_serial;
extern const MemoryRegionOps serial_io_ops;
-void serial_init_core(SerialState *s);
+int serial_init_core(SerialState *s);
void serial_exit_core(SerialState *s);
void serial_set_frequency(SerialState *s, uint32_t frequency);
diff --git a/vl.c b/vl.c
index 7643f16..62e983e 100644
--- a/vl.c
+++ b/vl.c
@@ -2348,7 +2348,9 @@ static int chardev_init_func(QemuOpts *opts, void *opaque)
if (error_is_set(&local_err)) {
fprintf(stderr, "%s\n", error_get_pretty(local_err));
error_free(local_err);
- return -1;
+ if (strcmp("file", qemu_opt_get(opts, "backend"))) {
+ return -1;
+ }
}
return 0;
}
-- 1.7.1
|
|
From: Satoru M. <sat...@hd...> - 2013-03-28 04:43:21
|
> -----Original Message----- > From: Anthony Liguori [mailto:ali...@us...] > Sent: Wednesday, March 27, 2013 11:18 AM > To: Satoru Moriya; qem...@no... > Cc: Jan Kiszka; mto...@re...; Paolo Bonzini; Seiji Aguchi; Tomoki Sekiyama; dle- > de...@li...; sat...@hi... > Subject: Re: [Qemu-devel] [PATCH v4] Add option to mlock qemu and guest memory > > Satoru Moriya <sat...@hd...> writes: > > > In certain scenario, latency induced by paging is significant and > > memory locking is needed. Also, in the scenario with untrusted guests, > > latency improvement due to mlock is desired. > > > > This patch introduces a following new option to mlock guest and qemu > > memory: > > > > -realtime mlock=on|off > > > > Signed-off-by: Satoru Moriya <sat...@hd...> > > Reviewed-by: Paolo Bonzini <pbo...@re...> > > Reviewed-by: Marcelo Tosatti <mto...@re...> > > This patch doesn't apply because you're sending it MIME encoded and the patch has carriage returns in > it. I think your mailer is badly munging the patch. > > Please send it via git-send-email directly from the repository. I'm sorry for bothering you... Currently my company doesn't allow me to use git-send-email to send a email. So now I'm trying to find work around with IT. Once it is solved, I re-post the patch. Regards, Satoru |
|
From: Anthony L. <ali...@us...> - 2013-03-27 15:18:26
|
Satoru Moriya <sat...@hd...> writes:
> In certain scenario, latency induced by paging is significant and
> memory locking is needed. Also, in the scenario with untrusted
> guests, latency improvement due to mlock is desired.
>
> This patch introduces a following new option to mlock guest and
> qemu memory:
>
> -realtime mlock=on|off
>
> Signed-off-by: Satoru Moriya <sat...@hd...>
> Reviewed-by: Paolo Bonzini <pbo...@re...>
> Reviewed-by: Marcelo Tosatti <mto...@re...>
This patch doesn't apply because you're sending it MIME encoded and the
patch has carriage returns in it. I think your mailer is badly munging
the patch.
Please send it via git-send-email directly from the repository.
Regards,
Anthony Liguori
> ---
> ChangeLog:
> v4
> - Update commit message
> v3
> - Modify os_mlock() to return error code
> - Update configure_realtime() to handle return value from os_mlock()
> - Change the variable name from is_mlock to enable_mlock in configure_realtime()
> - Rebase qemu version 1.4.50
>
> v2
> - Change option name from -mlock to -realtime mlock=on|off
> - Rebase qemu version 1.3.91
> - Update patch description
>
> include/sysemu/os-posix.h | 1 +
> include/sysemu/os-win32.h | 5 +++++
> os-posix.c | 12 ++++++++++++
> qemu-options.hx | 13 +++++++++++++
> vl.c | 34 ++++++++++++++++++++++++++++++++++
> 5 files changed, 65 insertions(+)
>
> diff --git a/include/sysemu/os-posix.h b/include/sysemu/os-posix.h
> index 7f198e4..25d0b2a 100644
> --- a/include/sysemu/os-posix.h
> +++ b/include/sysemu/os-posix.h
> @@ -31,6 +31,7 @@ void os_set_proc_name(const char *s);
> void os_setup_signal_handling(void);
> void os_daemonize(void);
> void os_setup_post(void);
> +int os_mlock(void);
>
> typedef struct timeval qemu_timeval;
> #define qemu_gettimeofday(tp) gettimeofday(tp, NULL)
> diff --git a/include/sysemu/os-win32.h b/include/sysemu/os-win32.h
> index 71f5fa0..bf8523a 100644
> --- a/include/sysemu/os-win32.h
> +++ b/include/sysemu/os-win32.h
> @@ -106,4 +106,9 @@ static inline bool is_daemonized(void)
> return false;
> }
>
> +static inline int os_mlock(void)
> +{
> + return -ENOSYS;
> +}
> +
> #endif
> diff --git a/os-posix.c b/os-posix.c
> index 5c64518..d39261d 100644
> --- a/os-posix.c
> +++ b/os-posix.c
> @@ -363,3 +363,15 @@ bool is_daemonized(void)
> {
> return daemonize;
> }
> +
> +int os_mlock(void)
> +{
> + int ret = 0;
> +
> + ret = mlockall(MCL_CURRENT | MCL_FUTURE);
> + if (ret < 0) {
> + perror("mlockall");
> + }
> +
> + return ret;
> +}
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 06dd565..1ec9541 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -2569,6 +2569,19 @@ STEXI
> Do not start CPU at startup (you must type 'c' in the monitor).
> ETEXI
>
> +DEF("realtime", HAS_ARG, QEMU_OPTION_realtime,
> + "-realtime [mlock=on|off]\n"
> + " run qemu with realtime features\n"
> + " mlock=on|off controls mlock support (default: on)\n",
> + QEMU_ARCH_ALL)
> +STEXI
> +@item -realtime mlock=on|off
> +@findex -realtime
> +Run qemu with realtime features.
> +mlocking qemu and guest memory can be enabled via @option{mlock=on}
> +(enabled by default).
> +ETEXI
> +
> DEF("gdb", HAS_ARG, QEMU_OPTION_gdb, \
> "-gdb dev wait for gdb connection on 'dev'\n", QEMU_ARCH_ALL)
> STEXI
> diff --git a/vl.c b/vl.c
> index aeed7f4..71bbcf1 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -521,6 +521,18 @@ static QemuOptsList qemu_tpmdev_opts = {
> },
> };
>
> +static QemuOptsList qemu_realtime_opts = {
> + .name = "realtime",
> + .head = QTAILQ_HEAD_INITIALIZER(qemu_realtime_opts.head),
> + .desc = {
> + {
> + .name = "mlock",
> + .type = QEMU_OPT_BOOL,
> + },
> + { /* end of list */ }
> + },
> +};
> +
> const char *qemu_get_vm_name(void)
> {
> return qemu_name;
> @@ -1420,6 +1432,20 @@ static void smp_parse(const char *optarg)
> max_cpus = smp_cpus;
> }
>
> +static void configure_realtime(QemuOpts *opts)
> +{
> + bool enable_mlock;
> +
> + enable_mlock = qemu_opt_get_bool(opts, "mlock", true);
> +
> + if (enable_mlock) {
> + if (os_mlock() < 0) {
> + fprintf(stderr, "qemu: locking memory failed\n");
> + exit(1);
> + }
> + }
> +}
> +
> /***********************************************************/
> /* USB devices */
>
> @@ -2909,6 +2935,7 @@ int main(int argc, char **argv, char **envp)
> qemu_add_opts(&qemu_add_fd_opts);
> qemu_add_opts(&qemu_object_opts);
> qemu_add_opts(&qemu_tpmdev_opts);
> + qemu_add_opts(&qemu_realtime_opts);
>
> runstate_init();
>
> @@ -3878,6 +3905,13 @@ int main(int argc, char **argv, char **envp)
> exit(1);
> }
> break;
> + case QEMU_OPTION_realtime:
> + opts = qemu_opts_parse(qemu_find_opts("realtime"), optarg, 0);
> + if (!opts) {
> + exit(1);
> + }
> + configure_realtime(opts);
> + break;
> default:
> os_parse_cmd_args(popt->index, optarg);
> }
> --
> 1.7.11.7
|
|
From: Satoru M. <sat...@hd...> - 2013-03-26 15:05:56
|
In certain scenario, latency induced by paging is significant and
memory locking is needed. Also, in the scenario with untrusted
guests, latency improvement due to mlock is desired.
This patch introduces a following new option to mlock guest and
qemu memory:
-realtime mlock=on|off
Signed-off-by: Satoru Moriya <sat...@hd...>
Reviewed-by: Paolo Bonzini <pbo...@re...>
Reviewed-by: Marcelo Tosatti <mto...@re...>
---
ChangeLog:
v4
- Update commit message
v3
- Modify os_mlock() to return error code
- Update configure_realtime() to handle return value from os_mlock()
- Change the variable name from is_mlock to enable_mlock in configure_realtime()
- Rebase qemu version 1.4.50
v2
- Change option name from -mlock to -realtime mlock=on|off
- Rebase qemu version 1.3.91
- Update patch description
include/sysemu/os-posix.h | 1 +
include/sysemu/os-win32.h | 5 +++++
os-posix.c | 12 ++++++++++++
qemu-options.hx | 13 +++++++++++++
vl.c | 34 ++++++++++++++++++++++++++++++++++
5 files changed, 65 insertions(+)
diff --git a/include/sysemu/os-posix.h b/include/sysemu/os-posix.h
index 7f198e4..25d0b2a 100644
--- a/include/sysemu/os-posix.h
+++ b/include/sysemu/os-posix.h
@@ -31,6 +31,7 @@ void os_set_proc_name(const char *s);
void os_setup_signal_handling(void);
void os_daemonize(void);
void os_setup_post(void);
+int os_mlock(void);
typedef struct timeval qemu_timeval;
#define qemu_gettimeofday(tp) gettimeofday(tp, NULL)
diff --git a/include/sysemu/os-win32.h b/include/sysemu/os-win32.h
index 71f5fa0..bf8523a 100644
--- a/include/sysemu/os-win32.h
+++ b/include/sysemu/os-win32.h
@@ -106,4 +106,9 @@ static inline bool is_daemonized(void)
return false;
}
+static inline int os_mlock(void)
+{
+ return -ENOSYS;
+}
+
#endif
diff --git a/os-posix.c b/os-posix.c
index 5c64518..d39261d 100644
--- a/os-posix.c
+++ b/os-posix.c
@@ -363,3 +363,15 @@ bool is_daemonized(void)
{
return daemonize;
}
+
+int os_mlock(void)
+{
+ int ret = 0;
+
+ ret = mlockall(MCL_CURRENT | MCL_FUTURE);
+ if (ret < 0) {
+ perror("mlockall");
+ }
+
+ return ret;
+}
diff --git a/qemu-options.hx b/qemu-options.hx
index 06dd565..1ec9541 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2569,6 +2569,19 @@ STEXI
Do not start CPU at startup (you must type 'c' in the monitor).
ETEXI
+DEF("realtime", HAS_ARG, QEMU_OPTION_realtime,
+ "-realtime [mlock=on|off]\n"
+ " run qemu with realtime features\n"
+ " mlock=on|off controls mlock support (default: on)\n",
+ QEMU_ARCH_ALL)
+STEXI
+@item -realtime mlock=on|off
+@findex -realtime
+Run qemu with realtime features.
+mlocking qemu and guest memory can be enabled via @option{mlock=on}
+(enabled by default).
+ETEXI
+
DEF("gdb", HAS_ARG, QEMU_OPTION_gdb, \
"-gdb dev wait for gdb connection on 'dev'\n", QEMU_ARCH_ALL)
STEXI
diff --git a/vl.c b/vl.c
index aeed7f4..71bbcf1 100644
--- a/vl.c
+++ b/vl.c
@@ -521,6 +521,18 @@ static QemuOptsList qemu_tpmdev_opts = {
},
};
+static QemuOptsList qemu_realtime_opts = {
+ .name = "realtime",
+ .head = QTAILQ_HEAD_INITIALIZER(qemu_realtime_opts.head),
+ .desc = {
+ {
+ .name = "mlock",
+ .type = QEMU_OPT_BOOL,
+ },
+ { /* end of list */ }
+ },
+};
+
const char *qemu_get_vm_name(void)
{
return qemu_name;
@@ -1420,6 +1432,20 @@ static void smp_parse(const char *optarg)
max_cpus = smp_cpus;
}
+static void configure_realtime(QemuOpts *opts)
+{
+ bool enable_mlock;
+
+ enable_mlock = qemu_opt_get_bool(opts, "mlock", true);
+
+ if (enable_mlock) {
+ if (os_mlock() < 0) {
+ fprintf(stderr, "qemu: locking memory failed\n");
+ exit(1);
+ }
+ }
+}
+
/***********************************************************/
/* USB devices */
@@ -2909,6 +2935,7 @@ int main(int argc, char **argv, char **envp)
qemu_add_opts(&qemu_add_fd_opts);
qemu_add_opts(&qemu_object_opts);
qemu_add_opts(&qemu_tpmdev_opts);
+ qemu_add_opts(&qemu_realtime_opts);
runstate_init();
@@ -3878,6 +3905,13 @@ int main(int argc, char **argv, char **envp)
exit(1);
}
break;
+ case QEMU_OPTION_realtime:
+ opts = qemu_opts_parse(qemu_find_opts("realtime"), optarg, 0);
+ if (!opts) {
+ exit(1);
+ }
+ configure_realtime(opts);
+ break;
default:
os_parse_cmd_args(popt->index, optarg);
}
--
1.7.11.7
|
|
From: Anthony L. <ali...@us...> - 2013-03-26 14:59:20
|
Satoru Moriya <sat...@hd...> writes:
> ChangeLog:
> v3
> - Modify os_mlock() to return error code
> - Update configure_realtime() to handle return value from os_mlock()
> - Change the variable name from is_mlock to enable_mlock in configure_realtime()
> - Rebase qemu version 1.4.50
>
> v2
> - Change option name from -mlock to -realtime mlock=on|off
> - Rebase qemu version 1.3.91
> - Update patch description
>
> We have some plans to migrate legacy enterprise systems which require
> low latency (10 msec order) to kvm virtualized environment. In our
> usecase, the system runs with other untrusted guests and so locking
> memory which is used by the system is needed to avoid latency
> impacts from other guests' memory activity.
>
> ---
By putting this here, you're going to mess up the commit message.
Everything below '---' gets chopped off.
You want to move the changelog below the --- and the commit message
above it.
Please submit a v4 with a corrected commit message.
Regards,
Anthony Liguori
> In certain scenario, latency induced by paging is significant and
> memory locking is needed. Also, in the scenario with untrusted
> guests, latency improvement due to mlock is desired.
>
> This patch introduces a following new option to mlock guest and
> qemu memory:
>
> -realtime mlock=on|off
>
> Signed-off-by: Satoru Moriya <sat...@hd...>
> Reviewed-by: Paolo Bonzini <pbo...@re...>
> Reviewed-by: Marcelo Tosatti <mto...@re...>
>
> ---
> include/sysemu/os-posix.h | 1 +
> include/sysemu/os-win32.h | 5 +++++
> os-posix.c | 12 ++++++++++++
> qemu-options.hx | 13 +++++++++++++
> vl.c | 34 ++++++++++++++++++++++++++++++++++
> 5 files changed, 65 insertions(+)
>
> diff --git a/include/sysemu/os-posix.h b/include/sysemu/os-posix.h
> index 7f198e4..25d0b2a 100644
> --- a/include/sysemu/os-posix.h
> +++ b/include/sysemu/os-posix.h
> @@ -31,6 +31,7 @@ void os_set_proc_name(const char *s);
> void os_setup_signal_handling(void);
> void os_daemonize(void);
> void os_setup_post(void);
> +int os_mlock(void);
>
> typedef struct timeval qemu_timeval;
> #define qemu_gettimeofday(tp) gettimeofday(tp, NULL)
> diff --git a/include/sysemu/os-win32.h b/include/sysemu/os-win32.h
> index 71f5fa0..bf8523a 100644
> --- a/include/sysemu/os-win32.h
> +++ b/include/sysemu/os-win32.h
> @@ -106,4 +106,9 @@ static inline bool is_daemonized(void)
> return false;
> }
>
> +static inline int os_mlock(void)
> +{
> + return -ENOSYS;
> +}
> +
> #endif
> diff --git a/os-posix.c b/os-posix.c
> index 5c64518..d39261d 100644
> --- a/os-posix.c
> +++ b/os-posix.c
> @@ -363,3 +363,15 @@ bool is_daemonized(void)
> {
> return daemonize;
> }
> +
> +int os_mlock(void)
> +{
> + int ret = 0;
> +
> + ret = mlockall(MCL_CURRENT | MCL_FUTURE);
> + if (ret < 0) {
> + perror("mlockall");
> + }
> +
> + return ret;
> +}
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 06dd565..1ec9541 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -2569,6 +2569,19 @@ STEXI
> Do not start CPU at startup (you must type 'c' in the monitor).
> ETEXI
>
> +DEF("realtime", HAS_ARG, QEMU_OPTION_realtime,
> + "-realtime [mlock=on|off]\n"
> + " run qemu with realtime features\n"
> + " mlock=on|off controls mlock support (default: on)\n",
> + QEMU_ARCH_ALL)
> +STEXI
> +@item -realtime mlock=on|off
> +@findex -realtime
> +Run qemu with realtime features.
> +mlocking qemu and guest memory can be enabled via @option{mlock=on}
> +(enabled by default).
> +ETEXI
> +
> DEF("gdb", HAS_ARG, QEMU_OPTION_gdb, \
> "-gdb dev wait for gdb connection on 'dev'\n", QEMU_ARCH_ALL)
> STEXI
> diff --git a/vl.c b/vl.c
> index aeed7f4..71bbcf1 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -521,6 +521,18 @@ static QemuOptsList qemu_tpmdev_opts = {
> },
> };
>
> +static QemuOptsList qemu_realtime_opts = {
> + .name = "realtime",
> + .head = QTAILQ_HEAD_INITIALIZER(qemu_realtime_opts.head),
> + .desc = {
> + {
> + .name = "mlock",
> + .type = QEMU_OPT_BOOL,
> + },
> + { /* end of list */ }
> + },
> +};
> +
> const char *qemu_get_vm_name(void)
> {
> return qemu_name;
> @@ -1420,6 +1432,20 @@ static void smp_parse(const char *optarg)
> max_cpus = smp_cpus;
> }
>
> +static void configure_realtime(QemuOpts *opts)
> +{
> + bool enable_mlock;
> +
> + enable_mlock = qemu_opt_get_bool(opts, "mlock", true);
> +
> + if (enable_mlock) {
> + if (os_mlock() < 0) {
> + fprintf(stderr, "qemu: locking memory failed\n");
> + exit(1);
> + }
> + }
> +}
> +
> /***********************************************************/
> /* USB devices */
>
> @@ -2909,6 +2935,7 @@ int main(int argc, char **argv, char **envp)
> qemu_add_opts(&qemu_add_fd_opts);
> qemu_add_opts(&qemu_object_opts);
> qemu_add_opts(&qemu_tpmdev_opts);
> + qemu_add_opts(&qemu_realtime_opts);
>
> runstate_init();
>
> @@ -3878,6 +3905,13 @@ int main(int argc, char **argv, char **envp)
> exit(1);
> }
> break;
> + case QEMU_OPTION_realtime:
> + opts = qemu_opts_parse(qemu_find_opts("realtime"), optarg, 0);
> + if (!opts) {
> + exit(1);
> + }
> + configure_realtime(opts);
> + break;
> default:
> os_parse_cmd_args(popt->index, optarg);
> }
> --
> 1.7.11.7
|
|
From: Satoru M. <sat...@hd...> - 2013-03-26 14:40:44
|
> -----Original Message----- > From: Anthony Liguori [mailto:ali...@us...] > Sent: Tuesday, March 26, 2013 10:28 AM > To: Satoru Moriya; qem...@no... > Cc: Jan Kiszka; dle...@li...; Seiji Aguchi; Paolo Bonzini; mto...@re...; > Tomoki Sekiyama; sat...@hi... > Subject: Re: [Qemu-devel] [PATCH v3] Add option to mlock qemu and guest memory > > Satoru Moriya <sat...@hd...> writes: > > > ChangeLog: > > v3 > > - Modify os_mlock() to return error code > > - Update configure_realtime() to handle return value from os_mlock() > > - Change the variable name from is_mlock to enable_mlock in > > configure_realtime() > > - Rebase qemu version 1.4.50 > > > > v2 > > - Change option name from -mlock to -realtime mlock=on|off > > - Rebase qemu version 1.3.91 > > - Update patch description > > > > We have some plans to migrate legacy enterprise systems which require > > low latency (10 msec order) to kvm virtualized environment. In our > > usecase, the system runs with other untrusted guests and so locking > > memory which is used by the system is needed to avoid latency impacts > > from other guests' memory activity. > > > > --- > > By putting this here, you're going to mess up the commit message. > Everything below '---' gets chopped off. > > You want to move the changelog below the --- and the commit message above it. > > Please submit a v4 with a corrected commit message. Thanks. I'll post v4 soon. Regards, Satoru |
|
From: Satoru M. <sat...@hd...> - 2013-03-23 03:57:17
|
ChangeLog:
v3
- Modify os_mlock() to return error code
- Update configure_realtime() to handle return value from os_mlock()
- Change the variable name from is_mlock to enable_mlock in configure_realtime()
- Rebase qemu version 1.4.50
v2
- Change option name from -mlock to -realtime mlock=on|off
- Rebase qemu version 1.3.91
- Update patch description
We have some plans to migrate legacy enterprise systems which require
low latency (10 msec order) to kvm virtualized environment. In our
usecase, the system runs with other untrusted guests and so locking
memory which is used by the system is needed to avoid latency
impacts from other guests' memory activity.
---
In certain scenario, latency induced by paging is significant and
memory locking is needed. Also, in the scenario with untrusted
guests, latency improvement due to mlock is desired.
This patch introduces a following new option to mlock guest and
qemu memory:
-realtime mlock=on|off
Signed-off-by: Satoru Moriya <sat...@hd...>
Reviewed-by: Paolo Bonzini <pbo...@re...>
Reviewed-by: Marcelo Tosatti <mto...@re...>
---
include/sysemu/os-posix.h | 1 +
include/sysemu/os-win32.h | 5 +++++
os-posix.c | 12 ++++++++++++
qemu-options.hx | 13 +++++++++++++
vl.c | 34 ++++++++++++++++++++++++++++++++++
5 files changed, 65 insertions(+)
diff --git a/include/sysemu/os-posix.h b/include/sysemu/os-posix.h
index 7f198e4..25d0b2a 100644
--- a/include/sysemu/os-posix.h
+++ b/include/sysemu/os-posix.h
@@ -31,6 +31,7 @@ void os_set_proc_name(const char *s);
void os_setup_signal_handling(void);
void os_daemonize(void);
void os_setup_post(void);
+int os_mlock(void);
typedef struct timeval qemu_timeval;
#define qemu_gettimeofday(tp) gettimeofday(tp, NULL)
diff --git a/include/sysemu/os-win32.h b/include/sysemu/os-win32.h
index 71f5fa0..bf8523a 100644
--- a/include/sysemu/os-win32.h
+++ b/include/sysemu/os-win32.h
@@ -106,4 +106,9 @@ static inline bool is_daemonized(void)
return false;
}
+static inline int os_mlock(void)
+{
+ return -ENOSYS;
+}
+
#endif
diff --git a/os-posix.c b/os-posix.c
index 5c64518..d39261d 100644
--- a/os-posix.c
+++ b/os-posix.c
@@ -363,3 +363,15 @@ bool is_daemonized(void)
{
return daemonize;
}
+
+int os_mlock(void)
+{
+ int ret = 0;
+
+ ret = mlockall(MCL_CURRENT | MCL_FUTURE);
+ if (ret < 0) {
+ perror("mlockall");
+ }
+
+ return ret;
+}
diff --git a/qemu-options.hx b/qemu-options.hx
index 06dd565..1ec9541 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2569,6 +2569,19 @@ STEXI
Do not start CPU at startup (you must type 'c' in the monitor).
ETEXI
+DEF("realtime", HAS_ARG, QEMU_OPTION_realtime,
+ "-realtime [mlock=on|off]\n"
+ " run qemu with realtime features\n"
+ " mlock=on|off controls mlock support (default: on)\n",
+ QEMU_ARCH_ALL)
+STEXI
+@item -realtime mlock=on|off
+@findex -realtime
+Run qemu with realtime features.
+mlocking qemu and guest memory can be enabled via @option{mlock=on}
+(enabled by default).
+ETEXI
+
DEF("gdb", HAS_ARG, QEMU_OPTION_gdb, \
"-gdb dev wait for gdb connection on 'dev'\n", QEMU_ARCH_ALL)
STEXI
diff --git a/vl.c b/vl.c
index aeed7f4..71bbcf1 100644
--- a/vl.c
+++ b/vl.c
@@ -521,6 +521,18 @@ static QemuOptsList qemu_tpmdev_opts = {
},
};
+static QemuOptsList qemu_realtime_opts = {
+ .name = "realtime",
+ .head = QTAILQ_HEAD_INITIALIZER(qemu_realtime_opts.head),
+ .desc = {
+ {
+ .name = "mlock",
+ .type = QEMU_OPT_BOOL,
+ },
+ { /* end of list */ }
+ },
+};
+
const char *qemu_get_vm_name(void)
{
return qemu_name;
@@ -1420,6 +1432,20 @@ static void smp_parse(const char *optarg)
max_cpus = smp_cpus;
}
+static void configure_realtime(QemuOpts *opts)
+{
+ bool enable_mlock;
+
+ enable_mlock = qemu_opt_get_bool(opts, "mlock", true);
+
+ if (enable_mlock) {
+ if (os_mlock() < 0) {
+ fprintf(stderr, "qemu: locking memory failed\n");
+ exit(1);
+ }
+ }
+}
+
/***********************************************************/
/* USB devices */
@@ -2909,6 +2935,7 @@ int main(int argc, char **argv, char **envp)
qemu_add_opts(&qemu_add_fd_opts);
qemu_add_opts(&qemu_object_opts);
qemu_add_opts(&qemu_tpmdev_opts);
+ qemu_add_opts(&qemu_realtime_opts);
runstate_init();
@@ -3878,6 +3905,13 @@ int main(int argc, char **argv, char **envp)
exit(1);
}
break;
+ case QEMU_OPTION_realtime:
+ opts = qemu_opts_parse(qemu_find_opts("realtime"), optarg, 0);
+ if (!opts) {
+ exit(1);
+ }
+ configure_realtime(opts);
+ break;
default:
os_parse_cmd_args(popt->index, optarg);
}
--
1.7.11.7
|
|
From: Satoru M. <sat...@hd...> - 2013-03-21 19:08:31
|
Hi Jan,
Thank you for reviewing.
On 03/21/2013 05:08 AM, Jan Kiszka wrote:
> On 2013-02-14 21:21, Satoru Moriya wrote:
>> @@ -31,6 +31,7 @@ void os_set_proc_name(const char *s); void
>> os_setup_signal_handling(void); void os_daemonize(void); void
>> os_setup_post(void);
>> +void os_mlock(void);
>>
>> typedef struct timeval qemu_timeval; #define qemu_gettimeofday(tp)
>> gettimeofday(tp, NULL) diff --git a/include/sysemu/os-win32.h
>> b/include/sysemu/os-win32.h index bf9edeb..a74ca13 100644
>> --- a/include/sysemu/os-win32.h
>> +++ b/include/sysemu/os-win32.h
>> @@ -80,6 +80,7 @@ static inline void os_daemonize(void) {} static
>> inline void os_setup_post(void) {} void os_set_line_buffering(void);
>> static inline void os_set_proc_name(const char *dummy) {}
>> +static inline void os_mlock(void) {}
>
> Let's return an error code from os_mlock, a permanent failure like
> -ENOSYS on Win32, the result of mlock() on POSIX. Requesting to lock
> the memory should not silently fail on Windows.
OK. I'll update it.
>> diff --git a/vl.c b/vl.c
>> index 1355f69..c16c8ad 100644
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -491,6 +491,18 @@ static QemuOptsList qemu_object_opts = {
>> },
>> };
>>
>> +static QemuOptsList qemu_realtime_opts = {
>> + .name = "realtime",
>> + .head = QTAILQ_HEAD_INITIALIZER(qemu_realtime_opts.head),
>> + .desc = {
>> + {
>> + .name = "mlock",
>> + .type = QEMU_OPT_BOOL,
>> + },
>> + { /* end of list */ }
>> + },
>> +};
>> +
>> const char *qemu_get_vm_name(void)
>> {
>> return qemu_name;
>> @@ -1384,6 +1396,17 @@ static void smp_parse(const char *optarg)
>> max_cpus = smp_cpus;
>> }
>>
>> +static void configure_realtime(QemuOpts *opts) {
>> + bool is_mlock;
>
> "enable_mlock" or just "mlock".
will do.
Regards,
Satoru
|
|
From: Jan K. <jan...@si...> - 2013-03-21 09:08:26
|
On 2013-02-14 21:21, Satoru Moriya wrote:
> We have some plans to migrate legacy enterprise systems which require
> low latency (10 msec order) to kvm virtualized environment. In our
> usecase, the system runs with other untrusted guests and so locking
> memory which is used by the system is needed to avoid latency
> impacts from other guests' memory activity.
>
> ChangeLog:
> v2
> - Change the option name from -mlock to -realtime mlock=on|off
> - Rebase qemu version 1.3.91
> - Update patch description
>
> ---
> In certain scenario, latency induced by paging is significant and
> memory locking is needed. Also, in the scenario with untrusted
> guests, latency improvement due to mlock is desired.
>
> This patch introduces a following new option to mlock guest and
> qemu memory:
>
> -realtime mlock=on|off
>
> Signed-off-by: Satoru Moriya <sat...@hd...>
> ---
> include/sysemu/os-posix.h | 1 +
> include/sysemu/os-win32.h | 1 +
> os-posix.c | 8 ++++++++
> qemu-options.hx | 13 +++++++++++++
> vl.c | 31 +++++++++++++++++++++++++++++++
> 5 files changed, 54 insertions(+)
>
> diff --git a/include/sysemu/os-posix.h b/include/sysemu/os-posix.h
> index 7f198e4..2f2ead6 100644
> --- a/include/sysemu/os-posix.h
> +++ b/include/sysemu/os-posix.h
> @@ -31,6 +31,7 @@ void os_set_proc_name(const char *s);
> void os_setup_signal_handling(void);
> void os_daemonize(void);
> void os_setup_post(void);
> +void os_mlock(void);
>
> typedef struct timeval qemu_timeval;
> #define qemu_gettimeofday(tp) gettimeofday(tp, NULL)
> diff --git a/include/sysemu/os-win32.h b/include/sysemu/os-win32.h
> index bf9edeb..a74ca13 100644
> --- a/include/sysemu/os-win32.h
> +++ b/include/sysemu/os-win32.h
> @@ -80,6 +80,7 @@ static inline void os_daemonize(void) {}
> static inline void os_setup_post(void) {}
> void os_set_line_buffering(void);
> static inline void os_set_proc_name(const char *dummy) {}
> +static inline void os_mlock(void) {}
Let's return an error code from os_mlock, a permanent failure like
-ENOSYS on Win32, the result of mlock() on POSIX. Requesting to lock the
memory should not silently fail on Windows.
>
> #if !defined(EPROTONOSUPPORT)
> # define EPROTONOSUPPORT EINVAL
> diff --git a/os-posix.c b/os-posix.c
> index 5c64518..1304b0e 100644
> --- a/os-posix.c
> +++ b/os-posix.c
> @@ -363,3 +363,11 @@ bool is_daemonized(void)
> {
> return daemonize;
> }
> +
> +void os_mlock(void)
> +{
> + if (mlockall(MCL_CURRENT | MCL_FUTURE)) {
> + perror("mlockall");
> + exit(1);
> + }
> +}
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 9d7131a..843fcb4 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -2464,6 +2464,19 @@ STEXI
> Do not start CPU at startup (you must type 'c' in the monitor).
> ETEXI
>
> +DEF("realtime", HAS_ARG, QEMU_OPTION_realtime,
> + "-realtime [mlock=on|off]\n"
Hmm, an argument to -realtime is actually not optional but mandatory.
But we cannot express this. So this is just a comment, not a to-do.
> + " run qemu with realtime features\n"
> + " mlock=on|off controls mlock support (default: on)\n",
> + QEMU_ARCH_ALL)
> +STEXI
> +@item -realtime mlock=on|off
> +@findex -realtime
> +Run qemu with realtime features.
> +mlocking qemu and guest memory can be enabled via @option{mlock=on}
> +(enabled by default).
> +ETEXI
> +
> DEF("gdb", HAS_ARG, QEMU_OPTION_gdb, \
> "-gdb dev wait for gdb connection on 'dev'\n", QEMU_ARCH_ALL)
> STEXI
> diff --git a/vl.c b/vl.c
> index 1355f69..c16c8ad 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -491,6 +491,18 @@ static QemuOptsList qemu_object_opts = {
> },
> };
>
> +static QemuOptsList qemu_realtime_opts = {
> + .name = "realtime",
> + .head = QTAILQ_HEAD_INITIALIZER(qemu_realtime_opts.head),
> + .desc = {
> + {
> + .name = "mlock",
> + .type = QEMU_OPT_BOOL,
> + },
> + { /* end of list */ }
> + },
> +};
> +
> const char *qemu_get_vm_name(void)
> {
> return qemu_name;
> @@ -1384,6 +1396,17 @@ static void smp_parse(const char *optarg)
> max_cpus = smp_cpus;
> }
>
> +static void configure_realtime(QemuOpts *opts)
> +{
> + bool is_mlock;
"enable_mlock" or just "mlock".
> +
> + is_mlock = qemu_opt_get_bool(opts, "mlock", true);
> +
> + if (is_mlock) {
> + os_mlock();
> + }
> +}
> +
> /***********************************************************/
> /* USB devices */
>
> @@ -2860,6 +2883,7 @@ int main(int argc, char **argv, char **envp)
> qemu_add_opts(&qemu_sandbox_opts);
> qemu_add_opts(&qemu_add_fd_opts);
> qemu_add_opts(&qemu_object_opts);
> + qemu_add_opts(&qemu_realtime_opts);
>
> runstate_init();
>
> @@ -3806,6 +3830,13 @@ int main(int argc, char **argv, char **envp)
> exit(1);
> }
> break;
> + case QEMU_OPTION_realtime:
> + opts = qemu_opts_parse(qemu_find_opts("realtime"), optarg, 0);
> + if (!opts) {
> + exit(1);
> + }
> + configure_realtime(opts);
> + break;
> default:
> os_parse_cmd_args(popt->index, optarg);
> }
>
Otherwise I'm fine with it.
Thanks,
Jan
--
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
|
|
From: Seiji A. <sei...@hd...> - 2013-03-20 02:10:37
|
> -----Original Message----- > From: lin...@vg... [mailto:lin...@vg...] On Behalf Of Matt Fleming > Sent: Tuesday, March 19, 2013 6:16 PM > To: Seiji Aguchi > Cc: lin...@vg...; Andre Heider (a.h...@gm...); Lingzhu Xiang (lx...@re...); mi...@go...; dle- > de...@li...; Satoru Moriya; Tomoki Sekiyama > Subject: Re: [PATCH] efivar: Disable get_next_variable when firmware is broken > > On 03/19/2013 09:17 PM, Seiji Aguchi wrote: > > [Problem] > > Some firmware implementations return the same variable name on > > multiple calls to get_next_variable(). > > In this case, an efivar driver gets stuck in a infinite loop at > > initializing time, register_efivars(). > > It is hard for users to fix the broken firmware. > > > > Here is an actual result of the case above. > > http://article.gmane.org/gmane.linux.kernel.efi/835 > > > > [Solution] > > This patch terminates the loop immediately and disables > > get_next_variable() at the initializing time when the same variable > > name is found on multiple calls to get_next_variable(). > > > > Also, to avoid stucking in the infinite loop, update_sysfs_entries > > returns without calling get_next_variable if the case above happens. > > > > Reported-by: Andre Heider <a.h...@gm...> > > Reported-by: Lingzhu Xiang <lx...@re...> > > Signed-off-by: Matt Fleming <mat...@in...> > > Signed-off-by: Seiji Aguchi <sei...@hd...> > > --- > > drivers/firmware/efivars.c | 34 ++++++++++++++++++++++++++++++++-- > > drivers/firmware/google/gsmi.c | 4 +++- > > include/linux/efi.h | 3 ++- > > 3 files changed, 37 insertions(+), 4 deletions(-) > > I don't see how this solution is an improvement to the patch I posted here. > > http://article.gmane.org/gmane.linux.kernel.efi/905 Ah, I just missed your patch above. It is reasonable to me. > > You unconditionally call efivar_update_sysfs_entries(), even when the algorithm isn't going to succeed. Your patch also spreads this > firmware brain damage into the gsmi.c code, which really doesn't need to concern itself with these problems. > I agree. Thank you for giving me the comment. Seiji > -- > Matt Fleming, Intel Open Source Technology Center > -- > To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to maj...@vg... More > majordomo info at http://vger.kernel.org/majordomo-info.html |
|
From: Matt F. <ma...@co...> - 2013-03-19 22:32:48
|
On 03/19/2013 09:17 PM, Seiji Aguchi wrote: > [Problem] > Some firmware implementations return the same variable name on multiple calls to > get_next_variable(). > In this case, an efivar driver gets stuck in a infinite loop at initializing time, > register_efivars(). > It is hard for users to fix the broken firmware. > > Here is an actual result of the case above. > http://article.gmane.org/gmane.linux.kernel.efi/835 > > [Solution] > This patch terminates the loop immediately and disables get_next_variable() > at the initializing time when the same variable name is found on multiple > calls to get_next_variable(). > > Also, to avoid stucking in the infinite loop, > update_sysfs_entries returns without calling get_next_variable if the case above happens. > > Reported-by: Andre Heider <a.h...@gm...> > Reported-by: Lingzhu Xiang <lx...@re...> > Signed-off-by: Matt Fleming <mat...@in...> > Signed-off-by: Seiji Aguchi <sei...@hd...> > --- > drivers/firmware/efivars.c | 34 ++++++++++++++++++++++++++++++++-- > drivers/firmware/google/gsmi.c | 4 +++- > include/linux/efi.h | 3 ++- > 3 files changed, 37 insertions(+), 4 deletions(-) I don't see how this solution is an improvement to the patch I posted here. http://article.gmane.org/gmane.linux.kernel.efi/905 You unconditionally call efivar_update_sysfs_entries(), even when the algorithm isn't going to succeed. Your patch also spreads this firmware brain damage into the gsmi.c code, which really doesn't need to concern itself with these problems. -- Matt Fleming, Intel Open Source Technology Center |