From: Masami H. <mhi...@re...> - 2010-05-12 19:12:23
Mathieu Desnoyers wrote:
> * Masami Hiramatsu (mhi...@re...) wrote:
>> Mathieu Desnoyers wrote:
>>> * Masami Hiramatsu (mhi...@re...) wrote:
>>>> Mathieu Desnoyers wrote:
>>>>> * Masami Hiramatsu (mhi...@re...) wrote:
>>>>>> Use text_poke_smp_batch() in optimization path for reducing
>>>>>> the number of stop_machine() issues.
>>>>>>
>>>>>> Signed-off-by: Masami Hiramatsu <mhi...@re...>
>>>>>> Cc: Ananth N Mavinakayanahalli <an...@in...>
>>>>>> Cc: Ingo Molnar <mi...@el...>
>>>>>> Cc: Jim Keniston <jke...@us...>
>>>>>> Cc: Jason Baron <jb...@re...>
>>>>>> Cc: Mathieu Desnoyers <mat...@ef...>
>>>>>> ---
>>>>>>
>>>>>>  arch/x86/kernel/kprobes.c |   37 ++++++++++++++++++++++++++++++-------
>>>>>>  include/linux/kprobes.h   |    2 +-
>>>>>>  kernel/kprobes.c          |   13 +------------
>>>>>>  3 files changed, 32 insertions(+), 20 deletions(-)
>>>>>>
>>>>>> diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
>>>>>> index 345a4b1..63a5c24 100644
>>>>>> --- a/arch/x86/kernel/kprobes.c
>>>>>> +++ b/arch/x86/kernel/kprobes.c
>>>>>> @@ -1385,10 +1385,14 @@ int __kprobes arch_prepare_optimized_kprobe(struct optimized_kprobe *op)
>>>>>>  	return 0;
>>>>>>  }
>>>>>>
>>>>>> -/* Replace a breakpoint (int3) with a relative jump. */
>>>>>> -int __kprobes arch_optimize_kprobe(struct optimized_kprobe *op)
>>>>>> +#define MAX_OPTIMIZE_PROBES 256
>>>>>
>>>>> So what kind of interrupt latency does a 256-probes batch generate on the
>>>>> system ? Are we talking about a few milliseconds, a few seconds ?
>>>>
>>>> From my experiment on kvm/4cpu, it took about 3 seconds in average.
>>>
>>> That's 3 seconds for multiple calls to stop_machine(). So we can expect
>>> latencies in the area of few microseconds for each call, right ?
>>
>> Theoretically yes.
>> But if we register more than 1000 probes at once, it's hard to do
>> anything except optimizing a while(more than 10 sec), because
>> it stops machine so frequently.
>>
>>>> With this patch, it went down to 30ms. (x100 faster :))
>>>
>>> This is beefing up the latency from few microseconds to 30ms. It sounds like a
>>> regression rather than a gain to me.
>>
>> If it is not acceptable, I can add a knob for control how many probes
>> optimize/unoptimize at once. Anyway, it is expectable latency (after
>> registering/unregistering probes) and it will be small if we put a few probes.
>> (30ms is the worst case)
>> And if you want, it can be disabled by sysctl.
>
> I think we are starting to see the stop_machine() approach is really limiting
> our ability to do even relatively small amount of work without hurting
> responsiveness significantly.
>
> What's the current showstopper with the breakpoint-bypass-ipi approach that
> solves this issue properly and makes this batching approach unnecessary ?

We still do not have any official answer from chip vendors.
As you know, basic implementation has been done.

Thank you,

--
Masami Hiramatsu
e-mail: mhi...@re...
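
To make the trade-off debated above concrete, here is a minimal user-space sketch of the batching arithmetic. It is not the kernel patch: `rounds_unbatched()` and `rounds_batched()` are hypothetical helpers, and the model assumes that one stop_machine() call covers each batch of up to MAX_OPTIMIZE_PROBES (256) probes, which this thread suggests but does not show in code.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not a kernel API): number of stop-the-world
 * rounds when every probe is patched in its own round.
 */
size_t rounds_unbatched(size_t nr_probes)
{
	return nr_probes;
}

/*
 * Hypothetical helper (not a kernel API): number of rounds when up to
 * batch_size probes are patched per round, i.e. the ceiling of
 * nr_probes / batch_size.
 */
size_t rounds_batched(size_t nr_probes, size_t batch_size)
{
	if (batch_size == 0)
		return 0;	/* invalid batch size: no round can be formed */

	return (nr_probes + batch_size - 1) / batch_size;
}
```

Under this assumption, 1000 probes need 1000 rounds unbatched but only 4 rounds with a batch size of 256. Each batched round runs longer, which is the per-call latency concern raised in the thread.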