From: Mathieu D. <mat...@ef...> - 2010-05-12 17:48:48
* Masami Hiramatsu (mhi...@re...) wrote:
> Mathieu Desnoyers wrote:
> > * Masami Hiramatsu (mhi...@re...) wrote:
> >> Mathieu Desnoyers wrote:
> >>> * Masami Hiramatsu (mhi...@re...) wrote:
> >>>> Use text_poke_smp_batch() in optimization path for reducing
> >>>> the number of stop_machine() issues.
> >>>>
> >>>> Signed-off-by: Masami Hiramatsu <mhi...@re...>
> >>>> Cc: Ananth N Mavinakayanahalli <an...@in...>
> >>>> Cc: Ingo Molnar <mi...@el...>
> >>>> Cc: Jim Keniston <jke...@us...>
> >>>> Cc: Jason Baron <jb...@re...>
> >>>> Cc: Mathieu Desnoyers <mat...@ef...>
> >>>> ---
> >>>>
> >>>>  arch/x86/kernel/kprobes.c | 37 ++++++++++++++++++++++++++++++-------
> >>>>  include/linux/kprobes.h   |  2 +-
> >>>>  kernel/kprobes.c          | 13 +------------
> >>>>  3 files changed, 32 insertions(+), 20 deletions(-)
> >>>>
> >>>> diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
> >>>> index 345a4b1..63a5c24 100644
> >>>> --- a/arch/x86/kernel/kprobes.c
> >>>> +++ b/arch/x86/kernel/kprobes.c
> >>>> @@ -1385,10 +1385,14 @@ int __kprobes arch_prepare_optimized_kprobe(struct optimized_kprobe *op)
> >>>>  	return 0;
> >>>>  }
> >>>>
> >>>> -/* Replace a breakpoint (int3) with a relative jump. */
> >>>> -int __kprobes arch_optimize_kprobe(struct optimized_kprobe *op)
> >>>> +#define MAX_OPTIMIZE_PROBES 256
> >>>
> >>> So what kind of interrupt latency does a 256-probes batch generate on the
> >>> system ? Are we talking about a few milliseconds, a few seconds ?
> >>
> >> From my experiment on kvm/4cpu, it took about 3 seconds in average.
> >
> > That's 3 seconds for multiple calls to stop_machine(). So we can expect
> > latencies in the area of few microseconds for each call, right ?
>
> Theoretically yes.
> But if we register more than 1000 probes at once, it's hard to do
> anything except optimizing a while(more than 10 sec), because
> it stops machine so frequently.
>
> >> With this patch, it went down to 30ms. (x100 faster :))
> >
> > This is beefing up the latency from few microseconds to 30ms. It sounds like a
> > regression rather than a gain to me.
>
> If it is not acceptable, I can add a knob for control how many probes
> optimize/unoptimize at once. Anyway, it is expectable latency (after
> registering/unregistering probes) and it will be small if we put a few probes.
> (30ms is the worst case)
> And if you want, it can be disabled by sysctl.

I think we are starting to see the stop_machine() approach is really limiting
our ability to do even relatively small amount of work without hurting
responsiveness significantly.

What's the current showstopper with the breakpoint-bypass-ipi approach that
solves this issue properly and makes this batching approach unnecessary ?

Thanks,

Mathieu

>
> Thank you,
>
> --
> Masami Hiramatsu
> e-mail: mhi...@re...

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com