You can subscribe to this list here.
| 2009 |
Jan
|
Feb
|
Mar
|
Apr
|
May
(32) |
Jun
(66) |
Jul
(102) |
Aug
(78) |
Sep
(106) |
Oct
(137) |
Nov
(147) |
Dec
(147) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2010 |
Jan
(71) |
Feb
(139) |
Mar
(86) |
Apr
(76) |
May
(57) |
Jun
(10) |
Jul
(12) |
Aug
(6) |
Sep
(8) |
Oct
(12) |
Nov
(12) |
Dec
(18) |
| 2011 |
Jan
(16) |
Feb
(19) |
Mar
(3) |
Apr
(1) |
May
(16) |
Jun
(17) |
Jul
(74) |
Aug
(22) |
Sep
(18) |
Oct
(24) |
Nov
(21) |
Dec
(30) |
| 2012 |
Jan
(31) |
Feb
(16) |
Mar
(22) |
Apr
(25) |
May
(18) |
Jun
(13) |
Jul
(83) |
Aug
(49) |
Sep
(20) |
Oct
(60) |
Nov
(35) |
Dec
(28) |
| 2013 |
Jan
(39) |
Feb
(61) |
Mar
(35) |
Apr
(21) |
May
(45) |
Jun
(56) |
Jul
(20) |
Aug
(9) |
Sep
(10) |
Oct
(31) |
Nov
(8) |
Dec
(4) |
| 2014 |
Jan
(6) |
Feb
(7) |
Mar
(7) |
Apr
(6) |
May
(4) |
Jun
(8) |
Jul
(5) |
Aug
(2) |
Sep
(4) |
Oct
(4) |
Nov
(11) |
Dec
(5) |
| 2015 |
Jan
(4) |
Feb
(4) |
Mar
(3) |
Apr
(4) |
May
(9) |
Jun
(4) |
Jul
(15) |
Aug
(8) |
Sep
(16) |
Oct
(18) |
Nov
(15) |
Dec
(7) |
| 2016 |
Jan
(20) |
Feb
(9) |
Mar
(15) |
Apr
(24) |
May
(16) |
Jun
(28) |
Jul
(22) |
Aug
(23) |
Sep
(18) |
Oct
(30) |
Nov
(40) |
Dec
(9) |
| 2017 |
Jan
(1) |
Feb
(8) |
Mar
(37) |
Apr
(26) |
May
(25) |
Jun
(46) |
Jul
(24) |
Aug
(9) |
Sep
|
Oct
|
Nov
|
Dec
|
|
From: Masami H. <mhi...@re...> - 2009-09-14 23:43:22
|
Hi,
Here are the patchset of the kprobes jump optimization v4
(a.k.a. Djprobe). This version includes some bugfixes and
generic kernel patching interface which allow you to replace
just single instruction without stop_machine(and this is
enough for Jason's tracepoint jump patching).
These patches can be applied on random-tracing tree.
Changes from v3:
- Change RELATIVEJUMP_INSTRUCTION macro to RELATIVEJUMP_OPCODE.
- Don't clear current_kprobe when reentering probe is boosted.
- Don't clear current_kprobe when reenter probe goes to optimized path.
- Update code against for x86 insn decoder updating.
- Use generic jump patching API.
And kprobe stress test didn't found any regressions,
under kvm/x86.
Jump Optimized Kprobes
======================
o Concept
Kprobes uses the int3 breakpoint instruction on x86 for instrumenting
probes into running kernel. Jump optimization allows kprobes to replace
breakpoint with a jump instruction for reducing probing overhead drastically.
o Performance
An optimized kprobe 5 times faster than a kprobe.
Optimizing probes gains its performance. Usually, a kprobe hit takes
0.5 to 1.0 microseconds to process. On the other hand, a jump optimized
probe hit takes less than 0.1 microseconds (actual number depends on the
processor). Here is a sample overheads.
Intel(R) Xeon(R) CPU E5410 @ 2.33GHz (without debugging options)
x86-32 x86-64
kprobe: 0.68us 0.91us
kprobe+booster: 0.27us 0.40us
kprobe+optimized: 0.06us 0.06us
kretprobe : 0.95us 1.21us
kretprobe+booster: 0.53us 0.71us
kretprobe+optimized: 0.30us 0.35us
(booster skips single-stepping)
Note that jump optimization also consumes more memory, but not so much.
It just uses ~200 bytes, so, even if you use ~10,000 probes, it just
consumes a few MB.
o Usage
Set CONFIG_OPTPROBES=y when building a kernel, then all *probes will be
optimized if possible.
Kprobes decodes probed function and checks whether the target instructions
can be optimized(replaced with a jump) safely. If it can't be, Kprobes just
doesn't optimize it.
o Optimization
Before preparing optimization, Kprobes inserts original(user-defined)
kprobe on the specified address. So, even if the kprobe is not
possible to be optimized, it just uses a normal kprobe.
- Safety check
First, Kprobes gets the address of probed function and checks whether the
optimized region, which will be replaced by a jump instruction, does NOT
straddle the function boundary, because if the optimized region reaches the
next function, its caller causes unexpected results.
Next, Kprobes decodes whole body of probed function and checks there is
NO indirect jump, NO instruction which will cause exception by checking
exception_tables (this will jump to fixup code and fixup code jumps into
same function body) and NO near jump which jumps into the optimized region
(except the 1st byte of jump), because if some jump instruction jumps
into the middle of another instruction, it causes unexpected results too.
Kprobes also measures the length of instructions which will be replaced
by a jump instruction, because a jump instruction is longer than 1 byte,
it may replaces multiple instructions, and it checks whether those
instructions can be executed out-of-line.
- Preparing detour code
Then, Kprobes prepares "detour" buffer, which contains exception emulating
code (push/pop registers, call handler), copied instructions(Kprobes copies
instructions which will be replaced by a jump, to the detour buffer), and
a jump which jumps back to the original execution path.
- Pre-optimization
After preparing detour code, Kprobes enqueues the kprobe to optimizing list
and kicks kprobe-optimizer workqueue to optimize it. To wait other optimized
probes, kprobe-optimizer will delay to work.
When the optimized-kprobe is hit before optimization, its handler
changes IP(instruction pointer) to copied code and exits. So, the
instructions which were copied to detour buffer are executed on the detour
buffer.
- Optimization
Kprobe-optimizer doesn't start instruction-replacing soon, it waits
synchronize_sched for safety, because some processors are possible to be
interrupted on the instructions which will be replaced by a jump instruction.
As you know, synchronize_sched() can ensure that all interruptions which were
executed when synchronize_sched() was called are done, only if
CONFIG_PREEMPT=n. So, this version supports only the kernel with
CONFIG_PREEMPT=n.(*)
After that, kprobe-optimizer replaces the 4 bytes right after int3 breakpoint
with relative-jump destination, and synchronize caches on all processors. Next,
it replaces int3 with relative-jump opcode, and synchronize caches again.
- Unoptimization
When unregistering, disabling kprobe or being blocked by other kprobe,
an optimized-kprobe will be unoptimized. Before kprobe-optimizer runs,
the kprobe just be dequeued from the optimized list. When the optimization
has been done, it replaces a jump with int3 breakpoint and original code.
First it puts int3 at the first byte of the jump, synchronize caches
on all processors, and replaces the 4 bytes right after int3 with the
original code.
(*)This optimization-safety checking may be replaced with stop-machine method
which ksplice is done for supporting CONFIG_PREEMPT=y kernel.
Thank you,
---
Masami Hiramatsu (9):
kprobes: Add documents of jump optimization
kprobes/x86: Support kprobes jump optimization on x86
kprobes/x86: Cleanup save/restore registers
kprobes/x86: Boost probes when reentering
kprobes: Jump optimization sysctl interface
kprobes: Introduce kprobes jump optimization
kprobes: Introduce generic insn_slot framework
x86: Introduce generic jump patching without stop_machine
kprobes/x86: Cleanup RELATIVEJUMP_INSTRUCTION to RELATIVEJUMP_OPCODE
Documentation/kprobes.txt | 192 +++++++++++-
arch/Kconfig | 13 +
arch/x86/Kconfig | 1
arch/x86/include/asm/alternative.h | 12 +
arch/x86/include/asm/kprobes.h | 31 ++
arch/x86/kernel/alternative.c | 120 ++++++++
arch/x86/kernel/kprobes.c | 571 +++++++++++++++++++++++++++++-------
include/linux/kprobes.h | 45 +++
kernel/kprobes.c | 573 +++++++++++++++++++++++++++++++-----
kernel/sysctl.c | 13 +
10 files changed, 1366 insertions(+), 205 deletions(-)
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: mhi...@re...
|
|
From: Frederic W. <fwe...@gm...> - 2009-09-14 21:09:50
|
On Mon, Sep 14, 2009 at 05:10:33PM -0400, Masami Hiramatsu wrote: > Frederic Weisbecker wrote: >> On Mon, Sep 14, 2009 at 01:16:13PM -0400, Masami Hiramatsu wrote: >>> Frederic Weisbecker wrote: >>>> On Fri, Sep 11, 2009 at 12:03:30PM -0400, Masami Hiramatsu wrote: >>>>> Frederic Weisbecker wrote: >>>>>> May be another step in the todo-list that would be nice: define the format >>>>>> for a type. Like it's done from ftrace events. >>>>> >>>>> Thanks! >>>>> >>>>> BTW, I'm not sure what the type means. Each event already has its own >>>>> event ID and event_call. Could you tell me which part of ftrace I should >>>>> refer to ? >>>>> >>>> >>>> >>>> Actually I meant the format for a field. >>>> Say you define filename=arg1, it would be nice to have >>>> >>>> print "%s", filename >>>> >>>> in the format file. >>> >>> Ah, indeed. It is better to support 'type' casting for each argument. >>> I think type casting can be done as below syntax. >>> >>> NAME=ARG:TYPE >>> e.g. >>> jiffies64=@jiffies64:u64 >>> message=%ax:str >>> >> >> >> Yeah looks good! >> >> >> >>>> Hmm, now that I think about it, we can't dereference an array...for now :-) >>> >>> :-) >>> BTW, currently, an entry of the array can be shown, e.g. +10(sa). >>> Hmm, for more complex dereference(e.g. accessing a->b[a->c]), it might need >>> another dereferencing syntax(e.g. "sa[16][sa[8]]"), or >>> just allow to use braces(e.g. "+(+8(sa))(+16(sa))"). >> >> >> Well, that may be too much complexity. >> I guess if we want multi level dereference, say you want a->b->c >> it should be sufficient to probe the point where b->c gets it's >> value (if any). >> >> But it would be nice to fetch a range: sa[begin:end] >> Or at least just giving the length of the range. > > Hmm, I think it is better to treat the range as a type, because > we don't know how to express each entry without the type. > > NAME=ARG:TYPE[LEN] > e.g. > regs=sa:u64[16] > Yeah indeed, and that fits well in the NAME=ARG:TYPE logic that you have suggested. |
|
From: Masami H. <mhi...@re...> - 2009-09-14 21:07:37
|
Frederic Weisbecker wrote: > On Mon, Sep 14, 2009 at 01:16:13PM -0400, Masami Hiramatsu wrote: >> Frederic Weisbecker wrote: >>> On Fri, Sep 11, 2009 at 12:03:30PM -0400, Masami Hiramatsu wrote: >>>> Frederic Weisbecker wrote: >>>>> May be another step in the todo-list that would be nice: define the format >>>>> for a type. Like it's done from ftrace events. >>>> >>>> Thanks! >>>> >>>> BTW, I'm not sure what the type means. Each event already has its own >>>> event ID and event_call. Could you tell me which part of ftrace I should >>>> refer to ? >>>> >>> >>> >>> Actually I meant the format for a field. >>> Say you define filename=arg1, it would be nice to have >>> >>> print "%s", filename >>> >>> in the format file. >> >> Ah, indeed. It is better to support 'type' casting for each argument. >> I think type casting can be done as below syntax. >> >> NAME=ARG:TYPE >> e.g. >> jiffies64=@jiffies64:u64 >> message=%ax:str >> > > > Yeah looks good! > > > >>> Hmm, now that I think about it, we can't dereference an array...for now :-) >> >> :-) >> BTW, currently, an entry of the array can be shown, e.g. +10(sa). >> Hmm, for more complex dereference(e.g. accessing a->b[a->c]), it might need >> another dereferencing syntax(e.g. "sa[16][sa[8]]"), or >> just allow to use braces(e.g. "+(+8(sa))(+16(sa))"). > > > Well, that may be too much complexity. > I guess if we want multi level dereference, say you want a->b->c > it should be sufficient to probe the point where b->c gets it's > value (if any). > > But it would be nice to fetch a range: sa[begin:end] > Or at least just giving the length of the range. Hmm, I think it is better to treat the range as a type, because we don't know how to express each entry without the type. NAME=ARG:TYPE[LEN] e.g. regs=sa:u64[16] Thank you, -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhi...@re... |
|
From: Frederic W. <fwe...@gm...> - 2009-09-14 21:07:31
|
On Mon, Sep 14, 2009 at 03:36:33PM -0400, Masami Hiramatsu wrote: > Frederic Weisbecker wrote: >> On Mon, Sep 14, 2009 at 12:54:24PM -0400, Masami Hiramatsu wrote: >>>>> I'd like to have a dispatcher function and flags internally :) >>>> >>>> >>>> You mean kprobes that could support multiple probes? >>>> That would be a nice solution IMHO... >>> >>> Yeah, actually kprobes could support multiple probes on the >>> same point. But kprobe structure has many extensions which >>> kprobe-tracer doesn't need, e.g. post_handler/break_handler, >>> opcode, arch sprcific instructions. >>> Kretprobe consumes more memories for storing return points :(. >>> >>> Thus, if we know there are two functions to be called on the >>> same probe point, I think it is better to have a dispatcher. >>> (Especially, in this case, we can call fixed functions, so >>> it's easier way.) >> >> >> Yeah, you could union the post_handler with profile_handler >> or something like that. > > No, you can't do that, because kprobes calls post_handler if > it is not NULL. Yeah, I was meaning: having the normal probe call ftrace handler and post probe call perf handler. But well, that looks not that sane (your dispatch idea looks fine). >> >> It depends if kprobes may need one day to support an undeterminate >> number of probes. > > Kprobes itself is supporting those multiple kprobes on the same > address. I meant that we don't need to have multiple kprobes on > the same "kprobe-tracer's event". Even if introducing a dispatcher, > kprobe-tracer can support multiple trace-event at the same location. Yeah, I was not meaning several kprobes on the same address but actually a single kprobe instance with multiple probe callbacks. But your dispatch patch (sorry) does that in essence, though specialized to the ftrace/perf couple but that's fine since nothing else may need to have multiplexed probes. >> Also, is the post_handler called at the same location than the normal >> probe? > > No, post_handler is called after single-stepping. Ok. > >> And is a post handler called even if there is no normal handler? > > Yes, it is. > > Hmm, I assume I have told about kprobes infrastructure, and have you > told about kprobe-tracer?:) No I was talking about kprobes in general :) >> There might be some of such factors that would force you to handle a >> lot of corner cases, things that you wouldn't need to worry about >> if you just had to maintain a simple rcu list of probes to call. > > > Anyway, I never see who are using post_handler:). I'm not sure why > it is needed... May be it's a legacy of code that was used by the past, in which case that could perhaps be cleaned up? |
|
From: Frederic W. <fwe...@gm...> - 2009-09-14 20:53:06
|
On Mon, Sep 14, 2009 at 01:16:13PM -0400, Masami Hiramatsu wrote: > Frederic Weisbecker wrote: >> On Fri, Sep 11, 2009 at 12:03:30PM -0400, Masami Hiramatsu wrote: >>> Frederic Weisbecker wrote: >>>> May be another step in the todo-list that would be nice: define the format >>>> for a type. Like it's done from ftrace events. >>> >>> Thanks! >>> >>> BTW, I'm not sure what the type means. Each event already has its own >>> event ID and event_call. Could you tell me which part of ftrace I should >>> refer to ? >>> >> >> >> Actually I meant the format for a field. >> Say you define filename=arg1, it would be nice to have >> >> print "%s", filename >> >> in the format file. > > Ah, indeed. It is better to support 'type' casting for each argument. > I think type casting can be done as below syntax. > > NAME=ARG:TYPE > e.g. > jiffies64=@jiffies64:u64 > message=%ax:str > Yeah looks good! >> Hmm, now that I think about it, we can't dereference an array...for now :-) > > :-) > BTW, currently, an entry of the array can be shown, e.g. +10(sa). > Hmm, for more complex dereference(e.g. accessing a->b[a->c]), it might need > another dereferencing syntax(e.g. "sa[16][sa[8]]"), or > just allow to use braces(e.g. "+(+8(sa))(+16(sa))"). Well, that may be too much complexity. I guess if we want multi level dereference, say you want a->b->c it should be sufficient to probe the point where b->c gets it's value (if any). But it would be nice to fetch a range: sa[begin:end] Or at least just giving the length of the range. |
|
From: Masami H. <mhi...@re...> - 2009-09-14 20:48:13
|
Disable newly created kprobe events by default, not to disturb another user using ftrace. "Disturb" means when someone using ftrace and another user tries to use perf-tools, (in near future) if he defines new kprobe event via perf-tools, then new events will mess up the frace buffer. I think that's ugly. Signed-off-by: Masami Hiramatsu <mhi...@re...> c: Jim Keniston <jke...@us...> Cc: Ananth N Mavinakayanahalli <an...@in...> Cc: Andi Kleen <ak...@li...> Cc: Christoph Hellwig <hc...@in...> Cc: Frank Ch. Eigler <fc...@re...> Cc: Frederic Weisbecker <fwe...@gm...> Cc: H. Peter Anvin <hp...@zy...> Cc: Ingo Molnar <mi...@el...> Cc: Jason Baron <jb...@re...> Cc: K.Prasad <pr...@li...> Cc: Lai Jiangshan <la...@cn...> Cc: Li Zefan <li...@cn...> Cc: Peter Zijlstra <pe...@in...> Cc: Srikar Dronamraju <sr...@li...> Cc: Steven Rostedt <ro...@go...> Cc: Tom Zanussi <tza...@gm...> --- Documentation/trace/kprobetrace.txt | 11 +++++++++-- kernel/trace/trace_kprobe.c | 4 ++-- 2 files changed, 11 insertions(+), 4 deletions(-) diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt index 6521681..9b8f7c6 100644 --- a/Documentation/trace/kprobetrace.txt +++ b/Documentation/trace/kprobetrace.txt @@ -122,8 +122,15 @@ print fmt: "(%lx) dfd=%lx filename=%lx flags=%lx mode=%lx", REC->ip, REC->dfd, R echo > /sys/kernel/debug/tracing/kprobe_events - This clears all probe points. and you can see the traced information via -/sys/kernel/debug/tracing/trace. + This clears all probe points. + + Right after definition, each event is disabled by default. For tracing these +events, you need to enable it. + + echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable + echo 1 > /sys/kernel/debug/tracing/events/kprobes/myretprobe/enable + + And you can see the traced information via /sys/kernel/debug/tracing/trace. cat /sys/kernel/debug/tracing/trace # tracer: nop diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c index d8db935..f6821f1 100644 --- a/kernel/trace/trace_kprobe.c +++ b/kernel/trace/trace_kprobe.c @@ -383,7 +383,7 @@ static int register_trace_probe(struct trace_probe *tp) goto end; } - tp->flags = TP_FLAG_TRACE; + tp->rp.kp.flags |= KPROBE_FLAG_DISABLED; if (probe_is_return(tp)) ret = register_kretprobe(&tp->rp); else @@ -1298,7 +1298,7 @@ static int register_probe_event(struct trace_probe *tp) call->id = register_ftrace_event(&tp->event); if (!call->id) return -ENODEV; - call->enabled = 1; + call->enabled = 0; call->regfunc = probe_event_enable; call->unregfunc = probe_event_disable; -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhi...@re... |
|
From: Masami H. <mhi...@re...> - 2009-09-14 20:48:05
|
Fix *probe_profile_func() to align buffer size, since pref_counter requires.
Signed-off-by: Masami Hiramatsu <mhi...@re...>
Cc: Jim Keniston <jke...@us...>
Cc: Ananth N Mavinakayanahalli <an...@in...>
Cc: Andi Kleen <ak...@li...>
Cc: Christoph Hellwig <hc...@in...>
Cc: Frank Ch. Eigler <fc...@re...>
Cc: Frederic Weisbecker <fwe...@gm...>
Cc: H. Peter Anvin <hp...@zy...>
Cc: Ingo Molnar <mi...@el...>
Cc: Jason Baron <jb...@re...>
Cc: K.Prasad <pr...@li...>
Cc: Lai Jiangshan <la...@cn...>
Cc: Li Zefan <li...@cn...>
Cc: Peter Zijlstra <pe...@in...>
Cc: Srikar Dronamraju <sr...@li...>
Cc: Steven Rostedt <ro...@go...>
Cc: Tom Zanussi <tza...@gm...>
---
kernel/trace/trace_kprobe.c | 17 ++++++++++++-----
1 files changed, 12 insertions(+), 5 deletions(-)
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 70b632c..d8db935 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1149,18 +1149,23 @@ static __kprobes int kprobe_profile_func(struct kprobe *kp,
struct trace_probe *tp = container_of(kp, struct trace_probe, rp.kp);
struct ftrace_event_call *call = &tp->call;
struct kprobe_trace_entry *entry;
- int size, i, pc;
+ int size, __size, i, pc;
unsigned long irq_flags;
local_save_flags(irq_flags);
pc = preempt_count();
- size = SIZEOF_KPROBE_TRACE_ENTRY(tp->nr_args);
+ __size = SIZEOF_KPROBE_TRACE_ENTRY(tp->nr_args);
+ size = ALIGN(__size + sizeof(u32), sizeof(u64));
+ size -= sizeof(u32);
do {
char raw_data[size];
struct trace_entry *ent;
-
+ /*
+ * Zero dead bytes from alignment to avoid stack leak
+ * to userspace
+ */
*(u64 *)(&raw_data[size - sizeof(u64)]) = 0ULL;
entry = (struct kprobe_trace_entry *)raw_data;
ent = &entry->ent;
@@ -1183,13 +1188,15 @@ static __kprobes int kretprobe_profile_func(struct kretprobe_instance *ri,
struct trace_probe *tp = container_of(ri->rp, struct trace_probe, rp);
struct ftrace_event_call *call = &tp->call;
struct kretprobe_trace_entry *entry;
- int size, i, pc;
+ int size, __size, i, pc;
unsigned long irq_flags;
local_save_flags(irq_flags);
pc = preempt_count();
- size = SIZEOF_KRETPROBE_TRACE_ENTRY(tp->nr_args);
+ __size = SIZEOF_KRETPROBE_TRACE_ENTRY(tp->nr_args);
+ size = ALIGN(__size + sizeof(u32), sizeof(u64));
+ size -= sizeof(u32);
do {
char raw_data[size];
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: mhi...@re...
|
|
From: Masami H. <mhi...@re...> - 2009-09-14 20:48:00
|
Add kprobe_dispatcher and kretprobe_dispatcher for dispatching event
to both of profile handler and tracing handler.
Signed-off-by: Masami Hiramatsu <mhi...@re...>
Cc: Jim Keniston <jke...@us...>
Cc: Ananth N Mavinakayanahalli <an...@in...>
Cc: Andi Kleen <ak...@li...>
Cc: Christoph Hellwig <hc...@in...>
Cc: Frank Ch. Eigler <fc...@re...>
Cc: Frederic Weisbecker <fwe...@gm...>
Cc: H. Peter Anvin <hp...@zy...>
Cc: Ingo Molnar <mi...@el...>
Cc: Jason Baron <jb...@re...>
Cc: K.Prasad <pr...@li...>
Cc: Lai Jiangshan <la...@cn...>
Cc: Li Zefan <li...@cn...>
Cc: Peter Zijlstra <pe...@in...>
Cc: Srikar Dronamraju <sr...@li...>
Cc: Steven Rostedt <ro...@go...>
Cc: Tom Zanussi <tza...@gm...>
---
kernel/trace/trace_kprobe.c | 85 ++++++++++++++++++++++++++++++++-----------
1 files changed, 63 insertions(+), 22 deletions(-)
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index ea0db8e..70b632c 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -185,10 +185,15 @@ struct probe_arg {
const char *name;
};
+/* Flags for trace_probe */
+#define TP_FLAG_TRACE 1
+#define TP_FLAG_PROFILE 2
+
struct trace_probe {
struct list_head list;
struct kretprobe rp; /* Use rp.kp for kprobe use */
unsigned long nhit;
+ unsigned int flags; /* For TP_FLAG_* */
const char *symbol; /* symbol name */
struct ftrace_event_call call;
struct trace_event event;
@@ -200,10 +205,6 @@ struct trace_probe {
(offsetof(struct trace_probe, args) + \
(sizeof(struct probe_arg) * (n)))
-static int kprobe_trace_func(struct kprobe *kp, struct pt_regs *regs);
-static int kretprobe_trace_func(struct kretprobe_instance *ri,
- struct pt_regs *regs);
-
static __kprobes int probe_is_return(struct trace_probe *tp)
{
return tp->rp.handler != NULL;
@@ -263,6 +264,10 @@ static void unregister_probe_event(struct trace_probe *tp);
static DEFINE_MUTEX(probe_lock);
static LIST_HEAD(probe_list);
+static int kprobe_dispatcher(struct kprobe *kp, struct pt_regs *regs);
+static int kretprobe_dispatcher(struct kretprobe_instance *ri,
+ struct pt_regs *regs);
+
/*
* Allocate new trace_probe and initialize it (including kprobes).
*/
@@ -288,11 +293,10 @@ static struct trace_probe *alloc_trace_probe(const char *group,
} else
tp->rp.kp.addr = addr;
- /* Set handler here for checking whether this probe is return or not. */
if (is_return)
- tp->rp.handler = kretprobe_trace_func;
+ tp->rp.handler = kretprobe_dispatcher;
else
- tp->rp.kp.pre_handler = kprobe_trace_func;
+ tp->rp.kp.pre_handler = kprobe_dispatcher;
if (!event)
goto error;
@@ -379,6 +383,7 @@ static int register_trace_probe(struct trace_probe *tp)
goto end;
}
+ tp->flags = TP_FLAG_TRACE;
if (probe_is_return(tp))
ret = register_kretprobe(&tp->rp);
else
@@ -987,23 +992,24 @@ static int probe_event_enable(struct ftrace_event_call *call)
{
struct trace_probe *tp = (struct trace_probe *)call->data;
- if (probe_is_return(tp)) {
- tp->rp.handler = kretprobe_trace_func;
+ tp->flags |= TP_FLAG_TRACE;
+ if (probe_is_return(tp))
return enable_kretprobe(&tp->rp);
- } else {
- tp->rp.kp.pre_handler = kprobe_trace_func;
+ else
return enable_kprobe(&tp->rp.kp);
- }
}
static void probe_event_disable(struct ftrace_event_call *call)
{
struct trace_probe *tp = (struct trace_probe *)call->data;
- if (probe_is_return(tp))
- disable_kretprobe(&tp->rp);
- else
- disable_kprobe(&tp->rp.kp);
+ tp->flags &= ~TP_FLAG_TRACE;
+ if (!(tp->flags & (TP_FLAG_TRACE | TP_FLAG_PROFILE))) {
+ if (probe_is_return(tp))
+ disable_kretprobe(&tp->rp);
+ else
+ disable_kprobe(&tp->rp.kp);
+ }
}
static int probe_event_raw_init(struct ftrace_event_call *event_call)
@@ -1212,22 +1218,57 @@ static int probe_profile_enable(struct ftrace_event_call *call)
if (atomic_inc_return(&call->profile_count))
return 0;
- if (probe_is_return(tp)) {
- tp->rp.handler = kretprobe_profile_func;
+ tp->flags |= TP_FLAG_PROFILE;
+ if (probe_is_return(tp))
return enable_kretprobe(&tp->rp);
- } else {
- tp->rp.kp.pre_handler = kprobe_profile_func;
+ else
return enable_kprobe(&tp->rp.kp);
- }
}
static void probe_profile_disable(struct ftrace_event_call *call)
{
+ struct trace_probe *tp = (struct trace_probe *)call->data;
+
if (atomic_add_negative(-1, &call->profile_count))
- probe_event_disable(call);
+ tp->flags &= ~TP_FLAG_PROFILE;
+
+ if (!(tp->flags & (TP_FLAG_TRACE | TP_FLAG_PROFILE))) {
+ if (probe_is_return(tp))
+ disable_kretprobe(&tp->rp);
+ else
+ disable_kprobe(&tp->rp.kp);
+ }
}
+#endif /* CONFIG_EVENT_PROFILE */
+
+
+static __kprobes
+int kprobe_dispatcher(struct kprobe *kp, struct pt_regs *regs)
+{
+ struct trace_probe *tp = container_of(kp, struct trace_probe, rp.kp);
+ if (tp->flags & TP_FLAG_TRACE)
+ kprobe_trace_func(kp, regs);
+#ifdef CONFIG_EVENT_PROFILE
+ if (tp->flags & TP_FLAG_PROFILE)
+ kprobe_profile_func(kp, regs);
#endif /* CONFIG_EVENT_PROFILE */
+ return 0; /* We don't tweek kernel, so just return 0 */
+}
+
+static __kprobes
+int kretprobe_dispatcher(struct kretprobe_instance *ri, struct pt_regs *regs)
+{
+ struct trace_probe *tp = container_of(ri->rp, struct trace_probe, rp);
+
+ if (tp->flags & TP_FLAG_TRACE)
+ kretprobe_trace_func(ri, regs);
+#ifdef CONFIG_EVENT_PROFILE
+ if (tp->flags & TP_FLAG_PROFILE)
+ kretprobe_profile_func(ri, regs);
+#endif /* CONFIG_EVENT_PROFILE */
+ return 0; /* We don't tweek kernel, so just return 0 */
+}
static int register_probe_event(struct trace_probe *tp)
{
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: mhi...@re...
|
|
From: Masami H. <mhi...@re...> - 2009-09-14 20:47:40
|
Lock not only event_mutex but also trace_event_mutex in
trace_remove_event_call() for protecting __unregister_ftrace_event().
Signed-off-by: Masami Hiramatsu <mhi...@re...>
Cc: Jim Keniston <jke...@us...>
Cc: Ananth N Mavinakayanahalli <an...@in...>
Cc: Andi Kleen <ak...@li...>
Cc: Christoph Hellwig <hc...@in...>
Cc: Frank Ch. Eigler <fc...@re...>
Cc: Frederic Weisbecker <fwe...@gm...>
Cc: H. Peter Anvin <hp...@zy...>
Cc: Ingo Molnar <mi...@el...>
Cc: Jason Baron <jb...@re...>
Cc: K.Prasad <pr...@li...>
Cc: Lai Jiangshan <la...@cn...>
Cc: Li Zefan <li...@cn...>
Cc: Peter Zijlstra <pe...@in...>
Cc: Srikar Dronamraju <sr...@li...>
Cc: Steven Rostedt <ro...@go...>
Cc: Tom Zanussi <tza...@gm...>
---
kernel/trace/trace_events.c | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 38e82a5..a3d6bad 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1055,6 +1055,9 @@ static void remove_subsystem_dir(const char *name)
}
}
+/*
+ * Must be called under locking both of event_mutex and trace_event_mutex.
+ */
static void __trace_remove_event_call(struct ftrace_event_call *call)
{
ftrace_event_enable_disable(call, 0);
@@ -1071,7 +1074,9 @@ static void __trace_remove_event_call(struct ftrace_event_call *call)
void trace_remove_event_call(struct ftrace_event_call *call)
{
mutex_lock(&event_mutex);
+ down_write(&trace_event_mutex);
__trace_remove_event_call(call);
+ up_write(&trace_event_mutex);
mutex_unlock(&event_mutex);
}
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: mhi...@re...
|
|
From: Masami H. <mhi...@re...> - 2009-09-14 20:47:33
|
Initialize event_call.list and handle failuer path in trace_add_event_call() for fixing below bug which occurred when I tried to add invalid event twice. Could not create debugfs 'kmalloc' directory Failed to register kprobe event: kmalloc Faild to register probe event(-1) ------------[ cut here ]------------ WARNING: at /home/mhiramat/ksrc/random-tracing/lib/list_debug.c:26 __list_add+0x27/0x5c() Hardware name: list_add corruption. next->prev should be prev (c07d78cc), but was 00001000. (next=d854236c). Modules linked in: sunrpc uinput virtio_net virtio_balloon i2c_piix4 pcspkr i2c_core virtio_blk virtio_pci virtio_ring virtio [last unloaded: scsi_wait_scan] Pid: 1394, comm: tee Not tainted 2.6.31-rc9 #51 Call Trace: [<c0438424>] warn_slowpath_common+0x65/0x7c [<c05371b3>] ? __list_add+0x27/0x5c [<c043846f>] warn_slowpath_fmt+0x24/0x27 [<c05371b3>] __list_add+0x27/0x5c [<c047f050>] list_add+0xa/0xc [<c047f8f5>] trace_add_event_call+0x60/0x97 [<c0483133>] command_trace_probe+0x42c/0x51b [<c044a1b3>] ? remove_wait_queue+0x22/0x27 [<c042a9c0>] ? __wake_up+0x32/0x3b [<c04832f6>] probes_write+0xd4/0x10a [<c0483222>] ? probes_write+0x0/0x10a [<c04b27a9>] vfs_write+0x80/0xdf [<c04b289c>] sys_write+0x3b/0x5d [<c0670d41>] syscall_call+0x7/0xb ---[ end trace 2b962b5dc1fdc07d ]--- Signed-off-by: Masami Hiramatsu <mhi...@re...> Cc: Jim Keniston <jke...@us...> Cc: Ananth N Mavinakayanahalli <an...@in...> Cc: Andi Kleen <ak...@li...> Cc: Christoph Hellwig <hc...@in...> Cc: Frank Ch. Eigler <fc...@re...> Cc: Frederic Weisbecker <fwe...@gm...> Cc: H. Peter Anvin <hp...@zy...> Cc: Ingo Molnar <mi...@el...> Cc: Jason Baron <jb...@re...> Cc: K.Prasad <pr...@li...> Cc: Lai Jiangshan <la...@cn...> Cc: Li Zefan <li...@cn...> Cc: Peter Zijlstra <pe...@in...> Cc: Srikar Dronamraju <sr...@li...> Cc: Steven Rostedt <ro...@go...> Cc: Tom Zanussi <tza...@gm...> --- kernel/trace/trace_events.c | 6 +++++- 1 files changed, 5 insertions(+), 1 deletions(-) diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c index ba34920..38e82a5 100644 --- a/kernel/trace/trace_events.c +++ b/kernel/trace/trace_events.c @@ -1009,10 +1009,14 @@ static int __trace_add_event_call(struct ftrace_event_call *call) if (!d_events) return -ENOENT; + INIT_LIST_HEAD(&call->list); list_add(&call->list, &ftrace_events); - return event_create_dir(call, d_events, &ftrace_event_id_fops, + ret = event_create_dir(call, d_events, &ftrace_event_id_fops, &ftrace_enable_fops, &ftrace_event_filter_fops, &ftrace_event_format_fops); + if (ret < 0) + list_del(&call->list); + return ret; } /* Add an additional event_call dynamically */ -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhi...@re... |
|
From: Masami H. <mhi...@re...> - 2009-09-14 20:47:26
|
Fix trace_probe registration order. ftrace_event_call and ftrace_event
must be registered before kprobe/kretprobe, because tracing/profiling
handlers use event-id.
Signed-off-by: Masami Hiramatsu <mhi...@re...>
Cc: Jim Keniston <jke...@us...>
Cc: Ananth N Mavinakayanahalli <an...@in...>
Cc: Andi Kleen <ak...@li...>
Cc: Christoph Hellwig <hc...@in...>
Cc: Frank Ch. Eigler <fc...@re...>
Cc: Frederic Weisbecker <fwe...@gm...>
Cc: H. Peter Anvin <hp...@zy...>
Cc: Ingo Molnar <mi...@el...>
Cc: Jason Baron <jb...@re...>
Cc: K.Prasad <pr...@li...>
Cc: Lai Jiangshan <la...@cn...>
Cc: Li Zefan <li...@cn...>
Cc: Peter Zijlstra <pe...@in...>
Cc: Srikar Dronamraju <sr...@li...>
Cc: Steven Rostedt <ro...@go...>
Cc: Tom Zanussi <tza...@gm...>
---
kernel/trace/trace_kprobe.c | 42 +++++++++++++++++++-----------------------
1 files changed, 19 insertions(+), 23 deletions(-)
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index cbc0870..ea0db8e 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -347,20 +347,15 @@ static struct trace_probe *find_probe_event(const char *event)
return NULL;
}
-static void __unregister_trace_probe(struct trace_probe *tp)
+/* Unregister a trace_probe and probe_event: call with locking probe_lock */
+static void unregister_trace_probe(struct trace_probe *tp)
{
if (probe_is_return(tp))
unregister_kretprobe(&tp->rp);
else
unregister_kprobe(&tp->rp.kp);
-}
-
-/* Unregister a trace_probe and probe_event: call with locking probe_lock */
-static void unregister_trace_probe(struct trace_probe *tp)
-{
- unregister_probe_event(tp);
- __unregister_trace_probe(tp);
list_del(&tp->list);
+ unregister_probe_event(tp);
}
/* Register a trace_probe and probe_event */
@@ -371,6 +366,19 @@ static int register_trace_probe(struct trace_probe *tp)
mutex_lock(&probe_lock);
+ /* register as an event */
+ old_tp = find_probe_event(tp->call.name);
+ if (old_tp) {
+ /* delete old event */
+ unregister_trace_probe(old_tp);
+ free_trace_probe(old_tp);
+ }
+ ret = register_probe_event(tp);
+ if (ret) {
+ pr_warning("Faild to register probe event(%d)\n", ret);
+ goto end;
+ }
+
if (probe_is_return(tp))
ret = register_kretprobe(&tp->rp);
else
@@ -384,21 +392,9 @@ static int register_trace_probe(struct trace_probe *tp)
tp->rp.kp.addr);
ret = -EINVAL;
}
- goto end;
- }
- /* register as an event */
- old_tp = find_probe_event(tp->call.name);
- if (old_tp) {
- /* delete old event */
- unregister_trace_probe(old_tp);
- free_trace_probe(old_tp);
- }
- ret = register_probe_event(tp);
- if (ret) {
- pr_warning("Faild to register probe event(%d)\n", ret);
- __unregister_trace_probe(tp);
- }
- list_add_tail(&tp->list, &probe_list);
+ unregister_probe_event(tp);
+ } else
+ list_add_tail(&tp->list, &probe_list);
end:
mutex_unlock(&probe_lock);
return ret;
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: mhi...@re...
|
|
From: Masami H. <mhi...@re...> - 2009-09-14 20:47:17
|
Hi,
Here are bugfixes for kprobe-based event tracer. Thank you for review
and reporting bugs! I fixed it. And also I decided to disable
kprobe event when defining, since the events outputs may suddenly
mess up the ftrace ring_buffer.
Frederic, could you see the patch 4/6? which is what I said on the
previous threads.
Thank you,
---
Masami Hiramatsu (6):
tracing/kprobes: Disable kprobe events by default
tracing/kprobes: Fix profiling alignment for perf_counter buffer
tracing/kprobes: Add probe handler dispatcher for support perf and ftrace
ftrace: Fix trace_remove_event_call() to lock trace_event_mutex
ftrace: Fix trace_add_event_call() to initialize list
tracing/kprobes: Fix trace_probe registration order
Documentation/trace/kprobetrace.txt | 11 ++-
kernel/trace/trace_events.c | 11 ++-
kernel/trace/trace_kprobe.c | 146 +++++++++++++++++++++++------------
3 files changed, 114 insertions(+), 54 deletions(-)
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: mhi...@re...
|
|
From: Masami H. <mhi...@re...> - 2009-09-14 19:34:08
|
Frederic Weisbecker wrote: > On Mon, Sep 14, 2009 at 12:54:24PM -0400, Masami Hiramatsu wrote: >>>> I'd like to have a dispatcher function and flags internally :) >>> >>> >>> You mean kprobes that could support multiple probes? >>> That would be a nice solution IMHO... >> >> Yeah, actually kprobes could support multiple probes on the >> same point. But kprobe structure has many extensions which >> kprobe-tracer doesn't need, e.g. post_handler/break_handler, >> opcode, arch sprcific instructions. >> Kretprobe consumes more memories for storing return points :(. >> >> Thus, if we know there are two functions to be called on the >> same probe point, I think it is better to have a dispatcher. >> (Especially, in this case, we can call fixed functions, so >> it's easier way.) > > > Yeah, you could union the post_handler with profile_handler > or something like that. No, you can't do that, because kprobes calls post_handler if it is not NULL. > > It depends if kprobes may need one day to support an undeterminate > number of probes. Kprobes itself is supporting those multiple kprobes on the same address. I meant that we don't need to have multiple kprobes on the same "kprobe-tracer's event". Even if introducing a dispatcher, kprobe-tracer can support multiple trace-event at the same location. > Also, is the post_handler called at the same location than the normal > probe? No, post_handler is called after single-stepping. > And is a post handler called even if there is no normal handler? Yes, it is. Hmm, I assume I have told about kprobes infrastructure, and have you told about kprobe-tracer?:) > There might be some of such factors that would force you to handle a > lot of corner cases, things that you wouldn't need to worry about > if you just had to maintain a simple rcu list of probes to call. Anyway, I never see who are using post_handler:). I'm not sure why it is needed... -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhi...@re... |
|
From: Frederic W. <fwe...@gm...> - 2009-09-14 18:55:59
|
On Mon, Sep 14, 2009 at 12:54:24PM -0400, Masami Hiramatsu wrote: >>> I'd like to have a dispatcher function and flags internally :) >> >> >> You mean kprobes that could support multiple probes? >> That would be a nice solution IMHO... > > Yeah, actually kprobes could support multiple probes on the > same point. But kprobe structure has many extensions which > kprobe-tracer doesn't need, e.g. post_handler/break_handler, > opcode, arch sprcific instructions. > Kretprobe consumes more memories for storing return points :(. > > Thus, if we know there are two functions to be called on the > same probe point, I think it is better to have a dispatcher. > (Especially, in this case, we can call fixed functions, so > it's easier way.) > Yeah, you could union the post_handler with profile_handler or something like that. It depends if kprobes may need one day to support an undeterminate number of probes. Also, is the post_handler called at the same location than the normal probe? And is a post handler called even if there is no normal handler? There might be some of such factors that would force you to handle a lot of corner cases, things that you wouldn't need to worry about if you just had to maintain a simple rcu list of probes to call. |
|
From: Masami H. <mhi...@re...> - 2009-09-14 17:13:05
|
Frederic Weisbecker wrote: > On Fri, Sep 11, 2009 at 12:03:30PM -0400, Masami Hiramatsu wrote: >> Frederic Weisbecker wrote: >>> May be another step in the todo-list that would be nice: define the format >>> for a type. Like it's done from ftrace events. >> >> Thanks! >> >> BTW, I'm not sure what the type means. Each event already has its own >> event ID and event_call. Could you tell me which part of ftrace I should >> refer to ? >> > > > Actually I meant the format for a field. > Say you define filename=arg1, it would be nice to have > > print "%s", filename > > in the format file. Ah, indeed. It is better to support 'type' casting for each argument. I think type casting can be done as below syntax. NAME=ARG:TYPE e.g. jiffies64=@jiffies64:u64 message=%ax:str > Hmm, now that I think about it, we can't dereference an array...for now :-) :-) BTW, currently, an entry of the array can be shown, e.g. +10(sa). Hmm, for more complex dereference(e.g. accessing a->b[a->c]), it might need another dereferencing syntax(e.g. "sa[16][sa[8]]"), or just allow to use braces(e.g. "+(+8(sa))(+16(sa))"). Thank you, -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhi...@re... |
|
From: Masami H. <mhi...@re...> - 2009-09-14 16:51:28
|
Frederic Weisbecker wrote:
> On Fri, Sep 11, 2009 at 12:22:16PM -0400, Masami Hiramatsu wrote:
>> Frederic Weisbecker wrote:
>>>> +static int probe_profile_enable(struct ftrace_event_call *call)
>>>> +{
>>>> + struct trace_probe *tp = (struct trace_probe *)call->data;
>>>> +
>>>> + if (atomic_inc_return(&call->profile_count))
>>>> + return 0;
>>>> +
>>>> + if (probe_is_return(tp)) {
>>>> + tp->rp.handler = kretprobe_profile_func;
>>>> + return enable_kretprobe(&tp->rp);
>>>> + } else {
>>>> + tp->rp.kp.pre_handler = kprobe_profile_func;
>>>> + return enable_kprobe(&tp->rp.kp);
>>>> + }
>>>> +}
>>>
>>>
>>>
>>> May be I misunderstood but it seems that concurrent uses of
>>> ftrace and perf would really mess up the result, as one would
>>> overwrite the handler of the other.
>>>
>>> Even though it's hard to imagine one using both at the same
>>> time on the same probe, but still...
>>
>> Oops, it's my misunderstanding. I thought those are exclusively
>> enabled each other.
>
>
> It's automatically managed with events because ftrace and
> and perf have their individual tracepoint probes, because
> tracepoints support multiple probes.
>
>
>>> Is it possible to have two kprobes having the exact same
>>> properties? (pointing to the same address, having the same
>>> probe handlers, etc...)
>>>
>>> Another solution would be to allow kprobes to have multiple
>>> handlers.
>>
>> It could be to have multiple kprobes on same point, but I think
>> it's waste of the memory and time in this case.
>
>
> Yeah.
>
>
>>
>> I'd like to have a dispatcher function and flags internally :)
>
>
> You mean kprobes that could support multiple probes?
> That would be a nice solution IMHO...
Yeah, actually kprobes could support multiple probes on the
same point. But kprobe structure has many extensions which
kprobe-tracer doesn't need, e.g. post_handler/break_handler,
opcode, arch sprcific instructions.
Kretprobe consumes more memories for storing return points :(.
Thus, if we know there are two functions to be called on the
same probe point, I think it is better to have a dispatcher.
(Especially, in this case, we can call fixed functions, so
it's easier way.)
Thank you,
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: mhi...@re...
|
|
From: Masami H. <mhi...@re...> - 2009-09-14 16:22:33
|
Ananth N Mavinakayanahalli wrote:
> On Sun, Sep 13, 2009 at 09:47:39PM -0400, Masami Hiramatsu wrote:
>> Ananth N Mavinakayanahalli wrote:
>>> On Fri, Sep 11, 2009 at 05:12:54AM +0200, Frederic Weisbecker wrote:
>>>> On Thu, Sep 10, 2009 at 07:53:30PM -0400, Masami Hiramatsu wrote:
>
> ...
>
>> Hmm, if we catch the second registration here, it's too late. At
>> register_kprobe(), we initialized some field of kprobe before calling
>> register_aggr_kprobe(). Especially kprobe.list is cleared.
>>
>> And this can't check the case that 'p' is already registered on
>> an aggr kprobe.
>
> Agreed. I thought of this case after sending out the earlier patch...
>
>> Thus, I think some initialization paths should be changed (kp.flag
>> field is initialized too early, yearh, that's my mistake),
>> and also, you will need to use get_valid_kprobe() to check the kprobe
>> has not been registered.
>
> __get_valid_kprobe() makes the task easy. We should just prevent
> re-registration whether or not the earlier probe is disabled.
>
> How does this patch look?
>
> ---
> Prevent re-registration of the same kprobe. This situation, though
> unlikely, needs to be flagged since it can lead to a system crash if its
> not handled.
>
> The core change itself is small, but the helper routine needed to be
> moved around a bit; hence the diffstat.
>
> Signed-off-by: Ananth N Mavinakayanahalli<an...@in...>
> ---
> kernel/kprobes.c | 58 ++++++++++++++++++++++++++++++++++++-------------------
> 1 file changed, 38 insertions(+), 20 deletions(-)
>
> Index: linux-2.6.31/kernel/kprobes.c
> ===================================================================
> --- linux-2.6.31.orig/kernel/kprobes.c
> +++ linux-2.6.31/kernel/kprobes.c
> @@ -681,6 +681,40 @@ static kprobe_opcode_t __kprobes *kprobe
> return (kprobe_opcode_t *)(((char *)addr) + p->offset);
> }
>
> +/* Check passed kprobe is valid and return kprobe in kprobe_table. */
> +static struct kprobe * __kprobes __get_valid_kprobe(struct kprobe *p)
> +{
> + struct kprobe *old_p, *list_p;
> +
> + old_p = get_kprobe(p->addr);
> + if (unlikely(!old_p))
> + return NULL;
> +
> + if (p != old_p) {
> + list_for_each_entry_rcu(list_p,&old_p->list, list)
> + if (list_p == p)
> + /* kprobe p is a valid probe */
> + goto valid;
> + return NULL;
> + }
> +valid:
> + return old_p;
> +}
> +
> +/* Return error if the kprobe is being re-registered */
> +static inline int check_kprobe_rereg(struct kprobe *p)
> +{
> + int ret = 0;
> + struct kprobe *old_p;
> +
> + mutex_lock(&kprobe_mutex);
> + old_p = __get_valid_kprobe(p);
> + if (old_p == p)
Here, since __get_valid_kprobe(p) will return aggr_kprobe of 'p',
you just need to check old_p != NULL.
Other parts are OK for me.
Thank you!
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: mhi...@re...
|
|
From: Ananth N M. <an...@in...> - 2009-09-14 10:05:29
|
On Sun, Sep 13, 2009 at 09:47:39PM -0400, Masami Hiramatsu wrote:
> Ananth N Mavinakayanahalli wrote:
> > On Fri, Sep 11, 2009 at 05:12:54AM +0200, Frederic Weisbecker wrote:
> >> On Thu, Sep 10, 2009 at 07:53:30PM -0400, Masami Hiramatsu wrote:
...
> Hmm, if we catch the second registration here, it's too late. At
> register_kprobe(), we initialized some field of kprobe before calling
> register_aggr_kprobe(). Especially kprobe.list is cleared.
>
> And this can't check the case that 'p' is already registered on
> an aggr kprobe.
Agreed. I thought of this case after sending out the earlier patch...
> Thus, I think some initialization paths should be changed (kp.flag
> field is initialized too early, yearh, that's my mistake),
> and also, you will need to use get_valid_kprobe() to check the kprobe
> has not been registered.
__get_valid_kprobe() makes the task easy. We should just prevent
re-registration whether or not the earlier probe is disabled.
How does this patch look?
---
Prevent re-registration of the same kprobe. This situation, though
unlikely, needs to be flagged since it can lead to a system crash if its
not handled.
The core change itself is small, but the helper routine needed to be
moved around a bit; hence the diffstat.
Signed-off-by: Ananth N Mavinakayanahalli <an...@in...>
---
kernel/kprobes.c | 58 ++++++++++++++++++++++++++++++++++++-------------------
1 file changed, 38 insertions(+), 20 deletions(-)
Index: linux-2.6.31/kernel/kprobes.c
===================================================================
--- linux-2.6.31.orig/kernel/kprobes.c
+++ linux-2.6.31/kernel/kprobes.c
@@ -681,6 +681,40 @@ static kprobe_opcode_t __kprobes *kprobe
return (kprobe_opcode_t *)(((char *)addr) + p->offset);
}
+/* Check passed kprobe is valid and return kprobe in kprobe_table. */
+static struct kprobe * __kprobes __get_valid_kprobe(struct kprobe *p)
+{
+ struct kprobe *old_p, *list_p;
+
+ old_p = get_kprobe(p->addr);
+ if (unlikely(!old_p))
+ return NULL;
+
+ if (p != old_p) {
+ list_for_each_entry_rcu(list_p, &old_p->list, list)
+ if (list_p == p)
+ /* kprobe p is a valid probe */
+ goto valid;
+ return NULL;
+ }
+valid:
+ return old_p;
+}
+
+/* Return error if the kprobe is being re-registered */
+static inline int check_kprobe_rereg(struct kprobe *p)
+{
+ int ret = 0;
+ struct kprobe *old_p;
+
+ mutex_lock(&kprobe_mutex);
+ old_p = __get_valid_kprobe(p);
+ if (old_p == p)
+ ret = -EINVAL;
+ mutex_unlock(&kprobe_mutex);
+ return ret;
+}
+
int __kprobes register_kprobe(struct kprobe *p)
{
int ret = 0;
@@ -693,6 +727,10 @@ int __kprobes register_kprobe(struct kpr
return -EINVAL;
p->addr = addr;
+ ret = check_kprobe_rereg(p);
+ if (ret)
+ return ret;
+
preempt_disable();
if (!kernel_text_address((unsigned long) p->addr) ||
in_kprobes_functions((unsigned long) p->addr)) {
@@ -762,26 +800,6 @@ out:
}
EXPORT_SYMBOL_GPL(register_kprobe);
-/* Check passed kprobe is valid and return kprobe in kprobe_table. */
-static struct kprobe * __kprobes __get_valid_kprobe(struct kprobe *p)
-{
- struct kprobe *old_p, *list_p;
-
- old_p = get_kprobe(p->addr);
- if (unlikely(!old_p))
- return NULL;
-
- if (p != old_p) {
- list_for_each_entry_rcu(list_p, &old_p->list, list)
- if (list_p == p)
- /* kprobe p is a valid probe */
- goto valid;
- return NULL;
- }
-valid:
- return old_p;
-}
-
/*
* Unregister a kprobe without a scheduler synchronization.
*/
|
|
From: Frederic W. <fwe...@gm...> - 2009-09-14 03:08:19
|
On Fri, Sep 11, 2009 at 03:30:24PM -0400, Masami Hiramatsu wrote:
>> Note that the end-result must be u64 aligned for perf ring buffer.
>> And this is a bit tricky.
>> What is inserted in the perf ring buffer is:
>>
>> raw_trace + (u32)raw_trace_size
>>
>> So we must ensure that sizeof(raw_trace) + sizeof(u32)
>> is well u64 aligned.
>>
>> We don't insert the trace_size ourself though, this is done
>> from kernel/perf_counter.c
>>
>> But we need to handle the size of the size (sorry) in the final
>> alignment.
>> To sum-up: sizeof(raw_trace) doesn't need (shouldn't) to be u64
>> aligned but sizeof(raw_trace) + sizeof(u32) must be.
>>
>> Given this aligned size, we then substract it by sizeof(u32)
>> to have the needed size of the raw entry.
>>
>> This result gives you the size of char raw_data[], which
>> is also the same size passed in perf_tpcounter_event().
>>
>> See?
>
> Ah, I see. So the size to write to perf_tpcounter_event must be
> '(a multiple number of sizeof(u64)) - sizeof(u32)', right?
Exactly.
To simplify I guess the raw events just needs to be u32 aligned :)
> (Hmm, why would not perf_counter align data by itself? :)
Because that would require it to copy the data into a seperate
u64 aligned buffer.
>>
>> That's why we have this in trace/ftrace.h:
>>
>> __data_size = "the real entry data size"
>> __entry_size = ALIGN(__data_size + sizeof(*entry) + sizeof(u32), sizeof(u64));
>> __entry_size -= sizeof(u32);
>>
>> do {
>> char raw_data[__entry_size];
>> ...
>> perf_tpcounter_event(event_call->id, __addr, __count, entry,
>> __entry_size);
>> ...
>> } while (0);
>
> Ok, I'll do that.
Thanks!
|
|
From: Frederic W. <fwe...@gm...> - 2009-09-14 03:03:00
|
On Fri, Sep 11, 2009 at 12:22:16PM -0400, Masami Hiramatsu wrote:
> Frederic Weisbecker wrote:
>>> +static int probe_profile_enable(struct ftrace_event_call *call)
>>> +{
>>> + struct trace_probe *tp = (struct trace_probe *)call->data;
>>> +
>>> + if (atomic_inc_return(&call->profile_count))
>>> + return 0;
>>> +
>>> + if (probe_is_return(tp)) {
>>> + tp->rp.handler = kretprobe_profile_func;
>>> + return enable_kretprobe(&tp->rp);
>>> + } else {
>>> + tp->rp.kp.pre_handler = kprobe_profile_func;
>>> + return enable_kprobe(&tp->rp.kp);
>>> + }
>>> +}
>>
>>
>>
>> May be I misunderstood but it seems that concurrent uses of
>> ftrace and perf would really mess up the result, as one would
>> overwrite the handler of the other.
>>
>> Even though it's hard to imagine one using both at the same
>> time on the same probe, but still...
>
> Oops, it's my misunderstanding. I thought those are exclusively
> enabled each other.
It's automatically managed with events because ftrace and
and perf have their individual tracepoint probes, because
tracepoints support multiple probes.
>> Is it possible to have two kprobes having the exact same
>> properties? (pointing to the same address, having the same
>> probe handlers, etc...)
>>
>> Another solution would be to allow kprobes to have multiple
>> handlers.
>
> It could be to have multiple kprobes on same point, but I think
> it's waste of the memory and time in this case.
Yeah.
>
> I'd like to have a dispatcher function and flags internally :)
You mean kprobes that could support multiple probes?
That would be a nice solution IMHO...
|
|
From: Frederic W. <fwe...@gm...> - 2009-09-14 03:00:05
|
On Fri, Sep 11, 2009 at 12:03:30PM -0400, Masami Hiramatsu wrote: > Frederic Weisbecker wrote: >> May be another step in the todo-list that would be nice: define the format >> for a type. Like it's done from ftrace events. > > Thanks! > > BTW, I'm not sure what the type means. Each event already has its own > event ID and event_call. Could you tell me which part of ftrace I should > refer to ? > Actually I meant the format for a field. Say you define filename=arg1, it would be nice to have print "%s", filename in the format file. Hmm, now that I think about it, we can't dereference an array...for now :-) >> I guess we should choose between the low level, very granular >> but uninviting method "kprobe + record + trace" and also an all >> in one quick approach. >> >> And that could be chosen from perf kprobe: >> >> Low level: >> >> perf kprobe --define-only [-p|-r] [probe_name] -a1 [arg1] -a2 [arg2] \ >> --format="%s %...." >> >> perf record -e kprobes:probe_name >> perf trace >> >> Quick: >> >> perf kprobe -p probe_name -a1 ..... cmdline| -a >> >> And after the profiled task is finished, it could launch perf trace >> by itself (or wait for a Ctrl + C if -a/wide profiling) > > Another thought: expand record subcommand. > > perf record -E "p|r:probe_name,place,arg1,arg2..." > perf trace > > And kprobe accept multiple definitions > > perf kprobe -E "p|r:probe_name,place,arg1,arg2..." -E ... Well, perf record could also support multiple definitions too... |
|
From: Masami H. <mhi...@re...> - 2009-09-14 02:52:27
|
Ananth N Mavinakayanahalli wrote:
> On Fri, Sep 11, 2009 at 05:12:54AM +0200, Frederic Weisbecker wrote:
>> On Thu, Sep 10, 2009 at 07:53:30PM -0400, Masami Hiramatsu wrote:
>
> ...
>
>> Is it possible to have two kprobes having the exact same
>> properties? (pointing to the same address, having the same
>> probe handlers, etc...)
>
> Yes, this is possible with two *different* kprobes. However, we have a bug
> with the current code where there is insufficient scaffolding to prevent
> re-registration of the same kprobe. Here is a patch...
Indeed, that is a bug, or spec. I didn't expect that user register
same kprobes twice.
> ---
> Prevent re-registration of the same kprobe. Current code allows this,
> albeit with disastrous consequences. Its not a common case, but should
> be flagged nonetheless.
>
> Signed-off-by: Ananth N Mavinakayanahalli <an...@in...>
> ---
> kernel/kprobes.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> Index: ptrace-10sep/kernel/kprobes.c
> ===================================================================
> --- ptrace-10sep.orig/kernel/kprobes.c
> +++ ptrace-10sep/kernel/kprobes.c
> @@ -589,6 +589,9 @@ static int __kprobes register_aggr_kprob
> int ret = 0;
> struct kprobe *ap = old_p;
>
> + if (old_p == p)
> + /* Attempt to re-register the same kprobe.. fail */
> + return -EINVAL;
> if (old_p->pre_handler != aggr_pre_handler) {
> /* If old_p is not an aggr_probe, create new aggr_kprobe. */
> ap = kzalloc(sizeof(struct kprobe), GFP_KERNEL);
Hmm, if we catch the second registration here, it's too late. At
register_kprobe(), we initialized some field of kprobe before calling
register_aggr_kprobe(). Especially kprobe.list is cleared.
And this can't check the case that 'p' is already registered on
an aggr kprobe.
Thus, I think some initialization paths should be changed (kp.flag
field is initialized too early, yearh, that's my mistake),
and also, you will need to use get_valid_kprobe() to check the kprobe
has not been registered.
Thank you,
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: mhi...@re...
|
|
From: Frederic W. <fwe...@gm...> - 2009-09-14 02:23:06
|
On Fri, Sep 11, 2009 at 05:48:24PM -0400, Masami Hiramatsu wrote: > Frederic Weisbecker wrote: >> >> I've tested the above example and it works very well, >> so I've applied this set, and the previous pending patches >> in: >> >> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git \ >> tracing/kprobes > > I had a bug on that tree when I did Step.1 > > > (Step.1) Define new events under new group > > > > $ echo p:mygroup/myprobe do_sys_open dfd=a0 filename=a1 flags=a2 mode=a3 \ > > > /debug/tracing/kprobes_events > > $ echo r:mygroup/myretprobe do_sys_open rv >> /debug/tracing/kprobes_events > > $ cat /debug/tracing/kprobes_events > > p:myprobe do_sys_open+0 dfd=a0 filename=a1 flags=a2 mode=a3 > > r:myretprobe do_sys_open+0 rv=rv > > It seems that you forget to pull PATCH 7/7 of my previous series. > And also I've found my probe registration order is buggy. > I'll fix that. > > Thank you, Sorry :-s Applied, thanks. |
|
From: Ananth N M. <an...@in...> - 2009-09-13 10:07:39
|
On Fri, Sep 11, 2009 at 05:12:54AM +0200, Frederic Weisbecker wrote:
> On Thu, Sep 10, 2009 at 07:53:30PM -0400, Masami Hiramatsu wrote:
...
> Is it possible to have two kprobes having the exact same
> properties? (pointing to the same address, having the same
> probe handlers, etc...)
Yes, this is possible with two *different* kprobes. However, we have a bug
with the current code where there is insufficient scaffolding to prevent
re-registration of the same kprobe. Here is a patch...
---
Prevent re-registration of the same kprobe. Current code allows this,
albeit with disastrous consequences. Its not a common case, but should
be flagged nonetheless.
Signed-off-by: Ananth N Mavinakayanahalli <an...@in...>
---
kernel/kprobes.c | 3 +++
1 file changed, 3 insertions(+)
Index: ptrace-10sep/kernel/kprobes.c
===================================================================
--- ptrace-10sep.orig/kernel/kprobes.c
+++ ptrace-10sep/kernel/kprobes.c
@@ -589,6 +589,9 @@ static int __kprobes register_aggr_kprob
int ret = 0;
struct kprobe *ap = old_p;
+ if (old_p == p)
+ /* Attempt to re-register the same kprobe.. fail */
+ return -EINVAL;
if (old_p->pre_handler != aggr_pre_handler) {
/* If old_p is not an aggr_probe, create new aggr_kprobe. */
ap = kzalloc(sizeof(struct kprobe), GFP_KERNEL);
|
|
From: Masami H. <mhi...@re...> - 2009-09-12 01:20:51
|
Christoph Hellwig wrote: > On Fri, Sep 11, 2009 at 09:50:13PM +0200, Mark Wielaard wrote: >> On Fri, 2009-09-11 at 15:06 -0400, Christoph Hellwig wrote: >>> On Fri, Sep 11, 2009 at 03:03:35PM -0400, Frank Ch. Eigler wrote: >>>> Frederic Weisbecker<fwe...@gm...> writes: >>>> >>>>> [...] I'm really looking forward seeing this C expression-like >>>>> kprobe creation tool. It seems powerful enough to replace printk + >>>>> kernel rebuild. No need anymore to write some printk to debug, >>>>> worrying, [...] >>>> >>>> To a large extent, systemtap had delivered this already some years >>>> ago, including the cushy ponies dancing in the sunlight. While such >>>> low-level machinery is fine, some of our experience indicates that it >>>> is dramatically easier to use if high-level, symbolic, debugging data >>>> is used to compute probe locations and variable names/types/locations. >>> >>> No, systemtap has been for years failing to delivers this in a way that >>> it could be usefully integrated into the kernel. >> >> You are saying "No" to a claim Frank didn't even make. > > He said systemtap had delivered it, which is not the case. It has > implemented it, but not actually delivered it in a useful way. I assume that he has meant that systemtap had delivered to end-users, and that is certainly true. Systemtap has been released and distributed with many distributions. I know there are many systemtap users nowadays. However, indeed, that was not so useful way especially for kernel developers. Systemtap developers have also recognized this problem, and we are trying to fix it for good relationship with upstream kernel. I'd like to make my work useful not only for upstream kernel and ftrace, but also for systemtap. Through this enhancement, knowledge and experiences of systemtap will be transferred to upstream. I think systemtap can also use it by re-implementing on in-kernel tracing functions. I believe we can collaborate on this work, at least on stabilizing tracing functions. Thank you, -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhi...@re... |