You can subscribe to this list here.
| 2009 |
Jan
|
Feb
|
Mar
|
Apr
|
May
(32) |
Jun
(66) |
Jul
(102) |
Aug
(78) |
Sep
(106) |
Oct
(137) |
Nov
(147) |
Dec
(147) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2010 |
Jan
(71) |
Feb
(139) |
Mar
(86) |
Apr
(76) |
May
(57) |
Jun
(10) |
Jul
(12) |
Aug
(6) |
Sep
(8) |
Oct
(12) |
Nov
(12) |
Dec
(18) |
| 2011 |
Jan
(16) |
Feb
(19) |
Mar
(3) |
Apr
(1) |
May
(16) |
Jun
(17) |
Jul
(74) |
Aug
(22) |
Sep
(18) |
Oct
(24) |
Nov
(21) |
Dec
(30) |
| 2012 |
Jan
(31) |
Feb
(16) |
Mar
(22) |
Apr
(25) |
May
(18) |
Jun
(13) |
Jul
(83) |
Aug
(49) |
Sep
(20) |
Oct
(60) |
Nov
(35) |
Dec
(28) |
| 2013 |
Jan
(39) |
Feb
(61) |
Mar
(35) |
Apr
(21) |
May
(45) |
Jun
(56) |
Jul
(20) |
Aug
(9) |
Sep
(10) |
Oct
(31) |
Nov
(8) |
Dec
(4) |
| 2014 |
Jan
(6) |
Feb
(7) |
Mar
(7) |
Apr
(6) |
May
(4) |
Jun
(8) |
Jul
(5) |
Aug
(2) |
Sep
(4) |
Oct
(4) |
Nov
(11) |
Dec
(5) |
| 2015 |
Jan
(4) |
Feb
(4) |
Mar
(3) |
Apr
(4) |
May
(9) |
Jun
(4) |
Jul
(15) |
Aug
(8) |
Sep
(16) |
Oct
(18) |
Nov
(15) |
Dec
(7) |
| 2016 |
Jan
(20) |
Feb
(9) |
Mar
(15) |
Apr
(24) |
May
(16) |
Jun
(28) |
Jul
(22) |
Aug
(23) |
Sep
(18) |
Oct
(30) |
Nov
(40) |
Dec
(9) |
| 2017 |
Jan
(1) |
Feb
(8) |
Mar
(37) |
Apr
(26) |
May
(25) |
Jun
(46) |
Jul
(24) |
Aug
(9) |
Sep
|
Oct
|
Nov
|
Dec
|
|
From: Ingo M. <mi...@el...> - 2009-11-17 06:01:53
|
* Masami Hiramatsu <mhi...@re...> wrote: > > - signal IPI/wakeup events > > All signals might be used for IPI, isn't it? :-) I mean, to analyze the various dynamic delivery details of how a signal send affects a target task: 1) which task/PID was selected to be woken 2) if the task got woken (from sleep) due to the signal sending 3) if it was already woken, whether it needed an IPI via kick_process() What proportion of signals were wakeups and what proportion hit an already running task is a relevant question to ask when analyzing the performance characteristics of signals. Ingo |
|
From: Masami H. <mhi...@re...> - 2009-11-16 23:46:00
|
Roland McGrath wrote: >> Hmm, actually, trace_signal_send() doesn't record the return value. > > Is that because it's called before the action really happens? > Is it important that it be called beforehand? If it's called > afterwards, it's easy to pass the return value. I'm not so sure why signal sending events was put beforehand. However, I assume that original intent might be recording the *timing* of all signal generation (including SIGSTOP/CONT). >> So, what about trace_signal_overflow() for RT-signals and >> trace_signal_loss_info() for non-RT? > > Really you can distinguish those just by looking at sig and info, so > perhaps a single tracepoint is enough. Ah, right :-) > I guess it really depends on what > filtering you would want and how inconvenient it is to have to apply that > filtering. Having these two distinct tracepoints lets you trivially trace > only "silent information loss" without seeing the events where userland > gets full information (if applications are paying attention). > > If you want to have a full suite of tracepoints where each one covers one > unambiguous corner of the semantics, then there are more than these just > for sending. e.g. see below. As Ingo said, I think this kind of finegrained events are optional. I don't think we really need these events soon. IMHO, just adding signal-loss event is enough at the first step. But anyway, thank you so much for suggesting those tracepoints! Thank you, -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhi...@re... |
|
From: Masami H. <mhi...@re...> - 2009-11-16 23:06:58
|
Show symbol name if insn decoder test find a difference.
This will help us to find out where the issue is.
Signed-off-by: Masami Hiramatsu <mhi...@re...>
Cc: Ingo Molnar <mi...@el...>
Cc: Stephen Rothwell <sf...@ca...>
Cc: Randy Dunlap <rd...@xe...>
Cc: H. Peter Anvin <hp...@zy...>
Cc: Jim Keniston <jke...@us...>
---
arch/x86/tools/distill.awk | 5 +++++
arch/x86/tools/test_get_len.c | 10 +++++++++-
2 files changed, 14 insertions(+), 1 deletions(-)
diff --git a/arch/x86/tools/distill.awk b/arch/x86/tools/distill.awk
index d433619..c13c0ee 100644
--- a/arch/x86/tools/distill.awk
+++ b/arch/x86/tools/distill.awk
@@ -15,6 +15,11 @@ BEGIN {
fwait_str="9b\tfwait"
}
+/^ *[0-9a-f]+ <[^>]*>:/ {
+ # Symbol entry
+ printf("%s%s\n", $2, $1)
+}
+
/^ *[0-9a-f]+:/ {
if (split($0, field, "\t") < 3) {
# This is a continuation of the same insn.
diff --git a/arch/x86/tools/test_get_len.c b/arch/x86/tools/test_get_len.c
index 5743e51..af75e07 100644
--- a/arch/x86/tools/test_get_len.c
+++ b/arch/x86/tools/test_get_len.c
@@ -110,7 +110,7 @@ static void parse_args(int argc, char **argv)
int main(int argc, char **argv)
{
- char line[BUFSIZE];
+ char line[BUFSIZE], sym[BUFSIZE] = "<unknown>";
unsigned char insn_buf[16];
struct insn insn;
int insns = 0, c;
@@ -122,6 +122,12 @@ int main(int argc, char **argv)
int nb = 0;
unsigned int b;
+ if (line[0] == '<') {
+ /* Symbol line */
+ strcpy(sym, line);
+ continue;
+ }
+
insns++;
memset(insn_buf, 0, 16);
strcpy(copy, line);
@@ -145,6 +151,8 @@ int main(int argc, char **argv)
insn_init(&insn, insn_buf, x86_64);
insn_get_length(&insn);
if (insn.length != nb) {
+ fprintf(stderr, "Error: %s found a difference at %s\n",
+ prog, sym);
fprintf(stderr, "Error: %s", line);
fprintf(stderr, "Error: objdump says %d bytes, but "
"insn_get_length() says %d\n", nb,
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: mhi...@re...
|
|
From: Masami H. <mhi...@re...> - 2009-11-16 23:06:49
|
Add verbose option to insn decoder test. This dumps decoded
instruction when building kernel with V=1.
Signed-off-by: Masami Hiramatsu <mhi...@re...>
Cc: Ingo Molnar <mi...@el...>
Cc: Stephen Rothwell <sf...@ca...>
Cc: Randy Dunlap <rd...@xe...>
Cc: H. Peter Anvin <hp...@zy...>
Cc: Jim Keniston <jke...@us...>
---
arch/x86/tools/Makefile | 9 ++++-
arch/x86/tools/test_get_len.c | 74 +++++++++++++++++++++++++++++++++++------
2 files changed, 71 insertions(+), 12 deletions(-)
diff --git a/arch/x86/tools/Makefile b/arch/x86/tools/Makefile
index 5e295d9..4688f90 100644
--- a/arch/x86/tools/Makefile
+++ b/arch/x86/tools/Makefile
@@ -1,6 +1,13 @@
PHONY += posttest
+
+ifeq ($(KBUILD_VERBOSE),1)
+ postest_verbose = -v
+else
+ postest_verbose =
+endif
+
quiet_cmd_posttest = TEST $@
- cmd_posttest = $(OBJDUMP) -d -j .text $(objtree)/vmlinux | awk -f $(srctree)/arch/x86/tools/distill.awk | $(obj)/test_get_len $(CONFIG_64BIT)
+ cmd_posttest = $(OBJDUMP) -d -j .text $(objtree)/vmlinux | awk -f $(srctree)/arch/x86/tools/distill.awk | $(obj)/test_get_len -$(CONFIG_64BIT) $(posttest_verbose)
posttest: $(obj)/test_get_len vmlinux
$(call cmd,posttest)
diff --git a/arch/x86/tools/test_get_len.c b/arch/x86/tools/test_get_len.c
index 376d338..5743e51 100644
--- a/arch/x86/tools/test_get_len.c
+++ b/arch/x86/tools/test_get_len.c
@@ -20,6 +20,7 @@
#include <stdio.h>
#include <string.h>
#include <assert.h>
+#include <unistd.h>
#define unlikely(cond) (cond)
@@ -36,11 +37,16 @@
*/
const char *prog;
+static int verbose;
+static int x86_64;
static void usage(void)
{
fprintf(stderr, "Usage: objdump -d a.out | awk -f distill.awk |"
- " %s [y|n](64bit flag)\n", prog);
+ " %s [-y|-n] [-v] \n", prog);
+ fprintf(stderr, "\t-y 64bit mode\n");
+ fprintf(stderr, "\t-n 32bit mode\n");
+ fprintf(stderr, "\t-v verbose mode\n");
exit(1);
}
@@ -50,6 +56,56 @@ static void malformed_line(const char *line, int line_nr)
exit(3);
}
+static void dump_field(FILE *fp, const char *name, const char *indent,
+ struct insn_field *field)
+{
+ fprintf(fp, "%s.%s = {\n", indent, name);
+ fprintf(fp, "%s\t.value = %d, bytes[] = {%x, %x, %x, %x},\n",
+ indent, field->value, field->bytes[0], field->bytes[1],
+ field->bytes[2], field->bytes[3]);
+ fprintf(fp, "%s\t.got = %d, .nbytes = %d},\n", indent,
+ field->got, field->nbytes);
+}
+
+static void dump_insn(FILE *fp, struct insn *insn)
+{
+ fprintf(fp, "Instruction = { \n");
+ dump_field(fp, "prefixes", "\t", &insn->prefixes);
+ dump_field(fp, "rex_prefix", "\t", &insn->rex_prefix);
+ dump_field(fp, "vex_prefix", "\t", &insn->vex_prefix);
+ dump_field(fp, "opcode", "\t", &insn->opcode);
+ dump_field(fp, "modrm", "\t", &insn->modrm);
+ dump_field(fp, "sib", "\t", &insn->sib);
+ dump_field(fp, "displacement", "\t", &insn->displacement);
+ dump_field(fp, "immediate1", "\t", &insn->immediate1);
+ dump_field(fp, "immediate2", "\t", &insn->immediate2);
+ fprintf(fp, "\t.attr = %x, .opnd_bytes = %d, .addr_bytes = %d,\n",
+ insn->attr, insn->opnd_bytes, insn->addr_bytes);
+ fprintf(fp, "\t.length = %d, .x86_64 = %d, .kaddr = %p}\n",
+ insn->length, insn->x86_64, insn->kaddr);
+}
+
+static void parse_args(int argc, char **argv)
+{
+ int c;
+ prog = argv[0];
+ while ((c = getopt(argc, argv, "ynv")) != -1) {
+ switch (c) {
+ case 'y':
+ x86_64 = 1;
+ break;
+ case 'n':
+ x86_64 = 0;
+ break;
+ case 'v':
+ verbose = 1;
+ break;
+ default:
+ usage();
+ }
+ }
+}
+
#define BUFSIZE 256
int main(int argc, char **argv)
@@ -57,15 +113,9 @@ int main(int argc, char **argv)
char line[BUFSIZE];
unsigned char insn_buf[16];
struct insn insn;
- int insns = 0;
- int x86_64 = 0;
-
- prog = argv[0];
- if (argc > 2)
- usage();
+ int insns = 0, c;
- if (argc == 2 && argv[1][0] == 'y')
- x86_64 = 1;
+ parse_args(argc, argv);
while (fgets(line, BUFSIZE, stdin)) {
char copy[BUFSIZE], *s, *tab1, *tab2;
@@ -97,8 +147,10 @@ int main(int argc, char **argv)
if (insn.length != nb) {
fprintf(stderr, "Error: %s", line);
fprintf(stderr, "Error: objdump says %d bytes, but "
- "insn_get_length() says %d (attr:%x)\n", nb,
- insn.length, insn.attr);
+ "insn_get_length() says %d\n", nb,
+ insn.length);
+ if (verbose)
+ dump_insn(stderr, &insn);
exit(2);
}
}
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: mhi...@re...
|
|
From: Masami H. <mhi...@re...> - 2009-11-16 23:06:47
|
Since some instructions are not decoded correctly by objdump, it
may cause false positive error in insn decoder posttest. This
changes build error of insn decoder test to build warning.
Signed-off-by: Masami Hiramatsu <mhi...@re...>
Cc: Ingo Molnar <mi...@el...>
Cc: Stephen Rothwell <sf...@ca...>
Cc: Randy Dunlap <rd...@xe...>
Cc: H. Peter Anvin <hp...@zy...>
Cc: Jim Keniston <jke...@us...>
---
arch/x86/tools/test_get_len.c | 17 +++++++++++------
1 files changed, 11 insertions(+), 6 deletions(-)
diff --git a/arch/x86/tools/test_get_len.c b/arch/x86/tools/test_get_len.c
index af75e07..d8214dc 100644
--- a/arch/x86/tools/test_get_len.c
+++ b/arch/x86/tools/test_get_len.c
@@ -114,6 +114,7 @@ int main(int argc, char **argv)
unsigned char insn_buf[16];
struct insn insn;
int insns = 0, c;
+ int warnings = 0;
parse_args(argc, argv);
@@ -151,18 +152,22 @@ int main(int argc, char **argv)
insn_init(&insn, insn_buf, x86_64);
insn_get_length(&insn);
if (insn.length != nb) {
- fprintf(stderr, "Error: %s found a difference at %s\n",
+ warnings++;
+ fprintf(stderr, "Warning: %s found difference at %s\n",
prog, sym);
- fprintf(stderr, "Error: %s", line);
- fprintf(stderr, "Error: objdump says %d bytes, but "
+ fprintf(stderr, "Warning: %s", line);
+ fprintf(stderr, "Warning: objdump says %d bytes, but "
"insn_get_length() says %d\n", nb,
insn.length);
if (verbose)
dump_insn(stderr, &insn);
- exit(2);
}
}
- fprintf(stderr, "Succeed: decoded and checked %d instructions\n",
- insns);
+ if (warnings)
+ fprintf(stderr, "Warning: decoded and checked %d"
+ " instructions with %d warnings\n", insns, warnings);
+ else
+ fprintf(stderr, "Succeed: decoded and checked %d"
+ " instructions\n", insns);
return 0;
}
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: mhi...@re...
|
|
From: Masami H. <mhi...@re...> - 2009-11-16 23:06:30
|
Here are the patches which update x86 instruction decoder build-time test.
As Stephen reported on linux-next, sometimes objdump decodes bad
instructions as normal. This will cause a false positive result on
x86 insn decoder test. This patches update the test as below;
- Show more information with V=1
- Show in which symbol the difference places.
- Just warning instead of build failure.
Thank you,
---
Masami Hiramatsu (3):
x86: insn decoder test shows build warning
x86: Show symbol name if insn decoder test failed
x86: Add verbose option to insn decoder test
arch/x86/tools/Makefile | 9 +++-
arch/x86/tools/distill.awk | 5 ++
arch/x86/tools/test_get_len.c | 99 ++++++++++++++++++++++++++++++++++-------
3 files changed, 95 insertions(+), 18 deletions(-)
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: mhi...@re...
|
|
From: Roland M. <ro...@re...> - 2009-11-16 23:01:02
|
> Hmm, actually, trace_signal_send() doesn't record the return value.
Is that because it's called before the action really happens?
Is it important that it be called beforehand? If it's called
afterwards, it's easy to pass the return value.
> So, what about trace_signal_overflow() for RT-signals and
> trace_signal_loss_info() for non-RT?
Really you can distinguish those just by looking at sig and info, so
perhaps a single tracepoint is enough. I guess it really depends on what
filtering you would want and how inconvenient it is to have to apply that
filtering. Having these two distinct tracepoints lets you trivially trace
only "silent information loss" without seeing the events where userland
gets full information (if applications are paying attention).
If you want to have a full suite of tracepoints where each one covers one
unambiguous corner of the semantics, then there are more than these just
for sending. e.g. see below.
Thanks,
Roland
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -838,8 +841,10 @@ static int __send_signal(int sig, struct
assert_spin_locked(&t->sighand->siglock);
- if (!prepare_signal(sig, t, from_ancestor_ns))
+ if (!prepare_signal(sig, t, from_ancestor_ns)) {
+ trace_signal_generate_ignored(sig, group, info);
return 0;
+ }
pending = group ? &t->signal->shared_pending : &t->pending;
/*
@@ -847,8 +852,10 @@ static int __send_signal(int sig, struct
* exactly one non-rt signal, so that we can get more
* detailed information about the cause of the signal.
*/
- if (legacy_queue(pending, sig))
+ if (legacy_queue(pending, sig)) {
+ trace_signal_generate_dropped_duplicate(sig, group, info);
return 0;
+ }
/*
* fast-pathed signals for kernel-internal things like SIGSTOP
* or SIGKILL.
@@ -896,12 +903,22 @@ static int __send_signal(int sig, struct
break;
}
} else if (!is_si_special(info)) {
- if (sig >= SIGRTMIN && info->si_code != SI_USER)
- /*
- * Queue overflow, abort. We may abort if the signal was rt
- * and sent by user using something other than kill().
- */
+ if (sig >= SIGRTMIN && info->si_code != SI_USER) {
+ /*
+ * Queue overflow, abort. We may abort if the
+ * signal was rt and sent by user using something
+ * other than kill().
+ */
+ trace_signal_generate_overflow_fail(sig, group, info);
return -EAGAIN;
+ } else {
+ /*
+ * This is a silent loss of information. We still
+ * send the signal, but the *info bits are lost.
+ */
+ trace_signal_generate_overflow_lose_info(sig, group,
+ info);
+ }
}
out_set:
|
|
From: Masami H. <mhi...@re...> - 2009-11-16 22:42:06
|
Roland McGrath wrote:
>>> - signal loss events (queue overflow)
>>
>> Perhaps, this event is only for rt-signals, since
>> legacy signals just overwritten if it was sent.
>
> Not exactly. Nothing is ever "overwritten". If a non-RT signal is already
> pending, then you just leave the existing queue elements alone--i.e. the
> first one isn't overwritten, rather the second one is dropped. But this is
> not really the point.
>
> The "queue overflow" happens in two ways. For RT signals it really is a
> "signal loss" event--but that's also reported to the sender as -EAGAIN. So
> a signal-generation tracepoint that reports the return value would already
> cover that in a way.
>
> For non-RT signals, a new signal is never lost. But __sigqueue_alloc() can
> still fail. In this case, you get no queue element and thus no siginfo_t
> stored, so you can lose some information about the signal. You don't lose
> the signal itself, but will later dequeue it with an all-zeros siginfo_t.
> While calling this a "signal loss" is inaccurate, it is indeed a silent
> failure of sorts (unlike the RT signal case, which the userland caller
> knows about from the return value).
Hmm, actually, trace_signal_send() doesn't record the return value.
So, what about trace_signal_overflow() for RT-signals and
trace_signal_loss_info() for non-RT?
e.g.
@@ -918,12 +918,15 @@ static int __send_signal(int sig, struct siginfo *info, struct task_struct *t,
break;
}
} else if (!is_si_special(info)) {
- if (sig >= SIGRTMIN && info->si_code != SI_USER)
+ if (sig >= SIGRTMIN && info->si_code != SI_USER) {
/*
* Queue overflow, abort. We may abort if the signal was rt
* and sent by user using something other than kill().
*/
+ trace_signal_overflow(sig, t);
return -EAGAIN;
+ }
+ trace_signal_loss_info(sig, info);
}
Thank you,
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: mhi...@re...
|
|
From: Roland M. <ro...@re...> - 2009-11-16 22:12:36
|
(I CC'd Oleg, who doesn't care much about tracepoints, but always stays up
to speed about all things related to signals.)
> > - signal IPI/wakeup events
>
> All signals might be used for IPI, isn't it? :-)
> Or, did you mean SIGSTOP/SIGCONT pare?
I suspect he means something approximating signal_wake_up() calls. If a
signal is blocked, ignored, already pending, etc., then the "sending" event
does not lead to any kind of wakeup or interrupt.
i.e. perhaps something like:
@@ -529,8 +529,11 @@ void signal_wake_up(struct task_struct *t, int resume)
mask = TASK_INTERRUPTIBLE;
if (resume)
mask |= TASK_WAKEKILL;
- if (!wake_up_state(t, mask))
+ trace_signal_wakeup(t, resume);
+ if (!wake_up_state(t, mask)) {
kick_process(t);
+ trace_signal_ipi(t, resume);
+ }
OTOH, kick_process() is only called from here.
Perhaps tracepoints in the sched layer can cover these.
Anyway, Ingo can be more precise about what he has in mind.
> > - signal loss events (queue overflow)
>
> Perhaps, this event is only for rt-signals, since
> legacy signals just overwritten if it was sent.
Not exactly. Nothing is ever "overwritten". If a non-RT signal is already
pending, then you just leave the existing queue elements alone--i.e. the
first one isn't overwritten, rather the second one is dropped. But this is
not really the point.
The "queue overflow" happens in two ways. For RT signals it really is a
"signal loss" event--but that's also reported to the sender as -EAGAIN. So
a signal-generation tracepoint that reports the return value would already
cover that in a way.
For non-RT signals, a new signal is never lost. But __sigqueue_alloc() can
still fail. In this case, you get no queue element and thus no siginfo_t
stored, so you can lose some information about the signal. You don't lose
the signal itself, but will later dequeue it with an all-zeros siginfo_t.
While calling this a "signal loss" is inaccurate, it is indeed a silent
failure of sorts (unlike the RT signal case, which the userland caller
knows about from the return value).
Thanks,
Roland
|
|
From: Masami H. <mhi...@re...> - 2009-11-16 21:52:01
|
Ingo Molnar wrote: >> Especially if you call this "get" rather than "deliver", there is >> another place that should invoke this tracepoint (or perhaps a third >> one). sys_rt_sigtimedwait "gets" a signal without delivering it. In >> POSIX terminology this is called "accepting" the signal: the three >> things that can happen in the life of a signal are "generate", >> "deliver", and "accept". If you are trying to match up what happened >> to a signal generated by kill() or whatnot, then you want to notice >> both delivery and acceptance as the complementary event. >> >> (And again I have no clue why this signal stuff should be called >> "sched" at all.) > > it shouldnt be called 'sched' - it should go into 'events/signal.h'. > > But we also need fuller coverage than this. Coredumps and signal > delivery events are just a small part of all things signals, we also > want: That's a good idea. I'll put coredump and signal related events into events/signal.h. > > - signal generation events (send_sig*() variants) Those events finally calls __send_signal(), so I think trace_signal_send() can trace those events. > - signal IPI/wakeup events All signals might be used for IPI, isn't it? :-) Or, did you mean SIGSTOP/SIGCONT pare? > - signal loss events (queue overflow) Perhaps, this event is only for rt-signals, since legacy signals just overwritten if it was sent. Thank you, -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhi...@re... |
|
From: Roland M. <ro...@re...> - 2009-11-14 02:35:54
|
> Some sort of flexible core dump server perhaps, driven by user-space > logic? (vastly more flexible than coredump-filter or coredump ulimits > are today) Tracepoints are not a very good fit for that. I think people working on things like that use the corename=|command support. Thanks, Roland |
|
From: Ingo M. <mi...@el...> - 2009-11-14 01:04:14
|
* Roland McGrath <ro...@re...> wrote: > This is orthogonal to the core-dump tracepoint, I don't see why you > call them a unified patch series. > > The proper name for this event is "signal delivery". But since the > proper name for "send_signal" is "signal generation", I suppose "get" > is analogously improper to the existing "send" tracepoint. ;-) I'd suggest to add include/trace/events/signal.h and put these tracepoints there. > Especially if you call this "get" rather than "deliver", there is > another place that should invoke this tracepoint (or perhaps a third > one). sys_rt_sigtimedwait "gets" a signal without delivering it. In > POSIX terminology this is called "accepting" the signal: the three > things that can happen in the life of a signal are "generate", > "deliver", and "accept". If you are trying to match up what happened > to a signal generated by kill() or whatnot, then you want to notice > both delivery and acceptance as the complementary event. > > (And again I have no clue why this signal stuff should be called > "sched" at all.) it shouldnt be called 'sched' - it should go into 'events/signal.h'. But we also need fuller coverage than this. Coredumps and signal delivery events are just a small part of all things signals, we also want: - signal generation events (send_sig*() variants) - signal IPI/wakeup events - signal loss events (queue overflow) - [ optional: signal blocking/unblocking events ] - [ optional: specific signal handler installation/deinstallation ] That's what we generally require of new events: they should form a coherent whole, a logical set of events that 'make sense' and explain the workings of a subsystem on a given level of detail. How finegrained or coarse the level of details is is an open question, but if a given level of detail has been picked, we want completeness on that level. So for example in the list above, the '[ optional ]' events are finegrained ones that could be left out of the initial version. We've done this consistently for all subsystems that added tracepoints: scheduling, locking, timers, workqueues, block IO, SLAB, IRQs, etc., and we want a similar approach for newly covered subsystems (such as signals) as well. Thanks, Ingo |
|
From: Ingo M. <mi...@el...> - 2009-11-14 00:59:57
|
* Roland McGrath <ro...@re...> wrote: > > Do you know about exact usecases where these tracepoints would be > > utilized? Would be interesting (and relevant) to list them. > > Masami's colleagues do have something in particular in mind, but I'm > not sure whether they intend to use systemtap as part of the > implementation of that or not. Some sort of flexible core dump server perhaps, driven by user-space logic? (vastly more flexible than coredump-filter or coredump ulimits are today) Ingo |
|
From: Masami H. <mhi...@re...> - 2009-11-14 00:30:42
|
Roland McGrath wrote: > This is orthogonal to the core-dump tracepoint, I don't see why you > call them a unified patch series. Agreed, I'll split them. > The proper name for this event is "signal delivery". But since the > proper name for "send_signal" is "signal generation", I suppose "get" > is analogously improper to the existing "send" tracepoint. ;-) Ah, I see. 'deliver_signal' is good to me :-). Thank you, > Especially if you call this "get" rather than "deliver", there is > another place that should invoke this tracepoint (or perhaps a third > one). sys_rt_sigtimedwait "gets" a signal without delivering it. In > POSIX terminology this is called "accepting" the signal: the three > things that can happen in the life of a signal are "generate", > "deliver", and "accept". If you are trying to match up what happened to > a signal generated by kill() or whatnot, then you want to notice both > delivery and acceptance as the complementary event. > > (And again I have no clue why this signal stuff should be called > "sched" at all.) > > > Thanks, > Roland > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to maj...@vg... > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhi...@re... |
|
From: Masami H. <mhi...@re...> - 2009-11-14 00:27:06
|
Roland McGrath wrote: >> Oh, so SystemTap is the motivator of these tracepoints? > > Nope. My suggestion about arguments to pass was just generic based on the > innate context of this tracepoint. > >> Do you know about exact usecases where these tracepoints would be >> utilized? Would be interesting (and relevant) to list them. > > Masami's colleagues do have something in particular in mind, > but I'm not sure whether they intend to use systemtap as part > of the implementation of that or not. Yeah, we'd like to use this tracepoint to analyze coredump miss-configuration. Sometimes, users miss-configure dump-filter and rlimit, that will cause system-slowdown when several processes, which share a large amount of memory among them (e.g. database), start core-dump with shared memory. And then, it can cause a system-switching on HA cluster system. With this tracepoint, we can analyze why the coredumps were slow. Thank you, -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhi...@re... |
|
From: Roland M. <ro...@re...> - 2009-11-14 00:07:08
|
> Oh, so SystemTap is the motivator of these tracepoints? Nope. My suggestion about arguments to pass was just generic based on the innate context of this tracepoint. > Do you know about exact usecases where these tracepoints would be > utilized? Would be interesting (and relevant) to list them. Masami's colleagues do have something in particular in mind, but I'm not sure whether they intend to use systemtap as part of the implementation of that or not. Thanks, Roland |
|
From: Ingo M. <mi...@el...> - 2009-11-14 00:02:54
|
* Roland McGrath <ro...@re...> wrote: > I can't really see what this has to do with "sched" to warrant that > name. But, whatever. That's a misnomer indeed. I'd suggest to introduce a separate 'signal' subsystem category. > Note that you put the tracepoint where it won't get called in the > various cases where no dump is really being made because of > RLIMIT_CORE or file failures. I suspect you would like to get those > reported. (Perhaps especially so, since there won't be any file > around to notice later.) Yeah, good point. > Also, it seems nice to give the tracepoint the chance to look at the > actual open file in case a fancy one wants to do that. Oh, so SystemTap is the motivator of these tracepoints? Do you know about exact usecases where these tracepoints would be utilized? Would be interesting (and relevant) to list them. Ingo |
|
From: Masami H. <mhi...@re...> - 2009-11-14 00:00:51
|
Roland McGrath wrote:
> I can't really see what this has to do with "sched" to warrant that name.
> But, whatever.
>
> Note that you put the tracepoint where it won't get called in the various
> cases where no dump is really being made because of RLIMIT_CORE or file
> failures. I suspect you would like to get those reported. (Perhaps
> especially so, since there won't be any file around to notice later.)
Exactly, yes.
> Also, it seems nice to give the tracepoint the chance to look at the actual
> open file in case a fancy one wants to do that.
Ah, that's very nice to me! thanks!
>
> e.g.
>
> diff --git a/fs/exec.c b/fs/exec.c
> index ba112bd..0000000 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1822,9 +1822,7 @@ void do_coredump(long signr, int exit_co
> ispipe = format_corename(corename, signr);
> unlock_kernel();
>
> - if ((!ispipe)&& (core_limit< binfmt->min_coredump))
> - goto fail_unlock;
> -
> + file = NULL;
> if (ispipe) {
> if (core_limit == 0) {
> /*
> @@ -1845,7 +1843,7 @@ void do_coredump(long signr, int exit_co
> "Process %d(%s) has RLIMIT_CORE set to 0\n",
> task_tgid_vnr(current), current->comm);
> printk(KERN_WARNING "Aborting core\n");
> - goto fail_unlock;
> + goto nopipe;
> }
>
> dump_count = atomic_inc_return(&core_dump_count);
> @@ -1853,14 +1851,14 @@ void do_coredump(long signr, int exit_co
> printk(KERN_WARNING "Pid %d(%s) over core_pipe_limit\n",
> task_tgid_vnr(current), current->comm);
> printk(KERN_WARNING "Skipping core dump\n");
> - goto fail_dropcount;
> + goto nopipe;
> }
>
> helper_argv = argv_split(GFP_KERNEL, corename+1,&helper_argc);
> if (!helper_argv) {
> printk(KERN_WARNING "%s failed to allocate memory\n",
> __func__);
> - goto fail_dropcount;
> + goto nopipe;
> }
>
> core_limit = RLIM_INFINITY;
> @@ -1870,13 +1868,19 @@ void do_coredump(long signr, int exit_co
> &file)) {
> printk(KERN_INFO "Core dump to %s pipe failed\n",
> corename);
> - goto fail_dropcount;
> + goto nopipe;
> }
> - } else
> + } else if (core_limit>= binfmt->min_coredump) {
> file = filp_open(corename,
> O_CREAT | 2 | O_NOFOLLOW | O_LARGEFILE | flag,
> 0600);
> - if (IS_ERR(file))
> + }
> +
> +nopipe:
> + trace_process_coredump((int) signr, core_limit, mm_flags,
> + corename, file);
> +
> + if (!file || IS_ERR(file))
> goto fail_dropcount;
> inode = file->f_path.dentry->d_inode;
> if (inode->i_nlink> 1)
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: mhi...@re...
|
|
From: Roland M. <ro...@re...> - 2009-11-13 23:53:56
|
This is orthogonal to the core-dump tracepoint, I don't see why you call them a unified patch series. The proper name for this event is "signal delivery". But since the proper name for "send_signal" is "signal generation", I suppose "get" is analogously improper to the existing "send" tracepoint. ;-) Especially if you call this "get" rather than "deliver", there is another place that should invoke this tracepoint (or perhaps a third one). sys_rt_sigtimedwait "gets" a signal without delivering it. In POSIX terminology this is called "accepting" the signal: the three things that can happen in the life of a signal are "generate", "deliver", and "accept". If you are trying to match up what happened to a signal generated by kill() or whatnot, then you want to notice both delivery and acceptance as the complementary event. (And again I have no clue why this signal stuff should be called "sched" at all.) Thanks, Roland |
|
From: Masami H. <mhi...@re...> - 2009-11-13 23:45:37
|
Ingo Molnar wrote:
>
> * Andrew Morton<ak...@li...> wrote:
>
>> On Fri, 13 Nov 2009 17:52:27 -0500
>> Masami Hiramatsu<mhi...@re...> wrote:
>>
>>> Pass mm->flags to binfmt core_dump for bitflag consistency.
>>> Since mm->flags bit flags is not protected by locks, it will be
>>> changed while dumping core. This patch copies mm->flags to a
>>> mm_flags local variable at the beginning of do_coredump() and
>>> use it while dumping. mm_flags also includes dump_filter which
>>> filters elf sections from core file in elf_core_dump().
>>> So, this patch also passes mm_flags to each binfmt->core_dump().
>>
>> I can kind-of guess the answer, but it would be much more reliable if
>> we were to hear this from yourself:
>>
>> Why did you write this patch? What problem is being observed?
>
> i'm not Masami so i'm only guessing that while writing the tracepoint a
> race got noticed but that otherwise there's no big practical effect,
> 'just' a cleanliness problem fixed.
Right, I'd like to add a tracepoint of coredump event with
its information. And also, this patch may fix a small
dumpable inconsistency issue below code
---
1787 if (mm->core_state || !get_dumpable(mm)) { <- (1)
1788 up_write(&mm->mmap_sem);
1789 put_cred(cred);
1790 goto fail;
1791 }
1792
[...]
1798 if (get_dumpable(mm) == 2) { /* Setuid core dump mode */ <-(2)
1799 flag = O_EXCL; /* Stop rewrite attacks */
1800 cred->fsuid = 0; /* Dump root private */
1801 }
Since dumpable bits are not protected by lock, there is a
chance to change these bits between (1) and (2).
This patch copies mm->flags to a local variable and check the variable
for consistency.
Thank you,
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: mhi...@re...
|
|
From: Roland M. <ro...@re...> - 2009-11-13 23:39:30
|
I can't really see what this has to do with "sched" to warrant that name.
But, whatever.
Note that you put the tracepoint where it won't get called in the various
cases where no dump is really being made because of RLIMIT_CORE or file
failures. I suspect you would like to get those reported. (Perhaps
especially so, since there won't be any file around to notice later.)
Also, it seems nice to give the tracepoint the chance to look at the actual
open file in case a fancy one wants to do that.
e.g.
diff --git a/fs/exec.c b/fs/exec.c
index ba112bd..0000000 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1822,9 +1822,7 @@ void do_coredump(long signr, int exit_co
ispipe = format_corename(corename, signr);
unlock_kernel();
- if ((!ispipe) && (core_limit < binfmt->min_coredump))
- goto fail_unlock;
-
+ file = NULL;
if (ispipe) {
if (core_limit == 0) {
/*
@@ -1845,7 +1843,7 @@ void do_coredump(long signr, int exit_co
"Process %d(%s) has RLIMIT_CORE set to 0\n",
task_tgid_vnr(current), current->comm);
printk(KERN_WARNING "Aborting core\n");
- goto fail_unlock;
+ goto nopipe;
}
dump_count = atomic_inc_return(&core_dump_count);
@@ -1853,14 +1851,14 @@ void do_coredump(long signr, int exit_co
printk(KERN_WARNING "Pid %d(%s) over core_pipe_limit\n",
task_tgid_vnr(current), current->comm);
printk(KERN_WARNING "Skipping core dump\n");
- goto fail_dropcount;
+ goto nopipe;
}
helper_argv = argv_split(GFP_KERNEL, corename+1, &helper_argc);
if (!helper_argv) {
printk(KERN_WARNING "%s failed to allocate memory\n",
__func__);
- goto fail_dropcount;
+ goto nopipe;
}
core_limit = RLIM_INFINITY;
@@ -1870,13 +1868,19 @@ void do_coredump(long signr, int exit_co
&file)) {
printk(KERN_INFO "Core dump to %s pipe failed\n",
corename);
- goto fail_dropcount;
+ goto nopipe;
}
- } else
+ } else if (core_limit >= binfmt->min_coredump) {
file = filp_open(corename,
O_CREAT | 2 | O_NOFOLLOW | O_LARGEFILE | flag,
0600);
- if (IS_ERR(file))
+ }
+
+nopipe:
+ trace_process_coredump((int) signr, core_limit, mm_flags,
+ corename, file);
+
+ if (!file || IS_ERR(file))
goto fail_dropcount;
inode = file->f_path.dentry->d_inode;
if (inode->i_nlink > 1)
|
|
From: Roland M. <ro...@re...> - 2009-11-13 23:30:33
|
> this would fix the (probably harmless) race too, but isnt the whole > approach taken by the patch more robust, i.e. to take a snapshot of > mm->flags value and pass it along coredump processing? Yes, I think it is a fine thing to do. I'm just being explicit about what I think the (only) concrete issues are motivating a change. > we should probably introduce a coredump parameter struct, and pass that > along: Also sounds fine to me. Thanks, Roland |
|
From: Ingo M. <mi...@el...> - 2009-11-13 23:25:17
|
* Andrew Morton <ak...@li...> wrote: > On Fri, 13 Nov 2009 17:52:27 -0500 > Masami Hiramatsu <mhi...@re...> wrote: > > > Pass mm->flags to binfmt core_dump for bitflag consistency. > > Since mm->flags bit flags is not protected by locks, it will be > > changed while dumping core. This patch copies mm->flags to a > > mm_flags local variable at the beginning of do_coredump() and > > use it while dumping. mm_flags also includes dump_filter which > > filters elf sections from core file in elf_core_dump(). > > So, this patch also passes mm_flags to each binfmt->core_dump(). > > I can kind-of guess the answer, but it would be much more reliable if > we were to hear this from yourself: > > Why did you write this patch? What problem is being observed? i'm not Masami so i'm only guessing that while writing the tracepoint a race got noticed but that otherwise there's no big practical effect, 'just' a cleanliness problem fixed. Ingo |
|
From: Ingo M. <mi...@el...> - 2009-11-13 23:23:26
|
* Roland McGrath <ro...@re...> wrote:
> To clarify, proc_coredump_filter_write() is the one place that can
> change mm->flags during a core dump. I don't think any other is
> possible while all the other tasks sharing that mm are prevented from
> running. Is there any other way that mm->flags might change during
> do_coredump()?
>
> I don't see anything wrong with this change. But (assuming that is
> the only case), there is another approach we could take instead. That
> is, have proc_coredump_filter_write() do:
>
> down_read(&mm->mmap_sem);
> ret = mm->core_state ? -EBUSY : 0;
> up_read(&mm->mmap_sem);
this would fix the (probably harmless) race too, but isnt the whole
approach taken by the patch more robust, i.e. to take a snapshot of
mm->flags value and pass it along coredump processing?
That makes it evidently immutable in the future too. It also makes the
code a bit easier to read IMO - instead of get_dumpable() we use the
mm_flags.
The only other observation i have is that for this parameter set:
long signr, struct pt_regs *regs, struct file *file,
unsigned long limit, unsigned long mm_flags)
we should probably introduce a coredump parameter struct, and pass that
along:
struct coredump_params {
long signr;
struct pt_regs *regs;
struct file *file;
unsigned long limit;
unsigned long mm_flags;
}
Had this been done in the past this present patch would be a lot simpler
as well: we could have added mm_flags to coredump_params, instead of
having to propagate it through ~6 function interface surfaces.
Thanks,
Ingo
|
|
From: Roland M. <ro...@re...> - 2009-11-13 23:16:25
|
To clarify, proc_coredump_filter_write() is the one place that can change mm->flags during a core dump. I don't think any other is possible while all the other tasks sharing that mm are prevented from running. Is there any other way that mm->flags might change during do_coredump()? I don't see anything wrong with this change. But (assuming that is the only case), there is another approach we could take instead. That is, have proc_coredump_filter_write() do: down_read(&mm->mmap_sem); ret = mm->core_state ? -EBUSY : 0; up_read(&mm->mmap_sem); Thanks, Roland |