dle-develop Mailing List for Dependable Linux Effort (Page 41)

Brought to you by: h-aoki, hiramatu, s-oshima, y-sugita

dle-develop — Dependable Linux Effort mailing list for developers

You can subscribe to this list here.

2009	Jan	Feb	Mar	Apr	May (32)	Jun (66)	Jul (102)	Aug (78)	Sep (106)	Oct (137)	Nov (147)	Dec (147)
2010	Jan (71)	Feb (139)	Mar (86)	Apr (76)	May (57)	Jun (10)	Jul (12)	Aug (6)	Sep (8)	Oct (12)	Nov (12)	Dec (18)
2011	Jan (16)	Feb (19)	Mar (3)	Apr (1)	May (16)	Jun (17)	Jul (74)	Aug (22)	Sep (18)	Oct (24)	Nov (21)	Dec (30)
2012	Jan (31)	Feb (16)	Mar (22)	Apr (25)	May (18)	Jun (13)	Jul (83)	Aug (49)	Sep (20)	Oct (60)	Nov (35)	Dec (28)
2013	Jan (39)	Feb (61)	Mar (35)	Apr (21)	May (45)	Jun (56)	Jul (20)	Aug (9)	Sep (10)	Oct (31)	Nov (8)	Dec (4)
2014	Jan (6)	Feb (7)	Mar (7)	Apr (6)	May (4)	Jun (8)	Jul (5)	Aug (2)	Sep (4)	Oct (4)	Nov (11)	Dec (5)
2015	Jan (4)	Feb (4)	Mar (3)	Apr (4)	May (9)	Jun (4)	Jul (15)	Aug (8)	Sep (16)	Oct (18)	Nov (15)	Dec (7)
2016	Jan (20)	Feb (9)	Mar (15)	Apr (24)	May (16)	Jun (28)	Jul (22)	Aug (23)	Sep (18)	Oct (30)	Nov (40)	Dec (9)
2017	Jan (1)	Feb (8)	Mar (37)	Apr (26)	May (25)	Jun (46)	Jul (24)	Aug (9)	Sep	Oct	Nov	Dec

Flat | Threaded

<< < 1 .. 39 40 41 42 43 .. 99 > >> (Page 41 of 99)

Re: [Dle-develop] [RFC][PATCH v4 -next 1/4] Move kmsg_dump(KMSG_DUMP_PANIC) below smp_send_stop()

From: Don Z. <dz...@re...> - 2012-01-11 17:26:25

On Wed, Jan 11, 2012 at 03:28:11PM +0800, Chen Gong wrote:
> 于 2012/1/11 4:29, Seiji Aguchi 写道:
> >
> >>I agree with you. How about adding macros or something like WARN_ON(XX_ARCH) or
> >>Kconfig to limit its scope?
> >
> >Thank you for giving me your idea.
> >Your suggestions above will work for me because I'm a x86 user.
> >If Tony agrees to it, I can update my patch.
> >
> >But, I'm hesitating to add WARN_ON() or change Kconfig only for specific arch
> >because pstore aims for generic interface and this is related to its design.
> >Also, ramoops is going to use pstore now. It doesn't depend on x86.
> >I'm worried that ramoops users will complain about this change.
> >
> >So, I think a reasonable solution at this time is just adding some explanations
> >about smp_send_stop() to documentation as follows.
> >
> >Users can use pstore with their own responsibility and ask developers
> >if smp_send_stop() is reliable enough in panic situation on architecture they want to run.
> >
> >What do you think?
> >
> >---
> >  Documentation/ABI/testing/pstore |    8 ++++++++
> >  1 files changed, 8 insertions(+), 0 deletions(-)
> >
> >diff --git a/Documentation/ABI/testing/pstore b/Documentation/ABI/testing/pstore
> >index ff1df4e..5583729 100644
> >--- a/Documentation/ABI/testing/pstore
> >+++ b/Documentation/ABI/testing/pstore
> >@@ -11,6 +11,14 @@ Description:	Generic interface to platform dependent persistent storage.
> >  		of the console log is captured, but other interesting
> >  		data can also be saved.
> >
> >+		In case of panic, pstore is invoked after smp_send_stop()
> >+		,a function call stopping other cpus, so that we can get
> >+		logs simpler and cleaner with just one running cpu.
> >+
> >+		As for x86, smp_send_stop() is reliable enough to work in
> >+		panic situation. But we are not guaranteed that it works
> >+		reliably on other architectures.
> >+
> >  		# mount -t pstore -o kmsg_bytes=8000 - /dev/pstore
> >
> >  		$ ls -l /dev/pstore
> 
> The explanation is great. but In my opinion, I still insist that
> a WARN_ON() is necessary. What do you think, Tony and Don?

I guess I still don't understand why.  Who uses kmsg_dump besides x86?  It
seems like there was only 3 or 4 subsystems that were registering.

Cheers,
Don

Re: [Dle-develop] [RFC][PATCH v4 -next 1/4] Move kmsg_dump(KMSG_DUMP_PANIC) below smp_send_stop()

From: Chen G. <gon...@li...> - 2012-01-11 08:02:43

于 2012/1/11 4:29, Seiji Aguchi 写道:
>
>> I agree with you. How about adding macros or something like WARN_ON(XX_ARCH) or
>> Kconfig to limit its scope?
>
> Thank you for giving me your idea.
> Your suggestions above will work for me because I'm a x86 user.
> If Tony agrees to it, I can update my patch.
>
> But, I'm hesitating to add WARN_ON() or change Kconfig only for specific arch
> because pstore aims for generic interface and this is related to its design.
> Also, ramoops is going to use pstore now. It doesn't depend on x86.
> I'm worried that ramoops users will complain about this change.
>
> So, I think a reasonable solution at this time is just adding some explanations
> about smp_send_stop() to documentation as follows.
>
> Users can use pstore with their own responsibility and ask developers
> if smp_send_stop() is reliable enough in panic situation on architecture they want to run.
>
> What do you think?
>
> ---
>   Documentation/ABI/testing/pstore |    8 ++++++++
>   1 files changed, 8 insertions(+), 0 deletions(-)
>
> diff --git a/Documentation/ABI/testing/pstore b/Documentation/ABI/testing/pstore
> index ff1df4e..5583729 100644
> --- a/Documentation/ABI/testing/pstore
> +++ b/Documentation/ABI/testing/pstore
> @@ -11,6 +11,14 @@ Description:	Generic interface to platform dependent persistent storage.
>   		of the console log is captured, but other interesting
>   		data can also be saved.
>
> +		In case of panic, pstore is invoked after smp_send_stop()
> +		,a function call stopping other cpus, so that we can get
> +		logs simpler and cleaner with just one running cpu.
> +
> +		As for x86, smp_send_stop() is reliable enough to work in
> +		panic situation. But we are not guaranteed that it works
> +		reliably on other architectures.
> +
>   		# mount -t pstore -o kmsg_bytes=8000 - /dev/pstore
>
>   		$ ls -l /dev/pstore

The explanation is great. but In my opinion, I still insist that
a WARN_ON() is necessary. What do you think, Tony and Don?

Re: [Dle-develop] [RFC][PATCH v4 -next 1/4] Move kmsg_dump(KMSG_DUMP_PANIC) below smp_send_stop()

From: Seiji A. <sei...@hd...> - 2012-01-10 20:30:20

>I agree with you. How about adding macros or something like WARN_ON(XX_ARCH) or
>Kconfig to limit its scope?

Thank you for giving me your idea.
Your suggestions above will work for me because I'm a x86 user.
If Tony agrees to it, I can update my patch.

But, I'm hesitating to add WARN_ON() or change Kconfig only for specific arch 
because pstore aims for generic interface and this is related to its design.
Also, ramoops is going to use pstore now. It doesn't depend on x86.
I'm worried that ramoops users will complain about this change.

So, I think a reasonable solution at this time is just adding some explanations 
about smp_send_stop() to documentation as follows.

Users can use pstore with their own responsibility and ask developers 
if smp_send_stop() is reliable enough in panic situation on architecture they want to run.

What do you think?

---
 Documentation/ABI/testing/pstore |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/Documentation/ABI/testing/pstore b/Documentation/ABI/testing/pstore
index ff1df4e..5583729 100644
--- a/Documentation/ABI/testing/pstore
+++ b/Documentation/ABI/testing/pstore
@@ -11,6 +11,14 @@ Description:	Generic interface to platform dependent persistent storage.
 		of the console log is captured, but other interesting
 		data can also be saved.
 
+		In case of panic, pstore is invoked after smp_send_stop()
+		,a function call stopping other cpus, so that we can get
+		logs simpler and cleaner with just one running cpu.
+
+		As for x86, smp_send_stop() is reliable enough to work in
+		panic situation. But we are not guaranteed that it works
+		reliably on other architectures.
+
 		# mount -t pstore -o kmsg_bytes=8000 - /dev/pstore
 
 		$ ls -l /dev/pstore
-- 

Seiji

Re: [Dle-develop] [RFC][PATCH v4 -next 1/4] Move kmsg_dump(KMSG_DUMP_PANIC) below smp_send_stop()

From: Chen G. <gon...@li...> - 2012-01-10 03:40:52

于 2012/1/10 1:59, Seiji Aguchi 写道:
>>
>> I don't know how to prove something is hardened other than not seeing any
>> hangs or false reboots on in that piece of code.
>>
>
> I don't know how we can prove reliability of smp_send_stop() of all archs as well.
>
> Another possible solution is replacing smp_send_stop() with
> machine_crash_shutdown().
> This is a function call stopping other cpus reliably in panic situation
> so that kdump can work .
> But, I don't know whether it works in all archs.

machine_crash_shutdown is defined in only some Archs so it is obvious that it
can't be used in all platforms, BTW, this function is bracketed by CONFIG_KEXEC,
which means it can't be used without this macro. So it is not suitable for
this scenario.

> We need further discussion with Eric for checking this idea is feasible...
>
>
> I would like to go forward step by step rather than doing tough work at once.
> I think just moving kmsg_dump() below smp_send_stop() is a good start for improving pstore.

I agree with you. How about adding macros or something like WARN_ON(XX_ARCH) or
Kconfig to limit its scope?

Re: [Dle-develop] [RFC][PATCH v4 -next 1/4] Move kmsg_dump(KMSG_DUMP_PANIC) below smp_send_stop()

From: Seiji A. <sei...@hd...> - 2012-01-09 18:00:38

>
>I don't know how to prove something is hardened other than not seeing any
>hangs or false reboots on in that piece of code.
>

I don't know how we can prove reliability of smp_send_stop() of all archs as well.

Another possible solution is replacing smp_send_stop() with 
machine_crash_shutdown().
This is a function call stopping other cpus reliably in panic situation
so that kdump can work .
But, I don't know whether it works in all archs.
We need further discussion with Eric for checking this idea is feasible...


I would like to go forward step by step rather than doing tough work at once.
I think just moving kmsg_dump() below smp_send_stop() is a good start for improving pstore.

Seiji

Re: [Dle-develop] [RFC][PATCH v4 -next 1/4] Move kmsg_dump(KMSG_DUMP_PANIC) below smp_send_stop()

From: Don Z. <dz...@re...> - 2012-01-05 21:01:43

On Thu, Jan 05, 2012 at 03:10:25PM -0500, Seiji Aguchi wrote:
> 
> >Aren't you worried about the comment about smp_send_stop() not
> >being hardened to work in a panic situation?
> > 	/*
> > 	 * Note smp_send_stop is the usual smp shutdown function, which
> > 	 * unfortunately means it may not be hardened to work in a panic
> 
> This comment is wrong because Don improved smp_send_stop() by switching REBOOT_VECTOR to NMI.
> And his patch has already merged to linux-next tree.

I only fixed x86.  Who knows what the other arches do..

I don't know how to prove something is hardened other than not seeing any
hangs or false reboots on in that piece of code.

Cheers,
Don

> 
> http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commitdiff;h=3603a2512f9e69dc87914ba922eb4a0812b21cd6
> 
> So, current smp_send_stop() is hardened to work in a panic situation.
> 
> I will remove this wrong comment.
> 
> Seiji
> 
> >-----Original Message-----
> >From: Luck, Tony [mailto:ton...@in...]
> >Sent: Thursday, January 05, 2012 2:07 PM
> >To: Seiji Aguchi; Don Zickus
> >Cc: lin...@vg...; Matthew Garrett; Vivek Goyal; Chen, Gong; ak...@li...; Brown, Len;
> >'yin...@in...'; 'ak...@li...'; 'hu...@ch...'; 'mi...@el...'; jm...@na...;
> >a.p...@ch...; nam...@gm...; dle...@li...; Satoru Moriya
> >Subject: RE: [RFC][PATCH v4 -next 1/4] Move kmsg_dump(KMSG_DUMP_PANIC) below smp_send_stop()
> >
> >-	kmsg_dump(KMSG_DUMP_PANIC);
> >-
> > 	/*
> > 	 * Note smp_send_stop is the usual smp shutdown function, which
> > 	 * unfortunately means it may not be hardened to work in a panic
> >@@ -117,6 +115,8 @@ void panic(const char *fmt, ...)
> > 	 */
> > 	smp_send_stop();
> >
> >+	kmsg_dump(KMSG_DUMP_PANIC);
> >+
> >
> >Aren't you worried about the comment about smp_send_stop() not
> >being hardened to work in a panic situation?
> >
> >If it does work - we are clearly much better off moving the
> >kmsg_dump() call down like this. It makes life much simpler
> >and cleaner to work with just one running cpu.
> >
> >But if something goes wrong - we might not see the dump at all!
> >
> >How do we compare these cases and decide that it is better to
> >trust that smp_send_stop() will return?
> >
> >-Tony

Re: [Dle-develop] [RFC][PATCH v4 -next 1/4] Move kmsg_dump(KMSG_DUMP_PANIC) below smp_send_stop()

From: Seiji A. <sei...@hd...> - 2012-01-05 20:11:10

>Aren't you worried about the comment about smp_send_stop() not
>being hardened to work in a panic situation?
> 	/*
> 	 * Note smp_send_stop is the usual smp shutdown function, which
> 	 * unfortunately means it may not be hardened to work in a panic

This comment is wrong because Don improved smp_send_stop() by switching REBOOT_VECTOR to NMI.
And his patch has already merged to linux-next tree.

http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commitdiff;h=3603a2512f9e69dc87914ba922eb4a0812b21cd6

So, current smp_send_stop() is hardened to work in a panic situation.

I will remove this wrong comment.

Seiji

>-----Original Message-----
>From: Luck, Tony [mailto:ton...@in...]
>Sent: Thursday, January 05, 2012 2:07 PM
>To: Seiji Aguchi; Don Zickus
>Cc: lin...@vg...; Matthew Garrett; Vivek Goyal; Chen, Gong; ak...@li...; Brown, Len;
>'yin...@in...'; 'ak...@li...'; 'hu...@ch...'; 'mi...@el...'; jm...@na...;
>a.p...@ch...; nam...@gm...; dle...@li...; Satoru Moriya
>Subject: RE: [RFC][PATCH v4 -next 1/4] Move kmsg_dump(KMSG_DUMP_PANIC) below smp_send_stop()
>
>-	kmsg_dump(KMSG_DUMP_PANIC);
>-
> 	/*
> 	 * Note smp_send_stop is the usual smp shutdown function, which
> 	 * unfortunately means it may not be hardened to work in a panic
>@@ -117,6 +115,8 @@ void panic(const char *fmt, ...)
> 	 */
> 	smp_send_stop();
>
>+	kmsg_dump(KMSG_DUMP_PANIC);
>+
>
>Aren't you worried about the comment about smp_send_stop() not
>being hardened to work in a panic situation?
>
>If it does work - we are clearly much better off moving the
>kmsg_dump() call down like this. It makes life much simpler
>and cleaner to work with just one running cpu.
>
>But if something goes wrong - we might not see the dump at all!
>
>How do we compare these cases and decide that it is better to
>trust that smp_send_stop() will return?
>
>-Tony

Re: [Dle-develop] [RFC][PATCH v4 -next 1/4] Move kmsg_dump(KMSG_DUMP_PANIC) below smp_send_stop()

From: Luck, T. <ton...@in...> - 2012-01-05 19:07:29

-	kmsg_dump(KMSG_DUMP_PANIC);
-
 	/*
 	 * Note smp_send_stop is the usual smp shutdown function, which
 	 * unfortunately means it may not be hardened to work in a panic
@@ -117,6 +115,8 @@ void panic(const char *fmt, ...)
 	 */
 	smp_send_stop();
 
+	kmsg_dump(KMSG_DUMP_PANIC);
+

Aren't you worried about the comment about smp_send_stop() not
being hardened to work in a panic situation?

If it does work - we are clearly much better off moving the
kmsg_dump() call down like this. It makes life much simpler
and cleaner to work with just one running cpu.

But if something goes wrong - we might not see the dump at all!

How do we compare these cases and decide that it is better to
trust that smp_send_stop() will return?

-Tony

[Dle-develop] [RFC][PATCH v4 -next 4/4] Skip spin_lock of efi_pstore_write() in panic case

From: Seiji A. <sei...@hd...> - 2012-01-05 17:42:22

This patch skips spin_lock of efi_pstore_write() in panic path for these reasons.
 - efi_pstore_write() is serialized via smp_send_stop().
 - SetVariable runtime service is guaranteed to work correctly if you call
   it a second time when you have taken an exception in trying
   to execute them already.

We can avoid deadlock of spin_lock(&efivars->lock) in panic path and get panic log
reliably.


 Signed-off-by: Seiji Aguchi <sei...@hd...>

---
 drivers/firmware/efivars.c |   13 +++++++++++--
 1 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/firmware/efivars.c b/drivers/firmware/efivars.c
index d25599f..657e6f1 100644
--- a/drivers/firmware/efivars.c
+++ b/drivers/firmware/efivars.c
@@ -510,7 +510,15 @@ static int efi_pstore_write(enum pstore_type_id type,
 	sprintf(stub_name, "dump-type%u-%u-", type, part);
 	sprintf(name, "%s%lu", stub_name, get_seconds());
 
-	spin_lock(&efivars->lock);
+	/*
+	 * We don't need to take any locks in panic path for these reasons.
+	 *  - efi_pstore_write() is serialized via smp_send_stop().
+	 *  - SetVariable runtime service is guaranteed to work correctly
+	 *    if you call it a second time when you have taken an exception
+	 *    in trying to execute them already.
+	 */
+	if (reason != KMSG_DUMP_PANIC)
+		spin_lock(&efivars->lock);
 
 	for (i = 0; i < DUMP_NAME_LEN; i++)
 		efi_name[i] = stub_name[i];
@@ -548,7 +556,8 @@ static int efi_pstore_write(enum pstore_type_id type,
 	efivars->ops->set_variable(efi_name, &vendor, PSTORE_EFI_ATTRIBUTES,
 				   size, psi->buf);
 
-	spin_unlock(&efivars->lock);
+	if (reason != KMSG_DUMP_PANIC)
+		spin_unlock(&efivars->lock);
 
 	if (found)
 		efivar_unregister(found);
-- 
1.7.1

[Dle-develop] [RFC][PATCH v4 -next 3/4]Skip subsequent kmsg_dump() function calls in panic path

From: Seiji A. <sei...@hd...> - 2012-01-05 17:40:43

This patch skips subsequent kmsg_dump() function calls in panic path

With this patch, we can avoid deadlock due to the subsequent calls.
Actually, kmsg_dump(KMSG_DUMP_EMREG) is called after kmsg_dump(KMSG_DUMP_PANIC) 
when panic_timeout variable is set.

 Signed-off-by: Seiji Aguchi <sei...@hd...>

---
 kernel/printk.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/kernel/printk.c b/kernel/printk.c
index 5294426..186d792 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -1731,6 +1731,16 @@ void kmsg_dump(enum kmsg_dump_reason reason)
 	const char *s1, *s2;
 	unsigned long l1, l2;
 	unsigned long flags;
+	static bool panicked;
+
+	/*
+	 * kmsg_dump() is skipped because we already got panic log.
+	 */
+	if (panicked)
+		return;
+
+	if (reason == KMSG_DUMP_PANIC)
+		panicked = true;
 
 	/*
 	 * Currently, "in_nmi() && reason != KMSG_DUMP_PANIC" case never happens
-- 
1.7.1

[Dle-develop] [RFC][PATCH v4 -next 2/4] Skip spin_locks in panic case and Add WARN_ON()

From: Seiji A. <sei...@hd...> - 2012-01-05 17:38:32

This patch does followings.

   - Skip spin_locks in panic case in both kmsg_dump() and pstore_dump() because they are 
     serialized via smp_send_stop

   - Add WARN_ON() in "in_nmi() and !panic" case into kmsg_dump(). Currently, this case never 
     happens because only kmsg_dump(KMSG_DUMP_PANIC) is called in NMI case.
     But if someone adds new kmsg_dump() into NMI path in the future, kmsg_dump() may deadlock. 
     We can trap it and complain with this WARN_ON().
     

 With this patch, kmsg_dump()/pstore_dump() work as follows.
   panic case (KMSG_DUMP_PANIC):
     - don't take lock because they are serialized.

   not panic case (KMSG_DUMP_OOPS/KMSG_DUMP_EMERG/KMSG_DUMP_RESTART/KMSG_DUMP_HALT):
     - take locks normally
 
 Regarding as NMI case,
    - kmsg_dump()/pstore_dump() don't take locks, so deadlock issue will not happen
      because kmsg_dump() is called in just panic case with current implementation.
    - If someone adds new kmsg_dump() into NMI path, WARN_ON() is called.
      So we can trap it and ask to fix it.

 Signed-off-by: Seiji Aguchi <sei...@hd...>

---
 fs/pstore/platform.c |   16 ++++++----------
 kernel/printk.c      |   20 ++++++++++++++++++--
 2 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
index 9ec22d3..426e77d 100644
--- a/fs/pstore/platform.c
+++ b/fs/pstore/platform.c
@@ -90,18 +90,17 @@ static void pstore_dump(struct kmsg_dumper *dumper,
 	int		hsize, ret;
 	unsigned int	part = 1;
 	unsigned long	flags = 0;
-	int		is_locked = 0;
 
 	if (reason < ARRAY_SIZE(reason_str))
 		why = reason_str[reason];
 	else
 		why = "Unknown";
 
-	if (in_nmi()) {
-		is_locked = spin_trylock(&psinfo->buf_lock);
-		if (!is_locked)
-			pr_err("pstore dump routine blocked in NMI, may corrupt error record\n");
-	} else
+	/*
+	 * pstore_dump() is serialized in panic path.
+	 * So, we don't need to take any locks.
+	 */
+	if (reason != KMSG_DUMP_PANIC)
 		spin_lock_irqsave(&psinfo->buf_lock, flags);
 	oopscount++;
 	while (total < kmsg_bytes) {
@@ -131,10 +130,7 @@ static void pstore_dump(struct kmsg_dumper *dumper,
 		total += l1_cpy + l2_cpy;
 		part++;
 	}
-	if (in_nmi()) {
-		if (is_locked)
-			spin_unlock(&psinfo->buf_lock);
-	} else
+	if (reason != KMSG_DUMP_PANIC)
 		spin_unlock_irqrestore(&psinfo->buf_lock, flags);
 }
 
diff --git a/kernel/printk.c b/kernel/printk.c
index 13c0a11..5294426 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -1732,13 +1732,29 @@ void kmsg_dump(enum kmsg_dump_reason reason)
 	unsigned long l1, l2;
 	unsigned long flags;
 
+	/*
+	 * Currently, "in_nmi() && reason != KMSG_DUMP_PANIC" case never happens
+	 * because ,in NMI path, only kmsg_dump(KMSG_DUMP_PANIC) is called.
+	 * But if someone adds new kmsg_dump() into NMI path in the future,
+	 * kmsg_dump() may deadlock. We can trap it and complain
+	 * with this WARN_ON().
+	 */
+	WARN_ON(in_nmi() && reason != KMSG_DUMP_PANIC);
+
 	/* Theoretically, the log could move on after we do this, but
 	   there's not a lot we can do about that. The new messages
 	   will overwrite the start of what we dump. */
-	raw_spin_lock_irqsave(&logbuf_lock, flags);
+
+	/*
+	 * kmsg_dump(KMSG_DUMP_PANIC) is serialized.
+	 * So, we don't need to take any locks.
+	 */
+	if (reason != KMSG_DUMP_PANIC)
+		raw_spin_lock_irqsave(&logbuf_lock, flags);
 	end = log_end & LOG_BUF_MASK;
 	chars = logged_chars;
-	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+	if (reason != KMSG_DUMP_PANIC)
+		raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 
 	if (chars > end) {
 		s1 = log_buf + log_buf_len - chars + end;
-- 
1.7.1

[Dle-develop] [RFC][PATCH v4 -next 1/4] Move kmsg_dump(KMSG_DUMP_PANIC) below smp_send_stop()

From: Seiji A. <sei...@hd...> - 2012-01-05 17:37:23

This patch just moves kmsg_dump(KMSG_DUMP_PANIC) below smp_send_stop
for serializing logging process via smp_send_stop.

 Signed-off-by: Seiji Aguchi <sei...@hd...>
 Acked-by: Don Zickus <dz...@re...>

---
 kernel/panic.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index 80aed44..da585b8 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -108,8 +108,6 @@ void panic(const char *fmt, ...)
 	 */
 	crash_kexec(NULL);
 
-	kmsg_dump(KMSG_DUMP_PANIC);
-
 	/*
 	 * Note smp_send_stop is the usual smp shutdown function, which
 	 * unfortunately means it may not be hardened to work in a panic
@@ -117,6 +115,8 @@ void panic(const char *fmt, ...)
 	 */
 	smp_send_stop();
 
+	kmsg_dump(KMSG_DUMP_PANIC);
+
 	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
 
 	bust_spinlocks(0);
-- 
1.7.1

[Dle-develop] [RFC][PATCH v4 -next 0/4] Make pstore/kmsg_dump run after stopping other cpus in panic path

From: Seiji A. <sei...@hd...> - 2012-01-05 17:36:58

Hi,

Discussion:
 As Don mentioned in following thread, it would be nice for pstore/kmsg_dump to serialize 
 panic path because they can log messages reliably.

 https://lkml.org/lkml/2011/10/13/427

 This patchset is based on his proposal switching smp_send_stop() from REBOOT_VECTOR to NMI.

Change Log:

 v3 -> v4
   - Add comment for explaining the purpose of WARN_ON() based on Don's comment (patch 2/4)
     https://lkml.org/lkml/2011/12/12/296

   - Skip spin_lock of efi_pstore_write() in panic case based on discussion with Tony (patch 4/4)
     https://lkml.org/lkml/2012/1/3/151

   - Apply this patchset to -next tree instead of linus -tree so that "patch 4/4" makes simple

 v2 -> v3
   - Skip spin_locks in panic case in both kmsg_dump() and pstore_dump() instead of calling spin_lock_init()
     to avoid potential issues due to spin_lock_init()
   - Add WARN_ON() in "in_nmi() and !panic" case into kmsg_dump() so that we trap when someone adds 
     new kmsg_dump() in NMI path in the future
   - Skip subsequent kmsg_dump() function calls to avoid deadlock.

 v1 -> v2
  - Add trylocks to kmsg_dump()/pstore_dump() so that they can work in NMI context.  
  - Divide a patch into two
     First one is just moving kmsg_dump(KMSG_DUMP_PANIC) below smp_send_stop()
     Second one is changing lock operations in kmsg_dump()/pstore_dump()
 v1
  - Move kmsg_dump(KMSG_DUMP_PANIC) below smp_send_stop
  - Bust logbuf_lock of kmsg_dump() in panic path for avoiding deadlock
  - Bust psinfo->buf_lock of pstore_dump() in panic path for avoiding deadlock

Patch Description:
   [RFC][PATCH v4 -next 1/4] Move kmsg_dump(KMSG_DUMP_PANIC) below smp_send_stop()
     - Just move kmsg_dump(KMSG_DUMP_PANIC) below smp_send_stop()

   [RFC][PATCH v4 -next 2/4] Skip spin_locks in panic case and Add WARN_ON()
     - Skip spin_locks in panic case in both kmsg_dump() and pstore_dump()
     - Add WARN_ON() in "in_nmi() and !panic" case into kmsg_dump()

   [RFC][PATCH v4 -next 3/4] Skip subsequent kmsg_dump()
     - Skip subsequent kmsg_dump() function calls in panic path

   [RFC][PATCH v4 -next 4/4] Skip spin_lock of efi_pstore_write() in panic case
     - Skip spin_lock of efi_pstore_write() in panic case to avoid deadlock

Seiji

Re: [Dle-develop] [RFC][PATCH v3 2/3] Skip spin_locks in panic case and add WARN_ON()

From: Seiji A. <sei...@hd...> - 2012-01-03 18:22:53

>But this seems pretty clear that we don't need to self-NMI.

Thank you for your explanation.
Now I understand that we don't need self-NMI and efi_pstore can blindly continue in panic path
because setvariable() will succeed even if the previous call that took the exception didn't succeed.

I will submit an updated patch by removing spin_lock of efi_pstore_write() in panic path.

Seiji

Re: [Dle-develop] [RFC][PATCH v3 2/3] Skip spin_locks in panic case and add WARN_ON()

From: Luck, T. <ton...@in...> - 2012-01-03 17:49:31

> I'll ask an Intel EFI guru whether I'm on target.

Got a short note back from the EFI people:

----
The reason the paragraph exists in 7.1.1 is that certain services (as listed
in table 31) are guaranteed to work if you call them a second time when you
have taken an exception in trying to execute them already -- say SetVariable()
face plants with an exception...because of the way our code is designed you
can call it a second time even if the first call that took the exception
didn't succeed.

At any rate, I checked with a couple of the other experts around us and none
of us can see any reason to send yourself an NMI just to preface a call to
a UEFI runtime service.  Thus, our collective opinion is that you are correct
in your interpretation.
----

If you want to discuss this more - they apparently have an external mailing
list some place on TianoCore.org. But this seems pretty clear that we don't
need to self-NMI.

-Tony

Re: [Dle-develop] [RFC][PATCH v3 2/3] Skip spin_locks in panic case and add WARN_ON()

From: Seiji A. <sei...@hd...> - 2012-01-03 16:05:02

>So I wonder whether there is any value in the cpu sending an NMI to itself to
>meet the exact wording of this section in the spec.
>
>I'll ask an Intel EFI guru whether I'm on target.

Thanks, Tony. I wait for their reply.

Seiji

Re: [Dle-develop] [RFC][PATCH v3 2/3] Skip spin_locks in panic case and add WARN_ON()

From: Luck, T. <ton...@in...> - 2012-01-02 19:42:03

>  7.1.1 Exception for Machine Check, INIT, and NMI.

I wonder if this section is poorly named ... I suspect that it may really mean
"Exception for fatal situations".  I don't think that the EFI code has any way
to know that one of these events has happened (in theory it could tell that it
was executing a machine check handler if it looked at MCIP bit in MCG_STATUS,
but I doubt that it does that ... INIT/NMI would seem even harder to detect).

>   My suggestion is that panicked cpu sends NMI to itself and then it calls kmsg_dump(KMSG_DUMP_PANIC) in NMI.

So I wonder whether there is any value in the cpu sending an NMI to itself to
meet the exact wording of this section in the spec.

I'll ask an Intel EFI guru whether I'm on target.

-Tony

Re: [Dle-develop] [PATCH 0/2] Tracepoint for tcp retransmission

From: Eric D. <eri...@gm...> - 2011-12-21 07:05:35

Le mardi 20 décembre 2011 à 18:40 -0500, Satoru Moriya a écrit :

> We use trace_kfree_skb() which 63e03724b51 uses to detect packet
> drop event. In addition to that, we would like to detect errors
> in TCP layer for better trouble shooting.
> 

If you feel we miss a counter in a particular spot, just add it, so that
even a newbie can report "netstat -s" values in a message for analysis.

Not re-sending a frame because of a temporary shortage of memory is not
an error per se, TCP will recover eventually, but keeping track of these
events might be useful (once we have not percpu counters for this, but
regular atomic counters)

Re: [Dle-develop] [PATCH 0/2] Tracepoint for tcp retransmission

From: Satoru M. <sat...@hd...> - 2011-12-20 23:40:50

On 12/17/2011 07:49 PM, Hagen Paul Pfeifer wrote:
>> Sometimes network packets are dropped for some reason. In enterprise 
>> systems which require strict RAS functionality, we must know the 
>> reason why it happened and explain it to our customers even if using 
>> TCP. When we investigate the incidents, at first we try to find out 
>> whether the problem is in the server(kernel, application) or else 
>> (router, hub etc). And next we try to find out which layer
>> (application/middleware/kernel(IP/TCP/UDP/..)etc.) the problem 
>> occurs.
> 
> For the first question tcpdump may the right tool.

We'd like to keep records on memory and save it to file when
we detect problems so that we can keep tracing overhead low.
We can also keep the amount of trace data lower than with
tcpdump because we only record data when retransmission occurs.

Capturing all the packets and saving them to file cannot satisfy
our requirements. I should have written them in the cover-letter.
I'll fix it.

Also, we can analyze incidents in combination with the data
from this tracepoint and from others easily.

> For the later systemtap can be used. I mean we now have the 
> possibility to instrument the kernel at runtime, without bloating the 
> source.

Yes, we can use systemtap to get the data we need. But systemtap
is not included in kernel and we must maintain systemtap scripts
to follow kernel modification.

By adding tracepoint here, we can get useful data via ftrace/perf
without any instruments which is not included in kernel.
Of course, systemtap can insert a probe with this tracepoint too.

> Anyway: is 63e03724b51 not suitable to gather the required information 
> easily?

We use trace_kfree_skb() which 63e03724b51 uses to detect packet
drop event. In addition to that, we would like to detect errors
in TCP layer for better trouble shooting.

Regards,
Satoru

Re: [Dle-develop] [PATCH 0/2] Tracepoint for tcp retransmission

From: Satoru M. <sat...@hd...> - 2011-12-20 18:13:20

On 12/16/2011 07:17 PM, Stephen Hemminger wrote:
> 
>> Sometimes network packets are dropped for some reason. In enterprise 
>> systems which require strict RAS functionality, we must know the 
>> reason why it happened and explain it to our customers even if using 
>> TCP. When we investigate the incidents, at first we try to find out 
>> whether the problem is in the server(kernel, application) or else 
>> (router, hub etc). And next we try to find out which layer
>> (application/middleware/kernel(IP/TCP/UDP/..)etc.) the problem 
>> occurs.
> 
> I feel sorry for you, your users don't understand TCP. TCP 
> intentionally induces loss to measure capacity. This is one of the 
> fundamental principles of loss based congestion control.

Maybe my explanation was not enough, I think...

Yes, as you said above, TCP induces loss in principle. Actually,
customers doesn't think that is a problem. But, at the same time
packet drop occurs everywhere in network due to BUG, wrong
configuration, broken hardware and/or etc. and sometimes it
causes serious problems to customers' system.

We provide Linux support service in our business and when a
serious problem occurs, we collect and analyze logs and find
where the problem is.

With this tracepoint, we're able to know whether the problem
is in OS or somewhere else. (Negative return value means packet
drop happened inside OS.)
That's a great help for us narrow down the root cause of
the problem.

I'll rewrite the cover letter when I post v2.

Regards,
Satoru

Re: [Dle-develop] [PATCH 1/2] tcp: refactor tcp_retransmit_skb() for a single return point

From: Satoru M. <sat...@hd...> - 2011-12-20 16:56:12

On 12/19/2011 07:10 AM, David Laight wrote:
>  
>> This is just a cleanup patch for making easy to hook return value 
>> with a tracepoint.
> 
> Another way to temporarily add such a trace, is to temporarily
> add a wrapper function.

I'm afraid that I don't know how to add a wrapper function temporarily.
Could you please tell me how to do that? Jprobe?

Regards,
Satoru

Re: [Dle-develop] [RFC][PATCH v3 2/3] Skip spin_locks in panic case and add WARN_ON()

From: Seiji A. <sei...@hd...> - 2011-12-19 23:37:51

Hi,

Thank you for coordinating discussion.

>Hitachi's whole goal behind
>kmsg_dump is to record every possible reason for rebooting the machine for
>inspection on reboot, I think if I understood Seiji privately.

Exactly.

>Now we have a scenario (though rare), where kmsg_dump can fail to save a
>dump because someone already has the lock.
>Seiji, has does this failure impact kmsg_dump's goal?

When we use EFI-enabled system, we can avoid this failure because
,as you can see below(spec of EFI 2.3), kernel in NMI can call set_variable 
runtime service even if previous call is busy.

<snip>
  7.1 Runtime Services Rules and Restrictions
  7.1.1 Exception for Machine Check, INIT, and NMI.
   .
   .
   .
  If the OS determines that the Machine Check is non-recoverable, the OS level handler may 
  ignore the normal restrictions and may invoke the runtime services described in Table 31 even 
  in the case where a previous call was busy. The system firmware will honor the new runtime 
  service call(s) and the operation of the previous interrupted call is not guaranteed. 
  Any interrupted runtime functions will not be restarted.
  The INIT and NMI events follow the same restrictions.
<snip>

So, next steps we should take are as follows.

 - Fix efi_pstore so "efivars->lock" in efi_pstore_write() skips in panic path.

 - Fix smp_send_stop() (or introduce new function call) 
   so that kmsg_dump(KMSG_DUMP_PANIC) is always called in NMI.

   Currently, panicked cpu sends NMI to all cpus other than itself in smp_send_stop() (with Don's patch).
   My suggestion is that panicked cpu sends NMI to itself and then it calls kmsg_dump(KMSG_DUMP_PANIC) in NMI.

Seiji

Re: [Dle-develop] [RFC][PATCH v3 2/3] Skip spin_locks in panic case and add WARN_ON()

From: Don Z. <dz...@re...> - 2011-12-19 21:01:15

On Mon, Dec 12, 2011 at 10:48:52AM -0800, Luck, Tony wrote:
> > if (reason == KMSG_DUMP_PANIC) {
> > 	if(is_spin_locked(&psinfo->buf_lock))
> > 		pr_err("lock is taken.\n");
> > else {
> > 	spin_lock_irqsave(&psinfo->buf_lock, flags);
> > }
> >
> > However, this won't work for this reason.
> >  - printk() must not be called in serialized path because deadlock of logbuf_lock may cause.
> 
> There is also the issue that kmsg has already computed the addresses
> of useful pieces in __log_buf at this point - so any printk() we
> add here is not going to be saved into pstore.  So a user relying
> on pstore to see what happened, won't see any messages we add here.

Ah.  Good point.

Now we have a scenario (though rare), where kmsg_dump can fail to save a
dump because some one already has the lock.  Hitachi's whole goal behind
kmsg_dump is to record every possible reason for rebooting the machine for
inspection on reboot, I think if I understood Seiji privately.

Seiji, has does this failure impact kmsg_dump's goal?

Cheers,
Don

Re: [Dle-develop] [PATCH 1/2] tcp: refactor tcp_retransmit_skb() for a single return point

From: David L. <Dav...@AC...> - 2011-12-19 12:11:50

 
> This is just a cleanup patch for making easy to hook return value
> with a tracepoint.

Another way to temporarily add such a trace, is to temporarily
add a wrapper function.

	David

Re: [Dle-develop] [RFC][PATCH]ACPI, APEI, ERST, back end driver for NVRAM

From: Huang Y. <yin...@in...> - 2011-12-19 01:24:38

On Fri, 2011-12-16 at 00:03 +0800, Seiji Aguchi wrote:
> Hi,
> 
> >
> >No.  I have different understanding.
> >
> >Because error log address range resides in NVRAM, the contents will be
> >reserved even after reboot.  So we do not need read/clear operations at
> >all, and should place all records in error log address range.
> 
> I have a quick question about your comment above.
> Do you know whether this optimization of read/clear operations works 
> on machines which APEI is enabled by WHEA _OSC call?

I have no machine with

!!(erst_erange.attr & ERST_RANGE_NVRAM) == 1

Do you have?

Best Regards,
Huang Ying

454 messages has been excluded from this view by a project administrator.

Flat | Threaded

<< < 1 .. 39 40 41 42 43 .. 99 > >> (Page 41 of 99)