From: H. P. A. <hp...@zy...> - 2011-05-10 20:45:09
|
On 05/10/2011 12:18 PM, Seiji Aguchi wrote:
> Hi,
>
> Thank you for your comments.
>
>> Btw, is this case of the possibility for APEI of corrupting NVRAM
>> storage something you actually experienced on a real machine or simply a
>> conclusion from looking at APEI code doing ioremap()?
>
> I simply concluded from looking at ioremap() of APEI code.
>
The thing is that you're effectively betting that the EFI method will be less broken than our kernels. That is not at all given...
-hpa |
|
From: Seiji A. <sei...@hd...> - 2011-05-10 19:42:40
|
Hi,
Thank you for your comments.
>Btw, is this case of the possibility for APEI of corrupting NVRAM
>storage something you actually experienced on a real machine or simply a
>conclusion from looking at APEI code doing ioremap()?
I simply concluded from looking at ioremap() of APEI code.
Seiji |
|
From: Borislav P. <bp...@am...> - 2011-05-10 17:19:54
|
On Tue, May 10, 2011 at 09:11:37AM -0700, Greg KH wrote:
> On Tue, May 10, 2011 at 11:00:44AM -0400, Seiji Aguchi wrote:
> > Description of boot paremeters is following.
> >
> > - efi_pstore_enable
> > enable EFI support of pstore.
> >
> > - efi_pstore_len
> > Sets the buffer size of EFI variable space used by pstore.
>
> Please don't add new boot parameters if at all possible. Distros will
> not know to enable them, and users don't know how to either.
>
> Use sane defaults, and provide ways to override them if needed, but
> don't rely on them for new functionality if at all possible.
>
> Why would this option ever _not_ be something that should be enabled?
Right, so if I understand this correctly, we want actually to _switch_
to the EFI method if it is safer than APEI. So it should work like this:
if the pstore detects that the system has EFI, it should switch to use
it as a method for writing persistent records to NVRAM.
Btw, is this case of the possibility for APEI of corrupting NVRAM
storage something you actually experienced on a real machine or simply a
conclusion from looking at APEI code doing ioremap()?
Thanks.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632 |
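The "detect EFI and switch to it by default, but allow an override" behaviour discussed in this message can be sketched in plain userspace C. This is only an illustration under stated assumptions: the enum names, the override values and choose_backend() are hypothetical and do not come from the pstore code or from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical backend identifiers; not names from the pstore code. */
enum pstore_backend {
	BACKEND_NONE,
	BACKEND_APEI,
	BACKEND_EFI,
};

/* Hypothetical override values, e.g. what a module parameter could carry. */
enum backend_mode {
	MODE_AUTO,		/* default: let the kernel decide */
	MODE_FORCE_APEI,
	MODE_DISABLE,
};

/*
 * Pick a backend from what the platform offers. With MODE_AUTO, EFI wins
 * whenever it is present, so no boot parameter is needed to enable it.
 */
enum pstore_backend choose_backend(bool has_efi, bool has_apei,
				   enum backend_mode mode)
{
	assert(mode <= MODE_DISABLE);

	if (mode == MODE_DISABLE)
		return BACKEND_NONE;
	if (mode == MODE_FORCE_APEI)
		return has_apei ? BACKEND_APEI : BACKEND_NONE;
	if (has_efi)
		return BACKEND_EFI;
	if (has_apei)
		return BACKEND_APEI;
	return BACKEND_NONE;
}
```

With a default of MODE_AUTO, distributions get the EFI backend without touching the command line, and the override only exists for the cases where the default turns out to be wrong.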
|
From: Greg KH <gr...@kr...> - 2011-05-10 16:25:45
|
On Tue, May 10, 2011 at 11:00:44AM -0400, Seiji Aguchi wrote:
> Description of boot paremeters is following.
>
> - efi_pstore_enable
> enable EFI support of pstore.
>
> - efi_pstore_len
> Sets the buffer size of EFI variable space used by pstore.
Please don't add new boot parameters if at all possible. Distros will
not know to enable them, and users don't know how to either.
Use sane defaults, and provide ways to override them if needed, but
don't rely on them for new functionality if at all possible.
Why would this option ever _not_ be something that should be enabled?
thanks,
greg k-h |
|
From: Seiji A. <sei...@hd...> - 2011-05-10 15:38:06
|
Hi,
This prototype patch enables EFI support of pstore.
The pstore callback functions (writer/reader/eraser) are not implemented yet
because more testing is needed.
I would appreciate it if you could review how this patch uses pstore.
[Advantage of EFI]
Pstore has an APEI method for saving kernel messages to persistent storage such as NVRAM.
In addition to the APEI method, I suggest a UEFI method because it has an advantage in
protecting the data in NVRAM.
- APEI
The APEI driver maps NVRAM into the kernel's virtual address space to access the data in
NVRAM. So, there is a possibility that the kernel corrupts the data in NVRAM due to its
own bugs.
- EFI
When using EFI to access the data in NVRAM, we don't need to map NVRAM into the virtual
address space because EFI allows access to NVRAM through EFI runtime services only.
So, compared to APEI, there is a smaller chance of corrupting the data in NVRAM due to a kernel bug.
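The difference between the two access models can be illustrated with a toy userspace simulation. This is only a sketch under stated assumptions: the NVRAM layout, mapped_write() and service_write_log() are hypothetical, and real firmware is not guaranteed to validate requests the way this model does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NVRAM_SIZE 64
#define LOG_OFFSET 32	/* hypothetical layout: other data, then log area */
#define LOG_SIZE   32

/* Simulated NVRAM; in the mapped model the kernel holds this mapping. */
static unsigned char nvram[NVRAM_SIZE];

/* Direct-mapped model: a wrong offset or length silently hits other data. */
void mapped_write(size_t offset, const void *data, size_t len)
{
	assert(offset + len <= NVRAM_SIZE);	/* only guards the simulation */
	memcpy(nvram + offset, data, len);
}

/*
 * Service-mediated model: the caller never sees the mapping, and the
 * service side rejects requests that do not fit the log area.
 */
int service_write_log(const void *data, size_t len)
{
	if (data == NULL || len > LOG_SIZE)
		return -1;
	memcpy(nvram + LOG_OFFSET, data, len);
	return 0;
}

/* Read-back helper for checking the simulated contents. */
unsigned char nvram_peek(size_t offset)
{
	assert(offset < NVRAM_SIZE);
	return nvram[offset];
}
```

In the mapped model any stray store through the mapping lands in NVRAM; in the mediated model the only way in is a call that can be checked. Whether the checking side is itself correct is a separate question.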
[Patch Description]
This patch is updated from previous one.
- previous patch
[RFC][PATCH]kmsg_dumper for NVRAM
http://www.spinics.net/lists/linux-doc/msg02208.html
Changelog
- added pstore structure for using pstore filesystem.
- deleted implementation of set_variables services.
- changed kernel parameters name as follows.
- nvram_kmsg_dump_enable -> efi_pstore_enable
- nvram_kmsg_dump_len -> efi_pstore_len
- removed definition of efi_pstore_enable in case of CONFIG_EFI=y and CONFIG_X86=n
in accordance with Cong Wang's comments.
The boot parameters are described below.
- efi_pstore_enable
enable EFI support of pstore.
- efi_pstore_len
Sets the buffer size of EFI variable space used by pstore.
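How efi_pstore_len is interpreted (a size with an optional suffix, rounded up to a power of two, and only ever growing the default buffer) can be sketched in userspace C. The helpers below are stand-ins written for this illustration: parse_size() approximates the kernel's memparse(), round_up_pow2() approximates roundup_pow_of_two(), and efi_pstore_len_from_arg() is a hypothetical name.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Userspace stand-in for memparse(): parse a number with an optional
 * k/K, m/M or g/G suffix.
 */
unsigned long parse_size(const char *str)
{
	char *end;
	unsigned long size;

	assert(str != NULL);
	size = strtoul(str, &end, 0);
	switch (*end) {
	case 'g': case 'G':
		size <<= 10;
		/* fall through */
	case 'm': case 'M':
		size <<= 10;
		/* fall through */
	case 'k': case 'K':
		size <<= 10;
		break;
	default:
		break;
	}
	return size;
}

/* Userspace stand-in for roundup_pow_of_two(); size must be nonzero. */
unsigned long round_up_pow2(unsigned long size)
{
	unsigned long p = 1;

	assert(size != 0 && size <= (ULONG_MAX >> 1) + 1);
	while (p < size)
		p <<= 1;
	return p;
}

/* Hypothetical helper mirroring the logic of setup_efi_pstore_len(). */
unsigned long efi_pstore_len_from_arg(const char *arg, unsigned long dflt)
{
	unsigned long size = parse_size(arg);

	if (size)
		size = round_up_pow2(size);
	/* The patch only replaces the buffer when the request is larger. */
	return size > dflt ? size : dflt;
}
```

Under this reading, a request smaller than the built-in default leaves the default buffer in place, and a value that is not a power of two is rounded up rather than rejected.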
[TODO]
- Implement the pstore callback functions (writer/reader/eraser).
Signed-off-by: Seiji Aguchi <sei...@hd...>
---
Documentation/kernel-parameters.txt | 10 ++++
arch/x86/platform/efi/efi.c | 100 +++++++++++++++++++++++++++++++++++
include/linux/efi.h | 3 +
init/main.c | 5 ++-
4 files changed, 117 insertions(+), 1 deletions(-)
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index cc85a92..531597e 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -705,6 +705,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
edd= [EDD]
Format: {"off" | "on" | "skip[mbr]"}
+ efi_pstore_enable [X86]
+ Enable EFI support of pstore.
+
+ efi_pstore_len=n [X86]
+ Sets the buffer size of EFI variable space used
+ by pstore, in bytes.
+ Format: { n | nk | nM }
+ n must be a power of two. The default is the same
+ as default log_buf size set in the kernel config file.
+
eisa_irq_edge= [PARISC,HW]
See header of drivers/parisc/eisa.c.
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 0fe27d7..687ab19 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -37,6 +37,7 @@
#include <linux/io.h>
#include <linux/reboot.h>
#include <linux/bcd.h>
+#include <linux/pstore.h>
#include <asm/setup.h>
#include <asm/efi.h>
@@ -59,6 +60,25 @@ struct efi_memory_map memmap;
static struct efi efi_phys __initdata;
static efi_system_table_t efi_systab __initdata;
+int efi_pstore_enabled;
+#define __EFI_PSTORE_LEN (1 << CONFIG_LOG_BUF_SHIFT)
+static char __efi_pstore_buf[__EFI_PSTORE_LEN];
+static char *efi_pstore_buf = __efi_pstore_buf;
+static int efi_pstore_len = __EFI_PSTORE_LEN;
+
+static u64 efi_pstore_writer(enum pstore_type_id , size_t);
+static size_t efi_pstore_reader(u64 *, enum pstore_type_id *,
+ struct timespec *);
+static int efi_pstore_eraser(u64);
+
+static struct pstore_info efi_pstore_info = {
+ .owner = NULL,
+ .name = "efi_pstore",
+ .read = efi_pstore_reader,
+ .write = efi_pstore_writer,
+ .erase = efi_pstore_eraser,
+};
+
static int __init setup_noefi(char *arg)
{
efi_enabled = 0;
@@ -611,3 +631,83 @@ u64 efi_mem_attributes(unsigned long phys_addr)
}
return 0;
}
+
+static int __init setup_efi_pstore_enable(char *arg)
+{
+ efi_pstore_enabled = 1;
+ return 0;
+}
+__setup("efi_pstore_enable", setup_efi_pstore_enable);
+
+static int __init setup_efi_pstore_len(char *str)
+{
+ unsigned size;
+
+ if (!efi_enabled) {
+ printk(KERN_INFO "setup_efi_pstore_len: EFI is disabled.\n");
+ return 1;
+ }
+
+ size = memparse(str, &str);
+ if (size)
+ size = roundup_pow_of_two(size);
+ if (size > efi_pstore_len) {
+ char *new_efi_pstore_buf;
+
+ new_efi_pstore_buf = alloc_bootmem(size);
+ if (!new_efi_pstore_buf) {
+ printk(KERN_WARNING "efi_pstore_len: "
+ "allocation failed\n");
+ return 1;
+ }
+ efi_pstore_len = size;
+ efi_pstore_buf = new_efi_pstore_buf;
+ }
+ printk(KERN_NOTICE "efi_pstore_len: %d\n", efi_pstore_len);
+
+ return 0;
+
+}
+__setup("efi_pstore_len=", setup_efi_pstore_len);
+
+static u64 efi_pstore_writer(enum pstore_type_id type, size_t size)
+{
+
+ /* not implement */
+
+ return -EINVAL;
+}
+
+static size_t efi_pstore_reader(u64 *id, enum pstore_type_id *type,
+ struct timespec *time)
+{
+ /* not implement */
+ printk(KERN_INFO "efi_pstore not implement\n");
+ return -EINVAL;
+}
+
+static int efi_pstore_eraser(u64 record_id)
+{
+ /* not implement */
+
+ return -EINVAL;
+}
+
+void efi_pstore_init(void)
+{
+
+ int rc = 0;
+
+ efi_pstore_info.buf = efi_pstore_buf;
+ efi_pstore_info.bufsize = efi_pstore_len;
+ mutex_init(&efi_pstore_info.buf_mutex);
+
+ rc = pstore_register(&efi_pstore_info);
+ if (rc) {
+ printk(KERN_ERR "efi_pstore_init: fail %d\n", rc);
+ return;
+ }
+ printk(KERN_NOTICE "efi_pstore initialized\n");
+
+ return;
+}
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 33fa120..7a8f900 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -290,6 +290,7 @@ extern void efi_map_pal_code (void);
extern void efi_memmap_walk (efi_freemem_callback_t callback, void *arg);
extern void efi_gettimeofday (struct timespec *ts);
extern void efi_enter_virtual_mode (void); /* switch EFI to virtual mode, if possible */
+extern void efi_pstore_init(void);
extern u64 efi_get_iobase (void);
extern u32 efi_mem_type (unsigned long phys_addr);
extern u64 efi_mem_attributes (unsigned long phys_addr);
@@ -333,11 +334,13 @@ extern int __init efi_setup_pcdp_console(char *);
#ifdef CONFIG_EFI
# ifdef CONFIG_X86
extern int efi_enabled;
+ extern int efi_pstore_enabled;
# else
# define efi_enabled 1
# endif
#else
# define efi_enabled 0
+# define efi_pstore_enabled 0
#endif
/*
diff --git a/init/main.c b/init/main.c
index 4a9479e..eae313b 100644
--- a/init/main.c
+++ b/init/main.c
@@ -591,8 +591,11 @@ asmlinkage void __init start_kernel(void)
pidmap_init();
anon_vma_init();
#ifdef CONFIG_X86
- if (efi_enabled)
+ if (efi_enabled) {
efi_enter_virtual_mode();
+ if (efi_pstore_enabled)
+ efi_pstore_init();
+ }
#endif
thread_info_cache_init();
cred_init();
--
1.7.1
|
|
From: Nao N. <nao...@hi...> - 2011-02-25 15:31:07
|
This patch adds support for the SCSI unnamed module to the sd, sr and st drivers.
If scsi_mod.persistent_name=1, device names are assigned by udev.
If scsi_mod.persistent_name=0, device names are assigned in the order in which
logical units are recognized.
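The choice between the two naming schemes can be sketched in userspace C. This is an approximation for illustration only: format_ordered_name() mimics the sda ... sdz, sdaa ... sequence that sd_format_disk_name() produces, and pick_disk_name() is a hypothetical helper that, unlike the patch, simply falls back to the order-based name when no persistent name has been assigned.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN 32	/* same value as DISK_NAME_LEN in this patch series */

/*
 * Order-based name: index 0 -> "sda", 25 -> "sdz", 26 -> "sdaa", ...
 * A userspace approximation of what sd_format_disk_name() produces.
 */
int format_ordered_name(const char *prefix, int index, char *out, size_t len)
{
	char suffix[NAME_LEN];
	char *p = suffix + sizeof(suffix) - 1;

	assert(prefix != NULL && out != NULL && index >= 0);
	*p = '\0';
	do {
		if (p == suffix)
			return -1;
		*--p = (char)('a' + index % 26);
		index = index / 26 - 1;
	} while (index >= 0);

	if (strlen(prefix) + strlen(p) + 1 > len)
		return -1;
	snprintf(out, len, "%s%s", prefix, p);
	return 0;
}

/*
 * Hypothetical helper: use the udev-assigned name when persistent naming
 * is on, otherwise fall back to the recognition-order name.
 */
int pick_disk_name(int persistent, const char *assigned, int index,
		   char *out, size_t len)
{
	if (persistent && assigned != NULL && assigned[0] != '\0') {
		if (strlen(assigned) + 1 > len)
			return -1;
		snprintf(out, len, "%s", assigned);
		return 0;
	}
	return format_ordered_name("sd", index, out, len);
}
```

The point of the option is visible in the second function: with persistent naming the index, and therefore the recognition order, no longer influences the name.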
Signed-off-by: Nao Nishijima <nao...@hi...>
---
drivers/scsi/sd.c | 13 ++++++++++---
drivers/scsi/sr.c | 7 ++++++-
drivers/scsi/st.c | 6 +++++-
3 files changed, 21 insertions(+), 5 deletions(-)
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index e567302..7578a2d 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -65,6 +65,7 @@
#include "sd.h"
#include "scsi_logging.h"
+#include "scsi_unnamed.h"
MODULE_AUTHOR("Eric Youngdale");
MODULE_DESCRIPTION("SCSI disk (sd) driver");
@@ -2431,6 +2432,9 @@ static int sd_probe(struct device *dev)
if (sdp->type != TYPE_DISK && sdp->type != TYPE_MOD && sdp->type != TYPE_RBC)
goto out;
+ if (check_device_name_prefix("sd", dev))
+ return -EINVAL;
+
SCSI_LOG_HLQUEUE(3, sdev_printk(KERN_INFO, sdp,
"sd_attach\n"));
@@ -2461,9 +2465,12 @@ static int sd_probe(struct device *dev)
goto out_free_index;
}
- error = sd_format_disk_name("sd", index, gd->disk_name, DISK_NAME_LEN);
- if (error)
- goto out_free_index;
+ if (!copy_persistent_name(gd->disk_name, dev)) {
+ error = sd_format_disk_name("sd", index,
+ gd->disk_name, DISK_NAME_LEN);
+ if (error)
+ goto out_free_index;
+ }
sdkp->device = sdp;
sdkp->driver = &sd_template;
diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index aefadc6..7d10361 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -58,6 +58,7 @@
#include "scsi_logging.h"
#include "sr.h"
+#include "scsi_unnamed.h"
MODULE_DESCRIPTION("SCSI cdrom (sr) driver");
@@ -611,6 +612,9 @@ static int sr_probe(struct device *dev)
if (sdev->type != TYPE_ROM && sdev->type != TYPE_WORM)
goto fail;
+ if (check_device_name_prefix("sr", dev))
+ return -EINVAL;
+
error = -ENOMEM;
cd = kzalloc(sizeof(*cd), GFP_KERNEL);
if (!cd)
@@ -634,7 +638,8 @@ static int sr_probe(struct device *dev)
disk->major = SCSI_CDROM_MAJOR;
disk->first_minor = minor;
- sprintf(disk->disk_name, "sr%d", minor);
+ if (!copy_persistent_name(disk->disk_name, dev))
+ sprintf(disk->disk_name, "sr%d", minor);
disk->fops = &sr_bdops;
disk->flags = GENHD_FL_CD;
disk->events = DISK_EVENT_MEDIA_CHANGE | DISK_EVENT_EJECT_REQUEST;
diff --git a/drivers/scsi/st.c b/drivers/scsi/st.c
index 1871b8a..fb6d5bd 100644
--- a/drivers/scsi/st.c
+++ b/drivers/scsi/st.c
@@ -74,6 +74,7 @@ static const char *verstr = "20101219";
#include "st_options.h"
#include "st.h"
+#include "scsi_unnamed.h"
static DEFINE_MUTEX(st_mutex);
static int buffer_kbs;
@@ -3983,6 +3984,8 @@ static int st_probe(struct device *dev)
if (SDp->type != TYPE_TAPE)
return -ENODEV;
+ if (check_device_name_prefix("st", dev))
+ return -EINVAL;
if ((stp = st_incompatible(SDp))) {
sdev_printk(KERN_INFO, SDp, "Found incompatible tape\n");
printk(KERN_INFO "st: The suggested driver is %s.\n", stp);
@@ -4051,7 +4054,8 @@ static int st_probe(struct device *dev)
}
kref_init(&tpnt->kref);
tpnt->disk = disk;
- sprintf(disk->disk_name, "st%d", i);
+ if (!copy_persistent_name(disk->disk_name, dev))
+ sprintf(disk->disk_name, "st%d", i);
disk->private_data = &tpnt->driver;
disk->queue = SDp->request_queue;
tpnt->driver = &st_template;
|
|
From: Nao N. <nao...@hi...> - 2011-02-25 15:31:05
|
Add a SCSI option for persistent name support.
For the persistent name idea, please refer to the mail below.
http://linux.derkeiler.com/Mailing-Lists/Kernel/2010-10/msg03007.html
Device names (e.g. sda) are assigned in the order in which logical units are recognized.
The new option can assign device names regardless of that order.
If using this option, add the following kernel parameter.
scsi_mod.persistent_name=1
Also, add the following udev rules.
SUBSYSTEM=="scsi_unnamed", ATTR{byid}=="xxx", PROGRAM="echo -n sda > /sys/%p/device_name"
Signed-off-by: Nao Nishijima <nao...@hi...>
---
drivers/scsi/Makefile | 1
drivers/scsi/scsi_sysfs.c | 6 +
drivers/scsi/scsi_unnamed.c | 188 +++++++++++++++++++++++++++++++++++++++++++
drivers/scsi/scsi_unnamed.h | 4 +
4 files changed, 199 insertions(+), 0 deletions(-)
create mode 100644 drivers/scsi/scsi_unnamed.c
create mode 100644 drivers/scsi/scsi_unnamed.h
diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
index 2e9a87e..939fd52 100644
--- a/drivers/scsi/Makefile
+++ b/drivers/scsi/Makefile
@@ -166,6 +166,7 @@ scsi_mod-$(CONFIG_SYSCTL) += scsi_sysctl.o
scsi_mod-$(CONFIG_SCSI_PROC_FS) += scsi_proc.o
scsi_mod-y += scsi_trace.o
scsi_mod-$(CONFIG_PM_OPS) += scsi_pm.o
+scsi_mod-y += scsi_unnamed.o
scsi_tgt-y += scsi_tgt_lib.o scsi_tgt_if.o
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 490ce21..30eb955 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -22,6 +22,7 @@
#include "scsi_priv.h"
#include "scsi_logging.h"
+#include "scsi_unnamed.h"
static struct device_type scsi_dev_type;
@@ -393,6 +394,10 @@ int scsi_sysfs_register(void)
{
int error;
+ error = scsi_unnamed_init(&scsi_bus_type);
+ if (error < 0)
+ return error;
+
error = bus_register(&scsi_bus_type);
if (!error) {
error = class_register(&sdev_class);
@@ -407,6 +412,7 @@ void scsi_sysfs_unregister(void)
{
class_unregister(&sdev_class);
bus_unregister(&scsi_bus_type);
+ scsi_unnamed_exit();
}
/*
diff --git a/drivers/scsi/scsi_unnamed.c b/drivers/scsi/scsi_unnamed.c
new file mode 100644
index 0000000..330a52f
--- /dev/null
+++ b/drivers/scsi/scsi_unnamed.c
@@ -0,0 +1,188 @@
+/*
+ * SCSI unnamed module
+ */
+
+#include <linux/module.h>
+#include <linux/err.h>
+#include <linux/device.h>
+#include <linux/slab.h>
+#include <linux/kobject.h>
+#include <linux/kdev_t.h>
+#include <linux/sysdev.h>
+#include <linux/list.h>
+
+#include <scsi/scsi_driver.h>
+#include <scsi/scsi_device.h>
+
+#define MAX_BUFFER_LEN 256
+#define DISK_NAME_LEN 32
+
+static LIST_HEAD(su_list);
+static int persistent_name;
+
+MODULE_PARM_DESC(persistent_name, "SCSI unnamed device support");
+module_param(persistent_name, bool, 0644);
+
+static struct class su_sysfs_class = {
+ .name = "scsi_unnamed",
+};
+
+struct scsi_unnamed {
+ struct list_head list;
+ struct device dev;
+ char byid[MAX_BUFFER_LEN];
+ char device_name[DISK_NAME_LEN];
+};
+
+#define to_scsi_unnamed(d) \
+ container_of(d, struct scsi_unnamed, dev)
+
+static int get_byid(struct scsi_device *sdev, char *byid)
+{
+ char *buf;
+ unsigned int len;
+
+ buf = kmalloc(MAX_BUFFER_LEN, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ if (scsi_get_vpd_page(sdev, 0x83, buf, MAX_BUFFER_LEN)) {
+ kfree(buf);
+ return -EINVAL;
+ }
+
+ /* need some check. TBD */
+ len = ((unsigned char *)buf)[7];
+ memcpy(byid, strim(buf + 8), len);
+ kfree(buf);
+
+ return 0;
+}
+
+static int probe_again(struct device_driver *drv, void *data)
+{
+ struct device *dev = data;
+
+ if (drv->probe)
+ return drv->probe(dev);
+
+ return -EINVAL;
+}
+
+static ssize_t byid_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ return sprintf(buf, "%s\n",
+ container_of(dev, struct scsi_unnamed, dev)->byid);
+}
+
+static ssize_t device_name_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ return sprintf(buf, "%s\n", to_scsi_unnamed(dev)->device_name);
+}
+
+static ssize_t device_name_store(struct device *dev,
+ struct device_attribute *attr, char *buf, size_t count)
+{
+ struct scsi_unnamed *su = to_scsi_unnamed(dev);
+ struct scsi_unnamed *tmp;
+ int ret = 0;
+
+ BUG_ON(su == NULL);
+
+ if (su->device_name[0] != '\0' || count >= DISK_NAME_LEN)
+ return -EINVAL;
+
+ list_for_each_entry(tmp, &su_list, list) {
+ if (strcmp(tmp->device_name, buf) == 0) {
+ printk(KERN_NOTICE "duplicate name!\n");
+ return -EINVAL;
+ }
+ }
+
+ strncpy(su->device_name, buf, DISK_NAME_LEN);
+
+ dev->parent->init_name = su->device_name;
+
+ ret = bus_for_each_drv(dev->parent->bus, NULL, dev->parent,
+ probe_again);
+
+ return count;
+}
+
+static DEVICE_ATTR(device_name, S_IRUGO|S_IWUSR, device_name_show,
+ device_name_store);
+static DEVICE_ATTR(byid, S_IRUGO|S_IWUSR, byid_show, NULL);
+
+int scsi_unnamed_probe(struct device *dev)
+{
+ struct scsi_unnamed *su;
+ struct scsi_device *sdev = to_scsi_device(dev);
+ int ret = -EINVAL;
+ static int i;
+
+ su = kzalloc(sizeof(*su), GFP_KERNEL);
+ if (!su)
+ return -ENOMEM;
+
+ list_add(&su->list, &su_list);
+ device_initialize(&su->dev);
+ su->dev.parent = dev;
+ su->dev.class = &su_sysfs_class;
+ dev_set_name(&su->dev, "su%d", i++);
+
+ if (sdev->type == TYPE_DISK)
+ ret = get_byid(sdev, su->byid);
+ if (ret < 0)
+ strncpy(su->byid, dev_name(dev), MAX_BUFFER_LEN);
+
+ ret = device_add(&su->dev);
+ if (ret)
+ return ret;
+
+ ret = device_create_file(&su->dev, &dev_attr_device_name);
+ if (ret)
+ return -ENODEV;
+
+ ret = device_create_file(&su->dev, &dev_attr_byid);
+ if (ret)
+ return -ENODEV;
+
+ return 0;
+}
+
+int check_device_name_prefix(const char *prefix, struct device *dev)
+{
+ if (persistent_name && strncmp(prefix, dev_name(dev), 2))
+ return -EINVAL;
+ return 0;
+}
+EXPORT_SYMBOL(check_device_name_prefix);
+
+int copy_persistent_name(char *disk_name, struct device *dev)
+{
+ if (persistent_name) {
+ sprintf(disk_name, "%s", dev->init_name);
+ return 1;
+ }
+ return 0;
+}
+EXPORT_SYMBOL(copy_persistent_name);
+
+int scsi_unnamed_init(struct bus_type *bus)
+{
+ if (persistent_name) {
+ bus->probe = scsi_unnamed_probe;
+ return class_register(&su_sysfs_class);
+ }
+ return 0;
+}
+EXPORT_SYMBOL(scsi_unnamed_init);
+
+void scsi_unnamed_exit(void)
+{
+ if (persistent_name)
+ class_unregister(&su_sysfs_class);
+}
+EXPORT_SYMBOL(scsi_unnamed_exit);
diff --git a/drivers/scsi/scsi_unnamed.h b/drivers/scsi/scsi_unnamed.h
new file mode 100644
index 0000000..3cf858c
--- /dev/null
+++ b/drivers/scsi/scsi_unnamed.h
@@ -0,0 +1,4 @@
+extern int check_device_name_prefix(char *, struct device *);
+extern int copy_persistent_name(char *, struct device *);
+extern int scsi_unnamed_init(struct bus_type *);
+extern void scsi_unnamed_exit(void); |
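get_byid() in this patch reads the designator length from byte 7 of VPD page 0x83 and copies from byte 8, with the checks left as "TBD". A bounds-checked version of that extraction could look roughly like the userspace sketch below. It assumes the usual page layout (4-byte page header followed by designation descriptors with a 4-byte header each) and, like the patch, treats the first designator as text; first_designator() and the constant names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VPD_HDR_LEN   4	/* page header: type, page code, 2-byte length */
#define DESC_HDR_LEN  4	/* designation descriptor header */

/*
 * Copy the first designator of a Device Identification VPD page (0x83)
 * into out as a NUL-terminated string. Returns the designator length,
 * or -1 if the page is malformed or does not fit.
 */
int first_designator(const unsigned char *page, size_t page_buf_len,
		     char *out, size_t out_len)
{
	size_t page_len, desig_len;

	assert(page != NULL && out != NULL);

	if (page_buf_len < VPD_HDR_LEN + DESC_HDR_LEN || page[1] != 0x83)
		return -1;

	/* Bytes 2..3 hold the big-endian length of what follows the header. */
	page_len = ((size_t)page[2] << 8) | page[3];
	if (page_len < DESC_HDR_LEN || VPD_HDR_LEN + page_len > page_buf_len)
		return -1;

	/* Byte 3 of the descriptor header is the designator length. */
	desig_len = page[VPD_HDR_LEN + 3];
	if (DESC_HDR_LEN + desig_len > page_len || desig_len + 1 > out_len)
		return -1;

	memcpy(out, page + VPD_HDR_LEN + DESC_HDR_LEN, desig_len);
	out[desig_len] = '\0';
	return (int)desig_len;
}
```

The checks cover the three places where the patch trusts the device: the page code, the overall page length, and the designator length relative to both the page and the destination buffer.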
|
From: Nao N. <nao...@hi...> - 2011-02-25 15:31:05
|
This is just concept code and must be rewritten.
Signed-off-by: Nao Nishijima <nao...@hi...>
---
drivers/scsi/scsi_unnamed.c | 21 ++++++++++++++++++++-
1 files changed, 20 insertions(+), 1 deletions(-)
diff --git a/drivers/scsi/scsi_unnamed.c b/drivers/scsi/scsi_unnamed.c
index 330a52f..6857ca0 100644
--- a/drivers/scsi/scsi_unnamed.c
+++ b/drivers/scsi/scsi_unnamed.c
@@ -10,9 +10,12 @@
#include <linux/kdev_t.h>
#include <linux/sysdev.h>
#include <linux/list.h>
+#include <linux/usb.h>
#include <scsi/scsi_driver.h>
#include <scsi/scsi_device.h>
+#include <scsi/scsi_host.h>
+#include <scsi/scsi.h>
#define MAX_BUFFER_LEN 256
#define DISK_NAME_LEN 32
@@ -37,6 +40,18 @@ struct scsi_unnamed {
#define to_scsi_unnamed(d) \
container_of(d, struct scsi_unnamed, dev)
+static int get_usb_serial(struct scsi_device *sdev, char *byid)
+{
+ struct Scsi_Host *shost = sdev->host;
+ struct usb_interface *intf =
+ to_usb_interface(shost->shost_gendev.parent);
+ struct usb_device *udev = interface_to_usbdev(intf);
+ strncpy(byid, udev->serial, MAX_BUFFER_LEN);
+ if (byid[0] == '\0')
+ return -EFAULT;
+ return 0;
+}
+
static int get_byid(struct scsi_device *sdev, char *byid)
{
char *buf;
@@ -132,8 +147,12 @@ int scsi_unnamed_probe(struct device *dev)
su->dev.class = &su_sysfs_class;
dev_set_name(&su->dev, "su%d", i++);
- if (sdev->type == TYPE_DISK)
+ if (sdev->type == TYPE_DISK) {
ret = get_byid(sdev, su->byid);
+ if (ret < 0)
+ ret = get_usb_serial(sdev, su->byid);
+ }
+
if (ret < 0)
strncpy(su->byid, dev_name(dev), MAX_BUFFER_LEN);
|
|
From: Seiji A. <sei...@hd...> - 2011-02-23 17:47:36
|
Sorry.
I am resending this patch because some mail addresses, lin...@vg... and
kos...@jp..., were missing.
Seiji
Hi,
This patch tries to execute kmsg_dump() reliably in the kdump path.
[Need for kmsg_dump() in the kdump path]
From our support service experience, we always need to find the root cause of an OS panic.
Enterprise customers never forgive us if kdump fails and we can't find the root cause of the panic for lack of material to investigate.
On the other hand, kdump can be unreliable for the following reason.
- Before booting the 2nd kernel, kdump checks its sha256 checksum, and if it fails to
verify the correctness, kdump doesn't start the 2nd kernel. In other words, we may
lose the material for finding the root cause of a kernel panic when memory corruption happens.
To avoid losing this material, we want two mechanisms in place.
- One is the lightweight kmsg_dump, which tries to save kernel buffers in NVRAM/flash memory.
- The other is the heavyweight kdump, which tries to save the entire/filtered kernel core.
[Discussion about kmsg_dump() in the kdump path]
Eric (and others) think that kmsg_dump() should be removed from the kdump path because the
kmsg_dump() code is unreliable and may cause kdump to fail.
A patch doing this has already been proposed.
https://lkml.org/lkml/2011/2/1/33
On the other hand, Hitachi would like to store kernel buffers to NVRAM in the kdump path because we may not have any information after the crash if kdump fails.
To execute kmsg_dump() reliably and avoid losing material, Vivek suggested an idea.
This is an overview of his idea.
- Share the common part of kdump and panic (stopping other cpus by NMI/IPI).
- Save the kernel buffer in NVRAM/flash memory after stopping other cpus.
- Introduce a new mutex for sending IPI/NMI reliably when two cpus panic
at the same time.
A detailed explanation is here.
https://lkml.org/lkml/2011/2/8/223
[Patch Description]
This patch is based on Vivek's idea above.
<changelog>
- Merge machine_crash_shutdown() and smp_send_stop() into stop_cpus_on_panic() to share
the common part of kdump and panic (stopping other cpus by NMI/IPI).
- Move kmsg_dump(KMSG_DUMP_PANIC) to just after stop_cpus_on_panic() to save the kernel buffer
in NVRAM/flash memory reliably.
- Introduce panic_mutex for sending IPI/NMI reliably when two cpus panic at the same time.
<flowchart>
panic happens
- printing panic strings and stacks. (dumpstack, etc)
- Stop other cpus. (stop_cpus_on_panic())
- Dump kernel buffer to NVRAM or flash memory.(kmsg_dump(KMSG_DUMP_PANIC))
- When kdump is enabled, 2nd kernel boots.(crash_kexec())
- When kdump is disabled, panic_notifier() is called.
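The ordering in this flowchart can be written down as a small userspace sketch that only records which step runs when. Nothing here is kernel code: panic_flow() and trace_step() are hypothetical names, and each step is reduced to a label so the order can be checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Append a step name to a comma-separated trace; hypothetical helper. */
static void trace_step(char *trace, size_t len, const char *step)
{
	assert(strlen(trace) + strlen(step) + 2 <= len);
	if (trace[0] != '\0')
		strcat(trace, ",");
	strcat(trace, step);
}

/* Record the order of the panic path described in the flowchart. */
void panic_flow(bool kdump_enabled, char *trace, size_t len)
{
	assert(trace != NULL && len > 0);
	trace[0] = '\0';

	trace_step(trace, len, "print");	/* panic string and stack */
	trace_step(trace, len, "stop_cpus");	/* stop_cpus_on_panic() */
	trace_step(trace, len, "kmsg_dump");	/* kmsg_dump(KMSG_DUMP_PANIC) */
	if (kdump_enabled)
		trace_step(trace, len, "kexec");	/* crash_kexec(): 2nd kernel */
	else
		trace_step(trace, len, "notify");	/* panic notifier chain */
}
```

The property the patch is after is that "kmsg_dump" always comes after "stop_cpus" and before "kexec", regardless of whether kdump is enabled.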
<new function call>
stop_cpus_on_panic()
- When kdump is enabled, crash_setup_regs(), crash_save_vmcoreinfo()
and machine_crash_shutdown() are called.
- When kdump is disabled, smp_send_stop() is called.
<modified function call>
crash_kexec()
- When kdump is enabled, machine_kexec() is called.
- When kdump is disabled, it returns without doing anything.
<new mutex_lock>
panic_mutex
It is introduced for sending IPI/NMI reliably when two cpus panic at the same time.
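The role of panic_mutex (exactly one panicking cpu drives the shutdown, the others wait to be stopped) can be sketched with a C11 atomic flag standing in for mutex_trylock(). This is a userspace illustration, not the patch's implementation: claim_panic_path() and panic_role() are hypothetical, and the waiting side is reduced to a return value instead of a spin loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for panic_mutex; a flag is enough to show the trylock idea. */
static atomic_flag panic_claimed = ATOMIC_FLAG_INIT;

/*
 * Returns true for the first caller only. That cpu goes on to stop the
 * others, dump the log and kexec; every later caller would spin and wait
 * for the NMI/IPI (the sketch just returns false).
 */
bool claim_panic_path(void)
{
	return !atomic_flag_test_and_set(&panic_claimed);
}

/* Hypothetical per-cpu entry: 1 = drives the shutdown, 0 = waits. */
int panic_role(int cpu)
{
	assert(cpu >= 0);
	return claim_panic_path() ? 1 : 0;
}
```

Because the test-and-set is atomic, two cpus entering at the same instant still get different answers, which is what keeps both from sending stop IPIs/NMIs to each other.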
[Build status]
This patch is built against 2.6.38-rc6.
[Test Status]
<simple regression test>
- Case 1
Condition
kernel panics when kdump is enabled.
Result
- kmsg_dump() is called.
- 2nd kernel boots and dumps memory successfully.
- Case 2
Condition
kernel panics when kdump is disabled.
Result
panic notifier is called successfully.
<checking timing issue of kexec_mutex and value of kexec_crash_image>
- Case 3
Condition
cpuX panics while cpuX is getting kexec_mutex, 2nd kernel is not loaded.
Result
kmsg_dump() and panic notifier are called successfully.
- Case 4
Condition
cpuX panics while cpuX is getting kexec_mutex, 2nd kernel is loaded.
Result
kmsg_dump() and panic notifier are called successfully.
- Case 5
Condition
cpuY panics while cpuX is getting kexec_mutex, 2nd kernel is not loaded.
Result
kmsg_dump() and panic notifier are called successfully.
- Case 6
Condition
cpuY panics while cpuX is getting kexec_mutex, 2nd kernel is loaded.
Result
kmsg_dump() and panic notifier are called successfully.
<checking timing issue of panic_mutex>
- Case 7
Condition
cpuX and cpuY panic at the same time when kdump is enabled.
Result
- kmsg_dump() is called.
- 2nd kernel boots and memory dump succeeds.
- Case 8
Condition
cpuX and cpuY panic at the same time when kdump is disabled.
Result
kmsg_dump() and panic notifier are called successfully.
Any comments and suggestions are welcome.
Signed-off-by: Seiji Aguchi <sei...@hd...>
---
include/linux/kexec.h | 2 ++
include/linux/smp.h | 12 ++++++++++++
kernel/kexec.c | 30 +++++++++++++++++++-----------
kernel/panic.c | 23 +++++++++++++----------
4 files changed, 46 insertions(+), 21 deletions(-)
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 03e8e8d..8860ee9 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -125,6 +125,8 @@ extern asmlinkage long compat_sys_kexec_load(unsigned long entry,
#endif
extern struct page *kimage_alloc_control_pages(struct kimage *image,
unsigned int order);
+extern void crash_kexec_prepare(struct pt_regs *);
+extern void stop_cpus_on_panic(void);
extern void crash_kexec(struct pt_regs *);
int kexec_should_crash(struct task_struct *);
void crash_save_cpu(struct pt_regs *regs, int cpu);
diff --git a/include/linux/smp.h b/include/linux/smp.h
index 6dc95ca..164d9a9 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -46,6 +46,12 @@ int smp_call_function_single(int cpuid, smp_call_func_t func, void *info,
*/
extern void smp_send_stop(void);
+#ifdef CONFIG_KEXEC
+extern void stop_cpus_on_panic(void);
+#else
+static inline void stop_cpus_on_panic(void){ smp_send_stop(); }
+#endif
+
/*
* sends a 'reschedule' event to another CPU:
*/
@@ -119,6 +125,12 @@ extern unsigned int setup_max_cpus;
static inline void smp_send_stop(void) { }
+#ifdef CONFIG_KEXEC
+extern void stop_cpus_on_panic(void);
+#else
+static inline void stop_cpus_on_panic(void) { }
+#endif
+
/*
* These macros fold the SMP functionality into a single CPU system
*/
diff --git a/kernel/kexec.c b/kernel/kexec.c
index ec19b92..f68ea03 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -49,6 +49,8 @@ u32 vmcoreinfo_note[VMCOREINFO_NOTE_SIZE/4];
size_t vmcoreinfo_size;
size_t vmcoreinfo_max_size = sizeof(vmcoreinfo_data);
+static int kexec_mutex_is_locked;
+
/* Location of the reserved area for the crash kernel */
struct resource crashk_res = {
.name = "Crash kernel",
@@ -1064,6 +1066,21 @@ asmlinkage long compat_sys_kexec_load(unsigned long entry,
}
#endif
+void stop_cpus_on_panic(void)
+{
+ if (mutex_trylock(&kexec_mutex)) {
+ kexec_mutex_is_locked = 1;
+ if (kexec_crash_image) {
+ struct pt_regs fixed_regs;
+ crash_setup_regs(&fixed_regs, NULL);
+ crash_save_vmcoreinfo();
+ machine_crash_shutdown(&fixed_regs);
+ return;
+ }
+ }
+ smp_send_stop();
+}
+
void crash_kexec(struct pt_regs *regs)
{
/* Take the kexec_mutex here to prevent sys_kexec_load
@@ -1074,17 +1091,8 @@ void crash_kexec(struct pt_regs *regs)
* of memory the xchg(&kexec_crash_image) would be
* sufficient. But since I reuse the memory...
*/
- if (mutex_trylock(&kexec_mutex)) {
- if (kexec_crash_image) {
- struct pt_regs fixed_regs;
-
- kmsg_dump(KMSG_DUMP_KEXEC);
-
- crash_setup_regs(&fixed_regs, regs);
- crash_save_vmcoreinfo();
- machine_crash_shutdown(&fixed_regs);
- machine_kexec(kexec_crash_image);
- }
+ if ((kexec_mutex_is_locked == 1) && kexec_crash_image) {
+ machine_kexec(kexec_crash_image);
mutex_unlock(&kexec_mutex);
}
}
diff --git a/kernel/panic.c b/kernel/panic.c
index 991bb87..9dd5fdd 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -40,6 +40,8 @@ ATOMIC_NOTIFIER_HEAD(panic_notifier_list);
EXPORT_SYMBOL(panic_notifier_list);
+static DEFINE_MUTEX(panic_mutex);
+
static long no_blink(int state)
{
return 0;
@@ -86,16 +88,17 @@ NORET_TYPE void panic(const char * fmt, ...)
* everything else.
* Do we want to call this before we try to display a message?
*/
- crash_kexec(NULL);
-
- kmsg_dump(KMSG_DUMP_PANIC);
-
- /*
- * Note smp_send_stop is the usual smp shutdown function, which
- * unfortunately means it may not be hardened to work in a panic
- * situation.
- */
- smp_send_stop();
+ if (mutex_trylock(&panic_mutex)) {
+ stop_cpus_on_panic();
+ kmsg_dump(KMSG_DUMP_PANIC);
+ crash_kexec(NULL);
+ mutex_unlock(&panic_mutex);
+ } else {
+ /* Waiting for NMI or IPI from panicked cpu. */
+ local_irq_enable();
+ while (1)
+ cpu_relax();
+ }
atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
--
1.7.1
|
|
From: Seiji A. <sei...@hd...> - 2011-02-23 17:42:13
|
Hi,
This patch tries to execute kmsg_dump() reliably in the kdump path.
[Need for kmsg_dump() in the kdump path]
From our support service experience, we always need to detect the root cause of an OS panic.
Enterprise customers never forgive us if kdump fails and we can't detect the root
cause of the panic due to a lack of material for investigation.
On the other hand, kdump can be unreliable for the following reason.
- Before booting the 2nd kernel, kdump verifies its sha256 checksum, and if the
verification fails, kdump doesn't start the 2nd kernel. In other words, we may
lose the material needed to detect the root cause of a kernel panic when memory corruption happens.
To avoid losing that material, we want two mechanisms in place.
- One is a lightweight mechanism, kmsg_dump, which tries to save kernel buffers in NVRAM/flash memory.
- The other is a heavyweight mechanism, kdump, which tries to save the entire/filtered kernel core.
[Discussion about kmsg_dump() in kdump path]
Eric (and others) think that kmsg_dump() should be removed from the kdump path because the
kmsg_dump() code is unreliable and may cause kdump to fail.
A patch for that has already been proposed.
https://lkml.org/lkml/2011/2/1/33
On the other hand, Hitachi would like to store kernel buffers in NVRAM in the kdump path because
we may not have any information after the crash if kdump fails.
To execute kmsg_dump() reliably and avoid losing material, Vivek suggested an idea.
Here is an overview of his idea.
- Share the common part of kdump/panic (stopping the other cpus by NMI/IPI).
- Save the kernel buffer in NVRAM/flash memory after stopping the other cpus.
- Introduce a new mutex so that IPI/NMI is sent reliably when two cpus panic
at the same time.
A detailed explanation is here.
https://lkml.org/lkml/2011/2/8/223
[Patch Description]
This patch is based on Vivek's idea above.
<changelog>
- Merge machine_crash_shutdown() and smp_send_stop() into stop_cpus_on_panic() to share
the common part of kdump/panic (stopping the other cpus by NMI/IPI).
- Move kmsg_dump(KMSG_DUMP_PANIC) to just after stop_cpus_on_panic() so that the kernel buffer
is saved in NVRAM/flash memory reliably.
- Introduce panic_mutex so that IPI/NMI is sent reliably when two cpus panic at the same time.
<flowchart>
panic happens
- Print panic strings and stacks. (dumpstack, etc)
- Stop other cpus. (stop_cpus_on_panic())
- Dump kernel buffer to NVRAM or flash memory. (kmsg_dump(KMSG_DUMP_PANIC))
- When kdump is enabled, the 2nd kernel boots. (crash_kexec())
- When kdump is disabled, panic_notifier() is called.
<new function call>
stop_cpus_on_panic()
- When kdump is enabled, crash_setup_regs(), crash_save_vmcoreinfo()
and machine_crash_shutdown() are called.
- When kdump is disabled, smp_send_stop() is called.
<modified function call>
crash_kexec()
- When kdump is enabled, machine_kexec() is called.
- When kdump is disabled, it returns without doing anything.
<new mutex_lock>
panic_mutex
It is introduced so that IPI/NMI is sent reliably when two cpus panic at the same time.
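The ordering described in the flowchart can be checked with a small userspace model. This is only a sketch, not kernel code: the enum values merely stand in for the kernel functions named in their comments, model_panic() and struct panic_trace are hypothetical names, and contention on kexec_mutex or panic_mutex is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Each value stands in for one kernel call on the panic path. */
enum panic_step {
	STEP_CRASH_SHUTDOWN,	/* machine_crash_shutdown() */
	STEP_SMP_SEND_STOP,	/* smp_send_stop() */
	STEP_KMSG_DUMP,		/* kmsg_dump(KMSG_DUMP_PANIC) */
	STEP_MACHINE_KEXEC,	/* machine_kexec() */
	STEP_PANIC_NOTIFIERS	/* panic notifier chain */
};

#define MAX_STEPS 4

struct panic_trace {
	enum panic_step steps[MAX_STEPS];
	size_t count;
};

static void trace_step(struct panic_trace *t, enum panic_step s)
{
	assert(t->count < MAX_STEPS);
	t->steps[t->count++] = s;
}

/* Records the order of steps for one panic, with or without a crash image. */
static void model_panic(bool crash_image_loaded, struct panic_trace *t)
{
	t->count = 0;

	/* stop_cpus_on_panic(): choose the shutdown path. */
	trace_step(t, crash_image_loaded ? STEP_CRASH_SHUTDOWN
					 : STEP_SMP_SEND_STOP);

	/* The log is saved only after the other cpus are stopped. */
	trace_step(t, STEP_KMSG_DUMP);

	/* crash_kexec(): boots the 2nd kernel and never returns. */
	if (crash_image_loaded) {
		trace_step(t, STEP_MACHINE_KEXEC);
		return;
	}

	trace_step(t, STEP_PANIC_NOTIFIERS);
}
```

Calling model_panic(true, &t) records crash shutdown, kmsg dump, kexec; model_panic(false, &t) records smp_send_stop, kmsg dump, panic notifiers. In both cases the kmsg dump comes after the other cpus are stopped, which is the point of the reordering.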
[Build status]
This patch is built against 2.6.38-rc6.
[Test Status]
<simple regression test>
- Case 1
Condition
kernel panics when kdump is enabled.
Result
- kmsg_dump() is called.
- 2nd kernel boots and dumps memory successfully.
- Case 2
Condition
kernel panics when kdump is disabled.
Result
panic notifier is called successfully.
<checking timing issue of kexec_mutex and value of kexec_crash_image>
- Case 3
Condition
cpuX panics while cpuX is getting kexec_mutex, 2nd kernel is not loaded.
Result
kmsg_dump() and panic notifier are called successfully.
- Case 4
Condition
cpuX panics while cpuX is getting kexec_mutex, 2nd kernel is loaded.
Result
kmsg_dump() and panic notifier are called successfully.
- Case 5
Condition
cpuY panics while cpuX is getting kexec_mutex, 2nd kernel is not loaded.
Result
kmsg_dump() and panic notifier are called successfully.
- Case 6
Condition
cpuY panics while cpuX is getting kexec_mutex, 2nd kernel is loaded.
Result
kmsg_dump() and panic notifier are called successfully.
<checking timing issue of panic_mutex>
- Case 7
Condition
cpuX and cpuY panic at the same time when kdump is enabled.
Result
- kmsg_dump() is called.
- 2nd kernel boots and the memory dump succeeds.
- Case 8
Condition
cpuX and cpuY panic at the same time when kdump is disabled.
Result
kmsg_dump() and panic notifier are called successfully.
Any comments and suggestions are welcome.
Signed-off-by: Seiji Aguchi <sei...@hd...>
---
include/linux/kexec.h | 2 ++
include/linux/smp.h | 12 ++++++++++++
kernel/kexec.c | 30 +++++++++++++++++++-----------
kernel/panic.c | 23 +++++++++++++----------
4 files changed, 46 insertions(+), 21 deletions(-)
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 03e8e8d..8860ee9 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -125,6 +125,8 @@ extern asmlinkage long compat_sys_kexec_load(unsigned long entry,
#endif
extern struct page *kimage_alloc_control_pages(struct kimage *image,
unsigned int order);
+extern void crash_kexec_prepare(struct pt_regs *);
+extern void stop_cpus_on_panic(void);
extern void crash_kexec(struct pt_regs *);
int kexec_should_crash(struct task_struct *);
void crash_save_cpu(struct pt_regs *regs, int cpu);
diff --git a/include/linux/smp.h b/include/linux/smp.h
index 6dc95ca..164d9a9 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -46,6 +46,12 @@ int smp_call_function_single(int cpuid, smp_call_func_t func, void *info,
*/
extern void smp_send_stop(void);
+#ifdef CONFIG_KEXEC
+extern void stop_cpus_on_panic(void);
+#else
+static inline void stop_cpus_on_panic(void){ smp_send_stop(); }
+#endif
+
/*
* sends a 'reschedule' event to another CPU:
*/
@@ -119,6 +125,12 @@ extern unsigned int setup_max_cpus;
static inline void smp_send_stop(void) { }
+#ifdef CONFIG_KEXEC
+extern void stop_cpus_on_panic(void);
+#else
+static inline void stop_cpus_on_panic(void) { }
+#endif
+
/*
* These macros fold the SMP functionality into a single CPU system
*/
diff --git a/kernel/kexec.c b/kernel/kexec.c
index ec19b92..f68ea03 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -49,6 +49,8 @@ u32 vmcoreinfo_note[VMCOREINFO_NOTE_SIZE/4];
size_t vmcoreinfo_size;
size_t vmcoreinfo_max_size = sizeof(vmcoreinfo_data);
+static int kexec_mutex_is_locked;
+
/* Location of the reserved area for the crash kernel */
struct resource crashk_res = {
.name = "Crash kernel",
@@ -1064,6 +1066,21 @@ asmlinkage long compat_sys_kexec_load(unsigned long entry,
}
#endif
+void stop_cpus_on_panic(void)
+{
+ if (mutex_trylock(&kexec_mutex)) {
+ kexec_mutex_is_locked = 1;
+ if (kexec_crash_image) {
+ struct pt_regs fixed_regs;
+ crash_setup_regs(&fixed_regs, NULL);
+ crash_save_vmcoreinfo();
+ machine_crash_shutdown(&fixed_regs);
+ return;
+ }
+ }
+ smp_send_stop();
+}
+
void crash_kexec(struct pt_regs *regs)
{
/* Take the kexec_mutex here to prevent sys_kexec_load
@@ -1074,17 +1091,8 @@ void crash_kexec(struct pt_regs *regs)
* of memory the xchg(&kexec_crash_image) would be
* sufficient. But since I reuse the memory...
*/
- if (mutex_trylock(&kexec_mutex)) {
- if (kexec_crash_image) {
- struct pt_regs fixed_regs;
-
- kmsg_dump(KMSG_DUMP_KEXEC);
-
- crash_setup_regs(&fixed_regs, regs);
- crash_save_vmcoreinfo();
- machine_crash_shutdown(&fixed_regs);
- machine_kexec(kexec_crash_image);
- }
+ if ((kexec_mutex_is_locked == 1) && kexec_crash_image) {
+ machine_kexec(kexec_crash_image);
mutex_unlock(&kexec_mutex);
}
}
diff --git a/kernel/panic.c b/kernel/panic.c
index 991bb87..9dd5fdd 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -40,6 +40,8 @@ ATOMIC_NOTIFIER_HEAD(panic_notifier_list);
EXPORT_SYMBOL(panic_notifier_list);
+static DEFINE_MUTEX(panic_mutex);
+
static long no_blink(int state)
{
return 0;
@@ -86,16 +88,17 @@ NORET_TYPE void panic(const char * fmt, ...)
* everything else.
* Do we want to call this before we try to display a message?
*/
- crash_kexec(NULL);
-
- kmsg_dump(KMSG_DUMP_PANIC);
-
- /*
- * Note smp_send_stop is the usual smp shutdown function, which
- * unfortunately means it may not be hardened to work in a panic
- * situation.
- */
- smp_send_stop();
+ if (mutex_trylock(&panic_mutex)) {
+ stop_cpus_on_panic();
+ kmsg_dump(KMSG_DUMP_PANIC);
+ crash_kexec(NULL);
+ mutex_unlock(&panic_mutex);
+ } else {
+ /* Waiting for NMI or IPI from panicked cpu. */
+ local_irq_enable();
+ while (1)
+ cpu_relax();
+ }
atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
--
1.7.1
|
|
From: Hidetoshi S. <set...@jp...> - 2011-02-14 01:22:05
|
(2011/02/10 18:14), Borislav Petkov wrote:
> On Thu, Feb 10, 2011 at 05:36:58PM +0900, Hidetoshi Seto wrote:
>> (2011/02/10 1:35), Seiji Aguchi wrote:
>
> [..]
>
>>> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
>>> index d916183..e76b47b 100644
>>> --- a/arch/x86/kernel/cpu/mcheck/mce.c
>>> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
>>> @@ -944,6 +944,8 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>>>
>>> percpu_inc(mce_exception_count);
>>>
>>> + hwerr_flag = 1;
>>> +
>>> if (notify_die(DIE_NMI, "machine check", regs, error_code,
>>> 18, SIGKILL) == NOTIFY_STOP)
>>> goto out;
>>
>> Now x86 supports some recoverable machine check, so setting
>> flag here will prevent running kexec on systems that have
>> encountered such recoverable machine check and recovered.
>>
>> I think mce_panic() is proper place to set this flag "hwerr_flag".
>
> I agree, in that case it is unsafe to run kexec only after the error
> cannot be recovered by software.
>
> Also, hwerr_flag is really a bad naming choice, how about
> "hwerr_unrecoverable" or "hw_compromised" or "recovery_futile" or
> "hw_incurable" or simply say what happened: "pcc" = processor context
> corrupt (and a reliable restarting might not be possible). This could be
> used by others too, besides kexec.
Or how about something like hwerr_panic(), to make it clear that the panic
was requested due to a hardware error?
Anyway, Aguchi-san, please note that we should not turn off kexec before
encountering a fatal hardware error and before printing/transmitting
enough of the hardware error log out of this system.
>
> [..]
>
>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 0207c2f..0178f47 100644
>>> --- a/mm/memory-failure.c
>>> +++ b/mm/memory-failure.c
>>> @@ -994,6 +994,8 @@ int __memory_failure(unsigned long pfn, int trapno, int flags)
>>> int res;
>>> unsigned int nr_pages;
>>>
>>> + hwerr_flag = 1;
>>> +
>>> if (!sysctl_memory_failure_recovery)
>>> panic("Memory failure from trap %d on page %lx", trapno, pfn);
>>>
>>
>> For similar reason, setting flag here is not good for
>> systems working after isolating some poisoned memory page.
>>
>> Why not:
>> if (!sysctl_memory_failure_recovery) {
>> hwerr_flag = 1;
>> panic("Memory failure from trap %d on page %lx", trapno, pfn);
>> }
>
> Why do we need that in memory-failure.c at all? I mean, when we consume
> the UC, we'll end up in mce_panic() anyway.
One possible answer is that memory-failure.c is not x86 specific.
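The placement discussed in this thread (mark the hardware error only on the path that actually panics, so a recovered error never disables kexec) can be modelled in userspace. This is a rough sketch under stated assumptions: struct hwerr_model and the model_* functions are hypothetical names, recovery_enabled stands in for sysctl_memory_failure_recovery, and the skip test copies the condition of kexec_should_skip() from the quoted patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; nothing here is kernel code. */
struct hwerr_model {
	int kexec_on_hwerr;	/* sysctl value, default 1 in the patch */
	bool fatal_hwerr;	/* set only on the unrecoverable path */
	bool recovery_enabled;	/* stands in for sysctl_memory_failure_recovery */
};

/* Same condition as kexec_should_skip() in the quoted patch. */
static bool model_kexec_should_skip(const struct hwerr_model *m)
{
	assert(m);
	return !m->kexec_on_hwerr && m->fatal_hwerr;
}

/*
 * A recoverable error leaves the flag untouched; an unrecoverable one
 * sets it just before the (modelled) panic. Returns true on "panic".
 */
static bool model_memory_failure(struct hwerr_model *m)
{
	assert(m);
	if (!m->recovery_enabled) {
		m->fatal_hwerr = true;
		return true;
	}
	return false;
}
```

With kexec_on_hwerr set to 0, a recovered error leaves model_kexec_should_skip() false, so a later unrelated panic can still take the kdump path. Only the unrecoverable case makes it return true.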
Thanks,
H.Seto
|
|
From: Satoru M. <sat...@hd...> - 2011-02-10 18:35:05
|
On 01/20/2011 07:16 PM, Rik van Riel wrote:
> On 01/07/2011 05:03 PM, Satoru Moriya wrote:
>
> > The result is following.
> >
> > | default | case 1 | case 2 |
> > ----------------------------------------------------------
> > wmark_min_kbytes | 5752 | 5752 | 5752 |
> > wmark_low_kbytes | 7190 | 16384 | 32768 | (KB)
> > wmark_high_kbytes | 8628 | 20480 | 40960 |
> > ----------------------------------------------------------
> > real | 503 | 364 | 337 |
> > user | 3 | 5 | 4 | (msec)
> > sys | 153 | 149 | 146 |
> > ----------------------------------------------------------
> > page fault | 32768 | 32768 | 32768 |
> > kswapd_wakeup | 1809 | 335 | 228 | (times)
> > direct reclaim | 5 | 0 | 0 |
> >
> > As you can see, direct reclaim was performed 5 times and
> > its exec time was 503 msec in the default case. On the other
> > hand, in case 1 (large delta case ) no direct reclaim was
> > performed and its exec time was 364 msec.
>
> Saving 1.5 seconds on a one-off workload is probably not
> worth the complexity of giving a system administrator
> yet another set of tunables to mess with.
The table above shows average data, but that might not be enough.
In a low-latency enterprise system, the worst-case latency is the most
important. I recorded the worst-case latency per single page allocation,
and here it is.
| default | case 1 | case 2 |
----------------------------------------------------------
worst latency | 223 | 75 | 50 | (usec)
per one page alloc | | | |
In the default case, the worst latency is 223 usec, and direct reclaim
occurred at that time. OTOH our target latency is under 100 usec.
So I'd like to ensure that direct reclaim is never executed in certain
situations.
> However, I suspect it may be a good idea if the kernel
> could adjust these watermarks automatically, since direct
> reclaim could lead to quite a big performance penalty.
>
> I do not know which events should be used to increase and
> decrease the watermarks, but I have some ideas:
> - direct reclaim (increase)
> - kswapd has trouble freeing pages (increase)
> - kswapd frees enough memory at DEF_PRIORITY (decrease)
> - next to no direct reclaim events in the last N (1000?)
> reclaim events (decrease)
I think it might be a good idea, but it is not enough because we can't avoid
direct reclaim completely. So what do you think of adding a learning
mode to your idea? In the learning mode, the kernel calculates appropriate
watermarks, and users apply them from the next boot.
It is useful for an enterprise system because we normally run performance/stress
tests and tune the system before release. If we run stress tests in the learning mode,
we can get the appropriate watermarks for that system. By using them we can avoid
direct reclaim and keep latency low enough in a production system.
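As a rough illustration of the automatic adjustment being discussed, the events listed in the quoted mail can drive a simple step controller. Everything in this sketch is an assumption made for illustration: struct wmark, wmark_adjust(), the event names, the fixed step size and the cap factor are hypothetical, and none of it is an existing kernel interface.

```c
#include <assert.h>

/* Hypothetical controller state for one watermark, in KB. */
struct wmark {
	unsigned long def_kb;		/* default value, also the floor */
	unsigned long cur_kb;		/* current value */
	unsigned long step_kb;		/* adjustment per event */
	unsigned int max_factor;	/* cap = def_kb * max_factor */
};

/* Events taken from the list in the quoted mail. */
enum reclaim_event {
	EV_DIRECT_RECLAIM,	/* direct reclaim happened: increase */
	EV_KSWAPD_STRUGGLING,	/* kswapd has trouble freeing: increase */
	EV_KSWAPD_EASY,		/* kswapd freed enough at DEF_PRIORITY: decrease */
	EV_QUIET_PERIOD		/* almost no direct reclaim lately: decrease */
};

/* Applies one event and returns the new watermark, clamped to [def, cap]. */
static unsigned long wmark_adjust(struct wmark *w, enum reclaim_event ev)
{
	unsigned long max_kb;

	assert(w->def_kb > 0 && w->max_factor >= 1);
	max_kb = w->def_kb * w->max_factor;

	switch (ev) {
	case EV_DIRECT_RECLAIM:
	case EV_KSWAPD_STRUGGLING:
		if (w->cur_kb + w->step_kb > max_kb)
			w->cur_kb = max_kb;
		else
			w->cur_kb += w->step_kb;
		break;
	case EV_KSWAPD_EASY:
	case EV_QUIET_PERIOD:
		if (w->cur_kb < w->def_kb + w->step_kb)
			w->cur_kb = w->def_kb;
		else
			w->cur_kb -= w->step_kb;
		break;
	}
	return w->cur_kb;
}
```

Starting from the default wmark_low of 7190 KB in the table above, with a 4096 KB step and a factor of 4, repeated direct-reclaim events raise the value until it stops at 28760 KB. A learning mode would then simply record the final cur_kb for use at the next boot.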
> I guess we will also need to be sure that the watermarks
> are never raised above some sane upper threshold. Maybe
> 4x or 5x the default?
>
>
> --
> All rights reversed
|
|
From: Borislav P. <bp...@al...> - 2011-02-10 09:32:20
|
On Thu, Feb 10, 2011 at 05:36:58PM +0900, Hidetoshi Seto wrote:
> (2011/02/10 1:35), Seiji Aguchi wrote:
[..]
> > diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> > index d916183..e76b47b 100644
> > --- a/arch/x86/kernel/cpu/mcheck/mce.c
> > +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> > @@ -944,6 +944,8 @@ void do_machine_check(struct pt_regs *regs, long error_code)
> >
> > percpu_inc(mce_exception_count);
> >
> > + hwerr_flag = 1;
> > +
> > if (notify_die(DIE_NMI, "machine check", regs, error_code,
> > 18, SIGKILL) == NOTIFY_STOP)
> > goto out;
>
> Now x86 supports some recoverable machine check, so setting
> flag here will prevent running kexec on systems that have
> encountered such recoverable machine check and recovered.
>
> I think mce_panic() is proper place to set this flag "hwerr_flag".
I agree, in that case it is unsafe to run kexec only after the error
cannot be recovered by software.
Also, hwerr_flag is really a bad naming choice, how about
"hwerr_unrecoverable" or "hw_compromised" or "recovery_futile" or
"hw_incurable" or simply say what happened: "pcc" = processor context
corrupt (and a reliable restarting might not be possible). This could be
used by others too, besides kexec.
[..]
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 0207c2f..0178f47 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -994,6 +994,8 @@ int __memory_failure(unsigned long pfn, int trapno, int flags)
> > int res;
> > unsigned int nr_pages;
> >
> > + hwerr_flag = 1;
> > +
> > if (!sysctl_memory_failure_recovery)
> > panic("Memory failure from trap %d on page %lx", trapno, pfn);
> >
>
> For similar reason, setting flag here is not good for
> systems working after isolating some poisoned memory page.
>
> Why not:
> if (!sysctl_memory_failure_recovery) {
> hwerr_flag = 1;
> panic("Memory failure from trap %d on page %lx", trapno, pfn);
> }
Why do we need that in memory-failure.c at all? I mean, when we consume
the UC, we'll end up in mce_panic() anyway.
--
Regards/Gruss,
Boris.
|
|
From: Hidetoshi S. <set...@jp...> - 2011-02-10 08:37:58
|
(2011/02/10 1:35), Seiji Aguchi wrote:
> Hi,
>
> I submitted a quite similar patch last December.
>
> http://www.spinics.net/lists/linux-mm/msg13157.html
>
> I retry it with different description of the purpose.
>
> [Changelog]
> from v1:
> - Change name of sysctl parameter ,kexec_on_mce, to kexec_on_hwerr.
> - Move variable declaration from <asm/mce.h> to <kernel/panic.h>.
> - Remove CONFIG_X86_MCE in *.c files.
> - Modify [Purpose]/[Patch Description].
>
> [Purpose]
> There are some logging features of firmware/hardware, SEL,BMC, etc, in enterprise servers.
> We investigate the firmware/hardware logs first when MCE occurred and replace the broken hardware.
> So, memory dump is not necessary for detecting root cause of machine check.
> Also, we can reduce down-time by skipping kdump.
>
> Of course, there are a lot of servers which don't have logging features of firmware/hardware.
> So, I proposed a option controlling kexec behaviour when hardware error occurred.
>
> [Patch Description]
> This patch adds a sysctl option ,kernel.kexec_on_hwerr, controlling kexec behaviour when hardware error occurred.
>
> - Permission
>   - 0644
> - Value(default is "1")
>   - non-zero: Kexec is enabled regardless of hardware error.
>   - 0: Kexec is disabled when MCE occurred.
>
>
> Matrix of kernel.kexec_on_hwerr value ,hardware error and kexec
>
> --------------------------------------------------
> kernel.kexec_on_hwerr| hardware error | kexec
> --------------------------------------------------
> non-zero             | occurred       | enabled
>                      -----------------------------
>                      | not occurred   | enabled
> --------------------------------------------------
> 0                    | occurred       | disabled
>                      |----------------------------
>                      | not occurred   | enabled
> --------------------------------------------------
>
>
> Any comments and suggestions are welcome.
>
> Signed-off-by: Seiji Aguchi <sei...@hd...>
>
> ---
> Documentation/sysctl/kernel.txt | 11 +++++++++++
> arch/x86/kernel/cpu/mcheck/mce.c | 2 ++
> include/linux/kernel.h | 2 ++
> include/linux/sysctl.h | 1 +
> kernel/panic.c | 15 ++++++++++++++-
> kernel/sysctl.c | 8 ++++++++
> kernel/sysctl_binary.c | 1 +
> mm/memory-failure.c | 2 ++
> 8 files changed, 41 insertions(+), 1 deletions(-)
>
> diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index 11d5ced..3159111 100644
> --- a/Documentation/sysctl/kernel.txt
> +++ b/Documentation/sysctl/kernel.txt
> @@ -34,6 +34,7 @@ show up in /proc/sys/kernel:
>  - hotplug
>  - java-appletviewer [ binfmt_java, obsolete ]
>  - java-interpreter [ binfmt_java, obsolete ]
> +- kexec_on_hwerr [ x86 only ]
>  - kptr_restrict
>  - kstack_depth_to_print [ X86 only ]
>  - l2cr [ PPC only ]
> @@ -261,6 +262,16 @@ This flag controls the L2 cache of G3 processor boards. If 0, the cache is disabled. Enabled if nonzero.
>
> ==============================================================
> +kexec_on_hwerr: (X86 only)
> +
> +Controls the behaviour of kexec when panic occurred due to hardware
> +error.
> +Default value is 1.
> +
> +0: Kexec is disabled.
> +non-zero: Kexec is enabled.
> +
> +==============================================================
>
> kptr_restrict:
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index d916183..e76b47b 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -944,6 +944,8 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>
> percpu_inc(mce_exception_count);
>
> + hwerr_flag = 1;
> +
> if (notify_die(DIE_NMI, "machine check", regs, error_code,
> 18, SIGKILL) == NOTIFY_STOP)
> goto out;
Now x86 supports some recoverable machine check, so setting
flag here will prevent running kexec on systems that have
encountered such recoverable machine check and recovered.
I think mce_panic() is proper place to set this flag "hwerr_flag".
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h index 2fe6e84..c2fba7c 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -242,6 +242,8 @@ extern void add_taint(unsigned flag); extern int test_taint(unsigned flag); extern unsigned long get_taint(void); extern int root_mountflags;
> +extern int kexec_on_hwerr;
> +extern int hwerr_flag;
>
> extern bool early_boot_irqs_disabled;
>
> diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index 7bb5cb6..8ae5bfe 100644
> --- a/include/linux/sysctl.h
> +++ b/include/linux/sysctl.h
> @@ -153,6 +153,7 @@ enum
> KERN_MAX_LOCK_DEPTH=74, /* int: rtmutex's maximum lock depth */
> KERN_NMI_WATCHDOG=75, /* int: enable/disable nmi watchdog */
> KERN_PANIC_ON_NMI=76, /* int: whether we will panic on an unrecovered */
> + KERN_KEXEC_ON_HWERR=77, /* int: bevaviour of kexec for hardware error
> +*/
> };
>
>
> diff --git a/kernel/panic.c b/kernel/panic.c index 991bb87..84c1d2e 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -28,6 +28,8 @@
> #define PANIC_BLINK_SPD 18
>
> int panic_on_oops;
> +int kexec_on_hwerr = 1;
> +int hwerr_flag;
> static unsigned long tainted_mask;
> static int pause_on_oops;
> static int pause_on_oops_flag;
> @@ -45,6 +47,16 @@ static long no_blink(int state)
> return 0;
> }
>
> +static int kexec_should_skip(void)
> +{
> + if (!kexec_on_hwerr && hwerr_flag) {
> + printk(KERN_WARNING "Kexec is skipped because hardware error "
> + "occurred.\n");
> + return 1;
> + }
> + return 0;
> +}
> +
> /* Returns how long it waited in ms */
> long (*panic_blink)(int state);
> EXPORT_SYMBOL(panic_blink);
> @@ -86,7 +98,8 @@ NORET_TYPE void panic(const char * fmt, ...)
> * everything else.
> * Do we want to call this before we try to display a message?
> */
> - crash_kexec(NULL);
> + if (!kexec_should_skip())
> + crash_kexec(NULL);
>
> kmsg_dump(KMSG_DUMP_PANIC);
>
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 0f1bd83..f78edd8 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -811,6 +811,14 @@ static struct ctl_table kern_table[] = {
> .mode = 0644,
> .proc_handler = proc_dointvec,
> },
> + {
> + .procname = "kexec_on_hwerr",
> + .data = &kexec_on_hwerr,
> + .maxlen = sizeof(int),
> + .mode = 0644,
> + .proc_handler = proc_dointvec,
> + },
> +
> #endif
> #if defined(CONFIG_MMU)
> {
> diff --git a/kernel/sysctl_binary.c b/kernel/sysctl_binary.c index b875bed..8d572ca 100644
> --- a/kernel/sysctl_binary.c
> +++ b/kernel/sysctl_binary.c
> @@ -137,6 +137,7 @@ static const struct bin_table bin_kern_table[] = {
> { CTL_INT, KERN_COMPAT_LOG, "compat-log" },
> { CTL_INT, KERN_MAX_LOCK_DEPTH, "max_lock_depth" },
> { CTL_INT, KERN_PANIC_ON_NMI, "panic_on_unrecovered_nmi" },
> + { CTL_INT, KERN_KEXEC_ON_HWERR, "kexec_on_hwerr" },
> {}
> };
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 0207c2f..0178f47 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -994,6 +994,8 @@ int __memory_failure(unsigned long pfn, int trapno, int flags)
> int res;
> unsigned int nr_pages;
>
> + hwerr_flag = 1;
> +
> if (!sysctl_memory_failure_recovery)
> panic("Memory failure from trap %d on page %lx", trapno, pfn);
>
For similar reason, setting flag here is not good for
systems working after isolating some poisoned memory page.
Why not:
	if (!sysctl_memory_failure_recovery) {
		hwerr_flag = 1;
		panic("Memory failure from trap %d on page %lx", trapno, pfn);
	}
Thanks,
H.Seto
|
|
From: Cong W. <am...@re...> - 2011-02-10 03:05:43
|
On 2011-02-10 01:07, Eric W. Biederman wrote:
>
> Is there any reason we can't put logic to decided if we should write
> a crashdump in the crashdump userspace?
>
Doesn't this already provide a choice for the user to decide if he
wants a crashdump via sysctl?
Except for some minor issues pointed out by you and Greg, this patch
looks fine to me.
Thanks.
|
|
From: <ebi...@xm...> - 2011-02-09 17:07:26
|
Seiji Aguchi <sei...@hd...> writes:
> Hi,
>
> I submitted a quite similar patch last December.
>
> http://www.spinics.net/lists/linux-mm/msg13157.html
>
> I retry it with different description of the purpose.
>
> [Changelog]
> from v1:
> - Change name of sysctl parameter ,kexec_on_mce, to kexec_on_hwerr.
> - Move variable declaration from <asm/mce.h> to <kernel/panic.h>.
> - Remove CONFIG_X86_MCE in *.c files.
> - Modify [Purpose]/[Patch Description].
>
> [Purpose]
> There are some logging features of firmware/hardware, SEL,BMC, etc, in enterprise servers.
> We investigate the firmware/hardware logs first when MCE occurred and replace the broken hardware.
> So, memory dump is not necessary for detecting root cause of machine check.
> Also, we can reduce down-time by skipping kdump.
>
> Of course, there are a lot of servers which don't have logging features of firmware/hardware.
> So, I proposed a option controlling kexec behaviour when hardware error occurred.
Mostly this seems reasonable. If we can get the logic simple enough it
is fool proof I am for it.
> [Patch Description]
> This patch adds a sysctl option ,kernel.kexec_on_hwerr, controlling kexec behaviour when hardware error occurred.
>
> - Permission
>   - 0644
> - Value(default is "1")
>   - non-zero: Kexec is enabled regardless of hardware error.
>   - 0: Kexec is disabled when MCE occurred.
>
>
> Matrix of kernel.kexec_on_hwerr value ,hardware error and kexec
If we do a version that is potentially arch agnostic but x86 for now,
and we call it kexec_on_logged_hwerr. Because it is important that we
expect that the hardware will log the error.
Is there any reason we can't put logic to decided if we should write
a crashdump in the crashdump userspace?
Eric
> --------------------------------------------------
> kernel.kexec_on_hwerr| hardware error | kexec
> --------------------------------------------------
> non-zero             | occurred       | enabled
>                      -----------------------------
>                      | not occurred   | enabled
> --------------------------------------------------
> 0                    | occurred       | disabled
>                      |----------------------------
>                      | not occurred   | enabled
> --------------------------------------------------
>
>
> Any comments and suggestions are welcome.
>
> Signed-off-by: Seiji Aguchi <sei...@hd...>
>
> ---
> Documentation/sysctl/kernel.txt | 11 +++++++++++
> arch/x86/kernel/cpu/mcheck/mce.c | 2 ++
> include/linux/kernel.h | 2 ++
> include/linux/sysctl.h | 1 +
> kernel/panic.c | 15 ++++++++++++++-
> kernel/sysctl.c | 8 ++++++++
> kernel/sysctl_binary.c | 1 +
> mm/memory-failure.c | 2 ++
> 8 files changed, 41 insertions(+), 1 deletions(-)
>
> diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index 11d5ced..3159111 100644
> --- a/Documentation/sysctl/kernel.txt
> +++ b/Documentation/sysctl/kernel.txt
> @@ -34,6 +34,7 @@ show up in /proc/sys/kernel:
>  - hotplug
>  - java-appletviewer [ binfmt_java, obsolete ]
>  - java-interpreter [ binfmt_java, obsolete ]
> +- kexec_on_hwerr [ x86 only ]
>  - kptr_restrict
>  - kstack_depth_to_print [ X86 only ]
>  - l2cr [ PPC only ]
> @@ -261,6 +262,16 @@ This flag controls the L2 cache of G3 processor boards. If 0, the cache is disabled. Enabled if nonzero.
>
> ==============================================================
> +kexec_on_hwerr: (X86 only)
> +
> +Controls the behaviour of kexec when panic occurred due to hardware
> +error.
> +Default value is 1.
> +
> +0: Kexec is disabled.
> +non-zero: Kexec is enabled.
> +
> +==============================================================
>
> kptr_restrict:
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index d916183..e76b47b 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -944,6 +944,8 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>
> percpu_inc(mce_exception_count);
>
> + hwerr_flag = 1;
> +
> if (notify_die(DIE_NMI, "machine check", regs, error_code,
> 18, SIGKILL) == NOTIFY_STOP)
> goto out;
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h index 2fe6e84..c2fba7c 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -242,6 +242,8 @@ extern void add_taint(unsigned flag); extern int test_taint(unsigned flag); extern unsigned long get_taint(void); extern int root_mountflags;
> +extern int kexec_on_hwerr;
> +extern int hwerr_flag;
>
> extern bool early_boot_irqs_disabled;
>
> diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index 7bb5cb6..8ae5bfe 100644
> --- a/include/linux/sysctl.h
> +++ b/include/linux/sysctl.h
> @@ -153,6 +153,7 @@ enum
> KERN_MAX_LOCK_DEPTH=74, /* int: rtmutex's maximum lock depth */
> KERN_NMI_WATCHDOG=75, /* int: enable/disable nmi watchdog */
> KERN_PANIC_ON_NMI=76, /* int: whether we will panic on an unrecovered */
> + KERN_KEXEC_ON_HWERR=77, /* int: bevaviour of kexec for hardware error
> +*/
Don't change this file. You don't need a binary number.
> };
>
>
> diff --git a/kernel/panic.c b/kernel/panic.c index 991bb87..84c1d2e 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -28,6 +28,8 @@
> #define PANIC_BLINK_SPD 18
>
> int panic_on_oops;
> +int kexec_on_hwerr = 1;
> +int hwerr_flag;
> static unsigned long tainted_mask;
> static int pause_on_oops;
> static int pause_on_oops_flag;
> @@ -45,6 +47,16 @@ static long no_blink(int state)
> return 0;
> }
>
> +static int kexec_should_skip(void)
> +{
> + if (!kexec_on_hwerr && hwerr_flag) {
> + printk(KERN_WARNING "Kexec is skipped because hardware error "
> + "occurred.\n");
> + return 1;
> + }
> + return 0;
> +}
> +
> /* Returns how long it waited in ms */
> long (*panic_blink)(int state);
> EXPORT_SYMBOL(panic_blink);
> @@ -86,7 +98,8 @@ NORET_TYPE void panic(const char * fmt, ...)
> * everything else.
> * Do we want to call this before we try to display a message?
> */
> - crash_kexec(NULL);
> + if (!kexec_should_skip())
> + crash_kexec(NULL);
>
> kmsg_dump(KMSG_DUMP_PANIC);
>
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 0f1bd83..f78edd8 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -811,6 +811,14 @@ static struct ctl_table kern_table[] = {
> .mode = 0644,
> .proc_handler = proc_dointvec,
> },
> + {
> + .procname = "kexec_on_hwerr",
> + .data = &kexec_on_hwerr,
> + .maxlen = sizeof(int),
> + .mode = 0644,
> + .proc_handler = proc_dointvec,
> + },
> +
> #endif
> #if defined(CONFIG_MMU)
> {
> diff --git a/kernel/sysctl_binary.c b/kernel/sysctl_binary.c index b875bed..8d572ca 100644
> --- a/kernel/sysctl_binary.c
> +++ b/kernel/sysctl_binary.c
> @@ -137,6 +137,7 @@ static const struct bin_table bin_kern_table[] = {
> { CTL_INT, KERN_COMPAT_LOG, "compat-log" },
> { CTL_INT, KERN_MAX_LOCK_DEPTH, "max_lock_depth" },
> { CTL_INT, KERN_PANIC_ON_NMI, "panic_on_unrecovered_nmi" },
> + { CTL_INT, KERN_KEXEC_ON_HWERR, "kexec_on_hwerr" },
> {}
Don't change this file. No one uses the binary interface.
> };
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 0207c2f..0178f47 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -994,6 +994,8 @@ int __memory_failure(unsigned long pfn, int trapno, int flags)
> int res;
> unsigned int nr_pages;
>
> + hwerr_flag = 1;
> +
> if (!sysctl_memory_failure_recovery)
> panic("Memory failure from trap %d on page %lx", trapno,
> pfn);
I get the feeling that we should either call a function besides panic
or do something different so that we aren't controlling this through an
implicit parameter set in a global variable. That just seems scary racy
and hard to understand by reading the code.
Eric
|
|
From: <ebi...@xm...> - 2011-02-09 17:07:12
|
Seiji Aguchi <sei...@hd...> writes:

> Hi,
>
> I submitted a quite similar patch last December.
>
> http://www.spinics.net/lists/linux-mm/msg13157.html
>
> I retry it with different description of the purpose.
>
> [Changelog]
> from v1:
> - Change name of sysctl parameter ,kexec_on_mce, to kexec_on_hwerr.
> - Move variable declaration from <asm/mce.h> to <kernel/panic.h>.
> - Remove CONFIG_X86_MCE in *.c files.
> - Modify [Purpose]/[Patch Description].
>
> [Purpose]
> There are some logging features of firmware/hardware, SEL,BMC, etc, in enterprise servers.
> We investigate the firmware/hardware logs first when MCE occurred and replace the broken hardware.
> So, memory dump is not necessary for detecting root cause of machine check.
> Also, we can reduce down-time by skipping kdump.
>
> Of course, there are a lot of servers which don't have logging features of firmware/hardware.
> So, I proposed a option controlling kexec behaviour when hardware error occurred.

Mostly this seems reasonable. If we can get the logic simple enough that it
is foolproof, I am for it.

> [Patch Description]
> This patch adds a sysctl option ,kernel.kexec_on_hwerr, controlling kexec behaviour when hardware error occurred.
>
> - Permission
> - 0644
> - Value(default is "1")
> - non-zero: Kexec is enabled regardless of hardware error.
> - 0: Kexec is disabled when MCE occurred.
>
>
> Matrix of kernel.kexec_on_hwerr value ,hardware error and kexec

Could we do a version that is potentially arch-agnostic (x86 only for now)
and call it kexec_on_logged_hwerr? The name matters because we are relying
on the hardware to log the error.

Is there any reason we can't put the logic that decides whether to write a
crashdump into the crashdump userspace?

Eric

> --------------------------------------------------
> kernel.kexec_on_hwerr| hardware error | kexec
> --------------------------------------------------
> non-zero             | occurred       | enabled
>                      -----------------------------
>                      | not occurred   | enabled
> --------------------------------------------------
> 0                    | occurred       | disabled
>                      |----------------------------
>                      | not occurred   | enabled
> --------------------------------------------------
>
>
> Any comments and suggestions are welcome.
>
> Signed-off-by: Seiji Aguchi <sei...@hd...>
>
> ---
> Documentation/sysctl/kernel.txt  | 11 +++++++++++
> arch/x86/kernel/cpu/mcheck/mce.c |  2 ++
> include/linux/kernel.h           |  2 ++
> include/linux/sysctl.h           |  1 +
> kernel/panic.c                   | 15 ++++++++++++++-
> kernel/sysctl.c                  |  8 ++++++++
> kernel/sysctl_binary.c           |  1 +
> mm/memory-failure.c              |  2 ++
> 8 files changed, 41 insertions(+), 1 deletions(-)
>
> diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
> index 11d5ced..3159111 100644
> --- a/Documentation/sysctl/kernel.txt
> +++ b/Documentation/sysctl/kernel.txt
> @@ -34,6 +34,7 @@ show up in /proc/sys/kernel:
> - hotplug
> - java-appletviewer [ binfmt_java, obsolete ]
> - java-interpreter [ binfmt_java, obsolete ]
> +- kexec_on_hwerr [ x86 only ]
> - kptr_restrict
> - kstack_depth_to_print [ X86 only ]
> - l2cr [ PPC only ]
> @@ -261,6 +262,16 @@ This flag controls the L2 cache of G3 processor boards. If 0, the cache is disabled. Enabled if nonzero.
>
> ==============================================================
> +kexec_on_hwerr: (X86 only)
> +
> +Controls the behaviour of kexec when panic occurred due to hardware
> +error.
> +Default value is 1.
> +
> +0: Kexec is disabled.
> +non-zero: Kexec is enabled.
> +
> +==============================================================
>
> kptr_restrict:
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index d916183..e76b47b 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -944,6 +944,8 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>
> percpu_inc(mce_exception_count);
>
> + hwerr_flag = 1;
> +
> if (notify_die(DIE_NMI, "machine check", regs, error_code,
> 18, SIGKILL) == NOTIFY_STOP)
> goto out;
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 2fe6e84..c2fba7c 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -242,6 +242,8 @@ extern void add_taint(unsigned flag);
> extern int test_taint(unsigned flag);
> extern unsigned long get_taint(void);
> extern int root_mountflags;
> +extern int kexec_on_hwerr;
> +extern int hwerr_flag;
>
> extern bool early_boot_irqs_disabled;
>
> diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
> index 7bb5cb6..8ae5bfe 100644
> --- a/include/linux/sysctl.h
> +++ b/include/linux/sysctl.h
> @@ -153,6 +153,7 @@ enum
> KERN_MAX_LOCK_DEPTH=74, /* int: rtmutex's maximum lock depth */
> KERN_NMI_WATCHDOG=75, /* int: enable/disable nmi watchdog */
> KERN_PANIC_ON_NMI=76, /* int: whether we will panic on an unrecovered */
> + KERN_KEXEC_ON_HWERR=77, /* int: bevaviour of kexec for hardware error
> +*/

Don't change this file. You don't need a binary number.
> };
>
>
> diff --git a/kernel/panic.c b/kernel/panic.c
> index 991bb87..84c1d2e 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -28,6 +28,8 @@
> #define PANIC_BLINK_SPD 18
>
> int panic_on_oops;
> +int kexec_on_hwerr = 1;
> +int hwerr_flag;
> static unsigned long tainted_mask;
> static int pause_on_oops;
> static int pause_on_oops_flag;
> @@ -45,6 +47,16 @@ static long no_blink(int state)
> return 0;
> }
>
> +static int kexec_should_skip(void)
> +{
> + if (!kexec_on_hwerr && hwerr_flag) {
> + printk(KERN_WARNING "Kexec is skipped because hardware error "
> + "occurred.\n");
> + return 1;
> + }
> + return 0;
> +}
> +
> /* Returns how long it waited in ms */
> long (*panic_blink)(int state);
> EXPORT_SYMBOL(panic_blink);
> @@ -86,7 +98,8 @@ NORET_TYPE void panic(const char * fmt, ...)
> * everything else.
> * Do we want to call this before we try to display a message?
> */
> - crash_kexec(NULL);
> + if (!kexec_should_skip())
> + crash_kexec(NULL);
>
> kmsg_dump(KMSG_DUMP_PANIC);
>
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 0f1bd83..f78edd8 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -811,6 +811,14 @@ static struct ctl_table kern_table[] = {
> .mode = 0644,
> .proc_handler = proc_dointvec,
> },
> + {
> + .procname = "kexec_on_hwerr",
> + .data = &kexec_on_hwerr,
> + .maxlen = sizeof(int),
> + .mode = 0644,
> + .proc_handler = proc_dointvec,
> + },
> +
> #endif
> #if defined(CONFIG_MMU)
> {
> diff --git a/kernel/sysctl_binary.c b/kernel/sysctl_binary.c
> index b875bed..8d572ca 100644
> --- a/kernel/sysctl_binary.c
> +++ b/kernel/sysctl_binary.c
> @@ -137,6 +137,7 @@ static const struct bin_table bin_kern_table[] = {
> { CTL_INT, KERN_COMPAT_LOG, "compat-log" },
> { CTL_INT, KERN_MAX_LOCK_DEPTH, "max_lock_depth" },
> { CTL_INT, KERN_PANIC_ON_NMI, "panic_on_unrecovered_nmi" },
> + { CTL_INT, KERN_KEXEC_ON_HWERR, "kexec_on_hwerr" },
> {}

Don't change this file. No one uses the binary interface.
> };
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 0207c2f..0178f47 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -994,6 +994,8 @@ int __memory_failure(unsigned long pfn, int trapno, int flags)
> int res;
> unsigned int nr_pages;
>
> + hwerr_flag = 1;
> +
> if (!sysctl_memory_failure_recovery)
> panic("Memory failure from trap %d on page %lx", trapno,
> pfn);

I get the feeling that we should either call a function other than panic or
do something different, so that we aren't controlling this through an
implicit parameter set in a global variable. That just seems scary racy and
hard to understand by reading the code.

Eric
|
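One way to read the objection above is to hand the cause of the panic to the decision code as an explicit argument instead of leaving it in a global flag. The following is only a userspace sketch under that assumption: `enum panic_cause` and `should_kexec()` are hypothetical names that appear neither in the patch nor in the kernel, and only `kexec_on_hwerr` is borrowed from the proposed sysctl.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical type: the reason a panic was raised. */
enum panic_cause {
	PANIC_CAUSE_SOFTWARE,	/* ordinary kernel bug */
	PANIC_CAUSE_HWERR,	/* machine check, memory failure, ... */
};

/* Policy knob, standing in for the proposed kernel.kexec_on_hwerr sysctl. */
int kexec_on_hwerr = 1;

/*
 * Decide whether the crash kernel should be started. The cause is an
 * explicit argument, so the result does not depend on a flag that
 * another code path may have set earlier.
 */
bool should_kexec(enum panic_cause cause)
{
	assert(cause == PANIC_CAUSE_SOFTWARE || cause == PANIC_CAUSE_HWERR);

	if (cause == PANIC_CAUSE_HWERR && !kexec_on_hwerr)
		return false;
	return true;
}
```

With this shape the four rows of the matrix in the patch description map directly onto the two inputs of `should_kexec()`, and a reader of the hardware-error call sites can see which policy applies without tracing a global.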
|
From: Greg KH <gr...@su...> - 2011-02-09 16:52:01
|
On Wed, Feb 09, 2011 at 11:35:43AM -0500, Seiji Aguchi wrote:
> --- a/include/linux/sysctl.h
> +++ b/include/linux/sysctl.h
> @@ -153,6 +153,7 @@ enum
> KERN_MAX_LOCK_DEPTH=74, /* int: rtmutex's maximum lock depth */
> KERN_NMI_WATCHDOG=75, /* int: enable/disable nmi watchdog */
> KERN_PANIC_ON_NMI=76, /* int: whether we will panic on an unrecovered */
> + KERN_KEXEC_ON_HWERR=77, /* int: bevaviour of kexec for hardware error
> +*/

Odd trailing comment on the next line.
|
|
From: Seiji A. <sei...@hd...> - 2011-02-09 16:38:52
|
Hi,

I submitted a quite similar patch last December.

http://www.spinics.net/lists/linux-mm/msg13157.html

I am retrying it with a different description of the purpose.

[Changelog]
from v1:
- Change the name of the sysctl parameter from kexec_on_mce to kexec_on_hwerr.
- Move variable declaration from <asm/mce.h> to <kernel/panic.h>.
- Remove CONFIG_X86_MCE in *.c files.
- Modify [Purpose]/[Patch Description].

[Purpose]
Enterprise servers have firmware/hardware logging features such as SEL and BMC.
When an MCE occurs, we investigate the firmware/hardware logs first and replace the broken hardware.
So a memory dump is not necessary for finding the root cause of a machine check.
Also, we can reduce downtime by skipping kdump.

Of course, there are a lot of servers which don't have firmware/hardware logging features.
So I propose an option controlling kexec behaviour when a hardware error occurs.

[Patch Description]
This patch adds a sysctl option, kernel.kexec_on_hwerr, controlling kexec behaviour when a hardware error occurs.

- Permission
  - 0644
- Value (default is "1")
  - non-zero: Kexec is enabled regardless of hardware error.
  - 0: Kexec is disabled when an MCE occurred.

Matrix of kernel.kexec_on_hwerr value, hardware error and kexec
--------------------------------------------------
kernel.kexec_on_hwerr| hardware error | kexec
--------------------------------------------------
non-zero             | occurred       | enabled
                     -----------------------------
                     | not occurred   | enabled
--------------------------------------------------
0                    | occurred       | disabled
                     |----------------------------
                     | not occurred   | enabled
--------------------------------------------------

Any comments and suggestions are welcome.

Signed-off-by: Seiji Aguchi <sei...@hd...> --- Documentation/sysctl/kernel.txt | 11 +++++++++++ arch/x86/kernel/cpu/mcheck/mce.c | 2 ++ include/linux/kernel.h | 2 ++ include/linux/sysctl.h | 1 + kernel/panic.c | 15 ++++++++++++++- kernel/sysctl.c | 8 ++++++++ kernel/sysctl_binary.c | 1 + mm/memory-failure.c | 2 ++ 8 files changed, 41 insertions(+), 1 deletions(-) diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index 11d5ced..3159111 100644 --- a/Documentation/sysctl/kernel.txt +++ b/Documentation/sysctl/kernel.txt @@ -34,6 +34,7 @@ show up in /proc/sys/kernel: - hotplug - java-appletviewer [ binfmt_java, obsolete ] - java-interpreter [ binfmt_java, obsolete ] +- kexec_on_hwerr [ x86 only ] - kptr_restrict - kstack_depth_to_print [ X86 only ] - l2cr [ PPC only ] @@ -261,6 +262,16 @@ This flag controls the L2 cache of G3 processor boards. If 0, the cache is disabled. Enabled if nonzero. ============================================================== +kexec_on_hwerr: (X86 only) + +Controls the behaviour of kexec when panic occurred due to hardware +error. +Default value is 1. + +0: Kexec is disabled. +non-zero: Kexec is enabled. 
+ +============================================================== kptr_restrict: diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c index d916183..e76b47b 100644 --- a/arch/x86/kernel/cpu/mcheck/mce.c +++ b/arch/x86/kernel/cpu/mcheck/mce.c @@ -944,6 +944,8 @@ void do_machine_check(struct pt_regs *regs, long error_code) percpu_inc(mce_exception_count); + hwerr_flag = 1; + if (notify_die(DIE_NMI, "machine check", regs, error_code, 18, SIGKILL) == NOTIFY_STOP) goto out; diff --git a/include/linux/kernel.h b/include/linux/kernel.h index 2fe6e84..c2fba7c 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -242,6 +242,8 @@ extern void add_taint(unsigned flag); extern int test_taint(unsigned flag); extern unsigned long get_taint(void); extern int root_mountflags; +extern int kexec_on_hwerr; +extern int hwerr_flag; extern bool early_boot_irqs_disabled; diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index 7bb5cb6..8ae5bfe 100644 --- a/include/linux/sysctl.h +++ b/include/linux/sysctl.h @@ -153,6 +153,7 @@ enum KERN_MAX_LOCK_DEPTH=74, /* int: rtmutex's maximum lock depth */ KERN_NMI_WATCHDOG=75, /* int: enable/disable nmi watchdog */ KERN_PANIC_ON_NMI=76, /* int: whether we will panic on an unrecovered */ + KERN_KEXEC_ON_HWERR=77, /* int: bevaviour of kexec for hardware error +*/ }; diff --git a/kernel/panic.c b/kernel/panic.c index 991bb87..84c1d2e 100644 --- a/kernel/panic.c +++ b/kernel/panic.c @@ -28,6 +28,8 @@ #define PANIC_BLINK_SPD 18 int panic_on_oops; +int kexec_on_hwerr = 1; +int hwerr_flag; static unsigned long tainted_mask; static int pause_on_oops; static int pause_on_oops_flag; @@ -45,6 +47,16 @@ static long no_blink(int state) return 0; } +static int kexec_should_skip(void) +{ + if (!kexec_on_hwerr && hwerr_flag) { + printk(KERN_WARNING "Kexec is skipped because hardware error " + "occurred.\n"); + return 1; + } + return 0; +} + /* Returns how long it waited in ms */ long (*panic_blink)(int state); 
EXPORT_SYMBOL(panic_blink); @@ -86,7 +98,8 @@ NORET_TYPE void panic(const char * fmt, ...) * everything else. * Do we want to call this before we try to display a message? */ - crash_kexec(NULL); + if (!kexec_should_skip()) + crash_kexec(NULL); kmsg_dump(KMSG_DUMP_PANIC); diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 0f1bd83..f78edd8 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -811,6 +811,14 @@ static struct ctl_table kern_table[] = { .mode = 0644, .proc_handler = proc_dointvec, }, + { + .procname = "kexec_on_hwerr", + .data = &kexec_on_hwerr, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec, + }, + #endif #if defined(CONFIG_MMU) { diff --git a/kernel/sysctl_binary.c b/kernel/sysctl_binary.c index b875bed..8d572ca 100644 --- a/kernel/sysctl_binary.c +++ b/kernel/sysctl_binary.c @@ -137,6 +137,7 @@ static const struct bin_table bin_kern_table[] = { { CTL_INT, KERN_COMPAT_LOG, "compat-log" }, { CTL_INT, KERN_MAX_LOCK_DEPTH, "max_lock_depth" }, { CTL_INT, KERN_PANIC_ON_NMI, "panic_on_unrecovered_nmi" }, + { CTL_INT, KERN_KEXEC_ON_HWERR, "kexec_on_hwerr" }, {} }; diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 0207c2f..0178f47 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -994,6 +994,8 @@ int __memory_failure(unsigned long pfn, int trapno, int flags) int res; unsigned int nr_pages; + hwerr_flag = 1; + if (!sysctl_memory_failure_recovery) panic("Memory failure from trap %d on page %lx", trapno, pfn); -- 1.7.1 |
|
From: Seiji A. <sei...@hd...> - 2011-02-03 21:11:52
|
Hi Tony,

>I wonder whether you could use my pstore file system interface
>for this ... you'd need to write a backend that used EFI variable
>space to save the pieces of a console log, in much the same way
>that I used ERST to stash the pieces.
>
>This might be a bit messy - but I think that it would be
>worth doing in order to provide a single user interface
>to the kmsg_dump on different architectures, regardless
>of the underlying storage method used.

I will check whether I can use your pstore file system interface.
Could you please send me your latest patch?

Seiji

>-----Original Message-----
>From: Luck, Tony [mailto:ton...@in...]
>Sent: Tuesday, February 01, 2011 2:47 PM
>To: Américo Wang; Seiji Aguchi
>Cc: rd...@xe...; Yu, Fenghua; tg...@li...; mi...@re...; hp...@zy...; x8...@ke...;
>tj...@ke...; ak...@li...; a.p...@ch...; ar...@ar...; lin...@vg...;
>lin...@vg...; lin...@vg...; dle...@li...; sh...@re...;
>pj...@re...; Satoru Moriya
>Subject: RE: [RFC][PATCH] kmsg_dumper for NVRAM
>
>> This looks like what Tony wanted, pstore.
>
>Yes - this looks like another means to the same end (making console log
>Available after a crash).
>
>I wonder whether you could use my pstore file system interface
>for this ... you'd need to write a backend that used EFI variable
>space to save the pieces of a console log, in much the same way
>that I used ERST to stash the pieces.
>
>This might be a bit messy - but I think that it would be
>worth doing in order to provide a single user interface
>to the kmsg_dump on different architectures, regardless
>of the underlying storage method used. I.e. the OS
>vendors would just have to write startup scripts to glean
>information from /dev/pstore and clear it by removing the
>files there. Rather than having one set of scripts that
>looks at EFI variables for machines that use that, a different
>set for machines that have the sparc64 method of saving in
>some special area of ram, and yet another set for a machine
>that has some other motherboard magical non volatile storage
>that hasn't been designed yet.
>
>-Tony
|
|
From: Luck, T. <ton...@in...> - 2011-02-01 19:47:17
|
> This looks like what Tony wanted, pstore.

Yes - this looks like another means to the same end (making console log
available after a crash).

I wonder whether you could use my pstore file system interface
for this ... you'd need to write a backend that used EFI variable
space to save the pieces of a console log, in much the same way
that I used ERST to stash the pieces.

This might be a bit messy - but I think that it would be
worth doing in order to provide a single user interface
to the kmsg_dump on different architectures, regardless
of the underlying storage method used. I.e. the OS
vendors would just have to write startup scripts to glean
information from /dev/pstore and clear it by removing the
files there. Rather than having one set of scripts that
looks at EFI variables for machines that use that, a different
set for machines that have the sparc64 method of saving in
some special area of ram, and yet another set for a machine
that has some other motherboard magical non volatile storage
that hasn't been designed yet.

-Tony
|
|
From: Américo W. <xiy...@gm...> - 2011-02-01 09:30:03
|
On Mon, Jan 31, 2011 at 11:21:17AM -0500, Seiji Aguchi wrote:
>Hi,
>
>This prototype patch introduces kmsg_dumper for NVRAM(Non-Volatile RAM).
>
Hi,
This looks like what Tony wanted, pstore.
...
>[Patch Description]
>This patch adds following boot paremeters.
>
> - nvram_kmsg_dump_enable
> Enable kmsg_dumper for NVRAM with UEFI.
>
> - nvram_kmsg_dump_len
> Size of kernel messages dumped to NVRAM.
> default size is 1KB.(because I would like to use efivars for reading them from userspace.)
> Maximum size is 32KB.
>
>On the next boot, sysfs files are created as follows and through these files we can see
>the kernel messages stored in NVRAM.
>
> /sys/firmware/efi/vars/LinuxKmsgDump001-8be4df61-93ca-11d2-aa0d-00e098032b8c/data
> /sys/firmware/efi/vars/LinuxKmsgDump002-8be4df61-93ca-11d2-aa0d-00e098032b8c/data
> .
> .
> /sys/firmware/efi/vars/LinuxKmsgDump032-8be4df61-93ca-11d2-aa0d-00e098032b8c/data
>
> - Size of each entry is 1KB. 32 entries are created at a maximum.
> - "8be4df61-93ca-11d2-aa0d-00e098032b8c" is EFI_GLOBAL_VARIABLE which is defined in
> UEFI specification.
>
So, will 'cat /sys/firmware/efi/vars/LinuxKmsgDump*/data' show
all of the kernel messages? And in the right order?
Also, will this data be flushed after the next next boot? If not, how
can it be flushed/deleted?
>
>---
> Documentation/kernel-parameters.txt | 9 +++
> arch/ia64/kernel/efi.c | 4 +
> arch/x86/platform/efi/efi.c | 135 +++++++++++++++++++++++++++++++++++
> include/linux/efi.h | 4 +
> init/main.c | 5 +-
> 5 files changed, 156 insertions(+), 1 deletions(-)
Some comments below.
>+static int __init setup_nvram_kmsg_dump_enable(char *arg)
>+{
>+ nvram_kmsg_dump_enabled = 1;
>+ return 0;
>+}
>+__setup("nvram_kmsg_dump_enable", setup_nvram_kmsg_dump_enable);
>+
>+static int __init setup_nvram_kmsg_dump_len(char *str)
>+{
>+ unsigned size = memparse(str, &str);
You ignored errors here.
>+
>+ if (!efi_enabled) {
>+ printk(KERN_INFO "setup_nvram_kmsg_dump_len: EFI is disabled\n");
>+ return 1;
>+ }
>+
>+ if (size)
>+ size = roundup_pow_of_two(size);
>+ if (size > nvram_kmsg_dump_len) {
>+ char *new_nvram_kmsg_dump;
>+
>+ new_nvram_kmsg_dump = alloc_bootmem(size);
>+ if (!new_nvram_kmsg_dump) {
>+ printk(KERN_WARNING "nvram_kmsg_dump_len: \
>+allocation failed\n");
We don't split strings like this.
>+ return 1;
>+ }
>+ nvram_kmsg_dump_len = size;
>+ nvram_kmsg_dump_buf = new_nvram_kmsg_dump;
>+ }
>+ printk(KERN_NOTICE "nvram_kmsg_dump_len: %d\n", nvram_kmsg_dump_len);
>+
>+ return 0;
>+
>+}
>+__setup("nvram_kmsg_dump_len=", setup_nvram_kmsg_dump_len);
>+
>+static void nvram_do_kmsg_dump(struct kmsg_dumper *dumper,
>+ enum kmsg_dump_reason reason, const char *s1, unsigned long l1,
>+ const char *s2, unsigned long l2)
>+{
>+ unsigned long s1_start, s2_start, l1_cpy, l2_cpy;
>+ unsigned long attribute = 0xf, total, tmp, cpy_size;
>+ int i;
>+ efi_status_t efi_status;
>+ void *tmp_buf;
>+
>+ l2_cpy = min(l2, (unsigned long)nvram_kmsg_dump_len);
>+ l1_cpy = min(l1, (unsigned long)nvram_kmsg_dump_len - l2_cpy);
>+
>+ s2_start = l2 - l2_cpy;
>+ s1_start = l1 - l1_cpy;
>+
>+ memcpy(nvram_kmsg_dump_buf, s1 + s1_start, l1_cpy);
>+ memcpy(nvram_kmsg_dump_buf + l1_cpy, s2 + s2_start, l2_cpy);
>+
>+ /* initialize */
>+ for (i = 0; i < MAX_ENTRY; i++)
>+ efi.set_variable(kmsg_dump_value_utf16[i],
>+ &EFI_GLOBAL_VARIABLE_GUID,
>+ attribute, 0, NULL);
>+
>+ /* write data */
>+ total = l1_cpy + l2_cpy;
>+ tmp_buf = (void *)nvram_kmsg_dump_buf + total;
>+ cpy_size = 0;
>+ for (i = 0; i < MAX_ENTRY; i++) {
>+ tmp = min(total - cpy_size, (unsigned long)SET_VARIABLE_LEN);
>+ tmp_buf -= tmp;
>+ efi_status = efi.set_variable(kmsg_dump_value_utf16[i],
>+ &EFI_GLOBAL_VARIABLE_GUID,
>+ attribute, tmp, tmp_buf);
>+ if (efi_status) {
>+ printk(KERN_WARNING "nvram_do_kmsg_dump: \
>+set_variable %d failed 0x%lx\n", i, efi_status);
Ditto.
>+ efi.set_variable(kmsg_dump_value_utf16[i],
>+ &EFI_GLOBAL_VARIABLE_GUID,
>+ attribute, 0, NULL);
>+ tmp_buf += tmp;
>+ cpy_size -= tmp;
>+ }
>+ cpy_size += tmp;
>+ if (cpy_size >= total)
>+ break;
>+ }
>+}
>+
>+void nvram_kmsg_dump_init(void)
>+{
>+
>+ int i, outlen, err;
>+
>+ for (i = 0; i < MAX_ENTRY; i++) {
>+ snprintf(kmsg_dump_value, sizeof(kmsg_dump_value),
>+ "%s%04d", LINUX_KMSG_DUMP_PREFIX, i + 1);
>+ outlen = utf8s_to_utf16s((u8 *)kmsg_dump_value,
>+ sizeof(kmsg_dump_value),
>+ (wchar_t *)kmsg_dump_value_utf16[i]);
>+ if (outlen != LINUX_KMSG_DUMP_LEN - 1) {
>+ printk(KERN_ERR
>+ "nvram_kmsg_dump_init: utf8s_to_utf16s %d\n",
>+ outlen);
>+ return;
>+ }
>+ }
>+
>+ memset(&nvram_kmsg_dumper, 0, sizeof(nvram_kmsg_dumper));
You don't need to memset a static global var to 0.
>+ nvram_kmsg_dumper.dump = nvram_do_kmsg_dump;
>+ err = kmsg_dump_register(&nvram_kmsg_dumper);
>+ if (err) {
>+ printk(KERN_ERR "nvram_kmsg_dump_init: kmsg_dump_register %d\n",
>+ err);
>+ return;
>+ }
>+ printk(KERN_NOTICE "nvram_kmsg_dump initialized\n");
>+
>+ return;
>+}
>diff --git a/include/linux/efi.h b/include/linux/efi.h
>index fb737bc..d0d1a3c 100644
>--- a/include/linux/efi.h
>+++ b/include/linux/efi.h
>@@ -300,6 +300,7 @@ extern void efi_initialize_iomem_resources(struct resource *code_resource,
> extern unsigned long efi_get_time(void);
> extern int efi_set_rtc_mmss(unsigned long nowtime);
> extern struct efi_memory_map memmap;
>+extern void nvram_kmsg_dump_init(void);
>
> /**
> * efi_range_is_wc - check the WC bit on an address range
>@@ -333,11 +334,14 @@ extern int __init efi_setup_pcdp_console(char *);
> #ifdef CONFIG_EFI
> # ifdef CONFIG_X86
> extern int efi_enabled;
>+ extern int nvram_kmsg_dump_enabled;
> # else
> # define efi_enabled 1
>+# define nvram_kmsg_dump_enabled 1
There is a global var with the same name, right?
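On the memparse() remark above: a userspace sketch of what validating the length argument could look like. It uses strtoull() as a stand-in for the kernel's memparse(), and `parse_dump_len()` and `DUMP_LEN_MAX` are hypothetical names; the 32KB cap is taken from the patch description. It does not round up to a power of two as the patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define DUMP_LEN_MAX (32UL * 1024)	/* 32KB upper bound from the patch description */

/*
 * Parse "n", "nk"/"nK" or "nM" into a byte count.
 * Returns 0 and stores the value in *out on success. Returns -1 if the
 * string has no digits, has trailing garbage, is zero, or exceeds
 * DUMP_LEN_MAX.
 */
int parse_dump_len(const char *str, unsigned long *out)
{
	char *end;
	unsigned long long val;

	assert(str != NULL && out != NULL);

	errno = 0;
	val = strtoull(str, &end, 0);
	if (end == str || errno == ERANGE)
		return -1;		/* no digits, or out of range */
	if (val > DUMP_LEN_MAX)
		return -1;		/* reject before scaling, so shifts cannot overflow */

	switch (*end) {
	case 'M':
		val <<= 10;
		/* fall through */
	case 'K':
	case 'k':
		val <<= 10;
		end++;
		break;
	default:
		break;
	}

	if (*end != '\0')
		return -1;		/* trailing garbage such as "1kx" */
	if (val == 0 || val > DUMP_LEN_MAX)
		return -1;

	*out = (unsigned long)val;
	return 0;
}
```

Returning an error instead of silently continuing lets the caller keep the default buffer and report the bad parameter.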
|
|
From: Seiji A. <sei...@hd...> - 2011-01-31 16:27:05
|
Hi,
This prototype patch introduces kmsg_dumper for NVRAM(Non-Volatile RAM).
[Purpose]
My goal is to develop a reliable logging feature for enterprise use.
I plan to implement it using the NVRAM that some enterprise servers are equipped with.
[Solution]
Enterprise servers equipped with NVRAM fall into two groups:
1. Supporting UEFI
2. Not supporting UEFI
I suggest starting with a kmsg_dumper for UEFI because it gives customers the following advantages.
- UEFI is tolerant of memory corruption because the kernel can't access the UEFI area.
In other words, kernel messages are saved reliably.
- Kernel messages can be obtained at a very early stage of the boot process.
So we can investigate the root cause of a problem even if kdump isn't enabled.
[Patch Description]
This patch adds the following boot parameters.
- nvram_kmsg_dump_enable
Enable kmsg_dumper for NVRAM with UEFI.
- nvram_kmsg_dump_len
Size of kernel messages dumped to NVRAM.
Default size is 1KB (because I would like to use efivars to read them from userspace).
Maximum size is 32KB.
On the next boot, sysfs files are created as follows and through these files we can see
the kernel messages stored in NVRAM.
/sys/firmware/efi/vars/LinuxKmsgDump001-8be4df61-93ca-11d2-aa0d-00e098032b8c/data
/sys/firmware/efi/vars/LinuxKmsgDump002-8be4df61-93ca-11d2-aa0d-00e098032b8c/data
.
.
/sys/firmware/efi/vars/LinuxKmsgDump032-8be4df61-93ca-11d2-aa0d-00e098032b8c/data
- Size of each entry is 1KB. 32 entries are created at a maximum.
- "8be4df61-93ca-11d2-aa0d-00e098032b8c" is EFI_GLOBAL_VARIABLE which is defined in
UEFI specification.
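The variable names are built from a fixed prefix plus a counter. A small userspace sketch of that step, using the same format string as nvram_kmsg_dump_init() in the patch; `format_entry_name()`, `PREFIX` and `NAME_LEN` are hypothetical names for this illustration. Note that `%04d` produces four digits (LinuxKmsgDump0001), one more than in the paths listed above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PREFIX   "LinuxKmsgDump"	/* LINUX_KMSG_DUMP_PREFIX in the patch */
#define NAME_LEN 18			/* LINUX_KMSG_DUMP_LEN in the patch */

/*
 * Format the name of entry `index` (0-based) into `name`.
 * Returns the number of characters written, excluding the NUL.
 */
int format_entry_name(char name[NAME_LEN], int index)
{
	assert(name != NULL && index >= 0);
	return snprintf(name, NAME_LEN, "%s%04d", PREFIX, index + 1);
}
```

The 13-character prefix plus four digits plus the terminating NUL fills the 18-byte buffer exactly, which matches the `outlen != LINUX_KMSG_DUMP_LEN - 1` check in the patch.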
[Test]
I tested this feature with x86_64.
This is a prototype patch.
So, any comments are welcome.
Signed-off-by: Seiji Aguchi <sei...@hd...>
---
Documentation/kernel-parameters.txt | 9 +++
arch/ia64/kernel/efi.c | 4 +
arch/x86/platform/efi/efi.c | 135 +++++++++++++++++++++++++++++++++++
include/linux/efi.h | 4 +
init/main.c | 5 +-
5 files changed, 156 insertions(+), 1 deletions(-)
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index b72e071..10212a9 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1778,6 +1778,15 @@ and is between 256 and 4096 characters. It is defined in the file
This can be set from sysctl after boot.
See Documentation/sysctl/vm.txt for details.
+ nvram_kmsg_dump_enable [X86]
+ Enable kmsg_dump for NVRAM
+
+ nvram_kmsg_dump_len=n [X86]
+ Sets the buffer size of kmsg_dump for NVRAM, in bytes.
+ Format: { n | nk | nM }
+ n must be a power of two. The default size
+ is 1024.
+
ohci1394_dma=early [HW] enable debugging via the ohci1394 driver.
See Documentation/debugging-via-ohci1394.txt for more
info.
diff --git a/arch/ia64/kernel/efi.c b/arch/ia64/kernel/efi.c
index a0f0019..6ba22aa 100644
--- a/arch/ia64/kernel/efi.c
+++ b/arch/ia64/kernel/efi.c
@@ -1366,3 +1366,7 @@ vmcore_find_descriptor_size (unsigned long address)
return ret;
}
#endif
+
+void nvram_kmsg_dump_init(void)
+{
+}
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 0fe27d7..1499fb6 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -37,6 +37,10 @@
#include <linux/io.h>
#include <linux/reboot.h>
#include <linux/bcd.h>
+#include <linux/kmsg_dump.h>
+#include <linux/nls.h>
+#include <linux/slab_def.h>
+#include <linux/gfp.h>
#include <asm/setup.h>
#include <asm/efi.h>
@@ -59,6 +63,18 @@ struct efi_memory_map memmap;
static struct efi efi_phys __initdata;
static efi_system_table_t efi_systab __initdata;
+int nvram_kmsg_dump_enabled;
+#define LINUX_KMSG_DUMP_PREFIX "LinuxKmsgDump"
+#define LINUX_KMSG_DUMP_LEN 18
+#define MAX_ENTRY 32
+#define SET_VARIABLE_LEN 1024
+static char kmsg_dump_value[LINUX_KMSG_DUMP_LEN];
+static efi_char16_t kmsg_dump_value_utf16[MAX_ENTRY][LINUX_KMSG_DUMP_LEN];
+static char __nvram_kmsg_dump_buf[SET_VARIABLE_LEN];
+static char *nvram_kmsg_dump_buf = __nvram_kmsg_dump_buf;
+static int nvram_kmsg_dump_len = SET_VARIABLE_LEN;
+static struct kmsg_dumper nvram_kmsg_dumper;
+
static int __init setup_noefi(char *arg)
{
efi_enabled = 0;
@@ -611,3 +627,122 @@ u64 efi_mem_attributes(unsigned long phys_addr)
}
return 0;
}
+
+static int __init setup_nvram_kmsg_dump_enable(char *arg)
+{
+ nvram_kmsg_dump_enabled = 1;
+ return 0;
+}
+__setup("nvram_kmsg_dump_enable", setup_nvram_kmsg_dump_enable);
+
+static int __init setup_nvram_kmsg_dump_len(char *str)
+{
+ unsigned size = memparse(str, &str);
+
+ if (!efi_enabled) {
+ printk(KERN_INFO "setup_nvram_kmsg_dump_len: EFI is disabled\n");
+ return 1;
+ }
+
+ if (size)
+ size = roundup_pow_of_two(size);
+ if (size > nvram_kmsg_dump_len) {
+ char *new_nvram_kmsg_dump;
+
+ new_nvram_kmsg_dump = alloc_bootmem(size);
+ if (!new_nvram_kmsg_dump) {
+ printk(KERN_WARNING "nvram_kmsg_dump_len: \
+allocation failed\n");
+ return 1;
+ }
+ nvram_kmsg_dump_len = size;
+ nvram_kmsg_dump_buf = new_nvram_kmsg_dump;
+ }
+ printk(KERN_NOTICE "nvram_kmsg_dump_len: %d\n", nvram_kmsg_dump_len);
+
+ return 0;
+
+}
+__setup("nvram_kmsg_dump_len=", setup_nvram_kmsg_dump_len);
+
+static void nvram_do_kmsg_dump(struct kmsg_dumper *dumper,
+ enum kmsg_dump_reason reason, const char *s1, unsigned long l1,
+ const char *s2, unsigned long l2)
+{
+ unsigned long s1_start, s2_start, l1_cpy, l2_cpy;
+ unsigned long attribute = 0xf, total, tmp, cpy_size;
+ int i;
+ efi_status_t efi_status;
+ void *tmp_buf;
+
+ l2_cpy = min(l2, (unsigned long)nvram_kmsg_dump_len);
+ l1_cpy = min(l1, (unsigned long)nvram_kmsg_dump_len - l2_cpy);
+
+ s2_start = l2 - l2_cpy;
+ s1_start = l1 - l1_cpy;
+
+ memcpy(nvram_kmsg_dump_buf, s1 + s1_start, l1_cpy);
+ memcpy(nvram_kmsg_dump_buf + l1_cpy, s2 + s2_start, l2_cpy);
+
+ /* initialize */
+ for (i = 0; i < MAX_ENTRY; i++)
+ efi.set_variable(kmsg_dump_value_utf16[i],
+ &EFI_GLOBAL_VARIABLE_GUID,
+ attribute, 0, NULL);
+
+ /* write data */
+ total = l1_cpy + l2_cpy;
+ tmp_buf = (void *)nvram_kmsg_dump_buf + total;
+ cpy_size = 0;
+ for (i = 0; i < MAX_ENTRY; i++) {
+ tmp = min(total - cpy_size, (unsigned long)SET_VARIABLE_LEN);
+ tmp_buf -= tmp;
+ efi_status = efi.set_variable(kmsg_dump_value_utf16[i],
+ &EFI_GLOBAL_VARIABLE_GUID,
+ attribute, tmp, tmp_buf);
+ if (efi_status) {
+ printk(KERN_WARNING "nvram_do_kmsg_dump: \
+set_variable %d failed 0x%lx\n", i, efi_status);
+ efi.set_variable(kmsg_dump_value_utf16[i],
+ &EFI_GLOBAL_VARIABLE_GUID,
+ attribute, 0, NULL);
+ tmp_buf += tmp;
+ cpy_size -= tmp;
+ }
+ cpy_size += tmp;
+ if (cpy_size >= total)
+ break;
+ }
+}
+
+void nvram_kmsg_dump_init(void)
+{
+
+ int i, outlen, err;
+
+ for (i = 0; i < MAX_ENTRY; i++) {
+ snprintf(kmsg_dump_value, sizeof(kmsg_dump_value),
+ "%s%04d", LINUX_KMSG_DUMP_PREFIX, i + 1);
+ outlen = utf8s_to_utf16s((u8 *)kmsg_dump_value,
+ sizeof(kmsg_dump_value),
+ (wchar_t *)kmsg_dump_value_utf16[i]);
+ if (outlen != LINUX_KMSG_DUMP_LEN - 1) {
+ printk(KERN_ERR
+ "nvram_kmsg_dump_init: utf8s_to_utf16s %d\n",
+ outlen);
+ return;
+ }
+ }
+
+ memset(&nvram_kmsg_dumper, 0, sizeof(nvram_kmsg_dumper));
+ nvram_kmsg_dumper.dump = nvram_do_kmsg_dump;
+ err = kmsg_dump_register(&nvram_kmsg_dumper);
+ if (err) {
+ printk(KERN_ERR "nvram_kmsg_dump_init: kmsg_dump_register %d\n",
+ err);
+ return;
+ }
+ printk(KERN_NOTICE "nvram_kmsg_dump initialized\n");
+
+ return;
+}
diff --git a/include/linux/efi.h b/include/linux/efi.h
index fb737bc..d0d1a3c 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -300,6 +300,7 @@ extern void efi_initialize_iomem_resources(struct resource *code_resource,
extern unsigned long efi_get_time(void);
extern int efi_set_rtc_mmss(unsigned long nowtime);
extern struct efi_memory_map memmap;
+extern void nvram_kmsg_dump_init(void);
/**
* efi_range_is_wc - check the WC bit on an address range
@@ -333,11 +334,14 @@ extern int __init efi_setup_pcdp_console(char *);
#ifdef CONFIG_EFI
# ifdef CONFIG_X86
extern int efi_enabled;
+ extern int nvram_kmsg_dump_enabled;
# else
# define efi_enabled 1
+# define nvram_kmsg_dump_enabled 1
# endif
#else
# define efi_enabled 0
+# define nvram_kmsg_dump_enabled 0
#endif
/*
diff --git a/init/main.c b/init/main.c
index 33c37c3..a8ee2bd 100644
--- a/init/main.c
+++ b/init/main.c
@@ -679,8 +679,11 @@ asmlinkage void __init start_kernel(void)
pidmap_init();
anon_vma_init();
#ifdef CONFIG_X86
- if (efi_enabled)
+ if (efi_enabled) {
efi_enter_virtual_mode();
+ if (nvram_kmsg_dump_enabled)
+ nvram_kmsg_dump_init();
+ }
#endif
thread_info_cache_init();
cred_init();
--
1.7.1
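
The write loop in nvram_do_kmsg_dump() walks the buffer backwards in SET_VARIABLE_LEN pieces, so the first variable receives the newest messages. The splitting on its own can be sketched in userspace as follows. `struct chunk` and `split_tail_first()` are hypothetical names, and the efi.set_variable() call and its retry-on-failure path are left out.

```c
#include <assert.h>
#include <stddef.h>

#define ENTRY_LEN   1024	/* bytes per variable, SET_VARIABLE_LEN in the patch */
#define MAX_ENTRIES 32		/* MAX_ENTRY in the patch */

struct chunk {
	size_t offset;	/* start of the chunk within the buffer */
	size_t len;	/* number of bytes in the chunk */
};

/*
 * Split a buffer of `total` bytes into at most MAX_ENTRIES chunks of at
 * most ENTRY_LEN bytes, walking backwards from the end so that chunk 0
 * holds the most recent bytes. Returns the number of chunks filled in.
 * If total exceeds MAX_ENTRIES * ENTRY_LEN, the oldest bytes are dropped.
 */
size_t split_tail_first(size_t total, struct chunk out[MAX_ENTRIES])
{
	size_t remaining = total;
	size_t n = 0;

	assert(out != NULL);

	while (remaining > 0 && n < MAX_ENTRIES) {
		size_t len = remaining < ENTRY_LEN ? remaining : ENTRY_LEN;

		remaining -= len;
		out[n].offset = remaining;
		out[n].len = len;
		n++;
	}
	return n;
}
```

For a 2500-byte buffer this yields three chunks at offsets 1476, 452 and 0. Reading the variables in name order therefore returns the pieces newest-first, and a reader has to reverse them to get the log in chronological order.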
|