From: Blaisorblade <bla...@ya...> - 2005-09-21 17:46:46
|
This is a collection of more intrusive UML fixes for 2.6.14. However, I've been careful in them (at least so I hope). You may want to put some in -mm, but please let's make sure that they get into -mm. -- Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!". Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894) http://www.user-mode-linux.org/~blaisorblade ___________________________________ Yahoo! Mail: gratis 1GB per i messaggi e allegati da 10MB http://mail.yahoo.it |
From: Paolo 'B. G. <bla...@ya...> - 2005-09-21 17:49:00
|
From: Paolo 'Blaisorblade' Giarrusso <bla...@ya...> pte_modify marks a page as needing flush, which is redundant because the resulting PTE is still set with set_pte, which already handles that. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <bla...@ya...> --- include/asm-um/pgtable.h | 1 - 1 files changed, 0 insertions(+), 1 deletions(-) diff --git a/include/asm-um/pgtable.h b/include/asm-um/pgtable.h --- a/include/asm-um/pgtable.h +++ b/include/asm-um/pgtable.h @@ -346,7 +346,6 @@ static inline void set_pte(pte_t *pteptr static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) { pte_set_val(pte, (pte_val(pte) & _PAGE_CHG_MASK), newprot); - if(pte_present(pte)) pte = pte_mknewpage(pte_mknewprot(pte)); return pte; } |
From: Paolo 'B. G. <bla...@ya...> - 2005-09-21 17:49:00
|
From: Paolo 'Blaisorblade' Giarrusso <bla...@ya...> Avoid setting w = 0 twice. Spotted this (trivial) thing which is needed for another patch. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <bla...@ya...> --- arch/um/kernel/tlb.c | 12 ++++++------ 1 files changed, 6 insertions(+), 6 deletions(-) diff --git a/arch/um/kernel/tlb.c b/arch/um/kernel/tlb.c --- a/arch/um/kernel/tlb.c +++ b/arch/um/kernel/tlb.c @@ -193,12 +193,12 @@ void fix_range_common(struct mm_struct * r = pte_read(*npte); w = pte_write(*npte); x = pte_exec(*npte); - if(!pte_dirty(*npte)) - w = 0; - if(!pte_young(*npte)){ - r = 0; - w = 0; - } + if (!pte_young(*npte)) { + r = 0; + w = 0; + } else if (!pte_dirty(*npte)) { + w = 0; + } if(force || pte_newpage(*npte)){ if(pte_present(*npte)) ret = add_mmap(addr, |
From: Paolo 'B. G. <bla...@ya...> - 2005-09-21 17:49:00
|
From: Paolo 'Blaisorblade' Giarrusso <bla...@ya...> Simplify the code by using strlcat() instead of strncat() and manual appending. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <bla...@ya...> --- arch/um/include/user.h | 4 +++- arch/um/kernel/umid.c | 11 ++++------- 2 files changed, 7 insertions(+), 8 deletions(-) diff --git a/arch/um/include/user.h b/arch/um/include/user.h --- a/arch/um/include/user.h +++ b/arch/um/include/user.h @@ -14,7 +14,9 @@ extern void *um_kmalloc_atomic(int size) extern void kfree(void *ptr); extern int in_aton(char *str); extern int open_gdb_chan(void); -extern int strlcpy(char *, const char *, int); +/* These use size_t, however unsigned long is correct on both i386 and x86_64. */ +extern unsigned long strlcpy(char *, const char *, unsigned long); +extern unsigned long strlcat(char *, const char *, unsigned long); extern void *um_vmalloc(int size); extern void vfree(void *ptr); diff --git a/arch/um/kernel/umid.c b/arch/um/kernel/umid.c --- a/arch/um/kernel/umid.c +++ b/arch/um/kernel/umid.c @@ -237,16 +237,13 @@ static int __init make_uml_dir(void) strlcpy(dir, home, sizeof(dir)); uml_dir++; } + strlcat(dir, uml_dir, sizeof(dir)); len = strlen(dir); - strncat(dir, uml_dir, sizeof(dir) - len); - len = strlen(dir); - if((len > 0) && (len < sizeof(dir) - 1) && (dir[len - 1] != '/')){ - dir[len] = '/'; - dir[len + 1] = '\0'; - } + if (len > 0 && dir[len - 1] != '/') + strlcat(dir, "/", sizeof(dir)); uml_dir = malloc(strlen(dir) + 1); - if(uml_dir == NULL){ + if (uml_dir == NULL) { printf("make_uml_dir : malloc failed, errno = %d\n", errno); exit(1); } |
From: Paolo 'B. G. <bla...@ya...> - 2005-09-21 17:49:01
|
From: Paolo 'Blaisorblade' Giarrusso <bla...@ya...> User get *a lot* confused when consoles don't work but we don't report anything. And, as reported in the comment, using printk to report "your console doesn't work" isn't likely to go that far. Fix the problem on the base of this: stack consumption by host printf(). Use kernel sprintf() and os_write_file, using a wild guess that one page will be enough for the message, to preallocate the buffer with kmalloc(). Signed-off-by: Paolo 'Blaisorblade' Giarrusso <bla...@ya...> --- arch/um/drivers/chan_kern.c | 58 +++++++++++++++++++++++++++++++------------ 1 files changed, 42 insertions(+), 16 deletions(-) diff --git a/arch/um/drivers/chan_kern.c b/arch/um/drivers/chan_kern.c --- a/arch/um/drivers/chan_kern.c +++ b/arch/um/drivers/chan_kern.c @@ -19,18 +19,44 @@ #include "line.h" #include "os.h" -#ifdef CONFIG_NOCONFIG_CHAN +/* XXX: could well be moved to somewhere else, if needed. */ +static int my_printf(const char * fmt, ...) + __attribute__ ((format (printf, 1, 2))); + +static int my_printf(const char * fmt, ...) +{ + /* Yes, can be called on atomic context.*/ + char *buf = kmalloc(4096, GFP_ATOMIC); + va_list args; + int r; + + if (!buf) { + /* We print directly fmt. + * Yes, yes, yes, feel free to complain. */ + r = strlen(fmt); + } else { + va_start(args, fmt); + r = vsprintf(buf, fmt, args); + va_end(args); + fmt = buf; + } + + if (r) + r = os_write_file(1, fmt, r); + return r; -/* The printk's here are wrong because we are complaining that there is no - * output device, but printk is printing to that output device. The user will - * never see the error. printf would be better, except it can't run on a - * kernel stack because it will overflow it. - * Use printk for now since that will avoid crashing. - */ +} + +#ifdef CONFIG_NOCONFIG_CHAN +/* Despite its name, there's no added trailing newline. */ +static int my_puts(const char * buf) +{ + return os_write_file(1, buf, strlen(buf)); +} static void *not_configged_init(char *str, int device, struct chan_opts *opts) { - printk(KERN_ERR "Using a channel type which is configured out of " + my_puts("Using a channel type which is configured out of " "UML\n"); return(NULL); } @@ -38,27 +64,27 @@ static void *not_configged_init(char *st static int not_configged_open(int input, int output, int primary, void *data, char **dev_out) { - printk(KERN_ERR "Using a channel type which is configured out of " + my_puts("Using a channel type which is configured out of " "UML\n"); return(-ENODEV); } static void not_configged_close(int fd, void *data) { - printk(KERN_ERR "Using a channel type which is configured out of " + my_puts("Using a channel type which is configured out of " "UML\n"); } static int not_configged_read(int fd, char *c_out, void *data) { - printk(KERN_ERR "Using a channel type which is configured out of " + my_puts("Using a channel type which is configured out of " "UML\n"); return(-EIO); } static int not_configged_write(int fd, const char *buf, int len, void *data) { - printk(KERN_ERR "Using a channel type which is configured out of " + my_puts("Using a channel type which is configured out of " "UML\n"); return(-EIO); } @@ -66,7 +92,7 @@ static int not_configged_write(int fd, c static int not_configged_console_write(int fd, const char *buf, int len, void *data) { - printk(KERN_ERR "Using a channel type which is configured out of " + my_puts("Using a channel type which is configured out of " "UML\n"); return(-EIO); } @@ -74,14 +100,14 @@ static int not_configged_console_write(i static int not_configged_window_size(int fd, void *data, unsigned short *rows, unsigned short *cols) { - printk(KERN_ERR "Using a channel type which is configured out of " + my_puts("Using a channel type which is configured out of " "UML\n"); return(-ENODEV); } static void not_configged_free(void *data) { - printf(KERN_ERR "Using a channel type which is configured out of " + my_puts("Using a channel type which is configured out of " "UML\n"); } @@ -457,7 +483,7 @@ static struct chan *parse_chan(char *str } } if(ops == NULL){ - printk(KERN_ERR "parse_chan couldn't parse \"%s\"\n", + my_printf("parse_chan couldn't parse \"%s\"\n", str); return(NULL); } |
From: Paolo 'B. G. <bla...@ya...> - 2005-09-21 17:49:01
|
From: Paolo 'Blaisorblade' Giarrusso <bla...@ya...> Only remove the UML pidfile and management socket if we created them. Currently in case two UMLs are started with the same umid, the second will remove the first's ones. Probably we should also panic() at that point, not sure however. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <bla...@ya...> --- arch/um/kernel/umid.c | 30 +++++++++++++++++++----------- 1 files changed, 19 insertions(+), 11 deletions(-) diff --git a/arch/um/kernel/umid.c b/arch/um/kernel/umid.c --- a/arch/um/kernel/umid.c +++ b/arch/um/kernel/umid.c @@ -31,6 +31,8 @@ static char *uml_dir = UML_DIR; /* Changed by set_umid */ static int umid_is_random = 1; static int umid_inited = 0; +/* Have we created the files? Should we remove them? */ +static int umid_owned = 0; static int make_umid(int (*printer)(const char *fmt, ...)); @@ -82,20 +84,21 @@ int __init umid_file_name(char *name, ch extern int tracing_pid; -static int __init create_pid_file(void) +static void __init create_pid_file(void) { char file[strlen(uml_dir) + UMID_LEN + sizeof("/pid\0")]; char pid[sizeof("nnnnn\0")]; int fd, n; - if(umid_file_name("pid", file, sizeof(file))) return 0; + if(umid_file_name("pid", file, sizeof(file))) + return; fd = os_open_file(file, of_create(of_excl(of_rdwr(OPENFLAGS()))), 0644); if(fd < 0){ printf("Open of machine pid file \"%s\" failed: %s\n", file, strerror(-fd)); - return 0; + return; } sprintf(pid, "%d\n", os_getpid()); @@ -103,7 +106,6 @@ static int __init create_pid_file(void) if(n != strlen(pid)) printf("Write of pid file failed - err = %d\n", -n); os_close_file(fd); - return 0; } static int actually_do_remove(char *dir) @@ -147,7 +149,8 @@ static int actually_do_remove(char *dir) void remove_umid_dir(void) { char dir[strlen(uml_dir) + UMID_LEN + 1]; - if(!umid_inited) return; + if (!umid_owned) + return; sprintf(dir, "%s%s", uml_dir, umid); actually_do_remove(dir); @@ -155,11 +158,12 @@ void remove_umid_dir(void) char *get_umid(int only_if_set) { - if(only_if_set && umid_is_random) return(NULL); - return(umid); + if(only_if_set && umid_is_random) + return NULL; + return umid; } -int not_dead_yet(char *dir) +static int not_dead_yet(char *dir) { char file[strlen(uml_dir) + UMID_LEN + sizeof("/pid\0")]; char pid[sizeof("nnnnn\0")], *end; @@ -193,7 +197,8 @@ int not_dead_yet(char *dir) (p == CHOOSE_MODE(tracing_pid, os_getpid()))) dead = 1; } - if(!dead) return(1); + if(!dead) + return(1); return(actually_do_remove(dir)); } @@ -286,6 +291,7 @@ static int __init make_umid(int (*printe if(errno == EEXIST){ if(not_dead_yet(tmp)){ (*printer)("umid '%s' is in use\n", umid); + umid_owned = 0; return(-1); } err = mkdir(tmp, 0777); @@ -296,7 +302,8 @@ static int __init make_umid(int (*printe return(-1); } - return(0); + umid_owned = 1; + return 0; } __uml_setup("uml_dir=", set_uml_dir, @@ -309,7 +316,8 @@ static int __init make_umid_setup(void) /* one function with the ordering we need ... */ make_uml_dir(); make_umid(printf); - return create_pid_file(); + create_pid_file(); + return 0; } __uml_postsetup(make_umid_setup); |
From: Paolo 'B. G. <bla...@ya...> - 2005-09-21 17:49:01
|
From: Paolo 'Blaisorblade' Giarrusso <bla...@ya...> Things are breaking horribly with sysrq called in interrupt context. I want to try to fix it, but probably this is simpler. To tell the truth, sysrq is normally run in interrupt context, so there shouldn't be any problem. There's also a warning from the fault handler because it's run in atomic context (I have a patch for that, only I deferred it). This is why I'm doing this. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <bla...@ya...> --- arch/um/drivers/mconsole_user.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/arch/um/drivers/mconsole_user.c b/arch/um/drivers/mconsole_user.c --- a/arch/um/drivers/mconsole_user.c +++ b/arch/um/drivers/mconsole_user.c @@ -23,7 +23,7 @@ static struct mconsole_command commands[ { "reboot", mconsole_reboot, MCONSOLE_PROC }, { "config", mconsole_config, MCONSOLE_PROC }, { "remove", mconsole_remove, MCONSOLE_PROC }, - { "sysrq", mconsole_sysrq, MCONSOLE_INTR }, + { "sysrq", mconsole_sysrq, MCONSOLE_PROC }, { "help", mconsole_help, MCONSOLE_INTR }, { "cad", mconsole_cad, MCONSOLE_INTR }, { "stop", mconsole_stop, MCONSOLE_PROC }, |
From: Jeff D. <jd...@ad...> - 2005-09-21 20:57:36
|
On Wed, Sep 21, 2005 at 07:28:57PM +0200, Paolo 'Blaisorblade' Giarrusso wrote: > From: Paolo 'Blaisorblade' Giarrusso <bla...@ya...> > > Things are breaking horribly with sysrq called in interrupt context. I want to > try to fix it, but probably this is simpler. To tell the truth, sysrq is > normally run in interrupt context, so there shouldn't be any problem. How are they breaking? Jeff |
From: Blaisorblade <bla...@ya...> - 2005-09-22 19:44:34
|
On Wednesday 21 September 2005 22:50, Jeff Dike wrote: > On Wed, Sep 21, 2005 at 07:28:57PM +0200, Paolo 'Blaisorblade' Giarrusso wrote: > > From: Paolo 'Blaisorblade' Giarrusso <bla...@ya...> > > > > Things are breaking horribly with sysrq called in interrupt context. I > > want to try to fix it, but probably this is simpler. To tell the truth, > > sysrq is normally run in interrupt context, so there shouldn't be any > > problem. > > How are they breaking? sysrq t is broken (and stays), but additionally there are some warnings from some commands (enable sleep inside spinlock checking and spinlock debugging), which go to the down_read inside handle_page_fault IIRC. So try to run in process context. -- Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!". Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894) http://www.user-mode-linux.org/~blaisorblade ___________________________________ Yahoo! Messenger: chiamate gratuite in tutto il mondo http://it.messenger.yahoo.com |
From: Jeff D. <jd...@ad...> - 2005-09-22 20:44:47
|
On Thu, Sep 22, 2005 at 09:20:20PM +0200, Blaisorblade wrote: > sysrq t is broken (and stays), There's a fix on the way for that. This has nothing to do with interrupt context anyway. > but additionally there are some warnings from > some commands (enable sleep inside spinlock checking and spinlock debugging), > which go to the down_read inside handle_page_fault IIRC. So try to run in > process context. Which ones? They should be fixed. It is fairly fundamental to sysrq that it work from interrupt context. You may be diagnosing a system which can't context switch any more. This patch should be dropped, and the real problems fixed. Jeff |
From: Blaisorblade <bla...@ya...> - 2005-09-22 20:56:34
|
On Thursday 22 September 2005 22:37, Jeff Dike wrote: > On Thu, Sep 22, 2005 at 09:20:20PM +0200, Blaisorblade wrote: > > sysrq t is broken (and stays), > There's a fix on the way for that. Already found? Nice. > This has nothing to do with interrupt > context anyway. > > but additionally there are some warnings from > > some commands (enable sleep inside spinlock checking and spinlock > > debugging), which go to the down_read inside handle_page_fault IIRC. So > > try to run in process context. > Which ones? They should be fixed. Ok, will drop this patch and retry again to look at the messages. > It is fairly fundamental to sysrq that it work from interrupt context. You > may be diagnosing a system which can't context switch any more. Ok, I felt a bit uncertain about this, but didn't realize this very problem. > This patch should be dropped, and the real problems fixed. Ok, that's nice for me. I'll look into this. Possibly, the if (in_atomic()) don't take semaphore in handle_page_fault was the right fix for this. I'll have a look soon. > Jeff -- Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!". Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894) http://www.user-mode-linux.org/~blaisorblade ___________________________________ Yahoo! Messenger: chiamate gratuite in tutto il mondo http://it.messenger.yahoo.com |
From: Andrew M. <ak...@os...> - 2005-09-23 07:41:18
|
Jeff Dike <jd...@ad...> wrote: > > On Thu, Sep 22, 2005 at 09:20:20PM +0200, Blaisorblade wrote: > > sysrq t is broken (and stays), > > There's a fix on the way for that. This has nothing to do with interrupt > context anyway. > > > but additionally there are some warnings from > > some commands (enable sleep inside spinlock checking and spinlock debugging), > > which go to the down_read inside handle_page_fault IIRC. So try to run in > > process context. > > Which ones? They should be fixed. > > It is fairly fundamental to sysrq that it work from interrupt context. You > may be diagnosing a system which can't context switch any more. > > This patch should be dropped, and the real problems fixed. > Well Linus has just merged this patch. If you hadn't removed me from the cc yesterday this would not have happened. |
From: Paolo 'B. G. <bla...@ya...> - 2005-09-21 17:49:01
|
From: Paolo 'Blaisorblade' Giarrusso <bla...@ya...> The current code doesn't handle well general protection faults on the host - it thinks that cr2 is always the address of a page fault. While actually, on general protection faults, that address is not accessible, so we'd better assume we couldn't satisfy the fault. Currently instead we think we've fixed it, so we go back, retry the instruction and fault again endlessly. This leads to the kernel hanging when doing copy_from_user(dest, -1, ...) in TT mode, since reading *(-1) causes a GFP, and we don't support kernel preemption. Thanks to Luo Xin for testing UML with LTP and reporting the failures he got. Cc: Luo Xin <luo...@si...> Signed-off-by: Paolo 'Blaisorblade' Giarrusso <bla...@ya...> --- arch/um/kernel/trap_kern.c | 11 ++++++++++- arch/um/kernel/tt/uaccess_user.c | 11 +++++++++-- 2 files changed, 19 insertions(+), 3 deletions(-) diff --git a/arch/um/kernel/trap_kern.c b/arch/um/kernel/trap_kern.c --- a/arch/um/kernel/trap_kern.c +++ b/arch/um/kernel/trap_kern.c @@ -18,6 +18,7 @@ #include "asm/a.out.h" #include "asm/current.h" #include "asm/irq.h" +#include "sysdep/sigcontext.h" #include "user_util.h" #include "kern_util.h" #include "kern.h" @@ -125,7 +126,15 @@ unsigned long segv(struct faultinfo fi, } else if(current->mm == NULL) panic("Segfault with no mm"); - err = handle_page_fault(address, ip, is_write, is_user, &si.si_code); + + if (SEGV_IS_FIXABLE(&fi)) + err = handle_page_fault(address, ip, is_write, is_user, &si.si_code); + else { + err = -EFAULT; + /* A thread accessed NULL, we get a fault, but CR2 is invalid. + * This code is used in __do_copy_from_user() of TT mode. */ + address = 0; + } catcher = current->thread.fault_catcher; if(!err) diff --git a/arch/um/kernel/tt/uaccess_user.c b/arch/um/kernel/tt/uaccess_user.c --- a/arch/um/kernel/tt/uaccess_user.c +++ b/arch/um/kernel/tt/uaccess_user.c @@ -22,8 +22,15 @@ int __do_copy_from_user(void *to, const __do_copy, &faulted); TASK_REGS(get_current())->tt = save; - if(!faulted) return(0); - else return(n - (fault - (unsigned long) from)); + if(!faulted) + return 0; + else if (fault) + return n - (fault - (unsigned long) from); + else + /* In case of a general protection fault, we don't have the + * fault address, so NULL is used instead. Pretend we didn't + * copy anything. */ + return n; } static void __do_strncpy(void *dst, const void *src, int count) |
From: Paolo 'B. G. <bla...@ya...> - 2005-09-21 17:49:01
|
From: Paolo 'Blaisorblade' Giarrusso <bla...@ya...> GFP_ATOMIC | GFP_KERNEL is meaningless and won't work. Actually it never worked, even in 2.4. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <bla...@ya...> --- arch/um/kernel/process_kern.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/arch/um/kernel/process_kern.c b/arch/um/kernel/process_kern.c --- a/arch/um/kernel/process_kern.c +++ b/arch/um/kernel/process_kern.c @@ -82,7 +82,8 @@ unsigned long alloc_stack(int order, int unsigned long page; int flags = GFP_KERNEL; - if(atomic) flags |= GFP_ATOMIC; + if (atomic) + flags = GFP_ATOMIC; page = __get_free_pages(flags, order); if(page == 0) return(0); |
From: Bill D. <dav...@tm...> - 2005-09-21 19:20:56
|
Paolo 'Blaisorblade' Giarrusso wrote: > From: Paolo 'Blaisorblade' Giarrusso <bla...@ya...> > > GFP_ATOMIC | GFP_KERNEL is meaningless and won't work. Actually it never worked, > even in 2.4. > > Signed-off-by: Paolo 'Blaisorblade' Giarrusso <bla...@ya...> > --- > > arch/um/kernel/process_kern.c | 3 ++- > 1 files changed, 2 insertions(+), 1 deletions(-) > > diff --git a/arch/um/kernel/process_kern.c b/arch/um/kernel/process_kern.c > --- a/arch/um/kernel/process_kern.c > +++ b/arch/um/kernel/process_kern.c > @@ -82,7 +82,8 @@ unsigned long alloc_stack(int order, int > unsigned long page; > int flags = GFP_KERNEL; > > - if(atomic) flags |= GFP_ATOMIC; > + if (atomic) > + flags = GFP_ATOMIC; > page = __get_free_pages(flags, order); > if(page == 0) > return(0); > int flags = (atomic ? GFP_ATOMIC : GFP_KERNEL); -- -bill davidsen (dav...@tm...) "The secret to procrastination is to put things off until the last possible moment - but no longer" -me |
From: Jeff D. <jd...@ad...> - 2005-09-21 21:28:54
|
On Wed, Sep 21, 2005 at 07:29:21PM +0200, Paolo 'Blaisorblade' Giarrusso wrote: > - if(atomic) flags |= GFP_ATOMIC; > + if (atomic) > + flags = GFP_ATOMIC; We should take a careful look at where this is called from, and see if we can get rid of the atomic business altogether. Jeff |
From: Paolo 'B. G. <bla...@ya...> - 2005-09-21 17:49:01
|
From: Paolo 'Blaisorblade' Giarrusso <bla...@ya...> Following i386, we should maybe refuse trying to fault in pages when we're doing atomic operations, because to handle the fault we could need to take already taken spinlocks. Also, if we're doing an atomic operation (in the sense of in_atomic()) we're surely in kernel mode and we're surely going to handle adequately the failed fault, so it's safe to behave this way. Currently, on UML SMP is rarely used, and we don't support PREEMPT, so this is unlikely to create problems right now, but it might in the future. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <bla...@ya...> --- arch/um/kernel/trap_kern.c | 7 +++++++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git a/arch/um/kernel/trap_kern.c b/arch/um/kernel/trap_kern.c --- a/arch/um/kernel/trap_kern.c +++ b/arch/um/kernel/trap_kern.c @@ -40,6 +40,12 @@ int handle_page_fault(unsigned long addr int err = -EFAULT; *code_out = SEGV_MAPERR; + + /* If the fault was during atomic operation, don't take the fault, just + * fail. */ + if (in_atomic()) + goto out_nosemaphore; + down_read(&mm->mmap_sem); vma = find_vma(mm, address); if(!vma) @@ -90,6 +96,7 @@ survive: flush_tlb_page(vma, address); out: up_read(&mm->mmap_sem); +out_nosemaphore: return(err); /* |
From: Paolo 'B. G. <bla...@ya...> - 2005-09-21 17:49:02
|
From: Paolo 'Blaisorblade' Giarrusso <bla...@ya...> setup_initial_poll is only called with sigio_lock() held, so use appropriate allocation. Also, parse_chan() can also be called when holding a spinlock (see line_open() -> parse_chan_pair()). I have sporadical problems (spinlock taken twice, with spinlock debugging on UP) which could be caused by a sequence like "take spinlock, alloc and go to sleep, take again the spinlock in the other thread". Signed-off-by: Paolo 'Blaisorblade' Giarrusso <bla...@ya...> --- arch/um/drivers/chan_kern.c | 2 +- arch/um/kernel/sigio_user.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/um/drivers/chan_kern.c b/arch/um/drivers/chan_kern.c --- a/arch/um/drivers/chan_kern.c +++ b/arch/um/drivers/chan_kern.c @@ -465,7 +465,7 @@ static struct chan *parse_chan(char *str data = (*ops->init)(str, device, opts); if(data == NULL) return(NULL); - chan = kmalloc(sizeof(*chan), GFP_KERNEL); + chan = kmalloc(sizeof(*chan), GFP_ATOMIC); if(chan == NULL) return(NULL); *chan = ((struct chan) { .list = LIST_HEAD_INIT(chan->list), .primary = 1, diff --git a/arch/um/kernel/sigio_user.c b/arch/um/kernel/sigio_user.c --- a/arch/um/kernel/sigio_user.c +++ b/arch/um/kernel/sigio_user.c @@ -340,7 +340,7 @@ static int setup_initial_poll(int fd) { struct pollfd *p; - p = um_kmalloc(sizeof(struct pollfd)); + p = um_kmalloc_atomic(sizeof(struct pollfd)); if(p == NULL){ printk("setup_initial_poll : failed to allocate poll\n"); return(-1); |
From: Andrew M. <ak...@os...> - 2005-09-21 19:50:43
|
"Paolo 'Blaisorblade' Giarrusso" <bla...@ya...> wrote: > > From: Paolo 'Blaisorblade' Giarrusso <bla...@ya...> > > Following i386, we should maybe refuse trying to fault in pages when we're > doing atomic operations, because to handle the fault we could need to take > already taken spinlocks. > > Also, if we're doing an atomic operation (in the sense of in_atomic()) we're > surely in kernel mode and we're surely going to handle adequately the failed > fault, so it's safe to behave this way. > > Currently, on UML SMP is rarely used, and we don't support PREEMPT, so this is > unlikely to create problems right now, but it might in the future. > That's not really an accurate explanation/understanding of what's going on in there. There's an extremely special-case in the pagefault handlers where we fail the fault if in_atomic(). It's unrelated to spinlocks (spinlocks don't even cause in_atomic() to become true if !CONFIG_PREEMPT). It has to do with kmap_atomic(). There's tricksy code in mm/filemap.c which will fault the target page in by hand and will then take an atomic kmap and will then raise current->preempt_count by hand, so in_atomic() becomes true even if !COFNIG_PREEMPT. So at this stage we expect to be able to do a copy_to/from_user to/from pagecache without taking a fault, because we just faulted the page in by hand. And we're not allowed to take a fault, because we're holding an atomic kmap. But if we _do_ take a fault (extreme memory pressure, racing munmap, etc) then we want to fail the pagefault immediately. The in_atomic() test in x86's do_page_fault() is in fact a message passed into it from filemap.c's kmap_atomic(). It has accidental side-effects, such as making copy_to_user() fail if inside spinlocks when CONFIG_PREEMPT=y. So I think this change is only needed if UML implements kmap_atomic, as in arch/i386/mm/highmem.c, which it surely does not do? > > diff --git a/arch/um/kernel/trap_kern.c b/arch/um/kernel/trap_kern.c > --- a/arch/um/kernel/trap_kern.c > +++ b/arch/um/kernel/trap_kern.c > @@ -40,6 +40,12 @@ int handle_page_fault(unsigned long addr > int err = -EFAULT; > > *code_out = SEGV_MAPERR; > + > + /* If the fault was during atomic operation, don't take the fault, just > + * fail. */ > + if (in_atomic()) > + goto out_nosemaphore; > + > down_read(&mm->mmap_sem); > vma = find_vma(mm, address); > if(!vma) > @@ -90,6 +96,7 @@ survive: > flush_tlb_page(vma, address); > out: > up_read(&mm->mmap_sem); > +out_nosemaphore: > return(err); > > /* |
From: Linus T. <tor...@os...> - 2005-09-21 20:30:26
|
On Wed, 21 Sep 2005, Andrew Morton wrote: > > There's an extremely special-case in the pagefault handlers where we fail > the fault if in_atomic(). It's unrelated to spinlocks (spinlocks don't > even cause in_atomic() to become true if !CONFIG_PREEMPT). There's a few other places where we use those semantics, though. Like "get_futex_value_locked()". > So I think this change is only needed if UML implements kmap_atomic, as in > arch/i386/mm/highmem.c, which it surely does not do? No. Every architecture needs to honor it, or they are screwed. Linus |
From: Blaisorblade <bla...@ya...> - 2005-09-21 20:24:22
|
On Wednesday 21 September 2005 21:49, Andrew Morton wrote: > "Paolo 'Blaisorblade' Giarrusso" <bla...@ya...> wrote: > > From: Paolo 'Blaisorblade' Giarrusso <bla...@ya...> > The in_atomic() test in x86's do_page_fault() is in fact a message passed > into it from filemap.c's kmap_atomic(). Ok, this can be ok, but: > It has accidental side-effects, > such as making copy_to_user() fail if inside spinlocks when > CONFIG_PREEMPT=y. Sorry, but should it ever succeed inside spinlocks? I mean, should it ever call down() inside spinlocks? (We never do down_trylock, and ever if we did the x86 trick, that wouldn't make the whole thing safe at all - they still take the spinlock and potentially sleep. And it's legal only if no spinlock is held). Even if spinlocks don't always trigger in_atomic() - which means that we'd need to have a better fix for this. (Btw, I took the above reasoning from something said, as an aside, on LWN.net kernel page, about the FUTEX deadlock on mm->mmap_sem of ~ 2.6.8 - yes, it wasn't the full truth, but not totally dumb). > So I think this change is only needed if UML implements kmap_atomic, as in > arch/i386/mm/highmem.c, which it surely does not do? NACK, see above. -- Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!". Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894) http://www.user-mode-linux.org/~blaisorblade ___________________________________ Yahoo! Mail: gratis 1GB per i messaggi e allegati da 10MB http://mail.yahoo.it |
From: Andrew M. <ak...@os...> - 2005-09-21 20:48:12
|
Blaisorblade <bla...@ya...> wrote: > > On Wednesday 21 September 2005 21:49, Andrew Morton wrote: > > "Paolo 'Blaisorblade' Giarrusso" <bla...@ya...> wrote: > > > From: Paolo 'Blaisorblade' Giarrusso <bla...@ya...> > > > The in_atomic() test in x86's do_page_fault() is in fact a message passed > > into it from filemap.c's kmap_atomic(). > Ok, this can be ok, but: > > It has accidental side-effects, > > such as making copy_to_user() fail if inside spinlocks when > > CONFIG_PREEMPT=y. > Sorry, but should it ever succeed inside spinlocks? I mean, should it ever > call down() inside spinlocks? (We never do down_trylock, and ever if we did > the x86 trick, that wouldn't make the whole thing safe at all - they still > take the spinlock and potentially sleep. And it's legal only if no spinlock > is held). Not sure what you're asking here. copy_to/from_user() will fail inside spinlock if CONFIG_PREMPT=y and if the copy happens to cause a fault. Otherwise it will succeed inside spinlock, and it won't spew a sleeping-while-atomic warning, because that uses in_atomic() too. It might deadlock if we schedule away and try to retake the same lock. > Even if spinlocks don't always trigger in_atomic() - which means that we'd > need to have a better fix for this. The patch you have will correctly cause copy_*_user()->pagefault to fail the copy if the caller has run inc_preempt_count(). It will not cause copy_*_user()->pagefault to fail inside spinlocks unless UML does inc_preempt_count() in its spinlock implementation. > (Btw, I took the above reasoning from something said, as an aside, on LWN.net > kernel page, about the FUTEX deadlock on mm->mmap_sem of ~ 2.6.8 - yes, it > wasn't the full truth, but not totally dumb). > > > So I think this change is only needed if UML implements kmap_atomic, as in > > arch/i386/mm/highmem.c, which it surely does not do? > NACK, see above. Yup, the patch is needed for the futex code, and for general correct implementation of inc_preempt_count()'s intended effect. |
From: Blaisorblade <bla...@ya...> - 2005-09-22 19:47:08
|
On Wednesday 21 September 2005 22:47, Andrew Morton wrote: > Blaisorblade <bla...@ya...> wrote: > > On Wednesday 21 September 2005 21:49, Andrew Morton wrote: > > > "Paolo 'Blaisorblade' Giarrusso" <bla...@ya...> wrote: > > > > From: Paolo 'Blaisorblade' Giarrusso <bla...@ya...> > > > It has accidental side-effects, > > > such as making copy_to_user() fail if inside spinlocks when > > > CONFIG_PREEMPT=3Dy. > > Sorry, but should it ever succeed inside spinlocks? I mean, should it > > ever call down() inside spinlocks? (We never do down_trylock, and ever = if > > we did the x86 trick, that wouldn't make the whole thing safe at all - > > they still take the spinlock and potentially sleep. And it's legal only > > if no spinlock is held). > Not sure what you're asking here. > copy_to/from_user() will fail inside spinlock if CONFIG_PREMPT=3Dy and if= the > copy happens to cause a fault. > Otherwise it will succeed inside spinlock,=20 > and it won't spew a sleeping-while-atomic warning, because that uses > in_atomic() too. > It might deadlock if we schedule away and try to retake=20 > the same lock. Exactly - the point is: is it legal to call copy_from_user() while holding = a=20 spinlock (which is my original question)? Or should copy_from_user try to=20 satisfy the fault, instead of seeing in_atomic() or something similar and=20 fail? =46rom what you say, copy_*_user is called in such a way, but it's=20 deadlock-prone, when CONFIG_PREEMPT is disabled. > > Even if spinlocks don't always trigger in_atomic() - which means that > > we'd need to have a better fix for this. > The patch you have will correctly cause copy_*_user()->pagefault to fail > the copy if the caller has run inc_preempt_count(). It will not cause > copy_*_user()->pagefault to fail inside spinlocks unless UML does > inc_preempt_count() in its spinlock implementation. No, it doesn't. But for that case, we're in the same situation as i386. Consider even that while UML doesn't implement PREEMPT (but we'll fix that,= =20 sooner or later) it does implement HIGHMEM, and answering to your original= =20 question: > So I think this change is only needed if UML implements kmap_atomic, as in > arch/i386/mm/highmem.c, which it surely does not do? Apart that anybody would make sure that atomic kmaps are indeed atomic, UML= =20 doesn't use a "similar implementation" - it verbatim symlinks the i386 file= =20 and builds it (see arch/um/sys-i386/Makefile and=20 arch/um/scripts/Makefile.rules if you want). Hmm, that's something worth considering... at least when debugging is enabl= ed,=20 for debugging purposes. > > NACK, see above. > Yup, the patch is needed for the futex code, > and for general correct=20 > implementation of inc_preempt_count()'s intended effect. Ok, that's enough I guess (and I just read what Linus said). =2D-=20 Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!". Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894) http://www.user-mode-linux.org/~blaisorblade ___________________________________ Yahoo! Mail: gratis 1GB per i messaggi e allegati da 10MB http://mail.yahoo.it |
From: Andrew M. <ak...@os...> - 2005-09-22 19:58:54
|
Blaisorblade <bla...@ya...> wrote: > > On Wednesday 21 September 2005 22:47, Andrew Morton wrote: > > Blaisorblade <bla...@ya...> wrote: > > > On Wednesday 21 September 2005 21:49, Andrew Morton wrote: > > > > "Paolo 'Blaisorblade' Giarrusso" <bla...@ya...> wrote: > > > > > From: Paolo 'Blaisorblade' Giarrusso <bla...@ya...> > > > > > It has accidental side-effects, > > > > such as making copy_to_user() fail if inside spinlocks when > > > > CONFIG_PREEMPT=y. > > > > Sorry, but should it ever succeed inside spinlocks? I mean, should it > > > ever call down() inside spinlocks? (We never do down_trylock, and ever if > > > we did the x86 trick, that wouldn't make the whole thing safe at all - > > > they still take the spinlock and potentially sleep. And it's legal only > > > if no spinlock is held). > > > Not sure what you're asking here. > > > copy_to/from_user() will fail inside spinlock if CONFIG_PREMPT=y and if the > > copy happens to cause a fault. > > > Otherwise it will succeed inside spinlock, > > and it won't spew a sleeping-while-atomic warning, because that uses > > in_atomic() too. > > > It might deadlock if we schedule away and try to retake > > the same lock. > Exactly - the point is: is it legal to call copy_from_user() while holding a > spinlock (which is my original question)? Or should copy_from_user try to > satisfy the fault, instead of seeing in_atomic() or something similar and > fail? No, it is not legal to call copy_*_user() while holding a spinlock. If CONFIG_PREEMPT=n, do_page_fault() has no way of knowing that the caller holds a spinlock. |
From: Linus T. <tor...@os...> - 2005-09-22 20:54:48
|
On Thu, 22 Sep 2005, Blaisorblade wrote: > > Exactly - the point is: is it legal to call copy_from_user() while holding a > spinlock (which is my original question)? Or should copy_from_user try to > satisfy the fault, instead of seeing in_atomic() or something similar and > fail? It is not legal to call copy_to/from_user() under a spinlock in general. But what _is_ legal to do is something slightly more complex, ie spin_lock(...) inc_preempt_count(); ret = __copy_from_user_inatomic(..) dec_preempt_count(); spin_unlock(); but you have to realize that the copy-from-user will fail if the target is swapped out (so the code that does things like this has to look at the return value, and if the copy didn't copy anything it needs to release the locks and do it all over again without the atomic thing). We don't do it very much, because it gets more complicated and it hasn't historically been legal, but it could in theory be very useful if you hold a lock and want to avoid releasing and re-taking it just to do a user access. Right now the only case where that happens is the futex code, and maybe that's a very special case. But maybe it isn't. Linus |