From: Lennert B. <bu...@gn...> - 2001-12-28 14:15:13
Please apply, this makes my job easier.. :)

--- process_kern.c.orig	Fri Dec 28 15:12:58 2001
+++ process_kern.c	Fri Dec 28 15:13:29 2001
@@ -227,7 +227,7 @@
 	alrm = change_sig(SIGALRM, 0);
 	c = 0;
-	current = to;
+	set_current(to);
 	if(write(to->thread.switch_pipe[1], &c, sizeof(c)) != sizeof(c))
 		panic("write of switch_pipe failed");
From: Jeff D. <jd...@ka...> - 2001-12-29 00:01:08
bu...@gn... said:
> -	current = to;
> +	set_current(to);

I think the thing to do here is to put the current_task pointer in the
private page and forget about calculating it from the stack. Then, a UML
thread can find its task structure no matter what stack it's running on.

This also opens up the possibility of kmallocing task structs instead of
sticking them on the bottom of the stack, which would let UML processes
be 2 * PAGE_SIZE - sizeof(struct task_struct) smaller.

There also wouldn't seem to be much point to having current_task around
if current is a normal pointer.

However, the tracing thread still needs to grovel through its array to
do the pid -> task struct mapping.

				Jeff
From: Lennert B. <bu...@gn...> - 2001-12-29 22:44:38
On Fri, Dec 28, 2001 at 08:21:27PM -0500, Jeff Dike wrote:
> I think the thing to do here is to put the current_task pointer in the
> private page and forget about calculating it from the stack. Then, a UML
> thread can find its task structure no matter what stack it's running on.

How does the attached patch look? It boots, and seems to work OK with my
RH filesystem.

- I need a way to tell a newly fork()ed process its 'current', so I've
  extended the OP_FORK structure with a task_struct pointer, and made
  attach_process poke that into the child's private page. This is
  somewhat hackish, but I'm not sure it can be done in a cleaner way.

- I'm not happy about find_task_by_external_pid, it shouldn't be needed
  (We have a per-cpu tracing thread and can keep the last scheduled task
  in a per-cpu array a la cpu_task; this would get rid of
  task_to_processor as well).

Actually, having written the patch I'm not so sure that sticking current
in the private page is a good idea anymore. The private page is evil.
(Can we discuss this on IRC?)

> There also wouldn't seem to be much point to having current_task around
> if current is a normal pointer.

I tried replacing current_task by current, but got some undefined refs
from the i386 sysdep stuff. Haven't really looked into it yet.

> However, the tracing thread still needs to grovel through its array to
> do the pid -> task struct mapping.

You mean cpu_task?

cheers,
Lennert

P.S. unmap_fin.o(.data) is zero bytes, why did link.ld pull it into
.thread_private ??

--- linux-2.4.17-1um-up/include/asm-um/current.h.orig	Thu Dec 27 23:49:27 2001
+++ linux-2.4.17-1um-up/include/asm-um/current.h	Fri Dec 28 10:10:34 2001
@@ -3,24 +3,10 @@
 
 #ifndef __ASSEMBLY__
 
-#include "linux/config.h"
-
 struct task_struct;
-
-#ifdef CONFIG_SMP
-extern struct task_struct *current_task[];
-
-#define CURRENT_TASK(dummy) (((unsigned long) &dummy) & (PAGE_MASK << 2))
-
-#define current ({ int dummy; (struct task_struct *) CURRENT_TASK(dummy); })
-
-#else
-
 extern struct task_struct *current_task;
 
 #define current current_task
 
-#endif
-
 #endif /* __ASSEMBLY__ */
 
 #endif
--- linux-2.4.17-1um-up/include/asm-um/processor-generic.h.orig	Fri Dec 28 21:28:37 2001
+++ linux-2.4.17-1um-up/include/asm-um/processor-generic.h	Fri Dec 28 21:28:22 2001
@@ -55,7 +55,11 @@
 	union {
 		struct {
 			int pid;
-		} fork, exec;
+			struct task_struct *new_task;
+		} fork;
+		struct {
+			int pid;
+		} exec;
 		struct {
 			int (*proc)(void *);
 			void *arg;
--- linux-2.4.17-1um-up/arch/um/include/kern_util.h.orig	Fri Dec 28 17:45:58 2001
+++ linux-2.4.17-1um-up/arch/um/include/kern_util.h	Fri Dec 28 21:11:33 2001
@@ -72,6 +72,8 @@
 extern void idle_timer(void);
 extern unsigned int do_IRQ(int irq, int user_mode);
 extern int external_pid(void);
+extern struct task_struct *find_task_by_external_pid(int pid);
+extern int task_to_processor(struct task_struct *task);
 extern void boot_timer_handler(int sig);
 extern void interrupt_end(void);
 extern void tracing_reboot(void);
@@ -89,7 +91,7 @@
 extern void finish_fork_handler(int sig);
 extern int user_context(unsigned long sp);
 extern void timer_irq(int user_mode);
-extern void set_repeat_syscall(int again);
+extern void set_repeat_syscall(struct task_struct *task, int again);
 extern int get_repeat_syscall(void *t);
 extern void force_flush_all(void);
 extern void unprotect_stack(unsigned long stack);
--- linux-2.4.17-1um-up/arch/um/include/user_util.h.orig	Fri Dec 28 21:30:16 2001
+++ linux-2.4.17-1um-up/arch/um/include/user_util.h	Fri Dec 28 21:38:25 2001
@@ -98,7 +98,7 @@
 extern void init_new_thread(void *sig_stack, void (*usr1_handler)(int));
 extern void start_exec(int old_pid, int new_pid, int *error,
 		       struct sys_pt_regs *regs);
-extern void attach_process(int pid);
+extern void attach_process(int pid, void *task);
 extern void calc_sigframe_size(void);
 extern int fork_tramp(void *sig_stack);
 extern void do_exec(int old_pid, int new_pid);
--- linux-2.4.17-1um-up/arch/um/kernel/current.c.orig	Thu Dec 27 23:49:37 2001
+++ linux-2.4.17-1um-up/arch/um/kernel/current.c	Fri Dec 28 10:10:22 2001
@@ -6,11 +6,8 @@
 #include "linux/config.h"
 #include "linux/sched.h"
 
-#ifndef CONFIG_SMP
+struct task_struct *current_task __attribute__((section(".thread_private")));
 
-struct task_struct *current_task;
-
-#endif
 /*
  * Overrides for Emacs so that we follow Linus's tabbing style.
--- linux-2.4.17-1um-up/arch/um/kernel/tlb.c.orig	Thu Dec 27 23:56:15 2001
+++ linux-2.4.17-1um-up/arch/um/kernel/tlb.c	Fri Dec 28 21:59:14 2001
@@ -28,8 +28,8 @@
 	if((current->thread.extern_pid != -1) &&
 	   (current->thread.extern_pid != getpid()))
-		panic("fix_range fixing wrong address space, current = 0x%p",
-		      current);
+		panic("fix_range fixing wrong address space, current = 0x%p, pid = %i, expected pid = %i",
+		      current, getpid(), current->thread.extern_pid);
 	if(mm == NULL) return;
 	for(addr=start_addr;addr<end_addr;){
 		if(addr == TASK_SIZE){
--- linux-2.4.17-1um-up/arch/um/kernel/process_kern.c.orig	Fri Dec 28 00:02:29 2001
+++ linux-2.4.17-1um-up/arch/um/kernel/process_kern.c	Fri Dec 28 21:29:27 2001
@@ -57,9 +57,32 @@
 	return(current->thread.extern_pid);
 }
 
-void set_current(void *t)
+struct task_struct *find_task_by_external_pid(int pid)
 {
-	current = t;
+	struct task_struct *p;
+
+	read_lock(&tasklist_lock);
+	for_each_task (p) {
+		if (p->thread.extern_pid == pid) {
+			read_unlock(&tasklist_lock);
+			return p;
+		}
+	}
+	read_unlock(&tasklist_lock);
+
+#if 0
+	printk("Can't find task for pid %i\n", pid);
+#endif
+	return &init_task_union.task;
+}
+
+int task_to_processor(struct task_struct *task)
+{
+#ifdef CONFIG_SMP
+	return task->processor;
+#else
+	return 0;
+#endif
 }
 
 void free_stack(unsigned long stack)
@@ -107,7 +130,7 @@
 	int (*fn)(void *), pid;
 	void *arg;
 
-	task = t;
+	current = task = t;
 	trace_myself();
 	init_new_thread(NULL, NULL);
 	pid = getpid();
@@ -227,7 +250,6 @@
 	alrm = change_sig(SIGALRM, 0);
 	c = 0;
-	current = to;
 	if(write(to->thread.switch_pipe[1], &c, sizeof(c)) != sizeof(c))
 		panic("write of switch_pipe failed");
@@ -316,6 +338,7 @@
 		current->thread.request.op = OP_FORK;
 		current->thread.request.u.fork.pid = new_pid;
+		current->thread.request.u.fork.new_task = p;
 		usr1_pid(getpid());
 	}
 	current->need_resched = 1;
@@ -372,7 +395,7 @@
 		thread->request.u.thread.new_pid = pid;
 		break;
 	case OP_FORK:
-		attach_process(thread->request.u.fork.pid);
+		attach_process(thread->request.u.fork.pid, thread->request.u.fork.new_task);
 		break;
 	case OP_CB:
 		(*thread->request.u.cb.proc)(thread->request.u.cb.arg);
@@ -561,9 +584,9 @@
 	return(task->thread.repeat_syscall);
 }
 
-void set_repeat_syscall(int again)
+void set_repeat_syscall(struct task_struct *task, int again)
 {
-	current->thread.repeat_syscall = again;
+	task->thread.repeat_syscall = again;
 }
 
 void dump_thread(struct pt_regs *regs, struct user *u)
--- linux-2.4.17-1um-up/arch/um/kernel/um_arch.c.orig	Fri Dec 28 00:11:49 2001
+++ linux-2.4.17-1um-up/arch/um/kernel/um_arch.c	Fri Dec 28 17:45:04 2001
@@ -326,9 +326,6 @@
 	init_task.thread.kernel_stack = (unsigned long) &init_task +
 		2 * PAGE_SIZE;
 
-#ifndef CONFIG_SMP
-	current = &init_task;
-#endif
 	task_protections((unsigned long) &init_task);
 	sp = (void *) init_task.thread.kernel_stack + 2 * PAGE_SIZE -
 		sizeof(unsigned long);
--- linux-2.4.17-1um-up/arch/um/kernel/process.c.orig	Fri Dec 28 14:42:07 2001
+++ linux-2.4.17-1um-up/arch/um/kernel/process.c	Fri Dec 28 21:51:15 2001
@@ -206,12 +206,16 @@
 	return(n);
 }
 
-void attach_process(int pid)
+void attach_process(int pid, void *task)
 {
+	extern struct task_struct *current_task;
+
 	if((ptrace(PTRACE_ATTACH, pid, 0, 0) < 0) ||
 	   (ptrace(PTRACE_CONT, pid, 0, 0) < 0))
 		tracer_panic("OP_FORK failed to attach pid");
 	wait_for_stop(pid, SIGSTOP, PTRACE_CONT);
+	if(ptrace(PTRACE_POKEDATA, pid, (void *)&current_task, task) < 0)
+		tracer_panic("OP_FORK failed to write child's current");
 	if(ptrace(PTRACE_CONT, pid, 0, 0) < 0)
 		tracer_panic("OP_FORK failed to continue process");
 }
--- linux-2.4.17-1um-up/arch/um/kernel/trap_user.c.orig	Fri Dec 28 15:00:20 2001
+++ linux-2.4.17-1um-up/arch/um/kernel/trap_user.c	Fri Dec 28 17:47:49 2001
@@ -209,14 +209,10 @@
 	eip = ptrace(PTRACE_PEEKUSER, pid, UM_IP_OFFSET, 0);
 	signal_record[signal_index].addr = eip;
 	signal_record[signal_index++].signal = sig;
-#ifdef CONFIG_SMP /* XXX user code can't refer to CONFIG_* */
-	proc_id = pid_to_processor_id(pid);
-	task = cpu_tasks[proc_id].task;
-#else
-	proc_id = 0;
-	task = get_current_task();
-#endif
+	task = find_task_by_external_pid(pid);
+	proc_id = task_to_processor(task);
 	tracing = is_tracing(task);
+
 	switch(sig){
 	case SIGUSR1:
 		sig = 0;
--- linux-2.4.17-1um-up/arch/um/kernel/syscall_user.c.orig	Fri Dec 28 21:00:37 2001
+++ linux-2.4.17-1um-up/arch/um/kernel/syscall_user.c	Fri Dec 28 21:15:16 2001
@@ -73,7 +73,7 @@
 	      (result == -ERESTARTNOINTR))
 		do_signal(&result, &again);
 	UM_SET_SYSCALL_RETURN(regs, result);
-	set_repeat_syscall(again);
+	set_repeat_syscall(get_current_task(), again);
 	syscall_trace();
 	syscall_record[index].result = result;
 	gettimeofday(&syscall_record[index].end, NULL);
@@ -95,7 +95,7 @@
 
 	tracing = 1;
 	again = get_repeat_syscall(task);
-	set_repeat_syscall(0);
+	set_repeat_syscall(task, 0);
 	restore = get_restore_regs(task);
 	regs = process_state(task);
 	if(restore){
--- linux-2.4.17-1um-up/arch/um/link.ld.in.orig	Thu Dec 27 23:33:46 2001
+++ linux-2.4.17-1um-up/arch/um/link.ld.in	Fri Dec 28 14:06:57 2001
@@ -12,7 +12,7 @@
     __start_thread_private = .;
     errno = .;
     . += 4;
-    arch/um/kernel/unmap_fin.o (.data)
+    *(.thread_private);
     __end_thread_private = .;
   }
   _foo = .;
From: Jeff D. <jd...@ka...> - 2001-12-30 05:01:28
bu...@gn... said:
> - I need a way to tell a newly fork()ed process its 'current', so I've
>   extended the OP_FORK structure with a task_struct pointer, and made
>   attach_process poke that into the child's private page. This is
>   somewhat hackish, but I'm not sure it can be done in a cleaner way.

I'm doing this by passing a pointer to a structure on the parent's stack
as the clone argument. The structure contains the child's current and
its stack. This is safe because the parent waits for the child to set
itself up before returning from copy_thread.

> Actually, having written the patch I'm not so sure that sticking
> current in the private page is a good idea anymore. The private page
> is evil.

You're right. I forgot that there can be threads that share mms. That
completely breaks the private page idea. So, we have to go back to
current being calculated from the stack.

> - I'm not happy about find_task_by_external_pid, it shouldn't be needed
>   (We have a per-cpu tracing thread and can keep the last scheduled task
>   in a per-cpu array a la cpu_task; this would get rid of
>   task_to_processor as well).

We don't have a per-cpu tracing thread, we have one tracing thread for
the whole thing. If we did, there would be no need for the cpu_task
array. Each tracing thread would keep the current task in a local.

What was the matter with cpu_task anyway? It's a lot faster than
searching the whole task list.

> P.S. unmap_fin.o(.data) is zero bytes, why did link.ld pull it into
> .thread_private ??

I did that in case unmap_fin.o ever acquired data.

I'm reworking what I did and incorporating as much of your stuff as
makes sense. Watch for the next patch.

				Jeff
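For concreteness, here is a user-space sketch of the handoff Jeff
describes. The struct layout, the names, and the pipe-based ack are
illustrative guesses, not the actual UML code:

	#define _GNU_SOURCE
	#include <sched.h>
	#include <signal.h>
	#include <unistd.h>

	/* Hypothetical setup block living on the parent's stack. */
	struct child_setup {
		void *task;	/* what the child should use as 'current' */
		char *stack;	/* stack area allocated for the child */
		int ack[2];	/* pipe the child uses to release the parent */
	};

	static void *child_current;	/* each process's private 'current' */

	static int child_init(void *arg)
	{
		struct child_setup *setup = arg;
		char c = 0;

		/* Copy what we need out of the parent's stack frame, then
		 * let the parent continue; after that the struct may be
		 * gone. */
		child_current = setup->task;
		write(setup->ack[1], &c, sizeof(c));
		/* ... go on running as the new process ... */
		return 0;
	}

	static int start_child(struct child_setup *setup)
	{
		char c;
		int pid;

		pipe(setup->ack);
		pid = clone(child_init, setup->stack + 2 * 4096, SIGCHLD,
			    setup);
		read(setup->ack[0], &c, sizeof(c));	/* wait for child setup */
		return pid;
	}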
From: Lennert B. <bu...@gn...> - 2001-12-30 16:38:59
On Sun, Dec 30, 2001 at 01:21:47AM -0500, Jeff Dike wrote:
> > - I need a way to tell a newly fork()ed process its 'current', so I've
> >   extended the OP_FORK structure with a task_struct pointer, and made
> >   attach_process poke that into the child's private page. This is
> >   somewhat hackish, but I'm not sure it can be done in a cleaner way.
>
> I'm doing this by passing a pointer to a structure on the parent's stack
> as the clone argument. The structure contains the child's current and
> its stack. This is safe because the parent waits for the child to set
> itself up before returning from copy_thread.

OK, that sounds sane. I must have missed that bit of code.

> > Actually, having written the patch I'm not so sure that sticking
> > current in the private page is a good idea anymore. The private page
> > is evil.
>
> You're right. I forgot that there can be threads that share mms. That
> completely breaks the private page idea. So, we have to go back to
> current being calculated from the stack.

I'm all in favor of killing the private page altogether. There has to be
a better way to fix the 'errno = 0' problem.

> > - I'm not happy about find_task_by_external_pid, it shouldn't be needed
> >   (We have a per-cpu tracing thread and can keep the last scheduled task
> >   in a per-cpu array a la cpu_task; this would get rid of
> >   task_to_processor as well).
>
> We don't have a per-cpu tracing thread, we have one tracing thread for
> the whole thing. If we did, there would be no need for the cpu_task
> array. Each tracing thread would keep the current task in a local.

Whoops *-|

Are you still planning on killing the tracing thread?

> What was the matter with cpu_task anyway? It's a lot faster than
> searching the whole task list.

Yeah, I deferred that for later. I was having lots of trouble with tasks
having a wrong idea of current after I made it per-process, so I just
ripped out all the global current_task/cpu_task stuff to make killing
references easier.

> > P.S. unmap_fin.o(.data) is zero bytes, why did link.ld pull it into
> > .thread_private ??
>
> I did that in case unmap_fin.o ever acquired data.

Would that have to be 'unshared'?

> I'm reworking what I did and incorporating as much of your stuff as
> makes sense. Watch for the next patch.

I'm not particularly attached to this patch :-) I don't really mind
seeing it die a painful death, as it has taught me quite a bit about uml
internals. You really don't have to feel guilty for sending me down the
wrong path ;-)

cheers,
Lennert
From: Jeff D. <jd...@ka...> - 2001-12-30 18:44:05
bu...@gn... said:
> I'm all in favor of killing the private page altogether. There has to
> be a better way to fix the 'errno = 0' problem.

libc must have a way of giving threads their own private errnos.
Otherwise UML needs locking around every system call, which I really,
really, really don't want to do.

For now, it's a good thing. It at least gives the tracing thread a
different errno from the rest of the kernel, which means that when you
step through code that contains a failing system call, errno doesn't get
munged back to zero as you step.

> Are you still planning on killing the tracing thread?

Yeah, that's why I don't want to give it any more jobs.

> Yeah, I deferred that for later. I was having lots of trouble with
> tasks having a wrong idea of current after I made it per-process, so I
> just ripped out all the global current_task/cpu_task stuff to make
> killing references easier.

OK, I put cpu_tasks back and it seems healthy.

> Would that have to be 'unshared'?

The thing that matters is that it shouldn't be unmapped. That code (and
maybe future data) is only used at boot time, where there's only one
thread running anyway.

> I'm not particularly attached to this patch :-)

Good :-)

> I don't really mind seeing it die a painful death, as it has taught me
> quite a bit about uml internals. You really don't have to feel guilty
> for sending me down the wrong path ;-)

Heh... I was happily going down that same path. I was chasing a bug when
your "the private page has to die" message came in, and I thought to
myself "Lennert is full of shit" and went back to debugging. About 5
minutes later, I saw vfork on a stack of a messed up process, and I
realized that the private page wasn't as private as I thought and that
you were right.

				Jeff
From: Lennert B. <bu...@gn...> - 2001-12-31 20:56:07
On Sun, Dec 30, 2001 at 03:04:31PM -0500, Jeff Dike wrote:
> > I'm all in favor of killing the private page altogether. There has to
> > be a better way to fix the 'errno = 0' problem.
>
> libc must have a way of giving threads their own private errnos.
> Otherwise UML needs locking around every system call, which I really,
> really, really don't want to do.

Me neither. I think pthreads uses the local descriptor table and %fs/%gs
for thread-private stuff, but I'm not sure. We can't use that solution
for uml anyway, since that would involve switching LDTs around on uml
kernel entry/exit, which is arguably worse than locking every syscall.

> For now, it's a good thing. It at least gives the tracing thread a
> different errno from the rest of the kernel, which means that when you
> step through code that contains a failing system call, errno doesn't
> get munged back to zero as you step.

Yes, I agree that what it does is needed, I'd just like it to be done
differently. The fact that 1023/1024 of the page is unused isn't really
what bothers me, and neither is the fact that memory usage this way is
O(threads) instead of O(cpus). It's more that it breaks the 'unified
address space' (the address space looking different depending on what
angle you look at it from), and the fact that the page is plain
inaccessible to other tasks, not even under a different address. It just
gives me a plain nasty feeling.

A fast hard_smp_processor_id() / are_we_in_tracing_thread() might just
be all we need.

> > Are you still planning on killing the tracing thread?
>
> Yeah, that's why I don't want to give it any more jobs.

I don't follow the mailing list (closely), do you perhaps have a pointer
to some details on how you would like to do this?

> > Would that have to be 'unshared'?
>
> The thing that matters is that it shouldn't be unmapped. That code (and
> maybe future data) is only used at boot time, where there's only one
> thread running anyway.

OK, that makes sense.

> I was chasing a bug when your "the private page has to die" message
> came in, and I thought to myself "Lennert is full of shit" and went
> back to debugging.

Hmm, well, I hope you're not holding any personal grudges against me
now.. :)

cheers,
Lennert
From: Jeff D. <jd...@ka...> - 2002-01-01 01:11:43
bu...@gn... said:
> We can't use that solution for uml anyway, since that would involve
> switching LDTs around on uml kernel entry/exit, which is arguably worse
> than locking every syscall.

I don't think so. There is probably more than one host system call per
uml kernel entry, plus there are only a few ways of entering the uml
kernel. If we had to lock every host system call, we'd have to know
whenever we were making a system call, including when calling into libc.
That's a nightmare. Doing some special stuff on uml kernel entry and
exit is nothing in comparison.

> Yes, I agree that what it does is needed, I'd just like it to be done
> differently.

Got any ideas?

One thing I've contemplated is separating the tracing thread into a
completely separate address space, maybe as a different binary, which
would make it sort of a UML loader.

Con - I'm planning on getting rid of the tracing thread, so it's not
      clear that the effort would be well-spent.
Pro - The tracing thread may be needed for gdb to work - I'm not sure
      about this, but if it does stay around, the effort may be worth it.
Con - It needs access to kernel memory for some of the things it does.
Pro - Those things don't need to be done by the tracing thread, so if it
      only did system call interception, then it probably wouldn't need
      access to UML memory.
Pro - Structurally, it would be nice to separate the tracing thread from
      everything else.

> The fact that 1023/1024 of the page is unused isn't really what
> bothers me, and neither is the fact that memory usage this way is
> O(threads) instead of O(cpus). It's more that it breaks the 'unified
> address space' (the address space looking different depending on what
> angle you look at it from), and the fact that the page is plain
> inaccessible to other tasks, not even under a different address. It
> just gives me a plain nasty feeling.

This is all true, but I was looking at the possibility of being able to
set current once for each thread, never change it again, and have the
thread be able to refer to it, no matter what stack it is running on.
The ability to have multiple threads refer to the same address and get
different data requires breaking the unified address space (which is
broken anyway, since UML process address spaces are completely different
in general).

I really like that idea, and if I can resurrect it somehow, I will,
unless you persuade me of its evilness.

> A fast hard_smp_processor_id() / are_we_in_tracing_thread() might just
> be all we need.

Yeah, that's what we're going with now.

> I don't follow the mailing list (closely), do you perhaps have a
> pointer to some details on how you would like to do this?

I'm not sure if I've ever written the whole thing down in one place.

The tracing thread needs to have all of its jobs taken away from it,
except for system call interception and turning tracing on and off.
Then the host needs a new system call interception path which just
delivers a signal to the process. The handler for that signal would
essentially be syscall_handler. Tracing would be disabled before
entering the handler, and re-enabled either by the handler or by
sys_sigreturn when the handler exits.

That would turn four context switches per system call into a signal
delivery plus signal return per system call.

				Jeff
From: Lennert B. <bu...@gn...> - 2002-01-02 10:46:01
On Mon, Dec 31, 2001 at 09:32:08PM -0500, Jeff Dike wrote:
> > Yes, I agree that what it does is needed, I'd just like it to be done
> > differently.
>
> Got any ideas?

Something like this? Or am I completely missing your point?

	int errnos[NR_CPUS+1];

	int *__errno_location()
	{
		if (we_are_currently_tracing())
			return &errnos[NR_CPUS];

		return &errnos[hard_smp_processor_id()];
	}

> The ability to have multiple threads refer to the same address and get
> different data requires breaking the unified address space (which is
> broken anyway, since UML process address spaces are completely
> different in general).
>
> I really like that idea, and if I can resurrect it somehow, I will,
> unless you persuade me of its evilness.

Breaking threads-share-LDT and thereby breaking pthreads, bind, jvm and
all the rest under uml :~( Plus of course the complexities of having to
implement LDT and TLB (well, mmap) flush IPIs, which would make getting
SMP to work MUCH harder. (I must say I like the idea of leaving flush
IPIs to the host kernel much better.)

cheers,
Lennert
From: Lennert B. <bu...@gn...> - 2002-01-06 14:31:00
On Mon, Dec 31, 2001 at 09:32:08PM -0500, Jeff Dike wrote:
> > We can't use that solution for uml anyway, since that would involve
> > switching LDTs around on uml kernel entry/exit, which is arguably
> > worse than locking every syscall.
>
> I don't think so. There is probably more than one host system call per
> uml kernel entry, plus there are only a few ways of entering the uml
> kernel. If we had to lock every host system call, we'd have to know
> whenever we were making a system call, including when calling into
> libc. That's a nightmare. Doing some special stuff on uml kernel entry
> and exit is nothing in comparison.

For the record, on my Athlon 750 a modify_ldt plus segment register
reload is about 570 cycles. (And I'm not saying either is a good
solution :)

> One thing I've contemplated is separating the tracing thread into a
> completely separate address space, maybe as a different binary, which
> would make it sort of a UML loader.

I don't really think the advantages would outweigh the disadvantages
here, at least not until some tasks are split out.

> That would turn four context switches per system call into a signal
> delivery plus signal return per system call.

What is the main switching bottleneck right now? I could imagine things
like forward_interrupts being high on the list.

cheers,
Lennert
From: Jeff D. <jd...@ka...> - 2002-01-06 18:06:24
bu...@gn... said:
> For the record, on my Athlon 750 a modify_ldt plus segment register
> reload is about 570 cycles.

Ouch. That will really hurt any attempt to get UML system calls close to
native ones in terms of speed.

> (And I'm not saying either is a good solution :)

No.

> What is the main switching bottleneck right now? I could imagine
> things like forward_interrupts being high on the list.

That's just a bunch of F_SETOWNs. I think that's bookkeeping, so that
shouldn't be a big deal.

I got the tracing thread out of the picture, but I may have added an
extra context switch, at least some of the time. The old context
switching did this:

	prev --> tracing thread --> next

with two host context switches. The current switching is supposed to do
this:

	prev --> next

one host context switch, with prev waking up next directly by writing
into its pipe. However, I bet it does this a good part of the time:

	prev				next

	write to next's pipe
					wake up and start doing stuff
	preempt next and sleep
	by reading own pipe
					continue doing stuff

which is three context switches, i.e. one more than the scheme which
involved the tracing thread.

Also, if UML is swapping, the memory switch is still pretty
heavy-weight, with a scan of the address space and remapping of anything
that had changed while it was asleep.

However, nothing beats real numbers. I'd be interested in seeing some
oprofile numbers for UML.

				Jeff
From: Alex P. <al...@pi...> - 2002-01-06 18:19:28
On Sat, 5 Jan 2002, Jeff Dike wrote:
> However, I bet it does this a good part of the time:
>
> 	prev				next
>
> 	write to next's pipe
> 					wake up and start doing stuff
> 	preempt next and sleep
> 	by reading own pipe
> 					continue doing stuff
>
> which is three context switches, i.e. one more than the scheme which
> involved the tracing thread.

Bear with me, a silly question: would it be possible to use semop to
both wake up the other process and put ourselves to sleep? (Semop can
both increment one sem and decrement another one in one syscall.)

I was always wondering why UML uses pipes for IPC instead of
shm/semaphores. If you could shed some light, I'll really appreciate it.

Thanks

-alex
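For reference, the single call Alex has in mind would look something
like this; the identifiers are made up, and the caveat in the comment is
the thing that would need checking:

	#include <sys/types.h>
	#include <sys/ipc.h>
	#include <sys/sem.h>

	/* One semaphore per process in the set 'semid': post the next
	 * process's semaphore and wait on our own in a single semop().
	 * Caveat: SysV applies the whole array atomically, so if the
	 * blocking decrement can't proceed, the increment is held back
	 * along with it. */
	void handoff(int semid, int my_sem, int next_sem)
	{
		struct sembuf ops[2] = {
			{ next_sem,  1, 0 },	/* up: wake the next process */
			{ my_sem,   -1, 0 },	/* down: sleep until woken */
		};

		semop(semid, ops, 2);
	}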
From: Jeff D. <jd...@ka...> - 2002-01-06 19:50:31
al...@pi... said:
> I was always wondering why UML uses pipes for IPC instead of
> shm/semaphores. If you could shed some light, I'll really appreciate
> it.

Because it never occurred to me.

If I can increment and decrement arbitrary semaphores in one system
call, that would definitely get rid of the scheduling oddities.

And I was really proud of having thought of using pipes :-)

				Jeff
From: Lennert B. <bu...@gn...> - 2002-01-06 20:08:43
On Sat, Jan 05, 2002 at 02:49:20PM -0500, Jeff Dike wrote:
> > I was always wondering why UML uses pipes for IPC instead of
> > shm/semaphores. If you could shed some light, I'll really appreciate
> > it.
>
> Because it never occurred to me.

I think people on l-k mostly agreed that pipes are the way to go for
lightweight user-level semaphores. I'm not sure why they didn't think
much of SysV sems, though.

> If I can increment and decrement arbitrary semaphores in one system
> call, that would definitely get rid of the scheduling oddities.

Are you sure that these scheduler oddities are happening right now? I
think if read/write on a pipe gives bad scheduler behavior, that is a
scheduler bug (pipelines have pretty much the same model).

cheers,
Lennert
From: Jeff D. <jd...@ka...> - 2002-01-06 22:10:15
bu...@gn... said:
> I think people on l-k mostly agreed that pipes are the way to go for
> lightweight user-level semaphores. I'm not sure why they didn't think
> much of SysV sems, though.

Hmmm, if someone could dig up a reference, that would be good.

> Are you sure that these scheduler oddities are happening right now?

I think so, but I have no idea how often. My data is old, though. The
very first context switcher I wrote for UML had a bug which caused a
hang if this scheduler oddity happened. And it happened often enough
that UML hung reliably on boot.

> I think if read/write on a pipe gives bad scheduler behavior, that is
> a scheduler bug (pipelines have pretty much the same model).

And I think it's possible that the scheduler runs the reader immediately
after the write on purpose. I have a hazy recollection that this was
done so that the written data would still be in L1 cache when the reader
ran.

				Jeff
From: Lennert B. <bu...@gn...> - 2002-01-07 21:36:39
On Sat, Jan 05, 2002 at 05:12:27PM -0500, Jeff Dike wrote:
> > I think people on l-k mostly agreed that pipes are the way to go for
> > lightweight user-level semaphores. I'm not sure why they didn't think
> > much of SysV sems, though.
>
> Hmmm, if someone could dig up a reference, that would be good.

I'll ask around.

> > I think if read/write on a pipe gives bad scheduler behavior, that
> > is a scheduler bug (pipelines have pretty much the same model).
>
> And I think it's possible that the scheduler runs the reader
> immediately after the write on purpose. I have a hazy recollection
> that this was done so that the written data would still be in L1 cache
> when the reader ran.

Grep for wake_up in fs/pipe.c. Synchronous wakeups
(other-end-of-the-pipe-runs-first) are only done if:

- we are reading and there is no data left to read; signal all writers
- we are writing and the pipe buffer is full; signal all readers

In all other cases, 'standard' wakeups are done (which probably means
that if the woken up process has a higher scheduler priority, the
current process is preempted, otherwise the current process keeps
running).

cheers,
Lennert
From: Shane K. <sh...@ti...> - 2002-01-06 22:27:10
On 2002-01-06 15:06:31 -0500, Lennert Buytenhek wrote:
> On Sat, Jan 05, 2002 at 02:49:20PM -0500, Jeff Dike wrote:
> > > I was always wondering why UML uses pipes for IPC instead of
> > > shm/semaphores. If you could shed some light, I'll really
> > > appreciate it.
> >
> > Because it never occurred to me.
>
> I think people on l-k mostly agreed that pipes are the way to go for
> lightweight user-level semaphores. I'm not sure why they didn't think
> much of SysV sems, though.

In my experience (as a user-level person), pipes have several
advantages, all related to being able to use poll()/select() on them.
These include being able to wait on more than one, being able to wait on
I/O at the same time, and being able to specify a timeout.

--
Shane
Carpe Diem
From: Jeff D. <jd...@ka...> - 2002-01-07 00:25:46
sh...@ti... said:
> In my experience (as a user-level person), pipes have several
> advantages, all related to being able to use poll()/select() on them.
> These include being able to wait on more than one, being able to wait
> on I/O at the same time, and being able to specify a timeout.

None of these matter in the situation we are talking about.

The current UML context switcher is basically this:

	write(to->thread.switch_pipe[1], &c, sizeof(c));
	read(from->thread.switch_pipe[0], &c, sizeof(c));

The next process is sleeping in that read. The write wakes it up and it
goes about its business. Meanwhile, the outgoing process goes into that
read and waits to be woken up by some other process going to sleep and
writing a byte into its pipe.

So, there are no issues with being able to monitor multiple descriptors
at once or being able to set timeouts.

				Jeff
From: Shane K. <sh...@ti...> - 2002-01-07 11:59:45
On 2002-01-06 19:27:52 -0500, Jeff Dike wrote:
> sh...@ti... said:
> > In my experience (as a user-level person), pipes have several
> > advantages, all related to being able to use poll()/select() on
> > them. These include being able to wait on more than one, being able
> > to wait on I/O at the same time, and being able to specify a
> > timeout.
>
> None of these matter in the situation we are talking about.
>
> The current UML context switcher is basically this:
>
> 	write(to->thread.switch_pipe[1], &c, sizeof(c));
> 	read(from->thread.switch_pipe[0], &c, sizeof(c));
>
> The next process is sleeping in that read. The write wakes it up and
> it goes about its business. Meanwhile, the outgoing process goes into
> that read and waits to be woken up by some other process going to
> sleep and writing a byte into its pipe.

Apologies for being lazy and not Reading The Fine Code, but... Does each
process have its own pipe (soon to be semaphore)? If not, it seems like
you'd get scheduler starvation (which I in fact saw in a version of UML
here a few months ago), i.e. the process that writes then reads
immediately and nothing else gets to go. Not all the time, of course,
but often. Seems like UML must maintain a run queue of some sort.

> So, there are no issues with being able to monitor multiple
> descriptors at once or being able to set timeouts.

Agreed.

--
Shane
Carpe Diem
From: Adam H. <ad...@do...> - 2002-01-07 12:49:32
On Mon, 7 Jan 2002, Shane Kerr wrote:
> Does each process have its own pipe (soon to be semaphore)? If not,
> it seems like you'd get scheduler starvation (which I in fact saw in a
> version of UML here a few months ago), i.e. the process that writes
> then reads immediately and nothing else gets to go. Not all the time,
> of course, but often.

Each process has a pipe that it reads from. When one process wants to
switch to another, the current process writes the command to the pipe of
the new process, then sleeps waiting on a read on its own pipe. The new
process, which is sleeping on a read of its own pipe, now has data, and
wakes up to receive the command.
From: Lennert B. <bu...@gn...> - 2002-01-06 19:09:38
On Sat, Jan 05, 2002 at 01:08:31PM -0500, Jeff Dike wrote:
> > What is the main switching bottleneck right now? I could imagine
> > things like forward_interrupts being high on the list.
>
> That's just a bunch of F_SETOWNs. I think that's bookkeeping, so that
> shouldn't be a big deal.

Again on my Athlon 750, a null syscall (getpid) is ~290 cycles. F_SETOWN
to the current PID is about 370. Context switch seems to be about 2450
cycles with different MMs, and 2350 with shared MMs (threads).

> I got the tracing thread out of the picture, but I may have added an
> extra context switch, at least some of the time. The old context
> switching did this:
>
> 	prev --> tracing thread --> next
>
> with two host context switches.

How did the tracing thread wake up 'next'? If it did it by cont'ing it
and then going to sleep itself, you might have had the same issue. (Not
sure how prev would wake up the tracing thread, but probably by sending
a signal to itself, in which case it wouldn't be an issue.)

> However, nothing beats real numbers. I'd be interested in seeing some
> oprofile numbers for UML.

Me too..

cheers,
Lennert
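For reference, cycle counts like these are typically taken by reading
the x86 time stamp counter around the call under test. A minimal sketch,
not Lennert's actual harness (loop overhead is included in the average):

	#include <stdio.h>
	#include <unistd.h>
	#include <sys/syscall.h>

	/* Read the TSC with GCC inline asm; rdtsc returns the low half
	 * in %eax and the high half in %edx. */
	static inline unsigned long long rdtsc(void)
	{
		unsigned int lo, hi;

		__asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
		return ((unsigned long long) hi << 32) | lo;
	}

	int main(void)
	{
		unsigned long long start, end;
		int i, n = 100000;

		start = rdtsc();
		for (i = 0; i < n; i++)
			syscall(SYS_getpid);	/* force a real syscall */
		end = rdtsc();

		printf("%llu cycles per getpid()\n", (end - start) / n);
		return 0;
	}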
From: Jeff D. <jd...@ka...> - 2002-01-06 20:01:11
bu...@gn... said:
> Again on my Athlon 750, a null syscall (getpid) is ~290 cycles.
> F_SETOWN to the current PID is about 370.

So, F_SETOWN adds 80 cycles to the null syscall. That doesn't seem like
too much.

> Context switch seems to be about 2450 cycles with different MMs, and
> 2350 with shared MMs (threads).

I would have thought that shared mms would have made a larger difference
than that...

> How did the tracing thread wake up 'next'? If it did it by cont'ing
> it and then going to sleep itself, you might have had the same issue.

You're right. So, the old mechanism would look like this at least some
of the time:

	prev --> tracing thread --> next --> tracing thread sleeps --> next

which is 4 host context switches.

> (Not sure how prev would wake up the tracing thread, but probably by
> sending a signal to itself, in which case it wouldn't be an issue.)

Yeah, it did it by sending a SIGUSR1 to itself.

				Jeff
From: Lennert B. <bu...@gn...> - 2002-01-06 20:14:30
On Sat, Jan 05, 2002 at 03:03:01PM -0500, Jeff Dike wrote:
> > Again on my Athlon 750, a null syscall (getpid) is ~290 cycles.
> > F_SETOWN to the current PID is about 370.
>
> So, F_SETOWN adds 80 cycles to the null syscall. That doesn't seem
> like too much.

Nope. How many interrupts do we forward typically? 5 or so?

> > Context switch seems to be about 2450 cycles with different MMs, and
> > 2350 with shared MMs (threads).
>
> I would have thought that shared mms would have made a larger
> difference than that...

This is the quick test program I used. I figured it doesn't really
matter whether parent-continues-running or woken-up-task-runs-first is
used on pipe write, since if the woken up task runs first, it will most
likely run to completion (i.e. complete read, issue write, block on
read) instead of being preempted.

Do you see any obvious mistakes in the test?

cheers,
Lennert

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>

int p[2];
int q[2];

void *a(void *x)
{
	unsigned char c;
	unsigned long i;

	for (i = 0; i < 2000000; i++) {
		write(p[1], &c, 1);
		read(q[0], &c, 1);
	}

	return NULL;
}

void *b(void *x)
{
	unsigned char c;
	unsigned long i;

	for (i = 0; i < 2000000; i++) {
		read(p[0], &c, 1);
		write(q[1], &c, 1);
	}

	return NULL;
}

int main()
{
	pthread_t X;
	int f;

	pipe(p);
	pipe(q);

#if 1
	f = fork();
	if (f < 0) {
		perror("fork");
		exit(-1);
	}

	if (f)
		a(NULL);
	else
		b(NULL);
#else
	pthread_create(&X, NULL, b, NULL);
	a(NULL);
#endif

	return 0;
}
From: Jeff D. <jd...@ka...> - 2002-01-06 22:02:53
bu...@gn... said:
> How many interrupts do we forward typically? 5 or so?

Pretty much. With my UMLs 7 or 8 is typical.

> Do you see any obvious mistakes in the test?

For uniformity, using clone with !CLONE_VM/CLONE_VM instead of
fork/pthread_create would be better. Throwing pthreads in adds a
variable which doesn't need to be there.

				Jeff
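A sketch of what that substitution might look like for the 'b' side of
the test above; the stack size and flags here are arbitrary choices:

	#define _GNU_SOURCE
	#include <sched.h>
	#include <signal.h>
	#include <stdlib.h>

	extern void *b(void *x);	/* the b() from the test program */

	static int b_trampoline(void *x)
	{
		b(x);
		return 0;
	}

	/* Start b() via clone() so that the only difference between the
	 * two runs is the CLONE_VM flag. */
	static int start_b(int shared_vm)
	{
		char *stack = malloc(64 * 1024);

		/* the i386 stack grows down, so pass the top of the area */
		return clone(b_trampoline, stack + 64 * 1024,
			     SIGCHLD | (shared_vm ? CLONE_VM : 0), NULL);
	}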
From: Lennert B. <bu...@gn...> - 2002-01-07 21:31:09
On Sat, Jan 05, 2002 at 05:05:04PM -0500, Jeff Dike wrote:
> > How many interrupts do we forward typically? 5 or so?
>
> Pretty much. With my UMLs 7 or 8 is typical.

That would be about 3 kilocycles, more than the cost of a host context
switch. Syscalls add up pretty quickly, I guess.

> > Do you see any obvious mistakes in the test?
>
> For uniformity, using clone with !CLONE_VM/CLONE_VM instead of
> fork/pthread_create would be better. Throwing pthreads in adds a
> variable which doesn't need to be there.

Sure, but I figured it wouldn't matter much, as pthreads won't run
during the loops. I'll retry with clone, and report back if it makes a
radical difference.

cheers,
Lennert