From: Jeff D. <jd...@ka...> - 2001-12-31 03:25:09
bu...@gn... said:
> Last time I looked it didn't (admittedly a long while ago).

When I was chasing the gdb pthread hang in UML a while back, gdb noticed
when new threads happened and printed out little messages about it. It
would be very cool to induce it to do the same thing when the extra idle
threads start up.

Jeff
From: Jeff D. <jd...@ka...> - 2002-01-02 15:12:40
bu...@gn... said:
> int errnos[NR_CPUS+1];
> int *__errno_location() {
>	...
> }

Hmmm, I didn't think of trying that. Does it actually work?

> Breaking threads-share-LDT and thereby breaking pthreads, bind, jvm
> and all the rest under uml :~(

Yeah, the ldt is the thing that makes this all not work. However, not all
arches have ldts, so the private page scheme might work there.

Jeff
From: Jeff D. <jd...@ka...> - 2002-01-07 15:07:09
sh...@ti... said:
> Does each process have it's own pipe (soon to be semaphore)?

Yes.

> If not, it seems like you'd get scheduler starvation (which I in fact
> saw in a version of UML here a few months ago). I.e. the process that
> writes then reads immediately and nothing else gets to go.

The only way for that to happen is for something else to have switched
back to it.

Jeff
From: Jeff D. <jd...@ka...> - 2002-01-07 18:15:44
bu...@gn... said:
> Are you sure that these scheduler oddities are happening right now?
> I think if read/write on pipe gives bad scheduler behavior, that is a
> scheduler bug (pipelines have pretty much the same model).

I did an experiment to see how often the reader of a pipe runs immediately
(i.e. before the writer returns from the write). The code is below.

Two threads run in the same address space scheduling each other.
Mis-schedules are detected by a local counter (expect) being different
from a global one (counter) because the other thread has bumped it.

It turns out that this happens a significant amount of the time, but not
all. On both 2.2.20 and 2.4.17, I see about 40% mis-scheds:

2.2.20:
% for i in 1 2 3 4 5 6 7 8 9 10; do ./a.out; done | awk '{print; n++; total += $3; mis += $6} END { print "Ave mis-scheds " mis/n " " mis * 100 / total "%" }'
scheds = 200000	mis-scheds = 95805
scheds = 200000	mis-scheds = 74020
scheds = 200000	mis-scheds = 88586
scheds = 200000	mis-scheds = 86637
scheds = 200000	mis-scheds = 81900
scheds = 200000	mis-scheds = 86731
scheds = 200000	mis-scheds = 93272
scheds = 200000	mis-scheds = 91755
scheds = 200000	mis-scheds = 74975
scheds = 200000	mis-scheds = 97391
Ave mis-scheds 87107.2 43.5536%

2.4.17:
% for i in 1 2 3 4 5 6 7 8 9 10; do ./a.out; done | awk '{print; n++; total += $3; mis += $6} END { print "Ave mis-scheds " mis/n " " mis * 100 / total "%" }'
scheds = 200000	mis-scheds = 80090
scheds = 200000	mis-scheds = 83951
scheds = 200000	mis-scheds = 72304
scheds = 200000	mis-scheds = 83453
scheds = 200000	mis-scheds = 84245
scheds = 200000	mis-scheds = 81430
scheds = 200000	mis-scheds = 77704
scheds = 200000	mis-scheds = 86758
scheds = 200000	mis-scheds = 92188
scheds = 200000	mis-scheds = 83848
Ave mis-scheds 82597.1 41.2985%

So, assuming this holds for UML threads, this means that UML context
switches involve ~1.8 host context switches (3 * .4 + 1 * .6) on average
instead of the desired 1.

This may be high because sometimes the reader will do its work and
schedule back to the writer in one quantum. In this case, the writer will
run and go right through its own read without sleeping, making this a one
host context switch transition (with the host switch happening in the
wrong place).

Jeff

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <sched.h>

char stack[65536];

int counter = 0;
int sched = 0;
int mis_sched = 0;

static void switcher(int me, int you)
{
	int i, expect;
	char c;

	for(i = 0; i < 100000; i++){
		expect = counter + 1;
		counter++;
		sched++;
		if(write(you, &c, sizeof(c)) < sizeof(c)){
			perror("write");
		}
		if(expect != counter) mis_sched++;
		if(read(me, &c, sizeof(c)) < sizeof(c)){
			perror("read");
		}
	}
}

static int child(void *arg)
{
	int *fds = arg;
	char c;

	switcher(fds[0], fds[1]);
	if(write(fds[1], &c, sizeof(c)) < sizeof(c)){
		perror("write");
	}
	return(0);
}

int main(int argc, char **argv)
{
	int parent_pipe[2], child_pipe[2], child_fds[2], pid;
	char c;

	if((pipe(parent_pipe) < 0) || (pipe(child_pipe) < 0)){
		perror("pipe");
		exit(1);
	}
	child_fds[0] = child_pipe[0];
	child_fds[1] = parent_pipe[1];
	pid = clone(child, &stack[sizeof(stack)/sizeof(stack[0])], CLONE_VM,
		    child_fds);
	if(pid < 0){
		perror("clone");
		exit(1);
	}
	if(read(parent_pipe[0], &c, sizeof(c)) < sizeof(c)){
		perror("read");
		exit(1);
	}
	switcher(parent_pipe[0], child_pipe[1]);
	printf("scheds = %d\tmis-scheds = %d\n", sched, mis_sched);
}
From: Jeff D. <jd...@ka...> - 2002-01-07 19:50:56
al...@pi... said:
> Bear with me, a silly question: Would it be possible to use semop to
> both wakeup the other process and put ourselves to sleep? (Semop can
> both increment one sem and decrement another one in one syscall).

Semaphores can't work. What we want is an atomic
wake-the-other-guy-and-go-to-sleep operation. When you atomically up one
semaphore and down another, either both operations succeed, in which case
the semop returns without sleeping, or the down fails and you sleep, but
the up doesn't happen either (and the other guy doesn't wake up) because
the two ops are atomic.

It was a nice idea though...

Jeff
From: Alex P. <al...@pi...> - 2002-01-07 20:09:59
On Mon, 7 Jan 2002, Jeff Dike wrote:
> al...@pi... said:
> > Bear with me, a silly question: Would it be possible to use semop to
> > both wakeup the other process and put ourselves to sleep? (Semop can
> > both increment one sem and decrement another one in one syscall).
>
> Semaphores can't work. What we want is an atomic
> wake-the-other-guy-and-go-to-sleep operation. When you atomically up
> one semaphore and down another, either both operations succeed, in
> which case the semop returns without sleeping, or the down fails and
> you sleep, but the up doesn't happen either (and the other guy doesn't
> wake up) because the two ops are atomic.

Semaphore can only fail when you use IPC_NOWAIT. Since you won't be using
this flag, it won't fail, hence no problem.

-alex
From: Shane K. <sh...@ti...> - 2002-01-07 21:39:48
On 2002-01-07 15:17:05 -0500, Alex Pilosov wrote:
> On Mon, 7 Jan 2002, Jeff Dike wrote:
> > al...@pi... said:
> > > Bear with me, a silly question: Would it be possible to use semop
> > > to both wakeup the other process and put ourselves to sleep?
> > > (Semop can both increment one sem and decrement another one in one
> > > syscall).
> >
> > Semaphores can't work. What we want is an atomic
> > wake-the-other-guy-and-go-to-sleep operation. When you atomically
> > up one semaphore and down another, either both operations succeed,
> > in which case the semop returns without sleeping, or the down fails
> > and you sleep, but the up doesn't happen either (and the other guy
> > doesn't wake up) because the two ops are atomic.
>
> Semaphore can only fail when you use IPC_NOWAIT. Since you won't be
> using this flag, it won't fail, hence no problem.

What Mr. Dike is saying is:

1. semop() calls operate on a set of semaphores
2. all operations occur at once

The two operations needed are:

A. signal another process to execute
B. wait until I can execute

A depends on B; since in the proposed semop() both must happen at once,
neither will ever happen.

BUT...

Personally, I'm wondering what is to prevent the application from simply
making two calls to semop(). Perform A then B:

	{
		struct sembuf sb;

		sb.sem_num = 0;
		sb.sem_flg = 0;

		/* wake next process... */
		sb.sem_op = 1;
		semop(next_process->context_semid, &sb, 1);

		/* ...and wait until we're ready to run */
		sb.sem_op = -1;
		semop(this_process->context_semid, &sb, 1);
	}

This may be no better than using pipes, but it might be. Semaphores do
leave icky structures lying around (type "ipcs" to see them), have
namespace problems, and so on.
If UML is a pthread application (can't remember), you can use condition
variables to do the same thing without the spew, but you have to use a
mutex as well:

	pthread_mutex_lock(&next_process->context_mutex);
	pthread_cond_signal(&next_process->context_cond);
	pthread_mutex_unlock(&next_process->context_mutex);

	pthread_cond_wait(&this_process->context_cond,
			  &this_process->context_mutex);

The mutex should be locked for each process when it starts. Again, this
might not be any better than pipes or semaphores, but you never know. In
FreeBSD, for instance, mutexes with only two threads are extremely fast.

--
Shane
Carpe Diem
From: Matt Z. <md...@de...> - 2002-01-07 23:05:22
On Mon, Jan 07, 2002 at 10:39:42PM +0100, Shane Kerr wrote:
> If UML is a pthread application (can't remember), you can use condition
> variables to do the same thing without the spew, but you have to use a
> mutex as well:
>
> 	pthread_mutex_lock(&next_process->context_mutex);
> 	pthread_cond_signal(&next_process->context_cond);
> 	pthread_mutex_unlock(&next_process->context_mutex);
> 	pthread_cond_wait(&this_process->context_cond,
> 			  &this_process->context_mutex);
>
> The mutex should be locked for each process when it starts. Again,
> this might not be any better than pipes or semaphores, but you never
> know. In FreeBSD, for instance, mutexes with only two threads are
> extremely fast.

This would definitely be better, since condition variables are supposed
to unlock the mutex and go to sleep atomically. I think in the case of
pthreads this is done using a mutex, though, so it is atomic from a
synchronization point of view, but not for purposes of scheduling. For
this to work for UML, there would have to be some in-kernel
implementation of condition variables, I think.

--
 - mdz
From: Matt Z. <md...@de...> - 2002-01-07 21:41:15
On Mon, Jan 07, 2002 at 03:17:05PM -0500, Alex Pilosov wrote:
> On Mon, 7 Jan 2002, Jeff Dike wrote:
> > al...@pi... said:
> > > Bear with me, a silly question: Would it be possible to use semop
> > > to both wakeup the other process and put ourselves to sleep?
> > > (Semop can both increment one sem and decrement another one in one
> > > syscall).
> >
> > Semaphores can't work. What we want is an atomic
> > wake-the-other-guy-and-go-to-sleep operation. When you atomically
> > up one semaphore and down another, either both operations succeed,
> > in which case the semop returns without sleeping, or the down fails
> > and you sleep, but the up doesn't happen either (and the other guy
> > doesn't wake up) because the two ops are atomic.
>
> Semaphore can only fail when you use IPC_NOWAIT. Since you won't be
> using this flag, it won't fail, hence no problem.

It doesn't have to fail, though. The thread can be de-scheduled after one
semaphore op, and before the other, causing a context switch just to
block on the other semaphore op, hence the desire for an atomic operation
to do both.

--
 - mdz
From: Jeff D. <jd...@ka...> - 2002-01-07 23:28:11
md...@de... said:
> The thread can be de-scheduled after one semaphore op, and before the
> other, causing a context switch just to block on the other semaphore
> op, hence the desire for an atomic operation to do both.

This is actually non-atomic. I went looking in the sem* manpages for a
sequential flag, and it's not there.

Jeff
From: Matt Z. <md...@de...> - 2002-01-08 02:38:32
On Mon, Jan 07, 2002 at 06:29:35PM -0500, Jeff Dike wrote:
> md...@de... said:
> > The thread can be de-scheduled after one semaphore op, and before
> > the other, causing a context switch just to block on the other
> > semaphore op, hence the desire for an atomic operation to do both.
>
> This is actually non-atomic. I went looking in the sem* manpages for a
> sequential flag, and it's not there.

I thought that was what I said... since the thread can be preempted
between the two operations, they cannot be performed as one atomic
operation.

--
 - mdz
From: Jeff D. <jd...@ka...> - 2002-01-07 21:58:48
al...@pi... said:
> Semaphore can only fail when you use IPC_NOWAIT. Since you won't be
> using this flag, it won't fail, hence no problem.

I'm not talking about it failing. I'm talking about the atomicity of the
two operations. Neither happens until they both happen. If the process
sleeps, which is a goal, then the down hasn't happened yet. So, neither
has the up. So, the other process isn't awakened.

Jeff
From: Lennert B. <bu...@gn...> - 2002-01-07 22:05:01
On Mon, Jan 07, 2002 at 04:59:17PM -0500, Jeff Dike wrote:
> > Semaphore can only fail when you use IPC_NOWAIT. Since you won't be
> > using this flag, it won't fail, hence no problem.
>
> I'm not talking about it failing. I'm talking about the atomicity of
> the two operations. Neither happens until they both happen. If the
> process sleeps, which is a goal, then the down hasn't happened yet.
> So, neither has the up. So, the other process isn't awakened.

One thing I've contemplated hacking up for UML to see if it makes any
difference is to make another syscall path into the kernel which would
take 'compound syscall blocks', basically an array of longs (or whatever)
that describe a set of syscalls to be executed sequentially. In this way,
you could tell the kernel to do all your switching syscalls in one
syscall, avoiding the 300-something cycles per user-kernel-user
transition, and in the pipe switching case it would give you the
additional benefit of guaranteed non-synchronous wakeups (as the kernel
isn't preempted until it returns back to userspace).

Does this sound completely stupid?

cheers,
Lennert
From: Jeff D. <jd...@ka...> - 2002-01-07 23:30:04
bu...@gn... said:
> One thing I've contemplated hacking up for UML to see if it makes any
> difference is to make another syscall path into the kernel which would
> take 'compound syscall blocks', basically an array of longs (or
> whatever) that describe a set of syscalls to be executed sequentially.
> In this way, you could tell the kernel to do all your switching
> syscalls in one syscall, avoiding the 300-something cycles per
> user-kernel-user transition, and in the pipe switching case it would
> give you the additional benefit of guaranteed non-synchronous wakeups
> (as the kernel isn't preempted until it returns back to userspace).

There's no chance of this making it into the kernel.

Anyway, I'm eventually going to go from one host thread per UML thread to
one host thread per virtual processor. Then, the host scheduling behavior
will stop being an issue.

Jeff
From: Lennert B. <bu...@gn...> - 2002-01-07 23:38:45
On Mon, Jan 07, 2002 at 06:31:31PM -0500, Jeff Dike wrote:
> > One thing I've contemplated hacking up for UML to see if it makes
> > any difference is to make another syscall path into the kernel which
> > would take 'compound syscall blocks', basically an array of longs
> > (or whatever) that describe a set of syscalls to be executed
> > sequentially. In this way, you could tell the kernel to do all your
> > switching syscalls in one syscall, avoiding the 300-something cycles
> > per user-kernel-user transition, and in the pipe switching case it
> > would give you the additional benefit of guaranteed non-synchronous
> > wakeups (as the kernel isn't preempted until it returns back to
> > userspace).
>
> There's no chance of this making it in the kernel.

Well, doh ("...to see if it makes any difference...")

> Anyway, I'm eventually going to go from one host thread per UML thread
> to one host thread per virtual processor. Then, the host scheduling
> behavior will stop being an issue.

You don't think remapping the address space on every context switch would
be slow (it would definitely seem slower than reloading %cr3...)?

cheers,
Lennert
From: Jeff D. <jd...@ka...> - 2002-01-08 01:33:53
bu...@gn... said:
> You don't think remapping address space on every context switch would
> be slow (it would definitely seem slower than reloading %cr3..) ?

If I have direct access to switch_mm, then it'll be the same speed as (or
close to) a native memory context switch.

The plan is to allow processes to manipulate host address spaces. So
there will be one host thread per virtual processor, and one host address
space (i.e. mm_struct) per UML thread. The processor threads will switch
between address spaces without needing to manually remap a single address
space.

Jeff
From: Lennert B. <bu...@gn...> - 2002-01-08 12:41:47
Sounds good. Count me in.

On Mon, Jan 07, 2002 at 08:35:16PM -0500, Jeff Dike wrote:
> > You don't think remapping address space on every context switch
> > would be slow (it would definitely seem slower than reloading
> > %cr3..) ?
>
> If I have direct access to switch_mm, then it'll be the same speed (or
> close) to a native memory context switch.
>
> The plan is to allow processes to manipulate host address spaces. So
> there will be one host thread per virtual processor, and one host
> address space (i.e. mm_struct) per UML thread. The processor threads
> will switch between address spaces without needing to manually remap
> its single address space.
>
> Jeff
From: Alex P. <al...@pi...> - 2002-01-07 23:58:09
On Mon, 7 Jan 2002, Lennert Buytenhek wrote:
> > two operations. Neither happens until they both happen. If the
> > process sleeps, which is a goal, then the down hasn't happened yet.
> > So, neither has the up. So, the other process isn't awakened.

Sorry, my mistake.

> One thing I've contemplated hacking up for UML to see if it makes any
> difference is to make another syscall path into the kernel which would
> take 'compound syscall blocks', basically an array of longs (or
> whatever) that describe a set of syscalls to be executed sequentially.
> In this way, you could tell the kernel to do all your switching
> syscalls in one syscall, avoiding the 300-something cycles per
> user-kernel-user transition, and in the pipe switching case it would
> give you the additional benefit of guaranteed non-synchronous wakeups
> (as the kernel isn't preempted until it returns back to userspace).

Sounds interesting to me; I wonder if anyone has thought of this before.

Error handling is interesting in this situation: what do you do when one
of the syscalls in the chain fails? I wonder if there are any pieces of
the kernel which would be confused by these (ex: something chained to
exit())?

-alex
From: Jeff D. <jd...@ka...> - 2002-01-08 01:44:39
al...@pi... said:
> Sorry, my mistake.

I did use the word 'fail' in my original message, which was a bad
choice...

> Sounds interesting to me, wonder if anyone thought of this before.

Yes, they have.

> Error handling is interesting in this situation, what do you do when
> one of syscalls in chain fails? I wonder if there are any pieces of
> kernel which would be confused by these (ex: something chained to
> exit())?

Yup :-)

Jeff
From: Lennert B. <bu...@gn...> - 2002-01-08 10:06:05
On Mon, Jan 07, 2002 at 08:42:06PM -0500, Jeff Dike wrote:
> > Sounds interesting to me, wonder if anyone thought of this before.
>
> Yes, they have.

Got a pointer?

cheers,
Lennert
From: Jeff D. <jd...@ka...> - 2002-01-07 23:19:33
sh...@ti... said:
> Personally, I'm wondering what is to prevent the application from
> simply making two calls to semop().
> This may be no better than using pipes, but it might be.

Right, it has exactly the same host scheduling properties as the
write-then-read. And pipes are a lot easier to understand.

> If UML is a pthread application (can't remember),

It's not.

> you can use condition variables to do the same thing without the spew,
> but you have to use a mutex as well:

I'd still want to know what the underlying implementation is.

Jeff
From: Jeff D. <jd...@ka...> - 2002-01-07 23:26:54
bu...@gn... said:
> Grep for wake_up in fs/pipe.c. Synchronous wakeups
> (other-end-of-the-pipe-runs-first) are only done if
>  - we are reading and there is no data left to read; signal all
>    writers
>  - we are writing and the pipe buffer is full; signal all readers
> In all other cases, 'standard' wakeups are done (which probably means
> that if the woken up process has a higher scheduler priority, the
> current process is preempted, otherwise the current process keeps
> running).

Yeah, from my discovery that the other end runs first only ~40% of the
time, I figured that there wasn't a rule firing, just the normal
rescheduling.

Jeff