From: Jeff D. <jd...@ka...> - 2001-12-31 03:25:09
bu...@gn... said:
> Last time I looked it didn't (admittedly a long while ago).

When I was chasing the gdb pthread hang in UML a while back, gdb noticed
when new threads happened and printed out little messages about it. It
would be very cool to induce it to do the same thing when the extra idle
threads start up.

Jeff
From: Jeff D. <jd...@ka...> - 2002-01-02 15:12:40
bu...@gn... said:
> int errnos[NR_CPUS+1];
> int *__errno_location() {
>	...
> }

Hmmm, I didn't think of trying that. Does it actually work?

> Breaking threads-share-LDT and thereby breaking pthreads, bind, jvm
> and all the rest under uml :~(

Yeah, the ldt is the thing that makes this all not work. However, not all
arches have ldts, so the private page scheme might work there.

Jeff
From: Jeff D. <jd...@ka...> - 2002-01-07 15:07:09
sh...@ti... said:
> Does each process have it's own pipe (soon to be semaphore)?

Yes.

> If not, it seems like you'd get scheduler starvation (which I in fact
> saw in a version of UML here a few months ago). I.e. the process that
> writes then reads immediately and nothing else gets to go.

The only way for that to happen is for something else to have switched
back to it.

Jeff
From: Jeff D. <jd...@ka...> - 2002-01-07 18:15:44
bu...@gn... said:
> Are you sure that these scheduler oddities are happening right now?
> I think if read/write on pipe gives bad scheduler behavior, that is a
> scheduler bug (pipelines have pretty much the same model).

I did an experiment to see how often the reader of a pipe runs immediately
(i.e. before the writer returns from the write). The code is below.

Two threads run in the same address space scheduling each other.
Mis-schedules are detected by a local counter (expect) being different
from a global one (counter) because the other thread has bumped it.

It turns out that this happens a significant amount of the time, but not
all. On both 2.2.20 and 2.4.17, I see about 40% mis-scheds:

2.2.20:
% for i in 1 2 3 4 5 6 7 8 9 10; do ./a.out; done | awk '{print; n++; total += $3; mis += $6} END { print "Ave mis-scheds " mis/n " " mis * 100 / total "%" }'
scheds = 200000	mis-scheds = 95805
scheds = 200000	mis-scheds = 74020
scheds = 200000	mis-scheds = 88586
scheds = 200000	mis-scheds = 86637
scheds = 200000	mis-scheds = 81900
scheds = 200000	mis-scheds = 86731
scheds = 200000	mis-scheds = 93272
scheds = 200000	mis-scheds = 91755
scheds = 200000	mis-scheds = 74975
scheds = 200000	mis-scheds = 97391
Ave mis-scheds 87107.2 43.5536%

2.4.17:
% for i in 1 2 3 4 5 6 7 8 9 10; do ./a.out; done | awk '{print; n++; total += $3; mis += $6} END { print "Ave mis-scheds " mis/n " " mis * 100 / total "%" }'
scheds = 200000	mis-scheds = 80090
scheds = 200000	mis-scheds = 83951
scheds = 200000	mis-scheds = 72304
scheds = 200000	mis-scheds = 83453
scheds = 200000	mis-scheds = 84245
scheds = 200000	mis-scheds = 81430
scheds = 200000	mis-scheds = 77704
scheds = 200000	mis-scheds = 86758
scheds = 200000	mis-scheds = 92188
scheds = 200000	mis-scheds = 83848
Ave mis-scheds 82597.1 41.2985%

So, assuming this holds for UML threads, this means that UML context
switches involve ~1.8 host context switches (3 * .4 + 1 * .6) on average
instead of the desired 1.

This may be high because sometimes the reader will do its work and
schedule back to the writer in one quantum. In this case, the writer will
run and go right through its own read without sleeping, making this a one
host context switch transition (with the host switch happening in the
wrong place).

Jeff

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <sched.h>

char stack[65536];

int counter = 0;
int sched = 0;
int mis_sched = 0;

static void switcher(int me, int you)
{
	int i, expect;
	char c;

	for(i = 0; i < 100000; i++){
		expect = counter + 1;
		counter++;
		sched++;
		if(write(you, &c, sizeof(c)) < sizeof(c)){
			perror("write");
		}
		if(expect != counter) mis_sched++;
		if(read(me, &c, sizeof(c)) < sizeof(c)){
			perror("read");
		}
	}
}

static int child(void *arg)
{
	int *fds = arg;
	char c;

	switcher(fds[0], fds[1]);
	if(write(fds[1], &c, sizeof(c)) < sizeof(c)){
		perror("write");
	}
	return(0);
}

int main(int argc, char **argv)
{
	int parent_pipe[2], child_pipe[2], child_fds[2], pid;
	char c;

	if((pipe(parent_pipe) < 0) || (pipe(child_pipe) < 0)){
		perror("pipe");
		exit(1);
	}
	child_fds[0] = child_pipe[0];
	child_fds[1] = parent_pipe[1];
	pid = clone(child, &stack[sizeof(stack)/sizeof(stack[0])], CLONE_VM,
		    child_fds);
	if(pid < 0){
		perror("clone");
		exit(1);
	}
	if(read(parent_pipe[0], &c, sizeof(c)) < sizeof(c)){
		perror("read");
		exit(1);
	}
	switcher(parent_pipe[0], child_pipe[1]);
	printf("scheds = %d\tmis-scheds = %d\n", sched, mis_sched);
}
From: Jeff D. <jd...@ka...> - 2002-01-07 19:50:56
al...@pi... said:
> Bear with me, a silly question: Would it be possible to use semop to
> both wakeup the other process and put ourselves to sleep? (Semop can
> both increment one sem and decrement another one in one syscall).

Semaphores can't work. What we want is an atomic
wake-the-other-guy-and-go-to-sleep operation. When you atomically up one
semaphore and down another, either both operations succeed, in which case
the semop returns without sleeping, or the down fails and you sleep, but
the up doesn't happen either (and the other guy doesn't wake up) because
the two ops are atomic.

It was a nice idea though...

Jeff
From: Alex P. <al...@pi...> - 2002-01-07 20:09:59
On Mon, 7 Jan 2002, Jeff Dike wrote:
> al...@pi... said:
> > Bear with me, a silly question: Would it be possible to use semop to
> > both wakeup the other process and put ourselves to sleep? (Semop can
> > both increment one sem and decrement another one in one syscall).
>
> Semaphores can't work. What we want is an atomic
> wake-the-other-guy-and-go-to-sleep operation. When you atomically up
> one semaphore and down another, either both operations succeed, in
> which case the semop returns without sleeping, or the down fails and
> you sleep, but the up doesn't happen either (and the other guy doesn't
> wake up) because the two ops are atomic.

Semaphore can only fail when you use IPC_NOWAIT. Since you won't be using
this flag, it won't fail, hence no problem.

-alex
From: Shane K. <sh...@ti...> - 2002-01-07 21:39:48
On 2002-01-07 15:17:05 -0500, Alex Pilosov wrote:
> On Mon, 7 Jan 2002, Jeff Dike wrote:
> > al...@pi... said:
> > > Bear with me, a silly question: Would it be possible to use semop
> > > to both wakeup the other process and put ourselves to sleep?
> > > (Semop can both increment one sem and decrement another one in one
> > > syscall).
> >
> > Semaphores can't work. What we want is an atomic
> > wake-the-other-guy-and-go-to-sleep operation. When you atomically
> > up one semaphore and down another, either both operations succeed,
> > in which case the semop returns without sleeping, or the down fails
> > and you sleep, but the up doesn't happen either (and the other guy
> > doesn't wake up) because the two ops are atomic.
>
> Semaphore can only fail when you use IPC_NOWAIT. Since you won't be
> using this flag, it won't fail, hence no problem.

What Mr. Dike is saying is:

1. semop() calls operate on a set of semaphores
2. all operations occur at once

The two operations needed are:

A. signal another process to execute
B. wait until I can execute

A depends on B; since in the proposed semop() both must happen at once,
neither will ever happen.

BUT...

Personally, I'm wondering what is to prevent the application from simply
making two calls to semop(). Perform A then B:

	{
		struct sembuf sb;

		sb.sem_num = 0;
		sb.sem_flg = 0;

		/* wake next process... */
		sb.sem_op = 1;
		semop(next_process->context_semid, &sb, 1);

		/* ...and wait until we're ready to run */
		sb.sem_op = -1;
		semop(this_process->context_semid, &sb, 1);
	}

This may be no better than using pipes, but it might be. Semaphores do
leave icky structures lying around (type "ipcs" to see them), have
namespace problems, and so on.
If UML is a pthread application (can't remember), you can use condition
variables to do the same thing without the spew, but you have to use a
mutex as well:

	pthread_mutex_lock(&next_process->context_mutex);
	pthread_cond_signal(&next_process->context_cond);
	pthread_mutex_unlock(&next_process->context_mutex);

	pthread_cond_wait(&this_process->context_cond,
			  &this_process->context_mutex);

The mutex should be locked for each process when it starts. Again, this
might not be any better than pipes or semaphores, but you never know. In
FreeBSD, for instance, mutexes with only two threads are extremely fast.

--
Shane
Carpe Diem
From: Matt Z. <md...@de...> - 2002-01-07 23:05:22
On Mon, Jan 07, 2002 at 10:39:42PM +0100, Shane Kerr wrote:
> If UML is a pthread application (can't remember), you can use condition
> variables to do the same thing without the spew, but you have to use a
> mutex as well:
>
> 	pthread_mutex_lock(&next_process->context_mutex);
> 	pthread_cond_signal(&next_process->context_cond);
> 	pthread_mutex_unlock(&next_process->context_mutex);
> 	pthread_cond_wait(&this_process->context_cond,
> 			  &this_process->context_mutex);
>
> The mutex should be locked for each process when it starts. Again,
> this might not be any better than pipes or semaphores, but you never
> know. In FreeBSD, for instance, mutexes with only two threads are
> extremely fast.

This would definitely be better, since condition variables are supposed
to unlock the mutex and go to sleep atomically. I think in the case of
pthreads this is done using a mutex, though, so it is atomic from a
synchronization point of view, but not for purposes of scheduling. For
this to work for UML, there would have to be some in-kernel
implementation of condition variables, I think.

--
 - mdz
From: Matt Z. <md...@de...> - 2002-01-07 21:41:15
On Mon, Jan 07, 2002 at 03:17:05PM -0500, Alex Pilosov wrote:
> On Mon, 7 Jan 2002, Jeff Dike wrote:
> > al...@pi... said:
> > > Bear with me, a silly question: Would it be possible to use semop
> > > to both wakeup the other process and put ourselves to sleep?
> > > (Semop can both increment one sem and decrement another one in one
> > > syscall).
> >
> > Semaphores can't work. What we want is an atomic
> > wake-the-other-guy-and-go-to-sleep operation. When you atomically
> > up one semaphore and down another, either both operations succeed,
> > in which case the semop returns without sleeping, or the down fails
> > and you sleep, but the up doesn't happen either (and the other guy
> > doesn't wake up) because the two ops are atomic.
>
> Semaphore can only fail when you use IPC_NOWAIT. Since you won't be
> using this flag, it won't fail, hence no problem.

It doesn't have to fail, though. The thread can be de-scheduled after one
semaphore op, and before the other, causing a context switch just to
block on the other semaphore op, hence the desire for an atomic operation
to do both.

--
 - mdz
From: Jeff D. <jd...@ka...> - 2002-01-07 23:28:11
md...@de... said:
> The thread can be de-scheduled after one semaphore op, and before the
> other, causing a context switch just to block on the other semaphore
> op, hence the desire for an atomic operation to do both.

This is actually non-atomic. I went looking in the sem* manpages for a
sequential flag, and it's not there.

Jeff
From: Matt Z. <md...@de...> - 2002-01-08 02:38:32
On Mon, Jan 07, 2002 at 06:29:35PM -0500, Jeff Dike wrote:
> md...@de... said:
> > The thread can be de-scheduled after one semaphore op, and before
> > the other, causing a context switch just to block on the other
> > semaphore op, hence the desire for an atomic operation to do both.
>
> This is actually non-atomic. I went looking in the sem* manpages for a
> sequential flag, and it's not there.

I thought that was what I said... since the thread can be preempted
between the two operations, they cannot be performed as one atomic
operation.

--
 - mdz
From: Jeff D. <jd...@ka...> - 2002-01-07 21:58:48
al...@pi... said:
> Semaphore can only fail when you use IPC_NOWAIT. Since you won't be
> using this flag, it won't fail, hence no problem.

I'm not talking about it failing. I'm talking about the atomicity of the
two operations. Neither happens until they both happen. If the process
sleeps, which is a goal, then the down hasn't happened yet. So, neither
has the up. So, the other process isn't awakened.

Jeff
From: Lennert B. <bu...@gn...> - 2002-01-07 22:05:01
On Mon, Jan 07, 2002 at 04:59:17PM -0500, Jeff Dike wrote:
> > Semaphore can only fail when you use IPC_NOWAIT. Since you won't be
> > using this flag, it won't fail, hence no problem.
>
> I'm not talking about it failing. I'm talking about the atomicity of
> the two operations. Neither happens until they both happen. If the
> process sleeps, which is a goal, then the down hasn't happened yet.
> So, neither has the up. So, the other process isn't awakened.

One thing I've contemplated hacking up for UML to see if it makes any
difference is to make another syscall path into the kernel which would
take 'compound syscall blocks', basically an array of longs (or whatever)
that describe a set of syscalls to be executed sequentially. In this way,
you could tell the kernel to do all your switching syscalls in one
syscall, avoiding the 300-something cycles per user-kernel-user
transition, and in the pipe switching case it would give you the
additional benefit of guaranteed non-synchronous wakeups (as the kernel
isn't preempted until it returns back to userspace).

Does this sound completely stupid?

cheers,
Lennert
From: Jeff D. <jd...@ka...> - 2002-01-07 23:30:04
bu...@gn... said:
> One thing I've contemplated hacking up for UML to see if it makes any
> difference is to make another syscall path into the kernel which would
> take 'compound syscall blocks', basically an array of longs (or
> whatever) that describe a set of syscalls to be executed sequentially.
> In this way, you could tell the kernel to do all your switching
> syscalls in one syscall, avoiding the 300-something cycles per
> user-kernel-user transition, and in the pipe switching case it would
> give you the additional benefit of guaranteed non-synchronous wakeups
> (as the kernel isn't preempted until it returns back to userspace).

There's no chance of this making it into the kernel.

Anyway, I'm eventually going to go from one host thread per UML thread to
one host thread per virtual processor. Then, the host scheduling behavior
will stop being an issue.

Jeff
From: Lennert B. <bu...@gn...> - 2002-01-07 23:38:45
On Mon, Jan 07, 2002 at 06:31:31PM -0500, Jeff Dike wrote:
> > One thing I've contemplated hacking up for UML to see if it makes
> > any difference is to make another syscall path into the kernel which
> > would take 'compound syscall blocks', basically an array of longs
> > (or whatever) that describe a set of syscalls to be executed
> > sequentially. In this way, you could tell the kernel to do all your
> > switching syscalls in one syscall, avoiding the 300-something cycles
> > per user-kernel-user transition, and in the pipe switching case it
> > would give you the additional benefit of guaranteed non-synchronous
> > wakeups (as the kernel isn't preempted until it returns back to
> > userspace).
>
> There's no chance of this making it in the kernel.

Well, doh ("...to see if it makes any difference...")

> Anyway, I'm eventually going to go from one host thread per UML thread
> to one host thread per virtual processor. Then, the host scheduling
> behavior will stop being an issue.

You don't think remapping the address space on every context switch would
be slow (it would definitely seem slower than reloading %cr3...)?

cheers,
Lennert
From: Jeff D. <jd...@ka...> - 2002-01-08 01:33:53
bu...@gn... said:
> You don't think remapping address space on every context switch would
> be slow (it would definitely seem slower than reloading %cr3..) ?

If I have direct access to switch_mm, then it'll be the same speed as (or
close to) a native memory context switch.

The plan is to allow processes to manipulate host address spaces. So
there will be one host thread per virtual processor, and one host address
space (i.e. mm_struct) per UML thread. The processor threads will switch
between address spaces without needing to manually remap a single address
space.

Jeff
From: Lennert B. <bu...@gn...> - 2002-01-08 12:41:47
Sounds good. Count me in.

On Mon, Jan 07, 2002 at 08:35:16PM -0500, Jeff Dike wrote:
> > You don't think remapping address space on every context switch
> > would be slow (it would definitely seem slower than reloading
> > %cr3..) ?
>
> If I have direct access to switch_mm, then it'll be the same speed (or
> close) to a native memory context switch.
>
> The plan is to allow processes to manipulate host address spaces. So
> there will be one host thread per virtual processor, and one host
> address space (i.e. mm_struct) per UML thread. The processor threads
> will switch between address spaces without needing to manually remap
> its single address space.
>
> Jeff
From: Alex P. <al...@pi...> - 2002-01-07 23:58:09
On Mon, 7 Jan 2002, Lennert Buytenhek wrote:
> > two operations. Neither happens until they both happen. If the
> > process sleeps, which is a goal, then the down hasn't happened yet.
> > So, neither has the up. So, the other process isn't awakened.

Sorry, my mistake.

> One thing I've contemplated hacking up for UML to see if it makes any
> difference is to make another syscall path into the kernel which would
> take 'compound syscall blocks', basically an array of longs (or
> whatever) that describe a set of syscalls to be executed sequentially.
> In this way, you could tell the kernel to do all your switching
> syscalls in one syscall, avoiding the 300-something cycles per
> user-kernel-user transition, and in the pipe switching case it would
> give you the additional benefit of guaranteed non-synchronous wakeups
> (as the kernel isn't preempted until it returns back to userspace).

Sounds interesting to me; I wonder if anyone has thought of this before.

Error handling is interesting in this situation: what do you do when one
of the syscalls in the chain fails? I wonder if there are any pieces of
the kernel which would be confused by these (ex: something chained to
exit())?

-alex
From: Jeff D. <jd...@ka...> - 2002-01-08 01:44:39
al...@pi... said:
> Sorry, my mistake.

I did use the word 'fail' in my original message, which was a bad
choice...

> Sounds interesting to me, wonder if anyone thought of this before.

Yes, they have.

> Error handling is interesting in this situation, what do you do when
> one of syscalls in chain fails? I wonder if there are any pieces of
> kernel which would be confused by these (ex: something chained to
> exit())?

Yup :-)

Jeff
From: Lennert B. <bu...@gn...> - 2002-01-08 10:06:05
On Mon, Jan 07, 2002 at 08:42:06PM -0500, Jeff Dike wrote:
> > Sounds interesting to me, wonder if anyone thought of this before.
>
> Yes, they have.

Got a pointer?

cheers,
Lennert
From: Jeff D. <jd...@ka...> - 2002-01-07 23:19:33
sh...@ti... said:
> Personally, I'm wondering what is to prevent the application from
> simply making two calls to semop().
> This may be no better than using pipes, but it might be.

Right, it has exactly the same host scheduling properties as the
write-then-read. And pipes are a lot easier to understand.

> If UML is a pthread application (can't remember),

It's not.

> you can use condition variables to do the same thing without the spew,
> but you have to use a mutex as well:

I'd still want to know what the underlying implementation is.

Jeff
From: Jeff D. <jd...@ka...> - 2002-01-07 23:26:54
bu...@gn... said:
> Grep for wake_up in fs/pipe.c. Synchronous wakeups
> (other-end-of-the-pipe-runs-first) are only done if
>  - we are reading and there is no data left to read; signal all
>    writers
>  - we are writing and the pipe buffer is full; signal all readers
> In all other cases, 'standard' wakeups are done (which probably means
> that if the woken up process has a higher scheduler priority, the
> current process is preempted, otherwise the current process keeps
> running).

Yeah, from my discovery that the other end runs first only ~40% of the
time, I figured that there wasn't a rule firing, just the normal
rescheduling.

Jeff