From: johan v. <joh...@pa...> - 2001-04-06 22:17:48
Attachments:
irq_user.patch
|
hello, This patch moves irq_user.c from using select() to using poll(). This allows an extra argument to activate_fd() to provide the poll events for which one wishes to receive an irq. At the moment, the patch hardcodes this to (POLLIN | POLLPRI), which results in behavior identical to before. Memory usage is now dependent on the number of fds to be monitored. The dispatching of interrupts has become somewhat faster; the administrative functions activate_fd, reactivate_fd and free_irq_fd have become slower (mainly because of cumbersome struct pollfd manipulations). The kernel seems to boot and shut down flawlessly with this. (Which is all the testing I have done.) enjoy, J. |
From: Lennert B. <bu...@gn...> - 2001-04-06 22:23:57
|
Hi, What's been on my TODO list for a while is having irq_user.c use POSIX realtime signals. I think this is a much cleaner interface than select/poll, as it doesn't require linear pollfd/bitmap arrays, nor is it a linear operation. Would this be an idea for version 2 of your patch? :) cheers, Lennert On Sat, Apr 07, 2001 at 12:08:46AM +0200, johan verrept wrote: > hello, > > This patch moves irq_user.c from using select() to using poll(). > This allows an extra argument to active_fd() to provide the pollevents for which one wishes to > receive an irq. At the moment, the patch hardcodes this to (POLLIN | POLLPRI). This results in > identical behavior as before. > > The memory usage is now dependent on the number of fd's to be monitored. The dispatching of > interrupts has become somewhat faster, the administrative functions of activate_fd, reactivate_fd > and free_irq_fd have become slower (mainly because of cumbersome struct pollfd manipulations). > > The kernel seems to boot and shutdown flawlessly with this. (Which is all the testing I have done.) > > enjoy, > |
From: johan v. <joh...@pa...> - 2001-04-06 23:57:41
|
Lennert Buytenhek wrote: > > Hi, > > What's been on my TODO list for a while is having irq_user.c use POSIX realtime > signals. I think this is a much cleaner interface than select/poll, as it > doesn't require linear pollfd/bitmap arrays, nor does is it a linear operation. > Would this be an idea for version 2 of your patch? :) Unless I am missing something, this would only be useful if drivers were using dedicated asynchronous I/O operations. Interrupts would only be generated at completion of aio_read()/aio_write() calls. We would have to rewrite every driver that uses read/write. The new I/O needs can be integrated with the current API. The major difference would be stepping from read/write to um_read/um_write and an argument change to activate_fd()/reactivate_fd(). They would only be used for reading, and they would have to include a buffer and a size (because that's where we are actually doing the aio_read). Write can be simulated without changes. Interrupts would indeed be directly mappable to the destination, but we would only move the problem to um_read/um_write. They would have to identify which interrupt handler will be called at I/O completion for the provided fd (back to the linear search). This can be circumvented by returning a 'um_fd' when they register the irq and using that as the fd argument to um_read/um_write/activate_fd/reactivate_fd. The 'um_fd' would be a (void *). A small disadvantage would be that the client driver would have to store both fd (for normal operations) and um_fd. Before I even start considering doing this rather large change, I want to discuss whether the small advantage of not having to run down a list of a few dozen file descriptors is worth the complexity that needs to be added. Another point where I will need convincing is that this scenario prevents exactly what I wanted to accomplish by moving from select() to poll(): I want an interrupt when a file descriptor becomes writable, not when my write is completed. 
We would lose the flexibility I wanted to add with this change. I'm not really against this. It has less overhead, I would love to play with the aio_* calls, and I can circumvent the problem in my code, but it all seems rather overkill... J. |
From: Lennert B. <bu...@gn...> - 2001-04-07 15:18:54
|
On Sat, Apr 07, 2001 at 01:48:46AM +0200, johan verrept wrote: > > What's been on my TODO list for a while is having irq_user.c use POSIX realtime > > signals. I think this is a much cleaner interface than select/poll, as it > > doesn't require linear pollfd/bitmap arrays, nor does is it a linear operation. > > Would this be an idea for version 2 of your patch? :) > > Unless I am missing something, this would only be usefull if drivers would be using dedicated > asynchronous i/o operations. I don't see why. POSIX rtsigs are just like poll and select, if done right, but constant-time. cheers, Lennert |
From: Jeff D. <jd...@ka...> - 2001-04-07 00:07:52
|
bu...@gn... said: > What's been on my TODO list for a while is having irq_user.c use POSIX > realtime signals. I think this is a much cleaner interface than select/ > poll, as it doesn't require linear pollfd/bitmap arrays, nor does is > it a linear operation. Long-term, I think a combination of signals/poll is probably the best way to go, speedwise. I don't like having to take a signal on every piece of input that comes in. Taking a signal, and then draining all available input sources while you're in the handler sounds better to me. Jeff |
From: Henrik N. <hn...@ma...> - 2001-04-07 11:42:17
|
Jeff Dike wrote: > Long-term, I think a combination of signals/poll is probably the best way to > go, speedwise. I don't like having to take a signal on every piece of input > that comes in. Taking a signal, and then draining all available input sources > while you're in the handler sounds better to me. You do not need to take every signal if you do not want to. RT signals are queued up to a queue depth of 1024 (with SIGIO on overflow), and the queue can be polled, poll()/select()-style, with sigtimedwait(). It returns one signal at a time, but is a very quick call regardless. So you only need to take the first signal; the rest of the events are drained from within the handler. The beauty of RT-signal-driven I/O is that the signals carry information about the event that generated them, so when you retrieve a signal from the queue you already know which file descriptor it belongs to, and so on. The drawback is that you must also have code falling back on poll/select on queue overflow (SIGIO). -- Henrik Nordstrom MARA Systems |
From: Lennert B. <bu...@gn...> - 2001-04-07 15:15:02
|
On Fri, Apr 06, 2001 at 08:20:01PM -0500, Jeff Dike wrote: > > What's been on my TODO list for a while is having irq_user.c use POSIX > > realtime signals. I think this is a much cleaner interface than select/ > > poll, as it doesn't require linear pollfd/bitmap arrays, nor does is > > it a linear operation. > > Long-term, I think a combination of signals/poll is probably the best way to > go, speedwise. I don't like having to take a signal on every piece of input > that comes in. Taking a signal, and then draining all available input sources > while you're in the handler sounds better to me. This is what all sane users of POSIX rtsigs do. You don't take the signal on every I/O; that would be silly. Instead, you mask the signal and then drain the event queue with sigwaitinfo(2). Have a look at phhttpd (not phttpd) for an example. cheers, Lennert |
From: James S. <mi...@st...> - 2001-04-07 15:58:00
|
Hi, yes, a great invention. Pity they don't work in the 2.2.x kernel; well, not in 2.2.17 anyway, when I tried to use them. > >This is what all sane users of posix rtsigs do. You don't take the signal >on every IO, that would be silly. Instead, you mask the signal and then drain >the event queue with sigwaitinfo(2). Have a look at phhttpd (not phttpd) for >an example. > > >cheers, >Lennert > >_______________________________________________ >User-mode-linux-devel mailing list >Use...@li... >http://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel > -- --------------------------------------------- Check Out: http://stev.org E-Mail: mi...@st... 5:00pm up 12 days, 55 min, 5 users, load average: 0.32, 0.27, 0.20 |
From: Lennert B. <bu...@gn...> - 2001-04-07 22:06:27
|
Hi, You need this patch: http://www.math.leidenuniv.nl/~buytenh/uml/sigio-2.2.16.diff as 2.2 kernels do not come with this support. 2.4 kernels have rtsig functionality integrated. cheers, Lennert On Sat, Apr 07, 2001 at 05:06:10PM +0000, James Stevenson wrote: > Hi > > yes a great invention pitty they dont work in the 2.2.x > kernel well not in 2.2.17 anyway when i tried to use them > > > > >This is what all sane users of posix rtsigs do. You don't take the signal > >on every IO, that would be silly. Instead, you mask the signal and then drain > >the event queue with sigwaitinfo(2). Have a look at phhttpd (not phttpd) for > >an example. > > > > > >cheers, > >Lennert > > |
From: Jeff D. <jd...@ka...> - 2001-04-07 02:25:19
|
joh...@pa... said: > This patch moves irq_user.c from using select() to using poll(). Why should I prefer poll over select? My understanding is that select has scaling problems with very large numbers of active descriptors (or maybe active descriptors spread over a very large range). But I don't see UML having that kind of problem. It basically has one fd per device, and for the things that it's used for, there aren't going to be huge numbers of devices. Are there other reasons for preferring poll over select? Jeff |
From: johan v. <joh...@pa...> - 2001-04-07 03:55:40
|
Jeff Dike wrote: > Are there other reasons for preferring poll over select? The main reason to use poll() over select() is that we can have users supply the events they want to receive. You can receive an interrupt for most important events on an fd; see poll(2). This would require adding an events argument to activate_fd(). You could also add it to reactivate_fd() so the user can easily adjust the flags to his needs. In this patch the events were hardcoded for compatibility, so I didn't have to patch the other files. I wrote this because I was doing ioctl-based requests that, on completion, cause the fd to become writable. There is no way at this moment that I can capture that, except with timer-based polling using select/poll, which adds overhead and delay. Other than that, my driver uses multiple fds: a minimum of 1 and a theoretical maximum of 128... which I am probably going to have to limit :). In practice I guess it might go to a maximum of 10 fds. About the select() scaling problems: this might be because a lot of select() implementations are wrappers around poll(). J. |
From: Henrik N. <hn...@ma...> - 2001-04-07 11:46:25
|
johan verrept wrote: > About the select() scaling problems, this might be because a lot of select() implementations are > wrappers around poll(). The select() design indeed has scaling problems, but these only start to be noticeable above several hundred concurrently open files. select() is also limited by the size of fd_set, which effectively limits it to a maximum of 1024 file descriptors per process (it can't handle an fd number larger than 1023, or you'll corrupt memory). poll() happily handles any number of file descriptors in a reasonable manner, and has no scaling problem with checking the status of fds 1, 5, 145726 if so wanted. Linux 2.0.x does not support poll(); any call to poll() there is emulated by libc using select(). I don't remember exactly which Linux version working poll() support appeared in, but I think it was some time during 2.2. The main difference in I/O capabilities between poll() and select() is that poll() can indicate additional events: * Hang-up * Priority data, indicated separately from normal data Both can indicate whether an fd is ready for reading, writing or both. The poll() interface is a bit cleaner than select() in my opinion, as there is no need to reconstruct the list of wanted events for each call (request and reply data are separated in poll), and the facts that it supports a wider range of events, is not limited by FD_SETSIZE and scales in a reasonable manner are nice bonuses. -- Henrik Nordstrom MARA Systems |
From: Lennert B. <bu...@gn...> - 2001-04-07 15:28:15
|
On Fri, Apr 06, 2001 at 10:37:22PM -0500, Jeff Dike wrote: > > This patch moves irq_user.c from using select() to using poll(). > > Why should I prefer poll over select? My understanding is that select has > scaling problems with very large numbers of active descriptors (or maybe > active descriptors spread over a very large range). > > But I don't see UML having that kind of problem. It basically has one fd per > device, and for the things that it's used for, there aren't going to be huge > numbers of devices. > > Are there other reasons for preferring poll over select? Getting rid of fd_set? sizeof(fd_set) is 128 on modern glibcs, so things like 'save_fds = fds', 'fds = save_fds' and 'fds = active_fd_mask' aren't as cheap as you probably thought. cheers, Lennert |
From: James S. <mi...@st...> - 2001-04-06 23:54:39
|
Hi which version is that of? patching file arch/um/kernel/irq_user.c Hunk #2 FAILED at 28. Hunk #3 succeeded at 69 (offset -15 lines). Hunk #4 FAILED at 93. 2 out of 4 hunks FAILED -- saving rejects to file arch/um/kernel/irq_user.c.rej In local.linux-user-mode-linux-devel, you wrote: > >hello, > > This patch moves irq_user.c from using select() to using poll(). >This allows an extra argument to active_fd() to provide the pollevents for which one wishes to >receive an irq. At the moment, the patch hardcodes this to (POLLIN | POLLPRI). This results in >identical behavior as before. > >The memory usage is now dependent on the number of fd's to be monitored. The dispatching of >interrupts has become somewhat faster, the administrative functions of activate_fd, reactivate_fd >and free_irq_fd have become slower (mainly because of cumbersome struct pollfd manipulations). > >The kernel seems to boot and shutdown flawlessly with this. (Which is all the testing I have done.) > >enjoy, > > J. > >--- linux-orig/arch/um/kernel/irq_user.c Fri Mar 16 02:17:16 2001 >+++ linux/arch/um/kernel/irq_user.c Fri Apr 6 23:47:50 2001 >@@ -11,10 +11,16 @@ > #include <errno.h> > #include <sys/types.h> > #include <sys/time.h> >+#include <sys/poll.h> > #include "user_util.h" > #include "kern_util.h" > #include "user.h" > >+#define ASSERT(x...) 
if (!(x)) {\ >+ printk ("ASSERT failed at %s:%d\n", __FILE__, __LINE__);\ >+ stop();\ >+ } >+ > struct irq_fd { > struct irq_fd *next; > void *id; >@@ -22,59 +28,50 @@ > int irq; > int pid; > int active; >+ short events; > }; > > static struct irq_fd *active_fds = NULL; >-static fd_set active_fd_mask; >-static int max_fd = -1; >-static fd_set fds; >-static fd_set save_fds; >+static struct irq_fd **last_fds = &active_fds; >+static struct pollfd *pollfds = NULL; >+static int pollfds_num = 0; >+static int pollfds_size = 0; > > void sigio_handler(int sig) > { > struct sigcontext_struct *sc; > struct irq_fd *irq_fd, *next; >- struct timeval tv; >- int i, n, user_mode, save_errno, count; >+ struct pollfd *pfd; >+ int n, user_mode, save_errno; > >- sc = (struct sigcontext_struct *) (&sig + 1); >+ // printk ("sigio_handler: called\n"); >+ sc = (struct sigcontext_struct *) (&sig + 1); > user_mode = user_context(sc->esp_at_signal); > save_errno = errno; > change_sig(SIGUSR1, 1); > if(user_mode) fill_in_regs(process_state(NULL, NULL, NULL), sc); >- fds = active_fd_mask; >- while(1){ >- tv.tv_sec = 0; >- tv.tv_usec = 0; >- if((n = select(max_fd + 1, &fds, NULL, NULL, &tv)) < 0){ >- printk("sigio_handler : select returned %d, " >- "errno = %d\n", n, errno); >- break; >- } >- save_fds = fds; >- count = 0; >- for(i=0;i<=max_fd;i++){ >- if(FD_ISSET(i, &fds)){ >- FD_CLR(i, &active_fd_mask); >- count++; >- } >- } >- if(count == 0) break; >- for(irq_fd=active_fds;irq_fd != NULL;irq_fd = irq_fd->next){ >- if(FD_ISSET(irq_fd->fd, &fds)) irq_fd->active = 1; >- } >- fds = save_fds; >- for(irq_fd=active_fds;irq_fd != NULL;irq_fd = next){ >- /* This mysterious assignment protects us against >- * the irq handler freeing the irq from under us. 
>- */ >- next = irq_fd->next; >- if(irq_fd->active){ >- irq_fd->active = 0; >- do_IRQ(irq_fd->irq, user_mode); >- } >- } >- } >+ while (1) { >+ if ((n=poll (pollfds, pollfds_num, 0))<0) { >+ printk ("sigio_handler : poll returned %d, " >+ "errno %d\n", n, errno); >+ break; >+ } >+ >+ if (n==0) >+ break; >+ >+ for (irq_fd=active_fds, pfd = pollfds; irq_fd != NULL; irq_fd = next, pfd++) { >+ ASSERT(irq_fd->fd == pfd->fd); >+ /* This mysterious assignment protects us against >+ * the irq handler freeing the irq from under us. >+ */ >+ next = irq_fd->next; >+ if (pfd->revents) { >+ pfd->events = 0; /* disable */ >+ do_IRQ (irq_fd->irq, user_mode); >+ } >+ } >+ } > if(user_mode){ > interrupt_end(); > block_signals(); >@@ -87,7 +84,9 @@ > int activate_fd(int irq, int fd, void *dev_id) > { > struct irq_fd *new_fd; >+ struct pollfd *tmp_pfd; > int pid, retval; >+ int events = POLLIN | POLLPRI; > > for(new_fd = active_fds;new_fd;new_fd = new_fd->next){ > if(new_fd->fd == fd){ >@@ -109,39 +108,77 @@ > return(-retval); > } > new_fd = um_kmalloc(sizeof(*new_fd)); >- if(new_fd == NULL) return(-ENOMEM); >- new_fd->next = active_fds; >+ if(new_fd == NULL) >+ return(-ENOMEM); >+ >+ new_fd->next = NULL; > new_fd->id = dev_id; > new_fd->irq = irq; > new_fd->fd = fd; > new_fd->pid = current_external_pid(); > new_fd->active = 0; >- active_fds = new_fd; >- if(fd > max_fd) max_fd = fd; >- FD_SET(fd, &active_fd_mask); >- return(0); >+ new_fd->events = events; >+ >+ *last_fds = new_fd; >+ last_fds = &(new_fd->next); >+ >+ pollfds_num++; >+ if (pollfds_num > pollfds_size) { >+ tmp_pfd = pollfds; >+ pollfds = um_kmalloc (pollfds_num * sizeof (struct pollfd)); >+ if (tmp_pfd) { >+ memcpy (pollfds, tmp_pfd, (sizeof(struct pollfd)*pollfds_size)); >+ kfree (tmp_pfd); >+ }; >+ pollfds_size = pollfds_num; >+ } >+ tmp_pfd = pollfds + (pollfds_num-1); >+ tmp_pfd->fd = fd; >+ tmp_pfd->events = events; >+ tmp_pfd->revents = 0; >+ >+ return(0); > } > > void free_irq_fd(void *dev_id) > { >+ int i; > 
struct irq_fd **prev; >+ struct pollfd *pfd; > > prev = &active_fds; >+ pfd = pollfds; > while(*prev != NULL){ > if((*prev)->id == dev_id){ > struct irq_fd *old_fd = *prev; >+ >+ ASSERT (pfd->fd == old_fd->fd); >+ memcpy (pfd, pfd+1, ((unsigned long)(pollfds+pollfds_num) - (unsigned long) (pfd+1))); >+ pollfds_num--; >+ >+ if (last_fds == &(old_fd->next)) >+ last_fds = prev; >+ > *prev = (*prev)->next; >- FD_CLR(old_fd->fd, &active_fd_mask); >- kfree(old_fd); >+ kfree(old_fd); > return; > } > prev = &(*prev)->next; >+ pfd++; >+ i++; > } > } > > void reactivate_fd(int fd) > { >- FD_SET(fd, &active_fd_mask); >+ struct pollfd *pfd; >+ struct irq_fd *ifd; >+ >+ for (ifd = active_fds, pfd = pollfds; ifd && (ifd->fd != fd); ifd = ifd->next, pfd++) ASSERT (ifd->fd == pfd->fd); >+ if (!ifd) >+ return; >+ >+ pfd->events = ifd->events; > } > > void forward_interrupts(int pid) > -- --------------------------------------------- Check Out: http://stev.org E-Mail: mi...@st... 1:00am up 11 days, 8:55, 4 users, load average: 0.16, 0.05, 0.01 |
From: johan v. <joh...@pa...> - 2001-04-07 00:18:54
|
James Stevenson wrote: > > Hi > > which version is that off ? Sorry, this is against 2.4.2 with uml-patch-2.4.2.bz2. J. |