From: Pawel S. <pawsa@TheoChem.kth.se> - 2002-08-25 07:31:43
Hi, when I try to use the bleeding-edge (20020821) R200 drivers on my Sapphire 8500LE card (http://www.sapphiretech.com/8500/8500LE-149-00.htm for more info), I get the following warning message:

  unknown chip id, assuming full r200 support

The output of lspci -vv -s 01:00.0 is:

  01:00.0 VGA compatible controller: ATI Technologies Inc Radeon QL (prog-if 00 [VGA])
          Subsystem: Unknown device 174b:7149
          Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B-
          Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
          Latency: 64 (2000ns min), cache line size 08
          Interrupt: pin A routed to IRQ 11
          Region 0: Memory at d8000000 (32-bit, prefetchable) [size=128M]
          Region 1: I/O ports at d800 [size=256]
          Region 2: Memory at d7000000 (32-bit, non-prefetchable) [size=64K]
          Expansion ROM at d7fe0000 [disabled] [size=128K]
          Capabilities: [58] AGP version 2.0
                  Status: RQ=47 SBA+ 64bit- FW+ Rate=x1,x2
                  Command: RQ=31 SBA+ AGP+ 64bit- FW- Rate=<none>
          Capabilities: [50] Power Management version 2
                  Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                  Status: D0 PME-Enable- DSel=0 DScale=0 PME-

Who might be interested in the card ID, so it can be added to the proper tables? The card is working fine otherwise. There were a few things that I observed:

1. I sometimes get warnings/informational messages:

     r200_makeX86Normal3fv/197 CVAL 0 OFFSET 14 VAL 40a4ed20
     r200_makeX86Normal3fv/198 CVAL 4 OFFSET 20 VAL 40a4ed24
     r200_makeX86Normal3fv/199 CVAL 8 OFFSET 25 VAL 40a4ed28
     r200_makeX86Normal3fv done

2. The GtkGLArea library does not appear to work well with hardware acceleration:

     The program 'moled' received an X Window System error.
     This probably reflects a bug in the program.
     The error was 'BadValue (integer parameter out of range for operation)'.
     (Details: serial 226579 error_code 2 request_code 128 minor_code 9)

   I believe this is a GtkGLArea problem, since I saw similar error messages on other 3D-accelerated hardware as well - although it does not show up when DRI is off.

3. The performance of glxgears when running on a "fresh" system is about 1780 fps, but sometimes it gets stuck at 172 fps - that's an order of magnitude difference. Is it something one should expect? I do not know what exactly triggers it, but I remember I ran two OpenGL programs at the same time; perhaps this made it happen?

Pawel Salek
From: Jan S. <th...@bi...> - 2002-08-25 09:18:19
<quote who="Pawel Salek">
> when I try to use bleeding-edge (20020821) R200 drivers on my Sapphire
> 8500LE card (http://www.sapphiretech.com/8500/8500LE-149-00.htm for
> more info), I get the following warning message:
>
> unknown chip id, assuming full r200 support

I get this too. Someone can probably confirm it, but from my reading of the code, the check indicates that the card was not in the list of "cards we know we don't support", so it assumes that it is ok.

> 3. performance of glxgears when running on a "fresh" system is about
> 1780fps but sometimes it gets stuck on 172fps - that's an order of
> magnitude difference. Is it something one should expect? I do not know
> what exactly triggers it but I remember I ran two OpenGL programs at
> the same time, perhaps this made it happen?

Might indicate that the program has been run in a software rendering fallback mode. Can you trigger it reliably, ie by having other DRI apps running at the same time or somesuch?

J.
--
Jan Schmidt  th...@ma...

"Stoke me a clipper, I'll be back for Christmas"
    -- Arnold 'Ace' Rimmer, Red Dwarf
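[Editor's note: for illustration, the check Jan describes amounts to something like the following sketch. The table contents, names, and the 0x514C device id are assumptions for the example, not the actual driver source.]

    #include <stdio.h>

    /* Sketch of a "known-bad list" check: ids we recognise as good are
     * accepted silently; ids on the known-bad list are rejected; anything
     * else warns but falls through to full support. */

    static const unsigned int known_good[]  = { 0x514C /* e.g. Radeon QL */ };
    static const unsigned int known_bad[]   = { 0x4242 /* illustrative */ };

    #define N(a) (sizeof(a) / sizeof((a)[0]))

    static int chip_is_supported(unsigned int id)
    {
        unsigned int i;

        for (i = 0; i < N(known_good); i++)
            if (known_good[i] == id)
                return 1;

        for (i = 0; i < N(known_bad); i++)
            if (known_bad[i] == id)
                return 0;

        /* Not on either list: warn, but assume it works -- this is the
         * message Pawel is seeing. */
        fprintf(stderr, "unknown chip id, assuming full r200 support\n");
        return 1;
    }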
From: Pawel S. <pawsa@TheoChem.kth.se> - 2002-08-25 09:55:14
On 2002.08.25 11:16 Jan Schmidt wrote:
> <quote who="Pawel Salek">
> > 3. performance of glxgears when running on a "fresh" system is about
> > 1780fps but sometimes it gets stuck on 172fps - that's an order of
> > magnitude difference. Is it something one should expect?
>
> Might indicate that the program has been run in a software rendering
> fallback mode. Can you trigger it reliably, ie by having other DRI
> apps running at the same time or somesuch?

Thanks for the response. I have checked it more carefully and have to apologise for the misinformation: it turns out that I had forgotten to set

  R200_NO_USLEEPS=1

I can get the higher value only with this environment variable set.

--
Pawel Salek  http://www.theochem.kth.se/~pawsa/
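[Editor's note: a minimal sketch of the kind of gate the driver presumably applies around its sleeps; the helper and variable names are illustrative, not the actual r200 source.]

    #include <stdlib.h>
    #include <unistd.h>

    static int no_usleeps = -1;   /* -1 = environment not yet checked */

    /* Hypothetical wait helper: busy-poll instead of sleeping when
     * R200_NO_USLEEPS is set in the environment. */
    static void r200_wait_a_bit(void)
    {
        if (no_usleeps < 0)
            no_usleeps = (getenv("R200_NO_USLEEPS") != NULL);

        if (!no_usleeps)
            usleep(1);   /* in practice sleeps a whole timer tick, not 1 us */
        /* else: fall straight through and poll again (busy-wait) */
    }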
From: Linus T. <tor...@tr...> - 2002-08-25 18:35:41
On Sun, 25 Aug 2002, Pawel Salek wrote:
>
> I have checked it more carefully and have to apologise for the
> misinformation: it turns out that I had forgotten to set
> R200_NO_USLEEPS=1
> I can get the higher value only with this environment variable set.

I really don't think that the radeon driver should _ever_ use "usleep()" while in user mode.

"udelay()" in kernel mode really tries to delay for one microsecond. In contrast, "usleep()" in user mode tries to _sleep_ for one microsecond, which is a totally different kettle of fish, since that implies a dependency on the timer granularity (which is usually not even _close_ to a microsecond - on 2.4.x kernels it is ten milliseconds on x86, while in 2.5.x kernels it is one millisecond).

As you can imagine, you don't need that many 10 ms sleeps to make your framerate go down the toilet.

In user mode, I suspect that the best approximation of "udelay()" is actually "sched_yield()". By virtue of being a system call, it usually takes about one usec anyway, _and_ it will help interactive feel tremendously if (for example) the X server is ready to do some work due to a keypress or similar.

It would be interesting to hear what people's experience would be with the usleep() replaced by a sched_yield(). Maybe that has other downsides...

(sched_yield() also has the added advantage that the kernel can actually understand what the problem is: the process doesn't want to sleep, but it is busy-waiting for something, so it's telling the kernel that it might be advantageous to schedule another process if one is available.)

Linus
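[Editor's note: concretely, the substitution Linus proposes would look something like this sketch. The status register and busy bit are illustrative; the real driver polls its own registers or their in-memory shadows.]

    #include <sched.h>

    #define ENGINE_BUSY 0x1   /* illustrative status bit */

    /* Hypothetical poll loop waiting for the card to go idle.  Instead of
     * usleep(1) -- which in reality sleeps for at least one timer tick,
     * i.e. 10 ms on 2.4.x x86 -- yield the CPU and immediately re-poll. */
    static void wait_for_engine_idle(volatile unsigned int *status_reg)
    {
        while (*status_reg & ENGINE_BUSY)
            sched_yield();   /* costs about a syscall (~1 us) and lets the
                              * X server run if it has work pending */
    }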
From: Jan S. <th...@bi...> - 2002-08-25 22:25:25
<quote who="Linus Torvalds">
> In user mode, I suspect that the best approximation of "udelay()" is
> actually "sched_yield()". By virtue of being a system call, it usually
> takes about one usec anyway, _and_ it will help interactive feel
> tremendously if (for example) the X server is ready to do some work due to
> a keypress or similar.
>
> It would be interesting to hear what people's experience would be with the
> usleep() replaced by a sched_yield(). Maybe that has other downsides...

What other downsides could there be, apart from slightly more busy waiting? In a sleep the process will yield anyway, just for longer, right?

--
Jan Schmidt  th...@ma...

Homer: "No TV and No Beer make Homer something something"
Marge: "Go Crazy?"
Homer: "Don't mind if I do! aaaarrrarrgghar!"
From: Peter "Firefly" L. <fi...@di...> - 2002-08-25 22:33:12
On Mon, 26 Aug 2002, Jan Schmidt wrote:

> What other downsides could there be, apart from slightly more busy waiting?
> In a sleep the process will yield anyway, just for longer, right?

Slightly warmer CPU? ;)

-Peter
From: Linus T. <tor...@tr...> - 2002-08-26 03:57:49
On Mon, 26 Aug 2002, Peter "Firefly" Lund wrote:
> On Mon, 26 Aug 2002, Jan Schmidt wrote:
> > What other downsides could there be, apart from slightly more busy waiting?
> > In a sleep the process will yield anyway, just for longer, right?
>
> Slightly warmer CPU? ;)

The CPU will run hotter, yes (and that is an issue on laptops, but probably not one that most people care about while running DOOM III, as getting your pants hot probably just adds to the experience ;)

The other (and likely bigger) issue is that the actual semantics of "yield()" will vary between different kernels, and some old kernels in particular will believe that a yield means that the process really isn't very interested in getting CPU time, so they will perhaps be _overly_ eager to give CPU time to other processes.

This will mean that GL applications will probably run _much_ slower if you do a kernel compile in the background, but I don't know if people consider that to be a big problem either..

Linus
From: Ingo M. <mi...@el...> - 2002-08-30 11:27:54
On Sun, 25 Aug 2002, Linus Torvalds wrote:

> The other (and likely bigger) issue is that the actual semantics of
> "yield()" will vary between different kernels, and some old kernels in
> particular will believe that a yield means that the process really isn't
> very interested in getting CPU time, so they will perhaps be _overly_
> eager to give CPU time to other processes.

in fact the latest 2.5 kernels do this as well, since this is what some of the important yield() users expect (the VM, some threaded code, etc.) - so if the goal is really to do micro-sleeps then yield() is not the right way to go.

It might be better to create some sort of in-kernel wait object for the purposes of the DRM, and let userspace sleep on it. This, if performance is really important, should IMO be a spin-futex: when trying to acquire the futex, the user should first spin for a few thousand cycles, and then call FUTEX_WAIT. This futex can be shared across arbitrary process contexts as well.

Ingo
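[Editor's note: a minimal sketch of the spin-then-wait pattern Ingo describes, using the raw futex syscall. The lock-word protocol (0 = free, 1 = taken) and the spin count are assumptions, and the GCC atomic builtins used for brevity postdate this thread; this is an illustration, not an existing DRM interface.]

    #include <linux/futex.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    #define SPIN_ITERATIONS 2000   /* "a few thousand cycles" -- tunable */

    /* The lock word lives in memory shared across the contending
     * processes: 0 = free, 1 = taken. */
    static void spin_futex_lock(int *lock_word)
    {
        int i;

        for (;;) {
            /* First spin briefly in userspace, hoping the holder is quick. */
            for (i = 0; i < SPIN_ITERATIONS; i++)
                if (__sync_bool_compare_and_swap(lock_word, 0, 1))
                    return;

            /* Still contended: sleep in the kernel until the word changes. */
            syscall(SYS_futex, lock_word, FUTEX_WAIT, 1, NULL, NULL, 0);
        }
    }

    /* Unlock: release the word, then wake one waiter. */
    static void spin_futex_unlock(int *lock_word)
    {
        __sync_lock_release(lock_word);   /* stores 0 with release semantics */
        syscall(SYS_futex, lock_word, FUTEX_WAKE, 1, NULL, NULL, 0);
    }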
From: Keith W. <ke...@tu...> - 2002-08-30 11:42:52
Ingo Molnar wrote:
> On Sun, 25 Aug 2002, Linus Torvalds wrote:
>
>> The other (and likely bigger) issue is that the actual semantics of
>> "yield()" will vary between different kernels, and some old kernels in
>> particular will believe that a yield means that the process really isn't
>> very interested in getting CPU time, so they will perhaps be _overly_
>> eager to give CPU time to other processes.
>
> in fact the latest 2.5 kernels do this as well, since this is what some of
> the important yield() users expect (the VM, some threaded code, etc.) - so
> if the goal is really to do micro-sleeps then yield() is not the right way
> to go.
>
> It might be better to create some sort of in-kernel wait object for the
> purposes of the DRM, and let userspace sleep on it. This, if performance is
> really important, should IMO be a spin-futex: when trying to acquire the
> futex, the user should first spin for a few thousand cycles, and then
> call FUTEX_WAIT. This futex can be shared across arbitrary process
> contexts as well.

The radeon cards can send me back IRQs at helpful points in the command stream. I think I'd like to use these for most synchronization; however, some cleverness may be required to reduce the total number of IRQs generated.

Is there a threshold, say in IRQs/second, beyond which it makes more sense to busy-wait?

In a lot of cases it doesn't matter if we send the process to sleep for a little longer than necessary, as there's a big queue to hardware and the things we're sleeping on are often in the *middle* of the queue. Ie, if we're waiting on the hardware then we often have some leeway about how long we wait, as the hardware has work to get on with in the meantime.

I can see this not working so well for situations where the driver is waiting for just one or two commands to drain from an almost empty ring -- eg. mixed accelerated and software rasterization, which I think is common in the X server.

Keith
From: Ingo M. <mi...@el...> - 2002-08-30 11:47:07
On Fri, 30 Aug 2002, Keith Whitwell wrote:

> The radeon cards can send me back IRQs at helpful points in the command
> stream. I think I'd like to use these for most synchronization; however,
> some cleverness may be required to reduce the total number of IRQs
> generated.
>
> Is there a threshold, say in IRQs/second, beyond which it makes more
> sense to busy-wait?

what is the basic command completion notification method of the hardware?

is it basically a pipeline (or ring) of commands and a status bit in some (mmap-shared) register, which is polled by userspace to see whether more commands can be posted? Or do you poll the commands directly?

plus the hardware has the ability to also send a notification interrupt if the command queue gets empty? [or half empty - or some threshold?]

and what is the typical latency of commands (as executed by the GX hw) - and what is the maximum latency of commands that can occur?

also, most of the looping done here is not due to some other process taking the 'DRI lock', correct?

Ingo
From: Keith W. <ke...@tu...> - 2002-08-30 12:06:38
Ingo Molnar wrote:
> what is the basic command completion notification method of the hardware?
>
> is it basically a pipeline (or ring) of commands and a status bit in some
> (mmap-shared) register, which is polled by userspace to see whether more
> commands can be posted? Or do you poll the commands directly?

It's a ring, with head & tail pointers in mmio registers, possibly written back to main memory automatically by the card. However, userspace never really gets to see the ring; it builds up buffers of commands and vertices which are dispatched to the ring by the kernel module. Effectively, the ring is big enough that you always run out of something else first.

> plus the hardware has the ability to also send a notification interrupt if
> the command queue gets empty? [or half empty - or some threshold?]

I haven't spotted this one in the docs, but it isn't impossible.

> and what is the typical latency of commands (as executed by the GX hw) -
> and what is the maximum latency of commands that can occur?

This is utterly variable. From vanishingly small to hundredths of seconds per command should cover the normal range of operations.

> also, most of the looping done here is not due to some other process
> taking the 'DRI lock', correct?

Correct. We're looping because there previously was no choice but to poll status registers (or their in-memory shadows) to determine things like:

- Runaway rendering. With a big pipe and small, expensive frames, you can queue up minutes' worth of frames before running out of resources. Not good for interactivity, so I limit the nr of outstanding frames to 2. We poll an in-memory value which is written to by the card. This is where previously we had a tight loop, then a loop with usleep(), Linus suggested sched_yield(), and now that I've got IRQs working, I think those are probably the best option.

- Shutting down the 3d pipe. For software fallbacks you want to let it drain so you can access the framebuffer directly.

- Waiting for resources. The driver currently has dinky little 64k dma buffers and needs to keep releasing them and picking them up again. Sometimes it has to wait for one to become free. This is a question mark for me -- if I fired an irq after releasing each one of these, that would be a vast number of the things generated per second. Part of the trouble is the small size of the buffers relative to the actual volume of traffic to the card... I'm thinking about letting the driver grab a chunk of agp sized according to its needs and letting it organize its own synchronization, hopefully piggybacking off the IRQs for runaway rendering prevention.

The DRI lock has always been a combined userspace/system thing - to grab it there is a 'lock xxx' instruction which only on failing requires an ioctl. However, the drivers don't grab it that often, and when they do they're usually heading for kernel space anyway.

Keith
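[Editor's note: the two-level lock Keith describes would look roughly like this sketch. The lock-word encoding and DRI_LOCK_IOCTL are illustrative stand-ins for the real DRM lock interface.]

    #include <sys/ioctl.h>

    #define DRI_LOCK_IOCTL 0x1234   /* hypothetical; stands in for the
                                     * real DRM lock ioctl */

    /* Fast path: an atomic "lock cmpxchg" on a lock word in shared
     * memory; only on contention do we enter the kernel. */
    static void dri_lock(volatile unsigned int *lock, int fd, unsigned int ctx)
    {
        unsigned int old;

        /* If *lock == 0 (free), atomically store our context id. */
        __asm__ __volatile__("lock; cmpxchgl %2, %1"
                             : "=a" (old), "+m" (*lock)
                             : "r" (ctx), "0" (0)
                             : "memory");

        if (old != 0)                  /* contended: let the kernel queue us */
            ioctl(fd, DRI_LOCK_IOCTL, ctx);
    }

The unlock path would be the mirror image: an atomic store of 0, entering the kernel only if another context was marked as waiting while the lock was held.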
From: Ingo M. <mi...@el...> - 2002-08-30 12:20:25
On Fri, 30 Aug 2002, Keith Whitwell wrote:

> It's a ring, with head & tail pointers in mmio registers, possibly
> written back to main memory automatically by the card. However,
> userspace never really gets to see the ring; it builds up buffers of
> commands and vertices which are dispatched to the ring by the kernel
> module.

one more (probably stupid) question: you enter kernel mode (via the ioctl) only if some sort of exceptional thing occurs, right - or do you enter it for every GX op posted to the card? I was assuming that userspace had direct access to the command ring [thus posting a GX op was a userspace-only thing], but your mail made me unsure about this.

> Correct. We're looping because there previously was no choice but to
> poll status registers (or their in-memory shadows) to determine things
> like:
>
> - Runaway rendering. With a big pipe and small, expensive frames, you
> can queue up minutes' worth of frames before running out of resources.
> Not good for interactivity, so I limit the nr of outstanding frames to
> 2. We poll an in-memory value which is written to by the card. This is
> where previously we had a tight loop, then a loop with usleep(), Linus
> suggested sched_yield(), and now that I've got IRQs working, I think
> those are probably the best option.

i'd suggest using interrupts only if it can be ensured that you will get only one interrupt per frame. On current x86 hardware interrupts have a typical latency of 10 usecs, 30-50% of which is direct CPU overhead, ie. on a 1 GHz CPU it's 3000-5000 cycles - and this is just the pure IRQ entry/exit.

if the polling only has to be done to limit work to be at most 1 frame away from the current frame (correct?), and if the GPU can interrupt the host CPU when encountering a specific GX op (the last operation belonging to the current frame?), then the use of interrupts is ideal: the overhead will be low and the processes interact with the OS in the nicest possible way. The process won't lose any timeslices due to sched_yield() - all waiting/wakeup can be done nicely from the DRM kernel code and the DRM IRQ handler.

> - Shutting down the 3d pipe. For software fallbacks you want to
> let it drain so you can access the framebuffer directly.

ok - this is a boundary case anyway, but IRQs should work equally well here.

> - Waiting for resources. The driver currently has dinky little 64k
> dma buffers and needs to keep releasing them and picking them up again.
> Sometimes it has to wait for one to become free. [...] I'm thinking
> about letting the driver grab a chunk of agp sized according to its
> needs and letting it organize its own synchronization, hopefully
> piggybacking off the IRQs for runaway rendering prevention.

i'd suggest grabbing all DMA resources upon module initialization (unless it's some unrealistic amount of RAM); this will also make coding easier and will make the code faster. You should most definitely avoid any IRQ overhead (and even allocation overhead) in this area, unless some hardware limit really forces you to do so. I really think that anyone who wants to run an r200 based card can afford 16 MB (or more) of preallocated DMA space.

> The DRI lock has always been a combined userspace/system thing - to grab
> it there is a 'lock xxx' instruction which only on failing requires an
> ioctl.

ok. Basically a pre-sys_futex() futex :-)

> However, the drivers don't grab it that often, and when they do they're
> usually heading for kernel space anyway.

ok.

Ingo
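[Editor's note: in DRM kernel code, the wait/wakeup Ingo describes would look roughly like this 2.4-era sketch. The handler name, the frame counter, and the throttle threshold are assumptions for illustration.]

    #include <linux/sched.h>
    #include <linux/wait.h>
    #include <asm/atomic.h>

    /* Kernel-side sketch: the DRM ioctl path sleeps until the card's
     * end-of-frame interrupt fires; no polling, no yield(). */

    static DECLARE_WAIT_QUEUE_HEAD(frame_wait);
    static atomic_t frames_completed = ATOMIC_INIT(0);

    /* Interrupt handler, run when the card reaches the end-of-frame
     * command in the ring (2.4-era handler signature). */
    void r200_frame_irq(int irq, void *dev_id, struct pt_regs *regs)
    {
        atomic_inc(&frames_completed);
        wake_up_interruptible(&frame_wait);
    }

    /* ioctl path: block until at most one frame is still outstanding. */
    int r200_throttle(int frames_submitted)
    {
        return wait_event_interruptible(frame_wait,
            frames_submitted - atomic_read(&frames_completed) <= 1);
    }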
From: Keith W. <ke...@tu...> - 2002-08-30 12:42:40
Ingo Molnar wrote:
> one more (probably stupid) question: you enter kernel mode (via the ioctl)
> only if some sort of exceptional thing occurs, right - or do you enter it
> for every GX op posted to the card? I was assuming that userspace had
> direct access to the command ring [thus posting a GX op was a
> userspace-only thing], but your mail made me unsure about this.

For security reasons the kernel has to be involved. The userspace driver builds a buffer of commands (which need to be vetted) and populates agp buffers directly with vertex data (which is safe). At a later point, the lock is grabbed and an ioctl is called to fire the accumulated commands.

> i'd suggest using interrupts only if it can be ensured that you will get
> only one interrupt per frame. On current x86 hardware interrupts have a
> typical latency of 10 usecs, 30-50% of which is direct CPU overhead, ie.
> on a 1 GHz CPU it's 3000-5000 cycles - and this is just the pure IRQ
> entry/exit.

Fair enough. This is a good answer to my question.

> if the polling only has to be done to limit work to be at most 1 frame
> away from the current frame (correct?), and if the GPU can interrupt the
> host CPU when encountering a specific GX op (the last operation belonging
> to the current frame?), then the use of interrupts is ideal: the overhead
> will be low and the processes interact with the OS in the nicest possible
> way. The process won't lose any timeslices due to sched_yield() - all
> waiting/wakeup can be done nicely from the DRM kernel code and the DRM
> IRQ handler.

Yep, this is the scheme I'd like.

> i'd suggest grabbing all DMA resources upon module initialization (unless
> it's some unrealistic amount of RAM); this will also make coding easier
> and will make the code faster. [...] I really think that anyone who wants
> to run an r200 based card can afford 16 MB (or more) of preallocated DMA
> space.

Effectively this is what I'm proposing. The drm kernel module already preallocates its dma resources, but they are then shared out 64k at a time to active userspace contexts. I think we'd be better off having the userspace contexts each grab and hold onto several meg of agp.

Keith
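[Editor's note: a toy sketch of the vetting step Keith mentions. The command encoding, the register range check, and ring_emit() are all made up for illustration; the real radeon DRM checks command packets in a more involved way.]

    #include <linux/errno.h>

    /* Toy model: each command is a (register, value) pair, and the kernel
     * copies a command to the ring only after checking the register is one
     * userspace may touch (e.g. not the DMA engine's base-address regs). */

    #define SAFE_REG_FIRST 0x1000   /* hypothetical safe register window */
    #define SAFE_REG_LAST  0x1fff

    struct gfx_cmd {
        unsigned int reg;
        unsigned int val;
    };

    extern void ring_emit(unsigned int reg, unsigned int val);

    int dispatch_cmdbuf(const struct gfx_cmd *buf, int n)
    {
        int i;

        /* Vet first, then emit, so a bad buffer is rejected whole. */
        for (i = 0; i < n; i++)
            if (buf[i].reg < SAFE_REG_FIRST || buf[i].reg > SAFE_REG_LAST)
                return -EPERM;

        for (i = 0; i < n; i++)
            ring_emit(buf[i].reg, buf[i].val);

        return 0;
    }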
From: Ingo M. <mi...@el...> - 2002-08-30 12:57:35
On Fri, 30 Aug 2002, Keith Whitwell wrote:

> For security reasons the kernel has to be involved. The userspace
> driver builds a buffer of commands (which need to be vetted) and
> populates agp buffers directly with vertex data (which is safe). At a
> later point, the lock is grabbed and an ioctl is called to fire the
> accumulated commands.

okay - as long as commands are buffered, and the 'bulk' of command-related data is shared between the card and userspace, entering the kernel to post (possibly thousands of) commands is not a performance issue, and it's thus the right solution. Is this the basic model of how eg. Windows uses 3D cards?

Ingo
From: Keith W. <ke...@tu...> - 2002-08-30 13:28:09
Ingo Molnar wrote:
> okay - as long as commands are buffered, and the 'bulk' of command-related
> data is shared between the card and userspace, entering the kernel to post
> (possibly thousands of) commands is not a performance issue, and it's thus
> the right solution. Is this the basic model of how eg. Windows uses 3D
> cards?

I think the windows and closed-source drivers generally don't recognise the security problems or don't care about them.

The driver probably looks similar, but avoids the inspection/copying step that we have to do - so that userspace can just build command buffers in agp space which are scheduled onto the ring, perhaps by a kernel module.

Keith
From: Ingo M. <mi...@el...> - 2002-08-30 13:21:12
On Fri, 30 Aug 2002, Keith Whitwell wrote:

> I think the windows and closed-source drivers generally don't recognise
> the security problems or don't care about them.
>
> The driver probably looks similar, but avoids the inspection/copying
> step that we have to do - so that userspace can just build command
> buffers in agp space which are scheduled onto the ring, perhaps by a
> kernel module.

hm, what are the worst-case security issues - can userspace cause the GX card to DMA into arbitrary RAM? Or only GX denial-of-service type issues, like monopolizing the GX card or interfering with other apps' GX ops?

Ingo
From: Keith W. <ke...@tu...> - 2002-08-30 13:41:17
Ingo Molnar wrote:
> hm, what are the worst-case security issues - can userspace cause the GX
> card to DMA into arbitrary RAM?

Typically.

> Or only GX denial-of-service type issues, like monopolizing the GX card
> or interfering with other apps' GX ops?

That's pretty much impossible to stop. Also, hard lockups are typically possible even with streams of 'correct' commands -- cards can be incredibly sensitive about the oddest things and just hard-lock the system if they get something they don't like. Maybe when we get to the point that we don't do this ourselves unintentionally, we can start worrying about apps doing it intentionally...

Keith
From: Ingo M. <mi...@el...> - 2002-08-30 13:34:58
On Fri, 30 Aug 2002, Keith Whitwell wrote:

> > hm, what are the worst-case security issues - can userspace cause the GX
> > card to DMA into arbitrary RAM?
>
> Typically.

hm, do card vendors know about these security problems? Ie. are these cards documented as not usable on server systems under Windows?

or are there perhaps some security features that enable the passing of 'safe' commands via userspace-only methods, which we are simply not utilizing in Linux? Arbitrary DMA to kernel memory is a wide enough security hole to drive a truck through, even on Windows.

> > Or only GX denial-of-service type issues, like monopolizing the GX card
> > or interfering with other apps' GX ops?
>
> That's pretty much impossible to stop. Also, hard lockups are typically
> possible even with streams of 'correct' commands -- cards can be
> incredibly sensitive about the oddest things and just hard-lock the
> system if they get something they don't like. Maybe when we get to the
> point that we don't do this ourselves unintentionally, we can start
> worrying about apps doing it intentionally...

yeah - but it's possible to sanitize the interface via incremental changes to the kernel code - while the Windows drivers appear to have the security hole pretty much designed in.

Ingo
From: Keith W. <ke...@tu...> - 2002-08-30 14:10:24
Ingo Molnar wrote:
> hm, do card vendors know about these security problems? Ie. are these
> cards documented as not usable on server systems under Windows?
>
> or are there perhaps some security features that enable the passing of
> 'safe' commands via userspace-only methods, which we are simply not
> utilizing in Linux? Arbitrary DMA to kernel memory is a wide enough
> security hole to drive a truck through, even on Windows.

It's possible - the i810 had a scheme to do that, but it also had a flaw that rendered it useless. I haven't seen a way to do it on the radeon-type cards.

> yeah - but it's possible to sanitize the interface via incremental changes
> to the kernel code - while the Windows drivers appear to have the security
> hole pretty much designed in.

Bear in mind that I haven't seen the windows drivers, so I can really only speculate on what is happening there.

I'm heading off now on holidays -- back on the 9th.

Keith
From: Ingo M. <mi...@el...> - 2002-08-30 11:35:08
basically, yield() these days is for userspace to express: "my locking design sucks, i don't have the slightest idea what i'm waiting for, but right now i cannot continue with my work, so give some other process a chance to resolve the locking. Schedule me back later on." And the 2.5 scheduler really treats the thread in such a way => yield() carries a big, heavy penalty in the order of scheduling.

on the 2.5 kernel there's almost no legitimate reason to use yield() these days (except in some well-specified kernel-internal cases); with the availability of futexes there's no "i don't want to enter the kernel to get a lock" excuse anymore.

eg. one of the biggest (ab)users of sched_yield() was libpthread. In the new, futex-based pthreads code nothing uses yield() internally anymore.

Ingo