From: <uni...@sh...> - 2004-10-31 15:52:45
|
Hi, list!

With display cards that have more and more hardware on them (TV capture, mpeg decoders, etc.) that can work independently of one another but share the same DMA engine, I've found the need for more than one hardware lock. I've done a simple implementation for the mpeg decoder of the via driver, but that one doesn't cover the DMA case. The question arises "Why should I need to wait for DMA quiescence to check whether the decoder is done with a frame, if there is no decoder data in any of the pending DMA buffers"?

In the VIA / Unichrome case alone there is a need for even more such locks for different parts of the chip if one were to make a clean implementation of drivers for all features that are on the chip.

My idea would be to extend drm with options for multiple locks, and I suspect not only VIA cards could benefit from this. I was thinking of:

1. A separate sarea to contain these locks, to avoid messing up the current sarea with binary incompatibilities as a consequence.
2. Other kernel modules should be able to take and release these locks. (V4L for example).
3. Each DMA buffer is marked (or in the VIA case, each submission to the ring-buffer is marked) whether it accesses the resource that is protected by a certain lock.
4. A resource will become available to a client when the client has taken the lock and there are no pending DMA buffers / parts of buffers that are marked touching this resource.
5. The client is responsible for reinitializing the resource once the lock is taken.

These are just initial thoughts. Is there a mechanism for this in DRM today or could it be done in a better way?

/Thomas
|
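The five points above can be sketched roughly as follows. This is only an illustration under stated assumptions: every name here (`multi_lock_area`, `res_take`, `res_available`, and so on) is hypothetical and not part of any existing DRM interface, the model is single-threaded, and a real shared-memory implementation would need atomic operations and kernel arbitration.

```c
#include <stdbool.h>
#include <stdint.h>

/* All names below are hypothetical; none of this is existing DRM API. */
enum hw_res { HW_RES_MPEG = 0, HW_RES_HQV = 1, HW_RES_COUNT = 2 };

#define HW_RES_FREE (-1)

struct hw_res_state {
    int      owner;    /* context holding the lock, or HW_RES_FREE */
    uint32_t pending;  /* queued submissions marked as touching it (point 3) */
};

/* Would live in its own shared area, separate from the current sarea (point 1). */
struct multi_lock_area {
    struct hw_res_state res[HW_RES_COUNT];
};

void multi_lock_init(struct multi_lock_area *a)
{
    for (int i = 0; i < HW_RES_COUNT; i++) {
        a->res[i].owner = HW_RES_FREE;
        a->res[i].pending = 0;
    }
}

/* Take one resource lock; fails if another context already holds it. */
bool res_take(struct multi_lock_area *a, enum hw_res r, int ctx)
{
    if (a->res[r].owner != HW_RES_FREE && a->res[r].owner != ctx)
        return false;
    a->res[r].owner = ctx;
    return true;
}

/* Release the lock, but only if ctx is the current owner. */
void res_release(struct multi_lock_area *a, enum hw_res r, int ctx)
{
    if (a->res[r].owner == ctx)
        a->res[r].owner = HW_RES_FREE;
}

/* Called when a buffer is queued; mask has one bit per resource it touches. */
void res_mark_queued(struct multi_lock_area *a, uint32_t mask)
{
    for (int i = 0; i < HW_RES_COUNT; i++)
        if (mask & (1u << i))
            a->res[i].pending++;
}

/* Called when the DMA engine has finished with a queued buffer. */
void res_mark_retired(struct multi_lock_area *a, uint32_t mask)
{
    for (int i = 0; i < HW_RES_COUNT; i++)
        if ((mask & (1u << i)) && a->res[i].pending > 0)
            a->res[i].pending--;
}

/* Point 4: usable only when locked by ctx and nothing queued touches it. */
bool res_available(const struct multi_lock_area *a, enum hw_res r, int ctx)
{
    return a->res[r].owner == ctx && a->res[r].pending == 0;
}
```

In this model a context that holds the HQV lock sees the HQV as available even while MPEG submissions are still pending, which is the behaviour the proposal is after.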
From: Jon S. <jon...@gm...> - 2004-10-31 16:34:29
|
On Sun, 31 Oct 2004 16:52:35 +0100, Thomas Hellström <uni...@sh...> wrote:
> 1. A separate sarea to contain these locks, to avoid messing up the
> current sarea with binary incompatibilities as a consequence.

It would probably be better to extend the current driver-specific sarea. You can negotiate the driver interface version to enable the new functions. There should be room:

#define SAREA_MAX 0x2000

Where is sarea allocated? I looked for five minutes and couldn't find it.

> 2. Other kernel modules should be able to take and release these locks.
> (V4L for example).
> 3. Each DMA buffer is marked (or in the VIA case, each submission to the
> ring-buffer is marked) whether it accesses the resource that is protected
> by a certain lock.
> 4. A resource will become available to a client when the client has
> taken the lock and there are no pending DMA buffers / parts of buffers
> that are marked touching this resource.
> 5. The client is responsible for reinitializing the resource once the
> lock is taken.
>
> These are just initial thoughts. Is there a mechanism for this in DRM
> today or could it be done in a better way?
>
> /Thomas

--
Jon Smirl
jon...@gm...
|
From: <uni...@sh...> - 2004-10-31 17:41:49
|
Jon Smirl wrote:

>On Sun, 31 Oct 2004 16:52:35 +0100, Thomas Hellström
><uni...@sh...> wrote:
>
>>1. A separate sarea to contain these locks, to avoid messing up the
>>current sarea with binary incompatibilities as a consequence.
>
>It would probably be better to extend the current driver specific
>sarea. You can negotiate the driver interface version to enable the
>new functions. There should be room:
>#define SAREA_MAX 0x2000
>
>Where is sarea allocated? I looked for five minutes and couldn't find it

Hi!

The idea of using a separate sarea is that it would be easy to extend the number of locks and more suitable for more drivers than via. Otherwise one idea would be to fill the private sarea from the back, but that would break DDX tests for size of usable area.

Different sareas are allocated using drmAddMap with type=DRM_SHM. The one containing the current hardware lock is specified with the flag DRM_CONTAINS_LOCK.

/Thomas

>>2. Other kernel modules should be able to take and release these locks.
>>(V4L for example).
>>3. Each DMA buffer is marked (or in the VIA case, each submission to the
>>ring-buffer is marked) whether it accesses the resource that is protected
>>by a certain lock.
>>4. A resource will become available to a client when the client has
>>taken the lock and there are no pending DMA buffers / parts of buffers
>>that are marked touching this resource.
>>5. The client is responsible for reinitializing the resource once the
>>lock is taken.
>>
>>These are just initial thoughts. Is there a mechanism for this in DRM
>>today or could it be done in a better way?
>>
>>/Thomas
|
From: Jon S. <jon...@gm...> - 2004-10-31 18:12:23
|
On Sun, 31 Oct 2004 18:41:42 +0100, Thomas Hellström <uni...@sh...> wrote:
> The idea of using a separate sarea is that it would be easy to extend the
> number of locks and more suitable for more drivers than via. Otherwise one
> idea would be to fill the private sarea from the back, but that would
> break DDX tests for size of usable area.
>
> Different sareas are allocated using drmAddMap with type=DRM_SHM. The one
> containing the current hardware lock is specified with the flag
> DRM_CONTAINS_LOCK.

Shouldn't the sarea have been allocated by the driver in the first place? Maybe this is another place for permanent maps. I will probably have to change this for multihead support running independent X servers. The current design assumes a master process that creates/deletes sarea and that isn't the case for independent multi-head.

Code like this is a mistake:

drmInfo.sarea_priv_offset = sizeof(drm_sarea_t);

The first member of drm_sarea_t should have been an offset to the private sarea. Doing it that way would automatically adjust if the size of drm_sarea_t is changed. The offset should have been filled in by the DRM driver.

I don't see any code computing sizeof(drm_sarea_t) + sizeof(drm_xxx_sarea_t). What is getting stored in the SAREA page after the private area?

--
Jon Smirl
jon...@gm...
|
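The layout suggested above, with the private-area offset stored in the shared header instead of being hard-coded as a sizeof, might look something like this. It is a sketch only: `sarea_header`, `drv_priv`, `sarea_layout_init` and `sarea_priv` are hypothetical stand-ins, not the real drm_sarea_t or any driver's private structure, and the 16-byte alignment is an arbitrary choice for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define SAREA_BYTES 0x2000u   /* same value as SAREA_MAX in the thread */

/* Hypothetical stand-in for the generic shared header. */
struct sarea_header {
    uint32_t priv_offset;  /* byte offset of the driver-private area */
    uint32_t priv_size;    /* size of the driver-private area in bytes */
    uint32_t lock;         /* placeholder for the existing lock word */
    /* new generic fields can be appended here without breaking clients */
};

/* Hypothetical stand-in for a driver-private area. */
struct drv_priv {
    uint32_t example_state;
};

/* Filled in once by whoever creates the area (the DRM driver, in the
 * suggestion above). Returns 0 on success, -1 if the data does not fit. */
int sarea_layout_init(void *sarea, size_t priv_size)
{
    struct sarea_header *h = (struct sarea_header *)sarea;
    size_t off = (sizeof *h + 15u) & ~(size_t)15u;  /* round up to 16 bytes */

    if (priv_size > SAREA_BYTES - off)
        return -1;
    h->priv_offset = (uint32_t)off;
    h->priv_size = (uint32_t)priv_size;
    return 0;
}

/* Clients locate the private area through the header, never via sizeof. */
void *sarea_priv(void *sarea)
{
    const struct sarea_header *h = (const struct sarea_header *)sarea;
    return (unsigned char *)sarea + h->priv_offset;
}
```

With this arrangement, growing the header changes only the value written by `sarea_layout_init`; clients that read `priv_offset` keep working without recompilation.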
From: <uni...@sh...> - 2004-10-31 18:41:15
|
Jon Smirl wrote:

>On Sun, 31 Oct 2004 18:41:42 +0100, Thomas Hellström
><uni...@sh...> wrote:
>
>> The idea of using a separate sarea is that it would be easy to extend the
>>number of locks and more suitable for more drivers than via. Otherwise one
>>idea would be to fill the private sarea from the back, but that would
>>break DDX tests for size of usable area.
>>
>> Different sareas are allocated using drmAddMap with type=DRM_SHM. The one
>>containing the current hardware lock is specified with the flag
>>DRM_CONTAINS_LOCK.
>
>Shouldn't the sarea have been allocated by the driver in the first
>place? Maybe this is another place for permanent maps. I will probably
>have to change this for multihead support running independent X
>servers. The current design assumes a master process that
>creates/deletes sarea and that isn't the case for independent
>multi-head.
>
>Code like this is a mistake:
>drmInfo.sarea_priv_offset = sizeof(drm_sarea_t);
>
>The first member of drm_sarea_t should have been an offset to the
>private sarea. Doing it that way would automatically adjust if the
>size of drm_sarea_t is changed. Offset should have been filled in by
>the DRM driver.
>
>I don't see any code computing sizeof(drm_sarea_t) +
>sizeof(drm_xxx_sarea_t). What is getting stored in the SAREA page
>after the private area?

X contains code like

    /* For now the mapping works by using a fixed size defined
     * in the SAREA header
     */
    if (sizeof(XF86DRISAREARec)+sizeof(VIASAREAPriv) > SAREA_MAX) {
        xf86DrvMsg(pScrn->scrnIndex, X_ERROR,
                   "Data does not fit in SAREA\n");
        return FALSE;
    }
    pDRIInfo->SAREASize = SAREA_MAX;

So if locks are going to be squeezed in somewhere I'd either have to fit them in the XF86DRISAREARec or put them into every driver's private area.

BTW, the "old" drm functionality and design was very well documented by Precision Insight / VA Linux. Now that permanent maps are introduced and new requirements are made on the hw-specific drivers, is there a chance that these documents could be updated?

Regards
Thomas
|
From: Jon S. <jon...@gm...> - 2004-10-31 18:54:13
|
On Sun, 31 Oct 2004 19:41:03 +0100, Thomas Hellström <uni...@sh...> wrote:
> /* For now the mapping works by using a fixed size defined
>  * in the SAREA header
>  */
> if (sizeof(XF86DRISAREARec)+sizeof(VIASAREAPriv) > SAREA_MAX) {
>     xf86DrvMsg(pScrn->scrnIndex, X_ERROR,
>                "Data does not fit in SAREA\n");
>     return FALSE;
> }
> pDRIInfo->SAREASize = SAREA_MAX;
>
> So if locks are going to be squeezed in somewhere I'd either have to fit
> them in the XF86DRISAREARec or put them into every driver's private area.

You can't put them into XF86DRISAREARec because of code like this:

drmInfo.sarea_priv_offset = sizeof(drm_sarea_t);

drm_sarea_t is the same structure as XF86DRISAREARec.

Are the locks generic enough that all hardware needs them? You can extend VIASAREAPriv (drm_via_sarea_t) without messing up the above check. drm_via_sarea_t is much smaller than SAREA_MAX.

You will still need to negotiate an interface version since some servers will know about the extended locks and others won't. You'll have to revert to the big lock if not all of the clients know about the new lock scheme.

--
Jon Smirl
jon...@gm...
|
From: Keith W. <ke...@tu...> - 2004-10-31 19:24:25
|
Thomas Hellström wrote:
> Hi, list!
>
> With display cards that have more and more hardware on them
> (TV capture, mpeg decoders, etc.) that can work independently of
> one another but share the same DMA engine, I've found the need for more
> than one hardware lock.

The first question is - have you found that lock contention is actually a problem?

> I've done a simple implementation for the mpeg
> decoder of the via driver, but that one doesn't cover the DMA case. The
> question arises "Why should I need to wait for DMA quiescence to check
> whether the decoder is done with a frame, if there is no decoder data in
> any of the pending DMA buffers"?

But this question isn't really answered by having multiple locks - it sounds more like you want some sort of IRQ notification or timestamping mechanism. Under normal circumstances grabbing the lock doesn't mean waiting for DMA quiescence.

> In the VIA / Unichrome case alone there is a need for even more such
> locks for different parts of the chip if one were to make a clean
> implementation of drivers for all features that are on the chip.
>
> My idea would be to extend drm with options for multiple locks, and I
> suspect not only VIA cards could benefit from this. I was thinking of:

For many cards, there is a single dma-driven command queue, and the lock is used to protect that queue. All sorts of stuff (video, 2d, 3d) is delivered on the same queue. It sounds like the VIA driver follows a similar single-queue model.

> 1. A separate sarea to contain these locks, to avoid messing up the
> current sarea with binary incompatibilities as a consequence.
> 2. Other kernel modules should be able to take and release these locks.
> (V4L for example).
> 3. Each DMA buffer is marked (or in the VIA case, each submission to the
> ring-buffer is marked) whether it accesses the resource that is protected
> by a certain lock.
> 4. A resource will become available to a client when the client has
> taken the lock and there are no pending DMA buffers / parts of buffers
> that are marked touching this resource.
> 5. The client is responsible for reinitializing the resource once the
> lock is taken.

But it still sounds like there is a single ring buffer, right? Won't you need a lock to protect the ringbuffer? Won't everything have to grab that lock?

Also, how does direct framebuffer access work? The X server presumably now has to grab all of the locks, and likewise 3d fallbacks, to prevent all access to the framebuffer?

> These are just initial thoughts. Is there a mechanism for this in DRM
> today or could it be done in a better way?

I guess I'm not sure which problem you're trying to solve. There are a couple I can think of so I'll list them here:

- Lock contention. Under what circumstances?
- Unnecessary flushing of the DMA queue/ringbuffer. I.e. if you want to write to/read from a surface in video ram, how do you know when the video card has finished with it?
- Something else?

Keith
|
From: <uni...@sh...> - 2004-10-31 21:18:45
|
Keith Whitwell wrote:

> Thomas Hellström wrote:
>
>> Hi, list!
>>
>> With display cards that have more and more hardware on them
>> (TV capture, mpeg decoders, etc.) that can work independently of
>> one another but share the same DMA engine, I've found the need for more
>> than one hardware lock.
>
> The first question is - have you found that lock contention is
> actually a problem?
>
>> I've done a simple implementation for the mpeg decoder of the via
>> driver, but that one doesn't cover the DMA case. The question arises
>> "Why should I need to wait for DMA quiescence to check whether the
>> decoder is done with a frame, if there is no decoder data in any of
>> the pending DMA buffers"?
>
> But this question isn't really answered by having multiple locks - it
> sounds more like you want some sort of IRQ notification or
> timestamping mechanism. Under normal circumstances grabbing the lock
> doesn't mean waiting for DMA quiescence.

The typical case here:

I want a DRI client to flip a video frame to screen, using a hardware entity called the HQV. This is a rather time-critical operation. To do this I have to take the hardware lock.

While this is happening, another thread is waiting for the mpeg decoder to complete a frame. To do that, this thread needs to take the hardware lock, wait for quiescent DMA, and then wait for the mpeg decoder to signal idle. It might be that the DMA command queue does not even contain mpeg data. This waiting delays the frame flip enough to create a visible jump in the video.

With multiple locks:

The first thread checks the HQV lock, it is available and frame flipping is done immediately.

The other thread meanwhile takes the MPEG engine lock, waits until the DMA engine has processed all MPEG commands in the command queue and then waits for the MPEG engine to be idle. DMA might still be processing 3D commands.

>> In the VIA / Unichrome case alone there is a need for even more such
>> locks for different parts of the chip if one were to make a clean
>> implementation of drivers for all features that are on the chip.
>>
>> My idea would be to extend drm with options for multiple locks, and I
>> suspect not only VIA cards could benefit from this. I was thinking of:
>
> For many cards, there is a single dma-driven command queue, and the
> lock is used to protect that queue. All sorts of stuff (video, 2d,
> 3d) is delivered on the same queue. It sounds like the VIA driver
> follows a similar single-queue model.

Yes.

>> 1. A separate sarea to contain these locks, to avoid messing up the
>> current sarea with binary incompatibilities as a consequence.
>> 2. Other kernel modules should be able to take and release these
>> locks. (V4L for example).
>> 3. Each DMA buffer is marked (or in the VIA case, each submission to
>> the ring-buffer is marked) whether it accesses the resource that is
>> protected by a certain lock.
>> 4. A resource will become available to a client when the client has
>> taken the lock and there are no pending DMA buffers / parts of
>> buffers that are marked touching this resource.
>> 5. The client is responsible for reinitializing the resource once the
>> lock is taken.
>
> But it still sounds like there is a single ring buffer, right? Won't
> you need a lock to protect the ringbuffer? Won't everything have to
> grab that lock?

Only while submitting command buffer data. This will hopefully be a very fast operation. The IOCTL copying this data to the ringbuffer will check that all locks are held for the hardware entities that the submitted command batch touches. The user will have to tell the IOCTL which entities these are, or possibly the command verifier checks this, but I consider that overkill since that is more of a bug-check than a security check.

> Also, how does direct framebuffer access work? The X server
> presumably now has to grab all of the locks, and likewise 3d
> fallbacks, to prevent all access to the framebuffer?

The current heavyweight lock will protect framebuffer areas that are not managed by the drm memory manager. The IOCTL checking dma submission will check that this lock is held for 3d and 2d engine command submissions that touch this area. This will guarantee compatibility with the current DRI locking mechanism. Still, it will be possible to submit, for example, mpeg data to the command queue without taking the heavyweight lock, or submit 2d engine commands that blit one off-screen mpeg frame-buffer to another. These operations should not need to wait for software rendering into other parts of the frame-buffer.

>> These are just initial thoughts. Is there a mechanism for this in DRM
>> today or could it be done in a better way?
>
> I guess I'm not sure which problem you're trying to solve. There are
> a couple I can think of so I'll list them here:
>
> - Lock contention. Under what circumstances?
>
> - Unnecessary flushing of the DMA queue/ringbuffer. I.e. if you
> want to write to/read from a surface in video ram, how do you know
> when the video card has finished with it?
>
> - Something else?
>
> Keith

I hope I described the problem and the proposed way to solve it. Further comments are appreciated.

/Thomas
|
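The submission-time check described in this message, where the caller states which hardware entities a batch touches and the ioctl refuses the batch unless every corresponding lock is held, reduces to a bitmask test. The sketch below assumes one bit per lock; the `RES_*` names and `submit_check` are hypothetical, with bit 0 standing in for the existing heavyweight lock.

```c
#include <stdint.h>

/* Hypothetical resource bits; bit 0 stands for the existing heavyweight lock. */
#define RES_LEGACY (1u << 0)
#define RES_MPEG   (1u << 1)
#define RES_HQV    (1u << 2)

/*
 * held:    locks the submitting context holds, as tracked by the kernel
 * touches: entities the caller says this command batch touches
 *
 * Returns 0 if the batch may be copied to the ring buffer, or -1 if it
 * touches at least one entity whose lock the caller does not hold.
 */
int submit_check(uint32_t held, uint32_t touches)
{
    uint32_t missing = touches & ~held;
    return missing ? -1 : 0;
}
```

Under this scheme an mpeg-only batch passes with just `RES_MPEG` held, while a 3d or 2d batch that renders into the unmanaged framebuffer area would also need `RES_LEGACY` in its `touches` mask, which keeps the existing DRI lock semantics for that case.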
From: Eric A. <et...@lc...> - 2004-11-01 00:23:44
|
On Sun, 2004-10-31 at 13:18, Thomas Hellström wrote:
> Keith Whitwell wrote:
>
> > Thomas Hellström wrote:
> >
> >> Hi, list!
> >>
> >> With display cards that have more and more hardware on them
> >> (TV capture, mpeg decoders, etc.) that can work independently of
> >> one another but share the same DMA engine, I've found the need for more
> >> than one hardware lock.
> >
> > The first question is - have you found that lock contention is
> > actually a problem?
> >
> >> I've done a simple implementation for the mpeg decoder of the via
> >> driver, but that one doesn't cover the DMA case. The question arises
> >> "Why should I need to wait for DMA quiescence to check whether the
> >> decoder is done with a frame, if there is no decoder data in any of
> >> the pending DMA buffers"?
> >
> > But this question isn't really answered by having multiple locks - it
> > sounds more like you want some sort of IRQ notification or
> > timestamping mechanism. Under normal circumstances grabbing the lock
> > doesn't mean waiting for DMA quiescence.
>
> The typical case here:
>
> I want a DRI client to flip a video frame to screen, using a hardware
> entity called the HQV. This is a rather time-critical operation. To do
> this I have to take the hardware lock.
>
> While this is happening, another thread is waiting for the mpeg decoder
> to complete a frame. To do that, this thread needs to take the hardware
> lock, wait for quiescent DMA, and then wait for the mpeg decoder to
> signal idle. It might be that the DMA command queue does not even
> contain mpeg data. This waiting delays the frame flip enough to create a
> visible jump in the video.
>
> With multiple locks:
>
> The first thread checks the HQV lock, it is available and frame flipping
> is done immediately.
>
> The other thread meanwhile takes the MPEG engine lock, waits until the
> DMA engine has processed all MPEG commands in the command queue and then
> waits for the MPEG engine to be idle. DMA might still be processing 3D
> commands.

Do you not have interrupts that either signal MPEG engine idle, or just sw interrupts you can drop in the command stream? That would let you sleep waiting for them (rather than spinning, a win in itself) and you wouldn't have to hold the hardware lock.

--
Eric Anholt et...@lc...
http://people.freebsd.org/~anholt/ anholt@FreeBSD.org
|
From: Thomas <uni...@sh...> - 2004-11-01 06:04:06
|
> On Sun, 2004-10-31 at 13:18, Thomas Hellström wrote:
>> Keith Whitwell wrote:
>>
>> > Thomas Hellström wrote:
>> >
>> >> Hi, list!
>> >>
>> >> With display cards that have more and more hardware on them
>> >> (TV capture, mpeg decoders, etc.) that can work independently of
>> >> one another but share the same DMA engine, I've found the need for
>> >> more than one hardware lock.
>> >
>> > The first question is - have you found that lock contention is
>> > actually a problem?
>> >
>> >> I've done a simple implementation for the mpeg decoder of the via
>> >> driver, but that one doesn't cover the DMA case. The question arises
>> >> "Why should I need to wait for DMA quiescence to check whether the
>> >> decoder is done with a frame, if there is no decoder data in any of
>> >> the pending DMA buffers"?
>> >
>> > But this question isn't really answered by having multiple locks - it
>> > sounds more like you want some sort of IRQ notification or
>> > timestamping mechanism. Under normal circumstances grabbing the lock
>> > doesn't mean waiting for DMA quiescence.
>>
>> The typical case here:
>>
>> I want a DRI client to flip a video frame to screen, using a hardware
>> entity called the HQV. This is a rather time-critical operation. To do
>> this I have to take the hardware lock.
>>
>> While this is happening, another thread is waiting for the mpeg decoder
>> to complete a frame. To do that, this thread needs to take the hardware
>> lock, wait for quiescent DMA, and then wait for the mpeg decoder to
>> signal idle. It might be that the DMA command queue does not even
>> contain mpeg data. This waiting delays the frame flip enough to create a
>> visible jump in the video.
>>
>> With multiple locks:
>>
>> The first thread checks the HQV lock, it is available and frame flipping
>> is done immediately.
>>
>> The other thread meanwhile takes the MPEG engine lock, waits until the
>> DMA engine has processed all MPEG commands in the command queue and then
>> waits for the MPEG engine to be idle. DMA might still be processing 3D
>> commands.
>
> Do you not have interrupts that either signal MPEG engine idle, or just
> sw interrupts you can drop in the command stream? That would let you
> sleep waiting for them (rather than spinning, a win in itself) and you
> wouldn't have to hold the hardware lock.

You're right. Unfortunately the MPEG interrupt is not functioning on the CLE266 (HW bug according to VIA). Also, there doesn't seem to be a SW command stream interrupt either. Not even a command stream completion interrupt.

/Thomas

> --
> Eric Anholt et...@lc...
> http://people.freebsd.org/~anholt/ anholt@FreeBSD.org
|
From: Keith W. <ke...@tu...> - 2004-11-01 10:02:31
|
Thomas Hellström wrote:
>>On Sun, 2004-10-31 at 13:18, Thomas Hellström wrote:
>>
>>>Keith Whitwell wrote:
>>>
>>>>Thomas Hellström wrote:
>>>>
>>>>>Hi, list!
>>>>>
>>>>>With display cards that have more and more hardware on them
>>>>>(TV capture, mpeg decoders, etc.) that can work independently of
>>>>>one another but share the same DMA engine, I've found the need for
>>>>>more than one hardware lock.
>>>>
>>>>The first question is - have you found that lock contention is
>>>>actually a problem?
>>>>
>>>>>I've done a simple implementation for the mpeg decoder of the via
>>>>>driver, but that one doesn't cover the DMA case. The question arises
>>>>>"Why should I need to wait for DMA quiescence to check whether the
>>>>>decoder is done with a frame, if there is no decoder data in any of
>>>>>the pending DMA buffers"?
>>>>
>>>>But this question isn't really answered by having multiple locks - it
>>>>sounds more like you want some sort of IRQ notification or
>>>>timestamping mechanism. Under normal circumstances grabbing the lock
>>>>doesn't mean waiting for DMA quiescence.
>>>
>>>The typical case here:
>>>
>>>I want a DRI client to flip a video frame to screen, using a hardware
>>>entity called the HQV. This is a rather time-critical operation. To do
>>>this I have to take the hardware lock.
>>>
>>>While this is happening, another thread is waiting for the mpeg decoder
>>>to complete a frame. To do that, this thread needs to take the hardware
>>>lock, wait for quiescent DMA, and then wait for the mpeg decoder to
>>>signal idle. It might be that the DMA command queue does not even
>>>contain mpeg data. This waiting delays the frame flip enough to create a
>>>visible jump in the video.
>>>
>>>With multiple locks:
>>>
>>>The first thread checks the HQV lock, it is available and frame flipping
>>>is done immediately.
>>>
>>>The other thread meanwhile takes the MPEG engine lock, waits until the
>>>DMA engine has processed all MPEG commands in the command queue and then
>>>waits for the MPEG engine to be idle. DMA might still be processing 3D
>>>commands.
>>
>>Do you not have interrupts that either signal MPEG engine idle, or just
>>sw interrupts you can drop in the command stream? That would let you
>>sleep waiting for them (rather than spinning, a win in itself) and you
>>wouldn't have to hold the hardware lock.
>
> You're right. Unfortunately the MPEG interrupt is not functioning on the
> CLE266 (HW bug according to VIA). Also there doesn't seem to be a SW
> command stream interrupt either. Not even a command stream completion
> interrupt.

How frustrating... I'd like to investigate all possibilities for getting some sort of synchronization information out of the hardware, as this would seem to address your problem more directly and would hopefully keep the VIA driver looking more like the other drivers.

At the very worst, there is the technique of adding a tiny little blit command to write a timestamp (i.e. a 32-bit color) to a piece of offscreen memory. Processes waiting for certain events should be able to poll the timestamp without holding the lock. Better would be to have the hardware blit or write back to a piece of system memory. You're still stuck with polling unless there is some IRQ (any IRQ) that can be enlisted for synchronization.

Keith
|
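The timestamp technique mentioned above, where a small blit writes an increasing 32-bit value to a known memory location and waiters poll it without holding the lock, could be polled along these lines. This is a sketch under assumptions: `stamp_passed` and `wait_for_stamp` are hypothetical names, the location is assumed to be readable by the CPU, and the wraparound-tolerant comparison assumes the counter never runs more than 2^31 ahead of a waiter.

```c
#include <stdbool.h>
#include <stdint.h>

/* True once the counter written by the hardware has reached 'wanted'.
 * Unsigned subtraction keeps the test correct across 32-bit wraparound. */
bool stamp_passed(uint32_t current, uint32_t wanted)
{
    return (uint32_t)(current - wanted) < 0x80000000u;
}

/* Poll a location that the timestamp blit is assumed to update.
 * No hardware lock is needed because the waiter only reads.
 * Returns false if the stamp was not reached within max_polls reads. */
bool wait_for_stamp(const volatile uint32_t *stamp, uint32_t wanted,
                    unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        if (stamp_passed(*stamp, wanted))
            return true;
        /* a real driver would sleep or yield here instead of spinning */
    }
    return false;
}
```

A submitter would remember the stamp value it queued behind its mpeg commands and later call `wait_for_stamp` with that value; commands queued afterwards by other clients do not affect the result.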
From: Mike M. <che...@ya...> - 2004-11-01 01:12:22
|
--- Thomas Hellström <uni...@sh...> wrote:

> Keith Whitwell wrote:
>
> The typical case here:
>
> I want a DRI client to flip a video frame to screen, using a hardware
> entity called the HQV. This is a rather time-critical operation. To do
> this I have to take the hardware lock.
>
> While this is happening, another thread is waiting for the mpeg decoder
> to complete a frame. To do that, this thread needs to take the hardware
> lock, wait for quiescent DMA, and then wait for the mpeg decoder to
> signal idle. It might be that the DMA command queue does not even
> contain mpeg data. This waiting delays the frame flip enough to create a
> visible jump in the video.
>
> With multiple locks:
>
> The first thread checks the HQV lock, it is available and frame flipping
> is done immediately.
>
> The other thread meanwhile takes the MPEG engine lock, waits until the
> DMA engine has processed all MPEG commands in the command queue and then
> waits for the MPEG engine to be idle. DMA might still be processing 3D
> commands.
>
> Only while submitting command buffer data. This will hopefully be a very
> fast operation. The IOCTL copying this data to the ringbuffer will check
> that all locks are held for the hardware entities that the submitted
> command batch touches. The user will have to tell the IOCTL which
> entities these are, or possibly the command verifier checks this, but I
> consider that overkill since that is more of a bug-check than a
> security check.

Part of security is making sure authorised users can't make changes to other users' tasks. In this case killing all the tasks, by causing a hardware fault, is a good example. It's a fact that any user with rights to the DRM device can use any other code or program to play with the card or send junk into the command stream. The DRM should detect and prevent this, even if it means a slight performance loss.

Can I get an AMEN?
|
From: Thomas <uni...@sh...> - 2004-11-01 06:01:55
|
> --- Thomas Hellström <uni...@sh...> wrote:
>
>> Keith Whitwell wrote:
>>
>> The typical case here:
>>
>> I want a DRI client to flip a video frame to screen, using a hardware
>> entity called the HQV. This is a rather time-critical operation. To do
>> this I have to take the hardware lock.
>>
>> While this is happening, another thread is waiting for the mpeg decoder
>> to complete a frame. To do that, this thread needs to take the hardware
>> lock, wait for quiescent DMA, and then wait for the mpeg decoder to
>> signal idle. It might be that the DMA command queue does not even
>> contain mpeg data. This waiting delays the frame flip enough to create a
>> visible jump in the video.
>>
>> With multiple locks:
>>
>> The first thread checks the HQV lock, it is available and frame flipping
>> is done immediately.
>>
>> The other thread meanwhile takes the MPEG engine lock, waits until the
>> DMA engine has processed all MPEG commands in the command queue and then
>> waits for the MPEG engine to be idle. DMA might still be processing 3D
>> commands.
>>
>> Only while submitting command buffer data. This will hopefully be a very
>> fast operation. The IOCTL copying this data to the ringbuffer will check
>> that all locks are held for the hardware entities that the submitted
>> command batch touches. The user will have to tell the IOCTL which
>> entities these are, or possibly the command verifier checks this, but I
>> consider that overkill since that is more of a bug-check than a
>> security check.
>
> Part of security is making sure authorised users can't make changes to
> other users' tasks. In this case killing all the tasks, by causing a
> hardware fault, is a good example. It's a fact that any user with rights
> to the DRM device can use any other code or program to play with the card
> or send junk into the command stream. The DRM should detect and prevent
> this, even if it means a slight performance loss.
>
> Can I get an AMEN?

You are probably right, and it would be quite easy to implement such checks in the via command verifier as long as each lock is associated with a certain hardware address range.

However, I don't quite see the point in plugging such a security hole when there are similar ways to accomplish DoS, hardware crashes and even complete lockups using DRI.

On via, for example: writing random data to the framebuffer, writing random data to the sarea, taking the hardware lock and sleeping for an indefinite amount of time. Writing certain data sequences to the HQV locks the north bridge, etc.

Seems like DRI allows authorized clients to do these things by design?

/Thomas
|
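If each lock is associated with a hardware address range, as suggested above, a command verifier could derive the set of touched resources itself instead of trusting the caller. The sketch below assumes the verifier has already extracted the register offsets a batch writes to. The ranges in `res_ranges` are made up for illustration and are not taken from VIA documentation, and `batch_resources` and `verify_batch` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define RES_MPEG (1u << 1)   /* hypothetical lock bits */
#define RES_HQV  (1u << 2)

struct res_range {
    uint32_t first;    /* first register offset covered by the lock */
    uint32_t last;     /* last register offset covered by the lock */
    uint32_t res_bit;  /* lock that protects this range */
};

/* Made-up register ranges; not real VIA register addresses. */
static const struct res_range res_ranges[] = {
    { 0x1000u, 0x10FFu, RES_MPEG },
    { 0x2000u, 0x20FFu, RES_HQV  },
};

/* Compute which locked resources a batch touches from the register
 * offsets it writes to. Offsets outside every range contribute nothing. */
uint32_t batch_resources(const uint32_t *reg_offsets, size_t n)
{
    uint32_t mask = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < sizeof res_ranges / sizeof res_ranges[0]; j++)
            if (reg_offsets[i] >= res_ranges[j].first &&
                reg_offsets[i] <= res_ranges[j].last)
                mask |= res_ranges[j].res_bit;
    return mask;
}

/* Returns 0 if every touched resource is covered by a held lock, else -1. */
int verify_batch(const uint32_t *reg_offsets, size_t n, uint32_t held)
{
    return (batch_resources(reg_offsets, n) & ~held) ? -1 : 0;
}
```

Compared with a caller-supplied mask, this costs one range lookup per register write, which is the kind of slight performance loss discussed in the quoted text.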
From: Mike M. <che...@ya...> - 2004-11-01 21:03:32
|
--- Thomas Hellström <uni...@sh...> wrote: > > You are probably right, and it would be quite easy to implement such > checks in the via command verifier as long as each lock is associated > with a certain hardware address range. > > However, I don't quite see the point in plugging such a security hole > when there are similar ways to accomplish DOS, hardware crashes and even > complete lockups using DRI. > The idea is to plug all of them, sooner or later. > On via, for example, writing random data to the framebuffer, writing > random data to the sarea, taking the hardware lock and sleeping for an > indefinite amount of time. Writing certain data sequences to the HQV > locks the north bridge etc. > > Seems like DRI allow authorized clients to do these things by design? > > /Thomas |
From: Thomas <uni...@sh...> - 2004-11-02 11:15:15
|
> --- Thomas Hellström <uni...@sh...> wrote: > >> >> You are probably right, and it would be quite easy to implement such >> checks in the via command verifier as long as each lock is associated >> with a certain hardware address range. >> >> However, I don't quite see the point in plugging such a security hole >> when there are similar ways to accomplish DOS, hardware crashes and even >> complete lockups using DRI. >> > The idea is to plug all of them, sooner or later. > Ok, I'll buy this. I'll implement a command queue check if I go for the multiple lock thing. AMEN ;) /Thomas |
From: Nicolai H. <pre...@gm...> - 2004-11-01 12:15:47
|
On Monday 01 November 2004 07:01, Thomas Hellström wrote:
> You are probably right, and it would be quite easy to implement such
> checks in the via command verifier as long as each lock is associated with
> a certain hardware address range.
>
> However, I don't quite see the point in plugging such a security hole when
> there are a similar ways to accomplish DOS, hardware crashes and even
> complete lockups using DRI.
>
> On via, for example, writing random data to the framebuffer, writing
> random data to the sarea, taking the hardware lock and sleeping for an
> indefinite amount of time. Writing certain data sequences to the HQV locks
> the north bridge etc.
>
> Seems like DRI allow authorized clients to do these things by design?

From what I've learned, the DRI isn't exactly designed for robustness. Still, an authorized client should never be able to cause a hardware crash/lockup, and an authorized client must not be able to issue arbitrary DMA requests. As far as I know, all DRMs that are enabled by default enforce at least the latter.

Personally I believe that in the long term, the DRI should have (at least) the following security properties:

1. Protect against arbitrary DMA (arbitrary DMA trivially allows circumvention of process boundaries)
This can be done via command-stream checks.

2. Prevent hardware lockup or provide a robust recovery mechanism (protection of multi-user systems, as well as data protection)
Should be relatively cheap via command-stream checks on most hardware (unless there are crazy hardware problems with command ordering like there seem to be on some Radeons). I believe that in the long term, recovery should be in the kernel rather than the X server.

3. Make sure that no client can cause another client to crash (malfunctioning clients shouldn't cause data loss in other applications)
In other words, make sure that a DRI client can continue even if the shared memory areas are overwritten with entirely random values. That does seem like a daunting task.

4. Make sure that no client can block access to the hardware forever (don't force the user to reboot)
I have posted a watchdog patch that protects against the "take lock, sleep forever" problem a long time ago. The patch has recently been updated by Dieter Nützel (search for updated drm.watchdog.3). However, I have to admit that the patch doesn't feel quite right to me.

5. Enable the user to kill/suspend resource hogs
Even if we protect against lock abuse, a client could still use excessive amounts of texture memory (thus causing lots of swap) or emit rendering calls that take extremely long to execute. That kills latency and makes the system virtually unusable. Perhaps the process that authorizes DRI clients should be able to revoke or suspend that authorization. A suspend would essentially mean that drmGetLock waits until the suspend is lifted.

I know that actually implementing these things in such a way that they Just Work is not a pleasant task. I just felt like sharing a brain dump.

cu,
Nicolai |
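Editor's sketch of the general idea behind a lock watchdog ("take lock, sleep forever"). It is not the drm.watchdog patch mentioned in the message; the structure, the function names and the millisecond timestamps are all assumptions made for illustration, and time is passed in explicitly so the logic can be followed without a real timer.

```c
#include <assert.h>
#include <stdint.h>

#define NO_HOLDER (-1)

/* Hypothetical lock record: which context holds it and since when. */
struct hw_lock {
    int      holder;       /* context id, or NO_HOLDER when free */
    uint64_t taken_at_ms;  /* timestamp of the last successful take */
};

/* Try to take the lock for context ctx. Returns 0 on success, -1 if busy. */
int lock_try_take(struct hw_lock *l, int ctx, uint64_t now_ms)
{
    assert(ctx >= 0);
    if (l->holder != NO_HOLDER)
        return -1;
    l->holder = ctx;
    l->taken_at_ms = now_ms;
    return 0;
}

/* Periodic watchdog: if the lock has been held for timeout_ms or longer,
 * break it. Returns the evicted context id, or NO_HOLDER if nothing was
 * evicted. */
int watchdog_tick(struct hw_lock *l, uint64_t now_ms, uint64_t timeout_ms)
{
    if (l->holder == NO_HOLDER)
        return NO_HOLDER;
    if (now_ms - l->taken_at_ms < timeout_ms)
        return NO_HOLDER;
    int evicted = l->holder;
    l->holder = NO_HOLDER;
    return evicted;
}
```

A real implementation would also have to decide what happens to the evicted client and to hardware state it left half-programmed, which this sketch does not address.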
From: Mike M. <che...@ya...> - 2004-11-01 22:42:18
|
--- Nicolai Haehnle <pre...@gm...> wrote: > On Monday 01 November 2004 07:01, Thomas Hellström wrote: > > You are probably right, and it would be quite easy to implement such > > checks in the via command verifier as long as each lock is associated with > > a certain hardware address range. > > > > However, I don't quite see the point in plugging such a security hole when > > there are similar ways to accomplish DOS, hardware crashes and even > > complete lockups using DRI. > > > > On via, for example, writing random data to the framebuffer, writing > > random data to the sarea, taking the hardware lock and sleeping for an > > indefinite amount of time. Writing certain data sequences to the HQV locks > > the north bridge etc. > > > > Seems like DRI allow authorized clients to do these things by design? > > From what I've learned, the DRI isn't exactly designed for robustness. > Still, an authorized client should never be able to cause a hardware > crash/lockup, and an authorized client must not be able to issue arbitrary > DMA requests. As far as I know, all DRMs that are enabled by default > enforce at least the latter. > > Personally I believe that in the long term, the DRI should have (at least) > the following security properties: > 1. Protect against arbitrary DMA (arbitrary DMA trivially allows > circumvention of process boundaries) > This can be done via command-stream checks. > > 2. Prevent hardware lockup or provide a robust recovery mechanism > (protection of multi-user systems, as well as data protection) > Should be relatively cheap via command-stream checks on most hardware > (unless there are crazy hardware problems with command ordering like there This is something I think has been discussed. Hopefully the DRM currently verifies the cmd stream so that only the command order used by the DRI client-side drivers is accepted. Other orderings could be fixed, since the size of the cmds doesn't change, by simply memcpy'ing everything into the right order. 
> seem to be on some Radeons). I believe that in the long term, recovery > should be in the kernel rather than the X server. > > 3. Make sure that no client can cause another client to crash > (malfunctioning clients shouldn't cause data loss in other applications) > In other words, make sure that a DRI client can continue even if the shared > memory areas are overwritten with entirely random values. That does seem > like a daunting task. > > 4. Make sure that no client can block access to the hardware forever (don't > force the user to reboot) > I have posted a watchdog patch that protects against the "take lock, sleep > forever" problem a long time ago. The patch has recently been updated by > Dieter Nützel (search for updated drm.watchdog.3). However, I have to admit > that the patch doesn't feel quite right to me. > > 5. Enable the user to kill/suspend resource hogs > Even if we protect against lock abuse, a client could still use excessive > amounts of texture memory (thus causing lots of swap) or emit rendering > calls that take extremely long to execute. That kills latency and makes the > system virtually unusable. Perhaps the process that authorizes DRI clients > should be able to revoke or suspend that authorization. A suspend would > essentially mean that drmGetLock waits until the suspend is lifted. > > I know that actually implementing these things in such a way that they Just > Work is not a pleasant task. I just felt like sharing a brain dump. > > cu, > Nicolai |
From: <uni...@sh...> - 2004-11-01 13:22:00
|
Nicolai Haehnle wrote: >On Monday 01 November 2004 07:01, Thomas Hellström wrote: > >>You are probably right, and it would be quite easy to implement such >>checks in the via command verifier as long as each lock is associated with >>a certain hardware address range. >> >>However, I don't quite see the point in plugging such a security hole when >>there are similar ways to accomplish DOS, hardware crashes and even >>complete lockups using DRI. >> >>On via, for example, writing random data to the framebuffer, writing >>random data to the sarea, taking the hardware lock and sleeping for an >>indefinite amount of time. Writing certain data sequences to the HQV locks >>the north bridge etc. >> >>Seems like DRI allow authorized clients to do these things by design? >> > >From what I've learned, the DRI isn't exactly designed for robustness. >Still, an authorized client should never be able to cause a hardware >crash/lockup, and an authorized client must not be able to issue arbitrary >DMA requests. As far as I know, all DRMs that are enabled by default >enforce at least the latter. > >Personally I believe that in the long term, the DRI should have (at least) >the following security properties: >1. Protect against arbitrary DMA (arbitrary DMA trivially allows >circumvention of process boundaries) >This can be done via command-stream checks. > Hmm, correct me if I'm wrong, but after a brief check in the code, it seems like the current _DRM_LOCK_IS_HELD() used in dma buffer submission IOCTLS just checks that the lock is indeed held, but not if it is held by the current caller. Thus any authorized client should be able to sneak in DMA commands while the lock is held by another client or the X server. -> potential system crash. /Thomas |
From: Ian R. <id...@us...> - 2004-11-01 18:37:26
|
Thomas Hellström wrote: > I want a DRI client to flip a video frame to screen, using a hardware > entity called the HQV. This is a rather time critical operation. To do > this I have to take the hardware lock. > > While this is happening, another thread is waiting for the mpeg decoder > to complete a frame. To do that, this thread needs to take the hardware > lock, wait for quiescent DMA, and then wait for the mpeg decoder to > signal idle. It might be that the DMA command queue does not even > contain mpeg data. This waiting delays the frame flip enough to create a > visible jump in the video. > > With multiple locks: > > The first thread checks the HQV lock, it is available and frame flipping > is done immediately. > > The other thread meanwhile takes the MPEG engine lock, waits until the > DMA engine has processed all MPEG commands in the command queue and then > waits for the MPEG engine to be idle. DMA might still be processing 3D > commands. This sounds conceptually similar to waiting for the vertical retrace. It sounds like what you really want is an ioctl to wait for the MPEG engine to complete and acquire the lock. Is it possible to have the engine generate an interrupt when it's done with a certain frame? If so, you could have the hardware do that, and have the ioctl make the process sleep until the interrupt arrives. At that point acquire the existing heavy-weight lock and return. |
From: Michel <mi...@da...> - 2004-11-02 06:18:27
|
On Mon, 2004-11-01 at 14:21 +0100, Thomas Hellström wrote: > > Hmm, correct me If I'm wrong, but after a brief check in the code, it > seems like the current _DRM_LOCK_IS_HELD() used in dma buffer > submission IOCTLS just checks that the lock is indeed held, but not if > it is held by the current caller. Thus any authorized client should be > able to sneek in DMA commands while the lock is held by another client > or the X server. -> potential system crash. Hence _DRM_LOCK_IS_HELD() always seems to be (supposed to be) accompanied by another test that verifies the ownership. -- Earthling Michel Dänzer | Debian (powerpc), X and DRI developer Libre software enthusiast | http://svcs.affero.net/rm.php?r=daenzer |
From: <uni...@sh...> - 2004-11-02 09:12:21
|
Michel Dänzer wrote: >On Mon, 2004-11-01 at 14:21 +0100, Thomas Hellström wrote: > >>Hmm, correct me if I'm wrong, but after a brief check in the code, it >>seems like the current _DRM_LOCK_IS_HELD() used in dma buffer >>submission IOCTLS just checks that the lock is indeed held, but not if >>it is held by the current caller. Thus any authorized client should be >>able to sneak in DMA commands while the lock is held by another client >>or the X server. -> potential system crash. >> > >Hence _DRM_LOCK_IS_HELD() always seems to be (supposed to be) >accompanied by another test that verifies the ownership. > Michel, I just checked i830_dma.c, i915_dma.c and via_dma.c, and _DRM_LOCK_IS_HELD() is used without such a test, AFAICT. The correct macro to call seems to be LOCK_TEST_WITH_RETURN(), which does incorporate such a test. In fact, the use of _DRM_LOCK_IS_HELD() here should allow malfunctioning or malicious SMP dri clients to modify internal drm data structures and DMA ring-buffers simultaneously? /Thomas |
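Editor's sketch of the difference between "the lock is held" and "the lock is held by the caller". This is a simplified model, not the DRM source: the macro and function names below are hypothetical, the lock word is assumed to carry a held bit plus a context id, and the exact test performed by LOCK_TEST_WITH_RETURN() is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified, hypothetical lock word: top bit = held, low bits = context. */
#define LOCK_HELD        0x80000000u
#define LOCK_IS_HELD(w)  (((w) & LOCK_HELD) != 0)
#define LOCK_CONTEXT(w)  ((w) & ~LOCK_HELD)

/* Build the lock word for a context that has taken the lock. */
uint32_t lock_word_for(uint32_t ctx)
{
    assert((ctx & LOCK_HELD) == 0);
    return ctx | LOCK_HELD;
}

/* Weak check: passes for ANY caller as long as somebody holds the lock. */
int check_held_only(uint32_t word)
{
    return LOCK_IS_HELD(word) ? 0 : -1;
}

/* Stronger check: passes only if the caller itself is the holder. */
int check_held_by_caller(uint32_t word, uint32_t caller_ctx)
{
    if (!LOCK_IS_HELD(word) || LOCK_CONTEXT(word) != caller_ctx)
        return -1;
    return 0;
}
```

With context 7 holding the lock, `check_held_only` succeeds no matter who calls it, while `check_held_by_caller` rejects context 9. That gap is the one described in the message above.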
From: Mike M. <che...@ya...> - 2004-10-31 21:09:32
|
--- Thomas Hellström <uni...@sh...> wrote: > Hi, list! > > With display cards that have more and more hardware on them, > (TV-capture, mpeg decoders) etc. that can work independently of > one another, but share the same DMA engine I've found the need for more > than one hardware lock. I've done a simple implementation for the mpeg > decoder of the via driver, but that one doesn't cover the DMA case. The > question arises "Why should I need to wait for DMA quiescent to check > whether the decoder is done with a frame, if there is no decoder data in > any of the pending DMA buffers"? > > In the VIA / Unichrome case alone there is a need for even more such > locks for different parts of the chip if one were to make a clean > implementation of drivers for all features that are on the chip. > > My idea would be to extend drm with options for multiple locks, and I > suspect not only VIA cards could benefit from this. I was thinking of. > > 1. A separate sarea to contain these locks, to avoid messing up the > current sarea with binary incompatibilities as a consequence. > 2. Other kernel modules should be able to take and release these locks. > (V4L for example). > 3. Each DMA buffer is marked (or in the VIA case, each submission to the > ring-buffer is marked) whether it accesses the resource that is protected > There is a problem with a "client" being able to lock/unlock resources it may or may not be using. It's important that clients aren't able to DOS the system by submitting junk cmds without setting the right locks for that junk. > by a certain lock. > 4. A resource will become available to a client when the client has > taken the lock and there are no pending DMA buffers / parts of buffers > that are marked touching this resource. > 5. The client is responsible for reinitializing the resource once the > lock is taken. > > These are just initial thoughts. Is there a mechanism for this in DRM > today or could > it be done in a better way? 
> > /Thomas |
From: <uni...@sh...> - 2004-10-31 21:27:13
|
Mike Mestnik wrote: >--- Thomas Hellström <uni...@sh...> wrote: > >>Hi, list! >> >>With display cards that have more and more hardware on them, >>(TV-capture, mpeg decoders) etc. that can work independently of >>one another, but share the same DMA engine I've found the need for more >>than one hardware lock. I've done a simple implementation for the mpeg >>decoder of the via driver, but that one doesn't cover the DMA case. The >>question arises "Why should I need to wait for DMA quiescent to check >>whether the decoder is done with a frame, if there is no decoder data in >>any of the pending DMA buffers"? >> >>In the VIA / Unichrome case alone there is a need for even more such >>locks for different parts of the chip if one were to make a clean >>implementation of drivers for all features that are on the chip. >> >>My idea would be to extend drm with options for multiple locks, and I >>suspect not only VIA cards could benefit from this. I was thinking of. >> >>1. A separate sarea to contain these locks, to avoid messing up the >>current sarea with binary incompatibilities as a consequence. >>2. Other kernel modules should be able to take and release these locks. >>(V4L for example). >>3. Each DMA buffer is marked (or in the VIA case, each submission to the >>ring-buffer is marked) whether it accesses the resource that is protected >> >There is a problem with A "client" being able to lock/unlock resources it >may/may not be using. It's important that Client's arn't able to DOS the >system by submitting junk cmds /wo setting the right locs for that junk. > Such a case would be a client submitting 2D engine commands while the X server waits for 2D engine idle. Either this has to be implemented in the command verifier or considered acceptable behaviour. Today any dri client can continuously clear the screen without taking the hardware lock. > >>by a certain lock. >>4. 
A resource will become available to a client when the client has >>taken the lock and there are no pending DMA buffers / parts of buffers >>that are marked touching this resource. >>5. The client is responsible for reinitializing the resource once the >>lock is taken. >> >>These are just initial thoughts. Is there a mechanism for this in DRM >>today or could >>it be done in a better way? >> >>/Thomas |
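Editor's sketch of points 3 and 4 of the quoted proposal: every submission carries a mask of the resources it touches, and a resource counts as available to a client only when that client holds its lock and no queued submission still touches it. All names (`res_state`, `res_submit`, `res_retire`, ...) are hypothetical, and the DMA engine is modelled as nothing more than per-resource counters.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hardware entities, used as bit positions in a mask. */
enum { RES_2D, RES_HQV, RES_MPEG, RES_COUNT };
#define NO_CLIENT (-1)

struct res_state {
    int      holder[RES_COUNT];   /* client holding each lock, or NO_CLIENT */
    unsigned pending[RES_COUNT];  /* queued submissions touching each one */
};

void res_init(struct res_state *s)
{
    for (int r = 0; r < RES_COUNT; r++) {
        s->holder[r] = NO_CLIENT;
        s->pending[r] = 0;
    }
}

/* Take one resource lock. Returns 0 on success, -1 if already taken. */
int res_lock(struct res_state *s, int res, int client)
{
    assert(res >= 0 && res < RES_COUNT);
    if (s->holder[res] != NO_CLIENT)
        return -1;
    s->holder[res] = client;
    return 0;
}

/* Release a lock; a no-op if the client is not the holder. */
void res_unlock(struct res_state *s, int res, int client)
{
    assert(res >= 0 && res < RES_COUNT);
    if (s->holder[res] == client)
        s->holder[res] = NO_CLIENT;
}

/* Queue a submission marked with the resources it touches. Rejected with
 * -1 unless the client holds the lock for every marked resource. */
int res_submit(struct res_state *s, int client, uint32_t mask)
{
    for (int r = 0; r < RES_COUNT; r++)
        if ((mask & (1u << r)) && s->holder[r] != client)
            return -1;
    for (int r = 0; r < RES_COUNT; r++)
        if (mask & (1u << r))
            s->pending[r]++;
    return 0;
}

/* Called when the DMA engine has finished one submission with this mask. */
void res_retire(struct res_state *s, uint32_t mask)
{
    for (int r = 0; r < RES_COUNT; r++)
        if ((mask & (1u << r)) && s->pending[r] > 0)
            s->pending[r]--;
}

/* Point 4: lock held by the client and nothing pending on the resource. */
int res_available(const struct res_state *s, int res, int client)
{
    assert(res >= 0 && res < RES_COUNT);
    return s->holder[res] == client && s->pending[res] == 0;
}
```

In this model a client holding the HQV lock sees the HQV as available even while MPEG work is still queued, which is the latency gain the proposal is after.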
From: Mike M. <che...@ya...> - 2004-10-31 21:30:19
|
--- Thomas Hellström <uni...@sh...> wrote: > Such a case would be a client submitting 2D engine commands while the X > server waits for 2D engine idle. Either this has to be implemented in > the command verifier or considered acceptable behaviour. Today any dri > client can continuously clear the screen without taking the hardware > lock. > There are many factors that come into play. However, if a potentially harmful interface can be fixed easily there may be no reason not to. |
From: <uni...@sh...> - 2004-10-31 21:54:36
|
Jon Smirl wrote: >On Sun, 31 Oct 2004 19:41:03 +0100, Thomas Hellström ><uni...@sh...> wrote: > >> >> /* For now the mapping works by using a fixed size defined >> * in the SAREA header >> */ >> if (sizeof(XF86DRISAREARec)+sizeof(VIASAREAPriv) > SAREA_MAX) { >> xf86DrvMsg(pScrn->scrnIndex, X_ERROR, >> "Data does not fit in SAREA\n"); >> return FALSE; >> } >> pDRIInfo->SAREASize = SAREA_MAX; >> >> So if locks are going to be squeezed in somewhere I'd either have to fit >>them in the XF86DRISAREARec or put them into every driver's private area. >> > >You can't put them into XF86DRISAREARec because of code like this: >drmInfo.sarea_priv_offset = sizeof(drm_sarea_t); >drm_sarea_t is the same structure as XF86DRISAREARec. > Wouldn't this severely break backwards binary compatibility with dri clients compiled with the old size of drm_sarea_t? >Are the locks generic enough that all hardware needs them? > The idea was that if such an implementation exists and works, it could be used by any driver that found a potential gain. The generic part would be just a number of locks sitting there if somebody wanted to use them. Each driver would have to assign a certain meaning to each lock used. For each lock there would be a way to resolve contention and to clear the lock if the holder dies. Still I'd have to make a working trial implementation for the VIA driver. The important thing at this stage is to get the basic thoughts right. /Thomas >You can extend VIASAREAPriv (drm_via_sarea_t) without messing up the >above check. drm_via_sarea_t is much smaller than SAREA_MAX. > >You will still need to negotiate an interface version since some >servers will know about the extended locks and others won't. You'll >have to revert to the big lock if all of the clients don't know about >the new lock scheme. > /Thomas |
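Editor's sketch of the fallback rule at the end of the message above: the multiple-lock scheme is used only if the server and every client have negotiated an interface version that knows about it, otherwise everyone stays on the single big lock. The version threshold `MULTI_LOCK_MIN_MINOR` and the function name are hypothetical; how versions are actually exchanged is not shown.

```c
#include <assert.h>
#include <stddef.h>

enum lock_scheme { SCHEME_BIG_LOCK, SCHEME_MULTI_LOCK };

/* Hypothetical minor version that first understands the extended locks. */
#define MULTI_LOCK_MIN_MINOR 3

/* Pick the locking scheme from the negotiated interface minor versions.
 * A single old participant forces everyone back to the big lock. */
enum lock_scheme choose_lock_scheme(int server_minor,
                                    const int *client_minor,
                                    size_t n_clients)
{
    assert(client_minor != NULL || n_clients == 0);
    if (server_minor < MULTI_LOCK_MIN_MINOR)
        return SCHEME_BIG_LOCK;
    for (size_t i = 0; i < n_clients; i++)
        if (client_minor[i] < MULTI_LOCK_MIN_MINOR)
            return SCHEME_BIG_LOCK;
    return SCHEME_MULTI_LOCK;
}
```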