From: Dave A. <ai...@li...> - 2007-10-09 05:49:04
Hi,

Once the 915 super ioctl is merged, the patch attached removes the unused
interfaces left behind... Are any of these worth saving?

Dave.

--
David Airlie, Software Engineer
http://www.skynet.ie/~airlied / airlied at skynet.ie
Linux kernel - DRI, VAX / pam_smb / ILUG
From: <th...@tu...> - 2007-10-09 08:24:36
Dave Airlie wrote:
> Hi,
>
> Once the 915 super ioctl is merged, the patch attached removes the
> unused interfaces left behind...
>
> Are any of these worth saving?
>
> Dave.

Dave,

As mentioned previously to Eric, I think we should keep the single buffer
validate interface, with the exception that the hint DRM_BO_HINT_DONT_FENCE
is implied, and use that instead of the set pin interface. We can perhaps
rename it to drmBOSetStatus or something more suitable.

This will get rid of the user-space unfenced list access (which I believe
was the main motivation behind the set pin interface?) while keeping the
currently heavily used (at least in Poulsbo) functionality to move out
NO_EVICT scanout buffers to local memory before unpinning them, to avoid
VRAM and TT fragmentation (DRI clients may still reference those buffers,
so they won't get destroyed before a new one is allocated).

It would also allow us to specify where we want to pin buffers. If we
remove the memory flag specification from drmBOCreate there's no other way
to do that, except running the buffer through a superioctl, which isn't
very nice.

Also, it would make it much easier to unbreak i915 zone rendering and
derived work.

If we can agree on this, I'll come up with a patch.

/Thomas
From: Dave A. <ai...@li...> - 2007-10-09 09:22:47
> Dave,
> As mentioned previously to Eric, I think we should keep the single
> buffer validate interface with the exception that the hint
> DRM_BO_HINT_DONT_FENCE is implied, and use that instead of the set pin
> interface. We can perhaps rename it to drmBOSetStatus or something more
> suitable.
> [...]
> It would also allow us to specify where we want to pin buffers. If we
> remove the memory flag specification from drmBOCreate there's no other
> way to do that, except running the buffer through a superioctl which
> isn't very nice.
>
> Also it would make it much easier to unbreak i915 zone rendering and
> derived work.
>
> If we can agree on this, I'll come up with a patch.

I'm quite happy to have this type of interface; I can definitely see its
uses. We also need to investigate some sort of temporary NO_MOVE-like
interface (NO_MOVE until the next fencing...) in order to avoid
relocations, but it might be possible to make this driver-specific.

Keith P also had an idea for relocation avoidance in the simple case,
which I've allowed for in my interface: we could use the 4th uint32_t in
the relocation to pass in the value we've already written, and only
relocate if the buffer's location changes. So after doing one superioctl,
the validated offsets would be passed back to userspace and used by it,
and we only have to relocate future buffers if the buffers move.

Dave.
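The presumed-offset scheme Dave describes can be sketched as follows. This is an illustration only: the struct layout, field names, and helper are hypothetical, not the real drm.h relocation format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical relocation entry: the fourth uint32_t carries the offset
 * userspace already wrote into the command buffer (the "presumed" value). */
struct reloc_entry {
    uint32_t cmd_dword; /* index of the dword to patch in the command buffer */
    uint32_t target;    /* index of the buffer object this relocation targets */
    uint32_t delta;     /* byte offset added to the target buffer's base */
    uint32_t presumed;  /* value userspace assumed when emitting the command */
};

/* Patch only the relocations whose target buffer actually moved since the
 * validated offsets were last handed back to userspace.  Returns the number
 * of dwords rewritten. */
static int apply_relocs(uint32_t *cmd, struct reloc_entry *relocs, size_t n,
                        const uint32_t *validated_offset)
{
    int patched = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t want = validated_offset[relocs[i].target] + relocs[i].delta;
        if (want != relocs[i].presumed) { /* the buffer moved: rewrite */
            cmd[relocs[i].cmd_dword] = want;
            relocs[i].presumed = want;
            patched++;
        }
    }
    return patched;
}
```

In the steady state, where buffers stop moving between superioctls, every `presumed` value matches and no dwords need rewriting at all.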
From: Keith W. <ke...@tu...> - 2007-10-09 09:45:38
Dave Airlie wrote:
>> Dave,
>> As mentioned previously to Eric, I think we should keep the single
>> buffer validate interface with the exception that the hint
>> DRM_BO_HINT_DONT_FENCE is implied, and use that instead of the set pin
>> interface. We can perhaps rename it to drmBOSetStatus or something more
>> suitable.
>> [...]
>> If we can agree on this, I'll come up with a patch.
>
> I'm quite happy to have this type of interface I can definitely see its
> uses.. we also need to investigate some sort of temporary NO_MOVE like
> interface (the NO_MOVE until next fencing...) in order to avoid
> relocations, but it might be possible to make this driver specific..
>
> Keith P also had an idea for relocation avoidance in the simple case
> which I've allowed for in my interface, we could use the 4th uint32_t in
> the relocation to pass in the value we've already written and only
> relocate it if the buffers location changes, so after doing one
> superioctl, the validated offsets would be passed back to userspace and
> used by it and we only have to relocate future buffers if the buffers
> move..

Theoretically the kernel could keep the relocation lists for each buffer
hanging around after use, and do this automatically if a buffer is reused
and the buffers that its relocations point to have been moved. That would
be a good approach for one sort of buffer reuse, i.e. persistent state
object buffers that are reused over multiple frames but contain references
to other buffers.

Note that it only makes sense to reuse relocations in situations where
those relocations target a small number of buffers -- probably other
command buffers, or buffers containing state objects which themselves make
no further reference to other buffers. Trying to go beyond that, e.g. to
reuse buffers of state objects that contain references to texture images,
can lead to a major waste of resources. If you think about a situation
with a buffer of 50 texture state objects, each referencing 4 texture
images, and you just want to reuse one of those state objects: you will
emit a relocation to that state buffer, which will need to be validated
and would then recursively require all 200 texture images to be validated,
even if you only needed access to 4 of them...

The pre-validate/no-move/whatever thing is a useful optimization, but it
only makes sense up to a certain level -- a handful of command, indirect
state and/or vertex buffers is pretty much it.

Keith
From: Dave A. <ai...@li...> - 2007-10-16 11:22:30
> DRM_BO_HINT_DONT_FENCE is implied, and use that instead of the set pin
> interface. We can perhaps rename it to drmBOSetStatus or something more
> suitable.
> [...]
> If we can agree on this, I'll come up with a patch.

Have you had a chance to look at this? I can probably spend some time on
it to get the interface finalised.

Dave.
From: <th...@tu...> - 2007-10-17 09:32:25
Dave Airlie wrote:
>> DRM_BO_HINT_DONT_FENCE is implied, and use that instead of the set pin
>> interface. We can perhaps rename it to drmBOSetStatus or something more
>> suitable.
>> [...]
>> If we can agree on this, I'll come up with a patch.
>
> Have you had a chance to look at this I can probably spend some time on
> this to get the interface finalised..
>
> Dave.

Hi, Dave.

So, I did some quick work on this, with the result in the drm-ttm-finalize
branch. Basically what's done is to revert the setPin interface and
replace drmBOValidate with drmBOSetStatus. drmBOSetStatus is a
single-buffer interface with the same functionality as the previous
drmBOValidate, except that it implies DRM_BO_HINT_DONT_FENCE, NO_MOVE
buffers have been blocked pending a proper implementation, and optional
tiling info has been added. The OP-linked interface is gone, but the
arguments are kept in drm.h for now, for driver-specific IOCTLs.

Looking at the buffer object create interface, I think it's a better idea
to remove the need for locking and specify the memory flags, rather than
to assume a buffer in system memory before first validation. Consider a
driver that wants to put a texture buffer in VRAM. It does createbuffer,
copies the texture in, and then the buffer gets validated as part of the
superioctl. If we don't specify the VRAM flag at buffer creation, DRM will
go ahead and create a ttm, allocate a lot of pages, and then at validate
do a copy and free all the pages again. If we specify the VRAM flag, the
buffer will on creation just reserve a VRAM area and point any mapped
pages to it. Buffer creation and destruction in fixed memory areas was
quite fast previously, since no cache flushes were needed, and I'd like to
keep it that way. If we don't care or know how buffers are created, we
just specify DRM_BO_FLAG_MEM_LOCAL, which will give the modified behavior
currently present in master. But we need to remove the need for locking in
the buffer manager. Locking was previously needed:

a) To protect the ring buffer of i915, as the X server might use it. The
i915 buffer driver used it for blit moves to and from "VRAM", but there
are driver cases (eviction, pagefaults of non-mappable buffers) where we
cannot assume that we have the lock. Better to disable i915 blit buffer
moves until only DRM is allowed to touch the ring. It must be up to the
driver code to ensure that the lock is not needed for any buffer move
operations.

b) To protect against memory region allocations / validations during
take-down and VT switches, when certain memory regions are cleaned and all
buffers are swapped out.

Before this is merged, we need to disable i915 blit buffer moves and come
up with an in-kernel fix for b). For b) we should be able to use something
like a condition variable: basically we want to interruptibly block
validation if cleaned or on takedown.

Also, the unfenced list cleaning could probably be removed. It should be
considered a bug to leave something on the unfenced list outside of the
super-ioctl.

/Thomas
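As a rough model of the semantics Thomas describes for drmBOSetStatus (a single-buffer validate with DRM_BO_HINT_DONT_FENCE implied), here is a toy sketch; the flag values, the `bo` struct, and the helper are invented for illustration and do not match the real drm.h encoding or the libdrm API.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag values, not the real drm.h bits. */
#define BO_MEM_LOCAL  (1u << 0)
#define BO_MEM_TT     (1u << 1)
#define BO_MEM_VRAM   (1u << 2)
#define BO_NO_EVICT   (1u << 3)   /* the "pinned" bit */
#define BO_MEM_MASK   (BO_MEM_LOCAL | BO_MEM_TT | BO_MEM_VRAM)

struct bo {
    uint32_t flags;
    int      fenced;  /* would be set by a validate that emits a fence */
};

/* Toy model of the proposed drmBOSetStatus(): move the buffer to the
 * requested memory type and update pinning, without ever touching the
 * fencing state (DRM_BO_HINT_DONT_FENCE is implied). */
static void bo_set_status(struct bo *bo, uint32_t mem, uint32_t pin)
{
    bo->flags = (bo->flags & ~(BO_MEM_MASK | BO_NO_EVICT)) | mem | pin;
    /* fencing deliberately untouched: bo->fenced is left alone */
}
```

The Poulsbo use case from earlier in the thread then reads as a single call: push a NO_EVICT scanout buffer out to local memory while unpinning it, so its VRAM/TT area is freed immediately even though DRI clients may still hold a reference.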
From: Eric A. <er...@an...> - 2007-10-17 23:41:09
On Wed, 2007-10-17 at 11:32 +0200, Thomas Hellström wrote:
> Dave Airlie wrote:
> >> DRM_BO_HINT_DONT_FENCE is implied, and use that instead of the set
> >> pin interface. We can perhaps rename it to drmBOSetStatus or
> >> something more suitable.
> >>
> >> This will get rid of the user-space unfenced list access (which I
> >> believe was the main motivation behind the set pin interface?) while
> >> keeping the currently heavily used (at least in Poulsbo)
> >> functionality to move out NO_EVICT scanout buffers to local memory
> >> before unpinning them, (to avoid VRAM and TT fragmentation, as DRI
> >> clients may still reference those buffers, so they won't get
> >> destroyed before a new one is allocated).

Yes, the unfenced list thing and the associated DRM_BO_HINTS required were
part of the motivation for set_pin. But it also made sense from the X
Server's perspective: we want to allocate buffers, and then there are
times when we want them pinned because we're scanning out from them, but
that pinning isn't associated with rendering to/from them that requires
fencing as part of being involved in the command stream. And it avoided
the issue with the previous validate-based interface where, if you didn't
sync the flags with what the X Server had updated, you might mistakenly
pin/unpin a buffer.

Can you clarify the operation being done where you move scanout buffers
before unpinning them? That seems contradictory to me -- how are you
scanning out while the object is being moved, and how are you considering
it pinned during that time?

Potentially related to your concerns, Dave brought up an issue with the
current set_pin implementation. Take the framebuffer resize sequence I'd
proposed before:

Turn off CRTCs
Unpin old framebuffer
Allocate new framebuffer
Copy from old to new
Free old
Pin new
Turn it back on.

We'll have chosen an offset for new at "Copy from old to new", and so
during pin we'll just pin it right in place, rather than moving it
somewhere that avoids fragmentation. This sucks. It was a problem with the
old interface as well, but it should be fixable by replacing the current
poorly-suited validate call in the set_pin implementation with something
that chooses the best available offset, rather than just anything that
matches memory types, and moves the buffer into place before pinning.

> >> It would also allow us to specify where we want to pin buffers. If we
> >> remove the memory flag specification from drmBOCreate there's no
> >> other way to do that, except running the buffer through a superioctl
> >> which isn't very nice.

Yeah, not making bo_set_pin include a pin location was an oversight that I
didn't end up correcting before push.

> >> Also it would make it much easier to unbreak i915 zone rendering and
> >> derived work.
> >>
> >> If we can agree on this, I'll come up with a patch.
> >
> > Have you had a chance to look at this I can probably spend some time
> > on this to get the interface finalised..
> >
> > Dave.
>
> So, I did some quick work on this with the result in the
> drm-ttm-finalize branch. Basically what's done is to revert the setPin
> interface, and replace drmBOValidate with drmBOSetStatus. drmBOSetStatus
> is a single buffer interface with the same functionality as previously
> drmBOValidate but with the exception that it implies
> DRM_BO_HINT_DONT_FENCE, NO_MOVE buffers have been blocked pending a
> proper implementation, and optional tiling info has been added. The OP
> linked interface is gone, but the arguments are kept in drm.h for now,
> for driver specific IOCTLS.
> Looking at the buffer object create interface, I think it's a better
> idea to remove the need for locking and specify the memory flags rather
> than to assume a buffer in system memory before first validation.
> Consider a driver that wants to put a texture buffer in VRAM. It does
> createbuffer, copies the texture in, and then it gets validated as part
> of the superioctl. If we don't specify the VRAM flag at buffer creation,
> DRM will go ahead and create a ttm, allocate a lot of pages and then at
> validate do a copy and free all pages again. If we specify the VRAM
> flag, the buffer will on creation just reserve a vram area and point any
> mapped pages to it. Buffer creation and destruction in fixed memory
> areas was quite fast previously, since no cache flushes were needed, and
> I'd like to keep it that way. If we don't care or know how buffers are
> created, we just specify DRM_BO_FLAG_MEM_LOCAL, which will give the
> modified behavior currently present in master. But we need to remove the
> need for locking in the buffer manager.

I think we decided at XDS that, since you need to have the pages available
during evict, allocating them up front at object create and failing then
is a way better thing to be doing.

But yes, if we can handle the locking issues, being able to write directly
into VRAM (or even uncached TTM memory) on your first map of a buffer that
you're going to just hand off for rendering is nice. However, I think
putting this "could you place this here so we can go faster?" information
into create is the wrong place.

My example for this would be a screen capture app that's scraping the
screen, doing some accelerated operations on a sequence of buffers to
encode it to video, then at some point in the pipeline doing software
operations on some buffers because it can't be done in hardware any more.
At that point of mapping those objects you want to migrate to cached local
memory for your repeated sampling. If we only allow memory location
hinting* for mapping at object create, we can't fix this case, while if we
move location hinting to map, we can do both this and your VRAM case. I
believe GL also has some hints that applications can pass at object map
time that might be relevant to our buffer management, though I haven't
reviewed that in some time. And mesa's VBO-based software tnl could use it
for a nice speedup on i915.

* I don't care whether it's strict or just a hint.

> Also the unfenced list cleaning could probably be removed. It should be
> considered a bug to leave something on the unfenced list outside of the
> super-ioctl.

Absolutely.

--
Eric Anholt
anholt@FreeBSD.org er...@an... eri...@in...
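Eric's proposed fix (have set_pin choose the best available offset and move the buffer into place before pinning, instead of freezing it wherever the copy left it) could look roughly like this on a toy aperture. The data structures and the lowest-offset policy are invented for illustration, not the actual DRM code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy aperture: free extents tracked as parallel start/size arrays.
 * A pin that just freezes the buffer wherever the copy left it
 * fragments the aperture; this variant relocates the buffer to the
 * lowest free offset that fits before marking it NO_EVICT. */
struct aperture {
    size_t start[8], size[8];
    int nfree;
};

/* Return the chosen offset, or (size_t)-1 if nothing fits. */
static size_t pin_at_best(struct aperture *ap, size_t len)
{
    int best = -1;
    for (int i = 0; i < ap->nfree; i++)
        if (ap->size[i] >= len &&
            (best < 0 || ap->start[i] < ap->start[best]))
            best = i;
    if (best < 0)
        return (size_t)-1;
    size_t off = ap->start[best];  /* buffer is moved here, then pinned */
    ap->start[best] += len;
    ap->size[best]  -= len;
    return off;
}
```

The key difference from the validate call Eric criticizes is that the search is over offsets within the memory type, not just over memory types: a match on type alone would accept whatever offset the copy already chose.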
From: <th...@tu...> - 2007-10-18 08:24:15
Hi, Eric.

Eric Anholt wrote:
> On Wed, 2007-10-17 at 11:32 +0200, Thomas Hellström wrote:
>> Dave Airlie wrote:
>> [...]
>
> Yes, the unfenced list thing and the associated DRM_BO_HINTS required
> was part of the motivation for set_pin. But it also made sense from the
> X Server's perspective: we want to allocate buffers, and then there are
> times where we want them pinned because we're scanning out from them,
> but that pinning isn't associated with doing rendering to/from them that
> requires fencing as part of being involved in the command stream.

A good point that I missed. If we go for setStatus() we should remove all
fencing parameters, making it basically a setPin() with flags.

> And it avoided the issue with the previous validate-based interface
> where if you didn't sync the flags with what the X Server had updated,
> you might mistakenly pin/unpin a buffer.

This was actually a bug in the code, which is now fixed by only allowing
the creator of a shared buffer to change the buffer mask. Perhaps we
should change this to only allow the creator to change pinning flags.

> Can you clarify the operation being done where you move scanout buffers
> before unpinning them? That seems contradictory to me -- how are you
> scanning out while the object is being moved, and how are you
> considering it pinned during that time?

Actually it's very similar to Dave's issue, and the buffers aren't pinned
when they are thrown out. What we'd want to do is the following:

1) Turn off CRTCs.
2) Unpin the buffer.
3) Destroy the buffer, leaving its memory area free.
4) Create and pin a new buffer (skipping the copy).
5) Turn on CRTCs.

However, with DRI clients operating, 3) will fail. As they may maintain a
reference on the front buffer, the old buffer won't immediately get
destroyed, and its aperture / VRAM memory area isn't freed up unless it
gets evicted by a new allocation. This will in many cases lead to
fragmentation where it would really be possible to avoid it. The best
thing we can do at 3) is to move the buffer out, and then unreference it.
When the DRI client recognizes through the SAREA that there's a new front
buffer, it will immediately release its reference on the old one; but
basically, the old front buffer may be hanging around for quite some time
(paused dri clients...) and we don't want it to be present in the
aperture, even if it's evictable. This won't stop fragmentation in all
cases, but will certainly reduce it.

> Potentially related to your concerns, Dave brought up an issue with the
> current set_pin implementation. Take the framebuffer resize sequence I'd
> proposed before:
>
> Turn off CRTCs
> Unpin old framebuffer
> Allocate new framebuffer
> Copy from old to new
> Free old
> Pin new
> Turn it back on.
>
> We'll have chosen an offset for new at "Copy from old to new", and so
> during pin we'll just pin it right in place without moving it to avoid
> fragmentation. This sucks. It was a problem with the old interface as
> well, but it should be fixable by replacing the current poorly-suited
> validate call in the set_pin implementation with something that chooses
> the best available offset rather than just anything that matches memory
> types, and moves it into place before pinning.

Moving out the old one before allocating a new one will reduce this
problem but not eliminate it.
The low-level memory manager chooses the "best" region within a single
memory type, but we currently don't optimize between memory types. The
meaning of "best" is that it chooses the smallest free region where the
buffer will fit.

>>>> It would also allow us to specify where we want to pin buffers. If we
>>>> remove the memory flag specification from drmBOCreate there's no
>>>> other way to do that, except running the buffer through a superioctl
>>>> which isn't very nice.
>
> Yeah, not making bo_set_pin include a pin location was an oversight that
> I didn't end up correcting before push.
>
>> Looking at the buffer object create interface, I think it's a better
>> idea to remove the need for locking and specify the memory flags rather
>> than to assume a buffer in system memory before first validation.
>> [...]
>
> I think we decided at XDS that, since you need to have the pages
> available during evict, allocating them up front at object create and
> failing then is a way better thing to be doing.

Hmm, I missed that part. Shouldn't we be able to handle this case with a
proper accounting scheme? Having an, in many cases, unused backup pool of
pinned RAM seems a bit wasteful?

> But yes, if we can handle the locking issues, being able to write
> directly into VRAM (or even uncached TTM memory) on your first map of a
> buffer that you're going to just hand off for rendering is nice.
> However, I think putting this "Could you place this here so we can go
> faster?" information into create is the wrong place.
> [...]
> * I don't care whether it's strict or just a hint.

In any case, the buffer manager cannot guarantee that the buffer stays in
the hinted location for any duration of time, so it is just a hint, even
if the buffer manager will act on it immediately. Actually,
drmBOSetStatus() will cover these cases, and it can be called even while
the buffer is mapped -- which also means we can use it before the first
map of a VRAM buffer. But we do need to defer the ttm creation until the
first buffer map for local memory buffers.

So, to summarize:

* Yes, we can skip location hinting in drmBOCreate if we defer the ttm
  creation, to make use of the VRAM optimization.
* We can use drmBOSetStatus() to do location hinting, be it for map(), to
  avoid fragmentation, or just to get a move-speed benchmark out of
  "ttmtest".
* We can implement it also in drmBOMap() if we want to.

What's the general opinion?

/Thomas
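The smallest-free-region policy Thomas describes can be stated in a few lines. This sketches the selection policy only, under the assumption that free regions of one memory type are given as an array of sizes; it is not the actual DRM low-level memory manager.

```c
#include <assert.h>
#include <stddef.h>

/* Best-fit selection: among the free regions of one memory type, pick
 * the smallest one the buffer fits in, so large holes are preserved
 * for large allocations.  Returns the index of the chosen region,
 * or -1 if the buffer fits nowhere. */
static int best_fit(const size_t *region_size, int n, size_t len)
{
    int best = -1;
    for (int i = 0; i < n; i++)
        if (region_size[i] >= len &&
            (best < 0 || region_size[i] < region_size[best]))
            best = i;
    return best;
}
```

Best fit keeps fragmentation within a memory type down, but, as Thomas notes, it does nothing across memory types: that choice is made before the region search starts.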
From: Keith W. <ke...@tu...> - 2007-10-18 08:58:11
Thomas Hellström wrote:
> Hi, Eric.
>
> Eric Anholt wrote:
>> Can you clarify the operation being done where you move scanout buffers
>> before unpinning them? That seems contradictory to me -- how are you
>> scanning out while the object is being moved, and how are you
>> considering it pinned during that time?
>
> Actually it's very similar to Dave's issue, and the buffers aren't
> pinned when they are thrown out. What we'd want to do is the following:
>
> 1) Turn off CRTCs.
> 2) Unpin the buffer.
> 3) Destroy the buffer, leaving its memory area free.
> 4) Create and pin a new buffer (skipping the copy).
> 5) Turn on CRTCs.
>
> However, with DRI clients operating, 3) will fail. As they may maintain
> a reference on the front buffer, the old buffer won't immediately get
> destroyed and its aperture / VRAM memory area isn't freed up, unless it
> gets evicted by a new allocation.

Is there really a long-term need for DRI clients to maintain a reference
to the front buffer? We're moving away from this in a lot of ways, and if
it is a benefit to the TTM work, we could look at severing that tie sooner
rather than later...

> This will in many cases lead to fragmentation where it is really
> possible to avoid it. The best thing we can do at 3) is to move it out,
> and then unreference it. When the DRI client recognizes through the
> SAREA that there's a new front buffer, it will immediately release its
> reference on the old one, but basically, the old front buffer may be
> hanging around for quite some time (paused dri clients...) and we don't
> want it to be present in the aperture, even if it's evictable. This
> won't stop fragmentation in all cases, but will certainly reduce it.

At the very least, current DRI/ttm clients could be modified to only use
the frontbuffer reference in locked regions, and to have some way of
getting the correct handle for the current frontbuffer at that point.

Longer term, it's easy to imagine DRI clients not touching the front
buffer independently and not requiring a reference to that buffer...

Keith
From: Jerome G. <gl...@fr...> - 2007-10-18 09:17:01
Keith Whitwell wrote:
> Thomas Hellström wrote:
>> [...]
>> However, with DRI clients operating, 3) will fail. As they may maintain
>> a reference on the front buffer, the old buffer won't immediately get
>> destroyed and its aperture / VRAM memory area isn't freed up, unless it
>> gets evicted by a new allocation.
>
> Is there really a long-term need for DRI clients to maintain a reference
> to the front buffer? We're moving away from this in lot of ways and if
> it is a benefit to the TTM work, we could look at severing that tie
> sooner rather than later...
>
> At very least, current DRI/ttm clients could be modified to only use the
> frontbuffer reference in locked regions, and to have some way of getting
> the correct handle for the current frontbuffer at that point.
>
> Longer term, it's easy to imagine DRI clients not touching the front
> buffer independently and not requiring a reference to that buffer...

Don't Kristian's changes to the DRI interface (DRI2) already allow clients
not to care about the front buffer? I mean, if they all get a private back
buffer, then they render into it. But I might have misunderstood this.

Cheers,
Jerome Glisse
From: Keith P. <ke...@ke...> - 2007-10-17 23:57:52
|
On Wed, 2007-10-17 at 16:40 -0700, Eric Anholt wrote:
> Turn off CRTCs
> Unpin old framebuffer
> Allocate new framebuffer
> Copy from old to new

We needn't copy on resize, leaving us with: allocate new, unpin old, pin
new, free old. Seems like this would avoid some of the worst issues.

--
kei...@in... |
From: Keith W. <ke...@tu...> - 2007-10-18 09:21:03
|
Jerome Glisse wrote:
> Keith Whitwell wrote:
>> Thomas Hellström wrote:
>>
>> ...
>>
>> Longer term, it's easy to imagine DRI clients not touching the front
>> buffer independently and not requiring a reference to that buffer...
>
> Don't Kristian's changes to the DRI interface (DRI2) already allow
> clients not to care about the front buffer? I mean, if they all get a
> private back buffer, then they render into it. But I might have
> misunderstood this.

Yes, of course. I'm not sure how watertight the isolation of the
frontbuffer is in DRI2, but if it's complete and if the rule is that TTM
supports only DRI2 clients, this concern would seem to be addressed.

Keith |
From: <th...@tu...> - 2007-10-18 09:42:30
|
Keith Whitwell wrote:
> Jerome Glisse wrote:
>> Keith Whitwell wrote:
>>> Thomas Hellström wrote:
>>>
>>> ...
>>>
>>> Is there really a long-term need for DRI clients to maintain a
>>> reference to the front buffer? We're moving away from this in a lot
>>> of ways and if it is a benefit to the TTM work, we could look at
>>> severing that tie sooner rather than later...

While this is probably a good idea, the discussion is really whether we
should use a more general drmBOSetStatus() or a subset drmBOSetPin().
With drmBOSetStatus() we can easily implement a workaround for the
(perhaps soon disappearing) fragmentation problem, but there are a
couple of other uses as well.

/Thomas |
From: Eric A. <er...@an...> - 2007-10-18 19:57:12
|
On Thu, 2007-10-18 at 07:55 +0800, Keith Packard wrote:
> On Wed, 2007-10-17 at 16:40 -0700, Eric Anholt wrote:
>
>> Turn off CRTCs
>> Unpin old framebuffer
>> Allocate new framebuffer
>> Copy from old to new
>
> We needn't copy on resize, leaving us with allocate new, unpin old, pin
> new, free old. Seems like this would avoid some of the worst issues.

True. The issue would still exist if you happened to do an accelerated
clear of your new frontbuffer before pinning it, which we could avoid in
the driver. And even if you do everything right, in the case of
expanding your framebuffer I don't think there would be any guarantee of
getting put at the least-fragmenting place to have a pinned buffer.

I think the solution that I suggested is reasonable and deals with
Thomas's and our fragmentation concerns (don't use validate, just find
the space we would most like to be at not containing pinned objects,
evict whoever's there, and pin it).

Also, while I'm not a fan of the vague name of drmBOSetStatus, it sounds
like it would be usable to do the map hinting I suggested.

--
Eric Anholt
anholt@FreeBSD.org er...@an... eri...@in... |
From: <th...@tu...> - 2007-10-19 20:32:02
|
Hi.

Eric Anholt wrote:
> On Thu, 2007-10-18 at 07:55 +0800, Keith Packard wrote:
>
> ...
>
> I think the solution that I suggested is reasonable and deals with
> Thomas's and our fragmentation concerns (don't use validate, just find
> the space we would most like to be at not containing pinned objects,
> evict whoever's there, and pin it).

Hmm, I missed this message somehow. Anyway, this scheme seems to be the
best solution to avoid pinned-buffer fragmentation. However, it's not
straightforward to implement with the current code. One way would be to
decide on a target memory region, do a full memory-region clean on that
one (keeping only pinned buffers) and then let the default logic handle
the rest.

> Also, while I'm not a fan of the vague name of drmBOSetStatus, it
> sounds like it would be usable to do the map hinting I suggested.

It would. If we're going to change the name we'd better do it now. Any
suggestions?

/Thomas |
From: <kr...@bi...> - 2007-10-18 20:33:13
|
On 10/18/07, Keith Whitwell <ke...@tu...> wrote:
...
>> Don't Kristian's changes to the DRI interface (DRI2) already allow
>> clients not to care about the front buffer? I mean, if they all get a
>> private back buffer, then they render into it. But I might have
>> misunderstood this.
>
> Yes, of course. I'm not sure how watertight the isolation of the
> frontbuffer is in DRI2, but if it's complete and if the rule is that
> TTM supports only DRI2 clients, this concern would seem to be addressed.

What I'm planning to do is to keep the front-buffer bo handle in the
kernel, associated with the drm_drawable_t. The swap-buffer ioctl will
reference a drm_drawable_t as the front-buffer destination, and thus it
will always be up to date and userspace won't hold any references to
front buffers. The kernel holds a reference through each of the
drm_drawable_t's that are non-redirected child windows of the front
buffer, and once the X server has re-attached the new front-buffer bo to
each drm_drawable_t, there should be no references to the front buffer.
This happens in the SetWindowPixmap hook.

It sounds like we want to free the old front buffer before we allocate
the new one, in which case we'll have to call the SetWindowPixmap hook
with a null pixmap or something to drop the kernel-side references
before allocating the new buffer.

However, I was hoping we could avoid this, and instead allocate a new
buffer while still scanning out from the old one, copy the contents,
change the scan-out address and then free the old one. This avoids
flicker, and as for the fragmentation concern: can't we just alternate
between allocating from different ends of the aperture? Or,
alternatively, do a two-step operation: allocate a temporary front
buffer far enough into the aperture that we can free the old buffer, and
then allocate the final new front buffer at offset 0? We should avoid
turning off the CRTCs here if at all possible.

cheers,
Kristian |
From: Jerome G. <gl...@fr...> - 2007-10-19 01:12:03
|
Kristian Høgsberg wrote:
> On 10/18/07, Keith Whitwell <ke...@tu...> wrote:
>
> ...
>
> However, I was hoping we could avoid this, and allocate a new buffer
> while still scanning out from the old one, copy the contents, change
> the scan-out address and then free the old one. This avoids flicker,
> and as for the fragmentation concern: can't we just alternate between
> allocating from different ends of the aperture? Or, alternatively, do
> a two-step operation: allocate a temporary front buffer far enough
> into the aperture that we can free the old buffer, and then allocate
> the final new front buffer at offset 0? We should avoid turning off
> the CRTCs here if at all possible.

There is also the following (I don't think it was mentioned before in
this thread): a card with 8 MB of RAM (who the hell has such hardware?
:)), a scanout buffer which uses 4 MB, and the user resizing to a buffer
which needs 5 MB. Obviously we can't allocate it until we free the
previous one... Maybe we can accept some kind of allocate-over scheme,
or simply a resize function.

Cheers,
Jerome Glisse |
From: <th...@tu...> - 2007-10-19 06:06:18
|
Jerome Glisse wrote:
> Kristian Høgsberg wrote:
>
> ...
>
> There is also the following (I don't think it was mentioned before in
> this thread): a card with 8 MB of RAM (who the hell has such hardware?
> :)), a scanout buffer which uses 4 MB, and the user resizing to a
> buffer which needs 5 MB. Obviously we can't allocate it until we free
> the previous one... Maybe we can accept some kind of allocate-over
> scheme, or simply a resize function.
>
> Cheers,
> Jerome Glisse

Actually, any scheme mentioned handles this, as long as you disable the
CRTCs and unpin one buffer before pinning the other. The old buffer will
get evicted when the new one is allocated, but you still have CPU access
to it.

/Thomas |
From: Eric A. <er...@an...> - 2007-10-19 18:27:22
|
On Fri, 2007-10-19 at 03:04 +0200, Jerome Glisse wrote:
> Kristian Høgsberg wrote:
>
> ...
>
> There is also the following (I don't think it was mentioned before in
> this thread): a card with 8 MB of RAM (who the hell has such hardware?
> :)), a scanout buffer which uses 4 MB, and the user resizing to a
> buffer which needs 5 MB. Obviously we can't allocate it until we free
> the previous one... Maybe we can accept some kind of allocate-over
> scheme, or simply a resize function.

No, you can't validate or pin the new one until you unpin the old one.
Creation is fine, and validation is fine as long as the old one is
unpinned. It can still be referenced -- who cares? Are we really trying
to optimize out the migration of the old buffer to system memory when
it's about to be thrown away?

--
Eric Anholt
anholt@FreeBSD.org er...@an... eri...@in... |
From: Daniel K. <da...@en...> - 2007-10-19 04:00:16
|
On Fri, 2007-10-19 at 03:04 +0200, Jerome Glisse wrote:
> There is also the following (i don't think it was mentioned before
> in this thread):
> card with 8Mo of ram (who the hell have such hw ? :))

I've got 40 of them :(

All of our desktops have integrated Intel (i845g) chips, and the BIOS
has the option of stealing either 1 MB or 8 MB of system RAM. It's hard
to know whether to laugh or cry...

Dan |
From: Dave A. <ai...@li...> - 2007-10-19 04:04:06
|
>> card with 8Mo of ram (who the hell have such hw ? :))
>
> I've got 40 of them :(
>
> All of our desktops have integrated Intel (i845g) chips, and the BIOS
> has the option of stealing either 1 MB or 8 MB of system RAM. It's
> hard to know whether to laugh or cry...

They don't count; the driver dynamically allocates main RAM into the
aperture in this case :-) The initial BIOS allocation is just for text
mode to get up and going, and VGA-type things.

Dave. |