Eric Anholt wrote:
>On Wed, 2007-10-17 at 11:32 +0200, Thomas Hellström wrote:
>>Dave Airlie wrote:
>>>>DRM_BO_HINT_DONT_FENCE is implied, and use that instead of the set pin
>>>>interface. We can perhaps rename it to drmBOSetStatus or something more
>>>>This will get rid of the user-space unfenced list access (which I
>>>>believe was the main motivation behind the set pin interface?) while
>>>>keeping the currently heavily used (at least in Poulsbo) functionality
>>>>to move out NO_EVICT scanout buffers to local memory before unpinning
>>>>them, (to avoid VRAM and TT fragmentation, as DRI clients may still
>>>>reference those buffers, so they won't get destroyed before a new one is
>Yes, the unfenced list thing and the associated DRM_BO_HINTS required
>was part of the motivation for set_pin. But it also made sense from the
>X Server's perspective: we want to allocate buffers, and then there are
>times where we want them pinned because we're scanning out from them,
>but that pinning isn't associated with doing rendering to/from them that
>requires fencing as part of being involved in the command stream.
A good point that I missed. If we go for setStatus() we should remove
all fencing parameters, making it basically a setPin() with flags.
>it avoided the issue with the previous validate-based interface where if
>you didn't sync the flags with what the X Server had updated, you might
>mistakenly pin/unpin a buffer.
This was actually a bug in the code, which is now fixed by only allowing the
creator of a shared buffer to change the buffer mask. Perhaps we should
change this to only
allow the creator to change pinning flags.
>Can you clarify the operation being done where you move scanout buffers
>before unpinning them? That seems contradictory to me -- how are you
>scanning out while the object is being moved, and how are you
>considering it pinned during that time?
Actually it's very similar to Dave's issue, and the buffers aren't
pinned when they are thrown out. What we'd want to do is the following:
1) Turn of Crtcs.
2) Unpin the buffer
3) Destroy the buffer, leaving it's memory area free.
4) Create and pin new buffer (skipping the copy)
5) Turn on Crtcs.
However, with DRI clients operating, 3) will fail. As they may maintain
a reference on the front buffer, the old buffer won't immediately get
destroyed and it's aperture / VRAM memory area isn't freed up, unless it
gets evicted by a new allocation. This will in many cases lead to
fragmentation where it is really possible to avoid it. The best thing we
can do at 3) is to move it out, and then unreference it. When the DRI
client recognizes through the SAREA that there's a new front buffer, it
will immediately release its reference on the old one, but basically,
the old front buffer may be hanging around for quite some time (paused
dri clients...) and we don't want it to be present in the aperture, even
if it's evictable. This won't stop fragmentation in all cases, but will
certainly reduce it.
>Potentially related to your concerns, dave brought up an issue with the
>current set_pin implementation. Take the framebuffer resize sequence I'd
>Turn off CRTCs
>Unpin old framebuffer
>Allocate new framebuffer
>Copy from old to new
>Turn it back on.
>We'll have chosen an offset for new at "Copy old from new", and so
>during pin we'll just pin it right in place without moving it to avoid
>fragmentation. This sucks. It was a problem with the old interface as
>well, but it should be fixable by replacing the current poorly-suited
>validate call in the set_pin implementation with something that chooses
>the best available offset rather than just anything that matches memory
>types, and moves it into place before pinning.
Moving out the old one before allocating a new one will reduce this
problem but not eliminate it. The low level memory manager chooses the
"best" region within a single memory type, but we currently don't
optimize between memory types. The meaning of "best" is that it chooses
the smallest free region where the buffer will fit.
>>>>It would also allow us to specify where we want to pin buffers. If we
>>>>remove the memory flag specification from drmBOCreate there's no other
>>>>way to do that, except running the buffer through a superioctl which
>>>>isn't very nice.
>Yeah, not making bo_set_pin include a pin location was an oversight that
>I didn't end up correcting before push.
>>>>Also it would make it much easier to unbreak i915 zone rendering and
>>>>If we can agree on this, I'll come up with a patch.
>>>Have you had a chance to look at this I can probably spend some time on
>>>this to get the interface finalised..
>>So, I did some quick work on this with the result in the
>>Basically what's done is to revert the setPin interface, and replace
>>drmBOValidate with drmBOSetStatus.
>>drmBOSetStatus is a single buffer interface with the same functionality
>>as previously drmBOValidate but with the exception that
>>it implies DRM_BO_HINT_DONT_FENCE,
>>NO_MOVE buffers have been blocked pending a proper implementation, and
>>optional tiling info has been added.
>>The OP linked interface is gone, but the arguments are kept in drm.h for
>>now, for driver specific IOCTLS.
>>Looking at the buffer object create interface, I think it's a better
>>idea to remove the need for locking and specify the memory flags rather
>>than to assume a buffer in system memory before first validation.
>>Consider a driver that wants to put a texture buffer in VRAM.
>>It does createbuffer, copies the texture in, and then it gets validated
>>as part of the superioctl. If we don't specify the VRAM flag at buffer
>>creation, DRM will go ahead and create a ttm, allocate a lot of pages
>>and then at validate do a copy and free all pages again. If we specify
>>the VRAM flag, the buffer will on creation just reserve a vram area and
>>point any mapped pages to it.
>>Buffer creation and destruction in fixed memory areas was quite fast
>>previously, since no cache flushes were needed,
>>and I'd like to keep it that way. If we don't care or know how buffers
>>are created, we just specify DRM_BO_FLAG_MEM_LOCAL, which will give the
>>modified behavior currently present in master. But we need to remove
>>the need for locking in the buffer manager.
>I think we decided at XDS that, since you need to have the pages
>available during evict, allocating them up front at object create and
>failing then is a way better thing to be doing.
Hmm, I missed that part.
Shouldn't we be able to handle this case with a proper accounting
Having an, in many cases, unused backup pool of pinned RAM seems a bit
>But yes, if we can handle the locking issues, being able to write
>directly into VRAM (or even uncached TTM memory) on your first map of a
>buffer that you're going to just hand off for rendering is nice.
>However, I think putting this "Could you place this here so we can go
>faster?" information into create is the wrong place.
>My example for this would be a screen capture app that's scraping the
>screen, doing some accelerated operations on a sequence of buffers to
>encode it to video, then at some point in the pipeline doing software
>operations on some buffers because it can't be done in hardware any
>more. At that point of mapping those objects you want to migrate to
>cached local memory for your repeated sampling. If we only allow memory
>location hinting* for mapping at object create, we can't fix this case,
>while if we move location hinting to map, we can do both this and your
>VRAM case. I believe GL also has some hints that applications can pass
>at object map time that might be relevant to our buffer management,
>though I haven't reviewed that in some time. And mesa's VBO-based
>software tnl could use it for a nice speedup on i915.
>* I don't care whether it's strict or just a hint.
In any case, the buffer manager cannot guarantee that the buffer stays
hinted location for any duration of time, so it is just a hint, even if
the buffer manager will act on it immediately.
Actually drmBOSetStatus() will cover these cases, and it can be called
even when mapped,
but that actually also means we can use it before first map of a VRAM
buffer. But we do need to defer the ttm creation until the first buffer
map for local memory buffers.
So, to summarize:
* Yes, we can skip location hinting in drmBOCreate if we defer the
ttm creation, to make use of the VRAM optimization.
* We can use drmBOSetStatus() to do location hinting, be it for
map(), to avoid fragmentation or just to get a move speed
benchmark out of "ttmtest".
* We can implement it also in drmBOMap() if we want to.
What's the general opinion?