From: Ian R. <id...@us...> - 2003-01-17 01:33:53
|
What follows is the collected requirements for the new DRI memory manager. This list is the product of several discussions between Brian, Keith, Allen, and myself several months ago. After the list, I have included some of my thoughts on the big picture that I see from these requirements.

1. Single-copy textures

   Right now each texture exists in two or three places: a copy in on-card or AGP memory, a copy in system memory (managed by the driver), and a copy in application memory. Any solution should be able to eliminate one or two of those copies. If the driver-tracked copy in system memory is eliminated, care must be taken when the texture needs to be removed from on-card / AGP memory. Additionally, changes to the texture image made via glCopyTexImage must not be lost.

   It may be possible to eliminate one copy of the texture using APPLE_client_storage. A portion of this could be done purely in Mesa. If the user-supplied image matches the internal format of the texture, the driver can use the application's copy of the texture in place of the driver's copy. Modulo implementation difficulties, it may even be possible to use the pages that hold the texture as backing store for a portion of the AGP aperture. This is the only way to truly achieve single-copy textures. The implementation may prove too difficult on existing x86 systems to be worth the effort. This functionality is available in MacOS 10.1, so the same difficulties may not exist on Linux PPC.

2. Share texture memory among multiple OpenGL contexts

   Texture memory is currently shared by all OpenGL contexts. That is, when an OpenGL context switch happens it is not necessary to reload all textures. The texture manager needs to continue to use a paged memory model (as opposed to a segmented memory model).

3. Accommodate other OpenGL buffers

   The allocator should also be used for allocating vertex buffers, render targets (pbuffers, back-buffers, depth-buffers, etc.), and other buffers. This can be useful beyond supporting SGIX_pbuffer, ARB_vertex_array_objects, and optimized display lists. Dynamically allocating per-context depth and back-buffers will allow multiple Z depths to be used at a time (i.e., a 16-bit depth-buffer for one window and a 24-bit depth-buffer for another) as well as super-sampling FSAA.

4. Support texture pseudo-render targets

   Accelerating some OpenGL functions, such as glCopyTexImage, SGIS_generate_mipmaps, and ARB_render_texture, may require special support and consideration.

5. Additional AGP related issues

   There may be cases where textures need to be moved back and forth between AGP and on-card memory. For example, a texture might reside in AGP memory, and an operation may be requested that requires the texture to be in on-card memory.

6. Additional texture formats and layouts

   Compressed, 1D, 3D, cube map, and non-power-of-two textures need to be supported in addition to "traditional" 2D power-of-two textures.

7. Allen Akin's pinned-texture proposal

   If we ever expose memory management to the user (beyond texture priorities), we want to be sure our allocator is designed with this in mind.

8. Device independence

   As much as possible, the source code for the memory manager should live somewhere device independent. This is both for the benefit of newly developed drivers and for maintaining existing drivers.

* My Thoughts *

There are really only two radical departures from the existing memory manager. The first is using the memory manager for non-texture memory objects. The second, which is partially a result of the first, is the need to "pin" objects. It would not do to have one context kick another context's depth-buffer out of memory!

My initial thought on how to accomplish this was to move the allocator into the kernel. There would be a low-level allocator that could be used for non-texture buffers and a way to create textures (from data). In the texture case, the kernel would only allocate memory when a texture was used. Instead of using the actual texture address in drawing command streams, the user-level driver would insert texture IDs. The kernel would use these IDs to map to real texture addresses. The benefit is that all memory management would be handled by a single omniscient execution context (the kernel). The downside is that it would move a LOT of code into the kernel. It would be almost entirely OS and device independent, but there would likely be a lot of it.

After talking with Jeff Hartmann in IRC on 1/13, I started thinking about all of this again. Jeff had some serious reservations about moving that volume of code into the kernel, and he believed that all of the requirements could be met by a purely user-space implementation. After thinking about things some more, I'm starting to agree. What follows is a fairly random series of thoughts on how a user-space memory manager could be made to work.

I believe that everything could be done by breaking each memory space down into blocks (as is currently done) and tracking two values, either implicitly or explicitly, with each block. The first value is some sort of swap-out priority. This is currently implicitly tracked by the list ordering in the SAREA. The other value is basically a semaphore, but it could be implemented as a simple can-swap bit. Blocks that hold an active depth-buffer would never have can-swap set. Blocks that hold "normal" textures, back-buffers, render-target textures, and pbuffers would have their can-swap bit conditionally set. Each of these types of blocks would have the can-swap bit cleared under the following situations:

- Normal textures - While a rendering operation is queued that will use the texture.

- SGIS_generate_mipmaps textures - While the blits are in progress to create the filtered mipmaps.

- glCopyTexImage textures - While the blit to copy image data to the texture is in progress and while the data in the texture has not been copied to some sort of backing store.

- pbuffers - While rendering operations to the pbuffer are in progress. pbuffers have a mechanism to tell an application when the contents of the pbuffer have been "lost." This could be exploited by the memory manager. One caveat is when a pbuffer is bound to a texture (ARB_render_texture). While the pbuffer is bound to a texture, its contents cannot be lost. Can the contents be "swapped out" to some sort of backing store, like with glCopyTexImage targets?

- Back-buffers - In unextended GLX, back-buffers can never be swapped. However, if OML_sync_control is available, a "double buffered" visual may want to have many virtual back-buffers. Each time glXSwapBuffersMscOML (essentially an asynchronous glXSwapBuffers call) is made, a new back-buffer is allocated as the rendering target. Once a back-buffer is copied to the front-buffer (i.e., the queued buffer-swap completes), the back-buffer can be swapped out.

There may be other situations where can-swap is cleared, but that's all I could think of. Similar rules would exist for vertex buffers (for ARB_vertex_array_object, EXT_compiled_vertex_array, optimized display lists, etc.).

Only a single bit per block is needed in the SAREA. That bit is the union of the bits for each object that is part of that block. This union must be calculated by the user-space driver. This presents a possible problem of user-space clients failing to update the can-swap bits for some reason (process hung on a blocking IO call?). The current implementation avoids this problem by forcing all blocks to be swappable at all times.

At this point I'm left with a few questions.

1. In a scheme like this, how could processes be forced to update the can-swap bits on blocks that they own?

2. What is the best way for processes to be notified of events that could cause can-swap bits to change (i.e., rendering completion, asynchronous buffer-swap completion, etc.)? Signals from the kernel? Polling "age" variables?

3. If some sort of signal-based notification is used, could it be used to implement NV_fence and / or APPLE_fence?

4. How could the memory manager handle objects that span multiple blocks? In other words, could the memory manager be made to prefer to swap out blocks that wholly contain all of the objects that overlap the block? Are there other useful metrics? Prefer to swap out blocks that are half full over blocks that are completely full?

5. What other things have I missed that might prevent this system from working? :) |
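The two per-block values Ian describes (a swap-out priority plus a can-swap bit derived from every object overlapping the block) can be sketched in a few lines of C. Everything here — struct names, fields, the per-object flags — is a hypothetical illustration of the idea, not actual DRI code:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_OBJECTS_PER_BLOCK 8

struct mem_object {
    int in_flight;   /* a queued rendering op still uses this object */
    int is_pinned;   /* e.g. an active depth-buffer: never swappable */
};

struct mem_block {
    uint32_t swap_priority;   /* stands in for the SAREA list ordering */
    struct mem_object *objs[MAX_OBJECTS_PER_BLOCK];
    int nobjs;
};

/* The single SAREA bit per block is the combination of the per-object
 * states: the block is swappable only if every object that is part of
 * the block is itself swappable right now. */
static int block_can_swap(const struct mem_block *b)
{
    for (int i = 0; i < b->nobjs; i++)
        if (b->objs[i]->is_pinned || b->objs[i]->in_flight)
            return 0;
    return 1;
}
```

A block holding only an idle texture reports can-swap; adding a pinned depth-buffer to the same block clears it, which is exactly the "one context must not kick another context's depth-buffer out" property.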
From: magenta <ma...@tr...> - 2003-01-17 06:16:35
|
On Thu, Jan 16, 2003 at 05:33:42PM -0800, Ian Romanick wrote:
>
> 1. In a scheme like this, how could processes be forced to update the
>    can-swap bits on blocks that they own?

Should it even be possible for one process to swap out other processes' context data? Alternatively (forgive me if this sounds a bit naive), could the swapping be handled by agpgart, which just changes the memory mapping of the allocated pages in the background? Sort of an added VM layer, only it would swap to system memory (which could then be swapped to disk)...

> 2. What is the best way for processes to be notified of events that
>    could cause can-swap bits to change (i.e., rendering completion,
>    asynchronous buffer-swap completion, etc.)? Signals from the kernel?
>    Polling "age" variables?

I'd lean towards signals, myself, though then that leads to possible problems with libGL using a signal which an application wants to use... Or would it be capable of defining new signals? (I'm not up to speed on how that part of the kernel works. Would it just be as simple as adding a new value to an enumeration?)

> 4. How could the memory manager handle objects that span multiple
>    blocks? In other words, could the memory manager be made to prefer
>    to swap-out blocks that wholly contain all of the objects that
>    overlap the block? Are there other useful metrics? Prefer to
>    swap-out blocks that are half full over blocks that are completely
>    full?

If the AGP layer were to treat it like a VM layer and the page size were small (say, 4K), I don't think this would be an issue.

--
http://trikuare.cx |
From: Allen A. <ak...@po...> - 2003-01-17 07:14:51
|
On Thu, Jan 16, 2003 at 10:16:30PM -0800, magenta wrote:
|
| Should it even be possible for one process to swap out other processes'
| context data?

In the same way that one process can cause the ordinary memory pages of another process to be swapped out, I'd say "yes."

As the old saying goes, "Virtual memory is a technique that makes a lot of memory look like a lot of memory." :-) The same holds true for OpenGL context data (especially textures).

Allen |
From: magenta <ma...@tr...> - 2003-01-17 07:49:51
|
On Thu, Jan 16, 2003 at 11:03:21PM -0800, Allen Akin wrote:
> On Thu, Jan 16, 2003 at 10:16:30PM -0800, magenta wrote:
> |
> | Should it even be possible for one process to swap out other
> | processes' context data?
>
> In the same way that one process can cause the ordinary memory pages of
> another process to be swapped out, I'd say "yes."

I'd personally take the school of thought that if the user is running a game which takes up 60MB of texture memory and then tries to concurrently launch something which takes up another 60MB of texture memory, it's their own fault that the other thing can only get 20MB. :) But I do think that treating AGP as another layer of traditional memory-mapped VM in kernel-space (just having agpgart handle the memory mapping, perhaps alongside or on top of the kernel's VM) would be the most elegant solution.

Heh, I had another thought which seems perversely sick and wrong, yet oh so right: make video memory get treated as normal memory pages, and just migrate stuff into those pages when it's needed. Then when video memory isn't in use, the kernel could migrate other stuff into video RAM. Unified memory for all! :)

> As the old saying goes, "Virtual memory is a technique that makes a lot
> of memory look like a lot of memory." :-) The same holds true for
> OpenGL context data (especially textures).

I'll have to remember that one. :)

--
http://trikuare.cx |
From: Dieter <Die...@ha...> - 2003-01-17 16:32:22
|
On Friday, 17 January 2003 08:42, magenta wrote:
> On Thu, Jan 16, 2003 at 11:03:21PM -0800, Allen Akin wrote:
> > On Thu, Jan 16, 2003 at 10:16:30PM -0800, magenta wrote:
> > | Should it even be possible for one process to swap out other
> > | processes' context data?
> >
> > In the same way that one process can cause the ordinary memory pages
> > of another process to be swapped out, I'd say "yes."
[-]
> Heh, I had another thought which seems perversely sick and wrong, yet
> oh so right: make video memory get treated as normal memory pages, and
> just migrate stuff into those pages when it's needed. Then when video
> memory isn't in use, the kernel could migrate other stuff into video
> RAM. Unified memory for all! :)

Sorry, but I think this _is_ perversely sick and wrong ;-) Remember all that goes on in *BSD and Linux to squeeze the most performance out of "real" memory, e.g. memcpy and friends. Have a closer look here:

[CFT] faster athlon/duron memory copy implementation
http://marc.theaimsgroup.com/?l=linux-kernel&m=103548024914815&w=2

So if you have too little memory, buy some modules.

Regards, Dieter |
From: Allen A. <ak...@po...> - 2003-01-17 19:26:07
|
On Thu, Jan 16, 2003 at 11:42:31PM -0800, magenta wrote:
| I'd personally take the school of thought that if the user is running a
| game which takes up 60MB of texture memory and then tries to
| concurrently launch something which takes up another 60MB of texture
| memory, it's their own fault that the other thing can only get 20MB. :)

That's perfectly reasonable behavior on a game console, or on some special-purpose systems (like avionics). For a general-purpose desktop, it's nice to virtualize texture memory. That way everything continues to run (though it may be slow), just like with ordinary user processes.

OpenGL certainly needs to give apps more control over memory management. There have been some proposals and extensions for that in the past, and the GL2 working-group is planning new ones for the future.

Allen |
From: magenta <ma...@tr...> - 2003-01-17 21:13:05
|
On Fri, Jan 17, 2003 at 11:26:02AM -0800, Allen Akin wrote:
> On Thu, Jan 16, 2003 at 11:42:31PM -0800, magenta wrote:
> | I'd personally take the school of thought that if the user is running
> | a game which takes up 60MB of texture memory and then tries to
> | concurrently launch something which takes up another 60MB of texture
> | memory, it's their own fault that the other thing can only get 20MB. :)
>
> That's perfectly reasonable behavior on a game console, or on some
> special-purpose systems (like avionics). For a general-purpose desktop,
> it's nice to virtualize texture memory. That way everything continues
> to run (though it may be slow), just like with ordinary user processes.

Good point. I hadn't thought of the case of a GL-composited desktop environment like Jens had posted about, for example... My intent was that the applications would still *run*, they just wouldn't have much texture memory available. Though yeah, being able to page out other applications' video memory allocations would definitely be a good thing.

> OpenGL certainly needs to give apps more control over memory
> management. There have been some proposals and extensions for that in
> the past, and the GL2 working-group is planning new ones for the
> future.

--
http://trikuare.cx |
From: Ian R. <id...@us...> - 2003-01-17 15:01:14
|
magenta wrote:
> On Thu, Jan 16, 2003 at 05:33:42PM -0800, Ian Romanick wrote:
>
>> 1. In a scheme like this, how could processes be forced to update the
>>    can-swap bits on blocks that they own?
>
> Should it even be possible for one process to swap out other processes'
> context data? Alternatively (forgive me if this sounds a bit naive),
> could the swapping be handled by agpgart, which just changes the memory
> mapping of the allocated pages in the background? Sort of an added VM
> layer, only it would swap to system memory (which could then be swapped
> to disk)...

Changing which physical pages back the AGP mapping would help, but you have to remember that the memory manager also manages on-card memory. If back-buffers and depth-buffers are managed the same way, you could imagine that an application could use all of the on-card memory and prevent another context from being able to allocate a back-buffer.

>> 2. What is the best way for processes to be notified of events that
>>    could cause can-swap bits to change (i.e., rendering completion,
>>    asynchronous buffer-swap completion, etc.)? Signals from the
>>    kernel? Polling "age" variables?
>
> I'd lean towards signals, myself, though then that leads to possible
> problems with libGL using a signal which an application wants to use...
> Or would it be capable of defining new signals? (I'm not up to speed on
> how that part of the kernel works. Would it just be as simple as adding
> a new value to an enumeration?)

This is a problem that I ran into very quickly when I started thinking about adding support for asynchronous buffer-swaps. I think we'd have to do something with real-time signals, but my brain refuses to remember how all that works.

>> 4. How could the memory manager handle objects that span multiple
>>    blocks? In other words, could the memory manager be made to prefer
>>    to swap-out blocks that wholly contain all of the objects that
>>    overlap the block? Are there other useful metrics? Prefer to
>>    swap-out blocks that are half full over blocks that are completely
>>    full?
>
> If the AGP layer were to treat it like a VM layer and the page size were
> small (say, 4K) I don't think this would be an issue.

That may not be possible. Right now the blocks are tracked in the SAREA, and that puts an upper limit on the number of blocks available. On a 64MB memory region, the current memory manager ends up with 64KB blocks, IIRC. As memories get bigger (both on-card and AGP apertures), the blocks will get bigger. Also right now each block only requires 4 bytes in the SAREA. Any changes that would be made for a new memory manager would make each block require more space, thereby reducing the number of blocks that could fit in the SAREA.

Even if we increase the size of the SAREA, a system with 128MB of on-card memory and 128MB AGP aperture would require ~65000 blocks (if each block covered 4KB). |
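The figures above are easy to double-check with a line of arithmetic. A tiny helper (hypothetical, nothing DRI-specific) reproduces both the 1024 blocks of the current 64MB-region / 64KB-block scheme and the ~65000 blocks needed for 256MB of managed memory at 4KB granularity:

```c
#include <assert.h>

/* Number of blocks needed to cover mem_mb megabytes of managed memory
 * when each block covers block_kb kilobytes. */
static unsigned blocks_needed(unsigned mem_mb, unsigned block_kb)
{
    return (mem_mb * 1024u) / block_kb;
}
```

At 8 bytes of SAREA state per block (as in Jeff's later proposal), 65536 blocks would already need 512KB of shared area, which is why the block granularity can't simply shrink to the 4K page size.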
From: Jeff H. <jha...@ad...> - 2003-01-17 19:01:09
|
Ian, I've looked through your general proposal and it looks really good. Here are some implementation things I've been thinking about.

> That may not be possible. Right now the blocks are tracked in the
> SAREA, and that puts an upper limit on the number of blocks available.
> On a 64MB memory region, the current memory manager ends up with 64KB
> blocks, IIRC. As memories get bigger (both on-card and AGP apertures),
> the blocks will get bigger. Also right now each block only requires 4
> bytes in the SAREA. Any changes that would be made for a new memory
> manager would make each block require more space, thereby reducing the
> number of blocks that could fit in the SAREA.
>
> Even if we increase the size of the SAREA, a system with 128MB of
> on-card memory and 128MB AGP aperture would require ~65000 blocks (if
> each block covered 4KB).

Don't worry too much about this, we can create an entirely new SAREA to hold the memory manager. It can also be rather large; I'm thinking 128KB or so wouldn't be a problem at all. This will be non-swappable memory, but that's not too big a deal.

Here is what I'm thinking of as the general block format right now; it might not be perfect:

#define BLOCK_CAN_SWAP         (1<<0)
#define BLOCK_LINKS_TO_NEXT    (1<<1)
#define BLOCK_CAN_BE_CLOBBERED (1<<2)
#define BLOCK_IS_CACHABLE      (1<<3)

#define BLOCK_LOG2_USAGE_MASK  ((1<<4)|(1<<5)|(1<<6)|(1<<7))
#define BLOCK_LOG2_USAGE_SHIFT (4)
#define GET_BLOCK_LOG2_USAGE(status) \
        ((((status) & BLOCK_LOG2_USAGE_MASK) >> BLOCK_LOG2_USAGE_SHIFT) + 1)
#define PACK_BLOCK_LOG2_USAGE(log2) \
        ((((log2) - 1) << BLOCK_LOG2_USAGE_SHIFT) & BLOCK_LOG2_USAGE_MASK)

#define BLOCK_ID_SHIFT 8
#define BLOCK_ID_MASK  (((1<<20) - 1) << BLOCK_ID_SHIFT)   /* bits 27:8 */
#define PACK_BLOCK_ID(x) (((x) << BLOCK_ID_SHIFT) & BLOCK_ID_MASK)

struct memory_block {
        u32 age_variable;
        u32 status;
};

Where the age variable is device dependent, but I would imagine in most cases is a monotonically increasing unsigned 32-bit number. There needs to be a device driver function to check if an age has happened on the hardware.

The status variable has some room; only the bottom 28 bits are defined at the moment. The first 4 bits are status bits. If BLOCK_CAN_SWAP is set, we can swap this block; swapping requires the driver to call the kernel to swap out this block using some agp method where the contents are preserved. Can be accomplished by card DMA. If BLOCK_LINKS_TO_NEXT is set we are part of a group of blocks, which must be treated as a unit. If BLOCK_CAN_BE_CLOBBERED is set, the driver can just overwrite this block of memory. If BLOCK_IS_CACHABLE is set we can read back from this block in a fast way, so fallbacks can directly use this block.

The BLOCK_LOG2 stuff is a way to pack the usage of this block of memory in just a few bits. We pack log2 - 1, where we only accept usages of 2 bytes or more. Using 2 bytes could be considered empty. We can store block usage sizes of up to 64KB in this manner. I think that we want 64KB to be our maximum size for a block.

The bits 27:8 would be a 20-bit number representing a block id. Each one would be unique, so the driver could keep track of what blocks represent a texture. A 20-bit number should be sufficient, since that gives us about a million values to work with.

This is a pretty good start for a block format, I think. We want to make the memory management SAREA have a lock of its own; it shouldn't be a big deal to extend the drm to provide us with one. Or perhaps we use the normal device lock when we do any management; I haven't decided yet. There are some issues to really think about here.

This sort of implementation needs the kernel to be able to swap out a block from agp memory. The kernel should reserve a portion of the agp aperture for this purpose, probably on the order of 2-4 MB. Each allocation of the agp aperture should be no smaller than 1MB in size, to prevent agpgart from having to deal with too many blocks of memory. It will also have to be no smaller than the agp_page_shift, in case someone is using 4MB agp pages. The kernel will blit, with a card-specified function, the designated block from its current position to its final position in the block of agp memory to be swapped. When the ENTIRE block is full, the kernel will call agpgart to swap that region out of the agp aperture. The kernel will keep track of what each swapped-out block contains in some manner, or might brute-force scan the shared memory area containing the swapped-out blocks.

There will be a non-backed shared memory area that contains all the swapped-out pages; the "swapped pool" is probably a good name for it. Basically it's a shared memory area, of say 1MB in size, that doesn't have any pages backing it. It will have a kernel no-page function that populates it if needed. Basically it will only have information in it if things are swapped out of the aperture.

There needs to be a kernel function which moves a block of memory into cacheable space. We could do this with PCI DMA, or some magic conversion of unbound agp pages. This could be made safe, and wouldn't be a big deal with the new agpgart vm stuff. That way the block of agp memory could be accessed by a fallback or some other function that needs to directly read the texture. Readback from normal agp memory is horrible, something on the order of 60MB/sec.

Those are my implementation thoughts, pretty much a rehash of some of the things I wrote about the subject while at VA Linux. Feel free to poke holes through everything and make recommendations on design. I think this sort of direction should do what we need, but might need plenty of revision.

Cheers,
-Jeff |
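Jeff's status-word layout (4 status bits, 4 log2-usage bits, a 20-bit block id in bits 27:8) can be sanity-checked with a small round-trip test. This is just his encoding written out compilably; GET_BLOCK_ID is added here for symmetry and is not part of his proposal:

```c
#include <stdint.h>

typedef uint32_t u32;

#define BLOCK_CAN_SWAP         (1u << 0)
#define BLOCK_LINKS_TO_NEXT    (1u << 1)
#define BLOCK_CAN_BE_CLOBBERED (1u << 2)
#define BLOCK_IS_CACHABLE      (1u << 3)

/* Usage packed as log2(size) - 1, so 4 bits cover sizes 2 bytes..64KB. */
#define BLOCK_LOG2_USAGE_SHIFT 4
#define BLOCK_LOG2_USAGE_MASK  (0xfu << BLOCK_LOG2_USAGE_SHIFT)
#define GET_BLOCK_LOG2_USAGE(s) \
        ((((s) & BLOCK_LOG2_USAGE_MASK) >> BLOCK_LOG2_USAGE_SHIFT) + 1)
#define PACK_BLOCK_LOG2_USAGE(l) \
        ((u32)((l) - 1) << BLOCK_LOG2_USAGE_SHIFT)

/* 20-bit block id in bits 27:8. */
#define BLOCK_ID_SHIFT 8
#define BLOCK_ID_MASK  (0xfffffu << BLOCK_ID_SHIFT)
#define PACK_BLOCK_ID(x) (((u32)(x) << BLOCK_ID_SHIFT) & BLOCK_ID_MASK)
#define GET_BLOCK_ID(s)  (((s) & BLOCK_ID_MASK) >> BLOCK_ID_SHIFT)

struct memory_block {
    u32 age_variable;   /* device-dependent, monotonically increasing */
    u32 status;
};
```

The round trip confirms the fields don't overlap: a fully used 64KB block packs log2 usage 16 into the 4-bit field with room for the flags and id.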
From: Ian R. <id...@us...> - 2003-01-17 20:10:08
|
Jeff Hartmann wrote:
>> That may not be possible. Right now the blocks are tracked in the
>> SAREA, and that puts an upper limit on the number of blocks available.
>> On a 64MB memory region, the current memory manager ends up with 64KB
>> blocks, IIRC. As memories get bigger (both on-card and AGP apertures),
>> the blocks will get bigger. Also right now each block only requires 4
>> bytes in the SAREA. Any changes that would be made for a new memory
>> manager would make each block require more space, thereby reducing the
>> number of blocks that could fit in the SAREA.
>>
>> Even if we increase the size of the SAREA, a system with 128MB of
>> on-card memory and 128MB AGP aperture would require ~65000 blocks (if
>> each block covered 4KB).
>
> Don't worry too much about this, we can create an entirely new SAREA to
> hold the memory manager. It can also be rather large; I'm thinking
> 128KB or so wouldn't be a problem at all. This will be non-swappable
> memory, but that's not too big a deal. Here is what I'm thinking of as
> the general block format right now; it might not be perfect:

That works. It should also be possible to have it vary its size depending on the amount of memory to be managed.

[code segment snipped]

> struct memory_block {
>     u32 age_variable;
>     u32 status;
> };
>
> Where the age variable is device dependent, but I would imagine in most
> cases is a monotonically increasing unsigned 32-bit number. There needs
> to be a device driver function to check if an age has happened on the
> hardware.

I don't think having an age variable in the shared area is necessary or sufficient. That's what my original can-swap bit was all about. Each item that is in a block would have its own age variable / fence. When all of the age variable / fence conditions were satisfied, the can-swap bit would be set.

> The status variable has some room; only the bottom 28 bits are defined
> at the moment. The first 4 bits are some status bits. If BLOCK_CAN_SWAP
> is set, we can swap this block; swapping requires the driver to call
> the kernel to swap out this block using some agp method where the
> contents are preserved. Can be accomplished by card DMA. If
> BLOCK_LINKS_TO_NEXT is set we are part of a group of blocks, which must
> be treated as a unit. If BLOCK_CAN_BE_CLOBBERED is set, the driver can
> just overwrite this block of memory. If BLOCK_IS_CACHABLE is set we can
> read back from this block in a fast way, so fallbacks can directly use
> this block.

That's interesting. I hadn't considered having kernel intervention to actually page out blocks. I had always been under the assumption that all blocks in AGP or on-card memory were either locked or throw-away.

Just like with regular virtual memory, I think we only need to "page out" pages that we're going to use. I don't think we should need to page out an entire set of linked pages. Initially we may want to, though. It wouldn't help much with on-card memory, but with AGP memory (where we can change mappings), we should be able to do some tricks to avoid having to do full re-loads. It's also possible that only a subset of the blocks belonging to an object will have been modified. Perhaps what we really need to know for each block is:

1. Is the block modified (i.e., by glCopyTexImage)?
2. What pages in system memory back the block? That is, where are the parts of the texture in system memory that represent the block in AGP / on-card memory?

Hmm...starts to feel like a regular virtual memory system...

> The BLOCK_LOG2 stuff is a way to pack the usage of this block of memory
> in just a few bits. We pack log2 - 1, where we only accept usages of 2
> bytes or more. Using 2 bytes could be considered empty. We can store
> block usage sizes of up to 64KB in this manner. I think that we want
> 64KB to be our maximum size for a block.

That's probably finer granularity than we need. We could probably get away with "empty", "mostly empty", "half full", "mostly full", and "full". Admittedly, that only saves one bit, but it removes the 64KB limit.

One thing this is missing is some way to prioritize which blocks are to be swapped out. Right now the blocks are stored in an LRU linked list, but I don't think that (the explicit linked list) is necessarily the best way to go.

> The bits 27:8 would be a 20-bit number representing a block id. Each
> one would be unique, so the driver could keep track of what blocks
> represent a texture. A 20-bit number should be sufficient.
>
> This is a pretty good start for a block format I think. We want to
> make the memory management SAREA have a lock of its own; shouldn't be a
> big deal to extend the drm to provide us with one. Or perhaps we use
> the normal device lock when we do any management, I haven't decided
> yet. There are some issues to really think about here.
>
> This sort of implementation needs the kernel to be able to swap out a
> block from agp memory. The kernel should reserve a portion of the agp
> aperture for this purpose, probably on the order of 2-4 MB. Each
> allocation of the agp aperture should be no smaller than 1MB in size,
> to prevent agpgart from having to deal with too many blocks of memory.
> It will also have to be no smaller than the agp_page_shift, in case
> someone is using 4MB agp pages. The kernel will blit, with a
> card-specified function, the designated block from its current position
> to its final position in the block of agp memory to be swapped. When
> the ENTIRE block is full, the kernel will call agpgart to swap that
> region out of the agp aperture. The kernel will keep track of what
> each swapped-out block contains in some manner, or might brute-force
> scan the shared memory area containing the swapped-out blocks.

Okay. There's a few details of this that I'm not seeing. I'm sure they're there, I'm just not seeing them. Process A needs to allocate some blocks (or even just a single block) for a texture. It scans the list of blocks and finds that not enough free blocks are available. It performs some hocus-pocus and determines that a block "owned" by process B needs to be freed. That block has the BLOCK_CAN_SWAP bit set, but the BLOCK_CAN_BE_CLOBBERED bit is cleared. Process A asks the kernel to page the block out. Then what? How does process B find out that its block was stolen and page it back in?

> There will be a non-backed shared memory area that contains all the
> swapped-out pages; the "swapped pool" is probably a good name for it.
> Basically it's a shared memory area, of say 1MB in size, that doesn't
> have any pages backing it. It will have a kernel no-page function that
> populates it if needed. Basically it will only have information in it
> if things are swapped out of the aperture.
>
> There needs to be a kernel function which moves a block of memory into
> cacheable space. We could do this with PCI DMA, or some magic
> conversion of unbound agp pages. This could be made safe, and wouldn't
> be a big deal with the new agpgart vm stuff. That way the block of agp
> memory could be accessed by a fallback or some other function that
> needs to directly read the texture. Readback from normal agp memory is
> horrible, something on the order of 60MB/sec.

The conversion would probably be better. It would also play nice with ARB_vertex_array_objects.

Also, how does this all work without AGP? There still are a fair number of PCI cards out there. :)

A lot of this is also very Linux specific. What can we do to make as much of this as possible OS independent? I don't think our BSD friends will be very happy if we leave them in the cold. :) Linux is most people's first priority, but it's not the /only/ priority... |
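Ian's per-object fence refinement above might look something like the following sketch. All names are invented; the device-specific "has this age happened" driver hook is represented by a plain current-age parameter, and counter wraparound is ignored here for simplicity:

```c
#include <stdint.h>

#define MAX_OBJS 8

struct gl_object {
    uint32_t fence_age;   /* hardware age at which the last use completes */
};

struct block_state {
    struct gl_object *objs[MAX_OBJS];
    int nobjs;
    int can_swap;         /* the derived bit that goes in the shared area */
};

/* Recompute the shared-area can-swap bit from the per-object fences:
 * it is set only once every fence condition has been satisfied. */
static void block_update_can_swap(struct block_state *b, uint32_t current_age)
{
    b->can_swap = 1;
    for (int i = 0; i < b->nobjs; i++) {
        if (b->objs[i]->fence_age > current_age) {  /* fence not yet passed */
            b->can_swap = 0;
            return;
        }
    }
}
```

The point of deriving the bit rather than storing a single block age is that two objects in one block can have independent outstanding fences; the block becomes swappable only when the latest of them retires.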
From: magenta <ma...@tr...> - 2003-01-17 21:25:53
|
On Fri, Jan 17, 2003 at 12:09:58PM -0800, Ian Romanick wrote: > > > struct memory_block { > > u32 age_variable; > > u32 status; > > }; > > > > Where the age variable is device dependant, but I would imagine in most > > cases is a monotonically increasing unsigned 32-bit number. There needs to > > be a device driver function to check if an age has happened on the hardware. > > I don't think having an age variable in the shared area is necessary or > sufficient. That's what my original can-swap bit was all about. Each > item that is in a block would have its own age variable / fence. When > all of the age variable / fence conditions were satisfied, the can-swap > bit would be set. Also, using an age variable leads to lots of other really difficult issues, like what happens when it wraps around. A clock algorithm (as it was called in my undergraduate courses, anyway) for paging out would probably be better. > > The BLOCK_LOG2 stuff is > > a way to pack the usage of this block of memory in just a few bits. We pack > > log2 - 1, where we only accept usages of 2 bytes or more. Using 2 bytes > > could be considered empty. We can store upto block usage sizes of 64k in > > this manner. I think that we want 64kb to be our maximum size for a block. > > That's probably finer granularity than we need. We could probably get > away with "empty", "mostly empty", "half full", "mostly full", and > "full". Admittedly, that only saves one bit, but it removes the 64KB limit. > > One thing this is missing is some way to prioritize which blocks are to > be swapped out. Right now the blocks are stored in a LRU linked list, > but I don't think that's necessarilly the best way (the explicit linked > list) to go. ><snip> > A lot of this is also very Linux specific. What can we do to make as > much of this as possible OS independent? I don't think our BSD friends > will be very happy if we leave them in the cold. :) Linux is most > people's first priority, but it's not the /only/ priority... 
So having the kernel do it probably isn't the best way. :) -- http://trikuare.cx |
From: Ian R. <id...@us...> - 2003-01-18 01:11:48
|
magenta wrote: > On Fri, Jan 17, 2003 at 12:09:58PM -0800, Ian Romanick wrote: > >>>struct memory_block { >>> u32 age_variable; >>> u32 status; >>>}; >>> >>> Where the age variable is device dependent, but I would imagine in most >>>cases is a monotonically increasing unsigned 32-bit number. There needs to >>>be a device driver function to check if an age has happened on the hardware. >> >>I don't think having an age variable in the shared area is necessary or >>sufficient. That's what my original can-swap bit was all about. Each >>item that is in a block would have its own age variable / fence. When >>all of the age variable / fence conditions were satisfied, the can-swap >>bit would be set. > > > Also, using an age variable leads to lots of other really difficult issues, > like what happens when it wraps around. A clock algorithm (as it was > called in my undergraduate courses, anyway) for paging out would probably > be better. I think we're running into some terminology problems here. In the existing memory manager, the age is a "when was it last used" variable. In the new proposals, the age is a fence. There are still wrap-around issues. :( [snip] >>A lot of this is also very Linux specific. What can we do to make as >>much of this as possible OS independent? I don't think our BSD friends >>will be very happy if we leave them in the cold. :) Linux is most >>people's first priority, but it's not the /only/ priority... > > > So having the kernel do it probably isn't the best way. :) Putting some stuff in the kernel is fine as long as we don't rely on exotic, Linux-specific in-kernel interfaces. Putting too heavy a reliance on the new Linux AGPGART or on specifics of the Linux VM is likely to get us in trouble. It will also make it more difficult to port to other systems. This might be a good time to look at what some of our kernel issues are. I seem to remember a thread about porting the DRM to Solaris that ultimately led to despair. :( |
From: Jeff H. <jha...@ad...> - 2003-01-18 02:16:27
|
> -----Original Message----- > From: dri...@li... > [mailto:dri...@li...]On Behalf Of Ian Romanick > Sent: Friday, January 17, 2003 7:12 PM > To: DRI developer's list > Subject: Re: [Dri-devel] The next round of texture memory management... > > [snip] > > Also, using an age variable leads to lots of other really > difficult issues, > > like what happens when it wraps around. A clock algorithm (as it was > > called in my undergraduate courses, anyway) for paging out > would probably > > be better. > > I think we're running into some terminology problems here. In the > existing memory manager, the age is a "when was it last used" variable. > In the new proposals, the age is a fence. There are still wrap-around > issues. :( Actually if it is a straight monotonically increasing unsigned 32-bit counter we can do the signed comparison: (s32)current_age_counter - (s32)buffer_age < 0 Just like the code in the linux kernel that does signed compares to deal with timer wraps. [snip] > Putting some stuff in the kernel is fine as long as we don't rely on > exotic, Linux-specific in-kernel interfaces. Putting too heavy a > reliance on the new Linux AGPGART or on specifics of the Linux VM is > likely to get us in trouble. > > It will also make it more difficult to port to other systems. This > might be a good time to look at what some of our kernel issues are. I > seem to remember a thread about porting the DRM to Solaris that > ultimately led to despair. :( It is always a balance, putting things in the kernel and userspace. I'm fairly confident we will need some kernel support for this project, if we want to achieve our goal. I think we will try and keep the requirements on the kernel well-defined though, and not too exotic. Perhaps, too, we will allow some/most of the benefits of the code to run on some operating systems, while we get full usage only on systems that support certain features we need. 
As the system gets designed, I suppose we will just have to try and keep these issues in mind. -Jeff |
From: magenta <ma...@tr...> - 2003-01-18 02:49:25
|
On Fri, Jan 17, 2003 at 08:13:05PM -0600, Jeff Hartmann wrote: > > > > -----Original Message----- > > From: dri...@li... > > [mailto:dri...@li...]On Behalf Of Ian Romanick > > Sent: Friday, January 17, 2003 7:12 PM > > To: DRI developer's list > > Subject: Re: [Dri-devel] The next round of texture memory management... > > > > > [snip] > > > > Also, using an age variable leads to lots of other really > > difficult issues, > > > like what happens when it wraps around. A clock algorithm (as it was > > > called in my undergraduate courses, anyway) for paging out > > would probably > > > be better. > > > > I think we're running into some terminology problems here. In the > > existing memory manager, the age is a "when was it last used" variable. > > In the new proposals, the age is a fence. There are still wrap-around > > issues. :( > > Actually if it is a straight monotonically increasing unsigned 32-bit > counter we can do the signed comparision: > (s32)current_age_counter - (s32)buffer_age < 0 Assuming 'current age counter' is a time value for 'expire everything older than this,' then yes, that works, as long as you have fewer than 2^32 objects in the memory pool (which I don't see as a problem in the forseeable future). :) I thought that the age counter was just going to be something for finding a minimum value for LRU. But if you use a wraparound counter based on, say, the number of events which have occurred, then you'd might as well just use the clock algorithm instead, and require only one bit for the 'purge_okay' flag. It'll have the same results. > Just like the code in the linux kernel that does signed compares to deal > with timer wraps. As long as you're only comparing timer events which have happened within the past 2^32 clock ticks, sure. > [snip] > > Putting some stuff in the kernel is fine as long as we don't rely on > > exotic, Linux-specific in-kernel interfaces. 
Putting too heavy of a > > reliance on the new Linux AGPGART or on specifics of the Linux VM are > > likely to get us in trouble. > > > > It will also make it more difficult to port to other systems. This > > might be a good time to look at what some of our kernel issues are. I > > seem to remember a thread about porting the DRM to Solaris that > > ultimately led to despair. :( > > It is always a balance, putting things in the kernel and userspace. I'm > fairly confident we will need some kernel support for this project, if we > want to acheive our goal. I think we will try and keep the requirements on > the kernel well defined though, and not too exotic. Perhaps too we will > allow some/most of the benefits of the code to run on some operating > systems, while we get full usage only on systems that support certain > features we need. As the system gets designed I suppose we will just have > to try and keep these issues in mind I guess. I'm a big fan of abstraction layers, myself... why not define an abstraction layer which provides the various functionality needed by the memory manager, and then put OS-specific stuff into the implementation? Surely the model won't be so different between any two OSes that it can't be boiled down to a single common set of higher-level functions... Or is that me being too naive again? (I'm a graphics programmer, not a kernel programmer, and I know just enough about systems level stuff to be dangerous. :) -- http://trikuare.cx |
From: Ian R. <id...@us...> - 2003-01-20 17:24:07
|
magenta wrote: > On Fri, Jan 17, 2003 at 08:13:05PM -0600, Jeff Hartmann wrote: >>>I think we're running into some terminology problems here. In the >>>existing memory manager, the age is a "when was it last used" variable. >>> In the new proposals, the age is a fence. There are still wrap-around >>>issues. :( >> >>Actually if it is a straight monotonically increasing unsigned 32-bit >>counter we can do the signed comparison: >>(s32)current_age_counter - (s32)buffer_age < 0 > > Assuming 'current age counter' is a time value for 'expire everything older > than this,' then yes, that works, as long as you have fewer than 2^32 > objects in the memory pool (which I don't see as a problem in the > foreseeable future). :) I thought that the age counter was just going to be > something for finding a minimum value for LRU. But if you use a wraparound > counter based on, say, the number of events which have occurred, then you > might as well just use the clock algorithm instead, and require only one > bit for the 'purge_okay' flag. It'll have the same results. There is one subtle, but important, difference. The can-swap bit needs to be updated when the fence is completed. If the fence value is stored in the block when the fence is set, no further update is required. This was one of the problems in my original post. If only a can-swap bit is used, how do you prevent clients from forgetting to set the bit when it can be set? I'm still not 100% convinced that it is the best solution, but it is a solution. :) [snip] >>It is always a balance, putting things in the kernel and userspace. I'm >>fairly confident we will need some kernel support for this project, if we >>want to achieve our goal. I think we will try and keep the requirements on >>the kernel well-defined though, and not too exotic. Perhaps, too, we will >>allow some/most of the benefits of the code to run on some operating >>systems, while we get full usage only on systems that support certain >>features we need. 
As the system gets designed I suppose we will just have >>to try and keep these issues in mind I guess. > > I'm a big fan of abstraction layers, myself... why not define an > abstraction layer which provides the various functionality needed by the > memory manager, and then put OS-specific stuff into the implementation? > Surely the model won't be so different between any two OSes that it can't > be boiled down to a single common set of higher-level functions... Or is > that me being too naive again? (I'm a graphics programmer, not a kernel > programmer, and I know just enough about systems level stuff to be > dangerous. :) This is the way most of the DRM currently works. I suspect that it will be possible to continue doing so. |
From: Jeff H. <jha...@ad...> - 2003-01-17 23:35:15
|
> -----Original Message----- > From: dri...@li... > [mailto:dri...@li...]On Behalf Of Ian Romanick > Sent: Friday, January 17, 2003 2:10 PM > To: DRI developer's list > Subject: Re: [Dri-devel] The next round of texture memory management... > > > Jeff Hartmann wrote: > > >>That may not be possible. Right now the blocks are tracked in the > >>SAREA, and that puts an upper limit on the number of block available. > >>On a 64MB memory region, the current memory manager ends up with 64KB > >>blocks, IIRC. As memories get bigger (both on-card and AGP apertures), > >>the blocks will get bigger. Also right now each block only requires 4 > >>bytes in the SAREA. Any changes that would be made for a new memory > >>manager would make each block require more space, thereby reducing the > >>number of blocks that could fit in the SAREA. > >> > >>Even if we increase the size of the SAREA, a system with 128MB of > >>on-card memory and 128MB AGP aperture would require ~65000 blocks (if > >>each block covered 4KB). > > > > Don't worry too much about this, we can create an entirely > new SAREA to > > hold the memory manager. It can also be rather large, I'm > thinking about > > 128KB or so wouldn't be a problem at all. This will be non swappable > > memory, but thats not too big a deal. Here is what I'm > thinking of as the > > general block format right now, it might not be perfect: > > That works. It should also be possible to have it vary its size > depending on the amount of memory to be managed. Yeah that shouldn't be too difficult to accomplish. > > [code segment snipped] > > > struct memory_block { > > u32 age_variable; > > u32 status; > > }; > > > > Where the age variable is device dependant, but I would > imagine in most > > cases is a monotonically increasing unsigned 32-bit number. > There needs to > > be a device driver function to check if an age has happened on > the hardware. > > I don't think having an age variable in the shared area is necessary or > sufficient. 
That's what my original can-swap bit was all about. Each > item that is in a block would have its own age variable / fence. When > all of the age variable / fence conditions were satisfied, the can-swap > bit would be set. Actually I think it is the best way: all you do is put the "greatest" or "latest" age variable in the block description. That way we are only done when the last thing is fenced. Makes swap decisions a HELL of a lot easier; that way we don't have to have any nasty signal code and age lists all over the place. > > > The status variable has some room, only the bottom 28-bits > are defined at > > the moment. The first 4 bits are some status bits. If > BLOCK_CAN_SWAP is > > set, we can swap this block, swapping requires the driver to > call the kernel > > to swap out this block using some agp method where the contents are > > preserved. Can be accomplished by card DMA. If > BLOCK_LINKS_TO_NEXT is set > > we are part of a group of blocks, which must be treated as a unit. If > > BLOCK_CAN_BE_CLOBBERED is set, the driver can just overwrite > this block of > > memory. If BLOCK_IS_CACHABLE is set we can readback from this > block in a > > fast way, so fallbacks can directly use this block. > > That's interesting. I hadn't considered having kernel intervention to > actually page out blocks. I had always been on the assumption that all > blocks in AGP or on-card memory were either locked or throw-away. Yeah, that's a big important thing here: having some of the operations happen in the kernel allows you to do some really nice things. My main concern is having the logic outside of the kernel; the kernel does some things better than anyone else, and can do things other people can't. As long as the kernel doesn't have to make the decisions and keep around enough information to make the proper decisions I'm happy with the implementation. > > Just like with regular virtual memory, I think we only need to "page > out" pages that we're going to use. 
I don't think we should need to > page out an entire set of linked pages. Initially we may want to, > though. It wouldn't help much with on-card memory, but with AGP memory > (where we can change mappings), we should be able to do some tricks to > avoid having to do full re-loads. It's also possible that only a subset > of the blocks belonging to an object will have been modified. > > Perhaps what we really need to know for each block is: > > 1. Is the block modified (i.e., by glCopyTexImage)? > 2. What pages in system memory back the block? That is, where are the > parts of the texture in system memory that represent the block in AGP / > on-card memory? Now this is too much information I think. We may want to store which agp key references blocks in some sort of separate way, but I don't know how useful that information would be... I have to do some thinking here. Here is my thoughts about things: 1. We are a particular page inside a particular address space. We only know we are page #n in that address space. We don't care about anything else, our page number is our offset. We would have a card pool and an agp pool. We also have a swapped out pool, but things here probably can't be directly accessed. We need kernel intervention to allow us to access these swapped out things. If the address space is segmented into several mappings we need an address mapping function in the client side 3D driver. Not terrible difficult and we don't have to store too much information. 2. We consider the block or group of blocks as an entire "unit", everything is done on units, not individual pieces of the blocks. That prevents people swapping out the first page of a group of textures and someone having to wait for just that block to come back. 3. Only large agp allocations are swapped out at one time. Little blocks are blitted into a 1MB region, when it is full and the blits are committed, we can decide to swap them out of the agp aperture. 
This avoids lots of small pages being swapped out thrashing with lots of agp gatt table accesses and potentially causing nasty things like cpu cache flushes too much. 4. Implementations without agp will require at least PCI DMA, or a slow function that copies over the bus. They will have only one pool, and will be considered "swapped" when they aren't in the card pool. If the card supports some sort of PCI-GART we could treat it similarly to agp memory. 5. It might be useful to know some metrics about what kind of memory we are, backbuffer, texture, etc. I'm not sure if we really need to know this information, but it could be useful. > > Hmm...starts to fell like a regular virtual memory system... Hehe, thats about the long and the short of it... > > > The BLOCK_LOG2 stuff is > > a way to pack the usage of this block of memory in just a few > bits. We pack > > log2 - 1, where we only accept usages of 2 bytes or more. Using 2 bytes > > could be considered empty. We can store upto block usage sizes > of 64k in > > this manner. I think that we want 64kb to be our maximum size > for a block. > > That's probably finer granularity than we need. We could probably get > away with "empty", "mostly empty", "half full", "mostly full", and > "full". Admittedly, that only saves one bit, but it removes the > 64KB limit. Sounds okay. > > One thing this is missing is some way to prioritize which blocks are to > be swapped out. Right now the blocks are stored in a LRU linked list, > but I don't think that's necessarilly the best way (the explicit linked > list) to go. Selection might happen LRU or not. The reason for making the age variable public though is so we could perhaps weigh using it. We could also weigh decisions on memory type if we encode that information. Keeping memory type information around also allows us to make private->shared backbuffer / depthbuffer decisions easier and without as much or perhaps any client intervention. 
There needs to be a selection of which pages to grab next if going in a linear fashion in a client's address space fails. Perhaps it jumps by a preset limit (the normal address space each client carves out for itself) and tries again. Perhaps if something like that fails we fall back to linear age-based scanning. Perhaps we keep some sort of freelist based on regions. We can encode a region of 256 megs by page offset and number of 4k pages in a single 32-bit number. Change the page size a little bit and the requirement of bits becomes smaller. I'm still thinking at this point and don't have the perfect data structure and logic worked out just yet for the freelist/usedlist part of the memory manager. I suppose, though, that's what these technical emails are for. Originally at VA I thought to just use something like the Utah-GLX memory manager for the freelist, or perhaps Keith's block memory manager, but just to extend it. I'm not so sure that this is the proper solution though. I guess writing down some of the attributes we want is in order, and trying to think through the problem: 1. We want a method to find out if our pages were messed with, and this should be fast and trivial. Hashing into a bit vector based on id seems like a good thing here. I describe this in detail later in the email when I answer one of your questions. 2. We want a fast method to find a free region if any exist. A free region should be randomly selected or selected with a weight towards the page(s) being close to an address range we specify. A queue, stack, or list comes to mind here. Lists have such poor performance sometimes though.... Should the lists be stored inside the data blocks? I don't think so, but it might be an implementation that makes sense. I tend to think the "allocation" structures could/should be separate from the pagelist. Here could be some possible freelist implementations: a. Each page is a bit in a bit vector. A set bit means the memory is in use. 
Find the first zero bit and look for zero bits of a certain number afterwards for a particular sized region. Would have really good performance in the normal case where we are not over committed, and could be made to index to a particular address space easily. There are drawbacks here though, we could potentially use alot of memory with all the bit vectors in our memory manager. Also this doesn't help us too much in the over committed case, we would need something to run through the pagelist and swap out by age in a linear fashion when things get over committed. Or perhaps in a random fashion, dunno. This could happen in a kernel thread or in the Xserver at regular intervals though, so we always attempt to have some room in the freelist. b. We go with a list, queue, or stack. 6 bytes for 256 megs / 4kb pages for a single link, or 8 bytes for 256 megs / 4 kb pages for a double link. We make the head of the list at initialization point to the whole region and we split much like the Utah implementation did. Could be LRU or MRU. Might be faster then previous method unless we want to weigh by address space, then it might get more complicated. Unfortunately this is only kept sorted on age, not where we are in the address space. It could have poor selection of free region performance and things might tend to be grouped and have some bad behavior. Perhaps we keep two separate lists, the allocated list and the free list. c. Something a little more exotic might have better performance. Perhaps keeping a binary tree as a front end to the region lists, that way allowing us to select quickly based on address space. Perhaps slice up the address space into big (say 4 MB) regions and have a list for each region. Perhaps hashing based on region size. I suppose the possibilities are endless here. While something like these ideas might work, I usually go back to the drawing board when I end up with too exotic a solution. Simple and elegant tends to work best in most situations. 3. 
We want an easy method to grow the memory backing an agp pool, but also some sort of per client restriction, perhaps just the system wide restrictions will do? This should be solved by the agp extension proposal I made earlier in the week. 4. We want a simple way to determine if an age allows us to do something to a texture which has the BLOCK_CAN_BE_CLOBBERED bit set, storing the last age used on a block is all that should be required I think. > > > The bits 27:8 would be a 20-bit number representing a block > id. Each one > > would be unique, so the driver could keep track of what blocks > represent a > > texture. A 20-bit number should be sufficient, since that gives > us like 2 > > million values to work with. > > > > This is a pretty good start for a block format I think. We > want to make > > the memory management SAREA have a lock of its own, shouldn't > be a big deal > > to extend the drm to provide us with one. Or perhaps we use the normal > > device lock when we do any management, I haven't decided yet. There are > > some issues to really think about here. > > > > This sort of implementation needs the kernel to be able to > swap out a block > > from agp memory. The kernel should reserve a portion of the > agp aperture > > for this purpose. Probably on the order of 2-4 MB. Each > allocation of the > > agp aperture should be no smaller then 1MB in size, to prevent > agpgart from > > having to deal with too many blocks of memory. It will also > have to be no > > smaller then the agp_page_shift, in case someone is using 4MB agp pages. > > The kernel will blit with a card specified function the designated block > > from its current position to its final position in the block of > agp memory > > to be swapped. When the ENTIRE block is full, then the kernel will call > > agpgart to swap that region out of the agp aperture. 
The > kernel will keep > track of what each swapped out block contains in some manner, > or might > brute-force scan the shared memory area containing the swapped out blocks. > > Okay. There are a few details of this that I'm not seeing. I'm sure > they're there, I'm just not seeing them. > > Process A needs to allocate some blocks (or even just a single block) > for a texture. It scans the list of blocks and finds that not enough > free blocks are available. It performs some hocus-pocus and determines > that a block "owned" by process B needs to be freed. That block has the > BLOCK_CAN_SWAP bit set, but the BLOCK_CAN_BE_CLOBBERED bit is cleared. > > Process A asks the kernel to page the block out. Then what? How does > process B find out that its block was stolen and page it back in? Okay, here is how I think things could happen: I want to page the block out, so I ask the kernel to return when this list of pages I give it has been swapped out and is available. If the kernel can immediately process this request, it does so and returns; if it has to do some dma, it puts the client on a wait queue to be woken up when it happens. The kernel goes ahead and updates the blocks in the SAREA, saying that they aren't there (marking their ids as zero, perhaps). Process B comes along and sees its textures aren't resident and needs them; it asks the kernel to make them resident somewhere, it doesn't care where. It passes some ids to the kernel and asks the kernel to make them resident. The kernel puts the process on a waitqueue or returns in a similar fashion to the first request. Whenever we get the lock with contention we must do some sort of quick scanning. We might want to speed up the process somehow, perhaps some sort of hashing by texture number to a dirty flag. Actually that is probably the best implementation. If we reserve 64k of address space to be our dirty flags (backed only when accessed) we can make the dirty flags a bit vector. 
Considering the texture or block id as an index into this vector we can rapidly find out if our list of textures has been "fooled" with. This prevents us from scanning the entire list, which could be slow. I should also point out that the id's will be reused. We will always attempt to use the smallest id available for use. This way using it as an index into a shared memory area isn't so bad. That way we avoid using lots of memory for nothing when we only have a few texture blocks. > > > There will be a non backed shared memory area that contains > all the swapped > > out pages, the swapped pool it probably a good thing to call > it. Basically > > its a shared memory area, of say 1MB in size that doesn't have any pages > > backing it. It will have a kernel no page function that populates it if > > needed. Basically it will only have information in it if > things are swapped > > out of the aperture. > > > > There needs to be a kernel function which moves a block of > memory into > > cacheable space. We could do with with PCI dma, or some magic > conversion of > > unbound agp pages. This could be made safe, and wouldn't be a > big deal with > > the new agpgart vm stuff. That way the block of agp memory could be > > accessed by a fallback or some other function that needs to > directly read > > the texture. Readback from normal agp memory is horrible, > something on the > > order of 60MB/sec. > > The conversion would probably be better. It would also play nice with > ARB_vertex_array_objects. Also I should point out, on some systems we have the nice ability to have cached agp memory. On these systems we need no conversion, or perhaps just moving the texture into a cachable memory block. On these systems it might even make sense to have all textures marked cachable, but that will take some experimentation. > > Also, how does this all work without AGP? There still are a fair number > of PCI cards out there. :) > > A lot of this is also very Linux specific. 
What can we do to make as > much of this as possible OS independent? I don't think our BSD friends > will be very happy if we leave them in the cold. :) Linux is most > people's first priority, but it's not the /only/ priority... While it is Linux specific, the modifications and improvements I make to agpgart to make this happen can be ported. I don't think it will require too much more then that, the functions that will plug into the kernel could all be portable much like the rest of the driver code is currently. Some nice additions to agpgart are all that is required to make this possible I think. As for using pci dma or simple copying of card memory to pci memory that would probably be directly portable without any or little effort. -Jeff |
From: Ian R. <id...@us...> - 2003-01-18 01:05:09
|
Jeff Hartmann wrote: >>>struct memory_block { >>> u32 age_variable; >>> u32 status; >>>}; >>> >>> Where the age variable is device dependant, but I would imagine in most >>>cases is a monotonically increasing unsigned 32-bit number. There needs to >>>be a device driver function to check if an age has happened on the hardware. >> >>I don't think having an age variable in the shared area is necessary or >>sufficient. That's what my original can-swap bit was all about. Each >>item that is in a block would have its own age variable / fence. When >>all of the age variable / fence conditions were satisfied, the can-swap >>bit would be set. > > Actually I think it is the best way, all you do is put the "greatest" or > "latest" age variable in the block description. That way only when we are > only done when the last thing is fenced. Makes swap decisions a HELL of > alot easier, that way we don't have to have any nasty signal code and age > lists all over the place. The potential problem is there are somethings that can't be tracked by a simple "age." The one thing I can think of is back-buffers. An application might have several buffer-swap operations that are blocked waiting for a certain vertical blank number. There could be other rendering operations that are sent after the buffer-swap that will complete BEFORE the blit for the buffer-swap is queued. I can't see a reasonable way to assign an age to those back-buffers. Since this is the only case I can think of, there may be a different way to handle it. [snip] >>Just like with regular virtual memory, I think we only need to "page >>out" pages that we're going to use. I don't think we should need to >>page out an entire set of linked pages. Initially we may want to, >>though. It wouldn't help much with on-card memory, but with AGP memory >>(where we can change mappings), we should be able to do some tricks to >>avoid having to do full re-loads. 
It's also possible that only a subset >>of the blocks belonging to an object will have been modified. >> >>Perhaps what we really need to know for each block is: >> >>1. Is the block modified (i.e., by glCopyTexImage)? >>2. What pages in system memory back the block? That is, where are the >>parts of the texture in system memory that represent the block in AGP / >>on-card memory? > > > Now this is too much information I think. We may want to store which agp > key references blocks in some sort of separate way, but I don't know how > useful that information would be... I have to do some thinking here. Here > are my thoughts about things: > > 1. We are a particular page inside a particular address space. We only know > we are page #n in that address space. We don't care about anything else, > our page number is our offset. We would have a card pool and an agp pool. > We also have a swapped out pool, but things here probably can't be directly > accessed. We need kernel intervention to allow us to access these swapped > out things. If the address space is segmented into several mappings we need > an address mapping function in the client side 3D driver. Not terribly > difficult and we don't have to store too much information. > > 2. We consider the block or group of blocks as an entire "unit", everything > is done on units, not individual pieces of the blocks. That prevents people > swapping out the first page of a group of textures and someone having to > wait for just that block to come back. I believe that the block should be the unit used. If each block has a group ID (the IDs that you talk about below) and a sequence number, we can do some very nice optimizations. Imagine a case where we have two textures that use 51% of the available texture space. Performance would DIE if we had to bring in the entire texture every single time. We can do a little optimization and only bring in 2% of texture memory each time instead of 102%. > 3. 
Only large agp allocations are swapped out at one time. Little blocks > are blitted into a 1MB region, when it is full and the blits are committed, > we can decide to swap them out of the agp aperture. This avoids lots of > small pages being swapped out thrashing with lots of agp gatt table accesses > and potentially causing nasty things like cpu cache flushes too much. > > 4. Implementations without agp will require at least PCI DMA, or a slow > function that copies over the bus. They will have only one pool, and will > be considered "swapped" when they aren't in the card pool. If the card > supports some sort of PCI-GART we could treat it similarly to agp memory. > > 5. It might be useful to know some metrics about what kind of memory we are, > backbuffer, texture, etc. I'm not sure if we really need to know this > information, but it could be useful. As we get a bit farther along we'll need to decide exactly what information we want to store with each block to help make swap-out decisions. We could let each process make that decision. With each block it stores three values. The first value is the cost of restoring a block. The second is the normalized probability that the block will be needed during the current frame, and the third is the probability that it will be needed in the next frame. Values of zero for the probability mean "I don't know." The cost value could probably be inferred from the status bits and the fullness value. With these values it becomes pretty simple for the kernel to select candidate blocks to reclaim. [snip] >>One thing this is missing is some way to prioritize which blocks are to >>be swapped out. Right now the blocks are stored in a LRU linked list, >>but I don't think that's necessarily the best way (the explicit linked >>list) to go. > > Selection might happen LRU or not. The reason for making the age variable > public though is so we could perhaps weigh using it. 
We could also weigh > decisions on memory type if we encode that information. Keeping memory type > information around also allows us to make private->shared backbuffer / > depthbuffer decisions easier and without as much or perhaps any client > intervention. > > There needs to be a selection of which pages to grab next if going in a > linear fashion in a client's address space fails. Perhaps it jumps by a > preset limit (the normal address space each client carves out for itself) > and tries again. Perhaps if something like that fails we fall back to > linear age based scanning. Perhaps we keep some sort of freelist based on > regions. We can encode a region of 256 megs by page offset and number of 4k > pages in a single 32-bit number. Change the page size a little bit and the > number of bits required becomes smaller. > > I'm thinking at this point and don't have the perfect data structure and > logic worked out just yet for the freelist/usedlist part of the memory > manager. I suppose though that's what these technical emails are for. > Originally at VA I thought to just use something like the Utah-GLX memory > manager for the freelist, or perhaps Keith's block memory manager, but just > to extend it. I'm not so sure that this is the proper solution though. > > I guess writing down some of the attributes we want is in order, and trying > to think through the problem: > 1. We want a method to find out if our pages were messed with, and this should > be fast and trivial. Hashing into a bit vector based on id seems like a > good thing here. I describe this in detail later in the email when I answer > one of your questions. > > 2. We want a fast method to find a free region if any exist. A free region > should be randomly selected or selected with a weight towards the page(s) > being close to an address range we specify. A queue, stack or list come to > mind here. Lists have such poor performance sometimes though.... 
Should > the lists be stored inside the data blocks? I don't think so, but it might > be an implementation that makes sense. I tend to think the "allocation" > structures could/should be separate from the pagelist. One quick comment here. We *cannot* store any of our memory-manager data in on-card memory. There are cards that store textures and / or vertex data in memory that is not accessible by the CPU. There are a couple of Sun chips like that, and I think one or two 3dlabs chips might be like that. Who knows what the next ATI, Matrox, or Intel chip might do. > Here could be some possible freelist implementations: > a. Each page is a bit in a bit vector. A set bit means the memory is in > use. Find the first zero bit and look for zero bits of a certain number > afterwards for a particular sized region. Would have really good > performance in the normal case where we are not over committed, and could be > made to index to a particular address space easily. There are drawbacks > here though, we could potentially use a lot of memory with all the bit > vectors in our memory manager. Also this doesn't help us too much in the > over committed case, we would need something to run through the pagelist and > swap out by age in a linear fashion when things get over committed. Or > perhaps in a random fashion, dunno. This could happen in a kernel thread or > in the Xserver at regular intervals though, so we always attempt to have > some room in the freelist. I don't think having a clean-up thread will be very helpful. When there is activity happening, it will likely be happening at a very high rate. I think that the thread would either sleep all the time (with nothing to do) or be continuously woken up on-demand. > b. We go with a list, queue, or stack. 6 bytes for 256 megs / 4kb pages for > a single link, or 8 bytes for 256 megs / 4 kb pages for a double link. 
We > make the head of the list at initialization point to the whole region and we > split much like the Utah implementation did. Could be LRU or MRU. Might be > faster than the previous method unless we want to weigh by address space, then > it might get more complicated. Unfortunately this is only kept sorted on > age, not where we are in the address space. It could have poor selection of > free region performance and things might tend to be grouped and have some > bad behavior. Perhaps we keep two separate lists, the allocated list and > the free list. > > c. Something a little more exotic might have better performance. Perhaps > keeping a binary tree as a front end to the region lists, that way allowing > us to select quickly based on address space. Perhaps slice up the address > space into big (say 4 MB) regions and have a list for each region. Perhaps > hashing based on region size. I suppose the possibilities are endless here. > While something like these ideas might work, I usually go back to the > drawing board when I end up with too exotic a solution. Simple and elegant > tends to work best in most situations. On their own, (a) and (b) are too simple. That won't give us enough functionality to do all the things we want. The memory bit-map has the advantage that we can pick the largest free block and try to reclaim memory around that free block until it is large enough to satisfy the request. After having tried to do that with the existing linked list approach, I can honestly say that a linked list is a very poor data structure for that purpose. On the flip side, the bit-map doesn't store much useful information about the allocated blocks. That makes it difficult to select which memory to reclaim when memory is very full. The problem is that BOTH of these situations are important performance cases! > 3. 
We want an easy method to grow the memory backing an agp > pool, but also > some sort of per client restriction, perhaps just the system wide > restrictions will do? This should be solved by the agp extension proposal I > made earlier in the week. What are the "system wide restrictions"? Available memory? > 4. We want a simple way to determine if an age allows us to do something to > a texture which has the BLOCK_CAN_BE_CLOBBERED bit set, storing the last age > used on a block is all that should be required I think. For textures and vertex buffers this should be true. [snip] >>Okay. There's a few details of this that I'm not seeing. I'm sure >>they're there, I'm just not seeing them. >> >>Process A needs to allocate some blocks (or even just a single block) >>for a texture. It scans the list of blocks and finds that not enough >>free blocks are available. It performs some hocus-pocus and determines >>that a block "owned" by process B needs to be freed. That block has the >>BLOCK_CAN_SWAP bit set, but the BLOCK_CAN_BE_CLOBBERED bit is cleared. >> >>Process A asks the kernel to page the block out. Then what? How does >>process B find out that its block was stolen and page it back in? > > Okay here is how I think things could happen: > > I want to page the block out, I request to the kernel to return when this > list of pages that I give you have been swapped out and are available. If > the kernel can immediately process this request, do it and return, if I have > to do some dma put the client on a wait queue to be woken up when it > happens. > > The kernel goes ahead and updates the blocks in the SAREA saying that they > aren't there (marking their id's as zero perhaps) > > Process B comes along and sees its textures aren't resident and needs them, > it asks the kernel to make them resident somewhere, it doesn't care where. > It passes some IDs to the kernel and asks the kernel to make them resident. 
> The kernel puts the process on a waitqueue or returns in a similar fashion > to the first request. Up to this point, I follow you. Here is my problem. Say we have 16KB blocks. Say process B had a single block that had 4 vertex buffers in. The first 3 are 1KB and the last one is 2KB (and hangs over into the next block). This first block is the one that process A selected to swap-out. Is the ID number assigned to the first block (the one that was swapped-out) and the second block (that wasn't swapped-out) the same? It seems like it should be (and that would enable the kernel to shuffle things around and keep blocks contiguous), but it seems like it would be difficult (or at least irritating) to keep all the block IDs correct (as subregions in the blocks are allocated and freed by a process). When process B goes to sleep waiting for its blocks to come back, what locks will it hold? If it doesn't hold any, how do we prevent process A from coming back and stealing the second block? > Whenever we get the lock with contention we must do some sort of quick > scanning. We might want to speed up the process somehow, perhaps some sort > of hashing by texture number to a dirty flag. Actually that is probably the > best implementation. If we reserve 64k of address space to be our dirty > flags (backed only when accessed) we can make the dirty flags a bit vector. > Considering the texture or block id as an index into this vector we can > rapidly find out if our list of textures has been "fooled" with. This > prevents us from scanning the entire list, which could be slow. That would be easy enough. When a process wants to issue rendering it would grab the lock, check the bits for each of the objects (textures, vertex buffers, etc.) it wants to use. It would test the bit. If the bit is set, that means "partially not here." The process would then check the blocks that actually map to the object it wants to use. 
In the memory layout above, if process B wants to render from one of the 1KB vertex buffers, it only has to make sure that the first block is paged in. If the second block is out, it doesn't matter. It then issues the rendering command, updates the fence, releases the lock. > I should also point out that the id's will be reused. We will always > attempt to use the smallest id available for use. This way using it as an > index into a shared memory area isn't so bad. That way we avoid using lots > of memory for nothing when we only have a few texture blocks. How would we dole out IDs? Would there be a kernel call to get / release a set of IDs? I feel like we're making some excellent progress here. Hurray for open-source! :) |
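[Editorial note: the freelist option (a) discussed above — a bit vector where a set bit means a page is in use, scanned for a run of zero bits — is simple enough to sketch. This is a hypothetical illustration, not DRI code; all names (`find_free_run`, `page_in_use`, `BITS_PER_WORD`) are invented for the example.]

```c
#include <stdint.h>

#define BITS_PER_WORD 32

/* Test whether page 'i' is marked in-use in the bit-vector freelist. */
static int page_in_use(const uint32_t *map, unsigned i)
{
    return (map[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1;
}

/* Find the first run of 'n' free (zero) pages in a map of 'total' pages.
 * Returns the starting page index, or -1 if no such run exists -- the
 * over-committed case, where we fall back to reclaiming pages by age. */
int find_free_run(const uint32_t *map, unsigned total, unsigned n)
{
    unsigned start = 0, len = 0, i;

    for (i = 0; i < total; i++) {
        if (page_in_use(map, i)) {
            len = 0;
            start = i + 1;      /* run broken; restart after this page */
        } else if (++len == n) {
            return (int) start; /* found a big-enough free run */
        }
    }
    return -1;
}

/* Mark the pages of an allocated run as in-use. */
void mark_run(uint32_t *map, unsigned start, unsigned n)
{
    unsigned i;
    for (i = start; i < start + n; i++)
        map[i / BITS_PER_WORD] |= 1u << (i % BITS_PER_WORD);
}
```

This also shows the memory cost Jeff mentions: one bit per 4KB page, i.e. 1KB of vector per 32MB of managed memory.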
From: Jeff H. <jha...@ad...> - 2003-01-20 05:59:27
|
> > The potential problem is there are some things that can't be tracked by a > simple "age." The one thing I can think of is back-buffers. An > application might have several buffer-swap operations that are blocked > waiting for a certain vertical blank number. There could be other > rendering operations that are sent after the buffer-swap that will > complete BEFORE the blit for the buffer-swap is queued. I can't see a > reasonable way to assign an age to those back-buffers. > > Since this is the only case I can think of, there may be a different way > to handle it. Well then it looks like storing the type of memory is important and something we need to look at I guess. > > > 2. We consider the block or group of blocks as an entire > "unit", everything > is done on units, not individual pieces of the blocks. That > prevents people > swapping out the first page of a group of textures and someone having to > wait for just that block to come back. > > I believe that the block should be the unit used. If each block has a group > ID (the IDs that you talk about below) and a sequence number, we can do > some very nice optimizations. Imagine a case where we have two textures > that use 51% of the available texture space. Performance would DIE if > we had to bring in the entire texture every single time. We can do a > little optimization and only bring in 2% of texture memory each time > instead of 102%. Just a slight comment here, if the memory has actually made it out of the agp aperture, no matter how big the page is, the cost of getting it back is the same. Course I really like the idea of storing sequence numbers though, gives room for lots of flexibility. And for situations where we haven't fully made it out of the agp aperture, the cost of the blit is much smaller. > As we get a bit farther along we'll need to decide exactly what > information we want to store with each block to help make swap-out > decisions. > > We could let each process make that decision. 
With each block it > stores three values. The first value is the cost of restoring a block. > The second is the normalized probability that the block will be needed > during the current frame, and the third is the probability that it will be > needed in the next frame. Values of zero for the probability mean "I > don't know." The cost value could probably be inferred from the status > bits and the fullness value. > > With these values it becomes pretty simple for the kernel to select > candidate blocks to reclaim. Just a small clarification here, the decision should be done in user space, you tell the kernel to swap such and such block out. The kernel has no brains when it comes to memory managing decisions. The reasons for this are twofold: kernel code tends to not change as easily as user space code, and the whole initial reason of not wanting to do too much complex stuff in the kernel. > I don't think having a clean-up thread will be very helpful. When there > is activity happening, it will likely be happening at a very high rate. > I think that the thread would either sleep all the time (with nothing > to do) or be continuously woken up on-demand. After thinking about this a little more I think you're right, that was a bogus idea. > On their own, (a) and (b) are too simple. That won't give us enough > functionality to do all the things we want. The memory bit-map has the > advantage that we can pick the largest free block and try to reclaim > memory around that free block until it is large enough to satisfy the > request. After having tried to do that with the existing linked list > approach, I can honestly say that a linked list is a very poor > data structure for that purpose. > > On the flip side, the bit-map doesn't store much useful information > about the allocated blocks. That makes it difficult to select which > memory to reclaim when memory is very full. > > The problem is that BOTH of these situations are important performance > cases! 
Okay I have done some thinking, and I think I have a pretty good solution. Under normal cases we try and get a completely unused portion of memory by using the bit vector freelist. If one is unavailable and we can't grow agp memory for some reason we fall back onto another data structure, which is a priority heap (priority queue or binary heap depending on the data structures book you look at.) That should have good performance and makes more sense than a linked list I think, as far as size is concerned. We could implement a heap with just an array of index values into the pagelist. We might get some vampiric(sp?) performance issues from just about every operation on the heap being log n. However our heap should be of a small enough height so this won't make too much of a difference over a linked list. We should make this a min style heap, where age is the value used for the key in the heap. That way the top of the heap is always the texture block or group with the minimum age (longest since it was last used by the card). We want to avoid ever doing a search of the heap, so we store the index into the heap with the pagelist. When we rearrange something in the heap we make sure we update these indices. Perhaps we want to do a different selection than on longest since last use, but I think that might give us reasonable performance. I suppose MRU would be trivial if it's a max heap, so that sort of thing would be pretty easy to implement with this data structure as well. I know you get different answers depending on who you ask, but what's the replacement strategy you would recommend? Btw, in case there is any doubt we are storing only used blocks in the bit vector. Also by storing the index of where a block is in the heap, if we want to quickly go from the freelist bit vector to the position in the heap it's a very simple operation. > > > 3. 
We want an easy method to grow the memory backing an agp > pool, but also > some sort of per client restriction, perhaps just the system wide > restrictions will do? This should be solved by the agp > extension proposal I > made earlier in the week. > > What are the "system wide restrictions"? Available memory? Well there are only so many pages of memory that can be physically allocated. We need to limit how much a client will request so as not to cause performance of the entire system to go in the toilet. agpgart already has some limitations on what it will allow to be allocated and that limit might be enough. We might want to allow each client to only add up to 8MB to the aperture or something like that as well, but those sorts of restrictions need some thinking about. > > > 4. We want a simple way to determine if an age allows us to do > something to > > a texture which has the BLOCK_CAN_BE_CLOBBERED bit set, storing > the last age > > used on a block is all that should be required I think. > > For textures and vertex buffers this should be true. > > [snip] > > >>Okay. There's a few details of this that I'm not seeing. I'm sure > >>they're there, I'm just not seeing them. > >> > >>Process A needs to allocate some blocks (or even just a single block) > >>for a texture. It scans the list of blocks and finds that not enough > >>free blocks are available. It performs some hocus-pocus and determines > >>that a block "owned" by process B needs to be freed. That block has the > >>BLOCK_CAN_SWAP bit set, but the BLOCK_CAN_BE_CLOBBERED bit is cleared. > >> > >>Process A asks the kernel to page the block out. Then what? How does > >>process B find out that its block was stolen and page it back in? > > > > Okay here is how I think things could happen: > > > > I want to page the block out, I request to the kernel to return > when this > > list of pages that I give you have been swapped out and are > available. 
If > > the kernel can immediately process this request, do it and > return, if I have > > to do some dma put the client on a wait queue to be woken up when it > > happens. > > > > The kernel goes ahead and updates the blocks in the SAREA > saying that they > > aren't there (marking their id's as zero perhaps) > > > > Process B comes along an sees its textures aren't resident and > needs them, > > it asks the kernel to make them resident somewhere, it doesn't > care where. > > It passes some ID's to the kernel and asks the kernel to make > them resident. > > The kernel puts the process on a waitqueue or returns in a > similar fashion > > to the first request. > > Up to this point, I follow you. Here is my problem. Say we have 16KB > blocks. Say process B had a single block that had 4 vertex buffers in. > The first 3 are 1KB and the last one is 2KB (and hangs over into the > next block). This first block is the one that process A selected to > swap-out. > > Is the ID number assigned to the first block (the one that was > swapped-out) and the second block (that wasn't swapped-out) the same? > It seems like it should be (and that would enable the kernel to shuffle > things around and keep blocks contiguous), but it seems like it would be > difficult (or at least irritating) to keep all the block IDs correct (as > subregions in the blocks are allocated and freed by a process). > > When process B goes to sleep waiting for its blocks to come back, what > locks will it hold? If it doesn't hold any, how do we prevent process A > from coming back and stealing the second block? We don't need to hold any locks when we sleep, but since we have to ask the kernel for the block back it can decide who to wake up and who to put to sleep. We can just wake in a FIFO fashion from the wait queue. We can also perhaps make a simple decision in the kernel. If it sees that there is alot of contention on one area then it could "recommend" another area in some fashion to put its textures. 
Here could be an easy way to accomplish this I think: We have an ioctl which pages in a block id, not just a single page. It writes back to user space the page number where the block set now lives. That way if the kernel sees A LOT of contention for an area of memory it might try to put it somewhere else. However perhaps we could do something like this in user space too...... Actually now that I think about it, the less the kernel does to mess up user space the better. I still like the whole idea of managing everything by the block id though, and keeping sequence numbers in the blocks in some fashion. Perhaps that whole idea of letting the kernel recommend putting something somewhere else is completely bogus. I was tempted to completely erase it, but I leave it in this email in case it is good fodder for discussion. > > > Whenever we get the lock with contention we must do some sort of quick > > scanning. We might want to speed up the process somehow, > perhaps some sort > > of hashing by texture number to a dirty flag. Actually that is > probably the > > best implementation. If we reserve 64k of address space to be our dirty > > flags (backed only when accessed) we can make the dirty flags a > bit vector. > > Considering the texture or block id as an index into this vector we can > > rapidly find out if our list of textures has been "fooled" with. This > > prevents us from scanning the entire list, which could be slow. > > That would be easy enough. When a process wants to issue rendering it > would grab the lock, check the bits for each of the objects (textures, > vertex buffers, etc.) it wants to use. It would test the bit. If the > bit is set, that means "partially not here." The process would then > check the blocks that actually map to the object it wants to use. In > the memory layout above, if process B wants to render from one of the > 1KB vertex buffers, it only has to make sure that the first block is > paged in. 
If the second block is out, it doesn't matter. It then > issues the rendering command, updates the fence, releases the lock. > > > I should also point out that the id's will be reused. We will always > > attempt to use the smallest id available for use. This way > using it as an > > index into a shared memory area isn't so bad. That way we > avoid using lots > > of memory for nothing when we only have a few texture blocks. > > How would we dole out IDs? Would there be a kernel call to get / > release a set of IDs? Probably, however there are methods to do this without kernel intervention. The good ole' bit vector is great for doling out keys. ;) That's how agpgart handles its keys actually. Oh, btw, did I mention how much I like bit vectors? ;) Some little bit of wisdom about everything looking like a nail when you walk around with a hammer immediately comes to mind. ;) -Jeff |
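[Editorial note: the min-heap Jeff describes earlier in this email — keyed on age, with each pagelist entry remembering its own position in the heap so no search is ever needed — can be sketched roughly as below. This is a hypothetical illustration; the names (`struct block`, `heap_insert`, `MAX_BLOCKS`) are invented, and a real implementation would live alongside the pagelist.]

```c
#include <stddef.h>

#define MAX_BLOCKS 1024

struct block {
    unsigned age;    /* last age/fence that touched this block */
    int heap_idx;    /* back-index: this block's slot in the heap */
};

struct age_heap {
    int idx[MAX_BLOCKS]; /* heap of indices into the pagelist */
    int count;
};

static void heap_swap(struct age_heap *h, struct block *blocks, int a, int b)
{
    int tmp = h->idx[a];
    h->idx[a] = h->idx[b];
    h->idx[b] = tmp;
    /* Keep the back-indices up to date, so we can jump from a block
     * straight to its heap slot without ever searching the heap. */
    blocks[h->idx[a]].heap_idx = a;
    blocks[h->idx[b]].heap_idx = b;
}

void heap_insert(struct age_heap *h, struct block *blocks, int blk)
{
    int i = h->count++;
    h->idx[i] = blk;
    blocks[blk].heap_idx = i;
    /* Sift up: min-heap on age, so the root is always the block with
     * the oldest (smallest) age -- the LRU candidate to swap out. */
    while (i > 0 && blocks[h->idx[i]].age < blocks[h->idx[(i - 1) / 2]].age) {
        heap_swap(h, blocks, i, (i - 1) / 2);
        i = (i - 1) / 2;
    }
}

/* Peek at the LRU block without removing it; -1 if the heap is empty. */
int heap_min(const struct age_heap *h)
{
    return h->count > 0 ? h->idx[0] : -1;
}
```

Every heap operation is O(log n), but as Jeff notes the heap is shallow enough that this should not matter in practice.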
From: Ian R. <id...@us...> - 2003-01-20 18:31:24
|
Jeff Hartmann wrote: >>>2. We consider the block or group of blocks as an entire "unit", everything >>>is done on units, not individual pieces of the blocks. That prevents people >>>swapping out the first page of a group of textures and someone having to >>>wait for just that block to come back. >> >>I believe that the block should be the unit used. If each block has a group >>ID (the IDs that you talk about below) and a sequence number, we can do >>some very nice optimizations. Imagine a case where we have two textures >>that use 51% of the available texture space. Performance would DIE if >>we had to bring in the entire texture every single time. We can do a >>little optimization and only bring in 2% of texture memory each time >>instead of 102%. > > Just a slight comment here, if the memory has actually made it out of the > agp aperture no matter how big the page is the cost of getting it back is > the same. Course I really like the idea of storing sequence numbers though, > gives room for lots of flexibility. And for situations where we haven't > fully made it out of the agp aperture, the cost of the blit is much smaller. That's not quite what I meant. Imagine the user has 40MB of on-card memory available for textures. If that user runs an application that uses two 2048x2048x32bpp textures per-frame (each one weighing in at ~22MB), we will either have to use AGP memory for one of them, or (if the card is PCI with no PCIGART) it will have to copy 44MB across the PCI bus every single frame. In reality, to fit the second texture in on-card memory, we only have to reclaim 4MB. We could then hit a steady state where 18MB of each texture never moves out of on-card memory, and only 8MB of texture needs to be copied in each frame. If we view a texture (or any object) as an ordered sequence of blocks instead of a monolithic lump, we can make that optimization. Based on what you have written below, I think we're in agreement that this is the way to go. 
:) [big snip] >>One their own (a) and (b) are too simple. That won't give us enough >>functionality to do all the things we want. The memory bit-map has the >>advantage that we can pick the largest free block and try to reclaim >>memory around that free block until it is large enough to satisfy the >>request. After having tried to do that with the existing linked list >>approach, I can honestly say that a linked list is a very poor >>datastructure for that purpose. >> >>On the flip side, the bit-map doesn't store much useful information >>about the allocated blocks. That makes it difficult to select which >>memory to reclaim when memory is very full. >> >>The problem is that BOTH of these situations are important performance >>cases! > > > Okay I have done some thinking, and I think I have a pretty good solution. > Under normal cases we try and get a completely unused portion of memory by > using the bit vector freelist. If one is unavailable and we can't grow agp > memory for some reason we fall back onto another data structure, which is a > priority heap (priority queue or binary heap depending on the data > structures book you look at.) > That should have good performance and makes more sense then a linked list I > think, as far as size is concerned. We could implement a heap with just an > array of index values into the pagelist. We might get some vampiric(sp?) > performance issues from just about every operation on the heap being log n. > However our heap should be of a small enough height so this won't make too > much a difference over a linked list. We should make this a min style heap, > where age is the value used for the key in the heap. That way the top of > the heap is always the texture block or group with the minimum age (longest > since it was last used by the card). We want to avoid ever doing a search > of the heap, so we store the index into the heap with the pagelist. When we > rearrange something in the heap we make sure we update these indices. 
> Perhaps we want to do a different selection than on longest since last use, > but I think that might give us reasonable performance. I suppose MRU would > be trivial if it's a max heap, so that sort of thing would be pretty easy to > implement with this data structure as well. I know you get different > answers depending on who you ask, but what's the replacement strategy you > would recommend? It would take some experiments to prove it, but I believe that simple LRU or MRU is always suboptimal in both the theoretical and practical case. One example where any type of priority queue fails is where you need one more block than is available in the largest free region. If the largest free region is 54 blocks and 55 blocks are needed, the optimal solution is most likely to reclaim one of the used blocks at the head or tail of the free region. I think we'd also prefer to reclaim blocks that don't need to be swapped (i.e., have the throw-away bit set). There are some cases, depending on the number of blocks needed, where we'd also prefer to reclaim blocks that aren't part of a sequence. We may well settle on using a slight variation of simple LRU or MRU. I'd like to see our initial implementation allow a little more flexibility to experiment with gathering different heuristics to improve performance. [another big snip] >>Up to this point, I follow you. Here is my problem. Say we have 16KB >>blocks. Say process B had a single block that had 4 vertex buffers in. >> The first 3 are 1KB and the last one is 2KB (and hangs over into the >>next block). This first block is the one that process A selected to >>swap-out. >> >>Is the ID number assigned to the first block (the one that was >>swapped-out) and the second block (that wasn't swapped-out) the same? 
>>It seems like it should be (and that would enable the kernel to shuffle >>things around and keep blocks contiguous), but it seems like it would be >>difficult (or at least irritating) to keep all the block IDs correct (as >>subregions in the blocks are allocated and freed by a process). >> >>When process B goes to sleep waiting for its blocks to come back, what >>locks will it hold? If it doesn't hold any, how do we prevent process A >>from coming back and stealing the second block? > > > We don't need to hold any locks when we sleep, but since we have to ask the > kernel for the block back, it can decide who to wake up and who to put to > sleep. We can just wake in a FIFO fashion from the wait queue. We can also > perhaps make a simple decision in the kernel. If it sees that there is a lot > of contention on one area, then it could "recommend" another area in some > fashion to put its textures in. I got to thinking about this, and I came up with another solution. When process B is going to reclaim blocks from process A, process B marks the reclaimed blocks with its block ID and clears the can-swap bit. It also clears the can-swap bit on other blocks (that it already has) that are part of the sequence. When process B calls into the kernel, it passes in the block number and the old block ID & sequence information. This prevents process A from getting the CPU and trying to take the blocks back. [snip] >>How would we dole out IDs? Would there be a kernel call to get / >>release a set of IDs? > > Probably, however there are methods to do this without kernel intervention. > The good ole' bit vector is great for doling out keys. ;) Oh. Duh. :) |
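Jeff's min-heap idea above is straightforward to sketch. What follows is a minimal user-space illustration, not actual DRI code; all the names (`pagelist`, `heap_touch`, and so on) are invented for this sketch. The key trick is the back-pointer stored with each pagelist entry, which lets any block's heap position be found in O(1), so re-keying a block after an age update never requires a search of the heap:

```c
#define MAX_BLOCKS 64

/* Illustrative only; these fields are not from any real DRI header. */
struct block {
    unsigned age;      /* last time the card used this block */
    int      heap_idx; /* back-pointer into the heap, so no search is needed */
};

static struct block pagelist[MAX_BLOCKS];
static int heap[MAX_BLOCKS]; /* heap of pagelist indices, min age at heap[0] */
static int heap_len;

static void heap_place(int pos, int blk)
{
    heap[pos] = blk;
    pagelist[blk].heap_idx = pos; /* keep the back-pointer current */
}

static void sift_up(int pos)
{
    while (pos > 0) {
        int parent = (pos - 1) / 2;
        if (pagelist[heap[parent]].age <= pagelist[heap[pos]].age)
            break;
        int tmp = heap[parent];
        heap_place(parent, heap[pos]);
        heap_place(pos, tmp);
        pos = parent;
    }
}

static void sift_down(int pos)
{
    for (;;) {
        int l = 2 * pos + 1, r = l + 1, min = pos;
        if (l < heap_len && pagelist[heap[l]].age < pagelist[heap[min]].age)
            min = l;
        if (r < heap_len && pagelist[heap[r]].age < pagelist[heap[min]].age)
            min = r;
        if (min == pos)
            break;
        int tmp = heap[min];
        heap_place(min, heap[pos]);
        heap_place(pos, tmp);
        pos = min;
    }
}

void heap_insert(int blk)              /* O(log n) */
{
    heap_place(heap_len++, blk);
    sift_up(heap_len - 1);
}

int heap_pop_oldest(void)              /* O(log n): block longest unused */
{
    int blk = heap[0];
    heap_place(0, heap[--heap_len]);
    sift_down(0);
    return blk;
}

void heap_touch(int blk, unsigned age) /* the card just used blk; O(log n) */
{
    pagelist[blk].age = age;
    sift_down(pagelist[blk].heap_idx); /* ages only grow, so only sift down */
}
```

Because a block's age only ever increases, re-keying after a touch only ever moves the entry downward in a min-heap, and eviction is simply popping the root, the block least recently used by the card.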
From: Jens O. <je...@tu...> - 2003-01-17 17:38:03
|
Ian, I had a chance to read your ideas on memory management last night. First off, I'd like to thank you for doing a very good job of collecting requirements and then separating out your ideas for implementation. This level of discipline really helps me understand where you are constrained by requirements vs. where you are exploring solutions. As you address the very complex issue of virtualizing graphics subsystem resources, I'm going to attempt to influence your thinking to include the concept of a 3D desktop compositing engine. You've made references to capabilities that Apple is supporting, yet to me the ultimate challenge that the Apple desktop paradigm provides today is the 3D and compositing effects they are doing with Genie-bottle window iconification and multilevel window transparency. Starting to address these capabilities in open source will put additional requirements on resource management. Ian Romanick wrote: > What follows is the collected requirements for the new DRI memory > manager. This list is the product of several discussions between Brian, > Keith, Allen, and myself several months ago. After the list, I have > included some of my thoughts on the big picture that I see from these > requirements. > > 1. Single-copy textures > > Right now each texture exists in two or three places. There is a copy > in on-card or AGP memory, in system memory (managed by the driver), and > in application memory. Any solution should be able to eliminate one or > two of those copies. > > If the driver-tracked copy in system memory is eliminated, care must be > taken when the texture needs to be removed from on-card / AGP memory. > Additionally, changes to the texture image made via glCopyTexImage must > not be lost. > > It may be possible to eliminate one copy of the texture using > APPLE_client_storage. A portion of this could be done purely in Mesa. 
> If the user supplied image matches the internal format of the texture, > then the driver can use the application's copy of the texture in place > of the driver's copy. > > Modulo implementation difficulties, it may even be possible to use the > pages that hold the texture as backing store for a portion of the AGP > aperture. This is the only way to truly achieve single-copy textures. > The implementation may prove too difficult on existing x86 systems to be > worth the effort. This functionality is available in MacOS 10.1, so the > same difficulties may not exist on Linux PPC. Are the AGP aperture issues present for any AGP page swapping, or just for assigning new, random virtual memory pages? I was under the impression that preallocated AGP memory could be swapped in and out on the x86 platform. In other words, it would be difficult to dynamically map a user texture into the AGP aperture, but we could create a pool of AGP memory that was larger than the aperture and use the APPLE_client_storage extension to allocate space from that pool to the application. > 2. Share texture memory among multiple OpenGL contexts > > Texture memory is currently shared by all OpenGL contexts. That is, > when an OpenGL context switch happens it is not necessary to reload all > textures. The texture manager needs to continue to use a paged memory > model (as opposed to a segmented memory model). > > 3. Accommodate other OpenGL buffers > > The allocator should also be used for allocating vertex buffers, render > targets (pbuffers, back-buffers, depth-buffers, etc.), and other > buffers. This can be useful beyond supporting SGIX_pbuffer, > ARB_vertex_array_objects, and optimized display lists. Dynamically > allocating per-context depth and back-buffers will allow multiple Z > depths to be used at a time (i.e., 16-bit depth-buffer for one window and > 24-bit depth-buffer for another) and super-sampling FSAA. 
For traditional 2D window systems, this requirement is sufficient in that you don't need to be able to truly provide an unlimited amount of private buffer space... rather, when you run out of space, you can fall back to a method where memory is allocated from a single large buffer based on visible display pixels. That said, a 3D compositing window system couldn't fall back on this method. Imagine N transparent windows all stacked on top of each other, each needing dedicated display resources in order to yield the correct final display results. Virtualizing an infinite number of color and alpha layers may not be possible in hardware alone, but software compositing can be prohibitively slow. Perhaps providing a large dedicated amount of resources to 3D compositing and virtualizing all non-visible resources could provide a reasonable solution. This implies that back buffers, depth buffers, pbuffers and super-sampled buffers all need to be potentially swapped out when the rendering context is swapped out. > 4. Support texture pseudo-render targets > > Accelerating some OpenGL functions, such as glCopyTexImage, > SGIS_generate_mipmaps, and ARB_render_texture, may require special > support and consideration. > > 5. Additional AGP related issues > > There may be cases where textures need to be moved back-and-forth > between AGP and on-card memory. For example, a texture might reside in > AGP memory, and an operation may be requested that requires that the > texture be in on-card memory. > > 6. Additional texture formats and layouts > > Compressed, 1D, 3D, cube map, and non-power-of-two textures need to be > supported in addition to "traditional" 2D power-of-two textures. > > 7. Allen Akin's pinned-texture proposal > > If we ever expose memory management to the user (beyond texture > priorities) we want to be sure our allocator is designed with this in mind. > > 8. 
Device independence > > As much as possible, the source code for the memory manager should live > somewhere device independent. This is both for the benefit of newly > developed drivers and for maintaining existing drivers. > > * My Thoughts * > > There are really only two radical departures from the existing memory > manager. The first is using the memory manager for non-texture memory > objects. The second, which is partially a result of the first, is the > need to "pin" objects. It would not do to have one context kick another > context's depth-buffer out of memory! Why not swap out another context's depth buffer? If it's not being used at the time, is that any worse than swapping out textures that are actively being used by the yielding context? > My initial thought on how to accomplish this was to move the allocator > into the kernel. There would be a low-level allocator that could be > used for non-texture buffers and a way to create textures (from data). > In the texture case, the kernel would only allocate memory when a > texture was used. Instead of using the actual texture address in > drawing command streams, the user-level driver would insert texture IDs. > The kernel would use these IDs to map to real texture addresses. > > The benefit is that all memory management would be handled by a single > omniscient execution context (the kernel). The downside is that it > would move a LOT of code into the kernel. It would be almost entirely > OS and device independent, but there would likely be a lot of it. > > After talking with Jeff Hartmann in IRC on 1/13, I started thinking > about all of this again. Jeff had some serious reservations about > moving that volume of code into the kernel, and he believed that all of > the requirements could be met by a purely user-space implementation. > After thinking about things some more, I'm starting to agree. 
> > What follows is a fairly random series of thoughts on how a user-space > memory manager could be made to work. > > I believe that everything could be done by breaking each memory space > down into blocks (as is currently done) and tracking two values, either > implicitly or explicitly, with each block. The first value is some sort > of swap-out priority. This is currently implicitly tracked by the list > ordering in the SAREA. The other value is basically a semaphore, but it > could be implemented as a simple can-swap bit. > > Blocks that have an active depth-buffer would never have can-swap set. > Blocks that have "normal" textures, back-buffers, render-target textures, > and pbuffers would have their can-swap bit conditionally set. Each of > these types of blocks would have the can-swap bit cleared under the > following situations: > > - Normal textures - While a rendering operation is queued that > will use the texture. > - SGIS_generate_mipmaps textures - While the blits are in progress > to create the filtered mipmaps. > - glCopyTexImage textures - While the blit to copy image data to > the texture is in progress and while the data in the texture has > not been copied to some sort of backing store. > - pbuffers - While rendering operations to the pbuffer are in > progress. pbuffers have a mechanism to tell an application when > the contents of the pbuffer have been "lost." This could be > exploited by the memory manager. One caveat is when a pbuffer > is bound to a texture (ARB_render_texture). While the pbuffer > is bound to a texture, its contents cannot be lost. Can the > contents be "swapped out" to some sort of backing store, like > with glCopyTexImage targets? There is another caveat for PBuffers that Allen brought to my attention a few years ago. The way they are currently defined, it's possible for the application to request a PBuffer that cannot be "destroyed", but rather must be swapped out and then restored later. 
> - Back-buffers - In unextended GLX, back-buffers can never be > swapped. However, if OML_sync_control is available, a "double > buffered" visual may want to have many virtual back-buffers. > Each time glXSwapBuffersMscOML (essentially an asynchronous > glXSwapBuffers call) is made, a new back-buffer is allocated as > the rendering target. Once a back-buffer is copied to the > front-buffer (i.e., the queued buffer-swap completes), the > back-buffer can be swapped out. > > There may be other situations where can-swap is cleared, but that's all > I could think of. Similar rules would exist for vertex buffers (for > ARB_vertex_array_object, EXT_compiled_vertex_array, optimized display > lists, etc.). > > Only a single bit per block is needed in the SAREA. That bit is the > union of the bits for each object that is part of that block. This > union must be calculated by the user-space driver. This presents a > possible problem of user-space clients failing to update the can-swap > bits for some reason (process hung on a blocking IO call?). The current > implementation avoids this problem by forcing all blocks to be swappable > at all times. > > At this point I'm left with a few questions. > > 1. In a scheme like this, how could processes be forced to update the > can-swap bits on blocks that they own? > 2. What is the best way for processes to be notified of events that > could cause can-swap bits to change (i.e., rendering completion, > asynchronous buffer-swap completion, etc.)? Signals from the kernel? > Polling "age" variables? > 3. If some sort of signal based notification is used, could it be used > to implement NV_fence and / or APPLE_fence? > 4. How could the memory manager handle objects that span multiple > blocks? In other words, could the memory manager be made to prefer > to swap out blocks that wholly contain all of the objects that > overlap the block? Are there other useful metrics? 
Prefer to > swap out blocks that are half full over blocks that are completely > full? > 5. What other things have I missed that might prevent this system > from working? :) I hope I'm bringing in some food for thought... and not unnecessarily complicating an already difficult and important DRI improvement. -- /\ Jens Owen / \/\ _ je...@tu... / \ \ \ Steamboat Springs, Colorado |
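As a concrete, and entirely hypothetical, illustration of the per-block union Ian describes, the SAREA bit for a block could be computed as follows. The type and state names here are invented for the sketch and do not come from any real DRI header:

```c
/* Hypothetical bookkeeping for the can-swap scheme.  Each object that
 * overlaps a block records why (if at all) it is currently pinned. */
enum obj_state {
    OBJ_IDLE      = 0,      /* nothing pins this object             */
    OBJ_QUEUED    = 1 << 0, /* a queued rendering op will use it    */
    OBJ_NO_BACKUP = 1 << 1, /* contents not yet saved anywhere else */
};

struct mem_object {
    enum obj_state state;
};

/* The block's single SAREA bit is the union over every object that is
 * part of the block: the block may be swapped only if *all* of its
 * objects allow it.  The user-space driver would recompute this each
 * time any object's state changes. */
int block_can_swap(const struct mem_object *objs, int nobjs)
{
    for (int i = 0; i < nobjs; i++)
        if (objs[i].state != OBJ_IDLE)
            return 0; /* one pinned object pins the whole block */
    return 1;
}
```

The sketch makes the failure mode Ian raises visible: if a client hangs without clearing its objects back to `OBJ_IDLE`, the block stays unswappable until some external mechanism forces an update.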
From: Ian R. <id...@us...> - 2003-01-17 19:03:36
|
Jens Owen wrote: > Ian, > > I had a chance to read your ideas on memory management last night. First > off, I'd like to thank you for doing a very good job of collecting > requirements and then separating out your ideas for implementation. This > level of discipline really helps me understand where you are constrained > by requirements vs. where you are exploring solutions. > > As you address the very complex issue of virtualizing graphics subsystem > resources, I'm going to attempt to influence your thinking to include > the concept of a 3D desktop compositing engine. You've made references > to capabilities that Apple is supporting, yet to me the ultimate > challenge that the Apple desktop paradigm provides today is the 3D and > compositing effects they are doing with Genie-bottle window iconification > and multilevel window transparency. Starting to address these > capabilities in open source will put additional requirements on > resource management. > > Ian Romanick wrote: > >> What follows is the collected requirements for the new DRI memory >> manager. This list is the product of several discussions between >> Brian, Keith, Allen, and myself several months ago. After the list, I >> have included some of my thoughts on the big picture that I see from >> these requirements. >> >> 1. Single-copy textures >> >> Right now each texture exists in two or three places. There is a copy >> in on-card or AGP memory, in system memory (managed by the driver), >> and in application memory. Any solution should be able to eliminate >> one or two of those copies. >> >> If the driver-tracked copy in system memory is eliminated, care must >> be taken when the texture needs to be removed from on-card / AGP >> memory. Additionally, changes to the texture image made via >> glCopyTexImage must not be lost. >> >> It may be possible to eliminate one copy of the texture using >> APPLE_client_storage. A portion of this could be done purely in Mesa. 
>> If the user supplied image matches the internal format of the texture, >> then the driver can use the application's copy of the texture in place >> of the driver's copy. >> >> Modulo implementation difficulties, it may even be possible to use the >> pages that hold the texture as backing store for a portion of the AGP >> aperture. This is the only way to truly achieve single-copy textures. >> The implementation may prove too difficult on existing x86 systems to >> be worth the effort. This functionality is available in MacOS 10.1, >> so the same difficulties may not exist on Linux PPC. > > Are the AGP aperture issues present for any AGP page swapping, or just > for assigning new, random virtual memory pages? I was under the > impression that preallocated AGP memory could be swapped in and out on > the x86 platform. In other words, it would be difficult to dynamically > map a user texture into the AGP aperture, but we could create a pool of > AGP memory that was larger than the aperture and use the > APPLE_client_storage extension to allocate space from that pool to the > application. AFAIK, your assumptions about AGP mappings are correct. Jeff would be the one to ask, though. :) APPLE_client_storage isn't really an allocator. It allows applications to tell the GL that it can keep and use pointers to application storage (i.e., pointers passed into TexImage2D). The optimization that can be done on MacOS is to not only use those kept pointers as the backing store for textures, but also remap those pages into AGP space to be directly used by the graphics hardware. Having multiple physical pages to back AGP pages could be useful for ARB_vertex_array_objects, so I'll keep that usage in mind. [snip] >> 3. Accommodate other OpenGL buffers >> >> The allocator should also be used for allocating vertex buffers, >> render targets (pbuffers, back-buffers, depth-buffers, etc.), and >> other buffers. 
This can be useful beyond supporting SGIX_pbuffer, >> ARB_vertex_array_objects, and optimized display lists. Dynamically >> allocating per-context depth and back-buffers will allow multiple Z >> depths to be used at a time (i.e., 16-bit depth-buffer for one window and >> 24-bit depth-buffer for another) and super-sampling FSAA. > > For traditional 2D window systems, this requirement is sufficient in > that you don't need to be able to truly provide an unlimited amount of > private buffer space... rather, when you run out of space, you can fall > back to a method where memory is allocated from a single large buffer > based on visible display pixels. That is to say, fall back to the current static back / depth buffer allocation system. That was something that I had considered, but didn't explicitly say. For some set of active OpenGL contexts, a kernel memory manager could decide that it was more memory efficient to fall back to the single, full-screen back / depth buffer system. > That said, a 3D compositing window system couldn't fall back on this > method. Imagine N transparent windows all stacked on top of each other > and each needing dedicated display resources in order to yield the > correct final display results. Virtualizing an infinite number of color > and alpha layers may not be possible in hardware alone, but software > compositing can be prohibitively slow. Perhaps providing a large > dedicated amount of resources to 3D compositing and virtualizing all > non-visible resources could provide a reasonable solution. This implies > that back buffers, depth buffers, pbuffers and super-sampled buffers all > need to be potentially swapped out when the rendering context is swapped > out. Ugh. THAT is a difficult problem. [snip] >> * My Thoughts * >> >> There are really only two radical departures from the existing memory >> manager. The first is using the memory manager for non-texture memory >> objects. 
The second, which is partially a result of the first, is the >> need to "pin" objects. It would not do to have one context kick >> another context's depth-buffer out of memory! > > > Why not swap out another context's depth buffer? If it's not being used > at the time, is that any worse than swapping out textures that are > actively being used by the yielding context? With an in-kernel memory manager this would be possible. If everything is running in user-space, process B would have to copy process A's back-buffer into process A's private address space so that process A could restore it later. This is one of the issues that complicates swapping out render-target textures (i.e., glCopyTexImage targets). When process B needs to swap out process A's texture, it can't do it until process A has copied the modified texture data out of texture memory so that it can be restored later. [snip] >> - pbuffers - While rendering operations to the pbuffer are in >> progress. pbuffers have a mechanism to tell an application when >> the contents of the pbuffer have been "lost." This could be >> exploited by the memory manager. One caveat is when a pbuffer >> is bound to a texture (ARB_render_texture). While the pbuffer >> is bound to a texture, its contents cannot be lost. Can the >> contents be "swapped out" to some sort of backing store, like >> with glCopyTexImage targets? > > There is another caveat for PBuffers that Allen brought to my attention > a few years ago. The way they are currently defined, it's possible for > the application to request a PBuffer that cannot be "destroyed", but > rather must be swapped out and then restored later. That is correct. I had forgotten about that. :) |
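The reclaim handshake discussed earlier in the thread (process B stamping its ID on the blocks and clearing can-swap before calling into the kernel) might look roughly like the sketch below. The structures, field names, and the ioctl boundary are all hypothetical, invented here for illustration:

```c
/* Hypothetical per-block descriptor shared between contexts. */
struct block_desc {
    int owner_id; /* ID of the context that currently owns the block */
    int can_swap; /* cleared while a reclaim is in flight            */
    int sequence; /* groups blocks belonging to one multi-block object */
};

/* What process B would hand to the kernel after marking the block. */
struct reclaim_req {
    int block;
    int old_owner; /* so the kernel knows whom to notify or wake */
    int sequence;
};

/* Process B reclaims 'block' from its current owner.  Stamping our ID
 * and clearing can-swap *before* the kernel call is the point of the
 * scheme: process A cannot get the CPU and take the block back in the
 * window between the marking and the kernel's bookkeeping. */
struct reclaim_req reclaim_block(struct block_desc *blocks, int block,
                                 int my_id)
{
    struct reclaim_req req;

    req.block     = block;
    req.old_owner = blocks[block].owner_id;
    req.sequence  = blocks[block].sequence;

    blocks[block].owner_id = my_id; /* stamp our block ID...        */
    blocks[block].can_swap = 0;     /* ...and pin it against A      */

    /* req would now be passed to the kernel (ioctl not shown). */
    return req;
}
```

A real implementation would also clear can-swap on the other blocks of the same sequence, as Ian describes, before making the kernel call.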
From: Dieter <Die...@ha...> - 2003-01-17 19:22:38
|
Am Freitag, 17. Januar 2003 18:37 schrieb Jens Owen: > Ian, > > I had a chance to read your ideas on memory management last night. First > off, I'd like to thank you for doing a very good job of collecting > requirements and then separating out your ideas for implementation. > This level of discipline really helps me understand where you are > constrained by requirements vs. where you are exploring solutions. > > As you address the very complex issue of virtualizing graphics subsystem > resources, I'm going to attempt to influence your thinking to include > the concept of a 3D desktop compositing engine. You've made references > to capabilities that Apple is supporting, yet to me the ultimate > challenge that the Apple desktop paradigm provides today is the 3D and > compositing effects they are doing with Genie-bottle window iconification > and multilevel window transparency. Starting to address these > capabilities in open source will put additional requirements on > resource management. Does this all "fit" with a "video editing" system? Something like an integration of the GATOS project (video in/out/DVI/TV) so that we can base video cutting systems upon Linux/*BSD? Thanks, Dieter |
From: Jens O. <je...@tu...> - 2003-01-17 21:12:45
|
Dieter Nützel wrote: > Am Freitag, 17. Januar 2003 18:37 schrieb Jens Owen: > >>Ian, >> >>I had a chance to read your ideas on memory management last night. First >>off, I'd like to thank you for doing a very good job of collecting >>requirements and then separating out your ideas for implementation. >>This level of discipline really helps me understand where you are >>constrained by requirements vs. where you are exploring solutions. >> >>As you address the very complex issue of virtualizing graphics subsystem >>resources, I'm going to attempt to influence your thinking to include >>the concept of a 3D desktop compositing engine. You've made references >>to capabilities that Apple is supporting, yet to me the ultimate >>challenge that the Apple desktop paradigm provides today is the 3D and >>compositing effects they are doing with Genie-bottle window iconification >>and multilevel window transparency. Starting to address these >>capabilities in open source will put additional requirements on >>resource management. > > Does this all "fit" with a "video editing" system? > Something like an integration of the GATOS project (video in/out/DVI/TV) so that we > can base video cutting systems upon Linux/*BSD? Not directly. The video streaming capabilities would tax the use of rendering contexts and backbuffers similarly to an active 3D application. I'm referring more to how the desktop's compositing engine would generate special effects while still supporting the load of active 3D, 2D and video rendering contexts. -- /\ Jens Owen / \/\ _ je...@tu... / \ \ \ Steamboat Springs, Colorado |
From: Sven L. <lu...@dp...> - 2003-01-18 08:35:19
|
On Thu, Jan 16, 2003 at 05:33:42PM -0800, Ian Romanick wrote: > What follows is the collected requirements for the new DRI memory > manager. This list is the product of several discussions between Brian, > Keith, Allen, and myself several months ago. After the list, I have > included some of my thoughts on the big picture that I see from these > requirements. > > 1. Single-copy textures > > Right now each texture exists in two or three places. There is a copy > in on-card or AGP memory, in system memory (managed by the driver), and > in application memory. Any solution should be able to eliminate one or > two of those copies. ... BTW, since you are looking into this, have you thought about graphics chips that can do MMU-like tricks? I am not sure if the current set of graphics chips the DRI runs on do this kind of stuff, but they well may in the future. I know the gamma drm module uses the gamma's virtual memory table to avoid doing virtual<->physical conversion. But more importantly to you, although there is not yet a DRI driver for it, the 3Dlabs Permedia3 can use virtual memory for its textures. That is, you can basically set up the graphics board's memory as a cache, and have the MMU-like unit swap memory pages in from host memory using, I suppose, its own page-replacement algorithm. Friendly, Sven Luther |
From: Ian R. <id...@us...> - 2003-01-20 17:31:00
|
Sven Luther wrote: > On Thu, Jan 16, 2003 at 05:33:42PM -0800, Ian Romanick wrote: > >>1. Single-copy textures >> >>Right now each texture exists in two or three places. There is a copy >>in on-card or AGP memory, in system memory (managed by the driver), and >>in application memory. Any solution should be able to eliminate one or >>two of those copies. > > ... > > BTW, since you are looking into this, have you thought about graphics > chips that can do MMU-like tricks? I am not sure if the current set of > graphics chips the DRI runs on do this kind of stuff, but they well may > in the future. I know the gamma drm module uses the gamma's virtual > memory table to avoid doing virtual<->physical conversion. But more > importantly to you, although there is not yet a DRI driver for it, the > 3Dlabs Permedia3 can use virtual memory for its textures. That is, you > can basically set up the graphics board's memory as a cache, and have > the MMU-like unit swap memory pages in from host memory using, I > suppose, its own page-replacement algorithm. The only chips that I know of that support this technology are the various recent 3Dlabs chips. They have a number of patents on this technology, and, AFAIK, they have no intention of licensing it to anyone for all the tea in China. I agree that it is a good idea to keep virtual textures in mind, but, since we don't have any hardware documentation for it, it will be difficult to do more than that. |