From: Jesse B. <jb...@vi...> - 2008-04-04 19:56:39
On Friday, April 04, 2008 11:14 am Thomas Hellström wrote:
> Dave Airlie wrote:
> > I'm just wondering if, rather than specifying all the CACHED, MAPPABLE,
> > and SHAREABLE flags, we make the BO interface in terms of CPU and GPU
> > operations.
> >
> > So we define:
> >
> >   CPU_READ  - CPU needs to read from this buffer
> >   CPU_WRITE - CPU needs to write to the buffer
> >   CPU_POOL  - CPU wants to use the buffer for suballocations
> >
> >   GPU_READ  - GPU reads
> >   GPU_WRITE - GPU writes
> >   (GPU_EXEC??) - batchbuffers? (maybe buffers that need relocs; not sure)
> >
> > We can then let the drivers internally decide what types of buffer to
> > use and not expose the flags mess to userspace.
> >
> > Dave.
>
> This might be a good idea for most situations. However, there are
> situations where the user-space drivers need to provide more info as to
> what the buffers are used for.
>
> Cache-coherent buffers are an excellent way to transfer data from GPU to
> CPU, but they are usually very slow to render from. How would you tell
> DRM that you want a cache-coherent buffer for download-from-screen type
> operations?

They also can't be used in many cases, right? That would mean something like a batchbuffer allocation would need CPU_READ|CPU_WRITE|GPU_READ|GPU_EXEC, which would have to be a WC mapping, but the driver wouldn't know just from those flags what type of mapping to create. So yes, I think we need some notion of usage, or at least a bit more granularity in the type passed down.

Maybe it's instructive to take a look at the way Linux does DMA mapping for drivers? The basic concepts are coherent buffers, one-time (streaming) buffers, and device<->CPU ownership transfer. In the graphics case, though, coherent mappings aren't *generally* possible (at least not yet), so we're reduced to doing non-coherent mappings and transferring ownership back and forth, or just keeping the mappings uncached on the CPU side to keep things consistent.
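To make the ambiguity concrete, here is a rough sketch of operation flags plus a usage hint that would disambiguate the batchbuffer case. All names and the threshold logic are hypothetical illustrations, not the actual drmBO API:

```c
#include <stdint.h>

/* Hypothetical operation flags following Dave's proposal
 * (illustrative names, not the real drmBO interface). */
#define BO_CPU_READ  (1u << 0)
#define BO_CPU_WRITE (1u << 1)
#define BO_CPU_POOL  (1u << 2)
#define BO_GPU_READ  (1u << 3)
#define BO_GPU_WRITE (1u << 4)
#define BO_GPU_EXEC  (1u << 5)

/* CPU mapping types a driver might choose between. */
enum bo_map_type { MAP_CACHED, MAP_WC, MAP_UNCACHED };

/* A hypothetical usage hint that resolves cases the flags alone
 * cannot distinguish. */
enum bo_usage { USAGE_DEFAULT, USAGE_BATCH, USAGE_READBACK };

static enum bo_map_type pick_mapping(uint32_t flags, enum bo_usage usage)
{
    if (usage == USAGE_READBACK)   /* download-from-screen: want cache-coherent */
        return MAP_CACHED;
    if (usage == USAGE_BATCH)      /* CPU fills commands, GPU executes: WC */
        return MAP_WC;
    /* Without a hint, CPU_READ|CPU_WRITE|GPU_READ|GPU_EXEC is ambiguous,
     * so fall back conservatively to an uncached mapping. */
    if ((flags & (BO_CPU_READ | BO_CPU_WRITE)) && (flags & BO_GPU_READ))
        return MAP_UNCACHED;
    return MAP_CACHED;
}
```

The point is that the same flag combination maps to different caching policies depending on intent, which is exactly why the flags alone are not enough.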
Even that's not expressive enough for what we want, though. For small objects, mapping into CPU space cached and then flushing out to the GPU may be much more expensive than just copying the data from a cacheable CPU buffer to a WC GTT page. But with large objects, taking an existing CPU mapping, switching it to uncached, and mapping its pages directly into the GTT is probably a big win (better yet, never map it into the CPU address space as cached at all, to avoid the flushing overhead).

> Please take a look at i915tex (mesa i915tex_branch),
> intel_buffer_objects.c: the function intel_bufferobj_select()
> translates the GL target + usage hints to a subset of the flags
> available. My opinion is that we need to be able to keep this
> functionality.

It looks like that code is #if 0'd, but I like the idea that the various usage types are broken down into whatever type of memory will work best, and it definitely clarifies my understanding of the flags a bit. Of course, some man pages for the libdrm drmBO* calls would be even better. :)

I think part of what we're running into here is platform specific. There's already a big divide between what might be necessary for pure UMA architectures vs. ones with lots of fast VRAM, and there are also the highly platform-specific cacheability concerns for integrated devices on Intel. I just wonder if a general-purpose memory manager is ever going to be "optimal" for a given platform... At SGI, at least, there tended to be a new memory manager for each new architecture, without much sharing that I'm aware of.

Anyway, hopefully we can get this sorted out soon so we can push it all upstream along with the kernel mode setting work that depends on it. I think everyone agrees that we want an API & architecture that's easy to understand for both users and developers; we must be getting close to that by now. :)

Thanks,
Jesse
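[Editor's note: the small-object vs. large-object upload tradeoff described above can be captured as a simple size-threshold policy. The helper name and the threshold value below are purely illustrative assumptions, not anything from the DRM code.]

```c
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical upload policy: small objects are cheaper to memcpy from a
 * cacheable CPU buffer into an existing WC GTT mapping; large objects are
 * cheaper to switch to uncached and map page-by-page into the GTT.  The
 * crossover point would need to be measured per platform; 64 KiB here is
 * an arbitrary illustrative value. */
#define UPLOAD_COPY_THRESHOLD ((size_t)(64 * 1024))

static bool should_copy_to_gtt(size_t obj_size)
{
    /* true  -> copy into a WC GTT page (small object path)
     * false -> remap the object's pages into the GTT (large object path) */
    return obj_size < UPLOAD_COPY_THRESHOLD;
}
```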