From: Jesse B. <jb...@vi...> - 2008-04-04 19:56:39
On Friday, April 04, 2008 11:14 am Thomas Hellström wrote:
> Dave Airlie wrote:
> > I'm just wondering if, rather than specifying all the CACHED, MAPPABLE,
> > and SHAREABLE flags, we make the BO interface in terms of CPU and GPU
> > operations.
> >
> > So we define:
> >
> >   CPU_READ  - CPU needs to read from this buffer
> >   CPU_WRITE - CPU needs to write to the buffer
> >   CPU_POOL  - CPU wants to use the buffer for suballocations
> >
> >   GPU_READ  - GPU reads
> >   GPU_WRITE - GPU writes
> >   (GPU_EXEC??) - batchbuffers? (maybe buffers that need relocs; not sure)
> >
> > We can then let the drivers internally decide what types of buffer to
> > use and not expose the flags mess to userspace.
> >
> > Dave.
>
> This might be a good idea for most situations. However, there are
> situations where the user-space drivers need to provide more info as to
> what the buffers are used for.
>
> Cache-coherent buffers are an excellent way to transfer data from GPU to
> CPU, but they are usually very slow to render from. How would you tell
> DRM that you want a cache-coherent buffer for download-from-screen type
> operations?

They also can't be used in many cases, right? That would mean something like a batchbuffer allocation would need CPU_READ|CPU_WRITE|GPU_READ|GPU_EXEC, which would have to be a WC mapping, but the driver wouldn't know just from those flags what type of mapping to create. So yes, I think we need some notion of usage, or at least a bit more granularity in the type passed down.

Maybe it's instructive to take a look at the way Linux does DMA mapping for drivers? The basic concepts are coherent buffers, one-time (streaming) buffers, and device<->CPU ownership transfer. In the graphics case, though, coherent mappings aren't *generally* possible (at least not yet), so we're reduced to doing non-coherent mappings and transferring ownership back and forth, or just keeping the mappings uncached on the CPU side to keep things consistent.
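To make the ambiguity concrete, here is a rough sketch of operation flags plus a usage hint that would disambiguate the batchbuffer case. All names and the threshold logic are hypothetical illustrations, not the actual drmBO API:

```c
#include <stdint.h>

/* Hypothetical operation flags following Dave's proposal
 * (illustrative names, not the real drmBO interface). */
#define BO_CPU_READ  (1u << 0)
#define BO_CPU_WRITE (1u << 1)
#define BO_CPU_POOL  (1u << 2)
#define BO_GPU_READ  (1u << 3)
#define BO_GPU_WRITE (1u << 4)
#define BO_GPU_EXEC  (1u << 5)

/* CPU mapping types a driver might choose between. */
enum bo_map_type { MAP_CACHED, MAP_WC, MAP_UNCACHED };

/* A hypothetical usage hint that resolves cases the flags alone
 * cannot distinguish. */
enum bo_usage { USAGE_DEFAULT, USAGE_BATCH, USAGE_READBACK };

static enum bo_map_type pick_mapping(uint32_t flags, enum bo_usage usage)
{
    if (usage == USAGE_READBACK)   /* download-from-screen: want cache-coherent */
        return MAP_CACHED;
    if (usage == USAGE_BATCH)      /* CPU fills commands, GPU executes: WC */
        return MAP_WC;
    /* Without a hint, CPU_READ|CPU_WRITE|GPU_READ|GPU_EXEC is ambiguous,
     * so fall back conservatively to an uncached mapping. */
    if ((flags & (BO_CPU_READ | BO_CPU_WRITE)) && (flags & BO_GPU_READ))
        return MAP_UNCACHED;
    return MAP_CACHED;
}
```

The point is that the same flag combination maps to different caching policies depending on intent, which is exactly why the flags alone are not enough.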
Even that's not expressive enough for what we want, though. For small objects, mapping into CPU space cached and then flushing out to the GPU may be much more expensive than just copying the data from a cacheable CPU buffer to a WC GTT page. But with large objects, taking an existing CPU mapping, switching it to uncached, and mapping its pages directly into the GTT is probably a big win (better yet, never map it into the CPU address space as cached at all, to avoid the flushing overhead).

> Please take a look at i915tex (mesa i915tex_branch),
> intel_buffer_objects.c: the function intel_bufferobj_select()
> translates the GL target + usage hints to a subset of the flags
> available. My opinion is that we need to be able to keep this
> functionality.

It looks like that code is #if 0'd, but I like the idea that the various usage types are broken down into whatever type of memory will work best, and it definitely clarifies my understanding of the flags a bit. Of course, some man pages for the libdrm drmBO* calls would be even better. :)

I think part of what we're running into here is platform specific. There's already a big divide between what might be necessary for pure UMA architectures vs. ones with lots of fast VRAM, and there are also the highly platform-specific cacheability concerns for integrated devices on Intel. I just wonder if a general-purpose memory manager is ever going to be "optimal" for a given platform... At SGI, at least, there tended to be a new memory manager for each new architecture, without much sharing that I'm aware of.

Anyway, hopefully we can get this sorted out soon so we can push it all upstream along with the kernel mode setting work that depends on it. I think everyone agrees that we want an API & architecture that's easy to understand for both users and developers; we must be getting close to that by now. :)

Thanks,
Jesse
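[Editor's note: the small-object vs. large-object upload tradeoff described above can be captured as a simple size-threshold policy. The helper name and the threshold value below are purely illustrative assumptions, not anything from the DRM code.]

```c
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical upload policy: small objects are cheaper to memcpy from a
 * cacheable CPU buffer into an existing WC GTT mapping; large objects are
 * cheaper to switch to uncached and map page-by-page into the GTT.  The
 * crossover point would need to be measured per platform; 64 KiB here is
 * an arbitrary illustrative value. */
#define UPLOAD_COPY_THRESHOLD ((size_t)(64 * 1024))

static bool should_copy_to_gtt(size_t obj_size)
{
    /* true  -> copy into a WC GTT page (small object path)
     * false -> remap the object's pages into the GTT (large object path) */
    return obj_size < UPLOAD_COPY_THRESHOLD;
}
```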