Jens Owen wrote:
> Jeff Hartmann wrote:
>> Keith Whitwell wrote:
>>> Benjamin Herrenschmidt wrote:
>>>>> HOWEVER, if you tied the GART mapping to the DRM lock, you might be ok.
>>>>> That gives you the required system exclusion, and if you make it an
>>>>> explicit "get my GART context" function that is only called under the
>>>>> DRM lock _and_ only called when you actually need the AGP access, you
>>>>> also avoid the unnecessary context switches.
>>>>> You might still have some performance issues simply because you would
>>>>> do extra work when switching aperture mappings, but hopefully the GART
>>>>> wouldn't be a common operation.
>>>>> The flexibility you would get _might_ be worth it.
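To make the "only under the DRM lock, only when AGP access is needed" idea concrete, here is a minimal sketch in plain C. It is a user-space model, not DRM code: `struct gart_state`, `take_lock`, `drop_lock` and `get_gart_context` are all hypothetical names, and the "remap" is just a counter standing in for the expensive GART rewrite.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one physical GART shared by several clients. */
struct gart_state {
    bool     lock_held;   /* stands in for "someone holds the DRM lock" */
    int      lock_owner;  /* client id that holds the lock */
    int      mapped_ctx;  /* client whose mapping is in the GART, -1 if none */
    unsigned switches;    /* how many (expensive) remaps were performed */
};

static void gart_init(struct gart_state *g)
{
    g->lock_held = false;
    g->lock_owner = -1;
    g->mapped_ctx = -1;
    g->switches = 0;
}

static void take_lock(struct gart_state *g, int client)
{
    assert(!g->lock_held);
    g->lock_held = true;
    g->lock_owner = client;
}

static void drop_lock(struct gart_state *g)
{
    assert(g->lock_held);
    g->lock_held = false;
}

/*
 * Must be called with the lock held, and only when the caller really
 * needs AGP access. Returns true if a remap was necessary.
 */
static bool get_gart_context(struct gart_state *g, int client)
{
    assert(g->lock_held && g->lock_owner == client);
    if (g->mapped_ctx == client)
        return false;          /* mapping already in place, nothing to do */
    /* A real driver would rewrite the GART entries here. */
    g->mapped_ctx = client;
    g->switches++;
    return true;
}
```

In this model a client that takes the lock without touching AGP never triggers a remap, so the mapping of the previous client survives and is reused for free the next time that client asks for it.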
>>>> Well, I would personally vote for the processes _not_ relying on having
>>>> the AGP aperture mapped directly, but instead, the various memory pages
>>>> making their AGP aperture. Several chipsets (Apple ones for sure, but it
>>>> seems others are hitting this too nowadays) don't support AGP aperture
>>>> accesses from the CPU.
>>> What are you actually saying, that pages mapped in agp can't be
>>> written by any means, or just that they can't be written through the
>>> agp address range?
>>> It sounds kind of broken to me in any case. How do mtrrs work in this
>> Actually we should go to this model eventually. However it needs me to
>> have time to finish the Page Attribute Table support I started on at
>> VA. This allows write combining to be set on a per page basis, and is
>> the direction we want to go even on x86.
>>>> That way, if you want several AGP contexts, you can have the processes
>>>> tapping their AGP buffers without lock, locking would only be required
>>>> once it's time to move one of these buffers in/out the physical GART
>>>> under the arbitration of the DRM.
>>> You don't need to lock to write to agp buffers in the current scheme.
>>> You also don't need to play with the gart table just to draw a
>>> 2-triangle strip. On some chipsets, particularly under smp,
>>> modifying the gart table is very slow. Ask Jeff about this.
>> This is also true, but I've done a lot of heavy thinking on this very
>> issue. The key is to manage the agp aperture and only swap out regions
>> when you absolutely have to. The big key to getting something like
>> this to work is a memory manager that every client uses, and is based on
>> some sort of sarea. It should be designed with a certain minimum block
>> size, and have a few different flags for what kind of usage that memory
>> block has. (I can go into more detail on design, but you probably have
>> a good idea what I mean here.) Then the next step is to create kernel
>> calls which can swap things to and from agp space and the card. On
>> cards that support it, another path (which prevents GART rewrites
>> entirely) is to add support to swap to normal cached memory.
>> This is what I envision making sense in the long run. A global
>> memory manager using an sarea (doesn't have to be the main one) and a
>> good aging mechanism get us most of the way there.
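A rough sketch of the kind of shared block table described above: a fixed minimum block size, a usage flag per block, and an age used to pick what to swap out. Everything here is an assumption made for illustration (the names, the 64 KB block size, the eight-entry table, the particular usage flags); a real version would live in an sarea shared by all clients and would call into the kernel to do the actual swap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE (64u * 1024u)  /* assumed minimum block size */
#define NUM_BLOCKS 8u             /* tiny table, for illustration only */

/* Hypothetical usage flags for a block. */
enum block_usage { BLOCK_FREE = 0, BLOCK_TEXTURE, BLOCK_BACKBUF, BLOCK_PINNED };

struct block {
    enum block_usage usage;
    int              owner;  /* client id */
    uint32_t         age;    /* value of the global clock at last use */
};

/* Would sit in a shared area visible to every client. */
struct mem_sarea {
    uint32_t     clock;
    struct block blocks[NUM_BLOCKS];
};

static void mm_init(struct mem_sarea *s)
{
    s->clock = 0;
    for (unsigned i = 0; i < NUM_BLOCKS; i++) {
        s->blocks[i].usage = BLOCK_FREE;
        s->blocks[i].owner = -1;
        s->blocks[i].age = 0;
    }
}

/* First free block if there is one, else the oldest non-pinned block. */
static int mm_pick_victim(const struct mem_sarea *s)
{
    int victim = -1;
    for (unsigned i = 0; i < NUM_BLOCKS; i++) {
        const struct block *b = &s->blocks[i];
        if (b->usage == BLOCK_FREE)
            return (int)i;
        if (b->usage == BLOCK_PINNED)
            continue;
        if (victim < 0 || b->age < s->blocks[victim].age)
            victim = (int)i;
    }
    return victim;  /* -1 when every block is pinned */
}

/* Mark a block as recently used so it is evicted last. */
static void mm_touch(struct mem_sarea *s, int i)
{
    assert((unsigned)i < NUM_BLOCKS);
    s->blocks[i].age = ++s->clock;
}

static int mm_alloc(struct mem_sarea *s, int owner, enum block_usage usage)
{
    assert(usage != BLOCK_FREE);
    int i = mm_pick_victim(s);
    if (i < 0)
        return -1;
    /* If the block was in use, a real manager would swap it out here. */
    s->blocks[i].usage = usage;
    s->blocks[i].owner = owner;
    mm_touch(s, i);
    return i;
}

/* Byte offset of a block inside the managed region. */
static size_t mm_offset(int i)
{
    return (size_t)i * BLOCK_SIZE;
}
```

The sketch ignores wraparound of the clock and any locking around the table; both would matter in a shared implementation.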
> It might be helpful to clarify the different uses we are discussing WRT
> AGP. In this thread so far, we've been jumping all over. Here's a
> shot at an AGP breakdown. Feel free to correct my misconceptions.
> 1) The original utilization of AGP under Linux is faster MMIO
> transactions than PCI. Some level of improvement happens here by simply
> accessing a device on an AGP bus, and no special AGP programming is
> 2) Simple MMIO transactions can be optimized by enabling fast writes.
> This case is identical to the MMIO transactions in the first case, but
> the bus and graphics chipset utilize hardware pipelining to increase
> throughput. There is a penalty for turning the bus around
> write/read/write/read because of the pipelining. There are also certain
> combinations of host chipsets and graphics chips where enabling fast
> writes can cause hangs.
> The remaining cases all utilize AGP bus mastering where the graphics
> chip can read and write directly from AGP memory.
> 3) Static AGP Allocation. This is the primary functionality that the
> agpgart module provides today. Physical memory is allocated by agpgart
> as needed and that memory is managed on behalf of the user space and DRM
> drivers at run time. There is a finite amount of this memory available
> dictated by the size of the AGP aperture (typically 64M). We have not
> fully exploited this case in user space, yet. The prototype for the AGP
> allocator and transfer mechanism of glDrawPixels in the Matrox G400
> driver is a good example of the potential here.
> 4) Dynamic AGP Binding. This functionality is spec'ed in the agpgart
> interface but is not fully implemented, yet. The intention is for user
> space processes to be able to bind normal virtual pages to the AGP
> aperture in a very dynamic fashion. Some of the discussions about
> binding and unbinding virtual memory make this option sound less
> appealing. Linus indicated it would probably be more efficient to just
> copy the virtual memory to an uncached AGP page from the static
> allocator case.
Fully implemented in agpgart, but not really used in any graphics driver
at this time.
> 5) AGP Swapping w/ Graphics HW access only. The hope here is that the
> graphics hardware could somehow utilize more memory than could fit in
> the aperture at any given time. This would be useful for efficiently
> swapping in and out large chunks of data only accessed by the graphics
> hardware. No need for VM access by the host, just a need to virtualize
> many instances of this type of data when swapping between graphics
> contexts (think private back buffers, extended texture store, etc). If
> a subset of the AGP aperture could be backed by a much larger number
> of uncached system memory pages, this would be a very useful mechanism
> indeed. We could even utilize the drmLock to protect access to these
> pages if that helped to minimize the TLB flush issues. I don't know if
> this can be done efficiently.
This sort of thing can be done, but it can get expensive. If I remember
correctly there really isn't too much expense on the cpu side for
binding / unbinding (I'm pretty sure we have the GATT/GART table mapped
uncached). However you have to ensure that the graphics card isn't
accessing the memory anymore. This requires waiting on an age, perhaps
an interrupt, or in the worst case serializing the graphics pipe. If
the GATT/GART isn't properly mapped to a non-cached mapping, you also
will have the cpu expense. You basically need to only swap portions of
the agp aperture, this makes all these schemes more manageable.
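The "wait on an age" check before unbinding can be sketched roughly as follows. This is only an illustration under stated assumptions: `struct agp_region`, `region_use`, `region_try_unbind` and `age_passed` are made-up names, the age is modelled as a 32-bit counter that the card writes back as it completes work, and the comparison relies on the usual two's-complement conversion so that it keeps working across counter wraparound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when hw_age is at or past wanted, even if the counter wrapped. */
static bool age_passed(uint32_t hw_age, uint32_t wanted)
{
    return (int32_t)(hw_age - wanted) >= 0;
}

/* Hypothetical bookkeeping for one swappable chunk of the aperture. */
struct agp_region {
    uint32_t last_use_age;  /* age of the last command that references it */
    bool     bound;         /* currently mapped in the GATT/GART */
};

/* Record that a newly submitted command references the region. */
static void region_use(struct agp_region *r, uint32_t *submit_age)
{
    assert(r->bound);
    r->last_use_age = ++*submit_age;
}

/*
 * Unbind only if the card has completed everything that touched the
 * region. On false the caller has to wait (poll the age, sleep on an
 * interrupt, or in the worst case drain the pipe) and try again.
 */
static bool region_try_unbind(struct agp_region *r, uint32_t hw_completed_age)
{
    assert(r->bound);
    if (!age_passed(hw_completed_age, r->last_use_age))
        return false;       /* the card may still be reading this memory */
    /* A real driver would clear the GATT/GART entries here. */
    r->bound = false;
    return true;
}
```

Keeping one age per region is also what makes partial swapping cheap: only the regions being evicted have to be checked, not the whole aperture.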
> 6) AGP Swapping w/ Host access to Swapped in pages. Same as case 5, but
> we would also like the host to be able to access the pages when they are
> swapped in. This case would make it possible to put AGP textures in the
> swapped space.
I don't understand the difference in this case; you can always write to
things bound to the agp aperture.
> 7) AGP Swapping w/ Host access to all pages. Same as case 5, but the
> host would also be able to access all AGP pages regardless of whether
> they are swapped in or not. This is a lot like case 4, Dynamic AGP
> Binding except the memory would be allocated from agpgart first. I
> don't know if accessing swapped out pages has any immediate value.
Actually this is probably more efficient. CPU TLBs are probably faster
than the host chipset TLBs, so writes to memory are probably faster
(needs testing, but a plausible effect.) If we do things this way, we
are also a lot more flexible and cross platform friendly. The agp spec
does not require a host chipset to provide an interface for cpu to agp
aperture writes. As time progresses I believe many vendors will go to
this model. (Intel has for the ia64, Apple's chipsets have this
limitation, and probably a few others.)