Re: [rfc] cache flush avoidance..

Dave Airlie wrote:
>
>>>
>> Dave,
>> I like the idea of moving over to clflush, but I still think we
>> shouldn't use TT drmBOs for batch buffers and buffers with similar use.
>> (i915tex uses a sub-allocator for batch buffers, which avoids the cache
>> flushes completely during normal rendering).
>> I think something similar should be used from the X server.
>
> However you weren't doing in-kernel relocations.. ioremap/iounmap are 
> both causing flushes... so we would need to keep ioremapped cached 
> copies of the batchbuffer bo around.. which wastes vmalloc space.. or 
> use kmap and flush once per buffer.. my previous attempts to use kmap 
> without flushing failed badly..
>
> Dave.
Ah, OK. I see what you mean.

So I know the Intel docs state that you must not access GART-mapped 
pages in CMA mode, but they don't say why.
Are you using this mode for the kmap / flush approach?

Since Poulsbo is CMA, it should be possible to avoid the SMP ipi issue 
by enclosing the whole reloc fixup within a spinlock and using 
kmap_atomic, which should be faster than kmap.
Since preemption is also disabled within a spinlock, we can guarantee 
that a batchbuffer write followed by a clflush executes on the same 
processor => no need for ipi, and the clflush can follow immediately 
after the write.
We've used this technique in psb_mmu.c, although there we're using 
preempt_disable() / preempt_enable() to collect per-processor clflushes.

So, basically something like the following should be a fast ipi-free way 
to do this:

spin_lock();
while(more_relocs_to_do) {
  kmap_atomic(dst_buffer); // Reuse old map if same page
  apply_reloc();
  clflush(newly_written_address);
  kunmap_atomic(dst_buffer);
}
spin_unlock();
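
To make the "reuse old map if same page" part a bit more concrete, here is a rough userspace sketch of the loop. It is not kernel code: the sim_* names, the reloc struct, and the page and cache-line sizes are all made up for illustration. The stubs only count what would be kmap_atomic(), kunmap_atomic() and clflush calls, and the spinlock is left as a comment. It also assumes each reloc is a 32-bit write that does not straddle a page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_PAGE_SIZE 4096u /* assumed page size */

/* Hypothetical reloc record: where to patch and what to write. */
struct sim_reloc {
    size_t   offset; /* byte offset into the batch buffer */
    uint32_t value;  /* value to store at that offset */
};

/* Counters standing in for the real kernel calls. */
struct sim_stats {
    unsigned maps;    /* would be kmap_atomic() */
    unsigned unmaps;  /* would be kunmap_atomic() */
    unsigned flushes; /* would be clflush */
};

/* Stub: in userspace "mapping" a page is just pointer arithmetic. */
static uint8_t *sim_map_page(uint8_t *buf, size_t page, struct sim_stats *st)
{
    st->maps++;
    return buf + page * SIM_PAGE_SIZE;
}

static void sim_unmap_page(struct sim_stats *st)
{
    st->unmaps++;
}

/* Stub: the real thing would clflush the cache line holding addr. */
static void sim_flush_line(const void *addr, struct sim_stats *st)
{
    (void)addr;
    st->flushes++;
}

static void sim_apply_relocs(uint8_t *buf, size_t buf_size,
                             const struct sim_reloc *relocs, size_t n,
                             struct sim_stats *st)
{
    size_t cur_page = (size_t)-1; /* no page mapped yet */
    uint8_t *vaddr = NULL;

    /* spin_lock() would go here in the kernel version. */
    for (size_t i = 0; i < n; i++) {
        size_t page = relocs[i].offset / SIM_PAGE_SIZE;
        size_t in_page = relocs[i].offset % SIM_PAGE_SIZE;

        assert(relocs[i].offset + sizeof(uint32_t) <= buf_size);
        assert(in_page + sizeof(uint32_t) <= SIM_PAGE_SIZE);

        if (page != cur_page) {
            /* Different page: drop the old mapping, take a new one. */
            if (vaddr)
                sim_unmap_page(st);
            vaddr = sim_map_page(buf, page, st);
            cur_page = page;
        }

        /* Apply the reloc, then flush right behind the write. */
        memcpy(vaddr + in_page, &relocs[i].value, sizeof(uint32_t));
        sim_flush_line(vaddr + in_page, st);
    }
    if (vaddr)
        sim_unmap_page(st);
    /* spin_unlock() would go here. */
}
```

With relocs grouped by destination page, the number of map/unmap pairs is the number of distinct pages touched rather than the number of relocs, while there is still one flush per write.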

/Thomas