From: <th...@tu...> - 2007-10-09 12:32:52
Dave Airlie wrote:
>
>>>
>> Dave,
>> I like the idea of moving over to clflush, but I still think we
>> shouldn't use TT drmBOs for batch buffers and buffers with similar use.
>> (i915tex uses a sub-allocator for batch buffers, which avoids the cache
>> flushes completely during normal rendering).
>> I think something similar should be used from the X server.
>
> However you weren't doing in-kernel relocations.. ioremap/iounmap are
> both causing flushes... so we would need to keep ioremapped cached
> copies of the batchbuffer bo around.. which wastes vmalloc space.. or
> use kmap and flush once per buffer.. my previous attempts to use kmap
> without flushing failed badly..
>
> Dave.

Ah, OK. I see what you mean.

So I know the Intel docs state that you must not access GART-mapped
pages in CMA mode, but they don't say why. Are you using this mode for
the kmap / flush mode?

Since Poulsbo is CMA, to avoid the SMP ipi issue, it should be possible
to enclose the whole reloc fixup within a spinlock and use kmap_atomic,
which should be faster than kmap. Since preemption is also disabled
within a spinlock, we can guarantee that a batchbuffer write followed by
a clflush executes on the same processor => no need for ipi, and the
clflush can follow immediately after a write. We've used this technique
in psb_mmu.c, although we're using preempt_disable() / preempt_enable()
to collect per-processor clflushes.

So, basically something like the following should be a fast ipi-free
way to do this:

spin_lock();
while(more_relocs_to_do) {
    kmap_atomic(dst_buffer); // Reuse old map if same page
    apply_reloc();
    clflush(newly_written_address);
    kunmap_atomic(dst_buffer);
}
spin_unlock();

/Thomas
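
The "reuse old map if same page" part of the loop above can be sketched in plain user-space C. This is only an illustration of the control flow, not kernel code: map_page(), unmap_page() and flush_line() are hypothetical stand-ins for kmap_atomic(), kunmap_atomic() and clflush(), the struct names and the buffer layout are assumptions, and the spinlock is indicated only by comments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u
#define NUM_PAGES 4u

/* Hypothetical relocation record: byte offset into the buffer and the
 * 32-bit value to patch in at that offset. */
struct reloc {
    size_t offset;
    uint32_t value;
};

/* Counters so the mapping and flushing behaviour can be observed. */
struct stats {
    unsigned maps;
    unsigned unmaps;
    unsigned flushes;
};

/* Stand-in for the pages of the destination buffer. */
static uint8_t buffer[NUM_PAGES][PAGE_SIZE];

/* Stand-in for kmap_atomic(): returns an address for the given page. */
static uint8_t *map_page(struct stats *st, size_t page)
{
    st->maps++;
    return buffer[page];
}

/* Stand-in for kunmap_atomic(). */
static void unmap_page(struct stats *st)
{
    st->unmaps++;
}

/* Stand-in for clflush() on the address that was just written. */
static void flush_line(struct stats *st, const void *addr)
{
    (void)addr;
    st->flushes++;
}

/* Apply all relocations, remapping only when the target page changes
 * and flushing immediately after every write. */
static void apply_relocs(const struct reloc *r, size_t n, struct stats *st)
{
    uint8_t *vaddr = NULL;
    size_t cur_page = (size_t)-1;

    /* In the kernel, spin_lock() would be taken here. */
    for (size_t i = 0; i < n; i++) {
        size_t page = r[i].offset / PAGE_SIZE;
        size_t off = r[i].offset % PAGE_SIZE;

        assert(page < NUM_PAGES);
        assert(off + sizeof(uint32_t) <= PAGE_SIZE);

        if (page != cur_page) {
            if (vaddr != NULL)
                unmap_page(st);          /* drop the previous mapping */
            vaddr = map_page(st, page);  /* map the new target page */
            cur_page = page;
        }

        memcpy(vaddr + off, &r[i].value, sizeof(uint32_t));
        flush_line(st, vaddr + off);     /* flush right after the write */
    }
    if (vaddr != NULL)
        unmap_page(st);
    /* In the kernel, spin_unlock() would be called here. */
}
```

With relocations sorted by page, each page is mapped once; a relocation list that jumps back to an earlier page simply pays for one extra map/unmap pair.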