Jesse Barnes wrote:
> On Tuesday, March 04, 2008 3:38 am Thomas Hellström wrote:
>> Eric Anholt wrote:
>>> On Fri, 2008-02-29 at 16:03 +0100, Thomas Hellström wrote:
>>>> I've pushed the intel-post-reloc branch with the following stuff:
>>>> 1) Full backwards compatibility.
>>>> 2) A new reloc type 1, which is applied _after_ all validations and with
>>>> slightly different format.
>>>> 3) If the buffer is idle, type 1 relocations are performed using the new
>>>> kmap_atomic_prot_pfn if it's available.
>>>> 4) If the buffer is busy, it's never mapped, and relocations are
>>>> performed using a single dword 2D blit, and we never have to idle the
>>>> buffer. This comes at a cost of an additional single MI_FLUSH after all
>>>> blit-relocations have been performed.
>>>> This could help avoid pre-validation relocation processing, race
>>>> conditions due to the relocatee not being on the unfenced list when
>>>> relocs are applied and unnecessary buffer idling.
>>> What are the performance results of using this? We've thought about
>>> doing this before, but cworth's experiments with it in the 2d driver
>>> were supposedly not too impressive. (but then, applying relocations to
>>> currently in-flight buffers is sufficiently rare I think that it
>>> probably doesn't matter)
>> I've tried this only with a patched version of the old i915tex driver
>> and the cost of applying relocations for typical 3D use seems to be very
>> small. Can't see any big performance or CPU-usage impact when I turn
>> off the PRESUMED_OFFSET hint. However, re-running the relocation
>> application 100 times per batch-buffer made gears framerate drop from
>> 1050 or so to around 800. System CPU went up from 14 to 40, so there is an
>> impact. (These were kmap-applied relocs, not blitted ones).
>> For blitted relocations, it's hard to tell, except that if they're
>> forcefully used for applications like gears, there's no visible negative
>> impact on framerate when switching off PRESUMED_OFFSET.
> IIRC Eric had the relocation costs down in the "negligible" range, but with
> the latest Mesa & DRM bits, applying relocations seems to be a big part of
> openarena profiles at least:
> samples  %        app name                 symbol name
> 27354    11.0340  libopenal.so.0.0.0       (no symbols)
> 26907    10.8537  ioquake3.x86_64          (no symbols)
> 25328    10.2167  i915                     i915_apply_reloc
> 10186    4.1088   i965_dri.so              search_cache
> 9411     3.7962   intel_drv.so             i830SetLVDSPanelPower
> 8920     3.5981   i915                     i915_flush_ttm
> 7538     3.0407   cgame.o_uaKVFT (deleted) (no symbols)
> 6286     2.5356   libc-2.7.so              memcpy
> 5398     2.1774   vmlinux                  read_hpet
> 4768     1.9233   vmlinux                  clear_page_c
> 4037     1.6284   i965_dri.so              _mesa_UpdateTexEnvProgram
> 3824     1.5425   libpthread-2.7.so        pthread_mutex_lock
> 3655     1.4743   vmlinux                  mwait_idle_with_hints
> 3015     1.2162   vmlinux                  acpi_os_read_port
> 2915     1.1758   i965_dri.so              dri_ttm_bo_process_reloc
> 2830     1.1416   drm                      drm_ht_find_key
> 2629     1.0605   vmlinux                  acpi_idle_enter_bm
> 2563     1.0339   opreport                 (no symbols)
> I'm using the below profiling script to set up oprofile (i830SetLVDSPanelPower
> is still in there because profiling started right near the end of openarena's
> modesetting, which called dpms off/on).
> opcontrol --reset
> openarena +exec anholt 2>&1 | egrep -e '[0-9]+ frames' &
> OPENARENA=$! # PID of the background pipeline, used by wait below
> sleep 10 # avoid openarena jit & mode setting
> opcontrol --start
> wait $OPENARENA
> opcontrol --dump
> opreport -t 1 -l
> opcontrol --stop
The post-reloc branch should not in any way alter the way relocations
are performed on the mesa master drivers, since they are still using
relocation type 0. Post-relocs only affect relocation type 1.
So the performance degradation is probably caused by something else.
Could you narrow it down with a git-bisect?
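
To classify each bisect step without eyeballing the whole profile, something like the following could be used. It is only a rough sketch: it assumes the opreport output keeps the column layout quoted above (samples, %, app name, symbol name), and the helper names symbol_share and share_ok, as well as the 5.0 threshold in the usage comment, are made up for illustration.

```shell
# Hypothetical helpers; names and the threshold below are illustrative only.

# Print the "%" column for symbol $1 from `opreport -l` text on stdin.
# Prints 0 if the symbol does not show up in the report.
symbol_share() {
    awk -v sym="$1" '
        $NF == sym { print $2; found = 1; exit }
        END        { if (!found) print "0" }
    '
}

# Return 0 ("good") if symbol $1 takes at most $2 percent of the samples
# in the report on stdin, and 1 ("bad") otherwise.
share_ok() {
    share=$(symbol_share "$1")
    awk -v s="$share" -v max="$2" 'BEGIN { exit !(s + 0 <= max + 0) }'
}

# Possible use after a profiling run at each bisect step:
#   if opreport -t 1 -l | share_ok i915_apply_reloc 5.0; then
#       git bisect good
#   else
#       git bisect bad
#   fi
```

The threshold would have to be picked from a known-good run; a revision that fails to build or run is better skipped with git bisect skip than marked either way.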