From: Thomas H. <th...@tu...> - 2008-02-29 15:03:19
|
Hi.

I've pushed the intel-post-reloc branch with the following stuff:

1) Full backwards compatibility.
2) A new reloc type 1, which is applied _after_ all validations and has a slightly different format.
3) If the buffer is idle, type 1 relocations are performed using the new kmap_atomic_prot_pfn if it's available.
4) If the buffer is busy, it's never mapped, and relocations are performed using a single-dword 2D blit, so we never have to idle the buffer. This comes at the cost of a single additional MI_FLUSH after all blit relocations have been performed.

This could help avoid pre-validation relocation processing, race conditions due to the relocatee not being on the unfenced list when relocs are applied, and unnecessary buffer idling.

/Thomas
|
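The choice between the two type 1 paths can be modelled in a few lines. This is only a user-space Python sketch of the policy described in points 3 and 4, not the driver code: `Buffer`, `Ring`, `apply_post_relocs` and the command tuples are hypothetical names made up for illustration, and the kmap and blit mechanics are reduced to a dictionary write and a recorded command.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Buffer:
    """Hypothetical stand-in for a buffer object holding 32-bit dwords."""
    dwords: Dict[int, int] = field(default_factory=dict)
    busy: bool = False  # True while the GPU may still be using the buffer


@dataclass
class Ring:
    """Hypothetical command queue; it only records what would be emitted."""
    commands: List[Tuple] = field(default_factory=list)


def apply_post_relocs(buf: Buffer, relocs: List[Tuple[int, int]], ring: Ring) -> str:
    """Apply (dword_index, value) relocations after validation.

    Idle buffer: write the values through a CPU mapping (the kmap path).
    Busy buffer: queue one single-dword blit per relocation and a single
    flush after the last blit, so the buffer is never mapped or idled.
    """
    if not relocs:
        return "none"
    if not buf.busy:
        for index, value in relocs:
            buf.dwords[index] = value & 0xFFFFFFFF
        return "cpu"
    for index, value in relocs:
        ring.commands.append(("blit_dword", index, value & 0xFFFFFFFF))
    ring.commands.append(("flush",))  # one flush for the whole set of blits
    return "blit"
```

The point of the model is the cost shape: the busy path never touches the buffer from the CPU, and the flush is paid once per set of relocations rather than once per relocation.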
From: seventh g. <sev...@gm...> - 2008-02-29 15:14:30
|
On Fri, Feb 29, 2008 at 3:03 PM, Thomas Hellström <th...@tu...> wrote:
> Hi.
>
> I've pushed the intel-post-reloc branch with the following stuff:
>
> 1) Full backwards compatibility.
> 2) A new reloc type 1, which is applied _after_ all validations and with
> slightly different format.
> 3) If the buffer is idle, type 1 relocations are performed using the new
> kmap_atomic_prot_pfn if it's available.
> 4) If the buffer is busy, it's never mapped, and relocations are
> performed using a single dword 2D blit, and we never have to idle the
> buffer. This comes at a cost of an additional single MI_FLUSH after all
> blit-relocations have been performed.
>
> This could help avoid pre-validation relocation processing, race
> conditions due to the relocatee not being on the unfenced list when
> relocs are applied and unnecessary buffer idling.

I can confirm that x86-64 compilation is fixed. Thanks!

Cheers,
Renato Caldas
|
From: seventh g. <sev...@gm...> - 2008-02-29 15:15:52
|
On Fri, Feb 29, 2008 at 3:14 PM, seventh guardian <sev...@gm...> wrote:
> On Fri, Feb 29, 2008 at 3:03 PM, Thomas Hellström
> <th...@tu...> wrote:
> > Hi.
> >
> > I've pushed the intel-post-reloc branch with the following stuff:
> >
> > 1) Full backwards compatibility.
> > 2) A new reloc type 1, which is applied _after_ all validations and with
> > slightly different format.
> > 3) If the buffer is idle, type 1 relocations are performed using the new
> > kmap_atomic_prot_pfn if it's available.
> > 4) If the buffer is busy, it's never mapped, and relocations are
> > performed using a single dword 2D blit, and we never have to idle the
> > buffer. This comes at a cost of an additional single MI_FLUSH after all
> > blit-relocations have been performed.
> >
> > This could help avoid pre-validation relocation processing, race
> > conditions due to the relocatee not being on the unfenced list when
> > relocs are applied and unnecessary buffer idling.
>
> I can confirm that x86-64 compilation is fixed. Thanks!

Sorry, wrong topic.

> Cheers,
> Renato Caldas
>
|
From: Eric A. <er...@an...> - 2008-03-04 04:36:23
|
On Fri, 2008-02-29 at 16:03 +0100, Thomas Hellström wrote:
> Hi.
>
> I've pushed the intel-post-reloc branch with the following stuff:
>
> 1) Full backwards compatibility.
> 2) A new reloc type 1, which is applied _after_ all validations and with
> slightly different format.
> 3) If the buffer is idle, type 1 relocations are performed using the new
> kmap_atomic_prot_pfn if it's available.
> 4) If the buffer is busy, it's never mapped, and relocations are
> performed using a single dword 2D blit, and we never have to idle the
> buffer. This comes at a cost of an additional single MI_FLUSH after all
> blit-relocations have been performed.
>
> This could help avoid pre-validation relocation processing, race
> conditions due to the relocatee not being on the unfenced list when
> relocs are applied and unnecessary buffer idling.

What are the performance results of using this? We've thought about doing this before, but cworth's experiments with it in the 2d driver were supposedly not too impressive. (But then, applying relocations to currently in-flight buffers is sufficiently rare, I think, that it probably doesn't matter.)

--
Eric Anholt anholt@FreeBSD.org
er...@an... eri...@in...
|
From: Thomas H. <th...@tu...> - 2008-03-04 10:38:42
|
Eric Anholt wrote:
> On Fri, 2008-02-29 at 16:03 +0100, Thomas Hellström wrote:
>
>> Hi.
>>
>> I've pushed the intel-post-reloc branch with the following stuff:
>>
>> 1) Full backwards compatibility.
>> 2) A new reloc type 1, which is applied _after_ all validations and with
>> slightly different format.
>> 3) If the buffer is idle, type 1 relocations are performed using the new
>> kmap_atomic_prot_pfn if it's available.
>> 4) If the buffer is busy, it's never mapped, and relocations are
>> performed using a single dword 2D blit, and we never have to idle the
>> buffer. This comes at a cost of an additional single MI_FLUSH after all
>> blit-relocations have been performed.
>>
>> This could help avoid pre-validation relocation processing, race
>> conditions due to the relocatee not being on the unfenced list when
>> relocs are applied and unnecessary buffer idling.
>>
>
> What are the performance results of using this? We've thought about
> doing this before, but cworth's experiments with it in the 2d driver
> were supposedly not too impressive. (but then, applying relocations to
> currently in-flight buffers is sufficiently rare I think that it
> probably doesn't matter)
>

I've tried this only with a patched version of the old i915tex driver, and the cost of applying relocations for typical 3D use seems to be very small. I can't see any big performance or CPU usage impact when I turn off the PRESUMED_OFFSET hint. However, re-running the relocation application 100 times per batch buffer made the gears framerate drop from 1050 or so to around 800, and system CPU went up from 14 to 40, so there is an impact. (These were kmap-applied relocs, not blitted ones.)

For blitted relocations, it's hard to tell, except that if they're forcefully used for applications like gears, there's no visible negative impact on framerate when switching off PRESUMED_OFFSET.

/Thomas
|
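The gears numbers allow a rough per-pass estimate. The sketch below is back-of-the-envelope arithmetic only, under assumptions the message does not state: that the whole drop from 1050 to 800 frames per second comes from the 99 extra relocation passes, and that frame time is the right unit (the number of batch buffers per frame is unknown). `per_pass_cost_us` is a hypothetical helper name.

```python
def per_pass_cost_us(fps_base: float, fps_repeated: float, passes: int) -> float:
    """Extra frame time attributable to one relocation pass, in microseconds.

    Assumes the framerate change is caused only by the (passes - 1)
    additional times the relocations were applied.
    """
    extra_frame_time = 1.0 / fps_repeated - 1.0 / fps_base  # seconds per frame
    return extra_frame_time / (passes - 1) * 1e6


cost = per_pass_cost_us(1050.0, 800.0, 100)
baseline_frame_us = 1e6 / 1050.0
share = cost / baseline_frame_us
print(f"{cost:.2f} us per pass, {share:.2%} of the baseline frame time")
```

Under those assumptions a single pass costs roughly 3 microseconds per frame, a fraction of a percent of the baseline frame time, which fits the observation that normal use shows no visible impact.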
From: Jesse B. <jb...@vi...> - 2008-03-19 22:03:25
|
On Tuesday, March 04, 2008 3:38 am Thomas Hellström wrote:
> Eric Anholt wrote:
> > On Fri, 2008-02-29 at 16:03 +0100, Thomas Hellström wrote:
> >> Hi.
> >>
> >> I've pushed the intel-post-reloc branch with the following stuff:
> >>
> >> 1) Full backwards compatibility.
> >> 2) A new reloc type 1, which is applied _after_ all validations and with
> >> slightly different format.
> >> 3) If the buffer is idle, type 1 relocations are performed using the new
> >> kmap_atomic_prot_pfn if it's available.
> >> 4) If the buffer is busy, it's never mapped, and relocations are
> >> performed using a single dword 2D blit, and we never have to idle the
> >> buffer. This comes at a cost of an additional single MI_FLUSH after all
> >> blit-relocations have been performed.
> >>
> >> This could help avoid pre-validation relocation processing, race
> >> conditions due to the relocatee not being on the unfenced list when
> >> relocs are applied and unnecessary buffer idling.
> >
> > What are the performance results of using this? We've thought about
> > doing this before, but cworth's experiments with it in the 2d driver
> > were supposedly not too impressive. (but then, applying relocations to
> > currently in-flight buffers is sufficiently rare I think that it
> > probably doesn't matter)
>
> I've tried this only with a patched version of the old i915tex driver
> and the cost of applying relocations for typical 3D use seems to be very
> small. Can't see any big performance- or CPU usage impact when I turn
> off the PRESUMED_OFFSET hint. However, re-running the relocation
> application 100 times per batch-buffer made gears framerate drop from
> 1050 or so to around 800. System cpu up from 14 to 40, so there is an
> impact. (These were kmap-applied relocs, not blitted ones).
> For blitted relocations, it's hard to tell, except that if they're
> forcefully used for applications like gears, there's no visible negative
> impact on framerate when switching off PRESUMED_OFFSET.
IIRC Eric had the relocation costs down in the "negligible" range, but with the latest Mesa & DRM bits, applying relocations seems to be a big part of openarena profiles at least:

samples  %        app name                  symbol name
27354    11.0340  libopenal.so.0.0.0        (no symbols)
26907    10.8537  ioquake3.x86_64           (no symbols)
25328    10.2167  i915                      i915_apply_reloc
10186    4.1088   i965_dri.so               search_cache
9411     3.7962   intel_drv.so              i830SetLVDSPanelPower
8920     3.5981   i915                      i915_flush_ttm
7538     3.0407   cgame.o_uaKVFT (deleted)  (no symbols)
6286     2.5356   libc-2.7.so               memcpy
5398     2.1774   vmlinux                   read_hpet
4768     1.9233   vmlinux                   clear_page_c
4037     1.6284   i965_dri.so               _mesa_UpdateTexEnvProgram
3824     1.5425   libpthread-2.7.so         pthread_mutex_lock
3655     1.4743   vmlinux                   mwait_idle_with_hints
3015     1.2162   vmlinux                   acpi_os_read_port
2915     1.1758   i965_dri.so               dri_ttm_bo_process_reloc
2830     1.1416   drm                       drm_ht_find_key
2629     1.0605   vmlinux                   acpi_idle_enter_bm
2563     1.0339   opreport                  (no symbols)

I'm using the below profiling script to set up oprofile (i830SetLVDSPanelPower is still in there because profiling started right near the end of openarena's modesetting, which called dpms off/on).

Thanks,
Jesse

opcontrol --reset
openarena +exec anholt 2>&1 | egrep -e '[0-9]+ frames' &
OPENARENA=$!
sleep 10 # avoid openarena jit & mode setting
opcontrol --start
wait $OPENARENA
opcontrol --dump
opreport -t 1 -l
opcontrol --stop
|
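To put a number on "a big part", the relocation rows of the listing can be summed. This is a small sketch over figures copied from the report; treating `i915_apply_reloc` and `dri_ttm_bo_process_reloc` as the relocation work is an assumption based on the symbol names alone, and `reloc_share` is a hypothetical helper. The listing only shows symbols at or above the 1% threshold, so the rows do not add up to 100%.

```python
# (samples, percent, image, symbol) rows copied from the opreport listing.
ROWS = [
    (25328, 10.2167, "i915", "i915_apply_reloc"),
    (8920, 3.5981, "i915", "i915_flush_ttm"),
    (2915, 1.1758, "i965_dri.so", "dri_ttm_bo_process_reloc"),
    (2830, 1.1416, "drm", "drm_ht_find_key"),
]

# Assumption: these two symbols are the relocation work, judged by name only.
RELOC_SYMBOLS = {"i915_apply_reloc", "dri_ttm_bo_process_reloc"}


def reloc_share(rows, symbols):
    """Sum samples and percentages over the rows whose symbol is in `symbols`."""
    picked = [row for row in rows if row[3] in symbols]
    return sum(row[0] for row in picked), sum(row[1] for row in picked)


samples, percent = reloc_share(ROWS, RELOC_SYMBOLS)
# Estimate the total sample count from one row: samples / (percent / 100).
total_samples = round(25328 / (10.2167 / 100))
print(f"relocation symbols: {samples} samples, {percent:.4f}% of about {total_samples}")
```

On this reading the two relocation symbols account for a little over 11% of all samples, with the kernel side responsible for most of it.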
From: Thomas H. <th...@tu...> - 2008-03-19 22:15:10
|
Jesse Barnes wrote:
> On Tuesday, March 04, 2008 3:38 am Thomas Hellström wrote:
>
>> Eric Anholt wrote:
>>
>>> On Fri, 2008-02-29 at 16:03 +0100, Thomas Hellström wrote:
>>>
>>>> Hi.
>>>>
>>>> I've pushed the intel-post-reloc branch with the following stuff:
>>>>
>>>> 1) Full backwards compatibility.
>>>> 2) A new reloc type 1, which is applied _after_ all validations and with
>>>> slightly different format.
>>>> 3) If the buffer is idle, type 1 relocations are performed using the new
>>>> kmap_atomic_prot_pfn if it's available.
>>>> 4) If the buffer is busy, it's never mapped, and relocations are
>>>> performed using a single dword 2D blit, and we never have to idle the
>>>> buffer. This comes at a cost of an additional single MI_FLUSH after all
>>>> blit-relocations have been performed.
>>>>
>>>> This could help avoid pre-validation relocation processing, race
>>>> conditions due to the relocatee not being on the unfenced list when
>>>> relocs are applied and unnecessary buffer idling.
>>>>
>>> What are the performance results of using this? We've thought about
>>> doing this before, but cworth's experiments with it in the 2d driver
>>> were supposedly not too impressive. (but then, applying relocations to
>>> currently in-flight buffers is sufficiently rare I think that it
>>> probably doesn't matter)
>>>
>> I've tried this only with a patched version of the old i915tex driver
>> and the cost of applying relocations for typical 3D use seems to be very
>> small. Can't see any big performance- or CPU usage impact when I turn
>> off the PRESUMED_OFFSET hint. However, re-running the relocation
>> application 100 times per batch-buffer made gears framerate drop from
>> 1050 or so to around 800. System cpu up from 14 to 40, so there is an
>> impact. (These were kmap-applied relocs, not blitted ones).
>> For blitted relocations, it's hard to tell, except that if they're
>> forcefully used for applications like gears, there's no visible negative
>> impact on framerate when switching off PRESUMED_OFFSET.
>>
>
> IIRC Eric had the relocation costs down in the "negligible" range, but with
> the latest Mesa & DRM bits, applying relocations seems to be a big part of
> openarena profiles at least:
>
> samples  %        app name                  symbol name
> 27354    11.0340  libopenal.so.0.0.0        (no symbols)
> 26907    10.8537  ioquake3.x86_64           (no symbols)
> 25328    10.2167  i915                      i915_apply_reloc
> 10186    4.1088   i965_dri.so               search_cache
> 9411     3.7962   intel_drv.so              i830SetLVDSPanelPower
> 8920     3.5981   i915                      i915_flush_ttm
> 7538     3.0407   cgame.o_uaKVFT (deleted)  (no symbols)
> 6286     2.5356   libc-2.7.so               memcpy
> 5398     2.1774   vmlinux                   read_hpet
> 4768     1.9233   vmlinux                   clear_page_c
> 4037     1.6284   i965_dri.so               _mesa_UpdateTexEnvProgram
> 3824     1.5425   libpthread-2.7.so         pthread_mutex_lock
> 3655     1.4743   vmlinux                   mwait_idle_with_hints
> 3015     1.2162   vmlinux                   acpi_os_read_port
> 2915     1.1758   i965_dri.so               dri_ttm_bo_process_reloc
> 2830     1.1416   drm                       drm_ht_find_key
> 2629     1.0605   vmlinux                   acpi_idle_enter_bm
> 2563     1.0339   opreport                  (no symbols)
>
> I'm using the below profiling script to set up oprofile (i830SetLVDSPanelPower
> is still in there because profiling started right near the end of openarena's
> modesetting, which called dpms off/on).
>
> Thanks,
> Jesse
>
> opcontrol --reset
> openarena +exec anholt 2>&1 | egrep -e '[0-9]+ frames' &
> OPENARENA=$!
> sleep 10 # avoid openarena jit & mode setting
> opcontrol --start
> wait $OPENARENA
> opcontrol --dump
> opreport -t 1 -l
> opcontrol --stop
>

Jesse,

The post-reloc branch should not in any way alter the way relocations are performed on the mesa master drivers, since they are still using relocation type 0. Post-relocs only affect relocation type 1.

So the performance degradation is probably caused by something else. Could you narrow it down with a git-bisect?

/Thomas
|
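For readers who have not used it, git-bisect narrows a regression down by binary search between a known-good and a known-bad commit. The sketch below models only that search over a linear list of commits; `first_bad` and the commit names are hypothetical, and real histories with merges are handled by git itself in a more involved way.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit in an ordered, linear history.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]
```

For a performance problem, `is_bad` would be a build-and-measure step with a chosen threshold, for example a framerate below some value, so each probe is expensive and the logarithmic number of probes is what makes the search practical.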
From: Jesse B. <jb...@vi...> - 2008-03-19 22:44:31
|
On Wednesday, March 19, 2008 3:14 pm Thomas Hellström wrote:
> > IIRC Eric had the relocation costs down in the "negligible" range, but
> > with the latest Mesa & DRM bits, applying relocations seems to be a big
> > part of openarena profiles at least:
> >
> > samples  %        app name                  symbol name
> > 27354    11.0340  libopenal.so.0.0.0        (no symbols)
> > 26907    10.8537  ioquake3.x86_64           (no symbols)
> > 25328    10.2167  i915                      i915_apply_reloc
> > 10186    4.1088   i965_dri.so               search_cache
> > 9411     3.7962   intel_drv.so              i830SetLVDSPanelPower
> > 8920     3.5981   i915                      i915_flush_ttm
> > 7538     3.0407   cgame.o_uaKVFT (deleted)  (no symbols)
> > 6286     2.5356   libc-2.7.so               memcpy
> > 5398     2.1774   vmlinux                   read_hpet
> > 4768     1.9233   vmlinux                   clear_page_c
> > 4037     1.6284   i965_dri.so               _mesa_UpdateTexEnvProgram
> > 3824     1.5425   libpthread-2.7.so         pthread_mutex_lock
> > 3655     1.4743   vmlinux                   mwait_idle_with_hints
> > 3015     1.2162   vmlinux                   acpi_os_read_port
> > 2915     1.1758   i965_dri.so               dri_ttm_bo_process_reloc
> > 2830     1.1416   drm                       drm_ht_find_key
> > 2629     1.0605   vmlinux                   acpi_idle_enter_bm
> > 2563     1.0339   opreport                  (no symbols)
> >
> > I'm using the below profiling script to set up oprofile
> > (i830SetLVDSPanelPower is still in there because profiling started right
> > near the end of openarena's modesetting, which called dpms off/on).
> >
> > Thanks,
> > Jesse
> >
> > opcontrol --reset
> > openarena +exec anholt 2>&1 | egrep -e '[0-9]+ frames' &
> > OPENARENA=$!
> > sleep 10 # avoid openarena jit & mode setting
> > opcontrol --start
> > wait $OPENARENA
> > opcontrol --dump
> > opreport -t 1 -l
> > opcontrol --stop
>
> Jesse,
> The post-reloc branch should not in any way alter the way relocations
> are performed on the mesa master drivers, since they are still using
> relocation type 0. Post-relocs only affect relocation type 1.

Ah ok...

> So the performance degradation is probably caused by something else.
> Could you narrow it down with a git-bisect?

I'm not even sure there was a performance degradation.
At 1024x768 I'm seeing ~46 FPS with Eric's demo regardless of whether the PRESUMED_OFFSET stuff is enabled or not, which doesn't sound too unreasonable. I was just worried that the profile might be way different than what I was hearing from Eric, but that could easily have been due to differences in the bits we're testing or the fact that he was using sysprof and not oprofile.

Jesse
|
From: Thomas H. <th...@tu...> - 2008-03-20 07:19:12
|
Jesse Barnes wrote:
> On Wednesday, March 19, 2008 3:14 pm Thomas Hellström wrote:
>
>>> IIRC Eric had the relocation costs down in the "negligible" range, but
>>> with the latest Mesa & DRM bits, applying relocations seems to be a big
>>> part of openarena profiles at least:
>>>
>>> samples  %        app name                  symbol name
>>> 27354    11.0340  libopenal.so.0.0.0        (no symbols)
>>> 26907    10.8537  ioquake3.x86_64           (no symbols)
>>> 25328    10.2167  i915                      i915_apply_reloc
>>> 10186    4.1088   i965_dri.so               search_cache
>>> 9411     3.7962   intel_drv.so              i830SetLVDSPanelPower
>>> 8920     3.5981   i915                      i915_flush_ttm
>>> 7538     3.0407   cgame.o_uaKVFT (deleted)  (no symbols)
>>> 6286     2.5356   libc-2.7.so               memcpy
>>> 5398     2.1774   vmlinux                   read_hpet
>>> 4768     1.9233   vmlinux                   clear_page_c
>>> 4037     1.6284   i965_dri.so               _mesa_UpdateTexEnvProgram
>>> 3824     1.5425   libpthread-2.7.so         pthread_mutex_lock
>>> 3655     1.4743   vmlinux                   mwait_idle_with_hints
>>> 3015     1.2162   vmlinux                   acpi_os_read_port
>>> 2915     1.1758   i965_dri.so               dri_ttm_bo_process_reloc
>>> 2830     1.1416   drm                       drm_ht_find_key
>>> 2629     1.0605   vmlinux                   acpi_idle_enter_bm
>>> 2563     1.0339   opreport                  (no symbols)
>>>
>>> I'm using the below profiling script to set up oprofile
>>> (i830SetLVDSPanelPower is still in there because profiling started right
>>> near the end of openarena's modesetting, which called dpms off/on).
>>>
>>> Thanks,
>>> Jesse
>>>
>>> opcontrol --reset
>>> openarena +exec anholt 2>&1 | egrep -e '[0-9]+ frames' &
>>> OPENARENA=$!
>>> sleep 10 # avoid openarena jit & mode setting
>>> opcontrol --start
>>> wait $OPENARENA
>>> opcontrol --dump
>>> opreport -t 1 -l
>>> opcontrol --stop
>>>
>> Jesse,
>> The post-reloc branch should not in any way alter the way relocations
>> are performed on the mesa master drivers, since they are still using
>> relocation type 0. Post-relocs only affect relocation type 1.
>>
>
> Ah ok...
>
>
>> So the performance degradation is probably caused by something else.
>> Could you narrow it down with a git-bisect?
>>
>
> I'm not even sure there was a performance degradation. At 1024x768 I'm seeing
> ~46 FPS with Eric's demo regardless of whether the PRESUMED_OFFSET stuff is
> enabled or not, which doesn't sound too unreasonable. I was just worried
> that the profile might be way different than what I was hearing from Eric,
> but that could easily have been due to differences in the bits we're testing
> or the fact that he was using sysprof and not oprofile.
>
> Jesse
>

I would have thought that the PRESUMED_OFFSET stuff would take care of most relocations. However, when type 0 relocations _need_ to be performed, there's a recent commit that evicts the relocatee first. For an app with many applied relocations this would probably show up on a profile.

I think the evict is needed to force a clflush() on the relocatee, but it should be more efficient to just clflush() the cache line of the value just written, while keeping the relocatee bound...

/Thomas
|
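The saving from flushing only the written line can be illustrated with address arithmetic. This is a sketch under assumptions that are not in the thread: a 64-byte cache line (the real flush granularity has to be queried from the CPU), a 16 KiB buffer, and the hypothetical helper name `cache_lines_touched`. It models which lines a write overlaps and performs no flush.

```python
def cache_lines_touched(offset: int, size: int, line_size: int = 64) -> range:
    """Line-aligned start offsets of every cache line overlapping
    the byte range [offset, offset + size)."""
    if offset < 0 or size <= 0:
        raise ValueError("offset must be non-negative and size positive")
    if line_size <= 0 or line_size & (line_size - 1):
        raise ValueError("line_size must be a power of two")
    first = offset & ~(line_size - 1)
    last = (offset + size - 1) & ~(line_size - 1)
    return range(first, last + line_size, line_size)


# A dword-aligned 4-byte relocation write always stays within one line.
one_reloc = cache_lines_touched(0x1234, 4)
# Flushing an entire 16 KiB buffer (assumed size) touches every line in it.
whole_buffer = cache_lines_touched(0, 16 * 1024)
print(len(one_reloc), "line for the relocation,", len(whole_buffer), "for the buffer")
```

Each relocation then costs at most one line flush instead of work proportional to the buffer size, and the buffer can stay bound.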