From: Jesse B. <jb...@vi...> - 2008-10-29 00:30:47
On Tuesday, October 28, 2008 4:55 pm Eric Anholt wrote:
> On Tue, 2008-10-28 at 14:37 -0700, Jesse Barnes wrote:
> > On Monday, October 27, 2008 11:27 am Jesse Barnes wrote:
> > > On Friday, October 24, 2008 2:57 pm Jesse Barnes wrote:
> > > > Ok, this one doesn't crash and doesn't leave the flushing list
> > > > full at leavevt time, so I think it's ready for some actual
> > > > review.
> > > >
> > > > I'm using the patch I posted to intel-gfx@ to do tiled EXA
> > > > pixmaps, but I think my approach of faulting in fence registers
> > > > may not be the best one (though I haven't tried making the fence
> > > > register allocator use LRU yet); it seems like we may want a big
> > > > contiguous chunk of GTT space where pixmaps sit, so we can re-use
> > > > a single fence register to cover the needs of most pixmaps.
> > > > Suggestions appreciated.
> > > >
> > > > This patch should be pretty safe to push upstream, I think, since
> > > > the new code won't be used unless applications actually call the
> > > > GTT mapping ioctl.
> > >
> > > And with untested 915/830 fence register support.
> >
> > Merged up to drm-next from this morning.
> >
> > Jesse
>
> Comment here noting what this function's about (handling gtt mmap
> setup) might be nice, as just naming-wise it sounds similar to the gem
> mmap ioctl, which doesn't hit this.
>
> > +int
> > +drm_gem_mmap(struct file *filp, struct vm_area_struct *vma)

Oh yeah, should have thought of that.  Will document.

> > +	/**
> > +	 * Required alignment for the object
> > +	 */
> > +	uint32_t gtt_alignment;
>
> We've got another gtt alignment value that comes in through
> i915_gem_pin_ioctl.  We probably want to just calculate this one (which
> is about getting things aligned for tiling fence regs) in the kernel
> instead of being an ioctl argument from the gtt mmap ioctl, and be sure
> in bind_to_gtt to take this value if it's larger than what pin passed
> us.

Yeah, now that you mention it, that would be much better.
I think we can calculate it correctly in every case for the various
tiling formats and chip types, so we shouldn't need to pass it in.

> > +	/* Need a new fence register? */
> > +	if (obj_priv->fence_reg == I915_FENCE_REG_NONE &&
> > +	    obj_priv->tiling_mode != I915_TILING_NONE)
> > +		i915_gem_object_get_fence_reg(obj);
>
> I like that this is here as opposed to always at object bind time.
> However, we need to make sure we get a fence register covering X-tiled
> stuff at exec time on pre-965, as it's relied on for 2D rendering
> (since they didn't give us bits in the 2D instructions to specify it.
> sigh.)

See below.

> > +int
> > +i915_gem_mmap_gtt_ioctl(struct drm_device *dev, void *data,
> > +			struct drm_file *file_priv)
> > +{
> > +	struct drm_i915_gem_mmap_gtt *args = data;
> > +	struct drm_i915_private *dev_priv = dev->dev_private;
> > +	struct drm_gem_object *obj;
> > +	struct drm_i915_gem_object *obj_priv;
> > +	int ret;
> > +
> > +	if (!(dev->driver->driver_features & DRIVER_GEM))
> > +		return -ENODEV;
> > +
> > +	mutex_lock(&dev->struct_mutex);
> > +	obj = drm_gem_object_lookup(dev, file_priv, args->handle);
> > +	if (obj == NULL) {
> > +		drm_gem_object_unreference(obj);
> > +		mutex_unlock(&dev->struct_mutex);
> > +		return -EBADF;
> > +	}
>
> You can call object_lookup without the struct_mutex held.  It's
> protected by the table_lock spinlock.  Also, unref on obj == NULL?

Oops. :)  I'll fix that up.

> > @@ -656,10 +787,6 @@ i915_gem_retire_request(struct drm_device *dev,
> >  	 */
> >  	if (obj_priv->last_rendering_seqno != request->seqno)
> >  		return;
> > -#if WATCH_LRU
> > -	DRM_INFO("%s: retire %d moves to inactive list %p\n",
> > -		 __func__, request->seqno, obj);
> > -#endif
> >
> >  	if (obj->write_domain != 0) {
> >  		list_move_tail(&obj_priv->list,
> > @@ -788,7 +915,7 @@ i915_wait_request(struct drm_device *dev, uint32_t seqno)
> >  	 * buffer to have made it to the inactive list, and we would need
> >  	 * a separate wait queue to handle that.
> >  	 */
> > -	if (ret == 0)
> > +	if (ret == 0 || ret == -ERESTARTSYS)
> >  		i915_gem_retire_requests(dev);
> >
> >  	return ret;
>
> Not sure what these two hunks are doing in this patch.

Just leftovers from my debugging of the flushing list stuff I saw; I can
kill them (unless you think the second chunk is a valid fix; it seems to
be).

> > +	if (obj_priv->fence_reg != I915_FENCE_REG_NONE) {
> > +		if (IS_I965G(dev)) {
> > +			I915_WRITE64(FENCE_REG_965_0 +
> > +				     (obj_priv->fence_reg * 8), 0);
> > +		} else {
> > +			I915_WRITE(FENCE_REG_830_0 +
> > +				   (obj_priv->fence_reg * 4), 0);
> > +		}
> > +		dev_priv->fence_regs[obj_priv->fence_reg].obj = NULL;
> > +		obj_priv->fence_reg = I915_FENCE_REG_NONE;
> > +	}
> > +
> >  	i915_gem_object_free_page_list(obj);
> >
> >  	if (obj_priv->gtt_space) {
>
> Is unmap_mapping_range costly even when nobody's got it mapped?  I'm
> concerned, since we've already got significant overhead in unmap/map
> thrashing that's hurting us on relevant apps.

Not sure; I'll dig through it a little and see if we need to optimize
away calls to it.

> (didn't review any of the actual fence register mapping code that
> follows, though I like that it's in pretty little functions now and
> not the atrocity I committed in i830_memory.c)

I haven't tested the i915 bits yet, but yeah, the functions are pretty
small and easy to deal with.

> > +try_again:
> > +	/* Could try to use LRU here instead...
> > +	 */
> > +	for (i = dev_priv->fence_reg_start;
> > +	     i < dev_priv->num_fence_regs; i++) {
> > +		reg = &dev_priv->fence_regs[i];
> > +		old_obj_priv = reg->obj->driver_private;
> > +		if (!old_obj_priv->pin_count)
> > +			break;
> > +	}
>
> We'll definitely need to get this fixed, particularly on 915 where
> we'll be hitting fence register setup from the accelerator (though we
> may want to think about queuing flushes and register writes from the
> CP to avoid lockstepping with the hardware too much)

So yeah, for i915 we may need to do the fence setup from the ring or
batchbuffer rather than at GTT binding time, depending on how many
tiled surfaces we expect to be in flight at any given time.  On 915 and
before we've only got 8 fence regs, so trying to map them all up front
for a given batch may not be doable.  But that means synchronizing the
CP fence reg writes vs. GTT mapping faults...

> > +	/*
> > +	 * Now things get ugly... we have to wait for one of the
> > +	 * objects to finish before trying again.
> > +	 */
> > +	if (i == dev_priv->num_fence_regs) {
> > +		ret = i915_gem_object_wait_rendering(reg->obj);
> > +		if (ret) {
> > +			WARN(ret, "wait_rendering failed: %d\n", ret);
> > +			return;
> > +		}
> > +		goto try_again;
> > +	}
> > +
> > +	/*
> > +	 * Zap this virtual mapping so we can set up a fence again
> > +	 * for this object next time we need it.
> > +	 */
> > +	offset = ((loff_t) reg->obj->map_list.hash.key) << PAGE_SHIFT;
> > +	unmap_mapping_range(dev->dev_mapping, offset,
> > +			    reg->obj->size, 1);
> > +	old_obj_priv->fence_reg = I915_FENCE_REG_NONE;
>
> Move removal of a fence register from an object to its own function?

Yeah, that's a good idea.  Will do.
> > +out_free_mm:
> > +	drm_mm_put_block(list->file_offset_node);
> > +out_free_list:
> > +	drm_free(list->map, sizeof(struct drm_map_list), DRM_MEM_DRIVER);
> > +
> > +	return ret;
> >  }
>
> Given that the number of GTT mmaps will be much lower than the number
> of objects, this setup should probably be part of gtt_mmap_ioctl.

Yeah, you're right.  I'll move it over and just do the bare minimum of
initialization in object_init.

> > @@ -2269,8 +2658,7 @@ i915_gem_idle(struct drm_device *dev)
> >  	 */
> >  	i915_gem_flush(dev, ~(I915_GEM_DOMAIN_CPU|I915_GEM_DOMAIN_GTT),
> >  		       ~(I915_GEM_DOMAIN_CPU|I915_GEM_DOMAIN_GTT));
> > -	seqno = i915_add_request(dev, ~(I915_GEM_DOMAIN_CPU |
> > -					I915_GEM_DOMAIN_GTT));
> > +	seqno = i915_add_request(dev, ~I915_GEM_DOMAIN_CPU);
> >
> >  	if (seqno == 0) {
> >  		mutex_unlock(&dev->struct_mutex);
>
> I don't think I've thought about this enough yet.

I think this is actually papering over a bug from elsewhere.  It looks
like Keith put together some fixes in this area, but the bugs he
described didn't sound quite like what I was seeing.  Before this
change I'd see a ton of objects on the flushing list at leavevt time,
but not all of them.  So it could be the race he talked about; I
haven't tested yet.
> > @@ -2303,7 +2691,15 @@ i915_gem_idle(struct drm_device *dev)
> >  	 * waited for a sequence higher than any pending execbuffer
> >  	 */
> >  	BUG_ON(!list_empty(&dev_priv->mm.active_list));
> > -	BUG_ON(!list_empty(&dev_priv->mm.flushing_list));
> > +	if (!list_empty(&dev_priv->mm.flushing_list)) {
> > +		struct drm_i915_gem_object *obj_priv;
> > +		DRM_ERROR("flushing list not empty, still has:\n");
> > +		list_for_each_entry(obj_priv, &dev_priv->mm.flushing_list,
> > +				    list) {
> > +			DRM_ERROR("    %p: %d\n", obj_priv,
> > +				  obj_priv->last_rendering_seqno);
> > +		}
> > +	}
> >
> >  	/* Request should now be empty as we've also waited
> >  	 * for the last request in the list
>
> With keithp's fix this shouldn't be necessary, and probably shouldn't
> be part of this patch either way.

Yeah, this was just debug code so I could figure out what was going
wrong with the flushing.  I'll remove it.

> > @@ -2566,5 +2961,13 @@ i915_gem_load(struct drm_device *dev)
> >  			  i915_gem_retire_work_handler);
> >  	dev_priv->mm.next_gem_seqno = 1;
> >
> > +	/* Old X drivers will take 0-2 for front, back, depth buffers */
> > +	dev_priv->fence_reg_start = 3;
> > +
> > +	if (IS_I965G(dev))
> > +		dev_priv->num_fence_regs = 16;
> > +	else
> > +		dev_priv->num_fence_regs = 8;
> > +
> >  	i915_gem_detect_bit_6_swizzle(dev);
>
> I like this.  Now to remember at some point to have the driver tell us
> that it's not using 0-3.

Yeah; though we may be able to bump up the limit on the first GTT
mapping ioctl, since only new X drivers will know about it and they'll
not be doing their own fence register management.

Thanks for another thorough review.

-- 
Jesse Barnes, Intel Open Source Technology Center