From: Luc V. <li...@sk...> - 2009-08-18 23:12:12
|
On Wed, Aug 19, 2009 at 09:07:55AM +1000, Dave Airlie wrote: > On Wed, Aug 19, 2009 at 8:03 AM, Luc Verhaegen<li...@sk...> wrote: > > On Wed, Aug 19, 2009 at 07:03:41AM +1000, Dave Airlie wrote: > >> On Wed, Aug 19, 2009 at 2:12 AM, Keith Whitwell<ke...@vm...> wrote: > >> > I think the bug in question was because somebody (Jon Smirl??) > >> > removed the empty & apparently unused poll implementation from the > >> > drm fd, only to discover that the X server was actually polling the > >> > fd. > >> > >> You have a better memory than me, but thats exactly what happened alright. > >> > >> Dave. > > > > You never did any of the sort, right? > > > > Try pouncing issues created by those who are currently still active, > > preferably shortly after they made their mistakes, even if they belong > > to your current political faction. > > > > Thanks Luc, yet again your technical contribution to the topic at hand > is outstandingly useful, hopefully any future employers understand your > ability to involve yourself in technical discourse. > > Dave. And on the flip side of this, what you do is purely technical, always. #dri-devel 00:05 <+airlied> krh: you've been smirled Luc Verhaegen. |
From: Kristian H. <kr...@bi...> - 2009-08-18 23:17:20
|
On Tue, Aug 18, 2009 at 7:12 PM, Luc Verhaegen<li...@sk...> wrote: > And on the flip side of this, what you do is purely technical, always. > > #dri-devel 00:05 <+airlied> krh: you've been smirled Yeah, I was wondering about that, I thought the word was smirl-rolled :D cheers, Kristian |
From: Kristian H. <kr...@bi...> - 2009-08-19 00:41:11
|
On Tue, Aug 18, 2009 at 6:31 PM, Thomas Hellström<th...@sh...> wrote:
> Jesse Barnes wrote:
>> Anyway, to me this discussion is more of a "future directions" one than
>> a blocker for this particular patchset. AFAICT the only thing that
>> needs fixing with this patch is my lock confusion (struct_mutex vs
>> mode_config). Or would you like something substantial changed with
>> these bits before they land?
>
> I think as long as there are efficient ways to short-circuit this
> implementation if the command submission mechanism does a better job, I
> think it's OK. I had a quick look in the new dri2 code and it looks like the
> driver can trick dri2 into thinking there are no outstanding swaps, thus
> avoiding the event reads? What's important is that the IOCTL interface is
> right.

Let me follow up on this tomorrow, I've used up my outgoing mail quota for today.

> What remains to be fixed are the mutex issue and possibly the poll issue.

I'll resend the patch with the mutex issue fixed. The poll issue is not a problem as long as we don't enable DRI1 and DRI2 at the same time.

> I also think, without looking too closely, that the drm_read() function doesn't
> do return values properly (see ldd3 and man 2 read for return values for the
> various blocking modes). In particular, wait_event_interruptible() may
> return -ERESTARTSYS which should never be returned to user-space.

Returning -ERESTARTSYS from read is the expected behaviour in case you get a signal as far as I know and is handled by glibc. It's used all over the kernel.

> It also looks like the new ioctl argument lengths are not 64 bit aligned?

Not sure what you mean... that the size of the ioctl struct isn't a multiple of 64 bits? That's not a requirement.

cheers,
Kristian
|
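For reference, a minimal sketch of the read-side convention being discussed: a blocking read handler may let the -ERESTARTSYS from wait_event_interruptible() propagate back unchanged, and the signal-restart machinery (and glibc) deals with it. The device structure and names below are made up for illustration; this is not the drm_read() from the patch.

    #include <linux/fs.h>
    #include <linux/sched.h>
    #include <linux/uaccess.h>
    #include <linux/wait.h>

    /* Hypothetical per-device state, for illustration only. */
    struct example_dev {
            wait_queue_head_t wait;
            bool have_data;
    };

    static ssize_t example_read(struct file *filp, char __user *buf,
                                size_t count, loff_t *off)
    {
            struct example_dev *dev = filp->private_data;
            int ret;

            /* Sleeps until have_data is set; returns -ERESTARTSYS on a signal. */
            ret = wait_event_interruptible(dev->wait, dev->have_data);
            if (ret)
                    return ret;     /* propagated as-is; the signal code restarts read() */

            /* ... copy queued data with copy_to_user() and return the byte count ... */
            return 0;
    }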
From: Thomas H. <th...@sh...> - 2009-08-19 07:33:09
|
Kristian Høgsberg wrote:
>> I also think, without looking too closely, that the drm_read() function doesn't
>> do return values properly (see ldd3 and man 2 read for return values for the
>> various blocking modes). In particular, wait_event_interruptible() may
>> return -ERESTARTSYS which should never be returned to user-space.
>
> Returning -ERESTARTSYS from read is the expected behaviour in case you
> get a signal as far as I know and is handled by glibc. It's used all
> over the kernel.

Ah, yes. I was confusing syscalls and ioctls. Sorry.

>> It also looks like the new ioctl argument lengths are not 64 bit aligned?
>
> Not sure what you mean... that the size of the ioctl struct isn't a
> multiple of 64 bits? That's not a requirement.

You'll get into 64 / 32 bit incompatibilities. DRM used to check that the user-space size and the kernel-space size of core ioctl arguments were the same, and that they had the same read / write mode. This has been changed so that the kernel always uses the kernel-space size when copying. (I'm not sure why, since it was actually a good way to catch errors and should ideally be implemented by drivers as well.) Anyway, if your 32-bit argument is 12 bytes and the 64-bit kernel tries to copy 16, you might end up with an EFAULT in the ioctl. Of course one could change the drm implementation to always use the user-space size, as it does for driver-specific ioctls. Better to pad all structs to 64 bits, particularly if there are arguments which have structs embedded within other structs.

/Thomas

> cheers,
> Kristian
|
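To make the 32/64-bit point concrete, here is a hedged sketch (not taken from the patch) of an ioctl argument struct whose size differs between a 32-bit userspace build and a 64-bit kernel, and a padded variant with the same layout on both.

    #include <linux/types.h>

    /*
     * Illustrative only. With a __u64 member, a 64-bit build aligns 'cookie'
     * to 8 bytes and the struct is 16 bytes, while a typical 32-bit x86 build
     * packs it to 12 bytes; a 64-bit kernel copying 16 bytes from a 12-byte
     * user buffer can then hit an EFAULT, as described above.
     */
    struct example_args_unpadded {
            __u32 handle;
            __u64 cookie;
    };

    /* Explicit padding gives the same 16-byte layout on 32- and 64-bit builds. */
    struct example_args_padded {
            __u32 handle;
            __u32 pad;      /* reserved, should be zero */
            __u64 cookie;
    };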
From: Kristian H. <kr...@re...> - 2009-08-27 01:10:14
|
This patch adds a vblank synced pageflip ioctl for to the modesetting family of ioctls. The ioctl takes a crtc and an fb and schedules a pageflip to the new fb at the next coming vertical blank event. This feature lets userspace implement tear-free updating of the screen contents with hw-guaranteed low latency page flipping. The ioctl is asynchronous in that it returns immediately and then later notifies the client by making an event available for reading on the drm fd. This lets applications add the drm fd to their main loop and handle other tasks while waiting for the flip to happen. The event includes the time of the flip, the frame counter and a 64 bit opaque token provided by user space in the ioctl. Based on work and suggestions from Jesse Barnes <jb...@vi...>, Jakob Bornecrantz <wal...@gm...>, Chris Wilson <ch...@ch...> Signed-off-by: Kristian Høgsberg <kr...@re...> Signed-off-by: Jesse Barnes <jb...@vi...> --- Okay, here's the patch again with the locking fixed. Everything else is the same. As for the potential problem with polling on the drm fd with both dri1 and dri2 enabled, I suggest that we just disable the dri1 sigio handler in the xserver. We'll do this in a commit that precedes the merge of the page flipping code, so that no server will be able to poll on the drm fd in both dri1 and dri2 code. cheers, Kristian drivers/gpu/drm/drm_crtc.c | 171 ++++++++++++++++++++++++++++++- drivers/gpu/drm/drm_crtc_helper.c | 10 ++ drivers/gpu/drm/drm_drv.c | 1 + drivers/gpu/drm/drm_fops.c | 68 ++++++++++++- drivers/gpu/drm/drm_irq.c | 42 ++++++++ drivers/gpu/drm/i915/i915_drv.c | 1 + drivers/gpu/drm/i915/intel_display.c | 16 +++- drivers/gpu/drm/radeon/radeon_display.c | 3 +- include/drm/drm.h | 25 +++++ include/drm/drmP.h | 32 ++++++ include/drm/drm_crtc.h | 27 +++++ include/drm/drm_crtc_helper.h | 4 + include/drm/drm_mode.h | 16 +++ 13 files changed, 409 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c index 8fab789..d877c21 100644 --- a/drivers/gpu/drm/drm_crtc.c +++ b/drivers/gpu/drm/drm_crtc.c @@ -34,6 +34,8 @@ #include "drmP.h" #include "drm_crtc.h" +#undef set_base + struct drm_prop_enum_list { int type; char *name; @@ -342,6 +344,35 @@ void drm_framebuffer_cleanup(struct drm_framebuffer *fb) EXPORT_SYMBOL(drm_framebuffer_cleanup); /** + * drm_crtc_async_flip - do a set_base call from a work queue + * @work: work struct + * + * Called when a set_base call is queued by the page flip code. This + * allows the flip ioctl itself to return immediately and allow userspace + * to continue working. + */ +static void drm_crtc_async_flip(struct work_struct *work) +{ + struct drm_crtc *crtc = container_of(work, struct drm_crtc, async_flip); + struct drm_device *dev = crtc->dev; + struct drm_pending_flip *pending; + + mutex_lock(&dev->mode_config.mutex); + + BUG_ON(crtc->pending_flip == NULL); + + crtc->funcs->set_base(crtc, crtc->x, crtc->y, NULL); + + pending = crtc->pending_flip; + crtc->pending_flip = NULL; + + pending->frame = drm_vblank_count(dev, crtc->pipe); + list_add_tail(&pending->link, &dev->flip_list); + + mutex_unlock(&dev->mode_config.mutex); +} + +/** * drm_crtc_init - Initialise a new CRTC object * @dev: DRM device * @crtc: CRTC object to init @@ -352,17 +383,19 @@ EXPORT_SYMBOL(drm_framebuffer_cleanup); * * Inits a new object created as base part of an driver crtc object. 
*/ -void drm_crtc_init(struct drm_device *dev, struct drm_crtc *crtc, +void drm_crtc_init(struct drm_device *dev, struct drm_crtc *crtc, int pipe, const struct drm_crtc_funcs *funcs) { crtc->dev = dev; crtc->funcs = funcs; + crtc->pipe = pipe; mutex_lock(&dev->mode_config.mutex); drm_mode_object_get(dev, &crtc->base, DRM_MODE_OBJECT_CRTC); list_add_tail(&crtc->head, &dev->mode_config.crtc_list); dev->mode_config.num_crtc++; + INIT_WORK(&crtc->async_flip, drm_crtc_async_flip); mutex_unlock(&dev->mode_config.mutex); } EXPORT_SYMBOL(drm_crtc_init); @@ -381,6 +414,9 @@ void drm_crtc_cleanup(struct drm_crtc *crtc) { struct drm_device *dev = crtc->dev; + mutex_lock(&dev->mode_config.mutex); + flush_work(&crtc->async_flip); + if (crtc->gamma_store) { kfree(crtc->gamma_store); crtc->gamma_store = NULL; @@ -388,6 +424,7 @@ void drm_crtc_cleanup(struct drm_crtc *crtc) drm_mode_object_put(dev, &crtc->base); list_del(&crtc->head); + mutex_unlock(&dev->mode_config.mutex); dev->mode_config.num_crtc--; } EXPORT_SYMBOL(drm_crtc_cleanup); @@ -2452,3 +2489,135 @@ out: mutex_unlock(&dev->mode_config.mutex); return ret; } + +/** + * drm_mode_page_flip_ioctl - page flip ioctl + * @dev: DRM device + * @data: ioctl args + * @file_priv: file private data + * + * The page flip ioctl replaces the current front buffer with a new + * one, using the CRTC's set_base function, which should just update + * the front buffer base pointer. It's up to set_base to make + * sure the update doesn't result in tearing (on some hardware the + * base register is double buffered, so this is easy). + * + * Note that this covers just the simple case of flipping the front + * buffer immediately. Interval handling and interlaced modes have to + * be handled by userspace, or with new ioctls. + */ +int drm_mode_page_flip_ioctl(struct drm_device *dev, void *data, + struct drm_file *file_priv) +{ + struct drm_pending_flip *pending; + struct drm_mode_page_flip *flip_data = data; + struct drm_mode_object *drm_obj, *fb_obj; + struct drm_crtc *crtc; + int ret = 0; + + if (!(drm_core_check_feature(dev, DRIVER_MODESET))) + return -ENODEV; + + /* + * Reject unknown flags so future userspace knows what we (don't) + * support + */ + if (flip_data->flags & (~DRM_MODE_PAGE_FLIP_FLAGS_MASK)) { + DRM_DEBUG("bad page flip flags\n"); + return -EINVAL; + } + + pending = kzalloc(sizeof *pending, GFP_KERNEL); + if (pending == NULL) + return -ENOMEM; + + mutex_lock(&dev->mode_config.mutex); + + fb_obj = drm_mode_object_find(dev, flip_data->fb_id, + DRM_MODE_OBJECT_FB); + if (!fb_obj) { + DRM_DEBUG("unknown fb %d\n", flip_data->fb_id); + ret = -ENOENT; + goto out_unlock; + } + + drm_obj = drm_mode_object_find(dev, flip_data->crtc_id, + DRM_MODE_OBJECT_CRTC); + if (!drm_obj) { + DRM_DEBUG("unknown crtc %d\n", flip_data->crtc_id); + ret = -ENOENT; + goto out_unlock; + } + crtc = obj_to_crtc(drm_obj); + if (!crtc->enabled) { + DRM_DEBUG("crtc %d not enabled\n", flip_data->crtc_id); + ret = -EINVAL; + goto out_unlock; + } + + if (crtc->fb->funcs->unpin == NULL) { + DRM_DEBUG("fb for crtc %d does not support delayed unpin\n", + flip_data->crtc_id); + ret = -ENODEV; + goto out_unlock; + } + + pending->crtc = crtc; + pending->old_fb = crtc->fb; + pending->pipe = crtc->pipe; + INIT_LIST_HEAD(&pending->link); + pending->event.base.type = DRM_EVENT_MODE_PAGE_FLIP; + pending->event.base.length = sizeof pending->event; + pending->event.user_data = flip_data->user_data; + pending->pending_event.event = &pending->event.base; + pending->pending_event.file_priv = file_priv; + 
pending->pending_event.destroy = + (void (*) (struct drm_pending_event *)) kfree; + + /* Get vblank ref for completion handling */ + ret = drm_vblank_get(dev, crtc->pipe); + if (ret) { + DRM_DEBUG("failed to take vblank ref\n"); + goto out_unlock; + } + + /* + * The set_base call will change the domain on the new fb, + * which will force the rendering to finish and block the + * ioctl. We need to do this last part from a work queue, to + * avoid blocking userspace here. + */ + crtc->fb = obj_to_fb(fb_obj); + + if (crtc->pending_flip != NULL) { + struct drm_pending_flip *old_flip; + + /* We have an outstanding flip request for this crtc/pipe. + * In order to satisfy the user we can either queue the requests + * and apply them on sequential vblanks, or we can drop old + * requests. + * + * Here we choose to discard the previous request for + * simplicity. Note that since we have not yet applied the + * previous flip, we need to preserve the original (i.e. still + * current) fb. + */ + + old_flip = crtc->pending_flip; + pending->old_fb = old_flip->old_fb; + old_flip->old_fb = NULL; + drm_finish_pending_flip (dev, old_flip, 0); + } else + schedule_work(&crtc->async_flip); + crtc->pending_flip = pending; + + mutex_unlock(&dev->mode_config.mutex); + + return 0; + +out_unlock: + mutex_unlock(&dev->mode_config.mutex); + kfree(pending); + + return ret; +} diff --git a/drivers/gpu/drm/drm_crtc_helper.c b/drivers/gpu/drm/drm_crtc_helper.c index 3da9cfa..b2d694f 100644 --- a/drivers/gpu/drm/drm_crtc_helper.c +++ b/drivers/gpu/drm/drm_crtc_helper.c @@ -1095,3 +1095,13 @@ int drm_helper_resume_force_mode(struct drm_device *dev) return 0; } EXPORT_SYMBOL(drm_helper_resume_force_mode); + +int +drm_crtc_helper_set_base(struct drm_crtc *crtc, int x, int y, + struct drm_framebuffer *old_fb) +{ + struct drm_crtc_helper_funcs *crtc_funcs = crtc->helper_private; + + return crtc_funcs->mode_set_base(crtc, x, y, old_fb); +} +EXPORT_SYMBOL(drm_crtc_helper_set_base); diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c index b39d7bf..c66c993 100644 --- a/drivers/gpu/drm/drm_drv.c +++ b/drivers/gpu/drm/drm_drv.c @@ -145,6 +145,7 @@ static struct drm_ioctl_desc drm_ioctls[] = { DRM_IOCTL_DEF(DRM_IOCTL_MODE_GETFB, drm_mode_getfb, DRM_MASTER|DRM_CONTROL_ALLOW), DRM_IOCTL_DEF(DRM_IOCTL_MODE_ADDFB, drm_mode_addfb, DRM_MASTER|DRM_CONTROL_ALLOW), DRM_IOCTL_DEF(DRM_IOCTL_MODE_RMFB, drm_mode_rmfb, DRM_MASTER|DRM_CONTROL_ALLOW), + DRM_IOCTL_DEF(DRM_IOCTL_MODE_PAGE_FLIP, drm_mode_page_flip_ioctl, DRM_MASTER|DRM_CONTROL_ALLOW), }; #define DRM_CORE_IOCTL_COUNT ARRAY_SIZE( drm_ioctls ) diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c index 251bc0e..dcd9c66 100644 --- a/drivers/gpu/drm/drm_fops.c +++ b/drivers/gpu/drm/drm_fops.c @@ -257,6 +257,8 @@ static int drm_open_helper(struct inode *inode, struct file *filp, INIT_LIST_HEAD(&priv->lhead); INIT_LIST_HEAD(&priv->fbs); + INIT_LIST_HEAD(&priv->event_list); + init_waitqueue_head(&priv->event_wait); if (dev->driver->driver_features & DRIVER_GEM) drm_gem_open(dev, priv); @@ -429,6 +431,9 @@ int drm_release(struct inode *inode, struct file *filp) { struct drm_file *file_priv = filp->private_data; struct drm_device *dev = file_priv->minor->dev; + struct drm_pending_flip *f, *ft; + struct drm_pending_event *e, *et; + int retcode = 0; lock_kernel(); @@ -451,6 +456,19 @@ int drm_release(struct inode *inode, struct file *filp) if (file_priv->minor->master) drm_master_release(dev, filp); + mutex_lock(&dev->struct_mutex); + + /* Remove pending flips */ + 
list_for_each_entry_safe(f, ft, &dev->flip_list, link) + if (f->pending_event.file_priv == file_priv) + drm_finish_pending_flip(dev, f, 0); + + /* Remove unconsumed events */ + list_for_each_entry_safe(e, et, &file_priv->event_list, link) + e->destroy(e); + + mutex_unlock(&dev->struct_mutex); + if (dev->driver->driver_features & DRIVER_GEM) drm_gem_release(dev, file_priv); @@ -544,9 +562,55 @@ int drm_release(struct inode *inode, struct file *filp) } EXPORT_SYMBOL(drm_release); -/** No-op. */ +ssize_t drm_read(struct file *filp, char __user *buffer, + size_t count, loff_t *offset) +{ + struct drm_file *file_priv = filp->private_data; + struct drm_device *dev = file_priv->minor->dev; + struct drm_pending_event *event; + ssize_t total, ret; + + ret = wait_event_interruptible(file_priv->event_wait, + !list_empty(&file_priv->event_list)); + if (ret < 0) + return ret; + + total = 0; + while (!list_empty(&file_priv->event_list)) { + mutex_lock(&dev->struct_mutex); + event = list_first_entry(&file_priv->event_list, + struct drm_pending_event, link); + if (total + event->event->length > count) { + mutex_unlock(&dev->struct_mutex); + break; + } + list_del(&event->link); + mutex_unlock(&dev->struct_mutex); + + if (copy_to_user(buffer + total, + event->event, event->event->length)) { + total = -EFAULT; + break; + } + + total += event->event->length; + event->destroy(event); + } + + return total; +} +EXPORT_SYMBOL(drm_read); + unsigned int drm_poll(struct file *filp, struct poll_table_struct *wait) { - return 0; + struct drm_file *file_priv = filp->private_data; + unsigned int mask = 0; + + poll_wait(filp, &file_priv->event_wait, wait); + + if (!list_empty(&file_priv->event_list)) + mask |= POLLIN | POLLRDNORM; + + return mask; } EXPORT_SYMBOL(drm_poll); diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c index b4a3dbc..a0c120c 100644 --- a/drivers/gpu/drm/drm_irq.c +++ b/drivers/gpu/drm/drm_irq.c @@ -34,6 +34,7 @@ */ #include "drmP.h" +#include "drm_crtc_helper.h" #include <linux/interrupt.h> /* For task queue support */ @@ -71,6 +72,44 @@ int drm_irq_by_busid(struct drm_device *dev, void *data, return 0; } +#define vblank_passed(a,b) ((long)(a - b) > 0) + +void drm_finish_pending_flip(struct drm_device *dev, + struct drm_pending_flip *f, u32 frame) +{ + struct timeval now; + + f->event.frame = frame; + do_gettimeofday(&now); + f->event.tv_sec = now.tv_sec; + f->event.tv_usec = now.tv_usec; + drm_vblank_put(dev, f->pipe); + list_del_init(&f->link); + list_add_tail(&f->pending_event.link, + &f->pending_event.file_priv->event_list); + if (f->old_fb) + f->old_fb->funcs->unpin(f->old_fb); + wake_up_interruptible(&f->pending_event.file_priv->event_wait); +} + +static void drm_flip_work_func(struct work_struct *work) +{ + struct drm_device *dev = + container_of(work, struct drm_device, flip_work); + struct drm_pending_flip *f, *t; + u32 frame; + + mutex_lock(&dev->struct_mutex); + + list_for_each_entry_safe(f, t, &dev->flip_list, link) { + frame = drm_vblank_count(dev, f->pipe); + if (vblank_passed(frame, f->frame)) + drm_finish_pending_flip(dev, f, frame); + } + + mutex_unlock(&dev->struct_mutex); +} + static void vblank_disable_fn(unsigned long arg) { struct drm_device *dev = (struct drm_device *)arg; @@ -161,6 +200,8 @@ int drm_vblank_init(struct drm_device *dev, int num_crtcs) atomic_set(&dev->vblank_refcount[i], 0); } + INIT_LIST_HEAD(&dev->flip_list); + INIT_WORK(&dev->flip_work, drm_flip_work_func); dev->vblank_disable_allowed = 0; return 0; @@ -626,5 +667,6 @@ void 
drm_handle_vblank(struct drm_device *dev, int crtc) { atomic_inc(&dev->_vblank_count[crtc]); DRM_WAKEUP(&dev->vbl_queue[crtc]); + schedule_work(&dev->flip_work); } EXPORT_SYMBOL(drm_handle_vblank); diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index fc4b68a..322b0f2 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -203,6 +203,7 @@ static struct drm_driver driver = { .mmap = drm_gem_mmap, .poll = drm_poll, .fasync = drm_fasync, + .read = drm_read, #ifdef CONFIG_COMPAT .compat_ioctl = i915_compat_ioctl, #endif diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 508838e..291c067 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -972,12 +972,12 @@ intel_pipe_set_base(struct drm_crtc *crtc, int x, int y, I915_READ(dspbase); } - intel_wait_for_vblank(dev); - if (old_fb) { intel_fb = to_intel_framebuffer(old_fb); + intel_wait_for_vblank(dev); i915_gem_object_unpin(intel_fb->obj); } + mutex_unlock(&dev->struct_mutex); if (!dev->primary->master) @@ -2840,6 +2840,7 @@ static const struct drm_crtc_funcs intel_crtc_funcs = { .gamma_set = intel_crtc_gamma_set, .set_config = drm_crtc_helper_set_config, .destroy = intel_crtc_destroy, + .set_base = drm_crtc_helper_set_base, }; @@ -2852,7 +2853,7 @@ static void intel_crtc_init(struct drm_device *dev, int pipe) if (intel_crtc == NULL) return; - drm_crtc_init(dev, &intel_crtc->base, &intel_crtc_funcs); + drm_crtc_init(dev, &intel_crtc->base, pipe, &intel_crtc_funcs); drm_mode_crtc_set_gamma_size(&intel_crtc->base, 256); intel_crtc->pipe = pipe; @@ -3071,9 +3072,18 @@ static int intel_user_framebuffer_create_handle(struct drm_framebuffer *fb, return drm_gem_handle_create(file_priv, object, handle); } +static void intel_user_framebuffer_unpin(struct drm_framebuffer *fb) +{ + struct intel_framebuffer *intel_fb; + + intel_fb = to_intel_framebuffer(fb); + i915_gem_object_unpin(intel_fb->obj); +} + static const struct drm_framebuffer_funcs intel_fb_funcs = { .destroy = intel_user_framebuffer_destroy, .create_handle = intel_user_framebuffer_create_handle, + .unpin = intel_user_framebuffer_unpin }; int intel_framebuffer_create(struct drm_device *dev, diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c index 3efcf1a..4d73f0b 100644 --- a/drivers/gpu/drm/radeon/radeon_display.c +++ b/drivers/gpu/drm/radeon/radeon_display.c @@ -171,6 +171,7 @@ static const struct drm_crtc_funcs radeon_crtc_funcs = { .gamma_set = radeon_crtc_gamma_set, .set_config = drm_crtc_helper_set_config, .destroy = radeon_crtc_destroy, + .set_base = drm_crtc_helper_set_base, }; static void radeon_crtc_init(struct drm_device *dev, int index) @@ -183,7 +184,7 @@ static void radeon_crtc_init(struct drm_device *dev, int index) if (radeon_crtc == NULL) return; - drm_crtc_init(dev, &radeon_crtc->base, &radeon_crtc_funcs); + drm_crtc_init(dev, &radeon_crtc->base, index, &radeon_crtc_funcs); drm_mode_crtc_set_gamma_size(&radeon_crtc->base, 256); radeon_crtc->crtc_id = index; diff --git a/include/drm/drm.h b/include/drm/drm.h index 7cb50bd..1920323 100644 --- a/include/drm/drm.h +++ b/include/drm/drm.h @@ -686,6 +686,7 @@ struct drm_gem_open { #define DRM_IOCTL_MODE_GETFB DRM_IOWR(0xAD, struct drm_mode_fb_cmd) #define DRM_IOCTL_MODE_ADDFB DRM_IOWR(0xAE, struct drm_mode_fb_cmd) #define DRM_IOCTL_MODE_RMFB DRM_IOWR(0xAF, unsigned int) +#define DRM_IOCTL_MODE_PAGE_FLIP DRM_IOW( 0xB0, struct drm_mode_page_flip) 
/** * Device specific ioctls should only be in their respective headers @@ -698,6 +699,30 @@ struct drm_gem_open { #define DRM_COMMAND_BASE 0x40 #define DRM_COMMAND_END 0xA0 +/** + * Header for events written back to userspace on the drm fd. The + * type defines the type of event, the length specifies the total + * length of the event (including the header), and user_data is + * typically a 64 bit value passed with the ioctl that triggered the + * event. A read on the drm fd will always only return complete + * events, that is, if for example the read buffer is 100 bytes, and + * there are two 64 byte events pending, only one will be returned. + */ +struct drm_event { + __u32 type; + __u32 length; +}; + +#define DRM_EVENT_MODE_PAGE_FLIP 0x01 + +struct drm_event_page_flip { + struct drm_event base; + __u64 user_data; + __u32 tv_sec; + __u32 tv_usec; + __u32 frame; +}; + /* typedef area */ #ifndef __KERNEL__ typedef struct drm_clip_rect drm_clip_rect_t; diff --git a/include/drm/drmP.h b/include/drm/drmP.h index 45b67d9..4ff43ab 100644 --- a/include/drm/drmP.h +++ b/include/drm/drmP.h @@ -402,6 +402,14 @@ struct drm_buf_entry { struct drm_freelist freelist; }; +/* Event queued up for userspace to read */ +struct drm_pending_event { + struct drm_event *event; + struct list_head link; + struct drm_file *file_priv; + void (*destroy) (struct drm_pending_event *event); +}; + /** File private data */ struct drm_file { int authenticated; @@ -425,6 +433,9 @@ struct drm_file { struct drm_master *master; /* master this node is currently associated with N.B. not always minor->master */ struct list_head fbs; + + wait_queue_head_t event_wait; + struct list_head event_list; }; /** Wait queue */ @@ -873,6 +884,16 @@ struct drm_minor { struct drm_mode_group mode_group; }; +struct drm_pending_flip { + struct drm_pending_event pending_event; + struct drm_framebuffer *old_fb; + struct drm_crtc *crtc; + u32 frame; + int pipe; + struct list_head link; + struct drm_event_page_flip event; +}; + /** * DRM device structure. This structure represent a complete card that * may contain multiple heads. 
@@ -972,6 +993,13 @@ struct drm_device { u32 max_vblank_count; /**< size of vblank counter register */ + struct work_struct flip_work; + + /** + * List of objects waiting on flip completion + */ + struct list_head flip_list; + /*@} */ cycles_t ctx_start; cycles_t lck_start; @@ -1108,6 +1136,8 @@ extern int drm_lastclose(struct drm_device *dev); extern int drm_open(struct inode *inode, struct file *filp); extern int drm_stub_open(struct inode *inode, struct file *filp); extern int drm_fasync(int fd, struct file *filp, int on); +extern ssize_t drm_read(struct file *filp, char __user *buffer, + size_t count, loff_t *offset); extern int drm_release(struct inode *inode, struct file *filp); /* Mapping support (drm_vm.h) */ @@ -1274,6 +1304,8 @@ extern void drm_vblank_pre_modeset(struct drm_device *dev, int crtc); extern void drm_vblank_post_modeset(struct drm_device *dev, int crtc); extern int drm_modeset_ctl(struct drm_device *dev, void *data, struct drm_file *file_priv); +extern void drm_finish_pending_flip(struct drm_device *dev, + struct drm_pending_flip *f, u32 frame); /* AGP/GART support (drm_agpsupport.h) */ extern struct drm_agp_head *drm_agp_init(struct drm_device *dev); diff --git a/include/drm/drm_crtc.h b/include/drm/drm_crtc.h index 7300fb8..0b5dc47 100644 --- a/include/drm/drm_crtc.h +++ b/include/drm/drm_crtc.h @@ -238,6 +238,12 @@ struct drm_display_info { }; struct drm_framebuffer_funcs { + /* + * Unpin the old fb after setting a mode. Must be called + * after the old framebuffer is no longer visible, ie, after + * the next vblank, typically. + */ + void (*unpin)(struct drm_framebuffer *fb); void (*destroy)(struct drm_framebuffer *framebuffer); int (*create_handle)(struct drm_framebuffer *fb, struct drm_file *file_priv, @@ -288,6 +294,7 @@ struct drm_property { struct drm_crtc; struct drm_connector; struct drm_encoder; +struct drm_pending_flip; /** * drm_crtc_funcs - control CRTCs for a given device @@ -331,17 +338,29 @@ struct drm_crtc_funcs { void (*destroy)(struct drm_crtc *crtc); int (*set_config)(struct drm_mode_set *set); + + /* + * Move the crtc on the current fb to the given position. + * This function is optional. If old_fb is provided, the + * function will wait for vblank and unpin it. If old_fb is + * NULL, nothing is unpinned and the caller must call + * mode_unpin_fb to release the old framebuffer. + */ + int (*set_base)(struct drm_crtc *crtc, int x, int y, + struct drm_framebuffer *old_fb); }; /** * drm_crtc - central CRTC control structure * @enabled: is this CRTC enabled? + * @pipe: pipe number (as seen by DRM vblank functions) * @x: x position on screen * @y: y position on screen * @desired_mode: new desired mode * @desired_x: desired x for desired_mode * @desired_y: desired y for desired_mode * @funcs: CRTC control functions + * @async_work: work queue for async set base calls * * Each CRTC may have one or more connectors associated with it. This structure * allows the CRTC to be controlled. 
@@ -359,6 +378,7 @@ struct drm_crtc { struct drm_display_mode mode; + int pipe; int x, y; struct drm_display_mode *desired_mode; int desired_x, desired_y; @@ -368,6 +388,10 @@ struct drm_crtc { uint32_t gamma_size; uint16_t *gamma_store; + /* Allow async set_pipe_base calls for flipping */ + struct work_struct async_flip; + struct drm_pending_flip *pending_flip; + /* if you are using the helper */ void *helper_private; }; @@ -589,6 +613,7 @@ struct drm_mode_config { extern void drm_crtc_init(struct drm_device *dev, struct drm_crtc *crtc, + int pipe, const struct drm_crtc_funcs *funcs); extern void drm_crtc_cleanup(struct drm_crtc *crtc); @@ -736,4 +761,6 @@ extern int drm_mode_gamma_get_ioctl(struct drm_device *dev, extern int drm_mode_gamma_set_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv); extern bool drm_detect_hdmi_monitor(struct edid *edid); +extern int drm_mode_page_flip_ioctl(struct drm_device *dev, void *data, + struct drm_file *file_priv); #endif /* __DRM_CRTC_H__ */ diff --git a/include/drm/drm_crtc_helper.h b/include/drm/drm_crtc_helper.h index 6769ff6..dd10566 100644 --- a/include/drm/drm_crtc_helper.h +++ b/include/drm/drm_crtc_helper.h @@ -123,4 +123,8 @@ static inline void drm_connector_helper_add(struct drm_connector *connector, } extern int drm_helper_resume_force_mode(struct drm_device *dev); + +extern int drm_crtc_helper_set_base(struct drm_crtc *crtc, int x, int y, + struct drm_framebuffer *old_fb); + #endif diff --git a/include/drm/drm_mode.h b/include/drm/drm_mode.h index ae304cc..464b779 100644 --- a/include/drm/drm_mode.h +++ b/include/drm/drm_mode.h @@ -265,4 +265,20 @@ struct drm_mode_crtc_lut { __u64 blue; }; +#define DRM_MODE_PAGE_FLIP_WAIT (1<<0) /* block on previous page flip */ +#define DRM_MODE_PAGE_FLIP_FLAGS_MASK (DRM_MODE_PAGE_FLIP_WAIT) + +struct drm_mode_page_flip { + /** Handle of new front buffer */ + __u32 fb_id; + __u32 crtc_id; + + /* 64 bit cookie returned to userspace in the page flip event. */ + __u64 user_data; + /** + * page flip flags (wait on flip only for now) + */ + __u32 flags; +}; + #endif -- 1.6.4 |
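For context, a hedged sketch of how a userspace client might drive the proposed interface, assuming the additions from this patch (struct drm_mode_page_flip, DRM_IOCTL_MODE_PAGE_FLIP and struct drm_event_page_flip) are available in the installed drm headers. fd, crtc_id and fb_id are assumed to come from earlier modesetting calls, and the caller must be DRM master; this is illustrative, not the actual libdrm helper.

    #include <poll.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <drm/drm.h>
    #include <drm/drm_mode.h>

    /* Schedule a flip of crtc_id to fb_id at the next vblank (returns immediately). */
    static int schedule_flip(int fd, uint32_t crtc_id, uint32_t fb_id)
    {
            struct drm_mode_page_flip flip;

            memset(&flip, 0, sizeof flip);
            flip.crtc_id = crtc_id;
            flip.fb_id = fb_id;
            flip.flags = 0;
            flip.user_data = 0x12345678;    /* echoed back in the flip event */

            return ioctl(fd, DRM_IOCTL_MODE_PAGE_FLIP, &flip);
    }

    /* Block in the main loop until the flip-completion event arrives on the drm fd. */
    static void wait_for_flip(int fd)
    {
            struct pollfd pfd = { .fd = fd, .events = POLLIN };
            char buffer[64];
            struct drm_event_page_flip *event;

            poll(&pfd, 1, -1);      /* drm_poll() reports POLLIN once the event is queued */

            if (read(fd, buffer, sizeof buffer) < (ssize_t) sizeof(struct drm_event))
                    return;

            event = (struct drm_event_page_flip *) buffer;
            if (event->base.type == DRM_EVENT_MODE_PAGE_FLIP)
                    printf("flip completed on frame %u, user_data 0x%llx\n",
                           event->frame, (unsigned long long) event->user_data);
    }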
From: Thomas H. <th...@sh...> - 2009-08-27 12:53:42
|
Kristian Høgsberg skrev: > --- > > Okay, here's the patch again with the locking fixed. Everything else is > the same. As for the potential problem with polling on the drm fd with > both dri1 and dri2 enabled, I suggest that we just disable the dri1 > sigio handler in the xserver. We'll do this in a commit that precedes the > merge of the page flipping code, so that no server will be able to poll > on the drm fd in both dri1 and dri2 code. > > cheers, > Kristian > <Grr> 1) The locking problems are still not fixed. See below. I just did a quick look, so the comment list is not complete. 2) The 64/32-bit EFAULT problems mentioned are ignored. See below. </Grr> Even if that is fixed, I believe this patch adds a framework that doesn't belong in a modern day graphical subsystem for the following reasons, many of them already mentioned, but let me reiterate. Although the dri1 compatibility issue will be fixed with your suggestion. a) It puts requirements on user-space for correct functionality of DRM ioctls. I'm talking about the master needing to parse the event queue. b) It requires the master to act as a scheduler, and circumvents the DRM command submission mechanism through the delayed unpin callback. If this is to workaround any inability of GEM to serve a command submission while a previous command submission is blocked in the kernel, then IMHO that should be fixed and not worked around. c) The implementation will mostly be worked around with capable schedulers and hardware. A couple of questions: Why are you guys so reluctant to use kernel scheduling for this instead of a mix of kernel / user-space scheduling? If the plan is to eliminate DRI2GetBuffers() once per frame, what will then be used to block clients rendering to the old back buffer? > drivers/gpu/drm/drm_crtc.c | 171 ++++++++++++++++++++++++++++++- > drivers/gpu/drm/drm_crtc_helper.c | 10 ++ > drivers/gpu/drm/drm_drv.c | 1 + > drivers/gpu/drm/drm_fops.c | 68 ++++++++++++- > drivers/gpu/drm/drm_irq.c | 42 ++++++++ > drivers/gpu/drm/i915/i915_drv.c | 1 + > drivers/gpu/drm/i915/intel_display.c | 16 +++- > drivers/gpu/drm/radeon/radeon_display.c | 3 +- > include/drm/drm.h | 25 +++++ > include/drm/drmP.h | 32 ++++++ > include/drm/drm_crtc.h | 27 +++++ > include/drm/drm_crtc_helper.h | 4 + > include/drm/drm_mode.h | 16 +++ > 13 files changed, 409 insertions(+), 7 deletions(-) > > diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c > index 8fab789..d877c21 100644 > --- a/drivers/gpu/drm/drm_crtc.c > +++ b/drivers/gpu/drm/drm_crtc.c > @@ -34,6 +34,8 @@ > #include "drmP.h" > #include "drm_crtc.h" > > +#undef set_base > + > struct drm_prop_enum_list { > int type; > char *name; > @@ -342,6 +344,35 @@ void drm_framebuffer_cleanup(struct drm_framebuffer *fb) > EXPORT_SYMBOL(drm_framebuffer_cleanup); > > /** > + * drm_crtc_async_flip - do a set_base call from a work queue > + * @work: work struct > + * > + * Called when a set_base call is queued by the page flip code. This > + * allows the flip ioctl itself to return immediately and allow userspace > + * to continue working. 
> + */ > +static void drm_crtc_async_flip(struct work_struct *work) > +{ > + struct drm_crtc *crtc = container_of(work, struct drm_crtc, async_flip); > + struct drm_device *dev = crtc->dev; > + struct drm_pending_flip *pending; > + > + mutex_lock(&dev->mode_config.mutex); > + > + BUG_ON(crtc->pending_flip == NULL); > + > + crtc->funcs->set_base(crtc, crtc->x, crtc->y, NULL); > + > + pending = crtc->pending_flip; > + crtc->pending_flip = NULL; > + > + pending->frame = drm_vblank_count(dev, crtc->pipe); > What's protecting dev->flip_list here? > + list_add_tail(&pending->link, &dev->flip_list); > + > + mutex_unlock(&dev->mode_config.mutex); > +} > + > > + > > #define DRM_CORE_IOCTL_COUNT ARRAY_SIZE( drm_ioctls ) > diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c > index 251bc0e..dcd9c66 100644 > --- a/drivers/gpu/drm/drm_fops.c > +++ b/drivers/gpu/drm/drm_fops.c > @@ -257,6 +257,8 @@ static int drm_open_helper(struct inode *inode, struct file *filp, > > INIT_LIST_HEAD(&priv->lhead); > INIT_LIST_HEAD(&priv->fbs); > + INIT_LIST_HEAD(&priv->event_list); > + init_waitqueue_head(&priv->event_wait); > > if (dev->driver->driver_features & DRIVER_GEM) > drm_gem_open(dev, priv); > @@ -429,6 +431,9 @@ int drm_release(struct inode *inode, struct file *filp) > { > struct drm_file *file_priv = filp->private_data; > struct drm_device *dev = file_priv->minor->dev; > + struct drm_pending_flip *f, *ft; > + struct drm_pending_event *e, *et; > + > int retcode = 0; > > lock_kernel(); > @@ -451,6 +456,19 @@ int drm_release(struct inode *inode, struct file *filp) > if (file_priv->minor->master) > drm_master_release(dev, filp); > > And here? What's the locking order between mode_config mutex and the struct_mutex? > + mutex_lock(&dev->struct_mutex); > + > + /* Remove pending flips */ > + list_for_each_entry_safe(f, ft, &dev->flip_list, link) > + if (f->pending_event.file_priv == file_priv) > + drm_finish_pending_flip(dev, f, 0); > + > + /* Remove unconsumed events */ > + list_for_each_entry_safe(e, et, &file_priv->event_list, link) > + e->destroy(e); > + > + mutex_unlock(&dev->struct_mutex); > + > if (dev->driver->driver_features & DRIVER_GEM) > drm_gem_release(dev, file_priv); > > @@ -544,9 +562,55 @@ int drm_release(struct inode *inode, struct file *filp) > } > EXPORT_SYMBOL(drm_release); > > -/** No-op. */ > +ssize_t drm_read(struct file *filp, char __user *buffer, > + size_t count, loff_t *offset) > +{ > + struct drm_file *file_priv = filp->private_data; > + struct drm_device *dev = file_priv->minor->dev; > + struct drm_pending_event *event; > + ssize_t total, ret; > + > + ret = wait_event_interruptible(file_priv->event_wait, > + !list_empty(&file_priv->event_list)); > + if (ret < 0) > + return ret; > + > + total = 0; > What's protecting file_priv->event_list here? 
> + while (!list_empty(&file_priv->event_list)) { > + mutex_lock(&dev->struct_mutex); > + event = list_first_entry(&file_priv->event_list, > + struct drm_pending_event, link); > + if (total + event->event->length > count) { > + mutex_unlock(&dev->struct_mutex); > + break; > + } > + list_del(&event->link); > + mutex_unlock(&dev->struct_mutex); > + > + if (copy_to_user(buffer + total, > + event->event, event->event->length)) { > + total = -EFAULT; > + break; > + } > + > + total += event->event->length; > + event->destroy(event); > + } > + > + return total; > +} > +EXPORT_SYMBOL(drm_read); > + > unsigned int drm_poll(struct file *filp, struct poll_table_struct *wait) > { > - return 0; > + struct drm_file *file_priv = filp->private_data; > + unsigned int mask = 0; > + > + poll_wait(filp, &file_priv->event_wait, wait); > + > And here? > + if (!list_empty(&file_priv->event_list)) > + mask |= POLLIN | POLLRDNORM; > + > + return mask; > } > EXPORT_SYMBOL(drm_poll); > diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c > index b4a3dbc..a0c120c 100644 > --- a/drivers/gpu/drm/drm_irq.c > +++ b/drivers/gpu/drm/drm_irq.c > @@ -34,6 +34,7 @@ > */ > > #include "drmP.h" > +#include "drm_crtc_helper.h" > > #include <linux/interrupt.h> /* For task queue support */ > > @@ -71,6 +72,44 @@ int drm_irq_by_busid(struct drm_device *dev, void *data, > return 0; > } > > +#define vblank_passed(a,b) ((long)(a - b) > 0) > + > +void drm_finish_pending_flip(struct drm_device *dev, > + struct drm_pending_flip *f, u32 frame) > +{ > + struct timeval now; > + > + f->event.frame = frame; > + do_gettimeofday(&now); > + f->event.tv_sec = now.tv_sec; > + f->event.tv_usec = now.tv_usec; > + drm_vblank_put(dev, f->pipe); > + list_del_init(&f->link); > What's protecting the file_priv->event_list here? > + list_add_tail(&f->pending_event.link, > + &f->pending_event.file_priv->event_list); > + if (f->old_fb) > + f->old_fb->funcs->unpin(f->old_fb); > + wake_up_interruptible(&f->pending_event.file_priv->event_wait); > +} > + > +/** > + * Header for events written back to userspace on the drm fd. The > + * type defines the type of event, the length specifies the total > + * length of the event (including the header), and user_data is > + * typically a 64 bit value passed with the ioctl that triggered the > + * event. A read on the drm fd will always only return complete > + * events, that is, if for example the read buffer is 100 bytes, and > + * there are two 64 byte events pending, only one will be returned. > + */ > +struct drm_event { > + __u32 type; > + __u32 length; > +}; > + > +#define DRM_EVENT_MODE_PAGE_FLIP 0x01 > What is user-space required to do if kernel- and user-space disagrees about the size of the event, as may be the case here: > + > +struct drm_event_page_flip { > + struct drm_event base; > + __u64 user_data; > + __u32 tv_sec; > + __u32 tv_usec; > + __u32 frame; > +}; > + > > The size of this struct differs in 64-bit and 32-bit mode. May cause EFAULTs. > +struct drm_mode_page_flip { > + /** Handle of new front buffer */ > + __u32 fb_id; > + __u32 crtc_id; > + > + /* 64 bit cookie returned to userspace in the page flip event. */ > + __u64 user_data; > + /** > + * page flip flags (wait on flip only for now) > + */ > + __u32 flags; > +}; > + > #endif > cheers, Thomas. . |
From: Kristian H. <kr...@bi...> - 2009-08-27 21:33:32
|
2009/8/27 Thomas Hellström <th...@sh...>: > Kristian Høgsberg skrev: >> --- >> >> Okay, here's the patch again with the locking fixed. Everything else is >> the same. As for the potential problem with polling on the drm fd with >> both dri1 and dri2 enabled, I suggest that we just disable the dri1 >> sigio handler in the xserver. We'll do this in a commit that precedes the >> merge of the page flipping code, so that no server will be able to poll >> on the drm fd in both dri1 and dri2 code. >> >> cheers, >> Kristian >> > <Grr> > > 1) The locking problems are still not fixed. See below. I just did a > quick look, so the comment list is not complete. It's not as bad as you think. There is one glitch where I changed struct_mutex to mode_config.mutex and left dev->flip_list unprotected. The other issues you comment on should be fine, I've followed up in detail below. > 2) The 64/32-bit EFAULT problems mentioned are ignored. See below. Not ignored, they were good comments, I just forgot about them in the heat of the discussion. I'll pad the structs. > </Grr> > > Even if that is fixed, I believe this patch adds a framework that > doesn't belong in a modern day graphical subsystem for the following > reasons, many of them already mentioned, but let me reiterate. Although > the dri1 compatibility issue will be fixed with your suggestion. > > a) It puts requirements on user-space for correct functionality of DRM > ioctls. I'm talking about the master needing to parse the event queue. Don't we already have a lot of requirements on userspace to use the DRM? It's not exactly trivial to prepare a execbuffer ioctl, or to find a connector, crtc and mode and set that, for example. Part of this work is a new libdrm entry point to read an parse the events from the fd. See the dri2-swapbuffer branch in drm.git. > b) It requires the master to act as a scheduler, and circumvents the DRM > command submission mechanism through the delayed unpin callback. If this > is to workaround any inability of GEM to serve a command submission > while a previous command submission is blocked in the kernel, then IMHO > that should be fixed and not worked around. It's not about workarounds. Your suggestion *blocks the hw* while waiting for vsync. My patch doesn't do that, it lets other clients submit rendering to pixmap or backbuffer BOs that aren't involved in the pending page flip. If you're willing to block the command stream, GEM can keep the buffer pinned for you until the swap happens just fine. Just like it does for any other command - it's not about that. In fact, I think using the scheduler to keep buffers pinned for scanout is conflating things. The scheduler pulls buffers in and out of the aperture so that they are there for the GPU when it needs to access them. Pinning and unpinning buffers for scanout is a different matter. > c) The implementation will mostly be worked around with capable > schedulers and hardware. This is not about the capability of the hw or the sw. The design is deliberate. > A couple of questions: > Why are you guys so reluctant to use kernel scheduling for this instead > of a mix of kernel / user-space scheduling? I'm not opposed to kernel scheduling as well, but userspace needs this event. I've made the case for AIGLX in the X server already, and direct rendering clients (for example, compositors) that want to handle input for as long as possible and avoid blocking has the same needs. 
> If the plan is to eliminate DRI2GetBuffers() once per frame, what will > then be used to block clients rendering to the old back buffer? There'll be an event that's sent back after each DRI2SwapBuffers and the clients will block on receiving that event. We still need to send a request to the xserver and receive confirmation that the xserver has received it before we can render again. DRI2GetBuffers is a request that expects a reply and will block the client on the xserver when we call it. DRI2SwapBuffers is an async request, i.e. there's no reply and calling it won't necessarily block the client. We still have to wait for the new event before we can go on rendering, but doing it this way makes the client and server less tightly coupled. We may end up doing the roundtrip between client and server at a point where the client was going to block anyway (like disk i/o or something), saving a context switch. 
>> + list_add_tail(&pending->link, &dev->flip_list); >> + >> + mutex_unlock(&dev->mode_config.mutex); >> +} >> + >> >> + > >> >> #define DRM_CORE_IOCTL_COUNT ARRAY_SIZE( drm_ioctls ) >> diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c >> index 251bc0e..dcd9c66 100644 >> --- a/drivers/gpu/drm/drm_fops.c >> +++ b/drivers/gpu/drm/drm_fops.c >> @@ -257,6 +257,8 @@ static int drm_open_helper(struct inode *inode, struct file *filp, >> >> INIT_LIST_HEAD(&priv->lhead); >> INIT_LIST_HEAD(&priv->fbs); >> + INIT_LIST_HEAD(&priv->event_list); >> + init_waitqueue_head(&priv->event_wait); >> >> if (dev->driver->driver_features & DRIVER_GEM) >> drm_gem_open(dev, priv); >> @@ -429,6 +431,9 @@ int drm_release(struct inode *inode, struct file *filp) >> { >> struct drm_file *file_priv = filp->private_data; >> struct drm_device *dev = file_priv->minor->dev; >> + struct drm_pending_flip *f, *ft; >> + struct drm_pending_event *e, *et; >> + >> int retcode = 0; >> >> lock_kernel(); >> @@ -451,6 +456,19 @@ int drm_release(struct inode *inode, struct file *filp) >> if (file_priv->minor->master) >> drm_master_release(dev, filp); >> >> > > And here? What's the locking order between mode_config mutex and the > struct_mutex? Here it's protected by struct_mutex as intended. What is your concern about the locking order? >> + mutex_lock(&dev->struct_mutex); >> + >> + /* Remove pending flips */ >> + list_for_each_entry_safe(f, ft, &dev->flip_list, link) >> + if (f->pending_event.file_priv == file_priv) >> + drm_finish_pending_flip(dev, f, 0); >> + >> + /* Remove unconsumed events */ >> + list_for_each_entry_safe(e, et, &file_priv->event_list, link) >> + e->destroy(e); >> + >> + mutex_unlock(&dev->struct_mutex); >> + >> if (dev->driver->driver_features & DRIVER_GEM) >> drm_gem_release(dev, file_priv); >> >> @@ -544,9 +562,55 @@ int drm_release(struct inode *inode, struct file *filp) >> } >> EXPORT_SYMBOL(drm_release); >> >> -/** No-op. */ >> +ssize_t drm_read(struct file *filp, char __user *buffer, >> + size_t count, loff_t *offset) >> +{ >> + struct drm_file *file_priv = filp->private_data; >> + struct drm_device *dev = file_priv->minor->dev; >> + struct drm_pending_event *event; >> + ssize_t total, ret; >> + >> + ret = wait_event_interruptible(file_priv->event_wait, >> + !list_empty(&file_priv->event_list)); >> + if (ret < 0) >> + return ret; >> + >> + total = 0; >> > > What's protecting file_priv->event_list here? You can test for list emptiness without taking the lock. >> + while (!list_empty(&file_priv->event_list)) { >> + mutex_lock(&dev->struct_mutex); >> + event = list_first_entry(&file_priv->event_list, >> + struct drm_pending_event, link); >> + if (total + event->event->length > count) { >> + mutex_unlock(&dev->struct_mutex); >> + break; >> + } >> + list_del(&event->link); >> + mutex_unlock(&dev->struct_mutex); >> + >> + if (copy_to_user(buffer + total, >> + event->event, event->event->length)) { >> + total = -EFAULT; >> + break; >> + } >> + >> + total += event->event->length; >> + event->destroy(event); >> + } >> + >> + return total; >> +} >> +EXPORT_SYMBOL(drm_read); >> + >> unsigned int drm_poll(struct file *filp, struct poll_table_struct *wait) >> { >> - return 0; >> + struct drm_file *file_priv = filp->private_data; >> + unsigned int mask = 0; >> + >> + poll_wait(filp, &file_priv->event_wait, wait); >> + >> > > And here? Again, testing list emptiness. 
>> + if (!list_empty(&file_priv->event_list)) >> + mask |= POLLIN | POLLRDNORM; >> + >> + return mask; >> } >> EXPORT_SYMBOL(drm_poll); >> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c >> index b4a3dbc..a0c120c 100644 >> --- a/drivers/gpu/drm/drm_irq.c >> +++ b/drivers/gpu/drm/drm_irq.c >> @@ -34,6 +34,7 @@ >> */ >> >> #include "drmP.h" >> +#include "drm_crtc_helper.h" >> >> #include <linux/interrupt.h> /* For task queue support */ >> >> @@ -71,6 +72,44 @@ int drm_irq_by_busid(struct drm_device *dev, void *data, >> return 0; >> } >> >> +#define vblank_passed(a,b) ((long)(a - b) > 0) >> + >> +void drm_finish_pending_flip(struct drm_device *dev, >> + struct drm_pending_flip *f, u32 frame) >> +{ >> + struct timeval now; >> + >> + f->event.frame = frame; >> + do_gettimeofday(&now); >> + f->event.tv_sec = now.tv_sec; >> + f->event.tv_usec = now.tv_usec; >> + drm_vblank_put(dev, f->pipe); >> + list_del_init(&f->link); >> > > What's protecting the file_priv->event_list here? This function is always called with the struct_mutex held. >> + list_add_tail(&f->pending_event.link, >> + &f->pending_event.file_priv->event_list); >> + if (f->old_fb) >> + f->old_fb->funcs->unpin(f->old_fb); >> + wake_up_interruptible(&f->pending_event.file_priv->event_wait); >> +} >> + >> +/** >> + * Header for events written back to userspace on the drm fd. The >> + * type defines the type of event, the length specifies the total >> + * length of the event (including the header), and user_data is >> + * typically a 64 bit value passed with the ioctl that triggered the >> + * event. A read on the drm fd will always only return complete >> + * events, that is, if for example the read buffer is 100 bytes, and >> + * there are two 64 byte events pending, only one will be returned. >> + */ >> +struct drm_event { >> + __u32 type; >> + __u32 length; >> +}; >> + >> +#define DRM_EVENT_MODE_PAGE_FLIP 0x01 >> > > What is user-space required to do if kernel- and user-space disagrees > about the size of the event, as may be the case here: There's a 'length' field in the event header. >> + >> +struct drm_event_page_flip { >> + struct drm_event base; >> + __u64 user_data; >> + __u32 tv_sec; >> + __u32 tv_usec; >> + __u32 frame; >> +}; >> + >> >> > > > The size of this struct differs in 64-bit and 32-bit mode. May cause > EFAULTs. > > >> +struct drm_mode_page_flip { >> + /** Handle of new front buffer */ >> + __u32 fb_id; >> + __u32 crtc_id; >> + >> + /* 64 bit cookie returned to userspace in the page flip event. */ >> + __u64 user_data; >> + /** >> + * page flip flags (wait on flip only for now) >> + */ >> + __u32 flags; >> +}; >> + >> #endif >> > cheers, > Thomas. > . > > > > ------------------------------------------------------------------------------ > Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day > trial. Simplify your report design, integration and deployment - and focus on > what you do best, core application coding. Discover what's new with > Crystal Reports now. http://p.sf.net/sfu/bobj-july > -- > _______________________________________________ > Dri-devel mailing list > Dri...@li... > https://lists.sourceforge.net/lists/listinfo/dri-devel > |
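As a concrete illustration of the length-field argument, a hedged sketch of how userspace can walk a read() buffer using each event's self-declared length, so an event struct that grows in a newer kernel, or an unknown event type, is skipped rather than misparsed. It assumes the drm_event definitions from the patch above.

    #include <stdio.h>
    #include <unistd.h>
    #include <drm/drm.h>

    static void process_events(int fd)
    {
            char buffer[1024];
            ssize_t len, i = 0;

            /* drm_read() only returns whole events, so the buffer holds 0..n events. */
            len = read(fd, buffer, sizeof buffer);

            while (i + (ssize_t) sizeof(struct drm_event) <= len) {
                    struct drm_event *e = (struct drm_event *) &buffer[i];

                    switch (e->type) {
                    case DRM_EVENT_MODE_PAGE_FLIP: {
                            struct drm_event_page_flip *flip = (void *) e;
                            printf("flip done: frame %u\n", flip->frame);
                            break;
                    }
                    default:
                            /* Unknown or newer event type: just skip it. */
                            break;
                    }

                    i += e->length; /* advance by the kernel's length, not by sizeof */
            }
    }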
From: Thomas H. <th...@sh...> - 2009-08-28 08:15:38
|
Kristian Høgsberg skrev: > 2009/8/27 Thomas Hellström <th...@sh...>: > >> Kristian Høgsberg skrev: >> >>> --- >>> >>> Okay, here's the patch again with the locking fixed. Everything else is >>> the same. As for the potential problem with polling on the drm fd with >>> both dri1 and dri2 enabled, I suggest that we just disable the dri1 >>> sigio handler in the xserver. We'll do this in a commit that precedes the >>> merge of the page flipping code, so that no server will be able to poll >>> on the drm fd in both dri1 and dri2 code. >>> >>> cheers, >>> Kristian >>> >>> >> <Grr> >> >> 1) The locking problems are still not fixed. See below. I just did a >> quick look, so the comment list is not complete. >> > > It's not as bad as you think. There is one glitch where I changed > struct_mutex to mode_config.mutex and left dev->flip_list unprotected. > The other issues you comment on should be fine, I've followed up in > detail below. > > I actually think it is. See below. > > >> </Grr> >> >> Even if that is fixed, I believe this patch adds a framework that >> doesn't belong in a modern day graphical subsystem for the following >> reasons, many of them already mentioned, but let me reiterate. Although >> the dri1 compatibility issue will be fixed with your suggestion. >> >> a) It puts requirements on user-space for correct functionality of DRM >> ioctls. I'm talking about the master needing to parse the event queue. >> > > Don't we already have a lot of requirements on userspace to use the > DRM? It's not exactly trivial to prepare a execbuffer ioctl, or to > find a connector, crtc and mode and set that, for example. Part of > this work is a new libdrm entry point to read an parse the events from > the fd. See the dri2-swapbuffer branch in drm.git. > > >> b) It requires the master to act as a scheduler, and circumvents the DRM >> command submission mechanism through the delayed unpin callback. If this >> is to workaround any inability of GEM to serve a command submission >> while a previous command submission is blocked in the kernel, then IMHO >> that should be fixed and not worked around. >> > > It's not about workarounds. Your suggestion *blocks the hw* while > waiting for vsync. No it doesn't. It blocks dri clients when they try to render to old fronts. Other dri clients would continue rendering. It provides a natural migration path to triple buffering where automagically nothing is blocked, and also to advanced software schedulers that can buffer command submissions instead of blocking. Now the concern about GEM was that if the kernel takes a global mutex _before_ blocking a client and doesn't release that mutex when the client is blocked, all rendering will naturally be blocked as a consequence. What my suggestion *does* is to block the X server if AIGLX tries to render to the old front, and the command scheduler is simple. Now I'm not completely sure how serious that is, given that AIGLX will, as stated before, frequently block the X server anyway since it's not running in a separate thread. >> c) The implementation will mostly be worked around with capable >> schedulers and hardware. >> > > This is not about the capability of the hw or the sw. The design is deliberate. > What I mean is that if I don't want the kernel code to do delayed unpins, because my cs already handles that, and I don't want the X server to block clients because my cs or hardware will handle that, I would do my very best to work around this code. 
>> A couple of questions: >> Why are you guys so reluctant to use kernel scheduling for this instead >> of a mix of kernel / user-space scheduling? >> > > I'm not opposed to kernel scheduling as well, but userspace needs this > event. I've made the case for AIGLX in the X server already, and > direct rendering clients (for example, compositors) that want to > handle input for as long as possible and avoid blocking has the same > needs. > I've still failed to understand this. Let's _assume_ for a while that the kernel handles scheduling perfectly in a non-blocking fashion, or let's assume we have triple-buffering, or let's assume we have a multi-FIFO card that can do pageflipping and vblank barriers on all FIFOs. Why then exactly are events needed? and why are we required to track the progress of the command fifo with events like jbarnes suggests, and finally why is this mechanism not needed in the non-pageflipping case? If you can give a typical use-case that would probably help a lot and avoid confusion. > >> If the plan is to eliminate DRI2GetBuffers() once per frame, what will >> then be used to block clients rendering to the old back buffer? >> > > There'll be an event that's sent back after each DRI2SwapBuffer and > the clients will block on receiving that event. We still need to send > a request to the xserver and receive confirmation that the xserver has > received it before we can render again. The above is to make sure the swap is scheduled before any continued rendering, right? > DRI2GetBuffers is a request > that expects a reply and will block the client on the xserver when we > call it. DRI2SwapBuffers is an async request, ie there's no reply and > calling it wont block necessarily the client. We still have to wait > for the new event before we can go on rendering, but doing it this way > makes the client and server less tightly coupled. We may end up doing > the roundtrip between client and server at a point where the client > was going to block anyway (like disk i/o or something) saving a > context switch. > > Hmm. I don't understand fully. So up to now, my picture of how a frame was rendered looks like this. swapBuffers(); if (check_for_needed_getbuffers()) getbuffers(); render(); swapBuffers(); This is one X call per frame in the steady-state case. Now, where do you add the dri2 pageflip throttling if we don't need to call getbuffers()? Is it in check_for_needed_getbuffers()? 
>> >> >>> drivers/gpu/drm/drm_crtc.c | 171 ++++++++++++++++++++++++++++++- >>> drivers/gpu/drm/drm_crtc_helper.c | 10 ++ >>> drivers/gpu/drm/drm_drv.c | 1 + >>> drivers/gpu/drm/drm_fops.c | 68 ++++++++++++- >>> drivers/gpu/drm/drm_irq.c | 42 ++++++++ >>> drivers/gpu/drm/i915/i915_drv.c | 1 + >>> drivers/gpu/drm/i915/intel_display.c | 16 +++- >>> drivers/gpu/drm/radeon/radeon_display.c | 3 +- >>> include/drm/drm.h | 25 +++++ >>> include/drm/drmP.h | 32 ++++++ >>> include/drm/drm_crtc.h | 27 +++++ >>> include/drm/drm_crtc_helper.h | 4 + >>> include/drm/drm_mode.h | 16 +++ >>> 13 files changed, 409 insertions(+), 7 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c >>> index 8fab789..d877c21 100644 >>> --- a/drivers/gpu/drm/drm_crtc.c >>> +++ b/drivers/gpu/drm/drm_crtc.c >>> @@ -34,6 +34,8 @@ >>> #include "drmP.h" >>> #include "drm_crtc.h" >>> >>> +#undef set_base >>> + >>> struct drm_prop_enum_list { >>> int type; >>> char *name; >>> @@ -342,6 +344,35 @@ void drm_framebuffer_cleanup(struct drm_framebuffer *fb) >>> EXPORT_SYMBOL(drm_framebuffer_cleanup); >>> >>> /** >>> + * drm_crtc_async_flip - do a set_base call from a work queue >>> + * @work: work struct >>> + * >>> + * Called when a set_base call is queued by the page flip code. This >>> + * allows the flip ioctl itself to return immediately and allow userspace >>> + * to continue working. >>> + */ >>> +static void drm_crtc_async_flip(struct work_struct *work) >>> +{ >>> + struct drm_crtc *crtc = container_of(work, struct drm_crtc, async_flip); >>> + struct drm_device *dev = crtc->dev; >>> + struct drm_pending_flip *pending; >>> + >>> + mutex_lock(&dev->mode_config.mutex); >>> + >>> + BUG_ON(crtc->pending_flip == NULL); >>> + >>> + crtc->funcs->set_base(crtc, crtc->x, crtc->y, NULL); >>> + >>> + pending = crtc->pending_flip; >>> + crtc->pending_flip = NULL; >>> + >>> + pending->frame = drm_vblank_count(dev, crtc->pipe); >>> >>> >> What's protecting dev->flip_list here? >> > > That should be struct_mutex. As mentioned above, I missed it when I > changed the mutex to mode_config.mutex. > Hence the locking order question. If there is a need to keep the mode_config mutex held _as well_ across the list insertion, since crtc->pending_flip.link is modified at list insertion. That really depends on what's protecting that member. > >>> + list_add_tail(&pending->link, &dev->flip_list); >>> + >>> + mutex_unlock(&dev->mode_config.mutex); >>> +} >>> + >>> >>> + >>> >>> #define DRM_CORE_IOCTL_COUNT ARRAY_SIZE( drm_ioctls ) >>> diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c >>> index 251bc0e..dcd9c66 100644 >>> --- a/drivers/gpu/drm/drm_fops.c >>> +++ b/drivers/gpu/drm/drm_fops.c >>> @@ -257,6 +257,8 @@ static int drm_open_helper(struct inode *inode, struct file *filp, >>> >>> INIT_LIST_HEAD(&priv->lhead); >>> INIT_LIST_HEAD(&priv->fbs); >>> + INIT_LIST_HEAD(&priv->event_list); >>> + init_waitqueue_head(&priv->event_wait); >>> >>> if (dev->driver->driver_features & DRIVER_GEM) >>> drm_gem_open(dev, priv); >>> @@ -429,6 +431,9 @@ int drm_release(struct inode *inode, struct file *filp) >>> { >>> struct drm_file *file_priv = filp->private_data; >>> struct drm_device *dev = file_priv->minor->dev; >>> + struct drm_pending_flip *f, *ft; >>> + struct drm_pending_event *e, *et; >>> + >>> int retcode = 0; >>> >>> lock_kernel(); >>> @@ -451,6 +456,19 @@ int drm_release(struct inode *inode, struct file *filp) >>> if (file_priv->minor->master) >>> drm_master_release(dev, filp); >>> >>> >>> >> And here? 
What's the locking order between mode_config mutex and the >> struct_mutex? >> > > Here it's protected by struct_mutex as intended. What is your concern > about the locking order? > > See above. >>> + mutex_lock(&dev->struct_mutex); >>> + >>> + /* Remove pending flips */ >>> + list_for_each_entry_safe(f, ft, &dev->flip_list, link) >>> + if (f->pending_event.file_priv == file_priv) >>> + drm_finish_pending_flip(dev, f, 0); >>> + >>> + /* Remove unconsumed events */ >>> + list_for_each_entry_safe(e, et, &file_priv->event_list, link) >>> + e->destroy(e); >>> + >>> + mutex_unlock(&dev->struct_mutex); >>> + >>> if (dev->driver->driver_features & DRIVER_GEM) >>> drm_gem_release(dev, file_priv); >>> >>> @@ -544,9 +562,55 @@ int drm_release(struct inode *inode, struct file *filp) >>> } >>> EXPORT_SYMBOL(drm_release); >>> >>> -/** No-op. */ >>> +ssize_t drm_read(struct file *filp, char __user *buffer, >>> + size_t count, loff_t *offset) >>> +{ >>> + struct drm_file *file_priv = filp->private_data; >>> + struct drm_device *dev = file_priv->minor->dev; >>> + struct drm_pending_event *event; >>> + ssize_t total, ret; >>> + >>> + ret = wait_event_interruptible(file_priv->event_wait, >>> + !list_empty(&file_priv->event_list)); >>> + if (ret < 0) >>> + return ret; >>> + >>> + total = 0; >>> >>> >> What's protecting file_priv->event_list here? >> > > You can test for list emptiness without taking the lock. > Are you suggesting accessing a member of a mutex protected struct without taking the mutex? Do you have a pointer to where what you say above is documented? That would be highly architecture dependent and in this particular case require the processor being capable of atomic pointer reads and writes, no processor r/w reordering and the compiler assuming the list head being declared volatile. Even if that's the case let's assume list_empty() returns true, and then another thread sneaks in and empties the queue before you take the mutex. The next thing you do is to access and try to remove "event", which doesn't exist because the list is empty. If you're lucky you'll see an oops. If not, you end up with strange memory corruption. > >>> + while (!list_empty(&file_priv->event_list)) { >>> + mutex_lock(&dev->struct_mutex); >>> + event = list_first_entry(&file_priv->event_list, >>> + struct drm_pending_event, link); >>> + if (total + event->event->length > count) { >>> + mutex_unlock(&dev->struct_mutex); >>> + break; >>> + } >>> + list_del(&event->link); >>> + mutex_unlock(&dev->struct_mutex); >>> + >>> + if (copy_to_user(buffer + total, >>> + event->event, event->event->length)) { >>> + total = -EFAULT; >>> + break; >>> + } >>> + >>> + total += event->event->length; >>> + event->destroy(event); >>> + } >>> + >>> + return total; >>> +} >>> +EXPORT_SYMBOL(drm_read); >>> + >>> unsigned int drm_poll(struct file *filp, struct poll_table_struct *wait) >>> { >>> - return 0; >>> + struct drm_file *file_priv = filp->private_data; >>> + unsigned int mask = 0; >>> + >>> + poll_wait(filp, &file_priv->event_wait, wait); >>> + >>> >>> >> And here? >> > > Again, testing list emptiness. > > See above. Although here, the consequenses probably aren't that fatal. 
>>> + if (!list_empty(&file_priv->event_list)) >>> + mask |= POLLIN | POLLRDNORM; >>> + >>> + return mask; >>> } >>> EXPORT_SYMBOL(drm_poll); >>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c >>> index b4a3dbc..a0c120c 100644 >>> --- a/drivers/gpu/drm/drm_irq.c >>> +++ b/drivers/gpu/drm/drm_irq.c >>> @@ -34,6 +34,7 @@ >>> */ >>> >>> #include "drmP.h" >>> +#include "drm_crtc_helper.h" >>> >>> #include <linux/interrupt.h> /* For task queue support */ >>> >>> @@ -71,6 +72,44 @@ int drm_irq_by_busid(struct drm_device *dev, void *data, >>> return 0; >>> } >>> >>> +#define vblank_passed(a,b) ((long)(a - b) > 0) >>> + >>> +void drm_finish_pending_flip(struct drm_device *dev, >>> + struct drm_pending_flip *f, u32 frame) >>> +{ >>> + struct timeval now; >>> + >>> + f->event.frame = frame; >>> + do_gettimeofday(&now); >>> + f->event.tv_sec = now.tv_sec; >>> + f->event.tv_usec = now.tv_usec; >>> + drm_vblank_put(dev, f->pipe); >>> + list_del_init(&f->link); >>> >>> >> What's protecting the file_priv->event_list here? >> > > This function is always called with the struct_mutex held. > No. It's not. It's called with the mode_config mutex held. |
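For reference, a minimal sketch of the locking fix acknowledged above — assuming, as Kristian says, that dev->flip_list is meant to be protected by struct_mutex while the set_base call still runs under the mode_config mutex — might look like this (illustrative only, not the actual follow-up patch):

static void drm_crtc_async_flip(struct work_struct *work)
{
	struct drm_crtc *crtc = container_of(work, struct drm_crtc, async_flip);
	struct drm_device *dev = crtc->dev;
	struct drm_pending_flip *pending;

	mutex_lock(&dev->mode_config.mutex);

	BUG_ON(crtc->pending_flip == NULL);

	crtc->funcs->set_base(crtc, crtc->x, crtc->y, NULL);

	pending = crtc->pending_flip;
	crtc->pending_flip = NULL;

	pending->frame = drm_vblank_count(dev, crtc->pipe);

	/* dev->flip_list is protected by struct_mutex, not mode_config.mutex */
	mutex_lock(&dev->struct_mutex);
	list_add_tail(&pending->link, &dev->flip_list);
	mutex_unlock(&dev->struct_mutex);

	mutex_unlock(&dev->mode_config.mutex);
}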
From: Thomas H. <th...@sh...> - 2009-08-28 12:31:42
|
Thomas Hellström skrev: > >>>> >>>> >>> What's protecting file_priv->event_list here? >>> >>> >> You can test for list emptiness without taking the lock. >> >> > > Are you suggesting accessing a member of a mutex protected struct > without taking the mutex? > Do you have a pointer to where what you say above is documented? > > That would be highly architecture dependent and in this particular case > require the processor being capable of atomic pointer reads and writes, > no processor r/w reordering and the compiler assuming the list head > being declared volatile. > > Even if that's the case let's assume list_empty() returns true, and then > That should of course be "... returns false, ..." > another thread sneaks in and empties the queue before you take the > mutex. The next thing you do is to access and try to remove "event", > which doesn't exist because the list is empty. If you're lucky you'll > see an oops. If not, you end up with strange memory corruption. > > >> >> >>>> + while (!list_empty(&file_priv->event_list)) { >>>> + mutex_lock(&dev->struct_mutex); >>>> + event = list_first_entry(&file_priv->event_list, >>>> + struct drm_pending_event, link); >>>> /Thomas |
From: Jesse B. <jb...@vi...> - 2009-08-28 17:57:07
|
On Fri, 28 Aug 2009 10:08:10 +0200 Thomas Hellström <th...@sh...> wrote: > Why then exactly are events needed? and why are we required to track > the progress of the command fifo with events like jbarnes suggests, > and finally why is this mechanism not needed in the non-pageflipping > case? If you can give a typical use-case that would probably help a > lot and avoid confusion. After you dropped off yesterday we chatted some more and I understand how this can be avoided now. If your driver has entry points to track completion of various events, you should be able to get an accurate swap buffer count (the main thing I was concerned about here) w/o the event mechanism. -- Jesse Barnes, Intel Open Source Technology Center |
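What Jesse describes — a driver-maintained swap count fed by its own completion hooks — could look roughly like the sketch below. Every name here (the private struct, the helpers) is made up for illustration; it only shows the idea of counting completed flips inside the driver instead of relying on the userspace event mechanism.

struct sketch_crtc_private {
	spinlock_t flip_lock;
	u32 completed_swaps;		/* one increment per completed page flip */
};

/* called from the driver's flip-completion path, e.g. the vblank handler */
static void sketch_flip_completed(struct sketch_crtc_private *priv)
{
	unsigned long flags;

	spin_lock_irqsave(&priv->flip_lock, flags);
	priv->completed_swaps++;
	spin_unlock_irqrestore(&priv->flip_lock, flags);
}

/* queried whenever an accurate swap buffer count is needed */
static u32 sketch_get_swap_count(struct sketch_crtc_private *priv)
{
	unsigned long flags;
	u32 count;

	spin_lock_irqsave(&priv->flip_lock, flags);
	count = priv->completed_swaps;
	spin_unlock_irqrestore(&priv->flip_lock, flags);
	return count;
}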
From: Kristian H. <kr...@bi...> - 2009-08-28 20:04:48
|
This mail is getting out of control... too many sub-threads going on... maybe we shold break it out and talk about events vs kernel scheduling and wait with the patch review until we've figured something out. 2009/8/28 Thomas Hellström <th...@sh...>: > Kristian Høgsberg skrev: >> >> 2009/8/27 Thomas Hellström <th...@sh...>: >> >>> >>> Kristian Høgsberg skrev: >>> >>>> >>>> --- >>>> >>>> Okay, here's the patch again with the locking fixed. Everything else is >>>> the same. As for the potential problem with polling on the drm fd with >>>> both dri1 and dri2 enabled, I suggest that we just disable the dri1 >>>> sigio handler in the xserver. We'll do this in a commit that precedes >>>> the >>>> merge of the page flipping code, so that no server will be able to poll >>>> on the drm fd in both dri1 and dri2 code. >>>> >>>> cheers, >>>> Kristian >>>> >>>> >>> >>> <Grr> >>> >>> 1) The locking problems are still not fixed. See below. I just did a >>> quick look, so the comment list is not complete. >>> >> >> It's not as bad as you think. There is one glitch where I changed >> struct_mutex to mode_config.mutex and left dev->flip_list unprotected. >> The other issues you comment on should be fine, I've followed up in >> detail below. >> >> > > I actually think it is. See below. > >> >> >>> >>> </Grr> >>> >>> Even if that is fixed, I believe this patch adds a framework that >>> doesn't belong in a modern day graphical subsystem for the following >>> reasons, many of them already mentioned, but let me reiterate. Although >>> the dri1 compatibility issue will be fixed with your suggestion. >>> >>> a) It puts requirements on user-space for correct functionality of DRM >>> ioctls. I'm talking about the master needing to parse the event queue. >>> >> >> Don't we already have a lot of requirements on userspace to use the >> DRM? It's not exactly trivial to prepare a execbuffer ioctl, or to >> find a connector, crtc and mode and set that, for example. Part of >> this work is a new libdrm entry point to read an parse the events from >> the fd. See the dri2-swapbuffer branch in drm.git. >> >> > > >>> b) It requires the master to act as a scheduler, and circumvents the DRM >>> command submission mechanism through the delayed unpin callback. If this >>> is to workaround any inability of GEM to serve a command submission >>> while a previous command submission is blocked in the kernel, then IMHO >>> that should be fixed and not worked around. >>> >> >> It's not about workarounds. Your suggestion *blocks the hw* while >> waiting for vsync. > > No it doesn't. It blocks dri clients when they try to render to old fronts. > Other dri clients would continue rendering. It provides a natural migration > path to triple buffering where automagically nothing is blocked, and also to > advanced software schedulers that can buffer command submissions instead of > blocking. You advocated blocking the command queue using a wait-for-vblank command at some point. > Now the concern about GEM was that if the kernel takes a global mutex > _before_ blocking a client and doesn't release that mutex when the client is > blocked, all rendering will naturally be blocked as a consequence. That's not how the patch works. We hold no locks while the client is blocked. > What my suggestion *does* is to block the X server if AIGLX tries to render > to the old front, and the command scheduler is simple. 
Now I'm not > completely sure how serious that is, given that AIGLX will, as stated > before, frequently block the X server anyway since it's not running in a > separate thread. Are you talking about an AIGLX client submitting an expensive shader locking up the X server for a long time? I'm not aware of any way AIGLX can block the server for up to 20ms at a time right now, but that is what it'll do if it blocks on vsync. >>> c) The implementation will mostly be worked around with capable >>> schedulers and hardware. >>> >> >> This is not about the capability of the hw or the sw. The design is >> deliberate. >> > > What I mean is that if I don't want the kernel code to do delayed unpins, > because my cs already handles that, and I don't want the X server to block > clients because my cs or hardware will handle that, I would do my very best > to work around this code. I don't see the difference between a delayed unpin and a fence. We're doing the same thing, why is it ok if we call it a fence? You were saying that we should make the vsync interrupt look like a sw command queue and use that to fence the scan-out buffer right? Is that really better? I feel a bit like we're getting dragged back into the GEM vs TTM discussion here. I have no stake in that battle, I'm just trying to work with what's there. I don't think there's a fundamental problem nor benefit with either, and what you can do with a TTM fence you can do by waiting on the right GEM BO, as far as I understand. >>> A couple of questions: >>> Why are you guys so reluctant to use kernel scheduling for this instead >>> of a mix of kernel / user-space scheduling? >>> >> >> I'm not opposed to kernel scheduling as well, but userspace needs this >> event. I've made the case for AIGLX in the X server already, and >> direct rendering clients (for example, compositors) that want to >> handle input for as long as possible and avoid blocking has the same >> needs. >> > > I've still failed to understand this. Let's _assume_ for a while that the > kernel handles scheduling perfectly in a non-blocking fashion, or let's > assume we have triple-buffering, or let's assume we have a multi-FIFO card > that can do pageflipping and vblank barriers on all FIFOs. > > Why then exactly are events needed? and why are we required to track the > progress of the command fifo with events like jbarnes suggests, and finally > why is this mechanism not needed in the non-pageflipping case? If you can > give a typical use-case that would probably help a lot and avoid confusion. I think we have different applications in mind. My guess is that you're thinking of a typical game workload that tries to render as many frames per second as possible. What I have in mind is an application like a compositor or gui type application, where it doesn't use the hw much and doesn't spend much time rendering but needs to respond to input events and requests from clients. What I'd like to know is, how do you design the main loop of that application so that it doesn't spend most of it's time blocked in some gl entry point waiting for the swap to finish. It's not an application that's looking to queue up as many frames ahead as possible, that only introduces lag between the input events and what's on the screen. I'm not looking for "use threads" and I don't think there currently is a way to do this using just the OpenGL/GLX APIs. I agree that aside from the case with AIGLX blocking the server, you're right, we don't need the event. 
But what I'd like to do is to add a GLX extension that lets applications add a file descriptor to their main loop and that way discover when the flip is done. That lets them stay in phase with the vsync, and provides a way to avoid spending most of their time blocked in an ioctl. And as I've said before, I'm not opposed to doing the scheduling in the kernel, as long as we also get the event, so applications have a chance of knowing when they might block. >>> If the plan is to eliminate DRI2GetBuffers() once per frame, what will >>> then be used to block clients rendering to the old back buffer? >> >> There'll be an event that's sent back after each DRI2SwapBuffer and >> the clients will block on receiving that event. We still need to send >> a request to the xserver and receive confirmation that the xserver has >> received it before we can render again. > > The above is to make sure the swap is scheduled before any continued > rendering, right? Yes. >> DRI2GetBuffers is a request >> that expects a reply and will block the client on the xserver when we >> call it. DRI2SwapBuffers is an async request, ie there's no reply and >> calling it wont block necessarily the client. We still have to wait >> for the new event before we can go on rendering, but doing it this way >> makes the client and server less tightly coupled. We may end up doing >> the roundtrip between client and server at a point where the client >> was going to block anyway (like disk i/o or something) saving a >> context switch. >> >> > > Hmm. I don't understand fully. So up to now, my picture of how a frame was > rendered looks like this. > > swapBuffers(); > if (check_for_needed_getbuffers()) > getbuffers(); > render(); > swapBuffers(); > > This is one X call per frame in the steady-state case. Now, where do you add > the dri2 pageflip throttling if we don't need to call getbuffers()? Is it in > check_for_needed_getbuffers()? I described this in more detail and hopefully more coherently in my email to Michel. If that's still not clear, follow up there. >>>> drivers/gpu/drm/drm_crtc.c | 171 >>>> ++++++++++++++++++++++++++++++- >>>> drivers/gpu/drm/drm_crtc_helper.c | 10 ++ >>>> drivers/gpu/drm/drm_drv.c | 1 + >>>> drivers/gpu/drm/drm_fops.c | 68 ++++++++++++- >>>> drivers/gpu/drm/drm_irq.c | 42 ++++++++ >>>> drivers/gpu/drm/i915/i915_drv.c | 1 + >>>> drivers/gpu/drm/i915/intel_display.c | 16 +++- >>>> drivers/gpu/drm/radeon/radeon_display.c | 3 +- >>>> include/drm/drm.h | 25 +++++ >>>> include/drm/drmP.h | 32 ++++++ >>>> include/drm/drm_crtc.h | 27 +++++ >>>> include/drm/drm_crtc_helper.h | 4 + >>>> include/drm/drm_mode.h | 16 +++ >>>> 13 files changed, 409 insertions(+), 7 deletions(-) >>>> >>>> diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c >>>> index 8fab789..d877c21 100644 >>>> --- a/drivers/gpu/drm/drm_crtc.c >>>> +++ b/drivers/gpu/drm/drm_crtc.c >>>> @@ -34,6 +34,8 @@ >>>> #include "drmP.h" >>>> #include "drm_crtc.h" >>>> >>>> +#undef set_base >>>> + >>>> struct drm_prop_enum_list { >>>> int type; >>>> char *name; >>>> @@ -342,6 +344,35 @@ void drm_framebuffer_cleanup(struct drm_framebuffer >>>> *fb) >>>> EXPORT_SYMBOL(drm_framebuffer_cleanup); >>>> >>>> /** >>>> + * drm_crtc_async_flip - do a set_base call from a work queue >>>> + * @work: work struct >>>> + * >>>> + * Called when a set_base call is queued by the page flip code. This >>>> + * allows the flip ioctl itself to return immediately and allow >>>> userspace >>>> + * to continue working. 
>>>> + */ >>>> +static void drm_crtc_async_flip(struct work_struct *work) >>>> +{ >>>> + struct drm_crtc *crtc = container_of(work, struct drm_crtc, >>>> async_flip); >>>> + struct drm_device *dev = crtc->dev; >>>> + struct drm_pending_flip *pending; >>>> + >>>> + mutex_lock(&dev->mode_config.mutex); >>>> + >>>> + BUG_ON(crtc->pending_flip == NULL); >>>> + >>>> + crtc->funcs->set_base(crtc, crtc->x, crtc->y, NULL); >>>> + >>>> + pending = crtc->pending_flip; >>>> + crtc->pending_flip = NULL; >>>> + >>>> + pending->frame = drm_vblank_count(dev, crtc->pipe); >>>> >>>> >>> >>> What's protecting dev->flip_list here? >>> >> >> That should be struct_mutex. As mentioned above, I missed it when I >> changed the mutex to mode_config.mutex. >> > > Hence the locking order question. If there is a need to keep the mode_config > mutex held _as well_ across the list insertion, since > crtc->pending_flip.link is modified at list insertion. That really depends > on what's protecting that member. I missed a mutex around the dev->flip_list manipulation, I didn't confuse the locking order. ... >>>> -/** No-op. */ >>>> +ssize_t drm_read(struct file *filp, char __user *buffer, >>>> + size_t count, loff_t *offset) >>>> +{ >>>> + struct drm_file *file_priv = filp->private_data; >>>> + struct drm_device *dev = file_priv->minor->dev; >>>> + struct drm_pending_event *event; >>>> + ssize_t total, ret; >>>> + >>>> + ret = wait_event_interruptible(file_priv->event_wait, >>>> + >>>> !list_empty(&file_priv->event_list)); >>>> + if (ret < 0) >>>> + return ret; >>>> + >>>> + total = 0; >>>> >>>> >>> >>> What's protecting file_priv->event_list here? >>> >> >> You can test for list emptiness without taking the lock. >> > > Are you suggesting accessing a member of a mutex protected struct without > taking the mutex? > Do you have a pointer to where what you say above is documented? > > That would be highly architecture dependent and in this particular case > require the processor being capable of atomic pointer reads and writes, no > processor r/w reordering and the compiler assuming the list head being > declared volatile. My drm_read() code below is broken in that it needs to recheck for list_empty again after taking the lock. Other than that, testing for list empty without taking the lock that normally protects the list is a common idiom in the kernel. It doesn't require write barriers or atomic ops, and of course, since it's done without the lock somebody could be changing the list while you're trying to establish whether it's empty. But the end result is no different from grabbing the lock, testing for empty, and then dropping the lock. Once you drop the lock, the result is potentially invalid anyway. So for something like poll implementations, where you just need to return whether there's data to read or not, it works just fine. If several processes/threads share the same fd, then the poll syscall is inherently racy, since between poll and read, some other process may have read the data. It's also used like in drm_read below, for testing for an empty list before taking the lock, though that's more of a micro-optimization and requires that the list is tested for non-empty again once under the lock before you operate on the list. See for example drivers/acpi/event.c, infiniband/core/ucm.c, drivers/usb/core/devio.c, sound/core/control.c for cases where a list is altered under a lock, but tested for emptiness without the lock in the poll implementation. 
There are even more cases with handrolled linked lists or other data structures used in the same way - no lock necessary when testing for empty. > Even if that's the case let's assume list_empty() returns true, and then > another thread sneaks in and empties the queue before you take the mutex. > The next thing you do is to access and try to remove "event", which doesn't > exist because the list is empty. If you're lucky you'll see an oops. If not, > you end up with strange memory corruption. >> >>>> >>>> + while (!list_empty(&file_priv->event_list)) { >>>> + mutex_lock(&dev->struct_mutex); >>>> + event = list_first_entry(&file_priv->event_list, >>>> + struct drm_pending_event, link); >>>> + if (total + event->event->length > count) { >>>> + mutex_unlock(&dev->struct_mutex); >>>> + break; >>>> + } >>>> + list_del(&event->link); >>>> + mutex_unlock(&dev->struct_mutex); >>>> + >>>> + if (copy_to_user(buffer + total, >>>> + event->event, event->event->length)) { >>>> + total = -EFAULT; >>>> + break; >>>> + } >>>> + >>>> + total += event->event->length; >>>> + event->destroy(event); >>>> + } >>>> + >>>> + return total; >>>> +} >>>> +EXPORT_SYMBOL(drm_read); >>>> + >>>> unsigned int drm_poll(struct file *filp, struct poll_table_struct >>>> *wait) >>>> { >>>> - return 0; >>>> + struct drm_file *file_priv = filp->private_data; >>>> + unsigned int mask = 0; >>>> + >>>> + poll_wait(filp, &file_priv->event_wait, wait); >>>> + >>> >>> And here? >> >> Again, testing list emptiness. > > See above. Although here, the consequenses probably aren't that fatal. > >>>> + if (!list_empty(&file_priv->event_list)) >>>> + mask |= POLLIN | POLLRDNORM; >>>> + >>>> + return mask; >>>> } ... >>>> +void drm_finish_pending_flip(struct drm_device *dev, >>>> + struct drm_pending_flip *f, u32 frame) >>>> +{ >>>> + struct timeval now; >>>> + >>>> + f->event.frame = frame; >>>> + do_gettimeofday(&now); >>>> + f->event.tv_sec = now.tv_sec; >>>> + f->event.tv_usec = now.tv_usec; >>>> + drm_vblank_put(dev, f->pipe); >>>> + list_del_init(&f->link); >>> >>> What's protecting the file_priv->event_list here? >> >> This function is always called with the struct_mutex held. > > No. It's not. It's called with the mode_config mutex held. No it's protected by struct_mutex, but there's one call where I missed struct_mutex: the path in the page_flip ioctl where we're replacing a pending flip. I've added struct_mutex locking around the call. cheers, Kristian |
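A sketch of drm_read() with the recheck Kristian mentions is below. It follows the patch quoted in this thread, with the only change being that list emptiness is re-tested under struct_mutex before list_first_entry() is used, so a racing reader on the same fd can no longer pull the list out from under us; treat it as an illustration of the fix rather than the final code.

ssize_t drm_read(struct file *filp, char __user *buffer,
		 size_t count, loff_t *offset)
{
	struct drm_file *file_priv = filp->private_data;
	struct drm_device *dev = file_priv->minor->dev;
	struct drm_pending_event *event;
	ssize_t total, ret;

	ret = wait_event_interruptible(file_priv->event_wait,
				       !list_empty(&file_priv->event_list));
	if (ret < 0)
		return ret;

	total = 0;
	for (;;) {
		mutex_lock(&dev->struct_mutex);

		/* re-check under the lock: another reader may have raced us */
		if (list_empty(&file_priv->event_list)) {
			mutex_unlock(&dev->struct_mutex);
			break;
		}

		event = list_first_entry(&file_priv->event_list,
					 struct drm_pending_event, link);
		if (total + event->event->length > count) {
			mutex_unlock(&dev->struct_mutex);
			break;
		}

		list_del(&event->link);
		mutex_unlock(&dev->struct_mutex);

		if (copy_to_user(buffer + total,
				 event->event, event->event->length)) {
			event->destroy(event);	/* already unlinked, don't leak it */
			total = -EFAULT;
			break;
		}

		total += event->event->length;
		event->destroy(event);
	}

	return total;
}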
From: Thomas H. <th...@sh...> - 2009-08-28 21:27:37
|
Kristian Høgsberg skrev: > This mail is getting out of control... too many sub-threads going > on... maybe we shold break it out and talk about events vs kernel > scheduling and wait with the patch review until we've figured > something out. > Sure. >>>> b) It requires the master to act as a scheduler, and circumvents the DRM >>>> command submission mechanism through the delayed unpin callback. If this >>>> is to workaround any inability of GEM to serve a command submission >>>> while a previous command submission is blocked in the kernel, then IMHO >>>> that should be fixed and not worked around. >>>> >>>> >>> It's not about workarounds. Your suggestion *blocks the hw* while >>> waiting for vsync. >>> >> No it doesn't. It blocks dri clients when they try to render to old fronts. >> Other dri clients would continue rendering. It provides a natural migration >> path to triple buffering where automagically nothing is blocked, and also to >> advanced software schedulers that can buffer command submissions instead of >> blocking. >> > > You advocated blocking the command queue using a wait-for-vblank > command at some point. > I mentioned an example with how unichromes could handle pageflipping with vblank barriers yes, but also experessed concerns about the stall. That's not what I'm proposing. I sent a mail titled "Pageflipping scheduling" to avoid such misunderstandings a day or so ago. That email details what I'm suggesting. > >> Now the concern about GEM was that if the kernel takes a global mutex >> _before_ blocking a client and doesn't release that mutex when the client is >> blocked, all rendering will naturally be blocked as a consequence. >> > > That's not how the patch works. We hold no locks while the client is blocked. > I was not referring to your patch but to _kernel_ blocking with GEM. My concern was that GEM might block all clients instead of just the client trying to render to the old front. My initial question was whether the DRI2 blocking implementation was a workaround for that. >> What my suggestion *does* is to block the X server if AIGLX tries to render >> to the old front, and the command scheduler is simple. Now I'm not >> completely sure how serious that is, given that AIGLX will, as stated >> before, frequently block the X server anyway since it's not running in a >> separate thread. >> > > Are you talking about an AIGLX client submitting an expensive shader > locking up the X server for a long time? I'm not aware of any way > AIGLX can block the server for up to 20ms at a time right now, but > that is what it'll do if it blocks on vsync. > > No, I was mostly referring to drivers doing swapbuffer throttling. A gpu-bound app at a low frame rate will certainly block for longer than that. Anyway, regardless of this AIGLX is special in that there actually is a benefit of a drm event as an optimization hint if AIGLX >>>> c) The implementation will mostly be worked around with capable >>>> schedulers and hardware. >>>> >>>> >>> This is not about the capability of the hw or the sw. The design is >>> deliberate. >>> >>> >> What I mean is that if I don't want the kernel code to do delayed unpins, >> because my cs already handles that, and I don't want the X server to block >> clients because my cs or hardware will handle that, I would do my very best >> to work around this code. >> > > I don't see the difference between a delayed unpin and a fence. We're > doing the same thing, why is it ok if we call it a fence? 
You were > saying that we should make the vsync interrupt look like a sw command > queue and use that to fence the scan-out buffer right? Is that really > better? I feel a bit like we're getting dragged back into the GEM vs > TTM discussion here. I have no stake in that battle, I'm just trying > to work with what's there. I don't think there's a fundamental problem > nor benefit with either, and what you can do with a TTM fence you can > do by waiting on the right GEM BO, as far as I understand. > Please, just forget about fences. They are one of the various ways to implement selective kernel blocking, and a good example. There are hopefully ways GEM can do this nicely as well. This has nothing to do with GEM vs TTM. What I'm saying is: if people have a way to handle this in hardware or in the kernel more or less for free, then they will try to work around the code you are proposing. There is hardware that can send pageflips down the command FIFO, and if there are multiple FIFOs that can accept pageflip commands, software blocking becomes completely unnecessary.
And as I've said before, I'm not opposed to > doing the scheduling in the kernel, as long as we also get the event, > so applications have a chance of knowing when they might block. > OK. Now this makes much more sense. Then I won't argue against the event mechanism as long as it's used as an optimization by clients to help avoid self-blocking, that should be turned on only when the client requests them. But It shouldn't IMO be used by the X server to block dri clients, which means that rendering should be correct even if there are no kernel events. It should be up to the kernel to ensure that. > >>>> If the plan is to eliminate DRI2GetBuffers() once per frame, what will >>>> then be used to block clients rendering to the old back buffer? >>>> >>> There'll be an event that's sent back after each DRI2SwapBuffer and >>> the clients will block on receiving that event. We still need to send >>> a request to the xserver and receive confirmation that the xserver has >>> received it before we can render again. >>> >> The above is to make sure the swap is scheduled before any continued >> rendering, right? >> > > Yes. > > >>> DRI2GetBuffers is a request >>> that expects a reply and will block the client on the xserver when we >>> call it. DRI2SwapBuffers is an async request, ie there's no reply and >>> calling it wont block necessarily the client. We still have to wait >>> for the new event before we can go on rendering, but doing it this way >>> makes the client and server less tightly coupled. We may end up doing >>> the roundtrip between client and server at a point where the client >>> was going to block anyway (like disk i/o or something) saving a >>> context switch. >>> >>> >>> >> Hmm. I don't understand fully. So up to now, my picture of how a frame was >> rendered looks like this. >> >> swapBuffers(); >> if (check_for_needed_getbuffers()) >> getbuffers(); >> render(); >> swapBuffers(); >> >> This is one X call per frame in the steady-state case. Now, where do you add >> the dri2 pageflip throttling if we don't need to call getbuffers()? Is it in >> check_for_needed_getbuffers()? >> > > I described this in more detail and hopefully more coherently in my > email to Michel. If that's still not clear, follow up there. > > I've read the mail and understand the proposal, thanks. /Thomas |
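To make the "clients wake themselves" use case concrete, a userspace main loop built around the proposed event could look something like the sketch below. The drm event is simply read from the drm fd as in the patch; handle_input(), render_frame() and queue_flip() are placeholders for the application's own logic, and in practice the GLX extension Kristian mentions would hide most of this inside the GL stack.

#include <poll.h>
#include <unistd.h>

/* placeholders for the application's own code */
extern void handle_input(void);
extern void render_frame(void);
extern void queue_flip(int drm_fd);	/* page flip ioctl, event requested */

void main_loop(int drm_fd, int x11_fd)
{
	struct pollfd fds[2] = {
		{ .fd = x11_fd, .events = POLLIN },
		{ .fd = drm_fd, .events = POLLIN },
	};
	char buf[1024];
	int flip_pending = 0;

	for (;;) {
		if (poll(fds, 2, -1) < 0)
			continue;

		if (fds[0].revents & POLLIN)
			handle_input();		/* stay responsive to input */

		if (fds[1].revents & POLLIN) {
			/* consume the flip-completion event(s) */
			if (read(drm_fd, buf, sizeof buf) > 0)
				flip_pending = 0;
		}

		if (!flip_pending) {
			render_frame();
			queue_flip(drm_fd);
			flip_pending = 1;
		}
	}
}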
From: Michel D. <mi...@da...> - 2009-08-28 09:20:12
|
On Thu, 2009-08-27 at 17:33 -0400, Kristian Høgsberg wrote: > 2009/8/27 Thomas Hellström <th...@sh...>: > > > > b) It requires the master to act as a scheduler, and circumvents the DRM > > command submission mechanism through the delayed unpin callback. If this > > is to workaround any inability of GEM to serve a command submission > > while a previous command submission is blocked in the kernel, then IMHO > > that should be fixed and not worked around. > > It's not about workarounds. Your suggestion *blocks the hw* while > waiting for vsync. My patch doesn't do that, it lets other clients > submit rendering to pixmap or backbuffer BOs that aren't involved in > the pending page flip. If you're willing to block the command stream, > GEM can keep the buffer pinned for you until the swap happens just > fine. Just like it does for any other command - it's not about that. > In fact, I think using the scheduler to keep buffers pinned for > scanout is conflating things. The scheduler pulls buffers in and out > of the aperture so that they are there for the GPU when it needs to > access them. Pinning and unpinning buffers for scanout is a different > matter. How is scanout not GPU access? :) I'd look at it the other way around: pinning a scanout buffer is a workaround which is only needed while there are no outstanding fences on it. > > If the plan is to eliminate DRI2GetBuffers() once per frame, what will > > then be used to block clients rendering to the old back buffer? > > There'll be an event that's sent back after each DRI2SwapBuffer and > the clients will block on receiving that event. Are you referring to a DRI2 protocol event or a DRM event? > We still need to send a request to the xserver and receive confirmation > that the xserver has received it before we can render again. DRI2GetBuffers is a request > that expects a reply and will block the client on the xserver when we > call it. DRI2SwapBuffers is an async request, ie there's no reply and > calling it wont block necessarily the client. We still have to wait > for the new event before we can go on rendering, but doing it this way > makes the client and server less tightly coupled. We may end up doing > the roundtrip between client and server at a point where the client > was going to block anyway (like disk i/o or something) saving a > context switch. It's still a bit fuzzy how this is all supposed to work. To me it seems like (ab)using DRI2GetBuffers for this will only allow clients to avoid synchronous rendering with triple buffering, which would be a shame. If you disagree, can you explain why that isn't the case? -- Earthling Michel Dänzer | http://www.vmware.com Libre software enthusiast | Debian, X and DRI developer |
From: Thomas H. <th...@sh...> - 2009-08-28 12:40:31
|
Michel Dänzer skrev: > On Thu, 2009-08-27 at 17:33 -0400, Kristian Høgsberg wrote: > >> 2009/8/27 Thomas Hellström <th...@sh...>: >> >>> b) It requires the master to act as a scheduler, and circumvents the DRM >>> command submission mechanism through the delayed unpin callback. If this >>> is to workaround any inability of GEM to serve a command submission >>> while a previous command submission is blocked in the kernel, then IMHO >>> that should be fixed and not worked around. >>> >> It's not about workarounds. Your suggestion *blocks the hw* while >> waiting for vsync. My patch doesn't do that, it lets other clients >> submit rendering to pixmap or backbuffer BOs that aren't involved in >> the pending page flip. If you're willing to block the command stream, >> GEM can keep the buffer pinned for you until the swap happens just >> fine. Just like it does for any other command - it's not about that. >> In fact, I think using the scheduler to keep buffers pinned for >> scanout is conflating things. The scheduler pulls buffers in and out >> of the aperture so that they are there for the GPU when it needs to >> access them. Pinning and unpinning buffers for scanout is a different >> matter. >> > > How is scanout not GPU access? :) I'd look at it the other way around: > pinning a scanout buffer is a workaround which is only needed while > there are no outstanding fences on it. > > Hmm, I missed this part. Yeah using the scheduler synchronization to protect buffers which have pending commands on them in the fifo is quite natural and also falls out quite neatly as detailed before. The workaround needed for page-flipping is that the buffers need to be pinned / unpinned just after validation. /Thomas |
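The pin-after-validation workaround Thomas refers to, together with the delayed unpin discussed earlier in the thread, amounts to something like the sketch below. It assumes the usual DRM kernel headers; the sketch_* helpers and structures are invented for illustration, and the point is only that the old scanout buffer stays pinned until the flip has actually completed.

/* invented helpers, standing in for driver-specific code */
extern int sketch_pin_to_display(struct drm_gem_object *bo);
extern struct drm_gem_object *sketch_current_scanout(struct drm_crtc *crtc);
extern void sketch_program_base(struct drm_crtc *crtc, struct drm_gem_object *bo);
extern void sketch_unpin(struct drm_gem_object *bo);

struct sketch_flip {
	struct drm_gem_object *old_fb_bo;	/* previous scanout buffer */
};

static int sketch_queue_flip(struct drm_crtc *crtc,
			     struct drm_gem_object *new_bo,
			     struct sketch_flip *flip)
{
	int ret;

	/* keep the new scanout buffer resident in the aperture */
	ret = sketch_pin_to_display(new_bo);
	if (ret)
		return ret;

	flip->old_fb_bo = sketch_current_scanout(crtc);
	sketch_program_base(crtc, new_bo);	/* takes effect at the next vblank */
	return 0;
}

/* called once the hardware is known to scan out from the new buffer */
static void sketch_flip_completed(struct sketch_flip *flip)
{
	/* safe to unpin now: the old buffer is no longer being scanned out */
	sketch_unpin(flip->old_fb_bo);
	flip->old_fb_bo = NULL;
}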
From: Kristian H. <kr...@bi...> - 2009-08-28 12:58:11
|
On Fri, Aug 28, 2009 at 5:19 AM, Michel Dänzer<mi...@da...> wrote: > On Thu, 2009-08-27 at 17:33 -0400, Kristian Høgsberg wrote: >> 2009/8/27 Thomas Hellström <th...@sh...>: >> > >> > b) It requires the master to act as a scheduler, and circumvents the DRM >> > command submission mechanism through the delayed unpin callback. If this >> > is to workaround any inability of GEM to serve a command submission >> > while a previous command submission is blocked in the kernel, then IMHO >> > that should be fixed and not worked around. >> >> It's not about workarounds. Your suggestion *blocks the hw* while >> waiting for vsync. My patch doesn't do that, it lets other clients >> submit rendering to pixmap or backbuffer BOs that aren't involved in >> the pending page flip. If you're willing to block the command stream, >> GEM can keep the buffer pinned for you until the swap happens just >> fine. Just like it does for any other command - it's not about that. >> In fact, I think using the scheduler to keep buffers pinned for >> scanout is conflating things. The scheduler pulls buffers in and out >> of the aperture so that they are there for the GPU when it needs to >> access them. Pinning and unpinning buffers for scanout is a different >> matter. > > How is scanout not GPU access? :) I'd look at it the other way around: > pinning a scanout buffer is a workaround which is only needed while > there are no outstanding fences on it. I guess I don't see the scanout engine as part of the GPU. If you think pinning a scanout buffer is a workaround, how do you propose we keep it in the aperture? Using a special fence class that never expire? Isn't that what pinning is? >> > If the plan is to eliminate DRI2GetBuffers() once per frame, what will >> > then be used to block clients rendering to the old back buffer? >> >> There'll be an event that's sent back after each DRI2SwapBuffer and >> the clients will block on receiving that event. > > Are you referring to a DRI2 protocol event or a DRM event? Sorry, a DRI2 event. >> We still need to send a request to the xserver and receive confirmation >> that the xserver has received it before we can render again. DRI2GetBuffers is a request >> that expects a reply and will block the client on the xserver when we >> call it. DRI2SwapBuffers is an async request, ie there's no reply and >> calling it wont block necessarily the client. We still have to wait >> for the new event before we can go on rendering, but doing it this way >> makes the client and server less tightly coupled. We may end up doing >> the roundtrip between client and server at a point where the client >> was going to block anyway (like disk i/o or something) saving a >> context switch. > > It's still a bit fuzzy how this is all supposed to work. To me it seems > like (ab)using DRI2GetBuffers for this will only allow clients to avoid > synchronous rendering with triple buffering, which would be a shame. If > you disagree, can you explain why that isn't the case? The way it works is that glXSwapBuffers() will send the DRI2SwapBuffer request and return immediately. There is not response to the DRI2SwapBuffer request, so it won't block on the X server there. The application is now free to do other processing, loading a new frame of disk and decode it, for example. However, the glXSwapBuffers() call also invalidates the DRI drivers buffers, so once the application wants to render again, it has to call DRI2GetBuffers, which is a round trip, so here we'll block on the X server. 
This ensures that the X server has seen the DRI2SwapBuffer request before we start rendering again, and provides the application with the buffers it needs for the next frame. This last bit is important, as the server may or may not be able to do a page flip, so the client can't generally predict which buffer to use for the next frame. What I'm proposing over this scheme with the DRI2 event is that instead of requiring the client to call DRI2GetBuffers, we can send out a DRI2 event once the flip happens. This breaks the required roundtrip into a one-way request and an event. Worst case, this still requires the client to block on receiving the event. However, it also allows the X server to process the DRI2SwapBuffer request and send the event to the client before the client gets around to needing it, in which case the round trip disappears. cheers, Kristian |
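Inside a DRI driver, the flow Kristian describes might look roughly like the sketch below. DRI2SwapBuffers() and DRI2GetBuffers() stand in for the corresponding protocol requests rather than any real function signatures, and the drawable struct is invented for illustration.

struct sketch_drawable {
	int have_buffers;	/* cleared by a swap, set again by GetBuffers */
};

/* stand-ins for the DRI2 protocol requests, not actual API */
extern void DRI2SwapBuffers(struct sketch_drawable *draw);	/* async, no reply */
extern void DRI2GetBuffers(struct sketch_drawable *draw);	/* round trip */

void sketch_swap_buffers(struct sketch_drawable *draw)
{
	/* no reply expected, so this does not block on the X server */
	DRI2SwapBuffers(draw);

	/* the swap invalidates our notion of front/back */
	draw->have_buffers = 0;
}

void sketch_prepare_render(struct sketch_drawable *draw)
{
	if (!draw->have_buffers) {
		/*
		 * Round trip: blocking on the reply also guarantees the
		 * server has processed the preceding DRI2SwapBuffers
		 * before we start rendering the next frame.
		 */
		DRI2GetBuffers(draw);
		draw->have_buffers = 1;
	}
}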
From: Michel D. <mi...@da...> - 2009-08-28 13:15:04
|
On Fri, 2009-08-28 at 08:57 -0400, Kristian Høgsberg wrote: > On Fri, Aug 28, 2009 at 5:19 AM, Michel Dänzer<mi...@da...> wrote: > > On Thu, 2009-08-27 at 17:33 -0400, Kristian Høgsberg wrote: > >> 2009/8/27 Thomas Hellström <th...@sh...>: > >> > > >> > b) It requires the master to act as a scheduler, and circumvents the DRM > >> > command submission mechanism through the delayed unpin callback. If this > >> > is to workaround any inability of GEM to serve a command submission > >> > while a previous command submission is blocked in the kernel, then IMHO > >> > that should be fixed and not worked around. > >> > >> It's not about workarounds. Your suggestion *blocks the hw* while > >> waiting for vsync. My patch doesn't do that, it lets other clients > >> submit rendering to pixmap or backbuffer BOs that aren't involved in > >> the pending page flip. If you're willing to block the command stream, > >> GEM can keep the buffer pinned for you until the swap happens just > >> fine. Just like it does for any other command - it's not about that. > >> In fact, I think using the scheduler to keep buffers pinned for > >> scanout is conflating things. The scheduler pulls buffers in and out > >> of the aperture so that they are there for the GPU when it needs to > >> access them. Pinning and unpinning buffers for scanout is a different > >> matter. > > > > How is scanout not GPU access? :) I'd look at it the other way around: > > pinning a scanout buffer is a workaround which is only needed while > > there are no outstanding fences on it. > > I guess I don't see the scanout engine as part of the GPU. If you > think pinning a scanout buffer is a workaround, how do you propose we > keep it in the aperture? Using a special fence class that never > expire? Isn't that what pinning is? That's kind of the point I guess - which is the real thing and which is a workaround depends on the point of view. > >> > If the plan is to eliminate DRI2GetBuffers() once per frame, what will > >> > then be used to block clients rendering to the old back buffer? > >> > >> There'll be an event that's sent back after each DRI2SwapBuffer and > >> the clients will block on receiving that event. > > > > Are you referring to a DRI2 protocol event or a DRM event? > > Sorry, a DRI2 event. > > >> We still need to send a request to the xserver and receive confirmation > >> that the xserver has received it before we can render again. DRI2GetBuffers is a request > >> that expects a reply and will block the client on the xserver when we > >> call it. DRI2SwapBuffers is an async request, ie there's no reply and > >> calling it wont block necessarily the client. We still have to wait > >> for the new event before we can go on rendering, but doing it this way > >> makes the client and server less tightly coupled. We may end up doing > >> the roundtrip between client and server at a point where the client > >> was going to block anyway (like disk i/o or something) saving a > >> context switch. > > > > It's still a bit fuzzy how this is all supposed to work. To me it seems > > like (ab)using DRI2GetBuffers for this will only allow clients to avoid > > synchronous rendering with triple buffering, which would be a shame. If > > you disagree, can you explain why that isn't the case? > > The way it works is that glXSwapBuffers() will send the DRI2SwapBuffer > request and return immediately. There is not response to the > DRI2SwapBuffer request, so it won't block on the X server there. 
The > application is now free to do other processing, loading a new frame of > disk and decode it, for example. However, the glXSwapBuffers() call > also invalidates the DRI drivers buffers, so once the application > wants to render again, it has to call DRI2GetBuffers, which is a round > trip, so here we'll block on the X server. This ensures that the X > server has seen the DRI2SwapBuffer request before we start rendering > again, and provides the application with the buffers it needs for the > next frame. This last bit is important, as the server may or may not > be able to do a page flip, so the client can't generally predict what > the buffer to use for the next frame will be. > > What I'm proposing over this scheme with the DRI2 event is that > instead of requiring the client to call DRI2GetBuffers, we can send > out a DRI2 event once the flip happens. This breaks the required > roundtrip into a one-way request and an event. Worst case, this still > requires the clent to block on receiving the event. However, it also > allows the X server to process the DRI2SwapBuffer request and send the > event to the client before the client gets around to needing, in which > case the round trip disappears. Thanks. Unfortunately, I wasn't missing anything AFAICT. The problem is that this doesn't allow the client to even start generating hardware commands for the next frame before the flip occurs, so with a single client the GPU pipeline is pretty much guaranteed to drain, without triple buffering. One possible solution for this would be for the DRI2SwapBuffer request to return the suggested set of buffers for the next frame, and then for the client to block (which could involve a DRM event or whatever if just plain blocking turns out to be a real problem in practice - doubtful IME) on command submission if those buffers still have pending flips. -- Earthling Michel Dänzer | http://www.vmware.com Libre software enthusiast | Debian, X and DRI developer |
From: Jesse B. <jb...@vi...> - 2009-08-28 17:54:24
|
On Fri, 28 Aug 2009 15:14:47 +0200 Michel Dänzer <mi...@da...> wrote: > One possible solution for this would be for the DRI2SwapBuffer request > to return the suggested set of buffers for the next frame, and then > for the client to block (which could involve a DRM event or whatever > if just plain blocking turns out to be a real problem in practice - > doubtful IME) on command submission if those buffers still have > pending flips. Yeah, that's what the initial implementation did (though with more blocking than was necessary iirc, but that was just an implementation detail). I don't think it precludes things like triple buffering either; the server is still free to allocate new buffers and return them to the client at swapbuffers time. I think we're all agreed that we want to minimize blocking of any kind; we should keep all the clients busy submitting commands as much as possible. Any kind of GPU drain or wait on event is a killer. -- Jesse Barnes, Intel Open Source Technology Center |
From: Thomas H. <th...@sh...> - 2009-08-30 15:09:47
|
Thomas Hellström skrev: > >> I described this in more detail and hopefully more coherently in my >> email to Michel. If that's still not clear, follow up there. >> >> >> > I've read the mail and understand the proposal, thanks. > > /Thomas > > >

So, I've been doing some thinking over the weekend and here's a constructive proposal that hopefully can be the base of an agreement:

1) We require the semantics of the pageflip ioctl to be such that it is safe to schedule render commands that reference the buffers involved immediately after the ioctl returns, or in other words, the pageflip has entered the graphics pipeline and any render operations to the referenced buffers are guaranteed to be executed after the pageflip. How this is implemented is up to the driver, and thus the code will need driver-specific hooks. There is a multitude of ways to implement this, ranging from full hardware support to a really naive and stupid software implementation that blocks all command submission while there are pending flips. A simple sufficient implementation is to scan the buffers to be validated at cs time to see if there are pending pageflips on any of them. In that case, release all held cs locks and block. When the pageflip happens, continue. But again, that will be up to the driver.

2) We rip the blocking code out of the DRI2 protocol, since there is no longer any need for it.

3) The DRM event mechanism stays as proposed. The ioctl caller needs to explicitly request an event to get one. Events will initially be used by clients to wake _themselves_ from voluntary select() blocking.

The motivation for 1-2 is as follows:

a) The solution fits all kinds of smart hardware and cs mechanisms, whereas the DRI2 blocking solution assumes simple hardware and a naive kernel cs mechanism. One can argue that smart kernel schedulers or advanced hardware can work around the DRI2 blocking solution by sending out the event immediately, but there we are again working around the DRI2 blocking solution. We shouldn't need to do that.

b) There's no requirement on masters to do scheduling with this proposal. Otherwise we'd have to live with that forever and implement it in all masters utilizing the pageflip ioctl.

c) Latency / performance. Consider the sequence of events between the vsync and the first set of rendering commands for the next frame being submitted to the hardware:

c1) DRI2 blocking: (Please correct me if I misunderstood something here)
* vsync irq
* schedule a wq thread that adds an event and wakes the X server.
* X server issues a syscall to read the drm event
* X server returns an event to the client with the new buffers. (write to socket ?)
* Client reads the event (read from socket ?)
* Client prepares the first command buffer (this is usually quite time consuming and not only synchronizes GPU and command buffer building, but in effect serializes them).
* Client builds and issues a cs ioctl.
* Kernel submits commands to hardware.

c2) Kernel scheduling (this proposal).
* vsync irq
* schedule the client thread that immediately submits commands to hardware.

IMHO, c1 is far from optimal and should not be considered. One can argue, once again, that this doesn't add much latency in practice, but we can't keep arguing like that for every such item we add per frame, and in this case the serialized command buffer building *does* add too much latency. We should seek and implement the optimal solution if it doesn't imply too much work or have side-effects.

Some added functionality we should also perhaps consider adding to the ioctl interface:

1) A flag indicating whether to vsync or not. Ideally this would be a driconf option, so it should perhaps be communicated as part of dri2 swapbuffers as well. I guess on Intel hardware you can only flip on vsync(?) but on some other hardware you can just send a new scanout start address down the FIFO. You'll definitely see tearing, though.

2) Driver private data. An example: drivers with multiple hardware FIFOs that can do pageflipping and barriers on each FIFO might want to indicate to the kernel on which FIFO or HW context to schedule the pageflip. I guess this private data might also need to be passed along with dri2 swapbuffers.

Thanks,
/Thomas |
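The "simple sufficient implementation" from point 1) could be sketched as below: at command submission time the driver scans the buffers about to be validated and, if any of them still has a flip pending, drops its CS locks and sleeps until the flip completes. The bo structure, the cs context with its flip_queue wait queue, and the lock helpers are all invented for illustration; a real driver would hook this into its own validation path.

struct sketch_bo {
	int flip_pending;	/* set while a queued flip still references this bo */
};

struct sketch_cs_context {
	wait_queue_head_t flip_queue;	/* woken when a pending flip completes */
	/* ... cs locks, validation state, etc. ... */
};

/* invented helpers for the driver's own CS locking */
extern void sketch_cs_lock(struct sketch_cs_context *cs);
extern void sketch_cs_unlock(struct sketch_cs_context *cs);

static int sketch_cs_wait_for_flips(struct sketch_cs_context *cs,
				    struct sketch_bo **bos, int count)
{
	int i, ret;

	for (i = 0; i < count; i++) {
		if (!bos[i]->flip_pending)
			continue;

		/* never sleep while holding the CS locks */
		sketch_cs_unlock(cs);

		ret = wait_event_interruptible(cs->flip_queue,
					       !bos[i]->flip_pending);
		if (ret)
			return ret;	/* caller restarts the submission */

		sketch_cs_lock(cs);
	}

	return 0;
}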
From: Kristian H. <kr...@bi...> - 2009-08-31 18:00:20
|
Thomas, What you describe sounds reasonable and I'll look into updating the patch. I'm not too keen on the driver specific flag you suggest, since it makes it hard to use the ioctl in generic code. Maybe drivers that can do pageflip from one of several fifos can expose a driver specific ioctl to configure which one should be used. I don't imagine it's something that will change much? cheers, Kristiain 2009/8/30 Thomas Hellström <th...@sh...>: > Thomas Hellström skrev: >> >>> I described this in more detail and hopefully more coherently in my >>> email to Michel. If that's still not clear, follow up there. >>> >>> >> >> I've read the mail and understand the proposal, thanks. >> >> /Thomas >> >> >> > > So, I've been doing some thinking over the weekend and here's a constructive > proposal that hopefully can be the base of an agreement: > > 1) We require the semantics of the pageflip ioctl to be such that it is safe > to schedule render commands that reference the buffers involved immediately > after the ioctl return, or in other words, the pageflip has entered the > graphics pipeline and any render operations to the referenced buffers will > be guaranteed to be executed after the pageflip. How this is implemented is > up to the driver, and thus the code will need driver-specific hooks. There > is a multitude of ways to implement this, ranging from full hardware support > to a real naive and stupid software implementation that blocks all command > submission while there are pending flips. > A simple sufficient implementation is to scan the buffers to be validated at > cs time to see if there are pending pageflips on any of them. In that case, > release all held cs locks and block. When the pageflip happens, continue. > But again, that will be up to the driver. > > 2) We rip the blocking code out of the DRI2 protocol, since there is no > longer any need for it. > > 3) The DRM event mechanism stays as proposed, The ioctl caller needs to > explicitly request an event to get one. Events will initially be used by > clients to wake _themselves_ from voluntary select() blocking. > > The motivation for 1-2 is as follows: > a) The solution fits all kinds of smart hardware and cs mechanism, whereas > the DRI2 blocking solution assumes a simple hardware and a naive kernel cs > mechanism. One can argue that smart kernel schedulers or advanced hardware > can work around the DRI2 blocking solution by sending out the event > immediately, but there we are again working around the DRI2 blocking > solution. We shouldn't need to do that. > > b) There's no requirement on masters to do scheduling with this proposal. > Otherwise we'd have to live with that forever and implement it in all > masters utilizing the pageflip ioctl. > > c) latency - performance. Consider the sequence of events between the vsync > and the first set of rendering commands on the next frame being submit to > the hardware: > > c1) DRI2 blocking: (Please correct me if I misunderstood something here) > * vsync irq > * schedule a wq thread that adds an event and wakes the X server. > * X server issues a syscall to read the drm event > * X server returns an event to the client with the new buffers. (write to > socket ?) > * Client reads the event (read from socket ?) > * Client prepares the first command buffer (this is usually quite time > consuming and in effect not only synchronizes GPU and command buffer > building, but in effect serializes them). > * Client builds and issues a cs ioctl. > * Kernel submits commands to hardware. 
> > c2) Kernel scheduling (this proposal). > * vsync irq > * schedule the client thread that immediately submits commands to hardware. > > IMHO, c1 is far from optimal and should not be considered. One can argue > once again, that this doesn't add much latency in practice, but we can't > keep arguing like that for every such item we add per frame, and in this > case the serialized command buffer building *does* add too much latency. We > should seek and implement the optimal solution if it doesn't imply too much > work or have side-effects. > > Some added functionality we should also perhaps consider adding to the ioctl > interface: > > 1) A flag whether to vsync or not. Ideally a driconf option so that should > perhaps be communicated as part of dri2 swapbuffers as well. I guess on > Intel hardware you can only flip on vsync(?) but on some other hardware you > can just send a new scanout startaddress down the FIFO. You'll definitely > see tearing, though. > > 2) Driver private data. An example: Drivers with multiple hardware FIFOs > that can do pageflipping and barriers on each FIFO might want to indicate to > the kernel on which FIFO or HW context to schedule the pageflip. I guess > this private data might also need to be passed along with dri2 swapbuffers. > > Thanks, > /Thomas > > |
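To make Thomas's point 1 concrete, the "simple sufficient implementation" he describes could look roughly like the sketch below. This is only an illustration of the idea, not code from any driver: the names (example_cs, bo->pending_flip, flip_queue and the lock helpers) are made up, and a real implementation would also need to revalidate the buffer list after re-taking the CS locks.

#include <linux/wait.h>

/*
 * Sketch only: at command submission time, block until any pending
 * pageflip on a referenced buffer has completed.  All names here are
 * hypothetical; the real hooks would be driver specific.
 */
static int example_cs_wait_for_flips(struct example_cs *cs)
{
	struct example_bo *bo;
	int i;

	for (i = 0; i < cs->num_bos; i++) {
		bo = cs->bo_list[i];
		if (!bo->pending_flip)
			continue;

		/* Release all held CS locks before sleeping. */
		example_cs_unlock(cs);
		wait_event(bo->flip_queue, !bo->pending_flip);
		example_cs_lock(cs);

		/* Locks were dropped, so start the scan over. */
		i = -1;
	}

	return 0;
}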
From: Thomas H. <th...@sh...> - 2009-08-31 19:18:24
|
Kristian Høgsberg wrote: > Thomas, > > What you describe sounds reasonable and I'll look into updating the > patch. I'm not too keen on the driver specific flag you suggest, > since it makes it hard to use the ioctl in generic code. Maybe > drivers that can do pageflip from one of several fifos can expose a > driver specific ioctl to configure which one should be used. I don't > imagine it's something that will change much? > > Kristian, Yes, that sounds OK with me. That last driver specific part was not very well thought through, I admit. There's probably other ways to do that as you suggest. /Thomas > cheers, > Kristiain > > 2009/8/30 Thomas Hellström <th...@sh...>: > >> Thomas Hellström skrev: >> >>>> I described this in more detail and hopefully more coherently in my >>>> email to Michel. If that's still not clear, follow up there. >>>> >>>> >>>> >>> I've read the mail and understand the proposal, thanks. >>> >>> /Thomas >>> >>> >>> >>> >> So, I've been doing some thinking over the weekend and here's a constructive >> proposal that hopefully can be the base of an agreement: >> >> 1) We require the semantics of the pageflip ioctl to be such that it is safe >> to schedule render commands that reference the buffers involved immediately >> after the ioctl return, or in other words, the pageflip has entered the >> graphics pipeline and any render operations to the referenced buffers will >> be guaranteed to be executed after the pageflip. How this is implemented is >> up to the driver, and thus the code will need driver-specific hooks. There >> is a multitude of ways to implement this, ranging from full hardware support >> to a real naive and stupid software implementation that blocks all command >> submission while there are pending flips. >> A simple sufficient implementation is to scan the buffers to be validated at >> cs time to see if there are pending pageflips on any of them. In that case, >> release all held cs locks and block. When the pageflip happens, continue. >> But again, that will be up to the driver. >> >> 2) We rip the blocking code out of the DRI2 protocol, since there is no >> longer any need for it. >> >> 3) The DRM event mechanism stays as proposed, The ioctl caller needs to >> explicitly request an event to get one. Events will initially be used by >> clients to wake _themselves_ from voluntary select() blocking. >> >> The motivation for 1-2 is as follows: >> a) The solution fits all kinds of smart hardware and cs mechanism, whereas >> the DRI2 blocking solution assumes a simple hardware and a naive kernel cs >> mechanism. One can argue that smart kernel schedulers or advanced hardware >> can work around the DRI2 blocking solution by sending out the event >> immediately, but there we are again working around the DRI2 blocking >> solution. We shouldn't need to do that. >> >> b) There's no requirement on masters to do scheduling with this proposal. >> Otherwise we'd have to live with that forever and implement it in all >> masters utilizing the pageflip ioctl. >> >> c) latency - performance. Consider the sequence of events between the vsync >> and the first set of rendering commands on the next frame being submit to >> the hardware: >> >> c1) DRI2 blocking: (Please correct me if I misunderstood something here) >> * vsync irq >> * schedule a wq thread that adds an event and wakes the X server. >> * X server issues a syscall to read the drm event >> * X server returns an event to the client with the new buffers. (write to >> socket ?) 
>> * Client reads the event (read from socket ?) >> * Client prepares the first command buffer (this is usually quite time >> consuming and in effect not only synchronizes GPU and command buffer >> building, but in effect serializes them). >> * Client builds and issues a cs ioctl. >> * Kernel submits commands to hardware. >> >> c2) Kernel scheduling (this proposal). >> * vsync irq >> * schedule the client thread that immediately submits commands to hardware. >> >> IMHO, c1 is far from optimal and should not be considered. One can argue >> once again, that this doesn't add much latency in practice, but we can't >> keep arguing like that for every such item we add per frame, and in this >> case the serialized command buffer building *does* add too much latency. We >> should seek and implement the optimal solution if it doesn't imply too much >> work or have side-effects. >> >> Some added functionality we should also perhaps consider adding to the ioctl >> interface: >> >> 1) A flag whether to vsync or not. Ideally a driconf option so that should >> perhaps be communicated as part of dri2 swapbuffers as well. I guess on >> Intel hardware you can only flip on vsync(?) but on some other hardware you >> can just send a new scanout startaddress down the FIFO. You'll definitely >> see tearing, though. >> >> 2) Driver private data. An example: Drivers with multiple hardware FIFOs >> that can do pageflipping and barriers on each FIFO might want to indicate to >> the kernel on which FIFO or HW context to schedule the pageflip. I guess >> this private data might also need to be passed along with dri2 swapbuffers. >> >> Thanks, >> /Thomas >> >> >> |
From: <kr...@re...> - 2009-08-27 14:52:05
|
This patch adds a vblank synced pageflip ioctl for to the modesetting family of ioctls. The ioctl takes a crtc and an fb and schedules a pageflip to the new fb at the next coming vertical blank event. This feature lets userspace implement tear-free updating of the screen contents with hw-guaranteed low latency page flipping. The ioctl is asynchronous in that it returns immediately and then later notifies the client by making an event available for reading on the drm fd. This lets applications add the drm fd to their main loop and handle other tasks while waiting for the flip to happen. The event includes the time of the flip, the frame counter and a 64 bit opaque token provided by user space in the ioctl. Based on work and suggestions from Jesse Barnes <jb...@vi...>, Jakob Bornecrantz <wal...@gm...>, Chris Wilson <ch...@ch...> Signed-off-by: Kristian Høgsberg <kr...@re...> Signed-off-by: Jesse Barnes <jb...@vi...> --- Resend with checkpatch.pl problems fixed. cheers, Kristian drivers/gpu/drm/drm_crtc.c | 171 ++++++++++++++++++++++++++++++- drivers/gpu/drm/drm_crtc_helper.c | 10 ++ drivers/gpu/drm/drm_drv.c | 1 + drivers/gpu/drm/drm_fops.c | 68 ++++++++++++- drivers/gpu/drm/drm_irq.c | 42 ++++++++ drivers/gpu/drm/i915/i915_drv.c | 1 + drivers/gpu/drm/i915/intel_display.c | 16 +++- drivers/gpu/drm/radeon/radeon_display.c | 3 +- include/drm/drm.h | 25 +++++ include/drm/drmP.h | 32 ++++++ include/drm/drm_crtc.h | 27 +++++ include/drm/drm_crtc_helper.h | 4 + include/drm/drm_mode.h | 16 +++ 13 files changed, 409 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c index 8fab789..a6ee04f 100644 --- a/drivers/gpu/drm/drm_crtc.c +++ b/drivers/gpu/drm/drm_crtc.c @@ -34,6 +34,8 @@ #include "drmP.h" #include "drm_crtc.h" +#undef set_base + struct drm_prop_enum_list { int type; char *name; @@ -342,6 +344,35 @@ void drm_framebuffer_cleanup(struct drm_framebuffer *fb) EXPORT_SYMBOL(drm_framebuffer_cleanup); /** + * drm_crtc_async_flip - do a set_base call from a work queue + * @work: work struct + * + * Called when a set_base call is queued by the page flip code. This + * allows the flip ioctl itself to return immediately and allow userspace + * to continue working. + */ +static void drm_crtc_async_flip(struct work_struct *work) +{ + struct drm_crtc *crtc = container_of(work, struct drm_crtc, async_flip); + struct drm_device *dev = crtc->dev; + struct drm_pending_flip *pending; + + mutex_lock(&dev->mode_config.mutex); + + BUG_ON(crtc->pending_flip == NULL); + + crtc->funcs->set_base(crtc, crtc->x, crtc->y, NULL); + + pending = crtc->pending_flip; + crtc->pending_flip = NULL; + + pending->frame = drm_vblank_count(dev, crtc->pipe); + list_add_tail(&pending->link, &dev->flip_list); + + mutex_unlock(&dev->mode_config.mutex); +} + +/** * drm_crtc_init - Initialise a new CRTC object * @dev: DRM device * @crtc: CRTC object to init @@ -352,17 +383,19 @@ EXPORT_SYMBOL(drm_framebuffer_cleanup); * * Inits a new object created as base part of an driver crtc object. 
*/ -void drm_crtc_init(struct drm_device *dev, struct drm_crtc *crtc, +void drm_crtc_init(struct drm_device *dev, struct drm_crtc *crtc, int pipe, const struct drm_crtc_funcs *funcs) { crtc->dev = dev; crtc->funcs = funcs; + crtc->pipe = pipe; mutex_lock(&dev->mode_config.mutex); drm_mode_object_get(dev, &crtc->base, DRM_MODE_OBJECT_CRTC); list_add_tail(&crtc->head, &dev->mode_config.crtc_list); dev->mode_config.num_crtc++; + INIT_WORK(&crtc->async_flip, drm_crtc_async_flip); mutex_unlock(&dev->mode_config.mutex); } EXPORT_SYMBOL(drm_crtc_init); @@ -381,6 +414,9 @@ void drm_crtc_cleanup(struct drm_crtc *crtc) { struct drm_device *dev = crtc->dev; + mutex_lock(&dev->mode_config.mutex); + flush_work(&crtc->async_flip); + if (crtc->gamma_store) { kfree(crtc->gamma_store); crtc->gamma_store = NULL; @@ -388,6 +424,7 @@ void drm_crtc_cleanup(struct drm_crtc *crtc) drm_mode_object_put(dev, &crtc->base); list_del(&crtc->head); + mutex_unlock(&dev->mode_config.mutex); dev->mode_config.num_crtc--; } EXPORT_SYMBOL(drm_crtc_cleanup); @@ -2452,3 +2489,135 @@ out: mutex_unlock(&dev->mode_config.mutex); return ret; } + +/** + * drm_mode_page_flip_ioctl - page flip ioctl + * @dev: DRM device + * @data: ioctl args + * @file_priv: file private data + * + * The page flip ioctl replaces the current front buffer with a new + * one, using the CRTC's set_base function, which should just update + * the front buffer base pointer. It's up to set_base to make + * sure the update doesn't result in tearing (on some hardware the + * base register is double buffered, so this is easy). + * + * Note that this covers just the simple case of flipping the front + * buffer immediately. Interval handling and interlaced modes have to + * be handled by userspace, or with new ioctls. + */ +int drm_mode_page_flip_ioctl(struct drm_device *dev, void *data, + struct drm_file *file_priv) +{ + struct drm_pending_flip *pending; + struct drm_mode_page_flip *flip_data = data; + struct drm_mode_object *drm_obj, *fb_obj; + struct drm_crtc *crtc; + int ret = 0; + + if (!(drm_core_check_feature(dev, DRIVER_MODESET))) + return -ENODEV; + + /* + * Reject unknown flags so future userspace knows what we (don't) + * support + */ + if (flip_data->flags & (~DRM_MODE_PAGE_FLIP_FLAGS_MASK)) { + DRM_DEBUG("bad page flip flags\n"); + return -EINVAL; + } + + pending = kzalloc(sizeof *pending, GFP_KERNEL); + if (pending == NULL) + return -ENOMEM; + + mutex_lock(&dev->mode_config.mutex); + + fb_obj = drm_mode_object_find(dev, flip_data->fb_id, + DRM_MODE_OBJECT_FB); + if (!fb_obj) { + DRM_DEBUG("unknown fb %d\n", flip_data->fb_id); + ret = -ENOENT; + goto out_unlock; + } + + drm_obj = drm_mode_object_find(dev, flip_data->crtc_id, + DRM_MODE_OBJECT_CRTC); + if (!drm_obj) { + DRM_DEBUG("unknown crtc %d\n", flip_data->crtc_id); + ret = -ENOENT; + goto out_unlock; + } + crtc = obj_to_crtc(drm_obj); + if (!crtc->enabled) { + DRM_DEBUG("crtc %d not enabled\n", flip_data->crtc_id); + ret = -EINVAL; + goto out_unlock; + } + + if (crtc->fb->funcs->unpin == NULL) { + DRM_DEBUG("fb for crtc %d does not support delayed unpin\n", + flip_data->crtc_id); + ret = -ENODEV; + goto out_unlock; + } + + pending->crtc = crtc; + pending->old_fb = crtc->fb; + pending->pipe = crtc->pipe; + INIT_LIST_HEAD(&pending->link); + pending->event.base.type = DRM_EVENT_MODE_PAGE_FLIP; + pending->event.base.length = sizeof pending->event; + pending->event.user_data = flip_data->user_data; + pending->pending_event.event = &pending->event.base; + pending->pending_event.file_priv = file_priv; + 
pending->pending_event.destroy = + (void (*) (struct drm_pending_event *)) kfree; + + /* Get vblank ref for completion handling */ + ret = drm_vblank_get(dev, crtc->pipe); + if (ret) { + DRM_DEBUG("failed to take vblank ref\n"); + goto out_unlock; + } + + /* + * The set_base call will change the domain on the new fb, + * which will force the rendering to finish and block the + * ioctl. We need to do this last part from a work queue, to + * avoid blocking userspace here. + */ + crtc->fb = obj_to_fb(fb_obj); + + if (crtc->pending_flip != NULL) { + struct drm_pending_flip *old_flip; + + /* We have an outstanding flip request for this crtc/pipe. + * In order to satisfy the user we can either queue the requests + * and apply them on sequential vblanks, or we can drop old + * requests. + * + * Here we choose to discard the previous request for + * simplicity. Note that since we have not yet applied the + * previous flip, we need to preserve the original (i.e. still + * current) fb. + */ + + old_flip = crtc->pending_flip; + pending->old_fb = old_flip->old_fb; + old_flip->old_fb = NULL; + drm_finish_pending_flip(dev, old_flip, 0); + } else + schedule_work(&crtc->async_flip); + crtc->pending_flip = pending; + + mutex_unlock(&dev->mode_config.mutex); + + return 0; + +out_unlock: + mutex_unlock(&dev->mode_config.mutex); + kfree(pending); + + return ret; +} diff --git a/drivers/gpu/drm/drm_crtc_helper.c b/drivers/gpu/drm/drm_crtc_helper.c index 3da9cfa..b2d694f 100644 --- a/drivers/gpu/drm/drm_crtc_helper.c +++ b/drivers/gpu/drm/drm_crtc_helper.c @@ -1095,3 +1095,13 @@ int drm_helper_resume_force_mode(struct drm_device *dev) return 0; } EXPORT_SYMBOL(drm_helper_resume_force_mode); + +int +drm_crtc_helper_set_base(struct drm_crtc *crtc, int x, int y, + struct drm_framebuffer *old_fb) +{ + struct drm_crtc_helper_funcs *crtc_funcs = crtc->helper_private; + + return crtc_funcs->mode_set_base(crtc, x, y, old_fb); +} +EXPORT_SYMBOL(drm_crtc_helper_set_base); diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c index b39d7bf..c66c993 100644 --- a/drivers/gpu/drm/drm_drv.c +++ b/drivers/gpu/drm/drm_drv.c @@ -145,6 +145,7 @@ static struct drm_ioctl_desc drm_ioctls[] = { DRM_IOCTL_DEF(DRM_IOCTL_MODE_GETFB, drm_mode_getfb, DRM_MASTER|DRM_CONTROL_ALLOW), DRM_IOCTL_DEF(DRM_IOCTL_MODE_ADDFB, drm_mode_addfb, DRM_MASTER|DRM_CONTROL_ALLOW), DRM_IOCTL_DEF(DRM_IOCTL_MODE_RMFB, drm_mode_rmfb, DRM_MASTER|DRM_CONTROL_ALLOW), + DRM_IOCTL_DEF(DRM_IOCTL_MODE_PAGE_FLIP, drm_mode_page_flip_ioctl, DRM_MASTER|DRM_CONTROL_ALLOW), }; #define DRM_CORE_IOCTL_COUNT ARRAY_SIZE( drm_ioctls ) diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c index 251bc0e..dcd9c66 100644 --- a/drivers/gpu/drm/drm_fops.c +++ b/drivers/gpu/drm/drm_fops.c @@ -257,6 +257,8 @@ static int drm_open_helper(struct inode *inode, struct file *filp, INIT_LIST_HEAD(&priv->lhead); INIT_LIST_HEAD(&priv->fbs); + INIT_LIST_HEAD(&priv->event_list); + init_waitqueue_head(&priv->event_wait); if (dev->driver->driver_features & DRIVER_GEM) drm_gem_open(dev, priv); @@ -429,6 +431,9 @@ int drm_release(struct inode *inode, struct file *filp) { struct drm_file *file_priv = filp->private_data; struct drm_device *dev = file_priv->minor->dev; + struct drm_pending_flip *f, *ft; + struct drm_pending_event *e, *et; + int retcode = 0; lock_kernel(); @@ -451,6 +456,19 @@ int drm_release(struct inode *inode, struct file *filp) if (file_priv->minor->master) drm_master_release(dev, filp); + mutex_lock(&dev->struct_mutex); + + /* Remove pending flips */ + 
list_for_each_entry_safe(f, ft, &dev->flip_list, link) + if (f->pending_event.file_priv == file_priv) + drm_finish_pending_flip(dev, f, 0); + + /* Remove unconsumed events */ + list_for_each_entry_safe(e, et, &file_priv->event_list, link) + e->destroy(e); + + mutex_unlock(&dev->struct_mutex); + if (dev->driver->driver_features & DRIVER_GEM) drm_gem_release(dev, file_priv); @@ -544,9 +562,55 @@ int drm_release(struct inode *inode, struct file *filp) } EXPORT_SYMBOL(drm_release); -/** No-op. */ +ssize_t drm_read(struct file *filp, char __user *buffer, + size_t count, loff_t *offset) +{ + struct drm_file *file_priv = filp->private_data; + struct drm_device *dev = file_priv->minor->dev; + struct drm_pending_event *event; + ssize_t total, ret; + + ret = wait_event_interruptible(file_priv->event_wait, + !list_empty(&file_priv->event_list)); + if (ret < 0) + return ret; + + total = 0; + while (!list_empty(&file_priv->event_list)) { + mutex_lock(&dev->struct_mutex); + event = list_first_entry(&file_priv->event_list, + struct drm_pending_event, link); + if (total + event->event->length > count) { + mutex_unlock(&dev->struct_mutex); + break; + } + list_del(&event->link); + mutex_unlock(&dev->struct_mutex); + + if (copy_to_user(buffer + total, + event->event, event->event->length)) { + total = -EFAULT; + break; + } + + total += event->event->length; + event->destroy(event); + } + + return total; +} +EXPORT_SYMBOL(drm_read); + unsigned int drm_poll(struct file *filp, struct poll_table_struct *wait) { - return 0; + struct drm_file *file_priv = filp->private_data; + unsigned int mask = 0; + + poll_wait(filp, &file_priv->event_wait, wait); + + if (!list_empty(&file_priv->event_list)) + mask |= POLLIN | POLLRDNORM; + + return mask; } EXPORT_SYMBOL(drm_poll); diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c index b4a3dbc..43f8214 100644 --- a/drivers/gpu/drm/drm_irq.c +++ b/drivers/gpu/drm/drm_irq.c @@ -34,6 +34,7 @@ */ #include "drmP.h" +#include "drm_crtc_helper.h" #include <linux/interrupt.h> /* For task queue support */ @@ -71,6 +72,44 @@ int drm_irq_by_busid(struct drm_device *dev, void *data, return 0; } +#define vblank_passed(a, b) ((long)(a - b) > 0) + +void drm_finish_pending_flip(struct drm_device *dev, + struct drm_pending_flip *f, u32 frame) +{ + struct timeval now; + + f->event.frame = frame; + do_gettimeofday(&now); + f->event.tv_sec = now.tv_sec; + f->event.tv_usec = now.tv_usec; + drm_vblank_put(dev, f->pipe); + list_del_init(&f->link); + list_add_tail(&f->pending_event.link, + &f->pending_event.file_priv->event_list); + if (f->old_fb) + f->old_fb->funcs->unpin(f->old_fb); + wake_up_interruptible(&f->pending_event.file_priv->event_wait); +} + +static void drm_flip_work_func(struct work_struct *work) +{ + struct drm_device *dev = + container_of(work, struct drm_device, flip_work); + struct drm_pending_flip *f, *t; + u32 frame; + + mutex_lock(&dev->struct_mutex); + + list_for_each_entry_safe(f, t, &dev->flip_list, link) { + frame = drm_vblank_count(dev, f->pipe); + if (vblank_passed(frame, f->frame)) + drm_finish_pending_flip(dev, f, frame); + } + + mutex_unlock(&dev->struct_mutex); +} + static void vblank_disable_fn(unsigned long arg) { struct drm_device *dev = (struct drm_device *)arg; @@ -161,6 +200,8 @@ int drm_vblank_init(struct drm_device *dev, int num_crtcs) atomic_set(&dev->vblank_refcount[i], 0); } + INIT_LIST_HEAD(&dev->flip_list); + INIT_WORK(&dev->flip_work, drm_flip_work_func); dev->vblank_disable_allowed = 0; return 0; @@ -626,5 +667,6 @@ void 
drm_handle_vblank(struct drm_device *dev, int crtc) { atomic_inc(&dev->_vblank_count[crtc]); DRM_WAKEUP(&dev->vbl_queue[crtc]); + schedule_work(&dev->flip_work); } EXPORT_SYMBOL(drm_handle_vblank); diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index fc4b68a..322b0f2 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -203,6 +203,7 @@ static struct drm_driver driver = { .mmap = drm_gem_mmap, .poll = drm_poll, .fasync = drm_fasync, + .read = drm_read, #ifdef CONFIG_COMPAT .compat_ioctl = i915_compat_ioctl, #endif diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 508838e..291c067 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -972,12 +972,12 @@ intel_pipe_set_base(struct drm_crtc *crtc, int x, int y, I915_READ(dspbase); } - intel_wait_for_vblank(dev); - if (old_fb) { intel_fb = to_intel_framebuffer(old_fb); + intel_wait_for_vblank(dev); i915_gem_object_unpin(intel_fb->obj); } + mutex_unlock(&dev->struct_mutex); if (!dev->primary->master) @@ -2840,6 +2840,7 @@ static const struct drm_crtc_funcs intel_crtc_funcs = { .gamma_set = intel_crtc_gamma_set, .set_config = drm_crtc_helper_set_config, .destroy = intel_crtc_destroy, + .set_base = drm_crtc_helper_set_base, }; @@ -2852,7 +2853,7 @@ static void intel_crtc_init(struct drm_device *dev, int pipe) if (intel_crtc == NULL) return; - drm_crtc_init(dev, &intel_crtc->base, &intel_crtc_funcs); + drm_crtc_init(dev, &intel_crtc->base, pipe, &intel_crtc_funcs); drm_mode_crtc_set_gamma_size(&intel_crtc->base, 256); intel_crtc->pipe = pipe; @@ -3071,9 +3072,18 @@ static int intel_user_framebuffer_create_handle(struct drm_framebuffer *fb, return drm_gem_handle_create(file_priv, object, handle); } +static void intel_user_framebuffer_unpin(struct drm_framebuffer *fb) +{ + struct intel_framebuffer *intel_fb; + + intel_fb = to_intel_framebuffer(fb); + i915_gem_object_unpin(intel_fb->obj); +} + static const struct drm_framebuffer_funcs intel_fb_funcs = { .destroy = intel_user_framebuffer_destroy, .create_handle = intel_user_framebuffer_create_handle, + .unpin = intel_user_framebuffer_unpin }; int intel_framebuffer_create(struct drm_device *dev, diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c index 3efcf1a..4d73f0b 100644 --- a/drivers/gpu/drm/radeon/radeon_display.c +++ b/drivers/gpu/drm/radeon/radeon_display.c @@ -171,6 +171,7 @@ static const struct drm_crtc_funcs radeon_crtc_funcs = { .gamma_set = radeon_crtc_gamma_set, .set_config = drm_crtc_helper_set_config, .destroy = radeon_crtc_destroy, + .set_base = drm_crtc_helper_set_base, }; static void radeon_crtc_init(struct drm_device *dev, int index) @@ -183,7 +184,7 @@ static void radeon_crtc_init(struct drm_device *dev, int index) if (radeon_crtc == NULL) return; - drm_crtc_init(dev, &radeon_crtc->base, &radeon_crtc_funcs); + drm_crtc_init(dev, &radeon_crtc->base, index, &radeon_crtc_funcs); drm_mode_crtc_set_gamma_size(&radeon_crtc->base, 256); radeon_crtc->crtc_id = index; diff --git a/include/drm/drm.h b/include/drm/drm.h index 7cb50bd..c4986c0 100644 --- a/include/drm/drm.h +++ b/include/drm/drm.h @@ -686,6 +686,7 @@ struct drm_gem_open { #define DRM_IOCTL_MODE_GETFB DRM_IOWR(0xAD, struct drm_mode_fb_cmd) #define DRM_IOCTL_MODE_ADDFB DRM_IOWR(0xAE, struct drm_mode_fb_cmd) #define DRM_IOCTL_MODE_RMFB DRM_IOWR(0xAF, unsigned int) +#define DRM_IOCTL_MODE_PAGE_FLIP DRM_IOW(0xB0, struct drm_mode_page_flip) 
/** * Device specific ioctls should only be in their respective headers @@ -698,6 +699,30 @@ struct drm_gem_open { #define DRM_COMMAND_BASE 0x40 #define DRM_COMMAND_END 0xA0 +/** + * Header for events written back to userspace on the drm fd. The + * type defines the type of event, the length specifies the total + * length of the event (including the header), and user_data is + * typically a 64 bit value passed with the ioctl that triggered the + * event. A read on the drm fd will always only return complete + * events, that is, if for example the read buffer is 100 bytes, and + * there are two 64 byte events pending, only one will be returned. + */ +struct drm_event { + __u32 type; + __u32 length; +}; + +#define DRM_EVENT_MODE_PAGE_FLIP 0x01 + +struct drm_event_page_flip { + struct drm_event base; + __u64 user_data; + __u32 tv_sec; + __u32 tv_usec; + __u32 frame; +}; + /* typedef area */ #ifndef __KERNEL__ typedef struct drm_clip_rect drm_clip_rect_t; diff --git a/include/drm/drmP.h b/include/drm/drmP.h index 45b67d9..4ff43ab 100644 --- a/include/drm/drmP.h +++ b/include/drm/drmP.h @@ -402,6 +402,14 @@ struct drm_buf_entry { struct drm_freelist freelist; }; +/* Event queued up for userspace to read */ +struct drm_pending_event { + struct drm_event *event; + struct list_head link; + struct drm_file *file_priv; + void (*destroy) (struct drm_pending_event *event); +}; + /** File private data */ struct drm_file { int authenticated; @@ -425,6 +433,9 @@ struct drm_file { struct drm_master *master; /* master this node is currently associated with N.B. not always minor->master */ struct list_head fbs; + + wait_queue_head_t event_wait; + struct list_head event_list; }; /** Wait queue */ @@ -873,6 +884,16 @@ struct drm_minor { struct drm_mode_group mode_group; }; +struct drm_pending_flip { + struct drm_pending_event pending_event; + struct drm_framebuffer *old_fb; + struct drm_crtc *crtc; + u32 frame; + int pipe; + struct list_head link; + struct drm_event_page_flip event; +}; + /** * DRM device structure. This structure represent a complete card that * may contain multiple heads. 
@@ -972,6 +993,13 @@ struct drm_device { u32 max_vblank_count; /**< size of vblank counter register */ + struct work_struct flip_work; + + /** + * List of objects waiting on flip completion + */ + struct list_head flip_list; + /*@} */ cycles_t ctx_start; cycles_t lck_start; @@ -1108,6 +1136,8 @@ extern int drm_lastclose(struct drm_device *dev); extern int drm_open(struct inode *inode, struct file *filp); extern int drm_stub_open(struct inode *inode, struct file *filp); extern int drm_fasync(int fd, struct file *filp, int on); +extern ssize_t drm_read(struct file *filp, char __user *buffer, + size_t count, loff_t *offset); extern int drm_release(struct inode *inode, struct file *filp); /* Mapping support (drm_vm.h) */ @@ -1274,6 +1304,8 @@ extern void drm_vblank_pre_modeset(struct drm_device *dev, int crtc); extern void drm_vblank_post_modeset(struct drm_device *dev, int crtc); extern int drm_modeset_ctl(struct drm_device *dev, void *data, struct drm_file *file_priv); +extern void drm_finish_pending_flip(struct drm_device *dev, + struct drm_pending_flip *f, u32 frame); /* AGP/GART support (drm_agpsupport.h) */ extern struct drm_agp_head *drm_agp_init(struct drm_device *dev); diff --git a/include/drm/drm_crtc.h b/include/drm/drm_crtc.h index 7300fb8..0b5dc47 100644 --- a/include/drm/drm_crtc.h +++ b/include/drm/drm_crtc.h @@ -238,6 +238,12 @@ struct drm_display_info { }; struct drm_framebuffer_funcs { + /* + * Unpin the old fb after setting a mode. Must be called + * after the old framebuffer is no longer visible, ie, after + * the next vblank, typically. + */ + void (*unpin)(struct drm_framebuffer *fb); void (*destroy)(struct drm_framebuffer *framebuffer); int (*create_handle)(struct drm_framebuffer *fb, struct drm_file *file_priv, @@ -288,6 +294,7 @@ struct drm_property { struct drm_crtc; struct drm_connector; struct drm_encoder; +struct drm_pending_flip; /** * drm_crtc_funcs - control CRTCs for a given device @@ -331,17 +338,29 @@ struct drm_crtc_funcs { void (*destroy)(struct drm_crtc *crtc); int (*set_config)(struct drm_mode_set *set); + + /* + * Move the crtc on the current fb to the given position. + * This function is optional. If old_fb is provided, the + * function will wait for vblank and unpin it. If old_fb is + * NULL, nothing is unpinned and the caller must call + * mode_unpin_fb to release the old framebuffer. + */ + int (*set_base)(struct drm_crtc *crtc, int x, int y, + struct drm_framebuffer *old_fb); }; /** * drm_crtc - central CRTC control structure * @enabled: is this CRTC enabled? + * @pipe: pipe number (as seen by DRM vblank functions) * @x: x position on screen * @y: y position on screen * @desired_mode: new desired mode * @desired_x: desired x for desired_mode * @desired_y: desired y for desired_mode * @funcs: CRTC control functions + * @async_work: work queue for async set base calls * * Each CRTC may have one or more connectors associated with it. This structure * allows the CRTC to be controlled. 
@@ -359,6 +378,7 @@ struct drm_crtc { struct drm_display_mode mode; + int pipe; int x, y; struct drm_display_mode *desired_mode; int desired_x, desired_y; @@ -368,6 +388,10 @@ struct drm_crtc { uint32_t gamma_size; uint16_t *gamma_store; + /* Allow async set_pipe_base calls for flipping */ + struct work_struct async_flip; + struct drm_pending_flip *pending_flip; + /* if you are using the helper */ void *helper_private; }; @@ -589,6 +613,7 @@ struct drm_mode_config { extern void drm_crtc_init(struct drm_device *dev, struct drm_crtc *crtc, + int pipe, const struct drm_crtc_funcs *funcs); extern void drm_crtc_cleanup(struct drm_crtc *crtc); @@ -736,4 +761,6 @@ extern int drm_mode_gamma_get_ioctl(struct drm_device *dev, extern int drm_mode_gamma_set_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv); extern bool drm_detect_hdmi_monitor(struct edid *edid); +extern int drm_mode_page_flip_ioctl(struct drm_device *dev, void *data, + struct drm_file *file_priv); #endif /* __DRM_CRTC_H__ */ diff --git a/include/drm/drm_crtc_helper.h b/include/drm/drm_crtc_helper.h index 6769ff6..dd10566 100644 --- a/include/drm/drm_crtc_helper.h +++ b/include/drm/drm_crtc_helper.h @@ -123,4 +123,8 @@ static inline void drm_connector_helper_add(struct drm_connector *connector, } extern int drm_helper_resume_force_mode(struct drm_device *dev); + +extern int drm_crtc_helper_set_base(struct drm_crtc *crtc, int x, int y, + struct drm_framebuffer *old_fb); + #endif diff --git a/include/drm/drm_mode.h b/include/drm/drm_mode.h index ae304cc..464b779 100644 --- a/include/drm/drm_mode.h +++ b/include/drm/drm_mode.h @@ -265,4 +265,20 @@ struct drm_mode_crtc_lut { __u64 blue; }; +#define DRM_MODE_PAGE_FLIP_WAIT (1<<0) /* block on previous page flip */ +#define DRM_MODE_PAGE_FLIP_FLAGS_MASK (DRM_MODE_PAGE_FLIP_WAIT) + +struct drm_mode_page_flip { + /** Handle of new front buffer */ + __u32 fb_id; + __u32 crtc_id; + + /* 64 bit cookie returned to userspace in the page flip event. */ + __u64 user_data; + /** + * page flip flags (wait on flip only for now) + */ + __u32 flags; +}; + #endif -- 1.6.4 |
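For anyone who wants to poke at the interface from userspace, a minimal consumer of the new ioctl and event stream could look roughly like the sketch below. It only uses the structures added by this patch (drm_mode_page_flip, drm_event, drm_event_page_flip); the header paths, the setup that produces a valid master fd, crtc_id and fb_id, and all error handling are assumptions left out of the sketch.

/* Sketch: schedule a flip, then wait for its completion event. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <poll.h>
#include <sys/ioctl.h>
#include <drm/drm.h>
#include <drm/drm_mode.h>

static int schedule_flip(int fd, uint32_t crtc_id, uint32_t fb_id)
{
	struct drm_mode_page_flip flip;

	memset(&flip, 0, sizeof flip);
	flip.crtc_id = crtc_id;
	flip.fb_id = fb_id;
	flip.flags = 0;
	flip.user_data = 0x1234;	/* echoed back in the event */

	return ioctl(fd, DRM_IOCTL_MODE_PAGE_FLIP, &flip);
}

static void wait_for_flip(int fd)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	char buf[1024];
	ssize_t len;
	int i;

	poll(&pfd, 1, -1);
	len = read(fd, buf, sizeof buf);	/* returns whole events only */

	for (i = 0; i < len; ) {
		struct drm_event *e = (struct drm_event *) &buf[i];

		if (e->type == DRM_EVENT_MODE_PAGE_FLIP) {
			struct drm_event_page_flip *p =
				(struct drm_event_page_flip *) e;

			printf("flip done: frame %u at %u.%06u, cookie 0x%llx\n",
			       p->frame, p->tv_sec, p->tv_usec,
			       (unsigned long long) p->user_data);
		}
		i += e->length;
	}
}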
From: Jakob B. <wal...@gm...> - 2009-05-09 00:34:33
|
2009/5/8 Kristian Høgsberg <kr...@bi...>: > From: Kristian Høgsberg <kr...@re...> > > This patch adds a vblank synced pageflip ioctl for to the modesetting > family of ioctls. The ioctl takes a crtc and an fb and schedules a > pageflip to the new fb at the next coming vertical blank event. This > feature lets userspace implement tear-free updating of the screen contents > with hw-guaranteed low latency page flipping. > > The ioctl is asynchronous in that it returns immediately and then later > notifies the client by making an event available for reading on the drm fd. > This lets applications add the drm fd to their main loop and handle other > tasks while waiting for the flip to happen. The event includes the time > of the flip, the frame counter and a 64 bit opaque token provided by > user space in the ioctl. Great idea with the poll/read stuff. I think you have the userspace <-> kernel part pretty much all figured out now. [SNIP] > + > +/** > + * drm_mode_page_flip - page flip ioctl > + * @dev: DRM device > + * @data: ioctl args > + * @file_priv: file private data > + * > + * The page flip ioctl replaces the current front buffer with a new one, > + * using the CRTC's mode_set_base function, which should just update > + * the front buffer base pointer. It's up to mode_set_base to make > + * sure the update doesn't result in tearing (on some hardware the base > + * register is double buffered, so this is easy). > + * > + * Note that this covers just the simple case of flipping the front > + * buffer immediately. Interval handling and interlaced modes have to > + * be handled by userspace, or with new ioctls. > + */ > +int drm_mode_page_flip(struct drm_device *dev, void *data, > + struct drm_file *file_priv) This function should have _ioctl appended to it. It should also go into the drm_crtc.c file. And should not use drm_crtc_helper functions and structs since those are not a part of the core kms implementation. Also I'd rather see as much as possible of the code below put in the generic part and not just wrap these functions in a small function. [SNIP] > + /* > + * The mode_set_base call will change the domain on the new > + * fb, which will force the rendering to finish and block the > + * ioctl. We need to do this last part from a work queue, to > + * avoid blocking userspace here. > + */ > + crtc->fb = obj_to_fb(fb_obj); > + ret = crtc_funcs->mode_set_base(crtc, 0, 0, NULL); We should not block on rendering in the ioctl but do the call from a worker thread. And you probably need one thread per crtc. Also holding the struct_mutex while waiting for rendering might blow up in your face big time if the rendering command takes a long while to complete. [SNIP] > + > + /* > + * Unpin the old fb after setting a mode. Must be called > + * after the old framebuffer is no longer visible, ie, after > + * the next vblank, typically. > + */ > + void (*mode_unpin_fb)(struct drm_crtc *crtc, struct drm_framebuffer *fb); I was going to say that should probably go on the framebuffer functions, but the framebuffer doesn't have any helper functions. Cheers Jakob. |
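The "one thread per crtc" Jakob suggests could be done by giving each crtc a private single-threaded workqueue and queuing the async set_base onto that instead of calling schedule_work() on the shared kernel workqueue, so a set_base that ends up waiting on rendering only stalls flips on its own pipe. The following is only a sketch of that idea; the flip_wq member does not exist in the patch as posted.

#include <linux/workqueue.h>
#include "drmP.h"

/* Sketch: per-crtc workqueue for async flips.  'flip_wq' would be a
 * hypothetical new member of struct drm_crtc. */
static int drm_crtc_create_flip_wq(struct drm_crtc *crtc)
{
	crtc->flip_wq = create_singlethread_workqueue("drm-flip");
	if (crtc->flip_wq == NULL)
		return -ENOMEM;
	return 0;
}

static void drm_crtc_queue_flip(struct drm_crtc *crtc)
{
	/* would replace schedule_work(&crtc->async_flip) in the flip ioctl */
	queue_work(crtc->flip_wq, &crtc->async_flip);
}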
From: Jesse B. <jb...@vi...> - 2009-05-11 21:33:04
|
On Fri, 8 May 2009 17:07:01 -0400 Kristian Høgsberg <kr...@bi...> wrote: ... > + /* > + * The mode_set_base call will change the domain on the new > + * fb, which will force the rendering to finish and block the > + * ioctl. We need to do this last part from a work queue, to > + * avoid blocking userspace here. > + */ > + crtc->fb = obj_to_fb(fb_obj); > + ret = crtc_funcs->mode_set_base(crtc, 0, 0, NULL); > + if (ret) { > + DRM_ERROR("mode_set_base failed: %d\n", ret); > + goto out_unlock; > + } Good stuff. I think this is the last remaining issue with the kernel side: the call to mode_set_base needs to be async so we don't block waiting for the GTT transition (and thus rendering) to complete before returning. I think you've taken care of that in a subsequent patch though? The one gotcha with doing that is that mode_set_base needs the struct mutex to be held while it does the GTT domain transition. execbuf needs struct mutex too though, so I'm worried that we may end up hurting rendering performance with refresh rate flips (I haven't measured this at all though, so I don't really know). If it does end up being a problem we may have to find a way to do the transition outside the protection of struct mutex... > -/** No-op. */ > +ssize_t drm_read(struct file *filp, char __user *buffer, > + size_t count, loff_t *offset) > +{ I like the event stuff too; I can imagine it being useful for other GL features as well, like vblank waits. -- Jesse Barnes, Intel Open Source Technology Center |
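The shared drm_event header is what makes the kind of reuse Jesse mentions cheap: a vblank wait completion, or anything else, would just be another type code and payload delivered on the same fd and decoded by the same read loop. The definition below is purely hypothetical and only illustrates the shape such an event could take; nothing like it is part of this patch.

/* Hypothetical follow-on event type, shown only to illustrate how the
 * drm_event header could multiplex other completions on the drm fd. */
#define DRM_EVENT_VBLANK	0x02

struct drm_event_vblank {
	struct drm_event base;	/* type = DRM_EVENT_VBLANK, length = sizeof */
	__u64 user_data;	/* cookie from the wait request */
	__u32 tv_sec;
	__u32 tv_usec;
	__u32 sequence;		/* vblank counter at delivery */
};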