Ingo Molnar wrote:
> On Fri, 30 Aug 2002, Keith Whitwell wrote:
>>It's a ring, with head & tail pointers in mmio registers and possibly
>>written back to main memory automatically by the card. However
>>userspace never really gets to see the ring, but builds up buffers of
>>commands and vertices which are dispatched to the ring by the kernel.
> one more (probably stupid) question: you enter kernel mode (via the ioctl)
> only if some sort of exceptional thing occurs, right - or do you enter it
> for every GX op posted to the card? I was assuming that userspace had
> direct access to the command ring [thus posting a GX op was a
> userspace-only thing], but your mail made me unsure about this.
For security reasons the kernel has to be involved. The userspace driver
builds a buffer of commands (which need to be vetted) and populates agp
buffers directly with vertex data (which is safe). At a later point, the lock
is grabbed and an ioctl is called to fire the accumulated commands.
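
The flow is roughly like this -- all the names below are invented for
illustration, not the real radeon interface:

#include <stdint.h>
#include <sys/ioctl.h>

/* Hypothetical ioctl and argument struct, for illustration only. */
struct drm_cmd_fire {
        uint32_t *addr;         /* command buffer to vet and dispatch */
        int       ndwords;
};
#define DRM_IOCTL_CMD_FIRE      _IOW('d', 0x40, struct drm_cmd_fire)

struct cmd_buf {
        uint32_t dwords[4096];  /* client-side command accumulation */
        int      used;
};

/* Caller has already grabbed the hardware lock. */
static void fire_commands(int drm_fd, struct cmd_buf *cb)
{
        struct drm_cmd_fire fire = { cb->dwords, cb->used };

        /* The kernel checks every command (register writes etc.)
         * before copying it onto the ring; vertex data has already
         * gone straight into AGP buffers, which is safe unvetted. */
        ioctl(drm_fd, DRM_IOCTL_CMD_FIRE, &fire);
        cb->used = 0;
}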
>>Correct. We're looping because there previously was no choice but to
>>poll status registers (or their in-memory shadows) to determine things
>> - Runaway rendering. With a big pipe and small, expensive frames,
>>you can queue up minutes worth of frames before running out of
>>resources. Not good for interactivity, so I limit the nr of outstanding
>>frames to 2. We poll an in-memory value which is written to by the
>>card. This is where previously we had a tight loop, then a loop with
>>usleep(), Linus suggested sched_yield(), and now that I've got irq's
>>working, I think they are probably the best option.
> i'd suggest to use interrupts only if it can be ensured that you will get
> only one interrupt per frame. On current x86 hardware interrupts have a
> typical latency of 10 usecs, 30-50% of that is direct CPU overhead, ie. on
> a 1 GHz CPU it's 3000-5000 cycles - this is just the pure IRQ entry/exit.
Fair enough. This is a good answer to my question.
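
For reference, the throttle being replaced looks roughly like this --
the field names are illustrative, not the actual sarea layout:

/* Old scheme, simplified: poll the value the card writes back to
 * memory, sleeping between checks. */
while (frames_emitted - sarea->last_frame_done > 2)
        usleep(1000);           /* was a tight loop, then sched_yield() */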
> if the polling only has to be done to limit work to be at most 1 frame
> away from the current frame (correct?), and if the GPU can interrupt the
> host CPU when encountering a specific gx op (the last operation belonging
> to the current frame?), then the use of interrupts is ideal, the overhead
> will be low and the processes interact with the OS in the nicest possible
> way. The process won't lose any timeslices due to sched_yield() - all
> waiting/wakeup can be nicely done from the DRM kernel code and DRM IRQ
> handler.
Yep, this is the scheme I'd like.
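
Kernel side, something like this -- hand-waving the details, names
invented:

#include <linux/sched.h>
#include <linux/wait.h>
#include <asm/atomic.h>

/* Sketch: the last command of each frame asks the card to raise an
 * interrupt; clients sleep on a wait queue instead of polling. */
static DECLARE_WAIT_QUEUE_HEAD(frame_wq);
static atomic_t frames_done = ATOMIC_INIT(0);

/* Called from the interrupt handler when the card signals
 * end-of-frame. */
static void drm_frame_done(void)
{
        atomic_inc(&frames_done);
        wake_up(&frame_wq);
}

/* ioctl path: block until no more than 2 frames are outstanding. */
static int drm_throttle(int frames_emitted)
{
        return wait_event_interruptible(frame_wq,
                atomic_read(&frames_done) >= frames_emitted - 2);
}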
>> - Shutting down the 3d pipe. For software fallbacks you want to
>>let it drain so you can access the framebuffer directly.
> ok - this is a boundary case anyway, but IRQs should work equally well.
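Right - and the fallback path can reuse the same mechanism, roughly:

/* Before a software fallback: drain the pipe, then let the CPU at the
 * framebuffer.  emit_irq_op()/pipe_idle() are hypothetical helpers. */
emit_irq_op(dev);                       /* interrupt when this retires */
wait_event(pipe_wq, pipe_idle(dev));
/* direct framebuffer access is now safe */
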
>> - Waiting for resources. Driver currently has dinky little 64k
>>dma buffers and needs to keep releasing them and picking them up again.
>>Sometimes it has to wait for one to become free. This is a question
>>mark for me -- if I dropped an irq after releasing each one of these,
>>that would be a vast number of the things generated per second. Part of
>>the trouble is the small size of the buffers relative to the actual
>>volume of traffic to the card... I'm thinking about letting the driver
>>grab a chunk of agp sized according to its needs and letting it organize
>>its own synchronization, hopefully piggybacking off the irq's for
>>runaway rendering prevention.
> i'd suggest to grab all DMA resources upon module initialization (unless
> it's some unrealistic amount of RAM), this will also make coding easier
> and will make the code faster. You should most definitely avoid any IRQ
> overhead (and even allocation overhead) in this area, unless some hardware
> limit really forces you to do so. I really think that anyone who wants to
> run a r200 based card can afford 16 MB (or more) preallocated DMA space.
Effectively this is what I'm proposing. The drm kernel module already
preallocates its dma resources, but they are then shared out 64k at a time to
active userspace contexts. I think we'd be better off having the userspace
contexts each grab and hold onto several meg of agp for this.
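
I.e. something along these lines -- sizes and names purely
illustrative:

#define CTX_AGP_SIZE    (4 * 1024 * 1024)       /* per-context chunk */

struct ctx_agp_pool {
        unsigned char *base;    /* AGP space mapped once at startup */
        unsigned int   head;    /* bytes handed out so far */
        unsigned int   done;    /* bytes the card has consumed, updated
                                 * via the frame IRQ bookkeeping */
};

static unsigned char *pool_alloc(struct ctx_agp_pool *p,
                                 unsigned int bytes)
{
        unsigned int offset = p->head % CTX_AGP_SIZE;

        /* Don't let an allocation straddle the wrap point. */
        if (offset + bytes > CTX_AGP_SIZE) {
                p->head += CTX_AGP_SIZE - offset;
                offset = 0;
        }

        /* Piggyback on the runaway-rendering IRQ: sleep until the card
         * has consumed enough of the chunk, rather than releasing and
         * reacquiring 64k buffers. */
        while (p->head + bytes - p->done > CTX_AGP_SIZE)
                wait_for_frame_irq();   /* hypothetical blocking ioctl */

        p->head += bytes;
        return p->base + offset;
}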