Keith Whitwell wrote:
> Thomas Hellström wrote:
>
>> Keith Whitwell wrote:
>>
>>> Thomas Hellström wrote:
>>>
>>>> Hi!
>>>>
>>>> Keith Whitwell wrote:
>>>>
>>>>>
>>>>> get lock
>>>>> while (timestamp mismatch) {
>>>>> release lock
>>>>> request new cliprects and timestamp
>>>>> get lock
>>>>> }
>>>>>
>>>>> Note that this is the contended case only. What's the worst that could
>>>>> happen - somebody's whizzing windows around and our 3d client sits
>>>>> in this loop for the duration. Note that the loop includes X
>>>>> server communication so it's not going to suck up the cpu or
>>>>> anything drastic.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> This is basically what I'm doing right now. The problem arises as the
>>>> code continues:
>>>>
>>>>
>>>> get lock
>>>> while (timestamp mismatch) {
>>>> release lock
>>>> request new cliprects and timestamp
>>>> get lock
>>>> }
>>>> wait_for_device()
>>>> render_to_scale_buffer()
>>>> wait_for_device()
>>>> render_to_back_buffer()
>>>> wait_for_device()
>>>> blit_to_screen()
>>>> release_lock()
>>>>
>>>> And, to avoid holding the lock while waiting for the device, since
>>>> that blocks use of the decoder while I'm doing scaling operations,
>>>> I'd like to
>>>>
>>>> mark_scaling_device_busy()
>>>> get_drawable_lock()
>>>> get lock
>>>> while (timestamp mismatch) {
>>>> release lock
>>>> release_drawable_lock()
>>>> request new cliprects and timestamp
>>>> get_drawable_lock()
>>>> get lock
>>>> }
>>>> release_lock()
>>>> wait_for_device()
>>>> get_lock()
>>>> render_to_scale_buffer()
>>>> release_lock()
>>>> wait_for_device()
>>>> get_lock()
>>>> render_to_back_buffer()
>>>> release_lock()
>>>> wait_for_device()
>>>> get_lock()
>>>> blit_to_screen()
>>>> release_lock()
>>>> mark_scaling_device_free()
>>>
>>>
>>>
>>>
>>> And then release_drawable_lock()?
>>>
>>> What semantics are you hoping for from the drawable lock in your
>>> scenario above? Just that the cliprects won't change while it is held?
>>
>>
>>
>> Exactly on both points, except the drawable_lock would have to be
>> released before mark_scaling_device_free() to avoid deadlocks.
>
>
>
> So a few more questions:
>
> 1) Why (exactly) is keeping the cliprects from changing a concern?
> What happens if they change between the steps above?
Not much really. It all boils down to what to do if the per-drawable
back-buffer no longer matches the drawable. In the simplest case one would
simply skip the blit, which might even be better than an attempt to
match the old backbuffer to a new drawable size. Problems really only
occur if / when the drawable is resized. I've thought about this, and
since it's a simple solution that still works, if not perfectly, I'm still
considering it.
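
The "skip the blit on mismatch" fallback can be sketched roughly as follows. This is only an illustration in C, not code from any existing driver: `struct buf_size`, `backbuffer_matches()`, `blit_or_skip()` and the `count_blit()` stand-in are all hypothetical names, and a real implementation would compare against whatever size the cliprect update reports.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical size record; not taken from any real driver. */
struct buf_size {
    int width;
    int height;
};

/* The back buffer is only usable if it still has the drawable's size. */
static bool backbuffer_matches(struct buf_size back, struct buf_size drawable)
{
    return back.width == drawable.width && back.height == drawable.height;
}

/* Stand-in for the real blit: it only counts how often it was called. */
static void count_blit(void *ctx)
{
    ++*(int *)ctx;
}

/*
 * Blit if the sizes agree, otherwise drop this frame.
 * Returns true when the blit callback was invoked.
 */
static bool blit_or_skip(struct buf_size back, struct buf_size drawable,
                         void (*blit)(void *ctx), void *ctx)
{
    assert(blit);           /* caller must supply a blit routine */
    if (!backbuffer_matches(back, drawable))
        return false;       /* drawable was resized: skip, do not rescale */
    blit(ctx);
    return true;
}
```

A window move leaves the sizes equal, so the blit goes ahead; only a resize makes `blit_or_skip()` drop the frame.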
>
> 2) Could the DDX driver blit the contents of these additional buffers
> (scale, back) at the same time it blits the frontbuffer so that the
> window change "just works"?
>
You mean the front blitting during window moves? In this case it doesn't
really apply, since the per-drawable back buffer would still be valid.
Resizing would be the only operation causing problems.
> 3) I don't think that the drawable lock is a pretty thing, is it worth
> keeping it around for this? Would some black areas or incomplete
> video frames during window moves be so bad? Note that the next
> version of this hardware might have a proper command stream that just
> allows you to submit all those operations to hardware in a single go,
> and not have to do the waiting in the driver...
>
I can see your point, but on the other hand if the command stream were
that smart, it would in effect only implement a continuously held
heavyweight lock, blocking all dma submissions to the mpeg decoder and
2D / 3D engine while the scaling engine is working, which is exactly
what I'm trying to avoid.
The problem is really what to do when there are a lot of independent
engines on a video chip, with a common command stream, numerous IRQ
sources and one global hardware lock. I assume this will be more of a
problem in the future. The solution using the drawable lock is not very
clean. On the other hand, not being able to use the engines in
parallel is not very efficient and is bad for interactivity.
I'm not sure what's the best design to solve this, but one idea would be
having a futex-like lock and a "breadcrumb pool" for each engine,
optionally also with an IRQ. This would be sufficient to
* be able to independently submit DMA commands.
* wait for engine idle independently on each engine without ever
  needing to wait for DMA quiescence.
* hold the global hardware lock only during operations that render
  directly to the front buffer, or to a common back-buffer. The
  global lock would then effectively be a "drawable" lock.
* keep backwards compatibility, as simple architectures may choose
  to retain only the global lock.
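
A minimal sketch of the per-engine lock plus breadcrumb idea, assuming C11 atomics as a user-space stand-in for a real futex and for hardware breadcrumb writes. Every name here (`struct engine`, `engine_try_lock()`, `engine_submit()`, `engine_retire()`, `engine_passed()`) is hypothetical and only illustrates the list above; it is not an existing DRM interface.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* One of these per engine (scaler, MPEG decoder, 2D/3D, ...). */
struct engine {
    atomic_bool locked;             /* stand-in for a futex-like lock */
    uint32_t last_emitted;          /* last breadcrumb handed out (guarded by lock) */
    _Atomic uint32_t last_retired;  /* last breadcrumb the engine completed */
};

static void engine_init(struct engine *e)
{
    atomic_init(&e->locked, false);
    e->last_emitted = 0;
    atomic_init(&e->last_retired, 0);
}

/* Non-blocking acquire; a real futex-like lock would sleep on contention. */
static bool engine_try_lock(struct engine *e)
{
    bool expected = false;
    return atomic_compare_exchange_strong(&e->locked, &expected, true);
}

static void engine_unlock(struct engine *e)
{
    atomic_store(&e->locked, false);
}

/* Submit work to this engine only; returns the breadcrumb to wait on. */
static uint32_t engine_submit(struct engine *e)
{
    assert(atomic_load(&e->locked));    /* caller must hold this engine's lock */
    return ++e->last_emitted;
}

/* What an IRQ handler or breadcrumb write-back would record. */
static void engine_retire(struct engine *e, uint32_t crumb)
{
    atomic_store(&e->last_retired, crumb);
}

/* Wrap-safe test: has this engine completed everything up to crumb? */
static bool engine_passed(struct engine *e, uint32_t crumb)
{
    uint32_t retired = atomic_load(&e->last_retired);
    return (uint32_t)(retired - crumb) < 0x80000000u;
}
```

With one `struct engine` per engine, a client holding the scaler's lock does not stop another client from locking and submitting to the decoder, and waiting on a scaler breadcrumb never involves the other engines. The global hardware lock would stay in place for rendering to the front buffer or a shared back buffer.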
Hmm, maybe for now I'll stick to the simple solution :)
But I think a design that works around the
single-lock-and-command-stream-multiple-engines problem will be needed in the
not too distant future.
/Thomas
>
> Keith