Thread: Re: OT: Render extension was: Re: [Dri-devel] Debugging Mach64

dri-devel

Re: OT: Render extension was: Re: [Dri-devel] Debugging Mach64

From: Keith P. <ke...@ke...> - 2001-02-28 22:32:39

Around 23 o'clock on Feb 28, Malte Cornils wrote:
> 
> About the Xrender support, shouldn't this already work in 2D mach64
> from Xfree86.org CVS? I see this entry in the CHANGELOG:

Yes, I added that post 4.0.2.  Render works on all of the supported ATI
cards.

>   68. Disable RENDER extension support in the ATI(misc) driver when
> mibank
>       and/or shadowfb is used (Marc La France).

This is because those modes don't use the fb frame buffer driver (I 
think), and so don't have any code to do the rendering.  This could be 
fixed, but I'm not touching it (Marc is somewhat defensive of his 
code...)

ke...@ke...	 XFree86 Core Team		SuSE, Inc.

Re: OT: Render extension was: Re: [Dri-devel] Debugging Mach64

From: Keith P. <ke...@ke...> - 2001-03-01 06:18:34

Around 21 o'clock on Feb 28, William Blew wrote:
> I guess I wasn't clear enough. I was thinking about thinking about adding
> some degree of (3D engine equipped mach64 specific) accelerated alpha
> blending... ;-)  There is example code in the mach64 programmer's guide
> after all...

That's pretty easy; the MGA driver has some examples of using the XAA 
Render acceleration infrastructure.  Mark Vojkovich designed some simple
interfaces so we could experiment with stuff; expect future additions that 
accelerate much more.  In particular, text is still a bit pokey because of 
a deeply layered implementation, which takes a long time to get down to the 
final rendering step.

Another place to look at is the hw/kdrive/trident directory which shows how
to accelerate the Trident Cyber 9525 DVD at about the same level but
without using XAA.

You might see whether it would be possible to accelerate per-primary alpha
compositing with that chip; the Mach64 is quite popular in recent laptops.
In particular, the Sony Transmeta-based machine uses a Rage Mobility of some
variety.  I've had complaints from people with that machine about text 
rendering performance.

ke...@ke...	 XFree86 Core Team		SuSE, Inc.

Re: OT: Render extension was: Re: [Dri-devel] Debugging Mach64

From: William B. <wb...@ho...> - 2001-03-02 09:37:34

On Wed, 28 Feb 2001, Keith Packard wrote:

> You might see whether it would be possible to accelerate per-primary alpha
> compositing with that chip, the Mach64 is quite popular in recent laptops,
> in particular the Sony Transmeta based machine uses a Rage Mobility of some
> variety.  I've had complaints from people with that machine about text
> rendering performance.

Just to ensure that I understand what you mean by 'per-primary' alpha
compositing. Given the following composition operation:

	dest = (source IN mask) OP dest

We are talking about the (source IN mask) term, right?

Herein I am assuming that uniform alpha compositing's IN applies the
mask's alpha channel to all source channels (RGBA) while the per-primary
IN applies each mask channel to its source counterpart.

That said, uniform and per-primary alpha can both be accomplished with any
mach64 hardware that includes 3D support with a two stage alpha blending
operation (I think).

This can be done by using the 3D engine's front side scaler's alpha
blending feature. Key mach64 controls are ALPHA_BLND_SAT (Saturation
PictOp), ALPHA_BLND_SRC and ALPHA_BLND_DST, all in the SCALE_3D_CNTL
register.

So we implement two variations of a two stage alpha blend operation:

1) ensure src and dest are buffered pixmaps

2) ensure that mask is a discardable buffered pixmap

3.1) uniform alpha blend: src(SRC) blend mask(DST)
	BLND_SRC: 6 (Ad,Ad,Ad)
	BLND_DST: 0 (0,0,0)

3.2) per-primary alpha blend: src(SRC) blend mask(DST)
	BLND_SRC: 2 (Rd,Gd,Bd)
	BLND_DST: 0 (0,0,0)

4) composite operator: mask(SRC) blend dest(DST)
   (just xref Render's spec against mach64's register reference)
	BLND_SRC and BLND_DST are setup as per PictOp

5) discard the buffered mask pixbuf as it's been hosed
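
The steps above can be sketched in software to check what each stage
produces. This is only a rough model, not driver code: the Pixel type and
the function names are made up for illustration, pixels are assumed to be
premultiplied floats in [0, 1], and the alpha channel is assumed to get the
same kind of factor as the colour channels (which is exactly the open
question about the hardware). Step 4 is shown for OVER only.

```c
/* Hypothetical pixel type: premultiplied, channels in [0, 1]. */
typedef struct { float r, g, b, a; } Pixel;

/* One blend pass: out = src * sf + dst * df, channel by channel. */
static Pixel blend(Pixel src, Pixel sf, Pixel dst, Pixel df)
{
    Pixel out = {
        src.r * sf.r + dst.r * df.r,
        src.g * sf.g + dst.g * df.g,
        src.b * sf.b + dst.b * df.b,
        src.a * sf.a + dst.a * df.a
    };
    return out;
}

/* Steps 3.1 / 3.2: overwrite the mask buffer with (src IN mask).
 * ASSUMPTION: the alpha channel is scaled by the mask alpha in both cases. */
static Pixel stage1(Pixel src, Pixel msk, int per_primary)
{
    Pixel zero = { 0.0f, 0.0f, 0.0f, 0.0f };   /* BLND_DST 0: (0,0,0) */
    Pixel sf;
    if (per_primary) {
        /* BLND_SRC 2: (Rd,Gd,Bd), the mask's own colour channels */
        sf.r = msk.r; sf.g = msk.g; sf.b = msk.b; sf.a = msk.a;
    } else {
        /* BLND_SRC 6: (Ad,Ad,Ad), the mask's alpha for every channel */
        sf.r = msk.a; sf.g = msk.a; sf.b = msk.a; sf.a = msk.a;
    }
    return blend(src, sf, msk, zero);
}

/* Step 4 for OVER: source factor 1, destination factor (1 - As). */
static Pixel stage2_over(Pixel in, Pixel dst)
{
    Pixel one = { 1.0f, 1.0f, 1.0f, 1.0f };
    float k = 1.0f - in.a;
    Pixel df = { k, k, k, k };
    return blend(in, one, dst, df);
}
```

Calling stage2_over(stage1(src, msk, 0), dst) models the uniform path. In
the per-primary path the second stage still weights dest by a single alpha
taken from the intermediate buffer, so it is worth comparing the result
against the Render definition before relying on it.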

Furthermore, I think that perhaps (more reading is needed) the mach64GT
can achieve both tiled and scaled composition.

Only one problem: I am not certain what happens to the destination's alpha
channel (RGBA8888 or RGBA5551). The SRC and DEST blending functions don't
document how the alpha channel is handled [except for Saturate where the
SRC blend factor is (f,f,f,1) where f = min(As,(1-Ad))].
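
For reference, the one documented case can be written out as a small C
helper. The type and function names are hypothetical, and alphas are
assumed to be floats in [0, 1]; it only restates the (f,f,f,1) rule quoted
above and says nothing about the other blend modes.

```c
/* Hypothetical holder for the four per-channel blend factors. */
typedef struct { float r, g, b, a; } BlendFactor;

/* Saturate source factor: (f, f, f, 1) with f = min(As, 1 - Ad). */
static BlendFactor saturate_src_factor(float src_alpha, float dst_alpha)
{
    float room = 1.0f - dst_alpha;                 /* coverage still free in dest */
    float f = src_alpha < room ? src_alpha : room; /* min(As, 1 - Ad) */
    BlendFactor out = { f, f, f, 1.0f };           /* alpha factor is fixed at 1 */
    return out;
}
```

The alpha factor of 1 means the source alpha is added to the destination
unscaled, which is the only alpha behaviour the register reference spells
out.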

Is the alpha term similarly handled by the other source blending cases? or
is it analogous to the other channel's terms in the blending functions?

Any opinion on what's likely? or known to occur? Given my newness to 3D
acceleration hardware, I'm not certain what I can safely assume...

Perhaps some experiments are in order...
-- 
William Blew, wb...@ho...
Gamer by Choice, Geek by Birth

Re: OT: Render extension was: Re: [Dri-devel] Debugging Mach64

From: Keith P. <ke...@ke...> - 2001-03-02 10:04:24

Around 1 o'clock on Mar 2, William Blew wrote:
> On Wed, 28 Feb 2001, Keith Packard wrote:
> 
> > You might see whether it would be possible to accelerate per-primary alpha
> > compositing with that chip, the Mach64 is quite popular in recent laptops,
> > in particular the Sony Transmeta based machine uses a Rage Mobility of some
> > variety.  I've had complaints from people with that machine about text
> > rendering performance.
> 
> Just to ensure that I understand what you mean by 'per-primary' alpha
> compositing. Given the following composition operation:
> 
> 	dest = (source IN mask) OP dest
> 
> We are talking about the (source IN mask) term, right?

Yes, but it commutes into the OP dest part as well; you must carry four
alpha values along with four pixel values.  Here's an example (all four
components are exactly the same (even alpha (the joys of premultiplied
pixels))):

	dst = (src IN msk) OVER dst

Normal compositing:

	dst_r = src_r * msk_a + dst_r * (1 - src_a * msk_a)
	dst_g = src_g * msk_a + dst_g * (1 - src_a * msk_a)
	dst_b = src_b * msk_a + dst_b * (1 - src_a * msk_a)
	dst_a = src_a * msk_a + dst_a * (1 - src_a * msk_a)

Per component compositing:

	dst_r = src_r * msk_r + dst_r * (1 - src_a * msk_r);
	dst_g = src_g * msk_g + dst_g * (1 - src_a * msk_g);
	dst_b = src_b * msk_b + dst_b * (1 - src_a * msk_b);
	dst_a = src_a * msk_a + dst_a * (1 - src_a * msk_a);

The key distinction is that the mask alpha value used in both parts of the 
sum is per-component.  The sympatic Plan9 function extends to this case 
without even blinking, but I don't think hardware can do this.
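
The two sets of equations above translate directly into C. This is a
minimal sketch for comparing them, assuming premultiplied float channels in
[0, 1]; the Pixel type and function names are invented for illustration and
are not part of Render or any driver.

```c
/* Hypothetical pixel type: premultiplied, channels in [0, 1]. */
typedef struct { float r, g, b, a; } Pixel;

/* dst = (src IN msk) OVER dst, mask alpha applied to every channel. */
static Pixel over_uniform(Pixel src, Pixel msk, Pixel dst)
{
    float keep = 1.0f - src.a * msk.a;   /* one dst weight for all channels */
    Pixel out = {
        src.r * msk.a + dst.r * keep,
        src.g * msk.a + dst.g * keep,
        src.b * msk.a + dst.b * keep,
        src.a * msk.a + dst.a * keep
    };
    return out;
}

/* Same operator, but each mask channel acts as that channel's alpha,
 * both on the source term and on the weight given to dst. */
static Pixel over_per_component(Pixel src, Pixel msk, Pixel dst)
{
    Pixel out = {
        src.r * msk.r + dst.r * (1.0f - src.a * msk.r),
        src.g * msk.g + dst.g * (1.0f - src.a * msk.g),
        src.b * msk.b + dst.b * (1.0f - src.a * msk.b),
        src.a * msk.a + dst.a * (1.0f - src.a * msk.a)
    };
    return out;
}
```

With an opaque white src, a mask of (1, 0, 0, 1) and a mid-grey opaque dst,
the uniform form paints the pixel white, while the per-component form only
changes the red channel and leaves green and blue at the dst value.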

> Herein I am assuming that uniform alpha compositing's IN applies the
> mask's alpha channel to all source channels (RGBA) while the per-primary
> IN applies each mask channel to its source counterpart.

But you must also apply each mask channel in computing the contribution of 
the original dst channel.

> Only one problem: I am not certain what happens to the destination's alpha
> channel (RGBA8888 or RGBA5551). The SRC and DEST blending functions don't
> document how the alpha channel is handled [except for Saturate where the
> SRC blend factor is (f,f,f,1) where f = min(As,(1-Ad))].

OpenGL says that the dest alpha value gets the computed alpha; certainly 
this is true for Saturate (otherwise it wouldn't work).  Of course, at 
1555, there's not much space for the dest alpha, but in that case, Render 
allows a separate alpha channel to hold the result (not that I've coded 
the separate alpha channel, but the protocol allows it).
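
To make the space problem concrete, here is a small sketch of packing into
a 16-bit 1555 pixel. The bit layout (alpha in the top bit, then red, green,
blue) and the function names are assumptions for illustration, not taken
from the mach64 documentation.

```c
#include <stdint.h>

/* Pack 8-bit channels into 1555: 1 alpha bit, then 5 bits each of R, G, B.
 * ASSUMED layout: A in bit 15, R in 14..10, G in 9..5, B in 4..0. */
static uint16_t pack_1555(uint8_t a, uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((a >> 7) << 15) | ((r >> 3) << 10) |
                      ((g >> 3) << 5) | (b >> 3));
}

/* Recover alpha as 0 or 255; every intermediate value has been lost. */
static uint8_t alpha_from_1555(uint16_t p)
{
    return (p & 0x8000) ? 255 : 0;
}
```

Any computed alpha below half coverage comes back as fully transparent and
anything above as fully opaque, which is why a separate alpha channel is
attractive at this depth.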

> Any opinion on what's likely? or known to occur? Given my newness to 3D
> acceleration hardware, I'm not certain what I can safely assume...

Because Saturate must store the correct value, and because the OpenGL spec 
says the alpha value out of the composite operator is computed and saved, 
I believe the hardware will be forced to do the right thing.  Suitable 
experiments could quickly determine this.

ke...@ke...	 XFree86 Core Team		SuSE, Inc.

Re: OT: Render extension was: Re: [Dri-devel] Debugging Mach64

From: William B. <wb...@ho...> - 2001-03-02 17:07:17

On Fri, 2 Mar 2001, Keith Packard wrote:

> Yes, but it commutes into the OP dest part as well; you must carry four
> alpha values along with four pixel values.  Here's an example (all four
> components are exactly the same (even alpha (the joys of premultiplied
> pixels))):
>
> 	dst = (src IN msk) OVER dst
>
> Normal compositing:
>
> 	dst_r = src_r * msk_a + dst_r * (1 - src_a * msk_a)
> 	dst_g = src_g * msk_a + dst_g * (1 - src_a * msk_a)
> 	dst_b = src_b * msk_a + dst_b * (1 - src_a * msk_a)
> 	dst_a = src_a * msk_a + dst_a * (1 - src_a * msk_a)

Well, the mach64 will do this one (with only 4 bits of significance).

> Per component compositing:
>
> 	dst_r = src_r * msk_r + dst_r * (1 - src_a * msk_r);
> 	dst_g = src_g * msk_g + dst_g * (1 - src_a * msk_g);
> 	dst_b = src_b * msk_b + dst_b * (1 - src_a * msk_b);
> 	dst_a = src_a * msk_a + dst_a * (1 - src_a * msk_a);
>
> The key distinction is that the mask alpha value used in both parts of the
> sum is per-component.  The sympatic Plan9 function extends to this case
> without even blinking, but I don't think hardware can do this.

This one will require more thought... One question; what is the difference
between a (Rs,Gs,Bs,As) DEST blend function and a (As,As,As,As) DEST blend
function when we are talking about per-primary alpha compositing? (in the
uniform case the answer seems obvious). Perhaps some degree of function
substitution is possible...

> > Only one problem: I am not certain what happens to the destination's alpha
> > channel (RGBA8888 or RGBA5551). The SRC and DEST blending functions don't
> > document how the alpha channel is handled [except for Saturate where the
> > SRC blend factor is (f,f,f,1) where f = min(As,(1-Ad))].
>
> OpenGL says that the dest alpha value gets the computed alpha; certainly
> this is true for Saturate (otherwise it wouldn't work).  Of course, at
> 1555, there's not much space for the dest alpha, but in that case, Render
> allows a separate alpha channel to hold the result (not that I've coded
> the separate alpha channel, but the protocol allows it).

Separate alpha channel? Ouch, the mind boggles at doing that trick in
hardware...

> Because Saturate must store the correct value, and because OpenGL spec
> says the alpha value out of the composite operator is computed and saved,
> I believe the hardware will be forced to do the right thing.  Suitable
> experiments could quickly determine this.

Cool. I think it's time to write a little experimental code... for the
uniform case at least *grin*.

PS: I hope you have a nicer weekend! Here in Vancouver BC it's raining...
--
William Blew, wb...@ho...
Gamer by Choice, Geek by Birth

Re: OT: Render extension was: Re: [Dri-devel] Debugging Mach64

From: Malte C. <ma...@co...> - 2001-02-28 23:18:34

Keith Packard wrote:
> [Render extension]
> Yes, I added that post 4.0.2.  Render works on all of the supported ATI
> cards.

I found it, entry #7 after 4.0.2 went out :-) I'm a happy camper now
(well, I will be when the compilation finishes. I need a new PC..)

> >   68. Disable RENDER extension support in the ATI(misc) driver when
> > mibank
> >       and/or shadowfb is used (Marc La France).
> 
> This is because those modes don't use the fb frame buffer driver (I
> think), and so don't have any code to do the rendering.  

ok, that's good enough for me (while it conflicts with mach64 DRI
right now, I can switch if the need arises)! Thanks for enabling
anti-aliasing and alpha on X...

-Malte #8-)