Andreas Stenglein wrote:
> On 2003.06.23 19:12:01 +0200, Keith Whitwell wrote:
>>Ian Romanick wrote:
>>>As another data point, I have attached my very old patch to enable the
>>>3rd TMU on Radeon. IIRC, it worked w/HW TCL, vtxfmt, & codegen. It is
>>>now quite outdated. There were a couple of reasons I did not commit any
> thanks a lot, it seems I have missed some other bits, for example the
> fallback in radeon_compat.
I knew there was a reason I kept that code around. :)
>>>1. A lot of it (i.e., calculate_max_texture_levels) would be superseded
>>>by the texmem branch (which has now been merged to the trunk).
>>>2. Enabling the 3rd TMU can drastically reduce the maximum available
>>>texture size on some memory configurations. This is even more
>>>significant on the R200 which has 6 TMUs.
> On my 64MB Radeon 7500 the max texture size is 2048 with 2 TMUs and
> 1024 with 3 TMUs. It shouldn't be a problem for 32MB versions as long as
> the max texture size is at least as big as the minimum OpenGL requires (256?).
> If you really don't need the 3rd TMU, you can switch support off via an
> environment variable.
> And we could switch the 3rd TMU off automagically if the resulting
> max texture size gets too small, then recalculate it. But I doubt there
> are 16MB Radeons available.
> For the R200: I think only 64MB and 128MB versions exist. And we can
> make it support a maximum of 2, 4, or 6 TMUs via an env var or the
> upcoming config stuff.
> Most programs/games make use of at least 2 TMUs; newer ones use 4,
> maybe 3, and some even 6 TMUs if they are available.
For mobile systems, there are M6 chips with as little as 8MB. I think
there are mobile R200-derived chips with as little as 16MB or 32MB.
I'm not 100% sure on that, though.
From an application perspective, why should we penalize texture quality
across the board for worst-case situations that may or may not ever
happen? Like I said before, how frequently will an application try to
bind a 2048x2048 texture to all of the available texture units? If the
answer is never, why should the app be forced to use 512x512 textures
for everything just so that we can be "safe"?
This may be a place where we should use a config option. A slider for
selecting the maximum texture size should work.
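As a stopgap before the config infrastructure lands, the same cap could come from an environment variable. This is only a sketch; `RADEON_MAX_TEXTURE_SIZE` is an invented variable name, and the round-down-to-power-of-two policy is an assumption:

```c
#include <stdlib.h>

/* Let the user cap the advertised maximum texture size instead of the
 * driver always assuming the worst case.  The variable name is
 * hypothetical.  The cap is rounded down to a power of two and never
 * exceeds what the hardware supports. */
static unsigned clamp_max_texture_size(unsigned hw_max)
{
    const char *s = getenv("RADEON_MAX_TEXTURE_SIZE");
    if (s) {
        unsigned user_max = (unsigned)strtoul(s, NULL, 10);
        unsigned size = hw_max;
        while (size > 1 && size > user_max)
            size >>= 1;
        return size;
    }
    return hw_max;
}
```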
>>>3. There are some problems with some fast-pathing in the vtxfmt code.
>>>The code assumes that the allowable range for 'target' (see
>>>radeon_vtxfmt_c.c, line 542) is a power of two. If an app calls
>>>glMultiTexCoord2fv with a target of 3 (assuming the mask value is
>>>changed from 1 to 3), the driver will explode.
> I tried to just allocate a dummy (texcoordptr) and let it
> point to tex0 or so.
> A modified multiarb.c which used 4 TMUs even though the driver doesn't
> support them at least didn't crash.
That was the solution I had thought of, too.
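The dummy-pointer idea amounts to something like the following sketch. The names (`texcoordptr`, `dummy`, the unit counts) are illustrative, not the actual radeon_vtxfmt_c.c code: every target an app might pass gets a destination slot, and slots for unsupported units all alias one scratch buffer, so a rogue glMultiTexCoord2fv scribbles on the scratch instead of exploding.

```c
#define MAX_GL_UNITS 8   /* power-of-two range of targets an app may pass */
#define HW_UNITS     3   /* units the driver actually supports (assumed) */

static float dummy[4];                    /* scratch for unsupported units */
static float texcoord[HW_UNITS][4];       /* real per-unit current texcoord */
static float *texcoordptr[MAX_GL_UNITS];  /* indexed by target - GL_TEXTURE0 */

static void init_texcoord_table(void)
{
    for (int i = 0; i < MAX_GL_UNITS; i++)
        texcoordptr[i] = (i < HW_UNITS) ? texcoord[i] : dummy;
}

/* Simplified stand-in for the vtxfmt MultiTexCoord2fv entry point;
 * the power-of-two mask keeps even garbage targets in range. */
static void multi_tex_coord_2fv(unsigned target_index, const float *v)
{
    float *dest = texcoordptr[target_index & (MAX_GL_UNITS - 1)];
    dest[0] = v[0];
    dest[1] = v[1];
}
```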
>>>4. A similar problem to #3 exists with the codegen path. The fast paths
>>>selected in radeon_makeX86MultiTexCoord2fvARB (see radeon_vtxfmt_x86.c,
>>>line 354) and friends may not be expandable to the 3 (or 6 for R200) TMU
> the dummy should help in this case, too.
I don't think it helps here because the texcoord data is treated as a
simple array. The assumption is that texcoordptr[x*n] is the same as
texcoordptr[x]. We can't put any padding in the vertex buffer itself.
The hardware won't let us.
I think the best bet is to use the fast path exactly as it is used now
(if exactly TMUs 0 & 1 are enabled) and use the slower path otherwise.
On R200 we could also use the fast path if TMUs 0 & 1 or 0 & 1 & 2 & 3
are enabled. I'm not sure the payoff would be worth the effort, though.
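That selection test could be as simple as the sketch below (the enum and function names are invented). The point is that the codegen'd path assumes texcoords are packed contiguously with no gaps, which only holds for these exact unit masks:

```c
enum tex_path { TEX_PATH_FAST, TEX_PATH_SLOW };

/* 'enabled' is a bitmask of enabled texture units (bit n = unit n).
 * The fast path only works when the enabled units form one of the
 * contiguous sets the generated code was built for. */
static enum tex_path select_texcoord_path(unsigned enabled, int is_r200)
{
    if (enabled == 0x3)                 /* exactly units 0 & 1 */
        return TEX_PATH_FAST;
    if (is_r200 && enabled == 0xf)      /* R200: exactly units 0-3 */
        return TEX_PATH_FAST;
    return TEX_PATH_SLOW;               /* anything else: safe C path */
}
```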
>>At worst a test can be used in this code. If there's no sane way to avoid it,
>>we have to do it & that's that.
>>>The first issue is a non-issue now. My original intention, before
>>>discovering the second issue, was to "merge" the patch after merging the
>>>texmem branch. It turns out that it took much longer to make the branch
>>>mergeable than initially anticipated.
>>>I think we're going to have to wrestle with the second issue at some
>>>point. When the next round of texmem work is complete, we won't be able
>>>to predict a priori how big the largest texture set can be. Even now, I
>>>find it unlikely that on an R200 there would be 6 2048x2048 cube maps
>>>(the worst case) bound at any time. This renders the current
>>>calculation somewhat bogus to begin with. It seems that the existing
>>>closed-source drivers just advertise the hardware maximum in all cases.
>>>If the hardware maximum is advertised, then an app could bind a set of
>>>textures that can't fit in memory at once. The driver would then have
>>>to fallback to software. I believe the open-source drivers used to
>>>function this way, but doing so caused problems with Quake2. I'm really
>>>not sure what the right solution is.
>>Correct - and in fact they still should function this way if the situation
>>somehow arises that the bound textures can't all be uploaded.
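The advertise-the-maximum-and-fall-back behaviour boils down to a validate-time check along these lines. This is a sketch under assumptions: the names, the fallback flag, and summing plain byte sizes (ignoring alignment and mipmap placement) are all simplifications, not the real texmem logic.

```c
#define FALLBACK_TEXTURE 0x1   /* invented fallback bit for illustration */

/* Advertise the hardware maximum, but at validate time check whether
 * the textures bound to the enabled units can all reside in texture
 * memory at once; if not, flag a software fallback instead of
 * rejecting the bind. */
static unsigned validate_texture_set(const unsigned long *tex_bytes,
                                     unsigned enabled_mask,
                                     unsigned long tex_mem)
{
    unsigned long total = 0;
    for (unsigned i = 0; enabled_mask; i++, enabled_mask >>= 1)
        if (enabled_mask & 1)
            total += tex_bytes[i];
    return (total > tex_mem) ? FALLBACK_TEXTURE : 0;
}
```

The Quake2 trouble mentioned above is presumably why the fallback bit, rather than the check itself, is the hard part.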
> Or adding a fallback to multipass rendering with 2 TMUs before
> falling back to sw rendering. That might get a bit tricky.
Tricky isn't the word. Downright horrible is the word! I thought
about this once WRT implementing some missing parts of
ARB_texture_env_combine for MGA. The interactions with stencil-buffer,
depth-buffer, and other subtle bits made my head hurt.