From: Andreas S. <A.S...@gm...> - 2003-06-21 12:21:36
Attachments:
tex3_patch1.txt
|
I tried to get the 3rd TMU working on radeon, and with this patch it works at least without hw-TCL for multiarb.c from Mesa/demos. (RADEON_TCL_FORCE_DISABLE=3D1) With hw-TCL the 3rd texture is visible, but isnt rotating. The patch also includes some code for the kernelmodule for cube-textures on radeon, and some comments where I dont know what to do. You can switch off the 3rd TMU with export RADEON_NO_3RD_TMU=3D1 It would be nice if someone with knowledge about TCL could have a look at it. Which programs/demos/games could/should be tested as they make use of the 3rd texture unit? best regards, Andreas |
From: <le...@nt...> - 2003-06-21 13:23:34
|
On Sat, Jun 21, 2003 at 02:14:20PM +0200, Andreas Stenglein wrote: > Which programs/demos/games could/should be tested as > they make use of the 3rd texture unit? There was a patch on the list ages ago (before a lot of the new stuff like TCL / vtxfmt was added) http://marc.theaimsgroup.com/?l=dri-devel&m=101391172220971&w=2 At that time, not much used it at all (i.e quake 3, rtcw etc). Brian changed multiarb to add a 3rd texture to test it IIRC. I don't know if any of the newer games since then will use it, like nwn? -- Michael. |
From: Daniel V. <vo...@ep...> - 2003-06-21 20:23:53
|
> On Sat, Jun 21, 2003 at 02:14:20PM +0200, Andreas Stenglein wrote: > > Which programs/demos/games could/should be tested as > > they make use of the 3rd texture unit? > > There was a patch on the list ages ago (before a lot of > the new stuff like TCL / vtxfmt was added) > > http://marc.theaimsgroup.com/?l=dri-devel&m=101391172220971&w=2 > > At that time, not much used it at all (i.e quake 3, rtcw etc). > Brian changed multiarb to add a 3rd texture to test it IIRC. > > I don't know if any of the newer games since then will use it, like nwn? UT2003 makes use of it. -- Daniel, Epic Games Inc. |
From: Andreas S. <A.S...@gm...> - 2003-06-22 11:55:02
|
Am 2003.06.21 22:23:40 +0200 schrieb(en) Daniel Vogel: > > On Sat, Jun 21, 2003 at 02:14:20PM +0200, Andreas Stenglein wrote: > > > Which programs/demos/games could/should be tested as > > > they make use of the 3rd texture unit? > >=20 > > There was a patch on the list ages ago (before a lot of=20 > > the new stuff like TCL / vtxfmt was added) > >=20 > > http://marc.theaimsgroup.com/?l=3Ddri-devel&m=3D101391172220971&w=3D2 > >=20 > > At that time, not much used it at all (i.e quake 3, rtcw etc).=20 > > Brian changed multiarb to add a 3rd texture to test it IIRC. strange that it didn't got merged. > >=20 > > I don't know if any of the newer games since then will use it, like nwn? >=20 > UT2003 makes use of it. >=20 > -- Daniel, Epic Games Inc. I tried to get the UT2003 linux-demo some weeks ago, but the download faile= d.=20 I also tried some other mirrors: no success. Before starting another attempt: What version do I need:=20 ut2003demo-lnx-2206.sh.bin ? and the UT2003 2225 patch ? Or do I need to get the full game for the latest features and fixes? best regards, Andreas |
From: <le...@nt...> - 2003-06-22 13:12:21
|
On Sun, Jun 22, 2003 at 01:47:44PM +0200, Andreas Stenglein wrote: > Am 2003.06.21 22:23:40 +0200 schrieb(en) Daniel Vogel: > > > On Sat, Jun 21, 2003 at 02:14:20PM +0200, Andreas Stenglein wrote: > > > > Which programs/demos/games could/should be tested as > > > > they make use of the 3rd texture unit? > > > > > > There was a patch on the list ages ago (before a lot of > > > the new stuff like TCL / vtxfmt was added) > > > > > > http://marc.theaimsgroup.com/?l=dri-devel&m=101391172220971&w=2 > > > > > > At that time, not much used it at all (i.e quake 3, rtcw etc). > > > Brian changed multiarb to add a 3rd texture to test it IIRC. > > strange that it didn't got merged. Not really, a TCL branch was started with a load of new paths, which meant the patch was out of date, and as there was nothing that used it (and a bunch of other bugs that did stop things working) I didn't see the point at that time of worrying about it - it was 90% cut and paste of what the driver does with textures 0 and 1 anyway. Also because, with nothing using it, it lowers MaxArrayLockSize for no good reason - dunno what impact that has on anything. -- Michael. |
From: Daniel V. <vo...@ep...> - 2003-06-22 20:52:10
|
> Before starting another attempt: What version do I need: > ut2003demo-lnx-2206.sh.bin ? Correct. > and the UT2003 2225 patch ? > Or do I need to get the full game for the latest features and fixes? You need the full version of the game in order to patch it to version 2225. -- Daniel, Epic Games Inc. |
From: Andreas S. <A.S...@gm...> - 2003-06-22 15:04:40
|
Now I have it working with hw-tcl, too. :) (Just missed a few bits here and there) But its necessary to disable codegen or vtxfmt. (export RADEON_NO_CODEGEN=3D1) greetings, Andreas |
From: Michel <mi...@da...> - 2003-06-23 10:22:31
|
On Sat, 2003-06-21 at 14:14, Andreas Stenglein wrote: > > [...] some comments where I dont know what to do. I'll try to address some of them: > + if( (sPriv->drmMinor < 3) || (getenv( "RADEON_NO_3RD_TMU")) ) /* > question: is the check for drmMinor necessary? */ > + ctx->Const.MaxTextureUnits = 2; > + else > + ctx->Const.MaxTextureUnits = RADEON_MAX_TEXTURE_UNITS; /* 3 */ Does the 3rd TMU require any DRM support not present before that minor version? > + /* question: shouldn't the following be controlled by the > kernelmodule */ > + /* and/or Xserver-configuration if it can crash the card? */ > else if (getenv("RADEON_TCL_FORCE_ENABLE")) { > fprintf(stderr, "Enabling TCL support... this will probably > crash\n"); > fprintf(stderr, " your card if it isn't capable of > TCL!\n"); Why? :) I do wonder if trying to enable HW TCL makes sense though, there was a bit of confusion about TCL support on the various chips at the time. Maybe we can clean this up with the new config infrastructure. > + init_rgba_st_st_st(); > + init_rgba_spec_st_st_st(); > + init_st_st_st_n(); > + init_rgpa_spec_st_st_st_n(); /* question: don't know what "rgpa" > is, just copied and modified.. */ > + init_rgba_stq_stq_stq(); > + init_w_rgpa_spec_stq_stq_stq_n(); Looks like a typo? -- Earthling Michel Dänzer \ Debian (powerpc), XFree86 and DRI developer Software libre enthusiast \ http://svcs.affero.net/rm.php?r=daenzer |
From: Felix <fx...@gm...> - 2003-06-23 11:44:58
|
On 23 Jun 2003 12:17:26 +0200 Michel Dänzer <mi...@da...> wrote: > On Sat, 2003-06-21 at 14:14, Andreas Stenglein wrote: [snip] > > > + /* question: shouldn't the following be controlled by the > > kernelmodule */ > > + /* and/or Xserver-configuration if it can crash the card? */ > > else if (getenv("RADEON_TCL_FORCE_ENABLE")) { > > fprintf(stderr, "Enabling TCL support... this will probably > > crash\n"); > > fprintf(stderr, " your card if it isn't capable of > > TCL!\n"); > > Why? :) > > I do wonder if trying to enable HW TCL makes sense though, there was a > bit of confusion about TCL support on the various chips at the time. > Maybe we can clean this up with the new config infrastructure. I guess the point is that it's dangerous if a DRM client can do something that will lock up the box. On the config-0-0-1-branch there are currently still two TCL-related options for the radeon driver: use_tcl and force_tcl. If you say force_tcl is deprecated we could just drop it. > [snip] Felix ------------ __\|/__ ___ ___ ------------------------- Felix ___\_e -_/___/ __\___/ __\_____ You can do anything, Kühling (_____\Ä/____/ /_____/ /________) just not everything fx...@gm... \___/ \___/ U at the same time. |
From: Andreas S. <A.S...@gm...> - 2003-06-23 16:15:50
|
thanks a lot! btw, I got the 3rd TMU working with hw-tcl and codegen, too. (at least multiarb.c works) I'm going to clean the patch up a bit, so that it only contains things related to 3rd TMU support. Am 2003.06.23 12:17:26 +0200 schrieb(en) Michel D=E4nzer: > On Sat, 2003-06-21 at 14:14, Andreas Stenglein wrote: > >=20 > > [...] some comments where I dont know what to do. >=20 > I'll try to address some of them: >=20 > > + if( (sPriv->drmMinor < 3) || (getenv( "RADEON_NO_3RD_TMU")) ) /* > > question: is the check for drmMinor necessary? */ > > + ctx->Const.MaxTextureUnits =3D 2; > > + else > > + ctx->Const.MaxTextureUnits =3D RADEON_MAX_TEXTURE_UNITS; /* 3 */ >=20 > Does the 3rd TMU require any DRM support not present before that minor > version? I dont know ;) I think I have to try if 3rd TMU works when=20 RADEON_COMPAT is set. On the other hand I think DRI from stock XFree86 4.3.0 wont even work with old radeon.o kernelmodules... Is there a list which versions of the drm-modules were distributed? (linux 2.4, 2.5, bsd x.x, XFree86 4.x.x, GATOS etc.) >=20 >=20 > > + /* question: shouldn't the following be controlled by the > > kernelmodule */ > > + /* and/or Xserver-configuration if it can crash the card? */ > > else if (getenv("RADEON_TCL_FORCE_ENABLE")) { > > fprintf(stderr, "Enabling TCL support... this will probably > > crash\n"); > > fprintf(stderr, " your card if it isn't capable of > > TCL!\n"); >=20 > Why? :) >=20 > I do wonder if trying to enable HW TCL makes sense though, there was a > bit of confusion about TCL support on the various chips at the time. > Maybe we can clean this up with the new config infrastructure. I think that wont help: some_bad_user could put that code back and install his own radeon_dri.so to lockup the graphicscard of a workstation he only has user-rights to. So access to hw-accelerated 3d has to be restricted to trusted users, at least if you have a card without tcl. >=20 >=20 > > + init_rgba_st_st_st(); > > + init_rgba_spec_st_st_st(); > > + init_st_st_st_n(); > > + init_rgpa_spec_st_st_st_n(); /* question: don't know what "rgpa" > > is, just copied and modified.. */ > > + init_rgba_stq_stq_stq(); > > + init_w_rgpa_spec_stq_stq_stq_n(); >=20 > Looks like a typo? Maybe, but its in the r200 sources as well. >=20 >=20 > --=20 > Earthling Michel D=E4nzer \ Debian (powerpc), XFree86 and DRI developer > Software libre enthusiast \ http://svcs.affero.net/rm.php?r=3Ddaenzer >=20 >=20 greetings, Andreas |
From: Keith W. <ke...@tu...> - 2003-06-23 16:32:38
|
Andreas Stenglein wrote: > thanks a lot! > > btw, I got the 3rd TMU working with hw-tcl and codegen, too. > (at least multiarb.c works) > I'm going to clean the patch up a bit, so that it only contains > things related to 3rd TMU support. > > Am 2003.06.23 12:17:26 +0200 schrieb(en) Michel Dänzer: > >>On Sat, 2003-06-21 at 14:14, Andreas Stenglein wrote: >> >>>[...] some comments where I dont know what to do. >> >>I'll try to address some of them: >> >> >>>+ if( (sPriv->drmMinor < 3) || (getenv( "RADEON_NO_3RD_TMU")) ) /* >>>question: is the check for drmMinor necessary? */ >>>+ ctx->Const.MaxTextureUnits = 2; >>>+ else >>>+ ctx->Const.MaxTextureUnits = RADEON_MAX_TEXTURE_UNITS; /* 3 */ >> >>Does the 3rd TMU require any DRM support not present before that minor >>version? > > > I dont know ;) I think I have to try if 3rd TMU works when > RADEON_COMPAT is set. > On the other hand I think DRI from stock XFree86 4.3.0 wont even work > with old radeon.o kernelmodules... > > Is there a list which versions of the drm-modules were distributed? > (linux 2.4, 2.5, bsd x.x, XFree86 4.x.x, GATOS etc.) > > >> >>>+ /* question: shouldn't the following be controlled by the >>>kernelmodule */ >>>+ /* and/or Xserver-configuration if it can crash the card? */ >>> else if (getenv("RADEON_TCL_FORCE_ENABLE")) { >>> fprintf(stderr, "Enabling TCL support... this will probably >>>crash\n"); >>> fprintf(stderr, " your card if it isn't capable of >>>TCL!\n"); >> >>Why? :) >> >>I do wonder if trying to enable HW TCL makes sense though, there was a >>bit of confusion about TCL support on the various chips at the time. >>Maybe we can clean this up with the new config infrastructure. > > > I think that wont help: some_bad_user could put that code back and > install his own radeon_dri.so to lockup the graphicscard of a > workstation he only has user-rights to. > So access to hw-accelerated 3d has to be restricted to trusted users, > at least if you have a card without tcl. > There are so many ways to get any of these graphics cards to lock up the system - they're all prone to lockup if you do just one single thing wrong - you can send a bunch of very nice legal commands, but in some unspecified wrong order and have the card lockup. The rules for avoiding lockups are so complex and poorly defined that it's practically impossible to guarentee that it won't happen. Keith |
From: Michel <mi...@da...> - 2003-06-24 14:05:27
|
On Mon, 2003-06-23 at 18:08, Andreas Stenglein wrote: > > Am 2003.06.23 12:17:26 +0200 schrieb(en) Michel Dänzer: > > On Sat, 2003-06-21 at 14:14, Andreas Stenglein wrote: > > > > > > + init_rgba_st_st_st(); > > > + init_rgba_spec_st_st_st(); > > > + init_st_st_st_n(); > > > + init_rgpa_spec_st_st_st_n(); /* question: don't know what "rgpa" > > > is, just copied and modified.. */ > > > + init_rgba_stq_stq_stq(); > > > + init_w_rgpa_spec_stq_stq_stq_n(); > > > > Looks like a typo? > Maybe, but its in the r200 sources as well. The wonders of copy & paste I guess... the rgpa looks pretty lonely to me, surrounded by all those rgbas. ;) -- Earthling Michel Dänzer \ Debian (powerpc), XFree86 and DRI developer Software libre enthusiast \ http://svcs.affero.net/rm.php?r=daenzer |
From: Ian R. <id...@us...> - 2003-06-23 17:03:16
Attachments:
dri-radeon-3rd-tmu.patch
|
Andreas Stenglein wrote: > I tried to get the 3rd TMU working on radeon, > and with this patch it works at least > without hw-TCL for multiarb.c from Mesa/demos. > (RADEON_TCL_FORCE_DISABLE=1) > > With hw-TCL the 3rd texture is visible, but > isnt rotating. > > The patch also includes some code for the > kernelmodule for cube-textures on radeon, > and some comments where I dont know what to do. > You can switch off the 3rd TMU with export RADEON_NO_3RD_TMU=1 > > It would be nice if someone with knowledge about > TCL could have a look at it. > > > Which programs/demos/games could/should be tested as > they make use of the 3rd texture unit? As another data point, I have attached my very old patch to enable the 3rd TMU on Radeon. IIRC, it worked w/HW TCL, vtxfmt, & codegen. It is now quite outdated. There were a couple of reasons I did not commit any of this. 1. A lot of it (i.e., calculate_max_texture_levels) would be superseded by the texmem branch (which has now been merged to the trunk). 2. Enabling the 3rd TMU can drastically reduce the maximum available texture size on some memory configurations. This is even more significant on the R200 which has 6 TMUs. 3. There are some problems with some fast-pathing in the vtxfmt code. The code assumes that the allowable range for 'target' (see radeon_vtxfmt_c.c, line 542) is a power of two. If an app calls glMultiTexCoord2fv with a target of 3 (assuming the mask value is changed from 1 to 3), the driver will explode. 4. A similar problem to #3 exists with the codegen path. The fast paths selected in radeon_makeX86MultiTexCoord2fvARB (see radeon_vtxfmt_x86.c, line 354) and friends may not be expandable to the 3 (or 6 for R200) TMU cases. The first issue is a non-issue now. My original intention, before discovering the second issue, was to "merge" the patch after merging the texmem branch. It turns out that it took much longer to make the branch mergable than initiallly anticipated. I think we're going to have to wrestle with the second issue at some point. When the next round of texmem work is complete, we won't be able to predict apriori how big the largest texture set can be. Even now, I find it unlikely that on an R200 there would be 6 2048x2048 cube maps (the worst case) bound at any time. This renders the current calculation somewhat bogus to begin with. It seems that the existing closed-source drivers just advertise the hardware maximum in all cases. If the hardware maximum is advertised, then an app could bind a set of textures that can't fit in memory at once. The driver would then have to fallback to software. I believe the open-source drivers used to function this way, but doing so caused problems with Quake2. I'm really not sure what the right sollution is. |
From: Keith W. <ke...@tu...> - 2003-06-23 17:14:18
|
Ian Romanick wrote: > Andreas Stenglein wrote: > >> I tried to get the 3rd TMU working on radeon, >> and with this patch it works at least >> without hw-TCL for multiarb.c from Mesa/demos. >> (RADEON_TCL_FORCE_DISABLE=1) >> >> With hw-TCL the 3rd texture is visible, but >> isnt rotating. >> >> The patch also includes some code for the >> kernelmodule for cube-textures on radeon, >> and some comments where I dont know what to do. >> You can switch off the 3rd TMU with export RADEON_NO_3RD_TMU=1 >> >> It would be nice if someone with knowledge about >> TCL could have a look at it. >> >> >> Which programs/demos/games could/should be tested as >> they make use of the 3rd texture unit? > > > As another data point, I have attached my very old patch to enable the > 3rd TMU on Radeon. IIRC, it worked w/HW TCL, vtxfmt, & codegen. It is > now quite outdated. There were a couple of reasons I did not commit any > of this. > > 1. A lot of it (i.e., calculate_max_texture_levels) would be superseded > by the texmem branch (which has now been merged to the trunk). > > 2. Enabling the 3rd TMU can drastically reduce the maximum available > texture size on some memory configurations. This is even more > significant on the R200 which has 6 TMUs. > > 3. There are some problems with some fast-pathing in the vtxfmt code. > The code assumes that the allowable range for 'target' (see > radeon_vtxfmt_c.c, line 542) is a power of two. If an app calls > glMultiTexCoord2fv with a target of 3 (assuming the mask value is > changed from 1 to 3), the driver will explode. > > 4. A similar problem to #3 exists with the codegen path. The fast paths > selected in radeon_makeX86MultiTexCoord2fvARB (see radeon_vtxfmt_x86.c, > line 354) and friends may not be expandable to the 3 (or 6 for R200) TMU > cases. At worst a test can be used in this code. If there's no sane way to avoid it, we have to do it & that's that. > The first issue is a non-issue now. My original intention, before > discovering the second issue, was to "merge" the patch after merging the > texmem branch. It turns out that it took much longer to make the branch > mergable than initiallly anticipated. > > I think we're going to have to wrestle with the second issue at some > point. When the next round of texmem work is complete, we won't be able > to predict apriori how big the largest texture set can be. Even now, I > find it unlikely that on an R200 there would be 6 2048x2048 cube maps > (the worst case) bound at any time. This renders the current > calculation somewhat bogus to begin with. It seems that the existing > closed-source drivers just advertise the hardware maximum in all cases. > If the hardware maximum is advertised, then an app could bind a set of > textures that can't fit in memory at once. The driver would then have > to fallback to software. I believe the open-source drivers used to > function this way, but doing so caused problems with Quake2. I'm really > not sure what the right sollution is. Correct - and in fact they still should function this way if the situation somehow arises that the bound textures can't all be uploaded. Keith |
From: Andreas S. <A.S...@gm...> - 2003-06-23 20:37:19
|
Am 2003.06.23 19:12:01 +0200 schrieb(en) Keith Whitwell: > Ian Romanick wrote: > > Andreas Stenglein wrote: > >=20 > >> I tried to get the 3rd TMU working on radeon, > >> and with this patch it works at least > >> without hw-TCL for multiarb.c from Mesa/demos. > >> (RADEON_TCL_FORCE_DISABLE=3D1) > >> > >> With hw-TCL the 3rd texture is visible, but > >> isnt rotating. > >> > >> The patch also includes some code for the > >> kernelmodule for cube-textures on radeon, > >> and some comments where I dont know what to do. > >> You can switch off the 3rd TMU with export RADEON_NO_3RD_TMU=3D1 > >> > >> It would be nice if someone with knowledge about > >> TCL could have a look at it. > >> > >> > >> Which programs/demos/games could/should be tested as > >> they make use of the 3rd texture unit? > >=20 > >=20 > > As another data point, I have attached my very old patch to enable the= =20 > > 3rd TMU on Radeon. IIRC, it worked w/HW TCL, vtxfmt, & codegen. It is= =20 > > now quite outdated. There were a couple of reasons I did not commit an= y=20 > > of this. thanks a lot, it seems I have missed some other bits, for example the fallback in radeon_compat. > >=20 > > 1. A lot of it (i.e., calculate_max_texture_levels) would be superseded= =20 > > by the texmem branch (which has now been merged to the trunk). > >=20 > > 2. Enabling the 3rd TMU can drastically reduce the maximum available=20 > > texture size on some memory configurations. This is even more=20 > > significant on the R200 which has 6 TMUs. On my 64MB Radeon7500 the max texture size is 2048 with 2TMU and 1024 with 3TMUs. It shouldnt be a problem for 32MB versions as long es the max texture size is as big as opengl requires as minimum (256?) If you really dont need the 3rd TMU, you can switch support off via environment variable. And we could switch the 3rd TMU off automagically if the resulting=20 max texture size gets to small, then recalculate it. But I doubt there are 16MB Radeons available. For the R200: I think only 64MB and 128MB versions exist. And we can make it to support a max. of 2, 4 or 6 TMUs by envvar or via the upcoming config-stuff. Most programs/games make use of at least 2 TMUs, newer ones use 4, maybe 3, and some even 6 TMUs if they are available. > >=20 > > 3. There are some problems with some fast-pathing in the vtxfmt code.= =20 > > The code assumes that the allowable range for 'target' (see=20 > > radeon_vtxfmt_c.c, line 542) is a power of two. If an app calls=20 > > glMultiTexCoord2fv with a target of 3 (assuming the mask value is=20 > > changed from 1 to 3), the driver will explode. I tried to just allocate a dummy (texcoordptr[3]) and let it=20 point to tex0 or so. A modified multiarb.c which used 4 TMUs even if the driver doesnt support it didnt at least crash. > >=20 > > 4. A similar problem to #3 exists with the codegen path. The fast path= s=20 > > selected in radeon_makeX86MultiTexCoord2fvARB (see radeon_vtxfmt_x86.c,= =20 > > line 354) and friends may not be expandable to the 3 (or 6 for R200) TM= U=20 > > cases. >=20 the dummy should help in this case, too. > At worst a test can be used in this code. If there's no sane way to avoi= d it,=20 > we have to do it & that's that. >=20 >=20 > > The first issue is a non-issue now. My original intention, before=20 > > discovering the second issue, was to "merge" the patch after merging th= e=20 > > texmem branch. It turns out that it took much longer to make the branc= h=20 > > mergable than initiallly anticipated. > >=20 > > I think we're going to have to wrestle with the second issue at some=20 > > point. When the next round of texmem work is complete, we won't be abl= e=20 > > to predict apriori how big the largest texture set can be. Even now, I= =20 > > find it unlikely that on an R200 there would be 6 2048x2048 cube maps= =20 > > (the worst case) bound at any time. This renders the current=20 > > calculation somewhat bogus to begin with. It seems that the existing= =20 > > closed-source drivers just advertise the hardware maximum in all cases. >=20 >=20 >=20 > > If the hardware maximum is advertised, then an app could bind a set of= =20 > > textures that can't fit in memory at once. The driver would then have= =20 > > to fallback to software. I believe the open-source drivers used to=20 > > function this way, but doing so caused problems with Quake2. I'm reall= y=20 > > not sure what the right sollution is. >=20 > Correct - and in fact they still should function this way if the situatio= n=20 > somehow arises that the bound textures can't all be uploaded. Or adding a fallback to multipass-rendering with 2 TMUs before fallback to sw-rendering. That might get a bit tricky. >=20 > Keith >=20 |
From: Ian R. <id...@us...> - 2003-06-23 21:31:15
|
Andreas Stenglein wrote: > Am 2003.06.23 19:12:01 +0200 schrieb(en) Keith Whitwell: > >>Ian Romanick wrote: >> >>>As another data point, I have attached my very old patch to enable the >>>3rd TMU on Radeon. IIRC, it worked w/HW TCL, vtxfmt, & codegen. It is >>>now quite outdated. There were a couple of reasons I did not commit any >>>of this. > > thanks a lot, it seems I have missed some other bits, for example the > fallback in radeon_compat. I knew there was a reason I kept that code around. :) >>>1. A lot of it (i.e., calculate_max_texture_levels) would be superseded >>>by the texmem branch (which has now been merged to the trunk). >>> >>>2. Enabling the 3rd TMU can drastically reduce the maximum available >>>texture size on some memory configurations. This is even more >>>significant on the R200 which has 6 TMUs. > > On my 64MB Radeon7500 the max texture size is 2048 with 2TMU and > 1024 with 3TMUs. It shouldnt be a problem for 32MB versions as long es > the max texture size is as big as opengl requires as minimum (256?) > If you really dont need the 3rd TMU, you can switch support off via > environment variable. > And we could switch the 3rd TMU off automagically if the resulting > max texture size gets to small, then recalculate it. But I doubt there > are 16MB Radeons available. > > For the R200: I think only 64MB and 128MB versions exist. And we can > make it to support a max. of 2, 4 or 6 TMUs by envvar or via the > upcoming config-stuff. > Most programs/games make use of at least 2 TMUs, newer ones use 4, > maybe 3, and some even 6 TMUs if they are available. For mobile systems, there are M6 chips with as little as 8MB. I think there are mobile R200 derrived chips with as little as 16MB or 32MB. I'm not 100% sure on that, though. From an application perspective, why should we penalize texture quality across the board for worst-case situations that may or may not ever happen? Like I said before, how frequently will an application try to bind a 2048x2048 texture to all of the available texture units? If the answer is never, why should the app be forced to use 512x512 textures for everything just so that we can be "safe"? This may be a place where we should use a config option. A slider for selecting the maximum texture size should work. >>>3. There are some problems with some fast-pathing in the vtxfmt code. >>>The code assumes that the allowable range for 'target' (see >>>radeon_vtxfmt_c.c, line 542) is a power of two. If an app calls >>>glMultiTexCoord2fv with a target of 3 (assuming the mask value is >>>changed from 1 to 3), the driver will explode. > > I tried to just allocate a dummy (texcoordptr[3]) and let it > point to tex0 or so. > A modified multiarb.c which used 4 TMUs even if the driver doesnt > support it didnt at least crash. That was the sollution I had thought of, too. >>>4. A similar problem to #3 exists with the codegen path. The fast paths >>>selected in radeon_makeX86MultiTexCoord2fvARB (see radeon_vtxfmt_x86.c, >>>line 354) and friends may not be expandable to the 3 (or 6 for R200) TMU >>>cases. > > the dummy should help in this case, too. I don't think it helps here because the texcoord data is treated as a simple array. The assumption is that texcoordptr[0][x*n] is the same as texcoordptr[x]. We can't put any padding in the vertex buffer itself. The hardware won't let us. I think the best bet is to use the fast path exactly as it is used now (if exactly TMU 0 & 1 are enabled) and use the slower path otherwise. One R200 we could also use the fast path if TMU 0 & 1 or 0 & 1 & 2 & 3 are enabled. I'm not sure if they payoff would be worth the effort, though. >>At worst a test can be used in this code. If there's no sane way to avoid it, >>we have to do it & that's that. >> >>>The first issue is a non-issue now. My original intention, before >>>discovering the second issue, was to "merge" the patch after merging the >>>texmem branch. It turns out that it took much longer to make the branch >>>mergable than initiallly anticipated. >>> >>>I think we're going to have to wrestle with the second issue at some >>>point. When the next round of texmem work is complete, we won't be able >>>to predict apriori how big the largest texture set can be. Even now, I >>>find it unlikely that on an R200 there would be 6 2048x2048 cube maps >>>(the worst case) bound at any time. This renders the current >>>calculation somewhat bogus to begin with. It seems that the existing >>>closed-source drivers just advertise the hardware maximum in all cases. >> >>>If the hardware maximum is advertised, then an app could bind a set of >>>textures that can't fit in memory at once. The driver would then have >>>to fallback to software. I believe the open-source drivers used to >>>function this way, but doing so caused problems with Quake2. I'm really >>>not sure what the right sollution is. >> >>Correct - and in fact they still should function this way if the situation >>somehow arises that the bound textures can't all be uploaded. > > Or adding a fallback to multipass-rendering with 2 TMUs before > fallback to sw-rendering. That might get a bit tricky. Tricky isn't the word. Down right horrible is the word! I thought about this once WRT implementing some missing parts of ARB_texture_env_combine for MGA. The interactions with stencil-buffer, depth-buffer, and other subtle bits made my head hurt. |
From: Andreas S. <A.S...@gm...> - 2003-06-26 17:59:32
Attachments:
radeon_tex3_patch2.txt
|
here is a patch that works at least for multiarb.c It is against HEAD from 19 June 2003 (I cleaned it up a bit but its not ready for merge:=20 still some questions...) 1) could someone with an 8MB or 16MB Radeon check if the resulting max_texturesize is big enough? (just use mesa's glxinfo: glxinfo -l) 2) could someone try it out with a game/demo that makes use of the 3rd TMU? 3) could someone with knowledge about the vfmt and codegen stuff have a look on it? especially whether we need those dummys and what should be done in the fast-path and with vertex3f. best regards, Andreas |
From: Ian R. <id...@us...> - 2003-06-27 01:19:11
|
Andreas Stenglein wrote: > here is a patch that works at least for multiarb.c > It is against HEAD from 19 June 2003 > (I cleaned it up a bit but its not ready for merge: > still some questions...) I've taken a quick peek at this patch, and I have a couple comments. I hope to be able to look at it in more detail in the next week or so, but I can't make any promises. > 1) could someone with an 8MB or 16MB Radeon check if the > resulting max_texturesize is big enough? > (just use mesa's glxinfo: glxinfo -l) I should be able to test this on an M6 w/8MB next. Keep in mind that the amount of available AGP memory also plays a role. If the card only has 1MB of available memory for textures, but there is 256MB of AGP memory, it probably won't be a limitation. > 2) could someone try it out with a game/demo that > makes use of the 3rd TMU? Other than multiarb, I think UT2k3 is the only option. Did anyone ever get that working on R100? I know that someone (Keith?) finally got it working on R200. > 3) could someone with knowledge about the vfmt and codegen > stuff have a look on it? especially whether we need those > dummys and what should be done in the fast-path and > with vertex3f. That's where I'll try to focus my attention when I look at it deep. At first glance (see my notes below), it look pretty good in this respect. > diff -ru trunk_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_context.c tex3_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_context.c > --- trunk_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_context.c Wed Jun 11 00:06:16 2003 > +++ tex3_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_context.c Thu Jun 26 17:15:36 2003 > @@ -314,6 +317,8 @@ > 12, > GL_FALSE ); > > + /* FIXME: we should verify that we dont get limits below the minimum requirements of OpenGL */ > + > ctx->Const.MaxTextureMaxAnisotropy = 16.0; > > /* No wide points. Yes and no. If we persist with the current memory sizing scheme, we'll need to disable texture units if there is not enough memory to provide the required minimum texture size. My opinion, however, is that we should bag that scheme altogether. I think we'll leave that particular issue for a (slightly) later date... > @@ -374,13 +379,15 @@ > > _math_matrix_ctr( &rmesa->TexGenMatrix[0] ); > _math_matrix_ctr( &rmesa->TexGenMatrix[1] ); > + _math_matrix_ctr( &rmesa->TexGenMatrix[2] ); > _math_matrix_ctr( &rmesa->tmpmat ); > _math_matrix_set_identity( &rmesa->TexGenMatrix[0] ); > _math_matrix_set_identity( &rmesa->TexGenMatrix[1] ); > + _math_matrix_set_identity( &rmesa->TexGenMatrix[2] ); > _math_matrix_set_identity( &rmesa->tmpmat ); > > driInitExtensions( ctx, card_extensions, GL_TRUE ); > - if( rmesa->dri.drmMinor >= 9 || getenv( "RADEON_RECTANGLE_FORCE_ENABLE")) /* FIXME! a.s. */ > + if( rmesa->dri.drmMinor >= 9) > _mesa_enable_extension( ctx, "GL_NV_texture_rectangle"); > radeonInitDriverFuncs( ctx ); > radeonInitIoctlFuncs( ctx ); The various bits of texture-rectangle cleanup like this should be commited separately from the 3rd TMU stuff. It will make it easier to grok the log messages & roll stuff back if needed. > diff -ru trunk_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_context.h tex3_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_context.h > --- trunk_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_context.h Wed Jun 11 00:06:16 2003 > +++ tex3_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_context.h Thu Jun 26 17:22:54 2003 > @@ -635,30 +636,37 @@ > GLuint prim; > }; > > +/* FIXME: do we really need add. 2 to prevent segfault if someone */ > +/* specifies GL_TEXTURE3 (esp. for the codegen-path) ? */ > +#define RADEON_MAX_VERTEX_SIZE 19 /* 17 + 2 */ > + If you're going to use a define for this, which is a good idea, you should move the commentary about the chosen value out of radeon_vbinfo and put it next to the define. I've done some thinking about this particular fast-path. I believe that we should try to keep it as much as possible. I don't think there's any compelling reason not to. I would point out two things here. 1. The extra 2 elements (or MAX_TEXTURE_COORD_ELEMENTS elements, which is 2 right now, but will eventually grow to 3) is to give the appearance of having a power-of-two number of texture units. This allows some optimizations. 2. This is safe because the texture coordinates are always the last elements in a vertex. Therefore the actual vertex data is packed into the start of the buffer. > struct radeon_vbinfo { > GLint counter, initial_counter; > GLint *dmaptr; > void (*notify)( void ); > GLint vertex_size; > > - /* A maximum total of 15 elements per vertex: 3 floats for position, 3 > + /* A maximum total of 17 elements per vertex: 3 floats for position, 3 > * floats for normal, 4 floats for color, 4 bytes for secondary color, > - * 2 floats for each texture unit (4 floats total). > + * 2 floats for each texture unit (6 floats total). > * > - * As soon as the 3rd TMU is supported or cube maps (or 3D textures) are > + * As soon as cube maps (or 3D textures) are > * supported, this value will grow. > * > * The position data is never actually stored here, so 3 elements could be > * trimmed out of the buffer. > */ > - union { float f; int i; radeon_color_t color; } vertex[15]; > + union { float f; int i; radeon_color_t color; } vertex[RADEON_MAX_VERTEX_SIZE]; > > GLfloat *normalptr; > GLfloat *floatcolorptr; > radeon_color_t *colorptr; > GLfloat *floatspecptr; > radeon_color_t *specptr; > - GLfloat *texcoordptr[2]; > + GLfloat *texcoordptr[4]; /* this should be RADEON_MAX_TEXTURE_UNITS but */ > + /* the extra one is needed in radeon_vtxfmt_c.c and */ > + /* in the codegen-path if someone specifies GL_TEXTURE3 */ > + /* maybe we should just use mesas MAX_TEXTURE_UNITS here */ 4 should be used. > GLenum *prim; /* &ctx->Driver.CurrentExecPrimitive */ > GLuint primflags; > diff -ru trunk_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_tcl.c tex3_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_tcl.c > --- trunk_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_tcl.c Fri May 2 13:01:54 2003 > +++ tex3_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_tcl.c Thu Jun 19 14:02:42 2003 > @@ -385,6 +385,16 @@ > } > } > > + if (ctx->Texture.Unit[2]._ReallyEnabled) { > + if (ctx->Texture.Unit[2].TexGenEnabled) { > + if (rmesa->TexGenNeedNormals[2]) { > + inputs |= VERT_BIT_NORMAL; > + } > + } else { > + inputs |= VERT_BIT_TEX2; > + } > + } > + I don't remember, but could this be made into a loop? > stage->inputs = inputs; > stage->active = 1; > } > diff -ru trunk_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_vtxfmt.c tex3_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_vtxfmt.c > --- trunk_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_vtxfmt.c Fri May 2 13:01:56 2003 > +++ tex3_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_vtxfmt.c Thu Jun 26 17:55:47 2003 > @@ -626,6 +657,11 @@ > rmesa->vb.floatspecptr = ctx->Current.Attrib[VERT_ATTRIB_COLOR1]; > rmesa->vb.texcoordptr[0] = ctx->Current.Attrib[VERT_ATTRIB_TEX0]; > rmesa->vb.texcoordptr[1] = ctx->Current.Attrib[VERT_ATTRIB_TEX1]; > + rmesa->vb.texcoordptr[2] = ctx->Current.Attrib[VERT_ATTRIB_TEX2]; > + rmesa->vb.texcoordptr[3] = ctx->Current.Attrib[VERT_ATTRIB_TEX0]; /* dummy to prevent segfault when someone */ > + /* specifies GL_TEXTURE3 */ > + /* question: could we use VERT_ATTRIB_TEX3, too without */ > + /* the risk of "broken" vtxfmt and codegen path ? */ > > /* Run through and initialize the vertex components in the order > * the hardware understands: It doesn't matter what you use. The spec says that if an app tries to see the coord for a non-existant texture unit the results are undefined. Over-writting a different texture unit's coordinate or doing nothing are both equally valid. > diff -ru trunk_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_vtxfmt_c.c tex3_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_vtxfmt_c.c > --- trunk_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_vtxfmt_c.c Fri May 2 13:01:56 2003 > +++ tex3_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_vtxfmt_c.c Thu Jun 26 17:57:22 2003 > @@ -548,12 +548,16 @@ > * with 0x1F. Masking with 0x1F and then masking with 0x01 is redundant, so > * the subtraction has been omitted. > */ > +/* question: should we continue using this and make the texcoordptr 4 elements > + * or should we mask and verify that the index doesn't get bigger than 2 ? > + * We have this issue in the codegen stuff, too > + */ Yes (see my explanation above). > @@ -736,17 +740,20 @@ > > #define ACTIVE_ST0 RADEON_CP_VC_FRMT_ST0 > #define ACTIVE_ST1 RADEON_CP_VC_FRMT_ST1 > -#define ACTIVE_ST_ALL (RADEON_CP_VC_FRMT_ST1|RADEON_CP_VC_FRMT_ST0) > +#define ACTIVE_ST2 RADEON_CP_VC_FRMT_ST2 > +#define ACTIVE_ST_ALL (RADEON_CP_VC_FRMT_ST0|RADEON_CP_VC_FRMT_ST1|RADEON_CP_VC_FRMT_ST2) > > /* Each codegen function should be able to be fully specified by a > * subsetted version of rmesa->vb.vertex_format. > */ > + /* question: this is strange... could someone explain ? */ > #define MASK_NORM (ACTIVE_XYZW) > #define MASK_COLOR (MASK_NORM|ACTIVE_NORM) > #define MASK_SPEC (MASK_COLOR|ACTIVE_COLOR) > #define MASK_ST0 (MASK_SPEC|ACTIVE_SPEC) > #define MASK_ST1 (MASK_ST0|ACTIVE_ST0) > -#define MASK_ST_ALL (MASK_ST1|ACTIVE_ST1) > +#define MASK_ST2 (MASK_ST1|ACTIVE_ST1) > +#define MASK_ST_ALL (MASK_ST2|ACTIVE_ST2) > #define MASK_VERTEX (MASK_ST_ALL|ACTIVE_FPALPHA) You'll have to search through the list archives. I remember asking the same thing once upon a time, but I don't recall the correct technical answer. > diff -ru trunk_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_vtxfmt_sse.c tex3_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_vtxfmt_sse.c > --- trunk_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_vtxfmt_sse.c Fri May 2 13:01:56 2003 > +++ tex3_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_vtxfmt_sse.c Thu Jun 26 18:58:10 2003 > @@ -178,13 +178,21 @@ > if (RADEON_DEBUG & DEBUG_CODEGEN) > fprintf(stderr, "%s 0x%08x\n", __FUNCTION__, key ); > > - if ((key & (RADEON_CP_VC_FRMT_ST0|RADEON_CP_VC_FRMT_ST1)) == > - (RADEON_CP_VC_FRMT_ST0|RADEON_CP_VC_FRMT_ST1)) { > - DFN ( _sse_MultiTexCoord2fv, rmesa->vb.dfn_cache.MultiTexCoord2fvARB ); > - FIXUP(dfn->code, 18, 0xdeadbeef, (int)rmesa->vb.texcoordptr[0]); > - } else { > - DFN ( _sse_MultiTexCoord2fv_2, rmesa->vb.dfn_cache.MultiTexCoord2fvARB ); > - FIXUP(dfn->code, 14, 0x0, (int)rmesa->vb.texcoordptr); > +/* question: should we just let the case for RADEON_CP_VC_FRMT_ST0 for the */ > +/* default path? Maybe some programs just use glMultiTexCoord2f(v)ARB for */ > +/* every TMU, as GL_TEXTURE0_ARB is also allowed */ No. If some other set of texture units (say, 0 and 2 or just 1), are enabled, masking the unit with 3 won't put the data in the correct place in the vertex buffer. If just TMU 1 is enabled, it's coordinate needs to be at the 0th position. Just masking w/3 would put it at the 1st position, and the chip will get garbage. > + switch (key & (RADEON_CP_VC_FRMT_ST0|RADEON_CP_VC_FRMT_ST1|RADEON_CP_VC_FRMT_ST2)) > + { > + case RADEON_CP_VC_FRMT_ST0: > + case (RADEON_CP_VC_FRMT_ST0|RADEON_CP_VC_FRMT_ST1): > + case (RADEON_CP_VC_FRMT_ST0|RADEON_CP_VC_FRMT_ST1|RADEON_CP_VC_FRMT_ST2): > + DFN ( _sse_MultiTexCoord2fv, rmesa->vb.dfn_cache.MultiTexCoord2fvARB ); > + FIXUP(dfn->code, 18, 0xdeadbeef, (int)rmesa->vb.texcoordptr[0]); > + break; > + default: > + DFN ( _sse_MultiTexCoord2fv_2, rmesa->vb.dfn_cache.MultiTexCoord2fvARB ); > + FIXUP(dfn->code, 14, 0x0, (int)rmesa->vb.texcoordptr); > + break; > } > return dfn; > } > diff -ru trunk_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_vtxfmt_x86.c tex3_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_vtxfmt_x86.c > --- trunk_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_vtxfmt_x86.c Fri May 2 13:01:56 2003 > +++ tex3_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_vtxfmt_x86.c Thu Jun 26 18:11:45 2003 > @@ -75,7 +75,7 @@ > if (RADEON_DEBUG & DEBUG_CODEGEN) > fprintf(stderr, "%s 0x%08x %d\n", __FUNCTION__, key, rmesa->vb.vertex_size ); > > - switch (rmesa->vb.vertex_size) { > + switch (rmesa->vb.vertex_size) { /* FIXME: do we need something here for 3rd TMU? */ > case 4: { > > DFN ( _x86_Vertex3f_4, rmesa->vb.dfn_cache.Vertex3f ); I don't think we have to do anything special. It should just picked up by the default case. We may decide to put a fast path, but I don't expect so. |
From: Andreas S. <A.S...@gm...> - 2003-06-27 22:29:59
|
Ian, thanks a lot for having a look at it. Am 2003.06.27 03:19:00 +0200 schrieb(en) Ian Romanick: > Andreas Stenglein wrote: > > here is a patch that works at least for multiarb.c > > It is against HEAD from 19 June 2003 > > (I cleaned it up a bit but its not ready for merge:=20 > > still some questions...) >=20 > I've taken a quick peek at this patch, and I have a couple comments. I= =20 > hope to be able to look at it in more detail in the next week or so, but= =20 > I can't make any promises. >=20 > > 1) could someone with an 8MB or 16MB Radeon check if the > > resulting max_texturesize is big enough? > > (just use mesa's glxinfo: glxinfo -l) >=20 > I should be able to test this on an M6 w/8MB next. Keep in mind that=20 > the amount of available AGP memory also plays a role. If the card only= =20 > has 1MB of available memory for textures, but there is 256MB of AGP=20 > memory, it probably won't be a limitation. >=20 > > 2) could someone try it out with a game/demo that > > makes use of the 3rd TMU? >=20 > Other than multiarb, I think UT2k3 is the only option. Did anyone ever= =20 > get that working on R100? I know that someone (Keith?) finally got it=20 > working on R200. I got the ut2003 demo 2206 and it worked on the radeon 7500. Some lighting/coloring is broken, (I saw some sort of that in another program) and it tries to use GL_TEXTURE_CUBE_MAP. Unfortunately it doesnt log which extensions it would use if they were available. And it doesnt log the implementation limits. @daniel: are there any switches to get more information? what is being drawn with the 3rd TMU ? would GL_NV_vertex_array_range really help so much or is this info of the README.linux outdated? >=20 > > 3) could someone with knowledge about the vfmt and codegen > > stuff have a look on it? especially whether we need those > > dummys and what should be done in the fast-path and > > with vertex3f. >=20 > That's where I'll try to focus my attention when I look at it deep. At= =20 > first glance (see my notes below), it look pretty good in this respect. >=20 > > diff -ru trunk_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_context= .c tex3_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_context.c > > --- trunk_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_context.c We= d Jun 11 00:06:16 2003 > > +++ tex3_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_context.c Thu= Jun 26 17:15:36 2003 > > @@ -314,6 +317,8 @@ > > 12, > > GL_FALSE ); > > =20 > > + /* FIXME: we should verify that we dont get limits below the minimu= m requirements of OpenGL */ > > + > > ctx->Const.MaxTextureMaxAnisotropy =3D 16.0; > > =20 > > /* No wide points. >=20 > Yes and no. If we persist with the current memory sizing scheme, we'll= =20 > need to disable texture units if there is not enough memory to provide=20 > the required minimum texture size. My opinion, however, is that we=20 > should bag that scheme altogether. I think we'll leave that particular= =20 > issue for a (slightly) later date... ok >=20 > > @@ -374,13 +379,15 @@ > > =20 > > _math_matrix_ctr( &rmesa->TexGenMatrix[0] ); > > _math_matrix_ctr( &rmesa->TexGenMatrix[1] ); > > + _math_matrix_ctr( &rmesa->TexGenMatrix[2] ); > > _math_matrix_ctr( &rmesa->tmpmat ); > > _math_matrix_set_identity( &rmesa->TexGenMatrix[0] ); > > _math_matrix_set_identity( &rmesa->TexGenMatrix[1] ); > > + _math_matrix_set_identity( &rmesa->TexGenMatrix[2] ); > > _math_matrix_set_identity( &rmesa->tmpmat ); > > =20 > > driInitExtensions( ctx, card_extensions, GL_TRUE ); > > - if( rmesa->dri.drmMinor >=3D 9 || getenv( "RADEON_RECTANGLE_FORCE_E= NABLE")) /* FIXME! a.s. */ > > + if( rmesa->dri.drmMinor >=3D 9) > > _mesa_enable_extension( ctx, "GL_NV_texture_rectangle"); > > radeonInitDriverFuncs( ctx ); > > radeonInitIoctlFuncs( ctx ); >=20 > The various bits of texture-rectangle cleanup like this should be=20 > commited separately from the 3rd TMU stuff. It will make it easier to=20 > grok the log messages & roll stuff back if needed. yes, right.. this getenv stuff more or less got merged by accident: I didnt know if we should bump the version number, so I just included some test and a "workaround". I think this cleanup could just be committed. OTOH I've found some use of MAX_TEXTURE_UNITS (the one defined in mesas con= fig.h) in the extra layer for NPOT textures (radeon_swtcl.c). I think RADEON_MAX_TEXTURE_UNITS and maybe ctx->Const.MaxTextureUnits can be used there. Only keith knows ... >=20 > > diff -ru trunk_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_context= .h tex3_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_context.h > > --- trunk_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_context.h We= d Jun 11 00:06:16 2003 > > +++ tex3_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_context.h Thu= Jun 26 17:22:54 2003 > > @@ -635,30 +636,37 @@ > > GLuint prim; > > }; > > =20 > > +/* FIXME: do we really need add. 2 to prevent segfault if someone */ > > +/* specifies GL_TEXTURE3 (esp. for the codegen-path) ? */ > > +#define RADEON_MAX_VERTEX_SIZE 19 /* 17 + 2 */ > > + >=20 > If you're going to use a define for this, which is a good idea, you=20 > should move the commentary about the chosen value out of radeon_vbinfo=20 > and put it next to the define. >=20 > I've done some thinking about this particular fast-path. I believe that= =20 > we should try to keep it as much as possible. I don't think there's any= =20 > compelling reason not to. I would point out two things here. >=20 > 1. The extra 2 elements (or MAX_TEXTURE_COORD_ELEMENTS elements, which=20 > is 2 right now, but will eventually grow to 3) is to give the appearance= =20 > of having a power-of-two number of texture units. This allows some=20 > optimizations. >=20 > 2. This is safe because the texture coordinates are always the last=20 > elements in a vertex. Therefore the actual vertex data is packed into=20 > the start of the buffer. >=20 > > struct radeon_vbinfo { > > GLint counter, initial_counter; > > GLint *dmaptr; > > void (*notify)( void ); > > GLint vertex_size; > > =20 > > - /* A maximum total of 15 elements per vertex: 3 floats for positio= n, 3 > > + /* A maximum total of 17 elements per vertex: 3 floats for positio= n, 3 > > * floats for normal, 4 floats for color, 4 bytes for secondary col= or, > > - * 2 floats for each texture unit (4 floats total). > > + * 2 floats for each texture unit (6 floats total). > > *=20 > > - * As soon as the 3rd TMU is supported or cube maps (or 3D textures= ) are > > + * As soon as cube maps (or 3D textures) are > > * supported, this value will grow. > > *=20 > > * The position data is never actually stored here, so 3 elements c= ould be > > * trimmed out of the buffer. > > */ > > - union { float f; int i; radeon_color_t color; } vertex[15]; > > + union { float f; int i; radeon_color_t color; } vertex[RADEON_MAX_V= ERTEX_SIZE]; > > =20 > > GLfloat *normalptr; > > GLfloat *floatcolorptr; > > radeon_color_t *colorptr; > > GLfloat *floatspecptr; > > radeon_color_t *specptr; > > - GLfloat *texcoordptr[2]; > > + GLfloat *texcoordptr[4]; /* this should be RADEON_MAX_TEXTURE_UNITS= but */ > > + /* the extra one is needed in radeon_vtxfmt_c.c and */ > > + /* in the codegen-path if someone specifies GL_TEXTURE3 */ > > + /* maybe we should just use mesas MAX_TEXTURE_UNITS here */ >=20 > 4 should be used. >=20 > > GLenum *prim; /* &ctx->Driver.CurrentExecPrimitive */ > > GLuint primflags; >=20 > > diff -ru trunk_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_tcl.c t= ex3_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_tcl.c > > --- trunk_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_tcl.c Fri Ma= y 2 13:01:54 2003 > > +++ tex3_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_tcl.c Thu Jun= 19 14:02:42 2003 > > @@ -385,6 +385,16 @@ > > } > > } > > =20 > > + if (ctx->Texture.Unit[2]._ReallyEnabled) { > > + if (ctx->Texture.Unit[2].TexGenEnabled) { > > + if (rmesa->TexGenNeedNormals[2]) { > > + inputs |=3D VERT_BIT_NORMAL; > > + } > > + } else { > > + inputs |=3D VERT_BIT_TEX2; > > + } > > + } > > + >=20 > I don't remember, but could this be made into a loop? you are right, yes, maybe. but I thought it isnt a problem for 3TMUs, only for 6 (r200) ;) so it could be good to have it here, too. >=20 > > stage->inputs =3D inputs; > > stage->active =3D 1; > > } > > diff -ru trunk_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_vtxfmt.= c tex3_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_vtxfmt.c > > --- trunk_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_vtxfmt.c Fri= May 2 13:01:56 2003 > > +++ tex3_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_vtxfmt.c Thu = Jun 26 17:55:47 2003 > > @@ -626,6 +657,11 @@ > > rmesa->vb.floatspecptr =3D ctx->Current.Attrib[VERT_ATTRIB_COLOR1]; > > rmesa->vb.texcoordptr[0] =3D ctx->Current.Attrib[VERT_ATTRIB_TEX0]; > > rmesa->vb.texcoordptr[1] =3D ctx->Current.Attrib[VERT_ATTRIB_TEX1]; > > + rmesa->vb.texcoordptr[2] =3D ctx->Current.Attrib[VERT_ATTRIB_TEX2]; > > + rmesa->vb.texcoordptr[3] =3D ctx->Current.Attrib[VERT_ATTRIB_TEX0];= /* dummy to prevent segfault when someone */ > > + /* specifies GL_TEXTURE3 */ > > + /* question: could we use VERT_ATTRIB_TEX3, too withou= t */ > > + /* the risk of "broken" vtxfmt and codegen path ? */ > > =20 > > /* Run through and initialize the vertex components in the order > > * the hardware understands: >=20 > It doesn't matter what you use. The spec says that if an app tries to=20 > see the coord for a non-existant texture unit the results are undefined.= =20 > Over-writting a different texture unit's coordinate or doing nothing=20 > are both equally valid. >=20 > > diff -ru trunk_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_vtxfmt_= c.c tex3_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_vtxfmt_c.c > > --- trunk_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_vtxfmt_c.c F= ri May 2 13:01:56 2003 > > +++ tex3_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_vtxfmt_c.c Th= u Jun 26 17:57:22 2003 > > @@ -548,12 +548,16 @@ > > * with 0x1F. Masking with 0x1F and then masking with 0x01 is redunda= nt, so > > * the subtraction has been omitted. > > */ > > +/* question: should we continue using this and make the texcoordptr 4 = elements > > + * or should we mask and verify that the index doesn't get b= igger than 2 ? > > + * We have this issue in the codegen stuff, too > > + */ >=20 > Yes (see my explanation above). >=20 > > @@ -736,17 +740,20 @@ > > =20 > > #define ACTIVE_ST0 RADEON_CP_VC_FRMT_ST0 > > #define ACTIVE_ST1 RADEON_CP_VC_FRMT_ST1 > > -#define ACTIVE_ST_ALL (RADEON_CP_VC_FRMT_ST1|RADEON_CP_VC_FRMT_ST0) > > +#define ACTIVE_ST2 RADEON_CP_VC_FRMT_ST2 > > +#define ACTIVE_ST_ALL (RADEON_CP_VC_FRMT_ST0|RADEON_CP_VC_FRMT_ST1|RAD= EON_CP_VC_FRMT_ST2) > > =20 > > /* Each codegen function should be able to be fully specified by a > > * subsetted version of rmesa->vb.vertex_format. > > */ > > + /* question: this is strange... could someone explain ? */ > > #define MASK_NORM (ACTIVE_XYZW) > > #define MASK_COLOR (MASK_NORM|ACTIVE_NORM) > > #define MASK_SPEC (MASK_COLOR|ACTIVE_COLOR) > > #define MASK_ST0 (MASK_SPEC|ACTIVE_SPEC) > > #define MASK_ST1 (MASK_ST0|ACTIVE_ST0) > > -#define MASK_ST_ALL (MASK_ST1|ACTIVE_ST1) > > +#define MASK_ST2 (MASK_ST1|ACTIVE_ST1) > > +#define MASK_ST_ALL (MASK_ST2|ACTIVE_ST2) > > #define MASK_VERTEX (MASK_ST_ALL|ACTIVE_FPALPHA)=20 >=20 > You'll have to search through the list archives. I remember asking the= =20 > same thing once upon a time, but I don't recall the correct technical=20 > answer. >=20 > > diff -ru trunk_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_vtxfmt_= sse.c tex3_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_vtxfmt_sse.c > > --- trunk_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_vtxfmt_sse.c= Fri May 2 13:01:56 2003 > > +++ tex3_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_vtxfmt_sse.c = Thu Jun 26 18:58:10 2003 > > @@ -178,13 +178,21 @@ > > if (RADEON_DEBUG & DEBUG_CODEGEN) > > fprintf(stderr, "%s 0x%08x\n", __FUNCTION__, key ); > > =20 > > - if ((key & (RADEON_CP_VC_FRMT_ST0|RADEON_CP_VC_FRMT_ST1)) =3D=3D > > - (RADEON_CP_VC_FRMT_ST0|RADEON_CP_VC_FRMT_ST1)) { > > - DFN ( _sse_MultiTexCoord2fv, rmesa->vb.dfn_cache.MultiTexCoord2f= vARB ); > > - FIXUP(dfn->code, 18, 0xdeadbeef, (int)rmesa->vb.texcoordptr[0]);= =09 > > - } else { > > - DFN ( _sse_MultiTexCoord2fv_2, rmesa->vb.dfn_cache.MultiTexCoord= 2fvARB ); > > - FIXUP(dfn->code, 14, 0x0, (int)rmesa->vb.texcoordptr); > > +/* question: should we just let the case for RADEON_CP_VC_FRMT_ST0 for= the */ > > +/* default path? Maybe some programs just use glMultiTexCoord2f(v)ARB = for */ > > +/* every TMU, as GL_TEXTURE0_ARB is also allowed */ >=20 > No. If some other set of texture units (say, 0 and 2 or just 1), are=20 > enabled, masking the unit with 3 won't put the data in the correct place= =20 > in the vertex buffer. If just TMU 1 is enabled, it's coordinate needs=20 > to be at the 0th position. Just masking w/3 would put it at the 1st=20 > position, and the chip will get garbage. what I tried to ask was, if it would be better to disable the fast-path for= the case RADEON_CP_VC_FRMT_ST0: (as it was before.) before TMU3-patch it was more or less: switch (statement) { case (RADEON_CP_VC_FRMT_ST0|RADEON_CP_VC_FRMT_ST1): fast-path(); break; default: default-path(); break; } >=20 > > + switch (key & (RADEON_CP_VC_FRMT_ST0|RADEON_CP_VC_FRMT_ST1|RADEON_C= P_VC_FRMT_ST2)) > > + { > > + case RADEON_CP_VC_FRMT_ST0: > > + case (RADEON_CP_VC_FRMT_ST0|RADEON_CP_VC_FRMT_ST1): > > + case (RADEON_CP_VC_FRMT_ST0|RADEON_CP_VC_FRMT_ST1|RADEON_CP_VC_F= RMT_ST2): > > + DFN ( _sse_MultiTexCoord2fv, rmesa->vb.dfn_cache.MultiTexCoor= d2fvARB ); > > + FIXUP(dfn->code, 18, 0xdeadbeef, (int)rmesa->vb.texcoordptr[0= ]); > > + break; > > + default: > > + DFN ( _sse_MultiTexCoord2fv_2, rmesa->vb.dfn_cache.MultiTexCo= ord2fvARB ); > > + FIXUP(dfn->code, 14, 0x0, (int)rmesa->vb.texcoordptr); > > + break; > > } > > return dfn; > > } > > diff -ru trunk_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_vtxfmt_= x86.c tex3_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_vtxfmt_x86.c > > --- trunk_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_vtxfmt_x86.c= Fri May 2 13:01:56 2003 > > +++ tex3_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_vtxfmt_x86.c = Thu Jun 26 18:11:45 2003 > > @@ -75,7 +75,7 @@ > > if (RADEON_DEBUG & DEBUG_CODEGEN) > > fprintf(stderr, "%s 0x%08x %d\n", __FUNCTION__, key, rmesa->vb.v= ertex_size ); > > =20 > > - switch (rmesa->vb.vertex_size) { > > + switch (rmesa->vb.vertex_size) { /* FIXME: do we need something he= re for 3rd TMU? */ > > case 4: { > > =20 > > DFN ( _x86_Vertex3f_4, rmesa->vb.dfn_cache.Vertex3f ); >=20 > I don't think we have to do anything special. It should just picked up= =20 > by the default case. We may decide to put a fast path, but I don't=20 > expect so. >=20 greetings, Andreas |
From: Ian R. <id...@us...> - 2003-06-27 23:55:32
|
Andreas Stenglein wrote: >>>diff -ru trunk_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_tcl.c tex3_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_tcl.c >>>--- trunk_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_tcl.c Fri May 2 13:01:54 2003 >>>+++ tex3_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_tcl.c Thu Jun 19 14:02:42 2003 >>>@@ -385,6 +385,16 @@ >>> } >>> } >>> >>>+ if (ctx->Texture.Unit[2]._ReallyEnabled) { >>>+ if (ctx->Texture.Unit[2].TexGenEnabled) { >>>+ if (rmesa->TexGenNeedNormals[2]) { >>>+ inputs |= VERT_BIT_NORMAL; >>>+ } >>>+ } else { >>>+ inputs |= VERT_BIT_TEX2; >>>+ } >>>+ } >>>+ >> >>I don't remember, but could this be made into a loop? > > you are right, yes, maybe. but I thought it isnt a problem for 3TMUs, > only for 6 (r200) ;) > so it could be good to have it here, too. One of my goals in the original texmem work was to minimize the textual diffs between the radeon and r200 drivers. At some point after all driver work gets moved to the Mesa tree, we should look at making a radeon_common directory that contains code shared by those two drivers. I think the time for doing that is still a ways off, but it's something to keep in mind. >>>diff -ru trunk_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_vtxfmt_sse.c tex3_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_vtxfmt_sse.c >>>--- trunk_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_vtxfmt_sse.c Fri May 2 13:01:56 2003 >>>+++ tex3_20030619/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_vtxfmt_sse.c Thu Jun 26 18:58:10 2003 >>>@@ -178,13 +178,21 @@ >>> if (RADEON_DEBUG & DEBUG_CODEGEN) >>> fprintf(stderr, "%s 0x%08x\n", __FUNCTION__, key ); >>> >>>- if ((key & (RADEON_CP_VC_FRMT_ST0|RADEON_CP_VC_FRMT_ST1)) == >>>- (RADEON_CP_VC_FRMT_ST0|RADEON_CP_VC_FRMT_ST1)) { >>>- DFN ( _sse_MultiTexCoord2fv, rmesa->vb.dfn_cache.MultiTexCoord2fvARB ); >>>- FIXUP(dfn->code, 18, 0xdeadbeef, (int)rmesa->vb.texcoordptr[0]); >>>- } else { >>>- DFN ( _sse_MultiTexCoord2fv_2, rmesa->vb.dfn_cache.MultiTexCoord2fvARB ); >>>- FIXUP(dfn->code, 14, 0x0, (int)rmesa->vb.texcoordptr); >>>+/* question: should we just let the case for RADEON_CP_VC_FRMT_ST0 for the */ >>>+/* default path? Maybe some programs just use glMultiTexCoord2f(v)ARB for */ >>>+/* every TMU, as GL_TEXTURE0_ARB is also allowed */ >> >>No. If some other set of texture units (say, 0 and 2 or just 1), are >>enabled, masking the unit with 3 won't put the data in the correct place >>in the vertex buffer. If just TMU 1 is enabled, it's coordinate needs >>to be at the 0th position. Just masking w/3 would put it at the 1st >>position, and the chip will get garbage. > > > what I tried to ask was, if it would be better to disable the fast-path for the > case RADEON_CP_VC_FRMT_ST0: > (as it was before.) > > before TMU3-patch it was more or less: > switch (statement) { > case (RADEON_CP_VC_FRMT_ST0|RADEON_CP_VC_FRMT_ST1): > fast-path(); break; > default: > default-path(); break; > } I think it should be okay to enable the fast path in this case. I think it was just an over-sight that it wasn't included in the first place. Keith may know better about that particular choice (since he made it!) :) |