From: <bug...@fr...> - 2010-04-14 16:10:34
|
https://bugs.freedesktop.org/show_bug.cgi?id=27628

--- Comment #3 from Jesse Barnes <jb...@vi...> 2010-04-14 09:10:27 PDT ---
On Wed, 14 Apr 2010 17:57:07 +0200
Mario Kleiner <mar...@tu...> wrote:

> Hmm. The patch inits mesa's local cached copy of swap_interval to
> zero. The DRI2CreateDrawable() function inside the xserver's
> xserver/hw/xfree86/dri2/dri2.c implementation inits its own copy of
> swap_interval to 1, at least in the DRI2 patch series of Jesse and
> mine that is supposed be pulled of the next x-server release.
>
> So until client code calls glXSetSwapIntervalMESA() or
> glXSetSwapIntervalSGI() the first time, the returned value will not
> match the true setting.
>
> Either we need to match the defaults, or propagate the mesa setting
> to the server at drawable creation time, or query the server's
> setting to init mesa's copy.

Yeah was just talking with Kristian about this. According to the
SGI_swap_interval spec, the default value should be 1. For direct
rendered clients we should probably just make them send a swap interval
protocol request so that the server will match up and all will be in
sync. That way we won't have to keep the server and mesa in sync by
hand.

I'll get to it later today unless Kristian already has a fix.

--
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
|
From: Jesse B. <jb...@vi...> - 2010-04-14 16:10:30
|
On Wed, 14 Apr 2010 17:57:07 +0200
Mario Kleiner <mar...@tu...> wrote:

> Hmm. The patch inits mesa's local cached copy of swap_interval to
> zero. The DRI2CreateDrawable() function inside the xserver's
> xserver/hw/xfree86/dri2/dri2.c implementation inits its own copy of
> swap_interval to 1, at least in the DRI2 patch series of Jesse and
> mine that is supposed be pulled of the next x-server release.
>
> So until client code calls glXSetSwapIntervalMESA() or
> glXSetSwapIntervalSGI() the first time, the returned value will not
> match the true setting.
>
> Either we need to match the defaults, or propagate the mesa setting
> to the server at drawable creation time, or query the server's
> setting to init mesa's copy.

Yeah was just talking with Kristian about this. According to the
SGI_swap_interval spec, the default value should be 1. For direct
rendered clients we should probably just make them send a swap interval
protocol request so that the server will match up and all will be in
sync. That way we won't have to keep the server and mesa in sync by
hand.

I'll get to it later today unless Kristian already has a fix.

--
Jesse Barnes, Intel Open Source Technology Center
|
From: <bug...@fr...> - 2010-04-14 14:16:20
|
https://bugs.freedesktop.org/show_bug.cgi?id=27628

Kristian Høgsberg <kr...@bi...> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

--- Comment #2 from Kristian Høgsberg <kr...@bi...> 2010-04-14 07:16:11 PDT ---
Thanks, looks good. Committed:

commit 0863c7e499a553c2d8e7bcbd09c5de88e396fcd0
Author: Michael Schmidt <msc...@re...>
Date:   Wed Apr 14 10:12:42 2010 -0400

    Initialize DRI2 swap interval to 0

    https://bugs.freedesktop.org/show_bug.cgi?id=27628
|
From: <bug...@fr...> - 2010-04-14 11:16:50
|
https://bugs.freedesktop.org/show_bug.cgi?id=27628

Michal Schmidt <msc...@re...> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #34997|0                           |1
        is obsolete|                            |

--- Comment #1 from Michal Schmidt <msc...@re...> 2010-04-14 04:16:42 PDT ---
Created an attachment (id=34998)
 View: https://bugs.freedesktop.org/attachment.cgi?id=34998
 Review: https://bugs.freedesktop.org/review?bug=27628&attachment=34998

[PATCH] Initialize swap_interval to 0

Sorry, attached wrong file before.
|
From: <bug...@fr...> - 2010-04-14 11:14:40
|
https://bugs.freedesktop.org/show_bug.cgi?id=27628

           Summary: glxgears prints bogus swap interval information
           Product: Mesa
           Version: git
          Platform: Other
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: medium
         Component: Demos
        AssignedTo: mes...@li...
        ReportedBy: msc...@re...

Created an attachment (id=34997)
 View: https://bugs.freedesktop.org/attachment.cgi?id=34997
 Review: https://bugs.freedesktop.org/review?bug=27628&attachment=34997

[PATCH] Initialize swap_interval to 0

$ glxgears
Running synchronized to the vertical refresh. The framerate should be
approximately 1/1920103026 the monitor refresh rate.

The number is obviously nonsense. This is on current Fedora 13
(mesa-7.8.1-1.fc13).

I looked into it and here's what I found:
 - glxgears gets this number using GLX_MESA_swap_control from
   glXGetSwapIntervalMESA().
 - glXGetSwapIntervalMESA() calls into
   src/glx/dri2_glx.c:dri2GetSwapInterval() which returns
   priv->swap_interval
 - this value is not initialized. Valgrind agrees:
==25411== Conditional jump or move depends on uninitialised value(s)
==25411==    at 0x402C9A: main (glxgears.c:619)

Should pdraw->swap_interval be initialized to 0 in dri2CreateDrawable(),
like in the attached patch?
|
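For illustration, the fix proposed in this report amounts to assigning the cached value when the drawable is created, instead of returning whatever happened to be in freshly allocated memory. The following is a minimal sketch, assuming a simplified stand-in struct; the `sketch_` names are hypothetical and are not the real definitions from src/glx/dri2_glx.c.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-drawable private data. */
struct sketch_drawable {
    int width;
    int height;
    int swap_interval;   /* cached value handed back by the getter */
};

/* Create a drawable with every field assigned explicitly.
 * Returns NULL if the allocation fails. */
static struct sketch_drawable *sketch_create_drawable(int width, int height)
{
    struct sketch_drawable *draw = malloc(sizeof(*draw));
    if (draw == NULL)
        return NULL;

    draw->width = width;
    draw->height = height;
    draw->swap_interval = 0;   /* the value the attached patch proposes */
    return draw;
}

/* Getter in the spirit of the one named in the report: it only reads the cache,
 * so the cache must have been initialized at creation time. */
static int sketch_get_swap_interval(const struct sketch_drawable *draw)
{
    assert(draw != NULL);
    return draw->swap_interval;
}
```

Without the explicit assignment, the getter would read an indeterminate value, which is the kind of read Valgrind flags in the report.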
From: Roland S. <sr...@vm...> - 2010-04-13 23:54:07
|
On 14.04.2010 00:38, Dave Airlie wrote:
> On Wed, Apr 14, 2010 at 8:33 AM, Roland Scheidegger <sr...@vm...> wrote:
>> On 13.04.2010 20:28, Alex Deucher wrote:
>>> On Tue, Apr 13, 2010 at 2:21 PM, Corbin Simpson
>>> <mos...@gm...> wrote:
>>>> On Tue, Apr 13, 2010 at 6:42 AM, Roland Scheidegger <sr...@vm...> wrote:
>>>>> On 13.04.2010 02:52, Dave Airlie wrote:
>>>>>> On Tue, Apr 6, 2010 at 2:00 AM, Brian Paul <br...@vm...> wrote:
>>>>>>> Dave Airlie wrote:
>>>>>>>> Just going down the r300g piglit failures and noticed fbo-drawbuffers
>>>>>>>> failed, I've no idea
>>>>>>>> if this passes on Intel hw, but it appears the texenvprogram really
>>>>>>>> needs to understand the
>>>>>>>> draw buffers. The attached patch fixes it here for me on r300g anyone
>>>>>>>> want to test this on Intel
>>>>>>>> with the piglit test before/after?
>>>>>>> The piglit test passes as-is with Mesa/swrast and NVIDIA.
>>>>>>>
>>>>>>> It fails with gallium/softpipe both with and w/out your patch.
>>>>>>>
>>>>>>> I think that your patch is on the right track. But multiple render targets
>>>>>>> are still a bit of an untested area in the st/mesa code.
>>>>>>>
>>>>>>> One thing: the patch introduces a dependency on buffer state in the
>>>>>>> texenvprogram code so in state.c we should check for the _NEW_BUFFERS flag.
>>>>>>>
>>>>>>> Otherwise, I'd like to debug the softpipe failure a bit further to see
>>>>>>> what's going on. Perhaps you could hold off on committing this for a bit...
>>>>>> Well Eric pointed out to me the fun line in the spec
>>>>>>
>>>>>> (3) Should gl_FragColor be aliased to gl_FragData[0]?
>>>>>>
>>>>>> RESOLUTION: No. A shader should write either gl_FragColor, or
>>>>>> gl_FragData[n], but not both.
>>>>>>
>>>>>> Writing to gl_FragColor will write to all draw buffers specified
>>>>>> with DrawBuffersARB.
>>>>>>
>>>>>> So I was really just masking the issue with this. From what I can see
>>>>>> softpipe messes up and I'm not sure where we should be fixing this.
>>>>>> swrast does okay, its just whether we should be doing something in gallium
>>>>>> or in the drivers is open.
>>>>> Hmm yes looks like that's not really well defined. I guess there are
>>>>> several options here:
>>>>> 1) don't do anything at the state tracker level, and assume that if a
>>>>> fragment shader only writes to color 0 but has several color buffers
>>>>> bound the color is meant to go to all outputs. Looks like that's what
>>>>> nv50 is doing today. If a shader writes to FragData[0] but not others,
>>>>> in gallium that would mean that output still gets replicated to all
>>>>> outputs, but since the spec says unwritten outputs are undefined that
>>>>> would be just fine (for OpenGL - not sure about other APIs).
>>>>> 2) Use some explicit means to distinguish FragData[] from FragColor in
>>>>> gallium. For instance, could use different semantic name (like
>>>>> TGSI_SEMANTIC_COLOR and TGSI_SEMANTIC_GENERIC for the respective
>>>>> outputs). Or could have a flag somewhere (not quite sure where) saying
>>>>> if color output is to be replicated to all buffers.
>>>>> 3) Translate away the single color output in state tracker to multiple
>>>>> outputs.
>>>>>
>>>>> I don't like option 3) though. Means we need to recompile if the
>>>>> attached buffers change. Moreover, it seems both new nvidia and AMD
>>>>> chips (r600 has MULTIWRITE_ENABLE bit) handle this just fine in hw.
>>>>> I don't like option 1) neither, that kind of implicit behavior might be
>>>>> ok but this kind of guesswork isn't very nice imho.
>>>> Whatever's easiest, just document it. I'd be cool with:
>>>>
>>>> DECL IN[0], COLOR, PERSPECTIVE
>>>> DECL OUT[0], COLOR
>>>> MOV OUT[0], IN[0]
>>>> END
>>>>
>>>> Effectively being a write to all color buffers, however, this one from
>>>> progs/tests/drawbuffers:
>>>>
>>>> DCL IN[0], COLOR, LINEAR
>>>> DCL OUT[0], COLOR
>>>> DCL OUT[1], COLOR[1]
>>>> IMM FLT32 { 1.0000, 0.0000, 0.0000, 0.0000 }
>>>> 0: MOV OUT[0], IN[0]
>>>> 1: SUB OUT[1], IMM[0].xxxx, IN[0]
>>>> 2: END
>>>>
>>>> Would then double-write the second color buffer. Unpleasant. Language
>>>> like this would work, I suppose?
>>>>
>>>> """
>>>> If only one color output is declared, writes to the color output shall
>>>> be redirected to all bound color buffers. Otherwise, color outputs
>>>> shall be bound to their specific color buffer.
>>>> """
>>> Also, keep in mind that writing to multiple color buffers uses
>>> additional memory bandwidth, so for performance, we should only do so
>>> when required.
>> Do apps really have several color buffers bound but only write to one,
>> leaving the state of the others undefined in the process? Sounds like a
>> poor app to begin with to me.
>> Actually, I would restrict that language above further, so only color
>> output 0 will get redirected to all buffers if it's the only one
>> written. As said though I'd think some explicit bits somewhere are
>> cleaner. I'm not yet sure that the above would really work for all APIs,
>> it is possible some say other buffers not written to are left as is
>> instead of undefined.
>
> Who knows, the GL API allows for it, I don't see how we can
> arbitrarily decide to restrict it.
>
> I could write an app that uses multiple fragment programs, and
> switches between them, with two outputs buffers bound, though I'm
> possibly constructing something very arbitary.

I fail to see the problem. If you have two color buffers bound but only
write to one of them then the implementation is allowed to do anything
it wants with the other one as far as I can tell.

>
> The ARB_draw_buffers explicitly states that Data0 != Color.

Yes. I wonder though are there other differences somewhere (I couldn't
find any) that one gets replicated the other not?
Anyway, it looks like noone likes that implicit option. Hence let's
make it explicit in gallium. Not quite sure how yet - this seems to be
some sort of shader state. We could use new semantic for that special
replicated output, or redefine the existing ones (use generic ones for
data outputs and color only for the replicated one). Or maybe we should
just make that a tgsi shader property like those for pixel centers?

Roland
|
From: Dave A. <ai...@gm...> - 2010-04-13 22:38:29
|
On Wed, Apr 14, 2010 at 8:33 AM, Roland Scheidegger <sr...@vm...> wrote:
> On 13.04.2010 20:28, Alex Deucher wrote:
>> On Tue, Apr 13, 2010 at 2:21 PM, Corbin Simpson
>> <mos...@gm...> wrote:
>>> On Tue, Apr 13, 2010 at 6:42 AM, Roland Scheidegger <sr...@vm...> wrote:
>>>> On 13.04.2010 02:52, Dave Airlie wrote:
>>>>> On Tue, Apr 6, 2010 at 2:00 AM, Brian Paul <br...@vm...> wrote:
>>>>>> Dave Airlie wrote:
>>>>>>> Just going down the r300g piglit failures and noticed fbo-drawbuffers
>>>>>>> failed, I've no idea
>>>>>>> if this passes on Intel hw, but it appears the texenvprogram really
>>>>>>> needs to understand the
>>>>>>> draw buffers. The attached patch fixes it here for me on r300g anyone
>>>>>>> want to test this on Intel
>>>>>>> with the piglit test before/after?
>>>>>> The piglit test passes as-is with Mesa/swrast and NVIDIA.
>>>>>>
>>>>>> It fails with gallium/softpipe both with and w/out your patch.
>>>>>>
>>>>>> I think that your patch is on the right track. But multiple render targets
>>>>>> are still a bit of an untested area in the st/mesa code.
>>>>>>
>>>>>> One thing: the patch introduces a dependency on buffer state in the
>>>>>> texenvprogram code so in state.c we should check for the _NEW_BUFFERS flag.
>>>>>>
>>>>>> Otherwise, I'd like to debug the softpipe failure a bit further to see
>>>>>> what's going on. Perhaps you could hold off on committing this for a bit...
>>>>> Well Eric pointed out to me the fun line in the spec
>>>>>
>>>>> (3) Should gl_FragColor be aliased to gl_FragData[0]?
>>>>>
>>>>> RESOLUTION: No. A shader should write either gl_FragColor, or
>>>>> gl_FragData[n], but not both.
>>>>>
>>>>> Writing to gl_FragColor will write to all draw buffers specified
>>>>> with DrawBuffersARB.
>>>>>
>>>>> So I was really just masking the issue with this. From what I can see
>>>>> softpipe messes up and I'm not sure where we should be fixing this.
>>>>> swrast does okay, its just whether we should be doing something in gallium
>>>>> or in the drivers is open.
>>>> Hmm yes looks like that's not really well defined. I guess there are
>>>> several options here:
>>>> 1) don't do anything at the state tracker level, and assume that if a
>>>> fragment shader only writes to color 0 but has several color buffers
>>>> bound the color is meant to go to all outputs. Looks like that's what
>>>> nv50 is doing today. If a shader writes to FragData[0] but not others,
>>>> in gallium that would mean that output still gets replicated to all
>>>> outputs, but since the spec says unwritten outputs are undefined that
>>>> would be just fine (for OpenGL - not sure about other APIs).
>>>> 2) Use some explicit means to distinguish FragData[] from FragColor in
>>>> gallium. For instance, could use different semantic name (like
>>>> TGSI_SEMANTIC_COLOR and TGSI_SEMANTIC_GENERIC for the respective
>>>> outputs). Or could have a flag somewhere (not quite sure where) saying
>>>> if color output is to be replicated to all buffers.
>>>> 3) Translate away the single color output in state tracker to multiple
>>>> outputs.
>>>>
>>>> I don't like option 3) though. Means we need to recompile if the
>>>> attached buffers change. Moreover, it seems both new nvidia and AMD
>>>> chips (r600 has MULTIWRITE_ENABLE bit) handle this just fine in hw.
>>>> I don't like option 1) neither, that kind of implicit behavior might be
>>>> ok but this kind of guesswork isn't very nice imho.
>>> Whatever's easiest, just document it. I'd be cool with:
>>>
>>> DECL IN[0], COLOR, PERSPECTIVE
>>> DECL OUT[0], COLOR
>>> MOV OUT[0], IN[0]
>>> END
>>>
>>> Effectively being a write to all color buffers, however, this one from
>>> progs/tests/drawbuffers:
>>>
>>> DCL IN[0], COLOR, LINEAR
>>> DCL OUT[0], COLOR
>>> DCL OUT[1], COLOR[1]
>>> IMM FLT32 { 1.0000, 0.0000, 0.0000, 0.0000 }
>>> 0: MOV OUT[0], IN[0]
>>> 1: SUB OUT[1], IMM[0].xxxx, IN[0]
>>> 2: END
>>>
>>> Would then double-write the second color buffer. Unpleasant. Language
>>> like this would work, I suppose?
>>>
>>> """
>>> If only one color output is declared, writes to the color output shall
>>> be redirected to all bound color buffers. Otherwise, color outputs
>>> shall be bound to their specific color buffer.
>>> """
>>
>> Also, keep in mind that writing to multiple color buffers uses
>> additional memory bandwidth, so for performance, we should only do so
>> when required.
>
> Do apps really have several color buffers bound but only write to one,
> leaving the state of the others undefined in the process? Sounds like a
> poor app to begin with to me.
> Actually, I would restrict that language above further, so only color
> output 0 will get redirected to all buffers if it's the only one
> written. As said though I'd think some explicit bits somewhere are
> cleaner. I'm not yet sure that the above would really work for all APIs,
> it is possible some say other buffers not written to are left as is
> instead of undefined.

Who knows, the GL API allows for it, I don't see how we can
arbitrarily decide to restrict it.

I could write an app that uses multiple fragment programs, and
switches between them, with two outputs buffers bound, though I'm
possibly constructing something very arbitary.

The ARB_draw_buffers explicitly states that Data0 != Color.

Dave.

>
> Roland
>
> ------------------------------------------------------------------------------
> Download Intel® Parallel Studio Eval
> Try the new software tools for yourself. Speed compiling, find bugs
> proactively, and fine-tune applications for parallel performance.
> See why Intel Parallel Studio got high marks during beta.
> http://p.sf.net/sfu/intel-sw-dev
> _______________________________________________
> Mesa3d-dev mailing list
> Mes...@li...
> https://lists.sourceforge.net/lists/listinfo/mesa3d-dev
>
|
From: Roland S. <sr...@vm...> - 2010-04-13 22:33:27
|
On 13.04.2010 20:28, Alex Deucher wrote:
> On Tue, Apr 13, 2010 at 2:21 PM, Corbin Simpson
> <mos...@gm...> wrote:
>> On Tue, Apr 13, 2010 at 6:42 AM, Roland Scheidegger <sr...@vm...> wrote:
>>> On 13.04.2010 02:52, Dave Airlie wrote:
>>>> On Tue, Apr 6, 2010 at 2:00 AM, Brian Paul <br...@vm...> wrote:
>>>>> Dave Airlie wrote:
>>>>>> Just going down the r300g piglit failures and noticed fbo-drawbuffers
>>>>>> failed, I've no idea
>>>>>> if this passes on Intel hw, but it appears the texenvprogram really
>>>>>> needs to understand the
>>>>>> draw buffers. The attached patch fixes it here for me on r300g anyone
>>>>>> want to test this on Intel
>>>>>> with the piglit test before/after?
>>>>> The piglit test passes as-is with Mesa/swrast and NVIDIA.
>>>>>
>>>>> It fails with gallium/softpipe both with and w/out your patch.
>>>>>
>>>>> I think that your patch is on the right track. But multiple render targets
>>>>> are still a bit of an untested area in the st/mesa code.
>>>>>
>>>>> One thing: the patch introduces a dependency on buffer state in the
>>>>> texenvprogram code so in state.c we should check for the _NEW_BUFFERS flag.
>>>>>
>>>>> Otherwise, I'd like to debug the softpipe failure a bit further to see
>>>>> what's going on. Perhaps you could hold off on committing this for a bit...
>>>> Well Eric pointed out to me the fun line in the spec
>>>>
>>>> (3) Should gl_FragColor be aliased to gl_FragData[0]?
>>>>
>>>> RESOLUTION: No. A shader should write either gl_FragColor, or
>>>> gl_FragData[n], but not both.
>>>>
>>>> Writing to gl_FragColor will write to all draw buffers specified
>>>> with DrawBuffersARB.
>>>>
>>>> So I was really just masking the issue with this. From what I can see
>>>> softpipe messes up and I'm not sure where we should be fixing this.
>>>> swrast does okay, its just whether we should be doing something in gallium
>>>> or in the drivers is open.
>>> Hmm yes looks like that's not really well defined. I guess there are
>>> several options here:
>>> 1) don't do anything at the state tracker level, and assume that if a
>>> fragment shader only writes to color 0 but has several color buffers
>>> bound the color is meant to go to all outputs. Looks like that's what
>>> nv50 is doing today. If a shader writes to FragData[0] but not others,
>>> in gallium that would mean that output still gets replicated to all
>>> outputs, but since the spec says unwritten outputs are undefined that
>>> would be just fine (for OpenGL - not sure about other APIs).
>>> 2) Use some explicit means to distinguish FragData[] from FragColor in
>>> gallium. For instance, could use different semantic name (like
>>> TGSI_SEMANTIC_COLOR and TGSI_SEMANTIC_GENERIC for the respective
>>> outputs). Or could have a flag somewhere (not quite sure where) saying
>>> if color output is to be replicated to all buffers.
>>> 3) Translate away the single color output in state tracker to multiple
>>> outputs.
>>>
>>> I don't like option 3) though. Means we need to recompile if the
>>> attached buffers change. Moreover, it seems both new nvidia and AMD
>>> chips (r600 has MULTIWRITE_ENABLE bit) handle this just fine in hw.
>>> I don't like option 1) neither, that kind of implicit behavior might be
>>> ok but this kind of guesswork isn't very nice imho.
>> Whatever's easiest, just document it. I'd be cool with:
>>
>> DECL IN[0], COLOR, PERSPECTIVE
>> DECL OUT[0], COLOR
>> MOV OUT[0], IN[0]
>> END
>>
>> Effectively being a write to all color buffers, however, this one from
>> progs/tests/drawbuffers:
>>
>> DCL IN[0], COLOR, LINEAR
>> DCL OUT[0], COLOR
>> DCL OUT[1], COLOR[1]
>> IMM FLT32 { 1.0000, 0.0000, 0.0000, 0.0000 }
>> 0: MOV OUT[0], IN[0]
>> 1: SUB OUT[1], IMM[0].xxxx, IN[0]
>> 2: END
>>
>> Would then double-write the second color buffer. Unpleasant. Language
>> like this would work, I suppose?
>>
>> """
>> If only one color output is declared, writes to the color output shall
>> be redirected to all bound color buffers. Otherwise, color outputs
>> shall be bound to their specific color buffer.
>> """
>
> Also, keep in mind that writing to multiple color buffers uses
> additional memory bandwidth, so for performance, we should only do so
> when required.

Do apps really have several color buffers bound but only write to one,
leaving the state of the others undefined in the process? Sounds like a
poor app to begin with to me.
Actually, I would restrict that language above further, so only color
output 0 will get redirected to all buffers if it's the only one
written. As said though I'd think some explicit bits somewhere are
cleaner. I'm not yet sure that the above would really work for all APIs,
it is possible some say other buffers not written to are left as is
instead of undefined.

Roland
|
From: Dave A. <ai...@gm...> - 2010-04-13 22:30:14
|
On Tue, Apr 13, 2010 at 11:42 PM, Roland Scheidegger <sr...@vm...> wrote:
> On 13.04.2010 02:52, Dave Airlie wrote:
>> On Tue, Apr 6, 2010 at 2:00 AM, Brian Paul <br...@vm...> wrote:
>>> Dave Airlie wrote:
>>>> Just going down the r300g piglit failures and noticed fbo-drawbuffers
>>>> failed, I've no idea
>>>> if this passes on Intel hw, but it appears the texenvprogram really
>>>> needs to understand the
>>>> draw buffers. The attached patch fixes it here for me on r300g anyone
>>>> want to test this on Intel
>>>> with the piglit test before/after?
>>> The piglit test passes as-is with Mesa/swrast and NVIDIA.
>>>
>>> It fails with gallium/softpipe both with and w/out your patch.
>>>
>>> I think that your patch is on the right track. But multiple render targets
>>> are still a bit of an untested area in the st/mesa code.
>>>
>>> One thing: the patch introduces a dependency on buffer state in the
>>> texenvprogram code so in state.c we should check for the _NEW_BUFFERS flag.
>>>
>>> Otherwise, I'd like to debug the softpipe failure a bit further to see
>>> what's going on. Perhaps you could hold off on committing this for a bit...
>>
>> Well Eric pointed out to me the fun line in the spec
>>
>> (3) Should gl_FragColor be aliased to gl_FragData[0]?
>>
>> RESOLUTION: No. A shader should write either gl_FragColor, or
>> gl_FragData[n], but not both.
>>
>> Writing to gl_FragColor will write to all draw buffers specified
>> with DrawBuffersARB.
>>
>> So I was really just masking the issue with this. From what I can see
>> softpipe messes up and I'm not sure where we should be fixing this.
>> swrast does okay, its just whether we should be doing something in gallium
>> or in the drivers is open.
>
> Hmm yes looks like that's not really well defined. I guess there are
> several options here:
> 1) don't do anything at the state tracker level, and assume that if a
> fragment shader only writes to color 0 but has several color buffers
> bound the color is meant to go to all outputs. Looks like that's what
> nv50 is doing today. If a shader writes to FragData[0] but not others,
> in gallium that would mean that output still gets replicated to all
> outputs, but since the spec says unwritten outputs are undefined that
> would be just fine (for OpenGL - not sure about other APIs).
> 2) Use some explicit means to distinguish FragData[] from FragColor in
> gallium. For instance, could use different semantic name (like
> TGSI_SEMANTIC_COLOR and TGSI_SEMANTIC_GENERIC for the respective
> outputs). Or could have a flag somewhere (not quite sure where) saying
> if color output is to be replicated to all buffers.
> 3) Translate away the single color output in state tracker to multiple
> outputs.
>
> I don't like option 3) though. Means we need to recompile if the
> attached buffers change. Moreover, it seems both new nvidia and AMD
> chips (r600 has MULTIWRITE_ENABLE bit) handle this just fine in hw.
> I don't like option 1) neither, that kind of implicit behavior might be
> ok but this kind of guesswork isn't very nice imho.
>

Yeah 3 is definitely out, I've tried it its too messy, esp with 1->n and
n->1 transitions,

I'd be happy with 1 or 2, though I do wonder with 1 do you end up
binding Color and Data0 implicitly say you bound 2 buffers but your
fragprog only emits to one on purpose (maybe you have a few fragprogs).

So I expect 2 is the correct answer.

Dave.
|
From: Alex D. <ale...@gm...> - 2010-04-13 18:28:35
|
On Tue, Apr 13, 2010 at 2:21 PM, Corbin Simpson <mos...@gm...> wrote:
> On Tue, Apr 13, 2010 at 6:42 AM, Roland Scheidegger <sr...@vm...> wrote:
>> On 13.04.2010 02:52, Dave Airlie wrote:
>>> On Tue, Apr 6, 2010 at 2:00 AM, Brian Paul <br...@vm...> wrote:
>>>> Dave Airlie wrote:
>>>>> Just going down the r300g piglit failures and noticed fbo-drawbuffers
>>>>> failed, I've no idea
>>>>> if this passes on Intel hw, but it appears the texenvprogram really
>>>>> needs to understand the
>>>>> draw buffers. The attached patch fixes it here for me on r300g anyone
>>>>> want to test this on Intel
>>>>> with the piglit test before/after?
>>>> The piglit test passes as-is with Mesa/swrast and NVIDIA.
>>>>
>>>> It fails with gallium/softpipe both with and w/out your patch.
>>>>
>>>> I think that your patch is on the right track. But multiple render targets
>>>> are still a bit of an untested area in the st/mesa code.
>>>>
>>>> One thing: the patch introduces a dependency on buffer state in the
>>>> texenvprogram code so in state.c we should check for the _NEW_BUFFERS flag.
>>>>
>>>> Otherwise, I'd like to debug the softpipe failure a bit further to see
>>>> what's going on. Perhaps you could hold off on committing this for a bit...
>>>
>>> Well Eric pointed out to me the fun line in the spec
>>>
>>> (3) Should gl_FragColor be aliased to gl_FragData[0]?
>>>
>>> RESOLUTION: No. A shader should write either gl_FragColor, or
>>> gl_FragData[n], but not both.
>>>
>>> Writing to gl_FragColor will write to all draw buffers specified
>>> with DrawBuffersARB.
>>>
>>> So I was really just masking the issue with this. From what I can see
>>> softpipe messes up and I'm not sure where we should be fixing this.
>>> swrast does okay, its just whether we should be doing something in gallium
>>> or in the drivers is open.
>>
>> Hmm yes looks like that's not really well defined. I guess there are
>> several options here:
>> 1) don't do anything at the state tracker level, and assume that if a
>> fragment shader only writes to color 0 but has several color buffers
>> bound the color is meant to go to all outputs. Looks like that's what
>> nv50 is doing today. If a shader writes to FragData[0] but not others,
>> in gallium that would mean that output still gets replicated to all
>> outputs, but since the spec says unwritten outputs are undefined that
>> would be just fine (for OpenGL - not sure about other APIs).
>> 2) Use some explicit means to distinguish FragData[] from FragColor in
>> gallium. For instance, could use different semantic name (like
>> TGSI_SEMANTIC_COLOR and TGSI_SEMANTIC_GENERIC for the respective
>> outputs). Or could have a flag somewhere (not quite sure where) saying
>> if color output is to be replicated to all buffers.
>> 3) Translate away the single color output in state tracker to multiple
>> outputs.
>>
>> I don't like option 3) though. Means we need to recompile if the
>> attached buffers change. Moreover, it seems both new nvidia and AMD
>> chips (r600 has MULTIWRITE_ENABLE bit) handle this just fine in hw.
>> I don't like option 1) neither, that kind of implicit behavior might be
>> ok but this kind of guesswork isn't very nice imho.
>
> Whatever's easiest, just document it. I'd be cool with:
>
> DECL IN[0], COLOR, PERSPECTIVE
> DECL OUT[0], COLOR
> MOV OUT[0], IN[0]
> END
>
> Effectively being a write to all color buffers, however, this one from
> progs/tests/drawbuffers:
>
> DCL IN[0], COLOR, LINEAR
> DCL OUT[0], COLOR
> DCL OUT[1], COLOR[1]
> IMM FLT32 { 1.0000, 0.0000, 0.0000, 0.0000 }
> 0: MOV OUT[0], IN[0]
> 1: SUB OUT[1], IMM[0].xxxx, IN[0]
> 2: END
>
> Would then double-write the second color buffer. Unpleasant. Language
> like this would work, I suppose?
>
> """
> If only one color output is declared, writes to the color output shall
> be redirected to all bound color buffers. Otherwise, color outputs
> shall be bound to their specific color buffer.
> """

Also, keep in mind that writing to multiple color buffers uses
additional memory bandwidth, so for performance, we should only do so
when required.

Alex
|
From: Corbin S. <mos...@gm...> - 2010-04-13 18:21:49
|
On Tue, Apr 13, 2010 at 6:42 AM, Roland Scheidegger <sr...@vm...> wrote: > On 13.04.2010 02:52, Dave Airlie wrote: >> On Tue, Apr 6, 2010 at 2:00 AM, Brian Paul <br...@vm...> wrote: >>> Dave Airlie wrote: >>>> Just going down the r300g piglit failures and noticed fbo-drawbuffers >>>> failed, I've no idea >>>> if this passes on Intel hw, but it appears the texenvprogram really >>>> needs to understand the >>>> draw buffers. The attached patch fixes it here for me on r300g anyone >>>> want to test this on Intel >>>> with the piglit test before/after? >>> The piglit test passes as-is with Mesa/swrast and NVIDIA. >>> >>> It fails with gallium/softpipe both with and w/out your patch. >>> >>> I think that your patch is on the right track. But multiple render targets >>> are still a bit of an untested area in the st/mesa code. >>> >>> One thing: the patch introduces a dependency on buffer state in the >>> texenvprogram code so in state.c we should check for the _NEW_BUFFERS flag. >>> >>> Otherwise, I'd like to debug the softpipe failure a bit further to see >>> what's going on. Perhaps you could hold off on committing this for a bit... >> >> Well Eric pointed out to me the fun line in the spec >> >> (3) Should gl_FragColor be aliased to gl_FragData[0]? >> >> RESOLUTION: No. A shader should write either gl_FragColor, or >> gl_FragData[n], but not both. >> >> Writing to gl_FragColor will write to all draw buffers specified >> with DrawBuffersARB. >> >> So I was really just masking the issue with this. From what I can see >> softpipe messes up and I'm not sure where we should be fixing this. >> swrast does okay, its just whether we should be doing something in gallium >> or in the drivers is open. > > Hmm yes looks like that's not really well defined. 
> I guess there are several options here:
> 1) don't do anything at the state tracker level, and assume that if a
> fragment shader only writes to color 0 but has several color buffers
> bound the color is meant to go to all outputs. Looks like that's what
> nv50 is doing today. If a shader writes to FragData[0] but not others,
> in gallium that would mean that output still gets replicated to all
> outputs, but since the spec says unwritten outputs are undefined that
> would be just fine (for OpenGL - not sure about other APIs).
> 2) Use some explicit means to distinguish FragData[] from FragColor in
> gallium. For instance, could use different semantic name (like
> TGSI_SEMANTIC_COLOR and TGSI_SEMANTIC_GENERIC for the respective
> outputs). Or could have a flag somewhere (not quite sure where) saying
> if color output is to be replicated to all buffers.
> 3) Translate away the single color output in state tracker to multiple
> outputs.
>
> I don't like option 3) though. Means we need to recompile if the
> attached buffers change. Moreover, it seems both new nvidia and AMD
> chips (r600 has MULTIWRITE_ENABLE bit) handle this just fine in hw.
> I don't like option 1) either, that kind of implicit behavior might be
> ok but this kind of guesswork isn't very nice imho.

Whatever's easiest, just document it. I'd be cool with:

DCL IN[0], COLOR, PERSPECTIVE
DCL OUT[0], COLOR
MOV OUT[0], IN[0]
END

effectively being a write to all color buffers. However, this one from
progs/tests/drawbuffers:

DCL IN[0], COLOR, LINEAR
DCL OUT[0], COLOR
DCL OUT[1], COLOR[1]
IMM FLT32 { 1.0000, 0.0000, 0.0000, 0.0000 }
0: MOV OUT[0], IN[0]
1: SUB OUT[1], IMM[0].xxxx, IN[0]
2: END

would then double-write the second color buffer. Unpleasant. Language
like this would work, I suppose?

"""
If only one color output is declared, writes to the color output shall
be redirected to all bound color buffers. Otherwise, color outputs
shall be bound to their specific color buffer.
"""

~ C.
-- When the facts change, I change my mind. What do you do, sir? ~ Keynes Corbin Simpson <Mos...@gm...> |
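[Editorial sketch, not part of the original thread.] The wording proposed above can be read as a small routing function: one declared color output fans out to every bound buffer, otherwise output N goes only to buffer N. The C below is a minimal illustration under that reading. The function name, its parameters and the 32-buffer limit are assumptions made for this example and are not Gallium API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not Mesa/Gallium API.
 *
 * num_declared: how many color outputs the fragment shader declares (>= 1)
 * out_index:    which declared color output is being written
 * num_bound:    how many color buffers are currently bound
 *
 * Returns a bitmask with bit N set if color buffer N receives the write. */
static uint32_t
color_output_targets(unsigned num_declared, unsigned out_index,
                     unsigned num_bound)
{
    uint32_t bound_mask;

    assert(num_declared >= 1);

    /* Build a mask of all bound buffers, avoiding a shift by 32. */
    bound_mask = (num_bound >= 32) ? 0xffffffffu : ((1u << num_bound) - 1u);

    /* "If only one color output is declared, writes to the color output
     * shall be redirected to all bound color buffers." */
    if (num_declared == 1 && out_index == 0)
        return bound_mask;

    /* "Otherwise, color outputs shall be bound to their specific color
     * buffer." */
    if (out_index < num_bound && out_index < 32)
        return 1u << out_index;

    /* No buffer is bound for this output, so the write goes nowhere. */
    return 0;
}
```

With four buffers bound, the single-output shader from the first example would get mask 0xf for OUT[0], while the two-output drawbuffers shader would get 0x1 for OUT[0] and 0x2 for OUT[1], so nothing is written twice.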
From: Corbin S. <mos...@gm...> - 2010-04-13 17:58:04
|
On Mon, Apr 12, 2010 at 9:13 PM, nitesh suthar <nit...@gm...> wrote:
> hello all,
> thank you for reply
> Actually I am working on arm board which support OpenGLES 1.1 and OpenGLES
> 2.0.
> which provides hardware acceleration for rendering images.
> What my objective is that I want to implement this OpenGLES 2.0 into mesa
> for interactive graphics with hardware acceleration to use Mesa in arm board
> with this supported OpenGLES libraries.
> how could I proceed for implement OpenGLES 2.0 in Mesa ?

GL and GLES are programming APIs, not hardware APIs. In order to support
your hardware, you will have to write a backend driver for Mesa which
interfaces with that hardware.

We're not all embedded developers, but some of us might be familiar with
your hardware. Which chipset and board is this?

~ C.

--
When the facts change, I change my mind. What do you do, sir? ~ Keynes

Corbin Simpson
<Mos...@gm...> |
From: Roland S. <sr...@vm...> - 2010-04-13 13:43:09
|
On 13.04.2010 02:52, Dave Airlie wrote:
> On Tue, Apr 6, 2010 at 2:00 AM, Brian Paul <br...@vm...> wrote:
>> Dave Airlie wrote:
>>> Just going down the r300g piglit failures and noticed fbo-drawbuffers
>>> failed, I've no idea
>>> if this passes on Intel hw, but it appears the texenvprogram really
>>> needs to understand the
>>> draw buffers. The attached patch fixes it here for me on r300g anyone
>>> want to test this on Intel
>>> with the piglit test before/after?
>> The piglit test passes as-is with Mesa/swrast and NVIDIA.
>>
>> It fails with gallium/softpipe both with and w/out your patch.
>>
>> I think that your patch is on the right track. But multiple render targets
>> are still a bit of an untested area in the st/mesa code.
>>
>> One thing: the patch introduces a dependency on buffer state in the
>> texenvprogram code so in state.c we should check for the _NEW_BUFFERS flag.
>>
>> Otherwise, I'd like to debug the softpipe failure a bit further to see
>> what's going on. Perhaps you could hold off on committing this for a bit...
>
> Well Eric pointed out to me the fun line in the spec
>
> (3) Should gl_FragColor be aliased to gl_FragData[0]?
>
> RESOLUTION: No. A shader should write either gl_FragColor, or
> gl_FragData[n], but not both.
>
> Writing to gl_FragColor will write to all draw buffers specified
> with DrawBuffersARB.
>
> So I was really just masking the issue with this. From what I can see
> softpipe messes up and I'm not sure where we should be fixing this.
> swrast does okay, its just whether we should be doing something in gallium
> or in the drivers is open.

Hmm yes looks like that's not really well defined. I guess there are
several options here:
1) don't do anything at the state tracker level, and assume that if a
fragment shader only writes to color 0 but has several color buffers
bound the color is meant to go to all outputs. Looks like that's what
nv50 is doing today. If a shader writes to FragData[0] but not others,
in gallium that would mean that output still gets replicated to all
outputs, but since the spec says unwritten outputs are undefined that
would be just fine (for OpenGL - not sure about other APIs).
2) Use some explicit means to distinguish FragData[] from FragColor in
gallium. For instance, could use different semantic name (like
TGSI_SEMANTIC_COLOR and TGSI_SEMANTIC_GENERIC for the respective
outputs). Or could have a flag somewhere (not quite sure where) saying
if color output is to be replicated to all buffers.
3) Translate away the single color output in state tracker to multiple
outputs.

I don't like option 3) though. Means we need to recompile if the
attached buffers change. Moreover, it seems both new nvidia and AMD
chips (r600 has MULTIWRITE_ENABLE bit) handle this just fine in hw.
I don't like option 1) either, that kind of implicit behavior might be
ok but this kind of guesswork isn't very nice imho.

Opinions?

Roland |
From: Luc V. <li...@sk...> - 2010-04-13 11:37:21
|
On Sat, Apr 10, 2010 at 04:44:09PM +0100, Keith Whitwell wrote: > I haven't been following this very closely, so apologies if I'm going > over established ground. > > This patch appears to create new libraries from some subset of Mesa's > internals. At a guess you're selecting some internal interface(s) > within mesa to become the public API of those libraries? I'm pretty > sure that's not something we're trying to achieve in this project. Why is this not something that you're trying to achieve in this project? It seems like simply good software development practice. > We have public APIs which we respect and implement, and which are > maintained and evolved under the Khronos umbrella. > > We have a strong driver-level interface (ie gallium) which we benefit > greatly from being rev as fast and as hard as we can, to adapt to our > evolving understanding of the problem space. We really don't want to > subject that interface to the strictures that come with becoming a > public API or ABI. So you just do not want to provide the free software world with the ability to just update drivers, and instead want everyone to keep updating most of their installations when they encounter a bug? There is a need to have integrated driver stacks from all sides. The only involved party that is against this is some of the developers. Hardware vendors, distribution vendors, and all factions of users that you can name, they all have an explicit need on being able to update drivers without having to update the rest of their often extensively tested and trusted software. This is not a desire, like wanting windows/console like performance, and a really bling desktop. It is a tangible and clearly identifiable need. Without this ability to update graphics driver stacks without having to update the whole of the installation, we can rest assured that the free software desktop will never happen. 
You claim that you have a strong interface with gallium, but in the same sentence, you claim that you have a need to completely change that interface all the time. These two things don't really go together. By exporting this interface, you do not kill the ability to throw the interface around completely. You just cannot do so gratuitously anymore; you have to do some versioning and you have to updating what, in the end, will be external drivers. This is just more hassle over the current mode of working, and not exactly an unovercomable hurdle. The reward for that will be that driver developers can get their users to test updated drivers quickly and easily (as long as those updates do not depend on infrastructure changes), which vastly increases the likelyhood that these users get a good experience with their linux desktop in the near future. But having said that, going from one end of the scale to the completely opposite is impossible, and my patches are taking the first careful steps, showing the way forward, and are not forcing anything on the mesa developers. > We have a third interface, the Mesa driver interface, which may appear > relatively stable, but which is more accurately described as > neglected, veering towards deprecated. The shortcomings of this > interface, in particular its porous nature and inappropriate > abstraction level, were major catalysts provoking the development of > gallium. Those shortcomings would also make it less than appropriate > as a public API. At some point in the future, that interface will > either roar back into rapid evolution or truly be deprecated in favour > of either gallium or a low-level, legacy-free GL3/4 subset from > Khronos. > > I can't easily tell from your patch what interfaces these new > libraries have, or what expectations there would be for maintaining > compatibility on those interfaces going forward. I'd like to see that > spelled out explicitly. 
Without investigating further, I can say I'm > not aware of any interface within Mesa which we'd be happy to promote > to a public interface, nor one which we'd be happy to accept extra > restrictions on in terms of compatibility. The fact that the driver interface for mesa/dri is a mess, only aids my point here. If this SDK had been created earlier (as in, way before mesa 7.5, when gallium was pulled into master), then the mesa/dri interface would be much less of a mess today. The fact that you consider the mesa/dri mostly dead to me just means that this SDK is going to be rather stable and will not require much in the way of maintenance anymore after this. With the drivers and the SDKs i have created in my personaly git repos, I have gone all the way back to 7.0.3. I lost count on how many different versions there are with SDK, but i think somewhere between 15 and 20 currently. The changes to the SDK are all pretty managable, and i more often have had to spend time adjusting for build system changes and mesa internal dependencies than anything else. This quite effectively proves the long term feasibility of this SDK. Why do you like to see these "new" interfaces spelled out completely? You said yourself that this was just one big mess. It is not my job to go in and clean up this mess that was created over the timeframe of more than a decade, just to get you to see a clean API that you now apparently require before this very pragmatic patch can go in. What i do here is identify those bits of the API that are required for building the dri drivers externally. This is a very first step, one that does not hamper anyones abilities to work the way they are used to. If this does not evolve anymore because, as you said, you consider mesa/dri deprecated, then so be it, then this SDK is the actual API. How does this patch limit you in anyway? 
Could it be that you are more afraid of getting a ball rolling here, of being unable to mute existing demand with "impossible" (which, tbh, i haven't heard from you personally, but have heard frequently from others), than you are of this actual patch. To conclude, you basically state here that: a) you do not want any mesa internal APIs to become slightly more public as that might make it less easy to completely throw things around. b) mesa/dri is a mess that you consider deprecated, and you do not want the world to see what sort of a mess it became over the space of more than a decade and due to a). c) mesa/dri has no API, even though the fact that there are a dozen or so drivers using this "No-API" kind of points the other way. What i want to state is: a) Everyone needs the ability to get driver updates and bugfixes easily and painlessly, and the creation of an SDK, by formalizing existing library boundaries and headers can solve that with very little pain and only build system additions. b) It doesn't matter whether mesa/dri is considered deprecated, that just makes it easier to maintain this SDK. c) It doesn't matter what the api looks like, this patch is just a very pragmatic approach where we just pick what the drivers require. We actually might learn a thing or two here which might make gallium better software in future, without forcing a completely new mode of working on anybody today. If i had done the same work for mesa/gallium there would be a lot more protest, even though there are less drivers, and the existing API is supposed to be much much clean and beautiful, which inherently makes that a lot more practical than what i have now done for mesa/dri. Luc Verhaegen. |
From: Keith W. <ke...@vm...> - 2010-04-13 11:15:40
|
On Tue, 2010-04-13 at 03:55 -0700, Luca Barbieri wrote:
> Personally I think the simplest idea for now could be to have all
> drivers support 256 indices or, in the case of r600 and svga, the
> maximum value supported by the hardware, and expose that as a cap (as
> well as another cap for the number of different semantic values
> supported at once).
> The minimum guaranteed value is set to the lowest hardware constraint,
> which would be svga with 219 indices (assuming no bcolor is used).
> If some new constraints pop up, we just lower it and change SM3 state
> trackers to check for it and fallback otherwise.

Luca,

Thanks for your patience and efforts in compiling this - I really
appreciate the effort you've put into this and the persistence to keep
coming back to it.

The patchset looks good to me at first reading, I'll dig in more deeply.

Keith |
From: Luca B. <lu...@lu...> - 2010-04-13 10:56:07
|
--- src/gallium/drivers/nvfx/nvfx_fragprog.c | 146 ++++++++++++++++++---------- src/gallium/drivers/nvfx/nvfx_shader.h | 1 + src/gallium/drivers/nvfx/nvfx_state.c | 4 + src/gallium/drivers/nvfx/nvfx_state.h | 15 +++ src/gallium/drivers/nvfx/nvfx_state_emit.c | 2 +- src/gallium/drivers/nvfx/nvfx_vertprog.c | 40 ++++++-- 6 files changed, 143 insertions(+), 65 deletions(-) diff --git a/src/gallium/drivers/nvfx/nvfx_fragprog.c b/src/gallium/drivers/nvfx/nvfx_fragprog.c index 5fa825a..b4b63e2 100644 --- a/src/gallium/drivers/nvfx/nvfx_fragprog.c +++ b/src/gallium/drivers/nvfx/nvfx_fragprog.c @@ -1,6 +1,7 @@ #include "pipe/p_context.h" #include "pipe/p_defines.h" #include "pipe/p_state.h" +#include "util/u_semantics.h" #include "util/u_inlines.h" #include "pipe/p_shader_tokens.h" @@ -16,8 +17,6 @@ struct nvfx_fpc { struct nvfx_fragment_program *fp; - uint attrib_map[PIPE_MAX_SHADER_INPUTS]; - unsigned r_temps; unsigned r_temps_discard; struct nvfx_sreg r_result[PIPE_MAX_SHADER_OUTPUTS]; @@ -36,6 +35,8 @@ struct nvfx_fpc { struct nvfx_sreg imm[MAX_IMM]; unsigned nr_imm; + + unsigned char sem_table[256]; /* semantic idx for each input semantic */ }; static INLINE struct nvfx_sreg @@ -111,6 +112,11 @@ emit_src(struct nvfx_fpc *fpc, int pos, struct nvfx_sreg src) sr |= (NVFX_FP_REG_TYPE_TEMP << NVFX_FP_REG_TYPE_SHIFT); sr |= (src.index << NVFX_FP_REG_SRC_SHIFT); break; + case NVFXSR_RELOCATED: + sr |= (NVFX_FP_REG_TYPE_INPUT << NVFX_FP_REG_TYPE_SHIFT); + printf("adding relocation at %x for %x\n", fpc->inst_offset, src.index); + util_dynarray_append(&fpc->fp->sem_relocs[src.index], unsigned, fpc->inst_offset); + break; case NVFXSR_CONST: if (!fpc->have_const) { grow_insns(fpc, 4); @@ -241,8 +247,28 @@ tgsi_src(struct nvfx_fpc *fpc, const struct tgsi_full_src_register *fsrc) switch (fsrc->Register.File) { case TGSI_FILE_INPUT: - src = nvfx_sr(NVFXSR_INPUT, - fpc->attrib_map[fsrc->Register.Index]); + if(fpc->fp->info.input_semantic_name[fsrc->Register.Index] == 
TGSI_SEMANTIC_POSITION) { + assert(fpc->fp->info.input_semantic_index[fsrc->Register.Index] == 0); + src = nvfx_sr(NVFXSR_INPUT, NVFX_FP_OP_INPUT_SRC_POSITION); + } else if(fpc->fp->info.input_semantic_name[fsrc->Register.Index] == TGSI_SEMANTIC_COLOR) { + if(fpc->fp->info.input_semantic_index[fsrc->Register.Index] == 0) + src = nvfx_sr(NVFXSR_INPUT, NVFX_FP_OP_INPUT_SRC_COL0); + else if(fpc->fp->info.input_semantic_index[fsrc->Register.Index] == 1) + src = nvfx_sr(NVFXSR_INPUT, NVFX_FP_OP_INPUT_SRC_COL1); + else + assert(0); + } else if(fpc->fp->info.input_semantic_name[fsrc->Register.Index] == TGSI_SEMANTIC_FOG) { + assert(fpc->fp->info.input_semantic_index[fsrc->Register.Index] == 0); + src = nvfx_sr(NVFXSR_INPUT, NVFX_FP_OP_INPUT_SRC_FOGC); + } else if(fpc->fp->info.input_semantic_name[fsrc->Register.Index] == TGSI_SEMANTIC_FACE) { + /* TODO: check this has the correct values */ + /* XXX: what do we do for nv30 here (assuming it lacks facing)?! */ + assert(fpc->fp->info.input_semantic_index[fsrc->Register.Index] == 0); + src = nvfx_sr(NVFXSR_INPUT, NV40_FP_OP_INPUT_SRC_FACING); + } else { + assert(fpc->fp->info.input_semantic_name[fsrc->Register.Index] == TGSI_SEMANTIC_GENERIC); + src = nvfx_sr(NVFXSR_RELOCATED, fpc->sem_table[fpc->fp->info.input_semantic_index[fsrc->Register.Index]]); + } break; case TGSI_FILE_CONSTANT: src = constant(fpc, fsrc->Register.Index, NULL); @@ -611,48 +637,6 @@ nvfx_fragprog_parse_instruction(struct nvfx_context* nvfx, struct nvfx_fpc *fpc, } static boolean -nvfx_fragprog_parse_decl_attrib(struct nvfx_context* nvfx, struct nvfx_fpc *fpc, - const struct tgsi_full_declaration *fdec) -{ - int hw; - - switch (fdec->Semantic.Name) { - case TGSI_SEMANTIC_POSITION: - hw = NVFX_FP_OP_INPUT_SRC_POSITION; - break; - case TGSI_SEMANTIC_COLOR: - if (fdec->Semantic.Index == 0) { - hw = NVFX_FP_OP_INPUT_SRC_COL0; - } else - if (fdec->Semantic.Index == 1) { - hw = NVFX_FP_OP_INPUT_SRC_COL1; - } else { - NOUVEAU_ERR("bad colour semantic index\n"); 
- return FALSE; - } - break; - case TGSI_SEMANTIC_FOG: - hw = NVFX_FP_OP_INPUT_SRC_FOGC; - break; - case TGSI_SEMANTIC_GENERIC: - if (fdec->Semantic.Index <= 7) { - hw = NVFX_FP_OP_INPUT_SRC_TC(fdec->Semantic. - Index); - } else { - NOUVEAU_ERR("bad generic semantic index\n"); - return FALSE; - } - break; - default: - NOUVEAU_ERR("bad input semantic\n"); - return FALSE; - } - - fpc->attrib_map[fdec->Range.First] = hw; - return TRUE; -} - -static boolean nvfx_fragprog_parse_decl_output(struct nvfx_context* nvfx, struct nvfx_fpc *fpc, const struct tgsi_full_declaration *fdec) { @@ -691,6 +675,15 @@ nvfx_fragprog_prepare(struct nvfx_context* nvfx, struct nvfx_fpc *fpc) { struct tgsi_parse_context p; int high_temp = -1, i; + struct util_semantic_set set; + + fpc->fp->num_semantics = util_semantic_set_from_program_file(&set, fpc->fp->pipe.tokens, TGSI_FILE_INPUT); + if(fpc->fp->num_semantics > 8) + return FALSE; + util_semantic_layout_from_set(fpc->fp->semantics, &set, 0, 8); + util_semantic_table_from_layout(fpc->sem_table, fpc->fp->semantics, 0, 8); + + memset(fpc->fp->cur_slots, 0xff, sizeof(fpc->fp->cur_slots)); tgsi_parse_init(&p, fpc->fp->pipe.tokens); while (!tgsi_parse_end_of_tokens(&p)) { @@ -703,10 +696,6 @@ nvfx_fragprog_prepare(struct nvfx_context* nvfx, struct nvfx_fpc *fpc) const struct tgsi_full_declaration *fdec; fdec = &p.FullToken.FullDeclaration; switch (fdec->Declaration.File) { - case TGSI_FILE_INPUT: - if (!nvfx_fragprog_parse_decl_attrib(nvfx, fpc, fdec)) - goto out_err; - break; case TGSI_FILE_OUTPUT: if (!nvfx_fragprog_parse_decl_output(nvfx, fpc, fdec)) goto out_err; @@ -878,6 +867,31 @@ nvfx_fragprog_validate(struct nvfx_context *nvfx) if (nvfx->dirty & NVFX_NEW_FRAGCONST) update = TRUE; + struct nvfx_vertex_program* vp = nvfx->render_mode == HW ? 
nvfx->vertprog : nvfx->swtnl.vertprog; + if (fp->last_vp_id != vp->id) { + char* vp_sem_table = vp->sem_table; + unsigned char* fp_semantics = fp->semantics; + unsigned diff = 0; + fp->last_vp_id = nvfx->vertprog->id; + unsigned char* cur_slots = fp->cur_slots; + for(unsigned i = 0; i < fp->num_semantics; ++i) { + unsigned char slot_mask = vp_sem_table[fp_semantics[i]]; + diff |= (slot_mask >> 4) & (slot_mask ^ cur_slots[i]); + } + + if(diff) + { + fp->cur_slots_progs_left = fp->progs; + for(unsigned i = 0; i < fp->num_semantics; ++i) { + /* if 0xff, then this will write to the dummy value at fp->last_layout_mask[0] */ + fp->cur_slots[i] = vp_sem_table[fp_semantics[i]] & 0xf; + printf("fp: GENERIC[%i] from fpreg %i\n", fp_semantics[i], fp->cur_slots[i]); + } + + update = TRUE; + } + } + if(update) { ++fp->bo_prog_idx; if(fp->bo_prog_idx >= fp->progs_per_bo) @@ -888,7 +902,9 @@ nvfx_fragprog_validate(struct nvfx_context *nvfx) } else { - struct nvfx_fragment_program_bo* fpbo = os_malloc_aligned(sizeof(struct nvfx_fragment_program) + fp->prog_size * fp->progs_per_bo, 16); + struct nvfx_fragment_program_bo* fpbo = os_malloc_aligned(sizeof(struct nvfx_fragment_program) + (fp->prog_size + 8) * fp->progs_per_bo, 16); + fpbo->slots = &fpbo->insn[(fp->prog_size) * fp->progs_per_bo]; + memset(fpbo->slots, 0, 8 * fp->progs_per_bo); if(fp->fpbo) { fpbo->next = fp->fpbo->next; @@ -898,6 +914,8 @@ nvfx_fragprog_validate(struct nvfx_context *nvfx) fpbo->next = fpbo; fp->fpbo = fpbo; fpbo->bo = 0; + fp->progs += fp->progs_per_bo; + fp->cur_slots_progs_left += fp->progs_per_bo; nouveau_bo_new(nvfx->screen->base.device, NOUVEAU_BO_VRAM | NOUVEAU_BO_MAP, 64, fp->prog_size * fp->progs_per_bo, &fpbo->bo); nouveau_bo_map(fpbo->bo, NOUVEAU_BO_NOSYNC); @@ -915,6 +933,7 @@ nvfx_fragprog_validate(struct nvfx_context *nvfx) } int offset = fp->bo_prog_idx * fp->prog_size; + uint32_t* fpmap = (uint32_t*)((char*)fp->fpbo->bo->map + offset); if(nvfx->constbuf[PIPE_SHADER_FRAGMENT]) { struct 
pipe_resource* constbuf = nvfx->constbuf[PIPE_SHADER_FRAGMENT]; @@ -922,7 +941,6 @@ nvfx_fragprog_validate(struct nvfx_context *nvfx) struct pipe_transfer* transfer; // TODO: does this check make any sense, or should we do this unconditionally? uint32_t* map = pipe_buffer_map(&nvfx->pipe, constbuf, PIPE_TRANSFER_READ, &transfer); - uint32_t* fpmap = (uint32_t*)((char*)fp->fpbo->bo->map + offset); uint32_t* buf = (uint32_t*)((char*)fp->fpbo->insn + offset); for (i = 0; i < fp->nr_consts; ++i) { unsigned off = fp->consts[i].offset; @@ -936,6 +954,25 @@ nvfx_fragprog_validate(struct nvfx_context *nvfx) } pipe_buffer_unmap(&nvfx->pipe, constbuf, transfer); } + + if(fp->cur_slots_progs_left) { + unsigned char* fpbo_slots = &fp->fpbo->slots[fp->bo_prog_idx * 8]; + for(unsigned i = 0; i < fp->num_semantics; ++i) { + unsigned value = fp->cur_slots[i];; + if(value != fpbo_slots[i]) { + unsigned* p = (unsigned*)fp->sem_relocs[i].data; + unsigned* pend = (unsigned*)((char*)fp->sem_relocs[i].data + fp->sem_relocs[i].size); + for(; p != pend; ++p) { + unsigned off = *p; + unsigned dw = fp->insn[off]; + dw = (dw & ~NVFX_FP_OP_INPUT_SRC_MASK) | (value << NVFX_FP_OP_INPUT_SRC_SHIFT); + nvfx_fp_memcpy(&fpmap[*p], &dw, sizeof(dw)); + } + fpbo_slots[i] = value; + } + } + --fp->cur_slots_progs_left; + } } if(update || (nvfx->dirty & NVFX_NEW_FRAGPROG)) { @@ -977,6 +1014,7 @@ void nvfx_fragprog_destroy(struct nvfx_context *nvfx, struct nvfx_fragment_program *fp) { + unsigned i; struct nvfx_fragment_program_bo* fpbo = fp->fpbo; if(fpbo) { @@ -991,7 +1029,9 @@ nvfx_fragprog_destroy(struct nvfx_context *nvfx, while(fpbo != fp->fpbo); } + for(i = 0; i < 8; ++i) + util_dynarray_fini(&fp->sem_relocs[i]); + if (fp->insn_len) FREE(fp->insn); } - diff --git a/src/gallium/drivers/nvfx/nvfx_shader.h b/src/gallium/drivers/nvfx/nvfx_shader.h index 50830b3..88cf91b 100644 --- a/src/gallium/drivers/nvfx/nvfx_shader.h +++ b/src/gallium/drivers/nvfx/nvfx_shader.h @@ -323,6 +323,7 @@ #define 
NVFXSR_INPUT 2 #define NVFXSR_TEMP 3 #define NVFXSR_CONST 4 +#define NVFXSR_RELOCATED 5 #define NVFX_COND_FL 0 #define NVFX_COND_LT 1 diff --git a/src/gallium/drivers/nvfx/nvfx_state.c b/src/gallium/drivers/nvfx/nvfx_state.c index 315de49..3f0c8e6 100644 --- a/src/gallium/drivers/nvfx/nvfx_state.c +++ b/src/gallium/drivers/nvfx/nvfx_state.c @@ -411,9 +411,13 @@ nvfx_vp_state_create(struct pipe_context *pipe, struct nvfx_context *nvfx = nvfx_context(pipe); struct nvfx_vertex_program *vp; + // TODO: use a 64-bit atomic here! + static unsigned long long id = 0; + vp = CALLOC(1, sizeof(struct nvfx_vertex_program)); vp->pipe.tokens = tgsi_dup_tokens(cso->tokens); vp->draw = draw_create_vertex_shader(nvfx->draw, &vp->pipe); + vp->id = ++id; return (void *)vp; } diff --git a/src/gallium/drivers/nvfx/nvfx_state.h b/src/gallium/drivers/nvfx/nvfx_state.h index 9ceb257..3cd7981 100644 --- a/src/gallium/drivers/nvfx/nvfx_state.h +++ b/src/gallium/drivers/nvfx/nvfx_state.h @@ -4,6 +4,8 @@ #include "pipe/p_state.h" #include "tgsi/tgsi_scan.h" #include "nouveau/nouveau_statebuf.h" +#include "util/u_dynarray.h" +#include "util/u_linkage.h" struct nvfx_vertex_program_exec { uint32_t data[4]; @@ -18,6 +20,7 @@ struct nvfx_vertex_program_data { struct nvfx_vertex_program { struct pipe_shader_state pipe; + unsigned long long id; struct draw_vertex_shader *draw; @@ -30,6 +33,8 @@ struct nvfx_vertex_program { struct nvfx_vertex_program_data *consts; unsigned nr_consts; + char sem_table[256]; + struct nouveau_resource *exec; unsigned exec_start; struct nouveau_resource *data; @@ -49,6 +54,7 @@ struct nvfx_fragment_program_data { struct nvfx_fragment_program_bo { struct nvfx_fragment_program_bo* next; struct nouveau_bo* bo; + unsigned char* slots; char insn[] __attribute__((aligned(16))); }; @@ -65,11 +71,20 @@ struct nvfx_fragment_program { struct nvfx_fragment_program_data *consts; unsigned nr_consts; + unsigned num_semantics; /* how many input semantics? 
*/ + unsigned char semantics[8]; /* semantics */ + unsigned char cur_slots[8]; /* current assignment of slots for each used semantic */ + unsigned cur_slots_progs_left; + unsigned long long last_vp_id; + struct util_dynarray sem_relocs[8]; /* semantic relocation offset */ + uint32_t fp_control; unsigned bo_prog_idx; unsigned prog_size; unsigned progs_per_bo; + unsigned progs; + struct nvfx_fragment_program_bo* fpbo; }; diff --git a/src/gallium/drivers/nvfx/nvfx_state_emit.c b/src/gallium/drivers/nvfx/nvfx_state_emit.c index 4137849..1398597 100644 --- a/src/gallium/drivers/nvfx/nvfx_state_emit.c +++ b/src/gallium/drivers/nvfx/nvfx_state_emit.c @@ -47,7 +47,7 @@ nvfx_state_validate_common(struct nvfx_context *nvfx) if(dirty & NVFX_NEW_STIPPLE) nvfx_state_stipple_validate(nvfx); - if(dirty & (NVFX_NEW_FRAGPROG | NVFX_NEW_FRAGCONST)) + if(dirty & (NVFX_NEW_FRAGPROG | NVFX_NEW_FRAGCONST | NVFX_NEW_VERTPROG)) nvfx_fragprog_validate(nvfx); if(dirty & NVFX_NEW_SAMPLER) diff --git a/src/gallium/drivers/nvfx/nvfx_vertprog.c b/src/gallium/drivers/nvfx/nvfx_vertprog.c index b405fd9..4241e73 100644 --- a/src/gallium/drivers/nvfx/nvfx_vertprog.c +++ b/src/gallium/drivers/nvfx/nvfx_vertprog.c @@ -1,7 +1,8 @@ #include "pipe/p_context.h" #include "pipe/p_defines.h" #include "pipe/p_state.h" -#include "util/u_inlines.h" +#include "util/u_semantics.h" +#include "util/u_linkage.h" #include "pipe/p_shader_tokens.h" #include "tgsi/tgsi_parse.h" @@ -60,7 +61,7 @@ temp(struct nvfx_vpc *vpc) return nvfx_sr(NVFXSR_TEMP, idx); } -static INLINE void +static inline void release_temps(struct nvfx_vpc *vpc) { vpc->r_temps &= ~vpc->r_temps_discard; @@ -332,7 +333,7 @@ nvfx_vp_arith(struct nvfx_context* nvfx, struct nvfx_vpc *vpc, int slot, int op, emit_src(nvfx, vpc, hw, 2, s2); } -static INLINE struct nvfx_sreg +static inline struct nvfx_sreg tgsi_src(struct nvfx_vpc *vpc, const struct tgsi_full_src_register *fsrc) { struct nvfx_sreg src; @@ -378,14 +379,14 @@ tgsi_dst(struct nvfx_vpc *vpc, 
const struct tgsi_full_dst_register *fdst) { dst = vpc->r_address[fdst->Register.Index]; break; default: - NOUVEAU_ERR("bad dst file\n"); + NOUVEAU_ERR("bad dst file %i\n", fdst->Register.File); break; } return dst; } -static INLINE int +static inline int tgsi_mask(uint tgsi) { int mask = 0; @@ -643,12 +644,8 @@ nvfx_vertprog_parse_decl_output(struct nvfx_context* nvfx, struct nvfx_vpc *vpc, hw = NVFX_VP(INST_DEST_PSZ); break; case TGSI_SEMANTIC_GENERIC: - if (fdec->Semantic.Index <= 7) { - hw = NVFX_VP(INST_DEST_TC(fdec->Semantic.Index)); - } else { - NOUVEAU_ERR("bad generic semantic index\n"); - return FALSE; - } + hw = (vpc->vp->sem_table[fdec->Semantic.Index] & 0xf) + + NVFX_VP(INST_DEST_TC(0)) - NVFX_FP_OP_INPUT_SRC_TC(0); break; case TGSI_SEMANTIC_EDGEFLAG: /* not really an error just a fallback */ @@ -668,6 +665,27 @@ nvfx_vertprog_prepare(struct nvfx_context* nvfx, struct nvfx_vpc *vpc) { struct tgsi_parse_context p; int high_temp = -1, high_addr = -1, nr_imm = 0, i; + struct util_semantic_set set; + unsigned char sem_layout[8]; + unsigned sem_layout_size; + unsigned num_outputs; + + num_outputs = util_semantic_set_from_program_file(&set, vpc->vp->pipe.tokens, TGSI_FILE_OUTPUT); + + if(num_outputs > 8) { + NOUVEAU_ERR("too many vertex program outputs: %i\n", num_outputs); + return FALSE; + } + util_semantic_layout_from_set(sem_layout, &set, 8, 8); + + /* hope 0xf is (0, 0, 0, 1) initialized; otherwise, we are _probably_ not required to do this */ + memset(vpc->vp->sem_table, 0x0f, sizeof(vpc->vp->sem_table)); + for(int i = 0; i < 8; ++i) { + if(sem_layout[i] == 0xff) + continue; + printf("vp: GENERIC[%i] to fpreg %i\n", sem_layout[i], NVFX_FP_OP_INPUT_SRC_TC(0) + i); + vpc->vp->sem_table[sem_layout[i]] = 0xf0 | (NVFX_FP_OP_INPUT_SRC_TC(0) + i); + } tgsi_parse_init(&p, vpc->vp->pipe.tokens); while (!tgsi_parse_end_of_tokens(&p)) { -- 1.7.0.1.147.g6d84b |
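[Editorial sketch, not part of the original patch.] The fragment program changes above record, for each generic input semantic, the instruction offsets that read it, and later rewrite the input-source field of those instruction words once the vertex program's slot assignment is known. The C below shows only that masked read-modify-write step in isolation. The `EX_` shift and mask values are placeholders invented for the example; the real `NVFX_FP_OP_INPUT_SRC_SHIFT` and `NVFX_FP_OP_INPUT_SRC_MASK` come from the driver headers and are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder field layout, assumed for illustration only. */
#define EX_INPUT_SRC_SHIFT 13
#define EX_INPUT_SRC_MASK  (0xfu << EX_INPUT_SRC_SHIFT)

/* Replace the input-source field of one instruction word with `slot`,
 * leaving every other bit untouched. */
static uint32_t
ex_patch_input_src(uint32_t dw, unsigned slot)
{
    return (dw & ~EX_INPUT_SRC_MASK) |
           (((uint32_t)slot << EX_INPUT_SRC_SHIFT) & EX_INPUT_SRC_MASK);
}

/* Apply one slot value to every recorded relocation.
 * relocs[] holds indices into insn[], one per instruction word that reads
 * the semantic being relocated. */
static void
ex_apply_relocs(uint32_t *insn, const unsigned *relocs, size_t num_relocs,
                unsigned slot)
{
    size_t i;

    assert(insn != NULL && relocs != NULL);

    for (i = 0; i < num_relocs; ++i)
        insn[relocs[i]] = ex_patch_input_src(insn[relocs[i]], slot);
}
```

Keeping the relocation offsets per semantic means a change of vertex program only touches the words that actually read a moved input, instead of recompiling the whole fragment program.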
From: Luca B. <lu...@lu...> - 2010-04-13 10:56:06
|
---
 src/gallium/auxiliary/Makefile         |    1 +
 src/gallium/auxiliary/util/u_linkage.c |  119 ++++++++++++++++++++++++++++++++
 src/gallium/auxiliary/util/u_linkage.h |   38 ++++++++++
 3 files changed, 158 insertions(+), 0 deletions(-)
 create mode 100644 src/gallium/auxiliary/util/u_linkage.c
 create mode 100644 src/gallium/auxiliary/util/u_linkage.h

diff --git a/src/gallium/auxiliary/Makefile b/src/gallium/auxiliary/Makefile
index c4d6b52..44c2f8b 100644
--- a/src/gallium/auxiliary/Makefile
+++ b/src/gallium/auxiliary/Makefile
@@ -120,6 +120,7 @@ C_SOURCES = \
 	util/u_hash.c \
 	util/u_keymap.c \
 	util/u_linear.c \
+	util/u_linkage.c \
 	util/u_network.c \
 	util/u_math.c \
 	util/u_mm.c \
diff --git a/src/gallium/auxiliary/util/u_linkage.c b/src/gallium/auxiliary/util/u_linkage.c
new file mode 100644
index 0000000..8a76378
--- /dev/null
+++ b/src/gallium/auxiliary/util/u_linkage.c
@@ -0,0 +1,119 @@
+#include "util/u_debug.h"
+#include "pipe/p_shader_tokens.h"
+#include "tgsi/tgsi_parse.h"
+#include "tgsi/tgsi_scan.h"
+#include "util/u_linkage.h"
+
+/* we must only record the registers that are actually used, not just declared */
+static INLINE boolean
+util_semantic_set_test_and_set(struct util_semantic_set *set, unsigned value)
+{
+   unsigned mask = 1 << (value % (sizeof(long) * 8));
+   unsigned long *p = &set->masks[value / (sizeof(long) * 8)];
+   unsigned long v = *p & mask;
+   *p |= mask;
+   return !!v;
+}
+
+unsigned
+util_semantic_set_from_program_file(struct util_semantic_set *set, const struct tgsi_token *tokens, enum tgsi_file_type file)
+{
+   struct tgsi_shader_info info;
+   struct tgsi_parse_context parse;
+   unsigned count = 0;
+   ubyte *semantic_name;
+   ubyte *semantic_index;
+
+   tgsi_scan_shader(tokens, &info);
+
+   if(file == TGSI_FILE_INPUT)
+   {
+      semantic_name = info.input_semantic_name;
+      semantic_index = info.input_semantic_index;
+   }
+   else if(file == TGSI_FILE_OUTPUT)
+   {
+      semantic_name = info.output_semantic_name;
+      semantic_index = info.output_semantic_index;
+   }
+   else
+      assert(0);
+
+   tgsi_parse_init(&parse, tokens);
+
+   memset(set->masks, 0, sizeof(set->masks));
+   while(!tgsi_parse_end_of_tokens(&parse))
+   {
+      tgsi_parse_token(&parse);
+
+      if(parse.FullToken.Token.Type == TGSI_TOKEN_TYPE_INSTRUCTION)
+      {
+         const struct tgsi_full_instruction *finst = &parse.FullToken.FullInstruction;
+         unsigned i;
+         for(i = 0; i < finst->Instruction.NumDstRegs; ++i)
+         {
+            if(finst->Dst[i].Register.File == file)
+            {
+               unsigned idx = finst->Dst[i].Register.Index;
+               if(semantic_name[idx] == TGSI_SEMANTIC_GENERIC)
+               {
+                  if(!util_semantic_set_test_and_set(set, semantic_index[idx]))
+                     ++count;
+               }
+            }
+         }
+
+         for(i = 0; i < finst->Instruction.NumSrcRegs; ++i)
+         {
+            if(finst->Src[i].Register.File == file)
+            {
+               unsigned idx = finst->Src[i].Register.Index;
+               if(semantic_name[idx] == TGSI_SEMANTIC_GENERIC)
+               {
+                  if(!util_semantic_set_test_and_set(set, semantic_index[idx]))
+                     ++count;
+               }
+            }
+         }
+      }
+   }
+   tgsi_parse_free(&parse);
+
+   return count;
+}
+
+#define UTIL_SEMANTIC_SET_FOR_EACH(i, set) for(i = 0; i < 256; ++i) if(set->masks[i / (sizeof(long) * 8)] & (1 << (i % (sizeof(long) * 8))))
+
+void
+util_semantic_layout_from_set(unsigned char *layout, const struct util_semantic_set *set, unsigned efficient_slots, unsigned num_slots)
+{
+   int first = -1;
+   int last = -1;
+   unsigned i;
+
+   memset(layout, 0xff, num_slots);
+
+   UTIL_SEMANTIC_SET_FOR_EACH(i, set)
+   {
+      if(first < 0)
+         first = i;
+      last = i;
+   }
+
+   if(last < efficient_slots)
+   {
+      UTIL_SEMANTIC_SET_FOR_EACH(i, set)
+         layout[i] = i;
+   }
+   else if((last - first) < efficient_slots)
+   {
+      UTIL_SEMANTIC_SET_FOR_EACH(i, set)
+         layout[i - first] = i;
+   }
+   else
+   {
+      unsigned idx = 0;
+      UTIL_SEMANTIC_SET_FOR_EACH(i, set)
+         layout[idx++] = i;
+   }
+}
diff --git a/src/gallium/auxiliary/util/u_linkage.h b/src/gallium/auxiliary/util/u_linkage.h
new file mode 100644
index 0000000..e73e0fd
--- /dev/null
+++ b/src/gallium/auxiliary/util/u_linkage.h
@@ -0,0 +1,38 @@
+#ifndef U_LINKAGE_H_
+#define U_LINKAGE_H_
+
+#include "pipe/p_compiler.h"
+
+struct util_semantic_set
+{
+   unsigned long masks[256 / 8 / sizeof(unsigned long)];
+};
+
+static INLINE bool
+util_semantic_set_contains(struct util_semantic_set *set, unsigned char value)
+{
+   return !!(set->masks[value / (sizeof(long) * 8)] & (1 << (value / (sizeof(long) * 8))));
+}
+
+unsigned util_semantic_set_from_program_file(struct util_semantic_set *set, const struct tgsi_token *tokens, enum tgsi_file_type file);
+
+/* efficient_slots is the number of slots such that hardware performance is
+ * the same for using that amount, with holes, or less slots but with less
+ * holes.
+ *
+ * num_slots is the size of the layout array and hardware limit instead.
+ *
+ * efficient_slots == 0 or efficient_solts == num_slots are typical settings.
+ */
+void util_semantic_layout_from_set(unsigned char *layout, const struct util_semantic_set *set, unsigned efficient_slots, unsigned num_slots);
+
+static INLINE void
+util_semantic_table_from_layout(unsigned char *table, unsigned char *layout, unsigned char first_slot_value, unsigned char num_slots)
+{
+   memset(table, 0xff, sizeof(table));
+
+   for(int i = 0; i < num_slots; ++i)
+      table[layout[i]] = first_slot_value + i;
+}
+
+#endif /* U_LINKAGE_H_ */
-- 
1.7.0.1.147.g6d84b |
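The semantic set and layout helpers in the patch above can be tried outside of Mesa with a small standalone sketch. This is not the patch's code: the names sem_set, sem_set_test_and_set, sem_set_contains and sem_layout_from_set are hypothetical, the bit masks are built with 1UL and the bit position is taken modulo the word size (an assumption about the intended behaviour), and the layout function adds a return value and a bounds check that the patch does not have.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical standalone sketch of a 256-entry semantic index set. */
#define SEM_SET_BITS  256
#define SEM_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

struct sem_set {
   unsigned long masks[SEM_SET_BITS / (sizeof(unsigned long) * CHAR_BIT)];
};

static void sem_set_clear(struct sem_set *set)
{
   memset(set->masks, 0, sizeof(set->masks));
}

/* Sets the bit for value; returns 1 if it was already set, 0 otherwise. */
static int sem_set_test_and_set(struct sem_set *set, unsigned char value)
{
   unsigned long mask = 1UL << (value % SEM_WORD_BITS);
   unsigned long *p = &set->masks[value / SEM_WORD_BITS];
   int was_set = (*p & mask) != 0;
   *p |= mask;
   return was_set;
}

static int sem_set_contains(const struct sem_set *set, unsigned char value)
{
   return (int)((set->masks[value / SEM_WORD_BITS] >> (value % SEM_WORD_BITS)) & 1UL);
}

/* Assigns used semantic indices to slots; 0xff marks an unused slot.
 * Tries identity placement, then placement shifted by the first used
 * index, then dense packing. Returns the number of slots spanned, or
 * -1 if the set has more members than num_slots. */
static int sem_layout_from_set(unsigned char *layout, const struct sem_set *set,
                               unsigned efficient_slots, unsigned num_slots)
{
   int first = -1, last = -1;
   unsigned i, count = 0, idx = 0, base;

   memset(layout, 0xff, num_slots);
   if (efficient_slots > num_slots)
      efficient_slots = num_slots;

   for (i = 0; i < SEM_SET_BITS; ++i) {
      if (sem_set_contains(set, (unsigned char)i)) {
         if (first < 0)
            first = (int)i;
         last = (int)i;
         ++count;
      }
   }
   if (count == 0)
      return 0;
   if (count > num_slots)
      return -1;

   if ((unsigned)last < efficient_slots)
      base = 0;                       /* identity: slot == index */
   else if ((unsigned)(last - first) < efficient_slots)
      base = (unsigned)first;         /* shifted: slot == index - first */
   else {
      for (i = 0; i < SEM_SET_BITS; ++i)  /* packed: no holes */
         if (sem_set_contains(set, (unsigned char)i))
            layout[idx++] = (unsigned char)i;
      return (int)count;
   }

   for (i = 0; i < SEM_SET_BITS; ++i)
      if (sem_set_contains(set, (unsigned char)i))
         layout[i - base] = (unsigned char)i;
   return last - (int)base + 1;
}
```

With 8 slots, a set {3, 5} keeps its indices in place, {100, 102} is shifted down to slots 0 and 2, and {1, 200} is packed into slots 0 and 1. Note that 0xff doubles as the "unused" marker and as semantic index 255, an ambiguity this sketch shares with the patch.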
From: Luca B. <lu...@lu...> - 2010-04-13 10:56:04
|
---
 src/gallium/auxiliary/util/u_semantics.h |  123 ++++++++++++++++++++++++++++++
 1 files changed, 123 insertions(+), 0 deletions(-)
 create mode 100644 src/gallium/auxiliary/util/u_semantics.h

diff --git a/src/gallium/auxiliary/util/u_semantics.h b/src/gallium/auxiliary/util/u_semantics.h
new file mode 100644
index 0000000..d620619
--- /dev/null
+++ b/src/gallium/auxiliary/util/u_semantics.h
@@ -0,0 +1,123 @@
+#ifndef U_SEMANTICS_H_
+#define U_SEMANTICS_H_
+
+#include "pipe/p_compiler.h"
+#include "pipe/p_shader_tokens.h"
+
+/* same as SM3 values */
+#define TGSI_SEMANTIC_BYTE_POSITION 0
+#define TGSI_SEMANTIC_BYTE_PSIZE (4 << 4)
+#define TGSI_SEMANTIC_BYTE_COLOR0 (10 << 4)
+#define TGSI_SEMANTIC_BYTE_COLOR1 (TGSI_SEMANTIC_BYTE_COLOR0 + 1)
+#define TGSI_SEMANTIC_BYTE_FOG (11 << 4)
+#define TGSI_SEMANTIC_BYTE_BCOLOR0 (14 << 4)
+#define TGSI_SEMANTIC_BYTE_BCOLOR1 (TGSI_SEMANTIC_BYTE_BCOLOR0 + 1)
+#define TGSI_SEMANTIC_BYTE_TGSI (15 << 4)
+
+static INLINE unsigned char
+pipe_semantic_to_byte(unsigned name, unsigned index)
+{
+   switch (name)
+   {
+   case TGSI_SEMANTIC_POSITION:
+      return TGSI_SEMANTIC_BYTE_POSITION;
+   case TGSI_SEMANTIC_PSIZE:
+      return TGSI_SEMANTIC_BYTE_PSIZE;
+   case TGSI_SEMANTIC_FOG:
+      return TGSI_SEMANTIC_BYTE_FOG;
+   case TGSI_SEMANTIC_COLOR:
+      return TGSI_SEMANTIC_BYTE_COLOR0 + index;
+   case TGSI_SEMANTIC_GENERIC:
+      ++index;
+      if(index >= TGSI_SEMANTIC_BYTE_PSIZE)
+      {
+         ++index;
+         if(index >= TGSI_SEMANTIC_BYTE_COLOR0)
+         {
+            index += 2;
+            if(index >= TGSI_SEMANTIC_BYTE_FOG)
+               ++index;
+         }
+      }
+      return index;
+   case TGSI_SEMANTIC_BCOLOR:
+      return TGSI_SEMANTIC_BYTE_BCOLOR0 + index;
+   default:
+      return TGSI_SEMANTIC_BYTE_TGSI + name;
+   }
+}
+
+/* this fits BCOLOR in the SM3 range, but is not reversible */
+static INLINE unsigned char
+pipe_semantic_to_byte_sm3(unsigned name, unsigned index)
+{
+   if(name == TGSI_SEMANTIC_BCOLOR)
+      return TGSI_SEMANTIC_BYTE_BCOLOR0 - 1 - index;
+   return pipe_semantic_to_byte(name, index);
+}
+
+static INLINE unsigned
+pipe_semantic_name_from_byte(unsigned char value)
+{
+   switch (value)
+   {
+   case TGSI_SEMANTIC_BYTE_POSITION:
+      return TGSI_SEMANTIC_POSITION;
+   case TGSI_SEMANTIC_BYTE_PSIZE:
+      return TGSI_SEMANTIC_PSIZE;
+   case TGSI_SEMANTIC_BYTE_FOG:
+      return TGSI_SEMANTIC_FOG;
+   case TGSI_SEMANTIC_BYTE_COLOR0:
+   case TGSI_SEMANTIC_BYTE_COLOR1:
+      return TGSI_SEMANTIC_COLOR;
+   case TGSI_SEMANTIC_BYTE_BCOLOR0:
+   case TGSI_SEMANTIC_BYTE_BCOLOR1:
+      return TGSI_SEMANTIC_BCOLOR;
+   default:
+      if(value < TGSI_SEMANTIC_BYTE_TGSI)
+         return TGSI_SEMANTIC_GENERIC;
+      else
+         return value - TGSI_SEMANTIC_BYTE_TGSI;
+   }
+}
+
+static INLINE unsigned
+pipe_semantic_index_from_byte(unsigned char value)
+{
+   if(value == TGSI_SEMANTIC_BYTE_POSITION)
+      return 0;
+
+   if(value <= TGSI_SEMANTIC_BYTE_PSIZE)
+   {
+      if(value < TGSI_SEMANTIC_BYTE_PSIZE)
+         return value - 1;
+      else
+         return 0;
+   }
+
+   if(value < (TGSI_SEMANTIC_BYTE_COLOR0 + 2))
+   {
+      if(value < TGSI_SEMANTIC_BYTE_COLOR0)
+         return value - 2;
+      else
+         return value - TGSI_SEMANTIC_BYTE_COLOR0;
+   }
+
+   if(value <= TGSI_SEMANTIC_BYTE_FOG)
+   {
+      if(value < TGSI_SEMANTIC_BYTE_FOG)
+         return value - 4;
+      else
+         return 0;
+   }
+
+   if(value < TGSI_SEMANTIC_BYTE_BCOLOR0)
+      return value - 5;
+
+   if(value == (TGSI_SEMANTIC_BYTE_BCOLOR1))
+      return 1;
+
+   return 0;
+}
+
+#endif /* U_SEMANTICS_H_ */
-- 
1.7.0.1.147.g6d84b |
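The GENERIC packing in the header above skips over the byte values reserved for POSITION, PSIZE, COLOR0/1 and FOG. A minimal standalone sketch of just that GENERIC path follows, so the round trip can be checked without the Mesa headers. The function names generic_to_byte, generic_from_byte and count_round_trips are hypothetical; the constants mirror the TGSI_SEMANTIC_BYTE_* values from the patch, and the explicit -1 return for reserved bytes is an addition of this sketch.

```c
#include <assert.h>

/* Byte values copied from the TGSI_SEMANTIC_BYTE_* defines in the patch. */
enum {
   BYTE_POSITION = 0,
   BYTE_PSIZE    = 4 << 4,
   BYTE_COLOR0   = 10 << 4,
   BYTE_COLOR1   = (10 << 4) + 1,
   BYTE_FOG      = 11 << 4,
   BYTE_BCOLOR0  = 14 << 4
};

/* Maps a GENERIC semantic index to its packed byte value,
 * stepping over each reserved value as it is reached. */
static unsigned generic_to_byte(unsigned index)
{
   ++index;                         /* byte 0 is POSITION */
   if (index >= BYTE_PSIZE) {
      ++index;                      /* skip PSIZE */
      if (index >= BYTE_COLOR0) {
         index += 2;                /* skip COLOR0 and COLOR1 */
         if (index >= BYTE_FOG)
            ++index;                /* skip FOG */
      }
   }
   return index;
}

/* Inverse mapping; returns -1 for reserved or out-of-range bytes. */
static int generic_from_byte(unsigned value)
{
   if (value == BYTE_POSITION || value == BYTE_PSIZE ||
       value == BYTE_COLOR0 || value == BYTE_COLOR1 ||
       value == BYTE_FOG || value >= BYTE_BCOLOR0)
      return -1;
   if (value < BYTE_PSIZE)
      return (int)value - 1;
   if (value < BYTE_COLOR0)
      return (int)value - 2;
   if (value < BYTE_FOG)
      return (int)value - 4;
   return (int)value - 5;
}

/* Counts the generic indices that survive a round trip through the
 * byte encoding without running into the BCOLOR range. */
static unsigned count_round_trips(void)
{
   unsigned g, count = 0;
   for (g = 0; g < 256; ++g) {
      unsigned b = generic_to_byte(g);
      if (b < BYTE_BCOLOR0 && generic_from_byte(b) == (int)g)
         ++count;
   }
   return count;
}
```

Under these assumptions count_round_trips() comes out at 219, which matches the 14 * 16 - 5 figure used elsewhere in the series: generic index 218 lands on byte 223, the last value below BCOLOR0.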
From: Luca B. <lu...@lu...> - 2010-04-13 10:56:03
|
---
 src/gallium/include/pipe/p_shader_tokens.h |   18 ++++++++++++++++++
 1 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/src/gallium/include/pipe/p_shader_tokens.h b/src/gallium/include/pipe/p_shader_tokens.h
index baff802..5d511ba 100644
--- a/src/gallium/include/pipe/p_shader_tokens.h
+++ b/src/gallium/include/pipe/p_shader_tokens.h
@@ -146,6 +146,24 @@ struct tgsi_declaration_dimension
 #define TGSI_SEMANTIC_INSTANCEID 10
 #define TGSI_SEMANTIC_COUNT 11 /**< number of semantic values */
 
+/* 219 = (14 * 16 - 5)
+ * All SM3 semantics minus COLOR0, COLOR1, POSITION0, FOG0 and PSIZE0
+ * This value is accurately chosen so that Gallium semantic/indices may be converted
+ * losslessly from and to SM3 semantics.
+ *
+ * Note that if BCOLOR is used, then this value is actually 211 - #MAX_BCOLOR_INDEX_USED - 1
+ * (SM3 does not support BCOLOR, and uses FACE instead)
+ *
+ * In any card supports more, this will be handled later.
+ *
+ * However, drivers should support 256 generic indices if the mechanism
+ * they use is not intrinsically limited to a lower value.
+ */
+#define TGSI_SEMANTIC_GENERIC_INDICES 219
+
+#define TGSI_SEMANTIC_INDICES(sem) (((sem) == TGSI_SEMANTIC_GENERIC) ? TGSI_SEMANTIC_GENERIC_INDICES : \
+   ((sem == TGSI_SEMANTIC_COLOR_INDICES || sem == TGSI_SEMANTIC_BCOLOR_INDICES) ? 2 : 1))
+
 struct tgsi_declaration_semantic
 {
    unsigned Name : 8; /**< one of TGSI_SEMANTIC_x */
-- 
1.7.0.1.147.g6d84b |
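The per-semantic limits that the TGSI_SEMANTIC_INDICES macro above is meant to express can also be written as a plain function, which is easier to test. This is a sketch under assumptions: the helper names sem_index_count and sem_index_valid are hypothetical, the COLOR and BCOLOR cases compare against the semantic names themselves (assumed to be the intent), and the numeric values for POSITION, COLOR and BCOLOR are assumed since they are not shown in this thread.

```c
#include <assert.h>

/* FOG through INSTANCEID follow the values visible in the
 * p_shader_tokens.h hunks in this thread; POSITION, COLOR and
 * BCOLOR are assumed values for this sketch. */
enum sem_name {
   SEM_POSITION   = 0,   /* assumed */
   SEM_COLOR      = 1,   /* assumed */
   SEM_BCOLOR     = 2,   /* assumed */
   SEM_FOG        = 3,
   SEM_PSIZE      = 4,
   SEM_GENERIC    = 5,
   SEM_FACE       = 7,
   SEM_EDGEFLAG   = 8,
   SEM_PRIMID     = 9,
   SEM_INSTANCEID = 10
};

#define SEM_GENERIC_INDICES 219   /* 14 * 16 - 5 */

/* Number of semantic indices allowed for a given semantic name. */
static unsigned sem_index_count(enum sem_name name)
{
   switch (name) {
   case SEM_GENERIC:
      return SEM_GENERIC_INDICES;
   case SEM_COLOR:
   case SEM_BCOLOR:
      return 2;
   default:
      return 1;
   }
}

/* Nonzero if (name, index) is within the formalized limits. */
static int sem_index_valid(enum sem_name name, unsigned index)
{
   return index < sem_index_count(name);
}
```

A state tracker or a debug build of a driver could use a check like sem_index_valid() to assert on declarations; whether and where such a check belongs is left open here.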
From: Luca B. <lu...@lu...> - 2010-04-13 10:56:01
|
Still no control flow support, but basic stuff works.
---
 src/gallium/drivers/nvfx/nvfx_screen.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/gallium/drivers/nvfx/nvfx_screen.c b/src/gallium/drivers/nvfx/nvfx_screen.c
index 6742759..b935fa9 100644
--- a/src/gallium/drivers/nvfx/nvfx_screen.c
+++ b/src/gallium/drivers/nvfx/nvfx_screen.c
@@ -42,7 +42,7 @@ nvfx_screen_get_param(struct pipe_screen *pscreen, int param)
 	case PIPE_CAP_TWO_SIDED_STENCIL:
 		return 1;
 	case PIPE_CAP_GLSL:
-		return 0;
+		return 1;
 	case PIPE_CAP_ANISOTROPIC_FILTER:
 		return 1;
 	case PIPE_CAP_POINT_SPRITE:
-- 
1.7.0.1.147.g6d84b |
From: Luca B. <lu...@lu...> - 2010-04-13 10:56:00
|
This patch series is intended to resolve the issue of semantic-based shader linkage in Gallium. It can also be found in the RFC-gallium-semantics branch.

It does not change the current Gallium design, but rather formalizes some limitations to it, and provides infrastructure to implement this model more easily in drivers, along with a full nv30/nv40 implementation.

These limitations are added to allow an efficient implementation for both hardware lacking special support and hardware having support but also special constraints.

Note that this does NOT resolve all issues, and quite a bit is left to future refinement. In particular, the following issues are still open:
1. COLOR clamping (and floating point framebuffers)
2. A linkage table CSO allowing to specify non-identity linkage
3. BCOLOR/FACE-related issues
4. Adding a cap to inform the state tracker that more than 219 generic indices are provided

This topic was already very extensively discussed.

See http://www.mail-archive.com/mes...@li.../msg10865.html for some early inconclusive discussion around an early implementation that modified the GLSL linker (which is NOT being proposed here).

See http://www.mail-archive.com/mes...@li.../msg12016.html for some more discussion that seemed to mostly reach a consensus over the approach proposed here. See in particular http://www.mail-archive.com/mes...@li.../msg12041.html .

That said, I'm going to try to repeat all information here, partially by copy&pasting from earlier messages. This message should probably be adapted into gallium/docs if/when this is accepted.

Here is the short summary; the long rationale follows after it.

The proposal here is to add the following limitations to Gallium, for the intermediate semantics:
1. TGSI_SEMANTIC_NORMAL is removed, using a commit by Michal Krol that was never merged
2. Every semantic except GENERIC, COLOR and BCOLOR can only be used with semantic index 0
3. COLOR and BCOLOR can only be used with semantic index 0-1 (note that this doesn't apply to fragment outputs)
4. GENERIC can be used with semantic indices 0-218 on any driver, if BCOLOR is not used
5. GENERIC can be used with semantic indices 0-216 on any driver, if BCOLOR IS used
6. GENERIC can be used with semantic indices 0-255 on almost all drivers (those that don't need the 0-218 limitation)
7. Some drivers may also choose to support GENERIC with arbitrary indices, but that should generally not happen

The reason for this, in short, is that this maps directly to DirectX 9 SM3, which is the most problematic interface of all.

The peculiar problem we have here is that we have two competing constraints that force us into choosing the exact SM3 value:
1. The VMware SVGA driver must deal with an SM3 host interface and would ideally want to directly feed the Gallium semantics to the host
2. A hypothetical DirectX 9 state tracker needs to support SM3 and would ideally want to directly feed the SM3 semantics to Gallium

Note that this is not a reference to the VMware DirectX 9 state tracker, since its authors haven't provided details about its handling of shader semantics.

SM3 ends up supporting 219 generic indices: 16 indices in 14 classes, minus POSITION0, PSIZE0, COLOR0, COLOR1 and FOG0 which are the only ones that wouldn't be mapped to GENERIC.

However, Gallium drivers that don't benefit from specific constraints (unlike svga and r600, which do) are supposed to support 256 indices, and my nv30/nv40 work does that.

The expected implementation, if no hardware support exists, is to build a list of relocations to apply to either the fragment or the vertex shader, and patch one of them at validation time to match the other.

Data structures are provided in gallium/auxiliary to ease this, and try to minimize the number of times where this needs to be performed.

Let's now proceed to the discussion and detailed rationale, mostly constructed by copy&pasting older messages.
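The 219 and 217 figures behind limits 4 and 5 in the summary above can be checked by brute-force enumeration. A minimal sketch follows; the function names are hypothetical, and the usage values for POSITION, PSIZE, COLOR and FOG are the ones implied by the shifts in u_semantics.h (0, 4, 10, 11).

```c
#include <assert.h>

enum { SM3_USAGES = 14, SM3_INDICES_PER_USAGE = 16 };

/* Usage values implied by the TGSI_SEMANTIC_BYTE_* shifts in the series. */
enum { USAGE_POSITION = 0, USAGE_PSIZE = 4, USAGE_COLOR = 10, USAGE_FOG = 11 };

/* Nonzero for the (usage, index) pairs that map to non-GENERIC
 * semantics: POSITION0, PSIZE0, FOG0, COLOR0 and COLOR1. */
static int sm3_pair_is_special(unsigned usage, unsigned index)
{
   switch (usage) {
   case USAGE_POSITION:
   case USAGE_PSIZE:
   case USAGE_FOG:
      return index == 0;
   case USAGE_COLOR:
      return index <= 1;
   default:
      return 0;
   }
}

/* Counts the SM3 pairs left over for GENERIC, after setting aside
 * bcolor_slots pairs for BCOLOR (0 if BCOLOR is unused, 2 otherwise). */
static unsigned count_generic_slots(unsigned bcolor_slots)
{
   unsigned usage, index, count = 0;

   for (usage = 0; usage < SM3_USAGES; ++usage)
      for (index = 0; index < SM3_INDICES_PER_USAGE; ++index)
         if (!sm3_pair_is_special(usage, index))
            ++count;
   return count - bcolor_slots;
}
```

count_generic_slots(0) gives 219 (indices 0-218) and count_generic_slots(2) gives 217 (indices 0-216), in line with the two GENERIC limits stated in the summary.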
===============
Michal Krol's proposal
===============

First of all, see Michal Krol's proposal at http://www.opensource-archive.org/showthread.php?t=148573, and in particular:
<<
name        index range
----------------------------
POSITION    no limit?
COLOR       0..1, explicit clamp?
BCOLOR      0..1, explicit clamp?
FOG         remove?
PSIZE       0
GENERIC     0..<max generics>
NORMAL      remove
FACE        0
EDGEFLAG    0
PRIMID      0
INSTANCEID  0
>>

My proposal follows this, except for limiting POSITION to 0 too. Not sure why Michal thought "no limit" could make sense: the POSITION is fundamentally a singleton, since it is the input to the rasterizer unit.

======================
An overview of hardware support
======================

Hardware with no capabilities.
- nv30 does not support any mapping. However, we already need to patch fragment programs to insert constants, so we can patch input register numbers as well. The current driver only supports 0-7 generic indices, but I already implemented support for 0-255 indices with in-driver linkage and patching. Note that nv30 lacks control flow in fragment programs.
- nv40 is like nv30, but supports fp control flow, and may have some configurable mapping support, with unknown behavior

Hardware with capabilities that must be configured for each fp/vp pair.
- nv40 might have this but the nVidia OpenGL driver does not use them
- nv50 has configurable vp->gp and gp->fp mappings with 64 entries. The current Gallium driver seems to support arbitrary 0-2^32 indices, but uses an inefficient O(n^2) algorithm to be able to do that
- r300 appears to have a configurable vp->fp mapping. The current driver only supports 0-15 generic indices, but redefining ATTR_GENERIC_COUNT could be enough to have it support larger numbers.

Hardware with automatic linkage when semantics match:
- VMware svga appears to support 14 * 16 semantics, but the current driver only supports 0-15 generic indices. This could be fixed by mapping GENERIC into all non-special SM3 semantics.

Hardware that can do both configurable mappings and automatic linkage:
- r600 supports linkage in hardware between matching apparently byte-sized semantic ids

Other hardware:
- i915 has no hardware vertex shading. The current driver is broken and only supports 0-7 indices: this seems easy to fix though
- Not sure about i965

===================
An overview of software APIs
===================

1. DirectX 9 SM3 supports indices in the 0-15 range associated with semantics in the 0-13 range. A few of the name/index pairs have special meanings, but the others are just cosmetic as long as the fixed pipeline is not used. Thus, SM3 wants to use 14 * 16 indices overall. Of these, POSITION0, PSIZE0, COLOR0, COLOR1 and FOG0 map to non-GENERIC semantics, leaving 219 semantics handled by GENERIC

2. SM2 and non-GLSL OpenGL just want to use as many indices as the hardware interpolator count, sometimes limiting that further. They are the easiest and most straightforward ones.

3. DirectX 10 seems to only require a 0-31 range. In particular, the fxc.exe compiler allows to specify arbitrary _strings_ and 32-bit indices. However, this information is encoded as metadata in the output file, and the shader bytecode itself uses integers in the 0-31 range to refer to the metadata. It seems that the metadata is resolved by the Microsoft DirectX 10 runtime, and the driver only sees 0-31 indices on the DDI interface. However, this is a bit unclear: confirmation or correction would be appreciated.

4. GLSL requires to provide both shaders at link time, and thus does not constrain the implementation in any way. However, it may be possible to mix GLSL with other shaders, leading to the need to reserve the texcoord slots. In that case, GLSL will need about 8 more slots than the number of effectively used semantics. This is the case with the current Mesa/Gallium implementation

5. GLSL with EXT_separate_shader_objects does not add requirements because only gl_TexCoord and other builtin varyings are supported. User-defined varyings are not supported. See in particular the following text from the extension:
<<
It is undesirable from a performance standpoint to attempt to support "rendezvous by name" for arbitrary separate shaders because the separate shaders won't be naturally compiled to match their varying inputs and outputs of the same name without a special link step. Such a special link would introduce an extra validation overhead to binding separate shaders. The link itself would have to be deferred until glBegin time since separate shaders won't match when transitioning from one set of consistent shaders to another. This special link would still create errors or undefined behavior when the names of input and output varyings matched but their types did not match.
>>

6. A hypothetical version of EXT_separate_shader_objects extended to support user-defined varyings would either want arbitrary 32-bit generic indices (by interning strings to generate the indices) or the ability to specify a custom mapping between shader indices

7. A hypothetical "no-op" implementation of the GLSL linker would have the same requirement

====================
About non-GENERIC semantics
====================

Also note that non-GENERIC semantics have peculiar properties.

For COLOR and BCOLOR:
1. SM3 and OpenGL with glClampColor appropriately set want it to _not_ be clamped to [0, 1]
2. SM2 and normal OpenGL apparently want it to be clamped to [0, 1] (sometimes for fixed point targets only) and may also allow using U8_UNORM precision for it instead of FP32
3. OpenGL allows to enable two-sided lighting, in which case COLOR in the fragment shader is automagically set to BCOLOR for back faces
4. Older hardware (e.g. nv30) tends to support BCOLOR but not FACING. Some hardware (e.g. nv40) supports both FACING and BCOLOR in hardware.
   The latest hardware probably supports FACING only.

Any API that requires special semantics for COLOR and BCOLOR (i.e. non-SM3) seems to only want 0-1 indices.

Note that SM3 does *not* include BCOLOR, so basically the limits for generic indices would need to be conditional on BCOLOR being present or not (e.g. if it is present, we must reserve two semantic slots in svga for it).

POSITION0 is obviously special.

PSIZE0 is also special for points.

FOG0 seems right now to just be a GENERIC with a single component. Gallium could be extended to support fixed function fog, which most DX9 hardware supports (nv30/nv40 and r300). This is mostly orthogonal to the semantic issue.

==============
Current Gallium users
==============

Right now no open-source users of Gallium fundamentally require arbitrary indices. In particular:
1. GLSL and anything with similar link-by-name can of course be modified to use sequential indices
2. ARB fragment program and vertex program use index-limited texcoord slots
3. g3dvl needs and uses 8 texcoord slots, indices 0-7
4. vega and xorg use indices 0-1
5. DX10 seems to restrict semantics to 0-N range, if I'm not mistaken
6. The GL_EXT_separate_shader_objects extension does not provide arbitrary index matching for GLSL, but merely lets it use a model similar to ARB fp/vp

However, the GLSL linker needs them in its current form, and the capability can be generally useful anyway.

===================
Discussion of possible options
===================

[Options from Keith Whitwell, see http://www.opensource-archive.org/showthread.php?p=180719]

a) Picking a lower number like 128, that an SM3 state tracker could usually be able to directly translate incoming semantics into, but which would force it to renumber under rare circumstances. This would make life easier for the open drivers at the expense of the closed code.

b) Picking 256 to make life easier for some closed-source SM3 state tracker, but harder for open drivers.

c) Picking 219 (or some other magic number) that happens to work with the current set of constraints, but makes gallium fragile in the face of new constraints.

d) Abandoning the current gallium linkage rules and coming up with something new, for instance forcing the state trackers to renumber always and making life trivial for the drivers...

[Options from me]

(e) Allow arbitrary 32-bit indices. This requires slightly more complicated data structures in some cases, and will require svga and r600 to fallback to software linkage if numbers are too high.

(f) Limit semantic indices to hardware interpolators _and_ introduce an interface to let the user specify an

Personally I think the simplest idea for now could be to have all drivers support 256 indices or, in the case of r600 and svga, the maximum value supported by the hardware, and expose that as a cap (as well as another cap for the number of different semantic values supported at once).

The minimum guaranteed value is set to the lowest hardware constraint, which would be svga with 219 indices (assuming no bcolor is used).

If some new constraints pop up, we just lower it and change SM3 state trackers to check for it and fallback otherwise.

This should just require simple fixes to svga and r300, and significant code for nv30/nv40, which is however already implemented.

Luca Barbieri (5):
  tgsi: formalize limits on semantic indices
  tgsi: add support for packing semantics in SM3 byte values
  gallium/auxiliary: add semantic linkage utility code
  nvfx: support proper shader linkage - adds glsl support
  nvfx: expose GLSL

Michal Krol (1):
  gallium: Remove TGSI_SEMANTIC_NORMAL. |
From: Luca B. <lu...@lu...> - 2010-04-13 10:56:00
|
From: Michal Krol <mi...@vm...>

Use TGSI_SEMANTIC_GENERIC for this kind of stuff.
---
 src/gallium/auxiliary/tgsi/tgsi_dump.c         |    2 +-
 src/gallium/auxiliary/tgsi/tgsi_text.c         |    2 +-
 src/gallium/docs/source/tgsi.rst               |    6 ------
 src/gallium/drivers/svga/svga_tgsi_decl_sm30.c |    4 ----
 src/gallium/include/pipe/p_shader_tokens.h     |    2 +-
 5 files changed, 3 insertions(+), 13 deletions(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_dump.c b/src/gallium/auxiliary/tgsi/tgsi_dump.c
index 5703141..b6df249 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_dump.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_dump.c
@@ -120,7 +120,7 @@ static const char *semantic_names[] =
    "FOG",
    "PSIZE",
    "GENERIC",
-   "NORMAL",
+   "",
    "FACE",
    "EDGEFLAG",
    "PRIM_ID",
diff --git a/src/gallium/auxiliary/tgsi/tgsi_text.c b/src/gallium/auxiliary/tgsi/tgsi_text.c
index f918151..356eee0 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_text.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_text.c
@@ -933,7 +933,7 @@ static const char *semantic_names[TGSI_SEMANTIC_COUNT] =
    "FOG",
    "PSIZE",
    "GENERIC",
-   "NORMAL",
+   "",
    "FACE",
    "EDGEFLAG",
    "PRIM_ID",
diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst
index c292cd3..d5e0220 100644
--- a/src/gallium/docs/source/tgsi.rst
+++ b/src/gallium/docs/source/tgsi.rst
@@ -1397,12 +1397,6 @@ These attributes are called "generic" because they may be used for anything
 else, including parameters, texture generation information, or anything that
 can be stored inside a four-component vector.
 
-TGSI_SEMANTIC_NORMAL
-""""""""""""""""""""
-
-Vertex normal; could be used to implement per-pixel lighting for legacy APIs
-that allow mixing fixed-function and programmable stages.
-
 TGSI_SEMANTIC_FACE
 """"""""""""""""""
 
diff --git a/src/gallium/drivers/svga/svga_tgsi_decl_sm30.c b/src/gallium/drivers/svga/svga_tgsi_decl_sm30.c
index 73102a7..05d9102 100644
--- a/src/gallium/drivers/svga/svga_tgsi_decl_sm30.c
+++ b/src/gallium/drivers/svga/svga_tgsi_decl_sm30.c
@@ -61,10 +61,6 @@ static boolean translate_vs_ps_semantic( struct tgsi_declaration_semantic semant
       *idx = semantic.Index + 1; /* texcoord[0] is reserved for fog */
       *usage = SVGA3D_DECLUSAGE_TEXCOORD;
       break;
-   case TGSI_SEMANTIC_NORMAL:
-      *idx = semantic.Index;
-      *usage = SVGA3D_DECLUSAGE_NORMAL;
-      break;
    default:
       assert(0);
       *usage = SVGA3D_DECLUSAGE_TEXCOORD;
diff --git a/src/gallium/include/pipe/p_shader_tokens.h b/src/gallium/include/pipe/p_shader_tokens.h
index c5c480f..baff802 100644
--- a/src/gallium/include/pipe/p_shader_tokens.h
+++ b/src/gallium/include/pipe/p_shader_tokens.h
@@ -139,7 +139,7 @@ struct tgsi_declaration_dimension
 #define TGSI_SEMANTIC_FOG 3
 #define TGSI_SEMANTIC_PSIZE 4
 #define TGSI_SEMANTIC_GENERIC 5
-#define TGSI_SEMANTIC_NORMAL 6
+ /* gap */
 #define TGSI_SEMANTIC_FACE 7
 #define TGSI_SEMANTIC_EDGEFLAG 8
 #define TGSI_SEMANTIC_PRIMID 9
-- 
1.7.0.1.147.g6d84b |
From: Christoph B. <e04...@st...> - 2010-04-13 08:50:05
|
On 04/13/2010 08:07 AM, Luca Barbieri wrote:
> On nv30/nv40 support for patching fragment programs is already
> necessary (constants must be patched in as immediates), and this can
> be handled by just patching the end of the fragment program to include
> a variable number of instructions to copy a temp to COLOR[x].
>
> It's possible that there could be a hardware mechanism too, haven't checked.
>
> If other MRT-capable hardware already has this kind of fragment
> program patching or supports this in hardware, then a new TGSI
> semantic or register file can be added for this, and drivers can
> easily implement that without recompilation.
>
nv50 passes, we check if multiple color results are written and if not, we don't set the FP_CTRL_MULTIPLE_RESULTS bit, and COLOR[0] goes to all RTs :-)

I think nv40 might have that too, check the bits in your FP_CONTROL.

> Drivers could also just unconditionally write all color outputs as a
> first implementation or if that doesn't affect performance. |
From: <bug...@fr...> - 2010-04-13 06:50:38
|
https://bugs.freedesktop.org/show_bug.cgi?id=27612

Summary: Mesa 7.8.1 does not compile against libdrm 2.4.20
Product: Mesa
Version: unspecified
Platform: Other
OS/Version: Linux (All)
Status: NEW
Severity: normal
Priority: medium
Component: Other
AssignedTo: mes...@li...
ReportedBy: evo...@gm...

I have a compile error that's a bit weird. MesaLib 7.8.1 compiled fine against libdrm 2.4.20, but then failed to compile again when I re-built my entire system. When I downgraded libdrm from 2.4.20 to 2.4.19, I was again able to build MesaLib 7.8.1

Here is the last line from the compile error:

gcc -c -I. -I../../../../../src/mesa/drivers/dri/common -Iserver -I../../../../../include -I../../../../../src/mesa -I../../../../../src/egl/main -I../../../../../src/egl/drivers/dri -I/usr/include/drm -I/usr/include/libdrm -Wall -Wmissing-prototypes -std=c99 -ffast-math -O3 -march=core2 -O2 -pipe -fno-strict-aliasing -fPIC -m32 -mmmx -msse -msse2 -D_POSIX_SOURCE -D_POSIX_C_SOURCE=199309L -D_SVID_SOURCE -D_BSD_SOURCE -D_GNU_SOURCE -DPTHREADS -DUSE_EXTERNAL_DXTN_LIB=1 -DIN_DRI_DRIVER -DGLX_DIRECT_RENDERING -DGLX_INDIRECT_RENDERING -DHAVE_ALIAS -DHAVE_POSIX_MEMALIGN -DUSE_X86_ASM -DUSE_MMX_ASM -DUSE_3DNOW_ASM -DUSE_SSE_ASM -fno-strict-aliasing -I../intel -I../intel/server -DI915 -DDRM_VBLANK_FLIP=DRM_VBLANK_FLIP intel_buffer_objects.c -o intel_buffer_objects.o
intel_buffer_objects.c: In function 'intel_buffer_purgeable':
intel_buffer_objects.c:600: error: 'I915_MADV_DONTNEED' undeclared (first use in this function)
intel_buffer_objects.c:600: error: (Each undeclared identifier is reported only once
intel_buffer_objects.c:600: error: for each function it appears in.)
intel_buffer_objects.c: In function 'intel_buffer_unpurgeable':
intel_buffer_objects.c:669: error: 'I915_MADV_WILLNEED' undeclared (first use in this function)
make[7]: *** [intel_buffer_objects.o] Error 1
make[7]: Leaving directory `/usr/src/sorcery/MesaLib/Mesa-7.8.1/src/mesa/drivers/dri/i915'
make[6]: *** [lib] Error 2
make[6]: Leaving directory `/usr/src/sorcery/MesaLib/Mesa-7.8.1/src/mesa/drivers/dri/i915'
make[5]: *** [subdirs] Error 1
make[5]: Leaving directory `/usr/src/sorcery/MesaLib/Mesa-7.8.1/src/mesa/drivers/dri'
make[4]: *** [default] Error 1
make[4]: Leaving directory `/usr/src/sorcery/MesaLib/Mesa-7.8.1/src/mesa/drivers'
make[3]: *** [driver_subdirs] Error 2
make[3]: Leaving directory `/usr/src/sorcery/MesaLib/Mesa-7.8.1/src/mesa'
make[2]: *** [subdirs] Error 1
make[2]: Leaving directory `/usr/src/sorcery/MesaLib/Mesa-7.8.1/src'
make[1]: *** [default] Error 1
make[1]: Leaving directory `/usr/src/sorcery/MesaLib/Mesa-7.8.1'
make: *** [linux-dri-x86] Error 2
! Problem Detected !

-- 
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug. |
From: Luca B. <luc...@gm...> - 2010-04-13 06:08:02
|
On nv30/nv40 support for patching fragment programs is already necessary (constants must be patched in as immediates), and this can be handled by just patching the end of the fragment program to include a variable number of instructions to copy a temp to COLOR[x].

It's possible that there could be a hardware mechanism too, haven't checked.

If other MRT-capable hardware already has this kind of fragment program patching or supports this in hardware, then a new TGSI semantic or register file can be added for this, and drivers can easily implement that without recompilation.

Drivers could also just unconditionally write all color outputs as a first implementation or if that doesn't affect performance. |