From: Tom S. <tst...@gm...> - 2010-03-18 19:31:35
|
Hi, I am interested in working on the Gallium R300 driver as a part of Google Summer of Code. I would like to try and target a specific game, probably Civilization 4, and get it working as well as possible. I am interested in getting some feedback on whether or not this is a good goal for the summer. In the past, I have filed bug reports and done some testing of the mesa drivers, but I have not spent a lot of time looking at the code. Where is a good place for me to start looking through the code? Is there a reference Gallium driver I can look at to get a good idea of how the drivers are structured? Thanks. -Tom Stellard |
From: tom f. <tf...@al...> - 2010-03-18 20:11:58
|
Tom Stellard <tst...@gm...> writes:
> Where is a good place for me to start looking through the code? Is
> there a reference Gallium driver I can look at to get a good idea of
> how the drivers are structured?

I'm sure one of the actual gallium developers can give you more detail/correct me, but: src/gallium/drivers is where you want to start looking. You'll note there's an `r300' directory, which is how your card is supported, of course. The reference driver is in the `softpipe' subdirectory. You can also compare with "classic" Mesa, referred to as swrast, by building --with-driver=xlib.

Search the archives, as well. LunarG posted a bunch of videos from a recent, hrm... 'gallium conference', I'd guess you'd call it? They're probably enlightening for the current state of gallium. I'd suggest watching them before reading code; they're probably better higher-level documentation.

HTH, -tom |
From: Corbin S. <mos...@gm...> - 2010-03-18 22:25:18
|
On Thu, Mar 18, 2010 at 12:30 PM, Tom Stellard <tst...@gm...> wrote:
> Where is a good place for me to start looking through the code?
> Is there a reference Gallium driver I can look at to get a good idea of
> how the drivers are structured?

Nifty. Well, there's a few places to look for information.

If you're not sure how the actual video card works, http://www.x.org/wiki/Development/Documentation/HowVideoCardsWork is a great starting point. Of particular interest is the 3D engine; r300g only talks to the 3D part of the video card.

The reference Gallium driver is probably identity, although softpipe is a good reference as well. We also have documentation for the Gallium API and associated bits; if you don't want to build it yourself from the Mesa tree, there should be an up-to-date copy at http://people.freedesktop.org/~csimpson/gallium-docs/. (If there's a problem with the documentation, lemme know!)

~ C.

-- When the facts change, I change my mind. What do you do, sir? ~ Keynes Corbin Simpson <Mos...@gm...> |
From: Tom S. <tst...@gm...> - 2010-03-19 08:25:44
|
On Thu, Mar 18, 2010 at 03:25:04PM -0700, Corbin Simpson wrote:
> The reference Gallium driver is probably identity, although softpipe
> is a good reference as well. We also have documentation for the
> Gallium API and associated bits [...]

Thanks for the info. I'll start taking a look at that.

What is the correct way to install the Gallium driver? Up until now, after running make install, I have been symlinking radeong_dri.so to r300_dri.so. Is this still the way to do it?

I am currently blocked from using KMS, and thus from testing the Gallium driver, by this bug: https://bugs.freedesktop.org/show_bug.cgi?id=25662 I think it is a problem with the kernel, because my entire system locks up and I am forced to hard reboot. Debugging this problem is probably a good way for me to get familiar with the code. Any tips to help me debug this problem?

-Tom Stellard |
From: Tom S. <tst...@gm...> - 2010-03-23 06:40:27
|
On Thu, Mar 18, 2010 at 03:25:04PM -0700, Corbin Simpson wrote:
> The reference Gallium driver is probably identity, although softpipe
> is a good reference as well. We also have documentation for the
> Gallium API and associated bits [...]

Thanks for the information.

After spending some time learning about the Gallium driver architecture, I think it might be better to set a goal to implement or improve a specific feature of the Gallium R300 driver rather than trying to get a specific game or application to work. Is there a feature that is currently missing from the R300 driver that might make a good project for the summer?

-Tom Stellard |
From: Corbin S. <mos...@gm...> - 2010-03-23 07:13:32
|
On Mon, Mar 22, 2010 at 11:39 PM, Tom Stellard <tst...@gm...> wrote:
> After spending some time learning about the Gallium driver architecture, I
> think it might be better to set a goal to implement or improve a specific
> feature of the Gallium R300 driver rather than trying to get a specific
> game or application to work. Is there a feature that is currently missing
> from the R300 driver that might make a good project for the summer?

Good question. There's a handful of things. Passing piglit might be a good goal. Bumping the GL version further up, or solidifying the GLSL support, might be good too.

-- When the facts change, I change my mind. What do you do, sir? ~ Keynes Corbin Simpson <Mos...@gm...> |
From: Stephane M. <ste...@gm...> - 2010-03-23 07:18:10
|
On Tue, Mar 23, 2010 at 00:13, Corbin Simpson <mos...@gm...> wrote:
> Good question. There's a handful of things. Passing piglit might be a
> good goal. Bumping the GL version further up, or solidifying the GLSL
> support, might be good too.

Keep in mind you have to make SoC projects self-contained and doable in 3 months by a newcomer. So you have to measure the difficulty beforehand so you don't hand out trivial/impossible projects. Usually that requires a developer looking at the source and figuring out the amount of work required...

Stephane |
From: Tom S. <tst...@gm...> - 2010-03-23 19:48:05
|
On Tue, Mar 23, 2010 at 12:13:25AM -0700, Corbin Simpson wrote:
> Good question. There's a handful of things. Passing piglit might be a
> good goal. Bumping the GL version further up, or solidifying the GLSL
> support, might be good too.

I think the GLSL compiler would be an interesting project for me to work on. What is the current status of GLSL on R300 cards? Would something like passing a subset of the GLSL piglit tests, or being able to correctly handle a certain version of GLSL, be a good goal for the summer?

-Tom Stellard |
From: Marek O. <ma...@gm...> - 2010-03-27 01:12:01
|
On Tue, Mar 23, 2010 at 8:46 PM, Tom Stellard <tst...@gm...> wrote:
> I think the GLSL compiler would be an interesting project for me to work
> on. What is the current status of GLSL on R300 cards?

From the driver point of view, we don't have to work on the GLSL compiler itself. The Mesa state tracker compiles GLSL to an assembler-like language called TGSI, which is then translated ([1]) to the R300 compiler ([2]) shader representation. The more TGSI we handle, the more GLSL support we get.

So now the status. r300g GLSL is missing the following features:

1) Branching and looping

This is the most important one, and there are 3 things which need to be done.

* Unrolling loops and converting conditionals to multiplications. This is crucial for R3xx-R4xx GLSL support. I don't say it will work in all cases, but it should be fine for the most common ones. This is kind of a standard in all proprietary drivers supporting shaders 2.0. It would be nice to have it work with pure TGSI shaders so that drivers like nvfx can reuse it too, and I personally prefer to have this feature first before going on.
* Teaching the R300 compiler loops and conditionals for R500 fragment shaders. Note that R500 supports the jump instruction, so besides adding new opcodes, the compiler optimization passes should be updated too (I think they haven't been designed with loops in mind).
* The same, but for R500 vertex shaders. The difference is that conditionals must be implemented using predication opcodes and predicated writes (stuff gets masked out). I think this only affects the conversion to machine code at the end of the pipeline.

2) Derivatives instructions fix

It's implemented but broken. From the docs: "If src0 is computed in the previous instruction, then a NOP needs to be inserted between the two instructions. Do this by setting the NOP flag in the previous instruction. This is not required if the previous instruction is a texture lookup." ...and that should be the fix.

3) Perspective, flat, and centroid varying modifiers, gl_FrontFacing

I think this is specific to the rasterizer (RS) block in hw ([3]).

[1] src/gallium/drivers/r300/r300_tgsi_to_rc.c
[2] src/mesa/drivers/dri/r300/compiler/
[3] src/gallium/drivers/r300/r300_state_derived.c

> Would something like passing a subset of the GLSL piglit tests, or being
> able to correctly handle a certain version of GLSL be a good goal for
> the summer?

I guess this question is for Corbin. ;)

-Marek |
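The "converting conditionals to multiplications" technique Marek mentions can be sketched in a few lines. This is an editorial illustration, not Mesa code: the function names are hypothetical, and real TGSI operates per-channel on vec4 registers, but the scalar arithmetic is the same trick (both branch results are computed, then a 0.0/1.0 condition mask selects one, as CMP/LRP-style opcodes do):

```python
# Sketch of flattening "if (c) x = a; else x = b;" for hardware
# without control flow (e.g. R3xx-R4xx fragment shaders).
# Instead of branching, both sides are evaluated and blended by a
# 0.0/1.0 condition mask -- the multiplications select the live result.

def flatten_conditional(cond, then_val, else_val):
    """Branch-free select: cond is 1.0 (true) or 0.0 (false)."""
    return cond * then_val + (1.0 - cond) * else_val

# "if (x > y) r = x; else r = y;" becomes:
def branch_free_max(x, y):
    cond = 1.0 if x > y else 0.0   # an SGT-style compare yields 0.0/1.0
    return flatten_conditional(cond, x, y)
```

Note that both sides always execute, so the technique is only safe when neither branch has side effects, which holds for straight-line shader arithmetic.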
From: Tom S. <tst...@gm...> - 2010-03-27 08:56:39
|
On Sat, Mar 27, 2010 at 02:11:54AM +0100, Marek Olšák wrote:
> So now the status. r300g GLSL is missing the following features:
> [...]

Thanks. This is really helpful. I am in the process of looking through the code and some of the documentation. I'll respond with some questions when I start writing my proposal.

-Tom |
From: Tom S. <tst...@gm...> - 2010-03-30 05:10:46
|
On Sat, Mar 27, 2010 at 02:11:54AM +0100, Marek Olšák wrote:
> From the driver point of view, we don't have to work on the GLSL compiler
> itself. The Mesa state tracker compiles GLSL to an assembler-like language
> called TGSI which is then translated ([1]) to the R300 compiler ([2]) shader
> representation. The more TGSI we handle, the more GLSL support we get.

Is adding support for the TGSI opcodes that are currently ignored by r300_tgsi_to_rc.c something that needs to be done? If so, are there some opcodes you would prefer to see done first?

> 1) Branching and looping
> [...]
> * Unrolling loops and converting conditionals to multiplications.

Would you be able to provide a small example of how to convert the conditionals to multiplications? I understand the basic idea is to mask values based on the result of the conditional, but it would help me to see an example. On IRC, eosie mentioned an alternate technique for emulating conditionals: save the values of variables that might be affected by the conditional statement; then, after executing both the if and the else branches, roll back the variables that were affected by the branch that was not supposed to be taken. Would this technique work as well?

Is the conditional translation something that only needs to be done in the Gallium drivers, or would it be useful to apply the translation before the Mesa IR is converted into TGSI? Are any of the other drivers (Gallium or Mesa) currently doing this kind of translation?

> 2) Derivatives instructions fix
> [...]

Is the only problem here that NOP is being inserted after texture lookups when it shouldn't be?

For my proposal, I am thinking about a schedule that looks something like this (in this order):

1) Branching and looping - 4 to 6 weeks
2) Derivatives instructions fix - 1 to 2 weeks
3) Adding support for priority TGSI_OPCODES - 3 to 4 weeks
4) Perspective, flat, and centroid varying modifiers, gl_FrontFacing, and adding support for more TGSI_OPCODES - (if there is time left)*

*GSOC lasts for 12 weeks.

I would appreciate feedback on the order or the time estimates in this schedule. I am sure some of the developers will have a better idea how long some of these tasks might take. Also, if there is something important that I am leaving out, or something not important that I have included, let me know. Thanks.

-Tom Stellard |
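Schedule item 1 above also covers loop unrolling, the other half of Marek's first bullet. A rough editorial sketch (hypothetical names and a toy string-based IR, not Mesa code): for a loop whose trip count is known at compile time, an unroller simply replicates the body once per iteration, substituting the induction variable with each constant value, so no jump instruction is ever needed:

```python
# Sketch of loop unrolling for hardware without jump support:
# a loop with a compile-time trip count is replaced by repeated
# copies of its body. The "%i" placeholder stands for the loop's
# induction variable in this toy IR.

def unroll(body_ops, trip_count):
    """Replace a counted loop with trip_count copies of its body."""
    unrolled = []
    for i in range(trip_count):
        # substitute the induction variable with its constant value
        unrolled.extend(op.replace("%i", str(i)) for op in body_ops)
    return unrolled

ops = unroll(["MUL r0, r0, c[%i]"], 3)
# -> ["MUL r0, r0, c[0]", "MUL r0, r0, c[1]", "MUL r0, r0, c[2]"]
```

Real unrolling also has to bound the result against the hardware's instruction limits, which is one reason it cannot work "in all cases" as Marek cautions.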
From: Corbin S. <mos...@gm...> - 2010-03-30 05:29:05
|
On Mon, Mar 29, 2010 at 10:09 PM, Tom Stellard <tst...@gm...> wrote:
> For my proposal, I am thinking about a schedule that looks
> something like this (in this order):
>
> 1) Branching and looping - 4 to 6 weeks
> 2) Derivatives instructions fix - 1 to 2 weeks
> 3) Adding support for priority TGSI_OPCODES - 3 to 4 weeks
> 4) Perspective, flat, and centroid varying modifiers, gl_FrontFacing
> and Adding support for more TGSI_OPCODES - (if there is time left)*
>
> I would appreciate feedback on the order or the time estimates in this
> schedule.

This seems reasonable. The missing opcodes are all related to branches and loops, AFAIR, so you'll be working with those. Go ahead and set up your proposal on the GSoC site; we'll let you know if it needs tuning or tweaking.

~ C.

-- When the facts change, I change my mind. What do you do, sir? ~ Keynes Corbin Simpson <Mos...@gm...> |
From: Marek O. <ma...@gm...> - 2010-03-30 06:13:18
|
On Tue, Mar 30, 2010 at 7:09 AM, Tom Stellard <tst...@gm...> wrote:
> Is adding support for the TGSI opcodes that are currently ignored by
> r300_tgsi_to_rc.c something that needs to be done? If so, are there
> some opcodes you would prefer to see done first?

One of the goals might be to pass all relevant piglit tests, including glean/glsl1, which exercises various opcodes, but it's not so important and I'd be surprised if you made it in your timeframe. You may use it for testing though.

> Would you be able to provide a small example of how to convert the
> conditionals to multiplications? [...] On IRC, eosie mentioned an
> alternate technique for emulating conditionals [...]

Well, I am eosie, thanks for the info, it's always cool to be reminded what I've written on IRC. ;)

Another idea was to convert TGSI to an SSA form. That would make unrolling branches much easier, as the Phi function would basically become a linear interpolation; loops and subroutines with conditional return statements might be trickier. The r300 compiler already uses SSA for its optimization passes, so maybe you wouldn't need to mess with TGSI that much...

> Is the conditional translation something that only needs to be done
> in the Gallium drivers, or would it be useful to apply the translation
> before the Mesa IR is converted into TGSI? Are any of the other drivers
> (Gallium or Mesa) currently doing this kind of translation?

Not that I know of. You may do it wherever you want theoretically, even in the r300 compiler, leaving TGSI untouched, but I think most people would appreciate it if these translations were done in TGSI.

> Is the only problem here that NOP is being inserted after texture
> lookups when it shouldn't be?

Well, the derivatives don't work, and NOP is not being inserted anywhere. The quoted statement from the docs was supposed to give you a clue. A NOP after a texture lookup is *not required*; that means it would be just silly to put it there, but it shouldn't break anything.

-Marek |
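Marek's remark that a Phi function "basically becomes a linear interpolation" when a branch is flattened can be illustrated in a couple of lines. This is an editorial sketch with hypothetical names; real SSA construction is of course far more involved:

```python
# Sketch: in SSA form, a flattened branch merges two definitions of a
# variable with a phi node. When the condition is materialized as a
# 0.0/1.0 value, that phi is exactly a linear interpolation (lerp).

def phi_as_lerp(cond, val_from_then, val_from_else):
    """phi(then: a, else: b) == lerp(b, a, cond) for cond in {0.0, 1.0}."""
    return val_from_else + cond * (val_from_then - val_from_else)
```

This is the same select-by-multiplication trick as before, just viewed from the SSA side: each phi in the unrolled branch turns into one lerp on the branch condition.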
From: Luca B. <luc...@gm...> - 2010-03-30 15:37:18
|
> Not that I know of. You may do it wherever you want theoretically, even in
> the r300 compiler and leaving TGSI untouched, but I think most people would
> appreciate if these translations were done in TGSI.

It would be nice to have a driver-independent TGSI optimization module. It could either operate directly on TGSI (probably only good for simple optimizations), or convert to LLVM IR, optimize, and convert back.

This would make the module usable by all drivers: note that at least inlining and loop unrolling should generally be performed even for hardware with full control flow support. Lots of other optimizations would then be possible (using LLVM, with a single line of code to request the appropriate LLVM pass), and they would automatically be available for all drivers, instead of being available only for r300 by putting them in the radeon compiler. |
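The pass-based structure Luca proposes can be sketched abstractly. This is an editorial toy with hypothetical names and a toy tuple IR, mirroring the shape of LLVM's pass-pipeline design rather than any actual Gallium or LLVM code:

```python
# Sketch of a driver-independent shader optimization module: an IR is
# threaded through a list of passes, each a function from IR to IR.
# Drivers would pick the passes their hardware needs.

def run_passes(ir, passes):
    """Apply each optimization pass to the IR in order."""
    for opt_pass in passes:
        ir = opt_pass(ir)
    return ir

# toy IR: list of (opcode, dst, src) tuples; a trivial example pass
# that deletes useless register-to-itself moves
def remove_self_moves(ir):
    return [op for op in ir if not (op[0] == "MOV" and op[1] == op[2])]

ir = [("MOV", "r0", "r0"), ("ADD", "r1", "r0")]
optimized = run_passes(ir, [remove_self_moves])
# -> [("ADD", "r1", "r0")]
```

The appeal of this shape is exactly what Luca describes: a new pass written once (say, loop unrolling) becomes available to every driver that runs the pipeline, instead of living inside one driver's compiler.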
From: José F. <jfo...@vm...> - 2010-03-30 15:53:33
|
On Tue, 2010-03-30 at 08:37 -0700, Luca Barbieri wrote:
> It would be nice to have a driver-independent TGSI optimization module.
> It could either operate directly on TGSI (probably only good for
> simple optimization), or convert to LLVM IR, optimize, and convert
> back.
<snip>

Agreed. These were my thoughts too when watching Nicolai Haehnle's FOSDEM
presentation.

In my opinion, the best approach would be to use an SSA form of TGSI, with
the possibility of annotations, or the ability to carry hardware-specific
instructions, so that drivers could faithfully represent all the oddities
of certain hardware.

There are several deep challenges in making TGSI <-> LLVM IR translation
lossless -- I'm sure we'll get around to overcoming them -- but I don't
think that using LLVM is a requirement for this module. Having a shared IR
for a simple TGSI optimization module would go a long way by itself.

Jose |
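To make the "simple TGSI optimization module" idea concrete, here is a
deliberately toy sketch of the kind of driver-independent peephole pass
such a module could offer even without LLVM: copy propagation over a
three-address instruction list. The types are invented for illustration
and are NOT the real tgsi_* structures.

```c
#include <assert.h>

/* Toy three-address IR; these types are invented for illustration and
 * are NOT the real tgsi_* structures. */
enum toy_opcode { OP_MOV, OP_ADD };

struct toy_inst {
    enum toy_opcode op;
    int dst;        /* destination register index */
    int src[2];     /* source register indices (-1 = unused) */
};

/* Copy propagation: after "MOV t, s", rewrite later reads of t into
 * reads of s, stopping conservatively as soon as either t or s is
 * overwritten.  Returns the number of operands rewritten. */
static int copy_propagate(struct toy_inst *prog, int n)
{
    int rewritten = 0;
    for (int i = 0; i < n; ++i) {
        if (prog[i].op != OP_MOV)
            continue;
        int t = prog[i].dst, s = prog[i].src[0];
        for (int j = i + 1; j < n; ++j) {
            if (prog[j].dst == t || prog[j].dst == s)
                break;      /* the copy is no longer valid past here */
            for (int k = 0; k < 2; ++k) {
                if (prog[j].src[k] == t) {
                    prog[j].src[k] = s;
                    ++rewritten;
                }
            }
        }
    }
    return rewritten;
}
```

A real module would of course have to track writemasks, swizzles, and
predication, but a pass of this shape is driver-independent and would
benefit every Gallium driver equally.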
From: Corbin S. <mos...@gm...> - 2010-03-30 17:08:50
|
On Tue, Mar 30, 2010 at 8:37 AM, Luca Barbieri <luc...@gm...> wrote:
> It would be nice to have a driver-independent TGSI optimization module.
> It could either operate directly on TGSI (probably only good for
> simple optimization), or convert to LLVM IR, optimize, and convert
> back.
<snip>

This is orthogonal to the suggested project...

-- 
When the facts change, I change my mind. What do you do, sir? ~ Keynes

Corbin Simpson
<Mos...@gm...> |
From: Corbin S. <mos...@gm...> - 2010-03-23 07:16:33
|
On Tue, Mar 23, 2010 at 12:13 AM, Corbin Simpson
<mos...@gm...> wrote:
> Good question. There's a handful of things. Passing piglit might be a
> good goal. Bumping the GL version further up, or solidifying the GLSL
> support, might be good too.

Oh, and how could I forget this? We have a sizeable todo list:
http://dri.freedesktop.org/wiki/R300ToDo

~ C.

-- 
When the facts change, I change my mind. What do you do, sir? ~ Keynes

Corbin Simpson
<Mos...@gm...> |
From: Marek O. <ma...@gm...> - 2010-03-23 13:37:42
|
I've updated the TODO list with the stuff from my private one, in case you
guys think there are too few things to do. ;)

http://dri.freedesktop.org/wiki/R300ToDo?action=diff

-Marek

On Tue, Mar 23, 2010 at 8:16 AM, Corbin Simpson <mos...@gm...> wrote:
> Oh, and how could I forget this? We have a sizeable todo list:
> http://dri.freedesktop.org/wiki/R300ToDo
>
> ~ C.
<snip>
 |
From: Nicolai H. <nha...@gm...> - 2010-03-30 08:26:35
|
Reply to all this time...

On Tue, Mar 30, 2010 at 8:13 AM, Marek Olšák <ma...@gm...> wrote:
>> > 1) Branching and looping
>> >
>> > This is the most important one, and there are 3 things which need to
>> > be done.
>> > * Unrolling loops and converting conditionals to multiplications. This
>> > is crucial for R3xx-R4xx GLSL support. I don't say it will work in all
>> > cases, but it should be fine for the most common ones. This is kind of
>> > a standard in all proprietary drivers supporting shaders 2.0. It would
>> > be nice to have it work with pure TGSI shaders so that drivers like
>> > nvfx can reuse it too, and I personally prefer to have this feature
>> > first before going on.
>>
>> Would you be able to provide a small example of how to convert the
>> conditionals to multiplications? I understand the basic idea is to mask
>> values based on the result of the conditional, but it would help me to
>> see an example. On IRC, eosie mentioned an alternate technique for
>> emulating conditionals: Save the values of variables that might be
>> affected by the conditional statement. Then, after executing both the
>> if and the else branches, roll back the variables that were affected by
>> the branch that was not supposed to be taken. Would this technique work
>> as well?
>
> Well, I am eosie; thanks for the info, it's always cool to be reminded
> what I've written on IRC. ;)
>
> Another idea was to convert TGSI to an SSA form. That would make
> unrolling branches much easier, as the Phi function would basically
> become a linear interpolation; loops and subroutines with conditional
> return statements might be trickier. The r300 compiler already uses SSA
> for its optimization passes, so maybe you wouldn't need to mess with
> TGSI that much...
Note that my Git repository already contains an implementation of branch
emulation and some additional optimizations; see here:
http://cgit.freedesktop.org/~nh/mesa/log/?h=r300g-glsl

Shame on me for abandoning it - I should really get around to making sure
it fits in with recent changes and merging it to master. The main problem
is that it produces "somewhat" inefficient code. Adding and improving
peephole and similar optimizations should help tremendously.

<snip>

>> > 2) Derivatives instructions fix
>> >
>> > It's implemented but broken. From the docs: "If src0 is computed in
>> > the previous instruction, then a NOP needs to be inserted between the
>> > two instructions. Do this by setting the NOP flag in the previous
>> > instruction. This is not required if the previous instruction is a
>> > texture lookup." ...and that should be the fix.
>>
>> Is the only problem here that NOP is being inserted after texture
>> lookups when it shouldn't be?
>
> Well, the derivatives don't work and NOP is not being inserted anywhere.
> The quoted statement from the docs was supposed to give you a clue. A
> NOP after a texture lookup is *not required*; that means it would be
> just silly to put it there, but it shouldn't break anything.

I seem to recall that there is a bit in the opcodes to have a NOP cycle
without actually inserting a NOP instruction. This might be more
inefficient. I've never actually tested it.

cu,
Nicolai |
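The conditional-to-multiplication technique Marek describes above can be
sketched in plain C (a hypothetical scalar model for illustration, not
actual r300 compiler code): both sides of the branch are computed
unconditionally, the condition is materialized as a 0.0/1.0 mask, and the
two results are blended with multiplies. This blend is also exactly what
a Phi node degenerates to after the transformation: a linear interpolation
between the two incoming values.

```c
#include <assert.h>

/* Hypothetical scalar model of branch emulation for hardware without
 * control flow (not actual r300 compiler code).  The mask selects one
 * of the two already-computed results: then*mask + else*(1 - mask). */
static float select_by_mask(float mask, float then_val, float else_val)
{
    return then_val * mask + else_val * (1.0f - mask);
}

/* Emulates: if (x > 0.5) y = x * 2.0; else y = x + 1.0; */
static float emulated_branch(float x)
{
    float mask = (x > 0.5f) ? 1.0f : 0.0f; /* compare yields 0.0 or 1.0 */
    float then_val = x * 2.0f;             /* both sides always execute */
    float else_val = x + 1.0f;
    return select_by_mask(mask, then_val, else_val);
}
```

The rollback technique mentioned on IRC is the same idea seen from the
other end: here, instead of saving and restoring the affected variables,
each one is blended once at the merge point.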
From: Dave A. <ai...@gm...> - 2010-03-31 10:17:59
|
On Tue, Mar 30, 2010 at 6:26 PM, Nicolai Haehnle <nha...@gm...> wrote:
> Note that my Git repository already contains an implementation of
> branch emulation and some additional optimizations, see here:
> http://cgit.freedesktop.org/~nh/mesa/log/?h=r300g-glsl
>
> Shame on me for abandoning it - I should really get around to make
> sure it fits in with recent changes and merge it to master. The main
> problem is that it produces "somewhat" inefficient code. Adding and
> improving peephole and similar optimizations should help tremendously.
<snip>

It rebases cleanly onto master with git, and piglit shows -2 for me here:

texCube -> fail
glsl-fs-fragcoord -> fail

Now it might be other things - I haven't had time to investigate; just
letting you know that merging it might not be a bad plan.

Dave. |
From: Luca B. <luc...@gm...> - 2010-03-30 16:53:02
|
> There are several deep challenges in making TGSI <-> LLVM IR translation
> lossless -- I'm sure we'll get around to overcome them -- but I don't
> think that using LLVM is a requirement for this module. Having a shared
> IR for simple TGSI optimization module would go a long way by itself.

What are these challenges?

If you keep vectors and don't scalarize, I don't see why it shouldn't just
work, especially if you just round-trip without running any passes. The
DAG instruction matcher should be able to match writemasks, swizzles,
etc. fine.

Control flow may not be exactly reconstructed, but I think LLVM has
control-flow canonicalization that should make it possible to reconstruct
a loop/if control-flow structure of equivalent efficiency.

Using LLVM has the obvious advantage that all the optimizations have
already been written and tested. And for complex shaders, you may really
need a good full optimizer (one that can do inter-basic-block and
interprocedural optimizations, alias analysis, advanced loop
optimizations, and so on), especially if we start supporting OpenCL over
TGSI.

There is also the option of having the driver directly consume the LLVM
IR, and the frontend directly produce it (e.g. clang supports OpenCL ->
LLVM).

Some things, like inlining, are easy to do directly in TGSI (but only
because all regs are global). However, even determining the minimum
number of loop iterations for loop unrolling is very hard to do without a
full compiler.

For instance, consider code like this:

if(foo >= 6)
{
    if(foo == 1)
        iters = foo + 3;
    else if(bar == 1)
        iters = foo + 5 + bar;
    else
        iters = foo + 7;

    for(i = 0; i < iters; ++i) LOOP_BODY;
}

You need a non-trivial optimizer (with control-flow support, value-range
propagation, and constant folding) to find out that the loop always
executes at least 12 iterations, which you need to know to unroll it
optimally. More complex examples are possible.

In general, anything that requires (approximately) determining any
property of the program potentially benefits from having the most complex
and powerful optimizer available. |
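Luca's claim about the example above can be checked by brute force. This
hypothetical snippet enumerates the reachable (foo, bar) values over a
small range and confirms the fact an optimizer would have to prove via
value-range propagation and constant folding: the minimum trip count is 12
(the foo == 1 branch is dead under the foo >= 6 guard, so the minimum
comes from foo + 5 + bar with foo == 6, bar == 1).

```c
#include <assert.h>

/* Mirrors the iteration-count logic of the example above; callers only
 * reach it under the outer "foo >= 6" guard. */
static int iters_for(int foo, int bar)
{
    if (foo == 1)
        return foo + 3;         /* dead code: foo >= 6 here */
    else if (bar == 1)
        return foo + 5 + bar;   /* foo + 6, minimized at foo == 6 */
    else
        return foo + 7;         /* foo + 7 >= 13 */
}

/* Brute-force stand-in for the value-range analysis: enumerate a small
 * range of reachable inputs and take the smallest trip count seen. */
static int min_iters(void)
{
    int min = 1 << 30;
    for (int foo = 6; foo <= 64; ++foo)       /* outer guard: foo >= 6 */
        for (int bar = -64; bar <= 64; ++bar) {
            int it = iters_for(foo, bar);
            if (it < min)
                min = it;
        }
    return min;
}
```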
From: Brian P. <br...@vm...> - 2010-03-30 17:15:54
|
This is getting off-topic, but anyway...

Luca Barbieri wrote:
>> There are several deep challenges in making TGSI <-> LLVM IR translation
>> lossless -- I'm sure we'll get around to overcome them -- but I don't
>> think that using LLVM is a requirement for this module. Having a shared
>> IR for simple TGSI optimization module would go a long way by itself.
>
> What are these challenges?

Control flow is hard. Writing a TGSI backend for LLVM would be a lot of
work. Etc.

> If you keep vectors and don't scalarize, I don't see why it shouldn't
> just work, especially if you just roundtrip without running any passes.
> The DAG instruction matcher should be able to match writemasks,
> swizzles, etc. fine.
>
> Control flow may not be exactly reconstructed, but I think LLVM has
> control flow canonicalization that should allow to reconstruct a
> loop/if control flow structure of equivalent efficiency.

LLVM only has branch instructions, while GPU instruction sets avoid
branching and use explicit conditional and loop constructs. Analyzing the
LLVM IR branches to reconstruct GPU loops and conditionals isn't easy.

> Using LLVM has the obvious advantage that all optimizations have
> already been written and tested.
<snip>
> Some things, like inlining, are easy to do directly in TGSI (but only
> because all regs are global).

Inlining isn't always easy. The Mesa GLSL compiler inlines function calls
whenever possible, but there are some tricky cases. For example, if the
function we want to inline has deeply nested early return statements, you
have to convert the return statements into something else to avoid
mistakenly returning from the calling function. The LLVM optimizer may
handle this just fine, but translating the resulting LLVM IR back to TGSI
could be hard (see above).

> However, even determining the minimum number of loop iterations for
> loop unrolling is very hard to do without a full compiler.
<snip>

Yup, it's hard.

> In general, anything that requires (approximately) determining any
> property of the program potentially benefits from having the most
> complex and powerful optimizer available.

I also think that some optimizations are more effective if they're applied
at a higher level (in the GLSL compiler, for example). But that's another
topic of conversation.

-Brian |
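Brian's point about nested early returns can be illustrated with a hedged
C sketch (a hypothetical helper, not Mesa compiler code): the early return
becomes a flag that predicates every statement which could execute after
the original return point, so control always falls through to the end and
the body can be spliced into a caller safely.

```c
#include <assert.h>

/* Original helper with a deeply nested early return; inlining this
 * body verbatim would turn the "return -1" into a return from the
 * CALLER, which is wrong. */
static int helper(int x)
{
    if (x > 10) {
        if (x > 100)
            return -1;          /* early return, two levels deep */
        x += 5;
    }
    return x;
}

/* Inlinable form: a 'returned' flag replaces the early return, and
 * everything that could run after the original return point is
 * predicated on it.  Control now always reaches the end of the body. */
static int helper_flattened(int x)
{
    int returned = 0;
    int result = 0;
    if (x > 10) {
        if (x > 100) {
            result = -1;
            returned = 1;
        }
        if (!returned)
            x += 5;
    }
    if (!returned)
        result = x;
    return result;
}
```

The two versions compute the same function; only the flattened one can be
dropped into a caller's body without extra control-flow surgery.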
From: Zack R. <za...@vm...> - 2010-03-30 17:33:37
|
On Tuesday 30 March 2010 12:52:54 Luca Barbieri wrote:
> > There are several deep challenges in making TGSI <-> LLVM IR translation
> > lossless -- I'm sure we'll get around to overcome them -- but I don't
> > think that using LLVM is a requirement for this module. Having a shared
> > IR for simple TGSI optimization module would go a long way by itself.
>
> What are these challenges?

Besides what Brian just pointed out, it's also worth noting that the one
problem everyone dreads is creating an LLVM code generator for TGSI.
Everyone seems to agree that it's a darn complicated task with a somewhat
undefined scope. It's obviously something that will be mandatory for
OpenCL, but I doubt anyone will touch it before it's an absolute must. |
From: Luca B. <luc...@gm...> - 2010-03-30 17:05:43
|
DDX/DDY could cause miscompilation, but I think that only happens if LLVM
clones them or causes some paths to not execute them.

Someone proposed some time ago on llvmdev to add a flag telling LLVM to
never duplicate an intrinsic; I'm not sure if that went through (IIRC, it
was for a barrier instruction that relied on the instruction pointer).
Alternatively, it should be possible to just disable any passes that clone
basic blocks when those instructions are present.

The non-execution problem should be fixable by declaring DDX/DDY to have
global-write-like side effects (this will prevent dead-code elimination of
them if they are totally unused, but hopefully shaders are not written so
badly that they need that). |
From: Corbin S. <mos...@gm...> - 2010-03-30 17:10:57
|
On Tue, Mar 30, 2010 at 10:05 AM, Luca Barbieri <luc...@gm...> wrote:
> DDX/DDY could cause miscompilation, but I think that only happens if
> LLVM clones them or causes some paths to not execute them.
<snip>

We're talking about a HW-specific issue here, not anything that needs
global changes. I'm really not sure where you're going with this.

-- 
When the facts change, I change my mind. What do you do, sir? ~ Keynes

Corbin Simpson
<Mos...@gm...> |