From: Luca B. <luc...@gm...> - 2010-04-04 20:28:44
> Does it mean there will be format fallbacks? Because dword-unaligned but
> still pretty common (i.e. GL1.1) vertex formats aren't supported by r300,
> most often we hit R16G16B16. What will happen when is_format_supported says
> NO to such a format? I hope it won't share the fate of PIPE_CAP_SM3, which
> every in-tree state tracker ignores.

I'm not sure I understand correctly what you are saying. The idea is to do as you did in your patch, but instead of calling screen->get_param(screen, PIPE_CAP_HALF_FLOAT_VERTEX), call screen->is_format_supported(screen, PIPE_FORMAT_R16G16B16A16, PIPE_BUFFER, ..., ...). The PIPE_BUFFER target is supported in gallium-resources, but I'm not sure whether this way of querying vertex formats is supported; it would probably need to be added first.

If you mean that r300 doesn't support R16G16B16, I suppose you can just use R16G16B16A16 and ignore the extra fetched w element (the vertex buffer stride will make this work properly). However, if non-dword-aligned vertex buffer strides or vertex element offsets are not supported, I think you have a serious problem, which is however independent of half-float vertices, since I don't think OpenGL places any alignment constraints on those values (correct me if I'm wrong).
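For illustration, a minimal sketch of the query-plus-fallback idea described above (the helper name and the trailing is_format_supported arguments are assumptions for the example, not actual Gallium code, and the exact signature of that era may differ):

/* Hypothetical helper: ask the screen for the packed 3-component
 * half-float vertex format, fall back to the 4-component one and let
 * the vertex buffer stride hide the extra w element. */
static enum pipe_format
pick_half_float_vertex_format(struct pipe_screen *screen)
{
   /* The usage/geometry flags are left as 0 purely for illustration. */
   if (screen->is_format_supported(screen, PIPE_FORMAT_R16G16B16_FLOAT,
                                   PIPE_BUFFER, 0, 0))
      return PIPE_FORMAT_R16G16B16_FLOAT;

   /* Fetch xyzw but only use xyz; the stride keeps the layout intact. */
   return PIPE_FORMAT_R16G16B16A16_FLOAT;
}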
From: Vinson L. <vl...@vm...> - 2010-04-04 20:07:05
> -----Original Message-----
>
> I see Vinson committed a better fix to the 7.8 branch. However,
> Vinson, I think you made a typo:
>
> #elif defined(PIPE_CC_GCC) && (PIPE_CC_GCC_VERSION >= 401)
>
> That version should be "410", not "401", right?

The PIPE_CC_GCC_VERSION macro doesn't use __GNUC_PATCHLEVEL__.

src/gallium/include/pipe/p_config.h:
#if defined(__GNUC__)
#define PIPE_CC_GCC
#define PIPE_CC_GCC_VERSION (__GNUC__ * 100 + __GNUC_MINOR__)
#endif

printf("PIPE_CC_GCC_VERSION: %d\n", PIPE_CC_GCC_VERSION);
PIPE_CC_GCC_VERSION: 402

$ gcc --version
i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5646) (dot 1)
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
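To spell out the arithmetic: the macro encodes major.minor as major * 100 + minor, so gcc 4.1 (the first release with the __sync_* atomic builtins) maps to 401 and the ">= 401" check is correct as written. A standalone demonstration (not Mesa code):

#include <stdio.h>

/* Same encoding as PIPE_CC_GCC_VERSION: major * 100 + minor, no patchlevel. */
#define GCC_VERSION_CODE(major, minor) ((major) * 100 + (minor))

int main(void)
{
   printf("gcc 4.0 -> %d\n", GCC_VERSION_CODE(4, 0)); /* 400: no __sync_* builtins */
   printf("gcc 4.1 -> %d\n", GCC_VERSION_CODE(4, 1)); /* 401: atomics available */
   printf("gcc 4.2 -> %d\n", GCC_VERSION_CODE(4, 2)); /* 402: the value printed above */
   return 0;
}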
From: Marek O. <ma...@gm...> - 2010-04-04 20:04:53
On Sun, Apr 4, 2010 at 9:41 PM, Luca Barbieri <luc...@gm...> wrote:
> There was some talk about doing the query with a vertex buffer target
> for is_format_supported.

Does it mean there will be format fallbacks? Because dword-unaligned but still pretty common (i.e. GL1.1) vertex formats aren't supported by r300, most often we hit R16G16B16. What will happen when is_format_supported says NO to such a format? I hope it won't share the fate of PIPE_CAP_SM3, which every in-tree state tracker ignores.

-Marek
From: Luca B. <luc...@gm...> - 2010-04-04 19:41:40
There was some talk about doing the query with a vertex buffer target for is_format_supported. After gallium-resources is merged, this should be automatically possible.

BTW, the st/mesa patch originally was from Dave Airlie and was slightly changed by me.
From: Marek O. <ma...@gm...> - 2010-04-04 19:12:54
Hi devs,

I (and Luca mostly) have made some simple patches which add GL_ARB_half_float_vertex to Gallium.

Author: Marek Olšák <ma...@gm...>
    r300g: enable half float vertex
    st/mesa: query for half float vertex support

Author: Luca Barbieri <lu...@lu...>
    st/mesa: half float vertex support

Please review the commits here:
http://cgit.freedesktop.org/~mareko/mesa/log/?h=half-float-vertex

Please let me know whether I may push this.

Cheers,
-Marek
From: Henri V. <hve...@gm...> - 2010-04-04 17:24:46
It uses ctx->VertexProgram._Current.
---
 src/mesa/main/state.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/src/mesa/main/state.c b/src/mesa/main/state.c
index 589029d..b971cc9 100644
--- a/src/mesa/main/state.c
+++ b/src/mesa/main/state.c
@@ -582,9 +582,6 @@ _mesa_update_state_locked( GLcontext *ctx )
    if (new_state & _DD_NEW_SEPARATE_SPECULAR)
       update_separate_specular( ctx );
 
-   if (new_state & (_NEW_ARRAY | _NEW_PROGRAM | _NEW_BUFFER_OBJECT))
-      update_arrays( ctx );
-
    if (new_state & (_NEW_BUFFERS | _NEW_VIEWPORT))
       update_viewport_matrix(ctx);
 
@@ -620,6 +617,8 @@
       new_prog_state |= update_program( ctx );
    }
 
+   if (new_state & (_NEW_ARRAY | _NEW_PROGRAM | _NEW_BUFFER_OBJECT))
+      update_arrays( ctx );
 out:
    new_prog_state |= update_program_constants(ctx);
-- 
1.6.4.4
From: tom f. <tf...@al...> - 2010-04-04 16:53:42
Jeremy Huddleston <jer...@fr...> writes:
>
> On Apr 3, 2010, at 12:34, tom fogal wrote:
>
> > Vinson Lee <vl...@vm...> writes:
> >> Leopard uses gcc-4.0, which didn't have built-in support for atomic
> >> variables.
> >
> > u_atomic.h should probably check for a supported compiler; Jeremy, does
> > the attached patch produce an understandable error instead of a link
> > error?
>
> Yeah, that bails appropriately, but the message should probably be something
> more like:
>
> #error "gallium requires a compiler that supports gcc atomics."

I see Vinson committed a better fix to the 7.8 branch. However, Vinson, I think you made a typo:

#elif defined(PIPE_CC_GCC) && (PIPE_CC_GCC_VERSION >= 401)

That version should be "410", not "401", right?

-tom
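The guard under discussion would look roughly like the following (the PIPE_ATOMIC_GCC_INTRINSIC define and the exact version cutoff are illustrative assumptions, not the change that was actually committed):

/* Use the gcc __sync_* intrinsics only when the compiler is new enough;
 * otherwise fail with a clear message instead of an obscure link error. */
#if defined(PIPE_CC_GCC) && (PIPE_CC_GCC_VERSION >= 401)
#define PIPE_ATOMIC_GCC_INTRINSIC
#else
#error "gallium requires a compiler that supports gcc atomics"
#endif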
From: Chia-I Wu <ol...@gm...> - 2010-04-04 14:04:16
On Sat, Apr 3, 2010 at 11:51 PM, Jakob Bornecrantz <wal...@gm...> wrote:
> On Sun, Mar 28, 2010 at 6:13 PM, Chia-I Wu <ol...@gm...> wrote:
>> This patch series adds support for GL_OES_EGL_image to st/mesa. The first
>> patch implements st_manager::get_egl_image in st/egl. The hook is used to
>> check and return an st_egl_image, which describes an EGLImageKHR. The second
>> patch implements GL_OES_EGL_image in st/mesa, and the last patch adds a demo
>> for the new functionality. I've tested it with egl_x11_i915.so, but it should
>> work with other hardware drivers.
>> Do you mind having a look at the patches, especially the first one? I'd like
>> to hear your opinions before merging the patches, and going on to work on
>> EGLImage support in st/dri.
> Terribly sorry for taking this long to reply. The patches look good, go
> ahead and commit. Regarding EGLImage in st/dri, don't let me stop you
> if you have an itch to do it. If I get time over sometime I'll ask you
> then if you have done anything.

That's fine. I will rebase the patches and commit soon.

I might need to switch my focus to Windows for a while. I guess I will use the chance to convert st/wgl to st_api.h and finally drop st_public.h. Until it is too painful to work on Windows and I need to work on something fun, I will add EGLImage support to st/dri after dropping st_public.h.

> And again, thanks for the hard work!
From: Dave A. <ai...@gm...> - 2010-04-04 10:31:16
Hey,

So I was trying to fix the tfp test on r300g, and ran into an issue with the dri st, I think.

The way TFP works, we get dri2_set_tex_buffer, which then validates the attachment but ignores the format passed in. So r300g picks up the kernel buffer from the handle and sets up the texture + texture state without the format information. Once we've validated, we call ctx->st->teximage and can give it a different format; however, at no point does r300g get any place to change the texture format and update its internal state.

I'm not sure whether r300g should delay setting up its internal state for emission until later or whether we need to enhance the st interface.

The main issue is that we get a TFP with a B8G8R8X8 but the visual is B8G8R8A8, which triggers this.

Dave.
From: Marek O. <ma...@gm...> - 2010-04-04 05:30:08
On Sun, Apr 4, 2010 at 6:14 AM, Tom Stellard <tst...@gm...> wrote:
> On Sun, Apr 04, 2010 at 01:09:51AM +0200, Marek Olšák wrote:
> >
> > Since Nicolai has already implemented the branch emulation and some other
> > optimizations, it would be nice to take over his work. I tried to use the
> > branch emulation on vertex shaders and it did not work correctly, I guess it
> > needs a little fixing. See this branch in his repo:
> > http://cgit.freedesktop.org/~nh/mesa/log/?h=r300g-glsl
> > Especially this commit implements exactly what you propose (see comments in
> > the code):
> > http://cgit.freedesktop.org/~nh/mesa/commit/?h=r300g-glsl&id=71c8d4c745da23b0d4f3974353b19fad89818d7f
> >
> > Reusing this code for Gallium seems more reasonable to me than reinventing
> > the wheel and doing basically the same thing elsewhere. I recommend
> > implementing a TGSI backend in the r300 compiler, which will make it possible
> > to use it with TGSI shaders. So basically a TGSI shader would be converted to
> > the RC representation the way it's done in r300g right now, and code for
> > converting RC -> hw code would get replaced by conversion RC -> TGSI. Both
> > RC and TGSI are very similar so it'll be pretty straightforward. With a TGSI
> > backend, another step would be to make a nice hw-independent and
> > configurable interface on top of it which should go to util. So far it's
> > simple, now comes some real work: fixing the branch emulation and continuing
> > from (2) in your list.
>
> I am not sure if I follow you here, so let me know if I am understanding
> this correctly. What you are suggesting is to take Nicolai's branch,
> which right now does TGSI -> RC -> Branch Emulation in RC -> hw code and
> instead of converting from RC to hw code convert from RC back into TGSI.

That's right.

> Then, pull the TGSI -> RC -> Branch Emulation in RC -> TGSI path out of
> the r300 compiler and place it in gallium/auxiliary/util so it can be used
> by other Gallium drivers that want to emulate branches. Is this correct?

Sorry, I should have been more clear. The whole RC may stay in src/mesa/drivers/dri/r300/compiler as it is now. I think these are the parts that should go to util:
- TGSI -> RC conversion
- RC -> TGSI conversion
- A hw-independent interface to the compiler, i.e. one function (or more) which takes a TGSI shader and returns a TGSI shader. It should do both conversions above and use r300/compiler directly.

In the long term, the compiler should probably be moved to src/compiler or something like that (since both classic and gallium drivers may use it), but you don't need to care about that if you don't want to.

-Marek
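A rough sketch of what such a hw-independent util entry point could look like (all names and the options struct here are illustrative assumptions, not existing code):

/* Hypothetical interface: run the r300 compiler's hw-independent passes
 * (branch emulation, dead-code elimination, ...) on a TGSI shader and
 * return a newly allocated, transformed TGSI token stream. */

struct tgsi_token;

struct util_shader_compiler_options {
   unsigned emulate_branches:1;   /* lower IF/ELSE/ENDIF to selects */
   unsigned unroll_loops:1;       /* unroll loops with known trip counts */
   unsigned max_temps;            /* hw limit the passes must respect */
};

const struct tgsi_token *
util_optimize_tgsi(const struct tgsi_token *tokens,
                   const struct util_shader_compiler_options *options);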
From: Jeremy H. <jer...@fr...> - 2010-04-04 04:33:43
On Apr 3, 2010, at 12:34, tom fogal wrote:

> Vinson Lee <vl...@vm...> writes:
>> Leopard uses gcc-4.0, which didn't have built-in support for atomic
>> variables.
>
> u_atomic.h should probably check for a supported compiler; Jeremy, does
> the attached patch produce an understandable error instead of a link
> error?

Yeah, that bails appropriately, but the message should probably be something more like:

#error "gallium requires a compiler that supports gcc atomics."

> In terms of a solution, Jeremy, you could implement PPC assembly for
> the few primitives available there. Looks easy for someone who knows
> PPC well. I only know MIPS and a splash (more than I'd like) of X86.

Seeing as how ppc is not in any shipping Apple products any more, and it's easy enough to force users to update to gcc-4.2, I think an #error is good enough.
From: Tom S. <tst...@gm...> - 2010-04-04 04:16:29
On Sun, Apr 04, 2010 at 01:09:51AM +0200, Marek Olšák wrote:
>
> Since Nicolai has already implemented the branch emulation and some other
> optimizations, it would be nice to take over his work. I tried to use the
> branch emulation on vertex shaders and it did not work correctly, I guess it
> needs a little fixing. See this branch in his repo:
> http://cgit.freedesktop.org/~nh/mesa/log/?h=r300g-glsl
> Especially this commit implements exactly what you propose (see comments in
> the code):
> http://cgit.freedesktop.org/~nh/mesa/commit/?h=r300g-glsl&id=71c8d4c745da23b0d4f3974353b19fad89818d7f
>
> Reusing this code for Gallium seems more reasonable to me than reinventing
> the wheel and doing basically the same thing elsewhere. I recommend
> implementing a TGSI backend in the r300 compiler, which will make it possible
> to use it with TGSI shaders. So basically a TGSI shader would be converted to
> the RC representation the way it's done in r300g right now, and code for
> converting RC -> hw code would get replaced by conversion RC -> TGSI. Both
> RC and TGSI are very similar so it'll be pretty straightforward. With a TGSI
> backend, another step would be to make a nice hw-independent and
> configurable interface on top of it which should go to util. So far it's
> simple, now comes some real work: fixing the branch emulation and continuing
> from (2) in your list.

I am not sure if I follow you here, so let me know if I am understanding this correctly. What you are suggesting is to take Nicolai's branch, which right now does TGSI -> RC -> Branch Emulation in RC -> hw code, and instead of converting from RC to hw code, convert from RC back into TGSI. Then, pull the TGSI -> RC -> Branch Emulation in RC -> TGSI path out of the r300 compiler and place it in gallium/auxiliary/util so it can be used by other Gallium drivers that want to emulate branches. Is this correct?

-Tom
From: Zack R. <za...@vm...> - 2010-04-04 00:42:34
On Saturday 03 April 2010 19:07:59 Luca Barbieri wrote:
> > Gallium. Obviously a code-generator that can handle control-flow (to be
> > honest I'm really not sure why you want to restrict it to something
> > without control-flow in the first place).
>
> The no-control-flow was just for the first step, with a second step
> supporting everything.

k, that's good.

> > Having said that I'm not sure whether this is something that's a good
> > GSOC project. It's a fairly difficult piece of code to write. One that to
> > do right will depend on adding some features to TGSI (a good source of
> > inspiration for those would be AMD's CAL and NVIDIA's PTX
> > http://developer.amd.com/gpu_assets/ATI_Intermediate_Language_(IL)_Specification_v2b.pdf
> > http://www.nvidia.com/content/CUDA-ptx_isa_1.4.pdf )
>
> This would be required to handle arbitrary LLVM code (e.g. for
> clang/OpenCL use), but since GLSL shader code starts as TGSI, it
> should be possible to convert it back without TGSI.

Which of course means you have to have that reduced scope and well-defined constraints that I mentioned. Otherwise it's gonna be impossible to judge the success of the project.

> I'd say, as an initial step, restricting to code produced by
> TGSI->LLVM (AoS) that can be expressed with no intrinsics, having a
> single basic block, with no optimization passes having been run on it.
> All 4 restrictions (from TGSI->LLVM, no intrinsics, single BB and no
> optimizations) can then be lifted in successive iterations.

Yes, that's all fine; just like the above, it would simply have to be defined, e.g. no texture sampling (since for that stuff we'd obviously want our intrinsics) and whatever other features that go with it.

> The problem I see is that since OpenCL will hopefully be done at some
> point, then as you say TGSI->LLVM will also be done, and that will
> probably make any other optimization work irrelevant.

OpenCL has no need for TGSI->LLVM translation. It deals only with LLVM IR inside.

> So basically the r300 optimization work looks doomed from the
> beginning to be eventually obsoleted.

Well, if that was the attitude we'd never get anything done: in 10 years the work we're doing right now will be obsoleted, in 50 Gallium in general will probably be obsoleted, and in 100 we'll be dead (except me, I decided that I'll live forever and so far so good), so what's the point? Writing something simple well is still a lot better than writing something hard badly.

The point of GSOC is not to nail your first Nobel prize, it's to contribute to a Free Software project and ideally keep you interested so that you keep contributing. Picking insanely hard projects is counter-productive even if technically they do make sense. Just like for a GSOC for the Linux kernel you'd suggest someone improve Ext4 rather than write a whole new file system, even if long term you'll want something better than Ext4 anyway. Or at least that's what I'd suggest, but that's probably because, in general, I'm just not into sadism.

z
From: Zack R. <za...@vm...> - 2010-04-04 00:09:18
On Saturday 03 April 2010 18:58:36 Marek Olšák wrote:
> On Sun, Apr 4, 2010 at 12:10 AM, Zack Rusin <za...@vm...> wrote:
> > I thought the initial proposal was likely a lot more feasible for a GSOC
> > (of course there one has to point out that Mesa's GLSL compiler already
> > does unroll loops and in general simplifies control-flow so the points #1
> > and #2 are largely no-ops, but surely there's enough work on Gallium
> > Radeon's drivers left to keep Tom busy). Otherwise having a well-defined
> > and reduced scope with clear deliverables would be rather necessary for
> > LLVM->TGSI code because that is not something that you could get rock
> > solid over a summer.
>
> It doesn't seem to simplify branches or unroll loops that much, if at all.

It does for cases where the arguments are known.

> It fails even for the simplest cases like this one:
>
> if (gl_Vertex.x < 30.0)

which is unknown at compilation time.

z
From: Luca B. <luc...@gm...> - 2010-04-03 23:41:52
>> So basically the r300 optimization work looks doomed from the
>> beginning to be eventually obsoleted.
>
> Please consider there are hw-specific optimizations in place which I think
> no other compiler framework provides, and I believe this SSA thing will do

Sure, but it seemed to me that all the optimizations proposed were hardware-independent and valid for any driver (other than having to know about generic capabilities like having control flow or not).

> even better job for superscalar r600. So yes, we need both LLVM to do global
> optimizations and RC to efficiently map code to hw.

LLVM also uses SSA form (actually, it is totally built around it), assuming that's what you meant.

There are doubts about whether the LLVM backend framework works well for GPUs or not (apparently because some GPUs are VLIW and only IA-64 is VLIW too, so LLVM support for it is either nonexistent or not necessarily a major focus), but using LLVM->TGSI makes this irrelevant, since the existing TGSI-accepting backend will still run.
From: Marek O. <ma...@gm...> - 2010-04-03 23:31:44
On Sun, Apr 4, 2010 at 1:07 AM, Luca Barbieri <luc...@gm...> wrote:
> So basically the r300 optimization work looks doomed from the
> beginning to be eventually obsoleted.

Please consider there are hw-specific optimizations in place which I think no other compiler framework provides, and I believe this SSA thing will do even better job for superscalar r600. So yes, we need both LLVM to do global optimizations and RC to efficiently map code to hw.

-Marek
From: Marek O. <ma...@gm...> - 2010-04-03 23:09:58
On Sat, Apr 3, 2010 at 9:31 AM, Tom Stellard <tst...@gm...> wrote:
> 1. Enable branch emulation for Gallium drivers:
> The goal of this task will be to create an optional "optimization" pass
> over the TGSI code to translate branch instructions into instructions
> that are supported by cards without hardware branching. The basic
> strategy for doing this translation will be:
>
> A. Copy values of in-scope variables to a temporary location before
> executing the conditional statement.
>
> B. Execute the "if true" branch.
>
> C. Test the conditional expression. If it evaluates to false, roll back
> all values that were modified in the "if true" branch.
>
> D. Repeat step B with the "if false" branch, and then step C, but this
> time only roll back if the conditional expression evaluates to true.
>
> The TGSI instructions SLT, SNE, SGE, SEQ will be used to test the
> conditional expression and the instruction CND will be used to roll back
> the values.
>
> There will be two phases to this task. For phase 1, I will implement a
> simple translator that will be able to translate the branch instructions
> with only one pass through the TGSI code. This simple translator will
> copy all in-scope variables to a temporary location before executing the
> conditional statement, even if those variables will not be modified
> in either of the branches.
>
> Phase 2 will add a preliminary pass before the code translation
> pass that will mark variables that might be modified by the conditional
> statement. Then, during the translation pass, only the variables that
> could potentially be modified inside either of the conditional branches
> will be copied before the conditional statement is executed.

First, I really appreciate you're looking into this. I'd like to propose something doable in the GSoC timeframe; a sketch of the rollback/select idea from steps A-D is also given after this mail.

Since Nicolai has already implemented the branch emulation and some other optimizations, it would be nice to take over his work. I tried to use the branch emulation on vertex shaders and it did not work correctly; I guess it needs a little fixing. See this branch in his repo:
http://cgit.freedesktop.org/~nh/mesa/log/?h=r300g-glsl

Especially this commit implements exactly what you propose (see comments in the code):
http://cgit.freedesktop.org/~nh/mesa/commit/?h=r300g-glsl&id=71c8d4c745da23b0d4f3974353b19fad89818d7f

Reusing this code for Gallium seems more reasonable to me than reinventing the wheel and doing basically the same thing elsewhere. I recommend implementing a TGSI backend in the r300 compiler, which will make it possible to use it with TGSI shaders. So basically a TGSI shader would be converted to the RC representation the way it's done in r300g right now, and the code for converting RC -> hw code would get replaced by a conversion RC -> TGSI. Both RC and TGSI are very similar, so it'll be pretty straightforward. With a TGSI backend, another step would be to make a nice hw-independent and configurable interface on top of it, which should go to util. So far it's simple; now comes some real work: fixing the branch emulation and continuing from (2) in your list.

Then it'll be up to developers of other drivers whether they want to implement their own hw-specific optimization passes and lowering transformations.

Even linking various shaders would be much easier done with the compiler (and more efficient, with its elimination of dead code due to removed shader outputs/inputs); this is used in classic r300, and I recall Luca wanted such a feature in nouveau drivers. There is also an emulation of shadow samplers, WPOS, and an emulation of various instructions, so this is a nice and handy tool. (I would do it but I have a lot of more important stuff to do.)

This may really help Gallium drivers until a real optimization framework emerges.

-Marek
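For reference, a minimal sketch of the rollback/select idea from steps A-D above, written as scalar C for clarity (purely illustrative; real TGSI would operate on vec4 registers with SLT/CND and contain no actual branches):

/* Flatten "if (lhs < rhs) x = then_val; else x = else_val;" for hardware
 * without branch instructions: execute both sides into temporaries,
 * then select the surviving value with the condition. */
float emulate_branch(float lhs, float rhs, float then_val, float else_val)
{
   /* SLT-style comparison: 1.0 if true, 0.0 if false. */
   float cond = (lhs < rhs) ? 1.0f : 0.0f;

   /* "Execute" both branches unconditionally. */
   float if_true  = then_val;
   float if_false = else_val;

   /* CND-style select: keep the then-result where cond is set,
    * roll back to the else-result everywhere else. */
   return cond * if_true + (1.0f - cond) * if_false;
}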
From: Luca B. <luc...@gm...> - 2010-04-03 23:08:06
> Gallium. Obviously a code-generator that can handle control-flow (to be honest
> I'm really not sure why you want to restrict it to something without
> control-flow in the first place).

The no-control-flow was just for the first step, with a second step supporting everything.

> Having said that I'm not sure whether this is something that's a good GSOC
> project. It's a fairly difficult piece of code to write. One that to do right
> will depend on adding some features to TGSI (a good source of inspiration for
> those would be AMD's CAL and NVIDIA's PTX
> http://developer.amd.com/gpu_assets/ATI_Intermediate_Language_(IL)_Specification_v2b.pdf
> http://www.nvidia.com/content/CUDA-ptx_isa_1.4.pdf )

This would be required to handle arbitrary LLVM code (e.g. for clang/OpenCL use), but since GLSL shader code starts as TGSI, it should be possible to convert it back without TGSI.

> I thought the initial proposal was likely a lot more feasible for a GSOC (of
> course there one has to point out that Mesa's GLSL compiler already does
> unroll loops and in general simplifies control-flow so the points #1 and #2 are
> largely no-ops, but surely there's enough work on Gallium Radeon's drivers
> left to keep Tom busy). Otherwise having a well-defined and reduced scope with
> clear deliverables would be rather necessary for LLVM->TGSI code because that
> is not something that you could get rock solid over a summer.

I'd say, as an initial step, restricting to code produced by TGSI->LLVM (AoS) that can be expressed with no intrinsics, having a single basic block, with no optimization passes having been run on it. All 4 restrictions (from TGSI->LLVM, no intrinsics, single BB and no optimizations) can then be lifted in successive iterations.

Of course, yes, it has a different scope than the original proposal.

The problem I see is that since OpenCL will hopefully be done at some point, then as you say TGSI->LLVM will also be done, and that will probably make any other optimization work irrelevant. So basically the r300 optimization work looks doomed from the beginning to be eventually obsoleted.

That said, you may want to do it anyway. But if you really want a quick fix for r300, seriously, just use the nVidia Cg compiler. It's closed source, but being produced by the nVidia team, you can generally rely on it not sucking. It takes GLSL input and spits out optimized ARB_fragment_program (or optionally other languages), so it is trivial to interface with it. It could even be useful to compare the output/performance of that with a more serious LLVM-based solution, to make sure we get the latter right.

For instance, personally, I did work on the nv30/nv40 shader assembler (note the word "assembler" here), and haven't done anything more than simple local transforms, for exactly this reason.

The only thing I've done for LLVM->TGSI is trying to recover Stephane Marchesin's work on LLVM (forgot to CC him too), lost in a hard drive crash, but I failed to find anyone having pulled it.
From: Marek O. <ma...@gm...> - 2010-04-03 22:58:44
On Sun, Apr 4, 2010 at 12:10 AM, Zack Rusin <za...@vm...> wrote:
> I thought the initial proposal was likely a lot more feasible for a GSOC (of
> course there one has to point out that Mesa's GLSL compiler already does
> unroll loops and in general simplifies control-flow so the points #1 and #2 are
> largely no-ops, but surely there's enough work on Gallium Radeon's drivers
> left to keep Tom busy). Otherwise having a well-defined and reduced scope with
> clear deliverables would be rather necessary for LLVM->TGSI code because that
> is not something that you could get rock solid over a summer.

It doesn't seem to simplify branches or unroll loops that much, if at all. It fails even for the simplest cases like this one:

if (gl_Vertex.x < 30.0)
    gl_FrontColor = vec4(1.0, 0.0, 0.0, 0.0);
else
    gl_FrontColor = vec4(0.0, 1.0, 0.0, 0.0);

This gets translated to TGSI "as is", which is fairly... you know what.

-Marek
From: Corbin S. <mos...@gm...> - 2010-04-03 22:26:39
On Sat, Apr 3, 2010 at 3:10 PM, Zack Rusin <za...@vm...> wrote:
> On Saturday 03 April 2010 17:17:46 Luca Barbieri wrote:
>> <snipped walls of text>
>
> From the compute support side, LLVM->TGSI translation isn't even about
> optimizations; it's about "working". Writing a full C/C++ compiler that
> generates TGSI is a lot less realistic than reusing Clang and writing a TGSI
> code-generator for it.
> So the LLVM code-generator for TGSI would be a very high impact project for
> Gallium. Obviously a code-generator that can handle control-flow (to be honest
> I'm really not sure why you want to restrict it to something without
> control-flow in the first place).
>
> Having said that I'm not sure whether this is something that's a good GSOC
> project. It's a fairly difficult piece of code to write. One that to do right
> will depend on adding some features to TGSI (a good source of inspiration for
> those would be AMD's CAL and NVIDIA's PTX
> http://developer.amd.com/gpu_assets/ATI_Intermediate_Language_(IL)_Specification_v2b.pdf
> http://www.nvidia.com/content/CUDA-ptx_isa_1.4.pdf )
>
> I thought the initial proposal was likely a lot more feasible for a GSOC (of
> course there one has to point out that Mesa's GLSL compiler already does
> unroll loops and in general simplifies control-flow so the points #1 and #2 are
> largely no-ops, but surely there's enough work on Gallium Radeon's drivers
> left to keep Tom busy). Otherwise having a well-defined and reduced scope with
> clear deliverables would be rather necessary for LLVM->TGSI code because that
> is not something that you could get rock solid over a summer.

Agreed. There are some things here that need to be kept in mind:

1) r300/r500 are not architectures powerful enough to merit general compilation, and they don't mesh well with LLVM. The hand-written optimizations we already have in place are fine for these chipsets.

2) We should leverage LLVM when possible, since we're going to be increasingly dependent on it anyway.

3) Common code goes up, specialized code goes down. That's the entire point of Gallium. Specialized compiler passes that operate on TGSI but are only consumed by one driver should move down into the driver.

I think that the first two parts of Tom's original proposal would be better spent on r300 only, taking nha's r300g-glsl work and cleaning and perfecting it. If we can pass all of the GLSL tests (save for the NOISE test) on r300, then we will be far better off as opposed to working on TGSI towards the same goal.

~ C.

-- 
When the facts change, I change my mind. What do you do, sir? ~ Keynes
Corbin Simpson <Mos...@gm...>
From: Zack R. <za...@vm...> - 2010-04-03 22:11:58
On Saturday 03 April 2010 17:17:46 Luca Barbieri wrote:
> >> (2) Write a LLVM->TGSI backend, restricted to programs without any
> >> control flow
> >
> > I think (2) is probably the closest to what I am proposing, and it is
> > something I can take a look at.
<snip>
> By the way, it would be interesting to know what people who are
> working on related things think about this (CCed them).
> In particular, Zack Rusin has worked extensively with LLVM and I think
> a prototype OpenCL implementation.

From the compute support side, LLVM->TGSI translation isn't even about optimizations; it's about "working". Writing a full C/C++ compiler that generates TGSI is a lot less realistic than reusing Clang and writing a TGSI code-generator for it.

So the LLVM code-generator for TGSI would be a very high impact project for Gallium. Obviously a code-generator that can handle control-flow (to be honest I'm really not sure why you want to restrict it to something without control-flow in the first place).

Having said that I'm not sure whether this is something that's a good GSOC project. It's a fairly difficult piece of code to write. One that to do right will depend on adding some features to TGSI (a good source of inspiration for those would be AMD's CAL and NVIDIA's PTX:
http://developer.amd.com/gpu_assets/ATI_Intermediate_Language_(IL)_Specification_v2b.pdf
http://www.nvidia.com/content/CUDA-ptx_isa_1.4.pdf )

I thought the initial proposal was likely a lot more feasible for a GSOC (of course there one has to point out that Mesa's GLSL compiler already does unroll loops and in general simplifies control-flow, so the points #1 and #2 are largely no-ops, but surely there's enough work on Gallium Radeon's drivers left to keep Tom busy). Otherwise having a well-defined and reduced scope with clear deliverables would be rather necessary for LLVM->TGSI code because that is not something that you could get rock solid over a summer.

z
From: Luca B. <luc...@gm...> - 2010-04-03 21:17:53
> I agree with you that doing these kinds of optimizations is a difficult
> task, but I am trying to focus my proposal on emulating branches and
> loops for older hardware that doesn't have branching instructions rather
> than performing global optimizations on the TGSI code. I don't think
> most of the loop optimizations you listed are even possible on hardware
> without branching instructions.

Yes, that's possible. In fact, if you unroll loops, those optimizations can be done after loop unrolling.

This does not however necessarily change things, since while you can e.g. avoid loop-invariant code motion, you still need common subexpression elimination to remove the multiple redundant copies of the loop-invariant code generated by unrolling. Also, even loop unrolling needs to find the number of iterations, which at the very least requires simple constant folding, and potentially a whole suite of complex optimizations to work in all possible cases.

Some of the challenges of this were mentioned in a previous thread, as well as LLVM-related issues.

>> (2) Write a LLVM->TGSI backend, restricted to programs without any control flow
>
> I think (2) is probably the closest to what I am proposing, and it is
> something I can take a look at.

Note that this means an _input_ program without control flow, that is, a control flow graph with a single basic block. Once you have more than one basic block, you need to convert the CFG for an arbitrary graph to something made of structured loops and conditionals.

The problem here is that GPUs often use a "SIMT" approach. This means that the GPU internally works like an SSE CPU with vector registers (but often much wider, with up to 32 elements or even more). However, this is hidden from the programmer by putting the variables related to several pixels in the vector, and making you think everything is a scalar or just a 4-component vector.

This works fine as long as there is no control flow; however, when you reach a conditional jump, some pixels may want to take one path and some others another path. The solution is to have an "execution mask" and not write to any pixels not in the execution mask. When an if/else/endif structure is encountered, if the pixels all take the same path, things work like on CPUs; if that is not the case, both branches are executed with the appropriate execution masks, and things continue normally after the endif.

The problem here is that this needs a structured if/else/endif formulation as opposed to arbitrary gotos. However, LLVM and most optimizers work in an arbitrary-goto formulation, which needs to be converted to a structured approach.

The above all applies for GPUs with hardware control flow. However, even without it, you have the same issue of reconstructing if/else/endif blocks, since you need to basically do the same in software, using the if conditional to choose between results computed by the branches.

Converting a control flow graph to a structured program is always possible, but doing it well requires some thought. In particular, you need to be careful not to break DDX instructions, which operate on a 2x2 block of pixels, and will thus behave differently if some of the other pixels have diverged away due to control flow modifications. This may require making sure control flow optimizations do not duplicate them, among possibly other issues.

Using an ad-hoc optimizer does indeed sidestep the issue, but only as long as you don't try to do non-trivial control flow optimizations or changes. In that case, those may be best expressed on an arbitrary control flow graph (e.g. the issue with converting "continue" to if/end), and at this point you would need to add that logic anyway.

At any rate, I'm not sure whether this is suitable for your GSoC project or not. My impression is that using an existing compiler would prove to be more widely useful and more long lasting, especially considering that we are moving towards applications and hardware with very complex shader support (consider the CUDA/OpenCL shaders and the very generic GPU shading capabilities). An ad-hoc TGSI optimizer will probably prove unsuitable for efficient code generation for, say, scientific applications using OpenCL, and would need to be replaced later.

So my personal impression (which could be wrong) is that using an existing optimizer, while possibly requiring a higher initial investment, should have much better payoffs in the long run, by making everything beyond the initial TGSI->LLVM->TGSI work already done or easier to do.

From a coding perspective, you lose the "design and write everything myself from scratch" aspect, but you gain experience with a complex and real-world compiler, and are able to write more complex optimizations and transforms due to having a well-developed infrastructure allowing you to express them easily.

Furthermore, hopefully using a real compiler would result in seeing your work produce very good code in all cases, while an ad-hoc optimizer would improve the current situation, but most likely the resulting code would still be blatantly suboptimal. Another advantage would be presumably seeing the work used indefinitely and built upon for projects such as OpenCL/compute shader support.

It may be more or less time consuming, depending on the level of sophistication of the ad-hoc optimizer.

By the way, it would be interesting to know what people who are working on related things think about this (CCed them). In particular, Zack Rusin has worked extensively with LLVM and I think a prototype OpenCL implementation.

Also, PathScale is interested in GPU code generation and may contribute something based on Open64 and its IR, WHIRL. However, I'm not sure whether this could work as a general optimizing framework, or instead just as a backend code generator for some drivers (e.g. nv50). In particular, it may be possible to use LLVM to do architecture-independent optimizations and then convert to WHIRL if such a backend is available for the targeted GPU.

BTW, LLVM seems to me superior to Open64 as an easy-to-use framework for flexibly running existing optimization passes and writing your own (due to the unified IR and existing wide adoption for such purposes), so we may want to have it even if Open64-based GPU backends were to become available; however, I might be wrong on this.

The way I see it, it is a fundamental Mesa/Gallium issue, and should really be solved in a lasting way. See the previous thread for more detailed discussion of the technical issues of an LLVM-based implementation.

Again, I'm not sure whether this is appropriate for this GSoC project, but it seemed quite worthwhile to raise this issue, since if I'm correct, using an existing optimizer (LLVM is the default candidate here) could produce better results and avoid ad-hoc work that would be scrapped later.

I may consider doing this myself, either as a GSoC proposal if still possible, or otherwise, if no one else does it before and time permits (the latter issue is the major problem here...)
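To make the execution-mask description concrete, here is a tiny scalar simulation of how a SIMT machine runs an if/else over a group of lanes (illustrative only; the width and the per-lane loops stand in for what the hardware does in parallel):

#define WIDTH 8   /* illustrative warp/wavefront width */

/* Simulate "if (cond) x = a; else x = b;" the SIMT way: both sides run,
 * and the execution mask decides which lanes are allowed to write. */
void simt_if_else(const int cond[WIDTH], const float a[WIDTH],
                  const float b[WIDTH], float x[WIDTH])
{
   int mask[WIDTH];
   int lane;

   for (lane = 0; lane < WIDTH; lane++)
      mask[lane] = cond[lane];      /* execution mask for the "then" side */

   for (lane = 0; lane < WIDTH; lane++)
      if (mask[lane])
         x[lane] = a[lane];         /* "then" branch: masked writes only */

   for (lane = 0; lane < WIDTH; lane++)
      if (!mask[lane])
         x[lane] = b[lane];         /* "else" branch: inverted mask */

   /* After the endif all lanes continue together under the full mask. */
}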
From: Luca B. <luc...@gm...> - 2010-04-03 20:09:14
> I don't agree with this. Making the format description table mutable when
> the only formats that are potentially unsupported due to patent issues are
> s3tc variants makes no sense. S3TC formats *are* special. There is nothing
> to generalize here.

Yes, I don't like this very much either.

The immediate alternative is to have separate "is_supported" flags for externally-implemented formats, but this also doesn't look perfect to me.

Another thing to look at is to remove both is_supported and the pack/unpack functions, and put them in a separate, possibly mutable, table. In some sense pack/unpack functionality does not really belong in the format description, since many interfaces are possible (for instance llvmpipe has another interface that is code-generated separately for SoA tiles). This last option, with a mutable format access table, seems conceptually the cleanest to me, but I'm not sure.

> Replacing the conditionals with no-op stubs is a good optimization.
> But attempting to load the s3tc shared library from the stubs is unnecessary.
> Stubs should have an assert(0) -- it is an error to attempt any S3TC
> (de)compression when there's no support for it.

The fundamental issue here seems to be: what to do if the application tries to read/write an unsupported format?

Currently, unsupported formats have empty functions rather than assert(0), so I just kept with that convention. Since it is permissible to call other format functions without checking that they are supported, I made S3TC work consistently with that, which requires on-demand loading upon format access.

In general it seems to me that the fact that S3TC (or any other) formats are somehow special should be completely hidden from any user. This allows writing generic, robust, format-independent code. Explicit initialization or ad-hoc format checking goes counter to this, and requires sprinkling code everywhere (for instance, I suspect the rbug texture-examination tools don't work right now in master on S3TC because they don't call util_format_s3tc_init).

It might make sense to make all unsupported formats assert(0). A C++ exception would be the perfect thing since you could catch it, but unfortunately we aren't using C++ right now.

Another option that seems better to me is to have a util_format_get_functions that would either give you a pointer to a table of functions, or return NULL if unsupported, and make this the only way of accessing format conversions. This way, applications will naturally have to check for support before usage, and both GCC and a static checker can be told to flag an error if the util_format_get_functions return value is not checked for NULL before use.

BTW, note that the indirect function calls are also generally slow, and we may want to switch Gallium to C++ and use C++ templates to specialize (and fully inline) whole algorithms for specific formats. llvmpipe and the code generation facilities lessen the need for this, but it might perhaps be worthwhile in some cases. This is a wholly separate issue, but may be worth keeping in mind.
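A rough sketch of what that accessor could look like (the struct layout and function names are assumptions based on the description above, not existing util code):

/* Hypothetical per-format access table: handed out only when the format is
 * actually usable, so callers are forced to handle the unsupported case. */
struct util_format_funcs {
   void (*unpack_rgba_8unorm)(uint8_t *dst, unsigned dst_stride,
                              const uint8_t *src, unsigned src_stride,
                              unsigned width, unsigned height);
   void (*pack_rgba_8unorm)(uint8_t *dst, unsigned dst_stride,
                            const uint8_t *src, unsigned src_stride,
                            unsigned width, unsigned height);
   /* ... float variants, block fetch, etc. ... */
};

/* Returns NULL for formats whose pack/unpack support is absent,
 * e.g. S3TC when the external library could not be loaded. */
const struct util_format_funcs *
util_format_get_functions(enum pipe_format format);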
From: Tom S. <tst...@gm...> - 2010-04-03 19:45:10
On Sat, Apr 03, 2010 at 08:37:39PM +0200, Luca Barbieri wrote:
> This is somewhat nice, but without using a real compiler, the result
> will still be just a toy, unless you employ hundreds of compiler
> experts working full time on the project.
> <SNIP - loop optimization techniques from Wikipedia>
>
> Good luck doing all this on TGSI (especially if the developer does not
> have serious experience writing production compilers).

I agree with you that doing these kinds of optimizations is a difficult task, but I am trying to focus my proposal on emulating branches and loops for older hardware that doesn't have branching instructions rather than performing global optimizations on the TGSI code. I don't think most of the loop optimizations you listed are even possible on hardware without branching instructions.

> Also, this does not mention all the other optimizations and analyses
> required to do the above stuff well (likely another 10-20 things).
>
> Using a real compiler (e.g. LLVM, but also gcc or Open64), those
> optimizations are already implemented, or at least there is already a
> team of experienced compiler developers who are working full time to
> implement such optimizations, allowing you to then just turn them on
> without having to do any of the work yourself.
>
> Note all "X compiler is bad for VLIW or whatever GPU architecture"
> objections are irrelevant, since almost all optimizations are totally
> architecture independent.
>
> Also note that we should support OpenCL/compute shaders (already
> available for *3* years on e.g. nv50) and those *really* need a real
> compiler (as in, something developed for years by a team of compiler
> experts, and in wide use).
> For instance, nVidia uses Open64 to compile CUDA programs, and then
> feeds back the output (via PTX) to their ad-hoc code generator.
>
> Note that unlike Mesa/Gallium, nVidia actually had a working shader
> optimizer AND a large paid team, yet they still decided to at least
> partially use Open64.
>
> PathScale (who seems to mainly sell an Open64-based compiler for the
> HPC market) might do some of this work (with a particular focus on a
> CUDA replacement for nv50), but it's unclear whether this will turn
> out to be generally useful (for all Gallium drivers, as opposed to
> nv50-only) or not.
> Also they plan to use Open64 and WHIRL, and it's unclear whether this
> is as well designed for embedding and as easy to understand and customize
> as LLVM is (please expand on this if you know about it).
>
> Really, the current code generation situation is totally _embarrassing_
> (and r300 is probably one of the best here, having its own compiler,
> and doesn't even have loops, so you can imagine how good the other
> drivers are), and ought to be fixed in a definitive fashion.
>
> This is obviously not achievable if Mesa/Gallium contributors are
> supposed to write the compiler optimizations themselves, since clearly
> there is not even enough manpower to support a relatively up-to-date
> version of OpenGL or, say, to have drivers that can allocate and fence
> GPU memory in a sensible and fast way, or implement hierarchical Z
> buffers, or any of the other things expected from a decent driver,
> that the Mesa drivers don't do.
>
> In other words, state-of-the-art optimizing compilers are not
> something one can just pop up and write himself from scratch, unless
> he is interested and skilled at it, it is his main project AND he
> manages to attract, or pays, a community of compiler experts to work
> on it.
>
> Since LLVM already works well, has a community of compiler experts
> working on it, and is funded by companies such as Apple, there is no
> chance of attracting such a community, especially for something
> limited to the niche of compiling shaders.
>
> And yes, LLVM->TGSI->LLVM is not entirely trivial, but it is doable
> (obviously), and once you get past that initial hurdle, you get
> EVERYTHING FOR FREE.
> And the free work keeps coming with every commit to the llvm
> repository, and you only have to do the minimal work of updating for
> LLVM interface changes.
> So you can just do nothing and after a few months you notice that your
> driver is faster on very advanced games because a new LLVM
> automatically improved the quality of your shaders without you even
> knowing about it.
>
> Not to mention that we could then at some point just get rid of TGSI,
> use LLVM IR directly, and have each driver implement a normal backend
> if possible.
>
> The test for adequacy of a shader compiler is saying "yes, this
> code is really good: I can't easily come up with any way to improve
> it", looking at the generated code for any example you can find.
>
> Any ad-hoc compiler will most likely immediately fail such a test for
> complex examples.

I think that part of the advantage of my proposal is that the branch instruction translation is done on the TGSI code. So, even if the architecture of the GLSL compiler is changed to something like LLVM->TGSI->LLVM, these translations can still be applied by hardware that needs them.

> So, for a GSoC project, I'd kind of suggest:
> (1) Adapt the gallivm/llvmpipe TGSI->LLVM converter to also generate
> AoS code (i.e. RGBA vectors as opposed to RRRR, GGGG, etc.) if
> possible, or write one from scratch otherwise
> (2) Write a LLVM->TGSI backend, restricted to programs without any control flow
> (3) Make LLVM->TGSI always work (even with control flow and DDX/DDY)
> (4) Hook up all useful LLVM optimizations
>
> If there is still time/as followup (note that these are mostly complex
> things, at most one or two might be doable in the timeframe):
> (5) Do something about uniform-specific shader generation, and support
> automatically generating "pre-shaders" for the CPU (using the
> x86/x86-64 LLVM backends) for uniform-only computations
> (6) Enhance LLVM to provide any missing optimization with a significant impact
> (7) Convert existing drivers to LLVM backends, or have them expose
> more functionality to the TGSI backend via TGSI extensions (or
> currently unused features such as predicate support), and do
> driver-specific stuff (e.g. scalarization for scalar architectures)
> (8) Make sure shaders can be compiled using as large as possible a
> subset of plain C/C++, as well as OpenCL (using clang), and add OpenCL
> support to Mesa/Gallium (some of it already exists in external
> repositories)
> (9) Compare with fglrx and nVidia libGL/cgc/nvopencc and improve
> whatever is necessary to be equal or better than them
> (10) Talk with LLVM developers about good VLIW code generation for the
> Radeons and to a lesser extent nv30/nv40 that need it, and find out
> exactly what the problem is here, how it can be solved and who could
> do the work
> (11) Add Gallium support for nv10/nv20 and r100/r200 using the LLVM
> DAG instruction selector to code-generate a fixed pipeline (Stephane
> Marchesin tried this already; it seems it is non-trivial but could be
> made to work partially, and probably enough to get the Xorg state
> tracker to work on all cards and get rid of all X drivers at some
> point).
> (12) Figure out if any other compilers (Open64, gcc, whatever) can be
> useful as backends for some drivers

I think (2) is probably the closest to what I am proposing, and it is something I can take a look at.

Thanks for your feedback.

-Tom Stellard
From: Luca B. <luc...@gm...> - 2010-04-03 19:39:43
By the way, if you want a simple, limited and temporary, but very effective, way to optimize shaders, here it is:

1. Trivially convert TGSI to GLSL.
2. Feed the GLSL to the nVidia Cg compiler, telling it to produce optimized output in ARB_fragment_program format.
3. Ask the Mesa frontend/state tracker to parse the ARB_fragment_program and give you back TGSI.

This does actually optimize the program well and does all the nice control flow transformations desired.

If your GPU can support predicates or condition codes, you can also ask the Cg compiler to give you NV_fragment_program_option, which will use them efficiently. If it also supports control flow, you can ask for NV_fragment_program2 and get control flow too where appropriate.

Of course, if this does not happen to do exactly what you want, you are totally out of luck, since it is closed source.

With an ad-hoc TGSI optimizer, you can modify it, but that will often require rearchitecting the module, since it may be too primitive for the new feature you want, and implementing everything from scratch with no supporting tools to help you.

With a real compiler framework, you already have the optimization ready for use, or you at least have a comprehensive conceptual framework and IR and a full set of analyses, frameworks and tools to use, not to mention a whole community of compiler developers who can at least tell you what is the best way of doing what you want (actually giving out competent advice), if not even have already done or planned to do it themselves.