From: <bug...@fr...> - 2010-03-31 22:56:15

http://bugs.freedesktop.org/show_bug.cgi?id=27403

Summary: GLSL struct causing "Invalid src register file ..." error
Product: Mesa
Version: 7.6
Platform: x86-64 (AMD64)
OS/Version: Linux (All)
Status: NEW
Severity: normal
Priority: medium
Component: Mesa core
AssignedTo: mes...@li...
ReportedBy: bri...@gm...

GLSL fragment shader is producing this error:

Mesa 7.7.1 implementation error: Invalid src register file 12 in
get_src_register_pointer()
Please report at bugzilla.freedesktop.org

I've reduced the code down as much as I could while still reproducing the bug:

--- shader.frag --------------
uniform sampler3D volShadSampler0;

struct VolShad {
    sampler3D texture;
    int samples;
    int channels;
    mat4 worldToScreen;
};

vec3 testfunc(VolShad vs, vec3 p)
{
    return vec3(1.0, 1.0, 1.0);
}

void main()
{
    // (Initializing the VolShad struct this way also causes the error)
    //VolShad volShad0 = VolShad(volShadSampler0, 8, 3, mat4(0.987538, 0.911446, 0.626908, 0.626908, 0, 2.20361, -0.496881, -0.49688, 1.03169, -0.872442, -0.600081, -0.600081, -47.4917, 35.4831, 75.2649, 75.3648));

    VolShad volShad0;
    volShad0.texture = volShadSampler0;
    volShad0.texture = 8;
    volShad0.channels = 3;
    volShad0.worldToScreen = mat4(0.987538, 0.911446, 0.626908, 0.626908,
                                  0, 2.20361, -0.496881, -0.49688,
                                  1.03169, -0.872442, -0.600081, -0.600081,
                                  -47.4917, 35.4831, 75.2649, 75.3648);

    vec3 outputColor = testfunc(volShad0, vec3(1, 1, 1));

    gl_FragColor = vec4(1, 1, 1, 1);
}

--
Configure bugmail: http://bugs.freedesktop.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

From: Luca B. <luc...@gm...> - 2010-03-31 17:58:51

> WINE can deal with that. The real showstopper is that WINE has to also
> work on MacOS X and Linux + NVIDIA blob, where Gallium is unavailable.

We could actually consider making a Gallium driver that uses OpenGL to do
rendering.

If the app uses DirectX 10, this may not significantly degrade performance,
and should instead appreciably increase it if a Gallium driver is available.

On the other hand, for DirectX 9 apps, this could decrease performance
significantly (because DirectX 9 has immediate mode and doesn't require CSOs).

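For readers unfamiliar with the term above: a CSO, or constant state object, is
Gallium's pattern of translating a bundle of API state into an immutable,
pre-validated object once and then merely re-binding it per draw, whereas a
DirectX 9 style immediate-mode API re-specifies individual state values on
every draw. The C sketch below only illustrates that distinction; the types
and functions are invented for illustration and are not the actual Gallium
interface.

#include <stdlib.h>

/* Hypothetical driver-side state bundle: the expensive translation and
 * validation happen once, at creation time. */
struct blend_cso {
    unsigned hw_words[4];        /* pre-packed hardware state */
};

static struct blend_cso *create_blend_cso(int blend_enable)
{
    struct blend_cso *cso = malloc(sizeof(*cso));
    /* ...translate API state to hardware packets once... */
    cso->hw_words[0] = blend_enable ? 0x1u : 0x0u;
    return cso;
}

static void bind_blend_cso(const struct blend_cso *cso)
{
    /* Per-draw cost is just pointing the hardware at the packet. */
    (void)cso;
}

/* Immediate-mode style: the driver has to re-translate on every call,
 * which is what makes layering a DX9-style state stream on top costly. */
static void set_blend_enable_immediate(int blend_enable)
{
    /* ...translate and emit state on every draw... */
    (void)blend_enable;
}

int main(void)
{
    struct blend_cso *cso = create_blend_cso(1);
    for (int draw = 0; draw < 1000; ++draw)
        bind_blend_cso(cso);               /* cheap per draw            */
    set_blend_enable_immediate(1);         /* redone per draw, DX9 style */
    free(cso);
    return 0;
}
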
From: Xavier B. <xav...@fr...> - 2010-03-31 15:44:52

On Wed, 2010-03-31 at 13:29 +0900, Miles Bader wrote:
> Luca Barbieri <luc...@gm...> writes:
> > In fact, given the Gallium architecture, it may even make sense to
> > support a variant of DirectX 10 as the main Mesa/Gallium API on all
> > platfoms, instead of OpenGL.
>
> The apparent benefit would seem to be greater compatibility with
> software written for windows -- but that benefit is unlikely to remain,
> as MS basically changes their interfaces drastically with each major
> revision.

WINE can deal with that. The real showstopper is that WINE has to also
work on MacOS X and Linux + NVIDIA blob, where Gallium is unavailable.

Xav

From: Brian P. <bri...@gm...> - 2010-03-31 12:18:58

Looks good.  Please apply to the 7.8 branch.

-Brian

On Tue, Mar 30, 2010 at 11:44 PM,  <sk...@gm...> wrote:
> From: Ben Skeggs <bs...@re...>
>
> ---
>  src/mesa/state_tracker/st_atom_rasterizer.c |    2 +-
>  src/mesa/state_tracker/st_program.c         |    6 +++---
>  2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/src/mesa/state_tracker/st_atom_rasterizer.c b/src/mesa/state_tracker/st_atom_rasterizer.c
> index 9c9a99b..5669b1f 100644
> --- a/src/mesa/state_tracker/st_atom_rasterizer.c
> +++ b/src/mesa/state_tracker/st_atom_rasterizer.c
> @@ -209,7 +209,7 @@ static void update_raster_state( struct st_context *st )
>      */
>     if (vertProg) {
>        if (vertProg->Base.Id == 0) {
> -         if (vertProg->Base.OutputsWritten & (1 << VERT_RESULT_PSIZ)) {
> +         if (vertProg->Base.OutputsWritten & BITFIELD64_BIT(VERT_RESULT_PSIZ)) {
>              /* generated program which emits point size */
>              raster->point_size_per_vertex = TRUE;
>           }
> diff --git a/src/mesa/state_tracker/st_program.c b/src/mesa/state_tracker/st_program.c
> index 7f8677d..6e8c446 100644
> --- a/src/mesa/state_tracker/st_program.c
> +++ b/src/mesa/state_tracker/st_program.c
> @@ -121,7 +121,7 @@ st_prepare_vertex_program(struct st_context *st,
>     /* Compute mapping of vertex program outputs to slots.
>      */
>     for (attr = 0; attr < VERT_RESULT_MAX; attr++) {
> -      if ((stvp->Base.Base.OutputsWritten & (1 << attr)) == 0) {
> +      if ((stvp->Base.Base.OutputsWritten & BITFIELD64_BIT(attr)) == 0) {
>           stvp->result_to_output[attr] = ~0;
>        }
>        else {
> @@ -388,7 +388,7 @@ st_translate_fragment_program(struct st_context *st,
>     GLbitfield64 outputsWritten = stfp->Base.Base.OutputsWritten;
>
>     /* if z is written, emit that first */
> -   if (outputsWritten & (1 << FRAG_RESULT_DEPTH)) {
> +   if (outputsWritten & BITFIELD64_BIT(FRAG_RESULT_DEPTH)) {
>        fs_output_semantic_name[fs_num_outputs] = TGSI_SEMANTIC_POSITION;
>        fs_output_semantic_index[fs_num_outputs] = 0;
>        outputMapping[FRAG_RESULT_DEPTH] = fs_num_outputs;
> @@ -398,7 +398,7 @@ st_translate_fragment_program(struct st_context *st,
>     /* handle remaning outputs (color) */
>     for (attr = 0; attr < FRAG_RESULT_MAX; attr++) {
> -      if (outputsWritten & (1 << attr)) {
> +      if (outputsWritten & BITFIELD64_BIT(attr)) {
>           switch (attr) {
>           case FRAG_RESULT_DEPTH:
>              /* handled above */
> --
> 1.7.0.1

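Background on the change being reviewed: OutputsWritten is a 64-bit bitfield
(GLbitfield64), and shifting the plain int literal 1 by 32 or more bits is
undefined behaviour, so masks built with "1 << attr" silently break for slots
at or above 32. The sketch below only illustrates that pitfall with a
hypothetical stand-in macro; Mesa's real BITFIELD64_BIT definition may differ
in detail.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for Mesa's GLbitfield64 / BITFIELD64_BIT:
 * widen to 64 bits *before* shifting so that bits 32..63 are usable. */
typedef uint64_t bitfield64;
#define BITFIELD64_BIT(b)  ((bitfield64)1 << (b))

int main(void)
{
    int attr = 40;                        /* an output slot index >= 32 */

    /* Wrong: "1" is a 32-bit int, so this shift is undefined/truncated. */
    bitfield64 bad  = 1 << attr;

    /* Right: the operand is widened to 64 bits before the shift. */
    bitfield64 good = BITFIELD64_BIT(attr);

    printf("bad  = 0x%016llx\n", (unsigned long long)bad);
    printf("good = 0x%016llx\n", (unsigned long long)good);
    return 0;
}
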
From: Dave A. <ai...@gm...> - 2010-03-31 10:17:59

On Tue, Mar 30, 2010 at 6:26 PM, Nicolai Haehnle <nha...@gm...> wrote:
> Reply to all this time...
>
> On Tue, Mar 30, 2010 at 8:13 AM, Marek Olšák <ma...@gm...> wrote:
>>> > 1) Branching and looping
>>> >
>>> > This is the most important one and there are 3 things which need to be
>>> > done.
>>> > * Unrolling loops and converting conditionals to multiplications. This is
>>> > crucial for R3xx-R4xx GLSL support. I don't say it will work in all cases
>>> > but should be fine for the most common ones. This is kind of a standard in
>>> > all proprietary drivers supporting shaders 2.0. It would be nice have it
>>> > work with pure TGSI shaders so that drivers like nvfx can reuse it too and I
>>> > personally prefer to have this feature first before going on.
>>>
>>> Would you be able to provide a small example of how to convert the
>>> conditionals to multiplications? I understand the basic idea is to mask
>>> values based on the result of the conditional, but it would help me to see
>>> an example. On IRC, eosie mentioned an alternate technique for emulating
>>> conditionals: Save the values of variables that might be affected by
>>> the conditional statement. Then, after executing both the if and the else
>>> branches, roll back the variables that were affected by the branch that
>>> was not supposed to be taken. Would this technique work as well?
>>
>> Well, I am eosie, thanks for the info, it's always cool to be reminded what
>> I've written on IRC. ;)
>>
>> Another idea was to convert TGSI to a SSA form. That would make unrolling
>> branches much easier as the Phi function would basically become a linear
>> interpolation, loops and subroutines with conditional return statements
>> might be trickier. The r300 compiler already uses SSA for its optimization
>> passes so maybe you wouldn't need to mess with TGSI that much...
>
> Note that my Git repository already contains an implementation of
> branch emulation and some additional optimizations, see here:
> http://cgit.freedesktop.org/~nh/mesa/log/?h=r300g-glsl
>
> Shame on me for abandoning it - I should really get around to make
> sure it fits in with recent changes and merge it to master. The main
> problem is that it produces "somewhat" inefficient code. Adding and
> improving peephole and similar optimizations should help tremendously.

git rebases cleanly onto master, and piglit has -2 for me here:

texCube -> fail
glsl-fs-fragcoord -> fail

Now it might be other things I haven't had time to investigate, just letting
you know that merging it might not be a bad plan,

Dave.

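Since the quoted question asks for a small example of converting conditionals
to multiplications: the usual lowering evaluates both sides of the branch and
blends them with a 0/1 mask produced by a compare, i.e. an SGE/SLT-style
instruction feeding a LRP or MAD. The C sketch below shows the idea on plain
scalars; it is illustrative only and not taken from the r300 compiler.

#include <stdio.h>

/* Branch-free rewrite of:
 *     if (x >= threshold) result = a; else result = b;
 * "mask" plays the role of a compare instruction's 0.0/1.0 result,
 * and the select is a linear interpolation (LRP): b + mask * (a - b). */
static float select_by_mask(float x, float threshold, float a, float b)
{
    float mask = (x >= threshold) ? 1.0f : 0.0f;  /* SGE-style compare */
    return b + mask * (a - b);                    /* LRP(mask, a, b)   */
}

int main(void)
{
    printf("%f\n", select_by_mask(0.75f, 0.5f, 10.0f, 20.0f)); /* 10.0 */
    printf("%f\n", select_by_mask(0.25f, 0.5f, 10.0f, 20.0f)); /* 20.0 */
    return 0;
}
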
From: Chia-I Wu <ol...@gm...> - 2010-03-31 08:10:42

On Wed, Mar 31, 2010 at 12:52 AM, Keith Whitwell <ke...@vm...> wrote:
> On Sun, 2010-03-28 at 23:56 -0700, Chia-I Wu wrote:
>> I happened to be playing with the idea yesterday.  My take is to define an EGL
>> extension, EGL_MESA_gallium.  The extension defines Gallium as a rendering API
>> of EGL.  The downside of this approach is that it depends on st/egl.  The
>> upside is that, it will work on whatever platform st/egl supports.
>>
>> I've cleaned up my work a little bit.  You can find it in the attachments.
>> There is a port of "clear" raw demo to use EGL_MESA_gallium.  The demo supports
>> window resizing, and is accelerated if a hardware EGL driver is used.
>>
>> The demo renders into a X11 window.  It is worth noting that, when there is no
>> need to render into an EGLSurface, eglCreateWindowSurface or eglMakeCurrent is
>> not required.  To interface with X11, I've also borrowed some code from OpenVG
>> demos and renamed it to EGLUT.
>
> I'm not sure how far to take any of these "naked" gallium approaches.
> My motivation was to build something to provide a very controlled
> environment for bringup of new drivers - basically getting to the first
> triangle and not much further. After that, existing state trackers with
> stable ABIs are probably preferable.

Ok.  The benefit of using st/egl is that you get to see the results on the
screen.  pipe_screen::flush_frontbuffer is usually not implemented by hw pipe
drivers.  But I guess that is minor for bring-up of new drivers.

--
ol...@Lu...

From: Pauli N. <su...@gm...> - 2010-03-31 06:33:56

-m64 was not set to ARCH_FLAGS.

Signed-off-by: Pauli Nieminen <su...@gm...>
---
I noticed that there was a difference in -m32 handling compared to -m64.
I suspect that -m64 would need the same, but I don't know if there is some
reason not to add the flag the same way.

 configure.ac |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/configure.ac b/configure.ac
index 17d61d1..f3d3dc8 100644
--- a/configure.ac
+++ b/configure.ac
@@ -179,6 +179,7 @@ AC_ARG_ENABLE([64-bit],
 if test "x$enable_64bit" = xyes; then
     if test "x$GCC" = xyes; then
         CFLAGS="$CFLAGS -m64"
+        ARCH_FLAGS="$ARCH_FLAGS -m64"
         LDFLAGS="$LDFLAGS -m64"
     fi
     if test "x$GXX" = xyes; then
--
1.7.0

From: Pauli N. <su...@gm...> - 2010-03-31 06:33:56

-m32 and -m64 were missing from the linker flags, which caused linking errors
with the dri driver linking test. Adding the correct flag to the linker
parameters fixes the linking.

Signed-off-by: Pauli Nieminen <su...@gm...>
---
Does this look like the correct way of passing the flags to the build system?

 configure.ac                           |   11 +++++++++--
 src/mesa/drivers/dri/Makefile.template |    2 +-
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/configure.ac b/configure.ac
index f2e87f4..17d61d1 100644
--- a/configure.ac
+++ b/configure.ac
@@ -163,9 +163,11 @@ if test "x$enable_32bit" = xyes; then
     if test "x$GCC" = xyes; then
         CFLAGS="$CFLAGS -m32"
         ARCH_FLAGS="$ARCH_FLAGS -m32"
+        LDFLAGS_ADD="$LDFLAGS_ADD -m32"
     fi
     if test "x$GXX" = xyes; then
         CXXFLAGS="$CXXFLAGS -m32"
+        LDFLAGS_ADD="$LDFLAGS_ADD -m32"
     fi
 fi
 AC_ARG_ENABLE([64-bit],
@@ -177,9 +179,11 @@ AC_ARG_ENABLE([64-bit],
 if test "x$enable_64bit" = xyes; then
     if test "x$GCC" = xyes; then
         CFLAGS="$CFLAGS -m64"
+        LDFLAGS="$LDFLAGS -m64"
     fi
     if test "x$GXX" = xyes; then
         CXXFLAGS="$CXXFLAGS -m64"
+        LDFLAGS="$LDFLAGS -m64"
     fi
 fi

@@ -1414,7 +1418,7 @@ dnl prepend CORE_DIRS to SRC_DIRS
 SRC_DIRS="$CORE_DIRS $SRC_DIRS"

 dnl Restore LDFLAGS and CPPFLAGS
-LDFLAGS="$_SAVE_LDFLAGS"
+LDFLAGS="$_SAVE_LDFLAGS $LDFLAGS_ADD"
 CPPFLAGS="$_SAVE_CPPFLAGS"

 dnl Substitute the config
@@ -1498,11 +1502,14 @@ dnl Compiler options
 cflags=`echo $CFLAGS $OPT_FLAGS $PIC_FLAGS $ARCH_FLAGS | \
     $SED 's/^ *//;s/  */ /;s/ *$//'`
 cxxflags=`echo $CXXFLAGS $OPT_FLAGS $PIC_FLAGS $ARCH_FLAGS | \
-    $SED 's/^ *//;s/  */ /;s/ *$//'`
+    $SED 's/^ *//;s/  */ /;s/ *$//'`
+ldflags=`echo $LDFLAGS | \
+    $SED 's/^ *//;s/  */ /;s/ *$//'`
 defines=`echo $DEFINES $ASM_FLAGS | $SED 's/^ *//;s/  */ /;s/ *$//'`
 echo ""
 echo "        CFLAGS:          $cflags"
 echo "        CXXFLAGS:        $cxxflags"
+echo "        LDFLAGS:         $ldflags"
 echo "        Macros:          $defines"
 echo ""

diff --git a/src/mesa/drivers/dri/Makefile.template b/src/mesa/drivers/dri/Makefile.template
index f19cc03..a2592bf 100644
--- a/src/mesa/drivers/dri/Makefile.template
+++ b/src/mesa/drivers/dri/Makefile.template
@@ -54,7 +54,7 @@ $(LIBNAME): $(OBJECTS) $(MESA_MODULES) $(EXTRA_MODULES) Makefile \
 		$(TOP)/src/mesa/drivers/dri/Makefile.template $(TOP)/src/mesa/drivers/dri/common/dri_test.o
 	$(MKLIB) -o $@.tmp -noprefix -linker '$(CC)' -ldflags '$(LDFLAGS)' \
 		$(OBJECTS) $(MESA_MODULES) $(EXTRA_MODULES) $(DRI_LIB_DEPS)
-	$(CC) -o $@.test $(TOP)/src/mesa/drivers/dri/common/dri_test.o $@.tmp $(DRI_LIB_DEPS)
+	$(CC) -o $@.test $(TOP)/src/mesa/drivers/dri/common/dri_test.o $@.tmp $(LDFLAGS) $(DRI_LIB_DEPS)
 	@rm -f $@.test
 	mv -f $@.tmp $@
--
1.7.0

From: Tom S. <tst...@gm...> - 2010-03-31 06:06:45

On Wed, Mar 31, 2010 at 04:34:48AM +0200, Marek Olšák wrote:
> On Tue, Mar 30, 2010 at 10:26 AM, Nicolai Haehnle <nha...@gm...> wrote:
> >
> > Note that my Git repository already contains an implementation of
> > branch emulation and some additional optimizations, see here:
> > http://cgit.freedesktop.org/~nh/mesa/log/?h=r300g-glsl
> >
> > Shame on me for abandoning it - I should really get around to make
> > sure it fits in with recent changes and merge it to master. The main
> > problem is that it produces "somewhat" inefficient code. Adding and
> > improving peephole and similar optimizations should help tremendously.
>
> Well it's either this or nothing so I guess I am not the only one to prefer
> to get it merged. ;) However that kinda slightly changes Tom's plan for the
> GSoC project.
>
> On a different note, considering that the r300 compiler has basically 2
> frontends (Mesa IR and TGSI) and 3 backends (r300 VS & FS, r500 FS), would
> it be feasible to add yet another backend - TGSI? That would turn the
> compiler into a generic Gallium shader optimizer with the lowering tools it
> already has (or will have) and more people would be interested in adding new
> features and improvements in it.

Implementing branch emulation with TGSI was something I have been thinking
about doing, so maybe it would make sense to try and focus on doing more of
the optimizations with TGSI and creating a generic Gallium shader optimizer,
like you said. Even though Nicolai's branch only involves the r300 compiler,
it will still be a good guide for me if I am implementing something similar
with TGSI.

-Tom

From: <sk...@gm...> - 2010-03-31 05:31:03

From: Ben Skeggs <bs...@re...>

---
 src/mesa/state_tracker/st_atom_rasterizer.c |    2 +-
 src/mesa/state_tracker/st_program.c         |    6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/mesa/state_tracker/st_atom_rasterizer.c b/src/mesa/state_tracker/st_atom_rasterizer.c
index 9c9a99b..5669b1f 100644
--- a/src/mesa/state_tracker/st_atom_rasterizer.c
+++ b/src/mesa/state_tracker/st_atom_rasterizer.c
@@ -209,7 +209,7 @@ static void update_raster_state( struct st_context *st )
     */
    if (vertProg) {
       if (vertProg->Base.Id == 0) {
-         if (vertProg->Base.OutputsWritten & (1 << VERT_RESULT_PSIZ)) {
+         if (vertProg->Base.OutputsWritten & BITFIELD64_BIT(VERT_RESULT_PSIZ)) {
            /* generated program which emits point size */
            raster->point_size_per_vertex = TRUE;
         }
diff --git a/src/mesa/state_tracker/st_program.c b/src/mesa/state_tracker/st_program.c
index 7f8677d..6e8c446 100644
--- a/src/mesa/state_tracker/st_program.c
+++ b/src/mesa/state_tracker/st_program.c
@@ -121,7 +121,7 @@ st_prepare_vertex_program(struct st_context *st,
    /* Compute mapping of vertex program outputs to slots.
    */
    for (attr = 0; attr < VERT_RESULT_MAX; attr++) {
-      if ((stvp->Base.Base.OutputsWritten & (1 << attr)) == 0) {
+      if ((stvp->Base.Base.OutputsWritten & BITFIELD64_BIT(attr)) == 0) {
         stvp->result_to_output[attr] = ~0;
      }
      else {
@@ -388,7 +388,7 @@ st_translate_fragment_program(struct st_context *st,
    GLbitfield64 outputsWritten = stfp->Base.Base.OutputsWritten;

    /* if z is written, emit that first */
-   if (outputsWritten & (1 << FRAG_RESULT_DEPTH)) {
+   if (outputsWritten & BITFIELD64_BIT(FRAG_RESULT_DEPTH)) {
      fs_output_semantic_name[fs_num_outputs] = TGSI_SEMANTIC_POSITION;
      fs_output_semantic_index[fs_num_outputs] = 0;
      outputMapping[FRAG_RESULT_DEPTH] = fs_num_outputs;
@@ -398,7 +398,7 @@ st_translate_fragment_program(struct st_context *st,
    /* handle remaning outputs (color) */
    for (attr = 0; attr < FRAG_RESULT_MAX; attr++) {
-      if (outputsWritten & (1 << attr)) {
+      if (outputsWritten & BITFIELD64_BIT(attr)) {
         switch (attr) {
         case FRAG_RESULT_DEPTH:
            /* handled above */
--
1.7.0.1

From: Miles B. <mi...@gn...> - 2010-03-31 04:45:19

Luca Barbieri <luc...@gm...> writes:
> In fact, given the Gallium architecture, it may even make sense to
> support a variant of DirectX 10 as the main Mesa/Gallium API on all
> platfoms, instead of OpenGL.

The apparent benefit would seem to be greater compatibility with
software written for windows -- but that benefit is unlikely to remain,
as MS basically changes their interfaces drastically with each major
revision.

If Mesa just tried to stick with the older interface, the advantage of
using it would largely evaporate (as software makers abandoned it and
their support bit-rots), but if Mesa tried to adopt each new version, it
would end up trailing behind on an interface completely controlled by
Microsoft, and that's _not_ a good place to be.

It's rather fortunate to have a portable and still widely used interface
such as OpenGL, and I think the Mesa project should try their best to
encourage, not discourage, wider use of it.

-Miles

--
Alliance, n. In international politics, the union of two thieves who have
their hands so deeply inserted in each other's pockets that they cannot
separately plunder a third.

From: Marek O. <ma...@gm...> - 2010-03-31 02:34:58

On Tue, Mar 30, 2010 at 10:26 AM, Nicolai Haehnle <nha...@gm...> wrote:
> On Tue, Mar 30, 2010 at 8:13 AM, Marek Olšák <ma...@gm...> wrote:
> > Another idea was to convert TGSI to a SSA form. That would make unrolling
> > branches much easier as the Phi function would basically become a linear
> > interpolation, loops and subroutines with conditional return statements
> > might be trickier. The r300 compiler already uses SSA for its optimization
> > passes so maybe you wouldn't need to mess with TGSI that much...
>
> Note that my Git repository already contains an implementation of
> branch emulation and some additional optimizations, see here:
> http://cgit.freedesktop.org/~nh/mesa/log/?h=r300g-glsl
>
> Shame on me for abandoning it - I should really get around to make
> sure it fits in with recent changes and merge it to master. The main
> problem is that it produces "somewhat" inefficient code. Adding and
> improving peephole and similar optimizations should help tremendously.

Well it's either this or nothing so I guess I am not the only one to prefer
to get it merged. ;) However that kinda slightly changes Tom's plan for the
GSoC project.

On a different note, considering that the r300 compiler has basically 2
frontends (Mesa IR and TGSI) and 3 backends (r300 VS & FS, r500 FS), would
it be feasible to add yet another backend - TGSI? That would turn the
compiler into a generic Gallium shader optimizer with the lowering tools it
already has (or will have) and more people would be interested in adding new
features and improvements in it.

-Marek

From: Luca B. <luc...@gm...> - 2010-03-30 23:24:41

An interesting option could be to provide a DirectX 10 implementation using
TGSI text as the shader interface, which should be much easier than one would
think at first.

DirectX 10 + TGSI text would provide a very thin binary compatible layer over
Gallium, unlike all existing state trackers.

It could even run Windows games if integrated with Wine and something
producing TGSI from either HLSL text or D3D10 bytecode (e.g. whatever Wine
uses to produce GLSL + the Mesa GLSL frontend + st_mesa_to_tgsi).

In fact, given the Gallium architecture, it may even make sense to support a
variant of DirectX 10 as the main Mesa/Gallium API on all platforms, instead
of OpenGL.

From: Ian R. <id...@fr...> - 2010-03-30 21:28:55

Corbin Simpson wrote:
> On Mon, Mar 29, 2010 at 5:50 PM, Ian Romanick <id...@fr...> wrote:
>> Philipp Klaus Krause wrote:
>>
>>> Well, there is TexSubImage2D. Assuming we have a compressed texture
>>> stored internally as some S3TC format and then the application replaces
>>> part of it using TexSubImage2D. According to ARB_texture_compression we
>>> may not go to uncompressed ("the allocation and chosen compressed image
>>> format must not be a function of any other state and cannot be changed
>>> once they are established". And while ARB_texture_compression does not
>>> require TexSubImage2D support, EXT_texture_compression_s3tc does.
>>
>> Ah. Good catch. My best guess is that there are few, if any, apps that
>> do that. Such apps would be easy to detect. We could enable the
>> non-conformant behavior by default, and provide a driconf switch to
>> disable it. We'd then need to blacklist apps that use unsupported
>> cases. Since we can detect these cases, we can log a message when the
>> occur.
>>
>> Does that seem like a reasonable compromise?
>
> We don't have to compromise at all. If the image is already compressed
> internally, then updating it with TexSubImage or CompressedTexSubImage
> must be done along the block boundaries, and must be done with
> pre-compressed blocks, so we are never decompressing and recompressing
> the texture.

I suspect that TexSubImage calls won't provide compressed data. The
compromise is the case where the data is compressed but the subimage is not.

Imagine the case where a game has a bunch of textures for walls. Something
happens in the games, say the player "tags" the wall with their logo (like in
Half-Life), and the game modifies the original texture using TexSubImage
(or CopyTexSubImage).

> I've pushed a branch, s3tc-by-the-book, to my personal repo
> (http://cgit.freedesktop.org/~csimpson/mesa/?h=s3tc-by-the-book), that
> changes to this newer behavior. I haven't written up test cases for
> these delightful corners and edges we're finding, but they shouldn't
> be too hard to handle. The basic idea behind this branch is that if
> the internal format request indicates that GL should compress the
> texture with S3TC, but we don't have libdxtn present, we just change
> the internal format to something more sensible and refuse to compress.

I'll take a look at it, but it sounds like the right idea. How are software
fallbacks handled, if at all? This actually sounds like a job for metaops.

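To make the "block boundaries" constraint in the quoted text concrete:
S3TC/DXT formats store independent 4x4 texel blocks, so a sub-image update
can avoid any decompress/recompress step only if it is aligned to those
blocks, except where it reaches the right or bottom edge of the mipmap level.
The check below is a generic, hypothetical sketch of that rule, not Mesa's
actual code.

#include <stdbool.h>
#include <stdio.h>

/* S3TC/DXT encodes the image as independent 4x4 blocks.  A sub-image
 * replacement can skip recompression only if it covers whole blocks:
 * offsets must be multiples of 4, and width/height must either be
 * multiples of 4 or reach the edge of the mipmap level. */
static bool subimage_is_block_aligned(int xoffset, int yoffset,
                                      int width, int height,
                                      int level_width, int level_height)
{
    if (xoffset % 4 || yoffset % 4)
        return false;
    if (width % 4 && xoffset + width != level_width)
        return false;
    if (height % 4 && yoffset + height != level_height)
        return false;
    return true;
}

int main(void)
{
    printf("%d\n", subimage_is_block_aligned(4, 8, 16, 16, 256, 256)); /* 1 */
    printf("%d\n", subimage_is_block_aligned(2, 0, 16, 16, 256, 256)); /* 0 */
    return 0;
}
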
From: Luca B. <luc...@gm...> - 2010-03-30 18:59:54

> On Tue, 2010-03-30 at 09:52 -0700, Luca Barbieri wrote:
>> > There are several deep challenges in making TGSI <-> LLVM IR translation
>> > lossless -- I'm sure we'll get around to overcome them -- but I don't
>> > think that using LLVM is a requirement for this module. Having a shared
>> > IR for simple TGSI optimization module would go a long way by itself.
>>
>> What are these challenges?
>
> - Control flow as you mentioned -- gets broken into jump spaghetti.

LoopSimplify seems to do at least some of the work for loops.

Not sure if there is an if-construction pass, but it should be relatively
easy. Once you have an acyclic CFG subgraph (which hopefully LoopSimplify
easily gives you), every basic block with more than one outedge will need to
have an if/else block generated.

Now find the first block in topological sort order such that any path from
the if start block reaches that block before any later ones in topological
sort order. I think this is called the forward dominator, and LLVM should
have analysis that gives you that easily.

After that, just duplicate the CFG between the if block start and the forward
dominator to build each branch of the if, and recursively process the
branches.

If you have a DDX/DDY present in multiple if parts, you are screwed, but that
won't happen without optimization and hopefully you can tune fragment program
optimization so that doesn't happen at all.

> - Predicates can't be represented -- you need to use AND / NAND masking.
> I know people have asked support for this in the LLVM list so it might
> change someday.

For the LLVM->TGSI part, x86 has condition codes. Not sure how LLVM
represents them, but I suppose predicates can be handled in the same way.
Multiple predicate registers may not work well, but GPUs probably don't have
them in hardware anyway (e.g. nv30/nv40 only have one or two).

For the TGSI->LLVM part, Mesa never outputs predicates afaik.

> - missing intrinsics -- TGSI has a much richer instruction set than
> LLVM's builtin instructions, so it would be necessary to add several
> llvm.tgsi.xxx instrinsics (e.g., for max, min, madd, exp2, log2, etc),
> and teach LLVM to do constant propagation for every single one of them.

Yes, of course. Initially you could do without constant propagation. Also,
again, x86/SSE has many of the same intrinsics, so their approach can be
imitated.

I think MAD can be handled by mul + add, if you don't care about whether an
extra rounding is done or not (and I think, for GPU shaders, it's not really
a high priority issue). Anyway SSE5 has fused multiply/add, so LLVM has/will
have a way.

> - Constants -- you often want to make specialized version of shaders for
> certain constants, especially when you have control flow statements
> whose arguments are constants (e.g., when doing TNL with a big glsl
> shader), and therefore should be factored out. You also may want to do
> factor out constant operations (e.g., MUL TEMP[1], CONST[0], CONST[1])
> But LLVM can't help you with that given that for LLVM IR constants are
> ordinary memory, like the inputs. LLVM doesn't know that a shader will
> be invoked million of times with the same constants but varying inputs.

If you want to do that, you must of course run LLVM for each constant set,
telling it what the constant values are. You can probably identify
branch-relevant constants from the LLVM SSA form to restrict that set.

For the MUL TEMP[1], CONST[0], CONST[1], I suppose you could enclose the
shader code in a big loop to simulate the rasterizer. LLVM will then move the
CONST[0] * CONST[1] outside the loop, and you can codegen the part outside
the loop using an LLVM CPU backend. In this case, using LLVM will give you
automatic "pre-shader" generation for the CPU mostly for free.

Alternatively, you could have a basic IF-simplifier on TGSI that only
supports the conditional being the comparison of a constant to something else
(using the rasterizer loop trick can allow you to get simpler conditionals).

> If people can make this TGSI optimization module work quickly on top of
> LLVM then it's fine by me. I'm just pointing out that between the
> extreme of sharing nothing between each pipe driver compiler, and
> sharing everything with LLVM, there's a middle ground which is sharing
> between pipe drivers but not LLVM. Once that module exists having it
> use LLVM internally would then be pretty easy. It looks to me a better
> way to parallize the effort than to be blocked for quite some time on
> making TGSI <-> LLVM IR be lossless.

Yes, sure, a minimal module can be written first and then LLVM use can be
investigated later.

In other words, it's not necessarily trivial, but definitely seems doable.
In particular getting it to work on anything non-GLSL should be relatively
straightforward.

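A small sketch of the "enclose the shader in a big loop" idea above: once the
per-fragment body sits inside a loop over fragments, ordinary loop-invariant
code motion can hoist purely constant work, such as the MUL CONST[0], CONST[1]
case, out of the loop, which is exactly a "pre-shader" evaluated once per
constant set. The C below is illustrative only; names and structure are made
up.

#include <stddef.h>

/* Simulated shader body run once per fragment.  The product c0 * c1
 * depends only on constants, so when the body is viewed inside the loop
 * below, loop-invariant code motion can legally hoist it out -- computing
 * it once per constant set instead of once per fragment. */
static void run_shader(const float *consts, const float *inputs,
                       float *outputs, size_t num_fragments)
{
    for (size_t i = 0; i < num_fragments; ++i) {
        float k = consts[0] * consts[1];   /* loop-invariant: hoistable */
        outputs[i] = k * inputs[i];        /* varies per fragment       */
    }
}

int main(void)
{
    float consts[2] = { 2.0f, 3.0f };
    float in[4] = { 1, 2, 3, 4 }, out[4];
    run_shader(consts, in, out, 4);
    return (int)out[0];   /* 6 */
}
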
From: José F. <jfo...@vm...> - 2010-03-30 17:36:23

On Tue, 2010-03-30 at 09:52 -0700, Luca Barbieri wrote:
> > There are several deep challenges in making TGSI <-> LLVM IR translation
> > lossless -- I'm sure we'll get around to overcome them -- but I don't
> > think that using LLVM is a requirement for this module. Having a shared
> > IR for simple TGSI optimization module would go a long way by itself.
>
> What are these challenges?

- Control flow as you mentioned -- gets broken into jump spaghetti.

- Predicates can't be represented -- you need to use AND / NAND masking.
  I know people have asked support for this in the LLVM list so it might
  change someday.

- Missing intrinsics -- TGSI has a much richer instruction set than LLVM's
  builtin instructions, so it would be necessary to add several
  llvm.tgsi.xxx intrinsics (e.g., for max, min, madd, exp2, log2, etc), and
  teach LLVM to do constant propagation for every single one of them.

- Constants -- you often want to make specialized versions of shaders for
  certain constants, especially when you have control flow statements whose
  arguments are constants (e.g., when doing TNL with a big glsl shader), and
  therefore they should be factored out. You also may want to factor out
  constant operations (e.g., MUL TEMP[1], CONST[0], CONST[1]). But LLVM
  can't help you with that given that for LLVM IR constants are ordinary
  memory, like the inputs. LLVM doesn't know that a shader will be invoked
  millions of times with the same constants but varying inputs.

If people can make this TGSI optimization module work quickly on top of LLVM
then it's fine by me. I'm just pointing out that between the extreme of
sharing nothing between each pipe driver compiler, and sharing everything
with LLVM, there's a middle ground which is sharing between pipe drivers but
not LLVM. Once that module exists, having it use LLVM internally would then
be pretty easy. It looks to me a better way to parallelize the effort than to
be blocked for quite some time on making TGSI <-> LLVM IR be lossless.

At any rate, in my book whoever does the job gets to choose. I won't have any
time to put into it unfortunately, so feel free to ignore me.

Jose

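A brief illustration of the AND / NAND masking mentioned above, for readers
who have not seen it: without real predicate support, a predicated write is
emulated by expanding the condition into an all-ones or all-zeros mask and
combining the old and new values bitwise. The C sketch below shows the
generic pattern; it is not any particular driver's code.

#include <stdint.h>
#include <stdio.h>

/* Emulated predicated move:
 *     if (pred) dst = src;
 * becomes   dst = (mask & src) | (~mask & dst)
 * where mask is all-ones when the predicate is true and zero otherwise. */
static uint32_t predicated_mov(uint32_t dst, uint32_t src, int pred)
{
    uint32_t mask = pred ? 0xffffffffu : 0u;   /* condition -> bit mask    */
    return (mask & src) | (~mask & dst);       /* AND plus and-not ("NAND") */
}

int main(void)
{
    printf("0x%x\n", (unsigned)predicated_mov(0x1111, 0x2222, 1)); /* 0x2222 */
    printf("0x%x\n", (unsigned)predicated_mov(0x1111, 0x2222, 0)); /* 0x1111 */
    return 0;
}
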
From: Zack R. <za...@vm...> - 2010-03-30 17:33:37

On Tuesday 30 March 2010 12:52:54 Luca Barbieri wrote:
> > There are several deep challenges in making TGSI <-> LLVM IR translation
> > lossless -- I'm sure we'll get around to overcome them -- but I don't
> > think that using LLVM is a requirement for this module. Having a shared
> > IR for simple TGSI optimization module would go a long way by itself.
>
> What are these challenges?

Besides what Brian just pointed out, it's also worth noting that the one
problem that everyone dreads is creating an LLVM code-generator for TGSI.
Everyone seems to agree that it's a darn complicated task with a somewhat
undefined scope. It's obviously something that will be mandatory for OpenCL,
but I doubt anyone will touch it before it's an absolute must.

From: Brian P. <br...@vm...> - 2010-03-30 17:15:54

This is getting off-topic, but anyway...

Luca Barbieri wrote:
>> There are several deep challenges in making TGSI <-> LLVM IR translation
>> lossless -- I'm sure we'll get around to overcome them -- but I don't
>> think that using LLVM is a requirement for this module. Having a shared
>> IR for simple TGSI optimization module would go a long way by itself.
>
> What are these challenges?

Control flow is hard. Writing a TGSI backend for LLVM would be a lot of
work. Etc.

> If you keep vectors and don't scalarize, I don't see why it shouldn't
> just work, especially if you just roundtrip without running any
> passes.
> The DAG instruction matcher should be able to match writemasks,
> swizzles, etc. fine.
>
> Control flow may not be exactly reconstructed, but I think LLVM has
> control flow canonicalization that should allow to reconstruct a
> loop/if control flow structure of equivalent efficiency.

LLVM only has branch instructions while GPU instruction sets avoid branching
and use explicit conditional and loop constructs. Analyzing the LLVM IR
branches to reconstruct GPU loops and conditionals isn't easy.

> Using LLVM has the obvious advantage that all optimizations have
> already been written and tested.
> And for complex shaders, you may really need a good full optimizer
> (that can do inter-basic-block and interprocedural optimizations,
> alias analysis, advanced loop optmizations, and so on), especially if
> we start supporting OpenCL over TGSI.
>
> There is also the option of having the driver directly consume the
> LLVM IR, and the frontend directly produce it (e.g. clang supports
> OpenCL -> LLVM).
>
> Some things, like inlining, are easy to do directly in TGSI (but only
> because all regs are global).

Inlining isn't always easy. The Mesa GLSL compiler inlines function calls
whenever possible. But there are some tricky cases. For example, if the
function we want to inline has deeply nested early return statements you
have to convert the return statements into something else to avoid mistakenly
returning from the calling function. The LLVM optimizer may handle this just
fine, but translating the resulting LLVM IR back to TGSI could be hard (see
above).

> However, even determining the minimum number of loop iterations for
> loop unrolling is very hard to do without a full compiler.
>
> For instance, consider code like this:
> if(foo >= 6)
> {
>     if(foo == 1)
>         iters = foo + 3;
>     else if(bar == 1)
>         iters = foo + 5 + bar;
>     else
>         iters = foo + 7;
>
>     for(i = 0; i < iters; ++i) LOOP_BODY;
> }
>
> You need a non-trivial optimizer (with control flow support, value
> range propagation, and constant folding) to find out that the loop
> always executes at least 12 iterations, which you need to know to
> unroll it optimally.
> More complex examples are possible.

Yup, it's hard.

> It general, anything that requires (approximately) determining any
> property of the program potentially benefits from having the most
> complex and powerful optimizer available.

I also think that some optimizations are more effective if they're applied
at a higher level (in the GLSL compiler, for example). But that's another
topic of conversation.

-Brian

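To illustrate the early-return problem described above: when a callee
containing nested returns is inlined, each return has to be rewritten into
something like a guarded assignment so that control falls through to the end
of the inlined body instead of leaving the caller. The before/after C sketch
below uses made-up names and shows only one way an inliner might lower it.

#include <stdio.h>

/* Callee with a nested early return. */
static float callee(float x)
{
    if (x < 0.0f) {
        if (x < -1.0f)
            return 0.0f;        /* early return, two levels deep */
        return -x;
    }
    return x * 2.0f;
}

/* What an inliner that cannot keep "return" must produce: the early
 * returns become writes to a result variable guarded by a flag, and
 * every statement after a taken return is skipped via that flag. */
static float caller_with_manual_inline(float x)
{
    float result;
    int returned = 0;

    if (x < 0.0f) {
        if (x < -1.0f) { result = 0.0f; returned = 1; }
        if (!returned) { result = -x;   returned = 1; }
    }
    if (!returned)
        result = x * 2.0f;

    return result + 1.0f;        /* caller code continues normally */
}

int main(void)
{
    printf("%f %f\n", callee(-2.0f) + 1.0f, caller_with_manual_inline(-2.0f)); /* 1.0 1.0 */
    printf("%f %f\n", callee(-0.5f) + 1.0f, caller_with_manual_inline(-0.5f)); /* 1.5 1.5 */
    printf("%f %f\n", callee(3.0f)  + 1.0f, caller_with_manual_inline(3.0f));  /* 7.0 7.0 */
    return 0;
}
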
From: Corbin S. <mos...@gm...> - 2010-03-30 17:10:57

On Tue, Mar 30, 2010 at 10:05 AM, Luca Barbieri <luc...@gm...> wrote:
> DDX/DDY could cause miscompilation, but I think that only happens if
> LLVM clones or causes some paths to net execute them.
>
> Someone proposed some time ago on llvmdev to add a flag to tell llvm
> to never duplicate an intrinsic, not sure if that went through (iirc,
> it was for a barrier instruction that relied on the instruction
> pointer).
> Alternatively, it should be possible to just disable any passes that
> clone basic blocks if those instructions are present.
>
> The non-execution problem should be fixable by declaring DDX/DDY to
> have global-write-like side effects (this will prevent dead code
> elimination of them if they are totally unused, but hopefully shaders
> are not written so badly they need that).

We're talking about a HW-specific issue here, not anything that needs global
changes. I'm really not sure where you're going with this.

--
When the facts change, I change my mind. What do you do, sir? ~ Keynes

Corbin Simpson
<Mos...@gm...>

From: Corbin S. <mos...@gm...> - 2010-03-30 17:08:50

On Tue, Mar 30, 2010 at 8:37 AM, Luca Barbieri <luc...@gm...> wrote:
>> Another idea was to convert TGSI to a SSA form. That would make unrolling
>> branches much easier as the Phi function would basically become a linear
>> interpolation, loops and subroutines with conditional return statements
>> might be trickier. The r300 compiler already uses SSA for its optimization
>> passes so maybe you wouldn't need to mess with TGSI that much...
>>
>>> Is the conditional translation something that only needs to be done
>>> in the Gallium drivers, or would it be useful to apply the translation
>>> before the Mesa IR is converted into TGSI? Are any of the other drivers
>>> (Gallium or Mesa) currently doing this kind of translation?
>>
>> Not that I know of. You may do it wherever you want theoretically, even in
>> the r300 compiler and leaving TGSI untouched, but I think most people would
>> appreciate if these translation were done in TGSI.
>
> It would be nice to have a driver-independent TGSI optimization module.
> It could either operate directly on TGSI (probably only good for
> simple optimization), or convert to LLVM IR, optimize, and convert
> back.
>
> This would allow to use this for all drivers: note that at least
> inlining and loop unrolling should generally be performed even for
> hardware with full control flow support.
> Lots of other optimizations would then be possible (using LLVM, with a
> single line of code to request the appropriate LLVM pass), and would
> automatically be available for all drivers, instead of being only
> available for r300 by putting them in the radeon compiler.

This is orthogonal to the suggested project...

--
When the facts change, I change my mind. What do you do, sir? ~ Keynes

Corbin Simpson
<Mos...@gm...>

From: Keith W. <ke...@vm...> - 2010-03-30 17:08:32

On Sun, 2010-03-28 at 23:56 -0700, Chia-I Wu wrote:
> On Mon, Mar 29, 2010 at 1:51 AM, Keith Whitwell
> <kei...@go...> wrote:
> > I've just pushed a variation on a theme a couple of people have
> > explored in the past, ie. an interface to gallium without an
> > intervening state-tracker.
> >
> > The purpose of this is for writing minimal test programs to exercise
> > new gallium drivers in isolation from the rest of the codebase.
> >
> > In fact it doesn't really make sense to say "without a state tracker",
> > unless you don't mind creating test programs which are specific to the
> > windowing system you're currently working with.  Some similar work has
> > avoided window-system issues altogether by dumping bitmaps to files,
> > or using eg. python to abstract over window systems.
> >
> > This approach is a little different - I've defined a super-minimal api
> > for creating/destroying windows, currently calling this "graw", and we
> > have a tiny little co-state-tracker that each implementation provides.
> > This is similar to the glut approach of abstracting over window
> > systems, though much less complete.
> >
> > It currently consists of three calls:
> >   struct pipe_screen *graw_init( void );
> >   void *graw_create_window(...);
> >   void graw_destroy_window( void *handle );
> > which are sufficient to build simple demos on top of.  A future
> > enhancement would be to add a glut-style input handling facility.
> >
> > Right now there's a single demo, "clear.c" which displays an ugly
> > purple box.  Builds so far only with scons, using winsys=graw-xlib.
>
> I happened to be playing with the idea yesterday.  My take is to define an EGL
> extension, EGL_MESA_gallium.  The extension defines Gallium as a rendering API
> of EGL.  The downside of this approach is that it depends on st/egl.  The
> upside is that, it will work on whatever platform st/egl supports.
>
> I've cleaned up my work a little bit.  You can find it in the attachments.
> There is a port of "clear" raw demo to use EGL_MESA_gallium.  The demo supports
> window resizing, and is accelerated if a hardware EGL driver is used.
>
> The demo renders into a X11 window.  It is worth noting that, when there is no
> need to render into an EGLSurface, eglCreateWindowSurface or eglMakeCurrent is
> not required.  To interface with X11, I've also borrowed some code from OpenVG
> demos and renamed it to EGLUT.

I'm not sure how far to take any of these "naked" gallium approaches.

My motivation was to build something to provide a very controlled environment
for bringup of new drivers - basically getting to the first triangle and not
much further. After that, existing state trackers with stable ABIs are
probably preferable.

Keith

From: Luca B. <luc...@gm...> - 2010-03-30 17:05:43

DDX/DDY could cause miscompilation, but I think that only happens if LLVM
clones or causes some paths to not execute them.

Someone proposed some time ago on llvmdev to add a flag to tell llvm to never
duplicate an intrinsic, not sure if that went through (iirc, it was for a
barrier instruction that relied on the instruction pointer). Alternatively,
it should be possible to just disable any passes that clone basic blocks if
those instructions are present.

The non-execution problem should be fixable by declaring DDX/DDY to have
global-write-like side effects (this will prevent dead code elimination of
them if they are totally unused, but hopefully shaders are not written so
badly they need that).

From: Luca B. <luc...@gm...> - 2010-03-30 16:53:02

> There are several deep challenges in making TGSI <-> LLVM IR translation
> lossless -- I'm sure we'll get around to overcome them -- but I don't
> think that using LLVM is a requirement for this module. Having a shared
> IR for simple TGSI optimization module would go a long way by itself.

What are these challenges?

If you keep vectors and don't scalarize, I don't see why it shouldn't just
work, especially if you just roundtrip without running any passes. The DAG
instruction matcher should be able to match writemasks, swizzles, etc. fine.

Control flow may not be exactly reconstructed, but I think LLVM has control
flow canonicalization that should allow to reconstruct a loop/if control flow
structure of equivalent efficiency.

Using LLVM has the obvious advantage that all optimizations have already been
written and tested. And for complex shaders, you may really need a good full
optimizer (that can do inter-basic-block and interprocedural optimizations,
alias analysis, advanced loop optimizations, and so on), especially if we
start supporting OpenCL over TGSI.

There is also the option of having the driver directly consume the LLVM IR,
and the frontend directly produce it (e.g. clang supports OpenCL -> LLVM).

Some things, like inlining, are easy to do directly in TGSI (but only because
all regs are global). However, even determining the minimum number of loop
iterations for loop unrolling is very hard to do without a full compiler.

For instance, consider code like this:

if(foo >= 6)
{
    if(foo == 1)
        iters = foo + 3;
    else if(bar == 1)
        iters = foo + 5 + bar;
    else
        iters = foo + 7;

    for(i = 0; i < iters; ++i) LOOP_BODY;
}

You need a non-trivial optimizer (with control flow support, value range
propagation, and constant folding) to find out that the loop always executes
at least 12 iterations, which you need to know to unroll it optimally. More
complex examples are possible.

In general, anything that requires (approximately) determining any property
of the program potentially benefits from having the most complex and powerful
optimizer available.

From: José F. <jfo...@vm...> - 2010-03-30 15:53:33

On Tue, 2010-03-30 at 08:37 -0700, Luca Barbieri wrote:
> > Another idea was to convert TGSI to a SSA form. That would make unrolling
> > branches much easier as the Phi function would basically become a linear
> > interpolation, loops and subroutines with conditional return statements
> > might be trickier. The r300 compiler already uses SSA for its optimization
> > passes so maybe you wouldn't need to mess with TGSI that much...
> >
> >> Is the conditional translation something that only needs to be done
> >> in the Gallium drivers, or would it be useful to apply the translation
> >> before the Mesa IR is converted into TGSI? Are any of the other drivers
> >> (Gallium or Mesa) currently doing this kind of translation?
> >
> > Not that I know of. You may do it wherever you want theoretically, even in
> > the r300 compiler and leaving TGSI untouched, but I think most people would
> > appreciate if these translation were done in TGSI.
>
> It would be nice to have a driver-independent TGSI optimization module.
> It could either operate directly on TGSI (probably only good for
> simple optimization), or convert to LLVM IR, optimize, and convert
> back.
>
> This would allow to use this for all drivers: note that at least
> inlining and loop unrolling should generally be performed even for
> hardware with full control flow support.
> Lots of other optimizations would then be possible (using LLVM, with a
> single line of code to request the appropriate LLVM pass), and would
> automatically be available for all drivers, instead of being only
> available for r300 by putting them in the radeon compiler.

Agreed. These were my thoughts too when watching Nicolai Haehnle's FOSDEM
presentation.

In my opinion the best would be to use a SSA form of TGSI, with possibility
for annotations or ability to have hardware specific instructions, so that
the drivers could faithfully represent all the oddities in certain hardware.

There are several deep challenges in making TGSI <-> LLVM IR translation
lossless -- I'm sure we'll get around to overcome them -- but I don't think
that using LLVM is a requirement for this module. Having a shared IR for
simple TGSI optimization module would go a long way by itself.

Jose

From: Luca B. <luc...@gm...> - 2010-03-30 15:37:18

> Another idea was to convert TGSI to a SSA form. That would make unrolling
> branches much easier as the Phi function would basically become a linear
> interpolation, loops and subroutines with conditional return statements
> might be trickier. The r300 compiler already uses SSA for its optimization
> passes so maybe you wouldn't need to mess with TGSI that much...
>
>> Is the conditional translation something that only needs to be done
>> in the Gallium drivers, or would it be useful to apply the translation
>> before the Mesa IR is converted into TGSI? Are any of the other drivers
>> (Gallium or Mesa) currently doing this kind of translation?
>
> Not that I know of. You may do it wherever you want theoretically, even in
> the r300 compiler and leaving TGSI untouched, but I think most people would
> appreciate if these translation were done in TGSI.

It would be nice to have a driver-independent TGSI optimization module. It
could either operate directly on TGSI (probably only good for simple
optimization), or convert to LLVM IR, optimize, and convert back.

This would allow to use this for all drivers: note that at least inlining and
loop unrolling should generally be performed even for hardware with full
control flow support.

Lots of other optimizations would then be possible (using LLVM, with a single
line of code to request the appropriate LLVM pass), and would automatically
be available for all drivers, instead of being only available for r300 by
putting them in the radeon compiler.
