mesa3d-dev Mailing List for Mesa3D (Page 14)

Brought to you by: alanh, brianp, chadversary, keithw

Re: [Mesa3d-dev] gallium failing to build on darwin/ppc

From: tom f. <tf...@al...> - 2010-04-03 19:34:48

Vinson Lee <vl...@vm...> writes:
> Leopard uses gcc-4.0, which didn't have built-in support for atomic
> variables.

u_atomic.h should probably check for a supported compiler; Jeremy, does
the attached patch produce an understandable error instead of a link
error?

In terms of a solution, Jeremy, you could implement PPC assembly for
the few primitives available there.  Looks easy for someone who knows
PPC well.  There's a comment in that file that mentions a mutex-based
implementation... but then I don't see one.

Looks like there's some dri radeon code which is using the gcc
primitives directly instead of through the gallium wrapper.  I'm not
familiar enough w/ it to know if that's correct or not (.. anyway,
you're probably not building dri/radeon on OS X, right?).

-tom

> ________________________________________
> From: Jeremy Huddleston [jer...@fr...]
> Sent: Saturday, April 03, 2010 11:22 AM
> To: mes...@li...
> Subject: [Mesa3d-dev] gallium failing to build on darwin/ppc
> 
> Is there any known reason why gallium would fail to build on darwin/ppc?  I
> haven't looked into it myself since I figured there might be an easy answer
> already
> 
> http://trac.macports.org/ticket/24345
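
A minimal sketch of the two options raised in the reply above: checking the compiler version before using the gcc atomic builtins, and falling back to a mutex when they are missing. The __sync builtins first appeared in gcc 4.1, which is why gcc-4.0 on Leopard lacks them. The sketch_* names are hypothetical and are not taken from u_atomic.h; a build that prefers a clear failure could place an #error in the fallback branch instead.

```c
#include <stdint.h>

/* The __sync_* builtins exist in gcc 4.1 and later (clang also reports >= 4.1). */
#if defined(__GNUC__) && (__GNUC__ * 100 + __GNUC_MINOR__ >= 401)
#define SKETCH_HAVE_SYNC_BUILTINS 1
#else
#define SKETCH_HAVE_SYNC_BUILTINS 0
#endif

#if !SKETCH_HAVE_SYNC_BUILTINS
#include <pthread.h>
/* One global lock: slow, but correct on compilers without atomic builtins.
 * Alternative: "#error no atomic builtins for this compiler" to fail early. */
static pthread_mutex_t sketch_atomic_lock = PTHREAD_MUTEX_INITIALIZER;
#endif

/* Atomically increment *v and return the new value (hypothetical helper). */
static int32_t
sketch_atomic_inc_return(int32_t *v)
{
#if SKETCH_HAVE_SYNC_BUILTINS
   return __sync_add_and_fetch(v, 1);
#else
   int32_t result;
   pthread_mutex_lock(&sketch_atomic_lock);
   result = ++*v;
   pthread_mutex_unlock(&sketch_atomic_lock);
   return result;
#endif
}
```

The single global lock keeps the fallback short; a per-variable lock or hand-written PPC assembly, as suggested above, would scale better.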

[Mesa3d-dev] gallium-util-format-is-supported

From: Jose F. <jfo...@vm...> - 2010-04-03 19:25:32

> commit 5126683e3b971ccfb51e50e560750ce44e86bae8
> Author: Luca Barbieri <lu...@lu...>
> Date:   Fri Apr 2 05:23:32 2010 +0200
> 
>     gallium/util: add util_format_is_supported to check for pack/unpack
>     
>     This improves the code by making it more readable, and removes
>     special knowledge of S3TC and other formats from softpipe.
> @@ -92,7 +92,7 @@ def write_format_table(formats):
>      u_format_pack.generate(formats)
>      
>      for format in formats:
> -        print 'const struct util_format_description'
> +        print 'struct util_format_description'
>          print 'util_format_%s_description = {' % (format.short_name(),)
>          print "   %s," % (format.name,)
>          print "   \"%s\"," % (format.name,)

I don't agree with this. Making the format description table mutable when the only formats that are potentially unsupported due to patent issues are s3tc variants makes no sense. S3TC formats *are* special. There is nothing to generalize here. 

> commit 52e9b990a192a9329006d5f7dd2ac222effea5a5
> Author: Luca Barbieri <lu...@lu...>
> Date:   Fri Apr 2 04:48:42 2010 +0200
> 
>     gallium/util: load s3tc on demand
>     
>     This changes the S3TC function pointers to be initialized to stubs
>     that load the S3TC library and then delegate to the real functions.
>     
>     If the S3TC library fails to load, the function pointers are replaced
>     with a "nop" function.     
>     The code is also changed to attempt to load the library only one time.
>     
>     Note that unlike checking for a flag, this method has no performance
>     cost at all.
>     
>     The use of the "nop" functions also allows to avoid most checks, that
>     are only preserved when the function does non-trivial work.

Replacing the conditionals with no-op stubs is a good optimization.

But attempting to load the S3TC shared library from the stubs is unnecessary. Stubs should have an assert(0) -- it is an error to attempt any S3TC (de)compression when there's no support for it.

Jose
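
A minimal sketch of the stub arrangement argued for above, assuming a single fetch entry point: the function pointer starts out at a stub that asserts, and is only replaced once a real decoder is known to be available. All sketch_* names are hypothetical, and sketch_fetch_placeholder merely stands in for a real decoder so the example is self-contained.

```c
#include <assert.h>
#include <stdint.h>

typedef void (*sketch_fetch_fn)(const uint8_t *block, unsigned i, unsigned j,
                                uint8_t rgba[4]);

/* Default stub: reaching it means the caller skipped the support check. */
static void
sketch_fetch_stub(const uint8_t *block, unsigned i, unsigned j, uint8_t rgba[4])
{
   (void)block; (void)i; (void)j; (void)rgba;
   assert(0 && "S3TC fetch called without S3TC support");
}

/* Placeholder decoder (NOT a real S3TC decoder): writes opaque white. */
static void
sketch_fetch_placeholder(const uint8_t *block, unsigned i, unsigned j,
                         uint8_t rgba[4])
{
   (void)block; (void)i; (void)j;
   rgba[0] = rgba[1] = rgba[2] = rgba[3] = 255;
}

static sketch_fetch_fn sketch_fetch_dxt1 = sketch_fetch_stub;

/* Callers test this once instead of checking a flag on every fetch. */
static int
sketch_s3tc_supported(void)
{
   return sketch_fetch_dxt1 != sketch_fetch_stub;
}

/* Called once at init time if the external library was found. */
static void
sketch_s3tc_install(sketch_fetch_fn real_fetch)
{
   if (real_fetch)
      sketch_fetch_dxt1 = real_fetch;
}
```

With this layout the hot path is a plain indirect call, and misuse shows up as an assertion failure in debug builds rather than as silently empty texels.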

Re: [Mesa3d-dev] RFC: GSOC Proposal: R300/Gallium GLSL Compiler

From: Luca B. <luc...@gm...> - 2010-04-03 19:09:17

As a further example that just came to mind, nv40 (GeForce 6-7 and PS3
RSX) supports control flow in fragment shaders, but does not
apparently support the "continue" keyword (since NV_fragment_program2,
which maps almost directly to the hardware, does not have it either).

I implemented TGSI control flow in a private branch, but did not
implement the "continue" keyword.

Implementing "continue" requires transforming the code to generate and
carry around "should continue" flags, or performing even less trivial
transformations including code duplication.

Unfortunately, doing so requires non-local modifications, and thus would
require doing something beyond just scanning the TGSI source code as
the nv30/nv40 driver currently does.

If there was a TGSI->LLVM->TGSI module, the LLVM->TGSI control flow
reconstruction would already handle this, and it would be enough to
tell it to not make use of the "continue" instruction: it would then
automatically generate the proper if/endif structure, duplicating code
and/or introducing flags as needed in a generic way.

As things stand now, I'm faced with either just hoping the GLSL
programs don't use "continue", implementing a hack in the nv40 shader
backend (where such a high-level optimization does not belong at all
and can't be done cleanly), or writing the LLVM module myself before
tackling this.

With an LLVM-based infrastructure, there would be a clear and
straightforward way to solve this, with all the supporting
infrastructure already available and the ability to create an
optimization pass reusable by other drivers that may face the same
issue.

This is just an example, by the way: others can be found.
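
The "should continue" flag transformation described above can be illustrated in plain C rather than TGSI. This is only a sketch with hypothetical function names: the first loop uses "continue", the second is the rewritten form that needs nothing beyond if/endif structure.

```c
/* Original form: two "continue" statements inside the loop body. */
static int
sketch_sum_with_continue(const int *v, int n)
{
   int i, s = 0;
   for (i = 0; i < n; ++i) {
      if (v[i] < 0)
         continue;
      s += v[i];
      if (v[i] > 100)
         continue;
      s += 1;
   }
   return s;
}

/* Rewritten form: a per-iteration flag guards everything that follows
 * a former "continue", so only if/endif blocks remain. */
static int
sketch_sum_with_flag(const int *v, int n)
{
   int i, s = 0;
   for (i = 0; i < n; ++i) {
      int should_continue = 0;
      if (v[i] < 0)
         should_continue = 1;
      if (!should_continue) {
         s += v[i];
         if (v[i] > 100)
            should_continue = 1;
      }
      if (!should_continue) {
         s += 1;
      }
   }
   return s;
}
```

Every statement after a former "continue" has to be wrapped in a guard, which is why the rewrite is non-local and awkward to do while simply scanning the instruction stream.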

Re: [Mesa3d-dev] gallium failing to build on darwin/ppc

From: Vinson L. <vl...@vm...> - 2010-04-03 18:52:58

Leopard uses gcc-4.0, which didn't have built-in support for atomic variables.

________________________________________
From: Jeremy Huddleston [jer...@fr...]
Sent: Saturday, April 03, 2010 11:22 AM
To: mes...@li...
Subject: [Mesa3d-dev] gallium failing to build on darwin/ppc

Is there any known reason why gallium would fail to build on darwin/ppc?  I haven't looked into it myself since I figured there might be an easy answer already

http://trac.macports.org/ticket/24345

Re: [Mesa3d-dev] RFC: GSOC Proposal: R300/Gallium GLSL Compiler

From: Luca B. <luc...@gm...> - 2010-04-03 18:37:46

This is somewhat nice, but without using a real compiler, the result
will still be just a toy, unless you employ hundreds of compiler
experts working full time on the project.

For instance, Wikipedia lists the following loop optimizations:
# loop interchange : These optimizations exchange inner loops with
outer loops. When the loop variables index into an array, such a
transformation can improve locality of reference, depending on the
array's layout. This is also known as loop permutation.

# loop splitting/loop peeling : Loop splitting attempts to simplify a
loop or eliminate dependencies by breaking it into multiple loops
which have the same bodies but iterate over different contiguous
portions of the index range. A useful special case is loop peeling,
which can simplify a loop with a problematic first iteration by
performing that iteration separately before entering the loop.

# loop fusion or loop combining : Another technique which attempts to
reduce loop overhead. When two adjacent loops would iterate the same
number of times (whether or not that number is known at compile time),
their bodies can be combined as long as they make no reference to each
other's data.

# loop fission or loop distribution : Loop fission attempts to break a
loop into multiple loops over the same index range but each taking
only a part of the loop's body. This can improve locality of
reference, both of the data being accessed in the loop and the code in
the loop's body.

# loop unrolling: Duplicates the body of the loop multiple times, in
order to decrease the number of times the loop condition is tested and
the number of jumps, which may degrade performance by impairing the
instruction pipeline. Completely unrolling a loop eliminates all
overhead (except multiple instruction fetches & increased program load
time), but requires that the number of iterations be known at compile
time (except in the case of JIT compilers). Care must also be taken to
ensure that multiple re-calculation of indexed variables is not a
greater overhead than advancing pointers within the original loop.

# loop unswitching : Unswitching moves a conditional inside a loop
outside of it by duplicating the loop's body, and placing a version of
it inside each of the if and else clauses of the conditional.

# loop inversion : This technique changes a standard while loop into a
do/while (a.k.a. repeat/until) loop wrapped in an if conditional,
reducing the number of jumps by two, for cases when the loop is
executed. Doing so duplicates the condition check (increasing the size
of the code) but is more efficient because jumps usually cause a
pipeline stall. Additionally, if the initial condition is known at
compile-time and is known to be side-effect-free, the if guard can be
skipped.

# loop-invariant code motion : If a quantity is computed inside a loop
during every iteration, and its value is the same for each iteration,
it can vastly improve efficiency to hoist it outside the loop and
compute its value just once before the loop begins. This is
particularly important with the address-calculation expressions
generated by loops over arrays. For correct implementation, this
technique must be used with loop inversion, because not all code is
safe to be hoisted outside the loop.

# loop reversal : Loop reversal reverses the order in which values are
assigned to the index variable. This is a subtle optimization which
can help eliminate dependencies and thus enable other optimizations.
Also, certain architectures utilise looping constructs at Assembly
language level that count in a single direction only (e.g.
decrement-jump-if-not-zero (DJNZ)).

# loop tiling/loop blocking : Loop tiling reorganizes a loop to
iterate over blocks of data sized to fit in the cache.

# loop skewing : Loop skewing takes a nested loop iterating over a
multidimensional array, where each iteration of the inner loop depends
on previous iterations, and rearranges its array accesses so that the
only dependencies are between iterations of the outer loop.
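
As a small illustration of two entries from the list above, here is loop-invariant code motion combined with loop inversion, written by hand in C. The function names are hypothetical and the sketch only shows the shape of the transformation a compiler would perform; it is not how any particular compiler implements it.

```c
/* Before: a * b + 1 is recomputed on every iteration. */
static void
sketch_scale_naive(int *out, const int *in, int n, int a, int b)
{
   int i;
   for (i = 0; i < n; ++i)
      out[i] = in[i] * (a * b + 1);
}

/* After: the loop is inverted into an "if" guard plus do/while, and the
 * invariant expression is hoisted inside the guard, so it is evaluated
 * exactly once and only when the loop runs at least one iteration. */
static void
sketch_scale_hoisted(int *out, const int *in, int n, int a, int b)
{
   if (n > 0) {
      const int k = a * b + 1;
      int i = 0;
      do {
         out[i] = in[i] * k;
         ++i;
      } while (i < n);
   }
}
```

The guard is what makes the hoist safe in general: if the invariant expression could trap or had a cost worth avoiding, it must not run when the original loop would have executed zero times.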



Good luck doing all this on TGSI (especially if the developer does not
have serious experience writing production compilers).

Also, this does not mention all the other optimizations and analyses
required to do the above stuff well (likely another 10-20 things).

Using a real compiler (e.g. LLVM, but also gcc or Open64), those
optimizations are already implemented, or at least there is already a
team of experienced compiler developers who are working full time to
implement such optimizations, allowing you to then just turn them on
without having to do any of the work yourself.

Note all "X compiler is bad for VLIW or whatever GPU architecture"
objections are irrelevant, since almost all optimizations are totally
architecture independent.

Also note that we should support OpenCL/compute shaders (already
available for *3* years on e.g. nv50) and those *really* need a real
compiler (as in, something developed for years by a team of compiler
experts, and in wide use).
For instance, nVidia uses Open64 to compile CUDA programs, and then
feeds back the output (via PTX) to their ad-hoc code generator.

Note that unlike Mesa/Gallium, nVidia actually had a working shader
optimizer AND a large paid team, yet they still decided to at least
partially use Open64.

PathScale (who seems to mainly sell an Open64-based compiler for the
HPC market) might do some of this work (with a particular focus on a
CUDA replacement for nv50), but it's unclear whether this will turn
out to be generally useful (for all Gallium drivers, as opposed to
nv50-only) or not.
Also they plan to use Open64 and WHIRL, and it's unclear whether this
is as well designed for embedding and as easy to understand and customize
as LLVM is (please expand on this if you know about it).

Really, the current code generation situation is totally _embarrassing_
(and r300 is probably one of the best here, having its own compiler,
and doesn't even have loops, so you can imagine how good the other
drivers are), and ought to be fixed in a definitive fashion.

This is obviously not achievable if Mesa/Gallium contributors are
supposed to write the compiler optimization themselves, since clearly
there is not even enough manpower to support a relatively up-to-date
version of OpenGL or, say, to have drivers that can allocate and fence
GPU memory in a sensible and fast way, or implement hierarchical Z
buffers, or any of the other things expected from a decent driver,
that the Mesa drivers don't do.

In other words, state-of-the-art optimizing compilers are not
something one can just pop up and write himself from scratch, unless
he is interested and skilled at it, it is his main project AND he
manages to attract, or pays, a community of compiler experts to work
on it.

Since LLVM already works well, has a community of compiler experts
working on it, and is funded by companies such as Apple, there is no
chance of attracting such a community, especially for something
limited to the niche of compiling shaders.

And yes, LLVM->TGSI->LLVM is not entirely trivial, but it is doable
(obviously), and once you get past that initial hurdle, you get
EVERYTHING FOR FREE.
And the free work keeps coming with every commit to the llvm
repository, and you only have to do the minimal work of updating for
LLVM interface changes.
So you can just do nothing and after a few months you notice that your
driver is faster on very advanced games because a new LLVM
automatically improved the quality of your shaders without you even
knowing about it.

Not to mention that we could then at some point just get rid of TGSI,
use LLVM IR directly, and have each driver implement a normal backend
if possible.

The test for adequateness of a shader compiler is saying "yes, this
code is really good: I can't easily come up with any way to improve
it", looking at the generated code for any example you can find.

Any ad-hoc compiler will most likely immediately fail such a test, for
complex examples.


So, for a GSoC project, I'd kind of suggest:
(1) Adapt the gallivm/llvmpipe TGSI->LLVM converter to also generate
AoS code (i.e. RGBA vectors as opposed to RRRR, GGGG, etc.) if
possible or write one from scratch otherwise
(2) Write an LLVM->TGSI backend, restricted to programs without any control flow
(3) Make LLVM->TGSI always work (even with control flow and DDX/DDY)
(4) Hook up all useful LLVM optimizations
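
To make the AoS versus SoA distinction in item (1) concrete, here is a sketch of the two layouts for a group of four pixels and a conversion between them. The struct and function names are hypothetical and do not come from gallivm or llvmpipe.

```c
#include <stddef.h>

/* SoA: one array per channel (RRRR, GGGG, BBBB, AAAA). */
struct sketch_soa4 {
   float r[4];
   float g[4];
   float b[4];
   float a[4];
};

/* AoS: one RGBA vector per pixel. */
struct sketch_rgba {
   float r, g, b, a;
};

/* Transpose a 4-pixel SoA group into four AoS pixels. */
static void
sketch_soa_to_aos(const struct sketch_soa4 *in, struct sketch_rgba out[4])
{
   size_t i;
   for (i = 0; i < 4; ++i) {
      out[i].r = in->r[i];
      out[i].g = in->g[i];
      out[i].b = in->b[i];
      out[i].a = in->a[i];
   }
}
```

SoA suits a CPU rasterizer that processes several pixels per SIMD register, while shader ISAs that operate on RGBA vectors map more directly onto AoS, which is presumably why the converter would need to emit both.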

If there is still time/as followup (note that these are mostly complex
things, at most one/two might be doable in the timeframe)
(5) Do something about uniform-specific shader generation, and support
automatically generating "pre-shaders" for the CPU (using the
x86/x86-64 LLVM backends) for uniform-only computations
(6) Enhance LLVM to provide any missing optimization with a significant impact
(7) Convert existing drivers to LLVM backends, or have them expose
more functionality to the TGSI backend via TGSI extensions (or
currently unused features such as predicate support), and do
driver-specific stuff (e.g. scalarization for scalar architectures)
(8) Make sure shaders can be compiled using as large as possible a
subset of plain C/C++, as well as OpenCL (using clang), and add OpenCL
support to Mesa/Gallium (some of it already exists in external
repositories)
(9) Compare with fglrx and nVidia libGL/cgc/nvopencc and improve
whatever necessary to be equal or better than them
(10) Talk with LLVM developers about good VLIW code generation for the
Radeons and to a lesser extent nv30/nv40 that need it, and find out
exactly what the problem is here, how it can be solved and who could
do the work
(11) Add Gallium support for nv10/nv20 and r100/r200 using the LLVM
DAG instruction selector to code-generate a fixed pipeline (Stephane
Marchesin tried this already, seems it is non-trivial but could be
made to work partially, and probably enough to get the Xorg state
tracker to work on all cards and get rid of all X drivers at some
point).
(12) Figure out if any other compilers (Open64, gcc, whatever) can be
useful as backends for some drivers

Maybe I should propose to do it myself though, if that is still
possible, since everyone else seems afraid of it for some reason and
it seems to me it is absolutely essential to have a chance of having
usable (read: that don't look ridiculous compared to the proprietary
ones) drivers, especially in the long run for DirectX 11-level and
later games and software heavily using OpenCL/compute shaders and very
complex tessellation/vertex/geometry/fragment shaders.

[Mesa3d-dev] gallium failing to build on darwin/ppc

From: Jeremy H. <jer...@fr...> - 2010-04-03 18:22:21

Is there any known reason why gallium would fail to build on darwin/ppc?  I haven't looked into it myself since I figured there might be an easy answer already

http://trac.macports.org/ticket/24345

[Mesa3d-dev] [PATCH] progs/gallium/unit: improve error detection in u_format_test and make it more lenient for S3TC

From: Luca B. <lu...@lu...> - 2010-04-03 16:18:32

Collect the maximum error for fetch/unpack tests, and ratio of flipped
to total bits for pack tests.

Add lenient thresholds for S3TC tests.
---
 progs/gallium/unit/u_format_test.c |  163 +++++++++++++++++++-----------------
 1 files changed, 86 insertions(+), 77 deletions(-)

diff --git a/progs/gallium/unit/u_format_test.c b/progs/gallium/unit/u_format_test.c
index 53e0284..1911dad 100644
--- a/progs/gallium/unit/u_format_test.c
+++ b/progs/gallium/unit/u_format_test.c
@@ -36,22 +36,48 @@
 #include "util/u_format_s3tc.h"
 
 
+static float
+float_error(float x, float y)
+{
+   return fabsf(y - x);
+}
+
+static float
+byte_error(uint8_t x, uint8_t y)
+{
+   return float_error(x / 255.0, y / 255.0);
+}
+
+/* this is done in this terrible way only because these are unit tests.
+ * a real implementation must use a lookup table, or the mask/shift/add
+ * algorithm in the Linux source
+ * it should also use the builtin/intrinsic if available
+ */
+static unsigned
+popcnt8(uint8_t v)
+{
+   unsigned i;
+   unsigned cnt = 0;
+   for(i = 0; i < 8; ++i)
+      cnt += ((v >> i) & 1);
+   return cnt;
+}
+
 static boolean
-compare_float(float x, float y)
+print_max_error(const struct util_format_description *format_desc, float max_error)
 {
-   float error = y - x;
+   if(max_error <= FLT_EPSILON)
+      return TRUE;
 
-   if (error < 0.0f)
-      error = -error;
+   printf("MAX ABS ERROR: %f float, %.1f 8scaled\n", max_error, max_error * 255.0);
 
-   if (error > FLT_EPSILON) {
-      return FALSE;
-   }
+   /* compression tests aren't currently perfect, so be lenient here */
+   if(format_desc->layout == UTIL_FORMAT_LAYOUT_S3TC && max_error < 0.01f)
+      return TRUE;
 
-   return TRUE;
+   return FALSE;
 }
 
-
 static void
 print_packed(const struct util_format_description *format_desc,
              const char *prefix,
@@ -69,6 +95,31 @@ print_packed(const struct util_format_description *format_desc,
    printf("%s", suffix);
 }
 
+static boolean
+print_packed_results(const struct util_format_description *format_desc, const struct util_format_test_case *test, uint8_t* packed)
+{
+   unsigned flipped_bits = 0;
+   unsigned total_bits = 0;
+   float flipped_bits_ratio;
+   unsigned i;
+   for (i = 0; i < format_desc->block.bits/8; ++i) {
+      flipped_bits += popcnt8((test->packed[i] ^ packed[i]) & test->mask[i]);
+      total_bits += popcnt8(test->mask[i]);
+   }
+
+   flipped_bits_ratio = (float)flipped_bits / total_bits;
+
+   if (flipped_bits)
+      printf("FLIPPED BITS: %u (%u %%)\n", flipped_bits, (unsigned)(flipped_bits_ratio * 100.0));
+
+   /* TODO: S3TC threshold is random */
+   if (flipped_bits_ratio > (format_desc->layout == UTIL_FORMAT_LAYOUT_S3TC ? 0.1 : 0)) {
+      print_packed(format_desc, "FAILED: ", packed, " obtained\n");
+      print_packed(format_desc, "        ", test->packed, " expected\n");
+      return FALSE;
+   }
+   return TRUE;
+}
 
 static void
 print_unpacked_doubl(const struct util_format_description *format_desc,
@@ -94,7 +145,7 @@ print_unpacked_doubl(const struct util_format_description *format_desc,
 static void
 print_unpacked_float(const struct util_format_description *format_desc,
                      const char *prefix,
-                     const float unpacked[UTIL_FORMAT_MAX_UNPACKED_HEIGHT][UTIL_FORMAT_MAX_UNPACKED_WIDTH][4],
+                     float unpacked[UTIL_FORMAT_MAX_UNPACKED_HEIGHT][UTIL_FORMAT_MAX_UNPACKED_WIDTH][4],
                      const char *suffix)
 {
    unsigned i, j;
@@ -115,7 +166,7 @@ print_unpacked_float(const struct util_format_description *format_desc,
 static void
 print_unpacked_8unorm(const struct util_format_description *format_desc,
                       const char *prefix,
-                      const uint8_t unpacked[][UTIL_FORMAT_MAX_UNPACKED_WIDTH][4],
+                      uint8_t unpacked[][UTIL_FORMAT_MAX_UNPACKED_WIDTH][4],
                       const char *suffix)
 {
    unsigned i, j;
@@ -138,26 +189,23 @@ test_format_fetch_float(const struct util_format_description *format_desc,
 {
    float unpacked[UTIL_FORMAT_MAX_UNPACKED_HEIGHT][UTIL_FORMAT_MAX_UNPACKED_WIDTH][4] = { { { 0 } } };
    unsigned i, j, k;
-   boolean success;
+   float max_error = 0.0f;
 
-   success = TRUE;
    for (i = 0; i < format_desc->block.height; ++i) {
       for (j = 0; j < format_desc->block.width; ++j) {
          format_desc->fetch_float(unpacked[i][j], test->packed, j, i);
-         for (k = 0; k < 4; ++k) {
-            if (!compare_float(test->unpacked[i][j][k], unpacked[i][j][k])) {
-               success = FALSE;
-            }
-         }
+         for (k = 0; k < 4; ++k)
+            max_error = MAX2(max_error, float_error(test->unpacked[i][j][k], unpacked[i][j][k]));
       }
    }
 
-   if (!success) {
+   if (!print_max_error(format_desc, max_error)) {
       print_unpacked_float(format_desc, "FAILED: ", unpacked, " obtained\n");
       print_unpacked_doubl(format_desc, "        ", test->unpacked, " expected\n");
+      return FALSE;
    }
 
-   return success;
+   return TRUE;
 }
 
 
@@ -167,27 +215,24 @@ test_format_unpack_float(const struct util_format_description *format_desc,
 {
    float unpacked[UTIL_FORMAT_MAX_UNPACKED_HEIGHT][UTIL_FORMAT_MAX_UNPACKED_WIDTH][4] = { { { 0 } } };
    unsigned i, j, k;
-   boolean success;
+   float max_error = 0.0f;
 
    format_desc->unpack_float(&unpacked[0][0][0], sizeof unpacked[0], test->packed, 0, format_desc->block.width, format_desc->block.height);
 
-   success = TRUE;
    for (i = 0; i < format_desc->block.height; ++i) {
       for (j = 0; j < format_desc->block.width; ++j) {
-         for (k = 0; k < 4; ++k) {
-            if (!compare_float(test->unpacked[i][j][k], unpacked[i][j][k])) {
-               success = FALSE;
-            }
-         }
+         for (k = 0; k < 4; ++k)
+            max_error = MAX2(max_error, float_error(test->unpacked[i][j][k], unpacked[i][j][k]));
       }
    }
 
-   if (!success) {
+   if (!print_max_error(format_desc, max_error)) {
       print_unpacked_float(format_desc, "FAILED: ", unpacked, " obtained\n");
       print_unpacked_doubl(format_desc, "        ", test->unpacked, " expected\n");
+      return FALSE;
    }
 
-   return success;
+   return TRUE;
 }
 
 
@@ -199,16 +244,10 @@ test_format_pack_float(const struct util_format_description *format_desc,
    float unpacked[UTIL_FORMAT_MAX_UNPACKED_HEIGHT][UTIL_FORMAT_MAX_UNPACKED_WIDTH][4];
    uint8_t packed[UTIL_FORMAT_MAX_PACKED_BYTES];
    unsigned i, j, k;
-   boolean success;
 
-   if (test->format == PIPE_FORMAT_DXT1_RGBA) {
-      /*
-       * Skip S3TC as packed representation is not canonical.
-       *
-       * TODO: Do a round trip conversion.
-       */
+   /* XXX: this test is broken */
+   if (test->format == PIPE_FORMAT_DXT1_RGBA)
       return TRUE;
-   }
 
    memset(packed, 0, sizeof packed);
    for (i = 0; i < format_desc->block.height; ++i) {
@@ -221,17 +260,7 @@ test_format_pack_float(const struct util_format_description *format_desc,
 
    format_desc->pack_float(packed, 0, &unpacked[0][0][0], sizeof unpacked[0], format_desc->block.width, format_desc->block.height);
 
-   success = TRUE;
-   for (i = 0; i < format_desc->block.bits/8; ++i)
-      if ((test->packed[i] & test->mask[i]) != (packed[i] & test->mask[i]))
-         success = FALSE;
-
-   if (!success) {
-      print_packed(format_desc, "FAILED: ", packed, " obtained\n");
-      print_packed(format_desc, "        ", test->packed, " expected\n");
-   }
-
-   return success;
+   return print_packed_results(format_desc, test, packed);
 }
 
 
@@ -266,29 +295,26 @@ test_format_unpack_8unorm(const struct util_format_description *format_desc,
    uint8_t unpacked[UTIL_FORMAT_MAX_UNPACKED_HEIGHT][UTIL_FORMAT_MAX_UNPACKED_WIDTH][4] = { { { 0 } } };
    uint8_t expected[UTIL_FORMAT_MAX_UNPACKED_HEIGHT][UTIL_FORMAT_MAX_UNPACKED_WIDTH][4] = { { { 0 } } };
    unsigned i, j, k;
-   boolean success;
+   float max_error = 0.0f;
 
    format_desc->unpack_8unorm(&unpacked[0][0][0], sizeof unpacked[0], test->packed, 0, 1, 1);
 
    convert_float_to_8unorm(&expected[0][0][0], &test->unpacked[0][0][0]);
 
-   success = TRUE;
    for (i = 0; i < format_desc->block.height; ++i) {
       for (j = 0; j < format_desc->block.width; ++j) {
-         for (k = 0; k < 4; ++k) {
-            if (expected[i][j][k] != unpacked[i][j][k]) {
-               success = FALSE;
-            }
-         }
+         for (k = 0; k < 4; ++k)
+            max_error = MAX2(max_error, byte_error(expected[i][j][k], unpacked[i][j][k]));
       }
    }
 
-   if (!success) {
+   if (!print_max_error(format_desc, max_error)) {
       print_unpacked_8unorm(format_desc, "FAILED: ", unpacked, " obtained\n");
       print_unpacked_8unorm(format_desc, "        ", expected, " expected\n");
+      return FALSE;
    }
 
-   return success;
+   return TRUE;
 }
 
 
@@ -298,17 +324,10 @@ test_format_pack_8unorm(const struct util_format_description *format_desc,
 {
    uint8_t unpacked[UTIL_FORMAT_MAX_UNPACKED_HEIGHT][UTIL_FORMAT_MAX_UNPACKED_WIDTH][4];
    uint8_t packed[UTIL_FORMAT_MAX_PACKED_BYTES];
-   unsigned i;
-   boolean success;
 
-   if (test->format == PIPE_FORMAT_DXT1_RGBA) {
-      /*
-       * Skip S3TC as packed representation is not canonical.
-       *
-       * TODO: Do a round trip conversion.
-       */
+   /* XXX: this test is broken */
+   if (test->format == PIPE_FORMAT_DXT1_RGBA)
       return TRUE;
-   }
 
    if (!convert_float_to_8unorm(&unpacked[0][0][0], &test->unpacked[0][0][0])) {
       /*
@@ -321,17 +340,7 @@ test_format_pack_8unorm(const struct util_format_description *format_desc,
 
    format_desc->pack_8unorm(packed, 0, &unpacked[0][0][0], sizeof unpacked[0], 1, 1);
 
-   success = TRUE;
-   for (i = 0; i < format_desc->block.bits/8; ++i)
-      if ((test->packed[i] & test->mask[i]) != (packed[i] & test->mask[i]))
-         success = FALSE;
-
-   if (!success) {
-      print_packed(format_desc, "FAILED: ", packed, " obtained\n");
-      print_packed(format_desc, "        ", test->packed, " expected\n");
-   }
-
-   return success;
+   return print_packed_results(format_desc, test, packed);
 }
 
 
-- 
1.7.0.1.147.g6d84b
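
The comment on popcnt8 in the patch above mentions the mask/shift/add algorithm as the proper implementation. A sketch of that approach for a single byte follows, together with a flipped-bit ratio helper in the spirit of print_packed_results. The sketch_* names are hypothetical, and the zero-mask guard is an addition that the patch itself does not have.

```c
#include <stddef.h>
#include <stdint.h>

/* Mask/shift/add population count for one byte. */
static unsigned
sketch_popcnt8(uint8_t v)
{
   unsigned x = v;
   x = x - ((x >> 1) & 0x55u);            /* 2-bit partial sums */
   x = (x & 0x33u) + ((x >> 2) & 0x33u);  /* 4-bit partial sums */
   x = (x + (x >> 4)) & 0x0fu;            /* final 8-bit sum */
   return x;
}

/* Ratio of differing bits to compared bits, honouring a per-byte mask.
 * Returns 0 when the mask selects no bits, avoiding a division by zero. */
static float
sketch_flipped_bits_ratio(const uint8_t *expected, const uint8_t *obtained,
                          const uint8_t *mask, size_t size)
{
   unsigned flipped = 0;
   unsigned total = 0;
   size_t i;
   for (i = 0; i < size; ++i) {
      flipped += sketch_popcnt8((uint8_t)((expected[i] ^ obtained[i]) & mask[i]));
      total += sketch_popcnt8(mask[i]);
   }
   return total ? (float)flipped / (float)total : 0.0f;
}
```

Where available, a compiler builtin such as gcc's __builtin_popcount would replace sketch_popcnt8, as the patch comment also suggests.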

Re: [Mesa3d-dev] [PATCH-RFC] st/mesa: Add GL_OES_EGL_image support

From: Jakob B. <wal...@gm...> - 2010-04-03 15:51:31

On Sun, Mar 28, 2010 at 6:13 PM, Chia-I Wu <ol...@gm...> wrote:
> Hi Jakob,
>
> This patch series adds support for GL_OES_EGL_image to st/mesa.  The first
> patch implements st_manager::get_egl_image in st/egl.  The hook is used to
> check and return an st_egl_image, which describes an EGLImageKHR.  The second
> patch implements GL_OES_EGL_image in st/mesa, and the last patch adds a demo
> for the new functionality.  I've tested it with egl_x11_i915.so, but it should
> work with other hardware drivers.
>
> Do you mind having a look at the patches, especially the first one?  I'd like
> to hear your opinions before merging the patches, and going on to work on
> EGLImage support in st/dri.

Hi Chia-I

Terribly sorry for taking this long to reply. The patches look good; go
ahead and commit. Regarding EGLImage in st/dri, don't let me stop you
if you have an itch to do it. If I get some time later, I'll ask you
then if you have done anything.

And again, thanks for the hard work!

Cheers Jakob.

Re: [Mesa3d-dev] How do we init half float tables?

From: Luca B. <lu...@lu...> - 2010-04-03 15:31:53

For instance, the DXT1 test is wrong.

The red values used are:
33
93
153
214

93 - 33 = 60
153 - 93 = 60
214 - 153 = 61

213 should be used instead (i.e. 0xd5 instead of 0xd6)
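
For reference, the arithmetic behind those red values can be sketched as follows, assuming DXT1 four-colour mode, 5-bit red endpoints expanded to 8 bits by bit replication, and truncating integer division. The function names are hypothetical, the endpoint values 26 and 4 are inferred from 214 and 33, and real decoders may round the interpolated values differently.

```c
#include <stdint.h>

/* Expand a 5-bit channel to 8 bits by replicating the top bits. */
static uint8_t
sketch_expand5(unsigned v5)
{
   return (uint8_t)((v5 << 3) | (v5 >> 2));
}

/* Four-entry palette for one channel in DXT1 four-colour mode:
 * the two endpoints plus the 2/3 and 1/3 interpolants (truncated). */
static void
sketch_dxt1_channel_palette(uint8_t c0, uint8_t c1, uint8_t out[4])
{
   out[0] = c0;
   out[1] = c1;
   out[2] = (uint8_t)((2u * c0 + c1) / 3u);
   out[3] = (uint8_t)((c0 + 2u * c1) / 3u);
}
```

With endpoints 214 and 33 this yields 214, 33, 153 and 93, which are the four red values listed above; the uneven 60/60/61 spacing follows from 214 - 33 = 181 not being divisible by 3.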

Re: [Mesa3d-dev] How do we init half float tables?

From: Luca B. <lu...@lu...> - 2010-04-03 15:21:06

They are not passing for me with current master and a 32-bit system:

Here are the failures:

Testing util_format_dxt1_rgb_pack_8unorm ...
FAILED: f2 d7 90 20 ae 2c 6f 97 obtained
        f2 d7 b0 20 ae 2c 6f 97 expected

Testing util_format_dxt5_rgba_pack_8unorm ...
FAILED: f7 10 c5 0c 9a 73 b4 9c f6 8f ab 32 2a 9a 95 5a obtained
        f8 11 c5 0c 9a 73 b4 9c f6 8f ab 32 2a 9a 95 5a expected

Testing util_format_dxt1_rgb_unpack_8unorm ...
FAILED: {0x99, 0xb0, 0x8e, 0xff}, {0x5d, 0x62, 0x89, 0xff}, {0x99,
0xb0, 0x8e, 0xff}, {0x99, 0xb0, 0x8e, 0xff}, {0xd6, 0xff, 0x94, 0xff},
{0x5d, 0x62, 0x89, 0xff}, {0x99, 0xb0, 0x8e, 0xff}, {0xd6, 0xff, 0x94,
0xff}, {0x5d, 0x62, 0x89, 0xff}, {0x5d, 0x62, 0x89, 0xff}, {0x99,
0xb0, 0x8e, 0xff}, {0x21, 0x14, 0x84, 0xff}, {0x5d, 0x62, 0x89, 0xff},
{0x21, 0x14, 0x84, 0xff}, {0x21, 0x14, 0x84, 0xff}, {0x99, 0xb0, 0x8e,
0xff} obtained
        {0x98, 0xaf, 0x8e, 0xff}, {0x5c, 0x62, 0x88, 0xff}, {0x98,
0xaf, 0x8e, 0xff}, {0x98, 0xaf, 0x8e, 0xff}, {0xd6, 0xff, 0x94, 0xff},
{0x5c, 0x62, 0x88, 0xff}, {0x98, 0xaf, 0x8e, 0xff}, {0xd6, 0xff, 0x94,
0xff}, {0x5c, 0x62, 0x88, 0xff}, {0x5c, 0x62, 0x88, 0xff}, {0x98,
0xaf, 0x8e, 0xff}, {0x21, 0x13, 0x84, 0xff}, {0x5c, 0x62, 0x88, 0xff},
{0x21, 0x13, 0x84, 0xff}, {0x21, 0x13, 0x84, 0xff}, {0x98, 0xaf, 0x8e,
0xff} expected

Testing util_format_dxt1_rgba_unpack_8unorm ...
FAILED: {0x00, 0x00, 0x00, 0x00}, {0x4e, 0xaa, 0x90, 0xff}, {0x4e,
0xaa, 0x90, 0xff}, {0x00, 0x00, 0x00, 0x00}, {0x4e, 0xaa, 0x90, 0xff},
{0x29, 0xff, 0xff, 0xff}, {0x00, 0x00, 0x00, 0x00}, {0x4e, 0xaa, 0x90,
0xff}, {0x73, 0x55, 0x21, 0xff}, {0x00, 0x00, 0x00, 0x00}, {0x00,
0x00, 0x00, 0x00}, {0x4e, 0xaa, 0x90, 0xff}, {0x4e, 0xaa, 0x90, 0xff},
{0x00, 0x00, 0x00, 0x00}, {0x00, 0x00, 0x00, 0x00}, {0x4e, 0xaa, 0x90,
0xff} obtained
        {0x00, 0x00, 0x00, 0x00}, {0x4e, 0xa9, 0x8f, 0xff}, {0x4e,
0xa9, 0x8f, 0xff}, {0x00, 0x00, 0x00, 0x00}, {0x4e, 0xa9, 0x8f, 0xff},
{0x29, 0xff, 0xff, 0xff}, {0x00, 0x00, 0x00, 0x00}, {0x4e, 0xa9, 0x8f,
0xff}, {0x73, 0x54, 0x21, 0xff}, {0x00, 0x00, 0x00, 0x00}, {0x00,
0x00, 0x00, 0x00}, {0x4e, 0xa9, 0x8f, 0xff}, {0x4e, 0xa9, 0x8f, 0xff},
{0x00, 0x00, 0x00, 0x00}, {0x00, 0x00, 0x00, 0x00}, {0x4e, 0xa9, 0x8f,
0xff} expected

Testing util_format_dxt3_rgba_unpack_8unorm ...
FAILED: {0x6d, 0xc6, 0x96, 0x77}, {0x6d, 0xc6, 0x96, 0xee}, {0x6d,
0xc6, 0x96, 0xaa}, {0x8c, 0xff, 0xb5, 0x44}, {0x6d, 0xc6, 0x96, 0xff},
{0x6d, 0xc6, 0x96, 0x88}, {0x31, 0x55, 0x5a, 0x66}, {0x6d, 0xc6, 0x96,
0x99}, {0x31, 0x55, 0x5a, 0xbb}, {0x31, 0x55, 0x5a, 0x55}, {0x31,
0x55, 0x5a, 0x11}, {0x6d, 0xc6, 0x96, 0xcc}, {0x6d, 0xc6, 0x96, 0xcc},
{0x6d, 0xc6, 0x96, 0x11}, {0x31, 0x55, 0x5a, 0x44}, {0x31, 0x55, 0x5a,
0x88} obtained
        {0x6c, 0xc6, 0x96, 0x77}, {0x6c, 0xc6, 0x96, 0xee}, {0x6c,
0xc6, 0x96, 0xa9}, {0x8c, 0xff, 0xb5, 0x43}, {0x6c, 0xc6, 0x96, 0xff},
{0x6c, 0xc6, 0x96, 0x87}, {0x31, 0x54, 0x5a, 0x66}, {0x6c, 0xc6, 0x96,
0x98}, {0x31, 0x54, 0x5a, 0xba}, {0x31, 0x54, 0x5a, 0x54}, {0x31,
0x54, 0x5a, 0x10}, {0x6c, 0xc6, 0x96, 0xcc}, {0x6c, 0xc6, 0x96, 0xcc},
{0x6c, 0xc6, 0x96, 0x10}, {0x31, 0x54, 0x5a, 0x43}, {0x31, 0x54, 0x5a,
0x87} expected

Testing util_format_dxt5_rgba_unpack_8unorm ...
FAILED: {0x6d, 0xc6, 0x96, 0x74}, {0x6d, 0xc6, 0x96, 0xf8}, {0x6d,
0xc6, 0x96, 0xb6}, {0x8c, 0xff, 0xb5, 0x53}, {0x6d, 0xc6, 0x96, 0xf8},
{0x6d, 0xc6, 0x96, 0x95}, {0x31, 0x55, 0x5a, 0x53}, {0x6d, 0xc6, 0x96,
0x95}, {0x31, 0x55, 0x5a, 0xb6}, {0x31, 0x55, 0x5a, 0x53}, {0x31,
0x55, 0x5a, 0x11}, {0x6d, 0xc6, 0x96, 0xd7}, {0x6d, 0xc6, 0x96, 0xb6},
{0x6d, 0xc6, 0x96, 0x11}, {0x31, 0x55, 0x5a, 0x32}, {0x31, 0x55, 0x5a,
0x95} obtained
        {0x6c, 0xc6, 0x96, 0x73}, {0x6c, 0xc6, 0x96, 0xf7}, {0x6c,
0xc6, 0x96, 0xb6}, {0x8c, 0xff, 0xb5, 0x53}, {0x6c, 0xc6, 0x96, 0xf7},
{0x6c, 0xc6, 0x96, 0x95}, {0x31, 0x54, 0x5a, 0x53}, {0x6c, 0xc6, 0x96,
0x95}, {0x31, 0x54, 0x5a, 0xb6}, {0x31, 0x54, 0x5a, 0x53}, {0x31,
0x54, 0x5a, 0x10}, {0x6c, 0xc6, 0x96, 0xd7}, {0x6c, 0xc6, 0x96, 0xb6},
{0x6c, 0xc6, 0x96, 0x10}, {0x31, 0x54, 0x5a, 0x31}, {0x31, 0x54, 0x5a,
0x95} expected

Compiling libtxc_dxtn with -O0 or with -march=core2 -msse2
-mfpmath=sse did not make them work.

As you can see the tests seem mostly off-by-one, which makes me think
of an approximation problem.

libtxc_dxtn seems to take 8-bit input instead of floating point input,
so it seems to be inherently hard to get it to roundtrip sensibly.

Since only integer-coordinate points can be used, they are unlikely to
be exactly on a line unless specifically crafted to be so.

Thus, a possible solution could be to actually pick a starting color,
pick an increment, and generate an exact line by adding multiples of
that increment to the starting color.
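
A rough sketch of that idea in C, under stated assumptions: make_exact_line is a hypothetical name, colors are 8-bit RGB triples, and the caller picks a start and increment that stay within 0..255. It only addresses collinearity; DXT endpoints are stored as 5:6:5, so the endpoints may still be quantized by the compressor.

```c
#include <assert.h>
#include <stdint.h>

/* Fill out[0..3] with start, start + step, start + 2*step and
 * start + 3*step per channel, so that the four colors lie exactly on
 * one line in RGB space. Hypothetical helper, not existing Mesa code. */
void make_exact_line(const uint8_t start[3], const int step[3],
                     uint8_t out[4][3])
{
    int i, c;

    for (i = 0; i < 4; i++) {
        for (c = 0; c < 3; c++) {
            int v = start[c] + i * step[c];
            assert(v >= 0 && v <= 255); /* caller must avoid wraparound */
            out[i][c] = (uint8_t)v;
        }
    }
}
```

With a red start of 33 and a red increment of 60, the four red values come out as 33, 93, 153 and 213.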

Re: [Mesa3d-dev] gallium + dri2 front buffer readback

From: Chia-I Wu <ol...@gm...> - 2010-04-03 14:19:22

On Sat, Apr 3, 2010 at 3:11 PM, Dave Airlie <ai...@gm...> wrote:
> The piglit read-front.c test is failing and the rabbits warren that is
> front buffer rendering in mesa st + dri st isn't helping me solve it.
> One thing I noticed was check_create_front_buffers is called in a
> number of places in the st, however it seems to never be used, as we
> call st_manager_add_color_renderbuffer moments before and that sets up
> the buffer.
> so
>  if (fb->Attachment[frontIndex].Renderbuffer == NULL) {
> this always fails and we never do any of that stuff.
> Maybe someone has a clue on how this is meant to work and I can implement that.
DRI drivers use st_manager_add_color_renderbuffer path.
check_create_front_buffers is no-op for them.  The latter is used by st/wgl,
which still uses st_public.h.

i915g passes the read-front test on my 945GM laptop.  The failure could be that
some states are not correctly invalidated in st_manager_add_color_renderbuffer
and r300g (I assume this is your platform) could not reflect the change.

-- 
ol...@Lu...

Re: [Mesa3d-dev] How do we init half float tables?

From: Jose F. <jfo...@vm...> - 2010-04-03 10:15:48

Thanks Luca.

Concerning u_format_test.c, I'm not sure whether the problem is lossiness or ambiguity in the format or a bug in the compressor, but there was logic in u_format_test.c to skip the DXT1_RGBA packing, and all other tests were passing. Lossiness by itself doesn't explain the test failure, because we're feeding the compressor the RGBA data that resulted from decompressing. Given that DXT compression works by interpolating colors along a line segment in RGB color space, when the decompressed output is fed back to the compressor it should quickly find the line, as all color points lie exactly on it. "Exactly" is too strong a word, as there is rounding, which could be at the root of the differences.
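
To illustrate how rounding alone can produce off-by-one differences, here is a small sketch in C. The two intermediate colors of a four-color DXT1 block sit at one third and two thirds between the endpoints; which rounding rule a given decoder or compressor applies is an assumption here, not something confirmed in this thread, and both function names are made up.

```c
#include <stdint.h>

/* Value one third of the way from a to b, i.e. (2*a + b) / 3,
 * rounded down. */
uint8_t lerp_third_trunc(uint8_t a, uint8_t b)
{
    return (uint8_t)((2 * a + b) / 3);
}

/* Same interpolation, rounded to the nearest integer. */
uint8_t lerp_third_round(uint8_t a, uint8_t b)
{
    return (uint8_t)((2 * a + b + 1) / 3);
}
```

For endpoints 214 and 33 the two variants give 153 and 154 respectively, while for 213 and 33 both give 153, because (2*213 + 33) is divisible by 3.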

Jose
________________________________________
From: luc...@gm... [luc...@gm...] On Behalf Of Luca Barbieri [lu...@lu...]
Sent: Saturday, April 03, 2010 1:48
To: Jose Fonseca
Cc: mes...@li...
Subject: Re: [Mesa3d-dev] How do we init half float tables?

The s3tc-teximage test seems fixed by the two line change I put in
gallium-util-format-is-supported.

s3tc-texsubimage prints:
Mesa: User error: GL_INVALID_VALUE in glTexSubImage2D(xoffset+width)
Probe at (285,12)
  Expected: 1.000000 0.000000 0.000000
  Observed: 0.000000 0.000000 0.000000

which seems to be due to a Mesa or testcase bug.

As for u_format_test.c, it looks like it simply fails to account for
DXTn being lossy.

Re: [Mesa3d-dev] RFC: GSOC Proposal: R300/Gallium GLSL Compiler

From: Corbin S. <mos...@gm...> - 2010-04-03 09:23:36

On Sat, Apr 3, 2010 at 3:31 PM, Tom Stellard <tst...@gm...> wrote:
> Hi,
>
> I have completed a first draft of my Google Summer of Code
> proposal, and I would appreciate feedback from some of the
> Mesa developers.  I have included the project plan from my
> proposal in this email, and you can also view my full proposal here:
> http://socghop.appspot.com/gsoc/student_proposal/show/google/gsoc2010/tstellar/t126997450856
> However, I think you will need a google login to view it.
>
> Project Tasks:
>
> 1. Enable branch emulation for Gallium drivers:
> The goal of this task will be to create an optional "optimization" pass
> over the TGSI code to translate branch instructions into instructions
> that are supported by cards without hardware branching.  The basic
> strategy for doing this translation will be:
>
> A. Copy values of in scope variables
> to a temporary location before executing the conditional statement.
>
> B. Execute the "if true" branch.
>
> C. Test the conditional expression.  If it evaluates to false, rollback
> all values that were modified in the "if true" branch.
>
> D. Repeat step B with the "if false" branch, and then step C, but this
> time only rollback if the conditional expression evaluates to true.
>
> The TGSI instructions SLT, SNE, SGE, SEQ will be used to test the
> conditional expression and the instruction CND will be used to rollback
> the values.
>
> There will be two phases to this task.  For phase 1, I will implement a
> simple translator that will be able to translate the branch instructions
> with only one pass through the TGSI code.  This simple translator will
> copy all in scope variables to a temporary location before executing the
> conditional statement, even if those variables will not be modified
> in either of the branches.
>
> Phase 2 will add a preliminary pass before the code translation
> pass that will mark variables that might be modified by the conditional
> statement.  Then, during the translation pass, only the variables that
> could potentially be modified inside either of the conditional branches
> will be copied before the conditional statement is executed.
>
> 2. Unroll loops for Gallium drivers:
> The goal of this task will be to unroll loops so that they can be
> executed by hardware that does not support them.  The loop unrolling
> will be done in the same "optimization" pass as the branch emulation.
> Loops where the number of iterations is known at compile time will be
> unrolled and may have additional optimizations applied.  Loops that
> have an unknown number of iterations, will have to be studied to see
> if there is a way to replace the loop with a set of instructions that
> produces the same output as the loop.  For example, one solution might
> be to replace an ADD(src0, src0) instruction that is supposed to execute
> n times with a MUL(src0, n). It is possible that not all loops will be
> able to be unrolled successfully.
>
> These first two tasks are important not only for older cards that do not
> support hardware branching, but newer cards as well.  Driver developers
> will not need to use every hardware instruction to compile shaders
> with branches and loops, so they could use the branch emulation as a
> temporary solution while hardware support for branching and loops is
> being worked on.
>
> 3. Loops and Conditionals for R500 fragment and vertex shaders:
> The goal of this task will be to make use of the R500 hardware support for
> branches and loops.  New radeon_compiler opcodes (RC_OPCODE_*) will need
> to be added to represent loops, and the corresponding TGSI instructions
> will need to be converted into these new opcodes during the TGSI_OPCODE_*
> to RC_OPCODE_* phase.  Once this has been done, the code generator for
> R500 vertex and fragment shaders will need to be modified to output the
> correct hardware instructions for loops.
>
> 4. More compiler optimizations / other GLSL features:
> This is an optional task that will allow me to revisit the work from the
> previous tasks and explore doing some optimizations I may have wanted to
> do, but were outside the scope of those tasks.  If there are no obvious
> optimizations to be done, this time could be spent implementing some
> other GLSL features for the R300 driver, possible ideas include:
>
> Adding support for the gl_FrontFacing variable.
> Handling varying modifiers like perspective, flat, and centroid.
> Improving the GLSL frontend to add support for more language features.
>
> Schedule / Deliverables:
> 1. Enable branch emulation for Gallium drivers (4 weeks)
> 2. Unroll loops for Gallium drivers (2 - 3 weeks)
> Midterm Evaluation
> 3. Loops and Conditionals for R500 fragment and vertex shaders (4 weeks)
> 4. More compiler optimizations / other GLSL features (2 weeks)
>
> Tasks 1-3 will be required for this project.
> Task 4 is optional.
>
> Thank you.

Wow! Looks like you're certainly on the right track and you've been
doing your research.

I would say that the first two items on your list would be fine as a
complete project. TGSI streams are tricky to modify, and you may find
that you have to write more and more TGSI-specific code as you dig in.
(For example, there are no helpers for strength reduction in TGSI
yet.)

I'll wait for everybody else to chime in, but it looks good so far.

~ C.

-- 
When the facts change, I change my mind. What do you do, sir? ~ Keynes

Corbin Simpson
<Mos...@gm...>

[Mesa3d-dev] [Bug 26666] In GTA VC objects are drawn in incorrect Z order

From: <bug...@fr...> - 2010-04-03 08:22:23

https://bugs.freedesktop.org/show_bug.cgi?id=26666

Ruslan <b7....@gm...> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

--- Comment #1 from Ruslan <b7....@gm...> 2010-04-03 01:22:15 PDT ---
Fixed in 7.7.1

-- 
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Mesa3d-dev] RFC: GSOC Proposal: R300/Gallium GLSL Compiler

From: Tom S. <tst...@gm...> - 2010-04-03 07:33:33

Hi,

I have completed a first draft of my Google Summer of Code
proposal, and I would appreciate feedback from some of the
Mesa developers. I have included the project plan from my
proposal in this email, and you can also view my full proposal here:
http://socghop.appspot.com/gsoc/student_proposal/show/google/gsoc2010/tstellar/t126997450856
However, I think you will need a google login to view it.

Project Tasks:

1. Enable branch emulation for Gallium drivers:
The goal of this task will be to create an optional "optimization" pass
over the TGSI code to translate branch instructions into instructions
that are supported by cards without hardware branching. The basic
strategy for doing this translation will be:

A. Copy values of in scope variables
to a temporary location before executing the conditional statement.

B. Execute the "if true" branch.

C. Test the conditional expression. If it evaluates to false, rollback
all values that were modified in the "if true" branch.

D. Repeat step B with the "if false" branch, and then step C, but this
time only rollback if the conditional expression evaluates to true.

The TGSI instructions SLT, SNE, SGE, SEQ will be used to test the
conditional expression and the instruction CND will be used to rollback
the values.
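
Steps A to D can be sketched on scalar floats in C. This is only an illustration, not TGSI code: sketch_slt and sketch_cnd are stand-ins modeled on my understanding of SLT (1.0 if src0 < src1, else 0.0) and CND (src0 if src2 > 0.5, else src1), and the example branch "if (a < b) x = x + 1; else x = x * 2;" is invented. The second save before the false branch is my reading of "repeat", not something spelled out above.

```c
/* Stand-ins for the TGSI instructions, on scalars (assumed semantics). */
float sketch_slt(float a, float b)
{
    return a < b ? 1.0f : 0.0f;
}

float sketch_cnd(float src0, float src1, float src2)
{
    return src2 > 0.5f ? src0 : src1;
}

/* Reference: the branch as the shader author wrote it. */
float reference_branch(float a, float b, float x)
{
    if (a < b)
        x = x + 1.0f;
    else
        x = x * 2.0f;
    return x;
}

/* Branch-free version following steps A to D. */
float emulated_branch(float a, float b, float x)
{
    float cond = sketch_slt(a, b);
    float saved;

    saved = x;                       /* A: copy the live value          */
    x = x + 1.0f;                    /* B: run the "if true" branch     */
    x = sketch_cnd(x, saved, cond);  /* C: roll back if cond is false   */

    saved = x;                       /* A again, before the other side  */
    x = x * 2.0f;                    /* D: run the "if false" branch    */
    x = sketch_cnd(saved, x, cond);  /*    roll back if cond is true    */
    return x;
}
```

Both sides of the branch always execute; only the selects decide which result survives, which is why this works on hardware without branching.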

There will be two phases to this task. For phase 1, I will implement a
simple translator that will be able to translate the branch instructions
with only one pass through the TGSI code. This simple translator will
copy all in scope variables to a temporary location before executing the
conditional statement, even if those variables will not be modified
in either of the branches.

Phase 2 will add a preliminary pass before the code translation
pass that will mark variables that might be modified by the conditional
statement. Then, during the translation pass, only the variables that
could potentially be modified inside either of the conditional branches
will be copied before the conditional statement is executed.
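
The marking pass in phase 2 could look roughly like the following C sketch. The instruction struct is a deliberately simplified, hypothetical stand-in (real TGSI instructions carry far more state), and the limit of 32 temporaries is an assumption made so that a single bitmask suffices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified instruction: only the temporary it writes
 * matters for this sketch. */
struct sketch_insn {
    unsigned dst_temp; /* index of the temporary written, 0..31 */
};

/* Return a bitmask with bit i set when temporary i is written by any
 * instruction in insns[first] .. insns[last - 1]. */
uint32_t mark_written_temps(const struct sketch_insn *insns,
                            size_t first, size_t last)
{
    uint32_t mask = 0;
    size_t i;

    for (i = first; i < last; i++) {
        assert(insns[i].dst_temp < 32);
        mask |= (uint32_t)1 << insns[i].dst_temp;
    }
    return mask;
}
```

The translation pass would then copy only the temporaries whose bit is set for the instruction range covered by the conditional.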

2. Unroll loops for Gallium drivers:
The goal of this task will be to unroll loops so that they can be
executed by hardware that does not support them. The loop unrolling
will be done in the same "optimization" pass as the branch emulation.
Loops where the number of iterations is known at compile time will be
unrolled and may have additional optimizations applied. Loops that
have an unknown number of iterations, will have to be studied to see
if there is a way to replace the loop with a set of instructions that
produces the same output as the loop. For example, one solution might
be to replace an ADD(src0, src0) instruction that is supposed to execute
n times with a MUL(src0, n). It is possible that not all loops will be
able to be unrolled successfully.
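
As a hedged illustration of the ADD-to-MUL idea, assume the loop body adds a loop-invariant value x to a running total n times. (If the instruction instead doubled its own result on every iteration, the closed form would be a multiply by 2^n rather than by n.) Both function names below are invented for this sketch.

```c
/* Reference: the loop as written, adding x to acc n times. */
float accumulate_loop(float acc, float x, unsigned n)
{
    unsigned i;

    for (i = 0; i < n; i++)
        acc = acc + x;
    return acc;
}

/* Loop-free replacement: one multiply and one add. */
float accumulate_closed_form(float acc, float x, unsigned n)
{
    return acc + x * (float)n;
}
```

The two agree exactly for small values such as acc = 1, x = 3, n = 5, but in general floating point rounding can make the results differ slightly, which is one more thing such a pass would have to consider.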

These first two tasks are important not only for older cards that do not
support hardware branching, but newer cards as well. Driver developers
will not need to use every hardware instruction to compile shaders
with branches and loops, so they could use the branch emulation as a
temporary solution while hardware support for branching and loops is
being worked on.

3. Loops and Conditionals for R500 fragment and vertex shaders:
The goal of this task will be to make use of the R500 hardware support for
branches and loops. New radeon_compiler opcodes (RC_OPCODE_*) will need
to be added to represent loops, and the corresponding TGSI instructions
will need to be converted into these new opcodes during the TGSI_OPCODE_*
to RC_OPCODE_* phase. Once this has been done, the code generator for
R500 vertex and fragment shaders will need to be modified to output the
correct hardware instructions for loops.

4. More compiler optimizations / other GLSL features:
This is an optional task that will allow me to revisit the work from the
previous tasks and explore doing some optimizations I may have wanted to
do, but were outside the scope of those tasks. If there are no obvious
optimizations to be done, this time could be spent implementing some
other GLSL features for the R300 driver, possible ideas include:

Adding support for the gl_FrontFacing variable.
Handling varying modifiers like perspective, flat, and centroid.
Improving the GLSL frontend to add support for more language features.

Schedule / Deliverables:
1. Enable branch emulation for Gallium drivers (4 weeks)
2. Unroll loops for Gallium drivers (2 - 3 weeks)
Midterm Evaluation
3. Loops and Conditionals for R500 fragment and vertex shaders (4 weeks)
4. More compiler optimizations / other GLSL features (2 weeks)

Tasks 1-3 will be required for this project.
Task 4 is optional.

Thank you.

-Tom Stellard

[Mesa3d-dev] gallium + dri2 front buffer readback

From: Dave A. <ai...@gm...> - 2010-04-03 07:11:16

The piglit read-front.c test is failing and the rabbit warren that is
front buffer rendering in mesa st + dri st isn't helping me solve it.

One thing I noticed was check_create_front_buffers is called in a
number of places in the st, however it seems to never be used, as we
call st_manager_add_color_renderbuffer moments before and that sets up
the buffer.

so
 if (fb->Attachment[frontIndex].Renderbuffer == NULL) {

this always fails and we never do any of that stuff.
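
To make the ordering problem concrete, here is a stripped-down model in C. None of these types or functions are the real Mesa ones; they are stand-ins that only mimic "attach first, then check for NULL".

```c
#include <stddef.h>

/* Simplified stand-in for a framebuffer attachment (not the real struct). */
struct sketch_attachment {
    void *Renderbuffer;
};

static int sketch_dummy_buffer; /* placeholder object to point at */

/* Stand-in for the earlier setup path: always attaches a buffer. */
void sketch_add_color_renderbuffer(struct sketch_attachment *att)
{
    att->Renderbuffer = &sketch_dummy_buffer;
}

/* Stand-in for the later check: returns 1 only when it had to create
 * the buffer itself, 0 when one was already attached. */
int sketch_check_create_front_buffer(struct sketch_attachment *att)
{
    if (att->Renderbuffer == NULL) {
        att->Renderbuffer = &sketch_dummy_buffer;
        return 1;
    }
    return 0;
}
```

If the attach call always runs first, the check returns 0 every time, so whatever work sits inside the NULL branch is never done.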

Maybe someone has a clue on how this is meant to work and I can implement that.

Dave.

Re: [Mesa3d-dev] gallium-resources branch merge

From: Marek O. <ma...@gm...> - 2010-04-03 06:23:13

There's something fishy in u_upload_mgr, could you please review the first
two patches here?
http://cgit.freedesktop.org/~mareko/mesa/log/?h=gallium-resources

With this, r300g works again.

-Marek

On Fri, Apr 2, 2010 at 4:17 PM, Roland Scheidegger <sr...@vm...> wrote:

> I'm planning on merging the gallium-resources branch shortly (after
> easter). Due to the amount of code changed, it wouldn't be unexpected if
> some drivers break here and there. So it would be nice if the respective
> driver authors could take a look at that branch now.
>
> If you've missed the discussion about this branch and what this is
> about, here it is:
>
> http://www.mail-archive.com/mes...@li.../msg12726.html
>
> I've also removed the video interfaces completely, as they weren't
> ported to the interface changes and actually some of the video code
> missed some earlier interface changes so didn't build anyway. Video
> related work should be done on pipe-video branch which had newer stuff
> (for video) already.
>
>
> Roland
>
>

[Mesa3d-dev] ARB draw buffers + texenv program

From: Dave A. <ai...@gm...> - 2010-04-03 03:43:12

Attachments: 0001-texenvprogram-fix-for-ARB_draw_buffers.patch

Just going down the r300g piglit failures and noticed fbo-drawbuffers
failed. I've no idea if this passes on Intel hw, but it appears the
texenvprogram really needs to understand the draw buffers. The attached
patch fixes it here for me on r300g. Anyone want to test this on Intel
with the piglit test before/after?

Dave.

Re: [Mesa3d-dev] How do we init half float tables?

From: Luca B. <lu...@lu...> - 2010-04-03 00:48:15

The s3tc-teximage test seems fixed by the two line change I put in
gallium-util-format-is-supported.

s3tc-texsubimage prints:
Mesa: User error: GL_INVALID_VALUE in glTexSubImage2D(xoffset+width)
Probe at (285,12)
  Expected: 1.000000 0.000000 0.000000
  Observed: 0.000000 0.000000 0.000000

which seems to be due to a Mesa or testcase bug.

As for u_format_test.c, it looks like it simply fails to account for
DXTn being lossy.

Re: [Mesa3d-dev] How do we init half float tables?

From: Jose F. <jfo...@vm...> - 2010-04-03 00:28:42

OK, I can relate to your reasoning. It's no biggie.

Jose

________________________________________
From: luc...@gm... [luc...@gm...] On Behalf Of Luca Barbieri [lu...@lu...]
Sent: Saturday, April 03, 2010 1:23
To: Jose Fonseca
Cc: mes...@li...
Subject: Re: [Mesa3d-dev] How do we init half float tables?

> One more thing: I'm maintaining the u_format* modules. I'm not speaking just of the long term, but in the sense that I'm actually working on this as we speak.  Please do not make this kind of deep-reaching change to the u_format stuff in master without clearing it with me first.

Yes sorry, it was an attempt to fix breakage originally caused by code
of mine that was sent out in a non-fully-mergeable state (to prevent
duplicate work on half float conversion) and got merged anyway.

Since master was already broken (due to u_gctors.cpp not being picked
up by ld), it seemed a good idea to try to fix it.

Unfortunately what seemed to be an easy fix gradually became something
much more invasive than originally envisioned.

After realizing the util_format_init thing wouldn't work out, I should
have made these call util_format_s3tc_init again (was changed so they
would init util_half as well) and then sent the util_format changes
for review.

I added a gallium-util-format-is-supported branch to hold the work and
the fix I just sent.
Sorry for not doing that in the first place.

Re: [Mesa3d-dev] How do we init half float tables?

From: Jose F. <jfo...@vm...> - 2010-04-03 00:26:11

Probably the problems are just as you describe. But I'll be offline soon so I'll only review this and all your other changes carefully another day.

Jose

________________________________________
From: luc...@gm... [luc...@gm...] On Behalf Of Luca Barbieri [lu...@lu...]
Sent: Saturday, April 03, 2010 1:08
To: Jose Fonseca
Cc: Brian Paul; mes...@li...
Subject: Re: [Mesa3d-dev] How do we init half float tables?

Sorry for the regression.
This whole thing was done to fix the u_gctors.cpp issue, originally
done by me, sent out without full testing since I saw duplicate work
being done, and then merged by Roland if I recall correctly.
I probably should not have fixed s3tc/util_format like it was done for
u_half and instead put it in a branch and sent it to the ML first.

Note that everything that reads pixels and does not call
util_format_s3tc_init (e.g. I think rbug tools) needs something like
this, or an explicit call which is likely to be forgotten (even
finding out everything that ends up calling util_format is
nontrivial).

Anyway, this patch fixes a couple of bugs that may have caused the regression.

How can I reproduce it locally?

The DXTn unit tests do fail, but the values have usually a difference
of 1, so I assume it's an approximation error.

commit 80214ef6265d406496dc4fd3c76d8ac782cd012b
Author: Luca Barbieri <lu...@lu...>
Date:   Sat Apr 3 01:55:27 2010 +0200

    gallium/util: fix inverted if is_nop logic in s3tc

diff --git a/src/gallium/auxiliary/util/u_format_s3tc.c b/src/gallium/auxiliary/util/u_format_s3tc.c
index d48551f..7808210 100644
--- a/src/gallium/auxiliary/util/u_format_s3tc.c
+++ b/src/gallium/auxiliary/util/u_format_s3tc.c
@@ -303,7 +303,7 @@ util_format_dxt3_rgba_unpack_8unorm(uint8_t *dst_row, unsigned dst_stride, const
 void
 util_format_dxt5_rgba_unpack_8unorm(uint8_t *dst_row, unsigned dst_stride, const uint8_t *src_row, unsigned src_stride, unsigned width, unsigned height)
 {
-   if (is_nop(util_format_dxt5_rgba_fetch)) {
+   if (!is_nop(util_format_dxt5_rgba_fetch)) {
       unsigned x, y, i, j;
       for(y = 0; y < height; y += 4) {
          const uint8_t *src = src_row;
@@ -324,7 +324,7 @@ util_format_dxt5_rgba_unpack_8unorm(uint8_t *dst_row, unsigned dst_stride, const
 void
 util_format_dxt1_rgb_unpack_float(float *dst_row, unsigned dst_stride, const uint8_t *src_row, unsigned src_stride, unsigned width, unsigned height)
 {
-   if (is_nop(util_format_dxt1_rgb_fetch)) {
+   if (!is_nop(util_format_dxt1_rgb_fetch)) {
       unsigned x, y, i, j;
       for(y = 0; y < height; y += 4) {
          const uint8_t *src = src_row;

Re: [Mesa3d-dev] How do we init half float tables?

From: Luca B. <lu...@lu...> - 2010-04-03 00:23:31

> One more thing: I'm maintaining the u_format* modules. I'm not speaking just of the long term, but in the sense that I'm actually working on this as we speak.  Please do not make this kind of deep-reaching change to the u_format stuff in master without clearing it with me first.

Yes sorry, it was an attempt to fix breakage originally caused by code
of mine that was sent out in a non-fully-mergeable state (to prevent
duplicate work on half float conversion) and got merged anyway.

Since master was already broken (due to u_gctors.cpp not being picked
up by ld), it seemed a good idea to try to fix it.

Unfortunately what seemed to be an easy fix gradually became something
much more invasive than originally envisioned.

After realizing the util_format_init thing wouldn't work out, I should
have made these call util_format_s3tc_init again (was changed so they
would init util_half as well) and then sent the util_format changes
for review.

I added a gallium-util-format-is-supported branch to hold the work and
the fix I just sent.
Sorry for not doing that in the first place.

Re: [Mesa3d-dev] How do we init half float tables?

From: Jose F. <jfo...@vm...> - 2010-04-03 00:23:08

Both ways are useful: single pixel decompression for texture sampling, whole block for whole image conversions.
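
For the single-pixel path, locating the right block is simple integer arithmetic. The sketch below is an illustration in C with invented function names, assuming 4x4 blocks of 8 bytes (DXT1) or 16 bytes (DXT3/DXT5) and a stride given in bytes per row of blocks.

```c
#include <stddef.h>

/* Byte offset of the 4x4 block that contains pixel (x, y).
 * block_bytes is 8 for DXT1 and 16 for DXT3/DXT5; src_stride is the
 * number of bytes per row of blocks. */
size_t dxt_block_offset(unsigned x, unsigned y,
                        size_t src_stride, size_t block_bytes)
{
    return (size_t)(y / 4) * src_stride + (size_t)(x / 4) * block_bytes;
}

/* Index of pixel (x, y) inside its block, 0..15, in row-major order. */
unsigned dxt_texel_index(unsigned x, unsigned y)
{
    return (y % 4) * 4 + (x % 4);
}
```

A whole-image conversion would instead walk the blocks once and decode all 16 texels of each, which avoids repeating the per-block setup for every pixel.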

Jose
________________________________________
From: Roland Scheidegger [sr...@vm...]
Sent: Friday, April 02, 2010 17:27
To: Luca Barbieri
Cc: Jose Fonseca; mes...@li...
Subject: Re: [Mesa3d-dev] How do we init half float tables?

On 02.04.2010 17:09, Luca Barbieri wrote:
> Additionally, the S3TC library may now support only a subset of the
> formats. This may be even more useful as further compressed formats
> are added.

FWIW, I don't see any new s3tc formats. rgtc will not be handled by s3tc
library since it isn't patent encumbered. util_format_is_s3tc will not
include rgtc formats.
(Though I guess that external decoding per-pixel is really rather lame,
should do it per-block...)

Roland

Re: [Mesa3d-dev] How do we init half float tables?

From: Jose F. <jfo...@vm...> - 2010-04-03 00:12:43

u_format_test started failing, and it was not failing one day ago. Vinson reported that some texture compression tests that had just started working with my recent changes began failing again.

I'm not sure if it's the constructor mechanism, my platform (64-bit), or some bug in the code. I just reverted all your recent util format changes. Not all of them look bad, but I just don't have the time to separate the baby from the bathwater. Sorry. I'll cherry-pick some of them once I have more time to review and test them.

One more thing: I'm maintaining the u_format* modules. I'm not speaking just of the long term, but in the sense that I'm actually working on this as we speak.  Please do not make this kind of deep-reaching change to the u_format stuff in master without clearing it with me first. Either:
- send me an email and get my buy-in before implementing
- send a patch of the implementation changes so that I can review
- implement in a feature branch
- or, if you think I'm unreasonable, just make a fork of the whole thing and do whatever you like without breaking the existing code that relies on it.

The master branch should be broken as little as possible, as there is a lot of automated/manual testing going on that depends upon it. And going over and modifying code I just committed hinders my progress.

Jose

________________________________________
From: luc...@gm... [luc...@gm...] On Behalf Of Luca Barbieri [lu...@lu...]
Sent: Saturday, April 03, 2010 0:50
To: Jose Fonseca
Cc: Brian Paul; mes...@li...
Subject: Re: [Mesa3d-dev] How do we init half float tables?

What are you seeing a regression on?
texcompress and texcompsub seemed to work for me: I'll try to test
something else and recheck the code.

Re: [Mesa3d-dev] How do we init half float tables?

From: Luca B. <lu...@lu...> - 2010-04-03 00:08:47

Sorry for the regression.
This whole thing was done to fix the u_gctors.cpp issue, originally
done by me, sent out without full testing since I saw duplicate work
being done, and then merged by Roland if I recall correctly.
I probably should not have fixed s3tc/util_format like it was done for
u_half and instead put it in a branch and sent it to the ML first.

Note that everything that reads pixels and does not call
util_format_s3tc_init (e.g. I think rbug tools) needs something like
this, or an explicit call which is likely to be forgotten (even
finding out everything that ends up calling util_format is
nontrivial).
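
One way to read "something like this" is a lazy, self-initializing lookup, where every entry point makes sure the tables exist before touching them. The C sketch below is generic and hypothetical (none of the names are Mesa functions, and the table contents are placeholders); it is also not thread-safe as written.

```c
static int sketch_tables_ready;        /* 0 until the table is built    */
static int sketch_init_count;          /* counts real initializations   */
static unsigned char sketch_table[256];

/* Build the table once; the contents here are just a placeholder. */
static void sketch_build_tables(void)
{
    int i;

    for (i = 0; i < 256; i++)
        sketch_table[i] = (unsigned char)(255 - i);
    sketch_init_count++;
}

/* Every entry point that reads the table goes through this guard, so
 * no caller has to remember an explicit init call. */
unsigned char sketch_lookup(unsigned char v)
{
    if (!sketch_tables_ready) {
        sketch_build_tables();
        sketch_tables_ready = 1;
    }
    return sketch_table[v];
}

/* Exposed only so the behaviour can be observed in a test. */
int sketch_get_init_count(void)
{
    return sketch_init_count;
}
```

The cost is one extra test per call, which is the trade-off against an explicit init call that is easy to forget.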

Anyway, this patch fixes a couple of bugs that may have caused the regression.

How can I reproduce it locally?

The DXTn unit tests do fail, but the values have usually a difference
of 1, so I assume it's an approximation error.

commit 80214ef6265d406496dc4fd3c76d8ac782cd012b
Author: Luca Barbieri <lu...@lu...>
Date:   Sat Apr 3 01:55:27 2010 +0200

    gallium/util: fix inverted if is_nop logic in s3tc

diff --git a/src/gallium/auxiliary/util/u_format_s3tc.c b/src/gallium/auxiliary/util/u_format_s3tc.c
index d48551f..7808210 100644
--- a/src/gallium/auxiliary/util/u_format_s3tc.c
+++ b/src/gallium/auxiliary/util/u_format_s3tc.c
@@ -303,7 +303,7 @@ util_format_dxt3_rgba_unpack_8unorm(uint8_t *dst_row, unsigned dst_stride, const
 void
 util_format_dxt5_rgba_unpack_8unorm(uint8_t *dst_row, unsigned dst_stride, const uint8_t *src_row, unsigned src_stride, unsigned width, unsigned height)
 {
-   if (is_nop(util_format_dxt5_rgba_fetch)) {
+   if (!is_nop(util_format_dxt5_rgba_fetch)) {
       unsigned x, y, i, j;
       for(y = 0; y < height; y += 4) {
          const uint8_t *src = src_row;
@@ -324,7 +324,7 @@ util_format_dxt5_rgba_unpack_8unorm(uint8_t *dst_row, unsigned dst_stride, const
 void
 util_format_dxt1_rgb_unpack_float(float *dst_row, unsigned dst_stride, const uint8_t *src_row, unsigned src_stride, unsigned width, unsigned height)
 {
-   if (is_nop(util_format_dxt1_rgb_fetch)) {
+   if (!is_nop(util_format_dxt1_rgb_fetch)) {
       unsigned x, y, i, j;
       for(y = 0; y < height; y += 4) {
          const uint8_t *src = src_row;

102 messages have been excluded from this view by a project administrator.
