From: tom f. <tf...@al...> - 2010-04-03 19:34:48
|
Vinson Lee <vl...@vm...> writes:
> Leopard uses gcc-4.0, which didn't have built-in support for atomic
> variables.

u_atomic.h should probably check for a supported compiler; Jeremy, does the attached patch produce an understandable error instead of a link error?

In terms of a solution, Jeremy, you could implement PPC assembly for the few primitives available there. Looks easy for someone who knows PPC well. There's a comment in that file that mentions a mutex-based implementation... but then I don't see one.

Looks like there's some dri radeon code which is using the gcc primitives directly instead of through the gallium wrapper. I'm not familiar enough w/ it to know if that's correct or not (.. anyway, you're probably not building dri/radeon on OS X, right?).

-tom

> ________________________________________
> From: Jeremy Huddleston [jer...@fr...]
> Sent: Saturday, April 03, 2010 11:22 AM
> To: mes...@li...
> Subject: [Mesa3d-dev] gallium failing to build on darwin/ppc
>
> Is there any known reason why gallium would fail to build on darwin/ppc? I
> haven't looked into it myself since I figured there might be an easy answer
> already
>
> http://trac.macports.org/ticket/24345 |
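The compiler check suggested above could look roughly like the following. This is only a sketch, not the actual contents of u_atomic.h: the macro HAVE_GCC_SYNC_BUILTINS and the example_* helpers are hypothetical names, and the version test assumes the `__sync_*` builtins, which GCC gained in 4.1.

```c
#include <stdint.h>

/* Hypothetical guard: fail at compile time, with a readable message,
 * instead of at link time when the __sync builtins are missing. */
#if defined(__GNUC__) && \
    (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 1))
#define HAVE_GCC_SYNC_BUILTINS 1
#else
#error "atomic ops need gcc >= 4.1 (__sync builtins) or a platform-specific implementation"
#endif

/* Atomically increment *v and return the new value. */
static inline int32_t
example_atomic_inc(int32_t *v)
{
   return __sync_add_and_fetch(v, 1);
}

/* Atomically decrement *v; returns nonzero when the result reaches zero. */
static inline int
example_atomic_dec_zero(int32_t *v)
{
   return __sync_sub_and_fetch(v, 1) == 0;
}
```

A gcc-4.0 build would then stop at the #error line rather than produce unresolved symbols; a PPC assembly or mutex-based fallback would replace the #error branch.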
From: Jose F. <jfo...@vm...> - 2010-04-03 19:25:32
|
> commit 5126683e3b971ccfb51e50e560750ce44e86bae8
> Author: Luca Barbieri <lu...@lu...>
> Date: Fri Apr 2 05:23:32 2010 +0200
>
> gallium/util: add util_format_is_supported to check for pack/unpack
>
> This improves the code by making it more readable, and removes
> special knowledge of S3TC and other formats from softpipe.

> @@ -92,7 +92,7 @@ def write_format_table(formats):
>      u_format_pack.generate(formats)
>
>      for format in formats:
> -        print 'const struct util_format_description'
> +        print 'struct util_format_description'
>          print 'util_format_%s_description = {' % (format.short_name(),)
>          print "   %s," % (format.name,)
>          print "   \"%s\"," % (format.name,)

I don't agree with this. Making the format description table mutable when the only formats that are potentially unsupported due to patent issues are s3tc variants makes no sense. S3TC formats *are* special. There is nothing to generalize here.

> commit 52e9b990a192a9329006d5f7dd2ac222effea5a5
> Author: Luca Barbieri <lu...@lu...>
> Date: Fri Apr 2 04:48:42 2010 +0200
>
> gallium/util: load s3tc on demand
>
> This changes the S3TC function pointers to be initialized to stubs
> that load the S3TC library and then delegate to the real functions.
>
> If the S3TC library fails to load, the function pointers are replaced
> with a "nop" function.
> The code is also changed to attempt to load the library only one time.
>
> Note that unlike checking for a flag, this method has no performance
> cost at all.
>
> The use of the "nop" functions also allows to avoid most checks, that
> are only preserved when the function does non-trivial work.

Replacing the conditionals with no-op stubs is a good optimization. But attempting to load the s3tc shared library from the stubs is unnecessary. Stubs should have an assert(0) -- it is an error to attempt any S3TC (de)compression when there's no support for it.

Jose |
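The stub arrangement argued for above might be sketched as follows. All names here (example_fetch_func, example_s3tc_init and so on) are hypothetical and are not the real u_format_s3tc interface; the sketch only assumes that the real entry point, if any, is resolved once at init time and that callers check for support before dispatching.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature for a per-pixel fetch hook. */
typedef void (*example_fetch_func)(const uint8_t *block, uint8_t rgba[4]);

/* Stub installed while S3TC is unavailable: reaching it is a caller bug. */
static void
example_fetch_stub(const uint8_t *block, uint8_t rgba[4])
{
   (void)block;
   (void)rgba;
   assert(0 && "S3TC fetch called without S3TC support");
}

/* Stand-in for a real decoder, used only to exercise the dispatch. */
static void
example_fetch_white(const uint8_t *block, uint8_t rgba[4])
{
   (void)block;
   rgba[0] = rgba[1] = rgba[2] = rgba[3] = 255;
}

static example_fetch_func example_fetch = example_fetch_stub;
static int example_s3tc_available = 0;

/* Called once at init with the real entry point, or NULL if it is missing. */
static void
example_s3tc_init(example_fetch_func real)
{
   if (real) {
      example_fetch = real;
      example_s3tc_available = 1;
   }
}

/* Callers are expected to query this before using the hook. */
static int
example_format_supported(void)
{
   return example_s3tc_available;
}
```

With this layout the hot path is a plain indirect call with no flag test, and the stub never tries to load anything; it only traps misuse in debug builds.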
From: Luca B. <luc...@gm...> - 2010-04-03 19:09:17
|
As a further example that just came to mind, nv40 (GeForce 6-7 and PS3 RSX) supports control flow in fragment shaders, but apparently does not support the "continue" keyword (since NV_fragment_program2, which maps almost directly to the hardware, does not have it either).

I implemented TGSI control flow in a private branch, but did not implement the "continue" keyword. Implementing "continue" requires transforming the code to generate and carry around "should continue" flags, or performing even less trivial transformations including code duplication. Unfortunately, doing so requires non-local modifications, and thus would require doing something beyond just scanning the TGSI source code as the nv30/nv40 driver currently does.

If there was a TGSI->LLVM->TGSI module, the LLVM->TGSI control flow reconstruction would already handle this, and it would be enough to tell it to not make use of the "continue" instruction: it would then automatically generate the proper if/endif structure, duplicating code and/or introducing flags as needed in a generic way.

As things stand now, I'm faced with either just hoping the GLSL programs don't use "continue", implementing a hack in the nv40 shader backend (where such a high-level optimization does not belong at all and can't be done cleanly), or writing the LLVM module myself before tackling this.

With an LLVM-based infrastructure, there would be a clear and straightforward way to solve this, with all the supporting infrastructure already available and the ability to create an optimization pass reusable by other drivers that may face the same issue.

This is just an example, by the way: others can be found. |
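The flag-based rewrite of "continue" described above can be illustrated in plain C. This is only an analogy for the shader transformation, with hypothetical function names; it shows the simplest case, where the rest of the loop body is wrapped in an if on a "should continue" flag.

```c
#include <stddef.h>

/* Original form: "continue" skips the rest of the body for negative entries. */
static int
sum_with_continue(const int *v, size_t n)
{
   int sum = 0;
   for (size_t i = 0; i < n; ++i) {
      if (v[i] < 0)
         continue;
      sum += v[i];
   }
   return sum;
}

/* Lowered form for a target without "continue": a flag is set where the
 * jump used to be, and everything after it is guarded by if/endif. */
static int
sum_with_flag(const int *v, size_t n)
{
   int sum = 0;
   for (size_t i = 0; i < n; ++i) {
      int should_continue = 0;
      if (v[i] < 0)
         should_continue = 1;
      if (!should_continue) {
         sum += v[i];
      }
   }
   return sum;
}
```

When the "continue" sits inside nested conditionals, every statement following each enclosing block has to be guarded as well, which is the non-local part of the rewrite.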
From: Vinson L. <vl...@vm...> - 2010-04-03 18:52:58
|
Leopard uses gcc-4.0, which didn't have built-in support for atomic variables.

________________________________________
From: Jeremy Huddleston [jer...@fr...]
Sent: Saturday, April 03, 2010 11:22 AM
To: mes...@li...
Subject: [Mesa3d-dev] gallium failing to build on darwin/ppc

Is there any known reason why gallium would fail to build on darwin/ppc? I haven't looked into it myself since I figured there might be an easy answer already

http://trac.macports.org/ticket/24345

_______________________________________________
Mesa3d-dev mailing list
Mes...@li...
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev |
From: Luca B. <luc...@gm...> - 2010-04-03 18:37:46
|
This is somewhat nice, but without using a real compiler, the result will still be just a toy, unless you employ hundreds of compiler experts working full time on the project.

For instance, Wikipedia lists the following loop optimizations:

# loop interchange : These optimizations exchange inner loops with outer loops. When the loop variables index into an array, such a transformation can improve locality of reference, depending on the array's layout. This is also known as loop permutation.
# loop splitting/loop peeling : Loop splitting attempts to simplify a loop or eliminate dependencies by breaking it into multiple loops which have the same bodies but iterate over different contiguous portions of the index range. A useful special case is loop peeling, which can simplify a loop with a problematic first iteration by performing that iteration separately before entering the loop.
# loop fusion or loop combining : Another technique which attempts to reduce loop overhead. When two adjacent loops would iterate the same number of times (whether or not that number is known at compile time), their bodies can be combined as long as they make no reference to each other's data.
# loop fission or loop distribution : Loop fission attempts to break a loop into multiple loops over the same index range but each taking only a part of the loop's body. This can improve locality of reference, both of the data being accessed in the loop and the code in the loop's body.
# loop unrolling : Duplicates the body of the loop multiple times, in order to decrease the number of times the loop condition is tested and the number of jumps, which may degrade performance by impairing the instruction pipeline. Completely unrolling a loop eliminates all overhead (except multiple instruction fetches & increased program load time), but requires that the number of iterations be known at compile time (except in the case of JIT compilers). Care must also be taken to ensure that multiple re-calculation of indexed variables is not a greater overhead than advancing pointers within the original loop.
# loop unswitching : Unswitching moves a conditional inside a loop outside of it by duplicating the loop's body, and placing a version of it inside each of the if and else clauses of the conditional.
# loop inversion : This technique changes a standard while loop into a do/while (a.k.a. repeat/until) loop wrapped in an if conditional, reducing the number of jumps by two, for cases when the loop is executed. Doing so duplicates the condition check (increasing the size of the code) but is more efficient because jumps usually cause a pipeline stall. Additionally, if the initial condition is known at compile-time and is known to be side-effect-free, the if guard can be skipped.
# loop-invariant code motion : If a quantity is computed inside a loop during every iteration, and its value is the same for each iteration, it can vastly improve efficiency to hoist it outside the loop and compute its value just once before the loop begins. This is particularly important with the address-calculation expressions generated by loops over arrays. For correct implementation, this technique must be used with loop inversion, because not all code is safe to be hoisted outside the loop.
# loop reversal : Loop reversal reverses the order in which values are assigned to the index variable. This is a subtle optimization which can help eliminate dependencies and thus enable other optimizations. Also, certain architectures utilise looping constructs at Assembly language level that count in a single direction only (e.g. decrement-jump-if-not-zero (DJNZ)).
# loop tiling/loop blocking : Loop tiling reorganizes a loop to iterate over blocks of data sized to fit in the cache.
# loop skewing : Loop skewing takes a nested loop iterating over a multidimensional array, where each iteration of the inner loop depends on previous iterations, and rearranges its array accesses so that the only dependencies are between iterations of the outer loop.

Good luck doing all this on TGSI (especially if the developer does not have serious experience writing production compilers).

Also, this does not mention all the other optimizations and analyses required to do the above stuff well (likely another 10-20 things).

Using a real compiler (e.g. LLVM, but also gcc or Open64), those optimizations are already implemented, or at least there is already a team of experienced compiler developers who are working full time to implement such optimizations, allowing you to then just turn them on without having to do any of the work yourself.

Note all "X compiler is bad for VLIW or whatever GPU architecture" objections are irrelevant, since almost all optimizations are totally architecture independent.

Also note that we should support OpenCL/compute shaders (already available for *3* years on e.g. nv50) and those *really* need a real compiler (as in, something developed for years by a team of compiler experts, and in wide use).

For instance, nVidia uses Open64 to compile CUDA programs, and then feeds back the output (via PTX) to their ad-hoc code generator. Note that unlike Mesa/Gallium, nVidia actually had a working shader optimizer AND a large paid team, yet they still decided to at least partially use Open64.

PathScale (who seems to mainly sell an Open64-based compiler for the HPC market) might do some of this work (with a particular focus on a CUDA replacement for nv50), but it's unclear whether this will turn out to be generally useful (for all Gallium drivers, as opposed to nv50-only) or not.
Also they plan to use Open64 and WHIRL, and it's unclear whether this is as well designed for embedding and as easy to understand and customize as LLVM is (please expand on this if you know about it).

Really, the current code generation situation is totally _embarrassing_ (and r300 is probably one of the best here, having its own compiler, and doesn't even have loops, so you can imagine how good the other drivers are), and ought to be fixed in a definitive fashion.

This is obviously not achievable if Mesa/Gallium contributors are supposed to write the compiler optimizations themselves, since clearly there is not even enough manpower to support a relatively up-to-date version of OpenGL or, say, to have drivers that can allocate and fence GPU memory in a sensible and fast way, or implement hierarchical Z buffers, or any of the other things expected from a decent driver, that the Mesa drivers don't do.

In other words, state-of-the-art optimizing compilers are not something one can just pop up and write himself from scratch, unless he is interested and skilled at it, it is his main project AND he manages to attract, or pays, a community of compiler experts to work on it.

Since LLVM already works well, has a community of compiler experts working on it, and is funded by companies such as Apple, there is no chance of attracting such a community, especially for something limited to the niche of compiling shaders.

And yes, TGSI->LLVM->TGSI is not entirely trivial, but it is doable (obviously), and once you get past that initial hurdle, you get EVERYTHING FOR FREE. And the free work keeps coming with every commit to the llvm repository, and you only have to do the minimal work of updating for LLVM interface changes.

So you can just do nothing and after a few months you notice that your driver is faster on very advanced games because a new LLVM automatically improved the quality of your shaders without you even knowing about it.
Not to mention that we could then at some point just get rid of TGSI, use LLVM IR directly, and have each driver implement a normal backend if possible.

The test for adequacy of a shader compiler is saying "yes, this code is really good: I can't easily come up with any way to improve it", looking at the generated code for any example you can find. Any ad-hoc compiler will most likely immediately fail such a test, for complex examples.

So, for a GSoC project, I'd kind of suggest:
(1) Adapt the gallivm/llvmpipe TGSI->LLVM converter to also generate AoS code (i.e. RGBA vectors as opposed to RRRR, GGGG, etc.) if possible or write one from scratch otherwise
(2) Write an LLVM->TGSI backend, restricted to programs without any control flow
(3) Make LLVM->TGSI always work (even with control flow and DDX/DDY)
(4) Hook up all useful LLVM optimizations

If there is still time/as followup (note that these are mostly complex things, at most one/two might be doable in the timeframe):
(5) Do something about uniform-specific shader generation, and support automatically generating "pre-shaders" for the CPU (using the x86/x86-64 LLVM backends) for uniform-only computations
(6) Enhance LLVM to provide any missing optimization with a significant impact
(7) Convert existing drivers to LLVM backends, or have them expose more functionality to the TGSI backend via TGSI extensions (or currently unused features such as predicate support), and do driver-specific stuff (e.g. scalarization for scalar architectures)
(8) Make sure shaders can be compiled using as large as possible a subset of plain C/C++, as well as OpenCL (using clang), and add OpenCL support to Mesa/Gallium (some of it already exists in external repositories)
(9) Compare with fglrx and nVidia libGL/cgc/nvopencc and improve whatever necessary to be equal or better than them
(10) Talk with LLVM developers about good VLIW code generation for the Radeons and to a lesser extent nv30/nv40 that need it, and find out exactly what the problem is here, how it can be solved and who could do the work
(11) Add Gallium support for nv10/nv20 and r100/r200 using the LLVM DAG instruction selector to code-generate a fixed pipeline (Stephane Marchesin tried this already, seems it is non-trivial but could be made to work partially, and probably enough to get the Xorg state tracker to work on all cards and get rid of all X drivers at some point).
(12) Figure out if any other compilers (Open64, gcc, whatever) can be useful as backends for some drivers

Maybe I should propose to do it myself though, if that is still possible, since everyone else seems afraid of it for some reason and it seems to me it is absolutely essential to have a chance of having usable (read: that don't look ridiculous compared to the proprietary ones) drivers, especially in the long run for DirectX 11-level and later games and software heavily using OpenCL/compute shaders and very complex tessellation/vertex/geometry/fragment shaders. |
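As a concrete instance of one optimization from the Wikipedia list quoted in this message, loop-invariant code motion can be shown by hand in C. The function names are hypothetical, and the second function is simply what an optimizing compiler would be expected to derive from the first; the sketch says nothing about how any particular LLVM pass is implemented.

```c
#include <stddef.h>

/* As written by the programmer: a * b + 1.0f is recomputed every iteration
 * even though it does not depend on i. */
static void
scale_naive(float *out, const float *in, size_t n, float a, float b)
{
   for (size_t i = 0; i < n; ++i)
      out[i] = in[i] * (a * b + 1.0f);
}

/* After hoisting: the invariant expression is computed once before the loop. */
static void
scale_hoisted(float *out, const float *in, size_t n, float a, float b)
{
   const float k = a * b + 1.0f;
   for (size_t i = 0; i < n; ++i)
      out[i] = in[i] * k;
}
```

In a shader, the analogous case is an expression built only from uniforms inside a loop, which also ties into the "pre-shader" idea in item (5).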
From: Jeremy H. <jer...@fr...> - 2010-04-03 18:22:21
|
Is there any known reason why gallium would fail to build on darwin/ppc? I haven't looked into it myself since I figured there might be an easy answer already http://trac.macports.org/ticket/24345 |
From: Luca B. <lu...@lu...> - 2010-04-03 16:18:32
|
Collect the maximum error for fetch/unpack tests, and ratio of flipped to total
bits for pack tests.

Add lenient thresholds for S3TC tests.
---
 progs/gallium/unit/u_format_test.c |  163 +++++++++++++++++++-----------------
 1 files changed, 86 insertions(+), 77 deletions(-)

diff --git a/progs/gallium/unit/u_format_test.c b/progs/gallium/unit/u_format_test.c
index 53e0284..1911dad 100644
--- a/progs/gallium/unit/u_format_test.c
+++ b/progs/gallium/unit/u_format_test.c
@@ -36,22 +36,48 @@
 #include "util/u_format_s3tc.h"


+static float
+float_error(float x, float y)
+{
+   return fabsf(y - x);
+}
+
+static float
+byte_error(uint8_t x, uint8_t y)
+{
+   return float_error(x / 255.0, y / 255.0);
+}
+
+/* this is done in this terrible way only because these are unit tests.
+ * a real implementation must use a lookup table, or the mask/shift/add
+ * algorithm in the Linux source
+ * it should also use the builtin/intrinsic if available
+ */
+static unsigned
+popcnt8(uint8_t v)
+{
+   unsigned i;
+   unsigned cnt = 0;
+   for(i = 0; i < 8; ++i)
+      cnt += ((v >> i) & 1);
+   return cnt;
+}
+
 static boolean
-compare_float(float x, float y)
+print_max_error(const struct util_format_description *format_desc, float max_error)
 {
-   float error = y - x;
+   if(max_error <= FLT_EPSILON)
+      return TRUE;

-   if (error < 0.0f)
-      error = -error;
+   printf("MAX ABS ERROR: %f float, %.1f 8scaled\n", max_error, max_error * 255.0);

-   if (error > FLT_EPSILON) {
-      return FALSE;
-   }
+   /* compression tests aren't currently perfect, so be lenient here */
+   if(format_desc->layout == UTIL_FORMAT_LAYOUT_S3TC && max_error < 0.01f)
+      return TRUE;

-   return TRUE;
+   return FALSE;
 }
-

 static void
 print_packed(const struct util_format_description *format_desc,
              const char *prefix,
@@ -69,6 +95,31 @@ print_packed(const struct util_format_description *format_desc,
    printf("%s", suffix);
 }

+static boolean
+print_packed_results(const struct util_format_description *format_desc, const struct util_format_test_case *test, uint8_t* packed)
+{
+   unsigned flipped_bits = 0;
+   unsigned total_bits = 0;
+   float flipped_bits_ratio;
+   unsigned i;
+   for (i = 0; i < format_desc->block.bits/8; ++i) {
+      flipped_bits += popcnt8((test->packed[i] ^ packed[i]) & test->mask[i]);
+      total_bits += popcnt8(test->mask[i]);
+   }
+
+   flipped_bits_ratio = (float)flipped_bits / total_bits;
+
+   if (flipped_bits)
+      printf("FLIPPED BITS: %u (%u %%)\n", flipped_bits, (unsigned)(flipped_bits_ratio * 100.0));
+
+   /* TODO: S3TC threshold is random */
+   if (flipped_bits_ratio > (format_desc->layout == UTIL_FORMAT_LAYOUT_S3TC ? 0.1 : 0)) {
+      print_packed(format_desc, "FAILED: ", packed, " obtained\n");
+      print_packed(format_desc, "        ", test->packed, " expected\n");
+      return FALSE;
+   }
+   return TRUE;
+}

 static void
 print_unpacked_doubl(const struct util_format_description *format_desc,
@@ -94,7 +145,7 @@ print_unpacked_doubl(const struct util_format_description *format_desc,
 static void
 print_unpacked_float(const struct util_format_description *format_desc,
                      const char *prefix,
-                     const float unpacked[UTIL_FORMAT_MAX_UNPACKED_HEIGHT][UTIL_FORMAT_MAX_UNPACKED_WIDTH][4],
+                     float unpacked[UTIL_FORMAT_MAX_UNPACKED_HEIGHT][UTIL_FORMAT_MAX_UNPACKED_WIDTH][4],
                      const char *suffix)
 {
    unsigned i, j;
@@ -115,7 +166,7 @@ print_unpacked_float(const struct util_format_description *format_desc,
 static void
 print_unpacked_8unorm(const struct util_format_description *format_desc,
                       const char *prefix,
-                      const uint8_t unpacked[][UTIL_FORMAT_MAX_UNPACKED_WIDTH][4],
+                      uint8_t unpacked[][UTIL_FORMAT_MAX_UNPACKED_WIDTH][4],
                       const char *suffix)
 {
    unsigned i, j;
@@ -138,26 +189,23 @@ test_format_fetch_float(const struct util_format_description *format_desc,
 {
    float unpacked[UTIL_FORMAT_MAX_UNPACKED_HEIGHT][UTIL_FORMAT_MAX_UNPACKED_WIDTH][4] = { { { 0 } } };
    unsigned i, j, k;
-   boolean success;
+   float max_error = 0.0f;

-   success = TRUE;
    for (i = 0; i < format_desc->block.height; ++i) {
       for (j = 0; j < format_desc->block.width; ++j) {
          format_desc->fetch_float(unpacked[i][j], test->packed, j, i);
-         for (k = 0; k < 4; ++k) {
-            if (!compare_float(test->unpacked[i][j][k], unpacked[i][j][k])) {
-               success = FALSE;
-            }
-         }
+         for (k = 0; k < 4; ++k)
+            max_error = MAX2(max_error, float_error(test->unpacked[i][j][k], unpacked[i][j][k]));
       }
    }

-   if (!success) {
+   if (!print_max_error(format_desc, max_error)) {
       print_unpacked_float(format_desc, "FAILED: ", unpacked, " obtained\n");
       print_unpacked_doubl(format_desc, "        ", test->unpacked, " expected\n");
+      return FALSE;
    }

-   return success;
+   return TRUE;
 }


@@ -167,27 +215,24 @@ test_format_unpack_float(const struct util_format_description *format_desc,
 {
    float unpacked[UTIL_FORMAT_MAX_UNPACKED_HEIGHT][UTIL_FORMAT_MAX_UNPACKED_WIDTH][4] = { { { 0 } } };
    unsigned i, j, k;
-   boolean success;
+   float max_error = 0.0f;

    format_desc->unpack_float(&unpacked[0][0][0], sizeof unpacked[0], test->packed, 0, format_desc->block.width, format_desc->block.height);

-   success = TRUE;
    for (i = 0; i < format_desc->block.height; ++i) {
       for (j = 0; j < format_desc->block.width; ++j) {
-         for (k = 0; k < 4; ++k) {
-            if (!compare_float(test->unpacked[i][j][k], unpacked[i][j][k])) {
-               success = FALSE;
-            }
-         }
+         for (k = 0; k < 4; ++k)
+            max_error = MAX2(max_error, float_error(test->unpacked[i][j][k], unpacked[i][j][k]));
       }
    }

-   if (!success) {
+   if (!print_max_error(format_desc, max_error)) {
       print_unpacked_float(format_desc, "FAILED: ", unpacked, " obtained\n");
       print_unpacked_doubl(format_desc, "        ", test->unpacked, " expected\n");
+      return FALSE;
    }

-   return success;
+   return TRUE;
 }


@@ -199,16 +244,10 @@ test_format_pack_float(const struct util_format_description *format_desc,
    float unpacked[UTIL_FORMAT_MAX_UNPACKED_HEIGHT][UTIL_FORMAT_MAX_UNPACKED_WIDTH][4];
    uint8_t packed[UTIL_FORMAT_MAX_PACKED_BYTES];
    unsigned i, j, k;
-   boolean success;

-   if (test->format == PIPE_FORMAT_DXT1_RGBA) {
-      /*
-       * Skip S3TC as packed representation is not canonical.
-       *
-       * TODO: Do a round trip conversion.
-       */
+   /* XXX: this test is broken */
+   if (test->format == PIPE_FORMAT_DXT1_RGBA)
       return TRUE;
-   }

    memset(packed, 0, sizeof packed);
    for (i = 0; i < format_desc->block.height; ++i) {
@@ -221,17 +260,7 @@ test_format_pack_float(const struct util_format_description *format_desc,

    format_desc->pack_float(packed, 0, &unpacked[0][0][0], sizeof unpacked[0], format_desc->block.width, format_desc->block.height);

-   success = TRUE;
-   for (i = 0; i < format_desc->block.bits/8; ++i)
-      if ((test->packed[i] & test->mask[i]) != (packed[i] & test->mask[i]))
-         success = FALSE;
-
-   if (!success) {
-      print_packed(format_desc, "FAILED: ", packed, " obtained\n");
-      print_packed(format_desc, "        ", test->packed, " expected\n");
-   }
-
-   return success;
+   return print_packed_results(format_desc, test, packed);
 }


@@ -266,29 +295,26 @@ test_format_unpack_8unorm(const struct util_format_description *format_desc,
    uint8_t unpacked[UTIL_FORMAT_MAX_UNPACKED_HEIGHT][UTIL_FORMAT_MAX_UNPACKED_WIDTH][4] = { { { 0 } } };
    uint8_t expected[UTIL_FORMAT_MAX_UNPACKED_HEIGHT][UTIL_FORMAT_MAX_UNPACKED_WIDTH][4] = { { { 0 } } };
    unsigned i, j, k;
-   boolean success;
+   float max_error = 0.0f;

    format_desc->unpack_8unorm(&unpacked[0][0][0], sizeof unpacked[0], test->packed, 0, 1, 1);

    convert_float_to_8unorm(&expected[0][0][0], &test->unpacked[0][0][0]);

-   success = TRUE;
    for (i = 0; i < format_desc->block.height; ++i) {
       for (j = 0; j < format_desc->block.width; ++j) {
-         for (k = 0; k < 4; ++k) {
-            if (expected[i][j][k] != unpacked[i][j][k]) {
-               success = FALSE;
-            }
-         }
+         for (k = 0; k < 4; ++k)
+            max_error = MAX2(max_error, byte_error(expected[i][j][k], unpacked[i][j][k]));
       }
    }

-   if (!success) {
+   if (!print_max_error(format_desc, max_error)) {
       print_unpacked_8unorm(format_desc, "FAILED: ", unpacked, " obtained\n");
       print_unpacked_8unorm(format_desc, "        ", expected, " expected\n");
+      return FALSE;
    }

-   return success;
+   return TRUE;
 }


@@ -298,17 +324,10 @@ test_format_pack_8unorm(const struct util_format_description *format_desc,
 {
    uint8_t unpacked[UTIL_FORMAT_MAX_UNPACKED_HEIGHT][UTIL_FORMAT_MAX_UNPACKED_WIDTH][4];
    uint8_t packed[UTIL_FORMAT_MAX_PACKED_BYTES];
-   unsigned i;
-   boolean success;

-   if (test->format == PIPE_FORMAT_DXT1_RGBA) {
-      /*
-       * Skip S3TC as packed representation is not canonical.
-       *
-       * TODO: Do a round trip conversion.
-       */
+   /* XXX: this test is broken */
+   if (test->format == PIPE_FORMAT_DXT1_RGBA)
       return TRUE;
-   }

    if (!convert_float_to_8unorm(&unpacked[0][0][0], &test->unpacked[0][0][0])) {
       /*
@@ -321,17 +340,7 @@ test_format_pack_8unorm(const struct util_format_description *format_desc,

    format_desc->pack_8unorm(packed, 0, &unpacked[0][0][0], sizeof unpacked[0], 1, 1);

-   success = TRUE;
-   for (i = 0; i < format_desc->block.bits/8; ++i)
-      if ((test->packed[i] & test->mask[i]) != (packed[i] & test->mask[i]))
-         success = FALSE;
-
-   if (!success) {
-      print_packed(format_desc, "FAILED: ", packed, " obtained\n");
-      print_packed(format_desc, "        ", test->packed, " expected\n");
-   }
-
-   return success;
+   return print_packed_results(format_desc, test, packed);
 }


--
1.7.0.1.147.g6d84b |
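The comment on popcnt8 in the patch mentions a mask/shift/add algorithm as the proper alternative to the bit loop. A minimal sketch of that idea for 8-bit values follows, next to the loop it would replace; both function names are hypothetical, and where GCC or Clang is available the `__builtin_popcount` builtin would normally be preferred over either.

```c
#include <stdint.h>

/* Bit-by-bit count, same approach as the popcnt8 helper in the patch. */
static unsigned
popcnt8_loop(uint8_t v)
{
   unsigned i;
   unsigned cnt = 0;
   for (i = 0; i < 8; ++i)
      cnt += (v >> i) & 1;
   return cnt;
}

/* Mask/shift/add: add neighbouring 1-bit fields, then 2-bit fields,
 * then 4-bit fields, so the whole byte is summed in three steps. */
static unsigned
popcnt8_parallel(uint8_t v)
{
   unsigned x = v;
   x = (x & 0x55u) + ((x >> 1) & 0x55u);
   x = (x & 0x33u) + ((x >> 2) & 0x33u);
   x = (x & 0x0fu) + ((x >> 4) & 0x0fu);
   return x;
}
```

For a unit test the loop is perfectly adequate; the parallel form only matters when the count sits on a hot path.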
From: Jakob B. <wal...@gm...> - 2010-04-03 15:51:31
|
On Sun, Mar 28, 2010 at 6:13 PM, Chia-I Wu <ol...@gm...> wrote:
> Hi Jakob,
>
> This patch series adds support for GL_OES_EGL_image to st/mesa. The first
> patch implements st_manager::get_egl_image in st/egl. The hook is used to
> check and return an st_egl_image, which describes an EGLImageKHR. The second
> patch implements GL_OES_EGL_image in st/mesa, and the last patch adds a demo
> for the new functionality. I've tested it with egl_x11_i915.so, but it should
> work with other hardware drivers.
>
> Do you mind having a look at the patches, especially the first one? I'd like
> to hear your opinions before merging the patches, and going on to work on
> EGLImage support in st/dri.

Hi Chia-I

Terribly sorry for taking this long to reply. The patches look good; go ahead and commit.

Regarding EGLImage in st/dri, don't let me stop you if you have an itch to do it. If I get some time later on, I'll ask you then whether you have done anything.

And again, thanks for the hard work!

Cheers Jakob. |
From: Luca B. <lu...@lu...> - 2010-04-03 15:31:53
|
For instance, the DXT1 test is wrong. The red values used are: 33 93 153 214

93 - 33 = 60
153 - 93 = 60
214 - 153 = 61

213 should be used instead (i.e. 0xd5 instead of 0xd6) |
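The arithmetic above can be checked mechanically. The sketch below assumes the two middle colors of a four-color DXT1 block sit at 1/3 and 2/3 between the endpoints and uses truncating integer division; real decoders may round differently and derive their endpoints from RGB565, which this ignores. Both function names are hypothetical.

```c
/* Nonzero when the 1/3 and 2/3 points between c0 and c1 are whole numbers,
 * i.e. when the four values form an exact line. */
static int
thirds_are_exact(int c0, int c1)
{
   return (c1 - c0) % 3 == 0;
}

/* Value k/3 of the way from c0 to c1 (k = 0..3), truncated. */
static int
third_point(int c0, int c1, int k)
{
   return ((3 - k) * c0 + k * c1) / 3;
}
```

With endpoints 33 and 214 the difference is 181, which is not a multiple of 3, so the middle values cannot be exact; with 33 and 213 the difference is 180 and the sequence 33, 93, 153, 213 comes out with a constant step of 60.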
From: Luca B. <lu...@lu...> - 2010-04-03 15:21:06
|
They are not passing for me with current master and a 32-bit system: Here are the failures: Testing util_format_dxt1_rgb_pack_8unorm ... FAILED: f2 d7 90 20 ae 2c 6f 97 obtained f2 d7 b0 20 ae 2c 6f 97 expected Testing util_format_dxt5_rgba_pack_8unorm ... FAILED: f7 10 c5 0c 9a 73 b4 9c f6 8f ab 32 2a 9a 95 5a obtained f8 11 c5 0c 9a 73 b4 9c f6 8f ab 32 2a 9a 95 5a expected Testing util_format_dxt1_rgb_unpack_8unorm ... FAILED: {0x99, 0xb0, 0x8e, 0xff}, {0x5d, 0x62, 0x89, 0xff}, {0x99, 0xb0, 0x8e, 0xff}, {0x99, 0xb0, 0x8e, 0xff}, {0xd6, 0xff, 0x94, 0xff}, {0x5d, 0x62, 0x89, 0xff}, {0x99, 0xb0, 0x8e, 0xff}, {0xd6, 0xff, 0x94, 0xff}, {0x5d, 0x62, 0x89, 0xff}, {0x5d, 0x62, 0x89, 0xff}, {0x99, 0xb0, 0x8e, 0xff}, {0x21, 0x14, 0x84, 0xff}, {0x5d, 0x62, 0x89, 0xff}, {0x21, 0x14, 0x84, 0xff}, {0x21, 0x14, 0x84, 0xff}, {0x99, 0xb0, 0x8e, 0xff} obtained {0x98, 0xaf, 0x8e, 0xff}, {0x5c, 0x62, 0x88, 0xff}, {0x98, 0xaf, 0x8e, 0xff}, {0x98, 0xaf, 0x8e, 0xff}, {0xd6, 0xff, 0x94, 0xff}, {0x5c, 0x62, 0x88, 0xff}, {0x98, 0xaf, 0x8e, 0xff}, {0xd6, 0xff, 0x94, 0xff}, {0x5c, 0x62, 0x88, 0xff}, {0x5c, 0x62, 0x88, 0xff}, {0x98, 0xaf, 0x8e, 0xff}, {0x21, 0x13, 0x84, 0xff}, {0x5c, 0x62, 0x88, 0xff}, {0x21, 0x13, 0x84, 0xff}, {0x21, 0x13, 0x84, 0xff}, {0x98, 0xaf, 0x8e, 0xff} expected Testing util_format_dxt1_rgba_unpack_8unorm ... 
FAILED: {0x00, 0x00, 0x00, 0x00}, {0x4e, 0xaa, 0x90, 0xff}, {0x4e, 0xaa, 0x90, 0xff}, {0x00, 0x00, 0x00, 0x00}, {0x4e, 0xaa, 0x90, 0xff}, {0x29, 0xff, 0xff, 0xff}, {0x00, 0x00, 0x00, 0x00}, {0x4e, 0xaa, 0x90, 0xff}, {0x73, 0x55, 0x21, 0xff}, {0x00, 0x00, 0x00, 0x00}, {0x00, 0x00, 0x00, 0x00}, {0x4e, 0xaa, 0x90, 0xff}, {0x4e, 0xaa, 0x90, 0xff}, {0x00, 0x00, 0x00, 0x00}, {0x00, 0x00, 0x00, 0x00}, {0x4e, 0xaa, 0x90, 0xff} obtained {0x00, 0x00, 0x00, 0x00}, {0x4e, 0xa9, 0x8f, 0xff}, {0x4e, 0xa9, 0x8f, 0xff}, {0x00, 0x00, 0x00, 0x00}, {0x4e, 0xa9, 0x8f, 0xff}, {0x29, 0xff, 0xff, 0xff}, {0x00, 0x00, 0x00, 0x00}, {0x4e, 0xa9, 0x8f, 0xff}, {0x73, 0x54, 0x21, 0xff}, {0x00, 0x00, 0x00, 0x00}, {0x00, 0x00, 0x00, 0x00}, {0x4e, 0xa9, 0x8f, 0xff}, {0x4e, 0xa9, 0x8f, 0xff}, {0x00, 0x00, 0x00, 0x00}, {0x00, 0x00, 0x00, 0x00}, {0x4e, 0xa9, 0x8f, 0xff} expected Testing util_format_dxt3_rgba_unpack_8unorm ... FAILED: {0x6d, 0xc6, 0x96, 0x77}, {0x6d, 0xc6, 0x96, 0xee}, {0x6d, 0xc6, 0x96, 0xaa}, {0x8c, 0xff, 0xb5, 0x44}, {0x6d, 0xc6, 0x96, 0xff}, {0x6d, 0xc6, 0x96, 0x88}, {0x31, 0x55, 0x5a, 0x66}, {0x6d, 0xc6, 0x96, 0x99}, {0x31, 0x55, 0x5a, 0xbb}, {0x31, 0x55, 0x5a, 0x55}, {0x31, 0x55, 0x5a, 0x11}, {0x6d, 0xc6, 0x96, 0xcc}, {0x6d, 0xc6, 0x96, 0xcc}, {0x6d, 0xc6, 0x96, 0x11}, {0x31, 0x55, 0x5a, 0x44}, {0x31, 0x55, 0x5a, 0x88} obtained {0x6c, 0xc6, 0x96, 0x77}, {0x6c, 0xc6, 0x96, 0xee}, {0x6c, 0xc6, 0x96, 0xa9}, {0x8c, 0xff, 0xb5, 0x43}, {0x6c, 0xc6, 0x96, 0xff}, {0x6c, 0xc6, 0x96, 0x87}, {0x31, 0x54, 0x5a, 0x66}, {0x6c, 0xc6, 0x96, 0x98}, {0x31, 0x54, 0x5a, 0xba}, {0x31, 0x54, 0x5a, 0x54}, {0x31, 0x54, 0x5a, 0x10}, {0x6c, 0xc6, 0x96, 0xcc}, {0x6c, 0xc6, 0x96, 0xcc}, {0x6c, 0xc6, 0x96, 0x10}, {0x31, 0x54, 0x5a, 0x43}, {0x31, 0x54, 0x5a, 0x87} expected Testing util_format_dxt5_rgba_unpack_8unorm ... 
FAILED: {0x6d, 0xc6, 0x96, 0x74}, {0x6d, 0xc6, 0x96, 0xf8}, {0x6d, 0xc6, 0x96, 0xb6}, {0x8c, 0xff, 0xb5, 0x53}, {0x6d, 0xc6, 0x96, 0xf8}, {0x6d, 0xc6, 0x96, 0x95}, {0x31, 0x55, 0x5a, 0x53}, {0x6d, 0xc6, 0x96, 0x95}, {0x31, 0x55, 0x5a, 0xb6}, {0x31, 0x55, 0x5a, 0x53}, {0x31, 0x55, 0x5a, 0x11}, {0x6d, 0xc6, 0x96, 0xd7}, {0x6d, 0xc6, 0x96, 0xb6}, {0x6d, 0xc6, 0x96, 0x11}, {0x31, 0x55, 0x5a, 0x32}, {0x31, 0x55, 0x5a, 0x95} obtained {0x6c, 0xc6, 0x96, 0x73}, {0x6c, 0xc6, 0x96, 0xf7}, {0x6c, 0xc6, 0x96, 0xb6}, {0x8c, 0xff, 0xb5, 0x53}, {0x6c, 0xc6, 0x96, 0xf7}, {0x6c, 0xc6, 0x96, 0x95}, {0x31, 0x54, 0x5a, 0x53}, {0x6c, 0xc6, 0x96, 0x95}, {0x31, 0x54, 0x5a, 0xb6}, {0x31, 0x54, 0x5a, 0x53}, {0x31, 0x54, 0x5a, 0x10}, {0x6c, 0xc6, 0x96, 0xd7}, {0x6c, 0xc6, 0x96, 0xb6}, {0x6c, 0xc6, 0x96, 0x10}, {0x31, 0x54, 0x5a, 0x31}, {0x31, 0x54, 0x5a, 0x95} expected

Compiling libtxc_dxtn with -O0 or with -march=core2 -msse2 -mfpmath=sse did not make them work.

As you can see, the failures are mostly off by one, which makes me think of an approximation problem. libtxc_dxtn seems to take 8-bit input instead of floating point input, so it seems to be inherently hard to get it to roundtrip sensibly. Since only integer-coordinate points can be used, they are unlikely to be exactly on a line unless specifically crafted to be so.

Thus, a possible solution could be to actually pick a starting color, pick an increment, and generate an exact line by adding multiples of that increment to the starting color. |
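The suggestion above (a starting color plus multiples of an increment) can be sketched as follows. This is only an illustration: `make_exact_line` is a hypothetical helper, not something in u_format_test.c, and the caller is assumed to choose values that stay within 8 bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of u_format_test.c): write `count` RGB
 * colors that lie exactly on one line, color[i] = start + i * step.
 */
static void
make_exact_line(const uint8_t start[3], const int step[3],
                unsigned count, uint8_t out[][3])
{
   unsigned i, c;

   for (i = 0; i < count; i++) {
      for (c = 0; c < 3; c++) {
         int value = start[c] + (int)i * step[c];

         /* The caller must pick start/step so every channel stays in 0..255. */
         assert(value >= 0 && value <= 255);
         out[i][c] = (uint8_t)value;
      }
   }
}
```

Calling it with count 4 gives four collinear colors that could fill one test block. Since DXT endpoints are stored as RGB565, an exact 8-bit line may still not survive endpoint quantization, so this is a starting point rather than a guaranteed fix.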
From: Chia-I Wu <ol...@gm...> - 2010-04-03 14:19:22
|
On Sat, Apr 3, 2010 at 3:11 PM, Dave Airlie <ai...@gm...> wrote:
> The piglit read-front.c test is failing and the rabbits warren that is
> front buffer rendering in mesa st + dri st isn't helping me solve it.
> One thing I noticed was check_create_front_buffers is called in a
> number of places in the st, however it seems to never be used, as we
> call st_manager_add_color_renderbuffer moments before and that sets up
> the buffer.
> so
> if (fb->Attachment[frontIndex].Renderbuffer == NULL) {
> this always fails and we never do any of that stuff.
> Maybe someone has a clue on how this is meant to work and I can implement that.

DRI drivers use the st_manager_add_color_renderbuffer path. check_create_front_buffers is a no-op for them. The latter is used by st/wgl, which still uses st_public.h.

i915g passes the read-front test on my 945GM laptop. The failure could be that some states are not correctly invalidated in st_manager_add_color_renderbuffer, so that r300g (I assume this is your platform) does not reflect the change.

-- ol...@Lu... |
From: Jose F. <jfo...@vm...> - 2010-04-03 10:15:48
|
Thanks Luca.

Concerning u_format_test.c, I'm not sure whether the problem is lossiness or ambiguity in the format or a bug in the compressor, but there was logic in u_format_test.c to skip the DXT1_RGBA packing -- all other tests were passing.

Lossiness by itself doesn't explain the test failure because we're feeding to the compressor the RGBA data that resulted from decompressing. Given that DXT compression works by interpolating colors in a line segment of the RGB color space, when re-feeding the decompressed output to the compressor it should quickly find the line, as all color points lie exactly on it. "Exactly" is too strong a word, as there is rounding, which could be at the root of the differences.

Jose
________________________________________
From: luc...@gm... [luc...@gm...] On Behalf Of Luca Barbieri [lu...@lu...]
Sent: Saturday, April 03, 2010 1:48
To: Jose Fonseca
Cc: mes...@li...
Subject: Re: [Mesa3d-dev] How do we init half float tables?

The s3tc-teximage test seems fixed by the two line change I put in gallium-util-format-is-supported. s3tc-texsubimage prints: Mesa: User error: GL_INVALID_VALUE in glTexSubImage2D(xoffset+width) Probe at (285,12) Expected: 1.000000 0.000000 0.000000 Observed: 0.000000 0.000000 0.000000 which seems to be due to a Mesa or testcase bug. As for u_format_test.c, it looks like it simply fails to account for DXTn being lossy. |
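To illustrate how rounding alone can move an interpolated color by one, here is a small sketch of the two-thirds interpolation used for the intermediate DXT1 colors, computed once with truncation and once with round-to-nearest. The function names are made up for this example, and whether libtxc_dxtn or the expected test values use either variant is not something this thread establishes.

```c
#include <assert.h>
#include <stdint.h>

/* Two-thirds point between endpoint channels a and b, truncating. */
static uint8_t
lerp_2_1_trunc(uint8_t a, uint8_t b)
{
   return (uint8_t)((2 * a + b) / 3);
}

/* Same point, rounded to the nearest integer instead. */
static uint8_t
lerp_2_1_round(uint8_t a, uint8_t b)
{
   return (uint8_t)((2 * a + b + 1) / 3);
}
```

For endpoints 100 and 0 the first variant yields 66 and the second 67, the same kind of off-by-one seen in the failing unpack tests. For endpoints 255 and 0 both agree on 170 because the division is exact.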
From: Corbin S. <mos...@gm...> - 2010-04-03 09:23:36
|
On Sat, Apr 3, 2010 at 3:31 PM, Tom Stellard <tst...@gm...> wrote: > Hi, > > I have completed a first draft of my Google Summer of Code > proposal, and I would appreciate feedback from some of the > Mesa developers. I have included the project plan from my > proposal in this email, and you can also view my full proposal here: > http://socghop.appspot.com/gsoc/student_proposal/show/google/gsoc2010/tstellar/t126997450856 > However, I think you will need a google login to view it. > > Project Tasks: > > 1. Enable branch emulation for Gallium drivers: > The goal of this task will be to create an optional "optimization" pass > over the TGSI code to translate branch instructions into instructions > that are supported by cards without hardware branching. The basic > strategy for doing this translation will be: > > A. Copy values of in scope variables > to a temporary location before executing the conditional statement. > > B. Execute the "if true" branch. > > C. Test the conditional expression. If it evaluates to false, rollback > all values that were modified in the "if true" branch. > > D. Repeat step 2 with the "if false" branch, and then step 3, but this > time only rollback if the conditional expression evaluates to true. > > The TGSI instructions SLT, SNE, SGE, SEQ will be used to test the > conditional expression and the instruction CND will be used to rollback > the values. > > There will be two phases to this task. For phase 1, I will implement a > simple translator that will be able to translate the branch instructions > with only one pass through the TGSI code. This simple translator will > copy all in scope variables to a temporary location before executing the > conditional statement, even if those variables will not not be modified > in either of the branches. > > Phase 2 will add a preliminary pass before to the code translation > pass that will mark variables that might be modified by the conditional > statement. 
Then, during the translation pass, only the variables that > could potentially be modified inside either of the conditional branches > will be copied before the conditional statement is executed. > > 2. Unroll loops for Gallium drivers: > The goal of this task will be to unroll loops so that they can be > executed by hardware that does not support them. The loop unrolling > will be done in the same "optimization" pass as the branch emulation. > Loops where the number of iterations is known at compile time will be > unrolled and may have additional optimizations applied. Loops that > have an unknown number of iterations, will have to be studied to see > if there is a way to replace the loop with a set of instructions that > produces the same output as the loop. For example, one solution might > be to replace an ADD(src0, src0) instruction that is supposed to execute > n times with a MUL(src0, n). It is possible that not all loops will be > able to be unrolled successfully. > > These first two tasks are important not only for older cards that do not > support hardware branching, but newer cards as well. Driver developers > will not need to use every hardware instruction to compile shaders > with branches and loops, so they could use the branch emulation as a > temporary solution while hardware support for branching and loops is > being worked on. > > 3. Loops and Conditionals for R500 fragment and vertex shaders: > The goal of this task will be to make use of the R500 hardware support for > branches and loops. New radeon_compiler opcodes (RC_OPCODE_*) will need > to be added to represent loops, and the corresponding TGSI instructions > will need to be converted into these new opcodes during the TGSI_OPCODE_* > to RC_OPCODE_* phase. Once this has been done, the code generator for > R500 vertex and fragment shaders will need to be modified to output the > correct hardware instructions for loops. > > 4. 
More compiler optimizations / other GLSL features: > This is an optional task that will allow me to revisit the work from the > previous tasks and explore doing some optimizations I may have wanted to > do, but were outside the scope of those tasks. If there are no obvious > optimizations to be done, this time could be spent implementing some > other GLSL features for the R300 driver, possible ideas include: > > Adding support for the gl_FrontFacing variable. > Handling varying modifiers like perspective, flat, and centroid. > Improving the GLSL frontend to add support for more language features. > > Schedule / Deliverables: > 1. Enable branch emulation for Gallium drivers (4 weeks) > 2. Unroll loops for Gallium drivers (2 - 3 weeks) > Midterm Evaluation > 3. Loops and Conditionals for R500 fragment and vertex shaders (4 weeks) > 4. More compiler optimizations / other GLSL features (2 weeks) > > Tasks 1-3 will be required for this project. > Task 4 is optional. > > Thank you. Wow! Looks like you're certainly on the right track and you've been doing your research. I would say that the first two items on your list would be fine as a complete project. TGSI streams are tricky to modify, and you may find that you have to write more and more TGSI-specific code as you dig in. (For example, there are no helpers for strength reduction in TGSI yet.) I'll wait for everybody else to chime in, but it looks good so far. ~ C. -- When the facts change, I change my mind. What do you do, sir? ~ Keynes Corbin Simpson <Mos...@gm...> |
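As a rough picture of the ADD-to-MUL loop replacement from the quoted proposal, the sketch below compares a loop that adds a fixed step n times with a loop-free form. It is plain C rather than TGSI, the function names are invented for the example, and it reads the proposal's example as "add the same value n times", which is an assumption.

```c
#include <assert.h>

/* Reference: what the original loop computes, one ADD per iteration. */
static float
accumulate_loop(float acc, float step, unsigned n)
{
   unsigned i;

   for (i = 0; i < n; i++)
      acc += step;
   return acc;
}

/* Loop-free replacement: one MUL and one ADD, whatever n is. */
static float
accumulate_closed_form(float acc, float step, unsigned n)
{
   return acc + step * (float)n;
}
```

With acc = 1.0, step = 0.5 and n = 8 both forms return 5.0. In general the two can differ in the last bits because floating-point addition rounds at every iteration, so a real pass would have to decide whether that difference is acceptable.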
From: <bug...@fr...> - 2010-04-03 08:22:23
|
https://bugs.freedesktop.org/show_bug.cgi?id=26666

Ruslan <b7....@gm...> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

--- Comment #1 from Ruslan <b7....@gm...> 2010-04-03 01:22:15 PDT ---
Fixed in 7.7.1

--
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug. |
From: Tom S. <tst...@gm...> - 2010-04-03 07:33:33
|
Hi,

I have completed a first draft of my Google Summer of Code proposal, and I would appreciate feedback from some of the Mesa developers. I have included the project plan from my proposal in this email, and you can also view my full proposal here:
http://socghop.appspot.com/gsoc/student_proposal/show/google/gsoc2010/tstellar/t126997450856
However, I think you will need a google login to view it.

Project Tasks:

1. Enable branch emulation for Gallium drivers:
The goal of this task will be to create an optional "optimization" pass over the TGSI code to translate branch instructions into instructions that are supported by cards without hardware branching. The basic strategy for doing this translation will be:

A. Copy values of in-scope variables to a temporary location before executing the conditional statement.

B. Execute the "if true" branch.

C. Test the conditional expression. If it evaluates to false, roll back all values that were modified in the "if true" branch.

D. Repeat step B with the "if false" branch, and then step C, but this time only roll back if the conditional expression evaluates to true.

The TGSI instructions SLT, SNE, SGE, SEQ will be used to test the conditional expression and the instruction CND will be used to roll back the values.

There will be two phases to this task. For phase 1, I will implement a simple translator that will be able to translate the branch instructions with only one pass through the TGSI code. This simple translator will copy all in-scope variables to a temporary location before executing the conditional statement, even if those variables will not be modified in either of the branches.

Phase 2 will add a preliminary pass before the code translation pass that will mark variables that might be modified by the conditional statement. Then, during the translation pass, only the variables that could potentially be modified inside either of the conditional branches will be copied before the conditional statement is executed.

2. Unroll loops for Gallium drivers:
The goal of this task will be to unroll loops so that they can be executed by hardware that does not support them. The loop unrolling will be done in the same "optimization" pass as the branch emulation. Loops where the number of iterations is known at compile time will be unrolled and may have additional optimizations applied. Loops that have an unknown number of iterations will have to be studied to see if there is a way to replace the loop with a set of instructions that produces the same output as the loop. For example, one solution might be to replace an ADD(src0, src0) instruction that is supposed to execute n times with a MUL(src0, n). It is possible that not all loops will be able to be unrolled successfully.

These first two tasks are important not only for older cards that do not support hardware branching, but newer cards as well. Driver developers will not need to use every hardware instruction to compile shaders with branches and loops, so they could use the branch emulation as a temporary solution while hardware support for branching and loops is being worked on.

3. Loops and Conditionals for R500 fragment and vertex shaders:
The goal of this task will be to make use of the R500 hardware support for branches and loops. New radeon_compiler opcodes (RC_OPCODE_*) will need to be added to represent loops, and the corresponding TGSI instructions will need to be converted into these new opcodes during the TGSI_OPCODE_* to RC_OPCODE_* phase. Once this has been done, the code generator for R500 vertex and fragment shaders will need to be modified to output the correct hardware instructions for loops.

4. More compiler optimizations / other GLSL features:
This is an optional task that will allow me to revisit the work from the previous tasks and explore doing some optimizations I may have wanted to do, but were outside the scope of those tasks. If there are no obvious optimizations to be done, this time could be spent implementing some other GLSL features for the R300 driver. Possible ideas include:

Adding support for the gl_FrontFacing variable.
Handling varying modifiers like perspective, flat, and centroid.
Improving the GLSL frontend to add support for more language features.

Schedule / Deliverables:
1. Enable branch emulation for Gallium drivers (4 weeks)
2. Unroll loops for Gallium drivers (2 - 3 weeks)
Midterm Evaluation
3. Loops and Conditionals for R500 fragment and vertex shaders (4 weeks)
4. More compiler optimizations / other GLSL features (2 weeks)

Tasks 1-3 will be required for this project.
Task 4 is optional.

Thank you.

-Tom Stellard |
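The branch emulation strategy in task 1 can be modelled in plain C as below. This is a sketch under assumptions: `slt` mirrors the TGSI SLT result (1.0 when the first operand is less than the second, otherwise 0.0), while `blend` is an arithmetic stand-in for the rollback step and does not claim to reproduce the exact CND operand order. All names are illustrative.

```c
#include <assert.h>

/* Models SLT: 1.0 if a < b, else 0.0. */
static float
slt(float a, float b)
{
   return (float)(a < b);
}

/* Stand-in for the rollback select: picks t when mask is 1.0, f when 0.0. */
static float
blend(float mask, float t, float f)
{
   return mask * t + (1.0f - mask) * f;
}

/* Original shader logic: if (x < limit) y = y * 2; else y = y + 1; */
static float
with_branch(float x, float limit, float y)
{
   if (x < limit)
      return y * 2.0f;
   return y + 1.0f;
}

/* The same logic with both sides executed and no control flow. */
static float
emulated(float x, float limit, float y)
{
   float saved = y;                  /* A: copy the in-scope value     */
   float then_val = saved * 2.0f;    /* B: run the "if true" side      */
   float else_val = saved + 1.0f;    /* D: run the "if false" side     */
   float mask = slt(x, limit);       /* C: test the condition          */

   return blend(mask, then_val, else_val);
}
```

Here the two rollbacks of steps C and D are folded into a single select, which gives the same result for this small case. The cost is that both sides of the conditional always execute, which is why the proposal's phase 2 tries to limit how many variables are copied.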
From: Dave A. <ai...@gm...> - 2010-04-03 07:11:16
|
The piglit read-front.c test is failing and the rabbit warren that is front buffer rendering in mesa st + dri st isn't helping me solve it.

One thing I noticed was check_create_front_buffers is called in a number of places in the st, however it seems to never be used, as we call st_manager_add_color_renderbuffer moments before and that sets up the buffer.

so
if (fb->Attachment[frontIndex].Renderbuffer == NULL) {
this always fails and we never do any of that stuff.

Maybe someone has a clue on how this is meant to work and I can implement that.

Dave. |
From: Marek O. <ma...@gm...> - 2010-04-03 06:23:13
|
There's something fishy in u_upload_mgr, could you please review the first two patches here? http://cgit.freedesktop.org/~mareko/mesa/log/?h=gallium-resources With this, r300g works again. -Marek On Fri, Apr 2, 2010 at 4:17 PM, Roland Scheidegger <sr...@vm...> wrote: > I'm planning on merging the gallium-resources branch shortly (after > easter). Due to the amount of code changed, it wouldn't be unexpected if > some drivers break here and there. So it would be nice if the respective > driver authors could take a look at that branch now. > > If you've missed the discussion about this branch and what this is > about, here it is: > > http://www.mail-archive.com/mes...@li.../msg12726.html > > I've also removed the video interfaces completely, as they weren't > ported to the interface changes and actually some of the video code > missed some earlier interface changes so didn't build anyway. Video > related work should be done on pipe-video branch which had newer stuff > (for video) already. > > > Roland |
From: Dave A. <ai...@gm...> - 2010-04-03 03:43:12
|
I was just going down the r300g piglit failures and noticed that fbo-drawbuffers failed. I've no idea if this passes on Intel hw, but it appears the texenvprogram really needs to understand the draw buffers.

The attached patch fixes it here for me on r300g. Anyone want to test this on Intel with the piglit test before/after?

Dave. |
From: Luca B. <lu...@lu...> - 2010-04-03 00:48:15
|
The s3tc-teximage test seems fixed by the two-line change I put in gallium-util-format-is-supported.

s3tc-texsubimage prints:
Mesa: User error: GL_INVALID_VALUE in glTexSubImage2D(xoffset+width)
Probe at (285,12)
Expected: 1.000000 0.000000 0.000000
Observed: 0.000000 0.000000 0.000000
which seems to be due to a Mesa or testcase bug.

As for u_format_test.c, it looks like it simply fails to account for DXTn being lossy. |
From: Jose F. <jfo...@vm...> - 2010-04-03 00:28:42
|
OK, I can relate to your reasoning. It's no biggie.

Jose
________________________________________
From: luc...@gm... [luc...@gm...] On Behalf Of Luca Barbieri [lu...@lu...]
Sent: Saturday, April 03, 2010 1:23
To: Jose Fonseca
Cc: mes...@li...
Subject: Re: [Mesa3d-dev] How do we init half float tables?

> One more thing: I'm maintaining the u_format* modules. I'm not speaking the just in the long term, but in the sense I'm actually working on this as we speak. Please do not make this kind of deep reaching changes to the u_format stuff in master without clearing them first with me.

Yes sorry, it was an attempt to fix breakage originally caused by code of mine that was sent out in a non-fully-mergeable state (to prevent duplicate work on half float conversion) and got merged anyway. Since master was already broken (due to u_gctors.cpp not being picked up by ld), it seemed a good idea to try to fix it. Unfortunately what seemed to be an easy fix gradually became something much more invasive than originally envisioned. After realizing the util_format_init thing wouldn't work out, I should have made these call util_format_s3tc_init again (was changed so they would init util_half as well) and then sent the util_format changes for review. I added a gallium-util-format-is-supported branch to hold the work and the fix I just sent. Sorry for not doing that in the first place. |
From: Jose F. <jfo...@vm...> - 2010-04-03 00:26:11
|
Probably the problems are just as you describe. But I'll be offline soon so I'll only review this and all your other changes carefully another day.

Jose
________________________________________
From: luc...@gm... [luc...@gm...] On Behalf Of Luca Barbieri [lu...@lu...]
Sent: Saturday, April 03, 2010 1:08
To: Jose Fonseca
Cc: Brian Paul; mes...@li...
Subject: Re: [Mesa3d-dev] How do we init half float tables?

Sorry for the regression. This whole thing was done to fix the u_gctors.cpp issue, originally done by me, sent out without full testing since I saw duplicate work being done, and then merged by Roland if I recall correctly. I probably should not have fixed s3tc/util_format like it was done for u_half and instead put it in a branch and sent it to the ML first. Note that everything that reads pixels and does not call util_format_s3tc_init (e.g. I think rbug tools) needs something like this, or an explicit call which is likely to be forgotten (even finding out everything that ends up calling util_format is nontrivial). Anyway, this patch fixes a couple of bugs that may have caused the regression. How can I reproduce it locally? The DXTn unit tests do fail, but the values have usually a difference of 1, so I assume it's an approximation error.

commit 80214ef6265d406496dc4fd3c76d8ac782cd012b
Author: Luca Barbieri <lu...@lu...>
Date: Sat Apr 3 01:55:27 2010 +0200

    gallium/util: fix inverted if is_nop logic in s3tc

diff --git a/src/gallium/auxiliary/util/u_format_s3tc.c b/src/gallium/auxiliary/util/u_format_s3tc.c
index d48551f..7808210 100644
--- a/src/gallium/auxiliary/util/u_format_s3tc.c
+++ b/src/gallium/auxiliary/util/u_format_s3tc.c
@@ -303,7 +303,7 @@ util_format_dxt3_rgba_unpack_8unorm(uint8_t *dst_row, unsigned dst_stride, const
 void
 util_format_dxt5_rgba_unpack_8unorm(uint8_t *dst_row, unsigned dst_stride, const uint8_t *src_row, unsigned src_stride, unsigned width, unsigned height)
 {
-   if (is_nop(util_format_dxt5_rgba_fetch)) {
+   if (!is_nop(util_format_dxt5_rgba_fetch)) {
       unsigned x, y, i, j;
       for(y = 0; y < height; y += 4) {
          const uint8_t *src = src_row;
@@ -324,7 +324,7 @@ util_format_dxt5_rgba_unpack_8unorm(uint8_t *dst_row, unsigned dst_stride, const
 void
 util_format_dxt1_rgb_unpack_float(float *dst_row, unsigned dst_stride, const uint8_t *src_row, unsigned src_stride, unsigned width, unsigned height)
 {
-   if (is_nop(util_format_dxt1_rgb_fetch)) {
+   if (!is_nop(util_format_dxt1_rgb_fetch)) {
       unsigned x, y, i, j;
       for(y = 0; y < height; y += 4) {
          const uint8_t *src = src_row; |
From: Luca B. <lu...@lu...> - 2010-04-03 00:23:31
|
> One more thing: I'm maintaining the u_format* modules. I'm not speaking the just in the long term, but in the sense I'm actually working on this as we speak. Please do not make this kind of deep reaching changes to the u_format stuff in master without clearing them first with me.

Yes sorry, it was an attempt to fix breakage originally caused by code of mine that was sent out in a non-fully-mergeable state (to prevent duplicate work on half float conversion) and got merged anyway.

Since master was already broken (due to u_gctors.cpp not being picked up by ld), it seemed a good idea to try to fix it. Unfortunately what seemed to be an easy fix gradually became something much more invasive than originally envisioned.

After realizing the util_format_init thing wouldn't work out, I should have made these call util_format_s3tc_init again (was changed so they would init util_half as well) and then sent the util_format changes for review.

I added a gallium-util-format-is-supported branch to hold the work and the fix I just sent. Sorry for not doing that in the first place. |
From: Jose F. <jfo...@vm...> - 2010-04-03 00:23:08
|
Both ways are useful: single pixel decompression for texture sampling, whole block for whole image conversions. Jose ________________________________________ From: Roland Scheidegger [sr...@vm...] Sent: Friday, April 02, 2010 17:27 To: Luca Barbieri Cc: Jose Fonseca; mes...@li... Subject: Re: [Mesa3d-dev] How do we init half float tables? On 02.04.2010 17:09, Luca Barbieri wrote: > Additionally, the S3TC library may now support only a subset of the > formats. This may be even more useful as further compressed formats > are added. FWIW, I don't see any new s3tc formats. rgtc will not be handled by s3tc library since it isn't patent encumbered. util_format_is_s3tc will not include rgtc formats. (Though I guess that external decoding per-pixel is really rather lame, should do it per-block...) Roland |
From: Jose F. <jfo...@vm...> - 2010-04-03 00:12:43
|
u_format_test started failing, and it was not failing a day ago. Vinson reported that some texture compression tests, which had just started working with my recent changes, have started failing again. I'm not sure if it's the constructor mechanism, my platform (64bit), or some bug in the code.

I just reverted all your recent util format changes. Not all look bad but I just don't have the time to separate the baby from the bathwater. Sorry. I'll cherry pick some of them after I have more time to review and test them.

One more thing: I'm maintaining the u_format* modules. I'm not speaking just of the long term, but in the sense that I'm actually working on this as we speak. Please do not make this kind of deep-reaching change to the u_format stuff in master without clearing it with me first. Either:
- send me an email and get my buy-in before implementing
- send a patch of the implementation changes so that I can review
- implement in a feature branch
- or, if you think I'm unreasonable, just make a fork of the whole thing and do whatever you like without breaking the existing code that relies on it.

The master branch should be broken as little as possible, as there is a lot of automated/manual testing going on that depends upon it. And going over and modifying code I just committed hinders my progress.

Jose
________________________________________
From: luc...@gm... [luc...@gm...] On Behalf Of Luca Barbieri [lu...@lu...]
Sent: Saturday, April 03, 2010 0:50
To: Jose Fonseca
Cc: Brian Paul; mes...@li...
Subject: Re: [Mesa3d-dev] How do we init half float tables?

What are you seeing a regression on? texcompress and texcompsub seemed to work for me: I'll try to test something else and recheck the code. |
From: Luca B. <lu...@lu...> - 2010-04-03 00:08:47
|
Sorry for the regression.

This whole thing was done to fix the u_gctors.cpp issue, originally done by me, sent out without full testing since I saw duplicate work being done, and then merged by Roland if I recall correctly. I probably should not have fixed s3tc/util_format like it was done for u_half and instead put it in a branch and sent it to the ML first.

Note that everything that reads pixels and does not call util_format_s3tc_init (e.g. I think rbug tools) needs something like this, or an explicit call which is likely to be forgotten (even finding out everything that ends up calling util_format is nontrivial).

Anyway, this patch fixes a couple of bugs that may have caused the regression. How can I reproduce it locally?

The DXTn unit tests do fail, but the values usually differ by 1, so I assume it's an approximation error.

commit 80214ef6265d406496dc4fd3c76d8ac782cd012b
Author: Luca Barbieri <lu...@lu...>
Date: Sat Apr 3 01:55:27 2010 +0200

    gallium/util: fix inverted if is_nop logic in s3tc

diff --git a/src/gallium/auxiliary/util/u_format_s3tc.c b/src/gallium/auxiliary/util/u_format_s3tc.c
index d48551f..7808210 100644
--- a/src/gallium/auxiliary/util/u_format_s3tc.c
+++ b/src/gallium/auxiliary/util/u_format_s3tc.c
@@ -303,7 +303,7 @@ util_format_dxt3_rgba_unpack_8unorm(uint8_t *dst_row, unsigned dst_stride, const
 void
 util_format_dxt5_rgba_unpack_8unorm(uint8_t *dst_row, unsigned dst_stride, const uint8_t *src_row, unsigned src_stride, unsigned width, unsigned height)
 {
-   if (is_nop(util_format_dxt5_rgba_fetch)) {
+   if (!is_nop(util_format_dxt5_rgba_fetch)) {
       unsigned x, y, i, j;
       for(y = 0; y < height; y += 4) {
          const uint8_t *src = src_row;
@@ -324,7 +324,7 @@ util_format_dxt5_rgba_unpack_8unorm(uint8_t *dst_row, unsigned dst_stride, const
 void
 util_format_dxt1_rgb_unpack_float(float *dst_row, unsigned dst_stride, const uint8_t *src_row, unsigned src_stride, unsigned width, unsigned height)
 {
-   if (is_nop(util_format_dxt1_rgb_fetch)) {
+   if (!is_nop(util_format_dxt1_rgb_fetch)) {
       unsigned x, y, i, j;
       for(y = 0; y < height; y += 4) {
          const uint8_t *src = src_row; |
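The effect of the inverted check in the patch above can be pictured with a small model. Everything here is hypothetical: the real `is_nop` and fetch hooks in u_format_s3tc.c are not shown in this thread, so `fetch_stub`, `fetch_real`, `is_stub` and `unpack` are stand-ins that only illustrate the idea of doing work when a real decoder is installed and skipping it when only a stub is present.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef void (*fetch_func)(const uint8_t *src, uint8_t *dst);

/* Stub assumed to be installed when no external decoder was loaded. */
static void
fetch_stub(const uint8_t *src, uint8_t *dst)
{
   (void)src;
   (void)dst;
}

/* Stand-in for a real decoder: copies one byte. */
static void
fetch_real(const uint8_t *src, uint8_t *dst)
{
   dst[0] = src[0];
}

/* Hypothetical equivalent of the is_nop() test in the patch. */
static bool
is_stub(fetch_func f)
{
   return f == fetch_stub;
}

/* Decode n bytes only when a real fetch function is available. */
static bool
unpack(fetch_func fetch, const uint8_t *src, uint8_t *dst, unsigned n)
{
   unsigned i;

   if (is_stub(fetch))
      return false;          /* nothing to decode with, leave dst alone */

   for (i = 0; i < n; i++)
      fetch(src + i, dst + i);
   return true;
}
```

With the condition the wrong way round, the loop would run only for the stub and be skipped for the real decoder, which would leave the destination untouched in exactly the case where decoding is possible.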