From: Luca B. <lu...@lu...> - 2010-04-13 10:56:00
|
This patch series is intended to resolve the issue of semantic-based shader linkage in Gallium. It can also be found in the RFC-gallium-semantics branch.

It does not change the current Gallium design, but rather formalizes some limitations on it, and provides infrastructure to implement this model more easily in drivers, along with a full nv30/nv40 implementation. These limitations are added to allow an efficient implementation both on hardware lacking special support and on hardware that has support but also has special constraints.

Note that this does NOT resolve all issues, and quite a bit is left to future refinement. In particular, the following issues are still open:
1. COLOR clamping (and floating point framebuffers)
2. A linkage table CSO allowing non-identity linkage to be specified
3. BCOLOR/FACE-related issues
4. Adding a cap to inform the state tracker that more than 219 generic indices are provided

This topic was already very extensively discussed.

See http://www.mail-archive.com/mes...@li.../msg10865.html for some early inconclusive discussion around an early implementation that modified the GLSL linker (which is NOT being proposed here).

See http://www.mail-archive.com/mes...@li.../msg12016.html for some more discussion that seemed to mostly reach a consensus on the approach proposed here. See in particular http://www.mail-archive.com/mes...@li.../msg12041.html .

That said, I'm going to try to repeat all the information here, partially by copy&pasting from earlier messages. This message should probably be adapted into gallium/docs if/when this is accepted.

Here is the short summary; the long rationale follows after it.

The proposal here is to add the following limitations to Gallium, for the intermediate semantics:
1. TGSI_SEMANTIC_NORMAL is removed, using a commit by Michal Krol that was never merged
2. Every semantic except GENERIC, COLOR and BCOLOR can only be used with semantic index 0
3. COLOR and BCOLOR can only be used with semantic index 0-1 (note that this doesn't apply to fragment outputs)
4. GENERIC can be used with semantic indices 0-218 on any driver, if BCOLOR is not used
5. GENERIC can be used with semantic indices 0-216 on any driver, if BCOLOR IS used
6. GENERIC can be used with semantic indices 0-255 on almost all drivers (those that don't need the 0-218 limitation)
7. Some drivers may also choose to support GENERIC with arbitrary indices, but that should generally not happen

The reason for this, in short, is that this maps directly to DirectX 9 SM3, which is the most problematic interface of all.

The peculiar problem we have here is that we have two competing constraints that force us into choosing the exact SM3 value:
1. The VMware SVGA driver must deal with an SM3 host interface and would ideally want to directly feed the Gallium semantics to the host
2. A hypothetical DirectX 9 state tracker needs to support SM3 and would ideally want to directly feed the SM3 semantics to Gallium

Note that this is not a reference to the VMware DirectX 9 state tracker, since its authors haven't provided details about its handling of shader semantics.

SM3 ends up supporting 219 generic indices: 16 indices in 14 classes, minus POSITION0, PSIZE0, COLOR0, COLOR1 and FOG0, which are the only ones that wouldn't be mapped to GENERIC.

However, Gallium drivers that, unlike svga and r600, don't benefit from having specific constraints are supposed to support 256 indices, and my nv30/nv40 work does that.

The expected implementation, if no hardware support exists, is to build a list of relocations to apply to either the fragment or the vertex shader, and patch one of them at validation time to match the other. Data structures are provided in gallium/auxiliary to ease this and to minimize the number of times this needs to be performed.

Let's now proceed to the discussion and detailed rationale, mostly constructed by copy&pasting older messages.
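Before the detailed rationale, here is a rough sketch of how the index limits in the summary could be checked. It is only an illustration, not code from the series: the enum, `guaranteed_generic_indices` and `semantic_index_is_portable` are hypothetical names, and the numeric values for POSITION, COLOR and BCOLOR are assumed rather than taken from the patches.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the TGSI_SEMANTIC_x names.
 * The values for POSITION, COLOR and BCOLOR are assumptions. */
enum sem {
   SEM_POSITION = 0,
   SEM_COLOR    = 1,
   SEM_BCOLOR   = 2,
   SEM_FOG      = 3,
   SEM_PSIZE    = 4,
   SEM_GENERIC  = 5
};

/* 14 SM3 usages * 16 indices, minus POSITION0, PSIZE0, COLOR0, COLOR1, FOG0 */
#define SM3_GENERIC_INDICES (14 * 16 - 5)

/* Number of GENERIC indices every driver is expected to accept (rules 4 and 5). */
static unsigned guaranteed_generic_indices(bool uses_bcolor)
{
   assert(SM3_GENERIC_INDICES == 219);
   /* BCOLOR0 and BCOLOR1 take two SM3 slots away when BCOLOR is used */
   return uses_bcolor ? SM3_GENERIC_INDICES - 2 : SM3_GENERIC_INDICES;
}

/* True if (name, index) is within the limits guaranteed on any driver. */
static bool semantic_index_is_portable(unsigned name, unsigned index, bool uses_bcolor)
{
   switch (name) {
   case SEM_GENERIC:
      return index < guaranteed_generic_indices(uses_bcolor);
   case SEM_COLOR:
   case SEM_BCOLOR:
      return index <= 1;   /* rule 3 */
   default:
      return index == 0;   /* rule 2 */
   }
}
```

A state tracker could run such a check before deciding whether it has to renumber its generics; drivers that support 256 or arbitrary indices (rules 6 and 7) would simply accept more than this.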
===============
Michal Krol's proposal
===============

First of all, see Michal Krol's proposal at http://www.opensource-archive.org/showthread.php?t=148573, and in particular:
<<
name         index range
----------------------------
POSITION     no limit?
COLOR        0..1, explicit clamp?
BCOLOR       0..1, explicit clamp?
FOG          remove?
PSIZE        0
GENERIC      0..<max generics>
NORMAL       remove
FACE         0
EDGEFLAG     0
PRIMID       0
INSTANCEID   0
>>

My proposal follows this, except for limiting POSITION to 0 too. Not sure why Michal thought "no limit" could make sense: the POSITION is fundamentally a singleton, since it is the input to the rasterizer unit.

======================
An overview of hardware support
======================

Hardware with no capabilities:
- nv30 does not support any mapping. However, we already need to patch fragment programs to insert constants, so we can patch input register numbers as well. The current driver only supports 0-7 generic indices, but I already implemented support for 0-255 indices with in-driver linkage and patching. Note that nv30 lacks control flow in fragment programs.
- nv40 is like nv30, but supports fp control flow, and may have some configurable mapping support, with unknown behavior

Hardware with capabilities that must be configured for each fp/vp pair:
- nv40 might have this, but the nVidia OpenGL driver does not use it
- nv50 has configurable vp->gp and gp->fp mappings with 64 entries. The current Gallium driver seems to support arbitrary 0-2^32 indices, but uses an inefficient O(n^2) algorithm to be able to do that
- r300 appears to have a configurable vp->fp mapping. The current driver only supports 0-15 generic indices, but redefining ATTR_GENERIC_COUNT could be enough to have it support larger numbers.

Hardware with automatic linkage when semantics match:
- VMWare svga appears to support 14 * 16 semantics, but the current driver only supports 0-15 generic indices. This could be fixed by mapping GENERIC into all non-special SM3 semantics.

Hardware that can do both configurable mappings and automatic linkage:
- r600 supports linkage in hardware between matching, apparently byte-sized, semantic ids

Other hardware:
- i915 has no hardware vertex shading. The current driver is broken and only supports 0-7 indices; this seems easy to fix though
- Not sure about i965

===================
An overview of software APIs
===================

1. DirectX 9 SM3 supports indices in the 0-15 range associated with semantics in the 0-13 range. A few of the name/index pairs have special meanings, but the others are just cosmetic as long as the fixed pipeline is not used. Thus, SM3 wants to use 14 * 16 indices overall. Of these, POSITION0, PSIZE0, COLOR0, COLOR1 and FOG0 map to non-GENERIC semantics, leaving 219 semantics handled by GENERIC

2. SM2 and non-GLSL OpenGL just want to use as many indices as the hardware interpolator count, sometimes limiting that further. These are the easiest and most straightforward cases.

3. DirectX 10 seems to only require a 0-31 range. In particular, the fxc.exe compiler allows arbitrary _strings_ and 32-bit indices to be specified. However, this information is encoded as metadata in the output file, and the shader bytecode itself uses integers in the 0-31 range to refer to the metadata. It seems that the metadata is resolved by the Microsoft DirectX 10 runtime, and the driver only sees 0-31 indices on the DDI interface. However, this is a bit unclear: confirmation or correction would be appreciated.

4. GLSL requires both shaders to be provided at link time, and thus does not constrain the implementation in any way. However, it may be possible to mix GLSL with other shaders, leading to the need to reserve the texcoord slots. In that case, GLSL will need about 8 more slots than the number of effectively used semantics. This is the case with the current Mesa/Gallium implementation

5. GLSL with EXT_separate_shader_objects does not add requirements because only gl_TexCoord and other builtin varyings are supported. User-defined varyings are not supported. See in particular the following text from the extension:
<<
It is undesirable from a performance standpoint to attempt to support "rendezvous by name" for arbitrary separate shaders because the separate shaders won't be naturally compiled to match their varying inputs and outputs of the same name without a special link step. Such a special link would introduce an extra validation overhead to binding separate shaders. The link itself would have to be deferred until glBegin time since separate shaders won't match when transitioning from one set of consistent shaders to another. This special link would still create errors or undefined behavior when the names of input and output varyings matched but their types did not match.
>>

6. A hypothetical version of EXT_separate_shader_objects extended to support user-defined varyings would either want arbitrary 32-bit generic indices (by interning strings to generate the indices) or the ability to specify a custom mapping between shader indices

7. A hypothetical "no-op" implementation of the GLSL linker would have the same requirement

====================
About non-GENERIC semantics
====================

Also note that non-GENERIC semantics have peculiar properties.

For COLOR and BCOLOR:
1. SM3 and OpenGL with glColorClamp appropriately set want it to _not_ be clamped to [0, 1]
2. SM2 and normal OpenGL apparently want it to be clamped to [0, 1] (sometimes for fixed point targets only) and may also allow using U8_UNORM precision for it instead of FP32
3. OpenGL allows two-sided lighting to be enabled, in which case COLOR in the fragment shader is automagically set to BCOLOR for back faces
4. Older hardware (e.g. nv30) tends to support BCOLOR but not FACING. Some hardware (e.g. nv40) supports both FACING and BCOLOR in hardware. The latest hardware probably supports FACING only.

Any API that requires special semantics for COLOR and BCOLOR (i.e. non-SM3) seems to only want 0-1 indices.

Note that SM3 does *not* include BCOLOR, so basically the limits for generic indices would need to be conditional on BCOLOR being present or not (e.g. if it is present, we must reserve two semantic slots in svga for it).

POSITION0 is obviously special. PSIZE0 is also special for points.

FOG0 seems right now to just be a GENERIC with a single component. Gallium could be extended to support fixed function fog, which most DX9 hardware supports (nv30/nv40 and r300). This is mostly orthogonal to the semantic issue.

==============
Current Gallium users
==============

Right now no open-source users of Gallium fundamentally require arbitrary indices. In particular:
1. GLSL and anything with similar link-by-name can of course be modified to use sequential indices
2. ARB fragment program and vertex program use index-limited texcoord slots
3. g3dvl needs and uses 8 texcoord slots, indices 0-7
4. vega and xorg use indices 0-1
5. DX10 seems to restrict semantics to a 0-N range, if I'm not mistaken
6. The GL_EXT_separate_shader_objects extension does not provide arbitrary index matching for GLSL, but merely lets it use a model similar to ARB fp/vp

However, the GLSL linker in its current form needs arbitrary indices, and the capability can be generally useful anyway.

===================
Discussion of possible options
===================

[Options from Keith Whitwell, see http://www.opensource-archive.org/showthread.php?p=180719]

a) Picking a lower number like 128, that an SM3 state tracker could usually be able to directly translate incoming semantics into, but which would force it to renumber under rare circumstances. This would make life easier for the open drivers at the expense of the closed code.

b) Picking 256 to make life easier for some closed-source SM3 state tracker, but harder for open drivers.

c) Picking 219 (or some other magic number) that happens to work with the current set of constraints, but makes gallium fragile in the face of new constraints.

d) Abandoning the current gallium linkage rules and coming up with something new, for instance forcing the state trackers to renumber always and making life trivial for the drivers...

[Options from me]

(e) Allow arbitrary 32-bit indices. This requires slightly more complicated data structures in some cases, and will require svga and r600 to fall back to software linkage if numbers are too high.

(f) Limit semantic indices to hardware interpolators _and_ introduce an interface to let the user specify an

Personally I think the simplest idea for now could be to have all drivers support 256 indices or, in the case of r600 and svga, the maximum value supported by the hardware, and expose that as a cap (as well as another cap for the number of different semantic values supported at once).

The minimum guaranteed value is set to the lowest hardware constraint, which would be svga with 219 indices (assuming no bcolor is used).

If some new constraints pop up, we just lower it and change SM3 state trackers to check for it and fall back otherwise.

This should just require simple fixes to svga and r300, and significant code for nv30/nv40, which is however already implemented.

Luca Barbieri (5):
  tgsi: formalize limits on semantic indices
  tgsi: add support for packing semantics in SM3 byte values
  gallium/auxiliary: add semantic linkage utility code
  nvfx: support proper shader linkage - adds glsl support
  nvfx: expose GLSL

Michal Krol (1):
  gallium: Remove TGSI_SEMANTIC_NORMAL.
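For hardware without any mapping support, the in-driver linkage described in the summary (keep a list of relocations per GENERIC input and patch the fragment program at validation time) could look roughly like the sketch below. The field position `INPUT_SRC_SHIFT`, the mask, `struct reloc_list` and `patch_input_slot` are all hypothetical names and values chosen for illustration; they are not the real nvfx register layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical position of the input-register field in an instruction word */
#define INPUT_SRC_SHIFT 13
#define INPUT_SRC_MASK  (0xfu << INPUT_SRC_SHIFT)

/* Offsets (in dwords) of every instruction word that reads one GENERIC input */
struct reloc_list {
   const unsigned *offsets;
   size_t count;
};

/* Rewrite the input-register field of each recorded instruction word so that
 * it reads the hardware slot the current vertex program writes to. */
static void patch_input_slot(uint32_t *insn, const struct reloc_list *relocs,
                             unsigned hw_slot)
{
   size_t i;

   assert(hw_slot <= 0xf);
   for (i = 0; i < relocs->count; ++i) {
      uint32_t dw = insn[relocs->offsets[i]];
      dw = (dw & ~INPUT_SRC_MASK) | ((uint32_t)hw_slot << INPUT_SRC_SHIFT);
      insn[relocs->offsets[i]] = dw;
   }
}
```

The point of recording offsets at compile time is that validation only touches the handful of words that read a relocated input, and only when the bound vertex program actually assigns a different slot than the one last patched in.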
From: Luca B. <lu...@lu...> - 2010-04-13 10:56:00
|
From: Michal Krol <mi...@vm...>

Use TGSI_SEMANTIC_GENERIC for this kind of stuff.
---
 src/gallium/auxiliary/tgsi/tgsi_dump.c         |    2 +-
 src/gallium/auxiliary/tgsi/tgsi_text.c         |    2 +-
 src/gallium/docs/source/tgsi.rst               |    6 ------
 src/gallium/drivers/svga/svga_tgsi_decl_sm30.c |    4 ----
 src/gallium/include/pipe/p_shader_tokens.h     |    2 +-
 5 files changed, 3 insertions(+), 13 deletions(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_dump.c b/src/gallium/auxiliary/tgsi/tgsi_dump.c
index 5703141..b6df249 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_dump.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_dump.c
@@ -120,7 +120,7 @@ static const char *semantic_names[] =
    "FOG",
    "PSIZE",
    "GENERIC",
-   "NORMAL",
+   "",
    "FACE",
    "EDGEFLAG",
    "PRIM_ID",
diff --git a/src/gallium/auxiliary/tgsi/tgsi_text.c b/src/gallium/auxiliary/tgsi/tgsi_text.c
index f918151..356eee0 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_text.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_text.c
@@ -933,7 +933,7 @@ static const char *semantic_names[TGSI_SEMANTIC_COUNT] =
    "FOG",
    "PSIZE",
    "GENERIC",
-   "NORMAL",
+   "",
    "FACE",
    "EDGEFLAG",
    "PRIM_ID",
diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst
index c292cd3..d5e0220 100644
--- a/src/gallium/docs/source/tgsi.rst
+++ b/src/gallium/docs/source/tgsi.rst
@@ -1397,12 +1397,6 @@ These attributes are called "generic" because they may be used for anything
 else, including parameters, texture generation information, or anything that
 can be stored inside a four-component vector.
 
-TGSI_SEMANTIC_NORMAL
-""""""""""""""""""""
-
-Vertex normal; could be used to implement per-pixel lighting for legacy APIs
-that allow mixing fixed-function and programmable stages.
-
 TGSI_SEMANTIC_FACE
 """"""""""""""""""
 
diff --git a/src/gallium/drivers/svga/svga_tgsi_decl_sm30.c b/src/gallium/drivers/svga/svga_tgsi_decl_sm30.c
index 73102a7..05d9102 100644
--- a/src/gallium/drivers/svga/svga_tgsi_decl_sm30.c
+++ b/src/gallium/drivers/svga/svga_tgsi_decl_sm30.c
@@ -61,10 +61,6 @@ static boolean translate_vs_ps_semantic( struct tgsi_declaration_semantic semant
       *idx = semantic.Index + 1; /* texcoord[0] is reserved for fog */
       *usage = SVGA3D_DECLUSAGE_TEXCOORD;
       break;
-   case TGSI_SEMANTIC_NORMAL:
-      *idx = semantic.Index;
-      *usage = SVGA3D_DECLUSAGE_NORMAL;
-      break;
    default:
       assert(0);
       *usage = SVGA3D_DECLUSAGE_TEXCOORD;
diff --git a/src/gallium/include/pipe/p_shader_tokens.h b/src/gallium/include/pipe/p_shader_tokens.h
index c5c480f..baff802 100644
--- a/src/gallium/include/pipe/p_shader_tokens.h
+++ b/src/gallium/include/pipe/p_shader_tokens.h
@@ -139,7 +139,7 @@ struct tgsi_declaration_dimension
 #define TGSI_SEMANTIC_FOG 3
 #define TGSI_SEMANTIC_PSIZE 4
 #define TGSI_SEMANTIC_GENERIC 5
-#define TGSI_SEMANTIC_NORMAL 6
+ /* gap */
 #define TGSI_SEMANTIC_FACE 7
 #define TGSI_SEMANTIC_EDGEFLAG 8
 #define TGSI_SEMANTIC_PRIMID 9
-- 
1.7.0.1.147.g6d84b
From: Luca B. <lu...@lu...> - 2010-04-13 10:56:01
|
Still no control flow support, but basic stuff works.
---
 src/gallium/drivers/nvfx/nvfx_screen.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/gallium/drivers/nvfx/nvfx_screen.c b/src/gallium/drivers/nvfx/nvfx_screen.c
index 6742759..b935fa9 100644
--- a/src/gallium/drivers/nvfx/nvfx_screen.c
+++ b/src/gallium/drivers/nvfx/nvfx_screen.c
@@ -42,7 +42,7 @@ nvfx_screen_get_param(struct pipe_screen *pscreen, int param)
 	case PIPE_CAP_TWO_SIDED_STENCIL:
 		return 1;
 	case PIPE_CAP_GLSL:
-		return 0;
+		return 1;
 	case PIPE_CAP_ANISOTROPIC_FILTER:
 		return 1;
 	case PIPE_CAP_POINT_SPRITE:
-- 
1.7.0.1.147.g6d84b
From: Luca B. <lu...@lu...> - 2010-04-13 10:56:03
|
---
 src/gallium/include/pipe/p_shader_tokens.h |   18 ++++++++++++++++++
 1 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/src/gallium/include/pipe/p_shader_tokens.h b/src/gallium/include/pipe/p_shader_tokens.h
index baff802..5d511ba 100644
--- a/src/gallium/include/pipe/p_shader_tokens.h
+++ b/src/gallium/include/pipe/p_shader_tokens.h
@@ -146,6 +146,24 @@ struct tgsi_declaration_dimension
 #define TGSI_SEMANTIC_INSTANCEID 10
 #define TGSI_SEMANTIC_COUNT 11 /**< number of semantic values */
 
+/* 219 = (14 * 16 - 5)
+ * All SM3 semantics minus COLOR0, COLOR1, POSITION0, FOG0 and PSIZE0
+ * This value is accurately chosen so that Gallium semantic/indices may be converted
+ * losslessly from and to SM3 semantics.
+ *
+ * Note that if BCOLOR is used, then this value is actually 211 - #MAX_BCOLOR_INDEX_USED - 1
+ * (SM3 does not support BCOLOR, and uses FACE instead)
+ *
+ * In any card supports more, this will be handled later.
+ *
+ * However, drivers should support 256 generic indices if the mechanism
+ * they use is not intrinsically limited to a lower value.
+ */
+#define TGSI_SEMANTIC_GENERIC_INDICES 219
+
+#define TGSI_SEMANTIC_INDICES(sem) (((sem) == TGSI_SEMANTIC_GENERIC) ? TGSI_SEMANTIC_GENERIC_INDICES : \
+	((sem == TGSI_SEMANTIC_COLOR_INDICES || sem == TGSI_SEMANTIC_BCOLOR_INDICES) ? 2 : 1))
+
 struct tgsi_declaration_semantic
 {
    unsigned Name : 8; /**< one of TGSI_SEMANTIC_x */
-- 
1.7.0.1.147.g6d84b
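A note on the macro in this patch: as posted, TGSI_SEMANTIC_INDICES compares against TGSI_SEMANTIC_COLOR_INDICES and TGSI_SEMANTIC_BCOLOR_INDICES, which are not defined anywhere in this series, and it leaves `sem` unparenthesized in the second test. The following is only a sketch of what was presumably intended, assuming the comparison was meant to be against TGSI_SEMANTIC_COLOR and TGSI_SEMANTIC_BCOLOR. Their values (1 and 2) are assumed here since the hunk does not show them, and `check_semantic_limits` is a hypothetical helper added for illustration.

```c
#include <assert.h>

#define TGSI_SEMANTIC_COLOR   1   /* assumed value, not shown in the hunk */
#define TGSI_SEMANTIC_BCOLOR  2   /* assumed value, not shown in the hunk */
#define TGSI_SEMANTIC_FOG     3
#define TGSI_SEMANTIC_PSIZE   4
#define TGSI_SEMANTIC_GENERIC 5

/* 219 = 14 * 16 - 5, see the comment in the patch */
#define TGSI_SEMANTIC_GENERIC_INDICES 219

/* Number of valid indices for a semantic name: 219 for GENERIC,
 * 2 for COLOR and BCOLOR, 1 for everything else. */
#define TGSI_SEMANTIC_INDICES(sem) \
   (((sem) == TGSI_SEMANTIC_GENERIC) ? TGSI_SEMANTIC_GENERIC_INDICES : \
    (((sem) == TGSI_SEMANTIC_COLOR || (sem) == TGSI_SEMANTIC_BCOLOR) ? 2 : 1))

/* Hypothetical usage: sanity-check the limits from the cover letter. */
static void check_semantic_limits(void)
{
   assert(TGSI_SEMANTIC_INDICES(TGSI_SEMANTIC_GENERIC) == 14 * 16 - 5);
   assert(TGSI_SEMANTIC_INDICES(TGSI_SEMANTIC_COLOR) == 2);
   assert(TGSI_SEMANTIC_INDICES(TGSI_SEMANTIC_PSIZE) == 1);
}
```

With the extra parentheses the macro also behaves when passed an expression rather than a plain identifier.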
From: Luca B. <lu...@lu...> - 2010-04-13 10:56:04
|
--- src/gallium/auxiliary/util/u_semantics.h | 123 ++++++++++++++++++++++++++++++ 1 files changed, 123 insertions(+), 0 deletions(-) create mode 100644 src/gallium/auxiliary/util/u_semantics.h diff --git a/src/gallium/auxiliary/util/u_semantics.h b/src/gallium/auxiliary/util/u_semantics.h new file mode 100644 index 0000000..d620619 --- /dev/null +++ b/src/gallium/auxiliary/util/u_semantics.h @@ -0,0 +1,123 @@ +#ifndef U_SEMANTICS_H_ +#define U_SEMANTICS_H_ + +#include "pipe/p_compiler.h" +#include "pipe/p_shader_tokens.h" + +/* same as SM3 values */ +#define TGSI_SEMANTIC_BYTE_POSITION 0 +#define TGSI_SEMANTIC_BYTE_PSIZE (4 << 4) +#define TGSI_SEMANTIC_BYTE_COLOR0 (10 << 4) +#define TGSI_SEMANTIC_BYTE_COLOR1 (TGSI_SEMANTIC_BYTE_COLOR0 + 1) +#define TGSI_SEMANTIC_BYTE_FOG (11 << 4) +#define TGSI_SEMANTIC_BYTE_BCOLOR0 (14 << 4) +#define TGSI_SEMANTIC_BYTE_BCOLOR1 (TGSI_SEMANTIC_BYTE_BCOLOR0 + 1) +#define TGSI_SEMANTIC_BYTE_TGSI (15 << 4) + +static INLINE unsigned char +pipe_semantic_to_byte(unsigned name, unsigned index) +{ + switch (name) + { + case TGSI_SEMANTIC_POSITION: + return TGSI_SEMANTIC_BYTE_POSITION; + case TGSI_SEMANTIC_PSIZE: + return TGSI_SEMANTIC_BYTE_PSIZE; + case TGSI_SEMANTIC_FOG: + return TGSI_SEMANTIC_BYTE_FOG; + case TGSI_SEMANTIC_COLOR: + return TGSI_SEMANTIC_BYTE_COLOR0 + index; + case TGSI_SEMANTIC_GENERIC: + ++index; + if(index >= TGSI_SEMANTIC_BYTE_PSIZE) + { + ++index; + if(index >= TGSI_SEMANTIC_BYTE_COLOR0) + { + index += 2; + if(index >= TGSI_SEMANTIC_BYTE_FOG) + ++index; + } + } + return index; + case TGSI_SEMANTIC_BCOLOR: + return TGSI_SEMANTIC_BYTE_BCOLOR0 + index; + default: + return TGSI_SEMANTIC_BYTE_TGSI + name; + } +} + +/* this fits BCOLOR in the SM3 range, but is not reversible */ +static INLINE unsigned char +pipe_semantic_to_byte_sm3(unsigned name, unsigned index) +{ + if(name == TGSI_SEMANTIC_BCOLOR) + return TGSI_SEMANTIC_BYTE_BCOLOR0 - 1 - index; + return pipe_semantic_to_byte(name, index); +} + +static INLINE unsigned 
+pipe_semantic_name_from_byte(unsigned char value) +{ + switch (value) + { + case TGSI_SEMANTIC_BYTE_POSITION: + return TGSI_SEMANTIC_POSITION; + case TGSI_SEMANTIC_BYTE_PSIZE: + return TGSI_SEMANTIC_PSIZE; + case TGSI_SEMANTIC_BYTE_FOG: + return TGSI_SEMANTIC_FOG; + case TGSI_SEMANTIC_BYTE_COLOR0: + case TGSI_SEMANTIC_BYTE_COLOR1: + return TGSI_SEMANTIC_COLOR; + case TGSI_SEMANTIC_BYTE_BCOLOR0: + case TGSI_SEMANTIC_BYTE_BCOLOR1: + return TGSI_SEMANTIC_BCOLOR; + default: + if(value < TGSI_SEMANTIC_BYTE_TGSI) + return TGSI_SEMANTIC_GENERIC; + else + return value - TGSI_SEMANTIC_BYTE_TGSI; + } +} + +static INLINE unsigned +pipe_semantic_index_from_byte(unsigned char value) +{ + if(value == TGSI_SEMANTIC_BYTE_POSITION) + return 0; + + if(value <= TGSI_SEMANTIC_BYTE_PSIZE) + { + if(value < TGSI_SEMANTIC_BYTE_PSIZE) + return value - 1; + else + return 0; + } + + if(value < (TGSI_SEMANTIC_BYTE_COLOR0 + 2)) + { + if(value < TGSI_SEMANTIC_BYTE_COLOR0) + return value - 2; + else + return value - TGSI_SEMANTIC_BYTE_COLOR0; + } + + if(value <= TGSI_SEMANTIC_BYTE_FOG) + { + if(value < TGSI_SEMANTIC_BYTE_FOG) + return value - 4; + else + return 0; + } + + if(value < TGSI_SEMANTIC_BYTE_BCOLOR0) + return value - 5; + + if(value == (TGSI_SEMANTIC_BYTE_BCOLOR1)) + return 1; + + return 0; +} + +#endif /* U_SEMANTICS_H_ */ -- 1.7.0.1.147.g6d84b |
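To make the GENERIC packing in u_semantics.h easier to follow, here is a standalone sketch of just that part: GENERIC indices 0-218 are spread over the byte values 1-223, stepping over the bytes reserved for POSITION0, PSIZE0, COLOR0, COLOR1 and FOG0. The function names and the `BYTE_*` macros are hypothetical shorthand rather than the names used in the patch, and the sequential `if` chain is meant to be equivalent to the nested one above.

```c
#include <assert.h>

/* SM3 byte values reserved for non-GENERIC semantics (same numbers as the patch) */
#define BYTE_POSITION 0
#define BYTE_PSIZE    (4 << 4)
#define BYTE_COLOR0   (10 << 4)
#define BYTE_COLOR1   (BYTE_COLOR0 + 1)
#define BYTE_FOG      (11 << 4)
#define BYTE_BCOLOR0  (14 << 4)

/* GENERIC index -> byte, stepping over each reserved value in turn */
static unsigned char generic_to_byte(unsigned index)
{
   assert(index < 219);
   ++index;                                 /* skip POSITION0 */
   if (index >= BYTE_PSIZE)  ++index;       /* skip PSIZE0 */
   if (index >= BYTE_COLOR0) index += 2;    /* skip COLOR0 and COLOR1 */
   if (index >= BYTE_FOG)    ++index;       /* skip FOG0 */
   return (unsigned char)index;
}

/* Inverse mapping; only meaningful for bytes produced by generic_to_byte */
static unsigned generic_from_byte(unsigned char value)
{
   if (value < BYTE_PSIZE)  return value - 1u;
   if (value < BYTE_COLOR0) return value - 2u;
   if (value < BYTE_FOG)    return value - 4u;
   return value - 5u;
}

static int byte_is_reserved(unsigned char value)
{
   return value == BYTE_POSITION || value == BYTE_PSIZE ||
          value == BYTE_COLOR0 || value == BYTE_COLOR1 || value == BYTE_FOG;
}

/* Returns 1 if every GENERIC index survives a round trip, lands on a
 * non-reserved byte and stays below the BCOLOR range. */
static int generic_packing_roundtrips(void)
{
   unsigned i;

   for (i = 0; i < 219; ++i) {
      unsigned char b = generic_to_byte(i);
      if (byte_is_reserved(b) || b >= BYTE_BCOLOR0 || generic_from_byte(b) != i)
         return 0;
   }
   return 1;
}
```

The largest GENERIC index, 218, lands on byte 223, which is the last value before BCOLOR0 at 224; that is where the 219 figure comes from.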
From: Luca B. <lu...@lu...> - 2010-04-13 10:56:06
|
--- src/gallium/auxiliary/Makefile | 1 + src/gallium/auxiliary/util/u_linkage.c | 119 ++++++++++++++++++++++++++++++++ src/gallium/auxiliary/util/u_linkage.h | 38 ++++++++++ 3 files changed, 158 insertions(+), 0 deletions(-) create mode 100644 src/gallium/auxiliary/util/u_linkage.c create mode 100644 src/gallium/auxiliary/util/u_linkage.h diff --git a/src/gallium/auxiliary/Makefile b/src/gallium/auxiliary/Makefile index c4d6b52..44c2f8b 100644 --- a/src/gallium/auxiliary/Makefile +++ b/src/gallium/auxiliary/Makefile @@ -120,6 +120,7 @@ C_SOURCES = \ util/u_hash.c \ util/u_keymap.c \ util/u_linear.c \ + util/u_linkage.c \ util/u_network.c \ util/u_math.c \ util/u_mm.c \ diff --git a/src/gallium/auxiliary/util/u_linkage.c b/src/gallium/auxiliary/util/u_linkage.c new file mode 100644 index 0000000..8a76378 --- /dev/null +++ b/src/gallium/auxiliary/util/u_linkage.c @@ -0,0 +1,119 @@ +#include "util/u_debug.h" +#include "pipe/p_shader_tokens.h" +#include "tgsi/tgsi_parse.h" +#include "tgsi/tgsi_scan.h" +#include "util/u_linkage.h" + +/* we must only record the registers that are actually used, not just declared */ +static INLINE boolean +util_semantic_set_test_and_set(struct util_semantic_set *set, unsigned value) +{ + unsigned mask = 1 << (value % (sizeof(long) * 8)); + unsigned long *p = &set->masks[value / (sizeof(long) * 8)]; + unsigned long v = *p & mask; + *p |= mask; + return !!v; +} + +unsigned +util_semantic_set_from_program_file(struct util_semantic_set *set, const struct tgsi_token *tokens, enum tgsi_file_type file) +{ + struct tgsi_shader_info info; + struct tgsi_parse_context parse; + unsigned count = 0; + ubyte *semantic_name; + ubyte *semantic_index; + + tgsi_scan_shader(tokens, &info); + + if(file == TGSI_FILE_INPUT) + { + semantic_name = info.input_semantic_name; + semantic_index = info.input_semantic_index; + } + else if(file == TGSI_FILE_OUTPUT) + { + semantic_name = info.output_semantic_name; + semantic_index = info.output_semantic_index; + } + else 
+ assert(0); + + tgsi_parse_init(&parse, tokens); + + memset(set->masks, 0, sizeof(set->masks)); + while(!tgsi_parse_end_of_tokens(&parse)) + { + tgsi_parse_token(&parse); + + if(parse.FullToken.Token.Type == TGSI_TOKEN_TYPE_INSTRUCTION) + { + const struct tgsi_full_instruction *finst = &parse.FullToken.FullInstruction; + unsigned i; + for(i = 0; i < finst->Instruction.NumDstRegs; ++i) + { + if(finst->Dst[i].Register.File == file) + { + unsigned idx = finst->Dst[i].Register.Index; + if(semantic_name[idx] == TGSI_SEMANTIC_GENERIC) + { + if(!util_semantic_set_test_and_set(set, semantic_index[idx])) + ++count; + } + } + } + + for(i = 0; i < finst->Instruction.NumSrcRegs; ++i) + { + if(finst->Src[i].Register.File == file) + { + unsigned idx = finst->Src[i].Register.Index; + if(semantic_name[idx] == TGSI_SEMANTIC_GENERIC) + { + if(!util_semantic_set_test_and_set(set, semantic_index[idx])) + ++count; + } + } + } + } + } + tgsi_parse_free(&parse); + + return count; +} + +#define UTIL_SEMANTIC_SET_FOR_EACH(i, set) for(i = 0; i < 256; ++i) if(set->masks[i / (sizeof(long) * 8)] & (1 << (i % (sizeof(long) * 8)))) + +void +util_semantic_layout_from_set(unsigned char *layout, const struct util_semantic_set *set, unsigned efficient_slots, unsigned num_slots) +{ + int first = -1; + int last = -1; + unsigned i; + + memset(layout, 0xff, num_slots); + + UTIL_SEMANTIC_SET_FOR_EACH(i, set) + { + if(first < 0) + first = i; + last = i; + } + + if(last < efficient_slots) + { + UTIL_SEMANTIC_SET_FOR_EACH(i, set) + layout[i] = i; + } + else if((last - first) < efficient_slots) + { + UTIL_SEMANTIC_SET_FOR_EACH(i, set) + layout[i - first] = i; + } + else + { + unsigned idx = 0; + UTIL_SEMANTIC_SET_FOR_EACH(i, set) + layout[idx++] = i; + } +} diff --git a/src/gallium/auxiliary/util/u_linkage.h b/src/gallium/auxiliary/util/u_linkage.h new file mode 100644 index 0000000..e73e0fd --- /dev/null +++ b/src/gallium/auxiliary/util/u_linkage.h @@ -0,0 +1,38 @@ +#ifndef U_LINKAGE_H_ +#define 
U_LINKAGE_H_ + +#include "pipe/p_compiler.h" + +struct util_semantic_set +{ + unsigned long masks[256 / 8 / sizeof(unsigned long)]; +}; + +static INLINE bool +util_semantic_set_contains(struct util_semantic_set *set, unsigned char value) +{ + return !!(set->masks[value / (sizeof(long) * 8)] & (1 << (value / (sizeof(long) * 8)))); +} + +unsigned util_semantic_set_from_program_file(struct util_semantic_set *set, const struct tgsi_token *tokens, enum tgsi_file_type file); + +/* efficient_slots is the number of slots such that hardware performance is + * the same for using that amount, with holes, or less slots but with less + * holes. + * + * num_slots is the size of the layout array and hardware limit instead. + * + * efficient_slots == 0 or efficient_solts == num_slots are typical settings. + */ +void util_semantic_layout_from_set(unsigned char *layout, const struct util_semantic_set *set, unsigned efficient_slots, unsigned num_slots); + +static INLINE void +util_semantic_table_from_layout(unsigned char *table, unsigned char *layout, unsigned char first_slot_value, unsigned char num_slots) +{ + memset(table, 0xff, sizeof(table)); + + for(int i = 0; i < num_slots; ++i) + table[layout[i]] = first_slot_value + i; +} + +#endif /* U_LINKAGE_H_ */ -- 1.7.0.1.147.g6d84b |
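A few details in the set and table helpers above look unintended: util_semantic_set_contains shifts by `value / (sizeof(long) * 8)` where the other helpers use `%`; the masks are built from an int `1`, which is too narrow once the shift exceeds 31 on platforms where long is 64 bits; and `sizeof(table)` in util_semantic_table_from_layout is the size of a pointer, not of the 256-entry table. Below is a minimal sketch of the same 256-entry set with those points addressed. All names are hypothetical stand-ins rather than the util_ functions themselves, and skipping 0xff layout entries is an assumption about how unused slots should be treated.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define SET_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for util_semantic_set: one bit per GENERIC index 0-255 */
struct semantic_set {
   unsigned long masks[256 / CHAR_BIT / sizeof(unsigned long)];
};

static void semantic_set_clear(struct semantic_set *set)
{
   memset(set->masks, 0, sizeof(set->masks));
}

static int semantic_set_contains(const struct semantic_set *set, unsigned char value)
{
   /* word index uses '/', bit index uses '%' */
   return (int)((set->masks[value / SET_BITS_PER_WORD] >> (value % SET_BITS_PER_WORD)) & 1UL);
}

/* Adds value to the set and returns whether it was already present */
static int semantic_set_test_and_set(struct semantic_set *set, unsigned char value)
{
   unsigned long mask = 1UL << (value % SET_BITS_PER_WORD);
   unsigned long *p = &set->masks[value / SET_BITS_PER_WORD];
   int was_set = (*p & mask) != 0;

   *p |= mask;
   return was_set;
}

/* Inverts a slot -> semantic layout into a semantic -> slot table.
 * The table size is passed implicitly as 256 because sizeof on a pointer
 * parameter would not give the array size. */
static void semantic_table_from_layout(unsigned char table[256],
                                       const unsigned char *layout,
                                       unsigned char first_slot_value,
                                       unsigned num_slots)
{
   unsigned i;

   memset(table, 0xff, 256);
   for (i = 0; i < num_slots; ++i) {
      if (layout[i] != 0xff)   /* assumption: 0xff marks an unused slot */
         table[layout[i]] = (unsigned char)(first_slot_value + i);
   }
}
```

One consequence of using 0xff as the "unused" marker is that GENERIC index 255 cannot be told apart from an empty slot in this sketch; a driver that really wants all 256 indices would need a different sentinel.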
From: Luca B. <lu...@lu...> - 2010-04-13 10:56:07
|
--- src/gallium/drivers/nvfx/nvfx_fragprog.c | 146 ++++++++++++++++++---------- src/gallium/drivers/nvfx/nvfx_shader.h | 1 + src/gallium/drivers/nvfx/nvfx_state.c | 4 + src/gallium/drivers/nvfx/nvfx_state.h | 15 +++ src/gallium/drivers/nvfx/nvfx_state_emit.c | 2 +- src/gallium/drivers/nvfx/nvfx_vertprog.c | 40 ++++++-- 6 files changed, 143 insertions(+), 65 deletions(-) diff --git a/src/gallium/drivers/nvfx/nvfx_fragprog.c b/src/gallium/drivers/nvfx/nvfx_fragprog.c index 5fa825a..b4b63e2 100644 --- a/src/gallium/drivers/nvfx/nvfx_fragprog.c +++ b/src/gallium/drivers/nvfx/nvfx_fragprog.c @@ -1,6 +1,7 @@ #include "pipe/p_context.h" #include "pipe/p_defines.h" #include "pipe/p_state.h" +#include "util/u_semantics.h" #include "util/u_inlines.h" #include "pipe/p_shader_tokens.h" @@ -16,8 +17,6 @@ struct nvfx_fpc { struct nvfx_fragment_program *fp; - uint attrib_map[PIPE_MAX_SHADER_INPUTS]; - unsigned r_temps; unsigned r_temps_discard; struct nvfx_sreg r_result[PIPE_MAX_SHADER_OUTPUTS]; @@ -36,6 +35,8 @@ struct nvfx_fpc { struct nvfx_sreg imm[MAX_IMM]; unsigned nr_imm; + + unsigned char sem_table[256]; /* semantic idx for each input semantic */ }; static INLINE struct nvfx_sreg @@ -111,6 +112,11 @@ emit_src(struct nvfx_fpc *fpc, int pos, struct nvfx_sreg src) sr |= (NVFX_FP_REG_TYPE_TEMP << NVFX_FP_REG_TYPE_SHIFT); sr |= (src.index << NVFX_FP_REG_SRC_SHIFT); break; + case NVFXSR_RELOCATED: + sr |= (NVFX_FP_REG_TYPE_INPUT << NVFX_FP_REG_TYPE_SHIFT); + printf("adding relocation at %x for %x\n", fpc->inst_offset, src.index); + util_dynarray_append(&fpc->fp->sem_relocs[src.index], unsigned, fpc->inst_offset); + break; case NVFXSR_CONST: if (!fpc->have_const) { grow_insns(fpc, 4); @@ -241,8 +247,28 @@ tgsi_src(struct nvfx_fpc *fpc, const struct tgsi_full_src_register *fsrc) switch (fsrc->Register.File) { case TGSI_FILE_INPUT: - src = nvfx_sr(NVFXSR_INPUT, - fpc->attrib_map[fsrc->Register.Index]); + if(fpc->fp->info.input_semantic_name[fsrc->Register.Index] == 
TGSI_SEMANTIC_POSITION) { + assert(fpc->fp->info.input_semantic_index[fsrc->Register.Index] == 0); + src = nvfx_sr(NVFXSR_INPUT, NVFX_FP_OP_INPUT_SRC_POSITION); + } else if(fpc->fp->info.input_semantic_name[fsrc->Register.Index] == TGSI_SEMANTIC_COLOR) { + if(fpc->fp->info.input_semantic_index[fsrc->Register.Index] == 0) + src = nvfx_sr(NVFXSR_INPUT, NVFX_FP_OP_INPUT_SRC_COL0); + else if(fpc->fp->info.input_semantic_index[fsrc->Register.Index] == 1) + src = nvfx_sr(NVFXSR_INPUT, NVFX_FP_OP_INPUT_SRC_COL1); + else + assert(0); + } else if(fpc->fp->info.input_semantic_name[fsrc->Register.Index] == TGSI_SEMANTIC_FOG) { + assert(fpc->fp->info.input_semantic_index[fsrc->Register.Index] == 0); + src = nvfx_sr(NVFXSR_INPUT, NVFX_FP_OP_INPUT_SRC_FOGC); + } else if(fpc->fp->info.input_semantic_name[fsrc->Register.Index] == TGSI_SEMANTIC_FACE) { + /* TODO: check this has the correct values */ + /* XXX: what do we do for nv30 here (assuming it lacks facing)?! */ + assert(fpc->fp->info.input_semantic_index[fsrc->Register.Index] == 0); + src = nvfx_sr(NVFXSR_INPUT, NV40_FP_OP_INPUT_SRC_FACING); + } else { + assert(fpc->fp->info.input_semantic_name[fsrc->Register.Index] == TGSI_SEMANTIC_GENERIC); + src = nvfx_sr(NVFXSR_RELOCATED, fpc->sem_table[fpc->fp->info.input_semantic_index[fsrc->Register.Index]]); + } break; case TGSI_FILE_CONSTANT: src = constant(fpc, fsrc->Register.Index, NULL); @@ -611,48 +637,6 @@ nvfx_fragprog_parse_instruction(struct nvfx_context* nvfx, struct nvfx_fpc *fpc, } static boolean -nvfx_fragprog_parse_decl_attrib(struct nvfx_context* nvfx, struct nvfx_fpc *fpc, - const struct tgsi_full_declaration *fdec) -{ - int hw; - - switch (fdec->Semantic.Name) { - case TGSI_SEMANTIC_POSITION: - hw = NVFX_FP_OP_INPUT_SRC_POSITION; - break; - case TGSI_SEMANTIC_COLOR: - if (fdec->Semantic.Index == 0) { - hw = NVFX_FP_OP_INPUT_SRC_COL0; - } else - if (fdec->Semantic.Index == 1) { - hw = NVFX_FP_OP_INPUT_SRC_COL1; - } else { - NOUVEAU_ERR("bad colour semantic index\n"); 
- return FALSE; - } - break; - case TGSI_SEMANTIC_FOG: - hw = NVFX_FP_OP_INPUT_SRC_FOGC; - break; - case TGSI_SEMANTIC_GENERIC: - if (fdec->Semantic.Index <= 7) { - hw = NVFX_FP_OP_INPUT_SRC_TC(fdec->Semantic. - Index); - } else { - NOUVEAU_ERR("bad generic semantic index\n"); - return FALSE; - } - break; - default: - NOUVEAU_ERR("bad input semantic\n"); - return FALSE; - } - - fpc->attrib_map[fdec->Range.First] = hw; - return TRUE; -} - -static boolean nvfx_fragprog_parse_decl_output(struct nvfx_context* nvfx, struct nvfx_fpc *fpc, const struct tgsi_full_declaration *fdec) { @@ -691,6 +675,15 @@ nvfx_fragprog_prepare(struct nvfx_context* nvfx, struct nvfx_fpc *fpc) { struct tgsi_parse_context p; int high_temp = -1, i; + struct util_semantic_set set; + + fpc->fp->num_semantics = util_semantic_set_from_program_file(&set, fpc->fp->pipe.tokens, TGSI_FILE_INPUT); + if(fpc->fp->num_semantics > 8) + return FALSE; + util_semantic_layout_from_set(fpc->fp->semantics, &set, 0, 8); + util_semantic_table_from_layout(fpc->sem_table, fpc->fp->semantics, 0, 8); + + memset(fpc->fp->cur_slots, 0xff, sizeof(fpc->fp->cur_slots)); tgsi_parse_init(&p, fpc->fp->pipe.tokens); while (!tgsi_parse_end_of_tokens(&p)) { @@ -703,10 +696,6 @@ nvfx_fragprog_prepare(struct nvfx_context* nvfx, struct nvfx_fpc *fpc) const struct tgsi_full_declaration *fdec; fdec = &p.FullToken.FullDeclaration; switch (fdec->Declaration.File) { - case TGSI_FILE_INPUT: - if (!nvfx_fragprog_parse_decl_attrib(nvfx, fpc, fdec)) - goto out_err; - break; case TGSI_FILE_OUTPUT: if (!nvfx_fragprog_parse_decl_output(nvfx, fpc, fdec)) goto out_err; @@ -878,6 +867,31 @@ nvfx_fragprog_validate(struct nvfx_context *nvfx) if (nvfx->dirty & NVFX_NEW_FRAGCONST) update = TRUE; + struct nvfx_vertex_program* vp = nvfx->render_mode == HW ? 
nvfx->vertprog : nvfx->swtnl.vertprog; + if (fp->last_vp_id != vp->id) { + char* vp_sem_table = vp->sem_table; + unsigned char* fp_semantics = fp->semantics; + unsigned diff = 0; + fp->last_vp_id = nvfx->vertprog->id; + unsigned char* cur_slots = fp->cur_slots; + for(unsigned i = 0; i < fp->num_semantics; ++i) { + unsigned char slot_mask = vp_sem_table[fp_semantics[i]]; + diff |= (slot_mask >> 4) & (slot_mask ^ cur_slots[i]); + } + + if(diff) + { + fp->cur_slots_progs_left = fp->progs; + for(unsigned i = 0; i < fp->num_semantics; ++i) { + /* if 0xff, then this will write to the dummy value at fp->last_layout_mask[0] */ + fp->cur_slots[i] = vp_sem_table[fp_semantics[i]] & 0xf; + printf("fp: GENERIC[%i] from fpreg %i\n", fp_semantics[i], fp->cur_slots[i]); + } + + update = TRUE; + } + } + if(update) { ++fp->bo_prog_idx; if(fp->bo_prog_idx >= fp->progs_per_bo) @@ -888,7 +902,9 @@ nvfx_fragprog_validate(struct nvfx_context *nvfx) } else { - struct nvfx_fragment_program_bo* fpbo = os_malloc_aligned(sizeof(struct nvfx_fragment_program) + fp->prog_size * fp->progs_per_bo, 16); + struct nvfx_fragment_program_bo* fpbo = os_malloc_aligned(sizeof(struct nvfx_fragment_program) + (fp->prog_size + 8) * fp->progs_per_bo, 16); + fpbo->slots = &fpbo->insn[(fp->prog_size) * fp->progs_per_bo]; + memset(fpbo->slots, 0, 8 * fp->progs_per_bo); if(fp->fpbo) { fpbo->next = fp->fpbo->next; @@ -898,6 +914,8 @@ nvfx_fragprog_validate(struct nvfx_context *nvfx) fpbo->next = fpbo; fp->fpbo = fpbo; fpbo->bo = 0; + fp->progs += fp->progs_per_bo; + fp->cur_slots_progs_left += fp->progs_per_bo; nouveau_bo_new(nvfx->screen->base.device, NOUVEAU_BO_VRAM | NOUVEAU_BO_MAP, 64, fp->prog_size * fp->progs_per_bo, &fpbo->bo); nouveau_bo_map(fpbo->bo, NOUVEAU_BO_NOSYNC); @@ -915,6 +933,7 @@ nvfx_fragprog_validate(struct nvfx_context *nvfx) } int offset = fp->bo_prog_idx * fp->prog_size; + uint32_t* fpmap = (uint32_t*)((char*)fp->fpbo->bo->map + offset); if(nvfx->constbuf[PIPE_SHADER_FRAGMENT]) { struct 
pipe_resource* constbuf = nvfx->constbuf[PIPE_SHADER_FRAGMENT]; @@ -922,7 +941,6 @@ nvfx_fragprog_validate(struct nvfx_context *nvfx) struct pipe_transfer* transfer; // TODO: does this check make any sense, or should we do this unconditionally? uint32_t* map = pipe_buffer_map(&nvfx->pipe, constbuf, PIPE_TRANSFER_READ, &transfer); - uint32_t* fpmap = (uint32_t*)((char*)fp->fpbo->bo->map + offset); uint32_t* buf = (uint32_t*)((char*)fp->fpbo->insn + offset); for (i = 0; i < fp->nr_consts; ++i) { unsigned off = fp->consts[i].offset; @@ -936,6 +954,25 @@ nvfx_fragprog_validate(struct nvfx_context *nvfx) } pipe_buffer_unmap(&nvfx->pipe, constbuf, transfer); } + + if(fp->cur_slots_progs_left) { + unsigned char* fpbo_slots = &fp->fpbo->slots[fp->bo_prog_idx * 8]; + for(unsigned i = 0; i < fp->num_semantics; ++i) { + unsigned value = fp->cur_slots[i];; + if(value != fpbo_slots[i]) { + unsigned* p = (unsigned*)fp->sem_relocs[i].data; + unsigned* pend = (unsigned*)((char*)fp->sem_relocs[i].data + fp->sem_relocs[i].size); + for(; p != pend; ++p) { + unsigned off = *p; + unsigned dw = fp->insn[off]; + dw = (dw & ~NVFX_FP_OP_INPUT_SRC_MASK) | (value << NVFX_FP_OP_INPUT_SRC_SHIFT); + nvfx_fp_memcpy(&fpmap[*p], &dw, sizeof(dw)); + } + fpbo_slots[i] = value; + } + } + --fp->cur_slots_progs_left; + } } if(update || (nvfx->dirty & NVFX_NEW_FRAGPROG)) { @@ -977,6 +1014,7 @@ void nvfx_fragprog_destroy(struct nvfx_context *nvfx, struct nvfx_fragment_program *fp) { + unsigned i; struct nvfx_fragment_program_bo* fpbo = fp->fpbo; if(fpbo) { @@ -991,7 +1029,9 @@ nvfx_fragprog_destroy(struct nvfx_context *nvfx, while(fpbo != fp->fpbo); } + for(i = 0; i < 8; ++i) + util_dynarray_fini(&fp->sem_relocs[i]); + if (fp->insn_len) FREE(fp->insn); } - diff --git a/src/gallium/drivers/nvfx/nvfx_shader.h b/src/gallium/drivers/nvfx/nvfx_shader.h index 50830b3..88cf91b 100644 --- a/src/gallium/drivers/nvfx/nvfx_shader.h +++ b/src/gallium/drivers/nvfx/nvfx_shader.h @@ -323,6 +323,7 @@ #define 
NVFXSR_INPUT 2 #define NVFXSR_TEMP 3 #define NVFXSR_CONST 4 +#define NVFXSR_RELOCATED 5 #define NVFX_COND_FL 0 #define NVFX_COND_LT 1 diff --git a/src/gallium/drivers/nvfx/nvfx_state.c b/src/gallium/drivers/nvfx/nvfx_state.c index 315de49..3f0c8e6 100644 --- a/src/gallium/drivers/nvfx/nvfx_state.c +++ b/src/gallium/drivers/nvfx/nvfx_state.c @@ -411,9 +411,13 @@ nvfx_vp_state_create(struct pipe_context *pipe, struct nvfx_context *nvfx = nvfx_context(pipe); struct nvfx_vertex_program *vp; + // TODO: use a 64-bit atomic here! + static unsigned long long id = 0; + vp = CALLOC(1, sizeof(struct nvfx_vertex_program)); vp->pipe.tokens = tgsi_dup_tokens(cso->tokens); vp->draw = draw_create_vertex_shader(nvfx->draw, &vp->pipe); + vp->id = ++id; return (void *)vp; } diff --git a/src/gallium/drivers/nvfx/nvfx_state.h b/src/gallium/drivers/nvfx/nvfx_state.h index 9ceb257..3cd7981 100644 --- a/src/gallium/drivers/nvfx/nvfx_state.h +++ b/src/gallium/drivers/nvfx/nvfx_state.h @@ -4,6 +4,8 @@ #include "pipe/p_state.h" #include "tgsi/tgsi_scan.h" #include "nouveau/nouveau_statebuf.h" +#include "util/u_dynarray.h" +#include "util/u_linkage.h" struct nvfx_vertex_program_exec { uint32_t data[4]; @@ -18,6 +20,7 @@ struct nvfx_vertex_program_data { struct nvfx_vertex_program { struct pipe_shader_state pipe; + unsigned long long id; struct draw_vertex_shader *draw; @@ -30,6 +33,8 @@ struct nvfx_vertex_program { struct nvfx_vertex_program_data *consts; unsigned nr_consts; + char sem_table[256]; + struct nouveau_resource *exec; unsigned exec_start; struct nouveau_resource *data; @@ -49,6 +54,7 @@ struct nvfx_fragment_program_data { struct nvfx_fragment_program_bo { struct nvfx_fragment_program_bo* next; struct nouveau_bo* bo; + unsigned char* slots; char insn[] __attribute__((aligned(16))); }; @@ -65,11 +71,20 @@ struct nvfx_fragment_program { struct nvfx_fragment_program_data *consts; unsigned nr_consts; + unsigned num_semantics; /* how many input semantics? 
*/ + unsigned char semantics[8]; /* semantics */ + unsigned char cur_slots[8]; /* current assignment of slots for each used semantic */ + unsigned cur_slots_progs_left; + unsigned long long last_vp_id; + struct util_dynarray sem_relocs[8]; /* semantic relocation offset */ + uint32_t fp_control; unsigned bo_prog_idx; unsigned prog_size; unsigned progs_per_bo; + unsigned progs; + struct nvfx_fragment_program_bo* fpbo; }; diff --git a/src/gallium/drivers/nvfx/nvfx_state_emit.c b/src/gallium/drivers/nvfx/nvfx_state_emit.c index 4137849..1398597 100644 --- a/src/gallium/drivers/nvfx/nvfx_state_emit.c +++ b/src/gallium/drivers/nvfx/nvfx_state_emit.c @@ -47,7 +47,7 @@ nvfx_state_validate_common(struct nvfx_context *nvfx) if(dirty & NVFX_NEW_STIPPLE) nvfx_state_stipple_validate(nvfx); - if(dirty & (NVFX_NEW_FRAGPROG | NVFX_NEW_FRAGCONST)) + if(dirty & (NVFX_NEW_FRAGPROG | NVFX_NEW_FRAGCONST | NVFX_NEW_VERTPROG)) nvfx_fragprog_validate(nvfx); if(dirty & NVFX_NEW_SAMPLER) diff --git a/src/gallium/drivers/nvfx/nvfx_vertprog.c b/src/gallium/drivers/nvfx/nvfx_vertprog.c index b405fd9..4241e73 100644 --- a/src/gallium/drivers/nvfx/nvfx_vertprog.c +++ b/src/gallium/drivers/nvfx/nvfx_vertprog.c @@ -1,7 +1,8 @@ #include "pipe/p_context.h" #include "pipe/p_defines.h" #include "pipe/p_state.h" -#include "util/u_inlines.h" +#include "util/u_semantics.h" +#include "util/u_linkage.h" #include "pipe/p_shader_tokens.h" #include "tgsi/tgsi_parse.h" @@ -60,7 +61,7 @@ temp(struct nvfx_vpc *vpc) return nvfx_sr(NVFXSR_TEMP, idx); } -static INLINE void +static inline void release_temps(struct nvfx_vpc *vpc) { vpc->r_temps &= ~vpc->r_temps_discard; @@ -332,7 +333,7 @@ nvfx_vp_arith(struct nvfx_context* nvfx, struct nvfx_vpc *vpc, int slot, int op, emit_src(nvfx, vpc, hw, 2, s2); } -static INLINE struct nvfx_sreg +static inline struct nvfx_sreg tgsi_src(struct nvfx_vpc *vpc, const struct tgsi_full_src_register *fsrc) { struct nvfx_sreg src; @@ -378,14 +379,14 @@ tgsi_dst(struct nvfx_vpc *vpc, 
const struct tgsi_full_dst_register *fdst) { dst = vpc->r_address[fdst->Register.Index]; break; default: - NOUVEAU_ERR("bad dst file\n"); + NOUVEAU_ERR("bad dst file %i\n", fdst->Register.File); break; } return dst; } -static INLINE int +static inline int tgsi_mask(uint tgsi) { int mask = 0; @@ -643,12 +644,8 @@ nvfx_vertprog_parse_decl_output(struct nvfx_context* nvfx, struct nvfx_vpc *vpc, hw = NVFX_VP(INST_DEST_PSZ); break; case TGSI_SEMANTIC_GENERIC: - if (fdec->Semantic.Index <= 7) { - hw = NVFX_VP(INST_DEST_TC(fdec->Semantic.Index)); - } else { - NOUVEAU_ERR("bad generic semantic index\n"); - return FALSE; - } + hw = (vpc->vp->sem_table[fdec->Semantic.Index] & 0xf) + + NVFX_VP(INST_DEST_TC(0)) - NVFX_FP_OP_INPUT_SRC_TC(0); break; case TGSI_SEMANTIC_EDGEFLAG: /* not really an error just a fallback */ @@ -668,6 +665,27 @@ nvfx_vertprog_prepare(struct nvfx_context* nvfx, struct nvfx_vpc *vpc) { struct tgsi_parse_context p; int high_temp = -1, high_addr = -1, nr_imm = 0, i; + struct util_semantic_set set; + unsigned char sem_layout[8]; + unsigned sem_layout_size; + unsigned num_outputs; + + num_outputs = util_semantic_set_from_program_file(&set, vpc->vp->pipe.tokens, TGSI_FILE_OUTPUT); + + if(num_outputs > 8) { + NOUVEAU_ERR("too many vertex program outputs: %i\n", num_outputs); + return FALSE; + } + util_semantic_layout_from_set(sem_layout, &set, 8, 8); + + /* hope 0xf is (0, 0, 0, 1) initialized; otherwise, we are _probably_ not required to do this */ + memset(vpc->vp->sem_table, 0x0f, sizeof(vpc->vp->sem_table)); + for(int i = 0; i < 8; ++i) { + if(sem_layout[i] == 0xff) + continue; + printf("vp: GENERIC[%i] to fpreg %i\n", sem_layout[i], NVFX_FP_OP_INPUT_SRC_TC(0) + i); + vpc->vp->sem_table[sem_layout[i]] = 0xf0 | (NVFX_FP_OP_INPUT_SRC_TC(0) + i); + } tgsi_parse_init(&p, vpc->vp->pipe.tokens); while (!tgsi_parse_end_of_tokens(&p)) { -- 1.7.0.1.147.g6d84b |
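For readers following the nvfx patch above, the linkage bookkeeping can be sketched in isolation. This is a minimal sketch under stated assumptions, not the patch's code: the names `build_vp_table`, `relink_fp`, `NUM_SLOTS`, `SLOT_UNUSED` and `HW_TC0` are hypothetical, and the value 4 for the first texcoord input register is only a placeholder. It mirrors the patch's encoding, where a vertex-side table entry is 0xf0 | register for a written GENERIC index and 0x0f otherwise, and where the fragment side re-patches only when a written input has moved.

```c
#include <stdint.h>
#include <string.h>

#define NUM_SLOTS   8     /* interpolator slots usable for GENERIC, as in the patch */
#define SLOT_UNUSED 0xff  /* marks an unassigned slot in the layout */
#define HW_TC0      4     /* hypothetical hardware input number of slot 0 */

/* Vertex side: map each GENERIC index to 0xf0 | hw register if the vertex
 * program writes it, or to 0x0f (high nibble clear) if it does not. */
static void build_vp_table(uint8_t table[256], const uint8_t layout[NUM_SLOTS])
{
    memset(table, 0x0f, 256);
    for (unsigned i = 0; i < NUM_SLOTS; ++i)
        if (layout[i] != SLOT_UNUSED)
            table[layout[i]] = (uint8_t)(0xf0 | (HW_TC0 + i));
}

/* Fragment side: returns 1 and updates cur_slots if any input that the
 * vertex program writes now lives in a different register, else 0. */
static int relink_fp(uint8_t *cur_slots, const uint8_t *fp_semantics,
                     unsigned count, const uint8_t vp_table[256])
{
    unsigned diff = 0;
    for (unsigned i = 0; i < count; ++i) {
        uint8_t entry = vp_table[fp_semantics[i]];
        /* The high nibble is 0xf only for written outputs, so inputs the
         * vertex program does not write never force a re-patch. */
        diff |= (entry >> 4) & (entry ^ cur_slots[i]);
    }
    if (!diff)
        return 0;
    for (unsigned i = 0; i < count; ++i)
        cur_slots[i] = vp_table[fp_semantics[i]] & 0xf;
    return 1;
}
```

A caller would start with `cur_slots` filled with 0xff, call `relink_fp` whenever the bound vertex program changes, and rewrite the input-register fields of the fragment program only when it returns 1.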
From: Keith W. <ke...@vm...> - 2010-04-13 11:15:40
|
On Tue, 2010-04-13 at 03:55 -0700, Luca Barbieri wrote:
> Personally I think the simplest idea for now could be to have all
> drivers support 256 indices or, in the case of r600 and svga, the
> maximum value supported by the hardware, and expose that as a cap (as
> well as another cap for the number of different semantic values
> supported at once).
> The minimum guaranteed value is set to the lowest hardware constraint,
> which would be svga with 219 indices (assuming no bcolor is used).
> If some new constraints pop up, we just lower it and change SM3 state
> trackers to check for it and fallback otherwise.

Luca,

Thanks for your patience and efforts in compiling this - I really appreciate the effort you've put into this and the persistence to keep coming back to it.

The patchset looks good to me at first reading, I'll dig in more deeply.

Keith
|
From: Christoph B. <e04...@st...> - 2010-12-13 21:02:53
|
I want to warm this up again, adding nvc0 and GL_ARB_separate_shader_objects to the picture.

The latter extends GL_EXT_separate_shader_objects to support user-defined varyings and guarantees well-defined behaviour only if
- varyings are declared inside the gl_PerVertex/gl_PerFragment block, and the blocks match exactly in name, type, qualification, and (most significantly) declaration order.
- varyings are assigned matching location qualifiers, like: layout(location = 3) in vec4 normal
"The number of input locations available to a shader is limited."

So, I propose to (loosely) identify GENERIC semantic indices with these location qualifiers and let the pipe driver set a limit on the allowed maximum (e.g. PIPE_SHADER_CAP_MAX_INPUTS), rather than demanding that every driver support at least 219 of them. For reference, nvc0 offers 0x200 bytes for generic inputs/outputs.

My motivation is mostly that the hardware routing table for shader varyings that was present on nv50 has been removed with nvc0 (Fermi). And I'm glad, because filling 4 routing tables (since we have 5 shader types now) is somewhat annoying. The same goes for applying relocations to shaders: it can be done, it's probably not too time consuming, but it's just plain *unnecessary* (and thus stupid) for OpenGL.

Now about d3d9 ...
1. don't care, I don't see a d3d9 state tracker
2. http://msdn.microsoft.com/en-us/library/bb509647%28v=VS.85%29.aspx says "n is an optional integer between 0 and the number of resources supported" - what "supported" means here isn't clear to me, but I didn't find any example where someone used something OpenGL doesn't have (like COLOR2).
3. http://msdn.microsoft.com/en-us/library/bb944006%28v=vs.85%29.aspx#Varying_Shader_Inputs_and_Semantics says "Input semantics are similar to the values in the D3DDECLUSAGE." and DECLUSAGE sounds like you're limited to sane values.
Not sure if anyone wants to think about this issue at this time (since implementation of ARB_separate_shader_objects is probably far in the GL4 future), but I'd be happy about any comments.

Regards,
Christoph

On 04/13/2010 12:55 PM, Luca Barbieri wrote:
> This patch series is intended to resolve the issue of semantic-based shader linkage in Gallium.
> It can also be found in the RFC-gallium-semantics branch.
>
> It does not change the current Gallium design, but rather formalizes some limitations to it, and provides infrastructure to implement this model more easily in drivers, along with a full nv30/nv40 implementation.
>
> These limitations are added to allow an efficient implementation for both hardware lacking special support and hardware having support but also special constraints.
>
> Note that this does NOT resolve all issues, and there are quite a bit left to future refinement.
>
> In particular, the following issues are still open:
> 1. COLOR clamping (and floating point framebuffers)
> 2. A linkage table CSO allowing to specify non-identity linkage
> 3. BCOLOR/FACE-related issues
> 4. Adding a cap to inform the state tracker that more than 219 generic indices are provided
>
> This topic was already very extensively discussed.
> See http://www.mail-archive.com/mes...@li.../msg10865.html for some early inconclusive discussion around an early implementation that modified the GLSL linker (which is NOT being proposed here)
> See http://www.mail-archive.com/mes...@li.../msg12016.html for some more discussion that seemed to mostly reach a consensus over the approach proposed here.
> See in particular http://www.mail-archive.com/mes...@li.../msg12041.html .
>
> That said, I'm going to try to repeat all information here, partially by copy&pasting from earlier messages.
> This message should probably be adapted into gallium/docs if/when this is accepted.
>
> Here is the short summary; the long rationale follows after it.
> > The proposal here is to add the following limitations to Gallium, for the intermediate semantics: > 1. TGSI_SEMANTIC_NORMAL is removed, using a commit by Michal Krol that was never merged > 2. Every semantic except GENERIC, COLOR and BCOLOR can only be used with semantic index 0 > 3. COLOR and BCOLOR can only be used with semantic index 0-1 (note that this doesn't apply to fragment outputs) > 4. GENERIC can be used with semantic indices 0-218 on any driver, if BCOLOR is not used > 5. GENERIC can be used with semantic indices 0-216 on any driver, if BCOLOR IS used > 6. GENERIC can be used with semantic indices 0-255 on almost all drivers (those that don't need the 0-218 limitation) > 7. Some drivers may also choose to support GENERIC with arbitrary indices, but that should generally not happen > > The reason of this, in short, is that this maps directly to DirectX 9 SM3, which is the most problematic interface of all. > > The peculiar problem we have here is that we have two competing constraints that force us into choosing the exact SM3 value: > 1. The VMware SVGA driver must deal with an SM3 host interface and would ideally want to directly feed the Gallium semantics to the host > 2. An hypotetical DirectX 9 state tracker needs to support SM3 and would ideally want to directly feed the SM3 semantics to Gallium > > Note that this is not a reference to the VMware DirectX 9 state tracker, since its authors haven't provided details about its handling of shader semantics. > > SM3 ends up supporting 219 generic indices: 16 indices in 14 classes, minus POSITION0, PSIZE0, COLOR0, COLOR1 and FOG0 which are the only ones that wouldn't be mapped to GENERIC. > However, Gallium drivers that don't benefit from having specific contraints (like svga and r600) are supposed to support 256 indices, and my nv30/nv40 work does that. 
> > The expected implementation, if no hardware support exists, is to build a list of relocations to apply to either the fragment or the vertex shader, and patch one of them at validation time to match the other. > Data structures are provided in gallium/auxiliary to ease this, and try to minimize the number of times where this needs to be performed. > > Let's now proceed to the discussion and detailed rationale, mostly constructed by copy&pasting older messages. > > =============== > Michal Krol's proposal > =============== > > First of all, see Michal Krol's proposal at http://www.opensource-archive.org/showthread.php?t=148573, and in particular: > << > name index range > ---------------------------- > POSITION no limit? > COLOR 0..1, explicit clamp? > BCOLOR 0..1, explicit clamp? > FOG remove? > PSIZE 0 > GENERIC 0..<max generics> > NORMAL remove > FACE 0 > EDGEFLAG 0 > PRIMID 0 > INSTANCEID 0 >>> > > My proposal follows this, except for limiting POSITION to 0 too. > Not sure why Michal thought "no limit" could make sense: the POSITION is fundamentally a singleton, since it is the input to the rasterizer unit. > > > ====================== > An overview of hardware support > ====================== > > Hardware with no capabilities. > - nv30 does not support any mapping. However, we already need to patch > fragment programs to insert constants, so we can patch input register > numbers as well. The current driver only supports 0-7 generic indices, > but I already implemented support for 0-255 indices with in-driver > linkage and patching. Note that nv30 lacks control flow in fragment > programs. > - nv40 is like nv30, but supports fp control flow, and may have some > configurable mapping support, with unknown behavior > > Hardware with capabilities that must be configured for each fp/vp pair. > - nv40 might have this but the nVidia OpenGL driver does not use them > - nv50 has configurable vp->gp and gp->fp mappings with 64 entries. 
> The current Gallium driver seems to support arbitrary 0-2^32 indices, but uses an inefficient O(n^2) algorithm to be able to do that > > - r300 appears to have a configurable vp->fp mapping. The current > driver only supports 0-15 generic indices, but redefining > ATTR_GENERIC_COUNT could be enough to have it support larger numbers. > > Hardware with automatic linkage when semantics match: > - VMWare svga appears to support 14 * 16 semantics, but the current > driver only supports 0-15 generic indices. This could be fixed by > mapping GENERIC into all non-special SM3 semantics. > > Hardware that can do both configurable mappings and automatic linkage: > - r600 supports linkage in hardware between matching apparently > byte-sized semantic ids > > Other hardware; > - i915 has no hardware vertex shading > The current driver is broken and only supports 0-7 indices: this seems > easy to fix though > - Not sure about i965 > > =================== > An overview of software APIs > =================== > > 1. DirectX 9 SM3 supports indices in the 0-15 range associated with > semantics in the 0-13 range. > > A few of the name/index pairs have special meanings, but the others > are just cosmetic as long as the fixed pipeline is not used. > > Thus, SM3 wants to use 14 * 16 indices overall. > > Of these, POSITION0, PSIZE0, COLOR0, COLOR1 and FOG0 map to non-GENERIC > semantics, leaving 219 semantics handled by GENERIC > > 2. SM2 and non-GLSL OpenGL just want to use as many indices as the > hardware interpolator count, sometimes limiting that further > > They are the most easy and straightforward ones. > > 3. DirectX 10 seems to only require a 0-31 range. > > In particular, the fxc.exe compiler allows to specify arbitrary _strings_ and > 32-bit indices. > > However, this information is encoded as metadata in the output file, and > the shader bytecode itself uses integers in the 0-31 range to refer to the > metadata. 
> > It seems that the metadata is resolved by the Microsoft DirectX 10 runtime, > and the driver only sees 0-31 indices on the DDI interface. > > However, this is a bit unclear: confirmation or correction would be > appreciated. > > 4. GLSL requires to provide both shaders at link time, and thus does > not constrain the implementation in any way. > > However, it may be possible to mix GLSL with other shaders, leading to > the need to reserve the texcoord slots. > > In that case, GLSL will need about 8 more slots that the number of > effectively used semantics. > > This is the case with the current Mesa/Gallium implementation > > 5. GLSL with EXT_separate_shader_objects does not add requirements > because only gl_TexCoord and other builtin varyings are supported. > User-defined varyings are not supported > > See in particular the following text from the extension: > << > It is undesirable from a performance standpoint to attempt to > support "rendezvous by name" for arbitrary separate shaders > because the separate shaders won't be naturally compiled to > match their varying inputs and outputs of the same name without > a special link step. Such a special link would introduce an > extra validation overhead to binding separate shaders. The link > itself would have to be deferred until glBegin time since separate > shaders won't match when transitioning from one set of consistent > shaders to another. This special link would still create errors > or undefined behavior when the names of input and output varyings > matched but their types did not match. >>> > > 6. An hypotetical version of EXT_separate_shader_objects extended to > support user-defining varyings would either want arbitrary 32-bit > generic indices (by interning strings to generate the indices) or the > ability to specify a custom mapping between shader indices > > 7. 
An hypotetical "no-op" implementation of the GLSL linker would have > the same requirement > > > ==================== > About non-GENERIC semantics > ==================== > > Also note that non-GENERIC semantics have peculiar properties. > > For COLOR and BCOLOR: > 1. SM3 and OpenGL with glColorClamp appropriately set wants it to > _not_ be clamped to [0, 1] > 2. SM2 and normal OpenGL apparently want it to be clamped to [0, 1] > (sometimes for fixed point targets only) and may also allow using > U8_UNORM precision for it instead of FP32 > 3. OpenGL allows to enable two-sided lighting, in which case COLOR in > the fragment shader is automagically set to BCOLOR for back faces > 4. Older hardware (e.g. nv30) tends to support BCOLOR but not FACING. > Some hardware (e.g. nv40) supports both FACING and BCOLOR in hardware. > The latest hardware probably supports FACING only. > > Any API that requires special semantics for COLOR and BCOLOR (i.e. > non-SM3) seems to only want 0-1 indices. > > Note that SM3 does *not* include BCOLOR, so basically the limits for > generic indices would need to be conditional on BCOLOR being present > or not (e.g. if it is present, we must reserve two semantic slots in > svga for it). > > POSITION0 is obviously special. > PSIZE0 is also special for points. > > FOG0 seems right now to just be a GENERIC with a single component. > Gallium could be extended to support fixed function fog, which most > DX9 hardware supports (nv30/nv40 and r300). This is mostly orthogonal > to the semantic issue. > > ============== > Current Gallium users > ============== > > Right now no open-source users of Gallium fundamentally require arbitrary indices. > In particular: > 1. GLSL and anything with similar link-by-name can of course be modified to use sequential indices > 2. ARB fragment program and vertex program use index-limited texcoord slots > 3. g3dvl needs and uses 8 texcoord slots, indices 0-7 > 4. vega and xorg use indices 0-1 > 5. 
DX10 seems to restrict semantics to 0-N range, if I'm not mistaken > 6. The GL_EXT_separate_shader_objects extension does not provide > arbitrary index matching for GLSL, but merely lets it use a model > similar to ARB fp/vp > > However, the GLSL linker needs them in its current form, and the capability can be generally useful anyway. > > =================== > Discussion of possible options > =================== > > [Options from Keith Whitwell, see http://www.opensource-archive.org/showthread.php?p=180719] > a) Picking a lower number like 128, that an SM3 state tracker could > usually be able to directly translate incoming semantics into, but which > would force it to renumber under rare circumstances. This would make > life easier for the open drivers at the expense of the closed code. > > b) Picking 256 to make life easier for some closed-source SM3 state > tracker, but harder for open drivers. > > c) Picking 219 (or some other magic number) that happens to work with > the current set of constraints, but makes gallium fragile in the face of > new constraints. > > d) Abandoning the current gallium linkage rules and coming up with > something new, for instance forcing the state trackers to renumber > always and making life trivial for the drivers... > > [Options from me] > > (e) Allow arbitrary 32-bit indices. This requires slightly more > complicated data structures in some cases, and will require svga and > r600 to fallback to software linkage if numbers are too high. > > (f) Limit semantic indices to hardware interpolators _and_ introduce > an interface to let the user specify an > > Personally I think the simplest idea for now could be to have all > drivers support 256 indices or, in the case of r600 and svga, the > maximum value supported by the hardware, and expose that as a cap (as > well as another cap for the number of different semantic values > supported at once). 
> The minimum guaranteed value is set to the lowest hardware constraint, > which would be svga with 219 indices (assuming no bcolor is used). > If some new constraints pop up, we just lower it and change SM3 state > trackers to check for it and fallback otherwise. > > This should just require simple fixes to svga and r300, and > significant code for nv30/nv40, which is however already implemented. > > Luca Barbieri (5): > tgsi: formalize limits on semantic indices > tgsi: add support for packing semantics in SM3 byte values > gallium/auxiliary: add semantic linkage utility code > nvfx: support proper shader linkage - adds glsl support > nvfx: expose GLSL > > Michal Krol (1): > gallium: Remove TGSI_SEMANTIC_NORMAL. |
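The index arithmetic in the quoted proposal, and the kind of limit check a state tracker would do against a driver-reported maximum, can be written out as a small sketch. The function and constant names here are hypothetical and do not correspond to any existing Gallium cap or helper; the numbers are taken from the thread (14 SM3 classes of 16 indices, 5 name/index pairs that are not mapped to GENERIC, and two slots reserved when BCOLOR is used).

```c
enum {
    SM3_USAGE_CLASSES = 14,  /* semantic classes in SM3 */
    SM3_INDICES_EACH  = 16,  /* indices 0-15 per class */
    SM3_NON_GENERIC   = 5,   /* POSITION0, PSIZE0, COLOR0, COLOR1, FOG0 */
    BCOLOR_RESERVED   = 2    /* BCOLOR0 and BCOLOR1 */
};

/* Number of GENERIC indices every driver would have to accept under the
 * proposal: 14 * 16 - 5 = 219, or 217 when BCOLOR is in use. */
static unsigned guaranteed_generic_indices(int uses_bcolor)
{
    unsigned n = SM3_USAGE_CLASSES * SM3_INDICES_EACH - SM3_NON_GENERIC;
    return uses_bcolor ? n - BCOLOR_RESERVED : n;
}

/* Hypothetical state-tracker check against a driver-reported maximum
 * GENERIC index (the cap suggested in the reply above). */
static int generic_index_allowed(unsigned index, unsigned max_generic_index)
{
    return index <= max_generic_index;
}
```

With these numbers the guaranteed index ranges are 0-218 without BCOLOR and 0-216 with it, matching points 4 and 5 of the summary.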
From: Keith W. <ke...@vm...> - 2010-12-14 11:35:03
|
On Mon, 2010-12-13 at 12:01 -0800, Christoph Bumiller wrote:
> I want to warm this up again adding nvc0 and
> GL_ARB_separate_shader_objects to the picture.
>
> The latter extends GL_EXT_separate_shader_objects to support user
> defined varyings and guarantees well defined behaviour only if
> - varyings are declared inside the gl_PerVertex/gl_PerFragment block the
> blocks match exactly in name, type, qualification, and (most
> significantly) declaration order.
> - varyings are assigned matching location qualifiers:
> like: layout(location = 3) in vec4 normal
> "The number of input locations available to a shader is limited."
>
> So, I propose to (loosely) identify GENERIC semantic indices with these
> location qualifiers and let the pipe driver set a limit on the allowed
> maximum (e.g PIPE_SHADER_CAP_MAX_INPUTS, and not demand to at least
> support 219 of them - nvc0 offsers 0x200 bytes for generic inputs/outputs).

This sounds fine actually. We kicked this around before & I was basically ok with the last iteration of the proposal, but this seems ok too.

As far as I can tell from a gallium perspective you're really just proposing a new pipe cap _MAX_INPUTS (actually _MAX_GENERIC_INDEX would be clearer), which the state tracker thereafter has to respect?

That would be fine with me.

> My motivation is mostly that the hardware routing table for shader
> varyings that was present on nv50 has been removed with nvc0 (Fermi).
> And I'm glad, because filling 4 routing tables (since we have 5 shader
> types now) is somewhat annoying. And so applying relocations to shaders
> - it can be done, it's probably not too time consuming, but it's just
> plain *unnecessary* (and thus stupid) for OpenGL.
>
> Now about d3d9 ...
> 1. don't care, I don't see a d3d9 state tracker
> 2. http://msdn.microsoft.com/en-us/library/bb509647%28v=VS.85%29.aspx
> says "n is an optional integer between 0 and the number of resources
> supported" - what "supported" means here isn't clear to me, but, I
> didn't find any example where someone used something OpenGL doesn't have
> (like COLOR2).
> 3.
> http://msdn.microsoft.com/en-us/library/bb944006%28v=vs.85%29.aspx#Varying_Shader_Inputs_and_Semantics
> says "Input semantics are similar to the values in the D3DDECLUSAGE."
> and
> DECLUSAGE sounds like you're limited to sane values.

I think you're on the right track with (1)...

It's fairly pointless trying to discuss code here which isn't public & I don't think people need to be worrying about what may or may not be important for code they can't see.

I know this idea previously got tied up with speculation about what a DX9 state tracker might or might not require, but in retrospect I wish I'd been able to steer conversation away from that.

The work on closed components may drive a lot of the feature development and new interfaces, but there's usually enough flexibility that this sort of cleanup isn't a big deal.

Keith

> Not sure if anyone wants to think about this issue at this time (since
> implementation of ARB_separate_shader_objects is probably far in the GL4
> future), but I'd be happy about any comments.
>
> Regards,
> Christoph
>
> On 04/13/2010 12:55 PM, Luca Barbieri wrote:
> > This patch series is intended to resolve the issue of semantic-based shader linkage in Gallium.
> > It can also be found in the RFC-gallium-semantics branch.
> >
> > It does not change the current Gallium design, but rather formalizes some limitations to it, and provides infrastructure to implement this model more easily in drivers, along with a full nv30/nv40 implementation.
> >
> > These limitations are added to allow an efficient implementation for both hardware lacking special support and hardware having support but also special constraints.
> > > > Note that this does NOT resolve all issues, and there are quite a bit left to future refinement. > > > > In particular, the following issues are still open: > > 1. COLOR clamping (and floating point framebuffers) > > 2. A linkage table CSO allowing to specify non-identity linkage > > 3. BCOLOR/FACE-related issues > > 4. Adding a cap to inform the state tracker that more than 219 generic indices are provided > > > > This topic was already very extensively discussed. > > See http://www.mail-archive.com/mes...@li.../msg10865.html for some early inconclusive discussion around an early implementation that modified the GLSL linker (which is NOT being proposed here) > > See http://www.mail-archive.com/mes...@li.../msg12016.html for some more discussion that seemed to mostly reach a consensus over the approach proposed here. > > See in particular http://www.mail-archive.com/mes...@li.../msg12041.html . > > > > That said, I'm going to try to repeat all information here, partially by copy&pasting from earlier messages. > > This message should probably be adapted into gallium/docs if/when this is accepted. > > > > Here is the short summary; the long rationale follows after it. > > > > The proposal here is to add the following limitations to Gallium, for the intermediate semantics: > > 1. TGSI_SEMANTIC_NORMAL is removed, using a commit by Michal Krol that was never merged > > 2. Every semantic except GENERIC, COLOR and BCOLOR can only be used with semantic index 0 > > 3. COLOR and BCOLOR can only be used with semantic index 0-1 (note that this doesn't apply to fragment outputs) > > 4. GENERIC can be used with semantic indices 0-218 on any driver, if BCOLOR is not used > > 5. GENERIC can be used with semantic indices 0-216 on any driver, if BCOLOR IS used > > 6. GENERIC can be used with semantic indices 0-255 on almost all drivers (those that don't need the 0-218 limitation) > > 7. 
Some drivers may also choose to support GENERIC with arbitrary indices, but that should generally not happen > > > > The reason for this, in short, is that this maps directly to DirectX 9 SM3, which is the most problematic interface of all. > > > > The peculiar problem we have here is that we have two competing constraints that force us into choosing the exact SM3 value: > > 1. The VMware SVGA driver must deal with an SM3 host interface and would ideally want to directly feed the Gallium semantics to the host > > 2. A hypothetical DirectX 9 state tracker needs to support SM3 and would ideally want to directly feed the SM3 semantics to Gallium > > > > Note that this is not a reference to the VMware DirectX 9 state tracker, since its authors haven't provided details about its handling of shader semantics. > > > > SM3 ends up supporting 219 generic indices: 16 indices in 14 classes, minus POSITION0, PSIZE0, COLOR0, COLOR1 and FOG0 which are the only ones that wouldn't be mapped to GENERIC. > > However, Gallium drivers that don't benefit from having specific constraints (like svga and r600) are supposed to support 256 indices, and my nv30/nv40 work does that. > > > > The expected implementation, if no hardware support exists, is to build a list of relocations to apply to either the fragment or the vertex shader, and patch one of them at validation time to match the other. > > Data structures are provided in gallium/auxiliary to ease this, and try to minimize the number of times where this needs to be performed. > > > > Let's now proceed to the discussion and detailed rationale, mostly constructed by copy&pasting older messages. > > > > =============== > > Michal Krol's proposal > > =============== > > > > First of all, see Michal Krol's proposal at http://www.opensource-archive.org/showthread.php?t=148573, and in particular: > > << > > name index range > > ---------------------------- > > POSITION no limit? > > COLOR 0..1, explicit clamp? > > BCOLOR 0..1, explicit clamp?
> > FOG remove? > > PSIZE 0 > > GENERIC 0..<max generics> > > NORMAL remove > > FACE 0 > > EDGEFLAG 0 > > PRIMID 0 > > INSTANCEID 0 > >>> > > > > My proposal follows this, except for limiting POSITION to 0 too. > > Not sure why Michal thought "no limit" could make sense: the POSITION is fundamentally a singleton, since it is the input to the rasterizer unit. > > > > > > ====================== > > An overview of hardware support > > ====================== > > > > Hardware with no capabilities. > > - nv30 does not support any mapping. However, we already need to patch > > fragment programs to insert constants, so we can patch input register > > numbers as well. The current driver only supports 0-7 generic indices, > > but I already implemented support for 0-255 indices with in-driver > > linkage and patching. Note that nv30 lacks control flow in fragment > > programs. > > - nv40 is like nv30, but supports fp control flow, and may have some > > configurable mapping support, with unknown behavior > > > > Hardware with capabilities that must be configured for each fp/vp pair. > > - nv40 might have this but the nVidia OpenGL driver does not use them > > - nv50 has configurable vp->gp and gp->fp mappings with 64 entries. > > The current Gallium driver seems to support arbitrary 0-2^32 indices, but uses an inefficient O(n^2) algorithm to be able to do that > > > > - r300 appears to have a configurable vp->fp mapping. The current > > driver only supports 0-15 generic indices, but redefining > > ATTR_GENERIC_COUNT could be enough to have it support larger numbers. > > > > Hardware with automatic linkage when semantics match: > > - VMWare svga appears to support 14 * 16 semantics, but the current > > driver only supports 0-15 generic indices. This could be fixed by > > mapping GENERIC into all non-special SM3 semantics. 
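The idea of mapping GENERIC into all non-special SM3 semantics can be illustrated with a minimal sketch. It assumes a byte layout of `usage << 4 | index`, which is only a guess at what "packing semantics in SM3 byte values" could look like; the `sm3_*` names are hypothetical and not taken from the patch series. The usage numbers follow the D3DDECLUSAGE enumeration (14 usages, 0 to 13), and the five reserved pairs are the ones named in the proposal.

```c
#include <stdbool.h>
#include <stdint.h>

/* D3DDECLUSAGE values that matter here (the full enum runs from 0 to 13). */
enum {
    SM3_USAGE_POSITION = 0,
    SM3_USAGE_PSIZE    = 4,
    SM3_USAGE_COLOR    = 10,
    SM3_USAGE_FOG      = 11,
    SM3_USAGE_COUNT    = 14
};

#define SM3_INDICES_PER_USAGE 16
#define SM3_BYTE_COUNT (SM3_USAGE_COUNT * SM3_INDICES_PER_USAGE) /* 224 */

/* Assumed packing: usage in the high nibble, usage index in the low nibble. */
#define SM3_PACK(usage, index) ((uint8_t)(((usage) << 4) | (index)))

/* True for the five usage/index pairs that map to non-GENERIC semantics. */
static bool sm3_byte_is_special(uint8_t b)
{
    return b == SM3_PACK(SM3_USAGE_POSITION, 0) ||
           b == SM3_PACK(SM3_USAGE_PSIZE, 0) ||
           b == SM3_PACK(SM3_USAGE_COLOR, 0) ||
           b == SM3_PACK(SM3_USAGE_COLOR, 1) ||
           b == SM3_PACK(SM3_USAGE_FOG, 0);
}

/* Number of packed values left over for GENERIC: 14 * 16 - 5. */
static unsigned sm3_generic_count(void)
{
    unsigned b, n = 0;

    for (b = 0; b < SM3_BYTE_COUNT; b++)
        if (!sm3_byte_is_special((uint8_t)b))
            n++;
    return n;
}

/* Map a GENERIC index to the n-th non-special packed byte, or -1. */
static int sm3_byte_from_generic(unsigned generic_index)
{
    unsigned b;

    for (b = 0; b < SM3_BYTE_COUNT; b++) {
        if (sm3_byte_is_special((uint8_t)b))
            continue;
        if (generic_index-- == 0)
            return (int)b;
    }
    return -1; /* index is beyond what SM3 can express */
}
```

Under these assumptions GENERIC 0 lands on POSITION1 (0x01) and GENERIC 218 on the last packed value (0xDF). Reserving two more slots for BCOLOR would shrink the usable range to 0-216, as in the proposal.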
> > > > Hardware that can do both configurable mappings and automatic linkage: > > - r600 supports linkage in hardware between matching apparently > > byte-sized semantic ids > > > > Other hardware; > > - i915 has no hardware vertex shading > > The current driver is broken and only supports 0-7 indices: this seems > > easy to fix though > > - Not sure about i965 > > > > =================== > > An overview of software APIs > > =================== > > > > 1. DirectX 9 SM3 supports indices in the 0-15 range associated with > > semantics in the 0-13 range. > > > > A few of the name/index pairs have special meanings, but the others > > are just cosmetic as long as the fixed pipeline is not used. > > > > Thus, SM3 wants to use 14 * 16 indices overall. > > > > Of these, POSITION0, PSIZE0, COLOR0, COLOR1 and FOG0 map to non-GENERIC > > semantics, leaving 219 semantics handled by GENERIC > > > > 2. SM2 and non-GLSL OpenGL just want to use as many indices as the > > hardware interpolator count, sometimes limiting that further > > > > They are the most easy and straightforward ones. > > > > 3. DirectX 10 seems to only require a 0-31 range. > > > > In particular, the fxc.exe compiler allows to specify arbitrary _strings_ and > > 32-bit indices. > > > > However, this information is encoded as metadata in the output file, and > > the shader bytecode itself uses integers in the 0-31 range to refer to the > > metadata. > > > > It seems that the metadata is resolved by the Microsoft DirectX 10 runtime, > > and the driver only sees 0-31 indices on the DDI interface. > > > > However, this is a bit unclear: confirmation or correction would be > > appreciated. > > > > 4. GLSL requires to provide both shaders at link time, and thus does > > not constrain the implementation in any way. > > > > However, it may be possible to mix GLSL with other shaders, leading to > > the need to reserve the texcoord slots. 
> > > > In that case, GLSL will need about 8 more slots than the number of > > effectively used semantics. > > > > This is the case with the current Mesa/Gallium implementation > > > > 5. GLSL with EXT_separate_shader_objects does not add requirements > > because only gl_TexCoord and other builtin varyings are supported. > > User-defined varyings are not supported > > > > See in particular the following text from the extension: > > << > > It is undesirable from a performance standpoint to attempt to > > support "rendezvous by name" for arbitrary separate shaders > > because the separate shaders won't be naturally compiled to > > match their varying inputs and outputs of the same name without > > a special link step. Such a special link would introduce an > > extra validation overhead to binding separate shaders. The link > > itself would have to be deferred until glBegin time since separate > > shaders won't match when transitioning from one set of consistent > > shaders to another. This special link would still create errors > > or undefined behavior when the names of input and output varyings > > matched but their types did not match. > >>> > > > > 6. A hypothetical version of EXT_separate_shader_objects extended to > > support user-defined varyings would either want arbitrary 32-bit > > generic indices (by interning strings to generate the indices) or the > > ability to specify a custom mapping between shader indices > > > > 7. A hypothetical "no-op" implementation of the GLSL linker would have > > the same requirement > > > > > > ==================== > > About non-GENERIC semantics > > ==================== > > > > Also note that non-GENERIC semantics have peculiar properties. > > > > For COLOR and BCOLOR: > > 1. SM3 and OpenGL with glClampColor appropriately set want it to > > _not_ be clamped to [0, 1] > > 2.
SM2 and normal OpenGL apparently want it to be clamped to [0, 1] > > (sometimes for fixed point targets only) and may also allow using > > U8_UNORM precision for it instead of FP32 > > 3. OpenGL allows to enable two-sided lighting, in which case COLOR in > > the fragment shader is automagically set to BCOLOR for back faces > > 4. Older hardware (e.g. nv30) tends to support BCOLOR but not FACING. > > Some hardware (e.g. nv40) supports both FACING and BCOLOR in hardware. > > The latest hardware probably supports FACING only. > > > > Any API that requires special semantics for COLOR and BCOLOR (i.e. > > non-SM3) seems to only want 0-1 indices. > > > > Note that SM3 does *not* include BCOLOR, so basically the limits for > > generic indices would need to be conditional on BCOLOR being present > > or not (e.g. if it is present, we must reserve two semantic slots in > > svga for it). > > > > POSITION0 is obviously special. > > PSIZE0 is also special for points. > > > > FOG0 seems right now to just be a GENERIC with a single component. > > Gallium could be extended to support fixed function fog, which most > > DX9 hardware supports (nv30/nv40 and r300). This is mostly orthogonal > > to the semantic issue. > > > > ============== > > Current Gallium users > > ============== > > > > Right now no open-source users of Gallium fundamentally require arbitrary indices. > > In particular: > > 1. GLSL and anything with similar link-by-name can of course be modified to use sequential indices > > 2. ARB fragment program and vertex program use index-limited texcoord slots > > 3. g3dvl needs and uses 8 texcoord slots, indices 0-7 > > 4. vega and xorg use indices 0-1 > > 5. DX10 seems to restrict semantics to 0-N range, if I'm not mistaken > > 6. 
The GL_EXT_separate_shader_objects extension does not provide > > arbitrary index matching for GLSL, but merely lets it use a model > > similar to ARB fp/vp > > > > However, the GLSL linker needs them in its current form, and the capability can be generally useful anyway. > > > > =================== > > Discussion of possible options > > =================== > > > > [Options from Keith Whitwell, see http://www.opensource-archive.org/showthread.php?p=180719] > > a) Picking a lower number like 128, that an SM3 state tracker could > > usually be able to directly translate incoming semantics into, but which > > would force it to renumber under rare circumstances. This would make > > life easier for the open drivers at the expense of the closed code. > > > > b) Picking 256 to make life easier for some closed-source SM3 state > > tracker, but harder for open drivers. > > > > c) Picking 219 (or some other magic number) that happens to work with > > the current set of constraints, but makes gallium fragile in the face of > > new constraints. > > > > d) Abandoning the current gallium linkage rules and coming up with > > something new, for instance forcing the state trackers to renumber > > always and making life trivial for the drivers... > > > > [Options from me] > > > > (e) Allow arbitrary 32-bit indices. This requires slightly more > > complicated data structures in some cases, and will require svga and > > r600 to fallback to software linkage if numbers are too high. > > > > (f) Limit semantic indices to hardware interpolators _and_ introduce > > an interface to let the user specify an > > > > Personally I think the simplest idea for now could be to have all > > drivers support 256 indices or, in the case of r600 and svga, the > > maximum value supported by the hardware, and expose that as a cap (as > > well as another cap for the number of different semantic values > > supported at once). 
> > The minimum guaranteed value is set to the lowest hardware constraint, > > which would be svga with 219 indices (assuming no bcolor is used). > > If some new constraints pop up, we just lower it and change SM3 state > > trackers to check for it and fallback otherwise. > > > > This should just require simple fixes to svga and r300, and > > significant code for nv30/nv40, which is however already implemented. > > > > Luca Barbieri (5): > > tgsi: formalize limits on semantic indices > > tgsi: add support for packing semantics in SM3 byte values > > gallium/auxiliary: add semantic linkage utility code > > nvfx: support proper shader linkage - adds glsl support > > nvfx: expose GLSL > > > > Michal Krol (1): > > gallium: Remove TGSI_SEMANTIC_NORMAL. |
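The relocation approach described in the proposal (build a list of relocations for the fragment shader and patch it at validation time to match the vertex shader) might look roughly like the sketch below. Everything in it is an assumption made for illustration: the `sketch_*` names, the table layout, and in particular the instruction encoding with the input register number in bits 8..15. It is not the code from gallium/auxiliary or from the nvfx driver.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_GENERIC 256   /* GENERIC indices 0-255 */
#define SKETCH_SLOT_NONE   0xff  /* marks a GENERIC the vertex shader does not write */

/* Hypothetical encoding: the input register number lives in bits 8..15. */
#define SKETCH_FP_INPUT_SHIFT 8
#define SKETCH_FP_INPUT_MASK  (0xffu << SKETCH_FP_INPUT_SHIFT)

/* Per vertex shader: which hardware output slot carries each GENERIC index. */
struct sketch_vp_outputs {
    uint8_t slot_of_generic[SKETCH_MAX_GENERIC];
};

/* Per fragment shader: one entry for every instruction word that reads a GENERIC. */
struct sketch_fp_reloc {
    uint32_t word;    /* offset of the instruction word to patch */
    uint8_t  generic; /* GENERIC semantic index read by that word */
};

/*
 * Patch the fragment program so that its input registers match the slots
 * chosen by the vertex program. Returns 0 on success, or -1 without
 * touching the code if some input has no matching vertex output.
 */
static int
sketch_link_fp_to_vp(uint32_t *fp_code,
                     const struct sketch_fp_reloc *relocs, size_t num_relocs,
                     const struct sketch_vp_outputs *vp)
{
    size_t i;

    /* First pass: validate, so a failure never leaves half-patched code. */
    for (i = 0; i < num_relocs; i++)
        if (vp->slot_of_generic[relocs[i].generic] == SKETCH_SLOT_NONE)
            return -1;

    /* Second pass: rewrite the register field of every recorded word. */
    for (i = 0; i < num_relocs; i++) {
        uint32_t slot = vp->slot_of_generic[relocs[i].generic];
        uint32_t *insn = &fp_code[relocs[i].word];

        *insn = (*insn & ~SKETCH_FP_INPUT_MASK) |
                (slot << SKETCH_FP_INPUT_SHIFT);
    }
    return 0;
}
```

A driver would presumably run such a step only when the bound vertex/fragment pair changes, which is one way to keep the number of patch operations low.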
From: Christoph B. <e04...@st...> - 2010-12-16 15:47:13
|
On 12/14/2010 12:36 PM, Keith Whitwell wrote:
> On Mon, 2010-12-13 at 12:01 -0800, Christoph Bumiller wrote:
>> I want to warm this up again adding nvc0 and
>> GL_ARB_separate_shader_objects to the picture.
>>
>> The latter extends GL_EXT_separate_shader_objects to support user
>> defined varyings and guarantees well defined behaviour only if
>> - varyings are declared inside the gl_PerVertex/gl_PerFragment block and the
>>   blocks match exactly in name, type, qualification, and (most
>>   significantly) declaration order.
>> - varyings are assigned matching location qualifiers:
>>   like: layout(location = 3) in vec4 normal
>>   "The number of input locations available to a shader is limited."
>>
>> So, I propose to (loosely) identify GENERIC semantic indices with these
>> location qualifiers and let the pipe driver set a limit on the allowed
>> maximum (e.g. PIPE_SHADER_CAP_MAX_INPUTS, and not demand to at least
>> support 219 of them - nvc0 offers 0x200 bytes for generic inputs/outputs).
>
> This sounds fine actually. We kicked this around before & I was
> basically ok with the last iteration of the proposal, but this seems ok
> too.
>
> As far as I can tell from a gallium perspective you're really just
> proposing a new pipe cap _MAX_INPUTS (actually _MAX_GENERIC_INDEX would
> be clearer), which the state tracker thereafter has to respect?
>
> That would be fine with me.

First attempt at a patch introducing such a cap attached.

>
>> My motivation is mostly that the hardware routing table for shader
>> varyings that was present on nv50 has been removed with nvc0 (Fermi).
>> And I'm glad, because filling 4 routing tables (since we have 5 shader
>> types now) is somewhat annoying. And so applying relocations to shaders
>> - it can be done, it's probably not too time consuming, but it's just
>> plain *unnecessary* (and thus stupid) for OpenGL.
>>
>> Now about d3d9 ...
>> 1. don't care, I don't see a d3d9 state tracker
>> 2. http://msdn.microsoft.com/en-us/library/bb509647%28v=VS.85%29.aspx
>> says "n is an optional integer between 0 and the number of resources
>> supported" - what "supported" means here isn't clear to me, but, I
>> didn't find any example where someone used something OpenGL doesn't have
>> (like COLOR2).
>> 3.
>> http://msdn.microsoft.com/en-us/library/bb944006%28v=vs.85%29.aspx#Varying_Shader_Inputs_and_Semantics
>> says "Input semantics are similar to the values in the D3DDECLUSAGE."
>> and
>> DECLUSAGE sounds like you're limited to sane values.
>
> I think you're on the right track with (1)... It's fairly pointless
> trying to discuss code here which isn't public & I don't think people
> need to be worrying about what may or may not be important for code they
> can't see.
>
> I know this idea previously got tied up with speculation about what a
> DX9 state tracker might or might not require, but in retrospect I wish
> I'd been able to steer conversation away from that.
>
> The work on closed components may drive a lot of the feature development
> and new interfaces, but there's usually enough flexibility that this
> sort of cleanup isn't a big deal.
>
>
> Keith
>
>> Not sure if anyone wants to think about this issue at this time (since
>> implementation of ARB_separate_shader_objects is probably far in the GL4
>> future), but I'd be happy about any comments.
>>
>> Regards,
>> Christoph
>>
>> On 04/13/2010 12:55 PM, Luca Barbieri wrote:
>>> ...
|
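As a rough illustration of the cap idea discussed above, the sketch below shows a state tracker rejecting shaders whose GENERIC indices (taken directly from `layout(location = N)` qualifiers) exceed a driver-reported maximum. The struct and function names are hypothetical and do not correspond to the attached patch or to any existing Gallium interface. The byte-to-slot conversion assumes that each generic input occupies one 16-byte vec4, which the thread does not state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for whatever the driver would report through a cap. */
struct sketch_screen_caps {
    unsigned max_generic_index; /* highest GENERIC semantic index accepted */
};

/* Assumption: one generic input/output is a vec4 of 32-bit floats (16 bytes). */
static unsigned
sketch_generic_slots_from_bytes(unsigned bytes)
{
    return bytes / 16u;
}

/*
 * Check that every location qualifier, used here directly as a GENERIC
 * semantic index, stays within the limit reported by the driver.
 */
static bool
sketch_generics_within_limit(const struct sketch_screen_caps *caps,
                             const unsigned *locations, size_t num_locations)
{
    size_t i;

    for (i = 0; i < num_locations; i++)
        if (locations[i] > caps->max_generic_index)
            return false;
    return true;
}
```

With these assumptions, 0x200 bytes of generic storage would correspond to 32 slots, so such a driver would report a maximum index of 31 and a shader using `layout(location = 32)` would be refused at translation time.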
From: Keith W. <ke...@vm...> - 2010-12-17 12:28:17
|
Christoph,

This looks good. Thanks for bringing this back to life.

Keith

On Thu, 2010-12-16 at 07:47 -0800, Christoph Bumiller wrote:
> On 12/14/2010 12:36 PM, Keith Whitwell wrote:
> > ...
> > As far as I can tell from a gallium perspective you're really just
> > proposing a new pipe cap _MAX_INPUTS (actually _MAX_GENERIC_INDEX would
> > be clearer), which the state tracker thereafter has to respect?
> >
> > That would be fine with me.
> First attempt at a patch introducing such a cap attached.
> ...
|
From: Brian P. <br...@vm...> - 2010-12-17 15:32:43
|
Christoph, I don't see a patch for the st/mesa program translation code to check that we don't exceed the limit. Were you going to take care of that too? I guess we're assuming that the max number of generic inputs == max number of generic outputs. I think that's OK until a counter case appears. -Brian On 12/17/2010 05:28 AM, Keith Whitwell wrote: > Christoph, > > This looks good. Thanks for bringing this back to life. > > Keith > > On Thu, 2010-12-16 at 07:47 -0800, Christoph Bumiller wrote: >> On 12/14/2010 12:36 PM, Keith Whitwell wrote: >>> On Mon, 2010-12-13 at 12:01 -0800, Christoph Bumiller wrote: >>>> I want to warm this up again adding nvc0 and >>>> GL_ARB_separate_shader_objects to the picture. >>>> >>>> The latter extends GL_EXT_separate_shader_objects to support user >>>> defined varyings and guarantees well defined behaviour only if >>>> - varyings are declared inside the gl_PerVertex/gl_PerFragment block and the >>>> blocks match exactly in name, type, qualification, and (most >>>> significantly) declaration order. >>>> - varyings are assigned matching location qualifiers: >>>> like: layout(location = 3) in vec4 normal >>>> "The number of input locations available to a shader is limited." >>>> >>>> So, I propose to (loosely) identify GENERIC semantic indices with these >>>> location qualifiers and let the pipe driver set a limit on the allowed >>>> maximum (e.g. PIPE_SHADER_CAP_MAX_INPUTS, and not demand to at least >>>> support 219 of them - nvc0 offers 0x200 bytes for generic inputs/outputs). >>> >>> This sounds fine actually. We kicked this around before & I was >>> basically ok with the last iteration of the proposal, but this seems ok >>> too. >>> >>> As far as I can tell from a gallium perspective you're really just >>> proposing a new pipe cap _MAX_INPUTS (actually _MAX_GENERIC_INDEX would >>> be clearer), which the state tracker thereafter has to respect? >>> >>> That would be fine with me.
>> First attempt at a patch introducing such a cap attached. >> >>> >>>> My motivation is mostly that the hardware routing table for shader >>>> varyings that was present on nv50 has been removed with nvc0 (Fermi). >>>> And I'm glad, because filling 4 routing tables (since we have 5 shader >>>> types now) is somewhat annoying. And so applying relocations to shaders >>>> - it can be done, it's probably not too time consuming, but it's just >>>> plain *unnecessary* (and thus stupid) for OpenGL. >>>> >>>> Now about d3d9 ... >>>> 1. don't care, I don't see a d3d9 state tracker >>>> 2. http://msdn.microsoft.com/en-us/library/bb509647%28v=VS.85%29.aspx >>>> says "n is an optional integer between 0 and the number of resources >>>> supported" - what "supported" means here isn't clear to me, but, I >>>> didn't find any example where someone used something OpenGL doesn't have >>>> (like COLOR2). >>>> 3. >>>> http://msdn.microsoft.com/en-us/library/bb944006%28v=vs.85%29.aspx#Varying_Shader_Inputs_and_Semantics >>>> says "Input semantics are similar to the values in the D3DDECLUSAGE." >>>> and >>>> DECLUSAGE sounds like you're limited to sane values. >>> >>> I think you're on the right track with (1)... It's fairly pointless >>> trying to discuss code here which isn't public& I don't think people >>> need to be worrying about what may or may not be important for code they >>> can't see. >>> >>> I know this idea previously got tied up with speculation about what a >>> DX9 state tracker might or might not require, but in retrospect I wish >>> I'd been able to steer conversation away from that. >>> >>> The work on closed components may drive a lot of the feature development >>> and new interfaces, but there's usually enough flexibility that this >>> sort of cleanup isn't a big deal. 
>>>
>>>
>>> Keith
>>>
>>>> Not sure if anyone wants to think about this issue at this time (since
>>>> implementation of ARB_separate_shader_objects is probably far in the GL4
>>>> future), but I'd be happy about any comments.
>>>>
>>>> Regards,
>>>> Christoph
>>>>
>>>> On 04/13/2010 12:55 PM, Luca Barbieri wrote:
>>>>> ...
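The check Brian asks about could look roughly like the sketch below. It is an illustration under assumptions, not the st/mesa code: every `sketch_*` name is a hypothetical stand-in rather than a real TGSI or Gallium declaration, and the limit is passed in as a plain number instead of being queried from a driver.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for semantic names; not the real TGSI enum. */
enum sketch_semantic {
    SKETCH_SEMANTIC_POSITION,
    SKETCH_SEMANTIC_COLOR,
    SKETCH_SEMANTIC_GENERIC
};

/* One declared shader input or output: a semantic name plus its index. */
struct sketch_decl {
    enum sketch_semantic name;
    unsigned index;
};

/*
 * Returns true when every GENERIC declaration uses an index no larger
 * than max_generic_index (the assumed driver-reported limit).
 * Non-GENERIC semantics are ignored by this check.
 */
static bool
sketch_generics_within_limit(const struct sketch_decl *decls, size_t count,
                             unsigned max_generic_index)
{
    for (size_t i = 0; i < count; i++) {
        if (decls[i].name == SKETCH_SEMANTIC_GENERIC &&
            decls[i].index > max_generic_index)
            return false;
    }
    return true;
}
```

A translator could run such a check once per shader and report an error instead of emitting a shader the driver cannot link; whether that is the right failure mode is exactly what the rest of the thread debates.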
From: Marek O. <ma...@gm...> - 2010-12-17 16:55:22
|
On Fri, Dec 17, 2010 at 4:32 PM, Brian Paul <br...@vm...> wrote:
> Christoph,
>
> I don't see a patch for the st/mesa program translation code to check
> that we don't exceed the limit. Were you going to take care of that too?
>
> I guess we're assuming that the max number of generic inputs == max
> number of generic outputs. I think that's OK until a counter case
> appears.
>

The way I understand it is that the max number of generic outputs is equal
to the max number of generic inputs in the next shader stage (the same logic
applies to some other shader caps too). I guess we need to use get_param to
determine which shader stages are supported by the driver to know which one
is next. The name *PIPE_SHADER_CAP_MAX_GENERIC_INPUT_INDEX* would be less
ambiguous (still not perfect though).

However I don't believe in usefulness of this new cap, at least not without
some serious state tracker work. I don't consider failing to translate a
shader if some CAP is too low particularly useful.

(posting to mesa-dev as well)

Marek

> -Brian
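Marek's reading, that a stage's output limit is the input limit of the next supported stage, can be sketched as follows. This is only a sketch under assumptions: the `sketch_*` types and functions are hypothetical, the per-stage values are supplied by the caller rather than obtained through get_param, and the three-stage list is a simplification.

```c
#include <stdbool.h>

/* Hypothetical stage list in pipeline order; not Gallium's own enum. */
enum sketch_stage {
    SKETCH_STAGE_VERTEX,
    SKETCH_STAGE_GEOMETRY,
    SKETCH_STAGE_FRAGMENT,
    SKETCH_STAGE_COUNT
};

/* Assumed per-stage information, as if already queried from the driver. */
struct sketch_stage_caps {
    bool supported;
    unsigned max_generic_inputs;
};

/* Returns the first supported stage after `stage`, or SKETCH_STAGE_COUNT. */
static enum sketch_stage
sketch_next_stage(const struct sketch_stage_caps *caps, enum sketch_stage stage)
{
    for (int s = (int)stage + 1; s < SKETCH_STAGE_COUNT; s++) {
        if (caps[s].supported)
            return (enum sketch_stage)s;
    }
    return SKETCH_STAGE_COUNT;
}

/*
 * Output limit of `stage`, taken to be the input limit of the next
 * supported stage. Returns 0 when no later stage is supported.
 */
static unsigned
sketch_max_generic_outputs(const struct sketch_stage_caps *caps,
                           enum sketch_stage stage)
{
    enum sketch_stage next = sketch_next_stage(caps, stage);
    return next == SKETCH_STAGE_COUNT ? 0 : caps[next].max_generic_inputs;
}
```

Note that this picks the next stage from what the driver supports, not from what is bound at draw time, so it can only give a static answer.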
From: Christoph B. <e04...@st...> - 2010-12-17 18:04:23
|
On 17.12.2010 17:54, Marek Olšák wrote:
> On Fri, Dec 17, 2010 at 4:32 PM, Brian Paul <br...@vm...> wrote:
>
>     Christoph,
>
>     I don't see a patch for the st/mesa program translation code to check
>     that we don't exceed the limit. Were you going to take care of
>     that too?
>
I didn't plan to for now, at least nothing beyond making the state
tracker return an error if possible, and removing/modifying a certain
comment mentioned below.

>     I guess we're assuming that the max number of generic inputs == max
>     number of generic outputs. I think that's OK until a counter case
>     appears.
>
>
> The way I understand it is that the max number of generic outputs is
> equal to the max number of generic inputs in the next shader stage
> (the same logic applies to some other shader caps too). I guess we
> need to use get_param to determine which shader stages are supported
> by the driver to know which one is next. The name

The problem is that (apart from the linked GL program case) you cannot
know which stage is next until validation time. You have the same
problem with the existing PIPE_SHADER_CAP_MAX_INPUTS/OUTPUTS - nv50's
vertex shaders can output more variables to geometry shaders than they
can to fragment shaders.

Maybe MAX_GENERIC_INDEX should be a non-shader-specific cap - for nvc0
the value is the same everywhere, and for hardware that only has VP and
FP as well.

> *PIPE_SHADER_CAP_MAX_GENERIC_INPUT_INDEX* would be less ambiguous
> (still not perfect though).
>
I thought about using something even more verbose, like
PIPE_SHADER_CAP_MAX_GENERIC_INPUT_SEMANTIC_INDEX.

> However I don't believe in usefulness of this new cap, at least not
> without some serious state tracker work. I don't consider failing to
> translate a shader if some CAP is too low particularly useful.
>
The use of the cap is to prevent state tracker writers from thinking
they're free to use GENERIC[0,96,8911] or whatever random numbers they
like and rely on pipe drivers to ensure at all costs that linkage will
be correct.

In mesa/st I see the comment

    /* Actually, let's try and zero-base this just for
     * readability of the generated TGSI.
     */

So I guess someone thought it would be ok to start at some unspecified
high index.
Such random behaviour makes it really hard to get
ARB_separate_shader_objects features (which gallium assumed pipe drivers
would be able to do anyway from the start) sanely.

---

So, maybe we can do without this cap. Maybe it would be better to just
mandate that the GENERIC index be less than
PIPE_SHADER_CAP_MAX_INPUTS/OUTPUTS after all.

Christoph

> (posting to mesa-dev as well)
>
> Marek
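Christoph's closing suggestion, combined with the idea quoted earlier in the thread of identifying GENERIC indices with location qualifiers, can be sketched as an identity mapping with a bounds check. This is a sketch under assumptions: the `sketch_*` names are hypothetical, and the figure of 16 bytes per slot assumes one vec4 of 32-bit components per generic index, which the thread does not state. Under that assumption the 0x200 bytes mentioned for nvc0 would correspond to 32 slots.

```c
#include <stdbool.h>

/* Assumption: one generic slot is a vec4 of 32-bit components. */
#define SKETCH_BYTES_PER_SLOT 16u

/* Number of whole generic slots that fit in a byte budget. */
static unsigned
sketch_slots_from_bytes(unsigned bytes)
{
    return bytes / SKETCH_BYTES_PER_SLOT;
}

/*
 * Maps a location qualifier straight to a GENERIC index (identity),
 * so both shaders agree without any relocation. Fails when the
 * location is not below max_slots; *generic_index is then untouched.
 */
static bool
sketch_location_to_generic(unsigned location, unsigned max_slots,
                           unsigned *generic_index)
{
    if (location >= max_slots)
        return false;
    *generic_index = location;
    return true;
}
```

Because the mapping is the identity, two separately compiled shaders that use the same location end up with the same GENERIC index, which is the property separate shader objects rely on.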