Thread: [Mesa3d-dev] [PATCH 0/6] [RFC] Formalization of the Gallium shader semantics linkage model


This patch series is intended to resolve the issue of semantic-based shader linkage in Gallium.
It can also be found in the RFC-gallium-semantics branch.

It does not change the current Gallium design, but rather formalizes some limitations to it, and provides infrastructure to implement this model more easily in drivers, along with a full nv30/nv40 implementation.

These limitations are added to allow an efficient implementation for both hardware lacking special support and hardware having support but also special constraints.

Note that this does NOT resolve all issues, and there is quite a bit left to future refinement.

In particular, the following issues are still open:
1. COLOR clamping (and floating point framebuffers)
2. A linkage table CSO that allows non-identity linkage to be specified
3. BCOLOR/FACE-related issues
4. Adding a cap to inform the state tracker that more than 219 generic indices are provided

This topic was already very extensively discussed.
See http://www.mail-archive.com/mes...@li.../msg10865.html for some early inconclusive discussion around an early implementation that modified the GLSL linker (which is NOT being proposed here)
See http://www.mail-archive.com/mes...@li.../msg12016.html for some more discussion that seemed to mostly reach a consensus over the approach proposed here.
See in particular http://www.mail-archive.com/mes...@li.../msg12041.html .

That said, I'm going to try to repeat all information here, partially by copy&pasting from earlier messages.
This message should probably be adapted into gallium/docs if/when this is accepted.

Here is the short summary; the long rationale follows after it.

The proposal here is to add the following limitations to Gallium, for the intermediate semantics:
1. TGSI_SEMANTIC_NORMAL is removed, using a commit by Michal Krol that was never merged
2. Every semantic except GENERIC, COLOR and BCOLOR can only be used with semantic index 0
3. COLOR and BCOLOR can only be used with semantic index 0-1 (note that this doesn't apply to fragment outputs)
4. GENERIC can be used with semantic indices 0-218 on any driver, if BCOLOR is not used
5. GENERIC can be used with semantic indices 0-216 on any driver, if BCOLOR IS used
6. GENERIC can be used with semantic indices 0-255 on almost all drivers (those that don't need the 0-218 limitation)
7. Some drivers may also choose to support GENERIC with arbitrary indices, but that should generally not happen
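
The limits above can be read as a simple validity check. The following is a minimal sketch, not code from the patch series: the `sem_name` enum, the macro names and `sem_is_portable` are hypothetical, and the enum values are not the real TGSI_SEMANTIC_* values. It covers only the portable limits (rules 2 to 5) and ignores the fragment output exemption.

```c
#include <stdbool.h>

/* Hypothetical local names; they mirror the semantic names used above
 * but are NOT the real TGSI_SEMANTIC_* enum values. */
enum sem_name {
    SEM_POSITION,
    SEM_COLOR,
    SEM_BCOLOR,
    SEM_FOG,
    SEM_PSIZE,
    SEM_GENERIC,
    SEM_FACE
};

/* Highest GENERIC index that is portable to any driver (rules 4 and 5). */
#define MAX_GENERIC_NO_BCOLOR   218u
#define MAX_GENERIC_WITH_BCOLOR 216u

/* Returns true if (name, index) is usable on any driver under the
 * proposed rules. uses_bcolor says whether BCOLOR is used at all.
 * Fragment shader outputs are exempt from the COLOR rule and are not
 * handled here. */
bool sem_is_portable(enum sem_name name, unsigned index, bool uses_bcolor)
{
    switch (name) {
    case SEM_GENERIC:
        return index <= (uses_bcolor ? MAX_GENERIC_WITH_BCOLOR
                                     : MAX_GENERIC_NO_BCOLOR);
    case SEM_COLOR:
    case SEM_BCOLOR:
        return index <= 1;
    default:
        /* Every other semantic is a singleton. */
        return index == 0;
    }
}
```

A driver without the SM3 constraint would relax the GENERIC case to 255, as in rule 6.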

The reason for this, in short, is that this maps directly to DirectX 9 SM3, which is the most problematic interface of all.

The peculiar problem we have here is that we have two competing constraints that force us into choosing the exact SM3 value:
1. The VMware SVGA driver must deal with an SM3 host interface and would ideally want to directly feed the Gallium semantics to the host
2. A hypothetical DirectX 9 state tracker needs to support SM3 and would ideally want to directly feed the SM3 semantics to Gallium

Note that this is not a reference to the VMware DirectX 9 state tracker, since its authors haven't provided details about its handling of shader semantics.

SM3 ends up supporting 219 generic indices: 16 indices in 14 classes, minus POSITION0, PSIZE0, COLOR0, COLOR1 and FOG0 which are the only ones that wouldn't be mapped to GENERIC.
However, Gallium drivers that don't benefit from having specific constraints (like svga and r600) are supposed to support 256 indices, and my nv30/nv40 work does that.

The expected implementation, if no hardware support exists, is to build a list of relocations to apply to either the fragment or the vertex shader, and patch one of them at validation time to match the other.
Data structures are provided in gallium/auxiliary to ease this, and try to minimize the number of times where this needs to be performed.
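
To make the relocation idea concrete, here is a minimal sketch of patching a fragment program to match a vertex program. It is not the code in gallium/auxiliary: the struct names, `fp_apply_relocs`, and the assumption that the input register number sits in the low 8 bits of an instruction word are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_GENERIC 256
#define SLOT_UNUSED 0xff

/* One fragment program input reference that must be patched. */
struct fp_reloc {
    uint32_t word;          /* index of the instruction word to patch */
    uint8_t  generic_index; /* GENERIC semantic index the input reads */
};

/* Hardware output slot the vertex program writes each GENERIC index
 * to, or SLOT_UNUSED if the vertex program does not write it. */
struct vp_outputs {
    uint8_t slot[NUM_GENERIC];
};

/* Hypothetical encoding: input register number in bits 0-7. */
#define INPUT_REG_MASK 0xffu

/* Patches every relocated word so that it reads the slot the vertex
 * program writes. Returns the number of fragment inputs that the
 * vertex program does not feed; those words are left untouched. */
unsigned fp_apply_relocs(uint32_t *code,
                         const struct fp_reloc *relocs, size_t num_relocs,
                         const struct vp_outputs *vp)
{
    unsigned unmatched = 0;

    for (size_t i = 0; i < num_relocs; i++) {
        uint8_t slot = vp->slot[relocs[i].generic_index];
        if (slot == SLOT_UNUSED) {
            unmatched++;
            continue;
        }
        uint32_t *w = &code[relocs[i].word];
        *w = (*w & ~INPUT_REG_MASK) | slot;
    }
    return unmatched;
}
```

Because the vertex program side is a plain 256-entry table, the patch is linear in the number of relocations, and it only needs to be redone when the bound vertex/fragment pair changes.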

Let's now proceed to the discussion and detailed rationale, mostly constructed by copy&pasting older messages.

===============
Michal Krol's proposal
===============

First of all, see Michal Krol's proposal at http://www.opensource-archive.org/showthread.php?t=148573, and in particular:
<<
name index range
----------------------------
POSITION no limit?
COLOR 0..1, explicit clamp?
BCOLOR 0..1, explicit clamp?
FOG remove?
PSIZE 0
GENERIC 0..<max generics>
NORMAL remove
FACE 0
EDGEFLAG 0
PRIMID 0
INSTANCEID 0
>>

My proposal follows this, except for limiting POSITION to 0 too.
Not sure why Michal thought "no limit" could make sense: the POSITION is fundamentally a singleton, since it is the input to the rasterizer unit.

======================
An overview of hardware support
======================

Hardware with no capabilities.
- nv30 does not support any mapping. However, we already need to patch
fragment programs to insert constants, so we can patch input register
numbers as well. The current driver only supports 0-7 generic indices,
but I already implemented support for 0-255 indices with in-driver
linkage and patching. Note that nv30 lacks control flow in fragment
programs.
- nv40 is like nv30, but supports fp control flow, and may have some
configurable mapping support, with unknown behavior

Hardware with capabilities that must be configured for each fp/vp pair.
- nv40 might have this but the nVidia OpenGL driver does not use them
- nv50 has configurable vp->gp and gp->fp mappings with 64 entries.
The current Gallium driver seems to support arbitrary 0-2^32 indices, but uses an inefficient O(n^2) algorithm to be able to do that

- r300 appears to have a configurable vp->fp mapping. The current
driver only supports 0-15 generic indices, but redefining
ATTR_GENERIC_COUNT could be enough to have it support larger numbers.

Hardware with automatic linkage when semantics match:
- VMware svga appears to support 14 * 16 semantics, but the current
driver only supports 0-15 generic indices. This could be fixed by
mapping GENERIC into all non-special SM3 semantics.

Hardware that can do both configurable mappings and automatic linkage:
- r600 supports linkage in hardware between matching apparently
byte-sized semantic ids

Other hardware:
- i915 has no hardware vertex shading
The current driver is broken and only supports 0-7 indices: this seems
easy to fix though
- Not sure about i965

===================
An overview of software APIs
===================

1. DirectX 9 SM3 supports indices in the 0-15 range associated with 
semantics in the 0-13 range.

A few of the name/index pairs have special meanings, but the others
are just cosmetic as long as the fixed pipeline is not used.

Thus, SM3 wants to use 14 * 16 indices overall.

Of these, POSITION0, PSIZE0, COLOR0, COLOR1 and FOG0 map to non-GENERIC
semantics, leaving 219 semantics handled by GENERIC
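
The count can be checked with a small sketch that enumerates the SM3 name/index pairs and skips the five special ones. The usage numbers are the D3DDECLUSAGE values; the nibble packing and the function names are assumptions made for illustration, and are not necessarily what the "packing semantics in SM3 byte values" patch does.

```c
#include <stdint.h>

#define SM3_NUM_USAGES  14  /* usages 0-13 */
#define SM3_NUM_INDICES 16  /* usage indices 0-15 */
#define SM3_NUM_BYTES   (SM3_NUM_USAGES * SM3_NUM_INDICES)  /* 224 */

/* Assumed packing: usage in the high nibble, index in the low nibble. */
#define SM3_PACK(usage, index) ((uint8_t)(((usage) << 4) | (index)))

/* D3DDECLUSAGE values of the usages that keep a dedicated semantic. */
enum {
    USAGE_POSITION = 0,
    USAGE_PSIZE    = 4,
    USAGE_COLOR    = 10,
    USAGE_FOG      = 11
};

/* True for the five pairs that map to non-GENERIC semantics. */
int sm3_byte_is_reserved(uint8_t b)
{
    return b == SM3_PACK(USAGE_POSITION, 0) ||
           b == SM3_PACK(USAGE_PSIZE, 0) ||
           b == SM3_PACK(USAGE_COLOR, 0) ||
           b == SM3_PACK(USAGE_COLOR, 1) ||
           b == SM3_PACK(USAGE_FOG, 0);
}

/* Returns the packed SM3 byte for GENERIC index g, i.e. the g-th
 * non-reserved pair in ascending order, or -1 if g is out of range. */
int generic_to_sm3_byte(unsigned g)
{
    for (unsigned b = 0; b < SM3_NUM_BYTES; b++) {
        if (sm3_byte_is_reserved((uint8_t)b))
            continue;
        if (g == 0)
            return (int)b;
        g--;
    }
    return -1;
}
```

With this ordering, GENERIC 0 lands on POSITION1 and GENERIC 218 on the last pair (usage 13, index 15), so exactly 224 - 5 = 219 indices fit.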

2. SM2 and non-GLSL OpenGL just want to use as many indices as the
hardware interpolator count, sometimes limiting that further

They are the easiest and most straightforward ones.

3. DirectX 10 seems to only require a 0-31 range.

In particular, the fxc.exe compiler allows specifying arbitrary _strings_ and
32-bit indices.

However, this information is encoded as metadata in the output file, and
the shader bytecode itself uses integers in the 0-31 range to refer to the
metadata.

It seems that the metadata is resolved by the Microsoft DirectX 10 runtime,
and the driver only sees 0-31 indices on the DDI interface.

However, this is a bit unclear: confirmation or correction would be
appreciated.

4. GLSL requires both shaders to be provided at link time, and thus does
not constrain the implementation in any way.

However, it may be possible to mix GLSL with other shaders, leading to
the need to reserve the texcoord slots.

In that case, GLSL will need about 8 more slots than the number of
effectively used semantics.

This is the case with the current Mesa/Gallium implementation.

5. GLSL with EXT_separate_shader_objects does not add requirements
because only gl_TexCoord and other builtin varyings are supported.
User-defined varyings are not supported.

See in particular the following text from the extension:
<<
        It is undesirable from a performance standpoint to attempt to
        support "rendezvous by name" for arbitrary separate shaders
        because the separate shaders won't be naturally compiled to
        match their varying inputs and outputs of the same name without
        a special link step.  Such a special link would introduce an
        extra validation overhead to binding separate shaders.  The link
        itself would have to be deferred until glBegin time since separate
        shaders won't match when transitioning from one set of consistent
        shaders to another.  This special link would still create errors
        or undefined behavior when the names of input and output varyings
        matched but their types did not match.
>>

6. A hypothetical version of EXT_separate_shader_objects extended to
support user-defined varyings would either want arbitrary 32-bit
generic indices (by interning strings to generate the indices) or the
ability to specify a custom mapping between shader indices

7. A hypothetical "no-op" implementation of the GLSL linker would have
the same requirement
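
As an illustration of "interning strings to generate the indices" in item 6, here is a minimal sketch. The `name_table` type, its size limits and `intern_varying` are hypothetical; a real implementation would likely use a hash table and would not cap the table at 256 entries, which is exactly why such a scheme wants large index ranges.

```c
#include <string.h>

#define MAX_INTERNED 256
#define MAX_NAME_LEN 64

/* Table of varying names seen so far; the position of a name in the
 * table is its generic index. */
struct name_table {
    char     names[MAX_INTERNED][MAX_NAME_LEN];
    unsigned count;
};

/* Returns the index assigned to name, adding it to the table if it is
 * new. Returns -1 if the name is too long or the table is full. */
int intern_varying(struct name_table *t, const char *name)
{
    if (strlen(name) >= MAX_NAME_LEN)
        return -1;

    for (unsigned i = 0; i < t->count; i++)
        if (strcmp(t->names[i], name) == 0)
            return (int)i;

    if (t->count == MAX_INTERNED)
        return -1;

    strcpy(t->names[t->count], name);
    return (int)t->count++;
}
```

Two separately compiled shaders that intern the same varying name against the same table get the same index, so they link without a combined link step.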

====================
About non-GENERIC semantics
====================

Also note that non-GENERIC semantics have peculiar properties.

For COLOR and BCOLOR:
1. SM3 and OpenGL with glClampColor appropriately set want it to
_not_ be clamped to [0, 1]
2. SM2 and normal OpenGL apparently want it to be clamped to [0, 1]
(sometimes for fixed point targets only) and may also allow using
U8_UNORM precision for it instead of FP32
3. OpenGL allows enabling two-sided lighting, in which case COLOR in
the fragment shader is automagically set to BCOLOR for back faces
4. Older hardware (e.g. nv30) tends to support BCOLOR but not FACING.
Some hardware (e.g. nv40) supports both FACING and BCOLOR in hardware.
The latest hardware probably supports FACING only.

Any API that requires special semantics for COLOR and BCOLOR (i.e.
non-SM3) seems to only want 0-1 indices.

Note that SM3 does *not* include BCOLOR, so basically the limits for
generic indices would need to be conditional on BCOLOR being present
or not (e.g. if it is present, we must reserve two semantic slots in
svga for it).

POSITION0 is obviously special.
PSIZE0 is also special for points.

FOG0 seems right now to just be a GENERIC with a single component.
Gallium could be extended to support fixed function fog, which most
DX9 hardware supports (nv30/nv40 and r300). This is mostly orthogonal
to the semantic issue.

==============
Current Gallium users
==============

Right now no open-source users of Gallium fundamentally require arbitrary indices.
In particular:
1. GLSL and anything with similar link-by-name can of course be modified to use sequential indices
2. ARB fragment program and vertex program use index-limited texcoord slots
3. g3dvl needs and uses 8 texcoord slots, indices 0-7
4. vega and xorg use indices 0-1
5. DX10 seems to restrict semantics to 0-N range, if I'm not mistaken
6. The GL_EXT_separate_shader_objects extension does not provide
arbitrary index matching for GLSL, but merely lets it use a model
similar to ARB fp/vp

However, the GLSL linker needs them in its current form, and the capability can be generally useful anyway.

===================
Discussion of possible options
===================

[Options from Keith Whitwell, see http://www.opensource-archive.org/showthread.php?p=180719]
a) Picking a lower number like 128, that an SM3 state tracker could
usually be able to directly translate incoming semantics into, but which
would force it to renumber under rare circumstances. This would make
life easier for the open drivers at the expense of the closed code.

b) Picking 256 to make life easier for some closed-source SM3 state
tracker, but harder for open drivers.

c) Picking 219 (or some other magic number) that happens to work with
the current set of constraints, but makes gallium fragile in the face of
new constraints.

d) Abandoning the current gallium linkage rules and coming up with
something new, for instance forcing the state trackers to renumber
always and making life trivial for the drivers...

[Options from me]

(e) Allow arbitrary 32-bit indices. This requires slightly more
complicated data structures in some cases, and will require svga and
r600 to fall back to software linkage if numbers are too high.

(f) Limit semantic indices to hardware interpolators _and_ introduce
an interface to let the user specify an

Personally I think the simplest idea for now could be to have all
drivers support 256 indices or, in the case of r600 and svga, the
maximum value supported by the hardware, and expose that as a cap (as
well as another cap for the number of different semantic values
supported at once).
The minimum guaranteed value is set to the lowest hardware constraint,
which would be svga with 219 indices (assuming no bcolor is used).
If some new constraints pop up, we just lower it and change SM3 state
trackers to check for it and fall back otherwise.
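
On the state tracker side, the check-and-fall-back logic could look like the sketch below. This is an assumption about how such a cap might be consumed, not part of the patches: the cap itself is still listed as an open issue, and `choose_linkage_path` and `renumber_generics` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Guaranteed minimum under this proposal: GENERIC indices 0-218. */
#define MIN_GUARANTEED_GENERIC_COUNT 219u

enum linkage_path { LINK_DIRECT, LINK_RENUMBER };

/* driver_generic_count: number of GENERIC indices the driver reports
 * through the (hypothetical) cap.
 * highest_index: highest GENERIC index the state tracker would emit. */
enum linkage_path choose_linkage_path(unsigned driver_generic_count,
                                      unsigned highest_index)
{
    return highest_index < driver_generic_count ? LINK_DIRECT
                                                : LINK_RENUMBER;
}

/* Fallback: compacts the used indices into 0..n-1. used[] must be the
 * union of the indices used by both shaders so that they share one
 * remap table. Unused entries are set to 0xff. Returns n. */
unsigned renumber_generics(const bool used[256], uint8_t remap[256])
{
    unsigned next = 0;

    for (unsigned i = 0; i < 256; i++)
        remap[i] = used[i] ? (uint8_t)next++ : 0xff;
    return next;
}
```

An SM3 state tracker that stays within the guaranteed 219 indices always takes the direct path; renumbering only comes into play if the minimum is lowered later.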

This should just require simple fixes to svga and r300, and
significant code for nv30/nv40, which is however already implemented.

Luca Barbieri (5):
  tgsi: formalize limits on semantic indices
  tgsi: add support for packing semantics in SM3 byte values
  gallium/auxiliary: add semantic linkage utility code
  nvfx: support proper shader linkage - adds glsl support
  nvfx: expose GLSL

Michal Krol (1):
  gallium: Remove TGSI_SEMANTIC_NORMAL.
