On Wed, Oct 14, 2009 at 3:02 PM, Nicolai Hähnle <nhaehnle@...> wrote:
> Alex, I added you to the CC in case you can help clarify the points on R500
> vertex programs.
> On Wednesday 14 October 2009 08:20:42, Ian Romanick wrote:
>> > Issue 2:
>> > 1) R500 supports unstructured branching in fragment programs but not in
>> > vertex programs, so I'm happy about leaving it out.
>> Weird. That's backwards from how other SM3 GPUs do it. Usually you get
>> unstructured branching in the AoS vertex shader.
> I agree. To be honest, the vertex processor documentation for R500 confuses
> the hell out of me. Somehow, the way it is written it suggests that there is a
> JUMP instruction that can only jump based on a constant register, which just
> seems extremely bizarre, but the documentation is quite consistent about it,
> because it tells us to use conditional write instructions to implement if-
> Maybe there is a part which is simply missing? I also see neither JUMP nor
> LOOP opcodes anywhere, just registers describing the first and last
> instruction pointer for a loop.
It's not part of the actual shader code per se; it's implemented in
parallel to the vertex shader and interacts with it. See the
VAP_PVS_FLOW_CNTL_* regs. r3xx-r5xx have them so it should be
possible on all 3 generations; r5xx supports longer programs however.
>> > 2) R500 supports address registers as described in vertex programs
>> > (including input/output offsets), but has no address registers at all in
>> > fragment programs. A loop address register can be used as offsets in
>> > loops, but the values loaded into this register must be determined at
>> > compile time.
>> I had intended to move the grammar for ARL and ARR out of the generic
>> GPU grammar and into the vertex program-specific grammar. The intention
>> is that LOOP/ENDLOOP is the only way to load an address register in a
>> fragment program. LOOP/ENDLOOP set the .x component and leave the other
>> components undefined. Since the ENDLOOP restores the "previous" value
>> of the address register, the last ENDLOOP restores garbage. My
>> intention was to provide consistent syntactic sugar over the constrained
>> functionality of the loop index.
> Sounds good.
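
To make the LOOP/ENDLOOP behaviour described above concrete, here is a minimal Python sketch of an address register whose .x component is set by LOOP and restored by ENDLOOP. The class name `AddressRegister`, the `loop` method and its `count`, `start` and `step` parameters are assumptions made for illustration; they are not taken from the proposed grammar or from any hardware documentation.

```python
UNDEFINED = object()  # stands in for "garbage": no value has ever been loaded


class AddressRegister:
    """Hypothetical model of a loop-only address register (only .x is tracked)."""

    def __init__(self):
        self.x = UNDEFINED
        self._saved = []  # values of .x saved by enclosing LOOPs

    def loop(self, count, start=0, step=1):
        # LOOP: remember the previous .x, then drive .x through the loop index.
        self._saved.append(self.x)
        for i in range(count):
            self.x = start + i * step
            yield self.x
        # ENDLOOP: restore whatever .x held before this LOOP began.
        self.x = self._saved.pop()


a0 = AddressRegister()
for outer in a0.loop(2):
    for inner in a0.loop(3, start=10):
        pass
    # The inner ENDLOOP has restored the outer loop's index here.
    assert a0.x == outer
# The outermost ENDLOOP restores the never-initialised value.
assert a0.x is UNDEFINED
```

In this model the value left behind by the last ENDLOOP is the placeholder `UNDEFINED`, which mirrors the remark that the final restore yields garbage.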
>> > I think we can do everything you throw at us on R500. The only difficulty
>> > is that R500 is a bit schizophrenic in that vertex programs are very
>> > different from fragment programs, but we can emulate things. The only
>> > stupid weakness is that swizzling predicates in fragment programs is
>> > essentially impossible (the only natively supported swizzles are .rgba
>> > and the smears .rrrr, .gggg, .bbbb, .aaaa). Obviously we can emulate
>> > this.
>> How painful would it be to emulate? We could restrict the set of
>> available predicate swizzles. I think this matches D3D, so it shouldn't
>> be a problem for Wine.
> I'd always be happier if I didn't have to do it, but it's certainly easier
> than what we're already doing for R300 fragment programs anyway. The question
> is whether you want to add a fragment-program-only restriction to the provided
> swizzles. I don't feel very strongly either way.
>> > Issue 11:
>> > R500 supposedly supports relative addressing of temporary registers in
>> > vertex programs, and also in fragment programs (but only using loop
>> > indices). I have never tested whether it actually works, though.
>> This would be a good feature to have. Would it be possible to hack up a
>> test? Do you know of any limitations?
> Will do this weekend, at least for vertex programs; I don't know of any
> limitations.
> I don't know if I'll get to hacking something up for fragment programs soon,
> because that's slightly more involved (I haven't done fragment program loops
>> > Issue 13:
>> > Similar to issue 2, R500 fragment programs support unstructured
>> > everything but vertex programs don't, so not overlapping sounds good to
>> > me.
>> > Issue 15:
>> > I know R500 fragment programs can support a CONT, but I'm not so familiar
>> > with the R500 vertex programs, and they seem generally less flexible.
>> I didn't see an explicit CONT instruction. If there's no unstructured
>> branch, there probably isn't a way to do it.
>> > Issue 17:
>> > I would *expect* negative addressing offsets to work on R500, but somehow
>> > I haven't been able to get them to work. I'll see if I can look into it
>> > again.
>> No hardware that I'm aware of supports true negative offsets in the
>> instructions. This is made to work with program parameters by putting
>> the base of the array at a large enough positive offset to make the
>> largest negative offset be zero. For example, if the program uses
>> my_array[A0.x - 10], the driver has to place my_array at parameter slot
>> 10 or higher.
> I see.
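
The base-placement trick described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name `place_array` and the `first_free_slot` parameter are hypothetical and do not correspond to any real driver interface.

```python
def place_array(offsets, first_free_slot=0):
    """Pick a base parameter slot so that no encoded offset is negative.

    offsets: the constant offsets the program uses with the address
             register, e.g. -10 for my_array[A0.x - 10].
    Returns (base, encoded), where encoded maps each source offset to the
    non-negative offset that would be written into the instruction.
    """
    most_negative = min(min(offsets, default=0), 0)
    # The base must be at least as large as the biggest negative offset.
    base = max(first_free_slot, -most_negative)
    encoded = {off: base + off for off in offsets}
    return base, encoded


# my_array[A0.x - 10] forces the array to slot 10 or higher.
base, encoded = place_array([-10, 0, 3])
assert base == 10
assert encoded == {-10: 0, 0: 10, 3: 13}
```

This only works for storage the driver is free to relocate, which is why the same approach does not carry over to attributes and results.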
>> I don't think we can do similar trickery for attributes and results. I
>> think we may have to leave the negative offsets just for program
>> parameters and only allow positive offsets for attributes and results.
>> Note that NV_gpu_program4 only allows positive offsets. It can get away
>> with this because SM4 has general purpose integer instructions and any
>> register can be used for indirect addressing.
> Well, one possible trick that I believe Corbin suggested was transforming:
> ARL A0.x, R.x;
> MOV R, CONST[A0.x - 5];
> into:
> SUB TMP.x, R.x, 5;
> ARL A0.x, TMP.x;
> MOV R, CONST[A0.x];
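
As a sanity check on the rewrite quoted above, the following Python sketch simulates both instruction sequences and compares the fetched constants. It assumes that the second ARL is meant to load the adjusted temporary (TMP.x) and that ARL takes the floor of its operand, as in ARB_vertex_program. The function names are hypothetical.

```python
import math


def fetch_with_offset(const, r_x, offset):
    """ARL A0.x, R.x;  MOV R, CONST[A0.x + offset];"""
    a0 = math.floor(r_x)  # ARL rounds toward negative infinity
    return const[a0 + offset]


def fetch_rewritten(const, r_x, offset):
    """SUB TMP.x, R.x, 5;  ARL A0.x, TMP.x;  MOV R, CONST[A0.x];"""
    tmp = r_x + offset    # the offset is folded in before ARL
    a0 = math.floor(tmp)
    return const[a0]


const = [f"c{i}" for i in range(16)]
for r_x in (5.0, 5.75, 7.25, 12.5):
    assert fetch_with_offset(const, r_x, -5) == fetch_rewritten(const, r_x, -5)
```

The two forms agree because the offset is an integer, so subtracting it before or after the floor gives the same index. The cost of the rewrite is one extra instruction and one temporary per relative access.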
>> > Issue 34:
>> > I don't see any support for an address register stack on R500, or
>> > anything else to provide for a subroutine stack.
>> If you can do relative addressing of temporaries, you can fake a small
>> stack. It's ugly, but it's possible. Of course, without address
>> register math it's even more ugly.
> True, that's a good argument in favour of relative addressing of temporaries.
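
For illustration, here is a minimal Python sketch of the "small stack in temporaries" idea. It assumes only relative addressing of temporaries and a stack pointer kept in an ordinary temporary, so the pointer has to be reloaded into the address register before every access. The class `TempStack` and its fields are hypothetical, and the bounds checks exist only in the simulation; hardware would not raise errors.

```python
class TempStack:
    """Hypothetical stack stored in temps[base : base + depth]."""

    def __init__(self, temps, base, depth):
        self.temps = temps
        self.base = base
        self.depth = depth
        self.sp = 0.0  # stack pointer lives in a normal float temporary

    def push(self, value):
        a0 = int(self.sp)                    # ARL A0.x, SP.x
        if a0 >= self.depth:
            raise OverflowError("stack full (simulation-only check)")
        self.temps[self.base + a0] = value   # MOV TEMP[base + A0.x], value
        self.sp += 1.0                       # ADD SP.x, SP.x, 1

    def pop(self):
        if self.sp <= 0.0:
            raise IndexError("stack empty (simulation-only check)")
        self.sp -= 1.0                       # SUB SP.x, SP.x, 1
        a0 = int(self.sp)                    # ARL A0.x, SP.x
        return self.temps[self.base + a0]    # MOV R, TEMP[base + A0.x]


temps = [None] * 8
stack = TempStack(temps, base=4, depth=4)
stack.push("ret_a")
stack.push("ret_b")
assert stack.pop() == "ret_b"
assert stack.pop() == "ret_a"
assert temps[:4] == [None] * 4  # temporaries below the base are untouched
```

Every push and pop costs an extra address-register load plus an add or subtract on the pointer, which is the ugliness the exchange above refers to.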
>> I'll post an updated version in the morning with the grammar change (for
>> ARL and ARR) and the documentation for the other predicate-set