I wouldn’t call it (recent ATI) a scalar architecture; it’s more like a VLIW architecture, where you explicitly schedule the various slots.

 

Peter-Pike

 

From: gdalgorithms-list-bounces@lists.sourceforge.net [mailto:gdalgorithms-list-bounces@lists.sourceforge.net] On Behalf Of Emil Persson
Sent: Tuesday, February 12, 2008 1:37 PM
To: 'Game Development Algorithms'
Subject: Re: [Algorithms] Dummie Matrix math questions

 

I wrote that document, and either you’re reading it wrong or you misunderstood what Jon and Marco said.

The HD 2000 series is a scalar architecture, so it’s not necessary to vectorize code the way you would on R520 or G70. It is, however, parallel rather than serial. The important thing is to parallelize the code so that each scalar instruction can be issued into a separate slot. Since vectorized code is by nature also parallel, it will run fast, so vectorized code is not a bad thing. But unlike on earlier generations, parallel code that’s not vectorized also runs fast, since all independent scalars can be computed in parallel. What doesn’t run fast is a chain of serial scalar dependencies, since these lead to poor utilization of the shader cores. In typical shaders this is not much of a problem, but you could construct pathological cases where throughput would be 1/5 of the maximum.

To parallelize your code, it’s recommended that you use parentheses to break up long expressions, since HLSL evaluates expressions from left to right, just like C/C++. So A+B+C+D is computed as ((A+B)+C)+D, whereas (A+B)+(C+D) could run in one instruction fewer (assuming all are scalars), because A+B and C+D don’t depend on each other and can be issued together.
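
To make the parenthesization point concrete, here is a toy model in Python. It is only a sketch, not how the real shader compiler schedules anything. It assumes 5 slots per instruction (taken from the 1/5 figure above), that every scalar op completes in one instruction, and that a result can be used in the next instruction. The names schedule and utilization are made up for this illustration.

```python
def schedule(ops, slots=5):
    """Greedily pack scalar ops into instructions of at most `slots` ops.

    `ops` maps each result name to the names it reads. Names that are not
    keys (A, B, ...) are plain inputs available from the start. Entries
    must be listed in dependency order.
    Returns a list: how many slots are filled in each instruction.
    """
    issued_in = {}  # result name -> index of the instruction computing it
    fill = []       # fill[i] = number of ops placed in instruction i
    for name, inputs in ops.items():
        # An op can issue no earlier than one instruction after its latest input.
        i = max((issued_in[x] + 1 for x in inputs if x in issued_in), default=0)
        while i < len(fill) and fill[i] >= slots:
            i += 1  # that instruction is full, try the next one
        if i == len(fill):
            fill.append(0)
        fill[i] += 1
        issued_in[name] = i
    return fill


def utilization(fill, slots=5):
    """Fraction of available slots that actually did work."""
    return sum(fill) / (slots * len(fill))


# ((A+B)+C)+D: every add waits for the previous one.
serial = {"t0": ["A", "B"], "t1": ["t0", "C"], "t2": ["t1", "D"]}

# (A+B)+(C+D): the first two adds are independent.
balanced = {"t0": ["A", "B"], "t1": ["C", "D"], "t2": ["t0", "t1"]}

print(schedule(serial))    # [1, 1, 1] -> 3 instructions
print(schedule(balanced))  # [2, 1]    -> 2 instructions
print(utilization(schedule(serial)))  # 0.2, i.e. the pathological 1/5 case
```

Both forms perform three adds. The parenthesized one just shortens the dependency chain, which is where the saved instruction comes from in this model.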

 

-Emil

 

 


From: gdalgorithms-list-bounces@lists.sourceforge.net [mailto:gdalgorithms-list-bounces@lists.sourceforge.net] On Behalf Of Jesús de Santos García
Sent: Tuesday, February 12, 2008 2:47 PM
To: Game Development Algorithms
Subject: Re: [Algorithms] Dummie Matrix math questions

 

The same is true for ATI too, as described in the ATI Radeon HD 2000 Programming Guide. Non-vectorized code runs efficiently, and it is recommended for cases where vector instructions are not needed.

http://ati.amd.com/developer/SDK/AMD_SDK_Samples_May2007/Documentations/ATI_Radeon_HD_2000_programming_guide.pdf

Good to know that NVIDIA and Intel are using the same architecture.

On Feb 8, 2008 7:34 PM, Marco Salvi <marcotti@gmail.com> wrote:

 

On Feb 8, 2008 9:42 AM, Jon Watte <hplus@mindcontrol.org> wrote:

 

I'm told that the implementation of modern 4-way SIMD on graphics cards
is now done as serial instructions, so there's little benefit to be had
compared to just coding it out.


This is true for NVIDIA (G8x architecture) and the latest Intel GPUs, but not for AMD/ATI GPUs.
But yes, I guess everyone will sooner or later move to the same model.

Marco


_______________________________________________
GDAlgorithms-list mailing list
GDAlgorithms-list@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list
Archives:
http://sourceforge.net/mailarchive/forum.php?forum_name=gdalgorithms-list