R8000-optimized dual-pixel (R8010 FPU) calculation.
This code uses a 2x unroll to reach 25% of the theoretical peak
of the R8010.
Alas, the dual FPUs should really deliver 50%, which is
the R8010 real-world peak (when not using MADDs only).
It seems we cannot hide the *huge* 4-cycle ADD/SUB
latencies in the short mandel loop.
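
For illustration, here is a minimal C sketch of the dual-pixel idea. It is not the project's actual code: the function name mandel2 and its parameters are hypothetical, and the real loop may well be hand-scheduled assembly. The sketch only shows why a 2x unroll helps: the arithmetic for pixel A never depends on pixel B, so one pixel's ADD/SUB can issue while the other's is still in flight.

```c
#include <assert.h>

/* Hypothetical sketch: iterate two pixels (a and b) in one loop.
 * The two dependency chains are independent, which gives the
 * scheduler work to overlap with the ADD/SUB latency. */
void mandel2(double cra, double cia, double crb, double cib,
             int max_iter, int *iters_a, int *iters_b)
{
    double zra = 0.0, zia = 0.0;   /* z for pixel a */
    double zrb = 0.0, zib = 0.0;   /* z for pixel b */
    int na = 0, nb = 0;            /* completed iterations per pixel */
    int live_a = 1, live_b = 1;    /* 0 once the pixel has escaped */

    assert(max_iter >= 0);

    for (int i = 0; i < max_iter && (live_a || live_b); i++) {
        /* Six multiplies, none depending on another. */
        double ra2 = zra * zra, rb2 = zrb * zrb;
        double ia2 = zia * zia, ib2 = zib * zib;
        double rai = zra * zia, rbi = zrb * zib;

        /* Escape test: |z|^2 > 4. An escaped pixel stays escaped
         * because its z is no longer updated. */
        if (ra2 + ia2 > 4.0) live_a = 0;
        if (rb2 + ib2 > 4.0) live_b = 0;

        /* z = z^2 + c, interleaved for the two pixels. */
        if (live_a) { zra = ra2 - ia2 + cra; zia = 2.0 * rai + cia; na++; }
        if (live_b) { zrb = rb2 - ib2 + crb; zib = 2.0 * rbi + cib; nb++; }
    }

    *iters_a = na;
    *iters_b = nb;
}
```

With only two chains in flight there is still too little independent work to cover a 4-cycle latency on every ADD/SUB, which is consistent with the stall described above.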
A future quad-pixel version with 4x unroll should reach
The R8010 is a great FPU, but the lame 4-cycle latency
for ADD/SUB is a showstopper.
So far we only get 7.5 MegaIters/sec on my 75 MHz R8000,
i.e. 10 cycles per iteration.
That is only about 1 FLOP/cycle.
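
The arithmetic behind these figures can be checked with a small C sketch. The clock rate and iteration rate come from the text. The count of roughly 10 FLOPs per pixel iteration is an assumption, chosen because it is what the quoted 1 FLOP/cycle implies. The peak of 4 FLOP/cycle is likewise inferred from "25% of peak is about 1 FLOP/cycle". The function names are hypothetical.

```c
/* Cycles spent per iteration, from clock rate and measured throughput. */
double cycles_per_iter(double clock_hz, double iters_per_sec)
{
    return clock_hz / iters_per_sec;
}

/* FLOP/cycle, given an ASSUMED number of FLOPs per iteration. */
double flops_per_cycle(double flops_per_iter, double cycles)
{
    return flops_per_iter / cycles;
}

/* Fraction of peak, given an INFERRED peak in FLOP/cycle. */
double fraction_of_peak(double achieved, double peak)
{
    return achieved / peak;
}
```

Calling cycles_per_iter(75e6, 7.5e6) gives 10 cycles per iteration. With the assumed 10 FLOPs per iteration that is 1 FLOP/cycle, or a quarter of an inferred 4 FLOP/cycle peak.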