Re: [atlas-devel] PowerPC performance, was Floating point operations in DGEMM and DSYRK
From: Ian O. <ia...@ap...> - 2013-03-06 23:27:05
AltiVec on these G4s did single precision but not double precision. Peak throughput was 8 single-precision flops per cycle per core. For double precision it was formally 1. However, the FPU was only capable of keeping 4 out of 5 stages busy due to underprovisioned reservation stations, so the peak theoretical performance for single precision in something like sgemm was about 5x what you could get for double. For a unicore machine running at 1.33 GHz, we would expect something south of 10.6 GFlops for single precision.

Based on the results you present, it would not be surprising if you eventually discover that you are running a simple unoptimized scalar loop for most of the calculation. A formulation of sgemm that used double-precision accumulators would make single slower than double because of the extra work to convert all the single-precision values to double.

If you have a tool capable of sampling the code at instruction granularity while it runs, sending us the assembly code for the hot loop should allow us to find the problem quickly.

Ian

On Mar 6, 2013, at 3:07 PM, José Luis García Pallero <jgp...@gm...> wrote:

> 2013/3/6 Brooks Moses <bro...@me...>:
>> [writing off-list so as not to clutter everyone's inboxes]
>>
>> Hello José,
>>
>> José Luis García Pallero wrote, at 3/6/2013 2:43 AM:
>>> I have an Apple iBook PPC G4 running Debian GNU/Linux and ATLAS 3.8.4
>>> installed. Probably tonight I could try some benchmarks with
>>> cblas_sgemm() and cblas_ssyrk(). I will post the results here. The
>>> laptop has 1.5 GB of memory. Which range of dimensions do you prefer
>>> for the tests? 100 to 1000, 1000 to 10000?
>>
>> The smaller range would be ideal -- and thank you very much for offering!
>
> Hello:
>
> Apple iBook G4, CPU type: PowerPC G4 (1.5), Debian GNU/Linux, ATLAS
> 3.8.4 from the Debian repositories (NOT compiled by myself, but as
> there is not much variety of PPC G4, it should be well optimized; see
> results compared with veclib for double in the original post of this
> thread).
>
> The tests were performed for M=N=K=100, 200, ..., 1000 using
> cblas_sgemm and cblas_ssyrk. The results are (each value was computed
> after 10 runs for each dimension value):
>
> GEMM: M=N=K=  100 -> 0.533 GFLOPS/s
> GEMM: M=N=K=  200 -> 0.643 GFLOPS/s
> GEMM: M=N=K=  300 -> 0.783 GFLOPS/s
> GEMM: M=N=K=  400 -> 0.767 GFLOPS/s
> GEMM: M=N=K=  500 -> 0.759 GFLOPS/s
> GEMM: M=N=K=  600 -> 0.808 GFLOPS/s
> GEMM: M=N=K=  700 -> 0.803 GFLOPS/s
> GEMM: M=N=K=  800 -> 0.822 GFLOPS/s
> GEMM: M=N=K=  900 -> 0.821 GFLOPS/s
> GEMM: M=N=K= 1000 -> 0.819 GFLOPS/s
>
> SYRK: M=N=K=  100 -> 0.371 GFLOPS/s
> SYRK: M=N=K=  200 -> 0.502 GFLOPS/s
> SYRK: M=N=K=  300 -> 0.649 GFLOPS/s
> SYRK: M=N=K=  400 -> 0.653 GFLOPS/s
> SYRK: M=N=K=  500 -> 0.658 GFLOPS/s
> SYRK: M=N=K=  600 -> 0.721 GFLOPS/s
> SYRK: M=N=K=  700 -> 0.724 GFLOPS/s
> SYRK: M=N=K=  800 -> 0.755 GFLOPS/s
> SYRK: M=N=K=  900 -> 0.756 GFLOPS/s
> SYRK: M=N=K= 1000 -> 0.762 GFLOPS/s
>
> For dimensions of 100 and 200 the performance varies between 0.5 and
> 0.75 with GEMM, but with SYRK it is more stable.
>
> I'm very surprised: the performance for single precision is worse than
> for double. ???
> For double I obtain about 1 GFLOPS/s with this machine
> (with ATLAS in Linux and with Apple's veclib in Mac OS X Tiger). I've
> tested on my Intel Pentium M 1.33 GHz laptop and for double the
> performance is about 1 GFLOPS/s but for single it is about 2.5 GFLOPS/s,
> which is expected.
>
> I don't know why in this case the single precision functions are
> slower. Probably Clint could give some hints.
>
> For now I do not have enough time to compile the ATLAS 3.10.1
> version, so I can offer only these results.
>
> We can argue that the problem is the Debian compilation, but for
> double precision the results are similar to those with Apple's veclib
> in OS X.
>
> Cheers
>
> --
> *****************************************
> José Luis García Pallero
> jgp...@gm...
> (o<
> / / \
> V_/_
> Use Debian GNU/Linux and enjoy!
> *****************************************
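
For anyone who wants to redo the arithmetic behind the peak estimate in the reply, here is a minimal sketch. It only multiplies out numbers already given in this thread: 8 single-precision flops per cycle, a 1.33 GHz clock, and the best posted sgemm rate of 0.822. The function names are hypothetical, and the 2*M*N*K flop count in gemm_gflops is the conventional GEMM count, which is an assumption about how the posted rates were computed.

```python
# Back-of-envelope check of the peak figure discussed in this thread.
# All function names are hypothetical; the inputs are numbers quoted above.

def peak_gflops(flops_per_cycle, clock_ghz):
    """Theoretical single-core peak: flops per cycle times clock in GHz."""
    return flops_per_cycle * clock_ghz

def gemm_gflops(m, n, k, seconds):
    """Rate of one GEMM call, assuming the conventional 2*M*N*K flop count."""
    return 2.0 * m * n * k / seconds / 1e9

def fraction_of_peak(measured_gflops, peak):
    """Share of the theoretical peak that a measured rate reaches."""
    return measured_gflops / peak

sp_peak = peak_gflops(8, 1.33)   # AltiVec single precision at 1.33 GHz
best_sgemm = 0.822               # best posted sgemm result (M=N=K=800)

print(f"single-precision peak: {sp_peak:.2f} GFLOPS")
print(f"best posted sgemm: {fraction_of_peak(best_sgemm, sp_peak):.1%} of peak")
```

Under these assumptions the best posted sgemm figure comes out at roughly 8% of the single-precision peak, which fits the suspicion that the hot loop is scalar rather than AltiVec code.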