[Math-atlas-results] comparative x86 timings
From: <rw...@cs...> - 2004-07-18 14:32:16
Guys,

In trying to understand the Efficeon results, I built the following table, which people might find interesting. It shows the performance of an SSE2 all-register code (SSE2), ATLAS's best matmul kernel run in-cache (dMM-ic), the same kernel run out-of-cache (dMM-oc), and full DGEMM times (dGEMM). In parentheses, I give the % of peak and, after the slash, the % of the preceding column.

              PEAK  SSE2       dMM-ic        dMM-oc        dGEMM
              ====  =========  ============  ============  ============
1.6Ghz Ham64  3200  3051(98%)  2984(93/98%)  2937(92/98%)  2805(88/96%)
2.8Ghz P4E    5600  5178(92%)  4492(80/87%)  4425(79/99%)  4303(77/97%)
1.0Ghz PIII   1000  --------   933(93%)      840(84/90%)   760(76/90%)
1.0Ghz Eff    2000  1790(90%)  1514(76/85%)  1309(65/86%)  1201(60/92%)

In building a table like this, you are looking for which pieces you are lacking. In the Efficeon case, it looks like every step other than the last is slightly low. Unfortunately, the strongest correlation I found was with the time I have spent tuning the particular arch :)

Cheers,
Clint

P.S.: Efficeon timings at: http://math-atlas.sourceforge.net/timing/Efficeon/
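
The percentages in the table can be recomputed from the raw figures. Below is a minimal Python sketch, assuming each parenthesized pair means (% of peak / % of the preceding column), rounded to the nearest integer. The names `ROWS` and `percentages` are made up for illustration, and the numbers are copied from the table above.

```python
# Raw figures copied from the table: (machine, peak, [SSE2, dMM-ic, dMM-oc, dGEMM]).
# None marks the missing PIII SSE2 entry.
ROWS = [
    ("1.6Ghz Ham64", 3200, [3051, 2984, 2937, 2805]),
    ("2.8Ghz P4E", 5600, [5178, 4492, 4425, 4303]),
    ("1.0Ghz PIII", 1000, [None, 933, 840, 760]),
    ("1.0Ghz Eff", 2000, [1790, 1514, 1309, 1201]),
]


def percentages(peak, values):
    """Return one entry per column: (pct_of_peak, pct_of_previous) or None.

    pct_of_previous is None when there is no earlier measured column.
    Assumption: the posted percentages are rounded to the nearest integer.
    """
    out = []
    prev = None
    for v in values:
        if v is None:
            out.append(None)
            continue
        of_peak = round(100 * v / peak)
        of_prev = round(100 * v / prev) if prev is not None else None
        out.append((of_peak, of_prev))
        prev = v
    return out


if __name__ == "__main__":
    for machine, peak, values in ROWS:
        cells = []
        for v, pct in zip(values, percentages(peak, values)):
            if pct is None:
                cells.append("--------")
            elif pct[1] is None:
                cells.append(f"{v}({pct[0]}%)")
            else:
                cells.append(f"{v}({pct[0]}/{pct[1]}%)")
        print(f"{machine:<13}{peak:>5}  " + "  ".join(cells))
```

With these inputs, the Ham64 SSE2 cell recomputes to 95% rather than the 98% shown, so one of those two posted figures may be a typo. The other cells match the table.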