Re: [mpg123-devel] fixed point decoders
From: Thomas O. <tho...@or...> - 2009-06-01 12:54:47
On Mon, 01 Jun 2009 18:30:12 +0900, Taihei Monma <tm...@ma...> wrote:

> Well, that's expected. As I said before,
>
> > And I noticed that this method is about 10% (!) faster than the
> > simple rounding with truncation on x87 fpu (my mistake, fisttp is
> > only available on SSE3-capable cpus).

Ah, now I can make sense out of this statement! I didn't look at a
non-SSE build back then. Indeed, the situation on the Athlon XP is
funny. Using -msse -mfpmath=sse in CFLAGS makes the truncation code
faster... but it's faster again when dropping the -mfpmath=sse (I did
use blank CFLAGS before, so basically i386 code).

Hm, so far the fastest build on the Athlon XP is achieved by using
-msse, but _not_ -mfpmath=sse, together with --enable-int-quality.
Fastest & good rounding... well, that's the generic code... the SSE code
for non-quality rounding is significantly faster than anything else.
But it is really non-trivial to devise the optimum set of CFLAGS, or
choice of rounding mode... when the fast rounding is the slow
rounding... when the x87 FPU and SSE unit exchange positions at will...

> on x87 FPU simple truncation is slow. But on SSE FPU truncation is
> fast, because it is doable with just one instruction. Probably you are
> using SSE FPU (it is default on x86-64) on your core2.

Yes... the Core2 should run on SSE. What really bugs me a bit is that
the generic code is about 2.5 times faster on the 1.466 GHz Athlon XP
compared to my mobile 1.2 GHz Core2 Duo. How can the latter be that
lame? I don't see the specs favouring the Athlon that much. Even the one
remaining memory channel (instead of two) of the Core2 should still be
faster than what the Athlon has. Well, with your x86-64 SSE code, the
Core2 catches up a bit... the Athlon XP's SSE being just about 2.1 times
faster. It seems like Intel did some interesting cuts with this Core2 to
make it power efficient -- makes one wonder how well a die-shrunk
Athlon XP would fare against it :-/

Tuning code and compiler to get optimum performance for some program
continues to be an adventurous journey with many possible directions...

Alrighty then,

Thomas.