Re: [mpg123-devel] fixed point decoders
From: Taihei M. <tm...@ma...> - 2009-05-31 19:32:07
I've tuned the fixed point decoder again, and now it is as good as the MMX/SSE integer decoder (near the "fully compliant" quality)!

==== Layer 1 ====
--> 16 bit signed integer output
fl1.bit: RMS=9.086680e-06 (LIMITED) maxdiff=2.175570e-05 (PASS)
fl2.bit: RMS=9.001627e-06 (LIMITED) maxdiff=2.199411e-05 (PASS)
fl3.bit: RMS=9.030924e-06 (LIMITED) maxdiff=1.877546e-05 (PASS)
fl4.bit: RMS=8.908524e-06 (LIMITED) maxdiff=1.573563e-05 (PASS)
fl5.bit: RMS=9.134317e-06 (LIMITED) maxdiff=1.996756e-05 (PASS)
fl6.bit: RMS=9.211905e-06 (LIMITED) maxdiff=3.045797e-05 (PASS)
fl7.bit: RMS=8.427010e-06 (PASS) maxdiff=1.686811e-05 (PASS)
fl8.bit: RMS=8.977507e-06 (LIMITED) maxdiff=2.092123e-05 (PASS)

==== Layer 2 ====
--> 16 bit signed integer output
fl10.bit: RMS=9.532534e-06 (LIMITED) maxdiff=2.503395e-05 (PASS)
fl11.bit: RMS=9.379052e-06 (LIMITED) maxdiff=2.652407e-05 (PASS)
fl12.bit: RMS=9.330248e-06 (LIMITED) maxdiff=2.586842e-05 (PASS)
fl13.bit: RMS=8.938162e-06 (LIMITED) maxdiff=1.621246e-05 (PASS)
fl14.bit: RMS=1.012798e-05 (LIMITED) maxdiff=2.151728e-05 (PASS)
fl15.bit: RMS=9.110907e-06 (LIMITED) maxdiff=3.051758e-05 (PASS)
fl16.bit: RMS=1.001348e-05 (LIMITED) maxdiff=4.702806e-05 (PASS)

==== Layer 3 ====
--> 16 bit signed integer output
compl.bit: RMS=9.119217e-06 (LIMITED) maxdiff=2.092123e-05 (PASS)

> So we are talking of the workhorse parts of layerX.c ... do you
> think separating that out for fixed point, but leaving the synth
> code alone, would fly?

Yes. But currently I have no ideas for integer-specific optimizations (not asm opts, but at the algorithm level). I'll look for some papers about integer (or HW) mp3 decoders...

> Currently, the integer code is about 13% slower than generic fpu
> code here, which could be taken as argument for "hey, floating point
> math is better", but that may be flawed;-)
> I wonder how had fpu kernel emulation compares to that. Still would
> be interesting to test on a i386 with and without fpu, could be that
> integer is better there (not sure how much fpu speed improved).

Generally, integer multiplication is slow compared to FP multiplication. And our fixed point decoder uses a 64-bit (long long) type, which is also slow on 32-bit CPUs. Optimizing REAL_MUL with inline assembly code will probably improve this.

Thanks,
Taihei Monma
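
To make the REAL_MUL point concrete, here is a minimal sketch of a fixed point multiply that goes through a 64-bit intermediate. It is not the actual mpg123 macro: the names fixed_t, SKETCH_RADIX, to_fixed and fixed_mul are made up for illustration, and the radix of 24 fractional bits is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed number of fractional bits; hypothetical, not taken from mpg123. */
#define SKETCH_RADIX 24

/* Hypothetical 32-bit fixed point sample type. */
typedef int32_t fixed_t;

/* Convert a double to fixed point, rounding to nearest. */
static inline fixed_t to_fixed(double v)
{
    double scaled = v * (double)(1 << SKETCH_RADIX);
    return (fixed_t)(scaled + (v >= 0.0 ? 0.5 : -0.5));
}

/*
 * Multiply two fixed point values.
 * The operands are widened to 64 bits so the full product is kept,
 * then shifted back down by the radix. On a 32-bit CPU the compiler
 * has to build this from several 32-bit instructions, which is the
 * cost discussed above. Right-shifting a negative value is
 * implementation-defined in C; this sketch assumes an arithmetic shift,
 * which is what common compilers do.
 */
static inline fixed_t fixed_mul(fixed_t a, fixed_t b)
{
    return (fixed_t)(((int64_t)a * (int64_t)b) >> SKETCH_RADIX);
}

/* Small self-check: 0.5 * 0.25 must give 0.125 in this format. */
static inline void fixed_mul_selfcheck(void)
{
    assert(fixed_mul(to_fixed(0.5), to_fixed(0.25)) == to_fixed(0.125));
}
```

An inline assembly version would keep the same interface and only replace the body of fixed_mul, for example by using a 32x32->64 multiply instruction directly where the CPU has one.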