Re: [mpg123-devel] fixed point decoders


I've tuned the fixed-point decoder again, and its accuracy now matches
the MMX/SSE integer decoder (close to "fully compliant" quality)!

==== Layer 1 ====
--> 16 bit signed integer output
fl1.bit:        RMS=9.086680e-06 (LIMITED) maxdiff=2.175570e-05 (PASS)
fl2.bit:        RMS=9.001627e-06 (LIMITED) maxdiff=2.199411e-05 (PASS)
fl3.bit:        RMS=9.030924e-06 (LIMITED) maxdiff=1.877546e-05 (PASS)
fl4.bit:        RMS=8.908524e-06 (LIMITED) maxdiff=1.573563e-05 (PASS)
fl5.bit:        RMS=9.134317e-06 (LIMITED) maxdiff=1.996756e-05 (PASS)
fl6.bit:        RMS=9.211905e-06 (LIMITED) maxdiff=3.045797e-05 (PASS)
fl7.bit:        RMS=8.427010e-06 (PASS) maxdiff=1.686811e-05 (PASS)
fl8.bit:        RMS=8.977507e-06 (LIMITED) maxdiff=2.092123e-05 (PASS)

==== Layer 2 ====
--> 16 bit signed integer output
fl10.bit:       RMS=9.532534e-06 (LIMITED) maxdiff=2.503395e-05 (PASS)
fl11.bit:       RMS=9.379052e-06 (LIMITED) maxdiff=2.652407e-05 (PASS)
fl12.bit:       RMS=9.330248e-06 (LIMITED) maxdiff=2.586842e-05 (PASS)
fl13.bit:       RMS=8.938162e-06 (LIMITED) maxdiff=1.621246e-05 (PASS)
fl14.bit:       RMS=1.012798e-05 (LIMITED) maxdiff=2.151728e-05 (PASS)
fl15.bit:       RMS=9.110907e-06 (LIMITED) maxdiff=3.051758e-05 (PASS)
fl16.bit:       RMS=1.001348e-05 (LIMITED) maxdiff=4.702806e-05 (PASS)

==== Layer 3 ====
--> 16 bit signed integer output
compl.bit:      RMS=9.119217e-06 (LIMITED) maxdiff=2.092123e-05 (PASS)
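
For readers unfamiliar with the two columns above: RMS is the root mean square of the sample-by-sample difference against the reference output, and maxdiff is the largest absolute difference, both relative to full scale. The sketch below shows one way such figures could be computed and classified. It is not the test tool that produced this log. The names (diff_stats, compare_output, classify_mean_square) are hypothetical, and the thresholds are the ISO/IEC 11172-4 limits as I understand them (RMS below 2^-15/sqrt(12) for full accuracy, below 2^-11/sqrt(12) for limited accuracy), which is at least consistent with the PASS/LIMITED split in the log. To stay free of libm it compares the mean squared error against the squared thresholds, which is equivalent.

```c
#include <stddef.h>

/* Hypothetical result type: mean squared error and largest absolute error. */
typedef struct {
    double mean_sq;
    double maxdiff;
} diff_stats;

/* Assumed accuracy classes, named after the labels in the log above. */
enum accuracy { ACC_FULL, ACC_LIMITED, ACC_FAIL };

/* Compare decoder output with reference samples.
 * Both arrays are assumed to be normalized to the range [-1.0, 1.0). */
static diff_stats compare_output(const double *ref, const double *out, size_t n)
{
    diff_stats s = { 0.0, 0.0 };
    double sum_sq = 0.0;
    size_t i;

    for (i = 0; i < n; ++i) {
        double d = out[i] - ref[i];
        if (d < 0.0)
            d = -d;
        sum_sq += d * d;
        if (d > s.maxdiff)
            s.maxdiff = d;
    }
    if (n > 0)
        s.mean_sq = sum_sq / (double)n;
    return s;
}

/* Classify by mean squared error.
 * Assumed limits: (2^-15)^2 / 12 for full, (2^-11)^2 / 12 for limited. */
static enum accuracy classify_mean_square(double mean_sq)
{
    const double full_limit    = 1.0 / (12.0 * (double)(1UL << 30));
    const double limited_limit = 1.0 / (12.0 * (double)(1UL << 22));

    if (mean_sq < full_limit)
        return ACC_FULL;
    if (mean_sq < limited_limit)
        return ACC_LIMITED;
    return ACC_FAIL;
}
```

Under these assumed limits the full-accuracy RMS threshold is about 8.81e-06, which is why fl7.bit (8.43e-06) passes while the other streams, all slightly above that value, are reported as LIMITED.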

> So we are talking of the workhorse parts of layerX.c ... do you  
> think separating that out for fixed point, but leaving the synth  
> code alone, would fly?

Yes. But currently I don't know of any integer-specific optimizations
(at the algorithm level, not asm opts). I'll look for some papers about
integer (or hardware) MP3 decoders...

> Currently, the integer code is about 13% slower than generic fpu  
> code here, which could be taken as argument for "hey, floating point  
> math is better", but that may be flawed;-)
> I wonder how had fpu kernel emulation compares to that. Still would  
> be interesting to test on a i386 with and without fpu, could be that  
> integer is better there (not sure how much fpu speed improved).

Generally, integer multiplication is slower than floating-point
multiplication. Our fixed-point decoder also uses a 64-bit (long long)
type, which is slow on 32-bit CPUs. Optimizing REAL_MUL with inline
assembly will probably improve this.
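
To illustrate why the 64-bit type matters, here is a minimal sketch of a fixed-point multiply with a widened intermediate product. It is not the actual REAL_MUL definition from the mpg123 source. The names fix_t, FIX_RADIX, FIX_ONE and fix_mul are hypothetical, and the choice of 24 fractional bits is an assumption made only for this example.

```c
#include <stdint.h>

/* Hypothetical format: 32-bit signed value with 24 fractional bits. */
#define FIX_RADIX 24
#define FIX_ONE   ((int32_t)1 << FIX_RADIX)

typedef int32_t fix_t;

/* Convert between double and the assumed fixed-point format. */
static inline fix_t double_to_fix(double x)
{
    return (fix_t)(x * (double)FIX_ONE);
}

static inline double fix_to_double(fix_t x)
{
    return (double)x / (double)FIX_ONE;
}

/* Multiply two fixed-point values.
 * The 32x32 product needs up to 64 bits before it is scaled back down,
 * so both operands are widened first. On a 32-bit CPU the compiler has
 * to build this from several instructions, which is the cost mentioned
 * above. The right shift of a negative value is implementation-defined
 * in C; this sketch assumes an arithmetic shift, as common compilers do. */
static inline fix_t fix_mul(fix_t a, fix_t b)
{
    return (fix_t)(((int64_t)a * (int64_t)b) >> FIX_RADIX);
}
```

For example, fix_mul(double_to_fix(0.5), double_to_fix(0.5)) yields the fixed-point representation of 0.25. An inline assembly version would typically replace the widened multiply and shift with the CPU's native 32x32->64 multiply followed by a double-word shift, but that is architecture-specific and not shown here.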

Thanks,
Taihei Monma