From: Diego B. <di...@bi...> - 2008-04-07 22:48:24
On Wed, Apr 02, 2008 at 01:52:24PM +0200, Christophe Massiot wrote:
> On Tue, Apr 01, 2008, Diego Biurrun wrote:
>
> > > Does anyone know why the output of the SSE2 IDCT is different from the
> > > one of the MMXEXT IDCT ? Since the algorithm is supposed to be the same,
> > > they should have exactly identical results. This not being the case, it
> > > may indicate that some extra approximations are used, and the SSE2 code
> > > may not pass the ieee 1180 conformance test.
> >
> > How much of a problem is this given that an IDCT is always a lossy
> > operation?
>
> IDCT is lossy but MPEG-2 specifies that it mustn't be too lossy. So we
> must check that the output is within the allowed boundaries. If the
> results are identical to the ones of the MMXEXT version, then there is
> no need to do the check since the MMXEXT has already been checked to be
> compliant.
>
> However, the results are different, so the checks must be done again.
> This is really not my area of expertise, I've asked to people who know
> about this but haven't had any answer yet (this is a good opportunity
> to remind them :). But if you speak x86 asm fluently (which I don't),
> maybe you can spot what's the reason why the results are different while
> implementing the same algorithm.

Alexander Strange told me he found the problem and provided me with an
updated patch. Apparently

  static const int32_t rounder4_128[] ATTR_ALIGN(16) = rounder (0);

should have been

  static const int32_t rounder4_128[] ATTR_ALIGN(16) = rounder_sse2 (0);

Please try the attached version and let me know if this works as
expected. Again, I could only do minimal compilation tests, I do not
have a machine with SSE.

Diego