Re: [Libjpeg-turbo-devel] xmm6 & xmm7 corruption on win64 after calling jpeg_write_scanlines (from
From: DRC <dco...@us...> - 2012-04-26 03:21:47
Can you provide more information about how to reproduce the bug? A test image, for instance?

On 4/25/12 9:29 PM, Stefanos Kornilios Mitsis Piitidis wrote:
> Hello,
>
> jsimd_convsamp_sse2 & related assembly functions that use the macros
> collect_args & uncollect_args don't properly save and restore all 128
> bits of registers xmm6 and xmm7 as required by the win64 ABI (source:
> http://msdn.microsoft.com/en-us/library/9z1stfyw.aspx -> " XMM6:XMM15,
> Nonvolatile, Must be preserved as needed by callee" ). While the
> macros collect_args & uncollect_args try to preserve xmm6 and xmm7
> movlpd is used instead of movaps resulting in corruption on the high
> half of the registers. When libjpeg is used with simd-optimised code
> this can lead to wrong calculations (we actually run into this problem
> while developing http://code.google.com/p/intrael/ ).
>
>
> The bug is fixed by replacing movlpd with movaps which moves all 128 bits !
>
> Index: simd/jsimdext.inc
> ===================================================================
> --- simd/jsimdext.inc (revision 826)
> +++ simd/jsimdext.inc (working copy)
> @@ -322,15 +322,15 @@
>      push rsi
>      push rdi
>      sub rsp, SIZEOF_XMMWORD
> -    movlpd XMMWORD [rsp], xmm6
> +    movaps XMMWORD [rsp], xmm6
>      sub rsp, SIZEOF_XMMWORD
> -    movlpd XMMWORD [rsp], xmm7
> +    movaps XMMWORD [rsp], xmm7
>  %endmacro
>
>  %imacro uncollect_args 0
> -    movlpd xmm7, XMMWORD [rsp]
> +    movaps xmm7, XMMWORD [rsp]
>      add rsp, SIZEOF_XMMWORD
> -    movlpd xmm6, XMMWORD [rsp]
> +    movaps xmm6, XMMWORD [rsp]
>      add rsp, SIZEOF_XMMWORD
>      pop rdi
>      pop rsi
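
The difference between the two save/restore pairs described in the report can be reproduced outside the library. The following is only a rough sketch in C with SSE2 intrinsics, not code from libjpeg-turbo: the function names, the test values, and the local array standing in for the stack slot are all made up for illustration. It assumes an x86-64 compiler with C11 support. `_mm_storel_pd`/`_mm_loadl_pd` are the intrinsics documented as corresponding to MOVLPD, and `_mm_store_ps`/`_mm_load_ps` to MOVAPS, although an optimizing compiler is free to pick equivalent instructions.

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2 intrinsics; x86/x86-64 only */
#include <string.h>

/* Bitwise comparison of two 128-bit values. */
int same_bits(__m128d a, __m128d b) {
    return memcmp(&a, &b, sizeof a) == 0;
}

/* Save/restore through a 64-bit store and load (MOVLPD-style pair). */
__m128d roundtrip_low_only(__m128d original, __m128d clobber) {
    double slot[2] = {0.0, 0.0};      /* hypothetical stand-in for the stack slot */
    _mm_storel_pd(slot, original);    /* writes only the low 64 bits */
    __m128d reg = clobber;            /* the routine overwrites the register */
    return _mm_loadl_pd(reg, slot);   /* reloads the low half; high half stays clobbered */
}

/* Save/restore through a full 128-bit aligned store and load (MOVAPS-style pair). */
__m128d roundtrip_full(__m128d original, __m128d clobber) {
    _Alignas(16) float slot[4];       /* aligned stores need a 16-byte aligned slot */
    _mm_store_ps(slot, _mm_castpd_ps(original));  /* writes all 128 bits */
    __m128d reg = clobber;            /* the routine overwrites the register */
    reg = _mm_castps_pd(_mm_load_ps(slot));       /* reloads all 128 bits */
    return reg;
}

/* Returns 1 if the value survives the low-half-only round trip, else 0. */
int low_only_preserves(void) {
    __m128d original = _mm_set_pd(2.0, 1.0);    /* high = 2.0, low = 1.0 */
    __m128d junk     = _mm_set_pd(-8.0, -4.0);  /* arbitrary clobber value */
    return same_bits(roundtrip_low_only(original, junk), original);
}

/* Returns 1 if the value survives the full 128-bit round trip, else 0. */
int full_preserves(void) {
    __m128d original = _mm_set_pd(2.0, 1.0);
    __m128d junk     = _mm_set_pd(-8.0, -4.0);
    return same_bits(roundtrip_full(original, junk), original);
}

/* Self-check: only the full-width pair hands the caller's value back intact. */
void demo(void) {
    assert(full_preserves());
    assert(!low_only_preserves());
}
```

Calling `demo()` from a `main` exercises both paths. In the low-half-only case the returned value keeps the clobbered high half, which matches the high-half corruption of xmm6 and xmm7 described in the report.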