[Mingw-notify] [ mingw-Bugs-2906836 ] __restrict__ not honoured or not working properly


Bugs item #2906836, was opened at 2009-12-01 15:09
Message generated for change (Tracker Item Submitted) made by thomas-denk
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=102435&aid=2906836&group_id=2435

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: gcc
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: thomas (thomas-denk)
Assigned to: Nobody/Anonymous (nobody)
Summary: __restrict__ not honoured or not working properly

Initial Comment:
Applies to:
-----------
Target: mingw32
Configured with: ../gcc-4.4.0/configure --prefix=/mingw --build=mingw32 --enable-languages=c,ada,c++,fortran,objc,obj-c++ --disable-nls --disable-win32-registry --disable-werror --enable-threads --disable-symvers --enable-cxx-flags='-fno-function-sections -fno-data-sections' --enable-fully-dynamic-string --enable-libgomp --enable-version-specific-runtime-libs --disable-sjlj-exceptions --program-suffix=-dw2 --with-pkgversion='TDM-1 mingw32' --with-bugurl=http://www.tdragon.net/recentgcc/bugs.php
Thread model: win32
gcc version 4.4.0-dw2 (TDM-1 mingw32)

Objective:
----------
Provide a template function so that the manually unrolled loop does not have to be copied for each of half a dozen logical operators.
The aim is to read in data, overlap operations as loads are satisfied, and write out results. Simple enough.	

Implementation:
---------------

template<typename T> void op(T f, __m128i* __restrict__ dst, __m128i* __restrict__ src, unsigned int n)
{
	while(n >= 4)
	{
		n -= 4;
		dst[0] = f(src[0], dst[0]);
		dst[1] = f(src[1], dst[1]);
		dst[2] = f(src[2], dst[2]);
		dst[3] = f(src[3], dst[3]);
		dst += 4; src += 4;
	}

	while(n--)
	{
		*dst = f(*src, *dst);
		++src; ++dst;
	}
}

Usage example: op(_mm_and_si128, buf1, buf2, size);
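
A minimal, self-contained sketch of the call above, assuming SSE2 is available and that the two buffers do not overlap. The functor and_op and the helper check_and are hypothetical names that are not part of the report; the functor stands in for passing _mm_and_si128 directly, and op() is repeated from the listing above so that the sketch compiles on its own.

```cpp
#include <emmintrin.h>  // SSE2: __m128i, _mm_and_si128, _mm_set1_epi32, _mm_storeu_si128
#include <cassert>

// op() as given in the report, repeated here so the sketch is self-contained.
template<typename T> void op(T f, __m128i* __restrict__ dst, __m128i* __restrict__ src, unsigned int n)
{
	while(n >= 4)
	{
		n -= 4;
		dst[0] = f(src[0], dst[0]);
		dst[1] = f(src[1], dst[1]);
		dst[2] = f(src[2], dst[2]);
		dst[3] = f(src[3], dst[3]);
		dst += 4; src += 4;
	}

	while(n--)
	{
		*dst = f(*src, *dst);
		++src; ++dst;
	}
}

// Hypothetical wrapper (an assumption, not from the report): a functor
// around the intrinsic instead of the intrinsic's name itself.
struct and_op
{
	__m128i operator()(__m128i a, __m128i b) const { return _mm_and_si128(a, b); }
};

// Hypothetical self-check: returns true if every 32-bit lane of buf1
// ends up holding (original buf1 lane) & (buf2 lane).
bool check_and()
{
	const unsigned int size = 6;  // 4 elements for the unrolled loop, 2 for the tail loop
	__m128i buf1[size];
	__m128i buf2[size];

	for(unsigned int i = 0; i < size; ++i)
	{
		buf1[i] = _mm_set1_epi32(0x0F0F0F00 + (int)i);
		buf2[i] = _mm_set1_epi32(0x00FF00FF);
	}

	op(and_op(), buf1, buf2, size);

	for(unsigned int i = 0; i < size; ++i)
	{
		int lanes[4];
		_mm_storeu_si128((__m128i*)lanes, buf1[i]);
		const int expected = (0x0F0F0F00 + (int)i) & 0x00FF00FF;
		for(int k = 0; k < 4; ++k)
			if(lanes[k] != expected) return false;
	}
	return true;
}
```

Calling check_and() from main() only confirms that the results are correct; it says nothing about how the loop is scheduled, which still has to be read off the generated assembly.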

Problem:
--------
The compiler seems to assume aliasing and therefore produces non-pipelined code, even though it has been told that there is no aliasing:

	pand	(%ebx,%eax), %xmm0
	movdqa	%xmm0, (%ebx,%eax)
	movdqa	16(%edx,%eax), %xmm0
	pand	16(%ebx,%eax), %xmm0
	movdqa	%xmm0, 16(%ebx,%eax)
	movdqa	32(%edx,%eax), %xmm0
	pand	32(%ebx,%eax), %xmm0
	movdqa	%xmm0, 32(%ebx,%eax)
	movdqa	48(%edx,%eax), %xmm0
	pand	48(%ebx,%eax), %xmm0
	movdqa	%xmm0, 48(%ebx,%eax)
	addl	$64, %eax

Rewriting the function so that all inputs are consumed manually before any output is written (the old, pre-restrict way) like this:

template<typename T> void op(T f, __m128i* __restrict__ dst, __m128i* __restrict__ src, unsigned int n)
{
	while(n >= 4)
	{
		n -= 4;
		__m128i t1 = f(src[0], dst[0]);
		__m128i t2 = f(src[1], dst[1]);
		__m128i t3 = f(src[2], dst[2]);
		__m128i t4 = f(src[3], dst[3]);
		dst[0] = t1;
		dst[1] = t2;
		dst[2] = t3;
		dst[3] = t4;
		dst += 4; src += 4;
	}

	while(n--)
	{
		*dst = f(*src, *dst);
		++src; ++dst;
	}
}

proves that the compiler is indeed able to generate the desired pipelined code:

	movdqa	(%edx,%eax), %xmm3
	movdqa	16(%edx,%eax), %xmm2
	movdqa	32(%edx,%eax), %xmm1
	movdqa	48(%edx,%eax), %xmm0
	pand	(%ebx,%eax), %xmm3
	pand	16(%ebx,%eax), %xmm2
	pand	32(%ebx,%eax), %xmm1
	pand	48(%ebx,%eax), %xmm0
	subl	$4, %edi
	movdqa	%xmm3, (%ebx,%eax)
	movdqa	%xmm2, 16(%ebx,%eax)
	movdqa	%xmm1, 32(%ebx,%eax)
	movdqa	%xmm0, 48(%ebx,%eax)
	addl	$64, %eax
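
To separate the aliasing question from the template and the intrinsics, a reduced scalar case might look like the following. It is only a sketch: store_pair, store_pair_plain and check_store_pair are hypothetical names, not code from the report. If the qualifier is taken into account, the compiler is allowed to load src[0] once in store_pair, whereas in store_pair_plain it has to reload src[0] after the first store, because dst[0] could alias it.

```cpp
#include <cassert>

// Hypothetical reduced case (an assumption, not from the report).
// With __restrict__, the store to dst[0] cannot modify src[0],
// so a single load of src[0] is sufficient.
void store_pair(int* __restrict__ dst, const int* __restrict__ src)
{
	dst[0] = src[0] + 1;
	dst[1] = src[0] + 2;
}

// Same body without the qualifier, for comparing the generated code.
// Here dst[0] may alias src[0], so src[0] must be read again.
void store_pair_plain(int* dst, const int* src)
{
	dst[0] = src[0] + 1;
	dst[1] = src[0] + 2;
}

// With non-overlapping arguments both variants must give the same result.
bool check_store_pair()
{
	int in = 40;
	int out1[2] = {0, 0};
	int out2[2] = {0, 0};
	store_pair(out1, &in);
	store_pair_plain(out2, &in);
	return out1[0] == 41 && out1[1] == 42
	    && out2[0] == 41 && out2[1] == 42;
}
```

Compiling this with g++ -O2 -S and comparing the two function bodies is one way to see whether the qualifier changes the generated code at all on a given compiler version.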

----------------------------------------------------------------------

