From: SourceForge.net <no...@so...> - 2004-09-13 09:15:04
Bugs item #1001932, was opened at 2004-08-02 13:04
Message generated for change (Comment added) made by knue
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=102435&aid=1001932&group_id=2435

Category: gcc
Group: None
Status: Closed
Resolution: Fixed
Priority: 7
Submitted By: Italianate (lorenzoseno)
Assigned to: Danny Smith (dannysmith)
Summary: gcc 3.3 - 3.4 SSE V4sf misalignement

Initial Comment:
gcc version: both core 3.3.1 and 3.4.0
MinGW version: 3.1.0
mingw-runtime: 3.3
w32api 2.5
ld version 2.13.90
OS: W2k sp4

When using the SSE extension, if you declare any v4sf vector with __attribute__ ((aligned (16))) among the globals, each one is allocated at the same (shared) address 0x0. That is a 16-byte-aligned location, so a subsequent movaps doesn't crash the run (but the code doesn't work at all, of course).

If you declare the same v4sf vectors among the variables of main, the generated addresses are different, but not 16-byte aligned (they end with 0x8 in my run). A subsequent movaps (automatically generated by gcc or manually coded) will crash the run. The code can be made to work, but only by manually coding movups instead of movaps (in asm()).

The same things happen if you drop the aligned attribute from the declaration.

The gcc 3.3 Linux version works fine instead: it automatically generates 16-byte-aligned v4sf locations, without the need to declare alignment attributes.

----------------------------------------------------------------------

Comment By: knue (knue)
Date: 2004-09-13 11:14

Message:
Logged In: YES
user_id=1039092

I have the same problem. I use the following code to test the alignment of the vector type v4sf (the proper output should be "0"):

#include <iostream>
#include <stdlib.h>

using namespace std;

typedef int v4sf __attribute__ ((mode(V4SF)));

int main(int argc, char *argv[])
{
    v4sf v1;
    cout << ((int)&v1)%16 << endl; // 16-byte-aligned? NO: Output: "8"
    system("PAUSE");
    return 0;
}

Interestingly, this works with Win 98.
I use Win XP.

----------------------------------------------------------------------

Comment By: Danny Smith (dannysmith)
Date: 2004-08-27 01:13

Message:
Logged In: YES
user_id=11494

I think the problem in your simd3.c is not SSE related, but rather stack exhaustion. The default stack is set at 2MB. This:

// vector for cache miss:
osc4 ot[0xFFFFFF];

won't fit. Try adding -Wl,--stack,0x2000000

Danny

----------------------------------------------------------------------

Comment By: Italianate (lorenzoseno)
Date: 2004-08-24 15:18

Message:
Logged In: YES
user_id=1095855

Thanks Danny for your work. The patch you submitted actually fixes the problem for simple test cases, but troubles still remain with more complex examples. I have attached two files here. test_auto_main.c works fine, using automatic vectorization. simd3.c instead crashes before reaching any C instruction, giving a segmentation fault, so the problem is presumably in the initialization code. There is still a lot of code in the .rdata section.

Regards, and thank you.

----------------------------------------------------------------------

Comment By: Danny Smith (dannysmith)
Date: 2004-08-24 11:21

Message:
Logged In: YES
user_id=11494

Fixed in mingw-runtime CVS.

Danny

----------------------------------------------------------------------

Comment By: Danny Smith (dannysmith)
Date: 2004-08-18 23:37

Message:
Logged In: YES
user_id=11494

Hi, I've attached a rebuilt crt2.o that fixes the testcase for me. Could you install it into your_mingw_root/lib directory and confirm that it also fixes the problem in your test harnesses?
Danny

----------------------------------------------------------------------

Comment By: Italianate (lorenzoseno)
Date: 2004-08-08 14:00

Message:
Logged In: YES
user_id=1095855

To document the misalignment diagnosis, take this one:

#include <stdio.h>

typedef int v4sf __attribute__ ((mode(V4SF))) __attribute__ ((aligned (16)));

union f4vector
{
    v4sf v;
    float f[4];
} __attribute__ ((aligned (16)));

int main(int argc, char *argv[])
{
    volatile union f4vector a __attribute__ ((aligned (16))), b __attribute__ ((aligned (16)));
    volatile union f4vector c __attribute__ ((aligned (16))), d __attribute__ ((aligned (16)));

    a.f[0] = 1; a.f[1] = 2; a.f[2] = 3; a.f[3] = 4;
    b.f[0] = 5; b.f[1] = 6; b.f[2] = 7; b.f[3] = 8;

    asm volatile ("movaps %1, %%xmm0\n\t"
                  "movaps %2, %%xmm1\n\t"
                  "mulps %%xmm0, %%xmm1\n\t"
                  "movaps %%xmm1, %0\n\t"
                  : "=m" (c.v)
                  : "m" (a.v) , "m" (b.v));
    // c.v = a.v * b.v;

    printf("%f %f %f %f\n", c.f[0], c.f[1], c.f[2], c.f[3]);
    return(0);
}

Compile it this way:

gcc -ggdb -o bug_asm.exe bug_asm.c

It will crash. Now replace "movaps" with "movups" and recompile. It gives the correct result:

5.000000 12.000000 21.000000 32.000000

Regards.

----------------------------------------------------------------------

Comment By: Italianate (lorenzoseno)
Date: 2004-08-08 13:44

Message:
Logged In: YES
user_id=1095855

Thanks Danny, and sorry for the delay. I was out. About your question, I compiled this one:

#include <stdio.h>

typedef int v4sf __attribute__ ((mode(V4SF))) __attribute__ ((aligned (16)));

union f4vector
{
    v4sf v;
    float f[4];
} __attribute__ ((aligned (16)));

int main(int argc, char *argv[])
{
    volatile union f4vector a __attribute__ ((aligned (16))), b __attribute__ ((aligned (16)));
    volatile union f4vector c __attribute__ ((aligned (16))), d __attribute__ ((aligned (16)));

    a.f[0] = 1; a.f[1] = 2; a.f[2] = 3; a.f[3] = 4;
    b.f[0] = 5; b.f[1] = 6; b.f[2] = 7; b.f[3] = 8;

    c.v = a.v * b.v;

    printf("%f %f %f %f\n", c.f[0], c.f[1], c.f[2], c.f[3]);
    return(0);
}

both under MinGW and under Linux.
The Linux build runs fine, whereas the Windows build crashes. The disassemblies of main under gdb are nevertheless perfectly identical.

I'm sorry, I'm not well acquainted with how physical memory assignment is done in the GNU compile-link-load process on 386, so I can't give any useful suggestion on where the misalignment happens. As this bug is quite important (as things stand, there is no way to use SSE code under Windows with open-source compilers), can you submit the problem to the people in charge of the assembler and linker?

Regards and thank you.

----------------------------------------------------------------------

Comment By: Danny Smith (dannysmith)
Date: 2004-08-04 23:24

Message:
Logged In: YES
user_id=11494

Well, my patch didn't do anything to your bug, but it fixed a different one. Sigh.

The output of objdump on your two tests is really strange, with code in .rdata. When I look at the assembly output by gcc -S I can't see any problems, so I suspect this is an assembler bug, probably in libbfd. Could you have a look at the gcc -S output and see if you can spot anything wrong?

Danny

----------------------------------------------------------------------

Comment By: Danny Smith (dannysmith)
Date: 2004-08-04 13:44

Message:
Logged In: YES
user_id=11494

Thanks for the testcase. I'm testing patches against gcc-3.5 and 3.4.1 now.
Danny

----------------------------------------------------------------------

Comment By: Italianate (lorenzoseno)
Date: 2004-08-04 11:36

Message:
Logged In: YES
user_id=1095855

Here is an example (ld 2.15.90):

typedef int v4sf __attribute__ ((mode(V4SF))) __attribute__ ((aligned (16)));

union f4vector
{
    v4sf v;
    float f[4];
} __attribute__ ((aligned (16)));

int main(int argc, char *argv[])
{
    volatile union f4vector a __attribute__ ((aligned (16))), b __attribute__ ((aligned (16)));
    volatile union f4vector c __attribute__ ((aligned (16))), d __attribute__ ((aligned (16)));

    a.f[0] = 1; a.f[1] = 2; a.f[2] = 3; a.f[3] = 4;
    b.f[0] = 5; b.f[1] = 6; b.f[2] = 7; b.f[3] = 8;

    c.v = a.v * b.v;
    return(0);
}

objdump:

  55:   89 45 e4                mov    %eax,0xffffffe4(%ebp)
    c.v = a.v * b.v;
    return(0);
}
  58:   31 c0                   xor    %eax,%eax
  5a:   0f 28 45 e8             movaps 0xffffffe8(%ebp),%xmm0
  5e:   0f 28 4d d8             movaps 0xffffffd8(%ebp),%xmm1
  62:   0f 59 c1                mulps  %xmm1,%xmm0
  65:   0f 29 45 c8             movaps %xmm0,0xffffffc8(%ebp)
  69:   c9                      leave
  6a:   c3                      ret
  6b:   90                      nop

Please note the movaps 0xffffffe8 above.

Now compare with the vectors declared as globals:

typedef int v4sf __attribute__ ((mode(V4SF))) __attribute__ ((aligned (16)));

union f4vector
{
    v4sf v;
    float f[4];
} __attribute__ ((aligned (16)));

volatile union f4vector a __attribute__ ((aligned (16))), b __attribute__ ((aligned (16)));
volatile union f4vector c __attribute__ ((aligned (16))), d __attribute__ ((aligned (16)));

int main(int argc, char *argv[])

objdump:

  68:   a3 0c 00 00 00          mov    %eax,0xc
    c.v = a.v * b.v;
    return(0);
}
  6d:   31 c0                   xor    %eax,%eax
  6f:   0f 28 05 00 00 00 00    movaps 0x0,%xmm0
  76:   0f 28 0d 00 00 00 00    movaps 0x0,%xmm1
  7d:   0f 59 c1                mulps  %xmm1,%xmm0
  80:   0f 29 05 00 00 00 00    movaps %xmm0,0x0
  87:   c9                      leave
  88:   c3                      ret
  89:   90                      nop

Note the multiple movaps 0x0.

Regards.

----------------------------------------------------------------------
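
For anyone who wants to repeat the diagnosis from this thread, here is a minimal sketch in C. It is not taken from any of the comments, and the helper names misalignment16 and align_up16 are hypothetical. It assumes only that the toolchain provides uintptr_t and that converting a pointer to uintptr_t preserves its low address bits, which holds on the x86 targets discussed here. misalignment16 reports how far an address sits past a 16-byte boundary (knue's test printed 8 where 0 was expected). align_up16 illustrates one common manual workaround that nobody in the thread proposed: reserve up to 15 spare bytes and round the pointer up before handing it to movaps.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes by which p lies past the
   previous 16-byte boundary. A result of 0 means p is suitable
   as a movaps operand. */
static unsigned misalignment16(const volatile void *p)
{
    return (unsigned)((uintptr_t)p % 16u);
}

/* Hypothetical helper: round p up to the next 16-byte boundary.
   The result may be up to 15 bytes past p, so the buffer behind p
   must be at least 15 bytes larger than the data stored in it. */
static void *align_up16(void *p)
{
    uintptr_t u = ((uintptr_t)p + 15u) & ~(uintptr_t)15u;
    void *q = (void *)u;

    assert(misalignment16(q) == 0); /* postcondition of the rounding */
    return q;
}
```

As a usage example, declaring float raw[8] and taking float *v = align_up16(raw) leaves room for four aligned floats at v wherever the compiler placed raw, and misalignment16(v) is then 0. The thread does not say whether such a workaround was still needed once the mingw-runtime fix was in place.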