From: SourceForge.net <no...@so...> - 2004-08-19 12:25:42
Bugs item #1008330, was opened at 2004-08-12 22:43
Message generated for change (Comment added) made by mattdaws
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=102435&aid=1008330&group_id=2435

Category: MinGW
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Matt Daws (mattdaws)
Assigned to: Danny Smith (dannysmith)
Summary: movaps problem with mingw

Initial Comment:
I have a slightly complicated program which is crashing when I compile with certain options. I have narrowed the problem down to the fact that MinGW is generating MOVAPS instructions which are trying to access memory at an unaligned point. If I edit the assembler file and replace these with MOVUPS instructions and then assemble and link, my program works fine.

I thought this was linked to my use of threads and/or the printf instruction. However, I've removed the printf's and still get the problem. I am now suspecting my use of threads. I tried changing from using CreateThread to using _beginthreadex, but this didn't help.

A Google search reveals that other people have been having a general problem with GCC, MOVAPS and threads. Thus this *might* be a problem with GCC. However, people in the know are blaming the problem on system calls which are not setting up the stack in an aligned manner when starting a thread. Hence my hope that _beginthreadex would help.

The problem does not occur with Mingw 3.3.3, but this seems to be down to the fact that the compiler no longer produces MOVAPS instructions.

I am using Mingw 3.4.1, Windows XP SP1. The compiler flags I use are:

    -Wall -ffast-math -O3 -march=pentium4 -mthreads

But anything enabling SSE seems to give problems.

I could send the code, but it is rather long. I could also try to generate a smaller example which does the same thing, but perhaps the bug will not occur.
Here is a snippet of the offending assembly (produced with the -S flag in g++):

    __ZN9Landscape4DrawEv:
        pushl %ebp
        movl %esp, %ebp
        pushl %edi
        pushl %esi
        pushl %ebx
        subl $236, %esp
        movl 8(%ebp), %edi
        flds 8(%edi)
        flds 40(%edi)
        .....
        movaps %xmm4, -104(%ebp)
        .....
        movaps %xmm0, -136(%ebp)

Many thanks in advance for any hints!

--Matt Daws

----------------------------------------------------------------------

>Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 12:25

Message:
Logged In: YES
user_id=1103054

Okay, my hack seems to work. But it's a hack: it needs some synchronisation code added to make it remotely thread-safe...

The code always starts a new thread by running a small function which aligns ESP and then calls the real function which we want. We have to jump through some hoops to get GCC to issue the correct assembly output, so a more elegant way would be to write the helper function in assembly to start with. I'm a bit rusty with asm though.

    unsigned (__stdcall *func_address) (void *);

    unsigned int __stdcall hack_stack(void *param)
    {
        // Align stack: GCC stack frame means that we can change
        // ESP here and it'll be reset later.
        unsigned tmp;
        asm volatile (
            "movl %%esp,%%eax; subl $15, %%eax; andl $-16, %%eax; mov %%eax,%%esp;"
            : "=a"(tmp) );

        // Some random code to ensure GCC issues a CALL to
        // func_address, and not just a long jump
        int temp = 0, i;   // initialised so the loops do not read garbage
        for (i=0; i<1000; i++) temp+=i;

        func_address(param);

        // Some more random code.
        int j;
        for (j=0; j<1000; j++) temp+=j;

        return 0;          // thread exit code; the result of func_address is discarded
    }

    unsigned long my_start_thread(unsigned (__stdcall *addr) (void *), unsigned *ret)
    {
        // Should sync here: func_address is a shared global
        func_address=addr;
        return _beginthreadex(NULL,0,hack_stack,NULL,0,ret);
    }

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 12:16

Message:
Logged In: YES
user_id=1103054

Okay, I've tried -mpreferred-stack-boundary=
It doesn't stop the crash.
Looking at the assembly output, the following happens:

i) In _main, we always have something like:

    pushl %ebp          // Setup stack frame
    movl %esp, %ebp
    pushl %ebx          // Save ebx, doesn't always occur
    subl $148, %esp     // Allocate local storage
    andl $-32, %esp     // ALIGN STACK
    call __alloca       // Call some helper functions:
    call ___main        // I don't know what these do.

The value used in "andl $-32, %esp" changes with preferred-stack-boundary, as expected. Notice, however, that this occurs AFTER the stack frame is set up, so that %ebp can still be unaligned. However, any function called from _main will be fine, as ESP is aligned. This is what was causing the problem in bug 1001932, I believe, and I guess Danny changed some code in crt2.o so that %esp is aligned at the start, and so %ebp becomes aligned from the start as well. Of course, preferred-stack-boundary doesn't change this!

ii) GCC keeps the stack aligned to the requested boundary, so that each function contains a "subl $val, %esp" and $val is changed with preferred-stack-boundary. This works fine, but it assumes that the stack starts out aligned correctly. I guess this is why the code in _main changes, but it seems odd that it does it too late. Furthermore, it means that there is no mechanism for dealing with threads, as then _main never gets called to align the stack.

I am currently playing about with assembler to try to write my own thread-launching code to align the stack. I'll let you know if I get it to work (sadly GCC doesn't allow inline asm to alter %esp, which I guess is not too surprising, but a bit annoying).
--Matt

----------------------------------------------------------------------

Comment By: Earnie Boyd (earnie)
Date: 2004-08-19 11:42

Message:
Logged In: YES
user_id=15438

What about adding:

    -mpreferred-stack-boundary=   Attempt to keep stack aligned to this power of 2

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 11:12

Message:
Logged In: YES
user_id=1103054

Hmm, I've been playing with swapping _beginthreadex for CreateThread (which shouldn't be used, I think, as I'm using the CRT). The result is a change of alignment on the stack, but it still isn't aligned on a 16-byte boundary.

--Matt

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 11:01

Message:
Logged In: YES
user_id=1103054

Earnie: Nope, doesn't do a thing. Sorry! I really do think this is a stack alignment issue, which now seems solved for main(), but not for new threads.

--Matt

----------------------------------------------------------------------

Comment By: Earnie Boyd (earnie)
Date: 2004-08-19 10:47

Message:
Logged In: YES
user_id=15438

Curious: Does -mms-bitfields help?

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 10:08

Message:
Logged In: YES
user_id=1103054

Danny,

Thanks for all your work! I've looked at 1001932 and tried the new crt2.o. This seems to fix problems in the main thread of my program, but I am still having issues in the second thread which I start up. In particular, it's the same stack alignment issue: it seems that GCC assumes that every function is called with the stack aligned, so that once the return address has been pushed, the function starts with esp+4 aligned, not esp. This isn't true if I use _beginthreadex to start a new thread. Is this correct, or should I be using some other function to start a new thread?
I cannot quite see what alignment _beginthreadex is making: I am tempted to conjecture that it aligns to nothing stricter than a 4-byte boundary.

Thanks,
--Matt

----------------------------------------------------------------------

Comment By: Danny Smith (dannysmith)
Date: 2004-08-18 21:52

Message:
Logged In: YES
user_id=11494

You're right, it is a different bug -- at least the simple fix that worked for 1001932 doesn't work for this. Thanks for your analysis on the GCC bugzilla report.

Danny

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-13 17:32

Message:
Logged In: YES
user_id=1103054

Hmm, okay, I've now made a small(ish) example program. The key points seem to be:

i) -finline-functions is needed to create the movaps instruction.

ii) Threads *are* important. If the example is run as a single thread, all is okay. If I start a new thread to run the offending code in, it crashes.

I guess this IS an alignment issue, so I'll leave the bug as closed and see if 1001932 goes anywhere...

See attached C++ file. Compile it as, for example,

    g++ -Wall -ffast-math -O2 -march=pentium4 -finline-functions main.cpp -o test.exe

--Matt

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-13 11:06

Message:
Logged In: YES
user_id=1103054

Not sure what the protocol here is: I'm not going to re-open the bug for now. However, I've been doing some more playing, and have found that the optimisation -finline-functions is responsible: without this, a small function of mine is called, and my code works. With it, the function is not called, but is inlined (as expected). However, the curious movaps instruction is now generated.

I'm not sure this is a duplicate bug, as I'm not trying to use SSE data types: the instruction is just being generated automatically by GCC, whereas the 1001932 bug relates to the explicit use of SSE data types.
Furthermore, I'm not explicitly telling the compiler to use SSE to do maths, only that it is running on a P4, and so is free to use SSE if it so wishes.

As the problem only occurs when inlining, I wonder if this is a code generation bug? Furthermore, my trick of replacing the movaps by movups seems to also work if I simply remove the movaps instructions completely, which is a little odd.

The same is true if I use -mfpmath=sse, i.e. the inlining causes the error. Furthermore, in this case, if I change movaps to movups, the problem still occurs (but with an Access Violation as the program tries to read from address -1). The same occurs if I remove the movaps instructions which reference memory.

I will try and make a simple program which duplicates these problems.

--Matt

----------------------------------------------------------------------

Comment By: Danny Smith (dannysmith)
Date: 2004-08-13 01:31

Message:
Logged In: YES
user_id=11494

Duplicate of 1001932, which has a self-contained testcase. The bug has also been reported to GCC's bugzilla,
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16890
but no response yet.
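
----------------------------------------------------------------------

For anyone following the alignment arithmetic in this thread, here is a minimal sketch in portable C of what the "andl $-16" masking in _main and the "subl $15 / andl $-16" sequence in hack_stack compute. It only does the arithmetic on plain integers and does not touch the real stack pointer. The function names (align_down, hack_align, is_aligned) and the sample addresses in the comments are made up for illustration and do not come from the attached program.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr down to a multiple of boundary.
   Assumption: boundary is a power of two (16 for movaps). */
uintptr_t align_down(uintptr_t addr, uintptr_t boundary)
{
    return addr & ~(boundary - 1);
}

/* The adjustment used in hack_stack above: subtract 15, then clear the
   low four bits. The result is 16-byte aligned and always strictly below
   the input, even when the input was already aligned. */
uintptr_t hack_align(uintptr_t esp)
{
    return (esp - 15) & ~(uintptr_t)15;
}

/* Non-zero when addr is a multiple of boundary (a power of two). */
int is_aligned(uintptr_t addr, uintptr_t boundary)
{
    return (addr & (boundary - 1)) == 0;
}

/* Hypothetical example values:
   align_down(0x0022ff7c, 16) == 0x0022ff70
   hack_align(0x0022ff7c)     == 0x0022ff60
   hack_align(0x0022ff70)     == 0x0022ff60 */
```

Both forms move the value downwards only, which matters because the x86 stack grows towards lower addresses: rounding up would step back over data already pushed. Masking alone would be enough for alignment; the extra subtraction in the hack just leaves some slack below the old stack pointer.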