From: SourceForge.net <no...@so...> - 2004-08-12 22:43:28
|
Bugs item #1008330, was opened at 2004-08-12 22:43
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=102435&aid=1008330&group_id=2435

Category: MinGW
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Matt Daws (mattdaws)
Assigned to: Earnie Boyd (earnie)
Summary: movaps problem with mingw

Initial Comment:
I have a slightly complicated program which is crashing when I compile with certain options. I have narrowed the problem down to the fact that MinGW is generating MOVAPS instructions which are trying to access memory at an unaligned address. If I edit the assembler file, replace these with MOVUPS instructions, and then assemble and link, my program works fine.

I thought this was linked to my use of threads and/or printf. However, I've removed the printf calls and still get the problem. I am now suspecting my use of threads. I tried changing from CreateThread to _beginthreadex, but this didn't help.

A Google search reveals that other people have been having a general problem with GCC, MOVAPS and threads. Thus this *might* be a problem with GCC. However, people in the know are blaming the problem on system calls which do not set up the stack in an aligned manner when starting a thread. Hence my hope that _beginthreadex would help.

The problem does not occur with MinGW 3.3.3, but that seems to be simply because that version of the compiler does not produce MOVAPS instructions.

I am using MinGW 3.4.1, Windows XP SP1. The compiler flags I use are:

    -Wall -ffast-math -O3 -march=pentium4 -mthreads

But anything enabling SSE seems to give problems.

I could send the code, but it is rather long. I could also try to generate a smaller example which does the same thing, but perhaps the bug will not occur.

Here is a snippet of the offending assembly (produced with the -S flag in g++):

    __ZN9Landscape4DrawEv:
        pushl %ebp
        movl %esp, %ebp
        pushl %edi
        pushl %esi
        pushl %ebx
        subl $236, %esp
        movl 8(%ebp), %edi
        flds 8(%edi)
        flds 40(%edi)
        .....
        movaps %xmm4, -104(%ebp)
        .....
        movaps %xmm0, -136(%ebp)

Many thanks in advance for any hints!

--Matt Daws

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=102435&aid=1008330&group_id=2435
|
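Editorial note: the alignment requirement implied by the listing above can be worked out by hand, and the sketch below just does that arithmetic. It assumes only what the listing shows: the prologue `pushl %ebp; movl %esp, %ebp` and the two movaps stores at -104(%ebp) and -136(%ebp). The function names and the sample addresses are hypothetical and chosen purely for illustration; this is not code from the reporter's program.

```cpp
#include <cassert>
#include <cstdint>

// Offsets of the two movaps stores in the listing, relative to %ebp.
const std::int64_t kMovapsOffsets[] = {-104, -136};

// movaps raises a fault unless its memory operand is 16-byte aligned.
inline bool is_16_byte_aligned(std::int64_t address) {
    return address % 16 == 0;
}

// Hypothetical helper: given the value of %esp on entry to the function
// (where it points at the return address), report whether both movaps
// slots from the listing land on 16-byte boundaries.
inline bool movaps_slots_aligned(std::int64_t esp_at_entry) {
    assert(esp_at_entry % 4 == 0);  // assumed: stack is at least 4-byte aligned

    // pushl %ebp lowers %esp by 4, then movl %esp, %ebp copies it.
    const std::int64_t ebp = esp_at_entry - 4;

    for (std::int64_t offset : kMovapsOffsets) {
        if (!is_16_byte_aligned(ebp + offset)) {
            return false;
        }
    }
    return true;
}
```

Both offsets are congruent to 8 modulo 16, so the stores are aligned only when %ebp is congruent to 8 modulo 16, that is, when esp+4 at function entry is a multiple of 16. With the made-up address 0x0022ff0c, `movaps_slots_aligned` returns true; with 0x0022ff00, 0x0022ff04 or 0x0022ff08 it returns false. In other words, the generated code relies on the caller having had a 16-byte-aligned stack pointer just before the call instruction pushed the return address.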
From: SourceForge.net <no...@so...> - 2004-08-13 01:31:10
|
Bugs item #1008330, was opened at 2004-08-13 10:43
Message generated for change (Comment added) made by dannysmith

Category: MinGW
Group: None
>Status: Closed
>Resolution: Duplicate
Priority: 5
Submitted By: Matt Daws (mattdaws)
Assigned to: Earnie Boyd (earnie)
Summary: movaps problem with mingw

----------------------------------------------------------------------

>Comment By: Danny Smith (dannysmith)
Date: 2004-08-13 13:31

Message:
Logged In: YES
user_id=11494

Duplicate of 1001932, which has a self-contained testcase. The bug has also been reported to GCC's bugzilla, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16890 but no response yet.

----------------------------------------------------------------------
|
From: SourceForge.net <no...@so...> - 2004-08-13 11:06:27
|
Bugs item #1008330, was opened at 2004-08-12 22:43
Message generated for change (Comment added) made by mattdaws

Category: MinGW
Group: None
Status: Closed
Resolution: Duplicate
Priority: 5
Submitted By: Matt Daws (mattdaws)
Assigned to: Earnie Boyd (earnie)
Summary: movaps problem with mingw

----------------------------------------------------------------------

>Comment By: Matt Daws (mattdaws)
Date: 2004-08-13 11:06

Message:
Logged In: YES
user_id=1103054

Not sure what the protocol here is: I'm not going to re-open the bug for now. However, I've been doing some more playing, and have found that the optimisation -finline-functions is responsible: without it, a small function of mine is called, and my code works. With it, the function is not called, but is inlined (as expected). However, the curious movaps instruction is now generated.

I'm not sure this is a duplicate bug, as I'm not trying to use SSE data types: the instruction is just being generated automatically by GCC, whereas the 1001932 bug relates to the explicit use of SSE data types. Furthermore, I'm not explicitly telling the compiler to use SSE to do maths, only that it is running on a P4, and so is free to use SSE if it so wishes.

As the problem only occurs when inlining, I wonder if this is a code generation bug? Furthermore, besides my trick of replacing movaps by movups, the program also works if I simply remove the movaps instructions completely, which is a little odd.

The same is true if I use -mfpmath=sse, i.e. the inlining causes the error. Furthermore, in this case, if I change movaps to movups, the problem still occurs (but with an Access Violation as the program tries to read from address -1). The same occurs if I remove the movaps instructions which reference memory.

I will try to make a simple program which reproduces these problems.

--Matt

----------------------------------------------------------------------
|
From: SourceForge.net <no...@so...> - 2004-08-13 17:32:31
|
Bugs item #1008330, was opened at 2004-08-12 22:43
Message generated for change (Comment added) made by mattdaws

Category: MinGW
Group: None
Status: Closed
Resolution: Duplicate
Priority: 5
Submitted By: Matt Daws (mattdaws)
Assigned to: Earnie Boyd (earnie)
Summary: movaps problem with mingw

----------------------------------------------------------------------

>Comment By: Matt Daws (mattdaws)
Date: 2004-08-13 17:32

Message:
Logged In: YES
user_id=1103054

Hmm, okay, I've now made a small(ish) example program. The key points seem to be:

i) -finline-functions is needed to create the movaps instruction.
ii) Threads *are* important. If the example is run as a single-thread, all is okay. If I start a new thread to run the offending code in, it crashes.

I guess this IS an alignment issue, so I'll leave the bug as closed and see if 1001932 goes anywhere...

See attached C++ file. Compile it as, for example,

    g++ -Wall -ffast-math -O2 -march=pentium4 -finline-functions main.cpp -o test.exe

--Matt

----------------------------------------------------------------------
|
From: SourceForge.net <no...@so...> - 2004-08-18 21:52:26
|
Bugs item #1008330, was opened at 2004-08-13 10:43
Message generated for change (Comment added) made by dannysmith

Category: MinGW
Group: None
>Status: Open
>Resolution: None
Priority: 5
Submitted By: Matt Daws (mattdaws)
>Assigned to: Danny Smith (dannysmith)
Summary: movaps problem with mingw

----------------------------------------------------------------------

>Comment By: Danny Smith (dannysmith)
Date: 2004-08-19 09:52

Message:
Logged In: YES
user_id=11494

You're right, it is a different bug -- at least the simple fix that worked for 1001932 doesn't work for this. Thanks for your analysis on the GCC bugzilla report.

Danny

----------------------------------------------------------------------
|
From: SourceForge.net <no...@so...> - 2004-08-19 10:08:49
|
Bugs item #1008330, was opened at 2004-08-12 22:43
Message generated for change (Comment added) made by mattdaws

Category: MinGW
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Matt Daws (mattdaws)
Assigned to: Danny Smith (dannysmith)
Summary: movaps problem with mingw

----------------------------------------------------------------------

>Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 10:08

Message:
Logged In: YES
user_id=1103054

Danny,

Thanks for all your work! I've looked at 1001932 and tried the new crt2.o. This seems to fix problems in the main thread of my program, but I am still having issues in the second thread which I start up.

In particular, it's the same stack alignment issue: it seems that GCC assumes the stack is aligned at the point of every call; the call then pushes the return address, so the function starts with esp+4 aligned, not esp. This isn't true if I use _beginthreadex to start a new thread.

Is this correct, or should I be using some other function to start a new thread? I cannot quite see what alignment _beginthreadex provides: I am tempted to conjecture that it guarantees nothing beyond a 4-byte boundary.

Thanks,

--Matt

----------------------------------------------------------------------
|
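Editorial note: one workaround that later became common for this class of problem is to make the thread entry point realign its own stack, so that it no longer matters what alignment the thread-creation routine provides. The sketch below assumes a GCC release that supports the x86 function attribute `force_align_arg_pointer`; that attribute is not available in the GCC 3.4.1 discussed in this thread, and nothing here is taken from the attached test case or from the eventual MinGW fix. The names `thread_entry` and `entry_sees_aligned_stack` are hypothetical. On targets other than 32-bit x86 the macro expands to nothing, and without Win32 the entry function is simply called directly.

```cpp
#include <cassert>
#include <cstdint>

#if defined(_WIN32)
#include <process.h>   // _beginthreadex
#include <windows.h>   // WaitForSingleObject, CloseHandle
#define THREAD_CALL __stdcall
#else
#define THREAD_CALL
#endif

// Assumption: GCC-compatible compiler targeting 32-bit x86. The attribute
// asks for a prologue that realigns the stack on entry to this function.
#if defined(__GNUC__) && defined(__i386__)
#define REALIGN_STACK __attribute__((force_align_arg_pointer))
#else
#define REALIGN_STACK
#endif

// Hypothetical thread entry point. It records whether a local that was
// requested to be 16-byte aligned really is, which is what movaps needs.
REALIGN_STACK unsigned THREAD_CALL thread_entry(void* arg) {
    assert(arg != nullptr);
    alignas(16) float scratch[4] = {0.0f, 1.0f, 2.0f, 3.0f};
    bool* aligned = static_cast<bool*>(arg);
    *aligned = reinterpret_cast<std::uintptr_t>(scratch) % 16 == 0;
    return 0;
}

// Runs thread_entry in a new thread where Win32 is available, otherwise
// calls it directly, and reports what it observed.
bool entry_sees_aligned_stack() {
    bool aligned = false;
#if defined(_WIN32)
    std::uintptr_t handle =
        _beginthreadex(nullptr, 0, thread_entry, &aligned, 0, nullptr);
    if (handle == 0) {
        return false;  // thread could not be created
    }
    WaitForSingleObject(reinterpret_cast<HANDLE>(handle), INFINITE);
    CloseHandle(reinterpret_cast<HANDLE>(handle));
#else
    thread_entry(&aligned);
#endif
    return aligned;
}
```

Only the outermost function that the operating system calls needs the attribute; everything it calls then inherits an aligned stack under the compiler's usual assumption. Later GCC releases also offer the `-mstackrealign` option, which applies the same idea more broadly at the cost of a slightly longer prologue.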
From: SourceForge.net <no...@so...> - 2004-08-19 10:47:54
|
Bugs item #1008330, was opened at 2004-08-12 18:43
Message generated for change (Comment added) made by earnie

Category: MinGW
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Matt Daws (mattdaws)
Assigned to: Danny Smith (dannysmith)
Summary: movaps problem with mingw

----------------------------------------------------------------------

>Comment By: Earnie Boyd (earnie)
Date: 2004-08-19 06:47

Message:
Logged In: YES
user_id=15438

Curious: does -mms-bitfields help?

----------------------------------------------------------------------
|
From: SourceForge.net <no...@so...> - 2004-08-19 11:01:05
|
Bugs item #1008330, was opened at 2004-08-12 22:43
Message generated for change (Comment added) made by mattdaws
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=102435&aid=1008330&group_id=2435

Category: MinGW
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Matt Daws (mattdaws)
Assigned to: Danny Smith (dannysmith)
Summary: movaps problem with mingw

----------------------------------------------------------------------

>Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 11:01

Message:
Logged In: YES
user_id=1103054

Earnie: Nope, doesn't do a thing. Sorry! I really do think this is a stack alignment issue, which now seems solved for main(), but not for new threads.

--Matt

----------------------------------------------------------------------

Comment By: Earnie Boyd (earnie)
Date: 2004-08-19 10:47

Message:
Logged In: YES
user_id=15438

Curious: Does -mms-bitfields help?

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 10:08

Message:
Logged In: YES
user_id=1103054

Danny,

Thanks for all your work! I've looked at 1001932 and tried the new crt2.o. This seems to fix problems in the main thread of my program, but I am still having issues in the second thread which I start up.

In particular, it's the same stack alignment issue: it seems that GCC assumes that every function is called with the stack aligned, so that once the return address has been pushed the function starts with esp+4 aligned, not esp. This isn't true if I use _beginthreadex to start a new thread.

Is this correct, or should I be using some other function to start a new thread? I cannot quite see what alignment _beginthreadex is making: I am tempted to conjecture it makes no alignment beyond a 4-byte boundary.

Thanks,
--Matt

----------------------------------------------------------------------

Comment By: Danny Smith (dannysmith)
Date: 2004-08-18 21:52

Message:
Logged In: YES
user_id=11494

You're right, it is a different bug -- at least the simple fix that worked for 1001932 doesn't work for this. Thanks for your analysis on the GCC bugzilla report.

Danny

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=102435&aid=1008330&group_id=2435 |
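The entry-alignment assumption described in the 2004-08-19 10:08 comment can be checked by hand against the first movaps store in the assembly from the initial report. The following is only a sketch of that arithmetic in C: the function names and the sample stack addresses are made up for illustration, 32-bit addresses are assumed, and it models nothing beyond the `pushl %ebp; movl %esp, %ebp` prologue and the `-104(%ebp)` operand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: movaps needs a memory operand on a 16-byte boundary. */
bool is_16_aligned(uint32_t addr)
{
    return (addr & 15u) == 0;
}

/*
 * Models the start of Landscape::Draw from the initial report:
 *   pushl %ebp ; movl %esp, %ebp    =>  ebp = esp_at_entry - 4
 *   movaps %xmm4, -104(%ebp)        =>  store address = ebp - 104
 * Returns true if that store would land on a 16-byte boundary.
 */
bool movaps_slot_is_aligned(uint32_t esp_at_entry)
{
    /* Assumption: the stack is always at least 4-byte aligned. */
    assert((esp_at_entry & 3u) == 0);

    uint32_t ebp = esp_at_entry - 4u;   /* after pushl %ebp; movl %esp, %ebp */
    return is_16_aligned(ebp - 104u);   /* operand of the movaps store */
}
```

With an entry %esp of 0x0012FF7C (so that %esp+4 is a multiple of 16) the store address works out to 0x0012FF10, which is aligned. With an entry %esp of 0x0012FF78, which is still 4-byte aligned, the store address is 0x0012FF0C and movaps would fault. The second store in the report, `-136(%ebp)`, behaves the same way because 136 and 104 leave the same remainder modulo 16.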
From: SourceForge.net <no...@so...> - 2004-08-19 11:12:51
|
Bugs item #1008330, was opened at 2004-08-12 22:43
Message generated for change (Comment added) made by mattdaws
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=102435&aid=1008330&group_id=2435

Category: MinGW
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Matt Daws (mattdaws)
Assigned to: Danny Smith (dannysmith)
Summary: movaps problem with mingw

----------------------------------------------------------------------

>Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 11:12

Message:
Logged In: YES
user_id=1103054

Hmm, I've been playing with swapping _beginthreadex for CreateThread (which shouldn't be used, I think, as I'm using the CRT). The result is a change of alignment on the stack, but it still isn't aligned on a 16-byte boundary.

--Matt

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=102435&aid=1008330&group_id=2435 |
From: SourceForge.net <no...@so...> - 2004-08-19 11:42:27
|
Bugs item #1008330, was opened at 2004-08-12 18:43
Message generated for change (Comment added) made by earnie
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=102435&aid=1008330&group_id=2435

Category: MinGW
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Matt Daws (mattdaws)
Assigned to: Danny Smith (dannysmith)
Summary: movaps problem with mingw

----------------------------------------------------------------------

>Comment By: Earnie Boyd (earnie)
Date: 2004-08-19 07:42

Message:
Logged In: YES
user_id=15438

What about adding:
-mpreferred-stack-boundary= Attempt to keep stack aligned to this power of 2

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=102435&aid=1008330&group_id=2435 |
From: SourceForge.net <no...@so...> - 2004-08-19 12:16:45
|
Bugs item #1008330, was opened at 2004-08-12 22:43
Message generated for change (Comment added) made by mattdaws
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=102435&aid=1008330&group_id=2435

Category: MinGW
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Matt Daws (mattdaws)
Assigned to: Danny Smith (dannysmith)
Summary: movaps problem with mingw

----------------------------------------------------------------------

>Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 12:16

Message:
Logged In: YES
user_id=1103054

Okay, I've tried -mpreferred-stack-boundary= It doesn't stop the crash. Looking at the assembly output, the following happens:

i) In _main, we always have something like:

    pushl %ebp          // Setup stack frame
    movl %esp, %ebp
    pushl %ebx          // Save ebx, doesn't always occur
    subl $148, %esp     // Allocate local storage
    andl $-32, %esp     // ALIGN STACK
    call __alloca       // Call some helper functions:
    call ___main        // I don't know what these do.

The value used in "andl $-32, %esp" changes with preferred-stack-boundary, as expected. Notice, however, that this occurs AFTER the stack frame is set up, so that %ebp can still be unaligned. However, any function called from _main will be fine, as ESP is aligned. This is what was causing the problem in BUG 1001932 I believe, and I guess Danny changed some code in crt2.o to mean that %esp is aligned at the start, and so %ebp becomes aligned from the start as well. Of course, preferred-stack-boundary doesn't change this!

ii) GCC keeps the stack aligned to the requested boundary, so that each function contains a "subl $val, %esp" and $val is changed with preferred-stack-boundary. This works fine, but it assumes that the stack starts out aligned correctly. I guess this is why the code in _main changes, but it seems odd that it does it too late. Furthermore, it means that there is no mechanism for dealing with threads, as then _main never gets called to align the stack.

I am currently playing about with assembler to try to write my own thread-launching code to align the stack. I'll let you know if I get it to work (sadly GCC doesn't allow inline asm to alter %esp, which I guess is not too surprising, but a bit annoying).

--Matt

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=102435&aid=1008330&group_id=2435 |
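The "andl $-32, %esp" step in the 12:16 comment, and the "power of 2" wording in the description of -mpreferred-stack-boundary, can be illustrated with a few lines of C. This is a minimal sketch of the mask arithmetic only, not of anything GCC or crt2.o actually contains: both function names are invented here, and 32-bit addresses are assumed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round addr down to a multiple of boundary, which must be a power of two.
 * For boundary == 32 the mask is 0xFFFFFFE0, i.e. the same bit pattern as
 * the immediate in "andl $-32, %esp".
 */
uint32_t align_down(uint32_t addr, uint32_t boundary)
{
    assert(boundary != 0 && (boundary & (boundary - 1u)) == 0);
    return addr & ~(boundary - 1u);
}

/*
 * Hypothetical helper: the numeric argument N of
 * -mpreferred-stack-boundary=N is an exponent, so the requested
 * alignment in bytes is 2 to the power N.
 */
uint32_t boundary_from_flag(unsigned n)
{
    assert(n < 32);
    return (uint32_t)1 << n;
}
```

For example, `align_down(0x0012FF74, 32)` gives 0x0012FF60, and `boundary_from_flag(4)` gives 16, the smallest boundary that satisfies movaps. Note that the mask only helps in a function that applies it: a thread entry point that never executes such an instruction inherits whatever alignment the thread was started with, which is the gap the comment points out.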
From: SourceForge.net <no...@so...> - 2004-08-19 12:25:42
|
Bugs item #1008330, was opened at 2004-08-12 22:43 Message generated for change (Comment added) made by mattdaws You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=102435&aid=1008330&group_id=2435 Category: MinGW Group: None Status: Open Resolution: None Priority: 5 Submitted By: Matt Daws (mattdaws) Assigned to: Danny Smith (dannysmith) Summary: movaps problem with mingw Initial Comment: I have a slightly complicated program which is crashing when I compile with certain options. I have narrowed the problem down to the fact that MinGW is generating MOVAPS instructions which are trying to access memory at an unaligned point. If I edit the assembler file and replace these with MOVUPS instructions and then assemble and link, my program works fine. I thought this was linked to my use of threads and/or the printf instruction. However, I've removed the printf's and still get the problem. I am now suspecting my use of threads. I tried changing from using CreateThread to using _beginthreadex, but this didn't help. A Google search reveals that other people have been having a general problem with GCC, MOVAPS and threads. Thus this *might* be a problem with GCC. However, people in the know are blaming the problem on system calls which are not setting up the stack in an aligned manner when starting a thread. Hence my hope that _beginthreadex would help. The problem does not occur with Mingw 3.3.3, but this seems to be down to the fact that the compiler no longer produces MOVAPS instructions. I am using Mingw 3.4.1, Windows XP SP1. The compiler flags I use are: -Wall -ffast-math -O3 -march=pentium4 -mthreads But anything enabling SSE seems to give problems. I could send the code, but it is rather long. I could also try to generate a smaller example which does the same thing, but perhaps the bug will not occur. 
Here is a snippet of the offending assembly (produced with the -S flag in g++):

__ZN9Landscape4DrawEv:
    pushl %ebp
    movl %esp, %ebp
    pushl %edi
    pushl %esi
    pushl %ebx
    subl $236, %esp
    movl 8(%ebp), %edi
    flds 8(%edi)
    flds 40(%edi)
    .....
    movaps %xmm4, -104(%ebp)
    .....
    movaps %xmm0, -136(%ebp)

Many thanks in advance for any hints!

--Matt Daws

----------------------------------------------------------------------

>Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 12:25

Message:
Logged In: YES
user_id=1103054

Okay, my hack seems to work. But it's a hack: it needs some synchronisation code added to make it remotely thread-safe...

The code always starts a new thread by running a small function which aligns ESP and then calls the real function which we want. We have to jump through some hoops to get GCC to issue the correct assembly output, so a more elegant way would be to write the helper function in assembly to start with. I'm a bit rusty with asm though.

unsigned (__stdcall *func_address) (void *);

unsigned int __stdcall hack_stack(void *param)
{
    // Align stack: GCC stack frame means that we can change
    // ESP here and it'll be reset later.
    unsigned tmp;
    asm volatile (
        "movl %%esp,%%eax; subl $15, %%eax; andl $-16, %%eax; mov %%eax,%%esp;"
        : "=a"(tmp) );

    // Some random code to ensure GCC issues a CALL to
    // func_address, and not just a long jump
    int temp = 0, i;
    for (i=0; i<1000; i++)
        temp+=i;

    unsigned result = func_address(param);

    // Some more random code.
    int j;
    for (j=0; j<1000; j++)
        temp+=j;

    return result;
}

unsigned long my_start_thread(unsigned (__stdcall *addr) (void *), unsigned *ret)
{
    // Should sync here
    func_address=addr;
    return _beginthreadex(NULL,0,hack_stack,NULL,0,ret);
}

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 12:16

Message:
Logged In: YES
user_id=1103054

Okay, I've tried -mpreferred-stack-boundary=
It doesn't stop the crash.
Looking at the assembly output, the following happens:

i) In _main, we always have something like:

    pushl %ebp          // Setup stack frame
    movl %esp, %ebp
    pushl %ebx          // Save ebx, doesn't always occur
    subl $148, %esp     // Allocate local storage
    andl $-32, %esp     // ALIGN STACK
    call __alloca       // Call some helper functions:
    call ___main        // I don't know what these do.

The value used in "andl $-32, %esp" changes with preferred-stack-boundary, as expected. Notice, however, that this occurs AFTER the stack frame is set up, so that %ebp can still be unaligned. However, any function called from _main will be fine, as ESP is aligned. This is what was causing the problem in BUG 1001932 I believe, and I guess Danny changed some code in crt2.o to mean that %esp is aligned at the start, and so %ebp becomes aligned from the start as well. Of course, preferred-stack-boundary doesn't change this!

ii) GCC keeps the stack aligned to the requested boundary, so that each function contains a "subl $val, %esp" and $val is changed with preferred-stack-boundary. This works fine, but it assumes that the stack starts out aligned correctly. I guess this is why the code in _main changes, but it seems odd that it does it too late. Furthermore, it means that there is no mechanism for dealing with threads, as then _main never gets called to align the stack.

I am currently playing about with assembler to try to write my own thread-launching code to align the stack. I'll let you know if I get it to work (sadly GCC doesn't allow inline asm to alter %esp, which I guess is not too surprising, but a bit annoying).
--Matt

----------------------------------------------------------------------

Comment By: Earnie Boyd (earnie)
Date: 2004-08-19 11:42

Message:
Logged In: YES
user_id=15438

What about adding:

    -mpreferred-stack-boundary=  Attempt to keep stack aligned to this power of 2

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 11:12

Message:
Logged In: YES
user_id=1103054

Hmm, I've been playing with swapping _beginthreadex for CreateThread (which shouldn't be used, I think, as I'm using the CRT). The result is a change of alignment on the stack, but it still isn't aligned on a 16-byte boundary.

--Matt

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 11:01

Message:
Logged In: YES
user_id=1103054

Earnie: Nope, doesn't do a thing. Sorry! I really do think this is a stack alignment issue, which now seems solved for main(), but not for new threads.

--Matt

----------------------------------------------------------------------

Comment By: Earnie Boyd (earnie)
Date: 2004-08-19 10:47

Message:
Logged In: YES
user_id=15438

Curious: Does -mms-bitfields help?

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 10:08

Message:
Logged In: YES
user_id=1103054

Danny,

Thanks for all your work! I've looked at 1001932 and tried the new crt2.o. This seems to fix problems in the main thread of my program, but I am still having issues in the second thread which I start up. In particular, it's the same stack alignment issue: it seems that GCC assumes that every function is called with the stack aligned; the CALL then pushes the return address, so the function starts with esp+4 aligned, not esp. This isn't true if I use _beginthreadex to start a new thread. Is this correct, or should I be using some other function to start a new thread?
I cannot quite see what alignment _beginthreadex is making: I am tempted to conjecture it makes no alignment beyond a 4-byte boundary.

Thanks, --Matt

----------------------------------------------------------------------

Comment By: Danny Smith (dannysmith)
Date: 2004-08-18 21:52

Message:
Logged In: YES
user_id=11494

You're right, it is a different bug -- at least the simple fix that worked for 1001932 doesn't work for this. Thanks for your analysis on the GCC bugzilla report.

Danny

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-13 17:32

Message:
Logged In: YES
user_id=1103054

Hmm, okay, I've now made a small(ish) example program. The key points seem to be:

i) -finline-functions is needed to create the movaps instruction.
ii) Threads *are* important. If the example is run as a single thread, all is okay. If I start a new thread to run the offending code in, it crashes.

I guess this IS an alignment issue, so I'll leave the bug as closed and see if 1001932 goes anywhere...

See attached C++ file. Compile it as, for example,

    g++ -Wall -ffast-math -O2 -march=pentium4 -finline-functions main.cpp -o test.exe

--Matt

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-13 11:06

Message:
Logged In: YES
user_id=1103054

Not sure what the protocol here is: I'm not going to re-open the bug for now.

However, I've been doing some more playing, and have found that the optimisation -finline-functions is responsible: without this, a small function of mine is called, and my code works. With it, the function is not called, but is inlined (as expected). However, the curious movaps instruction is now generated.

I'm not sure this is a duplicate bug, as I'm not trying to use SSE data types: the instruction is just being generated automatically by GCC, whereas the 1001932 bug relates to the explicit use of SSE data types.
Furthermore, I'm not explicitly telling the compiler to use SSE to do maths, only that it is running on a P4, and so is free to use SSE if it so wishes. As the problem only occurs when inlining, I wonder if this is a code generation bug?

Furthermore, my trick of replacing the movaps by movups seems to also work if I simply remove the movaps instructions completely, which is a little odd.

The same is true if I use -mfpmath=sse, i.e. the inlining causes the error. Furthermore, in this case, if I change movaps to movups, the problem still occurs (but with an Access Violation as the program tries to read from address -1). The same occurs if I remove the movaps instructions which reference memory.

I will try and make a simple program which duplicates these problems.

--Matt

----------------------------------------------------------------------

Comment By: Danny Smith (dannysmith)
Date: 2004-08-13 01:31

Message:
Logged In: YES
user_id=11494

Duplicate of 1001932, which has a self-contained testcase. The bug has also been reported to GCC's bugzilla,
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16890
but no response yet.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=102435&aid=1008330&group_id=2435 |
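For readers following the alignment argument in the comments above, the arithmetic can be checked with a few lines of C. This is only a sketch: it assumes the 32-bit prologue shown in the initial comment (CALL pushes a 4-byte return address, then "pushl %ebp" and "movl %esp, %ebp"), the function names is_aligned and frame_slot are made up for illustration, and any stack addresses fed to it are invented rather than taken from a real run.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if addr is a multiple of boundary (boundary must be a power of two). */
int is_aligned(uintptr_t addr, uintptr_t boundary)
{
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    return (addr & (boundary - 1)) == 0;
}

/* Address of a frame slot such as -104(%ebp), given the value %esp had
   just before the CALL. Assumed model: CALL pushes 4 bytes, "pushl %ebp"
   pushes 4 more, then "movl %esp, %ebp" copies the result into %ebp. */
uintptr_t frame_slot(uintptr_t esp_before_call, intptr_t ebp_offset)
{
    uintptr_t ebp = esp_before_call - 4 - 4;
    /* Unsigned wrap-around makes adding a negative offset a subtraction. */
    return ebp + (uintptr_t)ebp_offset;
}
```

With a hypothetical pre-call %esp of 0x0022ff70 (16-byte aligned), both -104(%ebp) and -136(%ebp) land on 16-byte boundaries, so movaps is safe. With 0x0022ff74 (only 4-byte aligned) they do not, which is consistent with the crash appearing only in threads whose initial stack is not 16-byte aligned.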
From: SourceForge.net <no...@so...> - 2004-08-19 12:55:19
|
Bugs item #1008330, was opened at 2004-08-12 22:43
Message generated for change (Comment added) made by mattdaws
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=102435&aid=1008330&group_id=2435

Category: MinGW
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Matt Daws (mattdaws)
Assigned to: Danny Smith (dannysmith)
Summary: movaps problem with mingw

----------------------------------------------------------------------

>Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 12:55

Message:
Logged In: YES
user_id=1103054

Sorry, yet another thought. Of course, the order:

    save esp to ebp
    align esp

is the only one possible in _main, as at the end, we must restore esp before the ret instruction. It does mean, however, that we cannot control stack alignment in _main, only in subsequent functions. Of course, in a Windows GUI program, this doesn't matter, as _main is internal, and just calls WinMain.

Does this perhaps mean that -mpreferred-stack-boundary= has no effect on Win GUI programs? A quick check suggests yes, as while GCC acts to keep the stack aligned, it has no mechanism to enforce that alignment initially.

The only workaround I can think of is that there should be a hidden function right at the start of a program which aligns the stack. This would need to be generated when a main() or WinMain() function was present, to ensure -mpreferred-stack-boundary= works as expected. Furthermore, it seems as if we should have a helper function to align the stack in a new thread. All this seems a lot of work however, which makes me think there might be a more elegant way...

--Matt

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=102435&aid=1008330&group_id=2435 |
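The hack_stack helper posted in this thread passes the real entry point through the global func_address, which is why it needs synchronisation. One possible alternative, sketched here in plain C, is to hand each new thread its own heap-allocated record through the void * argument. The names start_ctx, make_start_ctx, start_trampoline and add_one are hypothetical, the __stdcall qualifier and the _beginthreadex call are left out so the sketch stays portable, and the stack realignment itself is not shown.

```c
#include <assert.h>
#include <stdlib.h>

/* Plain-C stand-in for the thread entry type; on Windows this would be
   unsigned (__stdcall *)(void *). */
typedef unsigned (*thread_fn)(void *);

/* Hypothetical per-thread start record, replacing the global func_address. */
struct start_ctx {
    thread_fn fn;
    void *arg;
};

/* Allocate a start record; returns NULL if malloc fails. */
struct start_ctx *make_start_ctx(thread_fn fn, void *arg)
{
    struct start_ctx *ctx = malloc(sizeof *ctx);
    if (ctx != NULL) {
        ctx->fn = fn;
        ctx->arg = arg;
    }
    return ctx;
}

/* Entry point that would be handed to the thread-creation call. */
unsigned start_trampoline(void *p)
{
    struct start_ctx *ctx = p;
    thread_fn fn;
    void *arg;

    assert(ctx != NULL);
    fn = ctx->fn;
    arg = ctx->arg;
    free(ctx);   /* the record is owned by the new thread */

    /* The stack realignment from hack_stack would go here. */
    return fn(arg);
}

/* Hypothetical thread body, used only to exercise the trampoline. */
unsigned add_one(void *p)
{
    return *(unsigned *)p + 1u;
}
```

On Windows the idea would be to pass start_trampoline and the record returned by make_start_ctx to _beginthreadex in place of hack_stack and NULL. Because each thread reads only its own record, no lock around a shared global is needed.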
From: SourceForge.net <no...@so...> - 2004-08-19 21:37:40
|
Bugs item #1008330, was opened at 2004-08-13 10:43
Message generated for change (Comment added) made by dannysmith
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=102435&aid=1008330&group_id=2435

Category: MinGW
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Matt Daws (mattdaws)
Assigned to: Danny Smith (dannysmith)
Summary: movaps problem with mingw

----------------------------------------------------------------------

>Comment By: Danny Smith (dannysmith)
Date: 2004-08-20 09:37

Message:
Logged In: YES
user_id=11494

Hello,

Simply adding this to the callback fixes the testcase in my tests, but (here is the gotcha) only if compiled with frame pointers enabled (i.e., -fomit-frame-pointer fails):

unsigned __stdcall foo(void*)
{
    asm volatile ("andl $-16,%%esp" ::: "%esp");
    Landscape land(320,240,0.3,0.01,0.0001);
    land.Draw();
    return 0;
}

I have not tested with exceptions.

I have seen a workaround for the omit-frame-pointer case in GPL'd code; I need to inquire about licensing issues.

Danny

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=102435&aid=1008330&group_id=2435 |
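Why the one-line andl fix depends on frame pointers can be seen by tracking %esp through a simplified prologue and epilogue. The model below is a sketch under stated assumptions, not GCC's actual output: it assumes that with a frame pointer the epilogue restores %esp from %ebp (as "leave" does), and that without one the epilogue simply adds back what the prologue subtracted. The function names and any sample values are invented.

```c
#include <assert.h>
#include <stdint.h>

/* %esp as seen by RET when the epilogue restores %esp from %ebp
   ("leave", i.e. movl %ebp, %esp followed by popl %ebp). */
uintptr_t esp_at_ret_with_frame_pointer(uintptr_t entry_esp, uintptr_t locals)
{
    uintptr_t esp = entry_esp;
    uintptr_t ebp;

    assert(locals % 4 == 0);
    esp -= 4;                 /* pushl %ebp */
    ebp = esp;                /* movl %esp, %ebp */
    esp -= locals;            /* subl $locals, %esp */
    esp &= ~(uintptr_t)15;    /* andl $-16, %esp (the manual realignment) */
    esp = ebp;                /* leave, first half: movl %ebp, %esp */
    esp += 4;                 /* leave, second half: popl %ebp */
    return esp;
}

/* %esp as seen by RET when there is no frame pointer and the epilogue
   only undoes the prologue's subtraction. */
uintptr_t esp_at_ret_without_frame_pointer(uintptr_t entry_esp, uintptr_t locals)
{
    uintptr_t esp = entry_esp;

    assert(locals % 4 == 0);
    esp -= locals;            /* subl $locals, %esp */
    esp &= ~(uintptr_t)15;    /* andl $-16, %esp */
    esp += locals;            /* addl $locals, %esp */
    return esp;
}
```

For a made-up entry %esp of 0x0022ff6c and 24 bytes of locals, the first function returns 0x0022ff6c, so RET finds the return address. The second returns 0x0022ff68, four bytes too low, so RET would pop the wrong word. The same reasoning applies to any epilogue that pops registers and returns without first reloading %esp from %ebp.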
From: SourceForge.net <no...@so...> - 2004-08-19 22:12:20
|
Bugs item #1008330, was opened at 2004-08-12 22:43
Message generated for change (Comment added) made by mattdaws
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=102435&aid=1008330&group_id=2435

Category: MinGW
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Matt Daws (mattdaws)
Assigned to: Danny Smith (dannysmith)
Summary: movaps problem with mingw

----------------------------------------------------------------------

>Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 22:12

Message:
Logged In: YES
user_id=1103054

This sort of works for me. In the larger project which first gave me problems, this gets the program to run without the crash. However, we do get a crash when the thread tries to exit, precisely because we've been messing with the stack pointer.

Weirdly, the callback function does not use the LEAVE instruction, whereas some (but, a check reveals, not all) other functions do. The LEAVE instruction copies ebp back into esp (and then pops the saved ebp), which is what lets us mess with esp. This is clearly related to the frame-pointer stuff, but no amount of messing with options gets a LEAVE emitted, for some reason (I've compiled with no optimisations, and with and without the -fno-omit-frame-pointer option). I guess GCC thinks that it is controlling the stack correctly, so it just pops stuff off and does a RET, which is obviously better, except when the user has messed with the stack.

Still, it's clearly the correct idea. I mean, I'd even be happy to live with it, as I don't mind never quitting a thread except at exit.
Thanks for all the work,

--Matt

----------------------------------------------------------------------

Comment By: Danny Smith (dannysmith)
Date: 2004-08-19 21:37

Message:
Logged In: YES
user_id=11494

Hello,

Simply adding this to the callback fixes the testcase in my tests, but (here is the gotcha) only if compiled with frame pointers enabled (i.e., -fomit-frame-pointer fails):

    unsigned __stdcall foo(void*)
    {
        // Round esp down to a 16-byte boundary before any SSE code runs.
        asm volatile ("andl $-16,%%esp" ::: "%esp");
        Landscape land(320,240,0.3,0.01,0.0001);
        land.Draw();
        return 0;
    }

I have not tested with exceptions. I have seen a workaround for the omit-frame-pointer case in GPL'd code; I need to inquire about licensing issues.

Danny

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 12:55

Message:
Logged In: YES
user_id=1103054

Sorry, yet another thought. Of course, the order:

    save esp to ebp
    align esp

is the only one possible in _main, as at the end we must restore esp before the ret instruction. It does mean, however, that we cannot control stack alignment in _main, only in subsequent functions. Of course, in a Windows GUI program, this doesn't matter, as _main is internal, and just calls WinMain. Does this perhaps mean that -mpreferred-stack-boundary= has no effect on Win GUI programs? A quick check suggests yes: while GCC acts to keep the stack aligned, it has no mechanism to enforce that alignment initially.

The only workaround I can think of is a hidden function right at the start of a program which aligns the stack. This would need to be generated when a main() or WinMain() function was present, to ensure -mpreferred-stack-boundary= works as expected. Furthermore, it seems as if we should have a helper function to align the stack in a new thread. All this seems a lot of work however, which makes me think there might be a more elegant way...
--Matt

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 12:25

Message:
Logged In: YES
user_id=1103054

Okay, my hack seems to work. But it's a hack: it needs some synchronisation code added to make it remotely thread-safe... The code always starts a new thread by running a small function which aligns ESP and then calls the real function which we want. We have to jump through some hoops to get GCC to issue the correct assembly output, so a more elegant way would be to write the helper function in assembly to start with. I'm a bit rusty with asm though.

    unsigned (__stdcall *func_address) (void *);

    unsigned int __stdcall hack_stack(void *param)
    {
        // Align stack: GCC stack frame means that we can change
        // ESP here and it'll be reset later.
        unsigned tmp;
        asm volatile (
            "movl %%esp,%%eax; subl $15, %%eax; andl $-16, %%eax; mov %%eax,%%esp;"
            : "=a"(tmp) );

        // Some random code to ensure GCC issues a CALL to
        // func_address, and not just a long jump
        int temp = 0, i;
        for (i=0; i<1000; i++) temp+=i;

        // Keep the real function's result so it can be returned as the exit code.
        unsigned result = func_address(param);

        // Some more random code.
        int j;
        for (j=0; j<1000; j++) temp+=j;
        return result;
    }

    unsigned long my_start_thread(unsigned (__stdcall *addr) (void *), unsigned *ret)
    {
        // Should sync here
        func_address=addr;
        return _beginthreadex(NULL,0,hack_stack,NULL,0,ret);
    }

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 12:16

Message:
Logged In: YES
user_id=1103054

Okay, I've tried -mpreferred-stack-boundary= It doesn't stop the crash. Looking at the assembly output, the following happens:

i) In _main, we always have something like:

    pushl %ebp          // Setup stack frame
    movl %esp, %ebp
    pushl %ebx          // Save ebx, doesn't always occur
    subl $148, %esp     // Allocate local storage
    andl $-32, %esp     // ALIGN STACK
    call __alloca       // Call some helper functions:
    call ___main        // I don't know what these do.
The value used in "andl $-32, %esp" changes with preferred-stack-boundary, as expected. Notice, however, that this occurs AFTER the stack frame is set up, so that %ebp can still be unaligned. However, any function called from _main will be fine, as ESP is aligned. This is what was causing the problem in BUG 1001932 I believe, and I guess Danny changed some code in crt2.o so that %esp is aligned at the start, and so %ebp becomes aligned from the start as well. Of course, preferred-stack-boundary doesn't change this!

ii) GCC keeps the stack aligned to the requested boundary, so that each function contains a "subl $val, %esp" and $val is changed with preferred-stack-boundary. This works fine, but it assumes that the stack starts out aligned correctly. I guess this is why the code in _main changes, but it seems odd that it does it too late. Furthermore, it means that there is no mechanism for dealing with threads, as then _main never gets called to align the stack.

I am currently playing about with assembler to try to write my own thread-launching code to align the stack. I'll let you know if I get it to work (sadly GCC doesn't allow inline asm to alter %esp, which I guess is not too surprising, but a bit annoying).

--Matt

----------------------------------------------------------------------

Comment By: Earnie Boyd (earnie)
Date: 2004-08-19 11:42

Message:
Logged In: YES
user_id=15438

What about adding:

    -mpreferred-stack-boundary=
        Attempt to keep stack aligned to this power of 2

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 11:12

Message:
Logged In: YES
user_id=1103054

Hmm, I've been playing with swapping _beginthreadex for CreateThread (which shouldn't be used, I think, as I'm using the CRT). The result is a change of alignment on the stack, but it still isn't aligned on a 16-byte boundary.
--Matt

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 11:01

Message:
Logged In: YES
user_id=1103054

Earnie: Nope, doesn't do a thing. Sorry! I really do think this is a stack alignment issue, which now seems solved for main(), but not for new threads.

--Matt

----------------------------------------------------------------------

Comment By: Earnie Boyd (earnie)
Date: 2004-08-19 10:47

Message:
Logged In: YES
user_id=15438

Curious: Does -mms-bitfields help?

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 10:08

Message:
Logged In: YES
user_id=1103054

Danny,

Thanks for all your work! I've looked at 1001932 and tried the new crt2.o. This seems to fix problems in the main thread of my program, but I am still having issues in the second thread which I start up. In particular, it's the same stack alignment issue: it seems that GCC assumes that every function is called with the stack aligned, so that once the return address has been pushed, the function starts with esp+4 aligned, not esp. This isn't true if I use _beginthreadex to start a new thread. Is this correct, or should I be using some other function to start a new thread? I cannot quite see what alignment _beginthreadex is making: I am tempted to conjecture that it aligns to nothing stricter than a 4-byte boundary.

Thanks, --Matt

----------------------------------------------------------------------

Comment By: Danny Smith (dannysmith)
Date: 2004-08-18 21:52

Message:
Logged In: YES
user_id=11494

You're right, it is a different bug -- at least the simple fix that worked for 1001932 doesn't work for this. Thanks for your analysis on the GCC bugzilla report.
Danny ---------------------------------------------------------------------- Comment By: Matt Daws (mattdaws) Date: 2004-08-13 17:32 Message: Logged In: YES user_id=1103054 Hmm, okay, I've now made a small(ish) example program. The key points seem to be: i) -finline-functions is needed to create the movaps instruction. ii) Threads *are* important. If the example is run as a single-thread, all is okay. If I start a new thread to run the offending code in, it crashes. I guess this IS an alignment issue, so I'll leave the bug as closed and see if 1001932 goes anywhere... See attached C++ file. Compile it as, for example, g++ -Wall -ffast-math -O2 -march=pentium4 -finline-functions main.cpp -o test.exe --Matt ---------------------------------------------------------------------- Comment By: Matt Daws (mattdaws) Date: 2004-08-13 11:06 Message: Logged In: YES user_id=1103054 Not sure what the protocol here is: I'm not going to re-open the bug for now. However, I've been doing some more playing, and have found that the optimisation -finline-functions is responsible: without this, a small function of mine is called, and my code works. With it, the function is not called, but is inlined (as expected). However, the curious movaps instruction is now generated. I'm not sure this is a duplicate bug, as I'm not trying to use SSE data types: the instruction is just being generated automatically by GCC, whereas the 1001932 bug relates to the explicit use of SSE data types. Furthermore, I'm not explicitly telling the compiler to use SSE to do maths, only that it is running on a P4, and so is free to use SSE if it so wishes. As the problem only occurs when inlining, I wonder if this is a code generation bug? Furthermore, my trick of replacing the movaps by movups seems to also work if I simply remove the movaps instructions completely, which is a little odd. The same is true if I use -mfpmath=sse, i.e. the inlining causes the error. 
Furthermore, in this case, if I change movaps to movups, the problem still occurs (but with an Access Violation as the program tries to read from address -1). The same occurs if I remove the movaps instructions which reference memory. I will try and make a simple program which duplicates these problems. --Matt ---------------------------------------------------------------------- Comment By: Danny Smith (dannysmith) Date: 2004-08-13 01:31 Message: Logged In: YES user_id=11494 Duplicate of 1001932, which has a self-contained testcase. The bug has also been reported to GCC's bugzilla, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16890 but no response yet. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=102435&aid=1008330&group_id=2435 |
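The hack_stack helper quoted in this message passes the real thread function through the global func_address, which is why it needs synchronisation, and it hands NULL to the new thread instead of the caller's parameter. One way to avoid both, sketched here under assumptions, is to pack the function pointer and its argument into a small heap object and pass that through the void* which _beginthreadex already forwards. All names below are hypothetical. The sketch is portable C++ so that it compiles anywhere; on MinGW the function-pointer type and the entry point would additionally need __stdcall. It also uses GCC's force_align_arg_pointer function attribute, which later GCC releases document for realigning the stack on entry to a function. That attribute postdates the 3.4.1 compiler discussed in this thread, so treat it as an option on newer toolchains only, not as something the participants used.

```cpp
#include <memory>

// All names are hypothetical. On MinGW the function-pointer type and the
// entry point would also carry __stdcall so they match _beginthreadex.
using ThreadFn = unsigned (*)(void*);

struct ThreadStart {
    ThreadFn fn;     // the real thread function
    void*    param;  // the argument it should receive
};

// Ask GCC to realign the stack on entry, on 32-bit x86 only.
// Elsewhere the macro expands to nothing.
#if defined(__GNUC__) && defined(__i386__)
#define REALIGN_ON_ENTRY __attribute__((force_align_arg_pointer))
#else
#define REALIGN_ON_ENTRY
#endif

// Entry point to hand to the thread-creation call. It takes ownership of
// the ThreadStart object, calls the real function, and frees the object.
REALIGN_ON_ENTRY unsigned aligned_entry(void* raw)
{
    std::unique_ptr<ThreadStart> start(static_cast<ThreadStart*>(raw));
    return start->fn(start->param);
}

// Packs fn and param into a heap object; ownership passes to aligned_entry.
void* make_thread_start(ThreadFn fn, void* param)
{
    return new ThreadStart{fn, param};
}

// Tiny worker used only to exercise the trampoline.
unsigned demo_worker(void* p)
{
    return static_cast<unsigned>(*static_cast<int*>(p)) + 1u;
}
```

With this shape the launch would read `_beginthreadex(NULL, 0, aligned_entry, make_thread_start(fn, param), 0, ret)`. No shared global is written, so no lock is needed, although the ThreadStart object would leak if thread creation failed and the caller did not free it.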
From: SourceForge.net <no...@so...> - 2004-08-19 22:30:41
|
Bugs item #1008330, was opened at 2004-08-12 22:43
Message generated for change (Comment added) made by mattdaws
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=102435&aid=1008330&group_id=2435

Category: MinGW
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Matt Daws (mattdaws)
Assigned to: Danny Smith (dannysmith)
Summary: movaps problem with mingw

----------------------------------------------------------------------

>Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 22:30

Message:
Logged In: YES
user_id=1103054

Ah, well, the following is cheeky, but seems to work:

    unsigned __stdcall foo(void*)
    {
        unsigned int stack;
        // Save the incoming esp in eax, then round esp down to a 16-byte boundary.
        asm volatile ("movl %%esp,%%eax; andl $-16,%%esp;" : "=a"(stack) : : "%esp");
        // body of function
        // Restore the original esp before returning.
        asm volatile ("movl %%eax, %%esp;" : : "a"(stack) );
        return 0;
    }

This works a treat for my project. However, an examination of the assembler output reveals that it's more a case of luck. Again, the "andl $-16, %%esp" instruction comes too late: we have already saved the (unaligned) value of esp to ebp, and ebp is used repeatedly to access memory. A check of the test case reveals that it's the same inlining problem: namely, the above assembler will align esp in that function, and in any *sub functions* the stack and hence ebp will be aligned correctly. However, things can still fall over in the callback function itself.

Again, it's quite possible to work around this: just make sure no SSE code ends up in the callback...

--Matt

----------------------------------------------------------------------

You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=102435&aid=1008330&group_id=2435
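The point made in the 22:30 comment, that aligning esp after the prologue comes too late, can be checked with plain arithmetic on the addresses involved. The sketch below models the prologue `pushl %ebp; movl %esp, %ebp` from the assembly in the initial comment and asks whether the two movaps slots, -104(%ebp) and -136(%ebp), land on 16-byte boundaries. The function names and the sample stack addresses are made up for illustration; only the two offsets come from the report.

```cpp
#include <cstdint>

// ebp after `pushl %ebp; movl %esp, %ebp`: the push moves esp down by 4.
constexpr std::uint32_t ebp_after_prologue(std::uint32_t esp_at_entry)
{
    return esp_at_entry - 4u;
}

// True if offset(%ebp) is a 16-byte aligned address. A later
// `andl $-16, %esp` does not appear here because it changes esp only;
// ebp, and every address computed from it, stays as it was.
constexpr bool slot_is_aligned(std::uint32_t esp_at_entry, std::int32_t offset)
{
    return ((ebp_after_prologue(esp_at_entry)
             + static_cast<std::uint32_t>(offset)) & 15u) == 0u;
}

// Sample value: the caller had esp 16-byte aligned (0x0012FF80) before
// CALL pushed the return address.
constexpr std::uint32_t kEntryFromAlignedCaller = 0x0012FF7Cu;
// Sample value: a thread entry where esp is only 4-byte aligned.
constexpr std::uint32_t kEntryFromThreadStart = 0x0012FF78u;

static_assert(slot_is_aligned(kEntryFromAlignedCaller, -104), "slot aligned");
static_assert(slot_is_aligned(kEntryFromAlignedCaller, -136), "slot aligned");
static_assert(!slot_is_aligned(kEntryFromThreadStart, -104), "movaps would fault");
static_assert(!slot_is_aligned(kEntryFromThreadStart, -136), "movaps would fault");
```

Both offsets are 8 modulo 16, so the slots are aligned only when ebp is 8 modulo 16, which is the case when the caller's esp was 16-byte aligned at the CALL. That matches the expectation described in the 10:08 comment, where the function starts with esp+4 aligned rather than esp.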
From: SourceForge.net <no...@so...> - 2004-08-20 15:53:11
|
Bugs item #1008330, was opened at 2004-08-12 22:43
Message generated for change (Comment added) made by mattdaws
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=102435&aid=1008330&group_id=2435

Category: MinGW
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Matt Daws (mattdaws)
Assigned to: Danny Smith (dannysmith)
Summary: movaps problem with mingw

----------------------------------------------------------------------

>Comment By: Matt Daws (mattdaws)
Date: 2004-08-20 15:52

Message:
Logged In: YES
user_id=1103054

Okay, I've been busy over at Bugzilla for GCC. The following are interesting:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10395
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=9633

Basically it seems that the problem here is well-known, at least to some people. GCC does not do dynamic stack alignment, so things like threads and call-backs from the OS which can break stack alignment will also break GCC. Other compilers do dynamically align the stack. I get the impression from the above links that this might eventually be what GCC does.

As a workaround:

i) Don't use threads.
ii) Don't put code which needs alignment (i.e. SSE code) into main(), as functions called by main() get an aligned stack, but main() itself does not.
iii) If you must use threads, then Danny's idea below seems to work as long as the call-back function does nothing else except align the stack via Danny's trick and then call the real call-back function.
iv) I'm not sure what happens for WinMain().

Hopefully the people over at GCC will come up with a dynamic stack alignment scheme, as this problem also seems to affect various thread implementations under Linux. I don't think this is a bug which can be solved by MinGW. That said, Danny's patch for crt2.o does seem to solve the problem with SSE in main(), which is nice. Perhaps something similar could be done for WinMain(), although the style of Windows programming (i.e.
loads of call-backs) means that I would fully expect SSE with a Windows GUI program to be a nightmare. --Matt ---------------------------------------------------------------------- Comment By: Matt Daws (mattdaws) Date: 2004-08-19 22:30 Message: Logged In: YES user_id=1103054 Ah, well, the following is cheeky, but seems to work: unsigned __stdcall foo(void*) { unsigned int stack; asm volatile ("movl %%esp,%%eax; andl $-16,%%esp;" : "=a"(stack) : : "%esp"); // body of function asm volatile ("movl %%eax, %%esp;" : : "a"(stack) ); } This works a treat for my project. However, an examination of the assembler output reveals that it's more a case of luck. Again, the "andl $-15, %%esp" instruction comes too late: we have already saved the (unaligned) value of esp to ebp, and ebp is used repeatedly to access memory. A check of the test case reveals that it's the same inlining problem: namely, the above assembler will align esp in that function, and in any *sub functions* the stack and hence ebp will be aligned correctly. However, things can still fall over in the callback function. Again, it's quite possible to work around this: just make sure no SSE code ends up in the callback... --Matt ---------------------------------------------------------------------- Comment By: Matt Daws (mattdaws) Date: 2004-08-19 22:12 Message: Logged In: YES user_id=1103054 This sort of works for me. In the larger project which first gave me problems, this gets the program to run without the crash. However, we do get a crash when the thread tries to exit: precisely for the reason that we've been messing with the stack pointer. Weirdly, the callback functions does not use the LEAVE instruction, whereas some (but, a check reveals, not all) other functions do. The LEAVE function moves ebp back to esp, which means we can mess with esp. 
This is clearly related to the frame-pointer stuff, but no amount of messing with options can get the LEAVE instruction included for some reason (I've compiled with no optimisations, and with and without the -fno-omit-frame-pointer option). I guess GCC thinks that it is controlling the stack OK, so that it just pops stuff off and does a RET, which is obviously better, except when the user has messed with the stack.

Still, it's clearly the correct idea. I mean, I'd even be happy to live with it, as I don't mind never quitting a thread except at exit.

Thanks for all the work,
--Matt

----------------------------------------------------------------------

Comment By: Danny Smith (dannysmith)
Date: 2004-08-19 21:37

Message:
Logged In: YES
user_id=11494

Hello,

Simply adding this to the callback fixes the testcase in my tests, but (here is the gotcha) only if compiled with frame pointers enabled (i.e., -fomit-frame-pointer fails):

    unsigned __stdcall foo(void*)
    {
        asm volatile ("andl $-16,%%esp" ::: "%esp");
        Landscape land(320,240,0.3,0.01,0.0001);
        land.Draw();
        return 0;
    }

I have not tested with exceptions. I have seen a workaround for the omit-frame-pointer case in GPL'd code; I need to inquire about licensing issues.

Danny

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 12:55

Message:
Logged In: YES
user_id=1103054

Sorry, yet another thought. Of course, the order:

    save esp to ebp
    align esp

is the only one possible in _main, as at the end, we must restore esp before the ret instruction. It does mean, however, that we cannot control stack alignment in _main, only in subsequent functions.

Of course, in a Windows GUI program, this doesn't matter, as _main is internal, and just calls WinMain. Does this perhaps mean that -mpreferred-stack-boundary= has no effect on Win GUI programs? A quick check suggests yes, as while GCC acts to keep the stack aligned, it has no mechanism to enforce that alignment initially.
The only workaround I can think of is that there should be a hidden function right at the start of a program which aligns the stack. This would need to be generated when a main() or WinMain() function was present, to ensure -mpreferred-stack-boundary= works as expected. Furthermore, it seems as if we should have a helper function to align the stack in a new thread. All this seems a lot of work however, which makes me think there might be a more elegant way...

--Matt

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 12:25

Message:
Logged In: YES
user_id=1103054

Okay, my hack seems to work. But it's a hack: it needs some synchronisation code added to make it remotely thread-safe...

The code always starts a new thread by running a small function which aligns ESP and then calls the real function which we want. We have to jump through some hoops to get GCC to issue the correct assembly output, so a more elegant way would be to write the helper function in assembly to start with. I'm a bit rusty with asm though.

    unsigned (__stdcall *func_address) (void *);

    unsigned int __stdcall hack_stack(void *param)
    {
        // Align stack: GCC stack frame means that we can change
        // ESP here and it'll be reset later.
        unsigned tmp;
        asm volatile (
            "movl %%esp,%%eax; subl $15, %%eax; andl $-16, %%eax; mov %%eax,%%esp;"
            : "=a"(tmp) );
        // Some random code to ensure GCC issues a CALL to
        // func_address, and not just a long jump
        int temp = 0, i;
        for (i=0; i<1000; i++) temp+=i;
        unsigned result = func_address(param);
        // Some more random code.
        int j;
        for (j=0; j<1000; j++) temp+=j;
        return result;
    }

    unsigned long my_start_thread(unsigned (__stdcall *addr) (void *), unsigned *ret)
    {
        // Should sync here
        func_address=addr;
        return _beginthreadex(NULL,0,hack_stack,NULL,0,ret);
    }

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 12:16

Message:
Logged In: YES
user_id=1103054

Okay, I've tried -mpreferred-stack-boundary=

It doesn't stop the crash. Looking at the assembly output, the following happens:

i) In _main, we always have something like:

    pushl %ebp          // Setup stack frame
    movl %esp, %ebp
    pushl %ebx          // Save ebx, doesn't always occur
    subl $148, %esp     // Allocate local storage
    andl $-32, %esp     // ALIGN STACK
    call __alloca       // Call some helper functions:
    call ___main        // I don't know what these do.

The value used in "andl $-32, %esp" changes with preferred-stack-boundary, as expected. Notice, however, that this occurs AFTER the stack frame is set up, so that %ebp can still be unaligned. However, any function called from _main will be fine, as ESP is aligned. This is what was causing the problem in BUG 1001932 I believe, and I guess Danny changed some code in crt2.o to mean that %esp is aligned at the start, and so %ebp becomes aligned from the start as well. Of course, preferred-stack-boundary doesn't change this!

ii) GCC keeps the stack aligned to the requested boundary, so that each function contains a "subl $val, %esp" and $val is changed with preferred-stack-boundary. This works fine, but it assumes that the stack starts out aligned correctly. I guess this is why the code in _main changes, but it seems odd that it does it too late. Furthermore, it means that there is no mechanism for dealing with threads, as then _main never gets called to align the stack.

I am currently playing about with assembler to try to write my own thread-launching code to align the stack.
I'll let you know if I get it to work (sadly GCC doesn't allow inline asm to alter %esp, which I guess is not too surprising, but a bit annoying).

--Matt

----------------------------------------------------------------------

Comment By: Earnie Boyd (earnie)
Date: 2004-08-19 11:42

Message:
Logged In: YES
user_id=15438

What about adding:

    -mpreferred-stack-boundary=
        Attempt to keep stack aligned to this power of 2

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 11:12

Message:
Logged In: YES
user_id=1103054

Hmm, I've been playing with swapping _beginthreadex for CreateThread (which shouldn't be used, I think, as I'm using the CRT). The result is a change of alignment on the stack, but it still isn't aligned on a 16-byte boundary.

--Matt

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 11:01

Message:
Logged In: YES
user_id=1103054

Earnie: Nope, doesn't do a thing. Sorry! I really do think this is a stack alignment issue, which now seems solved for main(), but not for new threads.

--Matt

----------------------------------------------------------------------

Comment By: Earnie Boyd (earnie)
Date: 2004-08-19 10:47

Message:
Logged In: YES
user_id=15438

Curious: Does -mms-bitfields help?

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 10:08

Message:
Logged In: YES
user_id=1103054

Danny,

Thanks for all your work! I've looked at 1001932 and tried the new crt2.o. This seems to fix problems in the main thread of my program, but I am still having issues in the second thread which I start up. In particular, it's the same stack alignment issue: it seems that GCC assumes that every function is called with the stack aligned, so that once the return address has been pushed, the function starts with esp+4 aligned, not esp.
This isn't true if I use _beginthreadex to start a new thread. Is this correct, or should I be using some other function to start a new thread? I cannot quite see what alignment _beginthreadex is making: I am tempted to conjecture that it makes no alignment beyond a 4-byte boundary.

Thanks,
--Matt

----------------------------------------------------------------------

Comment By: Danny Smith (dannysmith)
Date: 2004-08-18 21:52

Message:
Logged In: YES
user_id=11494

You're right, it is a different bug -- at least the simple fix that worked for 1001932 doesn't work for this. Thanks for your analysis on the GCC bugzilla report.

Danny

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-13 17:32

Message:
Logged In: YES
user_id=1103054

Hmm, okay, I've now made a small(ish) example program. The key points seem to be:

i) -finline-functions is needed to create the movaps instruction.
ii) Threads *are* important. If the example is run as a single thread, all is okay. If I start a new thread to run the offending code in, it crashes.

I guess this IS an alignment issue, so I'll leave the bug as closed and see if 1001932 goes anywhere...

See attached C++ file. Compile it as, for example,

    g++ -Wall -ffast-math -O2 -march=pentium4 -finline-functions main.cpp -o test.exe

--Matt

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-13 11:06

Message:
Logged In: YES
user_id=1103054

Not sure what the protocol here is: I'm not going to re-open the bug for now. However, I've been doing some more playing, and have found that the optimisation -finline-functions is responsible: without this, a small function of mine is called, and my code works. With it, the function is not called, but is inlined (as expected). However, the curious movaps instruction is now generated.
I'm not sure this is a duplicate bug, as I'm not trying to use SSE data types: the instruction is just being generated automatically by GCC, whereas the 1001932 bug relates to the explicit use of SSE data types. Furthermore, I'm not explicitly telling the compiler to use SSE to do maths, only that it is running on a P4, and so is free to use SSE if it so wishes.

As the problem only occurs when inlining, I wonder if this is a code generation bug? Furthermore, my trick of replacing the movaps by movups seems to also work if I simply remove the movaps instructions completely, which is a little odd.

The same is true if I use -mfpmath=sse, i.e. the inlining causes the error. Furthermore, in this case, if I change movaps to movups, the problem still occurs (but with an Access Violation as the program tries to read from address -1). The same occurs if I remove the movaps instructions which reference memory.

I will try and make a simple program which duplicates these problems.

--Matt

----------------------------------------------------------------------

Comment By: Danny Smith (dannysmith)
Date: 2004-08-13 01:31

Message:
Logged In: YES
user_id=11494

Duplicate of 1001932, which has a self-contained testcase. The bug has also been reported to GCC's bugzilla,
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16890
but no response yet.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=102435&aid=1008330&group_id=2435
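The alignment arithmetic that runs through the comments above can be checked with a few lines of portable C. This is only a sketch: align_down, is_aligned and frame_slot are hypothetical helper names, not anything provided by GCC or MinGW, and the addresses in the note below are made-up illustrations. It assumes the usual "pushl %ebp; movl %esp, %ebp" prologue from the assembly listing, so that %ebp ends up 4 bytes below the value %esp had on entry.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr down to a multiple of boundary, which must be a power of two.
   With boundary == 16 this is the masking that `andl $-16, %esp` performs. */
static uintptr_t align_down(uintptr_t addr, uintptr_t boundary)
{
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    return addr & ~(boundary - 1);
}

/* Non-zero when addr is a multiple of boundary. */
static int is_aligned(uintptr_t addr, uintptr_t boundary)
{
    return align_down(addr, boundary) == addr;
}

/* Address of the local at offset(%ebp), given the value of %esp on entry.
   Assumes the prologue `pushl %ebp; movl %esp, %ebp`, so ebp = entry_esp - 4. */
static uintptr_t frame_slot(uintptr_t entry_esp, intptr_t offset)
{
    uintptr_t ebp = entry_esp - 4;
    return ebp + (uintptr_t)offset;
}
```

With an entry %esp of 0x0022FF7C (esp+4 is a multiple of 16, the case GCC assumes), frame_slot(0x0022FF7C, -104) is 0x0022FF10 and frame_slot(0x0022FF7C, -136) is 0x0022FEF0, both fine for movaps. With an entry %esp of 0x0022FF80, which is still 4-byte aligned, the first slot lands on 0x0022FF14 and a movaps to it would fault.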
From: SourceForge.net <no...@so...> - 2004-12-22 09:01:00
Bugs item #1008330, was opened at 2004-08-13 10:43
Message generated for change (Comment added) made by dannysmith
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=102435&aid=1008330&group_id=2435

Category: MinGW
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Matt Daws (mattdaws)
Assigned to: Danny Smith (dannysmith)
Summary: movaps problem with mingw

----------------------------------------------------------------------

>Comment By: Danny Smith (dannysmith)
Date: 2004-12-22 22:00

Message:
Logged In: YES
user_id=11494

Hi,

This bug (PR/10395 in gcc bugzilla) has been discussed recently on gcc lists and a solution proposed
http://gcc.gnu.org/ml/gcc/2004-12/msg00912.html
but rejected:
http://gcc.gnu.org/ml/gcc/2004-12/msg00918.html
in favour of fixing glibc (not much help for us poor windows sods).

The rejection however did suggest a workaround:

    static void * __attribute__((noinline, stdcall))
    f_prime (void *p)
    {
      /* old code from f */
    }

    void * __attribute__ ((stdcall))
    f (void *p)
    {
      (void)__builtin_return_address(1); // to force call frame
      asm volatile ("andl $-16,%%esp" ::: "%esp");
      return f_prime (p);
    }

and pass the new f to _beginthreadex. Does that work in your larger programmes?

Danny

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=102435&aid=1008330&group_id=2435
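The wrapper idea can be sketched so that a single wrapper serves any thread function, with the real entry point and its argument passed through the thread parameter instead of a global function pointer (which is what made the earlier hack_stack version need extra synchronisation). This is a sketch under stated assumptions, not the method from the GCC list: thread_start, make_start, trampoline, double_it and REALIGN_STACK are hypothetical names. In place of the inline "andl" it relies on the force_align_arg_pointer function attribute, which only exists in GCC releases later than the 3.4.1 discussed here, and the macro expands to nothing unless the compiler is GCC-compatible and targeting 32-bit x86. __stdcall is left out to keep the sketch portable.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumption: a GCC-compatible compiler targeting 32-bit x86. Elsewhere the
   macro expands to nothing and the wrapper is an ordinary function. */
#if defined(__GNUC__) && defined(__i386__)
#define REALIGN_STACK __attribute__((force_align_arg_pointer))
#else
#define REALIGN_STACK
#endif

typedef unsigned (*thread_fn)(void *);

/* Hypothetical bundle: the real entry point plus its argument. */
struct thread_start {
    thread_fn fn;
    void *arg;
};

/* Heap-allocate the bundle so it outlives the caller's stack frame.
   Returns NULL if the allocation fails. */
static struct thread_start *make_start(thread_fn fn, void *arg)
{
    struct thread_start *start;
    assert(fn != NULL);
    start = (struct thread_start *)malloc(sizeof *start);
    if (start != NULL) {
        start->fn = fn;
        start->arg = arg;
    }
    return start;
}

/* Does nothing except unpack the bundle, free it and call the real function.
   Where REALIGN_STACK is active, the compiler is asked to realign the stack
   in this function's prologue, so the callee should start from an aligned
   stack. */
static REALIGN_STACK unsigned trampoline(void *raw)
{
    struct thread_start start = *(struct thread_start *)raw;
    free(raw);
    return start.fn(start.arg);
}

/* Stand-in for a real thread body, used only to exercise the trampoline. */
static unsigned double_it(void *p)
{
    return 2u * *(unsigned *)p;
}
```

On Windows one would presumably declare trampoline with the __stdcall convention and hand it, together with the result of make_start, to _beginthreadex; that part is untested here.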
From: SourceForge.net <no...@so...> - 2005-01-18 20:29:27
Bugs item #1008330, was opened at 2004-08-12 22:43
Message generated for change (Comment added) made by mattdaws
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=102435&aid=1008330&group_id=2435

Category: MinGW
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Matt Daws (mattdaws)
Assigned to: Danny Smith (dannysmith)
Summary: movaps problem with mingw

Initial Comment:
I have a slightly complicated program which is crashing when I compile with certain options. I have narrowed the problem down to the fact that MinGW is generating MOVAPS instructions which are trying to access memory at an unaligned point. If I edit the assembler file and replace these with MOVUPS instructions and then assemble and link, my program works fine.

I thought this was linked to my use of threads and/or the printf function. However, I've removed the printf calls and still get the problem. I am now suspecting my use of threads. I tried changing from using CreateThread to using _beginthreadex, but this didn't help.

A Google search reveals that other people have been having a general problem with GCC, MOVAPS and threads. Thus this *might* be a problem with GCC. However, people in the know are blaming the problem on system calls which are not setting up the stack in an aligned manner when starting a thread. Hence my hope that _beginthreadex would help.

The problem does not occur with MinGW 3.3.3, but this seems to be down to the fact that that compiler does not produce MOVAPS instructions. I am using MinGW 3.4.1, Windows XP SP1. The compiler flags I use are:

    -Wall -ffast-math -O3 -march=pentium4 -mthreads

But anything enabling SSE seems to give problems.

I could send the code, but it is rather long. I could also try to generate a smaller example which does the same thing, but perhaps the bug will not occur.
Here is a snippit of the offending assembly (produced with the -S flag in g++): __ZN9Landscape4DrawEv: pushl %ebp movl %esp, %ebp pushl %edi pushl %esi pushl %ebx subl $236, %esp movl 8(%ebp), %edi flds 8(%edi) flds 40(%edi) ..... movaps %xmm4, -104(%ebp) ..... movaps %xmm0, -136(%ebp) Many thanks in advance for any hints! --Matt Daws ---------------------------------------------------------------------- >Comment By: Matt Daws (mattdaws) Date: 2005-01-18 20:29 Message: Logged In: YES user_id=1103054 I've tried this work-around on the test code, and it does indeed seem to work. A bit annoying, but thanks for the time put in. From what I understand, there is little that can be done, as GCC assumes a statically aligned stack, whereas Windows does not: the standard code to start a new thread does not align the stack better than to 4 bytes. The best idea I could come up with was to write a new function to start a thread, and have it align the stack better. However, this involves a degree of messing about with thread-line syncronisation etc. and it's not really a proper GCC solution, as starting threads, I guess, should be left to the O/S and not made part of GCC. Actually, the proposed solution given in http://gcc.gnu.org/ml/gcc/2004-12/msg00912.html sounds excellent: very annoying it wasn't added!!! It's suggested in the rejection that pthreads is fixed: not exactly an option for trying to write pure win32 code! 
Again, thank you for the time put in, cheers,

--Matt

----------------------------------------------------------------------

Comment By: Danny Smith (dannysmith)
Date: 2004-12-22 09:00

Message:
Logged In: YES
user_id=11494

Hi,

This bug (PR/10395 in GCC Bugzilla) has been discussed recently on the gcc lists and a solution proposed

    http://gcc.gnu.org/ml/gcc/2004-12/msg00912.html

but rejected:

    http://gcc.gnu.org/ml/gcc/2004-12/msg00918.html

in favour of fixing glibc (not much help for us poor windows sods).

The rejection however did suggest a workaround:

    static void * __attribute__((noinline, stdcall))
    f_prime (void *p)
    {
      /* old code from f */
    }

    void * __attribute__ ((stdcall))
    f (void *p)
    {
      (void)__builtin_return_address(1); // to force call frame
      asm volatile ("andl $-16,%%esp" ::: "%esp");
      return f_prime (p);
    }

and pass the new f to _beginthreadex.

Does that work in your larger programmes?

Danny

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-20 15:52

Message:
Logged In: YES
user_id=1103054

Okay, I've been busy over at Bugzilla for GCC. The following are interesting:

    http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10395
    http://gcc.gnu.org/bugzilla/show_bug.cgi?id=9633

Basically it seems that the problem here is well-known, at least to some people. GCC does not do dynamic stack alignment, so things like threads and call-backs from the OS which can break stack-alignment will also break GCC. Other compilers do dynamically align the stack. I get the impression from the above links that this might eventually be what GCC does.

As a work around:

i) Don't use threads
ii) Don't put code which needs alignment (i.e. SSE code) into main(), as functions called by main() get an aligned stack, but main() itself does not.
iii) If you must use threads, then Danny's idea below seems to work as long as the call-back function does nothing else except align the stack via Danny's trick and then call the real call-back function.
iv) I'm not sure what happens for WinMain().

Hopefully the people over at GCC will come up with a dynamic stack alignment scheme, as this problem also seems to affect various thread implementations under linux.

I don't think this is a bug which can be solved by MinGW. That said, Danny's patch for crt2.o does seem to solve the problem with SSE in main() which is nice. Perhaps something similar could be done for WinMain(), although the style of Windows programming (i.e. loads of call-backs) means that I would fully expect SSE with a Windows GUI program to be a nightmare.

--Matt

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 22:30

Message:
Logged In: YES
user_id=1103054

Ah, well, the following is cheeky, but seems to work:

    unsigned __stdcall foo(void*)
    {
        unsigned int stack;
        asm volatile ("movl %%esp,%%eax; andl $-16,%%esp;" : "=a"(stack) : : "%esp");

        // body of function

        asm volatile ("movl %%eax, %%esp;" : : "a"(stack) );
    }

This works a treat for my project. However, an examination of the assembler output reveals that it's more a case of luck. Again, the "andl $-16, %%esp" instruction comes too late: we have already saved the (unaligned) value of esp to ebp, and ebp is used repeatedly to access memory.

A check of the test case reveals that it's the same inlining problem: namely, the above assembler will align esp in that function, and in any *sub functions* the stack and hence ebp will be aligned correctly. However, things can still fall over in the callback function itself. Again, it's quite possible to work around this: just make sure no SSE code ends up in the callback...

--Matt

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 22:12

Message:
Logged In: YES
user_id=1103054

This sort of works for me. In the larger project which first gave me problems, this gets the program to run without the crash.
However, we do get a crash when the thread tries to exit: precisely for the reason that we've been messing with the stack pointer. Weirdly, the callback function does not use the LEAVE instruction, whereas some (but, a check reveals, not all) other functions do. The LEAVE instruction copies ebp back to esp, which means we can mess with esp. This is clearly related to the frame-pointer stuff, but no amount of messing with options can get the LEAVE instruction included for some reason (I've compiled with no optimisations, and with and without the -fno-omit-frame-pointer option). I guess GCC thinks that it is controlling the stack ok, so that it just pops stuff off and does a RET, which is better obviously, except when the user has messed with the stack.

Still, it's clearly the correct idea. I mean, I'd even be happy to live with it, as I don't mind never quitting a thread except at exit.

Thanks for all the work,

--Matt

----------------------------------------------------------------------

Comment By: Danny Smith (dannysmith)
Date: 2004-08-19 21:37

Message:
Logged In: YES
user_id=11494

Hello,

Simply adding this to the callback fixes the testcase in my tests, but (here is the gotcha) only if compiled with frame-pointers enabled (i.e., -fomit-frame-pointer fails):

    unsigned __stdcall foo(void*)
    {
        asm volatile ("andl $-16,%%esp" ::: "%esp");
        Landscape land(320,240,0.3,0.01,0.0001);
        land.Draw();
        return 0;
    }

I have not tested with exceptions.

I have seen a workaround for the omit-frame-pointer case in GPL'd code; I need to inquire about licensing issues.

Danny

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 12:55

Message:
Logged In: YES
user_id=1103054

Sorry, yet another thought. Of course, the order:

    save esp to ebp
    align esp

is the only one possible in _main, as at the end, we must restore esp before the ret instruction.
It does mean, however, that we cannot control stack alignment in _main, only in subsequent functions. Of course, in a Windows GUI program, this doesn't matter, as _main is internal, and just calls WinMain.

Does this perhaps mean that -mpreferred-stack-boundary= has no effect on Win GUI programs? A quick check suggests yes, as while GCC acts to keep the stack aligned, it has no mechanism to enforce that alignment initially.

The only work around I can think of is that there should be a hidden function right at the start of a program which aligns the stack. This would need to be generated when a main() or WinMain() function was present, to ensure -mpreferred-stack-boundary= works as expected. Furthermore, it seems as if we should have a helper function to align the stack in a new thread. All this seems a lot of work however, which makes me think there might be a more elegant way...

--Matt

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 12:25

Message:
Logged In: YES
user_id=1103054

Okay, my hack seems to work. But it's a hack: it needs some synchronisation code added to make it remotely thread-safe...

The code always starts a new thread by running a small function which aligns ESP and then calls the real function which we want. We have to jump through some hoops to get GCC to issue the correct assembly output, so a more elegant way would be to write the helper-function in assembly to start with. I'm a bit rusty with asm though.

    unsigned (__stdcall *func_address) (void *);

    unsigned int __stdcall hack_stack(void *param)
    {
        // Align stack: GCC stack frame means that we can change
        // ESP here and it'll be reset later.
        unsigned tmp;
        asm volatile ( "movl %%esp,%%eax; subl $15, %%eax; andl $-16, %%eax; mov %%eax,%%esp;" : "=a"(tmp) );

        // Some random code to ensure GCC issues a CALL to
        // func_address, and not just a long jump
        int temp = 0, i;
        for (i=0; i<1000; i++) temp+=i;

        func_address(param);

        // Some more random code.
        int j;
        for (j=0; j<1000; j++) temp+=j;

        return 0;
    }

    unsigned long my_start_thread(unsigned (__stdcall *addr) (void *), unsigned *ret)
    {
        // Should sync here
        func_address=addr;
        return _beginthreadex(NULL,0,hack_stack,NULL,0,ret);
    }

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 12:16

Message:
Logged In: YES
user_id=1103054

Okay, I've tried -mpreferred-stack-boundary=

It doesn't stop the crash. Looking at the assembly output, the following happens:

i) In _main, we always have something like:

    pushl %ebp          // Setup stack frame
    movl %esp, %ebp
    pushl %ebx          // Save ebx, doesn't always occur
    subl $148, %esp     // Allocate local storage
    andl $-32, %esp     // ALIGN STACK
    call __alloca       // Call some helper functions:
    call ___main        // I don't know what these do.

The value used in "andl $-32, %esp" changes with preferred-stack-boundary, as expected. Notice, however, that this occurs AFTER the stack-frame is set up, so that %ebp can still be unaligned. However, any function called from _main will be fine, as ESP is aligned. This is what was causing the problem in BUG 1001932 I believe, and I guess Danny changed some code in crt2.o to mean that %esp is aligned at the start, and so %ebp becomes aligned from the start as well. Of course, preferred-stack-boundary doesn't change this!

ii) GCC keeps the stack aligned to the requested boundary, so that each function contains a "subl $val, %esp" and $val is changed with preferred-stack-boundary. This works fine, but it assumes that the stack starts out aligned correctly. I guess this is why the code in _main changes, but it seems odd that it does it too late.
Furthermore, it means that there is no mechanism for dealing with threads, as then _main never gets called to align the stack.

I am currently playing about with assembler to try to write my own thread-launching code to align the stack. I'll let you know if I get it to work (sadly GCC doesn't allow inline asm to alter %esp, which I guess is not too surprising, but a bit annoying).

--Matt

----------------------------------------------------------------------

Comment By: Earnie Boyd (earnie)
Date: 2004-08-19 11:42

Message:
Logged In: YES
user_id=15438

What about adding:

    -mpreferred-stack-boundary=
        Attempt to keep stack aligned to this power of 2

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 11:12

Message:
Logged In: YES
user_id=1103054

Hmm, I've been playing with swapping _beginthreadex for CreateThread (which shouldn't be used, I think, as I'm using the CRT). The result is a change of alignment on the stack, but it still isn't aligned on a 16-byte boundary.

--Matt

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 11:01

Message:
Logged In: YES
user_id=1103054

Earnie: Nope, doesn't do a thing. Sorry! I really do think this is a stack alignment issue, which now seems solved for main(), but not for new threads.

--Matt

----------------------------------------------------------------------

Comment By: Earnie Boyd (earnie)
Date: 2004-08-19 10:47

Message:
Logged In: YES
user_id=15438

Curious: Does -mms-bitfields help?

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 10:08

Message:
Logged In: YES
user_id=1103054

Danny,

Thanks for all your work! I've looked at 1001932 and tried the new crt2.o

This seems to fix problems in the main thread of my program, but I am still having issues in the second thread which I start up.
In particular, it's the same stack alignment issue: it seems that GCC assumes that every function should be called with the stack aligned, so that then the calling address is saved, meaning that the function starts with esp+4 aligned, not esp. This isn't true if I use _beginthreadex to start a new thread. Is this correct, or should I be using some other function to start a new thread? I cannot quite see what alignment _beginthreadex is making: I am tempted to conjecture it makes no alignment beyond a 4-byte boundary.

Thanks,

--Matt

----------------------------------------------------------------------

Comment By: Danny Smith (dannysmith)
Date: 2004-08-18 21:52

Message:
Logged In: YES
user_id=11494

You're right, it is a different bug -- at least the simple fix that worked for 1001932 doesn't work for this. Thanks for your analysis on the GCC bugzilla report.

Danny

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-13 17:32

Message:
Logged In: YES
user_id=1103054

Hmm, okay, I've now made a small(ish) example program. The key points seem to be:

i) -finline-functions is needed to create the movaps instruction.
ii) Threads *are* important. If the example is run as a single thread, all is okay. If I start a new thread to run the offending code in, it crashes.

I guess this IS an alignment issue, so I'll leave the bug as closed and see if 1001932 goes anywhere...

See attached C++ file. Compile it as, for example,

    g++ -Wall -ffast-math -O2 -march=pentium4 -finline-functions main.cpp -o test.exe

--Matt

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-13 11:06

Message:
Logged In: YES
user_id=1103054

Not sure what the protocol here is: I'm not going to re-open the bug for now.
However, I've been doing some more playing, and have found that the optimisation -finline-functions is responsible: without this, a small function of mine is called, and my code works. With it, the function is not called, but is inlined (as expected). However, the curious movaps instruction is now generated.

I'm not sure this is a duplicate bug, as I'm not trying to use SSE data types: the instruction is just being generated automatically by GCC, whereas the 1001932 bug relates to the explicit use of SSE data types. Furthermore, I'm not explicitly telling the compiler to use SSE to do maths, only that it is running on a P4, and so is free to use SSE if it so wishes.

As the problem only occurs when inlining, I wonder if this is a code generation bug? Furthermore, my trick of replacing the movaps by movups seems to also work if I simply remove the movaps instructions completely, which is a little odd.

The same is true if I use -mfpmath=sse, i.e. the inlining causes the error. Furthermore, in this case, if I change movaps to movups, the problem still occurs (but with an Access Violation as the program tries to read from address -1). The same occurs if I remove the movaps instructions which reference memory.

I will try and make a simple program which duplicates these problems.

--Matt

----------------------------------------------------------------------

Comment By: Danny Smith (dannysmith)
Date: 2004-08-13 01:31

Message:
Logged In: YES
user_id=11494

Duplicate of 1001932, which has a self-contained testcase. The bug has also been reported to GCC's bugzilla,

    http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16890

but no response yet.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=102435&aid=1008330&group_id=2435
|
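For readers following the alignment arithmetic in the thread above, here is a minimal sketch in portable C of what "andl $-16, %esp" computes, and of why a slot such as -104(%ebp) is only 16-byte aligned for one particular value of ESP at function entry. The function names and example addresses are illustrative assumptions, not taken from the attached test case, and the model assumes the usual "pushl %ebp; movl %esp, %ebp" prologue on a 32-bit target.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr down to a multiple of boundary (which must be a power of two).
   With boundary == 16 this is the arithmetic done by "andl $-16, %esp". */
static uintptr_t align_down(uintptr_t addr, uintptr_t boundary)
{
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    return addr & ~(boundary - 1);
}

/* Non-zero when addr is a multiple of boundary (a power of two). */
static int is_aligned(uintptr_t addr, uintptr_t boundary)
{
    return (addr & (boundary - 1)) == 0;
}

/* Model of a frame-pointer-relative stack slot on a 32-bit target.
   esp_at_entry is ESP on entry to the function (pointing at the return
   address). "pushl %ebp" lowers ESP by 4 and "movl %esp, %ebp" copies it,
   so EBP == esp_at_entry - 4. The slot "disp(%ebp)" is then EBP + disp. */
static uintptr_t slot_address(uintptr_t esp_at_entry, intptr_t disp)
{
    uintptr_t ebp = esp_at_entry - 4;
    return (uintptr_t)((intptr_t)ebp + disp);
}
```

Under this model, -104(%ebp) and -136(%ebp) are multiples of 16 only when ESP at entry is 12 modulo 16, that is, when ESP was a multiple of 16 just before the CALL pushed the return address. An entry ESP that is merely 4-byte aligned (for example one ending in ...80) puts both slots at addresses ending in 4, which MOVAPS rejects.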
From: SourceForge.net <no...@so...> - 2005-01-31 07:45:34
|
Bugs item #1008330, was opened at 2004-08-13 10:43
Message generated for change (Comment added) made by dannysmith
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=102435&aid=1008330&group_id=2435

Category: MinGW
Group: None
Status: Open
>Resolution: Remind
Priority: 5
Submitted By: Matt Daws (mattdaws)
Assigned to: Danny Smith (dannysmith)
Summary: movaps problem with mingw

----------------------------------------------------------------------

>Comment By: Danny Smith (dannysmith)
Date: 2005-01-31 20:45

Message:
Logged In: YES
user_id=11494

I'll leave this open, as a reminder. Maybe gcc will accept a target specific attribute???

Danny

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=102435&aid=1008330&group_id=2435
|
From: SourceForge.net <no...@so...> - 2006-03-18 22:55:36
|
Bugs item #1008330, was opened at 2004-08-13 10:43
Message generated for change (Comment added) made by dannysmith
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=102435&aid=1008330&group_id=2435

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: MinGW
Group: None
Status: Open
Resolution: Remind
Priority: 5
Submitted By: Matt Daws (mattdaws)
Assigned to: Danny Smith (dannysmith)
Summary: movaps problem with mingw

Initial Comment:
I have a slightly complicated program which is crashing when I compile with certain options. I have narrowed the problem down to the fact that MinGW is generating MOVAPS instructions which are trying to access memory at an unaligned point. If I edit the assembler file and replace these with MOVUPS instructions and then assemble and link, my program works fine.

I thought this was linked to my use of threads and/or the printf instruction. However, I've removed the printf's and still get the problem. I am now suspecting my use of threads. I tried changing from using CreateThread to using _beginthreadex, but this didn't help.

A Google search reveals that other people have been having a general problem with GCC, MOVAPS and threads. Thus this *might* be a problem with GCC. However, people in the know are blaming the problem on system calls which are not setting up the stack in an aligned manner when starting a thread. Hence my hope that _beginthreadex would help.

The problem does not occur with Mingw 3.3.3, but this seems to be down to the fact that the compiler no longer produces MOVAPS instructions. I am using Mingw 3.4.1, Windows XP SP1. The compiler flags I use are:
    -Wall -ffast-math -O3 -march=pentium4 -mthreads
But anything enabling SSE seems to give problems.

I could send the code, but it is rather long.
I could also try to generate a smaller example which does the same thing, but perhaps the bug will not occur.

Here is a snippet of the offending assembly (produced with the -S flag in g++):

__ZN9Landscape4DrawEv:
    pushl %ebp
    movl %esp, %ebp
    pushl %edi
    pushl %esi
    pushl %ebx
    subl $236, %esp
    movl 8(%ebp), %edi
    flds 8(%edi)
    flds 40(%edi)
    .....
    movaps %xmm4, -104(%ebp)
    .....
    movaps %xmm0, -136(%ebp)

Many thanks in advance for any hints!
--Matt Daws

----------------------------------------------------------------------

>Comment By: Danny Smith (dannysmith)
Date: 2006-03-19 10:55

Message:
Logged In: YES
user_id=11494

Matt,
A patch for a target-specific attribute that should fix your problem is at:
http://gcc.gnu.org/ml/gcc-patches/2006-02/msg01073.html

Danny

----------------------------------------------------------------------

Comment By: Danny Smith (dannysmith)
Date: 2005-01-31 20:45

Message:
Logged In: YES
user_id=11494

I'll leave this open, as a reminder. Maybe gcc will accept a target specific attribute???

Danny

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2005-01-19 09:29

Message:
Logged In: YES
user_id=1103054

I've tried this work-around on the test code, and it does indeed seem to work. A bit annoying, but thanks for the time put in.

From what I understand, there is little that can be done, as GCC assumes a statically aligned stack, whereas Windows does not: the standard code to start a new thread does not align the stack better than to 4 bytes. The best idea I could come up with was to write a new function to start a thread, and have it align the stack better. However, this involves a degree of messing about with thread-line synchronisation etc. and it's not really a proper GCC solution, as starting threads, I guess, should be left to the O/S and not made part of GCC.
Actually, the proposed solution given in
http://gcc.gnu.org/ml/gcc/2004-12/msg00912.html
sounds excellent: very annoying it wasn't added!!! It's suggested in the rejection that pthreads is fixed: not exactly an option for trying to write pure win32 code!

Again, thank you for the time put in, cheers,
--Matt

----------------------------------------------------------------------

Comment By: Danny Smith (dannysmith)
Date: 2004-12-22 22:00

Message:
Logged In: YES
user_id=11494

Hi,
This bug (PR/10395 in gcc bugzilla) has been discussed recently on gcc lists and a solution proposed
http://gcc.gnu.org/ml/gcc/2004-12/msg00912.html
but rejected:
http://gcc.gnu.org/ml/gcc/2004-12/msg00918.html
in favour of fixing glibc (not much help for us poor windows sods)

The rejection however did suggest a workaround:

    static void * __attribute__((noinline, stdcall))
    f_prime (void *p)
    {
      /* old code from f */
    }

    void * __attribute__ ((stdcall))
    f (void *p)
    {
      (void)__builtin_return_address(1); // to force call frame
      asm volatile ("andl $-16,%%esp" ::: "%esp");
      return f_prime (p);
    }

and pass the new f to _beginthreadex.

Does that work in your larger programmes?

Danny

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-21 03:52

Message:
Logged In: YES
user_id=1103054

Okay, I've been busy over at Bugzilla for GCC. The following are interesting:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10395
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=9633

Basically it seems that the problem here is well-known, at least to some people. GCC does not do dynamic stack alignment, so things like threads and call-backs from the OS which can break stack-alignment will also break GCC. Other compilers do dynamically align the stack. I get the impression from the above link that this might eventually be what GCC does.

As a work around:
i) Don't use threads
ii) Don't put code which needs alignment (i.e.
SSE code) into main(), as functions called by main() get aligned stack, but main() itself does not.
iii) If you must use threads, then Danny's idea below seems to work as long as the call-back function does nothing else except align the stack via Danny's trick and then calls the real call-back function.
iv) I'm not sure what happens for WinMain().

Hopefully the people over at GCC will come up with a dynamic stack alignment scheme, as this problem also seems to affect various thread implementations under linux. I don't think this is a bug which can be solved by MinGW. That said, Danny's patch for crt2.o does seem to solve the problem with SSE in main() which is nice. Perhaps something similar could be done for WinMain(), although the style of Windows programming (i.e. loads of call-backs) means that I would fully expect SSE with a Windows GUI program to be a nightmare.

--Matt

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-20 10:30

Message:
Logged In: YES
user_id=1103054

Ah, well, the following is cheeky, but seems to work:

    unsigned __stdcall foo(void*)
    {
      unsigned int stack;
      asm volatile ("movl %%esp,%%eax; andl $-16,%%esp;" : "=a"(stack) : : "%esp");
      // body of function
      asm volatile ("movl %%eax, %%esp;" : : "a"(stack) );
    }

This works a treat for my project. However, an examination of the assembler output reveals that it's more a case of luck. Again, the "andl $-16, %%esp" instruction comes too late: we have already saved the (unaligned) value of esp to ebp, and ebp is used repeatedly to access memory. A check of the test case reveals that it's the same inlining problem: namely, the above assembler will align esp in that function, and in any *sub functions* the stack and hence ebp will be aligned correctly. However, things can still fall over in the callback function. Again, it's quite possible to work around this: just make sure no SSE code ends up in the callback...
--Matt

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-20 10:12

Message:
Logged In: YES
user_id=1103054

This sort of works for me. In the larger project which first gave me problems, this gets the program to run without the crash. However, we do get a crash when the thread tries to exit: precisely for the reason that we've been messing with the stack pointer.

Weirdly, the callback function does not use the LEAVE instruction, whereas some (but, a check reveals, not all) other functions do. The LEAVE instruction moves ebp back to esp, which means we can mess with esp. This is clearly related to the frame-pointer stuff, but no amount of messing with options can get the LEAVE instruction included for some reason (I've compiled with no optimisations, and with and without the -fno-omit-frame-pointer option). I guess GCC thinks that it is controlling the stack OK, so that it just pops stuff off and does a RET, which is better obviously, except when the user has messed with the stack.

Still, it's clearly the correct idea. I mean, I'd even be happy to live with it, as I don't mind never quitting a thread except at exit.

Thanks for all the work,
--Matt

----------------------------------------------------------------------

Comment By: Danny Smith (dannysmith)
Date: 2004-08-20 09:37

Message:
Logged In: YES
user_id=11494

Hello,
Simply adding this to the callback fixes the testcase in my tests, but (here is the gotcha) only if compiled with frame pointers enabled (i.e. -fomit-frame-pointer fails)

    unsigned __stdcall foo(void*)
    {
      asm volatile ("andl $-16,%%esp" ::: "%esp");
      Landscape land(320,240,0.3,0.01,0.0001);
      land.Draw();
      return 0;
    }

I have not tested with exceptions.
I have seen a workaround for the omit-frame-pointer case in GPL'd code; I need to inquire about licensing issues.

Danny

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-20 00:55

Message:
Logged In: YES
user_id=1103054

Sorry, yet another thought. Of course, the order:
    save esp to ebp
    align esp
is the only one possible in _main, as at the end, we must restore esp before the ret instruction. It does mean, however, that we cannot control stack alignment in _main, only in subsequent functions.

Of course, in a Windows GUI program, this doesn't matter, as _main is internal, and just calls WinMain. Does this perhaps mean that -mpreferred-stack-boundary= has no effect on Win GUI programs? A quick check suggests yes, as while GCC acts to keep the stack aligned, it has no mechanism to enforce that alignment initially.

The only work around I can think of is that there should be a hidden function right at the start of a program which aligns the stack. This would need to be generated when a main() or WinMain() function was present, to ensure -mpreferred-stack-boundary= works as expected. Furthermore, it seems as if we should have a helper function to align the stack in a new thread. All this seems a lot of work however, which makes me think there might be a more elegant way...

--Matt

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-20 00:25

Message:
Logged In: YES
user_id=1103054

Okay, my hack seems to work. But it's a hack: it needs some synchronisation code added to make it remotely thread-safe...

The code always starts a new thread by running a small function which aligns ESP and then calls the real function which we want. We have to jump through some hoops to get GCC to issue the correct assembly output, so a more elegant way would be to write the helper-function in assembly to start with. I'm a bit rusty with asm though.
    unsigned (__stdcall *func_address) (void *);

    unsigned int __stdcall hack_stack(void *param)
    {
      // Align stack: GCC stack frame means that we can change
      // ESP here and it'll be reset later.
      unsigned tmp;
      asm volatile (
        "movl %%esp,%%eax; subl $15, %%eax; andl $-16, %%eax; mov %%eax,%%esp;"
        : "=a"(tmp) );
      // Some random code to ensure GCC issues a CALL to
      // func_address, and not just a long jump
      int temp,i;
      for (i=0; i<1000; i++)
        temp+=i;
      func_address(param);
      // Some more random code.
      int j;
      for (j=0; j<1000; j++)
        temp+=j;
    }

    unsigned long my_start_thread(unsigned (__stdcall *addr) (void *), unsigned *ret)
    {
      // Should sync here
      func_address=addr;
      return _beginthreadex(NULL,0,hack_stack,NULL,0,ret);
    }

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-20 00:16

Message:
Logged In: YES
user_id=1103054

Okay, I've tried -mpreferred-stack-boundary=
It doesn't stop the crash. Looking at the assembly output, the following happens:

i) In _main, we always have something like:

    pushl %ebp          // Setup stack frame
    movl %esp, %ebp
    pushl %ebx          // Save ebx, doesn't always occur
    subl $148, %esp     // Allocate local storage
    andl $-32, %esp     // ALIGN STACK
    call __alloca       // Call some helper functions:
    call ___main        // I don't know what these do.

The value used in "andl $-32, %esp" changes with preferred-stack-boundary, as expected. Notice, however, that this occurs AFTER the stack-frame is setup, so that %ebp can still be unaligned. However, any function called from _main will be fine, as ESP is aligned. This is what was causing the problem in BUG 1001932 I believe, and I guess Danny changed some code in crt2.o to mean that %esp is aligned at the start, and so %ebp becomes aligned from the start as well. Of course, preferred-stack-boundary doesn't change this!

ii) GCC keeps the stack aligned to the requested boundary, so that each function contains a "subl $val, %esp" and $val is changed with preferred-stack-boundary.
This works fine, but it assumes that the stack starts out aligned correctly. I guess this is why the code in _main changes, but it seems odd that it does it too late. Furthermore, it means that there is no mechanism for dealing with threads, as then _main never gets called to align the stack.

I am currently playing about with assembler to try to write my own thread-launching code to align the stack. I'll let you know if I get it to work (sadly GCC doesn't allow inline asm to alter %esp, which I guess is not too surprising, but a bit annoying).

--Matt

----------------------------------------------------------------------

Comment By: Earnie Boyd (earnie)
Date: 2004-08-19 23:42

Message:
Logged In: YES
user_id=15438

What about adding:
    -mpreferred-stack-boundary=   Attempt to keep stack aligned to this power of 2

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 23:12

Message:
Logged In: YES
user_id=1103054

Hmm, I've been playing with swapping _beginthreadex for CreateThread (which shouldn't be used, I think, as I'm using the CRT). The result is a change of alignment on the stack, but it still isn't aligned on a 16-byte boundary.

--Matt

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 23:01

Message:
Logged In: YES
user_id=1103054

Earnie: Nope, doesn't do a thing. Sorry! I really do think this is a stack alignment issue, which now seems solved for main(), but not for new threads.

--Matt

----------------------------------------------------------------------

Comment By: Earnie Boyd (earnie)
Date: 2004-08-19 22:47

Message:
Logged In: YES
user_id=15438

Curious: Does -mms-bitfields help?

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 22:08

Message:
Logged In: YES
user_id=1103054

Danny,
Thanks for all your work!
I've looked at 1001932 and tried the new crt2.o. This seems to fix problems in the main thread of my program, but I am still having issues in the second thread which I start up.

In particular, it's the same stack alignment issue: it seems that GCC assumes that every function should be called with the stack aligned, so that then the calling address is saved, meaning that the function starts with esp+4 aligned, not esp. This isn't true if I use _beginthreadex to start a new thread. Is this correct, or should I be using some other function to start a new thread? I cannot quite see what alignment _beginthreadex is making: I am tempted to conjecture it makes no alignment beyond to a 4-byte boundary.

Thanks,
--Matt

----------------------------------------------------------------------

Comment By: Danny Smith (dannysmith)
Date: 2004-08-19 09:52

Message:
Logged In: YES
user_id=11494

You're right, it is a different bug -- at least the simple fix that worked for 1001932 doesn't work for this. Thanks for your analysis on the GCC bugzilla report.

Danny

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-14 05:32

Message:
Logged In: YES
user_id=1103054

Hmm, okay, I've now made a small(ish) example program. The key points seem to be:
i) -finline-functions is needed to create the movaps instruction.
ii) Threads *are* important. If the example is run as a single-thread, all is okay. If I start a new thread to run the offending code in, it crashes.

I guess this IS an alignment issue, so I'll leave the bug as closed and see if 1001932 goes anywhere...

See attached C++ file.
Compile it as, for example,
    g++ -Wall -ffast-math -O2 -march=pentium4 -finline-functions main.cpp -o test.exe

--Matt

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-13 23:06

Message:
Logged In: YES
user_id=1103054

Not sure what the protocol here is: I'm not going to re-open the bug for now. However, I've been doing some more playing, and have found that the optimisation -finline-functions is responsible: without this, a small function of mine is called, and my code works. With it, the function is not called, but is inlined (as expected). However, the curious movaps instruction is now generated.

I'm not sure this is a duplicate bug, as I'm not trying to use SSE data types: the instruction is just being generated automatically by GCC, whereas the 1001932 bug relates to the explicit use of SSE data types. Furthermore, I'm not explicitly telling the compiler to use SSE to do maths, only that it is running on a P4, and so is free to use SSE if it so wishes.

As the problem only occurs when inlining, I wonder if this is a code generation bug? Furthermore, my trick of replacing the movaps by movups seems to also work if I simply remove the movaps instructions completely, which is a little odd.

The same is true if I use -mfpmath=sse, i.e. the inlining causes the error. Furthermore, in this case, if I change movaps to movups, the problem still occurs (but with an Access Violation as the program tries to read from address -1). The same occurs if I remove the movaps instructions which reference memory.

I will try and make a simple program which duplicates these problems.

--Matt

----------------------------------------------------------------------

Comment By: Danny Smith (dannysmith)
Date: 2004-08-13 13:31

Message:
Logged In: YES
user_id=11494

Duplicate of 1001932, which has a self-contained testcase.
The bug has also been reported to GCC's bugzilla,
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16890
but no response yet.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=102435&aid=1008330&group_id=2435
|
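The prologue ordering described in the 2004-08-20 comments (save %esp to %ebp first, align %esp afterwards) can be modelled with plain integer arithmetic. The following is a minimal sketch, not code from the thread: the names align_down_16, model_prologue and movaps_ok are hypothetical, and the stack addresses used with it are made-up illustrative values. It only shows why an ebp-relative movaps can still fault even though %esp ends up aligned.

```c
#include <stdint.h>

/* Round an address down to a multiple of 16, as "andl $-16, %esp" does. */
static uint32_t align_down_16(uint32_t addr)
{
    return addr & ~(uint32_t)15;
}

/* Frame pointer and stack pointer after the modelled prologue. */
struct frame {
    uint32_t ebp;
    uint32_t esp;
};

/* Model of the prologue order seen in the thread:
 *     pushl %ebp
 *     movl  %esp, %ebp
 *     subl  $locals, %esp
 *     andl  $-16, %esp
 * esp_on_entry is the (hypothetical) value of %esp when the function is
 * entered, i.e. just after the return address has been pushed. */
static struct frame model_prologue(uint32_t esp_on_entry, uint32_t locals)
{
    struct frame f;
    uint32_t esp = esp_on_entry - 4;  /* pushl %ebp */
    f.ebp = esp;                      /* movl %esp, %ebp: taken BEFORE aligning */
    esp -= locals;                    /* subl $locals, %esp */
    f.esp = align_down_16(esp);       /* andl $-16, %esp */
    return f;
}

/* movaps requires its memory operand to be 16-byte aligned.
 * Returns 1 if offset(base) satisfies that, 0 otherwise. */
static int movaps_ok(uint32_t base, int32_t offset)
{
    return ((base + (uint32_t)offset) & 15u) == 0;
}
```

With an entry %esp that is 12 modulo 16 (what GCC assumes after a call from an aligned caller), the slots -104(%ebp) and -136(%ebp) from the initial report come out aligned. With an entry %esp that is only 4-byte aligned, as reported for new threads, %esp is still aligned after the andl, but the same ebp-relative slots are not. This is consistent with the observation above that the fix only helps functions called from the realigned one.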
From: SourceForge.net <no...@so...> - 2006-05-25 01:23:00
|
Bugs item #1008330, was opened at 2004-08-13 10:43
Message generated for change (Settings changed) made by dannysmith
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=102435&aid=1008330&group_id=2435

Category: MinGW
Group: None
Status: Open
>Resolution: Works For Me
Priority: 5
Submitted By: Matt Daws (mattdaws)
Assigned to: Danny Smith (dannysmith)
Summary: movaps problem with mingw

----------------------------------------------------------------------

>Comment By: Danny Smith (dannysmith)
Date: 2006-05-25 13:22

Message:
Logged In: YES
user_id=11494

With yet-to-be-released GCC 4.2.0, adding
    __attribute__ ((force_align_arg_pointer))
to the definition of the thread startup function does the trick. This attribute was added by a Darwin contributor.

Here is the gcc.info description of the new attribute:

`force_align_arg_pointer'
    On the Intel x86, the `force_align_arg_pointer' attribute may be applied to individual function definitions, generating an alternate prologue and epilogue that realigns the runtime stack. This supports mixing legacy codes that run with a 4-byte aligned stack with modern codes that keep a 16-byte stack for SSE compatibility. The alternate prologue and epilogue are slower and bigger than the regular ones, and the alternate prologue requires a scratch register; this lowers the number of registers available if used in conjunction with the `regparm' attribute. The `force_align_arg_pointer' attribute is incompatible with nested functions; this is considered a hard error.
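As a rough illustration of the attribute described above, the sketch below marks a thread entry point with force_align_arg_pointer and checks that a 16-byte aligned local really is aligned. It is a sketch under assumptions, not code from the thread: FORCE_ALIGN, is_aligned_16 and thread_entry are hypothetical names, the attribute is applied only when building for 32-bit x86 with a GCC-compatible compiler, and the __stdcall qualifier and the _beginthreadex call that a Win32 program would use are left out so the fragment compiles on other targets.

```c
#include <stdint.h>

/* Hypothetical helper macro: expands to the attribute only on 32-bit x86
 * with a GCC-compatible compiler, and to nothing elsewhere. */
#if defined(__GNUC__) && defined(__i386__)
#  define FORCE_ALIGN __attribute__((force_align_arg_pointer))
#else
#  define FORCE_ALIGN
#endif

/* Returns 1 if p is 16-byte aligned, 0 otherwise. */
static int is_aligned_16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical thread entry point. In a Win32 build it would also be
 * declared __stdcall and handed to _beginthreadex. */
FORCE_ALIGN unsigned thread_entry(void *arg)
{
    /* A 16-byte aligned local: the kind of stack slot movaps would use. */
    _Alignas(16) float v[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    (void)arg;
    /* 0 means the slot is aligned as requested, 1 means it is not. */
    return is_aligned_16(v) ? 0u : 1u;
}
```

Compared with the inline-asm wrappers earlier in the thread, the attribute leaves the realignment to the compiler's own prologue and epilogue, so no second helper function is needed and %esp is not modified behind the compiler's back.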
---------------------------------------------------------------------- Comment By: Danny Smith (dannysmith) Date: 2006-03-19 10:55 Message: Logged In: YES user_id=11494 Matt, A patch for a target-specific attribute that should fix your problem is at: http://gcc.gnu.org/ml/gcc-patches/2006-02/msg01073.html Danny ---------------------------------------------------------------------- Comment By: Danny Smith (dannysmith) Date: 2005-01-31 20:45 Message: Logged In: YES user_id=11494 I'll leave this open, as a reminder. Maybe gcc will accept a target specific attribute??? Danny ---------------------------------------------------------------------- Comment By: Matt Daws (mattdaws) Date: 2005-01-19 09:29 Message: Logged In: YES user_id=1103054 I've tried this work-around on the test code, and it does indeed seem to work. A bit annoying, but thanks for the time put in. From what I understand, there is little that can be done, as GCC assumes a statically aligned stack, whereas Windows does not: the standard code to start a new thread does not align the stack better than to 4 bytes. The best idea I could come up with was to write a new function to start a thread, and have it align the stack better. However, this involves a degree of messing about with thread-line syncronisation etc. and it's not really a proper GCC solution, as starting threads, I guess, should be left to the O/S and not made part of GCC. Actually, the proposed solution given in http://gcc.gnu.org/ml/gcc/2004-12/msg00912.html sounds excellent: very annoying it wasn't added!!! It's suggested in the rejection that pthreads is fixed: not exactly an option for trying to write pure win32 code! 
Again, thank you for the time put in, cheers, --Matt ---------------------------------------------------------------------- Comment By: Danny Smith (dannysmith) Date: 2004-12-22 22:00 Message: Logged In: YES user_id=11494 Hi, This bug (PR/10395 in gcc bugzlla) has been discussed recently on gcc lists and a solution proposed http://gcc.gnu.org/ml/gcc/2004-12/msg00912.html but rejected: http://gcc.gnu.org/ml/gcc/2004-12/msg00918.html in favour of fixing glibc (not much help for us poor windows sods) The rejection however did suggest a workaround: static void * __attribute__((noinline, stdcall)) f_prime (void *p) { /* old code from f */ } void * __attribute__ ((stdcall)) f (void *p) { (void)__builtin_return_address(1); // to force call frame asm volatile ("andl $-16,%%esp" ::: "%esp"); return f_prime (p); } and pass the new f to _beginthreadex. Does that work in your larger programmes? Danny ---------------------------------------------------------------------- Comment By: Matt Daws (mattdaws) Date: 2004-08-21 03:52 Message: Logged In: YES user_id=1103054 Okay, I've been busy over at Bugzilla for GCC. The following are interesting: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10395 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=9633 Basically it seems that the problem here is well-known, at least to some people. GCC does not do dynamic stack alignment, so things like threads and call-backs from the OS which can break stack-alignment will also break GCC. Other compilers do dynamically align the stack. I get the impression from the above link that this might eventually be what GCC does. As a work around: i) Don't use threads ii) Don't put code which needs alignment (i.e. SSE code) into main(), as functions called by main() get aligned stack, but main() itself does not. iii) If you must use threads, then Danny's idea below seems to work as long as the call-back function does nothing else except align the stack via Danny's trick and then calls the real call-back function. 
iv) I'm not sure what happens for WinMain(). Hopefully the people over at GCC will come up with a dynamic stack alignment scheme, as this problem also seems to affect various thread implementations under linux. I don't think this is a bug when can be solved by MinGW. That said, Danny's patch for crt2.o does seem to solve the problem with SSE in main() which is nice. Perhaps something similar could be done for WinMain(), although the style of Windows programming (i.e. loads of call-backs) means that I would fully expect SSE with a Windows GUI program to be a nightmare. --Matt ---------------------------------------------------------------------- Comment By: Matt Daws (mattdaws) Date: 2004-08-20 10:30 Message: Logged In: YES user_id=1103054 Ah, well, the following is cheeky, but seems to work: unsigned __stdcall foo(void*) { unsigned int stack; asm volatile ("movl %%esp,%%eax; andl $-16,%%esp;" : "=a"(stack) : : "%esp"); // body of function asm volatile ("movl %%eax, %%esp;" : : "a"(stack) ); } This works a treat for my project. However, an examination of the assembler output reveals that it's more a case of luck. Again, the "andl $-15, %%esp" instruction comes too late: we have already saved the (unaligned) value of esp to ebp, and ebp is used repeatedly to access memory. A check of the test case reveals that it's the same inlining problem: namely, the above assembler will align esp in that function, and in any *sub functions* the stack and hence ebp will be aligned correctly. However, things can still fall over in the callback function. Again, it's quite possible to work around this: just make sure no SSE code ends up in the callback... --Matt ---------------------------------------------------------------------- Comment By: Matt Daws (mattdaws) Date: 2004-08-20 10:12 Message: Logged In: YES user_id=1103054 This sort of works for me. In the larger project which first gave me problems, this gets the program to run without the crash. 
However, we do get a crash when the thread tries to exit: precisely for the reason that we've been messing with the stack pointer. Weirdly, the callback functions does not use the LEAVE instruction, whereas some (but, a check reveals, not all) other functions do. The LEAVE function moves ebp back to esp, which means we can mess with esp. This is clearly related to the frame-pointer stuff, but no amount of messing with options can get the LEAVE function included for some reason (I've compiler with no optimisations, and with and without the -fno-omit-frame-pointer option). I guess GCC thinks that it controlling the stack ok, so that it just pops stuff off and does a RET, which is better obviously, except when the user has messed with the stack. Still, it's clearly the correct idea. I mean, I'd even be happy to live with it, as I don't mind never quiting a thread except at exit. Thanks for all the work, --Matt ---------------------------------------------------------------------- Comment By: Danny Smith (dannysmith) Date: 2004-08-20 09:37 Message: Logged In: YES user_id=11494 Hello, Simply adding this to the callback fixes the testcase in my tests, but (here is the gotcha) only if compiled with frame- pointers enabled (ie, -fomit-frame-pointer fails) unsigned __stdcall foo(void*) { asm volatile ("andl $-16,%%esp" ::: "%esp"); Landscape land(320,240,0.3,0.01,0.0001); land.Draw(); return 0; } I have not tested with exceptions. I have seen a workaround for the omit-frame-pointer case in GPL'd code; I need to inquire about licensing issues Danny ---------------------------------------------------------------------- Comment By: Matt Daws (mattdaws) Date: 2004-08-20 00:55 Message: Logged In: YES user_id=1103054 Sorry, yet another thought. Of course, the order: save esp to ebp align esp Is the only one possible in _main, as at the end, we must restore esp before the ret instruction. 
It does mean, however, that we cannot control stack alignment in _main, only in subsequent functions. Of course, in a Windows GUI program, this doesn't matter, as _main is internal, and just calls WinMain. Does this perhaps mean that -mpreferred-stack-boundary= has no effect on Win GUI programs? A quick check suggests yes, as while GCC acts to keep the stack aligned, it has no mechanism to enforce that alignment initially.

The only workaround I can think of is that there should be a hidden function right at the start of a program which aligns the stack. This would need to be generated when a main() or WinMain() function was present, to ensure -mpreferred-stack-boundary= works as expected. Furthermore, it seems as if we should have a helper function to align the stack in a new thread. All this seems a lot of work however, which makes me think there might be a more elegant way...

--Matt

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-20 00:25

Message:
Logged In: YES
user_id=1103054

Okay, my hack seems to work. But it's a hack: it needs some synchronisation code added to make it remotely thread-safe...

The code always starts a new thread by running a small function which aligns ESP and then calls the real function which we want. We have to jump through some hoops to get GCC to issue the correct assembly output, so a more elegant way would be to write the helper function in assembly to start with. I'm a bit rusty with asm though.

  unsigned (__stdcall *func_address) (void *);

  unsigned int __stdcall hack_stack(void *param)
  {
    // Align stack: GCC stack frame means that we can change
    // ESP here and it'll be reset later.
    unsigned tmp;
    asm volatile ( "movl %%esp,%%eax; subl $15, %%eax; andl $-16, %%eax; mov %%eax,%%esp;" : "=a"(tmp) );
    // Some random code to ensure GCC issues a CALL to
    // func_address, and not just a long jump
    int temp = 0, i;
    for (i=0; i<1000; i++) temp+=i;
    func_address(param);
    // Some more random code.
    int j;
    for (j=0; j<1000; j++) temp+=j;
    return 0;  // thread exit code
  }

  unsigned long my_start_thread(unsigned (__stdcall *addr) (void *), unsigned *ret)
  {
    // Should sync here
    func_address=addr;
    return _beginthreadex(NULL,0,hack_stack,NULL,0,ret);
  }

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-20 00:16

Message:
Logged In: YES
user_id=1103054

Okay, I've tried -mpreferred-stack-boundary=

It doesn't stop the crash. Looking at the assembly output, the following happens:

i) In _main, we always have something like:

  pushl %ebp          // Set up stack frame
  movl %esp, %ebp
  pushl %ebx          // Save ebx, doesn't always occur
  subl $148, %esp     // Allocate local storage
  andl $-32, %esp     // ALIGN STACK
  call __alloca       // Call some helper functions:
  call ___main        // I don't know what these do.

The value used in "andl $-32, %esp" changes with preferred-stack-boundary, as expected. Notice, however, that this occurs AFTER the stack frame is set up, so that %ebp can still be unaligned. However, any function called from _main will be fine, as ESP is aligned. This is what was causing the problem in BUG 1001932 I believe, and I guess Danny changed some code in crt2.o to mean that %esp is aligned at the start, and so %ebp becomes aligned from the start as well. Of course, preferred-stack-boundary doesn't change this!

ii) GCC keeps the stack aligned to the requested boundary, so that each function contains a "subl $val, %esp" and $val is changed with preferred-stack-boundary. This works fine, but it assumes that the stack starts out aligned correctly. I guess this is why the code in _main changes, but it seems odd that it does it too late.
Furthermore, it means that there is no mechanism for dealing with threads, as then _main never gets called to align the stack.

I am currently playing about with assembler to try to write my own thread-launching code to align the stack. I'll let you know if I get it to work (sadly GCC doesn't allow inline asm to alter %esp, which I guess is not too surprising, but a bit annoying).

--Matt

----------------------------------------------------------------------

Comment By: Earnie Boyd (earnie)
Date: 2004-08-19 23:42

Message:
Logged In: YES
user_id=15438

What about adding:

  -mpreferred-stack-boundary=  Attempt to keep stack aligned to this power of 2

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 23:12

Message:
Logged In: YES
user_id=1103054

Hmm, I've been playing with swapping _beginthreadex for CreateThread (which shouldn't be used, I think, as I'm using the CRT). The result is a change of alignment on the stack, but it still isn't aligned on a 16-byte boundary.

--Matt

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 23:01

Message:
Logged In: YES
user_id=1103054

Earnie: Nope, doesn't do a thing. Sorry! I really do think this is a stack alignment issue, which now seems solved for main(), but not for new threads.

--Matt

----------------------------------------------------------------------

Comment By: Earnie Boyd (earnie)
Date: 2004-08-19 22:47

Message:
Logged In: YES
user_id=15438

Curious: Does -mms-bitfields help?

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 22:08

Message:
Logged In: YES
user_id=1103054

Danny,

Thanks for all your work! I've looked at 1001932 and tried the new crt2.o. This seems to fix problems in the main thread of my program, but I am still having issues in the second thread which I start up.
In particular, it's the same stack alignment issue: it seems that GCC assumes that every function is called with the stack aligned; the return address is then pushed, meaning that the function starts with esp+4 aligned, not esp. This isn't true if I use _beginthreadex to start a new thread.

Is this correct, or should I be using some other function to start a new thread? I cannot quite see what alignment _beginthreadex is making: I am tempted to conjecture it makes no alignment beyond a 4-byte boundary.

Thanks,
--Matt

----------------------------------------------------------------------

Comment By: Danny Smith (dannysmith)
Date: 2004-08-19 09:52

Message:
Logged In: YES
user_id=11494

You're right, it is a different bug -- at least the simple fix that worked for 1001932 doesn't work for this. Thanks for your analysis on the GCC bugzilla report.

Danny

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-14 05:32

Message:
Logged In: YES
user_id=1103054

Hmm, okay, I've now made a small(ish) example program. The key points seem to be:

i) -finline-functions is needed to create the movaps instruction.
ii) Threads *are* important. If the example is run as a single thread, all is okay. If I start a new thread to run the offending code in, it crashes.

I guess this IS an alignment issue, so I'll leave the bug as closed and see if 1001932 goes anywhere...

See attached C++ file. Compile it as, for example,

  g++ -Wall -ffast-math -O2 -march=pentium4 -finline-functions main.cpp -o test.exe

--Matt

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-13 23:06

Message:
Logged In: YES
user_id=1103054

Not sure what the protocol here is: I'm not going to re-open the bug for now.
However, I've been doing some more playing, and have found that the optimisation -finline-functions is responsible: without this, a small function of mine is called, and my code works. With it, the function is not called, but is inlined (as expected). However, the curious movaps instruction is now generated.

I'm not sure this is a duplicate bug, as I'm not trying to use SSE data types: the instruction is just being generated automatically by GCC, whereas the 1001932 bug relates to the explicit use of SSE data types. Furthermore, I'm not explicitly telling the compiler to use SSE to do maths, only that it is running on a P4, and so is free to use SSE if it so wishes.

As the problem only occurs when inlining, I wonder if this is a code generation bug? Furthermore, my trick of replacing the movaps by movups seems to also work if I simply remove the movaps instructions completely, which is a little odd.

The same is true if I use -mfpmath=sse, i.e. the inlining causes the error. Furthermore, in this case, if I change movaps to movups, the problem still occurs (but with an Access Violation as the program tries to read from address -1). The same occurs if I remove the movaps instructions which reference memory.

I will try and make a simple program which duplicates these problems.

--Matt

----------------------------------------------------------------------

Comment By: Danny Smith (dannysmith)
Date: 2004-08-13 13:31

Message:
Logged In: YES
user_id=11494

Duplicate of 1001932, which has a self-contained testcase. The bug has also been reported to GCC's bugzilla,
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16890
but no response yet.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=102435&aid=1008330&group_id=2435
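The "andl $-16, %esp" instruction that recurs in the comments above is just a bit mask that rounds the stack pointer down to a multiple of 16. A minimal sketch of that arithmetic in C, on a plain integer rather than the real stack pointer; the helper names (align_down, is_aligned, align_demo) and the sample address are made up for illustration and are not part of GCC or MinGW:

```c
#include <assert.h>
#include <stdint.h>

/* Round addr down to a multiple of boundary (a power of two).
   For boundary == 16 this is the same mask as "andl $-16, %esp":
   -16 is 0xFFFFFFF0 in 32-bit two's complement, so the low four
   bits are cleared. */
static uint32_t align_down(uint32_t addr, uint32_t boundary)
{
    return addr & ~(boundary - 1u);
}

/* Nonzero when addr is a multiple of boundary (a power of two). */
static int is_aligned(uint32_t addr, uint32_t boundary)
{
    return (addr & (boundary - 1u)) == 0;
}

/* Illustrative self-check with a hypothetical stack pointer value
   that is 4-byte aligned but not 16-byte aligned. */
static void align_demo(void)
{
    uint32_t esp = 0x0022FF7Cu;
    assert(is_aligned(esp, 4));
    assert(!is_aligned(esp, 16));
    assert(align_down(esp, 16) == 0x0022FF70u);

    /* -15 (0xFFFFFFF1) is not an alignment mask: it clears only
       bits 1-3 and leaves bit 0 set. */
    assert((0x0022FF7Du & (uint32_t)-15) == 0x0022FF71u);
}
```

Rounding down is the safe direction here because the x86 stack grows towards lower addresses, so the skipped bytes are simply left unused.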
From: SourceForge.net <no...@so...> - 2007-09-25 19:36:15
Bugs item #1008330, was opened at 2004-08-13 10:43
Message generated for change (Comment added) made by dannysmith
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=102435&aid=1008330&group_id=2435

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: MinGW
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: Matt Daws (mattdaws)
Assigned to: Danny Smith (dannysmith)
Summary: movaps problem with mingw

Initial Comment:
I have a slightly complicated program which is crashing when I compile with certain options. I have narrowed the problem down to the fact that MinGW is generating MOVAPS instructions which are trying to access memory at an unaligned point. If I edit the assembler file and replace these with MOVUPS instructions and then assemble and link, my program works fine.

I thought this was linked to my use of threads and/or the printf instruction. However, I've removed the printf's and still get the problem. I am now suspecting my use of threads. I tried changing from using CreateThread to using _beginthreadex, but this didn't help.

A Google search reveals that other people have been having a general problem with GCC, MOVAPS and threads. Thus this *might* be a problem with GCC. However, people in the know are blaming the problem on system calls which are not setting up the stack in an aligned manner when starting a thread. Hence my hope that _beginthreadex would help.

The problem does not occur with Mingw 3.3.3, but this seems to be down to the fact that the compiler no longer produces MOVAPS instructions.

I am using Mingw 3.4.1, Windows XP SP1. The compiler flags I use are:

  -Wall -ffast-math -O3 -march=pentium4 -mthreads

But anything enabling SSE seems to give problems.

I could send the code, but it is rather long.
I could also try to generate a smaller example which does the same thing, but perhaps the bug will not occur.

Here is a snippet of the offending assembly (produced with the -S flag in g++):

  __ZN9Landscape4DrawEv:
    pushl %ebp
    movl %esp, %ebp
    pushl %edi
    pushl %esi
    pushl %ebx
    subl $236, %esp
    movl 8(%ebp), %edi
    flds 8(%edi)
    flds 40(%edi)
    .....
    movaps %xmm4, -104(%ebp)
    .....
    movaps %xmm0, -136(%ebp)

Many thanks in advance for any hints!

--Matt Daws

----------------------------------------------------------------------

>Comment By: Danny Smith (dannysmith)
Date: 2007-09-26 07:36

Message:
Logged In: YES
user_id=11494
Originator: NO

Fixed in 4.2 series.

Danny

----------------------------------------------------------------------

Comment By: Danny Smith (dannysmith)
Date: 2006-05-25 13:22

Message:
Logged In: YES
user_id=11494

With the yet-to-be-released GCC 4.2.0, adding

  __attribute__ ((force_align_arg_pointer))

to the definition of the thread startup function does the trick. This attribute was added by a Darwin contributor.

Here is the gcc.info description of the new attribute:

`force_align_arg_pointer'
  On the Intel x86, the `force_align_arg_pointer' attribute may be applied to individual function definitions, generating an alternate prologue and epilogue that realigns the runtime stack. This supports mixing legacy codes that run with a 4-byte aligned stack with modern codes that keep a 16-byte stack for SSE compatibility. The alternate prologue and epilogue are slower and bigger than the regular ones, and the alternate prologue requires a scratch register; this lowers the number of registers available if used in conjunction with the `regparm' attribute. The `force_align_arg_pointer' attribute is incompatible with nested functions; this is considered a hard error.
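A minimal sketch of how the attribute described above might be applied to a thread start function. The names REALIGN_STACK and thread_entry are hypothetical, and the function is written as plain C so it can be called directly; under MinGW the entry point would additionally be declared __stdcall and handed to _beginthreadex. The local array stands in for the SSE spill slots that movaps writes to:

```c
#include <assert.h>
#include <stdint.h>

/* The attribute is x86-specific, so on other targets the
   (hypothetical) macro expands to nothing. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define REALIGN_STACK __attribute__((force_align_arg_pointer))
#else
#define REALIGN_STACK
#endif

/* Hypothetical thread entry point. The attribute asks GCC to emit a
   prologue that realigns the stack, whatever alignment the caller
   (here, the OS thread start-up code) left it with. */
REALIGN_STACK
unsigned thread_entry(void *param)
{
    /* A local that needs 16-byte alignment. */
    float lanes[4] __attribute__((aligned(16))) = {0.0f, 1.0f, 2.0f, 3.0f};

    /* Report the misalignment of the local (0 means aligned) through
       the parameter, purely so the sketch can be checked. */
    uintptr_t *misalignment = (uintptr_t *)param;
    *misalignment = (uintptr_t)lanes % 16;
    return 0;
}
```

As the quoted description says, the realigning prologue is slower and bigger than the regular one, so it is probably best kept to the few functions that the OS calls directly (thread entry points and callbacks) rather than applied everywhere.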
----------------------------------------------------------------------

Comment By: Danny Smith (dannysmith)
Date: 2006-03-19 10:55

Message:
Logged In: YES
user_id=11494

Matt,

A patch for a target-specific attribute that should fix your problem is at:
http://gcc.gnu.org/ml/gcc-patches/2006-02/msg01073.html

Danny

----------------------------------------------------------------------

Comment By: Danny Smith (dannysmith)
Date: 2005-01-31 20:45

Message:
Logged In: YES
user_id=11494

I'll leave this open, as a reminder. Maybe gcc will accept a target-specific attribute???

Danny

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2005-01-19 09:29

Message:
Logged In: YES
user_id=1103054

I've tried this workaround on the test code, and it does indeed seem to work. A bit annoying, but thanks for the time put in.

From what I understand, there is little that can be done, as GCC assumes a statically aligned stack, whereas Windows does not: the standard code to start a new thread does not align the stack better than to 4 bytes. The best idea I could come up with was to write a new function to start a thread, and have it align the stack better. However, this involves a degree of messing about with thread-line synchronisation etc. and it's not really a proper GCC solution, as starting threads, I guess, should be left to the O/S and not made part of GCC.

Actually, the proposed solution given in
http://gcc.gnu.org/ml/gcc/2004-12/msg00912.html
sounds excellent: very annoying it wasn't added!!! It's suggested in the rejection that pthreads is fixed: not exactly an option for trying to write pure win32 code!
Again, thank you for the time put in,

cheers,
--Matt

----------------------------------------------------------------------

Comment By: Danny Smith (dannysmith)
Date: 2004-12-22 22:00

Message:
Logged In: YES
user_id=11494

Hi,

This bug (PR/10395 in gcc bugzilla) has been discussed recently on the gcc lists and a solution proposed
http://gcc.gnu.org/ml/gcc/2004-12/msg00912.html
but rejected:
http://gcc.gnu.org/ml/gcc/2004-12/msg00918.html
in favour of fixing glibc (not much help for us poor windows sods).

The rejection however did suggest a workaround:

  static void * __attribute__((noinline, stdcall))
  f_prime (void *p)
  {
    /* old code from f */
  }

  void * __attribute__ ((stdcall))
  f (void *p)
  {
    (void)__builtin_return_address(1); // to force call frame
    asm volatile ("andl $-16,%%esp" ::: "%esp");
    return f_prime (p);
  }

and pass the new f to _beginthreadex.

Does that work in your larger programmes?

Danny

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-21 03:52

Message:
Logged In: YES
user_id=1103054

Okay, I've been busy over at Bugzilla for GCC. The following are interesting:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10395
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=9633

Basically it seems that the problem here is well-known, at least to some people. GCC does not do dynamic stack alignment, so things like threads and call-backs from the OS which can break stack alignment will also break GCC. Other compilers do dynamically align the stack. I get the impression from the above link that this might eventually be what GCC does.

As a workaround:

i) Don't use threads.
ii) Don't put code which needs alignment (i.e. SSE code) into main(), as functions called by main() get an aligned stack, but main() itself does not.
iii) If you must use threads, then Danny's idea below seems to work as long as the call-back function does nothing else except align the stack via Danny's trick and then call the real call-back function.
iv) I'm not sure what happens for WinMain().

Hopefully the people over at GCC will come up with a dynamic stack alignment scheme, as this problem also seems to affect various thread implementations under Linux. I don't think this is a bug which can be solved by MinGW. That said, Danny's patch for crt2.o does seem to solve the problem with SSE in main(), which is nice. Perhaps something similar could be done for WinMain(), although the style of Windows programming (i.e. loads of call-backs) means that I would fully expect SSE with a Windows GUI program to be a nightmare.

--Matt

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-20 10:30

Message:
Logged In: YES
user_id=1103054

Ah, well, the following is cheeky, but seems to work:

  unsigned __stdcall foo(void*)
  {
    unsigned int stack;
    asm volatile ("movl %%esp,%%eax; andl $-16,%%esp;" : "=a"(stack) : : "%esp");
    // body of function
    asm volatile ("movl %%eax, %%esp;" : : "a"(stack) );
  }

This works a treat for my project. However, an examination of the assembler output reveals that it's more a case of luck. Again, the "andl $-16, %%esp" instruction comes too late: we have already saved the (unaligned) value of esp to ebp, and ebp is used repeatedly to access memory. A check of the test case reveals that it's the same inlining problem: namely, the above assembler will align esp in that function, and in any *sub functions* the stack and hence ebp will be aligned correctly. However, things can still fall over in the callback function itself.

Again, it's quite possible to work around this: just make sure no SSE code ends up in the callback...

--Matt

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-20 10:12

Message:
Logged In: YES
user_id=1103054

This sort of works for me. In the larger project which first gave me problems, this gets the program to run without the crash.
However, we do get a crash when the thread tries to exit, precisely because we've been messing with the stack pointer. Weirdly, the callback function does not use the LEAVE instruction, whereas some (but, a check reveals, not all) other functions do. The LEAVE instruction moves ebp back to esp, which means we can mess with esp. This is clearly related to the frame-pointer stuff, but no amount of messing with options can get the LEAVE instruction included for some reason (I've compiled with no optimisations, and with and without the -fno-omit-frame-pointer option). I guess GCC thinks that it is controlling the stack OK, so that it just pops stuff off and does a RET, which is obviously better, except when the user has messed with the stack.

Still, it's clearly the correct idea. I mean, I'd even be happy to live with it, as I don't mind never quitting a thread except at exit.

Thanks for all the work,
--Matt

----------------------------------------------------------------------

Comment By: Danny Smith (dannysmith)
Date: 2004-08-20 09:37

Message:
Logged In: YES
user_id=11494

Hello,

Simply adding this to the callback fixes the testcase in my tests, but (here is the gotcha) only if compiled with frame-pointers enabled (i.e., -fomit-frame-pointer fails):

  unsigned __stdcall foo(void*)
  {
    asm volatile ("andl $-16,%%esp" ::: "%esp");
    Landscape land(320,240,0.3,0.01,0.0001);
    land.Draw();
    return 0;
  }

I have not tested with exceptions. I have seen a workaround for the omit-frame-pointer case in GPL'd code; I need to inquire about licensing issues.

Danny

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-20 00:55

Message:
Logged In: YES
user_id=1103054

Sorry, yet another thought. Of course, the order:

  save esp to ebp
  align esp

is the only one possible in _main, as at the end, we must restore esp before the ret instruction.
It does mean, however, that we cannot control stack alignment in _main, only in subsequent functions. Of course, in a Windows GUI program, this doesn't matter, as _main is internal, and just calls WinMain. Does this perhaps mean that -mpreferred-stack-boundary= has no effect on Win GUI programs? A quick check suggests yes, as while GCC acts to keep the stack aligned, it has no mechanism to enforce that alignment initially.

The only workaround I can think of is that there should be a hidden function right at the start of a program which aligns the stack. This would need to be generated when a main() or WinMain() function was present, to ensure -mpreferred-stack-boundary= works as expected. Furthermore, it seems as if we should have a helper function to align the stack in a new thread. All this seems a lot of work however, which makes me think there might be a more elegant way...

--Matt

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-20 00:25

Message:
Logged In: YES
user_id=1103054

Okay, my hack seems to work. But it's a hack: it needs some synchronisation code added to make it remotely thread-safe...

The code always starts a new thread by running a small function which aligns ESP and then calls the real function which we want. We have to jump through some hoops to get GCC to issue the correct assembly output, so a more elegant way would be to write the helper function in assembly to start with. I'm a bit rusty with asm though.

  unsigned (__stdcall *func_address) (void *);

  unsigned int __stdcall hack_stack(void *param)
  {
    // Align stack: GCC stack frame means that we can change
    // ESP here and it'll be reset later.
    unsigned tmp;
    asm volatile ( "movl %%esp,%%eax; subl $15, %%eax; andl $-16, %%eax; mov %%eax,%%esp;" : "=a"(tmp) );
    // Some random code to ensure GCC issues a CALL to
    // func_address, and not just a long jump
    int temp = 0, i;
    for (i=0; i<1000; i++) temp+=i;
    func_address(param);
    // Some more random code.
    int j;
    for (j=0; j<1000; j++) temp+=j;
    return 0;  // thread exit code
  }

  unsigned long my_start_thread(unsigned (__stdcall *addr) (void *), unsigned *ret)
  {
    // Should sync here
    func_address=addr;
    return _beginthreadex(NULL,0,hack_stack,NULL,0,ret);
  }

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-20 00:16

Message:
Logged In: YES
user_id=1103054

Okay, I've tried -mpreferred-stack-boundary=

It doesn't stop the crash. Looking at the assembly output, the following happens:

i) In _main, we always have something like:

  pushl %ebp          // Set up stack frame
  movl %esp, %ebp
  pushl %ebx          // Save ebx, doesn't always occur
  subl $148, %esp     // Allocate local storage
  andl $-32, %esp     // ALIGN STACK
  call __alloca       // Call some helper functions:
  call ___main        // I don't know what these do.

The value used in "andl $-32, %esp" changes with preferred-stack-boundary, as expected. Notice, however, that this occurs AFTER the stack frame is set up, so that %ebp can still be unaligned. However, any function called from _main will be fine, as ESP is aligned. This is what was causing the problem in BUG 1001932 I believe, and I guess Danny changed some code in crt2.o to mean that %esp is aligned at the start, and so %ebp becomes aligned from the start as well. Of course, preferred-stack-boundary doesn't change this!

ii) GCC keeps the stack aligned to the requested boundary, so that each function contains a "subl $val, %esp" and $val is changed with preferred-stack-boundary. This works fine, but it assumes that the stack starts out aligned correctly. I guess this is why the code in _main changes, but it seems odd that it does it too late.
Furthermore, it means that there is no mechanism for dealing with threads, as then _main never gets called to align the stack.

I am currently playing about with assembler to try to write my own thread-launching code to align the stack. I'll let you know if I get it to work (sadly GCC doesn't allow inline asm to alter %esp, which I guess is not too surprising, but a bit annoying).

--Matt

----------------------------------------------------------------------

Comment By: Earnie Boyd (earnie)
Date: 2004-08-19 23:42

Message:
Logged In: YES
user_id=15438

What about adding:

  -mpreferred-stack-boundary=  Attempt to keep stack aligned to this power of 2

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 23:12

Message:
Logged In: YES
user_id=1103054

Hmm, I've been playing with swapping _beginthreadex for CreateThread (which shouldn't be used, I think, as I'm using the CRT). The result is a change of alignment on the stack, but it still isn't aligned on a 16-byte boundary.

--Matt

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 23:01

Message:
Logged In: YES
user_id=1103054

Earnie: Nope, doesn't do a thing. Sorry! I really do think this is a stack alignment issue, which now seems solved for main(), but not for new threads.

--Matt

----------------------------------------------------------------------

Comment By: Earnie Boyd (earnie)
Date: 2004-08-19 22:47

Message:
Logged In: YES
user_id=15438

Curious: Does -mms-bitfields help?

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-19 22:08

Message:
Logged In: YES
user_id=1103054

Danny,

Thanks for all your work! I've looked at 1001932 and tried the new crt2.o. This seems to fix problems in the main thread of my program, but I am still having issues in the second thread which I start up.
In particular, it's the same stack alignment issue: it seems that GCC assumes that every function is called with the stack aligned; the return address is then pushed, meaning that the function starts with esp+4 aligned, not esp. This isn't true if I use _beginthreadex to start a new thread.

Is this correct, or should I be using some other function to start a new thread? I cannot quite see what alignment _beginthreadex is making: I am tempted to conjecture it makes no alignment beyond a 4-byte boundary.

Thanks,
--Matt

----------------------------------------------------------------------

Comment By: Danny Smith (dannysmith)
Date: 2004-08-19 09:52

Message:
Logged In: YES
user_id=11494

You're right, it is a different bug -- at least the simple fix that worked for 1001932 doesn't work for this. Thanks for your analysis on the GCC bugzilla report.

Danny

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-14 05:32

Message:
Logged In: YES
user_id=1103054

Hmm, okay, I've now made a small(ish) example program. The key points seem to be:

i) -finline-functions is needed to create the movaps instruction.
ii) Threads *are* important. If the example is run as a single thread, all is okay. If I start a new thread to run the offending code in, it crashes.

I guess this IS an alignment issue, so I'll leave the bug as closed and see if 1001932 goes anywhere...

See attached C++ file. Compile it as, for example,

  g++ -Wall -ffast-math -O2 -march=pentium4 -finline-functions main.cpp -o test.exe

--Matt

----------------------------------------------------------------------

Comment By: Matt Daws (mattdaws)
Date: 2004-08-13 23:06

Message:
Logged In: YES
user_id=1103054

Not sure what the protocol here is: I'm not going to re-open the bug for now.
However, I've been doing some more playing, and have found that the optimisation -finline-functions is responsible: without this, a small function of mine is called, and my code works. With it, the function is not called, but is inlined (as expected). However, the curious movaps instruction is now generated.

I'm not sure this is a duplicate bug, as I'm not trying to use SSE data types: the instruction is just being generated automatically by GCC, whereas the 1001932 bug relates to the explicit use of SSE data types. Furthermore, I'm not explicitly telling the compiler to use SSE to do maths, only that it is running on a P4, and so is free to use SSE if it so wishes.

As the problem only occurs when inlining, I wonder if this is a code generation bug? Furthermore, my trick of replacing the movaps by movups seems to also work if I simply remove the movaps instructions completely, which is a little odd.

The same is true if I use -mfpmath=sse, i.e. the inlining causes the error. Furthermore, in this case, if I change movaps to movups, the problem still occurs (but with an Access Violation as the program tries to read from address -1). The same occurs if I remove the movaps instructions which reference memory.

I will try and make a simple program which duplicates these problems.

--Matt

----------------------------------------------------------------------

Comment By: Danny Smith (dannysmith)
Date: 2004-08-13 13:31

Message:
Logged In: YES
user_id=11494

Duplicate of 1001932, which has a self-contained testcase. The bug has also been reported to GCC's bugzilla,
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16890
but no response yet.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=102435&aid=1008330&group_id=2435
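The hack_stack helper quoted in the thread passes the real thread function through a global (func_address) and notes that it "should sync here". One way to sidestep that, sketched below under stated assumptions, is to hand each new thread its own heap-allocated record holding the function and its argument. All names here (THREAD_CALL, thread_fn, thread_start, thread_start_new, trampoline, add_one) are hypothetical, and the sketch does not itself realign the stack; that step would go in the trampoline before the call:

```c
#include <assert.h>
#include <stdlib.h>

/* Thread functions are __stdcall under Win32; elsewhere the
   (hypothetical) macro expands to nothing so the sketch stays portable. */
#ifdef _WIN32
#define THREAD_CALL __stdcall
#else
#define THREAD_CALL
#endif

typedef unsigned (THREAD_CALL *thread_fn)(void *);

/* Everything the new thread needs, owned by the new thread. */
struct thread_start {
    thread_fn fn;
    void *arg;
};

/* Package fn and arg; returns NULL if allocation fails. */
static struct thread_start *thread_start_new(thread_fn fn, void *arg)
{
    struct thread_start *ts = malloc(sizeof *ts);
    if (ts != NULL) {
        ts->fn = fn;
        ts->arg = arg;
    }
    return ts;
}

/* Entry point to hand to the thread-creation call. It copies the
   record, frees it, and calls the real function. No global state is
   shared between threads, so no locking is needed here. */
static unsigned THREAD_CALL trampoline(void *p)
{
    struct thread_start ts = *(struct thread_start *)p;
    free(p);
    return ts.fn(ts.arg);
}

/* Example worker, only here to exercise the trampoline. */
static unsigned THREAD_CALL add_one(void *arg)
{
    int *n = arg;
    *n += 1;
    return 7;
}
```

Under MinGW the record would be passed as the argument of the thread-creation call, along the lines of _beginthreadex(NULL, 0, trampoline, ts, 0, &id), with the caller freeing ts itself if thread creation fails.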