
#659 Linux x86 ABI changed; compiler update required

closed
dkl
None
compiler
2016-05-14
2013-02-04
TeeEmCee
No

The GCC devs decided to unilaterally change the Linux x86 ABI [1] [2] [3] [4]. I'm not sure when this happened, but I think it first became a common problem with GCC 4.1. Previously the Linux x86 ABI was the "SysV i386 ABI", which stated that the stack is aligned to a 4-byte boundary on function entry. GCC now assumes by default that the stack is aligned to a 16-byte boundary. This is a very controversial issue, with the GCC devs saying they have changed the ABI, and many other people considering the SysV ABI to be the real ABI and GCC to be buggy. GCC Bugzilla is full of flamewars. GCC devs have said "GCC chose to change the unwritten standard for the ABI in use for IA32 GNU/Linux" and "The ABI is undocumented; that is reality" [3].

Anyway, this is a problem because of the existence of SSE instructions that segfault if their memory operands are not 16-byte aligned. When compiling code with -O3, GCC will use SSE instructions if they are enabled on the selected CPU architecture (e.g. -march=pentium4 or -msse or -m32 on a 64-bit machine). If you try to link these object files into an FB program and call them you can get a segfault (testcase below).

Regardless of whether GCC is right or wrong, FBC should be updated to ensure a 16 byte stack alignment when calling external code so that it's compatible with Linux libraries. Some ways this could be done:

1) What I did for the OSX port was to realign the stack before pushing arguments for any CDECL call, then restore $esp afterwards
2) Ensure that the size of a function's stack frame is a multiple of 16 bytes on every function call, plus align the stack on every entry point (program start and thread starts) or in any function that actually makes use of SSE instructions. But it's not truly necessary to realign the stack when an FB function is called from an external library if you assume that all externally linked code ensures 16-byte alignment whenever any of it requires 16-byte alignment.
3) Realign the stack to 16 bytes at the beginning of every function, and ensure that the size of a function's stack frame is a multiple of 16 bytes on every function call

1) is a hack, tricky, and relatively expensive. 2) is a far cleaner and faster solution and is AFAIK what GCC does. 3) is the simplest, though slightly slower than 2), and is what GCC does with the -mstackrealign argument. I suggest using 3); the speed difference will be tiny compared to all the other optimisations that FBC doesn't do. And I suggest doing so regardless of OS, for simplicity.
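
The two pieces of arithmetic that option 3) relies on can be sketched in C. This is only an illustration, not FBC or GCC code: the helper names round_up_16 and align_down_16 are made up for this example, and addresses are modelled as plain 32-bit integers.

```c
#include <stdint.h>

/* Hypothetical helper: round a frame size up to the next multiple of 16,
   so that subtracting it from an aligned esp leaves esp aligned. */
static uint32_t round_up_16(uint32_t n) {
    return (n + 15u) & ~(uint32_t)15u;
}

/* Hypothetical helper: the effect of "and esp, -16" at the start of a
   function, i.e. round the stack pointer down to a 16-byte boundary. */
static uint32_t align_down_16(uint32_t esp) {
    return esp & ~(uint32_t)15u;
}
```

For example, a function with 20 bytes of locals would reserve round_up_16(20) = 32 bytes, and an incoming esp of 0xbffff00c would be realigned to 0xbffff000.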

16 byte stack alignment is and always has been a part of the OSX ABI.

It appears that GCC's alignment expectation has changed to 16 bytes on all x86 platforms, but it's only a semi-official change to the ABI on Linux. The *BSD maintainers have apparently been patching GCC or setting build arguments to make it respect their ABI (e.g. [5]).

[1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838#c86
[2] https://groups.google.com/forum/?fromgroups#!forum/ia32-abi
[3] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38496#c14
[4] http://lists.freebsd.org/pipermail/freebsd-hackers/2011-January/034059.html
[5] http://mail-index.netbsd.org/port-i386/2012/12/30/msg002975.html

Testcase:

aligntest.bas

extern "C"
  declare sub alignrequired()
end extern
alignrequired()

sseusage.c

typedef int v4si __attribute__ ((vector_size (16)));

v4si s1;

void alignrequired() {
      v4si s2 = s1;
}

Use:

$ gcc -m32 -msse sseusage.c -c
$ fbc aligntest.bas sseusage.o -g
$ ./aligntest
Segmentation fault

$ gdb aligntest
...
(gdb) r
...
Program received signal SIGSEGV, Segmentation fault.
...
(gdb) disass $pc
Dump of assembler code for function alignrequired:
   0x08048fe8 <+0>: push   %ebp
   0x08048fe9 <+1>: mov    %esp,%ebp
   0x08048feb <+3>: sub    $0x18,%esp
   0x08048fee <+6>: movdqa 0x804b470,%xmm0
=> 0x08048ff6 <+14>:    movdqa %xmm0,-0x18(%ebp)
   0x08048ffb <+19>:    leave  
   0x08048ffc <+20>:    ret    
End of assembler dump.

(If that doesn't crash, then the stack just so happens to be aligned. Try adding local variables to aligntest.bas)
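
Why the crash depends on luck can be worked out from the disassembly above. The sketch below replays the prologue's arithmetic on a plain integer; the function names are made up for illustration and the example esp values are assumptions, not values taken from the gdb session.

```c
#include <stdint.h>

/* Address written by "movdqa %xmm0,-0x18(%ebp)", given esp on entry to
   alignrequired (i.e. just after the caller's call pushed the return
   address). */
static uint32_t spill_address(uint32_t esp_at_entry) {
    uint32_t ebp = esp_at_entry - 4;  /* push %ebp; mov %esp,%ebp */
    return ebp - 0x18;                /* operand -0x18(%ebp) */
}

/* Nonzero if movdqa accepts the address, i.e. it is 16-byte aligned. */
static int movdqa_ok(uint32_t addr) {
    return (addr & 15u) == 0;
}
```

If the caller had esp 16-byte aligned at the call instruction, esp on entry ends in 0xc (for example 0xbffff00c) and the store lands on 0xbfffeff0, which is aligned. With any of the other three 4-byte-aligned entry values (ending in 0x0, 0x4 or 0x8) the store address is misaligned and movdqa faults, so a caller that only guarantees 4-byte alignment gets a working call roughly one time in four.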

Discussion

  • dkl

    dkl - 2013-02-07

    It'd be best to ensure the stack alignment at compile-time where possible. I think we already have some code for that (commented out) in astLoadCALL(), but wouldn't we have to take local variables into account (and scopes)? (i.e. the sum of stack space taken up for local variables plus the call return address should also be padded to 16-byte alignment)

    At least FB itself won't normally emit SSE instructions, though I wonder what happens under -fpu sse... I can see that it's doing some 16-byte alignment, but only through ".balign 16", only for float literal numbers. I don't see it touching vars on stack, which makes it look like it's incomplete too...

    I think it'd be safe to assume that main() and thread entry points will have a 16-byte aligned stack to start with, if the system compiler defaults to 16-byte alignment. Then again, at least the Ubuntu gcc 4.6 I have here generates this at the top of main():

    and esp, -16
    

    that will also give a 16-byte aligned stack I suppose, haha.

     
  • TeeEmCee

    TeeEmCee - 2013-02-10

    Yes, the code in astLoadCall looks good, if it's additionally ensured that the function prologue allocates a multiple of 16 bytes. (Doing the two paddings separately will mean sometimes using more stack space than required, but I'm not complaining.) I don't know what you mean by scopes. Isn't the stack space used by a function fixed, regardless of where in the function you are? At least that's what I documented after poking the FB internals a couple years ago (warning, this amount of dependence on FB internals will disgust you):
    http://rpg.hamsterrepublic.com/ohrrpgce/FB_stack_internals

    I have no idea about emit_SSE.bas. Looking through it I see use of instructions like movaps that require alignment, but I have no idea whether any of the operands could be stack addresses.

    I have no idea whether Linux or _start aligns to 16 bytes.

     
  • dkl

    dkl - 2013-07-22

    Another thing, FB can emit code like this:

    f1( 1, f2( 2 ), 3 )
    
    push 3
    push 2
    call f2
    add esp, 4
    push eax
    push 1
    call f1
    add esp, 12
    

    i.e. PUSHes for nested calls come in between PUSHes for the toplevel call. That makes ensuring 16-byte alignment somewhat more complicated.
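
One possible way to handle the interleaving, sketched in C purely as an illustration (this is not what FBC does; pad_for_call and its parameters are hypothetical): track how many bytes are currently pushed below the last point where esp was known to be aligned, and pad each call relative to that depth.

```c
/* Hypothetical helper: bytes of padding to reserve before pushing a
   call's arguments so that esp is 16-byte aligned at the call
   instruction.
   depth:     bytes already pushed below the last aligned esp (padding
              and arguments of enclosing calls pushed so far)
   arg_bytes: bytes of arguments this call will push */
static unsigned pad_for_call(unsigned depth, unsigned arg_bytes) {
    return (16u - (depth + arg_bytes) % 16u) % 16u;
}
```

For f1( 1, f2( 2 ), 3 ) starting from an aligned esp, f1 needs pad_for_call(0, 12) = 4 bytes. After that padding and "push 3" the depth is 8, so f2 needs pad_for_call(8, 4) = 4 bytes before "push 2", and its cleanup becomes "add esp, 8" instead of "add esp, 4". The remaining two pushes then bring the depth to 16, so esp is aligned at "call f1" as well.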

     
  • dkl

    dkl - 2016-05-14
    • status: open --> closed
    • assigned_to: dkl
     
