Stack alignment

habran
2011-10-27
2013-04-20
  • habran

    habran - 2011-10-27

    Hi japheth,

    I am having problems with JWasm v2.07 because of stack alignment on an x64 machine.
    When an odd number of registers is pushed and the procedure has a local variable, it decrements the stack by only 8 bytes.
    v2.06e does the same.

    When an even number of registers is pushed, or if I add a dummy variable, the stack is aligned to 16 bytes and it works fine.

    here is a disassembly:

      SomeSubrutine PROC FRAME USES r12 r14 r15 a1:PTR ASMWEDIT, a2:PTR ASMWEDIT
        local  lpAsmwItem  :PTR ASMWEDITITEMW

        mov r14,rdx
    000000000046D8A7  mov         qword ptr [rsp+8],rcx
    000000000046D8AC  mov         qword ptr [rsp+10h],rdx
    000000000046D8B1  mov         qword ptr [rsp+18h],r8
    000000000046D8B6  mov         qword ptr [rsp+20h],r9
    000000000046D8BB  push        rbp 
    000000000046D8BC  mov         rbp,rsp
    000000000046D8BF  push        rbx 
    000000000046D8C0  push        r12 
    000000000046D8C2  push        r14 
    000000000046D8C4  push        r15 
    000000000046D8C6  sub         rsp,8

    best regards
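
For reference, the alignment arithmetic behind the prologue in this post can be sketched in C. This is only an illustration: the function name `rsp_mod16_after_prologue` is made up here, and the sketch assumes the Win64 rule that RSP is a multiple of 16 immediately before a `call`, so it is 8 modulo 16 on entry to the procedure.

```c
#include <assert.h>

/* Hypothetical helper (not part of JWasm): returns RSP modulo 16 after a
 * prologue that executes `pushes` 8-byte PUSH instructions followed by
 * `sub rsp, sub_bytes`. Assumes RSP was 16-byte aligned just before the
 * CALL, so the return address already occupies 8 bytes on entry. */
unsigned rsp_mod16_after_prologue(unsigned pushes, unsigned sub_bytes)
{
    unsigned consumed;

    assert(sub_bytes % 8u == 0u);            /* keep the model simple */
    consumed = 8u + 8u * pushes + sub_bytes; /* return address + pushes + locals */
    return (16u - consumed % 16u) % 16u;
}
```

With the five pushes in the listing (rbp, rbx, r12, r14, r15) and `sub rsp,8`, the function returns 8, so the stack is misaligned; with `sub rsp,16` it would return 0.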

     
  • habran

    habran - 2011-10-27

    I have looked at how MSVC handles stack alignment in x64 code
    and found that it reserves space on the stack at the beginning of the subroutine, large enough for the locals plus the arguments of the largest call,
    so it is not necessary to adjust the stack for each call
    that way it is possible to have faster program with less code

    best regards
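
The "reserve once" idea described in this post can be sketched as a size calculation. This is a rough model, not MSVC's actual algorithm: the name `frame_alloc_size` and its parameters are invented for illustration, and it assumes the Win64 convention (at least four 8-byte argument slots per call, RSP equal to 8 modulo 16 on entry).

```c
#include <assert.h>

/* Hypothetical helper: size for a single `sub rsp, N` in the prologue that
 * covers the locals plus the argument area of the largest call.
 * `pushes` counts the 8-byte pushes done before the SUB (including rbp). */
unsigned frame_alloc_size(unsigned locals_bytes, unsigned max_call_args,
                          unsigned pushes)
{
    unsigned slots  = max_call_args < 4u ? 4u : max_call_args;
    unsigned locals = (locals_bytes + 7u) & ~7u;   /* round up to 8 */
    unsigned size   = locals + 8u * slots;
    unsigned used   = 8u + 8u * pushes;            /* return address + pushes */

    size += (16u - (used + size) % 16u) % 16u;     /* pad so RSP ends 16-aligned */
    assert((used + size) % 16u == 0u);
    return size;
}
```

For example, 8 bytes of locals, calls with at most two arguments and one push (rbp) give 48 bytes; after that single adjustment no call inside the procedure has to touch RSP.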

     
  • japheth

    japheth - 2011-10-28

    > that way it is possible to have faster program with less code

    Yes, it's more efficient. OTOH, this approach requires to use PUSH/POP very carefully inside the procedure; in any case, the programmer must know exactly what he's doing. Unlike C, which has full control of the stack, the assembler allows more freedom - and hence more possibilities to add bugs.

     
  • habran

    habran - 2011-10-28

    In x64 programming plenty of registers are available, so PUSH/POP is needed much less often.
    Maybe you could add an .OPTION "CFRAME", so that CFRAME can be used when PUSH/POP is not needed and FRAME otherwise.
    You are right that an assembly programmer must know exactly what he's doing, and I think this would give us more freedom and speed.

    thank you for a great tool

    best regards

     
  • habran

    habran - 2011-10-28

    However,

    I am still having a problem: the last QWORD of the local variables is overwritten if an odd number of registers is pushed

    best regards

     
  • habran

    habran - 2011-11-04

    Hi japheth
    here is code showing that there is a bug in JWasm versions 206 and 207;
    205 works correctly:

    HeadersTo64bits proc FRAME USES rsi rdi r12 RawText:LPSTR, FixedText:LPSTR,pszFileName :LPCTSTR, fSize:DWORD
    local szName :BYTE
    local szVar :BYTE

    mov rsi,rcx
    mov rdi,rdx
    mov r12,rcx
    add r12,r9

    invoke lstrcpy,addr szName, pszFileName
    ;-----------------------------------------------------------------------

    JWasm version 205bw compiles correctly:
    mdi64!HeadersTo64bits:
    00000000`0040268e 48894c2408      mov     qword ptr [rsp+8],rcx
    00000000`00402693 4889542410      mov     qword ptr [rsp+10h],rdx
    00000000`00402698 4c89442418      mov     qword ptr [rsp+18h],r8
    00000000`0040269d 4c894c2420      mov     qword ptr [rsp+20h],r9
    00000000`004026a2 55              push    rbp
    00000000`004026a3 488bec          mov     rbp,rsp
    00000000`004026a6 56              push    rsi
    00000000`004026a7 57              push    rdi
    00000000`004026a8 4154            push    r12
    00000000`004026aa 4881ec08020000  sub     rsp,208h
    00000000`004026b1 488bf1          mov     rsi,rcx
    00000000`004026b4 488bfa          mov     rdi,rdx
    00000000`004026b7 4c8be1          mov     r12,rcx
    00000000`004026ba 4d03e1          add     r12,r9
    00000000`004026bd 4883ec20        sub     rsp,20h
    00000000`004026c1 488d8de4feffff  lea     rcx,[rbp-11Ch] ;Correct
    00000000`004026c8 488b5520        mov     rdx,qword ptr [rbp+20h]
    00000000`004026cc ff15fe8f0000    call    qword ptr [rip+8FFEh]
    00000000`004026d2 4883c420        add     rsp,20h

    JWasm version 207bw compiles it incorrectly,
    and so does JWasm 206:
    mdi64!HeadersTo64bits:
    00000000`0040268e 48894c2408      mov     qword ptr [rsp+8],rcx
    00000000`00402693 4889542410      mov     qword ptr [rsp+10h],rdx
    00000000`00402698 4c89442418      mov     qword ptr [rsp+18h],r8
    00000000`0040269d 4c894c2420      mov     qword ptr [rsp+20h],r9
    00000000`004026a2 55              push    rbp
    00000000`004026a3 488bec          mov     rbp,rsp
    00000000`004026a6 56              push    rsi
    00000000`004026a7 57              push    rdi
    00000000`004026a8 4154            push    r12
    00000000`004026aa 4881ec08020000  sub     rsp,208h
    00000000`004026b1 488bf1          mov     rsi,rcx
    00000000`004026b4 488bfa          mov     rdi,rdx
    00000000`004026b7 4c8be1          mov     r12,rcx
    00000000`004026ba 4d03e1          add     r12,r9
    00000000`004026bd 4883ec20        sub     rsp,20h
    00000000`004026c1 488d8ddcfeffff  lea     rcx,[rbp-124h] ;Incorrect
    00000000`004026c8 488b5520        mov     rdx,qword ptr [rbp+20h]
    00000000`004026cc ff15fe8f0000    call    qword ptr [rip+8FFEh]
    00000000`004026d2 4883c420        add     rsp,20h

    thanks

    regards
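
The two listings in this post differ only in the bytes of the `lea rcx` instruction: `488d8de4feffff` with 205 and `488d8ddcfeffff` with 206/207. The prefix `48 8d 8d` encodes `lea rcx,[rbp+disp32]`, and the last four bytes are the displacement in little-endian order. A small C sketch (the function name `disp32_le` is an illustrative choice) decodes them:

```c
#include <assert.h>
#include <stdint.h>

/* Decodes a little-endian 32-bit displacement as it appears in the
 * instruction bytes, without relying on implementation-defined casts. */
int32_t disp32_le(uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3)
{
    uint32_t u = (uint32_t)b0 | ((uint32_t)b1 << 8) |
                 ((uint32_t)b2 << 16) | ((uint32_t)b3 << 24);

    if (u & 0x80000000u)
        return -(int32_t)(~u) - 1;   /* two's complement negative value */
    return (int32_t)u;
}

/* Difference between the 205 and the 206/207 displacement, in bytes. */
int32_t lea_displacement_shift(void)
{
    int32_t v205 = disp32_le(0xe4, 0xfe, 0xff, 0xff);
    int32_t v207 = disp32_le(0xdc, 0xfe, 0xff, 0xff);

    assert(v205 < 0 && v207 < 0);    /* both are below rbp */
    return v205 - v207;
}
```

The displacements decode to -11Ch and -124h, so the newer versions place `szName` 8 bytes lower relative to rbp while still reserving the same 208h bytes.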

     
  • habran

    habran - 2011-11-06

    Hi again,

    JWasm version 207bw can not compile these macros:
    GetGValue MACRO arg:REQ
    IFDIFI <arg>,<eax>
       mov eax, arg
    ENDIF
    shr eax, 8
    and eax, 0ffh
    ENDM <eax>

    GetBValue MACRO arg:REQ
    IFDIFI <arg>,<eax>
       mov eax, arg
    ENDIF
    shr eax, 16
        and eax, 0ffh
    ENDM <eax>

    GetRValue MACRO arg:REQ
    IFDIFI <arg>,<eax>
       mov eax, arg
    ENDIF
    and eax, 0ffh
    ENDM <eax>
    it throws : Error A2209: Syntax error: GetBValue
                      Error A2209: Syntax error: GetGValue
                      Error A2209: Syntax error: GetRValue

    JWasm version 205bw works fine

    best regards
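
For readers less familiar with the shift-and-mask steps in these three macros, the same operations look like this in C. The sketch assumes a COLORREF-style 32-bit value laid out as 0x00BBGGRR; the function names are illustrative and are not the Windows macros themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Same steps as the macros: optional shift, then mask the low byte. */
uint32_t get_r(uint32_t rgb) { return rgb & 0xffu; }
uint32_t get_g(uint32_t rgb) { return (rgb >> 8) & 0xffu; }
uint32_t get_b(uint32_t rgb) { return (rgb >> 16) & 0xffu; }

/* Small self-check with an arbitrary sample colour. */
void check_rgb_example(void)
{
    uint32_t colour = 0x00332211u;   /* B = 33h, G = 22h, R = 11h */

    assert(get_r(colour) == 0x11u);
    assert(get_g(colour) == 0x22u);
    assert(get_b(colour) == 0x33u);
}
```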

     
  • japheth

    japheth - 2011-11-07

    Habran,
    > JWasm version 207bw can not compile these macros:

    Please post bugs in the "Bugs Tracker"! Also, please don't add new bugs to existing threads. These rules ensure that no bug is forgotten.

    FYI: I confirm the "stack alignment" bug. It's a side-effect of a bugfix in v2.05.

     
  • habran

    habran - 2011-11-08

    Hi Japheth,

    The errors happened because of a conflict with the macros of the same names in wingdi.inc

    GetRValue macro rgb
    exitm <( ( rgb ) ) >
    endm
    GetGValue macro rgb
    exitm <( ( ( ( rgb ) )  shr  8 ) ) >
    endm
    GetBValue macro rgb
    exitm <( ( ( rgb )  shr  16 ) ) >
    endm
    I have changed GetRValue into GetaRValue and it works fine
    However, why did JWasm 205 not produce an error?
    Also, these macros from wingdi.inc are not working
    maybe I don't know how to use them 

    Anyway, thank you so much for taking care of that stack alignment
    I spent a lot of time looking for the bug in my program and couldn't find it
    then I realized that it is a bug in the compiler

    best regards

    BTW what is a "Bugs Tracker"?
    I found only  .err file in my folder and there was the same message as I posted to you
    nothing else

     
  • habran

    habran - 2011-11-12

    I apologize for my ignorance

     
  • habran

    habran - 2012-01-16

    Hi Japheth
    sorry about your difficulties, I know how it feels when you cannot do programming

    while I was waiting for you to come back I looked into that stack alignment problem and found what was wrong

    here is the original source from  v2.06e:

    file: proc.c, line:413

    #if AMD64_SUPPORT
        /* adjust start displacement for Win64 FRAME procs.
         * v2.06: the list may contain xmm registers, which have size 16!
         */
        if ( info->isframe ) {
            uint_16 *regs = info->regslist;
            int sizestd = 0;
            int sizexmm = 0;
            if ( regs )
                for( cnt = *regs++; cnt; cnt--, regs++ )
                    if ( GetValueSp( *regs ) & OP_XMM )
                        sizexmm += 16;
                    else
                        sizestd += 8;
            displ = sizexmm + sizestd;
            if ( sizestd & 0xf )                      // problem is here because not checking if there is any xmm register or not
                displ += 8;                             // just checking for odd or even
        }
    #endif

    here is the correct source:

    #if AMD64_SUPPORT
        /* adjust start displacement for Win64 FRAME procs.
         * v2.06: the list may contain xmm registers, which have size 16!
         */
        if ( info->isframe ) {
            uint_16 *regs = info->regslist;
            int sizestd = 0;
            int sizexmm = 0;
            if ( regs )
                for( cnt = *regs++; cnt; cnt--, regs++ )
                    if ( GetValueSp( *regs ) & OP_XMM )
                        sizexmm += 16;
                    else
                        sizestd += 8;
            displ = sizexmm + sizestd;
            if (( sizestd & 0xf ) && sizexmm)            // is there any xmm register?
                displ += 8;

        }
    #endif

    now it works fine

    I hope you come back soon
    we need you

    best regards
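
The effect of the changed condition can be checked with a simplified, self-contained model of the two variants quoted in this post. It is only a sketch: the register list is reduced to two counts instead of JWasm's `regslist`, and the function names `displ_original` and `displ_modified` are invented for illustration.

```c
#include <assert.h>

/* v2.06e variant: pads whenever the 8-byte registers add up to an odd
 * multiple of 8, whether or not xmm registers are saved. */
int displ_original(int std_regs, int xmm_regs)
{
    int sizestd = 8 * std_regs;
    int sizexmm = 16 * xmm_regs;
    int displ = sizexmm + sizestd;

    assert(std_regs >= 0 && xmm_regs >= 0);
    if (sizestd & 0xf)
        displ += 8;
    return displ;
}

/* Proposed variant: pads only when xmm registers are saved as well. */
int displ_modified(int std_regs, int xmm_regs)
{
    int sizestd = 8 * std_regs;
    int sizexmm = 16 * xmm_regs;
    int displ = sizexmm + sizestd;

    assert(std_regs >= 0 && xmm_regs >= 0);
    if ((sizestd & 0xf) && sizexmm)
        displ += 8;
    return displ;
}
```

For three general-purpose registers and no xmm registers (as in `USES rsi rdi r12`), the original returns 32 and the modified version 24; with an even register count, or when an xmm register is saved, both return the same value.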

     
  • Hjort Nidudsson

    Hjort Nidudsson - 2012-07-05

    >> that way it is possible to have faster program with less code
    >
    > Yes, it's more efficient. OTOH, this approach requires to use PUSH/POP very carefully inside the
    > procedure; in any case, the programmer must know exactly what he's doing. Unlike C, which has full
    > control of the stack, the assembler allows more freedom - and hence more possibilities to add bugs.

    The reason for these bugs is usually readability. You start by hard-coding the locals (mov ,eax), then naming them (result equ ).

    One way of handling this is to use a struct:

    S_OUTPUT    STRUC
    OP_filep    dd ?
    OP_format   dd ?
    OP_charsout     d? ?
    OP_hexoff   d? ?
    OP_state    d? ?
    OP_curadix  d? ?
    OP_prefix   db 2 dup(?)
    OP_count    d? ?
    OP_prefixlen    d? ?
    OP_no_output    d? ?
    OP_fldwidth d? ?
    OP_padding  d? ?
    OP_text     dd ?
    OP_capitalize   d? ?
    ifdef __f__
    OP_numeax   dd ?
    OP_numedx   dd ?
    else
    OP_number   dd ?
    endif
    OP_ddtemp   dd ?
    OP_dwtemp   d? ?
    OP_buffer   db BUFFERSIZE dup(?)
    OP_STACK    dd ? ; [(E)BP]
    ifndef __c__
    OP_CSIP     d? ?
    endif
    ifdef __CDECL__
    OP_ARGfile  dd ?
    OP_ARGformat    dd ?
    OP_argp     dd ?
    else
    OP_argp     dd ?
    OP_ARGformat    dd ?
    OP_ARGfile  dd ?
    endif
    S_OUTPUT    ENDS
        .code
        ASSUME bp?:ptr S_OUTPUT
    _output proc _CType public uses bx? si? di? bp? filep:dword,
        format:dword, argp:dword
    local   OP[S_OUTPUT.OP_STACK]:byte
        lea bp?,OP
        movmx [bp?].OP_format,[bp?].OP_ARGformat
        movmx [bp?].OP_filep,[bp?].OP_ARGfile
        sub ax?,ax?
        mov [bp?].OP_count,ax?
        mov [bp?].OP_charsout,ax?
        mov [bp?].OP_state,ax?
    

    There should be some method for handling this, a reversed stack struct:

    MyStack SSTRUC
    Label_0 db ?
    Label_1 db ?
    Label_2 db ?
    MyStack ENDS
    MyStack.Label_0 = -3
    MyStack.Label_1 = -2
    MyStack.Label_2 = -1
    
     
  • Hjort Nidudsson

    Hjort Nidudsson - 2012-07-09

    Well, a bit off topic, but I try to convince myself that there is a very simple solution to this..

    Sizing both arguments and locals in one struct seems to be the simplest way of doing this. Redefining the labels relative to base enables direct access.

    foo struc
    l1  p? ?
    l2  d? ?
    base    dd ?    ; (R|E)BP -- sizeof(locals)
    ifdef __64__
        dd ?
    endif
    ifndef __c__
        d? ?
    endif
    ifdef __CDECL__
    a1      p? ?
    a2      p? ?
    else
    a2      p? ?
    a1      p? ?
    endif
    foo ends
    P_l1    equ <(foo.l1-foo.base)>
    P_l2    equ <(foo.l2-foo.base)>
    P_a1    equ <(foo.a1-foo.base)>
    P_a2    equ <(foo.a2-foo.base)>
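
The same "offset relative to base" trick can be shown in C with `offsetof`. This is a sketch for a hypothetical 32-bit cdecl-style frame; the struct `frame32`, the `ret` member and the `REL` macro are names made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit frame, lowest address first: locals, saved frame
 * pointer (the base), return address, then the arguments. */
struct frame32 {
    uint32_t l1;
    uint32_t l2;
    uint32_t base;
    uint32_t ret;
    uint32_t a1;
    uint32_t a2;
};

/* Offset of a member relative to `base`, like (foo.member - foo.base). */
#define REL(member) \
    ((long)offsetof(struct frame32, member) - \
     (long)offsetof(struct frame32, base))

/* The base itself must sit at relative offset 0. */
void check_frame32(void)
{
    assert(REL(base) == 0);
}
```

Here `REL(l1)` is -8 and `REL(a1)` is 8, so locals get negative offsets and arguments positive ones, all derived from a single struct definition.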
    

    The _output function is basically a large switch with a lot of static functions, all using the same stack frame. In addition to this there is also a math section defined in different files. Child functions will then use the struct:

    OPST_percent:
        sub ax?,ax?
        mov [bp?].OP_no_output,ax?
        mov [bp?].OP_fldwidth,ax?
        mov [bp?].OP_prefixlen,ax?
        mov [bp?].OP_capitalize,ax?
        mov si?,ax? ; bufferiswide (default)
        mov di?,ax? ; precision
        dec di?
        ret
    

    The definition of the proc in 16/16/32/64:

    _output  . . . .P Near   0000     _TEXT    001D     Public   PASCAL
      arg1 . . . . .Word          bp + 0008
      arg2 . . . . .Word          bp + 0006
      arg3 . . . . .Word          bp + 0004
      OP . . . . . .Byte[522]         bp - 020A
    _output  . . . .P Far    0000     _TEXT    001D     Public   PASCAL
      arg1 . . . . .DWord         bp + 000E
      arg2 . . . . .DWord         bp + 000A
      arg3 . . . . .DWord         bp + 0006
      OP . . . . . .Byte[522]         bp - 020A
    _output  . . . .P Near   00000000 _TEXT    0000001F Public   STDCALL
      arg1 . . . . .DWord         ebp + 0008
      arg2 . . . . .DWord         ebp + 000C
      arg3 . . . . .DWord         ebp + 0010
      OP . . . . . .Byte[524]         ebp - 020C
    _output  . . . .P Near   00000000 _TEXT    0000002C Public   FASTCALL
      arg3 . . . . .QWord         rbp + 0020
      arg2 . . . . .QWord         rbp + 0018
      arg1 . . . . .QWord         rbp + 0010
      OP . . . . . .Byte[528]         rbp - 0210
    

    If the stack issue is handled by the proc definition, and invoke is used, the code becomes more readable, and to some degree also portable.

    include clib.inc
    include stdio.inc
        .code
    printf  proc _CDecl public format:ptr byte, argptr:VARARG
        invoke  _stbuf,addr stdout
        push    ax?
        invoke  _output,addr stdout,format,addr argptr
        pop dx?
        push    ax?
        invoke  _ftbuf,dx?,addr stdout
        pop ax?
        ret
    printf  endp
        end
    
     
