#183 AVX mode

Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2010-12-23
Created: 2010-12-23
Creator: Jasper Neumann
Private: No

I would like to see an AVX mode variable which, when set, converts every suitable SSE instruction to the corresponding AVX variant. In almost all cases this simply means prepending the letter "v" to the mnemonic, e.g. "movaps xmm1,xmm2" => "vmovaps xmm1,xmm2".

Discussion

  • While prepending "V" (and replicating operands as needed)
    seemingly does the trick, you really are looking at 2 different
    operations: the 128-bit legacy SSE ops retain upper bits, but
    the 128-bit VEX ops clear them.

    That said, the assembler itself should implement the ISA. By
    contrast, convenience features like the one you prefer should
    be handled by optional macros.
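
    A minimal sketch of the difference described above (the register choices are arbitrary and not taken from the request):

    ```nasm
    ; Legacy SSE encoding: two operands, the destination is also the first source.
    ; Bits 255:128 of ymm0 are left unchanged.
            addps   xmm0, xmm1

    ; VEX encoding with the destination replicated as first source: same result
    ; in the low 128 bits, but bits 255:128 of ymm0 are zeroed.
            vaddps  xmm0, xmm0, xmm1

    ; VEX encoding with a separate destination: xmm0 is not overwritten.
            vaddps  xmm2, xmm0, xmm1
    ```

    So a plain "v" prefix only yields an equivalent routine when nothing depends on the upper halves of the ymm registers.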

     
  • Jasper Neumann
    2010-12-23

    Yes, the AVX commands clear the upper bits. However this is irrelevant if only the lower 128 bits are used.
    I have some procedures coded as macros which can optionally create AVX commands and use the AVX features (essentially the extra target operand) only at some places.
    If I have understood Agner Fog's comments about AVX states (http://www.agner.org/optimize/optimizing_assembly.pdf, chapter 13.6) correctly it makes sense to encode a routine completely in AVX or completely in SSE in order to avoid potentially costly state changes (even if called from state B).

    Using the ymm registers (the high bits) is currently only meaningful for floats; I end all such procedures with vzeroupper (Linux / Win32 / Win64) or vzeroall (Linux / Win32) in order to avoid state C.
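
    As a hedged sketch of that pattern: the routine name add8f is hypothetical, and passing the two pointers in eax and edx is an assumption that merely mirrors the example further down in this thread.

    ```nasm
            bits 32
            global  add8f
    ; Hypothetical routine: adds the 8 floats at [edx] to the 8 floats at [eax].
    add8f:
            vmovups ymm0, [eax]             ; load 8 floats (unaligned load)
            vaddps  ymm0, ymm0, [edx]       ; ymm0 = ymm0 + [edx]
            vmovups [eax], ymm0             ; store the result
            vzeroupper                      ; zero bits 255:128 of all ymm registers
            ret
    ```

    The vzeroupper just before ret is the part that matters here; it keeps the caller's SSE code from seeing dirty upper halves.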

    In the meantime I have helped myself by prepending a define "avx" just before every SSE command. This is defined empty (for SSE) or "v %+" (for AVX). This kludge works but needs "unnecessary" work, and I would simply like to get rid of it.

    The proposed mode variable could take the values strict SSE, strict AVX and "best of SSE and AVX" (shortest command).

     
  • H. Peter Anvin
    2010-12-23

    I would agree this is probably best implemented as a macro package. If someone, like yourself, develops (and maintains!) one I would be willing to consider adding it as one of the builtin macro packages.

     
  • Jasper Neumann
    2011-01-11

    Well, I have collected all the affected commands and prepared a macro.
    It's usable but I'm not too happy with it; I still would prefer to simply write movdqu instead of ?movdqu.
    Just overriding the built-in movdqu with a define might also be possible for most cases but then we have problems with mmx-commands and e.g. movsd which can also be a string-move-command. So this approach is sure to fail sooner or later.

    Here is the macro:

    %macro assign_avx 1
    ; assign is_avx and the macros avx and ?xxx
    ; avx can be used to prepend sse commands such as "avx movaps xmm1,xmm2"
    ; is_avx can be used to check the avx "state"
    %if %1<>0
    %define avx v %+
    %assign is_avx 1
    %else
    %define avx
    %assign is_avx 0
    %endif
    %idefine ?addpd avx addpd
    ; ... (analogous line for each of the other commands, see below) <<<
    %endmacro

    Intended usage: Prepend the code (i.e. everything or one proc) with
    assign_avx x
    where x is 0 or 1 and for safety append it with
    assign_avx 0
    For all sse commands use the variants with the prepended "?".
    It makes sense to pre-initialize the avx mode to 0 or with e.g.
    %ifdef use_avx
    assign_avx 1
    %else
    assign_avx 0
    %endif
    where use_avx can be set e.g. with the nasm command line.

    Example usage (produces the 2 procs avgb$sse2 and avgb$avx):
    %macro __avgb 2
    assign_avx %1 ; use avx
    ; %2 ; name decorator
    global avgb %+ %2
    avgb %+ %2:
    ?movdqu xmm0,[eax]
    ?movdqu xmm1,[edx]
    ?pavgb xmm0,xmm1
    ?movdqu [eax],xmm0
    ret
    assign_avx 0 ; reset for safety
    %endmacro
    ;
    __avgb 0,$sse2
    __avgb 1,$avx

    The program which uses the thereby created external routines can then select the appropriate one by checking the name decorators (e.g. "$avx").

    Here are the affected commands:

    ADDPD
    ADDPS
    ADDSD
    ADDSS
    ADDSUBPD
    ADDSUBPS
    AESDEC
    AESDECLAST
    AESENC
    AESENCLAST
    AESIMC
    AESKEYGENASSIST
    ANDNPD
    ANDNPS
    ANDPD
    ANDPS
    BLENDPD
    BLENDPS
    BLENDVPD
    BLENDVPS
    CMPEQPD
    CMPEQPS
    CMPEQSD
    CMPEQSS
    CMPLEPD
    CMPLEPS
    CMPLESD
    CMPLESS
    CMPLTPD
    CMPLTPS
    CMPLTSD
    CMPLTSS
    CMPNEQPD
    CMPNEQPS
    CMPNEQSD
    CMPNEQSS
    CMPNLEPD
    CMPNLEPS
    CMPNLESD
    CMPNLESS
    CMPNLTPD
    CMPNLTPS
    CMPNLTSD
    CMPNLTSS
    CMPORDPD
    CMPORDPS
    CMPORDSD
    CMPORDSS
    CMPPD
    CMPPS
    CMPSD
    CMPSS
    CMPUNORDPD
    CMPUNORDPS
    CMPUNORDSD
    CMPUNORDSS
    COMISD
    COMISS
    CVTDQ2PD
    CVTDQ2PS
    CVTPD2DQ
    CVTPD2PS
    CVTPS2DQ
    CVTPS2PD
    CVTSD2SI
    CVTSD2SS
    CVTSI2SD
    CVTSI2SS
    CVTSS2SD
    CVTSS2SI
    CVTTPD2DQ
    CVTTPS2DQ
    CVTTSD2SI
    CVTTSS2SI
    DIVPD
    DIVPS
    DIVSD
    DIVSS
    DPPD
    DPPS
    EXTRACTPS
    HADDPD
    HADDPS
    HSUBPD
    HSUBPS
    INSERTPS
    LDDQU
    LDMXCSR
    MASKMOVDQU
    MAXPD
    MAXPS
    MAXSD
    MAXSS
    MINPD
    MINPS
    MINSD
    MINSS
    MOVAPD
    MOVAPS
    MOVD
    MOVDDUP
    MOVDQA
    MOVDQU
    MOVHLPS
    MOVHPD
    MOVHPS
    MOVLHPS
    MOVLPD
    MOVLPS
    MOVMSKPD
    MOVMSKPS
    MOVNTDQ
    MOVNTDQA
    MOVNTPD
    MOVNTPS
    MOVQ
    MOVSD
    MOVSHDUP
    MOVSLDUP
    MOVSS
    MOVUPD
    MOVUPS
    MPSADBW
    MULPD
    MULPS
    MULSD
    MULSS
    ORPD
    ORPS
    PABSB
    PABSD
    PABSW
    PACKSSDW
    PACKSSWB
    PACKUSDW
    PACKUSWB
    PADDB
    PADDD
    PADDQ
    PADDSB
    PADDSW
    PADDUSB
    PADDUSW
    PADDW
    PALIGNR
    PAND
    PANDN
    PAVGB
    PAVGW
    PBLENDVB
    PBLENDW
    PCLMULHQHQDQ
    PCLMULHQLQDQ
    PCLMULLQHQDQ
    PCLMULLQLQDQ
    PCLMULQDQ
    PCMPEQB
    PCMPEQD
    PCMPEQQ
    PCMPEQW
    PCMPESTRI
    PCMPESTRM
    PCMPGTB
    PCMPGTD
    PCMPGTQ
    PCMPGTW
    PCMPISTRI
    PCMPISTRM
    PEXTRB
    PEXTRD
    PEXTRQ
    PEXTRW
    PHADDD
    PHADDSW
    PHADDW
    PHMINPOSUW
    PHSUBD
    PHSUBSW
    PHSUBW
    PINSRB
    PINSRD
    PINSRQ
    PINSRW
    PMADDUBSW
    PMADDWD
    PMAXSB
    PMAXSD
    PMAXSW
    PMAXUB
    PMAXUD
    PMAXUW
    PMINSB
    PMINSD
    PMINSW
    PMINUB
    PMINUD
    PMINUW
    PMOVMSKB
    PMOVSXBD
    PMOVSXBQ
    PMOVSXBW
    PMOVSXDQ
    PMOVSXWD
    PMOVSXWQ
    PMOVZXBD
    PMOVZXBQ
    PMOVZXBW
    PMOVZXDQ
    PMOVZXWD
    PMOVZXWQ
    PMULDQ
    PMULHRSW
    PMULHUW
    PMULHW
    PMULLD
    PMULLW
    PMULUDQ
    POR
    PSADBW
    PSHUFB
    PSHUFD
    PSHUFHW
    PSHUFLW
    PSIGNB
    PSIGND
    PSIGNW
    PSLLD
    PSLLDQ
    PSLLQ
    PSLLW
    PSRAD
    PSRAW
    PSRLD
    PSRLDQ
    PSRLQ
    PSRLW
    PSUBB
    PSUBD
    PSUBQ
    PSUBSB
    PSUBSW
    PSUBUSB
    PSUBUSW
    PSUBW
    PTEST
    PUNPCKHBW
    PUNPCKHDQ
    PUNPCKHQDQ
    PUNPCKHWD
    PUNPCKLBW
    PUNPCKLDQ
    PUNPCKLQDQ
    PUNPCKLWD
    PXOR
    RCPPS
    RCPSS
    ROUNDPD
    ROUNDPS
    ROUNDSD
    ROUNDSS
    RSQRTPS
    RSQRTSS
    SHUFPD
    SHUFPS
    SQRTPD
    SQRTPS
    SQRTSD
    SQRTSS
    STMXCSR
    SUBPD
    SUBPS
    SUBSD
    SUBSS
    UCOMISD
    UCOMISS
    UNPCKHPD
    UNPCKHPS
    UNPCKLPD
    UNPCKLPS
    XORPD
    XORPS
    ; -- end of list --

     
  • Jasper Neumann
    2011-01-11

    Some further notes:

    The "?xxx" macros are simply abbreviations of "avx xxx" and could be eliminated.
    A mode which uses the shortest possible command, i.e. SSE or VEX coded, is (if implementable at all) quite complicated when done with macros, whereas the assembler already knows which command is the shortest.
    The user must always pay attention to not forget the ? or avx prefix.
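
    To illustrate why "shortest" depends on the operands, here is a small sketch in 64-bit mode (the registers are chosen only to show both outcomes; the byte counts follow from the SSE and VEX encoding rules):

    ```nasm
            bits 64
            movaps  xmm0, xmm1              ; 3 bytes: 0F 28 /r
            vmovaps xmm0, xmm1              ; 4 bytes: 2-byte VEX prefix + opcode + ModRM
            pshufb  xmm8, xmm9              ; 6 bytes: 66, REX, 0F 38 00 /r
            vpshufb xmm8, xmm8, xmm9        ; 5 bytes: 3-byte VEX prefix + opcode + ModRM
    ```

    In the first pair the SSE form is shorter; in the second the VEX form wins because its prefix absorbs the 66, REX and 0F 38 bytes.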

     
  • > I still would prefer to simply write movdqu instead of ?movdqu.

    Attached find source code for how to do that.

    Since NASM doesn't support %REPTOK (see SF #1842438), I had
    to use nested macros, with a particular trick -- see code comment.

     
  • Jasper Neumann
    2011-01-12

    Your solution (2011-01-11) works most of the time. Thanks a lot!

    I have appended the lists for the different argument counts (not counting the optional extra operand which AVX allows for some, but not all, of these commands).

    I still have problems with the command movsd which obviously needs another macro without parameters since movsd is also a string command. If I define it "movsd" works but e.g. "rep movsd" does not.

    In avx mode the following commands will not compile if used on mmx registers:
    movd
    movq
    pabsb
    pabsd
    pabsw
    packssdw
    packsswb
    packuswb
    paddb
    paddd
    paddq
    paddsb
    paddsw
    paddusb
    paddusw
    paddw
    palignr
    pand
    pandn
    pavgb
    pavgw
    pcmpeqb
    pcmpeqd
    pcmpeqw
    pcmpgtb
    pcmpgtd
    pcmpgtw
    pextrw
    phaddd
    phaddsw
    phaddw
    phsubd
    phsubsw
    phsubw
    pinsrw
    pmaddubsw
    pmaddwd
    pmaxsw
    pmaxub
    pminsw
    pminub
    pmovmskb
    pmulhrsw
    pmulhuw
    pmulhw
    pmullw
    pmuludq
    por
    psadbw
    pshufb
    psignb
    psignd
    psignw
    pslld
    psllq
    psllw
    psrad
    psraw
    psrld
    psrlq
    psrlw
    psubb
    psubd
    psubq
    psubsb
    psubsw
    psubusb
    psubusw
    psubw
    punpckhbw
    punpckhdq
    punpckhwd
    punpcklbw
    punpckldq
    punpcklwd
    pxor
    ; end of list

    Here are 1-op commands:
    ldmxcsr
    stmxcsr
    ; end of list

    Here are 2-op commands:
    addpd
    addps
    addsd
    addss
    addsubpd
    addsubps
    aesdec
    aesdeclast
    aesenc
    aesenclast
    aesimc
    andnpd
    andnps
    andpd
    andps
    cmpeqpd
    cmpeqps
    cmpeqsd
    cmpeqss
    cmplepd
    cmpleps
    cmplesd
    cmpless
    cmpltpd
    cmpltps
    cmpltsd
    cmpltss
    cmpneqpd
    cmpneqps
    cmpneqsd
    cmpneqss
    cmpnlepd
    cmpnleps
    cmpnlesd
    cmpnless
    cmpnltpd
    cmpnltps
    cmpnltsd
    cmpnltss
    cmpordpd
    cmpordps
    cmpordsd
    cmpordss
    cmpunordpd
    cmpunordps
    cmpunordsd
    cmpunordss
    comisd
    comiss
    cvtdq2pd
    cvtdq2ps
    cvtpd2dq
    cvtpd2ps
    cvtps2dq
    cvtps2pd
    cvtsd2si
    cvtsd2ss
    cvtsi2sd
    cvtsi2ss
    cvtss2sd
    cvtss2si
    cvttpd2dq
    cvttps2dq
    cvttsd2si
    cvttss2si
    divpd
    divps
    divsd
    divss
    haddpd
    haddps
    hsubpd
    hsubps
    lddqu
    maskmovdqu
    maxpd
    maxps
    maxsd
    maxss
    minpd
    minps
    minsd
    minss
    movapd
    movaps
    movd
    movddup
    movdqa
    movdqu
    movhlps
    movhpd
    movhps
    movlhps
    movlpd
    movlps
    movmskpd
    movmskps
    movntdq
    movntdqa
    movntpd
    movntps
    movq
    movsd
    movshdup
    movsldup
    movss
    movupd
    movups
    mulpd
    mulps
    mulsd
    mulss
    orpd
    orps
    pabsb
    pabsd
    pabsw
    packssdw
    packsswb
    packusdw
    packuswb
    paddb
    paddd
    paddq
    paddsb
    paddsw
    paddusb
    paddusw
    paddw
    pand
    pandn
    pavgb
    pavgw
    pclmulhqhqdq
    pclmulhqlqdq
    pclmullqhqdq
    pclmullqlqdq
    pcmpeqb
    pcmpeqd
    pcmpeqq
    pcmpeqw
    pcmpgtb
    pcmpgtd
    pcmpgtq
    pcmpgtw
    phaddd
    phaddsw
    phaddw
    phminposuw
    phsubd
    phsubsw
    phsubw
    pmaddubsw
    pmaddwd
    pmaxsb
    pmaxsd
    pmaxsw
    pmaxub
    pmaxud
    pmaxuw
    pminsb
    pminsd
    pminsw
    pminub
    pminud
    pminuw
    pmovmskb
    pmovsxbd
    pmovsxbq
    pmovsxbw
    pmovsxdq
    pmovsxwd
    pmovsxwq
    pmovzxbd
    pmovzxbq
    pmovzxbw
    pmovzxdq
    pmovzxwd
    pmovzxwq
    pmuldq
    pmulhrsw
    pmulhuw
    pmulhw
    pmulld
    pmullw
    pmuludq
    por
    psadbw
    pshufb
    psignb
    psignd
    psignw
    pslld
    pslldq
    psllq
    psllw
    psrad
    psraw
    psrld
    psrldq
    psrlq
    psrlw
    psubb
    psubd
    psubq
    psubsb
    psubsw
    psubusb
    psubusw
    psubw
    ptest
    punpckhbw
    punpckhdq
    punpckhqdq
    punpckhwd
    punpcklbw
    punpckldq
    punpcklqdq
    punpcklwd
    pxor
    rcpps
    rcpss
    rsqrtps
    rsqrtss
    sqrtpd
    sqrtps
    sqrtsd
    sqrtss
    subpd
    subps
    subsd
    subss
    ucomisd
    ucomiss
    unpckhpd
    unpckhps
    unpcklpd
    unpcklps
    xorpd
    xorps
    ; end of list

    Here are 3-op commands:
    aeskeygenassist
    blendpd
    blendps
    blendvpd
    blendvps
    cmppd
    cmpps
    cmpsd
    cmpss
    dppd
    dpps
    extractps
    insertps
    mpsadbw
    palignr
    pblendvb
    pblendw
    pclmulqdq
    pcmpestri
    pcmpestrm
    pcmpistri
    pcmpistrm
    pextrb
    pextrd
    pextrq
    pextrw
    pinsrb
    pinsrd
    pinsrq
    pinsrw
    pshufd
    pshufhw
    pshuflw
    roundpd
    roundps
    roundsd
    roundss
    shufpd
    shufps
    ; end of list

     
  • > movsd

    It should be possible to handle this with a 0-n arg mmac.
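
    A rough sketch of what such a macro might look like, not the attached code. It assumes is_avx is set as in the assign_avx macro posted earlier, and it relies on NASM not re-expanding a %macro inside its own body, so the inner movsd is assembled as the plain instruction. Prefixed forms such as "rep movsd" are not covered here and would still need checking.

    ```nasm
    %assign is_avx 1                ; assumption: normally set via assign_avx

    %macro movsd 0-2
      %if %0 == 0
            movsd                   ; no operands: the string move
      %elif is_avx
            vmovsd  %1, %2          ; two operands, AVX mode
      %else
            movsd   %1, %2          ; two operands, SSE mode
      %endif
    %endmacro

            movsd                   ; assembled as the string instruction
            movsd   xmm0, [eax]     ; assembled as vmovsd xmm0, [eax]
    ```

    Only the two-operand form is shown; the three-operand register form of vmovsd would need a wider argument range.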

    > In avx mode the following commands will not compile if used on mmx

    Only until x86 introduces VEX encoding and 256-bit support for MMX. :)

     