From: Marat D. <ma...@gm...> - 2012-11-20 15:37:53
Dear NASM developers,

I work on a high-performance library of optimized functions and use NASM to assemble the x86/x86-64 specific implementations. For high-performance code on some x86 microarchitectures (e.g. Intel Atom, Intel Nehalem, AMD Bulldozer) it is essential to align groups of instructions on certain boundaries (8 or 16 bytes) to achieve full CPU front-end performance.

There are three ways to align instruction groups on an 8- or 16-byte boundary: insert NOPs, make instructions longer by adding prefixes, or make instructions longer by using longer instruction forms.

1. Since NOPs consume decoder resources, they do not help to improve decoder performance.
2. Adding instruction prefixes helps to a certain degree, but CPU decoders are limited in the number of instruction prefixes they can decode per cycle, so this technique has limited use.
3. Using different (longer) encoding forms is the optimal solution, but it requires support from the assembler.

NASM already supports some specifications of instruction forms, e.g.

    MOV ecx, [esi]              ; encoded without memory displacement
    MOV ecx, [byte esi]         ; encoded with 8-bit memory displacement
    MOV ecx, [dword esi]        ; encoded with 32-bit memory displacement

    AND ecx, 0Fh                ; encoded with 8-bit immediate
    AND ecx, dword 0Fh          ; encoded with 32-bit immediate

    MOV ecx, [eax * 2]          ; encoded as [eax + eax*1] without offset
    MOV ecx, [nosplit eax * 2]  ; encoded as [eax*2] with offset

I would like this functionality in NASM to be extended to more instruction forms, and suggest new keywords acc, modrm, sib, rex, vex3:

The acc keyword forces NASM to use the special rax/eax/ax/al encoding form.
Example for acc keyword:

    ADD eax, 32                 ; encoded as ModR/M + imm8
    acc ADD eax, 32             ; encoded as special eax form + imm32

The modrm keyword forces NASM to use the ModR/M encoding.
Example for modrm keyword:

    ADD al, 32                  ; encoded as special al form + imm8
    modrm ADD al, 32            ; encoded as ModR/M form + imm8 (1 byte longer than the above version)
    PUSH ecx                    ; encoded as 50+rd
    modrm PUSH ecx              ; encoded as FF /6

The sib keyword forces NASM to use a SIB byte even if ModR/M alone would be enough.
Example for sib keyword:

    MOV ecx, [esi + 4]          ; encoded as ModR/M + disp8
    MOV ecx, [sib esi + 4]      ; encoded as ModR/M + SIB + disp8

Example for rex keyword:

    MOV ecx, [rsi]              ; encoded without REX
    rex MOV ecx, [rsi]          ; encoded with REX

Example for vex3 keyword:

    VPADDD xmm0, xmm0, xmm0        ; encoded with 2-byte VEX prefix
    vex3 VPADDD xmm0, xmm0, xmm0   ; encoded with 3-byte VEX prefix

Is there any chance of getting these features into NASM?

Kind regards,
Marat Dukhan
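
To make the size differences behind the acc and modrm proposals concrete, here is a minimal Python sketch that hand-encodes both forms of ADD eax, imm and PUSH r32 and reports their lengths. It is only an illustration: the helper names (add_eax_imm8, add_eax_imm32, push_short, push_modrm) are hypothetical, nothing here is NASM output, and the acc/modrm keywords themselves remain a proposal.

```python
import struct

ECX = 1  # register number of ecx in the opcode/ModR/M register field


def add_eax_imm8(imm: int) -> bytes:
    """ADD eax, imm8 in the ModR/M form: 83 /0 ib (sign-extended byte)."""
    modrm = 0xC0 | (0 << 3) | 0  # mod=11 (register), reg=/0 (ADD), rm=eax
    return bytes([0x83, modrm]) + struct.pack("<b", imm)


def add_eax_imm32(imm: int) -> bytes:
    """ADD eax, imm32 in the accumulator form: 05 id (no ModR/M byte)."""
    return bytes([0x05]) + struct.pack("<i", imm)


def push_short(reg: int) -> bytes:
    """PUSH r32 in the short form: 50+rd."""
    return bytes([0x50 + reg])


def push_modrm(reg: int) -> bytes:
    """PUSH r/m32 with a register operand: FF /6."""
    modrm = 0xC0 | (6 << 3) | reg  # mod=11 (register), reg=/6 (PUSH)
    return bytes([0xFF, modrm])


# Print each encoding with its length so the padding gained is visible.
for name, code in [
    ("ADD eax, 32 (ModR/M + imm8)", add_eax_imm8(32)),
    ("ADD eax, 32 (eax form + imm32)", add_eax_imm32(32)),
    ("PUSH ecx (50+rd)", push_short(ECX)),
    ("PUSH ecx (FF /6)", push_modrm(ECX)),
]:
    print(f"{name:32s} {code.hex(' '):16s} {len(code)} bytes")
```

Under these assumptions, choosing the longer form of ADD eax, 32 gains two bytes of padding and the longer form of PUSH ecx gains one, without adding a NOP or a prefix for the decoder to process.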