nasm-devel Mailing List for The Netwide Assembler

Here are both patches as text/plain attachments. I think it may have
been gmail on my end that screwed them up.

On Tue, Feb 19, 2013 at 08:58:56PM -0800, Ben Rudiak-Gould wrote:
> In long mode relative offsets are always 32 bits sign-extended to 64
> bits and absolute near addresses are always 64 bits, regardless of the
> operand size.
> 
> Signed-off-by: Ben Rudiak-Gould <benrudiak_at_gmail.com>
---
> 
> diff --git a/disasm.c b/disasm.c
> index 46cec8a..50149d2 100644
> --- a/disasm.c
> +++ b/disasm.c
> @@ -532,22 +532,21 @@ static int matches(const struct itemplate *t, uint8_t *data,

It seems the SourceForge mailer has mangled the patch body, and I can't apply it :(
(hpa@, I believe it's time to set up our own Mailman?)

So could you please re-send both patches as attachments with
gor...@gm... CC'ed.



On 02/20/2013 09:23 AM, H. Peter Anvin wrote:
> On 02/19/2013 09:39 PM, Ben Rudiak-Gould wrote:
>> This adds "np" to a bunch of SSE-style instructions that should have
>> it, "norep" (which was implemented but unused) on quasi-SSE
>> instructions that use F2 and F3 as instruction extensions but 66 for
>> operand size, "nof3" (newly implemented) on a few instructions,
>> "norexw" on some instructions that have only 32-bit and 64-bit
>> versions, and one NOLONG. It also removes some incorrect "np"s,
>> changes some "f3"s to "f3i"s, and fixes the decoding of the
>> XCHG/NOP/PAUSE mess: F390 is always PAUSE even when rex.b=1 (at least
>> according to XED).
>
> It should have been REX.R not REX.B, to prevent:
>
> 	[rep] xchg r8,rax
>
> ... from being treated as NOP or PAUSE.
>

Ah, but despite the documentation it is REX.B, not REX.R.

And yes, I can confirm this applies to PAUSE but *NOT* NOP, at least on 
Sandy Bridge, i.e.:

	F3 49 90	- PAUSE	(no swap)
	49 90		- XCHG R8,RAX (registers do swap)

Odd, but that's how it works.

	-hpa


-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.
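
For reference, the decoding rule confirmed above can be restated as a small C function. This is a minimal sketch, not NASM's disasm.c: `struct prefixes`, `enum op90` and `decode_90()` are hypothetical names, only 64-bit mode is considered, and the PAUSE behaviour is taken from this thread (XED and Sandy Bridge) rather than checked on other CPUs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-instruction prefix state; only what matters for 0x90. */
struct prefixes {
    bool    f3;   /* an F3 (REP) prefix was seen          */
    uint8_t rex;  /* REX byte 0x40..0x4F, or 0 if absent  */
};

enum op90 { OP_NOP, OP_PAUSE, OP_XCHG_R8 };

/* Classify opcode 0x90 in 64-bit mode. */
static enum op90 decode_90(const struct prefixes *p)
{
    assert(p->rex == 0 || (p->rex & 0xF0) == 0x40);

    if (p->f3)
        return OP_PAUSE;      /* F3 49 90: still PAUSE, REX.B ignored      */
    if (p->rex & 0x01)        /* REX.B is bit 0 of the REX byte            */
        return OP_XCHG_R8;    /* 49 90: xchg r8,rax (41 90: xchg r8d,eax) */
    return OP_NOP;            /* 90, or 48 90 with REX.W only              */
}
```

Under these assumptions, `decode_90(&(struct prefixes){ true, 0x49 })` yields `OP_PAUSE`, while the same call with `f3` false yields `OP_XCHG_R8`. This mirrors the insns.dat change in the patch, where NOP keeps `norexb` and gains `nof3`, and PAUSE drops `norexb`.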




On 02/19/2013 09:39 PM, Ben Rudiak-Gould wrote:
> This adds "np" to a bunch of SSE-style instructions that should have
> it, "norep" (which was implemented but unused) on quasi-SSE
> instructions that use F2 and F3 as instruction extensions but 66 for
> operand size, "nof3" (newly implemented) on a few instructions,
> "norexw" on some instructions that have only 32-bit and 64-bit
> versions, and one NOLONG. It also removes some incorrect "np"s,
> changes some "f3"s to "f3i"s, and fixes the decoding of the
> XCHG/NOP/PAUSE mess: F390 is always PAUSE even when rex.b=1 (at least
> according to XED).

It should have been REX.R not REX.B, to prevent:

	[rep] xchg r8,rax

... from being treated as NOP or PAUSE.

	-hpa





This adds "np" to a bunch of SSE-style instructions that should have
it, "norep" (which was implemented but unused) on quasi-SSE
instructions that use F2 and F3 as instruction extensions but 66 for
operand size, "nof3" (newly implemented) on a few instructions,
"norexw" on some instructions that have only 32-bit and 64-bit
versions, and one NOLONG. It also removes some incorrect "np"s,
changes some "f3"s to "f3i"s, and fixes the decoding of the
XCHG/NOP/PAUSE mess: F390 is always PAUSE even when rex.b=1 (at least
according to XED).

Signed-off-by: Ben Rudiak-Gould <benrudiak_at_gmail.com>

diff --git a/assemble.c b/assemble.c
index 4f791ec..7b33df9 100644
--- a/assemble.c
+++ b/assemble.c
@@ -118,6 +118,8 @@
  * \323          - indicates fixed 64-bit operand size, REX on extensions only.
  * \324          - indicates 64-bit operand size requiring REX prefix.
  * \325          - instruction which always uses spl/bpl/sil/dil
+ * \326          - instruction not valid with 0xF3 REP prefix.  Hint for
+                   disassembler only; for SSE instructions.
  * \330          - a literal byte follows in the code stream, to be added
  *                 to the condition code value of the instruction.
  * \331          - instruction not valid with REP prefix.  Hint for
@@ -1061,6 +1063,9 @@ static int64_t calcsize(int32_t segment, int64_t offset, int bits,
             ins->rex |= REX_NH;
             break;

+        case 0326:
+            break;
+
         case 0330:
             codes++, length++;
             break;
@@ -1709,6 +1714,9 @@ static void gencode(int32_t segment, int64_t offset, int bits,
         case 0325:
             break;

+        case 0326:
+            break;
+
         case 0330:
             *bytes = *codes++ ^ condval[ins->condition];
             out(offset, segment, bytes, OUT_RAWDATA, 1, NO_SEG, NO_SEG);
diff --git a/disasm.c b/disasm.c
index 46cec8a..c28ebe2 100644
--- a/disasm.c
+++ b/disasm.c
@@ -819,6 +819,11 @@ static int matches(const struct itemplate *t, uint8_t *data,
 	    break;
 	}

+        case 0326:
+            if (prefix->rep == 0xF3)
+                return false;
+            break;
+
 	case 0331:
             if (prefix->rep)
                 return false;
diff --git a/insns.dat b/insns.dat
index a039106..0c3828d 100644
--- a/insns.dat
+++ b/insns.dat
@@ -178,18 +178,18 @@ BB0_RESET	void				[	0f 3a]					PENT,CYRIX,ND
 BB1_RESET	void				[	0f 3b]					PENT,CYRIX,ND
 BOUND		reg16,mem			[rm:	o16 62 /r]				186,NOLONG
 BOUND		reg32,mem			[rm:	o32 62 /r]				386,NOLONG
-BSF		reg16,mem			[rm:	o16 0f bc /r]				386,SM
-BSF		reg16,reg16			[rm:	o16 0f bc /r]				386
-BSF		reg32,mem			[rm:	o32 0f bc /r]				386,SM
-BSF		reg32,reg32			[rm:	o32 0f bc /r]				386
-BSF		reg64,mem			[rm:	o64 0f bc /r]				X64,SM
-BSF		reg64,reg64			[rm:	o64 0f bc /r]				X64
-BSR		reg16,mem			[rm:	o16 0f bd /r]				386,SM
-BSR		reg16,reg16			[rm:	o16 0f bd /r]				386
-BSR		reg32,mem			[rm:	o32 0f bd /r]				386,SM
-BSR		reg32,reg32			[rm:	o32 0f bd /r]				386
-BSR		reg64,mem			[rm:	o64 0f bd /r]				X64,SM
-BSR		reg64,reg64			[rm:	o64 0f bd /r]				X64
+BSF		reg16,mem			[rm:	o16 nof3 0f bc /r]			386,SM
+BSF		reg16,reg16			[rm:	o16 nof3 0f bc /r]			386
+BSF		reg32,mem			[rm:	o32 nof3 0f bc /r]			386,SM
+BSF		reg32,reg32			[rm:	o32 nof3 0f bc /r]			386
+BSF		reg64,mem			[rm:	o64 nof3 0f bc /r]			X64,SM
+BSF		reg64,reg64			[rm:	o64 nof3 0f bc /r]			X64
+BSR		reg16,mem			[rm:	o16 nof3 0f bd /r]			386,SM
+BSR		reg16,reg16			[rm:	o16 nof3 0f bd /r]			386
+BSR		reg32,mem			[rm:	o32 nof3 0f bd /r]			386,SM
+BSR		reg32,reg32			[rm:	o32 nof3 0f bd /r]			386
+BSR		reg64,mem			[rm:	o64 nof3 0f bd /r]			X64,SM
+BSR		reg64,reg64			[rm:	o64 nof3 0f bd /r]			X64
 BSWAP		reg32				[r:	o32 0f c8+r]				486
 BSWAP		reg64				[r:	o64 0f c8+r]				X64
 BT		mem,reg16			[mr:	o16 0f a3 /r]				386,SM
@@ -320,7 +320,7 @@ CMPXCHG486	mem,reg16			[mr:	o16 0f a7 /r]				486,SM,UNDOC,ND,LOCK
 CMPXCHG486	reg16,reg16			[mr:	o16 0f a7 /r]				486,UNDOC,ND
 CMPXCHG486	mem,reg32			[mr:	o32 0f a7 /r]				486,SM,UNDOC,ND,LOCK
 CMPXCHG486	reg32,reg32			[mr:	o32 0f a7 /r]				486,UNDOC,ND
-CMPXCHG8B	mem				[m:	hle 0f c7 /1]				PENT,LOCK
+CMPXCHG8B	mem				[m:	hle norexw 0f c7 /1]			PENT,LOCK
 CMPXCHG16B	mem				[m:	o64 0f c7 /1]				X64,LOCK
 CPUID		void				[	0f a2]					PENT
 CPU_READ	void				[	0f 3d]					PENT,CYRIX
@@ -715,7 +715,7 @@ LEA		reg64,mem			[rm:	o64 8d /r]				X64
 LEAVE		void				[	c9]					186
 LES		reg16,mem			[rm:	o16 c4 /r]				8086,NOLONG
 LES		reg32,mem			[rm:	o32 c4 /r]				386,NOLONG
-LFENCE		void				[	0f ae e8]				X64,AMD
+LFENCE		void				[	np 0f ae e8]				X64,AMD
 LFS		reg16,mem			[rm:	o16 0f b4 /r]				386
 LFS		reg32,mem			[rm:	o32 0f b4 /r]				386
 LFS		reg64,mem			[rm:	o64 0f b4 /r]				X64
@@ -774,9 +774,9 @@ LSS		reg64,mem			[rm:	o64 0f b2 /r]				X64
 LTR		mem				[m:	0f 00 /3]				286,PROT,PRIV
 LTR		mem16				[m:	0f 00 /3]				286,PROT,PRIV
 LTR		reg16				[m:	0f 00 /3]				286,PROT,PRIV
-MFENCE		void				[	0f ae f0]				X64,AMD
+MFENCE		void				[	np 0f ae f0]				X64,AMD
 MONITOR		void				[	0f 01 c8]				PRESCOTT
-MONITOR		reg_eax,reg_ecx,reg_edx		[---:	0f 01 c8]				PRESCOTT,ND
+MONITOR		reg_eax,reg_ecx,reg_edx		[---:	0f 01 c8]				PRESCOTT,NOLONG,ND
 MONITOR		reg_rax,reg_ecx,reg_edx		[---:	0f 01 c8]				X64,ND
 MOV		mem,reg_sreg			[mr:	8c /r]					8086,SW
 MOV		reg16,reg_sreg			[mr:	o16 8c /r]				8086
@@ -874,7 +874,7 @@ NEG		rm8				[m:	hle f6 /3]				8086,LOCK
 NEG		rm16				[m:	hle o16 f7 /3]				8086,LOCK
 NEG		rm32				[m:	hle o32 f7 /3]				386,LOCK
 NEG		rm64				[m:	hle o64 f7 /3]				X64,LOCK
-NOP		void				[	norexb 90]				8086
+NOP		void				[	norexb nof3 90]				8086
 NOP		rm16				[m:	o16 0f 1f /0]				P6
 NOP		rm32				[m:	o32 0f 1f /0]				P6
 NOP		rm64				[m:	o64 0f 1f /0]				X64
@@ -938,7 +938,7 @@ PADDUSW		mmxreg,mmxrm			[rm:	np o64nw 0f dd /r]			PENT,MMX,SQ
 PADDW		mmxreg,mmxrm			[rm:	np o64nw 0f fd /r]			PENT,MMX,SQ
 PAND		mmxreg,mmxrm			[rm:	np o64nw 0f db /r]			PENT,MMX,SQ
 PANDN		mmxreg,mmxrm			[rm:	np o64nw 0f df /r]			PENT,MMX,SQ
-PAUSE		void				[	norexb f3i 90]				8086
+PAUSE		void				[	f3i 90]					8086
 PAVEB		mmxreg,mmxrm			[rm:	o64nw 0f 50 /r]				PENT,MMX,SQ,CYRIX
 PAVGUSB		mmxreg,mmxrm			[rm:	o64nw 0f 0f /r bf]			PENT,3DNOW,SQ
 PCMPEQB		mmxreg,mmxrm			[rm:	np o64nw 0f 74 /r]			PENT,MMX,SQ
@@ -1177,7 +1177,7 @@ SCASB		void				[	repe ae]				8086
 SCASD		void				[	repe o32 af]				386
 SCASQ		void				[	repe o64 af]				X64
 SCASW		void				[	repe o16 af]				8086
-SFENCE		void				[	0f ae f8]				X64,AMD
+SFENCE		void				[	np 0f ae f8]				X64,AMD
 SGDT		mem				[m:	0f 01 /0]				286
 SHL		rm8,unity			[m-:	d0 /4]					8086
 SHL		rm8,reg_cl			[m-:	d2 /4]					8086
@@ -1480,7 +1480,7 @@ CVTTSS2SI	reg32,xmmrm			[rm:	f3 0f 2c /r]				KATMAI,SSE,SD,AR1
 CVTTSS2SI	reg64,xmmrm			[rm:	o64 f3 0f 2c /r]			X64,SSE,SD,AR1
 DIVPS		xmmreg,xmmrm128			[rm:	np 0f 5e /r]				KATMAI,SSE
 DIVSS		xmmreg,xmmrm32			[rm:	f3 0f 5e /r]				KATMAI,SSE
-LDMXCSR		mem32				[m:	0f ae /2]				KATMAI,SSE
+LDMXCSR		mem32				[m:	np 0f ae /2]				KATMAI,SSE
 MAXPS		xmmreg,xmmrm128			[rm:	np 0f 5f /r]				KATMAI,SSE
 MAXSS		xmmreg,xmmrm32			[rm:	f3 0f 5f /r]				KATMAI,SSE
 MINPS		xmmreg,xmmrm128			[rm:	np 0f 5d /r]				KATMAI,SSE
@@ -1511,7 +1511,7 @@ RSQRTSS		xmmreg,xmmrm32			[rm:	f3 0f 52 /r]				KATMAI,SSE
 SHUFPS		xmmreg,xmmrm128,imm8		[rmi:	np 0f c6 /r ib,u]			KATMAI,SSE
 SQRTPS		xmmreg,xmmrm128			[rm:	np 0f 51 /r]				KATMAI,SSE
 SQRTSS		xmmreg,xmmrm32			[rm:	f3 0f 51 /r]				KATMAI,SSE
-STMXCSR		mem32				[m:	0f ae /3]				KATMAI,SSE
+STMXCSR		mem32				[m:	np 0f ae /3]				KATMAI,SSE
 SUBPS		xmmreg,xmmrm128			[rm:	np 0f 5c /r]				KATMAI,SSE
 SUBSS		xmmreg,xmmrm32			[rm:	f3 0f 5c /r]				KATMAI,SSE
 UCOMISS		xmmreg,xmmrm32			[rm:	np 0f 2e /r]				KATMAI,SSE
@@ -1520,22 +1520,22 @@ UNPCKLPS	xmmreg,xmmrm128			[rm:	np 0f 14 /r]				KATMAI,SSE
 XORPS		xmmreg,xmmrm128			[rm:	np 0f 57 /r]				KATMAI,SSE

 ;# Introduced in Deschutes but necessary for SSE support
-FXRSTOR		mem				[m:	0f ae /1]				P6,SSE,FPU
-FXRSTOR64	mem				[m:	o64 0f ae /1]				X64,SSE,FPU
-FXSAVE		mem				[m:	0f ae /0]				P6,SSE,FPU
-FXSAVE64	mem				[m:	o64 0f ae /0]				X64,SSE,FPU
+FXRSTOR		mem				[m:	np 0f ae /1]				P6,SSE,FPU
+FXRSTOR64	mem				[m:	o64 np 0f ae /1]			X64,SSE,FPU
+FXSAVE		mem				[m:	np 0f ae /0]				P6,SSE,FPU
+FXSAVE64	mem				[m:	o64 np 0f ae /0]			X64,SSE,FPU

 ;# XSAVE group (AVX and extended state)
 ; Introduced in late Penryn ... we really need to clean up the handling
 ; of CPU feature bits.
-XGETBV		void				[	np 0f 01 d0]				NEHALEM
-XSETBV		void				[	np 0f 01 d1]				NEHALEM,PRIV
-XSAVE		mem				[m:	0f ae /4]				NEHALEM
-XSAVE64		mem				[m:	o64 0f ae /4]				LONG,NEHALEM
-XSAVEOPT	mem				[m:	0f ae /6]				FUTURE
-XSAVEOPT64	mem				[m:	o64 0f ae /6]				LONG,FUTURE
-XRSTOR		mem				[m:	0f ae /5]				NEHALEM
-XRSTOR64	mem				[m:	o64 0f ae /5]				LONG,NEHALEM
+XGETBV		void				[	0f 01 d0]				NEHALEM
+XSETBV		void				[	0f 01 d1]				NEHALEM,PRIV
+XSAVE		mem				[m:	np 0f ae /4]				NEHALEM
+XSAVE64		mem				[m:	o64 np 0f ae /4]			LONG,NEHALEM
+XSAVEOPT	mem				[m:	np 0f ae /6]				FUTURE
+XSAVEOPT64	mem				[m:	o64 np 0f ae /6]			LONG,FUTURE
+XRSTOR		mem				[m:	np 0f ae /5]				NEHALEM
+XRSTOR64	mem				[m:	o64 np 0f ae /5]			LONG,NEHALEM

 ; These instructions are not SSE-specific; they are
 ;# Generic memory operations
@@ -1544,7 +1544,7 @@ PREFETCHNTA	mem				[m:	0f 18 /0]				KATMAI
 PREFETCHT0	mem				[m:	0f 18 /1]				KATMAI
 PREFETCHT1	mem				[m:	0f 18 /2]				KATMAI
 PREFETCHT2	mem				[m:	0f 18 /3]				KATMAI
-SFENCE		void				[	0f ae f8]				KATMAI
+SFENCE		void				[	np 0f ae f8]				KATMAI

 ;# New MMX instructions introduced in Katmai
 MASKMOVQ	mmxreg,mmxreg			[rm:	np 0f f7 /r]				KATMAI,MMX
@@ -1576,13 +1576,13 @@ PSWAPD		mmxreg,mmxrm			[rm:	o64nw 0f 0f /r bb]			PENT,3DNOW,SQ
 ;# Willamette SSE2 Cacheability Instructions
 MASKMOVDQU	xmmreg,xmmreg			[rm:	66 0f f7 /r]				WILLAMETTE,SSE2
 ; CLFLUSH needs its own feature flag implemented one day
-CLFLUSH		mem				[m:	0f ae /7]				WILLAMETTE,SSE2
+CLFLUSH		mem				[m:	np 0f ae /7]				WILLAMETTE,SSE2
 MOVNTDQ		mem,xmmreg			[mr:	66 0f e7 /r]				WILLAMETTE,SSE2,SO
 MOVNTI		mem,reg32			[mr:	np 0f c3 /r]				WILLAMETTE,SD
 MOVNTI		mem,reg64			[mr:	o64 np 0f c3 /r]			X64,SQ
 MOVNTPD		mem,xmmreg			[mr:	66 0f 2b /r]				WILLAMETTE,SSE2,SO
-LFENCE		void				[	0f ae e8]				WILLAMETTE,SSE2
-MFENCE		void				[	0f ae f0]				WILLAMETTE,SSE2
+LFENCE		void				[	np 0f ae e8]				WILLAMETTE,SSE2
+MFENCE		void				[	np 0f ae f0]				WILLAMETTE,SSE2

 ;# Willamette MMX instructions (SSE2 SIMD Integer Instructions)
 MOVD		mem,xmmreg			[mr:	66 norexw 0f 7e /r]			WILLAMETTE,SSE2,SD
@@ -1722,20 +1722,20 @@ CVTPD2PS	xmmreg,xmmrm			[rm:	66 0f 5a /r]				WILLAMETTE,SSE2,SO
 CVTPI2PD	xmmreg,mmxrm			[rm:	66 0f 2a /r]				WILLAMETTE,SSE2,SQ
 CVTPS2DQ	xmmreg,xmmrm			[rm:	66 0f 5b /r]				WILLAMETTE,SSE2,SO
 CVTPS2PD	xmmreg,xmmrm			[rm:	np 0f 5a /r]				WILLAMETTE,SSE2,SQ
-CVTSD2SI	reg32,xmmreg			[rm:	f2 0f 2d /r]				WILLAMETTE,SSE2,SQ,AR1
-CVTSD2SI	reg32,mem			[rm:	f2 0f 2d /r]				WILLAMETTE,SSE2,SQ,AR1
+CVTSD2SI	reg32,xmmreg			[rm:	norexw f2 0f 2d /r]			WILLAMETTE,SSE2,SQ,AR1
+CVTSD2SI	reg32,mem			[rm:	norexw f2 0f 2d /r]			WILLAMETTE,SSE2,SQ,AR1
 CVTSD2SI	reg64,xmmreg			[rm:	o64 f2 0f 2d /r]			X64,SSE2,SQ,AR1
 CVTSD2SI	reg64,mem			[rm:	o64 f2 0f 2d /r]			X64,SSE2,SQ,AR1
 CVTSD2SS	xmmreg,xmmrm			[rm:	f2 0f 5a /r]				WILLAMETTE,SSE2,SQ
 CVTSI2SD	xmmreg,mem			[rm:	f2 0f 2a /r]				WILLAMETTE,SSE2,SD,AR1,ND
-CVTSI2SD	xmmreg,rm32			[rm:	f2 0f 2a /r]				WILLAMETTE,SSE2,SD,AR1
+CVTSI2SD	xmmreg,rm32			[rm:	norexw f2 0f 2a /r]			WILLAMETTE,SSE2,SD,AR1
 CVTSI2SD	xmmreg,rm64			[rm:	o64 f2 0f 2a /r]			X64,SSE2,SQ,AR1
 CVTSS2SD	xmmreg,xmmrm			[rm:	f3 0f 5a /r]				WILLAMETTE,SSE2,SD
 CVTTPD2PI	mmxreg,xmmrm			[rm:	66 0f 2c /r]				WILLAMETTE,SSE2,SO
 CVTTPD2DQ	xmmreg,xmmrm			[rm:	66 0f e6 /r]				WILLAMETTE,SSE2,SO
 CVTTPS2DQ	xmmreg,xmmrm			[rm:	f3 0f 5b /r]				WILLAMETTE,SSE2,SO
-CVTTSD2SI	reg32,xmmreg			[rm:	f2 0f 2c /r]				WILLAMETTE,SSE2,SQ,AR1
-CVTTSD2SI	reg32,mem			[rm:	f2 0f 2c /r]				WILLAMETTE,SSE2,SQ,AR1
+CVTTSD2SI	reg32,xmmreg			[rm:	norexw f2 0f 2c /r]			WILLAMETTE,SSE2,SQ,AR1
+CVTTSD2SI	reg32,mem			[rm:	norexw f2 0f 2c /r]			WILLAMETTE,SSE2,SQ,AR1
 CVTTSD2SI	reg64,xmmreg			[rm:	o64 f2 0f 2c /r]			X64,SSE2,SQ,AR1
 CVTTSD2SI	reg64,mem			[rm:	o64 f2 0f 2c /r]			X64,SSE2,SQ,AR1
 DIVPD		xmmreg,xmmrm			[rm:	66 0f 5e /r]				WILLAMETTE,SSE2,SO
@@ -1795,8 +1795,8 @@ VMFUNC		void				[	0f 01 d4]				VMX
 VMLAUNCH	void				[	0f 01 c2]				VMX
 VMLOAD		void				[	0f 01 da]				X64,VMX
 VMMCALL		void				[	0f 01 d9]				X64,VMX
-VMPTRLD		mem				[m:	0f c7 /6]				VMX
-VMPTRST		mem				[m:	0f c7 /7]				VMX
+VMPTRLD		mem				[m:	np 0f c7 /6]				VMX
+VMPTRST		mem				[m:	np 0f c7 /7]				VMX
 VMREAD		rm32,reg32			[mr:	np 0f 78 /r]				VMX,NOLONG,SD
 VMREAD		rm64,reg64			[mr:	o64nw np 0f 78 /r]			X64,VMX,SQ
 VMRESUME	void				[	0f 01 c3]				VMX
@@ -1878,7 +1878,7 @@ PCMPEQQ		xmmreg,xmmrm			[rm:	66 0f 38 29 /r]				SSE41
 PEXTRB		reg32,xmmreg,imm		[mri:	66 0f 3a 14 /r ib,u]			SSE41
 PEXTRB		mem8,xmmreg,imm			[mri:	66 0f 3a 14 /r ib,u]			SSE41
 PEXTRB		reg64,xmmreg,imm		[mri:	o64 66 0f 3a 14 /r ib,u]		SSE41,X64
-PEXTRD		rm32,xmmreg,imm			[mri:	66 0f 3a 16 /r ib,u]			SSE41
+PEXTRD		rm32,xmmreg,imm			[mri:	norexw 66 0f 3a 16 /r ib,u]			SSE41
 PEXTRQ		rm64,xmmreg,imm			[mri:	o64 66 0f 3a 16 /r ib,u]		SSE41,X64
 PEXTRW		reg32,xmmreg,imm		[mri:	66 0f 3a 15 /r ib,u]			SSE41
 PEXTRW		mem16,xmmreg,imm		[mri:	66 0f 3a 15 /r ib,u]			SSE41
@@ -1887,8 +1887,8 @@ PHMINPOSUW	xmmreg,xmmrm			[rm:	66 0f 38 41 /r]				SSE41
 PINSRB		xmmreg,mem,imm			[rmi:	66 0f 3a 20 /r ib,u]			SSE41,SB,AR2
 PINSRB		xmmreg,rm8,imm			[rmi:	nohi 66 0f 3a 20 /r ib,u]		SSE41,SB,AR2
 PINSRB		xmmreg,reg32,imm		[rmi:	66 0f 3a 20 /r ib,u]			SSE41,SB,AR2
-PINSRD		xmmreg,mem,imm			[rmi:	66 0f 3a 22 /r ib,u]			SSE41,SB,AR2
-PINSRD		xmmreg,rm32,imm			[rmi:	66 0f 3a 22 /r ib,u]			SSE41,SB,AR2
+PINSRD		xmmreg,mem,imm			[rmi:	norexw 66 0f 3a 22 /r ib,u]			SSE41,SB,AR2
+PINSRD		xmmreg,rm32,imm			[rmi:	norexw 66 0f 3a 22 /r ib,u]			SSE41,SB,AR2
 PINSRQ		xmmreg,mem,imm			[rmi:	o64 66 0f 3a 22 /r ib,u]		SSE41,X64,SB,AR2
 PINSRQ		xmmreg,rm64,imm			[rmi:	o64 66 0f 3a 22 /r ib,u]		SSE41,X64,SB,AR2
 PMAXSB		xmmreg,xmmrm			[rm:	66 0f 38 3c /r]				SSE41
@@ -1943,12 +1943,12 @@ PFRSQRTV	mmxreg,mmxrm			[rm:	o64nw 0f 0f /r 87]			PENT,3DNOW,SQ,CYRIX

 ;# Intel new instructions in ???
 ; Is NEHALEM right here?
-MOVBE		reg16,mem16			[rm:	o16 0f 38 f0 /r]			NEHALEM,SM
-MOVBE		reg32,mem32			[rm:	o32 0f 38 f0 /r]			NEHALEM,SM
-MOVBE		reg64,mem64			[rm:	o64 0f 38 f0 /r]			NEHALEM,SM
-MOVBE		mem16,reg16			[mr:	o16 0f 38 f1 /r]			NEHALEM,SM
-MOVBE		mem32,reg32			[mr:	o32 0f 38 f1 /r]			NEHALEM,SM
-MOVBE		mem64,reg64			[mr:	o64 0f 38 f1 /r]			NEHALEM,SM
+MOVBE		reg16,mem16			[rm:	o16 norep 0f 38 f0 /r]			NEHALEM,SM
+MOVBE		reg32,mem32			[rm:	o32 norep 0f 38 f0 /r]			NEHALEM,SM
+MOVBE		reg64,mem64			[rm:	o64 norep 0f 38 f0 /r]			NEHALEM,SM
+MOVBE		mem16,reg16			[mr:	o16 norep 0f 38 f1 /r]			NEHALEM,SM
+MOVBE		mem32,reg32			[mr:	o32 norep 0f 38 f1 /r]			NEHALEM,SM
+MOVBE		mem64,reg64			[mr:	o64 norep 0f 38 f1 /r]			NEHALEM,SM

 ;# Intel AES instructions
 AESENC		xmmreg,xmmrm128			[rm:	66 0f 38 dc /r]				SSE,WESTMERE
@@ -3356,9 +3356,9 @@ XTEST		void				[	0f 01 d6]				FUTURE,HLE,RTM
 ;
 ; based on pub number 319433-011 dated July 2011
 ;
-TZCNT		reg16,rm16			[rm:	o16 f3 0f bc /r]			FUTURE,BMI1
-TZCNT		reg32,rm32			[rm:	o32 f3 0f bc /r]			FUTURE,BMI1
-TZCNT		reg64,rm64			[rm:	o64 f3 0f bc /r]			LONG,FUTURE,BMI1
+TZCNT		reg16,rm16			[rm:	o16 f3i 0f bc /r]			FUTURE,BMI1
+TZCNT		reg32,rm32			[rm:	o32 f3i 0f bc /r]			FUTURE,BMI1
+TZCNT		reg64,rm64			[rm:	o64 f3i 0f bc /r]			LONG,FUTURE,BMI1
 ANDN		reg32,reg32,rm32		[rvm:	vex.nds.lz.0f38.w0 f2 /r]		FUTURE,BMI1
 ANDN		reg64,reg64,rm64		[rvm:	vex.nds.lz.0f38.w1 f2 /r]		LONG,FUTURE,BMI1
 BEXTR		reg32,rm32,reg32		[rmv:	vex.nds.lz.0f38.w0 f7 /r]		FUTURE,BMI1
diff --git a/insns.pl b/insns.pl
index b154dbd..1b9d980 100755
--- a/insns.pl
+++ b/insns.pl
@@ -721,6 +721,8 @@ sub byte_code_compile($$) {
 	'norexw' => 0317,
 	'repe' => 0335,
 	'nohi' => 0325,		# Use spl/bpl/sil/dil even without REX
+	'nof3' => 0326,		# No REP 0xF3 prefix permitted
+	'norep' => 0331,	# No REP prefix permitted
 	'wait' => 0341,		# Needs a wait prefix
 	'resb' => 0340,
 	'jcc8' => 0370,		# Match only if Jcc possible with single byte
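
The effect of the new `nof3` keyword (and of `norexw`) on disassembly can be illustrated with a toy table-driven matcher. This is a sketch under stated assumptions: the flag names, `struct tmpl`, `table` and `match()` are hypothetical and are neither NASM's bytecodes (\326, \317) nor its `matches()` function; only the prefix checks are modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constraint flags, loosely modelled on insns.dat keywords. */
enum {
    NEED_F3   = 1 << 0,  /* F3 is part of the opcode (cf. "f3i")       */
    NO_F3     = 1 << 1,  /* reject if an F3 prefix is present ("nof3") */
    NEED_F2   = 1 << 2,  /* F2 is part of the opcode                   */
    NEED_REXW = 1 << 3,  /* REX.W must be set (cf. "o64")              */
    NO_REXW   = 1 << 4   /* REX.W must be clear ("norexw")             */
};

struct tmpl {
    const char *name;
    uint8_t     op2;     /* opcode byte following 0F */
    unsigned    flags;
};

static const struct tmpl table[] = {
    { "bsf",          0xbc, NO_F3 },
    { "tzcnt",        0xbc, NEED_F3 },
    { "cvtsd2si r32", 0x2d, NEED_F2 | NO_REXW },
    { "cvtsd2si r64", 0x2d, NEED_F2 | NEED_REXW },
};

/* Return the first template accepting opcode 0F op2 with the given REP
 * prefix (0, 0xF2 or 0xF3) and REX.W bit, or NULL if none does. */
static const struct tmpl *match(uint8_t op2, int rep, int rexw)
{
    assert(rep == 0 || rep == 0xF2 || rep == 0xF3);

    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        const struct tmpl *t = &table[i];
        if (t->op2 != op2)
            continue;
        if ((t->flags & NEED_F3) && rep != 0xF3)
            continue;
        if ((t->flags & NO_F3) && rep == 0xF3)
            continue;
        if ((t->flags & NEED_F2) && rep != 0xF2)
            continue;
        if ((t->flags & NEED_REXW) && !rexw)
            continue;
        if ((t->flags & NO_REXW) && rexw)
            continue;
        return t;
    }
    return NULL;
}
```

Because BSF comes first in this table, dropping `NO_F3` from its entry would make F3 0F BC decode as BSF instead of TZCNT; likewise, without `NO_REXW` the 32-bit CVTSD2SI entry would also claim the REX.W form. Presumably that ordering sensitivity is what the `nof3` and `norexw` annotations in the patch guard against.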



In long mode relative offsets are always 32 bits sign-extended to 64
bits and absolute near addresses are always 64 bits, regardless of the
operand size.

Signed-off-by: Ben Rudiak-Gould <benrudiak_at_gmail.com>


diff --git a/disasm.c b/disasm.c
index 46cec8a..50149d2 100644
--- a/disasm.c
+++ b/disasm.c
@@ -532,22 +532,21 @@ static int matches(const struct itemplate *t, uint8_t *data,
             opx->segment &= ~SEG_32BIT;
 	    break;

-	case4(064):
+	case4(064):  /* rel */
             opx->segment |= SEG_RELATIVE;
-	    if (osize == 16) {
-		opx->offset = gets16(data);
-		data += 2;
-                opx->segment &= ~(SEG_32BIT|SEG_64BIT);
-	    } else if (osize == 32) {
-		opx->offset = gets32(data);
-		data += 4;
-                opx->segment &= ~SEG_64BIT;
-                opx->segment |= SEG_32BIT;
-	    }
-            if (segsize != osize) {
-                opx->type =
-                    (opx->type & ~SIZE_MASK)
-                    | ((osize == 16) ? BITS16 : BITS32);
+            /* In long mode rel is always 32 bits, sign extended. */
+            if (segsize == 64 || osize == 32) {
+                opx->offset = gets32(data);
+                data += 4;
+                if (segsize != 64)
+                    opx->segment |= SEG_32BIT;
+                opx->type = (opx->type & ~SIZE_MASK)
+                    | (segsize == 64 ? BITS64 : BITS32);
+            } else {
+                opx->offset = gets16(data);
+                data += 2;
+                opx->segment &= ~SEG_32BIT;
+                opx->type = (opx->type & ~SIZE_MASK) | BITS16;
             }
 	    break;

diff --git a/insns.dat b/insns.dat
index a039106..fe3b447 100644
--- a/insns.dat
+++ b/insns.dat
@@ -229,14 +229,17 @@ BTS		rm16,imm			[mi:	hle o16 0f ba /5 ib,u]			386,SB,LOCK
 BTS		rm32,imm			[mi:	hle o32 0f ba /5 ib,u]			386,SB,LOCK
 BTS		rm64,imm			[mi:	hle o64 0f ba /5 ib,u]			X64,SB,LOCK
 CALL		imm				[i:	odf e8 rel]				8086
-CALL		imm|near			[i:	odf e8 rel]				8086
+CALL		imm|near			[i:	odf e8 rel]				8086,ND
 CALL		imm|far				[i:	odf 9a iwd seg]				8086,ND,NOLONG
-CALL		imm16				[i:	o16 e8 rel]				8086
-CALL		imm16|near			[i:	o16 e8 rel]				8086
+; Call/jmp near imm/reg/mem is always 64-bit in long mode.
+CALL		imm16				[i:	o16 e8 rel]				8086,NOLONG
+CALL		imm16|near			[i:	o16 e8 rel]				8086,ND,NOLONG
 CALL		imm16|far			[i:	o16 9a iwd seg]				8086,ND,NOLONG
-CALL		imm32				[i:	o32 e8 rel]				386
-CALL		imm32|near			[i:	o32 e8 rel]				386
+CALL		imm32				[i:	o32 e8 rel]				386,NOLONG
+CALL		imm32|near			[i:	o32 e8 rel]				386,ND,NOLONG
 CALL		imm32|far			[i:	o32 9a iwd seg]				386,ND,NOLONG
+CALL		imm64				[i:	o64nw e8 rel]				X64
+CALL		imm64|near			[i:	o64nw e8 rel]				X64,ND
 CALL		imm:imm				[ji:	odf 9a iwd iw]				8086,NOLONG
 CALL		imm16:imm			[ji:	o16 9a iw iw]				8086,NOLONG
 CALL		imm:imm16			[ji:	o16 9a iw iw]				8086,NOLONG
@@ -248,17 +251,13 @@ CALL		mem16|far			[m:	o16 ff /3]				8086
 CALL		mem32|far			[m:	o32 ff /3]				386
 CALL		mem64|far			[m:	o64 ff /3]				X64
 CALL		mem|near			[m:	odf ff /2]				8086,ND
-CALL		mem16|near			[m:	o16 ff /2]				8086,ND
-CALL		mem32|near			[m:	o32 ff /2]				386,NOLONG,ND
-CALL		mem64|near			[m:	o64nw ff /2]				X64,ND
-CALL		reg16				[m:	o16 ff /2]				8086
-CALL		reg32				[m:	o32 ff /2]				386,NOLONG
-CALL		reg64				[m:	o64nw ff /2]				X64
+CALL		rm16|near			[m:	o16 ff /2]				8086,NOLONG,ND
+CALL		rm32|near			[m:	o32 ff /2]				386,NOLONG,ND
+CALL		rm64|near			[m:	o64nw ff /2]				X64,ND
 CALL		mem				[m:	odf ff /2]				8086
-CALL		mem16				[m:	o16 ff /2]				8086
-CALL		mem32				[m:	o32 ff /2]				386,NOLONG
-CALL		mem				[m:	o64nw ff /2]				X64
-CALL		mem64				[m:	o64nw ff /2]				X64
+CALL		rm16				[m:	o16 ff /2]				8086,NOLONG
+CALL		rm32				[m:	o32 ff /2]				386,NOLONG
+CALL		rm64				[m:	o64nw ff /2]				X64
 CBW		void				[	o16 98]					8086
 CDQ		void				[	o32 99]					386
 CDQE		void				[	o64 98]					X64
@@ -661,12 +660,15 @@ JMP		imm				[i:	jmp8 eb rel8]				8086,ND
 JMP		imm				[i:	odf e9 rel]				8086
 JMP		imm|near			[i:	odf e9 rel]				8086,ND
 JMP		imm|far				[i:	odf ea iwd seg]				8086,ND,NOLONG
-JMP		imm16				[i:	o16 e9 rel]				8086
-JMP		imm16|near			[i:	o16 e9 rel]				8086,ND
+; Call/jmp near imm/reg/mem is always 64-bit in long mode.
+JMP		imm16				[i:	o16 e9 rel]				8086,NOLONG
+JMP		imm16|near			[i:	o16 e9 rel]				8086,ND,NOLONG
 JMP		imm16|far			[i:	o16 ea iwd seg]				8086,ND,NOLONG
-JMP		imm32				[i:	o32 e9 rel]				386
-JMP		imm32|near			[i:	o32 e9 rel]				386,ND
+JMP		imm32				[i:	o32 e9 rel]				386,NOLONG
+JMP		imm32|near			[i:	o32 e9 rel]				386,ND,NOLONG
 JMP		imm32|far			[i:	o32 ea iwd seg]				386,ND,NOLONG
+JMP		imm64				[i:	o64nw e9 rel]				X64
+JMP		imm64|near			[i:	o64nw e9 rel]				X64,ND
 JMP		imm:imm				[ji:	odf ea iwd iw]				8086,NOLONG
 JMP		imm16:imm			[ji:	o16 ea iw iw]				8086,NOLONG
 JMP		imm:imm16			[ji:	o16 ea iw iw]				8086,NOLONG
@@ -678,17 +680,13 @@ JMP		mem16|far			[m:	o16 ff /5]				8086
 JMP		mem32|far			[m:	o32 ff /5]				386
 JMP		mem64|far			[m:	o64 ff /5]				X64
 JMP		mem|near			[m:	odf ff /4]				8086,ND
-JMP		mem16|near			[m:	o16 ff /4]				8086,ND
-JMP		mem32|near			[m:	o32 ff /4]				386,NOLONG,ND
-JMP		mem64|near			[m:	o64nw ff /4]				X64,ND
-JMP		reg16				[m:	o16 ff /4]				8086
-JMP		reg32				[m:	o32 ff /4]				386,NOLONG
-JMP		reg64				[m:	o64nw ff /4]				X64
+JMP		rm16|near			[m:	o16 ff /4]				8086,NOLONG,ND
+JMP		rm32|near			[m:	o32 ff /4]				386,NOLONG,ND
+JMP		rm64|near			[m:	o64nw ff /4]				X64,ND
 JMP		mem				[m:	odf ff /4]				8086
-JMP		mem16				[m:	o16 ff /4]				8086
-JMP		mem32				[m:	o32 ff /4]				386,NOLONG
-JMP		mem				[m:	o64nw ff /4]				X64
-JMP		mem64				[m:	o64nw ff /4]				X64
+JMP		rm16				[m:	o16 ff /4]				8086,NOLONG
+JMP		rm32				[m:	o32 ff /4]				386,NOLONG
+JMP		rm64				[m:	o64nw ff /4]				X64
 JMPE		imm				[i:	odf 0f b8 rel]				IA64
 JMPE		imm16				[i:	o16 0f b8 rel]				IA64
 JMPE		imm32				[i:	o32 0f b8 rel]				IA64
@@ -1428,8 +1426,9 @@ CMOVcc		reg32,reg32			[rm:	o32 0f 40+c /r]				P6
 CMOVcc		reg64,mem			[rm:	o64 0f 40+c /r]				X64,SM
 CMOVcc		reg64,reg64			[rm:	o64 0f 40+c /r]				X64
 Jcc		imm|near			[i:	odf 0f 80+c rel]			386
-Jcc		imm16|near			[i:	o16 0f 80+c rel]			386
-Jcc		imm32|near			[i:	o32 0f 80+c rel]			386
+Jcc		imm16|near			[i:	o16 0f 80+c rel]			386,NOLONG
+Jcc		imm32|near			[i:	o32 0f 80+c rel]			386,NOLONG
+Jcc		imm64|near			[i:	o64nw 0f 80+c rel]			X64
 Jcc		imm|short			[i:	70+c rel8]				8086,ND
 Jcc		imm				[i:	jcc8 70+c rel8]				8086,ND
 Jcc		imm				[i:	0f 80+c rel]				386,ND
@@ -3344,11 +3343,13 @@ VPGATHERQQ	ymmreg,mem64,ymmreg		[rmv:	vm64y vex.dds.256.66.0f38.w1 91 /r]	FUTURE
 XABORT		imm				[i:	c6 f8 ib]				FUTURE,RTM
 XABORT		imm8				[i:	c6 f8 ib]				FUTURE,RTM
 XBEGIN		imm				[i:	odf c7 f8 rel]				FUTURE,RTM
-XBEGIN		imm|near			[i:	odf c7 f8 rel]				FUTURE,RTM
-XBEGIN		imm16				[i:	o16 c7 f8 rel]				FUTURE,RTM
-XBEGIN		imm16|near			[i:	o16 c7 f8 rel]				FUTURE,RTM
-XBEGIN		imm32				[i:	o32 c7 f8 rel]				FUTURE,RTM
-XBEGIN		imm32|near			[i:	o32 c7 f8 rel]				FUTURE,RTM
+XBEGIN		imm|near			[i:	odf c7 f8 rel]				FUTURE,RTM,ND
+XBEGIN		imm16				[i:	o16 c7 f8 rel]				FUTURE,RTM,NOLONG
+XBEGIN		imm16|near			[i:	o16 c7 f8 rel]				FUTURE,RTM,NOLONG,ND
+XBEGIN		imm32				[i:	o32 c7 f8 rel]				FUTURE,RTM,NOLONG
+XBEGIN		imm32|near			[i:	o32 c7 f8 rel]				FUTURE,RTM,NOLONG,ND
+XBEGIN		imm64				[i:	o64nw c7 f8 rel]				FUTURE,RTM,LONG
+XBEGIN		imm64|near			[i:	o64nw c7 f8 rel]				FUTURE,RTM,LONG,ND
 XEND		void				[	0f 01 d5]				FUTURE,RTM
 XTEST		void				[	0f 01 d6]				FUTURE,HLE,RTM
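
The rule stated in this patch's description, that in long mode a near relative offset is always 32 bits sign-extended to 64 bits, comes down to the following arithmetic. A minimal sketch: `read_rel32()` and `rel32_target()` are hypothetical helpers, and `insn_end` is assumed to be the address of the byte just past the CALL/JMP/Jcc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit displacement and sign-extend it to 64 bits
 * (written without relying on implementation-defined narrowing casts). */
static int64_t read_rel32(const uint8_t *p)
{
    uint32_t u = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
                 (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    return (u & 0x80000000u) ? (int64_t)u - 0x100000000LL : (int64_t)u;
}

/* Target of a near relative branch in 64-bit mode: next RIP plus the
 * sign-extended rel32, wrapping modulo 2^64. */
static uint64_t rel32_target(uint64_t insn_end, const uint8_t *disp)
{
    assert(disp != NULL);
    return insn_end + (uint64_t)read_rel32(disp);
}
```

For example, `E8 FB FF FF FF` at address 0x1000 has `insn_end` 0x1005 and a rel32 of -5, so the call targets 0x1000 itself.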



>> However, support for parentheses around an
>> operand (as suggested by #327364-001 3.5)
>> is tricky: for reg ops the parentheses simply
>> evaluate, but for mem ops the parser has to
>> explicitly skip them -- however, that decision
>> hinges on the leading transform modifier and
>> there is no clear reg versus mem distinction,
>> because of the D(...) down converts: they do
>> use reg ops, but with mem transforms.
>>
>> I have not decided the most suitable course
>> yet -- add extra parse-ahead to find '[', parse
>> down convert instructions specially, or, well,
>> ignore Intel's suggested syntax (i.e. no (...),
>> and permit all modifiers before and after any
>> operand, reg or mem [with subsequent tests
>> for their sanity/validity, of course]).
>
> Cyrill and I looked at this last night, and we're not entirely sure what you
> mean.  Our thinking has been to treat braced keywords simply as a separate
> keyword space, which would mean a slightly looser syntax but probably okay.
> I seriously did not follow the bit about parens, though...

The KNC ISA PDF suggests this operand syntax:

  {transform} ( op {hint} ) {mask}

  - {eh} can follow memory operand, i.e. [...] {eh}
  - {transform} can precede specific source operand
  - {mask} can follow destination operand
  - the parentheses around op and {hint} are explicit

I have come to refer to that as the canonical form.

By contrast, a relaxed syntax would permit these:

  - incorrect source operand has transform modifier
  - non-destination operand has mask modifier

  - transform modifier specified after operand
  - eviction hint specified before operand
  - mask modifier specified before operand

  - transform modifier operand not preceded by "("
  - transform modifier operand not followed by ")"

  - eviction hint specified after ")", not before
  - mask modifier specified before ")", not after

  - transform modifier invalid for memory operand
  - transform modifier invalid for register operand

  - modifier specified as extra operand

And for that last "specified as extra operand" case:

  - only operand --> ignored
  - leading operand --> applied to next operand
  - trailing operand --> applied to previous operand
  - in between operands --> applied to previous operand
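
The four placement rules for a modifier written as an extra operand can be expressed as a small attachment pass. A sketch under assumptions: operands are modelled only as a flag array saying which items are bare {modifier} tokens, and `attach_modifiers()` is a hypothetical name, not part of NASM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* is_mod[i] is true when item i of an operand list is a bare {modifier}
 * written as an operand of its own.  On return attach[i] holds the index
 * of the real operand that modifier binds to, or -1 when it is ignored
 * (attach[i] is also -1 for items that are real operands). */
static void attach_modifiers(const bool *is_mod, int n, int *attach)
{
    assert(n == 0 || (is_mod != NULL && attach != NULL));

    for (int i = 0; i < n; i++) {
        attach[i] = -1;
        if (!is_mod[i])
            continue;
        /* Trailing or in-between modifier: nearest real operand to the left. */
        for (int j = i - 1; j >= 0; j--) {
            if (!is_mod[j]) {
                attach[i] = j;
                break;
            }
        }
        if (attach[i] >= 0)
            continue;
        /* Leading modifier: nearest real operand to the right.  If there is
         * none (the modifier is the only operand), it stays -1 (ignored). */
        for (int j = i + 1; j < n; j++) {
            if (!is_mod[j]) {
                attach[i] = j;
                break;
            }
        }
    }
}
```

For instance, the sequence *modifier, operand, operand* attaches the modifier to item 1, while *operand, modifier, operand* attaches it to item 0.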

The problem with the "(" and ")" in the canonical form:

  {transform} ( reg op ) --> parentheses will evaluate
  {transform} ( mem op ) --> parentheses won't evaluate

  finding ")" after mem op --> easy
  finding ")" after reg op --> hard

So always skipping "(" is hard because finding ")" is.

And to avoid having to scan ahead for reg vs mem op,
the skip decision must be made based on {transform}.

Since D(...) down converts use reg ops, but have mem
transform modifiers, those instructions require special
treatment. (I did manage to make that work after all.)
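
For comparison, the parse-ahead alternative mentioned in the original message (scanning for '[' before deciding whether to skip the '(') could look roughly like this. It is a hypothetical helper working on raw text, not NASM's parser; the approach described above avoids exactly this second scan by keying the decision off the {transform} modifier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* s points just past a "(" that follows a {transform} modifier.  Scan to
 * the matching ")" and report whether the enclosed operand contains a
 * memory reference, i.e. a "[".  Returns false if no matching ")" exists. */
static bool paren_operand_is_mem(const char *s)
{
    int depth = 1;
    bool saw_bracket = false;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (*s == '(') {
            depth++;
        } else if (*s == ')') {
            if (--depth == 0)
                return saw_bracket;
        } else if (*s == '[') {
            saw_bracket = true;
        }
    }
    return false;
}
```

Because it inspects the operand itself, a check like this would classify the register operands of the D(...) down converts as registers without special-casing them, at the cost of scanning each parenthesized operand twice.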

That said...

I do have functional KNC support.
I am still working on KNF support.

KNF became obsolete after KNC arrived.
KNC might be obsolete after AVX3 arrives.

Before burdening the assembler with KNF and KNC, it
might make sense to await the AVX3 spec, to find out
what "pieces" will actually be required in the long term.

So far it looks like:

  - support for {modifier} tokens
  - support for one or more of them before any operand
  - support for one or more of them after any operand
  - support for one or more of them without any operand

  versus

  - constraints on which modifier token(s) can be used,
    based on KNC/KNF/AVX3, instruction, operand, etc.

I plan to spend more time on this during the holidays.



On 09/25/2012 03:40 AM, anonymous coward wrote:
>
> However, support for parentheses around an
> operand (as suggested by #327364-001 3.5)
> is tricky: for reg ops the parentheses simply
> evaluate, but for mem ops the parser has to
> explicitly skip them -- however, that decision
> hinges on the leading transform modifier and
> there is no clear reg versus mem distinction,
> because of the D(...) down converts: they do
> use reg ops, but with mem transforms.
>
> I have not decided the most suitable course
> yet -- add extra parse-ahead to find '[', parse
> down convert instructions specially, or, well,
> ignore Intel's suggested syntax (i.e. no (...),
> and permit all modifiers before and after any
> operand, reg or mem [with subsequent tests
> for their sanity/validity, of course]).
>
Hi...

Cyrill and I looked at this last night, and we're not entirely sure what 
you mean.  Our thinking has been to treat braced keywords simply as a 
separate keyword space, which would mean a slightly looser syntax but 
probably okay.  I seriously did not follow the bit about parens, though...

	-hpa





> Third, covering all the cases and making them work with the matching
> engine could take a fair bit of time.

I think this is the biggest issue. I actually suggested something like
this to Jules way back in 1998 or so. (Though my suggestion was to
introduce just one new prefix keyword, e.g. "alt", that would cause
nasm to use the less-common variant encoding for the following
instruction. Of course that falls apart for cases that have three
possible encodings.) He was vaguely sympathetic to the idea, but I
think he felt (rightly) that it was a fair bit of work for the amount
of payoff (particularly if future upkeep is taken into account).

b



On 11/20/2012 07:37 AM, Marat Dukhan wrote:
>
> Example for rex keyword:
>
>     MOV ecx, [rsi] ; encoded without REX
>     rex MOV ecx, [rsi] ; encoded with REX
>
>
> Example for vex3 keyword:
>
>     VPADDD xmm0, xmm0, xmm0 ; encoded with 2-byte VEX prefix
>     vex3 VPADDD xmm0, xmm0, xmm0 ; encoded with 3-byte VEX prefix
>
> Is there any chance to get these features in NASM?
>

Well, there are a few problems:

First, NASM is largely a volunteer project, so getting the resources to 
do it is always a problem.

Second, and this may be the bigger issue, is that it causes namespace 
issues; someone may be using e.g. "rex" as a variable.  However, the 
Xeon Phi assembler syntax already are addressing this by adding new 
keywords recognized only if they are inside curly braces, so perhaps we 
could just make that a general keyword space and make the syntax {rex} 
and so on.

Third, covering all the cases and making them work with the matching 
engine could take a fair bit of time.

If you are interested in this as a project, it is certainly something 
that would be welcome.
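The {rex}-style keyword space suggested here can be sketched as a tokenizer that reserves words only between braces. This is a minimal, hypothetical C sketch, not NASM's scanner: the token kinds, `next_token` and the keyword table (just `rex` and `vex3`, taken from the quoted proposal) are assumptions, words are split on whitespace only, and matching is case-sensitive for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds for this sketch only. */
enum tok_kind { TOK_END, TOK_IDENT, TOK_BRACE_KW, TOK_ERROR };

/* Words that are reserved only when written inside {...}. */
static const char *const brace_keywords[] = { "rex", "vex3" };

static int is_brace_keyword(const char *s, size_t len)
{
    size_t i;
    for (i = 0; i < sizeof brace_keywords / sizeof brace_keywords[0]; i++) {
        if (strlen(brace_keywords[i]) == len &&
            strncmp(brace_keywords[i], s, len) == 0)
            return 1;
    }
    return 0;
}

/* Classify the token at *p and advance *p past it. */
static enum tok_kind next_token(const char **p)
{
    const char *s, *start;

    assert(p != NULL && *p != NULL);
    s = *p;
    while (*s == ' ' || *s == '\t')
        s++;
    if (*s == '\0') {
        *p = s;
        return TOK_END;
    }
    if (*s == '{') {
        start = ++s;
        while (*s != '\0' && *s != '}')
            s++;
        if (*s == '\0') {            /* unterminated brace */
            *p = s;
            return TOK_ERROR;
        }
        *p = s + 1;
        /* Inside braces, only the reserved words are accepted. */
        return is_brace_keyword(start, (size_t)(s - start)) ? TOK_BRACE_KW
                                                            : TOK_ERROR;
    }
    /* Outside braces every word is an ordinary identifier, even "rex". */
    while (*s != '\0' && *s != ' ' && *s != '\t' && *s != '{')
        s++;
    *p = s;
    return TOK_IDENT;
}
```

A bare `rex` still comes back as an ordinary identifier, which is the point of the idea: existing sources that use it as a symbol name keep assembling.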

	-hpa





Dear NASM developers,

I work on a high-performance library of optimized functions, and use NASM
to assemble the x86/x86-64 specific implementations.
For high-performance code on some x86 microarchitectures (e.g. Intel Atom,
Intel Nehalem, AMD Bulldozer) it is essential to align groups of
instructions on certain boundaries (8 or 16 bytes) to achieve full CPU
front-end performance.

There are three ways to align instruction groups on an 8- or 16-byte
boundary: insert NOPs, make instructions longer by adding prefixes, or
make instructions longer by using longer instruction forms.

1. Since NOPs consume decoder resources, they do not help to improve
decoder performance.

2. Adding instruction prefixes helps to certain degree, but CPU decoders
are limited in the number of instruction prefixes they
can decode per cycle, so this technique has limited use.

3. Using different (longer) encoding forms is the optimal solution, but it
requires support from the assembler.
NASM already supports some specifications of instruction forms, e.g.

MOV ecx, [esi] ; encoded without memory displacement
MOV ecx, [byte esi] ; encoded with 8-bit memory displacement
MOV ecx, [dword esi] ; encoded with 32-bit memory displacement


AND ecx, 0Fh ; encoded with 8-bit immediate
AND ecx, dword 0Fh ; encoded with 32-bit immediate


MOV ecx, [eax * 2] ; encoded as [eax + eax*1] without a displacement
MOV ecx, [nosplit eax * 2] ; encoded as [eax*2] with a 32-bit displacement (of 0)

I would like this functionality in NASM to be extended to more instruction
forms, and suggest new keywords acc, modrm, sib, rex, vex3:

acc keyword forces NASM to use the special rax/eax/ax/al encoding form.
Example for acc keyword:

ADD eax, 32 ; encoded as ModR/M + imm8
acc ADD eax, 32 ; encoded as special eax form + imm32
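To make the size difference concrete, the sketch below emits the two `ADD eax, imm` forms from the example, the sign-extended imm8 form (`83 /0 ib`) and the accumulator form (`05 id`), plus the ModR/M imm32 form (`81 /0 id`) for comparison. It assumes 32-bit mode and hard-codes EAX; the `add_form` enum and `encode_add_eax` are illustrative names, not NASM internals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum add_form { ADD_MODRM_IMM8, ADD_MODRM_IMM32, ADD_ACC_IMM32 };

/* Store a 32-bit value in little-endian byte order. */
static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Encode ADD EAX, imm (32-bit mode) in the requested form; returns the length. */
static size_t encode_add_eax(enum add_form form, int32_t imm, uint8_t out[6])
{
    switch (form) {
    case ADD_MODRM_IMM8:             /* 83 /0 ib: imm8 is sign-extended */
        assert(imm >= -128 && imm <= 127);
        out[0] = 0x83;
        out[1] = 0xC0;               /* ModR/M: mod=11, reg=/0 (ADD), rm=EAX */
        out[2] = (uint8_t)imm;
        return 3;
    case ADD_MODRM_IMM32:            /* 81 /0 id */
        out[0] = 0x81;
        out[1] = 0xC0;
        put_le32(out + 2, (uint32_t)imm);
        return 6;
    case ADD_ACC_IMM32:              /* 05 id: accumulator-only short form */
        out[0] = 0x05;
        put_le32(out + 1, (uint32_t)imm);
        return 5;
    }
    return 0;
}
```

So `acc ADD eax, 32` would grow the instruction from 3 bytes (83 C0 20) to 5 bytes (05 20 00 00 00) without changing what it does.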


modrm keyword forces NASM to use ModR/M encoding
Example for modrm keyword:

ADD al, 32 ; encoded as special al form + imm8
modrm ADD al, 32 ; encoded as ModR/M form + imm8 (1 byte longer than the above version)

PUSH ecx ; encoded as 50+rd
modrm PUSH ecx ; encoded as FF /6
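Both `modrm` examples trade a short form (register in the opcode, or the AL-only opcode) for the general ModR/M form. A hypothetical C sketch of the two choices, assuming 32-bit mode and the usual register numbering (EAX=0, ECX=1, ...); the function names are illustrative, not NASM's:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build a ModR/M byte from its mod, reg and rm fields. */
static uint8_t modrm(unsigned mod, unsigned reg, unsigned rm)
{
    assert(mod < 4 && reg < 8 && rm < 8);
    return (uint8_t)((mod << 6) | (reg << 3) | rm);
}

/* PUSH r32: 50+rd (1 byte), or FF /6 with a register ModR/M (2 bytes). */
static size_t encode_push_r32(unsigned reg, int force_modrm, uint8_t out[2])
{
    assert(reg < 8);
    if (!force_modrm) {
        out[0] = (uint8_t)(0x50 + reg);
        return 1;
    }
    out[0] = 0xFF;
    out[1] = modrm(3, 6, reg);       /* mod=11, reg=/6 (PUSH), rm=register */
    return 2;
}

/* ADD AL, imm8: 04 ib (2 bytes), or 80 /0 ib (3 bytes). */
static size_t encode_add_al(uint8_t imm, int force_modrm, uint8_t out[3])
{
    if (!force_modrm) {
        out[0] = 0x04;
        out[1] = imm;
        return 2;
    }
    out[0] = 0x80;
    out[1] = modrm(3, 0, 0);         /* reg=/0 selects ADD, rm=AL */
    out[2] = imm;
    return 3;
}
```

With ECX this gives 51 versus FF F1, and for `ADD al, 32` it gives 04 20 versus 80 C0 20.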


sib keyword forces NASM to use SIB byte even if ModR/M would be enough
Example for sib keyword:

MOV ecx, [esi + 4] ; encoded as ModR/M + disp8
MOV ecx, [sib esi + 4] ; encoded as ModR/M + SIB + disp8
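In the `sib` case the extra byte comes from setting ModR/M.rm to 100, which tells the CPU a SIB byte follows; a SIB with index=100 means "no index", so the effective address is unchanged. A hypothetical C sketch for the 32-bit `MOV r32, [base + disp8]` form only (mod=01; no mod=00 or disp32 handling), with ECX=1 and ESI=6:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint8_t modrm(unsigned mod, unsigned reg, unsigned rm)
{
    assert(mod < 4 && reg < 8 && rm < 8);
    return (uint8_t)((mod << 6) | (reg << 3) | rm);
}

static uint8_t sib(unsigned scale, unsigned index, unsigned base)
{
    assert(scale < 4 && index < 8 && base < 8);
    return (uint8_t)((scale << 6) | (index << 3) | base);
}

/* MOV r32, [base + disp8] in 32-bit mode (opcode 8B /r). */
static size_t encode_mov_load_disp8(unsigned reg, unsigned base, int8_t disp,
                                    int force_sib, uint8_t out[4])
{
    size_t n = 0;

    out[n++] = 0x8B;
    if (force_sib || base == 4) {    /* ESP as base always needs a SIB */
        out[n++] = modrm(1, reg, 4); /* rm=100: SIB follows */
        out[n++] = sib(0, 4, base);  /* index=100: no index register */
    } else {
        out[n++] = modrm(1, reg, base);
    }
    out[n++] = (uint8_t)disp;
    return n;
}
```

For `MOV ecx, [esi + 4]` this yields 8B 4E 04, and with the SIB forced 8B 4C 26 04, one byte longer.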


Example for rex keyword:

MOV ecx, [rsi] ; encoded without REX
rex MOV ecx, [rsi] ; encoded with REX
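A REX prefix with no bits set (0x40) is legal but has no effect on this instruction, which is what a `rex` keyword would rely on. A hypothetical C sketch for `MOV r32, [base]` in 64-bit mode, with registers numbered 0 to 15 (ECX=1, RSI=6); the helper names and the restriction on RSP/RBP-class bases are simplifications for the example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* REX prefix layout: 0100WRXB. */
static uint8_t rex(unsigned w, unsigned r, unsigned x, unsigned b)
{
    assert(w < 2 && r < 2 && x < 2 && b < 2);
    return (uint8_t)(0x40 | (w << 3) | (r << 2) | (x << 1) | b);
}

/* MOV r32, [base] in 64-bit mode (8B /r, mod=00).
 * Bases with low bits 100 (RSP/R12) need a SIB byte, and 101 (RBP/R13)
 * with mod=00 would mean RIP-relative, so this sketch rejects both. */
static size_t encode_mov_load64(unsigned reg, unsigned base, int force_rex,
                                uint8_t out[3])
{
    size_t n = 0;
    unsigned r = reg >> 3, b = base >> 3;

    assert(reg < 16 && base < 16);
    assert((base & 7) != 4 && (base & 7) != 5);
    if (force_rex || r || b)
        out[n++] = rex(0, r, 0, b);  /* 0x40 when no extension bit is needed */
    out[n++] = 0x8B;
    out[n++] = (uint8_t)(((reg & 7) << 3) | (base & 7));  /* mod=00 */
    return n;
}
```

`MOV ecx, [rsi]` comes out as 8B 0E, and with the REX forced as 40 8B 0E.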


Example for vex3 keyword:

VPADDD xmm0, xmm0, xmm0 ; encoded with 2-byte VEX prefix
vex3 VPADDD xmm0, xmm0, xmm0 ; encoded with 3-byte VEX prefix
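The 2-byte VEX prefix (C5) can only express R, vvvv, L and pp, with the 0F opcode map and W=0 implied; anything needing X, B, W=1 or another opcode map requires the 3-byte form (C4), which a `vex3` keyword would also force. A hypothetical C sketch for register-only `VPADDD xmm, xmm, xmm` (VEX.128.66.0F.WIG FE /r), with XMM registers numbered 0 to 15:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* VPADDD dst, src1, src2 with all-register operands.
 * dst goes in ModR/M.reg (VEX.R), src1 in VEX.vvvv, src2 in ModR/M.rm (VEX.B). */
static size_t encode_vpaddd_xmm(unsigned dst, unsigned src1, unsigned src2,
                                int force_vex3, uint8_t out[5])
{
    const unsigned pp = 1;           /* implied 66 prefix */
    const unsigned mmmmm = 1;        /* 0F opcode map */
    const unsigned L = 0, W = 0;     /* 128-bit; W is ignored (WIG), encode 0 */
    unsigned R = dst >> 3, B = src2 >> 3;
    unsigned vvvv_inv = (~src1) & 0xF;   /* vvvv is stored inverted */
    size_t n = 0;

    assert(dst < 16 && src1 < 16 && src2 < 16);
    /* The 2-byte form has no X, B, W or map field, so B must be 0. */
    if (!force_vex3 && B == 0) {
        out[n++] = 0xC5;
        out[n++] = (uint8_t)(((R ^ 1) << 7) | (vvvv_inv << 3) | (L << 2) | pp);
    } else {
        out[n++] = 0xC4;
        /* R, X, B are stored inverted; X is 0 here, so its bit is 1. */
        out[n++] = (uint8_t)(((R ^ 1) << 7) | (1u << 6) | ((B ^ 1) << 5) | mmmmm);
        out[n++] = (uint8_t)((W << 7) | (vvvv_inv << 3) | (L << 2) | pp);
    }
    out[n++] = 0xFE;
    out[n++] = (uint8_t)(0xC0 | ((dst & 7) << 3) | (src2 & 7));  /* mod=11 */
    return n;
}
```

For `VPADDD xmm0, xmm0, xmm0` this gives C5 F9 FE C0, or C4 E1 79 FE C0 when the 3-byte form is forced; using xmm8 as the last operand needs VEX.B and therefore always takes the 3-byte form.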


Is there any chance to get these features in NASM?

Kind regards,
Marat Dukhan

On Tue, Nov 13, 2012 at 07:58:29PM -0800, H. Peter Anvin wrote:
> On 11/05/2012 12:48 PM, nasm-bot for Cyrill Gorcunov wrote:
> >Commit-ID:  7ce86b500c792b782b7b076f50b220fc62234954
> >Gitweb:     http://repo.or.cz/w/nasm.git?a=commitdiff;h=7ce86b500c792b782b7b076f50b220fc62234954
> >Author:     Cyrill Gorcunov <gor...@gm...>
> >AuthorDate: Tue, 6 Nov 2012 00:47:20 +0400
> >Committer:  Cyrill Gorcunov <gor...@gm...>
> >CommitDate: Tue, 6 Nov 2012 00:47:20 +0400
> >
> >BR3392231: Fix get_closest_section_symbol_by_offset
> >
> >This patch changes get_closest_section_symbol_by_offset
> >logic to lookup only the closest symbols which are at
> >or before the supplied offset.
> 
> We could use an rbtree for this.  That's actually exactly what we
> have the rbtree code in NASM for.

Sure. But this would require more code. I'll take a look
once time permits; for the moment the quick fix should be enough.



On 11/05/2012 12:48 PM, nasm-bot for Cyrill Gorcunov wrote:
> Commit-ID:  7ce86b500c792b782b7b076f50b220fc62234954
> Gitweb:     http://repo.or.cz/w/nasm.git?a=commitdiff;h=7ce86b500c792b782b7b076f50b220fc62234954
> Author:     Cyrill Gorcunov <gor...@gm...>
> AuthorDate: Tue, 6 Nov 2012 00:47:20 +0400
> Committer:  Cyrill Gorcunov <gor...@gm...>
> CommitDate: Tue, 6 Nov 2012 00:47:20 +0400
>
> BR3392231: Fix get_closest_section_symbol_by_offset
>
> This patch changes get_closest_section_symbol_by_offset
> logic to lookup only the closest symbols which are at
> or before the supplied offset.
>

We could use an rbtree for this.  That's actually exactly what we have 
the rbtree code in NASM for.
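The lookup that BR3392231 needs is a floor query: find the symbol with the largest offset at or below a given offset. The sketch below shows that query on an ordinary, unbalanced binary search tree to keep it short; it is not NASM's rbtree code, and `struct sym_node`, `sym_insert` and `sym_floor` are hypothetical names. A red-black tree runs the same descent but keeps the height O(log n) by rebalancing on insert.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sym_node {
    uint64_t offset;                 /* symbol offset within its section */
    const char *name;
    struct sym_node *left, *right;
};

/* Plain BST insert; a red-black tree would also rebalance here. */
static struct sym_node *sym_insert(struct sym_node *root, struct sym_node *n)
{
    assert(n != NULL);
    if (root == NULL) {
        n->left = n->right = NULL;
        return n;
    }
    if (n->offset < root->offset)
        root->left = sym_insert(root->left, n);
    else
        root->right = sym_insert(root->right, n);
    return root;
}

/* Floor query: the symbol with the largest offset <= target, or NULL. */
static const struct sym_node *sym_floor(const struct sym_node *root,
                                        uint64_t target)
{
    const struct sym_node *best = NULL;

    while (root != NULL) {
        if (root->offset <= target) {
            best = root;             /* candidate; look right for a closer one */
            root = root->right;
        } else {
            root = root->left;
        }
    }
    return best;
}
```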

	-hpa





Hi Frank,

> I suspect that the license change would apply to this anyway,

(I'm also not a lawyer...) It seems likely that this is the case due to  
the language you used in the relicensing, iff all relevant copyright  
holders also participated in the relicensing.

Otherwise, it is of course possible for something that's still in the
repo's history to be available only under the then-applicable licensing,
e.g. 6c98ca4 removed a tool's source which does not seem to be legally
obtainable under the current (2-clause BSD) licence (at least not via the
repo),
http://repo.or.cz/w/nasm.git?a=commit;h=6c98ca4ddce52101fb06abff7e65352693a01137

> a notation of which instructions affect which flags would be
> a big improvement, IMO.

I'm currently using an older manual that still includes the reference (and  
thus was compiled with the previous licensing in effect). I want to look  
into compiling it as a document on its own, maybe use Halibut for that. I  
also want to either embed some additional information and use a viewer  
that lets me fold all uninteresting sections, or failing that re-order the  
sections to have the uninteresting ones at the end, or only conditionally  
compile them into the result. (Most of the time, all post-586 extensions  
and even all the FPU instructions are uninteresting to me.) Highlighting  
the 086-/186-/386-compatible instructions also seems like an interesting  
idea.

As far as content changes are concerned, it's still lacking 64-bit mode  
information (one of the apparent reasons it became obsolete for NASM's  
manual). I don't use that as of yet though so it's not important to me.

Affected flags are a good idea. I'm also interested in recording more  
detailed semantic descriptions.

Regards,
Chris



C. Masloch wrote:
> Hello,
> 
> Can the instruction reference be used according to the new NASM (2-clause  
> BSD) licence as well?
> 
> NASM's relicensing was completed around ecfba9d (on 2009-07-06), available  
> here using the web interface:  
> http://repo.or.cz/w/nasm.git/commit/ecfba9d6abdda57383f61031ab3406efba2769b3
> 
> The instruction reference was removed from the sources with 03b9f94 (on  
> 2009-05-09),  
> http://repo.or.cz/w/nasm.git?a=commit;h=03b9f941336d901e32054efc8cda20a3cc3916d3
> 
> No changes were applied to the doc/insref.src file after its extraction  
> from doc/nasmdoc.src by 9b49e24,
> http://repo.or.cz/w/nasm.git/commitdiff/9b49e24e1fe1a4afc021f6c3a01720fcabdc47ca
> 
> So next the annotations of that part of doc/nasmdoc.src from 62cb606 (the  
> parent of the extraction, 9b49e24) are relevant, (the last part of)  
> http://repo.or.cz/w/nasm.git/blame/62cb606f6876b01c5d89ad00b6d3d4a3a2ffccf2:/doc/nasmdoc.src
> 
> This indicates that all the relevant changes are recorded as checked in by  
> Peter, Keith, Debbie, and Frank. I don't know whether that means that only  
> the four of you would be relevant for the licensing, though. Hence, it  
> seems best to ask you here.
> 
> Regards,
> Chris

Hi Chris,

I believe the original document was written by Simon Tatham (possibly with
input from Julian Hall?). I'm happy for any changes I'm responsible for
to be under 2-clause BSD. (I suspect that the license change would apply
to this anyway, but... TGIANAL)

If you're anticipating any changes, a notation of which instructions 
affect which flags would be a big improvement, IMO.

Best,
Frank




Hello,

Can the instruction reference be used according to the new NASM (2-clause  
BSD) licence as well?

NASM's relicensing was completed around ecfba9d (on 2009-07-06), available  
here using the web interface:  
http://repo.or.cz/w/nasm.git/commit/ecfba9d6abdda57383f61031ab3406efba2769b3

The instruction reference was removed from the sources with 03b9f94 (on  
2009-05-09),  
http://repo.or.cz/w/nasm.git?a=commit;h=03b9f941336d901e32054efc8cda20a3cc3916d3

No changes were applied to the doc/insref.src file after its extraction  
from doc/nasmdoc.src by 9b49e24,
http://repo.or.cz/w/nasm.git/commitdiff/9b49e24e1fe1a4afc021f6c3a01720fcabdc47ca

So next the annotations of that part of doc/nasmdoc.src from 62cb606 (the  
parent of the extraction, 9b49e24) are relevant, (the last part of)  
http://repo.or.cz/w/nasm.git/blame/62cb606f6876b01c5d89ad00b6d3d4a3a2ffccf2:/doc/nasmdoc.src

This indicates that all the relevant changes are recorded as checked in by  
Peter, Keith, Debbie, and Frank. I don't know whether that means that only  
the four of you would be relevant for the licensing, though. Hence, it  
seems best to ask you here.

Regards,
Chris



The manuals for the encoding are out I have just printed them out.
________________________________________
From: Cyrill Gorcunov [gor...@gm...] on behalf of Cyrill Gorcunov [gor...@op...]
Sent: Tuesday, September 25, 2012 7:46 AM
To: William Cockshott
Cc: nas...@li...
Subject: Re: [Nasm-devel] Xeon Phi

On Mon, Sep 24, 2012 at 07:23:41PM +0000, William Cockshott wrote:
> Hi there I am the chief maintainer of the Vector Pascal compiler which uses Nasm as its preferred back end.
> I am keen to have a version of the compiler out for the Xeon Phi as soon as I can get hold of a Xeon Phi board.
> It would be a big help if the Nasm team plan to release a Xeon Phi upgrade since otherwise I will be
> forced into the undocumented purgatory of the Gnu Assembler.
>
> Are there any such plans?

As Peter mentioned, indeed the encoding is not yet well established. But we have
a plan to support it sometime in the future. No dates though.

        Cyrill



> But we have a plan to support it sometime in the future. No dates though.

I have been experimenting with a prototype
implementation that supports the following:

- K1OM instructions
- K1OM support in [CPU] and CPU
- support for __CPU_K1OM__
- MVEX.R and MVEX.V prefixes (to force unused bits to 1)
- ZMM0...ZMM31 registers (including VSIB)
- K0...K7 mask registers
- disp8*N displacements
- operand modifiers -- {transform} ( op {eviction hint} ) {mask}
- ZWORD operand size qualifier
- ZWORD segment alignment argument
- DZ and RESZ pseudo instructions
- DZ and __DZ__ standard macros
- XITEMZ optional standard macro

The sanity checking for the {...} modifiers is
somewhat tedious, but doable.

However, support for parentheses around an
operand (as suggested by #327364-001 3.5)
is tricky: for reg ops the parentheses simply
evaluate, but for mem ops the parser has to
explicitly skip them -- however, that decision
hinges on the leading transform modifier and
there is no clear reg versus mem distinction,
because of the D(...) down converts: they do
use reg ops, but with mem transforms.

I have not decided the most suitable course
yet -- add extra parse-ahead to find '[', parse
down convert instructions specially, or, well,
ignore Intel's suggested syntax (i.e. no (...),
and permit all modifiers before and after any
operand, reg or mem [with subsequent tests
for their sanity/validity, of course]).
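The first option, parsing ahead for '[', might look roughly like this. It is a hypothetical C sketch, not code from the prototype: it scans one operand up to the next top-level comma, skips anything inside {...} modifiers, and reports whether a '[' appears. The modifier strings in the test cases are placeholders, not real KNC transform names.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the operand starting at s (up to the next top-level ',' or
 * the end of the string) contains a '[' outside {...} modifiers, i.e. it
 * looks like a memory operand; returns 0 otherwise. */
static int operand_is_memory(const char *s)
{
    int brace_depth = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (*s == '{') {
            brace_depth++;
        } else if (*s == '}' && brace_depth > 0) {
            brace_depth--;
        } else if (brace_depth == 0) {
            if (*s == ',')
                break;               /* end of this operand */
            if (*s == '[')
                return 1;
        }
    }
    return 0;
}
```

This only answers "register or memory" for the current operand; the transform modifier itself would still have to be checked against that answer afterwards.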

And yes, I share everyone's concerns about
whether K1OM will persist, or simply be yet
another one-off like L1OM. It's up to Intel
to provide clarity on that front first.



On Mon, Sep 24, 2012 at 07:23:41PM +0000, William Cockshott wrote:
> Hi there I am the chief maintainer of the Vector Pascal compiler which uses Nasm as its preferred back end.
> I am keen to have a version of the compiler out for the Xeon Phi as soon as I can get hold of a Xeon Phi board.
> It would be a big help if the Nasm team plan to release a Xeon Phi upgrade since otherwise I will be
> forced into the undocumented purgatory of the Gnu Assembler.
> 
> Are there any such plans?

As Peter mentioned, indeed the encoding is not yet well established. But we have
a plan to support it sometime in the future. No dates though.

	Cyrill



On Mon, 24 Sep 2012, William Cockshott wrote:
> Hi there I am the chief maintainer of the Vector Pascal compiler which uses Nasm as its
> preferred back end. I am keen to have a version of the compiler out for the Xeon Phi as
> soon as I can get hold of a Xeon Phi board. It would be a big help if the Nasm team
> plan to release a Xeon Phi upgrade since otherwise I will be forced into the
> undocumented purgatory of the Gnu Assembler.

I've been looking at adding support for Knights Corner (KNC, Xeon Phi) to 
Yasm, but when I asked Intel about it, the response I received was that 
Intel has not committed to maintaining the instruction encodings.  It's 
hard to know if they can't commit right now because that triggers 
cross-licensing arrangements, or whether they really plan on changing the 
encodings in the near future.

As far as I can tell, GAS doesn't even have support for it yet (at least 
in official CVS).

-Peter



Hi there I am the chief maintainer of the Vector Pascal compiler which uses Nasm as its preferred back end.
I am keen to have a version of the compiler out for the Xeon Phi as soon as I can get hold of a Xeon Phi board. It would be a big help if the Nasm team plan to release a Xeon Phi upgrade since otherwise I will be forced into the undocumented purgatory of the Gnu Assembler.

Are there any such plans?

Йордан Гигов wrote:
> Withdrawn.

Thanks for the attempt, anyway. I hope you'll stick around and discuss 
ideas with this list!

Best,
Frank






Withdrawn. I was working under the mistaken assumption that it would use
64-bit relative addressing by default, since I saw no reason to limit
the address space by using absolute addressing.



On 07/02/2012 12:44 AM, Йордан Гигов wrote:
> The language addition can indeed be achieved with macros, but you
> should really test the one in process_ea(). I can't test it until I
> find out why all the 32-bit linkers I try are unable to find any of
> the symbols. I haven't tried alink yet.
> I have a feeling the else block after it won't work right.

That patch looks wrong, and I mean dangerously wrong.  I think you don't 
quite understand how the CPU works.

The reason that code is there is that in 64-bit mode, a displacement 
without a SIB is a RIP-relative reference.  There is no 64-bit 
displacement mode (except for one instruction, see the manual) at all; 
you have to get the address into a register.
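To illustrate the point about 64-bit addressing: with no base register, the ModR/M pattern mod=00, rm=101, which means [disp32] in 32-bit mode, means [rip+disp32] in 64-bit mode, so an absolute 32-bit address has to be encoded through a SIB byte with no index and base=101. The hypothetical helper below emits both forms. In both cases the disp32 is sign-extended to 64 bits, so an absolute disp32 can only reach the low (or top) 2 GiB of the address space. The single exception mentioned is presumably the `MOV` forms with a 64-bit `moffs` operand (opcodes A0 to A3).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint8_t modrm(unsigned mod, unsigned reg, unsigned rm)
{
    assert(mod < 4 && reg < 8 && rm < 8);
    return (uint8_t)((mod << 6) | (reg << 3) | rm);
}

static uint8_t sib(unsigned scale, unsigned index, unsigned base)
{
    assert(scale < 4 && index < 8 && base < 8);
    return (uint8_t)((scale << 6) | (index << 3) | base);
}

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Emit ModR/M (+SIB) and disp32 for a 64-bit-mode memory operand with no
 * base and no index register. reg is the ModR/M.reg field (0-7). */
static size_t encode_disp32_operand(unsigned reg, int rip_relative,
                                    int32_t disp, uint8_t out[6])
{
    size_t n = 0;

    if (rip_relative) {
        out[n++] = modrm(0, reg, 5); /* mod=00 rm=101: [rip + disp32] */
    } else {
        out[n++] = modrm(0, reg, 4); /* rm=100: a SIB byte follows */
        out[n++] = sib(0, 4, 5);     /* no index; base=101 with mod=00: disp32 only */
    }
    put_le32(out + n, (uint32_t)disp);
    return n + 4;
}
```

Preceded by the opcode 8B, these give `mov eax, [rel x]` as 8B 05 followed by the disp32, and `mov eax, [abs x]` as 8B 04 25 followed by the disp32.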

	-hpa







The language addition can indeed be achieved with macros, but you
should really test the one in process_ea(). I can't test it until I
find out why all the 32-bit linkers I try are unable to find any of
the symbols. I haven't tried alink yet.
I have a feeling the else block after it won't work right.

2012/7/2 H. Peter Anvin <hp...@zy...>:
> On 07/01/2012 08:32 AM, Cyrill Gorcunov wrote:
>>
>> On Sun, Jul 01, 2012 at 10:13:49AM +0300, Йордан Гигов wrote:
>>>
>>> The current version of Nasm never generates mod 0 rm 5 bytes to
>>> address memory or code, thus it can only be linked with
>>> /LARGEADDRESSAWARE:NO by the Microsoft linkers. Additionally you can't
>>> specify a base larger than 0x7FFFFFFF. My patch fixes that.
>>>
>>> Also I make the proposition that in addition to "db", "dw", "dd",
>>> "dq", etc. keywords we add "dp" (as in define pointer). It is to be
>>> the same size as the program's BITS mode. In 64-bit mode it would
>>> behave as dq, in 32-bit as dd, and in 16-bit as dw.
>>
>>
>> I think this should be done rather by a macro definition than
>> squashing into C source (and, btw, don't address two problems in
>> one patch; it could be 2 patches -- one for sib and one for dp).
>>
>
> I think we could go either way on that... it's not a huge difference.
>
> However, to do an if tree is kind of silly...
>
>         -hpa
>
> --
> H. Peter Anvin, Intel Open Source Technology Center
> I work for Intel.  I don't speak on their behalf.
>
>
>
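Whichever way dp ends up implemented (a macro, as suggested, or C code), the size rule stated in the proposal reduces to a single expression rather than a chain of conditionals. A hypothetical C sketch; `dp_width` is not an existing NASM function:

```c
#include <assert.h>

/* Byte width of the proposed "dp" directive for a given BITS setting:
 * BITS 16 -> 2 (like dw), BITS 32 -> 4 (like dd), BITS 64 -> 8 (like dq). */
static int dp_width(int bits)
{
    assert(bits == 16 || bits == 32 || bits == 64);
    return bits / 8;
}
```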





nasm-devel — NASM development work