nasm-devel Mailing List for The Netwide Assembler (Page 23)

Brought to you by: cyrillos, fbkotler, hpa

nasm-devel — NASM development work

Re: [Nasm-devel] [PATCH 4/7] AVX-512: Add a feature to generate a raw bytecode file

From: Song, J. K. <jin...@in...> - 2013-08-28 10:00:09

This is weird. I will send this patch again. I guess the SMTP server clobbered the first few lines of this patch.

Thanks,
Jin

> -----Original Message-----
> From: Cyrill Gorcunov [mailto:gor...@gm...]
> Sent: Tuesday, August 27, 2013 10:42 PM
> To: Song, Jin Kyu
> Cc: nas...@li...
> Subject: Re: [Nasm-devel] [PATCH 4/7] AVX-512: Add a feature to generate a
> raw bytecode file
> 
> On Mon, Aug 26, 2013 at 08:28:40PM -0700, Jin Kyu Song wrote:
> > From gas testsuite file, a text file containing raw bytecodes
> > is useful when verifying the output of NASM.
> >
> > Signed-off-by: Jin Kyu Song <jin...@in...>
> > ---
> >  test/gas2nasm.py |   11 +++++++++++
> >  1 file changed, 11 insertions(+)
> >
> > diff --git a/test/gas2nasm.py b/test/gas2nasm.py
> > index de16745..a00af92 100755
> > --- a/test/gas2nasm.py
> > +++ b/test/gas2nasm.py
> > @@ -21,6 +21,9 @@ def setup():
> >      parser.add_option('-b', dest='bits', action='store',
> >              default="",
> >              help='Bits for output ASM file.')
> > +    parser.add_option('-r', dest='raw_output', action='store',
> > +            default="",
> > +            help='Name for raw output bytes in text')
> 
> This one doesn't apply. Please refresh and send again.

Re: [Nasm-devel] [PATCH 6/7] AVX-512: Change the data type for instruction flags

From: Song, J. K. <jin...@in...> - 2013-08-28 09:55:58

> -----Original Message-----
> From: Cyrill Gorcunov [mailto:gor...@gm...]
> Sent: Tuesday, August 27, 2013 10:44 PM
> To: Song, Jin Kyu
> Cc: nas...@li...
> Subject: Re: [Nasm-devel] [PATCH 6/7] AVX-512: Change the data type for
> instruction flags
> 
> On Mon, Aug 26, 2013 at 08:28:42PM -0700, Jin Kyu Song wrote:
> > Increased the size of data type for instruction flags from 32bits to 64bits.
> > And a new type (iflags_t) is defined for better maintainability.
> >
> > Bigger data type is needed because more instruction set types are coming
> > but there were not enough space for them. Since they are not bit masks,
> > only one instruction set is allowed for each instruction.
> >
> > Signed-off-by: Jin Kyu Song <jin...@in...>
> > -CVTPI2PS	xmmreg,mmxrm64			[rm:	np 0f 2a /r]				KATMAI,SSE,MMX
> > -CVTPS2PI	mmxreg,xmmrm64			[rm:	np 0f 2d /r]				KATMAI,SSE,MMX
> > +CVTPI2PS	xmmreg,mmxrm64			[rm:	np 0f 2a /r]				KATMAI,SSE
> > +CVTPS2PI	mmxreg,xmmrm64			[rm:	np 0f 2d /r]				KATMAI,SSE
> 
> Why have you changed the flags here and in a couple of other places?

The reason is actually written in the commit message: "Since they are not bit masks, only one instruction set is allowed for each instruction." So SSE and MMX cannot both be set for one instruction. As nasm64developer mentioned in his email, this may not be the proper way to expand and define bits for instruction sets. It needs a major restructuring of the instruction flags, not a simple fix that increases the data type size.
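
The aliasing problem can be shown with a small sketch. Python is used only for brevity. The constants are copied from the insns.h hunk of patch 6/7, and the helper name `insn_set` is made up for illustration. Because the instruction set field holds an enumerated value rather than one bit per set, OR-ing two sets yields the value of a third one:

```python
# Values copied from the insns.h hunk of patch 6/7.
IF_MMX     = 0x0200000000
IF_SSE     = 0x0400000000
IF_SSE3    = 0x0600000000
IF_INSMASK = 0xFF00000000  # the mask for instruction set types


def insn_set(flags):
    """Hypothetical helper: extract the instruction set field from a flags word."""
    return flags & IF_INSMASK


combined = IF_SSE | IF_MMX
# The field is an enumerated value, so the OR aliases another set.
print(insn_set(combined) == IF_SSE3)  # True
```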

The original purpose of this change was merely that I needed a way to distinguish an EVEX instruction from a VEX one that has exactly the same operand types. For example, "vmovq xmm30,xmm29" should be encoded with EVEX because of the high-16 registers, but in the matches() function I could not think of a way to tell whether the template currently being matched is VEX or EVEX other than checking the first byte of the bytecode (0240 or 0260). So I decided to enable the instruction set flags in IF_*.

static const struct itemplate instrux_VMOVQ[] = {
    {I_VMOVQ, 2, {XMMREG,RM_XMM|BITS64,0,0,0}, NO_DECORATOR, nasm_bytecodes+13891, IF_AVX|IF_SANDYBRIDGE|IF_SQ},
    {I_VMOVQ, 2, {XMMREG,RM_XMM|BITS64,0,0,0}, NO_DECORATOR, nasm_bytecodes+9496, IF_AVX512|IF_FUTURE},

Please let me know if there is a better way to implement this part.

Re: [Nasm-devel] [PATCH 0/6] AVX-512: Bug fixes and additional features

From: Cyrill G. <gor...@gm...> - 2013-08-28 05:47:53

On Wed, Aug 21, 2013 at 07:29:07PM -0700, Jin Kyu Song wrote:
> Please review these patches and pull if they look good.
> git://repo.or.cz/nasm/avx512.git
> 
> After running a test case, various issues were found. One major thing is
> curly brace already used for grouping multi-line macro parameters.
> An escape backward slash character '\' is added when braces are passed
> as a part of enclosed parameter. The test asm file used here is also included.
> 
> Patch "AVX-512: Add a test case for EVEX encoded instructions" is
> relatively huge. So I did not attach that patch in this email. Please refer to
> http://repo.or.cz/w/nasm/avx512.git/commitdiff/a4a573c47f3d9ddfd5c2521804454327765f367e

Jin, I picked up the series, except patch 4/7, which doesn't apply. But I'm deferring
pushing it out until you explain why the MMX IF flags are wiped.

Re: [Nasm-devel] [PATCH 6/7] AVX-512: Change the data type for instruction flags

From: Cyrill G. <gor...@gm...> - 2013-08-28 05:44:23

On Mon, Aug 26, 2013 at 08:28:42PM -0700, Jin Kyu Song wrote:
> Increased the size of data type for instruction flags from 32bits to 64bits.
> And a new type (iflags_t) is defined for better maintainability.
> 
> Bigger data type is needed because more instruction set types are coming
> but there were not enough space for them. Since they are not bit masks,
> only one instruction set is allowed for each instruction.
> 
> Signed-off-by: Jin Kyu Song <jin...@in...>
> -CVTPI2PS	xmmreg,mmxrm64			[rm:	np 0f 2a /r]				KATMAI,SSE,MMX
> -CVTPS2PI	mmxreg,xmmrm64			[rm:	np 0f 2d /r]				KATMAI,SSE,MMX
> +CVTPI2PS	xmmreg,mmxrm64			[rm:	np 0f 2a /r]				KATMAI,SSE
> +CVTPS2PI	mmxreg,xmmrm64			[rm:	np 0f 2d /r]				KATMAI,SSE

Why have you changed the flags here and in a couple of other places?

Re: [Nasm-devel] [PATCH 4/7] AVX-512: Add a feature to generate a raw bytecode file

From: Cyrill G. <gor...@gm...> - 2013-08-28 05:42:25

On Mon, Aug 26, 2013 at 08:28:40PM -0700, Jin Kyu Song wrote:
> From gas testsuite file, a text file containing raw bytecodes
> is useful when verifying the output of NASM.
> 
> Signed-off-by: Jin Kyu Song <jin...@in...>
> ---
>  test/gas2nasm.py |   11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/test/gas2nasm.py b/test/gas2nasm.py
> index de16745..a00af92 100755
> --- a/test/gas2nasm.py
> +++ b/test/gas2nasm.py
> @@ -21,6 +21,9 @@ def setup():
>      parser.add_option('-b', dest='bits', action='store',
>              default="",
>              help='Bits for output ASM file.')
> +    parser.add_option('-r', dest='raw_output', action='store',
> +            default="",
> +            help='Name for raw output bytes in text')

This one doesn't apply. Please refresh and send again.

Re: [Nasm-devel] [PATCH 6/7] AVX-512: Change the data type for instruction flags

From: anonymous c. <nas...@us...> - 2013-08-28 05:17:24

>> PTEST and ROUND[PS|PD|SS|SD] were part of SSE4.1 and SSE5A.
> Thanks for letting me know. NASM does not have a definition of SSE5A though.
> Is NASM missing SSE5A intentionally?

AMD decided not to ship SSE5A in the end.

(I merely cited it as an example for instructions that
were part of more than one extension or feature.)

> So do you suggest making them all bit masks to lift this kind of limitation
> (prohibiting two different types)? Since it became 64-bit, there is enough
> space for now. But I worry that it will run out quickly.

You already need more than 64 bits to properly map
all existing instructions to their CPUID features (and
in some cases you will need to make up a flag, since
CPUID lacks a few of them, e.g. hinting NOPs, hints
for branches, PAUSE, etc.).

So instead of just widening from 32 to 64 bits, you'll
need a real bit vector, with more than 64 bits.

Also, this would be a change that you would want to
make on the main tree, not the AVX-512 branch.
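
One way to read the "real bit vector" suggestion is an array of fixed-width words indexed by feature number. The sketch below only illustrates that idea and is not code from NASM. The class name, the word width and the feature numbers are all made up.

```python
WORD_BITS = 32  # assumed word width, mirroring an array of uint32_t in C


class FeatureSet:
    """Hypothetical bit vector: one bit per feature, any number of features."""

    def __init__(self, nfeatures):
        nwords = (nfeatures + WORD_BITS - 1) // WORD_BITS
        self.words = [0] * nwords

    def set(self, bit):
        self.words[bit // WORD_BITS] |= 1 << (bit % WORD_BITS)

    def test(self, bit):
        return bool((self.words[bit // WORD_BITS] >> (bit % WORD_BITS)) & 1)


# Made-up feature numbers, for illustration only.
FEAT_RTM, FEAT_HLE, FEAT_OTHER = 3, 40, 70

xtest = FeatureSet(96)   # 96 features need three 32-bit words
xtest.set(FEAT_RTM)
xtest.set(FEAT_HLE)      # one instruction can carry two features
```

With one bit per feature, an instruction such as XTEST can carry both of its features, and adding a feature beyond bit 63 only grows the array.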

Re: [Nasm-devel] [PATCH 6/7] AVX-512: Change the data type for instruction flags

From: Song, J. K. <jin...@in...> - 2013-08-27 21:57:56

> PTEST and ROUND[PS|PD|SS|SD] were part of SSE4.1 and SSE5A.
Thanks for letting me know. NASM does not have a definition of SSE5A though. Is NASM missing SSE5A intentionally?

> 
> XTEST is part of RTM and HLE.
I made UNDOC and HLE bit masks for this reason, while the others are values.

#define IF_UNDOC        0x8000000000UL    /* it's an undocumented instruction */
#define IF_HLE          0x4000000000UL    /* HACK NEED TO REORGANIZE THESE BITS */

So do you suggest making them all bit masks to lift this kind of limitation (prohibiting two different types)? Since it became 64-bit, there is enough space for now. But I worry that it will run out quickly.

Another question: 0x00FFC000UL does not seem to be used anywhere now. Are those bits reserved?

Re: [Nasm-devel] [PATCH 0/6] AVX-512: Bug fixes and additional features

From: anonymous c. <nas...@us...> - 2013-08-27 04:22:19

> And please note that current code patch searches for two specific strings of
> "\{" and "\}", so it might not break any existing code that have used
> backslashes in macro parameters.

How do you specify a parameter enclosed in '{' and '}'
that expands to something containing '\{' and '\}'?

Re: [Nasm-devel] [PATCH 6/7] AVX-512: Change the data type for instruction flags

From: anonymous c. <nas...@us...> - 2013-08-27 04:02:32

> Increased the size of data type for instruction flags from 32bits to 64bits.
> And a new type (iflags_t) is defined for better maintainability.
>
> Bigger data type is needed because more instruction set types are coming
> but there were not enough space for them. Since they are not bit masks,
> only one instruction set is allowed for each instruction.

PTEST and ROUND[PS|PD|SS|SD] were part of SSE4.1 and SSE5A.

XTEST is part of RTM and HLE.

[Nasm-devel] [PATCH 6/7] AVX-512: Change the data type for instruction flags

From: Jin K. S. <jin...@in...> - 2013-08-27 03:30:00

Increased the size of data type for instruction flags from 32bits to 64bits.
And a new type (iflags_t) is defined for better maintainability.

Bigger data type is needed because more instruction set types are coming
but there were not enough space for them. Since they are not bit masks,
only one instruction set is allowed for each instruction.

Signed-off-by: Jin Kyu Song <jin...@in...>
---
 assemble.c |    6 +++---
 assemble.h |    4 ++--
 disasm.c   |    4 ++--
 disasm.h   |    2 +-
 insns.dat  |   46 +++++++++++++++++++++++-----------------------
 insns.h    |   53 ++++++++++++++++++++++++++++-------------------------
 insns.pl   |   15 +++++++++++++++
 nasm.c     |    8 ++++----
 nasm.h     |    2 ++
 ndisasm.c  |    2 +-
 10 files changed, 81 insertions(+), 61 deletions(-)

diff --git a/assemble.c b/assemble.c
index baae15f..c22075d 100644
--- a/assemble.c
+++ b/assemble.c
@@ -213,7 +213,7 @@ typedef struct {
 #define GEN_MODRM(mod, reg, rm)                     \
         (((mod) << 6) | (((reg) & 7) << 3) | ((rm) & 7))
 
-static uint32_t cpu;            /* cpu level received from nasm.c */
+static iflags_t cpu;            /* cpu level received from nasm.c */
 static efunc errfunc;
 static struct ofmt *outfmt;
 static ListGen *list;
@@ -377,7 +377,7 @@ static bool jmp_match(int32_t segment, int64_t offset, int bits,
     return (isize >= -128 && isize <= 127); /* is it byte size? */
 }
 
-int64_t assemble(int32_t segment, int64_t offset, int bits, uint32_t cp,
+int64_t assemble(int32_t segment, int64_t offset, int bits, iflags_t cp,
                  insn * instruction, struct ofmt *output, efunc error,
                  ListGen * listgen)
 {
@@ -680,7 +680,7 @@ int64_t assemble(int32_t segment, int64_t offset, int bits, uint32_t cp,
     return 0;
 }
 
-int64_t insn_size(int32_t segment, int64_t offset, int bits, uint32_t cp,
+int64_t insn_size(int32_t segment, int64_t offset, int bits, iflags_t cp,
                   insn * instruction, efunc error)
 {
     const struct itemplate *temp;
diff --git a/assemble.h b/assemble.h
index e5e5015..1197d59 100644
--- a/assemble.h
+++ b/assemble.h
@@ -38,9 +38,9 @@
 #ifndef NASM_ASSEMBLE_H
 #define NASM_ASSEMBLE_H
 
-int64_t insn_size(int32_t segment, int64_t offset, int bits, uint32_t cp,
+int64_t insn_size(int32_t segment, int64_t offset, int bits, iflags_t cp,
                insn * instruction, efunc error);
-int64_t assemble(int32_t segment, int64_t offset, int bits, uint32_t cp,
+int64_t assemble(int32_t segment, int64_t offset, int bits, iflags_t cp,
               insn * instruction, struct ofmt *output, efunc error,
               ListGen * listgen);
 
diff --git a/disasm.c b/disasm.c
index cc55d2c..9a5f9ad 100644
--- a/disasm.c
+++ b/disasm.c
@@ -944,7 +944,7 @@ static const char * const condition_name[16] = {
 };
 
 int32_t disasm(uint8_t *data, char *output, int outbufsize, int segsize,
-            int32_t offset, int autosync, uint32_t prefer)
+            int32_t offset, int autosync, iflags_t prefer)
 {
     const struct itemplate * const *p, * const *best_p;
     const struct disasm_index *ix;
@@ -955,7 +955,7 @@ int32_t disasm(uint8_t *data, char *output, int outbufsize, int segsize,
     uint8_t *origdata;
     int works;
     insn tmp_ins, ins;
-    uint32_t goodness, best;
+    iflags_t goodness, best;
     int best_pref;
     struct prefix_info prefix;
     bool end_prefix;
diff --git a/disasm.h b/disasm.h
index 3edbfd5..70a9a7b 100644
--- a/disasm.h
+++ b/disasm.h
@@ -41,7 +41,7 @@
 #define INSN_MAX 32             /* one instruction can't be longer than this */
 
 int32_t disasm(uint8_t *data, char *output, int outbufsize, int segsize,
-            int32_t offset, int autosync, uint32_t prefer);
+            int32_t offset, int autosync, iflags_t prefer);
 int32_t eatbyte(uint8_t *data, char *output, int outbufsize, int segsize);
 
 #endif
diff --git a/insns.dat b/insns.dat
index 7a0ec60..772a3e9 100644
--- a/insns.dat
+++ b/insns.dat
@@ -1514,8 +1514,8 @@ CMPPS		xmmreg,xmmreg,imm		[rmi:	np 0f c2 /r ib,u]			KATMAI,SSE,SB,AR2
 CMPSS		xmmreg,mem,imm			[rmi:	f3 0f c2 /r ib,u]			KATMAI,SSE,SB,AR2
 CMPSS		xmmreg,xmmreg,imm		[rmi:	f3 0f c2 /r ib,u]			KATMAI,SSE,SB,AR2
 COMISS		xmmreg,xmmrm32			[rm:	np 0f 2f /r]				KATMAI,SSE
-CVTPI2PS	xmmreg,mmxrm64			[rm:	np 0f 2a /r]				KATMAI,SSE,MMX
-CVTPS2PI	mmxreg,xmmrm64			[rm:	np 0f 2d /r]				KATMAI,SSE,MMX
+CVTPI2PS	xmmreg,mmxrm64			[rm:	np 0f 2a /r]				KATMAI,SSE
+CVTPS2PI	mmxreg,xmmrm64			[rm:	np 0f 2d /r]				KATMAI,SSE
 CVTSI2SS	xmmreg,mem			[rm:	f3 0f 2a /r]				KATMAI,SSE,SD,AR1,ND
 CVTSI2SS	xmmreg,rm32			[rm:	f3 0f 2a /r]				KATMAI,SSE,SD,AR1
 CVTSI2SS	xmmreg,rm64			[rm:	o64 f3 0f 2a /r]			X64,SSE,SQ,AR1
@@ -1523,7 +1523,7 @@ CVTSS2SI	reg32,xmmreg			[rm:	f3 0f 2d /r]				KATMAI,SSE,SD,AR1
 CVTSS2SI	reg32,mem			[rm:	f3 0f 2d /r]				KATMAI,SSE,SD,AR1
 CVTSS2SI	reg64,xmmreg			[rm:	o64 f3 0f 2d /r]			X64,SSE,SD,AR1
 CVTSS2SI	reg64,mem			[rm:	o64 f3 0f 2d /r]			X64,SSE,SD,AR1
-CVTTPS2PI	mmxreg,xmmrm			[rm:	np 0f 2c /r]				KATMAI,SSE,MMX,SQ
+CVTTPS2PI	mmxreg,xmmrm			[rm:	np 0f 2c /r]				KATMAI,SSE,SQ
 CVTTSS2SI	reg32,xmmrm			[rm:	f3 0f 2c /r]				KATMAI,SSE,SD,AR1
 CVTTSS2SI	reg64,xmmrm			[rm:	o64 f3 0f 2c /r]			X64,SSE,SD,AR1
 DIVPS		xmmreg,xmmrm128			[rm:	np 0f 5e /r]				KATMAI,SSE
@@ -1568,10 +1568,10 @@ UNPCKLPS	xmmreg,xmmrm128			[rm:	np 0f 14 /r]				KATMAI,SSE
 XORPS		xmmreg,xmmrm128			[rm:	np 0f 57 /r]				KATMAI,SSE
 
 ;# Introduced in Deschutes but necessary for SSE support
-FXRSTOR		mem				[m:	np 0f ae /1]				P6,SSE,FPU
-FXRSTOR64	mem				[m:	o64 np 0f ae /1]			X64,SSE,FPU
-FXSAVE		mem				[m:	np 0f ae /0]				P6,SSE,FPU
-FXSAVE64	mem				[m:	o64 np 0f ae /0]			X64,SSE,FPU
+FXRSTOR		mem				[m:	np 0f ae /1]				P6,SSE
+FXRSTOR64	mem				[m:	o64 np 0f ae /1]			X64,SSE
+FXSAVE		mem				[m:	np 0f ae /0]				P6,SSE
+FXSAVE64	mem				[m:	o64 np 0f ae /0]			X64,SSE
 
 ;# XSAVE group (AVX and extended state)
 ; Introduced in late Penryn ... we really need to clean up the handling
@@ -1863,37 +1863,37 @@ INVVPID		reg32,mem			[rm: 66 0f 38 81 /r]				VMX,SO,NOLONG
 INVVPID		reg64,mem			[rm: o64nw 66 0f 38 81 /r]			VMX,SO,LONG
 
 ;# Tejas New Instructions (SSSE3)
-PABSB		mmxreg,mmxrm			[rm:	np 0f 38 1c /r]				SSSE3,MMX,SQ
+PABSB		mmxreg,mmxrm			[rm:	np 0f 38 1c /r]				SSSE3,SQ
 PABSB		xmmreg,xmmrm			[rm:	66 0f 38 1c /r]				SSSE3
-PABSW		mmxreg,mmxrm			[rm:	np 0f 38 1d /r]				SSSE3,MMX,SQ
+PABSW		mmxreg,mmxrm			[rm:	np 0f 38 1d /r]				SSSE3,SQ
 PABSW		xmmreg,xmmrm			[rm:	66 0f 38 1d /r]				SSSE3
-PABSD		mmxreg,mmxrm			[rm:	np 0f 38 1e /r]				SSSE3,MMX,SQ
+PABSD		mmxreg,mmxrm			[rm:	np 0f 38 1e /r]				SSSE3,SQ
 PABSD		xmmreg,xmmrm			[rm:	66 0f 38 1e /r]				SSSE3
-PALIGNR		mmxreg,mmxrm,imm		[rmi:	np 0f 3a 0f /r ib,u]			SSSE3,MMX,SQ
+PALIGNR		mmxreg,mmxrm,imm		[rmi:	np 0f 3a 0f /r ib,u]			SSSE3,SQ
 PALIGNR		xmmreg,xmmrm,imm		[rmi:	66 0f 3a 0f /r ib,u]			SSSE3
-PHADDW		mmxreg,mmxrm			[rm:	np 0f 38 01 /r]				SSSE3,MMX,SQ
+PHADDW		mmxreg,mmxrm			[rm:	np 0f 38 01 /r]				SSSE3,SQ
 PHADDW		xmmreg,xmmrm			[rm:	66 0f 38 01 /r]				SSSE3
-PHADDD		mmxreg,mmxrm			[rm:	np 0f 38 02 /r]				SSSE3,MMX,SQ
+PHADDD		mmxreg,mmxrm			[rm:	np 0f 38 02 /r]				SSSE3,SQ
 PHADDD		xmmreg,xmmrm			[rm:	66 0f 38 02 /r]				SSSE3
-PHADDSW		mmxreg,mmxrm			[rm:	np 0f 38 03 /r]				SSSE3,MMX,SQ
+PHADDSW		mmxreg,mmxrm			[rm:	np 0f 38 03 /r]				SSSE3,SQ
 PHADDSW		xmmreg,xmmrm			[rm:	66 0f 38 03 /r]				SSSE3
-PHSUBW		mmxreg,mmxrm			[rm:	np 0f 38 05 /r]				SSSE3,MMX,SQ
+PHSUBW		mmxreg,mmxrm			[rm:	np 0f 38 05 /r]				SSSE3,SQ
 PHSUBW		xmmreg,xmmrm			[rm:	66 0f 38 05 /r]				SSSE3
-PHSUBD		mmxreg,mmxrm			[rm:	np 0f 38 06 /r]				SSSE3,MMX,SQ
+PHSUBD		mmxreg,mmxrm			[rm:	np 0f 38 06 /r]				SSSE3,SQ
 PHSUBD		xmmreg,xmmrm			[rm:	66 0f 38 06 /r]				SSSE3
-PHSUBSW		mmxreg,mmxrm			[rm:	np 0f 38 07 /r]				SSSE3,MMX,SQ
+PHSUBSW		mmxreg,mmxrm			[rm:	np 0f 38 07 /r]				SSSE3,SQ
 PHSUBSW		xmmreg,xmmrm			[rm:	66 0f 38 07 /r]				SSSE3
-PMADDUBSW	mmxreg,mmxrm			[rm:	np 0f 38 04 /r]				SSSE3,MMX,SQ
+PMADDUBSW	mmxreg,mmxrm			[rm:	np 0f 38 04 /r]				SSSE3,SQ
 PMADDUBSW	xmmreg,xmmrm			[rm:	66 0f 38 04 /r]				SSSE3
-PMULHRSW	mmxreg,mmxrm			[rm:	np 0f 38 0b /r]				SSSE3,MMX,SQ
+PMULHRSW	mmxreg,mmxrm			[rm:	np 0f 38 0b /r]				SSSE3,SQ
 PMULHRSW	xmmreg,xmmrm			[rm:	66 0f 38 0b /r]				SSSE3
-PSHUFB		mmxreg,mmxrm			[rm:	np 0f 38 00 /r]				SSSE3,MMX,SQ
+PSHUFB		mmxreg,mmxrm			[rm:	np 0f 38 00 /r]				SSSE3,SQ
 PSHUFB		xmmreg,xmmrm			[rm:	66 0f 38 00 /r]				SSSE3
-PSIGNB		mmxreg,mmxrm			[rm:	np 0f 38 08 /r]				SSSE3,MMX,SQ
+PSIGNB		mmxreg,mmxrm			[rm:	np 0f 38 08 /r]				SSSE3,SQ
 PSIGNB		xmmreg,xmmrm			[rm:	66 0f 38 08 /r]				SSSE3
-PSIGNW		mmxreg,mmxrm			[rm:	np 0f 38 09 /r]				SSSE3,MMX,SQ
+PSIGNW		mmxreg,mmxrm			[rm:	np 0f 38 09 /r]				SSSE3,SQ
 PSIGNW		xmmreg,xmmrm			[rm:	66 0f 38 09 /r]				SSSE3
-PSIGND		mmxreg,mmxrm			[rm:	np 0f 38 0a /r]				SSSE3,MMX,SQ
+PSIGND		mmxreg,mmxrm			[rm:	np 0f 38 0a /r]				SSSE3,SQ
 PSIGND		xmmreg,xmmrm			[rm:	66 0f 38 0a /r]				SSSE3
 
 ;# AMD SSE4A
diff --git a/insns.h b/insns.h
index 58a4cd7..ad795e2 100644
--- a/insns.h
+++ b/insns.h
@@ -19,7 +19,7 @@ struct itemplate {
     opflags_t       opd[MAX_OPERANDS];  /* bit flags for operand types */
     decoflags_t     deco[MAX_OPERANDS]; /* bit flags for operand decorators */
     const uint8_t   *code;              /* the code it assembles to */
-    uint32_t        flags;              /* some flags */
+    iflags_t        flags;              /* some flags */
 };
 
 /* Disassembler table structure */
@@ -72,6 +72,8 @@ extern const uint8_t nasm_bytecodes[];
  * (The default state if neither IF_SM nor IF_SM2 is specified is
  * that any operand with unspecified size in the template is
  * required to have unspecified size in the instruction too...)
+ *
+ * iflags_t is defined to store these flags.
  */
 
 #define IF_SM           0x00000001UL    /* size match */
@@ -103,33 +105,34 @@ extern const uint8_t nasm_bytecodes[];
 #define IF_LONG         0x00001000UL    /* long mode instruction */
 #define IF_NOHLE	0x00002000UL    /* HLE prefixes forbidden */
 /* These flags are currently not used for anything - intended for insn set */
-#define IF_UNDOC        0x00000000UL    /* it's an undocumented instruction */
-#define IF_FPU          0x00000000UL    /* it's an FPU instruction */
-#define IF_MMX          0x00000000UL    /* it's an MMX instruction */
-#define IF_3DNOW        0x00000000UL    /* it's a 3DNow! instruction */
-#define IF_SSE          0x00000000UL    /* it's a SSE (KNI, MMX2) instruction */
-#define IF_SSE2         0x00000000UL    /* it's a SSE2 instruction */
-#define IF_SSE3         0x00000000UL    /* it's a SSE3 (PNI) instruction */
-#define IF_VMX          0x00000000UL    /* it's a VMX instruction */
-#define IF_SSSE3        0x00000000UL    /* it's an SSSE3 instruction */
-#define IF_SSE4A        0x00000000UL    /* AMD SSE4a */
-#define IF_SSE41        0x00000000UL    /* it's an SSE4.1 instruction */
-#define IF_SSE42        0x00000000UL    /* HACK NEED TO REORGANIZE THESE BITS */
-#define IF_SSE5         0x00000000UL    /* HACK NEED TO REORGANIZE THESE BITS */
-#define IF_AVX          0x00000000UL    /* HACK NEED TO REORGANIZE THESE BITS */
-#define IF_AVX2         0x00000000UL    /* HACK NEED TO REORGANIZE THESE BITS */
-#define IF_AVX512       0x00000000UL    /* HACK NEED TO REORGANIZE THESE BITS */
-#define IF_FMA          0x00000000UL    /* HACK NEED TO REORGANIZE THESE BITS */
-#define IF_BMI1         0x00000000UL    /* HACK NEED TO REORGANIZE THESE BITS */
-#define IF_BMI2         0x00000000UL    /* HACK NEED TO REORGANIZE THESE BITS */
-#define IF_TBM          0x00000000UL    /* HACK NEED TO REORGANIZE THESE BITS */
-#define IF_HLE          0x00000000UL    /* HACK NEED TO REORGANIZE THESE BITS */
-#define IF_RTM          0x00000000UL    /* HACK NEED TO REORGANIZE THESE BITS */
-#define IF_INVPCID      0x00000000UL    /* HACK NEED TO REORGANIZE THESE BITS */
+#define IF_UNDOC        0x8000000000UL    /* it's an undocumented instruction */
+#define IF_HLE          0x4000000000UL    /* HACK NEED TO REORGANIZE THESE BITS */
+#define IF_FPU          0x0100000000UL    /* it's an FPU instruction */
+#define IF_MMX          0x0200000000UL    /* it's an MMX instruction */
+#define IF_3DNOW        0x0300000000UL    /* it's a 3DNow! instruction */
+#define IF_SSE          0x0400000000UL    /* it's a SSE (KNI, MMX2) instruction */
+#define IF_SSE2         0x0500000000UL    /* it's a SSE2 instruction */
+#define IF_SSE3         0x0600000000UL    /* it's a SSE3 (PNI) instruction */
+#define IF_VMX          0x0700000000UL    /* it's a VMX instruction */
+#define IF_SSSE3        0x0800000000UL    /* it's an SSSE3 instruction */
+#define IF_SSE4A        0x0900000000UL    /* AMD SSE4a */
+#define IF_SSE41        0x0A00000000UL    /* it's an SSE4.1 instruction */
+#define IF_SSE42        0x0B00000000UL    /* HACK NEED TO REORGANIZE THESE BITS */
+#define IF_SSE5         0x0C00000000UL    /* HACK NEED TO REORGANIZE THESE BITS */
+#define IF_AVX          0x0D00000000UL    /* it's an AVX     (128b) instruction */
+#define IF_AVX2         0x0E00000000UL    /* it's an AVX2    (256b) instruction */
+#define IF_AVX512       0x0F00000000UL    /* it's an AVX-512 (512b) instruction */
+#define IF_FMA          0x1000000000UL    /* HACK NEED TO REORGANIZE THESE BITS */
+#define IF_BMI1         0x1100000000UL    /* HACK NEED TO REORGANIZE THESE BITS */
+#define IF_BMI2         0x1200000000UL    /* HACK NEED TO REORGANIZE THESE BITS */
+#define IF_TBM          0x1300000000UL    /* HACK NEED TO REORGANIZE THESE BITS */
+#define IF_RTM          0x1400000000UL    /* HACK NEED TO REORGANIZE THESE BITS */
+#define IF_INVPCID      0x1500000000UL    /* HACK NEED TO REORGANIZE THESE BITS */
+#define IF_INSMASK      0xFF00000000UL    /* the mask for instruction set types */
 #define IF_PMASK        0xFF000000UL    /* the mask for processor types */
 #define IF_PLEVEL       0x0F000000UL    /* the mask for processor instr. level */
                                         /* also the highest possible processor */
-#define IF_PFMASK       0xF01FF800UL    /* the mask for disassembly "prefer" */
+#define IF_PFMASK       0xFFF0000000UL    /* the mask for disassembly "prefer" */
 #define IF_8086         0x00000000UL    /* 8086 instruction */
 #define IF_186          0x01000000UL    /* 186+ instruction */
 #define IF_286          0x02000000UL    /* 286+ instruction */
diff --git a/insns.pl b/insns.pl
index eb99f6b..60f7dd3 100755
--- a/insns.pl
+++ b/insns.pl
@@ -427,6 +427,10 @@ sub format_insn($$$$$) {
     my $num, $nd = 0;
     my @bytecode;
     my $op, @ops, $opp, @opx, @oppx, @decos, @opevex;
+    my @iflags = (  "FPU", "MMX", "3DNOW", "SSE", "SSE2",
+                    "SSE3", "VMX", "SSSE3", "SSE4A", "SSE41",
+                    "SSE42", "SSE5", "AVX", "AVX2", "AVX512",
+                    "FMA", "BMI1", "BMI2", "TBM", "RTM", "INVPCID");
 
     return (undef, undef) if $operands eq "ignore";
 
@@ -476,6 +480,17 @@ sub format_insn($$$$$) {
     }
     $decorators =~ tr/a-z/A-Z/;
 
+    # check if two different insn set types are set
+    $cnt = 0;
+    foreach $fla (split(/,/, $flags)) {
+        if ($fla ~~ @iflags) {
+            $cnt++;
+            if ($cnt >= 2) {
+                die "Too many insn set flags in $flags\n";
+            }
+        }
+    }
+
     # format the flags
     $flags =~ s/,/|IF_/g;
     $flags =~ s/(\|IF_ND|IF_ND\|)//, $nd = 1 if $flags =~ /IF_ND/;
diff --git a/nasm.c b/nasm.c
index 126f271..3a0c050 100644
--- a/nasm.c
+++ b/nasm.c
@@ -74,7 +74,7 @@ struct forwrefinfo {            /* info held on forward refs. */
 };
 
 static int get_bits(char *value);
-static uint32_t get_cpu(char *cpu_str);
+static iflags_t get_cpu(char *cpu_str);
 static void parse_cmdline(int, char **);
 static void assemble_file(char *, StrList **);
 static void nasm_verror_gnu(int severity, const char *fmt, va_list args);
@@ -106,8 +106,8 @@ static FILE *error_file;        /* Where to write error messages */
 FILE *ofile = NULL;
 int optimizing = MAX_OPTIMIZE; /* number of optimization passes to take */
 static int sb, cmd_sb = 16;    /* by default */
-static uint32_t cmd_cpu = IF_PLEVEL;       /* highest level by default */
-static uint32_t cpu = IF_PLEVEL;   /* passed to insn_size & assemble.c */
+static iflags_t cmd_cpu = IF_PLEVEL;       /* highest level by default */
+static iflags_t cpu = IF_PLEVEL;   /* passed to insn_size & assemble.c */
 int64_t global_offset_changed;      /* referenced in labels.c */
 int64_t prev_offset_changed;
 int32_t stall_count;
@@ -2006,7 +2006,7 @@ static void usage(void)
     fputs("type `nasm -h' for help\n", error_file);
 }
 
-static uint32_t get_cpu(char *value)
+static iflags_t get_cpu(char *value)
 {
     if (!strcmp(value, "8086"))
         return IF_8086;
diff --git a/nasm.h b/nasm.h
index fc5a18d..72986ee 100644
--- a/nasm.h
+++ b/nasm.h
@@ -694,6 +694,8 @@ typedef struct insn { /* an instruction itself */
 
 enum geninfo { GI_SWITCH };
 
+typedef uint64_t iflags_t;
+
 /*
  * The data structure defining an output format driver, and the
  * interfaces to the functions therein.
diff --git a/ndisasm.c b/ndisasm.c
index 710d1f0..638299f 100644
--- a/ndisasm.c
+++ b/ndisasm.c
@@ -88,7 +88,7 @@ int main(int argc, char **argv)
     bool autosync = false;
     int bits = 16, b;
     bool eof = false;
-    uint32_t prefer = 0;
+    iflags_t prefer = 0;
     bool rn_error;
     int32_t offset;
     FILE *fp;
-- 
1.7.9.5
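
For readers who do not read Perl, the check added to insns.pl can be restated in a few lines of Python. This is a paraphrase, not part of the patch. The function name is invented, while the flag names are taken from the `@iflags` list in the insns.pl hunk.

```python
# Flag names copied from the @iflags list in the insns.pl hunk.
INSN_SET_FLAGS = {
    "FPU", "MMX", "3DNOW", "SSE", "SSE2", "SSE3", "VMX", "SSSE3",
    "SSE4A", "SSE41", "SSE42", "SSE5", "AVX", "AVX2", "AVX512",
    "FMA", "BMI1", "BMI2", "TBM", "RTM", "INVPCID",
}


def check_insn_set_flags(flags):
    """Raise ValueError if more than one instruction set flag is listed.

    flags is a comma-separated string, as in the last column of insns.dat.
    Returns the list of instruction set flags that were found.
    """
    found = [f for f in flags.split(",") if f in INSN_SET_FLAGS]
    if len(found) >= 2:
        raise ValueError("Too many insn set flags in %s" % flags)
    return found
```

Under this rule "KATMAI,SSE" passes and "KATMAI,SSE,MMX" is rejected, which is why the MMX and FPU flags are dropped from the insns.dat entries in the same patch.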

[Nasm-devel] [PATCH 7/7] AVX-512: Fix match function to check the range of registers

From: Jin K. S. <jin...@in...> - 2013-08-27 03:29:59

High-16 registers of XMM and YMM need to be encoded with EVEX not VEX.
Even if all the operand types match with VEX instruction format,
it should use EVEX instead.

Signed-off-by: Jin Kyu Song <jin...@in...>
---
 assemble.c |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/assemble.c b/assemble.c
index c22075d..b0d4571 100644
--- a/assemble.c
+++ b/assemble.c
@@ -191,6 +191,7 @@ enum match_result {
     MERR_BADCPU,
     MERR_BADMODE,
     MERR_BADHLE,
+    MERR_ENCMISMATCH,
     /*
      * Matching success; the conditional ones first
      */
@@ -1233,6 +1234,10 @@ static int64_t calcsize(int32_t segment, int64_t offset, int bits,
         if (bits != 64 && ((ins->rex & bad32) || ins->vexreg > 7)) {
             errfunc(ERR_NONFATAL, "invalid operands in non-64-bit mode");
             return -1;
+        } else if (!(ins->rex & REX_EV) &&
+                   ((ins->vexreg > 15) || (ins->evex_p[0] & 0xf0))) {
+            errfunc(ERR_NONFATAL, "invalid high-16 register in non-AVX-512");
+            return -1;
         }
         if (ins->rex & REX_EV)
             length += 4;
@@ -2147,6 +2152,9 @@ static enum match_result matches(const struct itemplate *itemp,
                  */
                 opsizemissing = true;
             }
+        } else if (instruction->oprs[i].basereg >= 16 &&
+                   (itemp->flags & IF_INSMASK) != IF_AVX512) {
+            return MERR_ENCMISMATCH;
         }
     }
 
-- 
1.7.9.5

[Nasm-devel] [PATCH 4/7] AVX-512: Add a feature to generate a raw bytecode file

From: Jin K. S. <jin...@in...> - 2013-08-27 03:29:57

From gas testsuite file, a text file containing raw bytecodes
is useful when verifying the output of NASM.

Signed-off-by: Jin Kyu Song <jin...@in...>
---
 test/gas2nasm.py |   11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/test/gas2nasm.py b/test/gas2nasm.py
index de16745..a00af92 100755
--- a/test/gas2nasm.py
+++ b/test/gas2nasm.py
@@ -21,6 +21,9 @@ def setup():
     parser.add_option('-b', dest='bits', action='store',
             default="",
             help='Bits for output ASM file.')
+    parser.add_option('-r', dest='raw_output', action='store',
+            default="",
+            help='Name for raw output bytes in text')
     (options, args) =  parser.parse_args()
     return options
 
@@ -77,11 +80,19 @@ def write(data, options):
                 outstr = outstrfmt % tuple(insn)
                 out.write(outstr)
 
+def write_rawbytes(data, options):
+    if options.raw_output:
+        with open(options.raw_output, 'wb') as out:
+            for insn in data:
+                out.write(insn[0] + '\n')
+
 if __name__ == "__main__":
     options = setup()
     recs = read(options)
     print "AVX3.1 instructions"
 
+    write_rawbytes(recs, options)
+
     recs = commas(recs)
 
     write(recs, options)
-- 
1.7.9.5

[Nasm-devel] [PATCH 5/7] AVX-512: Fix a bug in calculating Disp8*N value

From: Jin K. S. <jin...@in...> - 2013-08-27 03:29:57

Fixed a bug that derived an incorrect N value for the tuple types
T2, T4 and T8.

Signed-off-by: Jin Kyu Song <jin...@in...>
---
 assemble.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/assemble.c b/assemble.c
index 313ff8a..baae15f 100644
--- a/assemble.c
+++ b/assemble.c
@@ -2257,7 +2257,7 @@ static bool is_disp8n(operand *input, insn *ins, int8_t *compdisp)
         if (vectlen + 7 <= (evex_w + 5) + (tuple - T2 + 1))
             n = 0;
         else
-            n = 1 << (tuple - T2 + evex_w + 4);
+            n = 1 << (tuple - T2 + evex_w + 3);
         break;
     case HVM:
     case QVM:
-- 
1.7.9.5
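
The corrected shift in the patch above can be checked with a small sketch. This is only an illustration in Python, not NASM code: the names `TUPLE_INDEX`, `disp8_scale` and `compress_disp8` are hypothetical, the dictionary stands in for the `tuple - T2` offset used in `is_disp8n()`, and the `n = 0` branch of the original function is deliberately left out.

```python
# Hypothetical stand-in for `tuple - T2` in the patch: T2 -> 0, T4 -> 1, T8 -> 2.
TUPLE_INDEX = {"T2": 0, "T4": 1, "T8": 2}


def disp8_scale(tuple_type, evex_w):
    """Scale factor N for T2/T4/T8, using the corrected `+ 3` from the patch."""
    return 1 << (TUPLE_INDEX[tuple_type] + evex_w + 3)


def compress_disp8(disp, n):
    """Return the compressed 8-bit displacement, or None if disp does not fit.

    A displacement is compressible when it is a multiple of N and the
    quotient fits in a signed byte.
    """
    if disp % n != 0:
        return None
    scaled = disp // n
    return scaled if -128 <= scaled <= 127 else None


if __name__ == "__main__":
    for name in ("T2", "T4", "T8"):
        print(name, "W0 ->", disp8_scale(name, 0))
    print(compress_disp8(64, disp8_scale("T2", 0)))
```

With the old `+ 4` every N in this group would come out twice as large, so displacements that are valid multiples of the real N would be scaled incorrectly.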

[Nasm-devel] [PATCH 3/7] AVX-512: Find the correct position of the last SIMD op

From: Jin K. S. <jin...@in...> - 2013-08-27 03:29:56

Since the embedded rounding mode follows the last SIMD op,
any GPR op should be skipped when finding the last SIMD op.

Signed-off-by: Jin Kyu Song <jin...@in...>
---
 assemble.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/assemble.c b/assemble.c
index 4f0cd9c..313ff8a 100644
--- a/assemble.c
+++ b/assemble.c
@@ -1159,6 +1159,8 @@ static int64_t calcsize(int32_t segment, int64_t offset, int bits,
                     rfield = nasm_regvals[opx->basereg];
                     /* find the last SIMD operand where ER decorator resides */
                     oplast = &ins->oprs[op1 > op2 ? op1 : op2];
+                    while (oplast && is_class(REG_CLASS_GPR, oplast->type))
+                        oplast--;
                 } else {
                     rflags = 0;
                     rfield = c & 7;
-- 
1.7.9.5
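
The backward walk added by this patch can be sketched as follows. This is an illustrative Python model, not the C code: operands are reduced to hypothetical kind strings ("simd" or "gpr"), and `last_simd_index` is an assumed name. The sketch bounds the walk by index so that it stops at the front of the operand list.

```python
def last_simd_index(operand_kinds, start):
    """Walk backwards from `start`, skipping GPR operands.

    Returns the index of the last non-GPR operand at or before `start`,
    or -1 if every operand up to `start` is a GPR.
    """
    i = start
    while i >= 0 and operand_kinds[i] == "gpr":
        i -= 1
    return i


if __name__ == "__main__":
    # Modelled on VCVTSI2SD xmm1, xmm2{er}, r/m64: the decorator sits on operand 1.
    print(last_simd_index(["simd", "simd", "gpr"], 2))
```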

[Nasm-devel] [PATCH 2/7] AVX-512: Moved {er} decorator position next to the last SIMD op

From: Jin K. S. <jin...@in...> - 2013-08-27 03:29:55

This follows the current syntax used in gas, even though it does not
conform to the SDM.
According to the SDM, {er} should follow the last GPR op, not the last SIMD op.
e.g. SDM  : VCVTSI2SD xmm1, xmm2, r/m64{er}
     NASM : VCVTSI2SD xmm1, xmm2{er}, r/m64

Signed-off-by: Jin Kyu Song <jin...@in...>
---
 insns.dat |   17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/insns.dat b/insns.dat
index 320280a..7a0ec60 100644
--- a/insns.dat
+++ b/insns.dat
@@ -3504,10 +3504,10 @@ VCVTSD2SI        reg64,xmmrm64|er                              [rm:t1f64:
 VCVTSD2SS        xmmreg|mask|z,xmmreg,xmmrm64|er               [rvm:t1s:        evex.nds.lig.f2.0f.w1 5a /r ]  AVX512,FUTURE
 VCVTSD2USI       reg32,xmmrm64|er                              [rm:t1f64:           evex.lig.f2.0f.w0 79 /r ]  AVX512,FUTURE
 VCVTSD2USI       reg64,xmmrm64|er                              [rm:t1f64:           evex.lig.f2.0f.w1 79 /r ]  AVX512,FUTURE
-VCVTSI2SD        xmmreg,xmmreg,rm32|er                         [rvm:t1s:        evex.nds.lig.f2.0f.w0 2a /r ]  AVX512,FUTURE
-VCVTSI2SD        xmmreg,xmmreg,rm64|er                         [rvm:t1s:        evex.nds.lig.f2.0f.w1 2a /r ]  AVX512,FUTURE
-VCVTSI2SS        xmmreg,xmmreg,rm32|er                         [rvm:t1s:        evex.nds.lig.f3.0f.w0 2a /r ]  AVX512,FUTURE
-VCVTSI2SS        xmmreg,xmmreg,rm64|er                         [rvm:t1s:        evex.nds.lig.f3.0f.w1 2a /r ]  AVX512,FUTURE
+VCVTSI2SD        xmmreg,xmmreg|er,rm32                         [rvm:t1s:        evex.nds.lig.f2.0f.w0 2a /r ]  AVX512,FUTURE
+VCVTSI2SD        xmmreg,xmmreg|er,rm64                         [rvm:t1s:        evex.nds.lig.f2.0f.w1 2a /r ]  AVX512,FUTURE
+VCVTSI2SS        xmmreg,xmmreg|er,rm32                         [rvm:t1s:        evex.nds.lig.f3.0f.w0 2a /r ]  AVX512,FUTURE
+VCVTSI2SS        xmmreg,xmmreg|er,rm64                         [rvm:t1s:        evex.nds.lig.f3.0f.w1 2a /r ]  AVX512,FUTURE
 VCVTSS2SD        xmmreg|mask|z,xmmreg,xmmrm32|sae              [rvm:t1s:        evex.nds.lig.f3.0f.w0 5a /r ]  AVX512,FUTURE
 VCVTSS2SI        reg32,xmmrm32|er                              [rm:t1f32:           evex.lig.f3.0f.w0 2d /r ]  AVX512,FUTURE
 VCVTSS2SI        reg64,xmmrm32|er                              [rm:t1f32:           evex.lig.f3.0f.w1 2d /r ]  AVX512,FUTURE
@@ -3527,10 +3527,10 @@ VCVTTSS2USI      reg32,xmmrm32|sae                             [rm:t1f32:
 VCVTTSS2USI      reg64,xmmrm32|sae                             [rm:t1f32:           evex.lig.f3.0f.w1 78 /r ]  AVX512,FUTURE
 VCVTUDQ2PD       zmmreg|mask|z,ymmrm256|b32|er                 [rm:hv:              evex.512.f3.0f.w0 7a /r ]  AVX512,FUTURE
 VCVTUDQ2PS       zmmreg|mask|z,zmmrm512|b32|er                 [rm:fv:              evex.512.f2.0f.w0 7a /r ]  AVX512,FUTURE
-VCVTUSI2SD       xmmreg,xmmreg,rm32|er                         [rvm:t1s:        evex.nds.lig.f2.0f.w0 7b /r ]  AVX512,FUTURE
-VCVTUSI2SD       xmmreg,xmmreg,rm64|er                         [rvm:t1s:        evex.nds.lig.f2.0f.w1 7b /r ]  AVX512,FUTURE
-VCVTUSI2SS       xmmreg,xmmreg,rm32|er                         [rvm:t1s:        evex.nds.lig.f3.0f.w0 7b /r ]  AVX512,FUTURE
-VCVTUSI2SS       xmmreg,xmmreg,rm64|er                         [rvm:t1s:        evex.nds.lig.f3.0f.w1 7b /r ]  AVX512,FUTURE
+VCVTUSI2SD       xmmreg,xmmreg|er,rm32                         [rvm:t1s:        evex.nds.lig.f2.0f.w0 7b /r ]  AVX512,FUTURE
+VCVTUSI2SD       xmmreg,xmmreg|er,rm64                         [rvm:t1s:        evex.nds.lig.f2.0f.w1 7b /r ]  AVX512,FUTURE
+VCVTUSI2SS       xmmreg,xmmreg|er,rm32                         [rvm:t1s:        evex.nds.lig.f3.0f.w0 7b /r ]  AVX512,FUTURE
+VCVTUSI2SS       xmmreg,xmmreg|er,rm64                         [rvm:t1s:        evex.nds.lig.f3.0f.w1 7b /r ]  AVX512,FUTURE
 VDIVPD           zmmreg|mask|z,zmmreg,zmmrm512|b64|er          [rvm:fv:         evex.nds.512.66.0f.w1 5e /r ]  AVX512,FUTURE
 VDIVPS           zmmreg|mask|z,zmmreg,zmmrm512|b32|er          [rvm:fv:            evex.nds.512.0f.w0 5e /r ]  AVX512,FUTURE
 VDIVSD           xmmreg|mask|z,xmmreg,xmmrm64|er               [rvm:t1s:        evex.nds.lig.f2.0f.w1 5e /r ]  AVX512,FUTURE
@@ -3548,6 +3548,7 @@ VEXTRACTI32X4    xmmreg|mask|z,zmmreg,imm8                     [mri:           e
 VEXTRACTI64X4    mem256|mask,zmmreg,imm8                       [mri:t4:        evex.512.66.0f3a.w1 3b /r ib ]  AVX512,FUTURE
 VEXTRACTI64X4    ymmreg|mask|z,zmmreg,imm8                     [mri:           evex.512.66.0f3a.w1 3b /r ib ]  AVX512,FUTURE
 VEXTRACTPS       rm32,xmmreg,imm8                              [mri:t1s:      evex.128.66.0f3a.wig 17 /r ib ]  AVX512,FUTURE
+VEXTRACTPS       rm64,xmmreg,imm8                              [mri:t1s:       evex.128.66.0f3a.w1 17 /r ib ]  AVX512,FUTURE
 VFIXUPIMMPD      zmmreg|mask|z,zmmreg,zmmrm512|b64|sae,imm8    [rvmi:fv:   evex.nds.512.66.0f3a.w1 54 /r ib ]  AVX512,FUTURE
 VFIXUPIMMPS      zmmreg|mask|z,zmmreg,zmmrm512|b32|sae,imm8    [rvmi:fv:   evex.nds.512.66.0f3a.w0 54 /r ib ]  AVX512,FUTURE
 VFIXUPIMMSD      xmmreg|mask|z,xmmreg,xmmrm64|sae,imm8         [rvmi:t1s:  evex.nds.lig.66.0f3a.w1 55 /r ib ]  AVX512,FUTURE
-- 
1.7.9.5

[Nasm-devel] [PATCH 0/7] AVX-512: Add a test case and fix bugs

From: Jin K. S. <jin...@in...> - 2013-08-27 03:29:54

Please review and pull patches from:
git://repo.or.cz/nasm/avx512.git

A test case is added and the code is checked against it. Quite a few bugs are
fixed in this series of patches. Gas and other tools expect the embedded rounding
decorator to be located next to the last SIMD operand, but this differs from
what the AVX-512 spec says. Currently NASM is implemented to be compatible with
other existing tools such as gas.

The instruction flags (IF_*) in insns.dat ran out of space for
accommodating the increasing number of instruction set types such as AVX-512.
So the data type size is increased from 32 bits to 64 bits.

Jin Kyu Song (7):
  AVX-512: Add a test case for EVEX encoded instructions
  AVX-512: Moved {er} decorator position next to the last SIMD op
  AVX-512: Find the correct position of the last SIMD op
  AVX-512: Add a feature to generate a raw bytecode file
  AVX-512: Fix a bug in calculating Disp8*N value
  AVX-512: Change the data type for instruction flags
  AVX-512: Fix match function to check the range of registers

 assemble.c       |   18 +-
 assemble.h       |    4 +-
 disasm.c         |    4 +-
 disasm.h         |    2 +-
 insns.dat        |   63 +-
 insns.h          |   53 +-
 insns.pl         |   15 +
 nasm.c           |    8 +-
 nasm.h           |    2 +
 ndisasm.c        |    2 +-
 test/avx512f.asm | 9175 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 test/gas2nasm.py |  107 +
 12 files changed, 9383 insertions(+), 70 deletions(-)
 create mode 100644 test/avx512f.asm
 create mode 100755 test/gas2nasm.py

-- 
1.7.9.5

Re: [Nasm-devel] [PATCH 0/6] AVX-512: Bug fixes and additional features

From: Song, J. K. <jin...@in...> - 2013-08-26 12:25:43

At first I considered the smac way of counting braces, but for better macro flexibility I decided not to follow it. When the number of braces in a grouped parameter is odd, the macro expander would be confused again.

This might not be a good example, but it shows what I had in mind.
=== example ===
%macro mmacro 1
vcvtph2ps zmm1{%1
%endmacro

mmacro {k1\}\{z\},zmm3}
=== result ====
vcvtph2ps zmm1{k1}{z},zmm3
===============
This is why I chose backslash escaping: it removes any special meaning from braces inside a grouped parameter.

It gives the same benefit to smac code, too.
=== example ===
%define smacro(x) vaddpd zmm1{x,zmm3

smacro({k1\}\{z\},zmm2})
=== result ====
vaddpd zmm1{k1}{z},zmm2,zmm3
===============

Please note that the current patch searches only for the two specific strings "\{" and "\}", so it should not break existing code that uses backslashes in macro parameters.

Please let me know if a case like the ones shown above is not actually a concern.

Thanks,
Jin
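
The escaping rule described in this message can be modelled with a short sketch. It is written in Python purely for illustration and is not the preprocessor code from the patch; `split_mmacro_params` is a hypothetical name, and the sketch assumes that a braced parameter ends at the first unescaped "}" and that only the two sequences "\{" and "\}" are treated specially.

```python
def split_mmacro_params(text):
    """Split a multi-line macro argument string into parameters.

    A parameter wrapped in {...} may contain commas. Inside it, the
    sequences backslash-{ and backslash-} yield literal braces; any other
    backslash is kept unchanged.
    """
    params = []
    i, n = 0, len(text)
    while i < n:
        while i < n and text[i].isspace():
            i += 1
        buf = []
        if i < n and text[i] == "{":
            i += 1
            while i < n and text[i] != "}":
                if text.startswith("\\{", i) or text.startswith("\\}", i):
                    buf.append(text[i + 1])  # keep the brace, drop the backslash
                    i += 2
                else:
                    buf.append(text[i])
                    i += 1
            if i >= n:
                raise ValueError("braces do not enclose all of macro parameter")
            i += 1  # skip the closing brace
            while i < n and text[i] != ",":
                i += 1
            params.append("".join(buf))
        else:
            while i < n and text[i] != ",":
                buf.append(text[i])
                i += 1
            params.append("".join(buf).strip())
        i += 1  # skip the separating comma
    return params


if __name__ == "__main__":
    print(split_mmacro_params(r"{k1\}\{z\},zmm3}"))
```

Because the closing brace is found without counting nesting, a parameter with an odd number of literal braces, as in the example above, still splits cleanly.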

> -----Original Message-----
> From: anonymous coward [mailto:nas...@us...]
> Sent: Sunday, August 25, 2013 12:35 PM
> To: nas...@li...
> Subject: Re: [Nasm-devel] [PATCH 0/6] AVX-512: Bug fixes and additional
> features
> 
> Instead of trying to introduce backslash escaping, you want to
> fix the mmac code to match the smac code, i.e. keep a count
> of curly braces.
> 
> === example ===
> 
> %define smacro(x) [x]
> 
> smacro ({{a,b}})
> 
> %macro mmacro 1
> <%1>
> %endmacro
> 
> mmacro {{a,b}}
> 
> === current ===
> 
> %line 2+1 0.asm
> 
> [{a,b}]
> 
> %line 8+1 0.asm
> 
> 0.asm:9: error: braces do not enclose all of macro parameter
> <{a,b>
> 
> === desired ===
> 
> %line 2+1 0.asm
> 
> [{a,b}]
> 
> %line 8+1 0.asm
> 
> <{a,b}>
> 

[Nasm-devel] [JOB] AVX-512 / xeon phi

From: C. B. <cbe...@pa...> - 2013-08-25 23:49:55

Hi

Is anyone interested in taking on some contract work to help us add AVX-512
or Xeon Phi support to llvm mc or yasm-nextgen?
https://github.com/yasm/yasm-nextgen/

(We may have other work adding various other vector extensions as
well, if interested.)

Thanks

./C

Re: [Nasm-devel] VSIB operand memory access size specifier

From: anonymous c. <nas...@us...> - 2013-08-25 19:44:32

>> I think NASM way makes more sense because each data element size is
>> 64bits(QWORD) not ZWORD. But it is also true that
>
> I can't say for gas (better to ask gas developers then why zword there).
> Still using QWORD for nasm looks sane for me. Lets wait for people
> opinions.

PD --> packed double --> 8 byte elements --> QWORD gather accesses
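
The two viewpoints (QWORD per element versus ZWORD in total) can be reconciled with a little arithmetic. The sketch below is an assumption-laden Python model of a gather, not anything from NASM or gas: `gather_accesses` is a hypothetical helper, the base address and index values are made up, and masking is ignored.

```python
def gather_accesses(base, indices, scale, disp, elem_bytes=8):
    """Return one (address, size) pair per index element of a gather.

    Each element is fetched separately from base + index * scale + disp.
    """
    return [(base + idx * scale + disp, elem_bytes) for idx in indices]


if __name__ == "__main__":
    # Modelled on: vgatherdpd zmm30{k1}, [r14+ymm31*8-123]
    accesses = gather_accesses(base=0x1000, indices=range(8), scale=8, disp=-123)
    total_bytes = sum(size for _, size in accesses)
    print(len(accesses), "accesses of 8 bytes,", total_bytes, "bytes in total")
```

Each individual access is 8 bytes (QWORD); only the sum over all eight elements reaches 64 bytes, the size of a full ZMM register.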

Re: [Nasm-devel] [PATCH 0/6] AVX-512: Bug fixes and additional features

From: anonymous c. <nas...@us...> - 2013-08-25 19:35:29

Instead of trying to introduce backslash escaping, you want to
fix the mmac code to match the smac code, i.e. keep a count
of curly braces.

=== example ===

%define smacro(x) [x]

smacro ({{a,b}})

%macro mmacro 1
<%1>
%endmacro

mmacro {{a,b}}

=== current ===

%line 2+1 0.asm

[{a,b}]

%line 8+1 0.asm

0.asm:9: error: braces do not enclose all of macro parameter
<{a,b>

=== desired ===

%line 2+1 0.asm

[{a,b}]

%line 8+1 0.asm

<{a,b}>
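
The brace-counting approach proposed here can be sketched as below. This is a Python illustration of the idea only, not the smac code in the NASM preprocessor; `split_by_brace_depth` and `strip_outer_braces` are hypothetical names, and the error messages are invented for the sketch.

```python
def strip_outer_braces(param):
    """Remove one enclosing {...} layer if the first brace pairs with the last."""
    if not (param.startswith("{") and param.endswith("}")):
        return param
    depth = 0
    for i, ch in enumerate(param):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0 and i != len(param) - 1:
                return param  # the first brace closes early, e.g. "{a}{b}"
    return param[1:-1]


def split_by_brace_depth(text):
    """Split on commas that sit outside any braces, tracking nesting depth."""
    params, start, depth = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced closing brace")
        elif ch == "," and depth == 0:
            params.append(text[start:i].strip())
            start = i + 1
    if depth != 0:
        raise ValueError("unbalanced opening brace")
    params.append(text[start:].strip())
    return [strip_outer_braces(p) for p in params]


if __name__ == "__main__":
    print(split_by_brace_depth("{{a,b}}"))
```

Counting depth keeps nested, balanced braces intact, but it has to reject input whose braces do not balance.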

Re: [Nasm-devel] [PATCH 1/6] AVX-512: Handle curly braces in multi-line macro parameters

From: anonymous c. <nas...@us...> - 2013-08-25 19:11:10

> Multi-line macro uses curly braces for enclosing a parameter
> containing comma(s). Passing curly braces as a part of a parameter
> which is already enclosed with braces confuses the macro expander.
>
> Escape character '\' is prefixed in this case.
> e.g.) mmacro {1,2,3}, {4,\{5,6\}}
>       mmacro gets 2 parameters of '1,2,3' and '4,{5,6}'
>
> Signed-off-by: Jin Kyu Song <jin...@in...>

Yes, curly braces inside mmac params should be handled properly.

But no, you really do not want to introduce backslash escaping -- it
breaks existing code that has backslashes in mmac params.

Also, there is no need to mess around with the curly brace code of
the preprocessor when it comes to AVX-512 -- the preprocessor has
no concept of {xxx} modifiers; from its perspective that's just a curly
brace, followed by xxx, followed by another curly brace.

Only the (assembler's) parser needs to understand {xxx} modifiers,
and it's really trivial to handle them there.

Re: [Nasm-devel] [PATCH] AVX-512: Add support for parsing braces

From: anonymous c. <nas...@us...> - 2013-08-25 18:07:46

>> In terms of modifier placement you probably want to look at
>> gas again -- I have seen code which has modifiers before an
>> operand, and I have seen code which has them as their own
>> operand. For example, {one} op1, {two}, op2.
>
> Could you explain a little bit more about this? Is it regarding
> {er} and {sae} that are put as if they are separate operands?

In short, yes.

For a longer more detailed background, read on.

With L1OM and K1OM, Intel picked a specific operand syntax:

  {transform} ( operand {nt} {eh} ) {mask}

In particular:

  L1OM = {sss,ccccc} ( operand {nt} ) {kkk}

  K1OM = {sss} ( operand {eh} ) {kkk}

As a result you can face a bunch of non-compliant asm code:

  - incorrect source operand has transform modifier
  - non-destination operand has mask modifier

  - transform modifier specified after operand
  - non-temporal or eviction hint specified before operand
  - mask modifier specified before operand

  - transform modifier operand not preceded by "("
  - transform modifier operand not followed by ")"

  - non-temporal or eviction hint specified after ")", not before
  - mask modifier specified before ")", not after

  - transform modifier invalid for memory operand
  - transform modifier invalid for register operand

  - modifier specified as extra operand

As well as these "modifier used as an operand" cases:

  - only operand --> ignored
  - leading operand --> applied to next operand
  - trailing operand --> applied to previous operand
  - in between operands --> applied to previous operand

With AVX-512 Intel failed to prescribe a specific operand syntax.

So the modifiers can go anywhere -- before or after any operand,
or as their own operands.
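
The four "modifier used as an operand" cases listed above can be written down as a small attachment rule. The following Python sketch is only a model of that list, not parser code from NASM or gas; `attach_modifiers` is a hypothetical name, and a modifier is assumed to be any operand string wrapped in curly braces.

```python
def attach_modifiers(operands):
    """Attach standalone {modifier} operands to real operands.

    Rules modelled from the list above:
      - only operand            -> ignored
      - leading operand         -> applied to the next operand
      - trailing or in between  -> applied to the previous operand
    Returns a list of (operand, [modifiers]) pairs.
    """
    def is_modifier(op):
        return op.startswith("{") and op.endswith("}")

    result = []   # [operand, modifiers] entries for real operands
    pending = []  # leading modifiers waiting for the first real operand
    for op in operands:
        if is_modifier(op):
            if result:
                result[-1][1].append(op)
            else:
                pending.append(op)
        else:
            result.append([op, pending])
            pending = []
    return [(op, mods) for op, mods in result]


if __name__ == "__main__":
    print(attach_modifiers(["{one}", "op1", "{two}", "op2"]))
```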

Re: [Nasm-devel] VSIB operand memory access size specifier

From: Cyrill G. <gor...@gm...> - 2013-08-23 18:27:37

On Fri, Aug 23, 2013 at 06:03:36PM +0000, Song, Jin Kyu wrote:
> I found one discrepancy between NASM and gas regarding the size specifier.
> In insns.dat, VGATHERQPD/DPD expects 64bits(QWORD), if specified, specifier for VSIB operand.
> 	NASM : VGATHERQPD      xmmreg,xmem64,xmmreg            [rmv:   vm64x vex.dds.128.66.0f38.w1 93 /r]     FUTURE,AVX2
> 		 VGATHERDPD       zmmreg|mask,ymem64             [rm:t1s:  vsiby evex.512.66.0f38.w1 92 /r ]  AVX512,FUTURE
> But gas thinks it is a ZWORD.
> 	Gas  : vgatherdpd      zmm30{k1}, ZMMWORD PTR [r14+ymm31*8-123]
> 
> I think NASM way makes more sense because each data element size is 64bits(QWORD) not ZWORD. But it is also true that
> the eventual data size gathered is up to ZWORD. Is this discrepancy made intentionally? Does it need to be fixed to
> conform with gas or just to stay same as it used to be?

I can't speak for gas (better to ask the gas developers why it is ZWORD there).
Still, using QWORD for NASM looks sane to me. Let's wait for people's opinions.

[Nasm-devel] VSIB operand memory access size specifier

From: Song, J. K. <jin...@in...> - 2013-08-23 18:03:56

I found one discrepancy between NASM and gas regarding the size specifier.
In insns.dat, VGATHERQPD/DPD expect a 64-bit (QWORD) size specifier, if one is given, for the VSIB operand.
	NASM : VGATHERQPD      xmmreg,xmem64,xmmreg            [rmv:   vm64x vex.dds.128.66.0f38.w1 93 /r]     FUTURE,AVX2
		 VGATHERDPD       zmmreg|mask,ymem64             [rm:t1s:  vsiby evex.512.66.0f38.w1 92 /r ]  AVX512,FUTURE
But gas thinks it is a ZWORD.
	Gas  : vgatherdpd      zmm30{k1}, ZMMWORD PTR [r14+ymm31*8-123]

I think the NASM way makes more sense because each data element is 64 bits (QWORD), not ZWORD. But it is also true that the total data gathered is up to a ZWORD.
Is this discrepancy intentional? Should it be fixed to conform with gas, or stay as it is?

- Jin

Re: [Nasm-devel] [PATCH 0/6] AVX-512: Bug fixes and additional features

From: Cyrill G. <gor...@gm...> - 2013-08-22 20:54:50

On Thu, Aug 22, 2013 at 08:33:23PM +0000, Song, Jin Kyu wrote:
> > 
> > One question -- you use TOK_BRACE for both { and } terms, won't it be
> > better to
> > introduce two terms instead TOK_OPEN_BRACE and TOK_CLOSE_BRACE? How
> > tokenizer
> > will handle statements like
> > 
> > 	term \{ term \{
> > 
> > it will be treated as non-error case? (I must admit I didn't yet review
> > the whole avx code :(
> 
> Hi Cyrill,
> 
> This case might be treated as an error in a parser not in a preprocessor if braces
> do not match even after expanding all macros. But this patch is for the multi-line macro preprocessing.
> 
> I used TOK_BRACE for the braces inside the parameter - "\{" or "\}". So they should be
> handled as a part of normal string without any special meaning. The reason why I added
> a new token type is tok_is_()/ tok_isnt_() macros check if it is TOK_OTHER or not. 
> 	#define tok_is_(x,v)    (tok_type_((x), TOK_OTHER) && !strcmp((x)->text,(v)))
> 	#define tok_isnt_(x,v)  ((x) && ((x)->type!=TOK_OTHER || strcmp((x)->text,(v))))
> 
> 	"{" with TOK_OTHER : an opening brace of a parameter
> 	"{" with TOK_BRACE : same as any normal character. Originally it is "\{".
> 
> So a new token type could easily avoid this type checking while holding a curly brace
> as a string. I chose this way to minimize the change. Maybe I need to rename the new
> token type because people may think the name of TOK_BRACE means the brace actually enclosing the macro parameter.
> 
> At first I tried to change the parsing logic of preprocessor but that way led me to much bigger code change.

Yeah, the preprocessor is already complex enough, so big changes are not welcome :-)
Jin, I see what you're implementing here; I need to think about it. Still, this should not
stop you: we can always update the code and logic before the release.

280 messages have been excluded from this view by a project administrator.
