| 
      
      
      From: nasm-bot f. C. S. B. <cha...@in...> - 2020-05-26 17:18:35
      
Commit-ID:  51fd5a506723e756d306228cacba47e0a88b50e1
Gitweb:     http://repo.or.cz/w/nasm.git?a=commitdiff;h=51fd5a506723e756d306228cacba47e0a88b50e1
Author:     Chang S. Bae <cha...@in...>
AuthorDate: Thu, 6 Feb 2020 14:39:22 -0800
Committer:  Chang S. Bae <cha...@in...>
CommitDate: Wed, 1 Apr 2020 15:41:38 -0700

preproc: Fix the token iterator in expanding single-line macro

The code used to get stuck while walking through whitespace tokens.
Fix it by advancing to the next token inside the loop.

Reported-by: C. Masloch <pu...@ul...>
Link: https://bugzilla.nasm.us/show_bug.cgi?id=3392630
Suggested-by: C. Masloch <pu...@ul...>
Signed-off-by: Chang S. Bae <cha...@in...>
---
 asm/preproc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/asm/preproc.c b/asm/preproc.c
index f94d9558..befe77e8 100644
--- a/asm/preproc.c
+++ b/asm/preproc.c
@@ -5379,8 +5379,10 @@ static SMacro *expand_one_smacro(Token ***tpp)
             Token *endt = tline;
 
             tline = t;
-            while (!cond_comma && t && t != endt)
+            while (!cond_comma && t && t != endt) {
                 cond_comma = t->type != TOK_WHITESPACE;
+                t = t->next;
+            }
         }
 
         if (tnext) {
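The one-line fix above is easiest to see in isolation. The sketch below is a hypothetical, simplified model of that loop: `struct token`, the `TOK_*` enum and `has_non_whitespace()` are illustrative names, not NASM's real types or API. Before the fix, `t` was never advanced, so a leading whitespace token left the loop condition unchanged forever; stepping to `t->next` on every pass guarantees the scan terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified token list; NASM's real Token type differs. */
enum tok_type { TOK_WHITESPACE, TOK_ID };

struct token {
    enum tok_type type;
    struct token *next;
};

/*
 * Return true if any token in [t, endt) is not whitespace.
 * Without the "t = t->next" step, a whitespace token at the head of
 * the range would keep the condition false and the loop would spin.
 */
static bool has_non_whitespace(const struct token *t, const struct token *endt)
{
    bool found = false;

    while (!found && t && t != endt) {
        found = t->type != TOK_WHITESPACE;
        t = t->next;            /* the fix: step to the next token */
    }
    return found;
}
```

For a list `ws -> ws -> id`, scanning to the end returns true, while stopping with `endt` at the identifier returns false; both calls now terminate.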
      
      
      From: nasm-bot f. C. S. B. <cha...@in...> - 2020-05-26 17:18:35
      
Commit-ID:  53ca4bb19cc4f3147891c03d10959d57e0edcc01
Gitweb:     http://repo.or.cz/w/nasm.git?a=commitdiff;h=53ca4bb19cc4f3147891c03d10959d57e0edcc01
Author:     Chang S. Bae <cha...@in...>
AuthorDate: Wed, 1 Apr 2020 14:45:23 -0700
Committer:  Chang S. Bae <cha...@in...>
CommitDate: Wed, 1 Apr 2020 15:41:44 -0700

test: Add BR 3392630

Add the test case to the existing xdefine test.

Suggested-by: C. Masloch <pu...@ul...>
Link: https://bugzilla.nasm.us/show_bug.cgi?id=3392630
Signed-off-by: Chang S. Bae <cha...@in...>
---
 test/xdefine.asm | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/test/xdefine.asm b/test/xdefine.asm
index 3b475864..180c0305 100644
--- a/test/xdefine.asm
+++ b/test/xdefine.asm
@@ -8,4 +8,8 @@
 %xdefine ctr n
 %define n 0x22
 
-        db ctr, n    ; Should be 0x21, 0x22
+        db ctr, n    ; Should be 0x21, 0x22
+
+%define MNSUFFIX
+%define MNCURRENT TEST%[MNSUFFIX]
+%xdefine var MNCURRENT
      
      
      From: nasm-bot f. C. S. B. <cha...@in...> - 2020-05-26 17:18:34
      
Commit-ID:  2f3e8987807237605f8a8a41d62df3d4fe98d548
Gitweb:     http://repo.or.cz/w/nasm.git?a=commitdiff;h=2f3e8987807237605f8a8a41d62df3d4fe98d548
Author:     Chang S. Bae <cha...@in...>
AuthorDate: Tue, 24 Mar 2020 14:24:43 -0700
Committer:  Chang S. Bae <cha...@in...>
CommitDate: Wed, 1 Apr 2020 15:47:26 -0700

disam: explicitly change stdin to binary mode

Binary mode is no different from text mode on POSIX-compliant operating
systems, but the two modes are distinct on Windows, and perhaps on other
systems as well. A binary stream has scalability and other advantages.

Windows opens the standard input stream in text mode by default, so
change it to binary mode.

Also, add an OS-agnostic helper function, nasm_set_binary_mode(), to the
library.

Reported-by: Didier Stevens <did...@gm...>
Suggested-by: Didier Stevens <did...@gm...>
Link: https://bugzilla.nasm.us/show_bug.cgi?id=3392649
Signed-off-by: Chang S. Bae <cha...@in...>
---
 disasm/ndisasm.c  |  4 +++-
 include/nasmlib.h |  2 ++
 nasmlib/file.c    |  5 +++++
 nasmlib/file.h    | 22 ++++++++++++++++++++++
 4 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/disasm/ndisasm.c b/disasm/ndisasm.c
index f3c23b00..01e0c557 100644
--- a/disasm/ndisasm.c
+++ b/disasm/ndisasm.c
@@ -280,8 +280,10 @@ int main(int argc, char **argv)
                     pname, filename, strerror(errno));
             return 1;
         }
-    } else
+    } else {
+        nasm_set_binary_mode(stdin);
         fp = stdin;
+    }
 
     if (initskip > 0)
         skip(initskip, fp);
diff --git a/include/nasmlib.h b/include/nasmlib.h
index 940f1cb7..c4b4ac4c 100644
--- a/include/nasmlib.h
+++ b/include/nasmlib.h
@@ -365,6 +365,8 @@ enum file_flags {
 FILE *nasm_open_read(const char *filename, enum file_flags flags);
 FILE *nasm_open_write(const char *filename, enum file_flags flags);
 
+void nasm_set_binary_mode(FILE *f);
+
 /* Probe for existence of a file */
 bool nasm_file_exists(const char *filename);
 
diff --git a/nasmlib/file.c b/nasmlib/file.c
index a8cd3057..62b854de 100644
--- a/nasmlib/file.c
+++ b/nasmlib/file.c
@@ -148,6 +148,11 @@ os_filename os_mangle_filename(const char *filename)
 
 #endif
 
+void nasm_set_binary_mode(FILE *f)
+{
+    os_set_binary_mode(f);
+}
+
 FILE *nasm_open_read(const char *filename, enum file_flags flags)
 {
     FILE *f = NULL;
diff --git a/nasmlib/file.h b/nasmlib/file.h
index 4f0420ec..fc8f893d 100644
--- a/nasmlib/file.h
+++ b/nasmlib/file.h
@@ -103,6 +103,24 @@ typedef struct _stati64 os_struct_stat;
 # define os_stat       _wstati64
 # define os_fstat      _fstati64
 
+/*
+ * On Win32/64, freopen() and _wfreopen() fails when the mode string
+ * is with the letter 'b' that represents to set binary mode. On
+ * POSIX operating systems, the 'b' is ignored, without failure.
+ */
+
+#include <io.h>
+#include <fcntl.h>
+
+static inline void os_set_binary_mode(FILE *f) {
+    int ret = _setmode(_fileno(f), _O_BINARY);
+
+    if (ret == -1) {
+        nasm_fatalf(ERR_NOFILE, "unable to open file: %s",
+                    strerror(errno));
+    }
+}
+
 #else  /* not _WIN32 */
 
 typedef const char *os_filename;
@@ -117,6 +135,10 @@ static inline void os_free_filename(os_filename filename)
     (void)filename;             /* Nothing to do */
 }
 
+static inline void os_set_binary_mode(FILE *f) {
+    (void)f;
+}
+
 # define os_fopen fopen
 
 #if defined(HAVE_FACCESSAT) && defined(AT_EACCESS)
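A minimal standalone sketch of the same idea, assuming only the standard C library plus the Microsoft CRT's `_setmode()` and `_fileno()` on Windows. The helper name `set_binary_mode()` is hypothetical; NASM's own functions are `nasm_set_binary_mode()` and `os_set_binary_mode()` from the diff above. Unlike the NASM version, which calls `nasm_fatalf()` on failure, this sketch reports failure through its return value.

```c
#include <assert.h>
#include <stdio.h>

#ifdef _WIN32
#include <fcntl.h>
#include <io.h>
#endif

/*
 * Hypothetical helper: switch an already-open stream to binary mode.
 * On Windows a text-mode stream may translate CR/LF pairs and treat
 * 0x1A (Ctrl-Z) as end of file, which corrupts binary input.
 * POSIX does not distinguish the two modes, so there it is a no-op.
 * Returns 0 on success and -1 on failure.
 */
static int set_binary_mode(FILE *f)
{
    assert(f != NULL);
#ifdef _WIN32
    return _setmode(_fileno(f), _O_BINARY) == -1 ? -1 : 0;
#else
    (void)f;
    return 0;
#endif
}
```

As in the ndisasm change, call it on `stdin` before the first read, so that no bytes pass through the text-mode translation.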
      
      
      From: nasm-bot f. C. S. B. <cha...@in...> - 2020-05-26 17:18:34
      
Commit-ID:  333f1d02bbcd7f9afac0b0d7941243b570335079
Gitweb:     http://repo.or.cz/w/nasm.git?a=commitdiff;h=333f1d02bbcd7f9afac0b0d7941243b570335079
Author:     Chang S. Bae <cha...@in...>
AuthorDate: Wed, 1 Apr 2020 15:25:05 -0700
Committer:  Chang S. Bae <cha...@in...>
CommitDate: Wed, 1 Apr 2020 15:48:37 -0700

test: Add BR 3392607

Reported-by: Henrik Gramner <he...@gr...>
Link: https://bugzilla.nasm.us/show_bug.cgi?id=3392607
Signed-off-by: Chang S. Bae <cha...@in...>
---
 test/br3392607.asm | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/test/br3392607.asm b/test/br3392607.asm
new file mode 100644
index 00000000..a61eafc3
--- /dev/null
+++ b/test/br3392607.asm
@@ -0,0 +1,2 @@
+BITS 64
+    vpshldvw xmm0{k1}{z}, xmm1, xmm2
      
      
      From: nasm-bot f. C. S. B. <cha...@in...> - 2020-05-26 17:18:32
      
Commit-ID:  655761ba187807395d5303ca92aa575fabbed628
Gitweb:     http://repo.or.cz/w/nasm.git?a=commitdiff;h=655761ba187807395d5303ca92aa575fabbed628
Author:     Chang S. Bae <cha...@in...>
AuthorDate: Wed, 1 Apr 2020 15:07:49 -0700
Committer:  Chang S. Bae <cha...@in...>
CommitDate: Wed, 1 Apr 2020 15:43:54 -0700

test: Add BR 3392640

Suggested-by: C. Masloch <pu...@ul...>
Link: https://bugzilla.nasm.us/show_bug.cgi?id=3392640
Signed-off-by: Chang S. Bae <cha...@in...>
---
 test/br3392640.asm | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/test/br3392640.asm b/test/br3392640.asm
new file mode 100644
index 00000000..e593be9e
--- /dev/null
+++ b/test/br3392640.asm
@@ -0,0 +1,4 @@
+    %imacro mac 1-2
+    j%+1
+    %endmacro
+    mac c, label
      
      
      From: nasm-bot f. C. S. B. <cha...@in...> - 2020-05-26 17:18:29
      
Commit-ID:  6e3f3411a1686e554beca3e766edb0a8efb6d617
Gitweb:     http://repo.or.cz/w/nasm.git?a=commitdiff;h=6e3f3411a1686e554beca3e766edb0a8efb6d617
Author:     Chang S. Bae <cha...@in...>
AuthorDate: Wed, 25 Mar 2020 15:13:21 -0700
Committer:  Chang S. Bae <cha...@in...>
CommitDate: Wed, 1 Apr 2020 15:42:16 -0700

preproc: Fix the token in expanding the %+/%- macro-parameters

The code appeared to unintentionally nullify the token pointer up front
every time it handled these macro parameters. Remove that assignment to
avoid a segfault.

Fixes: de7acc3a46cb ("preproc: defer %00, %? and %?? expansion for nested macros, cleanups")
Reported-by: C. Masloch <pu...@ul...>
Link: https://bugzilla.nasm.us/show_bug.cgi?id=3392640
Signed-off-by: Chang S. Bae <cha...@in...>
---
 asm/preproc.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/asm/preproc.c b/asm/preproc.c
index befe77e8..cf770026 100644
--- a/asm/preproc.c
+++ b/asm/preproc.c
@@ -4833,8 +4833,6 @@ static Token *expand_mmac_params(Token * tline)
                 unsigned long n;
                 char *ep;
 
-                text = NULL;
-
                 n = strtoul(tok_text(t) + 2, &ep, 10);
                 if (unlikely(*ep))
                     goto invalid;
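In NASM, `%+1` expands macro parameter 1 as a condition code and `%-1` as its inverse; the test above exercises this with `j%+1` and `mac c, label`. The sketch below illustrates only the number parsing that follows the removed line. `parse_cc_param()` is a hypothetical helper, not a NASM function. It mirrors the `strtoul(tok_text(t) + 2, &ep, 10)` call and the trailing-character check, and adds an explicit digit check because `strtoul()` on its own would also accept an empty string, leading whitespace, or a sign.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a "%+N" or "%-N" macro-parameter token.
 * On success, store N in *n, set *invert when the sign is '-', and
 * return true. As in expand_mmac_params(), any character left after
 * the digits makes the token invalid.
 */
static bool parse_cc_param(const char *text, unsigned long *n, bool *invert)
{
    char *ep;

    assert(text && n && invert);
    if (text[0] != '%' || (text[1] != '+' && text[1] != '-'))
        return false;
    /* strtoul() alone would also accept "", leading blanks or a sign */
    if (text[2] < '0' || text[2] > '9')
        return false;

    *n = strtoul(text + 2, &ep, 10);
    if (*ep != '\0')            /* e.g. "%+1x" */
        return false;

    *invert = text[1] == '-';
    return true;
}
```

For example, `"%-12"` yields parameter 12 with the inverted condition, while `"%+1x"` and a bare `"%+"` are rejected.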
      
      
      From: nasm-bot f. C. S. B. <cha...@in...> - 2020-05-26 17:18:28
      
Commit-ID:  073cd40c63ba66782bb5bede24749e3444109c37
Gitweb:     http://repo.or.cz/w/nasm.git?a=commitdiff;h=073cd40c63ba66782bb5bede24749e3444109c37
Author:     Chang S. Bae <cha...@in...>
AuthorDate: Fri, 7 Feb 2020 15:49:38 -0800
Committer:  Chang S. Bae <cha...@in...>
CommitDate: Wed, 1 Apr 2020 15:40:40 -0700

preproc: Fix to reset %rep list line number after every iteration

An earlier fix made the listing print the correct line numbers for %rep
blocks, but only for the first iteration. For subsequent iterations, the
current line number of the expansion must be explicitly reset.

Fixes: ab6f8319552f ("listing: when listing lines in macros and rep blocks, show the actual line")
Reported-by: C. Masloch <pu...@ul...>
Link: https://bugzilla.nasm.us/show_bug.cgi?id=3392626
Signed-off-by: Chang S. Bae <cha...@in...>
---
 asm/preproc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/asm/preproc.c b/asm/preproc.c
index 41a7c6fb..f94d9558 100644
--- a/asm/preproc.c
+++ b/asm/preproc.c
@@ -6211,6 +6211,7 @@ static Token *pp_tokline(void)
                 Token *t, *tt, **tail;
                 Line *ll;
 
+                istk->mstk.mstk->lineno = 0;
                 nasm_new(ll);
                 ll->next = istk->expansion;
                 tail = &ll->first;
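Here is a toy model of why the reset matters. It uses the layout of test/br3392626.asm: a two-line body on source lines 3 and 4, repeated 3 times. `struct rep_state` and `list_rep_lines()` are hypothetical stand-ins for the per-expansion counter that the fix zeroes (`istk->mstk.mstk->lineno`). The real preprocessor tracks more state than this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-expansion line counter. */
struct rep_state {
    int lineno;   /* offset of the current line within the %rep body */
};

/*
 * Record the source line reported for each line of a %rep body that
 * starts at source line `first`, has `body_lines` lines, and is
 * repeated `count` times. Returns the number of entries written.
 */
static size_t list_rep_lines(int first, int body_lines, int count,
                             int *out, size_t cap)
{
    struct rep_state st = { 0 };
    size_t k = 0;

    assert(out != NULL || cap == 0);
    for (int i = 0; i < count; i++) {
        st.lineno = 0;   /* the fix: restart numbering on every iteration */
        for (int j = 0; j < body_lines && k < cap; j++)
            out[k++] = first + st.lineno++;
    }
    return k;
}
```

With the reset, the listing reports 3, 4, 3, 4, 3, 4. If the reset line is deleted, the counter keeps running and reports 3 through 8.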
      
      
      From: nasm-bot f. C. S. B. <cha...@in...> - 2020-05-26 17:18:28
      
Commit-ID:  ddb22a821ce08ec79b013cdcd4538de24c93a93f
Gitweb:     http://repo.or.cz/w/nasm.git?a=commitdiff;h=ddb22a821ce08ec79b013cdcd4538de24c93a93f
Author:     Chang S. Bae <cha...@in...>
AuthorDate: Wed, 1 Apr 2020 15:06:38 -0700
Committer:  Chang S. Bae <cha...@in...>
CommitDate: Wed, 1 Apr 2020 15:40:59 -0700

test: Add BR 3392626

There are many similar preprocessor loop test cases, each kept in its own
per-bug-report file. Although consolidating them would be better, add one
more test case in the same way as before.

Suggested-by: C. Masloch <pu...@ul...>
Link: https://bugzilla.nasm.us/show_bug.cgi?id=3392626
Signed-off-by: Chang S. Bae <cha...@in...>
---
 test/br3392626.asm | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/test/br3392626.asm b/test/br3392626.asm
new file mode 100644
index 00000000..de4ad8ee
--- /dev/null
+++ b/test/br3392626.asm
@@ -0,0 +1,6 @@
+        ; line 1
+%rep 3  ; line 2
+        ; line 3
+ db 26h ; line 4
+%endrep ; line 5
+        ; line 6
      
      
      From: nasm-bot f. C. S. B. <cha...@in...> - 2020-05-06 20:36:28
      
Commit-ID:  19c289297a8cd8eaa8fcd7c04f971bf95e320a41
Gitweb:     http://repo.or.cz/w/nasm.git?a=commitdiff;h=19c289297a8cd8eaa8fcd7c04f971bf95e320a41
Author:     Chang S. Bae <cha...@in...>
AuthorDate: Wed, 6 May 2020 17:13:02 +0000
Committer:  Chang S. Bae <cha...@in...>
CommitDate: Wed, 6 May 2020 20:34:47 +0000

doc: Update for upcoming 2.15 release

Update release notes and documentation for 2.15

Signed-off-by: Andrey Matyukov <and...@in...>
Signed-off-by: Chang S. Bae <cha...@in...>
---
 doc/Makefile.in |   2 +-
 doc/changes.src |  68 ++++++++++++++++++++++++
 doc/nasmdoc.src | 159 ++++++++++++++++++++++++++++++++++++++++++++++++-------
 3 files changed, 206 insertions(+), 23 deletions(-)

diff --git a/doc/Makefile.in b/doc/Makefile.in
index ba44b44a..a076300c 100644
--- a/doc/Makefile.in
+++ b/doc/Makefile.in
@@ -42,7 +42,7 @@ all: $(OUT)
 inslist.src: inslist.pl ../x86/insns.dat
 	$(PERL) $(srcdir)/inslist.pl $(srcdir)/../x86/insns.dat
 
-.PHONY: html
+.PHONY: html nasmdoc.ps
 html: $(HTMLAUX)
 	$(MKDIR) -p html
 	for f in $(HTMLAUX); do $(CP_UF) "$(srcdir)/$$f" html/; done
diff --git a/doc/changes.src b/doc/changes.src
index eec64bca..84abcfd5 100644
--- a/doc/changes.src
+++ b/doc/changes.src
@@ -13,6 +13,74 @@ since 2007.
 \c{[WARNING PUSH]} and \c{[WARNING POP]} directives. See
 \k{asmdir-warning}.
 
+\b The "sectalign on|off" switch does not affect an explicit directive. See
+\k{sectalign}
+
+\b Added build option to enable building with profiling (--enable-profiling).
+
+\b Added support of long pathnames, up to 32767 of UTF-16 characters, on
+Windows.
+
+\b Fixed 'mismatch in operand sizes' error in MOVDDUP instruction.
+
+\b Improved error messages in the string transformation routine.
+
+\b Removed obsolete '-gnu-elf-extensions' option and a warning about 8- and 16-bit relocation
+generation. See \k{elf16}
+
+\b Added group aliases for all prefixed warnings. See \k{opt-w}
+
+\b Allowed building with MSVC versions older than 1700.
+
+\b Fixed to recognize a comma as a single-line macros argument
+separator.
+
+\b Added implicitly sized versions of the K instructions, which allows the K
+instructions to be specified without a size suffix as long as the operands are
+sized.
+
+\b Added -L option for additional listing information. See \k{opt-L}
+
+\b Made an empty string usable as an unused argument in macros. See
+\k{define}.
+
+\b Added warnings for obsolete instructions for a specified CPU.
+
+\b Deprecated \c{-hf} and \c{-y} options. Use \c{-h} instead.
+
+\b Made DWARF as the default debug format for ELF.
+
+\b Added a %pragma to set or clear listing options (%pragma list options +bempf).
+
+\b Allowed immediate syntax for LEA instruction (ignore operand size completely).
+
+\b Added limited functionality MASM compatibility package. See \k{pkg_masm}.
+
+\b Added support of macros aliases using %defalias, %idefalias. See \k{defalias}.
+
+\b Added support for stringify, nostrip, greedy single-line macro arguments. See \k{define}.
+
+\b Added conditional comma operator \c{%,}. See \k{cond-comma}.
+
+\b Changed private namespace from __foo__ to __?foo?__, so a user namespace starting from underscore
+is now clean from symbols.
+
+\b Added support of ELF weak symbols and external references. See \k{elfglob}.
+
+\b Changed the behavior of the EXTERN keyword and introduced REQUIRED keyword.
+See \k{required}.
+
+\b Added %ifusable and %ifusing directives. See \k{macropkg}.
+
+\b Made various performance improvements and stability fixes in macro
+preprocessor engine.
+
+\b Improved NASM error handling and cleaned up error messages.
+
+\b Bugzilla bugfixes: 3392472, 3392554, 3392560, 3392564, 3392570, 3392576, 3392585,
+3392590, 3392597, 3392599, 3392601, 3392602, 3392603, 3392607, 3392612, 3392614, 3392623,
+3392626, 3392630, 3392640, 3392649, 3392659, 3392660, 3392661.
+
 \S{cl-2.14.03} Version 2.14.03
 
 \b Suppress nuisance "\c{label changed during code generation}" messages
diff --git a/doc/nasmdoc.src b/doc/nasmdoc.src
index 69ccbc9f..6c7b851a 100644
--- a/doc/nasmdoc.src
+++ b/doc/nasmdoc.src
@@ -53,6 +53,7 @@
 \IR{-E} \c{-E} option
 \IR{-F} \c{-F} option
 \IR{-I} \c{-I} option
+\IR{-L} \c{-L} option
 \IR{-M} \c{-M} option
 \IR{-MD} \c{-MD} option
 \IR{-MF} \c{-MF} option
@@ -81,7 +82,6 @@
 \IR{-Werror} \c{-Werror} option
 \IR{-Wno-error} \c{-Wno-error} option
 \IR{-w} \c{-w} option
-\IR{-y} \c{-y} option
 \IR{-Z} \c{-Z} option
 \IR{!=} \c{!=} operator
 \IR{$, here} \c{$}, Here token
@@ -171,6 +171,7 @@ in ELF
 \IR{elf64} \c{elf64}
 \IR{elfx32} \c{elfx32}
 \IR{executable and linkable format} Executable and Linkable Format
+\IR{extern, elf extensions to} \c{EXTERN}, \c{elf} extensions to
 \IR{extern, obj extensions to} \c{EXTERN}, \c{obj} extensions to
 \IR{extern, rdf extensions to} \c{EXTERN}, \c{rdf} extensions to
 \IR{floating-point, constants} floating-point, constants
@@ -372,9 +373,6 @@ To get further usage instructions from NASM, try typing
 
 The option \c{--help} is an alias for the \c{-h} option.
 
-The option \c{-hf} will also list the available output file formats,
-and what they are.
-
 If you use Linux but aren't sure whether your system is \c{a.out}
 or ELF, type
 
@@ -442,7 +440,7 @@ Like \c{-o}, the intervening space between \c{-f} and the output
 file format is optional; so \c{-f elf} and \c{-felf} are both valid.
 
 A complete list of the available output file formats can be given by
-issuing the command \i\c{nasm -hf}.
+issuing the command \i\c{nasm -h}.
 
 
 \S{opt-l} The \i\c{-l} Option: Generating a \i{Listing File}
@@ -463,6 +461,30 @@ with \c{[list +]}, (the default, obviously).
 There is no "user form" (without the brackets). This can be used to list
 only sections of interest, avoiding excessively long listings.
 
+\S{opt-L} The \i\c{-L} Option: Additional Listing Info
+
+Use this option to specify listing output details.
+
+Supported options are:
+
+\c{-Le} emit each line after processing through the preprocessor
+
+\c{-Ls} show all single-line macro definitions
+
+\c{-Lm} show multi-line macro calls with expanded parameters
+
+\c{-Lp} output a list file in every pass
+
+\c{-Ld} show byte and repeat counts in decimal, not hex
+
+\c{-Lb} show builtin macro packages
+
+\c{-Lf} ignore .nolist and force output
+
+\c{-Lw} flush the output after every line
+
+\c{-L+} enable all listing options
+
 
 \S{opt-M} The \i\c{-M} Option: Generate \i{Makefile Dependencies}
 
@@ -551,8 +573,8 @@ to enable output. Versions 2.03.01 and later automatically enable
 \c{-g} if \c{-F} is specified.
 
 A complete list of the available debug file formats for an output
-format can be seen by issuing the command \c{nasm -f <format> -y}. Not
-all output formats currently support debugging output. See \k{opt-y}.
+format can be seen by issuing the command \c{nasm -h}. Not
+all output formats currently support debugging output.
 
 This should not be confused with the \c{-f dbg} output format option,
 see \k{dbgfmt}.
@@ -818,6 +840,10 @@ The current \i{warning classes} are:
 
 \& warnings.src
 
+Since version 2.15, NASM has group aliases for all prefixed warnings,
+so they can be used to enable or disable all warnings in the group.
+For example, -w+float enables all warnings with names starting with float-*.
+
 Since version 2.00, NASM has also supported the \c{gcc}-like syntax
 \c{-Wwarning-class} and \c{-Wno-warning-class} instead of
 \c{-w+warning-class} and \c{-w-warning-class}, respectively; both
@@ -845,19 +871,6 @@ You will need the version number if you report a bug.
 For command-line compatibility with Yasm, the form \i\c{--v} is also
 accepted for this option starting in NASM version 2.11.05.
 
-\S{opt-y} The \i\c{-y} Option: Display Available Debug Info Formats
-
-Typing \c{nasm -f <option> -y} will display a list of the available
-debug info formats for the given output format. The default format
-is indicated by an asterisk. For example:
-
-\c nasm -f elf -y
-
-\c valid debug formats for 'elf32' output format are
-\c ('*' denotes default):
-\c * stabs ELF32 (i386) stabs debug format for Linux
-\c dwarf elf32 (i386) dwarf debug format for Linux
-
 
 \S{opt-pfix} The \i\c{--(g|l)prefix}, \i\c{--(g|l)postfix} Options.
 
@@ -1100,6 +1113,10 @@ In addition to all of this, macros and directives work completely
 differently to MASM. See \k{preproc} and \k{directive} for further
 details.
 
+\S{masm-compat} MASM compatibility package
+
+See \k{pkg_masm}.
+
 
 \C{lang} The NASM Language
 
@@ -1995,6 +2012,13 @@ not at definition time. Thus the code will evaluate in the expected
 way to \c{mov ax,1+2*8}, even though the macro \c{b} wasn't defined
 at the time of definition of \c{a}.
 
+Note that single-line macro argument list cannot be preceded by whitespace.
+Otherwise it will be treated as an expansion. For example:
+
+\c %define foo (a,b) ; no arguments, (a,b) is the expansion
+\c %define bar(a,b) ; two arguments, empty expansion
+
+
 Macros defined with \c{%define} are \i{case sensitive}: after
 \c{%define foo bar}, only \c{foo} will expand to \c{bar}: \c{Foo} or
 \c{FOO} will not. By using \c{%idefine} instead of \c{%define} (the
@@ -2047,6 +2071,21 @@ Then everywhere the macro \c{foo} is invoked, it will be expanded
 according to the most recent definition. This is particularly useful
 when defining single-line macros with \c{%assign} (see \k{assign}).
 
+It is possible to define an empty string in the arguments list to specify
+that the argument is unused explicitly. The construction like:
+
+\c %define myreg() eax
+\c mov edx,myreg()
+
+is also perfectly valid, and it means that macro \c{myreg} has zero arguments -
+behavior similar to preprocessor in C.
+
+As of version 2.15, NASM supports special types of macros arguments:
+If an argument declared with an \c{&}, a macro parameter would be quoted as a
+string.
+If declared with a \c{+}, it is a greedy or variadic parameter.
+If declared with an \c{!}, NASM will not try to strip whitespace and braces (useful with \c{&}).
+
 You can \i{pre-define} single-line macros using the `-d' option on
 the NASM command line: see \k{opt-d}.
 
@@ -2273,6 +2312,39 @@ is equivalent to
 
 \c %define test TEST
 
+\S{defalias} Defining Aliases: \I\c{%idefalias}\i\c{%defalias}
+
+\c{%defalias}, and its case-insensitive counterpart \c{%idefalias}, define an
+alias to a macro, i.e. equivalent of a symbolic link.
+
+When used with various macro defining and undefining directives, it affects the
+aliased macro. This functionality is intended for being able to rename macros while
+retaining the legacy names.
+
+When an alias is defined, but the aliased macro is then undefined, the
+aliases can legitimately point to nonexistent macros.
+
+The single alias can be undefined using \c{%undefalias} directive.
+
+To disable all the single-line macro aliases, use \c{%aliases off} directive.
+
+To check whether an alias is defined, use \c{%ifdefalias}.
+
+
+\S{cond-comma} \i{Conditional Comma Operator}: \i\c{%,}
+
+As of version 2.15, NASM has conditional comma operator \c{%,} that expands to a
+comma unless followed by a null expansion, which allows suppressing the comma before an
+empty argument. For example, all the expressions below are valid:
+
+\c %define greedy(a,b,c+) a + 66 %, b * 3 %, c
+\c
+\c db greedy(1,2)
+\c db greedy(1,2,3)
+\c db greedy(1,2,3,4)
+\c db greedy(1,2,3,4,5)
+
+
 \H{strlen} \i{String Manipulation in Macros}
 
 It's often useful to be able to handle strings in macros. NASM
@@ -3843,7 +3915,7 @@ mode-dependent macros.
 
 The \c{__?OUTPUT_FORMAT?__} standard macro holds the current output
 format name, as given by the \c{-f} option or NASM's default. Type
-\c{nasm -hf} for a list.
+\c{nasm -h} for a list.
 
 \c %ifidn __?OUTPUT_FORMAT?__, win32
 \c %define NEWLINE 13, 10
@@ -4145,6 +4217,8 @@ It is still possible to turn in on again by
 
 \c SECTALIGN ON
 
+Note that \c{SECTALIGN <ON|OFF>} affects only the \c{ALIGN}/\c{ALIGNB} directives,
+not an explicit \c{SECTALIGN} directive.
 
 \C{macropkg} \i{Standard Macro Packages}
 
@@ -4153,9 +4227,13 @@ macro packages included with the NASM distribution and compiled into
 the NASM binary. It operates like the \c{%include} directive (see
 \k{include}), but the included contents is provided by NASM itself.
 
-The names of standard macro packages are case insensitive, and can be
+The names of standard macro packages are case insensitive and can be
 quoted or not.
 
+As of version 2.15, NASM has \c{%ifusable} and \c{%ifusing} directives to help
+the user understand whether an individual package available in this version of
+NASM (\c{%ifusable}) or a particular package already loaded (\c{%ifusing}).
+
 
 \H{pkg_altreg} \i\c{altreg}: \i{Alternate Register Names}
 
@@ -4268,6 +4346,20 @@ The functions \i\c{ilog2fw()} (alias \i\c{ilog2w()}) and
 two, but otherwise behaves like \c{ilog2f()} and \c{ilog2c()},
 respectively.
 
+\H{pkg_masm} \i\c{masm}: \i{MASM compatibility}
+
+Since version 2.15, NASM has a MASM compatibility package with minimal
+functionality, as intended to be used primarily with machine-generated code.
+It does not include any "programmer-friendly" shortcuts, nor does it in any way
+support ASSUME, symbol typing, or MASM-style structures.
+
+Currently, the MASM compatibility package emulates only the PTR keyword and
+recognize syntax displacement[index] for memory operations.
+
+To enable the package, use the directive:
+
+\c{%use masm}
+
 
 \C{directive} \i{Assembler Directives}
 
@@ -4560,6 +4652,17 @@ declared as \c{EXTERN} and then defined, it will be treated as
 \c{EXTERN}, it will be treated as \c{COMMON}.
 
 
+\H{required} \i\c{REQUIRED}: \i{Importing Symbols} from Other Modules
+
+The \c{REQUIRED} keyword is similar to \c{EXTERN} one. The difference is that
+the \c{EXTERN} keyword as of version 2.15 does not generate unknown symbols, as
+this behavior is highly undesirable when using common header files,
+because it might cause the linker to pull in a bunch of unnecessary modules,
+depending on how smart the linker is.
+
+If the old behavior is required, use \c{REQUIRED} keyword instead.
+
+
 \H{global} \i\c{GLOBAL}: \i{Exporting Symbols} to Other Modules
 
 \c{GLOBAL} is the other end of \c{EXTERN}: if one module declares a
@@ -6053,6 +6156,9 @@ course. For example, to make \c{hashlookup} hidden:
 
 \c global hashlookup:function hidden
 
+Since version 2.15, it is possible to specify symbols binding. The keywords
+are: \i\c{weak} to generate weak symbol or \i\c{strong}. The default is \i\c{strong}.
+
 You can also specify the size of the data associated with the symbol,
 as a numeric expression (which may involve labels, and even forward
 references) after the type specifier. Like this:
@@ -6071,6 +6177,15 @@ writing shared library code. For more information, see
 \k{picglobal}.
 
 
+\S{elfextrn} \c{elf} Extensions to the \c{EXTERN} Directive\I{EXTERN,
+elf extensions to}\I{EXTERN, elf extensions to}
+
+Since version 2.15 it is possible to specify keyword \i\c{weak} to generate weak external
+reference. Example:
+
+\c extern weak_ref:weak
+
+
 \S{elfcomm} \c{elf} Extensions to the \c{COMMON} Directive
 \I{COMMON, elf extensions to}
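The `-L` letters documented in the nasmdoc.src diff above map naturally onto a bitmask. The following is a hypothetical sketch only: the bit order and `parse_list_opts()` are illustrative, because this commit does not show NASM's internal representation. It also assumes that several letters may be combined in one option, as in `-Lms`.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical bit order: one bit per documented -L letter. */
static const char list_letters[] = "esmpdbfw";

/*
 * Parse the letters that follow "-L" (e.g. "ms" for -Lms) into a
 * bitmask stored in *mask. '+' selects every option. Returns false
 * on an unknown letter and leaves *mask untouched.
 */
static bool parse_list_opts(const char *s, unsigned *mask)
{
    unsigned m = 0;

    assert(s && mask);
    for (; *s; s++) {
        const char *p;

        if (*s == '+') {
            m = (1u << (sizeof list_letters - 1)) - 1;   /* all 8 bits */
            continue;
        }
        p = strchr(list_letters, *s);
        if (!p)
            return false;
        m |= 1u << (p - list_letters);
    }
    *mask = m;
    return true;
}
```

A table-driven mapping like this keeps the documented letter list and the parser in one place.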
      
      
      From: nasm-bot f. C. S. B. <cha...@in...> - 2020-05-05 07:09:15
      
Commit-ID:  7902262721fdc3b5e6698532dfa36dc15e64b2f2
Gitweb:     http://repo.or.cz/w/nasm.git?a=commitdiff;h=7902262721fdc3b5e6698532dfa36dc15e64b2f2
Author:     Chang S. Bae <cha...@in...>
AuthorDate: Tue, 5 May 2020 07:05:47 +0000
Committer:  Chang S. Bae <cha...@in...>
CommitDate: Tue, 5 May 2020 07:05:47 +0000

NASM 2.15rc1
---
 version | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/version b/version
index 91d11b10..454eeb35 100644
--- a/version
+++ b/version
@@ -1 +1 @@
-2.15rc0
+2.15rc1
      
      
      From: nasm-bot f. C. S. B. <cha...@in...> - 2020-05-05 06:57:45
      
Commit-ID:  c32fb083192d8583b32fa0ac83452308cd093f81
Gitweb:     http://repo.or.cz/w/nasm.git?a=commitdiff;h=c32fb083192d8583b32fa0ac83452308cd093f81
Author:     Chang S. Bae <cha...@in...>
AuthorDate: Wed, 1 Apr 2020 15:25:05 -0700
Committer:  Chang S. Bae <cha...@in...>
CommitDate: Wed, 22 Apr 2020 00:10:21 +0000

test: Add BR3392607

Reported-by: Henrik Gramner <he...@gr...>
Link: https://bugzilla.nasm.us/show_bug.cgi?id=3392607
Signed-off-by: Chang S. Bae <cha...@in...>
---
 test/br3392607.asm | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/test/br3392607.asm b/test/br3392607.asm
new file mode 100644
index 00000000..a61eafc3
--- /dev/null
+++ b/test/br3392607.asm
@@ -0,0 +1,2 @@
+BITS 64
+    vpshldvw xmm0{k1}{z}, xmm1, xmm2
      
      
      From: nasm-bot f. C. S. B. <cha...@in...> - 2020-05-05 06:57:43
      
Commit-ID:  c52aff4cc8680e404cce1cc2a183b7015f07be08
Gitweb:     http://repo.or.cz/w/nasm.git?a=commitdiff;h=c52aff4cc8680e404cce1cc2a183b7015f07be08
Author:     Chang S. Bae <cha...@in...>
AuthorDate: Mon, 20 Apr 2020 21:43:44 +0000
Committer:  Chang S. Bae <cha...@in...>
CommitDate: Tue, 21 Apr 2020 21:41:33 +0000

preproc: Fix in accessing the definition structure of a single-line macro

When deciding whether to warn about defining a single-line macro with a
given name and number of parameters, the code calls a helper function,
smacro_defined(). That function does not always return the address of
the definition structure, so check the pointer before accessing the
definition structure.

Fixes: e91f5cc1322e ("preproc: fix %undef of macro aliases, and add %ifdefalias")
Reported-by: Dale Curtis <dal...@ch...>
Link: https://bugzilla.nasm.us/show_bug.cgi?id=3392659
Signed-off-by: Chang S. Bae <cha...@in...>
---
 asm/preproc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/asm/preproc.c b/asm/preproc.c
index fae3b868..9ab05765 100644
--- a/asm/preproc.c
+++ b/asm/preproc.c
@@ -2448,7 +2448,7 @@ static enum cond_state if_condition(Token * tline, enum preproc_token ct)
 
             mname = tok_text(tline);
             ctx = get_ctx(mname, &mname);
-            if (smacro_defined(ctx, mname, 0, &smac, true, alias)
+            if (smacro_defined(ctx, mname, 0, &smac, true, alias) && smac
                 && smac->alias == alias) {
                 j = true;
                 break;
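The commit text says smacro_defined() can report a macro as defined without handing back its definition. The sketch below models that contract with a hypothetical lookup. `lookup_smacro()`, `is_defined_as()` and the simplified `struct smacro` are illustrative only, and the rule that the entry is returned only on an exact parameter-count match is an assumption, not NASM's documented behavior. It shows why the added `&& smac` test must come before `smac->alias` is dereferenced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified single-line macro entry. */
struct smacro {
    const char *name;
    int nparam;
    bool alias;
};

/*
 * Hypothetical lookup: returns true when `name` exists with a compatible
 * parameter count (nparam <= 0 matches any count), but hands back the
 * entry via *defn only on an exact count match; otherwise *defn is NULL.
 */
static bool lookup_smacro(const struct smacro *tab, size_t n, const char *name,
                          int nparam, const struct smacro **defn)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(tab[i].name, name) != 0)
            continue;
        if (nparam <= 0 || tab[i].nparam == nparam) {
            *defn = tab[i].nparam == nparam ? &tab[i] : NULL;
            return true;
        }
    }
    *defn = NULL;
    return false;
}

/* The fixed condition: test the pointer before dereferencing it. */
static bool is_defined_as(const struct smacro *tab, size_t n,
                          const char *name, bool alias)
{
    const struct smacro *smac;

    assert(tab != NULL || n == 0);
    return lookup_smacro(tab, n, name, 0, &smac) && smac &&
           smac->alias == alias;
}
```

With a table holding `foo` (two parameters) and an alias `bar` (no parameters), querying `foo` reports "defined" but returns no entry, so the unguarded `smac->alias` would dereference NULL; the guarded form simply evaluates to false.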
      
      
      From: nasm-bot f. C. S. B. <cha...@in...> - 2020-05-05 06:57:42
      
     | 
| Commit-ID: ee8edad40bbc7687752f7987c63b5d9087cf9151 Gitweb: http://repo.or.cz/w/nasm.git?a=commitdiff;h=ee8edad40bbc7687752f7987c63b5d9087cf9151 Author: Chang S. Bae <cha...@in...> AuthorDate: Wed, 22 Apr 2020 00:07:58 +0000 Committer: Chang S. Bae <cha...@in...> CommitDate: Wed, 22 Apr 2020 00:08:28 +0000 test: Add BR3392661 Suggested-by: <ma...@ou...> Link: https://bugzilla.nasm.us/show_bug.cgi?id=3392661 Signed-off-by: Chang S. Bae <cha...@in...> --- test/br3392661.asm | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/test/br3392661.asm b/test/br3392661.asm new file mode 100644 index 00000000..9a349d19 --- /dev/null +++ b/test/br3392661.asm @@ -0,0 +1,8 @@ +section .text + +global _start + +_start: + mov rdi, 0 ; Exit status + mov rax, 60 ; Exit syscall number + syscall | 

From: nasm-bot f. C. S. B. <cha...@in...> - 2020-05-05 06:57:39

Commit-ID: d45cafc6599e727803260c8c86f54ff4837d7cab
Gitweb: http://repo.or.cz/w/nasm.git?a=commitdiff;h=d45cafc6599e727803260c8c86f54ff4837d7cab
Author: Chang S. Bae <cha...@in...>
AuthorDate: Tue, 5 May 2020 06:38:11 +0000
Committer: Chang S. Bae <cha...@in...>
CommitDate: Tue, 5 May 2020 06:38:11 +0000

NASM 2.15rc1

```diff
---
 version | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/version b/version
index 91d11b10..454eeb35 100644
--- a/version
+++ b/version
@@ -1 +1 @@
-2.15rc0
+2.15rc1
```

From: nasm-bot f. C. S. B. <cha...@in...> - 2020-05-05 06:57:36

Commit-ID: bd1055b8be048c4b996415a0438f8b48ac766f90
Gitweb: http://repo.or.cz/w/nasm.git?a=commitdiff;h=bd1055b8be048c4b996415a0438f8b48ac766f90
Author: Chang S. Bae <cha...@in...>
AuthorDate: Tue, 24 Mar 2020 14:24:43 -0700
Committer: Chang S. Bae <cha...@in...>
CommitDate: Wed, 22 Apr 2020 00:09:58 +0000

disam: explicitly change stdin to binary mode

Binary mode is no different from text mode on POSIX-compliant operating systems. The two modes are distinguishable from each other on Windows, and perhaps on other systems as well. The binary stream has scalability and other advantages.

Windows treats the standard input stream as text mode by default, so change it to binary mode. Also, add an OS-agnostic helper function, nasm_set_binary_mode(), to the library.

Reported-by: Didier Stevens <did...@gm...>
Suggested-by: Didier Stevens <did...@gm...>
Link: https://bugzilla.nasm.us/show_bug.cgi?id=3392649
Signed-off-by: Chang S. Bae <cha...@in...>

```diff
---
 disasm/ndisasm.c  |  4 +++-
 include/nasmlib.h |  2 ++
 nasmlib/file.c    |  5 +++++
 nasmlib/file.h    | 22 ++++++++++++++++++++++
 4 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/disasm/ndisasm.c b/disasm/ndisasm.c
index f3c23b00..01e0c557 100644
--- a/disasm/ndisasm.c
+++ b/disasm/ndisasm.c
@@ -280,8 +280,10 @@ int main(int argc, char **argv)
                         pname, filename, strerror(errno));
                 return 1;
             }
-        } else
+        } else {
+            nasm_set_binary_mode(stdin);
             fp = stdin;
+        }
 
         if (initskip > 0)
             skip(initskip, fp);
diff --git a/include/nasmlib.h b/include/nasmlib.h
index 940f1cb7..c4b4ac4c 100644
--- a/include/nasmlib.h
+++ b/include/nasmlib.h
@@ -365,6 +365,8 @@ enum file_flags {
 FILE *nasm_open_read(const char *filename, enum file_flags flags);
 FILE *nasm_open_write(const char *filename, enum file_flags flags);
 
+void nasm_set_binary_mode(FILE *f);
+
 /* Probe for existence of a file */
 bool nasm_file_exists(const char *filename);
 
diff --git a/nasmlib/file.c b/nasmlib/file.c
index a8cd3057..62b854de 100644
--- a/nasmlib/file.c
+++ b/nasmlib/file.c
@@ -148,6 +148,11 @@ os_filename os_mangle_filename(const char *filename)
 
 #endif
 
+void nasm_set_binary_mode(FILE *f)
+{
+    os_set_binary_mode(f);
+}
+
 FILE *nasm_open_read(const char *filename, enum file_flags flags)
 {
     FILE *f = NULL;
diff --git a/nasmlib/file.h b/nasmlib/file.h
index 4f0420ec..fc8f893d 100644
--- a/nasmlib/file.h
+++ b/nasmlib/file.h
@@ -103,6 +103,24 @@ typedef struct _stati64 os_struct_stat;
 # define os_stat _wstati64
 # define os_fstat _fstati64
 
+/*
+ * On Win32/64, freopen() and _wfreopen() fails when the mode string
+ * is with the letter 'b' that represents to set binary mode. On
+ * POSIX operating systems, the 'b' is ignored, without failure.
+ */
+
+#include <io.h>
+#include <fcntl.h>
+
+static inline void os_set_binary_mode(FILE *f) {
+    int ret = _setmode(_fileno(f), _O_BINARY);
+
+    if (ret == -1) {
+        nasm_fatalf(ERR_NOFILE, "unable to open file: %s",
+                    strerror(errno));
+    }
+}
+
 #else /* not _WIN32 */
 typedef const char *os_filename;
 
@@ -117,6 +135,10 @@ static inline void os_free_filename(os_filename filename)
     (void)filename;             /* Nothing to do */
 }
 
+static inline void os_set_binary_mode(FILE *f) {
+    (void)f;
+}
+
 # define os_fopen fopen
 
 #if defined(HAVE_FACCESSAT) && defined(AT_EACCESS)
```
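The practical reason binary mode matters for a disassembler reading stdin is that the Microsoft C runtime's text mode translates CR-LF pairs to LF on input and treats Ctrl-Z (0x1A) as end of file, so arbitrary machine code would be altered or cut short. A stand-alone sketch of the same idea, using the real `_setmode()`, `_fileno()` and `_O_BINARY` from the Windows CRT; the function name `set_binary_mode()` and its return-code convention are assumptions for illustration, where the patch instead aborts through `nasm_fatalf()`.

```c
#include <assert.h>
#include <stdio.h>

#ifdef _WIN32
#include <fcntl.h>   /* _O_BINARY */
#include <io.h>      /* _setmode, _fileno */
#endif

/*
 * Hypothetical helper: switch an already-open stream to binary mode.
 * Returns 0 on success and -1 on failure instead of terminating.
 */
int set_binary_mode(FILE *f)
{
    assert(f != NULL);
#ifdef _WIN32
    /* Text mode would map CR-LF to LF and stop reading at Ctrl-Z. */
    return _setmode(_fileno(f), _O_BINARY) == -1 ? -1 : 0;
#else
    /* POSIX makes no distinction between text and binary streams. */
    (void)f;
    return 0;
#endif
}
```

As in the patch, the call is typically made once, before any data is read from the stream, for example `if (set_binary_mode(stdin) != 0) perror("stdin");`.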

From: nasm-bot f. C. S. B. <cha...@in...> - 2020-05-05 06:57:35

Commit-ID: 74b2731f2cef0f1ec8c0c6e6e3dee9492b851e8c
Gitweb: http://repo.or.cz/w/nasm.git?a=commitdiff;h=74b2731f2cef0f1ec8c0c6e6e3dee9492b851e8c
Author: Chang S. Bae <cha...@in...>
AuthorDate: Tue, 21 Apr 2020 09:23:39 +0000
Committer: Chang S. Bae <cha...@in...>
CommitDate: Wed, 22 Apr 2020 00:05:56 +0000

outelf: Fix the section index for the debug output

The section information delivered to the debug output carries an index into the section table. That index should not be the current total number of sections, i.e. the value returned from add_sectname(). So, fix the value.

Fixes: b2004511ddde ("ELF: handle more than 32,633 sections")
Reported-by: C. Masloch <pu...@ul...>
Link: https://bugzilla.nasm.us/show_bug.cgi?id=3392654
Reported-by: <ma...@ou...>
Link: https://bugzilla.nasm.us/show_bug.cgi?id=3392661
Signed-off-by: Chang S. Bae <cha...@in...>

```diff
---
 output/outelf.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/output/outelf.c b/output/outelf.c
index 4976b680..18b52d88 100644
--- a/output/outelf.c
+++ b/output/outelf.c
@@ -1108,7 +1108,8 @@ static void elf32_out(int32_t segto, const void *data,
 
         /* again some stabs debugging stuff */
         sinfo.offset = s->len;
-        sinfo.section = s->shndx;
+        /* Adjust to an index of the section table. */
+        sinfo.section = s->shndx - 1;
         sinfo.segto = segto;
         sinfo.name = s->name;
         dfmt->debug_output(TY_DEBUGSYMLIN, &sinfo);
@@ -1312,7 +1313,8 @@ static void elf64_out(int32_t segto, const void *data,
 
         /* again some stabs debugging stuff */
         sinfo.offset = s->len;
-        sinfo.section = s->shndx;
+        /* Adjust to an index of the section table. */
+        sinfo.section = s->shndx - 1;
         sinfo.segto = segto;
         sinfo.name = s->name;
         dfmt->debug_output(TY_DEBUGSYMLIN, &sinfo);
@@ -1592,7 +1594,8 @@ static void elfx32_out(int32_t segto, const void *data,
 
         /* again some stabs debugging stuff */
         sinfo.offset = s->len;
-        sinfo.section = s->shndx;
+        /* Adjust to an index of the section table. */
+        sinfo.section = s->shndx - 1;
         sinfo.segto = segto;
         sinfo.name = s->name;
         dfmt->debug_output(TY_DEBUGSYMLIN, &sinfo);
```
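The fix separates two quantities that are easy to conflate: the running section count obtained when a section is added, and a 0-based position in the section table. A minimal C sketch of that off-by-one, assuming (as the commit message implies) that the count is taken right after the section is appended; `add_section()`, `section_index()` and the fixed-size table are hypothetical and not outelf.c code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SECTIONS 16

/* Hypothetical section table; outelf.c keeps its own structures. */
static const char *section_names[MAX_SECTIONS];
static size_t nsections;

/* Appends a name and returns the new total count (1 for the first section). */
size_t add_section(const char *name)
{
    assert(nsections < MAX_SECTIONS);
    section_names[nsections++] = name;
    return nsections;
}

/* Converts the count returned at insertion time into a 0-based table index. */
size_t section_index(size_t count_at_insert)
{
    assert(count_at_insert > 0);
    return count_at_insert - 1;
}
```

Using the count directly as an index would point one entry past the section just added, which is the kind of mismatch the `- 1` in the patch corrects.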

From: nasm-bot f. C. S. B. <cha...@in...> - 2020-05-05 06:57:32

Commit-ID: 5f8d0ec1f6487fb7a2520b1c81292f2242acb01c
Gitweb: http://repo.or.cz/w/nasm.git?a=commitdiff;h=5f8d0ec1f6487fb7a2520b1c81292f2242acb01c
Author: Chang S. Bae <cha...@in...>
AuthorDate: Tue, 21 Apr 2020 21:35:54 +0000
Committer: Chang S. Bae <cha...@in...>
CommitDate: Tue, 21 Apr 2020 21:35:54 +0000

test: Add BR3392660

Suggested-by: C. Masloch <pu...@ul...>
Link: https://bugzilla.nasm.us/show_bug.cgi?id=3392660
Signed-off-by: Chang S. Bae <cha...@in...>

```diff
---
 test/br3392660.asm | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/test/br3392660.asm b/test/br3392660.asm
new file mode 100644
index 00000000..737fb616
--- /dev/null
+++ b/test/br3392660.asm
@@ -0,0 +1,9 @@
+%macro coreloop 1
+  .count_%+1:
+  .no_run_before_%+1:
+  .broken_run_before_%-1:
+%endmacro
+
+label:
+  coreloop z
+  coreloop nz
```
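This test relies on NASM's condition-code macro parameters: `%+1` expands to the first parameter, which must be a valid condition code, and `%-1` expands to its inverse, so `coreloop z` should define `.count_z`, `.no_run_before_z` and `.broken_run_before_nz`. The inversion can be modelled in C with a small pair table; `invert_cc()` and the table are an illustrative subset written for this note, and NASM may spell some inverses differently (for example `nbe` rather than `a`).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Each row pairs an x86 condition code with its logical inverse (subset). */
static const char *const cc_pairs[][2] = {
    { "o", "no" }, { "c", "nc" }, { "b", "ae" }, { "z", "nz" },
    { "e", "ne" }, { "be", "a" }, { "s", "ns" }, { "pe", "po" },
    { "l", "ge" }, { "le", "g" },
};

/*
 * Returns the inverse of `cc`, or NULL when `cc` is not in the table,
 * which is where an assembler would report an invalid condition code.
 */
const char *invert_cc(const char *cc)
{
    assert(cc != NULL);
    for (size_t i = 0; i < sizeof cc_pairs / sizeof cc_pairs[0]; i++) {
        if (strcmp(cc, cc_pairs[i][0]) == 0)
            return cc_pairs[i][1];
        if (strcmp(cc, cc_pairs[i][1]) == 0)
            return cc_pairs[i][0];
    }
    return NULL;
}
```

Because each pair is searched in both directions, `coreloop nz` maps `%-1` back to `z`, giving `.broken_run_before_z` for the second invocation.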

From: nasm-bot f. C. S. B. <cha...@in...> - 2020-05-05 06:57:32

Commit-ID: 7ee58d44e4df3f3097b9475dd0aafedecd428abd
Gitweb: http://repo.or.cz/w/nasm.git?a=commitdiff;h=7ee58d44e4df3f3097b9475dd0aafedecd428abd
Author: Chang S. Bae <cha...@in...>
AuthorDate: Wed, 25 Mar 2020 15:13:21 -0700
Committer: Chang S. Bae <cha...@in...>
CommitDate: Tue, 21 Apr 2020 21:12:01 +0000

preproc: Fix the token in expanding the macro-parameters

The code appeared to unintentionally always nullify the token text pointer at the start of handling those macro parameters. Remove that assignment to avoid a segfault.

Fixes: de7acc3a46cb ("preproc: defer %00, %? and %?? expansion for nested macros, cleanups")
Reported-by: C. Masloch <pu...@ul...>
Link: https://bugzilla.nasm.us/show_bug.cgi?id=3392640
Signed-off-by: Chang S. Bae <cha...@in...>

```diff
---
 asm/preproc.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/asm/preproc.c b/asm/preproc.c
index befe77e8..cf770026 100644
--- a/asm/preproc.c
+++ b/asm/preproc.c
@@ -4833,8 +4833,6 @@ static Token *expand_mmac_params(Token * tline)
             unsigned long n;
             char *ep;
 
-            text = NULL;
-
             n = strtoul(tok_text(t) + 2, &ep, 10);
             if (unlikely(*ep))
                 goto invalid;
```

From: nasm-bot f. C. S. B. <cha...@in...> - 2020-05-05 06:57:32

Commit-ID: 0197c966da91cc2cecc0884ccaf332aa81b32deb
Gitweb: http://repo.or.cz/w/nasm.git?a=commitdiff;h=0197c966da91cc2cecc0884ccaf332aa81b32deb
Author: Chang S. Bae <cha...@in...>
AuthorDate: Wed, 1 Apr 2020 14:45:23 -0700
Committer: Chang S. Bae <cha...@in...>
CommitDate: Tue, 21 Apr 2020 21:11:33 +0000

test: Add BR3392630

Add the test code to the existing xdefine test.

Suggested-by: C. Masloch <pu...@ul...>
Link: https://bugzilla.nasm.us/show_bug.cgi?id=3392630
Signed-off-by: Chang S. Bae <cha...@in...>

```diff
---
 test/xdefine.asm | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/test/xdefine.asm b/test/xdefine.asm
index 3b475864..180c0305 100644
--- a/test/xdefine.asm
+++ b/test/xdefine.asm
@@ -8,4 +8,8 @@
 %xdefine ctr n
 %define n 0x22
 
-        db ctr, n       ; Should be 0x21, 0x22
+        db ctr, n       ; Should be 0x21, 0x22
+
+%define MNSUFFIX
+%define MNCURRENT TEST%[MNSUFFIX]
+%xdefine var MNCURRENT
```

From: nasm-bot f. C. S. B. <cha...@in...> - 2020-05-05 06:57:32

Commit-ID: 9e019f249c6c10e86e4fd47ed82533dc8d70e789
Gitweb: http://repo.or.cz/w/nasm.git?a=commitdiff;h=9e019f249c6c10e86e4fd47ed82533dc8d70e789
Author: Chang S. Bae <cha...@in...>
AuthorDate: Wed, 1 Apr 2020 15:07:49 -0700
Committer: Chang S. Bae <cha...@in...>
CommitDate: Tue, 21 Apr 2020 21:12:39 +0000

test: Add BR3392640

Suggested-by: C. Masloch <pu...@ul...>
Link: https://bugzilla.nasm.us/show_bug.cgi?id=3392640
Signed-off-by: Chang S. Bae <cha...@in...>

```diff
---
 test/br3392640.asm | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/test/br3392640.asm b/test/br3392640.asm
new file mode 100644
index 00000000..e593be9e
--- /dev/null
+++ b/test/br3392640.asm
@@ -0,0 +1,4 @@
+        %imacro mac 1-2
+                j%+1
+        %endmacro
+        mac c, label
```

From: nasm-bot f. C. S. B. <cha...@in...> - 2020-05-05 06:57:31

Commit-ID: 057b832f45daca2d2260f0e914f565252271b69f
Gitweb: http://repo.or.cz/w/nasm.git?a=commitdiff;h=057b832f45daca2d2260f0e914f565252271b69f
Author: Chang S. Bae <cha...@in...>
AuthorDate: Sat, 18 Apr 2020 23:11:21 +0000
Committer: Chang S. Bae <cha...@in...>
CommitDate: Tue, 21 Apr 2020 21:28:50 +0000

preproc: Fix the macro-parameter check for conditional code

Mistreating a macro-parameter reference whose number equals the number of given arguments led to an unnecessary error. Fix the check so that the conditional code assembles correctly.

Fixes: de7acc3a46cb ("preproc: defer %00, %? and %?? expansion for nested macros, cleanups")
Reported-by: C. Masloch <pu...@ul...>
Link: https://bugzilla.nasm.us/show_bug.cgi?id=3392660
Signed-off-by: Chang S. Bae <cha...@in...>

```diff
---
 asm/preproc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/asm/preproc.c b/asm/preproc.c
index cf770026..fae3b868 100644
--- a/asm/preproc.c
+++ b/asm/preproc.c
@@ -4837,7 +4837,7 @@ static Token *expand_mmac_params(Token * tline)
             if (unlikely(*ep))
                 goto invalid;
 
-            if (n && n < mac->nparam) {
+            if (n && n <= mac->nparam) {
                 n = mmac_rotate(mac, n);
                 tt = mac->params[n];
             }
```
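Macro parameters are numbered from 1, so a macro invoked with `nparam` arguments accepts references 1 through `nparam` inclusive. `coreloop` in BR3392660 takes one argument, so `%+1` sits exactly on the boundary that `n < nparam` wrongly excluded. A C sketch of only the range check, with hypothetical function names, leaving out the rotation that `mmac_rotate()` applies afterwards:

```c
#include <assert.h>
#include <stdbool.h>

/* Fixed check: %1 .. %nparam are valid; %0 is handled elsewhere. */
bool param_ref_valid(unsigned long n, unsigned long nparam)
{
    return n != 0 && n <= nparam;
}

/* Pre-fix check, kept for comparison: rejects the last parameter. */
bool param_ref_valid_old(unsigned long n, unsigned long nparam)
{
    return n != 0 && n < nparam;
}
```

With one argument the old check rejects `%1` (and therefore `%+1` and `%-1`), which matches the spurious error described in the commit message.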

From: nasm-bot f. C. S. B. <cha...@in...> - 2020-05-05 06:57:29

Commit-ID: 95e54a9f1f693f38075e8c8b8553970a8cfac0d7
Gitweb: http://repo.or.cz/w/nasm.git?a=commitdiff;h=95e54a9f1f693f38075e8c8b8553970a8cfac0d7
Author: Chang S. Bae <cha...@in...>
AuthorDate: Thu, 6 Feb 2020 14:39:22 -0800
Committer: Chang S. Bae <cha...@in...>
CommitDate: Tue, 21 Apr 2020 21:11:10 +0000

preproc: Fix the token iterator in expanding single-line macro

The code used to get stuck going through whitespace tokens. Fix it by advancing to the next token in the loop.

Reported-by: C. Masloch <pu...@ul...>
Link: https://bugzilla.nasm.us/show_bug.cgi?id=3392630
Suggested-by: C. Masloch <pu...@ul...>
Signed-off-by: Chang S. Bae <cha...@in...>

```diff
---
 asm/preproc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/asm/preproc.c b/asm/preproc.c
index f94d9558..befe77e8 100644
--- a/asm/preproc.c
+++ b/asm/preproc.c
@@ -5379,8 +5379,10 @@ static SMacro *expand_one_smacro(Token ***tpp)
             Token *endt = tline;
 
             tline = t;
-            while (!cond_comma && t && t != endt)
+            while (!cond_comma && t && t != endt) {
                 cond_comma = t->type != TOK_WHITESPACE;
+                t = t->next;
+            }
         }
 
         if (tnext) {
```
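The bug is the classic forgotten advance in a linked-list scan: when the first token examined is whitespace, the loop condition never changes and the loop never terminates. A self-contained C model of the corrected loop, using a hypothetical `struct token` and token-type enum rather than NASM's `Token`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical token list for illustration only. */
enum tok_type { TOK_WHITESPACE, TOK_ID, TOK_OTHER };

struct token {
    enum tok_type type;
    struct token *next;
};

/*
 * Returns true if any token in [t, endt) is not whitespace. The cursor moves
 * on every iteration; dropping "t = t->next" reproduces the endless loop.
 */
bool has_non_white(const struct token *t, const struct token *endt)
{
    bool found = false;

    while (!found && t && t != endt) {
        found = t->type != TOK_WHITESPACE;
        t = t->next;
    }
    return found;
}
```

The loop still stops at the first non-whitespace token, so the result is the same as before the fix whenever the old loop did terminate.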

From: nasm-bot f. C. S. B. <cha...@in...> - 2020-05-05 06:57:27

Commit-ID: bec812fc4b56d0ff8c2321f8ac47ffe41e86e9ca
Gitweb: http://repo.or.cz/w/nasm.git?a=commitdiff;h=bec812fc4b56d0ff8c2321f8ac47ffe41e86e9ca
Author: Chang S. Bae <cha...@in...>
AuthorDate: Fri, 7 Feb 2020 15:49:38 -0800
Committer: Chang S. Bae <cha...@in...>
CommitDate: Fri, 17 Apr 2020 21:33:33 +0000

preproc: Fix to reset %rep list line number after every iteration

The code has been fixed to print the corresponding line numbers of %rep blocks correctly, but only for the first iteration. For subsequent iterations, the current line number of the expansion needs to be explicitly reset again.

Fixes: ab6f8319552f ("listing: when listing lines in macros and rep blocks, show the actual line")
Reported-by: C. Masloch <pu...@ul...>
Link: https://bugzilla.nasm.us/show_bug.cgi?id=3392626
Signed-off-by: Chang S. Bae <cha...@in...>

```diff
---
 asm/preproc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/asm/preproc.c b/asm/preproc.c
index 41a7c6fb..f94d9558 100644
--- a/asm/preproc.c
+++ b/asm/preproc.c
@@ -6211,6 +6211,7 @@ static Token *pp_tokline(void)
                 Token *t, *tt, **tail;
                 Line *ll;
 
+                istk->mstk.mstk->lineno = 0;
                 nasm_new(ll);
                 ll->next = istk->expansion;
                 tail = &ll->first;
```
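Conceptually, the fix resets a per-expansion line counter at the top of every %rep pass. The C sketch below models only that counter, counting lines relative to the start of the %rep body (an assumption of the sketch); the function name and flag are hypothetical, and NASM's real state lives in `istk->mstk.mstk->lineno`. For the two-line body of br3392626.asm repeated three times, the reset gives 1, 2, 1, 2, 1, 2 instead of counting on across passes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Numbers the lines of a %rep body expanded `count` times. With
 * reset_each_iteration set, the counter returns to 0 before every pass;
 * without it, numbering keeps growing after the first pass.
 * Returns the number of entries written to out[].
 */
size_t number_rep_lines(size_t body_lines, size_t count,
                        bool reset_each_iteration,
                        size_t *out, size_t out_len)
{
    size_t emitted = 0;
    size_t lineno = 0;

    for (size_t iter = 0; iter < count; iter++) {
        if (reset_each_iteration)
            lineno = 0;
        for (size_t i = 0; i < body_lines; i++) {
            assert(emitted < out_len);
            out[emitted++] = ++lineno;
        }
    }
    return emitted;
}
```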

From: nasm-bot f. C. S. B. <cha...@in...> - 2020-05-05 06:57:27

Commit-ID: bb96fdc74ce894a86c3f0efaa522f5c76739a6eb
Gitweb: http://repo.or.cz/w/nasm.git?a=commitdiff;h=bb96fdc74ce894a86c3f0efaa522f5c76739a6eb
Author: Chang S. Bae <cha...@in...>
AuthorDate: Wed, 1 Apr 2020 15:06:38 -0700
Committer: Chang S. Bae <cha...@in...>
CommitDate: Tue, 21 Apr 2020 21:00:56 +0000

test: Add BR3392626

There are many similar preprocessor loop test cases, each kept in a separate bug-report file. While consolidating them would probably be better, add one more test case in the same way as before.

Suggested-by: C. Masloch <pu...@ul...>
Link: https://bugzilla.nasm.us/show_bug.cgi?id=3392626
Signed-off-by: Chang S. Bae <cha...@in...>

```diff
---
 test/br3392626.asm | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/test/br3392626.asm b/test/br3392626.asm
new file mode 100644
index 00000000..de4ad8ee
--- /dev/null
+++ b/test/br3392626.asm
@@ -0,0 +1,6 @@
+        ; line 1
+%rep 3  ; line 2
+        ; line 3
+        db 26h  ; line 4
+%endrep ; line 5
+        ; line 6
```

From: nasm-bot f. C. G. <gor...@gm...> - 2019-04-24 18:03:54

Commit-ID: a384068a04a5cf92a4564fa39b061b7539cb94f9
Gitweb: http://repo.or.cz/w/nasm.git?a=commitdiff;h=a384068a04a5cf92a4564fa39b061b7539cb94f9
Author: Cyrill Gorcunov <gor...@gm...>
AuthorDate: Sun, 31 Mar 2019 19:33:08 +0300
Committer: Cyrill Gorcunov <gor...@gm...>
CommitDate: Sun, 31 Mar 2019 19:34:50 +0300

doc: latex -- Initial import

This is an initial import for converting our documentation to LaTeX format. Note that additional LaTeX packages need to be preinstalled; xelatex is used for PDF generation.

Although I've been very careful while converting the docs, there is a good chance that some indices are broken, so we need to review everything once again.

Then we need to create a converter for the HTML backend. I started working on it but haven't succeeded yet, and I fear I won't have enough spare time in the near future.

We also need to autogenerate the instruction table and warnings from insns.dat, and probably from scanning the NASM sources.

To build nasm.pdf, just run

    make -C doc/latex/

It doesn't require configuration; it is a standalone builder, separate from our traditional build engine.
Signed-off-by: Cyrill Gorcunov <gor...@gm...>

```diff
---
 doc/latex/.gitignore             |    2 +
 doc/latex/Makefile               |   66 ++
 doc/latex/src/16bit.tex          |  868 ++++++++++++++
 doc/latex/src/32bit.tex          |  539 +++++++++
 doc/latex/src/64bit.tex          |  204 ++++
 doc/latex/src/changelog.tex      | 2304 ++++++++++++++++++++++++++++++++++++
 doc/latex/src/contact.tex        |  111 ++
 doc/latex/src/directive.tex      |  541 +++++++++
 doc/latex/src/idxconf.ist        |    9 +
 doc/latex/src/inslist.tex        |   14 +
 doc/latex/src/intro.tex          |   55 +
 doc/latex/src/language.tex       |  945 +++++++++++++++
 doc/latex/src/macropkg.tex       |  127 ++
 doc/latex/src/mixsize.tex        |  185 +++
 doc/latex/src/nasm.tex           |  163 +++
 doc/{ => latex/src}/nasmlogo.eps |    0
 doc/latex/src/ndisasm.tex        |  174 +++
 doc/latex/src/outfmt.tex         | 1606 +++++++++++++++++++++++++
 doc/latex/src/preproc.tex        | 2400 ++++++++++++++++++++++++++++++++++++++
 doc/latex/src/running.tex        |  902 ++++++++++++++
 doc/latex/src/source.tex         |   53 +
 doc/latex/src/trouble.tex        |  114 ++
 doc/latex/src/version.tex        |    4 +
 23 files changed, 11386 insertions(+)

diff --git a/doc/latex/.gitignore b/doc/latex/.gitignore
new file mode 100644
index 00000000..4f265ac2
--- /dev/null
+++ b/doc/latex/.gitignore
@@ -0,0 +1,2 @@
+.git-ignore/
+*.swp
diff --git a/doc/latex/Makefile b/doc/latex/Makefile
new file mode 100644
index 00000000..afbe73ad
--- /dev/null
+++ b/doc/latex/Makefile
@@ -0,0 +1,66 @@
+.PHONY: all .FORCE
+.DEFAULT_GOAL := all
+
+ifeq ($(strip $(V)),)
+	E := @echo
+	Q := @
+else
+	E := @\#
+	Q :=
+endif
+
+export E Q
+
+define msg-gen
+	$(E) " GEN " $(1)
+endef
+
+define msg-clean
+	$(E) " CLEAN " $(1)
+endef
+
+RM ?= rm -f
+XELATEX ?= xelatex
+XELATEX-OPTS ?= -output-driver="xdvipdfmx -V 3" -8bit
+
+tex-d += src/16bit.tex
+tex-d += src/32bit.tex
+tex-d += src/64bit.tex
+tex-d += src/changelog.tex
+tex-d += src/contact.tex
+tex-d += src/directive.tex
+tex-d += src/idxconf.ist
+tex-d += src/inslist.tex
+tex-d += src/intro.tex
+tex-d += src/language.tex
+tex-d += src/macropkg.tex
+tex-d += src/mixsize.tex
+tex-d += src/nasmlogo.eps
+tex-d += src/ndisasm.tex
+tex-d += src/outfmt.tex
+tex-d += src/preproc.tex
+tex-d += src/running.tex
+tex-d += src/source.tex
+tex-d += src/trouble.tex
+tex-d += src/version.tex
+tex-y += src/nasm.tex
+
+$(tex-y): $(tex-d)
+	@true
+
+nasm.pdf: $(tex-y) .FORCE
+	$(call msg-gen,$@)
+	$(Q) $(XELATEX) $(XELATEX-OPTS) $^
+	$(Q) $(XELATEX) $(XELATEX-OPTS) $^
+all-y += nasm.pdf
+
+# Default target
+all: $(all-y)
+
+clean:
+	$(call msg-clean,nasm)
+	$(Q) $(RM) ./nasm.aux ./nasm.idx ./nasm.ilg ./nasm.ind ./nasm.log
+	$(Q) $(RM) ./nasm.out ./nasm.pdf ./nasm.toc
+
+# Disable implicit rules in _this_ Makefile.
+.SUFFIXES:
diff --git a/doc/latex/src/16bit.tex b/doc/latex/src/16bit.tex
new file mode 100644
index 00000000..79bebcb9
--- /dev/null
+++ b/doc/latex/src/16bit.tex
@@ -0,0 +1,868 @@
+%
+% vim: ts=4 sw=4 et
+%
+\xchapter{16bit}{Writing 16-bit Code (DOS, Windows 3/3.1)}
+
+This chapter attempts to cover some of the common issues encountered
+when writing 16-bit code to run under \code{MS-DOS} or \code{Windows 3.x}.
+It covers how to link programs to produce \code{.EXE} or \code{.COM} files,
+how to write \code{.SYS} device drivers, and how to interface assembly
+language code with 16-bit C compilers and with Borland Pascal.
+
+\xsection{exefiles}{Producing \codeindex{.EXE} Files}
+
+Any large program written under DOS needs to be built as a \code{.EXE}
+file: only \code{.EXE} files have the necessary internal structure
+required to span more than one 64K segment. \textindex{Windows} programs,
+also, have to be built as \code{.EXE} files, since Windows does not
+support the \code{.COM} format.
+
+In general, you generate \code{.EXE} files by using the \code{obj} output
+format to produce one or more \codeindex{.OBJ} files, and then linking
+them together using a linker. However, NASM also supports the direct
+generation of simple DOS \code{.EXE} files using the \code{bin} output
+format (by using \code{DB} and \code{DW} to construct the \code{.EXE} file
+header), and a macro package is supplied to do this. Thanks to
+Yann Guidon for contributing the code for this.
+
+NASM may also support \code{.EXE} natively as another output format in
+future releases.
+
+\xsubsection{objexe}{Using the \code{obj} Format To Generate \code{.EXE} Files}
+
+This section describes the usual method of generating \code{.EXE} files
+by linking \code{.OBJ} files together.
+
+Most 16-bit programming language packages come with a suitable
+linker; if you have none of these, there is a free linker called
+\textindex{VALX}\index{linker!VALX}, available as a part of
+CC386 compiler on \href{http://ladsoft.tripod.com/cc386\_compiler.html}
+{ladsoft.tripod.com}.
+
+There is another `free' linker (though this one doesn't come with
+sources) called \textindex{FREELINK}\index{linker!FREELINK}, available
+from \href{http://www.pcorner.com/tpc/old/3-101.html}{www.pcorner.com}.
+
+A third, \textindex{djlink}, written by DJ Delorie, is available at
+\href{http://www.delorie.com/djgpp/16bit/djlink/}{www.delorie.com}.
+
+A fourth linker, \textindex{ALINK}\index{linker!ALINK}, written by
+Anthony A.J. Williams, is available at \href{http://alink.sourceforge.net}
+{alink.sourceforge.net}.
+
+When linking several \code{.OBJ} files into a \code{.EXE} file, you should
+ensure that exactly one of them has a start point defined (using the
+\index{program entry point}\codeindex{..start} special symbol defined by the
+\code{obj} format: see \nref{dotdotstart}). If no module defines a start
+point, the linker will not know what value to give the entry-point
+field in the output file header; if more than one defines a start
+point, the linker will not know \emph{which} value to use.
```
```diff
+
+An example of a NASM source file which can be assembled to a
+\code{.OBJ} file and linked on its own to a \code{.EXE} is given here. It
+demonstrates the basic principles of defining a stack, initialising
+the segment registers, and declaring a start point. This file is
+also provided in the \index{test subdirectory}\code{test} subdirectory of
+the NASM archives, under the name \code{objexe.asm}.
+
+\begin{lstlisting}
+segment code
+
+..start:
+        mov     ax,data
+        mov     ds,ax
+        mov     ax,stack
+        mov     ss,ax
+        mov     sp,stacktop
+\end{lstlisting}
+
+This initial piece of code sets up \code{DS} to point to the data
+segment, and initializes \code{SS} and \code{SP} to point to the top of
+the provided stack. Notice that interrupts are implicitly disabled
+for one instruction after a move into \code{SS}, precisely for this
+situation, so that there's no chance of an interrupt occurring
+between the loads of \code{SS} and \code{SP} and not having a stack to
+execute on.
+
+Note also that the special symbol \code{..start} is defined at the
+beginning of this code, which means that will be the entry point
+into the resulting executable file.
+
+\begin{lstlisting}
+        mov     dx,hello
+        mov     ah,9
+        int     0x21
+\end{lstlisting}
+
+The above is the main program: load \code{DS:DX} with a pointer to the
+greeting message (\code{hello} is implicitly relative to the segment
+\code{data}, which was loaded into \code{DS} in the setup code, so the
+full pointer is valid), and call the DOS print-string function.
+
+\begin{lstlisting}
+        mov     ax,0x4c00
+        int     0x21
+\end{lstlisting}
+
+This terminates the program using another DOS system call.
+
+\begin{lstlisting}
+segment data
+
+hello:  db      'hello, world', 13, 10, '$'
+\end{lstlisting}
+
+The data segment contains the string we want to display.
+
+\begin{lstlisting}
+segment stack stack
+        resb 64
+stacktop:
+\end{lstlisting}
+
+The above code declares a stack segment containing 64 bytes of
+uninitialized stack space, and points \code{stacktop} at the top of it.
+The directive \code{segment stack stack} defines a segment \emph{called}
+\code{stack}, and also of \emph{type} \code{STACK}. The latter is not
+necessary to the correct running of the program, but linkers are
+likely to issue warnings or errors if your program has no segment of
+type \code{STACK}.
+
+The above file, when assembled into a \code{.OBJ} file, will link on
+its own to a valid \code{.EXE} file, which when run will print `hello,
+world' and then exit.
+
+\xsubsection{binexe}{Using the \code{bin} Format To Generate \code{.EXE} Files}
+
+The \code{.EXE} file format is simple enough that it's possible to
+build a \code{.EXE} file by writing a pure-binary program and sticking
+a 32-byte header on the front. This header is simple enough that it
+can be generated using \code{DB} and \code{DW} commands by NASM itself,
+so that you can use the \code{bin} output format to directly generate
+\code{.EXE} files.
+
+Included in the NASM archives, in the \index{misc subdirectory}\code{misc}
+subdirectory, is a file \codeindex{exebin.mac} of macros. It defines three
+macros: \codeindex{EXE\_begin}, \codeindex{EXE\_stack} and
+\codeindex{EXE\_end}.
+
+To produce a \code{.EXE} file using this method, you should start by
+using \code{\%include} to load the \code{exebin.mac} macro package into
+your source file. You should then issue the \code{EXE\_begin} macro call
+(which takes no arguments) to generate the file header data. Then
+write code as normal for the \code{bin} format - you can use all three
+standard sections \code{.text}, \code{.data} and \code{.bss}. At the end of
+the file you should call the \code{EXE\_end} macro (again, no arguments),
+which defines some symbols to mark section sizes, and these symbols
+are referred to in the header code generated by \code{EXE\_begin}.
+
+In this model, the code you end up writing starts at \code{0x100}, just
+like a \code{.COM} file - in fact, if you strip off the 32-byte header
+from the resulting \code{.EXE} file, you will have a valid \code{.COM}
+program. All the segment bases are the same, so you are limited to a
+64K program, again just like a \code{.COM} file. Note that an \code{ORG}
+directive is issued by the \code{EXE\_begin} macro, so you should not
+explicitly issue one of your own.
+
+You can't directly refer to your segment base value, unfortunately,
+since this would require a relocation in the header, and things
+would get a lot more complicated. So you should get your segment
+base by copying it out of \code{CS} instead.
+
+On entry to your \code{.EXE} file, \code{SS:SP} are already set up to
+point to the top of a 2Kb stack. You can adjust the default stack
+size of 2Kb by calling the \code{EXE\_stack} macro. For example, to
+change the stack size of your program to 64 bytes, you would call
+\code{EXE\_stack 64}.
+
+A sample program which generates a \code{.EXE} file in this way is
+given in the \code{test} subdirectory of the NASM archive, as
+\code{binexe.asm}.
+
+\xsection{comfiles}{Producing \codeindex{.COM} Files}
+
+While large DOS programs must be written as \code{.EXE} files, small
+ones are often better written as \code{.COM} files. \code{.COM} files are
+pure binary, and therefore most easily produced using the \code{bin}
+output format.
+
+\xsubsection{combinfmt}{Using the \code{bin} Format To Generate \code{.COM} Files}
+
+\code{.COM} files expect to be loaded at offset \code{100h} into their
+segment (though the segment may change). Execution then begins at
+\indexcode{ORG}\code{100h}, i.e. right at the start of the program.
+So to write a \code{.COM} program, you would create a source file
+looking like
+
+\begin{lstlisting}
+        org 100h
+
+section .text
+start:
+        ; put your code here
+
+section .data
+        ; put data items here
+
+section .bss
+        ; put uninitialized data here
+\end{lstlisting}
+
+The \code{bin} format puts the \code{.text} section first in the file,
+so you can declare data or BSS items before beginning to write code if
+you want to and the code will still end up at the front of the file
+where it belongs.
+
+The BSS (uninitialized data) section does not take up space in the
+\code{.COM} file itself: instead, addresses of BSS items are resolved
+to point at space beyond the end of the file, on the grounds that
+this will be free memory when the program is run. Therefore you
+should not rely on your BSS being initialized to all zeros when you
+run.
+
+To assemble the above program, you should use a command line like
+
+\begin{lstlisting}
+nasm myprog.asm -fbin -o myprog.com
+\end{lstlisting}
+
+The \code{bin} format would produce a file called \code{myprog} if no
+explicit output file name were specified, so you have to override it
+and give the desired file name.
+
+\xsubsection{comobjfmt}{Using the \code{obj} Format To Generate \code{.COM} Files}
+
+If you are writing a \code{.COM} program as more than one module, you
+may wish to assemble several \code{.OBJ} files and link them together
+into a \code{.COM} program. You can do this, provided you have a linker
+capable of outputting \code{.COM} files directly (\textindex{TLINK} does this),
+or alternatively a converter program such as \codeindex{EXE2BIN} to
+transform the \code{.EXE} file output from the linker into a \code{.COM}
+file.
+
+If you do this, you need to take care of several things:
+
+\begin{itemize}
+    \item{The first object file containing code should start its code
+    segment with a line like \code{RESB 100h}. This is to ensure
+    that the code begins at offset \code{100h} relative to the beginning
+    of the code segment, so that the linker or converter program does
+    not have to adjust address references within the file when generating
+    the \code{.COM} file. Other assemblers use an \codeindex{ORG} directive
+    for this purpose, but \code{ORG} in NASM is a format-specific directive
+    to the \code{bin} output format, and does not mean the same thing as
+    it does in MASM-compatible assemblers.}
+    \item{You don't need to define a stack segment.}
+    \item{All your segments should be in the same group, so that every time
+    your code or data references a symbol offset, all offsets are
+    relative to the same segment base. This is because, when a \code{.COM}
+    file is loaded, all the segment registers contain the same value.}
+\end{itemize}
+
+\xsection{sysfiles}{Producing \codeindex{.SYS} Files}
+
+\textindex{MS-DOS device drivers} - \code{.SYS} files - are pure binary files,
+similar to \code{.COM} files, except that they start at origin zero
+rather than \code{100h}. Therefore, if you are writing a device driver
+using the \code{bin} format, you do not need the \code{ORG} directive,
+since the default origin for \code{bin} is zero. Similarly, if you are
+using \code{obj}, you do not need the \code{RESB 100h} at the start of
+your code segment.
+
+\code{.SYS} files start with a header structure, containing pointers to
+the various routines inside the driver which do the work. This
+structure should be defined at the start of the code segment, even
+though it is not actually code.
+
+For more information on the format of \code{.SYS} files, and the data
+which has to go in the header structure, a list of books is given in
+the Frequently Asked Questions list for the newsgroup
+\href{news:comp.os.msdos.programmer}{comp.os.msdos.programmer}.
+
+\xsection{16c}{Interfacing to 16-bit C Programs}
+
+This section covers the basics of writing assembly routines that
+call, or are called from, C programs. To do this, you would
+typically write an assembly module as a \code{.OBJ} file, and link it
+with your C modules to produce a \textindex{mixed-language program}.
+
+\xsubsection{16cunder}{External Symbol Names}
+
+\index{C symbol names}\index{underscore!in C symbols}C compilers have the
+convention that the names of all global symbols (functions or data)
+they define are formed by prefixing an underscore to the name as it
+appears in the C program. So, for example, the function a C
+programmer thinks of as \code{printf} appears to an assembly language
+programmer as \code{\_printf}. This means that in your assembly
+programs, you can define symbols without a leading underscore, and
+not have to worry about name clashes with C symbols.
+
+If you find the underscores inconvenient, you can define macros to
+replace the \code{GLOBAL} and \code{EXTERN} directives as follows:
+
+\begin{lstlisting}
+%macro cglobal 1
+  global _%1
+  %define %1 _%1
+%endmacro
+
+%macro cextern 1
+  extern _%1
+  %define %1 _%1
+%endmacro
+\end{lstlisting}
+
+(These forms of the macros only take one argument at a time; a
+\code{\%rep} construct could solve this.)
+
+If you then declare an external like this:
+
+\begin{lstlisting}
+cextern printf
+\end{lstlisting}
+
+then the macro will expand it as
+
+\begin{lstlisting}
+extern _printf
+%define printf _printf
+\end{lstlisting}
+
+Thereafter, you can reference \code{printf} as if it was a symbol, and
+the preprocessor will put the leading underscore on where necessary.
+
+The \code{cglobal} macro works similarly. You must use \code{cglobal}
+before defining the symbol in question, but you would have had to do
+that anyway if you used \code{GLOBAL}.
+
+Also see \nref{opt-pfix}.
```
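The name mapping that the `cglobal` and `cextern` macros hide can be sketched in C as a helper that builds the assembler-level name by prefixing an underscore to the C name. `c_to_asm_symbol()` is a hypothetical helper, and the rule describes the 16-bit DOS compilers covered by this chapter rather than every C toolchain (ELF-based Linux compilers, for instance, do not add the underscore).

```c
#include <assert.h>
#include <string.h>

/*
 * Writes the assembler-level name of C symbol `cname` into buf using the
 * leading-underscore convention. Returns 0 on success, -1 if buf is too small.
 */
int c_to_asm_symbol(const char *cname, char *buf, size_t bufsize)
{
    size_t len;

    assert(cname != NULL && buf != NULL);
    len = strlen(cname);
    if (len + 2 > bufsize)             /* '_' + name + terminating NUL */
        return -1;
    buf[0] = '_';
    memcpy(buf + 1, cname, len + 1);   /* copies the NUL as well */
    return 0;
}
```

For example, `c_to_asm_symbol("printf", buf, sizeof buf)` yields `_printf`, the name `cextern printf` declares with `extern`.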
+ +\xsubsection{16cmodels}{\textindexlc{Memory Models}} + +NASM contains no mechanism to support the various C memory models +directly; you have to keep track yourself of which one you are +writing for. This means you have to keep track of the following +things: + +\begin{itemize} + \item{In models using a single code segment (tiny, small and compact), + functions are near. This means that function pointers, when stored + in data segments or pushed on the stack as function arguments, are + 16 bits long and contain only an offset field (the \code{CS} register + never changes its value, and always gives the segment part of the + full function address), and that functions are called using ordinary + near \code{CALL} instructions and return using \code{RETN} (which, in + NASM, is synonymous with \code{RET} anyway). This means both that you + should write your own routines to return with \code{RETN}, and that you + should call external C routines with near \code{CALL} instructions.} + + \item{In models using more than one code segment (medium, large and + huge), functions are far. This means that function pointers are 32 + bits long (consisting of a 16-bit offset followed by a 16-bit + segment), and that functions are called using \code{CALL FAR} (or + \code{CALL seg:offset}) and return using \code{RETF}. Again, you should + therefore write your own routines to return with \code{RETF} and use + \code{CALL FAR} to call external routines.} + + \item{In models using a single data segment (tiny, small and medium), + data pointers are 16 bits long, containing only an offset field (the + \code{DS} register doesn't change its value, and always gives the + segment part of the full data item address).} + + \item{In models using more than one data segment (compact, large and + huge), data pointers are 32 bits long, consisting of a 16-bit offset + followed by a 16-bit segment. 
You should still be careful not to + modify \code{DS} in your routines without restoring it afterwards, but + \code{ES} is free for you to use to access the contents of 32-bit data + pointers you are passed.} + + \item{The huge memory model allows single data items to exceed 64K in + size. In all other memory models, you can access the whole of a data + item just by doing arithmetic on the offset field of the pointer you + are given, whether a segment field is present or not; in huge model, + you have to be more careful of your pointer arithmetic.} + + \item{In most memory models, there is a \emph{default} data segment, whose + segment address is kept in \code{DS} throughout the program. This data + segment is typically the same segment as the stack, kept in \code{SS}, + so that functions' local variables (which are stored on the stack) + and global data items can both be accessed easily without changing + \code{DS}. Particularly large data items are typically stored in other + segments. However, some memory models (though not the standard + ones, usually) allow the assumption that \code{SS} and \code{DS} hold the + same value to be removed. Be careful about functions' local + variables in this latter case.} +\end{itemize} + +In models with a single code segment, the segment is called \codeindex{\_TEXT}, +so your code segment must also go by this name in order to be linked into the +same place as the main code segment. In models with a single data segment, +or with a default data segment, it is called \codeindex{\_DATA}. + +\xsubsection{16cfunc}{Function Definitions and Function Calls} + +\index{functions!C calling convention}The \textindex{C calling convention} +in 16-bit programs is as follows. In the following description, the +words \emph{caller} and \emph{callee} are used to denote the function +doing the calling and the function which gets called. 
+ +\begin{itemize} + \item{The caller pushes the function's parameters on the stack, one + after another, in reverse order (right to left, so that the first + argument specified to the function is pushed last).} + + \item{The caller then executes a \code{CALL} instruction to pass control + to the callee. This \code{CALL} is either near or far depending on the + memory model.} + + \item{The callee receives control, and typically (although this is not + actually necessary, in functions which do not need to access their + parameters) starts by saving the value of \code{SP} in \code{BP} so as to + be able to use \code{BP} as a base pointer to find its parameters on + the stack. However, the caller was probably doing this too, so part + of the calling convention states that \code{BP} must be preserved by + any C function. Hence the callee, if it is going to set up \code{BP} as + a \emph{\textindex{frame pointer}}, must push the previous value first.} + + \item{The callee may then access its parameters relative to \code{BP}. + The word at \code{[BP]} holds the previous value of \code{BP} as it was + pushed; the next word, at \code{[BP+2]}, holds the offset part of the + return address, pushed implicitly by \code{CALL}. In a small-model + (near) function, the parameters start after that, at \code{[BP+4]}; in + a large-model (far) function, the segment part of the return address + lives at \code{[BP+4]}, and the parameters begin at \code{[BP+6]}. The + leftmost parameter of the function, since it was pushed last, is + accessible at this offset from \code{BP}; the others follow, at + successively greater offsets. 
Thus, in a function such as \code{printf} + which takes a variable number of parameters, the pushing of the + parameters in reverse order means that the function knows where to + find its first parameter, which tells it the number and type of the + remaining ones.} + + \item{The callee may also wish to decrease \code{SP} further, so as to + allocate space on the stack for local variables, which will then be + accessible at negative offsets from \code{BP}.} + + \item{The callee, if it wishes to return a value to the caller, should + leave the value in \code{AL}, \code{AX} or \code{DX:AX} depending + on the size of the value. Floating-point results are sometimes + (depending on the compiler) returned in \code{ST0}.} + + \item{Once the callee has finished processing, it restores \code{SP} from + \code{BP} if it had allocated local stack space, then pops the previous + value of \code{BP}, and returns via \code{RETN} or \code{RETF} depending on + memory model.} + + \item{When the caller regains control from the callee, the function + parameters are still on the stack, so it typically adds an immediate + constant to \code{SP} to remove them (instead of executing a number of + slow \code{POP} instructions). Thus, if a function is accidentally + called with the wrong number of parameters due to a prototype + mismatch, the stack will still be returned to a sensible state since + the caller, which \emph{knows} how many parameters it pushed, does the + removing.} +\end{itemize} + +It is instructive to compare this calling convention with that for +Pascal programs (described in \nref{16bpfunc}). Pascal has +a simpler convention, since no functions have variable numbers of parameters. +Therefore the callee knows how many parameters it should have been +passed, and is able to deallocate them from the stack itself by +passing an immediate argument to the \code{RET} or \code{RETF} +instruction, so the caller does not have to do it. 
Also, the +parameters are pushed in left-to-right order, not right-to-left, +which means that a compiler can give better guarantees about +sequence points without performance suffering. + +Thus, you would define a function in C style in the following way. +The following example is for small model: + +\begin{lstlisting} +global _myfunc + +_myfunc: + push bp + mov bp,sp + sub sp,0x40 ; 64 bytes of local stack space + mov bx,[bp+4] ; first parameter to function + + ; some more code + + mov sp,bp ; undo "sub sp,0x40" above + pop bp + ret +\end{lstlisting} + +For a large-model function, you would replace \code{RET} by \code{RETF}, +and look for the first parameter at \code{[BP+6]} instead of +\code{[BP+4]}. Of course, if one of the parameters is a pointer, then +the offsets of \emph{subsequent} parameters will change depending on +the memory model as well: far pointers take up four bytes on the +stack when passed as a parameter, whereas near pointers take up two. + +At the other end of the process, to call a C function from your +assembly code, you would do something like this: + +\begin{lstlisting} +extern _printf + ; and then, further down... + + push word [myint] ; one of my integer variables + push word mystring ; pointer into my data segment + call _printf + add sp,byte 4 ; `byte' saves space + + ; then those data items... +segment _DATA + +myint dw 1234 +mystring db 'This number -> %d <- should be 1234',10,0 +\end{lstlisting} + +This piece of code is the small-model assembly equivalent of the C +code + +\begin{lstlisting} + int myint = 1234; + printf("This number -> %d <- should be 1234\n", myint); +\end{lstlisting} + +In large model, the function-call code might look more like this. In +this example, it is assumed that \code{DS} already holds the segment +base of the segment \code{\_DATA}. If not, you would have to initialize +it first. + +\begin{lstlisting} + push word [myint] + push word seg mystring ; Now push the segment, and... + push word mystring ; ... 
offset of "mystring" + call far _printf + add sp,byte 6 +\end{lstlisting} + +The integer value still takes up one word on the stack, since large +model does not affect the size of the \code{int} data type. The first +argument (pushed last) to \code{printf}, however, is a data pointer, +and therefore has to contain a segment and offset part. The segment +should be stored second in memory, and therefore must be pushed +first. (Of course, \code{PUSH DS} would have been a shorter instruction +than \code{PUSH WORD SEG mystring}, if \code{DS} was set up as the above +example assumed.) Then the actual call becomes a far call, since +functions expect far calls in large model; and \code{SP} has to be +increased by 6 rather than 4 afterwards to make up for the extra +word of parameters. + +\xsubsection{16cdata}{Accessing Data Items} + +To get at the contents of C variables, or to declare variables which +C can access, you need only declare the names as \code{GLOBAL} or +\code{EXTERN}. (Again, the names require leading underscores, as stated +in \nref{16cunder}.) Thus, a C variable declared as \code{int i} +can be accessed from assembler as + +\begin{lstlisting} +extern _i + + mov ax,[_i] +\end{lstlisting} + +And to declare your own integer variable which C programs can access +as \code{extern int j}, you do this (making sure you are assembling in +the \code{\_DATA} segment, if necessary): + +\begin{lstlisting} +global _j + +_j dw 0 +\end{lstlisting} + +To access a C array, you need to know the size of the components of +the array. For example, \code{int} variables are two bytes long, so if +a C program declares an array as \code{int a[10]}, you can access +\code{a[3]} by coding \code{mov ax,[\_a+6]}. (The byte offset 6 is obtained +by multiplying the desired array index, 3, by the size of the array +element, 2.) 
The sizes of the C base types in 16-bit compilers are: +1 for \code{char}, 2 for \code{short} and \code{int}, 4 for \code{long} +and \code{float}, and 8 for \code{double}. + +To access a C \textindex{data structure}, you need to know the offset from +the base of the structure to the field you are interested in. You +can either do this by converting the C structure definition into a +NASM structure definition (using \codeindex{STRUC}), or by calculating the +one offset and using just that. + +To do either of these, you should read your C compiler's manual to +find out how it organizes data structures. NASM gives no special +alignment to structure members in its own \code{STRUC} macro, so you +have to specify alignment yourself if the C compiler generates it. +Typically, you might find that a structure like + +\begin{lstlisting} +struct { + char c; + int i; +} foo; +\end{lstlisting} + +might be four bytes long rather than three, since the \code{int} field +would be aligned to a two-byte boundary. However, this sort of +feature tends to be a configurable option in the C compiler, either +using command-line options or \code{\#pragma} lines, so you have to find +out how your own compiler does it. + +\xsubsection{16cmacro}{\codeindex{c16.mac}: Helper Macros for the 16-bit C Interface} + +Included in the NASM archives, in the \index{misc subdirectory}\code{misc} +directory, is a file \code{c16.mac} of macros. It defines three macros: +\codeindex{proc}, \codeindex{arg} and \codeindex{endproc}. These are intended +to be used for C-style procedure definitions, and they automate a lot of +the work involved in keeping track of the calling convention. + +(An alternative, TASM compatible form of \code{arg} is also now built +into NASM's preprocessor. See \nref{stackrel} for details.) 
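
The bookkeeping that such macros take over is essentially a running offset from \code{BP}. Following the calling convention described in \nref{16cfunc}, the leftmost argument sits at \code{[BP+4]} in a near function and at \code{[BP+6]} in a far one, and each later argument follows at the next free offset. The C function below is a hypothetical model of that arithmetic, not the code of \code{c16.mac} itself.

\begin{lstlisting}
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of BP-relative argument offsets in a 16-bit C
   function.  Arguments are pushed right to left, so the leftmost one is
   nearest BP.  far_code is nonzero for far functions (medium, large and
   huge models).  sizes[i] is the stack size of argument i in bytes: 2 for
   an int or near pointer, 4 for a long or far pointer.  offsets[i]
   receives n such that the argument lives at [bp+n].  Returns the total
   size of the arguments, i.e. what the caller adds to SP afterwards. */
unsigned arg_offsets16(int far_code, const unsigned *sizes, size_t n,
                       unsigned *offsets)
{
    /* saved BP (2 bytes) + return address (2 bytes near, 4 bytes far) */
    unsigned off = far_code ? 6 : 4;
    unsigned total = 0;
    size_t i;

    assert(n == 0 || (sizes != NULL && offsets != NULL));
    for (i = 0; i < n; i++) {
        offsets[i] = off;
        off += sizes[i];
        total += sizes[i];
    }
    return total;
}
\end{lstlisting}

For two \code{int}-sized arguments in a near function it yields offsets 4 and 6; in a far function whose second argument is a far pointer it yields 6 and 8, so the segment word of that pointer lies at \code{[BP+10]}. The returned total is the amount the caller must remove afterwards, which is why the large-model \code{printf} call earlier ends with \code{add sp,byte 6}.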
+ +An example of an assembly function using the macro set is given +here: + +\begin{lstlisting} +proc _nearproc +%$i arg +%$j arg + mov ax,[bp + %$i] + mov bx,[bp + %$j] + add ax,[bx] +endproc +\end{lstlisting} + +This defines \code{\_nearproc} to be a procedure taking two arguments, +the first (\code{i}) an integer and the second (\code{j}) a pointer to an +integer. It returns \code{i + *j}. + +Note that the \code{arg} macro has an \code{EQU} as the first line of its +expansion, and since the label before the macro call gets prepended +to the first line of the expanded macro, the \code{EQU} works, defining +\code{\%\$i} to be an offset from \code{BP}. A context-local variable is +used, local to the context pushed by the \code{proc} macro and popped +by the \code{endproc} macro, so that the same argument name can be used +in later procedures. Of course, you don't \emph{have} to do that. + +The macro set produces code for near functions (tiny, small and +compact-model code) by default. You can have it generate far +functions (medium, large and huge-model code) by means of coding +\indexcode{FARCODE}\code{\%define FARCODE}. This changes the kind of +return instruction generated by \code{endproc}, and also changes the +starting point for the argument offsets. The macro set contains no +intrinsic dependency on whether data pointers are far or not. + +\code{arg} can take an optional parameter, giving the size of the +argument. If no size is given, 2 is assumed, since it is likely that +many function parameters will be of type \code{int}. + +The large-model equivalent of the above function would look like this: + +\begin{lstlisting} +%define FARCODE + +proc _farproc +%$i arg +%$j arg 4 + mov ax,[bp + %$i] + mov bx,[bp + %$j] + mov es,[bp + %$j + 2] + add ax,[es:bx] +endproc +\end{lstlisting} + +This makes use of the argument to the \code{arg} macro to define a +parameter of size 4, because \code{j} is now a far pointer.
When we +load from \code{j}, we must load a segment and an offset. + +\xsection{16bp}{Interfacing to \textindex{Borland Pascal} Programs} + +Interfacing to Borland Pascal programs is similar in concept to +interfacing to 16-bit C programs. The differences are: + +\begin{itemize} + \item{The leading underscore required for interfacing to C programs is + not required for Pascal.} + + \item{The memory model is always large: functions are far, data + pointers are far, and no data item can be more than 64K long. + (Actually, some functions are near, but only those functions that + are local to a Pascal unit and never called from outside it. All + assembly functions that Pascal calls, and all Pascal functions that + assembly routines are able to call, are far.) However, all static + data declared in a Pascal program goes into the default data + segment, which is the one whose segment address will be in \code{DS} + when control is passed to your assembly code. The only things that + do not live in the default data segment are local variables (they + live in the stack segment) and dynamically allocated variables. All + data \emph{pointers}, however, are far.} + + \item{The function calling convention is different - described below.} + + \item{Some data types, such as strings, are stored differently.} + + \item{There are restrictions on the segment names you are allowed to + use - Borland Pascal will ignore code or data declared in a segment + it doesn't like the name of. The restrictions are described below.} +\end{itemize} + +\xsubsection{16bpfunc}{The Pascal Calling Convention} + +\index{functions!Pascal calling convention}\index{Pascal calling +convention}The 16-bit Pascal calling convention is as follows. In +the following description, the words \emph{caller} and \emph{callee} are +used to denote the function doing the calling and the function which +gets called. 
+ +\begin{itemize} + \item{The caller pushes the function's parameters on the stack, one + after another, in normal order (left to right, so that the first + argument specified to the function is pushed first).} + + \item{The caller then executes a far \code{CALL} instruction to pass + control to the callee.} + + \item{The callee receives control, and typically (although this is not + actually necessary, in functions which do not need to access their + parameters) starts by saving the value of \code{SP} in \code{BP} so as to + be able to use \code{BP} as a base pointer to find its parameters on + the stack. However, the caller was probably doing this too, so part + of the calling convention states that \code{BP} must be preserved by + any function. Hence the callee, if it is going to set up \code{BP} as a + \textindex{frame pointer}, must push the previous value first.} + + \item{The callee may then access its parameters relative to \code{BP}. + The word at \code{[BP]} holds the previous value of \code{BP} as it was + pushed. The next word, at \code{[BP+2]}, holds the offset part of the + return address, and the next one at \code{[BP+4]} the segment part. The + parameters begin at \code{[BP+6]}. The rightmost parameter of the + function, since it was pushed last, is accessible at this offset + from \code{BP}; the others follow, at successively greater offsets.} + + \item{The callee may also wish to decrease \code{SP} further, so as to + allocate space on the stack for local variables, which will then be + accessible at negative offsets from \code{BP}.} + + \item{The callee, if it wishes to return a value to the caller, should + leave the value in \code{AL}, \code{AX} or \code{DX:AX} depending on + the size of the value. Floating-point results are returned in \code{ST0}. + Results of type \code{Real} (Borland's own custom floating-point data + type, not handled directly by the FPU) are returned in \code{DX:BX:AX}. 
+ To return a result of type \code{String}, the caller pushes a pointer + to a temporary string before pushing the parameters, and the callee + places the returned string value at that location. The pointer is + not a parameter, and should not be removed from the stack by the + \code{RETF} instruction.} + + \item{Once the callee has finished processing, it restores \code{SP} from + \code{BP} if it had allocated local stack space, then pops the previous + value of \code{BP}, and returns via \code{RETF}. It uses the form of + \code{RETF} with an immediate parameter, giving the number of bytes + taken up by the parameters on the stack. This causes the parameters + to be removed from the stack as a side effect of the return + instruction.} + + \item{When the caller regains control from the callee, the function + parameters have already been removed from the stack, so it needs to + do nothing further.} +\end{itemize} + +Thus, you would define a function in Pascal style, taking two +\code{Integer}-type parameters, in the following way: + +\begin{lstlisting} +global myfunc + +myfunc: + push bp + mov bp,sp + sub sp,0x40 ; 64 bytes of local stack space + mov ax,[bp+8] ; first parameter to function + mov bx,[bp+6] ; second parameter to function + + ; some more code + + mov sp,bp ; undo "sub sp,0x40" above + pop bp + retf 4 ; total size of params is 4 +\end{lstlisting} + +At the other end of the process, to call a Pascal function from your +assembly code, you would do something like this: + +\begin{lstlisting} +extern SomeFunc + ; and then, further down... + push word seg mystring ; Now push the segment, and... + push word mystring ; ... 
offset of "mystring" + push word [myint] ; one of my variables + call far SomeFunc +\end{lstlisting} + +This is equivalent to the Pascal code + +\begin{lstlisting} +procedure SomeFunc(String: PChar; Int: Integer); + SomeFunc(@mystring, myint); +\end{lstlisting} + +\xsubsection{16bpseg}{Borland Pascal Segment Name Restrictions} +\index{segment names!Borland Pascal} + +Since Borland Pascal's internal unit file format is completely +different from \code{OBJ}, it does only a very sketchy job of actually +reading and understanding the various information contained in a +real \code{OBJ} file when it links that in. Therefore an object file +intended to be linked to a Pascal program must obey a number of +restrictions: + +\begin{itemize} + \item{Procedures and functions must be in a segment whose name is + either \code{CODE}, \code{CSEG}, or something ending in + \code{\_TEXT}.} + + \item{Initialized data must be in a segment whose name is either + \code{CONST} or something ending in \code{\_DATA}.} + + \item{Uninitialized data must be in a segment whose name is either + \code{DATA}, \code{DSEG}, or something ending in \code{\_BSS}.} + + \item{Any other segments in the object file are completely ignored. + \code{GROUP} directives and segment attributes are also ignored.} +\end{itemize} + +\xsubsection{16bpmacro}{Using \codeindex{c16.mac} With Pascal Programs} + +The \code{c16.mac} macro package, described in \nref{16cmacro}, +can also be used to simplify writing functions to be called from Pascal +programs, if you code \indexcode{PASCAL}\code{\%define PASCAL}. This +definition ensures that functions are far (it implies \codeindex{FARCODE}), +and also causes procedure return instructions to be generated with +an operand. + +Defining \code{PASCAL} does not change the code which calculates the +argument offsets; you must declare your function's arguments in +reverse order.
For example: + +\begin{lstlisting} +%define PASCAL + +proc _pascalproc +%$j arg 4 +%$i arg + mov ax,[bp + %$i] + mov bx,[bp + %$j] + mov es,[bp + %$j + 2] + add ax,[es:bx] +endproc +\end{lstlisting} + +This defines the same routine, conceptually, as the example in +\nref{16cmacro}: it defines a function taking two arguments, +an integer and a pointer to an integer, which returns the sum of +the integer and the contents of the pointer. The only difference +between this code and the large-model C version is that \code{PASCAL} +is defined instead of \code{FARCODE}, and that the arguments are +declared in reverse order. diff --git a/doc/latex/src/32bit.tex b/doc/latex/src/32bit.tex new file mode 100644 index 00000000..47c27466 --- /dev/null +++ b/doc/latex/src/32bit.tex @@ -0,0 +1,539 @@ +% +% vim: ts=4 sw=4 et +% +\xchapter{32bit}{Writing 32-bit Code (Unix, Win32, DJGPP)} + +This chapter attempts to cover some of the common issues involved +when writing 32-bit code, to run under \textindex{Win32} or Unix, +or to be linked with C code generated by a Unix-style C compiler such as +\textindex{DJGPP}. It covers how to write assembly code to interface with +32-bit C routines, and how to write position-independent code for +shared libraries. + +Almost all 32-bit code, and in particular all code running under +\code{Win32}, \code{DJGPP} or any of the PC Unix variants, runs in +\index{flat memory model}\emph{flat} memory model. This means that +the segment registers and paging have already been set up to give +you the same 32-bit 4GB address space no matter what segment you +work relative to, and that you should ignore all segment registers +completely.
When writing flat-model application code, you never +need to use a segment override or modify any segment register, +and the code-section addresses you pass to \code{CALL} and +\code{JMP} live in the same address space as the data-section addresses +you access your variables by and the stack-section addresses you access +local variables and procedure parameters by. Every address is 32 bits +long and contains only an offset part. + +\xsection{32c}{Interfacing to 32-bit C Programs} + +A lot of the discussion in \nref{16c}, about interfacing to +16-bit C programs, still applies when working in 32 bits. The absence of +memory models or segmentation worries simplifies things a lot. + +\xsubsection{32cunder}{External Symbol Names} + +Most 32-bit C compilers share the convention used by 16-bit +compilers, that the names of all global symbols (functions or data) +they define are formed by prefixing an underscore to the name as it +appears in the C program. However, not all of them do: the \code{ELF} +specification states that C symbols do \emph{not} have a leading +underscore on their assembly-language names. + +The older Linux \code{a.out} C compiler, all \code{Win32} compilers, +\code{DJGPP}, and \code{NetBSD} and \code{FreeBSD}, all use the leading +underscore; for these compilers, the macros \code{cextern} and +\code{cglobal}, as given in \nref{16cunder}, will still work. +For \code{ELF}, though, the leading underscore should not be used. + +See also \nref{opt-pfix}. + +\xsubsection{32cfunc}{Function Definitions and Function Calls} + +\index{functions!C calling convention}The \textindex{C calling convention} +in 32-bit programs is as follows. In the following description, +the words \emph{caller} and \emph{callee} are used to denote +the function doing the calling and the function which gets called. 
+ +\begin{itemize} + \item{The caller pushes the function's parameters on the stack, one + after another, in reverse order (right to left, so that the first + argument specified to the function is pushed last).} + + \item{The caller then executes a near \code{CALL} instruction to pass + control to the callee.} + + \item{The callee receives control, and typically (although this + is not actually necessary, in functions which do not need to + access their parameters) starts by saving the value of \code{ESP} + in \code{EBP} so as to be able to use \code{EBP} as a base pointer + to find its parameters on the stack. However, the caller was + probably doing this too, so part of the calling convention states + that \code{EBP} must be preserved by any C function. Hence the + callee, if it is going to set up \code{EBP} as a \textindex{frame + pointer}, must push the previous value first.} + + \item{The callee may then access its parameters relative to \code{EBP}. + The doubleword at \code{[EBP]} holds the previous value of + \code{EBP} as it was pushed; the next doubleword, at \code{[EBP+4]}, + holds the return address, pushed implicitly by \code{CALL}. + The parameters start after that, at \code{[EBP+8]}. The leftmost + parameter of the function, since it was pushed last, is accessible + at this offset from \code{EBP}; the others follow, at successively + greater offsets. 
Thus, in a function such as \code{printf} which + takes a variable number of parameters, the pushing of the + parameters in reverse order means that the function knows where + to find its first parameter, which tells it the number and type + of the remaining ones.} + + \item{The callee may also wish to decrease \code{ESP} further, so as + to allocate space on the stack for local variables, which will + then be accessible at negative offsets from \code{EBP}.} + + \item{The callee, if it wishes to return a value to the caller, + should leave the value in \code{AL}, \code{AX} or \code{EAX} + depending on the size of the value. Floating-point results + are typically returned in \code{ST0}.} + + \item{Once the callee has finished processing, it restores + \code{ESP} from \code{EBP} if it had allocated local stack space, + then pops the previous value of \code{EBP}, and returns via + \code{RET} (equivalently, \code{RETN}).} + + \item{When the caller regains control from the callee, the function + parameters are still on the stack, so it typically adds an + immediate constant to \code{ESP} to remove them (instead of + executing a number of slow \code{POP} instructions). Thus, + if a function is accidentally called with the wrong number + of parameters due to a prototype mismatch, the stack will + still be returned to a sensible state since the caller, which + \emph{knows} how many parameters it pushed, does the + removing.} +\end{itemize} + +There is an alternative calling convention used by Win32 programs +for Windows API calls, and also for functions called \emph{by} the +Windows API such as window procedures: they follow what Microsoft +calls the \code{\_\_stdcall} convention. This is slightly closer to the +Pascal convention, in that the callee clears the stack by passing a +parameter to the \code{RET} instruction. However, the parameters are +still pushed in right-to-left order. 
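
The three conventions covered so far differ in two independent choices: the order in which the caller pushes the arguments, and whether the caller or the callee removes them. The following C sketch models just those two choices. The names \code{call\_conv} and \code{call\_sequence} are hypothetical, and details such as the hidden result pointer used for Pascal \code{String} results are deliberately left out.

\begin{lstlisting}
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the conventions described in this chapter and
   the previous one. */
enum call_conv { CONV_C, CONV_STDCALL, CONV_PASCAL };

/* Fill push_order[k] with the index (0 = leftmost) of the k-th argument
   the caller pushes, and return the immediate operand the callee passes
   to RET or RETF.  A result of 0 means a plain RET, after which the
   caller removes the arguments itself. */
unsigned call_sequence(enum call_conv conv, const unsigned *sizes, size_t n,
                       size_t *push_order)
{
    unsigned total = 0;
    size_t i;

    assert(n == 0 || (sizes != NULL && push_order != NULL));
    for (i = 0; i < n; i++) {
        total += sizes[i];
        /* Pascal pushes left to right; C and __stdcall push right to left */
        push_order[i] = (conv == CONV_PASCAL) ? i : n - 1 - i;
    }
    return (conv == CONV_C) ? 0 : total;
}
\end{lstlisting}

For a Pascal function with two \code{Integer} parameters the model returns 4, matching the \code{retf 4} of the Pascal example in \nref{16bpfunc}; for C it returns 0, because the caller adjusts the stack pointer itself.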
+ +Thus, you would define a function in C style in the following way: + +\begin{lstlisting} +global _myfunc + +_myfunc: + push ebp + mov ebp,esp + sub esp,0x40 ; 64 bytes of local stack space + mov ebx,[ebp+8] ; first parameter to function + + ; some more code + + leave ; mov esp,ebp / pop ebp + ret +\end{lstlisting} + +At the other end of the process, to call a C function from your +assembly code, you would do something like this: + +\begin{lstlisting} +extern _printf + + ; and then, further down... + + push dword [myint] ; one of my integer variables + push dword mystring ; pointer into my data segment + call _printf + add esp,byte 8 ; `byte' saves space + + ; then those data items... + +segment _DATA + +myint dd 1234 +mystring db 'This number -> %d <- should be 1234',10,0 +\end{lstlisting} + +This piece of code is the assembly equivalent of the C code + +\begin{lstlisting} + int myint = 1234; + printf("This number -> %d <- should be 1234\n", myint); +\end{lstlisting} + +\xsubsection{32cdata}{Accessing Data Items} + +To get at the contents of C variables, or to declare variables which +C can access, you need only declare the names as \code{GLOBAL} or +\code{EXTERN}. (Again, on targets that use them, the names require leading +underscores, as stated in \nref{32cunder}.) Thus, a C variable declared as \code{int i} +can be accessed from assembler as + +\begin{lstlisting} + extern _i + mov eax,[_i] +\end{lstlisting} + +And to declare your own integer variable which C programs can access +as \code{extern int j}, you do this (making sure you are assembling in +the \code{\_DATA} segment, if necessary): + +\begin{lstlisting} + global _j +_j dd 0 +\end{lstlisting} + +To access a C array, you need to know the size of the components of +the array. For example, \code{int} variables are four bytes long, so if +a C program declares an array as \code{int a[10]}, you can access +\code{a[3]} by coding \code{mov eax,[\_a+12]}.
(The byte offset 12 is +obtained by multiplying the desired array index, 3, by the size of +the array element, 4.) The sizes of the C base types in 32-bit compilers +are: 1 for \code{char}, 2 for \code{short}, 4 for \code{int}, \code{long} +and \code{float}, and 8 for \code{double}. Pointers, being 32-bit +addresses, are also 4 bytes long. + +To access a C \textindex{data structure}, you need to know the offset from +the base of the structure to the field you are interested in. You +can either do this by converting the C structure definition into a +NASM structure definition (using \code{STRUC}), or by calculating the +one offset and using just that. + +To do either of these, you should read your C compiler's manual to +find out how it organizes data structures. NASM gives no special +alignment to structure members in its own \codeindex{STRUC} macro, +so you have to specify alignment yourself if the C compiler generates it. +Typically, you might find that a structure like + +\begin{lstlisting} +struct { + char c; + int i; +} foo; +\end{lstlisting} + +might be eight bytes long rather than five, since the \code{int} field +would be aligned to a four-byte boundary. However, this sort of +feature is sometimes a configurable option in the C compiler, either +using command-line options or \code{\#pragma} lines, so you have to find +out how your own compiler does it. + +\xsubsection{32cmacro}{\codeindex{c32.mac}: Helper Macros for the 32-bit C Interface} + +Included in the NASM archives, in the \index{misc directory}\code{misc} +directory, is a file \code{c32.mac} of macros. It defines three macros: +\codeindex{proc}, \codeindex{arg} and \codeindex{endproc}. These are +intended to be used for C-style procedure definitions, and they automate +a lot of the work involved in keeping track of the calling convention. 
+ +An example of an assembly function using the macro set is given +here: + +\begin{lstlisting} +proc _proc32 +%$i arg +%$j arg + mov eax,[ebp + %$i] + mov ebx,[ebp + %$j] + add eax,[ebx] +endproc +\end{lstlisting} + +This defines \code{\_proc32} to be a procedure taking two arguments, the +first (\code{i}) an integer and the second (\code{j}) a pointer to an +integer. It returns \code{i + *j}. + +Note that the \code{arg} macro has an \code{EQU} as the first line of its +expansion, and since the label before the macro call gets prepended +to the first line of the expanded macro, the \code{EQU} works, defining +\code{\%\$i} to be an offset from \code{EBP}. A context-local variable is +used, local to the context pushed by the \code{proc} macro and popped +by the \code{endproc} macro, so that the same argument name can be used +in later procedures. Of course, you don't \emph{have} to do that. + +\code{arg} can take an optional parameter, giving the size of the +argument. If no size is given, 4 is assumed, since it is likely that +many function parameters will be of type \code{int} or pointers. + +\xsection{picdll}{Writing NetBSD/FreeBSD/OpenBSD and Linux/ELF Shared Libraries} +\index{Shared Libraries} + +\code{ELF} replaced the older \code{a.out} object file format under Linux +because it contains support for \textindex{position-independent code} +(\textindex{PIC}), which makes writing shared libraries much easier. NASM +supports the \code{ELF} position-independent code features, so you can +write Linux \code{ELF} shared libraries in NASM. + +\textindex{NetBSD}, and its close cousins \textindex{FreeBSD} and +\textindex{OpenBSD}, take a different approach by hacking PIC support +into the \code{a.out} format. NASM supports this as the \codeindex{aoutb} +output format, so you can write \textindex{BSD} shared libraries in +NASM too. 
+ +The operating system loads a PIC shared library by memory-mapping +the library file at an arbitrarily chosen point in the address space +of the running process. The contents of the library's code section +must therefore not depend on where it is loaded in memory. + +Therefore, you cannot get at your variables by writing code like +this: + +\begin{lstlisting} + mov eax,[myvar] ; WRONG +\end{lstlisting} + +Instead, the linker provides an area of memory called the +\textindex{global offset table}, or \textindex{GOT}; the GOT is situated +at a constant distance from your library's code, so if you can find out +where your library is loaded (which is typically done using a \code{CALL} +and \code{POP} combination), you can obtain the address of the GOT, and +you can then load the addresses of your variables out of linker-generated +entries in the GOT. + +The \emph{data} section of a PIC shared library does not have these +restrictions: since the data section is writable, it has to be +copied into memory anyway rather than just paged in from the library +file, so as long as it's being copied it can be relocated too. So +you can put ordinary types of relocation in the data section without +too much worry (but see \nref{picglobal} for a caveat). + +\xsubsection{picgot}{Obtaining the Address of the GOT} + +Each code module in your shared library should define the GOT as an +external symbol: + +\begin{lstlisting} +extern _GLOBAL_OFFSET_TABLE_ ; in ELF +extern __GLOBAL_OFFSET_TABLE_ ; in BSD a.out +\end{lstlisting} + +At the beginning of any function in your shared library which plans +to access your data or BSS sections, you must first calculate the +address of the GOT. 
This is typically done by writing the function +in this form: + +\begin{lstlisting} +func: + push ebp + mov ebp,esp + push ebx + call .get_GOT +.get_GOT: + pop ebx + add ebx,_GLOBAL_OFFSET_TABLE_+$$-.get_GOT wrt ..gotpc + + ; the function body comes here + + mov ebx,[ebp-4] + mov esp,ebp + pop ebp + ret +\end{lstlisting} + +(For BSD, again, the symbol \code{\_GLOBAL\_OFFSET\_TABLE\_} requires a +second leading underscore.) + +The first two instructions of this function are simply the standard C +prologue to set up a stack frame, and the last three instructions are the +standard C function epilogue. The third instruction and the fourth-to-last +instruction save and restore the \code{EBX} register, because PIC shared +libraries use this register to store the address of the GOT. + +The interesting bit is the \code{CALL} instruction and the following +two lines. The \code{CALL} and \code{POP} combination obtains the address +of the label \code{.get\_GOT}, without having to know in advance where +the library was loaded (since the \code{CALL} instruction is encoded +relative to the current position). The \code{ADD} instruction makes use +of one of the special PIC relocation types: \textindex{GOTPC relocation}. +With the \codeindex{WRT ..gotpc} qualifier specified, the symbol +referenced (here \code{\_GLOBAL\_OFFSET\_TABLE\_}, the special symbol +assigned to the GOT) is given as an offset from the beginning of the +section. (Actually, \code{ELF} encodes it as the offset from the operand +field of the \code{ADD} instruction, but NASM simplifies this +deliberately, so you do things the same way for both \code{ELF} and +\code{BSD}.) The expression then \emph{adds} the beginning of the +section (\code{\$\$}), turning that offset into an address, and subtracts +the value of \code{.get\_GOT}; the result is the distance from +\code{.get\_GOT} to the GOT. Since \code{EBX} already holds the run-time +address of \code{.get\_GOT}, by the time that instruction has finished, +\code{EBX} contains the address of the GOT.
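The arithmetic behind the \code{ADD} instruction can be checked with a small C simulation. This is only a sketch under stated assumptions: the section start, the address of \code{.get\_GOT} and the GOT location are hypothetical link-time values, the helper names \code{add\_immediate} and \code{ebx\_after\_get\_got} are invented for illustration, and it models NASM's simplified view of \code{WRT ..gotpc} (an offset from the start of the section) rather than the exact ELF encoding.

\begin{lstlisting}[language=C]
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical link-time layout of a small shared library (base 0).
   Real values are chosen by the assembler and linker. */
typedef uint32_t addr32;

static const addr32 SECTION_START = 0x1000; /* $$: start of the code section */
static const addr32 GET_GOT_LABEL = 0x1010; /* address of .get_GOT           */
static const addr32 GOT_ADDRESS   = 0x3000; /* _GLOBAL_OFFSET_TABLE_         */

/* Immediate operand of
       add ebx,_GLOBAL_OFFSET_TABLE_+$$-.get_GOT wrt ..gotpc
   using NASM's model: the GOTPC term is the GOT's offset from $$. */
static addr32 add_immediate(void)
{
    addr32 gotpc = GOT_ADDRESS - SECTION_START;   /* GOT relative to $$ */
    return gotpc + SECTION_START - GET_GOT_LABEL; /* = GOT - .get_GOT   */
}

/* Value of EBX after CALL/POP/ADD when the library is mapped at load_base. */
static addr32 ebx_after_get_got(addr32 load_base)
{
    addr32 ebx = load_base + GET_GOT_LABEL; /* POP EBX: run-time .get_GOT */
    ebx += add_immediate();                 /* ADD EBX,imm32              */
    return ebx;
}

int main(void)
{
    /* A few arbitrary, hypothetical load addresses. */
    const addr32 bases[] = { 0x08048000u, 0x40000000u, 0xb7f00000u };

    for (size_t k = 0; k < sizeof bases / sizeof bases[0]; k++) {
        addr32 ebx = ebx_after_get_got(bases[k]);
        assert(ebx == bases[k] + GOT_ADDRESS); /* run-time GOT address */
        printf("load base %#010lx -> EBX = %#010lx\n",
               (unsigned long)bases[k], (unsigned long)ebx);
    }
    return 0;
}
\end{lstlisting}

Whatever load base is chosen, the immediate operand stays the same (the distance from \code{.get\_GOT} to the GOT), which is why the same code works wherever the library is mapped.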
+ +If you didn't follow that, don't worry: it's never necessary to +obtain the address of the GOT by any other means, so you can put +those three instructions into a macro and safely ignore them: + +\begin{lstlisting} +%macro get_GOT 0 + call %%getgot +%%getgot: + pop ebx + add ebx,_GLOBAL_OFFSET_TABLE_+$$-%%getgot wrt ..gotpc +%endmacro +\end{lstlisting} + +\xsubsection{piclocal}{Finding Your Local Data Items} + +Having got the GOT, you can then use it to obtain the addresses of +your data items. Most variables will reside in the sections you have +declared; they can be accessed using the \index{GOTOFF relocation} +\code{..gotoff} special \indexcode{WRT ..gotoff}\code{WRT} type. The +way this works is like this: + +\begin{lstlisting} + lea eax,[ebx+myvar wrt ..gotoff] +\end{lstlisting} + +The expression \code{myvar wrt ..gotoff} is calculated, when the shared +library is linked, to be the offset to the local variable \code{myvar} +from the beginning of the GOT. Therefore, adding it to \code{EBX} as +above will place the real address of \code{myvar} in \code{EAX}. + +If you declare variables as \code{GLOBAL} without specifying a size for +them, they are shared between code modules in the library, but do +not get exported from the library to the program that loaded it. +They will still be in your ordinary data and BSS sections, so you +can access them in the same way as local variables, using the above +\code{..gotoff} mechanism. + +Note that due to a peculiarity of the way BSD \code{a.out} format +handles this relocation type, there must be at least one non-local +symbol in the same section as the address you're trying to access. + +\xsubsection{picextern}{Finding External and Common Data Items} + +If your library needs to get at an external variable (ext... [truncated message content] |