

File Date Author Commit
NBS 2014-07-15 gatewood gatewood [87ea3b] more typo fixes
NBS_compile_output 2014-06-29 gatewood gatewood [2acae7] update NBS test 199 output now that scanner det...
NBS_run_input 2014-04-04 gatewood gatewood [140de6] Correct NBS test 203 input/output test files
NBS_run_output 2014-07-15 gatewood gatewood [87ea3b] more typo fixes
dgay 2014-03-13 gatewood gatewood [12007d] update both versions of g_fmt() to use 15 digit...
tests 2014-07-17 gatewood gatewood [0b5d56] add some actual math problem examples
.hgtags 2014-07-17 gatewood gatewood [272cd1] This is bugfix release 1.3
BASICC 2014-06-27 gatewood gatewood [9f62ac] explicitely state we use att syntax and mneomic...
BASICC.1 2014-04-08 gatewood gatewood [7ccada] add man pages for the compile scripts
BASICCS 2014-06-27 gatewood gatewood [9f62ac] explicitely state we use att syntax and mneomic...
BASICCS.1 2014-04-08 gatewood gatewood [7ccada] add man pages for the compile scripts
BASICCW 2014-06-27 gatewood gatewood [9f62ac] explicitely state we use att syntax and mneomic...
BASICCW.1 2014-04-08 gatewood gatewood [7ccada] add man pages for the compile scripts
COPYING 2014-07-15 gatewood gatewood [330c56] convert tabs to spaces
ChangeLog 4 days ago gatewood gatewood [668ada] document latest changes
ECMA-55.TXT 2014-02-07 gatewood gatewood [444bfe] fix many spelling errors
ECMA55-slideshow.odp 2014-04-02 gatewood gatewood [1bd450] update slideshow with info about test harness a...
ECMA55-slideshow.pdf 2014-04-02 gatewood gatewood [1bd450] update slideshow with info about test harness a...
INSTALL 2014-04-08 gatewood gatewood [6956fc] add install support
Makefile 2014-07-17 gatewood gatewood [7b7eb6] tweak debug compilation settings
NEWS 2014-07-17 gatewood gatewood [0cffa2] bump version
README 4 days ago gatewood gatewood [ef843f] Document use of static analyzers
codegen.c 4 days ago gatewood gatewood [47e3d9] more const, fix wrong function names in error m...
codegen.h 4 days ago gatewood gatewood [a62a7d] add missing header file includes
datum.dot 2014-02-27 gatewood gatewood [474145] fix bug in INPUT FSM where when we saw a number...
dtoa5.s 2014-02-10 gatewood gatewood [aeb711] activate errno support for strtod()
dtoa5_normal.c 4 days ago gatewood gatewood [ba4cd1] use NULL, not 0, for pointers to please sparse
dtoa5_normal.h 2014-06-30 gatewood gatewood [97b0f3] add missing header guard macros
dumpregs.s 2014-02-08 gatewood gatewood [d2c8ca] FZ is really FTZ; fix display of that flag
ecma55.1 2014-04-04 gatewood gatewood [7f8edb] documentation update
g_fmt_BASIC.s 2014-04-04 gatewood gatewood [10cc46] update g_fmt() in g_fmt_BASIC.s to have a param...
g_fmt_BASIC_normal.c 2014-04-04 gatewood gatewood [a0885f] update g_fmt() in g_fmt_BASIC_normal.[ch] to ha...
g_fmt_BASIC_normal.h 2014-06-30 gatewood gatewood [97b0f3] add missing header guard macros
globals.c 4 days ago gatewood gatewood [24a1e2] more static, const, and cleanups
globals.h 4 days ago gatewood gatewood [24a1e2] more static, const, and cleanups
lineno.c 4 days ago gatewood gatewood [d98ab5] cleanups for self-test code
lineno.h 2014-06-29 gatewood gatewood [8ed03e] only use 64bit integers when required
main.c 4 days ago gatewood gatewood [24a1e2] more static, const, and cleanups
parseinput.c 4 days ago gatewood gatewood [bf1cb2] use const and static where it makes sense
parseinput.txt 2014-04-04 gatewood gatewood [7f8edb] documentation update
parser2.c 4 days ago gatewood gatewood [24a1e2] more static, const, and cleanups
parser2.h 4 days ago gatewood gatewood [a62a7d] add missing header file includes
peephole.1 2014-04-08 gatewood gatewood [7f2411] Update peephole optimizer to remove sequences o...
peephole.c 4 days ago gatewood gatewood [66e027] remove bogus comment
robert1.c 4 days ago gatewood gatewood [fadef7] use static more to please sparse
run_tests 2014-04-08 gatewood gatewood [962a7a] add ECMA55 and PEEPHOLE environment support
scanner2.c 4 days ago gatewood gatewood [24a1e2] more static, const, and cleanups
scanner2.h 5 days ago gatewood gatewood [6d528c] use const more, add comments
stackmath.txt 2014-03-13 gatewood gatewood [285cfc] document the atack-based expression evaluation ...
structure.dot 2014-03-12 gatewood gatewood [f9e1c7] add diagram for overall structure of compiler
symbol_table.c 4 days ago gatewood gatewood [15c265] change parameter name in binary_search() to avo...
symbol_table.h 4 days ago gatewood gatewood [a62a7d] add missing header file includes
zonermore.c 4 days ago gatewood gatewood [fadef7] use static more to please sparse
zonermore.txt 2014-04-04 gatewood gatewood [7f8edb] documentation update

Read Me

[This file will only look correct if you use a fixed-width font]

This software is a compiler for 'Minimal BASIC' as specified by the ECMA-55
standard.  The target is AMD64/EM64T/x86-64 machines running a modern Linux
distribution.  This compiler will create assembly language output files.
These must be assembled into object files and linked to create an executable.
The assembly dialect used is that of GNU gas, since that will be available on
any modern general purpose x86-64 Linux distribution.  No libc or libm is used 
by the generated code, which allows creating very small executables.  To keep
the generated code small and simple, output of SIN, COS, TAN, ATAN, EXP, POW, 
LOG, RND, and RANDOMIZE is only emitted if those features are required by the
input BASIC program.

After completing this project, I did find one other FOSS compiler that claims
to be able to handle much of ANSI Full BASIC, the BASIC Accelerator at
http://hp.vector.co.jp/authors/VA008683/english/BASICAcc.htm, but the output is
Object Pascal for the FreePascal compiler at http://www.freepascal.org/, and
not assembly.  Also, they implement only what ECMA-116 calls OPTION ARITHMETIC
NATIVE mode, which is essentially the same mode implemented in this compiler.
The same developers have created an interpreter called Decimal BASIC at
http://hp.vector.co.jp/authors/VA008683/english/ that does attempt to support
the required decimal arithmetic.  Strangely, these projects did not turn up in
normal web searches, but only when I searched for "BASIC-1 OPTION ARITHMETIC
DECIMAL".

The license for this compiler is the GNU General Public License version 2 only,
see the included file COPYING for details.

The author of the actual compiler software is John Gatewood Ham,
and that code is available under the GPL version 2 license only.

The following NBS tests were kindly supplied by Emmanuel Roche:
56, 57, 65, 66, 67, 68, 69, 109, 117, 118, 119, 120, 121, 122, 123, 124
The rest came from the Google Books PDF files available on the Internet.

The included runtime library assembly routines for SIN, COS, TAN, ATAN, LOG,
EXP, and POW are public domain from SLEEF-2.80 (tweaked), from Naoki Shibata.
http://shibatch.sourceforge.net/

The included runtime library assembly routines for RND and RANDOMIZE are
public domain from ISAAC-64 (tweaked), from Bob Jenkins.
http://burtleburtle.net/bob/rand/isaacafa.html

The included runtime library assembly routines for floating point input and
output are derived from David M. Gay's dtoa.c and g_fmt.c.
http://netlib.sandia.gov/fp/index.html

I wrote a special file dumpregs.s for dumping CPU registers while debugging,
and unlike the main compiler, this one file is public domain.  The compiler
does not use it, but I used it when debugging programs and include it for
other people who might work at the assembly-language level.

The ECMA-55 standard was chosen over the "ANSI X3.60-1978 minimal BASIC"
standard since it is free.  ANSI, despite canceling the standard, still
keeps the 35 year old standard locked down and available only if you pay
for it, which is a quite mean-spirited attempt to prevent any compliant
free and open source implementations from being written.  The same attitude
exists with ISO for the "ISO 6373:1984 Data processing -- Programming 
languages -- Minimal BASIC" standard.  This standard has many other names,
such as "AS 2797-1985 Programming language - Minimal BASIC", and the only
free one is ECMA-55, since all the other standards bodies are trying to
kill BASIC forever.

http://www.ecma-international.org/publications/files/ECMA-ST-WITHDRAWN/

FILES

COPYING

   This contains a copy of the GNU GPL version 2 license for the
   compiler itself.

ChangeLog

   This contains a high-level overview of changes sorted by time in
   ascending date order with the newest changes at the end of the file.

globals.[ch]

   This contains global variables that must be shared across all modules.

scanner2.[ch]

   This is the scanner that converts the input byte stream into tokens
   for the parser.  This uses a hand-coded finite state machine.

parser2.[ch]

   This contains the parser that uses the token stream created by the
   scanner.

lineno.[ch]

   This contains the line number module.

symbol_table.[ch]

   This contains the symbol table module.

codegen.[ch]

   This contains routines that emit the GAS assembler output.  It also
   contains the runtime functions and macros.  Much of the runtime library 
   code is from other people and is not GPLv2 like the compiler itself.

main.c

   This contains the main routine that calls everything else.  It does
   the command-line argument processing, loads the input file into a
   buffer, calls the scanner to convert that into a token stream, then
   calls the parser to process the token stream.

dtoa5.s

   This contains the assembly code for David M. Gay's dtoa.c file.
   The process to generate this is in the magic.txt file in the dgay
   sub-directory.  This is used as part of a compiled BASIC program's
   runtime.  A tweaked copy of this is included in the codegen.c file.

g_fmt_BASIC.s

   This contains the assembly code for my tweaked version of 
   David M. Gay's g_fmt.c file.  The process to generate this is in 
   the magic.txt file in the dgay sub-directory.  This is used as part
   of a compiled BASIC program's runtime.  A tweaked copy of this is
   included in the codegen.c file.

dtoa5_normal.[ch]

   This contains the C code for my tweaked version of David M. Gay's
   dtoa.c file.  This is used by the compiler to ensure it formats
   numbers in exactly the same format as the runtime.

g_fmt_BASIC_normal.[ch]

   This contains the C code for my tweaked version of David M. Gay's
   g_fmt.c file.  This is used by the compiler to ensure it formats
   numbers in exactly the same format as the runtime.

Makefile

   This is the project build file for use by the make program.

ecma55.1

   This is the man page for the compiler.

peephole.1

   This is the man page for the optimizer.

ECMA-55.TXT

   This file contains the text of the ECMA-55 standard for 
   "Minimal BASIC".  This was retyped by me from the PDF version
   both to get a smaller file and to allow easy searching.

BASICC

   This file is a script that will compile, assemble, and link
   an input program.  Note that the input program must have the
   extension '.BAS' for this script to work.

BASICCS

   This file is a script that will compile, assemble, and link
   an input program.  Note that the input program must have the
   extension '.BAS' for this script to work.  This version tells
   the compiler to generate 32bit math for the arithmetic 
   expressions, generating output more closely matching the NBS 
   Minimal BASIC test suite expectations.

dumpregs.s

   This is an assembler source file you can build and link in to an
   executable.  It contains 'dumpregs', a procedure that takes no
   arguments and returns no values but does dump the registers used
   for normal programming for this project, including the xmm registers,
   eflags, and mxcsr flags.  It does not dump the FP registers or state
   since this project uses SSE exclusively for floating point math.
   Unlike the main compiler, this file I wrote is in the public domain.
   I hope anybody who needs to code in assembler in 64bit on AMD64/EM64T
   in Linux will find it useful.
  

datum.dot

   This is the graphviz dot source file for the diagram of the finite 
   state machine used by the INPUT runtime subsystem.

parseinput.c

   This is the C source code for the INPUT runtime subsystem. Compile
   with -DTROUBLE to get a trace of the states as the transitions occur.

zonermore.c

   This is the C source code for the PRINT runtime subsystem.

robert1.c

   This is the C source code for the RND function and RANDOMIZE 
   statements.  Unlike the compiler itself, this file is in the public 
   domain and is derived from Bob Jenkins's ISAAC-64.
   http://burtleburtle.net/bob/rand/isaacafa.html

peephole.c

   This is the very simple stand-alone peephole optimizer.  It reads
   the assembly language file generated by the compiler and generates
   a new assembly language file.  It removes any superfluous
   'pushxmm 0'/'popxmm 0' sequences.  It also removes any superfluous
   'pushsaddr'/'popsaddr %rdi' sequences.

ECMA55-slideshow.odp

   This is a slideshow generated with LibreOffice.  It gives a good
   overview of this compiler project, including the motivation, the
   overall structure, and suggestions for future work.

ECMA55-slideshow.pdf

   PDF/1A version of ECMA55-slideshow.odp.  This includes the fonts
   and should display identically on all machines with a graphical
   PDF file viewer.

run_tests

   This is the bash shell script that automatically runs the NBS tests.
   It is called from the Makefile when you do 'make check' or 
   'make check32'.  This should be run before any check-in of code
   to ensure that no regressions occur.
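
The peephole.c entry above describes removing superfluous push/pop
sequences.  The following is only a minimal sketch of that idea in C, not
the code in peephole.c.  It assumes that "superfluous" means a push line
immediately followed by the matching pop line, and that each assembly line
is already available as a separate string; the function names line_is() and
drop_push_pop_pairs() are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Return nonzero if 'line', ignoring leading and trailing whitespace,
   is exactly 'text'.  Hypothetical helper for this sketch. */
static int line_is(const char *line, const char *text)
{
    size_t n = strlen(text);

    assert(line != NULL);
    while (isspace((unsigned char)*line))
        line++;
    if (strncmp(line, text, n) != 0)
        return 0;
    line += n;
    while (isspace((unsigned char)*line))
        line++;
    return *line == '\0';
}

/* Compact lines[] in place, dropping every adjacent push/pop pair
   (assumption: adjacency is what makes a pair superfluous).
   Returns the number of lines kept. */
size_t drop_push_pop_pairs(const char *lines[], size_t count)
{
    size_t out = 0;

    for (size_t i = 0; i < count; i++) {
        if (i + 1 < count &&
            ((line_is(lines[i], "pushxmm 0") &&
              line_is(lines[i + 1], "popxmm 0")) ||
             (line_is(lines[i], "pushsaddr") &&
              line_is(lines[i + 1], "popsaddr %rdi")))) {
            i++;            /* skip the pop as well as the push */
            continue;
        }
        lines[out++] = lines[i];
    }
    return out;
}
```

A single pass like this does not re-examine lines that become adjacent
after a pair is removed; a caller could repeat the pass until the count
stops shrinking.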

GETTING THE CODE

If you are reading this you should have a copy of the code from a snapshot or
release tar.xz file.  Between snapshots some changes may exist only in the
upstream mercurial repository.  If you want to get the absolutely latest
version from the upstream mercurial repository, you need to do a clone
operation like this:

   hg clone http://hg.code.sf.net/p/buraphakit/MinimalBASIC/

This creates the 'MinimalBASIC' subdirectory which has the code and a local
copy of the upstream repository.  After the initial clone, assuming you didn't
modify anything, you can easily use mercurial to stay up-to-date with a
'make distclean', followed by a 'hg pull', followed by a 'hg update', followed 
by a 'make check'.  You cannot push changes upstream directly with mercurial.  
If you want to contribute a fix or improvement, please generate a patch with
'hg diff' and submit it on the SourceForge site (Support->Patches).

The SourceForge site for this project has this URL:

   http://sourceforge.net/projects/buraphakit/

Information on obtaining and using the mercurial version control software
is available from the mercurial web site which has this URL:

   http://mercurial.selenic.com/

REPORTING BUGS

If you found a bug but do not know how to fix it, please submit a bug report on
the SourceForge site (Support->Bugs).  If you know what assembly should be
generated but do not know how to modify the compiler to make that happen,
please include the .BAS program file and the assembly that should have been
generated in the bug report.  If you just found a problem but have no idea how
to fix it, please include the .BAS program file in the bug report, and explain
what you think should have happened.

EXAMPLE SESSION:

1.  CREATE SOURCE

    vi WHATEVER.BAS

2.  COMPILE IT

    ./BASICC WHATEVER.BAS

3.  RUN IT

    ./WHATEVER

You can optionally strip the executable with the 'strip' command and it
will still work, but be slightly smaller.

NOTES:

Should you want to do a leak check after modifying the code, here is
what you would do:

   make distclean all COMPILE_MODE=DEBUG
   valgrind --leak-check=full --show-leak-kinds=all --redzone-size=128 \
            --read-var-info=yes --leak-resolution=high --track-origins=yes \
            --malloc-fill=FF --free-fill=AA --num-callers=40 ./ecma55 -v BAD.BAS

The clang compiler generates code that results in valgrind giving a horrible 
tombstone at the start talking about a DIE it cannot parse, so use gcc for this.
The makefile takes care of that for you.

If you want to check for array out-of-bounds problems, you should use the address
sanitizer in clang.  You need to rebuild like this:

   make distclean all COMPILE_MODE=ASAN

Then to test the compiler on some file BOGUS.BAS, you would do this:

   export ASAN_SYMBOLIZER_PATH=/usr/bin/llvm-symbolizer
   ASAN_OPTIONS=symbolize=1 ./BASICC BOGUS.BAS

While gcc has address sanitizer support, as of gcc-4.8.2 it has nothing
equivalent to llvm-symbolizer, which makes it difficult to get line numbers
and symbol names.  Use clang for this; the makefile takes care of that for you.

To rebuild a production version with gcc, just do:

   make distclean all

Alternatively, to build a production version with clang, just do:

   make distclean all CC=clang

It is possible to compile with 'pcc' as well.  However, pcc (at least as
packaged by Fedora 20) will not build static executables, so you need to do
this:

   make distclean all LDFLAGS= CC=pcc

You can perform a static analysis with the clang static analyzer like this:

   make distclean
   scan-build make CC=clang

If problems are found, the analyzer program output tells you how to read
the test results.

If you don't have the clang static analyzer available, you can run the cppcheck
static analyzer like this:

   make distclean
   cppcheck *.[ch]

You can try the sparse static analyzer like this:

   sparse -gcc-base-dir /usr/lib/gcc/x86_64-redhat-linux/4.8.3/ -I. \
      -D__STDC_VERSION__=199901L \
      codegen.[ch] globals.[ch] lineno.[ch] \
      main.c parser2.[ch] scanner2.[ch] symbol_table.[ch] \
      g_fmt_BASIC_normal.[ch]

The sparse analyzer cannot handle dtoa5_normal.[ch] without extreme
effort.  Here is the trick that worked for me:

   sparse -gcc-base-dir /usr/lib/gcc/x86_64-redhat-linux/4.8.3/ -I. \
      -D__STDC_VERSION__=199901L \
      -D__DBL_MAX_EXP__=1024 \
      -D__DBL_DIG__=15 \
      -D__DBL_MAX_10_EXP__=308 \
      -D__FLT_RADIX__=2 \
      -DIEEE_8087 \
      -DLong=int \
      -DPRIVATE_MEM=16384 \
      dtoa5_normal.[ch]

This is not really sparse's fault; the __WHATEVER__ stuff is not in
any header file, but instead is hard-coded into the gcc compiler, so
sparse could not get the definitions itself.  The other defines are
required just by those files.

IMPLEMENTATION-DEFINED FEATURES

  ACCURACY is about 15 digits of precision 
    IEEE754 double, as implemented by Intel CPU.
    With -s switch, about 7 digits of precision
      IEEE754 single, as implemented by Intel CPU.
  END OF LINE = ASCII value 10
  SIGNIFICANCE-WIDTH = 7
    With -w switch, 16
  EXRAD-WIDTH = 2
    With -w switch, 3
  INITIAL VALUE OF VARIABLES
    numeric variables are initialized to SNaN (signalling Not-A-Number) and will 
      force an exception if they are read before they are written.
    string variables are initialized to an ASCII 21 byte, followed by "uninitialized",
      and then 4 ASCII 0 bytes.  The 21 will force an exception if they are read
      before they are written.
  INPUT-PROMPT = "? "
  LONGEST STRING THAT CAN BE RETAINED = 18
  VALUE OF MACHINE INFINITESIMAL = 2^-1074 (denormal), 2^-1022 (normal)
    With -s switch,
      2^-149 (denormal), 2^-126 (normal)
  VALUE OF MACHINE INFINITY = +/- Infinity (Intel CPU has special values for this)
  MARGIN = 73
  PRECISION is 15 digits of precision 
    IEEE754 double, as implemented by Intel CPU.
    With -s switch, about 7 digits of precision
      IEEE754 single, as implemented by Intel CPU.
  PRINT ZONE WIDTH = 15
    With -w switch, 25
  PSEUDO-RANDOM NUMBER SEQUENCE is from ISAAC-64, see robert1.c for details.
  SIGNIFICANCE WIDTH FOR PRINTING NUMERIC REPRESENTATIONS = 15
    With -s switch, 6
  BATCH MODE INPUT uses standard UNIX redirection of STDIN
  OUTPUT WIDTH = 80 columns
    With -w switch, 132 columns
  MAXIMUM ARRAY SUBSCRIPT VALUE = 10000

  NOTE:
    The implementation-defined numeric functions use doubles, not singles, in
    their internal representation, even with the -s switch.
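
The machine infinitesimal values listed above are the smallest positive
IEEE754 values, which are powers of two: 2^-1074 and 2^-1022 for doubles,
2^-149 and 2^-126 for singles.  As a small illustration only (this is not
code from the compiler, and the function names are hypothetical), they can
be written in C99 with hexadecimal floating constants and compared with the
limits in <float.h>:

```c
#include <assert.h>
#include <float.h>

/* Smallest positive denormal and normal IEEE754 doubles (default mode). */
double min_denormal_double(void) { return 0x1p-1074; }
double min_normal_double(void)   { return 0x1p-1022; }

/* The same two limits for IEEE754 singles (the -s switch). */
float min_denormal_single(void) { return 0x1p-149f; }
float min_normal_single(void)   { return 0x1p-126f; }

/* Cross-check the normal limits against the standard header. */
void check_limits(void)
{
    assert(min_normal_double() == DBL_MIN);
    assert(min_normal_single() == FLT_MIN);
}
```

In decimal these are roughly 4.9E-324 and 2.2E-308 for doubles, and
1.4E-45 and 1.2E-38 for singles.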

DOCUMENTED BEHAVIOR

1.  Attempts to use the value of uninitialized variables will result in
    a fatal exception 'Read of uninitialized variable'.
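
    To illustrate what an SNaN initial value looks like, here is a minimal
    sketch in C.  It is not taken from the compiler or its runtime: the
    specific bit pattern, the function names, and the explicit software test
    are all assumptions made for this example, whereas the generated code
    presumably relies on the CPU raising the exception.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EXP_MASK  UINT64_C(0x7FF0000000000000)  /* exponent bits        */
#define FRAC_MASK UINT64_C(0x000FFFFFFFFFFFFF)  /* fraction bits        */
#define QUIET_BIT UINT64_C(0x0008000000000000)  /* top bit of fraction  */

/* One possible SNaN pattern (an assumption, not the compiler's choice). */
#define SNAN_BITS UINT64_C(0x7FF0000000000001)

/* A double is a signalling NaN when the exponent is all ones, the
   fraction is non-zero, and the quiet bit is clear. */
int is_signaling_nan_bits(uint64_t bits)
{
    return (bits & EXP_MASK) == EXP_MASK
        && (bits & FRAC_MASK) != 0
        && (bits & QUIET_BIT) == 0;
}

/* Store the SNaN pattern without doing any floating point operation. */
void mark_uninitialized(double *slot)
{
    const uint64_t bits = SNAN_BITS;

    assert(slot != NULL);
    memcpy(slot, &bits, sizeof bits);
}

/* Inspect the raw bits of a variable, again without FP operations. */
int slot_is_uninitialized(const double *slot)
{
    uint64_t bits;

    assert(slot != NULL);
    memcpy(&bits, slot, sizeof bits);
    return is_signaling_nan_bits(bits);
}
```

    Copying the bits with memcpy() keeps the value out of floating point
    instructions, which could otherwise quiet the NaN or raise the exception
    early.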

DOCUMENTED EXTENSIONS

1.  Lower-case letters are permitted within a quoted string.
2.  Lower-case letters are permitted in a REM statement after the REM keyword.

OBTAINING A WORKING C COMPILER

Most testing has been done with Fedora 20 64bit on a Core 2 Duo machine.
Occasional testing is done on a Linux-from-scratch descended 64bit machine
with a Haswell i7.  The code has been regularly tested with gcc and clang.
As of June 30, 2014, it is also regularly tested with pcc.

     +---------+--------------------------------+
     |compiler |   URL to sources               |
     +---------+--------------------------------+
     |clang    |   http://llvm.org/             |
     |gcc      |   ftp://ftp.gnu.org/gnu/gcc/   |
     |pcc      |   http://pcc.ludd.ltu.se/      |
     +---------+--------------------------------+