
ENIGMA_compiler

Hugh Greene

ENIGMA's compiler takes EDL and compiles it to
C++.

Process

Before the process begins, the compiler is already aware of the target
platform, the necessary make calls, and the variables, functions, and
other important definitions from the C++ engine. The process begins with
this information tucked away in global memory.

First and foremost, the compiler collects the resource names and makes
declarations for them, adding them to a new virtual namespace allocated
for this compile. This step includes some minor code generation for
instances. From there, it begins lexing all the code and performs some
parse operations such as adding semicolons (see the page on Parser for
details on the lexer and preliminary parsing), and then takes note of
all the types that are declared locally in each object and script. At
this point, the compiler has a structure for each event in each object
and for each script, containing the code, the lex string, and a list of
the variables it declares for each scope: globally, instance-locally,
and via dot-access.
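The per-event and per-script structure described above can be pictured roughly as follows. This is a minimal sketch, not ENIGMA's actual data layout: the names ParsedCode, ObjectInfo, and declared_locals are hypothetical, and the lex string is assumed to hold one token-class character per token.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical record for one event or script after lexing and preliminary
// parsing; field names are illustrative, not ENIGMA's identifiers.
struct ParsedCode {
  std::string code;                  // source after preliminary parsing (e.g. semicolons added)
  std::string lex;                   // lex string (assumed: one token-class character per token)
  std::set<std::string> globals;     // names this code declares globally
  std::set<std::string> locals;      // names this code declares instance-locally
  std::set<std::string> dot_access;  // names this code declares via dot-access (a.b)
};

// Hypothetical per-object record: one ParsedCode per event the object defines.
struct ObjectInfo {
  std::string name;
  std::vector<ParsedCode> events;
};

// Union of all instance-local declarations across an object's events.
std::set<std::string> declared_locals(const ObjectInfo& obj) {
  std::set<std::string> out;
  for (const ParsedCode& ev : obj.events)
    out.insert(ev.locals.begin(), ev.locals.end());
  return out;
}
```

Keeping each scope in its own set makes the later passes simple set unions rather than re-scans of the code.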

From there, it looks at which objects make what calls to what scripts,
and which scripts call what scripts, in a complex resolution pass that
results in a list of every script that could possibly be invoked by an
object. Using that resolved list, the compiler scopes the scripts into
the appropriate objects and then starts at the bottom and works its way
up, gathering variables used by any script or event. The result of this
pass is a comprehensive list of both the scripts invoked by and the
variables used in each object.
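The resolution pass amounts to computing, for each object, the transitive closure of the script call graph. A minimal sketch, assuming the direct calls have already been extracted into a map from each script to the scripts it calls; CallGraph and resolve_scripts are hypothetical names, not ENIGMA's code.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stack>
#include <string>

// Hypothetical: script name -> scripts it calls directly.
using CallGraph = std::map<std::string, std::set<std::string>>;

// Every script an object could possibly invoke: its direct calls plus
// everything reachable from them through script-to-script calls.
std::set<std::string> resolve_scripts(const std::set<std::string>& direct_calls,
                                      const CallGraph& script_calls) {
  std::set<std::string> reached;
  std::stack<std::string> pending;
  for (const std::string& s : direct_calls) pending.push(s);
  while (!pending.empty()) {
    std::string s = pending.top();
    pending.pop();
    if (!reached.insert(s).second) continue;  // already visited; also stops recursion cycles
    auto it = script_calls.find(s);
    if (it == script_calls.end()) continue;   // script calls no other scripts
    for (const std::string& callee : it->second) pending.push(callee);
  }
  return reached;
}
```

The visited-set check matters because scripts may call each other recursively; without it the walk would never terminate.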

Using the list of used variables and scripts for each object, the
compiler can make choices on where to scope scripts and variables, be it
solely at the global scope (using with() where necessary), at the
parent-object scope (where all objects will inherit it), or at the
individual object scope.
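The page does not spell out the exact rule used to pick among these three scopes. The following is a hypothetical heuristic, offered only to illustrate the shape of the decision: ScriptScope and choose_scope are invented names, and the thresholds are assumptions rather than ENIGMA's documented behavior.

```cpp
#include <cassert>
#include <cstddef>

enum class ScriptScope { Global, Parent, Individual };

// Hypothetical heuristic (an assumption, not ENIGMA's documented rule):
//  - if a "global scripts" option is set, or no object calls the script,
//    keep it at global scope (invoked through with() where needed);
//  - if every child of a parent uses it, scope it into the parent so all
//    children inherit it;
//  - otherwise give each object that uses it its own copy.
ScriptScope choose_scope(std::size_t users, std::size_t siblings, bool prefer_global) {
  if (prefer_global || users == 0) return ScriptScope::Global;
  if (users == siblings) return ScriptScope::Parent;
  return ScriptScope::Individual;
}
```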

From there, the compiler conducts a second pass, using its newly
gathered information to resolve access routines and other heavily
context-dependent mechanisms of EDL, many of which involve heavy code
generation. Dot-based access of the form a.b, where a is an integer,
resolves to either enigma::glaccess(a)->b in the case of shared or
"global" locals, or enigma::varaccess_b(a) for strict locals. It is up
to the compiler to generate these functions.

  • First, the compiler must isolate a type that will represent b. It
    does this by crawling objects to find any that declare it
    explicitly.
    • If all objects agree on one type, it writes an access routine
      for it and allocates a dummy to be returned to prevent a segfault.
    • If they do not agree, it complains and writes the routine anyway.
  • The accessor function switch()es on the object index. It then makes
    a case for each index, returning that object's definition of b.
  • The default case returns the anti-segfault dummy for that type; it
    is declared exactly once, in the form static someType dummy_someType;
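To make the steps above concrete, here is a rough sketch of what a generated strict-local accessor for a.hp might look like. Everything apart from the varaccess_ prefix and the dummy naming pattern is an assumption for illustration: object_basic, OBJ_player, OBJ_wall, and the std::map instance registry are hypothetical stand-ins for the engine's real structures.

```cpp
#include <cassert>
#include <map>

namespace enigma {
// Hypothetical generated base class; each instance records its object index.
struct object_basic {
  int object_index;
  explicit object_basic(int idx) : object_index(idx) {}
  virtual ~object_basic() {}
};

// Two hypothetical generated objects: only obj_player (index 0) declares hp.
struct OBJ_player : object_basic {
  double hp = 100;
  OBJ_player() : object_basic(0) {}
};
struct OBJ_wall : object_basic {
  OBJ_wall() : object_basic(1) {}
};

// Hypothetical stand-in for the engine's instance registry (id -> instance).
std::map<int, object_basic*> instance_table;

// Anti-segfault dummy for type double, declared exactly once.
static double dummy_double;

// Generated accessor for a.hp, where a is an integer instance id.
double& varaccess_hp(int id) {
  auto it = instance_table.find(id);
  if (it == instance_table.end()) return dummy_double;  // no such instance
  switch (it->second->object_index) {
    case 0: return static_cast<OBJ_player*>(it->second)->hp;
    default: return dummy_double;  // this object does not declare hp
  }
}
}  // namespace enigma
```

Because the accessor returns a reference, it works for both reads and writes; writes to an instance whose object lacks the variable land harmlessly in the dummy instead of crashing.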

After that, the hard work is basically done on ENIGMA's part; it uses
the lex buffer to dump the code buffer into the specific files under
ENIGMAsystem/SHELL/Preprocessor_Environment_Editable/
so it looks nice, meanwhile adding the strings and other collapsed
sections back in. At this point, the game has been compiled to C++, and
it is just a matter of invoking GCC on the produced code. Native
compiler invocation is done through Make; when that process finishes,
the game is officially natively compiled.

From there, the compiler simply tacks resource data onto the end of the
executable, or wherever requested by
Compilers/*/compiler.ey.
If requested, the compiler will then invoke the game.
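Appending data to an executable only works if the game can find it again at startup. One common approach is a fixed-size trailer written after the data. The layout below (payload, 8-byte little-endian size, then the magic bytes "RES0") is entirely hypothetical and is not ENIGMA's actual resource format; the sketch operates on in-memory byte buffers rather than real files.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<unsigned char>;

// Hypothetical trailer layout: [resources][8-byte LE size]["RES0"].
void append_resources(Bytes& exe, const Bytes& res) {
  exe.insert(exe.end(), res.begin(), res.end());
  std::uint64_t n = res.size();
  for (int i = 0; i < 8; ++i) exe.push_back(static_cast<unsigned char>(n >> (8 * i)));
  const char magic[4] = {'R', 'E', 'S', '0'};
  exe.insert(exe.end(), magic, magic + 4);
}

// At startup, the game would read its own file and walk back from the end.
bool extract_resources(const Bytes& exe, Bytes& out) {
  if (exe.size() < 12) return false;  // too small to hold a trailer
  const unsigned char* tail = exe.data() + exe.size() - 4;
  if (std::string(tail, tail + 4) != "RES0") return false;  // no trailer present
  std::size_t body_end = exe.size() - 12;
  std::uint64_t n = 0;
  for (int i = 0; i < 8; ++i) n |= static_cast<std::uint64_t>(exe[body_end + i]) << (8 * i);
  if (n > body_end) return false;  // corrupt size field
  std::size_t start = body_end - static_cast<std::size_t>(n);
  out.assign(exe.begin() + static_cast<std::ptrdiff_t>(start),
             exe.begin() + static_cast<std::ptrdiff_t>(body_end));
  return true;
}
```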

Code Generation

Getting GML to play well with a C++ compiler is impossible without
generating some additional code to be compiled with the game. The code
often fills large gaps in the ENIGMA engine. Among the various pieces of
code generated to get a variety of games to compile are the following:

  1. A switch statement is generated for use by instance_create(). Since
    class ids cannot be enumerated in an array in C++, the switch
    statement pairs each object index with its own new statement.
  2. A framework of structures is generated for each object. Locals and
    scripts are then scoped into each structure as appropriate.
  3. An accessor function is generated for each
    local variable accessed as object.local_variable.
  4. A common-class cast is generated to allow instance_change to be
    implemented; each object has a method to cast to the common class
    and a constructor from it.
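Item 1 can be illustrated with a sketch of the generated instance_create() switch. The class names and the pointer return value are assumptions made for the example; GML's instance_create() returns an instance id, and ENIGMA's generated code also registers the instance with the engine, which is omitted here.

```cpp
#include <cassert>

// Hypothetical generated base class and two object classes.
struct object_basic {
  double x, y;
  object_basic(double x_, double y_) : x(x_), y(y_) {}
  virtual ~object_basic() {}
  virtual int object_index() const = 0;
};
struct OBJ_player : object_basic {
  using object_basic::object_basic;
  int object_index() const override { return 0; }
};
struct OBJ_enemy : object_basic {
  using object_basic::object_basic;
  int object_index() const override { return 1; }
};

// Generated switch: each object index is paired with its own new statement.
object_basic* instance_create(double x, double y, int obj) {
  switch (obj) {
    case 0: return new OBJ_player(x, y);
    case 1: return new OBJ_enemy(x, y);
    default: return nullptr;  // unknown object index
  }
}
```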

What needs to be done to the compiler

  1. Template type tracking: The C Parser needs to keep track of all
    template instantiations. This may involve creating an instantiation
    scope in each template, or creating an instantiation parameters list
    in each object.
  2. Default flag: All searchable objects need to have a flag set so
    that a special case doesn't need to be made for a 0xFFFFFFFF flag
    search in the C Parser.
  3. Constants and enums need to be flagged as such: For future items on
    this list to work, the "const" keyword needs to be acknowledged.
  4. Flag pair "local const" needs special treatment: Local constants
    should be initialized in the constructor instead of set inline to
    avoid errors.
  5. Local array bounds need to be coerced: To permit having a local array
    of variable-sized dimension, array subscripts should be determined
    to be constant or variable. Constant subscripts should remain in the
    declaration, variable subscripts should be replaced with * and
    allocated in the constructor.
  6. Switch statements need to be coerced: To allow for a more efficient
    switch statement, the types of the switch value and of each case
    label should be coerced. There is only one switch value type, the
    key type. Since there are typically multiple case labels, the worst
    type used in any of the switch()'s case labels will represent them
    all. From best to worst, the ranking is: the smallest integer type,
    then the largest integer type, then any floating-point type (bad),
    and finally any string or variant type (worst). The case type is
    considered const if and only if all of the case label types are
    constant. Scenarios for (key, case) type pairs are as follows
    (??? indicates that the type is irrelevant; all const types are
    denoted as such):
    • (int:const int): The statement is left alone completely.
    • (???:const ???): The statement is replaced with a hash function
      and integral keys as the case labels. An if() is placed in each
      case to make sure the hash was accurate.
    • (???:???): Regardless of switch value type, if the case types
      are not all constant, the switch() must be replaced with
      consecutive if()s.
  7. Locally- and globally-declared array subscripts need special
    treatment. Variables marked "const" need to be declared first; of
    those, local consts need to be initialized via () in the
    constructor. It'd be a good idea to allow = for in-place
    construction and () for in-constructor construction.
  8. eYAML files of locals need to be acted upon: Ism presently has a
    mechanism by which she can look up alarms in separate sources.
    Files like the one she created manually need to be generated
    automatically by ENIGMA in accordance with the eYAML files under
    Extensions/.
  9. A variable tracking mechanism needs to be implemented: In
    accordance with the eYAML files mentioned above, a system needs to
    be implemented that can execute certain code at the end of events
    in which it is possible that a value may have changed. This is
    useful for establishing spatial containers for speeding up the
    collision system.
  10. The options in the LGM ENIGMA settings pane (and the ones that were
    requested but aren't there) need to be implemented. This is
    actually relatively trivial and not worth naming, but a couple not
    listed are as follows:
    • Scripts should have two modes for max efficiency: either being
      placed in the global scope and accessing variables via with(),
      or being scoped into each object that uses them (this is the
      current behavior).
    • Global array types should have two type options: pointer or var
      (many people use view_xview without an array subscript, which
      will error for int* but not for var).
    • switch() should have an option to use strictly GML or strictly C
      methods.
    • There needs to be an option for = vs == treatment in
      conditionals and parameters.
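The two non-trivial switch() scenarios from item 6 can be sketched as hand-written equivalents of what the generated code might look like. This is an illustration under stated assumptions: the key type is taken to be std::string rather than ENIGMA's variant, FNV-1a is swapped in as an example hash (the text does not name one), and speed_for and pick are hypothetical functions.

```cpp
#include <cassert>
#include <string>

// Example hash (64-bit FNV-1a), constexpr so it can produce case labels.
constexpr unsigned long long fnv1a(const char* s,
                                   unsigned long long h = 0xcbf29ce484222325ULL) {
  return *s ? fnv1a(s + 1, (h ^ static_cast<unsigned char>(*s)) * 0x100000001b3ULL) : h;
}

// (???:const ???): switch (gait) { case "walk": ... case "run": ... }
// becomes a switch on the hash, with an if() confirming each match.
int speed_for(const std::string& gait) {
  switch (fnv1a(gait.c_str())) {
    case fnv1a("walk"):
      if (gait == "walk") return 2;  // guard against hash collisions
      break;
    case fnv1a("run"):
      if (gait == "run") return 5;
      break;
  }
  return 0;  // default branch
}

// (???:???): with non-constant case labels a and b, the switch() is
// replaced with consecutive if()s.
int pick(int key, int a, int b) {
  if (key == a) return 1;
  if (key == b) return 2;
  return 0;  // default branch
}
```

If two constant labels ever hashed to the same value, the generated switch would contain duplicate case labels and fail to compile, so a real implementation would need a fallback for that case.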

Toolchain Calls

To allow compilation of games for all platforms, and to allow
cross-compilation, a system for compiler management needed to be
incorporated. Though the About.ey files allow for some specification of
system dependencies, compilers need to be organized so that they can be
looked up by the name of one of the three operating systems on which
the IDE can run. In other words, a directory called Compilers/ must be
kept containing a folder for each of Windows, Linux, and MacOSX. In each
of those child folders, an eYAML file must be kept specifying the
fundamental information needed to call the toolchain executables.


Related

Wiki: About.ey
Wiki: Accessor
Wiki: C++
Wiki: Code::Blocks
Wiki: Collision_Systems
Wiki: Compile
Wiki: Dot_access
Wiki: EDL
Wiki: ENIGMA_Compiler
Wiki: ENIGMA_engine
Wiki: Events.res
Wiki: Make
Wiki: MinGW
Wiki: Parser
