Re: [sdcc-devel] Dissection of a compiler


On Mon, 18 Dec 2006, Jan Waclawek wrote:

> Erik, thanks for your response. I (hopefully) managed to subscribe to
> the -devel list (although I still have another mysterious subscription
> which I cannot get rid of...)

I told the mailing list to forward your messages without requiring manual
approval; this might be the source of the extra subscription although it
shouldn't be sending you multiple copies of the messages.

> - do all compilation steps process the whole source file before passing to
> the next step? (My guess is - not, this is valid only for preprocessing,
> assembling and linking - the rest is called from the parser
> function-wise???).

Preprocessing and the rest of the compilation (but not assembling and
linking) effectively operate in parallel. Although the preprocessor
runs as a separate program, its output is piped back to the main compiler,
which parses the preprocessed results as they are generated.
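
A minimal sketch of that piping arrangement, in Python rather than SDCC's
own C. Everything named here is hypothetical: CHILD_SCRIPT stands in for the
separate preprocessor program, and preprocessed_lines() and consume() stand
in for the compiler side. The point is only that the consumer reads each
line from the pipe as it arrives instead of waiting for the whole output.

```python
import subprocess
import sys

# Hypothetical stand-in for the preprocessor: a child process that
# writes its output line by line. The real compiler launches its own
# preprocessor binary instead.
CHILD_SCRIPT = (
    "import sys\n"
    "for line in ['int x;', 'int y;', 'int main(void) { return 0; }']:\n"
    "    print(line, flush=True)\n"
)

def preprocessed_lines():
    """Yield lines from the child's stdout as they arrive on the pipe."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_SCRIPT],
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        for line in proc.stdout:
            yield line.rstrip("\n")
    finally:
        proc.stdout.close()
        proc.wait()

def consume():
    """Stand-in for the parser: handle each line as soon as it is read."""
    seen = []
    for line in preprocessed_lines():
        seen.append(line)
    return seen
```

Calling consume() collects the three lines in the order the child wrote
them; a real parser would feed each one to its lexer instead of a list.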

> preprocessor:
> - the preprocessor apparently uses a different way of interpreting the
> source ("lexer") - is this true? does this pose any problem when making
> extensions?
> - can the preprocessor be simplified? (apparently it is called with a fixed
> set of options - true?)
> - what is the output of preprocessor? (= input to parser?)
> - any new substantial work on the preprocessor?

The preprocessor was recently updated to be based on a newer (but still
old) version of the gcc preprocessor. The preprocessor isn't one of my
specialties, so I don't know much beyond that.

> parser:
> - any other sources for the parser (.y, .lex, port->keywords)?

The only other thing I can think of is a hook that allows a port to look
at the #pragma directives; the port can parse them however it likes.

> - any further reading on AST?

The concept of the annotated syntax tree (AST) is discussed in many
compiler textbooks. The implementation details, however, can vary widely
because they depend on the language's syntax and on what sort of
annotations are desired/useful. The AST is essentially just a data
structure to represent a program's source code so that the rest of the
compiler doesn't have to worry about the details of parsing.
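
As a rough illustration of such a data structure (not SDCC's actual AST
layout), the Python sketch below builds a tree for the expression
a + 2 * b and then annotates each node with a type. The node classes, the
ctype field, and the "any long operand makes the result long" rule are all
invented for this example.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class Num:
    value: int
    ctype: Optional[str] = None  # annotation, filled in after parsing

@dataclass
class Var:
    name: str
    ctype: Optional[str] = None

@dataclass
class BinOp:
    op: str
    left: "Node"
    right: "Node"
    ctype: Optional[str] = None

Node = Union[Num, Var, BinOp]

def annotate(node, symbols):
    """Fill in ctype bottom-up (toy rule: any 'long' operand wins)."""
    if isinstance(node, Num):
        node.ctype = "int"
    elif isinstance(node, Var):
        node.ctype = symbols[node.name]
    else:
        annotate(node.left, symbols)
        annotate(node.right, symbols)
        operand_types = (node.left.ctype, node.right.ctype)
        node.ctype = "long" if "long" in operand_types else "int"
    return node

# The tree for: a + 2 * b
tree = BinOp("+", Var("a"), BinOp("*", Num(2), Var("b")))
annotate(tree, {"a": "int", "b": "long"})
```

Later stages can walk the tree and read the annotations without knowing
anything about how the source text was tokenized or parsed.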

> - what is the purpose of the .y and the .lex? Any documentation for the
> syntax?

The .lex file is used to generate the lexer, which recognizes groups of
characters as "words" (tokens). The .y file is used to generate the parser,
which takes the tokens produced by the lexer and attempts to fit them into
the defined grammar. The traditional programs that process these files are
lex and yacc; common alternatives are flex and bison.
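
To make the division of labor concrete, here is a small hand-written
sketch in Python. It is not generated by lex or yacc and does not use their
table-driven parsing: a regex tokenizer stands in for the lexer and a
recursive-descent parser stands in for the parser. The toy grammar
(numbers, identifiers, +, * and parentheses) and the function names are
assumptions made for the example.

```python
import re

# Number, identifier, or any single non-space character.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(\S))")

def tokenize(text):
    """Lexer: group characters into (kind, value) tokens."""
    tokens = []
    for number, ident, punct in TOKEN_RE.findall(text):
        if number:
            tokens.append(("NUM", number))
        elif ident:
            tokens.append(("ID", ident))
        elif punct:
            tokens.append(("PUNCT", punct))
    return tokens

def parse(tokens):
    """Parser: fit tokens into  expr := term ('+' term)*,
    term := factor ('*' factor)*,  factor := NUM | ID | '(' expr ')'."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("EOF", "")

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def factor():
        kind, value = advance()
        if kind == "NUM":
            return int(value)
        if kind == "ID":
            return value
        if (kind, value) == ("PUNCT", "("):
            node = expr()
            if advance() != ("PUNCT", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {value!r}")

    def term():
        node = factor()
        while peek() == ("PUNCT", "*"):
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == ("PUNCT", "+"):
            advance()
            node = ("+", node, term())
        return node

    tree = expr()
    if peek()[0] != "EOF":
        raise SyntaxError(f"unexpected token {peek()[1]!r}")
    return tree
```

For example, parse(tokenize("a + 2 * (b + 1)")) yields the nested tuple
("+", "a", ("*", 2, ("+", "b", 1))), while input that does not fit the
grammar, such as "a + * b", raises SyntaxError.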

I have a paper copy of _lex & yacc_ (ISBN: 1-56592-000-7) that I use for
reference. Some online references can also be found at:

  http://dinosaur.compilertools.net/

> - any new substantial work on the parser?

I'm working on updating the grammar to handle the inline and restrict
keywords.

> What is the next step in processing of the code?

After each function is parsed:

  1) the AST for the function is converted to intermediate code,
  2) processor-independent optimizations are performed on the
     intermediate code,
  3) processor-dependent optimizations are performed on the intermediate
     code and register usage is determined,
  4) assembly code is generated from the intermediate code, and
  5) the assembly code is peephole optimized.
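
The five steps can be sketched end to end on a toy scale. The Python below
is an illustration only, not SDCC's code: the tuple AST, the three-address
intermediate form, the register names r0, r1, ... and the mov/add/mul
mnemonics all belong to an invented machine, and each stage implements just
one simple technique (constant folding, register reuse, removal of
redundant moves).

```python
def to_icode(ast, code, counter):
    """Step 1: flatten a tuple AST into three-address intermediate code."""
    if not isinstance(ast, tuple):
        return ast  # leaf: integer constant or variable name
    op, left, right = ast
    a = to_icode(left, code, counter)
    b = to_icode(right, code, counter)
    counter[0] += 1
    temp = f"t{counter[0]}"
    code.append((temp, op, a, b))
    return temp

def fold_constants(code):
    """Step 2: a processor-independent optimization (constant folding)."""
    known, out = {}, []
    for dest, op, a, b in code:
        a, b = known.get(a, a), known.get(b, b)
        if isinstance(a, int) and isinstance(b, int):
            known[dest] = a + b if op == "+" else a * b
        else:
            out.append((dest, op, a, b))
    return out  # toy: a fully constant expression leaves nothing to emit

def assign_registers(code):
    """Step 3: processor-dependent; reuse the left operand's register."""
    regs = {}
    for dest, _, a, _ in code:
        if a in regs:
            regs[dest] = regs[a]  # safe here: each temp is used only once
        else:
            regs[dest] = f"r{len(set(regs.values()))}"
    return regs

def gen_asm(code, regs):
    """Step 4: emit assembly for the invented two-operand machine."""
    mnemonic = {"+": "add", "*": "mul"}
    asm = []
    for dest, op, a, b in code:
        d = regs[dest]
        asm.append(f"mov {d}, {regs.get(a, a)}")
        asm.append(f"{mnemonic[op]} {d}, {regs.get(b, b)}")
    return asm

def peephole(asm):
    """Step 5: drop moves whose source and destination are identical."""
    out = []
    for line in asm:
        parts = line.replace(",", "").split()
        if parts[0] == "mov" and parts[1] == parts[2]:
            continue
        out.append(line)
    return out

def compile_function(ast):
    """Run one function body through all five steps in order."""
    code = []
    to_icode(ast, code, [0])
    code = fold_constants(code)
    regs = assign_registers(code)
    return peephole(gen_asm(code, regs))
```

For the expression a * (2 + 3) + b, written as the tuple
("+", ("*", "a", ("+", 2, 3)), "b"), compile_function returns
["mov r0, a", "mul r0, 5", "add r0, b"]: step 2 folds 2 + 3 to 5, and
step 5 removes the "mov r0, r0" that step 4 emits for the second operation.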

After reaching the end of the source file, the initializers for any
non-const global or static variables also go through these same steps.

   Erik
