From: Erik P. <epe...@iv...> - 2006-12-19 05:46:46
On Mon, 18 Dec 2006, Jan Waclawek wrote:

> Erik, thanks for your response. I (hopefully) managed to subscribe to
> the -devel list (although I still have an another mysterious subscription
> which I cannot get rid of...)

I told the mailing list to forward your messages without requiring manual
approval; this might be the source of the extra subscription although it
shouldn't be sending you multiple copies of the messages.

> - do all compilation steps process the whole source file before passing to
> the next step? (My guess is - not, this is valid only for preprocessing,
> assembling and linking - the rest is called from the parser
> function-wise???).

The preprocessing and the rest of the compiling (but not the linking and
assembling) effectively operate in parallel. Although the preprocessor runs
as a separate program, its output is piped back to the main compiler which
parses the preprocessed results as they are generated.

> preprocessor:
> - the preprocessor apparently uses a different way of interpreting the
> source ("lexer") - is this true? does this pose any problem when making
> extensions?
> - can be the preprocessor simplified? apparently it is called with a fixed
> set of options - true?)
> - what is the output of preprocessor? (= input to parser?)
> - any new substantial work on the preprocessor?

The preprocessor was recently updated to a newer (but still old) gcc
preprocessor. The preprocessor isn't one of my specialties, so I don't know
much beyond that.

> parser:
> - any other sources for the parser (.y, .lex, port->keywords)?

The only other thing I can think of is there's a hook that allows a port to
look at the #pragma directives; it can parse that however it likes.

> - any further reading on AST?

The concept of the annotated syntax tree is discussed in many compiler
textbooks. The implementation details, however, can widely vary because they
are dependent on the language's syntax and what sort of annotations are
desired/useful.
The AST is essentially just a data structure to represent a program's source
code so that the rest of the compiler doesn't have to worry about the details
of parsing.

> - what is the purpose of the .y and the .lex? Any documentation for the
> syntax?

The .lex file generates the lexer, which is what recognizes groups of
characters as "words" (tokens). The .y file generates the parser, which takes
the tokens produced by the lexer and attempts to fit them into the defined
grammar. The traditional programs that process these files are lex and yacc;
common alternatives are flex and bison. I have a paper copy of _lex & yacc_
(ISBN: 1-56592-000-7) that I use for reference. Some online references can
also be found at:
http://dinosaur.compilertools.net/

> - any new substantial work on the parser?

I'm working on updating the grammar to handle the inline and restrict
keywords.

> What is the next step in processing of the code?

After each function is parsed:
1) the AST for the function is converted to intermediate code,
2) processor independent optimizations are performed on the intermediate
   code,
3) processor dependent optimizations are performed on the intermediate code
   and register usage is determined,
4) assembly code is generated from the intermediate code, and
5) the assembly code is peephole optimized.

After reaching the end of the source file, the initializers for any non-const
global or static variables also go through these same steps.

Erik
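
As a rough illustration of the lexer / parser / AST / intermediate code split
described above, here is a toy sketch in Python. It is not SDCC code: every
name in it (lex, Parser, to_intermediate, compile_expression) is invented for
the example, a hand-written recursive-descent parser stands in for what lex
and yacc would generate, and the "intermediate code" is just a list of
three-address tuples. It only handles arithmetic expressions.

```python
import re

# Hypothetical token pattern: integers, identifiers, or any single character.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(.))")


def lex(text):
    """Lexer role: group characters into (kind, value) tokens."""
    tokens = []
    for num, ident, op in TOKEN_RE.findall(text):
        if num:
            tokens.append(("NUM", num))
        elif ident:
            tokens.append(("ID", ident))
        elif op.strip():  # skip stray whitespace matched by '.'
            tokens.append(("OP", op))
    return tokens


class Parser:
    """Parser role: fit the token stream into a small grammar, building an AST.

    expr   := term (('+' | '-') term)*
    term   := factor (('*' | '/') factor)*
    factor := NUM | ID | '(' expr ')'
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("EOF", "")

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        node = self.term()
        while self.peek() in (("OP", "+"), ("OP", "-")):
            op = self.take()[1]
            node = (op, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in (("OP", "*"), ("OP", "/")):
            op = self.take()[1]
            node = (op, node, self.factor())
        return node

    def factor(self):
        kind, value = self.take()
        if kind in ("NUM", "ID"):
            return (kind, value)  # leaf node
        if (kind, value) == ("OP", "("):
            node = self.expr()
            if self.take() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {value!r}")


def to_intermediate(ast):
    """Walk the AST and emit (dest, op, left, right) three-address tuples."""
    code = []

    def walk(node):
        if len(node) == 2:  # leaf: (kind, value)
            return node[1]
        op, left, right = node
        lhs = walk(left)
        rhs = walk(right)
        temp = f"t{len(code)}"  # fresh temporary name
        code.append((temp, op, lhs, rhs))
        return temp

    walk(ast)
    return code


def compile_expression(text):
    """Chain the stages: characters -> tokens -> AST -> intermediate code."""
    return to_intermediate(Parser(lex(text)).expr())
```

With these definitions, compile_expression("a + 2 * b") returns
[("t0", "*", "2", "b"), ("t1", "+", "a", "t0")]. The later stages only ever
see the tree or the tuples, never the original characters, which is the
point of having the AST in between.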