[Introspector-developers] Re: Tree documentation

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Steven, 
thank you for your interest. I have forward your mail to the mailling
list (I hope you dont mind). Mario Luca and I have evaluated the simple
trees and found them to be too unstable at the moment for usage.
I will review your documentation. 

The introspector API is for the gcc, but is based on the idea of making
the docuementation as part of the API, basically self descriptive.

We are working on testing the c++ interface at this moment, and will
need to review the c++ tree documentation in detail, as the c++ trees
are crashing.

I will work on a texi output dump of the current meta-model, based on
your documentation. We will see if we can come to a synthesis.

mike
--- Steven Bosscher <s.b...@st...> wrote:
> Wow, what an email change can do...  I didn't know you were working
> on
> improving the documentation of, a.o.t., the tree structure.  As part
> of
> the tree-ssa work, I was planning some documentation of GENERIC, and
> later of GIMPLE.  The former is the intermediate representation that
> future front ends should target, the latter is the lowest-level tree
> representation we have in GCC (well, on the branch at least), on
> which
> we're doing SSA-bases optimizations.
> 
> Attached is my first shot at it, that I posted to Diego Novillo a
> month
> or so ago.  Maybe it's of some help for you, or better, maybe we can
> finish this together.
> 
> Greetz
> Steven
> 
> 
> 
> > \input texinfo  @c -*-texinfo-*-
> @c Copyright (c) 2003 Free Software Foundation, Inc.
> @c Free Software Foundation, Inc.
> @c This is part of the GCC manual.
> @c For copying conditions, see the file gcc.texi.
> 
> @c
> ---------------------------------------------------------------------
> @c GENERIC
> @c
> ---------------------------------------------------------------------
> 
> 
> @node GENERIC
> @chapter GENERIC: The language-independent abstract syntax tree
> intermediate representation.
> @cindex Trees
> @cindex GENERIC
> 
> This chapter explains the language-independent intermediate
> representation GENERIC.
> When presented with a source program,
> a front end parses the program, performs semantic analysis
> (including the generation of error messages),
> and then produces the internal representation described here.
> This representation contains a complete representation for
> the entire translation unit provided as input to the front end.
> 
> This is a draft of a design of this new intermediate
> representation.
> It's a bit sketchy at this point, because the
> language-independent optimizers are still very much
> work-in-progress.
> Sometimes the specification of this IR has to be changed
> to better suite the needs of the optimizers.
> 
> 
> @node Introduction
> @section Introduction to GENERIC
> 
> Like most other compilers, GCC uses different intermediate
> representations, or @dfn{IRs}, at different stages of the
> compilation process.
> At the highest level, each front end represents the syntax
> and semantics of the input in a language-dependent parse tree.
> The parse tree may not be the most suitable representation
> to perform high-level optimizations on.
> Also, if optimizations were to be performed on the parse tree,
> then the optimizations could not be language independent,
> and different implementations of the optimizations would
> have to exist for each language. 
> To avoid all that code duplication, GCC has a language
> independent abstract syntax tree representation called
> @dfn{GENERIC}.
> In GENERIC, the full semantics of each of the supported
> languages can be represented.
> Therefore, GCC can now perform high-level optimizations
> in a language independent way on this IR.
> 
> In GENERIC, the semantics of the source language is
> represented as a chain of @dfn{statement-expressions}, i.e.,
> a series of statements which may return a value.
> The distinction between statements and expressions is that
> a statement is something for which we are only interested
> in its side-effects, not its value.
> It's a difference of evaluation context, not always an
> intrinsic difference.
> However, most languages do explicitly distinguish between
> statements and expressions.
> This makes it harder to represent the language-indepentend
> semantics in a properly structured way.
> By not distinguishing between statements and expressions,
> and by virtue of working mostly in the existing backend
> tree codes,
> GENERIC is relatively small IR that can represent
> the semantics of all the languages that GCC has
> front ends for.
> 
> 
> @node Functions
> @section Functions
> @tindex FUNCTION_DECL
> 
> A function declaration is represented by a @code{FUNCTION_DECL} node,
> 
> @example
> @group
>    function :
> 	FUNCTION_DECL
> 	   DECL_SAVED_TREE -> block
> @end group
> @end example
> 
> To determine the scope of a function, you can use the
> @code{DECL_CONTEXT} macro.
> This points to either a @code{FUNCTION_DECL} for the
> containing function, a @code{RECORD_TYPE}
> or a @code{UNION_TYPE} for the containing class,
> or @code{NULL_TREE} if the function has "file scope".
> 
> If the @code{DECL_CONTEXT} for a function is another
> @code{FUNCTION_DECL}, this representation indicates
> that the GNU nested function extension is in use.
> For details on the semantics of nested functions,
> you are refered to the GCC Users Manual.
> The nested function can refer to local variables in
> its containing function.
> Such references are not explicitly marked in GENERIC.
> 
> Overloaded functions cannot be represented in GENERIC,
> and should be resolved in the front end.
> 
> 
> @node Scope
> @section Scope
> @tindex BIND_EXPR
> 
> A @code{BIND_EXPR} represents a scope.
> @code{BIND_EXPR_VARS} holds a list of @code{VAR_DECL} nodes,
> connected via their @code{TREE_CHAIN} field.
> These will never require cleanups.
> All of the decls for a block are given RTL at the beginning
> of the block.
> Declarations with static initializers keep their
> @code{DECL_INITIAL};
> other initializations are implemented with @code{INIT_EXPRs}
> in the codestream.
> The scope of these variables is just the body of the
> @code{BIND_EXPR},
> which can be accessed through the @code{BIND_EXPR_BODY} macro.
> The body holds the expression to be computed using the
> variables.
> 
> If the @code{BIND_EXPR} is ever expanded,
> its @code{TREE_USED} flag is set.
> @code{BIND_EXPR_BLOCK} is the @code{BLOCK} that corresponds
> to the bindings in this block, for debugging purposes.
> If this @code{BIND_EXPR} is actually expanded,
> that sets the @code{TREE_USED} flag in the @code{BLOCK}.
> This tells the code for debugging symbol tables not to ignore
> the @code{BIND_EXPR}.
> If the @code{BIND_EXPR} should be output for debugging but
> will not be expanded,
> set the @code{TREE_USED} flag by hand.
> 
> @example
> @group
>    block :
> 	BIND_EXPR
> 	   BIND_EXPR_VARS -> DECL chain
> 	   BIND_EXPR_BLOCK -> BLOCK
> 	   BIND_EXPR_BODY -> compound-stmt
> @end group
> @end example
> 
> @example
> @group
>    compound-stmt :
> 	COMPOUND_EXPR
> 	   op0 -> non-compound-stmt
> 	   op1 -> stmt
> 
>    stmt :
> 	compound-stmt
> 	| non-compound-stmt
> @end group
> @end example
> 
> Questions have been raised about the advisability of using
> @code{COMPOUND_EXPR} to chain statements;
> in most front ends, the program is represented as a chain
> of @code{*_STMT} nodes, and the @code{TREE_CHAIN} field of
> these nodes is used to chain them together.
> The benefit of using @code{COMPOUND_EXPR} instead is modularity;
> apart from the statemenent/expression distinction,
> using @code{COMPOUND_EXPR} makes it easy to replace a
> single, complex expression with a sequence of simple ones,
> simply by plugging in a @code{COMPOUND_EXPR} in its place.
> The @code{TREE_CHAIN} scheme required a lot more pointer
> management in order to splice the new @code{*_STMTs}
> in at both ends.
> 
> (An additional advantage that we do not use yet,
> is that double-chaining could be provided by using
> the @code{TREE_CHAIN} of the @code{COMPOUND_EXPRs}.
> Without double-chaining,
> some code transformations like goto elimination are
> much harder to implement.)
> 
> 
> @node Control Flow
> @section Control Flow
> 
> @c Every @code{non-compound-stmt} affects control flow.
> 
> @example
> @group
>    non-compound-stmt :
> 	block
> 	| loop-stmt
> 	| if-stmt
> 	| switch-stmt
> 	| labeled-block-stmt
> 	| jump-stmt
> 	| label-stmt
> 	| try-stmt
> 	| modify-stmt
> 	| call-stmt
> @end group
> @end example
> 
> @node Unconditional Branching
> @subsection Unconditional Branching
> 
> @example
> @group
>    jump-stmt :
> 	EXIT_EXPR
> 	   EXIT_EXPR_COND -> condition
> 	| GOTO_EXPR
> 	   GOTO_DESTINATION -> LABEL_DECL | '*' ID
> 	| RETURN_EXPR
> 	   op0 -> modify-stmt | NULL_TREE
> 	| EXIT_BLOCK_EXPR
> 	   EXIT_BLOCK_LABELED_BLOCK -> ref to LABELED_BLOCK_EXPR
> 	   op1 -> NULL_TREE
> 	| THROW_EXPR ???
> @end group
> @end example
> 
> Ideally, the assignment to the return value would
> be moved out of the @code{RETURN_EXPR},
> but it appears that @code{expand_return} depends
> on getting an assignment
> (or actually, a @code{MODIFY_EXPR})
> in order to handle some return semantics.
> 
> @example
> @group
>    labeled-block-stmt :
> 	LABELED_BLOCK_EXPR
> 	   LABELED_BLOCK_LABEL -> LABEL_DECL
> 	   LABELED_BLOCK_BODY -> stmt
> @end group
> @end example
> 
> @example
> @group
>    label-stmt :
> 	LABEL_EXPR
> 	   LABEL_EXPR_LABEL -> LABEL_DECL
> 	| CASE_LABEL_EXPR
> 	     CASE_LOW -> value | NULL_TREE
> 	     CASE_HIGH -> value | NULL_TREE
> @end group
> @end example
> 
> @node Conditional Branching
> @subsection Conditional Branching
> @tindex COND_EXPR
> @tindex SWITCH_EXPR
> 
> There are two kinds of expressions in GENERIC that
> represent conditional branching:
> @code{COND_EXPR} and @code{SWITCH_EXPR}.
> The former is used to represent a conditional branch
> on a boolean expression;
> the latter represents multiway branching.
> 
> The @code{COND_EXPR} is a natural way to represent an
> @code{if} statement.
> The @code{COND_EXPR} should be interpreted as a
> @code{ ... ? ... : ...} expression in C.
> 
> @example
> @group
>    if-stmt :
> 	COND_EXPR
> 	   op0 -> condition
> 	   op1 -> stmt
> 	   op2 -> stmt
> @end group
> @end example
> 
> A multiway branching expression depending on the value
> of a certain expression,
> such as the @code{switch} statement,
> is represented by a @code{SWITCH_EXPR}.
> 
> @example
> @group
>    switch-stmt:
> 	SWITCH_EXPR
> 	   op0 -> value
> 	   op1 -> stmt
> @end group
> @end example
> 
> The exact representation for multiway branches
> is not entirely clear yet.
> There will probably have to be a vector that maps
> all the case value ranges to @code{LABEL_EXPRs}
> for each @code{SWITCH_EXPR},
> but this has not been implemented yet.
> There are many ways the different cases could be
> represented in the IR,
> and each of them has their specific advantages and
> disadvantages.
> Intermediate representations for other compilers,
> such as the McCAT SIMPLE IR,
> require that during lowering the case labels are
> made disjoint by copying shared code around,
> allowing a more structured representation of a
> multiway branch.
> For GCC this is too dubious an optimization to be
> performed by default,
> as it would just copy too much code around.
> This still might be interesting as part
> of a goto-elimination pass, though,
> or for multiway branches with no more than a
> certain number of cases.
> 
> @node Loops
> @subsection Loops
> @tindex LOOP_EXPR
> @tindex COND_EXPR
> 
> All loops are represented by a generic @code{LOOP_EXPR}.
> Because a @code{LOOP_EXPR} represents an infinite loop, 
> the loop body has one or more @code{EXIT_EXPR} nodes,
> used to express the loop condition.
> Usually the @code{EXIT_EXPR} is in one of the branches
> for an @code{COND_EXPR}.
> 
> @example
> @group
>    loop-stmt:
> 	LOOP_EXPR
> 	   LOOP_EXPR_BODY -> stmt
> @end group
> @end example
> 
> There is no distinction between what might have been
> a @code{for} loop or a @code{do @{ ... @} while} loop
> in the source program.
> The only difference in the representation is whether the
> exit condition is evaluated at the start of the loop body,
> as in @code{while} and @code{for} loops,
> or at the end of the loop body,
> as in @code{do-while} loops.
> 
> @code{EXIT_EXPR} is a bit backwards for some purposes,
> as its sense is opposite to that of the loop condition,
> so we end up calling invert_truthvalue twice in
> the process of generating and expanding it.
> That's not a big deal,
> but perhaps we should change it anyway.
> 
> 
> @node Exception Handling
> @section Exception Handling
> @example
> @group
>    try-stmt :
> 	TRY_CATCH_EXPR
> 	| TRY_FINALLY_EXPR
> @end group
> @end example
> 
> This will need to be extended to handle type-based catch clauses as
> well.
> 
> @c I think it makes sense to leave this as a separate tree code for
> handling
> @c cleanups.
> 
> @node Expressions
> @section Expressions
> 
> @example
> @group
>    modify-stmt :
> 	MODIFY_EXPR
> 	   op0 -> lhs
> 	   op1 -> rhs
> 	| INIT_EXPR
> 	     op0 -> lhs
> 	     op1 -> rhs
> 
>    call-stmt :
> 	CALL_EXPR
> 	   op0 -> ID
> 	   op1 -> arglist
> @end group
> @end example
> 
> Assignment and calls are the only expressions with
> intrinsic side-effects,
> so only they can appear at statement context.
> 
> The rest of this is basically copied from the McCAT design.
> I think it still needs some tweaking,
> but that can wait until after the statement-level stuff is
> worked out.
> 
> @example
> @group
>    varname :
> 	compref
> 	| ID (rvalue)
> 
>    lhs :
> 	compref
> 	| ID (lvalue)
> 	| * ID (rvalue)
> 
>    pseudo-lval :
> 	ID
> 	| '*' ID  (either lvaule or rvalue)
> 
>    compref :
> 	COMPONENT_REF
> 	   op0 -> compref | pseudo-lval
> 	| ARRAY_REF
> 	   op0 -> compref | pseudo-lval
> 	   op1 -> value
> 
>    condition :
> 	value
> 	| value relop val
> 
>    value :
> 	ID
> 	| CONST
> 
>    rhs :
> 	varname
> 	| CONST
> 	| '*' ID
> 	| '&' varname_or_temp
> 	| call_expr
> 	| unop value
> 	| value binop value
> 	| '(' cast ')' varname
> 
>    unop :
> 	'+'
> 	| '-'
> 	| '!'
> 	| '~'
> 
>    binop :
> 	relop
> 	| '-'
> 	| '+'
> 	| '/'
> 	| '*'
> 	| '%'
> 	| '&'
> 	| '|'
> 	| '<<'
> 	| '>>'
> 	| '^'
> 
>    relop :
> 	'<'
> 	| '<='
> 	| '>'
> 	| '>='
> 	| '=='
> 	| '!='
> @end group
> @end example
> 
> 
> 
> 

=====
James Michael DuPont
http://introspector.sourceforge.net/

__________________________________________________
Do you Yahoo!?
Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
http://mailplus.yahoo.com