From: Andre B. <and...@gm...> - 2004-08-17 08:14:44
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
<title></title>
</head>
<body bgcolor="#ffffff" text="#000000">
Hello Baptiste,<br>
<br>
Have you ever thought about moving the symbol extraction part into
the grammar as well?<br>
I noticed this idea in a parser-generator language and was wondering
whether the same mechanism<br>
could help us simplify the work of symbol table extraction. Currently
it's not so easy to keep<br>
the cpp_grammar.txt file in sync with "symboldeclarator.cpp", where symbol
extraction takes place.<br>
What about extending the grammar syntax with keywords for symbol
extraction? Some examples:<br>
<br>
<b>Entry point:</b><br>
translation_unit = <b>:enterscope('translation-unit')</b> :node(
'translation_unit', declaration_seq ) <b>:leavescope</b>;<br>
<br>
<b>Declaring a class:</b><br>
<br>
class_head_name = :node( 'class_name',<br>
optional_alternative( :node('dll_macro', id ),<br>
?( :node('nested_name_specifier',
nested_name_specifier) ) <br>
[:node('id', id) | template_id] ) );<br>
<br>
class_head = class_key ?( class_head_name ) ?( base_clause );<br>
<br>
class_specifier = :node( 'class_specifier', class_head <b>:declareSymbol('class',$subnode('class_name')
)</b> <br>
<b>:enterscope('class-scope',
:subnode('class_name'))</b><br>
'{' ?( member_specification ) '}' <br>
<b>:leavescope</b><br>
);<br>
<br>
<b>New scope declaration extension:</b><br>
named_namespace_definition = 'namespace' id <b>:push($2)</b>
namespace_body <b>:pop</b>; ## $2 means second element which is
'id'<br>
<br>
namespace_body = '{' <b>:enterscope($top) </b>declaration_seq <b>:leavescope
</b>'}'; ## $top is the top element in element stack<br>
<br>
unnamed_namespace_definition = 'namespace' <b>:push('&lt;&gt;')</b>
namespace_body <b>:pop</b>;<br>
<br>
namespace_definition = :node( 'named_namespace_def',
named_namespace_definition )<br>
| :node( 'unnamed_namespace_def',
unnamed_namespace_definition )<br>
;<br>
<br>
I'm not sure about that push/pop mechanism, but I think we need
something to propagate information across <br>
non-terminal elements.<br>
<br>
<b>:enterscope(&lt;string&gt;)<br>
:enterscope(&lt;scope-type&gt;,&lt;named-node&gt;)<br>
:leavescope<br>
:push(&lt;string&gt;)<br>
:push(&lt;number&gt;)<br>
:push(&lt;named-node&gt;)<br>
:pop<br>
:subnode(&lt;string&gt;)</b><br>
<br>
What do you think about this? I haven't read through the
parser code for the grammar, so I don't know about the pitfalls.<br>
<br>
greetings from <br>
André<br>
<br>
</body>
</html>
From: Baptiste L. <gai...@fr...> - 2004-08-18 05:55:52
Well, you could certainly extend the parser to do something like that.
Though, you would have to use something similar to :node instead of
push/pop. It's a stupid backtracking parser: it tries all alternatives
and backtracks on failure, so pop() would never be called in the case of
a partially matched rule. (The backtracking is one of the reasons why
it's so slow.)
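To make the push/pop problem concrete, here is a minimal Python sketch of a backtracking matcher. It is not the cppparser code: the function names, the token list, and the stand-in alias rule are all hypothetical and only illustrate the idea. The first rule pushes the namespace name as a side effect and fails before its pop; the second carries the name in its return value, in the spirit of :node, so a failed attempt leaves nothing behind.

```python
# Hypothetical sketch: why push/pop side effects do not survive backtracking.
# None of these names come from cppparser.

class ParseFailure(Exception):
    """Raised when a rule does not match; the caller backtracks."""


def expect(tokens, pos, literal):
    """Match one literal token at pos or fail."""
    if pos < len(tokens) and tokens[pos] == literal:
        return pos + 1
    raise ParseFailure(f"expected {literal!r} at {pos}")


def namespace_with_push_pop(tokens, pos, stack):
    """'namespace' id :push($2) '{' '}' :pop, written with side effects."""
    pos = expect(tokens, pos, "namespace")
    stack.append(tokens[pos])           # :push runs as soon as the id is seen
    pos = expect(tokens, pos + 1, "{")  # may fail: the pop below is skipped
    pos = expect(tokens, pos, "}")
    stack.pop()                         # :pop only runs on a full match
    return pos


def namespace_returning_node(tokens, pos):
    """Same rule, but the name travels in the returned node."""
    pos = expect(tokens, pos, "namespace")
    name = tokens[pos]
    pos = expect(tokens, pos + 1, "{")
    pos = expect(tokens, pos, "}")
    return pos, ("namespace", name)     # nothing to undo on earlier failure


def try_alternatives(alternatives):
    """Try each alternative in order, backtracking on failure."""
    for alternative in alternatives:
        try:
            return alternative()
        except ParseFailure:
            continue
    raise ParseFailure("no alternative matched")


# 'namespace N = M ;' is a namespace alias, so the definition rule fails
# after the id has already been pushed.
tokens = ["namespace", "N", "=", "M", ";"]
stack = []
try_alternatives([
    lambda: namespace_with_push_pop(tokens, 0, stack),
    lambda: len(tokens),                # stand-in for a successful alias rule
])
leaked = list(stack)                    # what the failed attempt left behind

result = try_alternatives([
    lambda: namespace_returning_node(tokens, 0),
    lambda: (len(tokens), ("alias", "N")),
])
```

After the first run, `leaked` still holds the name pushed by the failed alternative, whereas the value-returning variant has no shared state to clean up.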
That being said, I've been thinking over the last few days about what
needs to be done to build a 'type'-aware parser.

The first major issue is that you need multiple passes to resolve
symbols. Example:

class A {
    int size() const { return size_; }  // size_ is referred to before being declared.
    int size_;
};

Clearly, all symbols in function bodies need to be resolved after
declaring all the 'outer' stuff, but the details aren't clear to me yet.
There might be similar issues with function parameter types and default
values (and maybe a similar case with template parameter default values).
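A rough way to picture the multi-pass idea in Python: the first pass declares every member of the class, and only the second pass looks inside function bodies. The dict-based tree and both function names below are made up for the illustration; they are not the cppparser tree format, and real resolution would also have to walk enclosing scopes.

```python
# Hypothetical sketch of two-pass symbol resolution for a class body.
# The dict-based "tree" is invented; it is not the cppparser output.

class_a = {
    "name": "A",
    "members": [
        {"kind": "function", "name": "size", "body_uses": ["size_"]},
        {"kind": "variable", "name": "size_"},
    ],
}


def resolve_single_pass(cls):
    """Declare and resolve in source order; misses later declarations."""
    declared, unresolved = set(), []
    for member in cls["members"]:
        declared.add(member["name"])
        for used in member.get("body_uses", []):
            if used not in declared:
                unresolved.append(used)
    return unresolved


def resolve_two_pass(cls):
    """Pass 1 declares every member; pass 2 resolves function bodies."""
    declared = {member["name"] for member in cls["members"]}    # pass 1
    unresolved = []
    for member in cls["members"]:                                # pass 2
        for used in member.get("body_uses", []):
            if used not in declared:
                unresolved.append(used)
    return unresolved


single = resolve_single_pass(class_a)   # size_ is not known yet
double = resolve_two_pass(class_a)      # size_ is found
```

With the class A example, the single pass reports `size_` as unresolved, while the two-pass version resolves everything.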
The second issue is handling (template) function overloads and 'type'
overloads (template specializations). The current symbol table is
clearly not suited for this. If a symbol is either a function or a
template type, then in a given scope you have N valid declarations to
look through to find the correct overload.
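One possible shape for such a table, sketched in Python: each scope maps a name to a list of declarations instead of a single one, and lookup returns the whole candidate set. The `Scope` class, `pick_by_arity`, and the declaration dicts are assumptions made for the example, not the existing symbol table. Selecting by parameter count is only a toy stand-in; real C++ overload resolution and template specialization matching are far more involved.

```python
# Hypothetical sketch: a scope that keeps every declaration for a name.
from collections import defaultdict


class Scope:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.declarations = defaultdict(list)   # name -> [declaration, ...]

    def declare(self, name, declaration):
        """Add a declaration without replacing earlier ones."""
        self.declarations[name].append(declaration)

    def lookup(self, name):
        """Return all candidates from the nearest scope declaring name."""
        scope = self
        while scope is not None:
            if name in scope.declarations:
                return list(scope.declarations[name])
            scope = scope.parent
        return []


def pick_by_arity(candidates, argument_count):
    """Toy overload selection: match on parameter count only."""
    return [c for c in candidates if len(c["params"]) == argument_count]


globals_ = Scope("translation-unit")
globals_.declare("print", {"params": ["int"]})
globals_.declare("print", {"params": ["int", "int"]})
inner = Scope("class-scope", parent=globals_)

candidates = inner.lookup("print")       # both overloads, found in the parent
chosen = pick_by_arity(candidates, 2)    # the two-parameter overload
```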
The Python ast.py module actually extracts most of the node information
and wraps it in typed nodes, which are much easier to use (mostly the
class/member stuff isn't wrapped yet). My guess is that the next step is
probably to try to implement a more robust symbol table with multi-pass
tree traversal around those typed nodes. While the current cppparser has
some bugs, it's often correct, so the current tree output would provide
a good base to design the symbol table and find out how the multi-pass
stuff should work.

By the way, an overview of the declaration node structures can be found
in src/cppparser/grammar_tree.txt.
Baptiste
----- Original Message -----
From: Andre Baresel
To: CppTool Mailing List
Sent: Tuesday, August 17, 2004 10:08 AM
Subject: [Cpptool-develop] About extracting symbol information