From: Stefan R. <sr...@ma...> - 2003-03-12 14:54:26
Attachments: condition.patch
Hello,

I have been writing a program which takes an OpenC++ parse tree and
translates it into a more regular output, i.e., user-defined
operators are turned into function calls, overload resolution is
done, etc. I have used OpenC++ only as a parser library (I'm working
with Michael Hohmuth).

I found some C++ constructs which OpenC++ does not handle:

- "if (declaration)" is not supported, neither is "while
  (declaration)" nor "switch (declaration)". I have appended a
  patch which adds that.

  What is your general position about changing the parse tree
  like this? Will that break people's programs or is that not a
  problem?

- "explicit" is not supported, but works, sort of. "explicit" is
  only allowed for constructors, and OpenC++ thinks that "explicit"
  is the return type :-)

- wide characters (L"klingon sentence here"). I've seen there is
  a patch on SF which adds that.

- "namespace A = B"

- type-specifiers in non-canonical order are not accepted. For
  example, "unsigned const long typedef int a" is legal C++,
  but OpenC++ doesn't grok it. I'm unsure how that problem can
  be solved in a minimally invasive way, and whether anyone cares
  about it.

- function-try-blocks are not accepted.
      a::a() try : mem_init() { } catch(...) { }

In addition, some parse tree nodes lose information about which
production was applied. For example, the types "unsigned int"
and "::name" both turn into two-element lists [unsigned int] and
[:: name], although the first one contains two type-specifiers
while the second one contains one. I'm unsure whether that's a
problem worth fixing, but it caused me some headaches when I
wanted to analyze such trees.

OpenC++ also does not recognize di- and trigraphs, nor does it
support replacement keywords (e.g. "and_eq" instead of "&="). I
think these are better handled in a preprocessor.

Another problem which I found quite hard to solve: take an
expression like this: "(name) + x * y". Depending upon whether
the "name" is a type or a variable, this either means
    ((name) +x) * y
(cast +x, then multiply) or
    (name) + (x * y)
(multiply, then add).

OpenC++ usually parses this as the first form. Okay, I have no
problem with OpenC++ mis-parsing things; I know it can't do
better with the little knowledge it has. The problem here is
that a parse problem affects its parent node, not just its
child nodes (i.e. "(name)(1,2)" is either a function call or a
cast of a comma expression. OpenC++ parses it as the latter but
it is quite simple to fix that up.) Well, any helpful hints on
that topic are appreciated.

The program which we finally want to run through the OpenC++
parser is an operating system kernel. Kernel hackers tend to use
gcc extensions a lot. What is your opinion on adding these to
OpenC++ (i.e., if I made a patch for them, would it make it
into OpenC++?). I'm talking about...

- compound expressions ("({ if (x) .... })")

- extended literals, gcc and C99 form
  ("return (div_t) { 1, 0 };", "div_t x = { quot:3 };" resp.
  "div_t x = { .quot = 3 }")

- some keywords: restrict, asm, ...

- typeof

- maybe some __attribute__ should be handled? OTOH, it can
  simply be #defined out.

Thank you for reading this far,

  Stefan
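For readers who want to try the constructs from the message above with their own compiler, here is a minimal sketch in standard C++. It is not taken from the attached patch or from OpenC++; every name in it (B, A, Wrapper, condition_declarations, the name_is_* namespaces, demo) is made up for illustration. The last three namespaces show how the same token sequences change meaning depending on what "name" is bound to.

```cpp
#include <cassert>
#include <type_traits>

// Constructs from the list above; all of them are standard C++.
namespace B { inline int answer() { return 42; } }
namespace A = B;                      // namespace alias

unsigned const long typedef int a;    // decl-specifiers in non-canonical order
static_assert(std::is_same<a, const unsigned long>::value,
              "equivalent to: typedef const unsigned long a;");

struct Wrapper {
    // "explicit" constructor written as a function-try-block
    explicit Wrapper(int v) try : value(v) { }
    catch (...) { /* rethrown automatically when the handler ends */ }
    int value;
};

inline int condition_declarations(int input) {
    int result = 0;
    if (int half = input / 2) result += half;             // if (declaration)
    while (int rest = input - result) result += rest;     // while (declaration)
    switch (int parity = input % 2) {                     // switch (declaration)
        case 1: result += parity; break;
        default: break;
    }
    return result;
}

// The ambiguity: identical tokens, different trees.
namespace name_is_type {
    typedef int name;
    // ((name) +x) * y : the cast applies to +x only, then multiply.
    inline double eval(double x, double y) { return (name) + x * y; }
    // Cast of the comma expression (1, 2).
    inline int call_or_cast() { return (name)(1, 2); }
}
namespace name_is_variable {
    const double name = 10.0;
    // (name) + (x * y) : multiply, then add.
    inline double eval(double x, double y) { return (name) + x * y; }
}
namespace name_is_function {
    inline int name(int p, int q) { return p + q; }
    // Ordinary function call with two arguments.
    inline int call_or_cast() { return (name)(1, 2); }
}

inline void demo() {
    assert(A::answer() == 42);
    assert(Wrapper(7).value == 7);
    assert(condition_declarations(5) == 6);
    assert(name_is_type::eval(2.5, 2.0) == 4.0);       // (int)2.5 * 2.0
    assert(name_is_variable::eval(2.5, 2.0) == 15.0);  // 10.0 + 5.0
    assert(name_is_type::call_or_cast() == 2);
    assert(name_is_function::call_or_cast() == 3);
}
```

Calling demo() from a main function runs all checks. A compiler may warn about the unused left operand in the comma expression; that is expected here.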
From: Grzegorz J. <ja...@he...> - 2003-03-13 06:30:36
Hi Stefan,

On Wed, 12 Mar 2003, Stefan Reuther wrote:

> Hello,
>
> I have been writing a program which takes an OpenC++ parse tree
> and translates it into a more regular output, i.e., user-defined
> operators are turned into function calls, overload resolution is
> done, etc. I have used OpenC++ only as a parser library (I'm
> working with Michael Hohmuth).

It's a pity that you do not implement it in OpenC++ itself.

You are working on the Fiasco project, right? Are your sources open?

> I found some C++ constructs which OpenC++ does not handle:
>
> - "if (declaration)" is not supported, neither is "while
>   (declaration)" nor "switch (declaration)". I have appended a
>   patch which adds that.
>
>   What is your general position about changing the parse tree
>   like this? Will that break people's programs or is that not a
>   problem?
>
> - "explicit" is not supported, but works, sort of. "explicit" is
>   only allowed for constructors, and OpenC++ thinks that "explicit"
>   is the return type :-)
>
> - wide characters (L"klingon sentence here"). I've seen there is
>   a patch on SF which adds that.
>
> - "namespace A = B"
>
> - type-specifiers in non-canonical order are not accepted. For
>   example, "unsigned const long typedef int a" is legal C++,
>   but OpenC++ doesn't grok it. I'm unsure how that problem can
>   be solved in a minimally invasive way, and whether anyone cares
>   about it.
>
> - function-try-blocks are not accepted.
>       a::a() try : mem_init() { } catch(...) { }
>
> In addition, some parse tree nodes lose information about which
> production was applied. For example, the types "unsigned int"
> and "::name" both turn into two-element lists [unsigned int] and
> [:: name], although the first one contains two type-specifiers
> while the second one contains one. I'm unsure whether that's a
> problem worth fixing, but it caused me some headaches when I
> wanted to analyze such trees.

I find the answer to this question important to the whole project,
so I will post it in a separate e-mail.

> OpenC++ also does not recognize di- and trigraphs, nor does it
> support replacement keywords (e.g. "and_eq" instead of "&="). I
> think these are better handled in a preprocessor.

I do not quite understand where the problem is here. OpenC++ works on
preprocessor output, so it has no chance to see trigraphs & co., right?

> Another problem which I found quite hard to solve: take an
> expression like this: "(name) + x * y". Depending upon whether
> the "name" is a type or a variable, this either means
>     ((name) +x) * y
> (cast +x, then multiply) or
>     (name) + (x * y)
> (multiply, then add).
>
> OpenC++ usually parses this as the first form. Okay, I have no
> problem with OpenC++ mis-parsing things; I know it can't do
> better with the little knowledge it has. The problem here is
> that a parse problem affects its parent node, not just its
> child nodes (i.e. "(name)(1,2)" is either a function call or a
> cast of a comma expression. OpenC++ parses it as the latter but
> it is quite simple to fix that up.) Well, any helpful hints on
> that topic are appreciated.

I do not see any solution within the parser itself. You have to know
the local environment to be able to do anything.

The honest solution would be to have a node type that makes it
obvious that the parser does not know what it is. Further analysis
would change the node to one form or the other, based on the
bindings of the relevant identifiers. I am not sure if this scheme
is currently implementable without seriously compromising backward
compatibility.

> The program which we finally want to run through the OpenC++
> parser is an operating system kernel. Kernel hackers tend to use
> gcc extensions a lot. What is your opinion on adding these to
> OpenC++ (i.e., if I made a patch for them, would it make it
> into OpenC++?). I'm talking about...
>
> - compound expressions ("({ if (x) .... })")
>
> - extended literals, gcc and C99 form
>   ("return (div_t) { 1, 0 };", "div_t x = { quot:3 };" resp.
>   "div_t x = { .quot = 3 }")
>
> - some keywords: restrict, asm, ...
>
> - typeof

I do not see any problems with letting those in, but perhaps under
a switch.

> - maybe some __attribute__ should be handled? OTOH, it can
>   simply be #defined out.

This one I would leave out at the moment, especially if you do not
have a clear requirement for it. Again, if you need it, go ahead and
put it under the switch.

> Thank you for reading this far,

"I have made this letter longer than usual, because I lack time
to make it short." [Blaise Pascal]

Best regards
Grzegorz

##################################################################
# Grzegorz Jakacki                  Huada Electronic Design      #
# Senior Engineer, CAD Dept.        1 Gaojiayuan, Chaoyang       #
# tel. +86-10-64365577 x2074        Beijing 100015, China        #
# Copyright (C) 2002 Grzegorz Jakacki, HED. All Rights Reserved. #
##################################################################
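The "node type that makes it obvious that the parser does not know" idea from the reply above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: Kind, Node, parse_ambiguous and resolve are hypothetical names, not OpenC++ classes, and the set of type names stands in for whatever binding information a real environment would provide.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical, heavily simplified stand-ins; none of these are OpenC++ types.
enum class Kind {
    AmbiguousCastOrInfix,  // parser could not decide
    CastThenMultiply,      // ((name) +x) * y
    AddOfProduct           // (name) + (x * y)
};

struct Node {
    Kind kind;
    std::string name;      // the parenthesized identifier in "(name) + x * y"
};

// Step 1: the parser records that it does not know which form applies.
inline Node parse_ambiguous(const std::string& name) {
    return Node{Kind::AmbiguousCastOrInfix, name};
}

// Step 2: a later pass that knows the identifier bindings of the
// enclosing scope rewrites the node into one form or the other.
inline Node resolve(const Node& n, const std::set<std::string>& type_names) {
    if (n.kind != Kind::AmbiguousCastOrInfix) return n;  // already decided
    Kind decided = type_names.count(n.name) ? Kind::CastThenMultiply
                                            : Kind::AddOfProduct;
    return Node{decided, n.name};
}

inline void demo() {
    std::set<std::string> types;
    types.insert("size_t");
    assert(resolve(parse_ambiguous("size_t"), types).kind == Kind::CastThenMultiply);
    assert(resolve(parse_ambiguous("count"), types).kind == Kind::AddOfProduct);
}
```

The design point is that the parser never guesses; code that only sees resolved trees would not have to know the ambiguous kind exists.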
From: Stefan R. <sr...@ma...> - 2003-03-13 12:08:26
Hello,

On Thu, Mar 13, 2003 at 01:48:52PM +0800, Grzegorz Jakacki wrote:
> On Wed, 12 Mar 2003, Stefan Reuther wrote:
> > I have been writing a program which takes an OpenC++ parse tree
> > and translates it into a more regular output, i.e., user-defined
> > operators are turned into function calls, overload resolution is
> > done, etc. I have used OpenC++ only as a parser library (I'm
> > working with Michael Hohmuth).
>
> It's a pity that you do not implement it in OpenC++ itself.

The decision was to not change OpenC++ for the moment, and do
everything from the outside. This is advantageous from a
modularity point of view, but some of "my" code is a bit more
complicated than it could be. For example, as far as I can tell,
OpenC++'s symbol management doesn't do everything we need (like
overloaded functions, "int stat()" vs. "struct stat", ...; but feel
free to prove me wrong :-), so the solutions would be either to
change OpenC++, or to implement our own symbol table. We decided
on the latter, and leave the former as an option for the future.

> You are working on the Fiasco project, right? Are your sources open?

VFiasco, right. Sources are not yet open, because we did not yet
package them up nicely, and because the project is far from
complete. Other than that, there's no problem.

> > OpenC++ also does not recognize di- and trigraphs, nor does it
> > support replacement keywords (e.g. "and_eq" instead of "&="). I
> > think these are better handled in a preprocessor.
>
> I do not quite understand where the problem is here. OpenC++ works on
> preprocessor output, so it has no chance to see trigraphs & co., right?

gcc's preprocessor doesn't do digraphs ("<%" instead of "{"). It
also doesn't do replacement keywords, but at least these can be
worked around using some #defines, like C99 does it. I think
there are some subtle gotchas in these, but I would just declare
that these things are out of OpenC++'s scope (I don't care
whether "<%" stringifies as "<%" or "{"). And, honestly, I've
never seen anyone use them.

> > Another problem which I found quite hard to solve: take an
> > expression like this: "(name) + x * y". Depending upon whether
> > the "name" is a type or a variable, this either means
> >     ((name) +x) * y
> > (cast +x, then multiply) or
> >     (name) + (x * y)
> > (multiply, then add).
[...]
> The honest solution would be to have a node type that makes it
> obvious that the parser does not know what it is. Further analysis
> would change the node to one form or the other, based on the
> bindings of the relevant identifiers. I am not sure if this scheme
> is currently implementable without seriously compromising backward
> compatibility.

Okay, I had some similar things in mind. For compatibility, the
"Walker" could have a routine "TranslateAmbiguousCast" or
somesuch, whose default implementation calls
"TranslateCastExpr", "TranslatePrefixExpr" and
"TranslateInfixExpr".

Other than that, my (implicit) question was how many people
would cry when I change the parse tree somehow :-) At least, the
addition of "typeid" broke one of our regression tests. I think
static_cast & co. should have their own PtreeCxxStyleCast node,
which would probably change someone else's tests. Adding new
productions wouldn't break things immediately, but someone who
implements their own Walker might accidentally miss some nodes.
That problem could at least be solved by making an abstract
Walker base class where all the TranslateFooBar functions are
pure virtual.

> > The program which we finally want to run through the OpenC++
> > parser is an operating system kernel. Kernel hackers tend to use
> > gcc extensions a lot. What is your opinion on adding these to
> > OpenC++ (i.e., if I made a patch for them, would it make it
> > into OpenC++?). I'm talking about...
[...]
>
> I do not see any problems with letting those in, but perhaps under
> a switch.

Okay.

> > - maybe some __attribute__ should be handled? OTOH, it can
> >   simply be #defined out.
>
> This one I would leave out at the moment, especially if you do not
> have a clear requirement for it. Again, if you need it, go ahead and
> put it under the switch.

Maybe this can be solved more generally, as a parameterized user
keyword. If I recall correctly, such a thing is on the wishlist?
(I haven't yet dug through all of Michael's archive.)

  Stefan
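The two Walker ideas in the message above (a TranslateAmbiguousCast hook whose default falls back to the existing routines, and an abstract base class with pure virtual functions) might look roughly like this. It is a sketch only: the function names come from the message, but the signatures, the string-based "translation", the AmbiguousCast struct and the SymbolAwareWalker class are assumptions made for illustration and do not reflect the real OpenC++ Walker interface.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// Hypothetical node for "(name) + x * y"; not an OpenC++ Ptree class.
struct AmbiguousCast { std::string name, x, y; };

// Abstract base: forgetting to handle a node kind is a compile error.
class AbstractWalker {
public:
    virtual ~AbstractWalker() {}
    virtual std::string TranslateCastExpr(const std::string& type,
                                          const std::string& operand) = 0;
    virtual std::string TranslatePrefixExpr(const std::string& op,
                                            const std::string& operand) = 0;
    virtual std::string TranslateInfixExpr(const std::string& lhs,
                                           const std::string& op,
                                           const std::string& rhs) = 0;
    virtual std::string TranslateAmbiguousCast(const AmbiguousCast& n) = 0;
};

// Default walker: the new hook reproduces the legacy reading
// ((name) +x) * y by calling the three existing routines.
class Walker : public AbstractWalker {
public:
    std::string TranslateCastExpr(const std::string& type,
                                  const std::string& operand) override {
        return "cast<" + type + ">(" + operand + ")";
    }
    std::string TranslatePrefixExpr(const std::string& op,
                                    const std::string& operand) override {
        return op + operand;
    }
    std::string TranslateInfixExpr(const std::string& lhs, const std::string& op,
                                   const std::string& rhs) override {
        return "(" + lhs + " " + op + " " + rhs + ")";
    }
    std::string TranslateAmbiguousCast(const AmbiguousCast& n) override {
        return TranslateInfixExpr(
            TranslateCastExpr(n.name, TranslatePrefixExpr("+", n.x)), "*", n.y);
    }
};

// A client with its own symbol table overrides only the new hook.
class SymbolAwareWalker : public Walker {
public:
    explicit SymbolAwareWalker(std::set<std::string> types)
        : types_(std::move(types)) {}
    std::string TranslateAmbiguousCast(const AmbiguousCast& n) override {
        if (types_.count(n.name)) return Walker::TranslateAmbiguousCast(n);
        return TranslateInfixExpr(n.name, "+", TranslateInfixExpr(n.x, "*", n.y));
    }
private:
    std::set<std::string> types_;
};

inline void demo() {
    AmbiguousCast node{"name", "x", "y"};
    Walker legacy;
    assert(legacy.TranslateAmbiguousCast(node) == "(cast<name>(+x) * y)");
    SymbolAwareWalker as_variable{std::set<std::string>{}};
    assert(as_variable.TranslateAmbiguousCast(node) == "(name + (x * y))");
}
```

Existing clients that derive from Walker keep the old behavior without changes, while new clients deriving from AbstractWalker are forced by the compiler to handle every node kind.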
From: Grzegorz J. <ja...@he...> - 2003-03-14 02:53:48
On Thu, 13 Mar 2003, Stefan Reuther wrote:

> > You are working on the Fiasco project, right? Are your sources open?
>
> VFiasco, right. Sources are not yet open, because we did not yet
> package them up nicely, and because the project is far from
> complete. Other than that, there's no problem.
>
> > > OpenC++ also does not recognize di- and trigraphs, nor does it
> > > support replacement keywords (e.g. "and_eq" instead of "&="). I
> > > think these are better handled in a preprocessor.
> >
> > I do not quite understand where the problem is here. OpenC++ works on
> > preprocessor output, so it has no chance to see trigraphs & co., right?
>
> gcc's preprocessor doesn't do digraphs ("<%" instead of "{"). It
> also doesn't do replacement keywords, but at least these can be
> worked around using some #defines, like C99 does it. I think
> there are some subtle gotchas in these, but I would just declare
> that these things are out of OpenC++'s scope (I don't care
> whether "<%" stringifies as "<%" or "{"). And, honestly, I've
> never seen anyone use them.

There is a brand-new preprocessor based on the Spirit parser; you may
want to have a look at:

http://groups.google.com/groups?dq=&hl=en&lr=&ie=UTF-8&group=comp.compilers&selm=03-03-046%40comp.compilers

However, I am not sure if it handles trigraphs.

> > > Another problem which I found quite hard to solve: take an
> > > expression like this: "(name) + x * y". Depending upon whether
> > > the "name" is a type or a variable, this either means
> > >     ((name) +x) * y
> > > (cast +x, then multiply) or
> > >     (name) + (x * y)
> > > (multiply, then add).
> [...]
> > The honest solution would be to have a node type that makes it
> > obvious that the parser does not know what it is. Further analysis
> > would change the node to one form or the other, based on the
> > bindings of the relevant identifiers. I am not sure if this scheme
> > is currently implementable without seriously compromising backward
> > compatibility.
>
> Okay, I had some similar things in mind. For compatibility, the
> "Walker" could have a routine "TranslateAmbiguousCast" or
> somesuch, whose default implementation calls
> "TranslateCastExpr", "TranslatePrefixExpr" and
> "TranslateInfixExpr".

The problem is that an occ client should be isolated from this
detail. Somebody using Walker should not care about it; it should be
taken care of by the framework. I think there is a problem now, since
there is no isolation between the syntax tree that OpenC++ is working
on and the syntax tree that is exposed to the clients. I have been
working on this for some time and I think I have a couple of ideas on
how to provide isolation here without runtime overhead, but I have not
yet run those ideas past many people. Perhaps this is a good time and
a good place to do it. I will try to prepare a write-up over the
weekend and post it here. I am not sure if it will be immediately
applicable or what the exact backward compatibility issues are, but
maybe people here will help figure this out.

> Other than that, my (implicit) question was how many people
> would cry when I change the parse tree somehow :-)

Just wait for a week to see.

> At least, the
> addition of "typeid" broke one of our regression tests.

This is precisely why I would like the isolation.

> I think
> static_cast & co. should have their own PtreeCxxStyleCast node,
> which would probably change someone else's tests. Adding new
> productions wouldn't break things immediately, but someone who
> implements their own Walker might accidentally miss some nodes.
> That problem could at least be solved by making an abstract
> Walker base class where all the TranslateFooBar functions are
> pure virtual.

Right. However, this causes problems for all the code that
instantiates Walker: it has to be fixed by hand or, better yet,
switched to a Factory-based solution.

> Maybe this can be solved more generally, as a parameterized user
> keyword.

Why not.

> If I recall correctly, such a thing is on the wishlist?
> (I haven't yet dug through all of Michael's archive.)

Unfortunately I do not maintain a wishlist, so I cannot verify.

Best regards
Grzegorz
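To illustrate the remark above about code that instantiates Walker: once the base class has pure virtual functions it can no longer be created directly, so call sites need either a concrete subclass or a factory. The sketch below shows the factory variant under stated assumptions; AbstractWalker, DefaultWalker, MakeWalker and the single TranslateTypeid function are hypothetical and much smaller than any real OpenC++ interface.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical abstract interface; "AbstractWalker w;" would not compile.
class AbstractWalker {
public:
    virtual ~AbstractWalker() {}
    virtual std::string TranslateTypeid(const std::string& operand) = 0;
};

// Concrete default supplied by the framework.
class DefaultWalker : public AbstractWalker {
public:
    std::string TranslateTypeid(const std::string& operand) override {
        return "typeid(" + operand + ")";
    }
};

// Hypothetical factory: client code asks for a walker instead of naming a
// concrete class, so the framework can later change what it returns.
inline std::unique_ptr<AbstractWalker> MakeWalker() {
    return std::unique_ptr<AbstractWalker>(new DefaultWalker());
}

inline void demo() {
    std::unique_ptr<AbstractWalker> walker = MakeWalker();
    assert(walker->TranslateTypeid("T") == "typeid(T)");
}
```

With this arrangement, adding a new pure virtual function affects only the classes that derive from the interface, not the code that merely obtains and uses a walker.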