Thread: [Oopic-compiler-devel] Compiler design
From: Neil D. <nd...@us...> - 2004-05-12 00:45:40
Did either of you have any ideas on the overall design of the compiler?

I've been re-reading my book on Flex/Bison, and it would definitely be a big time-saver to use those tools. They would generate the C code for parsing the grammar we provide (in a simple notation), and allow us to call functions when the various semantic constructs are encountered. That way we can build up a parse tree and populate symbol tables, and then subsequently walk the tree generating code. We could then add an optimisation stage that checks through the Forth-like codes to see if anything silly is being done.

This is the basic idea I have in mind, although I know there's a heck of a lot of detail I've just hopped over! I'd be interested to hear how you guys envisioned the compiler being structured.

Thoughts?

Neil
--
Neil Durant
<nd...@us...>
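The optimisation stage Neil mentions, a pass over the Forth-like codes looking for "anything silly", is commonly done as a peephole pass. The following is only a minimal sketch of that idea: the opcode names (PUSH, DROP, DUP, SWAP) are hypothetical placeholders and not the real OOPic byte-codes, and it assumes straight-line code with no jump targets between the instructions being removed.

```python
# Hypothetical stack-machine opcodes, NOT the real OOPic byte-codes.
# Each instruction is a tuple: ("PUSH", value), ("DROP",), ("DUP",), ("SWAP",).

def cancels(prev, ins):
    """Return True if executing prev then ins has no net effect on the stack."""
    if ins == ("DROP",) and prev[0] in ("PUSH", "DUP"):
        return True  # a value is pushed and immediately thrown away
    if ins == ("SWAP",) and prev == ("SWAP",):
        return True  # swapping twice restores the original order
    return False


def peephole(code):
    """Remove adjacent instruction pairs that cancel each other out.

    The output list doubles as a stack, so removing one pair can expose
    a new cancelling pair, which is then removed as well.
    Assumes no instruction in a removed pair is a jump target.
    """
    out = []
    for ins in code:
        if out and cancels(out[-1], ins):
            out.pop()
        else:
            out.append(ins)
    return out
```

For instance, under these assumptions `peephole([("PUSH", 1), ("SWAP",), ("SWAP",), ("DROP",), ("PUSH", 2)])` reduces to a single `("PUSH", 2)`: the two swaps cancel first, which then leaves the push directly followed by the drop.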
From: D. D. M. <dd...@mc...> - 2004-05-12 01:19:02
A couple of quick thoughts.

1. Lack of support for the three syntax styles Savage implemented is probably a non-starter (unless we have the IDE selecting between the OOpicMK compiler and this one). As it is now, the user must specify the syntax; we can require the same as a command-line option. Or possibly use a token in the source code, similar to what Parallax did with their new PBasic syntax and the firmware revisions, and let the preprocessor pick it out.

2. After that, the grammar for one (or many) arbitrary syntaxes could be defined.

3. In all cases, the compiler tools yield a similar internal representation that is subjected to code generation and optimization. For the original scripts' syntax, should we generate identical code to the OOpicMK or use our own?

4. Does the grammar specification require some (or deep) knowledge of the possible syntactical structures the native byte-codes would permit? If so, then I should get some more work done.

Daniel

----- Original Message -----
From: "Neil Durant" <nd...@us...>
To: "OOPic Compiler List" <oop...@li...>
Sent: Tuesday, 11 May 2004 20:45
Subject: [Oopic-compiler-devel] Compiler design

> Did either of you have any ideas on the overall design of the compiler?
> I've been re-reading my book on Flex/Bison, and it would definitely be a
> big time-saver using those tools. They would generate the C code for
> parsing the grammar we provide (in a simple notation), and allow us to call
> functions when the various semantic constructs are encountered. That way
> we can build up a parse tree and populate symbol tables, and then
> subsequently walk the tree generating code. We could then add an
> optimisation stage that checks through the Forth-like codes to see if
> anything silly is being done.
>
> This is the basic idea I have in mind, although I know there's a heck of a
> lot of detail I've just hopped over! I'd be interested to hear how you
> guys envisioned how the compiler would be structured.
>
> Thoughts?
>
> Neil
> --
> Neil Durant
> <nd...@us...>
>
> _______________________________________________
> Oopic-compiler-devel mailing list
> Oop...@li...
> https://lists.sourceforge.net/lists/listinfo/oopic-compiler-devel
From: Neil D. <nd...@us...> - 2004-05-12 01:49:43
D. Daniel McGlothin wrote:
> A couple of quick thoughts.
>
> 1. lack of support for the three syntax styles Savage implemented is
> probably a non-starter (unless we have the IDE selecting between the OOpicMK
> compiler and this one). As it is now, the user must specify the syntax; we
> can require the same as a command line. Or possibly use a token in the
> sourcecode similar to what Parallax did with their new PBasic syntax and the
> firmware revisions, and let the preprocessor pick it out.

Agreed. We could also have the syntax selected implicitly based on the file extension. There's no reason why we need to stick with .osc file extensions - we could use .c, .bas, .java or whatever. Or perhaps a combination: if it's explicitly specified on the command line, use that; if not, check for a token in the source; and failing that, use the file extension. And failing that, assume it's C!

> 2. After those, the grammar for an (or many) arbitrary syntax could be
> defined.
>
> 3. In all cases, the compiler tools yield a similar internal representation
> that is subjected to code generation and optimization.

Agreed. Many compilers that support multiple input languages (e.g. gcc) separate the internal representation generation from the code generation and optimisation, and I think it would be wise for us to do the same. I think C, BASIC and Java are similar enough to allow an identical internal representation. We would effectively have a number of different "front ends" that convert input code into an internal structure.

> For the original scripts' syntax, should we generate identical code to
> the OOpicMK or use our own?

I'm not sure I understand what you mean (it's 2.44am here - brain fade!)

> 4. Does the grammar specification require some (or deep) knowledge of the
> possible syntactical structures the native byte-codes would permit? If so,
> then I should get some more work done.

I don't think so - the grammar specification should be pretty much like the standard C grammar (for C language support), which we can obtain from C syntax documents in various places. The format Flex/Bison take is very similar to the notation used in syntax standards documents, so it should be fairly straightforward to implement a full C parser. For BASIC and Java I guess we just play it by ear...!

My understanding is that the grammar specification should be fairly well isolated from the native byte-code. The byte code looks sufficiently low-level to be flexible. We know it's possible to implement the current OOPIC scripting language constructs in OOPIC byte code, after all. Perhaps we'll discover things that are hard or even impossible to implement in byte code, but we may only find out when we try....

Neil
--
Neil Durant
<nd...@us...>
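The fallback order Neil describes for choosing the input syntax (command line, then a token in the source, then the file extension, then C) could be sketched roughly as below. This is an illustration only: the `{$SYNTAX ...}` directive format, the function name `pick_syntax`, and the extension table are all assumptions made for the example, not anything the thread or the existing OOPic tools define.

```python
import os
import re

# Hypothetical mapping; the thread only suggests .c, .bas and .java as options.
EXTENSIONS = {".c": "c", ".bas": "basic", ".java": "java"}

# Hypothetical in-source directive such as:  ' {$SYNTAX BASIC}
SYNTAX_TOKEN = re.compile(r"\{\$SYNTAX\s+(\w+)\}", re.IGNORECASE)


def pick_syntax(filename, source, cli_syntax=None):
    """Choose the input language using the fallback order from the thread."""
    # 1. An explicit command-line choice wins.
    if cli_syntax:
        return cli_syntax.lower()
    # 2. Otherwise look for a directive token in the source text.
    match = SYNTAX_TOKEN.search(source)
    if match:
        return match.group(1).lower()
    # 3. Otherwise go by file extension, and 4. failing that, assume C.
    ext = os.path.splitext(filename)[1].lower()
    return EXTENSIONS.get(ext, "c")
```

With this ordering, a file named `robot.bas` is treated as BASIC unless a directive or command-line option says otherwise, and an `.osc` file with no other hint falls through to C.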
From: D. D. M. <dd...@mc...> - 2004-05-12 02:04:32
> > For the original scripts' syntax, should we generate identical code to
> > the OOpicMK or use our own?
>
> I'm not sure I understand what you mean (it's 2.44am here - brain fade!)

The point is whether there are two code generators or one.

One for the project's compiler/code generation; another one (or three) for the original Savage implementations of the C, Basic, and Java syntaxes.

I'd opt for one only. Asking only for completeness of discussion.

Good night <grin>

Daniel
From: Neil D. <nd...@us...> - 2004-05-12 02:15:12
D. Daniel McGlothin wrote:
> > > For the original scripts' syntax, should we generate identical code to
> > > the OOpicMK or use our own?
> >
> > I'm not sure I understand what you mean (it's 2.44am here - brain fade!)
>
> The point is whether there is two code generators or one.
>
> One for the project's compiler/code generation; another one (or three) for
> the original Savage implementations of C, Basic, and Java syntaxes.
>
> I'd opt for one only. Asking only for completeness of discussion.

I think we'll be able to handle the original Savage C/BASIC/Java syntaxen using a preprocessor like Andy's, which would then output grammar which is compatible with our new compiler.

Of course, we should be trying to persuade people to switch to our more complete and standard syntax, but we should also aim to support the Savage "legacy" syntax... :-)

> Good night <grin>

<Yawn> Good night!

Neil
--
Neil Durant
<nd...@us...>
From: Andrew P. <wa...@ic...> - 2004-05-12 14:21:28
At 03:15 AM 5/12/2004 +0100, Neil Durant wrote:
>I think we'll be able to handle the original Savage C/BASIC/Java syntaxen
>using a preprocessor like Andy's, which would then output grammar which is
>compatible with our new compiler.

I would expect that you could handle the original syntax by taking your definitions for C / BASIC / Java and just modifying them a bit. IOW, you would have 6 languages to support.

>Of course, we should be trying to persuade people to switch to our more
>complete and standard syntax, but we should also aim to support the Savage
>"legacy" syntax... :-)

Agreed.

...Andy
From: Neil D. <nd...@us...> - 2004-05-12 19:17:24
Andrew Porrett wrote:
> At 03:15 AM 5/12/2004 +0100, Neil Durant wrote:
> >I think we'll be able to handle the original Savage C/BASIC/Java syntaxen
> >using a preprocessor like Andy's, which would then output grammar which is
> >compatible with our new compiler.
>
> I would expect that you could handle the original syntax by taking your
> definitions for C / BASIC / Java and just modifying them a bit. IOW, you
> would have 6 languages to support.

That would be the other way of doing it. Ironically, that would mean that some of the grammar-checking rules for the Savage C language module would have to spit out an error for certain perfectly legal C syntax... :-)

I say we go for our ideal standard C to begin with, almost as a proof of concept. Then, once we're happy it's all working, the other 5 languages (or more!) can be filled in.

Neil
--
Neil Durant
<nd...@us...>
From: Andrew P. <wa...@ic...> - 2004-05-12 21:12:39
At 08:17 PM 5/12/2004 +0100, Neil Durant wrote:
>I say we go for our ideal standard C to begin with, almost as a proof of
>concept. Then once we're happy it's all working then the other 5 languages
>(or more!) can be filled in.

Works for me!

...Andy
From: Neil D. <nd...@us...> - 2004-05-12 22:26:19
Andrew Porrett wrote:
> At 08:17 PM 5/12/2004 +0100, Neil Durant wrote:
> >I say we go for our ideal standard C to begin with, almost as a proof of
> >concept. Then once we're happy it's all working then the other 5 languages
> >(or more!) can be filled in.
>
> Works for me!

Excellent!

OK, so if this project is going to fall naturally into two development halves - me doing the parsing and internal representation building, and you (Andy) doing the code generation and optimization - then we're going to have to discuss what data structures you're going to need to start generating code. This will then allow me to get going on the parsing side of things.

Have you had a chance to think about what kinds of data structures you'll need for code generation?

Neil
--
Neil Durant
<nd...@us...>
From: Andrew P. <wa...@ic...> - 2004-05-12 14:21:27
At 09:13 PM 5/11/2004 -0400, D. Daniel McGlothin wrote:
>A couple of quick thoughts.
>
>1. lack of support for the three syntax styles Savage implemented is
>probably a non-starter (unless we have the IDE selecting between the OOpicMK
>compiler and this one). As it is now, the user must specify the syntax; we
>can require the same as a command line. Or possibly use a token in the
>sourcecode similar to what Parallax did with their new PBasic syntax and the
>firmware revisions, and let the preprocessor pick it out.

Using the existing compiler's command line flags makes sense (it makes ours a drop-in replacement). Nothing wrong with supporting a token in the source code as well (it would override the command line).

As I think about it, I like the idea of simply writing a trivial prescanner that determines what language the user picked. If you find "sub void", it ain't real C, etc. Lots of semicolons means it's not BASIC, etc.

>2. After those, the grammar for an (or many) arbitrary syntax could be
>defined.
>
>3. In all cases, the compiler tools yield a similar internal representation
>that is subjected to code generation and optimization. For the original
>scripts' syntax, should we generate identical code to the OOpicMK or use our
>own?

The currently generated code is pretty bad. I'm sure we can do better.

>4. Does the grammar specification require some (or deep) knowledge of the
>possible syntactical structures the native byte-codes would permit? If so,
>then I should get some more work done.

The native bytecodes will tell us what we can and can't do. I see the bytecode map as step #1.

...Andy
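Andy's "trivial prescanner" could look something like the sketch below. It encodes only the two clues he gives ("sub void" rules out real C, lots of semicolons rules out BASIC) and reports them as hints rather than a final verdict. The function name, the returned keys, and the one-third threshold for "lots of semicolons" are arbitrary assumptions for illustration.

```python
def prescan(source):
    """Collect rough hints about which syntax a source file is written in.

    Returns a dict of hints rather than a single answer, because the two
    heuristics from the thread can only rule languages out.
    """
    lines = [ln.strip() for ln in source.splitlines() if ln.strip()]

    # Clue 1: "sub void" appears in the legacy syntax, never in real C.
    has_sub_void = any("sub void" in ln.lower() for ln in lines)

    # Clue 2: many statement-terminating semicolons suggest it is not BASIC.
    # The one-third threshold is an assumption, not a measured value.
    ending_in_semicolon = sum(1 for ln in lines if ln.endswith(";"))
    many_semicolons = bool(lines) and ending_in_semicolon * 3 >= len(lines)

    return {
        "real_c_possible": not has_sub_void,
        "basic_possible": not many_semicolons,
    }
```

A real prescanner would presumably need more clues than these two to separate all six dialects discussed in the thread; this only shows the shape of the idea.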
From: Andrew P. <wa...@ic...> - 2004-05-12 01:25:15
Don't know very much about compiler design. Assumed we would do pretty much what you outlined. I expect my expertise will come into play more when we get around to code generation and optimization.

At 01:45 AM 5/12/2004 +0100, Neil Durant wrote:
>Did either of you have any ideas on the overall design of the compiler?
From: Neil D. <nd...@us...> - 2004-05-12 02:01:49
My expertise is on the other end - I have written a number of parsers and code to generate internal representations from an input grammar. So it seems our expertise lies in three convenient areas:

Andy: Code generation/optimisation
Daniel: Reverse-engineering and op-code specialist
Me: Parsing of the input language

So perhaps we should work out the boundary between the "front end" part of the compiler, i.e. the language parsing stuff, and the code generation side. If we can come up with a design for the interface between those two modules, and a rough design for the data structures that the parser will generate to pass on for code generation, then we can make a start.

Parsers for compilers generally populate a tree structure, where each node represents some semantic aspect of the code. For example, one node might represent a "for" loop, and could have properties such as pointers into a symbol table for the counter variable, and links to other nodes representing the test and iteration expressions. It would also have a child node representing the block of code to loop over. That child node would have other child nodes representing the code within the block, and so on.

I guess we need to work out what structures will be required to generate the code. Off the top of my head, we would need at least:

A tree structure for the program itself
A symbol table for variables (with type information)

I'm sure we'll need a lot more than this, but my brain is begging for sleep!! Andy, can you think what other structures we'll need to start code generation?

Neil

Andrew Porrett wrote:
> Don't know very much about compiler design. Assumed we would do pretty
> much what you outlined. I expect my expertise will come into play more
> when we get around to code generation and optimization.
>
> At 01:45 AM 5/12/2004 +0100, Neil Durant wrote:
> >Did either of you have any ideas on the overall design of the compiler?

--
Neil Durant
<nd...@us...>
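The two structures Neil lists, a program tree and a symbol table with type information, might be sketched as follows, using his "for" loop node as the example. Everything here is hypothetical: the class names, the `kind` strings, and the choice to store the test, iteration expression, and body as ordered children are assumptions for illustration, not a design the thread has agreed on.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class Symbol:
    """One symbol-table entry: a variable name plus its type."""
    name: str
    type_name: str


class SymbolTable:
    """Maps variable names to Symbol entries (single flat scope)."""

    def __init__(self) -> None:
        self._symbols: Dict[str, Symbol] = {}

    def declare(self, name: str, type_name: str) -> Symbol:
        if name in self._symbols:
            raise KeyError(f"{name!r} is already declared")
        self._symbols[name] = Symbol(name, type_name)
        return self._symbols[name]

    def lookup(self, name: str) -> Symbol:
        return self._symbols[name]


@dataclass
class Node:
    """A parse-tree node; `symbol` points into the symbol table when relevant."""
    kind: str
    symbol: Optional[Symbol] = None
    children: List["Node"] = field(default_factory=list)


def make_for(counter: Symbol, test: Node, step: Node, body: Node) -> Node:
    """Build a 'for' node: counter symbol, then test, iteration and body children."""
    return Node("for", symbol=counter, children=[test, step, body])


def walk(node: Node, depth: int = 0) -> Iterator[Tuple[int, str]]:
    """Visit the tree depth-first, as a code generator would, yielding (depth, kind)."""
    yield depth, node.kind
    for child in node.children:
        yield from walk(child, depth + 1)
```

A code generator on Andy's side would then replace `walk` with a traversal that emits byte-codes per node kind, which is one way of drawing the front-end/back-end boundary Neil asks about.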
From: Andrew P. <wa...@ic...> - 2004-05-12 14:21:29
At 03:01 AM 5/12/2004 +0100, Neil Durant wrote:
>I'm sure we'll need a lot more than this, but my brain is begging for
>sleep!! Andy, can you think what other structures we'll need to start code
>generation?

Eh? It's 10 AM and my brain isn't awake yet.

...Andy