Re: [Flex-help] Python Lexical Analysis
From: Marcel L. <ma...@la...> - 2010-08-03 06:36:24
Can't you have a global stack in your scanner to keep track of the
indentation levels? Something like:

    [ \t]+ {
        /* Compare this line's leading whitespace with the current level. */
        size_t top_len = yyextra->indent_stack.top();
        if (top_len < (size_t)yyleng) {
            yyextra->indent_stack.push(yyleng);
            return INDENT;
        } else if (top_len > (size_t)yyleng) {
            /* Pops a single level only; see the caveat below. */
            yyextra->indent_stack.pop();
            return DEDENT;
        }
    }

You will need to add proper start conditions to only let this fire after
you return a NEWLINE. Additionally you need to handle cases like this:

    foo
        bar
            etc
      var

I think after scanning "etc" it should return DEDENT, DEDENT, INDENT; but
the sample lex example above won't handle that correctly. You should be
able to pull this off with clever start conditions, or YY_USER_ACTION
however.

On Tue, 3 Aug 2010 06:19:59 +0100, Philip Herron <her...@go...> wrote:
> Hey guys
>
> I am writing a flex and bison implementation of the Python parser, and
> coming up on a problem, thinking I may need to revert to a hand
> written lexer, but I would like the insight of more experienced flex
> guys.
>
> The problem is the suite grammar, which is the grammar for a block.
> Example:
>
> def foo ( .. ) :
>     <code_block>
>
> I am not going to talk about grammar, but what I want to illustrate is
> the problem in figuring out indentation. The grammar for something like
> this is as follows:
>
> http://docs.python.org/release/2.5.2/ref/grammar.txt
>
> funcdef ::=
>     [decorators] "def" funcname "(" [parameter_list] ")"
>     ":" suite
>
> suite ::=
>     stmt_list NEWLINE
>     | NEWLINE INDENT statement+ DEDENT
>
> The problem is on the lexical side of things, figuring out what
> is INDENT and DEDENT, so for this parser I am requiring 4 spaces for
> an indent because that's what Emacs python-mode is doing for me for
> now anyways.
> So reading up on:
>
> http://docs.python.org/reference/lexical_analysis.html
>
> In the indentation part they use a stack to figure out the indentation
> levels. First off, 0 is pushed onto the stack as a kind of initializer
> or baseline for the system. Then, if we find an indentation on a new
> logical line, we push 1 onto the stack; if we find multiple, we need to
> check that that level of indentation exists on the stack, and so on.
> You get the idea; you need to read that little paragraph. This is all
> to figure out when to generate a DEDENT token, which is the real crux
> of the problem.
>
> The problem I am having implementing this is that really everything
> revolves around these two flex rules:
>
> "\n" { return NEWLINE; }
> "    " { return INDENT; }
>
> The problem is that there is no lexical token that we actually read in
> the file for DEDENT, so my idea so far is either to create a handwritten
> lexer or do something like:
>
> "\n" { vec_push( 0 ); return NEWLINE; }
> "    " { vec_head->indent++; return INDENT; }
>
> Then with newline I can do some if checks to figure out if there was a
> dedent, but the problem is I will need things like return DEDENT then
> immediately after return INDENT or NEWLINE, and C won't allow multiple
> returns in one code block ;)
>
> So then I could make a general token stack for what to actually
> return in flex, but this all sounds very complicated with lots of
> vector work, which I think I could do, but it's not the most pleasant of
> solutions. Maybe you guys would have some insight.
>
> --Phil
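
For what it's worth, here is a rough sketch of the "token queue" idea
from the original message, written as plain C rather than as flex rules
so that it compiles on its own. Everything named here (IndentState,
indent_init, indent_line, indent_next, the TOK_* codes, MAX_DEPTH) is
made up for illustration and is not part of flex or bison. It assumes
the caller already knows the width of the leading whitespace on each
logical line, and it follows the rule from the Python reference: a
dedent has to land on a width that is already on the stack, so a line
that dedents to a width in between two levels is queued as an error
rather than as DEDENT followed by INDENT.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes; a real parser would use the bison-generated ones. */
enum { TOK_NONE = 0, TOK_INDENT, TOK_DEDENT, TOK_ERROR };

#define MAX_DEPTH 64  /* assumed nesting limit for this sketch */

typedef struct {
    size_t levels[MAX_DEPTH];  /* indentation stack; levels[0] is always 0 */
    size_t depth;              /* number of entries in levels */
    int    pending[MAX_DEPTH]; /* tokens queued but not yet returned */
    size_t head, tail;         /* pending[head..tail) is still queued */
} IndentState;

static void indent_init(IndentState *s) {
    s->levels[0] = 0;          /* baseline level, never popped */
    s->depth = 1;
    s->head = s->tail = 0;
}

/* Call once per logical line with the width of its leading whitespace.
   Queues one INDENT, or one DEDENT per popped level, plus an ERROR if
   the new width matches no level left on the stack. */
static void indent_line(IndentState *s, size_t width) {
    assert(s->head == s->tail);        /* previous tokens must be drained */
    s->head = s->tail = 0;

    if (width > s->levels[s->depth - 1]) {
        assert(s->depth < MAX_DEPTH);
        s->levels[s->depth++] = width;
        s->pending[s->tail++] = TOK_INDENT;
        return;
    }
    /* levels[0] == 0 stops the loop, so depth never drops below 1. */
    while (s->levels[s->depth - 1] > width) {
        s->depth--;
        s->pending[s->tail++] = TOK_DEDENT;
    }
    if (s->levels[s->depth - 1] != width)
        s->pending[s->tail++] = TOK_ERROR;  /* inconsistent dedent */
}

/* Returns the next queued token, or TOK_NONE when the queue is empty. */
static int indent_next(IndentState *s) {
    if (s->head == s->tail)
        return TOK_NONE;
    return s->pending[s->head++];
}
```

This gets around the "multiple returns" problem because the scanner
returns one queued token per call and only goes back to reading input
once indent_next() gives TOK_NONE. In a flex scanner that check could
probably live in the code placed before the first rule, which runs every
time the scanning routine is entered, with indent_line() called from
whatever rule matches the leading whitespace after a NEWLINE. How
exactly to wire that up (start conditions, YY_USER_ACTION) is left open
here.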