Re: [Flex-help] Python Lexical Analysis
flex is a tool for generating scanners
Brought to you by:
wlestes
From: Daniel Varga-H. <var...@ov...> - 2010-08-03 12:09:49
|
I am not, by all means, an expert on Flex and without giving much thought to it, I would possibly check every line how many time it's been indented. Whenever it is indented I would enter a start condition and when that happens increment a global, like YY_EXTRA. Something like: <ST_INDENT> " " { YY_EXTRA++; BEGIN(ST_INDENT); } This way YY_EXTRA would contain the number of indentions and whenever your LEXER returns a token to BISON, you can examine the YY_EXTRA's value in the grammar file to see the level of indention. This is quite barbaric I suppose and might not even work. It's been like a year since I last used Flex, but I hope I could give you another perspective even if the practical things aren't accurate as they are now. As well I have found some cookies for you :) http://docs.python.org/release/2.5.2/ref/grammar.txt Regards, Daniel On Tue, Aug 3, 2010 at 8:16 AM, Marcel Laverdet <ma...@la...> wrote: > > > Can't you have a global stack in your scanner to keep track of the > indentation levels? Something like: > > [ \t]+ { > size_t top_len = yyextra->indent_stack.top(); > if (top_len < yyleng) { > yyextra->indent_stack.push(yyleng); > return INDENT; > } else if (top_len > yyleng) { > yyextra->indent_stack.pop(); > return DEDENT; > } > } > > You will need to add proper start conditions to only let this fire after > you return a NEWLINE. Additionally you need to handle cases like this: > > foo > bar > etc > var > > I think after scanning "etc" it should return DEDENT, DEDENT, INDENT; but > the sample lex example above won't handle that correctly. You should be > able to pull this off with clever start conditions, or YY_USER_ACTION > however. > > On Tue, 3 Aug 2010 06:19:59 +0100, Philip Herron > <her...@go...> wrote: >> Hey guys >> >> I am writing a flex and bison implementation of the python parser, and >> coming up on a problem, thinking i may need to revert to a hand >> written lexer, but i would like the insight of more experienced flex >> guys. >> >> The problem being is the suite grammar which is the grammar for a block >> example: >> >> def foo ( .. ) : >> <code_block> >> >> I am not going to talk about grammar but what i want to illustrate is >> the problem in figuring out indentation the grammar for something like >> this is as follows: >> >> http://docs.python.org/release/2.5.2/ref/grammar.txt >> >> funcdef ::= >> [decorators] "def" funcname "(" [parameter_list] ")" >> ":" suite >> >> suite ::= >> stmt_list NEWLINE >> | NEWLINE INDENT statement+ DEDENT >> >> The problem being is on the lexical side of things figuring out what >> is INDENT and DEDENT, so for this parser i am requiring 4 spaces for >> an indent because that's what emacs python-mode is doing for me for >> now anyways. So reading up on: >> >> http://docs.python.org/reference/lexical_analysis.html >> >> The indentation part they use a stack to figure out the indentation >> levels, so first off 0 is pushed onto a stack as a kind of initializer >> or baseline for the system, then if we find an indentation on a new >> logical line we push 1 onto the stack if we find multiple we need to >> check that level of indentation exists on the stack and so on you get >> the idea you need to read that little paragraph, this is all to figure >> out when to generate a DEDENT token which is the real crux of the >> problem. >> >> The problem i am having implementing this is really everything >> revolves arount these to flex rules: >> >> "\n" { return NEWLINE; } >> " " { return INDENT; } >> >> The problm being there is no lexical token that we actually read in >> the file for DEDENT, so my idea is so far either create a handwritten >> lexer or do somthing like: >> >> "\n" { vec_push( 0 ); return NEWLINE; } >> " " { vec_head->indent++; return INDENT; } >> >> Then with newline i can do some if checks to figure out if there was a >> dedent, but the problem is i will need things like return DEDENT then >> immediately after return INDENT or NEWLINE, and C wont allow multiple >> returns in one code block ;) >> >> So then i could then make a general token stack for what to actually >> return in flex but this all sounds very complicated with lots of >> vector work which i think i could do but its not the most pleasant of >> solutions maybe you guys would have some insight. >> >> --Phil >> >> > ------------------------------------------------------------------------------ >> The Palm PDK Hot Apps Program offers developers who use the >> Plug-In Development Kit to bring their C/C++ apps to Palm for a share >> of $1 Million in cash or HP Products. Visit us here for more details: >> http://p.sf.net/sfu/dev2dev-palm >> _______________________________________________ >> Flex-help mailing list >> Fle...@li... >> https://lists.sourceforge.net/lists/listinfo/flex-help > > ------------------------------------------------------------------------------ > The Palm PDK Hot Apps Program offers developers who use the > Plug-In Development Kit to bring their C/C++ apps to Palm for a share > of $1 Million in cash or HP Products. Visit us here for more details: > http://p.sf.net/sfu/dev2dev-palm > _______________________________________________ > Flex-help mailing list > Fle...@li... > https://lists.sourceforge.net/lists/listinfo/flex-help > |