Re: [Flex-help] Python Lexical Analysis

Can't you keep a stack in your scanner's state (yyextra, with a reentrant
scanner) to keep track of the indentation levels? Something like:

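[ \t]+ {
  /* compare as size_t to match the stack's element type */
  size_t len = (size_t) yyleng;
  size_t top_len = yyextra->indent_stack.top();
  if (top_len < len) {
    /* deeper than the enclosing block: open a new level */
    yyextra->indent_stack.push(len);
    return INDENT;
  } else if (top_len > len) {
    /* shallower: this pops only ONE level per match (see below) */
    yyextra->indent_stack.pop();
    return DEDENT;
  }
  /* same level: no token, the whitespace is simply skipped.
     Note [ \t]+ cannot match an empty indent, so a return to
     column 0 is never seen by this rule. */
}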

You will need to add proper start conditions so that this only fires right
after you return a NEWLINE. Additionally, you need to handle cases like this:

  foo
      bar
        etc
    var

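I think that when the scanner reaches "var" it has to produce two DEDENTs
(and then, according to the Python reference, report an error, since "var"
is indented to a level that was never pushed), but the sample rule above
won't handle that correctly, because each match pops at most one level. You
should, however, be able to pull this off with clever start conditions or
with YY_USER_ACTION.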
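Here is a minimal sketch of the stack logic as plain C++ (outside flex, so it can be compiled and tested on its own). It assumes the scanner has already measured the width of the line's leading whitespace and that the stack starts out holding the baseline 0. A shallower line pops every deeper level and yields one DEDENT per pop; a width that falls between two stacked levels is treated as an error, which is what the Python reference manual specifies. The Token enum and the indent_tokens name are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical token kinds standing in for the values Bison would generate.
enum Token { INDENT, DEDENT };

// Compare the indentation width of a new logical line with the stack:
//   - wider than the top: push it and emit one INDENT
//   - narrower: pop every larger level, one DEDENT per pop; the width
//     must then equal the new top, otherwise it is an indentation error
//   - equal: emit nothing
std::vector<Token> indent_tokens(std::vector<std::size_t>& stack,
                                 std::size_t width) {
    assert(!stack.empty());  // the baseline 0 is never popped
    std::vector<Token> out;
    if (width > stack.back()) {
        stack.push_back(width);
        out.push_back(INDENT);
        return out;
    }
    while (width < stack.back()) {  // stops at the baseline 0 at the latest
        stack.pop_back();
        out.push_back(DEDENT);
    }
    if (width != stack.back()) {
        throw std::runtime_error("inconsistent dedent");
    }
    return out;
}
```

For the example above, widths 0, 4 and 6 give nothing, INDENT and INDENT; a following line at width 0 gives DEDENT, DEDENT, while width 2 throws. A flex action still gets only one return per match, so the returned tokens would have to be handed to the parser one at a time, for instance from a small queue kept in yyextra.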

On Tue, 3 Aug 2010 06:19:59 +0100, Philip Herron
<her...@go...> wrote:
> Hey guys
> 
> I am writing a flex and bison implementation of the Python parser and
> am coming up against a problem. I think I may need to revert to a
> hand-written lexer, but I would like the insight of more experienced
> flex guys first.
> 
> The problem is the suite grammar, which is the grammar for a block,
> for example:
> 
> def foo ( .. ) :
>  <code_block>
> 
> I am not going to talk about the grammar itself; what I want to
> illustrate is the problem of figuring out indentation. The grammar for
> something like this is as follows:
> 
> http://docs.python.org/release/2.5.2/ref/grammar.txt
> 
> funcdef ::=
>              [decorators] "def" funcname "(" [parameter_list] ")"
>               ":" suite
> 
> suite ::=
>              stmt_list NEWLINE
>               | NEWLINE INDENT statement+ DEDENT
> 
> The problem is, on the lexical side of things, figuring out what is an
> INDENT and what is a DEDENT. For this parser I am requiring 4 spaces
> per indent, because that's what emacs python-mode is doing for me, for
> now anyway. So, reading up on:
> 
> http://docs.python.org/reference/lexical_analysis.html
> 
> For the indentation part they use a stack to figure out the indentation
> levels. First, 0 is pushed onto the stack as a kind of initializer or
> baseline for the system. Then, if a new logical line is indented more
> deeply, its indentation level is pushed onto the stack (1, say, for a
> single indent). If it is indented less, that level must already exist
> on the stack, and every larger level is popped off, one DEDENT per pop.
> You get the idea; you need to read that little paragraph. This is all
> to figure out when to generate a DEDENT token, which is the real crux
> of the problem.
> 
> The problem I am having implementing this is that everything really
> revolves around these two flex rules:
> 
> "\n"                    { return NEWLINE; }
> "    "                  { return INDENT; }
> 
> The problem is that there is no lexical token we actually read in the
> file for DEDENT, so so far my idea is to either write a hand-written
> lexer or do something like:
> 
> "\n"                    { vec_push( 0 ); return NEWLINE; }
> "    "                  { vec_head->indent++; return INDENT; }
> 
> Then, on a newline, I can do some if checks to figure out whether there
> was a dedent, but the problem is that I will need things like return
> DEDENT and then immediately afterwards return INDENT or NEWLINE, and C
> won't allow multiple returns in one code block ;)
> 
> So I could then make a general token stack for what to actually return
> from flex, but this all sounds very complicated, with lots of vector
> work. I think I could do it, but it's not the most pleasant of
> solutions. Maybe you guys would have some insight.
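For what it's worth, the token queue does not need much vector work. Below is a minimal sketch written as plain C++ rather than a flex spec so it runs on its own: the parser pulls one token per next() call, and whatever else one line produces (several DEDENTs, then the line's own tokens) waits in a small FIFO queue. The IndentLexer class, the WORD token standing in for a line's real tokens, and the line-based input are all hypothetical; in a real scanner the queue would sit in front of the flex-generated scanner, for example in a wrapper function that Bison calls as its yylex().

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token kinds; WORD stands in for whatever the real scanner
// would return for the text of a line.
enum Tok { WORD, NEWLINE, INDENT, DEDENT, END };

// Pull-style lexer: one token per next() call, extras wait in pending_.
class IndentLexer {
public:
    explicit IndentLexer(std::vector<std::string> lines)
        : lines_(std::move(lines)) {}

    Tok next() {
        while (pending_.empty()) fill();  // fill() always queues END at EOF
        Tok t = pending_.front();
        pending_.pop_front();
        return t;
    }

private:
    // Scan one more line (or EOF) and queue every token it produces.
    void fill() {
        assert(!stack_.empty());  // the baseline 0 is never popped
        if (pos_ == lines_.size()) {
            // EOF: close every block still open, then signal the end.
            while (stack_.back() > 0) {
                stack_.pop_back();
                pending_.push_back(DEDENT);
            }
            pending_.push_back(END);
            return;
        }
        const std::string& line = lines_[pos_++];
        std::size_t width = line.find_first_not_of(' ');
        if (width == std::string::npos) return;  // blank line: no tokens
        if (width > stack_.back()) {
            stack_.push_back(width);
            pending_.push_back(INDENT);
        } else {
            while (width < stack_.back()) {  // one DEDENT per popped level
                stack_.pop_back();
                pending_.push_back(DEDENT);
            }
            if (width != stack_.back())
                throw std::runtime_error("inconsistent dedent");
        }
        pending_.push_back(WORD);
        pending_.push_back(NEWLINE);
    }

    std::vector<std::string> lines_;
    std::size_t pos_ = 0;
    std::vector<std::size_t> stack_{0};  // indentation widths, baseline 0
    std::deque<Tok> pending_;
};
```

Fed the lines "def foo():", "    pass", "" and "x", the successive next() calls yield WORD NEWLINE INDENT WORD NEWLINE DEDENT WORD NEWLINE and then END, which is the NEWLINE INDENT statement+ DEDENT shape the suite rule expects. It also emits the DEDENTs still open at end of file, which the Python reference requires and which is easy to forget.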
> 
> --Phil
> 
>
> _______________________________________________
> Flex-help mailing list
> Fle...@li...
> https://lists.sourceforge.net/lists/listinfo/flex-help
