flex-help Mailing List for flex: the fast lexical analyser (Page 8)

I have just started with flex and I wonder whether you would consider some changes to the manual. As an addendum I have included my personal 'cheat sheet'. I doubt that I will ever change it, and it is (probably) severely handicapped by errors, but it gives an indication of my own focus. If it has any use, use it; otherwise, don't.

In the end, each 'suggestion' should be prefixed or suffixed with the statement "would it be possible to". Since taste is subjective, the list only reflects my own understanding and not that of the developers. Whatever comes of the suggestions, the product is an outstandingly good one. Thanks.

art

There are no standardized capitalization rules for anything. Do you see a way forward where you could provide a standard for capitalization? As an example, in Section 10 you have the initial start condition noted as INITIAL and the start condition in the example noted as 'expect'. Why not EXPECT, to make the capitalization consistent with INITIAL?

Section 6 Patterns
There is no indication of what the valid range of characters is, so there is no way to determine what [^x] means. I have assumed that the range is [:print:], so that [^x] translates to [:print:]{-}x or [\x20-\x7E]{-}x.

The character '*' seems to have two independent meanings, <sc>a* is different from [a*]. If this is correct it would be nice to include some statement to this effect in the manual.

You have specified that '.' represents any character except a newline. Does this mean that [^.] represents the range [\x00-\xFF]{-}\n, [:print:]{-}\n,[:graph:]{-}\n or ...? 

flex supports either a 7-bit or an 8-bit mode. What effect does either mode have on allowable patterns? Are the patterns restricted to [\x00-\x7E] so that they work in both modes? I think some statement concerning the universe of discourse and the effect of 7-bit and 8-bit mode is warranted.

There is no example on how to use [::] predefined ranges. Would it be possible to provide some? For example, in the definitions section:

is it
    NAME [[:print:]]     or

    NAME [{[:print:]}]   and/or

    NAME {[:print:]}     and/or

    NAME [:print:]

and similarly in the rules section.

The definition and example given for r/s should be made somewhat clearer. Perhaps as follows:

 "If r and s are regular expressions, then r/s means that the
  lexer will recognize r only if it is followed by s. The text
  matched by s is not consumed; it stays in the input and is
  scanned next. If s is not present, then r is not recognized
  and scanning continues with the other rules."

 "The regular expression 's' should be of fixed size. That is, if 's'
  stands for a*, where we are asking for an indefinite number of a's,
  then flex will/will not be able to generate a lexer."

 "In a similar way, if 's' is (abc|defg), where the alternatives are
  not the same size, flex will/will not be able to generate a lexer."
  Or the generated lexer will be ...

 "As an example, suppose we are generating a lexer for a mini-language,
  (right by the mini-bar in your hotel room) and we want all input
  numbers to be followed by a space. In other words, 123x will not be
  legal. We can represent this by:"

    [[:digit:]]+/" "

 "Which translates to: one or more digits, [0-9], followed by a space."

 "Without the forward reference, that is, with just [[:digit:]]+, the
  scanner will recognize a number followed by anything, for example,
  'x', where the 'x' will be scanned by the lexer as the beginning of
  some other token, giving us two tokens, one for a number and one for
  'x', instead of refusing to recognize 123 as a number."
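To make the proposed wording concrete, here is a hand-written C sketch of what the rule [[:digit:]]+/" " accepts. It is not the code flex generates, and match_number_before_space is a hypothetical name used only for illustration. It shows the two points the text makes: the digits are only recognized when a space follows, and the space itself is not part of the token. In a real scanner, when the trailing context is missing the rule simply does not match, and the input is offered to the other rules (or to flex's default rule, which echoes it).

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for the flex rule   [[:digit:]]+/" "
 * Returns the length of the token (the digits only), or 0 if the
 * rule does not match at the start of 'in'. */
size_t match_number_before_space(const char *in)
{
    size_t n = 0;

    /* r: one or more digits, taken greedily */
    while (isdigit((unsigned char)in[n]))
        n++;
    if (n == 0)
        return 0;              /* r itself did not match */

    /* s: the trailing context must be a single space */
    if (in[n] != ' ')
        return 0;              /* r matched but s is absent: no match */

    /* The space is checked but not counted: it stays in the input
     * and becomes the first character of the next match. */
    return n;
}
```

For "123 abc" the sketch returns 3 (the token is "123" and the space is left for the next match); for "123x" it returns 0, so the number rule does not fire at all.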

Section 10 Start Conditions
The example on strings doesn't contain hexadecimal input. Is this deliberate?

In my lexer input files it seems that "<sc>{" will only work when '{' is adjacent to <sc>, that is, there is no space between the '>' and the '{'. If this is accurate do you think it should be mentioned?

Indices
In order of preference:
1: Would it be possible to include descriptive text on each value?
2: Could all descriptions be put into the same column, as in a table?

The alphabetized list serves a purpose for users who know the name of an item but do not remember its full meaning or implementation. The current descriptive text allows a user to discover the organization of items but not the meaning of an item. Would it be possible to provide a separate set of indices organized functionally? This would be useful to someone like me who is just starting out, doesn't know the names of items (so the alphabetized list has less meaning) but does understand the functional area where insight is needed; with a little description, I'd be on my way.

I am including tables, below, that I constructed for my own use. I assure you that they are full of mistakes, are incomplete and deserve some loud laughs, but I think they cover most of flex in an acceptable manner. The first table is a template that I use to summarize various sections of the manual, and the second contains a summary of my understanding (and guesses) as to what a pattern is.

------------------------------------------------------
  Index of Multiple Input Buffer Functions and Macros
------------------------------------------------------

    ---------------------------------
              Functions
    ---------------------------------
    yy_create_buffer(FILE *, int)        Creates an input buffer
    yy_delete_buffer(YY_BUFFER_STATE)    Reclaim buffer memory
    yy_flush_buffer(YY_BUFFER_STATE)     Discard buffer contents; next match refills it via YY_INPUT()
    yy_new_buffer(FILE *, int)           Alias for yy_create_buffer() (C++-style name)
    yy_scan_buffer(char *, yy_size_t)    Scan a buffer in place; its last two bytes must be NUL
    yy_scan_bytes(const char *, int)     Copy int bytes of char * and create a YY_BUFFER_STATE
    yy_scan_string(const char *)         Copy a NUL-terminated string and create a YY_BUFFER_STATE
    yy_switch_to_buffer(YY_BUFFER_STATE) Switch input buffer
    yypush_buffer_state(YY_BUFFER_STATE) Pushes buffer onto stack and makes it current
    yypop_buffer_state()                 Pops and deletes buffer from stack

    ---------------------------------
              Typedefs
    ---------------------------------
    yy_size_t                            Unsigned integer typedef for sizes

    ---------------------------------
              Macros
    ---------------------------------
    YY_CURRENT_BUFFER                    YY_BUFFER_STATE handle of the current buffer (top of stack)

-------------------------------------------
     Index of Patterns
-------------------------------------------
r, s stand for any regular expression, called 'pattern' below.

The collating order is that of the ANSI-C character codes (flex matches bytes, not multi-byte UTF-8 characters), and the normal range of characters considered is \x21-\x7E.

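7-bit and 8-bit options do affect pattern recognition: with -7 the generated scanner can only recognize 7-bit characters in its input, while with -8 it handles all 8-bit characters.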

Whitespace is defined as a blank (" ") plus any of \a, \b, \f, \n, \r, \t, \v in the pattern.

flex always takes the longest possible match; if two rules match the same amount of text, the rule listed first wins. When a pattern is matched its action is run, and scanning resumes with the next character as the first character of a new match in the current start condition. For example, with one rule for numbers and one for words, 123abc is matched as 123 and then abc.
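The longest-match rule can be sketched with two hypothetical rules, a keyword "if" listed first and an identifier [a-z]+ listed second, each written as an ordinary C function. The names scan_one, match_if and match_ident are illustrative only; flex itself compiles all rules into a single DFA rather than trying them one by one, but the choice it makes is the same.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical rules, in the order they would appear in the rules
 * section:
 *   rule 1:  "if"     (keyword)
 *   rule 2:  [a-z]+   (identifier)                                   */
typedef struct {
    int rule;      /* 1 or 2, or 0 if no rule matched */
    size_t len;    /* number of characters matched (yyleng) */
} match_t;

static size_t match_if(const char *in)
{
    return strncmp(in, "if", 2) == 0 ? 2 : 0;
}

static size_t match_ident(const char *in)
{
    size_t n = 0;
    while (in[n] >= 'a' && in[n] <= 'z')
        n++;
    return n;
}

/* Pick the longest match; on a tie keep the earlier rule. */
match_t scan_one(const char *in)
{
    size_t lens[2] = { match_if(in), match_ident(in) };
    match_t best = { 0, 0 };

    for (int r = 0; r < 2; r++) {
        if (lens[r] > best.len) {   /* strictly longer only */
            best.rule = r + 1;
            best.len = lens[r];
        }
    }
    return best;
}
```

On "if x" both rules match two characters, so the keyword wins because it is listed first; on "ifx" the identifier matches three characters and wins on length.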

Patterns in the rules section must start in column 1 of the flex input file. Column 1 is defined as either the first column following a newline or the first column following the close of a start condition list ('<sc,...>').

The character '*' has a dual role. Inside a bracket expression ([*]),
when quoted ("*") or when escaped (\*), it stands for itself. Otherwise
it means 0 or more instances of the preceding regular expression.

The character '.' stands for itself only inside a bracket expression ([.]),
when quoted (".") or when escaped (\.). Everywhere else it matches any
character except \n; inside (?s:r) it matches any character, including \n.
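The two behaviours of '.' can be modelled in a few lines of C. This sketch assumes an 8-bit scanner whose input alphabet is all 256 byte values; dot_matches and dot_set_size are hypothetical helpers, not flex functions.

```c
#include <stdbool.h>

/* Does '.' accept byte c?
 *   default, or (?-s:.) : any byte except '\n'
 *   (?s:.)              : any byte at all               */
bool dot_matches(unsigned char c, bool dotall)
{
    return dotall || c != '\n';
}

/* Count how many of the 256 byte values '.' accepts in each mode. */
int dot_set_size(bool dotall)
{
    int count = 0;
    for (int c = 0; c <= 0xFF; c++)
        if (dot_matches((unsigned char)c, dotall))
            count++;
    return count;
}
```

Under these assumptions the default '.' accepts 255 byte values and (?s:.) accepts all 256; the only difference is '\n'.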

\                              Escape the next character
\"                             Match '"'
\<                             Match '<' (only at line 1)
\>                             Match '>' (only at line 1)
\0                             Match the binary value ANSI-C NUL
\a                             Match the binary value ANSI-C BEL alert
\b                             Match the binary value ANSI-C BS backspace
\f                             Match the binary value ANSI-C FF form feed
\n                             Match the binary value ANSI-C newline
\r                             Match the binary value ANSI-C carriage return
\t                             Match the binary value ANSI-C HT horizontal tab
\v                             Match the binary value ANSI-C VT vertical tab
\123                           Match the binary value in octal
\x2a                           Match the binary value in hexadecimal
x                              Match the character 'x'
.                              Match any character except '\n'
[abc]                          Match any of an 'a', 'b', or 'c' character
[a-m]                          Match any character from 'a' to 'm'
[abj-z]                        Match any of 'a', 'b', and 'j' through 'z'
[^abc]                         Match any character except 'a', 'b', or 'c' (including '\n')
[^abc-m]                       Match any character except 'a', 'b', or 'c' through 'm'
[a-z]{-}[b]                    Match any lowercase letter except 'b'
r{-}s                          Class difference: characters in class r but not in class s
[a-z]{-}[aeiou]                Match any lowercase consonant
[a-z]{-}[m-p]                  Match any lowercase letter except 'm' through 'p'
r{+}s                          Class union: characters in class r or in class s
[a-z]{+}[\n]                   Match a lowercase letter or a newline
[a-z]{+}[0-9]                  Match a lowercase letter or a digit
"text"                         Match the literal string "text"
[: :]                          Predefined class; valid only inside brackets, e.g. [[:digit:]]
[:^ :]                         Negation of predefined class (everything except)
[:alnum:]                      Match [a-zA-Z0-9]
[:alpha:]                      Match [a-zA-Z]
[:blank:]                      Match [ \t]
[:cntrl:]                      Match [\x00-\x1F\x7F]
[:digit:]                      Match [0-9]
[:graph:]                      Match [\x21-\x7E]
[:lower:]                      Match [a-z]
[:print:]                      Match [\x20-\x7E]
[:punct:]                      Match [!"#$%&'()*+,\-./:;<=>?@\[\\\]^_`{|}~]
[:space:]                      Match [ \f\n\r\t\v]
[:upper:]                      Match [A-Z]
[:xdigit:]                     Match [a-fA-F0-9]
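The class table and the {-} and {+} operators can be checked with a small C model. This is a sketch, not how flex stores classes internally: it assumes an 8-bit scanner, represents a class as a 256-entry membership table, and relies on flex defining each [:name:] class by the C isXXX() function of the same name, evaluated here in the default "C" locale over 0x00-0x7F. The type cclass and the functions cc_pred, cc_range, cc_chars, cc_diff, cc_union and cc_size are hypothetical helpers.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* A character class as a membership table over all 256 byte values. */
typedef struct { bool in[256]; } cclass;

/* [:name:] : every 7-bit character for which the C isXXX() is true. */
cclass cc_pred(int (*pred)(int))
{
    cclass c;
    memset(&c, 0, sizeof c);
    for (int i = 0; i <= 0x7F; i++)
        c.in[i] = pred(i) != 0;
    return c;
}

/* [lo-hi] : an inclusive range such as a-z. */
cclass cc_range(unsigned char lo, unsigned char hi)
{
    cclass c;
    memset(&c, 0, sizeof c);
    for (int i = lo; i <= hi; i++)
        c.in[i] = true;
    return c;
}

/* [chars] : an explicit list such as aeiou. */
cclass cc_chars(const char *s)
{
    cclass c;
    memset(&c, 0, sizeof c);
    for (; *s != '\0'; s++)
        c.in[(unsigned char)*s] = true;
    return c;
}

/* [r]{-}[s] : in r but not in s. */
cclass cc_diff(cclass r, cclass s)
{
    for (int i = 0; i < 256; i++)
        r.in[i] = r.in[i] && !s.in[i];
    return r;
}

/* [r]{+}[s] : in r or in s. */
cclass cc_union(cclass r, cclass s)
{
    for (int i = 0; i < 256; i++)
        r.in[i] = r.in[i] || s.in[i];
    return r;
}

/* Number of byte values in the class. */
int cc_size(cclass c)
{
    int n = 0;
    for (int i = 0; i < 256; i++)
        n += c.in[i];
    return n;
}
```

For example, cc_size(cc_diff(cc_range('a', 'z'), cc_chars("aeiou"))) counts the 21 lowercase consonants, and cc_diff(cc_pred(isprint), cc_pred(isgraph)) leaves only the space character, which is the one difference between [:print:] (\x20-\x7E) and [:graph:] (\x21-\x7E).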

Each of the above is a regular expression.
{NAME}                         Expansion of the definition NAME, as if surrounded by '(' ')'
r?                             Match the pattern 'r' 0 or 1 times
r+                             Match the pattern 'r' 1 or more times
r*                             Match the pattern 'r' 0 or more times
r{4}                           Match the pattern 'r' exactly 4 times
r{2,}                          Match the pattern 'r' 2 or more times
r{2,4}                         Match the pattern 'r' 2, 3, or 4 times
(r)                            Match the pattern 'r' and override precedence
rs                             Match pattern 'r' followed by 's' - concatenation
r|s                            Match either pattern 'r' or pattern 's'
r/s                            Match pattern 'r' but only if followed by 's'
^r                             Match pattern 'r' starting at the beginning of a line
r$                             Match pattern 'r' at the end of a line
<<EOF>>                        Match the end of the input file
(?o:r)                         Convert pattern 'r' according to the options 'o'
                               Note: options to turn on come first, then '-' and the options to turn off
    i                          Case insensitive - [A] becomes [aA]
   -i                          Case sensitive   - [A] stays [A]       default
    s                          Match any byte to '.', including '\n'
   -s                          Match any byte except '\n' to '.'     default
    x                          Treat '/* */' as a comment and ignore whitespace
   -x                          Treat '/* */' as part of the pattern  default;
                               do not ignore whitespace in patterns
   (?isx:r)                    Convert 'r', case insensitive, '.' matches anything,
                               /* */ comments and whitespace are ignored
   (?x-is:r)                   Convert 'r', allow comments, do not change pattern
                               case, match '.' to any character except '\n'
(?# comment )                  Embedded comment in pattern
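The repetition entries in the table (r?, r*, r+, r{4}, r{2,}, r{2,4}) all match greedily. For the simple case where r is a single character, this can be sketched in C; match_repeat is a hypothetical helper, and a negative max stands for an open upper bound as in r{2,}.

```c
/* Hypothetical helper: greedy match of x{min,max} where x is a single
 * non-NUL character. A negative max means "no upper bound", as in
 * x{min,}. Returns the number of characters matched, or -1 if fewer
 * than min copies of x are present.
 *   x?  ->  match_repeat(in, x, 0, 1)
 *   x*  ->  match_repeat(in, x, 0, -1)
 *   x+  ->  match_repeat(in, x, 1, -1)                              */
long match_repeat(const char *in, char x, long min, long max)
{
    long n = 0;

    /* Take as many copies as the upper bound allows (greedy). */
    while (in[n] != '\0' && in[n] == x && (max < 0 || n < max))
        n++;
    return n >= min ? n : -1;
}
```

match_repeat("aaaaa", 'a', 2, 4) returns 4: the sketch takes as many copies as the bound allows, and the fifth 'a' is left for the next match. With only one 'a' the lower bound is not met and it returns -1.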
