[Seed7-users] Preview chapter about symbol scanning functions


Hello,
I am writing a chapter about symbol scanning functions.
It would be nice to get some feedback about it
before the final release. Here it is:

8.7 Scanning a file

    The I/O concept introduced in the previous chapters separates
  the input of data from its conversion. The 'read', 'readln',
  'getwd' and 'getln' functions are designed to read whitespace
  separated data elements. When the data elements are not separated
  by whitespace characters, this I/O concept cannot be used.
  Instead, the functions which read from the file need some
  knowledge about the type they intend to read. Fortunately,
  this is a well-researched area: the lexical scanners used by
  compilers solve exactly this problem.

  Lexical scanners read symbols from a file and use the concept of
  a current character. A symbol can be a name, a number, a string,
  an operator, a parenthesis or something else. The current
  character is the first character to be processed when scanning a
  symbol. After a scanner has read a symbol, the current character
  contains the character just after the symbol. This character
  could be the first character of the next symbol or a whitespace
  character. If the set of symbols is chosen wisely, all decisions
  about the type of a symbol and about when to stop reading its
  characters can be made based on the current character. For
  example, a digit starts a number, and the first character which
  is not part of the number ends it.

  Every 'file' contains a 'bufferChar' variable which is used as
  the current character by the scanner functions defined in the
  "scanfile.s7i" library. This library contains skip... and get...
  functions. The skip... procedures return void and are used to
  skip input, while the get... functions return the string of
  characters they have read. The following basic scanner
  functions are defined in the "scanfile.s7i" library:

    skipComment
      Skips a possibly nested comment from a 'file'.
    getComment
      Reads a possibly nested comment from a 'file'.
    skipLineComment
      Skips a line comment from a 'file'.
    getLineComment
      Reads a line comment from a 'file'.
    getDigits
      Reads a sequence of digits from a 'file'.
    getNumber
      Reads a numeric literal from a 'file'.
    getCharLiteral
      Reads a character literal from a 'file'.
    getStringLiteral
      Reads a string literal from a 'file'.
    getName
      Reads an alphanumeric name from a 'file'.
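
  The basic scanner functions rely on the current character
  described above. As an illustration, a function which reads a
  sequence of digits could be written as follows. This is only a
  sketch: the name 'myGetDigits' is hypothetical, and the actual
  'getDigits' of "scanfile.s7i" may be implemented differently:

    $ include "seed7_05.s7i";
      include "scanfile.s7i";

    # Hypothetical sketch of a basic scanner function. It starts with
    # the current character in bufferChar and leaves the first
    # character which is not a digit in bufferChar.
    const func string: myGetDigits (inout file: inFile) is func
      result
        var string: digits is "";
      local
        var char: ch is ' ';
      begin
        ch := inFile.bufferChar;
        while ch in {'0' .. '9'} do
          digits &:= ch;
          ch := getc(inFile);
        end while;
        inFile.bufferChar := ch;
      end func;

  For the input "123abc" this function returns "123" and leaves
  'a' in 'bufferChar', ready for the next scanner function.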

  Unlike 'read' and 'getwd', the basic scanner functions do not
  skip leading whitespace characters. To skip whitespace
  characters, one of the following functions can be used:

    skipSpace
      Skips space characters from a 'file'.
    skipWhiteSpace
      Skips whitespace characters from a 'file'.
    getWhiteSpace
      Reads whitespace characters from a 'file'.
    skipLine
      Skips a line from a 'file'.
    getLine
      Reads a line from a 'file'.
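
  With the basic functions and the whitespace functions, data which
  is not separated by whitespace can be read. The following sketch
  reads a setting like "width=80" or "width = 80". The function name
  'readSetting', the input format and the returned messages are
  assumptions made for this example. It expects that the current
  character is already in 'bufferChar':

    $ include "seed7_05.s7i";
      include "scanfile.s7i";

    # Hypothetical example: read "name=value" with optional spaces.
    # Returns a message describing what was read.
    const func string: readSetting (inout file: inFile) is func
      result
        var string: message is "";
      local
        var string: name is "";
        var integer: value is 0;
      begin
        skipSpace(inFile);
        name := getName(inFile);
        skipSpace(inFile);
        if name = "" or inFile.bufferChar <> '=' then
          message := "syntax error";
        else
          inFile.bufferChar := getc(inFile);   # consume the '='
          skipSpace(inFile);
          if inFile.bufferChar in {'0' .. '9'} then
            value := integer parse getDigits(inFile);
            message := name <& " is " <& value;
          else
            message := "missing value for " <& name;
          end if;
        end if;
      end func;

  Checking 'bufferChar' before calling 'getDigits' avoids parsing
  an empty string when no digits follow the '='.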

  The advanced scanner functions do skip whitespace characters
  before reading a symbol:

    getSymbolOrComment
      Reads a symbol or a comment from a 'file'.
    getSymbol
      Reads a symbol from a 'file'.
    getHtmlTagSymbolOrComment
      Reads an HTML tag, a symbol or a comment from a 'file'.
    getHtmlTagOrContent
      Reads a HTML tag or the HTML content text from a 'file'.
    getSimpleSymbol
      Reads a simple symbol from a 'file'.
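
  An advanced scanner function can be thought of as a call of
  'skipWhiteSpace' followed by a decision based on the current
  character. The following sketch shows this idea. The name
  'mySimpleSymbol' is hypothetical and this is not the definition
  of 'getSimpleSymbol':

    $ include "seed7_05.s7i";
      include "scanfile.s7i";

    # Hypothetical sketch of an advanced scanner function: skip
    # whitespace, then let the current character select the basic
    # scanner function which reads the symbol.
    const func string: mySimpleSymbol (inout file: inFile) is func
      result
        var string: symbol is "";
      begin
        skipWhiteSpace(inFile);
        if inFile.bufferChar in {'a' .. 'z'} or
            inFile.bufferChar in {'A' .. 'Z'} then
          symbol := getName(inFile);
        elsif inFile.bufferChar in {'0' .. '9'} then
          symbol := getNumber(inFile);
        elsif inFile.bufferChar = '"' then
          symbol := getStringLiteral(inFile);
        elsif inFile.bufferChar <> EOF then
          # Any other character is returned as a one character symbol.
          symbol := str(inFile.bufferChar);
          inFile.bufferChar := getc(inFile);
        end if;
      end func;

  At the end of the file this sketch returns "", which allows the
  same kind of loop as with 'getSymbol'.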

  All scanner functions assume that the first character to be
  processed is in 'bufferChar', and when they are finished, the
  next character which should be processed is also in 'bufferChar'.
  To use scanner functions for a newly opened file, it is necessary
  to assign the first character to 'bufferChar' with:

    myFile.bufferChar := getc(myFile);

  In most cases, whole files are processed either with normal I/O
  functions or with scanner functions. When normal I/O functions
  need to be combined with scanner functions, care must be taken:

    - When the last function which read from a file was
      one of 'read', 'readln', 'getwd' or 'getln',
      'bufferChar' already contains the character which
      should be processed next, and therefore scanner
      functions can be used directly.
    - Other I/O functions like 'getc' and 'gets'
      do not assign anything to 'bufferChar'. In this
      case the next character must be assigned to
      'bufferChar' (e.g. with 'getc') before scanner
      functions are used.
    - Switching back from scanner functions to
      normal I/O functions is best done when the content of
      'bufferChar' is known, for example at the end
      of a line.
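
  The following sketch applies these rules to a hypothetical input
  format: the first line starts with a 4 character tag followed by
  a name, and ordinary text lines follow. The function name
  'readHeaderName' and the input format are assumptions made for
  this example:

    $ include "seed7_05.s7i";
      include "scanfile.s7i";

    # Hypothetical example: read the name from a first line like
    # "TAG1 alice ...". The tag is read but not used further here.
    const func string: readHeaderName (inout file: inFile) is func
      result
        var string: name is "";
      local
        var string: tag is "";
      begin
        tag := gets(inFile, 4);              # gets does not set bufferChar,
        inFile.bufferChar := getc(inFile);   # so the current character is set here
        skipSpace(inFile);
        name := getName(inFile);
        # Move to a known state, the end of the line, before
        # normal I/O functions are used again.
        while inFile.bufferChar <> '\n' and inFile.bufferChar <> EOF do
          inFile.bufferChar := getc(inFile);
        end while;
      end func;

  When this function returns, 'bufferChar' contains '\n' (or EOF).
  The newline has already been read, so a following 'getln' reads
  the next line.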

  Scanner functions are helpful when it is necessary to read
  numeric input without failing when no digits are present:

    skipSpace(IN);    # skipWhiteSpace would also skip the '\n'
    if eoln(IN) then
      writeln("empty input");
    elsif IN.bufferChar in {'0' .. '9'} then
      number := integer parse getDigits(IN);
      skipLine(IN);
      writeln("number " <& number);
    else
      stri := getLine(IN);
      writeln("command " <& literal(stri));
    end if;

  The function 'getSymbol' is designed to read Seed7 symbols. When
  it returns "" the end of the file is reached. The following loop
  can be used to process the symbols of a Seed7 program:

    inFile.bufferChar := getc(inFile);
    currSymbol := getSymbol(inFile);
    while currSymbol <> "" do
      ... process currSymbol ...
      currSymbol := getSymbol(inFile);
    end while;
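
  Wrapped into a function, this loop can, for example, count the
  symbols of a Seed7 source file. The function name 'countSymbols'
  and the result -1 for a file which cannot be opened are
  assumptions made for this sketch:

    $ include "seed7_05.s7i";
      include "scanfile.s7i";

    # Hypothetical helper: count the symbols of a file with getSymbol.
    # Whitespace and comments are skipped by getSymbol and not counted.
    const func integer: countSymbols (in string: fileName) is func
      result
        var integer: count is 0;
      local
        var file: inFile is STD_NULL;
        var string: currSymbol is "";
      begin
        inFile := open(fileName, "r");
        if inFile = STD_NULL then
          count := -1;
        else
          inFile.bufferChar := getc(inFile);
          currSymbol := getSymbol(inFile);
          while currSymbol <> "" do
            incr(count);
            currSymbol := getSymbol(inFile);
          end while;
          close(inFile);
        end if;
      end func;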

  Whitespace and comments are automatically skipped by the
  function 'getSymbol'. When comments should also be returned, the
  function 'getSymbolOrComment' can be used. Together with the
  function 'getWhiteSpace', it is even possible to get the
  whitespace between the symbols:

    const func string: processFile (in string: fileName) is func
      result
        var string: result is "";
      local
        var file: inFile is STD_NULL;
        var string: currSymbol is "";
      begin
        inFile := open(fileName, "r");
        if inFile <> STD_NULL then
          inFile.bufferChar := getc(inFile);
          result := getWhiteSpace(inFile);
          currSymbol := getSymbolOrComment(inFile);
          while currSymbol <> "" do
            result &:= currSymbol;
            result &:= getWhiteSpace(inFile);
            currSymbol := getSymbolOrComment(inFile);
          end while;
          close(inFile);
        end if;
      end func;

  In the example above, the function 'processFile' gathers all
  symbols, whitespace and comments in the string it returns. This
  string is identical to the one returned by the function 'getf',
  which reads a whole file into a string. Comparing the two
  strings is therefore an easy way to test the scanner functions.
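
  Such a test does not even need a file. Assuming the
  "strifile.s7i" library is available, its function 'openStriFile'
  turns a string into a 'file', and the loop of 'processFile' can
  be applied to it. The name 'scanRoundTrip' is hypothetical:

    $ include "seed7_05.s7i";
      include "scanfile.s7i";
      include "strifile.s7i";

    # Hypothetical helper: scan a string with getWhiteSpace and
    # getSymbolOrComment and put the pieces together again.
    const func string: scanRoundTrip (in string: source) is func
      result
        var string: rebuilt is "";
      local
        var file: inFile is STD_NULL;
        var string: currSymbol is "";
      begin
        inFile := openStriFile(source);
        inFile.bufferChar := getc(inFile);
        rebuilt := getWhiteSpace(inFile);
        currSymbol := getSymbolOrComment(inFile);
        while currSymbol <> "" do
          rebuilt &:= currSymbol;
          rebuilt &:= getWhiteSpace(inFile);
          currSymbol := getSymbolOrComment(inFile);
        end while;
      end func;

  A test can then check that scanRoundTrip(sample) = sample holds
  for sample strings containing symbols, whitespace and comments.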

  The combination of 'getWhiteSpace' and 'getSymbolOrComment' can
  also be used to add HTML tags to comments and literals. The
  following procedure colors comments green, string and char
  literals maroon, and numeric literals purple:

    const proc: sourceToHtml (inout file: inFile, inout file: outFile) is func
      local
        var string: currSymbol is "";
      begin
        inFile.bufferChar := getc(inFile);
        write(outFile, "<pre>\n");
        write(outFile, getWhiteSpace(inFile));
        currSymbol := getSymbolOrComment(inFile);
        while currSymbol <> "" do
          currSymbol := replace(currSymbol, "&", "&amp;");
          currSymbol := replace(currSymbol, "<", "&lt;");
          if currSymbol[1] in {'"', '''} then
            write(outFile, "<font color=\"maroon\">");
            write(outFile, currSymbol);
            write(outFile, "</font>");
          elsif currSymbol[1] = '#' or startsWith(currSymbol, "(*") then
            write(outFile, "<font color=\"green\">");
            write(outFile, currSymbol);
            write(outFile, "</font>");
          elsif currSymbol[1] in digit_char then
            write(outFile, "<font color=\"purple\">");
            write(outFile, currSymbol);
            write(outFile, "</font>");
          else
            write(outFile, currSymbol);
          end if;
          write(outFile, getWhiteSpace(inFile));
          currSymbol := getSymbolOrComment(inFile);
        end while;
        write(outFile, "</pre>\n");
      end func;

  The functions 'skipSpace' and 'skipWhiteSpace' are defined in
  the "scanfile.s7i" library as follows:

    const proc: skipSpace (inout file: inFile) is func
      local
        var char: ch is ' ';
      begin
        ch := inFile.bufferChar;
        while ch = ' ' do
          ch := getc(inFile);
        end while;
        inFile.bufferChar := ch;
      end func;

    const proc: skipWhiteSpace (inout file: inFile) is func
      begin
        while inFile.bufferChar in white_space_char do
          inFile.bufferChar := getc(inFile);
        end while;
      end func;
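
  In the same style, a function which returns the skipped
  whitespace instead of discarding it could look like the following
  sketch. The name 'myGetWhiteSpace' is hypothetical; the library
  function 'getWhiteSpace' may be implemented differently. The set
  'white_space_char' is the one used by 'skipWhiteSpace' above:

    $ include "seed7_05.s7i";
      include "scanfile.s7i";

    # Hypothetical sketch: like skipWhiteSpace, but the skipped
    # characters are collected and returned.
    const func string: myGetWhiteSpace (inout file: inFile) is func
      result
        var string: white is "";
      begin
        while inFile.bufferChar in white_space_char do
          white &:= inFile.bufferChar;
          inFile.bufferChar := getc(inFile);
        end while;
      end func;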

  The functions 'skipComment' and 'skipLineComment', which can be
  used to skip Seed7 comments, are defined as follows. Both expect
  that the characters which introduce the comment ("(*" or "#")
  have already been read:

    const proc: skipComment (inout file: inFile) is func
      local
        var char: character is ' ';
      begin
        character := getc(inFile);
        repeat
          repeat
            while character not in special_comment_char do
              character := getc(inFile);
            end while;
            if character = '(' then
              character := getc(inFile);
              if character = '*' then
                skipComment(inFile);
                character := inFile.bufferChar;  # the nested call left the next character here
              end if;
            end if;
          until character = '*' or character = EOF;
          if character <> EOF then
            character := getc(inFile);
          end if;
        until character = ')' or character = EOF;
        if character = EOF then
          inFile.bufferChar := EOF;
        else
          inFile.bufferChar := getc(inFile);
        end if;
      end func; # skipComment

    const proc: skipLineComment (inout file: inFile) is func
      local
        var char: character is ' ';
      begin
        repeat
          character := getc(inFile);
        until character = '\n' or character = EOF;
        inFile.bufferChar := character;
      end func; # skipLineComment
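
  Since 'skipLineComment' stops at the '\n' and leaves it in
  'bufferChar', it combines well with 'skipWhiteSpace'. The
  following sketch skips any mix of whitespace and line comments.
  The procedure name is hypothetical, and the sketch assumes that
  '\n' belongs to 'white_space_char':

    $ include "seed7_05.s7i";
      include "scanfile.s7i";

    # Hypothetical helper: skip whitespace and '#' line comments.
    # skipLineComment is called while the '#' is in bufferChar and
    # stops at the '\n', which the next skipWhiteSpace skips.
    const proc: skipWhiteSpaceAndLineComments (inout file: inFile) is func
      begin
        skipWhiteSpace(inFile);
        while inFile.bufferChar = '#' do
          skipLineComment(inFile);
          skipWhiteSpace(inFile);
        end while;
      end func;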

================================

Thanks in advance for your effort.

Greetings Thomas Mertes

Seed7 Homepage:  http://seed7.sourceforge.net
Seed7 - The extensible programming language: User defined statements
and operators, abstract data types, templates without special
syntax, OO with interfaces and multiple dispatch, statically typed,
interpreted or compiled, portable, runs under linux/unix/windows.
