[Seed7-users] Preview chapter about symbol scanning functions
From: Thomas M. <tho...@gm...> - 2008-05-23 16:10:11
Hello,

I am writing a chapter about symbol scanning functions. It would be nice to get some feedback about it before the final release. Here it is:

8.7 Scanning a file

The I/O concept introduced in the previous chapters separates the input of data from its conversion. The 'read', 'readln', 'getwd' and 'getln' functions are designed to read whitespace separated data elements. When the data elements are not separated by whitespace characters this I/O concept cannot be used. Instead the functions which read from the file need some knowledge about the type which they intend to read. Fortunately this is a well researched area. The lexical scanners used by compilers solve exactly this problem.

Lexical scanners read symbols from a file and use the concept of a current character. A symbol can be a name, a number, a string, an operator, a parenthesis or something else. The current character is the first character to be processed when scanning a symbol. After a scanner has read a symbol the current character contains the character just after the symbol. This character could be the first character of the next symbol or some whitespace character. If the set of symbols is chosen wisely, all decisions about the type of the symbol and about when to stop reading characters for a symbol can be made based on the current character.

Every 'file' contains a 'bufferChar' variable which is used as current character by the scanner functions defined in the "scanfile.s7i" library. The "scanfile.s7i" library contains skip... and get... functions. The skip... procedures return void and are used to skip input, while the get... functions return the string of characters they have read. The following basic scanner functions are defined in the "scanfile.s7i" library:

  skipComment       Skips a possibly nested comment from a 'file'.
  getComment        Reads a possibly nested comment from a 'file'.
  skipLineComment   Skips a line comment from a 'file'.
  getLineComment    Reads a line comment from a 'file'.
  getDigits         Reads a sequence of digits from a 'file'.
  getNumber         Reads a numeric literal from a 'file'.
  getCharLiteral    Reads a character literal from a 'file'.
  getStringLiteral  Reads a string literal from a 'file'.
  getName           Reads an alphanumeric name from a 'file'.

Contrary to 'read' and 'getwd', basic scanner functions do not skip leading whitespace characters. To skip whitespace characters one of the following functions can be used:

  skipSpace         Skips space characters from a 'file'.
  skipWhiteSpace    Skips whitespace characters from a 'file'.
  getWhiteSpace     Reads whitespace characters from a 'file'.
  skipLine          Skips a line from a 'file'.
  getLine           Reads a line from a 'file'.

The advanced scanner functions do skip whitespace characters before reading a symbol:

  getSymbolOrComment         Reads a symbol or a comment from a 'file'.
  getSymbol                  Reads a symbol from a 'file'.
  getHtmlTagSymbolOrComment  Reads an HTML tag, a symbol or a comment from a 'file'.
  getHtmlTagOrContent        Reads an HTML tag or the HTML content text from a 'file'.
  getSimpleSymbol            Reads a simple symbol from a 'file'.

All scanner functions assume that the first character to be processed is in 'bufferChar'. After they are finished, the next character which should be processed is also in 'bufferChar'. To use scanner functions for a newly opened file it is necessary to assign the first character to 'bufferChar' with:

  myFile.bufferChar := getc(myFile);

In most cases whole files are processed either with normal I/O functions or with scanner functions. When normal I/O functions need to be combined with scanner functions care has to be taken:

- When the last function which read from a file was one of 'read', 'readln', 'getwd' or 'getln', the 'bufferChar' already contains the character which should be processed next and therefore subsequent scanner functions can be used.
- Other I/O functions like 'getc' and 'gets' do not assign anything to 'bufferChar'. In this case something should be assigned to 'bufferChar' before a scanner function is called.
- Switching back from scanner functions to normal I/O functions is best done when the content of 'bufferChar' is known, for example at the end of a line.

Scanner functions are helpful when it is necessary to read numeric input without failing when no digits are present:

  skipWhiteSpace(IN);
  if eoln(IN) then
    writeln("empty input");
  elsif IN.bufferChar in {'0' .. '9'} then
    number := integer parse getDigits(IN);
    skipLine(IN);
    writeln("number " <& number);
  else
    stri := getLine(IN);
    writeln("command " <& literal(stri));
  end if;

The function 'getSymbol' is designed to read Seed7 symbols. When it returns "" the end of the file is reached. The following loop can be used to process the symbols of a Seed7 program:

  inFile.bufferChar := getc(inFile);
  currSymbol := getSymbol(inFile);
  while currSymbol <> "" do
    ... process currSymbol ...
    currSymbol := getSymbol(inFile);
  end while;

Whitespace and comments are automatically skipped with the function 'getSymbol'. When comments should also be returned the function 'getSymbolOrComment' can be used. Together with the function 'getWhiteSpace' it is even possible to get the whitespace between the symbols:

  const func string: processFile (in string: fileName) is func
    result
      var string: result is "";
    local
      var file: inFile is STD_NULL;
      var string: currSymbol is "";
    begin
      inFile := open(fileName, "r");
      if inFile <> STD_NULL then
        inFile.bufferChar := getc(inFile);
        result := getWhiteSpace(inFile);
        currSymbol := getSymbolOrComment(inFile);
        while currSymbol <> "" do
          result &:= currSymbol;
          result &:= getWhiteSpace(inFile);
          currSymbol := getSymbolOrComment(inFile);
        end while;
      end if;
    end func;

In the example above the function 'processFile' gathers all symbols, whitespace and comments in the string it returns. The string returned by 'processFile' is equivalent to the one returned by the function 'getf'. That way it is easy to test the scanner functionality.
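The comparison with 'getf' mentioned above could be tried with a small test program along the following lines. This is only a sketch under assumptions: it assumes that 'getf' is provided by a library named "getf.s7i", the file name "example.sd7" is a hypothetical placeholder, and 'processFile' is copied from the text with a 'close' call added.

```seed7
$ include "seed7_05.s7i";
  include "scanfile.s7i";
  include "getf.s7i";  # Assumption: the library that provides getf.

# Copy of processFile from the text, with a close() call added.
const func string: processFile (in string: fileName) is func
  result
    var string: result is "";
  local
    var file: inFile is STD_NULL;
    var string: currSymbol is "";
  begin
    inFile := open(fileName, "r");
    if inFile <> STD_NULL then
      inFile.bufferChar := getc(inFile);
      result := getWhiteSpace(inFile);
      currSymbol := getSymbolOrComment(inFile);
      while currSymbol <> "" do
        result &:= currSymbol;
        result &:= getWhiteSpace(inFile);
        currSymbol := getSymbolOrComment(inFile);
      end while;
      close(inFile);
    end if;
  end func;

const proc: main is func
  local
    # Hypothetical file name, chosen only for this sketch.
    var string: fileName is "example.sd7";
  begin
    # Compare the scanned text with the whole file content.
    if processFile(fileName) = getf(fileName) then
      writeln("scanner output matches the file content");
    else
      writeln("scanner output differs from the file content");
    end if;
  end func;
```

If the two strings differ for a valid Seed7 source file, this would point to a symbol or a whitespace sequence that the scanner functions do not return completely.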
The logic with 'getWhiteSpace' and 'getSymbolOrComment' can be used to add HTML tags to comments and literals. The following function colors comments with green, string and char literals with maroon and numeric literals with purple:

  const proc: sourceToHtml (inout file: inFile, inout file: outFile) is func
    local
      var string: currSymbol is "";
    begin
      inFile.bufferChar := getc(inFile);
      write(outFile, "<pre>\n");
      write(outFile, getWhiteSpace(inFile));
      currSymbol := getSymbolOrComment(inFile);
      while currSymbol <> "" do
        currSymbol := replace(currSymbol, "&", "&amp;");
        currSymbol := replace(currSymbol, "<", "&lt;");
        if currSymbol[1] in {'"', '''} then
          write(outFile, "<font color=\"maroon\">");
          write(outFile, currSymbol);
          write(outFile, "</font>");
        elsif currSymbol[1] = '#' or startsWith(currSymbol, "(*") then
          write(outFile, "<font color=\"green\">");
          write(outFile, currSymbol);
          write(outFile, "</font>");
        elsif currSymbol[1] in digit_char then
          write(outFile, "<font color=\"purple\">");
          write(outFile, currSymbol);
          write(outFile, "</font>");
        else
          write(outFile, currSymbol);
        end if;
        write(outFile, getWhiteSpace(inFile));
        currSymbol := getSymbolOrComment(inFile);
      end while;
      write(outFile, "</pre>\n");
    end func;

The functions 'skipSpace' and 'skipWhiteSpace' are defined in the "scanfile.s7i" library as follows:

  const proc: skipSpace (inout file: inFile) is func
    local
      var char: ch is ' ';
    begin
      ch := inFile.bufferChar;
      while ch = ' ' do
        ch := getc(inFile);
      end while;
      inFile.bufferChar := ch;
    end func;

  const proc: skipWhiteSpace (inout file: inFile) is func
    begin
      while inFile.bufferChar in white_space_char do
        inFile.bufferChar := getc(inFile);
      end while;
    end func;

The functions 'skipComment' and 'skipLineComment', which can be used to skip Seed7 comments, are defined as follows:

  const proc: skipComment (inout file: inFile) is func
    local
      var char: character is ' ';
    begin
      character := getc(inFile);
      repeat
        repeat
          while character not in special_comment_char do
            character := getc(inFile);
          end while;
          if character = '(' then
            character := getc(inFile);
            if character = '*' then
              skipComment(inFile);
              character := getc(inFile);
            end if;
          end if;
        until character = '*' or character = EOF;
        if character <> EOF then
          character := getc(inFile);
        end if;
      until character = ')' or character = EOF;
      if character = EOF then
        inFile.bufferChar := EOF;
      else
        inFile.bufferChar := getc(inFile);
      end if;
    end func; # skipComment

  const proc: skipLineComment (inout file: inFile) is func
    local
      var char: character is ' ';
    begin
      repeat
        character := getc(inFile);
      until character = '\n' or character = EOF;
      inFile.bufferChar := character;
    end func; # skipLineComment

================================

Thanks in advance for your effort.

Greetings Thomas Mertes

Seed7 Homepage: http://seed7.sourceforge.net
Seed7 - The extensible programming language: User defined statements and operators, abstract data types, templates without special syntax, OO with interfaces and multiple dispatch, statically typed, interpreted or compiled, portable, runs under linux/unix/windows.