At the lexical analysis step, the SQL parser should perform a character-based analysis of the SQL instruction stream and extract lexical elements (tokens). Lexical analysis should extract numbers, character strings, keywords, and identifiers. The text of identifiers and reserved words should be converted to upper case, as identifiers and reserved words are case-insensitive.
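A minimal sketch of such a character-based scan, written in Python for illustration only. The keyword set, the token kind names, and the function name tokenize are assumptions made for this example and are not part of the SQL Framework library.

```python
import re

# Hypothetical, deliberately tiny keyword set; a real parser would load
# the full reserved word list of the SQL version it targets.
KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR"}

# One alternative per lexical element; the group name becomes the token kind.
TOKEN_RE = re.compile(r"""
    (?P<space>\s+)
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<string>'(?:[^']|'')*')
  | (?P<word>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<symbol><>|<=|>=|[-+*/=<>(),.;])
""", re.VERBOSE)

def tokenize(sql):
    """Return a list of (kind, text) pairs for the given SQL text."""
    tokens = []
    pos = 0
    while pos < len(sql):
        m = TOKEN_RE.match(sql, pos)
        if m is None:
            raise ValueError(f"unexpected character {sql[pos]!r} at offset {pos}")
        pos = m.end()
        kind, text = m.lastgroup, m.group()
        if kind == "space":
            continue  # whitespace only separates tokens
        if kind == "word":
            # Identifiers and reserved words are case-insensitive: fold to upper case.
            text = text.upper()
            kind = "keyword" if text in KEYWORDS else "identifier"
        elif kind == "string":
            # Drop the enclosing quotes and undo the doubled-quote escape.
            text = text[1:-1].replace("''", "'")
        tokens.append((kind, text))
    return tokens
```

For example, tokenize("select name from t") yields the keyword SELECT, the identifier NAME, the keyword FROM, and the identifier T, all in upper case regardless of how they were typed.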
Lexical analysis should use a system abstraction layer to detect line feeds (the line-feed character sequence is system-dependent) and to translate character values from the source text's character encoding to the database's universal character encoding (multibyte Unicode?). The lexical analysis step should also remove comments, as they are irrelevant for semantic analysis.
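The preprocessing described here could be sketched as follows, in Python for illustration. The function names decode_source and strip_comments are hypothetical, UTF-8 is only an assumed default for the source encoding, and the comment forms handled (-- to end of line, and /* ... */) are the common SQL ones rather than a requirement taken from this document.

```python
def decode_source(raw, encoding="utf-8"):
    """Decode source bytes and normalise CRLF and CR line ends to LF."""
    text = raw.decode(encoding)
    return text.replace("\r\n", "\n").replace("\r", "\n")

def strip_comments(sql):
    """Remove comments while leaving string literals untouched."""
    out = []
    i, n = 0, len(sql)
    while i < n:
        ch = sql[i]
        if ch == "'":
            # Copy a string literal verbatim; '' inside it is an escaped quote.
            j = i + 1
            while j < n:
                if sql[j] == "'":
                    if j + 1 < n and sql[j + 1] == "'":
                        j += 2
                        continue
                    break
                j += 1
            out.append(sql[i:j + 1])
            i = j + 1
        elif sql.startswith("--", i):
            # Line comment: skip up to, but not including, the line feed.
            j = sql.find("\n", i)
            i = n if j == -1 else j
        elif sql.startswith("/*", i):
            # Block comment: replace it with one space so tokens stay separated.
            j = sql.find("*/", i + 2)
            if j == -1:
                raise ValueError("unterminated comment")
            out.append(" ")
            i = j + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Keeping the line feed after a line comment preserves line numbers for later error messages, and the string-literal branch prevents a sequence such as '--' inside a quoted value from being mistaken for a comment.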
The result of lexical analysis should be some form of token stream: a list of objects in which every object is an instance of a lexical element class of the SQL Framework library. The structure of token objects should allow the semantic analysis step to easily identify the type of each token (reserved word, identifier, or value).
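One possible shape for such token objects is sketched below in Python. The names TokenKind, Token, and is_reserved, as well as the line field, are assumptions made for this example; the actual lexical element classes of the SQL Framework library may be organised differently, for instance as one subclass per token type.

```python
from dataclasses import dataclass
from enum import Enum, auto

class TokenKind(Enum):
    RESERVED_WORD = auto()
    IDENTIFIER = auto()
    VALUE = auto()

@dataclass(frozen=True)
class Token:
    kind: TokenKind   # lets the semantic step identify the token type directly
    text: str         # upper-cased text for words, literal text for values
    line: int         # hypothetical field: source line, useful for messages

def is_reserved(token, word):
    """True if the token is the given reserved word."""
    return token.kind is TokenKind.RESERVED_WORD and token.text == word

# A token stream is then simply a list of Token objects.
stream = [
    Token(TokenKind.RESERVED_WORD, "SELECT", 1),
    Token(TokenKind.IDENTIFIER, "NAME", 1),
    Token(TokenKind.VALUE, "42", 2),
]
```

With this layout the semantic analysis step can test stream[0].kind or call is_reserved(stream[0], "SELECT") without inspecting the token text to guess its type.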
The SQL standard defines some "reserved for future extension" words. These are words that are not keywords in the current version of the standard, but may become keywords in future versions. The lexical analysis step should return a warning message for every such word encountered, and treat them as regular tokens.
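The warning behaviour could look roughly like the Python sketch below. The entries in FUTURE_RESERVED are placeholders, not words taken from the standard, and the function name scan_words and the warning text are likewise assumptions made for this example.

```python
# Placeholder entries only; the real list depends on the version of the
# SQL standard that the parser targets.
FUTURE_RESERVED = frozenset({"FUTUREWORD1", "FUTUREWORD2"})

def scan_words(words, future_reserved=FUTURE_RESERVED):
    """Take (line, word) pairs and return (tokens, warnings).

    Words reserved for future extension produce a warning but are still
    emitted as regular identifier tokens.
    """
    tokens = []
    warnings = []
    for line, word in words:
        upper = word.upper()
        if upper in future_reserved:
            warnings.append(f"line {line}: {upper} is reserved for future extension")
        tokens.append(("identifier", upper))
    return tokens, warnings
```

Returning the warnings alongside the tokens, rather than raising an error, keeps the statement parseable today while still telling the user that it may break under a later version of the standard.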