Hi Eric,
With a view to improving the development of new semantic lexers, and
perhaps moving the "string-match-lexical-stuff" from the LL parser
into lexers, I had the idea of a new macro (see code at end) that
derives a generic type analyzer from an existing syntax-table-oriented
analyzer.
Such generic type analyzers would be very useful for automatically
handling lexical elements declared in a BY grammar, like these, which
I copied from c.by:

  %token <open-paren>    LPAREN     "("
  %token <close-paren>   RPAREN     ")"
  %token <open-paren>    LBRACE     "{"
  %token <close-paren>   RBRACE     "}"
  %token <semantic-list> BRACK_BLCK "\\[.*\\]$"
  %token <semantic-list> PAREN_BLCK "^("
  %token <semantic-list> VOID_BLCK  "^(void)$"
  %token <semantic-list> PARENS     "()"
  %token <semantic-list> BRACKETS   "\\[\\]"

To handle the above lexemes it should suffice to derive two generic
type analyzers, from `semantic-lex-paren-or-list' and
`semantic-lex-close-paren' respectively, like this:

  (define-derived-lex-type-analyzer semantic-lex-paren-or-list-type
    semantic-lex-paren-or-list)

So, in addition to the generic `open-paren' or `semantic-list' tokens,
the above analyzer will automatically return the specific LPAREN,
LBRACE, BRACK_BLCK, PAREN_BLCK, VOID_BLCK, PARENS and BRACKETS tokens,
depending on which pattern the matched input text satisfies.

  (define-derived-lex-type-analyzer semantic-lex-close-paren-type
    semantic-lex-close-paren)

So, in addition to the generic `close-paren' tokens, the above
analyzer will automatically return the specific RPAREN and RBRACE
tokens, depending on which pattern the matched input text satisfies.
Here is the code of the new macro. Unfortunately, I haven't had
enough time to really test it yet, but I would like to know your
opinion before going further.
(defmacro define-derived-lex-type-analyzer (name analyzer &optional doc)
"Define NAME as a generic type analyzer derived from ANALYZER.
ANALYZER must be a symbol that identifies a previously defined lexical
analyzer.
Optional argument DOC is the NAME analyzer doc string.
A generic type analyzer filters tokens produced by ANALYZER, based on
values found in the current table of lexical tokens for the type of
tokens returned by ANALYZER, to return more specific lexical tokens.
Here is a small example to analyse lexical keywords or symbols from
these grammar lexical elements:
%token IF \"if\" ; keyword 'if'
%token THEN \"then\" ; keyword 'then'
%token <symbol> ID ; default lexical symbol
%token <symbol> VAR \"^[$]\" ; variable names start with $
Define a generic type analyzer from `semantic-lex-symbol-or-keyword':
  (define-derived-lex-type-analyzer semantic-lex-keyword-or-symbol-type
    semantic-lex-symbol-or-keyword)
Here is what the above generic analyzer returns from the following
input stream:
if $val then result = $val
(IF 1 . 3) ; the keyword IF
(VAR 4 . 8) ; a dollar variable
(THEN 9 . 13) ; the keyword THEN
(ID 14 . 20) ; a generic identifier
(VAR 23 . 27) ; a dollar variable"
  (let* ((tok (make-symbol "tok"))
         (typ (make-symbol "typ"))
         (val (make-symbol "val"))
         (lst (make-symbol "lst"))
         (def (make-symbol "def"))
         (elt (make-symbol "elt"))
         ;; The value of ANALYZER is assumed to be (CONDITION . FORMS)
         (code (symbol-value analyzer))
         (condition (car code))
         (forms
          `(,@(cdr code)
            (let* ((,tok (car semantic-lex-token-stream))
                   (,typ (semantic-lex-token-class ,tok))
                   (,val (semantic-lex-token-text ,tok))
                   (,lst (semantic-lex-type-value (symbol-name ,typ) t))
                   (,def (car ,lst)) ;; default lexical token or nil
                   (,lst (cdr ,lst)) ;; alist of (TOKEN . MATCH-STRING)
                   ,elt)
              ;; Search for a matching lexical token
              (while (and ,lst (not ,elt))
                (setq ,elt (and (string-match (cdar ,lst) ,val) (caar ,lst))
                      ,lst (cdr ,lst)))
              ;; If not found, use a default lexical token if
              ;; provided, or the initial token type otherwise.
              (setcar ,tok (or ,elt ,def ,typ))))))
    `(define-lex-analyzer ,name
       ,doc
       ,condition
       ,@forms)))
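
To make the lookup step easier to try without a running lexer, here
is a minimal sketch of it as a plain function. The names
`my-lex-refine-type', `my-symbol-types' and `my-list-types' are
hypothetical, and the (DEFAULT (TOKEN . MATCH-STRING)...) layout is
only what the comments in the macro assume, not a confirmed format of
the type table. The VAR pattern is written "^[$]" here, because "^$"
alone would match only the empty string.

```elisp
;; Hypothetical helper, not part of Semantic: the lookup step of the
;; macro as a plain function, so it can be tried without a lexer.
(defun my-lex-refine-type (type text type-value)
  "Return a specific token class for TEXT, or a fallback.
TYPE is the generic class produced by the base analyzer.
TYPE-VALUE is assumed to be (DEFAULT (TOKEN . MATCH-STRING)...)."
  (let ((default (car type-value))
        (alist (cdr type-value))
        found)
    ;; First match wins, as in the `while' loop of the macro.
    (while (and alist (not found))
      (when (string-match (cdar alist) text)
        (setq found (caar alist)))
      (setq alist (cdr alist)))
    ;; Same fallback order as the macro: match, default, base type.
    (or found default type)))

;; Entries modeled on the docstring example and on the c.by tokens;
;; the list layout and the ordering are assumptions.
(defvar my-symbol-types '(ID (VAR . "^[$]")))
(defvar my-list-types
  '(nil (BRACK_BLCK . "\\[.*\\]$") (PAREN_BLCK . "^(")
        (VOID_BLCK . "^(void)$") (PARENS . "()")
        (BRACKETS . "\\[\\]")))

(my-lex-refine-type 'symbol "$val" my-symbol-types)        ; VAR
(my-lex-refine-type 'symbol "result" my-symbol-types)      ; ID
(my-lex-refine-type 'semantic-list "(void)" my-list-types) ; PAREN_BLCK
```

Note the last call: with a first-match loop, "(void)" is caught by
PAREN_BLCK before VOID_BLCK is ever tried. If the real table keeps
the declaration order (an assumption), the more specific patterns
would have to come first, or the loop would need a notion of
priority.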
What do you think? Any other thoughts?
David