Hi Eric,
Sorry for the late reply; I was in bed all weekend with a nasty
sore throat :-(
> I've made the following patch to semantic-lex:
>
> *** semantic-lex.el.~1.10.~  Thu Sep 5 17:42:46 2002
> --- semantic-lex.el  Sun Sep 15 07:55:18 2002
> ***************
> *** 482,487 ****
> --- 482,489 ----
>         (while (and (< (point) end)
>                     (or (not length) (<= (length token-stream) length)))
>           (semantic-lex-one-token ,analyzers)
> +         (when (eq end-point start)
> +           (error "Lexical Analyzer: potential hang detected"))
>           (goto-char end-point)))
>       ;; Return to where we started.
>       ;; Do not wrap in protective stuff so that if there is an error
>
> in the hopes it can find the class of bug recently uncovered with
> David's semantic grammar lexer. Is there any reason we want start
> and end to be the same after a single analysis?
I am afraid I don't remember which class of bug we recently
uncovered with the grammar lexer.
Anyway, I think it is a good idea to check that the lexer actually
eats the input stream ;-)
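To illustrate the kind of check I mean, here is a minimal standalone
sketch. It is not the semantic-lex code: `my-lex-loop' and the analyzer
calling convention (return a cons of token and next index) are names
and assumptions made up for this example only.

```elisp
;; Hypothetical sketch: `my-lex-loop' and its analyzer protocol are
;; illustrative only, not part of semantic-lex.
(defun my-lex-loop (text analyzer)
  "Tokenize TEXT by repeatedly calling ANALYZER.
ANALYZER takes TEXT and a start index and returns (TOKEN . NEXT-INDEX).
Signal an error if ANALYZER does not advance past the start index."
  (let ((pos 0)
        (end (length text))
        (tokens nil))
    (while (< pos end)
      (let* ((result (funcall analyzer text pos))
             (next (cdr result)))
        ;; The guard: an analyzer that consumes nothing would otherwise
        ;; make this loop spin forever.
        (when (<= next pos)
          (error "Lexical Analyzer: potential hang detected at %d" pos))
        (push (car result) tokens)
        (setq pos next)))
    (nreverse tokens)))
```

With an analyzer that always consumes one character, the loop ends
normally; with one that returns the same index it was given, the error
is signaled on the first iteration instead of hanging.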
Last Friday I wrote this code for `semantic-lex-punctuation':
(define-lex-analyzer semantic-lex-a-punctuation
  "Detect and create a punctuation token.
Recognized punctuations are defined in the current lexical token
table, as the value of the `punctuation' token class."
  (and (looking-at "\\(\\s.\\|\\s$\\|\\s'\\)+")
       (let* ((key (match-string 0))
              (pos (match-beginning 0))
              (end (match-end 0))
              (len (- end pos))
              (lst (semantic-lex-token-value "punctuation" t))
              (def (car lst)) ;; default lexical symbol or nil
              (lst (cdr lst)) ;; alist of (LEX-SYM . PUNCT-STRING)
              (elt (rassoc key lst)))
         ;; Starting with the longest one, search if the punctuation
         ;; string is defined for this language.
         (while (and (not elt) (> len 0))
           (setq len (1- len)
                 key (substring key 0 len)
                 elt (rassoc key lst)))
         (if elt ;; Return the punctuation token found
             (semantic-lex-token (car elt) pos (+ pos len))
           (if def ;; Return a default generic token
               (semantic-lex-token def pos end)
             ;; Nothing matched
             )))))
Unlike the current implementation (`wisent-lex-punctuation'), this
one takes a specified default token symbol into account.
For example you can write:
%token <punctuation> PLUS "+"
%token <punctuation> MINUS "-"
%token <punctuation> OTHER
This means that when a punctuation string is neither "+" nor "-", the
analyzer will return a token (OTHER start . end). Of course 'OTHER
can be the symbol 'punctuation ;-)
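The lookup with a default can be tried outside the lexer with a small
standalone sketch. `my-punct-lookup' is a made-up name, and the table
is written by hand here rather than built from the %token declarations,
so take it as an illustration of the search only.

```elisp
;; Hypothetical sketch: `my-punct-lookup' is illustrative only; the
;; table is hand-written, not derived from a grammar.
(defun my-punct-lookup (text table default)
  "Return (SYMBOL . LENGTH) for the longest prefix of TEXT found in TABLE.
TABLE is an alist of (LEX-SYM . PUNCT-STRING).  If no prefix matches,
return (DEFAULT . full length of TEXT) when DEFAULT is non-nil, else nil."
  (let* ((len (length text))
         (elt (rassoc text table)))
    ;; Shorten the candidate one character at a time until it matches.
    (while (and (not elt) (> len 0))
      (setq len (1- len)
            elt (rassoc (substring text 0 len) table)))
    (cond (elt (cons (car elt) len))
          (default (cons default (length text))))))

;; Usage, with the table corresponding to the declarations above:
;; (my-punct-lookup "+=" '((PLUS . "+") (MINUS . "-")) 'OTHER)
;; (my-punct-lookup "*"  '((PLUS . "+") (MINUS . "-")) 'OTHER)
```

The first call matches the "+" prefix and yields PLUS with length 1;
the second finds no prefix in the table and falls back to OTHER over
the whole string. With a nil default the second call returns nil.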
I will update semantic-lex as soon as I feel a little better (I
am too tired for now, sorry).
Thanks.
David