Eric,
I would like to share with you some difficulties I've encountered,
and some possible solutions, as I attempt to write a semantic lexer
for Python. I include what I have so far for semantic-python.el, as
well as a diff for semantic.el, at the end of this email.
Problem #1: No 'newline token is generated if a line containing code
also contains a trailing comment. This is easy to demonstrate with the
following makefile (each line is prefixed with its line number):
0: #
1: all : one
2:
3: one : # one
4: echo one
The token list generated by semantic-flex is
((symbol 3 . 6) # 1: all
(whitespace 6 . 7) # 1:
(punctuation 7 . 8) # 1: :
(whitespace 8 . 9) # 1:
(symbol 9 . 12) # 1: one
(newline 12 . 13) # 1:
(newline 13 . 14) # 2:
(symbol 14 . 17) # 3: one
(whitespace 17 . 18) # 3:
(punctuation 18 . 19) # 3: :
(whitespace 19 . 20) # 3:
# 3: MISSING newline HERE
(symbol 27 . 31) # 4: echo
(whitespace 31 . 32) # 4:
(symbol 32 . 35) # 4: one
(newline 35 . 36)) # 4:
This is a problem for Python, whose grammar requires a NEWLINE token
at the end of each logical line. The last hunk of the semantic.el diff
below is my attempt at a fix; I offer it as a quick hack rather than a
robust solution.
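A standalone sketch of the same idea, outside of semantic-flex: skip the comment with `forward-comment', then step back over the newline that terminated it, so that the flexer's newline rule still sees that character on its next iteration. The helper name `my-skip-comment-keep-newline' is hypothetical, and the sketch assumes a syntax table in which `#' starts a comment and newline ends it (as in make and Python modes).

```elisp
(defun my-skip-comment-keep-newline ()
  "Move past the comment at point, but stop before its final newline.
Hypothetical helper: `forward-comment' normally consumes the newline
that ends a `#'-style comment, which is why no newline token appears.
Return the new value of point."
  (forward-comment 1)
  ;; If the comment was terminated by a newline, back up over it so
  ;; that the newline can be turned into a token afterwards.
  (when (and (bolp) (eq (char-before) ?\n))
    (backward-char 1))
  (point))

;; Usage: line 3 of the makefile above, followed by line 4.
(with-temp-buffer
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?# "<" table)  ; `#' starts a comment
    (modify-syntax-entry ?\n ">" table) ; newline ends it
    (set-syntax-table table))
  (insert "one : # one\n\techo one\n")
  (goto-char 7)                         ; on the `#'
  (my-skip-comment-keep-newline)
  (char-after))                         ; => ?\n
```

The check on `char-before' keeps the sketch from backing up when the comment was ended by something other than a newline.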
Problem #2: semantic-flex-extensions is not flexible enough to handle
Python's line-continuation character, `\'.
In semantic-make.el, the following forms are found:
(defvar semantic-flex-make-extensions
  '(("^\\(\t\\)" . semantic-flex-make-command)
    ("\\(\\\\\n\t*\\)" . semantic-flex-nonewline))
  "Extensions to the flexer for make.")
(defun semantic-flex-nonewline ()
  "If there is a \ ending a line, then it isn't really a newline."
  (goto-char (match-end 0))
  (cons 'whitespace (cons (match-beginning 0) (match-end 0))))
If I understand this correctly, the "\\\\" regexp is meant to catch
the line-continuation character. Unfortunately, it is handled by
generating a `whitespace' token. Makefiles are simple enough that
this may not be much of a problem. However, if a `whitespace' token
can appear anywhere in a Python token stream, then I'm afraid most of
Python's grammar would need to be modified, with `whitespace' tokens
showing up everywhere!
One easy fix seems to be to make token generation optional for the
functions listed in semantic-flex-extensions. The diff below does
this by adding the returned token to the token list only if it is
non-nil.
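The essence of that change, pulled out of semantic-flex into a minimal standalone sketch: the first matching extension wins, its handler may move point, and a nil result means "input consumed, but no token emitted". Both function names are hypothetical and for illustration only; in the patch the equivalent logic lives inside semantic-flex's `cond'.

```elisp
(defun my-flex-try-extensions (extensions tokens)
  "Try each (REGEXP . HANDLER) in EXTENSIONS at point.
On the first match, call HANDLER and push its result onto TOKENS
only if it is non-nil.  Return (MATCHED . TOKENS)."
  (let ((matched nil))
    (while (and extensions (not matched))
      (when (looking-at (car (car extensions)))
        (setq matched t)
        ;; The handler sees the match data set by `looking-at'.
        (let ((token (funcall (cdr (car extensions)))))
          (when token
            (push token tokens))))
      (setq extensions (cdr extensions)))
    (cons matched tokens)))

(defun my-flex-line-continuation ()
  "Skip a backslash-newline pair and return nil, so no token is emitted."
  (goto-char (match-end 0))
  nil)

;; Usage: the continuation is consumed, but the token list is unchanged.
(with-temp-buffer
  (insert "x = 1 + \\\n    2\n")
  (goto-char 9)                         ; on the backslash
  (list (my-flex-try-extensions
         '(("\\\\\n" . my-flex-line-continuation)) nil)
        (point)))                       ; => ((t) 11)
```

Returning nil instead of a `whitespace' token keeps the continuation invisible to the grammar, which is the point of making token generation optional.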
Problem #3: semantic-flex-extensions does not seem to be flexible
enough to generate Python DEDENT tokens.
It is easy to generate INDENT tokens by using
(setq semantic-flex-python-extensions
      '(("^\\([ \t]+\\)" . semantic-flex-python-indentation)))
and letting semantic-flex-python-indentation generate INDENT tokens
with the help of an indentation-stack state variable.
DEDENT is a problem, though, because one or more DEDENT tokens may
need to be generated without consuming any input characters! The
current semantic-flex-extensions mechanism does not allow this; for
example, the following loops forever:
(setq semantic-flex-python-extensions
      '(("^\\([ \t]*\\)" . semantic-flex-python-indentation)))
where the `+' above has been changed to `*'. On an unindented line
the regexp matches the empty string, so point never advances and the
flexer keeps retrying the same position.
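A quick way to see whether an extension regexp can stall the flexer like this is to test for an empty match at point. The helper below is a hypothetical sketch, not part of semantic; it compares the `*' and `+' variants at the start of an unindented line.

```elisp
(defun my-zero-width-match-p (regexp)
  "Return non-nil if REGEXP matches at point without consuming input.
An extension whose regexp can do this stalls a flexer that only
advances to the end of each match."
  (and (looking-at regexp)
       (= (match-beginning 0) (match-end 0))))

;; Usage: at the start of the unindented first line.
(with-temp-buffer
  (insert "if x:\n    y\nz\n")
  (goto-char (point-min))
  (list (my-zero-width-match-p "^\\([ \t]*\\)")    ; `*' form: t
        (my-zero-width-match-p "^\\([ \t]+\\)")))  ; `+' form: nil
```

The `+' form avoids the stall only because it refuses to match at column 0, which is exactly where a DEDENT back to the top level has to be produced.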
My attempted solution is a new user option,
semantic-flex-newline-handler: when it is non-nil, it is called after
each newline token is generated. Because it is only consulted where
newline tokens are produced, it should add no overhead for grammars
that do not need newline tokens.
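Stripped of buffer handling, the decision such a newline handler has to make is a pure function of the new line's indentation column and the indentation stack. A minimal sketch, assuming a hypothetical name `my-python-indent-tokens'; it returns bare token symbols rather than semantic's (TYPE START . END) conses, and it treats a dedent to a column that matches no enclosing level as an error, as Python itself does.

```elisp
(defun my-python-indent-tokens (column stack)
  "Compute INDENT/DEDENT tokens for a code line starting at COLUMN.
STACK lists the open indentation columns, innermost first, ending
in 0.  Return (TOKENS . NEW-STACK)."
  (cond
   ;; Same level: no tokens.
   ((= column (car stack)) (cons nil stack))
   ;; Deeper: exactly one INDENT, and remember the new level.
   ((> column (car stack)) (cons (list 'INDENT) (cons column stack)))
   ;; Shallower: one DEDENT per level popped.  No input is consumed,
   ;; which is why a regexp extension cannot produce these tokens.
   (t
    (let ((tokens nil))
      (while (< column (car stack))
        (setq stack (cdr stack))
        (push 'DEDENT tokens))
      (unless (= column (car stack))
        (error "Unindent to column %d matches no outer level" column))
      (cons tokens stack)))))

;; Usage: returning from two levels of nesting yields two DEDENTs.
(my-python-indent-tokens 0 '(8 4 0))   ; => ((DEDENT DEDENT) 0)
```

Keeping this logic free of buffer state makes it easy to test on its own; the handler then only has to find the column and thread the stack variable through.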
Below is what I have so far for the Python lexer; I would welcome
your comments.
Thanks.
;;; semantic-python.el --- Lexer/parser for Python Language
;; Copyright (c) 2002 Richard Y. Kim
;; Author: Richard Y. Kim, <ryk@...>
;; Version: $Id: elisp.dm,v 1.1.1.1 2002/04/07 18:11:31 ryk Exp $
;; This file is not part of GNU Emacs.
;; This is free software; you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation; either version 2, or (at your option)
;; any later version.
;; This software is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
;; GNU General Public License for more details.
;; You should have received a copy of the GNU General Public License
;; along with GNU Emacs; see the file COPYING. If not, write to the
;; Free Software Foundation, Inc., 59 Temple Place - Suite 330,
;; Boston, MA 02111-1307, USA.
;;; Commentary:
;;
;;; History:
;;
;;; Code:
;; Indentation stack.  Each INDENT token generated pushes one entry
;; onto this stack, and each DEDENT token generated pops one entry
;; from it.
(defvar semantic-flex-python-indent-stack '(0))
;; Not written yet.
(defun semantic-flex-python-line-continuation ()
  nil)
;; To be used as semantic-flex-newline-handler.
;; On entry, point is at the beginning of a line.
;; If the current line is blank or contains only a comment, do
;; nothing and return nil.
;; Otherwise, compute the current indentation level and generate
;; INDENT or DEDENT tokens as needed.
;; The return value is a list of tokens rather than a single token,
;; because one line may close several blocks and so require multiple
;; DEDENT tokens.
(defun semantic-flex-python-newline-handler ()
  (let* ((beg (point))
         (end (progn (skip-chars-forward " \t") (point)))
         ;; Use the display column so that a tab advances to the next
         ;; tab stop instead of counting as a single character.
         (column (current-column)))
    (cond
     ;; Comment line
     ((looking-at
       (if comment-start-skip
           (concat "\\(\\s<\\|" comment-start-skip "\\)")
         "\\(\\s<\\)"))
      nil)
     ;; Blank line
     ((looking-at "\\s-*$")
      nil)
     ;; Line with code
     (t
      (cond ((= column (car semantic-flex-python-indent-stack))
             (message "SAME %d" column)
             nil)
            ((> column (car semantic-flex-python-indent-stack))
             (push column semantic-flex-python-indent-stack)
             (message "INDENT %d" column)
             (list (cons 'INDENT (cons beg end))))
            (t ;; column < (car semantic-flex-python-indent-stack)
             (let ((count 0)
                   tokens)
               (while (< column (car semantic-flex-python-indent-stack))
                 (pop semantic-flex-python-indent-stack)
                 (setq tokens (cons (cons 'DEDENT (cons beg end)) tokens))
                 (setq count (1+ count)))
               (unless (or (= column (car semantic-flex-python-indent-stack))
                           (eq 1 (length semantic-flex-python-indent-stack)))
                 (message "Invalid %d" column))
               (message "DEDENT %d" count)
               tokens)))))))
(setq semantic-flex-python-extensions
      '(
        ;;("\\\\" . semantic-flex-python-line-continuation)
        ))
;; My attempt at resetting semantic-flex-python-indent-stack before
;; entering semantic-flex. This is not robust enough since
;; semantic-flex is called by functions other than the top-level
;; bovination command. We need something better!
(defun semantic-python-before-toplevel-bovination-hook ()
  (setq semantic-flex-python-indent-stack '(0)))
(defun semantic-default-python-setup ()
  (setq semantic-flex-extensions semantic-flex-python-extensions)
  ;; The Python grammar requires NEWLINE tokens!
  (setq semantic-flex-enable-newlines t)
  ;; This generates INDENT and DEDENT tokens.
  (setq semantic-flex-newline-handler 'semantic-flex-python-newline-handler)
  ;; My poor attempt at resetting internal variables upon startup.
  (add-hook 'semantic-before-toplevel-bovination-hook
            'semantic-python-before-toplevel-bovination-hook))
(add-hook 'python-mode-hook 'semantic-default-python-setup)
;;; semantic-python.el ends here
*** semantic.el.~1.140.~ Sun May 26 02:41:54 2002
--- semantic.el Sun May 26 03:56:00 2002
***************
*** 1860,1876 ****
(setq ep (match-end 1)
ts (cons (cons 'newline
(cons (match-beginning 1) ep))
! ts)))
;; special extensions, sometimes includes some whitespace.
((and semantic-flex-extensions
(let ((fe semantic-flex-extensions)
! (r nil))
(while fe
! (if (looking-at (car (car fe)))
! (setq ts (cons (funcall (cdr (car fe))) ts)
! r t
! fe nil
! ep (point)))
(setq fe (cdr fe)))
(if (and r (not (car ts))) (setq ts (cdr ts)))
r)))
--- 1860,1885 ----
(setq ep (match-end 1)
ts (cons (cons 'newline
(cons (match-beginning 1) ep))
! ts))
! (when semantic-flex-newline-handler
! (goto-char ep)
! (let ((tokens (funcall semantic-flex-newline-handler)))
! (when tokens
! (setq ts (append tokens ts))
! (setq ep (point))))))
;; special extensions, sometimes includes some whitespace.
((and semantic-flex-extensions
(let ((fe semantic-flex-extensions)
! (r nil)
! token)
(while fe
! (when (looking-at (car (car fe)))
! (setq token (funcall (cdr (car fe))))
! (if token
! (setq ts (cons token ts)))
! (setq r t
! fe nil
! ep (point)))
(setq fe (cdr fe)))
(if (and r (not (car ts))) (setq ts (cdr ts)))
r)))
***************
*** 1964,1970 ****
(save-excursion
(end-of-line)
(point)))
! (forward-comment 1))
(if (eq (point) comment-start-point)
(error "Strange comment syntax prevents lexical analysis"))
(setq ep (point)))
--- 1973,1983 ----
(save-excursion
(end-of-line)
(point)))
! ;;(forward-comment 1)
! ;; Generate newline token if enabled
! (if (and semantic-flex-enable-newlines
! (bolp))
! (backward-char 1)))
(if (eq (point) comment-start-point)
(error "Strange comment syntax prevents lexical analysis"))
(setq ep (point)))