[CEDET-devel] an idea for generating semantic-list tokens in python

Hi Eric and David,

I would like to share a work in progress that attempts to make the
Python parser generate semantic-list tokens properly.

The current semantic/wisent parser employs two methods of generating
semantic-list tokens.  (Please correct me if I'm wrong.)

One is via the wisent-skip-block function, e.g., the "block"
non-terminal found in wisent-java.wy.  This relies on writing a
grammar that intentionally forces an error condition.  The error is
handled by wisent-skip-block, which generates a semantic-list token
as part of its error handling.

In the second method, the semantic-list tokens are generated by the
lexer even before the parser begins parsing, e.g., via
define-lex-block-analyzer.

Both methods eventually rely on Emacs' built-in primitive scan-lists
to skip matched blocks delimited by parentheses, braces and brackets.
This takes advantage of the Emacs syntax tables.

Since Python relies on indentation rather than braces for block
structure, the built-in scan-lists function cannot find its block
boundaries.

However, it seems possible to mimic scan-lists for Python with the
bit of elisp code shown below.

First, we advise scan-lists so that it detects python-mode and takes
a special course of action.  Note that the current implementation
ignores the three arguments to scan-lists (FROM, COUNT and DEPTH).
If necessary, it would not be too hard to add support for those
later.

(defadvice scan-lists (around handle-python-mode activate compile)
  "Use python mode specific function, python-scan-lists, if the
current major mode is python-mode.
Otherwise simply call the original function."
  (if (eq major-mode 'python-mode)
      (setq ad-return-value (python-scan-lists))
    ad-do-it))
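
On newer Emacs versions the same interception can also be written
with advice-add instead of defadvice.  The following is only a sketch
under that assumption: it requires Emacs 24.4 or later, the wrapper
name my-scan-lists-python-advice is made up for illustration, and it
falls back to the original function when python-scan-lists is not
defined.

```elisp
;; Sketch only: `my-scan-lists-python-advice' is a hypothetical name.
;; Assumes Emacs 24.4 or later, where `advice-add' is available.
(defun my-scan-lists-python-advice (orig-fun from count depth)
  "Around advice for `scan-lists'.
In `python-mode', call `python-scan-lists' if it is defined;
otherwise call ORIG-FUN with FROM, COUNT and DEPTH unchanged."
  (if (and (eq major-mode 'python-mode)
           (fboundp 'python-scan-lists))
      ;; Like the defadvice version, this ignores FROM, COUNT and DEPTH.
      (python-scan-lists)
    (funcall orig-fun from count depth)))

(advice-add 'scan-lists :around #'my-scan-lists-python-advice)
```

The advice can be undone with
(advice-remove 'scan-lists #'my-scan-lists-python-advice).  As with
any advice on a primitive, calls made to scan-lists from C code do
not go through it.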

Within the wisent-python parser, scan-lists is called only when we
encounter an INDENT token.  The goal is to locate the matching DEDENT
token.  This can be done by first finding the next line that could
produce DEDENT tokens and then comparing the indentation of that line
with that of the starting line.  If it is less than or equal, we have
found the end of the current "block".  If it is greater, we simply
iterate by going to the next line.

(defun python-scan-lists (&optional target-column)
  "Return the buffer position where the current block ends.
This is the start of the next line indented no deeper than
TARGET-COLUMN, which defaults to the current column.  Point is not
moved."
  (or target-column (setq target-column (current-column)))
  (save-excursion
    (python-next-line)
    ;; Stop at end of buffer too; otherwise a buffer whose last line
    ;; is still indented deeper than TARGET-COLUMN would loop forever.
    (while (and (not (eobp))
                (> (current-indentation) target-column))
      (python-next-line))
    ;; Move to the original indentation level or to the first
    ;; non-white character, whichever comes first.
    (skip-chars-forward " \t" (+ (point) target-column))
    (point)))

(defun python-next-line ()
  "Move the cursor to the next line to check for INDENT or DEDENT tokens.
Usually this is simply the next line unless strings, lists, or blank lines,
or comment lines are encountered.  This function skips over such items."
  (let (beg)
    (while (not (eolp))
      (setq beg (point))
      (cond
       ;; skip over triple-quote string
       ((looking-at "\"\"\"")
	(forward-char 3)
	(search-forward "\"\"\""))
       ;; skip over lists, strings, etc
       ((looking-at "\\(\\s(\\|\\s\"\\|\\s<\\)")
	(forward-sexp 1))
       ;; skip over white space, word, symbol, and punctuation characters
       (t (skip-syntax-forward "-w_.")))
      (if (= (point) beg)
	  (error "You have found a bug in python-next-line")))
    ;; the point now should be at the end of a line
    (forward-line 1)
    (while (and (looking-at "\\s-*\\(\\s<\\|$\\)")
		(not (eobp))) ;; skip blank and comment lines
      (forward-line 1))))
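
For experimenting outside a python-mode buffer, here is a minimal,
self-contained sketch of just the indentation comparison.  It is
deliberately simplified: it does not skip strings or bracketed lists
that span several lines, it treats any line whose first non-white
character is # as a comment, and the name demo-python-block-end-line
is made up for illustration (it is not part of wisent-python).

```elisp
;; Hypothetical helper for illustration only; not part of wisent-python.
;; Simplification: multi-line strings and bracketed lists are NOT skipped.
(defun demo-python-block-end-line (text)
  "Return the number of the line in TEXT that ends the first block.
The block is assumed to start on line 1 of TEXT.  Blank lines and
lines holding only a # comment are skipped.  If no line ends the
block, the result is the line number at the end of the buffer."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((target (current-indentation)))
      (forward-line 1)
      (while (and (not (eobp))
                  (or (looking-at "[ \t]*\\(#\\|$\\)")
                      (> (current-indentation) target)))
        (forward-line 1))
      (line-number-at-pos))))

(demo-python-block-end-line
 "def f():\n    x = 1\n\n    # comment\n    y = 2\nz = 3\n")
```

The call at the end evaluates to 6, the line holding "z = 3": lines 2
and 5 are indented deeper than the "def" line, and lines 3 and 4 are
skipped as blank and comment lines.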

What do you think of this idea?