[CEDET-devel] an idea for generating semantic-list tokens in python
From: <ry...@ds...> - 2003-01-14 05:29:33
Hi Eric and David,

I would like to share with you a work in progress which attempts to make
the python parser generate semantic-list tokens properly.

The current semantic/wisent parser employs two methods of generating
semantic-list tokens. (Please correct me if I'm wrong.) One is via the
wisent-skip-block function, e.g., the "block" non-terminal found in
wisent-java.wy. This relies on writing a grammar that intentionally
forces an error condition, which is handled by the wisent-skip-block
function, which in turn generates a semantic-list token as part of
handling the error. In the second method, the semantic-list tokens are
generated by the lexer even before the parser begins parsing, e.g., via
define-lex-block-analyzer.

Both methods eventually rely on Emacs' built-in primitive function
scan-lists to skip matched blocks delimited by parentheses, braces and
brackets. This takes advantage of the Emacs syntax tables. Since python
relies on indentation rather than braces for block structure, the
built-in scan-lists function is useless. However, it seems possible to
mimic scan-lists for python with the bit of elisp code shown below.

First, we advise scan-lists to intercept python-mode and take a special
course of action. Note that the current implementation ignores the three
arguments to scan-lists. If necessary, it would not be too hard to add
support for those later.

(defadvice scan-lists (around handle-python-mode activate compile)
  "Use python mode specific function, python-scan-lists, if the
current major mode is python-mode.  Otherwise simply call the original
function."
  (if (eq major-mode 'python-mode)
      (setq ad-return-value (python-scan-lists))
    ad-do-it))

Within the wisent-python parser, scan-lists is called only when we
encounter an INDENT token. The goal is to locate the matching DEDENT
token. This can be done by first finding the next line which could
produce DEDENT tokens and then comparing the indentation of that line
with that of the starting line.
If the indentation is equal or less, then we have found the end of the
current "block". If the indentation is greater, then we simply iterate
by going to the next line.

(defun python-scan-lists (&optional target-column)
  "Without actually changing the position, return the buffer position of
the next line whose indentation is the same as the current line or less
than current line."
  (or target-column (setq target-column (current-column)))
  (save-excursion
    (python-next-line)
    ;; Stop at end of buffer so that a block which runs to the end of a
    ;; buffer without a trailing newline cannot loop forever.
    (while (and (not (eobp))
                (> (current-indentation) target-column))
      (python-next-line))
    ;; Move the cursor to the original indentation level or first non-white
    ;; character which ever comes first.
    (skip-chars-forward " \t" (+ (point) target-column))
    (point)))

(defun python-next-line ()
  "Move the cursor to the next line to check for INDENT or DEDENT tokens.
Usually this is simply the next line unless strings, lists, or blank
lines, or comment lines are encountered.  This function skips over such
items."
  (let (beg)
    (while (not (eolp))
      (setq beg (point))
      (cond
       ;; skip over triple-quote string
       ((looking-at "\"\"\"")
        (forward-char 3)
        (search-forward "\"\"\""))
       ;; skip over lists, strings, etc
       ((looking-at "\\(\\s(\\|\\s\"\\|\\s<\\)")
        (forward-sexp 1))
       ;; skip over white space, word, symbol, and punctuation characters
       (t (skip-syntax-forward "-w_.")))
      (if (= (point) beg)
          (error "You have found a bug in python-next-line")))
    ;; the point now should be at the end of a line
    (forward-line 1)
    (while (and (looking-at "\\s-*\\(\\s<\\|$\\)")
                (not (eobp)))
      ;; skip blank and comment lines
      (forward-line 1))))

What do you think of this idea?
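
For reference, the indentation comparison on its own can be sketched as
follows. This is a simplified illustration, not the code proposed above:
the name my-python-block-end is hypothetical, the target column is
assumed to default to the indentation of the line that opens the block,
and every physical line is treated as a logical line, so multi-line
strings and bracketed lists are not skipped the way python-next-line
skips them. It needs no python-mode syntax table, so it can be tried in
a scratch buffer.

```elisp
;; Hypothetical helper for illustration only; it is not part of CEDET or
;; python-mode.  Assumption: each physical line is one logical line, so
;; multi-line strings and bracketed lists are NOT handled.
(defun my-python-block-end (&optional target-column)
  "Return position of the next code line indented at or below TARGET-COLUMN.
TARGET-COLUMN defaults to the indentation of the current line.
Blank lines and comment-only lines are skipped.  Point is not moved."
  (or target-column (setq target-column (current-indentation)))
  (save-excursion
    (forward-line 1)
    (while (and (not (eobp))
                (or (looking-at "[ \t]*\\(#\\|$\\)") ; blank or comment line
                    (> (current-indentation) target-column)))
      (forward-line 1))
    (point)))

;; Usage: find where the body of the `if' statement ends.
(with-temp-buffer
  (insert "if x:\n    a = 1\n\n    # comment\n    b = 2\nc = 3\n")
  (goto-char (point-min))
  (goto-char (my-python-block-end))
  ;; Return the text of the line where the block ends.
  (buffer-substring (point) (line-end-position)))
```

Evaluating the with-temp-buffer form returns the text of the first line
after the block body, "c = 3". If the block runs to the end of the
buffer, the function returns the end-of-buffer position instead.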