Re: [cedet-semantic] [PATCH] Fix Python parsing for triple-quoted strings
From: Eric M. L. <eri...@gm...> - 2012-08-07 02:13:39
Hi Dale,

Thanks for the patch for Python support. I'm not a Python user, so I don't have comments on the patch itself.

Have you signed papers to assign copyright of changes you make to Emacs to the FSF? In order for me to include your patch in CEDET proper, you will need to assign copyright. I've attached the form to fill out and mail to fsf...@gn... when you are done.

I can generally accept small patches without an assignment, but your patch is significant enough to need one.

Thanks!
Eric

On 08/04/2012 03:42 PM, Dale Sedivec wrote:
> Greetings,
>
> Below is a patch to correct Semantic's parsing of Python's
> triple-quoted strings, such as '''foo''' and """foo""". I write Python
> for my day job and I've been using this patch for over a year with
> success.
>
> I actually tried to submit this last year, but I needed to get my
> copyright assignment filed with the FSF. I've now got that filed.
>
> These changes were originally necessary because python.el in Emacs<=
> 23 used font-lock-syntactic-keywords to parse triple-quoted strings,
> which Semantic switches off during parsing. I figured Emacs 24 and
> syntax-propertize-function would have made these changes unnecessary,
> since Semantic _presumably_ lets syntax-propertize-function run.
> However, if I just run my cedet-utests.el changes in Emacs 24 with CEDET
> HEAD, I still get a test failure, so I guess these changes are still
> necessary.
>
> Patch below against what is probably CEDET HEAD (taken from
> https://github.com/emacsmirror/cedet).
>
> Thanks,
> Dale
>
>
>
> commit c9e7a2c7ca9f3a24efb7dc5d22806ac0e8e9933f
> Date: Fri Mar 18 16:12:00 2011 -0500
>
>     wisent-python-forward-line and -string work despite hairy strings.
>
>     These functions broke on some cases of triple-quoted strings; see
>     str_test_1 and str_test_2. This parser now depends less (if at all)
>     on python.el applying syntax properties (which Semantic doesn't use)
>     via font-lock-syntactic-keywords (which Semantic may not enable).
>
> diff --git a/lisp/cedet/semantic/wisent/python.el
> b/lisp/cedet/semantic/wisent/python.el
> index 45ec58e..d9e8f2b 100644
> --- a/lisp/cedet/semantic/wisent/python.el
> +++ b/lisp/cedet/semantic/wisent/python.el
> @@ -85,11 +85,33 @@
>  ;; to be suppressed. For example, r"01\n34" is a string with six
>  ;; characters 0, 1, \, n, 3 and 4. The 'u' prefix means the following
>  ;; string is a unicode.
> -(defconst wisent-python-string-re
> -  (concat (regexp-opt '("r" "u" "ur" "R" "U" "UR" "Ur" "uR") t)
> -          "?['\"]")
> +(defconst wisent-python-string-start-re "[uU]?[rR]?['\"]"
>    "Regexp matching beginning of a Python string.")
>
> +(defconst wisent-python-string-re
> +  (rx
> +   (opt (any "uU")) (opt (any "rR"))
> +   (or
> +    ;; Triple-quoted string using apostrophes
> +    (: "'''" (zero-or-more (or "\\'"
> +                               (not (any "'"))
> +                               (: (repeat 1 2 "'") (not (any "'")))))
> +       "'''")
> +    ;; String using apostrophes
> +    (: "'" (zero-or-more (or "\\'"
> +                             (not (any "'"))))
> +       "'")
> +    ;; Triple-quoted string using quotation marks.
> +    (: "\"\"\"" (zero-or-more (or "\\\""
> +                                  (not (any "\""))
> +                                  (: (repeat 1 2 "\"") (not (any "\"")))))
> +       "\"\"\"")
> +    ;; String using quotation marks.
> +    (: "\"" (zero-or-more (or "\\\""
> +                              (not (any "\""))))
> +       "\"")))
> +  "Regexp matching a complete Python string.")
> +
>  (defvar wisent-python-EXPANDING-block nil
>    "Non-nil when expanding a paren block for Python lexical analyzer.")
>
> @@ -101,16 +123,46 @@ curly braces."
>
>  (defsubst wisent-python-forward-string ()
>    "Move point at the end of the Python string at point."
> -  (when (looking-at wisent-python-string-re)
> -    ;; skip the prefix
> -    (and (match-end 1) (goto-char (match-end 1)))
> -    ;; skip the quoted part
> -    (cond
> -     ((looking-at "\"\"\"[^\"]")
> -      (search-forward "\"\"\"" nil nil 2))
> -     ((looking-at "'''[^']")
> -      (search-forward "'''" nil nil 2))
> -     ((forward-sexp 1)))))
> +  (if (looking-at wisent-python-string-re)
> +      (let ((start (match-beginning 0))
> +            (end (match-end 0)))
> +        ;; Incomplete triple-quoted string gets matched instead as a
> +        ;; complete single quoted string. (This special case would be
> +        ;; unnecessary if Emacs regular expressions had negative
> +        ;; look-ahead assertions.)
> +        (when (and (= (- end start) 2)
> +                   (looking-at "\"\\{3\\}\\|'\\{3\\}"))
> +          (error "unterminated syntax"))
> +        (goto-char end))
> +    (error "unterminated syntax")))
> +
> +(defun wisent-python-forward-balanced-expression ()
> +  "Move point to the end of the balanced expression at point.
> +Here 'balanced expression' means anything matched by Emacs'
> +open/close parenthesis syntax classes. We can't use forward-sexp
> +for this because that Emacs built-in can't parse Python's
> +triple-quoted string syntax."
> +  (let ((end-char (cdr (syntax-after (point)))))
> +    (forward-char 1)
> +    (while (not (or (eobp) (eq (char-after (point)) end-char)))
> +      (cond
> +       ;; Skip over python strings.
> +       ((looking-at wisent-python-string-start-re)
> +        (wisent-python-forward-string))
> +       ;; At a comment start just goto end of line.
> +       ((looking-at "\\s<")
> +        (end-of-line))
> +       ;; Skip over balanced expressions.
> +       ((looking-at "\\s(")
> +        (wisent-python-forward-balanced-expression))
> +       ;; Skip over white space, word, symbol, punctuation, paired
> +       ;; delimiter (backquote) characters, line continuation, and end
> +       ;; of comment characters (AKA newline characters in Python).
> +       ((zerop (skip-syntax-forward "-w_.$\\>"))
> +        (error "can't figure out how to go forward from here"))))
> +    ;; Skip closing character. As a last resort this should raise an
> +    ;; error if we hit EOB before we find our closing character..
> +    (forward-char 1)))
>
>  (defun wisent-python-forward-line ()
>    "Move point to the beginning of the next logical line.
> @@ -124,14 +176,14 @@ line ends at the end of the buffer, leave the
> point there."
>        (progn
>          (cond
>           ;; Skip over python strings.
> -         ((looking-at wisent-python-string-re)
> +         ((looking-at wisent-python-string-start-re)
>            (wisent-python-forward-string))
>           ;; At a comment start just goto end of line.
>           ((looking-at "\\s<")
>            (end-of-line))
> -         ;; Skip over generic lists and strings.
> -         ((looking-at "\\(\\s(\\|\\s\"\\)")
> -          (forward-sexp 1))
> +         ;; Skip over balanced expressions.
> +         ((looking-at "\\s(")
> +          (wisent-python-forward-balanced-expression))
>           ;; At the explicit line continuation character
>           ;; (backslash) move to next line.
>           ((looking-at "\\s\\")
> @@ -253,7 +305,7 @@ continuation of current line."
>
>  (define-lex-regex-analyzer wisent-python-lex-string
>    "Detect and create python string tokens."
> -  wisent-python-string-re
> +  wisent-python-string-start-re
>    (semantic-lex-push-token
>     (semantic-lex-token
>      'STRING_LITERAL
> diff --git a/tests/cedet/semantic/utest-parse.el
> b/tests/cedet/semantic/utest-parse.el
> index f030613..43e770b 100644
> --- a/tests/cedet/semantic/utest-parse.el
> +++ b/tests/cedet/semantic/utest-parse.el
> @@ -255,6 +255,32 @@ if x:
>  x = 2
>  y = 3
>  r, s, t = 1, 2, '3'
> +
> +# Test string corner cases. Note that triple-quoted strings used
> +# to depend on font-lock to apply syntax properties to them.
> +# Code in the Python lexer that depended on scan-sexps and the
> +# like has been replaced with more manual methods to work around
> +# this problem.
> +def str_test_1():
> +    '''This might trip up wisent-python-forward-string: \\''' '''
> +
> +def str_test_2():
> +    ('''Internal apostrophe in PAREN_BLOCK doesn't end this
> +    string literal. If you're using forward-sexp to skip this
> +    parenthetical expression, and syntax properties from
> +    python-mode haven't been applied, you'll fail to recognize
> +    the end of this triple-quoted string because this last
> +    apostrophe makes an odd number of apostrophes: ' Now it would
> +    look like you have an unterminated string literal starting at
> +    the last of these three apostrophes:''')
> +
> +def str_test_3():
> +    \"don't\" \"trip\" \"on\" \"adjacent\" \"strings\"
> +
> +# str_test_4 is only here to make sure that we're still correctly
> +# finding tags after all the preceding tests.
> +def str_test_4():
> +    pass
>  "
>
>
> @@ -406,6 +432,12 @@ r, s, t = 1, 2, '3'
>      ("x" variable nil nil nil)
>      ("y" variable nil nil nil)
>      ("r, s, t" code nil nil nil) ;; TODO should be multiple variable tags
> +
> +    ;; String tests
> +    ("str_test_1" function nil nil nil)
> +    ("str_test_2" function nil nil nil)
> +    ("str_test_3" function nil nil nil)
> +    ("str_test_4" function nil nil nil)
>      )
>    "List of expected tag names for Python.")
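
For readers who want to try the matching logic outside Emacs, here is a minimal sketch in Python, assuming a direct translation of the patch's `rx` form for `wisent-python-string-re` into Python's `re` syntax. The names `quoted_forms`, `STRING_RE`, and `end_of_string` are hypothetical and appear nowhere in CEDET. The patch itself is Emacs Lisp, and this sketch only mirrors the regexp plus the unterminated triple-quote check in `wisent-python-forward-string`.

```python
import re


def quoted_forms(q):
    """Return regex source for the triple- and single-quoted forms of quote char q."""
    escaped = "\\\\" + q    # regex matching a backslash followed by the quote
    other = "[^" + q + "]"  # regex matching any character except the quote
    # Inside a triple-quoted string, one or two quotes may appear as long as
    # a non-quote character follows them.
    triple = (q * 3 + "(?:" + escaped + "|" + other + "|"
              + q + "{1,2}" + other + ")*" + q * 3)
    single = q + "(?:" + escaped + "|" + other + ")*" + q
    # Triple-quoted form first, in the same order as the patch's `or` clause.
    return triple + "|" + single


# Optional u/U and r/R prefixes, then one of the four quoted forms.
STRING_RE = re.compile(
    "[uU]?[rR]?(?:" + quoted_forms("'") + "|" + quoted_forms('"') + ")"
)


def end_of_string(text, pos=0):
    """Return the index just past the Python string literal starting at pos.

    Raises ValueError when no complete string literal starts at pos.
    """
    m = STRING_RE.match(text, pos)
    if m is None:
        raise ValueError("unterminated syntax")
    # An unterminated triple-quoted string first matches as an empty
    # single-quoted string ('' or ""); reject that case explicitly, as the
    # patch does in place of a negative look-ahead.
    if m.end() - m.start() == 2 and text.startswith(("'''", '"""'), pos):
        raise ValueError("unterminated syntax")
    return m.end()
```

Calling `end_of_string` on text shaped like the patch's `str_test_2` case, a triple-quoted string with a lone internal apostrophe, returns the index just past the closing `'''`. An input such as `'''abc` raises `ValueError` rather than being accepted as the empty string `''`.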