From: Richard Y. Kim <ryk@ds...>  2002-05-29 06:53:22

David,

I like your `semantic-flex-enable-indents' based code in all respects
compared with my initial gross hack. Your code is simpler and more
general.

I don't yet understand what wisent-flex offers, but I assume it can keep
track of a "stack of indentations" and properly compute INDENT and DEDENT
tokens for use by the parser. I'll study wisent-flex so that I understand
what you are talking about. After that, I'll see if I can use your
modified semantic-flex along with wisent-flex to finish off the python
lexer.

Thanks for the good ideas.

>>>>> "DP" == David Ponce <david@...> writes:

DP>
DP> Hi Eric & Richard,
DP> Following Richard's work and remarks about `semantic-flex' and python,
DP> I submit the attached version of `semantic-flex', hacked to catch
DP> indentation. I haven't tested it intensively, but it could be a starting
DP> point to enhance `semantic-flex' ;)
DP>
DP> I defined a new buffer-local option `semantic-flex-enable-indents'.
DP> When non-nil, `semantic-flex' catches indentation (at beginning of
DP> lines) and inserts corresponding pseudo-syntactic 'indent tokens in
DP> the returned token stream. I call such tokens pseudo-syntactic
DP> because they don't actually match data in the input source
DP> (`semantic-flex' doesn't move point when it catches one).
DP>
DP> I used the form (indent . N), where N is the `current-indentation'
DP> value, because I think it could be more useful than a token of the
DP> form (indent START . END), particularly because the true indentation
DP> value can differ from (- END START) when there are tab characters.
DP>
DP> Thus, after evaluating something like this:
DP>
DP>   (let ((semantic-flex-enable-indents t)
DP>         (semantic-flex-enable-whitespace t))
DP>     (semantic-flex-buffer))
DP>
DP> it is possible to get the following stream:
DP>
DP>   ((indent . 8) (whitespace 1 . 2) (symbol 2 . 5))
DP>
DP> from a buffer containing:
DP>
DP>   <tab>ITEM
DP>
DP> assuming the tab width is 8!
DP>
DP> Such 'indent tokens should then be easily handled by the `wisent-flex'
DP> lexer to produce the INDENT and DEDENT lexical tokens needed to parse
DP> python.
DP>
DP> What do you think?
DP>
DP> Sincerely,
DP> David
DP>
DP>
DP> (defvar semantic-flex-enable-indents nil
DP>   "When flexing, report 'indent as pseudo syntactic elements.
DP> Useful for languages like python which are indentation sensitive.
DP> Only set this on a per mode basis, not globally.")
DP> (make-variable-buffer-local 'semantic-flex-enable-indents)
DP>
DP> (defun semantic-flex (start end &optional depth length)
DP>   "Using the syntax table, do something roughly equivalent to flex.
DP> Semantically check between START and END.  Optional argument DEPTH
DP> indicates at what level to scan over entire lists.
DP> The return value is a token stream.  Each element is a list of
DP> the form (symbol start-expression . end-expression) where
DP> SYMBOL denotes the token type.
DP> See `semantic-flex-tokens' variable for details on token types.
DP> END does not mark the end of the text scanned, only the end of the beginning
DP> of text scanned.  Thus, if a string extends past END, the end of the
DP> return token will be larger than END.  To truly restrict
DP> scanning, use `narrow-to-region'.
DP> The last argument, LENGTH specifies that `semantic-flex' should only return
DP> LENGTH tokens."
DP>   ;(message "Flexing muscles...")
DP>   (if (not semantic-flex-keywords-obarray)
DP>       (setq semantic-flex-keywords-obarray [ nil ]))
DP>   (let ((ts nil)
DP>         (pos (point))
DP>         (ep nil)
DP>         (curdepth 0)
DP>         (cs (if comment-start-skip
DP>                 (concat "\\(\\s<\\|" comment-start-skip "\\)")
DP>               (concat "\\(\\s<\\)")))
DP>         (newsyntax (copy-syntax-table (syntax-table)))
DP>         (mods semantic-flex-syntax-modifications)
DP>         ;; Use the default depth if it is not specified.
DP>         (depth (or depth semantic-flex-depth)))
DP>     ;; Update the syntax table
DP>     (while mods
DP>       (modify-syntax-entry (car (car mods)) (car (cdr (car mods))) newsyntax)
DP>       (setq mods (cdr mods)))
DP>     (with-syntax-table newsyntax
DP>       (goto-char start)
DP>       (while (and (< (point) end) (or (not length) (<= (length ts) length)))
DP>         (cond ( ;; catch newlines when needed
DP>                (and semantic-flex-enable-newlines
DP>                     (looking-at "\\s-*\\(\n\\)"))
DP>                (setq ep (match-end 1)
DP>                      ts (cons (cons 'newline
DP>                                     (cons (match-beginning 1) ep))
DP>                               ts)))
DP>               ;; catch indentation when needed.  Just insert a pseudo
DP>               ;; token (indent . N), where N is the
DP>               ;; `current-indentation' value, in the token stream
DP>               ;; without moving the point.
DP>               ((if semantic-flex-enable-indents
DP>                    (save-excursion
DP>                      (if (eolp) (forward-char))
DP>                      (if (bolp)
DP>                          (setq ts (cons (cons 'indent
DP>                                               (current-indentation))
DP>                                         ts)))
DP>                      nil)))
DP>               ;; special extensions, sometimes includes some whitespace.
DP>               ((and semantic-flex-extensions
DP>                     (let ((fe semantic-flex-extensions)
DP>                           (r nil))
DP>                       (while fe
DP>                         (if (looking-at (car (car fe)))
DP>                             (setq ts (cons (funcall (cdr (car fe))) ts)
DP>                                   r t
DP>                                   fe nil
DP>                                   ep (point)))
DP>                         (setq fe (cdr fe)))
DP>                       (if (and r (not (car ts))) (setq ts (cdr ts)))
DP>                       r)))
DP>               ;; comment end is also EOL for some languages.
DP>               ((looking-at "\\(\\s-\\|\\s>\\)+")
DP>                (if semantic-flex-enable-whitespace
DP>                    (setq ts (cons (cons 'whitespace
DP>                                         (cons (match-beginning 0)
DP>                                               (match-end 0)))
DP>                                   ts))))
DP>               ;; numbers
DP>               ((and semantic-number-expression
DP>                     (looking-at semantic-number-expression))
DP>                (setq ts (cons (cons 'number
DP>                                     (cons (match-beginning 0)
DP>                                           (match-end 0)))
DP>                               ts)))
DP>               ;; symbols
DP>               ((looking-at "\\(\\sw\\|\\s_\\)+")
DP>                (setq ts (cons (cons
DP>                                ;; Get info on if this is a keyword or not
DP>                                (or (semantic-flex-keyword-p (match-string 0))
DP>                                    'symbol)
DP>                                (cons (match-beginning 0) (match-end 0)))
DP>                               ts)))
DP>               ;; Character quoting characters (ie, \n as newline)
DP>               ((looking-at "\\s\\+")
DP>                (setq ts (cons (cons 'charquote
DP>                                     (cons (match-beginning 0) (match-end 0)))
DP>                               ts)))
DP>               ;; Open parens, or semantic-lists.
DP>               ((looking-at "\\s(")
DP>                (if (or (not depth) (< curdepth depth))
DP>                    (progn
DP>                      (setq curdepth (1+ curdepth))
DP>                      (setq ts (cons (cons 'open-paren
DP>                                           (cons (match-beginning 0) (match-end 0)))
DP>                                     ts)))
DP>                  (setq ts (cons
DP>                            (cons 'semantic-list
DP>                                  (cons (match-beginning 0)
DP>                                        (save-excursion
DP>                                          (condition-case nil
DP>                                              (forward-list 1)
DP>                                            ;; This case makes flex robust
DP>                                            ;; to broken lists.
DP>                                            (error
DP>                                             (goto-char
DP>                                              (funcall
DP>                                               semantic-flex-unterminated-syntax-end-function
DP>                                               'semantic-list
DP>                                               start end))))
DP>                                          (setq ep (point)))))
DP>                            ts))))
DP>               ;; Close parens
DP>               ((looking-at "\\s)")
DP>                (setq ts (cons (cons 'close-paren
DP>                                     (cons (match-beginning 0) (match-end 0)))
DP>                               ts))
DP>                (setq curdepth (1- curdepth)))
DP>               ;; String initiators
DP>               ((looking-at "\\s\"")
DP>                ;; Zing to the end of this string.
DP>                (setq ts (cons (cons 'string
DP>                                     (cons (match-beginning 0)
DP>                                           (save-excursion
DP>                                             (condition-case nil
DP>                                                 (forward-sexp 1)
DP>                                               ;; This case makes flex
DP>                                               ;; robust to broken strings.
DP>                                               (error
DP>                                                (goto-char
DP>                                                 (funcall
DP>                                                  semantic-flex-unterminated-syntax-end-function
DP>                                                  'string
DP>                                                  start end))))
DP>                                             (setq ep (point)))))
DP>                               ts)))
DP>               ((looking-at cs)
DP>                (if semantic-ignore-comments
DP>                    ;; If the language doesn't deal with comments,
DP>                    ;; ignore them here.
DP>                    (let ((comment-start-point (point)))
DP>                      (forward-comment 1)
DP>                      (if (eq (point) comment-start-point)
DP>                          ;; In this case our start-skip string failed
DP>                          ;; to work properly.  Lets try and move over
DP>                          ;; whatever white space we matched to begin
DP>                          ;; with.
DP>                          (skip-syntax-forward "-.'"
DP>                                               (save-excursion
DP>                                                 (end-of-line)
DP>                                                 (point)))
DP>                        (forward-comment 1))
DP>                      (if (eq (point) comment-start-point)
DP>                          (error "Strange comment syntax prevents lexical analysis"))
DP>                      (setq ep (point)))
DP>                  ;; Language wants comments, link them together.
DP>                  (if (eq (car (car ts)) 'comment)
DP>                      (setcdr (cdr (car ts)) (save-excursion
DP>                                               (forward-comment 1)
DP>                                               (setq ep (point))))
DP>                    (setq ts (cons (cons 'comment
DP>                                         (cons (match-beginning 0)
DP>                                               (save-excursion
DP>                                                 (forward-comment 1)
DP>                                                 (setq ep (point)))))
DP>                                   ts)))))
DP>               ((looking-at "\\(\\s.\\|\\s$\\|\\s'\\)")
DP>                (setq ts (cons (cons 'punctuation
DP>                                     (cons (match-beginning 0) (match-end 0)))
DP>                               ts)))
DP>               (t (error "What is that?")))
DP>         (goto-char (or ep (match-end 0)))
DP>         (setq ep nil)))
DP>     (goto-char pos)
DP>     ;(message "Flexing muscles...done")
DP>     (nreverse ts)))
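
A note on the "stack of indentations" idea discussed above: the step from
(indent . N) pseudo tokens to INDENT and DEDENT tokens can be sketched in a
few lines of Emacs Lisp. This is only an illustration under stated
assumptions, not how `wisent-flex' actually works. The function name
`my-indent-tokens-to-blocks' and the (INDENT . N) / (DEDENT . N) output
shapes are hypothetical, and the input is assumed to look like the stream
shown in the message, with (indent . N) entries mixed in among
(TYPE START . END) tokens.

```elisp
;; Hypothetical helper, not part of semantic or wisent.
(defun my-indent-tokens-to-blocks (stream)
  "Replace (indent . N) pseudo tokens in STREAM by block tokens.
An increase in indentation yields (INDENT . N); a decrease yields one
\(DEDENT . M) per closed level, where M is the level returned to.
All other tokens are passed through unchanged."
  (let ((stack (list 0))   ; open indentation levels, innermost first
        (out nil))
    (dolist (tok stream)
      (if (not (and (eq (car tok) 'indent) (integerp (cdr tok))))
          ;; Ordinary (TYPE START . END) token: keep it.
          (push tok out)
        (let ((n (cdr tok)))
          (if (> n (car stack))
              ;; Deeper than the current block: open a new one.
              (progn (push n stack)
                     (push (cons 'INDENT n) out))
            ;; Same or shallower: close blocks until the levels match.
            (while (< n (car stack))
              (pop stack)
              (push (cons 'DEDENT (car stack)) out))
            (unless (= n (car stack))
              (error "Inconsistent dedent to column %d" n))))))
    ;; At end of input, close every block that is still open.
    (while (cdr stack)
      (pop stack)
      (push (cons 'DEDENT (car stack)) out))
    (nreverse out)))
```

For instance, the stream

  ((indent . 0) (symbol 1 . 3) (indent . 4) (symbol 8 . 12)
   (indent . 8) (symbol 21 . 25))

comes back as

  ((symbol 1 . 3) (INDENT . 4) (symbol 8 . 12) (INDENT . 8)
   (symbol 21 . 25) (DEDENT . 4) (DEDENT . 0))

A real python lexer would also have to ignore the indentation of blank and
comment-only lines and of continuation lines inside brackets; the sketch
does none of that.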