Thread: Re: [CEDET-devel] semantic lexer for python
From: <pon...@ne...> - 2002-05-29 08:48:48
Hi Richard & Eric,

[...]

> Yes in most cases. In Dave's code however, INDENT tokens
> may be generated by the empty string at the beginning of
> lines without leading white spaces!

That is what I tried to achieve ;-)

> Also a key difference between INDENT and whitespace tokens is
> that Dave's INDENT token does not consume any input
> characters! Dave stuck in an entry in the middle of `cond'
> clauses that may generate INDENT tokens, but it does not
> move the current point. There is no infinite recursion,
> because the cond clause Dave added always evaluates to `nil'
> so that it goes on to the next cond clause *always*. I had
> to look at the code for a couple of minutes before I
> understood what was going on. Completely legal code, but
> unusual use of the `cond' form. I have no problem with the
> code so long as we add a comment in capital letters what is
> going on.

You're right! It is an unusual use of `cond' that should be
emphasized. Unless you (or Eric) have a better idea on how to
implement that ;-)

> Despite the fact that Dave turned on both
> semantic-flex-enable-indents and
> semantic-flex-enable-whitespace in his sample code, the two
> are independent features, i.e., they can be turned on/off
> independently. I say this, because my first concern when I
> saw Dave's code was that whitespace tokens need to be turned
> on to turn on the INDENT tokens. After studying his code,
> they seem to be independent.

Yes, these two options are completely independent. In my example I
just wanted to illustrate that INDENT tokens just match the empty
string at the beginning of a line and don't prevent handling white
space if needed!

[...]

> If speed becomes an issue, it may make sense to implement
> part of semantic in C. I don't know that we have reached
> that point yet with regard to python.

[...]
I don't know python at all, but it seems that its design clearly
exhibits the limits of Emacs, which is designed to work well with
language syntaxes based on classic parenthesized block structures.
Probably because of the Lisp inheritance ;-)

So, in the case of python, I think it will be difficult for
semantic-flex to easily produce the nice 'semantic-list tokens needed
to recursively parse sub-parts of code. Those tokens allow a simple
but general, robust and efficient mechanism to skip code with invalid
syntax without breaking the parser or cluttering up the (LALR) grammar
with a lot of error recovery rules that are difficult to tune.

I agree with Eric that syntax tables are mainly oriented toward
navigation, particularly through parenthesized blocks of code. So, in
the case of python, because of the above orientation (limitation?), it
will be nearly impossible to use powerful navigation tools like
`up-list', `down-list', etc., heavily used by the semantic-ctxt stuff.
A lot of semantic-ctxt functions will probably need to be overridden
by specific code, certainly less efficient than the built-in Emacs
ones :-(

I don't think that writing parts of the Semantic lexer/parser tools in
C will improve Emacs' design. Maybe we could submit a Request For
Enhancement to the Emacs developers, so Emacs could take into account
new language concepts like python's indentation?

Sincerely,
David

__________________________________________________________________
Your favorite stores, helpful shopping tools and great gift ideas.
Experience the convenience of buying online with Shop@Netscape!
http://shopnow.netscape.com/

Get your own FREE, personal Netscape Mail account today at
http://webmail.netscape.com/
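[Editorial note: the zero-width INDENT idea discussed above -- a token
generated by the empty string at the beginning of a line, consuming no
input -- can be pictured in a few lines of Python. This is purely an
illustration, not semantic code, and the token names are made up.]

```python
import re

def line_tokens(line):
    """Lex one line, emitting a zero-width ('indent', n) token first.
    Like Dave's always-nil cond clause, producing it does not move the
    scan position: it matches the empty string at beginning of line,
    even when there is no leading white space."""
    indent = len(line) - len(line.lstrip(' '))
    toks = [('indent', indent)]          # consumes no characters
    for m in re.finditer(r'\w+|[^\w\s]', line):
        word = m.group()
        toks.append(('symbol' if (word[0].isalnum() or word[0] == '_')
                     else 'punctuation', word))
    return toks
```

So `line_tokens("x = 1")` starts with `('indent', 0)` even though the
line has no leading white space, and the remaining tokens are then
produced from the same, un-advanced position.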
From: <pon...@ne...> - 2002-05-29 11:19:38
To continue on this subject, and to illustrate how a wisent lexer
could be written for python, based on my previous hack of
`semantic-flex' I just wrote the following basic piece of code
(untested) ;-)

(defvar wisent-python-last-indent nil
  "The last level of indentation encountered so far.
Should be reset before starting a new parse task.")

(defun wisent-python-lexer ()
  "Return the next python's lexical token available.
Filter any `semantic-flex' 'indent tokens available to produce (INDENT
N) or (DEDENT N) lexical tokens needed to parse python code.  Other
`semantic-flex' tokens are handled in a normal way by `wisent-flex'."
  (let (wlex curr-indent last-indent)
    ;; Digest `semantic-flex' 'indent tokens
    (while (and (not wlex) (eq (caar wisent-flex-istream) 'indent))
      (setq curr-indent (cdar wisent-flex-istream)
            last-indent (or wisent-python-last-indent 0)
            wisent-python-last-indent curr-indent
            wisent-flex-istream (cdr wisent-flex-istream))
      (cond
       ;; No indentation change
       ((= curr-indent last-indent))    ; just eat the 'indent token
       ;; Indentation increased
       ((> curr-indent last-indent)
        ;; Return an INDENT lexical token
        (setq wlex (list 'INDENT (- curr-indent last-indent))))
       ;; Indentation decreased
       (t
        ;; Pop indentation stack
        (setq wlex (list 'DEDENT (- last-indent curr-indent))))))
    ;; In all cases return the next lexical token found
    (or wlex (wisent-flex))))

What do you think?

David

__________________________________________________________________
Your favorite stores, helpful shopping tools and great gift ideas.
Experience the convenience of buying online with Shop@Netscape!
http://shopnow.netscape.com/
From: Richard Y. K. <ry...@ds...> - 2002-05-30 04:51:30
David,

Thanks for the effort.  I still have not studied wisent-flex, but
wisent-python-last-indent does not look right.  Python indentations
need to be stored on a stack so that we know how many DEDENT tokens to
generate!  For example,

  def f(x):
      simple_statement
      compound_statement:
          simple_statement
          simple_statement
  def g(x):    # Two DEDENT tokens are needed here!
      ...

I quote from <http://www.python.org/doc/current/ref/indentation.html>:

  The indentation levels of consecutive lines are used to generate
  INDENT and DEDENT tokens, using a stack, as follows.

  Before the first line of the file is read, a single zero is pushed
  on the stack; this will never be popped off again.  The numbers
  pushed on the stack will always be strictly increasing from bottom
  to top.  At the beginning of each logical line, the line's
  indentation level is compared to the top of the stack.  If it is
  equal, nothing happens.  If it is larger, it is pushed on the stack,
  and one INDENT token is generated.  If it is smaller, it must be one
  of the numbers occurring on the stack; all numbers on the stack that
  are larger are popped off, and for each number popped off a DEDENT
  token is generated.  At the end of the file, a DEDENT token is
  generated for each number remaining on the stack that is larger than
  zero.

Unless I'm missing something fundamental, your code below will not
generate the proper number of DEDENT tokens.

>>>>> "DP" == David Ponce <pon...@ne...> writes:

DP> To continue on this subject, and to illustrate how a
DP> wisent lexer could be written for python, based on my
DP> previous hack of `semantic-flex' I just wrote the
DP> following basic piece of code (untested) ;-)

[...]

DP> What do you think?

DP> David
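[Editorial note: the algorithm Richard quotes is short enough to
transcribe directly. Below is a minimal Python sketch of it -- an
illustration only; it assumes spaces-only indentation and ignores
blank lines, tabs and line continuations. On the `def g(x)' example
above it emits the two consecutive DEDENT tokens that a single
last-indent variable cannot produce.]

```python
def indent_tokens(lines):
    """Yield INDENT/DEDENT tokens for a sequence of logical lines,
    maintaining the stack exactly as the Python reference describes."""
    stack = [0]  # a single zero, pushed first and never popped
    for line in lines:
        level = len(line) - len(line.lstrip(' '))
        if level > stack[-1]:
            stack.append(level)       # larger: push, one INDENT
            yield 'INDENT'
        else:
            while stack[-1] > level:  # smaller: pop every larger number,
                stack.pop()           # one DEDENT per popped number
                yield 'DEDENT'
            assert stack[-1] == level, "inconsistent dedent"
    while stack[-1] > 0:              # end of file: close what is open
        stack.pop()
        yield 'DEDENT'
```

Feeding it the example above yields INDENT, INDENT, then two DEDENT
tokens in a row at `def g(x):`, then INDENT and a final DEDENT at end
of file.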
From: <pon...@ne...> - 2002-05-31 09:06:44
Attachments:
sem-flex2.el
Hi Eric & Richard,

Attached you will find the latest version of `semantic-flex' including
Richard's newline enhancement plus the following:

- newlines are also correctly handled when `semantic-ignore-comments'
  is disabled.

- I moved BOL detection before processing `semantic-flex-extensions'
  because BOL tokens must always be inserted before any other tokens
  found on the same line.

  Maybe a more general approach to keep the ordering of tokens
  consistent would be to replace the last

    (nreverse ts)

  by

    (sort ts #'(lambda (t1 t2) (<= (cadr t1) (cadr t2))))

  Or even better, to also merge whitespace and comment tokens here
  too?

Anyway, I think this new version is more consistent in the way it
handles the various `semantic-flex-enable-...' options.

Here is a small example of the different results from `semantic-flex'
depending on which options are enabled.  The input is the following
small piece of Java code.

  /**
   * Describe variable <code>x</code> here.
   */
  if (ok)
    int[] x = {1,2,3}; // end

1- Want all

  (let ((semantic-ignore-comments nil)
        (semantic-flex-enable-bol t)
        (semantic-flex-enable-newlines t)
        (semantic-flex-enable-whitespace t))
    (semantic-flex-buffer))

  ((bol 1 . 1) (comment 1 . 50) (newline 50 . 51) (bol 51 . 51)
   (IF 51 . 53) (whitespace 53 . 54) (semantic-list 54 . 58)
   (newline 58 . 59) (bol 59 . 59) (whitespace 59 . 61) (INT 61 . 64)
   (semantic-list 64 . 66) (whitespace 66 . 67) (symbol 67 . 68)
   (whitespace 68 . 69) (punctuation 69 . 70) (whitespace 70 . 71)
   (semantic-list 71 . 78) (punctuation 78 . 79) (newline 79 . 80)
   (bol 80 . 80) (comment 80 . 86) (newline 86 . 87) (bol 87 . 87))

2- Ignore comments (returns them as whitespace)

  (let ((semantic-flex-enable-bol t)
        (semantic-flex-enable-newlines t)
        (semantic-flex-enable-whitespace t))
    (semantic-flex-buffer))

  ((bol 1 . 1) (whitespace 1 . 50) (newline 50 . 51) (bol 51 . 51)
   (IF 51 . 53) (whitespace 53 . 54) (semantic-list 54 . 58)
   (newline 58 . 59) (bol 59 . 59) (whitespace 59 . 61) (INT 61 . 64)
   (semantic-list 64 . 66) (whitespace 66 . 67) (symbol 67 . 68)
   (whitespace 68 . 69) (punctuation 69 . 70) (whitespace 70 . 71)
   (semantic-list 71 . 78) (punctuation 78 . 79) (newline 79 . 80)
   (bol 80 . 80) (whitespace 80 . 86) (newline 86 . 87) (bol 87 . 87))

3- Want newline & whitespace

  (let ((semantic-flex-enable-newlines t)
        (semantic-flex-enable-whitespace t))
    (semantic-flex-buffer))

  ((whitespace 1 . 50) (newline 50 . 51) (IF 51 . 53)
   (whitespace 53 . 54) (semantic-list 54 . 58) (newline 58 . 59)
   (whitespace 59 . 61) (INT 61 . 64) (semantic-list 64 . 66)
   (whitespace 66 . 67) (symbol 67 . 68) (whitespace 68 . 69)
   (punctuation 69 . 70) (whitespace 70 . 71) (semantic-list 71 . 78)
   (punctuation 78 . 79) (newline 79 . 80) (whitespace 80 . 86)
   (newline 86 . 87))

4- Just whitespace

  (let ((semantic-flex-enable-whitespace t))
    (semantic-flex-buffer))

  ((whitespace 1 . 50) (IF 51 . 53) (whitespace 53 . 54)
   (semantic-list 54 . 58) (whitespace 59 . 61) (INT 61 . 64)
   (semantic-list 64 . 66) (whitespace 66 . 67) (symbol 67 . 68)
   (whitespace 68 . 69) (punctuation 69 . 70) (whitespace 70 . 71)
   (semantic-list 71 . 78) (punctuation 78 . 79) (whitespace 80 . 87))

5- The default

  (semantic-flex-buffer)

  ((IF 51 . 53) (semantic-list 54 . 58) (INT 61 . 64)
   (semantic-list 64 . 66) (symbol 67 . 68) (punctuation 69 . 70)
   (semantic-list 71 . 78) (punctuation 78 . 79))

6- etc.!

Eric, if you agree, maybe we could check it in so Richard could use it
for his semantic-python stuff?

Sincerely,
David
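[Editorial note on the `sort' idea above: ordering tokens by their
start position is easy to express -- this Python transcription of the
comparator is an illustration, using (type, start, end) triples
mirroring the `(bol 1 . 1)' listings above -- but a stable sort leaves
tokens with equal start positions in generation order, so sorting
alone cannot guarantee that a bol token precedes a comment starting at
the same position; BOL detection still has to run first.]

```python
# Tokens as (type, start, end), mirroring the (bol 1 . 1) style above.
tokens = [('comment', 1, 50), ('bol', 1, 1), ('newline', 50, 51),
          ('bol', 51, 51), ('IF', 51, 53)]

# The elisp (sort ts #'(lambda (t1 t2) (<= (cadr t1) (cadr t2))))
# becomes a key-based sort on the start position:
ordered = sorted(tokens, key=lambda tok: tok[1])

# Python's sort is stable, so ('comment', 1, 50) is still emitted
# before ('bol', 1, 1): generation order decides ties.
```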
From: Eric M. L. <er...@si...> - 2002-05-31 12:35:26
>>> pon...@ne... (David Ponce) seems to think that:
>Hi Eric & Richard,
>
>Attached you will find the latest version of `semantic-flex'
>including Richard's newline enhancement plus the following:
>
>- newlines are also correctly handled when `semantic-ignore-comments'
>  is disabled.
>
>- I moved BOL detection before processing `semantic-flex-extensions'
>  because BOL tokens must always be inserted before any other tokens
>  found on the same line.
>
>  Maybe a more general approach to keep the ordering of tokens
>  consistent would be to replace the last
>
>    (nreverse ts)
>
>  by
>
>    (sort ts #'(lambda (t1 t2) (<= (cadr t1) (cadr t2))))
>
>  Or even better, to also merge whitespace and comment tokens here
>  too?

nreverse is going to be the fastest method of re-ordering the tokens.
If I remember correctly, Emacs uses quicksort, and quicksort is least
efficient on a fully-ordered list ( O(n^2) ).  In addition, Emacs is
forced to use nthcdr, which adds O(log(n)) to the mix (an extra scan
every time it divides).  Thus, the grand total (if I did my analysis
correctly) is O(n^2 log(n)) for our case.  To our detriment, lexical
token lists are very long.

If this final reverse/sort is a small portion of the overall time
spent analyzing (certainly possible), then I don't have a problem
using something else though. ;)

>Anyway I think this new version is more consistent in the way it
>handles the various `semantic-flex-enable-...' options.

[ ... ]

I think your resolution to his problem is very good.  When you are
comfortable with it, please check it in.

Thanks
Eric

--
Eric Ludlam:  za...@gn..., er...@si...
Home: www.ultranet.com/~zappo   Siege: www.siege-engine.com
Emacs: http://cedet.sourceforge.net   GNU: www.gnu.org
From: Richard Y. K. <ry...@ds...> - 2002-05-31 18:18:47
>>>>> "EL" == Eric M Ludlam <er...@si...> writes:

EL> [ ... ]
EL> I think your resolution to his problem is very good.  When you are
EL> comfortable with it, please check it in.

Wait a minute!

I have no problem with Dave's fine work.  I have a problem with my own
change, which may have unfortunate side effects!

In the part of semantic-flex that deals with comments, I replaced the
second (forward-comment 1) with

  (if (and semantic-flex-enable-newlines
           (bolp))
      (backward-char 1))

Should this be instead

  (if (and semantic-flex-enable-newlines
           (bolp))
      (backward-char 1)
    (forward-comment 1))

i.e., add back (forward-comment 1) so that the old behavior is not
modified too much?

What I did was a quick "fix" in order to test other pieces of code.  I
would certainly like both of you to review this thoroughly before it
gets checked in.
From: David P. <da...@dp...> - 2002-05-31 18:47:49
Hi,

[...]

> nreverse is going to be the fastest method of re-ordering the
> tokens.  If I remember correctly, Emacs uses quicksort, and quicksort
> is least efficient on a fully-ordered list ( O(n^2) ).  In addition,
> Emacs is forced to use nthcdr, which adds O(log(n)) to the mix (an
> extra scan every time it divides).  Thus, the grand total (if I did
> my analysis correctly) is O(n^2 log(n)) for our case.  To our
> detriment, lexical token lists are very long.
[...]

Good point!  For now there is no reason not to use `nreverse', so
there is no need to change the code of `semantic-flex'.

[...]

> I think your resolution to his problem is very good.  When you are
> comfortable with it, please check it in.

Thanks!  I checked it in so Richard can use it :)  I also updated the
manual accordingly and fixed some documentation inaccuracies ;-)
Following is a patch.  If you agree I could check it in too.

David

*** semantic.texi.ori	Mon May 13 07:48:21 2002
--- semantic.texi	Fri May 31 20:03:15 2002
***************
*** 413,419 ****
--- 413,425 ----
  the value of @var{semantic-flex-make-extensions} which may generate
  @code{shell-command} tokens.
+ 
+ @anchor{Default syntactic tokens}
+ @subsection Default syntactic tokens if the lexer is not extended.
  @table @code
+ @item bol
+ Empty string matching a beginning of line.
+ This token is produced only if the user set
+ @var{semantic-flex-enable-bol} to non-@code{nil}.
  @item charquote
  String sequences that match @code{\\s\\+}.
  @item close-paren
***************
*** 425,431 ****
  They are produced only if the user set
  @var{semantic-ignore-comments} to @code{nil}.
  @item newline
! Characters matching @code{\\s-*\\(\n\\)}.
  This token is produced only if the user set
  @var{semantic-flex-enable-newlines} to non-@code{nil}.
--- 431,437 ----
  They are produced only if the user set
  @var{semantic-ignore-comments} to @code{nil}.
  @item newline
! Characters matching @code{\\s-*\\(\n\\|\\s>\\)}.
  This token is produced only if the user set
  @var{semantic-flex-enable-newlines} to non-@code{nil}.
***************
*** 447,452 ****
--- 453,464 ----
  matching end.
  @item symbol
  String sequences that match @code{\\(\\sw\\|\\s_\\)+}.
+ @item whitespace
+ Characters that match `\\s-+' regexp.
+ This token is produced only if the user set
+ @var{semantic-flex-enable-whitespace} to non-@code{nil}.  If
+ @var{semantic-ignore-comments} is non-@code{nil} too comments are
+ considered as whitespaces.
  @end table
  
  @node Lexer Options, Keywords, Lexer Output, Lexing
***************
*** 456,461 ****
--- 468,484 ----
  functions, there are ways for you to extend or customize the lexer.
  Three variables shown below serve this purpose.
  
+ @defvar semantic-flex-unterminated-syntax-end-function
+ Function called when unterminated syntax is encountered.
+ This should be set to one function.  That function should take three
+ parameters.  The @var{SYNTAX}, or type of syntax which is unterminated.
+ @var{SYNTAX-START} where the broken syntax begins.
+ @var{FLEX-END} is where the lexical analysis was asked to end.
+ This function can be used for languages that can intelligently fix up
+ broken syntax, or the exit lexical analysis via @dfn{throw} or @dfn{signal}
+ when finding unterminated syntax.
+ @end defvar
+ 
  @defvar semantic-flex-extensions
  Buffer local extensions to the lexical analyzer.
  This should contain an alist with a key of a regex and a data element of
***************
*** 497,509 ****
  Only set this on a per mode basis, not globally.
  @end defvar
  
! @defvar semantic-flex-unterminated-syntax-throw-symbol
! Symbol specifying what to @dfn{throw} upon finding unterminated syntax.
! Lists and strings, could be unterminated.  This provides something that
! can be @code{thrown} from the lexical analysis phase for tools that wish
! to take special care when problems arise during a parse.
! Set this variable in a @dfn{let} statement, then wrap lexical or parsing
! calls in @dfn{catch}.
  @end defvar
  
  @node Keywords, Keyword Properties, Lexer Options, Lexing
--- 520,567 ----
  Only set this on a per mode basis, not globally.
  @end defvar
  
! @defvar semantic-flex-enable-whitespace
! When flexing, report @code{'whitespace} as syntactic elements.
! Useful for languages where the syntax is whitespace dependent.
! Only set this on a per mode basis, not globally.
! @end defvar
! 
! @defvar semantic-flex-enable-bol
! When flexing, report beginning of lines as syntactic elements.
! Useful for languages like python which are indentation sensitive.
! Only set this on a per mode basis, not globally.
! @end defvar
! 
! @defvar semantic-number-expression
! Regular expression for matching a number.
! If this value is @code{nil}, no number extraction is done during lex.
! This expression tries to match C and Java like numbers.
! 
! @example
! DECIMAL_LITERAL:
!     [1-9][0-9]*
!   ;
! HEX_LITERAL:
!     0[xX][0-9a-fA-F]+
!   ;
! OCTAL_LITERAL:
!     0[0-7]*
!   ;
! INTEGER_LITERAL:
!     <DECIMAL_LITERAL>[lL]?
!   | <HEX_LITERAL>[lL]?
!   | <OCTAL_LITERAL>[lL]?
!   ;
! EXPONENT:
!     [eE][+-]?[0-9]+
!   ;
! FLOATING_POINT_LITERAL:
!     [0-9]+[.][0-9]*<EXPONENT>?[fFdD]?
!   | [.][0-9]+<EXPONENT>?[fFdD]?
!   | [0-9]+<EXPONENT>[fFdD]?
!   | [0-9]+<EXPONENT>?[fFdD]
!   ;
! @end example
  @end defvar
  
  @node Keywords, Keyword Properties, Lexer Options, Lexing
***************
*** 1033,1064 ****
  will explicitly match one period when used in the above rule.
  
! Default syntactic tokens (If the lexer is not extended) are:
! 
! @table @code
! @item newline
! A newline if @var{semantic-flex-enable-newline} is non-nil.
! @item symbol
! A symbol for the language, usually comprising alpha numeric
! characters, and _.
! @item number
! A number for the language.  You can specify a number format with
! the variable @var{semantic-number-expression}.
! @item charquote
! A character quoting punctuation.  Like ? in Emacs Lisp.
! @item semantic-list
! A list, delimited on either end with some parenthetical form.
! @item open-paren
! An opening parenthesis.
! @item close-paren
! A closing parenthesis.
! @item string
! A string, including starting and ending delimiters.
! @item comment
! A comment.  This can be stripped from the stream if
! @var{semantic-ignore-comments} is non-nil.
! @item punctuation
! Punctuation characters, such as operators, period, and coma.
! @end table
  
  @node Optional Lambda Expression, Examples, Rules, BNF conversion
  @section Optional Lambda Expressions
--- 1091,1098 ----
  will explicitly match one period when used in the above rule.
  
! @xref{Default syntactic tokens}.
! 
  @node Optional Lambda Expression, Examples, Rules, BNF conversion
  @section Optional Lambda Expressions
***************
*** 1186,1192 ****
  ( "A" "B" )
  @end example
  
! @node Style Guide ,  , Examples, BNF conversion
  @section Semantic Token Style Guide
  
  In order for a generalized program using Semantic to work with
--- 1220,1226 ----
  ( "A" "B" )
  @end example
  
! @node Style Guide, , Examples, BNF conversion
  @section Semantic Token Style Guide
  
  In order for a generalized program using Semantic to work with
***************
*** 2310,2316 ****
  For details on using these functions to get more detailed information
  about the current context: @xref{Context Analysis}.
  
! @node Making New Methods,  , Local Context, Override Methods
  @subsection Making New Methods
  
  @node Parser Hooks, Example Programs, Override Methods, Programming
--- 2344,2350 ----
  For details on using these functions to get more detailed information
  about the current context: @xref{Context Analysis}.
  
! @node Making New Methods, , Local Context, Override Methods
  @subsection Making New Methods
  
  @node Parser Hooks, Example Programs, Override Methods, Programming
***************
*** 2432,2438 ****
  during a flush when the cache is given a new value of nil.
  @end defvar
  
! 
  @node Example Programs, , Parser Hooks, Programming
  @section Programming Examples
  
  Here are some simple examples that use different aspects of the
--- 2466,2472 ----
  during a flush when the cache is given a new value of nil.
  @end defvar
  
! 
  @node Example Programs, , Parser Hooks, Programming
  @section Programming Examples
  
  Here are some simple examples that use different aspects of the
***************
*** 3213,3219 ****
  @dfn{semantic-analyze-possible-completions}.
  @end deffn
  
! @node Speedbar Analysis,  , Smart Completion, analyzer
  @comment node-name, next, previous, up
  @subsection Speedbar Analysis
--- 3247,3253 ----
  @dfn{semantic-analyze-possible-completions}.
  @end deffn
  
! @node Speedbar Analysis, , Smart Completion, analyzer
  @comment node-name, next, previous, up
  @subsection Speedbar Analysis
From: Eric M. L. <er...@si...> - 2002-05-31 19:19:19
>>> "Richard Y. Kim" <ry...@ds...> seems to think that: >>>>>> "EL" == Eric M Ludlam <er...@si...> writes: > EL> > EL> [ ... ] > EL> > EL> I think your resolution to his problem is very good. When you are > EL> comfortable with it, please check it in. > >Wait a minute! > >I have no problem with Dave's fine work. >I have problem with my own change that may have unfortunate >side affects! > >In semantic-flex that deals with comments, >I replaced the second (forward-comment 1) with > > (if (and semantic-flex-enable-newlines > (bolp)) > (backward-char 1)) > >Should this be instead > > (if (and semantic-flex-enable-newlines > (bolp)) > (backward-char 1) > (forward-comment 1)) > >i.e., add back (forward-comment 1) so that the old behavior >is not modified too much? > >What I did was a quick "fix" in order to test other pieces >of code. I certainly would like both of you to review this >thoroughly before getting checked in. [ ... ] I was merely referring to the overall solution as far as programmer API and behavior, which is why I said to only check it in after you were comfortable with it. (As opposed to fixing everything, and then waiting a week for me to get back from some vacation or other before checking something in.) Eric -- Eric Ludlam: za...@gn..., er...@si... Home: www.ultranet.com/~zappo Siege: www.siege-engine.com Emacs: http://cedet.sourceforge.net GNU: www.gnu.org |
From: Eric M. L. <er...@si...> - 2002-05-29 12:47:38
Hi,

[ ... ]

>> If speed becomes an issue, it may make sense to implement
>> part of semantic in C.  I don't know that we have reached
>> that point yet with regard to python.
>[...]
>
>So, in the case of python, I think it will be difficult for
>semantic-flex to easily produce the nice 'semantic-list tokens
>needed to recursively parse sub parts of code. ...
[ ... ]

Perhaps we can define a hook of some sort that would allow Richard to
write a piece of code for the lexer that will identify the first line
of indented code, then connect all such lines together into one giant
token called 'body, or some-such.  In this way, the ll or lalr parser
would skip over all such lines very quickly.

The trick would then be that the tagging lexer would want this
addition, but the full parser (since wisent parsers seem to come in
twos) would want it turned off.

>I agree with Eric that syntax tables are mainly oriented to navigate,
>particularly through parenthesized blocks of code. ...
[ ... ]

The Emacs Lisp and Makefile parsers both need many of these local
context functions too.  For Emacs Lisp, however, all the cool stuff
you can do with the ctxt code is already in Emacs.  With Makefiles,
much of it makes no sense.  Thus, they have not been written.

With Python, however, writing such functions would enable new commands
such as beginning/end of command to be written.  These already exist
for C and Java in cc-mode, which is probably why they haven't been
written.  These would be something good to add to senator:

  senator-[beginning|end]-of-statement
  senator-[up|down|forward|backward]-block  (better name than block?)

>I don't think that writing parts of the Semantic lexer/parser tools in
>C will improve Emacs design. ...
[ ... ]

Emacs has a built-in function called `parse-partial-sexp' which is
very interesting, and also very fast.  I've always wanted to have
something similar that stepped over a buffer from POINT one character
at a time, matching into the syntax table (exactly as
parse-partial-sexp does) but returning the location where it stopped
when there was a change in syntax, in addition to returning the type
of thing found.

There is also `skip-syntax-forward' which is pretty nifty, but you
have to know what syntax is under the cursor.  Perhaps a combination
of `char-syntax' and `skip-syntax-forward' could be used to make
things faster.  Hmmm.  I detect an experiment I may have to do.

Anyway, each time we do a regex search, that regex has to be compiled,
then interpreted.  Since we are just matching syntax table elements,
this is overkill.  Since Emacs' syntax handling is meant for
navigation, this has never been needed before.

Have fun
Eric

--
Eric Ludlam:  za...@gn..., er...@si...
Home: www.ultranet.com/~zappo   Siege: www.siege-engine.com
Emacs: http://cedet.sourceforge.net   GNU: www.gnu.org
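[Editorial note: the `char-syntax' + `skip-syntax-forward' combination
Eric is musing about amounts to splitting the buffer into maximal runs
of characters that share a syntax class. A rough Python illustration
only -- a toy three-class table stands in for the real, per-buffer and
much richer Emacs syntax tables:]

```python
from itertools import groupby

def char_syntax(ch):
    """Toy stand-in for Emacs' char-syntax: w = word constituent,
    - = whitespace, . = punctuation."""
    if ch.isalnum() or ch == '_':
        return 'w'
    if ch.isspace():
        return '-'
    return '.'

def syntax_runs(text):
    """Return (class, start, end) for each maximal run of one syntax
    class: step one character at a time and stop at each change of
    syntax, which is the primitive described above."""
    runs, pos = [], 0
    for klass, group in groupby(text, key=char_syntax):
        length = len(list(group))
        runs.append((klass, pos, pos + length))
        pos += length
    return runs
```

`syntax_runs("if (ok)")` splits into the word `if`, a whitespace run,
the open paren, the word `ok` and the close paren, with no regexp
compiled or interpreted along the way.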