Re: [CEDET-devel] Thinking out loud...
From: Eric M. L. <er...@si...> - 2008-02-02 18:57:59
Hi,

To the best of my knowledge, everything you need to give this a try is
available in CEDET.  Start by making a copy of
cedet/contrib/wisent-ruby.wy, which is a great contribution from
Daniel Debertin.  Replace the various actions with code that calls
put-text-property for various colors instead.  In addition, the
various lexers for strings and comments could add the colors right at
detection time.  I think you'd need to disable font-lock in that
buffer to make it work though.

Even when the parser doesn't work, you can enable
`global-semantic-show-unmatched-syntax-mode' to show what doesn't
work.  Because Semantic implements iterative parsing, handling errors
is very simple: bogus text is just ignored.

As for editing, Semantic already supports incremental parsing of tags
during editing.  It ought to be possible to take advantage of that.

It might be possible to mix tag creation and font-locking.  If this is
the case, the tag caching of semanticdb would be a detriment since
you'd have to store the entire color scheme to disk too, or redo the
parse.

Disabling font-lock for Ruby, and adding a single text property
setting into the existing Ruby parser, would likely be a great way to
see how it might work.  The harder part would likely be the
infrastructure you'd need to keep it working unobtrusively.

Good Luck
Eric

>>> Perry Smith <pe...@ea...> seems to think that:
>This note may not exactly fit this list but I thought it came close.
>
>So far, I have not gotten font-lock for Ruby to work 100%.  And part
>of the problem is that parsing Ruby is really hard to do by, what I
>call, guessing.  E.g. % string (there is a space before and after
>string).  If the % comes in some parse states, then the whole thing is
>a string but if it shows up in other parse states, then the % is an
>operator and string is an identifier.  There is also this toe tapping
>and nose touching you have to do if strings can take up multiple
>lines.
>All this is based upon the assumption that a full and complete
>parse is not possible.
>
>Well... it is.  And probably for a lot less cost than all the guessing
>currently being done.  So, here's my vague grand idea:
>
>Using tools like wisent, implement a full parser and lexer.  Something
>that totally parses the entire language.  As it is parsing, it adds
>properties to the text.  One thought is to add them only to
>significant characters like the first character of an identifier --
>mostly to save space and time.
>
>These properties are to be defined but generally they represent the
>state of the parser at that moment.  This might be just the top
>non-terminal on the parse stack or it might be the whole parse stack.
>And, another property will be the lexical name of the token.  IDENT or
>STRING or whatever.
>
>The file is loaded, we run through it, parse and mark it.
>
>Now, we can properly do font-lock based upon the token properties.  No
>guess work at all is needed.
>
>Problem 1: what about adding or deleting text?  My blue sky idea is to
>back up until we find a parse stack property.  Then start parsing as
>before.  We stop when we come to a parse stack property in the text
>that matches the current parser's parse stack.  You can see a
>trade-off in how frequently we mark the text: with lots of marks we
>need to re-parse and re-color smaller pieces, but we pay in space and
>time adding the marks.
>
>Problem 2: Parse Failures...  Let's break this down to sub problems.
>
>2.A: In languages like C that have a preprocessor, a single file is
>usually not legal C until all of the preprocessing has been done.
>And, with conditional compilation, the output of the preprocessor
>skips sections of C code.  In these languages, probably the most
>practical approach is to start guessing again.  When an error happens,
>try and recover as best as possible.  Perhaps allow special comments
>in the code to identify particular structures to re-sync the parse.
>e.g.
>/** start of function **/.  Or patterns that could be set to
>identify the start of a function.  That would resync most parsing
>problems and keep the vague, unparsed areas down to a minimum.
>
>2.B: truly illegal code: in this case, flag it in a warning color.
>Recovery from parse errors is still key.
>
>Really, all the parse failure problems put us back into guess mode but
>yacc/bison parse error recovery is well researched and it could be
>augmented with patterns (including special comments) to help out.
>
>pedz

--
Eric Ludlam: er...@si...
Siege: www.siege-engine.com Emacs: http://cedet.sourceforge.net
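
For anyone who wants to play with the "Problem 1" idea from the quoted
note before touching wisent, here is a minimal sketch of it.  It is
written in Python rather than Emacs Lisp only so that it runs
standalone, and it rests on several assumptions that are not from the
thread: a toy two-state lexer (code vs. inside a double-quoted string)
stands in for the real parse stack, one mark is stored per line in a
list instead of as a text property, and the edit replaces a single
line in place.  All names (scan_line, full_parse, reparse) are
hypothetical.

```python
from typing import List, Tuple

# Toy parser states; a real implementation would record the parse stack.
CODE, STRING = "code", "string"


def scan_line(state: str, line: str) -> Tuple[List[str], str]:
    """Classify each character of `line`, starting in `state`.

    Returns the per-character classes and the state at end of line.
    """
    classes = []
    for ch in line:
        if ch == '"':
            classes.append(STRING)  # the quote itself counts as string
            state = CODE if state == STRING else STRING
        else:
            classes.append(state)
    return classes, state


def full_parse(lines: List[str]) -> Tuple[List[str], List[List[str]]]:
    """Parse everything; marks[i] is the state at the start of line i."""
    marks, colors, state = [], [], CODE
    for line in lines:
        marks.append(state)
        classes, state = scan_line(state, line)
        colors.append(classes)
    return marks, colors


def reparse(lines, marks, colors, changed: int) -> int:
    """Re-parse after line `changed` was edited in place.

    Starts from the mark just before the edit (still valid, since the
    text above it is unchanged) and stops as soon as the live state
    equals the mark already stored on the next line.  Updates `marks`
    and `colors` in place and returns the number of lines rescanned.
    """
    state = marks[changed]
    i = changed
    while i < len(lines):
        colors[i], state = scan_line(state, lines[i])
        i += 1
        if i < len(lines):
            if marks[i] == state:
                break  # resynchronized: everything below is still valid
            marks[i] = state
    return i - changed


lines = ['x = "a', 'b" + y', 'z = 1', 'w = "q"']
marks, colors = full_parse(lines)

lines[2] = 'z = 2'  # harmless edit: resyncs after one line
print(reparse(lines, marks, colors, 2))

lines[0] = 'x = "a"'  # closes the string early: state flips to the end
print(reparse(lines, marks, colors, 0))
```

The second edit shows the cost side of the trade-off: when an edit
changes the parser state, no stored mark matches again and the rescan
runs to the end of the buffer.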