[Pyparsing] Using pyparsing for syntax highlighting
From: Orestis M. <or...@or...> - 2008-12-14 15:27:35
Hello,

I'm trying to use pyparsing to do syntax highlighting for a toy editor I'm writing. I'm trying to duplicate Vim's syntax, which is simple enough not to be mind-boggling, but complicated enough to be interesting. Here's one example:

    syn keyword pythonStatement def class nextgroup=pythonFunction skipwhite
    syn match pythonFunction "[a-zA-Z_][a-zA-Z0-9_]*" contained

This says that 'def' and 'class' are keywords belonging to the group 'pythonStatement'. When they are encountered, you should first try the 'pythonFunction' match (before trying everything else). The 'pythonFunction' match is never tried standalone because it's 'contained'. I ignore 'skipwhite' for now.

I've tried to duplicate this by:

    def pos(s, loc, tokens):
        actual_loc = s.index(tokens[0], loc)
        return (actual_loc, len(tokens[0]) + actual_loc, tokens)

    def contains(expr):
        expr.setParseAction(pos)
        def parse_contained_expr(s, loc, tok):
            substr = s[loc+len(tok[0]):]
            print expr.parseString(substr)
            # change the tokens list here.
        return parse_contained_expr

    pythonFunction = Regex("[a-zA-Z_][a-zA-Z0-9_]*") + WordEnd(alphanums + '_').suppress()
    def_class = set('def class'.split())
    def_class = Or(map(Keyword, def_class)).setParseAction(contains(pythonFunction), pos)

    test = """\
    def something():
        pass
    """
    print def_class.parseString(test)

produces:

    [(1, 10, (['something'], {}))]
    [(0, 3, (['def'], {}))]

I have the following problems with this approach:

1) I have to restrict 'match' elements to a single line, because I want to handle lines myself. I tried 'SkipTo(LineEnd(), include=False)', but it still goes on to the next line, and it also gives me the '():'.

2) I want to keep track of the locations of the tokens. Effectively, at the end of the parsing I would like to have a marked-up version of the input string, with the types of the tokens. My pos function is an attempt at that, but it doesn't have any global state, so the second location (of the contained expression) is relative to the chopped input string rather than to the original input.
3) It seems very complicated and fiddly, so I'm sure I'm doing something wrong.

I wonder if I need another level of abstraction for this (like a scanner object that keeps track of the locations, keeping the parse actions simple) or another approach. Adding another level of containment seems like a nightmare.

Many thanks for your help!

Orestis
--
or...@or...
http://orestis.gr/
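
One possible direction for problems 1 and 2, offered as a minimal sketch under assumptions rather than a tested design: express 'nextgroup' as a single combined expression, so the contained match is parsed against the original string and its location needs no correction, and feed the text to scanString() one line at a time, adding a running offset. The names tag, statement and highlight are hypothetical helpers made up for this illustration, and treating 'nextgroup' as "keyword followed by an optional contained match" is an assumption about the intended semantics.

```python
from pyparsing import Keyword, MatchFirst, Optional, Regex


def tag(group):
    """Hypothetical helper: build a parse action that replaces a
    single token with a (group, start, end) tuple."""
    def action(s, loc, toks):
        # loc is where this token starts in the string being parsed
        return [(group, loc, loc + len(toks[0]))]
    return action


# 'contained': this expression is never used on its own, only after a keyword.
python_function = Regex(r"[a-zA-Z_][a-zA-Z0-9_]*").setParseAction(
    tag("pythonFunction"))

python_statement = MatchFirst([Keyword("def"), Keyword("class")]).setParseAction(
    tag("pythonStatement"))

# 'nextgroup' (assumed semantics): try pythonFunction right after the keyword,
# but still report the keyword if no name follows.
statement = python_statement + Optional(python_function)


def highlight(text):
    """Return (group, start, end) spans with offsets into the whole text."""
    spans = []
    offset = 0
    # Keep the line endings so the running offset stays exact.
    for line in text.splitlines(True):
        for tokens, _start, _end in statement.scanString(line):
            for group, start, end in tokens:
                spans.append((group, start + offset, end + offset))
        offset += len(line)
    return spans


if __name__ == "__main__":
    sample = "def something():\n    pass\nclass Foo:\n"
    for group, start, end in highlight(sample):
        print(group, start, end, repr(sample[start:end]))
```

Because each line is scanned separately, no match can run past a line end, and the spans can be used directly to mark up the original input. Deeper levels of containment would still need more thought.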