Re: [Pyparsing] escape sequence in identifiers - with inline source code
From: Diez B. R. <de...@we...> - 2010-04-07 13:02:47
Hi,

I somehow lost the mail by Denis, so I am quoting it by hand here; I hope that works:

> (Not really sure about your intent.)

My intent is simply to parse a string like this:

    div . class\.name

as

    tag[div], class[class.name]

instead of

    tag[div], class[class], class[name]

For this to happen, I need to special-case escape codes beginning with \ so that they are *not* treated as an identifier followed by a dot, but instead always group the two characters "\." together.

> You seem to be using pyparsing features rather strangely.
> The 'Word' pattern type allows defining distinct patterns for start and
> (optional) following characters. Both are character _classes_. You could use
> it like:
>     nameStartChar = ...
>     nameFollowingChar = ...
>     name = Word(nameStartChar, nameFollowingChar)
> If you want to generalize name to include a dotted format, then rename the
> above to namePart and write a pattern including dots.

I'm not sure what you mean by this, nor whether it helps me. I tried to come up with a more concise example; here it is:

    from pyparsing import *

    nmstart = Word(srange(r"[\\_a-zA-Z]")) # |{nonascii}|{escape}
    name = OneOrMore(Word(srange(r"[\\A-Z_a-z0-9]"))) # TODO: nonascii &
    ident = nmstart + ZeroOrMore(name)
    #ident = Word(srange(r"[_a-zA-Z]"), srange(r"[A-Z_a-z0-9]"))
    MINUS = Literal("-")
    IDENT = Combine(Optional(MINUS) + ident, adjacent=True) # TODO
    DOT = Literal(".")
    ASTERISK = Literal("*")
    class_ = Combine(DOT + IDENT)
    element_name = IDENT | ASTERISK
    selector = (element_name + (ZeroOrMore( class_ )) | OneOrMore( class_ ))

    print(selector.parseString(r"foo.bar"))
    print(selector.parseString(r"foo.bar\baz"))
    print(selector.parseString(r"foo.bar\.baz"))

The result is

    ['foo', '.bar']
    ['foo', '.bar\\baz']
    ['foo', '.bar\\', '.baz']

So clearly the escape is not being honoured: the second dot is treated as a DOT instead of as part of IDENT. For it to become part of IDENT, I guess I need a specific lexer rule, similar to quotedString.

Diez
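To make the wanted behaviour concrete, here is a minimal sketch of such a lexer rule. It uses only the standard `re` module rather than pyparsing, so it shows the technique, not a pyparsing solution. The key point is that a backslash and the character after it are consumed together as one unit, so an escaped dot can never start a new class token. The names `ESCAPE`, `NMSTART`, `NMCHAR`, `TOKEN`, `tokenize` and `unescape` are made up for this illustration, and the character classes are simply the ones from the example above (no non-ASCII handling). The same pattern could presumably be wrapped in pyparsing's `Regex` class, but that is an untested assumption.

```python
import re

# An escape is a backslash plus exactly one following character, e.g. "\.".
ESCAPE = r"\\."
# Identifier start / continuation: a plain character OR a whole escape pair.
NMSTART = r"(?:[_a-zA-Z]|" + ESCAPE + ")"
NMCHAR = r"(?:[_a-zA-Z0-9]|" + ESCAPE + ")"
IDENT = "-?" + NMSTART + NMCHAR + "*"

# A token is either a class (".ident") or an element name (ident or "*").
# Leading whitespace before a token is skipped.
TOKEN = re.compile(r"\s*(?:(?P<cls>\.%s)|(?P<tag>%s|\*))" % (IDENT, IDENT))


def tokenize(selector):
    """Split a simple selector into element-name and class tokens."""
    selector = selector.strip()
    tokens = []
    pos = 0
    while pos < len(selector):
        match = TOKEN.match(selector, pos)
        if match is None:
            raise ValueError("unexpected input at position %d" % pos)
        tokens.append(match.group("cls") or match.group("tag"))
        pos = match.end()
    return tokens


def unescape(token):
    """Drop the backslash of every escape pair, keeping the escaped character."""
    return re.sub(r"\\(.)", r"\1", token)


print(tokenize(r"foo.bar"))       # ['foo', '.bar']
print(tokenize(r"foo.bar\baz"))   # ['foo', '.bar\\baz']
print(tokenize(r"foo.bar\.baz"))  # ['foo', '.bar\\.baz']
print(unescape(r".bar\.baz"))     # .bar.baz
```

Unlike the grammar above, the third input now yields a single class token, because the dot is swallowed as the second half of the escape before the class rule gets a chance to see it.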