Re: [Pyparsing] escape sequence in identifiers - with inline source code
From: Diez B. R. <de...@we...> - 2010-04-07 20:51:57
Hi,

OK, I don't know why I didn't think of this in the first place. Maybe it was some weird "you are using pyparsing, no need to bother with nitty-gritty regexes" reflex, but a plain regex is what helped, and it should have been obvious to me :)

escapes = r"\\\\|\\\."
IDENT = Regex(r"([a-zA-Z_-]|(%(escapes)s))([a-zA-Z0-9_-]|(%(escapes)s))*" % dict(escapes=escapes))

I post this just for the record.

Diez

On 07.04.2010 at 15:57, Diez B. Roggisch wrote:

> Hi,
>
> I somehow lost the mail by Denis, so I quote it by hand here, hope
> that works:
>
>> (Not really sure about your intent.)
>
> My intent is to simply parse a string like this:
>
> div . class\.name
>
> as
>
> tag[div], class[class.name]
>
> instead of
>
> tag[div], class[class], class[name]
>
> For this to happen, I need to special-case escape codes beginning with \
> so that they are *not* treated as an identifier followed by a dot, but
> instead always group the two characters "\." together.
>
>> You seem to be using pyparsing features rather strangely.
>> The 'Word' pattern type allows defining distinct patterns for the start
>> and (optional) following characters. Both are character _classes_. You
>> could use it like:
>>
>> nameStartChar = ...
>> nameFollowingChar = ...
>> name = Word(nameStartChar, nameFollowingChar)
>>
>> If you want to generalize name to include a dotted format, then rename
>> the above to namePart and write a pattern including dots.
>
> I'm not sure what you mean by this, nor if it helps me.
> I'll try to come up with a more concise example; here it is:
>
> from pyparsing import *
>
> nmstart = Word(srange(r"[\\_a-zA-Z]"))  # |{nonascii}|{escape}
> name = OneOrMore(Word(srange(r"[\\A-Z_a-z0-9]")))  # TODO: nonascii &
>
> ident = nmstart + ZeroOrMore(name)
>
> #ident = Word(srange(r"[_a-zA-Z]"), srange(r"[A-Z_a-z0-9]"))
>
> MINUS = Literal("-")
> IDENT = Combine(Optional(MINUS) + ident, adjacent=True)  # TODO
>
> DOT = Literal(".")
> ASTERISK = Literal("*")
>
> class_ = Combine(DOT + IDENT)
> element_name = IDENT | ASTERISK
>
> selector = (element_name + (ZeroOrMore( class_ )) |
>             OneOrMore( class_ ))
>
> print(selector.parseString(r"foo.bar"))
> print(selector.parseString(r"foo.bar\baz"))
> print(selector.parseString(r"foo.bar\.baz"))
>
> The result is
>
> ['foo', '.bar']
> ['foo', '.bar\\baz']
> ['foo', '.bar\\', '.baz']
>
> So clearly the escaped second dot is not being kept as part of IDENT; it
> is parsed as a DOT instead. To keep "\." inside the identifier, I need a
> specific lexer rule like quotedString, I guess.
>
> Diez
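
For anyone finding this in the archive, here is a minimal sketch of how the regex-based IDENT pattern behaves. It uses only Python's re module, which is what pyparsing's Regex compiles its pattern with, so it runs without pyparsing installed. The helpers unescape and identifiers are hypothetical names made up for this illustration, and the groups are written as non-capturing; neither is part of the original post or of pyparsing.

```python
import re

# Escape sequences allowed inside an identifier: an escaped backslash
# (two backslashes) or an escaped dot (backslash followed by a dot).
ESCAPES = r"\\\\|\\\."

# Same shape as the IDENT pattern in the message, but with non-capturing
# groups (an assumption made here for convenience).
IDENT_PATTERN = (
    r"(?:[a-zA-Z_-]|(?:%(escapes)s))"
    r"(?:[a-zA-Z0-9_-]|(?:%(escapes)s))*"
) % dict(escapes=ESCAPES)
IDENT_RE = re.compile(IDENT_PATTERN)


def unescape(ident):
    """Hypothetical helper: drop the backslash from each escape pair."""
    return re.sub(r"\\(.)", r"\1", ident)


def identifiers(text):
    """Hypothetical helper: return the unescaped identifiers in text, in order."""
    return [unescape(m.group(0)) for m in IDENT_RE.finditer(text)]


if __name__ == "__main__":
    print(identifiers(r"div . class\.name"))  # ['div', 'class.name']
    print(identifiers(r"foo.bar\.baz"))       # ['foo', 'bar.baz']
```

The point of the sketch is only that "\." is consumed as one unit inside the identifier, so the dot never reaches the DOT rule. In a real grammar the same pattern string would be passed to pyparsing's Regex, as in the message above; whether to unescape in a parse action or afterwards is a separate design choice.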