Thread: [Pyparsing] escape sequence in identifiers - with inline source code
From: Diez B. R. <de...@we...> - 2010-04-02 13:23:35
Hi,

it seems as if the ML strips attachments, so here comes the aforementioned example code inline:

    from pyparsing import *

    nmstart = Word(srange(r"[_a-zA-Z\\]")) # |{nonascii}|{escape}
    name = OneOrMore(Word(srange(r"[A-Z_a-z0-9-\\]"))) # TODO: nonascii & escape
    #numlit = Word(srange("[0-9]"))

    MINUS = Literal("-")
    IDENT = Combine(Optional(MINUS) + nmstart + ZeroOrMore(name), adjacent=True) # TODO

    print IDENT.parseString(r"foo\bar")
    print IDENT.parseString(r"foo\.bar")

The output is

    (cssprocessor)mac-dir:ablcssprocessor deets$ python /tmp/test.py
    ['foo\\bar']
    ['foo\\']

So you can see that the whole "\.bar" part is missing.

Diez

From: spir ☣ <den...@gm...> - 2010-04-06 10:47:34
On Fri, 2 Apr 2010 15:23:27 +0200 "Diez B. Roggisch" <de...@we...> wrote:

> Hi,
>
> it seems as if the ML strips attachments, so here comes the
> aforementioned example code inline:
>
> from pyparsing import *
>
> nmstart = Word(srange(r"[_a-zA-Z\\]")) # |{nonascii}|{escape}
> name = OneOrMore(Word(srange(r"[A-Z_a-z0-9-\\]"))) # TODO: nonascii & escape
> #numlit = Word(srange("[0-9]"))
>
> MINUS = Literal("-")
> IDENT = Combine(Optional(MINUS) + nmstart + ZeroOrMore(name), adjacent=True) # TODO

(Not really sure about your intent.)

You seem to be using pyparsing features rather strangely. The 'Word' pattern type allows defining distinct patterns for start and (optional) following characters. Both are character _classes_. You could use it like:

    nameStartChar = ...
    nameFollowingChar = ...
    name = Word(nameStartChar,nameFollowingChar)

If you want to generalize name to include a dotted format, then rename the above to namePart and write a pattern including dots.

Denis
________________________________

vit esse estrany ☣

spir.wikidot.com

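To make the suggestion above concrete without guessing at the elided character sets: a two-argument Word behaves roughly like a regex with one start class followed by a repeated body class. The sketch below uses only the standard `re` module as an analogy, not pyparsing itself. The names NAME_START, NAME_FOLLOWING, NAME_PART, DOTTED_NAME and is_dotted_name, as well as the chosen character classes, are assumptions made for illustration.

```python
import re

# Assumed character classes; the thread leaves the real ones elided ("...").
NAME_START = r"[A-Za-z_]"
NAME_FOLLOWING = r"[A-Za-z0-9_]"

# Rough regex analogue of Word(nameStartChar, nameFollowingChar):
# one start character, then zero or more following characters.
NAME_PART = NAME_START + NAME_FOLLOWING + "*"

# A dotted name is one part followed by any number of ".part" groups.
DOTTED_NAME = re.compile(r"%s(?:\.%s)*" % (NAME_PART, NAME_PART))


def is_dotted_name(text):
    """Return True if the whole text is a dotted name."""
    return DOTTED_NAME.fullmatch(text) is not None


print(is_dotted_name("foo.bar"))  # True
print(is_dotted_name("9foo"))     # False
```

Note that this treats every dot as a separator, so on its own it does not address escaped dots.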
From: Diez B. R. <de...@we...> - 2010-04-07 13:02:47
Hi,

I somehow lost the mail by Denis, so I quote it by hand here, hope that works:

> (Not really sure about your intent.)

My intent is to simply parse a string like this:

    div . class\.name

as

    tag[div], class[class.name]

instead of

    tag[div], class[class], class[name]

For this to happen, I need to special-case escape codes beginning with \ so that they are *not* treated as an identifier followed by a dot, but instead always group the two characters "\." together.

> You seem to be using pyparsing features rather strangely.
> The 'Word' pattern type allows defining distinct patterns for start and
> (optional) following characters. Both are character _classes_. You could use
> it like:
>
> nameStartChar = ...
> nameFollowingChar = ...
> name = Word(nameStartChar,nameFollowingChar)
>
> If you want to generalize name to include a dotted format, then rename the
> above to namePart and write a pattern including dots.

I'm not sure what you mean by this, nor if it helps me. I tried to come up with a more concise example, here it is:

    from pyparsing import *

    nmstart = Word(srange(r"[\\_a-zA-Z]")) # |{nonascii}|{escape}
    name = OneOrMore(Word(srange(r"[\\A-Z_a-z0-9]"))) # TODO: nonascii &

    ident = nmstart + ZeroOrMore(name)

    #ident = Word(srange(r"[_a-zA-Z]"), srange(r"[A-Z_a-z0-9]"))

    MINUS = Literal("-")
    IDENT = Combine(Optional(MINUS) + ident, adjacent=True) # TODO

    DOT = Literal(".")
    ASTERISK = Literal("*")

    class_ = Combine(DOT + IDENT)
    element_name = IDENT | ASTERISK

    selector = (element_name + (ZeroOrMore( class_ )) | OneOrMore( class_ ))

    print selector.parseString(r"foo.bar")
    print selector.parseString(r"foo.bar\baz")
    print selector.parseString(r"foo.bar\.baz")

The result is

    ['foo', '.bar']
    ['foo', '.bar\\baz']
    ['foo', '.bar\\', '.baz']

So clearly the escaping isn't working: the second dot is treated as a DOT instead of as part of IDENT. To get that, I need a specific lexer rule like quotedString - I guess.

Diez

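The third result follows from how character classes work: a backslash inside the class is matched as an ordinary single character, so the match ends right after it and the dot is left for the next rule. A small sketch with the standard `re` module (an analogy, not pyparsing's own machinery; IDENT_CHARS and leading_ident are names made up here) shows the same cut-off:

```python
import re

# Same idea as the srange(r"[\\A-Z_a-z0-9]") class: the backslash is
# just one more allowed character, with no link to what follows it.
IDENT_CHARS = re.compile(r"[\\A-Za-z0-9_]+")


def leading_ident(text):
    """Return the longest run of identifier characters at the start."""
    match = IDENT_CHARS.match(text)
    return match.group(0) if match else ""


print(leading_ident(r"bar\baz"))   # the whole string is consumed
print(leading_ident(r"bar\.baz"))  # stops right after the backslash
```

Grouping "\." as a unit therefore needs a rule that looks at two characters at once, which a single character class cannot do.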
From: Diez B. R. <de...@we...> - 2010-04-07 20:51:57
Hi,

ok, I don't know why I didn't think of this in the first place - maybe some weird "you are using pyparsing, no need to bother with nitty-gritty regexes" thinking - but this is what helped, and it should have been obvious to me :)

    escapes = r"\\\\|\\\."
    IDENT = Regex(r"([a-zA-Z_-]|(%(escapes)s))([a-zA-Z0-9_-]|(%(escapes)s))*" % dict(escapes=escapes))

I post this just for the record.

Diez
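For readers who want to try the pattern outside pyparsing, here is a minimal sketch using the standard `re` module with the same expression. The helper names split_selector and unescape are hypothetical, and the selector handling is deliberately limited to the tag-plus-classes case discussed in this thread.

```python
import re

# Escape alternatives from the post: an escaped backslash or an escaped dot.
ESCAPES = r"\\\\|\\\."
# Same identifier pattern as the Regex(...) above, with non-capturing groups.
IDENT = r"(?:[a-zA-Z_-]|(?:%(e)s))(?:[a-zA-Z0-9_-]|(?:%(e)s))*" % {"e": ESCAPES}

# An optional element name followed by any number of ".class" parts.
SELECTOR = re.compile(r"(?P<tag>%s)?(?P<classes>(?:\.%s)*)" % (IDENT, IDENT))
CLASS = re.compile(r"\.(%s)" % IDENT)


def unescape(name):
    """Drop the backslash in front of an escaped character."""
    return re.sub(r"\\(.)", r"\1", name)


def split_selector(text):
    """Split 'tag.class1.class2' into (tag, [classes]); hypothetical helper."""
    match = SELECTOR.fullmatch(text)
    if match is None:
        raise ValueError("not a simple selector: %r" % text)
    classes = [unescape(c) for c in CLASS.findall(match.group("classes"))]
    tag = match.group("tag")
    return (unescape(tag) if tag else None), classes


print(split_selector(r"div.class\.name"))  # ('div', ['class.name'])
```

Because the alternation lists the two-character escapes explicitly, a backslash followed by anything else (as in "foo.bar\baz") is rejected by this sketch.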