Re: [Pyparsing] escape sequence in identifiers - with inline source code


Hi,

OK, I don't know why I didn't think of this in the first place. Maybe it
was some weird notion of "you are using pyparsing, no need to bother with
nitty-gritty regexes", but a regex is what helped, and that should have
been obvious to me :)

         escapes = r"\\\\|\\\."
         IDENT = Regex(
             r"([a-zA-Z_-]|(%(escapes)s))([a-zA-Z0-9_-]|(%(escapes)s))*"
             % dict(escapes=escapes))
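
A minimal sketch of how this IDENT could plug into the selector grammar
from the earlier mail (quoted below), assuming the rest of that grammar
stays unchanged. The Optional(MINUS) wrapper is left out here as an
assumption, since the regex already accepts a leading "-":

```python
from pyparsing import Combine, Literal, OneOrMore, Regex, ZeroOrMore

# An escape is either an escaped backslash or an escaped dot.
escapes = r"\\\\|\\\."
IDENT = Regex(
    r"([a-zA-Z_-]|(%(escapes)s))([a-zA-Z0-9_-]|(%(escapes)s))*"
    % dict(escapes=escapes))

DOT = Literal(".")
ASTERISK = Literal("*")

class_ = Combine(DOT + IDENT)
element_name = IDENT | ASTERISK

selector = (element_name + ZeroOrMore(class_)) | OneOrMore(class_)

# The escaped dot now stays inside the class name.
print(selector.parseString(r"foo.bar\.baz").asList())  # ['foo', '.bar\\.baz']
print(selector.parseString(r"foo.bar").asList())       # ['foo', '.bar']
```

Note that with this regex a lone backslash that is not followed by "\" or
"." ends the identifier, which differs from the Word-based version.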

I post this just for the record.

Diez

On 07.04.2010 at 15:57, Diez B. Roggisch wrote:

> Hi,
>
> I somehow lost the mail by Denis, so I quote it by hand here, hope  
> that works:
>
>> (Not really sure about your intent.)
>
> My intent is to simply parse a string like this:
>
>  div . class\.name
>
> as
>
> tag[div], class[class.name]
>
> instead of
>
> tag[div], class[class], class[name]
>
> For this to happen, I need to special-case escape codes beginning with \,
> so that they are *not* treated as an identifier followed by a dot, but
> instead always group the two characters "\." together.
>
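
The grouping described above can be illustrated without pyparsing. This is
only a rough sketch with the standard re module: the names IDENT_CHAR, TOKEN
and tokenize are made up for the illustration, and treating a backslash plus
*any* following character as one escape pair is an assumption.

```python
import re

# One identifier character is either an escape pair (backslash + any char)
# or a plain name character. Trying the escape first keeps "\." together.
IDENT_CHAR = r"(?:\\.|[A-Za-z0-9_-])"
TOKEN = re.compile(r"(?P<ident>%s+)|(?P<dot>\.)|(?P<ws>\s+)" % IDENT_CHAR)

def tokenize(text):
    """Return (kind, value) pairs for text, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError("unexpected character at position %d" % pos)
        if m.lastgroup != "ws":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize(r"div . class\.name"))
print(tokenize(r"div . class.name"))
```

The first call yields a single identifier for class\.name after the dot,
while the second yields two identifiers separated by a dot token.
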
>> You seem to be using pyparsing features rather strangely.
>> The 'Word' pattern type allows defining distinct patterns for start and
>> (optional) following characters. Both are character _classes_. You could
>> use it like:
>
>> nameStartChar = ...
>> nameFollowingChar = ...
>> name = Word(nameStartChar,nameFollowingChar)
>
>> If you want to generalize name to include a dotted format, then  
>> rename the
>> above to namePart and write a pattern including dots.
>
>
> I'm not sure what you mean by this, nor whether it helps me. I've tried to
> come up with a more concise example; here it is:
>
> from pyparsing import *
>
> nmstart = Word(srange(r"[\\_a-zA-Z]")) # |{nonascii}|{escape}
> name = OneOrMore(Word(srange(r"[\\A-Z_a-z0-9]"))) # TODO: nonascii &
>
> ident = nmstart + ZeroOrMore(name)
>
> #ident = Word(srange(r"[_a-zA-Z]"), srange(r"[A-Z_a-z0-9]"))
>
> MINUS = Literal("-")
> IDENT = Combine(Optional(MINUS) + ident, adjacent=True) # TODO
>
> DOT = Literal(".")
> ASTERISK = Literal("*")
>
> class_ = Combine(DOT + IDENT)
> element_name = IDENT | ASTERISK
>
> selector = (element_name + (ZeroOrMore( class_ )) |
>            OneOrMore( class_ ))
>
>
> print(selector.parseString(r"foo.bar"))
> print(selector.parseString(r"foo.bar\baz"))
> print(selector.parseString(r"foo.bar\.baz"))
>
>
>
> The result is
>
> ['foo', '.bar']
> ['foo', '.bar\\baz']
> ['foo', '.bar\\', '.baz']
>
>
> So clearly the escape isn't honored: the second dot is treated as a DOT
> instead of as part of IDENT. To get the latter, I guess I need a specific
> lexer rule like quotedString.
>
> Diez