Thread: [Pyparsing] escape sequence in identifiers - with inline source code

[Pyparsing] escape sequence in identifiers - with inline source code

From: Diez B. R. <de...@we...> - 2010-04-02 13:23:35

Hi,

It seems the mailing list strips attachments, so here is the
aforementioned example code inline:

from pyparsing import *

nmstart = Word(srange(r"[_a-zA-Z\\]"))  # |{nonascii}|{escape}
name = OneOrMore(Word(srange(r"[A-Z_a-z0-9-\\]")))  # TODO: nonascii & escape
#numlit = Word(srange("[0-9]"))

MINUS = Literal("-")
IDENT = Combine(Optional(MINUS) + nmstart + ZeroOrMore(name),
                adjacent=True)  # TODO


print(IDENT.parseString(r"foo\bar"))
print(IDENT.parseString(r"foo\.bar"))


The output is

(cssprocessor)mac-dir:ablcssprocessor deets$ python /tmp/test.py
['foo\\bar']
['foo\\']

As you can see, the ".bar" part after the backslash is missing from the second result.

Diez

Re: [Pyparsing] escape sequence in identifiers - with inline source code

From: spir ☣ <den...@gm...> - 2010-04-06 10:47:34

On Fri, 2 Apr 2010 15:23:27 +0200
"Diez B. Roggisch" <de...@we...> wrote:

> Hi,
> 
> it seems as if the ML strips attachments, so here comes the  
> aforementioned example code inline:
> 
> from pyparsing import *
> 
> nmstart = Word(srange(r"[_a-zA-Z\\]")) # |{nonascii}|{escape}
> name = OneOrMore(Word(srange(r"[A-Z_a-z0-9-\\]"))) # TODO: nonascii & escape
> #numlit = Word(srange("[0-9]"))
> 
> MINUS = Literal("-")
> IDENT = Combine(Optional(MINUS) + nmstart + ZeroOrMore(name),  
> adjacent=True) # TODO

(Not really sure about your intent.)
You seem to be using pyparsing features rather strangely.
The 'Word' pattern type lets you define one set of characters for the first character and a separate (optional) set for the following characters. Both are character _classes_, not sub-patterns. You could use it like this:

nameStartChar = ...
nameFollowingChar = ...
name = Word(nameStartChar,nameFollowingChar)

If you want to generalize name to include a dotted format, then rename the above to namePart and write a pattern including dots.
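
To make the suggestion concrete, here is a minimal sketch. The character sets and the names name_start_chars, name_following_chars, name_part and dotted_name are assumptions chosen for illustration; they are not taken from the original grammar.

```python
from pyparsing import Combine, Literal, Word, ZeroOrMore, alphanums, alphas

# Assumed character classes, for illustration only.
name_start_chars = alphas + "_"
name_following_chars = alphanums + "_-"

# A single Word: the first character comes from the first class,
# all following characters come from the second class.
name_part = Word(name_start_chars, name_following_chars)

# A dotted name: parts joined by literal dots, merged into one token.
dotted_name = Combine(name_part + ZeroOrMore(Literal(".") + name_part))

print(name_part.parseString("foo-bar1"))
print(dotted_name.parseString("foo.bar_baz"))
```

Combine is used here so that the parts and the dots come back as one string instead of a list of separate tokens.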

Denis
________________________________

vit esse estrany ☣

spir.wikidot.com

Re: [Pyparsing] escape sequence in identifiers - with inline source code

From: Diez B. R. <de...@we...> - 2010-04-07 13:02:47

Hi,

I somehow lost the mail by Denis, so I quote it by hand here, hope that works:

> (Not really sure about your intent.)

My intent is to simply parse a string like this:

  div . class\.name

as 

 tag[div], class[class.name]

instead of 

 tag[div], class[class], class[name]

For this to happen, I need to special-case escape codes beginning with \ so
that they are *not* treated as an identifier followed by a dot; instead, the
two characters "\." should always be grouped together.

> You seem to be using pyparsing features rather strangely.
> The 'Word' pattern type allows defining distinct patterns for start and
> (optional) following characters. Both are character _classes_. You could use
> it like:

> nameStartChar = ...
> nameFollowingChar = ...
> name = Word(nameStartChar,nameFollowingChar)

> If you want to generalize name to include a dotted format, then rename the
> above to namePart and write a pattern including dots.


I'm not sure what you mean by this, or whether it helps me. I tried to come up
with a more concise example; here it is:

from pyparsing import *

nmstart = Word(srange(r"[\\_a-zA-Z]")) # |{nonascii}|{escape}
name = OneOrMore(Word(srange(r"[\\A-Z_a-z0-9]")))  # TODO: nonascii & escape

ident = nmstart + ZeroOrMore(name)

#ident = Word(srange(r"[_a-zA-Z]"), srange(r"[A-Z_a-z0-9]"))

MINUS = Literal("-")
IDENT = Combine(Optional(MINUS) + ident, adjacent=True) # TODO

DOT = Literal(".")
ASTERISK = Literal("*")

class_ = Combine(DOT + IDENT)
element_name = IDENT | ASTERISK

selector = (element_name + (ZeroOrMore( class_ )) |
            OneOrMore( class_ ))


print(selector.parseString(r"foo.bar"))
print(selector.parseString(r"foo.bar\baz"))
print(selector.parseString(r"foo.bar\.baz"))



The result is 

['foo', '.bar']
['foo', '.bar\\baz']
['foo', '.bar\\', '.baz']


So clearly the escaped dot is being treated as a DOT rather than as part of
the IDENT. To get the behaviour I want, I guess I need a dedicated lexer-style
rule, similar to quotedString.
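
A rule of that kind can be built from ordinary pyparsing elements. The following is only a sketch under stated assumptions: the names escape, plain_start, plain_rest and ident are made up for illustration, and the rule assumes that a backslash escapes exactly one following printable character.

```python
from pyparsing import (Combine, Literal, OneOrMore, Optional, Word,
                       ZeroOrMore, alphanums, alphas, printables)

# Hypothetical escape rule: a backslash plus exactly one printable character.
escape = Literal("\\") + Word(printables, exact=1)

# Plain (unescaped) identifier pieces; the character sets are assumptions.
plain_start = Word(alphas + "_", alphanums + "_-")
plain_rest = Word(alphanums + "_-")

# An identifier is a sequence of plain runs and escapes, merged into one token.
ident = Combine(Optional("-") + (plain_start | escape)
                + ZeroOrMore(plain_rest | escape))

DOT = Literal(".")
class_ = Combine(DOT + ident)
selector = (ident | Literal("*")) + ZeroOrMore(class_) | OneOrMore(class_)

for text in (r"foo.bar", r"foo.bar\baz", r"foo.bar\.baz"):
    print(selector.parseString(text))
```

Because the backslash is no longer part of the plain character set, the only way to consume it is through escape, which always takes the following character with it. The escaped dot therefore stays inside the class token.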

Diez

Re: [Pyparsing] escape sequence in identifiers - with inline source code

From: Diez B. R. <de...@we...> - 2010-04-07 20:51:57

Hi,


OK, I don't know why I didn't think of this in the first place. Maybe it
was some weird "you are using pyparsing, no need to bother with
nitty-gritty regexes" reflex, but a regex is what helped, and it should
have been obvious to me :)

from pyparsing import Regex

# An escape is either an escaped backslash (\\) or an escaped dot (\.).
escapes = r"\\\\|\\\."
IDENT = Regex(r"([a-zA-Z_-]|(%(escapes)s))([a-zA-Z0-9_-]|(%(escapes)s))*"
              % dict(escapes=escapes))


I post this just for the record.
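
For reference, here is a sketch of how that Regex-based IDENT could be dropped into the selector grammar from my earlier message. The grammar names are the ones used there; the test inputs are only illustrative. Note that this pattern accepts only the two escapes \\ and \. , so an input such as foo.bar\baz is no longer consumed past the backslash.

```python
from pyparsing import Combine, Literal, OneOrMore, Regex, ZeroOrMore

# An escape is either an escaped backslash (\\) or an escaped dot (\.).
escapes = r"\\\\|\\\."
IDENT = Regex(r"([a-zA-Z_-]|(%(escapes)s))([a-zA-Z0-9_-]|(%(escapes)s))*"
              % dict(escapes=escapes))

DOT = Literal(".")
ASTERISK = Literal("*")

class_ = Combine(DOT + IDENT)
element_name = IDENT | ASTERISK

selector = element_name + ZeroOrMore(class_) | OneOrMore(class_)

# Illustrative inputs: no escape, an escaped dot, an escaped backslash.
for text in (r"foo.bar", r"foo.bar\.baz", r"foo.bar\\baz"):
    print(selector.parseString(text))
```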

Diez
