pyparsing-users Mailing List for Python parsing module (Page 12)
Brought to you by:
ptmcg
From: Werner F. B. <wer...@fr...> - 2011-01-06 14:10:18
|
I am having some problems decoding these messages. The data comes in as an email message whose content type is declared as "Content-Type: text/plain", but it is really "Content-Type: text/plain; charset=windows-1252", so I read it in with:

    thisfile = codecs.open(regFile, "r", "windows-1252")

The parsing works fine except on lines like:

    address_name = Göran Petterson

which I parse with:

    alphanums = pyp.Word(pyp.alphanums)
    # address
    str_add_name = pyp.Literal("address_name =").suppress() + \
        alphanums + pyp.restOfLine
    add_name = str_add_name.setParseAction(self.str_add_nameAction)

But in str_add_nameAction I get:

    ([u'G', u'\xf6ran Petterson\r'], {})

The raw data at this point is "address_name = G\xf6ran Petterson". What am I doing wrong in all this? I tried using pyp.printables instead of alphanums, but with the same result. A tip would be very much appreciated.

Werner

P.S. Happy New Year to you all.
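[Note: the underlying issue here is that pyparsing's alphanums string contains only the ASCII letters and digits, so the Word match stops at the first accented character; pyparsing also ships an alphas8bit string of Latin-1 letters that can be added to the character set. A minimal sketch of the same character-class distinction, using only the standard library's re module rather than the poster's actual code:]

```python
import re

line = "address_name = G\xf6ran Petterson"

# An ASCII-only character class (what pyp.alphanums amounts to)
# stops at the accented letter:
ascii_name = re.match(r"address_name = ([a-zA-Z0-9]+)", line).group(1)
print(ascii_name)    # 'G'

# A Unicode-aware class (roughly alphanums + alphas8bit) does not:
full_name = re.match(r"address_name = (\w+)", line).group(1)
print(full_name)     # 'Göran'
```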
From: Andrea C. <an...@cd...> - 2010-12-26 05:25:38
|
Hi, I found pyparsing really easy to work with. Here is what I built on top of it: http://andreacensi.github.com/contracts/

In the meantime, I toyed around and changed some things in it. Perhaps some of this is helpful. The goal was to get better error messages ("closer" to the error; I hope you know what I mean). The following are the bits that I think are useful.

1) operatorPrecedence, modification 1

Around line 3579, there is:

    elif arity == 2:
        if opExpr is not None:
            matchExpr = FollowedBy(lastExpr + opExpr + lastExpr) + \
                Group( lastExpr + OneOrMore( opExpr + lastExpr ) )

This seems wasteful and does not use "-" when it should. I modified it as such:

    elif arity == 2:
        if opExpr is not None:
            matchExpr = Group(lastExpr + FollowedBy(opExpr) +
                              OneOrMore(opExpr - lastExpr))

In this way, we advance the pointer past the opExpr. I think this is the right semantics for 99% of the cases. The exception is if the user is overloading the opExprs.

2) operatorPrecedence, modification 2

At the beginning, you have:

    lastExpr = baseExpr | ( Suppress('(') + ret + Suppress(')') )

I modified it using:

    opnames = ",".join(str(x) for x in allops)
    parenthesis = Suppress('(') + ret + \
        FollowedBy(NotAny(oneOf(allops))) - Suppress(')')
    lastExpr = parenthesis | baseExpr

Basically, once I have seen the opening parenthesis and a ret, if there isn't an operator next, we have to find the closing parenthesis.

These two together give me much better error messages (see in fixed width):

    line 1 >list(1,2,(tuple(str,a,(?)))
                                   ^
                                   | here or nearby

    line 1 >1+(3*2?)
                  ^
                  | here or nearby

You can find the whole function here: https://github.com/AndreaCensi/contracts/blob/feature/better_error_messages/src/contracts/pyparsing_utils.py (this is not the main branch)

3) Catching ambiguity in Or()

Because my grammar is context-dependent (meaning that 'x' might be parsed differently according to the context), I had several debugging pains, especially when I was trying to get rid of the Or() in favor of MatchFirst().
What I did was modify Or() such that it checks that, if two clauses can parse the string with the same number of characters, then they have to have the same ParseResults. (If that's not true, it's a disaster waiting to happen.) This involved adding __eq__ to ParseResults and then adding the following to Or. Where it says:

    else:
        if loc2 > maxMatchLoc:
            maxMatchLoc = loc2
            maxMatchExp = e

I modify it to:

    else:
        if loc2 > maxMatchLoc:
            maxMatchLoc = loc2
            maxMatchExp = e
        elif loc2 == maxMatchLoc:
            val1 = e._parse(instring, loc, True)
            val2 = maxMatchExp._parse(instring, loc, True)
            if not (val1 == val2):
                msg = ('Ambiguous syntax, I could match both '
                       '(and maybe more):\n- %s\n- %s\n.'
                       % (get_desc(e), get_desc(maxMatchExp)))
                msg += 'Their values are: \n'
                msg += '- {0!r}\n'.format(val1)
                msg += '- {0!r}\n'.format(val2)
                raise ParseFatalException(instring, loc, msg, self)

You can see this in https://github.com/AndreaCensi/contracts/blob/feature/better_error_messages/src/contracts/mypyparsing.py#L2546 (ignore the other stuff I changed in mypyparsing; I was "experimenting" to understand that business of Fatal vs non-Fatal exceptions).

Best,
Andrea

--
Andrea Censi
PhD student, Control & Dynamical Systems, Caltech
http://www.cds.caltech.edu/~andrea/
"Life is far too important to be taken seriously" (Oscar Wilde)
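[Note: the ambiguity check described above can be illustrated outside pyparsing. Below is a hypothetical longest-match chooser over two hand-written tokenizers (all names invented for this sketch, not pyparsing API): like Or(), it keeps the alternative that consumes the most characters, and like Andrea's patch, it fails loudly when two alternatives consume the same number of characters but produce different results.]

```python
# Each alternative returns (chars_consumed, parsed_value), or None on no match.

def parse_int(s):
    n = len(s) - len(s.lstrip("0123456789"))
    return (n, int(s[:n])) if n else None

def parse_word(s):
    n = len(s) - len(s.lstrip("abcdefghijklmnopqrstuvwxyz0123456789"))
    return (n, s[:n]) if n else None

def match_or(s, alternatives):
    best = None
    for alt in alternatives:
        res = alt(s)
        if res is None:
            continue
        if best is None or res[0] > best[0]:
            best = res                      # longest match wins, as in Or()
        elif res[0] == best[0] and res[1] != best[1]:
            # Same length, different results: the "disaster waiting to happen".
            raise ValueError("ambiguous: %r vs %r" % (res[1], best[1]))
    return best

print(match_or("42abc", [parse_int, parse_word]))  # (5, '42abc')
```

On the input "42", both alternatives consume two characters but yield 42 and "42" respectively, so this raises the ambiguity error instead of silently picking one.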
From: Helmut J. <jar...@ig...> - 2010-12-23 11:41:18
|
Hi, I'd like to write an ebuild on my Gentoo system for installing pyparsing from SVN. If I specify that pyparsing supports Python 3, my install scripts execute the file pyparsing_py3.py with Python 3.1. pyparsing_py3.py has an import of the builtin module which fails under Python 3. What am I missing? Thanks for a hint, Helmut.

--
Helmut Jarausch
Lehrstuhl fuer Numerische Mathematik
RWTH - Aachen University
D 52056 Aachen, Germany
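[Note: the likely culprit is the Python 2/3 rename of the built-in namespace module: Python 2's __builtin__ became builtins in Python 3, so an import of the old name fails under 3.1. A common compatibility shim for code that must run on both, shown here as a general sketch rather than the actual pyparsing_py3.py fix:]

```python
try:
    import builtins               # Python 3 name of the built-in module
except ImportError:
    import __builtin__ as builtins  # Python 2 fallback

# Either way, the built-in functions are reachable under one name.
print(builtins.len("abc"))  # 3
```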
From: Michael F. <nob...@gm...> - 2010-11-15 16:39:48
|
Thanks! I've confirmed it works, even after suppressing 'get' and 'AT', and grouping the shopping list.

--Michael

On Sat, Nov 13, 2010 at 8:25 PM, Paul McGuire <pt...@au...> wrote:
> I don't think you've gone overboard here. Your BNF *will* be somewhat
> informal, but don't give up on it.
> [...]
From: Paul M. <pt...@au...> - 2010-11-14 03:25:48
|
I don't think you've gone overboard here. Your BNF *will* be somewhat informal, but don't give up on it.

Your basic form is:

    (get) (stuff) (at) (location)

Given that you have some entries where you just get stuff, or specify a location just based on a leading '@' symbol, this gets a little more complex (using []'s for optional parts):

    (get) (stuff) [[(at)] (location)]

(stuff) and (location) are going to be pretty unstructured, but fortunately, you're defining some specific forms for (get) and (at).

Your (stuff) can contain a list of items separated by commas, (and), or (, and), so I think you can define it as pretty open for (item), and then define a delimited list for the list of items. You'll need to specify the lookahead (as you already picked up in your posted pyparsing code) to avoid parsing "at" or "and" or a store label with a leading '@' as an item word. And using Keywords for your delimiting words is a good choice, to guard against accidentally reading the leading 'at' in 'athletic socks' as 'at', and the remaining 'hletic socks' as some kind of store.

So I'd informally write this as:

    shopping-list ::= get stuff [[at] location]
    get ::= "get" | "buy" | "pick up"
    at ::= "at" | "from"
    item ::= (~(and | at | '@') Word(alphas+'-'))...
    stuff ::= item [ (', and' | ',' | 'and') item ]...
    location ::= ['@'] Word(alphas)...

Rendered into pyparsing, it ends up very similar to your posted code:

    COMMA, AT = map(Literal, ',@')
    KW = CaselessKeyword
    get = KW('get') | KW('buy') | KW('pick') + KW('up') | KW('need')
    at = KW('at') | KW('from')
    location = Combine(Optional(AT) + OneOrMore(Word(alphas)), ' ',
                       adjacent=False)
    and_ = KW('and')
    itemdelim = COMMA + and_ | COMMA | and_
    item = Combine(OneOrMore(~(at | itemdelim | '@') + Word(alphas)), ' ',
                   adjacent=False)
    stuff = delimitedList(item, itemdelim)
    shoppingList = get + stuff("items") + Optional(Optional(at) +
                                                  location("location"))

And this seems to work OK for your posted tests.

Please try to avoid defining Literals with embedded spaces, and *especially* not with leading spaces (as you do with " and ") - pyparsing's default whitespace skipping will almost surely make your leading-whitespace literal unmatchable. Note how I defined "pick up" as an option for (get) as two joined keywords - this immunizes us against cases with extra whitespace between the two words, at very little cost.

Thanks for writing, and welcome to the World of Pyparsing! - :)

-- Paul
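[Note: the item-delimiter logic above (comma, "and", or ", and") can be sanity-checked on its own with the standard library. Below is a rough regex analogue of Paul's itemdelim, offered only as an illustration; the ", and" form is listed first so it is not split twice.]

```python
import re

# Rough regex analogue of: itemdelim = COMMA + and_ | COMMA | and_
itemdelim = re.compile(r",\s*and\s+|,\s*|\s+and\s+")

print(itemdelim.split("chocolate milk, eggs and bread"))
# ['chocolate milk', 'eggs', 'bread']

print(itemdelim.split("nails, claw hammer, and studs"))
# ['nails', 'claw hammer', 'studs']
```

Unlike Paul's Keyword-based grammar, a bare regex like this would also split an item whose text happens to contain the word "and" surrounded by spaces, which is exactly why Keywords (and the negative lookahead in item) matter in the real parser.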
From: Michael F. <nob...@gm...> - 2010-11-12 18:26:25
|
Hello all,

First, thanks for the wonderful programming API. I'm having a lot of fun with it. I have a question: I feel like I'm ruining the BNF expression when I overly massage the input. Should I stick to BNF or start tokenizing? Allow me to explain.

I've always been fascinated with interfaces which seem to understand human language, kind of like Gmail's "quick add" feature for its calendar. So, to learn how to do this better, I'm writing a little ditty to interpret simple commands into creating a shopping list. Here is a list of generic input I expect it to understand:

    #test grammar
    tests = """\
    buy chocolate milk, eggs and bread at store
    get eggs at @Vons
    need headache medicine and cough medicine from downtown drug store
    buy nails, claw hammer, and studs at @homedepot
    buy ragjoint at car parts store
    buy laptop at electronics store
    get candy @Sees
    get a clue""".splitlines()

My evil mashup is as follows...

    # define grammar
    KW = Keyword
    and_ = Literal(" and ")
    comma = Literal(",")
    at = KW("from") | KW("at")
    label = Literal("@") + Word(alphanums)
    buy = KW("buy") | KW("get") | KW("need")
    item = Group( OneOrMore( ~(at | and_) + (Word(alphas + "-"))) +
                  Suppress(Optional(comma))) | Suppress(Optional(and_))
    items = Group( OneOrMore( item ) )
    store = KW("store") | KW("market") | KW("grocery")
    storetype = Suppress(at) + Group(OneOrMore( Word(alphas)))
    storelabel = label
    storename = storetype | storelabel
    shlisti = Suppress(buy) + items + Optional( storename )

What I wish to do is interpret a sentence that starts with some synonym of "buy", interpret everything after that as a list of things to buy (accepting "and" as a synonym for a comma) until it reaches some synonym of "from", and then interpret everything after that as a store name. All I want to keep is the list of items and the storename.

What I wind up getting is as follows:

    buy chocolate milk, eggs and bread at store ->
        [[['chocolate', 'milk']]]
    get eggs at @Vons ->
        [[['eggs']]]
    need headache medicine and cough medicine from downtown drug store ->
        [[['headache', 'medicine', 'and', 'cough', 'medicine']],
         ['downtown', 'drug', 'store']]
    buy nails, claw hammer, and studs at @homedepot ->
        [[['nails']]]
    buy ragjoint at car parts store ->
        [[['ragjoint']], ['car', 'parts', 'store']]
    buy laptop at electronics store ->
        [[['laptop']], ['electronics', 'store']]
    get candy @Sees ->
        [[['candy']], '@', 'Sees']
    get a clue ->
        [[['a', 'clue']]]

Is there a way to express it in the nice BNF form, or did I already jump off that reservation with the grouping and suppressing?
From: Paul M. <pt...@au...> - 2010-10-10 23:02:47
|
> I didn't complicate my minimal example with it, but I've got results
> names set for initials, medials, and vowels.

Since the Regex option seems to be a likely path, you can use named capture groups in your re, and the Regex class will convert these to results names:

    syllable = Group(Regex(r"(?P<init>[sSbB])(?P<meds>[mMpP]*)(?P<vow>[aeiou]?)"))

Iterate over your syllables, and you can access the fields as in:

    res = syllables.parseString(t)
    for syl in res.syllables:
        print syl.init, syl.meds, syl.vow
        # - or -
        print "init: %(init)s meds:%(meds)s vow:%(vow)s" % syl

> Thanks again for your help and insights, Paul -- once again, pyparsing
> shines in all of its glory :-)
>
> d :)

Thanks! As I said, this will be very interesting to see how it plays out. Pyparsing is already being used in zhpy to support Chinese-language Python development in Python 2.x versions that pre-date support for Unicode identifiers.

I'd also like to hear sometime just how you got involved in this application in the first place (perhaps you've already captured this in a blog post - just send me the post).

Best regards,
-- Paul
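[Note: the named-group part of this suggestion can be tried with the standard library alone; the same pattern source that pyparsing's Regex class turns into results names is available from re as a plain dict:]

```python
import re

# The syllable pattern from Paul's reply, with named capture groups.
syllable = re.compile(r"(?P<init>[sSbB])(?P<meds>[mMpP]*)(?P<vow>[aeiou]?)")

m = syllable.match("SMpo")
print(m.groupdict())  # {'init': 'S', 'meds': 'Mp', 'vow': 'o'}
```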
From: Duncan M. <dun...@gm...> - 2010-10-10 21:50:06
|
On Sun, Oct 10, 2010 at 10:56 AM, Paul McGuire <pt...@au...> wrote:
> Duncan, my friend, so good to hear from you again! I'm glad pyparsing
> continues to be of some use to you. I must admit, you are the first I have
> heard of to be parsing Tibetan with pyparsing. I think I can propose a few
> alternative solutions for you.
>
> First of all, your immediate problem has to do with your use of 'max'.
> 'max = 1' means just that, 1 AND NO MORE!

Ah, I see. I had incorrectly interpreted that as "match only one initial, and if another initial is found, start parsing that as a new syllable."

> In your failing case, "sSmi", the leading 's' is followed by another 'S',
> which by definition of your init word is not allowed; you exceeded the
> maximum -> parser fail! Fortunately, the simplest remedy is to use the
> 'exact' argument instead of 'max':
>
>     init = Word('sSbB', exact=1).setName("initial")
>     med = Word('mMpP').setName("medial")
>     vow = Word('aeiou', exact=1).setName("vowel")
>
> 'exact' does not impose the same lookahead restriction that 'max' does.
>
> If your test case is close enough to your Tibetan application, you might
> try one of these other options. You can merge your initial and medial
> expressions into a single Word, since what you describe is exactly the
> same as the 2-argument constructor for Word. Breaking out the definition
> of syllable as:
>
>     syllable = Combine(
>         init + ZeroOrMore(med) + Optional(vow)
>     )
>     syllables = Group(OneOrMore(syllable)).setResultsName("syllables")
>
> the first two bits of your syllable can be merged into a single Word
> expression:
>
>     syllable = Combine(
>         Word('sSbB', 'mMpP') + Optional(vow)
>     )
>     syllables = Group(OneOrMore(syllable)).setResultsName("syllables")

Hrm, I tried that, but wasn't able to figure out how to get at the parsed data for the medials. I need to be able to introspect the parsed data in order to perform various conversion operations (at a later time). I didn't complicate my minimal example with it, but I've got results names set for initials, medials, and vowels.

> Or if you can tolerate an even more liberal expression (which would match
> if vowels were mixed in with medials, and not just added to the end):
>
>     syllable = Word('sSbB', 'mMpPaeiouAEIOU')
>
> This will parse fairly quickly as well, since it is able to internally
> convert this entire thing into the single regex "[sSbB][mMpPaeiouAEIOU]*".

Ah, this is a great example -- thanks! Sadly, I can't use it, since the rules for vowels in Tibetan Unicode are strict about being at the end.

> If you still need the rigor of your original case (only a single potential
> vowel at the end of the syllable, not mixed in with medials), you might
> still try rolling your own Regex:
>
>     syllable = Regex(r"[sSbB][mMpP]*[aeiou]?")

Oh, this is very nice. I'm going to play with this some more. Thanks!

> I've found that for low-level tokens like words and numbers, using a Regex
> really outperforms "Combine(startWithThis + (somethingElse|anotherThing) +
> Optional(stillAnotherThing))"; while keeping the re's localized to just a
> simple building block pretty much keeps them from getting too out-of-hand.
> For instance, I've modified the fourFn.py example that ships with
> pyparsing to show the old style commented out, and a
> still-fairly-easy-to-follow regex replacement:
>
>     #~ fnumber = Combine( Word( "+-"+nums, nums ) +
>     #~                    Optional( point + Optional( Word( nums ) ) ) +
>     #~                    Optional( e + Word( "+-"+nums, nums ) ) )
>     fnumber = Regex(r"[+-]?\d+(:?\.\d*)?(:?[eE][+-]?\d+)?")
>     ident = Word(alphas, alphas+nums+"_$")
>
> If these syllabic constructs in Tibetan can be built up from single
> Unicode characters, then I think all of these suggestions are still valid,
> even down to the Regex idea.
>
> I'd be very interested to see more of your Tibetan parser, as things
> progress - good luck!

Once I get it hammered out, I'll reply with a single-file example :-) It's part of a library I'm creating to support advanced features in Tibetan software, but the grammar itself should lend itself nicely to an example.

Thanks again for your help and insights, Paul -- once again, pyparsing shines in all of its glory :-)

d
From: Paul M. <pt...@au...> - 2010-10-10 16:57:00
|
Duncan, my friend, so good to hear from you again! I'm glad pyparsing continues to be of some use to you. I must admit, you are the first I have heard of to be parsing Tibetan with pyparsing. I think I can propose a few alternative solutions for you.

First of all, your immediate problem has to do with your use of 'max'. 'max = 1' means just that, 1 AND NO MORE! In your failing case, "sSmi", the leading 's' is followed by another 'S', which by definition of your init word is not allowed; you exceeded the maximum -> parser fail! Fortunately, the simplest remedy is to use the 'exact' argument instead of 'max':

    init = Word('sSbB', exact=1).setName("initial")
    med = Word('mMpP').setName("medial")
    vow = Word('aeiou', exact=1).setName("vowel")

'exact' does not impose the same lookahead restriction that 'max' does.

If your test case is close enough to your Tibetan application, you might try one of these other options. You can merge your initial and medial expressions into a single Word, since what you describe is exactly the same as the 2-argument constructor for Word. Breaking out the definition of syllable as:

    syllable = Combine(
        init + ZeroOrMore(med) + Optional(vow)
    )
    syllables = Group(OneOrMore(syllable)).setResultsName("syllables")

the first two bits of your syllable can be merged into a single Word expression:

    syllable = Combine(
        Word('sSbB', 'mMpP') + Optional(vow)
    )
    syllables = Group(OneOrMore(syllable)).setResultsName("syllables")

Or if you can tolerate an even more liberal expression (which would match if vowels were mixed in with medials, and not just added to the end):

    syllable = Word('sSbB', 'mMpPaeiouAEIOU')

This will parse fairly quickly as well, since it is able to internally convert this entire thing into the single regex "[sSbB][mMpPaeiouAEIOU]*".

If you still need the rigor of your original case (only a single potential vowel at the end of the syllable, not mixed in with medials), you might still try rolling your own Regex:

    syllable = Regex(r"[sSbB][mMpP]*[aeiou]?")

I've found that for low-level tokens like words and numbers, using a Regex really outperforms "Combine(startWithThis + (somethingElse|anotherThing) + Optional(stillAnotherThing))", while keeping the re's localized to just a simple building block pretty much keeps them from getting too out-of-hand. For instance, I've modified the fourFn.py example that ships with pyparsing to show the old style commented out, and a still-fairly-easy-to-follow regex replacement:

    #~ fnumber = Combine( Word( "+-"+nums, nums ) +
    #~                    Optional( point + Optional( Word( nums ) ) ) +
    #~                    Optional( e + Word( "+-"+nums, nums ) ) )
    fnumber = Regex(r"[+-]?\d+(:?\.\d*)?(:?[eE][+-]?\d+)?")
    ident = Word(alphas, alphas+nums+"_$")

If these syllabic constructs in Tibetan can be built up from single Unicode characters, then I think all of these suggestions are still valid, even down to the Regex idea.

I'd be very interested to see more of your Tibetan parser as things progress - good luck!

-- Paul
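[Note: the Regex suggestion above can be exercised directly with the stdlib re module; findall with the syllable pattern splits the test words from the original post exactly as desired, including the two edge cases that broke the max=1 grammar:]

```python
import re

syllable = re.compile(r"[sSbB][mMpP]*[aeiou]?")

print(syllable.findall("sabmaSMpo"))  # ['sa', 'bma', 'SMpo']
print(syllable.findall("sSma"))       # ['s', 'Sma']
print(syllable.findall("sbisi"))      # ['s', 'bi', 'si']
```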
From: Duncan M. <dun...@gm...> - 2010-10-10 15:51:15
|
Hey all! Paul, I think it's been since PyCon 2006 since we chatted last! I've been using pyparsing on and off since then for various projects. I've even used it at work for conceptual modeling (we're working on a gesture language for multi-touch in Ubuntu). However, I'm emailing for help about parsing syllabic rules in a natural language... Tibetan. I won't bore you with linguistic details, though: I've created a minimal example with a fake language below. Here are the rules:

1. All syllables must start with s, S, b, or B.
2. Syllables can be as short as one initial letter.
3. If there are additional consonants in the syllable, they must be one of m, M, p, or P.
4. Medial consonants may repeat multiple times.
5. Vowels are optional.

Here was my first try at a grammar for these rules:

    init = Word('sSbB', max=1).setName("initial")
    med = Word('mMpP').setName("medial")
    vow = Word('aeiou', max=1).setName("vowel")
    syllables = Group(OneOrMore(Combine(
        init + ZeroOrMore(med) + Optional(vow)
    ))).setResultsName("syllables")

For most cases, this resulted in the desired parsing:

    syllables.parseString("sabmaSMpo").asList()
    [['sa', 'bma', 'SMpo']]

However, I discovered an edge case that wasn't covered. The following examples result in exceptions:

    syllables.parseString("sSma").asList()
    syllables.parseString("sbisi").asList()

Now, if I change the init definition to the following:

    init = oneOf('S s b B').setName("initial")

I get the desired results for everything. The two problem cases result in this:

    syllables.parseString("sSma").asList()
    [['s', 'Sma']]
    syllables.parseString("sbisi").asList()
    [['s', 'bi', 'si']]

So it seems to me that Word should *somehow* be able to do this, though obviously my use of max=1 and the hope that this would do it is naive ;-) For the sake of consistency, I'd rather not have to join the list of initial characters with a space. Is there a way of accomplishing my goal with Word instead of oneOf?

Thanks!

d
From: Peter J. <ta...@hi...> - 2010-08-31 17:00:52
|
Hi -- I'd like to use the deltaTime.py script from the examples directory of the 1.5.5 release in my own project, but I want to make sure of the license before I do so. It doesn't explicitly state a license within, only that it is copyright Paul McGuire. Is it also covered under the MIT license? Thanks. pete |
From: Diez B. R. <de...@we...> - 2010-06-10 12:57:00
|
On Thursday, June 10, 2010 13:27:11 Thomas Jensen wrote:
> Dear PyParser Experts
>
> I am trying to scrape a lot of data from the European Parliament
> website for a research project. The first step is to create a list of
> all parliamentarians, however due to the many Eastern European names
> and the accents they use I get a lot of missing entries. Here is an
> example of what is giving me troubles (notice the accents at the end
> of the family name):

I would suggest you use BeautifulSoup for this instead of pyparsing. Pyparsing is great, but parsing HTML is a solved problem, and making your own HTML parsing robust actually requires a *lot* of effort.

Diez
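[Note: to illustrate this suggestion, the names can be pulled out of the title attributes with a real HTML parser instead of character-class matching; a minimal sketch using only the standard library's html.parser (BeautifulSoup would be even shorter), fed with the snippet from Thomas's post:]

```python
from html.parser import HTMLParser

class NameExtractor(HTMLParser):
    """Collect the title attribute of every <a> tag."""
    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attrs = dict(attrs)
            if "title" in attrs:
                self.names.append(attrs["title"])

snippet = ('<td class="listcontentlight_left">'
           '<a href="/members/expert/alphaOrder/view.do?language=EN&id=28276"'
           ' title="ANDRIKIEN\u0116, Laima Liucija">'
           'ANDRIKIEN\u0116, Laima Liucija</a></td>')

p = NameExtractor()
p.feed(snippet)
print(p.names)  # ['ANDRIKIENĖ, Laima Liucija']
```

The accented 'Ė' comes through untouched because the HTML parser never cares what characters make up the attribute value.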
From: Thomas J. <tho...@eu...> - 2010-06-10 11:40:52
|
Dear PyParser Experts,

I am trying to scrape a lot of data from the European Parliament website for a research project. The first step is to create a list of all parliamentarians; however, due to the many Eastern European names and the accents they use, I get a lot of missing entries. Here is an example of what is giving me trouble (notice the accents at the end of the family name):

    <td class="listcontentlight_left">
    <a href="/members/expert/alphaOrder/view.do?language=EN&id=28276"
       title="ANDRIKIENĖ, Laima Liucija">ANDRIKIENĖ, Laima Liucija</a>
    <br/>
    Group of the European People's Party (Christian Democrats)
    <br/>
    </td>

Here is the url from which the html example is taken:

    http://www.europarl.europa.eu/members/expert/alphaOrder.do?letter=B&language=EN

So far I have been using pyparsing and the following code (I know about hyphens and so forth; this is just a test to see if I can get the name listed above):

    #parser_names
    name = Word(alphanums + alphas8bit)
    begin, end = map(Suppress, "><")
    names = begin + ZeroOrMore(name) + "," + ZeroOrMore(name) + end
    for name in names.searchString(page):
        print(name)

However this does not catch the name from the html above. Any advice on how to proceed?

Best,
Thomas

P.S.: Here is all the code I have so far:

    # -*- coding: utf-8 -*-
    import urllib.request
    from pyparsing_py3 import *

    page = urllib.request.urlopen("http://www.europarl.europa.eu/members/expert/alphaOrder.do?letter=B&language=EN")
    page = page.read().decode("utf8")

    #parser_names
    name = Word(alphanums + alphas8bit)
    begin, end = map(Suppress, "><")
    names = begin + ZeroOrMore(name) + "," + ZeroOrMore(name) + end
    for name in names.searchString(page):
        print(name)
From: Kevin <kcc...@gl...> - 2010-04-19 18:06:25
|
Hi all:

Been working with pyparsing for a few days now and need some help with the following. I have a file which is read on startup to either create, or create and populate, classes. The following would be a valid input file:

    Class1::Instance1
    Class1::Instance2
    Class1::Instance3
    {
        variable1 = TRUE
        variable2 = FALSE
        variable3 = { {"valuename1", 10}, {"valuename2", 20} }
        variable4 = { {10, "valuename3"} }
        variable5 = "variable5value"
    }
    Class1::Instance4
    Class2::Instance1

etc, etc. My parser looks as follows:

    def importFile_BNF():
        global importBNF
        # if importBNF:  # We've been here before. Don't do it again
        #     return importBNF

        # Literals not to be stored
        LCB, RCB, COMMA, DQT = map(Suppress, '{},"')
        DCLN = Suppress(Literal('::'))
        comment = Suppress(Literal('#') + Optional(restOfLine))
        assign = Suppress(Literal('='))
        insert = Suppress(Literal('+='))
        remove = Suppress(Literal('-='))
        operation = assign ^ insert ^ remove
        classIdentifier = Word(alphas, alphanums + '_')
        instanceIdentifier = Word(alphas, alphanums + '_-')
        name = DQT + instanceIdentifier + DQT
        order = Word(nums)
        driverEntry = LCB + name + order + RCB
        valueList = LCB + OneOrMore(driverEntry) + Optional(COMMA) + RCB
        statement = instanceIdentifier + assign + valueList
        identifier = Group(classIdentifier + DCLN + instanceIdentifier)
        stanzaBody = LCB + OneOrMore(statement) + RCB
        stanza = identifier + Optional(stanzaBody)
        stanza.setDebug()
        importBNF = ZeroOrMore(stanza)
        importBNF.setDefaultWhitespaceChars(' \t\r')
        importBNF.ignore(comment)
        importBNF.ignore(blankline)
        return importBNF

What happens when I run this is that when I reach the LCB on Class1::Instance3, I get an exception that says:

    Expected W:(abcd.....) (at char XXX), (line:YY, col ZZ)

Having the stanza body marked optional should allow the opening LCB to be parsed, but it appears that the parser is insisting that I only do 'definitions' and will not allow me to populate the class. Any help is greatly appreciated.

Kevin
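[Note: as an aside, the {"name", number} driver-entry shape in Kevin's input can be sanity-checked with a quick regex before debugging the full pyparsing grammar; a hypothetical helper, not part of Kevin's code:]

```python
import re

# Matches one {"name", number} driver entry, ignoring surrounding whitespace.
driver_entry = re.compile(r'\{\s*"([^"]+)"\s*,\s*(\d+)\s*\}')

line = 'variable3 = { {"valuename1", 10}, {"valuename2", 20} }'
print(driver_entry.findall(line))
# [('valuename1', '10'), ('valuename2', '20')]
```

The outer braces are skipped automatically, since the pattern requires a quote right after the opening brace.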
From: Diez B. R. <de...@we...> - 2010-04-07 20:51:57
|
Hi,

OK, I don't know why I didn't think of this in the first place - maybe some weird "you are using pyparsing, no need to bother with nitty-gritty regexes" feeling - but that's what helped, and it should have been obvious to me :)

    escapes = r"\\\\|\\\."
    IDENT = Regex(r"([a-zA-Z_-]|(%(escapes)s))([a-zA-Z0-9_-]|(%(escapes)s))*"
                  % dict(escapes=escapes))

I post this just for the record.

Diez

On 07.04.2010 at 15:57, Diez B. Roggisch wrote:
> Hi,
>
> I somehow lost the mail by Denis, so I quote it by hand here, hope
> that works:
> [...]
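[Note: Diez's pattern can be checked with the stdlib re module directly, since the same regex source works both in pyparsing's Regex class and in re:]

```python
import re

escapes = r"\\\\|\\\."   # an escaped backslash, or an escaped dot
IDENT = re.compile(r"([a-zA-Z_-]|(%(escapes)s))"
                   r"([a-zA-Z0-9_-]|(%(escapes)s))*" % dict(escapes=escapes))

# The escaped dot is now part of the identifier...
print(IDENT.match(r"bar\.baz").group())  # bar\.baz

# ...while an unescaped dot still terminates it.
print(IDENT.match("bar.baz").group())    # bar
```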
From: Diez B. R. <de...@we...> - 2010-04-07 13:02:47
Hi,

I somehow lost the mail by Denis, so I quote it by hand here, hope that works:

> (Not really sure about your intent.)

My intent is to simply parse a string like this:

div . class\.name

as

tag[div], class[class.name]

instead of

tag[div], class[class], class[name]

For this to happen, I need to special-case escape-codes beginning with \ so that they are *not* treated as an identifier followed by a dot, but instead always group the two characters "\." together.

> You seem to be using pyparsing features rather strangely.
> The 'Word' pattern type allows defining distinct patterns for start and (optional) following characters. Both are character _classes_. You could use it like:
>
> nameStartChar = ...
> nameFollowingChar = ...
> name = Word(nameStartChar,nameFollowingChar)
>
> If you want to generalize name to include a dotted format, then rename the above to namePart and write a pattern including dots.

I'm not sure what you mean by this, nor whether it helps me. I'll try to come up with a more concise example, here it is:

from pyparsing import *

nmstart = Word(srange(r"[\\_a-zA-Z]"))  # |{nonascii}|{escape}
name = OneOrMore(Word(srange(r"[\\A-Z_a-z0-9]")))  # TODO: nonascii & escape

ident = nmstart + ZeroOrMore(name)

#ident = Word(srange(r"[_a-zA-Z]"), srange(r"[A-Z_a-z0-9]"))

MINUS = Literal("-")
IDENT = Combine(Optional(MINUS) + ident, adjacent=True)  # TODO

DOT = Literal(".")
ASTERISK = Literal("*")

class_ = Combine(DOT + IDENT)
element_name = IDENT | ASTERISK

selector = (element_name + ZeroOrMore( class_ ) | OneOrMore( class_ ))

print selector.parseString(r"foo.bar")
print selector.parseString(r"foo.bar\baz")
print selector.parseString(r"foo.bar\.baz")

The result is

['foo', '.bar']
['foo', '.bar\\baz']
['foo', '.bar\\', '.baz']

So clearly the escape isn't making the second dot count as part of IDENT instead of a DOT. And for this to happen, I need a specific lexer rule like quotedString - I guess.

Diez
From: spir ☣ <den...@gm...> - 2010-04-06 10:47:34
On Fri, 2 Apr 2010 15:23:27 +0200
"Diez B. Roggisch" <de...@we...> wrote:

> Hi,
>
> it seems as if the ML strips attachments, so here comes the aforementioned example code inline:
>
> from pyparsing import *
>
> nmstart = Word(srange(r"[_a-zA-Z\\]"))  # |{nonascii}|{escape}
> name = OneOrMore(Word(srange(r"[A-Z_a-z0-9-\\]")))  # TODO: nonascii & escape
> #numlit = Word(srange("[0-9]"))
>
> MINUS = Literal("-")
> IDENT = Combine(Optional(MINUS) + nmstart + ZeroOrMore(name), adjacent=True)  # TODO

(Not really sure about your intent.)

You seem to be using pyparsing features rather strangely. The 'Word' pattern type allows defining distinct patterns for start and (optional) following characters. Both are character _classes_. You could use it like:

nameStartChar = ...
nameFollowingChar = ...
name = Word(nameStartChar,nameFollowingChar)

If you want to generalize name to include a dotted format, then rename the above to namePart and write a pattern including dots.

Denis
________________________________
vit esse estrany ☣
spir.wikidot.com
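Denis's suggestion, written out as a runnable sketch. The concrete character sets are our own illustrative choice, not from the thread:

```python
from pyparsing import Word, alphas, alphanums

# Word(initChars, bodyChars): the first character must come from the
# first set, every following character from the second set.
name = Word(alphas + "_", alphanums + "_-")

print(name.parseString("foo-bar_1").asList())  # ['foo-bar_1']
```

Note that "-" appears only in the body set, so a name can contain a hyphen but cannot start with one.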
From: Diez B. R. <de...@we...> - 2010-04-02 13:23:35
Hi,

it seems as if the ML strips attachments, so here comes the aforementioned example code inline:

from pyparsing import *

nmstart = Word(srange(r"[_a-zA-Z\\]"))  # |{nonascii}|{escape}
name = OneOrMore(Word(srange(r"[A-Z_a-z0-9-\\]")))  # TODO: nonascii & escape
#numlit = Word(srange("[0-9]"))

MINUS = Literal("-")
IDENT = Combine(Optional(MINUS) + nmstart + ZeroOrMore(name), adjacent=True)  # TODO

print IDENT.parseString(r"foo\bar")
print IDENT.parseString(r"foo\.bar")

The output is

(cssprocessor)mac-dir:ablcssprocessor deets$ python /tmp/test.py
['foo\\bar']
['foo\\']

So you can see there is the whole "\.bar"-stuff missing.

Diez
From: Diez B. R. <de...@we...> - 2010-04-02 13:12:38
Hi,

I'm using pyparsing to parse CSS. Now I've encountered css-classes with dots in them for the first time - so I need to extend my identifier definition to encompass backslash escapes. This test-program illustrates my problem. Or at least some problem; for the same testcase, my real code throws an exception:

======================================================================
ERROR: Tests that various parts of the grammer parse
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/deets/projects/privat/TurboGears/ablcssprocessor/tests/test_parser.py", line 42, in test_subexpressions
    (r'foo\.bar', [r"foo\.bar"]),
  File "/Users/deets/projects/privat/TurboGears/ablcssprocessor/tests/test_parser.py", line 27, in parse
    result = g.parseString(test, True).asList()
  File "/Users/deets/.virtualenvs2.5/cssprocessor/lib/python2.5/site-packages/pyparsing-1.5.2-py2.5.egg/pyparsing.py", line 1076, in parseString
    raise exc
ParseException: Expected end of text (at char 4), (line:1, col:5)
----------------------------------------------------------------------

Any suggestions how to deal with this?

Thanks, Diez
From: spir ☣ <den...@gm...> - 2010-03-31 09:37:03
On Tue, 30 Mar 2010 22:43:11 +0200
Eike Welk <eik...@gm...> wrote:

> Hello Denis!
>
> On Tuesday March 30 2010 21:31:58 spir ☣ wrote:
> > Hello,
> >
> > This time, that's me having a left-recursion issue. I'm trying to parse var names that can possibly refer to attributes, like "a.b.c". I can parse it as
> > simpleName + ZeroOrMore(extension)
> > but then I need to reformat the result recursively, to get the real semantics of: getattr(getattr(container, name), name)
> > i.e. in the case of "a.b.c":
> > ((a).b).c
>
> It is impossible to do what you want in an elegant way, you must reformat the parse result.

Right, I knew it in fact ;-) but was expecting kind of a miracle!

> I had a similar question some time ago, this is the simplified summary of Paul's answer.
>
> For code that gives you the right operator precedence look at Pyparsing's operatorPrecedence(...) or at the calculator example. It also gives you a parse result that can be relatively easily reformatted.
>
> By the way, if you really implement a programming language with Pyparsing use operatorPrecedence(...), it can parse nearly all of Python's expressions including function calls. [...]

I don't, in fact (I may even switch to Lua for several reasons). And there is no operator precedence in the language I'm parsing.

For possible interest: as of now, operators map to class methods, meaning e.g. "+" does not map to self.add(other) but to Number.add(numbers), and this scheme may (as of now, it does) mirror in a prefix notation syntax:
+(n1 n2 n3 ...)
instead of
n1 + n2 + n3 ...
Thus there is no operator precedence at all. (Anyway, I wanted no arbitrary precedence between operators of the same "kind" (arithmetics, logical...), so I would have imposed parens if I had not used prefix notation.) I have always found operators mapping to instance methods simply wrong (less wrong for unary operators, but still). So, this scheme brings me both advantages at once.

> If you are interested I can talk you through my still very big parser implementation, which can be found here (look at line 979):
> http://tinyurl.com/yc3gyqc
>
> Original URL, but Kmail messes this URL up:
> http://bazaar.launchpad.net/~vcs-imports/freeode/trunk/annotate/head:/freeode_py/freeode/simlparser.py

I have had a look, thank you. But your app is really big ;-) I'm exploring it superficially; it looks interesting and the implementation is really clean :-)

* custom nodes *

Just a side note you may find interesting, since you yield nodes of custom type, to represent chunks of source code, via parse actions: for my present app, I introduced a modification in my parsing library that allows specifying a class, instead of a func, as match (parse) action. In this case, instead of instantiating a standard node (parse result) from the match result, and then applying an action on it --that will possibly yield a custom node, as you do--, the matching method directly calls the class (with the same args as if it were a standard node), and returns it normally -- so that it will be inserted in the tree like any other node. Hem, seems I'm not clear, so the sketch is:

class KindOfPattern(Pattern):
    ...
    def _match(source):
        # (the source holds its current pos, after matching it holds the range)
        result = <try and match, else MatchFailure exception>
        # result is either a matched source snippet
        # or a sequence of child nodes
        action = self.action
        if isinstance(action, type):  # custom case
            node = action(self, result, source)
            # needed transformation, if any, should be done in __init__
        else:  # standard case
            node = Node(self, result, source)
            # node applies possible action itself, if not None
        return node

[A disadvantage is that I cannot use anymore the builtin match actions that reformat result nodes according to common needs (drop, extract sub result, join, flatten leaves, debug output...). So, I'll have to reintroduce the possibility to specify several actions (which I removed before, because one action can simply call others: "def action2(node): action1(node); ...").]

Every custom node is regarded (from a higher-level node's point of view) as a single / simple node, meaning it becomes a leaf whatever it represents. But it may be useful that custom nodes _act_ like composite ones when they conceptually are, by implementing indexing/iteration, treeview, whatever... Don't know if Paul likes this option and would like to introduce it in pyparsing.

An intermediate solution is to modify the lib itself so as to have only custom nodes ;-) I did this first, by making Node a subtype of my top Data type (so that the door is open to homoiconicity). But this is not very practical, since usually lower-level matches are well dealt with as standard nodes, before they become input data for higher-level ones.

Denis
________________________________
vit esse estrany ☣
spir.wikidot.com
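As it happens, pyparsing already supports something close to this idea: a parse action can be any callable, including a class, in which case the class is instantiated from the matched tokens and the instance becomes the node in the parse result. A minimal sketch (the `Attribute` class and the tiny grammar are our own illustration, not code from the thread):

```python
from pyparsing import Word, alphas, delimitedList

class Attribute:
    """Custom node type built directly by the parser."""
    def __init__(self, tokens):
        self.parts = list(tokens)
    def __repr__(self):
        return "Attribute(%s)" % ".".join(self.parts)

name = Word(alphas + "_")
dotted = delimitedList(name, delim=".")
dotted.setParseAction(Attribute)  # a class used as the parse action

node = dotted.parseString("a.b.c")[0]
print(repr(node))  # Attribute(a.b.c)
```

From a higher-level expression's point of view, `node` is a single opaque token, much like the "custom node becomes a leaf" behavior described above.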
From: Eike W. <eik...@gm...> - 2010-03-30 20:43:20
Hello Denis!

On Tuesday March 30 2010 21:31:58 spir ☣ wrote:
> Hello,
>
> This time, that's me having a left-recursion issue. I'm trying to parse var names that can possibly refer to attributes, like "a.b.c". I can parse it as
> simpleName + ZeroOrMore(extension)
> but then I need to reformat the result recursively, to get the real semantics of: getattr(getattr(container, name), name)
> i.e. in the case of "a.b.c":
> ((a).b).c

It is impossible to do what you want in an elegant way, you must reformat the parse result. I had a similar question some time ago; this is the simplified summary of Paul's answer.

For code that gives you the right operator precedence look at Pyparsing's operatorPrecedence(...) or at the calculator example. It also gives you a parse result that can be relatively easily reformatted.

By the way, if you really implement a programming language with Pyparsing, use operatorPrecedence(...): it can parse nearly all of Python's expressions including function calls. The only exception are the intertwined unary minus and exponentiation operators. You'll need hand crafted code for them. operatorPrecedence(...) has significantly reduced the size of my parser (I have 12 levels of precedence).

If you are interested I can talk you through my still very big parser implementation, which can be found here (look at line 979):
http://tinyurl.com/yc3gyqc

Original URL, but Kmail messes this URL up:
http://bazaar.launchpad.net/~vcs-imports/freeode/trunk/annotate/head:/freeode_py/freeode/simlparser.py

Eike.
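A minimal sketch of what operatorPrecedence(...) provides. In current pyparsing releases the same helper is named infixNotation; the tiny two-level grammar below is our own example:

```python
from pyparsing import Word, alphas, nums, infixNotation, opAssoc

operand = Word(alphas) | Word(nums)

# Levels are listed from highest to lowest precedence,
# so '*' binds tighter than '+'.
expr = infixNotation(operand, [
    ("*", 2, opAssoc.LEFT),
    ("+", 2, opAssoc.LEFT),
])

print(expr.parseString("a + b * 2").asList())
# [['a', '+', ['b', '*', '2']]]
```

The nested grouping in the result is exactly the structure one would otherwise have to rebuild by hand, which is why it reformats so easily.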
From: spir ☣ <den...@gm...> - 2010-03-30 19:32:10
Hello,

This time, that's me having a left-recursion issue. I'm trying to parse var names that can possibly refer to attributes, like "a.b.c". I can parse it as
simpleName + ZeroOrMore(extension)
but then I need to reformat the result recursively, to get the real semantics of: getattr(getattr(container, name), name)
i.e. in the case of "a.b.c":
((a).b).c

In PEG, I cannot find a way to avoid left-recursion:

simpleName : [a-zA-Z_] [a-zA-Z_0-9]*
attribute  : name '.' simpleName
name       : attribute / simpleName

and still get an expression of the recursive pattern directly. The issue is indeed that (unlike the way I wrote it above) the expression of an attribute is not nicely wrapped in delimiters.

[I really look for that because in my case any name ends up mapping to getattr(world, name), i.e. it refers to an attribute of <world>, the equivalent in the language I'm parsing of py globals().]

Is there any workaround?

Thanks for reading,
Denis
________________________________
vit esse estrany ☣
spir.wikidot.com
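The usual workaround is exactly the simpleName + ZeroOrMore(...) form already mentioned, plus a parse action that folds the flat token list to the left. A pyparsing sketch (the names and the tuple-based nesting are our own illustration of the ((a).b).c shape):

```python
from pyparsing import Word, alphas, alphanums, ZeroOrMore, Suppress

simpleName = Word(alphas + "_", alphanums + "_")
name = simpleName + ZeroOrMore(Suppress(".") + simpleName)

def fold_left(tokens):
    # 'a.b.c' -> (('a', 'b'), 'c'), mirroring getattr(getattr(a, 'b'), 'c')
    node = tokens[0]
    for part in tokens[1:]:
        node = (node, part)
    return [node]

name.setParseAction(fold_left)

print(name.parseString("a.b.c")[0])  # (('a', 'b'), 'c')
```

The grammar itself stays right-iterative (no left recursion); only the post-processing restores the left-associative semantics.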
From: Paul M. <pt...@au...> - 2010-03-28 06:41:56
I can see that you are just getting started with pyparsing.

First of all, nums is not intended to be used as a parsing expression the way you have used it. nums is a string defined in pyparsing to make it easy to create Words made up of digits and other characters.

Pyparsing allows you to mix strings and expressions so that you can easily build up parsers using the '+' and '|' operators. For instance, to match a Python comment that starts with a '#' character and goes to the end of the line, you can use:

comment = "#" + restOfLine

The '+' operator in pyparsing will automatically promote strings to a pyparsing Literal, so this is synonymous with:

comment = Literal("#") + restOfLine

but I think that the first version is a little easier to read.

You are using the string 'nums' that is defined by pyparsing, but I don't think the results are as you intended. Your statement:

decimal_digits << (nums | (decimal_digits + nums))

is like saying:

decimal_digits << ("0123456789" | (decimal_digits + "0123456789"))

which will get expanded to:

decimal_digits << (Literal("0123456789") | (decimal_digits + Literal("0123456789")))

Now, while your input string "876875.878" *does* contain numeric digits, it does not contain the exact sequence "0123456789", so that Literal will never match. So then pyparsing proceeds to the second alternative, which is

(decimal_digits + Literal("0123456789"))

So pyparsing recursively tries to match decimal_digits, which takes us back to the original expression, which fails, and so we recurse again... and again and again until we hit the recursion limit.

Let's start by correcting your implementation of a repetition of digits, to match the leading part of your test string. You can't just match the string defined by nums, since that will only match the literal sequence "0123456789". You want to match *any one of* these characters, one at a time. Conveniently, pyparsing includes a helper method named oneOf that will take a list of strings, or a single string of space-separated strings.

I suspect you are trying to follow a BNF definition for a real number, something like:

decimal_digits ::= digit | decimal_digits digit

This is a recursive definition that will recursively match a sequence of digits. To directly translate this to pyparsing would be similar to your attempt, but we'll correct it to use oneOf:

decimal_digits = Forward()
digit = oneOf(list(nums))
decimal_digits << (digit + decimal_digits | digit)

This is a common experience when trying to take a classical BNF definition and convert it directly to pyparsing. Your definition uses recursion to successively match the leading digits of your input string. BNF does not have syntax for repetition, so to define something like a list of elements, one must use:

list_of_items ::= item list_of_items | item

In pyparsing, you can simply define

list_of_items = OneOrMore(item)

So we could replace our complicated decimal_digits definition above with just:

digit = oneOf(list(nums))
decimal_digits = OneOrMore(digit)

But even better would be to use pyparsing's Word class, for which nums was intended in the first place:

decimal_digits = Word(nums)

Word takes one or two strings to specify a sequence of characters. If just one string of possible characters is given, then Word will match as many characters in that set as possible, like decimal_digits. If two strings are given, the first will be used as the set of valid *initial* characters, and the second will be used as the set of valid *body* characters. So you could see something like:

uppers = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
lowers = "abcdefghijklmnopqrstuvwxyz"
capitalized_word = Word(uppers, lowers)

To match your full string, you would need something like this:

real_number = Combine(Optional("-") + Word(nums) + "." + Word(nums))

Please read some of the pyparsing documentation, and check out the examples on the pyparsing wiki.

Welcome to pyparsing!
-- Paul

> -----Original Message-----
> From: elekis [mailto:el...@gm...]
> Sent: Saturday, March 27, 2010 12:30 PM
> To: pyp...@li...
> Subject: [Pyparsing] RuntimeError: maximum recursion depth exceeded
>
> hi,
>
> I have an error that I can't find the solution to.
>
> I have the following recursive rule:
>
> decimal_digits = Forward()
> decimal_digits << (nums | (decimal_digits + nums))
> p = decimal_digits.parseString("876875.878")
>
> but I get a RuntimeError: maximum recursion depth exceeded.
>
> dunno why. any idea?
>
> thanks
>
> a++
>
> --
> http://twoji.deviantart.com/
> http://www.flickr.com/elekis
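Paul's final real_number suggestion, as a runnable snippet (the test values are our own):

```python
from pyparsing import Combine, Optional, Word, nums

# Combine joins the matched pieces into a single token and requires
# them to be adjacent in the input (no intervening whitespace).
real_number = Combine(Optional("-") + Word(nums) + "." + Word(nums))

print(real_number.parseString("876875.878").asList())  # ['876875.878']
print(real_number.parseString("-3.14").asList())       # ['-3.14']
```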
From: elekis <el...@gm...> - 2010-03-27 17:30:36
hi,

I have an error that I can't find the solution to.

I have the following recursive rule:

decimal_digits = Forward()
decimal_digits << (nums | (decimal_digits + nums))
p = decimal_digits.parseString("876875.878")

but I get a RuntimeError: maximum recursion depth exceeded.

dunno why. any idea?

thanks

a++

--
http://twoji.deviantart.com/
http://www.flickr.com/elekis
From: ThanhVu (V. N. <ngu...@gm...> - 2010-03-25 22:07:46
thanks for the explanation -- I think your code works pretty well

VN -

2010/3/25 spir ☣ <den...@gm...>:
> On Wed, 24 Mar 2010 15:57:34 -0600
> "ThanhVu (Vu) Nguyen" <ngu...@gm...> wrote:
>
>> Hi, I tried to generate this simple recursive rule that involves both
>> Forward and operatorPrecedence() and get errors about maximum
>> recursion depth exceeded.
>>
>> Thanks,
>>
>> def getRule_test():
>>     #rule exp = name | num | name[exp] | exp + exp | exp * exp
>>     name = Word(alphas)
>>     num = Word(nums)
>>
>>     exp = Forward()
>>     idx = name + '[' + exp + ']'
>>
>>     arith = operatorPrecedence(
>>         exp, [('*',2,opAssoc.LEFT),
>>               ('+',2,opAssoc.RIGHT)],)
>>
>>     exp << (arith|idx|name|num)  # works ok if take out arith
>>     return exp
>>
>> VN -
>
> Yes, the recursive term "exp" in your format appears on the left side of arith, and thus finally on the left side of itself. This cannot be matched since it launches an infinite recursive loop of calls to exp.match().
>
> More generally, you cannot write a pattern such as:
> p1 : p1 whatever
> But it can always be reformulated into something like:
> p2 : whatever p2
>
> Here, you need to distinguish between the levels of a non-recursive operand (inside arith) and of a whole exp. Operator precedence already involves the inherent recursivity of operations. Exp only needs to be recursive because it appears inside idx. (I guess.)
>
> Something like, maybe (untested!):
>
> #rule exp = name | num | name[exp] | exp + exp | exp * exp
> name = Word(alphas)
> num = Word(nums)
>
> exp = Forward()
> idx = name + '[' + exp + ']'
>
> operand = (idx|name|num)
> arith = operatorPrecedence(
>     operand, [('*',2,opAssoc.LEFT),
>               ('+',2,opAssoc.RIGHT)],)
>
> exp << (arith|operand)
> return exp
>
> Denis
> ________________________________
> vit esse estrany ☣
> spir.wikidot.com
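Denis's restructured grammar (posted untested) does avoid the infinite recursion. Here it is as a runnable sketch, using infixNotation, the current name for operatorPrecedence:

```python
from pyparsing import Word, alphas, nums, Forward, infixNotation, opAssoc

name = Word(alphas)
num = Word(nums)

exp = Forward()
idx = name + "[" + exp + "]"

# The operand level is non-recursive on its left edge: idx starts
# with a name (a terminal), so there is no left recursion.
operand = idx | name | num
arith = infixNotation(operand, [
    ("*", 2, opAssoc.LEFT),
    ("+", 2, opAssoc.RIGHT),
])
exp <<= arith

print(exp.parseString("a[i + 1] * 2").asList())
```

The only recursion left runs through idx, and it is guarded by the literal "[", so the parser always consumes input before re-entering exp.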