
#75 Won't match WordEnd after optional, and absent, element

Milestone: v1.0 (example)
Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2014-11-23
Created: 2014-09-07
Creator: Jonas Olson
Private: No

In the following example, I expect a match to be found with {'a': 'A', 'b': ''}, but no match is found.

#!/usr/bin/env python3
#coding=utf-8

from pyparsing import *

text = 'ABC'

a = Literal('A')
b = oneOf(['B',''])
pattern = Combine(a.setResultsName('a') +
                  b.setResultsName('b') +
                  WordEnd('A'))

pattern.parseString(text)

Using "Optional" instead of "oneOf" yields no match either.

#!/usr/bin/env python3
#coding=utf-8

from pyparsing import *

text = 'ABC'

a = Literal('A')
b = Literal('B')
pattern = Combine(a.setResultsName('a') +
                  Optional(b.setResultsName('b')) +
                  WordEnd('A'))

pattern.parseString(text)
1 Attachments


Discussion

  • Paul McGuire

    Paul McGuire - 2014-09-07

    What you describe is the intended behavior of WordEnd. In your example, 'ABC', there is no word break after 'A' or 'AB'. For WordEnd to match, you would have to parse a string like 'A BC', 'AB C', 'AB(C'. 'AB' has to be followed by a character that is not in the normal set of word characters. What are you trying to accomplish with this usage of WordEnd?

     
    • Jonas Olson

      Jonas Olson - 2014-09-07

      > For WordEnd to match, you would have to parse a string like 'A BC', 'AB C', 'AB(C'. 'AB' has to be followed by a character that is not in the normal set of word characters.

      Please note that I am giving an argument to WordEnd. It is my understanding that it specifies what characters are allowed in the word:

      WordEnd('A')
      

      If you leave out the b part, it works as I would expect, including matching a WordEnd right after the 'A':

      #!/usr/bin/env python3
      # coding=utf-8
      
      from pyparsing import *
      
      text = 'ABC'
      
      a = Literal('A')
      pattern = Combine(a.setResultsName('a') +
                        WordEnd('A'))
      
      pattern.parseString(text)
      

      > What are you trying to accomplish with this usage of WordEnd?

      I am parsing values for electronic components, such as resistors, capacitors and inductors. As is often the case in electronics, these are written with the unit of measurement left out, but with an optional unit prefix still present, i.e., it might say "100 k" instead of "100 kΩ".

      Somewhat simplified, my matching pattern therefore looks like this:

      1. A number
      2. Zero or one spaces
      3. One unit prefix (where the empty string counts as a prefix whose corresponding multiplier is 1)
      4. A WordEnd with some suitable set of characters as the argument
      5. An arbitrary continuation of the string
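The five steps above can be sketched roughly as follows. This is only an illustration, not the reporter's actual grammar: the `MULTIPLIERS` table, the `parse_value` helper, and the choice of `alphas` as the word-character set are assumptions made for the example, and pyparsing's default whitespace skipping stands in for "zero or one spaces". The `WordEnd` sits inside the `Optional`, so a prefix letter that turns out to start a longer word is dropped again and the empty prefix is used instead.

```python
from pyparsing import Optional, Regex, WordEnd, alphas, oneOf

# Hypothetical prefix table; the empty string is the "no prefix" case.
MULTIPLIERS = {'': 1, 'k': 1e3, 'M': 1e6, 'm': 1e-3,
               'u': 1e-6, 'n': 1e-9, 'p': 1e-12}

number = Regex(r'\d+(\.\d+)?')
prefix = oneOf([p for p in MULTIPLIERS if p])

# If WordEnd fails after the prefix, the whole Optional falls back to ''.
value = number + Optional(prefix + WordEnd(alphas), default='')

def parse_value(text):
    """Parse a leading component value and apply the prefix multiplier."""
    tokens = value.parseString(text)
    return float(tokens[0]) * MULTIPLIERS[tokens[1]]

print(parse_value('100 k'))       # prefix 'k' is used
print(parse_value('100 metres'))  # 'm' starts a word, so no prefix
print(parse_value('47'))          # no prefix at all
```

Anything after the value is simply left unparsed, which corresponds to the "arbitrary continuation" in step 5.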
       
  • Jonas Olson

    Jonas Olson - 2014-10-02

    Any chance of having this looked at, you think?

     
    • Paul McGuire

      Paul McGuire - 2014-10-02

      WordEnd not only looks forward but also looks backward. So WordEnd('A') can
      only succeed if the previous character is an 'A'. In both of your cases,
      the previous character is 'B', so WordEnd will fail.

      -- Paul


  • Jonas Olson

    Jonas Olson - 2014-10-02

    > WordEnd not only looks forward but also looks backward. So WordEnd('A') can
    > only succeed if the previous character is an 'A'. In both of your cases,
    > the previous character is 'B', so WordEnd will fail.

    Actually, in the examples of my original post, I expect the pattern to match just the 'A' of the input string 'ABC'. More precisely, subpattern a would match 'A' and subpattern b would match ''. The next character would thus be 'B', which would constitute a WordEnd.

     
    • Paul McGuire

      Paul McGuire - 2014-10-02

      Well, since there is a 'B' there, then subpattern b (whether using oneOf or
      Optional) will match the 'B'. At that point, the next WordEnd('A') will
      fail. There is no backtracking to undo the match of the letter 'B' since it
      was optional to see if maybe the WordEnd will match. I would post a working
      example, but I don't really get why you are including both an Optional('B')
      and a WordEnd('A'), which must fail if the 'B' is present. If you change
      it to WordEnd('AB'), then it makes a little more sense to me.

      And I've never really considered using oneOf with a list including an empty
      string, in place of Optional. It is not really how oneOf was intended to be
      used - does it work?

      -- Paul


      • Jonas Olson

        Jonas Olson - 2014-10-14

        > There is no backtracking to undo the match of the letter 'B' since it
        > was optional to see if maybe the WordEnd will match.

        There we have it, possibly. I was working under the assumption that pyparsing guarantees to find a match if there is at least one interpretation of the matching pattern that matches. That's what I am used to from, for example, regular expressions (where A*A matches the string 'A'), and that's what I thought was the standard way of parsing in general.
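The difference in backtracking can be seen side by side. A small sketch comparing the regular expression `A*A` with what looks like the equivalent pyparsing expression; the variable names are only for illustration.

```python
import re

from pyparsing import Literal, ParseException, ZeroOrMore

# The regex engine gives back the 'A' taken by A* so the final A can match.
regex_ok = re.fullmatch(r'A*A', 'A') is not None

# ZeroOrMore consumes the only 'A' and does not give it back,
# so the trailing Literal('A') has nothing left to match.
greedy = ZeroOrMore(Literal('A')) + Literal('A')
try:
    greedy.parseString('A')
    pyparsing_ok = True
except ParseException:
    pyparsing_ok = False

print(regex_ok, pyparsing_ok)
```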

        > I would post a working
        > example, but I don't really get why you are including both an Optional('B')
        > and a WordEnd('A'), which must fail if the 'B' is present. If you change
        > it to WordEnd('AB'), then it makes a little more sense to me.

        This was just a minimal example I put together for the purpose of reporting what I perceived as a bug. Do you want me to post what I'm actually trying to do? It would be great to get a working example of that.

        > And I've never really considered using oneOf with a list including an empty
        > string, in place of Optional. It is not really how oneOf was intended to be
        > used - does it work?

        At first I thought it worked, but it seems to break when a very specific criterion is satisfied, namely that exactly one of the list elements is exactly two characters long.

        >>> oneOf(['a', 'b', ''])
        Re:('a|b|')
        >>> oneOf(['aa', 'b', ''])
        Re:('[aab]')
        >>> oneOf(['aaa', 'b', ''])
        Re:('aaa|b|')
        >>> oneOf(['aa', 'bb', ''])
        Re:('aa|bb|')
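Given the inconsistent results above, a safer way to express "one of these strings, or nothing" may be to keep the empty string out of `oneOf` and wrap it in `Optional` with a default instead. A minimal sketch, assuming the `default` argument of `Optional`; the sample inputs are made up.

```python
from pyparsing import Optional, oneOf

# Only non-empty alternatives go into oneOf; the "nothing" case
# is handled by Optional, which supplies '' as the token.
prefix = Optional(oneOf(['aa', 'b']), default='')

present = prefix.parseString('aa rest').asList()
absent = prefix.parseString('zzz').asList()

print(present, absent)
```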
        
         
  • Paul McGuire

    Paul McGuire - 2014-11-23

    Yes, please post a more complete example of what you are doing. I will have a little more time during the holidays to devote to answering pyparsing questions.

     
