Python parsing module / Bugs / #75 Won't match WordEnd after optional, and absent, element

What you describe is the intended behavior of WordEnd. In your example, 'ABC', there is no word break after 'A' or 'AB'. For WordEnd to match, you would have to parse a string like 'A BC', 'AB C', 'AB(C'. 'AB' has to be followed by a character that is not in the normal set of word characters. What are you trying to accomplish with this usage of WordEnd?

For WordEnd to match, you would have to parse a string like 'A BC', 'AB C', 'AB(C'. 'AB' has to be followed by a character that is not in the normal set of word characters.

Please note that I am giving an argument to WordEnd. It is my understanding that it specifies what characters are allowed in the word:

WordEnd('A')

If you leave out the b part, it works as I would expect, including matching a WordEnd right after the 'A':

#!/usr/bin/env python3
# coding=utf-8

from pyparsing import *

text = 'ABC'

a = Literal('A')
pattern = Combine(a.setResultsName('a') +
                  WordEnd('A'))

pattern.parseString(text)

What are you trying to accomplish with this usage of WordEnd?

I am parsing values for electronic components, such as resistors, capacitors and inductors. As is often the case in electronics, these are written with the unit of measurement left out but with an optional unit prefix still present; for example, a value might read "100 k" instead of "100 kΩ".

Somewhat simplified, my matching pattern therefore looks like this:

A number
Zero or one spaces
One unit prefix (where the empty string counts as a prefix whose corresponding multiplier is 1)
A WordEnd with some suitable set of characters as the argument
An arbitrary continuation of the string
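
A minimal pyparsing sketch of the pattern listed above, for illustration only. The prefix table `MULTIPLIERS`, the number format and the helper `parse_value` are hypothetical, since the thread does not show the real grammar. Two liberties are taken: pyparsing's default whitespace skipping stands in for "zero or one spaces", and the WordEnd check is placed inside the Optional so that it only applies when a prefix was actually consumed.

```python
from pyparsing import Optional, Regex, WordEnd, alphas, oneOf

# Hypothetical prefix table; the empty string stands for "no prefix".
MULTIPLIERS = {'': 1.0, 'k': 1e3, 'M': 1e6, 'm': 1e-3, 'u': 1e-6, 'n': 1e-9, 'p': 1e-12}

number = Regex(r'\d+(\.\d+)?')   # assumed number format: digits with an optional fraction
prefix = oneOf('k M m u n p')    # every non-empty key of MULTIPLIERS

# The word-end check is tied to the prefix: if the letter after the number
# starts a longer word, the whole optional group is skipped.
value = (number.setResultsName('number') +
         Optional(prefix.setResultsName('prefix') + WordEnd(alphas)))


def parse_value(text):
    """Return the numeric value at the start of text, scaled by its prefix."""
    result = value.parseString(text)   # anything after the value is left unparsed
    return float(result['number']) * MULTIPLIERS[result.get('prefix', '')]


print(parse_value('100 k'))
print(parse_value('100 kilo'))
```

Under these assumptions, '100 k' and '100k' should both give 100000.0, while '100 kilo' falls back to a bare 100.0 because the 'k' is not at a word end.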

Jonas Olson - 2014-10-02

Any chance of having this looked at, you think?

- Paul McGuire - 2014-10-02
  
  WordEnd not only looks forward but also looks backward. So WordEnd('A') can
  only succeed if the previous character is an 'A'. In both of your cases,
  the previous character is 'B', so WordEnd will fail.
  
  -- Paul
  
  [bugs:#75] http://sourceforge.net/p/pyparsing/bugs/75 Won't match WordEnd
  after optional, and absent, element
  
  Status: open
  Group: v1.0 (example)
  Created: Sun Sep 07, 2014 04:02 PM UTC by Jonas Olson
  Last Updated: Sun Sep 07, 2014 04:42 PM UTC
  Owner: nobody
  
  In the following example, I expect a match to be found with {'a': 'A', 'b':
  ''}, but no match is found.
  
  #!/usr/bin/env python3
  # coding=utf-8
  
  from pyparsing import *
  
  text = 'ABC'
  
  a = Literal('A')
  b = oneOf(['B',''])
  pattern = Combine(a.setResultsName('a') +
                    b.setResultsName('b') +
                    WordEnd('A'))
  
  pattern.parseString(text)
  
  Using "Optional" instead of "oneOf" yields no match either.
  
  #!/usr/bin/env python3
  # coding=utf-8
  
  from pyparsing import *
  
  text = 'ABC'
  
  a = Literal('A')
  b = Literal('B')
  pattern = Combine(a.setResultsName('a') +
                    Optional(b.setResultsName('b')) +
                    WordEnd('A'))
  
  pattern.parseString(text)
  

Jonas Olson - 2014-10-02

WordEnd not only looks forward but also looks backward. So WordEnd('A') can
only succeed if the previous character is an 'A'. In both of your cases,
the previous character is 'B', so WordEnd will fail.

Actually, in the examples of my original post, I expect the pattern to match just the 'A' of the input string 'ABC'. More precisely, subpattern a would match 'A' and subpattern b would match ''. The next character would thus be 'B', which would constitute a WordEnd.

- Paul McGuire - 2014-10-02
  
  Well, since there is a 'B' there, then subpattern b (whether using oneOf or
  Optional) will match the 'B'. At that point, the next WordEnd('A') will
  fail. There is no backtracking to undo the match of the letter 'B' since it
  was optional to see if maybe the WordEnd will match. I would post a working
  example, but I don't really get why you are including both an Optional('B')
  and a WordEnd('A'), which must fail if the 'B' is present. If you change
  it to WordEnd('AB'), then it makes a little more sense to me.
  
  And I've never really considered using oneOf with a list including an empty
  string, in place of Optional. It is not really how oneOf was intended to be
  used - does it work?
  
  -- Paul
  
  - Jonas Olson - 2014-10-14
    
    There is no backtracking to undo the match of the letter 'B' since it
    was optional to see if maybe the WordEnd will match.
    
    There we have it, possibly. I was working under the assumption that pyparsing guarantees to find a match if there is at least one interpretation of the matching pattern that matches. That's what I am used to from, for example, regular expressions (where A*A matches the string 'A'), and that's what I thought was the standard way of parsing in general.
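
For comparison, the regular-expression behaviour mentioned here can be checked with Python's standard `re` module. The second pattern is only a rough analogue of the ticket's example: the lookarounds standing in for WordEnd('A') are an assumption of this sketch, not something taken from pyparsing.

```python
import re

# A* first takes the only 'A'; the trailing A then fails, so the engine
# gives one repetition back and the overall match succeeds.
backtracked = re.fullmatch(r'A*A', 'A')

# Rough regex analogue of Literal('A') + Optional('B') + WordEnd('A'):
# B? first takes the 'B', the lookbehind fails, and the engine retries with B? empty.
analogue = re.match(r'AB?(?<=A)(?!A)', 'ABC')

print(backtracked.group(), analogue.group())
```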
    
    I would post a working
    example, but I don't really get why you are including both an Optional('B')
    and a WordEnd('A'), which must fail if the 'B' is present. If you change
    it to WordEnd('AB'), then it makes a little more sense to me.
    
    This was just a minimal example I put together for the purpose of reporting what I perceived as a bug. Do you want me to post what I'm actually trying to do? It would be great to get a working example of that.
    
    And I've never really considered using oneOf with a list including an empty
    string, in place of Optional. It is not really how oneOf was intended to be
    used - does it work?
    
    At first I thought it worked, but it seems to break when a very specific criterion is satisfied, namely that exactly one of the list elements is exactly two characters long.
    
    >>> oneOf(['a', 'b', ''])
    Re:('a|b|')
    >>> oneOf(['aa', 'b', ''])
    Re:('[aab]')
    >>> oneOf(['aaa', 'b', ''])
    Re:('aaa|b|')
    >>> oneOf(['aa', 'bb', ''])
    Re:('aa|bb|')
    

Paul McGuire - 2014-11-23

Yes, please post a more complete example of what you are doing. I will have a little more time during the holidays to devote to answering pyparsing questions.
