#6 lineno function can be confused by tabs - add to docu

Status: closed
Owner: nobody
Priority: 5
Updated: 2007-04-12
Created: 2007-01-08
Creator: Eike Welk
Private: No

In a presumably very common use case, the results of the lineno(...) function are wrong when the parsed string contains tab characters.

A warning about this (and instructions on what to do) should be added to the documentation of the lineno(...) function.
(The same applies to col(...) and line(...))

Long explanation:
-----------------
Line numbers are usually computed after the parse function has returned.

When tab expansion takes place, parsing and line number computation work on different strings. Each tab is replaced by one or more spaces, so the expanded string is usually longer than the input, and the parser works on this enlarged copy. Therefore the index (loc) reported to parse actions or exceptions is not valid for the original input string.
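A minimal, stdlib-only sketch of the mismatch described above. `line_number` and `column` are hypothetical stand-ins that follow the same idea as pyparsing's `lineno` and `col` (count the newlines before `loc`; measure the distance from the previous newline); they are not pyparsing's actual code. `expanded.index(...)` stands in for the `loc` a tab-expanding parser would report.

```python
def line_number(loc: int, text: str) -> int:
    """1-based line of index loc: count the newlines before it (hypothetical lineno stand-in)."""
    return text.count("\n", 0, loc) + 1


def column(loc: int, text: str) -> int:
    """1-based column of index loc: distance from the previous newline (simplified col stand-in)."""
    return loc - text.rfind("\n", 0, loc)


original = "\tx = 1\n\ty = oops\n"
expanded = original.expandtabs()  # what a tab-expanding parser actually scans (tab size 8)

loc = expanded.index("oops")  # the loc a parse action or exception would report

print(len(original), len(expanded))                       # 17 31
print(line_number(loc, expanded), column(loc, expanded))  # 2 13  (consistent: same string)
print(line_number(loc, original), column(loc, original))  # 3 10  (wrong: loc is from the other string)
```

Here `loc` is 26, which lies beyond the end of the 17-character original string, so counting newlines in the original reports line 3 although `oops` is on line 2, and the column is meaningless.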

Workarounds:
------------
There are two workarounds if the input string may contain tabs:
- Turn off tab expansion by calling:
  theParser.parseWithTabs()
- Perform tab expansion on the original string yourself, and use the expanded string both for parsing and for lineno/col/line:
  theString = theString.expandtabs()

Eike.

PS: Why is tab expansion on by default? Are there any obscure consequences of tabs in the input string?

Discussion

  • Paul McGuire

    Paul McGuire - 2007-01-09

    Logged In: YES
    user_id=893320
    Originator: NO

    The lineno and line methods should be unaffected by the presence or absence of tabs, since they work off of newlines (which are not modified at all). As you surmise, col *is* affected by tab expansion. I thought I had something to that effect in the description of col, but if not, I'll be sure to add it.
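To illustrate the distinction made in this reply: as long as `loc` is resolved against the string it was computed on, the line number is the same whether or not tabs were expanded, while the column differs. A stdlib-only sketch; `line_number` and `column` are hypothetical helpers following the idea of `lineno` and `col`, not pyparsing's own code.

```python
def line_number(loc: int, text: str) -> int:
    """1-based line of index loc (hypothetical stand-in for lineno)."""
    return text.count("\n", 0, loc) + 1


def column(loc: int, text: str) -> int:
    """1-based column of index loc (simplified stand-in for col)."""
    return loc - text.rfind("\n", 0, loc)


original = "\tx = 1\n\ty = oops\n"
expanded = original.expandtabs()

# Locate the same token in each string, so each loc matches its own string.
loc_orig = original.index("oops")
loc_exp = expanded.index("oops")

print(line_number(loc_orig, original), line_number(loc_exp, expanded))  # 2 2
print(column(loc_orig, original), column(loc_exp, expanded))            # 6 13
```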

    However what *should be* and what *is* are often different, especially in software. Do you have an actual test case that shows this to be a problem? If so, please post it, because that is most definitely a bug!

    Ah! Rereading your post, I think I see the problem you are having. I suspect you are writing a parse action using one of the simplified method signatures, such as parseAction(loc,tokens), and then using loc to index into the original string. Yes, this will break all three functions you describe. The real workaround here is to use the full parse action method signature parseAction(parseString, loc, tokens), which passes the string being parsed to the parse action, tab-expanded or not. I will definitely add some discussion of this to the documentation.
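A sketch of why the full signature helps, without depending on pyparsing itself: `run_action` is a hypothetical driver that, like a default parse, scans the tab-expanded string and passes that string to the action together with `loc`. The action resolves `loc` against the string it was given, so the line number comes out right. `line_number` is a hypothetical stand-in for `lineno`.

```python
from typing import Callable, List

# Hypothetical action signature modelled on the full form described above:
# (string actually parsed, loc, tokens).
Action = Callable[[str, int, List[str]], None]


def line_number(loc: int, text: str) -> int:
    """1-based line of index loc (hypothetical stand-in for lineno)."""
    return text.count("\n", 0, loc) + 1


def run_action(instring: str, token: str, action: Action) -> None:
    """Hypothetical driver: expand tabs like a default parse, then fire the action."""
    scanned = instring.expandtabs()
    loc = scanned.index(token)
    action(scanned, loc, [token])  # pass the string that loc actually refers to


found = []


def report(s: str, loc: int, toks: List[str]) -> None:
    # Use s, not the caller's original string, so loc and the text agree.
    found.append((toks[0], line_number(loc, s)))


run_action("\tx = 1\n\ty = oops\n", "oops", report)
print(found)  # [('oops', 2)]
```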

    I think tab expansion is there as a legacy of one of the earliest pyparsing applications, in which column location was an important part of the parser, and I didn't want to hassle with tab-to-space conversion on the fly at parse time. If I find it in my notes, I'll post a better explanation.

     
  • Paul McGuire

    Paul McGuire - 2007-01-09
    • labels: --> Documentation
     
  • Eike Welk

    Eike Welk - 2007-01-09

    Logged In: YES
    user_id=13106
    Originator: YES

    Test case? Sorry, I was lazy. I just read the code and (as usual) thought I knew what was going on. :-)

    I'm referring to the following line in pyparsing (1.4.4):
    Line 803 (ParserElement.parseString):
    loc, tokens = self._parse( instring.expandtabs(), 0 )

    expandtabs() returns a new string, and parsing happens on this new string. When the input contains tabs, the new string is longer than the input string.
    The loc arguments of the parse actions are correct for the tab-expanded string but wrong for the input string. However, lineno() is usually computed with the input string, but with loc values from the tab-expanded string.
    (I think).

    ---
    My parse actions are all class methods; they have four arguments: (self, parseString, loc, tokens). Might that confuse pyparsing?
    With these parse actions I store locs. After parsing is finished, the program can detect semantic errors and wants to report them to the user.
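One way to handle this pattern (a sketch, not pyparsing's API): convert `loc` to a line and column inside the method-based parse action, using the string argument it receives, and store those instead of the raw `loc`. `Checker`, `on_name` and `report` are hypothetical names, `line_number`/`column` are hypothetical stand-ins for `lineno`/`col`, and the loop at the bottom merely simulates a parser firing the action on a tab-expanded string. Columns are therefore counted in the tab-expanded text (tab size 8).

```python
from dataclasses import dataclass, field
from typing import List, Tuple


def line_number(loc: int, text: str) -> int:
    """1-based line of index loc (hypothetical stand-in for lineno)."""
    return text.count("\n", 0, loc) + 1


def column(loc: int, text: str) -> int:
    """1-based column of index loc (simplified stand-in for col)."""
    return loc - text.rfind("\n", 0, loc)


@dataclass
class Checker:
    """Hypothetical holder for method-based parse actions."""
    positions: List[Tuple[str, int, int]] = field(default_factory=list)

    def on_name(self, s: str, loc: int, toks: List[str]) -> None:
        # Resolve loc now, while s is the string loc refers to,
        # instead of storing the bare loc and resolving it against another string later.
        self.positions.append((toks[0], line_number(loc, s), column(loc, s)))

    def report(self, bad: str) -> List[str]:
        """Format messages for a name found to be invalid after parsing."""
        return [f"line {ln}, col {c}: unknown name {name}"
                for name, ln, c in self.positions if name == bad]


checker = Checker()
source = "\tx = 1\n\ty = oops\n"
scanned = source.expandtabs()  # stands in for the tab-expanded string a parser sees
for name in ("x", "y", "oops"):
    checker.on_name(scanned, scanned.index(name), [name])

print(checker.report("oops"))  # ['line 2, col 13: unknown name oops']
```

Because the stored positions are already resolved, the later error report no longer needs to know which string the parser actually scanned.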

    ---
    >> I think tab expansion is there as a legacy of one of the
    >> earliest pyparsing applications,
    Ah, interesting.

    Yours, Eike.

    ---
    PS.: Thank you for writing such a useful library, and for maintaining it.

     
  • Paul McGuire

    Paul McGuire - 2007-04-12
    • status: open --> closed
     
  • Paul McGuire

    Paul McGuire - 2007-04-12

    Logged In: YES
    user_id=893320
    Originator: NO

    Documentation expanded for release 1.4.6.

     
