#59 line & column numbers incorrect when LT reads from stdin

1.0
closed-fixed
None
5
2012-06-24
2010-08-28
Daniel Naber
No

Copying Dominique Pellé's report from the mailing list:

Example:

================================================
# Sample file with errors at lines 0 and 2 (line 1 is empty).
$ cat test.txt
This is a test of of language tool.

This is is a test of language tool.

# When reading from a file, the line numbers (fromy and toy) look correct:
$ java -jar JLanguageTool/dist/LanguageTool.jar --api test.txt
<?xml version="1.0" encoding="UTF-8"?>
<matches>
<error fromy="0" fromx="15" toy="0" tox="21" ruleId="WORD_REPEAT_RULE"
msg="Possible typo: you repeated a word" replacements="of"
context="This is a test of of language tool. This is is a test of
languag..." contextoffset="15" errorlength="5"/>
<error fromy="2" fromx="5" toy="2" tox="10" ruleId="WORD_REPEAT_RULE"
msg="Possible typo: you repeated a word" replacements="is"
context="This is a test of of language tool. This is is a test of
language tool. " contextoffset="42" errorlength="5"/>
</matches>
<!--
Time: 105ms for 2 sentences (19.0 sentences/sec)
-->

# Now when reading the same file test.txt from stdin...
$ java -jar JLanguageTool/dist/LanguageTool.jar --api - < test.txt
<?xml version="1.0" encoding="UTF-8"?>
<matches>
<error fromy="1" fromx="15" toy="1" tox="21" ruleId="WORD_REPEAT_RULE"
msg="Possible typo: you repeated a word" replacements="of"
context="This is a test of of language tool. " contextoffset="15"
errorlength="5"/>
<error fromy="2" fromx="5" toy="2" tox="11" ruleId="WORD_REPEAT_RULE"
msg="Possible typo: you repeated a word" replacements="is"
context="This is is a test of language tool. " contextoffset="5"
errorlength="5"/>
<!--
Time: 111ms for 3 sentences (27.0 sentences/sec)
-->
================================================

Notice that with stdin the fromy and toy fields of the first error are
incorrect (1 instead of 0). The tox of the second error is also incorrect
(11 instead of 10), and its contextoffset differs (5 instead of 42).

Discussion

  • Daniel Naber
    2010-08-28

    Line numbers should be fixed in 1.0.1, but column numbers are sometimes still wrong.

     
  • Daniel Naber
    2012-03-25

    • summary: line & column numbers incorrect when languagetool reads fro --> line & column numbers incorrect when LT reads from stdin
     
  • > Line numbers should be fixed in 1.0.1, but column numbers are sometimes still wrong.

    Column numbers are still wrong in LanguageTool-1.8.

    The following two lines are enough to reproduce the bug:

    $ (echo "An test"; echo "An test") | java -jar ~/sb/languagetool/dist/LanguageTool.jar -l en --api
    <?xml version="1.0" encoding="UTF-8"?>
    <matches>
    <error fromy="1" fromx="0" toy="1" tox="3" ruleId="EN_A_VS_AN" msg="Use 'A' instead of 'An' if the following word doesn't start with a vowel sound, e.g. 'a sentence', 'a university'" replacements="A" context="An test An test " contextoffset="0" errorlength="2"/>
    <error fromy="1" fromx="0" toy="2" tox="15" ruleId="PHRASE_REPETITION" subId="1" msg="This phrase is duplicated. You should probably leave only 'An test'." replacements="An test" context="An test An test " contextoffset="0" errorlength="15"/>
    <error fromy="2" fromx="0" toy="2" tox="2" ruleId="EN_A_VS_AN" msg="Use 'A' instead of 'An' if the following word doesn't start with a vowel sound, e.g. 'a sentence', 'a university'" replacements="A" context="An test An test " contextoffset="8" errorlength="2"/>
    <!--
    Time: 107ms for 1 sentences (9.3 sentences/sec)
    -->

    The two lines given to LT are identical.
    Yet LT reports different end columns for them in the two EN_A_VS_AN errors (tox="3" and then tox="2").

     
  • Here is another case that gave a wrong column number:

    $ (echo "This is"; echo "is an error.") | java -jar LanguageTool.jar -l en --api

    It gives:

    fromy="1" fromx="5" toy="2" tox="10"

    I fixed this one in this SVN checkin:

    ==========
    r7389 | dominikoeo | 2012-06-17 21:06:34 +0200 (Sun, 17 Jun 2012) | 3 lines

    - bug #3054895: the column reported by LanguageTool
    was sometimes wrong when error span the new line.
    ==========

    However, this case is still wrong:

    $ (echo "An test"; echo "An test") | java -jar ~/sb/languagetool/dist/LanguageTool.jar -l en --api -d PHRASE_REPETITION
    <?xml version="1.0" encoding="UTF-8"?>
    <matches>
    <error fromy="1" fromx="0" toy="1" tox="3" ruleId="EN_A_VS_AN" msg="Use 'A' instead of 'An' if the following word doesn't start with a vowel sound, e.g. 'a sentence', 'a university'" replacements="A" context="An test An test " contextoffset="0" errorlength="2"/>
    <error fromy="2" fromx="0" toy="2" tox="2" ruleId="EN_A_VS_AN" msg="Use 'A' instead of 'An' if the following word doesn't start with a vowel sound, e.g. 'a sentence', 'a university'" replacements="A" context="An test An test " contextoffset="8" errorlength="2"/>
    <!--
    Time: 99ms for 1 sentences (10.1 sentences/sec)
    -->

    Notice that the first error has tox="3" and the second error has tox="2".
    Both should have tox="2".
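For illustration, here is a minimal Python sketch of one way to map a match span to line and column numbers so that identical lines get identical columns, and so that a match crossing a newline gets an end column relative to its own line. This is not LanguageTool's code. The helpers `line_starts` and `span_to_positions` are hypothetical, lines and columns are zero-based, and treating tox as an exclusive end column is an assumption.

```python
from bisect import bisect_right

def line_starts(text):
    """Offsets at which each line of text begins."""
    starts = [0]
    for i, ch in enumerate(text):
        if ch == "\n":
            starts.append(i + 1)
    return starts

def span_to_positions(text, start, end):
    """Map the half-open span [start, end) to (fromy, fromx, toy, tox)."""
    starts = line_starts(text)
    fromy = bisect_right(starts, start) - 1  # last line start <= start
    toy = bisect_right(starts, end) - 1      # last line start <= end
    return fromy, start - starts[fromy], toy, end - starts[toy]

# Two identical lines: the "An" on each line gets the same columns.
text = "An test\nAn test\n"
print(span_to_positions(text, 0, 2))   # (0, 0, 0, 2)
print(span_to_positions(text, 8, 10))  # (1, 0, 1, 2)

# A match that spans the newline: "is\nis" covers offsets 5 to 10.
text = "This is\nis an error.\n"
print(span_to_positions(text, 5, 10))  # (0, 5, 1, 2)
```

Each column is computed against the start of the line that contains the offset, so the result does not depend on how much text precedes that line.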

     
  • In 1.8-dev I fixed some parts of this; now you get:

    <error fromy="0" fromx="15" toy="0" tox="20" ruleId="ENGLISH_WORD_REPEAT_RULE" msg="Possible typo: you repeated a word" replacements="of" context="This is a test of of language tool. This is is a test of languag..." contextoffset="15" errorlength="5"/>
    <error fromy="2" fromx="5" toy="2" tox="10" ruleId="ENGLISH_WORD_REPEAT_RULE" msg="Possible typo: you repeated a word" replacements="is" context="This is a test of of language tool. This is is a test of language tool. " contextoffset="42" errorlength="5"/>

    and

    <error fromy="0" fromx="15" toy="0" tox="20" ruleId="ENGLISH_WORD_REPEAT_RULE" msg="Possible typo: you repeated a word" replacements="of" context="This is a test of of language tool. " contextoffset="15" errorlength="5"/>
    <error fromy="2" fromx="5" toy="2" tox="10" ruleId="ENGLISH_WORD_REPEAT_RULE" msg="Possible typo: you repeated a word" replacements="is" context="This is is a test of language tool. " contextoffset="5" errorlength="5"/>

    Note the tox value: it is now consistent. As far as I can see, fromy and toy are also correct, but there are outstanding problems with contextoffset and the missing </matches> element.
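As a cross-check of the values above, the following Python sketch derives zero-based positions directly from character offsets in the sample text. It is only an illustration: the regular expression is a toy stand-in for a word-repeat rule, `offset_to_position` is a hypothetical helper, and tox is assumed to be the column just past the end of the match.

```python
import re

def offset_to_position(text, offset):
    """Return the zero-based (line, column) of a character offset in text."""
    line = text.count("\n", 0, offset)
    line_start = text.rfind("\n", 0, offset) + 1  # 0 when no newline precedes
    return line, offset - line_start

text = "This is a test of of language tool.\n\nThis is is a test of language tool.\n"

# Toy stand-in for a word-repeat rule; not LanguageTool's implementation.
for match in re.finditer(r"\b(\w+) \1\b", text):
    fromy, fromx = offset_to_position(text, match.start())
    toy, tox = offset_to_position(text, match.end())
    print(f'fromy="{fromy}" fromx="{fromx}" toy="{toy}" tox="{tox}"')
# fromy="0" fromx="15" toy="0" tox="20"
# fromy="2" fromx="5" toy="2" tox="10"
```

Under these assumptions the sketch prints the same fromy, fromx, toy and tox values as the 1.8-dev output quoted above.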

     
    • assigned_to: nobody --> milek_pl
     
  • When I look again, the contextoffset is fine (the context is different in both cases, so it's OK). It seems the bug is fixed completely. Dominique, could you check?

     
    • status: open --> open-fixed
     
  • > Dominique,
    > could you check?

    At least the two examples I gave are now fixed. Doing further tests... I don't see anything wrong anymore.
    I assume this bug can be marked as resolved now. Changing the resolution to fixed.

     
    • status: open-fixed --> closed-fixed
     