
#122 LanguageTool reports error at wrong line number after <...>

Milestone: development version
Status: closed-fixed
Owner: nobody
Labels: None
Priority: 5
Updated: 2014-03-25
Created: 2012-08-08
Private: No

LanguageTool-1.8 (and the latest version in SVN as of today, 2012/08/08, svn r7815)
reports errors at the wrong line numbers in the following case:

$ cat sample-text.txt
Errors are detected hereeee at correct line 1.

< Errrrrors here are nottttt detected.
>
Errors in all text from now on is detected at incorrrrect line (4 instead of 5).

$ java -jar dist/LanguageTool.jar -l en-US sample-text.txt
Expected text language: English (US)
Working on sample-text.txt...
1.) Line 1, column 21, Rule ID: MORFOLOGIK_RULE_EN_US
Message: Possible spelling mistake found
Errors are detected hereeee at correct line 1. Errors in all text from...
^^^^^^^

2.) Line 4, column 47, Rule ID: MORFOLOGIK_RULE_EN_US
Message: Possible spelling mistake found
...rrors in all text from now on is detected at incorrrrect line (4 instead of 5).
^^^^^^^^^^^
Time: 157ms for 3 sentences (19.1 sentences/sec)

Notice the following bugs:
* spelling errors at line 2 are not detected
* spelling errors at line 5 are detected but reported at incorrect line 4 (in fact everything after this line will be reported at the incorrect line).

It seems to be triggered by the < ... > pattern in the sample-text.txt file, which spans multiple lines.

Discussion

  • Dominique Pelle

    Dominique Pelle - 2012-08-08

    sample input file where spelling errors are detected at wrong line number

     
  • Nobody/Anonymous

    I see. I think it's because Main.java reads the input file with getFilteredText(...), which calls StringTools.filterXML(fileContents); to filter out XML tags.

    But my sample file here was not an XML file. It just happened to contain < and > characters, which messes up checking.

    I find it odd that the input is treated as an XML file (filtering out XML tags, or what LT thinks look like XML tags).

    I would prefer if input was not treated as XML.
    I don't see a good reason to do that. Removing this feature seems better.
    If a user wants to filter XML tags, they can do so prior to running LT.

    Or alternatively, provide a command line flag to indicate whether to filter out XML tags or not.
    But in the Unix philosophy of doing one thing only, LT should not care about doing that, to keep it simpler.
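
To illustrate the suspected cause, here is a minimal sketch of how removing a tag-like span that contains a line break shifts every later line number by one. The class `TagFilterDemo`, its methods, and the regular expression are assumptions made for illustration; this is not LanguageTool's actual `StringTools.filterXML` code, only a stand-in that strips anything between `<` and `>`.

```java
import java.util.regex.Pattern;

class TagFilterDemo {
    // Hypothetical stand-in for an XML-tag filter (assumption, not LT's real pattern).
    // The negated class [^<>] also matches newlines, so a "tag" may span lines.
    private static final Pattern TAG = Pattern.compile("<[^<>]+>");

    public static String stripTags(String text) {
        return TAG.matcher(text).replaceAll("");
    }

    // 1-based line number of the first occurrence of needle, or -1 if absent.
    public static int lineOf(String text, String needle) {
        int idx = text.indexOf(needle);
        if (idx < 0) {
            return -1;
        }
        int line = 1;
        for (int i = 0; i < idx; i++) {
            if (text.charAt(i) == '\n') {
                line++;
            }
        }
        return line;
    }

    public static void main(String[] args) {
        // Same layout as the sample text shown in the report.
        String input = "Errors are detected hereeee at correct line 1.\n"
                + "\n"
                + "< Errrrrors here are nottttt detected.\n"
                + ">\n"
                + "Errors in all text from now on is detected at incorrrrect line (4 instead of 5).\n";
        String filtered = stripTags(input);
        System.out.println(lineOf(input, "incorrrrect"));    // prints 5
        System.out.println(lineOf(filtered, "incorrrrect")); // prints 4
    }
}
```

The removed span swallows the line break between `<` and `>`, so the text is never spell-checked and everything after it is counted one line too early, which matches both symptoms in the report.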

     
  • Daniel Naber

    Daniel Naber - 2012-08-12

    Thanks for the report. This is partially fixed now: there's a new option --xmlfilter. If that option is not set, the text is not filtered and thus the positions are okay. For real XML and with --xmlfilter the positions might still be broken (namely when there are linebreaks inside the XML elements).
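
For the remaining case (line breaks inside XML elements with --xmlfilter), one possible approach is to drop the tag but keep any newlines it contained, so later line numbers stay aligned with the original file. The sketch below is only an illustration under that assumption; `LinePreservingFilter` and `stripTagsKeepNewlines` are hypothetical names and this is not how LanguageTool implements the option.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinePreservingFilter {
    // Hypothetical tag pattern (assumption): anything between '<' and '>'.
    private static final Pattern TAG = Pattern.compile("<[^<>]+>");

    // Removes tag-like spans but keeps every '\n' that was inside them.
    public static String stripTagsKeepNewlines(String text) {
        Matcher m = TAG.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Delete every character of the match except line feeds.
            String kept = m.group().replaceAll("[^\\n]", "");
            m.appendReplacement(out, Matcher.quoteReplacement(kept));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "first line\n<tag\nattr='x'>\nlast line\n";
        String filtered = stripTagsKeepNewlines(input);
        // The number of line feeds is unchanged, so line numbers still match.
        System.out.println(filtered.equals("first line\n\n\nlast line\n"));
    }
}
```

This keeps line numbers stable but not column numbers: text following a removed tag on the same line still moves left by the tag's length.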

     
  • Daniel Naber

    Daniel Naber - 2014-03-25
    • status: open --> closed-fixed
     
  • Daniel Naber

    Daniel Naber - 2014-03-25

    Closing, as it's fixed according to the last comment.

     
