LanguageTool-1.8 (and the latest version in SVN as of today, 2012/08/08, svn r7815)
reports errors at the wrong line numbers in the following case:
$ cat sample-text.txt
Errors are detected hereeee at correct line 1.
< Errrrrors here are nottttt detected.
>
Errors in all text from now on is detected at incorrrrect line (4 instead of 5).
$ java -jar dist/LanguageTool.jar -l en-US sample-text.txt
Expected text language: English (US)
Working on sample-text.txt...
1.) Line 1, column 21, Rule ID: MORFOLOGIK_RULE_EN_US
Message: Possible spelling mistake found
Errors are detected hereeee at correct line 1. Errors in all text from...
^^^^^^^
2.) Line 4, column 47, Rule ID: MORFOLOGIK_RULE_EN_US
Message: Possible spelling mistake found
...rrors in all text from now on is detected at incorrrrect line (4 instead of 5).
^^^^^^^^^^^
Time: 157ms for 3 sentences (19.1 sentences/sec)
Notice the following bugs:
* spelling errors at line 2 are not detected
* spelling errors at line 5 are detected but reported at incorrect line 4 (in fact everything after this line will be reported at the incorrect line).
It seems to be triggered by the < ... > pattern in the sample-text.txt file, which spans multiple lines.
sample input file where spelling errors are detected at wrong line number
I see. I think it's because Main.java reads the input file with getFilteredText(...), which calls StringTools.filterXML(fileContents); to filter out XML tags.
But my sample file here was not an XML file; it just happened to contain < and > characters, which messes up the checking.
I find it odd that the input is treated as an XML file (filtering out XML tags, or whatever LT thinks looks like XML tags).
I would prefer if input was not treated as XML.
I don't see a good reason to do that. Removing this feature seems better.
If a user wants to filter XML tags, they can do so before running LT.
Or alternatively, provide a command-line flag to indicate whether to filter out XML tags or not.
But in keeping with the Unix philosophy of doing one thing only, LT should not care about doing that, to keep it simpler.
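The line shift described above can be reproduced in isolation with a small sketch. It assumes a filter that deletes anything matching `<[^<>]+>`; the class `TagFilterSketch`, its methods, and the regex are hypothetical stand-ins for illustration and may differ from what StringTools.filterXML actually does.

```java
import java.util.regex.Pattern;

public class TagFilterSketch {
    // Assumed tag pattern (hypothetical, not LanguageTool's actual code).
    // A negated character class also matches newlines, so a "tag" can span lines.
    private static final Pattern TAG = Pattern.compile("<[^<>]+>");

    // Removes everything that looks like a tag, including any newlines inside it.
    static String stripTags(String text) {
        return TAG.matcher(text).replaceAll("");
    }

    // Returns the 1-based line number on which needle first occurs, or -1.
    static int lineOf(String text, String needle) {
        int idx = text.indexOf(needle);
        if (idx < 0) {
            return -1;
        }
        int line = 1;
        for (int i = 0; i < idx; i++) {
            if (text.charAt(i) == '\n') {
                line++;
            }
        }
        return line;
    }

    public static void main(String[] args) {
        // Same shape as the sample file: "<" on line 2, ">" on line 3.
        String input = "first line\n< second line\n>\n\nfifth line\n";
        String filtered = stripTags(input);
        System.out.println(lineOf(input, "fifth line"));    // prints 5
        System.out.println(lineOf(filtered, "fifth line")); // prints 4
    }
}
```

Under this assumption, the text between < and > disappears (so its spelling errors are never checked) and the newline inside the match is dropped, which moves every later line up by one.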
Thanks for the report. This is partially fixed now: there's a new option --xmlfilter. If that option is not set, the text is not filtered and thus the positions are okay. For real XML and with --xmlfilter the positions might still be broken (namely when there are linebreaks inside the XML elements).
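As a rough illustration of the behaviour described above, here is a minimal sketch of an opt-in filter. It is not how LanguageTool implements --xmlfilter: the class `OptInFilterSketch`, the `xmlFilter` parameter, and the regex are assumptions made for this example. It also shows one possible way to keep line numbers stable for tags that contain line breaks, by putting back the newlines that each removed tag contained.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OptInFilterSketch {
    // Assumed tag pattern (hypothetical, not LanguageTool's actual code).
    private static final Pattern TAG = Pattern.compile("<[^<>]+>");

    // Returns the text unchanged unless filtering was requested explicitly.
    // When filtering, each removed tag is replaced by the newlines it contained,
    // so the following text stays on its original line.
    static String prepare(String text, boolean xmlFilter) {
        if (!xmlFilter) {
            return text;
        }
        Matcher m = TAG.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());
            for (char c : m.group().toCharArray()) {
                if (c == '\n') {
                    out.append('\n');
                }
            }
            last = m.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }

    // Counts newline characters in s.
    static long newlines(String s) {
        return s.chars().filter(c -> c == '\n').count();
    }

    public static void main(String[] args) {
        String input = "first line\n< second line\n>\n\nfifth line\n";
        System.out.println(prepare(input, false).equals(input));                  // prints true
        System.out.println(newlines(prepare(input, true)) == newlines(input));    // prints true
    }
}
```

This keeps line numbers intact but not column numbers on lines where a tag was removed; keeping both would need a mapping from filtered positions back to original positions.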
Closing, as it's fixed according to the last comment.