When we input the following sentence, the system crashes:
"於 1984 年的教育統籌委員會第一號報告書中".
The same error also occurs on the web demo site (http://www.languagetool.org/).
Here is the error message:
Error: java.lang.StringIndexOutOfBoundsException: String index out of range: 18
at java.lang.String.substring(String.java:1907)
at org.languagetool.JLanguageTool.adjustRuleMatchPos(JLanguageTool.java:645)
at org.languagetool.JLanguageTool.checkAnalyzedSentence(JLanguageTool.java:617)
at org.languagetool.JLanguageTool.check(JLanguageTool.java:540)
at org.languagetool.JLanguageTool.check(JLanguageTool.java:496)
at org.languagetool.server.LanguageToolHttpHandler.checkText(LanguageToolHttpHandler.java:238)
at org.languagetool.server.LanguageToolHttpHandler.handle(LanguageToolHttpHandler.java:116)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:77)
at sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:83)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:80)
at sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:668)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:77)
at sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:638)
at sun.net.httpserver.ServerImpl$DefaultExecutor.execute(ServerImpl.java:156)
at sun.net.httpserver.ServerImpl$Dispatcher.handle(ServerImpl.java:424)
at sun.net.httpserver.ServerImpl$Dispatcher.run(ServerImpl.java:389)
at java.lang.Thread.run(Thread.java:722)
Thanks for the report. I think it is a problem with the tokenizer we use. "年" for some reason is analyzed as "始##始年/t". The "/t" is some tag, but the "##" looks wrong. As "始##始年" is longer than the original string, LanguageTool gets confused: positions computed from the token lengths can point past the end of the input text.
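To illustrate what I suspect is happening, here is a minimal sketch. It is not LanguageTool's actual code: the class `TokenOffsetSketch` and the methods `endOffset` and `safePrefix` are made up for this example, and the input is shortened to "1984年". The strings use Unicode escapes (`\u5E74` is "年", `\u59CB` is "始") so the file compiles regardless of source encoding. The idea is that an offset derived from tokenizer output overshoots the original text as soon as one token is longer than the characters it stands for, and `String.substring` then throws.

```java
import java.util.Arrays;
import java.util.List;

final class TokenOffsetSketch {

    // Adds up the token lengths to get the offset just past the last token,
    // as a position derived purely from tokenizer output would be.
    static int endOffset(List<String> tokens) {
        int offset = 0;
        for (String token : tokens) {
            offset += token.length();
        }
        return offset;
    }

    // Clamps the offset into [0, text.length()] so substring cannot throw.
    static String safePrefix(String text, int end) {
        int clamped = Math.max(0, Math.min(end, text.length()));
        return text.substring(0, clamped);
    }

    public static void main(String[] args) {
        String original = "1984\u5E74";
        // What a well-behaved tokenizer would return.
        List<String> expected = Arrays.asList("1984", "\u5E74");
        // What we seem to get instead: the second token is inflated.
        List<String> inflated = Arrays.asList("1984", "\u59CB##\u59CB\u5E74");

        System.out.println("text length:     " + original.length());
        System.out.println("expected offset: " + endOffset(expected));
        System.out.println("inflated offset: " + endOffset(inflated));

        try {
            // The inflated offset lies beyond the end of the original text.
            original.substring(0, endOffset(inflated));
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("substring failed: " + e.getMessage());
        }

        // Clamping avoids the crash, but the reported position is then
        // only approximate.
        System.out.println("clamped prefix:  " + safePrefix(original, endOffset(inflated)));
    }
}
```

Clamping like this only hides the symptom; the tokenizer output would still be wrong, which is why a fix in the tokenizer itself would be preferable.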
You might want to submit a bug directly at the tokenizer project we use: http://code.google.com/p/ictclas4j/.
Thanks for your help!
I'm not sure how active ictclas4j development is, so if you're a developer and can fix this in ictclas4j that would be great.
This is the bug at ictclas4j: http://code.google.com/p/ictclas4j/issues/detail?id=14
Thanks for your reply. That bug was actually reported by our team. We hope it will be fixed soon.
I have added a workaround to LanguageTool. You can test it with the daily snapshot at http://languagetool.org/download/snapshots/LanguageTool-20130810-snapshot.zip
The files in the snapshot seem to differ from the source files provided by LanguageTool, so I think I am misunderstanding something. Could you explain how the snapshot is meant to be used? Some of the files used in the original source are missing from the snapshot. Thanks.
A snapshot is just what you get when you check out the current code from git (https://github.com/languagetool-org/languagetool) and compile it. It helps people who want to try the latest code but do not want to compile it themselves. Did you try the snapshot? Did it work for you or did the bug appear again?
Thanks for your reply. We have solved the problem temporarily by skipping the rule "wa5". =]
Not really fixed, but there's a workaround and it's not our bug, so I'm closing this issue.