#186 Bug found in Chinese

Milestone: development version
Status: closed-fixed
Owner: nobody
Priority: 5
Updated: 2014-03-25
Created: 2013-08-08
Creator: Kason
Private: No

When we input the following sentence, the system crashes:
"於 1984 年的教育統籌委員會第一號報告書中".
The error also occurs on the web demo site (http://www.languagetool.org/).

Here is the error message.
Error: java.lang.StringIndexOutOfBoundsException: String index out of range: 18
at java.lang.String.substring(String.java:1907)
at org.languagetool.JLanguageTool.adjustRuleMatchPos(JLanguageTool.java:645)
at org.languagetool.JLanguageTool.checkAnalyzedSentence(JLanguageTool.java:617)
at org.languagetool.JLanguageTool.check(JLanguageTool.java:540)
at org.languagetool.JLanguageTool.check(JLanguageTool.java:496)
at org.languagetool.server.LanguageToolHttpHandler.checkText(LanguageToolHttpHandler.java:238)
at org.languagetool.server.LanguageToolHttpHandler.handle(LanguageToolHttpHandler.java:116)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:77)
at sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:83)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:80)
at sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:668)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:77)
at sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:638)
at sun.net.httpserver.ServerImpl$DefaultExecutor.execute(ServerImpl.java:156)
at sun.net.httpserver.ServerImpl$Dispatcher.handle(ServerImpl.java:424)
at sun.net.httpserver.ServerImpl$Dispatcher.run(ServerImpl.java:389)
at java.lang.Thread.run(Thread.java:722)

Discussion

  • Daniel Naber

    Daniel Naber - 2013-08-08

    Thanks for the report. I think this is a problem with the tokenizer we use. For some reason, "年" is analyzed as "始##始年/t". The "/t" is some tag, but the "##" looks wrong. Because "始##始年" is longer than the original string, LanguageTool gets confused when it computes match positions.

    You might want to submit a bug directly at the tokenizer project we use: http://code.google.com/p/ictclas4j/.
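
The failure mode described above can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: it is not LanguageTool's code, and it is not the workaround mentioned later in this thread. The class `TokenOffsetDemo`, its methods, and the token list are hypothetical; the only detail taken from the report is the over-long token "始##始年". The sketch shows how an offset summed from token lengths can overshoot the original text and make `String.substring` throw, and how clamping the offset avoids the exception.

```java
import java.util.Arrays;
import java.util.List;

public class TokenOffsetDemo {

    // Sums the token lengths, as if the tokens tiled the original text exactly.
    public static int endOffsetFromTokens(List<String> tokens) {
        int offset = 0;
        for (String token : tokens) {
            offset += token.length();
        }
        return offset;
    }

    // Unchecked slice: throws StringIndexOutOfBoundsException if end > text.length().
    public static String prefixUnchecked(String text, int end) {
        return text.substring(0, end);
    }

    // Defensive slice: clamps end into [0, text.length()] before slicing.
    public static String prefixClamped(String text, int end) {
        int safeEnd = Math.max(0, Math.min(end, text.length()));
        return text.substring(0, safeEnd);
    }

    public static void main(String[] args) {
        String text = "於 1984 年";
        // Hypothetical tokenizer output; the last token mimics the form from the report.
        List<String> tokens = Arrays.asList("於", " ", "1984", " ", "始##始年");

        int end = endOffsetFromTokens(tokens);
        System.out.println("text length = " + text.length() + ", computed end = " + end);

        try {
            prefixUnchecked(text, end);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("unchecked slice failed: " + e.getClass().getSimpleName());
        }

        System.out.println("clamped slice: " + prefixClamped(text, end));
    }
}
```

Clamping only hides the symptom: any positions reported after the bad token would still be shifted, which is why a fix in the tokenizer itself is the cleaner solution.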

     
  • Kason

    Kason - 2013-08-09

    Thanks for your help!

     
  • Daniel Naber

    Daniel Naber - 2013-08-09

    I'm not sure how active ictclas4j development is, so if you're a developer and can fix this in ictclas4j that would be great.

    This bug at ictclas4j: http://code.google.com/p/ictclas4j/issues/detail?id=14

     
    • Kason

      Kason - 2013-08-10

      Thanks for your reply. That bug was actually reported to them by our team. We hope it can be fixed soon.

       
  • Daniel Naber

    Daniel Naber - 2013-08-10

    I have added a workaround to LanguageTool. You can test it with the daily snapshot at http://languagetool.org/download/snapshots/LanguageTool-20130810-snapshot.zip

     
    • Kason

      Kason - 2013-08-26

      The files in the snapshot seem to differ from the source files provided for LanguageTool, so I think I have misunderstood something. Could you explain how the snapshot is meant to be used? Some of the files in the original source are missing from the snapshot. Thanks.

       
  • Daniel Naber

    Daniel Naber - 2013-08-26

    A snapshot is just what you get when you check out the current code from git (https://github.com/languagetool-org/languagetool) and compile it. It helps people who want to try the latest code but do not want to compile it themselves. Did you try the snapshot? Did it work for you or did the bug appear again?

     
    • Kason

      Kason - 2013-08-27

      Thanks for your reply. We have worked around the problem for now by skipping the rule "wa5". =]

       
  • Daniel Naber

    Daniel Naber - 2014-03-25

    Not really fixed, but there's a workaround and it's not our bug so I'm closing this issue.

     
  • Daniel Naber

    Daniel Naber - 2014-03-25
    • status: open --> closed-fixed
     
