#157 Java API: JLanguageTool is not thread safe

Milestone: 2.0
Status: closed-fixed
Owner: nobody
Labels: None
Priority: 5
Updated: 2013-08-31
Created: 2013-01-17
Private: No

I've just integrated JLanguageTool into a server environment to help the customer get their spelling right. So far JLanguageTool is working great. Good work!

But sometimes I get exceptions in the server log (BufferOverflow, RuntimeException, ...) from LanguageTool. To investigate, I've created a JUnit test driver that reproduces the problems.

The problem is that JLanguageTool is not thread-safe in some places and uses unsynchronized static variables, i.e. running new JLanguageTool(...).check() in parallel causes all kinds of exceptions.

I have not (yet) investigated further. If LanguageTool were easily forkable on GitHub (Issue 3600257 ...), I would just fork it and try to fix that myself.

For now I will use JLanguageTool only within a synchronized(JLanguageTool.class) block.
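
The workaround above amounts to serializing every check behind one JVM-wide lock. A minimal sketch of that idea, assuming a hypothetical `UnsafeChecker` as a stand-in for a checker with static mutable state (it is not the LanguageTool API, and `GlobalLockWorkaround` is an illustrative name as well):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in, NOT the LanguageTool API: every instance writes to
// the same static scratch buffer, so unsynchronized parallel use is unsafe.
class UnsafeChecker {
    private static final StringBuilder SHARED = new StringBuilder();

    String check(String text) {
        SHARED.setLength(0);
        SHARED.append(text.toUpperCase(Locale.ROOT));
        return SHARED.toString();
    }
}

class GlobalLockWorkaround {
    // Serializes all checks by locking on the checker's class object.
    static String checkSerialized(String text) {
        synchronized (UnsafeChecker.class) {
            return new UnsafeChecker().check(text);
        }
    }

    // Calls checkSerialized from several threads and counts wrong results
    // and runtime exceptions.
    static int runInParallel(int threadCount, final int iterations) {
        final AtomicInteger failures = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < threadCount; t++) {
            final String text = "thread-" + t;
            final String expected = text.toUpperCase(Locale.ROOT);
            Thread thread = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    try {
                        if (!expected.equals(checkSerialized(text))) {
                            failures.incrementAndGet();
                        }
                    } catch (RuntimeException e) {
                        failures.incrementAndGet();
                    }
                }
            });
            thread.start();
            threads.add(thread);
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return failures.get();
    }
}
```

The lock makes the shared state safe to use, but only one check can run at a time, so throughput on a busy server is limited to a single thread.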

The test driver is:

import static org.junit.Assert.assertNotNull;

import java.util.ArrayList;
import java.util.List;

import org.junit.Test;
import org.languagetool.JLanguageTool;
import org.languagetool.Language;

public class TestSpellCheckerFailures {
    @Test
    public void testSpellCheckerFailure() throws Exception {
        System.setProperty("javax.xml.parsers.SAXParserFactory",
                "com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl");

        final String txt = " \"Auch wenn Deine kleinen Füße die Erde nie berührten, sind deine Spuren trotzdem da überall.\"\n"
                + " \n" + "";
        final Object syncLock = new Object();
        List<Thread> threads = new ArrayList<>();
        synchronized (syncLock) {
            for (int i = 0; i < 20; i++) {
                Runnable r = new Runnable() {
                    @Override
                    public void run() {
                        /*
                         * Take the lock to ensure that all threads start at the
                         * same moment
                         */
                        synchronized (syncLock) {
                            /*
                             * Just do something to ensure the VM does not
                             * optimize the lock out.
                             */
                            syncLock.notifyAll();
                        }
                        for (int j = 0; j < 100; j++) {
                            try {
                                JLanguageTool tool = new JLanguageTool(Language.GERMANY_GERMAN);
                                assertNotNull(tool.check(txt));
                            } catch (Exception e) {
                                // Set a breakpoint here to see the exceptions
                                throw new RuntimeException(e);
                            }
                        }
                    }
                };
                Thread t = new Thread(r);
                t.start();
                threads.add(t);
            }
        }

        for (Thread t : threads) {
            t.join();
        }
    }
}

Discussion

  • Daniel Naber

    Daniel Naber - 2013-01-19

    The issue might be that all Language classes (like Language.GERMAN etc.) look like constants but are in fact complex objects with state. This might be fixed by the switch to Maven modules I'm currently working on.

     
  • Stefan Lotties

    Stefan Lotties - 2013-05-27

    It's not just about the constants. Things I've found so far:
    - Hunspell.Dictionary is not thread-safe because the dictionary stores a parser state (using these funny little nio buffers)
    - disambiguation/pattern rules store parser states within the rules
    - the IStemmer implementation that is usually used (DictionaryLookup) is not thread-safe

    and probably much more. Very often there is no distinction at all between a rule/parser and its parser state. I just forked the library on GitHub (https://github.com/slotties/languagetool-mirror/commits/master) and am fixing these issues, well, as far as the fixes can stay backward-compatible. I'll file a pull request once I'm done. Don't expect this to be done quickly, though, as I'm only working on it in my free time.
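
The missing separation between a shared object and its parser state can be sketched as follows. Both classes are hypothetical illustrations (not LanguageTool or Hunspell code): `SharedStateDictionary` keeps an nio scratch buffer as a field, so two threads calling it at once can interleave on that buffer, while `StatelessDictionary` allocates the buffer per call and only shares read-only data.

```java
import java.nio.CharBuffer;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical example of the problem: the scratch buffer is a field, so it
// is parser state shared by every caller of this object.
class SharedStateDictionary {
    private final Set<String> words;
    private final CharBuffer scratch = CharBuffer.allocate(64);

    SharedStateDictionary(Set<String> words) {
        this.words = new HashSet<>(words);
    }

    // Not thread-safe: a second thread may clear or fill 'scratch' between
    // these steps, which can corrupt the lookup or overflow the buffer.
    boolean contains(String word) {
        scratch.clear();
        scratch.put(word.toLowerCase(Locale.ROOT));
        scratch.flip();
        return words.contains(scratch.toString());
    }
}

// Same lookup with the parser state moved into the call: the buffer is a
// local variable, and the word set is never modified after construction.
class StatelessDictionary {
    private final Set<String> words;

    StatelessDictionary(Set<String> words) {
        this.words = new HashSet<>(words);
    }

    boolean contains(String word) {
        String lower = word.toLowerCase(Locale.ROOT);
        CharBuffer scratch = CharBuffer.allocate(lower.length());
        scratch.put(lower);
        scratch.flip();
        return words.contains(scratch.toString());
    }
}
```

Moving state into locals (or into a per-call state object) keeps the public method signature unchanged, which is one way such a fix could stay backward-compatible.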

    BTW, nice library (in spite of this ... bug here). We have used RapidSpell so far, but LanguageTool is surely better (and more powerful, well, it's more than just a spell checker).

     
    Last edit: Stefan Lotties 2013-05-27
  • Daniel Naber

    Daniel Naber - 2013-05-27

    Thanks for taking care of this! Please send pull-requests when ready. BTW, our github mirror is not 100% up-to-date, but it should be good enough for now.

     
  • Daniel Naber

    Daniel Naber - 2013-08-31
    • status: open --> closed-fixed
     
  • Daniel Naber

    Daniel Naber - 2013-08-31

    This is done, quoting the change log: "Thread-safety has been improved. The recommended use case is now to create a new JLanguageTool object for each thread, but to create the language only once (e.g. new English()) and use that for all JLanguageTool instances."
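
The usage pattern from the change log can be sketched like this. The classes are hypothetical stand-ins so the example runs without the library: `SharedLanguage` plays the role of the language object that is created once (e.g. new English()), and `PerThreadTool` plays the role of JLanguageTool. Using a `ThreadLocal` to get one tool per worker thread is an assumption of this sketch, not something the change log prescribes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for the language object: immutable, created once
// and shared by all threads.
final class SharedLanguage {
    private final String name;

    SharedLanguage(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }
}

// Hypothetical stand-in for JLanguageTool: has mutable state, so each
// thread gets its own instance.
final class PerThreadTool {
    private final SharedLanguage language;
    private final StringBuilder scratch = new StringBuilder();

    PerThreadTool(SharedLanguage language) {
        this.language = language;
    }

    String check(String text) {
        scratch.setLength(0);
        scratch.append(language.name()).append(':').append(text.trim());
        return scratch.toString();
    }
}

class PerThreadUsage {
    // Checks all texts on a thread pool; results keep the input order.
    static List<String> checkAll(List<String> texts, int threadCount) {
        final SharedLanguage language = new SharedLanguage("de-DE"); // once
        final ThreadLocal<PerThreadTool> tools =
                ThreadLocal.withInitial(() -> new PerThreadTool(language));
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (final String text : texts) {
                Callable<String> task = () -> tools.get().check(text);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Compared with a global lock, this lets checks run in parallel because no mutable state crosses thread boundaries.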

     
