
Sentence Detector bug

  • Seedlet Zhong - 2009-03-16

    Dear all,

    I was trying to train a new English Sentence Detector model with opennlp.tools.sentdetect.SentenceDetectorME.

    My setup is as follows:

    -- Java version: java version "1.6.0_07"
                     Java(TM) SE Runtime Environment (build 1.6.0_07-b06)
                     Java HotSpot(TM) Server VM (build 10.0-b23, mixed mode)

    -- training data set: Penn Treebank WSJ sections 00-18

    -- test data set: Penn Treebank WSJ sections 22-24
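
    For reference, my training code follows the usual maxent pattern, roughly like this (a sketch from memory; class names such as SDEventStream and the exact trainModel signature may differ between versions, so treat it as an outline rather than exact code):

        import java.io.File;
        import java.io.FileReader;

        import opennlp.maxent.GIS;
        import opennlp.maxent.GISModel;
        import opennlp.maxent.PlainTextByLineDataStream;
        import opennlp.maxent.io.SuffixSensitiveGISModelWriter;
        import opennlp.tools.sentdetect.SDEventStream;

        public class TrainSD {
            public static void main(String[] args) throws Exception {
                // args[0]: training text, one sentence per line (WSJ 00-18)
                // args[1]: output model file, e.g. "MySD.bin.gz"
                SDEventStream events = new SDEventStream(
                    new PlainTextByLineDataStream(new FileReader(args[0])));
                // 100 iterations and a cutoff of 5 are the usual maxent defaults.
                GISModel model = GIS.trainModel(events, 100, 5);
                // The suffix-sensitive writer infers the output format from the
                // file name (".bin.gz" gives a gzipped binary model).
                new SuffixSensitiveGISModelWriter(model, new File(args[1])).persist();
            }
        }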

    When I tried to train a new model, the JVM crashed with a fatal error. Here is a snippet from the error report file:

    Current CompileTask:
    C2: 16      opennlp.tools.sentdetect.DefaultSDContextGenerator.previousSpaceIndex(Ljava/lang/CharSequence;I)I (53 bytes)

    ---------------  P R O C E S S  ---------------

    Java Threads: ( => current thread )
      0x000000005b30a400 JavaThread "Low Memory Detector" daemon [_thread_blocked, id=32684, stack(0x0000000040f4c000,0x000000004104d000)]
    =>0x000000005b307800 JavaThread "CompilerThread1" daemon [_thread_in_native, id=32683, stack(0x0000000040e4b000,0x0000000040f4c000)]
      0x000000005b303800 JavaThread "CompilerThread0" daemon [_thread_in_native, id=32682, stack(0x0000000040d4a000,0x0000000040e4b000)]
      0x000000005b301800 JavaThread "Signal Dispatcher" daemon [_thread_blocked, id=32681, stack(0x0000000040c49000,0x0000000040d4a000)]
      0x000000005b2de400 JavaThread "Finalizer" daemon [_thread_blocked, id=32680, stack(0x0000000041902000,0x0000000041a03000)]
      0x000000005b2dcc00 JavaThread "Reference Handler" daemon [_thread_blocked, id=32679, stack(0x0000000041801000,0x0000000041902000)]
      0x000000005b26c000 JavaThread "main" [_thread_in_Java, id=32669, stack(0x0000000040845000,0x0000000040946000)]

    Moreover, when I ran my test set through the EnglishSD.bin.gz model provided by the OpenNLP authors, a similar runtime error occurred. However, it works fine with a 5-sentence test set.
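
    My test harness is essentially the standard usage pattern (again a sketch; class names as I recall them from the 1.4 line, and the input string is only illustrative):

        import java.io.File;

        import opennlp.maxent.io.SuffixSensitiveGISModelReader;
        import opennlp.tools.sentdetect.SentenceDetector;
        import opennlp.tools.sentdetect.SentenceDetectorME;

        public class DetectSentences {
            public static void main(String[] args) throws Exception {
                // The suffix-sensitive reader picks the right model format
                // from the ".bin.gz" file suffix.
                SentenceDetector detector = new SentenceDetectorME(
                    new SuffixSensitiveGISModelReader(
                        new File("EnglishSD.bin.gz")).getModel());
                String[] sentences = detector.sentDetect(
                    "Mr. Smith went to Washington. He arrived on Tuesday.");
                for (String s : sentences) {
                    System.out.println(s);
                }
            }
        }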

    I compared the EnglishSD.bin.gz files from versions 1.3 and 1.4.3; they are exactly the same. So I guess no new model was retrained for version 1.4.3.

    Can anyone help figure out this problem? Thanks in advance.

    Regards,
    Seedlet

    • Thomas Morton - 2009-03-16

      Hi,
      This isn't an exception in the usual sense. It looks like the JVM is complaining about running out of memory. You may need to set the heap size explicitly when you launch it, especially for training.
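
      For example, something along these lines (the figures are only a starting point; size them to your machine and data):

          java -Xms512m -Xmx1024m -cp <your classpath> <your training class> <args>

      -Xmx sets the maximum heap and -Xms the initial heap; training on all of WSJ 00-18 will likely need a good deal more than the default.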

      As for running it: you may be very close to the memory limit once the model is loaded, so with a 5-sentence test set you don't encounter any sentence long enough to push you over.

      Hope this helps...Tom

    • Seedlet Zhong - 2009-03-16

      Hi Tom,

      Thanks for your prompt reply.

      In fact, I tried the same training and test data with OpenNLP version 1.3, and it worked fine. That's why I thought this was a bug in version 1.4.3.

      Regards,
      Seedlet

    • David R. MacIver

      Hi,

      I've only just noticed this thread.

      I ran into this bug yesterday and spent some time tracking down the root cause. I think the snippet you posted from the error log is misleading: the issue is not the low memory detector, but the fact that HotSpot is segfaulting while trying to compile the previousSpaceIndex method. Here's an example of the full error: http://drmaciver.com/hs_err_pid19390.log. There seems to be a strange interaction between the loop optimisation and the synchronisation optimisation, which causes the sentence detector's loop over a StringBuffer to confuse the compiler.
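
      To give a feel for the shape of a source-level workaround (a hypothetical reconstruction, not the actual OpenNLP code; the real patches are in the ticket below): the backwards scan calls charAt on a live StringBuffer, whose accessors are synchronized, so taking an immutable String snapshot first keeps the synchronized call out of the loop the compiler chokes on:

          // Hypothetical sketch of the idea, not the shipped
          // DefaultSDContextGenerator.previousSpaceIndex source.
          private static int previousSpaceIndex(CharSequence sb, int seek) {
              // Snapshot first: String.charAt is unsynchronized, so the hot
              // loop no longer mixes loop and lock optimisations.
              String s = sb.toString();
              seek--;
              while (seek > 0 && !Character.isWhitespace(s.charAt(seek))) {
                  seek--;
              }
              return seek; // index of the preceding whitespace, or 0
          }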

      I've opened a ticket about this with OpenNLP ( https://sourceforge.net/tracker/?func=detail&aid=2793972&group_id=3368&atid=103368 ) and also with Sun, as it's a HotSpot bug. In the meantime you can work around it by patching your version of OpenNLP in one of the ways I describe in the ticket.
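
      If patching isn't convenient, there's also a JVM-level escape hatch (not one of the fixes from the ticket, just the standard HotSpot option for sidestepping compiler crashes): exclude the offending method from JIT compilation, at the cost of leaving it interpreted:

          java -XX:CompileCommand=exclude,opennlp/tools/sentdetect/DefaultSDContextGenerator,previousSpaceIndex ...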

      Regards,
      David

