
Sentence Detector bug

  • Seedlet Zhong - 2009-03-16

    Dear all,

    I was trying to train a new English Sentence Detector model with opennlp.tools.sentdetect.SentenceDetectorME.

    My setup is as follows:

    -- Java version: java version "1.6.0_07"
                     Java(TM) SE Runtime Environment (build 1.6.0_07-b06)
                     Java HotSpot(TM) Server VM (build 10.0-b23, mixed mode)

    -- training data set: Penn Treebank WSJ sections 00-18

    -- test data set: Penn Treebank WSJ sections 22-24
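
    For reference, my training code follows the usual maxent pattern, roughly like this (a sketch from memory; class names such as SDEventStream and the exact trainModel signature may differ between versions, so treat it as an outline rather than exact code):

        import java.io.File;
        import java.io.FileReader;

        import opennlp.maxent.GIS;
        import opennlp.maxent.GISModel;
        import opennlp.maxent.PlainTextByLineDataStream;
        import opennlp.maxent.io.SuffixSensitiveGISModelWriter;
        import opennlp.tools.sentdetect.SDEventStream;

        public class TrainSD {
            public static void main(String[] args) throws Exception {
                // args[0]: training text, one sentence per line (WSJ 00-18)
                // args[1]: output model file, e.g. "MySD.bin.gz"
                SDEventStream events = new SDEventStream(
                    new PlainTextByLineDataStream(new FileReader(args[0])));
                // 100 iterations and a cutoff of 5 are the usual maxent defaults.
                GISModel model = GIS.trainModel(events, 100, 5);
                // The suffix-sensitive writer infers the output format from the
                // file name (".bin.gz" gives a gzipped binary model).
                new SuffixSensitiveGISModelWriter(model, new File(args[1])).persist();
            }
        }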

    When I tried to train a new model, the JVM crashed with a fatal error. Here is a snippet from the error report file:

    Current CompileTask:
    C2: 16      opennlp.tools.sentdetect.DefaultSDContextGenerator.previousSpaceIndex(Ljava/lang/CharSequence;I)I (53 bytes)

    ---------------  P R O C E S S  ---------------

    Java Threads: ( => current thread )
      0x000000005b30a400 JavaThread "Low Memory Detector" daemon [_thread_blocked, id=32684, stack(0x0000000040f4c000,0x000000004104d000)]
    =>0x000000005b307800 JavaThread "CompilerThread1" daemon [_thread_in_native, id=32683, stack(0x0000000040e4b000,0x0000000040f4c000)]
      0x000000005b303800 JavaThread "CompilerThread0" daemon [_thread_in_native, id=32682, stack(0x0000000040d4a000,0x0000000040e4b000)]
      0x000000005b301800 JavaThread "Signal Dispatcher" daemon [_thread_blocked, id=32681, stack(0x0000000040c49000,0x0000000040d4a000)]
      0x000000005b2de400 JavaThread "Finalizer" daemon [_thread_blocked, id=32680, stack(0x0000000041902000,0x0000000041a03000)]
      0x000000005b2dcc00 JavaThread "Reference Handler" daemon [_thread_blocked, id=32679, stack(0x0000000041801000,0x0000000041902000)]
      0x000000005b26c000 JavaThread "main" [_thread_in_Java, id=32669, stack(0x0000000040845000,0x0000000040946000)]

    Moreover, when I ran my test set through the EnglishSD.bin.gz model provided by the OpenNLP authors, a similar runtime error occurred. However, it works fine with a 5-sentence test set.
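
    My test harness is essentially the standard usage pattern (again a sketch; class names as I recall them from the 1.4 line, and the input string is only illustrative):

        import java.io.File;

        import opennlp.maxent.io.SuffixSensitiveGISModelReader;
        import opennlp.tools.sentdetect.SentenceDetector;
        import opennlp.tools.sentdetect.SentenceDetectorME;

        public class DetectSentences {
            public static void main(String[] args) throws Exception {
                // The suffix-sensitive reader picks the right model format
                // from the ".bin.gz" file suffix.
                SentenceDetector detector = new SentenceDetectorME(
                    new SuffixSensitiveGISModelReader(
                        new File("EnglishSD.bin.gz")).getModel());
                String[] sentences = detector.sentDetect(
                    "Mr. Smith went to Washington. He arrived on Tuesday.");
                for (String s : sentences) {
                    System.out.println(s);
                }
            }
        }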

    I compared the EnglishSD.bin.gz files from versions 1.3 and 1.4.3; they are exactly the same. So I guess no new model was retrained for version 1.4.3.

    Can anyone help figure out this problem? Thanks in advance.

    Regards,
    Seedlet

    • Thomas Morton - 2009-03-16

      Hi,
      This isn't an exception in the usual sense. It looks like the JVM is complaining about running out of memory. You may need to set the heap size explicitly when you launch it, especially for training.
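
      For example, something along these lines (the figures are only a starting point; size them to your machine and data):

          java -Xms512m -Xmx1024m -cp <your classpath> <your training class> <args>

      -Xmx sets the maximum heap and -Xms the initial heap; training on all of WSJ 00-18 will likely need a good deal more than the default.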

      As for running it: you may be very close to the memory limit once the model is loaded, so with a 5-sentence test set you don't encounter any sentence long enough to push you over.

      Hope this helps...Tom

    • Seedlet Zhong - 2009-03-16

      Hi Tom,

      Thanks for your prompt reply.

      In fact, I tried the same training and test data with OpenNLP version 1.3, and it worked fine. That's why I thought this was a bug in version 1.4.3.

      Regards,
      Seedlet

    • David R. MacIver

      Hi,

      I've only just noticed this thread.

      I ran into this bug yesterday and spent some time tracking down the root cause. I think the snippet you posted from the error log is misleading: the issue is not the low memory detector, but the fact that HotSpot is segfaulting while trying to compile the previousSpaceIndex method. Here's an example of the full error: http://drmaciver.com/hs_err_pid19390.log. There seems to be a strange interaction between the loop optimisation and the synchronisation optimisation, which causes the sentence detector's loop over a StringBuffer to confuse the compiler.
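
      To give a feel for the shape of a source-level workaround (a hypothetical reconstruction, not the actual OpenNLP code; the real patches are in the ticket below): the backwards scan calls charAt on a live StringBuffer, whose accessors are synchronized, so taking an immutable String snapshot first keeps the synchronized call out of the loop the compiler chokes on:

          // Hypothetical sketch of the idea, not the shipped
          // DefaultSDContextGenerator.previousSpaceIndex source.
          private static int previousSpaceIndex(CharSequence sb, int seek) {
              // Snapshot first: String.charAt is unsynchronized, so the hot
              // loop no longer mixes loop and lock optimisations.
              String s = sb.toString();
              seek--;
              while (seek > 0 && !Character.isWhitespace(s.charAt(seek))) {
                  seek--;
              }
              return seek; // index of the preceding whitespace, or 0
          }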

      I've opened a ticket about this with OpenNLP ( https://sourceforge.net/tracker/?func=detail&aid=2793972&group_id=3368&atid=103368 ) and also with Sun, as it's a HotSpot bug. In the meantime you can work around it by patching your version of OpenNLP in one of the ways I describe in the ticket.
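
      If patching isn't convenient, there's also a JVM-level escape hatch (not one of the fixes from the ticket, just the standard HotSpot option for sidestepping compiler crashes): exclude the offending method from JIT compilation, at the cost of leaving it interpreted:

          java -XX:CompileCommand=exclude,opennlp/tools/sentdetect/DefaultSDContextGenerator,previousSpaceIndex ...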

      Regards,
      David

