#2 SentenceDetectorME.java improvement

Status: closed-accepted
Owner: nobody
Labels: None
Priority: 5
Updated: 2010-06-11
Created: 2009-08-20
Creator: James Kosin
Private: No

I've attached a patch that may help complete work that appears to have been started.
The patch makes both the -lang and -encoding options required. Enclosing an option in [] in the usage text usually means the option is optional, which is no longer the case here.
The next part of the patch checks that both lang and encoding are non-null before continuing.
The last part actually passes the lang value when calling the train function.

Simple enough.
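
For readers without the patch at hand, the three steps above can be sketched roughly as follows. This is a minimal illustration, not the actual SentenceDetectorME code: the class name TrainerArgs, the parse method, and the sample file names are all hypothetical, and the real trainer's usage text and train signature are not reproduced here.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the trainer's option handling; not the OpenNLP source. */
public class TrainerArgs {
    final String lang;
    final String encoding;
    final String[] rest; // remaining positional arguments, e.g. input and output files

    TrainerArgs(String lang, String encoding, String[] rest) {
        this.lang = lang;
        this.encoding = encoding;
        this.rest = rest;
    }

    /** Parses -lang and -encoding and rejects the command line if either is missing. */
    static TrainerArgs parse(String[] args) {
        String lang = null;     // deliberately no default, as in the patch description
        String encoding = null; // deliberately no default
        int i = 0;
        while (i < args.length && args[i].startsWith("-")) {
            if (args[i].equals("-lang") && i + 1 < args.length) {
                lang = args[i + 1];
                i += 2;
            } else if (args[i].equals("-encoding") && i + 1 < args.length) {
                encoding = args[i + 1];
                i += 2;
            } else {
                throw new IllegalArgumentException("Unknown or incomplete option: " + args[i]);
            }
        }
        // Both options are required, so check for null before continuing.
        if (lang == null || encoding == null) {
            throw new IllegalArgumentException("Both -lang and -encoding are required");
        }
        return new TrainerArgs(lang, encoding, Arrays.copyOfRange(args, i, args.length));
    }

    public static void main(String[] args) {
        // Sample arguments (assumed file names) are used when none are given.
        String[] sample = {"-lang", "en", "-encoding", "UTF-8", "train.txt", "model.bin"};
        TrainerArgs parsed = parse(args.length > 0 ? args : sample);
        // In the real trainer, parsed.lang would be handed on to the train call here.
        System.out.println(parsed.lang + " " + parsed.encoding + " " + Arrays.toString(parsed.rest));
    }
}
```

Changing the two null initialisers to concrete values would make the options optional again.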

Other useful additions might be a way to list the valid encoding names and the supported lang values, e.g. "en", "es", etc.
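
As a rough idea of what such a listing could build on, the standard Java library already exposes the charsets known to the running JVM and the ISO 639 two-letter language codes. The class and method names below (ListOptions, encodingNames, isIsoLanguage) are made up for illustration. Note the assumption: the ISO list is what the JVM knows about, which is not the same as the set of languages the trainer actually supports.

```java
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper showing where lists of valid values could come from. */
public class ListOptions {

    /** Canonical names of all charsets available in this JVM. */
    static Set<String> encodingNames() {
        return Charset.availableCharsets().keySet();
    }

    /** True if the code is one of the two-letter ISO 639 codes known to the JVM. */
    static boolean isIsoLanguage(String code) {
        return Arrays.asList(Locale.getISOLanguages()).contains(code);
    }

    public static void main(String[] args) {
        System.out.println("Encodings: " + encodingNames());
        System.out.println("Languages: " + Arrays.toString(Locale.getISOLanguages()));
    }
}
```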

Discussion

  • James Kosin

    James Kosin - 2009-08-20

    Patch for SentenceDetectorME.java on TRUNK

     
  • Joern Kottmann

    Joern Kottmann - 2009-08-20

    Thanks for the patch. It's applied now.

     
  • James Kosin

    James Kosin - 2009-08-20

    Thanks for taking.
    If you want to make -lang and -encoding optional again, you only have to change the null assignments at the top of the main routine. I didn't want to show bias by picking "en" and "US-ASCII" as the defaults.

     
  • Joern Kottmann

    Joern Kottmann - 2009-08-21

    Usually it's a good idea to use the platform default encoding as the default. Did training a sentence model work for you? We now also have an evaluator to measure the performance of the sentence detector; in case there is no test data, we have built-in support for cross validation.

     
  • James Kosin

    James Kosin - 2009-08-22

    If they are using their native encoding, then I may agree with you.
    However, this parameter really describes the encoding of the input file, which may or may not be in the native format.
    Maybe, if we kept a table with a typical encoding for each supported language, we could look up the default encoding based on the specified language. But we would have to be careful not to override an encoding specified explicitly on the command line.
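
One possible shape for that lookup, combining both suggestions in this thread, is sketched below: an encoding given on the command line always wins, then a per-language default, then the platform default. Everything here is hypothetical, including the class name EncodingResolver and the table contents; the language-to-encoding pairings are placeholders chosen for illustration, not values taken from the project.

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of an encoding fallback chain; not part of the patch. */
public class EncodingResolver {

    // Illustrative table only: these pairings are assumptions, not project data.
    private static final Map<String, String> LANG_DEFAULTS = new HashMap<>();
    static {
        LANG_DEFAULTS.put("en", "US-ASCII");
        LANG_DEFAULTS.put("es", "ISO-8859-1");
    }

    /**
     * Picks the charset for reading the input file.
     * explicitEncoding is the -encoding value, or null if the option was not given.
     */
    static Charset resolve(String lang, String explicitEncoding) {
        if (explicitEncoding != null) {
            // Never override what the user asked for on the command line.
            return Charset.forName(explicitEncoding);
        }
        String byLang = LANG_DEFAULTS.get(lang);
        if (byLang != null) {
            return Charset.forName(byLang);
        }
        // No explicit value and no table entry: fall back to the platform default.
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        System.out.println(resolve("en", "UTF-8")); // explicit value is used
        System.out.println(resolve("en", null));    // table entry is used
        System.out.println(resolve("xx", null));    // platform default, varies by system
    }
}
```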

    The model trained OK for me. I only had the small sample set with the source to test with.

     
  • Joern Kottmann

    Joern Kottmann - 2010-06-11

    Closed it because the patch is already applied.

     
  • Joern Kottmann

    Joern Kottmann - 2010-06-11
    • status: open --> closed-accepted
     
