
OpenNLP

Tracker: Patches

5 SentenceDetectorME.java improvement - ID: 2840782
Last Update: Comment added ( jameskosin )

I've added a patch that may help complete what appears to have been started.
The patch makes both the -lang and -encoding options a requirement. Usually,
enclosing an option in [] means the option is optional, but that is no
longer the case for these two.
The next part of the patch checks both lang and encoding for non-null
values before continuing.
The last part actually uses the lang value when calling the train function.

Simple enough.
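
For readers without the attachment, here is a minimal sketch of the kind of check described above. It is not the contents of sentdetect.patch: the class name TrainerArgsSketch, the Options holder, and the error message are all hypothetical, and the call into the train function is left out.

```java
import java.nio.charset.Charset;

public class TrainerArgsSketch {

    /** Holds the two options that are treated as mandatory (hypothetical type). */
    static final class Options {
        final String lang;
        final Charset encoding;

        Options(String lang, Charset encoding) {
            this.lang = lang;
            this.encoding = encoding;
        }
    }

    /** Parses -lang and -encoding and rejects the arguments if either is missing. */
    static Options parse(String[] args) {
        // Both start as null on purpose: no default is picked for the user.
        String lang = null;
        String encoding = null;

        // Stop one short of the end so that args[++i] always exists.
        for (int i = 0; i < args.length - 1; i++) {
            if ("-lang".equals(args[i])) {
                lang = args[++i];
            } else if ("-encoding".equals(args[i])) {
                encoding = args[++i];
            }
        }

        // The non-null check happens before anything else is done.
        if (lang == null || encoding == null) {
            throw new IllegalArgumentException(
                "both -lang and -encoding are required");
        }

        // Charset.forName throws an IllegalArgumentException subclass
        // if the encoding name is illegal or unsupported.
        return new Options(lang, Charset.forName(encoding));
    }

    public static void main(String[] args) {
        try {
            Options options = parse(args);
            System.out.println("lang=" + options.lang
                + " encoding=" + options.encoding.name());
        } catch (IllegalArgumentException e) {
            System.err.println("Error: " + e.getMessage());
        }
    }
}
```

Making the options optional again would only mean replacing the two null initial values with defaults.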

Other useful additions might be a way to list the valid encoding names and
the supported lang values, e.g. "en", "es", etc.
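
As a rough illustration of that idea, the encoding names can be read from the standard java.nio.charset.Charset class. The language list below is only an assumption made up from the two codes mentioned above; the class name ListOptionsSketch and the SUPPORTED_LANGS constant are hypothetical and not part of OpenNLP.

```java
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

public class ListOptionsSketch {

    // Hypothetical list: only the two codes named in the comment above.
    static final List<String> SUPPORTED_LANGS = Arrays.asList("en", "es");

    /** Returns the canonical names of all charsets this JVM supports. */
    static Set<String> encodingNames() {
        // availableCharsets() returns a sorted map keyed by canonical name.
        return Charset.availableCharsets().keySet();
    }

    public static void main(String[] args) {
        System.out.println("Supported lang values: " + SUPPORTED_LANGS);
        System.out.println("Valid encoding names:");
        for (String name : encodingNames()) {
            System.out.println("  " + name);
        }
    }
}
```

The exact list of encodings printed depends on the JVM and platform.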


James Kosin ( jameskosin ) - 2009-08-20 03:44

5

Open

None

Nobody/Anonymous

None

None

Public


Comments ( 5 )




Date: 2009-08-22 04:45
Sender: jameskosin

If they are using their native encoding then I may agree with you.
However, this parameter really describes the encoding of the input file,
which may or may not be in the native format.
Maybe, if we kept a simple table of the usual encoding for each supported
language, we could look up the default encoding based on the specified
language. But we would have to be careful not to override an encoding
specified explicitly on the command line.
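
A minimal sketch of that lookup order, assuming a per-language table existed. The table entries, the class name EncodingDefaultsSketch, and the resolve method are all hypothetical; the final fallback to the platform default is one possible choice, not something the patch does.

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;

public class EncodingDefaultsSketch {

    // Hypothetical table of default encodings per language code.
    static final Map<String, String> DEFAULT_ENCODING = new HashMap<>();
    static {
        DEFAULT_ENCODING.put("en", "US-ASCII");
        DEFAULT_ENCODING.put("es", "ISO-8859-1");
    }

    /**
     * Picks the encoding for the input file. An encoding given on the
     * command line always wins; the language table is consulted only
     * when none was given.
     */
    static Charset resolve(String lang, String cliEncoding) {
        if (cliEncoding != null) {
            return Charset.forName(cliEncoding);
        }
        String byLang = DEFAULT_ENCODING.get(lang);
        if (byLang != null) {
            return Charset.forName(byLang);
        }
        // Unknown language and no explicit encoding: platform default.
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        System.out.println(resolve("es", "UTF-8").name()); // explicit value wins
        System.out.println(resolve("es", null).name());    // looked up by language
    }
}
```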

The model trained OK for me. I only had the small sample set with the
source to test with.



Date: 2009-08-21 10:27
Sender: joernkottmann

Usually it's a good idea to use the platform default encoding as the default.
Did the training of a sentence model work for you? We now also have an
evaluator to measure the performance of the sentence detector. In case
there is no test data, we have built-in support for cross validation.


Date: 2009-08-20 23:09
Sender: jameskosin

Thanks for taking the patch.
If you want to make -lang and -encoding optional again, you only have
to change the null assignments at the top of the main routine. I didn't want
to arbitrarily pick "en" and "US-ASCII" as the defaults.



Date: 2009-08-20 13:53
Sender: joernkottmann

Thanks for the patch. It's applied now.


Date: 2009-08-20 13:47
Sender: joernkottmann

Thanks for the patch. It's applied now.






Attached File ( 1 )

Filename          Description
sentdetect.patch  Patch for SentenceDetectorME.java on TRUNK

Change ( 1 )

Field Old Value Date By
File Added 339854: sentdetect.patch 2009-08-20 03:44 jameskosin