I have prepared everything for the release. Jason and James, is there anything else that should go into it?
Would be nice if you can both get the head of the current code and test it a little. I already did
lots of testing and think it's ready now. I might do a little more testing before we actually release.
Can someone look over the README file?
Thanks, I'm okay with the current release. At the moment I'm restructuring the dictionary creator for the name finder and census data. Sorry for the long delay on this part. I'm not worried, since we really need a use for this in the final project before we fully integrate the dictionary into the name finder.
Yeah, sadly we have no free training data for the name finder right now. Since many people have this problem, I guess we should write a parser for the CoNLL03 data.
The CoNLL03 training data contains English and German articles annotated with named entities.
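To make the idea concrete, here is a rough sketch of what such a parser could look like. It assumes the usual CoNLL03 column layout: one token per line, whitespace-separated columns with the token first and the named-entity tag last, a blank line between sentences, and `-DOCSTART-` lines between documents. The class and method names (`Conll03Sketch`, `TaggedToken`, `parse`) are made up for illustration and are not part of OpenNLP.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Conll03Sketch {

    /** Hypothetical holder for one token and its named-entity tag. */
    public static final class TaggedToken {
        public final String token;
        public final String tag;

        public TaggedToken(String token, String tag) {
            this.token = token;
            this.tag = tag;
        }
    }

    /**
     * Groups CoNLL03-style lines into sentences.
     * Assumption: the first column is the token and the last column is the
     * entity tag, which holds for both the English and German column layouts.
     */
    public static List<List<TaggedToken>> parse(List<String> lines) {
        List<List<TaggedToken>> sentences = new ArrayList<List<TaggedToken>>();
        List<TaggedToken> current = new ArrayList<TaggedToken>();

        for (String line : lines) {
            String trimmed = line.trim();

            // Blank lines and document markers both end the current sentence.
            if (trimmed.isEmpty() || trimmed.startsWith("-DOCSTART-")) {
                if (!current.isEmpty()) {
                    sentences.add(current);
                    current = new ArrayList<TaggedToken>();
                }
                continue;
            }

            String[] cols = trimmed.split("\\s+");
            current.add(new TaggedToken(cols[0], cols[cols.length - 1]));
        }

        // Flush the last sentence if the input does not end with a blank line.
        if (!current.isEmpty()) {
            sentences.add(current);
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Illustrative lines in the English column layout: token, POS, chunk, entity tag.
        List<String> sample = Arrays.asList(
            "-DOCSTART- -X- O O",
            "",
            "U.N. NNP I-NP I-ORG",
            "official NN I-NP O",
            "Ekeus NNP I-NP I-PER",
            "",
            "Baghdad NNP I-NP I-LOC");

        for (List<TaggedToken> sentence : parse(sample)) {
            StringBuilder sb = new StringBuilder();
            for (TaggedToken t : sentence) {
                sb.append(t.token).append('/').append(t.tag).append(' ');
            }
            System.out.println(sb.toString().trim());
        }
    }
}
```

Turning the tags into name finder training samples would be a separate step on top of this; the sketch only covers reading the columns.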
I picked up the latest stuff and tried it out. It works fine when embedded in another system and used as a jar file. However, running it from the command line doesn't work so well, and perhaps my inexperience with Maven is the problem. I did "mvn install", and it claims to have built just fine. The bin/opennlp script is looking for an opennlp-tools-*.jar file in the main opennlp dir, but instead I have target/opennlp-tools-1.5.0-SNAPSHOT.jar. If I just try running the jar file directly, I get:
~/devel/opennlp$ java -jar target/opennlp-tools-1.5.0-SNAPSHOT.jar
Exception in thread "main" java.lang.NoClassDefFoundError: opennlp/model/EventStream
Caused by: java.lang.ClassNotFoundException: opennlp.model.EventStream
at java.security.AccessController.doPrivileged(Native Method)
… 1 more
Could not find the main class: opennlp.tools.cmdline.CLI. Program will exit.
Am I doing something wrong?
Try mvn assembly:assembly, and then unpack the binary distribution. The script should work then.
The class paths have changed for opennlp and maxent recently. That gave me the same problems. I use the latest development builds and here is what I use:
set JAVA_CMD=java -classpath .\dist\lib;.\dist\lib\jwnl-1.4_rc3.jar;.\dist\lib\opennlp-maxent-3.0.1-SNAPSHOT.jar;.\dist\lib\opennlp-tools-1.5.0-SNAPSHOT.jar
%JAVA_CMD% opennlp.tools.cmdline.CLI SentenceDetector .\enSentenceDetector.model < test.txt > out1.txt
You also have to make sure the version numbers in the jar file names on the classpath match the jars you actually have.
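One quick way to catch a wrong version number is to check that every classpath entry actually exists on disk. Below is a small sketch of that check; `ClasspathCheck` and `missingEntries` are hypothetical names, and it only tests for existence, not whether the jar contains the right classes.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ClasspathCheck {

    /** Returns the classpath entries that do not exist on disk. */
    public static List<String> missingEntries(String classpath) {
        List<String> missing = new ArrayList<String>();

        // File.pathSeparator is ";" on Windows and ":" on Unix-like systems.
        for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
            if (entry.isEmpty()) {
                continue;
            }
            if (!new File(entry).exists()) {
                missing.add(entry);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Checks the classpath this JVM was started with.
        String classpath = System.getProperty("java.class.path");
        for (String entry : missingEntries(classpath)) {
            System.out.println("missing: " + entry);
        }
    }
}
```

Running it with the same -classpath value as the command above should list any jar whose name or version does not match a file on disk.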
Doing 'mvn assembly:assembly' and then using bin/opennlp in the binary distribution works fine. So, I guess the README should be updated to mention this. Otherwise, looks good to go!
Yeah, that, and Java doesn't yet allow .jar files embedded in .jar files to be used as resources. At least it didn't the last time I checked.
James, we do not embed jars in jars, right? The opennlp jar file just contains the class path in its manifest file.
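For anyone who wants to see what the manifest actually declares, the standard `java.util.jar` API can print it. This is only a sketch: the class name `ManifestClassPath` is made up, and the jar path is passed as an argument. As far as I know, relative Class-Path entries are resolved against the location of the jar itself, which would explain why running the jar straight out of target/ cannot find the maxent classes.

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

public class ManifestClassPath {

    /** Returns the Class-Path attribute of a manifest, or null if there is none. */
    public static String classPathOf(Manifest manifest) {
        if (manifest == null) {
            return null;
        }
        return manifest.getMainAttributes().getValue(Attributes.Name.CLASS_PATH);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ManifestClassPath <path-to-jar>");
            return;
        }

        JarFile jar = new JarFile(args[0]);
        try {
            // getManifest() returns null when the jar has no manifest.
            Manifest manifest = jar.getManifest();
            String mainClass = manifest == null ? null
                : manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
            System.out.println("Main-Class: " + mainClass);
            System.out.println("Class-Path: " + classPathOf(manifest));
        } finally {
            jar.close();
        }
    }
}
```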
OK, it's released now. I will create a small announcement post for our forums.