This is the standalone distribution of the Krovetz stemmer, as used by
both Indri and Galago.
It provides both a C++ and a Java implementation, with a unified set of head word and exception data. It is suitable for inclusion in other projects which require a stemmer, and as a standalone utility.
The example file paths below refer to the 3.4 release. Newer releases are listed at https://sourceforge.net/projects/lemur/files/lemur/
To use the Java implementation, download the most recent binary kstem-x.x.jar from https://sourceforge.net/projects/lemur/files/lemur/KrovetzStemmer-3.4/ and add the jar to your classpath. Create an instance of org.lemurproject.kstem.KrovetzStemmer, and call
public String stem(String term)
in your code. KrovetzStemmer includes a simple main method that allows stemming a single term or an input file of terms, one per line.
For example:
$ java -classpath /path/to/kstem-x.x.jar org.lemurproject.kstem.KrovetzStemmer -w someTerm
someTerm someTermStem
$ java -classpath /path/to/kstem-x.x.jar org.lemurproject.kstem.KrovetzStemmer someFile
term1 term1Stem
term2 term2Stem
...
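Calling stem from your own code might look like the following minimal sketch. The class name StemTermsSketch and the helper stemAll are hypothetical and not part of the distribution. The stemmer is passed in as a Function<String, String> so the sketch compiles without the jar; with kstem-x.x.jar on the classpath, a method reference to KrovetzStemmer.stem can be supplied instead (a no-argument constructor is assumed here, so check the class before relying on it). Skipping blank lines is a choice made for this sketch, not documented behavior of the bundled main method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class StemTermsSketch {

    /**
     * Stems each non-blank term and returns "term stem" strings, similar in
     * shape to the output shown above for an input file of terms.
     */
    public static List<String> stemAll(List<String> terms, Function<String, String> stemmer) {
        List<String> out = new ArrayList<>();
        for (String raw : terms) {
            String term = raw.trim();
            if (term.isEmpty()) {
                continue; // skip blank lines (a choice made for this sketch)
            }
            out.add(term + " " + stemmer.apply(term));
        }
        return out;
    }

    public static void main(String[] args) {
        // With kstem-x.x.jar on the classpath, the placeholder below could be
        // replaced with something like (no-argument constructor assumed):
        //   org.lemurproject.kstem.KrovetzStemmer ks =
        //       new org.lemurproject.kstem.KrovetzStemmer();
        //   Function<String, String> stemmer = ks::stem;
        Function<String, String> stemmer = term -> term; // placeholder: returns the term unchanged

        for (String line : stemAll(Arrays.asList("term1", "", "term2"), stemmer)) {
            System.out.println(line);
        }
    }
}
```

Keeping the stemmer behind a Function also makes it easy to substitute a stub in unit tests of code that depends on stemming.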
You can also download the source and build the jar file with mvn, or include the source in your project.
To use the C++ implementation, download the source distribution from http://sourceforge.net/projects/lemur/files/lemur/KrovetzStemmer-3.4/KrovetzStemmer-3.4.tar.gz/download and use make to compile the kstem application in the src/c++ subdirectory.
The kstem application provides the same functionality as the Java application described above.
The APIs to use in your C++ code are:
char *kstem_stemmer(char *term)
int stem::KrovetzStemmer::kstem_stem_to_buffer(char *term, char *buffer)