
KrovetzStemmer

David Fisher

Standalone Krovetz Stemmer

This is the standalone distribution of the Krovetz stemmer, as used by
both Indri and Galago.

It provides both a C++ and a Java implementation, which share a single set of headword and exception data. It is suitable both for inclusion in other projects that require a stemmer and as a standalone utility.

The example file paths below refer to the 3.4 release. Newer releases are published at https://sourceforge.net/projects/lemur/files/lemur/

KrovetzStemmer Installation and Usage

Krovetz Stemmer Installation

Java

Download the most recent binary jar, kstem-x.x.jar, from https://sourceforge.net/projects/lemur/files/lemur/KrovetzStemmer-3.4/ and add it to your classpath. Create an instance of org.lemurproject.kstem.KrovetzStemmer and call

public String stem(String term)

in your code. KrovetzStemmer also includes a simple main method that stems either a single term (-w) or an input file of terms, one per line.
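The sketch below shows one way to embed the stemmer in your own code. The helper is hypothetical. It accepts any function with the same shape as the documented String stem(String term) method, so it compiles without the jar. With kstem-x.x.jar on your classpath, you would presumably pass new KrovetzStemmer()::stem and reuse that one instance for every term. The whitespace tokenization, the lowercasing, and the toy suffix-stripping stand-in in main are assumptions made for illustration. The toy stemmer is not the Krovetz algorithm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

class StemQuery {
    // Hypothetical helper: splits text on whitespace, lowercases each token
    // (an assumption of this sketch), and stems it with the supplied function.
    // With the jar on the classpath, one might pass
    // new org.lemurproject.kstem.KrovetzStemmer()::stem as 'stemmer'.
    static List<String> stemTokens(String text, UnaryOperator<String> stemmer) {
        List<String> stems = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) { // blank input yields a single "" token; skip it
                stems.add(stemmer.apply(token.toLowerCase(Locale.ROOT)));
            }
        }
        return stems;
    }

    public static void main(String[] args) {
        // Toy stand-in (NOT Krovetz): drops a trailing "s" so the sketch runs without the jar.
        UnaryOperator<String> toy = t -> t.endsWith("s") ? t.substring(0, t.length() - 1) : t;
        System.out.println(stemTokens("Stemming Matters", toy));
    }
}
```

Passing the stemmer in as a function keeps the helper testable and makes it explicit that a single stemmer instance is shared across all tokens.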

For example:

$ java -classpath /path/to/kstem-x.x.jar org.lemurproject.kstem.KrovetzStemmer -w someTerm
someTerm someTermStem
$ java -classpath /path/to/kstem-x.x.jar org.lemurproject.kstem.KrovetzStemmer someFile
term1 term1Stem
term2 term2Stem
...
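In file mode, the tool prints each input term followed by its stem. The sketch below reproduces that loop in your own code. It assumes one term per line and skips blank lines; how the real tool handles blank lines is not documented here. StemFile and stemLines are hypothetical names, and the toy stand-in stemmer exists only so the sketch runs without the jar.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.StringReader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.function.UnaryOperator;

class StemFile {
    // Hypothetical helper: reads one term per line and writes "term stem" lines,
    // mirroring the output format shown for the kstem main method.
    static void stemLines(Reader in, Writer out, UnaryOperator<String> stemmer) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            String term = line.trim();
            if (term.isEmpty()) {
                continue; // assumption: blank lines are skipped
            }
            out.write(term + " " + stemmer.apply(term) + "\n");
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // Toy stand-in (NOT Krovetz); with the jar you might pass new KrovetzStemmer()::stem.
        UnaryOperator<String> toy =
                t -> t.endsWith("ing") && t.length() > 5 ? t.substring(0, t.length() - 3) : t;
        Writer out = new OutputStreamWriter(System.out, StandardCharsets.UTF_8);
        if (args.length > 0) {
            // Read terms from the file named on the command line.
            try (Reader in = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
                stemLines(in, out, toy);
            }
        } else {
            // No file given: stem a small built-in sample instead.
            stemLines(new StringReader("walking\nrun\n"), out, toy);
        }
    }
}
```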

You can also download the source and build the jar with Maven (mvn), or include the source directly in your project.

C++

Download the source distribution from http://sourceforge.net/projects/lemur/files/lemur/KrovetzStemmer-3.4/KrovetzStemmer-3.4.tar.gz/download and run make in the src/c++ subdirectory to compile the kstem application.

The kstem application provides the same functionality as the Java application described above.

The APIs to use in your C++ code are:

char *kstem_stemmer(char *term)
int stem::KrovetzStemmer::kstem_stem_to_buffer(char *term, char *buffer)
