suffix arrays for phrase extraction

Java suffix array library for phrase discovery. Initially inspired by the classic paper of Yamamoto & Church, with newer ideas from Abouelhoda et al. and Kim et al. Adapted for large alphabets, so that each word token can be treated as a single alphabet symbol.
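To illustrate what treating words as alphabet symbols can look like, here is a minimal sketch of a symbol table that maps each distinct token to an integer. The class name `SymbolTableSketch` and its methods are hypothetical and are not taken from the library; the lowercase-and-split tokenizer is also only an assumption standing in for the project's own tokenizers and normalizers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch, not the library's API: maps each distinct word to an int symbol.
public class SymbolTableSketch {
    private final Map<String, Integer> ids = new HashMap<>();
    private final List<String> words = new ArrayList<>();

    /** Returns the symbol for a word, assigning the next free id on first sight. */
    public int intern(String word) {
        Integer id = ids.get(word);
        if (id == null) {
            id = words.size();
            ids.put(word, id);
            words.add(word);
        }
        return id;
    }

    /** Returns the word that a symbol stands for. */
    public String wordOf(int id) {
        return words.get(id);
    }

    /** Number of distinct symbols seen so far, i.e. the alphabet size. */
    public int size() {
        return words.size();
    }

    /** Assumed normalization: lowercase, split on whitespace, one symbol per token. */
    public int[] encode(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new int[0];
        }
        String[] tokens = trimmed.toLowerCase(Locale.ROOT).split("\\s+");
        int[] out = new int[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            out[i] = intern(tokens[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        SymbolTableSketch table = new SymbolTableSketch();
        int[] encoded = table.encode("the cat sat on the mat");
        System.out.println(Arrays.toString(encoded)); // prints [0, 1, 2, 3, 0, 4]
    }
}
```

The alphabet here grows with the vocabulary rather than being fixed at 256 characters, which is why a suffix array built over such symbols has to cope with a large alphabet.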

Features

Adapted to large alphabets for NLP.
Includes tokenizers, normalizers, and a symbol table.
Calculates term frequency and document frequency, plus the full distribution of each phrase across texts.
Modular design: users can apply their own statistics to phrases.
Includes an Aho-Corasick automaton, also adapted to large alphabets.
Needs: better sorting, though radix quicksort usually works adequately.
Needs: improvements to the symbol table, which is currently the slowest part.
Needs: tokenizers and normalizers for more languages.
Needs: some links to foma (foma.sourceforge.net/).
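As a rough illustration of how term frequency can be read off a suffix array over word symbols, the sketch below sorts all suffixes of an integer-encoded text and counts a phrase by locating the block of suffixes that start with it. Everything here is an assumption for illustration: the class `WordSuffixArraySketch` and its methods are hypothetical, and the plain comparison sort is a stand-in for the radix quicksort the project mentions. Document frequency is not covered.

```java
import java.util.Arrays;

// Hypothetical sketch, not the library's API: a naive suffix array over int symbols.
public class WordSuffixArraySketch {
    private final int[] text;
    private final Integer[] suffixes; // start positions, sorted by the suffix they begin

    public WordSuffixArraySketch(int[] text) {
        this.text = text.clone();
        this.suffixes = new Integer[text.length];
        for (int i = 0; i < text.length; i++) {
            suffixes[i] = i;
        }
        // Simple comparison sort; worst case is slow on highly repetitive input.
        Arrays.sort(suffixes, this::compareSuffixes);
    }

    private int compareSuffixes(int a, int b) {
        while (a < text.length && b < text.length) {
            if (text[a] != text[b]) {
                return Integer.compare(text[a], text[b]);
            }
            a++;
            b++;
        }
        // One suffix ran out first; the shorter suffix sorts before the longer one.
        return Integer.compare(b, a);
    }

    /** Compares the suffix at pos with the phrase on the first phrase.length symbols only. */
    private int comparePrefix(int pos, int[] phrase) {
        for (int k = 0; k < phrase.length; k++) {
            if (pos + k >= text.length) {
                return -1; // suffix is a proper prefix of the phrase, so it sorts before it
            }
            if (text[pos + k] != phrase[k]) {
                return Integer.compare(text[pos + k], phrase[k]);
            }
        }
        return 0;
    }

    /** Binary search: first rank whose suffix compares >= phrase (or > phrase if strict). */
    private int bound(int[] phrase, boolean strict) {
        int lo = 0;
        int hi = suffixes.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            int c = comparePrefix(suffixes[mid], phrase);
            if (c < 0 || (strict && c == 0)) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Number of occurrences of the phrase: the width of its block in the suffix array. */
    public int termFrequency(int[] phrase) {
        if (phrase.length == 0) {
            return 0;
        }
        return bound(phrase, true) - bound(phrase, false);
    }

    public static void main(String[] args) {
        // Assumed encoding of "the cat sat on the cat mat": the=0 cat=1 sat=2 on=3 mat=4
        int[] text = {0, 1, 2, 3, 0, 1, 4};
        WordSuffixArraySketch sa = new WordSuffixArraySketch(text);
        System.out.println(sa.termFrequency(new int[] {0, 1})); // prints 2 ("the cat")
    }
}
```

Because all suffixes sharing a phrase as a prefix sit next to each other after sorting, counting a phrase of any length costs two binary searches rather than a scan of the text.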

License

Apache Software License

Additional Project Details

Intended Audience

Information Technology

Programming Language

Java

Related Categories

Java Linguistics Software, Java Natural Language Processing (NLP) Tool

Registered

2010-05-02
