suffix arrays for phrase extraction

Java suffix array library for phrase discovery. Initially inspired by the classic paper of Yamamoto & Church, with newer ideas from Abouelhoda et al. and Kim et al. Adapted to a large alphabet so that each tokenized word can be treated as a single character of the alphabet.
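To illustrate the large-alphabet idea, here is a minimal sketch that maps words to integer IDs and sorts the suffixes of the resulting ID sequence. The names `WordSuffixArraySketch`, `SymbolTable`, `encode`, and `buildSuffixArray` are hypothetical and are not taken from the library, and the plain comparison sort stands in for whatever sorting the library actually uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WordSuffixArraySketch {
    // Hypothetical symbol table: maps each distinct word to a dense integer ID.
    static final class SymbolTable {
        private final Map<String, Integer> ids = new HashMap<>();
        private final List<String> words = new ArrayList<>();

        int idOf(String word) {
            Integer id = ids.get(word);
            if (id == null) {
                id = words.size();
                ids.put(word, id);
                words.add(word);
            }
            return id;
        }

        String wordOf(int id) {
            return words.get(id);
        }
    }

    // Naive tokenizer and normalizer (an assumption): lowercase, split on whitespace.
    static int[] encode(String text, SymbolTable table) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) return new int[0];
        String[] parts = trimmed.toLowerCase().split("\\s+");
        int[] out = new int[parts.length];
        for (int i = 0; i < parts.length; i++) out[i] = table.idOf(parts[i]);
        return out;
    }

    // Lexicographic comparison of two suffixes; a suffix that is a prefix of
    // the other sorts first.
    static int compareSuffixes(int[] text, int a, int b) {
        while (a < text.length && b < text.length) {
            if (text[a] != text[b]) return Integer.compare(text[a], text[b]);
            a++;
            b++;
        }
        return Integer.compare(text.length - a, text.length - b);
    }

    // Simple comparison sort of all suffix start positions. Fine for a demo,
    // too slow for a real corpus.
    static int[] buildSuffixArray(int[] text) {
        Integer[] order = new Integer[text.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, (x, y) -> compareSuffixes(text, x, y));
        int[] sa = new int[order.length];
        for (int i = 0; i < sa.length; i++) sa[i] = order[i];
        return sa;
    }

    public static void main(String[] args) {
        SymbolTable table = new SymbolTable();
        int[] text = encode("to be or not to be", table);
        int[] sa = buildSuffixArray(text);
        for (int start : sa) {
            StringBuilder sb = new StringBuilder();
            for (int i = start; i < text.length; i++) {
                sb.append(table.wordOf(text[i])).append(' ');
            }
            System.out.println(start + ": " + sb.toString().trim());
        }
    }
}
```

Note that suffixes are ordered by symbol ID (order of first appearance here), not alphabetically. Phrase discovery only needs equal phrases to be adjacent, so any consistent ordering works.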

Features

Adapted to a large alphabet for NLP, so each word is a single symbol
Includes tokenizers, normalizers, and a symbol table
Calculates term frequency and document frequency, plus the full distribution of a phrase across texts
Modular design: users can apply their own statistics to phrases
Includes an Aho-Corasick automaton, also adapted to a large alphabet
Needs: better sorting, though radix quicksort usually works well enough
Needs: a faster symbol table; this is currently the slowest part
Needs: tokenizers and normalizers for more languages
Needs: some links to foma (foma.sourceforge.net/)
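
The term and document frequency feature can be sketched as follows, assuming documents are already encoded as arrays of non-negative word IDs. This is a naive per-phrase lookup (binary search over the sorted suffixes, then counting distinct owning documents), not the library's implementation and not the all-substrings method of Yamamoto & Church. `PhraseFrequencySketch`, `DOC_END`, and `frequencies` are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class PhraseFrequencySketch {
    // Hypothetical separator; real word IDs are assumed to be >= 0.
    static final int DOC_END = -1;

    final int[] text;   // all documents concatenated, each followed by DOC_END
    final int[] docOf;  // docOf[i] = index of the document owning position i
    final int[] sa;     // suffix start positions in sorted order

    PhraseFrequencySketch(int[][] docs) {
        int total = 0;
        for (int[] d : docs) total += d.length + 1;
        int[] t = new int[total];
        int[] owner = new int[total];
        int pos = 0;
        for (int d = 0; d < docs.length; d++) {
            for (int id : docs[d]) { t[pos] = id; owner[pos++] = d; }
            t[pos] = DOC_END; owner[pos++] = d;
        }
        Integer[] order = new Integer[total];
        for (int i = 0; i < total; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> compareSuffixes(t, a, b));
        int[] sorted = new int[total];
        for (int i = 0; i < total; i++) sorted[i] = order[i];
        text = t; docOf = owner; sa = sorted;
    }

    static int compareSuffixes(int[] t, int a, int b) {
        while (a < t.length && b < t.length) {
            if (t[a] != t[b]) return Integer.compare(t[a], t[b]);
            a++; b++;
        }
        return Integer.compare(t.length - a, t.length - b);
    }

    // Compare only the first phrase.length tokens of the suffix with the phrase.
    private int comparePrefix(int start, int[] phrase) {
        for (int i = 0; i < phrase.length; i++) {
            if (start + i >= text.length) return -1; // suffix ended first
            int c = Integer.compare(text[start + i], phrase[i]);
            if (c != 0) return c;
        }
        return 0;
    }

    // First index in sa whose suffix is >= phrase (strict=false) or > phrase (strict=true).
    private int bound(int[] phrase, boolean strict) {
        int lo = 0, hi = sa.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            int c = comparePrefix(sa[mid], phrase);
            if (c < 0 || (strict && c == 0)) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // Returns {term frequency, document frequency} for the phrase.
    int[] frequencies(int[] phrase) {
        int from = bound(phrase, false);
        int to = bound(phrase, true);
        Set<Integer> docs = new HashSet<>();
        for (int i = from; i < to; i++) docs.add(docOf[sa[i]]);
        return new int[]{to - from, docs.size()};
    }

    public static void main(String[] args) {
        // Illustrative IDs: 0=new, 1=york, 2=city, 3=is, 4=big
        int[][] docs = { {0, 1, 2, 0, 1}, {0, 1, 3, 4}, {1, 3, 0} };
        PhraseFrequencySketch index = new PhraseFrequencySketch(docs);
        int[] f = index.frequencies(new int[]{0, 1});
        System.out.println("tf=" + f[0] + " df=" + f[1]); // prints "tf=3 df=2"
    }
}
```

Because the separator never appears in a query phrase, matches cannot span a document boundary. Collecting `docOf` values over the matching range also yields the per-document distribution if counts are kept instead of a set.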

License

Apache Software License

Additional Project Details

Intended Audience

Information Technology

Programming Language

Java

Related Categories

Java Linguistics Software, Java Natural Language Processing (NLP) Tool

Registered

2010-05-02
