Parser generator to read, process, or translate structured text
Python binding to the Apache Tika™ REST services
Award-winning modern data processing SDK in C++20
Editor with scripting language, security features & system interfaces
Simple XML editor and XSD viewer
Jieba ("stutter") Chinese word segmentation
PDF Library for Developers
Ansj word segmentation
General-Purpose PDF Library for Java and .NET
Detexter is an app designed to extract text from PDF files.
TextBlob is a Python library for processing textual data.