Data processing for and with foundation models
ExtractThinker is a Document Intelligence library for LLMs
Extract schema, statistics and entities from datasets
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models
The Classical Language Toolkit
Public opinion analysis system
Han Language Processing
Stanford NLP Python library for many human languages
A natural language interface for computers
Toolkit for conversational AI
A repo for Document AI
Text mining using tidy tools
Underthesea - Vietnamese NLP Toolkit
Superlinked is a Python framework for AI Engineers
A curated list of data mining papers about fraud detection
Training data (data labeling, annotation, workflow) for all data types
Fast and customizable framework for automatic ML model creation
Persian NLP Toolkit
Data and tools for generating and inspecting OLMo pre-training data
Semantic search and workflows for medical/scientific papers
Industrial-strength Natural Language Processing (NLP)
Efficient few-shot learning with Sentence Transformers
Easy-to-use and powerful NLP library with an awesome model zoo
Easy-to-use and high-performance NLP and LLM framework
The most accurate natural language detection library for Python