Data processing for and with foundation models
Toolkit for conversational AI
Data and tools for generating and inspecting OLMo pre-training data
A Repo For Document AI
Industrial-strength Natural Language Processing (NLP)
Hub of ready-to-use datasets for ML models
Superlinked is a Python framework for AI Engineers
State-of-the-Art Natural Language Processing
ExtractThinker is a Document Intelligence library for LLMs
Stanford NLP Python library for many human languages
Semantic search and workflows for medical/scientific papers
Han Language Processing
Extract schema, statistics and entities from datasets
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models
Obsei is a low-code, AI-powered automation tool
The Classical Language Toolkit
Haystack is an open-source NLP framework for interacting with your data
Training data (data labeling, annotation, workflow) for all data types
Underthesea - Vietnamese NLP Toolkit
Persian NLP Toolkit
Fast and customizable framework for automatic ML model creation
The most accurate natural language detection library for Python
Easy-to-use and high-performance NLP and LLM framework
Efficient few-shot learning with Sentence Transformers
Build AI-powered semantic search applications