Stanford NLP Python library for many human languages
Data and tools for generating and inspecting OLMo pre-training data
Extract schema, statistics and entities from datasets
The Classical Language Toolkit
The no-nonsense RAG chunking library
Superlinked is a Python framework for AI Engineers
A Repo For Document AI
ExtractThinker is a Document Intelligence library for LLMs
A full spaCy pipeline and models for scientific/biomedical documents
Persian NLP Toolkit
Easy-to-use and high-performance NLP and LLM framework
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models
Training data (data labeling, annotation, workflow) for all data types
Local Lambda debug, CodeWhisperer, SAM/CFN syntax, etc.
Obsei is a low code AI powered automation tool
A library for deep learning end-to-end dialog systems and chatbots
Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"
Resources, corpora, and tools for Chinese natural language processing
Unified embedding model
Common Resource Grep
Compose Software Without Writing Any Programming Code
A toolkit for managing and manipulating text annotations
Aseryla code repositories
jiant is an NLP toolkit
Original PyTorch implementation of the Cross-lingual Language Model