Superlinked is a Python framework for AI Engineers
Data and tools for generating and inspecting OLMo pre-training data
Stanford NLP Python library for many human languages
Easy-to-use and high-performance NLP and LLM framework
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models
The Classical Language Toolkit
Persian NLP Toolkit
A full spaCy pipeline and models for scientific/biomedical documents
ExtractThinker is a Document Intelligence library for LLMs
The no-nonsense RAG chunking library
Obsei is a low-code, AI-powered automation tool
Local Lambda debug, CodeWhisperer, SAM/CFN syntax, etc.
A Repo For Document AI
Training data (data labeling, annotation, workflow) for all data types
Haystack is an open-source NLP framework to interact with your data
Code repo for WebArena, to build autonomous agents
Resources, corpora, and tools for Chinese natural language processing
Unified embedding model
A toolkit for managing and manipulating text annotations
jiant is an NLP toolkit
PyTorch original implementation of Cross-lingual Language Model
High-accuracy NLP parser with models for 11 languages
Tools to download and clean up Common Crawl data
fastNLP: A Modularized and Extensible NLP Framework
Natural Language Processing Best Practices & Examples