A Repo For Document AI
Data and tools for generating and inspecting OLMo pre-training data
ExtractThinker is a Document Intelligence library for LLMs
Training data (data labeling, annotation, workflow) for all data types
The Classical Language Toolkit
Haystack is an open source NLP framework to interact with your data
Obsei is a low code AI powered automation tool
Superlinked is a Python framework for AI Engineers
Stanford NLP Python library for many human languages
Code repo for "WebArena" to build autonomous agents
Easy-to-use and high-performance NLP and LLM framework
Unified embedding model
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models
Persian NLP Toolkit
A full spaCy pipeline and models for scientific/biomedical documents
The no-nonsense RAG chunking library
A library for deep learning end-to-end dialog systems and chatbots
jiant is an NLP toolkit
A toolkit for managing and manipulating text annotations
High-accuracy NLP parser with models for 11 languages
fastNLP: A Modularized and Extensible NLP Framework
NLP made easy
Natural Language Processing Best Practices & Examples
We describe a simple XML format to share text documents and annotations