Data and tools for generating and inspecting OLMo pre-training data
Easy-to-use and high-performance NLP and LLM framework
The no-nonsense RAG chunking library
Gives you the power to build your own Wisdom widget
Text mining using tidy tools
Stanford NLP Python library for many human languages
The Classical Language Toolkit
A full spaCy pipeline and models for scientific/biomedical documents
ExtractThinker is a Document Intelligence library for LLMs
Local Lambda debugging, CodeWhisperer, SAM/CFN syntax, etc.
Stanford CoreNLP, a Java suite of core NLP tools
Training data (data labeling, annotation, workflow) for all data types
Persian NLP Toolkit
Superlinked is a Python framework for AI Engineers
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models
Apache OpenNLP
A Repo For Document AI
A library for deep learning end-to-end dialog systems and chatbots
Public opinion analysis system
Modular Suite of NLP Tools
Obsei is a low-code, AI-powered automation tool
Code repo for WebArena, an environment for building autonomous agents
Resources, corpora, and tools for Chinese natural language processing
Unified embedding model
Unicode XML TEI text analysis platform