The open-source data curation platform for LLMs
Simple, Pythonic building blocks to evaluate LLM applications
Stanford NLP Python library for many human languages
MII makes low-latency and high-throughput inference possible
MTEB: Massive Text Embedding Benchmark
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Central interface to connect your LLMs with external data
State-of-the-art diffusion models for image and audio generation
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
High-quality multi-lingual text-to-speech library by MyShell.ai
tiktoken is a fast BPE tokeniser for use with OpenAI's models
A Model Context Protocol (MCP) server
LLM abstractions that aren't obstructions
A modular graph-based Retrieval-Augmented Generation (RAG) system
Han Language Processing
Capable of understanding text, audio, vision, video
Label, clean and enrich text datasets with LLMs
Code for the paper "Language Models are Unsupervised Multitask Learners"
Seamlessly integrate LLMs into scikit-learn
ktrain is a Python library that makes deep learning and AI more accessible
Repo of Qwen2-Audio chat & pretrained large audio language model
Underthesea - Vietnamese NLP Toolkit
Stable Diffusion built into Blender
Qwen2.5-VL is the multimodal large language model series
Framework that is dedicated to making neural data processing