A nearly-live implementation of OpenAI's Whisper
Unifying 3D Mesh Generation with Language Models
SoTA open-source TTS
Toolkit for conversational AI
A fast TTS architecture with conditional flow matching
A community-supported supercharged version of paperless
Open source healthcare AI
Knowledge Graph Generation from Any Text
OCR model for complex documents with layout-aware structured outputs
LLM
Scalable data preprocessing and curation toolkit for LLMs
Qwen3-omni is a natively end-to-end, omni-modal LLM
A very simple framework for state-of-the-art NLP
State-of-the-art (SoTA) text-to-video pre-trained model
Open Source Speech Language Model
Synchronized Translation for Videos
Generate blog articles from video or audio
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
Speech-AI-Forge is a project developed around TTS generation models
Qwen-Image is a powerful image generation foundation model
Implementation of Imagen, Google's Text-to-Image Neural Network
Stanford NLP Python library for many human languages
Easily compute CLIP embeddings and build a CLIP retrieval system
Collection of Gemma 3 variants that are trained for performance
Foundation model for image generation