Easy-to-use and powerful NLP library with an awesome model zoo
Controllable & emotion-expressive zero-shot TTS
Wan2.1: Open and Advanced Large-Scale Video Generative Model
Code for running inference and finetuning with SAM 3 model
EPUB to audiobook converter, optimized for Audiobookshelf
Miso TTS is an 8-billion-parameter, highly emotive text-to-speech model
Official MiniMax Model Context Protocol (MCP) server
Multimodal embedding and reranking models built on Qwen3-VL
Open Source Document Management System for Digital Archives
Large language model and vision-language model based on linear attention
Stable Diffusion web UI
Visual Causal Flow
Paste Markdown and AI responses into Word and Excel instantly
Powerful Android AI agent with tools, automation, and Linux shell
A New Axis of Sparsity for Large Language Models
SOTA discrete acoustic codec models with 40/75 tokens per second
Chinese XLNet pre-trained model
Open-source framework for intelligent speech interaction
Audio foundation model excelling in audio understanding
Concatenate a directory full of files into a single prompt
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
Quick illustration of how one can easily read books together with LLMs
SDK for building interactive UI components over MCP for AI tools
LLM-based agent for general-purpose software engineering tasks
Towards Human-Level Text-to-Speech through Style Diffusion