Fast Stable Diffusion on CPU and AI PC
AI Fully Automated Short Video Engine
Wan2.1: Open and Advanced Large-Scale Video Generative Model
1 minute of voice data can also be used to train a good TTS model
Open-source AI agent framework
Faster Whisper transcription with CTranslate2
Automatic Speech Recognition with Word-level Timestamps
Code for running inference and finetuning with SAM 3 model
Advanced language and coding AI model
Official inference repo for FLUX.2 models
Open-source personal AI Assistant for Linux, Windows and Mac
A lightweight audio-to-MIDI converter with pitch bend detection
State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX
A command-line productivity tool powered by AI large language models
High-Quality Voice Cloning TTS for 600+ Languages
Advanced LLM-powered brute-force tool combining AI intelligence
Text and image to video generation: CogVideoX and CogVideo
Tokenizer-Free TTS for Multilingual Speech Generation
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
A set of ready-to-use Agent Skills for research, science, engineering
Qwen3-TTS is an open-source series of TTS models
High-Resolution 3D Assets Generation with Large Scale Diffusion Models
Reference PyTorch implementation and models for DINOv3
Python bindings for llama.cpp
A Lightweight Face Recognition and Facial Attribute Analysis