Qwen2.5-VL is a series of multimodal large language models
Uses AI models to automatically generate commentary and edit videos
A list of free LLM inference resources accessible via API
Open-source multi-speaker long-form text-to-speech model
Supercharge Your LLM with the Fastest KV Cache Layer
Open Source Speech Language Model
Multimodal embedding and reranking models built on Qwen3-VL
Multimodal-Driven Architecture for Customized Video Generation
AutoGluon: AutoML for Image, Text, and Tabular Data
Build Vision Agents quickly with any model or video provider
Implementation of Video Diffusion Models
An open source implementation of CLIP
OCR model for complex documents with layout-aware structured outputs
Cloud-native open source data warehouse for analytics and AI queries
Long-form streaming TTS system for multi-speaker dialogue generation
Foundation model for image generation
Automated translation solution for visual novels
Document content and metadata extraction microservice
Controllable and fast Text-to-Speech for over 7000 languages
Fast stable diffusion on CPU and AI PC
Generate audiobooks from e-books
Instant voice cloning by MIT and MyShell; an audio foundation model
Lightweight package to simplify LLM API calls
Qwen3-ASR is an open-source series of ASR models
A Python package for segmenting geospatial data with the Segment Anything Model (SAM)