Python inference and LoRA trainer package for the LTX-2 audio–video
Large Multimodal Models for Video Understanding and Editing
ChatGLM-6B: An Open Bilingual Dialogue Language Model
Text and image to video generation: CogVideoX and CogVideo
A Family of Open Sourced Music Foundation Models
Reference PyTorch implementation and models for DINOv3
State-of-the-art TTS model under 25MB
Industrial-level controllable zero-shot text-to-speech system
DeepSeek Coder: Let the Code Write Itself
Revolutionizing Database Interactions with Private LLM Technology
Qwen3-Coder is the code version of Qwen3
Recovering the Visual Space from Any Views
Qwen2.5-VL is the multimodal large language model series
Python bindings for llama.cpp
Contexts Optical Compression
Provides convenient access to the Anthropic REST API from any Python 3
A Powerful Native Multimodal Model for Image Generation
Visual Causal Flow
gpt-oss-120b and gpt-oss-20b are two open-weight language models
Qwen-Image is a powerful image generation foundation model
Models for object and human mesh reconstruction
Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion
Open-source multi-speaker long-form text-to-speech model
Diversity-driven optimization and large-model reasoning ability
Video Object and Interaction Deletion