State-of-the-art TTS model under 25MB
Qwen2.5-VL is a multimodal large language model series
Models for object and human mesh reconstruction
Text and image to video generation: CogVideoX and CogVideo
DeepSeek Coder: Let the Code Write Itself
High-Resolution Image Synthesis with Latent Diffusion Models
Let's make video diffusion practical
Visual Causal Flow
A Systematic Framework for Interactive World Modeling
AlphaFold 3 inference pipeline
Open-source multi-speaker long-form text-to-speech model
An experimental version of the DeepSeek model
LTX-Video Support for ComfyUI
Open-source large language model family from Tencent Hunyuan
Industrial-level controllable zero-shot text-to-speech system
Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Python bindings for llama.cpp
Video Object and Interaction Deletion
Official repository for LTX-Video
ChatGLM-6B: An Open Bilingual Dialogue Language Model
Advancing Open-source World Models
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
Contexts Optical Compression
VMZ: Model Zoo for Video Modeling
Provides convenient access to the Anthropic REST API from any Python 3