Advancing Open-source World Models
State-of-the-art TTS model under 25MB
Qwen3-TTS is an open-source series of TTS models
Python inference and LoRA trainer package for the LTX-2 audio–video
Open-Source Financial Large Language Models
DeepMind model for tracking arbitrary points across videos & robotics
Official repository for LTX-Video
Capable of understanding text, audio, vision, and video
GLM-4-Voice | End-to-End Chinese-English Conversational Model
A Systematic Framework for Interactive World Modeling
Long-form streaming TTS system for multi-speaker dialogue generation
Qwen3-ASR is an open-source series of ASR models
Generate Any 3D Scene in Seconds
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Qwen3-Omni is a natively end-to-end, omni-modal LLM
High-Resolution 3D Assets Generation with Large Scale Diffusion Models
A Multi-Modal World Model for Reconstruction, Generation, and Simulation
This repository contains the official implementation of FastVLM
Large Multimodal Models for Video Understanding and Editing
RGBD video generation model conditioned on camera input
Sharp Monocular Metric Depth in Less Than a Second
Open-source industrial-grade ASR models
Hunyuan Translation Model Version 1.5
Implementation of "MobileCLIP" (CVPR 2024)
Open-weight, large-scale hybrid-attention reasoning model