State-of-the-art TTS model under 25 MB
Foundation Models for Time Series
Python inference and LoRA trainer package for the LTX-2 audio–video
A PyTorch library for implementing flow matching algorithms
A Systematic Framework for Interactive World Modeling
Open-Source Financial Large Language Models
Qwen3-TTS is an open-source series of TTS models
RGBD video generation model conditioned on camera input
Capable of understanding text, audio, vision, video
High-Resolution 3D Assets Generation with Large Scale Diffusion Models
Official repository for LTX-Video
GLM-4-Voice | End-to-End Chinese-English Conversational Model
This repository contains the official implementation of FastVLM
Clean and efficient FP8 GEMM kernels with fine-grained scaling
Generate Any 3D Scene in Seconds
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Large Multimodal Models for Video Understanding and Editing
Towards self-verifiable mathematical reasoning
Foundational Models for State-of-the-Art Speech and Text Translation
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Controllable & emotion-expressive zero-shot TTS
Sharp Monocular Metric Depth in Less Than a Second
C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)
Pokee Deep Research Model Open Source Repo
Open-weight, large-scale hybrid-attention reasoning model