Advancing Open-source World Models
Long-form streaming TTS system for multi-speaker dialogue generation
Implementation of "MobileCLIP" (CVPR 2024)
This repository contains the official implementation of FastVLM
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Capable of understanding text, audio, vision, and video
Contexts Optical Compression
Qwen3-ASR is an open-source series of ASR models
Hackable and optimized Transformers building blocks
gpt-oss-120b and gpt-oss-20b are two open-weight language models
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Block Diffusion for Ultra-Fast Speculative Decoding
This repository contains the official implementation of research
Official repo for consistency models
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020)
Model that fuses instruct, reasoning and agentic skills
OpenAI’s compact 20B open model for fast, agentic, and local use