Renderer for the harmony response format to be used with gpt-oss
Repo for the Qwen2-Audio chat and pretrained large audio language models
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
OCR expert VLM powered by Hunyuan's native multimodal architecture
RGBD video generation model conditioned on camera input
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Inference code for scalable emulation of protein equilibrium ensembles
Netease Youdao's open-source embedding and reranker models
Audio foundation model excelling in audio understanding
Phi-3.5 for Mac: Locally-run Vision and Language Models
Robust Speech Recognition Across Languages, Dialects
Official code base for LeWorldModel: Stable End-to-End Joint-Embedding
Tiny vision language model
The official PyTorch implementation of Google's Gemma models
Audio Language Models are Few-Shot Learners
A 0.1B Omni model trained from scratch
Open Source Speech Language Model
Open-source industrial-grade ASR models
Foundation model for image generation
Fast-stable-diffusion + DreamBooth
A Pragmatic VLA Foundation Model
OpenTinker is an RL-as-a-Service infrastructure for foundation models
Hunyuan Translation Model Version 1.5
Block Diffusion for Ultra-Fast Speculative Decoding
Multimodal embedding and reranking models built on Qwen3-VL