State-of-the-art TTS model under 25MB
Convert the Google Gemini web app into an OpenAI-compatible API
Unified Multimodal Understanding and Generation Models
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Large Multimodal Models for Video Understanding and Editing
Fast and Universal 3D reconstruction model for versatile tasks
tiktoken is a fast BPE tokeniser for use with OpenAI's models
A Multi-Modal World Model for Reconstruction, Generation, and Simulation
A Systematic Framework for Interactive World Modeling
Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion
Qwen3-Omni is a natively end-to-end, omni-modal LLM
The official PyTorch implementation of Google's Gemma models
Official code base for LeWorldModel: Stable End-to-End Joint-Embedding
NetEase Youdao's open-source embedding and reranker models
Open Source Speech Language Model
Ling is a MoE LLM provided and open-sourced by InclusionAI
Phi-3.5 for Mac: Locally-run Vision and Language Models
Bidirectional token-classification model for identifiable info
Pretrained time-series foundation model developed by Google Research
Ling-V2 is a MoE LLM provided and open-sourced by InclusionAI
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model
Foundation model for image generation
Collection of Gemma 3 variants that are trained for performance
MOSS-TTS Family: open-source speech and sound generation models
Open-source deep-learning framework