State-of-the-art TTS model under 25MB
Unified Multimodal Understanding and Generation Models
Sharp Monocular Metric Depth in Less Than a Second
Capable of understanding text, audio, vision, and video
gpt-oss-120b and gpt-oss-20b are two open-weight language models
Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
DeepSeek Coder: Let the Code Write Itself
Renderer for the harmony response format to be used with gpt-oss
Tool for exploring and debugging transformer model behaviors
Repo for the Qwen2-Audio chat & pretrained large audio language models
Tongyi Deep Research, the Leading Open-source Deep Research Agent
Let's make video diffusion practical
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Example Discord bot written in Python that uses the completions API
Programmatic access to the AlphaGenome model
C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)
Revolutionizing Database Interactions with Private LLM Technology
Fast and universal 3D reconstruction model for versatile tasks
Open-source repo for the Pokee Deep Research model
Stable Diffusion WebUI Forge is a platform built on top of Stable Diffusion WebUI
High-resolution models for human tasks
Inference framework for 1-bit LLMs
Qwen2.5-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Foundation Models for Time Series