Sharp Monocular Metric Depth in Less Than a Second
Capable of understanding text, audio, vision, video
Qwen-Image is a powerful image generation foundation model
gpt-oss-120b and gpt-oss-20b are two open-weight language models
C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)
Let's make video diffusion practical
Unified Multimodal Understanding and Generation Models
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
DeepSeek Coder: Let the Code Write Itself
Tool for exploring and debugging transformer model behaviors
A Powerful Native Multimodal Model for Image Generation
Foundation Models for Time Series
Renderer for the harmony response format to be used with gpt-oss
Repo of Qwen2-Audio chat & pretrained large audio language model
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Qwen2.5-VL is the multimodal large language model series
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Example Discord bot written in Python that uses the completions API
tiktoken is a fast BPE tokeniser for use with OpenAI's models
Programmatic access to the AlphaGenome model
Revolutionizing Database Interactions with Private LLM Technology
Fast and Universal 3D reconstruction model for versatile tasks
Pokee Deep Research Model Open Source Repo
Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion
FAIR Sequence Modeling Toolkit 2