A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
State-of-the-art (SoTA) text-to-video pre-trained model
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model for generalized humanoid robot reasoning and skills
Tongyi Deep Research, the Leading Open-source Deep Research Agent
Hunyuan Translation Model Version 1.5
Official repository for LTX-Video
Global weather forecasting model using graph neural networks and JAX
GPT-4V-level open-source multimodal model based on Llama3-8B
Phi-3.5 for Mac: Locally-run Vision and Language Models
Revolutionizing Database Interactions with Private LLM Technology
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Inference script for Oasis 500M
Fast and universal 3D reconstruction model for versatile tasks
PyTorch code and models for the DINOv2 self-supervised learning method (loading sketch after this list)
Official implementation of DreamCraft3D
An Efficient Agentic Model for Computer Use
Large language model & vision-language model based on linear attention
A Pragmatic VLA Foundation Model
Collection of Gemma 3 variants that are trained for performance
Tiny vision language model
Inference code for scalable emulation of protein equilibrium ensembles
Renderer for the harmony response format to be used with gpt-oss (usage sketch after this list)
A state-of-the-art open visual language model
Chinese and English multimodal conversational language model
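
For the DINOv2 entry above, a minimal loading sketch in Python. The `torch.hub` entry point names (`facebookresearch/dinov2`, `dinov2_vits14`) follow the repo's published README; the 224x224 input is an illustrative choice (side lengths must be multiples of the 14-pixel patch size):

```python
import torch

# Load the ViT-S/14 DINOv2 backbone via torch.hub
# (entry point names as listed in the DINOv2 README).
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model.eval()

# Dummy image batch: 224 = 16 * 14, a multiple of the patch size.
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    features = model(x)  # global (CLS) embedding, 384-dim for ViT-S/14
print(features.shape)  # torch.Size([1, 384])
```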
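For the harmony renderer entry, a sketch of rendering a conversation into the completion tokens gpt-oss expects, using the `openai_harmony` Python bindings. The encoding and class names below follow the upstream README; exact signatures may differ across versions:

```python
from openai_harmony import (
    Conversation,
    HarmonyEncodingName,
    Message,
    Role,
    load_harmony_encoding,
)

# Load the gpt-oss harmony encoding.
enc = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)

# Build a one-turn conversation and render it as a prompt for the
# assistant's next completion.
convo = Conversation.from_messages(
    [Message.from_role_and_content(Role.USER, "What is 2 + 2?")]
)
tokens = enc.render_conversation_for_completion(convo, Role.ASSISTANT)
print(tokens[:10])  # token ids to feed to the model
```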