GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Real-time behaviour synthesis with MuJoCo, using Predictive Control
Inference code for scalable emulation of protein equilibrium ensembles
Memory-efficient and performant finetuning of Mistral's models
Official implementation of Watermark Anything with Localized Messages
Video understanding codebase from FAIR for reproducing state-of-the-art video models
Towards Real-World Vision-Language Understanding
Fast and universal 3D reconstruction model for versatile tasks
A PyTorch library for implementing flow matching algorithms
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
A SOTA open-source image editing model
The ChatGPT Retrieval Plugin lets you easily find personal or work documents by asking questions in natural language
GPT-4V-level open-source multimodal model based on Llama3-8B
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model for generalized humanoid robot reasoning and skills
Ling is a MoE LLM developed and open-sourced by InclusionAI
Chat & pretrained large vision-language model
Qwen3-TTS is an open-source series of TTS models
Open-source framework for intelligent speech interaction
Phi-3.5 for Mac: Locally-run Vision and Language Models
Revolutionizing Database Interactions with Private LLM Technology
Implementation of the Surya Foundation Model for Heliophysics
Large language model & vision-language model based on linear attention
Stable Virtual Camera: Generative View Synthesis with Diffusion Models