Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Towards Real-World Vision-Language Understanding
CLIP: predicts the most relevant text snippet given an image
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
Personalize Any Characters with a Scalable Diffusion Transformer
GLM-4 series: Open Multilingual Multimodal Chat LMs
Chinese and English multimodal conversational language model
GPT4V-level open-source multi-modal model based on Llama3-8B
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Ling is an MoE LLM provided and open-sourced by InclusionAI
Ring is a reasoning MoE LLM provided and open-sourced by InclusionAI
A Customizable Image-to-Video Model based on HunyuanVideo
Chat & pretrained large audio language model proposed by Alibaba Cloud
A series of math-specific large language models built on Qwen2
Revolutionizing Database Interactions with Private LLM Technology
The official PyTorch implementation of Google's Gemma models
Real-time behaviour synthesis with MuJoCo, using Predictive Control
Tracking Any Point (TAP)
Video understanding codebase from FAIR for reproducing video models
Hackable and optimized Transformers building blocks
Provides convenient access to the Anthropic REST API from any Python 3 application
Diffusion Transformer with Fine-Grained Chinese Understanding