The official repo of the Qwen chat & pretrained large language model
OpenTinker is an RL-as-a-Service infrastructure for foundation models
Multimodal-Driven Architecture for Customized Video Generation
RGBD video generation model conditioned on camera input
Controllable & emotion-expressive zero-shot TTS
Pokee Deep Research Model Open Source Repo
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Chat & pretrained large vision language model
Global weather forecasting model using graph neural networks and JAX
OCR expert VLM powered by Hunyuan's native multimodal architecture
Industrial-level controllable zero-shot text-to-speech system
PyTorch code and models for the DINOv2 self-supervised learning method
Pushing the Limits of Mathematical Reasoning in Open Language Models
Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
Phi-3.5 for Mac: Locally-run Vision and Language Models
Official implementation of Watermark Anything with Localized Messages
Ling-V2 is a MoE LLM provided and open-sourced by InclusionAI
Fast and universal 3D reconstruction model for versatile tasks
Ring is a reasoning MoE LLM provided and open-sourced by InclusionAI
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model for generalized humanoid robot reasoning and skills
Tooling for the Common Objects In 3D dataset
GPT-4V-level open-source multi-modal model based on Llama3-8B
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Multi-modal large language model designed for audio understanding