FAIR Sequence Modeling Toolkit 2
Advanced language and coding AI model
CodeGeeX2: A More Powerful Multilingual Code Generation Model
Robust Speech Recognition Across Languages, Dialects
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Industrial-level controllable zero-shot text-to-speech system
A state-of-the-art open visual language model
Controllable & emotion-expressive zero-shot TTS
Pokee Deep Research Model Open Source Repo
Tooling for the Common Objects In 3D dataset
Video Object and Interaction Deletion
Qwen3-Omni is a natively end-to-end, omni-modal LLM
OCR expert VLM powered by Hunyuan's native multimodal architecture
PyTorch code and models for the DINOv2 self-supervised learning method
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Official implementation of Watermark Anything with Localized Messages
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion
Global weather forecasting model using graph neural networks and JAX
Ling-V2 is a MoE LLM provided and open-sourced by InclusionAI
Fast and Universal 3D reconstruction model for versatile tasks
Ring is a reasoning MoE LLM provided and open-sourced by InclusionAI
Large-language-model & vision-language-model based on Linear Attention
RGBD video generation model conditioned on camera input