Stable Virtual Camera: Generative View Synthesis with Diffusion Models
A Multi-Modal World Model for Reconstructing, Generating, and Simulating
Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion
Unified Multimodal Understanding and Generation Models
Global weather forecasting model using graph neural networks and JAX
Tooling for the Common Objects In 3D dataset
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Bidirectional token-classification model for identifiable info
Pretrained time-series foundation model developed by Google Research
General-purpose image editing model that delivers high-fidelity
Ling-V2 is a MoE LLM provided and open-sourced by InclusionAI
Fast and Universal 3D reconstruction model for versatile tasks
4M: Massively Multimodal Masked Modeling
This repository contains the official implementation of FastVLM
A Production-ready Reinforcement Learning AI Agent Library
A PyTorch library for implementing flow matching algorithms
Memory-efficient and performant finetuning of Mistral's models
Diffusion Transformer with Fine-Grained Chinese Understanding
State-of-the-art (SoTA) text-to-video pre-trained model
LLM-based Reinforcement Learning audio editing model
High-Fidelity and Controllable Generation of Textured 3D Assets
A state-of-the-art open visual language model
New family of code large language models (LLMs)
Reproduction of Poetiq's record-breaking submission to the ARC-AGI-1
DeepMind model for tracking arbitrary points across videos & robotics