Video Object and Interaction Deletion
OCR expert VLM powered by Hunyuan's native multimodal architecture
Repo for SeedVR2 & SeedVR
Programmatic access to the AlphaGenome model
Unified Multimodal Understanding and Generation Models
Designed for text embedding and ranking tasks
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Audio foundation model excelling in audio understanding
Fast-stable-diffusion + DreamBooth
A Pragmatic VLA Foundation Model
Official implementation of Watermark Anything with Localized Messages
A Production-ready Reinforcement Learning AI Agent Library
Ring is a reasoning MoE LLM provided and open-sourced by InclusionAI
GPT4V-level open-source multi-modal model based on Llama3-8B
Tongyi Deep Research, the Leading Open-source Deep Research Agent
Open-weight, large-scale hybrid-attention reasoning model
Large language model and vision-language model based on linear attention
StudioOllamaUI is a local, portable interface for Ollama
Official repo for consistency models
800,000 step-level correctness labels on LLM solutions to MATH problems
Code release for Masked-attention Mask Transformer
GLIDE: a diffusion-based text-conditional image synthesis model
An implementation of model parallel GPT-2 and GPT-3-style models
Dia-1.6B generates lifelike English dialogue and vocal expressions