OCR expert VLM powered by Hunyuan's native multimodal architecture
Stable Diffusion with Core ML on Apple Silicon
Official implementation of Watermark Anything with Localized Messages
Programmatic access to the AlphaGenome model
A Powerful Native Multimodal Model for Image Generation
DeepSeek Coder: Let the Code Write Itself
Code for running inference with the SAM 3D Body Model 3DB
Unified Multimodal Understanding and Generation Models
Tongyi Deep Research, the Leading Open-source Deep Research Agent
Large language model and vision-language model based on Linear Attention
Designed for text embedding and ranking tasks
Audio foundation model excelling in audio understanding
Fast-stable-diffusion + DreamBooth
A Pragmatic VLA Foundation Model
A Production-ready Reinforcement Learning AI Agent Library
Ring is a reasoning MoE LLM provided and open-sourced by InclusionAI
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
GPT4V-level open-source multi-modal model based on Llama3-8B
Open-weight, large-scale hybrid-attention reasoning model
Example Discord bot written in Python that uses the completions API
StudioOllamaUI is a local, portable interface for Ollama
Official code for Style Aligned Image Generation via Shared Attention
Official repo for consistency models
800,000 step-level correctness labels on LLM solutions to MATH problems
Code release for "Masked-attention Mask Transformer"