Contexts Optical Compression
Industrial-level controllable zero-shot text-to-speech system
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
gpt-oss-120b and gpt-oss-20b are two open-weight language models
Fast and Universal 3D reconstruction model for versatile tasks
Diffusion Transformer with Fine-Grained Chinese Understanding
Pokee Deep Research Model Open Source Repo
Recovering the Visual Space from Any Views
Bidirectional token-classification model for identifiable info
A Production-ready Reinforcement Learning AI Agent Library
Programmatic access to the AlphaGenome model
Open Source Speech Language Model
Z80-μLM is a 2-bit quantized language model
A 0.1B Omni model trained from scratch
Inference script for Oasis 500M
4M: Massively Multimodal Masked Modeling
Hackable and optimized Transformers building blocks
A Powerful Native Multimodal Model for Image Generation
Revolutionizing Database Interactions with Private LLM Technology
Tongyi Deep Research, the Leading Open-source Deep Research Agent
Fast-stable-diffusion + DreamBooth
A Pragmatic VLA Foundation Model
Collection of Gemma 3 variants that are trained for performance
CLIP: predict the most relevant text snippet given an image
Ling is a MoE LLM provided and open-sourced by InclusionAI