Qwen-Image-Layered: Layered Decomposition for Inherent Editability
Industrial-level controllable zero-shot text-to-speech system
Provides convenient access to the Anthropic REST API from any Python 3
DeepSeek Coder: Let the Code Write Itself
Open-Source Financial Large Language Models
Repo for SeedVR2 & SeedVR
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Hackable and optimized Transformers building blocks
tiktoken is a fast BPE tokeniser for use with OpenAI's models
Block Diffusion for Ultra-Fast Speculative Decoding
Towards Real-World Vision-Language Understanding
Models for object and human mesh reconstruction
gpt-oss-120b and gpt-oss-20b are two open-weight language models
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
ChatGLM-6B: An Open Bilingual Dialogue Language Model
Large Multimodal Models for Video Understanding and Editing
Diversity-driven optimization and large-model reasoning ability
Audio foundation model excelling in audio understanding
Code for running inference with the SAM 3D Body Model 3DB
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Generating Immersive, Explorable, and Interactive 3D Worlds
Ling-V2 is a MoE LLM provided and open-sourced by InclusionAI
A Production-ready Reinforcement Learning AI Agent Library
Foundation Models for Time Series
Official code base for LeWorldModel: Stable End-to-End Joint-Embedding