ICLR 2024 Spotlight: curation/training code, metadata, and distribution
CogView4, CogView3-Plus, and CogView3 (ECCV 2024)
Official implementation of DreamCraft3D
Open-source framework for intelligent speech interaction
OCR expert VLM powered by Hunyuan's native multimodal architecture
Open-Source Financial Large Language Models
Hackable and optimized Transformers building blocks
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Code for Mesh R-CNN, ICCV 2019
Ling-V2 is a MoE LLM provided and open-sourced by InclusionAI
Foundation Models for Time Series
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Generating Immersive, Explorable, and Interactive 3D Worlds
Tiny vision language model
Open-source industrial-grade ASR models
OpenTinker is an RL-as-a-Service infrastructure for foundation models
Official implementation of Watermark Anything with Localized Messages
Video understanding codebase from FAIR for reproducing video models
A Conversational Speech Generation Model
Capable of understanding text, audio, vision, and video
Qwen3-Omni is a natively end-to-end, omni-modal LLM
DeepMind model for tracking arbitrary points across videos & robotics
Renderer for the harmony response format to be used with gpt-oss
Ultra-Efficient LLMs on End Device