Generating Immersive, Explorable, and Interactive 3D Worlds
Tongyi Deep Research, the Leading Open-source Deep Research Agent
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Qwen-Image is a powerful image generation foundation model
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Phi-3.5 for Mac: Locally-run Vision and Language Models
Revolutionizing Database Interactions with Private LLM Technology
Super expressive prompting model based on ltx2.3
Official code base for LeWorldModel: Stable End-to-End Joint-Embedding
Visual Causal Flow
Fast-stable-diffusion + DreamBooth
A Pragmatic VLA Foundation Model
Collection of Gemma 3 variants that are trained for performance
CLIP, Predict the most relevant text snippet given an image
Multimodal-Driven Architecture for Customized Video Generation
Project Lyra: Open Generative 3D World Models
GLM-Image: Auto-regressive for Dense-knowledge and High-fidelity Image
HY-Motion model for 3D character animation generation
4M: Massively Multimodal Masked Modeling
Claude Code action for GitHub PRs
GLM-4-Voice | End-to-End Chinese-English Conversational Model
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
tiktoken is a fast BPE tokeniser for use with OpenAI's models
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model
A Multi-Modal World Model for Reconstructing, Generating, Simulation