From Images to High-Fidelity 3D Assets
Long-form streaming TTS system for multi-speaker dialogue generation
OpenTinker is an RL-as-a-Service infrastructure for foundation models
An AI-powered security review GitHub Action using Claude
Python SDK for Claude Agent
CLIP: Predict the most relevant text snippet given an image
Renderer for the harmony response format to be used with gpt-oss
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
High-Fidelity and Controllable Generation of Textured 3D Assets
Multi-modal large language model designed for audio understanding
Large Multimodal Models for Video Understanding and Editing
The official PyTorch implementation of Google's Gemma models
Generate Any 3D Scene in Seconds
GLM-4-Voice | End-to-End Chinese-English Conversational Model
High-Resolution 3D Assets Generation with Large Scale Diffusion Models
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
A state-of-the-art open visual language model
LLM-based audio editing model trained with reinforcement learning
Multimodal embedding and reranking models built on Qwen3-VL
Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion
Designed for text embedding and ranking tasks
Tiny vision language model
Qwen-Image is a powerful image generation foundation model
Inference code for scalable emulation of protein equilibrium ensembles