Implementation of Imagen, Google's Text-to-Image Neural Network
Image inpainting tool powered by SOTA AI Model
Autoregressive Model Beats Diffusion
Generating Immersive, Explorable, and Interactive 3D Worlds
Multimodal-Driven Architecture for Customized Video Generation
Flexible Photo Recrafting While Preserving Your Identity
Offline inference engine for art, real-time voice conversations
Stable Diffusion web UI
State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX
Contexts Optical Compression
Easily compute CLIP embeddings and build a CLIP retrieval system
Chinese and English multimodal conversational language model
A Pioneering Open-Source Alternative to GPT-4o
A Multi-Modal World Model for Reconstructing, Generating, and Simulating
Implementation of Phenaki Video, which uses MaskGIT
ImageBind: One Embedding Space to Bind Them All
Diffusion Transformer with Fine-Grained Chinese Understanding
AI-powered code assistant for Vim. OpenAI and ChatGPT plugin for Vim
Capable of understanding text, audio, vision, video
Edit PDF files with Nano Banana
Towards Real-World Vision-Language Understanding
High-Resolution Image Synthesis with Latent Diffusion Models
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Nexa SDK is a comprehensive toolkit for supporting ONNX and GGML
Fast stable diffusion on CPU and AI PC