Text and image to video generation: CogVideoX and CogVideo
Ready-to-use OCR with 80+ supported languages
Autoregressive Model Beats Diffusion
Offline inference engine for art, real-time voice conversations
Chat & pretrained large vision language model
Multimodal-Driven Architecture for Customized Video Generation
Collection of Gemma 3 variants that are trained for performance
Contexts Optical Compression
State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX
Stable Diffusion built into Blender
Flexible Photo Recrafting While Preserving Your Identity
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences
Stable Diffusion web UI
Towards Real-World Vision-Language Understanding
Capable of understanding text, audio, vision, video
Easily compute CLIP embeddings and build a CLIP retrieval system
A Pioneering Open-Source Alternative to GPT-4o
Implementation of Phenaki Video, which uses MaskGIT
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
AI-powered code assistant for Vim. OpenAI and ChatGPT plugin for Vim
Easy-to-use and powerful NLP library with Awesome model zoo
Fast stable diffusion on CPU and AI PC
ImageBind: One Embedding Space to Bind Them All
Diffusion Transformer with Fine-Grained Chinese Understanding
An open source implementation of CLIP