CogView4, CogView3-Plus and CogView3 (ECCV 2024)
A Unified Framework for Text-to-3D and Image-to-3D Generation
Qwen-Image is a powerful image generation foundation model
Reproduction of Poetiq's record-breaking submission to the ARC-AGI-1
Pokee Deep Research Model Open Source Repo
Unified Multimodal Understanding and Generation Models
Tooling for the Common Objects In 3D dataset
State-of-the-art (SoTA) text-to-video pre-trained model
LLM-based audio editing model trained with reinforcement learning
Tongyi Deep Research, the Leading Open-source Deep Research Agent
A Conversational Speech Generation Model
High-Resolution Image Synthesis with Latent Diffusion Models
Real-time behaviour synthesis with MuJoCo, using Predictive Control
Official PyTorch Implementation of "Scalable Diffusion Models"
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)
A latent text-to-image diffusion model
GLIDE: a diffusion-based text-conditional image synthesis model
A mix of GAN implementations including progressive growing
Code for "Image Generation from Scene Graphs", Johnson et al., CVPR 201
Dia-1.6B generates lifelike English dialogue and vocal expressions
High-efficiency reasoning and agentic intelligence model