VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Official implementation of DreamCraft3D
Tongyi Deep Research, the Leading Open-source Deep Research Agent
OCR expert VLM powered by Hunyuan's native multimodal architecture
Towards Real-World Vision-Language Understanding
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
ICLR2024 Spotlight: curation/training code, metadata, distribution
Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
Language modeling in a sentence representation space
The ChatGPT Retrieval Plugin lets you easily find personal documents
A SOTA open-source image editing model
High-Resolution Image Synthesis with Latent Diffusion Models
Di♪♪Rhythm: Blazingly Fast & Simple End-to-End Song Generation
A Conversational Speech Generation Model
Qwen2.5-Coder is the code version of Qwen2.5, the large language model
Open Multilingual Multimodal Chat LMs
Code for the paper Hybrid Spectrogram and Waveform Source Separation
Fine-tuning ChatGLM-6B with PEFT
A method to increase the speed and lower the memory footprint
Code release for "Masked-attention Mask Transformer"
PyTorch implementation of MAE
Large-scale autoregressive pixel model for image generation by OpenAI
Environment generation code for the paper "Emergent Tool Use"
A mix of GAN implementations including progressive growing