Chat & pretrained large vision language model
A Unified Framework for Text-to-3D and Image-to-3D Generation
Ready-to-use OCR with 80+ supported languages
Awesome multilingual OCR toolkits based on PaddlePaddle
Diffusion Transformer with Fine-Grained Chinese Understanding
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Generating Immersive, Explorable, and Interactive 3D Worlds
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Collection of Gemma 3 variants that are trained for performance
A SOTA open-source image editing model
Open Source Differentiable Computer Vision Library
State-of-the-art diffusion models for image and audio generation
Guiding Instruction-based Image Editing via Multimodal Large Language
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences
Flexible Photo Recrafting While Preserving Your Identity
CLIP: predict the most relevant text snippet given an image
Stable Diffusion with Core ML on Apple Silicon
High-Resolution 3D Assets Generation with Large Scale Diffusion Models
This repo contains the code for a 1D tokenizer and generator
Code for running inference with the SAM 3D Body model (3DB)
Reference PyTorch implementation and models for DINOv3
Towards Real-World Vision-Language Understanding
Multimodal-Driven Architecture for Customized Video Generation
RGBD video generation model conditioned on camera input
Let's make video diffusion practical