Chat and pretrained large vision-language model
A Unified Framework for Text-to-3D and Image-to-3D Generation
Flexible Photo Recrafting While Preserving Your Identity
State-of-the-art diffusion models for image and audio generation
Awesome multilingual OCR toolkits based on PaddlePaddle
Diffusion Transformer with Fine-Grained Chinese Understanding
Official inference repo for FLUX.1 models
Generating Immersive, Explorable, and Interactive 3D Worlds
A SOTA open-source image editing model
Open Source Differentiable Computer Vision Library
Collection of Gemma 3 variants that are trained for performance
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Implementation of Imagen, Google's Text-to-Image Neural Network
Guiding Instruction-based Image Editing via Multimodal Large Language
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences
Easily turn large sets of image URLs into an image dataset
CLIP: predict the most relevant text snippet given an image
Stable Diffusion with Core ML on Apple Silicon
A neural network that transforms a design mock-up into static websites
Code for running inference with the SAM 3D Body Model 3DB
High-Resolution 3D Assets Generation with Large Scale Diffusion Models
This repo contains the code for 1D tokenizer and generator
Train machine learning models within Docker containers
GPT4V-level open-source multi-modal model based on Llama3-8B
Mixture-of-Experts Vision-Language Models for Advanced Multimodal