High-Resolution Image Synthesis with Latent Diffusion Models
MII makes low-latency and high-throughput inference possible
Build cross-modal and multimodal applications on the cloud
Plug-n-play module turning text-to-image models into animation
Run GGUF models easily with a UI or API. One File. Zero Install.
A Python application to add watermarks (text or image) to PDF files
Mice speech to text with MX Cinnamon OS ISO
Guiding Instruction-based Image Editing via Multimodal Large Language
AI-powered tool to quickly remove watermarks from images flawlessly
Overcoming Data Limitations for High-Quality Video Diffusion Models
A library for transfer learning by reusing parts of TensorFlow models
Multi-Voice and Prompt-Controlled TTS Engine
Official code for Style Aligned Image Generation via Shared Attention
Embed images and sentences into fixed-length vectors
Generate 3D objects conditioned on text or images
Convert an image to text to spot intelligible words
Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis
CLIP + FFT/DWT/RGB = text to image/video
Text-to-Image generation. The repo for NeurIPS 2021 paper
Run the Stable Diffusion releases in a Docker container
Alfred workflow using ChatGPT, DALL·E 2 and other models for chatting
Let us control diffusion models
An open-source framework for training large multimodal models
Multimodal AI Story Teller, built with Stable Diffusion, GPT, etc.
Task-oriented finetuning for better embeddings on neural search