Search Results for "automatic1111-stable-diffusion"

533 projects for "automatic1111-stable-diffusion" with 2 filters applied:

Filters: Artificial Intelligence, BSD

1

Stable Diffusion

High-Resolution Image Synthesis with Latent Diffusion Models

Stable Diffusion Version 2. The Stable Diffusion project, developed by Stability AI, is a cutting-edge image synthesis model that utilizes latent diffusion techniques for high-resolution image generation. It offers an advanced method of generating images based on text input, making it highly flexible for various creative applications. The repository contains pretrained models, various checkpoints, and tools to facilitate image generation tasks, such as fine-tuning and modifying the models. ...

2 Reviews

Downloads: 302 This Week

Last Update: 2025-02-28
See Project
2

fast-stable-diffusion

Fast-stable-diffusion + DreamBooth

fast-stable-diffusion is a community-curated GitHub repository that provides Colab notebooks and integration examples for running Stable Diffusion, web UIs such as AUTOMATIC1111 and ComfyUI, and DreamBooth training directly on Google Colab. Rather than being a standalone packaged application, this project offers ready-to-use interactive notebooks that install and launch full-featured Stable Diffusion web UIs inside Colab without requiring a complex local setup or a local GPU. ...

Downloads: 0 This Week

Last Update: 2026-02-03
See Project
3

Stable Diffusion WebUI Docker

Easy Docker setup for Stable Diffusion with user-friendly UI

Stable Diffusion WebUI Docker is a Docker-based repository that simplifies running Stable Diffusion with rich user interfaces by packaging multiple popular web UIs into an easy-to-deploy containerized solution. It integrates leading community UIs like AUTOMATIC1111 and ComfyUI into a Docker Compose setup that can be started with a single command, abstracting away dependency installation and environment configuration.

Downloads: 4 This Week

Last Update: 2026-02-03
See Project
4

Stable Diffusion WebUI Forge

Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion

Stable Diffusion WebUI Forge is a performance- and feature-oriented fork of the popular AUTOMATIC1111 interface that experiments with new backends, memory optimizations, and UX improvements. It targets heavy users and researchers who push large models, ControlNets, and high-resolution pipelines where default settings can become bottlenecks.

Downloads: 3 This Week

Last Update: 2025-10-21
See Project
5

Diffusion for World Modeling

Learning agent trained in a diffusion world model

Diffusion for World Modeling is an experimental reinforcement learning system that trains intelligent agents inside a simulated environment generated by a diffusion-based world model. The project introduces the idea of using diffusion models, commonly used for image generation, to simulate the dynamics of an environment and predict future states based on previous observations and actions.

Downloads: 0 This Week

Last Update: 2026-03-12
See Project
6

StableSwarmUI

Multi-user UI tool for managing and running Stable Diffusion workflows

StableSwarmUI is a web-based interface designed to manage and coordinate Stable Diffusion image generation workflows in a multi-user environment. It focuses on enabling multiple users to interact with shared resources, making it suitable for collaborative or server-based deployments. It provides a centralized system where users can submit, monitor, and manage generation tasks through a browser interface. It abstracts much of the complexity involved in running diffusion models by offering a structured environment for handling prompts, outputs, and processing queues. ...

Downloads: 2 This Week

Last Update: 2026-03-18
See Project
7

SimpleTuner

A general fine-tuning kit geared toward image/video/audio diffusion

SimpleTuner is an open-source toolkit designed to simplify the fine-tuning of modern diffusion models for generating images, video, and audio. The project focuses on providing a clear and understandable training environment for researchers, developers, and artists who want to customize generative AI models without navigating complex machine learning pipelines. It supports fine-tuning workflows for models such as Stable Diffusion variants and other diffusion architectures, enabling users to adapt pretrained models to specialized datasets or creative tasks. ...

Downloads: 0 This Week

Last Update: 2026-03-25
See Project
8

civitai

Open platform for sharing and discovering Stable Diffusion models

Civitai is an open source project that provides the codebase for a platform designed to share and manage generative AI models used for image generation. It focuses primarily on models compatible with Stable Diffusion and related technologies, allowing creators to upload, organize, and distribute custom AI models and related resources. These resources can include textual inversions, hypernetworks, aesthetic gradients, and variational autoencoders that modify or extend the capabilities of diffusion-based image generation systems. Civitai encourages collaboration by allowing users to publish their models, explore models created by others, and learn from the techniques used in the community. ...

Downloads: 5 This Week

Last Update: 11 hours ago
See Project
9

Riffusion App

Stable diffusion for real-time music generation (web app)

Riffusion App Hobby is an open-source interactive web application that enables real-time music generation using stable diffusion models adapted for audio synthesis. Unlike traditional music generation tools, it treats audio as spectrogram images and applies diffusion techniques to generate continuous sound transitions, allowing users to create evolving musical loops and compositions. The application is built with modern web technologies including Next.js, React, and three.js, providing a responsive and visually engaging interface for experimentation. ...

Downloads: 0 This Week

Last Update: 2026-03-18
See Project
10

dLLM

dLLM: Simple Diffusion Language Modeling

dLLM is an open-source framework designed to simplify the development, training, and evaluation of diffusion-based large language models. Unlike traditional autoregressive models that generate text sequentially token by token, diffusion language models generate text through an iterative denoising process that refines masked tokens over multiple steps. This approach allows models to reason over the entire sequence simultaneously and potentially produce more coherent outputs with bidirectional context. ...

Downloads: 0 This Week

Last Update: 2026-03-08
See Project
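
The iterative denoising described in the dLLM entry can be illustrated with a toy sketch. This is not dLLM's API: `toy_denoiser`, `diffusion_decode`, and the confidence score are hypothetical stand-ins, and the "model" simply reads its predictions from a fixed target sentence. The point is only the control flow: start fully masked, score every masked position at once, commit the most confident guesses, and repeat.

```python
MASK = "<mask>"

def toy_denoiser(tokens, target):
    """Hypothetical stand-in for a trained model.

    For every masked position, return (predicted_token, confidence).
    The prediction is read from a fixed target and the confidence is a
    made-up score that rises when a neighbouring token is already known.
    """
    proposals = {}
    for i, tok in enumerate(tokens):
        if tok == MASK:
            neighbours = [tokens[j] for j in (i - 1, i + 1) if 0 <= j < len(tokens)]
            context = sum(1 for n in neighbours if n != MASK)
            proposals[i] = (target[i], context + 1.0 / (i + 1))
    return proposals

def diffusion_decode(target, steps):
    """Unmask a fully masked sequence over roughly `steps` refinement rounds."""
    tokens = [MASK] * len(target)
    per_step = -(-len(target) // steps)  # ceiling division
    history = [list(tokens)]
    while MASK in tokens:
        proposals = toy_denoiser(tokens, target)
        # Commit only the most confident proposals; the rest stay masked.
        best = sorted(proposals, key=lambda i: proposals[i][1], reverse=True)[:per_step]
        for i in best:
            tokens[i] = proposals[i][0]
        history.append(list(tokens))
    return tokens, history

if __name__ == "__main__":
    final, history = diffusion_decode("the cat sat on the mat".split(), steps=3)
    for step, state in enumerate(history):
        print(step, " ".join(state))
```

In a real diffusion language model the proposals would come from a network that attends to the whole partially masked sequence, which is where the bidirectional context mentioned above comes from.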
11

TurboDiffusion

100–200× Acceleration for Video Diffusion Models

...The project targets large video models and enables developers to run accelerated generation even on single high-end GPUs, making fast video synthesis more practical for research and creative workflows. TurboDiffusion is structured to integrate with existing diffusion model architectures and provides tools for experimenting with and benchmarking speed and quality trade-offs.

Downloads: 0 This Week

Last Update: 2026-03-06
See Project
12

DFlash

Block Diffusion for Ultra-Fast Speculative Decoding

DFlash is an open-source framework for ultra-fast speculative decoding using a lightweight block diffusion model to draft text in parallel with a target large language model, dramatically improving inference speed without sacrificing generation quality. It acts as a “drafter” that proposes likely continuations which the main model then verifies, enabling significant throughput gains compared to traditional autoregressive decoding methods that generate token by token.

Downloads: 2 This Week

Last Update: 2026-03-17
See Project
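
The draft-then-verify loop described in the DFlash entry can be sketched in a few lines. This is a generic greedy speculative-decoding toy, not DFlash's implementation: `target_next`, `draft_block`, and `speculative_decode` are hypothetical names, both "models" are simple arithmetic rules, and the drafter here is not a block diffusion model. It only shows why the output matches what the target model would produce on its own.

```python
def target_next(seq):
    """Toy target model: the greedy next token is a fixed function of the last token."""
    return (seq[-1] * 3 + 1) % 7

def draft_block(seq, k):
    """Toy drafter: proposes k tokens at once and is wrong whenever the target would emit 0."""
    out, last = [], seq[-1]
    for _ in range(k):
        nxt = (last * 3 + 1) % 7
        if nxt == 0:
            nxt = 1  # deliberate drafting mistake
        out.append(nxt)
        last = nxt
    return out

def speculative_decode(prompt, max_new, block=4):
    """Return (sequence, rounds). Each round stands for one target verification pass."""
    seq = list(prompt)
    rounds = 0
    while len(seq) - len(prompt) < max_new:
        draft = draft_block(seq, block)
        accepted = []
        # A real system would score all drafted positions in one batched
        # target forward pass; the loop below does it token by token for clarity.
        for tok in draft:
            expected = target_next(seq + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # keep the target's token, drop the rest
                break
        else:
            # Every drafted token matched, so the pass also yields one extra token.
            accepted.append(target_next(seq + accepted))
        rounds += 1
        seq.extend(accepted)
    return seq[: len(prompt) + max_new], rounds

def greedy_decode(prompt, max_new):
    """Baseline: plain token-by-token decoding with the target model."""
    seq = list(prompt)
    for _ in range(max_new):
        seq.append(target_next(seq))
    return seq

if __name__ == "__main__":
    spec, rounds = speculative_decode([1], max_new=12)
    print(spec, "in", rounds, "verification rounds")
    print(spec == greedy_decode([1], max_new=12))
```

Because a drafted token is kept only when it equals the target's own choice, the result is identical to the baseline; the gain is that several tokens can be confirmed per verification round.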
13

Z-Image

Image generation model with single-stream diffusion transformer

Z-Image is an efficient, open-source image generation foundation model built to make high-quality image synthesis more accessible. With just 6 billion parameters — far fewer than many large-scale models — it uses a novel “single-stream diffusion Transformer” architecture to deliver photorealistic image generation, demonstrating that excellence does not always require extremely large model sizes. The project includes several variants: Z-Image-Turbo, a distilled version optimized for speed and low resource consumption; Z-Image-Base, the full-capacity foundation model; and Z-Image-Edit, fine-tuned for image editing tasks. ...

Downloads: 49 This Week

Last Update: 2026-02-09
See Project
14

HunyuanWorld-Voyager

RGBD video generation model conditioned on camera input

HunyuanWorld-Voyager is a next-generation video diffusion framework developed by Tencent-Hunyuan for generating world-consistent 3D scene videos from a single input image. By leveraging user-defined camera paths, it enables immersive scene exploration and supports controllable video synthesis with high realism. The system jointly produces aligned RGB and depth video sequences, making it directly applicable to 3D reconstruction tasks.

Downloads: 14 This Week

Last Update: 2025-12-17
See Project
15

VoxCPM

TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning

VoxCPM is a tokenizer-free text-to-speech system that models speech in a continuous space, aiming for extremely realistic, context-aware synthesis and true-to-life zero-shot voice cloning. Instead of converting speech into discrete tokens, it uses an end-to-end diffusion-autoregressive architecture built on the MiniCPM-4 backbone, combining hierarchical language modeling, finite scalar quantization (FSQ), and local Diffusion Transformers. This design helps decouple semantic and acoustic information while preserving fine-grained prosody, leading to more stable and expressive generation than many discrete-token systems. ...

Downloads: 1 This Week

Last Update: 2025-12-05
See Project
16

Text-to-image Playground

A playground to generate images from any text prompt using SD

dalle-playground is an open-source web application that allows users to generate images from natural language text prompts using modern text-to-image generative models. Originally built around DALL-E Mini, the project later transitioned to using Stable Diffusion, enabling more detailed and higher-quality image synthesis. The system combines a backend machine learning service with a browser-based frontend interface that lets users experiment interactively with prompt engineering and generative AI. Developers can run the application locally or deploy it using cloud infrastructure, making it accessible both for experimentation and educational use. ...

Downloads: 0 This Week

Last Update: 2026-03-11
See Project
17

LlamaGen

Autoregressive Model Beats Diffusion

LlamaGen is an open-source research project that introduces a new approach to image generation by applying the autoregressive next-token prediction paradigm used in large language models to visual generation tasks. Instead of relying on diffusion models, the framework treats images as sequences of tokens that can be generated progressively using transformer architectures similar to those used for text generation. The project explores how scaling autoregressive models and improving image tokenization techniques can produce competitive results compared with modern diffusion-based image generators. ...

Downloads: 1 This Week

Last Update: 2026-03-06
See Project
18

HunyuanDiT

Diffusion Transformer with Fine-Grained Chinese Understanding

HunyuanDiT is a high-capability text-to-image diffusion transformer with bilingual (Chinese/English) understanding and multi-turn dialogue capability. It trains a diffusion model in latent space using a transformer backbone and integrates a Multimodal Large Language Model (MLLM) to refine captions and support conversational image generation. It supports adapters such as ControlNet, IP-Adapter, and LoRA, and can run under constrained VRAM via distilled versions.

Downloads: 1 This Week

Last Update: 2025-11-27
See Project
19

VibeVoice

Open-source multi-speaker long-form text-to-speech model

...A key innovation is its use of continuous acoustic and semantic speech tokenizers operating at an ultra-low frame rate of 7.5 Hz, enabling high audio fidelity with efficient processing of long sequences. The model integrates a Qwen2.5-based large language model with a diffusion head to produce realistic acoustic details and capture conversational context. Training involved curriculum learning with increasing sequence lengths up to 65K tokens, allowing VibeVoice to handle very long dialogues effectively. Safety mechanisms include an audible disclaimer and imperceptible watermarking in all generated audio to mitigate misuse risks.

Downloads: 22 This Week

Last Update: 3 hours ago
See Project
20

GLM-Image

GLM-Image: Auto-regressive for Dense-knowledge and High-fidelity Image

GLM-Image is an open-source generative AI model designed to create high-fidelity images from text prompts using a hybrid architecture that combines autoregressive semantic understanding with diffusion-based detail refinement. It excels at generating images that include complex layouts and detailed text content, making it especially useful for posters, diagrams, infographics, social media graphics, and visual content that requires precise text placement and semantic alignment. Because it blends linguistic reasoning with image synthesis, GLM-Image produces visual outputs where semantic relationships and textual accuracy are prioritized alongside artistic style and realism, and its model structure enables it to handle dense visual knowledge tasks that challenge many pure diffusion models. ...

Downloads: 0 This Week

Last Update: 2026-03-20
See Project
21

CVPR 2026

Collection of CVPR 2026 Papers and Open Source Projects

...The repository acts as a continuously updated catalog of cutting-edge research across a wide range of topics including computer vision, multimodal AI, generative models, diffusion systems, autonomous driving, medical imaging, and remote sensing. Each entry typically links to the research paper as well as the public code repository associated with the work, allowing researchers and developers to quickly access reproducible implementations. The project serves as a centralized index that makes it easier for practitioners to explore the latest advances presented at major computer vision conferences. ...

Downloads: 7 This Week

Last Update: 2026-03-10
See Project
22

HunyuanVideo-Foley

Multimodal Diffusion with Representation Alignment

HunyuanVideo-Foley is a multimodal diffusion model from Tencent Hunyuan for high-fidelity Foley (sound effects) audio generation synchronized to video scenes. It is designed to generate audio that matches both visual content and textual semantic cues, for use in video production, film, advertising, games, etc. The model architecture aligns audio, video, and text representations to produce realistic synchronized soundtracks.

Downloads: 1 This Week

Last Update: 2025-09-28
See Project
23

JiT

PyTorch implementation of JiT

JiT is an open-source PyTorch implementation of a state-of-the-art image diffusion model designed around a minimalist yet powerful architecture for pixel-level generative modeling, based on the paper Back to Basics: Let Denoising Generative Models Denoise. Rather than predicting noise, JiT models directly predict clean image data, which the research suggests aligns better with the manifold structure of natural images and leads to stronger generative performance at high resolution. ...

Downloads: 0 This Week

Last Update: 2026-02-05
See Project
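
The difference between predicting noise and predicting clean data, mentioned in the JiT entry, can be made concrete with a small sketch. It assumes a generic linear noising rule, x_t = a * x0 + b * eps, which is an illustrative assumption and not necessarily the schedule JiT uses; the function names are hypothetical. Under that rule the two prediction targets are algebraically interchangeable, so the choice affects what the network has to learn rather than what can be recovered.

```python
def add_noise(x0, eps, a, b):
    """Forward process under the assumed rule x_t = a * x0 + b * eps."""
    return [a * x + b * e for x, e in zip(x0, eps)]

def x0_from_eps(xt, eps_hat, a, b):
    """Recover a clean-data estimate from a noise prediction."""
    return [(x - b * e) / a for x, e in zip(xt, eps_hat)]

def eps_from_x0(xt, x0_hat, a, b):
    """Recover a noise estimate from a clean-data prediction."""
    return [(x - a * c) / b for x, c in zip(xt, x0_hat)]

if __name__ == "__main__":
    x0 = [0.5, -1.0, 2.0]    # toy "clean image" values
    eps = [0.1, 0.3, -0.2]   # toy noise sample
    a, b = 0.8, 0.6          # illustrative mixing weights
    xt = add_noise(x0, eps, a, b)
    print(x0_from_eps(xt, eps, a, b))
    print(eps_from_x0(xt, x0, a, b))
```

With a perfect prediction either function returns the original quantity, up to floating-point rounding.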
24

Cosmos-RL

Cosmos-RL is a flexible and scalable Reinforcement Learning framework

...The framework supports multiple parallelism strategies, including tensor, pipeline, and data parallelism, allowing it to leverage large GPU clusters effectively. It is built with compatibility in mind, supporting popular model families such as LLaMA, Qwen, and diffusion-based world models, as well as integration with Hugging Face ecosystems. Cosmos-RL also includes support for advanced RL algorithms, low-precision training, and fault-tolerant execution, making it suitable for large-scale production workloads.

Downloads: 0 This Week

Last Update: 2 days ago
See Project
25

LTX-2.3

Official Python inference and LoRA trainer package

...Unlike most earlier video generation systems that only produced silent clips, LTX-2 combines video and audio generation in a unified architecture capable of producing coherent audiovisual scenes. The model uses a diffusion-transformer-based architecture designed to generate high-fidelity visual frames while simultaneously producing corresponding audio elements such as speech, music, ambient sound, or effects. This unified approach allows creators to generate complete multimedia sequences where motion, timing, and sound are aligned automatically. ...

Downloads: 92 This Week

Last Update: 2026-03-11
See Project