Showing 906 open source projects for "automatic1111-stable-diffusion"

View related business solutions
  • Full-stack observability with actually useful AI | Grafana Cloud Icon
    Full-stack observability with actually useful AI | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • 1
    HunyuanVideo-Foley

    HunyuanVideo-Foley

    Multimodal Diffusion with Representation Alignment

    HunyuanVideo-Foley is a multimodal diffusion model from Tencent Hunyuan for high-fidelity Foley (sound effects) audio generation synchronized to video scenes. It is designed to generate audio that matches both visual content and textual semantic cues, for use in video production, film, advertising, games, etc. The model architecture aligns audio, video, and text representations to produce realistic synchronized soundtracks.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 2
    NVIDIA Model Optimizer

    NVIDIA Model Optimizer

    A unified library of SOTA model optimization techniques

    ...The library is designed to reduce model size and computational requirements while maintaining accuracy, making it particularly valuable for deploying large models in production environments. It supports a wide range of model types, including large language models, diffusion models, and vision-language models, and integrates with deployment frameworks such as TensorRT and vLLM. By providing standardized workflows and APIs, it enables developers to experiment with different optimization strategies and select the best approach for their use case.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 3
    Auto-Photoshop-StableDiffusion-Plugin

    Auto-Photoshop-StableDiffusion-Plugin

    Plug-in that makes it easy to generate stable diffusion images

    Auto-Photoshop-StableDiffusion-Plugin is an open-source extension that seamlessly integrates Stable Diffusion image generation directly into Adobe Photoshop, allowing artists and designers to generate, edit, and refine AI-generated imagery without leaving the Photoshop environment. It bridges Photoshop with popular diffusion backends like AUTOMATIC1111 or ComfyUI, effectively embedding powerful generative tools into a familiar creative workflow so users can apply AI creation to layers, selections, and masks while retaining Photoshop’s full editing capabilities. ...
    Downloads: 19 This Week
    Last Update:
    See Project
  • 4
    HY-Motion 1.0

    HY-Motion 1.0

    HY-Motion model for 3D character animation generation

    HY-Motion 1.0 is an open-source, large-scale AI model suite developed by Tencent’s Hunyuan team that generates high-quality 3D human motion from simple text prompts, enabling the automatic production of fluid, diverse, and semantically accurate animations without manual keyframing or rigging. Built on advanced deep learning architectures that combine Diffusion Transformer (DiT) and flow matching techniques, HY-Motion scales these approaches to the billion-parameter level, resulting in strong instruction-following capabilities and richer motion outputs compared to existing open-source models. The training strategy for the HY-Motion series includes extensive pre-training on thousands of hours of varied motion data, fine-tuning on curated high-quality datasets, and reinforcement learning with human feedback, which improves both the plausibility and adaptability of generated motion sequences.
    Downloads: 3 This Week
    Last Update:
    See Project
  • Fully Managed MySQL, PostgreSQL, and SQL Server Icon
    Fully Managed MySQL, PostgreSQL, and SQL Server

    Automatic backups, patching, replication, and failover. Focus on your app, not your database.

    Cloud SQL handles your database ops end to end, so you can focus on your app.
    Try Free
  • 5
    JiT

    JiT

    PyTorch implementation of JiT

    JiT is an open-source PyTorch implementation of a state-of-the-art image diffusion model designed around a minimalist yet powerful architecture for pixel-level generative modeling, based on the paper Back to Basics: Let Denoising Generative Models Denoise. Rather than predicting noise, JiT models directly predict clean image data, which the research suggests aligns better with the manifold structure of natural images and leads to stronger generative performance at high resolution. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Lyra 2

    Lyra 2

    Project Lyra: Open Generative 3D World Models

    The Lyra 2 project is a research-driven framework developed by NVIDIA that focuses on building open generative 3D world models using advanced diffusion-based techniques. It enables the creation of fully explorable 3D environments from minimal inputs such as a single image or video, leveraging self-distillation methods to generate consistent spatial representations. The system evolves across versions, with newer iterations introducing long-horizon generation and improved 3D consistency across frames. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 7
    LTX-2.3

    LTX-2.3

    Official Python inference and LoRA trainer package

    ...Unlike most earlier video generation systems that only produced silent clips, LTX-2 combines video and audio generation in a unified architecture capable of producing coherent audiovisual scenes. The model uses a diffusion-transformer-based architecture designed to generate high-fidelity visual frames while simultaneously producing corresponding audio elements such as speech, music, ambient sound, or effects. This unified approach allows creators to generate complete multimedia sequences where motion, timing, and sound are aligned automatically. ...
    Downloads: 200 This Week
    Last Update:
    See Project
  • 8
    VibeVoice ComfyUI

    VibeVoice ComfyUI

    ComfyUI integration for Microsoft's VibeVoice text-to-speech model

    ...The integration supports high-quality single-speaker synthesis as well as scripted multi-speaker conversations, with optional voice cloning from audio samples for each speaker. It includes advanced control over generation parameters like attention backend, diffusion steps, sampling temperature, guidance scale, and quantization settings, allowing users to tune the trade-offs between quality, VRAM usage, and speed. The project also introduces first-class LoRA support, making it possible to fine-tune and load custom LoRA adapters that modify voice identity or style while keeping the base VibeVoice model intact.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 9
    Cosmos-RL

    Cosmos-RL

    Cosmos-RL is a flexible and scalable Reinforcement Learning framework

    ...The framework supports multiple parallelism strategies, including tensor, pipeline, and data parallelism, allowing it to leverage large GPU clusters effectively. It is built with compatibility in mind, supporting popular model families such as LLaMA, Qwen, and diffusion-based world models, as well as integration with Hugging Face ecosystems. cosmos-rl also includes support for advanced RL algorithms, low-precision training, and fault-tolerant execution, making it suitable for large-scale production workloads.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8 Monitoring Tools in One APM. Install in 5 Minutes. Icon
    8 Monitoring Tools in One APM. Install in 5 Minutes.

    Errors, performance, logs, uptime, hosts, anomalies, dashboards, and check-ins. One interface.

    AppSignal works out of the box for Ruby, Elixir, Node.js, Python, and more. 30-day free trial, no credit card required.
    Start Free
  • 10
    InstantCharacter

    InstantCharacter

    Personalize Any Characters with a Scalable Diffusion Transformer

    InstantCharacter is a tuning-free diffusion transformer framework created by Tencent Hunyuan / InstantX team, which enables generating images of a specific character (subject) from a single reference image, preserving identity and character features. Uses adapters, so full fine-tuning of the base model is not required. Demo scripts and pipeline API (via infer_demo.py, pipeline.py) included.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    NExT-GPT

    NExT-GPT

    Code and models for ICML 2024 paper, NExT-GPT

    ...Unlike traditional models that primarily handle text, NExT-GPT supports input and output combinations involving text, images, video, and audio in a unified architecture. The system connects a large language model with multimodal encoders and diffusion-based decoders so it can interpret information from different sensory formats and generate responses in different media types. This architecture allows the model to convert between modalities, such as generating images from text descriptions or producing audio or video outputs based on textual prompts. The project also introduces instruction-tuning strategies that enable the model to perform complex multimodal reasoning and generation tasks with minimal additional parameters.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    Oasis

    Oasis

    Inference script for Oasis 500M

    Open-Oasis provides inference code and released weights for Oasis 500M, an interactive world model that generates gameplay frames conditioned on user keyboard input. Instead of rendering a pre-built game world, the system produces the next visual state via a diffusion-transformer approach, effectively “imagining” the world response to your actions in real time. The project focuses on enabling action-conditional frame generation so developers can experiment with interactive, model-generated environments rather than static video generation alone. Because it’s an inference-focused repository, it’s especially useful as a practical reference for running the model, wiring inputs, and producing the autoregressive sequence of gameplay frames. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    HunyuanVideo-Avatar

    HunyuanVideo-Avatar

    Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model

    HunyuanVideo-Avatar is a multimodal diffusion transformer (MM-DiT) model by Tencent Hunyuan for animating static avatar images into dynamic, emotion-controllable, and multi-character dialogue videos, conditioned on audio. It addresses challenges of motion realism, identity consistency, and emotional alignment. Innovations include a character image injection module, an Audio Emotion Module for transferring emotion cues, and a Face-Aware Audio Adapter to isolate audio effects on faces, enabling multiple characters to be animated in a scene. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    SwarmUI

    SwarmUI

    Modular AI image and video generation web UI with extensible tools

    SwarmUI is a modular web-based user interface designed for AI-driven image generation, with a strong focus on usability, performance, and extensibility. It serves as a unified environment for working with multiple AI models, including Stable Diffusion and newer image and video generation systems, allowing users to create and manage outputs through a browser interface. SwarmUI is built to accommodate both beginners and advanced users by offering a simple “Generate” interface alongside more advanced workflow tools that expose deeper configuration options. It integrates with underlying systems like node-based workflows, enabling flexible and customizable pipelines for complex generation tasks. ...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 15
    SD.Next

    SD.Next

    All-in-one WebUI for AI generative image and video creation

    SD.Next is an all-in-one web user interface for generative image creation that expands beyond basic Stable Diffusion workflows to cover broader image and video generation, captioning, and processing tasks. It is designed as a power-user environment where model management, generation features, and workflow controls are centralized in a single UI rather than spread across separate scripts and utilities. The project emphasizes broad model support and includes mechanisms for discovering, downloading, and configuring models through integrated tooling, lowering the setup burden for experimentation. ...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 16
    A One-click installer for Windows: (Python 3.11, Cuda 11.8, Torch 2.1.2) Repository for integration with the StableProjectorz, a free AI-texturing tool. https://stableprojectorz.com Our Discord server: https://discord.gg/aWbnX2qan2 supports float16 and int32 optimizations
    Downloads: 5 This Week
    Last Update:
    See Project
  • 17
    MegaTTS 3

    MegaTTS 3

    Official PyTorch Implementation

    MegaTTS3 is an open-source text-to-speech (TTS) and voice-cloning system from ByteDance that aims to deliver high-quality, expressive speech synthesis, including zero-shot voice cloning of previously unseen speakers. Its backbone is a lightweight diffusion-transformer (on the order of ~0.45 B parameters), which enables efficient inference while still producing high-fidelity audio. Given a reference audio sample (and corresponding latent representation), MegaTTS3 can generate speech in the style and voice timbre of that speaker — useful for personalized TTS, voice-overs, dubbing, or multi-speaker applications. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Grounded-Segment-Anything

    Grounded-Segment-Anything

    Marrying Grounding DINO with Segment Anything & Stable Diffusion

    Grounded-Segment-Anything is a research-oriented project that combines powerful open-set object detection with pixel-level segmentation and subsequent creative workflows, effectively enabling detection, segmentation, and high-level vision tasks guided by free-form text prompts. The core idea behind the project is to pair Grounding DINO — a zero-shot object detector that can locate objects described by natural language — with Segment Anything Model (SAM), which can produce detailed masks for...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 19
    FLUX.1

    FLUX.1

    Official inference repo for FLUX.1 models

    FLUX.1 repository contains inference code and tooling for the FLUX.1 text-to-image diffusion models, enabling developers and researchers to generate and edit images from natural-language prompts using open-weight versions of the model on their own hardware or within custom applications. The project is part of a larger family of FLUX models developed by Black Forest Labs, designed to produce high-quality, detailed visuals from text descriptions with competitive prompt adherence and artistic fidelity. ...
    Downloads: 36 This Week
    Last Update:
    See Project
  • 20
    DreamO

    DreamO

    A Unified Framework for Image Customization

    DreamO is a unified, open-source framework from ByteDance for advanced image customization and generation that consolidates multiple “image manipulation” tasks into a single system, rather than requiring separate specialized models. Built on a diffusion-transformer (DiT) backbone, it supports a diverse set of tasks — including identity preservation, virtual “try-on” (e.g. clothing, accessories), style transfer, IP adaptation (objects/characters), and layout/condition-aware customizations — all handled within the same unified architecture. DreamO’s design introduces a feature routing constraint that helps disentangle different control conditions (like identity, style, clothing) when more than one is specified, which significantly reduces conflicts and artifacts when combining controls. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Step1X-Edit

    Step1X-Edit

    A SOTA open-source image editing model

    Step1X-Edit is a state-of-the-art open-source image editing model/framework that uses a multimodal large language model (LLM) together with a diffusion-based image decoder to let users edit images simply via natural-language instructions plus a reference image. You supply an existing image and a textual command — e.g. “add a ruby pendant on the girl’s neck” or “make the background a sunset over mountains” — and the model interprets the instruction, computes a latent embedding combining the image content and user intent, then decodes a new image implementing the edit. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    Flow Matching

    Flow Matching

    A PyTorch library for implementing flow matching algorithms

    flow_matching is a PyTorch library implementing flow matching algorithms in both continuous and discrete settings, enabling generative modeling via matching vector fields rather than diffusion. The underlying idea is to parameterize a flow (a time-dependent vector field) that transports samples from a simple base distribution to a target distribution, and train via matching of flows without requiring score estimation or noisy corruption—this can lead to more efficient or stable generative training. The library supports both continuous-time flows (via differential equations) and discrete-time analogues, giving flexibility in design and tradeoffs. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 23
    Step1X-3D

    Step1X-3D

    High-Fidelity and Controllable Generation of Textured 3D Assets

    ...It combines a hybrid architecture: a geometry generation stage using a VAE-DiT model to output a watertight 3D representation (e.g. TSDF surface), and a texture synthesis stage that conditions on geometry and optionally reference input (or prompts) to produce view-consistent textures using a diffusion-based texture module. The result is fully 3D assets — meshes + textures — which can be rendered from any viewpoint, textured consistently, and used in 3D applications. To achieve this, the project includes a massive curated dataset: among more than 5 million candidate 3D assets, it filters and standardizes to produce a high-quality 2 million–asset subset suitable for training.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    Step-Video-T2V

    Step-Video-T2V

    State-of-the-art (SoTA) text-to-video pre-trained model

    Step-Video-T2V is a state-of-the-art text-to-video foundation model developed to generate videos from natural-language prompts; its 30B-parameter architecture is designed to produce coherent, temporally extended video sequences — up to around 204 frames — based on input text. Under the hood it uses a compressed latent representation (a Video-VAE) to reduce spatial and temporal redundancy, and a denoising diffusion (or similar) process over that latent space to generate smooth, plausible motion and visuals. The model handles bilingual input (e.g. English and Chinese) thanks to dual encoders, and supports end-to-end text-to-video generation without requiring external assets. Its training and generation pipeline includes techniques like flow-matching, full 3D attention for temporal consistency, and fine-tuning approaches (e.g. video-based DPO) to improve fidelity and reduce artifacts. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Linfa

    Linfa

    A Rust machine learning framework

    linfa aims to provide a comprehensive toolkit to build Machine Learning applications with Rust. Kin in spirit to Python's scikit-learn, it focuses on common preprocessing tasks and classical ML algorithms for your everyday ML tasks.
    Downloads: 0 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB