Showing 613 open source projects for "z-image"

  • 1
    Gemini Next Chat

    Deploy your private Gemini application for free with one click

    Gemini Next Chat is an open-source web application that lets you deploy your own private chat interface powered by Google’s Gemini models (e.g., Gemini 1.5, Gemini 2.0). It is built with Next.js/TypeScript and targets developers and hobbyists who want a self-hosted way to interact with advanced multimodal models (text, image, voice). It supports image recognition, voice-based conversation, plugins (web search, ArXiv search, weather, etc.), and client apps (including a tray app) for added convenience. The project emphasizes “one-click” deployment, aiming to make it easy to spin up a custom chat front end without deep infrastructure setup. It is MIT-licensed and has an active community of contributors; documentation and release notes mention support for newer features such as mixed image+text generation. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    Perception Models

    State-of-the-art Image & Video CLIP, Multimodal Large Language Models

    Perception Models is a state-of-the-art framework developed by Facebook Research for advanced image and video perception tasks. It introduces two primary components: the Perception Encoder (PE) for visual feature extraction and the Perception Language Model (PLM) for multimodal decoding and reasoning. The PE module is a family of vision encoders designed to excel in image and video understanding, surpassing models like SigLIP2, InternVideo2, and DINOv2 across multiple benchmarks. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    SageMaker Training Toolkit

    Train machine learning models within Docker containers

    ...The SageMaker Training Toolkit can be added to any Docker container, making it compatible with SageMaker for training models. If you use a prebuilt SageMaker Docker image for training, this library may already be included. Write a training script (e.g., train.py), then define a container with a Dockerfile that includes the training script and any dependencies; a minimal sketch of such a script follows this entry.
    Downloads: 0 This Week
    Last Update:
    See Project
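To make the steps above concrete, here is a minimal sketch of a toolkit-compatible training script. It assumes the standard SM_* environment variables that the SageMaker Training Toolkit sets inside the container; the file name, data schema, and scikit-learn model are illustrative and not part of the toolkit itself.

```python
# train.py -- illustrative script for a container that uses the SageMaker Training Toolkit.
# The SM_* environment variables are set by the toolkit inside the container;
# the CSV file name, "label" column, and scikit-learn model are assumptions for this sketch.
import os

import joblib
import pandas as pd
from sklearn.linear_model import LogisticRegression

if __name__ == "__main__":
    # Directory where SageMaker mounts the "training" input channel.
    train_dir = os.environ.get("SM_CHANNEL_TRAINING", "/opt/ml/input/data/training")
    # Directory whose contents SageMaker uploads as the model artifact.
    model_dir = os.environ.get("SM_MODEL_DIR", "/opt/ml/model")

    data = pd.read_csv(os.path.join(train_dir, "train.csv"))   # assumed file name
    X, y = data.drop(columns=["label"]), data["label"]         # assumed schema

    model = LogisticRegression(max_iter=1000).fit(X, y)
    joblib.dump(model, os.path.join(model_dir, "model.joblib"))
```

A matching Dockerfile typically installs the sagemaker-training package with pip, copies train.py into the image, and sets the SAGEMAKER_PROGRAM environment variable to train.py so the toolkit knows which script to execute.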
  • 4
    InstantCharacter

    Personalize Any Characters with a Scalable Diffusion Transformer

    InstantCharacter is a tuning-free diffusion transformer framework created by the Tencent Hunyuan / InstantX team that generates images of a specific character (subject) from a single reference image while preserving identity and character features. It adapts a base image generation model with a lightweight adapter, so full fine-tuning of the base model is not required, and it can produce character-preserving generations across downstream tasks (e.g. changing pose, clothing, or scene). Demo scripts and a pipeline API (infer_demo.py, pipeline.py) are included. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 5
    DINOv3

    Reference PyTorch implementation and models for DINOv3

    DINOv3 is the third-generation iteration of Meta’s self-supervised visual representation learning framework, building upon the ideas from DINO and DINOv2. It continues the paradigm of learning strong image representations without labels using teacher–student distillation, but introduces a simplified and more scalable training recipe that performs well across datasets and architectures. DINOv3 removes the need for complex augmentations or momentum encoders, streamlining the pipeline while maintaining or improving feature quality. The model supports multiple backbone architectures, including Vision Transformers (ViT), and can handle larger image resolutions with improved stability during training. ...
    Downloads: 9 This Week
    Last Update:
    See Project
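As a rough illustration of how such a backbone is typically used for feature extraction, the sketch below loads a model through torch.hub and embeds a single image. The entrypoint name dinov3_vits16 is an assumption modeled on the DINOv2 release; check the repository's hubconf and weight-download instructions for the actual model names.

```python
import torch
from PIL import Image
from torchvision import transforms

# Hypothetical hub entrypoint -- consult the DINOv3 repository's hubconf.py for the real
# names and for how to supply the downloaded weights.
backbone = torch.hub.load("facebookresearch/dinov3", "dinov3_vits16")
backbone.eval()

# Standard ImageNet-style preprocessing.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    features = backbone(image)  # one embedding per image, usable for linear probes or retrieval
print(features.shape)
```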
  • 6
    DeepSeek VL

    Towards Real-World Vision-Language Understanding

    ...It also provides inference tools for passing an image plus a prompt through the model to produce text output. DeepSeek-VL is the predecessor of the newer DeepSeek-VL2 model and shares its core design philosophy, though with earlier scaling, fewer enhancements, and some capability tradeoffs.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Imagen - Pytorch

    Implementation of Imagen, Google's Text-to-Image Neural Network

    Implementation of Imagen, Google's text-to-image neural network that beats DALL-E 2, in PyTorch. It is the new SOTA for text-to-image synthesis. Architecturally, it is actually much simpler than DALL-E 2: it consists of a cascading DDPM conditioned on text embeddings from a large pre-trained T5 model (attention network). It also contains dynamic clipping for improved classifier-free guidance, noise level conditioning, and a memory-efficient U-Net design; a condensed usage sketch follows this entry.
    Downloads: 0 This Week
    Last Update:
    See Project
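The sketch below shows how the cascade described above is assembled and trained with the imagen-pytorch package, loosely following the project's README; the network dimensions and two-stage setup are illustrative rather than recommended settings.

```python
import torch
from imagen_pytorch import Unet, Imagen

# A base 64px unet and a 256px super-resolution unet (illustrative sizes).
unet1 = Unet(dim=32, cond_dim=512, dim_mults=(1, 2, 4), num_resnet_blocks=3,
             layer_attns=(False, True, True), layer_cross_attns=(False, True, True))
unet2 = Unet(dim=32, cond_dim=512, dim_mults=(1, 2, 4), num_resnet_blocks=(2, 4, 8),
             layer_attns=(False, False, True), layer_cross_attns=(False, False, True))

# Cascading DDPM conditioned on T5 text embeddings, as described above.
imagen = Imagen(unets=(unet1, unet2), image_sizes=(64, 256),
                timesteps=1000, cond_drop_prob=0.1)

texts = ["a minimalist logo of a fox"] * 4
images = torch.randn(4, 3, 256, 256)  # stand-in for a real training batch

# Each unet in the cascade is trained separately.
for unet_number in (1, 2):
    loss = imagen(images, texts=texts, unet_number=unet_number)
    loss.backward()

# After training, sample with classifier-free guidance.
samples = imagen.sample(texts=texts, cond_scale=3.0)
```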
  • 8
    DreamO

    A Unified Framework for Image Customization

    DreamO is a unified, open-source framework from ByteDance for advanced image customization and generation that consolidates multiple “image manipulation” tasks into a single system, rather than requiring separate specialized models. Built on a diffusion-transformer (DiT) backbone, it supports a diverse set of tasks — including identity preservation, virtual “try-on” (e.g. clothing, accessories), style transfer, IP adaptation (objects/characters), and layout/condition-aware customizations — all handled within the same unified architecture. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 9
    MATLAB Deep Learning Model Hub

    Discover pretrained models for deep learning in MATLAB

    Discover pretrained models for deep learning in MATLAB. Pretrained image classification networks have already learned to extract powerful and informative features from natural images; use them as a starting point to learn a new task via transfer learning. Inputs are RGB images; the output is the predicted label and score.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 10
    AutoGluon

    AutoGluon: AutoML for Image, Text, and Tabular Data

    ...Easily improve/tune your bespoke models and data pipelines, or customize AutoGluon for your use-case. AutoGluon is modularized into sub-modules specialized for tabular, text, or image data. You can reduce the number of dependencies required by solely installing a specific sub-module via: python3 -m pip install <submodule>.
    Downloads: 3 This Week
    Last Update:
    See Project
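As a quick illustration of the tabular sub-module mentioned above, a minimal AutoGluon workflow looks roughly like this; the file paths and the label column name are placeholders.

```python
# Only the tabular sub-module is needed here: python3 -m pip install autogluon.tabular
from autogluon.tabular import TabularDataset, TabularPredictor

train_data = TabularDataset("train.csv")                       # placeholder path
predictor = TabularPredictor(label="target").fit(train_data)   # "target" is the label column

test_data = TabularDataset("test.csv")                         # placeholder path
predictions = predictor.predict(test_data)
print(predictor.leaderboard(test_data))                        # compare the models AutoGluon trained
```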
  • 11
    Generative AI for Beginners (Version 3)

    21 Lessons, Get Started Building with Generative AI

    ...Lessons are split into “Learn” modules for core concepts and “Build” modules with hands-on code in Python and TypeScript, so you can jump in at any point that matches your goals. The course covers everything from model selection, prompt engineering, and chat/text/image app patterns to secure development practices and UX for AI. It also walks through modern application techniques such as function calling, RAG with vector databases, working with open source models, agents, fine-tuning, and using SLMs. Each lesson includes a short video, a written guide, runnable samples for Azure OpenAI, the GitHub Marketplace Model Catalog, and the OpenAI API, plus a “Keep Learning” section for deeper study.
    Downloads: 10 This Week
    Last Update:
    See Project
  • 12
    MMDetection

    An open source object detection toolbox based on PyTorch

    ...It stems from the codebase developed by the MMDet team, who won the COCO Detection Challenge in 2018, and has been continuously developed and improved since that win. MMDetection detects various objects within a given image with high efficiency; its training speed is comparable to, or even faster than, that of other codebases such as Detectron2 and SimpleDet. It supports multiple detection frameworks out of the box, as well as various backbones and methods; a short inference sketch follows this entry.
    Downloads: 0 This Week
    Last Update:
    See Project
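The inference sketch below uses MMDetection's high-level Python API; the config and checkpoint paths are placeholders for whichever detector you download from the model zoo, and the exact result type differs slightly between MMDetection 2.x and 3.x.

```python
from mmdet.apis import init_detector, inference_detector

# Placeholder paths: any config/checkpoint pair from the MMDetection model zoo works here.
config_file = "configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py"
checkpoint_file = "checkpoints/faster_rcnn_r50_fpn_1x_coco.pth"

# Build the detector once, then reuse it for many images.
model = init_detector(config_file, checkpoint_file, device="cuda:0")

# Single-image inference; the result holds predicted boxes, labels, and scores.
result = inference_detector(model, "demo/demo.jpg")
print(result)
```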
  • 13
    SAM 3

    Code for running inference and finetuning with SAM 3 model

    SAM 3 (Segment Anything Model 3) is a unified foundation model for promptable segmentation in both images and videos, capable of detecting, segmenting, and tracking objects. It accepts both text prompts (open-vocabulary concepts like “red car” or “goalkeeper in white”) and visual prompts (points, boxes, masks) and returns high-quality masks, boxes, and scores for the requested concepts. Compared with SAM 2, SAM 3 introduces the ability to exhaustively segment all instances of an...
    Downloads: 118 This Week
    Last Update:
    See Project
  • 14
    HunyuanDiT

    Diffusion Transformer with Fine-Grained Chinese Understanding

    HunyuanDiT is a high-capability text-to-image diffusion transformer with bilingual (Chinese/English) understanding and multi-turn dialogue capability. It trains a diffusion model in latent space using a transformer backbone and integrates a Multimodal Large Language Model (MLLM) to refine captions and support conversational image generation. It supports adapters such as ControlNet, IP-Adapter, and LoRA, and can run under constrained VRAM via distilled versions.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Transformers

    State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX

    ...Text, for tasks like text classification, information extraction, question answering, summarization, translation, and text generation, in over 100 languages. Images, for tasks like image classification, object detection, and segmentation. Audio, for tasks like speech recognition and audio classification. Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets, and then share them with the community on the model hub; a minimal pipeline example follows this entry. At the same time, each Python module defining an architecture is fully standalone and can be modified to enable quick research experiments.
    Downloads: 10 This Week
    Last Update:
    See Project
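The minimal example below uses the pipeline API mentioned above to download and run a pretrained text classifier; the same pattern applies to image and audio tasks by changing the task name.

```python
from transformers import pipeline

# Text: downloads a default pretrained sentiment model on first use.
classifier = pipeline("sentiment-analysis")
print(classifier("Transformers makes pretrained models easy to reuse."))

# Images and audio follow the same pattern, e.g.:
#   detector = pipeline("object-detection")
#   detector("path/to/image.png")
```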
  • 16
    Qwen2.5-Omni

    Capable of understanding text, audio, vision, video

    ...It uses a “Thinker-Talker” architecture and introduces innovations for aligning modalities over time (for example, synchronizing video and audio), robust speech generation, and low-VRAM/quantized versions that make usage more accessible. It achieves state-of-the-art results on many multimodal benchmarks, particularly spoken-language understanding, audio reasoning, and image/video understanding, often matching or outperforming single-modality models of a similar scale. It also supports real-time streaming responses, including natural speech synthesis (text-to-speech) and chunked inputs for low-latency interaction.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17
    FramePack

    Let's make video diffusion practical

    ...The repository demonstrates both packing and unpacking steps, making it straightforward to integrate into preprocessing pipelines. It’s useful for diffusion and generative models that learn from sequential image datasets, as well as classical pipelines that batch many related frames. With a simple API and examples, it invites experimentation on tradeoffs between compression, fidelity, and speed.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 18
    DeepSeek-OCR

    Contexts Optical Compression

    ...It is designed to extract text from images, PDFs, and scanned documents, and integrates with multimodal capabilities that understand layout, context, and visual elements beyond raw character recognition. The system treats OCR not simply as “read the text” but as “understand what the text is doing in the image”—for example distinguishing captions from body text, interpreting tables, or recognizing handwritten versus printed words. It supports local deployment, enabling organizations concerned about privacy or latency to run the pipeline on-premises rather than send sensitive documents to third-party cloud services. The codebase is written in Python with a focus on modularity: you can swap preprocessing, recognition, and post-processing components as needed for custom workflows.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 19
    AI Logo Generator

    A free + OSS logo generator powered by Flux on Together AI

    AI Logo Generator is an open-source tool that lets you create professional-looking logos in seconds from a simple text prompt. It uses the Flux Pro 1.1 model hosted on Together AI to generate logos, so the heavy lifting is done by a state-of-the-art image model while the app focuses on UX and workflow. The project is built with Next.js and TypeScript, and it uses shadcn/ui plus Tailwind CSS for a modern, responsive interface that feels like a polished SaaS product rather than a demo. It integrates Clerk for authentication so users can sign in, save their logo history (planned via a dashboard), and potentially manage usage tied to their own API key. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 20
    Open-Sora

    Open-Sora: Democratizing Efficient Video Production for All

    Open-Sora is an open-source initiative aimed at democratizing high-quality video production. It offers a user-friendly platform that simplifies the complexities of video generation, making advanced video techniques accessible to everyone. The project embraces open-source principles, fostering creativity and innovation in content creation. Open-Sora provides tools, models, and resources to create high-quality videos, aiming to lower the entry barrier for video production and support diverse...
    Downloads: 39 This Week
    Last Update:
    See Project
  • 21
    marqo

    Tensor search for humans

    ...Due to horizontal scalability, Marqo provides lightning-fast query times, even with millions of documents. Marqo helps you configure deep-learning models like CLIP to pull semantic meaning from images. It can seamlessly handle image-to-image, image-to-text and text-to-image search and analytics. Marqo adapts and stores your data in a fully schemaless manner. It combines tensor search with a query DSL that provides efficient pre-filtering. Tensor search allows you to go beyond keyword matching and search based on the meaning of text, images and other unstructured data. ...
    Downloads: 0 This Week
    Last Update:
    See Project
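A minimal sketch of the Marqo Python client against a locally running Marqo instance; the index name, documents, and query are illustrative.

```python
import marqo

# Assumes a Marqo server is already running locally (e.g. via the project's Docker image).
mq = marqo.Client(url="http://localhost:8882")

mq.create_index("movies")  # illustrative index name

# tensor_fields tells Marqo which fields to embed for semantic (tensor) search.
mq.index("movies").add_documents(
    [
        {"Title": "The Travels of Marco Polo", "Description": "A 13th-century travelogue of adventures across Asia."},
        {"Title": "Up the Mountain", "Description": "A documentary about high-altitude climbing."},
    ],
    tensor_fields=["Description"],
)

results = mq.index("movies").search("a journey along the Silk Road")
print(results["hits"][0]["Title"])
```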
  • 22
    ncnn

    High-performance neural network inference framework for mobile

    ncnn is a high-performance neural network inference computing framework designed specifically for mobile platforms. It puts artificial intelligence at your fingertips with no third-party dependencies, and it runs faster than other known open source frameworks on mobile phone CPUs. ncnn lets developers easily deploy deep learning models to mobile platforms and create intelligent apps. It is cross-platform and supports most commonly used CNN networks, including...
    Downloads: 12 This Week
    Last Update:
    See Project
  • 23
    x-unet

    Implementation of a U-net complete with efficient attention

    Implementation of a U-Net complete with efficient attention, as well as the latest research findings. It also supports 3D data (video or CT / MRI scans).
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    'Lightweight' GAN

    Implementation of 'lightweight' GAN, proposed in ICLR 2021

    ...You can test and see how your images will be augmented before they pass into the neural network (if you use augmentation). The general recommendation is to use augmentations suited to your data, and as many as possible; then, after some training, disable the augmentations that are most destructive to the images. You can turn on automatic mixed precision with a single flag, --amp, and should expect training to be about 33% faster while saving up to 40% of memory. Aim is an open-source experiment tracker that logs your training runs and provides a UI to compare them.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Watermark Anything

    Official implementation of Watermark Anything with Localized Messages

    Watermark Anything (WAM) is an advanced deep learning framework for embedding and detecting localized watermarks in digital images. Developed by Facebook Research, it provides a robust, flexible system that allows users to insert one or multiple watermarks within selected image regions while maintaining visual quality and recoverability. Unlike traditional watermarking methods that rely on uniform embedding, WAM supports spatially localized watermarks, enabling targeted protection of specific image regions or objects. The model is trained to balance imperceptibility, ensuring minimal visual distortion, with robustness against transformations and edits such as cropping or motion.
    Downloads: 0 This Week
    Last Update:
    See Project