Showing 323 open source projects for "images"

  • 1
    Wan2.2

    Wan2.2: Open and Advanced Large-Scale Video Generative Model

    Wan2.2 is a major upgrade to the Wan series of open, large-scale video generative models, aimed at improving both video generation quality and efficiency. It introduces a Mixture-of-Experts (MoE) architecture that splits the denoising process across specialized expert models, increasing total model capacity without raising per-step computational cost. Wan2.2 also incorporates carefully curated cinematic aesthetic data, enabling precise control over lighting,...
    Downloads: 111 This Week
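The expert split described for Wan2.2 can be illustrated with a toy timestep router. This is a minimal sketch, not Wan2.2's code: it assumes two hypothetical experts (`high_noise` for early, noisier steps and `low_noise` for later steps), a hypothetical `switch_step` threshold, and made-up parameter counts. It shows why only one expert runs per denoising step, so per-step compute matches a single expert while total capacity is the sum of both.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Expert:
    """Stand-in for one denoising network (hypothetical; real experts are large models)."""
    name: str
    num_params: int  # made-up size, for illustration only
    denoise: Callable[[float, int], float]


def make_router(high_noise: Expert, low_noise: Expert, switch_step: int) -> Callable[[int], Expert]:
    """Return a function that picks exactly one expert per timestep.

    Timesteps count down to 0; steps at or above `switch_step` are treated as
    high-noise (an assumed threshold, not a value from the project).
    """
    def route(t: int) -> Expert:
        return high_noise if t >= switch_step else low_noise
    return route


def sample(x: float, num_steps: int, route: Callable[[int], Expert]) -> Tuple[float, List[str]]:
    """Run a toy denoising loop and record which expert handled each step."""
    used: List[str] = []
    for t in reversed(range(num_steps)):
        expert = route(t)
        x = expert.denoise(x, t)
        used.append(expert.name)
    return x, used


# Toy experts: each "denoise" just shrinks the value.
high = Expert("high_noise", 10, lambda x, t: x * 0.9)
low = Expert("low_noise", 10, lambda x, t: x * 0.99)
route = make_router(high, low, switch_step=5)

x_final, used = sample(1.0, num_steps=10, route=route)
total_params = high.num_params + low.num_params          # capacity: both experts exist
active_per_step = max(high.num_params, low.num_params)   # compute: one expert per step
print(used)
print(total_params, active_per_step)
```

Routing by timestep rather than by token keeps the sketch simple; it is one possible way to "split the denoising process", chosen here only for illustration.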
  • 2
    ComfyUI-HunyuanVideoWrapper

    ComfyUI wrapper nodes for HunyuanVideo

    ...It supports prompt-based referencing of images, where placeholders in text correspond to connected inputs, allowing fine control over generation behavior. The project is particularly useful for creators experimenting with multimodal AI video synthesis.
    Downloads: 1 This Week
  • 3
    DeepSeek-OCR 2

    Visual Causal Flow

    ...It is designed to handle complex layouts and noisy documents by giving the model causal reasoning capabilities that mimic human visual scanning behavior, enhancing OCR performance on documents with rich spatial structure. The repository provides model code and inference scripts that let researchers and developers run and benchmark the system on both images and PDFs, with support for batch evaluation and optimized pipelines leveraging vLLM and transformers.
    Downloads: 1 This Week
  • 4
    Transformers

    State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX

    ...These models support common tasks in different modalities: text (text classification, information extraction, question answering, summarization, translation, and text generation, in over 100 languages), images (image classification, object detection, and segmentation), and audio (speech recognition and audio classification). Transformers provides APIs to quickly download and use these pretrained models on a given input, fine-tune them on your own datasets, and then share them with the community on our model hub. At the same time, each Python module defining an architecture is fully standalone and can be modified to enable quick research experiments.
    Downloads: 4 This Week
  • 5
    RSS to Telegram Bot

    A Telegram RSS bot that cares about your reading experience

    Downloads: 0 This Week
  • 6
    Local File Organizer

    An AI-powered file management tool that ensures privacy

    ...The project focuses on privacy-first file organization by performing all processing locally rather than sending data to external cloud services. It uses language and vision models to understand the contents of documents, images, and other file types so that files can be grouped intelligently according to their meaning or context. The system scans directories, extracts relevant information from files, and restructures folder hierarchies to make content easier to locate and manage. Through AI-driven analysis, the software can detect themes, topics, and metadata in files, allowing it to organize information in ways that traditional rule-based file managers cannot achieve. ...
    Downloads: 3 This Week
  • 7
    Qwen-Image

    Qwen-Image is a powerful image generation foundation model

    Qwen-Image is a powerful 20-billion-parameter foundation model designed for advanced image generation and precise editing, with a particular strength in rendering complex text across diverse languages, especially Chinese. Built on the MMDiT architecture, it integrates text into images with high fidelity while preserving typographic details and layout coherence. Beyond text rendering, the model handles a wide range of artistic styles, including photorealistic, impressionist, anime, and minimalist aesthetics. Qwen-Image supports sophisticated editing tasks such as style transfer, object insertion and removal, detail enhancement, and even human pose manipulation, making it suitable for both professional and casual users. ...
    Downloads: 7 This Week
  • 8
    1D Visual Tokenization and Generation

    This repo contains the code for a 1D tokenizer and generator

    The 1D Visual Tokenization and Generation project from ByteDance introduces a novel “one-dimensional” tokenizer for images. Instead of representing an image as a large 2D grid of tokens (as in many prior generative and image-modeling systems), it compresses the image into as few as 32 discrete tokens (optionally more). The result is a compact, efficient representation that substantially speeds up generation and reconstruction while retaining strong fidelity. ...
    Downloads: 0 This Week
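The savings claimed for the 1D tokenizer can be made concrete with a little arithmetic. This sketch assumes a 256x256 input and a conventional 2D tokenizer with 16x spatial downsampling as the baseline (both are illustrative assumptions, not figures from the project); the 32-token budget is the minimum mentioned in the description. It compares sequence lengths and the quadratic pairwise cost of full self-attention.

```python
def grid_tokens(height: int, width: int, downsample: int) -> int:
    """Token count for a conventional 2D tokenizer: one token per downsampled patch."""
    assert height % downsample == 0 and width % downsample == 0
    return (height // downsample) * (width // downsample)


def attention_pairs(num_tokens: int) -> int:
    """Pairwise interactions in full self-attention grow quadratically with sequence length."""
    return num_tokens * num_tokens


# Assumed baseline: 256x256 images, 16x spatial downsampling.
tokens_2d = grid_tokens(256, 256, downsample=16)
tokens_1d = 32  # smallest 1D budget mentioned in the project description

print(tokens_2d, tokens_1d)                                       # 256 32
print(tokens_2d // tokens_1d)                                     # 8
print(attention_pairs(tokens_2d) // attention_pairs(tokens_1d))   # 64
```

Under these assumptions the 1D representation is 8x shorter, and the attention cost per generation step drops by roughly 64x; actual speedups depend on the model and are not claimed here.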
  • 9
    CUDA Containers for Edge AI & Robotics

    Machine Learning Containers for NVIDIA Jetson and JetPack-L4T

    ...These containers simplify the deployment of complex machine learning environments by bundling libraries such as CUDA, TensorRT, and deep learning frameworks into reproducible container images. The project is particularly useful for developers building edge AI and robotics systems that rely on GPU-accelerated inference and real-time computer vision. By using containerized environments, developers can ensure that their applications run consistently across different Jetson platforms and JetPack versions. The repository also includes build tools and package management utilities that help automate the process of assembling machine learning environments.
    Downloads: 1 This Week
  • 10
    LLM Vision

    Visual intelligence for your home.

    LLM Vision is an open-source integration for Home Assistant that adds multimodal large language model capabilities to smart home environments. The project enables Home Assistant to analyze images, video files, and live camera feeds using vision-capable AI models. Instead of relying only on traditional object detection pipelines, it allows users to send prompts about visual content and receive contextual descriptions or answers about what is happening in camera footage. The system can process events from surveillance platforms such as Frigate and convert them into meaningful summaries, notifications, or structured data for automation workflows. ...
    Downloads: 1 This Week
  • 11
    Qwen-2.5-VL

    Qwen2.5-VL is a multimodal large language model series

    Qwen2.5-VL builds on Qwen2.5, a series of large language models developed by the Qwen team at Alibaba Cloud and designed to enhance natural language understanding and generation across multiple languages. The Qwen2.5 language models are available in various sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B parameters, catering to diverse computational requirements. Trained on a dataset of up to 18 trillion tokens, Qwen2.5 models show significant improvements in instruction following, long-text generation...
    Downloads: 7 This Week
  • 12
    FastKoko

    Dockerized FastAPI wrapper for the Kokoro-82M text-to-speech model

    FastKoko is a self-hosted text-to-speech server built around the Kokoro-82M model and exposed through a FastAPI backend. It is designed for easy deployment via Docker, with separate CPU and GPU images so that users can choose between pure CPU inference and NVIDIA GPU acceleration. The project exposes an OpenAI-compatible speech endpoint, so existing code that talks to the OpenAI audio API can often be pointed at a Kokoro-FastAPI instance with minimal changes. It supports multiple languages and voicepacks and allows phoneme-based generation for more accurate pronunciation and prosody. ...
    Downloads: 2 This Week
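Because the server exposes an OpenAI-compatible speech endpoint, a client only needs to POST a small JSON body. The sketch below builds such a request with the standard library but does not send it. It follows the shape of OpenAI's `/v1/audio/speech` request (`model`, `input`, `voice`, `response_format`); the host and port (`localhost:8880`), model name (`kokoro`), and voice name (`af_bella`) are assumptions for a hypothetical local deployment, so check the server's own documentation before relying on them.

```python
import json
import urllib.request


def build_speech_request(base_url: str, text: str, voice: str, model: str = "kokoro",
                         response_format: str = "mp3") -> urllib.request.Request:
    """Build (but do not send) a POST to an OpenAI-style /v1/audio/speech endpoint."""
    payload = {
        "model": model,                      # assumed model name
        "input": text,                       # text to synthesize
        "voice": voice,                      # assumed voicepack name
        "response_format": response_format,  # requested audio container
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical local deployment; host, port, and voice are assumptions.
req = build_speech_request("http://localhost:8880", "Hello from a self-hosted TTS server.", voice="af_bella")
print(req.get_method(), req.full_url)

# Sending it requires a running server:
# with urllib.request.urlopen(req) as resp, open("speech.mp3", "wb") as f:
#     f.write(resp.read())
```

Keeping request construction separate from sending makes it easy to repoint the same code at OpenAI or at a self-hosted instance by changing only `base_url` and the model and voice names.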
  • 13
    deepdoctection

    A Repo For Document AI

    deepdoctection is a Python library that applies deep learning to analyze and extract structured data from scanned documents, PDFs, and images, orchestrating document extraction and document layout analysis tasks. It does not implement models itself; instead, it lets you build pipelines on top of well-established libraries for object detection, OCR, and selected NLP tasks, and provides an integrated framework for fine-tuning, evaluating, and running models. ...
    Downloads: 0 This Week
  • 14
    fastdup

    An unsupervised and free tool for image and video dataset analysis

    fastdup is a free tool for quickly extracting insights from image and video datasets. It helps improve the quality of dataset images and labels and reduce data operations costs, and is designed to work at large scale.
    Downloads: 0 This Week
  • 15
    Microsandbox

    Secure local-first microVM sandbox for running untrusted code fast

    ...Microsandbox is particularly geared toward AI agent workflows, offering integrations that enable automated systems to safely run generated code and commands. It also supports standard container images, making it compatible with existing development ecosystems and tooling.
    Downloads: 1 This Week
  • 16
    LLM-Aided OCR Project

    Enhances Tesseract OCR output using LLMs (local or API)

    ...The project is particularly useful for digitizing historical documents, research papers, and scanned materials where traditional OCR often struggles. It also includes tools for processing batches of images or documents, enabling automated document digitization workflows.
    Downloads: 1 This Week
  • 17
    ex-skill

    Distill your ex into an AI Skill

    ex-skill is an experimental AI tooling project that allows users to transform personal memories, particularly past relationships, into interactive AI “skills” that replicate the communication style, personality, and behavioral patterns of a specific individual. The system works by ingesting various forms of personal data such as chat logs, social media content, photos, and user-provided descriptions, then structuring this information into a layered representation that combines memory and...
    Downloads: 16 This Week
  • 18
    NeMo Retriever Library

    Document content and metadata extraction microservice

    NeMo Retriever Library is a scalable microservice framework designed for extracting, structuring, and enriching content from documents to support downstream generative AI applications. It processes various document types by splitting them into components such as text, tables, charts, and images, and then applies OCR and contextual analysis to convert them into structured data formats. The system is built on NVIDIA NIM microservices, enabling high-performance parallel processing and efficient handling of large datasets. It supports multiple extraction strategies for different document formats, balancing accuracy and throughput depending on the use case. ...
    Downloads: 0 This Week
  • 19
    Lightly

    A Python library for self-supervised learning on images

    A Python library for self-supervised learning on images. We, at Lightly, are passionate engineers who want to make deep learning more efficient. That's why, together with our community, we want to popularize the use of self-supervised methods to understand and curate raw image data. Our solution can be applied before any data annotation step, and the learned representations can be used to visualize and analyze datasets.
    Downloads: 2 This Week
  • 20
    TRELLIS.2

    Native and Compact Structured Latents for 3D Generation

    TRELLIS.2 is a cutting-edge open-source model and codebase for high-fidelity 3D asset generation from 2D images, developed to push forward the state of the art in image-to-3D generation. At its core is a novel sparse voxel structure called O-Voxel that jointly encodes both geometry and surface appearance, enabling reconstruction and generation of complex 3D shapes with arbitrary topology, open surfaces, and physically based rendering (PBR) textures.
    Downloads: 18 This Week
  • 21
    Janus

    Unified Multimodal Understanding and Generation Models

    ...Rather than having separate systems for “look and describe” and “prompt and generate”, Janus uses an autoregressive transformer framework with decoupled visual encoding, allowing it to ingest images for comprehension and to produce images from text prompts with shared internal representations. The design tackles a long-standing conflict in multimodal models: a single visual encoder has to serve both analysis (understanding) and synthesis (generation) roles. By splitting those pathways while keeping one unified core transformer, Janus maintains flexibility and achieves strong performance across tasks that previously required distinct architectures. ...
    Downloads: 1 This Week
  • 22
    Cog

    Package and deploy machine learning models using Docker containers

    Cog is an open source tool designed to package machine learning models into standardized, production-ready containers. It simplifies the process of deploying models by automatically generating Docker images based on a simple configuration file, eliminating the need to manually write complex Dockerfiles. Developers can define the runtime environment, dependencies, and Python versions required for their models, allowing Cog to build a consistent container environment that follows best practices. Cog also resolves compatibility issues between frameworks and GPU libraries by automatically selecting compatible combinations of CUDA, cuDNN, and machine learning frameworks such as PyTorch or TensorFlow. ...
    Downloads: 0 This Week
  • 23
    Qwen3-VL-Embedding

    Multimodal embedding and reranking models built on Qwen3-VL

    Qwen3-VL-Embedding (with its companion Qwen3-VL-Reranker) is a state-of-the-art multimodal embedding and reranking model suite built on the open-sourced Qwen3-VL foundation, developed to handle diverse inputs including text, images, screenshots, and videos. The core embedding model maps such inputs into semantically rich vectors in a unified representation space, enabling similarity search, clustering, and cross-modal retrieval. The reranking model then precisely scores relevance between a given query and candidate documents, enhancing retrieval accuracy in complex multimodal tasks. ...
    Downloads: 0 This Week
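The similarity search that such embeddings enable can be sketched with plain cosine similarity. This is a minimal illustration, not the Qwen3-VL-Embedding API: the 3-dimensional vectors and item names below are made up and stand in for real embeddings of an image, a screenshot, and a caption placed in one shared space. It covers only first-stage retrieval; in the described suite, a separate reranker would then rescore the returned candidates.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query: Sequence[float], index: Dict[str, Sequence[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Rank stored items by similarity to the query and return the top k."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for embeddings of mixed-modality items (values are made up).
index = {
    "photo_of_cat.jpg": [0.9, 0.1, 0.0],
    "screenshot_invoice.png": [0.0, 0.2, 0.95],
    "caption: a cat on a sofa": [0.85, 0.2, 0.05],
}
query = [1.0, 0.1, 0.0]  # pretend this embeds the text query "cat"

for doc_id, score in search(query, index):
    print(f"{doc_id}: {score:.3f}")
```

Because image and text items share one vector space, the cat photo and the cat caption both rank above the invoice screenshot for a text query, which is the cross-modal retrieval behavior the description refers to.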
  • 24
    MedGemma

    Collection of Gemma 3 variants trained for performance on medical text and image comprehension

    MedGemma is a collection of specialized open-source AI models created by Google as part of its Health AI Developer Foundations initiative. Built on the Gemma 3 family of transformer models, they are trained for medical text and image comprehension to help accelerate the development of healthcare-focused AI applications. Variants include a 4-billion-parameter multimodal model that can process both medical images and text, and 27-billion-parameter models (text-only and multimodal) that offer deeper clinical reasoning at higher capacity, making them suitable for complex tasks such as medical question answering, summarizing clinical notes, or generating reports from radiology images. The multimodal versions use a SigLIP-based image encoder pre-trained on diverse de-identified medical imaging data.
    Downloads: 0 This Week
  • 25
    Watermark Anything

    Official implementation of Watermark Anything with Localized Messages

    Watermark Anything (WAM) is an advanced deep learning framework for embedding and detecting localized watermarks in digital images. Developed by Facebook Research, it provides a robust, flexible system that allows users to insert one or multiple watermarks within selected image regions while maintaining visual quality and recoverability. Unlike traditional watermarking methods that rely on uniform embedding, WAM supports spatially localized watermarks, enabling targeted protection of specific image regions or objects. ...
    Downloads: 0 This Week