Showing 533 open source projects for "visual python"

  • 1
    GitDiagram

    AI tool that converts GitHub repositories into interactive diagrams

    GitDiagram is an open source web application designed to help developers quickly understand the structure and architecture of GitHub repositories by automatically generating interactive diagrams. It analyzes repository metadata such as the file tree and project documentation to build a visual representation of how different components of a project relate to one another. It uses an AI-powered pipeline to interpret repository structure and transform that information into system design diagrams...
    Downloads: 3 This Week
  • 2
    DeepSeek-OCR

    Contexts Optical Compression

    ...The codebase is written in Python with a focus on modularity: you can swap preprocessing, recognition, and post-processing components as needed for custom workflows.
    Downloads: 7 This Week
  • 3
    DriveLM

    Driving with Graph Visual Question Answering

    DriveLM is a research-oriented framework and dataset designed to explore how vision-language models can be integrated into autonomous driving systems. The project introduces a new paradigm called graph visual question answering that structures reasoning about driving scenes through interconnected tasks such as perception, prediction, planning, and motion control. Instead of treating autonomous driving as a purely sensor-driven pipeline, DriveLM frames it as a reasoning problem where models...
    Downloads: 0 This Week
  • 4
    LlamaGen

    Autoregressive Model Beats Diffusion

    LlamaGen is an open-source research project that introduces a new approach to image generation by applying the autoregressive next-token prediction paradigm used in large language models to visual generation tasks. Instead of relying on diffusion models, the framework treats images as sequences of tokens that can be generated progressively using transformer architectures similar to those used for text generation. The project explores how scaling autoregressive models and improving image...
    Downloads: 0 This Week
  • 5
    FastVLM

    This repository contains the official implementation of FastVLM

    FastVLM is an efficiency-focused vision-language modeling stack that introduces FastViTHD, a hybrid vision encoder engineered to emit fewer visual tokens and slash encoding time, especially for high-resolution images. Instead of elaborate pruning stages, the design trades off resolution and token count through input scaling, simplifying the pipeline while maintaining strong accuracy. Reported results highlight dramatic speedups in time-to-first-token and competitive quality versus...
    Downloads: 0 This Week
  • 6
    ML Ferret

    Refer and Ground Anything Anywhere at Any Granularity

    Ferret is Apple’s end-to-end multimodal large language model designed specifically for flexible referring and grounding: it can understand references of any granularity (boxes, points, free-form regions) and then ground open-vocabulary descriptions back onto the image. The core idea is a hybrid region representation that mixes discrete coordinates with continuous visual features, so the model can fluidly handle “any-form” referring while maintaining precise spatial localization. The repo...
    Downloads: 0 This Week
  • 7
    Open3D

    A modern library for 3D data processing

    ...Open3D has been used in a number of published research projects and is actively deployed in the cloud. We welcome contributions from the open-source community. Supported compilers: GCC 5.X and later on Linux, Xcode 10 and later on macOS 10.14+, and Visual Studio 2019 and later on Windows.
    Downloads: 19 This Week
  • 8
    PS2 Cover

    PS2 Covers Collection

    PS2 Covers is a large-scale curated repository of PlayStation 2 game cover images designed to be used with emulators such as PCSX2 and DuckStation, providing a complete visual library for enhancing game collections. It organizes cover art by game serial identifiers, allowing automated systems to fetch the correct image for each title. The repository includes both standard and 3D-style covers, supporting different presentation preferences. It is widely used in emulator setups to create...
    Downloads: 0 This Week
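The serial-keyed lookup described in the PS2 Cover entry can be sketched roughly as follows. The directory layout (`<root>/<style>/<SERIAL>.jpg`), the style name `default`, and both function names are assumptions made for illustration; the listing does not document the repository's actual structure.

```python
import re
from pathlib import PurePosixPath

# Hypothetical layout: <root>/<style>/<SERIAL>.jpg. This is an assumption for
# illustration, not the repository's documented structure.
SERIAL_PATTERN = re.compile(r"^([A-Z]{4})[-_ ]?(\d{3})\.?(\d{2})$")


def normalize_serial(raw: str) -> str:
    """Normalize spellings such as 'slus_200.62' to the form 'SLUS-20062'."""
    match = SERIAL_PATTERN.match(raw.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized serial: {raw!r}")
    prefix, head, tail = match.groups()
    return f"{prefix}-{head}{tail}"


def cover_path(root: str, serial: str, style: str = "default") -> PurePosixPath:
    """Build the path where a cover for the given serial would be expected."""
    return PurePosixPath(root) / style / f"{normalize_serial(serial)}.jpg"
```

A front end could call `cover_path("covers", serial)` for each detected game and fall back to a placeholder image when the file is missing.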
  • 9
    GLM-4.5V

    GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

    GLM-4.5V is the preceding iteration in the GLM-V series that laid much of the groundwork for general multimodal reasoning and vision-language understanding. It embodies the design philosophy of mixing visual and textual modalities into a unified model capable of general-purpose reasoning, content understanding, and generation, while already supporting a wide variety of tasks: from image captioning and visual question answering to content recognition, GUI-based agents, video understanding,...
    Downloads: 2 This Week
  • 10
    Windrecorder

    Windrecorder is a memory search app that records everything

    Windrecorder is an open-source personal memory search engine that continuously records on-screen activity in a highly optimized and storage-efficient format. It captures screen content locally and builds a searchable database using OCR and image understanding, allowing users to rewind and rediscover anything they have previously seen. The system indexes only meaningful visual changes, extracting text, browser data, and contextual information to improve search accuracy and reduce storage...
    Downloads: 1 This Week
  • 11
    UFO³

    Weaving the Digital Agent Galaxy

    UFO is an open-source framework developed by Microsoft for building intelligent agents that automate interactions with graphical user interfaces on the Windows operating system. The system allows users to issue natural language instructions that are translated into automated actions across multiple desktop applications. Using a dual-agent architecture, the framework analyzes both visual interface elements and system control structures in order to understand how applications should be...
    Downloads: 1 This Week
  • 12
    firerpa LAMDA

    The most powerful Android RPA agent framework

    lamda is an Android RPA agent framework that provides visual remote desktop control and automation at scale, geared toward testing, automation validation, and device management. It exposes a clean UI to monitor and interact with connected devices and includes tooling to script actions reliably across apps and OS versions. The project emphasizes low-friction setup and powerful control primitives so teams can move from interactive validation to repeatable automation. A public wiki, releases,...
    Downloads: 1 This Week
  • 13
    HunyuanVideo-Foley

    Multimodal Diffusion with Representation Alignment

    HunyuanVideo-Foley is a multimodal diffusion model from Tencent Hunyuan for high-fidelity Foley (sound effects) audio generation synchronized to video scenes. It is designed to generate audio that matches both visual content and textual semantic cues, for uses such as video production, film, advertising, and games. The model architecture aligns audio, video, and text representations to produce realistic synchronized soundtracks. It produces high-quality 48 kHz audio output suitable for professional...
    Downloads: 1 This Week
  • 14
    HunyuanWorld 1.0

    Generating Immersive, Explorable, and Interactive 3D Worlds

    HunyuanWorld-1.0 is an open-source, simulation-capable 3D world generation model developed by Tencent Hunyuan that creates immersive, explorable, and interactive 3D environments from text or image inputs. It combines the strengths of video-based diversity and 3D-based geometric consistency through a novel framework using panoramic world proxies and semantically layered 3D mesh representations. This approach enables 360° immersive experiences, seamless mesh export for graphics pipelines, and...
    Downloads: 4 This Week
  • 15
    FramePack

    Let's make video diffusion practical

    FramePack explores compact representations for sequences of image frames, targeting tasks where many near-duplicate frames carry redundant information. The idea is to “pack” frames by detecting shared structure and storing differences efficiently, which can accelerate training or inference on video-like data. By reducing I/O and memory bandwidth, datasets become lighter to load while models still see the essential temporal variation. The repository demonstrates both packing and unpacking...
    Downloads: 13 This Week
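The FramePack entry describes packing frames by storing differences between near-duplicate frames. A minimal sketch of that general idea, using plain delta encoding over flattened frames, might look like this. It is a generic illustration and is not taken from the FramePack repository; the names `pack_frames` and `unpack_frames` are hypothetical.

```python
from typing import Dict, List, Tuple

Frame = List[int]        # one frame, flattened to a list of pixel values
Delta = Dict[int, int]   # pixel index -> new value, relative to the previous frame


def pack_frames(frames: List[Frame]) -> Tuple[Frame, List[Delta]]:
    """Keep the first frame in full and every later frame as its changes only."""
    if not frames:
        raise ValueError("need at least one frame")
    base = list(frames[0])
    deltas: List[Delta] = []
    previous = base
    for frame in frames[1:]:
        if len(frame) != len(base):
            raise ValueError("all frames must have the same length")
        # Record only the positions whose value differs from the previous frame.
        deltas.append(
            {i: new for i, (old, new) in enumerate(zip(previous, frame)) if old != new}
        )
        previous = frame
    return base, deltas


def unpack_frames(base: Frame, deltas: List[Delta]) -> List[Frame]:
    """Rebuild the full frame sequence by replaying each delta in order."""
    frames = [list(base)]
    for delta in deltas:
        frame = list(frames[-1])
        for index, value in delta.items():
            frame[index] = value
        frames.append(frame)
    return frames
```

When consecutive frames are nearly identical, each delta holds only a handful of entries, which is where the I/O and memory savings mentioned in the entry would come from.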
  • 16
    DINOv3

    Reference PyTorch implementation and models for DINOv3

    DINOv3 is the third-generation iteration of Meta’s self-supervised visual representation learning framework, building upon the ideas from DINO and DINOv2. It continues the paradigm of learning strong image representations without labels using teacher–student distillation, but introduces a simplified and more scalable training recipe that performs well across datasets and architectures. DINOv3 removes the need for complex augmentations or momentum encoders, streamlining the pipeline while...
    Downloads: 13 This Week
  • 17
    Wan2.1

    Wan2.1: Open and Advanced Large-Scale Video Generative Model

    Wan2.1 is a foundational open-source large-scale video generative model developed by the Wan team, providing high-quality video generation from text and images. It employs advanced diffusion-based architectures to produce coherent, temporally consistent videos with realistic motion and visual fidelity. Wan2.1 focuses on efficient video synthesis while maintaining rich semantic and aesthetic detail, enabling applications in content creation, entertainment, and research. The model supports...
    Downloads: 45 This Week
  • 18
    AdalFlow

    The library to build & auto-optimize LLM applications

    AdalFlow is a framework for building AI-powered automation workflows, enabling users to design and execute intelligent automation pipelines with minimal coding.
    Downloads: 0 This Week
  • 19
    Videomass

    Videomass is a free, open source and cross-platform GUI for FFmpeg

    Videomass is a free, open-source graphical interface for FFmpeg designed to make advanced video and audio processing accessible to both beginners and experienced users. Built in Python using wxPython, it provides a cross-platform environment for managing encoding, conversion, and editing tasks through a visual interface. The software supports multitasking operations, allowing users to process multiple media files simultaneously. It offers extensive configuration options while also providing presets to simplify common workflows. ...
    Downloads: 3 This Week
  • 20
    LongCat-Image

    Foundation model for image generation

    LongCat-Image is an open-source foundation model for image generation and editing created by the LongCat team at Meituan, designed to deliver high-quality visual outputs while remaining efficient and accessible for developers and researchers. Rather than relying on massive parameter counts typical of many cutting-edge models, LongCat-Image achieves strong photorealism, stable structure, and accurate bilingual (Chinese and English) text rendering with a more compact ~6-billion parameter...
    Downloads: 10 This Week
  • 21
    Frontend Slides

    Create beautiful slides on the web using Claude's frontend skills

    Frontend Slides is a lightweight tool that enables users to create visually appealing, animation-rich web presentations without requiring knowledge of CSS or JavaScript by leveraging a guided, interactive workflow. It operates on a “show, don’t tell” philosophy, generating visual previews of styles so users can select their preferred design rather than describing it abstractly. The system produces fully self-contained HTML presentations with inline CSS and JavaScript, eliminating the need...
    Downloads: 1 This Week
  • 22
    CogView4

    CogView4, CogView3-Plus and CogView3 (ECCV 2024)

    CogView4 is the latest generation in the CogView series of vision-language foundation models, developed as a bilingual (Chinese and English) open-source system for high-quality image understanding and generation. Built on top of the GLM framework, it supports multimodal tasks including text-to-image synthesis, image captioning, and visual reasoning. Compared to previous CogView versions, CogView4 introduces architectural upgrades, improved training pipelines, and larger-scale datasets,...
    Downloads: 1 This Week
  • 23
    Super Magic

    All-in-one AI productivity platform with agents, workflows, and IM

    Magic is an open source all-in-one AI productivity platform designed to help organizations build, deploy, and scale AI-driven applications efficiently. It is not a single tool but a complete product ecosystem composed of multiple integrated systems that work together to enhance productivity across different business scenarios. Magic centers around a general-purpose AI agent system called Super Magic, which can autonomously understand tasks, plan actions, execute workflows, and perform error...
    Downloads: 2 This Week
  • 24
    HyperTools

    A Python toolbox for gaining geometric insights

    HyperTools is a library for visualizing and manipulating high-dimensional data in Python. It is built on top of matplotlib (for plotting), seaborn (for plot styling), and scikit-learn (for data manipulation). It provides functions for plotting high-dimensional datasets in 2D or 3D, static and animated plots, a simple API for customizing plot styles, and a set of data manipulation tools including hyperalignment, k-means clustering, and normalization.
    Downloads: 0 This Week
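Normalization, one of the manipulation steps the HyperTools entry lists, is commonly done by z-scoring each feature column. The sketch below shows that operation with the Python standard library only. It does not call the HyperTools API, and the function name `zscore_columns` and the choice of population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import List


def zscore_columns(rows: List[List[float]]) -> List[List[float]]:
    """Rescale each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    centers = [mean(column) for column in columns]
    # A constant column has zero spread; dividing by 1.0 instead leaves it at 0.
    scales = [pstdev(column) or 1.0 for column in columns]
    return [
        [(value - center) / scale for value, center, scale in zip(row, centers, scales)]
        for row in rows
    ]
```

Putting every feature on the same scale keeps a single large-valued column from dominating distance-based steps such as k-means clustering.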
  • 25
    ERAlchemy

    Entity Relation Diagrams generation tool

    ERAlchemy is a tool that generates Entity-Relationship (ER) diagrams from databases or SQLAlchemy models and vice versa. It is useful for database documentation, reverse engineering, and understanding complex schemas. ERAlchemy can export diagrams in formats like Graphviz and Mermaid, making them easy to include in reports or Markdown files.
    Downloads: 0 This Week
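To make the Mermaid export mentioned in the ERAlchemy entry more concrete, here is a small sketch that turns a hand-written schema description into Mermaid `erDiagram` text. It does not use ERAlchemy itself. The function name `to_mermaid_er`, the input shapes, and the assumption that every relation is one-to-many are illustrative choices.

```python
from typing import Dict, List, Tuple

Columns = List[Tuple[str, str]]    # (column type, column name) pairs
Relation = Tuple[str, str, str]    # (parent table, child table, label)


def to_mermaid_er(tables: Dict[str, Columns], relations: List[Relation]) -> str:
    """Render tables and one-to-many relations as Mermaid erDiagram source."""
    lines = ["erDiagram"]
    for name, columns in tables.items():
        lines.append(f"    {name} {{")
        lines.extend(f"        {col_type} {col_name}" for col_type, col_name in columns)
        lines.append("    }")
    for parent, child, label in relations:
        # "||--o{" reads as: exactly one parent to zero or more children.
        lines.append(f"    {parent} ||--o{{ {child} : {label}")
    return "\n".join(lines)


if __name__ == "__main__":
    schema = {
        "AUTHOR": [("int", "id"), ("string", "name")],
        "BOOK": [("int", "id"), ("int", "author_id")],
    }
    print(to_mermaid_er(schema, [("AUTHOR", "BOOK", "writes")]))
```

The resulting text can be pasted into a Mermaid code block in a Markdown file to render the diagram.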