Showing 194 open source projects for "visual python"

  • 1
    video-use

    Edit videos with Claude Code

    Video Use is an open-source AI-powered video editing tool that allows users to transform raw footage into polished videos using natural language commands. Designed to work with Claude Code, it automates the entire editing process—from cutting clips to rendering the final output—without requiring manual timelines or complex software interfaces. The system intelligently analyzes audio transcripts and visual cues to make precise, context-aware editing decisions. It supports a wide range of...
    Downloads: 18 This Week
  • 2
    ComfyUI-3D-Pack

    An extensive node suite that enables ComfyUI to process 3D inputs

    ComfyUI-3D-Pack is an extension package for the ComfyUI visual AI workflow environment that enables users to generate and manipulate 3D assets using advanced machine learning techniques. ComfyUI itself is a node-based interface for designing and executing generative AI pipelines, and this extension expands its capabilities by introducing nodes specifically designed for working with three-dimensional data. The package allows the platform to process inputs such as meshes and UV textures and...
    Downloads: 2 This Week
  • 3
    VideoRAG

    VideoRAG: Chat with Your Videos

    VideoRAG is a retrieval-augmented generation (RAG) framework tailored for video content that enables AI systems to answer questions, summarize, and reason over long videos by combining visual embeddings with contextual search. The system works by first breaking video into clips, extracting visual and audio-textual features, and indexing them into embeddings, then using an LLM with a retriever to pull relevant segments on demand. When a user query is received, VideoRAG locates semantically...
    Downloads: 0 This Week
  • 4
    DeepWiki Open

    AI-Powered Wiki Generator for GitHub/GitLab/Bitbucket Repositories

    DeepWiki Open is an open-source, AI-powered wiki generator that automatically creates fully navigable, richly structured wiki documentation for GitHub, GitLab, or Bitbucket repositories by combining code analysis, vector embeddings, retrieval-augmented generation (RAG), and visualization tools. Users can enter a repository URL and the system will clone the project, build semantic embeddings of its codebase, extract architecture and relationships, generate human-readable documentation, and...
    Downloads: 5 This Week
  • 5
    Coze Studio

    An AI agent development platform with all-in-one visual tools

    Coze Studio is ByteDance’s open‑source, visual AI agent development platform. It offers no-code/low-code workflows to build, debug, and deploy conversational agents, integrating prompting, RAG-based knowledge bases, plugin systems, and workflow orchestration. Developed in Go (backend) and React/TypeScript (frontend), it uses a containerized microservices architecture suitable for enterprise deployment.
    Downloads: 3 This Week
  • 6
    LlamaParse

    Parse files for optimal RAG

    LlamaParse is a GenAI-native document parser that can parse complex document data for any downstream LLM use case (RAG, agents). Load in 160+ data sources and data formats, from unstructured and semi-structured to structured data (APIs, PDFs, documents, SQL, etc.). Store and index your data for different use cases. Integrate with 40+ vector stores, document stores, graph stores, and SQL database providers.
    Downloads: 1 This Week
  • 7
    X-AnyLabeling

    Effortless data labeling with AI support from Segment Anything

    X-AnyLabeling is an open-source data annotation platform designed to streamline the process of labeling datasets for computer vision and multimodal AI applications. The software integrates an AI-powered labeling engine that allows users to generate annotations automatically with the assistance of modern vision models such as Segment Anything and various object detection frameworks. It supports labeling tasks across images and videos and enables developers to prepare training datasets for...
    Downloads: 72 This Week
  • 8
    Langflow

    Low-code app builder for RAG and multi-agent AI applications

    Langflow is a low-code app builder for RAG and multi-agent AI applications. It’s Python-based and agnostic to any model, API, or database.
    Downloads: 5 This Week
  • 9
    Depth Anything 3

    Recovering the Visual Space from Any Views

    Depth Anything 3 is a research-driven project that brings accurate and dense depth estimation to any input image or video, enabling foundational understanding of 3D structure from 2D visual content. Designed to work across diverse scenes, lighting conditions, and image types, it uses advanced neural networks trained on large, heterogeneous datasets, producing depth maps that reveal scene depth relationships and object surfaces with strong fidelity. The model can be applied to photography,...
    Downloads: 4 This Week
  • 10
    LISA

    LISA: Reasoning Segmentation via Large Language Model

    LISA is an open-source multimodal AI system designed to enable language models to perform pixel-level reasoning and segmentation tasks on images. The project introduces a framework where a large language model can interpret natural language instructions and produce segmentation masks that highlight relevant regions in an image. Instead of relying solely on predefined object categories, the model is capable of reasoning about complex textual queries and translating them into visual...
    Downloads: 1 This Week
  • 11
    StarVector

    StarVector is a foundation model for SVG generation

    StarVector is a multimodal foundation model designed for generating Scalable Vector Graphics (SVG) from images or textual descriptions. The system treats vector graphics creation as a code generation problem, producing SVG code that can render detailed vector images. Its architecture combines computer vision techniques with language modeling capabilities so it can understand visual inputs and textual prompts simultaneously. The model converts raster images or text instructions into...
    Downloads: 1 This Week
  • 12
    NVIDIA AI Blueprint

    Suite of reference architectures for building GPU-accelerated vision

    ...The project is organized around real-time video intelligence, downstream analytics, and agentic offline processing. It supports workflows such as natural-language video search, visual question answering, long-video summarization, clip retrieval, verified alerts, and incident analysis. It is designed for technical users who need deployable reference architectures for smart spaces, warehouse automation, SOP validation, monitoring, and operational video analytics. The repository includes Python agent code, Docker Compose deployment configurations, skills, scripts, and a Next.js-based UI.
    Downloads: 0 This Week
  • 13
    Elyra

    Elyra extends JupyterLab with an AI centric approach

    Elyra is a set of AI-centric extensions to JupyterLab Notebooks. The Elyra Getting Started Guide includes more details on these features. A version-specific summary of new features is located on the releases page.
    Downloads: 0 This Week
  • 14
    Janus

    Unified Multimodal Understanding and Generation Models

    Janus is a sophisticated open-source project from DeepSeek AI that aims to unify both visual understanding and image generation in a single model architecture. Rather than having separate systems for “look and describe” and “prompt and generate”, Janus uses an autoregressive transformer framework with a decoupled visual encoder—allowing it to ingest images for comprehension and to produce images from text prompts with shared internal representations. The design tackles long-standing...
    Downloads: 0 This Week
  • 15
    dtreeviz

    Python library for decision tree visualization & model interpretation

    ...The visualizations are inspired by "A Visual Introduction to Machine Learning," an educational animation by R2D3. Please see "How to visualize decision trees" for a deeper discussion of our decision tree visualization library and the visual design decisions we made.
    Downloads: 0 This Week
  • 16
    AstronRPA

    Agent-ready RPA suite with visual workflow automation tools engine

    Astron RPA is an enterprise-grade robotic process automation platform designed to help organizations and developers build automated workflows for desktop and web applications. It provides a visual workflow designer that supports low-code and no-code development, allowing users to create automation processes through a drag-and-drop interface instead of writing extensive code. It enables automation of common desktop software and browser-based tasks, making it suitable for repetitive business...
    Downloads: 0 This Week
  • 17
    DeepSeek VL2

    Mixture-of-Experts Vision-Language Models for Advanced Multimodal

    DeepSeek-VL2 is DeepSeek’s vision + language multimodal model—essentially the next-gen successor to their first vision-language models. It combines image and text inputs into a unified embedding / reasoning space so that you can query with text and image jointly (e.g. “What’s going on in this scene?” or “Generate a caption appropriate to context”). The model supports both image understanding (vision tasks) and multimodal reasoning, and is likely used as a component in agent systems to...
    Downloads: 8 This Week
  • 18
    LTX-2

    Python inference and LoRA trainer package for the LTX-2 audio–video

    LTX-2 is an open-source audio-video generation model developed by Lightricks. The repository packages Python inference code and a LoRA trainer for the model.
    Downloads: 25 This Week
  • 19
    Watermark-Removal

    Machine learning image inpainting task that removes watermarks

    The Watermark-Removal repository is a machine learning project focused on removing visible watermarks from digital images using deep learning and image inpainting techniques. The system analyzes an image containing a watermark and attempts to reconstruct the underlying visual content so that the watermark is removed while preserving the original appearance of the image. The project uses neural network models inspired by research in contextual attention and gated convolution, which are methods...
    Downloads: 1 This Week
  • 20
    A.I.G

    Full-stack AI Red Teaming platform

    AI-Infra-Guard is a powerful open-source security platform from Tencent’s Zhuque Lab designed to assess the safety and resilience of AI infrastructures, codebases, and components through automated scanning and evaluation tools. It brings together AI infrastructure vulnerability scanning, MCP server risk analysis, and jailbreak evaluation into a unified workflow so that enterprises and individuals can identify critical security issues without relying on external services. Users can deploy it...
    Downloads: 2 This Week
  • 21
    Watermark Anything

    Official implementation of Watermark Anything with Localized Messages

    Watermark Anything (WAM) is an advanced deep learning framework for embedding and detecting localized watermarks in digital images. Developed by Facebook Research, it provides a robust, flexible system that allows users to insert one or multiple watermarks within selected image regions while maintaining visual quality and recoverability. Unlike traditional watermarking methods that rely on uniform embedding, WAM supports spatially localized watermarks, enabling targeted protection of...
    Downloads: 2 This Week
  • 22
    Screenshot to Code

    A neural network that transforms a design mock-up into static websites

    Screenshot-to-code is a tool or prototype that attempts to convert UI screenshots (e.g., of mobile or web UIs) into code representations, likely generating layouts, HTML, CSS, or markup from image inputs. It is part of a research/proof-of-concept domain in UI automation and image-to-UI code generation. The project covers mapping visual design to code constructs, producing code or UI layout (HTML, CSS, or markup), and example or demo scripts showing the path from image to UI to code.
    Downloads: 0 This Week
  • 23
    ManiSkill

    SAPIEN Manipulation Skill Framework

    ManiSkill is a benchmark platform for training and evaluating reinforcement learning agents on dexterous manipulation tasks using physics-based simulations. Developed by the Hao Su Lab, it focuses on robotic manipulation with diverse, high-quality 3D tasks designed to challenge perception, control, and planning in robotics. ManiSkill provides both low-level control and visual observation spaces for realistic learning scenarios.
    Downloads: 0 This Week
  • 24
    GitDiagram

    AI tool that converts GitHub repositories into interactive diagrams

    GitDiagram is an open source web application designed to help developers quickly understand the structure and architecture of GitHub repositories by automatically generating interactive diagrams. It analyzes repository metadata such as the file tree and project documentation to build a visual representation of how different components of a project relate to one another. It uses an AI-powered pipeline to interpret repository structure and transform that information into system design diagrams...
    Downloads: 3 This Week
  • 25
    DriveLM

    Driving with Graph Visual Question Answering

    DriveLM is a research-oriented framework and dataset designed to explore how vision-language models can be integrated into autonomous driving systems. The project introduces a new paradigm called graph visual question answering that structures reasoning about driving scenes through interconnected tasks such as perception, prediction, planning, and motion control. Instead of treating autonomous driving as a purely sensor-driven pipeline, DriveLM frames it as a reasoning problem where models...
    Downloads: 0 This Week