Showing 366 open source projects for "visual"

  • 1
    LangGraph Studio

    Desktop app for prototyping and debugging LangGraph applications

    LangGraph Studio offers a new way to develop LLM applications by providing a specialized agent IDE that enables visualization, interaction, and debugging of complex agentic applications. With visual graphs and the ability to edit state, you can better understand agent workflows and iterate faster. LangGraph Studio integrates with LangSmith so you can collaborate with teammates to debug failure modes. While in beta, LangGraph Studio is available for free to all LangSmith users on any plan tier. LangGraph Studio requires docker-compose version 2.22.0 or higher. ...
    Downloads: 33 This Week
  • 2
    Ollama Grid Search

    A multi-platform desktop application to evaluate and compare LLMs

    ...Instead of manually testing combinations, the tool performs grid search experiments by iterating across different models, prompt variations, and parameter configurations, allowing users to quickly identify optimal setups for specific tasks. It provides a visual interface where experiment results can be inspected, compared, and refined, making it especially useful for prompt engineering and benchmarking workflows. The system integrates directly with local or remote Ollama servers, enabling seamless access to models already deployed in a user’s environment. It also includes experiment logging and A/B testing capabilities, which allow users to compare outputs side by side and track performance metrics such as latency or token usage.
    Downloads: 2 This Week
  • 3
    Notion AI Avatar

    AI-powered online tool for making Notion-style avatars.

    Notion AI Avatar is an AI-powered web application that allows users to generate custom avatars in the distinctive Notion-style illustration aesthetic through an intuitive and interactive interface. The project focuses on providing a simple yet expressive way to create personalized profile images by combining different visual elements such as facial features, accessories, and styles. It leverages modern web technologies and AI-assisted design techniques to produce visually consistent and appealing avatars without requiring design skills. The tool is particularly popular among users who want cohesive branding for digital platforms, especially those using Notion or similar productivity tools. ...
    Downloads: 2 This Week
  • 4
    ComfyUI-3D-Pack

    An extensive node suite that enables ComfyUI to process 3D inputs

    ...It incorporates modern 3D generation technologies including neural radiance fields, Gaussian splatting, and other AI-driven reconstruction techniques. Through these nodes, users can convert images into 3D models, manipulate geometry, and experiment with generative 3D workflows inside the visual pipeline editor.
    Downloads: 0 This Week
  • 5
    DriveLM

    Driving with Graph Visual Question Answering

    DriveLM is a research-oriented framework and dataset designed to explore how vision-language models can be integrated into autonomous driving systems. The project introduces a new paradigm called graph visual question answering that structures reasoning about driving scenes through interconnected tasks such as perception, prediction, planning, and motion control. Instead of treating autonomous driving as a purely sensor-driven pipeline, DriveLM frames it as a reasoning problem where models answer structured questions about the environment to guide decision making. ...
    Downloads: 0 This Week
  • 6
    LISA

    LISA: Reasoning Segmentation via Large Language Model

    ...The project introduces a framework where a large language model can interpret natural language instructions and produce segmentation masks that highlight relevant regions in an image. Instead of relying solely on predefined object categories, the model is capable of reasoning about complex textual queries and translating them into visual segmentation outputs. This approach allows the system to identify objects or regions in images based on semantic descriptions, contextual reasoning, and world knowledge. The model integrates multimodal capabilities by combining language understanding with visual perception so that text instructions guide the segmentation process. ...
    Downloads: 0 This Week
  • 7
    A.I.G

    Full-stack AI Red Teaming platform

    ...Users can deploy it via Docker or scripts to get a modern web UI that guides them through tasks like scanning third-party frameworks for known CVEs and experimenting with prompt security against attack vectors. The tool provides both a visual interface and a comprehensive API, making integration with internal security systems or CI/CD pipelines practical for ongoing risk management.
    Downloads: 2 This Week
  • 8
    ML Ferret

    Refer and Ground Anything Anywhere at Any Granularity

    Ferret is Apple’s end-to-end multimodal large language model designed specifically for flexible referring and grounding: it can understand references of any granularity (boxes, points, free-form regions) and then ground open-vocabulary descriptions back onto the image. The core idea is a hybrid region representation that mixes discrete coordinates with continuous visual features, so the model can fluidly handle “any-form” referring while maintaining precise spatial localization. The repo presents the vision-language pipeline, model assets, and paper resources that show how Ferret answers questions, follows instructions, and returns grounded outputs rather than just text. In practice, this enables tasks like “find that small red icon next to the chart and describe it” where both the linguistic reference and the visual region are ambiguous without fine spatial reasoning.
    Downloads: 0 This Week
  • 9
    VMZ (Video Model Zoo)

    VMZ: Model Zoo for Video Modeling

    The codebase was designed to help researchers and practitioners quickly reproduce FAIR’s results and leverage robust pre-trained backbones for downstream tasks. It also integrates Gradient Blending, an audio-visual modeling method that fuses modalities effectively (available in the Caffe2 implementation). Although VMZ is now archived and no longer actively maintained, it remains a valuable reference for understanding early large-scale video model training, transfer learning, and multimodal integration strategies that influenced modern architectures like SlowFast and X3D.
    Downloads: 2 This Week
  • 10
    HunyuanVideo-Foley

    Multimodal Diffusion with Representation Alignment

    HunyuanVideo-Foley is a multimodal diffusion model from Tencent Hunyuan for high-fidelity Foley (sound effects) audio generation synchronized to video scenes. It is designed to generate audio that matches both visual content and textual semantic cues, for use in video production, film, advertising, games, and similar media. The model architecture aligns audio, video, and text representations to produce realistic synchronized soundtracks. It produces high-quality 48 kHz audio output suitable for professional use, using a hybrid architecture that combines multimodal transformer blocks with unimodal refinement blocks. ...
    Downloads: 2 This Week
  • 11
    X-AnyLabeling

    Effortless data labeling with AI support from Segment Anything

    ...It supports labeling tasks across images and videos and enables developers to prepare training datasets for tasks such as object detection, segmentation, classification, tracking, and pose estimation. The tool is built with an interactive graphical interface that simplifies annotation workflows and allows users to draw and edit labels directly on visual data. It also supports a wide range of export formats compatible with popular machine learning pipelines, making it easier to integrate with training frameworks.
    Downloads: 29 This Week
  • 12
    StoryGen Atelier

    AI-assisted storyboard and video generation tool

    StoryGen Atelier is an advanced creative tool that blends AI with visual storytelling, making it possible to generate fully structured storyboards and stitched videos from text prompts without requiring manual art or animation skills. Users begin with natural language descriptions of their story or scene, and the system uses state-of-the-art large models to generate both the script and corresponding frames.
    Downloads: 1 This Week
  • 13
    Rowboat

    Open-source AI coworker, with memory

    Rowboat is an open-source, local-first AI automation and multi-agent development platform designed to help developers and knowledge workers create, orchestrate, and manage intelligent workflows with minimal boilerplate and maximum flexibility. It functions as both an AI-powered IDE and a CLI that lets you build multi-agent systems using natural language prompts, connect to MCP and agent tool servers, and integrate automations into everyday work tasks like summarizing emails or generating...
    Downloads: 17 This Week
  • 14
    Lightpanda Browser

    Lightpanda: the headless browser designed for AI and automation

    ...This design allows it to execute JavaScript and interact with web pages while avoiding the overhead associated with rendering images, fonts, and layout elements intended for visual display. The browser is implemented using the Zig programming language and integrates the V8 JavaScript engine to run modern web applications and scripts efficiently. Because it avoids graphical rendering and other heavy browser components, the system uses significantly less memory and launches almost instantly compared to conventional browsers such as Chrome.
    Downloads: 27 This Week
  • 15
    emgucv

    Cross-platform .NET wrapper for the OpenCV image processing library

    Emgu CV is a cross-platform .NET wrapper for the OpenCV image processing library that allows OpenCV functions to be called from .NET-compatible languages. The wrapper can be compiled with Visual Studio and Unity, and it runs on Windows, Linux, macOS, iOS, and Android.
    Downloads: 4 This Week
  • 16
    JimuReport

    Open source drag-and-drop reporting and dashboard builder platform

    JimuReport is an open source data visualization and reporting platform designed to help developers and organizations build reports, dashboards, and large-screen data displays through a visual interface. It provides an online report designer that uses an Excel-like editing experience, allowing users to construct reports with drag-and-drop components and cell-based layouts. It focuses on simplifying complex report development by enabling visual configuration instead of manual coding. JimuReport supports traditional report generation, print templates, and modern dashboard visualizations for business intelligence scenarios. ...
    Downloads: 2 This Week
  • 17
    CodeMirror MCP

    CodeMirror extension to hook up the Model Context Protocol (MCP)

    The codemirror-mcp project is a CodeMirror extension that integrates the Model Context Protocol (MCP) into the CodeMirror editor. This extension enhances the editor's capabilities by providing features such as autocompletion for resource mentions and prompt commands, as well as visual styling for these elements. It aims to streamline the user experience when working with MCP within the CodeMirror environment.
    Downloads: 0 This Week
  • 18
    dtreeviz

    Python library for decision tree visualization & model interpretation

    ...Visualizing decision trees is a tremendous aid when learning how these models work and when interpreting models. The visualizations are inspired by an educational animation by R2D3, “A visual introduction to machine learning.” Please see “How to visualize decision trees” for a deeper discussion of our decision tree visualization library and the visual design decisions we made.
    Downloads: 0 This Week
  • 19
    Qwen3.5

    Qwen3.5 is the large language model series developed by the Qwen team

    ...Qwen3.5 builds on earlier Qwen generations by improving multilingual understanding, reasoning ability, and efficiency, while also introducing native multimodal capabilities that allow the model to work with both language and visual inputs. Architecturally, the system leverages modern large-scale training techniques and mixture-of-experts style efficiency so that very large parameter counts can be used while keeping inference practical.
    Downloads: 17 This Week
  • 20
    VOID

    Video Object and Interaction Deletion

    VOID is an advanced AI video processing system developed by Netflix that focuses on removing objects from videos while preserving the physical and visual realism of the surrounding environment. Unlike traditional inpainting methods that only erase pixels or simple artifacts, VOID models the full interaction dynamics between objects and their environment, including shadows, reflections, and even physical consequences such as movement or balance changes. Built on top of transformer-based architectures and fine-tuned for video inpainting tasks, the system uses interaction-aware mask conditioning to ensure temporal consistency across frames. ...
    Downloads: 1 This Week
  • 21
    opcode

    A powerful GUI app and toolkit for Claude Code

    opcode is an open source desktop application and toolkit designed to enhance the developer experience when working with Claude Code by providing a graphical interface and advanced workflow management tools. The project acts as a command center for AI-assisted programming, bridging the gap between command-line workflows and modern visual development environments. Built using the Tauri framework, opcode enables developers to manage multiple Claude sessions, create custom agents, and track usage in a centralized interface. The platform is intended to make AI-assisted coding more intuitive by providing visual tools for monitoring agent activity, organizing projects, and reviewing development timelines. ...
    Downloads: 1 This Week
  • 22
    Screenshot to Code

    A neural network that transforms a design mock-up into a static website

    Screenshot-to-code is a research prototype that attempts to convert UI screenshots (e.g., of mobile or web UIs) into code, generating layouts as HTML, CSS, or other markup from image inputs. It belongs to the research and proof-of-concept space of UI automation and image-to-UI code generation. The project maps visual design elements to code constructs, outputs code/UI layouts (HTML, CSS, or markup), and includes example and demo scripts showing image-to-UI-code conversion.
    Downloads: 0 This Week
  • 23
    ManiSkill

    SAPIEN Manipulation Skill Framework

    ...Developed by the Hao Su Lab, it focuses on robotic manipulation with diverse, high-quality 3D tasks designed to challenge perception, control, and planning in robotics. ManiSkill provides both low-level control and visual observation spaces for realistic learning scenarios.
    Downloads: 0 This Week
  • 24
    RuoYi AI

    Enterprise AI platform for building, deploying, and managing apps

    ...RuoYi AI includes built-in support for retrieval-augmented generation, enabling organizations to create secure, private knowledge bases with high-accuracy search and reasoning capabilities. It also offers visual workflow orchestration tools that allow users to design complex AI pipelines, automate tasks, and coordinate multi-agent systems for advanced decision-making scenarios. In addition to backend capabilities, RuoYi AI includes frontend components and administrative dashboards built with modern web technologies, making it a complete end-to-end solution.
    Downloads: 2 This Week
  • 25
    HunyuanWorld 1.0

    Generating Immersive, Explorable, and Interactive 3D Worlds

    ...The architecture integrates panoramic proxy generation, semantic layering, and hierarchical 3D reconstruction to produce high-quality scene-scale 3D worlds from both text and images. HunyuanWorld-1.0 surpasses existing open-source methods in visual quality and geometric consistency, demonstrated by superior scores in BRISQUE, NIQE, Q-Align, and CLIP metrics.
    Downloads: 2 This Week
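Ollama Grid Search (entry 2) describes a technique rather than only a product: it iterates across models, prompt variations, and parameter configurations, then lets users compare outputs and metrics such as latency. A minimal Python sketch of that enumeration is below. It is not the project's code; the `grid_search` and `fake_run` functions, the model names, and the single `temperature` parameter are hypothetical placeholders, and a real setup would replace `fake_run` with a request to an Ollama server.

```python
import time
from itertools import product
from typing import Callable


def grid_search(
    models: list[str],
    prompts: list[str],
    temperatures: list[float],
    run: Callable[[str, str, float], str],
) -> list[dict]:
    """Run every (model, prompt, temperature) combination once.

    Returns one record per combination with the output and wall-clock latency.
    """
    results = []
    # product() yields the full Cartesian grid; the last list varies fastest.
    for model, prompt, temperature in product(models, prompts, temperatures):
        start = time.perf_counter()
        output = run(model, prompt, temperature)
        latency_s = time.perf_counter() - start
        results.append({
            "model": model,
            "prompt": prompt,
            "temperature": temperature,
            "output": output,
            "latency_s": latency_s,
        })
    return results


# Hypothetical stand-in for a model call; a real version would send the
# request to an Ollama server and return the generated text.
def fake_run(model: str, prompt: str, temperature: float) -> str:
    return f"{model}|{prompt}|{temperature}"


results = grid_search(
    models=["model-a", "model-b"],
    prompts=["Summarize X", "Explain X"],
    temperatures=[0.2, 0.8],
    run=fake_run,
)
print(len(results))  # 8 combinations: 2 models x 2 prompts x 2 temperatures

# Side-by-side view of one prompt across models and temperatures.
for r in results:
    if r["prompt"] == "Summarize X":
        print(r["model"], r["temperature"], r["output"])
```

Because the grid grows multiplicatively, adding a third temperature to this example would raise the number of runs from 8 to 12, which is why such tools usually expose the grid dimensions explicitly rather than hiding them.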