Showing 345 open source projects for "visual\"

View related business solutions
  • 8 Monitoring Tools in One APM. Install in 5 Minutes. Icon
    8 Monitoring Tools in One APM. Install in 5 Minutes.

    Errors, performance, logs, uptime, hosts, anomalies, dashboards, and check-ins. One interface.

    AppSignal works out of the box for Ruby, Elixir, Node.js, Python, and more. 30-day free trial, no credit card required.
    Start Free
  • Earn up to 16% annual interest with Nexo. Icon
    Earn up to 16% annual interest with Nexo.

    More flexibility. More control.

    Generate interest, access liquidity without selling, and execute trades seamlessly. All in one platform. Geographic restrictions, eligibility, and terms apply.
    Get started with Nexo.
  • 1
    1D Visual Tokenization and Generation

    1D Visual Tokenization and Generation

    This repo contains the code for 1D tokenizer and generator

    The 1D Visual Tokenization and Generation project from ByteDance introduces a novel “one-dimensional” tokenizer designed for images: instead of representing images with large grids of 2D tokens (as in many prior generative/image-modeling systems), it compresses images into as few as 32 discrete tokens (or more, optionally) — thereby achieving a very compact, efficient representation that drastically speeds up generation and reconstruction while retaining strong fidelity.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    AWS Toolkit for Visual Studio Code

    AWS Toolkit for Visual Studio Code

    Local Lambda debug, CodeWhisperer, SAM/CFN syntax, etc.

    The AWS Toolkit extension for Visual Studio Code enables you to interact with Amazon Web Services (AWS). Try the AWS Code Sample Catalog to start coding with the AWS SDK. The AWS Explorer provides access to the AWS services that you can work with when using the Toolkit. To see the AWS Explorer, choose the AWS icon in the Activity bar. The Developer Tools panel is a section for developer-focused tooling curated for working in an IDE.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 3
    DVC Extension for Visual Studio Code

    DVC Extension for Visual Studio Code

    https://github.com/iterative/vscode-dvc

    A Visual Studio Code extension that integrates Data Version Control (DVC) into the development environment, enhancing reproducibility and collaboration for machine learning projects.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    FaceFusion

    FaceFusion

    Industry leading face manipulation platform

    FaceFusion is an open-source face swapping and facial enhancement toolkit designed for high-quality video and image manipulation workflows. The project enables users to replace faces in images or videos while maintaining temporal consistency and visual realism. It integrates modern deep learning models for face detection, alignment, and blending to produce smoother results than traditional approaches. FaceFusion is built with a modular pipeline that allows users to customize processing steps and optimize performance for different hardware environments. The tool is often used in content creation, visual effects experimentation, and research into generative media. ...
    Downloads: 303 This Week
    Last Update:
    See Project
  • Build Securely on Azure with Proven Frameworks Icon
    Build Securely on Azure with Proven Frameworks

    Lay a foundation for success with Tested Reference Architectures developed by Fortinet’s experts. Learn more in this white paper.

    Moving to the cloud brings new challenges. How can you manage a larger attack surface while ensuring great network performance? Turn to Fortinet’s Tested Reference Architectures, blueprints for designing and securing cloud environments built by cybersecurity experts. Learn more and explore use cases in this white paper.
    Download Now
  • 5
    UI-TARS Desktop

    UI-TARS Desktop

    A GUI Agent app based on UI-TARS to control your computer using AI

    UI-TARS Desktop is a graphical user interface (GUI) agent application that leverages the UI-TARS vision-language model to enable natural language control of computers. This cross-platform tool supports both Windows and macOS, allowing users to perform tasks through intuitive commands. Key features include screenshot-based visual recognition, precise mouse and keyboard control, and real-time feedback on actions. Provides immediate responses and visual feedback on actions performed. The application facilitates seamless interaction with the computer, enhancing user experience by simplifying complex operations into straightforward language instructions. Leverages advanced AI to bridge the gap between visual elements and language commands. ...
    Downloads: 46 This Week
    Last Update:
    See Project
  • 6
    Next AI Draw.io

    Next AI Draw.io

    A next.js web application that integrates AI capabilities with draw.io

    Next AI Draw.io is an AI-enhanced diagramming application that integrates generative intelligence into the familiar draw.io-style visual workflow. The project aims to help users create diagrams, flowcharts, and structured visual content more efficiently by leveraging AI-assisted generation and editing capabilities. It combines modern web technologies with diagram automation features to reduce the manual effort typically required in visual design tools. The system is intended for developers, product teams, and technical planners who need rapid diagram creation within collaborative environments. ...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 7
    Pixelle-Video

    Pixelle-Video

    AI Fully Automated Short Video Engine

    ...The system emphasizes modularity, allowing developers to plug in different models or processing steps depending on the use case. It can be used for tasks such as content generation, video editing, or visual storytelling. Overall, Pixelle-Video provides a flexible environment for building AI-powered video generation and processing workflows.
    Downloads: 41 This Week
    Last Update:
    See Project
  • 8
    DeepSeek-OCR 2

    DeepSeek-OCR 2

    Visual Causal Flow

    DeepSeek-OCR-2 is the second-generation optical character recognition system developed to improve document understanding by introducing a “visual causal flow” mechanism, enabling the encoder to reorder visual tokens in a way that better reflects semantic structure rather than strict raster scan order. It is designed to handle complex layouts and noisy documents by giving the model causal reasoning capabilities that mimic human visual scanning behavior, enhancing OCR performance on documents with rich spatial structure. ...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 9
    n8n

    n8n

    Free and source-available fair-code licensed workflow automation tool

    n8n is an extendable workflow automation tool. With a fair-code distribution model, n8n will always have visible source code, be available to self-host, and allow you to add your own custom functions, logic and apps. n8n's node-based approach makes it highly versatile, enabling you to connect anything to everything. n8n has 200+ different nodes to automate workflows.
    Downloads: 861 This Week
    Last Update:
    See Project
  • Go From AI Idea to AI App Fast Icon
    Go From AI Idea to AI App Fast

    One platform to build, fine-tune, and deploy ML models. No MLOps team required.

    Access Gemini 3 and 200+ models. Build chatbots, agents, or custom models with built-in monitoring and scaling.
    Try Free
  • 10
    ClawX

    ClawX

    Desktop app that provides a graphical interface for OpenClaw AI

    ClawX is a cross-platform desktop application that provides a graphical user interface for OpenClaw AI agents, transforming complex command-line orchestration into an accessible visual experience. Built with Electron, React, and TypeScript, the software embeds the OpenClaw runtime directly into the application to deliver a battery-included setup without requiring separate installations. The platform focuses on usability by offering a guided setup wizard, visual configuration panels, and real-time validation, enabling users to deploy AI agents without terminal expertise. ...
    Downloads: 120 This Week
    Last Update:
    See Project
  • 11
    OpenClaw Office

    OpenClaw Office

    OpenClaw Office is the visual monitoring and management frontend

    ...Users can observe communication flows between agents through visual connections, track token usage and operational costs, and analyze performance through integrated dashboards and charts. The system also includes live chat capabilities, allowing users to monitor conversations and tool calls as they occur.
    Downloads: 10 This Week
    Last Update:
    See Project
  • 12
    DESIGN.md

    DESIGN.md

    A format specification for describing a visual identity

    design.md is an open specification created by Google Labs that defines a standardized way to describe design systems for AI coding agents. It allows developers to encode visual identity elements such as colors, typography, spacing, and components in a structured format. The file combines machine-readable design tokens with human-readable explanations, enabling agents to generate consistent user interfaces aligned with a brand. By providing persistent design context, it eliminates the need to repeatedly describe styling requirements to AI tools. ...
    Downloads: 14 This Week
    Last Update:
    See Project
  • 13
    Jaaz

    Jaaz

    Open source multimodal creative AI assistant with infinite canvas tool

    Jaaz is an open source multimodal creative assistant designed to help users generate and organize visual media using artificial intelligence. It functions as a creative workspace where images, videos, and visual storyboards can be produced and arranged on an infinite canvas environment. It combines AI agents with visual editing tools, allowing users to generate media through prompts, sketches, or simple instructions. Jaaz supports multiple AI models and can integrate both local and cloud-based inference systems, enabling flexible creative workflows. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 14
    graphify

    graphify

    AI coding assistant skill (Claude Code, Codex, OpenCode, OpenClaw)

    ...The architecture emphasizes flexibility, enabling users to customize how data is mapped and displayed. It may also include analytical features to explore patterns, clusters, or anomalies within the graph. Overall, graphify serves as a bridge between raw data and visual insight.
    Downloads: 13 This Week
    Last Update:
    See Project
  • 15
    Flowise

    Flowise

    Drag & drop UI to build your customized LLM flow

    Open source UI visual tool to build your customized LLM flow using LangchainJS, written in Node Typescript/Javascript. Conversational agent for a chat model which utilizes chat-specific prompts and buffer memory. Open source is the core of Flowise, and it will always be free for commercial and personal usage. Flowise support different environment variables to configure your instance.
    Downloads: 35 This Week
    Last Update:
    See Project
  • 16
    InternGPT

    InternGPT

    Open source demo platform where you can easily showcase your AI models

    InternGPT is an open-source multimodal AI framework designed to extend large language models beyond text interactions into visual reasoning and image manipulation tasks. The system integrates conversational AI with computer vision models so users can interact with images, videos, and visual environments through natural language instructions. Unlike traditional chat systems that rely solely on text prompts, InternGPT allows users to interact with visual content using both language and nonverbal signals such as pointing or highlighting objects within images. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    ComfyUI-LTXVideo

    ComfyUI-LTXVideo

    LTX-Video Support for ComfyUI

    ComfyUI-LTXVideo is a bridge between ComfyUI’s node-based generative workflow environment and the LTX-Video multimedia processing framework, enabling creators to orchestrate complex video tasks within a visual graph paradigm. Instead of writing code to apply effects, transitions, edits, and data flows, users can assemble nodes that represent video inputs, transformations, and outputs, letting them prototype and automate video production pipelines visually. This integration empowers non-programmers and rapid-iteration teams to harness the performance of LTX-Video while maintaining the clarity and flexibility of a dataflow graph model. ...
    Downloads: 10 This Week
    Last Update:
    See Project
  • 18
    Kimi K2.5

    Kimi K2.5

    Moonshot's most powerful AI model

    Kimi K2.5 is Moonshot AI’s open-source, native multimodal agentic model built through continual pretraining on approximately 15 trillion mixed vision and text tokens. Based on a 1T-parameter Mixture-of-Experts (MoE) architecture with 32B activated parameters, it integrates advanced language reasoning with strong visual understanding. K2.5 supports both “Thinking” and “Instant” modes, enabling either deep step-by-step reasoning or low-latency responses depending on the task. Designed for agentic workflows, it features an Agent Swarm mechanism that decomposes complex problems into coordinated sub-agents executing in parallel. With a 256K context length and MoonViT vision encoder, the model excels across reasoning, coding, long-context comprehension, image, and video benchmarks. ...
    Downloads: 27 This Week
    Last Update:
    See Project
  • 19
    OpenClaw

    OpenClaw

    Your own personal AI assistant. Any OS. Any Platform.

    OpenClaw (formerly Clawdbot/Moltbot) is an open-source, self-hosted autonomous AI assistant designed to run on user-controlled hardware and bridge conversational natural language with real-world task execution, effectively acting as a proactive digital assistant rather than a reactive chatbot. It lets you send instructions through familiar messaging platforms like WhatsApp, Telegram, Discord, Slack, Signal, iMessage, and more, and then interprets those instructions to carry out actions such...
    Downloads: 1,149 This Week
    Last Update:
    See Project
  • 20
    InternLM-XComposer-2.5

    InternLM-XComposer-2.5

    InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System

    ...It incorporates visual understanding modules that allow the model to analyze images and integrate them into coherent narrative outputs. The framework also supports tasks such as image captioning, multimodal reasoning, and layout generation for structured visual documents. By combining language generation with visual composition capabilities, the system enables new forms of content creation that integrate written explanations with automatically generated visual components.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    InternVL

    InternVL

    A Pioneering Open-Source Alternative to GPT-4o

    InternVL is a large-scale multimodal foundation model designed to integrate computer vision and language understanding within a unified architecture. The project focuses on scaling vision models and aligning them with large language models so that they can perform tasks involving both visual and textual information. InternVL is trained on massive collections of image-text data, enabling it to learn representations that capture both visual patterns and semantic meaning. The model supports a wide variety of tasks, including visual perception, image classification, and cross-modal retrieval between images and text. It can also be connected to language models to enable conversational interfaces that understand images, videos, and other visual content. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    Playwright MCP

    Playwright MCP

    Playwright MCP server

    An MCP server developed by Microsoft that offers browser automation capabilities using Playwright, enabling LLMs to interact with web pages through structured accessibility snapshots without relying on visual data. ​
    Downloads: 6 This Week
    Last Update:
    See Project
  • 23
    ClaudeBar

    ClaudeBar

    A macOS menu bar application that monitors AI coding assistant usage

    ...Rather than constantly running CLI commands or navigating web dashboards, users can glance at their quota statistics for services like Claude, Codex, Gemini, GitHub Copilot, and Antigravity directly from the menu bar. The application provides real-time tracking of session, weekly, and model-specific usage percentages, using visual indicators such as color-coded progress bars to communicate when quotas are healthy, nearing limits, or depleted. It includes options to enable or disable monitoring for individual providers, supports multiple visual themes (including dark mode and a festive theme), and refreshes data at configurable intervals so users always have up-to-date information.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 24
    Qwen-Image-Layered

    Qwen-Image-Layered

    Qwen-Image-Layered: Layered Decomposition for Inherent Editablity

    Qwen-Image-Layered is an extension of the Qwen series of multimodal models that introduces layered image understanding, enabling the model to reason about hierarchical visual structures — such as separating foreground, background, objects, and contextual layers within an image. This architecture allows richer semantic interpretation, enabling use cases such as scene decomposition, object-level editing, layered captioning, and more fine-grained multimodal reasoning than with flat image encodings alone. ...
    Downloads: 12 This Week
    Last Update:
    See Project
  • 25
    GalTransl

    GalTransl

    Automated translation solution for visual novels

    GalTransl is an automated translation system specifically designed for visual novels, particularly those in the “galgame” genre, leveraging large language models to streamline and enhance the translation process. It integrates support for multiple advanced LLM providers such as GPT-4, Claude, DeepSeek, and other models, enabling high-quality, context-aware translations that go beyond traditional machine translation approaches.
    Downloads: 3 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next
MongoDB Logo MongoDB