Showing 366 open source projects for "visual"

  • 1
    Janus

    Unified Multimodal Understanding and Generation Models

    Janus is a sophisticated open-source project from DeepSeek AI that aims to unify visual understanding and image generation in a single model architecture. Rather than maintaining separate systems for “look and describe” and “prompt and generate”, Janus uses one autoregressive transformer while decoupling visual encoding into separate pathways, which lets it ingest images for comprehension and produce images from text prompts within a shared architecture.
    Downloads: 0 This Week
  • 2
    swark.io

    Create architecture diagrams from code automatically using LLMs

    Swark is an open-source developer tool and Visual Studio Code extension that automatically generates software architecture diagrams directly from source code using large language models. The project aims to help developers quickly understand complex codebases by analyzing repositories and producing visual diagrams that represent system architecture, dependencies, and component relationships.
    Downloads: 1 This Week
  • 3
    LlamaGen

    Autoregressive Model Beats Diffusion

    LlamaGen is an open-source research project that introduces a new approach to image generation by applying the autoregressive next-token prediction paradigm used in large language models to visual generation tasks. Instead of relying on diffusion models, the framework treats images as sequences of tokens that can be generated progressively using transformer architectures similar to those used for text generation. The project explores how scaling autoregressive models and improving image tokenization techniques can produce competitive results compared with modern diffusion-based image generators. ...
    Downloads: 1 This Week
  • 4
    StarVector

    StarVector is a foundation model for SVG generation

    ...The system treats vector graphics creation as a code generation problem, producing SVG code that can render detailed vector images. Its architecture combines computer vision techniques with language modeling capabilities so it can understand visual inputs and textual prompts simultaneously. The model converts raster images or text instructions into structured vector representations, enabling high-quality vectorization and design generation. This approach allows StarVector to create scalable graphics that maintain visual quality regardless of resolution, which is especially useful for design tools and illustration workflows. ...
    Downloads: 1 This Week
  • 5
    FastVLM

    This repository contains the official implementation of FastVLM

    ...The repository documents model variants, showcases head-to-head numbers against known baselines, and explains how the encoder integrates with common LLM backbones. Apple’s research brief frames FastVLM as targeting real-time or latency-sensitive scenarios, where lowering visual token pressure is critical to interactive UX. In short, it’s a practical recipe to make VLMs fast without exotic token-selection heuristics.
    Downloads: 1 This Week
  • 6
    OpenPromptStudio

    Visual editor for AI prompts with translation, categories, and tools

    OpenPromptStudio is an open source visual editor designed to help users create, organize, and manage prompts for AI image generation tools. It focuses on improving the workflow for building prompts by turning them into structured, visual components that are easier to edit and rearrange. It supports the creation and classification of prompt segments, allowing users to organize them into different types such as styles, quality modifiers, commands, or general prompt elements. ...
    Downloads: 1 This Week
  • 7
    ChainForge

    An open-source visual programming environment

    ChainForge is an open-source visual programming environment designed to help developers systematically test, compare, and evaluate prompts and outputs across multiple large language models in a structured and scalable way. Instead of relying on isolated prompt experimentation, it introduces a dataflow-based interface that allows users to create complex prompt pipelines and evaluate them across different models, parameters, and datasets simultaneously.
    Downloads: 0 This Week
  • 8
    ClaudeBar

    A macOS menu bar application that monitors AI coding assistant usage

    ...Rather than constantly running CLI commands or navigating web dashboards, users can glance at their quota statistics for services like Claude, Codex, Gemini, GitHub Copilot, and Antigravity directly from the menu bar. The application provides real-time tracking of session, weekly, and model-specific usage percentages, using visual indicators such as color-coded progress bars to communicate when quotas are healthy, nearing limits, or depleted. It includes options to enable or disable monitoring for individual providers, supports multiple visual themes (including dark mode and a festive theme), and refreshes data at configurable intervals so users always have up-to-date information.
    Downloads: 0 This Week
  • 9
    VideoRAG

    VideoRAG: Chat with Your Videos

    VideoRAG is a retrieval-augmented generation (RAG) framework tailored for video content that enables AI systems to answer questions, summarize, and reason over long videos by combining visual embeddings with contextual search. The system works by first splitting videos into clips, extracting visual and audio-textual features, and indexing them as embeddings, then using an LLM with a retriever to pull relevant segments on demand. When a user query is received, VideoRAG locates semantically relevant moments in the video using the embedding index, retrieves the associated clips or transcripts, and feeds them to a generative model to produce accurate, grounded answers or summaries. ...
    Downloads: 0 This Week
  • 10
    LandPPT

    An LLM-based presentation generation platform

    ...The application integrates multiple AI models from providers such as OpenAI, Anthropic, Google, and locally hosted models to generate text, images, and structured presentation layouts. It also includes template systems and style options that allow presentations to be customized for different industries, visual themes, or storytelling formats.
    Downloads: 4 This Week
  • 11
    Depth Anything 3

    Recovering the Visual Space from Any Views

    Depth Anything 3 is a research-driven project that brings accurate and dense depth estimation to any input image or video, enabling foundational understanding of 3D structure from 2D visual content. Designed to work across diverse scenes, lighting conditions, and image types, it uses advanced neural networks trained on large, heterogeneous datasets, producing depth maps that reveal scene depth relationships and object surfaces with strong fidelity. The model can be applied to photography, AR/VR content creation, robotics perception, and 3D reconstruction workflows, making it versatile across industries and research domains. ...
    Downloads: 4 This Week
  • 12
    video-use

    Edit videos with Claude Code

    ...Designed to work with Claude Code, it automates the entire editing process—from cutting clips to rendering the final output—without requiring manual timelines or complex software interfaces. The system intelligently analyzes audio transcripts and visual cues to make precise, context-aware editing decisions. It supports a wide range of content types, including interviews, tutorials, montages, and talking-head videos. By combining structured text representations with on-demand visual previews, it minimizes processing overhead while maintaining high-quality results. Overall, Video Use reimagines video editing as an AI-driven, conversational workflow.
    Downloads: 7 This Week
  • 13
    Wan2.1

    Wan2.1: Open and Advanced Large-Scale Video Generative Model

    Wan2.1 is a foundational open-source large-scale video generative model developed by the Wan team, providing high-quality video generation from text and images. It employs advanced diffusion-based architectures to produce coherent, temporally consistent videos with realistic motion and visual fidelity. Wan2.1 focuses on efficient video synthesis while maintaining rich semantic and aesthetic detail, enabling applications in content creation, entertainment, and research. The model supports text-to-video and image-to-video generation tasks with flexible resolution options suitable for various GPU hardware configurations. ...
    Downloads: 77 This Week
  • 14
    AliceVision

    3D Computer Vision Framework

    ...The framework is built with a strong emphasis on research-grade algorithms while maintaining the robustness required for production environments, making it suitable for industries such as visual effects, cultural heritage preservation, and robotics. AliceVision is modular, enabling developers to use individual components or customize the pipeline for specific workflows, including panorama stitching and camera tracking. It integrates with tools like Meshroom, which offers a graphical interface to simplify complex reconstruction processes for non-technical users.
    Downloads: 2 This Week
  • 15
    UFO³

    Weaving the Digital Agent Galaxy

    ...The system allows users to issue natural language instructions that are translated into automated actions across multiple desktop applications. Using a dual-agent architecture, the framework analyzes both visual interface elements and system control structures in order to understand how applications should be manipulated. This enables the agent to navigate complex software environments and perform tasks that normally require manual interaction. UFO integrates mechanisms for task decomposition, planning, and execution so that high-level user requests can be broken down into smaller steps performed by specialized agents. ...
    Downloads: 3 This Week
  • 16
    FormCreate

    The easy-to-use Vue low-code visual AI form designer

    FormCreate is a low-code visual form builder built on Vue that enables developers to create complex, dynamic forms through a drag-and-drop interface rather than manual coding. It is part of the broader form-create ecosystem and leverages JSON-based schema generation to dynamically render forms, handle validation, and manage data collection workflows. The tool is designed to significantly reduce development time by allowing users to visually assemble forms while automatically generating the underlying configuration and logic. ...
    Downloads: 0 This Week
  • 17
    LLM Vision

    Visual intelligence for your home.

    ...The project enables Home Assistant to analyze images, video files, and live camera feeds using vision-capable AI models. Instead of relying only on traditional object detection pipelines, it allows users to send prompts about visual content and receive contextual descriptions or answers about what is happening in camera footage. The system can process events from surveillance platforms such as Frigate and convert them into meaningful summaries, notifications, or structured data for automation workflows. It also maintains a timeline of analyzed camera events that can be displayed in dashboards or queried through the assistant interface.
    Downloads: 0 This Week
  • 18
    Refly

    The first open-source agent skills builder

    Refly is an AI-native workflow platform that democratizes automated workflow and skills creation for both technical and non-technical users by offering a visual, natural-language-driven interface. Instead of requiring code, Refly lets creators define tasks and business logic through simple “vibes,” which are compiled into structured, reusable agent skills that can be executed on engines like Claude Code, Cursor, or other supported runtimes. With a focus on making automation accessible, it provides a visual canvas and low-code components that feel similar to drag-and-drop builders but backed by powerful AI orchestration, memory handling, and integrations with external services. ...
    Downloads: 0 This Week
  • 19
    agentation

    The visual feedback tool for agents

    Agentation is a visual annotation and feedback tool designed to make interacting with AI coding agents more intuitive and precise by letting developers visually click on frontend elements in a browser and annotate them with context before sending structured feedback to an agent. Instead of describing UI elements in text — like “the blue button in the sidebar” — users click directly on elements to automatically capture selectors, positions, and contextual metadata that can be consumed by AI agents to locate exact code references. ...
    Downloads: 0 This Week
  • 20
    Dafthunk

    A workflow execution platform built on top of the fantastic Cloudflare

    ...It aims to combine the approachability of a visual editor with the practical needs of real automation: state persistence, execution history, reusable nodes, and integrations with external systems. A key appeal is that you can go from idea to running automation quickly in a hosted-like experience while still keeping the project open source and infrastructure-aware.
    Downloads: 0 This Week
  • 21
    ViZDoom

    Doom-based AI research platform for reinforcement learning

    ViZDoom allows developing AI bots that play Doom using only visual information (the screen buffer). It is primarily intended for research in machine visual learning, and in deep reinforcement learning in particular. ViZDoom is based on ZDoom, the most popular modern source port of Doom. This means compatibility with a huge range of tools and resources for creating custom scenarios, detailed documentation of the engine and tools, and support from the Doom community. ...
    Downloads: 1 This Week
  • 22
    LatentSync

    Taming Stable Diffusion for Lip Sync

    ...The system leverages a U-Net diffusion backbone that uses cross-attention over audio embeddings (produced by an audio encoder), together with reference video frames, to guide generation, and it applies a set of loss functions (temporal, perceptual, and SyncNet-based) to enforce lip-sync accuracy, visual fidelity, and temporal consistency. Across versions, LatentSync has improved temporal stability and lowered resource requirements, making inference more practical (e.g., 8 GB VRAM for earlier versions, somewhat more for the latest models).
    Downloads: 6 This Week
  • 23
    Model Explorer

    A modern model graph visualizer and debugger

    Model Explorer is a visual tool for exploring, debugging, and optimizing ML models deployed on edge devices. Developed by Google AI Edge, it offers a browser-based interface to inspect layer-wise performance, memory usage, and inference timing of TensorFlow Lite and other supported models. It’s a powerful utility for developers optimizing models for constrained environments.
    Downloads: 0 This Week
  • 24
    Midscene

    Vision-based AI framework for cross-platform UI automation tasks

    Midscene.js is an open source AI-driven UI automation framework designed to control user interfaces across multiple platforms using natural language instructions. Instead of relying on traditional selectors, DOM structures, or accessibility attributes, it uses a vision-first approach where screenshots are analyzed by visual-language models to identify interface elements and perform actions. It allows developers to automate interactions on web applications, desktop software, and mobile devices without needing platform-specific automation logic. Developers can describe tasks such as clicking buttons, filling forms, or extracting information, and the system interprets these commands to interact with the interface accordingly. ...
    Downloads: 3 This Week
  • 25
    LTX-2

    Python inference and LoRA trainer package for the LTX-2 audio–video

    LTX-2 is an open-source repository from Lightricks that provides a Python inference package and a LoRA trainer for LTX-2, Lightricks' audio–video generation model. ...
    Downloads: 41 This Week
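The LlamaGen entry describes treating an image as a sequence of discrete tokens that a transformer produces by next-token prediction. The following is a minimal Python sketch of that sampling loop under stated assumptions; it is not LlamaGen's code. `toy_model` is a hypothetical stand-in for a trained transformer, the single conditioning token is a simplification of class conditioning, and the VQ decoder that would map the token grid back to pixels is omitted.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical interface: a "model" maps the token prefix to logits over a codebook.
LogitsFn = Callable[[Sequence[int]], List[float]]


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with temperature scaling."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def generate_image_tokens(
    next_token_logits: LogitsFn,
    height: int,
    width: int,
    class_token: int,
    temperature: float = 1.0,
    seed: int = 0,
) -> List[List[int]]:
    """Sample height*width image tokens one at a time in raster order,
    conditioning each step on every token generated so far."""
    rng = random.Random(seed)
    prefix: List[int] = [class_token]  # simplified conditioning token
    for _ in range(height * width):
        probs = softmax(next_token_logits(prefix), temperature)
        token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        prefix.append(token)
    image_tokens = prefix[1:]  # drop the conditioning token
    # Reshape the flat sequence into a 2D grid of codebook indices.
    return [image_tokens[r * width:(r + 1) * width] for r in range(height)]


def toy_model(prefix: Sequence[int]) -> List[float]:
    """Hypothetical 8-code 'model' that strongly prefers (last token + 1) mod 8."""
    vocab = 8
    target = (prefix[-1] + 1) % vocab
    return [10.0 if t == target else 0.0 for t in range(vocab)]


if __name__ == "__main__":
    print(generate_image_tokens(toy_model, height=2, width=4, class_token=3, temperature=0.1))
    # → [[4, 5, 6, 7], [0, 1, 2, 3]]
```

The low temperature makes the toy run effectively greedy, so the output is easy to check; a real generator would typically sample at a higher temperature and often add classifier-free guidance.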
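The query-time step described in the VideoRAG entry (find clips whose embeddings are semantically close to the question, then pass their transcripts to a generator) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not VideoRAG's implementation: the `Clip` schema, the toy 2-D embeddings, and the prompt template are hypothetical, and a real system would compute embeddings with a multimodal encoder and search an approximate nearest-neighbour index instead of scanning linearly.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Clip:
    """One indexed video segment (hypothetical schema)."""
    start_s: float
    end_s: float
    transcript: str
    embedding: List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(index: Sequence[Clip], query_emb: Sequence[float], k: int = 2) -> List[Clip]:
    """Return the k clips whose embeddings are most similar to the query."""
    ranked = sorted(index, key=lambda c: cosine(c.embedding, query_emb), reverse=True)
    return ranked[:k]


def build_prompt(question: str, clips: Sequence[Clip]) -> str:
    """Ground the generator by listing retrieved transcripts in timeline order."""
    context = "\n".join(
        f"[{c.start_s:.0f}-{c.end_s:.0f}s] {c.transcript}"
        for c in sorted(clips, key=lambda c: c.start_s)
    )
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    index = [
        Clip(0, 10, "intro", [1.0, 0.0]),
        Clip(10, 20, "boiling pasta", [0.0, 1.0]),
        Clip(20, 30, "plating", [0.6, 0.8]),
    ]
    query_embedding = [0.0, 1.0]  # would come from the same encoder as the clips
    print(build_prompt("How is the pasta cooked?", retrieve(index, query_embedding)))
```

Sorting the retrieved clips by start time before building the prompt keeps the context in narrative order, which tends to help a generator summarize events that span several segments.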