Showing 371 open source projects for "visual"

  • 1
    InfiniteYou

    Flexible Photo Recrafting While Preserving Your Identity

    ...Using an architecture built around diffusion transformers (DiTs), InfiniteYou introduces a component called InfuseNet that injects identity features derived from reference images into the generation process via residual connections, so that the output closely matches the person’s identity without sacrificing visual quality or text-image alignment. The team uses a multi-stage training strategy with synthetic multi-sample data per identity to fine-tune for both identity consistency and aesthetic quality. Compared to prior methods, InfiniteYou significantly improves identity similarity, text-prompt adherence, and overall image quality, and avoids common problems such as face copy-paste artifacts.
    Downloads: 0 This Week
  • 2
    TEN

    Open-source framework for conversational voice AI agents

    TEN (Transformative Extensions Network) is an open source framework designed to empower developers to build real-time multimodal AI agents capable of voice, video, text, image, and data-stream interaction with ultra-low latency. Its ecosystem includes TEN Turn Detection, TEN Agent, and TMAN Designer, allowing developers to rapidly assemble human-like, responsive agents that can see, speak, hear, and interact. With support for languages like Python, C++, and Go, it offers flexible...
    Downloads: 0 This Week
  • 3
    Tiny CUDA Neural Networks

    Lightning fast C++/CUDA neural network framework

    This is a small, self-contained framework for training and querying neural networks. Most notably, it contains a lightning-fast "fully fused" multi-layer perceptron (technical paper), a versatile multiresolution hash encoding (technical paper), as well as support for various other input encodings, losses, and optimizers. We provide a sample application where an image function (x,y) -> (R,G,B) is learned. The fully fused MLP component of this framework requires a very large amount of shared...
    Downloads: 0 This Week
  • 4
    HunyuanOCR

    OCR expert VLM powered by Hunyuan's native multimodal architecture

    HunyuanOCR is an open-source, end-to-end OCR (optical character recognition) Vision-Language Model (VLM) developed by Tencent‑Hunyuan. It’s designed to unify the entire OCR pipeline (detection, recognition, layout parsing, information extraction, translation, and even subtitle or structured output generation) into a single model inference instead of a cascade of separate tools. Despite being fairly lightweight (about 1 billion parameters), it delivers state-of-the-art performance across a...
    Downloads: 0 This Week
  • 5
    Qwen-VL

    Chat & pretrained large vision language model

    Qwen-VL is Alibaba Cloud’s vision-language large model family, designed to integrate visual and linguistic modalities. It accepts image inputs (with optional bounding boxes) and text, and produces text (and sometimes bounding boxes) as output. The model variants (VL-Plus, VL-Max, etc.) have been upgraded for better visual reasoning, text recognition from images, fine-grained understanding, and support for high image resolutions / extreme aspect ratios.
    Downloads: 0 This Week
  • 6
    airda

    airda (Air Data Agent)

    airda (Air Data Agent) is a multi-agent system for data analysis. It can understand data development and data analysis requirements, interpret the data, and generate SQL and Python code for tasks such as data queries, data visualization, and machine learning.
    Downloads: 0 This Week
  • 7
    Luna AI

    Virtual AI anchor that combines state-of-the-art technology

    Luna AI is a virtual AI streamer framework designed to power an interactive VTuber that can go live on major platforms and chat with viewers in real time. It is built around a core assistant persona called “Luna AI,” which can be driven by a wide range of large language models and platforms, including GPT-style APIs, Claude, LangChain-based backends, ChatGLM, Kimi, Ollama, and many others. The project supports multiple rendering backends for the avatar, such as Live2D, Unreal Engine (UE),...
    Downloads: 3 This Week
  • 8
    OculiX

    Visual Automation IDE — automate anything you see on screen

    ...Key features:
    - Guided step-by-step recorder with live code preview
    - Image recognition via OpenCV 4.10
    - Dual OCR: Tesseract (built-in) + PaddleOCR (neural, high precision)
    - Local and remote automation via integrated VNC
    - SSH tunnels via embedded JSch
    - Cross-platform: Windows, macOS (Apple Silicon M1-M4), Linux
    - Scripting: Jython, JRuby, Java, PowerShell, AppleScript
    - Java 17 recommended (Java 8+ supported)
    - Full CI/CD with automated builds for all platforms
    Used worldwide for test automation, RPA, and visual regression testing. MIT License. Maintained by oculix-org.
    Downloads: 42 This Week
  • 9
    A collection of open source files and programs for developing software that works with the WowWee Robotics RSMedia robot. These include a USB serial console, a cross-compiler, a firmware dump program, text-to-speech, and source code.
    Downloads: 3 This Week
  • 10
    AçorOS

    AçorOS: Debian with multiple desktops, easy for beginners.

    Welcome to AçorOS: a Linux distribution based on Debian Stable. AçorOS is an intuitive and stable Linux experience. Choose from several desktops, enjoy the stability of Debian, and customize with ease. Discover Linux with ease, stability, and style with AçorOS.
    Downloads: 26 This Week
  • 11
    Joget

    AI Powered Open Source Platform to Easily Build Enterprise Web Apps

    Joget offers an open-source, AI-powered platform that converges no-code/low-code development with AI to rapidly build and customize enterprise applications at scale. By combining AI with visual app builders—not raw code—Joget makes app generation faster, safer, and more accessible for everyone. With Generative AI and Agentic AI capabilities, Joget Intelligence enables organizations to automate and enhance processes while maintaining oversight and compliance. Unlike typical AI code generation, Joget's visual-first approach ensures applications are maintainable and governed within collaborative human workflows. ...
    Downloads: 13 This Week
  • 12
    AnimateDiff

    Plug-n-play module turning text-to-image models into animation

    ...This plug-and-play tool is compatible with a wide range of community models and facilitates the generation of animation directly from pre-existing text-to-image models. It supports various configurations to create animations with different visual styles, providing flexibility and ease of use for developers and artists interested in exploring dynamic, AI-generated animations.
    Downloads: 26 This Week
  • 13
    LLaVA

    Visual Instruction Tuning: Large Language-and-Vision Assistant

    Visual instruction tuning towards large language and vision models with GPT-4 level capabilities.
    Downloads: 2 This Week
  • 14
    ChatScript
    ...The technology behind Outfit7's mobile app Tom Loves Angela and ESL chatbots at Japan's SpeakGlobal. Third-place winner at Chatbot Battles 2012, where it was also awarded the best 15-minute conversation prize. Third place in the Loebner Prize for 2013 and first place in 2014 and 2015. It also includes useful ontology files for nouns, verbs, adjectives, and adverbs. Runs in stand-alone or server mode on Linux (64-bit), Windows (Visual Studio 10), and Mac/iOS. See BrilligUnderstanding.com for our home website. See github.com/chatscriptnlp/ChatScriptNLP for the Git-accessible version, which also receives fixes as needed ahead of the next full release cycle.
    Downloads: 6 This Week
  • 15
    NodeTool

    Visual AI Workflow Builder

    NodeTool is an open‑source, visual AI workflow builder that lets you connect nodes for text, images, audio, video, data, and automation, then run them locally or in the cloud. Build multi‑step agents, RAG systems, and creative media pipelines without coding, inspect execution in real time, and deploy anywhere: home server, private VPC, RunPod, or Cloud Run. With a local‑first design, NodeTool keeps models and data under your control while still supporting providers like OpenAI, Anthropic, Replicate, and HuggingFace. ...
    Downloads: 2 This Week
  • 16
    Mermaid.js to SVG Converter

    Visualize the diagrams of your projects

    ...The trick is to ask the AI to write a diagram in Mermaid.js format that captures the structure of the project, and then use that diagram as context to keep the AI reminded at all times of what the project is as a whole. This prevents it from changing things it's not supposed to change. This standalone offline web app converts that Mermaid.js code into an SVG image so that you, as a human, can see what the AI thinks the structure of your project is and fix any misconceptions until the diagram is correct. (Source code is included in the HTML file itself. Open it in a web browser to use.)
    Downloads: 1 This Week
  • 17
    Visual Studio Code client for Tabnine

    Visual Studio Code client for Tabnine

    This extension is for Tabnine’s Starter (free), Pro and Enterprise SaaS users only. Tabnine Enterprise users with the self-hosted setup should use the Tabnine Enterprise extension in the VSCode Marketplace. Tabnine is an AI code assistant that makes you a better developer. Tabnine will increase your development velocity with real-time code completions, chat, and code generation in all the most popular coding languages and IDEs. Whether you call it IntelliSense, IntelliCode, autocomplete,...
    Downloads: 2 This Week
  • 18
    TypeUI

    Design system skills for agentic tools

    ...It enables developers to enforce consistent design systems by generating and applying structured design “skills” that guide how UI components are built. These design skills include tokens, styling rules, and layout guidelines that ensure visual consistency regardless of the AI model or tool used. TypeUI works seamlessly with popular AI coding environments like Claude, Codex, Cursor, and Gemini CLI. It also offers a registry of pre-built design styles that can be easily pulled into projects via simple CLI commands. Overall, TypeUI bridges the gap between AI-generated code and professional UI design by standardizing aesthetics across tools and workflows.
    Downloads: 0 This Week
  • 19
    Computer vision projects

    Fun AI projects related to computer vision

    ...The repository provides examples that combine machine learning models with real-world applications such as robotic arms, video analysis, and automated visual measurement systems.
    Downloads: 3 This Week
  • 20
    MGIE

    Guiding Instruction-based Image Editing via Multimodal Large Language

    MGIE—Guiding Instruction-based Image Editing—demonstrates how a multimodal LLM can parse natural-language editing instructions and then drive image transformations accordingly. The project focuses on making edits explainable and controllable: the model interprets text guidance, reasons over image content, and outputs edits aligned with user intent. It’s positioned as an ICLR 2024 Spotlight work, with code and references that show how to connect language planning to concrete image operations....
    Downloads: 0 This Week
  • 21
    RAGxplorer

    Open-source tool to visualise your RAG

    ...However, RAG systems can be complex because they involve multiple components such as embedding models, vector databases, and retrieval algorithms. RAGxplorer provides visual tools that allow developers to inspect how documents are embedded, retrieved, and used to answer queries. The software can load documents, generate embeddings, and project them into reduced vector spaces so that users can visually explore relationships between queries and retrieved documents. It also includes interactive interfaces that show how retrieval affects the final output of the language model.
    Downloads: 1 This Week
  • 22
    AI-Aimbot

    CS2, Valorant, Fortnite, APEX, every game

    ...The project emphasizes that it is intended for educational purposes to illustrate potential vulnerabilities in game design and anti-cheat systems. Because the system relies solely on visual detection rather than reading game memory, it attempts to bypass certain traditional anti-cheat detection methods.
    Downloads: 659 This Week
  • 23
    solo-learn

    Library of self-supervised methods for visual representation

    A library of self-supervised methods for unsupervised visual representation learning powered by PyTorch Lightning. We aim to provide SOTA self-supervised methods in a comparable environment while, at the same time, implementing training tricks. The library is self-contained, but it is possible to use the models outside of solo-learn.
    Downloads: 1 This Week
  • 24
    Exposure Correction

    Learning multi-scale deep model correcting over- and under- exposed

    ...The repository includes pre-trained models, datasets, and training/testing code to enable reproducibility and experimentation. By leveraging this framework, researchers and developers can apply exposure correction to a wide range of natural images, improving visual quality without manual editing. The project serves both as a research reference and a practical tool for computational photography and image enhancement.
    Downloads: 3 This Week
  • 25
    Style Aligned

    Official code for Style Aligned Image Generation via Shared Attention

    StyleAligned is a diffusion-model editing technique and codebase that preserves the visual “style” of an original image while applying new semantic edits driven by text. Instead of fully re-generating an image—and risking changes to lighting, texture, or rendering choices—the method aligns internal features across denoising steps so the target edit inherits the source style. This alignment acts like a constraint on the model’s evolution, steering composition, palette, and brushwork even as objects or attributes change. ...
    Downloads: 0 This Week