Showing 847 open source projects for "vision"

View related business solutions
  • Vibes don’t ship, Retool does Icon
    Vibes don’t ship, Retool does

    Start from a prompt and build production-ready apps on your data—with security, permissions, and compliance built in.

    Vibe coding tools create cool demos, but Retool helps you build software your company can actually use. Generate internal apps that connect directly to your data—deployed in your cloud with enterprise security from day one. Build dashboards, admin panels, and workflows with granular permissions already in place. Stop prototyping and ship on a platform that actually passes security review.
    Build apps that ship
  • Outgrown Windows Task Scheduler? Icon
    Outgrown Windows Task Scheduler?

    Free diagnostic identifies where your workflow is breaking down—with instant analysis of your scheduling environment.

    Windows Task Scheduler wasn't built for complex, cross-platform automation. Get a free diagnostic that shows exactly where things are failing and provides remediation recommendations. Interactive HTML report delivered in minutes.
    Download Free Tool
  • 1
    Vision Camera

    Vision Camera

    The Camera library that sees the vision

    VisionCamera was designed from the ground up to provide all features a camera app should have. You have full control over what device is used, and can even configure options such as frame rate, colorspace, and more. While having a lot of features, VisionCamera makes sure you don't get overwhelmed from the beginning. It provides hooks and functions to help you get started faster, and if you need full control, you can easily do that. Every functionality has been thoroughly documented and even...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 2
    Raster Vision

    Raster Vision

    Open source framework for deep learning satellite and aerial imagery

    ...The input to a Raster Vision pipeline is a set of images and training data, optionally with Areas of Interest (AOIs) that describe where the images are labeled. The output of a Raster Vision pipeline is a model bundle that allows you to easily utilize models in various deployment scenarios.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    AskUI Vision Agent

    AskUI Vision Agent

    Enable AI to control your desktop, mobile and HMI devices

    ...The repository presents a feature overview, sample media, and frequent release notes, which show ongoing improvements such as CORS checks and other operational tweaks. The broader AskUI documentation covers the Python Vision Agent along with suite services and inference APIs, indicating a productized ecosystem rather than a single library. Community-curated lists also recognize Vision Agent as part of the broader “GUI agents” landscape, placing it among other computer-use agents.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 4
    Vision Transformer Pytorch

    Vision Transformer Pytorch

    Implementation of Vision Transformer, a simple way to achieve SOTA

    ...Because it stays close to vanilla PyTorch, you can integrate custom datasets and training loops without framework lock-in. It’s widely used as an educational reference for people learning transformers in vision and as a lightweight baseline for research prototypes. The project encourages experimentation—swap optimizers, change augmentations, or plug the transformer backbone into downstream tasks.
    Downloads: 3 This Week
    Last Update:
    See Project
  • MyQ Print Management Software Icon
    MyQ Print Management Software

    SAVE TIME WITH PERSONALIZED PRINT SOLUTIONS

    Boost your digital or traditional workplace with MyQ’s secure print and scan solutions that respect your time and help you focus on what you do best.
    Learn More
  • 5
    UI.Vision RPA

    UI.Vision RPA

    Open-Source RPA Software (formerly Kantu)

    The UI Vision RPA software is the tool for visual process automation, codeless UI test automation, web scraping and screen scraping. Automate tasks on Windows, Mac and Linux. The UI Vision RPA core is open-source with enterprise security. The free and open-source browser extension can be extended with local apps for desktop UI automation. UI.Vision RPA's computer-vision visual UI testing commands allow you to write automated visual tests with UI.Vision RPA - this makes UI.Vision RPA the first and only Chrome and Firefox extension (and Selenium IDE) that has "👁👁 eyes". ...
    Downloads: 13 This Week
    Last Update:
    See Project
  • 6
    Open Vision Agents by Stream

    Open Vision Agents by Stream

    Build Vision Agents quickly with any model or video provider

    Open Vision Agents by Stream is an open source framework from Stream for building real time, multimodal AI agents that watch, listen, and respond to live video streams. It focuses on combining video understanding models, such as YOLO and Roboflow based detectors, with real time large language models like OpenAI Realtime and Gemini Live to create interactive experiences.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 7
    Computer Vision Annotation Tool (CVAT)

    Computer Vision Annotation Tool (CVAT)

    Interactive video and image annotation tool for computer vision

    Computer Vision Annotation Tool (CVAT) is a free and open source, interactive online tool for annotating videos and images for Computer Vision algorithms. It offers many powerful features, including automatic annotation using deep learning models, interpolation of bounding boxes between key frames, LDAP and more. It is being used by its own professional data annotation team to annotate millions of objects with different properties.
    Downloads: 27 This Week
    Last Update:
    See Project
  • 8
    OpenCV

    OpenCV

    Open Source Computer Vision Library

    OpenCV (Open Source Computer Vision Library) is a comprehensive open-source library for computer vision, machine learning, and image processing. It enables developers to build real-time vision applications ranging from facial recognition to object tracking. OpenCV supports a wide range of programming languages including C++, Python, and Java, and is optimized for both CPU and GPU operations.
    Downloads: 16 This Week
    Last Update:
    See Project
  • 9
    Phi-3-MLX

    Phi-3-MLX

    Phi-3.5 for Mac: Locally-run Vision and Language Models

    Phi-3-Vision-MLX is an Apple MLX (machine learning on Apple silicon) implementation of Phi-3 Vision, a lightweight multi-modal model designed for vision and language tasks. It focuses on running vision-language AI efficiently on Apple hardware like M1 and M2 chips.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Atera all-in-one platform IT management software with AI agents Icon
    Atera all-in-one platform IT management software with AI agents

    Ideal for internal IT departments or managed service providers (MSPs)

    Atera’s AI agents don’t just assist, they act. From detection to resolution, they handle incidents and requests instantly, taking your IT management from automated to autonomous.
    Learn More
  • 10
    Kornia

    Kornia

    Open Source Differentiable Computer Vision Library

    ...With Kornia we fill the gap between classical and deep computer vision that implements standard and advanced vision algorithms for AI. Our libraries and initiatives are always according to the community needs.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    DeepSeek VL2

    DeepSeek VL2

    Mixture-of-Experts Vision-Language Models for Advanced Multimodal

    DeepSeek-VL2 is DeepSeek’s vision + language multimodal model—essentially the next-gen successor to their first vision-language models. It combines image and text inputs into a unified embedding / reasoning space so that you can query with text and image jointly (e.g. “What’s going on in this scene?” or “Generate a caption appropriate to context”). The model supports both image understanding (vision tasks) and multimodal reasoning, and is likely used as a component in agent systems to process visual inputs as context for downstream tasks. ...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 12
    MESHROOM

    MESHROOM

    3D reconstruction software

    ...Photography is the projection of a 3D scene onto a 2D plane, losing depth information. The goal of photogrammetry is to reverse this process. The dense modeling of the scene is the result yielded by chaining two computer vision-based pipelines, “Structure-from-Motion” (SfM) and “Multi View Stereo” (MVS). Fusion of Multi-bracketing LDR images into HDR. Alignment of panorama images. Support for fisheye optics. Automatically estimate fisheye circle or manually edit it. Take advantage of motorized-head file. Easy to integrate in your Renderfarm System. Add specific rules to select the most suitable machines regarding CPU, RAM, GPU requirements of each Node.
    Downloads: 143 This Week
    Last Update:
    See Project
  • 13
    R1-V

    R1-V

    Witness the aha moment of VLM with less than $3

    R1-V is an initiative aimed at enhancing the generalization capabilities of Vision-Language Models (VLMs) through Reinforcement Learning in Visual Reasoning (RLVR). The project focuses on building a comprehensive framework that emphasizes algorithm enhancement, efficiency optimization, and task diversity to achieve general vision-language intelligence and visual/GUI agents. The team's long-term goal is to contribute impactful open-source research in this domain.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    COLMAP

    COLMAP

    Structure-from-Motion and Multi-View Stereo

    COLMAP is a general-purpose Structure-from-Motion (SfM) and Multi-View Stereo (MVS) pipeline with a graphical and command-line interface. It offers a wide range of features for the reconstruction of ordered and unordered image collections. The software is licensed under the new BSD license.
    Downloads: 59 This Week
    Last Update:
    See Project
  • 15
    SAM 2

    SAM 2

    The repository provides code for running inference with SAM 2

    ...The updated model is optimized for faster inference and lower memory use, enabling real-time interactivity even on larger images or constrained hardware. SAM2 comes with pretrained weights and easy-to-use APIs, enabling developers and researchers to integrate promptable segmentation into annotation tools, vision pipelines, or downstream tasks. The project also includes scripts and notebooks to compare SAM2 against SAM on edge cases, benchmarks showing improvements, and evaluation suites to measure mask quality metrics like IoU and boundary error.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 16
    LLaVA

    LLaVA

    Visual Instruction Tuning: Large Language-and-Vision Assistant

    Visual instruction tuning towards large language and vision models with GPT-4 level capabilities.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17
    MakeHuman

    MakeHuman

    This is the main repository for the MakeHuman application as such

    ...Mac users should be able to use the same instructions as windows users, although this has not been thoroughly tested. At the point of writing this, the source code is almost ready for a stable release. The testing vision for this code is to build a community release that includes main application and often-used, user-contributed plug-ins. We hope that the utility of this integrated functionality is sufficient to entice a larger cohort of testers who get value-added in exchange for the possibility of uncovering deficiencies in our application.
    Downloads: 93 This Week
    Last Update:
    See Project
  • 18
    OpenVINO

    OpenVINO

    OpenVINO™ Toolkit repository

    OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference. Boost deep learning performance in computer vision, automatic speech recognition, natural language processing and other common tasks. Use models trained with popular frameworks like TensorFlow, PyTorch and more. Reduce resource demands and efficiently deploy on a range of Intel® platforms from edge to cloud. This open-source version includes several components: namely Model Optimizer, OpenVINO™ Runtime, Post-Training Optimization Tool, as well as CPU, GPU, MYRIAD, multi device and heterogeneous plugins to accelerate deep learning inferencing on Intel® CPUs and Intel® Processor Graphics. ...
    Downloads: 34 This Week
    Last Update:
    See Project
  • 19
    Albumentations

    Albumentations

    Fast image augmentation library and an easy-to-use wrapper

    Albumentations is a computer vision tool that boosts the performance of deep convolutional neural networks. Albumentations is a Python library for fast and flexible image augmentations. Albumentations efficiently implements a rich variety of image transform operations that are optimized for performance, and does so while providing a concise, yet powerful image augmentation interface for different computer vision tasks, including object classification, segmentation, and detection. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 20
    Tarsier

    Tarsier

    Vision utilities for web interaction agents

    ...We define interactable elements as buttons, links, or input fields that are visible on the page; Tarsier can also tag all textual elements if you pass tag_text_elements=True. Furthermore, we've developed an OCR algorithm to convert a page screenshot into a whitespace-structured string (almost like ASCII art) that an LLM even without vision can understand. Since current vision-language models still lack fine-grained representations needed for web interaction tasks, this is critical.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 21
    GoCV

    GoCV

    Go package for computer vision using OpenCV 4 and beyond

    GoCV gives programmers who use the Go programming language access to the OpenCV 4 computer vision library. The GoCV package supports the latest releases of Go and OpenCV v4.5.4 on Linux, macOS, and Windows. Our mission is to make the Go language a “first-class” client compatible with the latest developments in the OpenCV ecosystem. Computer Vision (CV) is the ability of computers to process visual information, and perform tasks normally associated with those performed by humans. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 22
    SAHI

    SAHI

    A lightweight vision library for performing large object detection

    A lightweight vision library for performing large-scale object detection & instance segmentation. Object detection and instance segmentation are by far the most important fields of applications in Computer Vision. However, detection of small objects and inference on large images are still major issues in practical usage. Here comes the SAHI to help developers overcome these real-world problems with many vision utilities.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    TorchIO

    TorchIO

    Medical imaging toolkit for deep learning

    ...Transforms include typical computer vision operations such as random affine transformations and also domain-specific ones such as simulation of intensity artifacts due to MRI magnetic field inhomogeneity.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 24
    DeepSqueak

    DeepSqueak

    DeepSqueak Using Machine Vision to Accelerate Bioacoustics Research

    Using Machine Vision to Accelerate Bioacoustics Research.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 25
    GoogleTest

    GoogleTest

    Google Testing and Mocking Framework

    GoogleTest is Google's C++ mocking and test framework. It's used by many internal projects at Google, as well as a number of notable projects such as The Chromium projects, the OpenCV computer vision library, and the LLVM compiler. This GoogleTest project is actually a union of what used to be two separate projects: the old GoogleTest and GoogleMock, an extension of GoogleTest for writing and using C++ mock classes. Since they were so closely related, they were merged to create an even better GoogleTest. GoogleTest features an xUnit test framework, a rich set of assertions, user-defined assertions, death tests, among many others. ...
    Downloads: 16 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next