Showing 968 open source projects for "vision"

  • 1
    Night Vision

    Night Vision is a "planetarium" program written in Java

    Night Vision is a "planetarium" program that will display the heavens from any location on earth. Viewing options allow the user to control which sky objects to display, which font to use, and manipulation of various star parameters. Time may be set to run at multiple speeds, including backwards. Star charts may be printed. Night Vision is written in Java, allowing it to run on all major desktop systems (includes PCs, Macs, Linux, ...).
    Downloads: 14 This Week
    Last Update:
    See Project
  • 2
    CleanVision

    Automatically find issues in image datasets

    ...CleanVision helps you automatically identify common types of data issues lurking in image datasets. This package currently detects issues in the raw images themselves, making it a useful tool for any computer vision task, such as classification, segmentation, object detection, pose estimation, keypoint detection, generative modeling, etc.
    Downloads: 1 This Week
    Last Update:
    See Project
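The kind of check CleanVision automates can be sketched in plain Python. The snippet below is a toy stand-in, not CleanVision's actual API (its entry point and issue names differ); it flags exact duplicates and overly dark images in a small batch:

```python
import hashlib

def find_issues(images):
    """images: dict of name -> flat list of grayscale pixel values in [0, 255]."""
    issues, seen = {}, {}
    for name, pixels in images.items():
        digest = hashlib.sha256(bytes(pixels)).hexdigest()
        if digest in seen:                       # exact duplicate of an earlier image
            issues.setdefault(name, []).append(("duplicate_of", seen[digest]))
        else:
            seen[digest] = name
        if sum(pixels) / len(pixels) < 32:       # very low mean brightness
            issues.setdefault(name, []).append(("dark", None))
    return issues

# Three tiny "images": b duplicates a, and c is nearly black.
imgs = {"a": [200] * 16, "b": [200] * 16, "c": [5] * 16}
print(find_issues(imgs))
```

Real image audits use perceptual hashing and statistical thresholds rather than exact byte matches, but the shape of the output (per-image lists of detected issue types) is the same idea.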
  • 3
    Kimi K2.5

    Moonshot's most powerful AI model

    ...With a 256K context length and MoonViT vision encoder, the model excels across reasoning, coding, long-context comprehension, image, and video benchmarks. Kimi K2.5 is available via Moonshot’s API (OpenAI/Anthropic-compatible) and supports deployment through vLLM, SGLang, and KTransformers.
    Downloads: 37 This Week
    Last Update:
    See Project
  • 4
    torchvision

    Datasets, transforms and models specific to Computer Vision

    The torchvision package consists of popular datasets, model architectures, and common image transformations for computer vision. Anaconda is recommended as the Python package management system. Torchvision supports several image backends: Pillow (the default); Pillow-SIMD, a much faster drop-in replacement for Pillow that, if installed, is used as the default; accimage, which, if installed, can be activated by calling torchvision.set_image_backend('accimage'); libpng, which can be installed via conda (conda install libpng) or the package managers of Debian-based and RHEL-based Linux distributions; and libjpeg, which can be installed via conda (conda install jpeg) or the same package managers. ...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 5
    Tarsier

    Vision utilities for web interaction agents

    ...We define interactable elements as buttons, links, or input fields that are visible on the page; Tarsier can also tag all textual elements if you pass tag_text_elements=True. Furthermore, we've developed an OCR algorithm to convert a page screenshot into a whitespace-structured string (almost like ASCII art) that an LLM even without vision can understand. Since current vision-language models still lack fine-grained representations needed for web interaction tasks, this is critical.
    Downloads: 0 This Week
    Last Update:
    See Project
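The whitespace-structured OCR idea described above can be illustrated with a small stdlib sketch. The function name, coordinate scale, and element format below are invented for illustration and are not Tarsier's API; the point is that placing each text element at a coarse grid position lets a text-only LLM see the page's rough spatial layout:

```python
def layout(elements, scale=10):
    """elements: list of (x, y, text) screen positions; returns an ASCII layout."""
    rows = {}
    for x, y, text in elements:
        # bucket elements into coarse rows and columns
        rows.setdefault(y // scale, []).append((x // scale, text))
    lines = []
    for _, items in sorted(rows.items()):
        line = ""
        for col, text in sorted(items):
            line = line.ljust(col) + text        # pad out to the element's column
        lines.append(line)
    return "\n".join(lines)

# A toy page: two buttons on one row, a form label further down.
page = [(0, 0, "Login"), (300, 0, "Sign up"), (0, 200, "Email:")]
print(layout(page))
```

Elements that share a row on screen end up on the same text line, separated by whitespace proportional to their horizontal distance, which is what makes the output readable "almost like ASCII art."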
  • 6
    GoCV

    Go package for computer vision using OpenCV 4 and beyond

    GoCV gives programmers who use the Go programming language access to the OpenCV 4 computer vision library. The GoCV package supports the latest releases of Go and OpenCV v4.5.4 on Linux, macOS, and Windows. Our mission is to make the Go language a “first-class” client compatible with the latest developments in the OpenCV ecosystem. Computer Vision (CV) is the ability of computers to process visual information and to perform tasks normally performed by humans. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    YOLOv5

    YOLOv5 is the world's most loved vision AI

    Introducing Ultralytics YOLOv8, the latest version of the acclaimed real-time object detection and image segmentation model. YOLOv8 is built on cutting-edge advancements in deep learning and computer vision, offering unparalleled performance in terms of speed and accuracy. Its streamlined design makes it suitable for various applications and easily adaptable to different hardware platforms, from edge devices to cloud APIs. Explore the YOLOv8 Docs, a comprehensive resource designed to help you understand and utilize its features and capabilities. ...
    Downloads: 61 This Week
    Last Update:
    See Project
  • 8
    Advanced AI explainability for PyTorch

    Advanced AI Explainability for computer vision

    pytorch-grad-cam is an open-source library that provides advanced explainable AI techniques for interpreting the predictions of deep learning models used in computer vision. The project implements Grad-CAM and several related visualization methods that highlight the regions of an image that most strongly influence a neural network’s decision. These visualization techniques allow developers and researchers to better understand how convolutional neural networks and transformer-based vision models make predictions. ...
    Downloads: 1 This Week
    Last Update:
    See Project
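The core Grad-CAM computation that this library implements is simple enough to show directly. The toy sketch below is plain Python, not the pytorch-grad-cam API: each channel weight is the spatial average of the class-score gradient for that channel, and the heatmap is the ReLU of the weighted sum of activation channels:

```python
def grad_cam(activations, gradients):
    """activations, gradients: lists of K channels, each an HxW grid of floats."""
    heat = [[0.0] * len(activations[0][0]) for _ in activations[0]]
    for chan, grad in zip(activations, gradients):
        cells = [g for row in grad for g in row]
        weight = sum(cells) / len(cells)          # global-average-pooled gradient
        for i, row in enumerate(chan):
            for j, a in enumerate(row):
                heat[i][j] += weight * a          # weighted sum over channels
    return [[max(0.0, v) for v in row] for row in heat]  # ReLU

# Two 2x2 channels: the first pushes the class score up, the second down.
acts  = [[[1.0, 0.0], [0.0, 2.0]], [[3.0, 3.0], [3.0, 3.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-0.5, -0.5], [-0.5, -0.5]]]
print(grad_cam(acts, grads))
```

The ReLU at the end is what restricts the map to regions that *positively* influence the target class, which is why Grad-CAM heatmaps highlight evidence for a prediction rather than all salient pixels.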
  • 9
    Colossal-AI

    Making large AI models cheaper, faster and more accessible

    The Transformer architecture has improved the performance of deep learning models in domains such as Computer Vision and Natural Language Processing. With better performance come larger model sizes, which run up against the memory wall of current accelerator hardware such as GPUs. It is rarely practical to train large models such as Vision Transformer, BERT, and GPT on a single GPU or a single machine, so there is an urgent demand to train models in a distributed environment. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 10
    MIVisionX

    Set of comprehensive computer vision & machine intelligence libraries

    MIVisionX is a set of comprehensive computer vision and machine intelligence libraries, utilities, and applications bundled into a single toolkit. AMD MIVisionX delivers a highly optimized open-source implementation of Khronos OpenVX™ and its extensions, along with a Convolutional Neural Net Model Compiler & Optimizer supporting the ONNX and Khronos NNEF™ exchange formats. The toolkit allows rapid prototyping and deployment of optimized computer vision and machine learning inference workloads on a wide range of hardware, including small embedded x86 CPUs, APUs, discrete GPUs, and heterogeneous servers. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    Open Generative AI

    Uncensored, open-source alternative to Higgsfield AI

    ...The repository organizes information about models, libraries, datasets, and learning materials, making it easier for developers to navigate the rapidly evolving AI landscape. It includes references to tools for natural language processing, computer vision, and multimodal systems. The project is designed as a knowledge hub, helping users discover technologies and best practices for building generative AI applications. It is particularly useful for beginners who need a structured overview as well as for experienced developers looking for new tools. The repository is continuously updated to reflect the latest developments in the field. ...
    Downloads: 17 This Week
    Last Update:
    See Project
  • 12
    Open-AutoGLM

    An open phone agent model & framework

    ...Unlike traditional automation scripts that depend on brittle heuristics, Open-AutoGLM uses pretrained large language and vision-language models to interpret visual context and natural language instructions, giving the agent robust adaptability across apps and interfaces.
    Downloads: 11 This Week
    Last Update:
    See Project
  • 13
    JavaCV

    Java interface to OpenCV, FFmpeg, and more

    JavaCV uses wrappers from the JavaCPP Presets of commonly used libraries by researchers in the field of computer vision (OpenCV, FFmpeg, libdc1394, FlyCapture, Spinnaker, OpenKinect, librealsense, CL PS3 Eye Driver, videoInput, ARToolKitPlus, flandmark, Leptonica, and Tesseract) and provides utility classes to make their functionality easier to use on the Java platform, including Android. JavaCV also comes with hardware accelerated full-screen image display (CanvasFrame and GLCanvasFrame), easy-to-use methods to execute code in parallel on multiple cores (Parallel), user-friendly geometric and color calibration of cameras and projectors (GeometricCalibrator, ProCamGeometricCalibrator, ProCamColorCalibrator), detection and matching of feature points (ObjectFinder), a set of classes that implement direct image alignment of projector-camera systems (mainly GNImageAligner, ProjectiveTransformer, ProjectiveColorTransformer, ProCamTransformer, and ReflectanceInitializer), and more.
    Downloads: 22 This Week
    Last Update:
    See Project
  • 14
    DeepSeek VL

    Towards Real-World Vision-Language Understanding

    DeepSeek-VL is DeepSeek’s initial vision-language model that anchors their multimodal stack. It enables understanding and generation across visual and textual modalities—meaning it can process an image + a prompt, answer questions about images, caption, classify, or reason about visuals in context. The model is likely used internally as the visual encoder backbone for agent use cases, to ground perception in downstream tasks (e.g. answering questions about a screenshot).
    Downloads: 8 This Week
    Last Update:
    See Project
  • 15
    MakeHuman

    This is the main repository for the MakeHuman application as such

    ...Mac users should be able to use the same instructions as Windows users, although this has not been thoroughly tested. At the time of writing, the source code is almost ready for a stable release. The testing vision for this code is to build a community release that includes the main application and often-used, user-contributed plug-ins. We hope that the utility of this integrated functionality is sufficient to entice a larger cohort of testers, who get added value in exchange for the possibility of uncovering deficiencies in our application.
    Downloads: 56 This Week
    Last Update:
    See Project
  • 16
    Metalhead.jl

    Computer vision models for Flux

    Metalhead.jl provides standard machine learning vision models for use with Flux.jl. The architectures in this package make use of pure Flux layers, and they represent the best practices for creating modules like residual blocks, inception blocks, etc. in Flux. Metalhead also provides some building blocks for more complex models in the Layers module.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Midscene

    Vision-based AI framework for cross-platform UI automation tasks

    Midscene.js is an open source AI-driven UI automation framework designed to control user interfaces across multiple platforms using natural language instructions. Instead of relying on traditional selectors, DOM structures, or accessibility attributes, it uses a vision-first approach where screenshots are analyzed by visual-language models to identify interface elements and perform actions. It allows developers to automate interactions on web applications, desktop software, and mobile devices without needing platform-specific automation logic. Developers can describe tasks such as clicking buttons, filling forms, or extracting information, and the system interprets these commands to interact with the interface accordingly. ...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 18
    Perception Models

    State-of-the-art Image & Video CLIP, Multimodal Large Language Models

    ...It introduces two primary components: the Perception Encoder (PE) for visual feature extraction and the Perception Language Model (PLM) for multimodal decoding and reasoning. The PE module is a family of vision encoders designed to excel in image and video understanding, surpassing models like SigLIP2, InternVideo2, and DINOv2 across multiple benchmarks. Meanwhile, PLM integrates with PE to power vision-language modeling, achieving results competitive with leading multimodal systems such as QwenVL2.5 and InternVL3, all while being fully reproducible with open data. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 19
    Segment Anything

    Provides code for running inference with the Segment Anything Model

    ...It’s a promptable segmenter: you guide it with points, boxes, or rough masks, and it predicts high-quality object masks consistent with the prompt. The architecture separates a powerful image encoder from a lightweight mask decoder, so the heavy vision work can be computed once and the interactive part stays fast. A bundled automatic mask generator can sweep an image and propose many object masks, which is useful for dataset bootstrapping or bulk annotation. The repository includes ready-to-use weights, Python APIs, and example notebooks demonstrating both interactive and automatic modes. ...
    Downloads: 4 This Week
    Last Update:
    See Project
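The encoder/decoder split described above can be illustrated with a deliberately tiny stand-in. This is not the segment-anything API (the real encoder is a large ViT and the decoder a small transformer); it only shows the design: the expensive encoding runs once per image, then every prompt is answered cheaply against the cached result:

```python
def encode(image):
    # stand-in for the heavy image encoder: here, just a copy of the pixels
    return [row[:] for row in image]

def decode(embedding, point, tol=10):
    # stand-in for the lightweight mask decoder: mask all pixels whose value
    # is close to the prompted pixel's intensity
    x, y = point
    target = embedding[y][x]
    return [[1 if abs(p - target) <= tol else 0 for p in row]
            for row in embedding]

image = [[10, 10, 200],
         [10, 10, 200],
         [90, 90, 200]]
emb = encode(image)            # expensive step, run once per image
mask_a = decode(emb, (0, 0))   # prompt 1: a point in the dark region
mask_b = decode(emb, (2, 0))   # prompt 2: a point in the bright column
print(mask_a, mask_b)
```

Because only `decode` reruns per prompt, interactive use (clicking points, dragging boxes) stays fast even though the per-image encoding is heavy; this is the property the description calls out.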
  • 20
    DINOv3

    Reference PyTorch implementation and models for DINOv3

    ...DINOv3 removes the need for complex augmentations or momentum encoders, streamlining the pipeline while maintaining or improving feature quality. The model supports multiple backbone architectures, including Vision Transformers (ViT), and can handle larger image resolutions with improved stability during training. The learned embeddings generalize robustly across tasks like classification, retrieval, and segmentation without fine-tuning, showing state-of-the-art transfer performance among self-supervised models.
    Downloads: 27 This Week
    Last Update:
    See Project
  • 21
    paperless-gpt

    Use LLMs and LLM Vision (OCR) to handle paperless-ngx

    paperless-gpt is an AI-powered extension for document management systems that enhances the capabilities of paperless-ngx by integrating large language models and vision-based OCR to automate document processing and organization. It is designed to transform scanned or uploaded documents into structured, searchable, and intelligently categorized data without requiring manual tagging or sorting. The system uses OCR combined with LLM reasoning to extract text, classify documents, and generate metadata such as tags, titles, and categories automatically. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 22
    AliceVision

    3D Computer Vision Framework

    AliceVision is an open-source photogrammetric computer vision framework designed to reconstruct detailed 3D scenes and camera motion from collections of images or videos. It provides a complete pipeline for structure-from-motion (SfM), multi-view stereo (MVS), and mesh generation, allowing users to convert 2D imagery into accurate 3D models. The framework is built with a strong emphasis on research-grade algorithms while maintaining the robustness required for production environments, making it suitable for industries such as visual effects, cultural heritage preservation, and robotics. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 23
    RF-DETR

    RF-DETR is a real-time object detection and segmentation model

    RF-DETR is an open-source computer vision framework that implements a real-time object detection and instance segmentation model based on transformer architectures. Developed by Roboflow, the project builds upon modern vision transformer backbones such as DINOv2 to achieve strong accuracy while maintaining efficient inference speeds suitable for real-time applications. The model is designed to detect objects and segment them within images or video streams using a unified detection pipeline. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 24
    Skywork-R1V4

    Skywork-R1V is an advanced multimodal AI model series

    Skywork-R1V is an open-source multimodal reasoning model designed to extend the capabilities of large language models into vision-language tasks that require complex logical reasoning. The project introduces a model architecture that transfers the reasoning abilities of advanced text-based models into visual domains so the system can interpret images and perform multi-step reasoning about them. Instead of retraining both language and vision models from scratch, the framework uses a lightweight visual projection layer that connects a pretrained vision backbone with a reasoning-capable language model. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    InternVL

    A Pioneering Open-Source Alternative to GPT-4o

    InternVL is a large-scale multimodal foundation model designed to integrate computer vision and language understanding within a unified architecture. The project focuses on scaling vision models and aligning them with large language models so that they can perform tasks involving both visual and textual information. InternVL is trained on massive collections of image-text data, enabling it to learn representations that capture both visual patterns and semantic meaning. ...
    Downloads: 0 This Week
    Last Update:
    See Project