Showing 29 open source projects for "low vision"

View related business solutions
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • Stop Storing Third-Party Tokens in Your Database Icon
    Stop Storing Third-Party Tokens in Your Database

    Auth0 Token Vault handles secure token storage, exchange, and refresh for external providers so you don't have to build it yourself.

    Rolling your own OAuth token storage can be a security liability. Token Vault securely stores access and refresh tokens from federated providers and handles exchange and renewal automatically. Connected accounts, refresh exchange, and privileged worker flows included.
    Try Auth0 for Free
  • 1
    Open Vision Agents by Stream

    Open Vision Agents by Stream

    Build Vision Agents quickly with any model or video provider

    ...The framework uses Stream’s ultra low latency edge network so agents can join sessions quickly and maintain very low audio and video latency while processing frames and generating responses. Developers work with an agent abstraction that connects video edge providers, LLMs, and processors into pipelines, making it easier to orchestrate tasks like object detection, pose estimation, and conversational guidance.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    Phi-3-MLX

    Phi-3-MLX

    Phi-3.5 for Mac: Locally-run Vision and Language Models

    Phi-3-Vision-MLX is an Apple MLX (machine learning on Apple silicon) implementation of Phi-3 Vision, a lightweight multi-modal model designed for vision and language tasks. It focuses on running vision-language AI efficiently on Apple hardware like M1 and M2 chips.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    Kornia

    Kornia

    Open Source Differentiable Computer Vision Library

    Kornia is a differentiable computer vision library for PyTorch. It consists of a set of routines and differentiable modules to solve generic computer vision problems. At its core, the package uses PyTorch as its main backend both for efficiency and to take advantage of the reverse-mode auto-differentiation to define and compute the gradient of complex functions. Inspired by existing packages, this library is composed by a subset of packages containing operators that can be inserted within neural networks to train models to perform image transformations, epipolar geometry, depth estimation, and low-level image processing such as filtering and edge detection that operate directly on tensors. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    MiniMind-V

    MiniMind-V

    "Big Model" trains a visual multimodal VLM with 26M parameters

    MiniMind-V is an experimental open-source project that aims to train a very small multimodal vision–language model (VLM) from scratch with extremely low compute and cost, making research and experimentation accessible to more people. The repository showcases training workflows and code designed to produce a 26-million parameter model—including both image and text capabilities—using minimal resources in very little time, reflecting a trend toward democratizing AI research. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • 5
    dots.ocr

    dots.ocr

    Multilingual Document Layout Parsing in a Single Vision-Language Model

    dots.ocr is a cutting-edge multilingual document parsing system built on a unified vision-language model that combines layout detection, text recognition, and structural understanding into a single architecture. Unlike traditional OCR pipelines that rely on multiple specialized components, dots.ocr integrates these processes end-to-end, reducing error propagation and improving consistency across tasks. The model is designed to recognize virtually any human script, making it highly effective for global and low-resource language scenarios. ...
    Downloads: 10 This Week
    Last Update:
    See Project
  • 6
    CleanVision

    CleanVision

    Automatically find issues in image datasets

    CleanVision automatically detects potential issues in image datasets like images that are: blurry, under/over-exposed, (near) duplicates, etc. This data-centric AI package is a quick first step for any computer vision project to find problems in the dataset, which you want to address before applying machine learning. CleanVision is super simple -- run the same couple lines of Python code to audit any image dataset! The quality of machine learning models hinges on the quality of the data used to train them, but it is hard to manually identify all of the low-quality data in a big dataset. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 7
    Cactus

    Cactus

    Low-latency AI inference engine optimized for mobile devices

    Cactus is a low-latency, energy-efficient AI inference framework designed specifically for mobile devices and wearables, enabling advanced machine learning capabilities directly on-device. It provides a full-stack architecture composed of an inference engine, a computation graph system, and highly optimized hardware kernels tailored for ARM-based processors.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    LingBot-VLA

    LingBot-VLA

    A Pragmatic VLA Foundation Model

    ...The model aims to bridge vision, language understanding, and motor control within one unified architecture, making it capable of understanding high-level instructions and generating coherent low-level actions in physical environments. Because LingBot-VLA includes not just the model weights but also a full production-ready codebase with tools for data handling, training, and evaluation, developers can adapt it to custom robots or simulation environments efficiently.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    fastai

    fastai

    Deep learning library

    fastai is a deep learning library which provides practitioners with high-level components that can quickly and easily provide state-of-the-art results in standard deep learning domains, and provides researchers with low-level components that can be mixed and matched to build new approaches. It aims to do both things without substantial compromises in ease of use, flexibility, or performance. This is possible thanks to a carefully layered architecture, which expresses common underlying...
    Downloads: 3 This Week
    Last Update:
    See Project
  • AI-generated apps that pass security review Icon
    AI-generated apps that pass security review

    Stop waiting on engineering. Build production-ready internal tools with AI—on your company data, in your cloud.

    Retool lets you generate dashboards, admin panels, and workflows directly on your data. Type something like “Build me a revenue dashboard on my Stripe data” and get a working app with security, permissions, and compliance built in from day one. Whether on our cloud or self-hosted, create the internal software your team needs without compromising enterprise standards or control.
    Try Retool free
  • 10
    RuView

    RuView

    Turn WiFi signals into real-time human sensing and spatial awareness.

    ...Built on the concept of WiFi DensePose, it analyzes disturbances in WiFi Channel State Information (CSI) caused by human movement to reconstruct body position, breathing patterns, heart rate, and presence. Unlike traditional vision systems, RuView operates without cameras, wearables, or cloud connectivity, making it a privacy-first sensing solution. The system runs on low-cost hardware such as ESP32 sensor meshes and performs signal processing and machine learning directly at the edge. By learning the RF signature of each environment over time, RuView adapts automatically to different spaces and improves its sensing accuracy. ...
    Downloads: 74 This Week
    Last Update:
    See Project
  • 11
    Qwen2.5-Omni

    Qwen2.5-Omni

    Capable of understanding text, audio, vision, video

    Qwen2.5-Omni is an end-to-end multimodal flagship model in the Qwen series by Alibaba Cloud, designed to process multiple modalities (text, images, audio, video) and generate responses both as text and natural speech in streaming real-time. It supports “Thinker-Talker” architecture, and introduces innovations for aligning modalities over time (for example synchronizing video/audio), robust speech generation, and low-VRAM/quantized versions to make usage more accessible. It holds...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 12
    Cosmos-RL

    Cosmos-RL

    Cosmos-RL is a flexible and scalable Reinforcement Learning framework

    Cosmos-RL is a scalable reinforcement learning framework designed specifically for physical AI systems such as robotics, autonomous agents, and multimodal models. It provides a distributed training architecture that separates policy learning and environment rollout processes, enabling efficient and asynchronous reinforcement learning at scale. The framework supports multiple parallelism strategies, including tensor, pipeline, and data parallelism, allowing it to leverage large GPU clusters...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 13
    SageAttention

    SageAttention

    NeurIPS2025 Spotlight] Quantized Attention

    SageAttention is an open-source optimization library designed to accelerate the attention mechanism used in transformer-based neural networks. Since attention operations are often the most computationally expensive component of modern AI models, SageAttention introduces quantization techniques that significantly reduce computational overhead while preserving model accuracy. The system achieves this by using low-precision numerical formats such as INT4, FP8, or INT8 to represent key matrices...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 14
    CoreNet

    CoreNet

    CoreNet: A library for training deep neural networks

    CoreNet is Apple’s internal deep learning framework for distributed neural network training, designed for high scalability, low-latency communication, and strong hardware efficiency. It focuses on enabling large-scale model training across clusters of GPUs and accelerators by optimizing data flow and parallelism strategies. CoreNet provides abstractions for data, tensor, and pipeline parallelism, allowing models to scale without code duplication or heavy manual configuration.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    FastDeploy

    FastDeploy

    High-performance Inference and Deployment Toolkit for LLMs and VLMs

    FastDeploy is an open-source inference and deployment toolkit designed to simplify the process of running and serving deep learning models across a wide range of hardware platforms. Developed within the PaddlePaddle ecosystem, the toolkit focuses on providing high-performance deployment capabilities for modern AI models including large language models and vision-language systems. The platform enables developers to deploy trained models quickly using optimized inference pipelines that support...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    GLM-4.6V

    GLM-4.6V

    GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

    GLM-4.6V represents the latest generation of the GLM-V family and marks a major step forward in multimodal AI by combining advanced vision-language understanding with native “tool-call” capabilities, long-context reasoning, and strong generalization across domains. Unlike many vision-language models that treat images and text separately or require intermediate conversions, GLM-4.6V allows inputs such as images, screenshots or document pages directly as part of its reasoning pipeline — and...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 17
    FLUX.2

    FLUX.2

    Official inference repo for FLUX.2 models

    ...It supports high-resolution output (up to ~4 megapixels), which allows for photography-quality images, detailed product shots, infographics or UI mockups rather than just low-resolution drafts. FLUX.2 is built with a modern architecture (a flow-matching transformer + a revamped VAE + a strong vision-language encoder), enabling strong prompt adherence, correct rendering of text/typography in images, reliable lighting, layout, and physical realism, and consistent style/character/product identity across multiple generations or edits.
    Downloads: 43 This Week
    Last Update:
    See Project
  • 18
    HomeRobot

    HomeRobot

    Mobile manipulation research tools for roboticists

    HomeRobot is an open source robotics framework developed by Facebook Research (Meta AI) that provides a complete software stack for mobile manipulation in real-world and simulated environments. Designed for low-cost robots such as the Hello Robot Stretch, HomeRobot enables agents to perceive, navigate, and interact with their surroundings through vision and manipulation. The system focuses on Open Vocabulary Mobile Manipulation (OVMM) — a challenging task in which a robot must explore an unknown environment, identify an object by name, locate a suitable receptacle, and place the object appropriately. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 19
    PaddleSpeech

    PaddleSpeech

    Easy-to-use Speech Toolkit including Self-Supervised Learning model

    PaddleSpeech is an open-source toolkit on PaddlePaddle platform for a variety of critical tasks in speech and audio, with state-of-art and influential models. Via the easy-to-use, efficient, flexible and scalable implementation, our vision is to empower both industrial application and academic research, including training, inference & testing modules, and deployment process. Low barriers to install, CLI, Server, and Streaming Server is available to quick-start your journey. We provide high-speed and ultra-lightweight models, and also cutting-edge technology. We provide production ready streaming asr and streaming tts system. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 20
    SteadyDancer

    SteadyDancer

    Harmonized and Coherent Human Image Animation

    SteadyDancer is a research-oriented motion stabilization and dancer tracking system designed to analyze and correct motion in videos, making captured performances appear smoother and more stable while preserving expressiveness. It employs computer vision and motion modeling to estimate and reduce unwanted jitters, shakes, or camera wobbles — particularly in dance or movement sequences where traditional smoothing would distort intentional motion. By differentiating between intentional...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    MMEditing

    MMEditing

    MMEditing is a low-level vision toolbox based on PyTorch

    MMEditing is an open-source toolbox for low-level vision. It supports various tasks. MMEditing is a low-level vision toolbox based on PyTorch, supporting super-resolution, inpainting, matting, video interpolation, etc. We decompose the editing framework into different components and one can easily construct a customized editor framework by combining different modules. The toolbox directly supports popular and contemporary inpainting, matting, super-resolution and generation tasks. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    EvaDB

    EvaDB

    Database system for building simpler and faster AI-powered application

    Over the last decade, AI models have radically changed the world of natural language processing and computer vision. They are accurate on various tasks ranging from question answering to object tracking in videos. To use an AI model, the user needs to program against multiple low-level libraries, like PyTorch, Hugging Face, Open AI, etc. This tedious process often leads to a complex AI app that glues together these libraries to accomplish the given task.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    iJEPA

    iJEPA

    Official codebase for I-JEPA

    i-JEPA (Image Joint-Embedding Predictive Architecture) is a self-supervised learning framework that predicts missing high-level representations rather than reconstructing pixels. A context encoder sees visible regions of an image and predicts target embeddings for masked regions produced by a slowly updated target encoder, focusing learning on semantics instead of texture. This objective sidesteps generative pixel losses and avoids heavy negative sampling, producing features that transfer...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    qiji-font

    qiji-font

    Typeface from Ming Dynasty woodblock printed books

    ...Generate a low-poly mask for each character on the grid, and save the thumbnails (using OpenCV). First, red channel is subtracted from the grayscale, in order to clean the annotations printed in red ink. Next, the image is thresholded and fed into the contour-tracing algorithm. A metric is then used to discard shapes that are unlikely to be part of the character in interest.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 25
    PaddlePaddle models

    PaddlePaddle models

    Pre-trained and Reproduced Deep Learning Models

    Pre-trained and Reproduced Deep Learning Models ("Flying Paddle" official model library, including a variety of academic frontier and industrial scene verification of deep learning models) Flying Paddle's industrial-level model library includes a large number of mainstream models that have been polished by industrial practice for a long time and models that have won championships in international competitions; it provides many scenarios for semantic understanding, image classification,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next
MongoDB Logo MongoDB