Showing 136 open source projects for "segmentation"

View related business solutions
  • $300 Free Credits for Your Google Cloud Projects Icon
    $300 Free Credits for Your Google Cloud Projects

    Start building on Google Cloud with $300 in free credits. No commitment, no credit card required until you're ready to scale.

    Launch your next project with $300 in free Google Cloud credits—no strings attached. Test, build, and deploy without risk. Use your credits across the entire Google Cloud platform to find what works best for your needs. After your credits are used, continue with always-free tier services. Only pay when you're ready to scale. Sign up in minutes and start exploring.
    Start Free Trial
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 1
    SlowFast

    SlowFast

    Video understanding codebase from FAIR for reproducing video models

    ...Together, these two pathways complement each other, allowing the network to model both appearance and motion without excessive computational cost. The architecture is modular and supports tasks like action recognition, temporal localization, and video segmentation, performing strongly on benchmarks like Kinetics and AVA. The repository provides training recipes, pretrained models, and distributed pipelines optimized for large-scale video datasets.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    CO3D (Common Objects in 3D)

    CO3D (Common Objects in 3D)

    Tooling for the Common Objects In 3D dataset

    ...It builds upon the original CO3Dv1 dataset, expanding both scale and quality—featuring 2× more sequences and 4× more frames, with improved image fidelity, more accurate segmentation masks, and enhanced annotations for object-centric 3D reconstruction. CO3Dv2 enables research in multi-view 3D reconstruction, novel view synthesis, and geometry-aware representation learning. Each of the thousands of sequences in CO3Dv2 captures a common object (from categories like cars, chairs, or plants) from multiple real-world viewpoints. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 3
    Advanced AI explainability for PyTorch

    Advanced AI explainability for PyTorch

    Advanced AI Explainability for computer vision

    ...These visualization techniques allow developers and researchers to better understand how convolutional neural networks and transformer-based vision models make predictions. The library supports a wide variety of tasks including image classification, object detection, semantic segmentation, and similarity analysis. It also provides metrics and evaluation tools that help measure the reliability and quality of the generated explanations. By integrating easily with PyTorch models, the library allows developers to diagnose model errors, detect biases in datasets, and improve model transparency.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Segment Anything

    Segment Anything

    Provides code for running inference with the SegmentAnything Model

    Segment Anything (SAM) is a foundation model for image segmentation that’s designed to work “out of the box” on a wide variety of images without task-specific fine-tuning. It’s a promptable segmenter: you guide it with points, boxes, or rough masks, and it predicts high-quality object masks consistent with the prompt. The architecture separates a powerful image encoder from a lightweight mask decoder, so the heavy vision work can be computed once and the interactive part stays fast. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure Icon
    Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure

    Native application identity and user-based security for your Azure cloud

    Gain integrated visibility across all traffic in a single pass. Deploy Palo Alto Networks VM-Series to determine application identity and content while automating security policy updates via rich APIs.
    Get a free trial
  • 5
    DeepDetect

    DeepDetect

    Deep Learning API and Server in C++14 support for Caffe, PyTorch

    ...While the Open Source Deep Learning Server is the core element, with REST API, and multi-platform support that allows training & inference everywhere, the Deep Learning Platform allows higher level management for training neural network models and using them as if they were simple code snippets. Ready for applications of image tagging, object detection, segmentation, OCR, Audio, Video, Text classification, CSV for tabular data and time series. Neural network templates for the most effective architectures for GPU, CPU, and Embedded devices. Training in a few hours and with small data thanks to 25+ pre-trained models. Full Open Source, with an ecosystem of tools (API clients, video, annotation, ...) ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 6
    InternGPT

    InternGPT

    Open source demo platform where you can easily showcase your AI models

    ...Unlike traditional chat systems that rely solely on text prompts, InternGPT allows users to interact with visual content using both language and nonverbal signals such as pointing or highlighting objects within images. The framework connects multiple specialized AI models that perform tasks such as object detection, segmentation, captioning, and visual editing while coordinating them through a central conversational interface. This architecture enables the system to plan actions, execute visual operations, and return results in a coherent dialogue with the user.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    SAHI

    SAHI

    A lightweight vision library for performing large object detection

    A lightweight vision library for performing large-scale object detection & instance segmentation. Object detection and instance segmentation are by far the most important fields of applications in Computer Vision. However, detection of small objects and inference on large images are still major issues in practical usage. Here comes the SAHI to help developers overcome these real-world problems with many vision utilities.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    Wink-NLP

    Wink-NLP

    Developer friendly Natural Language Processing

    Wink-NLP is a lightweight and fast natural language processing library for JavaScript, optimized for browser and Node.js environments.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Hazm

    Hazm

    Persian NLP Toolkit

    Hazm is a natural language processing (NLP) library for Persian text, offering various tools for text preprocessing, tokenization, part-of-speech tagging, and more.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Build Agents and Models on One Platform Icon
    Build Agents and Models on One Platform

    Everything you need to build production-ready agents and models. Access 200+ Google and third-party AI models and tools.

    Gemini Enterprise Agent Platform is Google Cloud's comprehensive platform for developers to build, scale, govern, and optimize agents and models. Choose from Google's most advanced models and third-party models like Anthropic's Claude Model Family.
    Try It Free
  • 10
    Apache OpenNLP

    Apache OpenNLP

    Apache OpenNLP

    Apache OpenNLP is a machine learning-based NLP library that provides tools for text-processing tasks such as tokenization, sentence segmentation, and named entity recognition.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    spaCy models

    spaCy models

    Models for the spaCy Natural Language Processing (NLP) library

    spaCy is designed to help you do real work, to build real products, or gather real insights. The library respects your time, and tries to avoid wasting it. It's easy to install, and its API is simple and productive. spaCy excels at large-scale information extraction tasks. It's written from the ground up in carefully memory-managed Cython. If your application needs to process entire web dumps, spaCy is the library you want to be using. Since its release in 2015, spaCy has become an industry...
    Downloads: 10 This Week
    Last Update:
    See Project
  • 12
    VideoCaptioner

    VideoCaptioner

    AI-powered tool for generating, optimizing, and translating subtitles

    VideoCaptioner is an open source AI-powered subtitle processing tool designed to simplify the workflow of creating subtitles for videos. It integrates speech recognition, language processing, and translation technologies to automatically generate and refine subtitles from video or audio sources. VideoCaptioner uses speech-to-text engines such as Whisper variants to transcribe spoken content and convert it into subtitle text with accurate timestamps. After transcription, large language models...
    Downloads: 15 This Week
    Last Update:
    See Project
  • 13
    RealtimeSTT

    RealtimeSTT

    A robust, efficient, low-latency speech-to-text library

    RealtimeSTT is a Python-based realtime speech-to-text engine emphasizing low latency, wake-word detection, voice activity detection, and automatic speech segmentation. It provides asynchronous callbacks, nanosecond-precision timestamps, and CLI tools, suitable for building voice assistants, meeting transcribers, or live caption systems.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Director

    Director

    AI video agents framework for next-gen video interactions

    Director is a video database management system designed to organize, search, and retrieve large collections of video content efficiently.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Depth Anything 3

    Depth Anything 3

    Recovering the Visual Space from Any Views

    ...The model can be applied to photography, AR/VR content creation, robotics perception, and 3D reconstruction workflows, making it versatile across industries and research domains. It includes support for high-resolution inputs and post-processing tools that refine depth predictions, helping downstream tasks like segmentation, bounding volume estimation, and mixed reality layering.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 16
    supervision

    supervision

    We write your reusable computer vision tools

    We write your reusable computer vision tools. Whether you need to load your dataset from your hard drive, draw detections on an image or video, or count how many detections are in a zone. You can count on us.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17
    Techniques

    Techniques

    Techniques for deep learning with satellite & aerial imagery

    This repository is a comprehensive, curated collection of deep learning techniques and best practices specifically applied to satellite and aerial imagery. It covers everything from preprocessing and annotation to model architectures and open datasets. The guide includes code snippets, links to research papers, and hands-on tools, making it valuable for researchers, engineers, and enthusiasts working in remote sensing and geospatial AI.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    DataChain

    DataChain

    AI-data warehouse to enrich, transform and analyze unstructured data

    ...Datachain can persist features of Python objects returned by AI models, and enables vectorized analytical operations over them. The typical use cases are data curation, LLM analytics and validation, image segmentation, pose detection, and GenAI alignment. Datachain is especially helpful if batch operations can be optimized – for instance, when synchronous API calls can be parallelized or where an LLM API offers batch processing.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 19
    ncnn

    ncnn

    High-performance neural network inference framework for mobile

    ...It is cross-platform and supports most commonly used CNN networks, including Classical CNN (VGG AlexNet GoogleNet Inception), Face Detection (MTCNN RetinaFace), Segmentation (FCN PSPNet UNet YOLACT), and more. ncnn is currently being used in a number of Tencent applications, namely: QQ, Qzone, WeChat, and Pitu.
    Downloads: 25 This Week
    Last Update:
    See Project
  • 20
    React Native ExecuTorch

    React Native ExecuTorch

    Declarative way to run AI models in React Native on device

    ...It is powered by ExecuTorch and provides a declarative approach to on-device model execution. The project supports a range of AI use cases, including large language models, computer vision, OCR, object detection, speech processing, segmentation, and embeddings. It helps React Native developers use local AI capabilities without needing deep native programming or machine learning infrastructure expertise. The library is especially relevant for privacy-first apps, offline experiences, and mobile products that need low-latency inference. Its main value is bridging React Native with native AI execution so apps can run powerful models directly on user devices.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 21
    Label Studio

    Label Studio

    Label Studio is a multi-type data labeling and annotation tool

    The most flexible data annotation tool. Quickly installable. Build custom UIs or use pre-built labeling templates. Detect objects on image, bboxes, polygons, circular, and keypoints supported. Partition image into multiple segments. Use ML models to pre-label and optimize the process. Label Studio is an open-source data labeling tool. It lets you label data types like audio, text, images, videos, and time series with a simple and straightforward UI and export to various model formats. It can...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 22
    UCO3D

    UCO3D

    Uncommon Objects in 3D dataset

    uCO3D is a large-scale 3D vision dataset and toolkit centered on turn-table videos of everyday objects drawn from the LVIS taxonomy. It provides about 170,000 full videos per object instance rather than still frames, along with per-video annotations including object masks, calibrated camera poses, and multiple flavors of point clouds. Each sequence also ships with a precomputed 3D Gaussian Splat reconstruction, enabling fast, differentiable rendering workflows and modern implicit/point-based...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 23
    AtomAI

    AtomAI

    Deep and Machine Learning for Microscopy

    AtomAI is a Pytorch-based package for deep and machine-learning analysis of microscopy data that doesn't require any advanced knowledge of Python or machine learning. The intended audience is domain scientists with a basic understanding of how to use NumPy and Matplotlib. It was developed by Maxim Ziatdinov at Oak Ridge National Lab. The purpose of the AtomAI is to provide an environment that bridges the instrument-specific libraries and general physical analysis by enabling the seamless...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 24
    Qwen-Image

    Qwen-Image

    Qwen-Image is a powerful image generation foundation model

    ...Qwen-Image supports sophisticated editing tasks such as style transfer, object insertion and removal, detail enhancement, and even human pose manipulation, making it suitable for both professional and casual users. It also includes advanced image understanding capabilities like object detection, semantic segmentation, depth and edge estimation, and novel view synthesis.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 25
    Easy DataSet

    Easy DataSet

    A powerful tool for creating datasets for LLM fine-tuning

    Easy DataSet is a comprehensive open-source tool designed to make creating high-quality datasets for large language model fine-tuning, retrieval-augmented generation (RAG), and evaluation as easy and automated as possible by providing intuitive interfaces and powerful parsing, segmentation, and labeling tools. It supports ingesting domain-specific documents in a wide range of formats — including PDF, Markdown, DOCX, EPUB, and plain text — and can intelligently segment, clean, and structure content into rich datasets tailored for downstream LLM training needs. The system includes automated question-generation capabilities, hierarchical label trees, and answer generation pipelines that use LLM APIs to produce coherent paired data with customizable templates. ...
    Downloads: 1 This Week
    Last Update:
    See Project
Auth0 Logo