113 projects for "object" with 2 filters applied:

  • $300 Free Credits for Your Google Cloud Projects Icon
    $300 Free Credits for Your Google Cloud Projects

    Start building on Google Cloud with $300 in free credits. No commitment, no credit card required until you're ready to scale.

    Launch your next project with $300 in free Google Cloud credits—no strings attached. Test, build, and deploy without risk. Use your credits across the entire Google Cloud platform to find what works best for your needs. After your credits are used, continue with always-free tier services. Only pay when you're ready to scale. Sign up in minutes and start exploring.
    Start Free Trial
  • Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure Icon
    Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure

    Native application identity and user-based security for your Azure cloud

    Gain integrated visibility across all traffic in a single pass. Deploy Palo Alto Networks VM-Series to determine application identity and content while automating security policy updates via rich APIs.
    Get a free trial
  • 1
    Token-Oriented Object Notation

    Token-Oriented Object Notation

    Token-Oriented Object Notation (TOON)

    Token-Oriented Object Notation is an open specification and toolkit for a data serialization format called Token-Oriented Object Notation (TOON), designed specifically to optimize how structured data is passed to large language models. The format aims to reduce token overhead compared with traditional formats like JSON while remaining human-readable and structurally expressive.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 2
    Frigate

    Frigate

    NVR with realtime local object detection for IP cameras

    Frigate - NVR With Realtime Object Detection for IP Cameras A complete and local NVR designed for Home Assistant with AI object detection. Uses OpenCV and Tensorflow to perform realtime object detection locally for IP cameras. Use of a Google Coral Accelerator is optional, but highly recommended. The Coral will outperform even the best CPUs and can process 100+ FPS with very little overhead.
    Downloads: 51 This Week
    Last Update:
    See Project
  • 3
    SAM 3D Objects

    SAM 3D Objects

    Models for object and human mesh reconstruction

    SAM 3D Objects is a foundation model that reconstructs full 3D geometry, texture, and spatial layout of objects and scenes from a single image. Given one RGB image and object masks (for example, from the Segment Anything family), it can generate a textured 3D mesh for each object, including pose and approximate scene layout. The model is specifically designed to be robust in real-world images with clutter, occlusions, small objects, and unusual viewpoints, where many earlier 3D-from-image systems struggle. ...
    Downloads: 24 This Week
    Last Update:
    See Project
  • 4
    BoxMOT

    BoxMOT

    Pluggable SOTA multi-object tracking modules for segmentation

    BoxMOT is an open-source framework designed to provide modular implementations of state-of-the-art multi-object tracking algorithms for computer vision applications. The project focuses on the tracking-by-detection paradigm, where objects detected by vision models are continuously tracked across frames in a video sequence. It provides a pluggable architecture that allows developers to combine different object detectors with multiple tracking algorithms without modifying the core codebase. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • Build Agents and Models on One Platform Icon
    Build Agents and Models on One Platform

    Everything you need to build production-ready agents and models. Access 200+ Google and third-party AI models and tools.

    Gemini Enterprise Agent Platform is Google Cloud's comprehensive platform for developers to build, scale, govern, and optimize agents and models. Choose from Google's most advanced models and third-party models like Anthropic's Claude Model Family.
    Try It Free
  • 5
    CutLER

    CutLER

    Code release for Cut and Learn for Unsupervised Object Detection

    CutLER is an approach for unsupervised object detection and instance segmentation that trains detectors without human-annotated labels, and the repo also includes VideoCutLER for unsupervised video instance segmentation. The method follows a “Cut-and-LEaRn” recipe: bootstrap object proposals, refine them iteratively, and train detection/segmentation heads to discover objects across diverse datasets.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    VOID

    VOID

    Video Object and Interaction Deletion

    ...Built on top of transformer-based architectures and fine-tuned for video inpainting tasks, the system uses interaction-aware mask conditioning to ensure temporal consistency across frames. One of its most notable capabilities is its ability to simulate realistic scene behavior after object removal, such as causing an object to fall naturally if its support is removed, which significantly enhances realism.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 7
    RF-DETR

    RF-DETR

    RF-DETR is a real-time object detection and segmentation

    RF-DETR is an open-source computer vision framework that implements a real-time object detection and instance segmentation model based on transformer architectures. Developed by Roboflow, the project builds upon modern vision transformer backbones such as DINOv2 to achieve strong accuracy while maintaining efficient inference speeds suitable for real-time applications. The model is designed to detect objects and segment them within images or video streams using a unified detection pipeline. ...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 8
    Frigate NVR

    Frigate NVR

    NVR with realtime local object detection for IP cameras

    Frigate is a local network video recorder designed for real-time object detection on IP camera streams using machine learning. It runs entirely on local hardware and integrates closely with Home Assistant to provide smart surveillance without relying on cloud processing. The system uses OpenCV and TensorFlow to analyze video feeds and detect objects such as people, vehicles, and animals in real time.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 9
    Grounded-Segment-Anything

    Grounded-Segment-Anything

    Marrying Grounding DINO with Segment Anything & Stable Diffusion

    Grounded-Segment-Anything is a research-oriented project that combines powerful open-set object detection with pixel-level segmentation and subsequent creative workflows, effectively enabling detection, segmentation, and high-level vision tasks guided by free-form text prompts. The core idea behind the project is to pair Grounding DINO — a zero-shot object detector that can locate objects described by natural language — with Segment Anything Model (SAM), which can produce detailed masks for objects once they are localized. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Auth0 B2B Essentials: SSO, MFA, and RBAC Built In Icon
    Auth0 B2B Essentials: SSO, MFA, and RBAC Built In

    Unlimited organizations, 3 enterprise SSO connections, role-based access control, and pro MFA included. Dev and prod tenants out of the box.

    Auth0's B2B Essentials plan gives you everything you need to ship secure multi-tenant apps. Unlimited orgs, enterprise SSO, RBAC, audit log streaming, and higher auth and API limits included. Add on M2M tokens, enterprise MFA, or additional SSO connections as you scale.
    Sign Up Free
  • 10
    hCaptcha Challenger

    hCaptcha Challenger

    Gracefully face hCaptcha challenge with multimodal llms

    ...Instead of relying on third-party captcha-solving services or browser scripts, the system operates independently by using pretrained neural networks that can classify images, detect objects, and interpret spatial relationships. The framework includes support for multiple types of captcha challenges such as object selection, drag-and-drop puzzles, and image labeling tasks. It implements an agent-style workflow where the system interprets the challenge prompt, selects the appropriate vision model, and generates the required interaction automatically.
    Downloads: 10 This Week
    Last Update:
    See Project
  • 11
    Segment Anything

    Segment Anything

    Provides code for running inference with the SegmentAnything Model

    Segment Anything (SAM) is a foundation model for image segmentation that’s designed to work “out of the box” on a wide variety of images without task-specific fine-tuning. It’s a promptable segmenter: you guide it with points, boxes, or rough masks, and it predicts high-quality object masks consistent with the prompt. The architecture separates a powerful image encoder from a lightweight mask decoder, so the heavy vision work can be computed once and the interactive part stays fast. A bundled automatic mask generator can sweep an image and propose many object masks, which is useful for dataset bootstrapping or bulk annotation. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    Ultralytics

    Ultralytics

    Ultralytics YOLO

    Ultralytics is a comprehensive computer vision framework that provides state-of-the-art implementations of the YOLO (You Only Look Once) family of models, enabling developers to perform tasks such as object detection, segmentation, classification, tracking, and pose estimation within a unified system. It is designed to be fast, accurate, and easy to use, offering both command-line and Python-based interfaces for training, validation, and deployment of machine learning models. The framework supports a full end-to-end workflow, including dataset preparation, model training, evaluation, and export to various deployment formats. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 13
    Qwen-Image-Layered

    Qwen-Image-Layered

    Qwen-Image-Layered: Layered Decomposition for Inherent Editablity

    Qwen-Image-Layered is an extension of the Qwen series of multimodal models that introduces layered image understanding, enabling the model to reason about hierarchical visual structures — such as separating foreground, background, objects, and contextual layers within an image. This architecture allows richer semantic interpretation, enabling use cases such as scene decomposition, object-level editing, layered captioning, and more fine-grained multimodal reasoning than with flat image encodings alone. By combining text and structured image representations, it aims to facilitate tasks where both descriptive and structural understanding are important, such as detailed image QA, interactive image editing via prompt layers, and image-conditioned generation with structural control. ...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 14
    GeoAI

    GeoAI

    GeoAI: Artificial Intelligence for Geospatial Data

    ...It provides a unified framework that combines machine learning libraries such as PyTorch and Transformers with geospatial tools, allowing users to process satellite imagery, aerial photos, and vector datasets in a streamlined workflow. The platform supports a wide range of tasks including image classification, object detection, segmentation, and change detection, making it suitable for applications in environmental monitoring, urban planning, and disaster response. GeoAI simplifies complex workflows by offering high-level APIs that abstract data preprocessing, model training, and inference, reducing the technical barrier for users who are not experts in both AI and geospatial systems.
    Downloads: 11 This Week
    Last Update:
    See Project
  • 15
    Interactive Machine Learning Experiments

    Interactive Machine Learning Experiments

    Interactive Machine Learning experiments

    ...The project combines Jupyter or Colab notebooks with browser-based visual demos that allow users to see trained models operating in real time. Many experiments involve tasks such as image classification, object detection, gesture recognition, and simple generative models. The models are typically trained in Python using TensorFlow and then exported for interactive demonstrations in a web environment using JavaScript and TensorFlow.js. Because the project focuses on experimentation rather than production systems, it acts as a sandbox where developers can explore machine learning concepts and observe model behavior. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 16
    InternGPT

    InternGPT

    Open source demo platform where you can easily showcase your AI models

    ...Unlike traditional chat systems that rely solely on text prompts, InternGPT allows users to interact with visual content using both language and nonverbal signals such as pointing or highlighting objects within images. The framework connects multiple specialized AI models that perform tasks such as object detection, segmentation, captioning, and visual editing while coordinating them through a central conversational interface. This architecture enables the system to plan actions, execute visual operations, and return results in a coherent dialogue with the user.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    LLM Vision

    LLM Vision

    Visual intelligence for your home.

    LLM Vision is an open-source integration for Home Assistant that adds multimodal large language model capabilities to smart home environments. The project enables Home Assistant to analyze images, video files, and live camera feeds using vision-capable AI models. Instead of relying only on traditional object detection pipelines, it allows users to send prompts about visual content and receive contextual descriptions or answers about what is happening in camera footage. The system can process events from surveillance platforms such as Frigate and convert them into meaningful summaries, notifications, or structured data for automation workflows. ...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 18
    gTTS

    gTTS

    Python library and CLI tool to interface with Google Translate

    gTTS (Google Text-to-Speech) is a Python library and command-line tool that wraps the speech functionality of Google Translate. It lets you send text to the Google Translate TTS endpoint and receive spoken audio back as MP3 data, either written to a file, a file-like object, or standard output. The library is designed to handle long texts, using a speech-specific sentence tokenizer that keeps intonation and punctuation natural while splitting requests into acceptable chunks. It supports customizable text pre-processors, which can correct pronunciations, tweak formatting, or handle domain-specific vocabulary before sending it to the API. gTTS is primarily aimed at developers who want a quick way to add cloud-backed speech to scripts, apps, or pipelines without managing any model weights locally. ...
    Downloads: 10 This Week
    Last Update:
    See Project
  • 19
    Depth Anything 3

    Depth Anything 3

    Recovering the Visual Space from Any Views

    ...Designed to work across diverse scenes, lighting conditions, and image types, it uses advanced neural networks trained on large, heterogeneous datasets, producing depth maps that reveal scene depth relationships and object surfaces with strong fidelity. The model can be applied to photography, AR/VR content creation, robotics perception, and 3D reconstruction workflows, making it versatile across industries and research domains. It includes support for high-resolution inputs and post-processing tools that refine depth predictions, helping downstream tasks like segmentation, bounding volume estimation, and mixed reality layering.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 20
    Sa2VA

    Sa2VA

    Official Repo For "Sa2VA: Marrying SAM2 with LLaVA

    ...With minimal instruction tuning (often one-shot), Sa2VA can handle tasks such as “segment the main subject,” “what are the objects in this scene?”, or “track this object through the video,” outputting pixel-perfect masks or spoken/textual answers as appropriate.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    MemOS

    MemOS

    AI memory OS for LLM and Agent systems

    MemOS is an experimental operating system and runtime built around the concept of memory-centric computing, where memory objects are first-class citizens and program execution is organized around efficient, persistent memory access rather than traditional process and file system boundaries. The project explores rethinking system abstractions by tightly coupling computation with memory objects so that programs can operate on large datasets without expensive serialization or context switching....
    Downloads: 2 This Week
    Last Update:
    See Project
  • 22
    React Native ExecuTorch

    React Native ExecuTorch

    Declarative way to run AI models in React Native on device

    ...It is powered by ExecuTorch and provides a declarative approach to on-device model execution. The project supports a range of AI use cases, including large language models, computer vision, OCR, object detection, speech processing, segmentation, and embeddings. It helps React Native developers use local AI capabilities without needing deep native programming or machine learning infrastructure expertise. The library is especially relevant for privacy-first apps, offline experiences, and mobile products that need low-latency inference. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 23
    Open Vision Agents by Stream

    Open Vision Agents by Stream

    Build Vision Agents quickly with any model or video provider

    ...The framework uses Stream’s ultra low latency edge network so agents can join sessions quickly and maintain very low audio and video latency while processing frames and generating responses. Developers work with an agent abstraction that connects video edge providers, LLMs, and processors into pipelines, making it easier to orchestrate tasks like object detection, pose estimation, and conversational guidance. The project includes SDKs for React, Android, iOS, Flutter, React Native, and Unity, enabling integration into a wide variety of client environments such as mobile apps, web apps, and games.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 24
    MediaPipe Solutions

    MediaPipe Solutions

    Cross-platform, customizable ML solutions

    ...These pipelines can run on a wide variety of platforms including mobile devices, desktop systems, web browsers, and embedded edge devices. MediaPipe is widely used in computer vision and multimedia applications such as hand tracking, face detection, pose estimation, object recognition, and gesture analysis. The framework includes prebuilt solutions that developers can quickly integrate into applications as well as lower-level APIs that allow custom pipeline construction.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 25
    DreamCraft3D

    DreamCraft3D

    Official implementation of DreamCraft3D

    ...Because 3D generation is hardware‐intensive, the repository likely also includes optimizations like quantization, pruning, or inference accelerations (e.g. using FlashMLA or DeepEP) to make the generation pipeline faster or more efficient. DreamCraft3D may also support style or attribute control (e.g. “make this object metallic,” “add textures”) via prompt conditioning or guides.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next
Auth0 Logo