Showing 25 open source projects for "visual object net"

View related business solutions
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 99.99% Uptime for MySQL and PostgreSQL Databases Icon
    99.99% Uptime for MySQL and PostgreSQL Databases

    Sub-second maintenance. 2x read/write performance. Built-in vector search for AI apps.

    Cloud SQL Enterprise Plus delivers near-zero downtime with 35 days of point-in-time recovery. Supports MySQL, PostgreSQL, and SQL Server.
    Try Free
  • 1
    VOID

    VOID

    Video Object and Interaction Deletion

    ...One of its most notable capabilities is its ability to simulate realistic scene behavior after object removal, such as causing an object to fall naturally if its support is removed, which significantly enhances realism.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 2
    InternGPT

    InternGPT

    Open source demo platform where you can easily showcase your AI models

    ...The framework connects multiple specialized AI models that perform tasks such as object detection, segmentation, captioning, and visual editing while coordinating them through a central conversational interface. This architecture enables the system to plan actions, execute visual operations, and return results in a coherent dialogue with the user.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    X-AnyLabeling

    X-AnyLabeling

    Effortless data labeling with AI support from Segment Anything

    ...The software integrates an AI-powered labeling engine that allows users to generate annotations automatically with the assistance of modern vision models such as Segment Anything and various object detection frameworks. It supports labeling tasks across images and videos and enables developers to prepare training datasets for tasks such as object detection, segmentation, classification, tracking, and pose estimation. The tool is built with an interactive graphical interface that simplifies annotation workflows and allows users to draw and edit labels directly on visual data. ...
    Downloads: 97 This Week
    Last Update:
    See Project
  • 4
    Label Studio

    Label Studio

    Label Studio is a multi-type data labeling and annotation tool

    The most flexible data annotation tool. Quickly installable. Build custom UIs or use pre-built labeling templates. Detect objects on image, bboxes, polygons, circular, and keypoints supported. Partition image into multiple segments. Use ML models to pre-label and optimize the process. Label Studio is an open-source data labeling tool. It lets you label data types like audio, text, images, videos, and time series with a simple and straightforward UI and export to various model formats. It can...
    Downloads: 12 This Week
    Last Update:
    See Project
  • Build Agents and Models on One Platform Icon
    Build Agents and Models on One Platform

    Everything you need to build production-ready agents and models. Access 200+ Google and third-party AI models and tools.

    Gemini Enterprise Agent Platform is Google Cloud's comprehensive platform for developers to build, scale, govern, and optimize agents and models. Choose from Google's most advanced models and third-party models like Anthropic's Claude Model Family.
    Try It Free
  • 5
    Qwen-Image-Layered

    Qwen-Image-Layered

    Qwen-Image-Layered: Layered Decomposition for Inherent Editablity

    Qwen-Image-Layered is an extension of the Qwen series of multimodal models that introduces layered image understanding, enabling the model to reason about hierarchical visual structures — such as separating foreground, background, objects, and contextual layers within an image. This architecture allows richer semantic interpretation, enabling use cases such as scene decomposition, object-level editing, layered captioning, and more fine-grained multimodal reasoning than with flat image encodings alone. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Grounded-Segment-Anything

    Grounded-Segment-Anything

    Marrying Grounding DINO with Segment Anything & Stable Diffusion

    Grounded-Segment-Anything is a research-oriented project that combines powerful open-set object detection with pixel-level segmentation and subsequent creative workflows, effectively enabling detection, segmentation, and high-level vision tasks guided by free-form text prompts. The core idea behind the project is to pair Grounding DINO — a zero-shot object detector that can locate objects described by natural language — with Segment Anything Model (SAM), which can produce detailed masks for...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Jaaz

    Jaaz

    Open source multimodal creative AI assistant with infinite canvas tool

    Jaaz is an open source multimodal creative assistant designed to help users generate and organize visual media using artificial intelligence. It functions as a creative workspace where images, videos, and visual storyboards can be produced and arranged on an infinite canvas environment. It combines AI agents with visual editing tools, allowing users to generate media through prompts, sketches, or simple instructions. Jaaz supports multiple AI models and can integrate both local and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    LLM Vision

    LLM Vision

    Visual intelligence for your home.

    LLM Vision is an open-source integration for Home Assistant that adds multimodal large language model capabilities to smart home environments. The project enables Home Assistant to analyze images, video files, and live camera feeds using vision-capable AI models. Instead of relying only on traditional object detection pipelines, it allows users to send prompts about visual content and receive contextual descriptions or answers about what is happening in camera footage. The system can process events from surveillance platforms such as Frigate and convert them into meaningful summaries, notifications, or structured data for automation workflows. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 9
    LatentSync

    LatentSync

    Taming Stable Diffusion for Lip Sync

    ...In effect, given a source video (with masked or reference frames) and an audio track, LatentSync directly generates frames whose lip motions and expressions align with the audio, producing convincing talking-head or animated lip-sync output. The system leverages a U-Net diffusion backbone, with cross-attention of audio embeddings (via an audio encoder) and reference video frames to guide generation, and applies a set of loss functions (temporal, perceptual, sync-net based) to enforce lip-sync accuracy, visual fidelity, and temporal consistency. Over versions, LatentSync has improved temporal stability and lowered resource requirements — making inference more practical (e.g. 8 GB VRAM for earlier versions, somewhat higher for latest models).
    Downloads: 5 This Week
    Last Update:
    See Project
  • Save Up to 91% on Cloud Compute With Spot VMs Icon
    Save Up to 91% on Cloud Compute With Spot VMs

    Automatic sustained-use discounts. One free VM per month. No negotiation needed.

    Run batch jobs at 60-91% off with Spot VMs. Long-running workloads get automatic discounts with sustained use.
    Try Free
  • 10
    Sa2VA

    Sa2VA

    Official Repo For "Sa2VA: Marrying SAM2 with LLaVA

    Sa2VA is a cutting-edge open-source multi-modal large language model (MLLM) developed by ByteDance that unifies dense segmentation, visual understanding, and language-based reasoning across both images and videos. It merges the segmentation power of a state-of-the-art video segmentation model (based on SAM‑2) with the vision-language reasoning capabilities of a strong LLM backbone (derived from models like InternVL2.5 / Qwen-VL series), yielding a system that can answer questions about visual content, perform referring segmentation, and maintain temporal consistency across frames in video. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    Depth Anything 3

    Depth Anything 3

    Recovering the Visual Space from Any Views

    Depth Anything 3 is a research-driven project that brings accurate and dense depth estimation to any input image or video, enabling foundational understanding of 3D structure from 2D visual content. Designed to work across diverse scenes, lighting conditions, and image types, it uses advanced neural networks trained on large, heterogeneous datasets, producing depth maps that reveal scene depth relationships and object surfaces with strong fidelity. The model can be applied to photography, AR/VR content creation, robotics perception, and 3D reconstruction workflows, making it versatile across industries and research domains. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 12
    LISA

    LISA

    LISA: Reasoning Segmentation via Large Language Model

    ...The project introduces a framework where a large language model can interpret natural language instructions and produce segmentation masks that highlight relevant regions in an image. Instead of relying solely on predefined object categories, the model is capable of reasoning about complex textual queries and translating them into visual segmentation outputs. This approach allows the system to identify objects or regions in images based on semantic descriptions, contextual reasoning, and world knowledge. The model integrates multimodal capabilities by combining language understanding with visual perception so that text instructions guide the segmentation process. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    MoCo (Momentum Contrast)

    MoCo (Momentum Contrast)

    Self-supervised visual learning using momentum contrast in PyTorch

    MoCo is an open source PyTorch implementation developed by Facebook AI Research (FAIR) for the papers “Momentum Contrast for Unsupervised Visual Representation Learning” (He et al., 2019) and “Improved Baselines with Momentum Contrastive Learning” (Chen et al., 2020). It introduces Momentum Contrast (MoCo), a scalable approach to self-supervised learning that enables visual representation learning without labeled data. The core idea of MoCo is to maintain a dynamic dictionary with a...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Pytorch-toolbelt

    Pytorch-toolbelt

    PyTorch extensions for fast R&D prototyping and Kaggle farming

    A pytorch-toolbelt is a Python library with a set of bells and whistles for PyTorch for fast R&D prototyping and Kaggle farming. Easy model building using flexible encoder-decoder architecture. Modules: CoordConv, SCSE, Hypercolumn, Depthwise separable convolution and more. GPU-friendly test-time augmentation TTA for segmentation and classification. GPU-friendly inference on huge (5000x5000) images. Every-day common routines (fix/restore random seed, filesystem utils, metrics). Losses:...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    HunyuanWorld 1.0

    HunyuanWorld 1.0

    Generating Immersive, Explorable, and Interactive 3D Worlds

    ...It combines the strengths of video-based diversity and 3D-based geometric consistency through a novel framework using panoramic world proxies and semantically layered 3D mesh representations. This approach enables 360° immersive experiences, seamless mesh export for graphics pipelines, and disentangled object representations for enhanced interactivity. The architecture integrates panoramic proxy generation, semantic layering, and hierarchical 3D reconstruction to produce high-quality scene-scale 3D worlds from both text and images. HunyuanWorld-1.0 surpasses existing open-source methods in visual quality and geometric consistency, demonstrated by superior scores in BRISQUE, NIQE, Q-Align, and CLIP metrics.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 16
    hCaptcha Challenger

    hCaptcha Challenger

    Gracefully face hCaptcha challenge with multimodal llms

    hCaptcha Challenger is an open-source automation framework designed to solve hCaptcha verification challenges using computer vision models and multimodal reasoning techniques. The project integrates machine learning models capable of analyzing visual captcha tasks and identifying the correct responses required to pass the verification process. Instead of relying on third-party captcha-solving services or browser scripts, the system operates independently by using pretrained neural networks that can classify images, detect objects, and interpret spatial relationships. The framework includes support for multiple types of captcha challenges such as object selection, drag-and-drop puzzles, and image labeling tasks. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    FireRed-Image-Edit

    FireRed-Image-Edit

    General-purpose image editing model that delivers high-fidelity

    ...The model excels in maintaining visual and text stylistic fidelity, allowing users to preserve the original artistic qualities of an image while applying creative changes according to natural language instructions. In addition to editing single images, FireRed supports multi-image editing scenarios such as virtual try-on or batch transformations, making it suitable for both creative and practical workflows.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 18
    Computer vision projects

    Computer vision projects

    computer vision projects | Fun AI projects related to computer vision

    Computer vision projects is an open-source collection of computer vision projects and experiments that demonstrate practical applications of modern AI techniques in image processing, robotics, and real-time visual analysis. The repository includes multiple demonstration systems implemented using languages such as Python and C++, covering topics ranging from object detection to embedded vision systems. Many of the projects illustrate how computer vision algorithms can interact with hardware platforms, including robotics systems and edge computing devices. ...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 19
    AI-Aimbot

    AI-Aimbot

    CS2, Valorant, Fortnite, APEX, every game

    ...Because the system relies solely on visual detection rather than reading game memory, it attempts to bypass certain traditional anti-cheat detection methods.
    Downloads: 5,295 This Week
    Last Update:
    See Project
  • 20
    Nougat

    Nougat

    Implementation of Nougat Neural Optical Understanding

    Nougat is a multi-modal generative modeling framework that bridges vision and text modalities with structured generation control (e.g. layout, scene composition) rather than treating images as flat contexts. It combines object-centric modules with transformer-based reasoning to propose, refine, and render scenes in a generative pipeline. The architecture allows you to specify or prompt a layout (which objects should be where) and then the model fills in appearance, context, lighting, and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Surface Defect Detection Dataset Papers

    Surface Defect Detection Dataset Papers

    Constantly summarizing open source dataset and critical papers

    ...Generally speaking, imaging schemes are usually designed by using the different properties of the inspected surface or defects. A reasonable imaging scheme helps to obtain images with uniform illumination and clearly reflect the surface defects of the object. In recent years, many defect detection methods based on deep learning have also been widely used in various industrial scenarios.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    PyTracking

    PyTracking

    Visual tracking library based on PyTorch

    A general python framework for visual object tracking and video object segmentation, based on PyTorch. Official implementation of the RTS (ECCV 2022), ToMP (CVPR 2022), KeepTrack (ICCV 2021), LWL (ECCV 2020), KYS (ECCV 2020), PrDiMP (CVPR 2020), DiMP (ICCV 2019), and ATOM (CVPR 2019) trackers, including complete training code and trained models.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    mAP

    mAP

    Evaluates the performance of your neural net for object recognition

    In practice, a higher mAP value indicates a better performance of your neural net, given your ground truth and set of classes. The performance of your neural net will be judged using the mAP criteria defined in the PASCAL VOC 2012 competition. We simply adapted the official Matlab code into Python (in our tests they both give the same results). First, your neural net detection-results are sorted by decreasing confidence and are assigned to ground-truth objects. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    aCompute

    aCompute

    Aims to enable researcher to tap in to mobile computing capability

    This is a software agent based computing program that will enable researchers and other users to tap in computing power of machine available by sharing work load on the fly with zero configuration on network & resources A self organizing agent program that will understand network and its resource. where as the only job left to researcher is to split up jobs in several chunks of programs either parallel or sequential jobs and go issue the job (A visual Modeler or Scripting support need to...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25

    scripthea

    Scripthea is designed to streamline of crafting prompts for T2I gen.

    Scripthea is a free, open-source Windows application designed to streamline the process of crafting prompts for text-to-image AI generators like Stable Diffusion. Scripthea offers a structured environment for building, testing, and refining prompts, making it an invaluable tool for artists, designers, and AI enthusiasts seeking greater control over their creative outputs. At its core, Scripthea simplifies prompt engineering by breaking down prompts into two components: cues (descriptive...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
Auth0 Logo