visual object net free download

Showing 79 open source projects for "visual object net"

1

VOID

Video Object and Interaction Deletion

...One of its most notable capabilities is its ability to simulate realistic scene behavior after object removal, such as causing an object to fall naturally if its support is removed, which significantly enhances realism.

Downloads: 3 This Week

Last Update: 2026-06-20
2

InternGPT

Open source demo platform where you can easily showcase your AI models

...The framework connects multiple specialized AI models that perform tasks such as object detection, segmentation, captioning, and visual editing while coordinating them through a central conversational interface. This architecture enables the system to plan actions, execute visual operations, and return results in a coherent dialogue with the user.

Downloads: 0 This Week

Last Update: 2026-03-05
3

X-AnyLabeling

Effortless data labeling with AI support from Segment Anything

...The software integrates an AI-powered labeling engine that allows users to generate annotations automatically with the assistance of modern vision models such as Segment Anything and various object detection frameworks. It supports labeling tasks across images and videos and enables developers to prepare training datasets for tasks such as object detection, segmentation, classification, tracking, and pose estimation. The tool is built with an interactive graphical interface that simplifies annotation workflows and allows users to draw and edit labels directly on visual data. ...

Downloads: 97 This Week

Last Update: 3 days ago
4

Label Studio

Label Studio is a multi-type data labeling and annotation tool

The most flexible data annotation tool. Quickly installable. Build custom UIs or use pre-built labeling templates. Detect objects in images; bounding boxes, polygons, circles, and keypoints are supported. Partition an image into multiple segments. Use ML models to pre-label and optimize the process. Label Studio is an open-source data labeling tool. It lets you label data types like audio, text, images, videos, and time series with a simple and straightforward UI and export to various model formats. It can...

Downloads: 12 This Week

Last Update: 2026-03-13
5

Qwen-Image-Layered

Qwen-Image-Layered: Layered Decomposition for Inherent Editability

Qwen-Image-Layered is an extension of the Qwen series of multimodal models that introduces layered image understanding, enabling the model to reason about hierarchical visual structures — such as separating foreground, background, objects, and contextual layers within an image. This architecture allows richer semantic interpretation, enabling use cases such as scene decomposition, object-level editing, layered captioning, and more fine-grained multimodal reasoning than with flat image encodings alone. ...

Downloads: 0 This Week

Last Update: 2026-01-05
6

Grounded-Segment-Anything

Marrying Grounding DINO with Segment Anything & Stable Diffusion

Grounded-Segment-Anything is a research-oriented project that combines powerful open-set object detection with pixel-level segmentation and subsequent creative workflows, effectively enabling detection, segmentation, and high-level vision tasks guided by free-form text prompts. The core idea behind the project is to pair Grounding DINO — a zero-shot object detector that can locate objects described by natural language — with Segment Anything Model (SAM), which can produce detailed masks for...

Downloads: 0 This Week

Last Update: 2026-02-03
7

Jaaz

Open source multimodal creative AI assistant with infinite canvas tool

Jaaz is an open source multimodal creative assistant designed to help users generate and organize visual media using artificial intelligence. It functions as a creative workspace where images, videos, and visual storyboards can be produced and arranged on an infinite canvas environment. It combines AI agents with visual editing tools, allowing users to generate media through prompts, sketches, or simple instructions. Jaaz supports multiple AI models and can integrate both local and...

Downloads: 0 This Week

Last Update: 2026-03-17
8

LLM Vision

Visual intelligence for your home.

LLM Vision is an open-source integration for Home Assistant that adds multimodal large language model capabilities to smart home environments. The project enables Home Assistant to analyze images, video files, and live camera feeds using vision-capable AI models. Instead of relying only on traditional object detection pipelines, it allows users to send prompts about visual content and receive contextual descriptions or answers about what is happening in camera footage. The system can process events from surveillance platforms such as Frigate and convert them into meaningful summaries, notifications, or structured data for automation workflows. ...

Downloads: 1 This Week

Last Update: 2026-05-26
9

LatentSync

Taming Stable Diffusion for Lip Sync

...In effect, given a source video (with masked or reference frames) and an audio track, LatentSync directly generates frames whose lip motions and expressions align with the audio, producing convincing talking-head or animated lip-sync output. The system leverages a U-Net diffusion backbone, with cross-attention of audio embeddings (via an audio encoder) and reference video frames to guide generation, and applies a set of loss functions (temporal, perceptual, sync-net based) to enforce lip-sync accuracy, visual fidelity, and temporal consistency. Over versions, LatentSync has improved temporal stability and lowered resource requirements — making inference more practical (e.g. 8 GB VRAM for earlier versions, somewhat higher for latest models).

Downloads: 5 This Week

Last Update: 2025-12-02
10

Sa2VA

Official Repo For "Sa2VA: Marrying SAM2 with LLaVA"

Sa2VA is a cutting-edge open-source multi-modal large language model (MLLM) developed by ByteDance that unifies dense segmentation, visual understanding, and language-based reasoning across both images and videos. It merges the segmentation power of a state-of-the-art video segmentation model (based on SAM‑2) with the vision-language reasoning capabilities of a strong LLM backbone (derived from models like InternVL2.5 / Qwen-VL series), yielding a system that can answer questions about visual content, perform referring segmentation, and maintain temporal consistency across frames in video. ...

Downloads: 0 This Week

Last Update: 2025-12-02
11

Depth Anything 3

Recovering the Visual Space from Any Views

Depth Anything 3 is a research-driven project that brings accurate and dense depth estimation to any input image or video, enabling foundational understanding of 3D structure from 2D visual content. Designed to work across diverse scenes, lighting conditions, and image types, it uses advanced neural networks trained on large, heterogeneous datasets, producing depth maps that reveal scene depth relationships and object surfaces with strong fidelity. The model can be applied to photography, AR/VR content creation, robotics perception, and 3D reconstruction workflows, making it versatile across industries and research domains. ...

Downloads: 4 This Week

Last Update: 2026-03-21
12

LISA

LISA: Reasoning Segmentation via Large Language Model

...The project introduces a framework where a large language model can interpret natural language instructions and produce segmentation masks that highlight relevant regions in an image. Instead of relying solely on predefined object categories, the model is capable of reasoning about complex textual queries and translating them into visual segmentation outputs. This approach allows the system to identify objects or regions in images based on semantic descriptions, contextual reasoning, and world knowledge. The model integrates multimodal capabilities by combining language understanding with visual perception so that text instructions guide the segmentation process. ...

Downloads: 0 This Week

Last Update: 2026-03-06
13

MoCo (Momentum Contrast)

Self-supervised visual learning using momentum contrast in PyTorch

MoCo is an open source PyTorch implementation developed by Facebook AI Research (FAIR) for the papers “Momentum Contrast for Unsupervised Visual Representation Learning” (He et al., 2019) and “Improved Baselines with Momentum Contrastive Learning” (Chen et al., 2020). It introduces Momentum Contrast (MoCo), a scalable approach to self-supervised learning that enables visual representation learning without labeled data. The core idea of MoCo is to maintain a dynamic dictionary with a...

Downloads: 0 This Week

Last Update: 2 days ago
14

sidmon5.net

Sudden ionospheric disturbance monitor with Stokes data product

This package is a VLF receiver for monitoring VLF transmitter signals for evidence of transients indicating ionospheric disturbances, usually caused by x-ray bursts from the sun. It takes sample pairs from dual-channel sound cards and spectrally processes them to Stokes parameters. Data are plotted as time series and in scatter plots.
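The step from sample pairs to Stokes parameters can be illustrated with a minimal Python sketch. It is not taken from sidmon5.net: the function names are hypothetical, the two sound-card channels are assumed to carry orthogonal linear polarizations, and the sign of V follows one common convention (others flip it).

```python
import cmath
import math

def dft_bin(samples, k):
    """Complex amplitude of DFT bin k for one block of real samples."""
    n = len(samples)
    return sum(s * cmath.exp(-2j * math.pi * k * i / n)
               for i, s in enumerate(samples))

def stokes(ex, ey):
    """Stokes parameters (I, Q, U, V) from two complex channel amplitudes.

    Assumes a linear (x, y) polarization basis; the sign of V is a
    convention choice, not something the source specifies.
    """
    cross = ex * ey.conjugate()
    i = abs(ex) ** 2 + abs(ey) ** 2   # total intensity
    q = abs(ex) ** 2 - abs(ey) ** 2   # x minus y linear polarization
    u = 2 * cross.real                # linear polarization at 45 degrees
    v = -2 * cross.imag               # circular polarization
    return i, q, u, v
```

In use, one would take a block of samples from each channel, pick the bin nearest the transmitter frequency with `dft_bin`, and pass the two amplitudes to `stokes`; averaging the results over many blocks gives the time series to plot.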

Downloads: 0 This Week

Last Update: 2025-06-27
15

Pytorch-toolbelt

PyTorch extensions for fast R&D prototyping and Kaggle farming

pytorch-toolbelt is a Python library with a set of bells and whistles for PyTorch for fast R&D prototyping and Kaggle farming. Easy model building using flexible encoder-decoder architecture. Modules: CoordConv, SCSE, Hypercolumn, Depthwise separable convolution and more. GPU-friendly test-time augmentation (TTA) for segmentation and classification. GPU-friendly inference on huge (5000x5000) images. Everyday common routines (fix/restore random seed, filesystem utils, metrics). Losses:...

Downloads: 0 This Week

Last Update: 2024-11-21
16

HunyuanWorld 1.0

Generating Immersive, Explorable, and Interactive 3D Worlds

...It combines the strengths of video-based diversity and 3D-based geometric consistency through a novel framework using panoramic world proxies and semantically layered 3D mesh representations. This approach enables 360° immersive experiences, seamless mesh export for graphics pipelines, and disentangled object representations for enhanced interactivity. The architecture integrates panoramic proxy generation, semantic layering, and hierarchical 3D reconstruction to produce high-quality scene-scale 3D worlds from both text and images. HunyuanWorld-1.0 surpasses existing open-source methods in visual quality and geometric consistency, demonstrated by superior scores in BRISQUE, NIQE, Q-Align, and CLIP metrics.

Downloads: 2 This Week

Last Update: 2026-04-15
17

hCaptcha Challenger

Gracefully face hCaptcha challenges with multimodal LLMs

hCaptcha Challenger is an open-source automation framework designed to solve hCaptcha verification challenges using computer vision models and multimodal reasoning techniques. The project integrates machine learning models capable of analyzing visual captcha tasks and identifying the correct responses required to pass the verification process. Instead of relying on third-party captcha-solving services or browser scripts, the system operates independently by using pretrained neural networks that can classify images, detect objects, and interpret spatial relationships. The framework includes support for multiple types of captcha challenges such as object selection, drag-and-drop puzzles, and image labeling tasks. ...

Downloads: 0 This Week

Last Update: 2026-03-06
18

Python Progressbar

Progressbar 2 - A progress bar for Python 2 and Python 3

...The ProgressBar class manages the current progress, and the format of the line is given by a number of widgets. A widget is an object that may display differently depending on the state of the progress bar.
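The split between a bar that tracks state and widgets that format it can be sketched in a few lines of plain Python. This is an illustration of the design only, not the library's API: `SimpleProgressBar`, `Percentage`, `Bar`, and the `render` method are hypothetical names chosen for this example.

```python
class Percentage:
    """Widget that renders progress as a right-aligned percentage."""

    def render(self, bar):
        return f"{100 * bar.value // bar.max_value:3d}%"


class Bar:
    """Widget that renders progress as a fixed-width text bar."""

    def __init__(self, width=20):
        self.width = width

    def render(self, bar):
        filled = self.width * bar.value // bar.max_value
        return "[" + "#" * filled + "." * (self.width - filled) + "]"


class SimpleProgressBar:
    """Holds the progress state; the widgets decide how a line looks."""

    def __init__(self, max_value, widgets):
        # max_value is assumed to be a positive integer.
        self.max_value = max_value
        self.value = 0
        self.widgets = widgets

    def update(self, value):
        """Store the new value (clamped to max_value) and return the line."""
        self.value = min(value, self.max_value)
        return " ".join(w.render(self) for w in self.widgets)
```

Each widget reads the bar's state when it is rendered, so the same widget list yields a different line at every update; adding a new display element means writing one more class with a `render` method.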

Downloads: 1 This Week

Last Update: 2024-08-28
19

FireRed-Image-Edit

General-purpose image editing model that delivers high-fidelity

...The model excels in maintaining visual and text stylistic fidelity, allowing users to preserve the original artistic qualities of an image while applying creative changes according to natural language instructions. In addition to editing single images, FireRed supports multi-image editing scenarios such as virtual try-on or batch transformations, making it suitable for both creative and practical workflows.

Downloads: 1 This Week

Last Update: 2026-04-03
20

DWSIM - Open Source Process Simulator

Simulate chemical processes using advanced thermodynamic models

DWSIM is an open source, CAPE-OPEN compliant chemical process simulator for Windows, Linux and macOS systems. Written in VB.NET and C#, DWSIM features a comprehensive set of unit operations, advanced thermodynamic models, support for reacting systems, petroleum characterization tools and a fully-featured graphical interface. DWSIM Pro is a commercial sibling of DWSIM built on top of open-source software. It offers extended features, comes with private support, and is accessible in the...

Downloads: 1,677 This Week

Last Update: 2025-10-28
21

SCons

A software construction tool

SCons is a software construction tool that is a superior alternative to the classic "Make" build tool that we all know and love. SCons is implemented as a Python script and set of modules, and SCons "configuration files" are actually executed as Python scripts. This gives SCons many powerful capabilities not found in other software build tools. We make SCons available in three distinct packages, for different purposes. - The scons package is the basic package to install SCons. You...

28 Reviews

Downloads: 2,029 This Week

Last Update: 2025-11-16
22

lms2fits

Dual-channel spectroscopic receiver using LimeSDR-USB

`lms2fits` is a dual-channel spectroscopic receiver for radio astronomy that employs the LimeSDR's LimeSDR-USB dual-channel transceiver, which in turn employs Lime Microsystems' LMS7002M transceiver chip. These systems allow a frequency-agile (<30 MHz to 3.8 GHz) receiver providing Stokes parameters in dynamic spectra of up to 60 MHz analog bandwidth streamed to a FITS file with three-axis primary table. It runs on Linux under .NET.

Downloads: 0 This Week

Last Update: 2026-01-13
23

Universal runtime installer

This installer lets you install the latest Windows runtimes

This installer lets you install the latest Visual C++ Runtimes for all years (2008-2022), the full DirectX Runtime, Microsoft XNA Framework, .NET Runtime, Java, and OpenAL at once.

Downloads: 28 This Week

Last Update: 2026-05-08
24

Snap7

32/64 bit multi-platform Ethernet S7 PLC communication suite

Snap7, through three specialized components (Client, plus the novel Server and Partner), allows you to definitively integrate your PC-based systems into a PLC automation chain. Designed to transfer large amounts of high-speed data in industrial facilities, it scales easily, down to small Linux ARM boards such as the Raspberry Pi. High-level object-oriented wrappers are provided, currently for C/C++, .NET/Mono, Pascal, LabVIEW, and Python, with many source code examples. Very easy to use: a full working server example is no bigger than a “Hello world”. Many projects/makefiles are ready to run to easily rebuild Snap7 on any platform without needing to be a C++ guru. Very detailed documentation is provided.

26 Reviews

Downloads: 1,098 This Week

Last Update: 2025-06-24
25

rx2fits

HF/VHF spectroscopy code for the rx888mk2 direct-sampling receiver

rx2fits is an SDR code for the rx888mk2 direct-sampling receiver that streams spectral data with frequency up to 65 MHz to FITS files. This code is for the direct-sampling input of the receiver, which transmits real samples over USB3 at up to 130 MHz sample rate. rx2fits processes these samples to spectral intensities via a Fourier-transform poly-phase filter bank, which provides spectral resolution approaching the spectral bin width with good stop-band and adjacent-channel rejection. ...

Downloads: 0 This Week

Last Update: 2026-01-12