Showing 23 open source projects for "object"

  • 1
    SAM 3D Objects

    Models for object and human mesh reconstruction

    SAM 3D Objects is a foundation model that reconstructs full 3D geometry, texture, and spatial layout of objects and scenes from a single image. Given one RGB image and object masks (for example, from the Segment Anything family), it can generate a textured 3D mesh for each object, including pose and approximate scene layout. The model is specifically designed to be robust in real-world images with clutter, occlusions, small objects, and unusual viewpoints, where many earlier 3D-from-image systems struggle. ...
    Downloads: 24 This Week
  • 2
    VOID

    Video Object and Interaction Deletion

    ...Built on transformer-based architectures and fine-tuned for video inpainting, the system uses interaction-aware mask conditioning to keep frames temporally consistent. Its most notable capability is simulating realistic scene behavior after an object is removed, such as letting an object fall naturally once its support is gone, which significantly enhances realism.
    Downloads: 2 This Week
  • 3
    Qwen-Image-Layered

    Qwen-Image-Layered: Layered Decomposition for Inherent Editability

    Qwen-Image-Layered is an extension of the Qwen series of multimodal models that introduces layered image understanding, enabling the model to reason about hierarchical visual structures — such as separating foreground, background, objects, and contextual layers within an image. This architecture allows richer semantic interpretation, enabling use cases such as scene decomposition, object-level editing, layered captioning, and more fine-grained multimodal reasoning than with flat image encodings alone. By combining text and structured image representations, it aims to facilitate tasks where both descriptive and structural understanding are important, such as detailed image QA, interactive image editing via prompt layers, and image-conditioned generation with structural control. ...
    Downloads: 6 This Week
  • 4
    Mesh R-CNN

    Code for Mesh R-CNN, ICCV 2019

    ...The system combines 2D detection from Mask R-CNN with 3D reasoning modules that output full mesh reconstructions aligned with the input image. It has been evaluated on datasets such as Pix3D, where it demonstrates state-of-the-art performance in reconstructing real-world object geometry.
    Downloads: 4 This Week
  • 5
    CO3D (Common Objects in 3D)

    Tooling for the Common Objects In 3D dataset

    ...It builds upon the original CO3Dv1 dataset, expanding both scale and quality—featuring 2× more sequences and 4× more frames, with improved image fidelity, more accurate segmentation masks, and enhanced annotations for object-centric 3D reconstruction. CO3Dv2 enables research in multi-view 3D reconstruction, novel view synthesis, and geometry-aware representation learning. Each of the thousands of sequences in CO3Dv2 captures a common object (from categories like cars, chairs, or plants) from multiple real-world viewpoints. The dataset includes RGB images, depth maps, masks, and camera poses for each frame, along with pre-defined training, validation, and testing splits for both few-view and many-view reconstruction tasks.
    Downloads: 0 This Week
  • 6
    Qwen-Image

    Qwen-Image is a powerful image generation foundation model

    ...The model excels not only in text rendering but also in a wide range of artistic styles, including photorealistic, impressionist, anime, and minimalist aesthetics. Qwen-Image supports sophisticated editing tasks such as style transfer, object insertion and removal, detail enhancement, and even human pose manipulation, making it suitable for both professional and casual users. It also includes advanced image understanding capabilities like object detection, semantic segmentation, depth and edge estimation, and novel view synthesis.
    Downloads: 2 This Week
  • 7
    Depth Anything 3

    Recovering the Visual Space from Any Views

    ...Designed to work across diverse scenes, lighting conditions, and image types, it uses advanced neural networks trained on large, heterogeneous datasets, producing depth maps that reveal scene depth relationships and object surfaces with strong fidelity. The model can be applied to photography, AR/VR content creation, robotics perception, and 3D reconstruction workflows, making it versatile across industries and research domains. It includes support for high-resolution inputs and post-processing tools that refine depth predictions, helping downstream tasks like segmentation, bounding volume estimation, and mixed reality layering.
    Downloads: 4 This Week
  • 8
    Qwen3-VL

    Qwen3-VL, the multimodal large language model series by Alibaba Cloud

    ...Qwen3-VL is built for complex tasks such as GUI automation, multimodal coding (converting images or videos into HTML, CSS, JS, or Draw.io diagrams), long-context reasoning with support up to 1M tokens, and comprehensive video understanding. It also brings advanced perception capabilities, including spatial grounding, object recognition, OCR across 32 languages, and robust handling of challenging inputs like low-light or distorted text.
    Downloads: 6 This Week
  • 9
    Qwen-2.5-VL

    Qwen2.5-VL is the multimodal large language model series

    Qwen2.5 is a series of large language models developed by the Qwen team at Alibaba Cloud, designed to enhance natural language understanding and generation across multiple languages. The models are available in various sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B parameters, catering to diverse computational requirements. Trained on a comprehensive dataset of up to 18 trillion tokens, Qwen2.5 models exhibit significant improvements in instruction following, long-text generation...
    Downloads: 11 This Week
  • 10
    HunyuanWorld 1.0

    Generating Immersive, Explorable, and Interactive 3D Worlds

    ...It combines the strengths of video-based diversity and 3D-based geometric consistency through a novel framework using panoramic world proxies and semantically layered 3D mesh representations. This approach enables 360° immersive experiences, seamless mesh export for graphics pipelines, and disentangled object representations for enhanced interactivity. The architecture integrates panoramic proxy generation, semantic layering, and hierarchical 3D reconstruction to produce high-quality scene-scale 3D worlds from both text and images. HunyuanWorld-1.0 surpasses existing open-source methods in visual quality and geometric consistency, demonstrated by superior scores in BRISQUE, NIQE, Q-Align, and CLIP metrics.
    Downloads: 3 This Week
  • 11
    DreamCraft3D

    Official implementation of DreamCraft3D

    ...Because 3D generation is hardware-intensive, the repository may also include optimizations such as quantization, pruning, or inference acceleration to make the generation pipeline faster or more efficient. DreamCraft3D may also support style or attribute control (e.g. “make this object metallic,” “add textures”) via prompt conditioning or guidance.
    Downloads: 1 This Week
  • 12
    FireRed-Image-Edit

    General-purpose image editing model that delivers high-fidelity

    FireRed-Image-Edit is an open-source general-purpose image editing model and toolset designed to deliver high-fidelity, visually coherent edits across a wide range of editing tasks, from simple object modifications to complex enhancements like restoration and style preservation. It is built on a flexible text-to-image foundation model that has been extended with training paradigms including pretraining, supervised fine-tuning, and reinforcement learning to imbue the system with strong instruction following and editing consistency. ...
    Downloads: 0 This Week
  • 13
    NVIDIA Isaac GR00T

    NVIDIA Isaac GR00T N1.5 is the world's first open foundation model

    NVIDIA Isaac‑GR00T N1.5 is an open-source foundation model engineered for generalized humanoid robot reasoning and manipulation skills. It accepts multimodal inputs—such as language and images—and uses a diffusion transformer architecture built upon vision-language encoders, enabling adaptive robot behaviors across diverse environments. It is designed to be customizable via post-training with real or synthetic data. The vision-language model remains frozen during both pretraining and...
    Downloads: 0 This Week
  • 14
    Vidi2

    Large Multimodal Models for Video Understanding and Editing

    Vidi is a family of large multimodal models developed for deep video understanding and editing tasks, integrating vision, audio, and language to allow sophisticated querying and manipulation of video content. It’s designed to process long-form, real-world videos and answer complex queries such as “when in this clip does X happen?” or “where in the frame is object Y during that moment?” — offering temporal retrieval, spatio-temporal grounding (i.e. locating objects over time + space), and even video question answering. Vidi targets applications like intelligent video editing, automated video search, content analysis, and editing assistance, enabling users to efficiently locate relevant segments and objects in hours-long footage. ...
    Downloads: 0 This Week
  • 15
    MediaPipe Face Detection

    Detect faces in an image

    The MediaPipe Face Detection model is a high-performance, real-time face detection solution that uses machine learning to identify faces in images and video streams. It is optimized for mobile and embedded platforms, offering fast and accurate face detection while maintaining a small memory footprint. The model can detect multiple faces in a single frame, making it suitable for applications such as augmented reality, user authentication, and facial expression analysis.
    Downloads: 5 This Week
  • 16
    MoveNet

    A CNN model that predicts human joints from RGB images of a person

    The MoveNet model is an efficient, real-time human pose estimation system designed for detecting and tracking keypoints of human bodies. It utilizes deep learning to accurately locate 17 key points across the body, providing precise tracking even with fast movements. Optimized for mobile and embedded devices, MoveNet can be integrated into applications for fitness tracking, augmented reality, and interactive systems.
    Downloads: 0 This Week
  • 17
    Blazeface

    Blazeface is a lightweight model that detects faces in images

    Blazeface is a lightweight, high-performance face detection model designed for mobile and embedded devices, developed by TensorFlow. It is optimized for real-time face detection tasks and runs efficiently on mobile CPUs, ensuring minimal latency and power consumption. Blazeface is based on a fast architecture and uses deep learning techniques to detect faces with high accuracy, even in challenging conditions. It supports multiple face detection in varying lighting and poses, and is designed...
    Downloads: 0 This Week
  • 18
    Mask2Former

    Code release for "Masked-attention Mask Transformer

    ...A pixel decoder fuses multi-scale features and feeds masked attention in the transformer so each query focuses computation on its current spatial support. This leads to accurate masks with sharp boundaries and strong small-object performance while remaining efficient on high-resolution inputs. The project provides extensive configurations and pretrained models across popular benchmarks like COCO, ADE20K, and Cityscapes. Built on top of Detectron2, it includes training scripts, inference tools, and visualization utilities that make experimentation straightforward.
    Downloads: 0 This Week
  • 19
    YOLOv4

    PyTorch implementation of YOLOv4

    PyTorch_YOLOv4 is a PyTorch implementation of YOLOv4 based on the earlier ultralytics YOLOv3 codebase. It provides a practical way to train, test, and run YOLOv4-style object detection models without relying only on the original Darknet implementation. The repository supports common detection workflows such as dataset preparation, model training, evaluation, inference, and weight conversion. It is useful for developers who prefer the PyTorch ecosystem for experimentation, debugging, and integration with other machine learning tooling. ...
    Downloads: 2 This Week
  • 20
    DeepSDF

    Learning Continuous Signed Distance Functions for Shape Representation

    DeepSDF is a deep learning framework for continuous 3D shape representation using Signed Distance Functions (SDFs), as presented in the CVPR 2019 paper DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation by Park et al. The framework learns a continuous implicit function that maps 3D coordinates to their corresponding signed distances from object surfaces, allowing compact, high-fidelity shape modeling. Unlike traditional discrete voxel grids or meshes, DeepSDF encodes shapes as continuous neural representations that can be smoothly interpolated and used for reconstruction, generation, and analysis. The repository provides complete tooling for preprocessing mesh datasets (e.g., ShapeNet), training DeepSDF models, reconstructing meshes from learned latent codes, and quantitatively evaluating results with metrics such as Chamfer Distance and Earth Mover’s Distance.
    Downloads: 4 This Week
  • 21
    SG2Im

    Code for "Image Generation from Scene Graphs", Johnson et al, CVPR 201

    ...Instead of conditioning on free-form text alone, it leverages graph structure to control layout and interactions, generating scenes that respect constraints like “person left of dog” or “cup on table.” The pipeline typically predicts object layouts (bounding boxes and masks) from the graph, then renders a realistic image conditioned on those layouts. This separation lets the model reason about geometry and composition before committing to texture and color, improving spatial fidelity. The repository includes training code, datasets, and evaluation scripts so researchers can reproduce baselines and extend components such as the graph encoder or image generator. ...
    Downloads: 0 This Week
  • 22
    Qwen2.5-VL-3B-Instruct

    Qwen2.5-VL-3B-Instruct: Multimodal model for chat, vision & video

    Qwen2.5-VL-3B-Instruct is a 3.75 billion parameter multimodal model by Qwen, designed to handle complex vision-language tasks in both image and video formats. As part of the Qwen2.5 series, it supports image-text-to-text generation with capabilities like chart reading, object localization, and structured data extraction. The model can serve as an intelligent visual agent capable of interacting with digital interfaces and understanding long-form videos by dynamically sampling resolution and frame rate. It uses a SwiGLU and RMSNorm-enhanced ViT architecture and introduces mRoPE updates for robust temporal and spatial understanding. ...
    Downloads: 0 This Week
  • 23
    Qwen-Image-Edit

    An advanced bilingual image editing model with semantic control

    ...It leverages both Qwen2.5-VL for semantic control and a VAE Encoder for appearance control, enabling users to edit at both the content and detail level. The model excels at semantic edits like style transfer, object rotation, and novel view synthesis, while also handling precise appearance edits such as adding or removing elements without altering surrounding regions. A standout feature is its bilingual text editing in English and Chinese, which preserves original font, size, and style during modifications. Benchmarks confirm its state-of-the-art performance in image editing, establishing it as a reliable foundation for both artistic and practical tasks. ...
    Downloads: 0 This Week
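The MoveNet entry above mentions 17 body keypoints located from an RGB image. As a rough illustration of what consuming such a result can look like, the sketch below decodes a single-pose output into named pixel coordinates. It assumes the output is a nested list of shape [1, 1, 17, 3] holding normalized (y, x, score) triples in COCO keypoint order; the function name `decode_keypoints` and the 0.3 score threshold are hypothetical choices for this example, not part of any listed project.

```python
# Assumed keypoint order (COCO convention); check the model card of the
# variant you actually use before relying on it.
KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def decode_keypoints(output, image_width, image_height, min_score=0.3):
    """Map a [1, 1, 17, 3] pose output to {name: (x_px, y_px, score)}.

    Each triple is assumed to be (y, x, score) with y and x normalized
    to the range 0..1. Keypoints scoring below `min_score` are dropped.
    """
    keypoints = output[0][0]  # first batch element, first (only) person
    decoded = {}
    for name, (y, x, score) in zip(KEYPOINT_NAMES, keypoints):
        if score >= min_score:
            decoded[name] = (x * image_width, y * image_height, score)
    return decoded
```

A caller would pass the raw model output together with the original image size, then draw or filter the returned points as needed.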
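The DeepSDF entry above describes a function that maps 3D coordinates to signed distances from an object surface, and names Chamfer Distance as one evaluation metric. The sketch below illustrates both ideas in plain Python. It is not DeepSDF itself: an analytic sphere stands in for the learned network, and the Chamfer variant shown (mean squared nearest-neighbor distance, summed over both directions) is one common definition among several. The function names are hypothetical.

```python
import math


def sphere_sdf(point, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance from `point` to a sphere's surface.

    Negative inside the sphere, zero on the surface, positive outside.
    A learned SDF model would replace this closed-form expression.
    """
    return math.dist(point, center) - radius


def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance between two non-empty point sets."""

    def one_way(src, dst):
        # Mean squared distance from each source point to its nearest
        # neighbor in the destination set (brute force, O(n * m)).
        return sum(min(math.dist(p, q) ** 2 for q in dst) for p in src) / len(src)

    return one_way(points_a, points_b) + one_way(points_b, points_a)
```

The sign convention is what makes the representation useful: the surface is the zero level set, so a mesh can be extracted by finding where the function changes sign.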