Showing 20 open source projects for "open object rfexx"

View related business solutions
  • $300 Free Credits for Your Google Cloud Projects Icon
    $300 Free Credits for Your Google Cloud Projects

    Start building on Google Cloud with $300 in free credits. No commitment, no credit card required until you're ready to scale.

    Launch your next project with $300 in free Google Cloud credits—no strings attached. Test, build, and deploy without risk. Use your credits across the entire Google Cloud platform to find what works best for your needs. After your credits are used, continue with always-free tier services. Only pay when you're ready to scale. Sign up in minutes and start exploring.
    Start Free Trial
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • 1
    VOID

    VOID

    Video Object and Interaction Deletion

    VOID is an advanced AI video processing system developed by Netflix that focuses on removing objects from videos while preserving the physical and visual realism of the surrounding environment. Unlike traditional inpainting methods that only erase pixels or simple artifacts, VOID models the full interaction dynamics between objects and their environment, including shadows, reflections, and even physical consequences such as movement or balance changes. Built on top of transformer-based...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 2
    Mesh R-CNN

    Mesh R-CNN

    code for Mesh R-CNN, ICCV 2019

    Mesh R-CNN is a 3D reconstruction and object understanding framework developed by Facebook Research that extends Mask R-CNN into the 3D domain. Built on top of Detectron2 and PyTorch3D, Mesh R-CNN enables end-to-end 3D mesh prediction directly from single RGB images. The model learns to detect, segment, and reconstruct detailed 3D mesh representations of objects in natural images, bridging the gap between 2D perception and 3D understanding. Unlike voxel-based or point-based approaches, Mesh...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 3
    Qwen-Image-Layered

    Qwen-Image-Layered

    Qwen-Image-Layered: Layered Decomposition for Inherent Editablity

    Qwen-Image-Layered is an extension of the Qwen series of multimodal models that introduces layered image understanding, enabling the model to reason about hierarchical visual structures — such as separating foreground, background, objects, and contextual layers within an image. This architecture allows richer semantic interpretation, enabling use cases such as scene decomposition, object-level editing, layered captioning, and more fine-grained multimodal reasoning than with flat image...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 4
    Depth Anything 3

    Depth Anything 3

    Recovering the Visual Space from Any Views

    Depth Anything 3 is a research-driven project that brings accurate and dense depth estimation to any input image or video, enabling foundational understanding of 3D structure from 2D visual content. Designed to work across diverse scenes, lighting conditions, and image types, it uses advanced neural networks trained on large, heterogeneous datasets, producing depth maps that reveal scene depth relationships and object surfaces with strong fidelity. The model can be applied to photography,...
    Downloads: 6 This Week
    Last Update:
    See Project
  • Secure File Transfer for Windows with Cerberus by Redwood Icon
    Secure File Transfer for Windows with Cerberus by Redwood

    Protect and share files over FTP/S, SFTP, HTTPS and SCP with the #1 rated Windows file transfer server.

    Cerberus supports unlimited users and connections on a single IP, with built-in encryption, 2FA, and a browser-based web client — all deployable in under 15 minutes with a 25-day free trial.
    Try for Free
  • 5
    Qwen-2.5-VL

    Qwen-2.5-VL

    Qwen2.5-VL is the multimodal large language model series

    Qwen2.5 is a series of large language models developed by the Qwen team at Alibaba Cloud, designed to enhance natural language understanding and generation across multiple languages. The models are available in various sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B parameters, catering to diverse computational requirements. Trained on a comprehensive dataset of up to 18 trillion tokens, Qwen2.5 models exhibit significant improvements in instruction following, long-text generation...
    Downloads: 17 This Week
    Last Update:
    See Project
  • 6
    Qwen-Image

    Qwen-Image

    Qwen-Image is a powerful image generation foundation model

    Qwen-Image is a powerful 20-billion parameter foundation model designed for advanced image generation and precise editing, with a particular strength in complex text rendering across diverse languages, especially Chinese. Built on the MMDiT architecture, it achieves remarkable fidelity in integrating text seamlessly into images while preserving typographic details and layout coherence. The model excels not only in text rendering but also in a wide range of artistic styles, including...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 7
    Qwen3-VL

    Qwen3-VL

    Qwen3-VL, the multimodal large language model series by Alibaba Cloud

    Qwen3-VL is the latest multimodal large language model series from Alibaba Cloud’s Qwen team, designed to integrate advanced vision and language understanding. It represents a major upgrade in the Qwen lineup, with stronger text generation, deeper visual reasoning, and expanded multimodal comprehension. The model supports dense and Mixture-of-Experts (MoE) architectures, making it scalable from edge devices to cloud deployments, and is available in both instruction-tuned and...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 8
    HunyuanWorld 1.0

    HunyuanWorld 1.0

    Generating Immersive, Explorable, and Interactive 3D Worlds

    ...This approach enables 360° immersive experiences, seamless mesh export for graphics pipelines, and disentangled object representations for enhanced interactivity. The architecture integrates panoramic proxy generation, semantic layering, and hierarchical 3D reconstruction to produce high-quality scene-scale 3D worlds from both text and images. HunyuanWorld-1.0 surpasses existing open-source methods in visual quality and geometric consistency, demonstrated by superior scores in BRISQUE, NIQE, Q-Align, and CLIP metrics.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 9
    DreamCraft3D

    DreamCraft3D

    Official implementation of DreamCraft3D

    DreamCraft3D is DeepSeek’s generative 3D modeling framework / model family that likely extends their earlier 3D efforts (e.g. Shap-E or Point-E style models) with more capability, control, or expression. The name suggests a “dream crafting” metaphor—users probably supply textual or image prompts and generate 3D assets (point clouds, meshes, scenes). The repository includes model code, inference scripts, sample prompts, and possibly dataset preparation pipelines. It may integrate rendering or...
    Downloads: 1 This Week
    Last Update:
    See Project
  • Enterprise-grade ITSM, for every business Icon
    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity.

    Freshservice is an intuitive, AI-powered platform that helps IT, operations, and business teams deliver exceptional service without the usual complexity. Automate repetitive tasks, resolve issues faster, and provide seamless support across the organization. From managing incidents and assets to driving smarter decisions, Freshservice makes it easy to stay efficient and scale with confidence.
    Try it Free
  • 10
    NVIDIA Isaac GR00T

    NVIDIA Isaac GR00T

    NVIDIA Isaac GR00T N1.5 is the world's first open foundation model

    NVIDIA Isaac‑GR00T N1.5 is an open-source foundation model engineered for generalized humanoid robot reasoning and manipulation skills. It accepts multimodal inputs—such as language and images—and uses a diffusion transformer architecture built upon vision-language encoders, enabling adaptive robot behaviors across diverse environments. It is designed to be customizable via post-training with real or synthetic data. The vision-language model remains frozen during both pretraining and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    FireRed-Image-Edit

    FireRed-Image-Edit

    General-purpose image editing model that delivers high-fidelity

    FireRed-Image-Edit is an open-source general-purpose image editing model and toolset designed to deliver high-fidelity, visually coherent edits across a wide range of editing tasks, from simple object modifications to complex enhancements like restoration and style preservation. It is built on a flexible text-to-image foundation model that has been extended with training paradigms including pretraining, supervised fine-tuning, and reinforcement learning to imbue the system with strong instruction following and editing consistency. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    Step1X-Edit

    Step1X-Edit

    A SOTA open-source image editing model

    ...The model targets general-purpose editing: from object addition/removal, style changes, recoloring, retouching, background replacement, to complex transformations like changing lighting, mood, or art style. The authors trained it on a large curated dataset and benchmarked it on a newly introduced evaluation suite, showing that Step1X-Edit significantly outperforms previous open-source baselines.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Vidi2

    Vidi2

    Large Multimodal Models for Video Understanding and Editing

    ...The system is built with open-source release in mind, giving developers access to model code, inference scripts, and evaluation pipelines so they can reproduce research results or integrate Vidi into their own video-processing workflows.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    MediaPipe Face Detection

    MediaPipe Face Detection

    Detect faces in an image

    The MediaPipe Face Detection model is a high-performance, real-time face detection solution that uses machine learning to identify faces in images and video streams. It is optimized for mobile and embedded platforms, offering fast and accurate face detection while maintaining a small memory footprint. This model supports multiple face detections and is highly efficient, making it suitable for a variety of applications such as augmented reality, user authentication, and facial expression analysis.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 15
    Blazeface

    Blazeface

    Blazeface is a lightweight model that detects faces in images

    Blazeface is a lightweight, high-performance face detection model designed for mobile and embedded devices, developed by TensorFlow. It is optimized for real-time face detection tasks and runs efficiently on mobile CPUs, ensuring minimal latency and power consumption. Blazeface is based on a fast architecture and uses deep learning techniques to detect faces with high accuracy, even in challenging conditions. It supports multiple face detection in varying lighting and poses, and is designed...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 16
    MoveNet

    MoveNet

    A CNN model that predicts human joints from RGB images of a person

    The MoveNet model is an efficient, real-time human pose estimation system designed for detecting and tracking keypoints of human bodies. It utilizes deep learning to accurately locate 17 key points across the body, providing precise tracking even with fast movements. Optimized for mobile and embedded devices, MoveNet can be integrated into applications for fitness tracking, augmented reality, and interactive systems.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17
    Mask2Former

    Mask2Former

    Code release for "Masked-attention Mask Transformer

    Mask2Former is a unified segmentation architecture that handles semantic, instance, and panoptic segmentation with one model and one training recipe. Its core idea is to cast segmentation as mask classification: a transformer decoder predicts a set of mask queries, each with an associated class score, eliminating the need for task-specific heads. A pixel decoder fuses multi-scale features and feeds masked attention in the transformer so each query focuses computation on its current spatial...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    DeepSDF

    DeepSDF

    Learning Continuous Signed Distance Functions for Shape Representation

    DeepSDF is a deep learning framework for continuous 3D shape representation using Signed Distance Functions (SDFs), as presented in the CVPR 2019 paper DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation by Park et al. The framework learns a continuous implicit function that maps 3D coordinates to their corresponding signed distances from object surfaces, allowing compact, high-fidelity shape modeling. Unlike traditional discrete voxel grids or meshes, DeepSDF...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 19
    SG2Im

    SG2Im

    Code for "Image Generation from Scene Graphs", Johnson et al, CVPR 201

    sg2im is a research codebase that learns to synthesize images from scene graphs—structured descriptions of objects and their relationships. Instead of conditioning on free-form text alone, it leverages graph structure to control layout and interactions, generating scenes that respect constraints like “person left of dog” or “cup on table.” The pipeline typically predicts object layouts (bounding boxes and masks) from the graph, then renders a realistic image conditioned on those layouts....
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Qwen-Image-Edit

    Qwen-Image-Edit

    An advanced bilingual image editing with semantic control

    Qwen-Image-Edit is the image editing extension of Qwen-Image, a 20B parameter model that combines advanced visual and text-rendering capabilities for creative and precise editing. It leverages both Qwen2.5-VL for semantic control and a VAE Encoder for appearance control, enabling users to edit at both the content and detail level. The model excels at semantic edits like style transfer, object rotation, and novel view synthesis, while also handling precise appearance edits such as adding or...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
Auth0 Logo