49 projects for "unity image processing" with 2 filters applied:

  • 1
    MATLAB Deep Learning Model Hub

    Discover pretrained models for deep learning in MATLAB

    Pretrained image classification networks have already learned to extract powerful and informative features from natural images. Use them as a starting point to learn a new task using transfer learning. Inputs are RGB images; the output is the predicted label and score.
    Downloads: 6 This Week
  • 2
    Open Vision Agents by Stream

    Build Vision Agents quickly with any model or video provider

    ...It focuses on combining video understanding models, such as YOLO and Roboflow-based detectors, with real-time large language models like OpenAI Realtime and Gemini Live to create interactive experiences. The framework uses Stream’s ultra-low-latency edge network so agents can join sessions quickly and maintain very low audio and video latency while processing frames and generating responses. Developers work with an agent abstraction that connects video edge providers, LLMs, and processors into pipelines, making it easier to orchestrate tasks like object detection, pose estimation, and conversational guidance. The project includes SDKs for React, Android, iOS, Flutter, React Native, and Unity, enabling integration into a wide variety of client environments such as mobile apps, web apps, and games.
    Downloads: 2 This Week
  • 3
    DeepSeek-OCR

    Contexts Optical Compression

    ...It is designed to extract text from images, PDFs, and scanned documents, and integrates with multimodal capabilities that understand layout, context, and visual elements beyond raw character recognition. The system treats OCR not simply as “read the text” but as “understand what the text is doing in the image”—for example distinguishing captions from body text, interpreting tables, or recognizing handwritten versus printed words. It supports local deployment, enabling organizations concerned about privacy or latency to run the pipeline on-premises rather than send sensitive documents to third-party cloud services. The codebase is written in Python with a focus on modularity: you can swap preprocessing, recognition, and post-processing components as needed for custom workflows.
    Downloads: 12 This Week
  • 4
    Depth Anything 3

    Recovering the Visual Space from Any Views

    Depth Anything 3 is a research-driven project that brings accurate and dense depth estimation to any input image or video, enabling foundational understanding of 3D structure from 2D visual content. Designed to work across diverse scenes, lighting conditions, and image types, it uses advanced neural networks trained on large, heterogeneous datasets, producing depth maps that reveal scene depth relationships and object surfaces with strong fidelity.
    Downloads: 3 This Week
  • 5
    OpenAI Quickstart Node

    Node.js example app from the OpenAI API quickstart tutorial

    ...The repository provides structured sample code for a variety of API endpoints, including chat completions, assistants, embeddings, fine-tuning, moderation, batch processing, and image generation. Each folder contains runnable scripts that demonstrate both basic usage and more advanced scenarios. By following the examples, developers can quickly understand how to authenticate with an API key, send requests, and handle responses within a Node.js environment. The project is a practical starting point for building AI-powered applications, serving as a foundation for experimentation and integration into larger projects. ...
    Downloads: 1 This Week
  • 6
    HunyuanDiT

    Diffusion Transformer with Fine-Grained Chinese Understanding

    HunyuanDiT is a high-capability text-to-image diffusion transformer with bilingual (Chinese/English) understanding and multi-turn dialogue capability. It trains a diffusion model in latent space using a transformer backbone and integrates a Multimodal Large Language Model (MLLM) to refine captions and support conversational image generation. It supports adapters like ControlNet, IP-Adapter, LoRA, and can run under constrained VRAM via distillation versions. LoRA, ControlNet (pose, depth,...
    Downloads: 0 This Week
  • 7
    DreamCraft3D

    Official implementation of DreamCraft3D

    DreamCraft3D is a hierarchical 3D content generation method that produces coherent 3D objects guided by a 2D reference image, with stages for geometry sculpting and texture boosting. This repository, hosted by DeepSeek, is the official implementation. ...
    Downloads: 1 This Week
  • 8
    Step3-VL-10B

    Multimodal model achieving SOTA performance

    ...It achieves this efficiency and strong performance through unified pre-training on a massive 1.2 trillion-token multimodal corpus that jointly optimizes a language-aligned perception encoder with a powerful decoder, creating deep synergy between image processing and text understanding.
    Downloads: 0 This Week
  • 9
    Depth Pro

    Sharp Monocular Metric Depth in Less Than a Second

    Depth Pro is a foundation model for zero-shot metric monocular depth estimation, producing sharp, high-frequency depth maps with absolute scale from a single image. Unlike many prior approaches, it does not require camera intrinsics or extra metadata, yet still outputs metric depth suitable for downstream 3D tasks. Apple highlights both accuracy and speed: the model can synthesize a ~2.25-megapixel depth map in around 0.3 seconds on a standard GPU, enabling near real-time applications. The...
    Downloads: 0 This Week
  • 10
    GLM-4.5V

    GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

    GLM-4.5V is the preceding iteration in the GLM-V series that laid much of the groundwork for general multimodal reasoning and vision-language understanding. It embodies the design philosophy of mixing visual and textual modalities into a unified model capable of general-purpose reasoning, content understanding, and generation, while already supporting a wide variety of tasks: from image captioning and visual question answering to content recognition, GUI-based agents, video understanding,...
    Downloads: 0 This Week
  • 11
    Dolphin

    Document Image Parsing via Heterogeneous Anchor Prompting

    Dolphin, maintained by ByteDance, is a model for parsing document images using heterogeneous anchor prompting, as its title indicates. ...
    Downloads: 0 This Week
  • 12
    MiniMax-01

    Large-language-model & vision-language-model based on Linear Attention

    MiniMax-01 is the official repository for two flagship models: MiniMax-Text-01, a long-context language model, and MiniMax-VL-01, a vision-language model built on top of it. MiniMax-Text-01 uses a hybrid attention architecture that blends Lightning Attention, standard softmax attention, and Mixture-of-Experts (MoE) routing to achieve both high throughput and long-context reasoning. It has 456 billion total parameters with 45.9 billion activated per token and is trained with advanced parallel...
    Downloads: 3 This Week
  • 13
    OAGI Python SDK

    Python SDK for the Computer Use model Lux, developed by OpenAGI

    OAGI Python SDK is a Python client library for the Lux computer-use model that turns Lux into a programmable automation layer for operating human-facing software via vision and actions. It exposes the OAGI API in an ergonomic way, letting you trigger Lux in three main modes: Tasker for precise scripted sequences, Actor for fast one-shot tasks, and Thinker for open-ended, multi-step objectives. The SDK is designed around “computer use” as a paradigm, where the AI actually navigates...
    Downloads: 0 This Week
  • 14
    SMILI

    Scientific Visualisation Made Easy

    ...The main sMILX application supports viewing n-D images, vector images, DICOMs, and models/surfaces, as well as anonymizing and shape analysis, all with easy drag-and-drop functions. It also features a number of standard processing algorithms for smoothing, thresholding, masking, etc. of images and models, available through graphical user interfaces and/or the command line. See our YouTube channel for tutorial videos via the homepage. The applications are all built on a uniform user-interface framework that provides a very high-level (Qt) interface to powerful image processing and scientific visualisation algorithms from the Insight Toolkit (ITK) and Visualisation Toolkit (VTK). ...
    Downloads: 8 This Week
  • 15
    libsombrero

    Astronomical object/structure detection from 1D and 2D data sets.

    Sombrero is a fast wavelet image processing and object detection C library for astronomical images. Sombrero is named after the "Mexican Hat" shape of the wavelet masks used in image convolution and is released under the GNU LGPL license.
    Downloads: 0 This Week
  • 16
    ADAMS

    ADAMS is a workflow engine for building complex knowledge workflows.

    ...This allows rapid development and easy maintenance of large workflows, with hundreds or thousands of operators. Operators include machine learning (WEKA, MOA, MEKA) and image processing (ImageJ, JAI, BoofCV, LIRE and Gnuplot). R available using Rserve. WEKA webservice allows other frameworks to use WEKA models. Fast prototyping with Groovy and Jython. Read/write support for various databases and spreadsheet applications.
    Downloads: 2 This Week
  • 17
    HunyuanVideo-I2V

    A Customizable Image-to-Video Model based on HunyuanVideo

    HunyuanVideo-I2V is a customizable image-to-video generation framework developed by Tencent, extending the capabilities of HunyuanVideo. It allows for high-quality video creation from still images, using PyTorch and providing pre-trained model weights, inference code, and customizable training options. The system includes LoRA training code for adding special effects and enhancing video realism, aiming to offer versatile and scalable solutions for generating videos from static image inputs.
    Downloads: 8 This Week
  • 18
    MediaPipe Face Detection

    Detect faces in an image

    The MediaPipe Face Detection model is a high-performance, real-time face detection solution that uses machine learning to identify faces in images and video streams. It is optimized for mobile and embedded platforms, offering fast and accurate face detection while maintaining a small memory footprint. This model supports multiple face detections and is highly efficient, making it suitable for a variety of applications such as augmented reality, user authentication, and facial expression analysis.
    Downloads: 1 This Week
  • 19
    Common Resource Grep - crgrep

    Common Resource Grep

    CRGREP searches for matching text in databases, various document formats, archives and other difficult to access resources. A command line tool for name and content text matching in database tables, plain files, MS Office documents, PDF, archives, MP3 audio, image meta-data, scanned documents, maven dependencies and web resources. CRGREP will search resources within resources of any arbitrary combination or depth, so text within a document within a zip archive, and so on. Here you...
    Downloads: 3 This Week
  • 20
    FrankMocap

    A Strong and Easy-to-use Single View 3D Hand+Body Pose Estimator

    FrankMocap is a monocular 3D human capture system that estimates body, hand, and optionally face pose from a single RGB image or video. It regresses parametric human models (e.g., SMPL/SMPL-X) directly, producing temporally stable meshes and joint angles suitable for animation or analytics. The pipeline couples a robust 2D keypoint detector with 3D mesh regression networks and priors that keep results anatomically plausible. It can run frame-by-frame or with temporal smoothing, and includes demo apps for live webcam capture as well as batch processing. ...
    Downloads: 0 This Week
  • 21
    LTI-Lib is an object-oriented computer vision library written in C++ for Windows/MS-VC++ and Linux/gcc. It provides extensive functionality for solving mathematical problems, many image processing algorithms, some classification tools and much more...
    Downloads: 0 This Week
  • 22
    Spectral Python

    A Python module for hyperspectral image processing

    Spectral Python (SPy) is a Python package for reading, viewing, manipulating, and classifying hyperspectral image (HSI) data. SPy includes functions for clustering, dimensionality reduction, supervised classification, and more.
    Downloads: 1 This Week
  • 23
    VoiceShot API - PHP SDK

    PHP SDK for processing phone calls and SMS through the VoiceShot API.

    VoiceShot's API allows you to quickly integrate both incoming and outgoing phone calling and text messaging services into your applications. From your own applications, you can easily place and receive interactive telephone calls and text messages. Put callers in touch with the data and people they want when they want it. Send notification phone calls and text messages, automate customer service calls/texts, provide order status and integrate with your own custom applications to provide...
    Downloads: 0 This Week
  • 24
    VoiceShot API - .NET SDK

    .NET SDK for processing phone calls and SMS through the VoiceShot API.

    VoiceShot's API allows you to quickly integrate both incoming and outgoing phone calling and text messaging services into your applications. From your own applications, you can easily place and receive interactive telephone calls and text messages. Put callers in touch with the data and people they want when they want it. Send notification phone calls and text messages, automate customer service calls/texts, provide order status and integrate with your own custom applications to provide...
    Downloads: 0 This Week
  • 25
    VoiceShot API - ASP SDK

    ASP SDK for processing phone calls and SMS through the VoiceShot API.

    VoiceShot's API allows you to quickly integrate both incoming and outgoing phone calling and text messaging services into your applications. From your own applications, you can easily place and receive interactive telephone calls and text messages. Put callers in touch with the data and people they want when they want it. Send notification phone calls and text messages, automate customer service calls/texts, provide order status and integrate with your own custom applications to provide...
    Downloads: 0 This Week