33 projects for "video processing" with 2 filters applied:

  • 1
    LTX-Video

    Official repository for LTX-Video

    LTX-Video is a video generation model from Lightricks, built on a diffusion transformer (DiT) architecture, that generates high-quality video from text prompts and images with an emphasis on speed. ...
    Downloads: 4 This Week
  • 2
    ComfyUI-LTXVideo

    LTX-Video Support for ComfyUI

    ComfyUI-LTXVideo is a set of ComfyUI custom nodes for the LTX-Video model, bringing LTX-Video generation into ComfyUI’s node-based workflow environment so creators can orchestrate complex video tasks as a visual graph. Instead of writing code, users assemble nodes that represent video inputs, transformations, and outputs, letting them prototype and automate video production pipelines visually. ...
    Downloads: 11 This Week
  • 3
    Sora.FM

    Sora AI Video Generator by Sora.FM

    ...For creators who want to explore AI-based content generation (for example automated video clips, short-form media, or other generated video content), Sora.FM offers a starting point. As with many open-source generators in this space, there is a tradeoff between ease of use and the limitations of generative output, but because the project is publicly available, users can experiment, iterate, or fork it to adapt their pipelines, for example by customizing model prompts, video templates, or post-processing.
    Downloads: 0 This Week
  • 4
    SlowFast

    Video understanding codebase from FAIR for reproducing video models

    SlowFast is a video understanding framework that captures both spatial semantics and temporal dynamics efficiently by processing video frames at two different temporal resolutions. The slow pathway encodes semantic context by sampling frames sparsely, while the fast pathway captures motion and fine temporal cues by operating on densely sampled frames with fewer channels.
    Downloads: 2 This Week
  • 5
    SALMONN family

    A suite of advanced multi-modal LLMs

    SALMONN is a family of advanced multi-modal large language models (LLMs) developed by ByteDance, designed to handle and integrate multiple data modalities (e.g., text, audio, video) rather than text alone. The repository bundles separate branches for specialized tasks (e.g., video-SALMONN, speech-quality assessment, general multimodal tasks), which suggests a modular design that extends across domains. SALMONN aims to push the frontier of multi-modal AI by allowing models...
    Downloads: 0 This Week
  • 6
    Vidi2

    Large Multimodal Models for Video Understanding and Editing

    ...Vidi targets applications such as intelligent video editing, automated video search, content analysis, and editing assistance, enabling users to efficiently locate relevant segments and objects in hours-long footage. The project is released as open source, giving developers access to model code, inference scripts, and evaluation pipelines so they can reproduce research results or integrate Vidi into their own video-processing workflows.
    Downloads: 1 This Week
  • 7
    Open Vision Agents by Stream

    Build Vision Agents quickly with any model or video provider

    Open Vision Agents by Stream is an open-source framework from Stream for building real-time, multimodal AI agents that watch, listen, and respond to live video streams. It combines video understanding models, such as YOLO and Roboflow-based detectors, with real-time large language models like OpenAI Realtime and Gemini Live to create interactive experiences. The framework uses Stream’s ultra-low-latency edge network so agents can join sessions quickly and keep audio and video latency very low while processing frames and generating responses. ...
    Downloads: 0 This Week
  • 8
    FastRTC

    The Python library for real-time communication

    FastRTC is a Python library designed to simplify real-time communication (RTC), especially for audio and video streaming applications. It abstracts away much of the complexity of implementing WebRTC by providing a simple interface (e.g., a Stream class) that can be mounted within a web backend such as a FastAPI application. This makes it well suited for building real-time voice or video interfaces for applications such as AI assistants, live chat, or collaborative audio/video tools. ...
    Downloads: 0 This Week
  • 9
    Frigate

    NVR with realtime local object detection for IP cameras

    A complete, local NVR designed for Home Assistant with AI object detection. Frigate uses OpenCV and TensorFlow to perform real-time object detection locally for IP cameras. A Google Coral accelerator is optional but highly recommended; the Coral outperforms even the best CPUs and can process 100+ FPS with very little overhead.
    Downloads: 59 This Week
  • 10
    Live API Web Console

    A React-based starter app for using the Live API over WebSockets

    Live API Web Console is a React starter that demonstrates how to use Gemini’s Live API over WebSockets to build real-time, multimodal experiences. The app includes modules for streaming audio playback, recording user media from the microphone, webcam, or even screen capture, and it surfaces a unified event log so you can debug the session as it flows. Configuration lives in a simple .env file and the project boots with standard web tooling, letting you experiment quickly with models, system...
    Downloads: 2 This Week
  • 11
    Depth Pro

    Sharp Monocular Metric Depth in Less Than a Second

    Depth Pro is a foundation model for zero-shot metric monocular depth estimation, producing sharp, high-frequency depth maps with absolute scale from a single image. Unlike many prior approaches, it does not require camera intrinsics or extra metadata, yet still outputs metric depth suitable for downstream 3D tasks. Apple highlights both accuracy and speed: the model can synthesize a ~2.25-megapixel depth map in around 0.3 seconds on a standard GPU, enabling near real-time applications. The...
    Downloads: 2 This Week
  • 12
    edge-tts

    Use Microsoft Edge's online text-to-speech service from Python

    edge-tts is a Python module and command-line tool that gives you direct access to Microsoft Edge’s online text-to-speech service without needing the Edge browser, Windows, or any API key. It wraps the same cloud voices used by Edge, exposing them through a simple CLI (edge-tts, edge-playback) and a Python API, so you can script high-quality speech generation in your own applications. The tool lets you list available voices, specify locale and voice name, and generate audio files in common...
    Downloads: 36 This Week
  • 13
    Dolphin

    Document Image Parsing via Heterogeneous Anchor Prompting

    ...It is designed to integrate with other tools and libraries in document-processing pipelines, while remaining open source so that users can inspect, extend, and adapt it.
    Downloads: 0 This Week
  • 14
    GLM-4.5V

    GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

    GLM-4.5V is the preceding iteration in the GLM-V series that laid much of the groundwork for general multimodal reasoning and vision-language understanding. It embodies the design philosophy of mixing visual and textual modalities into a unified model capable of general-purpose reasoning, content understanding, and generation, while already supporting a wide variety of tasks: from image captioning and visual question answering to content recognition, GUI-based agents, video understanding,...
    Downloads: 1 This Week
  • 15
    OpenCV

    Open Source Computer Vision Library

    The Open Source Computer Vision Library has more than 2500 algorithms, extensive documentation, and sample code for real-time computer vision. It works on Windows, Linux, Mac OS X, Android, and iOS, and in your browser through JavaScript. Languages: C++, Python, Julia, JavaScript.
    Homepage: https://opencv.org
    Q&A forum: https://forum.opencv.org/
    Documentation: https://docs.opencv.org
    Source code: https://github.com/opencv
    Please pay special attention to our tutorials!...
    Downloads: 3,446 This Week
  • 16
    HunyuanVideo-I2V

    A Customizable Image-to-Video Model based on HunyuanVideo

    HunyuanVideo-I2V is a customizable image-to-video generation framework developed by Tencent that extends the capabilities of HunyuanVideo. It generates high-quality video from still images using PyTorch, and provides pre-trained model weights, inference code, and customizable training options. The release includes LoRA training code for adding special effects and enhancing video realism, with the aim of offering versatile, scalable video generation from static image inputs.
    Downloads: 9 This Week
  • 17
    MediaPipe Face Detection

    Detect faces in an image

    The MediaPipe Face Detection model is a high-performance, real-time face detection solution that uses machine learning to identify faces in images and video streams. It is optimized for mobile and embedded platforms, offering fast and accurate detection with a small memory footprint. The model can detect multiple faces in a single image and is efficient enough for applications such as augmented reality, user authentication, and facial expression analysis.
    Downloads: 4 This Week
  • 18
    FrankMocap

    A Strong and Easy-to-use Single View 3D Hand+Body Pose Estimator

    FrankMocap is a monocular 3D human capture system that estimates body, hand, and optionally face pose from a single RGB image or video. It regresses parametric human models (e.g., SMPL/SMPL-X) directly, producing temporally stable meshes and joint angles suitable for animation or analytics. The pipeline couples a robust 2D keypoint detector with 3D mesh regression networks and priors that keep results anatomically plausible. It can run frame-by-frame or with temporal smoothing, and includes demo apps for live webcam capture as well as batch processing. ...
    Downloads: 1 This Week
  • 19
    Physics simulation software based on user sketches: it runs a pattern-recognition agent to animate a physics sketch drawn on a blackboard.
    Downloads: 0 This Week
  • 20
    The Integrating Vision Toolkit (IVT) is a powerful and fast C++ computer vision library with an easy-to-use object-oriented architecture. It offers its own multi-platform GUI toolkit. OpenCV is integrated optionally. Website: http://ivt.sourceforge.net
    Downloads: 0 This Week
  • 21
    A library for rapid development of video processing, computer vision, and computer graphics algorithms.
    Downloads: 0 This Week
  • 22
    QVision: Computer Vision Library for Qt

    Computer vision and image processing library for Qt.

    This library contains among other things a set of graphical widgets for video output, performance evaluation and augmented reality. The library also provides classes for several data types usually required by computer vision and image processing applications such as vectors, matrices, quaternions and images. Thanks to a large number of wrapper functions these objects can be used with highly efficient functionality from third party libraries such as OpenCV, GNU Scientific Library, Computational Geometry Algorithms Library, Intel's Math Kernel Library and Integrated Performance Primitives, the Octave library, etc...
    Downloads: 1 This Week
  • 23

    Black Hole Cortex

    Sphere surface layers of visual cortex approach maximum info density

    Near the surface (event horizon) of a black hole, there is maximum information density in units of squared Planck lengths (with some translation to qubits). Similarly, our imagination is the set of all possible things we can draw onto our most dense layer of visual cortex in electricity patterns. Bigger layers have more neurons to handle those possibilities. A Black Hole Cortex is a kind of visual cortex whose neuron layers have densities similar to the information density at various radii from a black hole. What we...
    Downloads: 0 This Week
  • 24
    BayesianCortex

    Simple algorithm for a real-time interactive visual cortex for painting

    A paint program where the canvas is the visual cortex of a simple kind of artificial intelligence. You paint with the mouse into its dreams, and it responds by gradually changing what you painted. There will also be an API for using it with other programs as a general high-dimensional space, in which each pixel's brightness is its own dimension. Bayesian nodes have exactly 3 children because that is all that's needed to compute NAND in a fuzzy way using Bayes' Rule, which acts as NAND at certain extremes. NAND can be...
    Downloads: 0 This Week
  • 25
    Real-time face tracking and recognition refers to the task of locating human faces in a video stream and identifying them by matching them against a database of known faces.
    Downloads: 0 This Week
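The SlowFast entry above describes a slow pathway that samples frames sparsely and a fast pathway that samples them densely. As a minimal sketch of that sampling idea (not code from the SlowFast repository), the function below computes which frame indices each pathway would read from one clip. The parameter names `tau` (slow-pathway stride) and `alpha` (how many times denser the fast pathway is), and their defaults of 16 and 8, are illustrative assumptions.

```python
from __future__ import annotations


def slowfast_indices(num_frames: int, tau: int = 16, alpha: int = 8) -> tuple[list[int], list[int]]:
    """Return (slow, fast) frame indices for a clip of `num_frames` frames.

    Slow pathway: every `tau`-th frame (sparse sampling for semantic context).
    Fast pathway: every `tau // alpha`-th frame, i.e. a stride `alpha` times
    smaller (dense sampling for motion and fine temporal cues).
    The defaults tau=16, alpha=8 are illustrative assumptions.
    """
    if alpha < 1 or tau % alpha != 0:
        raise ValueError("alpha must be a positive divisor of tau")
    fast_stride = tau // alpha
    slow = list(range(0, num_frames, tau))
    fast = list(range(0, num_frames, fast_stride))
    return slow, fast


if __name__ == "__main__":
    slow, fast = slowfast_indices(64)
    print(slow)       # [0, 16, 32, 48]
    print(len(fast))  # 32
```

Because the fast stride divides the slow stride, every frame the slow pathway sees is also seen by the fast pathway. The "fewer channels" half of the design is a separate choice about the fast network's width and is not modeled here.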
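Item 25 above describes recognition as matching detected faces against a database of known faces. A minimal sketch of just that matching step follows. It assumes each face has already been turned into a fixed-length embedding vector by some detector and encoder (not shown, and not part of the listed project), compares the query embedding to each stored one by cosine similarity, and accepts the best match only if it clears a threshold. The names, the tiny 3-dimensional vectors, and the 0.8 threshold are all hypothetical.

```python
from __future__ import annotations

import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(query: list[float], known: dict[str, list[float]], threshold: float = 0.8) -> str | None:
    """Return the name of the most similar known face, or None if no score reaches `threshold`.

    The 0.8 default is an arbitrary, illustrative threshold.
    """
    best_name: str | None = None
    best_score = threshold
    for name, embedding in known.items():
        score = cosine_similarity(query, embedding)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


if __name__ == "__main__":
    # Hypothetical 3-dimensional embeddings; real face embeddings are much longer.
    known_faces = {"alice": [0.9, 0.1, 0.0], "bob": [0.0, 0.2, 0.95]}
    print(identify([0.85, 0.15, 0.05], known_faces))  # alice
    print(identify([0.5, 0.5, 0.5], known_faces))     # None
```

Returning None below the threshold is what lets a tracker label a face as "unknown" instead of forcing it onto the nearest database entry.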