Showing 51 open source projects for "processing"

View related business solutions
  • Stop Storing Third-Party Tokens in Your Database Icon
    Stop Storing Third-Party Tokens in Your Database

    Auth0 Token Vault handles secure token storage, exchange, and refresh for external providers so you don't have to build it yourself.

    Rolling your own OAuth token storage can be a security liability. Token Vault securely stores access and refresh tokens from federated providers and handles exchange and renewal automatically. Connected accounts, refresh exchange, and privileged worker flows included.
    Try Auth0 for Free
  • Go From AI Idea to AI App Fast Icon
    Go From AI Idea to AI App Fast

    One platform to build, fine-tune, and deploy ML models. No MLOps team required.

    Access Gemini 3 and 200+ models. Build chatbots, agents, or custom models with built-in monitoring and scaling.
    Try Free
  • 1
    LTX-Video

    LTX-Video

    Official repository for LTX-Video

    ...The toolkit is built with both real-time and offline workflows in mind, enabling applications from consumer editing to professional content creation and batch processing. Internally optimized for multi-core processors and hardware acceleration where available, LTX-Video makes it feasible to work with high-resolution content and complex timelines without sacrificing responsiveness.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 2
    llama.cpp Python Bindings

    llama.cpp Python Bindings

    Python bindings for llama.cpp

    llama-cpp-python provides Python bindings for llama.cpp, enabling the integration of LLaMA (Large Language Model Meta AI) language models into Python applications. This facilitates the use of LLaMA's capabilities in natural language processing tasks within Python environments.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 3
    ChatGLM.cpp

    ChatGLM.cpp

    C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)

    ChatGLM.cpp is a C++ implementation of the ChatGLM-6B model, enabling efficient local inference without requiring a Python environment. It is optimized for running on consumer hardware.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 4
    ComfyUI-LTXVideo

    ComfyUI-LTXVideo

    LTX-Video Support for ComfyUI

    ComfyUI-LTXVideo is a bridge between ComfyUI’s node-based generative workflow environment and the LTX-Video multimedia processing framework, enabling creators to orchestrate complex video tasks within a visual graph paradigm. Instead of writing code to apply effects, transitions, edits, and data flows, users can assemble nodes that represent video inputs, transformations, and outputs, letting them prototype and automate video production pipelines visually.
    Downloads: 7 This Week
    Last Update:
    See Project
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • 5
    GLM-5

    GLM-5

    From Vibe Coding to Agentic Engineering

    ...It incorporates innovations like DeepSeek Sparse Attention (DSA) to preserve massive context windows while reducing deployment costs and supporting long context processing, which is crucial for detailed plans and agent tasks.
    Downloads: 98 This Week
    Last Update:
    See Project
  • 6
    VOID

    VOID

    Video Object and Interaction Deletion

    VOID is an advanced AI video processing system developed by Netflix that focuses on removing objects from videos while preserving the physical and visual realism of the surrounding environment. Unlike traditional inpainting methods that only erase pixels or simple artifacts, VOID models the full interaction dynamics between objects and their environment, including shadows, reflections, and even physical consequences such as movement or balance changes.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 7
    OpenMythos

    OpenMythos

    A theoretical reconstruction of the Claude Mythos architecture

    ...The project explores the idea that instead of stacking hundreds of unique transformer layers, a smaller set of layers can be reused iteratively during inference to achieve deeper reasoning without increasing parameter count. It divides computation into three main stages, including a pre-processing phase, a looped recurrent reasoning block, and a final output refinement stage, creating a structured pipeline for inference. The architecture incorporates advanced techniques such as mixture-of-experts routing, adaptive computation time, and multiple attention mechanisms to dynamically allocate compute where needed. It is highly configurable through a centralized configuration system, allowing experimentation with different architectural parameters such as loop depth, attention type.
    Downloads: 50 This Week
    Last Update:
    See Project
  • 8
    DeepSeek-OCR

    DeepSeek-OCR

    Contexts Optical Compression

    ...It supports local deployment, enabling organizations concerned about privacy or latency to run the pipeline on-premises rather than send sensitive documents to third-party cloud services. The codebase is written in Python with a focus on modularity: you can swap preprocessing, recognition, and post-processing components as needed for custom workflows.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 9
    Depth Anything 3

    Depth Anything 3

    Recovering the Visual Space from Any Views

    ...The model can be applied to photography, AR/VR content creation, robotics perception, and 3D reconstruction workflows, making it versatile across industries and research domains. It includes support for high-resolution inputs and post-processing tools that refine depth predictions, helping downstream tasks like segmentation, bounding volume estimation, and mixed reality layering.
    Downloads: 3 This Week
    Last Update:
    See Project
  • Go from Code to Production URL in Seconds Icon
    Go from Code to Production URL in Seconds

    Cloud Run deploys apps in any language instantly. Scales to zero. Pay only when code runs.

    Skip the Kubernetes configs. Cloud Run handles HTTPS, scaling, and infrastructure automatically. Two million requests free per month.
    Try it free
  • 10
    OpenAI Privacy Filter

    OpenAI Privacy Filter

    Bidirectional token-classification model for identifiable info

    ...It operates as a bidirectional token classification system that labels sensitive data in a single forward pass rather than generating text sequentially, enabling fast processing for large datasets. The model supports long-context inputs, allowing it to analyze extensive documents without chunking, which improves consistency in redaction tasks. It can run locally on standard hardware, ensuring that sensitive information never leaves the user’s environment and supporting privacy-first workflows. The system is fine-tunable, enabling adaptation to specific datasets or compliance requirements across industries. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 11
    DeepSeek-OCR 2

    DeepSeek-OCR 2

    Visual Causal Flow

    DeepSeek-OCR-2 is the second-generation optical character recognition system developed to improve document understanding by introducing a “visual causal flow” mechanism, enabling the encoder to reorder visual tokens in a way that better reflects semantic structure rather than strict raster scan order. It is designed to handle complex layouts and noisy documents by giving the model causal reasoning capabilities that mimic human visual scanning behavior, enhancing OCR performance on documents...
    Downloads: 11 This Week
    Last Update:
    See Project
  • 12
    VibeVoice

    VibeVoice

    Open-source multi-speaker long-form text-to-speech model

    ...A key innovation is its use of continuous acoustic and semantic speech tokenizers operating at an ultra-low frame rate of 7.5 Hz, enabling high audio fidelity with efficient processing of long sequences. The model integrates a Qwen2.5-based large language model with a diffusion head to produce realistic acoustic details and capture conversational context. Training involved curriculum learning with increasing sequence lengths up to 65K tokens, allowing VibeVoice to handle very long dialogues effectively. Safety mechanisms include an audible disclaimer and imperceptible watermarking in all generated audio to mitigate misuse risks.
    Downloads: 11 This Week
    Last Update:
    See Project
  • 13
    TRIBE v2

    TRIBE v2

    A multimodal model for brain response prediction

    ...TRIBE v2 allows researchers to simulate and analyze brain activity without requiring direct human experiments. Overall, it provides a powerful tool for studying perception, cognition, and multimodal processing in the brain.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 14
    Qwen

    Qwen

    The official repo of Qwen chat & pretrained large language model

    Qwen is a series of large language models developed by Alibaba Cloud, consisting of various pretrained versions like Qwen-1.8B, Qwen-7B, Qwen-14B, and Qwen-72B. These models, which range from smaller to larger configurations, are designed for a wide range of natural language processing tasks. They are openly available for research and commercial use, with Qwen's code and model weights shared on GitHub. Qwen's capabilities include text generation, comprehension, and conversation, making it a versatile tool for developers looking to integrate advanced AI functionalities into their applications.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 15
    Kimi-Audio

    Kimi-Audio

    Audio foundation model excelling in audio understanding

    Kimi-Audio is an ambitious open-source audio foundation model designed to unify a wide array of audio processing tasks — from speech recognition and audio understanding to generative conversation and sound event classification — within a single cohesive architecture. Instead of fragmenting work across specialized models, Kimi-Audio handles automatic speech recognition (ASR), audio question answering, automatic audio captioning, speech emotion recognition, and audio-to-text chat in one system, enabling developers to build rich, multimodal audio applications without stitching together disparate components. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 16
    Depth Pro

    Depth Pro

    Sharp Monocular Metric Depth in Less Than a Second

    Depth Pro is a foundation model for zero-shot metric monocular depth estimation, producing sharp, high-frequency depth maps with absolute scale from a single image. Unlike many prior approaches, it does not require camera intrinsics or extra metadata, yet still outputs metric depth suitable for downstream 3D tasks. Apple highlights both accuracy and speed: the model can synthesize a ~2.25-megapixel depth map in around 0.3 seconds on a standard GPU, enabling near real-time applications. The...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 17
    TADA

    TADA

    Open Source Speech Language Model

    TADA is an open-source speech-language modeling framework designed to unify spoken audio and text representations within a single generative architecture. The system focuses on aligning speech and text streams using a dual-alignment mechanism that synchronizes the acoustic signal with its textual representation. By modeling both modalities together, the framework allows developers to build systems capable of generating, understanding, and transforming speech and language simultaneously. This...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 18
    VGGSfM

    VGGSfM

    VGGSfM: Visual Geometry Grounded Deep Structure From Motion

    VGGSfM is an advanced structure-from-motion (SfM) framework jointly developed by Meta AI Research (GenAI) and the University of Oxford’s Visual Geometry Group (VGG). It reconstructs 3D geometry, dense depth, and camera poses directly from unordered or sequential images and videos. The system combines learned feature matching and geometric optimization to generate high-quality camera calibrations, sparse/dense point clouds, and depth maps in standard COLMAP format. Version 2.0 adds support...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 19
    SlowFast

    SlowFast

    Video understanding codebase from FAIR for reproducing video models

    SlowFast is a video understanding framework that captures both spatial semantics and temporal dynamics efficiently by processing video frames at two different temporal resolutions. The slow pathway encodes semantic context by sampling frames sparsely, while the fast pathway captures motion and fine temporal cues by operating on densely sampled frames with fewer channels. Together, these two pathways complement each other, allowing the network to model both appearance and motion without excessive computational cost. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    GLM-4.5V

    GLM-4.5V

    GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

    GLM-4.5V is the preceding iteration in the GLM-V series that laid much of the groundwork for general multimodal reasoning and vision-language understanding. It embodies the design philosophy of mixing visual and textual modalities into a unified model capable of general-purpose reasoning, content understanding, and generation, while already supporting a wide variety of tasks: from image captioning and visual question answering to content recognition, GUI-based agents, video understanding,...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 21
    Vidi2

    Vidi2

    Large Multimodal Models for Video Understanding and Editing

    ...The system is built with open-source release in mind, giving developers access to model code, inference scripts, and evaluation pipelines so they can reproduce research results or integrate Vidi into their own video-processing workflows.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 22
    MiniCPM4

    MiniCPM4

    Ultra-Efficient LLMs on End Device

    MiniCPM4 is part of the MiniCPM family of ultra-efficient large language models designed specifically for high performance on edge devices and resource-constrained environments. Unlike traditional large-scale models that require extensive computational resources, MiniCPM4 focuses on delivering competitive reasoning and language capabilities while maintaining significantly lower latency and higher efficiency. It achieves this through optimized architectures, scalable training strategies, and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    Step3-VL-10B

    Step3-VL-10B

    Multimodal model achieving SOTA performance

    ...It achieves this efficiency and strong performance through unified pre-training on a massive 1.2 trillion-token multimodal corpus that jointly optimizes a language-aligned perception encoder with a powerful decoder, creating deep synergy between image processing and text understanding.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    DreamCraft3D

    DreamCraft3D

    Official implementation of DreamCraft3D

    ...The name suggests a “dream crafting” metaphor—users probably supply textual or image prompts and generate 3D assets (point clouds, meshes, scenes). The repository includes model code, inference scripts, sample prompts, and possibly dataset preparation pipelines. It may integrate rendering or post-processing modules (e.g. mesh smoothing, texturing) to make the outputs more output-ready. Because 3D generation is hardware‐intensive, the repository likely also includes optimizations like quantization, pruning, or inference accelerations (e.g. using FlashMLA or DeepEP) to make the generation pipeline faster or more efficient. DreamCraft3D may also support style or attribute control (e.g. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    HunyuanDiT

    HunyuanDiT

    Diffusion Transformer with Fine-Grained Chinese Understanding

    HunyuanDiT is a high-capability text-to-image diffusion transformer with bilingual (Chinese/English) understanding and multi-turn dialogue capability. It trains a diffusion model in latent space using a transformer backbone and integrates a Multimodal Large Language Model (MLLM) to refine captions and support conversational image generation. It supports adapters like ControlNet, IP-Adapter, LoRA, and can run under constrained VRAM via distillation versions. LoRA, ControlNet (pose, depth,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • Next