Showing 9 open source projects for "audio video streaming server source code"

View related business solutions
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • Go From AI Idea to AI App Fast Icon
    Go From AI Idea to AI App Fast

    One platform to build, fine-tune, and deploy ML models. No MLOps team required.

    Access Gemini 3 and 200+ models. Build chatbots, agents, or custom models with built-in monitoring and scaling.
    Try Free
  • 1
    Qwen2.5-Omni

    Qwen2.5-Omni

    Capable of understanding text, audio, vision, video

    Qwen2.5-Omni is an end-to-end multimodal flagship model in the Qwen series by Alibaba Cloud, designed to process multiple modalities (text, images, audio, video) and generate responses both as text and natural speech in streaming real-time. It supports “Thinker-Talker” architecture, and introduces innovations for aligning modalities over time (for example synchronizing video/audio), robust speech generation, and low-VRAM/quantized versions to make usage more accessible. It holds...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    Qwen3-Omni

    Qwen3-Omni

    Qwen3-omni is a natively end-to-end, omni-modal LLM

    ...It achieves state-of-the-art results: across 36 audio and audio-visual benchmarks, it hits open-source SOTA on 32 and overall SOTA on 22, outperforming or matching strong closed-source models such as Gemini-2.5 Pro and GPT-4o. To reduce latency, especially in audio/video streaming, Talker predicts discrete speech codecs via a multi-codebook scheme and replaces heavier diffusion approaches.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 3
    MiniCPM-o

    MiniCPM-o

    A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming

    MiniCPM-o 2.6 is a cutting-edge multimodal large language model (MLLM) designed for high-performance tasks across vision, speech, and video. Capable of running on end-side devices such as smartphones and tablets, it provides powerful features like real-time speech conversation, video understanding, and multimodal live streaming. With 8 billion parameters, MiniCPM-o 2.6 surpasses its predecessors in versatility and efficiency, making it one of the most robust models available. It supports...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Vidi2

    Vidi2

    Large Multimodal Models for Video Understanding and Editing

    ...Vidi targets applications like intelligent video editing, automated video search, content analysis, and editing assistance, enabling users to efficiently locate relevant segments and objects in hours-long footage. The system is built with open-source release in mind, giving developers access to model code, inference scripts, and evaluation pipelines so they can reproduce research results or integrate Vidi into their own video-processing workflows.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Full-stack observability with actually useful AI | Grafana Cloud Icon
    Full-stack observability with actually useful AI | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 5
    Anthropic SDK TypeScript

    Anthropic SDK TypeScript

    Access to Anthropic's safety-first language model APIs

    anthropic-sdk-typescript is the TypeScript / JavaScript client library for the Anthropic REST API, enabling backend or Node.js usage of models like Claude. It wraps API endpoints for creating messages, streaming responses, and managing parameters in a type-safe TS environment. The library is designed for server-side use, interfacing with REST, and is stable for integration in web services or backend agents. Example usage shows how to instantiate the Anthropic client, call...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 6
    LTX-2

    LTX-2

    Python inference and LoRA trainer package for the LTX-2 audio–video

    LTX-2 is a powerful, open-source toolkit developed by Lightricks that provides a modular, high-performance base for building real-time graphics and visual effects applications. It is architected to give developers low-level control over rendering pipelines, GPU resource management, shader orchestration, and cross-platform abstractions so they can craft visually compelling experiences without starting from scratch. Beyond basic rendering scaffolding, LTX-2 includes optimized math libraries,...
    Downloads: 32 This Week
    Last Update:
    See Project
  • 7
    Antigravity Claude Proxy

    Antigravity Claude Proxy

    Proxy that exposes Antigravity provided claude / gemini models

    Antigravity Claude Proxy is a purpose-built proxy server that enables developers to interface with Claude models through a standardized RESTful API, allowing tools and workflows that expect generic HTTP APIs to operate on Anthropic’s Claude without native support. The project acts as a translation layer, receiving web requests in common formats (such as OpenAI-style endpoints) and forwarding them to Anthropic’s API in the required structure, while converting responses back into a familiar...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 8
    HunyuanOCR

    HunyuanOCR

    OCR expert VLM powered by Hunyuan's native multimodal architecture

    ...HunyuanOCR handles complex documents: multi-column layouts, tables, mathematical formulas, mixed languages, handwritten or stylized fonts, receipts, tickets, and even video-frame subtitles. The project provides code, pretrained weights, and inference instructions, making it feasible to deploy locally or on a server, and to integrate with applications.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Demucs

    Demucs

    Code for the paper Hybrid Spectrogram and Waveform Source Separation

    Demucs (Deep Extractor for Music Sources) is a deep-learning framework for music source separation—extracting individual instrument or vocal tracks from a mixed audio file. The system is based on a U-Net-like convolutional architecture combined with recurrent and transformer elements to capture both short-term and long-term temporal structure. It processes raw waveforms directly rather than spectrograms, allowing for higher-quality reconstruction and fewer artifacts in separated tracks. The...
    Downloads: 103 This Week
    Last Update:
    See Project
  • 8 Monitoring Tools in One APM. Install in 5 Minutes. Icon
    8 Monitoring Tools in One APM. Install in 5 Minutes.

    Errors, performance, logs, uptime, hosts, anomalies, dashboards, and check-ins. One interface.

    AppSignal works out of the box for Ruby, Elixir, Node.js, Python, and more. 30-day free trial, no credit card required.
    Start Free
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB