Showing 61 open source projects for "media"

View related business solutions
  • $300 in Free Credit Towards Top Cloud Services Icon
    $300 in Free Credit Towards Top Cloud Services

    Build VMs, containers, AI, databases, storage—all in one place.

    Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.
    Get Started
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • 1
    MoneyPrinterTurbo

    MoneyPrinterTurbo

    Generate short videos with one click using AI LLM

    MoneyPrinterTurbo is an AI-driven tool that enables users to generate high-definition short videos with minimal input. By providing a topic or keyword, the system automatically creates video scripts, sources relevant media assets, adds subtitles, and incorporates background music, resulting in a polished video ready for distribution.
    Downloads: 22 This Week
    Last Update:
    See Project
  • 2
    Dolphin

    Dolphin

    Document Image Parsing via Heterogeneous Anchor Prompting”

    Dolphin — maintained by ByteDance — is a project aimed at providing a high-performance, robust, and extensible media or multimedia framework / player infrastructure (or possibly a streaming media solution), intended to meet modern demands for efficiency, flexibility, and integration in media-heavy applications. It seeks to combine performant media playback or handling (audio/video decoding, streaming, buffering) with a modular, developer-friendly API that allows easy embedding into larger applications or services. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    AI-Media2Doc

    AI-Media2Doc

    AI tool converting video/audio into structured documents instantly

    ...It is designed to transform multimedia inputs into formats such as knowledge notes, summaries, mind maps, and social-style articles, making content easier to review and reuse. AI-Media2Doc emphasizes privacy by processing media locally in the browser using WebAssembly-based ffmpeg, ensuring that original video files are not uploaded externally. It separates client-side media handling from backend AI processing, reducing data exposure while still enabling transcription and document generation. AI-Media2Doc supports flexible customization through prompts, allowing users to tailor output styles based on their needs. ...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 4
    Jaaz

    Jaaz

    Open source multimodal creative AI assistant with infinite canvas tool

    Jaaz is an open source multimodal creative assistant designed to help users generate and organize visual media using artificial intelligence. It functions as a creative workspace where images, videos, and visual storyboards can be produced and arranged on an infinite canvas environment. It combines AI agents with visual editing tools, allowing users to generate media through prompts, sketches, or simple instructions. Jaaz supports multiple AI models and can integrate both local and cloud-based inference systems, enabling flexible creative workflows. ...
    Downloads: 7 This Week
    Last Update:
    See Project
  • Full-stack observability with actually useful AI | Grafana Cloud Icon
    Full-stack observability with actually useful AI | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 5
    FaceFusion

    FaceFusion

    Industry leading face manipulation platform

    ...FaceFusion is built with a modular pipeline that allows users to customize processing steps and optimize performance for different hardware environments. The tool is often used in content creation, visual effects experimentation, and research into generative media. Overall, FaceFusion functions as a flexible and extensible platform for AI-driven face replacement and enhancement tasks.
    Downloads: 221 This Week
    Last Update:
    See Project
  • 6
    SuggestArr

    SuggestArr

    Request recommended movies, TV shows and anime to Jellyseer/Overseer

    SuggestArr is an open-source automation platform designed to recommend and automatically request movies, TV shows, and anime based on a user’s viewing history in self-hosted media servers. The project integrates with popular media management systems such as Jellyfin, Plex, and Emby, allowing it to analyze recently watched content and identify similar titles using metadata from the TMDb database. Once potential recommendations are identified, SuggestArr can automatically send download or request instructions to services like Jellyseer or Overseerr, which then coordinate with media download tools and libraries. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    MoneyPrinter V2

    MoneyPrinter V2

    Automate the process of making money online

    MoneyPrinter V2 is an open-source automation platform designed to streamline and scale online income generation workflows by combining content creation, social media automation, and marketing strategies into a single system. It is a complete rewrite of the original MoneyPrinter project, focusing on modularity, extensibility, and broader functionality across multiple monetization channels. The platform operates primarily through Python-based scripts that automate tasks such as generating and publishing YouTube Shorts, posting on social media platforms like Twitter, and executing affiliate marketing campaigns. ...
    Downloads: 10 This Week
    Last Update:
    See Project
  • 8
    Medeo Video Generator

    Medeo Video Generator

    AI-powered video generation skill for OpenClaw

    ...The project focuses on bridging the gap between language-based AI systems and multimedia outputs by enabling models to produce structured video content as part of their workflows. It supports tasks such as video generation, editing, and transformation, making it useful for applications in content creation, marketing, and automated media production. The framework is designed to be modular, allowing developers to plug video capabilities into larger AI pipelines or agent systems. It emphasizes ease of integration and scalability, enabling both simple use cases and more complex multimedia workflows.
    Downloads: 12 This Week
    Last Update:
    See Project
  • 9
    ShortGPT

    ShortGPT

    AI framework for automated short video creation and editing tools

    ...ShortGPT includes specialized content engines that manage different workflows, such as generating short videos, producing longer videos, and translating existing videos into other languages. It can automatically assemble videos by combining generated scripts, sourced media assets, captions, and synthesized voice narration. A modular editing system based on structured markup and JSON allows editing steps to be broken into manageable components that can be interpreted by language models.
    Downloads: 11 This Week
    Last Update:
    See Project
  • Custom VMs From 1 to 96 vCPUs With 99.95% Uptime Icon
    Custom VMs From 1 to 96 vCPUs With 99.95% Uptime

    General-purpose, compute-optimized, or GPU/TPU-accelerated. Built to your exact specs.

    Live migration and automatic failover keep workloads online through maintenance. One free e2-micro VM every month.
    Try Free
  • 10
    Robyn

    Robyn

    Experimental, AI/ML-powered and open sourced Marketing Mix Modeling

    ...Its goal is to democratize rigorous MMM: what traditionally required expert statisticians and expensive consulting becomes accessible to any company with data. Robyn takes in historical data (spends on different marketing channels, conversions, or revenue, and optional context or organic-media variables) and uses a combination of techniques, regularized regression (Ridge), time-series decomposition (trend, seasonality, holiday effects), and hyperparameter optimization (via evolutionary algorithms), to estimate the incremental impact of each marketing channel. It explicitly models “carry-over” (adstock) and diminishing-returns (saturation) effects per channel, enabling realistic modeling of how advertising persists over time and saturates.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 11
    WhatsApp MCP Server

    WhatsApp MCP Server

    WhatsApp MCP server enabling AI access to chats and messaging

    ...All message data is stored in a local SQLite database and is only accessed when explicitly requested through defined tools, giving users control over how their data is used. It supports both sending and receiving messages, including various media types such as images, audio, videos, and documents. It integrates with AI applications like Claude through MCP, enabling conversational automation and contextual message retrieval.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 12
    BettaFish

    BettaFish

    Public opinion analysis system

    BettaFish is an open-source, multi-agent public opinion analysis system built to automate the collection, deep analysis, and reporting of social media data at scale through conversational queries. It uses a modular architecture of specialized agents that collaborate to crawl mainstream platforms, extract multimodal content like text and short video, and synthesize insights through both statistical and large language model techniques. With a design that lets users pose questions in natural language and receive structured reports, charts, and visualizations, the system aims to break information cocoons and provide comprehensive views of trends and public sentiment. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13
    Chatterbox

    Chatterbox

    SoTA open-source TTS

    ...If you like the model but need to scale or tune it for higher accuracy, check out our competitively priced TTS service (link). It delivers reliable performance with ultra-low latency of sub-200ms—ideal for production use in agents, applications, or interactive media.
    Downloads: 18 This Week
    Last Update:
    See Project
  • 14
    AudioMuse-AI

    AudioMuse-AI

    AudioMuse-AI is an Open Source Dockerized environment

    ...By analyzing the underlying audio content rather than relying on external metadata services, the system can organize large personal music libraries and generate curated playlists for different moods or listening contexts. AudioMuse-AI integrates with several popular self-hosted music servers including Jellyfin, Navidrome, and Emby, allowing users to extend existing media servers with advanced AI-powered recommendation capabilities. The system uses machine learning and audio analysis tools such as Librosa and ONNX models to extract features directly from audio tracks.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 15
    Story Flicks

    Story Flicks

    Generate high-definition story short videos with one click using AI

    ...It aims to let users generate high-definition short movies or video stories with minimal manual effort, using AI models under the hood to assemble visuals, timing, and possibly narration or subtitles. For creators who want to produce narrative short-form content — whether for social media, storytelling, or prototyping video ideas — story-flicks offers a lightweight, code-backed alternative to complex video editing suites. Because the project is open and modifiable, developers can customize the generation pipeline: adjust story structure, alter rendering parameters, tweak video quality or resolution, or integrate with other AI models (e.g. for audio, voice-over, or image-to-video). ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 16
    LTX-2

    LTX-2

    Python inference and LoRA trainer package for the LTX-2 audio–video

    ...Beyond basic rendering scaffolding, LTX-2 includes optimized math libraries, resource loaders, utilities for texture and buffer handling, and integration points for native event loops and input systems. The framework targets both interactive graphical applications and media-rich experiences, making it a solid foundation for games, creative tools, or visualization systems that demand both performance and flexibility. While being low-level, it also provides sensible defaults and helper abstractions that reduce boilerplate and help teams maintain clear, maintainable code.
    Downloads: 40 This Week
    Last Update:
    See Project
  • 17
    HunyuanWorld-Mirror

    HunyuanWorld-Mirror

    Fast and Universal 3D reconstruction model for versatile tasks

    ...The project sits within a broader family of Hunyuan models that explore world generation and 3D-consistent understanding, and this mirror variant makes the reconstruction stack easier to test. It’s attractive for rapid prototyping of scenes, environment scans, or reference assets when you need repeatable 3D results from ordinary media.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 18
    notebooklm-py

    notebooklm-py

    Unofficial Python API and agentic skill for Google NotebookLM

    ...The project covers notebook management, source ingestion, conversational querying, research workflows, and sharing controls, while also enabling the generation of a wide range of study and media artifacts. These outputs include audio overviews, videos, slide decks, infographics, quizzes, flashcards, reports, data tables, and mind maps, with configurable formats and export options.
    Downloads: 15 This Week
    Last Update:
    See Project
  • 19
    /last30days

    /last30days

    Claude Code skill that researches any topic across Reddit + X

    /last30days is a specialized Claude Code skill designed to research current trends and practices across Reddit, X, and the wider web from the last 30 days, synthesize that data, and produce copy-paste-ready prompts or summaries that reflect what the community is actually talking about now. Rather than returning generic model responses, it intelligently analyzes social media and community discussions to identify what’s genuinely trending or working in practice across topics ranging from prompt techniques to tool usage or cultural trends. This makes it particularly useful for prompt engineers, content creators, and developers who want up-to-date prompts and insights that align with the most recent consensus and shared best practices in fast-moving fields like AI tooling.
    Downloads: 12 This Week
    Last Update:
    See Project
  • 20
    NarratoAI

    NarratoAI

    Using AI models to automatically provide commentary and edit videos

    NarratoAI is an open-source platform designed to automate the generation of narrative content using artificial intelligence. The system combines large language models with media processing capabilities to create scripts, stories, and structured narrative outputs from user inputs. NarratoAI supports workflows where users provide prompts, themes, or source materials, and the software organizes them into coherent narrative structures suitable for articles, scripts, or multimedia storytelling. The project integrates multiple AI components such as text generation models, content structuring pipelines, and automated editing tools to streamline content creation. ...
    Downloads: 11 This Week
    Last Update:
    See Project
  • 21
    SD.Next

    SD.Next

    All-in-one WebUI for AI generative image and video creation

    SD.Next is an all-in-one web user interface for generative image creation that expands beyond basic Stable Diffusion workflows to cover broader image and video generation, captioning, and processing tasks. It is designed as a power-user environment where model management, generation features, and workflow controls are centralized in a single UI rather than spread across separate scripts and utilities. The project emphasizes broad model support and includes mechanisms for discovering,...
    Downloads: 13 This Week
    Last Update:
    See Project
  • 22
    Open Interpreter

    Open Interpreter

    A natural language interface for computers

    Open Interpreter is an open-source tool that provides a natural-language interface for interacting with your computer. It lets large language models (LLMs) run code locally (Python, JavaScript, shell, etc.), enabling you to ask your computer to do tasks like data analysis, file manipulation, browsing, etc. in human terms (“chat with your computer”), with safeguards. Runs locally or via configured remote LLM servers/inference backends, giving flexibility to use models you trust or have...
    Downloads: 28 This Week
    Last Update:
    See Project
  • 23
    cognee

    cognee

    Deterministic LLMs Outputs for AI Applications and AI Agents

    ...Cognee acts a semantic memory layer, unveiling hidden connections within your data and infusing it with your company's language and principles. This self-optimizing process ensures ultra-relevant, personalized, and contextually aware LLM retrievals. Any kind of data works; unstructured text or raw media files, PDFs, tables, presentations, JSON files, and so many more. Add small or large files, or many files at once. We map out a knowledge graph from all the facts and relationships we extract from your data. Then, we establish graph topology and connect related knowledge clusters, enabling the LLM to "understand" the data.
    Downloads: 10 This Week
    Last Update:
    See Project
  • 24
    AI YouTube Shorts Generator

    AI YouTube Shorts Generator

    A python tool that uses GPT-4, FFmpeg, and OpenCV

    AI-YouTube-Shorts-Generator is a Python-based tool that automates the creation of short-form vertical video clips (“shorts”) from longer source videos — ideal for adapting content for platforms like YouTube Shorts, Instagram Reels, or TikTok. It analyzes input video (whether a local file or a YouTube URL), transcribes audio (with optional GPU-accelerated speech-to-text), uses an AI model to identify the most compelling or engaging segments, and then crops/resizes the video and applies...
    Downloads: 18 This Week
    Last Update:
    See Project
  • 25
    AskUI Vision Agent

    AskUI Vision Agent

    Enable AI to control your desktop, mobile and HMI devices

    ...It is designed for multi-platform compatibility and supports multiple AI models so you can tailor perception and decision-making to your workload. The repository presents a feature overview, sample media, and frequent release notes, which show ongoing improvements such as CORS checks and other operational tweaks. The broader AskUI documentation covers the Python Vision Agent along with suite services and inference APIs, indicating a productized ecosystem rather than a single library. Community-curated lists also recognize Vision Agent as part of the broader “GUI agents” landscape, placing it among other computer-use agents.
    Downloads: 10 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • Next
MongoDB Logo MongoDB