Showing 484 open source projects for "video-subtitle-extract"

View related business solutions
  • Full-stack observability with actually useful AI | Grafana Cloud Icon
    Full-stack observability with actually useful AI | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • $300 in Free Credit Towards Top Cloud Services Icon
    $300 in Free Credit Towards Top Cloud Services

    Build VMs, containers, AI, databases, storage—all in one place.

    Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.
    Get Started
  • 1
    Translate-Subtitle-File

    Translate-Subtitle-File

    Subtitle Creation Assistant

    Subtitle group machine translation assistant - [Function 1: Translate subtitle file] .srt .ass .vtt [Function 2: Voice to text] (Drag in video or audio to recognize subtitles) (The latest version v4.1.0 Update time 2021 2 May 23) 12 translation service providers can be configured, such as Google, Baidu, Tencent, Caiyun, IBM, Azure, Amazon, etc. (6 voice service providers can be configured: Alibaba Cloud, Xunfei, Tencent Cloud, IBM, Azure, Amazon ) Advantages: 1. ...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 2
    LLPlayer

    LLPlayer

    The media player for language learning, with dual subtitles

    LLPlayer is an open-source media player designed specifically for language learning through video content. Unlike traditional media players, the application focuses on advanced subtitle-related features that help learners understand and interact with foreign language media more effectively. The player supports dual subtitles so users can simultaneously view text in both the original language and their native language while watching videos.
    Downloads: 30 This Week
    Last Update:
    See Project
  • 3
    AI YouTube Shorts Generator

    AI YouTube Shorts Generator

    A python tool that uses GPT-4, FFmpeg, and OpenCV

    AI-YouTube-Shorts-Generator is a Python-based tool that automates the creation of short-form vertical video clips (“shorts”) from longer source videos — ideal for adapting content for platforms like YouTube Shorts, Instagram Reels, or TikTok. It analyzes input video (whether a local file or a YouTube URL), transcribes audio (with optional GPU-accelerated speech-to-text), uses an AI model to identify the most compelling or engaging segments, and then crops/resizes the video and applies subtitle overlays, producing a polished short video without manual editing. ...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 4
    VideoCaptioner

    VideoCaptioner

    AI-powered tool for generating, optimizing, and translating subtitles

    VideoCaptioner is an open source AI-powered subtitle processing tool designed to simplify the workflow of creating subtitles for videos. It integrates speech recognition, language processing, and translation technologies to automatically generate and refine subtitles from video or audio sources. VideoCaptioner uses speech-to-text engines such as Whisper variants to transcribe spoken content and convert it into subtitle text with accurate timestamps.
    Downloads: 13 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 5
    Auto Synced & Translated Dubs

    Auto Synced & Translated Dubs

    Automatically translates the text of a video based on a subtitle file

    ...The tool then time-stretches or compresses each TTS clip to match the original speech duration exactly, which preserves lip-sync and rhythm as closely as possible without manual editing. Finally, it combines all the clips into a single dubbed audio track that can be muxed with the original video, along with new translated subtitle files.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 6
    text-extract-api

    text-extract-api

    Document (PDF, Word, PPTX ...) extraction and parse API

    text-extract-api is an open-source service designed to extract readable text from a wide variety of document formats through a simple API interface. The project focuses on converting complex files such as PDFs, images, scanned documents, and office files into structured plain text that can be processed by downstream applications or language models.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 7
    KrillinAI

    KrillinAI

    Video translation and dubbing tool powered by LLMs

    KrillinAI is an end-to-end content localization, translation, and dubbing tool aimed at helping creators transform videos into multiple languages with minimal manual effort. It integrates several stages of the pipeline: video acquisition (either from local files or remote via download tools), speech recognition (ASR), subtitle segmentation and alignment, machine translation (with context-aware translation to preserve semantics), and voice cloning + text-to-speech (TTS) to produce dubbed audio tracks. KrillinAI supports both landscape and portrait videos, which makes it suitable for a wide range of platforms — from YouTube to TikTok or other vertical-video sites — and ensures correct formatting and layout for the final video.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 8
    LangChain Extract

    LangChain Extract

    Did you say you like data?

    LangChain Extract is an open-source reference application designed to demonstrate how large language models can be used to extract structured data from unstructured text and document files. The project implements a lightweight web service that allows developers to define extraction schemas and apply them to various sources such as plain text, HTML, or PDF documents.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Short Video Factory

    Short Video Factory

    AI tool for automatic batch short video creation and editing

    Short Video Factory is an open source desktop application designed to simplify the creation of short-form videos using AI-driven automation. It enables users to generate product marketing clips and general content videos by combining simple prompt-based input with pre-prepared media assets. Short Video Factory integrates multiple stages of video production, including script generation, voice synthesis, video editing, and subtitle effects, into a single streamlined workflow. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • Go From AI Idea to AI App Fast Icon
    Go From AI Idea to AI App Fast

    One platform to build, fine-tune, and deploy ML models. No MLOps team required.

    Access Gemini 3 and 200+ models. Build chatbots, agents, or custom models with built-in monitoring and scaling.
    Try Free
  • 10
    LTX-Video

    LTX-Video

    Official repository for LTX-Video

    LTX-Video is a sophisticated multimedia processing framework from Lightricks designed to handle high-quality video editing, compositing, and transformation tasks with performance and scalability. It provides runtime components that efficiently decode, encode, and manipulate video streams, frame buffers, and audio tracks while exposing a rich API for building customized editing features like transitions, effects, color grading, and keyframe automation.
    Downloads: 16 This Week
    Last Update:
    See Project
  • 11
    MoneyPrinterTurbo

    MoneyPrinterTurbo

    Generate short videos with one click using AI LLM

    MoneyPrinterTurbo is an AI-driven tool that enables users to generate high-definition short videos with minimal input. By providing a topic or keyword, the system automatically creates video scripts, sources relevant media assets, adds subtitles, and incorporates background music, resulting in a polished video ready for distribution.
    Downloads: 23 This Week
    Last Update:
    See Project
  • 12
    AutoSubs

    AutoSubs

    Instantly generate AI-powered subtitles on your device

    AutoSubs is an open-source, AI-powered subtitle generation tool that enables users to automatically transcribe audio and video content into accurate, editable subtitles directly on their device. It supports both standalone usage and integration with professional video editing software such as DaVinci Resolve, allowing creators to generate and edit subtitles within their existing workflows.
    Downloads: 15 This Week
    Last Update:
    See Project
  • 13
    AI-Media2Doc

    AI-Media2Doc

    AI tool converting video/audio into structured documents instantly

    AI-Media2Doc is a web-based application that uses large language models to convert video and audio content into structured, readable documents in a single workflow. It is designed to transform multimedia inputs into formats such as knowledge notes, summaries, mind maps, and social-style articles, making content easier to review and reuse. AI-Media2Doc emphasizes privacy by processing media locally in the browser using WebAssembly-based ffmpeg, ensuring that original video files are not uploaded externally. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 14
    stt

    stt

    Voice Recognition to Text Tool

    stt is a standalone speech recognition tool that locally converts spoken content in audio or video files into textual formats without requiring internet access, giving users control over their data and reducing reliance on external APIs. It leverages open-source speech models such as Faster-Whisper to recognize and transcribe human speech into plain text, structured JSON objects, or subtitle files with time codes, making it suitable for both personal and professional transcription tasks. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 15
    Whisper-WebUI

    Whisper-WebUI

    A Web UI for easy subtitle using whisper model

    Whisper WebUI is an open-source browser-based interface that simplifies the use of Whisper speech recognition models by providing an intuitive graphical environment for transcription, translation, and subtitle generation. Built with Gradio, it allows users to upload audio or video files, process them locally, and generate accurate text outputs without relying on command-line tools. The platform integrates optimized implementations such as faster-whisper, significantly improving transcription speed and reducing memory usage compared to standard models. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 16
    SoniTranslate

    SoniTranslate

    Synchronized Translation for Videos

    SoniTranslate is a video translation and dubbing system that produces synchronized target-language audio tracks for existing video content. It provides a web UI built with Gradio, allowing users to upload a video, choose source and target languages, and then run a pipeline that handles transcription, translation and re-synthesis of speech. Under the hood, it uses advanced speech and diarization models to separate speakers, align audio with timecodes and respect subtitle timing, which lets the generated dub track stay in sync with the original video structure. ...
    Downloads: 18 This Week
    Last Update:
    See Project
  • 17
    TikTok MCP

    TikTok MCP

    Model Context Protocol (MCP) with TikTok integration

    The TikTok MCP integrates TikTok access into AI applications like Claude AI via TikNeuron. It enables analysis and interaction with TikTok content to determine virality factors and extract video content. ​
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Medeo Video Generator

    Medeo Video Generator

    AI-powered video generation skill for OpenClaw

    Medeo Video Generator is an AI-driven project designed to enable advanced video processing and generation capabilities within agent-based or automation systems. It provides a “skill” module that can be integrated into AI agents, allowing them to create, edit, and manipulate video content programmatically. The project focuses on bridging the gap between language-based AI systems and multimedia outputs by enabling models to produce structured video content as part of their workflows. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 19
    Step-Video-T2V

    Step-Video-T2V

    State-of-the-art (SoTA) text-to-video pre-trained model

    Step-Video-T2V is a state-of-the-art text-to-video foundation model developed to generate videos from natural-language prompts; its 30B-parameter architecture is designed to produce coherent, temporally extended video sequences — up to around 204 frames — based on input text. Under the hood it uses a compressed latent representation (a Video-VAE) to reduce spatial and temporal redundancy, and a denoising diffusion (or similar) process over that latent space to generate smooth, plausible motion and visuals. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 20
    Voice-Pro

    Voice-Pro

    Comprehensive Gradio WebUI for audio processing

    Voice-Pro is the best gradio WebUI for transcription, translation and text-to-speech. It can be easily installed with one click. Create a virtual environment using Miniconda, running completely separate from the Windows system (fully portable). Supports real-time transcription and translation, as well as batch mode.
    Downloads: 25 This Week
    Last Update:
    See Project
  • 21
    Video Diffusion - Pytorch

    Video Diffusion - Pytorch

    Implementation of Video Diffusion Models

    Implementation of Video Diffusion Models, Jonathan Ho's new paper extending DDPMs to Video Generation - in Pytorch. Implementation of Video Diffusion Models, Jonathan Ho's new paper extending DDPMs to Video Generation - in Pytorch. It uses a special space-time factored U-net, extending generation from 2D images to 3D videos. 14k for difficult moving mnist (converging much faster and better than NUWA) - wip.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    EasyVoice

    EasyVoice

    Open source text-to-speech tool, supports extra-long text

    ...The system supports multi-role voice acting, letting users assign different neural voices to different characters or narrative roles and configure parameters such as rate, pitch, and volume per role. It offers streaming playback so audio starts almost immediately, even for very long inputs, and automatically generates subtitle files suitable for video production or translation workflows. Under the hood, easyVoice uses a modern stack with Vue 3 and Element Plus on the front end, Node.js and Express on the back end, and TTS engines such as Microsoft Azure TTS and OpenAI-compatible APIs, orchestrated through ffmpeg.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 23
    Make-A-Video - Pytorch (wip)

    Make-A-Video - Pytorch (wip)

    Implementation of Make-A-Video, new SOTA text to video generator

    Implementation of Make-A-Video, new SOTA text to video generator from Meta AI, in Pytorch. They combine pseudo-3d convolutions (axial convolutions) and temporal attention and show much better temporal fusion. The pseudo-3d convolutions isn't a new concept. It has been explored before in other contexts, say for protein contact prediction as "dimensional hybrid residual networks".
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    xgplayer

    xgplayer

    A HTML5 video player with a parser that saves traffic

    xgplayer is a web-friendly, open-source media player library maintained by ByteDance, designed for playing audio/video streams in browsers or web applications with robust control, flexibility, and extensibility. It abstracts many of the lower-level complexities of HTML5 media, providing a consistent API for playback control, custom UI overlays, adaptive streaming, plugin hooks, and cross-browser compatibility. Because of its emphasis on modularity and extensibility, xgplayer can be embedded into modern web projects and customized — developers can add controls, custom buffering strategies, subtitle handling, adaptive bitrate streaming, or integrate with other web-based video infrastructures. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    edge-tts

    edge-tts

    Use Microsoft Edge's online text-to-speech service from Python

    ...It wraps the same cloud voices used by Edge, exposing them through a simple CLI (edge-tts, edge-playback) and a Python API, so you can script high-quality speech generation in your own applications. The tool lets you list available voices, specify locale and voice name, and generate audio files in common formats like MP3 or WAV. It also supports generating subtitle files (such as SRT or VTT) alongside the speech, which is handy for video narration, e-learning, or accessibility workflows. From the CLI you can adjust parameters such as speaking rate, volume, and pitch, giving you some control over prosody without diving into SSML. The library is asynchronous under the hood, which makes it efficient for batch jobs or web services that need to synthesize many utterances concurrently.
    Downloads: 19 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next