Join/Login
Business Software
Open Source Software
For Vendors
Blog
About
More

For Vendors Help Create Join Login

Business Software

Open Source Software

SourceForge Podcast

Resources

Articles
Case Studies
Blog

Menu

Help
Create
Join
Login

Home
Open Source Software
Search Results

Search Results for "video to text" - Page 2

x

Sort By:

Relevance

Clear All Filters

OS

Windows 97
Linux 90
Mac 81
More...
BSD 37
ChromeOS 34
Mobile Operating Systems 5

Category

Artificial Intelligence 75
Multimedia 26
Software Development 8
System 7
Communications 6
Text Editors 6
Business 4
Formats and Protocols 2
Games 2
Productivity 2
Security 2
Database 1
Education 1
Internet 1
Scientific/Engineering 1

License

OSI-Approved Open Source 91

Translations

Programming Language

Python 106
JavaScript 5
ActionScript 2
C++ 2
TypeScript 2
More...
Unix Shell 2
Assembly 1
BASIC 1
C 1
C# 1
Java 1
PowerShell 1
Zope 1

Status

Production/Stable 16
Beta 8
Pre-Alpha 4
Mature 1

Showing 106 open source projects for "video to text"

View related business solutions

Python Clear Filters & Widen Search

AI-powered service management for IT and enterprise teams
Enterprise-grade ITSM, for every business

Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity. Maximize operational efficiency with refreshingly simple, AI-powered Freshservice.

Try it Free
Stop Storing Third-Party Tokens in Your Database
Auth0 Token Vault handles secure token storage, exchange, and refresh for external providers so you don't have to build it yourself.

Rolling your own OAuth token storage can be a security liability. Token Vault securely stores access and refresh tokens from federated providers and handles exchange and renewal automatically. Connected accounts, refresh exchange, and privileged worker flows included.

Try Auth0 for Free
1

VideoCaptioner

AI-powered tool for generating, optimizing, and translating subtitles

VideoCaptioner is an open source AI-powered subtitle processing tool designed to simplify the workflow of creating subtitles for videos. It integrates speech recognition, language processing, and translation technologies to automatically generate and refine subtitles from video or audio sources. VideoCaptioner uses speech-to-text engines such as Whisper variants to transcribe spoken content and convert it into subtitle text with accurate timestamps. After transcription, large language models are used to intelligently restructure subtitles into natural sentences, correct wording, and improve readability for viewers. ...

Downloads: 10 This Week

Last Update: 2026-03-28
See Project
2

AI YouTube Shorts Generator

A python tool that uses GPT-4, FFmpeg, and OpenCV

AI-YouTube-Shorts-Generator is a Python-based tool that automates the creation of short-form vertical video clips (“shorts”) from longer source videos — ideal for adapting content for platforms like YouTube Shorts, Instagram Reels, or TikTok. It analyzes input video (whether a local file or a YouTube URL), transcribes audio (with optional GPU-accelerated speech-to-text), uses an AI model to identify the most compelling or engaging segments, and then crops/resizes the video and applies subtitle overlays, producing a polished short video without manual editing. ...

Downloads: 12 This Week

Last Update: 4 days ago
See Project
3

ComfyUI-WanVideoWrapper

ComfyUI wrapper nodes for WanVideo and related models

The ComfyUI-WanVideoWrapper project is a custom node extension for ComfyUI that enables advanced video generation workflows using WanVideo diffusion models. It acts as a standalone wrapper layer that allows developers and creators to integrate experimental features and models without modifying the core ComfyUI codebase. This design makes it easier to rapidly test new capabilities such as text-to-video and image-to-video generation while avoiding compatibility issues with the main framework. ...

Downloads: 1 This Week

Last Update: 2026-04-16
See Project
4

HY-World 2.0

A Multi-Modal World Model for Reconstructing, Generating, Simulation

HY-World 2.0 is a multi-modal world model framework for reconstructing, generating, and simulating navigable 3D worlds from diverse inputs. It accepts text prompts, single-view images, multi-view images, and videos, and produces 3D world representations rather than limiting output to flat video generation. For text and single-image inputs, it generates high-fidelity 3D Gaussian Splatting scenes through a multi-stage pipeline that includes panorama generation, trajectory planning, world expansion, and world composition. ...

Downloads: 13 This Week

Last Update: 2026-04-24
See Project
MongoDB Atlas runs apps anywhere
Deploy in 115+ regions with the modern database for every enterprise.

MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.

Start Free
5

NExT-GPT

Code and models for ICML 2024 paper, NExT-GPT

NExT-GPT is an open-source research framework that implements an advanced multimodal large language model capable of understanding and generating content across multiple modalities. Unlike traditional models that primarily handle text, NExT-GPT supports input and output combinations involving text, images, video, and audio in a unified architecture. The system connects a large language model with multimodal encoders and diffusion-based decoders so it can interpret information from different sensory formats and generate responses in different media types. This architecture allows the model to convert between modalities, such as generating images from text descriptions or producing audio or video outputs based on textual prompts. ...

Downloads: 0 This Week

Last Update: 2026-03-05
See Project
6

HunyuanVideo

HunyuanVideo: A Systematic Framework For Large Video Generation Model

HunyuanVideo is a cutting-edge framework designed for large-scale video generation, leveraging advanced AI techniques to synthesize videos from various inputs. It is implemented in PyTorch, providing pre-trained model weights and inference code for efficient deployment. The framework aims to push the boundaries of video generation quality, incorporating multiple innovative approaches to improve the realism and coherence of the generated content. Release of FP8 model weights to reduce GPU...

1 Review

Downloads: 0 This Week

Last Update: 2025-09-23
See Project
7

Pipecat

Framework for building real-time voice and multimodal AI agents

...Developers can create a wide range of interactive systems including voice assistants, customer service agents, interactive storytelling applications, and multimodal interfaces that combine voice, video, images, and text. Its modular architecture allows components to be composed into pipelines that process audio, text, and video streams in real time.

Downloads: 1 This Week

Last Update: 6 days ago
See Project
8

stt

Voice Recognition to Text Tool

stt is a standalone speech recognition tool that locally converts spoken content in audio or video files into textual formats without requiring internet access, giving users control over their data and reducing reliance on external APIs. It leverages open-source speech models such as Faster-Whisper to recognize and transcribe human speech into plain text, structured JSON objects, or subtitle files with time codes, making it suitable for both personal and professional transcription tasks. ...

Downloads: 0 This Week

Last Update: 2026-02-17
See Project
9

Zulip

Powerful open source team chat application

Zulip is a powerful open source group chat application that combines the immediacy of real-time chat with the productivity benefits of a threaded conversation model. Zulip’s unique threading model allows users to easily catch up on important conversations, helping to save time and increase productivity.

Downloads: 4 This Week

Last Update: 6 days ago
See Project
Go From AI Idea to AI App Fast
One platform to build, fine-tune, and deploy ML models. No MLOps team required.

Access Gemini 3 and 200+ models. Build chatbots, agents, or custom models with built-in monitoring and scaling.

Try Free
10

yt-fts

Search all of YouTube from the command line

yt-fts, short for YouTube Full Text Search, is an open-source command-line tool that enables users to search the spoken content of YouTube videos by indexing their subtitles. The program automatically downloads subtitles from a specified YouTube channel using the yt-dlp utility and stores them in a local SQLite database. Once indexed, users can perform full-text searches across all transcripts to quickly locate keywords or phrases mentioned within the videos.

Downloads: 2 This Week

Last Update: 2026-03-06
See Project
11

SAM 3

Code for running inference and finetuning with SAM 3 model

SAM 3 (Segment Anything Model 3) is a unified foundation model for promptable segmentation in both images and videos, capable of detecting, segmenting, and tracking objects. It accepts both text prompts (open-vocabulary concepts like “red car” or “goalkeeper in white”) and visual prompts (points, boxes, masks) and returns high-quality masks, boxes, and scores for the requested concepts. Compared with SAM 2, SAM 3 introduces the ability to exhaustively segment all instances of an...

Downloads: 33 This Week

Last Update: 6 days ago
See Project
12

Qwen3-VL-Embedding

Multimodal embedding and reranking models built on Qwen3-VL

...Together, they support advanced information retrieval workflows such as image-text search, visual question answering (VQA), and video-text matching, while providing out-of-the-box support for more than 30 languages.

Downloads: 0 This Week

Last Update: 2026-04-08
See Project
13

HunyuanVideo-Foley

Multimodal Diffusion with Representation Alignment

HunyuanVideo-Foley is a multimodal diffusion model from Tencent Hunyuan for high-fidelity Foley (sound effects) audio generation synchronized to video scenes. It is designed to generate audio that matches both visual content and textual semantic cues, for use in video production, film, advertising, games, etc. The model architecture aligns audio, video, and text representations to produce realistic synchronized soundtracks. Produces high-quality 48 kHz audio output suitable for professional use. ...

Downloads: 2 This Week

Last Update: 2025-09-28
See Project
14

Label Studio

Label Studio is a multi-type data labeling and annotation tool

...Configurable label formats let you customize the visual interface to meet your specific labeling needs. Support for multiple data types including images, audio, text, HTML, time-series, and video.

Downloads: 19 This Week

Last Update: 2026-03-13
See Project
15

Qwen-2.5-VL

Qwen2.5-VL is the multimodal large language model series

Qwen2.5 is a series of large language models developed by the Qwen team at Alibaba Cloud, designed to enhance natural language understanding and generation across multiple languages. The models are available in various sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B parameters, catering to diverse computational requirements. Trained on a comprehensive dataset of up to 18 trillion tokens, Qwen2.5 models exhibit significant improvements in instruction following, long-text generation...

Downloads: 13 This Week

Last Update: 2026-01-30
See Project
16

ShortGPT

AI framework for automated short video creation and editing tools

ShortGPT is an experimental AI-powered framework designed to automate the creation of short-form and long-form video content. It provides a structured system that handles multiple stages of the content creation workflow, including script generation, asset sourcing, voiceover synthesis, and video editing. ShortGPT uses large language models to generate scripts and prompts that guide the automated editing and production process. ShortGPT includes specialized content engines that manage...

Downloads: 0 This Week

Last Update: 2026-03-13
See Project
17

edge-tts

Use Microsoft Edge's online text-to-speech service from Python

...It also supports generating subtitle files (such as SRT or VTT) alongside the speech, which is handy for video narration, e-learning, or accessibility workflows. From the CLI you can adjust parameters such as speaking rate, volume, and pitch, giving you some control over prosody without diving into SSML. The library is asynchronous under the hood, which makes it efficient for batch jobs or web services that need to synthesize many utterances concurrently.

Downloads: 21 This Week

Last Update: 2026-03-22
See Project
18

MiniCPM-o

A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming

...It supports both text and audio inputs to generate outputs in various forms, including voice cloning, emotion control, and interactive role-playing.

Downloads: 0 This Week

Last Update: 2025-05-15
See Project
19

BettaFish

Public opinion analysis system

...Unlike simpler analytics tools, BettaFish employs agent collaboration and a “forum” style internal mechanism to combine diverse model outputs, making the analysis richer and more robust. It also integrates multimodal processing, enabling it to parse images and video alongside text.

Downloads: 1 This Week

Last Update: 2026-02-17
See Project
20

WhisperJAV

Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD

...WhisperJAV introduces a specialized pipeline that separates text generation from timestamp alignment, allowing the system to generate transcripts and then align them with audio using forced alignment techniques. The framework supports several speech recognition models, including Qwen-based ASR systems and fine-tuned Whisper models trained on domain-specific dialogue.

Downloads: 22 This Week

Last Update: 3 days ago
See Project
21

Auto Synced & Translated Dubs

Automatically translates the text of a video based on a subtitle file

Auto-Synced-Translated-Dubs is a toolchain that automatically translates and re-dubs videos using AI voices while keeping the new speech aligned to the original timing via subtitle files. It assumes you have a human-made SRT (or similar) subtitle file; the script then uses translation services such as Google Cloud or DeepL to generate translated subtitle tracks in one or more target languages. Using the timestamps of each subtitle line, it computes the required duration of each spoken...

Downloads: 3 This Week

Last Update: 2025-11-28
See Project
22

HunyuanWorld 1.0

Generating Immersive, Explorable, and Interactive 3D Worlds

HunyuanWorld-1.0 is an open-source, simulation-capable 3D world generation model developed by Tencent Hunyuan that creates immersive, explorable, and interactive 3D environments from text or image inputs. It combines the strengths of video-based diversity and 3D-based geometric consistency through a novel framework using panoramic world proxies and semantically layered 3D mesh representations. This approach enables 360° immersive experiences, seamless mesh export for graphics pipelines, and disentangled object representations for enhanced interactivity. ...

Downloads: 3 This Week

Last Update: 2026-04-15
See Project
23

Whisper-WebUI

A Web UI for easy subtitle using whisper model

Whisper WebUI is an open-source browser-based interface that simplifies the use of Whisper speech recognition models by providing an intuitive graphical environment for transcription, translation, and subtitle generation. Built with Gradio, it allows users to upload audio or video files, process them locally, and generate accurate text outputs without relying on command-line tools. The platform integrates optimized implementations such as faster-whisper, significantly improving transcription speed and reducing memory usage compared to standard models. It supports multiple input sources including local files, YouTube content, and microphone input, making it versatile for different workflows. ...

Downloads: 9 This Week

Last Update: 2026-03-18
See Project
24

Windrecorder

Windrecorder is a memory search app by records everything

Windrecorder is an open-source personal memory search engine that continuously records on-screen activity in a highly optimized and storage-efficient format. It captures screen content locally and builds a searchable database using OCR and image understanding, allowing users to rewind and rediscover anything they have previously seen. The system indexes only meaningful visual changes, extracting text, browser data, and contextual information to improve search accuracy and reduce storage...

Downloads: 5 This Week

Last Update: 2026-04-24
See Project
25

Generative AI for Beginners (Version 3)

21 Lessons, Get Started Building with Generative AI

...Lessons are split into “Learn” modules for core concepts and “Build” modules with hands-on code in Python and TypeScript, so you can jump in at any point that matches your goals. The course covers everything from model selection, prompt engineering, and chat/text/image app patterns to secure development practices and UX for AI. It also walks through modern application techniques such as function calling, RAG with vector databases, working with open source models, agents, fine-tuning, and using SLMs. Each lesson includes a short video, a written guide, runnable samples for Azure OpenAI, the GitHub Marketplace Model Catalog, and the OpenAI API, plus a “Keep Learning” section for deeper study.

Downloads: 9 This Week

Last Update: 1 day ago
See Project

Previous
1
You're on page 2
3
4
5
Next

Related Searches

tts

youtube

mega-voice

srt to speech

multilingual language

subtitle

ai video generator

youtube videos

transcribe audio to srt

speech

Related Categories

Artificial Intelligence

Multimedia

Software Development

System

Communications

SourceForge

Create a Project
Open Source Software
Business Software
Top Downloaded Projects

Company

About
Team
SourceForge Headquarters
1320 Columbia Street Suite 310
San Diego, CA 92101
+1 (858) 422-6466

Resources

Support
Site Documentation
Site Status
SourceForge Reviews

© 2026 Slashdot Media. All Rights Reserved.

Terms Privacy Opt Out Advertise