Search Results for "video to text" - Page 3

Showing 106 open source projects for "video to text"

1

OmAgent

Build multimodal language agents for fast prototype and production

OmAgent is an open-source Python framework designed to simplify the development of multimodal language agents that can reason, plan, and interact with different types of data sources. The framework provides abstractions and infrastructure for building AI agents that operate on text, images, video, and audio while maintaining a relatively simple interface for developers. Instead of forcing developers to implement complex orchestration logic manually, the system manages task scheduling, worker coordination, and node optimization behind the scenes. Its architecture uses a graph-based workflow engine where tasks are represented as nodes in a directed workflow, enabling modular composition of complex reasoning pipelines. ...

Downloads: 0 This Week

Last Update: 2026-03-05
See Project
2

HunyuanOCR

OCR expert VLM powered by Hunyuan's native multimodal architecture

HunyuanOCR is an open-source, end-to-end OCR (optical character recognition) Vision-Language Model (VLM) developed by Tencent‑Hunyuan. It is designed to unify the entire OCR pipeline (detection, recognition, layout parsing, information extraction, translation, and even subtitle or structured output generation) into a single model inference instead of a cascade of separate tools. Despite being fairly lightweight (about 1 billion parameters), it delivers state-of-the-art performance across a...

Downloads: 0 This Week

Last Update: 2026-04-08
See Project
3

InternVL

A Pioneering Open-Source Alternative to GPT-4o

InternVL is a large-scale multimodal foundation model designed to integrate computer vision and language understanding within a unified architecture. The project focuses on scaling vision models and aligning them with large language models so that they can perform tasks involving both visual and textual information. InternVL is trained on massive collections of image-text data, enabling it to learn representations that capture both visual patterns and semantic meaning. The model supports a...

Downloads: 0 This Week

Last Update: 2026-03-04
See Project
4

AudioNotes

Extract audio and video content and organize it into a Markdown note

AudioNotes is an application (or proof-of-concept) that likely combines audio recording or playback with note-taking or annotation functionality — enabling users to record voice or audio and attach textual or timestamped notes, making it ideal for lectures, interviews, meetings, or personal memos. Such a tool offers a more expressive and flexible way to capture and revisit information: instead of just typed notes or raw audio, users get both audio context and structured notes. As an...

Downloads: 0 This Week

Last Update: 2025-12-04
See Project
5

The Arcade Library

Easy to use Python library for creating 2D arcade games

Arcade is an easy-to-use Python library for creating 2D video games. It provides a modern and straightforward API, enabling developers to craft engaging games and graphical applications efficiently. Arcade supports rendering shapes, handling user input, and managing game physics, making it suitable for both beginners and experienced developers.

Downloads: 6 This Week

Last Update: 2025-10-09
See Project
6

GLM-4.6V

GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

GLM-4.6V represents the latest generation of the GLM-V family and marks a major step forward in multimodal AI by combining advanced vision-language understanding with native “tool-call” capabilities, long-context reasoning, and strong generalization across domains. Unlike many vision-language models that treat images and text separately or require intermediate conversions, GLM-4.6V allows inputs such as images, screenshots or document pages directly as part of its reasoning pipeline — and can output or act via tools seamlessly, bridging perception and execution. Its architecture supports a very large context window (on the order of 128K tokens during training), which lets it handle complex multimodal inputs like long documents, multi-page reports, or video transcripts, while maintaining coherence across extended content. ...

Downloads: 0 This Week

Last Update: 2026-04-06
See Project
7

DocArray

The data structure for multimodal data

DocArray is a library for nested, unstructured, multimodal data in transit, including text, image, audio, video, 3D mesh, etc. It allows deep-learning engineers to efficiently process, embed, search, recommend, store, and transfer multimodal data with a Pythonic API. Door to multimodal world: super-expressive data structure for representing complicated/mixed/nested text, image, video, audio, 3D mesh data. The foundation data structure of Jina, CLIP-as-service, DALL·E Flow, DiscoArt etc. ...

Downloads: 0 This Week

Last Update: 2025-03-21
See Project
8

myGPTReader

AI Slack bot for reading, summarizing, and chatting with content

myGPTReader is an AI-powered Slack bot designed to help users read, summarize, and interact with various types of digital content through conversational interfaces. It enables users to quickly understand web pages, documents, and even video content by transforming them into interactive discussions rather than static reading experiences. myGPTReader supports a wide range of file formats, including eBooks, PDFs, and text-based documents, making it flexible for both casual and professional use cases. It also integrates voice interaction capabilities, allowing users to communicate with the system verbally and even use it as a language practice assistant. ...

Downloads: 1 This Week

Last Update: 4 days ago
See Project
9

Milvus Bootcamp

Dealing with all unstructured data, such as reverse image search

Milvus Bootcamp is a collection of tutorials, examples, and best practices for using Milvus, an open-source vector database designed for AI-powered similarity search and retrieval applications.

Downloads: 0 This Week

Last Update: 2025-05-22
See Project
10

Generative AI for Beginners (Version 3)

21 Lessons, Get Started Building with Generative AI

...Lessons are split into “Learn” modules for core concepts and “Build” modules with hands-on code in Python and TypeScript, so you can jump in at any point that matches your goals. The course covers everything from model selection, prompt engineering, and chat/text/image app patterns to secure development practices and UX for AI. It also walks through modern application techniques such as function calling, RAG with vector databases, working with open source models, agents, fine-tuning, and using SLMs. Each lesson includes a short video, a written guide, runnable samples for Azure OpenAI, the GitHub Marketplace Model Catalog, and the OpenAI API, plus a “Keep Learning” section for deeper study.

Downloads: 1 This Week

Last Update: 3 days ago
See Project
11

txtai

Build AI-powered semantic search applications

txtai executes machine-learning workflows to transform data and build AI-powered semantic search applications. Traditional search systems use keywords to find data. Semantic search applications have an understanding of natural language and identify results that have the same meaning, not necessarily the same keywords. Backed by state-of-the-art machine learning models, data is transformed into vector representations for search (also known as embeddings). Innovation is happening at a rapid...

Downloads: 2 This Week

Last Update: 6 days ago
See Project
12

InvokeAI

InvokeAI is a leading creative engine for Stable Diffusion models

...This fork is supported across Linux, Windows and Macintosh. Linux users can use either an Nvidia-based card (with CUDA support) or an AMD card (using the ROCm driver). We do not recommend the GTX 1650 or 1660 series video cards. They are unable to run in half-precision mode and do not have sufficient VRAM to render 512x512 images.

1 Review

Downloads: 15 This Week

Last Update: 2026-03-22
See Project
13

GLM-4.5V

GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

GLM-4.5V is the preceding iteration in the GLM-V series that laid much of the groundwork for general multimodal reasoning and vision-language understanding. It embodies the design philosophy of mixing visual and textual modalities into a unified model capable of general-purpose reasoning, content understanding, and generation, while already supporting a wide variety of tasks: from image captioning and visual question answering to content recognition, GUI-based agents, video understanding,...

Downloads: 0 This Week

Last Update: 2026-04-06
See Project
14

LLM Colosseum

Benchmark LLMs by fighting in Street Fighter 3

LLM-Colosseum is an experimental benchmarking framework designed to evaluate the capabilities of large language models through gameplay interactions rather than traditional text-based benchmarks. The system places language models inside the environment of the classic video game Street Fighter III, where they must interpret the game state and decide which actions to perform during combat. This setup creates a dynamic environment that tests reasoning, situational awareness, and decision-making abilities in real time. ...

Downloads: 0 This Week

Last Update: 2026-03-07
See Project
15

Jina

Build cross-modal and multimodal applications on the cloud

...Jina handles the infrastructure complexity, making advanced solution engineering and cloud-native technologies accessible to every developer. Build applications that deliver fresh insights from multiple data types such as text, image, audio, video, 3D mesh, PDF with Jina AI’s DocArray. Polyglot gateway that supports gRPC, Websockets, HTTP, GraphQL protocols with TLS. Intuitive design pattern for high-performance microservices. Seamless Docker container integration: sharing, exploring, sandboxing, versioning and dependency control via Jina Hub. Fast deployment to Kubernetes, Docker Compose and Jina Cloud. ...

Downloads: 0 This Week

Last Update: 2024-11-12
See Project
16

BWR Ai watermark remover

AI-powered tool to quickly remove watermarks from videos flawlessly

Blue Wave Remover is an advanced AI-driven video watermark removal software designed to effortlessly eliminate logos, text, timestamps, and watermarks from video content. Utilizing cutting-edge computer vision and generative AI algorithms, it accurately detects and removes both static and moving watermarks while preserving the original video's quality, colors, and clarity. The program supports popular video formats and offers batch processing for fast and efficient removal on multiple files. ...

1 Review

Downloads: 15 This Week

Last Update: 2025-10-29
See Project
17

SoundTranscriber

SoundTranscriber can be used to generate automatic transcription / aut

1 Review

Downloads: 2 This Week

Last Update: 2025-07-10
See Project
18

VideoCrafter2

Overcoming Data Limitations for High-Quality Video Diffusion Models

VideoCrafter is an open-source video generation and editing toolbox designed to create high-quality video content. It features models for both text-to-video and image-to-video generation. The system is optimized for generating videos from textual descriptions or still images, leveraging advanced diffusion models. VideoCrafter2, an upgraded version, improves on its predecessor by enhancing motion dynamics and concept combinations, especially in low-data scenarios. ...

1 Review

Downloads: 3 This Week

Last Update: 2025-03-06
See Project
19

xSTUDIO

xSTUDIO is a high performance playback and review tool.

xSTUDIO is a high performance playback and review tool designed by and for Visual Effects, Animation and Post Production professionals. The application can load and play large collections of media files. The efficient playback engine allows you to quickly load and play high resolution image formats with a wide range of file formats and encoding. Intuitive tools allow you to create and organise playlists and media sub-sets within playlists to build interactive review sessions, image and video...

Downloads: 19 This Week

Last Update: 2026-03-21
See Project
20

myplayer Free Karaoke Software

myplayer Free Karaoke & Media Player Software (Myanmar)

myplayer2k22 is a video player application for PC that lets you easily find and sing karaoke songs as well as find and watch movies. It is also compatible with the [myplayer remote] app for Android phones and tablets. If the karaoke device (PC) doesn't have the song you want to sing, you can sing along with a karaoke song file from your phone, which is convenient if your favorite song is stored there.

Downloads: 9 This Week

Last Update: 2024-12-19
See Project
21

Clipstitch

Utility to make home movies from your digital camera files

Full documentation: download clipstitchX.Y.html. Clipstitch makes movies from your camera (or phone) video files. FFmpeg is a professional-quality, free, open-source program for video editing, with the ability to perform a huge number of operations and handle every data format! This ability comes at a cost: its commands look quite complex and are difficult to use and remember. Clipstitch runs as a front-end to ffmpeg so that you use only the subset of ffmpeg commands necessary for making a home movie from your digital camera, puts them in an easier-to-read form, and internally combines multiple ffmpeg commands to do certain tasks. ...

Downloads: 0 This Week

Last Update: 2025-03-24
See Project
22

FranMsxApps

My MSX programs and some additional .cas tools

...There is a web page that you can visit: https://www.frojasg1.com:8443/downloads_web/web/html/aplisMSX.html?origin=sourceforge

Some of the programs are cool. The most relevant are:
- A graphic designer in Z-80 assembler
- A ship game in Z-80 assembler
- A text-to-speech program in Spanish
- The seed of a maze game
- My version of Tetris in Z-80 assembler

There are some demo videos, and even a video of a HiSoft assembler session. I hope you enjoy them.

Downloads: 0 This Week

Last Update: 2025-08-12
See Project
23

Ainee

Ainee - AI Notetaking and Learning Companion

Ainee is your ultimate AI-powered notetaking and learning companion. Capture lecture notes in real-time and effortlessly transform audio, text, files, and YouTube videos into formatted notes, mindmaps, quizzes, flashcards, podcasts, and more. Explore our AI meeting note taker, AI notes, video transcript generator, PDF to AI converter, and AI flashcard maker. Enhance your learning with our AI voice recorder, article summarizer AI, and AI quiz generator. Additionally, share your knowledge base with others to foster the flow of information and help new users benefit from collective insights. ...

1 Review

Downloads: 0 This Week

Last Update: 2025-05-23
See Project
24

garysfm

An advanced file manager with qss themes and iso and folder previews

garysfm, which stands for Gary's File Manager, is a file manager with some advanced features, including bulk renaming and folder image previews. It has rather advanced search functions and tabbed browsing with persistence between launches. It remembers your folder sorting and view options in icon view, as well as your active tabs between sessions. It shows a progress dialog during large operations, such as copying large files or folders with many files. python version works on...

Downloads: 3 This Week

Last Update: 2025-10-20
See Project
25

Aphantasia

CLIP + FFT/DWT/RGB = text to image/video

This is a collection of text-to-image tools, evolved from the artwork of the same name. It is based on the CLIP model and the Lucent library, with FFT/DWT/RGB parameterizers (no-GAN generation). Illustrip (text-to-video with motion and depth) and DWT (wavelet) parameterization have been added. See also the colabs below, with VQGAN and SIREN+FFM generators. Tested on Python 3.7 with PyTorch 1.7.1 or 1.8.

Downloads: 0 This Week

Last Update: 2023-10-19
See Project