A Python tool that uses GPT-4, FFmpeg, and OpenCV
World's first open-source, agentic video production system
Build Vision Agents quickly with any model or video provider
Large Multimodal Models for Video Understanding and Editing
Code and models for ICML 2024 paper, NExT-GPT
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Framework for building real-time voice and multimodal AI agents
Voice Recognition to Text Tool
Code for running inference and fine-tuning with the SAM 3 model
Multimodal embedding and reranking models built on Qwen3-VL
Powerful open source team chat application
Multimodal Diffusion with Representation Alignment
Search all of YouTube from the command line
Use Microsoft Edge's online text-to-speech service from Python
Qwen2.5-VL is the multimodal large language model series from the Qwen team
AI framework and tools for automated short-video creation and editing
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Label Studio is a multi-type data labeling and annotation tool
Uses Qwen3-ASR, a local LLM, Whisper, and TEN-VAD
Generating Immersive, Explorable, and Interactive 3D Worlds
21 Lessons, Get Started Building with Generative AI
Public opinion analysis system
Windrecorder is a memory search app that records everything
A web UI for easy subtitle generation using the Whisper model
Data Infrastructure providing an approach to multimodal AI workloads