AI-powered tool for generating, optimizing, and translating subtitles
A Python tool that uses GPT-4, FFmpeg, and OpenCV
ComfyUI wrapper nodes for WanVideo and related models
A Multi-Modal World Model for Reconstructing, Generating, and Simulating
Code and models for the ICML 2024 paper NExT-GPT
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Framework for building real-time voice and multimodal AI agents
Voice Recognition to Text Tool
Powerful open source team chat application
Search all of YouTube from the command line
Code for running inference and finetuning with SAM 3 model
Multimodal embedding and reranking models built on Qwen3-VL
Multimodal Diffusion with Representation Alignment
Label Studio is a multi-type data labeling and annotation tool
Qwen2.5-VL is the multimodal large language model series
AI framework for automated short video creation and editing tools
Use Microsoft Edge's online text-to-speech service from Python
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Public opinion analysis system
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
Automatically translates the text of a video based on a subtitle file
Generating Immersive, Explorable, and Interactive 3D Worlds
A Web UI for easy subtitle generation using the Whisper model
Windrecorder is a memory search app that records everything
21 Lessons, Get Started Building with Generative AI