ComfyUI wrapper nodes for WanVideo and related models
A Multi-Modal World Model for Reconstruction, Generation, and Simulation
A Python tool that uses GPT-4, FFmpeg, and OpenCV
Code and models for the ICML 2024 paper NExT-GPT
Framework for building real-time voice and multimodal AI agents
Voice Recognition to Text Tool
Powerful open source team chat application
Multimodal embedding and reranking models built on Qwen3-VL
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Search all of YouTube from the command line
Label Studio is a multi-type data labeling and annotation tool
Public opinion analysis system
Qwen2.5-VL is the multimodal large language model series
AI framework for automated short video creation and editing tools
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Use Microsoft Edge's online text-to-speech service from Python
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
A Web UI for easy subtitle generation using the Whisper model
Windrecorder is a memory search app that records everything
Automatically translates the text of a video based on a subtitle file
Generating Immersive, Explorable, and Interactive 3D Worlds
Data Infrastructure providing an approach to multimodal AI workloads
Build multimodal language agents for fast prototyping and production
Extract audio and video content and organize it into a Markdown note
Easy-to-use Python library for creating 2D arcade games