Harmonized and Coherent Human Image Animation
Free, local, self-hosted video compressor web UI
Streaming Real-time Audio-Driven Avatar Generation
GPT4V-level open-source multi-modal model based on Llama3-8B
Voice Recognition to Text Tool
Private chat with local GPT with documents, images, video, etc.
Public opinion analysis system
Recovering the Visual Space from Any Views
Easy-to-use Python library for creating 2D arcade games
AI Slack bot for reading, summarizing, and chatting with content
Mastering Diverse Domains through World Models
Benchmark LLMs by fighting in Street Fighter 3
Unofficial Python API and agentic skill for Google NotebookLM
A Web UI for easy subtitle generation using the Whisper model
Effortless data labeling with AI support from Segment Anything
The Cradle framework is a first attempt at General Computer Control
An Open Source package that allows video game creators
OCR expert VLM powered by Hunyuan's native multimodal architecture
Extract audio and video content and organize it into a Markdown note
Install Jenkins and configure Docker
A lightweight vision library for performing large object detection
Build cross-modal and multimodal applications on the cloud
AI-powered tool to quickly remove watermarks from videos flawlessly
Slim Camera - Lightweight RTSP Video Player
xSTUDIO is a high-performance playback and review tool