Taming Stable Diffusion for Lip Sync
GPT4V-level open-source multi-modal model based on Llama3-8B
Douyin TikTok Download API
Multimodal-Driven Architecture for Customized Video Generation
Large Audio Language Model built for natural interactions
Topic Modelling for Humans
ComfyUI wrapper nodes for HunyuanVideo
Easy-to-use Python library for creating 2D arcade games
Build Vision Agents quickly with any model or video provider
A speech-text foundation model for real-time dialogue
Python Stream Processing
A Python module to download Twitter Spaces
Code and models for ICML 2024 paper, NExT-GPT
Multi-Platform Live Stream Automatic Recording Tool
A Multi-Modal World Model for Reconstruction, Generation, and Simulation
Audio Normalization for Python/ffmpeg
NVR with real-time local object detection for IP cameras
A Web UI for easy subtitle generation using the Whisper model
Pythonic interface for FFmpeg/FFprobe command line
Recovering the Visual Space from Any Views
Free, high-quality text-to-speech API endpoint to replace OpenAI
Minimal scripts to run the emulator in a container for various systems
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
Segmentation models with pretrained backbones. PyTorch
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning