Framework for building real-time voice and multimodal AI agents
Code and models for the ICML 2024 paper NExT-GPT
One-click deployment (including offline integration package)
A TTS model capable of generating ultra-realistic dialogue
A Python tool that uses GPT-4, FFmpeg, and OpenCV
A Systematic Framework for Interactive World Modeling
Open Source Speech Language Model
Towards Human-Sounding Speech
ImageBind: One Embedding Space to Bind Them All
Spring AI Alibaba examples for building and testing AI apps
VMZ: Model Zoo for Video Modeling
An Open Source text-to-speech system built by inverting Whisper
Controllable & emotion-expressive zero-shot TTS
A sound cloning tool with a web interface, using your voice
State-of-the-art diffusion models for image and audio generation
Data infrastructure providing an approach to multimodal AI workloads
The official Python library for the OpenAI API
Official repository for LTX-Video
Industrial-level controllable zero-shot text-to-speech system
Document Image Parsing via Heterogeneous Anchor Prompting
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
Converts text to speech in real time
Python library and CLI tool to interface with Google Translate
HunyuanVideo: A Systematic Framework For Large Video Generation Model
A simple native web interface that uses ChatTTS to synthesize text