An open-source text-to-speech system built by inverting Whisper
Towards Human-Sounding Speech
A Unified Framework for Text-to-3D and Image-to-3D Generation
Multimodal-Driven Architecture for Customized Video Generation
Generate Any 3D Scene in Seconds
TextWorld is a sandbox learning environment for the training
Implementation of Make-A-Video, a new SOTA text-to-video generator
Open-Sora: Democratizing Efficient Video Production for All
End-to-end speech processing toolkit
A TTS model capable of generating ultra-realistic dialogue
Interface for OuteTTS models
Documentation for Google's Gen AI site - including Gemini API & Gemma
Foundational model for human-like, expressive TTS
Implementation of the AudioLM audio generation model in PyTorch
Real-time, voice-interactive digital human
MARS5 speech model (TTS) from CAMB.AI
Synchronized Translation for Videos
Sample code and notebooks for Generative AI on Google Cloud
Towards Real-World Vision-Language Understanding
Provides CTP stock options and Zhongtai Securities XTP
Large language model and vision-language model based on linear attention
Guiding Instruction-based Image Editing via Multimodal Large Language
Diffusion Transformer with Fine-Grained Chinese Understanding
Multi-lingual large voice generation model, providing inference
Bailing is a voice dialogue robot similar to GPT-4o