An Open Source text-to-speech system built by inverting Whisper
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
Tongyi Deep Research, the Leading Open-source Deep Research Agent
Implementation of Imagen, Google's Text-to-Image Neural Network
Multi-modal large language model designed for audio understanding
VITS2 backbone with multilingual-bert
High-Resolution Image Synthesis with Latent Diffusion Models
Multi-Voice and Prompt-Controlled TTS Engine
A performance-oriented patch interface for FluidSynth
Data Analysis, Simulations and Visualization on the Sphere
RetroScheme is used for molecule sketching and retrosynthesis
Convert colors to synth presets
Air traffic control tower and radar simulator (solo + multi-player)
A Conversational Speech Generation Model
Best practice TTS based on BERT and VITS
Open source implementation of Microsoft's VALL-E X zero-shot TTS model
Unofficial Parallel WaveGAN
Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis
Create synth presets from words
Implementation of Nougat Neural Optical Understanding
Official PyTorch Implementation of "Scalable Diffusion Models"
Implementations and code to accompany DeepMind publications
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)
Implementation of NÜWA, attention network for text to video synthesis